Bell Computer Engineering

COMPUTER
ENGINEERING
A DEC VIEW OF HARDWARE SYSTEMS DESIGN

C. GORDON BELL · J. CRAIG MUDGE · JOHN E. McNAMARA

DIGITAL PRESS

Copyright © 1978 by Digital Equipment Corporation.

All rights reserved. Reproduction of this book, in
part or in whole, is strictly prohibited. For copy information contact Digital Press, Educational Services, Digital Equipment Corporation, Bedford,
Massachusetts 01730.

Printed in U.S.A.

1st Printing, September 1978
2nd Printing, December 1978
3rd Printing, January 1979
4th Printing, August 1979
Documentation Number JB066-A
Library of Congress Catalog Card Number 77-91677
ISBN 0-932376-00-2

The manuscript was created on a DEC Word Processing System and, via a translation program, was
typeset on Digital's DECset-8000 Typesetting System.
Cover and display pages designed by Elliott N.
Hendrickson.

To the people at Digital, especially
the engineers, and Ben

The progress which has brought the number of computers in use in the world
from dozens to millions within a generation has not been the result of a single
discovery or the work of a single inventor or company. Rather, men and women
from fields as diverse as semiconductor physics and mechanical engineering have
studied long hours and worked with various measures of inspiration and perspiration to make the discoveries and develop the technologies needed to advance
the state of the art in computer technology.
There are several aspects of the progress in computer technology which have
made it an exceptionally exciting and rewarding field for the people involved.
First of all, a great many of the major steps forward, such as the invention of the
transistor, have taken place within our lifetimes. Secondly, there has been an
opportunity to associate with many fine colleagues whose brilliance, courage of
conviction, and capacity for endless work have been a great inspiration. Finally,
there has been the great promise of computers - their ability to free men's minds
of repetitive and boring tasks, their ability to reduce the cost of producing goods,
their ability to improve the lives of so many people in so many ways - and the fun
and excitement of working with them.
In the chapters of this book, various authors relate some of their experiences in
the past twenty years, draw some conclusions about how computer technology
got to where it is, and project into the future from some of the trends they have
seen. While it is impossible in a single book to capture all of the excitement and
challenge of these years, they have done an admirable job for which they are to be
commended. Hopefully, this glimpse into the past and present will encourage the
students of the future to enter the computer engineering field and bring with them
ideas, ambition, and courage.
Kenneth H. Olsen
President
Digital Equipment Corporation

PREFACE

This book has been written for practicing computer designers, whether their
domain is microcomputers, minicomputers, or large computers, and for those
who by their contact with computers are students of design - users, programmers,
designers of peripherals and memories, and students of computer engineering and
computer science.
Computer engineering is a collage of different activities and disciplines, only
one of which - the technical aspects (multiplier design, the behavior of synchronizer circuits, and series/parallel tradeoffs, for example) - is covered by conventional texts. This book uses the case study method to show how all the different
factors (technology push, the marketplace, manufacturing, etc.) form the real-world constraints and opportunities which influence computer engineering.
Computer engineering can be thought of as a multivariable mathematical problem in which the engineer searches for an optimum within certain constraints.
Unfortunately, an optimum in one variable is rarely an optimum in another, and
thus a major portion of computer engineering is the search for reasonable compromises. A common method used to aid the search is to assign weights to various
system variables and to seek a weighted optimum. The weights vary with the
intended application. In one situation, speed might receive the maximum weight;
in another, instruction set compatibility might be the most important; and in yet
another, reliability might be paramount. The number of dimensions to the problem is large, and the meaningful measures for them are few. For example, the cost
variable is multidimensional and includes manufacturing, development, and field
support costs. In addition, there are numerous interdependencies among the variables such as the relationships between instruction set, machine organization,
logic design, and circuit design. These relationships and the constraints that control the weighting of the variables change with time. For example, the cost function changes when different subsystems use different technologies, and this
influences the relationships. In addition, constraints such as maintainability and
compatibility vary in importance from year to year. Finally, while some of the
relationships, such as the time-space tradeoff in adder design, are well understood, others, particularly those involving marketing factors, are not.
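The weighted-optimum search described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three candidate designs, their scores on a 0-to-1 scale, and the two weight sets are hypothetical and are not taken from any DEC machine. It shows only that the same candidates rank differently when the weights change with the intended application.

```python
from typing import Dict

# Hypothetical candidate designs, each scored from 0 to 1 on three system
# variables. The numbers are illustrative only, not measurements.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "fast":       {"speed": 0.9, "compatibility": 0.4, "reliability": 0.5},
    "compatible": {"speed": 0.5, "compatibility": 0.9, "reliability": 0.6},
    "rugged":     {"speed": 0.4, "compatibility": 0.5, "reliability": 0.9},
}

# Two hypothetical applications that weight the same variables differently.
SPEED_FIRST = {"speed": 0.6, "compatibility": 0.2, "reliability": 0.2}
RELIABILITY_FIRST = {"speed": 0.2, "compatibility": 0.2, "reliability": 0.6}


def weighted_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of a design's scores (higher is better)."""
    return sum(weights[name] * metrics[name] for name in weights)


def best_design(candidates: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> str:
    """Return the name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


if __name__ == "__main__":
    print(best_design(CANDIDATES, SPEED_FIRST))
    print(best_design(CANDIDATES, RELIABILITY_FIRST))
```

A single weighted sum is the simplest possible objective function. It ignores the interdependencies among variables and the changes over time that the text emphasizes, so it should be read as a starting point rather than as a design method.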
Because no theory exists to undergird this multidimensional design problem,
we believe that there is no substitute for an extensive, critical understanding of the
existing examples of designed and marketed systems. Therefore, this book uses
the case study approach. For examples, we have used the thirty DEC computers
that have been built over the twenty years that the company has existed, plus
some PDP-11-based machines built at Carnegie-Mellon University. Carnegie-Mellon's machines explore interconnect structures that we feel will form the basis
of future generations.
The association between DEC and Carnegie-Mellon has produced not only
some interesting machines to examine but also some of the written material for
this book. People in universities can and do write, whereas engineers directly
involved in design work are less inclined or encouraged to publish their work.
A substantial portion of the material contributed by DEC authors is historical.
We strongly believe that historical information is worth the expense in terms of
writing, reading, and learning; machine design principles and techniques change
slowly. In fact, the machines currently being designed are based on principles that
have been understood and used for years, and we are often asked, "Are we running out of design issues?" Yes, we feel technology provides the forcing function
for new designs, not new principles.
Learning about design is always important. Although new designs often appear
to be a reapplication of old principles, in the process of being reapplied they
change and go beyond their first application. Design is learned by examining and
emulating previous designs plus incorporating general principles, new use, and
new technology. Indeed, the microcomputer developments draw (or should draw)
extensively from the minicomputers. As we build new structures, we should be
able to avoid the pitfalls of the immediate past design.
We have intentionally restricted our scope to DEC computers. The reason is
obvious: we can speak with first-hand knowledge. If we had used other companies' designs, our data would have been less accurate, and some factors, e.g.,
design styles, would have been omitted. The main reason, however, is a key part
of the philosophy of the book. To understand machine design evolution, the
effects of changes in the underlying technologies, and time-invariant principles,
we must analyze a family beginning at birth and follow it over several generations
of technology. Four series of DEC computers allow such an analysis. DEC computers also provide an opportunity to study another dimension of computer engineering - the coexistence of complementary (and sometimes competing) products.
Particular design efforts must compete for resources (design talent, manufacturing-plant capacity, and software, marketing, and sales support). DEC computers have, in general, been designed to be complementary and to avoid
overlapping or redundant products. Thus, another set of constraints can be seen
at work in the design space.

The book concerns itself with general purpose computers which are intended to
be widely available commercially. The engineering of computers for highly specialized applications, for which only a few copies are built, is not treated. Moreover, because not all major principles of computer architecture and computer
engineering are embodied in the DEC computers, the reader may want to examine
other designs, as well. For example, the reader cannot learn about descriptor
architectures, array processors, list-processing machines, or general purpose
emulators from this book.
At one time consideration was given to postponing the publication of a book
until 1982, at which time DEC will celebrate its twenty-fifth anniversary. This
idea was rejected because another five years would further impede the collection
of data about the early machines. More importantly, the twenty-year period of
DEC modules and computers (1957-1977) has extended from the early second
generation to the fourth generation. Today, the processor of several DEC computers occupies a single large-scale integrated circuit consisting of several thousand transistors, whereas in 1957 only one transistor could be fabricated on a
single piece of germanium. In another five years, the design, manufacture, and
distribution of computers will be radically different - so much so as to merit a new
book.
We expect an increasingly larger number of people to be involved in computer
engineering and hence students of this material, because we expect computers as
we know them today will disappear within ten years! With the processor-on-a-chip, the number of computer systems designers (users) has risen by several orders
of magnitude.
In the area of large computer systems, the buyers and users are also clearly the
computer designers: they select components (from the set of available components) and interconnect them to form specific structures. It is essential for us all
to have a model of the price, performance, and reliability parameters and how
they vary with time. Previous generations have focused first on the invention of
the computer, next on the understanding of price/performance tradeoffs, and
most recently on manufacturing - especially the fabrication of the semiconductors
that now drive computer evolution. In the next five years, design will focus on
applications: conventional applications will be more efficient, computers will be
extended to reach new applications, and life-cycle costs will receive more attention. For the computer engineer, the evolution of DEC machines provides an
excellent perspective on the influence of applications on design. For those of us
who must deal with design goals, constraints, and objective functions to improve
reliability, availability, and maintainability, it is imperative that we first clearly
understand previous design problems.
For the programmers who use computers and are a part of the computer design
process, understanding this material is mandatory in order to know the rules of
the game. We say comparatively little about software, other than how it has
influenced hardware design. The increasing role of software functions in the hardware domain is a clear process that has allowed (and forced) computer architecture to change. The engineering of DEC software will be treated in subsequent
volumes, perhaps one on language translators and one on operating systems. We
hope also that future volumes will be devoted to mass storage devices, terminals,
and applications.
Two notations, ISP and PMS, were introduced in the book, Computer Structures [Bell and Newell, 1971]. We continue to use them in this book, especially
since they have left the realm of notations and have become working design tools.
ISP was introduced to describe the instruction set processor of a computer - the
machine seen by the program (and programmer). ISP is now used for machine
description, simulation, verification of diagnostics, microprogramming, automatic assembler generation, and the comparison of computer architectures. The
evolution and improvement of ISP is principally due to needs of the Army/Navy
Computer Family Architecture (CFA) project and the work of Mario Barbacci.
The latest version, ISPS, is being used within DEC for implementing processors,
simulators, etc. ISPS language descriptions of current DEC machines (PDP-8,
PDP-10, PDP-11, VAX-11) and several terminals have been made. We hope that
these will be made widely available and so further stimulate the use of machine-description languages. The widespread application of good languages would help
alleviate two current design problems: first, that of hand-crafted design tooling
keeping up with the rate of introduction of new technologies and second, the
problem of managing the ever-increasing complexity of computer structures. The
PDP-8 description presented in Appendix 1 has been verified by machine diagnostics, in contrast to conventional descriptions.
PMS (processor-memory-switch) notation (given in Appendix 2) has not yet
been widely used in formal methods to aid design. It has, however, been used
extensively to describe computer structures. A prototype system which recognizes
PMS and performs several performance analysis functions was constructed by
Knudsen [1972]. Currently, ISPS is being extended to include the interconnection
of computational blocks so that PMS and ISPS form a single system describing
structure and behavior. In this book, we use PMS to describe functional blocks.
However, all PMS components are enclosed to form a block diagram, unlike the
original stick notation.
The book begins with three introductory chapters. The first presents the major
themes to be illustrated by the book. We show that computer evolution has been
based primarily on semiconductor and magnetic recording technologies. These
technologies determine costs, and therefore price, performance, reliability, size,
weight, power, and other dimensions which constitute the physical characteristics
of the machines. The major theme of the book is that technology has enabled (or
forced) three types of computers to be built:

1. Machines with constant performance and decreasing cost.
2. Machines with constant cost and increasing performance.
3. Radically new (large or small) structures, often research machines, which
   create new computer classes outside the evolution possibilities.

Chapter 2 traces the evolution of memory and logic technology. Engineering is
firmly rooted in economics and inherently practical. Packaging (including component interconnections) is covered in Chapter 3 for a very pragmatic reason: of
the total product cost of a small computer system, 50 percent is due to packaging
and power, and these costs are rising. To further emphasize the practical aspects
of engineering in Chapter 3, a section on high-volume manufacturing is included;
the result of a designer's creativity must not only work but be buildable by production-line methods.
Following the introductory chapters are five parts:

I. In the Beginning
II. Beginning of the Minicomputer
III. The PDP-11 Family
IV. The Evolution of Computer Building Blocks
V. The PDP-10 Family

The introductions to each part describe what to look for in the evolution of
each machine: its interaction with designers, technology, and use (marketplace).
More importantly, we have tried to point out the classic (timeless - so far) design
principles. Data that has become available since the original papers were published is also included.
Part I describes modules, the product on which DEC was initially founded.
Chapter 5 shows how modules evolved and assimilated semiconductor technology
in order to build computers.
The PDP-1 and other 18-bit machines and the PDP-8 began the minicomputer
phenomenon as described in Part II. Although six computers form the 18-bit
family, there is only one chapter devoted to them, primarily because there has
been a dearth of written papers; this chapter was written for Computer Engineering. Chapter 7 shows the historical development of the 12-bit machines, and
Chapter 8 explores the structure of the PDP-8 in detail.
Part III, nearly two-thirds of the book, is based on the PDP-11. The PDP-11
has been implemented with multiple technologies and multiple design goals at a
given time, i.e., a set of machines to span a performance range. Because of cost
and performance goals, a number of problems have had to be solved to permit
subsetting (for the LSI-11) and supersetting (for the larger memory PDP-11/70
and for VAX-11).
Part IV is devoted to module set evolution. Chapter 18 describes the Register
Transfer Modules (RTMs, also called PDP-16), a set of modules for building
digital systems. Although these modules were unsuccessful in the marketplace,
they were the forerunner of the bit-slice approach now widely used for implementing mid-range processors and special-purpose digital systems. Chapter 20 describes a set of modules based on the PDP-11 computer, called Computer Modules, which grew out of the original RTM research and were used to construct
Cm*, a multi-microprocessor system.
Part V covers the PDP-10. Prior to the publication of the paper reproduced
here as Chapter 21, very little had been published at the engineering level. The
published literature had emphasized operating systems, languages, networks, and
applications.
Computer Engineering is modeled after Computer Structures [Bell and Newell,
1971] and is intended to complement the subject matter therein. Computer Structures treats the design of instruction set architectures; Computer Engineering treats
the design of machines which implement instruction sets. Computer Structures
covers a broad range of ISP structures and PMS structures, from early stack
machines and bit-serial machines, through list processors and higher level language machines, to supercomputers. By giving the seminal Burks, Goldstine, and
von Neumann paper and the Whirlwind paper, it reaches far back into history.
Computer Engineering, on the other hand, takes a much narrower set of ISPs (four)
and examines their implementations in detail. Instruction set design is mentioned
only as it interacts with implementation. We focus on four computer families
from both the designer and the historical viewpoint. In particular, we emphasize
the lower level technological, economic, organizational, and environmental forces
affecting the evolution of DEC computer families.
Although this book is principally for designers and students, it will also be of
interest (as an historical record) to DEC employees who have been involved in the
design, manufacture, distribution, and servicing of the computers.
Our recommendations for the use of this text in university curricula are based
on teaching experience, requests from academic colleagues for material to teach
design, and our participation in curriculum development. The book directly addresses the philosophy of the IEEE Computer Society Task Force on Computer
Architecture [Rossman et al., 1975]: "To appreciate how the architectures of
computer systems develop, one must analyze complete systems." As such, Computer Engineering serves to complement Buchholz [1962], Bell and Newell [1971],
and Blaauw and Brooks [in preparation] in a course on computer architecture, for
example, IEEE course CO-3.*
For undergraduate courses on computer organization, such as IEEE CO-1*
and the ACM courses I3 and A2†, we believe that the book could be used as a
supplementary text. In a course on computer engineering, using the style given in
the syllabus of CO-2* (I/O and Memory Systems) as a model, this could be a
primary text, provided that material on other manufacturers' computers is made
available to show different viewpoints.

*"A Curriculum in Computer Science and Engineering-Committee Report," Model Curricula Subcommittee, IEEE Computer Society, EH0119-8, January 1977.
†"Curriculum 68," Commun. ACM, 11, 3, pp. 151-197, March 1968.

ACKNOWLEDGEMENTS

We gratefully acknowledge our contributing authors, whose insights have
greatly enhanced the scope of this book, and our colleagues at DEC, who assembled information, and provided subject matter expertise and advice.
We would like to thank R. Eckhouse, R. Glorioso, S. Fuller, J. Lipovski, and P.
Jessel, whose critiques of the preliminary drafts of the introductory chapters and
book outline proved very helpful. We would also like to thank J. Cudmore, R.
Doane, R. Elia-Shaoul, S. Fuller, L. Gale, L. Hughes, R. Peyton, and S. Teicher,
who provided data for Chapter 2 and valuable critiques of earlier drafts. We also
acknowledge the reviewers of the second draft of the manuscript, to whose criticisms we have especially tried to respond. We received instructive comments and
evaluations from D. Aspinall, G. Blaauw, R. Clayton, D. Cox, J. Dennis, P.
Enslow, D. Freeman, J. Grason, J. Gray, W. Heller, G. Korn, J. Lipcon, J. Marshall, E. McCluskey, C. Minter, M. Moshell, E. Organick, W. Schmitt, B.
Schunck, I. Sutherland, J. Wakerly, and J. Wipfli. We would like to extend special
thanks to H. Stone for his extensive and particularly useful review comments.
We are also indebted to many for their support in producing Computer Engineering. We are particularly indebted to Heidi Baldus of Digital Press who coordinated the production of Computer Engineering and whose encouragement kept us
going through a number of difficult times. For their expertise and patience, we
thank the Technical Documentation group, especially Denise Peters. We also
thank Mary Jane Forbes and Louise Principe for their constant support in the
course of this book's development and production. The manuscript creation and
preparation on the DEC Word Processing System, followed by transmission to
the DECset-8000 Typesetting System, permitted numerous drafts and rapid creation of the final typeset material.
C.G.B.
J.C.M.
J.E.M.
August 1978

C.G. Bell, J.C. Mudge, and J.E. McNamara: Seven Views of Computer Systems.
C.G. Bell, Digital Equipment Corporation and Carnegie-Mellon University; J.C. Mudge and J.E. McNamara, Digital Equipment Corporation.
C.G. Bell, J.C. Mudge, and J.E. McNamara: Technology Progress in Logic and
Memories. C.G. Bell, Digital Equipment Corporation and Carnegie-Mellon
University; J.C. Mudge and J.E. McNamara, Digital Equipment Corporation.
C.G. Bell, J.C. Mudge, and J.E. McNamara: Packaging and Manufacturing.
C.G. Bell, Digital Equipment Corporation and Carnegie-Mellon University; J.C. Mudge and J.E. McNamara, Digital Equipment Corporation.
K.H. Olsen: Transistor Circuitry in the Lincoln TX-2. Copyright © 1957 by
AFIPS. Reprinted, with permission, from the Proceedings of the Western
Computer Conference, 1957, pp. 167-171. This work was supported jointly
by the U.S. Army, Navy, and Air Force under contract with M.I.T. K.H.
Olsen, Lincoln Laboratory M.I.T. (currently with Digital Equipment Corporation).
R.L. Best, R.C. Doane, and J.E. McNamara: Digital Modules, the Basis for
Computers. R.L. Best, R.C. Doane, and J.E. McNamara, Digital Equipment Corporation.
C.G. Bell, G. Butler, R. Gray, J.E. McNamara, D. Vonada, and R. Wilson: The
PDP-1 and Other 18-Bit Computers. C.G. Bell, Digital Equipment Corporation and Carnegie-Mellon University; G. Butler et al., Digital Equipment
Corporation.
C.G. Bell and J.E. McNamara: The PDP-8 and Other 12-Bit Computers. C.G.
Bell, Digital Equipment Corporation and Carnegie-Mellon University; J.E.
McNamara, Digital Equipment Corporation.
C.G. Bell, A. Newell and D.P. Siewiorek: Structural Levels of the PDP-8. Revised
and updated version of Chapter 5, "The DEC PDP-8," Computer Structures: Readings and Examples, C.G. Bell and A. Newell, McGraw-Hill Book
Co., New York, 1971. C.G. Bell, Digital Equipment Corporation and Carnegie-Mellon University; A. Newell and D.P. Siewiorek, Carnegie-Mellon
University.
C.G. Bell et al.: A New Architecture for Minicomputers - The DEC PDP-11.
Copyright © 1970 by AFIPS. Reprinted, with permission, from the Proceedings of the Spring Joint Computer Conference, 1970, pp. 657-675. C.G.
Bell, Digital Equipment Corporation and Carnegie-Mellon University.
Those who have contributed subject matter expertise include R. Cady, H.
McFarland, B.A. Delagi, J.F. O'Loughlin, R. Noonan, and W.A. Wulf.
W.D. Strecker: Cache Memories for PDP-11 Family Computers. Copyright ©
1976 by the Institute of Electrical and Electronics Engineers, Inc. Reprinted,
with permission, from the Proceedings of the 3rd Annual Symposium on
Computer Architecture, 1976, pp. 155-158. W.D. Strecker, Digital Equipment Corporation.
J.V. Levy: Buses, The Skeleton of Computer Structures. J.V. Levy, Digital Equipment Corporation (currently with Tandem Computers, Inc.).
M.J. Sebern: A Minicomputer-Compatible Microcomputer System: The DEC
LSI-11. Copyright © 1976 by the Institute of Electrical and Electronics Engineers, Inc. Reprinted, with permission, from the Proceedings of the IEEE,
June 1976, Vol. 64, No. 6. Manuscript received by IEEE on October 10,
1975; revised December 12, 1975. M.J. Sebern, Digital Equipment Corporation (currently with Sebern Engineering, Inc.).
J.C. Mudge: Design Decisions for the PDP-11/60 Mid-Range Minicomputer.
Copyright © 1977 by the Computer Design Publishing Corp. Reprinted,
with permission, from Computer Design, August 1977, pp. 87-95. Appears
under title "Design Decisions Achieve Price/Performance Balance in Mid-Range Minicomputers" in Computer Design issue. J.C. Mudge, Digital
Equipment Corporation.
E.A. Snow and D.P. Siewiorek: Impact of Implementation Design Tradeoffs on
Performance: The PDP-11, A Case Study. Copyright © 1978 by Edward A.
Snow and Daniel P. Siewiorek. This research was supported in part by the
National Science Foundation under grant GJ-32758X and by an IBM fellowship. Engineering documentation was supplied by the Digital Equipment Corporation. E.A. Snow (currently with Intel Corp.) and D.P.
Siewiorek, Carnegie-Mellon University.
R.F. Brender: Turning Cousins into Sisters: An Example of Software Smoothing
of Hardware Differences. R.F. Brender, Digital Equipment Corporation.

C.G. Bell and J.C. Mudge: The Evolution of the PDP-11. Chapter includes material from "What Have We Learned From the PDP-11?" by C.G. Bell, in
Perspectives on Computer Science: From the 10th University Symposium at
the Computer Science Department, Carnegie-Mellon University, A. Jones
(Ed.), Academic Press, Inc., 1978. C.G. Bell, Digital Equipment Corporation and Carnegie-Mellon University; J.C. Mudge, Digital Equipment
Corporation.
W.D. Strecker: VAX-11/780: A Virtual Address Extension to the DEC PDP-11
Family. Copyright © 1978 by American Federation of Information Processing Societies, Inc. Reprinted, with permission, from the Proceedings of the
National Computer Conference, June 1978, pp. 967-980. W.D. Strecker,
Digital Equipment Corporation.
C.G. Bell, J. Eggert, J. Grason, and P. Williams: The Description and Use of
Register Transfer Modules (RTMs). Copyright © 1972 by the Institute of
Electrical and Electronics Engineers, Inc. Reprinted, with permission, from
the IEEE Transactions on Computers, May 1972, Vol. C-21, No. 5, pp.
495-500. Manuscript received by IEEE February 19, 1971; revised May 11,
1971. C.G. Bell, Digital Equipment Corporation and Carnegie-Mellon University; J. Eggert, Digital Equipment Corporation (currently with Eggert
Engineering); J. Grason, Carnegie-Mellon University (currently with Bell
Laboratories); P. Williams, Digital Equipment Corporation (currently with
Data Terminal Systems, Inc.).
T.M. McWilliams, S.H. Fuller, and W.H. Sherwood: Using LSI Processor Bit-Slices to Build a PDP-11 - A Case Study in Microcomputer Design. Copyright © 1977 by AFIPS. Reprinted, with permission, from the Proceedings of
the National Computer Conference, 1977, pp. 243-253. This work was partially supported by the Advanced Research Projects Agency (ARPA) of the
Department of Defense under contract F44620-73-C-0074, monitored by
the Air Force Office of Scientific Research. T.M. McWilliams, Carnegie-Mellon University (currently with Stanford University and Lawrence Livermore Laboratory, University of California); S.H. Fuller, Carnegie-Mellon
University (currently with Digital Equipment Corporation); W.H. Sherwood, Carnegie-Mellon University (currently with Digital Equipment Corporation).
S.H. Fuller, J.K. Ousterhout, L. Raskin, P. Rubinfeld, P.S. Sindhu, and R.J.
Swan: Multi-Microprocessors: An Overview and Working Example. Copyright © 1978 by Institute of Electrical and Electronics Engineers, Inc. Reprinted, with permission, from the Proceedings of the IEEE, February 1978,
Vol. 66, No. 2, pp. 216-228. Manuscript received by IEEE November 11,
1977. This work was supported in part by the Advanced Research Projects
Agency of the Department of Defense under Contract F44620-73-C-0074,
which is monitored by the Air Force Office of Scientific Research, and in
part by the National Science Foundation under Grant GJ 32758X. The LSI-11s and related equipment were supplied by Digital Equipment Corporation. S.H. Fuller, Carnegie-Mellon University (currently with Digital
Equipment Corporation); J.K. Ousterhout et al., Carnegie-Mellon University.
C.G. Bell, A. Kotok, T.N. Hastings, and R. Hill: The Evolution of the DECsystem-10. Copyright © 1978 by the Association for Computing Machinery.
Reprinted, with permission, from the Communications of the ACM, January
1978, Vol. 21, No. 1, pp. 44-63. C.G. Bell, Digital Equipment Corporation
and Carnegie-Mellon University; A. Kotok, T.N. Hastings, and R. Hill,
Digital Equipment Corporation.
M. Barbacci: Appendix 1 - An ISPS Primer for the Instruction Set Processor. M.
Barbacci, Carnegie-Mellon University.
J.C. Mudge: Appendix 2 - The PMS Notation. J.C. Mudge, Digital Equipment
Corporation.
C.G. Bell, J.C. Mudge, and J.E. McNamara: Appendix 3 - Performance. C.G.
Bell, Digital Equipment Corporation and Carnegie-Mellon University; J.C.
Mudge and J.E. McNamara, Digital Equipment Corporation.
TRADEMARKS
The following trademarks appear in Computer Engineering: A DEC View of Hardware Systems Design.
Company                                          Trademark

Computer Automation Corporation                  Naked Mini

Digital Equipment Corporation                    DEC, DECsystem-10, DECSYSTEM-20, DECtape,
                                                 DECUS, DDT, DIBOL, DIGITAL, Fastbus,
                                                 Flip Chip, FOCAL, LSI-11, Massbus, PDP,
                                                 RSTS, RSX, TOPS-10, TOPS-20, Unibus

Fairchild Camera and Instrument Corporation      Macrologic

Friden Company - A Division of Singer Company    Flexowriter

Gardner-Denver Company                           Wire-wrap

Teletype Corporation                             Teletype

Xerox Corporation                                Xerox 6500 color graphics printer

CONTENTS

Foreword
Preface
Acknowledgements

1   Seven Views of Computer Systems
    C. Gordon Bell, J. Craig Mudge, and John E. McNamara

2   Technology Progress in Logic and Memories
    C. Gordon Bell, J. Craig Mudge, and John E. McNamara

3   Packaging and Manufacturing
    C. Gordon Bell, J. Craig Mudge, and John E. McNamara

PART I
IN THE BEGINNING

4   Transistor Circuitry in the Lincoln TX-2
    Kenneth H. Olsen

5   Digital Modules, The Basis for Computers
    Richard L. Best, Russell C. Doane, and John E. McNamara

PART II
BEGINNING OF THE MINICOMPUTER

6   The PDP-1 and Other 18-Bit Computers
    C. Gordon Bell, Gerald Butler, Robert Gray,
    John E. McNamara, Donald Vonada, and Ronald Wilson

7   The PDP-8 and Other 12-Bit Computers
    C. Gordon Bell and John E. McNamara

8   Structural Levels of the PDP-8
    C. Gordon Bell, Allen Newell, and Daniel P. Siewiorek

PART III
THE PDP-11 FAMILY

9   A New Architecture for Minicomputers - The DEC PDP-11
    C. Gordon Bell, Roger Cady, Harold McFarland, Bruce A. Delagi,
    James F. O'Loughlin, Ronald Noonan, and William A. Wulf

10  Cache Memories for PDP-11 Family Computers
    William D. Strecker

11  Buses, The Skeleton of Computer Structures
    John V. Levy

12  A Minicomputer-Compatible Microcomputer System: The DEC LSI-11
    Mark J. Sebern

13  Design Decisions for the PDP-11/60 Mid-Range Minicomputer
    J. Craig Mudge

14  Impact of Implementation Design Tradeoffs on Performance:
    The PDP-11, A Case Study
    Edward A. Snow and Daniel P. Siewiorek

15  Turning Cousins into Sisters: An Example of Software Smoothing
    of Hardware Differences
    Ronald F. Brender

16  The Evolution of the PDP-11
    C. Gordon Bell and J. Craig Mudge

17  VAX-11/780: A Virtual Address Extension to the DEC PDP-11 Family
    William D. Strecker

PART IV
EVOLUTION OF COMPUTER BUILDING BLOCKS

18  The Description and Use of Register Transfer Modules (RTMs)
    C. Gordon Bell, John Eggert, John Grason, and Peter Williams

19  Using LSI Processor Bit-Slices to Build a PDP-11 - A Case Study
    in Microcomputer Design
    Thomas M. McWilliams, Samuel H. Fuller, and William H. Sherwood

20  Multi-Microprocessors: An Overview and Working Example
    Samuel H. Fuller, John K. Ousterhout, Levy Raskin,
    Paul I. Rubinfeld, Pradeep S. Sindhu, and Richard J. Swan

PART V
THE PDP-10 FAMILY

21  The Evolution of the DECsystem-10
    C. Gordon Bell, Alan Kotok, Thomas N. Hastings, and Richard Hill

Appendix 1
    An ISPS Primer for the Instruction Set Processor Notation
    Mario Barbacci

Appendix 2
    The PMS Notation
    J. Craig Mudge

Appendix 3
    Performance
    C. Gordon Bell, J. Craig Mudge, and John E. McNamara

Bibliography
Index

Seven Views of Computer Systems
C. GORDON BELL, J. CRAIG MUDGE,
and JOHN E. McNAMARA

A computer is determined by many factors,
including architecture, structural properties, the
technological environment, and the human aspects of the environment in which it was designed and built. In this book various authors
reflect on these factors for a wide range of DEC
computers - their goals, their architectures,
their various implementations and realizations,
and occasionally on the people who designed
them.
Computer engineering is the complete set of
activities, including the use of taxonomies, theories, models, and heuristics, associated with
the design and construction of computers. It is
like other engineering, and the definition that
Richard Hamming (then at Bell Laboratories)
gave is especially appropriate: engineers first
turn to science for answers and help, then to
mathematics for models and intuition, and finally to the seat of their pants.
In the few decades since computers were first
conceived and built, computer engineering has
come from a set of design activities that were
mostly seat-of-the-pants based to a point where
some parts are quite well understood and based
on good models and rules of thumb, such as
technology models, and other parts are completely understood and employ useful theories
such as circuit minimization.
In this chapter, seven views are presented that
the authors have found useful in thinking about
computers and the process that molds their
form and function. They are intentionally independent; each is a different way of looking at a
computer. A computer scientist or mathematician sees a computer as levels-of-interpreters.
An engineer sees the computer on a structural
basis, with particular emphasis on the logic design of the structure. The view most often taken
by a buyer is a marketplace view. While these
people each favor a particular view of computers, each typically understands certain aspects of the other views. The goals of Chapter 1
are to increase this understanding of other
views and to increase the number of representations used to describe the object of study and,
hence, improve on its exposition. Thus, "The
Seven Views of Computer Systems" forms a
useful background for the subsequent chapters
on past, present, and future computers.


VIEW 1: STRUCTURAL LEVELS OF A
COMPUTER SYSTEM

In Computer Structures [Bell and Newell,
1971], a set of conceptual levels for describing,
understanding, analyzing, designing, and using
computer systems was postulated. The model
has survived major changes in technology, such
as the fabrication of a complete computer on a
single silicon chip, and changes in architecture,
such as the addition of vector and array datatypes.
As shown in Figure 1, there are at least five
levels of system description that can be used to
describe a computer. Each level is characterized
by a distinct language for representing the components associated with that level, their modes
of combination, and their laws of behavior.
Within each level there exists a whole hierarchy
of systems and subsystems, but as long as these
are all described in the same language, they do
not constitute separate levels. With this general
view, one can work up through the levels of
computer systems, starting at the bottom.

Figure 1. Hierarchy of computer levels, adapted from
Bell and Newell [1971]. (Labels in the figure include the PMS level, the
register transfer level, combinational logic, the electrical circuit level,
and the device level with its P, N, and metal areas.)

The lowest level in Figure 1 is the device level.
Here the components are p-type and n-type
semiconductor materials, dielectric materials,
and metal formed in various ways. The behavior of the components is described in the languages of semiconductor physics and materials
science.
The next level is the circuit level. Here the
components are resistors, inductors, capacitors,
voltage sources, and nonlinear devices. The behavior of the system is measured in terms of
voltage, current, and magnetic flux. These are
continuously varying quantities associated with
various components; hence, there is continuous
behavior through time, and equations (including differential equations) can be written to describe the behavior of the variables. The
components have a discrete number of terminals whereby they can be connected to other
components.
Above the circuit level is the switching circuit
or logic level. While the circuit level in digital
technology is very similar to the rest of electrical engineering, the logic level is the point at
which digital technology diverges from electrical engineering. The behavior of a system is
now described by discrete variables which take
on only two values, called 0 and 1 (or + and -,
true and false, high and low). The components
perform logic functions called AND, OR,
NAND, NOR, and NOT. Systems are constructed in the same way as at the circuit level,
by connecting the terminals of components,
which thereby identify their behavioral values.
After a system has been so constructed, the laws
of Boolean algebra can be used to compute the
behavior of the system from the behavior and
properties of its components.
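
The logic-level idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gate functions and the example system `xor_from_nands` are hypothetical names chosen here, not anything defined in the text. Components are gates, signals take only the values 0 and 1, and the behavior of the connected system is computed from the behavior of its components.

```python
# Logic-level sketch: every signal is 0 or 1, every component is a gate.

def NOT(a: int) -> int:
    return 1 - a

def AND(a: int, b: int) -> int:
    return a & b

def NAND(a: int, b: int) -> int:
    return NOT(AND(a, b))

def xor_from_nands(a: int, b: int) -> int:
    """Hypothetical example system: exclusive-OR wired from four NAND gates."""
    n1 = NAND(a, b)
    return NAND(NAND(a, n1), NAND(b, n1))

def truth_table(system):
    """Compute the system's output for every combination of two inputs."""
    return {(a, b): system(a, b) for a in (0, 1) for b in (0, 1)}

if __name__ == "__main__":
    for inputs, output in truth_table(xor_from_nands).items():
        print(inputs, "->", output)
```

Connecting gate outputs to gate inputs corresponds here to passing one function's result as another's argument, which is the sense in which connected terminals share a behavioral value.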
In addition to combinational logic circuits,
whose outputs are directly related to the inputs
at any instant of time, there are sequential logic
circuits which have the ability to hold values
over time and thus store information. The problem that the combinational level analysis solves
is the production of a set of outputs at time t as
a function of a number of inputs at the same
time t. The representation of a sequential
switching circuit is basically the same as that of
a combinational switching circuit, although one
needs to add memory components. The equations that specify sequential logic circuit structure must be difference equations involving
time, rather than the simple Boolean algebra
equations which describe purely combinational
logic circuits.
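
The contrast between the two kinds of circuit can be made concrete with a small sketch. The half adder and the toggle flip-flop below are examples chosen for illustration and are not taken from the text. The first is a pure function of its inputs at time t; the second needs a stored value, so its behavior is a difference equation, Q(t+1) = Q(t) XOR T(t).

```python
def half_adder(a: int, b: int) -> tuple:
    """Combinational: outputs at time t depend only on inputs at time t."""
    return a ^ b, a & b  # (sum, carry)

def toggle_next(q: int, t: int) -> int:
    """Sequential: difference equation Q(t+1) = Q(t) XOR T(t)."""
    return q ^ t

def run_sequential(inputs, q0: int = 0) -> list:
    """Step the stored value through time and return Q at every step."""
    q = q0
    trace = [q]
    for t_in in inputs:
        q = toggle_next(q, t_in)
        trace.append(q)
    return trace

if __name__ == "__main__":
    print(half_adder(1, 1))
    print(run_sequential([1, 1, 0, 1]))
```

The same input value 1 produces different outputs at different steps of `run_sequential`, which is exactly what a combinational description cannot express.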
The level above the switching circuit level is
called the register transfer (RT) level. The components of the register transfer level are registers and the functional transfers between those
registers. The functional transfers occur as the
system undergoes discrete operations, whereby
the values of various registers are combined according to some rule and are then stored (transferred) into another register. The rule, or law, of
combination may be almost anything, from the
simple unmodified transfer (A ← B) to logical
combination (A ← B ∧ C, where ∧ is AND) or arithmetic
combination (A ← B + C, where + is PLUS). Thus, a
specification of the behavior, equivalent to the
Boolean equations of sequential circuits or to
the differential equations of the circuit level, is a
set of expressions (often called productions)
that give the conditions under which such transfers will be made.
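
A register transfer description of this kind can be sketched as a list of (condition, destination, rule) triples. Everything concrete below is an assumption made for illustration: the register names, the `op` selector, and the 18-bit word width are hypothetical and are not ISP or ISPS notation.

```python
WORD_MASK = 0o777777  # assumed 18-bit word, for illustration only

def step(regs: dict, productions: list) -> dict:
    """Apply every production whose condition holds.

    All conditions and rules read the register values from before the step,
    so the transfers behave as if they happened simultaneously.
    """
    old = dict(regs)
    for condition, dest, rule in productions:
        if condition(old):
            regs[dest] = rule(old) & WORD_MASK
    return regs

# Each production: (condition on state, destination register, rule of combination)
PRODUCTIONS = [
    (lambda r: r["op"] == 0, "A", lambda r: r["B"]),            # A <- B
    (lambda r: r["op"] == 1, "A", lambda r: r["B"] & r["C"]),   # A <- B AND C
    (lambda r: r["op"] == 2, "A", lambda r: r["B"] + r["C"]),   # A <- B PLUS C
]

if __name__ == "__main__":
    regs = {"A": 0, "B": 0o12, "C": 0o6, "op": 2}
    print(step(regs, PRODUCTIONS))
```

Masking the result to the word width is one design choice among several; it stands in for the fixed register length that a real register transfer description would state explicitly.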
The fifth and last level in Figure 1 is called
the processor-memory-switch (PMS) level. This
level, which gives only the most aggregate behavior of a computer system, consists of central
processors, core memories, tapes, disks, input/output processors, communications lines,
printers, tape controllers, buses, teleprinters,
scopes, etc. The computer system is viewed as
processing a medium, information, which can
be measured in bits (or digits, characters,
words, etc.). Thus, the components have capacities and flow rates as their operating characteristics.
The program level from the original set of
levels shown in Bell and Newell has been
dropped because it is a functional rather than a
structural level.
Many notations are used at each of the five
structural levels. Two of the less common ones
are the processor-memory-switch (PMS) and
instruction set processor (ISP) notations. A
complete description of these notations is given
in Bell and Newell [1971: Chapter 2]. Those aspects of PMS that are used in this book are described in Appendix 2. The ISP notation has
evolved to the ISPS language, which is described in Appendix 1.

VIEW 2: LEVY'S LEVELS-OF-INTERPRETERS

In contrast to the Structural View, this view is
functional. According to this view, presented by
John Levy [1974], a computer system consists
of layers of interpreters, much like the layers of
an onion.
An interpreter is a processing system that is
driven by instructions and operates upon state
information. The basic interpretive loop, shown
in Figure 2, is most familiar at the machine language level but also exists at several other levels.
To formalize the notion of Levels-of-Interpretation, one can represent a processing system by the diagram in Figure 3.
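
The interpretive loop can be sketched as follows. The four-instruction set (LOAD, ADD, STORE, HALT) is hypothetical and chosen only to keep the example short; it is not the instruction set of any machine discussed here. The loop fetches an instruction, executes it against the state, and repeats.

```python
def interpret(program: list, state: dict) -> dict:
    """Basic interpretive loop: fetch an instruction, execute it, repeat."""
    pc = 0                                   # position of the next instruction
    while True:
        op, arg = program[pc]                # fetch
        pc += 1
        if op == "LOAD":                     # execute
            state["acc"] = arg
        elif op == "ADD":
            state["acc"] += arg
        elif op == "STORE":
            state[arg] = state["acc"]
        elif op == "HALT":
            return state
        else:
            raise ValueError(f"unknown instruction: {op}")

if __name__ == "__main__":
    program = [("LOAD", 2), ("ADD", 3), ("STORE", "x"), ("HALT", None)]
    print(interpret(program, {"acc": 0}))
```

The same loop shape applies at other levels: only the meaning of "instruction" and "state" changes.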
The state information operated on by an interpreter is either internal or external. This can
best be understood by considering the "onion
skin" levels of the five processing systems that
form a typical airline reservation system. These
levels are listed in Table 1.


The Level 0 system is the logic that sequences
the Level 1 micromachine. The Level 1 system is
a microprogrammed processor implemented in
real hardware. It is the machine seen by the
logic designer. The Level 2 system is the central
processing unit (CPU). It is the machine seen by
the machine language programmer. The Level 3
system shown here is a FORTRAN language
processing system. The Level 4 system is an airline reservation system. Four of these five systems form the hierarchy shown in Figure 4,
where each system is an interpreter that sequences through multiple steps in order to perform a single operation for the next level
interpreter. The highest level system, the airline
reservation system, is an interpreter operating
on messages received from outside of the system. It tests and modifies states and generates
messages to send back outside the system, thus
performing a single operation for the outermost
interpreter.
In practice, few systems are levels of pure interpreters, although layers are present. Deviations from the model have occurred for both
hardware and software reasons. In the hardware deviation case, the micromachine shown
in Level 1 is often not present, but rather the
Level 2 central processing unit is implemented
directly using Level 0 sequential controllers.
This practice of skipping Level 1 was initially
due to the lack of adequate read-only memories
but is now generally limited to the case of very
high speed machines such as the Cray 1 and the
Amdahl V6 which cannot tolerate the fetch and
execute cycle times associated with a control
store.

Figure 2. The basic interpretive loop [Levy, 1974].

Figure 3. A processing system [Levy, 1974].

Figure 4. A hierarchy of interpreters [Levy, 1974]. (Level 0, the sequential
machine, is not shown.)

Table 1. Five Levels-of-Interpreters for an Airline Reservation System [Levy, 1974]

Level 4
  Instruction:     Seat allocation request message
  Interpreter:     Airline reservation system
  Internal state:  Number of requests pending at this moment
                   Location of passenger list on a disk file
                   Number of lines connected to system
  External state:  Number of reserved seats on a given flight
                   Airline name for a given flight

Level 3
  Instructions:    FORTRAN statement codes
  Interpreter:     FORTRAN execution system
  Internal state:  Memory management parameters
                   User name
                   Main storage size
                   Location of disk files
                   Interrupt enable bits
                   Expression evaluation stack
                   Dimensions of arrays
  External state:  Subroutine names
                   Values of data in arrays
                   Statement number
                   Program size
                   Value of an expression
                   DO-loop variable value
                   Printed characters on line printer

Level 2
  Instructions:    Machine language instructions
  Interpreter:     Processor
  Internal state:  Program registers
                   Condition codes
                   Program counter
  External state:  Data in main memory
                   Disk controller registers

Level 1
  Instructions:    Microcode
  Interpreter:     Micromachine
  Internal state:  Instruction register
                   Flip-flops holding error status
                   Stack of microprogram subroutine links
  External state:  Program registers
                   Condition codes
                   Program counter

Level 0
  Instructions:    Hardwired combinational network
  Interpreter:     Sequential machine controlling the micromachine
  Internal state:  Clock, counters, etc., controlling micromachine timing
  External state:  Micromachine, console


There are two primary software-driven departures from the pure interpreter model: (1) high-level
languages are usually translated by a compiler rather than executed by an interpreter, and (2) some
layers are bypassed when more ideal primitives
exist at deeper levels. Figure 5 illustrates this
bypassing process. A pure interpreter implementation of FORTRAN would use an object
time system (OTS) for all FORTRAN operations designated C in the figure. The object time
system would require an operating system
(OPSYS) for the interpretation of some of its
operations, and the operating system in turn
would be interpreted by the instruction set interpreter (ISP interpreter). However, the A operations in the figure would be directly
interpreted by the instruction set interpreter.

Figure 5. Levels-of-interpreters with "pipes" that bypass levels. FORTRAN operation C is interpreted by an
OTS function which in turn is interpreted by the operating system which is interpreted by the ISP. FORTRAN
operation A has a pipe directly to the ISP interpreter.
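
The two paths through the layers can be sketched as nested calls. This is a toy model under stated assumptions: the function names and the trace format are hypothetical, and each layer is reduced to a single call so that only the routing is visible. Operation C passes through the object time system and the operating system before reaching the instruction set interpreter; operation A goes to it directly.

```python
def isp_execute(trace: list, what: str) -> None:
    """Lowest layer: stands for execution by the instruction set interpreter."""
    trace.append(f"ISP:{what}")

def opsys_call(trace: list, what: str) -> None:
    """Operating system layer, itself interpreted by the ISP."""
    trace.append(f"OPSYS:{what}")
    isp_execute(trace, what)

def ots_call(trace: list, what: str) -> None:
    """Object time system layer, which relies on the operating system."""
    trace.append(f"OTS:{what}")
    opsys_call(trace, what)

def fortran_operation(kind: str) -> list:
    """Route one FORTRAN operation and return the layers it passed through."""
    trace = []
    if kind == "A":        # pipe straight to the ISP interpreter
        isp_execute(trace, kind)
    elif kind == "C":      # fully layered path
        ots_call(trace, kind)
    else:
        raise ValueError(f"unknown operation kind: {kind}")
    return trace

if __name__ == "__main__":
    print(fortran_operation("A"))
    print(fortran_operation("C"))
```

The length of the returned trace is a crude stand-in for the performance cost of each extra level, which is the tradeoff the bypass is meant to address.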
In the final analysis, the number of levels is
just another tradeoff. Performance considerations lead to the deletion of levels; complexity leads to the addition of levels. Having
presented the pure interpreter model, one can
now return to the Onion-Skin-Layered Model
to better understand how the different layers relate.
The macromachine hardware can be thought
of as a base level interpreter. It is most often
extended upward with an operating system.
There may be several operating system levels so
that the machine can be built up in an orderly
fashion. A kernel machine might manage and
diagnose the hardware components (disks, terminals) and provide synchronizing operations
so that the multiple processes controlling the
physical hardware can operate concurrently.
Next, more complex operations such as the file
system and basic utilities are added, followed by
policy elements such as facilities resource management and accounting. As viewed through
the operating system, one sees a much different
machine than that provided by the basic instruction set architecture. In fact, the resultant
machine is hardly recognizable as the architecture most usually given by a symbolic assembler. It includes the basic machine but has more
capable I/O and often the ability to be shared
by many programs (or tasks).
Operating systems designers believe all these
facilities are necessary in order to implement
the next higher level interpreter - the standard
language. The language level may include interpreters or compilers to translate back to the machine architecture for ALGOL, BASIC,
COBOL, FORTRAN, or any of the other
standard languages and their dialects.

VIEW 3: PACKAGING LEVELS-OF-INTEGRATION

This is a structural view that packages the
various components (hardware and software)
into levels. The levels for DEC computers in
1978 were as follows:

9  Applications
8  Applications components
7  Special languages
6  Standard languages
5  Operating systems
4  Cabinets (to hold complete hardware systems)
3  Boxes
2  Modules (printed circuit boards)
1  Integrated circuits

This view is the most important in the book,
because it shows how computer systems are actually structured and, hence, how their costs are
structured. As a structural view of the object
being sold, however, it is completely a function
of the technology, the organization building the
system, and the marketplace, all of which are
changing so rapidly that the view could better
be titled "Dynamic Levels-of-Integration."
There are three major changes taking place:
1. Changes in the hardware levels, where
   the shrinking in physical size of functions has three effects:
   a. Lower levels subsume higher levels.
   b. The semiconductor component supplier is forced to assume higher and
      higher level design responsibilities.
   c. Levels disappear.
2. Changes in the software levels, again
   with three effects:
   a. Each level grows in size as more
      functionality is added over time.
   b. More levels are added as minicomputers are applied to a broader
      range of applications.
   c. Functions migrate downward from
      level to level.
3. Changes in the hardware/software interface, where software functions migrate
   into hardware for higher performance.

For the first of these areas of change, hardware levels, it is interesting to note that interconnection and packaging now constrain and
limit design more than any other factor, excluding the basic lowest level component (semiconductor) technology.

The constraint caused by the interconnection
and packaging takes place because most manufacturing costs are associated with the physical
structure. As interconnection levels must be introduced to build complex structures, many
usually undesirable side effects occur. Electrical
interconnection requires cables which require
space and interfere with cooling airflow. Long
interconnections increase signal transmission
delays, and these reduce performance. Signal
transmission not only makes the computer susceptible to electromechanical interference but
also may radiate electromagnetic waves that
need to be controlled.
Figure 6 shows the costs of various levels-of-integration versus time for small computers.
The cost depends partly on implementation and
architecture word length. As the word length is
made shorter, there are some savings, particularly for very small computers, because some
levels-of-integration cease to exist. For example, most hand-held calculators are implemented using 4-bit, stored program computers
with fixed programs that occupy a single integrated circuit. There are associated modules,
backplanes, boxes, and cabinets - but all are
contained in a single package that fits in the
hand.
Semiconductors, the lowest level of technology, have had the greatest price decline (Figure 6). Modules have a lesser price decline
because they are a mix of integrated circuits,
printed circuit boards, component insertion labor, and testing labor. The price decline for the
integrated circuit portion of the module cost is
moderated by the labor-intensive nature of
module fabrication, thus producing a price decline for modules that is markedly less than that
for integrated circuits. At the box level-of-integration, power supplies and metal or plastic
boxes are also labor-intensive and further moderate the price decline provided by the integrated circuits. Finally, as boxes are
integrated (by people) and applied at a system
level (by people), the price decline almost disappears.

Figure 6. Machine price for various levels-of-integration versus time.

Many of the cost improvements brought
about by new technology are derivative. They
are by-products of using less power and less
space, thus avoiding the labor-intensive levels
of packaging integration.
An astute marketing-oriented person might
ask, "How, with all the technology, can we do
something unique so that we can maximize the
benefit from the technology without having to
pay so much for labor-intensive items such as
packaging?" One answer: "Reduce prices by
not providing a power supply and mounting
hardware. Let the user provide all added-on
parts and mount the computer as needed. In
this way, the price, though not necessarily the
total cost to the user, is reduced. We'll sell at the
board level." Computer Automation followed
this philosophy when it introduced the Naked
Mini so that users could supply more added
value (packaging and power technology).
A similar effect can be seen in the PDP-11
series since the PDP-11/20's introduction in
1970. At that time, the 4,096-word PDP-11/20
(mounted in a box) sold for $9,300. In 1976, the
boxed version of an LSI-11 cost $1,995, reflecting a factor of 4.7 improvement over the PDP-11/20. The 4,096-word core memory module
used in the PDP-11/20 sold for $3,500, while a
16,384-word metal-oxide semiconductor (MOS)
memory module for an LSI-11 sold for $1,800,
reflecting a factor of 7.8 improvement in price per word.
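
The two improvement factors follow directly from the quoted prices. The short calculation below uses only the figures in the paragraph above; the variable names are chosen here for illustration. The system factor compares box prices, and the memory factor compares price per word.

```python
# Prices and sizes as quoted in the text.
pdp_11_20_price = 9300.0          # boxed 4,096-word PDP-11/20, 1970
lsi_11_price = 1995.0             # boxed LSI-11, 1976

core_price, core_words = 3500.0, 4096      # PDP-11/20 core memory module
mos_price, mos_words = 1800.0, 16384       # LSI-11 MOS memory module

# Improvement in boxed system price.
system_factor = pdp_11_20_price / lsi_11_price

# Improvement in memory price per word.
memory_factor = (core_price / core_words) / (mos_price / mos_words)

if __name__ == "__main__":
    print(f"system price improvement: {system_factor:.1f}")
    print(f"memory price-per-word improvement: {memory_factor:.1f}")
```

Note that the memory comparison is per word: the MOS module costs about half as much as the core module while holding four times as many words.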
The changing levels-of-integration have also
changed the domain of the semiconductor suppliers. In the early 1970s, Intel, North American
Rockwell, and other semiconductor companies
began to use the higher semiconductor densities
to reduce the number of levels-of-integration by
packaging a complete processor-on-a-chip.
These organizations had assimilated logic design, but were frustrated because their customers could really not identify higher functionality
units (beyond memory) requiring on the order
of 1,000 gates on a chip. Also, the speed of these
high density units was quite low.
They discovered that the best finite state machine to make was just a simple computer, because it provided the finite state machine plus
the useful functions that were not covered by
switching circuit theory. It was "simply a small
matter of programming" to do something useful. Whereas programs for these simple computers cost $1 to $100 per instruction to write,
the prices for processors-on-a-chip have followed a very steep decline of up to 50 percent
price reduction per year.
Robert Noyce of Intel developed Figure 7 in
October 1975. It illustrates what has been happening in the semiconductor industry and has
been modified slightly to show the technology
that DEC has assimilated with time. It indicates
the breadth that semiconductor manufacturers
now have in technology, starting from the semiconductor device level, through Noyce's view of
the various levels-of-integration, and continuing into end-user applications.

Figure 7. Semiconductor (Noyce) manufacturer's
levels-of-integration versus time. (Labels in the figure: device design, MSI,
microcomputer, semiconductor supplier tasks, system tasks, system integration,
applications, and computation service; the horizontal axis is year, 1960 to 1980.)
Note: Each change of level of integration has forced
the component supplier to assume additional responsibilities.

The Levels-of-Integration View can be summarized as components of one level being combined into a system at the next highest level in a
hierarchy. A level denotes a single conceptual
design discipline or set of interacting disciplines
which determine the function, structure, performance, and cost of the constituent level.
"Level" is a deceptive word, because as Figure
8 shows, the structure is actually a lattice, or
network, style of hierarchy rather than the classical tree style of hierarchy. In Figure 8 various
standard languages can be used on any of several different hardware/software systems,
which in turn can be implemented on several
different processors. Each processor is available
in several different boxes.

Figure 8. A computer system is a network,
not just a tree-structured hierarchy of
eight distinct levels.

VIEW 4: A MARKETPLACE VIEW OF
COMPUTER CLASSES

Because it is the complete marketplace process that produces the computer, this view is the
most complex. In terms of marketability, a
computer can be characterized as a function of
price, performance, and time of introduction in
what might appear to be a commodity-like environment.
Because various computers operate at different performance rates and at various costs,
computation can be purchased in multiple
ways, and price/performance ratios will thus affect marketability. For example, computation
can be supplied by a shared large, central batch
computer; each organizational entity can own
and operate a shared minicomputer; an individual can operate a single desk-top system; or
each individual can operate a programmable
calculator.
The price/performance ratio is not the sole
factor determining marketability, however.
Program compatibility with previous machines
is important. Compatibility considerations are
based on the economic necessity of using a common software base. The computer user's investment in software dwarfs that of the computer
manufacturer, if the machine is successful. For
example, if there is only one man-year of software investment associated with each of the
50,000 PDP-11s, and each man-year costs about
$40,000 and produces something on the order
of 5,000 instructions, there is then a cumulative
investment of $2 billion and 250 million lines of
program for the PDP-11. This investment is
roughly the same scale as the original hardware
cost.
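
The software investment estimate is a simple product of the figures given above. The sketch below restates it; the variable names are illustrative, and the inputs are the text's own assumptions (one man-year per machine, $40,000 per man-year, 5,000 instructions per man-year).

```python
machines = 50_000                     # PDP-11s in use
man_years_per_machine = 1             # assumed software effort per machine
cost_per_man_year = 40_000            # dollars
instructions_per_man_year = 5_000

total_man_years = machines * man_years_per_machine
software_investment = total_man_years * cost_per_man_year            # dollars
program_size = total_man_years * instructions_per_man_year           # instructions

if __name__ == "__main__":
    print(f"investment: ${software_investment:,}")
    print(f"program base: {program_size:,} instructions")
```

Because both totals scale linearly with the man-years per machine, a more realistic figure of several man-years per installation would raise the investment proportionally.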
Thus, while rapidly evolving technology permits new designs to be more cost-effective - even radical - in a price/performance sense,
there must be backward (in time) compatibility
in order to build on and preserve the user's program base. The user must be able to operate
programs unchanged to take advantage of the
improvements brought about by technology
changes.
In a similar way, compatibility over a range
of machines at a given time allows a user to select a machine that matches his problem set
while having the comfort that the problems can
change and there will be a sufficiently large or
small machine available to solve the new problems.
For these reasons, nearly all modern computer designs are part of a compatible computer
family which extends over price and time. Technology provides basic improvements with each
new generation at approximately six-year intervals, and most new designs usually provide increased performance at constant price.

The influence of technology on the computers that are built and taken to the marketplace is so strong that the four generations of
computers have been named after the technology of their components: vacuum-tubes,
transistors, integrated circuits (multiple transistors packaged together), and large-scale integrated (LSI) circuits.
Each electronic technology has its own set of
characteristics, including cost, speed, heat dissipation, packing density, and reliability, all of
which the designer must balance. These factors
combine to limit the applicability of any one
technology; typically, one technology is used
until either a limit is reached or another technology supersedes it.
Design Alternatives

When an improved basic technology becomes
available to a computer designer, there are four
paths the designs can take to incorporate the
technology:
1. Use the newer technology to build a
   cheaper system with the same performance.
2. Hold the price constant and use the technological improvement to get an increase in performance.
3. Push the design to the limits of the new
   technology, thereby increasing both performance and price.
4. Find a drastically new structure using
   the computer as a basic archetype (e.g.,
   calculators) such that the design can be
   considered off the evolutionary path.

Figure 9 shows the trajectory of the first three
design alternatives. In general, the design alternatives occur in an evolutionary fashion as in
Figure 10 with a first (base) design, and subsequent designs evolving from the base.

Figure 9. Three design styles on the
evolutionary path.

In the first design style, the performance is
held constant, and the improved technology is
used to build lower price machines which attract new applications. This design style has as
its most important consequence the concept of
the minimal computer. The minimal computer
has traditionally been the vehicle for entering
new applications, since it is the smallest computer that can be constructed with a given technology. Each year, as the price of the minimal
computer declines, new applications become
economically feasible.
The second, constant cost alternative uses the
improved technology to get better performance
at a constant price and will usually yield the
best increase in total system cost and effectiveness, for reasons which will be discussed
shortly.
The third alternative is to use the new technology to build the most powerful machine possible. New designs using this alternative often
solve previously unsolved problems and, in
doing so, advance the state-of-the-art. This design alternative must be used cautiously, however, because going too far in price or
performance (i.e., building beyond the technology) is dangerous and can lead to a zero performance, high-cost product. There are usually
two motivations for operating at this leading
edge: preliminary research motivated by the
knowledge that the technology will catch up;
and national defense, where an essentially infinite amount of money is available because the
benefit - avoiding annihilation - is infinite.

Figure 10. Evolution from the base design B.

Table 2 shows the effect of pursuing the two
design strategies of: (1) constant performance at
decreased price, and (2) constant price at increased performance. The first column gives the
base case at a given time t. Because this is the
base case, the price, performance, and
price/performance ratio of the computer are all
1. As the computer is applied to a particular environment, operational overhead is added at a
cost of 2 to 4 times the original cost of the computer; the total cost to operate the computer becomes 3 to 5 times higher, and the
performance/total cost ratio is reduced to between 0.33 and 0.2 (depending on the total
cost).
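The arithmetic behind this base case, and behind the other columns of Table 2, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `column` and its parameters are assumptions introduced here, not notation from the text. The numbers are the ones quoted above (a price and performance of 1, and operating costs of 2 to 4).

```python
def column(price, performance, operating_costs):
    """Return total-cost and performance/total-cost ranges for one design.

    operating_costs is a (low, high) pair expressed in units of the
    base-case computer price, as in the text.
    """
    low, high = operating_costs
    total_cost = (price + low, price + high)
    # Performance per unit of total cost, at the low and high overhead.
    perf_per_total_cost = tuple(performance / c for c in total_cost)
    return total_cost, perf_per_total_cost


# Base case at time t: price = performance = 1, overhead of 2 to 4.
base_total, base_ratio = column(price=1, performance=1, operating_costs=(2, 4))
print(base_total)                          # (3, 5)
print([round(r, 2) for r in base_ratio])   # [0.33, 0.2]
```

Calling the same function with a performance of 2 (constant price) or a price of 0.5 (constant performance) gives the corresponding ranges for a design at time t + 1.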
Now assume the same operating environment, with the same fixed (overhead) costs to
operate, at a new time t + 1, when technology
has improved by a factor of 2. Two alternative
designs are carried out; one is at constant
price/higher performance, and the other is at
constant performance/lower price (columns 2
and 3). The application is constant in three
cases (columns 1-3), and a new application is
discovered for the fourth case (column 4). Both
the constant-cost and constant-performance designs give the same basic performance/cost improvement - when only the cost of the
computer is considered. However, when one
considers the high fixed overhead costs associated with the application (columns 1-3), there is
a relatively small improvement in performance/cost, although there has been a cost savings of 17 to 10 percent. The greatest gains
come in applying the computer with greater
performance and getting the attendant factor of
2 gain in performance and in price/performance ratio.
To summarize, the constant price/increased
performance design style gives a better gain because operating costs remain the same. Its gain
can only be equalled by the constant-performance design style when operating costs are
halved upon its application. This only occurs
when a new application is tackled, such as that
shown in column 4.

Table 2. Using New Technology for Constant Price and Constant Performance Designs

Column                              1           2                 3                 4
Introduction (generation) time      t           t + 1             t + 1             t + 1
Design style                        Base case   Constant price/   Constant          Constant
                                                increased         performance/      performance/
                                                performance       decreased price   decreased price
Application                         Base        Base              Base              New base
Computer price                      1           1                 0.5               0.5
Performance (and improvement)       1           2                 1                 1
Operating costs (range)             2-4         2-4               2-4               1-2
Total cost                          3-5         3-5               2.5-4.5           1.5-2.5
Improvement (in total cost)                                       0.83-0.9          0.5
Performance/price (computer only
  and improvement)                  1           2                 2                 2
Performance/total cost              0.33-0.2    0.66-0.4          0.4-0.22          0.66-0.4
Improvement (in performance/
  total cost)                                   2                 1.21-1.1          2

Computer Classes

Applying the three design styles shown in
Figure 9 over several generations produces the
plot given in Figure 11. These figures lead to
one of the most interesting results of the Marketplace View, which is that computer classes
can be distinguished by price and named as follows: submicro (to come in the next generation, say by 1980), micro, mini, midi, maxi, and super.
The classes midi and maxi are sometimes referred to by the single, nondescriptive name,
mainframe.
When one distinguishes computer classes by
price, a new range of price can be made possible
by new technology and create a new class. The
new class appears at the low end of the price
scale where the minimal computer is introduced
at a significantly lower price level than existing
computers.

Figure 11. Price versus time for each machine class. (Horizontal axis: time, from t - 3 through t + 2.)

The measure used to define a new class is
price, whereas the measure defining an established class is performance. This is because once
a new class has become established in the marketplace, the users become familiar with what
computers of that class can do for their applications and tend to characterize that class on a
performance basis. The characterization of existing classes on a performance basis is important to this discussion because at each new
technology time, performance increases by one
category, and midi performance becomes available on a mini, for example.

The effect of technology upon computer classes can be summarized in the following thesis:

Continual application of technology via
the two major design styles results in: (1)
price declines creating new classes of
computers, (2) new classes becoming established classes, and (3) established
classes being encroached upon.

Some question may arise as to how much of a
price reduction is necessary to create a new
class. The continuity implied by the thesis is deceptive in that it suggests that new classes come
about by the continual application of the constant performance/decreasing cost style of design. Viewing the industry as a whole, this is
true. However, a new class is usually not created by the same organization that is designing
computers in existing classes. A new company,
or new organization within a company, is usually required to provide the requisite fresh viewpoint needed to create a new class. It is the fresh
viewpoint and not some arbitrary amount of
price reduction that creates a new class.
For both the minicomputer and microcomputer, a fresh organization broke out. A
fresh viewpoint was needed because existing organizations, like most human organizations, act
to preserve the status quo, and adopt the increased performance/constant price design alternative for the existing customer base, as
indicated by the analysis given in the discussion
of Table 2. A new organization with a fresh
viewpoint goes after new applications and new
customers with a new minimal computer that
establishes a new class.
As a by-product of the use of new technology, conflicts occur within the established
computer classes. An established computer
class, which is defined on the basis of performance, is encroached upon by constant
cost/higher performance successors from the
class below it. Moreover, suppliers within a
class are, by their dominant constant
price/higher performance evolution, operating
to move up out of their class.
While movement by computer designs and
computer suppliers between and among the various classes may be encouraged by price and
performance trends, the speed with which that
movement occurs is moderated by the software
compatibility considerations discussed earlier.
The computer class thesis is not meant to imply
that each class implements the same instruction
set processor and processor-memory-switch
configurations with the only difference being
speed. Rather, much specialization occurs in
each class, and many of the attributes of the
higher performance machines appear to a substantially lesser degree in the lower performance
classes. For example, there are more data-types
in the larger machines, their address spaces
(both physical and virtual) are larger, and the
software support is generally broader. Resources devoted to increasing reliability and
availability are more common in the higher
priced machines. The PDP-11 Family, from the
LSI-11 up to the VAX-11/780, exemplifies
these functionality differences.

Definition of the Minicomputer

The concept of computer classes that can be
distinguished by price and named submicro, micro, mini, midi, maxi, and super may be of assistance in finding a definition for the
minicomputer, a definition which has thus far
been rather elusive. While the classes suggest
that minicomputers are those computers whose
prices fall between microcomputers and midicomputers, and thus somewhere near the
middle of the range of computers available, earlier definitions [Bell and Newell, 1971a] use the
term mini to denote minimal.
The Marketplace View defines new computer
classes according to price and established computer classes according to performance. This
would suggest that a definition of the minicomputer should include some historical data
on price and some comments on performance,
or at least some indication of performance by a
discussion of applications and configurations.
In 1977 Gordon Bell provided such a hybrid
definition for the Director of Computer Resources, U. S. Air Force. The definition was as
follows:
MINICOMPUTER: A computer
originating in the early 1960s and predicated on being the lowest (minimum)
priced computer built with current technology. From this origin, at prices ranging from 50 to 100 thousand dollars, the
computer has evolved both at a price reduction rate of 20 percent per year and
has also evolved to have increased functionality and a slightly higher price with
increasing functionality and performance.
Minicomputers are integrated into
systems requiring direct human and process interaction on a dedicated basis (versus being configured with a structure to
solve a wide set of problems on a highly
general basis).
Minicomputers are produced and distributed in a variety of ways and levels-of-integration from: printed circuit
boards containing the electronics; to
boxes which hold the processor, primary
memory, and interfaces to other equipment; to complete systems with peripherals oriented to solving a particular
application(s) problem. The price
range(s) for the above levels-of-integration, in 1978, are roughly: 500 to
2,000; 2,000 to 50,000; and 5,000 to
250,000.
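
The 20 percent annual price reduction quoted in the definition compounds quickly. As a rough illustration, the sketch below assumes the decline is applied once per year as a constant multiplier of 0.8; that compounding model and the function names are assumptions made here, not part of the definition.

```python
import math


def price_after(initial_price, years, annual_decline=0.20):
    """Price after `years` of a constant fractional decline per year."""
    return initial_price * (1.0 - annual_decline) ** years


def years_to_reach(initial_price, target_price, annual_decline=0.20):
    """Years of compounded decline needed to fall from initial to target."""
    return math.log(target_price / initial_price) / math.log(1.0 - annual_decline)


# Years for the price of the minimal computer to halve at 20 percent per year.
print(round(years_to_reach(1.0, 0.5), 1))   # 3.1
# Fraction of the original price remaining after 10 years.
print(round(price_after(1.0, 10), 3))       # 0.107
```

Under this assumption the price of the minimal computer halves roughly every three years, which is consistent with the earlier observation that new applications become economically feasible each year.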

This discussion of the Marketplace View has
been a qualitative explanation of the effect of
technology on the computer industry. It is an
engineering view, rather than one that would be
given by technology historians or economists.
The 20 years described in this book and the individual cost and performance measures surely
invite analysis by professionals. The studies reported in Phister [1976] and Sharpe [1969] are a
good departure point.

VIEW 5: AN APPLICATIONS/FUNCTIONAL VIEW OF COMPUTER CLASSES

Because of the general purpose nature of
computers, all of the functional specialization
occurs at the time of programming rather than
at the time of design. As a result, there is remarkably little shaping of computer structure
to fit the function to be performed.
The shaping that does take place uses four
primary techniques.

1. PMS level configuration. A configuration is chosen to match the function to be performed. The user (designer)
   chooses the amount of primary memory,
   the number and types of secondary
   memory, the types of switches, and the
   number and types of transducers to suit
   his particular application.
2. Physical packaging. Special environmental packaging is used to specialize a computer system for certain environments,
   such as factory floor, submarine, or
   aerospace applications.
3. Data-type emphasis. Computers are designed with data-types (and operations
   to match) that are appropriate to their
   tasks. Some emphasize floating-point
   arithmetic, others string handling. Special-purpose processors, such as Fast
   Fourier Transform processors, belong in
   this category also.
4. Operating system. The generality of the
   computer is used to program operating
   systems that emphasize batch, time sharing, real-time, or transaction processing
   needs.

Current Dimensions of Use

In the early days of computers, there were
just two classifications of computer use: scientific and commercial. By the early 1970s, computer use had diversified to seven different
functional segmentations: scientific, business,
control, communication, file control, terminal,
and timesharing. Since that time, very little has
changed in terms of functional characterization,
but two points are worthy of mention. First, file
control computers still have not materialized as
mainstream separate functional entities, despite
isolated cases such as the IBM 3850 Mass Storage System; second, terminal computers have
evolved to a much higher degree than expected.
The high degree of evolution in terminals has
been due to the use of microprocessors as control elements, thus providing every terminal
with a stored program computer. Given this
generality, it has been simple to provide the terminal user with facilities to write programs. In
turn, this phenomenon has affected the evolution of timesharing (when using the term to denote close man-machine interaction as opposed
to shared use of an expensive resource).
Functional segmentation into categories with
labels such as business, control, communication,
and file control reflects a naming convention
rooted in the old two-category scientific/commercial tradition. An alternative classification, more useful today, is the
segmentation scheme shown in Table 3. It is
based on the intellectual disciplines and environment (e.g., home based) that use and develop the computer systems. It shows the
evolving structures in each of the disciplines,
permitting one to see that nearly all the environments evolve to provide some form of direct,
interactive use in a multiprogrammed environment. The structures that interconnect to mechanical processes are predominately for
manufacturing control. Other environments,
such as transportation, are also basically real-time control. Another feature of discipline-based functional segmentation is that each of
the disciplines operates on different symbols.
For example, commercial (or financial) environments hold records of identifier names for
entities (e.g., part number) and numbers which
are values for the entity (e.g., cost, number in
inventory).


Table 3. Discipline/Environment-Based Functional Segmentation Scheme

Commercial environment
• Financial control for industry, retail/wholesale, and distribution
• Billing, inventory, payroll, accounts receivable/payable
• Records storage and processing
• Traditional batch data entry
• Transaction processing against data base
• Business analysis (includes calculators)*

Scientific, engineering, and design
• Numbers, algorithms, symbols, text, graphs, storage, and processing
• Traditional batch computation*
• Data acquisition
• Interactive problem solving*
• Real time (includes calculators and text processing)
• Signal and image processing*
• Data base (notebooks and records)

Manufacturing
• Record storage and processing
• Batch*
• Data logging and alarm checking
• Continuous real-time control
• Discrete real-time control
• Machine based
• People/parts flow

Communications and publishing
• Message switching
• Front-end processing
• Store and forward networks
• Speech input/output
• Terminals and systems
• Word processing, including computer conferencing and publishing

Transportation systems
• Network flow control
• On-board control

Education
• Computer-assisted instruction
• Algorithms, symbols, text storage, and processing
• Drill and practice
• Library storage

Home using television set
• Entertainment, record keeping, instruction, data base access

* Implies continuous program development

The scientific, engineering, and design disciplines use various algorithms for deriving
symbols or evaluating values. Texts, graphs,
and diagrams, the major ways of representing
objects, have to be processed. For these environments, the computer has changed from a
calculator (it was initially funded to do trajectory calculations for ballistic weapons) to a
sophisticated notebook for keeping specifications, designs, and scientific records. Whereas
the minicomputer was initially only used as a
transducer to collect data to be analyzed on
larger machines, it has since evolved to direct
recording and analysis of time-varying signals
and images and even to direct analysis and control. With minicomputers taking on such additional capabilities, connections to larger
computers are used solely in a network fashion
to handle graphic display and control functions.
The function of computers in both the manufacturing and the commercial environments has
evolved from simple record keeping to direct
on-line human control.
Process control computers have evolved from
their initial use of assisting human operators
(controllers) with data logging and alarm condition monitoring to full control of processes with
either human or secondary computer backup.
The structure of the computer and the control
task vary widely depending on whether the process is continuous (e.g., refinery, rolling mill) or
discrete (e.g., warehouse, automotive, appliance
manufacturing).
Transportation applications for aircraft,
trains, and eventually automotive vehicles are
forms of real-time control that use both discrete
and continuous control. Control is carried out
in two parts: on board the vehicle and in the
network (airspace, highway) that carries the vehicles. The transportation control function dictates three unique characteristics for the
computer structure:

1. Very high reliability. Society has placed
   such a high value on a single human life
   that no computer in this environment
   may appreciably raise the likelihood
   of a fatality.
2. Very small size for on-board computers.
3. Extreme operating and storage temperature range for on-board computers - especially for automotive vehicles.

Communications and message-based computers have evolved from telephone switching
control, message switching, and front ends to
other computers to become the dominant part
of communications systems. With these evolving systems, the communications links have
changed from analog-based transmission to
sampled-data, digital transmission. By using
digital transmission, data and voice (and video)
can ultimately be used in the same system.
Word processing (i.e., creation, editing, and
reproduction) together with long term storage
and retrieval and transmission to other sites
(i.e., electronic mail) have evolved from several
systems:

1. Conventional teletypewriter messages
   and torn-tape message switching (e.g.,
   TWX, Western Union, Telex).
2. Terminals with local storage and editing
   (e.g., Flexowriters, Teletype (with paper
   tape reader and punch), magnetic card/
   magnetic tape automatic typewriters,
   and the evolving stand-alone word processing terminals for office use).
3. Large, shared text preparation systems
   for centralized documentation preparation, newspaper publication, etc.
4. Large systems with central filing and
   transmission (distribution). These will
   negate the need for substantial hard
   copy. With these systems, text can be
   prepared either centrally with the system
   or with local intelligent word processing
   systems.
5. Computer conferencing. People can sit
   at terminals and converse with others
   without leaving their office.

The education-based environment implies a
system which is a combination of transaction
processing (for the human interaction part), scientific computation as the computer is required
to simulate real world conditions (i.e., physical/natural phenomena), and information retrieval from a data base. These systems are
evolving from the simple drill-and-practice systems which use a small simple algorithm,
through simulation of particular real world
phenomena, to knowledge-based systems which
have a limited, but useful, natural language
communications capability.
Home-based computers are beginning to
emerge. The dominant use to date is in providing entertainment in the form of games that
model simple, real world phenomena, such as
ping-pong. Appliances are beginning to have
embedded computers that have particular
knowledge of their environments. For example,
computer-controlled ranges can cook in fairly
standard ways. Alternatively, cooking can be
controlled by embedded temperature sensors.
Simple calculators to record checkbooks have
existed for quite some time. These will soon
evolve to provide written transactions for recording and control purposes. Many domestic
activities are in essence scaled-down versions of
commercial, scientific, educational, and message environments.
With the evolution of each computer class,
one can see several cases of machine structures
which begin as highly specialized and evolve to
being quite general. This evolution is driven by
applications in accordance with the Applications/Functional View of Computer Classes.
The applications-driven evolution toward
generality applies to both hardware and software. As a hardware example, consider the case
of computer installations using large, highly
general computers, where minicomputers are
applied to offload the large computers. The first
application of the minicomputer is thus on a
well-defined problem, but then more problems
are added, and the minicomputer system is soon
performing as a general computation facility
with the help of a general purpose operating
system. A similar effect occurs in software,
where operating systems take on multiple functions as they evolve with time because users
specify additional needs, and operating systems
designers like to add function. Thus, a COBOL
run-time environment might be added to a
simple FORTRAN-based real-time operating
system. At the next stage, a comprehensive file
system might be added. In the hardware system,
the next step in the evolution is usually offloading the minicomputer; in the software case, the
next step is often the development of a new
small, simple, and fast operating system.
Part of this evolution is due to the inherent
generality of a computer, and part is a consequence of constant-cost design philosophy.
The evolution is observable in computers of all
classes, including calculators. The early scientific calculators evolved from just having logs,
exponentials, and transcendental functions to
include statistical analysis, curve fitting, vectors, and matrices.
Machines, then, evolve to carry out more and
more functions. Since a prime discriminant is
data-type, Figure 12 is presented to show an estimate of data-type usage for various applications, using mostly high level data-types, e.g.,
process descriptions. The estimates shown are
very rough, because attempts to measure such
distributions to date have not shown marked
differences across applications (except for numerical versus non-numerical) because the
data-types have not been of a sufficiently high
level.

VIEW 6: THE PRACTICE OF DESIGN

Whereas previous views emphasized the object being designed, this is a view of the design
process which gives rise to the object. Two
models of design, those of Asimow and Simon,
are presented, followed by some remarks on
factors that particularly influence computer design.

Figure 12. Data-type usage by application. (Panels: numerical computation, word processing, communications, real-time process control, transaction processing.)

In Introduction to Design [1962], Asimow
gives a general perspective of engineering design
and how the formal alternative generators and
evaluating procedures are used. He also indicates where these formalisms break down and
where they do not apply. He defines engineering
design as an activity directed toward fulfilling
human needs, based on the technology of our
culture.
Asimow distinguishes two types of design:
design by evolution and design by innovation.

Figure 13. Philosophy of design. The feedback becomes operable when a solution is judged to be inadequate and requires improvement. The dotted
elements represent a particular application [Asimow, 1962:5].

While there are examples of both in this book,
design by evolution predominates both in this
book and in the computer industry. Asimow's
first diagram (Figure 13), called Philosophy of
Design, shows the basic design process. Asimow lists the following principles [Asimow,
1962: 5-6].
1. Need. Design must be a response to individual or social needs which can be satisfied by the technological factors of
   culture.
2. Physical realizability. The object of a design is a material good or service which
   must be physically realizable.
3. Economic worthwhileness. The good or
   service, described by a design, must have
   a utility to the consumer that equals or
   exceeds the sum of the proper costs of
   making it available to him.
4. Financial feasibility. The operations of
   designing, producing, and distributing
   the good must be financially supportable.
5. Optimality. The choice of a design concept must be optimal among the available alternatives; the selection of a
   manifestation of the chosen design concept must be optimal among all permissible manifestations.
6. Design criterion. Optimality must be established relative to a design criterion
   which represents the designer's compromise among possibly conflicting
   value judgments that include those of the
   consumer, the producer, the distributor,
   and his own.
7. Morphology. Design is a progression
   from the abstract to the concrete. (This
   gives a vertical structure to a design project.)
8. Design process. Design is an iterative
   problem-solving process. (This gives a
   horizontal structure to each design step.)
9. Subproblems. In attending to the solution of a design problem, there is uncovered a substratum of subproblems; the
   solution of the original problem is dependent on the solution of the subproblem.
10. Reduction of uncertainty. Design is a processing of information that results in a
    transition from uncertainty about the
    success or failure of a design toward certainty.
11. Economic worth of evidence. Information
    and its processing has a cost which must
    be balanced by the worth of the evidence
    bearing on the success or failure of the
    design.
12. Bases for decision. A design project (or
    subproject) is terminated whenever
    confidence in its failure is sufficient to
    warrant its abandonment, or is continued when confidence in an available design solution is high enough to warrant
    the commitment of resources necessary
    for the next phase.
13. Minimum commitment. In the solution of
    a design problem at any stage of the process, commitments which will fix future
    design decisions must not be made beyond what is necessary to execute the immediate solution. This will allow the
    maximum freedom in finding solutions
    to subproblems at the lower levels of design.
14. Communication. A design is a description of an object and a prescription for
    its production; therefore, it will have existence to the extent that it is expressed
    in the available modes of communication.

Asimow goes on to define the phases of a
complete project.

1. Feasibility study. The purpose is to determine some useful solutions to the design
   problem. It also allows the problem to
   be fully defined and tests whether the
   original need which initiated the process
   can be realized. Here the general design
   principles are formulated and tested.
2. Preliminary design. This is the sifting,
   from all possible alternatives, to find a
   useful alternative on which the detailed
   design is based.
3. Detailed design. This furnishes the engineering description of a tested and producible design.

While the above are the primary design
phases, there are four succeeding phases resulting from the need for production and consumption by the outside world.

4. Planning the production process. This is
   really another design process which is
   simply a special case of design. The goal
   is to design and build the system that will
   produce the object.
5. Planning for distribution. This activity includes all aspects related to sales, shipping, warehousing, promotion, and
   display of the product.
6. Planning for consumption. This includes
   maintenance, reliability, safety, use, aesthetics, operational economy, and the
   base for enhancements to extend the
   product life.
7. Retirement of the product.

Obviously all of these activities overlap one
another in time and interact as the basic design
is carried out. Phister [1976] posits a model of
this process (Figures 14 and 15) and gives the
amount of time spent in each activity (Figure
16) for a hardware product.

Figure 14. Hardware product development schedule I: comprehensive view [Phister, 1976]. (Activities shown against time in years: technology development, technology specification, planning, product specification, manufacturing, marketing.)

Figure 15. Hardware product development schedule II: development organization details [Phister, 1976]. (Activities shown against time in months: product specification, project plan, development, manufacturing, purchasing.)

Figure 16. Hardware development costs for developing a $50,000 processor in 1974 [Phister, 1976]. (Horizontal axis: elapsed time from start of project, in months. Note: excludes 40 man-months of technology engineering to develop ten plug-in modules.)

Simon uses a more abstract model of design
for human problem solving, which he calls generate and test. In The Sciences of the Artificial,
Simon [1969] discusses the science of design and
breaks the problem into representing the design
problem alternatives, searching (i.e., generating
alternatives), and computing the optimum.
When it is too expensive to search for the optimum, as is often the case, satisfactory alternatives (which Simon calls satisficing alternatives)
must be selected and tested. For most parts of
computer design, the design variables are selected on the basis of satisfactory rather than
optimal choice. Simon also discusses the tools
of design, including the use of simulation both
as an alternative to building the complete system and as a method to evaluate the behavior of
various alternatives.
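
The generate and test model, with a satisficing rather than an optimizing stopping rule, can be sketched as follows. The candidate generator, the cost and performance models, and the acceptance thresholds are all hypothetical placeholders chosen for illustration; they do not come from Simon or from any design described in this book.

```python
from itertools import product


def generate_candidates():
    """Generate design alternatives (hypothetical two-variable design space)."""
    word_lengths = (8, 12, 16, 32)   # bits
    memory_sizes = (4, 8, 16, 32)    # thousands of words
    return product(word_lengths, memory_sizes)


def evaluate(design):
    """Toy cost and performance models; stand-ins for simulation results."""
    word_length, memory_size = design
    cost = 2.0 * word_length + 1.5 * memory_size
    performance = word_length * memory_size ** 0.5
    return cost, performance


def satisfice(max_cost, min_performance):
    """Return the first alternative that is good enough, or None.

    The search stops at the first satisfactory design instead of
    examining every alternative to find the optimum.
    """
    for design in generate_candidates():
        cost, performance = evaluate(design)
        if cost <= max_cost and performance >= min_performance:
            return design
    return None


print(satisfice(max_cost=60.0, min_performance=40.0))   # (12, 16)
```

The design returned is merely the first acceptable one in generation order. A different ordering of the generator could return a different, equally satisfactory design, which is the sense in which the choice is satisfactory rather than optimal.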
In addition to his contribution of the generate and test design model to the Practice of Design View, Simon's work has also contributed
indirectly to the first three views discussed earlier in the chapter. In his discussion of the importance of the design hierarchy, Simon
introduced the notion of architecture of complexity.
In the search for design optima, whether it be
by generate and test or some other algorithm,
the problem of design representation is often
encountered. The more representations one has,
the larger the number of design problems that
can be tackled and, hence, the closer one can get
to a global optimum. Most disciplines have at
least two representations: schematic and visual.
In chemical engineering, heat balance is obtained by thermodynamic equations, not from a
plant piping diagram. In the design of power
supplies, transformer design is accomplished
using equivalent circuits, not by using physical
representations. In the design of computer
buses, most designers work with timing diagrams, although state diagrams and Petri nets
are alternative representations.
In general, the importance of alternative representations in computer engineering is not well
understood. The large number of representations that exist at the programming level is deceptive. There are many different algorithmic
languages, but they differ mostly in syntax, not
in semantics.
It is too simplistic to think that computer design should be a well-defined activity in which
mathematical programming can be employed to
obtain optimum solutions. There are major
problems, five of which are listed below:
1. The cost function is multivariable.
2. The primary measure, performance, is
   not well understood.
3. The objective function that relates cost
   and performance is not understood.
4. Objectives are not as objective as they
   look.
5. There is a dynamic aspect (because the
   technology changes rapidly) which is
   hard to quantify.
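
The first and third problems can be made concrete with a small sketch. With more than one objective there is generally no single best design, only a set of designs that no other design beats on every objective at once. The candidate names and their (cost, performance) figures below are invented purely for illustration.

```python
def dominates(a, b):
    """True if design a is at least as good as b on both objectives
    (lower cost, higher performance) and strictly better on one."""
    cost_a, perf_a = a
    cost_b, perf_b = b
    no_worse = cost_a <= cost_b and perf_a >= perf_b
    strictly_better = cost_a < cost_b or perf_a > perf_b
    return no_worse and strictly_better


def non_dominated(designs):
    """Return the designs that no other design dominates."""
    return {
        name: point
        for name, point in designs.items()
        if not any(dominates(other, point) for other in designs.values())
    }


# Hypothetical (cost, performance) pairs in arbitrary units.
candidates = {"A": (1.0, 1.0), "B": (2.0, 3.0), "C": (2.5, 2.0), "D": (4.0, 4.0)}
print(sorted(non_dominated(candidates)))   # ['A', 'B', 'D']
```

Only design C can be rejected outright. Choosing among A, B, and D still requires a judgment about how much performance is worth a given cost, and mathematical programming cannot supply that judgment.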

These problems are explored in the following
extract from a discussion of design given in Bell
et al. [1972a:23-24].
Objectives can often be stated as maximizing or minimizing some measure on
a system. A system should be as reliable
as possible, as cheap as possible, as small
as possible, as fast as possible, as general
as possible, as simple as possible, as easy
to construct and debug as possible, as
easy to maintain as possible - and so on,
if there are any system virtues that have
been left out.
There are two deficiencies with such
an enumeration. First, one cannot, in
general, maximize all these aspects at
once. The fastest system is not the
cheapest system. Neither is it the most
reliable. The most general system is not
the simplest. The easiest to construct is
not the smallest, and so on. Thus, the
objectives for a system must be traded
off against each other. More of one is
less of another and one must decide
which of all these desirables one wants
most and to what degree.
The second deficiency is that each of
these objectives is not so objective as it
looks. Each must be measured, and for
complex systems there is no single satisfactory measurement. Even for something as standardized as costs there are
difficulties. Is it the cost of the materials
- the components? Does one use a listed
retail cost or a negotiated cost based on
volume order? What about the cost of
assembly? And should this be measured
for the first item to be built, or for subsequent items if there are to be several?
What about the costs of design? That is
particularly tricky, since the act of designing to minimize costs itself costs
money. What about cost measured in
the time to produce the equipment?
What about the cost of revising the design if it isn't right; this is a cost that may
or may not occur. How does one assign
overhead or indirect costs? And so on.
In a completely particular situation one
can imagine an omniscient designer
knowing exactly which of these costs
count and being able to put dollar figures on each to reduce them all to a common denominator. In fact, no one
knows that much about the world they
live in and what they care about.
The dilemma is real: there is no reducing the evaluation of performance in the
world to a few simple numbers. The solution is to understand what systems objectives are: they are guides to
understanding and assessing system behavior in various partial aspects. Various measures for each type of objective
are developed, and each shows something useful. Since all measures are partial and approximate (even
conceptually), rough and ready measures that are easy to make, display and
understand are often to be preferred to
more exact and complex measures.
Standard measures are to be developed
and used, even if not perfect. Experience
with how a measure behaves on many
systems is often to be preferred to a better, but unique, measure with which no
experience exists.

Although this book does not systematically
treat all the different system measures, many of
them are illustrated throughout the book. Table
4 provides a guideline, listing in one place the
components that contribute to overall cost and
performance.

Table 4. Cost and Performance Components for a System [Bell et al., 1972a:24]

Cost Components
  Arising from the design effort
    • Specifying
    • Designing (drawing, checking, verifying)
    • Prototyping
    • Packaging design
    • Describing (documenting)
    • Production system design
    • Standardizing
  Arising from production
    • Buying (parts)
    • Assembling
    • Inspecting
    • Testing
  Arising from selling and distribution
    • Understanding
    • Configuring (i.e., user designing)
    • Purchasing
    • Applying
    • Operating in the environment (heat, humidity, vibration, color, power, space)
    • Repairing
    • Remodeling
    • Redesigning
    • Retiring

Performance Components
  Arising from designing, producing, and selling environment
    • For a single task
    • For a set of tasks
        operation times
        operation rate
        memory size and utilization
    • Reliability, availability, maintainability, and error rate
        mean time between failures (MTBF)
        availability (percent)
        mean time to repair (MTTR)
        error rate (detected, undetected)

The following list points out some tradeoffs,
taken from experience, among the various activities.
System Cost Versus Component Cost. DEC sells products at each of the packaging
levels-of-integration - from chips to turnkey application systems. Because each product is
constructed from lower packaged levels, and because the levels model (View 3: Packaging
Levels-of-Integration) strictly applies, it is very difficult to have designs that are optimally
competitive at every level. For example, if DEC sold just hardware systems (cabinet level) it
would not need a boxed version of its central processors. The box level could then be deleted and
the price of the systems product would be proportionately lower. When primitives are to be
used as building blocks, there is a cost associated with providing generality. For example,
some boxes have too much power for most of their final applications because the powering
was designed for the worst possible configuration of modules within the box. (Some
boxes have too little power because increased logic density was accompanied by increased
power density, permitting new worst-case configurations in existing boxes.)
Initial Sales Price Versus User Life Cycle Cost. There is a cost associated with parts that
break and have to be repaired and maintained.
Nearly every part of the computer can be improved, by up to a factor
of 10, to provide increased reliability (extended
mean time between failures) for a price. To the
extent that these costs are added, the product
will be less competitive because of its higher purchase price. However, if the total life cycle costs
are considered, the product may still be better
even at the higher initial cost.
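The life cycle argument can be made concrete with a small calculation. The sketch below is only an illustration: the function name, the simple additive cost model, and every dollar figure are hypothetical and are not taken from the text.

```python
def life_cycle_cost(purchase_price, annual_repair_cost, years):
    """Purchase price plus repair and maintenance cost over the service life."""
    return purchase_price + annual_repair_cost * years


# Hypothetical figures, chosen only to illustrate the tradeoff.
baseline = life_cycle_cost(purchase_price=10_000, annual_repair_cost=1_500, years=5)
hardened = life_cycle_cost(purchase_price=12_000, annual_repair_cost=500, years=5)

print(baseline)  # 17500
print(hardened)  # 14500
```

Under these assumed numbers the more reliable product costs more to buy but less to own, which is the situation the paragraph describes.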
Reliability, Availability, Maintainability (and Producibility) Versus Performance. By
designing to take advantage of the fastest components and operating them at the limit of their
capability, one is able to have increased performance. In doing so, the tradeoff is clear: producibility, reliability (error rate), and
maintainability (ease of fixing) all generally suffer.
Performance Versus Cost. This is the most
traditional design tradeoff. In addition to the
conventional product selection, the planning of
a computer family further increases the selection/tradeoff process.
Early Shipment Versus Product Life and Quality. Delivering products before they are
fully engineered for manufacture is risky. If
faults are found that have to be corrected in the
factory or field, the cost far outweighs any early
product availability.

Length of Time to Design Versus Product Life. By allowing more time for design, a product
can be designed in such a way that it is easier to enhance. On the other hand, if
prospective customers, especially new customers, are faced with a choice between the competitor's available nonoptimum product and
your unavailable optimum product, they may
not be willing to wait.
Operating Environment Versus Cost. Here
there are numerous tradeoffs even within a conventional environment. In each of the packaging dimensions (heat, humidity, altitude, dust,
electromagnetic interference (EMI), etc.), there are
similar tradeoffs that may appeal to unique
markets or may simply translate to increased reliability in a given setting. The Norden 11/34M
is an example of packaging to provide a PDP-11
for the aerospace environment.
The principles of computer design and the
optimization efforts associated with those principles are parts of computer science and electrical engineering, the responsible disciplines.
From computer science come many of the technical aspects (such as instruction set architecture), much of the theory (such as algorithms
and computational complexity), and almost all
of the software design (such as operating systems and language translators) applied in the
practice of computer engineering. However, in
their construction, computers are electrical; and
the discipline that has fundamental responsibility is electrical engineering. Thus, discussion
of the Practice of Design View concludes with
Table 5, a set of maxims compiled by Don Vonada, an experienced DEC engineer. Many
other engineers in many other companies have
developed similar sets of maxims.

Table 5. Vonada's Engineering Maxims
1. There is no such thing as ground.
2. Digital circuits are made from analog parts.
3. Prototype designs always work.
4. Asserted timing conditions are designed first; unasserted timing conditions are found later.
5. When all but one wire in a group of wires switch, that one will switch also.
6. When all but one gate in a module switches, that one will switch also.
7. Every little picofarad has a nanohenry all its own.
8. Capacitors convert voltage glitches to current glitches (conservation of energy).
9. Interconnecting wires are probably transmission lines.
10. Synchronizing circuits may take forever to make a decision.
11. Worst-case tolerances never add - but when they do, they are found in the best customer's machine.
12. Diagnostics are highly efficient in finding solved problems.
13. Processing systems are only partially tested since it is impractical to simulate all possible machine states.
14. Murphy's Laws apply 95 percent of the time. The other 5 percent of the time is a coffee break.

VIEW 7: THE BLAAUW CHARACTERIZATION OF COMPUTER DESIGN

Another view is based on the work of Blaauw
[1970]. He distinguishes between architecture,
implementation, and realization as three separable levels in the construction of anything, including computer structures.
The architecture of a computer system defines its functionality (behavior) as it appears to
the machine level programmer and can be characterized by the instruction set processor (ISP).
The implementation of a computer system is the
actual hardware structure - the register transfer
(RT) level behavior and data-flow organization.
This also includes various algorithms for controlling a machine as it interprets an architecture. Realization encompasses the actual
technologies used and includes the kind of logic
and how it is packaged and interconnected. Realization includes all the details associated with
the physical aspects of the machine.
Modern architectures (ISPs) usually have
multiple (RT) implementations. For example,
the LSI-11, PDP-11/40, and PDP-11/60 are different implementations of the same basic PDP-11 instruction set. Sometimes, although rarely,
a particular implementation has more than one
realization. For example, the IBM 7090 has the
same architecture and implementation (i.e., the
same ISP and RT structure) as the IBM 709.
The difference lies in realization: the 709 used
vacuum tubes, the 7090 transistors. For a more
recent example, two models of the PDP-11 architecture that share the same implementation
are the DEC PDP-11/34 and Norden's
11/34M. The realization differs, however, as
the latter uses militarized semiconductor components and component mountings, and a different packaging and cooling system. Table 6
attempts to clarify the distinguishing characteristics of architecture, implementation, and realization.

Table 6. Characteristics of Design Areas [Blaauw and Brooks, in preparation: Chapter 1]

                        Architecture              Implementation                        Realization
Purpose                 Function                  Cost and performance                  Buildable and maintainable
Product                 Principles of operation   Logic design                          Release to manufacturing
Language                Written algorithms        Block diagram, expressions            Lists and diagrams
Quality measure         Consistency               Broad scope                           Reliability
Meanings (used herein)  ISP; machine ISP          RT level machine; microprogrammed     Physical realization;
                                                  sequential machine (at logic level)   physical implementation

This book concentrates on the realization
and implementation columns in Table 6. Instruction set architecture is discussed only insofar as it interacts with the other two
characteristics. There are also some differences
between the views of Blaauw and Brooks [in
preparation] and those expressed in this book.
It is important to try to reconcile these differences, because everyone engaged in computer engineering uses the words "architecture,"
"implementation," and "realization" - quite
often to mean different things. This book will
not limit the definition of architecture to just a
machine as seen by a machine language programmer. Instead, it will use architecture to
mean the ISP associated with any of the machine levels described in View 2, Levels-of-Interpreters. Therefore, architecture standing
alone will mean the machine language, the ISP.
This book will also use architecture of the microprogrammed machine as seen by a microprogrammed machine's microprogrammer,
architecture of the operating system as the combined machine of operating system and machine language, and architecture of a language
for each language machine. For example, ALGOL, APL, BASIC, COBOL, and FORTRAN
all have as separate and distinct architectures as
a PDP-10 and a PDP-11 do. This use of architecture, because it describes behavior, is quite
consistent with that of Blaauw. Moreover,
when applied to software structures, Blaauw's
framework fits well. There are two implementations of the one ANSI FORTRAN architecture:
FORTRAN IV-PLUS (an optimizing compiler) and the initial FORTRAN IV. Moreover,
different implementations use different realization techniques: some use BLISS, others use assembler language.
Although Blaauw and Brooks define implementation and realization clearly, these definitions are not widely used. The main problem is
that both terms are sensitive to technology
changes and, hence, interact closely. Computer
engineers tend to overuse and intermix them so
that the two words are used interchangeably.
This is reflected in this book, where they are
used to have roughly the same meaning (e.g.,
"The KI10 processor for the PDP-10 was implemented using high-speed (H-Series) transistor-transistor logic."). In Table 6, definitions
are given for the two words so that the reader
may further relate descriptions back to these
definitions. "Implementation" is the register
transfer level machine, roughly the microprogrammed machine; "realization" is the
physical realization, the physical implementation in terms of packaging and technology.
The most useful distinction is between architecture, on the one hand, and implementation
(subsuming realization), on the other. Seeing
the distinction clearly enables one to preserve
architectural compatibility between machine
models, and this is crucial if users' and manufacturers' software investments are to be preserved. Implementation can then be as dynamic
as desired, being continually changed by technology. Architecture must remain static for
long periods (10 years is a common goal).
In 1949 Maurice Wilkes, only one month after his EDSAC computer was operational and
before any stored program computers in the
United States were operating, had already perceived the value in having a series, or set, of
computers share the same instruction set:
When a machine was finished, and a
number of subroutines were in use, the
order code could not be altered without
causing a good deal of trouble. There
would be almost as much capital sunk in
the library of subroutines as the machine
itself, and builders of new machines in
the future might wish to make use of the
same order code as an existing machine
in order that the subroutines could be
taken over without modification.

Technology Progress in
Logic and Memories
C. GORDON BELL, J. CRAIG MUDGE,
and JOHN E. McNAMARA

It is customary when reviewing the history of
an industry to ascribe events to either market
pull or technology push. The history of the auto
industry contains many good examples of market pull, such as the trends toward large cars,
small cars, tail fins, and hood ornaments. The
history of the computer industry, on the other
hand, is almost solely one of technology push.
Technology push in the computer industry
has been strongest in the areas of logic and
memory, as the case studies in the following
chapters indicate. Where the following chapters
give examples of the effects of the technology
push in these areas, this chapter explores individual elements of that push, with particular
emphasis on the role of semiconductors.
Semiconductor devices are discussed from
the viewpoint of the user because, until recently,
DEC has always bought its semiconductors (especially integrated circuits) from semiconductor
manufacturers, and its engineers (users of integrated circuits) have viewed the integrated circuit as a black box with a carefully defined set
of electrical and functional parameters. Most
design engineers will probably continue to hold
that view (and be encouraged to do so), even
though some integrated circuits will be supplied
by an in-house design and manufacturing facility. The advantages and disadvantages of in-house integrated circuit design will be discussed
later in the chapter.
The portion of the discussion dealing with
semiconductors begins by presenting a family
tree of the possible technologies, arranged according to the function each carries out and
showing how these have evolved over the last
two or three generations to affect computer engineering. The cost, density, performance, and
reliability parameters are briefly reviewed; the
application of semiconductors, using various
logic design methods, is then discussed with
particular emphasis on how the semiconductor
technology has pushed the design methods.
The discussion of the use of semiconductors
in logic applications is followed by a section on
memories for primary, secondary, and tertiary
storage. While semiconductors have been a
dominant factor in technology push within the
computer industry for both logic and memory
applications, magnetic recording density on
disks and tapes has evolved rapidly, too, and
must be understood as a component of cost and
as a limit of system performance.

The section on memory is followed by a section containing some general observations
about technology evolution: how technology is
measured, why it evolves (or does not), cases of
it being overthrown, and a general model for
how its use in computers operates and is managed.
SEMICONDUCTOR LOGIC TECHNOLOGY

A single transistor circuit performing a primitive logic function within an integrated circuit is
among the smallest and most complex of manmade objects. Alone, such a circuit is intrinsically trivial, but the fabrication process required for a set of structures to form a complete
integrated circuit is complex. For users of
digital integrated circuits there are several relevant parameters:
1. The function of an individual circuit in the integrated circuit, the aggregate
   function of the integrated circuit, and the functions of a complete integrated
   circuit family such as the 7400-series.
2. The number of switching circuit functions per integrated circuit. This quantity
   and density is a measure of the capability of the integrated circuit and
   the ingenuity of the designers.
3. Cost.
4. The speed of each circuit and the speed of the integrated circuit and set of
   integrated circuits within a family. The semiconductor device family
   (transistor-transistor logic = TTL, Schottky TTL = TTL/S, emitter-coupled logic = ECL,
   metal oxide semiconductor = MOS, complementary MOS = CMOS, silicon
   on sapphire = SOS, integrated injection logic = I2L) usually determines this performance.
5. The number of interconnections (pins) to communicate outside the integrated
   circuit.
6. The reliability. This is a function of the circuit technology, the density, the
   number of pins, the operating temperature, the use (or misuse), and the maturity
   (experience) of the manufacturing process.
7. Power consumption and speed-power product. A frequently used metric is the
   speed-power product, where the delay through a typical gate is multiplied by
   the power consumption of the gate. For a particular technology, the speed-power
   product tends to be constant because short gate delays usually are accompanied
   by high power consumption. A technical advance that lowers the speed-power
   product is considered noteworthy.
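
The speed-power product in the last item is a simple multiplication, sketched below. The function name and all gate figures are hypothetical and are chosen only to show the pattern described: within one technology, a faster gate burns proportionately more power, so the product stays roughly constant. Nanoseconds multiplied by milliwatts give picojoules.

```python
def speed_power_product_pj(gate_delay_ns, gate_power_mw):
    """Delay (ns) times power (mW) gives the energy per switching event in pJ."""
    return gate_delay_ns * gate_power_mw


# Hypothetical gates from one technology: speed is traded for power.
fast_gate = speed_power_product_pj(gate_delay_ns=5, gate_power_mw=20)
slow_gate = speed_power_product_pj(gate_delay_ns=20, gate_power_mw=5)

# A hypothetical newer technology: same speed as fast_gate at lower power.
newer_gate = speed_power_product_pj(gate_delay_ns=5, gate_power_mw=4)

print(fast_gate, slow_gate, newer_gate)  # 100 100 20
```

The first two gates share a product of 100 pJ; the third, with a product of 20 pJ, is the kind of advance the text calls noteworthy.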

Figure 1 shows a family tree (taxonomy) of
the most common digital integrated circuits.
The least complex functions are in the upper
portion of the figure, and the most complex are
at the bottom. In addition, the circuits are ordered by generation, starting with the second
generation on the left side of the figure and
progressing to the fifth generation on the right
side. The circuits are clustered roughly by the
regularity of the function and whether memory
is associated with the function. Circuit regularity is important in large-scale integrated circuits because it is desirable to implement
regular structures to minimize area-consuming
interconnections and, thus, to simplify layout
and understanding and to aid testing.
As indicated in Figure 1, the branching of the
integrated circuit family tree began in earnest at
the beginning of the third generation. At that
time, advances in integrated-circuit technology
permitted collections of basic logic primitives
(AND, NAND, etc.) and sequential circuit
components (flip-flops, registers, etc.) to occupy a single integrated circuit rather than an
entire module. This had the benefit of providing
a drastic reduction in size between the second
and third generation computer designs, as
shown most vividly by comparing the PDP-9
and PDP-15 (Chapter 6), but it also had the
drawback that modules contained a wide variety of functions and were thus specialized.

Figure 1. Family tree of digital integrated circuit functions. [Diagram not reproduced: branches are arranged by generation, second through fifth; labels recoverable from the source include transistor, register, gate array, PLA, FPLA, RAM (high speed), RAM (high density), SWRAM (slow write), ROM, PROM, EAROM, and CCD.]

As the densities began to approach 100 gates,
the construction of complete arithmetic units
on a single chip became possible. The earliest
and most famous function, the 74181 arithmetic
logic unit (ALU) shown in Figure 2, provided
up to 32 functions of two 4-bit variables. By the
fourth generation, it became possible to construct on a single chip very large combinational
circuits, such as a complete 16 X 16-bit multiplication circuit (e.g., the TRW Corp. multiplier) requiring about 5,000 gates.
Progress during the fourth and fifth generations has not been without its problems, however. Without well defined functions such as
addition and multiplication, semiconductor
suppliers cannot provide high density products
in high volume because there are few large-scale, general purpose universal functions. The
alternative for users is to interconnect simple
logic circuits (AND gates, flip-flops), but that
does not permit efficient use of the technology,
and the cost per function remains high (about
that of the third generation) because the printed
circuit board and integrated circuit packaging
costs (pins) limit the cost reduction.
To address these problems, two methods of
effectively customizing large-scale integrated
circuit logic are included in Figure 1 and discussed in greater detail later in the chapter.
These are the programmable logic array (PLA)
and the gate array (also called master slice). The
programmable logic array (PLA) is an array of
AND-OR gates that can be interconnected to
form the sum-of-products terms in a combinational logic design. Gate arrays are simply
a large number of gates placed on the chip in
fixed locations where they can be interconnected during the final metalization stages
of semiconductor manufacture.
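
The sum-of-products structure of a programmable logic array can be sketched in a few lines. This is a behavioral illustration only: the AND_PLANE and OR_PLANE tables, the evaluate_pla function, and the choice of a one-bit full adder as the programmed function are assumptions made for the example, not a description of any particular device.

```python
from itertools import product

# AND plane: each product term lists the input values it requires.
# Inputs that a term does not mention are "don't care" for that term.
AND_PLANE = [
    {"a": 1, "b": 1},            # term 0: a AND b
    {"a": 1, "cin": 1},          # term 1: a AND cin
    {"b": 1, "cin": 1},          # term 2: b AND cin
    {"a": 0, "b": 0, "cin": 1},  # terms 3-6: the four odd-parity minterms
    {"a": 0, "b": 1, "cin": 0},
    {"a": 1, "b": 0, "cin": 0},
    {"a": 1, "b": 1, "cin": 1},
]

# OR plane: each output is the OR of the product terms it selects.
OR_PLANE = {
    "carry": [0, 1, 2],
    "sum": [3, 4, 5, 6],
}


def evaluate_pla(inputs):
    """Evaluate every product term, then OR the selected terms per output."""
    terms = [
        all(inputs[name] == value for name, value in term.items())
        for term in AND_PLANE
    ]
    return {
        output: int(any(terms[i] for i in selected))
        for output, selected in OR_PLANE.items()
    }


# Check the programmed array against ordinary addition for all 8 input patterns.
for a, b, cin in product((0, 1), repeat=3):
    out = evaluate_pla({"a": a, "b": b, "cin": cin})
    assert 2 * out["carry"] + out["sum"] == a + b + cin
```

Changing the function means changing only the two tables, which mirrors the way a programmable logic array is customized without altering the underlying array.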

Figure 2. A functional block diagram of the 181 arithmetic logic unit (courtesy of Texas Instruments, Inc., from TTL Data Book, 2nd edition, 1976, p. 7-273, 7-280). [Logic diagram and function tables not reproduced: Table 1 covers active-low data and Table 2 active-high data, each listing the logic functions (M = H) and arithmetic operations (M = L, with and without carry) chosen by selection inputs S3 through S0.]

There is a special branch of the tree shown in
Figure 1 purely for memory functions. Memory
is used in the processor as conventional memory, but it can also be used as an alternative to
conventional logic for performing combinational logic functions. For example, the inputs to a combinational function can be used as
an address, and the output can be obtained by
reading the contents of that address. Memory
can also be used to implement sequential logic
functions. For example, it can be used to hold
state information for a microprogram. Because
memories have so many uses, this branch is discussed separately in the memory section of this
chapter.
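
The use of memory as combinational logic can be illustrated with a table lookup. In the sketch below, the helper names (build_rom, read) and the choice of a 4-input parity function are hypothetical; the point is only that the inputs are concatenated into an address and the stored word is the output.

```python
def build_rom(function, input_bits):
    """Tabulate `function` for every input pattern; the pattern is the address."""
    return [function(address) for address in range(1 << input_bits)]


def parity(address):
    """Return 1 when the address contains an odd number of 1 bits."""
    return bin(address).count("1") & 1


PARITY_ROM = build_rom(parity, input_bits=4)  # 16 words of 1 bit each


def read(rom, *bits):
    """Form an address from the input bits (most significant first) and read it."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return rom[address]


print(read(PARITY_ROM, 1, 0, 1, 1))  # 1
print(read(PARITY_ROM, 1, 1, 0, 0))  # 0
```

Any other 4-input function could be stored in the same 16 words, which is why the approach trades gates for memory bits rather than for design effort.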
The remainder of the interesting logic functions include combinations of logic and memory. There are various special functions such as
linear predictive coding algorithms for use in
real-time applications and data encryption algorithms for use in communication systems.
One of the most useful communications functions, and the first one to use large-scale integration, is the Universal Asynchronous
Receiver/Transmitter (UART).
There is a special branch for bit-slice components that can be combined to form data
paths of arbitrary widths. These are being used
to construct most of today's high speed digital
systems, mid-range computers, and computer
peripherals. Although there have been several
bit-slice families, the AMD Corp. 2900-series
whose register transfer diagram is shown in Figure 3 has become the most widely used. Note
that all the primitives of this series were present
in the Register Transfer Module Family (Chapter 18), including the microprogrammed control
unit referred to as the Programmed Control Sequencer.
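
The idea of combining bit slices into a data path of arbitrary width can be sketched as follows. This is a simplified model that assumes a plain ripple carry between 4-bit slices; the function names alu_slice_add and sliced_add are hypothetical, and the sketch does not attempt to describe the internal organization of the 2900-series parts.

```python
def alu_slice_add(a4, b4, carry_in):
    """One 4-bit slice: add two 4-bit values plus carry-in.

    Returns (4-bit sum, carry-out).
    """
    total = (a4 & 0xF) + (b4 & 0xF) + carry_in
    return total & 0xF, total >> 4


def sliced_add(a, b, slices):
    """Chain `slices` 4-bit slices, least significant first, rippling the carry."""
    result, carry = 0, 0
    for i in range(slices):
        shift = 4 * i
        partial, carry = alu_slice_add(a >> shift, b >> shift, carry)
        result |= partial << shift
    return result, carry


# Four slices form a 16-bit data path.
print(sliced_add(0x1234, 0x0FFF, slices=4))  # (8755, 0), i.e. 0x2233
print(sliced_add(0xFFFF, 0x0001, slices=4))  # (0, 1): carry out of the top slice
```

Widening the data path is a matter of changing the slice count, which is the property that makes bit-slice components attractive as building blocks.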
The final branch of the tree in Figure 1 is the
most complex and is used to mark the fourth
(microprocessor-on-a-chip) generation of technology and the beginning of the fifth (computer-on-a-chip) generation. The fourth
generation is marked by the packaging of a
complete processor on a single silicon die; by
this standard, the fifth generation has already
begun since a complete computer (processor
with memory) now occupies a single die. The
evolution in complexity during each generation
simply permits larger word length processors or
computers to be placed on one chip. At the beginning of the fourth generation, a 4-bit processor was the benchmark; toward the end of the
fourth generation, a complete 16-bit processor
such as the PDP-11 could be placed on a single
chip.

Figure 3. AMD2900 four-bit microprocessor slice
block diagram (registers and data path).

Gates per Chip

The function performed by a chip is clearly
dependent on the number of gates that can be
placed on a chip. Thus, density in gates per chip
is the single most important parameter determining chip functionality. By this measure, one
can predict the functions likely to be implemented by just following the tree. It should be
noted that the whole tree is relatively alive and
has dense areas of new branches everywhere except at the top, where unconnected gate and
register structures have been relatively static. In
the growing areas, as density increases sufficiently, a new branch grows. For example, the
processor-on-a-chip started out as a 4-bit processor (or rather as 2 chips for a single processor) and then progressed to 8-bit and then 16-bit
processors on a single chip. Similar effects
can be observed with the arithmetic logic unit
and with memories.
The number of gate circuits per chip not only
determines chip functionality, it also is the measure of density as seen by a user (Figure 4). This
metric is the product of the circuit area and the
number of circuits per unit area. Progress in
lithography has led to a reduction of conductor
linewidths and a corresponding reduction of
circuit size to yield higher speeds and higher
densities. Linewidths have decreased from 10
microns in early large-scale integrated circuit
chips to 6 microns in the LSI-11 chips, and
more recently to 3 or 4 microns in Intel's 8086.
Linewidths of less than a micron have been
achieved at the research level, but they require
electron beam techniques instead of present
photographic methods of production. The processing techniques to create semiconductor materials have also been improved for better manufacturing yields (and lower costs). Circuit and
device innovations (such as reducing the number
of transistors per memory cell) have also contributed to density and yield increases.
The result given in Figure 4 is exponential
and indicates that the number of bits per chip
for a metal oxide semiconductor (MOS) memory doubles every year according to the
relationship:
Number of bits per chip = 2^{t-1962}
where t is the calendar year.
There are separate curves, each following this
relationship, for read-only memories in prototype quantities, read-only memories in production quantities, read-write memories in
prototype quantities, and read-write memories
in production quantities. Thus, depending on
the product and the maturity of its production
process, products lead or lag behind the above
state-of-the-art time line by one to three years
according to the following rules:
• Bipolar read-write memories lag by two to three years.
• Bipolar read-only memories lag by about one year.
• MOS read-only memories lead by one year.

Figure 4. Components per single integrated circuit die versus time. Number of components per circuit in the most advanced integrated circuits has doubled every year since 1959, when the planar transistor was developed. Gordon E. Moore, then at Fairchild Semiconductor, noted the trend in 1964 and predicted that it would continue (from [Noyce, 1977:67]; courtesy of Scientific American). [Plot not reproduced: components per die on a logarithmic scale, with recoverable axis ticks from 16 to 65,536 (65K), versus year from 1959 to 1979, with SSI, MSI, and LSI regions marked.]

This model gives the availability of various
sizes of semiconductor memories as shown in
Figure 5. The significance of various size memory availabilities is that they determine (technology push) when certain architectures and
implementations can occur. The chapter discussing the PDP-11 (Chapter 16) uses this
model to show how semiconductors accomplish
this push.
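
As a rough numerical sketch of the density model described above, the relationship and the lead/lag rules can be evaluated directly. The function names and the convention of writing a lag as a positive number of years (and a lead as a negative one) are assumptions made only for this illustration. Note that the exponent t - 1962, taken literally, advances by one doubling per year.

```python
import math

BASE_YEAR = 1962  # from the relationship: bits per chip = 2^(t - 1962)


def bits_per_chip(year, lag_years=0):
    """Bits per chip in a given year.

    lag_years > 0 models a technology that lags the state-of-the-art
    time line; lag_years < 0 models one that leads it (assumed convention).
    """
    return 2 ** (year - lag_years - BASE_YEAR)


def year_available(bits, lag_years=0):
    """Year in which a chip of the given size becomes available."""
    return BASE_YEAR + math.log2(bits) + lag_years


# Lead/lag rules from the text, as (min, max) offsets in years.
OFFSETS = {
    "bipolar read-write": (2, 3),   # lags by two to three years
    "bipolar read-only": (1, 1),    # lags by about one year
    "MOS read-only": (-1, -1),      # leads by one year
}

if __name__ == "__main__":
    print(bits_per_chip(1978))  # 65536
    for name, (lo, hi) in OFFSETS.items():
        print(name, year_available(16384, lo), year_available(16384, hi))
```

Under these assumptions a 16-Kbit part on the state-of-the-art line falls in 1976, and the same size in bipolar read-write form falls two to three years later.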

Figure 5. Logic and memory technology evolution timeline. [Chart not reproduced: a time line from 1945 through the late 1970s with columns for semiconductor logic technology; read-write, fast read-write, and bipolar read-only memory (bits per chip); machines (especially DEC); and DEC module logic clock speed.]

Cost

After density, the most important characteristic of integrated circuits is cost. The cost of
integrated circuits is probably the hardest of all
the parameters to identify and predict because it
is set by a complex marketplace. For circuits
that have been in production for some time, and

for memory arrays, the price is set in essentially
the same way as the price of a commodity like
eggs or bacon is set; and users generally consider these integrated circuits as very similar to
commodities, with the attendant benefits, costs,
and problems (having a sufficient supply). In
low volumes, integrated circuit prices are proportional to the die cost (which is proportional
to the die area); but at higher volumes, assembly, testing, packaging, and distribution become the dominant cost factors. Furthermore,
for those low volume circuits that have not yet
reached commodity status, the prices also depend on the strategy of the supplier - whether
he is willing to encourage competition.
Two curves are presented to reflect the price
of various components (transistors) implemented in integrated circuits. Figure 6 shows
the price per gate for MOS and TTL circuits as
a function of time and scale of integration.
Table 1 gives some idea of how circuit density
(in elements) relates to actual function.
The cost history of integrated circuits is reflected very dramatically in the cost history of a
special class of integrated circuits, semiconductor memory. The semiconductor memory cost curves, given in Figure 7, are also
interesting because of the important role of
memory in past and future computer structures.
As shown in the figure, the 1978 cost per bit was
roughly 0.08¢ and 0.07¢ for the 4-Kbit
and 16-Kbit integrated circuit chips, respectively, giving chip costs of about $3.30 and $11.50.
Two factors influence the cost of integrated
circuits: density in bits per integrated circuit
and cost per bit. The two factors have not had
equal influence in reducing costs because, while
chip density has improved by a factor of 2 each
year (Figure 4) [Noyce, 1977], the cost per bit
(at the integrated circuit level) has not declined
by a factor of 2 every two years. The equation
for the line drawn in Noyce's [1977] Figure 7 is:
Cost/bit (¢) = 0.3 × 0.72^{t-1974}
It is interesting to note that the cost decline
compares favorably with the price decline in
core memory over the period 1960-1970
for the 18-bit computers (Chapter 6), and with
the memory price declines in both the PDP-8
(Chapter 7) and the PDP-10 (Chapter 21).

Figure 6. Price per gate versus time. [Plot not reproduced; horizontal axis: YEAR.]

Figure 7. Cost per bit of integrated circuit memory versus time. Cost per bit of computer memory has declined
and should continue to decline as is shown here for successive generations of random-access memory circuits
capable of handling from 1,024 (1 K) to 65,536 (65 K)
bits of memory. Increasing complexity of successive circuits is primarily responsible for cost reduction, but less
complex circuits also continue to decline in cost (adapted
from [Noyce, 1977:69]; courtesy of Scientific American).
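
The cost-per-bit line and the chip costs quoted in the text can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the halving-time calculation is an inference from the 0.72 annual factor rather than a figure stated in the text.

```python
import math


def cost_per_bit_cents(year):
    """Cost per bit in cents from the fitted line: 0.3 * 0.72^(t - 1974)."""
    return 0.3 * 0.72 ** (year - 1974)


def chip_cost_dollars(bits, cents_per_bit):
    """Chip cost implied by a per-bit cost given in cents."""
    return bits * cents_per_bit / 100.0


def halving_time_years(annual_factor=0.72):
    """Years needed for the cost per bit to fall by half at a fixed annual factor."""
    return math.log(0.5) / math.log(annual_factor)


if __name__ == "__main__":
    print(round(cost_per_bit_cents(1978), 3))        # 0.081 (cents)
    print(round(chip_cost_dollars(4096, 0.08), 2))   # 3.28
    print(round(chip_cost_dollars(16384, 0.07), 2))  # 11.47
    print(round(halving_time_years(), 2))            # about 2.1 years
```

The 1978 value of the fitted line agrees with the roughly 0.08¢ per bit read from the figure, and the two chip costs round to the $3.30 and $11.50 given in the text.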

Table 1. The Number of Areal Elements to Implement Logic Functions in Different Technologies

MOS
Function                 NMOS   PMOS   CMOS
Inverter                 2      2      2
2-input gate             3      3      3 or 4
8-input gate             9      9      9 or 16
R/S latch                6      6      6 or 8
Memory cell (dynamic)    2      2      2
Memory cell (static)     6      6      6
D flip-flop              20     20     20 or 28
JK flip-flop             20     20     20 or 36

Bipolar (ECL, TTL)

7
8
14
12

3

4-6
28

4-6
20
26

Performance

The performance for each semiconductor
technology evolves at different rates depending
on the cumulative learning associated with the
design and manufacturing processes together
with marketplace pressure to have higher performance for the particular technology. One
may hypothesize that each technology can be
looked at as being relatively appealing or relevant to the particular design(er) styles associated with various computer marketplaces. One
would then expect the evolution to continue
along the lines shown in Table 2.
DEC's use of the various integrated circuit
technologies shown in Table 2 is probably typical of most of the computer industry: TTL for
mid- and high-sized minicomputers; ECL for
the larger scale machines (PDP-10); MOS for
memories, microprocessors, and specialized
high density circuits; and CMOS for special microcomputers, especially those intended for battery operation.
Some of the lesser used technologies such as
I²L (integrated-injection logic) and SOS (silicon
on sapphire) have been omitted from the table.
I²L features high density and very low power
consumption, but it is slow as initially implemented. SOS MOS enhances CMOS speed by
removing stray capacitance, making it comparable with low power Schottky (TTL/LS)
speed while retaining MOS complexity capabilities. Both I²L and SOS have been touted as replacements for various technologies shown in
the table. But, if an entrenched technology has
evolved for some time and continues to evolve,
it is difficult for alternative technologies to displace it because of the investment in process
technology and understanding. Semiconductors
appear to be characteristic of other technologies
in that usually only a single technology is used
for a given problem.
[Table 1, displaced column values: 3, 3, 6; I²L: 1, 2, 2, 2, 4, 9, 11]
The early technologies, RTL (resistor transistor logic), TRL (transistor resistor logic), and
DTL (diode transistor logic) have also been
omitted from the table. These technologies are
important historically because they were used in
the first integrated circuits. However, many
manufacturers, including DEC, did not use
them in computers (RTL was used in DEC industrial control modules) because they did not
represent a sufficient advance over the discrete
transistor circuits already being used. In addition, early circuits were packaged in flat packages and metal cans rather than in the dual inline package used today, and automated manufacture using the components was thus not economically feasible.
Table 3 gives the speed-power product and
the gate delay, the two most useful measures of
performance, for the various technologies as
they have evolved with time. The speed-power
product metric for a technology at a given time
indicates what performance versus power tradeoffs the user can make. There are limits to this
tradeoff. Only about one watt can be dissipated
by the off-the-shelf integrated circuit package,
and tradition in integrated circuit package design has been strong. The table was formulated
by Jerry Luecke of Texas Instruments (TI) at a
time when I²L technology had just been introduced (October, 1975) by TI.

Table 2. Characteristics of Dominant (1978) Semiconductor Technologies

Type                                Evolution         Use
TTL (transistor-transistor logic)   TTL               Logic, bus interfacing
                                    TTL/Schottky      Higher speed than TTL
                                    TTL/LS            Same speed as TTL, but low power
ECL (emitter-coupled logic)         MECL II, III      High and higher performance
                                    MECL 10K, 100K    Easier to work with
                                                      Evolving to gate array design
MOS (metal oxide semiconductor)     p-channel         Low cost
                                    n-channel         Greater densities, cost
                                                      Evolving to performance (memory)
                                                      Evolving to shorter channels: HMOS, DMOS, VMOS
CMOS (complementary MOS)            CMOS              Low power, higher speed
                                                      Better noise immunity

Table 3. Gate Delay of Various Semiconductor Technologies [Luecke, 1976:53]*

Year     Type of Logic               Gate Delay      Power Dissipation   Speed-Power Product
                                     (nanoseconds)   (milliwatts)        (picojoules)
[1963    DTL                                                             200]
[1964    RTL                                                             180]
1965     TTL                         10              10                  100
1967     TTL/H-series                5               20                  100
1968     TTL                         30                                  30
1970     TTL (Schottky)              3               20                  60
1972     TTL (low power Schottky)    10              2                   20

1967     ECL                         2               30                  60
1974     ECL                         0.7             43                  30

1970     PMOS                        200             0.1                 20
1973     NMOS                        100             0.1                 10
1973     CMOS                        30              1.0                 30
1974     SOS                         15              0.05                7.5
[1976    NMOS                        4                                   4]
[1978    HMOS                        0.9                                 0.9]

1975     I²L                         35              0.085               3.0
1976     I²L                         20              0.05                1.0

*The four entries in brackets have been added by the authors.

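
The speed-power product discussed above is the gate delay multiplied by the power dissipated per gate; with delay in nanoseconds and power in milliwatts the result is in picojoules. The sketch below illustrates that arithmetic with sample values as read from Table 3. The gates-per-package estimate is an assumption added for illustration only: it charges the whole one-watt package budget to logic gates and ignores output drivers and other overhead.

```python
def speed_power_product_pj(gate_delay_ns, power_mw):
    """Speed-power product in picojoules (ns * mW = 1e-9 s * 1e-3 W = 1e-12 J)."""
    return gate_delay_ns * power_mw


def max_gates_per_package(power_per_gate_mw, package_limit_w=1.0):
    """Gates that fit within a package dissipation limit (about one watt in the text).

    Assumes, for illustration, that all package power goes to the gates.
    """
    return int(package_limit_w * 1000.0 // power_per_gate_mw)


if __name__ == "__main__":
    print(speed_power_product_pj(2, 30))     # ECL (1967) sample values: 60
    print(speed_power_product_pj(35, 0.085)) # I2L (1975) sample values: about 3
    print(max_gates_per_package(2.0))        # 500 gates at 2 mW per gate
```

Read this way, a lower power per gate buys either more gates within the same package limit or, at a fixed speed-power product, a proportionally longer gate delay.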
Reliability

Over the past 15 years, the failure rate for
standard integrated circuits has been reduced
by two orders of magnitude to the neighborhood of 0.01 percent per 1,000 hours. This corresponds to 10^7 hours (about a millennium) mean
time to failure (MTTF) per component. Figure
8, from a recent survey article by Hodges
[1977:63], shows the trend. The lower curves
show the higher reliability obtained when more
extensive testing and screening are employed.
The improved MTTFs of between 10^8 and 10^9 hours
are obtained at a cost increase of 4 to 100 times
per component.
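
The conversion from a failure rate in percent per 1,000 hours to a mean time to failure can be sketched as follows. The sketch assumes a constant failure rate, so that MTTF is simply the reciprocal of the rate; the function name and the hours-per-year constant are illustrative choices, not taken from the text.

```python
HOURS_PER_YEAR = 8766  # 365.25 days; an assumed convention


def mttf_hours(failure_rate_percent_per_1000h):
    """Mean time to failure in hours, assuming a constant failure rate."""
    failures_per_hour = failure_rate_percent_per_1000h / 100.0 / 1000.0
    return 1.0 / failures_per_hour


if __name__ == "__main__":
    for rate in (0.01, 0.001, 0.0001):
        hours = mttf_hours(rate)
        print(rate, hours, hours / HOURS_PER_YEAR)
```

A rate of 0.01 percent per 1,000 hours gives about 10^7 hours, or a little over 1,100 years, which is the "about a millennium" of the text; 0.0001 percent per 1,000 hours gives about 10^9 hours.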

Figure 8. Failure rate of silicon integrated circuits.
(Rate of 0.0001 percent per 1,000 hours is 10^9 hours
mean time to failure.) [Hodges, 1977:63] [Plot not reproduced: failure-rate scale from 100.0 down to 0.0001; horizontal axis YEAR, 1961 to 1977.]

I/O Connections
The number of pins per integrated circuit
package has risen relatively slowly because of
the mechanical handling equipment (e.g., sorters, bonders, testers, inserters) to the point
where 48 pins has just become accepted in 1978.
The packages of the 1980s will no doubt go beyond 100 with the ability for multiple die per
package.
The Large-Scale Integrated Circuit Dilemma

As indicated in the discussion of Figure 1, a
dilemma involving a search for universal circuits has developed in the manufacture of large-scale integrated (LSI) circuits. The economics
of the LSI industry make it essential that integrated circuit suppliers produce circuits with a
high degree of universality. This is because the
learning curve of a manufacturing process
causes cost to be inversely proportional to volume, and for a design to be sold in high volume,
it must be usable in a large number of applications. However, the trend in circuit complexity, which allows semiconductor
manufacturers to put more transistors on a constant die area each year, tends to increase specialization of function, lowering the volume and
raising the price.
The LSI product designer is therefore continually in search of universal primitives or building blocks. For a certain class of applications,
such as controller applications, the microprocessor is a fine primitive and has been so exploited [Noyce, 1977]. For other applications,
circuit complexity can embrace even higher
functionality at the processor-memory-switch
level. The Intel 827X is an interesting example:
two processors, a 1.25-microsecond byte-processor and a 250-nanosecond bit-processor, are
combined in one large-scale integrated circuit
[Louie et al., 1977].

Moore [1976] discusses the LSI dilemma in a
paper on the role of the microprocessor in the
evolution of microelectronic technology. He
points out that a similar situation existed when
integrated circuits were first introduced. Users
were reluctant to relinquish the design prerogative they had when they built circuits from
discrete components. It was not until substantial price reductions were made that the impasse was broken. Then the cost advantages
were sufficient to force users to adopt the new
technology circuits.
The first high functionality, high universality
circuit that comes to mind is the microprocessor-on-a-chip. For many applications, including most computer systems, the
microprocessor-on-a-chip is not a cost-effective
building block, and other solutions to the dilemma are used. For example, microprogramming is a highly general way of
generating control signals for data path elements, and table lookup using read-only memories is a highly general technique. Both methods
are attractive because they use memory, an inherently low cost LSI circuit. Microprogramming, however, does have limitations.
The extra level of interpretation extracts a performance penalty, and some potential data path
parallelism is often given up to reduce cost. A
more subtle, but practical, limitation is the development cost of microcode. Assuming the
writing rate to be 700 microwords per man-year
for wide-word, unencoded (horizontal) micromachines, a desire to limit the effort to 20-24
man-years would limit the maximum control
store size to about 16 Kwords. This maximum
will tend to increase in the future, when the use
of better microprogramming tools increases the
microcode writing rate beyond 700 microwords
per man-year.
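
The control store limit quoted above follows from multiplying the writing rate by the effort budget. A minimal sketch of that arithmetic, with hypothetical function names, is:

```python
def max_control_store_words(words_per_man_year, man_years):
    """Microwords that can be written within a given effort budget."""
    return words_per_man_year * man_years


def man_years_for(control_store_words, words_per_man_year):
    """Effort needed to fill a control store of the given size."""
    return control_store_words / words_per_man_year


if __name__ == "__main__":
    print(max_control_store_words(700, 20))  # 14000
    print(max_control_store_words(700, 24))  # 16800
    print(man_years_for(16 * 1024, 700))     # a little over 23 man-years
```

At 700 microwords per man-year, 20 to 24 man-years yield 14,000 to 16,800 microwords, which brackets the 16 Kwords (16,384 words) given in the text.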
At the register transfer level, the standard microprogramming design method is (conservatively) twice as expensive per instruction as
conventional programming. Moreover, because
microinstructions are usually not as powerful as

conventional instructions, more microinstructions than conventional instructions are
usually required to solve a given problem.
These two factors, more expense per instruction
and more instructions, cause a microprogram
to be five to ten times as expensive to design as a
conventional program to solve the same problem. However, the instruction execution speeds
of a microprogrammed controller are at least 10
times faster than the instruction execution
speeds of a conventional mini.
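
The five-to-ten-times figure above is the product of the two factors named in the text. The sketch below makes that explicit; the instruction-count ratios of 2.5 and 5 are not stated in the text but are inferred here from the quoted totals, and the function names are hypothetical.

```python
COST_PER_INSTRUCTION_RATIO = 2.0  # microprogramming vs. conventional, per the text


def relative_design_cost(instruction_count_ratio,
                         cost_per_instruction_ratio=COST_PER_INSTRUCTION_RATIO):
    """Design cost of a microprogram relative to a conventional program."""
    return instruction_count_ratio * cost_per_instruction_ratio


def implied_instruction_ratio(total_cost_ratio,
                              cost_per_instruction_ratio=COST_PER_INSTRUCTION_RATIO):
    """Instruction-count ratio implied by a total cost ratio (an inference)."""
    return total_cost_ratio / cost_per_instruction_ratio


if __name__ == "__main__":
    print(implied_instruction_ratio(5))   # 2.5
    print(implied_instruction_ratio(10))  # 5.0
    print(relative_design_cost(2.5), relative_design_cost(5))
```

In other words, the quoted range corresponds to a microprogram needing roughly 2.5 to 5 times as many instructions as the conventional program it replaces.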
The characteristics of microprocessor and
read-only memory design methods of creating
customized results from universal large-scale integrated circuits are summarized, along with the
characteristics of a number of other methods, in
Table 4.

Table 4. Design Techniques for Various LSI Building Blocks

Building Block     Technique for Varying Function   Degree of Generality   Permanence of Change
Computer module    Program                          Very high              None
Microprocessor     Program                          High                   Low to medium
Bit-slice          Microprogram                     Medium                 Medium
ROM                Factory mask change              Very high              Irreversible
PROM               Field change                     Very high              Irreversible
EAROM, EPROM       Field change                     Very high              Low
PLA                Factory mask change              Medium                 Irreversible
FPLA               Field change                     Medium                 Irreversible
Gate array         Factory mask change              Medium                 Irreversible
RAM                Write                            Very high              None

The increased basic circuit functionality
available at each new generation has not only
been an important part of semiconductor design, but has also caused design methods to
change with the generations. This book provides examples, as summarized in Table 5.
The design of most relatively high speed
digital systems (including low- to mid-range
minicomputers) is carried out using standard
register transfer integrated circuits complete
with data path and memory. For higher performance computers, there is no alternative to
using either tightly packed standard integrated
circuits or building a unique set of integrated
circuits using some form of customization. The
high performance IBM and Amdahl machines,
for example, use custom ECL circuits or gate
arrays to improve packaging. Although Seymour Cray continues to build his high speed
computers (the CDC 6600, 7600 and Cray 1)
with no custom logic, he does so by using impressively dense modules with high density interconnection and freon cooling.
The current spectrum of integrated circuits
and their use is summarized in Table 6.
The Changing Nature of System Design

With the advent of the processor-on-a-chip,
digital system design has been, or soon will be,
converted completely to computer system design (design at the processor-memory-switch
level of Chapter 1, View 1). Problems such as
controlling a CRT, controlling a lathe, building

Table 5. Design Method versus Generation
Generations

Design Method

First

Second Third

Combinational and sequential; use of
"standard" modules, integrated circuits

s

s

Read-only memory and PLA; microprogramming

Programming using micros and logic for
interfaces

s

p

Fourth

p

-

The standard method for most digital systems
Done by manufacturers of basic equipment
Also used
Prelude to micros, also done using minis

m

Examples in
this Book
18-bit;
PDP-8
PDP-9;
PDP-11

m

s

m

CMU-11

s

x

LSI-11

s

Cm*

m

LSI-11

PMS design using completely specified
and predesigned microcomputer components
Customized chip design and standard
(logic) design (high performance)

Fifth

s

Microprogramming with standard RT elements (high performance) minor logical
design

s
m
x
p

m

a billing machine, or implementing a word processing system become computer system design
problems similar to those attacked over the first
three generations. The hardware part of the design, the interface to the particular equipment,
is straightforward. The major part of the design
is the programming. Since the late 1940s, three
generations have learned about computer design, especially programming. The first generation discovered and wrote about it. Then it
was rediscovered and applied to minicomputer
systems. This time, it is being learned by everyone who must use and program the microcomputer. Each time, for each individual or
organization, the story is about the same:
people start off by programming (using binary,
octal, or hexadecimal codes) small tasks, using
no structure or method of synchronizing the
various multiple processes; the interrupt mechanism is learned, and the symbolic assembler is
employed; and finally some more structured
system, possibly an operating system, is employed. Occasionally, users move to high level
languages or macroassemblers.

Table 6. Integrated Circuit Organization and Use in Various Computers

Organization                  Technology               Unique Chips   Performance (MIPS)   Cost      Examples
Microcomputer                 MOS, very large-scale                   0.1                  Lowest    Intel 8048, MOSTEK 3870
                              integration (VLSI)
Microprocessor                MOS                                                                    Intel 8080, Zilog Z80, Motorola 6800
Microprocessor                MOS                      2-4                                           DEC LSI-11, Fairchild F-8
Microprocessor                MOS                      >4                                            Burroughs B80, National IMP 16
Bit-slice (microprogrammed)   TTL                      Few                                           DEC 11/34 Floating-Point Processor
Gate array                    TTL                      Most                                          Raytheon RP16, IBM Series 1
Medium-scale integration      TTL                      Few                                           DEC VAX-11/780, 11/70, HP 3000
Gate array                    ECL                      All                                           IBM 370/168, Amdahl 470/V6
Small-scale integration       ECL                      Std.           80                   Highest   CRAY 1

In view of this cyclical history, it seems likely
that current digital systems design practice,
which consists of building simple hardware interfaces to relatively poorly defined buses together with programming the application, will
be relatively short lived. The design method of
the future (fifth generation) will be at the PMS
level component, although at the moment there
are several factors that prevent this from being
done reliably and cheaply by large numbers of
engineers.
One factor which impedes this progress to the
fifth generation is the (fundamental) interconnect problem. Currently, many small-scale
integration components are required to handle
the mismatch between microprocessor chips
and memory and I/O subsystems. Furthermore, buses are hard to specify, as will be discussed in Chapter 11.
Another impediment is that system level behavior (the interaction of processors, memories,
and transducers via switches and links) is less
understood than is interaction at the register
transfer level.
Of substantial assistance in easing the transition to the fifth generation would be base level
operating systems that were embedded in hardware. These should be placed in read-only
memory to give a feeling of permanence so that
users would be less likely to embark on the expensive, unreliable rediscovery path.
In summary, standard components must be
built that can be interfaced to a wide range of
external systems, via clearly defined links, using
parameters that are specified by a field programming method (instead of using logic design
and building with interconnection on modules).
In this way, the complexity of individual integrated circuits can be increased; and with a
standard method for interconnection, higher
volume and lower costs will result.
Design Costs versus Unit Costs

Before discussing the alternatives associated
with integrated circuit design, it is important to
characterize the various costs. Figure 9 shows,
at a crude level, what the relative design costs
might be for various inter- and intra-integrated
circuit design methods. The design cost is highly
variable depending on the project size, its goals,
the manufacturing volumes expected, and most
important, the computer aided design programs
that are available.

Figure 9. Current design cost (or time) versus circuit
density using various design methods. [Plot not reproduced: horizontal axis CIRCUIT DENSITY; curves for intra-IC design methods (including gate array and hypothetical universal logic arrays), standard circuits logic design, ROM/PLA driven designs, microprogramming, and programming using microprocessors. Note on the hypothetical universal logic arrays: none exist to date.]

The lowest design cost is achieved by staying
completely away from modifying the integrated
circuits, except for programming read-only
memories. There are two elements to the cost of
read-only memories, programming cost and
parts cost. The programming cost has already
been discussed, so this discussion is limited to
parts cost. There are two kinds of read-only
memories, the programmable read-only memory (PROM) and the masked read-only memory (ROM). PROM chips have a higher initial
cost than ROMs, but they provide some inventory advantages in a manufacturing environment because a common stock of unprogrammed parts can be divided into various programmed parts rather than stocking a full supply of each required part. In many high volume
applications, however, the cost of the extra testing steps involved in the common stock approach, plus the extra piece part costs for
PROMs, make masked ROMs preferable.
The design costs discussed in the preceding
paragraphs are summarized in Figure 10, which
shows the costs for conventional programming,
costs for microprogramming, and the design
costs for methods which use combinational
techniques rather than programming techniques. These latter methods, employing read-only memories and programmable logic arrays,
will be discussed shortly. The most costly approach of all shown in Figure 10, excluding intra-IC design, is design using standard circuits
and associated design techniques.

Figure 10. Manufacturing costs versus LSI circuit
density for various design techniques. [Plot not reproduced: horizontal axis CIRCUIT DENSITY (SSI, MSI, LSI, VLSI); curves labeled custom design, standard cell, gate array (assume a family), standard circuits logic design, ROM/PLA design using combinational design, microprogramming standard parts design, and programming.]

Design of Integrated Circuits (Intra-IC Design)

Despite the prospects of higher design cost
with custom integrated circuits than with standard integrated circuits, and, in some cases,
higher manufacturing cost, there are numerous
reasons that a designer is often forced to design
integrated circuits. These are summarized in
Table 7.
There are some drawbacks to custom integrated circuit design. These are listed in Table
8.
The use of custom integrated circuits to reduce the number of discrete components or to
reduce the total number of integrated circuits in
a machine improves the reliability because the
reliability of a system is mostly a function of the
number of explicit physical connections, including the bonds to the semiconductor die. Thus,
the anticipated reliability of two equal functionality designs can be compared by counting discrete circuit pins, integrated circuit pins,
module pins, and connector pins.

Gate Array Design

The most straightforward and extensively
used intra-integrated circuit design method is to
modify an existing design. If this approach cannot be used, the next most straightforward
method is to use arrays of gates and interconnect them to form the desired function. Design with gate arrays occurs in a completely
defined environment because there is only one
circuit from which the gate is formed and the
gate can be completely characterized. The manufacture of gate arrays is fairly simple because
the fabrication technique of all but the last few
semiconductor processing steps is identical for
all designs. The customization, accomplished
by interconnection of the gates by metal, is carried out last. Interconnection is a well understood aspect of logic design and is used to form
the more complex macrostructures (various
flip-flop types, adders, etc.) and then to form
the higher levels of design by using arrays of
gate arrays. A disadvantage of gate arrays is
that gate array design methods do not permit
the high density possible with the more custom
methods because device placement is fixed.
It should be noted that gate array design is
not a new idea brought about by the need for a
simple method of customizing large-scale integrated circuits. Instead, it was one of the design philosophies advocated in the first few
generations. The concept then was to have a
single module containing a set of gates, and all
subsequent logic design would be done in terms
of that module. For example, flip-flops would
be constructed by interconnecting the gates. A
design predicated on a single module type immensely simplifies the spare stocking and servicing aspects, and it is possible to troubleshoot
a problem by simply replacing circuits according to a pattern. Designers did not find these
advantages important enough at that time,
however, so the gate array concept was set aside
until it was rediscovered by integrated circuit
designers.

Table 7. Reasons To Do Custom Integrated Circuit Design

1. A performance advantage can be gained.
2. Product life cycle costs can be lower if diagnosability and reliability features are added.
3. Diagnostic labor can be a high percentage of printed circuit board manufacturing cost. Diagnosis to the chip level
   can be sped up by features within the chip, and by a lower chip count, with a resultant lower manufacturing cost.
4. Data buses can be absorbed entirely within a chip to avoid bus interface costs. Even shortening a data bus from
   multi-board to single-board length may reduce cost and/or improve performance by reducing stored energy and its
   attendant drive/speed penalties.
5. Innovations concealed within a chip are difficult for competitors to study and duplicate.
6. Performance barriers may be breakable only through custom large-scale integration. In central processor design
   especially, and perhaps for certain memory interface applications, a custom integrated circuit approach may be the
   only practical way to get around conflicting issues of size, power, capacitance, etc.
7. In some engineering environments there are extremely small amounts of space or very little power.

Table 8. Reasons Not To Do Custom Integrated Circuit Design

1. For designs in the 100-500 equivalent gate complexity range, it may take up to a year to do the design with
   primitive design tools.
2. For designs in the 100-500 equivalent gate complexity range, it may take up to $100,000 to do the design.
3. Unless substantial product volumes are obtained, the chip cost will be high relative to off-the-shelf chips.
4. A decision will have to be made whether to have the design done by an outside vendor or within the company. This
   can be a very complicated and expensive decision.
5. The logic design and logic partitioning for large-scale integrated circuit design is different from that of conventional
   logic design, and designers used to dealing with conventional design will have to assimilate new knowledge to
   design large-scale integrated circuits themselves or even to talk with integrated circuit designers.

A representative gate array is a Raytheon
RA-116. It has 300 TTL Schottky gates, of two
cluster configurations, each repeated twelve
times within the 160 mil X 160 mil chip:

Type 1
• 3 external driver gates (4-input NAND)
• 5 internal driver gates (3-input NAND)
• 5 internal expansion gates (3-input NAND)
Type 2
• 2 external driver gates (4-input NAND)
• 5 internal driver gates (3-input NAND)
• 5 internal expansion gates (3-input NAND)

Within each cluster, the expansion gates may
be combined with the driver gates to form 7 or 8
input NAND gates and AND-OR-INVERT
circuits with up to six product terms. The gates
have a typical propagation delay of 5-6 nanoseconds and dissipate 5.5-6 milliwatts per driver
and 1 milliwatt per OR expander. Two metal
layers are used for interconnect, and the resulting circuitry can be connected to the outside
world by means of 56 external pins, including
power and ground.
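
The gate count and dissipation figures for the RA-116 described above can be cross-checked with a short calculation. This is a sketch under stated assumptions: the data structure and function names are hypothetical, external and internal drivers are assumed to dissipate the same per-driver power, each expansion gate is treated as one "OR expander," and the power total is an upper bound that supposes every gate on the chip is in use.

```python
# Gates per cluster, as listed in the text; each cluster type is repeated 12 times.
CLUSTERS = {
    "type 1": {"external_drivers": 3, "internal_drivers": 5, "expansion": 5},
    "type 2": {"external_drivers": 2, "internal_drivers": 5, "expansion": 5},
}
REPEATS = 12


def gate_totals():
    """Return (driver gates, expansion gates) for the whole chip."""
    drivers = sum(REPEATS * (c["external_drivers"] + c["internal_drivers"])
                  for c in CLUSTERS.values())
    expanders = sum(REPEATS * c["expansion"] for c in CLUSTERS.values())
    return drivers, expanders


def full_use_power_mw(driver_mw, expander_mw=1.0):
    """Upper-bound dissipation in milliwatts if every gate were used (assumption)."""
    drivers, expanders = gate_totals()
    return drivers * driver_mw + expanders * expander_mw


if __name__ == "__main__":
    drivers, expanders = gate_totals()
    print(drivers, expanders, drivers + expanders)  # 180 120 300
    print(full_use_power_mw(5.5), full_use_power_mw(6.0))  # 1110.0 1200.0
```

The cluster counts reproduce the 300 gates stated in the text. Under the assumptions above, a fully used array would dissipate roughly 1.1 to 1.2 watts.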
Because the use of integrated circuit gate arrays is recent, data on package count reduction
is scarce, but one informal study for the Raytheon RP-16 aerospace computer measured a
nine to one replacement ratio and an overall improvement by a factor of 2 over a system constructed with standard components [Parke,
1978].
A 920-gate MOS array of 3-input NOR gates
has been reported by Nakano et al. [1978]. Its
3-nanosecond gate delay illustrates the performance potential as the metal oxide semiconductor process continues to progress toward
smaller, faster gates. For truly high speed applications, an ECL gate array can be used. These
devices, with subnanosecond speeds, exploit the
inherent properties of current mode logic to obtain a particularly flexible element [Gaskill et
al., 1976].
Standard Cell Design

An alternative to gate array design is standard cell design. Standard cell design is identical
to the logical design of the first few generations
because there is a previously designed, well
characterized set of primitive components
(AND gates, flip-flops) in which the design is
carried out. The advantage of the standard cell
design methods is that special functions can be
mixed on the chip in greater variety. There may
also be a density advantage over gate arrays.
However, in some schemes each cell occupies a
different space and has a fixed shape. Careful
planning of the cell arrangements is necessary
to minimize loss of space. Hence, the improvement in packing density is not as substantial as
direct comparisons between standard cell technology and gate array technology might at first
indicate. In addition, if there are a large number
of circuit types, their interconnection rules may
not be characterized well enough to achieve a
quick, cheap design that works the first time.
Custom Design

Custom design is in some ways a variant of
the standard cell because designers typically
have a set of favorite circuits which they interconnect to create designs for specified applications. With custom design, the designer can
(theoretically) specify a circuit for each use
within a particular logic design. For example,
upon observing that a particular gate or flip-flop only drives a certain load, the designer can
modify that gate or flip-flop to provide only the
appropriate driving capability. Therefore, with
custom design, the whole integrated circuit can
theoretically be an optimum size, since each
part is no larger than it need be. The advantages
are clearly size, cost, and speed. The design
costs are high because each part can, in principle, be customized. The quality of the circuit
design is totally dependent on the designer, who
must analyze each circuit geometry in terms of
his expectation of performance, operating margins, etc. To the extent that this analysis is carried out, the circuit is clearly optimal.
Universal Logic Arrays, PROMs, and ROMs

Also shown in Figure 9 is a hypothetical line
for universal logic arrays. For at least 15 years,
academicians have studied the possibility of designing a single array of logical design elements,
or a collection of such arrays, that could be interconnected on a custom basis to carry out a
given function. The gate array can be looked at
as the simplest example of this type of design.
While many are skeptical that such a device exists, a line representing it is placed on the graph
as a target for those who search for the one
truly universal logic array.
Both programmable read-only memories and
masked read-only memories are commonly
used, but trivial, forms of the truly universal arrays, because they can be used in a table lookup
fashion to create several functions of a number
of input variables. For example, a 1,024-bit
read-only memory arranged in a 256 X 4-bit
fashion can generate 4 independent functions of
8 variables. This is a distinct alternative to using a conventional gate structure to carry out
combinational functions. A disadvantage of
this method is that the required read-only memory size doubles for each additional input variable.
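The table lookup idea can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular device: the four functions chosen (parity, majority, AND, OR) and the names ROM, rom_read, and rom_bits are assumptions made for the example.

```python
# Hypothetical model of a 256 x 4-bit read-only memory used as a lookup table.
# The four functions below are illustrative choices, not taken from the text.

def parity(x):
    """1 if an odd number of the 8 input bits are 1."""
    return bin(x).count("1") & 1

def majority(x):
    """1 if more than four of the 8 input bits are 1."""
    return int(bin(x).count("1") > 4)

def all_ones(x):
    """8-input AND."""
    return int(x == 0xFF)

def any_one(x):
    """8-input OR."""
    return int(x != 0)

FUNCTIONS = [parity, majority, all_ones, any_one]

# "Program" the ROM: one 4-bit word for each of the 256 input combinations.
ROM = [sum(f(addr) << bit for bit, f in enumerate(FUNCTIONS))
       for addr in range(256)]

def rom_read(addr):
    """Return the four output bits stored at an 8-bit address."""
    word = ROM[addr]
    return [(word >> bit) & 1 for bit in range(4)]

def rom_bits(n_inputs, n_outputs):
    """Storage needed to tabulate n_outputs functions of n_inputs variables."""
    return (2 ** n_inputs) * n_outputs

print(rom_read(0b00000001))          # outputs for a single input bit set
print(rom_bits(8, 4), rom_bits(9, 4))  # size doubles with one more input
```

The rom_bits function makes the disadvantage concrete: every added input variable doubles the number of words, whether or not the functions actually depend on that variable in a complicated way.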
Programmable Logic Arrays

The programmable logic array (PLA) is a
combinational circuit which remedies the disadvantages of the read-only memory implementation of combinational functions by allowing
the use of product terms rather than completely
decoding the input variables. Figure 11 shows a
typical circuit, which consists of separate AND
and OR arrays. Inputs are connected to the
AND array, and outputs are drawn from the
OR array. Each row in the programmable logic
array can implement an AND function of selected inputs or their complements, thus forming a Boolean product term, and the OR array
can combine the product terms to implement
any Boolean function.
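The AND array and OR array structure can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the data layout (a dictionary per product term, a list of row numbers per output) and the full adder used to program it are illustrative and are not taken from Figure 11.

```python
# Hypothetical PLA model: each product term lists the inputs it requires
# to be 1 or 0; inputs it does not mention are "don't care".

def product_term(inputs, term):
    """AND of selected inputs or their complements.

    term maps an input index to the bit value that input must have.
    """
    return all(inputs[i] == bit for i, bit in term.items())

def pla(inputs, and_array, or_array):
    """Evaluate every AND-array row, then OR the selected rows per output."""
    rows = [product_term(inputs, term) for term in and_array]
    return [int(any(rows[r] for r in selected)) for selected in or_array]

# Illustrative programming: a 1-bit full adder with inputs (a, b, carry_in).
AND_ARRAY = [
    {0: 1, 1: 1},          # row 0: a AND b
    {0: 1, 2: 1},          # row 1: a AND cin
    {1: 1, 2: 1},          # row 2: b AND cin
    {0: 1, 1: 0, 2: 0},    # row 3: a AND (NOT b) AND (NOT cin)
    {0: 0, 1: 1, 2: 0},    # row 4: (NOT a) AND b AND (NOT cin)
    {0: 0, 1: 0, 2: 1},    # row 5: (NOT a) AND (NOT b) AND cin
    {0: 1, 1: 1, 2: 1},    # row 6: a AND b AND cin
]
OR_ARRAY = [
    [3, 4, 5, 6],          # output 0: sum
    [0, 1, 2],             # output 1: carry out
]

print(pla([1, 0, 1], AND_ARRAY, OR_ARRAY))  # sum and carry for 1 + 0 + 1
```

Seven product terms suffice here, whereas a read-only memory would store a word for every one of the eight input combinations; the gap widens quickly as the number of inputs grows.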
A simple application is operation-code decoding. For the PDP-11, the 16-bit Instruction
Register could be directly connected to a programmable logic array and the output thereof
used to specify the address of the microprogram
that executes that instruction. Three different
types of operation-code decoding are customarily applied to PDP-11 instructions: source
mode decoding, destination mode decoding,
and instruction decoding. With a programmable logic array implementation, a PLA could
be used for each of these decoding operations,
and only three chips would be required. A read-only memory implementation, on the other
hand, would require 128 K X 8 bits for address
mode decoding and 64 K X 8 bits for instruction decoding. Using 2 K X 8-bit read-only
memories, 33 chips would be required. For this
reason, modern minicomputers, such as the
PDP-11/34, use programmable logic arrays
rather than read-only memories or combinational logic for instruction decoding. The
technique is also extended downward into microcomputers such as the LSI-11, where programmable logic arrays are used to conserve the
die area used by the microcomputer control
units.
The programmable logic array becomes an
even more useful building block when it is made
field programmable - the FPLA. The programmable connectors shown in Figure 11 are fusible nichrome links that are burned out when
the unit is programmed.
When a register is added to the outputs of the
programmable logic array and incorporated in
the same integrated circuit, a simple sequential
machine is obtained in one package. Since register circuit packages are pin intensive, adding
registers to programmable logic arrays (or to
read-only memories) permits about a factor of 2
package count reduction in typical applications.
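How a register turns combinational logic into a sequential machine can be sketched as below. The example is hypothetical: a Python table stands in for the programmed PLA, and the machine shown (a detector for the serial input sequence 1, 1, 0) is chosen only for illustration.

```python
# Hypothetical sketch: combinational logic (standing in for the PLA) plus a
# state register forms a sequential machine. Example: detect the input
# sequence 1, 1, 0 on a serial line.

def next_state_logic(state, bit):
    """Combinational function: (present state, input) -> (next state, output)."""
    table = {
        (0, 0): (0, 0), (0, 1): (1, 0),   # state 0: nothing matched yet
        (1, 0): (0, 0), (1, 1): (2, 0),   # state 1: seen "1"
        (2, 0): (0, 1), (2, 1): (2, 0),   # state 2: seen "11"; a 0 completes "110"
    }
    return table[(state, bit)]

def run(bits):
    """Clock the machine once per input bit and collect the outputs."""
    state = 0                      # contents of the state register
    outputs = []
    for bit in bits:               # one loop pass models one clock period
        state, out = next_state_logic(state, bit)
        outputs.append(out)
    return outputs

print(run([1, 1, 0, 1, 1, 1, 0]))
```

The variable state plays the role of the register fed back to the inputs; everything else is purely combinational.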
The first programmable logic arrays had
propagation times of the order of 150 nanoseconds and were thus suitable building blocks
for slow, low-cost computers. Propagation
times of 45 nanoseconds are quite common today, and the programmable logic array is now
more widely used. An attractive application
with these higher speed components is the replacement of the small-scale integration and
medium-scale integration packages used to implement the control logic for Unibus arbitration
in PDP-11 computers.

Figure 11. Signetics field programmable logic array
(FPLA) (courtesy of Signetics Corporation, from Signetics
Field Programmable Logic Arrays - An Applications
Manual, February 1977; copyright © 1977 by Signetics
Corporation).

A more complex application than instruction
decoding has been documented [Logue et al.,
1975]. An IBM 7441 Buffered Terminal Control Unit was implemented using programmable logic arrays and compared with a version
implemented with small- and medium-scale integration. The programmable logic array design
included two sets of registers fed by the OR array (PLA outputs): one set fed back to the
AND array (PLA inputs); the other set held the
PLA outputs. A factor of 2 reduction in printed
circuit board count was obtained with the programmable logic array version. The seven programmable logic arrays used in the design
replaced 85 percent of the circuits in the small- and medium-scale integration version. Of these
circuits, 48 percent were combinational logic
and 52 percent were sequential logic.

Figure 12. Family tree of memory technology (courtesy
of Memorex Corporation and S.H. Puthuff, 1977).

MEMORY TECHNOLOGY

The previous section discussed the use of
memory for microprogramming and table
lookup in logic design, but that is not the principal use of memory in the computer industry.
The more typical use of memory components is
to form a hierarchy of storage levels which hold
information on a short-term basis while a program runs and on a longer term basis as permanent files. Figure 12 shows the various
technologies employed in these memory applications. Although the principal focus of this
section is on core and semiconductor memories,
slower speed electromechanical memories
(drums, disks, and tapes) are considered superficially, as their performance and price improvements have pushed the computer
evolution. Because the typical uses for memory
usually require read and write capabilities,
write-once or read-only memory such as video
disks is excluded from the discussion.

Measurement Parameters

Because memory is the simplest of components, it should be possible to discuss memory using a minimal number of measurement
parameters. One of the most important parameters is the state of development of the memory
technology at the time the other parameters are
measured, relative to the likely life span of that
technology. Unfortunately, this is one of the
most difficult parameters to quantify, although
its effects are readily observable, principally in
the rate of change of the other parameters associated with that technology. Thus, in new technologies many of the parameters vary rapidly
with time. This is particularly true of semiconductor memory price, which has declined at
a compound rate of 28 percent per year (which
amounts to about 50 percent in two years). The
price is expressed only as price/bit, but it is important to know the price (or size) of the total
memory system for which that price applies. To
get the lowest price per bit, a user may be forced
to a large system because of economy of scale.
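The relation between a yearly rate and a multi-year decline is simple compounding, which a short calculation can confirm. The function names below are illustrative; only the 28 percent figure comes from the text.

```python
import math

def remaining_fraction(annual_decline, years):
    """Fraction of the original price left after compounding the decline."""
    return (1.0 - annual_decline) ** years

def total_decline(annual_decline, years):
    """Overall fractional price decline over the given number of years."""
    return 1.0 - remaining_fraction(annual_decline, years)

def halving_time(annual_decline):
    """Years needed for the price to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - annual_decline)

print(total_decline(0.28, 2))   # close to one half
print(halving_time(0.28))       # a little over two years
```

At 28 percent per year the two-year decline comes out slightly under 50 percent, which is why the text says "about."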
Performance for cyclical memories, both the
electromechanical types such as disks and the
electronic types such as bubbles, is expressed in
two parameters: the time to access the start of a
block of memory and the number of bits that
can be accessed per second after the transfer begins. Other parameters, such as power consumption, temperature sensitivity, space
consumption, and weight, affect the utility of
memories in various applications. In addition,
reliability measures are needed to see how much
redundancy must be placed in the memory system to operate at a given level of availability
and data integrity.
In summary, the relevant parameters for a
given memory are:
1. State of development of the technology
   at the time the measurements are taken
   relative to the likely life span of the technology.
2. Price per bit.
3. Total memory size or total memory price.
4. Performance.
   a. Access time to the first word of the block.
   b. Time to transfer each word (data rate) in the block.
5. Operational power, temperature, space, weight.
6. Volatility.
7. Reliability and repairability.
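The two performance parameters combine in a straightforward way for block transfers. The sketch below assumes the simplest possible model (a fixed access time followed by a constant per-word transfer time); the numbers used are hypothetical and do not describe any particular device.

```python
def block_time(access_time_s, words, seconds_per_word):
    """Time to fetch a block: reach its first word, then stream the rest."""
    return access_time_s + words * seconds_per_word

def effective_rate(access_time_s, words, seconds_per_word):
    """Words per second actually delivered, including the access delay."""
    return words / block_time(access_time_s, words, seconds_per_word)

# Hypothetical device: 10 ms access time, 10 microseconds per word.
for block_words in (16, 256, 4096):
    print(block_words, effective_rate(0.010, block_words, 10e-6))
```

Under these assumptions, small blocks are dominated by the access time and large blocks by the data rate, which is why both parameters are listed.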

As indicated by the rapidity of the parameter
changes, a good example of a technology that is
young relative to its expected total lifetime is
semiconductor memory. Figure 7 gives past
prices and expected future prices of semiconductor memory. As mentioned above, these
memories have declined in price every two years
by 50 percent, and that rate of decline is expected to continue well into the 1980s because
of continued increases in semiconductor densities. Figure 13, a graph by Dean Toombs of
Texas Instruments, shows memory size versus
performance with time for random-access memories, and cyclically accessed charge-coupled
devices (CCDs) and magnetic bubbles.
Core and Semiconductor Memory
Technology for Primary Memory

The core memory was developed early in the
first generation for Whirlwind (1953) and remained the dominant primary memory component for computers until it began to be
superseded by semiconductor technology. The
advent of the 1-Kbit memory chip in 1972
started the demise of core as the dominant
primary memory medium, and the crossover
point occurred for most memory designs with
the availability of the 4-Kbit semiconductor
chip in 1974.
Over the period since the early 1960s, the
price of core memory declined roughly at a rate
of 19 percent per year. This decline can be seen
in the DEC 12-bit machine memory prices, the
DEC 18-bit machine memory prices, and in the
IBM 360/370 memory prices (since 1964). The
price of PDP-10 memory has declined at 30 percent per year, although it is unclear why. A possible reason is that the modular memory
structure had a high overhead cost; with subsequent implementations, the memory module
size was increased, thereby giving an effective
decrease in overhead electronics and packaging
costs and a greater decrease in the cost per bit.

Figure 13. Memory size versus access time for various
memories and yearly availability (courtesy of Dean
Toombs, Texas Instruments, Inc.).

The cost of various memories was projected
by several technology marketing groups in the
period 1972-1974. Each study attempted to
analyze and determine the core/semiconductor
memory crossover point. Three such studies are
plotted in Figure 14 along with Turn's [1974]
memory price data and Noyce's [1977a] semiconductor memory cost (less overhead electronics) projection. Most crossover points were
projected to be in 1974, whereas one study
showed a 1977 crossover. Even though all studies were done at about the same time, the variation in the studies shows the problem of getting
consistent data from technology forecasts.
While these graphs of core and semiconductor prices and performance permit an
understanding of trends in the principal use
areas for these devices, additional information
is needed for disk and tape memory in order to
complete the collection of memory technologies
that can be used to form a single memory hierarchy.

Figure 14. Cost per bit of core memory estimated by
various market surveys and future predictions.

Disk Memories

Disk memories are a significant part of most
systems costs in the middle-range minicomputer
systems; in larger systems, they dominate the
costs.
Although access time is determined by the
rotational delays and the moving head arm
speed, the single performance metric that is
most often used is simply memory capacity and
the resultant cost/bit. In the subsequent section
on memory hierarchies, it will be argued that
performance parameters are less important
than cost because more higher speed memory
can be traded off to gain the same system level
performance at a lower cost.
Memory capacity is measured in disk surface
areal density (i.e., the number of bits per square inch),
which is the product of the number of bits recorded per inch along a track and the number of tracks
per inch across the disk. Figure 15 shows the progress in
areal recording densities using digital recording
methods. Figure 16 shows the price of the state-of-the-art large, multiple platter, moving head
disks. Note that the price decline is a factor of
10 in 9 years, for a price decline of 22 percent
per year.
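Going from an overall factor to a yearly rate is the inverse of compounding. The small helper below (its name is an assumption made for this example) recovers the yearly figure from the factor of 10 over 9 years quoted above.

```python
def annual_decline(factor, years):
    """Yearly fractional decline equivalent to falling by `factor` in `years`."""
    return 1.0 - factor ** (-1.0 / years)

rate = annual_decline(10, 9)
print(round(100 * rate, 1))   # percent per year, to compare with the text
```

The result depends only on the ratio of the end-point prices and the number of years, so it says nothing about how evenly the decline was spread over the period.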
Figure 17 shows the performance plotted
against the price per bit for the technology in
1975 and 1980.

Figure 15. Areal density of various digital magnetic
recording media (courtesy of Memorex Corporation,
1978).

Figure 16. Price per bit of large, moving head disks and
semiconductor memories (courtesy of Memorex
Corporation, 1977).

Figure 17. Memory trends, 1975-1980 (courtesy of
Memorex Corporation, 1978).

Magnetic Tape Units

Figure 18 shows the relevant performance
characteristics of magnetic tape units. The data
is for several IBM tape drives between 1952 and
1973. It shows that the first tape units started
out at 75 inches per second and achieved a
speed of 200 inches per second by 1973. Although this amounts to only a 5 percent improvement per year in speed over a 21-year
period, this is a rather impressive gain considering the physical mass movement problems involved. It is akin to a factor of 3 improvement
in automobile speed.
The bit density (in bits per linear inch) has
improved from 100 to 6,250 in the same period,
for a factor of 62.5, or 23 percent per year. With
the speed and density improvements, the tape
data rate has improved by a factor of 167, or 29
percent per year.
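The data rate figure follows from multiplying tape speed by linear density. The sketch below uses only the end-point values quoted in the text; the function names are illustrative, and the yearly rate it prints depends on how many years of compounding are assumed between the first and last units.

```python
def data_rate(speed_inches_per_s, density_per_inch):
    """Characters (or bits) passing the head per second."""
    return speed_inches_per_s * density_per_inch

def annual_growth(factor, years):
    """Yearly fractional improvement equivalent to `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0

first = data_rate(75, 100)       # earliest unit quoted in the text
last = data_rate(200, 6250)      # 1973 unit quoted in the text
improvement = last / first

print(round(improvement))                         # overall factor
print(round(100 * annual_growth(improvement, 21)))  # percent per year over 21 years
```

The overall factor is simply the product of the speed factor (200/75) and the density factor (62.5), which is why the data rate improved much faster than either parameter alone.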
Tape unit prices (Figure 19) are based on the
various design styles. Slow tape units (minitapes) are built for lowest cost. The most cost
effective seem to be around 75 inches per second (the initial design), if one considers only the
tape. High performance units, though disproportionately expensive, provide the best system cost effectiveness.

Figure 18. Characteristics of various IBM magnetic
tape units versus time.

Figure 19. Relative cost versus transfer rate for various
tape drives and controllers (1978).

Memory Hierarchies

A memory hierarchy, according to Strecker
[1978:72], "is a memory system built of a number of different memory technologies: relatively
small amounts of fast, expensive technologies
and relatively large amounts of slow, inexpensive technologies. Most programs possess
the property of locality: the tendency to access a
small, slowly varying subset of the memory locations they can potentially access. By exploiting locality, a properly designed memory
hierarchy results in most processor references
being satisfied by the faster levels of the hierarchy and most memory locations residing in
the inexpensive levels. Thus, in the limit a memory hierarchy approaches the performance of
the fastest technology and the per bit cost of the
least expensive technology."
The key to achieving maximum performance
per dollar from a memory hierarchy is to develop algorithms for moving information back
and forth between the various types of storage
in a fashion that exploits locality as much as
possible. Two examples of hierarchies which depend on program locality for their effectiveness
are the one level store (demand paging), first
seen on the Atlas computer [Kilburn et al.,
1962], and the cache, described by Wilkes
[1965] and first seen on the IBM 360/85 [Liptay, 1968]. Because both of these are automatically managed (exploiting locality), they are
transparent to the programmer. This is in contrast to the case where a programmer uses secondary memory for file storage: in that case, he
explicitly references the medium, and its use is
no longer transparent.
Table 9 lists, in order of memory speed, the
memories used in current-day hierarchies.

Table 9. Computer System Memory Component and Technology

Part                         Transparency         Characteristics on Which
                             (to Machine          Its Use Is Based
                             Language Programs)

Microprogram memory          Yes                  Very fast
Processor state              No                   Very small, very fast register set
                                                  (e.g., 16 words)
Alternative processor        Yes                  Same (to speed up processor
  state context                                   context swaps)
Cache memory                 Yes                  Fast. Used in larger machines
                                                  for speed.
Program mapping and          Yes                  Small associative store
  segmentation
Primary (program) memory     No                   Relatively fast and large depending
                                                  on processor speed
Paging memory                Yes                  Can be electromechanical, e.g., drum,
                                                  fixed head disk, or moving head disk.
                                                  Can be CCD or bubbles.
Local file memory            No                   Usually moving head disk, relatively
                                                  slow, low cost.
Archival files memory        Yes (preferably)     Very slow, very cheap to permit
                                                  information to be kept forever.

There is a continuum based on need together
with memory technology size, cost, and performance parameters.
The following sections discuss the individual
elements of the hierarchy shown in Table 9.
Microprogram Memories. Nearly every
part of the hierarchy can be observed in the
computers in this book. Part III describes PDP-11 implementations that use microprogramming. These microprogram memories are transparent to the user, except in machines such as
the PDP-11/60 and LSI-11 which provide user
microprogramming via a writable control store.
Mudge (Chapter 13) describes the writable control storage user aspects associated with the
11/60 and the user microprogramming.
In retrospect, DEC might have built on the
experience gained from the small read-only
memory used for the PDP-9 (1967) and exploited the idea earlier. In particular, a read-only memory implementation might have produced a lower cost PDP-11/20 and might have
been used to implement lower cost PDP-10s
earlier.
In principle, it is possible to have a cache to
hold microprograms; hence, there could be another level to the hierarchy. At the moment, this
would probably be used only in high cost/high
performance machines because of the overhead
cost of the loading mechanism and the cache
control. However, like so many other technical
advances, it will probably migrate down to
lower cost machines.
Processor State Registers. To the machine
language program, the number of registers in
the processor state is a very visible part of the
architecture. This number is solely dictated by
the availability of fast access, low cost registers.
It is also occasionally the means of classifying
architectures (e.g., single accumulator based,
general register based, and stack based).
In 1964, even though registers were not available in single integrated circuit packages, the
PDP-6 adopted the general register structure
because the cost of registers was only a small
part of the system cost. In Chapter 21 on the
PDP-10, there is a discussion of whether an architecture should be implemented with general
registers in an explicit (non-transparent) fashion, or whether the stack architecture should be
used. Although a stack architecture does not
provide registers for the programmer to manage, most implementations incur the cost of registers for the top few elements of the stack. The
change in register use from accumulator based
design to general register based design and the
associated increase in the number of registers
from 1 to 8 or 16 can be observed in comparisons of the 12-bit and 18-bit designs with
the later PDP-10 and PDP-11 designs.
Alternative Processor State Context
Registers. As the technology improved, the
number of registers increased, and the processor state storage was increased to provide multiple sets of registers to improve process context
switching time.
Cache Memory. In the late 1960s, the cache
memory was introduced for large scale computers. This concept was then applied to the latest PDP-10 processor (KL10). It was applied to
the PDP-11/70 in 1975 when the relatively large
(1 Kbit), relatively fast (factor of 5 faster than
previously available) memory chip was introduced. The cache is described and discussed
extensively in Chapter 10. It derives much
power from the fact that it is an automatic mechanism and is transparent to the user. It is the best
example of the use of the principle of memory
locality. For example, a well designed cache of 4
Kbytes can hold enough local computational
memory so that, independent of program size,
90 percent of the accesses to memory are via the
cache.
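The effect of a 90 percent hit ratio on average access time can be estimated with the usual weighted-average model. This is a rough sketch: the cache and primary memory times below are hypothetical values chosen for illustration, and the model ignores details such as write traffic and the time spent detecting a miss.

```python
def effective_access_time(hit_ratio, cache_time, main_time):
    """Average access time when a fraction hit_ratio of references hit the cache."""
    return hit_ratio * cache_time + (1.0 - hit_ratio) * main_time

# Hypothetical timings in nanoseconds (not taken from the text).
CACHE_NS = 100.0
MAIN_NS = 1000.0

for h in (0.0, 0.5, 0.9, 0.99):
    print(h, effective_access_time(h, CACHE_NS, MAIN_NS))
```

With these assumed timings, a 90 percent hit ratio brings the average much closer to the cache speed than to the primary memory speed, which is the behavior the memory hierarchy argument relies on.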
Program Mapping and Segmentation. A
similar memory circuit is required to manage
(map) multiprogrammed systems by providing
relocation and protection among various user
programs. The requirements are similar to those of the
cache and may be incorporated in the caching
structure. The PDP-10 models with the KI10
processor use an associative memory for this
mapping function, and the VAX-11/780 uses a
64-entry, 2-way associative memory.
Paging Memory. The Atlas computer [Kilburn, et al., 1962] was designed to have a single,
one level, large memory. This structure ultimately evolved so that multiple users could
each have a large virtual address and virtual
machine. The paging mechanism works because
of the locality exhibited by program references.
Denning pointed out the clustering of pages for
a given program at a given time and introduced
the notion of the working set [1968]. For most
programs, the number of pages accessed locally
is small compared with the total program size.
Initially, a magnetic drum was used to implement the paging memory; but as disk technology began to dominate the drum, both fixed
head and moving head disks (backed up with
larger primary memories) were used as the paging memories. Denning's tutorial article [1970]
is an excellent discussion of this section of the
memory hierarchy. In the next few years, the
relatively faster and cheaper charge coupled device semiconductor memories and bubble memories are clearly the candidates for paging
memories. Hodges [1975] compares the candidates for paging memory in terms of reliability,
power, cost per bit, and packaging.
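The working set notion can be illustrated with a short computation over a page reference trace. This is a simplified sketch: the window is measured in references rather than in time, and the trace is invented for the example.

```python
def working_set(trace, t, window):
    """Distinct pages referenced in the last `window` references ending at index t."""
    start = max(0, t - window + 1)
    return set(trace[start:t + 1])

# Hypothetical page reference trace: the program dwells on a few pages,
# then moves to a different small group.
trace = [1, 2, 1, 3, 1, 2, 7, 7, 7, 8]

print(working_set(trace, 5, 4))   # pages in use early in the trace
print(working_set(trace, 9, 3))   # pages in use after the shift
print(len(set(trace)))            # total distinct pages in the program
```

In this invented trace the working set at any moment is smaller than the total number of distinct pages, which is the property that lets a paged system keep only part of a program in primary memory.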
Local File Memory and Archival File
Memory. For local file memory in medium-sized to large-scale systems there is no alternative to disks. Archival files, however, are usually kept on magnetic tapes, which permit files
to be stored cheaply on an indefinite basis.
There are usually fewer memory technologies
used in smaller systems than in larger systems
because the smaller systems cannot afford the
overhead costs (disk drives, tape drives, etc.) associated with the various technologies. At most,
two levels of storage would probably exist as
separate entities in smaller systems.

Alternatively, one might expect a combination of floppy disk, low cost tape, and magnetic bubbles to be used to reduce the primary
memory size and to provide file and archival
memory. Currently, the floppy disk operates as
a single level memory. Here there are two alternatives for technology tradeoff using parts in
the hierarchy: a tape or floppy disk can be used
to provide removability and archivability,
whereas bubbles or charge-coupled devices can
be used to provide performance. The Strecker
paper [1978] quoted at the beginning of this section on memory hierarchies elaborates on these
concepts.
MEASURING (AND CREATING)
TECHNOLOGY PROGRESS

The previous sections have presented technology in terms of exponentially decreasing
prices and/or exponentially increasing performance. This section presents a basis for this constant change rate. The progress of a particular
technology as a function of time, T(t), has been
classically observed to be:

$$T(t) = K \times e^{ct}$$

where K = the base technology at the beginning
of the time frame, and c = a learning constant.
This can be converted to a yearly improvement
rate, r, by changing the base of the exponential
to:

$$T(t) = T \times r^{t - t_0}$$

where T = the base technology at t_0, and r =
yearly increase (or decrease) in the technology
metric.
This is the same form used for declining (or increasing) cost from base c:

$$C = c \times r^{t - t_0}$$
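Changing the base of the exponential amounts to setting r equal to e raised to the power c. The sketch below checks this numerically; the function names are illustrative, and the 0.72 yearly factor corresponds to the 28 percent yearly price decline quoted earlier for semiconductor memory.

```python
import math

def yearly_rate(c):
    """Yearly factor r equivalent to the learning constant c."""
    return math.exp(c)

def learning_constant(r):
    """Learning constant c equivalent to the yearly factor r."""
    return math.log(r)

def technology_exp(K, c, t):
    """Progress in the exponential form, with t measured from the base year."""
    return K * math.exp(c * t)

def technology_rate(T0, r, t, t0=0.0):
    """Progress in the yearly-rate form."""
    return T0 * r ** (t - t0)

c = learning_constant(0.72)          # a price that falls 28 percent per year
print(c)                             # negative, since the metric is declining
print(technology_exp(1.0, c, 5), technology_rate(1.0, 0.72, 5))
```

Both forms give the same value at every time; the yearly-rate form is simply easier to read off as "percent per year."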

Clearly there are manufactured goods that
neither improve nor decrease in price exponentially, although many presumably could with
the proper design and manufacturing tooling
investments. The notion of price decline is completely tied to the cumulative learning curves of:
(1) people building a product for a long time,
(2) process improvement based on learning to
build it better, and (3) design improvement by
engineers learning from the history of design.
Production learning per se is inadequate to
drive cost and prices down because, after an extremely long time in production, more units
contribute little to learning. With inflation in labor costs, the costs actually rise when the learning is flat. In order to provide a base for
predicting the inflationary effect, the consumer
price index has been plotted in Figure 20.
Learning curves do not appear to be understood beyond intuition. They are (empirical)
observations that the amount of human energy,
E_n, required to produce the nth item is:

$$E_n = K \times n^{d}$$

where K and d are learning constants. Thus, by
producing more items, the repetitive nature of a
task causes learning, and the time (and perhaps
cost) to produce an item decreases with the
number produced and not with the calendar
time in which an object is produced.
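A learning curve of this form can be explored numerically. The sketch below assumes a negative exponent d so that effort falls as n grows, and it uses the common convention of describing a curve by the fraction of effort remaining after each doubling of output; the 80 percent figure is a hypothetical value, not one given in the text.

```python
import math

def effort(n, K, d):
    """Effort to produce the nth item under the power-law learning curve."""
    return K * n ** d

def exponent_for_doubling(fraction):
    """Exponent d for which each doubling of n multiplies effort by `fraction`."""
    return math.log(fraction, 2)

d = exponent_for_doubling(0.8)     # hypothetical "80 percent" curve
K = 100.0                          # effort for the first item, arbitrary units

for n in (1, 2, 4, 8, 1000, 2000):
    print(n, round(effort(n, K, d), 2))
```

Going from item 1,000 to item 2,000 saves the same fraction as going from item 1 to item 2, so long production runs yield ever smaller absolute gains per unit, consistent with the remark that more units eventually contribute little to learning.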
In his study of technology progress, Fusfeld
[1973] took six items, chose a measure of progress in the production thereof, and plotted that
measure against cumulative units produced. In
each case, he found a relationship of the form:

$$T_i = a \times i^{b}$$

where i is the number of units produced and T_i
is the value of his selected technology progress
measure at the ith unit - the same as the learning curves would predict.
The graph for turbojet engines, where he used
fuel consumed per pound as the technology
measure, is reproduced in Figure 21. The results
for all six items studied are shown in Table 10.
Where two values are given for the technology progress constant, a second rate of progress was observed after a significant shift in the
industry occurred. For example, such a shift occurred in the automobile industry in the late
1920s when the acceptance of the automobile,
the development of a new tire, and the expansion of the public road network operated concurrently to change the nature of the industry.

Figure 20. Consumer Price Index using
1967 as base.

Figure 21. Technology progress functions for
turbojet engines [Fusfeld, 1973].

Examination of the table will reveal substantial variations in the technology progress
constant from item to item. This is probably because most of the technologies represented
above are mechanically oriented with associated physical limits. Computer technology is
electronically oriented and has not yet reached
its limits. In essence, the table is comparing systems constrained by Newton's Laws with those
constrained by Maxwell's Equations.
Using the two formulas,

$$T(t) = K \times e^{ct}$$

and

$$T_i = a \times i^{b}$$

Fusfeld [1973] related the unit learning curve
concept to the more conventional, time-based view
of technology progress when the number of
units produced increases exponentially with
time; that is, the relations expressed in the first two
formulas are equivalent when the condition expressed by the following formula holds:

$$i = e^{(c/b) t}$$

This previous formula indicates that the production rate is a constant fraction of the total
production to date - i.e., production occurs
with exponential growth.
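The equivalence can be checked by direct substitution. The short derivation below uses only the symbols already defined (a, b, c, K, i, t) and is offered as a worked step rather than as part of Fusfeld's own presentation.

```latex
% Substitute the production condition i = e^{(c/b)t} into the unit form.
\begin{align*}
T_i &= a \times i^{b}
     = a \times \left(e^{(c/b)\,t}\right)^{b}
     = a \times e^{ct}, \\
\intertext{which matches $T(t) = K \times e^{ct}$ with $K = a$. Differentiating the condition gives}
\frac{di}{dt} &= \frac{c}{b}\, e^{(c/b)\,t} = \frac{c}{b}\, i ,
\end{align*}
```

so the production rate is the constant fraction c/b of the cumulative production to date.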
While the Fusfeld information shows interesting results, it does not explain why technology improves exponentially, nor does it
explain why cost declines exponentially. Learning curves and an exponential increase in the
quantity of items produced may depress cost,
but simple production learning does not account for the rapid technology changes in the
integrated circuit, for example, where totally
different production processes have been
evolved to support the greater technology.
In the computer industry, the mobility of
technical personnel from company to company
has certainly been a significant factor in technology innovation. The strongest force toward
technology innovation in the computer industry, however, has been the computer users.
They have been doing a significant portion of
the inventing, both in hardware development
and in software development. Although the
case studies in this book indicate several specific
places where users have influenced hardware
design, it would be a substantial oversight not
to mention the profound effect users had on the
creation of PLf 1 and COBOL. Furthermore, all
applications work is done first by users and
then developed by manufacturers at a later date
along the lines of the above model.
The I nfluence of Technology I nnovation on
Cost

The cost of computing is the sum of the costs
which correspond to the various levels-of-integration described in Chapter 1, plus the operational costs. The levels are integrated circuits,

Fusfeld's [1973] Measures of Technology Progress

Item

Measure, Ti

Quantity
Produced (j)

Light bulbs
Automobiles
Titanium
Aircraft
Turbojet engines
Computers

Lumens/bulb
Vehicle h.p.
Psi/$/16
Maximum speed
Fuel consumed. weight
Memory size X rate

10 10
3 X 10 7 ; 108
3 X 108
2 X 105
1.6 X 104
105

Technology
Progress (b)

Change
Observed
In Study

Total
Change

0.04; 0.19
0.11; 0.74
0.3; 1; 1.04
0.33-1.2
1.06
2.51

33
10
10
6
2
109

80
6; 13
350
56
2.9 X 104
3.5 X 10 12

56

COMPUTER ENGINEERING

boards, boxes, cabinets, operating systems,
standard languages, special languages, applications components, and applications. In practice, each additionallevel-of-integration is often
looked at as overhead. Using standard accounting practice, the basic hardware cost, at the lowest level, is then multiplied by an overhead
factor at each subsequent outer level. While an
overhead-based model may work operationally
for a stable set of technologies, such a model
will not adequately allow for rapidly evolving
technologies or the elimination of levels. By examining each level, observations can be made
about the use and substitution of technology.
More importantly, conclusions can be drawn
about how structures are likely to evolve.
Cost, Performance, and Economy of Scale

For most technologies used in the computer
industry, there is a relationship between cost,
performance, and economy of scale:

$\text{Performance} = k \times \text{cost}^{s} \times r^{t}$

where k = base case performance, s = economy
of scale coefficient, r = rate of improvement of
technology, and t = calendar time.
There are four possibilities for the effect of
economy of scale on the production of any device. These are:
1. Economy of scale holds. A particular object can be implemented at any price,
   and the performance varies exponentially with price.
   $\text{Performance} = k \times \text{price}^{s}$; $s > 1$
2. Linear price performance relationship.
   a. $\text{Performance} = k \times \text{price}$
   b. $\text{Performance} = \text{base} + K \times \text{price}$
3. Constant performance, price independent.
   $\text{Performance} = k$
4. Only a particular device has been implemented. The performance (or size) is a
   linear sum of such devices.
   $\text{Performance} = n \times (k \times \text{price})$
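The cases above differ mainly in the exponent s. As a minimal sketch (the function names and all numeric values are hypothetical, chosen only for illustration), the general relationship and the replicated-device case can be written as:

```python
def performance(cost, k=1.0, s=1.0, r=1.0, t=0.0):
    """Performance = k * cost**s * r**t, with symbols as defined in the text."""
    return k * cost ** s * r ** t


def replicated_performance(n, unit_price, k=1.0):
    """Case 4: n copies of one fixed device, Performance = n * (k * price)."""
    return n * (k * unit_price)


if __name__ == "__main__":
    # Effect of doubling the price under three values of s:
    # s > 1 (economy of scale), s = 1 (linear), s = 0 (price independent).
    for s in (2.0, 1.0, 0.0):
        gain = performance(2.0, s=s) / performance(1.0, s=s)
        print(f"s = {s}: doubling price multiplies performance by {gain:g}")
    # Technology improvement at a fixed price; r = 1.25 per year is an assumed value.
    print("3 years at r = 1.25:", performance(1.0, r=1.25, t=3))
```

With s = 2 a doubling of price quadruples performance, with s = 1 it doubles it, and with s = 0 performance does not depend on price at all.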

Sometimes, economy of scale effects are observed in situations where they would not normally be expected. For example, assume a
performance improvement feature exists that
costs the same whether it is added to a large
computer or added to a small computer. Adding that feature to a product that is already high
priced will have a modest effect (say 5 percent)
on the cost but a substantial effect (say 100 percent) on the performance. Adding the same
constant cost feature to a lower cost product
will have a substantial effect (say 200 percent)
on the cost but only a performance effect (again
100 percent) similar to that obtained with the
higher cost system. This condition is especially
true in disks and computer systems. Use of a
particular recording method employing costly
logic for encoding/decoding, or addition of a
cache memory, is often applied to the high
priced systems first. With time and learning, the
technique can then be applied to lower cost systems. For example, cache, a nearly perfect example of the constant cost add-on, first
appeared in such large machines as the IBM
360/85 in 1968 and later migrated down to large
minicomputers such as the PDP-11/70 in 1975.
On a research basis, cache even reached the
small minicomputer, the cache-based PDP-8/E
at Carnegie-Mellon University (Chapter 7).
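The constant cost add-on argument can be made concrete with a small calculation. The sketch below uses the percentages quoted above (5 percent and 200 percent cost increases, 100 percent performance gain); the absolute prices are hypothetical and were chosen only to reproduce those percentages.

```python
def add_feature(base_cost, base_perf, feature_cost, perf_gain):
    """Return (fractional cost increase, new cost/performance relative to old)."""
    new_cost = base_cost + feature_cost
    new_perf = base_perf * (1.0 + perf_gain)
    cost_increase = feature_cost / base_cost
    ratio_change = (new_cost / new_perf) / (base_cost / base_perf)
    return cost_increase, ratio_change


FEATURE_COST = 10.0   # hypothetical price of the add-on (e.g., a cache)
PERF_GAIN = 1.0       # +100 percent performance, as in the text

if __name__ == "__main__":
    for name, base_cost in (("large system", 200.0), ("small system", 5.0)):
        increase, ratio = add_feature(base_cost, 1.0, FEATURE_COST, PERF_GAIN)
        print(f"{name}: cost +{increase:.0%}, cost/performance x {ratio:.3f}")
```

Under these assumed prices the add-on nearly halves cost/performance on the large system but makes it worse on the small one, which is consistent with such features appearing on high priced systems first.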
In Figure 22, the cost of the lowest price unit
is kept to a minimum and decreases, while the
cost of the mid-range product continues to increase. The cost of the highest performance
product increases the most, because it can afford the overhead costs. Looking at the basic
technology metric, there are really three curves,
as shown in Figure 23. The first curve represents the application of new technology to a
high cost/high performance product to get a
substantial performance improvement. With
time, the technology evolves and is reapplied to
the mid-range products (the first level copy),
and finally, several years later, the technique becomes commonplace and is applied to low cost
products (second level copies). The resultant
cost/performance ratios are shown in Figure
24.

Figure 22. Cost versus time. [Plot: cost versus time t; curve label $\text{cost} = C_{\text{base}} \times C_{\text{rate}}^{t}$; the lowest curve is marked "small (min cost)".]

Figure 23. Technology versus time. [Plot: technology versus time t.]

Figure 24. Cost/technology versus time. [Plot: cost/technology versus time t; curve label $\frac{\text{cost}}{\text{tech}} = \frac{C_{\text{base}} \times C_{\text{rate}}^{t}}{T_{\text{base}} \times T_{\text{rate}}^{t}} = k \times \left(\frac{C_{\text{rate}}}{T_{\text{rate}}}\right)^{t}$.]

The management of technology by applying
it to products in various price and performance
ranges occurs in a more or less ordered fashion
in most industries, but has not occurred to the
extent that it has in the computer industry. This
is probably because no other industries have
evolved in the same rapid and broad fashion as
have the computer and semiconductor industries. The computer industry is fundamentally driven by the semiconductor technology
push on the one hand, and by IBM on the
other. IBM follows the strategy of applying
technology on an economy of scale basis. This
permits the technology to be first tested at the
high performance/high price lower volume systems before being introduced in higher volume
production. The following examples (from
IBM) show this at work. In printing, the high
price/low volume to low price/high volume introduction cycle was followed in the use of dot
matrix printing, chain printing, ink-jet printing,
and computer printing as a precursor to systems
products using xerography. In magnetic storage, the cycle saw the basic technology for large
disks as a precursor to the use of similar technology on smaller disks.

Technology Substitution

The cost and performance of a computer system are roughly additive and multiplicative
functions, respectively, of the parts. The technologies represented in those parts each evolve
at their own rates. Usually, when one component begins to dominate the cost (e.g., packaging) or constrain the performance, then
pressure occurs to more rapidly change and improve the associated technology to avoid the
cost or performance bottleneck. Sometimes a
slowly evolving technology is just eliminated as
a substitute is found. The following is a list of
some of the substitutions that have occurred:

1. Semiconductor memories are now used in place of core memories. Since the
   latter have evolved more slowly in terms of price decline, semiconductors are
   now used to the exclusion of cores. (This has not occurred where information
   must be retained in the memory during periods of time without power.)
2. Read-only semiconductor memories are now substituted for semiconductor logic
   elements.
3. In a similar way, programmable logic arrays can be potentially substituted for
   read-only memories, and true content addressable memories can replace various
   read-write and read-only memories.
4. The judicious use of charge-coupled devices or bubble memories can cause
   drastic reduction (and quite possibly the elimination) of the use of MOS
   random-access memories for primary memory. The fixed head disk could be
   eliminated at the same time.
5. For small systems, the main operational memories could be completely
   nonelectromechanical; electromechanical memories (e.g., tape cassettes and
   floppies) would be used for loading files into the system and for archives.
   For even lower cost systems, semiconductor read-only memories could replace
   cassettes and floppies for program storage, as in programmable calculators.

After a while those components of computer
system cost which are decreasing less rapidly
than other components, remaining static, or are
rising (like the packaging and power) may become a significant fraction of the total cost. Because costs are additive, the exponential
decrease in some costs, such as those for semiconductor logic and memories, will cause the

costs that are not similarly decreasing to be
more evident. This causes pressure for structural change and may cause new packaging, for
example, to become an especially important attribute of a new design. For instance, although
the PDP-8 is normally considered to be the first
minicomputer, it postdates the CDC 160 (1960)
and DEC's PDP-5 (1963). However, the PDP-8
was unique in its use of technology because:
1. It eliminated the full frame cabinets used by other systems. This also
   presented a new computer style such that users could embed the computer in
   their own cabinets. A separate small box held the processor, memory, and many
   options.
2. Automatic wire-wrap technology was used to reduce printed circuit board
   interconnection cost. This also eliminated errors and reduced checkout time.
3. Printed circuit board costs were reduced by using machine insertion of
   components.
4. The Teletype Corporation Model 33 Automatic Send Receive (ASR) teleprinter
   (also used on the PDP-5) was connected as the peripheral. It had a combined
   printer, keyboard, and paper tape I/O device (for program loading). It
   eliminated the paper tape reader and punch.
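The additive-cost argument that leads into the PDP-8 example (declining semiconductor costs make static costs such as packaging and power more evident) can be illustrated numerically. In the sketch below the 70/30 starting split is hypothetical; the 0.79 yearly factor is the CPU price decline rate quoted elsewhere in this chapter, and packaging cost is assumed to stay constant.

```python
def packaging_share(years, electronics=70.0, packaging=30.0,
                    electronics_rate=0.79, packaging_rate=1.0):
    """Fraction of total system cost due to packaging and power, year by year.

    Costs are additive: total = electronics * rate**t + packaging * rate**t.
    """
    shares = []
    for t in range(years + 1):
        e = electronics * electronics_rate ** t
        p = packaging * packaging_rate ** t
        shares.append(p / (e + p))
    return shares


if __name__ == "__main__":
    for t, share in enumerate(packaging_share(8)):
        print(f"year {t}: packaging and power = {share:.0%} of total cost")
```

Under these assumptions the static component grows from under a third of the total to more than half within about five years, which is the kind of pressure for structural change described above.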

Technology Progress, Product
Development, and the State-of-the-Art Line

If there were no such thing as technological
progress, there would be no such thing as an
obsolete product. In such a situation, it would
not matter when a product was introduced into
the market, as it would be technically equal to
the other products available. In the computer
industry, this is far from the case: for computer
processors, peripherals, and systems, there is a
state-of-the-art line that indicates the average
technological level at which present products
are being offered. Since higher technology has
generally meant better price/performance, new
products introduced in the market must have a
proper relationship to the state-of-the-art line.
The following paragraphs elaborate on the interaction between technology progress, product
development, and the state-of-the-art line.
The complete development process can be envisioned as a pipeline process with the following
stages: research, applied research, advanced development (product breadboard), development,
test, sell/build, and use. In this model, ideas
and information flow through the various organizations in a process-like fashion, culminating in a product. Each product type has a
different set of delays associated with the parts
of the pipeline. At the end of the pipeline, the
"education of use" delay occurs while the prospective customers are taught how the product
meets their needs; this delay culminates in market demand. For well defined, commodity-like
products such as disks and primary memory,
the education of use delay is zero, as each user
"knows" the product. For a new language, on
the other hand, there is a large education of use
delay, and the market demand usually develops
slowly.
The disk supply process is a good example of
the pipeline nature of the development process.
The technology (as measured by the number of
bits per areal inch) doubles about every two
years (i.e., the density improves 41 percent per
year). IBM is estimated to invest about 100 million dollars per year in the development and associated manufacturing process pipelines.
Because of this massive investment, the IBM
disks essentially establish the state-of-the-art
line in a structure that is typified by Figure 23.
Using the pipeline development process, development of competitive disks by other companies would lie somewhere about four to six
years behind the state-of-the-art line. This can
be seen by looking at the development process
and taking into account the delays through each
stage. To be more competitive, the disk industry
short circuits various delays by engaging in reverse engineering; this results in only two-year
lags. In reverse engineering, the tools are micrometers and reverse molds. At the time of the
first shipment of a new product by the technology leader, the product is purchased by competitors and basically copied on a function per
function basis. The more successful designs use
pin for pin compatibility to take maximum advantage of the leader's design decisions.
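The figures quoted for disks are related by simple arithmetic: doubling every two years is a yearly factor of the square root of 2, or about 41 percent per year. The sketch below also estimates what a given lag would mean if a lag in time translated directly into a density gap at that growth rate; that direct translation is an assumption of this sketch, not a claim made in the text.

```python
def annual_factor(doubling_years):
    """Yearly improvement factor implied by a given doubling period."""
    return 2.0 ** (1.0 / doubling_years)


def density_gap(lag_years, doubling_years=2.0):
    """Leader's density divided by that of a follower lagging by lag_years."""
    return annual_factor(doubling_years) ** lag_years


if __name__ == "__main__":
    print(f"yearly improvement: {annual_factor(2.0) - 1.0:.0%}")
    for lag in (2, 4, 6):
        print(f"{lag}-year lag -> leader has {density_gap(lag):.1f}x the density")
```

On this reading, cutting the lag from four to six years down to two years reduces the density disadvantage from a factor of roughly 4 to 8 down to a factor of about 2.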
From the process, it is also easy to see how
merely copying competitive products guarantees products that will be at least two years behind leadership products and lagging behind
the state-of-the-art. Nonetheless, if there is a
strong market function which operates to define
products based on existing product use, and if
the design and manufacturing process at the
copying company is quite rapid, such a strategy
can be effective. The copying process can also
be very effective for software products because,
while there are no delays associated with manufacture, the time to learn about the product provides a time window in which copiers can catch
up with the leaders.
A high technology, exponentially increasing
(volume) product is denoted by:
1. Exponential yearly cost improvement (price decline) rates through product
   technology improvements as measured by price decline of greater than 20
   percent (e.g., disk price this year = 0.8 × last year's disk price,
   CPU = 0.79, primary memory = 0.7).
2. Short product life (less than 4 years).
3. Various types of learning curves. Some products require very little learning,
   while others require a great deal of learning or require relearning because of
   personnel turnover or the frequent hiring of additional personnel.

The Product Problem (Behind the State-of-the-Art)

Typical product situations, including competitive "problems," can be seen in Figure 25.
When a product is introduced to the market, it
has a relationship to the state-of-the-art line.
There are five possible situations:

1. Ideal (on the state-of-the-art line).
2. Advanced (moves below the line).
3. Late (slip in time to the right).
4. Expensive (more than expected in cost, straight above the line).
5. Late and expensive (to the right and above the line).

Situations 3, 4, and 5 are product problems
because they are behind the state-of-the-art line
and, hence, less competitive. This implies increased sales costs, lower margins, loss of sales,
and so on. Note that a late product could be
acceptable if somehow the cost were lower.
Similarly, an expensive product is acceptable if
it appears earlier in time.

Time Is Money (and vice versa)

Thus, product problems can be solved by either:

1. Movement in time (left) to get on the line.
2. Movement in cost (straight down) to get on the line.

With exponential price declines, a family of
products over a long time will follow a cost
curve, c:

$c = b \times r^{t}$

where c = cost at time, t (in years), b = base
cost, and r = rate of price decline.
With dc = change in cost above (or below) to
get back to the state-of-the-art line and dt = delay (or advance) in time to get back to the state-of-the-art line, let:

$f = dc / c$ = fraction of cost away from line
$f = 1 - r^{dt}$ (poor cost, expressed as project slip)

and:

$dt = \ln(1 - f) / \ln(r)$ (poor timing, expressed as poor cost)

These formulas permit the interchange of time
and money (cost). For example, in disks or central processors where r = 0.8 and $\ln 0.8 \approx -0.22$,
note:

$f = 1 - 0.8^{dt}$

A one-year slip is equal to a 20 percent cost
overrun.

$dt = -4.45 \times \ln(1 - f)$

A 10 percent cost increase is equal to a 0.47-year slip.

Figure 25. Use of the state-of-the-art line to model product cost problems and timing problems. [Plot: cost versus time (years) along the line $c = \text{base} \times r^{t}$, e.g., $c = 0.8^{t}$; marked points are ideal, advanced, late, expensive, late and expensive, and ideal next product; dt is labeled effective lateness and dc effective over cost.]
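The interchange of time and money can be sketched directly from the two formulas. The function names below are illustrative only; r = 0.8 is the disk and central processor rate used in the text.

```python
import math


def overrun_from_slip(dt, r=0.8):
    """f = 1 - r**dt: cost fraction off the line equivalent to a slip of dt years."""
    return 1.0 - r ** dt


def slip_from_overrun(f, r=0.8):
    """dt = ln(1 - f) / ln(r): years of slip equivalent to a cost fraction f."""
    return math.log(1.0 - f) / math.log(r)


if __name__ == "__main__":
    print(f"1-year slip     -> {overrun_from_slip(1.0):.0%} cost overrun")
    print(f"10% cost excess -> {slip_from_overrun(0.10):.2f} years of slip")
    print(f"50% cost excess -> {slip_from_overrun(0.50):.1f} years of slip")
```

The exact logarithm gives values that differ slightly from those obtained with the rounded constant 4.45, but they agree to the two significant figures used in the text.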

Engineering, Manufacturing, and Inflation Effects

Engineering, by establishing the product direction, has the greatest effect on the product.
However, since most product problems may
have multiple components, it is worth looking
at each.

1. Timing.
   a. Engineering. Schedule slips translate into a competitive cost problem as a
      sub state-of-the-art, late product.
   b. Manufacturing. Building up the learning curve base quickly by making many
      units before the design is mature is risky, but it has a high payoff when
      considering the apparent cost and/or delay.

2. Cost.
   A number of components and organizations contribute to the total product cost
   in an evolutionary fashion, as shown in Figure 26.
   a. Engineering. Perhaps the major determinant of cost is the product design:
      number of parts, ease of assembly, etc. The most common cost problems occur
      by continued product enhancement during the design stage to provide
      increased functionality (called "one-plussing the design"). One-plussing
      often occurs because the market had not been modeled before the design was
      begun, and without a model of the market, engineering is a ship without a
      rudder.
   b. Manufacturing. Direct labor and manufacturing overhead really matter when
      determining productivity. Making major changes in the design of a product
      or the location of manufacture for a product starts a new learning curve
      and serves to stretch the production time out, and the increased costs
      associated therewith put false pressure on engineering to design new
      products. One curve in Figure 26 shows the direct costs associated with
      manufacturing assembly. Some learning should take place as long as product
      volumes increase exponentially, to get a net lower cost. New technology
      materials show the greatest cost improvement for computers, assuming that
      semiconductors and other electronic materials continue to improve with
      time. By capital equipment investment (tooling), there can be stepwise
      cost reductions in materials costs.
   c. Inflation. While not a direct cost function, it combines with labor cost
      to negate the downward cost trends that were obtained from learning
      effects.
   d. Compound Cost. The costs are taken altogether. In terms of a sub
      state-of-the-art product, the costs are compound.

3. Manufacturing learning. Learning curves and forgetting curves really matter.
   Left alone, a typical product may go down three alternative paths (Figure 27):
   a. $c = b \times 0.95^{t}$
      (a decrease of 5 percent/year)
   b. $c = b$
      (staying constant with little attention)
   c. $c = b \times 1.06^{t}$
      (increasing with inflation as little learning occurs after many units are
      produced)
   Where c = cost at time, t (in years), and b = base cost.

Figure 26. The various components that contribute to product cost. [Plot: cost components versus time t; curves are labeled manufacturing assembly (learning); new technology, materials; inflation factor; increase in functionality (engineering); and net = f(learning, technology, inflation, functionality).]
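To see how quickly each of the three paths falls behind, the sketch below compares them with a state-of-the-art line declining at r = 0.8 per year, the disk and central processor rate used earlier in this chapter. It assumes the product starts exactly on the line at t = 0 with the same base cost; that starting condition is an assumption of this sketch.

```python
# Yearly cost factors for the three paths listed in the text.
PATHS = {
    "5 percent/year learning": 0.95,
    "constant cost": 1.00,
    "inflation, little learning": 1.06,
}


def product_cost(t, yearly_factor, b=1.0):
    """c = b * yearly_factor**t."""
    return b * yearly_factor ** t


def cost_gap(t, yearly_factor, state_of_art_rate=0.8):
    """Product cost divided by the state-of-the-art cost after t years."""
    return (yearly_factor / state_of_art_rate) ** t


if __name__ == "__main__":
    for name, factor in PATHS.items():
        gaps = ", ".join(f"{cost_gap(t, factor):.2f}" for t in (1, 2, 3, 4))
        print(f"{name:27s} cost gap after 1-4 years: {gaps}")
```

Even the path with 5 percent per year learning is well above the line after three years under these assumptions, which is one way to read the text's point that products have short lives.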

Mid-Life Kicker for Product Rejuvenation

By enhancing an existing product (the "mid-life kicker"), one can improve the
cost/performance metric of a given product.
This is non-trivial, and for certain products
must be inherent (i.e., designed in). Under these
conditions, improvements in cost go immediately to get the product back onto the state-of-the-art line. For example, a factor of 2 in performance halves cost/performance. The effect
of doubling the density of a disk is to move the
product back to the state-of-the-art line by a
time shift. The preceding formula gives:

$dt = -4.45 \times \ln(0.5) = 3.1$ years

This situation is shown in Figure 28 and is compared with a 5 percent per year learning curve.

Figure 27. Product cost versus time within manufacturing learning. [Plot: cost versus time t from product introduction; curves are labeled constant cost and learning ($0.95^{t}$); annotations mark the cost gap problem at time T and obsolescence at T.]

Figure 28. Product cost improvement by enhancement of cost/function. [Plot: cost versus time t against the line $c = 0.8^{t}$; annotations mark the factor of 2 enhancement and the new product.]

SUMMARY

The discussions above have attempted to
show how technology progress, particularly in
the areas of semiconductor logic, semiconductor memories, and magnetic memory
media, has influenced progress in the computer industry and has provided choice and
challenge for computer design engineers.
As was implied in the Structural Levels-of-Integration and Packaging Levels-of-Integration Views of Chapter 1, computer engineering is not a one-dimensional undertaking
and is not simply a matter of taking last year's
circuit schematics and this year's semiconductor vendor catalogues and turning some
kind of design process crank. Instead, it is much
more complicated and includes many more dimensions.
Two additional dimensions with which a discussion of computer engineering must deal, before going on to the DEC computers as case
studies, are packaging and manufacturing.
These are discussed in Chapter 3.

Packaging and Manufacturing
C. GORDON BELL, J. CRAIG MUDGE,
and JOHN E. McNAMARA

As indicated in the previous chapter, computer engineering is more complicated than
simply applying new technology to existing designs or designing new structures to exploit new
technology. To design a successful new computer, the engineer must often deal with issues
of packaging, manufacturing, software compatibility, marketing, and corporate policy.
Some of these issues have been briefly referred
to in the first two chapters, and some are beyond the scope of this text. However, two issues
that can and should be discussed before exploring the case studies are packaging and manufacturing. Both of these are crucial to DEC, as well
as to the computer industry in general.
GENERAL PACKAGING

Packaging is one of the most important elements of computer engineering, but also one of
the most complex. The importance of packaging spans the size and performance range of
computers from the super computers (CDC
6600, CDC 7600, Cray 1) to the pocket calculator. Seymour Cray, the designer of the super
computers cited, has described packaging as the
most difficult part of the computer designer's
job. The two major problems he cites are heat
removal and the thickness of the mat of wires
covering the backplane. (The length of the wires
is also important.) His rule of thumb indicates
that with every generation of large computer
(roughly five years), the size decreases by
roughly a factor of 5, making these problems
yet worse. In his latest machine, the Cray 1, the
C-shaped physical structure is an effort to reduce the time-consuming length of backplane
wires while providing paths for the freon cooling system by having wedge-shaped channels
between the modules.
At the opposite end of the size and performance range, pocket calculators are also greatly
influenced by packaging. In fact, they are determined by packaging. The first hand-held scientific calculator, the Hewlett-Packard HP-35, was
simply a new package for a common object, the
calculator, which had been around for about a
hundred years. It was not until semiconductor
densities were high enough to permit implementation of a calculator in a few chips, and not
until those chips could be repackaged in a particular fashion, that the hand-held calculator
came into existence. Currently this embodiment
is synonymous with the calculator name, but
other forms are appearing. The calculator
watch, the calculator pencil, the calculator
alarm clock, and the calculator checkbook have
all been advertised.
Between the two extremes of super computers
and calculators, packaging has also been important in minicomputers and large computers. In
particular, packaging seems to be the dominant
reason for the success of the PDP-8 and the
minicomputer phenomenon, although marketing, the coining of the name, and the ease of
manufacture (also part of packaging) are alternative explanations. The principal packaging
advantage of the PDP-8 over predecessor machines was the half-cabinet mounting which
permitted it to be placed on a laboratory bench
or built into other equipment, both locations
being important to major market areas.

The Packaging Design Problem

The importance of packaging is equalled only
by its complexity. The complexity stems from
the range of engineering disciplines involved.
Packaging is the complete design activity of interconnecting a set of components via a mechanical structure in order to carry out a given
function. To package a large structure such as a
computer, the problem is further broken into a
series of levels, each with components that carry
out a given function. Figure 1 shows the hierarchy of levels that have evolved in the last
twenty years for the DEC computers. There are
eight levels which describe the component hierarchy resulting in a computer system.

Figure 1. Eight-level packaging hierarchy for second to fourth generation computer systems. [Diagram: for each level, from the silicon substrate up to the computer system, the interconnection, the component, and the holding structure.]
Notes:
1. Not present in second generation.
2. Can be taken together as a single level in later generations.
3. Sometimes hand wired.
4. Third and fourth generations only.

For each packaging level there is a set of interrelated design activities, as shown in Figure
2. The activities are almost independent of the
level at which they are carried out, and some
design activities are carried out across several
levels.

Figure 2. Packaging - a set of closely interrelated design activities. [Diagram: linked design activities, including electromagnetic, safety, mechanical characteristics (e.g., vibration, size, weight), acoustic noise, cooling and heating, power conversion and control, the system interface, and the cost to design, manufacture, operate, service, buy, ship, modify, and discard.]

While the initial design activities indicated in
Figure 2 are each aimed at solving a particular
problem, the solving of one problem in computer engineering usually creates other problems as side effects. For example, the integrated
circuits and other equipment that do information processing require power to operate. Power
creates a safety hazard and is provided by
power supplies that operate at less than 100 percent efficiency. These side effects create a need
for designing insulators and providing methods
of carrying the heat away from the power supply and the components being powered. In this
way, cooling problems are created. Cooling can
be accomplished by conducting heat to an outside surface so that it may be carried away by
the air in a room. Alternatively, cooling can be
done by convection: a cabinet fan draws air
across the components to be cooled and then
carries the heated air out of the package into the
room. In either case, the air conditioning system is left with the problem of carrying the heat
away, and the fans associated with that system
are added to the fans associated with the computer to create acoustical noise pollution in the
room, making it more difficult for people to
work. Furthermore, if the computer is used in
an unusually harsh environment, a special heat
exchanger is required in order to avoid contamination of the components within the computer by the pollutants present in the cooling
airflow.
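The chain from function to power to cooling can be put into rough numbers. The sketch below assumes that essentially all electrical power delivered to the components ends up as heat inside the package, and it uses efficiency = power out/power in as the definition of supply efficiency; the wattage and efficiency values are hypothetical.

```python
def heat_load(component_watts, supply_efficiency):
    """Return (input power, supply loss), in watts, for a powered package.

    Assumes all input power is eventually dissipated as heat, so the input
    power is also the total heat the cooling system must carry away.
    """
    if not 0.0 < supply_efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    input_watts = component_watts / supply_efficiency
    supply_loss = input_watts - component_watts
    return input_watts, supply_loss


if __name__ == "__main__":
    # Hypothetical box: 700 W of logic and memory, 70 percent efficient supply.
    total, loss = heat_load(700.0, 0.70)
    print(f"heat to remove: {total:.0f} W, of which {loss:.0f} W is supply loss")
```

A less efficient supply raises both the line power drawn and the heat that the room's air conditioning must eventually remove, which is why power and cooling appear as coupled design activities.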
Finally, the mechanical characteristics of a
particular package such as weight and size
directly affect manufacturing and shipment
costs. They determine whether a system can be
built and whether it can be shipped in a certain
size airplane or carried by a particular distribution channel such as the public postal system.
The mechanical vibration sensitivity characteristics determine the type of vehicle (ordinary or
special air ride van) in which equipment can be
shipped.
It is also necessary to examine the particular
design parameter in order to determine whether
it is a constraint (such as meeting a particular
government standard), a goal (such as minimum cost), or part of a more complex objective
function (such as price/performance). Table 1
lists the various kinds of design activities and
constraints, goals, or parts of more complex objective functions that they determine. The table
also gives the dimensions of various metrics
(e.g., cost, weight) available to measure the designs; many of these metrics are used in subsequent comparisons.
Given the basic design activities, one may
now examine their interaction with the hierarchy of levels (i.e., the systems) being designed
(see Table 2). This is done by looking at each
level and examining the interaction of the design activities for that level with other design
activities (e.g., function requires power, power
requires cooling, cooling requires fans, fans create noise, and noise requires noise suppression).
Computer Systems Level. The topmost
level in Table 2 is the computer system, which
for the larger minicomputers and PDP-lO computers consists of a set of subsystems (processor, memories, etc.) within cabinets, housed in a
room, and interconnected by cables. The functional design activity is the selection and interconnection of the cabinets, with a basic
computer cabinet that holds the processor,
memory, and interfaces to peripheral units.
Disks, magnetic tape units, printers, and terminals occupy free standing cabinets. The functional design is usually carried out by the user
and consists of selecting the right components
to meet cost, speed, number of users, data base
size, language (programming), reliability, and
interface constraints. Aside from the functional
design problem, cooling and power design are
significant for larger computers. For smaller
computers, accessibility, acoustic noise, and visual considerations are significant because these
machines become part of a local environment
and must "fit in."

Table 1. Design Activities, Metrics, and Environment Goals and Constraints

Design Activity: Environment and [Metrics]

Primary function and performance (e.g., memory): Market, the consumer of the system [Memory size in bits, operation rate in bits/sec]
Human engineering: Human factors criteria, competitive market factors
Visual/aesthetics: Market, other similar objects, the environment in which the object is to exist
Acoustic noise: Government standards, operating environment, market [Decibels in various frequency bands]
Mechanical: Shippability by various carriers, handling, assembly/disassembly time [Weight, floor area, volume, expandability, acceleration, mechanical frequency response]
Electromagnetic radiation: Government standards, must operate within intended environment [Power versus frequency]
Power: Operating environment, market [Watts, voltage supply range]
Cooling and environment: Market, intended storage and operating environment, government standards [Heat dissipation, temperature range, airflow, humidity range, salinity, dust particle, hazardous gas]
Safety: Government standards
Cost
Cost/metric ratios: [Cost/performance (its function) - cost/bit and cost/bit/sec, cost/weight, cost/area, cost/volume, cost/watt]
Density metrics: [Weight/volume, watts/volume, operation rate/volume]
Power metrics: [Operation rate/watt; efficiency = power out/power in]
Reliability: [Reliability - failure rate (mean time between failures), availability - mean time to repair]
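The ratio metrics in Table 1 can be illustrated with a minimal sketch. The field names and all numeric values are hypothetical, chosen only for illustration, and the availability expression MTBF/(MTBF + MTTR) is the usual steady-state definition, which the table itself does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """Raw measurements for one packaged system (hypothetical fields)."""
    cost: float          # relative cost units
    memory_bits: int     # primary function metric for a memory
    weight_lb: float
    volume_ft3: float
    power_in_w: float    # power drawn from the mains
    power_out_w: float   # power delivered to the logic by the supply
    mtbf_hours: float    # mean time between failures
    mttr_hours: float    # mean time to repair


def derived_metrics(d: Design) -> dict:
    """Compute a few of the cost, density, power, and reliability ratios."""
    return {
        "cost_per_bit": d.cost / d.memory_bits,
        "cost_per_watt": d.cost / d.power_in_w,
        "weight_per_volume": d.weight_lb / d.volume_ft3,
        "watts_per_volume": d.power_in_w / d.volume_ft3,
        "efficiency": d.power_out_w / d.power_in_w,
        # Assumed definition: fraction of time the system is usable.
        "availability": d.mtbf_hours / (d.mtbf_hours + d.mttr_hours),
    }
```

Ratios of this kind let two designs of different size be compared on a common footing, which is how the metrics are used in the comparisons that follow.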

Cabinet Level. Since the cabinet is the lowest level component that users interface to and
observe, physical design, visual appearance,
and human factors engineering are important
design activities. For the computer hardware
designer, on the other hand, the component
mounted in the cabinet is usually the largest system. Functional design efforts ensure that the
various components (i.e., boxes) that make up a

Table 2. Interrelationship of Hierarchy of Levels and Design Activities

Level of Packaging

Design
Activity
Functional

Chip

Chip
Carrier

Module

Backplane

Box

Logic
electrical

Cabinet
Configuration
options

Physical
layout

Circuit design
physical
layout

Physical
layout

What fits
and operates

Human
Interface

Visual

Visible,
bought for
integration

Selection of
right
components
by user

Boxes and
operable
configurations
Location of
console, size
for use

Placement
for use

Determines
system

Set of cabs,
attractive
place to be

appearance

•

Airflow

Acoustic

Computer
System

vibration

Quiet for
operators
and users

Mechanical

Electromagnetic
interface

Buildable
and signal
transmission

Shippable
and
serviceable

Noise coupling

Inter/intramodule noise
coupling, RFI
containment
and shielding

and rejection
of radio
frequency
interference
(RFI)

Power

Special
on-chip

Cooling and
other

Chip to
cooling
special
environment

environment

IC module
cooling
special
environment

Safety

activities

Logic

•

Away from
RFI input
(outside
operating
range)

Dist. and
regulation

Control,
dist. and
regulation

Interconnect
with computer
system

By user
special power
supplies for
high
availability

IC to
cooling

Module

Cooling and
covering

Source

Interbox
coupling to
room air
environment

Determines
safety if

Determines
user safety

Power for

Circuit
logic

RFI
containment,
external RFI
shield

Floor load
room size

Dist. and
regulation

various
systems

Dominant
design

•

•

used at
this level

•

Mechanical,
power,
cooling, EMI,
acoustic

Configuration
visual,
shipping
EMI, safety

User
configuration
design

The box and backplane levels can be considered as a single level (alternatively, the box level may be eliminated in large systems).


cabinet level system will operate correctly when
interconnected. Safety and electromagnetic interference characteristics are important because
the cabinet serves as the outermost place in
which shielding can be installed. Cooling and
power distribution must be considered, since a
number of different boxes may be mounted
within the same cabinet. Finally, the mechanical structure of a cabinet must be designed to
maintain its physical integrity when shipped.
Box Level. Box level functional design consists of taking one or more backplanes, the
power supplies for the box, and any user interface such as an operator's console and interconnecting them mechanically (see Figure 3).
For systems that are not sold at the box level,
no separate box is required, and the power supply and backplanes are mounted directly in a
cabinet (see Figure 4) or other holding structure
such as a desk or terminal case, so that box and
backplane design merge. If systems are sold at
the box level, then the visual characteristics may
be important; otherwise, the design is basically
mechanical and consists of cooling, power distribution, and control of acoustic noise. The
structure must be sound to protect the unit during shipment.
Of all the dimensions to consider in the design, perhaps the most important is how the box
(or module mounting structure) is placed in a
cabinet. This placement affects airflow, shippability, configurability, cable placement, and
serviceability, and is a classical case of design
tradeoffs. The scheme that provides the best
metrics, such as packaging density and weight,
may have the poorest access for service and the
most undesirable cable connection characteristics. These characteristics are given in Table 3.

Table 3. Fixed, Drawer, and Hinged Box/Cabinet Mounting

Mounting: Fixed
  Service Access: Good for either backplane or module, but not both unless a thin cabinet is used
  Cabling: Best (i.e., shortest)
  Density: Good for thin cabinet or rear power supply mounting
  Cooling: Best (known)
  Applicability: Box not needed; box can be used

Mounting: Drawer
  Service Access: One-side access
  Cabling: Long and movable
  Density: Very high
  Cooling: Can be cooled*
  Applicability: High density, self-contained

Mounting: Drawer (with tilt) for service
  Service Access: Good
  Cabling: Longer and more movable than non-tilt version
  Density: Very high
  Cooling: Can be cooled*

Mounting: Drawer vertical mounting modules
  Service Access: Very good
  Cabling: Long and movable
  Density: High

Mounting: Hinged (module backplane)
  Service Access: Very good
  Cabling: Short
  Density: Medium
  Cooling: Good (if fans are fixed to cage)
  Applicability: Separate box is awkward

* Density restricts cabinet airflow.

Figure 3. PDP-11/05 computer box. (a) Front view (with top cover). (b) Side view (with top cover removed).

Figure 4. Major components and assemblies of PDP-11/70 mounted in standard DEC cabinet.

Backplane Level. This level of design is the
final level of interconnection for the computer
components that are designed to stand alone,
such as a basic computer, disk, or terminal.
Backplane design is part of the computer's logical design. In second generation machines such
as the PDP-7 (Figure 24a, Chapter 6), the backplane was wire-wrapped. In the early 1970s
printed circuit boards were used to interconnect
modules (Figure 5). Secondary design activities
include holding, powering, and cooling the
modules so they will operate correctly. Since the
signals are transmitted on the backplane, there
is an electromagnetic design problem. For industrial control systems whose function is to
switch power mains voltages, additional safety
problems are created.
Module Level. In the second generation,
module level design was a circuit design activity
taking discrete circuits and interconnecting
them to provide a given logic function. In the
third and fourth generations, this interface between circuit and logic design moved within
chip level design, so that module level design
became the process of dealing with the physical
layout problems associated with logic design.
Module level design is basically electronic, so
power, cooling, and electromagnetic interference (cross talk) considerations dominate.
Integrated Circuit Package and Chip Level. Most integrated circuits used in the computer industry today are sold in a plastic or ceramic package configuration that has two rows
of pins and is called a dual inline package
(DIP). The majority of the integrated circuits in
the module shown in Figure 6 are 16-pin DIPs.
Because of the popularity of this packaging
style, the terms "integrated circuit," "chip,"
and "DIP" are often used interchangeably. This
is not strictly correct; an integrated circuit is actually a 0.25- X 0.25-inch portion of semiconductor material (die or chip) from a 2- to 4-inch diameter semiconductor wafer. Except for
cases where multiple die are packaged within a
single DIP, the integrated circuit, chip, and DIP
can be discussed as a single level.
Design considerations at the integrated circuit level include power consumption, heat dissipation, and electromagnetic interference.
Because some integrated circuits are designed to
operate in hostile environments, there is considerable mechanical design activity associated
with packaging, interconnection, and manufacturing.

Figure 5. Cross-section of a printed circuit backplane.

Figure 6. LSI-11 processor with 8 Kbytes of memory and microcode for commercial instruction set.

The Packaging Evolution

Figure 7 shows the relation of packaging and
the computer classes for the various computer
generations. For each new generation there is a
short, evolutionary transition phase. Ultimately, however, the new technology is repackaged such that a complete information
storage or processing component (bit, register,
processor) occupies a small fraction of the space
and costs a small fraction of the amount it did
in the prior generation. Discrete events mark
packaging characteristics of each generation,
starting from 1 bit per vacuum tube chassis in
the first generation and evolving to a complete
computer on a single integrated circuit chip in
the fifth generation. Not only the size of the
packaging changed, but also the mounting
methods. In the first generation, logic units
were permanently mounted in racks, whereas in
later generations they were removable for ease of servicing.

Figure 7. Timeline evolution of packaging.

Table 4. Packaging Hierarchy Evolution for Universal Asynchronous Receiver/Transmitter (UART) Telegraph Line Controller

Generation
Early Second: Backplane, Modules, Discrete Circuit
Late Second: 2 modules, Discrete Circuit
Early Third: Module, IC, Chip
Late Third: IC, Chip
Late Fourth: Chip area

While the timeline of Figure 7 shows the
packaging evolution of a complete computer,
Table 4 shows how a particular component,
now called the Universal Asynchronous Receiver/Transmitter (UART), has evolved.
The UART logic carries out the function of
interfacing to a communications line that carries serial data and transforms the data to parallel on a character-by-character basis for entry
into the rest of the computer system. The
UART has three basic components: the serial/parallel conversion and buffering, the interfaces to both the computer and to the
communication line, and the sequential controller for the circuit.
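The serial-to-parallel conversion described above can be sketched in software. This is a minimal illustration, not the circuit of any DEC line unit: the framing (one start bit at 0, eight data bits sent least significant bit first, one stop bit at 1, line idle at 1) and the function names are assumptions, and the line is modeled as one sample per bit time.

```python
def serialize(char_code: int, data_bits: int = 8) -> list[int]:
    """Frame one character: start bit (0), data bits LSB first, stop bit (1)."""
    bits = [(char_code >> i) & 1 for i in range(data_bits)]
    return [0] + bits + [1]


def receive(line: list[int], data_bits: int = 8) -> list[int]:
    """Convert a sampled serial line into a list of parallel characters."""
    chars = []
    i = 0
    while i < len(line):
        if line[i] == 1:
            i += 1                      # idle level: keep waiting for a start bit
            continue
        frame = line[i + 1 : i + 1 + data_bits + 1]
        if len(frame) < data_bits + 1:
            break                       # incomplete frame at the end of the input
        *data, stop = frame
        if stop == 1:                   # valid stop bit: deliver the character
            shift_register = 0
            for position, bit in enumerate(data):
                shift_register |= bit << position
            chars.append(shift_register)
        i += 1 + data_bits + 1          # skip past the whole frame
    return chars
```

The loop plays the role of the sequential controller, the accumulating integer plays the role of the serial/parallel buffer, and the returned list stands in for the interface to the rest of the computer system.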
The UART is probably the first fourth generation computer component, since it is somewhat less complex than a processor yet rich
enough to be identifiable with a clean, standard
interface. *
THE DEC COMPUTER PACKAGING
GENERATIONS

With this general background on packaging,
one can examine the DEC packaging evolution
more specifically and against the general archetype of Figure 1. Figure 9 shows how the hierarchies have changed with the technology
generations. The figure is segmented into the
different product groupings. A product is identified as being at a unique level if it is sold at the
particular packaging level. The first DEC computers (i.e., PDP-1 to PDP-6) were sold at the
cabinet level as complete hardware systems. Although the PDP-8 was available at the cabinet
level for complete systems, it was significantly
smaller than the previous machines and was
principally sold at the mechanical box level.

Figure 8. 4707 transmitter line unit of the late second generation.

* Historically, DEC played a significant part in the development of the UART technology. With the PDP-1, the first UART
function was designed using 500-kHz systems modules and was used in a message switching application as described in
Chapter 6. The interface was called a line unit and was subsequently repackaged in the late second generation as two
extended systems modules (Figure 8). The UART function was also built into the PDP-8/I using two modules that were
substantially smaller than those for the PDP-1. In the 680/I, a PDP-8/I-driven message switch, the UART function was
accomplished by programmed bit sampling. Late in the third generation (or at the beginning of the fourth generation), some
designers from Solid State Data Systems of Long Island, N.Y., worked with Vince Bastiani at DEC and developed a UART
that occupied a single chip. This subsequently evolved into the standard integrated circuit and is used throughout the
industry.

Figure 9. DEC physical structure (packaging) hierarchies by technology generation.

Notes:
1. Processor, memory, and basic I/O controller logic.
2. Evolution from box with multiple backplanes interconnected by cables to a single box and backplane (i.e., 1 level).

Subsequently, computer systems became available at the backplane level (LSI-11), and at the
module level (CMOS-8).
The original packaging hierarchy for most of
DEC's second generation computers used a relatively common packaging scheme based on the
PDP-1. The most significant change occurred
late in the second generation when Flip Chip
modules (Figure 9) were introduced so that
backplanes could be wire-wrapped automatically.
The change to wire-wrap technology not only
reduced costs and increased production line
throughput, it also enabled the box-level production of computers. The change to wire-wrap
and two level products (box and cabinet) is
clear in the second generation. The offering of
products at these two levels continued into and
through the third generation.
With the advent of the fourth generation,
large-scale integration permitted the construction of a complete minicomputer processor on a
single module. Although components are sold
as separate modules (e.g., processor, communications line interfaces, additional primary
memory), a complete system requires a backplane; thus, the lowest level for the product is
the backplane. For larger systems, a power supply is combined and placed in a metal box. A
typical example of such a product is the LSI-11,
which is marketed at three levels as shown in
Figure 9.
The late fourth generation has brought the
processor-on-a-chip, and another packaging
level to the price list. An example of the processor-on-a-chip is the CMOS-8, described in
Chapter 7. The new packaging level offered to
the customer is the CMOS-8 module, which is a
single-board complete computer with processor, 16-Kword memory, and all the optional
controllers to directly interface up to five peripheral options.
DEC Boxes and Cabinets

Since the function of the cabinet and box is to
hold backplanes that in turn hold modules that
in turn hold circuit level components, the metric
of electronic enclosures is the number of printed
circuit boards they hold. The earliest DEC
method of mounting was to place the backplanes directly in a 6-foot-high cabinet which
held 19-inch-wide equipment in a 22- X 3~-inch
floor space and weighed about 185 pounds. Figure 10 shows the top view of the various cabinets used to hold module backplanes and boxes
for minicomputers since 1960. The changes to
the basic DEC 6-foot cabinet have mainly been
for improved producibility. The latest (circa
1973) was to use riveted upright supporting
members so the cabinet could be assembled easily without requiring bulk space for shipment
and storage.
The original cabinet used the entire cabinet as
an air plenum so that air was forced between
the modules and out the front doors. When the
PDP-7 used the same cabinet and the module
mounting frame cut off the airflow, it was necessary to add fans to the back doors to blow air
at the modules. Since cooling was one of the
weak points in the PDP-7, the PDP-9 used a
self-contained mounting and cooling structure
in which air was circulated between the modules
with air pulled in from outside without going
through the cabinet.
A second, later packaging method, initiated
with the PDP-8, packaged the metal-boxed
minicomputer inside the 6-foot cabinet. Figure
11 shows the significant boxes that have been
used to package minicomputers both within the
6-foot cabinet and freestanding. The box packaging history begins with the PDP-8. The rows
of Figure 11 indicate the four ways that are
available to access the circuitry (fixed, book,
slides, and tilt for access). The PDP-8 design
was followed by the PDP-8/S design which oriented the modules with the pins up for access to
the backplane. By tilting (rotating) the box, the
handle side of the modules could be accessed.
For the PDP-8/I (not shown), modules were
mounted in a vertical plane.
Several fixed backplane module mounting
structures were formed beginning with the
PDP-8/A (1975), which was the first DEC minicomputer since the PDP-5 to be mounted in a
fixed structure in a cabinet.
DEC Backplanes

Backplanes provide the next level-of-integration packaging below cabinets and boxes;
they are used to hold and interconnect a set of
modules which form a computer or an option
(e.g., processor, memory, or peripheral controller). Figure 12 gives the relative cost of interconnecting backplane module pins. Here the
cost per interconnection is roughly the same as
with a printed circuit module interconnection
(Figure 13). This can be somewhat misleading
because backplanes require a negligible cost for
testing and few failures occur during testing.
Figure 12 shows various kinds of interconnection technologies. Even though there are
exponential increases in quantities produced,
the cost continues to increase in the long run
with only occasional downward steps. The
greatest cost decline occurred when interconnections were carried out using automatic
wire-wrap machinery, but the PDP-8/E was
equally significant by being the first DEC computer to use a completely wave-soldered backplane. Figure 12 also shows how effectively the
module pins were used (i.e., whether all available pins were used).

Figure 10. Cabinets used to hold various DEC computers (in fixed, book, and box configurations).

Figure 11. Boxes used to hold various DEC PDP-8 and PDP-11 series minicomputers.

Figure 12. Relative cost per possible and actual interconnection versus time for various DEC computer backplanes; also pin density (in pins per square inch) versus time.

Figure 13. Relative cost per interconnection on DEC printed circuit board modules versus time.

DEC Modules

Since the function of modules is to interconnect and hold components, the metrics for
modules are the area for mounting the components and the cost of each circuit interconnection. For minicomputers, the emphasis
has been to have larger modules with more
components packed on a module as a means to
lower the interconnection cost. Figure 14 shows
the area of DEC modules and the number of
external pins per module versus time. Because
integrated circuit densities have been increasing, in effect providing lower interconnection
costs, a given module automatically provides
increased interconnects simply by packaging
the same number of integrated circuits on a
module. Obviously, one does not want to credit
this effect to improved module packaging. By
increasing the components per module, the cost
per interconnect can be reduced provided the
cost to test the module increases less rapidly
than the increase in components. The emphasis
on module size is usually most intense for larger
systems, where a relatively large number of
modules are needed to form a complete system.
Until recently, the increase in module area
was accompanied by increases in the number of
pins available to interconnect to the backplane.
In the case of the VAX-11/780 and the DECSYSTEM 2020, the number of pins did not increase significantly over previous designs,
although the board area was 50 percent larger.
In these cases, the number of integrated circuits
that could be cooled limited the density. In
other cases, either the number of pins or the
module size limited the module's functionality.
There are similar effects throughout the generations.
In the early second generation Systems Module designs, the number of pins and the circuit
board area (in square inches) were about the
same. Components were fairly large and loosely
packed on modules. With the Flip Chip series,
circuits were modified to pack a larger number

Figure 14. [Plot not reproduced: usable module area and pins per module versus year for the Systems, Flip Chip, and Extended module series.]

[Plot not reproduced: (a) Cost efficiency (in relative cost per watt).]

The circuit diagram of the flip-flop package in Figure 8 is basically an Eccles-Jordan trigger circuit with a transistor amplifier on each output. The input amplifiers isolate
the pulse input circuits and give high-input impedance. The amplifiers give enough delay to
allow the flip-flop to be set at the same time that
it is being sensed. Figure 9 shows the waveforms
of this flip-flop package when complemented at
a 10-megapulse rate. The rise and fall times,
about 25 millimicroseconds, are faster than one
normally sees in a single inverter or an emitter
follower because on each output there is an in-

Figure 6. Signal naming convention for DEC dual polarity logic.

Figure 7. AND gate for negative signals.

Figure 8. OR gate for negative signals.

generated by pulse amplifiers which were blocking oscillator circuits employing pulse transformers. The pulse transformer had both
terminals of its secondary winding available so
that either positive or negative pulses could be
obtained, depending upon which terminal was
grounded. A negative pulse (ground to -3 volts
and back to ground) was represented in the
logic drawings by a solid triangle, and a positive
pulse (ground to +3 volts and back to ground)
was represented by a hollow triangle. These signals were normally distributed on twisted pair
and could travel the long distances needed in
large digital systems like the PDP-1 without
degradation.
Pulse amplifiers were important elements because they produced high energy (high fan-out),
standardly shaped pulses which could be used
to gate a complete 18-bit register as a single logical signal. The use of pulses and buffered/delayed output flip-flops is emphasized
because the concept of gating a pulse at the
source and using the gated pulse to transfer
data from register to register on a parallel basis
used a minimum of logic compared to other
methods in use at that time. Some other methods used a common clock and dual rank flip-flops for register output delays or used clocked
serial logic and delay lines to store register contents.
Returning to the discussion of gates and flip-flops, a primitive flip-flop can be obtained by
interconnecting two grounded emitter inverters
as shown in Figure 9. When one inverter is cut
off, its output is negative. This holds the other
inverter on, which in turn holds the first inverter off. If another inverter circuit is added to
the circuit in Figure 9, the circuit in Figure 10 is
obtained.
The application of a negative pulse to the input of the additional inverter changes the state
of the flip-flop. In the actual implementations
of DEC Laboratory Module flip-flops, buffer
amplifiers were added to the outputs to permit a
single flip-flop to drive the inputs of many other
gates. The buffer amplifiers also provided delays at the outputs of the flip-flops such that the
output did not change until after the activating
pulse was over. This permitted the state of the
flip-flop to be sensed while the flip-flop was
being pulsed, a necessary feature for the simple
implementation of shift registers, simultaneous
data exchange between two registers, counters,
and adders.

Figure 9. Primitive flip-flop.

Figure 10. Primitive flip-flop with inverter.
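The value of delayed outputs can be sketched behaviorally: during one pulse, every destination loads the value its source held before the pulse. The function names and the dictionary-of-bit-lists representation are illustrative assumptions, not a model of the actual circuits.

```python
def pulse(registers: dict[str, list[int]], transfers: dict[str, str]) -> dict[str, list[int]]:
    """Apply one transfer pulse; each destination loads the OLD value of its source."""
    # Delayed outputs: the values sensed during the pulse are the pre-pulse values.
    sensed = {name: bits[:] for name, bits in registers.items()}
    updated = dict(sensed)
    for destination, source in transfers.items():
        updated[destination] = sensed[source][:]
    return updated


def shift_right(bits: list[int], serial_in: int = 0) -> list[int]:
    """One shift pulse: each stage loads its left neighbor's pre-pulse output."""
    return [serial_in] + bits[:-1]
```

With this behavior a single pulse can exchange two registers, for example `pulse({"A": [1, 0, 1], "B": [0, 1, 1]}, {"A": "B", "B": "A"})`. Without the delay, the first register to change would corrupt the value the other one is trying to read.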
Collections of the inverters, gates, and flip-flops just described were packaged in appropriate quantities (i.e., as many as would fit within
the module size and pin constraints) and sold as
Laboratory Modules and System Modules.
There were a relatively small number of module
types available in the Laboratory Module
Series. For example, the first product line, the
100 Series, included:

103  6 inverters
110  2 6-input negative diode NORs
201  1 buffered flip-flop
302  1 one-shot
402  1 clock pulse generator
406  1 crystal clock
410  1 Schmitt trigger circuit pulse generator
501  3 level standardizers
602  2 pulse amplifiers
650  1 tube pulser (15 volt, 100 nanosecond pulses)
667  4 level amplifiers (0 to -15 volts)
801  1 relay

By contrast, there were many System Module
types developed. With their higher packing density, lower cost, and fixed backplane wiring,
they were used for computers, memory testers,
and other complex systems of logic.
It is interesting to note that a large percentage
of the modules on the above list were designed
for generating and conditioning of the pulses
and levels used in the relatively small number of
logic circuits. Reference to a present day integrated circuit catalog reveals few pulsing and
clocking circuits but a great many logic circuits.
The emphasis on pulses was one of economy, as
previously noted.
Register transfer level structures and the System Module logic diagrams can easily be correlated, both because of the use of pulse
amplifiers to evoke operations and because of
the buffered/delayed flip-flops. Figure 11
shows in simplified form the interconnection of
two PDP-1 registers and lists some of the register transfer commands that could be used in
conjunction with these registers. Typical examples of such register arrangements in the PDP-1
were the Accumulator (AC), which was the
basic register in which all arithmetic operations
were carried out, and the Memory Buffer (MB)
register.

Figure 11. Register transfer representation of PDP-1 Accumulator (AC).

Figure 12. Logic diagram of PDP-1 Accumulator bit. Note: Input at least significant bit generates AC + 1 -> AC.
Figure 12 shows the logic diagram for one bit
of the Accumulator and Memory Buffer for operations given in the register transfer diagram.
The operation to clear the Accumulator is carried out by a pulse amplifier connected to all 18
bits of the Accumulator, with logic at the input
of the pulse amplifier to specify the conditions
under which the Accumulator is to be set to
ZERO. Complementing the Accumulator is
done by a transistor at one of the complementing inputs, C1, which receives a negative control pulse. Addition is a two-step
process in which the Accumulator and Memory
Buffer are half-added to the Accumulator using
an exclusive-OR operation (where an Accumulator bit is complemented if the corresponding Memory Buffer bit is a ONE), and then the
carry operation is performed. A carry at a given
bit position is initiated to the next bit if the
Memory Buffer is ONE and the Accumulator is
ZERO. Once a carry is started at a bit, it will
continue to propagate if each bit of the Accumulator is a ONE. The propagation is done via
a standard pulse at the propagation output P2.
In a similar way, a ONE can be added to the
Accumulator by pulsing the least significant bit
of the Accumulator which, if it is a ONE, will
create a carry that will propagate along all the
digits that are ONE, complementing each bit of
the Accumulator to ZERO as it propagates.
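The two-step addition described above can be sketched in software. The following Python fragment is only an illustrative model, not a description of the actual circuit timing: the function names are hypothetical, bit 0 is taken here as the least significant bit for convenience, and the end-around carry and overflow handling of the real machine are deliberately left out.

```python
WIDTH = 18                      # Accumulator width in bits
MASK = (1 << WIDTH) - 1


def half_add(ac, mb):
    """Step 1: complement each AC bit whose MB bit is ONE (exclusive-OR)."""
    return (ac ^ mb) & MASK


def ripple(ac, start_bit):
    """Inject a carry pulse at start_bit (bit 0 = least significant here).

    The pulsed bit is complemented; the carry keeps moving to the next
    bit only while the bit it just complemented was a ONE.
    """
    bit = start_bit
    while bit < WIDTH:
        was_one = (ac >> bit) & 1
        ac ^= 1 << bit
        if not was_one:
            break
        bit += 1
    return ac & MASK


def add(ac, mb):
    """Half-add, then start a carry wherever MB is ONE and AC is ZERO."""
    partial = half_add(ac, mb)
    starts = mb & ~partial & MASK
    result = partial
    for bit in range(WIDTH):
        if (starts >> bit) & 1:
            result = ripple(result, bit + 1)
    return result


def increment(ac):
    """AC + 1 -> AC: pulse the least significant bit."""
    return ripple(ac, 0)
```

For example, `add(5, 3)` half-adds to 6, starts a carry out of the lowest bit, and the carry ripples through the two ONE bits to give 8. Carries out of the top bit are simply dropped in this sketch.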
In 1960 DEC began building modules with
slightly different circuitry than that described
above. While transistor inverters, buffered/delayed flip-flops, and their associated
pulse logic were the best choice for 5- and 10-MHz logic, capacitor-diode (C-D) gates and
unbuffered flip-flops were found to be preferable for low speed logic because greater logic
density and lower cost could be achieved.
A positive capacitor-diode gate is illustrated
in Figure 13. With both the level input and the
pulse input at ground for sufficient time to allow
the capacitor charge to reach 3 volts, a negative
level change or a negative pulse at the pulse input will cause a positive pulse to appear at the
output. Such gates could drive the direct set input of any flip-flop which required a positive
pulse and were built into some unbuffered flip-flop inputs to be used for shifting and counting,
using the capacitor as a delay element. Often

DIGITAL MODULES, THE BASIS FOR COMPUTERS

one inverter would drive many capacitor-diode
combinations in the same module.
A negative capacitor-diode gate is illustrated
in Figure 14. With the level input at -3 volts and the
capacitor input at ground for a sufficient time
to allow the charge on the capacitor to become
stable, a negative level change or a negative
pulse at the capacitor input will cause the transistor to conduct. The conducting transistor
grounds the output for an amount of time determined by the gate time constant or the input
pulse width, whichever is shorter. Gates of this
type could be used to set and clear unbuffered
flip-flops by momentarily grounding the correct
flip-flop outputs in a fashion similar to the inverter gate that was added to Figure 9 to obtain
Figure 10.
The principal advantages of the capacitor-diode gates were:

1. The level input to the gate was used to
charge a capacitor and was isolated from
the rest of the circuit by a diode. Thus,
no dc load was presented to the circuit
driving the level input of a capacitor-diode gate.

2. The resistor-capacitor time constant of
the gate required that the conditioning
level be present a certain amount of time
before the pulse input occurred. This introduced a delay between the application
of a new gate level and the time the gate
was conditioned, and allowed the sampling of unbuffered flip-flop outputs at
the same time that the flip-flop was
being changed.

3. The resistor-capacitor combination differentiated level changes, permitting a
level change to create a pulse.

Figure 13. Positive C-D gate (circuit and symbol).

Figure 14. Negative C-D gate (circuit and symbol).

The use of saturating micro-alloy diffused
transistors (MADTs) and toroidal
pulse transformers appeared to be nearing an
operating limit at 10 MHz. The pulses needed
to operate the circuits shown in the previous diagrams
were 40 percent of the cycle time of 10-MHz logic (40 nanoseconds), which tightly constrained transformer recovery time and made it
difficult to design circuits that were not excessively sensitive to repetition rate. Furthermore,
gate delays were large enough to prevent some
needed logic configurations from propagating
within the 100 nanosecond interval implied by
the 10-MHz rating.
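The timing figures quoted above follow directly from the clock rate. The short Python sketch below reproduces the arithmetic; the function names are hypothetical, and treating the rest of the cycle as the time left for transformer recovery is an assumption made only for illustration.

```python
def cycle_time_ns(freq_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


def pulse_width_ns(freq_mhz, percent_of_cycle=40):
    """Pulse width when the pulse occupies a given percentage of the cycle."""
    return cycle_time_ns(freq_mhz) * percent_of_cycle / 100


cycle = cycle_time_ns(10)        # 100 ns for 10-MHz logic
pulse = pulse_width_ns(10)       # 40 ns, as quoted in the text
remainder = cycle - pulse        # assumed upper bound on recovery time
print(cycle, pulse, remainder)
```

At 10 MHz this leaves at most 60 nanoseconds between pulses, which gives a feel for why repetition-rate sensitivity was hard to avoid.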
A major break with previous circuit geometries appeared necessary. The use at IBM (in
the IBM 7030 "STRETCH" machines) of nonsaturating logic encouraged an exploration in
that direction. The project was called the "VHF
Logic" project because operation at 30 MHz or
better (the bottom end of the very high frequency (VHF) radio band) was the goal.
The complex 30-MHz flip-flops were packaged one to a module (Figure 15), with the result that a great many interconnections were
needed to implement logic functions. In systems
designed for 30-MHz operation, the use of leads
longer than a few centimeters was expected to
require special care; hence, it was thought essential for ease of use that a satisfactory transmission line hookup medium be available. A
new solid wall coaxial cable had just been introduced, the 50-ohm impedance version of
which was chosen to hook up the VHF modules. It appeared to have a strong enough center
conductor for practical hookup between modules without being too bulky for easy hand-bending.
Due to the low impedance needed for the
coaxial cable connections, substantial driving
current was necessary to achieve adequately
high signal voltages, and considerable power
had to be dissipated. The ability to drive a load
at any point along the transmission line was
deemed necessary for practical hookup, and 3-volt swings had to be available to ensure compatibility with existing modules. These needs
were met by choosing a 60-milliampere output
current, producing a 1.5-volt swing on a
double-terminated 50-ohm line and a 3-volt
swing with a 50-ohm load when interfacing to
existing slower logic. These voltage and current
levels required the addition of heat sinks to the
output transistors. This was accomplished by
installing spring clips that fastened the cases of
the transistors directly to the connector pins,
exploiting the connectors as heat sinks and at
the same time providing a minimum inductance
connection from the transistor collector (common to the case) out of the module.
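The voltage swings quoted above are an application of Ohm's law to the 60-milliampere output current. The Python sketch below checks the two cases; the helper names are hypothetical, and the model assumes an ideal current source driving purely resistive loads.

```python
def parallel(r1_ohms, r2_ohms):
    """Equivalent resistance of two resistors in parallel."""
    return r1_ohms * r2_ohms / (r1_ohms + r2_ohms)


def swing_volts(current_ma, load_ohms):
    """Voltage swing produced by a current into a resistive load."""
    return current_ma / 1000.0 * load_ohms


# A line terminated in 50 ohms at both ends looks like 25 ohms to the driver.
double_terminated = swing_volts(60, parallel(50, 50))   # about 1.5 volts
single_load = swing_volts(60, 50)                        # about 3 volts
print(double_terminated, single_load)
```

The same current that gives 1.5 volts on the double-terminated line gives the 3-volt swing needed by the slower logic when only one 50-ohm load is present.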
The VHF modules contained a novel delay
line implementation which has reappeared in
recent days in the emitter-coupled logic boards
used in the latest PDP-10 processor (KL10).
Flip-flop output delay was provided by a 10-nanosecond stripline etched onto the printed
circuit board. A meander pattern was selected
with a degree of local coupling between the
loops to achieve a 7 to 1 delay-to-risetime ratio.
Both the delayed and undelayed ends of this 50-ohm stripline were made available at the module pins. The undelayed outputs switched simultaneously with the flip-flop outputs, allowing
a subsequent gate to subtract a delayed flip-flop
output from the undelayed complement output
side of the flip-flop and produce a 10-nanosecond pulse when the flip-flop changed state.
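The way a delayed output and an undelayed complement output combine into a pulse can be illustrated with a discrete-time model. The Python sketch below is an assumption-laden simplification: signals are ideal 0/1 samples taken once per nanosecond, the function name is hypothetical, and the "subtraction" is modeled as detecting the interval in which the delayed output agrees with the undelayed complement.

```python
DELAY_NS = 10   # stripline delay from the text


def change_pulse(q_samples, delay_ns=DELAY_NS):
    """Return 1 for each sample where the delayed Q equals the undelayed NOT Q.

    In steady state the delayed Q equals Q, so it differs from NOT Q and
    the result is 0. For delay_ns samples after Q changes, the delayed Q
    still shows the old value, which matches NOT Q, giving a pulse.
    """
    pulse = []
    for t, q in enumerate(q_samples):
        delayed_q = q_samples[t - delay_ns] if t >= delay_ns else q_samples[0]
        q_bar = 1 - q
        pulse.append(1 if delayed_q == q_bar else 0)
    return pulse


# Flip-flop goes from 0 to 1 at t = 20 ns.
q = [0] * 20 + [1] * 30
p = change_pulse(q)
print(sum(p))   # number of nanoseconds the pulse is high
```

In this model the pulse is high from the moment the flip-flop changes until the change emerges from the delay line, so its width equals the stripline delay.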
The performance of the VHF modules was
rated at 30 MHz, which was the limit of the
module testers used on the production floor.
Bench testing demonstrated 40-MHz capability
with the promise of 50-MHz performance if adequate testing apparatus could be found. Risetimes were better than 1 nanosecond.
Modules delivered to customers were used to
build satisfactory high performance systems,
but the need for such high performance was not
widespread. In addition, the product development cycle was, by the standards of the time,
quite long (two years) and enthusiasm for the
VHF modules among DEC engineers waned,
further slowing product momentum. Despite
their failure as a product, with only eight modules in the series, the VHF modules eventually
made a contribution to computer progress. To
produce timesharing systems, the PDP-6
needed a way of comparing relocated addresses
at very high speed. A high speed register comparator was quickly designed using current
mode logic similar to that in the VHF modules.
As a series of general purpose products for
engineers to use, the VHF modules were too
costly and their wiring too inconvenient. Further developments in general purpose logic
modules were to lie in the opposite direction:
toward cheaper, more compact, easier to use,
and slower units.

Figure 15. 30-MHz VHF flip-flop module. (Schematic notes: unless otherwise indicated, resistors are 1/4W, 10%; transistors are DEC 2894-1; R15 and R18 are Corning type C; C1, C3, C4, C10, and C14 are Erie 390000 x5VO 102P .001 MFD 20%.)

By 1964, because of the decreasing cost of
semiconductors during the early 1960s, the cost
of System Module mounting hardware and of
wiring had become a significant portion of the
total system cost. In response to this trend, a
new type of module was developed which was a
2.5- × 5-inch printed circuit card with a color-coded plastic handle (Figure 16). The printed
circuit card provided its own mechanical support; there was no metal frame around it as
there had been in the System Module design.
The new modules, called Flip Chip modules,
plugged into 144-pin connector blocks that
could support eight such modules, providing 18
pins per module. While the improvements in the
cost of module mounting hardware realized
with the new modules were important, the major advantage of the new Flip Chip modules
was that automatic Gardner-Denver Wire-wrap
equipment could be used to wire the module
mounting blocks.

Figure 16. Single and double Flip Chip modules used in
PDP-7 and PDP-8.
The first series of the new modules was designated the R-Series and was identified by using
red handles. The R-Series circuits were a reaction to the rather complicated set of rules developed for using the previous products. The goal
was to make these modules easy to use and inexpensive. Integrated circuits were not used because they were more expensive than discrete
components, and the computer industry had
not yet decided on the type of integrated circuit
to use. The building block for R-Series logic
was the diode gate, an example of which is
shown in Figure 17. The other basic circuit was
the diode-capacitor-diode (D-C-D) circuit
shown in Figure 18. The diode-capacitor-diode
gate was used to standardize inputs to active devices such as flip-flops and to produce the logic
delay necessary to sense and change flip-flops at
the same time.
A second series of the new modules was developed for the first PDP-8s. This series was
called the S-Series, although it also had red handles. The S-Series modules used the same circuits as their R-Series counterparts, but with
variations in the values of the load resistors and
diode-capacitor-diode gate storage capacitors
to obtain greater speed.

Figure 17. Diode gate.

Figure 18. D-C-D gate.

The B-Series with blue handles was essentially the same as the 6000 Series of 10-MHz
System Modules, except that it was repackaged
on new 2.5- × 5-inch cards and used silicon
transistors rather than germanium transistors.
The new silicon transistors were a mixed blessing. While they had temperature sensitivity
characteristics superior to those of the germanium transistors, and their voltage drop characteristics permitted the elimination of the bias
resistor to +10 volts, they did not saturate as
well as the germanium transistors. Because they
did not saturate well, the voltage between the
collector and the emitter in the saturated state
was not as low as it was with germanium transistors. This meant that the series arrangement
of three inverters discussed in conjunction with
the dotted lines in Figure 4 could not be used.
Instead, only two of the silicon transistor inverters could
be connected in series if the output was intended to drive another inverter. The first computer to use the B-Series modules was the PDP-7, and the series was heavily used and extended
by the first PDP-10 processor (KA10).
Analog applications were the target market
for the A-Series modules, which had amber
handles. This series, still being manufactured
today, includes analog multiplexers, operational amplifiers, sample and hold circuits,
comparators, digital-to-analog converters, reference voltage supplies, analog-to-digital converters, and various accessory modules. The
development rate of analog modules peaked in
1971 with 38 new types and declined to 5 new
types in 1977.
While all of the preceding modules had been
designed as user-arrangeable building blocks,
the green handled G-Series was intended for
modules that would be sold only as part of a
system. For example, all of the DEC core memory circuits have been in the G-Series because a
core memory system is sufficiently complex that
a cookbook approach using a standard series of
modules is not appropriate. The G-Series is still
actively used today for circuits other than logic,
generally in peripheral devices such as disks,
tapes, and terminals.
Like the A-Series and G-Series, the W-Series
(white handle) is still manufactured and is used
to provide input/output capability between
Flip Chip modules and other devices. Lamp
drivers, relay drivers, solenoid drivers, level
converters, and switch filters are included in
this family, but the only modules used widely
today are those modules which include cable
termination modules and blank boards upon
which the user can mount integrated circuits
and wire-wrap them together.
While the W-Series modules provided a variety of interface capabilities, their circuitry was
still too fast for typical industrial applications.
Computer logic, by its very nature, is high speed
and provides noise immunity far below that required in small-scale industrial control systems
located physically close to the process they control.
Unfortunately, industrial electrical noise is
not predictable to the nearest order of magnitude. Thus, attempts to solve noise problems
with high level logic, whose voltage thresholds
were merely a few times greater than computer
logic thresholds, did not work well.
A new series of modules was developed, the
K-Series (with blac(K) handles), which relied
on a combination of voltage, current, and time
thresholds to protect storage elements such as
flip-flops and timers from false triggering. Since
industrial controls typically interact with physically massive equipment which moves slowly
relative to electronic speeds, time thresholds are
particularly attractive. There are four ways of
exploiting these:

1. Using basic 100 kHz slow-down circuits
everywhere.

2. Making optional 5 kHz slow-down circuits available.

3. Providing transition-sensitive (edge-detecting) circuits with hysteresis to allow
additional discrete capacitor loading of
the input when all else fails.

4. Replacing the conventional monostable
multivibrator or "one-shot" circuit with
a timing circuit which has both a low impedance and hysteresis at the input.
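The combination of a time threshold with hysteresis can be illustrated by a small software analogy. The Python sketch below is not a model of any K-Series circuit: the first-order low-pass filter stands in for a slow-down circuit, the two thresholds stand in for hysteresis, and the function name and all numeric values are assumptions chosen only for illustration.

```python
def filtered_trigger(samples, alpha=0.1, low=0.3, high=0.7):
    """Low-pass filter the input, then apply a two-threshold (hysteresis) trigger.

    samples: sequence of input values between 0.0 and 1.0
    alpha:   filter coefficient; smaller values respond more slowly
    low:     the output drops to 0 only when the filtered level falls below this
    high:    the output rises to 1 only when the filtered level exceeds this
    """
    level = 0.0
    state = 0
    output = []
    for sample in samples:
        level += alpha * (sample - level)     # first-order slow-down
        if state == 0 and level > high:
            state = 1
        elif state == 1 and level < low:
            state = 0
        output.append(state)
    return output


# A three-sample noise spike is ignored; a sustained input is accepted.
spike = filtered_trigger([0] * 5 + [1] * 3 + [0] * 20)
steady = filtered_trigger([1] * 50)
print(max(spike), steady[-1])
```

A short burst never raises the filtered level to the upper threshold, so the stored state is unaffected, while a signal that persists long enough switches it. The cost is response time, which is acceptable when the controlled equipment moves slowly relative to electronic speeds.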

The hardware for the K-Series was specifically designed to fit the NEMA (National Electrical Manufacturers Association) enclosures
traditionally used with relay implemented industrial controls. The K-Series used the same
connectors as the other Flip Chip modules,
however. Sensing and output terminals were
provided with screw terminals and indicator
lights, and appropriate arrangements were
made to interface with 120-volt ac devices.
Wire-wrap terminals were protected from external voltages but were available for oscilloscope
probes. Magnetically latched reed relays and
diode arrays that could be programmed by
snipping out diodes were provided as memory
elements that would retain data during power
failures.
Gating in early K-Series modules was accomplished with discrete diode-transistor circuits
such as that shown in Figure 19. Other K-Series
modules used integrated circuits for the logic
functions. In these designs the inputs to the integrated circuits were protected with filter/trigger circuits which filtered out the noise
and then restored the fast risetimes required by
the integrated circuits. Outputs were protected
from output-induced noise and converted to
standard K-Series signals by circuits similar to
those used in the discrete logic gates.

Figure 19. K-Series circuit.

Unlike other DEC modules, the K-Series
modules were not directly useful for constructing computers or computer data processing
subsystems due to their low speed and high
cost. They did play an important part in bringing digital logic into industrial applications, and
the noise protection techniques developed for
these modules were useful in the design of the
PDP-14 Industrial Controller (Chapter 7).
By 1967 the electronics world had settled on
transistor-transistor logic (TTL) and the dual
in-line package (DIP) as the technology and
package of choice for integrated circuits. In addition, the cost for logic functions implemented
in TTL integrated circuits had dropped below
that of discrete circuit implementations. With
much more logic fitting into the same printed
circuit board area, a single Flip Chip card could
now accommodate much more complicated
functions. However, there were not enough
connector pins available to get the necessary
signals on and off the card. The answer to the
problem was to keep the cards the same size,
but to have etch and associated contacts on
both sides of the printed circuit board. This increased the number of contacts from 18 to 36,
and a new series with magenta handles (the M-Series) was born. Subsequently, some G-Series
and W-Series modules were also designed with
integrated circuits and double-sided boards.
The advent of transistor-transistor logic
brought the first power supply and signal level
change in DEC's history. The -15-volt and
+10-volt supplies were no longer required.
Only a single +5-volt supply was needed to supply the logic signals, which were now 0 and +3
volts. The packaging was kept consistent, however, as the old single-sided modules could be
plugged into the new connector blocks. Careful
attention to pinning arrangements allowed half
of the circuits of a double-sided module to be
used in a single-sided block.

Figure 20. Basic TTL NAND gate circuit.
The basic TTL circuit is the NAND gate
shown in Figure 20. Since the change to TTL
logic brought a change in logic symbols, a
sample of the new symbology is also shown in
Figure 20.
The input of the TTL gate is a multiple emitter transistor. If either input is at or near
ground (0 to 0.8 volts), transistor Q1 becomes
saturated, bringing the base voltage of transistor Q2 low, turning off transistor Q3 while turning on transistor Q4, and making the output
high (+2.4 to +3.6 volts). If both inputs are
high (above 2 volts), Q2 has base current supplied to it through the collector diode of Q1,
turning Q2 on. This in turn provides base current to Q3, saturating it and cutting off Q4,
making the output low (0 to 0.4 volts).
Like the transistor inverter circuits discussed
in conjunction with System Modules, TTL
NAND gates can be cross-connected to form
flip-flops.
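The NAND behavior and the cross-connection into a flip-flop can be sketched at the logic level. The Python fragment below is a minimal illustration under stated assumptions: the input thresholds are the 0.8-volt and 2-volt figures from the text, the function names are hypothetical, and the latch is modeled by iterating two ideal NAND gates until their outputs settle rather than by simulating the transistors.

```python
V_LOW_MAX = 0.8    # inputs at or below this are treated as low
V_HIGH_MIN = 2.0   # inputs at or above this are treated as high


def logic_level(volts):
    """Map an input voltage to 0, 1, or None if it falls between thresholds."""
    if volts <= V_LOW_MAX:
        return 0
    if volts >= V_HIGH_MIN:
        return 1
    return None


def nand(a, b):
    """Output is low only when both inputs are high."""
    return 0 if (a and b) else 1


def ttl_nand(volts_a, volts_b):
    """NAND on voltages; raises if an input is in the undefined region."""
    a, b = logic_level(volts_a), logic_level(volts_b)
    if a is None or b is None:
        raise ValueError("input voltage between logic thresholds")
    return nand(a, b)


def nand_latch(set_n, reset_n, q=0, q_bar=1):
    """Two cross-connected NAND gates with active-low set and reset inputs."""
    for _ in range(4):                      # a few passes are enough to settle
        new_q = nand(set_n, q_bar)
        new_q_bar = nand(reset_n, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar
```

Pulling the set input low with `nand_latch(0, 1)` gives `(1, 0)`; returning both inputs high with `nand_latch(1, 1, q=1, q_bar=0)` holds that state, which is the storage property the text refers to.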


The first generation of M-Series modules was
used in a redesign of the PDP-8, called the
PDP-8/I. These modules
used TTL integrated circuits, which were called
7400 series integrated circuits because of a
growing tendency in the semiconductor industry to standardize part numbers for TTL circuits, calling a package of 4 NAND gates a
7400, a package of 6 inverters a 7404, etc. Soon
there was a need in the computer industry for
higher speed circuits. This need led to the development of the 74H00 series. The 74H00 circuits
were similar to those in the earlier 7400 series,
but they were faster and used much more
power. The first PDP-11 (the PDP-11/20), the
second PDP-10 processor (KI10), and the PDP-8/E used both 7400 and 74H00 series integrated
circuits. The PDP-11/45, designed between
1970 and 1972, used Schottky TTL, a circuitry
with such rapid switching speeds and high
power consumption that four-layer boards had
to be used so that the inner layers of power
and ground etch could provide both shielding
and an adequate supply of power and ground.

In 1972 work began on a new PDP-10 processor, the KL10. This used current switching nonsaturating logic from several vendors, including
the MECL (Motorola Emitter Coupled Logic)
10,000 series. This line of circuits is in some
ways an integrated circuit version of the VHF
modules. The basic gate is shown in Figure 21.
In the circuit shown in Figure 21, transistor
Q6 has a temperature compensated, internally
generated reference voltage of -1.3 volts on its
base. The outputs drive 50-ohm terminated
transmission lines returned to -2 volts. There is
a complementary pair of outputs so that the circuit is both an OR and a NOR gate. At 25 degrees Celsius the upper level will be between
-0.81 and -0.96 volts, while the lower level
will be between -1.65 and -1.85 volts. The circuits, like the Schottky circuits, are so fast that
multi-layer boards are required. In addition, a
great deal of care in signal line termination is
required. As with the previous logic families
studied, flip-flops can be created. The ECL
master-slave flip-flops are quite complex, typically requiring 32 transistors and 7 diodes.

Figure 21. ECL circuit.

10\1 nput.Output.Register,
PC \Program .Coun ter< 6: 17>,
OV\Overflow<>,
PF\Program.Flags< I :6>,
RUN< >

AC\Accumulator,

**

**

Memory.State

PC\Program.Counter<5: 17>,
L\Link< >,
RUN< >

**

Memory.State

**

M\Memory[0:4095]<0: 17>,

M\Memory[0:8l91]<0:17>,

**

**

Console.State

**

Console.State

**

TWS\ Test.Word.Switches,
SS\Sense.Switches< 1:6>,
AS\Address.Switches,

AS\Address.Switches,

**

**

Instruction.Format

**

i\instruction,
:= i<0:4>,
op
:= i<5>,
ib< >
:= i<6:17>,
y<6:17>
:= i<6>,
c1i< >
:= i<7>,
lat< >
:= i<8>,
cma< >
:= i<9>,
hlt< >
:= i,
c1a< >
:= i,
lap< >
:= i<14:17>,
stf< 0:3>
:= i<14:17>,
c1f<0:3>
:= i<7>,
spi< >
:= i<8>,
szo< >
:= i<9>,
sza< >
:= i,
spa< >
:= i,
sma<>
:= i<12:14>,
szs<0:2>
:= i<15:17>,
szf<0:2>

**

Effective.Address

z = ib@y = M[y]<5:17>
End

Figure 6.

! Operation Code
! Indirect Bit
! Address
! Clear 10
! OR AC and Test Switches
! Complement AC
! Halt
! Clear AC
! Load PC
! Set Program Flags
! Clear Program Flags
! Skip if Positive 10
! Skip if Zero OV
! Skip if Zero AC
! Skip if Positive AC
! Skip if Negative AC
! Skip if Zero Switches
! Skip if Zero Flags

Instruction.Format

**

i\instruction,
:= i<0:3>,
op<0:3>
:= i<4>,
ib< >
:= i<5:17>,
y<5:17>
:= i<5>,
c1a< >
:= i<6>,
c11< >
:= i<7>,
rt< >
:= i<12>,
hlt< >
:= i< 13>,
rar< >
:= i<14>,
ral< >
:= i<15>,
oas< >
:= i<16>,
cml< >
:= i<17>,
cma< >
:= i<8>,
is< >
:= i<9>,
szl< >
:= i<9>,
snl< >
:= i,
sna< >
:= i,
sza< >
:= i,
spa< >
:= i,
sma< >

**

**

z<6: 17> :=
Begin
z = y Next
Repeat Begin
If Not ib =9 Leave z Next

End,

ACS\AC.Switches,

Effective.Address

! Operation Code
! Indirect Bit
! Address
! Clear AC
! Clear L
! Rotate Twice
! Halt
! Rotate Right
! Rotate Left
! OR AC and Switches
! Complement L
! Complement AC
! Invert Sense of Skip
! Skip if Zero Link
! Skip if Non-Zero Link
! Skip if Non-Zero AC
! Skip if Zero AC
! Skip if Positive AC
! Skip if Negative AC

**

z<5:17> :=
Begin
z = y Next
! indefinite indirect

If Not ib =9 Leave z Next
If z Eqv #OOOI? =9 M[z] = M[z]
z = M[z]<5:17>

+

End,

PDP-1 and PDP-4 ISPS description (courtesy of Mario Barbacci) (part 1 of 5).

I Next

THE PDP-1 AND OTHER 18-BIT COMPUTERS

**

instruction.lnterpretation

**

** Instruction. Interpretation **

interp :=
Begin
Repeat Begin
If Not RUN '9 Stop( ) Next
i = [vi [PC] Next
PC = PC + I Next
execute( )
End
End,

interp : =
Begin
Repeat Begin
If Not RUN ~ Stop( ) Next
i = M [PC] Next
PC = PC + I Next
execute( )
End
End,

execute :=
Begin
Decode op '9
Begin
! Load and Store Group
lac :=AC = M[z()].
! Load Accumulator
lio :=10 = M[z( )],
! Load I/O Register
! Load Immediate (sign extension)
law :=AC< = ib@y,
dac : = M [z( )] = AC,
! Deposit Accumulator
dio :=M[z()] = 10,
! Deposit 1/0 Register
dap :=M[z( )]<6:17> = AC<6:17>,! Dep. Address Part
dip :=M[z( )]<0:5> = AC<0:5>,! Deposit Instruction Part
dzm := M[z( )] = 0,
! Deposit 0 in Memory

execute :=
Begin
Decode op =>
Begin
! Load and Store Group
lac :=AC = M[z( )].

! Arithmetic and Logical Group
add :=Begin
OV@,AC = AC + M[z()] Next
If AC Eqv #777777 = > AC = 0
End,
sub : = Begin
OV@AC = AC - M[z()] Next
If AC Eqv #777777 = > AC = 0
End,
mus :=Begin
! Multiplication Step
IfI0<17> ~ AC = AC + (usl M[z()] Next
AC@IO = (AC@IO) SrO I Next
If AC Eqv #777777 => AC = 0
End,
dis : = Begin
! Division Step
AC@IO = AC@IO@(NotAC<0»Next
IfI0<17> ~ AC = AC -Iusl M[z()] Next
If Not 10<17> ~ AC = AC + Ius M[zO] + I Next
If AC Eqv #777777 => AC = 0
End,
and. :=AC = AC And M[z()].
ior :=AC = AC Or M[z( )].
xor. :=AC = AC Xor M[z( )],
! Program Control Group
jmp :=PC = z(),
! Jump
jsp : = Begin
! Jump and Save PC
AC = OV@'OOOOO@PC Next
PC = Y
End,

! Arithmetic and Logical Group
add :=Begin
L@AC = AC + locI M[z( )] Next
If AC Eqv #777777 => AC = 0
End.
tad : = L@AC = AC + Itcl M [z( )],

Figure 6.

dac :=M[z()] = AC,

dzm :=M[z()] = 0,

and. :=AC = AC And M[z( )].
xor. :=AC = AC Xor M[z( )].
! Program Control Group
jmp :=PC = z(),
jms:= Begin
M[z( )] = L@'(){)()()@,PC Next
PC=z+1
End,

PDP-1 and PDP-4 ISPS description (courtesy of Mario Barbacci) (part 2 of 5).


BEGINNING OF THE MINICOMPUTER

cal.jda : = Begin
Decode ib =';>
Begin
! Subroutine Call
cal := Begin
M[#IOO] = AC Next
AC = OV@'OOOOO@PC Next
PC = #101
End.
jda : = Begin
! Jump and save AC
M[z()] = AC Next
AC = OV@'OOOOO@PC Next
PC=y+1
End
End
End.
idx :=Begin
!Index
AC = M[z( )] + 1 Next
If AC Eqv #777777 =';> AC = 0 Next
M[z] = AC
End.
isp : = Begin
! Increment and Skip if Positive
AC = M[z( )] + 1 Next
If AC Eqv #777777 =';> AC = 0 Next
M[z] = AC;
I fA C Geq 0 =';> PC = PC + 1
End.
sad :=If AC Neq M[z()] =';> PC = PC + I, ! Skip if AC Differs
sas :=If AC Eql M[z()] =';> PC = PC + I. ! Skip if AC is Same
xct : = Begin
! Execute
i = M[z()] Next
Restart exec
End,
iot : = undefined( ),
sft : =shift.rotate.group( ).
skp :=skip.group().
opr : = operate.group( ).

Otherwise:= RUN = 0
End
End.

! Undefined Operations

skip< >.

! Result of Condition Tests

skip.group : =
Begin
skip = 0 Next
Decode ib =';>
Begin
o := Begin
! True Test
Ifszo And (OV Eqv 0) =';> (skip = I; OV = 0);
Ifsza And (AC Eql 0) =';> skip = I;
Ifspa And (AC Geq 0) =';> skip = I;
Ifsma And (AC Lss 0) =';> skip = I;
Ifspi And (10 Geq 0) =';> skip = I;
! Test Sense Switches
Decode szs =';>
Begin
#0 :=No.Op().
#7 :=lfSSEqIO=,;>skip= I.
Otherwise:= IfSS EqvO =';> skip = I
End;

Figure 6.

cal

: = Decode ib =';>
Begin
0:=
Begin
M[#20] = L@'OOOO®PC Next

1 :=

PC = #21
End.
Begin
M[M[N20]C = L@'OOOO@PCNext
PC = M[N20] + Ius} 1
End

End.

isz

: = Begin ! Increment and Skip if Zero
M[z] = M[z()] + 1 Next

If M[z] Eql 0 = > PC = PC + 1
End.
sad :=If AC Neq M[z()] =';> PC = PC + I.
xct

iot

:=Begin
i = M[z()] Next
Restart exec
End.
:=Undefined(),

opr.law := Decode ib :9
Begin
O\opr : = operate.group( ).
1\law:= AC ~ Y
End.
Otherwise ;= RUN = 0
End
End.
skip< >.

! Result of Condition Tests

skip.group : =
Begin
skip = 0 Next
Decode is =';>
Begin
Begin
0:=
If snl And (L Xor 0) =';> skip = I;
Ifsza And (AC Eql 0) => skip = I;
Ifsma And (AC Lss 0)
End.

PDP-1 and PDP-4 ISPS description (courtesy of Mario Barbacci) (part 3 of 5).

=';>

skip = I

! True Test


Decode szf =?
! Test Program Flags
Begin
fIIJ :=No.Op(),
#7 :=IfPF Eql 0 =? skip = I,
Otherwise: = If PF < szf> eqv 0 =? skip = I
End
End,
:= Begin
! Reverse Test
If szo And (OV Xor 0) = > (skip = 1~ O\l = 0);
Ifsza And (AC Neq 0) =? skip = I;
Ifspa And (AC LssO) =? skip = I;
If sma And (AC Geq 0) =? skip = I;
Ifspi And (10 Lss 0) =? skip = I;
Decode szs =?
! Test Sense Switches
Begin
fIIJ
No.Op(),
#7
IfSSNeqO=?skip= I,
Otherwise:= IfSS XorO =? skip = 1
End;
! Test Program Flags
Decode szf =?
Begin
fIIJ
No.Op(),
#7
IfPFNeqO=?skip=l,
Otherwise: = If PF  Xor 0 => skip = 1
End
End
End Next
If skip =? PC = PC + I! Skip
End,
operate.group : =
Begin
If hIt =? RUN = 0;
If cIa =? AC = 0;
If cIi => 10 = 0;
Decode cIf =?
Begin
flIJl:flIJ6:= PF = AC Or OV;
AC =0;
AC<6: 17> = PC
End Next
If cma => AC = Not AC

I :=

Begin
lfszl And (L Eqv 0) =? skip = I;
Ifsna And (AC Neq 0) =? skip = I;

If oas =? AC = AC Or ACS;

If cma =? AC = Not AC;

If cml =? L = Not L;
End,

shift.rotate.group( )
End,

! Shift and Rotate Operations

! Shift and Rotate Operations

hardware function ones(x <0:8> )<0:3>,

Figure 6.

! Reverse Test

Ifspa And (AC Geq 0) =? skip = 1
End

End Next
Ifskip =? PC = PC + 1
End,
operate.group : =
Begin
If hit =? RUN = 0;
skip.group( ) Next
If cIa =? AC = 0;
If cIl =? L = 0;
If rt =? shift.rotate.group( ) Next

! Count Number of I's in x

PDP-1 and PDP-4 ISPS description (courtesy of Mario Barbacci) (part 4 of 5).


! Skip


shift.op<0:3> := i<5:8>,

! Shift Conditions

shift.n<0:8>

! Shift Count

:= i<9:17>,

shift.rotate.group : =
Begin
Decode shift.op 9
Begin
! Rotates
{f01 \ral : = AC = AC Sir Ones(shift.n),
#11 \rar : = AC = AC Srr Ones(shift.n),
{f02\ril : = 10 = 10 Sir Ones(shift.n),
#12\ril : = 10 = 10 Srr Ones(shift.n),
{f03\rc1 : = AC@IO = (AC@IO) Sir Ones(shift.n),
#13\rcr : = AC@IO = (AC@IO) Srr Ones(shift.n),
! Shifts
{f05\sal := Decode AC 9
Begin! AC Left
o := AC = AC SIO Ones(shift.n),
1 : = AC = AC Sll Ones(shift.n)
End,
#15\sar := AC = AC Srd Ones(shift.n),
{f06\sil : = Decode 10<0> 9
Begin
o := 10 = 10 SIO Ones(shift.n),
1 : = 10 = 10 Sll Ones(shift.n)
End,
#16\sir := 10 = 10 Srd Ones(shift.n),
{f07\sc1 := Decode AC 9
Begin
0:= AC@IO = AC@IOSIOOnes(shift.n),
1 := AC@IO = AC@IOSII Ones(shift.n)
End,
#17\scr := AC@IO = (AC@IO)SrdOnes(shift.n),
Otherwise := Undefined()
End
End
End

Figure 6.

shift.rotate.group : =
Begin

! AC Left
! AC Right
! 10 Left
! 10 Right

If Tal 9 L@AC = (L@AC) Sir I;
If rar 9 L@AC = (L@AC) Srr 1

! AC@IOLeft
! AC@IO Right

! AC Right
! 10 Left

! 10 Right
! AC@IO Left

! AC@IO Right

! End of Description

End
End

! End of Description

PDP-1 and PDP-4 ISPS description (courtesy of Mario Barbacci) (part 5 of 5).
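As a reading aid for the PDP-1 column of the ISPS description, the instruction fields and the effective address calculation can be paraphrased in Python. This is a hedged sketch, not a transcription: it assumes the field layout given in Instruction.Format (op in bits 0 to 4, ib in bit 5, y in bits 6 to 17, with bit 0 as the most significant bit of the 18-bit word), the function names are hypothetical, and the limit on indirection levels is an addition to keep the example from looping forever.

```python
ADDRESS_MASK = 0o7777     # y occupies bits 6 to 17, the low 12 bits


def decode(word):
    """Split an 18-bit PDP-1 word into (op, ib, y)."""
    op = (word >> 13) & 0o37    # bits 0 to 4
    ib = (word >> 12) & 1       # bit 5, the indirect bit
    y = word & ADDRESS_MASK     # bits 6 to 17
    return op, ib, y


def effective_address(word, memory, max_levels=4096):
    """Follow indirect references until a word with ib = 0 is reached.

    Each indirect word supplies both a new address and a new indirect
    bit, so indirection can continue through any number of levels.
    max_levels is a safety limit for this sketch only.
    """
    _, ib, z = decode(word)
    levels = 0
    while ib:
        if levels >= max_levels:
            raise RuntimeError("indirect chain did not terminate")
        _, ib, z = decode(memory[z])
        levels += 1
    return z


# Example: an instruction that refers indirectly through location 100 (octal).
memory = [0] * 4096
memory[0o100] = (1 << 12) | 0o200    # indirect again, through 200
memory[0o200] = 0o300                # direct: final address is 300
instruction = (1 << 12) | 0o100
print(oct(effective_address(instruction, memory)))
```

The op field is extracted but not interpreted here, since the sketch is concerned only with address formation.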

This controller, which operated under program
control, used a minimum of hardware, but it
used 100 percent of the processor's time when it
was reading or writing data. For high speed operation, the various tape movement signals were
connected directly into the program flags. To
minimize hardware, there were no word buffers
in the controller; instead, characters were assembled in the processor's I/O register. While a
controller that requires 100 percent of a
$120,000 computer's attention would not be designed today, this structure is identical to mod-

ern day microprocessor-based controllers that
occupy 100 percent of a much cheaper processor's time. Thus, each computer generation
goes exactly through all the stages of evolution
of the predecessor generations. (A similar concept, the "wheel of reincarnation," is discussed
in the Chapter 7 description of displays.)

Figure 7. Program control-based magnetic tape control from PDP-1 register transfer diagram.

Figure 8. PDP-1/A prototype (circa 1960).

Figure 9. PDP-1/A CRT console.

Figure 10. PDP-1/B at BBN (circa 1960).

The PDP-1 engineering prototype (1/A) is
shown in Figure 8. It was first shown in Boston
at the Eastern Joint Computer Conference in
December 1959. The cathode ray tube was integrated into the console, as shown in Figure 9,
but this design was subsequently dropped for
cost reasons. The use of a cathode ray tube integrated into the console never returned to the
DEC main line of computers, except briefly in a
few PDP-6s and in the LINC and PDP-12 laboratory computers. In modern fourth generation (large-scale integration) computers, the
entire computer is integrated into the cathode
ray tube housing.
Bolt, Beranek, and Newman (BBN), a consulting firm in Cambridge, Massachusetts, purchased the first production machine (1/B) for
delivery in November 1960. This machine is
shown in Figure 10. A third machine, similar to
the 1/A and 1/B, was constructed for internal
use.

Figure 11. PDP-1/C production version (circa 1961).

After building the first three machines, it was
clear that modifications were needed to improve producibility, lower production costs,
and improve reliability. The separate console
required many cables, and the connectors between the console and the computer were unreliable. For this reason, the final design (called
the PDP-1/C) used an operator/maintenance
console integrated into the cabinets, as shown
in Figure 11. The cabinets were produced by
DEC and were designed as air plenums to improve air flow by having air enter at the bottom
of the cabinet and flow past all the modules.
The PDP-1/C cabinet design and module
mounting scheme were used directly in the
PDP-4 and PDP-5 computers and have remained relatively unchanged (except for airflow
direction) through the years. They are being
used in housings of the smaller metal-boxed
minicomputers and in options of the third (integrated
circuit) and the fourth (large-scale integrated circuit) generations.
The PDP-1/C design used four cabinets instead of the three cabinets of the earlier versions
and preassigned the space in those cabinets for
improved producibility and configuration control. Each of the multiply-divide, sequence
break, memory extension control, and high
speed channel options had an assigned location.
Figure 12 shows the numerous options that
were offered for the PDP-1. Figure 13 shows a
side view of a typical cabinet and shows the
space for interconnecting to other options. Expansion was accommodated by adding bays to
the basic four-bay mechanical structure and by
interconnecting stand-alone options via cables.
Rather than the bused connection scheme commonly used today, the PDP-1 used a radial interconnect system. The radial design of the I/O
structure and the free-standing controllers for
the magtape, displays, card equipment, printer,
and other devices made cabling relatively easy.


Figure 12. PDP-1 system block diagram.

As with device controllers, history is repeating
itself today in this area, as new fourth generation designs are returning to radial interconnect due to the decreased cost of logic, the
high cost of interconnect, and the need to
bound the system.
The additional year of module design between American Research and Development's
permission to construct computers and DEC's
actual commencement of computer construction
had permitted more low speed (500 KHz)
modules to be designed. These newer modules
used the same circuit techniques as their predecessors, but they used less expensive, slower
transistors. These new modules were used for
the I/O equipment. The PDP-1 was built from
only 34 module types, including memory modules. Each module type was fully general purpose, except the five module types that were
used for the analog memory circuitry. The module types are shown in Table 1.

Table 1. PDP-1 Modules

Circuit Type                              High Speed        Low Speed
                                          (5 MHz Clock)     (500 KHz Clock)
Inverters, gates, decoders                      7                 5
Pulse amplifiers, delay lines                   4                 2
Flip-flop configurations                        2                 3
Special drivers, signal conditioning            4                 2
Core memory circuits                            5
Total                                          22                12

Figure 13. PDP-1/C logic layout diagram.

Because of its short word length and high
speed, the PDP-1 was particularly suited to the
laboratory and scientific control applications
that were to emerge later in the second generation. The small, scientific computers from
Bendix (G-15) and Librascope (LGP-30) had
longer word lengths and cost less than the PDP-1, but they were slower because of their serial
design, which was dictated by the use of a drum
as primary memory. This slow speed limited the
utility of these machines in computation, control, and laboratory applications.
There were some market credibility problems
which inhibited PDP-1 sales. It was an unorthodox machine in that it had high speed, a short
word length, and no built-in floating-point
arithmetic. Also, potential buyers doubted that
a company with only 100 employees and less
than a million dollars in sales could be a reliable
and long-lived computer supplier.
The first few PDP-1s were sold for the anticipated applications in scientific computation
and real-time control. Users directly interacted
with the computer via its typewriter, cathode
ray tube, and console. Customers included:
Lawrence Livermore Laboratory (for peripheral
support processing to their large scientific
calculators and for graphics I/O); Bolt, Beranek and Newman (for psycho-acoustics and
general computer science research); and Atomic
Energy of Canada Limited (for pulse height
analysis and Van de Graaff generator experiment
control). The most important sale in terms of
DEC's future was to International Telephone
and Telegraph (ITT), which used PDP-1s in
message switching systems.
Nearly half of the PDP-1s constructed were
used, as the ADX 7300, for the ITT message
switching application. The application was, in
essence, the automation of a torn tape switching
center. In a torn tape switching center, messages
are received punched on tape, and the tapes are
hand carried to a tape reader appropriate to the
message's destination. In the computerized version, up to 256 teleprinter lines could be
switched under program control in a store and
forward scheme on a character-by-character
basis using the interrupt facility of the PDP-1.
The PDP-1 was uniquely suited for this application because of its high speed and high performance Sequence Break System which
permitted low cost teleprinter line interfaces.


Aside from the experience gained from having to produce computers that could run unattended and without service, the most important
result of the ITT order was that it allowed DEC
to build a number of identical machines without
special engineering. This in turn provided a production base with decreased costs (as described
in Chapter 3) and a discipline to be less special
systems oriented. The first few machines ordered by other customers had been nearly all
different, requiring DEC to build options that
were sold only a few times. In addition, many of
those machines had interfaces that were unique
to the applications.
It should be noted that because the hardware
for the PDP-1 was relatively inexpensive, DEC
could afford to stock an ample supply of basic
modules for building special interfaces. Constructing interfaces and specialized hardware
was relatively easy compared to modern day
hardware design. Also, design errors could be
corrected with simple wiring changes - a much
easier process than that demanded by the modern day, where expensive printed circuit boards
have fine etch lines to be cut and read-only
memories to be changed. Finally, the special interfaces and controllers for the PDP-1 were
quite simple compared to modern designs.
While the ITT sale was important to DEC's
future, the Bolt, Beranek, and Newman (BBN)
sale was important to the future of the entire
computer industry because it was one of the
events leading to the development of timesharing. A number of computer scientists at
M.I.T. and BBN believed that it was necessary
to provide interactive access to computers. The
only way to make this economically viable was
to simultaneously share the computer among
the users. Three experiments were carried out to
demonstrate its feasibility: the IBM 7090 system
at M.I.T. [Corbato et al., 1962] which later became the Compatible Time Sharing System
(CTSS), the multiuser PDP-1 at M.I.T. [Dennis, 1964] which was operational in 1963, and
the shared PDP-1 at BBN [McCarthy et al.,
1963].

Batch multiprogramming [Strachey, 1959]
was an important part of the design of the
Stretch computer [Buchholz, 1962] and the
Atlas computer [Kilburn et al., 1962]. They
were oriented toward hardware efficiency in
that they aimed for high utilization of all components. Timesharing, on the other hand, was
concerned with the efficiency of the people trying to use the computer - the efficiency of the
man-computer interaction [Corbato et al.,
1962].
A set of requirements was identified for a
timesharing system. Unless the workload was
restricted to programs that were specially designed to run concurrently and to programs
that were error-free, one needed the following:

1. Memory protection.
2. Program and data relocatability.
3. A supervisor program.
4. A timed return to the supervisor.
5. Interpretive execution of the I/O instructions.

The BBN timesharing system began operation in September 1962. Five teleprinter users
shared the upper 4 K words of memory; the
lower 4 K words held the supervisor program,
called the "channel 17 routine." The modifications to the PDP-1 to effect timesharing were
embodied in the "restricted mode" of operation. They matched the above requirements in
the following way:

1. Memory protection. Switching between
   the two 4-Kword areas required the use
   of an I/O instruction.
2. Program and data relocatability. Because
   only one user was resident at one time,
   this was not needed.
3. A supervisor program. The channel 17
   clock routine fulfilled this function.
4. A timed return to the supervisor. The
   channel 17 clock generated an interrupt
   every 20 milliseconds.
5. Interpretive execution of I/O instructions.
   Whenever the PDP-1 was in restricted
   mode, an attempt to obey an I/O instruction
   caused a sequence break.

The TYC Control Language, a debugging aid
adapted from the DDT language devised for the
PDP-1 and its predecessor languages, was regarded as important because it allowed direct
language program debugging. The "restricted
mode" modifications, a high speed swapping
drum, and the use of the new multiport memory
designed for the PDP-6 formed the PDP-1/D
design. Timeshared computers were built and
operated at BBN, Stanford, and M.I.T. These
timesharing efforts later influenced the use of
timesharing in the PDP-6 (Chapter 21).

About two years after the PDP-1 was first
shown, the notion of a much smaller machine
developed during discussions of process control
applications with Foxboro Corporation and
various other customers. A machine called the
DC-12 Digital Controller was proposed. This
would be a 12-bit computer oriented toward
process control data collection and laboratory
data processing. During the preparation of the
proposal, the CDC 160 was studied, and the
DEC engineers briefly considered building a
copy or version of the 10-bit L-1 computer designed by Wes Clark at Lincoln Laboratory.
However, the principal idea input for the
Digital Controller came from another Wes
Clark computer, the Laboratory Instrument
Computer (LINC).
The DC-12 Digital Controller was never built
by that name; instead, it became the PDP-5
(Chapter 7). Some of the ideas studied in the
LINC and L-1 were used in other DEC machines,
including the machine that became the
PDP-1 successor, the PDP-4 (Figure 14). The
PDP-2 designation was saved for a possible 24-bit machine, but none was ever built. DEC also
never built a PDP-3, although one was designed
on paper as a 36-bit machine.*

THE PDP-4

The decision to make the next machine an 18-bit machine, rather than a 12-bit machine, was
taken very lightly when it was made in December of 1962. In retrospect, it may have been a
poor decision, but the reasoning went somewhat as follows.
Based on the programming experience of the
TX-0, Gordon Bell felt that an 18-bit machine
significantly simpler than the PDP-1 could be
built and that simple machines with few instructions for a given number of data-types would
perform nearly as well as those with more instructions. This feeling was based on the use of
Whirlwind, TX-0 as it evolved through its various versions, and the PDP-1. This was later
proven to be true, as the PDP-4 was implemented in less than half the space of the PDP-1
and provided 5/8 the performance for 1/2 the
price. There is some question, however, as to
how much of the size reduction was due to the
simpler architecture, how much to the substantially better logic design implementation,
and how much to the increased logic packing
density.
Gordon Bell had conceived the idea of auto-incrementing memory registers. This allowed
vectors to be accessed easily instead of using index registers. The auto-incremented memory
registers performed about as well as index registers and were much less expensive to implement.
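
The idea of an auto-incrementing memory register can be illustrated with a short Python sketch. It is not taken from the PDP-4 documentation: the text above does not say which memory locations auto-increment or exactly when the increment happens, so the address range (AUTO_INDEX) and the increment-before-use behavior below are assumptions made for illustration, as are the function names.

```python
WORD_MASK = 0o777777              # 18-bit word

# Assumption for illustration: a small group of memory locations
# auto-increments when used as an indirect pointer.
AUTO_INDEX = range(0o10, 0o20)


def indirect_address(memory, pointer_loc):
    """Return the effective address for an indirect reference.

    If the pointer lives in an auto-incrementing location, its contents
    are incremented first (assumed behavior), then used as the address.
    """
    if pointer_loc in AUTO_INDEX:
        memory[pointer_loc] = (memory[pointer_loc] + 1) & WORD_MASK
    return memory[pointer_loc]


def sum_vector(memory, pointer_loc, count):
    """Add up `count` consecutive words using only the memory pointer."""
    total = 0
    for _ in range(count):
        total = (total + memory[indirect_address(memory, pointer_loc)]) & WORD_MASK
    return total


memory = [0] * 64
memory[0o40:0o44] = [1, 2, 3, 4]  # a four-word vector starting at 0o40
memory[0o10] = 0o37               # pointer starts one word before the vector
vector_sum = sum_vector(memory, 0o10, 4)
```

The loop steps through the vector without any index register in the processor; the only state is one ordinary memory word, which is the cost advantage the text describes.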
The PDP-1 had used one's complement arithmetic, which was especially poor for the fast
multiple precision operations and floating-point arithmetic that DEC's customers needed.

* In 1960 a customer (Scientific Engineering Institute, Waltham, Massachusetts) built a PDP-3. It was later dismantled and
given to M.I.T.; as of 1974, it was up and running in Oregon.

Figure 14. PDP-4.

Multiple precision operations required the detection of carry or borrow and the ability to add
or subtract the result into the next most significant word. One's complement (especially as implemented on PDP-1) did not conveniently
provide this capability, whereas two's complement arithmetic did. Therefore, the PDP-4
was designed to use two's complement arithmetic and to use the Link bit idea from the Lincoln Laboratory L-1 design to permit the
efficient programming of multiple precision
arithmetic operations.
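
The role of a carry (Link) bit in multiple precision two's complement addition can be sketched as follows. This is a simplified model for illustration, not the PDP-4 instruction semantics: the Link is treated here as a plain carry-out that is fed into the next more significant word, and all function names are hypothetical.

```python
WORD_BITS = 18
WORD_MASK = (1 << WORD_BITS) - 1


def add_words(a, b, link_in=0):
    """Add two 18-bit words plus an incoming carry; return (sum, link)."""
    total = a + b + link_in
    return total & WORD_MASK, total >> WORD_BITS


def add_multiple_precision(x_words, y_words):
    """Add two numbers stored least significant word first."""
    link = 0
    result = []
    for a, b in zip(x_words, y_words):
        word, link = add_words(a, b, link)   # carry ripples via the link
        result.append(word)
    return result, link


def to_words(value, count):
    """Encode a signed integer as `count` two's complement words."""
    value &= (1 << (WORD_BITS * count)) - 1
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(count)]


def from_words(words):
    """Decode two's complement words (least significant first)."""
    width = WORD_BITS * len(words)
    value = sum(w << (WORD_BITS * i) for i, w in enumerate(words))
    return value - (1 << width) if value & (1 << (width - 1)) else value


double_sum, _ = add_multiple_precision(to_words(262143, 2), to_words(1, 2))
```

Because two's complement needs no end-around carry, each word is added exactly once, and negative operands use the same loop as positive ones.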
Two control instructions were changed so
that they would not affect the Accumulator and
interfere with arithmetic instructions. The
"jump to subroutine" instruction was changed
to store the return link in the program area.
This convention would not be used today because it destroys the state of subroutines, thus
precluding reentrant programming, and it
makes the use of read-only memory difficult.
The other change was that the "index and skip"
instruction operated on memory only.
Those PDP-1 features that cost logic but
added little to performance were eliminated.
Among these were program flags, sense
switches, and the wired-in program (read-in
mode) that controlled the automatic reading of
paper tape.
The PDP-1 had used 4-Kword memory with
memory bank switching, an arrangement that
was common when the useful software required
8 K words of memory. It was felt that 8 K words
of directly addressable memory would be ideal.
The corollary to Parkinson's Law that programs expand to fill any physical memory size
was clearly not understood. However, it turned
out that most PDP-4s stayed within the 8-Kword constraint, although the machine could
operate with up to 32 Kwords of memory.
It was decided that the goal was to build a
modular design such that the optional equipment cost would be associated with the option
rather than wired into all of the machines. It
was also decided that the Teletype Corporation
Model 28 should be used instead of a modified
IBM Model B typewriter such as that used on
the PDP-1. It was felt that this would provide a
lower failure rate, less time to repair, and lower
cost.
The logic design of the PDP-1, although quite
straightforward, was optimized in the PDP-4 by
eliminating redundant terms and encoding the
instructions in ways that would simplify the implementation. (The only way to get a significantly smaller machine was to start over with a
new instruction set processor.) However, the existing peripherals and memories for the PDP-1
could be used immediately to assist the implementation of the new machine. This was another important factor in favor of building a
new 18-bit machine rather than going to a 12-bit design.
In addition to the hardware design considerations, software offerings were an important
consideration. The PDP-1 users and the prospective customers for the new machine were
adamant about writing process control applications in a high level language. The designers
at DEC briefly considered providing ALGOL
60, but decided that it would be better to provide a FORTRAN II for the new machine. It
turned out that FORTRAN was used somewhat for computation, but most users stayed
with assembly language programming, especially where real-time programming was concerned.


The designers had a fairly clear idea of the
intended market for the new machine. Like its
predecessor, the PDP-1, the PDP-4 was to be
used predominantly for process control, with
some use in the laboratory for pulse height
analysis, data gathering, and other similar applications. In fact, during the planning for the
PDP-4, meetings were being held with Foxboro
Corporation about applications at Nabisco for
baking control and with Corning Glass about
the control of a glass tube manufacturing process. The meetings with Foxboro may have
been another factor in the 12-bit versus 18-bit
decision, as Foxboro favored the longer word
length due to their previous experience with a
24-bit RCA control computer. When the PDP-4
machines were produced, both Foxboro and
Corning bought them.
The simplifications achieved in the PDP-4
can best be appreciated by comparing the PDP-1 and PDP-4 ISPs, as shown in Figure 6, and
the register transfer structures, as shown in Figures 5 and 15.
As with the PDP-1, the major design goal of
the I/O system was that users be able to connect
equipment easily. The use of an I/O bus structure such as party line or daisy chain was not
considered for the PDP-4, although one was developed one year later for the PDP-5. Instead,
the design effort focused on improving the existing radial scheme to achieve greater peripheral compatibility. The I/O section, called the
Real-Time Control (Figure 16), included the
ability to interface with PDP-1 peripherals.
There was a small taper pin patch panel where
cable drivers and input gates could be patched
to the cables which radiated out to the peripherals from the main computer cabinets. The input
capabilities were somewhat better than the
cable drive capabilities, as the process control
operations of that day were really more process
monitoring than process control, a reflection of
industry's distrust of the reliability of computers for actual control applications. The simplicity of the I/O distribution contributed a
great deal to the compactness of the PDP-4. A
complete PDP-4 with card reader, magnetic
tape, display, and other options required three
bays, but many systems could fit within the two
standard bays (Figure 17), making PDP-4 systems less than half the size of comparable PDP-1 systems.

Figure 15. PDP-4 processor/real-time option register transfer diagram.

Figure 16. PDP-4 block diagram.

In addition to the physical aspects of the I/O
system, the logical design of the I/O system included some new features. One of these was the
ability to count events. Event counting was important in scientific applications such as pulse
height analysis, and the first customer to express
a need for it was the Columbia University
Physics Department. It was also important in
process control applications such as metering
flows and counting discrete items. Options such
as the 16-channel clock implemented the event
counting feature by having the option access a
memory cell and then rewrite its contents plus
one, thus changing the contents of memory as it
was rewritten. Counting could occur at event
rates up to the 125-KHz memory rate.
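
The read-then-rewrite-plus-one idea can be sketched in a few lines of Python. This is an illustrative model only; the function names are hypothetical, and the 8-microsecond cycle time used for the rate calculation is the PDP-4 figure quoted later in this chapter (one count per memory cycle gives 125 KHz).

```python
WORD_MASK = 0o777777   # 18-bit word


def count_event(memory, cell):
    """Model one memory cycle in which an option counts an event.

    The cell is read, and the rewrite half of the same cycle stores
    the old contents plus one instead of the unchanged value.
    """
    value = memory[cell]                     # read half of the cycle
    memory[cell] = (value + 1) & WORD_MASK   # rewrite half, plus one
    return memory[cell]


def max_event_rate_hz(cycle_time_ns):
    """One event can be counted per memory cycle."""
    return 10**9 / cycle_time_ns


memory = [0] * 16
for _ in range(3):
    count_event(memory, 5)

rate = max_event_rate_hz(8000)   # 8-microsecond cycle
```

No processor instructions are executed for a count; the cost is one stolen memory cycle per event, which is why the ceiling is the memory rate itself.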

Figure 17. PDP-4 logic layout diagram.

This method of event counting led to the design of a relatively low cost, high performance
Direct Memory Access feature called the Three
Cycle Data Break. This feature was first used in
the magnetic tape controller that was designed
for the PDP-4, and it has been used extensively
since then in PDP-8 options (Chapter 7). The
Three Cycle Data Break method of Direct
Memory Access works as follows:

1. During the first cycle, the word count
   (stored as a word in memory) is incremented. The word count is the negative of the length of the block to be
   transferred, and the incrementation step
   indicates that the present transfer is reducing the number of words left to be
   transferred by one.
2. During the second cycle, the current address pointer (also stored as a word in
   memory) is incremented. The current address pointer indicates the memory address to which or from which the data
   transfer is to take place.
3. During the third cycle, the actual data
   transfer between the memory and the
   I/O device takes place.
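
The three cycles can be traced with a small Python sketch of a device-to-memory transfer. Several details are assumptions made for illustration and are not stated in the text: that the transfer uses the pointer value after it has been incremented (so the pointer starts one below the first buffer address), that the block is complete when the word count reaches zero, and the function names and memory locations chosen.

```python
WORD_MASK = 0o777777   # 18-bit word


def data_break_input(memory, wc_loc, ca_loc, word):
    """Transfer one word from a device into memory in three cycles."""
    # Cycle 1: increment the (negative) word count held in memory.
    memory[wc_loc] = (memory[wc_loc] + 1) & WORD_MASK
    block_done = memory[wc_loc] == 0          # assumed completion test
    # Cycle 2: increment the current address pointer held in memory.
    memory[ca_loc] = (memory[ca_loc] + 1) & WORD_MASK
    # Cycle 3: the actual data transfer.
    memory[memory[ca_loc]] = word & WORD_MASK
    return block_done


def read_block(memory, wc_loc, ca_loc, start, words):
    """Set up word count and pointer, then transfer a whole block."""
    memory[wc_loc] = (-len(words)) & WORD_MASK   # negative of block length
    memory[ca_loc] = (start - 1) & WORD_MASK     # pointer is pre-incremented
    done = False
    for word in words:
        done = data_break_input(memory, wc_loc, ca_loc, word)
    return done


memory = [0] * 32
finished = read_block(memory, 0o20, 0o21, 0o30, [5, 6, 7])
```

Keeping the word count and address pointer in memory rather than in controller registers is what makes the scheme inexpensive; the price is three memory cycles per word transferred.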

In addition to changes in the instruction set
processor and the I/O system, the PDP-4
differed from the PDP-1 in the module technology used, as was discussed in Chapter 5.
During the manufacture of the PDP-1, DEC
had been extending its main business, the sale of
logic modules, by extending the lower cost,
slower speed 500-KHz versions of the 5-MHz
modules that were used in the PDP-1. The new
500-KHz modules, evolving to 1 MHz, were 50
percent less expensive to build than the 5-MHz
modules because they used germanium alloy
transistors rather than micro alloy diffused
transistors (MADTs). They were also
substantially easier to use and more reliable because of their lower data rate and wider clock
pulses. Two additional circuit design techniques
reduced the cost and increased reliability by reducing the number of active elements. Rather
than use a transistor per gate as in the earlier
designs, a diode-transistor logic design was
used. In addition, capacitor-diode gates were
used for the AND gates associated with register
transfers.
The changes in the technology not only permitted lower cost, greater noise immunity, and
greater reliability, they also permitted greater
densities. This made it possible, in some cases,
to design entire device controls on a single module. Because the modules had only 22 pins (18
pins for signals), the increased densities could
not be applied directly to the more complicated
logic functions. To solve this problem, a 10-pin
connector was added on the back of each module for the register transfer gating signals. In
this way, bit-slice architecture could be used,
packaging one bit of the Accumulator register
and all of the associated input gates on a single
module (Figure 18).
An interesting device with multiple stable
states was devised to simplify the control section of the PDP-4. It was a generalization of the
flip-flop to n stable states, using n NAND gates
in a cross-coupled way with each NAND gate
having n-1 inputs. A patent was awarded for this
circuit, and it was subsequently used in other
computers and in the module product line.
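
A brute-force check of the n-state idea can be written in Python. The wiring is an assumption based on the one-sentence description above: each of the n NAND gates is taken to receive the outputs of the other n-1 gates, and set/clear inputs are omitted. The function names are hypothetical.

```python
from itertools import product


def next_state(state):
    """Recompute every gate output from the other gates' outputs.

    Gate i is a NAND of all outputs except its own, so it goes low (0)
    only when every other gate output is high (1).
    """
    n = len(state)
    return tuple(
        0 if all(state[j] for j in range(n) if j != i) else 1
        for i in range(n)
    )


def stable_states(n):
    """Return all output patterns that reproduce themselves."""
    return [s for s in product((0, 1), repeat=n) if next_state(s) == s]


three_state = stable_states(3)
```

Under this reading, the stable patterns are exactly those with one gate output low and all the others high, giving n stable states; for n = 2 the circuit reduces to the familiar cross-coupled flip-flop.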
Maintenance did not represent such a high
portion of the product cost as it does today, and
the designers of the PDP-4 did not feel that the
fraction of the total system represented by the
memory justified such present day features as
parity memory. Nonetheless, maintenance was
a major consideration in the PDP-4 design,
motivating the simplicity of architecture,
straightforwardness of implementation, care in
logic design, and clarity of the maintenance
documentation. The machine instruction set description occupied only one letter-size page.
The logic design flow chart (a state diagram) occupied only one D-size (22 X 34 inch) drawing,
and the design drawings for the processor occupied
seven D-size sheets. To facilitate understanding the machine operation, each signal
name on the drawings had a mnemonic prefix
identical to the drawing name (e.g., AC) indicating from which of the seven drawings that
signal originated. This convention has been carried forward through many other DEC machines.

Figure 18. PDP-4 Accumulator bit-slice register transfer diagram.

The operator's console, shown in Figure 19,
included several functions to assist maintenance. The console switches (Read, Read Next,
Write, Write Next, Start, Continue) could be repeated at a clock rate varied by a speed control
on the console. This simplified testing by permitting easy use of an oscilloscope. In addition,
simple checks on memory could be performed
by using the console Read and Write switches
and observing the results on the console lights.

Figure 19. PDP-4 operator console.

Because the PDP-1 had been generally used
in dedicated applications, the users had written
their own programs. M.I.T., for example, had
contributed a good macroassembler, linking
loader, and interactive debugging program (DDT). BBN had contributed various subprograms. DEC had invested very little in PDP-1 software and thus had no concern for the cost
of writing system software or for the concept
that a new machine should capitalize on previous systems programming. It was easy for
people at DEC to believe that a small part of
the savings achieved by building a simpler machine could be used to pay for the writing of
new software for that machine.
In the present day, designers of new computers realize that program compatibility is a
constraint and that any new machine must be
on an improving cost/performance line. (This is
discussed in greater detail in Chapters 2 and
15.) At the time that compatibility decisions
were being made with regard to the PDP-4,
about 20 PDP-1s had been installed out of an
eventual population of 50. Looking back from
today's vantage point, a compatible machine
might have been built that would have interpreted
most of the PDP-1 programs and offered
the same improved cost/performance ratios as
the PDP-4 did, but still not have been very
much larger than the original PDP-4.
The PDP-4 was a limited success. While it
met the corporate profit standard, it did not sell
as well as had been expected. The market demands were not as completely elastic as they
had been for the PDP-1, and 5/8 of the performance for 1/2 the price was not good
enough. According to the evolution model discussed in the final section of this chapter, a machine with a lower price should have had the
same performance as the PDP-1, or else it
should have been priced much less than the
PDP-1 to compensate for the relatively poor
performance. In summary, the PDP-4 was not
aggressive enough in performance or in price.
There is an additional reason for the poor financial showing of the PDP-4. Experience with
other machines that were the first of a series,
such as the PDP-5, PDP-6, LINC-8, PDP-14,
and PDP-11/20, indicates that the financial performance of the first machine is always the
poorest of the series, largely because of the lack
of a software and hardware option base. The
PDP-7, 9, 9/L, and 15 were necessary successors that used the software and hardware option base created by the PDP-4.
THE PDP-7

In many ways the original concept of the
PDP-7 (or what was finally named the PDP-7)
started with the design of the PDP-1/D. The initial plans were to simply repackage the PDP-1,
using some higher density systems modules, and
to reduce the processor cycle time. The goal was
to use these changes to produce a lower price
machine with much better performance. This
goal was met quite well in the PDP-7, as it had a
greater performance/price gain over its predecessors than any other DEC 18-bit computer.
The plan to simply repackage the PDP-1 was
abandoned when consideration was given to the
relative sizes of the existing software and peripheral option bases of the PDP-1 and the
PDP-4. The PDP-4 had more extensive software than the PDP-1, including an operating
system and a FORTRAN compiler. The PDP-4
also had a much larger peripheral hardware option base than the PDP-1. Therefore, the goal of
program compatibility with the PDP-4 was
added to the goal of a substantial performance/price improvement, and the I/O interface
scheme for the new machine was constrained to
match the timing and structure of the past computers. Although sounding quite broad, these
goals were rather restrictive, especially the requirements for program and peripheral compatibility. The sales goal was truly broad,
however. That goal was to sell 120 systems,
more machines than the total of all other DEC
computer systems sold to date.
To sell all those systems, a substantial advance in performance would be required. Thus, the performance goal was to decrease the cycle time from 8 microseconds to 1.75 microseconds, the practical limit of core memories at the time. This was a rather ambitious goal and required designing a new core memory system and a new set of modules, the B-Series, which were Flip Chip modules based on the 10-MHz systems modules (Chapter 5). These new modules were used for the central processor and memory. Originally, they were also used in the I/O section of the system, but that was subsequently redesigned to use primarily 2-MHz R-Series modules, as will be described near the end of this section. (Note the similarity to the PDP-1, where cheaper, lower speed, 500-kHz modules were used in the I/O.)
Program compatibility between the PDP-7 and the PDP-4 was maintained generally, but was slightly modified in the I/O section to facilitate the introduction of the ASCII 8-level code. The PDP-4 console teleprinter had been a Teletype Corporation Model 28 KSR teleprinter that used Baudot (5-level) code. A shift to ASCII (8-level) code had already started in the industry, so the PDP-7 was designed to use the Teletype Corporation Model 33 KSR. This change necessitated that all programs determine whether they were running on a PDP-4 or on a PDP-7 so that they could determine how to interpret the characters typed on the console teleprinter. Other than this, upward compatibility was maintained. Downward compatibility was not maintained, as the PDP-7 had some additional instructions, a trap feature, and a multilevel interrupt option to allow multiuser environments. In addition, the program read-in mode of PDP-1 days returned to the console. This feature permitted the user to press a key and cause a paper tape, punched in a special format with address and data or terminating address, to be loaded into the computer's memory. (Figure 20 shows the PDP-7 operator console.)
The structure of the processor with its registers and the interfaces to I/O and memory are shown in Figure 21. Note that the structure and style of the design were essentially the same as those used in the earlier designs, but modified for the higher speed technology. The PDP-7 and the PDP-4 had identical architectures and similar implementations, but they had radically different realizations. Although the I/O section and the new options were designed to operate at the 1.75-microsecond cycle rate, special pulses were used to implement a slow cycle of 8 microseconds so that the slower PDP-4 compatible I/O equipment could still be used.

Figure 20. PDP-7 operator console.

Figure 21. PDP-7 processor and I/O section register transfer diagram.

The system diagram of the PDP-7 (Figure 22) shows the options and the general interconnection scheme. It was fundamentally the same structure as its predecessors and was designed for use with many of the earlier peripheral controllers.

Figure 22. PDP-7 system block diagram.

Physically, the PDP-7 was larger than the PDP-4 because the console was mounted on the side plane to facilitate maintenance instead of on the end as in the PDP-1 and PDP-4. This permitted a service man to both look at a scope and operate the console. Also, the paper tape I/O equipment, which had been on an extra table in the PDP-1 and PDP-4, was now housed in the third bay of the main computer cabinets. Figure 23 shows that the number of logic panels for the processor of the PDP-7 was the same as that for the PDP-4, even though the circuit board area of the modules in the PDP-7 (3,348 in²) was slightly larger than that in the PDP-4 (3,300 in²). Although it does not show in the diagrams or in the photos, a significant portion of the volume of the PDP-4 was cable connectors to various subassemblies. The PDP-7
improved the cabling by having all of the connectors in the backplane so that all of the wiring could be done in a single wiring operation. The PDP-7 was thus the first DEC computer designed for automated wire-wrapping. Mechanical block holders were designed to mount the connector blocks for the modules and cable [...] semi-automatic wire-wrapping technique was developed to allow a much higher speed production of wire-wrapped backplanes. Also, a Gardner-Denver fully automatic Wire-wrap machine was ordered and programs to control it were developed.

Figure 23. PDP-7 front and back logic layout.

The PDP-7 (shown in Figure 24) was a successful product. The design costs, excluding module and labor costs, were less than $100,000 from the start of the project to completion of the first prototype. Time was considered a very important factor in the design of the PDP-7. The project started on April 1, 1964, and the first production system was delivered on December 22 of the same year. The entire logic implementation was undertaken by Ron Wilson and one assistant, Jack Williams. Later, a Field Service representative [...] hand-built the first production system to be delivered to Bell Laboratories. The memory control and stack were designed by a memory design engineer, Derrick Chin, who coordinated his design with the processor logic design. Despite the hand-building of the first unit, the production of the PDP-7 was the beginning of several mass production techniques at DEC, and it was an important machine in the history of DEC 18-bit computers.

Figure 24. PDP-7: (a) front; (b) rear.

The development problems that were overcome were quite formidable. A complete new line of modules, the Flip Chip series, was developed (although 10-MHz circuits had been tested in the PDP-6). New connector blocks had to be obtained to hold the modules, a design effort that was concurrent with similar efforts for the PDP-8. New wire-wrap techniques had to be devised to ease the labor requirements so that systems could be wired faster. Toward this end, a program was ultimately developed for the PDP-4 to do wire-routing and to control the Gardner-Denver machine. System layouts had to be developed to facilitate wire-wrapping. The mechanical packaging and cooling had to be altered to accommodate the new wiring panels, as the existing PDP-1, PDP-4, and PDP-5 air plenum scheme was completely blocked by the new connector blocks. The memory performance goals (1.75 microseconds) were difficult to achieve, as the best memory performance to date was that of the PDP-6, which was 2 microseconds. All of the above had to be done within the cost goals.
As the design phase of the PDP-7 neared an end and production models were being delivered, two developments occurred that suggested the possibility of an improved production model. One of these was the development of the R-Series modules. These modules were lower speed than the B-Series modules that formed the processor, but they were lower in cost and more complete in the range of functions available. After analyzing the configurations that the customers were ordering, the designers came up with a new I/O panel that used R-Series modules as much as possible and was prewired for several of the most popular peripheral controls, thus reducing the amount of special wiring required to produce a system. This improved system was called the PDP-7/A.
With the PDP-7/A completed, the designers contemplated the possibilities of a next generation system that would use the new tools that were now in place, such as the Gardner-Denver fully automatic Wire-wrap machine. The design criteria for the new machine would be that it be completely wire-wrappable using the automatic machine and that a system with 8 Kwords of memory sell for approximately $35,000. The new machine was called the PDP-7/X.
To meet the goals set for the new machine, a new cabinet design was started that would mount the wire-wrap panels on door-type frames. These frames opened to allow access either to the connector side for oscilloscope tracing or to the module handle side for module replacement. The new cabinets also dealt with two problems involving the air flow. One of these was that the air flow needed to be increased due to the high density of the new logic, and the second was that the existing air flow method pulled air from the floor, which was sometimes dirty. To solve these two problems, a horizontal air flow system was implemented.
To control the system costs, which were becoming a major factor, the computer was divided both logically and physically into three divisions: memory, central processor, and input/output logic. This was done to permit the calculation and control of costs more accurately and to divide the computer into the largest single panels that the Gardner-Denver machine could wrap.
The cabinet design and system partitioning completed, the logic design moved ahead smoothly. At this time, Larry Seligman, who had designed the Extended Arithmetic Element for the PDP-7, took over the project from Ron Wilson. By this time, the project had changed its name from PDP-7/X to PDP-9.

Figure 25. PDP-9.
THE PDP-9

The basic logic and hardware for the PDP-9 (Figure 25) were the same as those used in the PDP-7. Although some integrated circuits were available, no standards had yet been set, and there were no cost or speed advantages to be gained. Therefore, the logic used discrete PNP transistor, capacitor-diode circuitry operating with signal levels of -3 volts and ground. The modules were about 2.5 × 5 inches or 5 × 10 inches and were plugged into an assembly of 144-pin connector blocks interconnected by 24-gauge wire-wrap.
The major technology advance of the PDP-9 over the PDP-7 was in memory. A new memory had been designed that used a 2-1/2 D driving/sense structure. The 2-1/2 D system required only three wires through each core in the stack, rather than the four wires used in earlier, coincident current designs such as that used in the PDP-8 memory. The new memory obtained a cost advantage by being oriented in an 8-Kword organization rather than a 4-Kword organization. The costs of the discrete component logic in the machine were still high compared to those of memory, so the cost advantage was not as exciting as the second advantage of the new memory, which was speed. The new memory had a cycle time of 1 microsecond as opposed to 1.75 microseconds for the memory in the PDP-7. Because memory speed limited system performance, the new memory would permit the system performance of the PDP-9 to be 1.75 times better than that of the PDP-7.
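The 1.75 figure follows directly from the two cycle times, under the text's premise that memory speed alone bounds system performance. A minimal sketch in Python (the function and variable names are illustrative and do not come from the original):

```python
def memory_bound_speedup(old_cycle_us: float, new_cycle_us: float) -> float:
    """Relative performance when memory cycle time is the only limit."""
    return old_cycle_us / new_cycle_us


pdp7_cycle_us = 1.75  # PDP-7 core memory cycle time, from the text
pdp9_cycle_us = 1.0   # PDP-9 2-1/2 D memory cycle time, from the text

print(memory_bound_speedup(pdp7_cycle_us, pdp9_cycle_us))  # 1.75
```

The same ratio applied to the earlier step from an 8-microsecond cycle to the 1.75-microsecond PDP-7 gives roughly 4.6, which suggests why the PDP-7 to PDP-9 transition was the smaller of the two gains.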
The structure of the PDP-9 processor is shown in Figure 26. It was a great deal simpler than earlier designs and used a general data path through the adder rather than the ad hoc register structure of the earlier machines. The basic PDP-9 implemented the PDP-4 instruction set processor and the Extended Arithmetic Element option using microprogrammed control. It was the first DEC computer to use this technique.
Figure 26. PDP-9 central processor register transfer diagram.

In addition to being a technological advancement, the PDP-9 was an interesting precursor of things to come. A 64-word, 36-bit, 212-nanosecond read-only, transformer-coupled, rope memory was used as the microprogrammed control store. The design allowed for easy bench modification in the event that the microcode required changing. It was originally intended that the control words be arranged for unary encoding, or what is now called horizontal microprogramming. In such an arrangement, each bit in the microinstruction denotes an action and can be specified independently of other microinstructions. This behavior is similar to the operate class of instructions in the 12-bit and 18-bit computers. However, the intention of using horizontal microprogramming was soon lost in the complexities of design, and the bits were encoded to reduce the width of the control words. This eliminated the possibility of providing special purpose machines by a simple read-only memory change, a feature that the designers had originally hoped to include.
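The trade-off between unary (horizontal) and encoded control words can be illustrated with a small Python sketch. The action names and the 3-bit field below are hypothetical and are not taken from the PDP-9 microcode; they only show why encoding narrows the word at the cost of independent control over each action.

```python
# Hypothetical micro-actions; the real PDP-9 control signals are not given in the text.
ACTIONS = ["load_ac", "load_mb", "load_ma", "inc_pc", "add", "shift_left", "mem_read"]


def decode_horizontal(word: int) -> set:
    """Unary encoding: one bit per action, so any subset can fire together."""
    return {name for bit, name in enumerate(ACTIONS) if word & (1 << bit)}


def decode_encoded(field: int) -> set:
    """Encoded field: code 0 means no action, code n selects exactly one action."""
    return set() if field == 0 else {ACTIONS[field - 1]}


horizontal_width = len(ACTIONS)            # 7 bits, one per action
encoded_width = len(ACTIONS).bit_length()  # 3 bits cover codes 0 through 7

# Horizontal: increment the PC and start a memory read in one microinstruction.
print(sorted(decode_horizontal(0b1001000)))  # ['inc_pc', 'mem_read']
# Encoded: the same field can name only one of those actions at a time.
print(sorted(decode_encoded(4)))             # ['inc_pc']
```

In this toy case the encoded field is 3 bits wide instead of 7, but combinations of actions that the designer did not anticipate can no longer be expressed by changing the control store alone.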
The necessity of staying within the size constraints of the read-only memory also constrained the extendability and use of the microprogram control, in that floating-point arithmetic could not be included due to space limitations. There were not enough words, a problem all too familiar when programming either macro or micromachines. The Extended Arithmetic Element was included in the microprogram-controlled portion of the machine. The Extended Arithmetic Element demonstrated the power of the control store technique because this option, a 36-bit multiply/divide option, was implemented in only six single height (5 × 2.5 inch) Flip Chip modules. The processor occupied about 320 module slots, for a total printed circuit board area of 3,100 in². This was not only less than the 3,348 in² for a PDP-7, but it also included both the optional arithmetic element and much of the I/O control. Thus, when functionality is considered, the PDP-9 was about half the size of earlier machines.
Interesting sidelights of the processor design effort included the discovery of an error in the PDP-1 signed integer divide algorithm and Richard Sogge's design of a discrete carry adder which would develop the carry over 18 bits in under 30 nanoseconds. This was an especially impressive circuit since ECL technology is required even today to obtain this speed.
Figure 26 shows a register transfer level diagram of the processor together with I/O and memory interface lines. The I/O control extended the features of earlier machines by implementing an eight-level nested automatic priority interrupt facility and a data channel transfer facility. The Automatic Priority Interrupt had four levels of hardware interrupt capability at the I/O Bus and four levels of software priority. The Data Channel Transfer Facility was the same as a Direct Memory Access channel, but used the Three Cycle Data Break System pioneered in the magnetic tape control for the PDP-4 (page 144).
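The nesting behavior of an eight-level priority interrupt can be sketched in Python. This is a simplified model and not the PDP-9 logic: the numbering (0 is the highest priority, levels 0 to 3 hardware, levels 4 to 7 software) and all names are assumptions made for illustration.

```python
from typing import List, Optional, Set


class PriorityInterrupt:
    """Nested priority interrupt model; lower level number = higher priority (assumed)."""

    HARDWARE_LEVELS = range(0, 4)  # assumed: raised by devices on the I/O Bus
    SOFTWARE_LEVELS = range(4, 8)  # assumed: raised by the program itself

    def __init__(self) -> None:
        self.pending: Set[int] = set()
        self.active: List[int] = []  # stack of levels currently being serviced

    def request(self, level: int) -> None:
        if not 0 <= level < 8:
            raise ValueError("level must be 0..7")
        self.pending.add(level)

    def grant(self) -> Optional[int]:
        """Start servicing the best pending level if it outranks the active one."""
        if not self.pending:
            return None
        best = min(self.pending)
        if self.active and best >= self.active[-1]:
            return None  # equal or lower priority must wait
        self.pending.remove(best)
        self.active.append(best)
        return best

    def finish(self) -> int:
        """Return from the innermost interrupt routine."""
        return self.active.pop()


api = PriorityInterrupt()
api.request(5)      # a software-level request
print(api.grant())  # 5
api.request(2)      # a hardware level interrupts the software-level routine
print(api.grant())  # 2
api.request(6)
print(api.grant())  # None: level 6 waits until levels 2 and 5 have finished
```

The stack of active levels is what makes the scheme "nested": a routine is suspended only by a request of higher priority and resumes when that request finishes.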
The Direct Memory Access channel was the
most disappointing part of the I/O bus concept
because the speed requirement dictated the use
of an extra set of data and address lines which
were carried between the DMA device and the
memory bus multiplexer via an extra set of cables. In addition, a second port to memory was
required. A clean bus cabling scheme for high
speed transfer devices could not be implemented because of the extra lines required, and
the only alternative, slowing down the machine
to handle the transfers, was not acceptable.
Logic for the PDP-9 was mounted in three
sections, each capable of holding eight rows of
forty modules (Figure 27). Each of the three
sections had self-contained cooling and final
power regulation.
A system block diagram of the PDP-9 (Figure 28) shows the evolution of the I/O and
memory bus structured computer. This scheme,
derived from the PDP-5 and PDP-6, was in contrast to the radial structure of the earlier 18-bit
computers and provided greater modularity
and a major cost improvement. The new bus
was daisy-chained from device to device using
twisted pair cables. This technique provided
uniformity in I/O backplane wiring compared
with the PDP-7, which was customized for each
option. The daisy-chain method allowed independent development, manufacturing, and test
of I/O options and simplified the field installation of options. Also, it allowed costs to be associated with each option rather than being
initially higher as in the radial scheme where all
options had to be planned for in the central processor. The new bus structure was a mixed
blessing in that it created the illusion that systems of unlimited size could be built.

Figure 27. PDP-9 front and back logic layout.

Figure 28. PDP-9 system block diagram.

Except for the 300-wire field change on the first ten processor backplanes, the PDP-9 enjoyed a good reputation for performance and up time. It was followed by a less costly version, the PDP-9/L. The cost reduction was accomplished by using a new (and somewhat cumbersome) power supply design and by offering a 4-Kword minimal system with lower cost paper tape equipment. The 4-Kword memory planes were borrowed from the PDP-8 line and adapted to provide half the memory in half the space. To provide lower cost paper tape capability, the PDP-9/L used a teleprinter equipped with paper tape reader and punch instead of a separate, heavy-duty paper tape reader and punch. The product life of the PDP-9/L was relatively short; it was soon made obsolete by the PDP-15.

THE PDP-15

Unlike its predecessors, the PDP-15 was designed to provide a range of systems with both hardware and software. While early 18-bit machines had evolved to include several configurations, the notion of a planned range for PDP-15 systems was explicit from the start. As it turned out, the PDP-15 evolved too, and over a considerably larger range than was anticipated. Table 2 shows the range of systems that eventually developed; of these, only the models up through 15/40 were in the original plan.

Table 2. The PDP-15 Family of 18-Bit Computer Systems

Model                              Hardware                      Software

PDP-15/10                          Central processor             Assembler
(basic paper tape system)          4-Kword memory                Editor
                                   Teleprinter                   Debugger
                                                                 Utilities

PDP-15/20                          Central processor             Keyboard monitor
(keyboard monitor using            8-Kword memory                FORTRAN IV
DECtape file system)               Extended arithmetic           FOCAL
                                   Paper tape                    PIP*
                                   DECtape                       Utilities
                                   Teleprinter

PDP-15/30                          Central processor             B/F monitor
(background/foreground)            16-Kword memory               FORTRAN IV
                                   Extended arithmetic           FOCAL
                                   Automatic Priority Interrupt  PIP*
                                   Memory protection             Utilities
                                   Clock
                                   Paper tape
                                   DECtape
                                   2 teleprinters

PDP-15/35
(PDP-15/30 with disks)

PDP-15/40                          Central processor             Disk B/F monitor
(disk-based background/            24-Kword memory               FORTRAN IV
foreground)                        Extended arithmetic           FOCAL
                                   Automatic Priority Interrupt  PIP*
                                   Memory protection             Utilities
                                   Clock
                                   Paper tape
                                   DECtape
                                   524-Kword fixed head disk
                                   2 teleprinters

PDP-15/50                          16-Kword memory

PDP-15/76                          15/40 plus PDP-11             11-based file and I/O
                                                                 device management

* PIP = Peripheral (Data) Interchange Program

As in the past, the goal for the new machine was to provide better performance/cost than the predecessor. The PDP-7 to PDP-9 transition had provided a performance improvement, but not a big cost improvement. The new semiconductor technology, transistor-transistor logic (TTL) available in dual inline packages, could provide the cost improvement required. The 7400 and 74H00 series of TTL integrated circuits permitted clock speeds of 10 to 20 MHz and lower costs and higher packing densities than did the discrete circuits used in the PDP-9. Not only did the higher packing densities lower the packaging costs, but they also permitted the basic PDP-15/10 (Figure 29) to be the smallest of the 18-bit series, while providing a number of options and additional features including an additional instruction set with an index and limit register for multiprogramming. The new TTL technology had one substantial drawback, however. Where the old discrete transistor technology had used -3 volt and ground signals, the new technology used +5 volts and ground. Thus, to permit the use of both existing peripherals and new peripherals, level converters on the I/O Bus were required.

Figure 29. PDP-15/10.

In addition to the cost improvements anticipated from the use of integrated circuits, it was also hoped that new memory systems available would offer both cost and performance improvements. The PDP-15 memory is contrasted with the PDP-1 memory in Table 3.
With the new memories and changes in addressing capabilities through the Index Register and relocation options, memory size could be expanded to 131 Kwords. A separate control unit, called the I/O Processor, handled the bookkeeping for the I/O channels and I/O Bus. Figure 30 shows a typical PDP-15 system. The two processors (main processor and I/O Processor) occupied only a third of the cabinet space of a comparable PDP-9 system, yet were faster and had more capability. While on the subject of cabinets, note that the packaging for the PDP-15 reverted to the simplicity of the earlier PDP-1, PDP-4, and PDP-7 cabinets by using a fixed mounting structure rather than having the module connector blocks mounted on a door.

Figure 30. PDP-15 side/front logic layout. (DEC 19-inch cabinet dimensions: 30 inches deep, 21-11/16 inches wide, and 71-7/16 inches high.)

The goals for the PDP-15 were to obtain an 850-nanosecond cycle time, to be compatible with the PDP-9, to have a low manufacturing cost, to improve priority interrupt latency, to fit the basic system in one cabinet, to extend the length of the I/O Bus, and to improve maintainability. The success in meeting these goals varied.
The goal of achieving an 850-nanosecond cycle time was exceeded, as the PDP-15 was shipped with an 800-nanosecond cycle time. It was particularly gratifying that this goal was met and exceeded because there had been a number of obstacles to overcome. The central processor, memory, and I/O had been made asynchronous to reduce I/O latency, but this required synchronizing logic that resulted in significant circuit delays. A dc (round-trip) interlocked memory bus had been designed so that speed independent memories could be used, but this caused communications delays. Finally, to minimize cabling, a single set of lines had been used for communicating address and data information to the memory. This caused further communications delays.

Table 3. Comparison of PDP-1 and PDP-15 Memories

                  PDP-1           PDP-15           PDP-15 (Late)

Year              1960            1968             1972
Stack size        4 Kwords        4 Kwords         24 Kwords
Cycle time        [illegible]     800 ns           960 ns
Words/cabinet     12 Kwords       48 Kwords        96 Kwords
Electronics       1/3 cabinet     1/12 cabinet     1/24 cabinet
Configuration     3D stack        5 planes         [illegible]
                  [illegible]     4 bits/plane     20 bits/plane
                                  Planar stack     Planar stack
Core size         30 mil          18 mil           18 mil
Wires/core        4               3                3

The PDP-9 instruction compatibility was achieved with three minor exceptions about which no complaints were received. Compatibility for I/O devices was achieved by changing the receiver/driver modules to provide the required conversions back and forth between the older peripherals and the new PDP-15 I/O Bus.
To meet the manufacturing cost goals, a number of things were considered. The PDP-15 was one of the first DEC computers to use integrated circuits extensively. Because each logic type used in the machine would have to be specified, purchased, delivered, and tested, it was important to minimize the number of logic types. (Note the similarity of this concern to that expressed in Chapter 4 with regard to minimizing the number of flip-flop types in the TX-0.) The PDP-15 was designed with 21 semiconductor types, including integrated circuits, transistors, and diodes. All of them were available from multiple suppliers. To simplify manufacturing and field installation of options, the PDP-15 had fixed configuration rules. This was a mixed blessing because the fixed configuration rules resulted in higher costs from the greater number of partially filled cabinets. Margin testing for the PDP-15 was planned using a combination of varying logic timing and temperature. Special test equipment was constructed for the PDP-15 production line to permit rapid heat cycling of central processors and memories. In addition, a fast program loader system was designed using a PDP-8 with multiple DECtape units. This system permitted programs to be loaded into the memory of a unit being tested by merely pressing a button. This saved considerable checkout time compared to the previous methods of loading diagnostics via paper tape.
It was originally planned that manufacturing costs would also be reduced by using subassembly replacement. The concept was that if a processor, memory, power supply, or other logic assembly failed to work when it was integrated into a system, the entire subassembly would be replaced and sent back to its appropriate test line, rather than repairing it in the final assembly area. This process, planned for both the PDP-9 and PDP-15, did not work because the production line was never filled with enough material to allow the subassembly substitution to take place.
The manufacturing cost goals were not met during the production of the first 50 units, so an examination was made to determine which items were most costly. It was determined that most of the cost difficulty was in the mechanical packaging, and that the cabling, in particular, was costing more than anticipated. Sights were set on reducing the cabling complexity by using a single power harness that could be built and tested on a jig. The cabling was reduced to one console cable, one teleprinter cable, one I/O bus cable assembly, and two memory bus cables. In trying to limit console cabling, a time division multiplex communication scheme was designed to get the signals to the lights and from the switches. In this scheme, a number of signals were transmitted on the same wires on a timeshared basis, and the console lamp filaments were used as storage elements. While this scheme was clever enough to gain the PDP-15's only patent, it was generally unsatisfactory. It made the console logic so complex that when it failed, it was harder to fix than the processor.
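The idea of time division multiplexing console signals can be sketched in Python. This is only an illustration of the general technique: the choice of six shared wires is hypothetical, and the plain list of lamp states stands in for the lamp filaments that the text says were used as storage between time slots.

```python
from typing import List


def multiplex(signals: List[int], wires: int) -> List[List[int]]:
    """Split signals into time slots, each slot carrying `wires` signals at once."""
    return [signals[i:i + wires] for i in range(0, len(signals), wires)]


def demultiplex(frames: List[List[int]]) -> List[int]:
    """Receiver side: latch each slot into its own group of lamps, in slot order."""
    lamps: List[int] = []
    for frame in frames:
        lamps.extend(frame)
    return lamps


# One 18-bit register value to be displayed on the console lights.
register_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1]

frames = multiplex(register_bits, wires=6)   # hypothetical: 6 shared wires
print(len(frames))                           # 3 time slots instead of 18 wires
print(demultiplex(frames) == register_bits)  # True
```

The saving in cable conductors is paid for with sequencing logic at both ends, which is consistent with the text's remark that the console became harder to repair than the processor.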
The goal of reducing interrupt latency to two
microseconds was not achieved. With the parity, memory protect, and memory relocation
options implemented, and with adder and synchronizing delays added in, the latency could
only be reduced to four microseconds; but that
was acceptable.
The goal of packaging the basic system (central processor, I/O processor, console, and 32
Kwords) in one cabinet was met; it was a close
fit, and there were virtually no spare module
slots. Since few small systems were sold, it is not
clear that this emphasis was warranted.
The goal of extended I/O bus length was
achieved by switching from an unterminated,
diode-clamped I/O bus such as the PDP-9 used,
to a new, terminated I/O bus. A new set of bus
transceiver modules was designed to provide
greater speed and less bus loading. The new bus
design, with cleaner signals and no reflections,
combined with the new bus transceiver modules, permitted the I/O bus to be extended to 75
feet. The penalty paid was higher power consumption and greater power supply cost than in
the PDP-9.
The goal of better maintainability was partially achieved by equipping the logic with a
means of monitoring 400 signal points. This
feature was combined with a single step feature
which permitted troubleshooting from the console without the use of an oscilloscope. As it

turned out, the single step feature was used infrequently because of the training required to
use it properly.
Figure 31 shows the register transfer structure of the PDP-15 processor. It was based on elements and features used in earlier designs and had a basic data path which permitted the results from any of the 11 registers to be read into the arithmetic unit and then back into the registers. In order to achieve high speed operation, a number of separate registers (such as the Step Counter, the Program Counter, and the Multiplier-Quotient registers) operated in parallel with the basic data path. In this way, significant overlap occurred, permitting the 800-nanosecond cycle time. The contrast between this design and the PDP-4 design is noteworthy. The PDP-4 had only four registers in the basic machine, but the use of integrated circuits in the PDP-15 permitted more registers to be used without so much concern for cost.

Figure 31. PDP-15 processor register transfer diagram.
The first major extension of the PDP-15 was the addition of the Floating-Point Processor
(Figure 32) to enable it to perform well in the scientific/computation marketplace using
FORTRAN and other algorithmic languages. With the addition of the Floating-Point Processor,
the time for a programmed floating-point operation was reduced from 100-200 microseconds to
10-15 microseconds, giving nearly a factor of 10 increase in FORTRAN performance - depending
on the mix of floating-point operations. For most machines, the difference between built-in
and programmed data-types is higher; but, because the machine was originally designed to
operate effectively without hardwiring, the difference is quite low. Table 4 gives a summary
of the performance improvements offered by the floating-point option.

Figure 32. PDP-15 Floating-Point Processor register transfer diagram.

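The Improvement column of Table 4 is the ratio of the two run times. The short sketch below recomputes it from the times given in the table. It assumes that, for the physics application, 37.0 sec is the time without the option and 3.0 sec the time with it; the names `TABLE_4` and `improvement` are illustrative only.

```python
# Run times in seconds transcribed from Table 4: (without option, with option).
# Assumption: for the physics application, 37.0 sec is the "without" time.
TABLE_4 = {
    "Matrix inversion": (12.0, 5.0),
    "Fourier transform": (16.9, 2.9),
    "Least squares fit": (5.1, 0.7),
    "Test of all FP functions": (11.4, 1.4),
    "A physics application": (37.0, 3.0),
}


def improvement(without_s: float, with_s: float) -> float:
    """Speedup factor: programmed run time divided by hardware run time."""
    return without_s / with_s


if __name__ == "__main__":
    for program, (without_s, with_s) in TABLE_4.items():
        print(f"{program:26s} {improvement(without_s, with_s):5.1f}")
```

The recomputed factors range from 2.4 to 12.3, which is consistent with the "factor of 2 to 10" quoted for FORTRAN programs that use floating-point.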
The addition of the floating-point unit required that a number of instructions be added
to the machine. The irony of this extension is that the PDP-11 and nearly all minicomputer
instruction set extensions exactly follow this evolution.
A low cost multi-user protection system was
added in the form of a relocation register and a
boundary register. Because this was marketed
as an add-on option, it degraded the machine
performance more than necessary. However,
the minimum machine cost maintained the performance/cost target.

Table 4. Floating-Point Computation Times

Program Type              Without Floating-   With Floating-   Improvement
                          Point Option        Point Option
Matrix Inversion          12.0 sec            5.0 sec           2.4
Fourier Transform         16.9 sec            2.9 sec           5.8
Least Squares Fit          5.1 sec            0.7 sec           7.3
Test of all FP Functions  11.4 sec            1.4 sec           8.1
A Physics Application     37.0 sec            3.0 sec          12.3

The first PDP-15 was shipped in February 1970, 18 months after the project had started. A
number of difficulties had been encountered, including personnel turnover, that caused a
two-month slip. However, the project at first customer ship was within the budget and, by
1977, 790 machines had been shipped - more than the total of all other DEC 18-bit machines.
Two of the PDP-15 models are of special interest: a dual central processor version and the
PDP-15/76. These are treated separately below.
DUAL CENTRAL PROCESSOR PDP-15

In 1973 the PDP-15 product line proposed and sold a system that was a dual processor.
From the dual processor project came a dual port memory, which eventually was transferred
to the PDP-15 standard product line. The dual port memory also expanded memory to the full
128 Kwords built into the PDP-15 addressing structure. The unit occupied a single rack and
used the M-Series logic modules. Because there was space to add a third port within the
rack unit, the dual port memory was actually built to be a three port device. At the time,
the laboratory breadboard was an impressive array of three cabinets containing 128 Kwords
of memory and two processors.
The logic included what went unrecognized as a "synchronizer" problem for two months,
despite reviews by some senior engineers. The synchronizer problem, first described by
Chaney and Molnar [1973] of Washington University, is a classical logic design problem that
is theoretically unsolvable. When synchronizing (detecting) the presence of an event
occurring at a random time relative to a fixed clock event, a small amount of energy is
available to set the flip-flop. When the flip-flop is triggered with such a small signal,
it can go into an undecided (metastable) state for a relatively long (even indeterminate)
period of time. The problem occurred in the dual port memory design because the three
inputs (2 ports and the memory clock) needed to be synchronized. Despite the theoretical
lack of a solution, the practical solution is usually to wait longer (e.g., two clock times)
or to improve the circuit by unbalancing it. Once the problem was recognized, the design
went to a quick completion.
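The remedy of waiting longer can be made quantitative with the exponential model commonly used today for synchronizer failure rates. The sketch below is not taken from the PDP-15 design: the model, the function name, and every parameter value (resolution time constant, metastability window, clock and event rates) are assumptions chosen only to show the shape of the relationship.

```python
import math


def synchronizer_mtbf(settle_s: float, tau_s: float, window_s: float,
                      clock_hz: float, event_hz: float) -> float:
    """Mean time between synchronizer failures, in seconds.

    Commonly used model (an assumption here, not from the text):
        MTBF = exp(settle / tau) / (window * f_clock * f_event)
    where settle is the time allowed for the flip-flop to resolve.
    """
    return math.exp(settle_s / tau_s) / (window_s * clock_hz * event_hz)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    tau_s = 10e-9        # resolution time constant of the flip-flop
    window_s = 1e-9      # width of the vulnerable sampling window
    clock_hz = 1.0e6     # sampling clock rate
    event_hz = 1.0e5     # rate of asynchronous events
    period_s = 100e-9    # settling time allowed per "clock time"

    one = synchronizer_mtbf(period_s, tau_s, window_s, clock_hz, event_hz)
    two = synchronizer_mtbf(2 * period_s, tau_s, window_s, clock_hz, event_hz)
    print(f"wait one clock time:  MTBF = {one:.3e} s")
    print(f"wait two clock times: MTBF = {two:.3e} s")
```

Under this model each additional settling interval multiplies the mean time between failures by exp(period/tau). Waiting therefore makes failures arbitrarily rare without ever eliminating them, which matches the description of the problem as theoretically unsolvable but practically manageable.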

PDP-15/76

Of the systems listed in Table 2, the PDP-15/76 was one of the most interesting. A
simplified block diagram of the final evolved state of the PDP-15/76 is shown in Figure 33.
The diagram is referred to as an evolved design because the PDP-11 connection and the
floating-point arithmetic features were not part of the original PDP-15 design.
The design of the PDP-15/76, also referred to as the Unichannel 15/76, began as a problem:
find the most cost-effective way to attach a new moving head, removable platter disk to the
PDP-15. After a review of the problem, it became clear that the correct way to solve the
problem was to use a PDP-11 processor and the controller that had been designed for the
PDP-11. The key reason for this was not the cost of designing a controller for the PDP-15,
but rather the cost of writing a new set of disk diagnostics in PDP-15 code. (By that time,
it was clear to all designers that hardware costs were swamped by software costs.)
As the system design progressed, it became clear that the PDP-11 could be used to run the
other PDP-11 family peripherals that were the object of most of DEC's development and
production efforts. The list of new peripherals quickly grew to include communications
lines, plotters, printers, and card equipment. Figure 34 shows the options available for
the PDP-15/76.

Figure 33. PDP-15/76 simple system block diagram.

Figure 34. PDP-15/76 (XVM) system block diagram.

The project had a very small but excellent staff, and the hardware part of the program
went very smoothly. Al Helenius did much of the logic design for the memory multiplexer
device, using existing M-Series logic modules, and the prototype was operational in early
November 1972. The complexity and size of the software task were clearly underestimated.
However, the successful system operation depended on having more software. Rick Hully
proposed an operating system structure that, for the era and application, was elegant,
advanced, and yet straightforward. The reality was that the PDP-15/76 was a
"multiprocessor" system, and today's terms "backend processor" and "file processor" apply
to what was accomplished on this machine in the early 1970s. Also, this structure was used
by IBM in the coupled 7090/7044 system and the 360 Attached Support Processor.
From an application point of view, the PDP-15/76 dual processor system was extremely
effective, especially in the following applications:

1.  Computer-aided design. With the PDP-15 processor handling figures and computation
    while the PDP-11 processor handled an input digitizer, high speed plotter, and
    printer; with the PDP-11 and PDP-15 sharing memory and the new disk.
2.  Batch processing. With the PDP-15 and the floating-point option handling computation
    while the PDP-11 handled spooling to printers, input from card readers, and terminals.

THE SERIES AND ITS EVOLUTION

It is useful to compare the five 18-bit computers that were designed over the course of
roughly 10 years. The series began in the early second (transistor) generation and extended
to the early part of the third (integrated circuit) generation. Had the series been
extended to the fourth (large-scale integrated circuit) generation, a version of the PDP-15
could have been easily implemented on a single silicon chip. The paragraphs which follow
each summarize the important characteristics of one or two members of the series, and
Table 5 gives the technical information.

Contributions of Individual Machines to Series Development

The PDP-1 had a number of innovations over its laboratory predecessors, the Whirlwind and
TX-0. It contributed extremely straightforward I/O interfacing capability together with a
multichannel interrupt structure and Direct Memory Access capability which enabled a high
I/O data rate. These characteristics made it ideal for high performance laboratory
applications. The PDP-1 also represented a major stepping stone in the early days of
timesharing computers. The message switching application contributed significantly to its
market success and motivated the design of good communication interfaces in subsequent
computers. Because the PDP-1 served as a thorough test vehicle for the circuitry of the
1000-series system modules, these modules were more suitable for their general application
in building digital systems.
The PDP-4 contributed in small ways: there were minor improvements in the instruction set
processor; and, because the PDP-4 was oriented to a much lower cost, some of the modules
were refined. The simplified logic design of the PDP-4 was a major influence on the
implementation style of subsequent computers. It also contributed the fundamental
minicomputer notion that successor machines should be lower cost. Moreover, the PDP-4
extended the marketplace to industrial control, which had not been possible at PDP-1's
price levels, and further improved the ease of I/O interfacing.
The PDP-7 and PDP-9 Families exploited a significant refinement in the wire-wrap packaging
technology. Although the circuits were based on the early PDP-6 10-MHz circuits, the more
cost-effective and producible Flip Chip package was used. Both machines had significant
performance gains over all predecessors. Using the number of words or bits accessed by the
processor per unit time as the performance measure, the PDP-7:PDP-4 ratio was 4.57 and the
PDP-9:PDP-7 ratio was 1.75. Both gains were due to the use of faster core memories. The
PDP-9 used microprogrammed control, even though the simple instruction set processor
probably did not necessitate the high entry cost. A large microprogram store could have
changed the performance (and history) of successor minis. The change to an I/O bus
structure, pioneered in the PDP-5, entered the 18-bit series with the PDP-9. It distributed
the I/O interface to each option and so further reduced the basic cost.

Table 5. Characteristics of DEC's 18-Bit Computers

Characteristic              PDP-1               PDP-4               PDP-7               PDP-9; 9/L          PDP-15

Project start; first ship   8/59; 11/60         11/61; 7/62         4/64; 12/64         8/66 - 12/68        5/68; 2/70

Goals                       Cost; short word    Cost                Speed; cost         Speed; cost;        Cost; range of machines,
                            length; speed                                               producibility       hardware/software systems

Applications                Lab control;        Process control;    Improved            Graphics            Numerical computation;
                            message switching;  industrial testing  timesharing                             graphics processing
                            timesharing

Innovations/improvements    Circuit use;        Functional (bit-    Package; modules;   Microprogramming;   Integrated circuits;
                            package; ISP;       slice) modules;     performance         I/O Bus             floating-point;
                            interrupts; Direct  ISP trend to mini;                                          multiprocessor
                            Memory Access;      3 Cycle DMA;
                            I/O interfacing     I/O interfacing

Price (K$) with paper tape  120                 65.5 (56.5)         45                  25+; 24.4 (19.9)    19.8 (16.2)
reader/punch, typewriter,
4-Kwords

Price/word ($)              7.32                3.66                3.99                2.19; 1.95          1.71; 1.32
Memory cycle time (µs)      5                   8                   1.75                1; 1.5              0.8
Memory accesses/sec         0.2                 0.125               0.57                1; 0.67             1.25
(millions)
Memory size (Kwords)        1, 4, ..., 65       1, 4, 8, ..., 32    4, ..., 32          8, 4, ..., 32       4, ..., 131
Bits accessed per sec per $ 30 (0.033)          34.5 (0.029)        227 (0.0044)        714 (0.0014)        1135 (0.00088)
Perf./price improve*                            1.1                 6.6                 3.1                 1.7
Price improve*                                  1.8                 1.45                1.8                 1.3 (1.5)
Perf. improve*                                  0.62                4.57                1.75                1.25
Product life (years)        4                   3                   4                   4                   7
Number produced             50                  45                  120                 445                 790
Power (W)                   2160                1125                2100                2000                2875
Weight (lb)                 1350                1030                1150                790                 750
Size (69 X 21 X 28 inch     4                   2                   3                   1.5 (special)
bays)
Volume (ft3)                94                  47                  70.5                36                  23.5
Power density (W/ft3)       22.9                23.9                29.8                55.5                122.3
Weight density (lb/ft3)     14.4                21.9                16.3                21.9                31.9
Watts/$                     0.018               0.017               0.046               0.08                0.15
Lb/$                        0.011               0.016               0.026               0.032               0.038
Kbits accessed per W        1.6                 1.1                 4.9                 9.0                 7.8
Kbits accessed per lb       2.6                 2.2                 8.9                 22.8                30.0
Kbits accessed per ft3      38.3                47.9                146.0               500.0               957.0
Logic speed (MHz)           5, 0.5              1, 0.5, 5           10, 1, 0.5          10, 1               10, 20
Module size                 5.25 X 4            5.25 X 4            2.25, 5             2.25, 5, 10         Same
                                                                    (X 3.875)           (X 3.875)
Modules/types               544/34              236/41              614/39              644/44              300/54
Pc, Mp, I/O logic area      11.9                5.2                 5.3                 5.6                 3.4
(in2 X K)
Processor logic area        8.9                 3.3                 3.3                 3.1                 2.1
(in2 X K)
Logic prints                18                  16                  27                  44/2 = 22           75/2

*Uses previous model as base for improvement.

Rows whose column alignment is uncertain (entries in source order):
MTBF (hours): 2800 | 5400
Multiply/divide time (µs): 25/40 | 4.4/9 | 4.5-12.5/12.5 | 4.5/4.5
Logic technology: Saturating MADT transistors | Capacitor-diode gates; diode transistors | Saturating transistors | 7400, 74H00 series integrated circuits
Module series: 1,000 | 4,000 | B | M
Transistors, diodes, ICs: 3.5 K, 4.3 K | 350, 200, 3.4 K
Power supply/types: 8/4 | 4/2 | 9/4 | 1/1 | 1/1
Modules space, processor: 18 X 25 | 6 X 25 | 12
Modules space, I/O interface: 3 X 25 | 3 X 25
Modules space, reader, punch, typewriter: 3 X 25 | 3 X 25 | 8 X 32 | 8 X 44 | 7
Modules space, 4-Kword memory: 4 X 25 | 4 X 25 (8 K) | 3 X 32 | 3 X 44 | 4 X 32
Unplaced entries: X 32 | 8 X 44 | 4 X 32 | 4 X 32

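The performance ratios quoted above, and the "Bits accessed per sec per $" row of Table 5, follow directly from the memory cycle times and prices in the same table. The sketch below recomputes them. The dictionary and function names are illustrative, and the PDP-9 is left out of the price dictionary because its price entry in the table lists several values.

```python
WORD_BITS = 18

# Memory cycle times in microseconds, from Table 5 (PDP-9 value, not 9/L).
CYCLE_US = {"PDP-1": 5.0, "PDP-4": 8.0, "PDP-7": 1.75, "PDP-9": 1.0, "PDP-15": 0.8}

# Prices in dollars, from Table 5, for machines with a single unambiguous entry.
PRICE_DOLLARS = {"PDP-1": 120_000, "PDP-4": 65_500, "PDP-7": 45_000, "PDP-15": 19_800}


def accesses_per_second(machine: str) -> float:
    """Words accessed per second, taken as one access per memory cycle."""
    return 1e6 / CYCLE_US[machine]


def performance_ratio(new: str, old: str) -> float:
    """Performance improvement of `new` using `old` as the base."""
    return accesses_per_second(new) / accesses_per_second(old)


def bits_per_second_per_dollar(machine: str) -> float:
    """Bits accessed per second for each dollar of system price."""
    return WORD_BITS * accesses_per_second(machine) / PRICE_DOLLARS[machine]


if __name__ == "__main__":
    for new, old in [("PDP-4", "PDP-1"), ("PDP-7", "PDP-4"),
                     ("PDP-9", "PDP-7"), ("PDP-15", "PDP-9")]:
        print(f"{new}:{old} performance ratio = {performance_ratio(new, old):.2f}")
    for machine in PRICE_DOLLARS:
        print(f"{machine}: {bits_per_second_per_dollar(machine):.1f} bits/sec/$")
```

The recomputed values agree with the table to within the rounding used there; small differences arise because the table appears to round the access rate (for example 0.57 million per second for the PDP-7) before dividing.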
The PDP-15's use of integrated circuits provided an 18-bit series improvement. At last
there was a significant reduction in size, although the power consumption increased. The
board area in the processor decreased by a factor of three over previous implementations,
where it had been relatively constant at about 3,000 in2. The two major contributions of
the PDP-15 were the notion that systems include both hardware and software and that the
machine would span a range of sizes. Finally, to extend the life of the machine, a number
of improvements (e.g., in memory, PDP-11 I/O) were later made to reduce price and to
increase performance (floating-point, multiple processors).

Project Development Times and Product Lifetimes

The duration of the projects generally increased with time, reflecting the longer tooling
time for increased production volumes. The PDP-4 is an exception; it had the shortest
design time because the circuits and mechanical packaging were based on the PDP-1. In
addition to increased development times with passing years, later members of the series
had longer product lifetimes; hence, longer times elapsed before re-implementations
occurred. The time between the first few implementations was only about two years. The
final implementation, the PDP-15, was produced for seven years. The early (too frequent)
implementations were perhaps indicative of the attention paid to low hardware cost and
performance, rather than to application and software enhancements to increase the market
life.
Price

Figure 35 shows that the price for a basic "bare-bones" system declined by more than 19
percent per year. The price of the typical midsize system has never been properly analyzed,
but roughly speaking, the average price declined from an initial cost of $250K for a PDP-1
to $65K for a PDP-9. For a given processor, however, the size of typical systems purchased
grew with time. For example, early PDP-15 systems were sold at an average price of $75K,
while the final average price was about $125K.

Figure 35. Price versus time for 18-bit computers with paper tape I/O, typewriter, and
4-Kword memory. Plotted line: PRICE = 120,000 X 0.81^(t - 1960.9), for years 1960 to 1971;
a point labeled "Teletype ASR version" also appears.

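The line plotted in Figure 35, PRICE = 120,000 X 0.81^(t - 1960.9), can be evaluated at each machine's first-ship date and set beside the basic-system prices of Table 5. In the sketch below the mid-month conversion of ship dates to decimal years is an assumption, as are the function and dictionary names; the PDP-9 is omitted because its table entry lists several prices.

```python
def fitted_price(year: float) -> float:
    """Price in dollars on the Figure 35 line: 120,000 x 0.81**(year - 1960.9)."""
    return 120_000 * 0.81 ** (year - 1960.9)


def decimal_year(month: int, year: int) -> float:
    """Mid-month date as a decimal year (an assumed convention)."""
    return year + (month - 0.5) / 12


# First-ship dates and basic-system prices (dollars) transcribed from Table 5.
SHIPPED = {
    "PDP-1": (decimal_year(11, 1960), 120_000),
    "PDP-4": (decimal_year(7, 1962), 65_500),
    "PDP-7": (decimal_year(12, 1964), 45_000),
    "PDP-15": (decimal_year(2, 1970), 19_800),
}

if __name__ == "__main__":
    for machine, (when, listed) in SHIPPED.items():
        print(f"{machine:7s} {when:8.2f}  listed ${listed:>7,}  "
              f"line ${fitted_price(when):>9,.0f}")
```

Because the base of the exponential is 0.81, each year along the line multiplies the price by 0.81, which is the 19 percent yearly decline cited in the text.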
Not all price reductions were the result of cheaper logic technology or better
manufacturing techniques on the part of DEC. Some prices, particularly system prices, were
influenced strongly by the prices of peripherals. For example, the Teletype Corporation
Model 33 ASR teleprinter with built-in paper tape reader and punch helped reduce the price
of the minimum configurations of later 18-bit computers by as much as any other component
price reduction.
The primary memory price decline (Figure 36) of only 16 percent per year can be attributed
to the fact that each subsequent machine needed higher performance memories. Memories were
always implemented at relatively constant price with increasing performance. Again, the
PDP-4 is an exception; it shows the effect of building a low performance memory versus the
fastest memory. While the first PDP-4s were shipped with PDP-1 memory, the next machines
had 8-Kword memory systems that cost about half that of the PDP-1. The price of the 18-bit
memory systems decreased at a rate slightly less than that of the 12-bit or 36-bit
computers. One possible explanation would be an economy of scale in quantity shipped in the
12-bit case and an economy of scale in word length in the 36-bit case.

Figure 36. Price/word of 18-bit memory versus time.

Performance

Performance (in millions of words accessed per second by the processor) is shown in Figure
37 and exhibits a 29 percent yearly increase. Neither the PDP-15 nor the PDP-4 falls on the
line because both were oriented to lower price rather than to increased performance. In
reality, the PDP-15 later evolved to have much greater effective performance when built-in
floating-point arithmetic was added. Then its real performance (a factor of 2 to 10 better
for FORTRAN programs involving floating-point) exceeded the line position. Midlife
extensions of this sort were generally missing on the other 18-bit computers, as design
resources went into developing new processors.
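The yearly rates quoted in this section (a 19 percent price decline, a 16 percent memory price decline, and a 29 percent performance increase) are easier to compare when expressed as halving or doubling times. The helper below is a small illustrative calculation, assuming each rate compounds annually; the function name is hypothetical.

```python
import math


def years_to_change(factor: float, annual_rate: float) -> float:
    """Years for a quantity compounding at `annual_rate` per year to change by `factor`.

    A decline is a negative rate, e.g. -0.19 for a 19 percent yearly decline.
    """
    return math.log(factor) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    print(f"system price halves in   {years_to_change(0.5, -0.19):.2f} years")
    print(f"memory price halves in   {years_to_change(0.5, -0.16):.2f} years")
    print(f"performance doubles in   {years_to_change(2.0, 0.29):.2f} years")
```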


Price/Performance


Figure 12. PDP-8/E system block diagram (parts 1 and 2 of 2).

would not be necessary to design a complete
new set of options at the time the machine was
introduced, and existing customers could upgrade to the new computer without having to
buy new peripherals.
The reason for using an adapter to connect to existing I/O devices was that the PDP-8/E
featured a new unified-bus I/O Bus implementation related to the Unibus that was being
designed for the PDP-11. The electrical design of the I/O Bus for both the previous
negative logic and positive bus machines had been straightforward, but the mechanical
packaging and cabling had not. A new implementation was needed which would simplify the
packaging and cabling and solve the problems created by the Direct Memory Access channel,
which had not been bused in previous designs. Don White, who was leading the design team,
conducted a contest to name the new bus. After discarding such entries as "Blunderbus,"
the name "Omnibus" was chosen.

The Omnibus, which is still in use in the PDP-8/A, has 144 pins, of which 96 are defined
as Omnibus signals. The remainder are power and ground. The large number of signals permits
a great number of intraprocessor communications links as well as I/O signals to be
accommodated. The Omnibus signals can be grouped as follows:

1.  Master timing to all components.
2.  Processor state information to the console.
3.  Processor request to memory for instructions and data.
4.  Processor to I/O device commands and data transfer.
5.  I/O device to processor, signaling completion (interrupts).
6.  I/O Direct Memory Access control for both direct and Three Cycle Data Break transfers.

The approximately 30 signals in groups 4 and 5 provide programmed I/O capability. There
are about 50 signals in group 6 to provide the Direct Memory Access capability. These 80
signals are nearly equivalent in quantity and function to the preceding PDP-8 I/O Bus
design, making the conversion from Omnibus structure to PDP-8/I and PDP-8/L I/O equipment
very simple.
The complement of signals is quite different from that in the PDP-11 Unibus, which is more
strictly an I/O bus, and the PDP-8/E processor handles many more of the Direct Memory
Access and interrupt control functions than does the PDP-11 processor. One specific
signaling structure that differs between the two machines is the interrupt system, which in
a PDP-11 Unibus passes a Bus Grant signal through the I/O options to be propagated further
or absorbed by the option. There are no such pass-through signals on the Omnibus; hence,
any option can occupy any slot, and intervening slots between installed options can be left
vacant. A by-product (or perhaps goal) of the Omnibus structure is that there is a fixed
number of slots. The lack of cabling between options means that the electrical transmission
characteristics are well defined.
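The difference between a pass-through grant and fully bused signals can be illustrated with a toy model. The sketch below is not DEC logic: the class and function names are hypothetical, and it assumes that a vacant slot in the pass-through scheme has nothing fitted to carry the grant onward.

```python
from typing import List, Optional


class Option:
    """An installed I/O option; `requesting` means it wants service."""

    def __init__(self, name: str, requesting: bool = False) -> None:
        self.name = name
        self.requesting = requesting


def daisy_chain_grant(slots: List[Optional[Option]]) -> Optional[str]:
    """Pass-through scheme: the grant enters slot 0 and each option either
    absorbs it (if requesting) or propagates it to the next slot.

    Assumption: a vacant slot (None) forwards nothing, so the chain stops there.
    """
    for option in slots:
        if option is None:
            return None            # grant is lost at the vacant slot
        if option.requesting:
            return option.name     # this option absorbs the grant
    return None


def bused_request_pending(slots: List[Optional[Option]]) -> bool:
    """Bused scheme: every slot sees the same lines, so a request is visible
    to the processor wherever the option sits and whatever slots are vacant."""
    return any(option is not None and option.requesting for option in slots)


if __name__ == "__main__":
    backplane = [Option("disk"), None, Option("teleprinter", requesting=True)]
    print("pass-through grant reaches:", daisy_chain_grant(backplane))
    print("bused request visible:     ", bused_request_pending(backplane))
```

With the vacant middle slot, the pass-through model never delivers the grant to the requesting option, while the bused model still reports the request. That is the property that lets Omnibus options occupy any slot.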
The processor for the PDP-8/E occupied three 8 X 10-inch boards; 4 Kwords of core memory
took up three more boards; a memory shield board, a terminator board, a teleprinter control
board, and the console board completed the minimum system configuration. Thus, a total of
ten 8 X 10-inch boards formed a complete system. The three-board PDP-8/E processor,
occupying 240 in2, was in striking contrast to the 100-board PDP-5 processor, which
occupied 2,100 in2.
The PDP-8/E implementation was determined by the availability of integrated circuits.
Multiplexers, register files, and basic arithmetic
logic units performed the basic operations in a
straightforward fashion using a simple sequential controller. Microprogrammed control was
not feasible because suitable read-only memories were not available. The read-only rope
magnetic memory of the PDP-9 was too expensive and was unsuitable for PDP-8/E packaging. Integrated circuit read-only memories
available at that time were too small, holding
only about 64 bits.
There was some problem partitioning the
processor logic among the three modules. Figure 13 shows the final arrangement, which was
to place timing and interrupt on one module,
the data path on a second, and the control on
the third. Even with this partitioning, more pins
were required between the data and control
modules than were available through the Omnibus. To provide the necessary connections,
additional connectors were installed on each
module on the edge opposite the Omnibus connection.
The PDP-8/E was mounted in a chassis which had space and power to accommodate two blocks
of Omnibus slots. Thirty-eight modules could be mounted in the slots, allowing space for
the processor and almost 30 peripheral option controllers. Many customers wanted to build
the PDP-8/E into small cabinets and have it control only a few things. They found the large
chassis and its associated price to be more than they wanted. To reach this market, the
PDP-8/M was designed.
The PDP-8/M was essentially a PDP-8/E cut in half. The cabinet had half the depth of a
PDP-8/E, and the power supply was half as big. There were 18 slots available, enough for
the basic processor-memory system and about eight options. The processor was the same as
that for a PDP-8/E.

Figure 13. PDP-8/E basic system block diagram.

By 1975, DEC had been building "hex" size printed circuit boards for the PDP-11/05 and
PDP-11/40 for at least two years. The hex boards were 8 X 15 inches, half again as big as
the "quad" boards used in the PDP-8/E and PDP-8/M, which were 8 X 10 inches. The
dimensional difference was along the contact side of the board. A hex board had six sets of
36 contacts while the quad board had only four sets. Semiconductor memory chips had also
become available, so a new machine was designed to utilize the larger boards and new
memories to extend the PDP-8/E, PDP-8/M to a new, lower price range. The new machine was
the PDP-8/A. The PDP-8/A processor and register transfer diagram is shown in Figure 14 and
the 8/A processor in Figure 15.
The hex modules permitted some of the peripheral controller options that had occupied
several boards in the PDP-8/E to fit on a single board in the PDP-8/A (Figure 16). The
availability of hex boards and of larger semiconductor read-only memories permitted the
PDP-8/A processor to use microprogrammed control and fit onto a single board. It should be
noted here that when a logic system occupies more than one board, a lot of space on each
board is used by etch runs going to the connectors. This was particularly true of the
PDP-8/E and PDP-8/M processor boards, due to the contacts on two edges of the boards. When
an option is condensed to a single board, more space becomes available than square inch
comparisons would at first indicate because many of the etch lines to the contacts are no
longer required.
The first PDP-8/A semiconductor memory took only 48 chips (1 Kbit each) to implement 4
Kwords of memory. Memories of 8 Kwords and 16 Kwords were also offered. In 1977, only 96
16-Kbit chips were needed to form a 128-Kword memory. With greater use of semiconductor
memory, especially read-only memory, a scheme was devised and added to the PDP-8/A to
permit programs written for read-write memory to be run in read-only memory. The scheme
adds a 13th bit to the read-only memory to signify that a particular location is actually a
location that is both read and written. When the processor detects the assertion of the
13th bit, the processor uses the other 12 bits to address a location in some read-write
memory which holds the variable information. This effectively provides an indirect memory
reference.
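The 13th-bit scheme can be sketched as a small memory model. This is an illustration under stated assumptions rather than a description of the PDP-8/A hardware: the class name, the choice to raise an error on a write to an unflagged location, and the size of the read-write memory are all hypothetical.

```python
WORD_MASK = 0o7777        # the 12 data bits of a PDP-8 word
VARIABLE_FLAG = 1 << 12   # the added 13th bit


class RomWithVariables:
    """Read-only memory whose flagged words redirect to read-write memory."""

    def __init__(self, rom_words, ram_size: int = 4096) -> None:
        self.rom = list(rom_words)   # 13-bit words
        self.ram = [0] * ram_size    # 12-bit read-write locations

    def read(self, address: int) -> int:
        word = self.rom[address]
        if word & VARIABLE_FLAG:
            # The other 12 bits address the read-write location.
            return self.ram[word & WORD_MASK]
        return word & WORD_MASK

    def write(self, address: int, value: int) -> None:
        word = self.rom[address]
        if not word & VARIABLE_FLAG:
            # Assumed behavior: the text does not say what an unflagged write does.
            raise ValueError("location is read-only")
        self.ram[word & WORD_MASK] = value & WORD_MASK


if __name__ == "__main__":
    memory = RomWithVariables([0o1234, VARIABLE_FLAG | 0o0020])
    memory.write(1, 0o7001)            # lands in read-write location 0o0020
    print(oct(memory.read(0)), oct(memory.read(1)))
```

A program stored this way needs no changes: references to a flagged location behave like an indirect reference through the read-only word, as the text describes.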
In 1976, an option to improve the speed of floating-point computation was added to the
PDP-8/A. This option is a single accumulator floating-point processor occupying two hex
boards and compatible with the floating-point processor in the PDP-12. It supports 3- or
6-word floating-point arithmetic (12-bit exponent and 24- or 60-bit fraction) and 2-word
double precision 24-bit arithmetic. As a completely independent processor with its own
instruction set processor, it has its own program counter and eight index registers. The
performance, approximately equal to that of an IBM 360 Model 40, provides what is probably
the highest performance/cost ratio of any computer.
More Omnibus 8 computers (PDP-8/E, PDP-8/M, PDP-8/A) have been constructed than any of the
previous models. The high demand for this model appears to be due to the basic simplicity
of the design, together with the ability of the user to easily build rather arbitrary
system configurations.

Figure 14. PDP-8/A processor and register transfer diagram.

Figure 15. PDP-8/A processor (interior).

In the fall of 1972, DEC began the design of a single chip P-channel metal oxide
semiconductor (MOS) processor to execute the PDP-8 instruction set. This processor was to
be called the PDP-8/B, and it was hoped that production chips could be obtained by the
spring of 1974 for systems to be shipped in the fall of 1974. The designers had progressed
through the design tradeoffs in partitioning a PDP-8 for a single 40-pin chip when the
project was stopped in the summer of 1973. The key reasons for stopping the project
included the industry trend from P-channel to N-channel and the fact that the Omnibus did
not lend itself to cost reductions with large-scale integrated circuit technology. While
the Omnibus was ideal for medium-scale integration and ease of interfacing, it was not as
cost-effective as the buses that microcomputers used, which multiplexed address and data on
the same leads at different times. The percentage of system cost and complexity represented
by the processor in an Omnibus-8 system was too low to make the move to a large-scale
integrated (LSI) processor attractive at that time. For these reasons, it was decided to
apply the newer N-channel process to a system in which the processor was a more complex and
costly part of the system - the PDP-11 Family. Thus, in the summer of 1973, a project
started in cooperation with Western Digital Corporation to build a PDP-11 on one or more
N-channel LSI chips.
In 1976, Intersil offered the first PDP-8 processor to occupy a single chip, using CMOS
technology. DEC verified that it was a PDP-8
and began to apply it to a product in the fall of
1976. In the meantime, in addition to Intersil,
Harris Semiconductor became a second source
of chip supply for DEC. The two manufacturers
each have their own designation for these chips,
but in the discussion below they will be called
"CMOS-8" chips. A microphotograph of the
chip is shown in Figure 17.
The CMOS-8 processor block diagram is given in Figure 18. Not surprisingly, it looks very
much like a conventional PDP-8/E processor design using medium-scale integrated circuits.
It has a common data path for manipulating the Program Counter (PC), Memory Address (MA),
Multiplier-Quotient (MQ), Accumulator (AC), and Temporary (Temp) registers. The Instruction
Register (IR), however, does not share the common arithmetic logic unit (ALU). Register
transfers, including those to the "outside world," are controlled by a programmable logic
array (PLA), as indicated by the dotted lines in the figure. CMOS-8 is an example of the
use of programmable logic arrays for instruction decoding and for control purposes, as
discussed in Chapter 2.
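A programmable logic array decodes instructions with an AND plane of product terms (bit patterns with don't-care positions) feeding an OR plane that combines them into output signals. The sketch below models that idea for the standard PDP-8 instruction format, in which the top three bits select one of eight operation codes and the next bit marks indirect addressing. It is not the CMOS-8's actual PLA contents: the term list and the `MEMORY_REFERENCE` and `INDIRECT` output names are assumptions for illustration.

```python
# Each product term is (mask, value, outputs): the term fires when
# (instruction & mask) == value; bits outside the mask are don't-cares.
PRODUCT_TERMS = [
    (0o7000, 0o0000, ("AND",)),
    (0o7000, 0o1000, ("TAD",)),
    (0o7000, 0o2000, ("ISZ",)),
    (0o7000, 0o3000, ("DCA",)),
    (0o7000, 0o4000, ("JMS",)),
    (0o7000, 0o5000, ("JMP",)),
    (0o7000, 0o6000, ("IOT",)),
    (0o7000, 0o7000, ("OPR",)),
    # Operation codes 0-3 (top bit clear) and 4-5 reference memory.
    (0o4000, 0o0000, ("MEMORY_REFERENCE",)),
    (0o6000, 0o4000, ("MEMORY_REFERENCE",)),
    # Indirect bit (0o0400) counts only for memory reference instructions.
    (0o4400, 0o0400, ("INDIRECT",)),
    (0o6400, 0o4400, ("INDIRECT",)),
]


def pla_decode(instruction: int) -> set:
    """OR together the outputs of every product term the instruction matches."""
    asserted = set()
    for mask, value, outputs in PRODUCT_TERMS:
        if instruction & mask == value:      # AND plane
            asserted.update(outputs)         # OR plane
    return asserted


if __name__ == "__main__":
    for word in (0o1600, 0o5400, 0o6046, 0o7200):
        print(f"{word:04o} -> {sorted(pla_decode(word))}")
```

The two-term encodings of `MEMORY_REFERENCE` and `INDIRECT` show why a PLA suits this job: one output can be built from a few wide product terms instead of one term per operation code.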
While the CMOS-8 is the first DEC processor to be built on a single chip, the most
interesting thing about it is the systems configurations that it makes possible. It is not
only small in size (a single 40-pin chip), but it also has minuscule power requirements due
to its CMOS construction. Thus, some very compact systems can be built using it. The block
diagram in Figure 19 shows a system built with a CMOS-8 and compatible components. In
contrast to those of past systems, some of the other components in this system now
represent more dollar cost and more physical space than the processor itself. Among these
are the random-access read-write memory, the read-only memory, and the Parallel Interface
Elements associated with the I/O devices. The Parallel Interface Elements enable interrupt
signals to be sent back to the processor and decode the In-Out Transfer (IOT) commands that
control data transfers. Also shown in Figure 19 are some specific I/O devices such as the
Universal Asynchronous Receiver/Transmitter (UART) chips that do serial/parallel
conversions and formatting for communication lines.
An excellent example of the use of a CMOS-8
as part of a packaged system is the VT78 video
terminal shown in Figure 20. The goals for this
terminal were to drastically reduce costs by including the keyboard, cathode ray tube, and
processor in a single package the size of an ordinary terminal. The CMOS-8 chip and high
density RAM chips made this possible. To form
a complete, stand-alone computer system that
supports five terminals, mass storage was
added. Because the mass storage was floppy
disks, it was not in the terminal but in a small
cabinet. Even without the mass storage, however, the VT78 forms an "intelligent terminal."
An intelligent terminal is usually defined to include a computer whose program can be loaded
(usually via a communication line) to take on a
variety of characteristics - i.e., it can learn. Figure 21 is a block diagram of a VT78 system terminal.
An intelligent terminal can be used either as
part of a network or as a stand-alone computer
system. In the former case, the application is determined by the network to which the terminal
is attached, but in the latter case, the terminal
functions as a desk-top computer running various PDP-8 software.

Figure 17. Microphotograph of the CMOS-8 chip (courtesy of Intersil Corporation).

Figure 18. Block diagram of CMOS-8.

Figure 19. CMOS-8 based system.

Figure 20. The VT78 video terminal.

TECHNOLOGY, PRICE, AND
PERFORMANCE OF THE 12-BIT
FAMILY

The PDP-8 has been re-implemented 10 times
with new technology over a period of 15 years.
The performance characteristics of these implementations are given in Figure 22. As discussed
in Chapter 1, new technology can be utilized in
the computer industry in three ways: lower cost
implementations at constant performance and
functionality, higher performance implementations at constant cost, and implementation of new
basic structures. Of these three ways, the PDP-8
Family has primarily used lower cost implementations of constant performance and functionality.
The points in Figures 23 and 24 are arranged
to show the cost trends of three configurations.
The first configuration is merely a central processor with 4 K words of primary memory. The
second configuration adds a console terminal,
and the third configuration adds DECtapes or
floppy disks for file storage. Note that the basic
system represented in the first configuration has
declined in price most rapidly: 22 percent per
year in the early days and 15 percent per year in
recent years. The price of primary memory, on
the other hand, has declined at the rate of 19
percent per year, as seen in Figure 25.
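To make the quoted annual rates more concrete, the following sketch treats each one as a constant compound decline, so that price after n years is the starting price times (1 - rate) to the power n, and computes how long a price takes to halve. The constant-rate model and the function names are assumptions made for illustration; the figures plot actual list prices rather than this idealized curve.

```python
import math


def price_after(initial_price, annual_decline, years):
    """Price after a number of years of constant compound decline."""
    return initial_price * (1.0 - annual_decline) ** years


def years_to_halve(annual_decline):
    """Years needed for the price to fall to half at a constant rate."""
    return math.log(0.5) / math.log(1.0 - annual_decline)


# Rates quoted in the text for the basic system (early and recent)
# and for primary memory.
for label, rate in [
    ("basic system, early years", 0.22),
    ("basic system, recent years", 0.15),
    ("primary memory", 0.19),
]:
    print(f"{label}: price halves in about {years_to_halve(rate):.1f} years")
```

Under this model, a 22 percent annual decline halves the price in a little under three years, while a 15 percent decline takes a little over four.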

Figure 21. Block diagram of the VT78 microprocessor system terminal.

Figure 22. Performance of DEC's 12-bit computers versus time.

The price and performance trajectories for
the PDP-8 family of machines are plotted in
Figure 26, with lines of constant price/performance separated at factors of 2. Note
that the early implementations had significantly
lower performance than the original PDP-8.
Memory performance and instruction execution performance were directly related in all of
these machines except the PDP-5 (which kept
the Program Counter in primary memory) and
the PDP-8/S (which was a serial machine).
Thus, with the design emphasis on lowering the
cost with each new machine, performance continued to lag behind that of the PDP-8 until
higher speed primary memory was available
without a cost penalty. Other performance improvements, such as the addition of floating-point hardware or the addition of a cache, are
not treated in this comparative analysis.

Figure 27 gives the performance/price ratio
for the PDP-8 Family machines, and it can be
directly compared with that of other machines
described in this book. The 18-bit machines improved at a rate of 52 percent to 69 percent per
year over a short time, as indicated on the
graph. Setting aside the PDP-5 design point, the
improvement for the 12-bit machines was similar during the same period but has since slowed
to only 22 percent per year.
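The performance/price measure used in Figure 27 and Table 2 (processor bits accessed per second per dollar) appears to be the product of the word access rate and the 12-bit word length, divided by the price of a processor with 4 Kwords of memory. The sketch below checks that reading against six Table 2 columns; the formula is an inference from the printed numbers rather than a definition stated in the text, and the function and dictionary names are illustrative.

```python
WORD_BITS = 12  # word length of the PDP-8 family

# (Mwords accessed/s, price in dollars of processor + 4 Kword memory),
# as printed in Table 2.
TABLE_2 = {
    "PDP-8": (0.63, 16200),
    "PDP-8/S": (0.04, 8790),
    "PDP-8/L": (0.63, 7000),
    "PDP-8/E": (0.76, 4990),
    "PDP-8/M": (0.76, 3690),
    "PDP-8/A": (0.67, 2600),
}

# "Processor bits accessed/s/$" as printed in Table 2.
PRINTED = {
    "PDP-8": 466,
    "PDP-8/S": 55,
    "PDP-8/L": 1080,
    "PDP-8/E": 1828,
    "PDP-8/M": 2472,
    "PDP-8/A": 3092,
}


def bits_per_second_per_dollar(mwords_per_second, price_dollars):
    """Bits accessed per second per dollar (inferred formula)."""
    return mwords_per_second * 1e6 * WORD_BITS / price_dollars


for name, (rate, price) in TABLE_2.items():
    computed = bits_per_second_per_dollar(rate, price)
    print(f"{name}: computed {computed:.0f}, printed {PRINTED[name]}")
```

The PDP-5 and PDP-8/I columns are left out because their printed entries do not appear to follow from this simple product.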

Figure 23. Price of DEC's 12-bit computers versus time (log).

Rather than try to fit a single exponential to
the performance/price data points in Figure 27,
it might be better to try two independent exponentials. The reason for this is that the data
points really mark the transition between two
generations. The PDP-5 was a mid-second
(transistor) generation machine, and the PDP-8
represents a late second generation machine.
The PDP-8/I and PDP-8/L were beginning
third (integrated circuit) generation designs.
These four machines represent a relatively rapid
evolution from 1963 to 1968. After the PDP-8/L, the evolution slows somewhat between
1968 and 1977, as medium-scale integrated circuits continued to be the implementation technology, and the cost of packaging and
connecting components continued to be controlled by the relatively wide bus structure.
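One way to try the two-exponential idea is to fit a separate least-squares line to the logarithm of the performance/price values for each generation. The sketch below does this with the "bits accessed/s/$" entries and first-ship dates from Table 2. The grouping of machines (the four named above as the early group, the PDP-8/L onward as the later group, with the PDP-8/S left out as an outlier) and all function names are assumptions for illustration, not the authors' fitting procedure.

```python
import math


def fit_exponential(points):
    """Least-squares fit of log(y) against t; returns the annual growth rate.

    points is a list of (year, value) pairs. A result of 0.2 means the
    fitted curve grows by 20 percent per year.
    """
    n = len(points)
    years = [t for t, _ in points]
    logs = [math.log(y) for _, y in points]
    year_mean = sum(years) / n
    log_mean = sum(logs) / n
    sxy = sum((t - year_mean) * (v - log_mean) for t, v in zip(years, logs))
    sxx = sum((t - year_mean) ** 2 for t in years)
    return math.exp(sxy / sxx) - 1.0


def ship_year(month, year):
    """Convert a month/two-digit-year ship date to a fractional year."""
    return 1900 + year + (month - 1) / 12.0


# (first ship date, bits accessed/s/$) as printed in Table 2.
EARLY = [  # PDP-5, PDP-8, PDP-8/I, PDP-8/L
    (ship_year(9, 63), 93),
    (ship_year(4, 65), 466),
    (ship_year(4, 68), 651),
    (ship_year(11, 68), 1080),
]
LATER = [  # PDP-8/L, PDP-8/E, PDP-8/M, PDP-8/A
    (ship_year(11, 68), 1080),
    (ship_year(3, 71), 1828),
    (ship_year(6, 72), 2472),
    (ship_year(1, 75), 3092),
]

print(f"early generation: {fit_exponential(EARLY):.0%} per year")
print(f"later generation: {fit_exponential(LATER):.0%} per year")
```

With these groupings the early fit comes out markedly steeper than the later one, which is the qualitative point of using two exponentials; the exact rates depend on which machines are assigned to which group.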
During their evolution, the DEC 12-bit computers have significantly changed in physical
structure, as can be seen from the block diagrams in Figure 28. The machines up through
the PDP-8/L had a relatively centralized structure with three buses to interface to memory,
program-controlled I/O devices, and Direct
Memory Access devices. The Omnibus-8 machines bundled these connections together in a
simpler physical structure. The CMOS-8 avoids
the wide bus problem by moving the bus to lines
on a printed circuit board. The number of interconnection signals on the bus is then reduced by
roughly a factor of 4 to about 25 signals which
can be brought into and out of the chip within
the number of pins available.

Figure 24. Price of DEC's 12-bit computers versus time (linear).

Figure 25. Price per word of 12-bit memory versus time.

Figure 26. Price versus performance for DEC's 12-bit computers. (Price includes 4 Kw and CPU without terminal; the horizontal axis is CPU performance in millions of additions/second; lines of constant price/performance are separated by a factor of 2.)

Figure 27. Bits accessed by the central processor/sec/$ versus time (for 4 Kword processor systems).

Figures 23 and 24 and Table 2 illustrate the
price/performance oscillating history of the design evolution summarized below:

1. While the PDP-5 was designed to keep
   price at a minimum, the PDP-8 had additions to improve the performance
   while not increasing price significantly
   over that of a slower speed design. The
   cost per word was modestly higher with
   the PDP-8 than with the PDP-5, but the
   PDP-8 had 6 times the performance of a
   PDP-5. Thus, the PDP-8 crosses three
   lines of constant price/performance in
   Figure 26.

2. The PDP-8/S was an attempt to achieve
   a minimum price by using serial logic
   and a minimum price memory design.
   However, the performance of the PDP-8/S was slow.

3. The market pressures created by PDP-8/S performance probably caused the return to the PDP-8 design, but in an integrated circuit implementation, the
   PDP-8/I.

4. The PDP-8/I was relatively expensive,
   so the PDP-8/L was quickly introduced
   to reduce cost and bring the design into
   line with market needs and expectations.

5. The PDP-8/E was introduced as a high
   performance machine that would permit
   the building of systems larger than those
   possible with the PDP-8/L.

6. The PDP-8/M was a lower cost, smaller
   cabinet version of the PDP-8/E and was
   intended to meet the needs of the OEM
   market.
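The remark in item 1 that the PDP-8 crosses three lines of constant price/performance can be checked roughly from Table 2. Since the lines in Figure 26 are a factor of 2 apart, the base-2 logarithm of the gain in performance per dollar approximates how many lines a design step crosses. The sketch below uses the Table 2 access rates and 4 Kword system prices for the two machines; the helper names are illustrative, and the exact count of lines crossed also depends on where the starting point sits between two lines.

```python
import math


def price_performance_gain(old, new):
    """Ratio of performance per dollar between two (performance, price) pairs."""
    old_perf, old_price = old
    new_perf, new_price = new
    return (new_perf / new_price) / (old_perf / old_price)


def factor_of_two_steps(gain):
    """How many factor-of-2 steps a price/performance gain represents."""
    return math.log2(gain)


# (Mwords accessed/s, price in K$ of processor + 4 Kword memory), Table 2.
PDP_5 = (0.10, 25.8)
PDP_8 = (0.63, 16.2)

gain = price_performance_gain(PDP_5, PDP_8)
print(f"gain of {gain:.1f}x, about {factor_of_two_steps(gain):.1f} factor-of-two steps")
```

On these numbers the step from the PDP-5 to the PDP-8 is a little more than three doublings, in line with the three lines mentioned in item 1. Table 2 itself prints 5.0 for this step, which follows from its printed bits/s/$ entries (466/93) rather than from the access rates and prices used here.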

The design goal of machines subsequent to
the PDP-8/M has been primarily one of price
reduction. The PDP-8/A was introduced to further reduce cost from the level of the PDP-8/E
and PDP-8/M, although some large system
configurations are still built with PDP-8/E machines. The CMOS-8 chips represent a substantial cost reduction but also a substantial
performance reduction. The CMOS-8 performance is one-third that of a PDP-8/A, so a standalone system using a CMOS-8 is less cost-effective than a PDP-8/A when the central processor is used as the only performance criterion.

The main reason for using large-scale integration is the reduced cost and smaller package rather than performance. Obviously, the
next step is increased performance or more
memory, or both more performance and more
memory on the same chip.

Figure 28. Evolution of PDP-8 Family PMS structures.
(a) Negative (PDP-5, 8, 8/S) and positive (8/I, 8/L) logic families.
(b) Omnibus family (PDP-8/E, 8/F, 8/M, 8/A).
(c) CMOS-8 (6100) processor-on-a-chip family.
(d) VT78 computer-in-a-terminal.

Table 2. Characteristics of PDP-8 Family Computers

                                      PDP-5    PDP-8    PDP-8/S  PDP-8/I  PDP-8/L  PDP-8/E  PDP-8/M  PDP-8/A  VT78
First ship                            9/63     4/65     9/66     4/68     11/68    3/71     6/72     1/75     6/77
Processor + 4 Kword memory (K$)       25.8     16.2     8.79     11.6     7.0      4.99     3.69     2.6      NA
Same + terminal (K$)                  27.0     18.0     9.99     12.8     8.5      6.49     5.19     4.1      NA*
Processor + 8 Kword + terminal
  + mass storage                      51.1     38.8     30.4     33.0     NA       28.9     24.1     15.3     11.6
Memory cycle time (microseconds)      6.0      1.6      8.0      1.5      1.6      1.3      1.3      1.5      3.6
Processor Mwords accessed/s           0.1      0.63     0.04     0.67     0.63     0.76     0.76     0.67     0.28
Processor bits accessed/s/$           93       466      55       651      1080     1828     2472     3092

Goals
  PDP-5      Lowest cost computer, interface-ability
  PDP-8      Cost, much greater performance
  PDP-8/S    Cost; tabletop
  PDP-8/I    Better cost, more function than 8
  PDP-8/L    Lower cost
  PDP-8/E    Easy to configure; more functions, better performance
  PDP-8/M    Lower cost, limited system
  PDP-8/A    Lower cost, higher density
  VT78       Cost; complete system in a terminal

Innovations/improvements
  PDP-5      I/O bus; ISP
  PDP-8      Wire-wrap; producible; low cost bit-sample communications controller
  PDP-8/S    Serial implementation
  PDP-8/I    Integrated circuits
  PDP-8/L    Less package
  PDP-8/E    Omnibus
  PDP-8/A    Semiconductor memory; floating-point processor
  VT78       Processor-on-a-chip; low power

Applications (entries in printed order; not every model has one)
  Process control monitoring; laboratory
  + Message switch control; lab processing for instruments
  Standalone calculator
  Remote job entry station, TSS/8
  Computer-in-a-desk
  Word processing; desk-top computer, terminal
  + Business data processing, testing

Price/memory word ($) (values in printed order): 1.83, 1.83, 0.73, 1.46, 0.98, 0.73, 0.61

Table 2. Characteristics of PDP-8 Family Computers (Cont)

                                      PDP-5    PDP-8    PDP-8/S  PDP-8/I  PDP-8/L  PDP-8/E  PDP-8/M  PDP-8/A  VT78
Performance/price improvement
  (over predecessor)                           5.0      0.12     11.8     1.65     1.69     1.35     1.25
Price improvement                              1.6      1.84     0.76     1.66     1.4      1.35     1.42
Performance improvement                        6.3      0.06     16.75    0.94     1.23     1.0      0.87     0.42
Power (watts)                         780      780      350      780      250      500      450      400      25
Weight (lbs)                          540      250      75       190      70       90       40       55       2
Volume (ft3)                          24       8        3.2      8        2        2.2      1.8      1.2      0
Price density (lb/$)                  0.02     0.015    0.009    0.016    0.01     0.018    0.011    0.021
Density (lb/ft3)                      22.5     31.3     23.4     23.75    35.0     40.9     22.2     45.8
Programmed I/O Bus                    49       49       43 + Bus 40       30       96       8E       8E       5 connectors
DMA I/O Bus                           49       49       49       50       50

Rows with fewer than nine entries (values in printed order):
Product life (years): 3, 5, 3, 3, 7+, 5+, 2+
Printed circuit board average price ($): 2100, 240, 240, 120, 1600
Board size: 5.25 X 4, 2.25 X 3.875, 5X38, 2.25 X 3.875, 8 X 10, 8 X 10, 8 X 15, 12 X 15

Figure 29 and Table 2 present the power requirements, weight, and volume of the 12-bit
machines. In general, the power requirements
have remained relatively constant. This is both
because each package must house a fixed number of devices and because each device has a relatively high overhead power cost associated
with driving the Omnibus. However, the limited
configuration, lack of an Omnibus, and low
power requirements of CMOS make the VT78
an exception to this rule. The weight and volume have declined significantly with time as the
design has moved from two cabinets to a half
cabinet, and then from a half cabinet to being
embedded in a terminal.

Figure 29. Power, weight, and volume for DEC's 12-bit computers versus time.
(a) Power versus time.
(b) Weight versus time.
(c) Volume versus time.

SPECIAL DEVICES BASED ON THE
PDP-8

The PDP-8/A and the products which incorporate the CMOS-8 chip are the current 12-bit product offerings, so the discussion of the
development of DEC's 12-bit computers in
chronological order must stop here. However,
during the development of the main line of 12-bit computers, some interesting systems based
on DEC 12-bit processors have been developed,
both by DEC and by others. Among these are
the DEC 338 Display Computer, the cache-based PDP-8, and the PDP-14 Programmable
Controller (a 1-bit machine similar in its instruction set to the PDP-8 and using Omnibus
packaging concepts).

DEC 338 Display Computer

The 338 display, a variant of the PDP-8, is
interesting for its historical importance [Bell
and Newell, 1971: Chapter 25]. It was one of the
earliest display processor-based computers - if
not possibly the first. The problem of displaying
data on a cathode ray tube clearly shows how
the application need drives a complete change
in hardware in order to interpret the necessary
data-type (in this case, a graphic picture).
The 338 display idea was extended and applied to the displays used with the PDP-9, PDP-15, and the PDP-11 series. Although the 338
had the right general capabilities, it did not
have the refinements of later display processors
for the PDP-15 and PDP-11 (GT40 and GT60).
An observation that display and other specialized processors evolve in a fashion called the
"wheel of reincarnation" [Myer and Sutherland, 1968] is diagrammed in Figure 30. As the
figure shows, the process starts with a very
simple basic design - here, to have graphics picture output for a computer. The trajectory
around the wheel follows:
Position 1: Point-plotting. The computer
includes a single instruction display controller
which can plot a picture on a point-by-point
basis under command of the central processor.
For most displays, except storage scopes, the
processor can barely calculate the next point
fast enough to keep the display refreshed.
Hence, the system is processor bound, and the
display may be idle. The original PDP-1 display
is typical of this position, and a display of this
type is offered on most DEC minicomputers.
Position 2: Vector-plotting. By adding the
ability to plot lines (i.e., vectors), a single instruction to the display processor will free some
of the processor and begin to keep all but the
fastest display busy.
Position 3: Character-plotting and alphanumeric plotting. With the realization that
characters are a major part of what is displayed,
commands to display a character are added,
further freeing the processor. Many of the
point-plotting displays were extended to have
character generation capability.
Position 4: General figure and character
display. In reality, a picture does not consist of
just characters and vectors; each element of the
picture is actually a string of characters and a
set of closed or open polygons to be displayed
starting at a particular point. By providing the
control display with a Direct Memory Access
channel, the display can fetch each string of text
and generate polygons without involving the
central processor.
Position 5: Display processors. With the
ability to put up sub-pictures with no processor
intervention, it is easy for the whole picture to
be displayed by linking the elements together in
some fashion. This merely requires "jump" and
"subroutine" call instructions so that common
picture elements do not have to be re-defined.
The 338 and other display processors have
roughly this capability.
Position 6: Integrated display and central
processor. Now, all the data paths and states
are present for a fully general purpose processor
so that the central processor need never be
called on again. This requires a slightly more
general purpose interpreter. By minor perturbations, the processor design can be refined
in such a way as to execute the same instruction
set as the original host computer because the
cost of incompatibility is too great. Two processors require two compilers, diagnostics, manuals, and support for use. This state provides
the same capability as that shown in Position 1.
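The step from Position 4 to Position 5 can be pictured with a toy display-list interpreter in which "jump" and "subroutine call" instructions let one sub-picture be defined once and drawn in several places. The instruction names (MOVE, VECTOR, CALL, RETURN, JUMP, HALT) and their encoding as tuples are hypothetical and chosen for readability; they are not the instruction set of the 338 or of any other DEC display processor.

```python
def run_display_list(program, start=0, max_steps=1000):
    """Interpret a toy display list and return the line segments it draws."""
    segments = []        # ((x0, y0), (x1, y1)) pairs sent to the imaginary CRT
    return_stack = []    # return addresses for CALL/RETURN
    x, y = 0, 0          # current beam position
    pc = start
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "MOVE":          # position the beam without drawing
            x, y = args
        elif op == "VECTOR":      # draw a line relative to the current position
            dx, dy = args
            segments.append(((x, y), (x + dx, y + dy)))
            x, y = x + dx, y + dy
        elif op == "CALL":        # draw a shared sub-picture, then come back
            return_stack.append(pc)
            pc = args[0]
        elif op == "RETURN":
            pc = return_stack.pop()
        elif op == "JUMP":        # link picture elements together
            pc = args[0]
        elif op == "HALT":
            return segments
        else:
            raise ValueError(f"unknown display instruction {op!r}")
    raise RuntimeError("display list did not halt")


# The unit box at address 5 is defined once and drawn at two positions.
PROGRAM = [
    ("MOVE", 0, 0),     # 0
    ("CALL", 5),        # 1
    ("MOVE", 10, 0),    # 2
    ("CALL", 5),        # 3
    ("HALT",),          # 4
    ("VECTOR", 1, 0),   # 5: start of the box sub-picture
    ("VECTOR", 0, 1),   # 6
    ("VECTOR", -1, 0),  # 7
    ("VECTOR", 0, -1),  # 8
    ("RETURN",),        # 9
]

segments = run_display_list(PROGRAM)
print(len(segments), "segments drawn")
```

Because the vectors are relative, the same five-word sub-picture serves both boxes, and the central processor only has to build the list once.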


**  Processor.state  **

PC\Program.Counter.
SR\Subroutine.Return.Register.
Test\One.Bit.Accumulator<>.
IR\Instruction.Register.
Op\Operation.Code<0:3> := IR<0:3>.
Z\Effective.Address<4:11> := IR<4:11>.

**  Input.Output.State  **

I\Input.Contacts[0:255]<>.
O\Output.Relays[0:255]<>.

**  Instruction.Cycle  **

IExec\Instruction.Execution :=
    Begin
    Decode Op =>
        Begin
        ! Test input for ON
        '0101\TXN := If Not Test And I[Z] => Test = 1.
        ! Test input for OFF
        '0100\TXF := If Not Test And Not I[Z] => Test = 1.
        ! Test output for ON
        '0011\TYN := If Not Test And O[Z] => Test = 1.
        ! Test output for OFF
        '0010\TYF := If Not Test And Not O[Z] => Test = 1.
        ! Jump if Test ON
        '1011\JFN := (If Test => PC<4:11> = Z; Test = 0).
        ! Jump if Test OFF
        '1010\JFF := (If Not Test => PC<4:11> = Z; Test = 0).
        ! Set Output ON
        '0111\SYN := O[Z] = 1.
        ! Clear Output
        '0110 := (If Z Neq #377 => O[Z] = 0; If Z Eql #377 => O[0:255] = 0).
        ! Jump
        '1000\JMP := If Z Eql #224 => PC = Mp[PC],
        ! Jump to Subroutine
        '1001\JMS := If Z Eql #245 => (SR = PC next PC = Z).
        ! Return from Subroutine. Skip
        '0000 := (If Z Eql #154 => PC = SR; If Z Eql #144 => PC = PC + 1).
        Otherwise := No.Op()
        End
    End.

ICycle\Interpretation.Cycle :=
    Begin
    Repeat (IR = Mp[PC] next PC = PC + 1 next IExec())
    End
End

Figure 32. ISP description of the PDP-14 (courtesy of
Mario Barbacci).

power relays, appropriate I/O interfaces were
designed.
The instruction set of the PDP-14, shown in
Figure 32, was among the smallest, most trivial
instruction sets that could be found. Technically, the PDP-14 was called a computer because it could perform computation in the same
way a Turing machine can - without an arithmetic unit. However, it encoded the Boolean

".,111<3<'
tt1-.""
.. ~",1-.t
yu.l1...
\1.1.1'-' 11511 ...
·,."L)

";~",, ~f' ~

r:)lU\".I VI

a.

Boolean equation). Therefore, the PDP-14 also
could simulate a sequential machine (state diagram or flowchart). Two additional instructions
sensed the value of intermediate results (stored
in TEST) and were used to eliminate the need to
completely evaluate an equation each time. To
direct program flow, there were four more instructions: "jump," "skip," "jump to subroutine" (a single level) and "return from
subroutine." To handle the "accessories box,"
there was special I/O rather than having this
carried out internally to a program. This I/O
included up to 16 Boolean variables for timers
consisting of external one-shot multivibrators,
and control memory bits.
A good way to understand the PDP-14's operation is to start with the application. Figure
33 shows a combinational relay logic network
that evaluates a Boolean expression (in parallel). When this network is implemented with the
PDP-14, the inputs and outputs are simply connected, and the program forms the interconnection which constitutes the solution of the
equation (Figure 33c). Figure 33b gives the
Boolean expression for the network in Figure
33a. To evaluate this equation using a PDP-14
requires a sequential program (Figure 33d).
This program requires between 120 microseconds and 200 microseconds to compute the
output value, y8, since each instruction requires
20 microseconds. The speed of a computerized
controller compared to that of relay operations
is phenomenal. Heavy duty industrial control
relays typically operate at a 30 Hz rate (33 milliseconds). If the PDP-14 can solve each equation with 4 terms in approximately 150
microseconds, the PDP-14 can solve 222 such
equations in the time necessary to operate the
relay. The memory requirements to solve the
222 equations are not large either. This equation required 12 locations; hence, 222 such
equations require about 2.5 K words.
A number of PDP-14s were built and installed for the intended applications over the
period 1970 to 1972. Programming was carried
out in languages supported by compilers that
operated on the PDP-8. The languages allowed
users to:

1. Write ordinary assembly programs (resembling PDP-8 programs).
2. Express a problem directly as a set of
   Boolean equations.
3. Express ladder diagrams (in effect, these
   are a set of Boolean equations).
4. Write a program as a flowchart, i.e., as a
   sequential machine that goes state by
   state and tests and branches on various
   input values to create output state, permitting both combinational (Boolean
   equations) and sequential circuits to be
   implemented.
5. Simulate the behavior of the program
   and system.

Figure 33. Combinational network representations for
solving Boolean equations.

(a) Ladder diagram representation of a solenoid
activated by two push buttons and two limit switches.

(b) Boolean equation expressing behavior of ladder
diagram:

    $Y8 = (X6 \land X4) \lor (\lnot X7 \land \lnot X5)$

(c) Contact input (using normally open contacts) and
solenoid output connections to PDP-14.

(d) PDP-14 program to simulate solenoid network by
sequentially (and repeatedly) solving Boolean equation
(33b):

    ADDRESS  INSTRUCTION  COMMENT
    40       TXF 6        Turn TEST on if either X6 or X4 is off:
    41       TXF 4        TEST = ¬X6 ∨ ¬X4.
    42       JFF 50       Jump if X6 ∧ X4.
    43       TXN 7        TEST = X7 ∨ X5.
    44       TXN 5
    45       JFF 50       Jump if ¬X7 ∧ ¬X5.
    46       SYF 8        Turn solenoid off if (¬X6 ∨ ¬X4) ∧ (X7 ∨ X5).
    47       SKP
    50       SYN 8        Turn solenoid on if (X6 ∧ X4) ∨ (¬X7 ∧ ¬X5).
    51, 52                Return to scan contacts again.

    NOTE: Assume TEST = OFF initially.
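The behavior of the Figure 33d program and the timing argument built on it can be explored with a small simulation. The sketch below steps through the listed instructions using the semantics given in the ISP description of Figure 32, checks the result against the Boolean equation, and repeats the relay-speed arithmetic. It is an illustration rather than DEC's simulator: the mnemonic SYF for "clear output", the dictionary encoding of the program, and the assumption that the return at locations 51 and 52 is a two-word jump costing two instruction times are all inferences, not details stated in the text.

```python
from itertools import product

INSTRUCTION_TIME_US = 20            # per-instruction time quoted in the text
RELAY_PERIOD_US = 1_000_000 / 30    # one cycle of a 30 Hz relay, about 33 ms
RETURN_WORDS = 2                    # assumption: two-word jump back at 51, 52

# Program of Figure 33d, keyed by octal address (hypothetical encoding).
PROGRAM = {
    0o40: ("TXF", 6),
    0o41: ("TXF", 4),
    0o42: ("JFF", 0o50),
    0o43: ("TXN", 7),
    0o44: ("TXN", 5),
    0o45: ("JFF", 0o50),
    0o46: ("SYF", 8),
    0o47: ("SKP", None),
    0o50: ("SYN", 8),
}
END_OF_SCAN = 0o51


def run_scan(inputs):
    """Run one pass; return (state of output 8, instruction times used).

    inputs maps contact numbers (4, 5, 6, 7) to True (on) or False (off).
    The count includes the assumed two-word return jump.
    """
    test = False                    # TEST is assumed OFF at the start
    outputs = {8: False}
    pc = 0o40
    executed = 0
    while pc != END_OF_SCAN:
        op, z = PROGRAM[pc]
        pc += 1                     # PC advances before the instruction executes
        executed += 1
        if op == "TXN":             # test input for ON
            test = test or inputs[z]
        elif op == "TXF":           # test input for OFF
            test = test or not inputs[z]
        elif op == "JFF":           # jump if TEST is off; TEST is cleared either way
            if not test:
                pc = z
            test = False
        elif op == "SYN":           # set output on
            outputs[z] = True
        elif op == "SYF":           # clear output
            outputs[z] = False
        elif op == "SKP":           # skip the next location
            pc += 1
    return outputs[8], executed + RETURN_WORDS


def scan_time_us(inputs):
    """Time for one pass over the equation, in microseconds."""
    return run_scan(inputs)[1] * INSTRUCTION_TIME_US


def equations_per_relay_cycle(equation_time_us=150):
    """How many equations fit into one relay operating time."""
    return int(RELAY_PERIOD_US // equation_time_us)


for x4, x5, x6, x7 in product([False, True], repeat=4):
    y8, _ = run_scan({4: x4, 5: x5, 6: x6, 7: x7})
    assert y8 == ((x6 and x4) or (not x7 and not x5))

print(equations_per_relay_cycle(), "equations per relay cycle")
print(equations_per_relay_cycle() * 12, "memory locations at 12 per equation")
```

Under the two-word-return assumption the simulated scan times fall between 120 and 200 microseconds, the range quoted above. At 12 locations per equation, 222 equations occupy 2,664 locations, which the text rounds to about 2.5 K words.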

As the PDP-14 and contemporary machines
were used, the demand arose for a second generation controller. By 1972, the additional requirements included lower cost, higher speed,
an easily changed read-only memory, and the
ability to load programs via a communications
line or connected console. In addition, the controllers were required to connect in a network
fashion and report back status and results to a
supervisory computer at the next level of a hierarchy. The second generation controller should
be capable of recording events such as counting
the number of parts processed. It also needed
timers which could be used as part of the control equations. The new unit should operate
over an even wider environmental range than
the existing PDP-14 and have a more complete set
of I/O interfaces.

Figure 34. The PDP-14/30.

Figure 35. Block diagram of PDP-14/30.

From these requirements, the PDP-14/30
evolved (Figures 34 and 35). The initial read-only memory was replaced by an 8-Kword core
memory. In this way, the programs could be
easily changed rather than having to be returned to DEC for manufacturing. Because the
original PDP-14 was so slow compared to the
capability of the logic from which it was made,
the instruction time was reduced from 20 microseconds to 2.5 microseconds to achieve better
frequency response and to handle a larger number of equations. Additionally, because a large
number of special registers had been added to
hold numeric values (the shift registers, timers
and counters), an arithmetic unit was added to
the PDP-14/30 in an ad hoc fashion. All these
additions forced the instruction set processor to
change. The PDP-14/30 extensions could not
be made in such a way as to have binary compatibility; thus, software changes were also required.
An interesting offshoot of PDP-14 development was the creation of a special terminal
for a programming, program load and observation console. This terminal consisted of a CRT
and PDP-8 mounted in a portable housing.
Since the PDP-14/30 could report the status of
its input and output variables, the terminal also
had the ability to display the status of ladder
diagrams (i.e., relay and contact position). A
typical screen display is shown in Figure 36.

At the time when the PDP-14/30 was proposed, there were some who felt that it should
not be built because a standard 8 Family computer was cheaper to build, and more production volume and lower costs could be obtained
by not constructing a special unit. In addition,
the 8 Family machine could be extended to have
the original PDP-14 instruction set; and the
PDP-8 instruction set would be available for
evolving tasks, such as self-diagnosis, more extensive counting and timing functions, and
dealing with non-Boolean data such as time, or
non-discrete events including angular position.
The more powerful PDP-8 instruction set
would also be useful for handling general control in both the analog and the digital domains,
communicating with computer networks requiring protocol control for intelligent and error-free communication, and using algorithms
to encode the control function instead of relatively large program state methods with no ability to perform computation.
Many of the previous arguments against using PDP-8s had now lost their merit. Since the
PDP-14/30 was proposed to be built using the
same circuit family as that of the PDP-8s, the
electrical noise margins arguments no longer
held. Furthermore, the PDP-8 could be packaged in a proper cabinet for the physical environment, and there could be adequate
interfaces built. Besides, the proposed PDP-14/30 would incorporate a PDP-8 anyway, and
two computers were obviously more expensive
than one. In addition, adding the necessary cabinet and interface enhancements to the PDP-8
would have greatly improved the marketability
of the PDP-8 for all industrial applications. Although the design group did not buy the arguments that the PDP-14/30 should become a
PDP-8 with appropriate extensions and packaging, some PDP-8 parts were used in the PDP-14/30 design.

Figure 36. Typical screen display.

ACKNOWLEDGEMENTS

The authors were pleased to have Wes Clark
and Dick Clayton read and critique this chapter.

Structural Levels of the PDP-8
C. GORDON BELL, ALLEN NEWELL,
and DANIEL P. SIEWIOREK

The history of the DEC 18-bit and 12-bit
computers, summarized briefly in the previous
two chapters, was basically that of a recursive
process in which new technology was applied
and re-applied to the same basic designs to obtain improved price/performance ratios. In the
late 1960s, the availability of relatively inexpensive integrated circuits made logic cost a
less pressing concern. Computer engineering,
and architectural issues of elegance, flexibility,
and expandability, grew more important as the
importance of architecture to total system
price/performance became more evident. The
PDP-ll papers in Part III elaborate on these
issues, but first the hierarchical nature of computer systems design will be explored by examining the PDP-8 from the top down to lay the
basic groundwork for future architectural discussions. The description of the PDP-8 will use
some of the processor-memory-switch (PMS)
and instruction set processor (ISP) notations introduced in Computer Structures [Bell and
Newell, 1971]. These compact and straightforward notations are useful in comparing and
analyzing computer architectures, and their use
in the PDP-8 context should be helpful to the

reader when encountering these notations in
other papers.
A map of the PDP-8 design hierarchy, based
on the Structural Levels View of Chapter 1, is
given in Figure 1, starting from the PMS structure, to the ISP, and down through logic design
to circuit electronics. These description levels
are subdivided to provide more organizational
details such as registers, data operators, and
functional units at the register transfer level.
The relationship of the various description
levels constitutes a tree structure, where the organizationally complex computer is the top
node and each descending description level represents increasing detail (or smaller component
size) until the final circuit element level is
reached. For simplicity, only a few of the many
possible paths through the structural description tree are illustrated. For example, the path
showing mechanical parts is missing. The descriptive path shown proceeds from the PDP-8
computer to the processor and from there to the
arithmetic unit or, more specifically, to the Accumulator (AC) register of the arithmetic unit.
Next, the logic implementing the register transfer operations and functions for the jth bit of
the Accumulator is given, followed by the flip-flops and gates needed for this particular implementation. Finally, on the last segment of the
path, there are the electronic circuits and components from which flip-flops and gates are
constructed.

Figure 1. PDP-8 hierarchy of descriptions. [Levels shown, from top to bottom: PMS; programming (ISP); register transfer (registers, data operations, control); logic (switching circuits: sequential and combinational); electrical circuits (active and passive components). A bracketed number in the figure indicates the figure number of an instance.]

ABSTRACT REPRESENTATIONS

Figure 1 also lists some of the methods used
to represent the physical computer abstractly at
the different description levels. As mentioned
previously, only a small part of the PDP-8 description tree is represented here. The many
documents which constitute the complete representation of even this small computer include
logic diagrams, wiring lists, circuit schematics,
printed circuit board photo etching masks, production description diagrams, production parts
lists, testing specifications, programs for testing
and diagnosing faults, and manuals for modification, production, maintenance, and use. As
the discussion continues down the abstract description tree, the reader will observe that the
tree conveniently represents the constituent objects of each level and their interconnection at
the next highest level.
THE PMS LEVEL

The PDP-8 computer in PMS notation is:
C('PDP-8; technology: transistors; 12 b/w;
  descendants: 'PDP-8/S, 'PDP-8/I, 'PDP-8/L,
    '8/E, '8/F, '8/M, '8/A, 'CMOS-8;
  antecedents: 'PDP-5;
  Mp(core; #0:7; 4096 words; tc: 1.5 μs/word);
  Pc(Mps(2 to 4 words);
    instruction length: 1 | 2 words;
    address/instruction: 1;
    operations on data: (=, +, Not, And, Minus
      (negate), Srr 1 (/2), Slr 1 (X2), + 1);
    optional operations: (X, /, normalize);
    data-types: word, integer, Boolean vector;
    operations for data access: 4);
  P(display; '338);
  P(c; 'LINC);
  S('I/O Bus; 1 Pc; 64 K);
  Ms(disk, 'DECtape, magnetic tape);
  T(paper tape, card, analog, cathode-ray tube))

As an example of PMS structure, the LINC-8-338 is shown in Figure 2; it consists of three
processors (designated P): Pc('LINC),
Pc('PDP-8), and P.display('338). The LINC
processor described in Chapter 7 is a very capable processor with more instructions than the
PDP-8 and is available in the structure to interpret programs written for the LINC. Because of
the rather limited instruction set being interpreted, one would hardly expect to find all the
components present in Figure 2 in an actual
configuration.
The switches (S) between the memory and the
processor allow eight primary memories (Mp)
to be connected. This switch, in PMS called
S('memory Bus; 8 Mp; 1 Pc; time-multiplexed;
1.5 μs/word), is actually a bus with a transfer
rate of 1.5 microseconds per word. The switch
makes the eight memory modules logically
equivalent to a single 32,768-word memory
module. There are two other connections (a
switch and a link) to the processor excluding the
console. They are the S('I/O Bus) and L('Data
Break; Direct Memory Access) for interconnection with peripheral devices. Associated
with each device is a switch, and the I/O Bus
links all the devices. A simplified PMS diagram
(Figure 3) shows the structure and the logical-physical transformation for the I/O Bus, Memory Bus, and Direct Memory Access link. Thus,
the I/O Bus is:
S('I/O Bus duplex; time-multiplexed; 1 Pc; 64 K;
Pc controlled, K requests; t: 4.5 μs/w)


The I/O Bus is nearly the same for the PDP-5, 8, 8/S, 8/I, and 8/L. Hence, any controller
can be used on any of the above computers provided there is an appropriate logic level converter (PDP-5, 8, and 8/S use negative polarity
logic; the 8/I and 8/L, positive logic). The I/O
Bus is the link to the controllers for processor-controlled data transfers. Each word transferred is designated by a processor in-out transfer (IOT) instruction. Due to the high cost of
hardware in 1965, the PDP-8 I/O Bus protocol
was designed to minimize the amount of hardware to interface a peripheral device. As a result, only a minimal number of control signals
were defined with the largest portion of I/O
control performed by software.
A detailed structure of the processor and
memory (Figure 4) shows the I/O Bus and Data
Break connections to the registers and control
in the notation used in the initial PDP-8 reference manual. This diagram is essentially a functional block diagram. The corresponding logic
for a controller is given in Figure 3 in terms of
logic design elements (ANDs and ORs). The
operation of the I/O Bus starts when the processor sends a control signal and sets the six I/O
selection lines (IO.SELECT<0:5>) to specify a
particular controller. Each controller is hardwired to respond to its unique 6-bit code. The
local control, K[k], select signal is then used to
form three local commands when ANDed with
the three IOT command lines from the processor. These command lines are called
IO.PULSE.1, IO.PULSE.2, and IO.PULSE.4.
Twelve data bits are transmitted either to or
from the processor, indirectly under the controller's control. This is accomplished by using
the AND/OR gates in the controller for data
input to the processor, and the AND gate for
data input to the controller. A single skip input
is used so that the processor can test a status bit
in the controller. A controller communicates
back to the processor via the interrupt request
line. Any controller wanting attention simply
ORs its request signal into the interrupt request
signal. Normally, the controller signal causing
an interrupt is also connected to the skip input,
and skip instructions are used in the software
polling that determines the specific interrupting
device.

Figure 2. LINC-8-338 PMS diagram. [PMS diagram showing Mp, Pc('PDP-8), Pc('LINC), and P.display('338), joined by the S('Memory Bus), S('I/O Bus), and S(DM01 Data Multiplexer) switches to the attached terminals (T), links (L), controllers (K), and secondary memories (Ms).]

Figure 3. PDP-8 S('I/O Bus) logic and PMS diagrams. [Controller logic shown: K.select := (IO.SELECT Eqv k); IO.PULSE.P1 And K.select (used for IO.SKIP[k] => PC = PC + 1); IO.PULSE.P2 And K.select (used for AC = INPUT.DATA[k]); IO.PULSE.P4 And K.select (used for OUTPUT.DATA[k] = AC); INTERRUPT.REQUEST[k]; IO.SKIP.FLAG[k]. Ks denotes controllers for slow-data-rate, program-controlled data transfers; Kf denotes controllers for high-data-rate, direct-memory-access transfers.]

Figure 4. PDP-8 processor block diagram. [Notes from the figure: 1. Transfer direction is into PDP-8 when -3 volts, out of PDP-8 when ground. 2. Data break request for three-cycle break when ground or one-cycle break when -3 volts. Numbers in registers signify word length.]
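
The I/O Bus behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not DEC's logic: the class and function names are hypothetical, the device codes used in any example are arbitrary, and the three pulses are simply applied in the order 1, 2, 4.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """One device controller K[k] on the I/O Bus (illustrative model)."""
    code: int                # hardwired 6-bit select code
    skip_flag: bool = False  # status bit; also drives the interrupt request
    input_data: int = 0      # 12-bit word offered to the processor
    output_data: int = 0     # 12-bit word last taken from the AC


def interrupt_request(controllers):
    # Every controller ORs its request onto the single interrupt request line.
    return any(k.skip_flag for k in controllers)


def io_transfer(controllers, select, pulses, ac):
    """Apply one IOT to the bus; returns (new_ac, skip)."""
    skip = False
    for k in controllers:
        if k.code != (select & 0o77):   # K.select := (IO.SELECT Eqv k)
            continue
        if pulses & 1 and k.skip_flag:  # IO.PULSE.1: skip if the flag is set
            skip = True
        if pulses & 2:                  # IO.PULSE.2: AC = INPUT.DATA[k]
            ac = k.input_data & 0o7777
        if pulses & 4:                  # IO.PULSE.4: OUTPUT.DATA[k] = AC
            k.output_data = ac & 0o7777
    return ac, skip


def find_interrupter(controllers):
    """Software polling: issue a skip test to each controller in turn."""
    for k in controllers:
        _, skip = io_transfer(controllers, k.code, 1, 0)
        if skip:
            return k.code
    return None
```

Because all requests share one line, the processor learns only that some device wants service; find_interrupter stands in for the chain of skip instructions that identifies which one.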
The Data Break input for Direct Memory
Access provides a direct access path for a processor or a controller to memory via the processor. The number of access ports to memory can
be expanded to eight by using the DM01 Data
Multiplexer, a switch. The DM01 port is requested from a processor (e.g., LINC or Model
338 Display Processor) or a controller (e.g.,
magnetic tape). A processor or controller supplies a memory address, a read or write access
request, and then accepts or supplies data for
the accessed word. In the configuration (Figure
2), Pc('LINC) and P('338) are connected to the
multiplexer and make requests to memory for
both their instructions and data in the same way
as the PDP-8 processor. The global control of
these processor programs is via the processor
over the I/O Bus. The processor issues start and
stop commands, initializes their state, and examines their final state when a program in the
other processor halts or requires assistance.
When a controller is connected to the Data
Break or to the DM01 Data Multiplexer, it only
accesses memory for data. The most complex
function these controllers carry out is the transfer of a complete block of data between the
memory and a high speed transducer or a secondary memory (e.g., DECtape or disk). A special mode, the Three Cycle Data Break
(described in Chapter 6), allows a controller to
request the next word from a block in memory.
The DECtape was derived from M.I.T.'s Lincoln Laboratory LINCtape unit, as indicated in
Chapter 7. Data was explicitly addressed by
blocks (variable but by convention 128 words).
Thus, information in a block could be replaced
or rewritten at random. This operation was unlike the early standard IBM format magnetic
tape in which data could be appended only to
the end of a file.


PROGRAMMING LEVEL (ISP)

The ISP of the PDP-8 processor is probably
the simplest for a general purpose stored program computer. It operates on 12-bit words, 12-bit integers, and 12-bit Boolean vectors. It has
only a few data operators, namely, =, +, minus
(negative of), Not, And, Slr 1 (rotate bits left),
Srr 1 (rotate bits right), (optional) X, /, and
normalize. However, there are microcoded instructions, which allow compound instructions
to be formed in a single instruction.
The ISP of the basic PDP-8 is presented in
Appendix 1 of this book. The 2^12-word memory
(declared M[0:4095]<0:11>) is divided into 32
fixed-length pages of 128 words each (not
shown in the ISPS description). Address calculation is based on references to the first page,
Page.Zero, or to the current page of the Program Counter (PC\Program.Counter). The effective address calculation procedure, called
eadd in Appendix 1, provides for both direct
and indirect reference to either the current page
or the first page. This scheme allows a 7-bit address to specify a local page address.
A 2^15-word memory is available on the PDP-8, but addressing more than 2^12 words is comparatively inefficient. In the extended range,
two 3-bit registers, the Program Field and Data
Field registers, select which of the eight 2^12-word blocks are being actively addressed as
program and data. These are not given in the
ISPS description.
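
As a rough illustration of the extended range, a 3-bit field register can be pictured as supplying the three high-order bits of a 15-bit address. The sketch below assumes simple concatenation; the function name is hypothetical, and the rules for when the Program Field or the Data Field applies are not modeled.

```python
WORDS_PER_FIELD = 1 << 12   # 4096 words reachable with a 12-bit address
FIELDS = 1 << 3             # eight fields selected by a 3-bit register


def physical_address(field, address):
    """Concatenate a 3-bit field register with a 12-bit address."""
    if not 0 <= field < FIELDS:
        raise ValueError("a field register holds 3 bits")
    if not 0 <= address < WORDS_PER_FIELD:
        raise ValueError("an address holds 12 bits")
    return (field << 12) | address


# Eight 4096-word fields give the full 2^15-word memory.
TOTAL_WORDS = FIELDS * WORDS_PER_FIELD
```

An instruction fetch would use the Program Field with the Program Counter, and a data reference would use the Data Field with the effective address.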
There is an array of eight 12-bit registers,
called the Auto.Index registers, which resides in
Page.Zero. This array (Auto.Index[0:7]<0:11> := M[#10:#17]<0:11>) possesses a useful
property: whenever an indirect reference is
made to it, a 1 is first added to its contents.
(That is, there is a side effect to referencing.)
Thus, address integers in the register can select
the next member of a vector or string for accessing.
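
The effective address calculation and the Auto.Index side effect can be sketched as follows. This is not the eadd procedure of Appendix 1; it is a minimal Python sketch that assumes the usual PDP-8 layout, with i<3> as the indirect bit, i<4> as the page bit, and i<5:11> as the 7-bit page address. The function name and the instruction_address parameter are hypothetical.

```python
PAGE_MASK = 0o7600            # upper 5 address bits: one of 32 pages
OFFSET_MASK = 0o0177          # lower 7 address bits: word within a page
AUTO_INDEX = range(0o10, 0o20)  # M[#10:#17] in Page.Zero


def effective_address(memory, instruction_address, instruction):
    """Return the operand address for a memory reference instruction."""
    indirect = bool(instruction & 0o0400)      # i<3>
    current_page = bool(instruction & 0o0200)  # i<4>
    offset = instruction & OFFSET_MASK         # i<5:11>
    base = (instruction_address & PAGE_MASK) if current_page else 0
    address = base | offset
    if indirect:
        if address in AUTO_INDEX:
            # Side effect of referencing: add 1 before using the contents.
            memory[address] = (memory[address] + 1) & 0o7777
        address = memory[address]
    return address
```

Repeated indirect references through an Auto.Index register therefore step through successive words of a vector without any explicit increment instruction.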
The processor state is minimal, consisting of
a 12-bit accumulator (AC\Accumulator<0:11>), an accumulator extension bit called
the Link (L\Link), the 12-bit Program Counter,
the RUN flip-flop, and the INTERRUPT.ENABLE bit. The external processor
state is composed of console switches and an
interrupt request.
The instruction format can also be presented
as a decoding diagram or tree (Figure 5). Here,
each block represents an encoding of bits in the
instruction word. A decoding diagram allows
one more descriptive dimension than the conventional, linear ISPS description, revealing the
assignment of bits to the instruction. Figure 5
still requires ISPS descriptions for the memory,
the processor state, the effective address calculation, the instruction interpreter, and the execution for each instruction. Diagrams such as
Figure 5 are useful in the ISP design to determine which instruction operation codes are to
be assigned to names and operations, and which
instructions are free to be assigned (or encoded).

Figure 5. PDP-8 instruction decoding diagram. [The diagram decodes the instruction word i<0:11> into an op field, an indirect bit (ib), a page bit (pb), and a page address. It covers the principal addressable instructions (and, tad, isz, dca, jms), the operate instructions (opr) with their microcoded group 1 and group 2 bits, and the Extended Arithmetic Element (EAE) instructions (muy, dvi, nmi, shl).]


There are eight basic instructions encoded by
3 opcode bits of the instruction, that is,
op<0:2> := i<0:2>. Each of the first six
(memory reference) instructions, where the opcode is
less than or equal to 5, has four addressing
modes (direct Page.Zero, direct Current.Page,
indirect Page.Zero, and indirect Current.Page).
The first six instructions fall into the following four
categories:
1. Data transmission.
"deposit and clear Accumulator" (dca).
(Note that the add instruction, tad, is
used for both data transmission and
arithmetic.)
2. Binary arithmetic.
"two's complement add to the Accumulator" (tad).
3. Binary Boolean.
"and to the Accumulator" (and).
4. Program control.
"jump/set Program Counter" (jmp);
"jump to subroutine" (jms); "index
memory and skip if results are zero"
(isz).

The subroutine calling instruction, jms, provides a method for transferring a link to the beginning (or head) of the subroutine. In this way
arguments can be accessed indirectly, and a return is executed by a "jump indirect" instruction to the location storing the returned
address. This straightforward subroutine call
mechanism, although inexpensive to implement, requires reentrant and recursive subroutine calls to be interpreted by software
rather than by hardware. A stack for subroutine
linkage, as in the PDP-11, would allow the use
of read-only memory program segments consisting of pure code. This scheme was adopted
in the CMOS-8.
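
The linkage convention can be made concrete with a small Python sketch. The two helper functions are hypothetical names for the effects of jms and of a "jump indirect" through the head of the subroutine; the addresses in any example are arbitrary.

```python
def jms(memory, pc, target):
    """Call: store the return address at the head of the subroutine.

    pc is assumed to point already at the word after the jms instruction.
    """
    memory[target] = pc & 0o7777
    return (target + 1) & 0o7777   # execution continues after the head word


def jmp_indirect(memory, pointer):
    """Return: jump through the word holding the return address."""
    return memory[pointer]


def call_overwrites_link(memory, first_return, second_return, target):
    """Show why a second, nested call to the same routine loses the first link."""
    jms(memory, first_return, target)
    jms(memory, second_return, target)     # re-entry before the first return
    return jmp_indirect(memory, target) != first_return
```

Since the return address is written into the subroutine itself, the code cannot live in read-only memory, and a routine that is re-entered before returning loses its first link unless software saves it.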
The "in-out transfer" instruction, opcode 6,
IOT (op Eqv #6), uses the remaining nine bits of
the instruction to specify instructions to input/output devices. The six IO.SELECT bits
select 1 of 64 devices. Three conditional pulse
commands to the selected device, IO.PULSE.1,
IO.PULSE.2, and IO.PULSE.4, are controlled
by the IOT, io.control<0:2> operation code
bits. The instructions to a typical I/O device
are:
1. Testing a Boolean condition of an IO device.
If IO.PULSE.1 =>
(If IO.SKIP.FLAG[IO.SELECT] =>
PC = PC + 1)
2. Output data to a device from Accumulator.
If IO.PULSE.4 =>
(OUTPUT.REGISTER[IO.SELECT] =
AC)
3. Input data from a device to Accumulator.
If IO.PULSE.2 =>
(AC = INPUT.REGISTER[IO.SELECT])

There are three microcoded instruction
groups selected by (op<0:2> Eqv #7), called
the operate instructions. The instruction decoding diagram (Figure 5) and the ISP description
show the microinstructions which can be combined in a single instruction. These instructions
are: operate group 1 ((op<0:2> Eqv #7) And
Not ib) for operating on the processor state; operate group 2 ((op<0:2> Eqv #7) And i<3>
And Not i<11>) for testing the processor state; and
the Extended Arithmetic Element group
((op<0:2> Eqv #7) And i<3> And i<11>). The other instruction bits, <4:10> or <4:11>, are extended instruction (or opcode) bits; that is, the
bits are microcoded to select additional instructions. In this way, an instruction is actually programmed (or microcoded, as it was originally
named before "microprogramming" was used
extensively). For example, the instruction, "set
link to 1," is formed by coding the two microinstructions, "clear link" followed by "complement link."


If ((op<0:2> Eqv #7) And (group Eqv 0)) => (
If i<5> => L = 0; Next
If i<7> => L = Not L )

Thus, in operate group 1, the instructions
"clear link, complement link, and set link" are
formed by coding i<5,7> = 10,01, and 11, respectively. The operate group 2 instructions are
used for testing the condition of the processor
state. These instructions use bits 5, 6, and 8 to
code tests for the Accumulator. The AC skip
conditions are coded as never, always, AC Eql
0, AC Neq 0, AC Lss 0, AC Leq 0, AC Geq 0
and AC Gtr O. The optional Extended Arithmetic Element (EAE) includes additional Multiplier Quotient (MQ) and Shift Counter (SC)
registers and provides the hardwired operations
"multiply," "divide," "logical shift left,"
"arithmetic shift," and "normalize." If all the
nonredundant and useful variations in the two
operate groups were available as separate instructions in the manner of the first seven (dca,
tad, etc.), there would be approximately 7 + 12
(group 1) + 10 (group 2) + 6 (eae) = 35 instructions in the PDP-8.
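
The way two microinstructions combine into "set link" can be checked with a small Python sketch of the link part of operate group 1. The octal constants follow from placing i<5> and i<7> in a 12-bit word whose bit 0 is the most significant; the function name is hypothetical, and only the two link bits are modeled.

```python
CLEAR_LINK = 0o7100       # opr, group 1, i<5> set
COMPLEMENT_LINK = 0o7020  # opr, group 1, i<7> set
SET_LINK = CLEAR_LINK | COMPLEMENT_LINK  # both bits coded in one word


def operate_group1_link(instruction, link):
    """Apply the link microinstructions of an operate group 1 word."""
    is_group1 = (instruction >> 9) == 0o7 and not (instruction & 0o0400)
    if not is_group1:
        raise ValueError("not an operate group 1 instruction")
    if instruction & 0o0100:   # If i<5> => L = 0; Next
        link = 0
    if instruction & 0o0020:   # If i<7> => L = Not L
        link ^= 1
    return link


# Instruction count quoted in the text: 7 + 12 + 10 + 6.
EQUIVALENT_INSTRUCTIONS = 7 + 12 + 10 + 6
```

The ordering matters: the clear happens first and the complement second, so the combined word always leaves the link at 1 regardless of its previous value.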

THE INTERRUPT SCHEME

External conditions in the input/output devices can request that the processor be interrupted. Interrupts are allowed if the processor's
interrupt enable flip-flop is set (If INTERRUPT.ENABLE Eqv 1). A request to interrupt
(i.e., INTERRUPT.REQUEST= 1) clears the
interrupt enable bit (INTERRUPT.ENABLE
= 0), and the processor behaves as though a
"jump to subroutine" 0 instruction (jms 0) had
been executed. A special IOT instruction
(i<0:11> Eql #6001) followed by a "jump to
subroutine indirect" to 0, and instruction
(i <0: 11> Eql #5220) returns the processor to
the interruptable state with INTERRUPT.ENABLE a 1. The program time to save
the processor state is six memory accesses (9 microseconds), and the time to restore the state is
nine memory accesses (13.5 microseconds).
Only one interrupt level is provided in the
hardware. If multiple priority levels are desired,
programmed polling is required. Most I/O devices have to interrupt because they do not have
a program-controlled device interrupt-enable
switch. For multiple devices, approximately
three cycles (4.5 microseconds) are required to
poll each interrupter.
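
The timing figures above all follow from the 1.5-microsecond memory cycle, as the short calculation below shows. The combined overhead function is a hypothetical model added for illustration (save, then poll some number of devices, then restore); it is not a figure given in the text.

```python
MEMORY_CYCLE_US = 1.5  # core memory cycle time quoted for the PDP-8


def access_time_us(accesses):
    """Time for a given number of memory accesses, in microseconds."""
    return accesses * MEMORY_CYCLE_US


SAVE_STATE_US = access_time_us(6)       # save the processor state
RESTORE_STATE_US = access_time_us(9)    # restore the processor state
POLL_ONE_DEVICE_US = access_time_us(3)  # poll one interrupter


def service_overhead_us(devices_polled):
    """Assumed overhead per interrupt: save + polling + restore."""
    return SAVE_STATE_US + devices_polled * POLL_ONE_DEVICE_US + RESTORE_STATE_US
```

Under this assumed model, polling four devices before finding the interrupter costs 40.5 microseconds of overhead in addition to the service routine itself.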
REGISTER TRANSFER LEVEL

More detail is required than is provided by
either the PMS or ISP levels to describe the internal structure and behavior of the processor
and memory. Figure 4 shows the registers and
controllers at a block diagram level, and Figure
6 gives a more detailed version using PMS notation. Table 1 gives the permissible register
transfer operations that the processor's sequential control circuit can give to the PDP-8 registers.
Although electrical pulse voltages and polarities are not shown in Table 1, the operations
are presented in considerably more detail than
shown in Figure 4. As Figure 6 shows, the registers in the processor cannot be uniquely assigned to a single function. In a minimal
machine such as the PDP-8, functional separation is not economical. Thus, there are not completely distinct registers and transfer paths for
memory, arithmetic, program, and instruction
flow. (This sharing complicates understanding
of the machine.) However, Figure 6 clarifies the
structure considerably by defining all the registers in the processor (including temporaries and
controls). For example, the Memory Buffer
(MB\Memory.Buffer; AC.

Figure 6. PDP-8 register transfer level PMS diagram.

The additional physical registers, not part of the ISP,
are:
MB\Memory.Buffer
Holds memory data, instruction, and operands.
MA\Memory.Address
Holds address of word in memory being accessed.
IR\Instruction.Register
Holds the value of current instruction being performed.
State.Register
A ternary state register holding the major state of
memory cycle being performed - declared as 2
bits.
F\Fetch := (If State.Register Eqv 0)
Memory cycle to fetch instruction.
D\Deferred := (If State.Register Eqv 1)
Memory cycle to get address of operand.
E\Execute := (If State.Register Eqv 2)
Memory cycle to fetch (store) operand and execute the instruction.

The emphasis in Figure 6 is on the static definition (or declaration) of the information paths,
the operations, and state. The ISP interpretation (Appendix 1) is the specification for
the machine's behavior as seen by a program.
As the temporary hardware registers are
added, a more detailed ISPS definition must be
given in terms of time and in terms of temporary and control registers. Instead, a state diagram (Figure 7) is given to define the actual
processor which is constrained by both the ISP
registers, the temporary registers implied by the
implementation, and time. The relationship
among the state diagram, the ISP description,
and the logic is shown in the hierarchy of Figure
1. In the relationships shown in the figures, one
can observe that the ISPS definition does not
have all the necessary detail for fully defining a
physical processor. The physical processor is
constrained by actual hardware logic and lower
level details even at the circuit level. For example, a core memory is read by a destructive
process and requires a temporary register (MB)
to hold the value being rewritten. This is not
represented within a single ISPS language statement because ISPS defines only the nondestructive transfer; however, it can be
considered as the two parallel operations MB =
M[MA]; M[MA] = 0. The explanation of the
physical machine, including the rewriting of
core using ISPS, is somewhat more tedious than
the highest level description shown in Appendix
1. For this reason, the state diagram is used
(Figure 7), and the description of the physical
machine (in ISPS) is left as an exercise for the
reader.

Table 1. PDP-8 Register Transfer Control Signals and
Data Break Interface

AC\Accumulator, L\Link, and combined L,AC: LAC
  AC = 0; AC = #7777; AC = Not AC; LAC = LAC + 1;
  L = 0; L = 1; L = Not L;
  LAC = LAC Srr 1; LAC = LAC Srr 2;   ! rotates right
  LAC = LAC Slr 1; LAC = LAC Slr 2;   ! rotates left
  AC = AC Or SWITCHES; AC = AC And MB; AC = IO.BUS;
  AC = AC Xor MB; LAC = Carry(AC,MB);
  (note that previous two commands form: LAC = AC + MB).

MB\Memory.Buffer
  MB = 0; MB = MB + 1;
  MB = PC; MB = AC; MB = M[MA]; MB = DB.DATA.

MA\Memory.Address
  MA = 0; MA = PC; MA = MB; MA<5:11> = MB<5:11>;
  MA = DB.ADDRESS.

PC\Program.Counter
  PC = 0; PC = PC + 1; PC<0:4> = 0;
  PC = MB; PC<5:11> = MB<5:11>.

IR\Instruction.Register
  IR = 0; IR = M[MA]<0:2>

M\Memory[0:4095]

DB\DATA.BREAK interface
  DB.DATA<0:11>       ! Input to MB
  DB.ADDRESS          ! Input to MA
  MB<0:11>
  DB.REQUEST          ! Control inputs to Pc
  DB.DIRECTION
  DB.CYCLE.SELECT
  ADDRESS.ACCEPTED    ! Control outputs from Pc
  WORD.COUNT.OK
  BREAK.STATE
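
The destructive read and rewrite of core can be sketched in Python as a two-part memory cycle. This is an illustrative model only: the class, the method names, and the optional modify argument are hypothetical, and timing (the waits between the read and write halves) is ignored.

```python
class CoreMemory:
    """Toy core memory whose read clears the addressed word."""

    def __init__(self, size=4096):
        self.cells = [0] * size

    def destructive_read(self, ma):
        value = self.cells[ma]
        self.cells[ma] = 0          # MB = M[MA]; M[MA] = 0
        return value

    def write(self, ma, mb):
        self.cells[ma] = mb & 0o7777  # M[MA] = MB


def memory_cycle(core, ma, modify=None):
    """One full cycle: read into MB, optionally change MB, rewrite core."""
    mb = core.destructive_read(ma)
    fetched = mb
    if modify is not None:
        mb = modify(mb) & 0o7777
    core.write(ma, mb)
    return fetched, mb
```

The rewrite half is what lets an instruction such as isz or dca place a new value in the word during the same cycle that read it: MB is simply changed before the write.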

Figure 7. PDP-8 Pc state diagram (part 1 of 2). ["Fetch" instruction memory cycle, states F0 through F3.]

Figure 7. PDP-8 Pc state diagram (part 2 of 2). ["Defer" (indirect) address memory cycle and "Execution" memory cycle.]

The state diagram (Figure 7) is fundamentally driven by minor clock cycles as seen from
both the state diagram and the times when the
four clock signals are generated. Thus, there are
3 (State.Register Eqv #0,#1,#2) X 4 (clock) or
12 major states in the implementation. The Instruction Register is used to obtain two more
states, F2b and F3b, for the description. The
State.Register values 0, 1, and 2 correspond to
fetching, deferred or indirect addressing (i.e.,
fetching an operand address), and executing.
The state diagram does not describe the Extended Arithmetic Element operation, the interrupt state, or the data break states (which add
12 more states). The initialization procedure,
including the console state diagram, is also not
given. One should observe that, at the beginning of the memory cycle, a new State.Register
value is selected. The State.Register value is always held for the remainder of the cycle; i.e.,
only the sequences F0, F1, F2, F3, or D0, D1,
D2, D3, or E0, E1, E2, E3 are permitted.
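
The counting argument and the "held for the whole cycle" rule can be expressed in a few lines of Python. The names below are hypothetical, and the sketch covers only the 12 timed states, not F2b, F3b, or the data break states.

```python
from itertools import product

# State.Register value -> letter used in the state names.
MAJOR_STATES = {0: "F", 1: "D", 2: "E"}
CLOCKS_PER_CYCLE = 4


def timed_states():
    """All State.Register x clock combinations."""
    return [f"{MAJOR_STATES[s]}{t}"
            for s, t in product(sorted(MAJOR_STATES), range(CLOCKS_PER_CYCLE))]


def cycle(state_register):
    """The four states of one memory cycle for a fixed State.Register."""
    return [f"{MAJOR_STATES[state_register]}{t}" for t in range(CLOCKS_PER_CYCLE)]


def is_permitted(sequence):
    """True if the sequence consists only of whole, unmixed memory cycles."""
    if len(sequence) % CLOCKS_PER_CYCLE:
        return False
    cycles = [cycle(s) for s in MAJOR_STATES]
    return all(list(sequence[i:i + CLOCKS_PER_CYCLE]) in cycles
               for i in range(0, len(sequence), CLOCKS_PER_CYCLE))
```

A direct memory reference instruction would then appear as a fetch cycle followed by an execute cycle, while a sequence that changes major state in mid-cycle is rejected.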
LOGIC DESIGN LEVEL (REGISTERS AND
DATA OPERATIONS)

Proceeding from the register transfer and ISP
descriptions, the next level of detail is the logic
module. Typical of the level is the 1-bit logic
module for an accumulator bit, AC<j>, illustrated in Figure 8. The horizontal data inputs in
the figure are to the logic module from AC,
MB, AC input from the IO.Bus.In,
and SWITCHES. The control signal inputs
whose names are identified using the vertical
bar (e.g., |AC = 0|) command the register operations (i.e., the transfers). They are labeled by
their respective ISP operations (for example,
AC = AC And MB, AC = AC Slr 1, for rotate
once left). The sequential state machine, for the
processor Pc(K), generates these control signal
inputs using a combinational circuit such as the
one shown in Figure 9.
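
The two control signals AC = AC Xor MB and LAC = Carry(AC,MB), which together form LAC = AC + MB, can be modeled bit by bit. The sketch below is an assumption about how such a carry step could work, not a transcription of the PDP-8 logic: each bit module sees its own AC bit (already holding the exclusive-OR), its MB bit, and a carry input, and a carry out of the most significant bit complements the Link.

```python
WORD_BITS = 12
MASK = 0o7777


def half_add(ac, mb):
    """First step: AC = AC Xor MB."""
    return (ac ^ mb) & MASK


def carry(link, ac, mb):
    """Second step: ripple the carries through the half-added AC.

    ac must already hold (original AC) Xor MB. A carry is generated at a
    bit where MB is 1 and the half-sum is 0 (the original AC bit was 1),
    and it propagates through every bit whose half-sum is 1.
    """
    carry_in = 0
    result = 0
    for k in range(WORD_BITS):          # k = 0 is the least significant bit
        half_sum = (ac >> k) & 1
        generate = ((mb >> k) & 1) & (half_sum ^ 1)
        result |= (half_sum ^ carry_in) << k
        carry_in = generate | (half_sum & carry_in)
    return link ^ carry_in, result      # carry out complements the Link


def tad(link, ac, mb):
    """LAC = AC + MB formed from the two register transfer commands."""
    return carry(link, half_add(ac, mb), mb)
```

Splitting the addition this way lets the same exclusive-OR gating serve as a register operation in its own right, with the carry chain added only as a second command.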
LOGIC DESIGN LEVEL (PC CONTROL,
PC(K) SEQUENTIAL STATE MACHINE
NETWORK)

The output signals from the processor sequential machine (Figure 9) can be generated in
a straightforward fashion by formulating the
Boolean expressions directly from the state diagram in Figure 7. For example, the AC = 0 control signal is expressed algebraically and with a
combinational network in Figure 9. Obviously,
these Boolean output control signals are functions which include the clock, the
State.Register, and the states of the arithmetic
registers (for example, AC = 0, L = 0, etc.). The
expressions should be factored and minimized
so as to reduce the hardware cost of the control
for the interpreter. Although the sequential
controller for the processor is mentioned here
only briefly, it constitutes about half the logic
within the processor.

Figure 8. PDP-8 AC bit logic diagram. [Note from the figure: AC = AC + 1 is formed by the AC<11> carry input.]

Figure 9. PDP-8 Pc(K) AC = 0 signal logic equation and diagram.
CIRCUIT LEVEL

The final level of description is the circuits that form the logic functions of storage
(flip-flops) and gating (NAND gates). Figures 10 and 11 illustrate some of these logic devices in
detail. In Figure 10 a direct set/direct clear flip-flop (a sequential logic element) is described in

Figure 10. PDP-8 direct-coupled flip-flop and logic diagram: (a) flip-flop circuit;
(b) combinational logic equivalent of flip-flop; (c) direct set-clear flip-flop sequential logic
element, with tables of circuit and flip-flop input-output. Note: this is not an "ideal"
sequential circuit element because there is no delay in the output.
Figure 12 also includes the associated circuit level hardware needed in the core memory
operation (e.g., power supplies, timing, and logic signal level conversion amplifiers).
The timing signals are generated within the control portion of the processor and are shown
together with processor clock in Figure 13. The process of reading a word from memory is:
1. A 12-bit selection address is established on the MA<0:11> address lines, which is 1 of
   #10000 (or 4096) unique numbers. The upper 6 bits <0:5> select 1 of 64 groups of Y addresses,
   and the lower 6 bits <6:11> select 1 of 64 groups of X addresses.
2. The read logic signal is made a 1 at time t2.
3. A high current path flows via the X and Y selection switches. In each of the X and Y
   directions, 64 × 12 cores have selection current (Ix and Iy). Only one core in each plane is
   selected since Ix = Iy = Iswitching/2, and the current at the selected intersection =
   Ix + Iy = Iswitching.
4. If a core is switched to 0 (by having Iswitching amperes through it), then a 1 is present and
   is read at the output of the plane bit sense amplifiers. A sense amplifier receives an input
   from a winding that threads every core of every bit within a core plane [#0:#7777]. All 12
   cores of the selected word are reset to 0. The time at which the sense amplifier is observed
   is tms (the memory strobe), which also causes the transfer MB ← M[MA].
5. The read current is turned off by timing in the memory module.
6. The inhibit and write (slightly delayed) logic signals are turned on at time t1. The bit
   inhibit signal is present or not, depending on whether a 0 or 1, respectively, is written
   into a bit.

Figure 12. PDP-8 four-wire coincident current (three dimensions) core memory logic diagram.

Figure 13. PDP-8 clock and memory timing diagram. Notes: 1. tms: memory strobe.
2. tmd: memory done (determined by memory).

7. A high current path flows via the X and Y selection switches, but in an opposite direction to
   the read case (see item 2). If a 1 is written, no inhibit current is present and the net
   current in the selected core is -Iswitching. If a 0 is written, the current is
   -Iswitching + (Iswitching/2) and the core remains reset.
8. The inhibit and write logic signals are turned off at time tmd specified by timing in the
   memory module, and the memory cycle is completed.
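
The read and restore steps listed above can be sketched as a small simulation. This is only an illustrative model under stated assumptions, not the PDP-8 hardware: cores are treated as ideal bits that flip only when the net drive reaches the full switching current, currents are expressed in units of Iswitching/2, and the class and method names (CoreMemory, decode, read, write, cycle) are hypothetical.

```python
"""Toy model of a coincident-current core memory read/restore cycle.

Assumptions (not from the text): cores are ideal bits that switch only when
the net drive reaches the full switching current; currents are counted in
units of Iswitching/2 so that the arithmetic is exact.
"""

HALF = 1                 # one half-select current, Iswitching/2
FULL = 2 * HALF          # full switching current, Iswitching
WORD_BITS = 12           # one core plane per bit of the word
LINES = 64               # 64 X lines and 64 Y lines per plane


class CoreMemory:
    def __init__(self):
        # planes[bit][y][x] holds the stored value of one core
        self.planes = [[[0] * LINES for _ in range(LINES)]
                       for _ in range(WORD_BITS)]

    @staticmethod
    def decode(ma):
        """Split a 12-bit address: upper 6 bits select Y, lower 6 bits select X."""
        if not 0 <= ma < LINES * LINES:
            raise ValueError("address out of range")
        return ma >> 6, ma & 0o77      # (y, x)

    def read(self, ma):
        """Destructive read: the selected core in each plane sees Ix + Iy."""
        y, x = self.decode(ma)
        word = 0
        for bit in range(WORD_BITS):
            drive = HALF + HALF                     # Ix + Iy at the intersection
            if drive >= FULL and self.planes[bit][y][x] == 1:
                word |= 1 << (WORD_BITS - 1 - bit)  # core switched: a 1 is sensed
            self.planes[bit][y][x] = 0              # selected core is left reset
        return word

    def write(self, ma, word):
        """Write into a cleared word; inhibit current blocks the 0 bits."""
        y, x = self.decode(ma)
        for bit in range(WORD_BITS):
            wanted = (word >> (WORD_BITS - 1 - bit)) & 1
            inhibit = 0 if wanted else HALF
            drive = -(HALF + HALF) + inhibit        # -Iswitching (+ Iswitching/2)
            if abs(drive) >= FULL:
                self.planes[bit][y][x] = 1          # full reverse current sets the core

    def cycle(self, ma):
        """Full memory cycle: read into MB, then restore the word from MB."""
        mb = self.read(ma)
        self.write(ma, mb)
        return mb
```

Calling read alone leaves the addressed word cleared, which is why a complete cycle always follows the read with a write of the MB contents.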

Device Level

For a discussion of the behavior of the transistor as it is used in these switching circuit
primitives, the reader should consult semiconductor electronics and physics textbooks. It
is hoped that the reader has gained a sense of how to think about the hierarchical decomposition
of computers into particular levels of analysis (and synthesis) and that the hierarchical
approach will be of aid in the reading of Part III.

Opposite:
Top, left to right:
• PDT-11 programmable data terminal.
• VAX-11/780.
Bottom, left to right:
• Model 20 central processor.
• PDP-11 packaging showing cabinet level integration.

PART III
THE PDP-11 FAMILY

The PDP-11 Family

The PDP-11 has evolved quite differently from the other computers discussed
in this book and, as a result, provides an independent and interesting story. Like
the other computers, the factors that have created the various PDP-11 machines
have been market and technology based, but they have generated a large number
of implementations (ten) over a relatively short (eight-year) lifetime. Because
there are multiple implementations spanning a performance range at the same
time, the PDP-11 provides problems and insight which did not occur in the evolutions of the
traditional mini (PDP-8 Family), the optimal price/performance machines (18-bit), and the high
performance timesharing machines (the DECsystem-10). The PDP-11 designs cover a range of 500:1
in system price ($500 to $250,000) and 500:1 in memory size (4 Kwords to 2 Mwords).
Rather than attempt to summarize the goals of designers, sentiments of users,
or the thoughts of researchers, the discussion of the PDP-11 is divided into chapters which, in
most cases, consist of papers written contemporaneously with various important PDP-11
developments. The chapters are arranged in five categories: introduction to the PDP-11,
conceptual basis for PDP-11 models, implementations of the PDP-11, evaluation of the PDP-11, and
the virtual address extension of the PDP-11.
INTRODUCTION TO THE PDP-11

Chapter 9, first published when the PDP-11 was announced, introduces the
PDP-11 architecture, gives its goals, and predicts how it might evolve. The concept of a family
of machines is quite strong, but the actual development of that family has differed a good deal
from the projections in this chapter. The major reasons (discussed in Chapter 16) for the
disparity between the predicted and actual evolution are:
1. The notion of designing with improved technology, especially for a family of machines, was
   not understood in 1970. This understanding came later and was presented in a paper in 1972
   [Bell et al., 1972b].
2. The Unibus proved unacceptable for intercommunications at the very high and low-end designs.
   Although Chapter 9 suggests a multiprocessor and multiple Unibuses for high-end designs, such
   a structure did not evolve as a standard.
3. The address space for both physical and virtual memory was too small.
4. Several data-type extensions were not predicted. Although floating-point arithmetic was
   envisioned, the character string and decimal operations were not envisioned, or at least were
   not described. These data-types evolved in response to market needs that did not exist in 1970.

CONCEPTUAL BASIS FOR THE PDP-11 MODELS

Chapters 10 and 11 consist of two papers that form some of the conceptual
basis for the various PDP-11 models. Chapter 10 by Strecker is an exposition of
cache memory structure and its design parameters. The cache memory concept is
the basis of three PDP-11 models, the PDP-11/34A, the PDP-11/60, and the
PDP-11/70, in addition to the cache-8 (Chapter 7) and the KL10 processor for the
PDP-10 (Chapter 21).
Strecker gives the performance evaluation in terms of cache miss ratios,
whereas the reader is probably interested in performance or speedup. These two
measures, shown in Figure 1, are related [Lee, 1969] in the following way (assuming an
infinitely fast processor):

p   = Total number of memory accesses by the processor Pc
m   = Number of memory accesses that are missed by the cache and
      have to be referred to the primary memory Mp
t.c = Cycle time of cache memory Mc
t.p = Cycle time of primary memory Mp
R   = t.p/t.c (ratio of memory speeds), where R is typically 3 to 10

The relative execution speeds are:
t (no cache) = pR
t (to cache) = p + mR
speedup = pR/(p + mR) = R/(1 + (m/p)R)
a = miss ratio = m/p

Therefore:
speedup = R/(1 + aR) = 1/(a + 1/R)

Note that:
If a = 0 (100% hit), the speedup is R
If a = 1 (100% miss), the speedup is R/(1 + R), i.e., the speedup is
less than 1 (i.e., time to reference both memories)
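
The relation between miss ratio and speedup can be checked numerically. The sketch below simply evaluates the two forms of the formula given above; the function names and the sample values of p, m, and R are illustrative assumptions, not figures from Strecker's measurements.

```python
def cache_speedup(miss_ratio, r):
    """Speedup of a cached machine over a cacheless one: 1/(a + 1/R).

    miss_ratio is a = m/p; r is R = t.p/t.c. As in the text, the processor
    is assumed to be infinitely fast, so only memory time is counted.
    """
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss ratio must lie between 0 and 1")
    if r <= 0:
        raise ValueError("R must be positive")
    return 1.0 / (miss_ratio + 1.0 / r)


def speedup_from_counts(p, m, r):
    """Same quantity from raw access counts: pR/(p + mR)."""
    return (p * r) / (p + m * r)


if __name__ == "__main__":
    # Hypothetical sample point: R = 10 and one access in ten misses.
    print(cache_speedup(0.1, 10))
    print(speedup_from_counts(1000, 100, 10))
```

With R fixed, the speedup falls quickly as the miss ratio grows, which is why the miss ratio is the natural design parameter to study.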

Chapter 11 contains a unique discussion of buses - the communications link
between two or more computer system components. Although buses are a standard of
interconnection, they are the least understood element of computer design
because their implementation is distributed in various components. Their behavior is difficult
to express in a state diagram or other conventional representation
(except a timing diagram) because the operation of buses is inherently pipelined;
hence, design principles and understanding are minimal.

Figure 1. The structure of Pc, Mcache, and Mp of cached computer
(p = total number of memory accesses by the processor Pc).

In Chapter 11, Levy first characterizes the intercommunication problem into
the constituent dialogues that must take place between pairs of components. After
giving a general model of interconnection, Levy provides examples of PDP-11
buses that characterize the general design space. Finally, he discusses the various
intercommunications (model) aspects: arbitration (deciding which components
can intercommunicate), data transmission, and error control.
IMPLEMENTATIONS OF THE PDP-11

Chapter 12 is a descriptive narrative about the design of the LSI-11 at the chip,
board, and backplane levels. Since it was written from the viewpoint of a knowledgeable user, it
lacks some of the detail that the designers at Western Digital
(Roberts, Soha, Pohlman) or at DEC (Dickhut, Dickman, Olsen, Titelbaum)
might have provided. A detailed account of the chip-level design is available,
however [Soha and Pohlman, 1974].
Two design levels are described: the three chip set microprogrammed computer
used to interpret the PDP-11 instruction set, and the particular PMS-level components that are
integrated into a backplane to form a hardware system. Chapter
12 also provides a discussion of the microprogramming tradeoff that took place
between the chip and module levels. This tradeoff was necessary to carry out the
clock, console, refresh, and power-fail functions which are normally in hardware.
Since the time that the Sebern paper (Chapter 12) was written, packaging for
LSI-11 systems has moved in two directions: toward the single board microcomputer and toward
modularity. The single board microcomputer concept is
exemplified by the bounded system shown in Figure 2. This integrated system
contains an LSI-11 chip set, 32 Kwords of memory, connectors for six communication line
interfaces, and a controller for two floppy disk drives. It uses 175 integrated
circuits (to implement the same functionality using standard LSI-11 modules
would require 375 integrated circuits). The modularity direction is exemplified by
the LSI-11/2, for which typical option modules are shown in Figure 3.
Figure 2. A bounded LSI-11 based system.

Unlike the reports from an architect's or reporter's viewpoint, Chapter 13 is a
direct account of the design process from the project viewpoint. A mid-range
machine is an inherently difficult design because it is neither the lowest cost nor
the highest performance machine of the family, and thus has to have the right
balance of features, price, and performance against criteria that are usually vague.
Four interesting aspects of computer engineering are shown in the PDP-11/60:
the cache to reduce Unibus traffic; trace-driven design of floating-point arithmetic processors;
writable control store; and special features for reliability, availability, and maintainability.
The Unibus was found to be inadequate for handling all the data traffic in high
performance systems, but by using a cache, most processor references do not use
the Unibus and so leave it free for I/O traffic. In the PDP-11/60 work described
in this chapter, Mudge uses Strecker's (Chapter 10) program traces and methodology. The cache
design process is implicit in the way in which the work is carried
out to determine the structure parameters. Sensitivity plots are used to determine
the effects of varying each parameter of the design. The time between changes of
context is an important parameter because all real-time and multiprogrammed
systems have many context switches. The study leading to the determination of
block size is also given.
Microprogramming is used to provide both increased user-level capability and
increased reliability, availability, and maintainability. The writable control store
option is described together with its novel use for data storage. This option has
been recently used for emulating the PDP-8 at the OS/8 operating system level.
Chapter 14 presents a comprehensive comparison of the eight processor implementations used in
the ten PDP-11 models. The work was carried out to investigate various design styles for a given
problem, namely, the interpretation of the
PDP-11 instruction set. The tables provide valuable insight into processor implementations, and
the data is particularly useful because it comes from Snow and
Siewiorek, non-DEC observers examining the PDP-11 machines.
The tables include:
1. A set of instruction frequencies, by Strecker, for a set of ten different applications.
   (The frequencies do not reflect all uses, e.g., there are no floating-point instructions,
   nor has operating system code been analyzed.)
2. Implementation cost (modules, integrated circuits, control store widths)
   and performance (micro- and macroinstruction times) for each model.
3. A canonical data path for all PDP-11 implementations against which each
   processor is compared.

With this background data, a top-down model is built which explains the performance (macroinstruction time) of the various implementations in terms of the
microinstruction execution and primary memory cycle time. Because these two
parameters do not fully explain (model) performance, a bottom-up approach is
also used, including various design techniques and the degree of processor overlap. This analysis of a constrained problem should provide useful insight to both
computer and general digital systems designers.

Figure 3. The double-height modules forming the LSI-11/2:
KD11-HA    LSI-11/2 microcomputer processor
IBV11-A    IEEE instrument bus interface
MSV11-D    Dynamic MOS RAM memory
MRV11-BA   4K UV PROM board with 256-word RAM
DLV11-J    Four-line serial interface
MRV11-AA   4K PROM board
DRV11      16-bit parallel interface
DCK11-AC   Interface foundation kit
RXV11      Interface module for RX01 floppy disk
REV11-A    Refresh/bootstrap/diagnostic/terminator module
KPV11-A    Power sequencer/line clock module
DLV11      Single-line serial interface

EVALUATION OF THE PDP-11

Chapter 15 evaluates the PDP-11 as a machine for executing FORTRAN. Because FORTRAN is the most
often executed language for the PDP-11, it is important to observe the PDP-11 architecture as
seen by the language processor - its
user. The first FORTRAN compiler and object (run) time system are described,
together with the evolutionary extensions to improve performance. The FORTRAN IV-PLUS
(optimizing) compiler is only briefly discussed because its improvements, largely due to
compiler optimization technology, are less relevant to
the PDP-11 architecture.
The chapter title, "Turning Cousins into Sisters," overstates the compatibility
problem since the five variations of the PDP-11 instruction set for floating-point
arithmetic are made compatible by essentially providing five separate object (run)
time systems and a single compiler. This transparency is provided quite easily by
"threaded code," a concept discussed in the chapter.
The first version of the FORTRAN machine was a simple stack machine. As
such, the execution times turned out to be quite long. In the second version, the
recognition of the special high-frequency-of-use cases (e.g., A ← 0, A ← A + 1) and
the improved conventions for three-address operations (to and from the stack)
allowed speedup factors of 1.3 and 2.0 for floating-point and integers.
It is interesting to compare Brender's idealized FORTRAN IV-PLUS machine
with the Floating-Point Processors (on the PDP-11/34, 11/45, 11/55, 11/60, and
11/70). If the FORTRAN machine described in the paper is implemented in microcode and made to
operate at Floating-Point Processor speeds, the resulting
machines operate at roughly the same speed and programs occupy roughly the
same program space.
The basis for Chapter 16, "What Have We Learned From the PDP-11?" [Bell
and Strecker, 1976] was written to critique the original expository paper on the
PDP-11 (Chapter 9) and to compare the actual with the predicted evolution. Four
critical technological evolutions - bus bandwidth, PMS structure, address space,
and data-type - are examined, along with various human organizational aspects
of the design.
The first section of Chapter 16 compares the original goals of the PDP-11
(Chapter 9) with the goals of possible future models from the original design
documents. Next, the ISP and PMS evolutions, including the VAX extension, are
described. The Unibus characteristics are especially interesting as the bus turns
out to be more cost-effective over a wider range than would be expected.
The section of the chapter which deals with multiprocessors and multicomputers gives the
rationale behind the slow evolution of these structures. Because a number of these computer
structures have been built (especially at
Carnegie-Mellon University), they are described in detail.
The final section of the chapter interrelates technology with the various implementations
(including VAX-11/780) that have occurred. Table 6 gives the performance characteristics for the
various models with the relevant technology,
contributions, and implementation techniques required to span the range.


VIRTUAL ADDRESS EXTENSION OF THE PDP-11

The latest member of the PDP-11 family, the Virtual Address Extension 11 or
VAX-11, is described in Chapter 17. This paper, by the architect of VAX-11,
discusses the new architecture and its first implementation, the VAX-11/780.
VAX-11 extends the PDP-11 to provide a large, 32-bit virtual address for each
user process. The architecture includes a compatibility mode that allows PDP-11
programs written for the RSX-11M program environment to run unchanged. In
this way, PDP-11 programs can be moved among VAX and PDP-11 computers,
depending on the user's address size and computational and generality needs.
Chapter 17 provides a clean, somewhat terse, yet comprehensive description of
the VAX-11 architecture. Because the VAX part of the architecture is so complete
in terms of data-types, operators, addressing and memory management, it can
also serve as a textbook model and case study for architecture in general. Goals,
constraints, and various design choices are given, although explanations of what
was traded away in the design choices are not detailed.

A New Architecture for Minicomputers - The DEC PDP-11
C. GORDON BELL, ROGER CADY, HAROLD McFARLAND,
BRUCE A. DELAGI, JAMES F. O'LOUGHLIN,
RONALD NOONAN, and WILLIAM A. WULF

INTRODUCTION

The minicomputer* has a wide variety of uses: communications controller, instrument
controller, large-system preprocessor, real-time data acquisition systems, ... desk calculator.
Historically, Digital Equipment Corporation's (DEC) PDP-8 family, with 6000 installations,
has been the archetype of these minicomputers.
In some applications current minicomputers have limitations. These limitations show up
when the scope of their initial task is increased (e.g., using a higher level language, or
processing more variables). Increasing the scope of the task generally requires the use of more
comprehensive executives and system control programs, hence larger memories and more
processing. This larger system tends to be at the limit of current minicomputer capability, thus
the user receives diminishing returns with respect to memory, speed efficiency, and program
development time. This limitation is not surprising since the basic architectural concepts for
current minicomputers were formed in the early 1960s. First, the design was constrained by cost,
resulting in rather simple processor logic and register configurations. Second, application
experience was not available. For example, the early constraints often created computing designs
with what we now consider weaknesses:
1. Limited addressing capability, particularly of larger core sizes.
2. Few registers, general registers, accumulators, index registers, base registers.
3. No hardware stack facilities.
4. Limited priority interrupt structures, and thus slow context switching among
   multiple programs (tasks).
5. No byte string handling.
6. No read-only memory (ROM) facilities.
7. Very elementary I/O processing.
8. No larger model computer, once a user outgrows a particular model.
9. High programming costs because users program in machine language.

*The PDP-11 design is predicated on being a member of one (or more) of the micro, midi, mini,
... maxi (computer name) markets. We will define these names as belonging to computers of the
third generation (integrated circuit to medium-scale integrated circuit technology), having a
core memory with cycle time of 0.5-2 μs, a clock rate of 5-10 MHz ... a single
processor with interrupts and usually applied to doing a particular task (e.g., controlling a
memory or communications lines, preprocessing for a larger system, process control). The
specialized names are defined as follows.

        Maximum          Processor and
        Addressable      Memory Cost    Word     Processor
        Primary Memory   (1970          Length   State
        (Words)          Kilodollars)   (Bits)   (Words)     Data-Types
Micro   8 K              ~5             8-12     2           Integers, words, Boolean vectors
Mini    32 K             5-10           12-16    2-4         Vectors (i.e., indexing)
Midi    65-128 K         10-20          16-24    4-16        Double length floating point
                                                             (occasionally)

In developing a new computer, the architecture should at least solve the above problems.
Fortunately, in the late 1960s, integrated circuit semiconductor technology became available so
that newer computers could be designed that solve these problems at low cost. Also, by
1970, application experience was available to influence the design. The new architecture
should thus lower programming cost while maintaining the low hardware cost of minicomputers.
The DEC PDP-11 Model 20 is the first computer of a computer family designed to span a
range of functions and performance. The Model 20 is specifically discussed, although design
guidelines are presented for other members of the family. The Model 20 would nominally
be classified as a third generation (integrated circuits), 16-bit word, one central processor
with eight 16-bit general registers, using two's complement arithmetic and addressing up to 2^16
8-bit bytes of primary memory (core). Though classified as a general register processor, the
operand accessing mechanism allows it to perform equally well as a 0- (stack), 1- (general
register), and 2- (memory-to-memory) address computer.
The computer's components (processor, memories, controls, terminals) are connected via a
single switch, called the Unibus.
The machine is described using the processor-memory-switch (PMS) notation of Bell and
Newell [1971] at different levels. The following descriptive sections correspond to the levels:
external design constraints level; the PMS level - the way components are interconnected and
allow information to flow; the program level - the abstract machine that interprets programs; and
finally, the logical design level. (We omit a discussion of the circuit level, the PDP-11 being
constructed from TTL integrated circuits.)
DESIGN CONSTRAINTS

The principal design objective is yet to be tested; namely, do users like the machine? This
will be tested both in the marketplace and by the features that are emulated in newer machines;
it will be tested indirectly by the life span of the PDP-11 and any offspring.
Word Length

The most critical constraint, word length (defined by IBM), was chosen to be a multiple of 8
bits. The memory word length for the Model 20 is 16 bits, although there are 32- and 48-bit
instructions and 8- and 16-bit data. Other members of the family might have up to 80-bit
instructions with 8-, 16-, 32- and 48-bit data. The internal, and preferred external character
set, was chosen to be 8-bit ASCII.
Range and Performance

Performance and function range (extendability) were the main design constraints; in
fact, they were the main reasons to build a new computer. DEC already has four computer
families that span a range* but are incompatible. In addition to the range, the initial
machine was constrained to fall within the small-computer product line, which means to
have about the same performance as a PDP-8. The initial machine outperforms the PDP-5,
LINC, and PDP-4 based families. Performance, of course, is both a function of the instruction
set and the technology. Here, we are fundamentally only concerned with the instruction set
performance because faster hardware will always increase performance for any family. Unlike the
earlier DEC families, the PDP-11 had to be designed so that new models with significantly more
performance can be added to the family.
A rather obvious goal is maximum performance for a given model. Designs were programmed using
benchmarks, and the results were compared with both DEC and potentially
competitive machines. Although the selling price was constrained to lie in the $5,000 to
$10,000 range, it was realized that the decreasing cost of logic would allow a more complex
organization than that of earlier DEC computers. A design that could take advantage of
medium- and eventually large-scale integration was an important consideration. First, it could
make the computer perform well; second, it would extend the computer family's life. For
these reasons, a general register organization was chosen.
Interrupt Response. Since the PDP-11 will be used for real-time control applications, it is
important that devices can communicate with one another quickly (i.e., the response time of a
request should be short). A multiple priority level, nested interrupt mechanism was selected;
additional priority levels are provided by the physical position of a device on the Unibus.
Software polling is unnecessary because each device interrupt corresponds to a unique address.
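
The idea that a unique interrupt address removes the need for software polling can be sketched as a small dispatcher. This is a hypothetical illustration, not the Unibus protocol: the class name, the vector numbers, the convention that a larger number means higher priority, and the rule that a smaller bus position wins ties are all assumptions made for the example.

```python
class InterruptController:
    """Toy vectored-interrupt dispatcher (hypothetical, for illustration)."""

    def __init__(self):
        self.handlers = {}   # vector address -> service routine
        self.pending = []    # list of (priority, bus_position, vector)

    def attach(self, vector, handler):
        """Associate a device's unique vector address with its service routine."""
        self.handlers[vector] = handler

    def request(self, priority, bus_position, vector):
        """A device raises an interrupt request."""
        self.pending.append((priority, bus_position, vector))

    def service_next(self, cpu_priority=0):
        """Grant the best pending request above the processor's priority.

        Assumed ordering: higher priority level first; within a level, the
        device with the smaller bus position wins. The handler is reached
        directly through the vector, so no device has to be polled.
        """
        eligible = [r for r in self.pending if r[0] > cpu_priority]
        if not eligible:
            return None
        chosen = max(eligible, key=lambda r: (r[0], -r[1]))
        self.pending.remove(chosen)
        return self.handlers[chosen[2]]()
```

A polled design would instead loop over every device's status register to find the requester; here the lookup is a single table access keyed by the vector.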
Software

The total system including software is, of
course, the main objective of the design. Two
techniques were used to aid programmability.
First, benchmarks gave a continuous indication
as to how well the machine interpreted programs; second, systems programmers continually evaluated the design. Their evaluation
considered: what code the compiler would produce; how would the loader work; ease of program relocatability; the use of a debugging
program; how the compiler, assembler, and editor would be coded - in effect, other benchmarks; how real-time monitors would be
written to use the various facilities and present a
clean interface to the users; finally, the ease of
coding a program.
Modularity

Structural flexibility (sometimes called modularity) for a particular model was desired. A
flexible and straightforward method for interconnecting components had to be used because
of varying user needs (among user classes and over time). Users should have the ability to
configure an optimum system based on cost, performance, and reliability, both by
interconnection and, when necessary, constructing new components. Since users build special
hardware, a computer should be interfaced easily. As a by-product of modularity, computer
components can be produced and stocked, rather than tailor-made on order. The physical
structure is almost identical to the PMS structure discussed in the following section; thus,
reasonably large building blocks are available to the user.

* PDP-4, 7, 9, 15 family; PDP-5, 8, 8/S, 8/I, 8/L family; LINC, PDP-8/LINC, PDP-12 family; and
PDP-6, 10 family. The initial PDP-1 did not achieve family status.
Microprogramming

A note on microprogramming is in order because of current interest in the "firmware" concept.
We believe microprogramming, as we understand it [Wilkes and Stringer, 1953], can
be a worthwhile technique as it applies to processor design. For example, microprogramming
can probably be used in larger computers when floating-point data operators are needed. The
IBM System 360 has made use of the technique for defining processors that interpret both the
System 360 instruction set and earlier family instruction sets (e.g., 1401, 1620, 7090). In the
PDP-11, the basic instruction set is quite straightforward and does not necessitate
microprogrammed interpretation. The processor-memory connection is asynchronous; therefore,
memory of any speed can be connected. The instruction set encourages the user to write reentrant
programs. Thus, read-only memory can be used as part of primary memory to gain the
permanency and performance normally attributed to microprogramming. In fact, the Model
10 computer, which will not be further discussed, has a 1024-word read-only memory,
and a 128-word read-write memory.
Understandability

Understandability was perhaps the most fundamental constraint (or goal) although it is now
somewhat less important to have a machine
that can be understood quickly by a novice
computer user than it was a few years ago.
DEC's early success has been predicated on selling to an intelligent but inexperienced user. Understandability, though hard to measure, is an

* A descriptive

important goal because all (potential) users
must understand the computer. A straightforward design should simplify the systems programming task; in the case of a compiler, it
should make translation (particularly code generation) easier.
PDP-11 STRUCTURE AT THE PMS
LEVEL *
Introduction

PDP-11 has the same organizational structure as nearly all present-day computers (Figure
1). The primitive PMS components are: the
primary memory Mp, which holds the programs
while the central processor Pc interprets them;
I/O controls Kio, which manage data transfers
between terminals T or secondary memories Ms
and primary memory Mp; the components outside the computer at the periphery X, either humans
H or some external process (e.g., another computer); the processor console (T.console), by
which humans communicate with the computer,
observe its behavior, and effect changes in
its state; and a switch S with its control K, which
allows all the other components to communicate with one another. In the case of PDP-11,
the central logical switch structure is implemented using a bus or chained switch S called
the Unibus, as shown in Figure 2. Each physical
component has a switch for placing messages
on the bus or taking messages off the bus. The
central control decides the next component to
use the bus for a message (call). The S (Unibus)
differs from most switches because any component can communicate with any other component.
The types of messages in the PDP-11 are
along the lines of the hierarchical structure
common to present-day computers. The single

(block-diagram) level [Bell and Newell, 1970] to describe the relationship of the computer components:
processors, memories, switches, controls, links, terminals, and data operators. PMS is described in Appendix 2.

A NEW ARCHITECTURE FOR MINICOMPUTERS

[Figure 1. Conventional block diagram and PMS diagram of PDP-11. (a) Conventional block diagram,
showing the Unibus interconnecting the other components, the controls for secondary memory
(e.g., disk) and terminals (e.g., Teletype), and, at the periphery, a human user or other process.
(b) PMS diagram (see Appendix 2).]

[Figure 2. PDP-11 physical structure PMS diagram, showing the Unibus switching structure and the
computer periphery X. Note 1: Unibus control packages with Pc.]

bus makes conventional and other structures
possible. The message processes in the structure
that utilize S (Unibus) are:

1. The central processor Pc requests that data be read or written from or to
   primary memory Mp for instructions and data. The processor calls a particular
   memory module by concurrently specifying the module's address and the
   address within the module. Depending on whether the processor requests reading
   or writing, data is transmitted either from the memory to the processor or
   vice versa.
2. The central processor Pc controls the initialization of secondary memory Ms
   and terminal T activity. The processor sets status bits in the control associated
   with a particular Ms or T, and the device proceeds with the specified action (e.g.,
   reading a card or punching a character into paper tape). Since some devices
   transfer data vectors directly to primary memory, the vector control information
   (i.e., the memory location and length) is given as initialization information.
3. Controls request the processor's attention in the form of interrupts. An interrupt
   request to the processor has the effect of changing the state of the processor;
   thus, the processor begins executing a program associated with the interrupting
   process. Note that the interrupt process is only a signaling method, and
   when the processor interrupt occurs, the interrupter specifies a unique address
   value to the processor. The address is a starting address for a program.
4. The central processor can control the transmission of data between a control
   (for T or Ms) and either the processor or a primary memory for program
   controlled data transfers. The device signals for attention using the interrupt
   dialogue and the central processor responds by managing the data transmission in a
   fashion similar to transmitting initialization information.
5. Some device controls (for T or Ms) transfer data directly to/from primary
   memory without central processor intervention. In this mode the device behaves
   similarly to a processor; a memory address is specified, and the data is
   transmitted between the device and primary memory.
6. The transfer of data between two controls, e.g., a secondary memory (disk)
   and, say, a terminal/T.display, is not precluded, provided the two use compatible
   message formats.

As we show more detail in the structure there
are, of course, more messages (and more simultaneous activity). The above does not describe
the shared control and its associated switching
which is typical of magnetic tape and magnetic disk secondary memory systems. A control for a DECtape memory (Figure 3) has an S
('DECtape bus) for transmitting data between a
single tape unit and the DECtape transport.
The existence of this kind of structure is based
on the relatively high cost of the control relative
to the cost of the tape and the value of being
able to run concurrently with other tapes. There
is also a dialogue at the periphery between X-T
and X-Ms that does not use the Unibus. (For
example, the removal of a magnetic tape reel
from a tape unit or a human user H striking a
typewriter key are typical dialogues.)
All of these dialogues lead to the hierarchy of
present computers (Figure 4). In this hierarchy
we can see the paths by which the above messages are passed: Pc-Mp; Pc-K; K-Pc; Kio-T
and Kio-Ms; and Kio-Mp; and, at the periphery, T-X and T-Ms; and T.console-H.
Model 20 Implementation

Figure 5 shows the detailed structure of a
uniprocessor Model 20 PDP-11 with its various
components (options). In Figure 5, the Unibus
characteristics are suppressed. (The detailed
properties of the switch are described in the logical design section.)
Extensions to Increase Performance

The reader should note (Figure 5) that the
important limitations of the bus are: a concurrency of one, namely, only one dialogue can
occur at a given time, and a maximum transfer
rate of one 16-bit word per 0.75 microsecond,
giving a transfer rate of 21.3 megabits/second.
While the bus is not a limit for a uniprocessor
structure, it is a limit for multiprocessor structures. The bus also imposes an artificial limit on
the system performance when high-speed devices (e.g., TV cameras, disks) are transferring

[Figure 3. DECtape control switching PMS diagram.]

[Figure 4. Conventional hierarchy computer structure.]

data to multiple primary memories. On a larger
system with multiple independent memories,
the supply of memory cycles is 13 megabits/second times the number of modules. Since
there is such a large supply of memory cycles
per second and since the central processor can
absorb only approximately 16 megabits/second, the simple one-Unibus structure
must be modified to make the memory cycles
available. Two changes are necessary. First,
each of the memory modules has to be changed
so that multiple units can access each module
on an independent basis. Second, there must be
independent control accessing mechanisms.
Figure 6 shows how a single memory is modified to have more access ports (i.e., connect to
four Unibuses).
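
The transfer rates quoted above follow from simple arithmetic. The short Python sketch below recomputes them from the 16-bit word, the 0.75-microsecond bus cycle, and the 1.2-microsecond core cycle time listed in the notes to Figure 5. Python is used only for illustration, and the names are ours rather than anything defined by the PDP-11 documentation.

```python
# Worked check of the transfer rates quoted in the text.
WORD_BITS = 16          # width of one Unibus word (from the text)
BUS_CYCLE_US = 0.75     # one word per 0.75 microsecond (from the text)
MEMORY_CYCLE_US = 1.2   # Model 20 core cycle time (Figure 5, note 1)


def rate_mbits(bits, cycle_us):
    """Bits per microsecond is numerically the same as megabits per second."""
    return bits / cycle_us


bus_rate = rate_mbits(WORD_BITS, BUS_CYCLE_US)        # single Unibus
module_rate = rate_mbits(WORD_BITS, MEMORY_CYCLE_US)  # one memory module


def memory_supply(modules):
    """Total memory bandwidth offered by several independent modules."""
    return modules * module_rate


print(f"Unibus limit:      {bus_rate:.1f} megabits/second")
print(f"One memory module: {module_rate:.1f} megabits/second")
print(f"Three modules:     {memory_supply(3):.1f} megabits/second")
```

With three modules the memories can supply roughly twice what a single bus can carry, which is the motivation for the multiport memories and multiple Unibuses described here.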
Figure 7 shows a system with three independent memory modules that are accessed by two
independent Unibuses. Note that two of the
secondary memories and one of the transducers
are connected to both Unibuses. It should be
noted that devices that can potentially interfere
with Pc-Mp accesses are constructed with two
ports; for simple systems, both ports are connected to the same bus, but for systems with
more buses, the second connection is to an independent bus.
Figure 8 shows a multiprocessor system with
two central processors and three Unibuses. Two
of the Unibus controls are included within the
two processors, and the third bus is controlled
by an independent control unit. The structure
also has a second switch to allow either of two
processors (Unibuses) to access common shared
devices. The interrupt mechanism allows either

[Figure 5. PDP-11 structure and characteristics PMS diagram. Components shown:
Teletype (Model 33, 35 ASR; full duplex; 10 char/second; char set: ASCII; 8 bit/char);
paper tape reader (100 char/second; 8 bit/char);
paper tape punch (100 char/second; 8 bit/char);
secondary memory (fixed head disk; 16 bits/word; 32768 words; i.rate: 66 µs/word; t.access: 0 ... 34 ms);
(60 cycle clock) - L (60 cycle line).
Notes:
1. Mp (technology: core; 4096 words; t.cycle: 1.2 µs; t.access: 0.6 µs; 16 bits/word)
2. P(central)\c; Model 20; integrated circuit; general registers;
   2 addresses/instruction; addresses are: register, stack, Mp;
   data-types: bits, bytes, words, word integers, byte integers,
   Boolean vectors; 8 bits/byte; 16 bits/word; operations:
   (+, -, / (optional), × (optional), /2, ×2, ¬, - (negate), ∨, ⊃);
   M(processor state; 'general registers; 8 + 1 word; integrated circuit)
3. S ('Unibus; non-hierarchy; bus; concurrency: 1; 1 word/0.75 µs)]

[Figure 6. 1- and 4-port memory modules PMS diagram. (a) 1-port. (b) 4-port.]

[Figure 7. Three Mp, two S ('Unibus) structure PMS diagram (paths labeled: initialization; Ms or T to Mp messages).]

[Figure 8. Dual Pc multiprocessor system PMS diagram. Notes:
1. K('Unibus)
2. K('Unibus multiple bus to single bus coupler; from: 2 Unibus; to: 1 Unibus)
3. K('Processor-to-processor coupler)
4. Ms(duplex)]

processor to respond to an interrupt, and similarly either processor may issue initialization
information on an anonymous basis. A control
unit is needed so that two processors can communicate with one another; shared primary
memory is normally used to carry the body of
the message. A control connected to two Pc's
(Figure 8) can be used for reliability; either processor or Unibus could fail, and the shared Ms
would still be accessible.

Higher Performance Processors

Increasing the bus width has the greatest
effect on performance. A single bus limits data
transmission to 21.3 megabits/second, and
though Model 20 memories are 16 megabits/second, faster (or wider) data path width
modules will be limited by the bus. The Model
20 is not restricted, but for higher performance
processors operating on double-word (fixed-point) or triple-word (floating-point) data, two
or three accesses are required for a single data-type. The direct method to improve the performance is to double or triple the primary
memory and central processor data path
widths. Thus, the bus data rate is automatically
doubled or tripled.
For 32- or 48-bit memories, a coupling control unit is needed so that devices of either
width appear isomorphic to one another. The
coupler maps a data request of a given width
into a higher- or lower-width request for the bus
being coupled to, as shown in Figure 9. (The
bus is limited to a fixed number of devices for

[Figure 9. Computer with 48-bit Pc, Mp with 16-bit Ms, T. PMS diagram, showing a 48-bit Unibus coupled to a 16-bit Unibus.]

electrical reasons; thus, to extend the bus, a bus-repeating unit is needed. The bus-repeating control unit is almost identical to the bus coupler.)
A computer with a 48-bit primary memory and
processor and 16-bit secondary memory and
terminals (transducers) is shown in Figure 9.
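
As an illustration of the width mapping a coupler performs, the Python sketch below splits one 48-bit datum into three 16-bit transfers and reassembles it. It is only a sketch under stated assumptions: the function names are hypothetical, and sending the low-order word first is our assumption, since the text does not specify the order.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def split_request(value, width_bits):
    """Map one wide transfer onto several 16-bit transfers.

    Returns the 16-bit words low-order first (an assumed ordering).
    """
    if width_bits % WORD_BITS != 0:
        raise ValueError("width must be a multiple of 16 bits")
    if not 0 <= value < (1 << width_bits):
        raise ValueError("value does not fit in the stated width")
    return [(value >> shift) & WORD_MASK
            for shift in range(0, width_bits, WORD_BITS)]


def join_request(words):
    """Inverse mapping: rebuild the wide datum from its 16-bit words."""
    value = 0
    for index, word in enumerate(words):
        value |= (word & WORD_MASK) << (index * WORD_BITS)
    return value


triple_word = 0x123456789ABC             # a 48-bit (triple-word) datum
narrow = split_request(triple_word, 48)  # three transfers on the 16-bit bus
print([hex(word) for word in narrow])
print(hex(join_request(narrow)))
```

The same pair of functions covers the 32-bit (double-word) case by passing a width of 32.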
In summary, the design goal was to have a
modular structure providing the final user with

freedom and flexibility to match his needs. A
secondary goal of the Unibus is open-endedness
by providing multiple buses and defining wider
path buses. Finally, and most important, the
Unibus is straightforward.
THE INSTRUCTION SET PROCESSOR
(ISP) LEVEL-ARCHITECTURE*
Introduction, Background, and Design
Constraints

The Instruction Set Processor (ISP) is the
machine defined by the hardware and/or software that interprets programs. As such, an ISP
is independent of technology and specific implementations.
The instruction set is one of the least understood aspects of computer design; currently, it
is an art. There is currently no theory of instruction sets, although there have been attempts to
construct them [Maurer, 1966], and there has
also been an attempt to have a computer program design an instruction set [Haney, 1968].
We have used the conventional approach in this
design. First, a basic ISP was adopted and then
incremental design modifications were made
(based on the results of the benchmarks).†
Although the approach to the design was
conventional, the resulting machine is not. A
common classification of processors is as 0-, 1-,
2-, 3-, or 3-plus-1-address machines. This
scheme has the form:

op l1, l2, l3, l4

where l1 specifies the location (address) in
which to store the result of the binary operation
(op) of the contents of operand locations l2 and
l3, and l4 specifies the location of the next instruction.
The action of the instruction is of the form:

l1 ← l2 op l3; goto l4

The other addressing schemes assume specific
values for one or more of these locations. Thus,
the one-address von Neumann [Burks et al.,
1962] machines assume l1 = l2 = the accumulator and l4 is the location following that of
the current instruction. The two-address machine assumes l1 = l2; l4 is the next address.

*The word "architecture" has been operationally defined [Amdahl et al., 1964] as "the attributes of a system as seen by a
programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and
controls, the logical design, and the physical implementation."

† A predecessor multiregister computer was proposed that used a similar design process. Benchmark programs were coded on
each of ten "competitive" machines, and the object of the design was to get a machine that gave the best score on the
benchmarks. This approach had several fallacies: the machine had no basic character of its own; the machine was difficult
to program since the multiple registers were assigned to specific functions and had inherent idiosyncrasies to score well on
the benchmarks; the machine did not perform well for programs other than those used in the benchmark test; and finally,
compilers that took advantage of the machine appeared to be difficult to write. Since all "competitive machines" had been
hand-coded from a common flowchart rather than separate flowcharts for each machine, the apparent high performance
may have been due to the flowchart organization.
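
The way the shorter formats specialize the 3-plus-1-address form can be made concrete with a small Python sketch. It models memory as a dictionary and is purely illustrative: the function names and the "acc" location are hypothetical, not part of any machine described here.

```python
import operator


def step(mem, op, l1, l2, l3, l4):
    """General 3-plus-1-address action: l1 <- l2 op l3; goto l4.

    Returns l4, the location of the next instruction.
    """
    mem[l1] = op(mem[l2], mem[l3])
    return l4


ACC = "acc"  # hypothetical name for the accumulator location


def one_address(mem, op, operand, pc):
    """One-address form: l1 = l2 = accumulator, l4 = following location."""
    return step(mem, op, ACC, ACC, operand, pc + 1)


def two_address(mem, op, dst, src, pc):
    """Two-address form: l1 = l2 = dst, l4 = next address."""
    return step(mem, op, dst, dst, src, pc + 1)


mem = {ACC: 5, "x": 3, "y": 10}
pc = one_address(mem, operator.add, "x", pc=7)    # acc <- acc + x
pc = two_address(mem, operator.sub, "y", "x", pc)  # y <- y - x
print(mem, pc)
```

Both restricted forms are one-line calls to the general action, which is the sense in which they "assume specific values" for some of the four locations.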
Historically, the trend in machine design has
been to move from a 1- or 2-word accumulator
structure as in the von Neumann machine toward a machine with accumulator and index
register(s). * As the number of registers is increased, the assignment of the registers to specific functions becomes more undesirable and
inflexible; thus, the general register concept has
developed. An array of general registers in the processor was apparently first used in
the first-generation, vacuum-tube machine,
PEGASUS [Elliott et al., 1956] and appears to
be an outgrowth of both 1- and 2-address structures. (Two alternative structures - the early 2-
and 3-address-per-instruction computers - may
be disregarded, since they tend to always access
primary memory for results as well as temporary storage and thus are wasteful of time and
memory cycles and require a long instruction.)
The stack concept (0-address) provides the most
efficient access method for specifying algorithms, since very little space, only the access
addresses and the operators, needs to be given.
In this scheme the operands of an operator are
always assumed to be on the "top of the stack."
The stack has the additional advantage that

arithmetic expression evaluation and compiler
statement parsing have been developed to use a
stack effectively. The disadvantage of the stack
is due, in part, to the nature of current memory
technology. That is, stack memories have to be
simulated with random-access memories; multiple stacks are usually required; and even
though small stack memories exist, as the stack
overflows, the primary memory (core) has to be
used.
Even though the trend has been toward the
general register concept (which, of course, is
similar to a 2-address scheme in which one of
the addresses is limited to small values), it is important to recognize that any design is a compromise. There are situations for which any of
these schemes can be shown to be "best." The
IBM System 360 series uses a general register
structure, and their designers [Amdahl et al.,
1964] claim the following advantages for the
scheme.
1. Registers can be assigned to various
   functions: base addressing, address calculation, fixed-point arithmetic, and indexing.
2. Availability of technology makes the
   general register structure attractive.

The System 360 designers also claim that a
stack organized machine such as the English
Electric KDF 9 [Allmark and Lucking, 1962] or
the Burroughs B5000 [Lonergan and King,
1961] has the following disadvantages.
1. Performance is derived from fast registers, not the way they are used.
2. Stack organization is too limiting and requires many copy and swap operations.
3. The overall storage of general registers
   and stack machines are the same, considering point 2.
4. The stack has a bottom, and when
   placed in slower memory, there is a performance loss.
5. Subroutine transparency is not easily realized with one stack.
6. Variable length data is awkward with a
   stack.

*Due, in part, to needs, but mainly to technology that dictates how large the structure can be.
We generally concur with points 1, 2, and 4.
Point 5 is an erroneous conclusion, and point 6
is irrelevant (that is, general register machines
have the same problem). The general register
scheme also allows processor implementations
with a high degree of parallelism since all instructions of a local block can operate on several registers concurrently. A set of truly
general purpose registers should also have additional uses. For example, in the DEC PDP-10,
general registers are used for address integers,
indexing, floating point, Boolean vectors (bits),
or program flags and stack pointers. The general registers are also addressable as primary
memory, and thus, short program loops can reside within them and be interpreted faster. It
was observed in operation that PDP-10 stack
operations were very powerful and often used
(accounting for as many as 20 percent of the
executed instructions in some programs, e.g.,
the compilers).
The basic design decision that sets the PDP-11 apart was based on the observation that by
using truly general registers and by suitable addressing mechanisms, it was possible to consider the machine as a 0-address (stack), 1-address (general register), or 2-address (memory-to-memory) computer. Thus, it is possible
to use whichever addressing scheme, or mixture
of schemes, is most appropriate.
Another important design decision for the instruction set was to have only a few data-types
in the basic machine, and to have a rather complete set of operations for each data-type. (Alternative designs might have more data-types
with few operations, or few data-types with few
operations.) In part, this was dictated by the
machine size. The conversion between data-types must be accomplished easily either automatically or with one or two instructions. The
data-types should also be sufficiently primitive
to allow other data-types to be defined by software (and by hardware in more powerful versions of the machine). The basic data-type of
the machine is the 16-bit integer which uses the
two's complement convention for sign. This
data-type is also identical to an address.
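
A brief Python sketch may help show what the two's complement convention means for a 16-bit word, and why the same 16 bits can be read either as a signed integer or as an address. The helper names are ours and are used only for illustration.

```python
def to_signed16(word):
    """Interpret a 16-bit pattern as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word  # bit 15 is the sign bit


def to_word16(value):
    """Encode an integer as a 16-bit pattern (wraps modulo 2**16)."""
    return value & 0xFFFF


pattern = to_word16(-2)
print(hex(pattern))          # the bit pattern held in the word
print(to_signed16(pattern))  # read as a signed integer
print(pattern)               # the same bits read as an unsigned address
```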
PDP-11 Model 20 Instruction Set (Basic
Instruction Set)

A formal description of the basic instruction
set is given in the original paper [Bell et al.,
1970] using the ISPL notation [Bell and Newell,
1970]. The remainder of this section will discuss
the machine in a conventional manner.
Primary Memory. The primary memory
(core) is addressed as either 2^16 bytes or 2^15
words using a 16-bit number. The linear address
space is also used to access the input/output devices. The device state, data and control registers are read or written like normal memory
locations.
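
The idea that device registers are read or written like normal memory locations can be sketched as follows. This is a minimal Python model, not the Unibus mechanism itself: the class names, the device, and the address 0xFF00 are all hypothetical and chosen only for illustration.

```python
BYTE_ADDRESSES = 1 << 16   # 2^16 bytes
WORD_ADDRESSES = 1 << 15   # 2^15 words


class AddressSpace:
    """One linear 16-bit address space shared by core and device registers."""

    def __init__(self):
        self.core = {}      # address -> 16-bit value
        self.devices = {}   # address -> device register object

    def attach(self, address, device):
        self.devices[address & 0xFFFF] = device

    def read(self, address):
        address &= 0xFFFF
        if address in self.devices:
            return self.devices[address].read()
        return self.core.get(address, 0)

    def write(self, address, value):
        address &= 0xFFFF
        if address in self.devices:
            self.devices[address].write(value & 0xFFFF)
        else:
            self.core[address] = value & 0xFFFF


class PunchDataRegister:
    """Hypothetical output device: records each character written to it."""

    def __init__(self):
        self.punched = []

    def read(self):
        return 0

    def write(self, value):
        self.punched.append(value & 0xFF)


PUNCH_ADDRESS = 0xFF00   # made-up address, not a real PDP-11 assignment

space = AddressSpace()
punch = PunchDataRegister()
space.attach(PUNCH_ADDRESS, punch)
space.write(0x0100, 0x1234)           # ordinary memory write
space.write(PUNCH_ADDRESS, ord("A"))  # same operation drives the device
```

The program issues the same write in both cases; only the address decides whether core or a device responds.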
General Register. The general registers are
named: R[0:7]<15:0>; that is, there are eight
registers each with 16 bits. The naming is done
starting at the left with bit 15 (the sign bit) to
the least significant bit 0. There are synonyms
for R[6] and R[7]:

1. Stack Pointer\SP<15:0> := R[6]<15:0>
   Used to access a special stack that is
   used to store the state of interrupts,
   traps, and subroutine calls.
2. Program Counter\PC<15:0> := R[7]<15:0>
   Points to the current instruction being
   interpreted. It will be seen that the fact
   that PC is one of the general registers is
   crucial to the design.


Any general register, R[0:7], can be used as a
stack pointer. The special Stack Pointer SP has
additional properties that force it to be used for
changing processor state: interrupts, traps, and
subroutine calls. (It also can be used to control
dynamic temporary storage subroutines.)
In addition to the above registers there are 8
bits used (from a possible 16) for processor status, called the PS<15:0> register. Four bits are the
Condition Codes\CC associated with arithmetic results; the T-bit controls tracing; and 3
bits control the priority of running programs,
Priority<2:0>. Individual bits are mapped in
PS as shown in the appendix.
Data-Types and Primitive Operations. There are two data lengths in the basic machine:
bytes and words, which are 8 and 16 bits, respectively. The nontrivial data-types are word-length integers (w.i.); byte-length integers (by.i);
word-length Boolean vectors (w.bv), i.e., 16 independent bits (Booleans) in a 1-dimensional
array; and byte-length Boolean vectors (by.bv).
The operations on byte and word Boolean vectors are identical. Since a common use of a byte
is to hold several flag bits (Booleans), the operations can be combined to form the complete
set of 16 operations. The logical operations are:
"clear," "complement," "inclusive or," and
"implication" (x ⊃ y or ¬x ∨ y).
There is a complete set of arithmetic operations for the word integers in the basic instruction set. The arithmetic operations are: "add,"
"subtract," "multiply" (optional), "divide"
(optional), "compare," "add one," "subtract
one," "clear," "negate," and "multiply and divide" by powers of two (shift). Since the address
integer size is 16 bits, these data-types are most
important. Byte-length integers are operated on
as words by moving them to the general registers where they take on the value of word integers. Word-length-integer operations are

carried out and the results are returned to memory (truncated).
The floating-point instructions defined by
software (not part of the basic instruction set)
require the definition of two additional data-types (of length two and three), i.e., double
words (d.w.) and triple words (t.w.). Two additional data-types, double integer (d.i.) and triple
floating-point (t.f. or f) are provided for arithmetic. These data-types imply certain additional operations and the conversion to the
more primitive data-types.
Address (Operand) Calculation. The general methods provided for accessing operands
are the most interesting (perhaps unique) part
of the machine's structure. By defining several
access methods to a set of general registers, to
memory, or to a stack (controlled by a general
register), the computer is able to be a 0-, 1-, and
2-address machine. The encoding of the instruction source (S) fields and destination (D) fields
are given in Figure 10 together with a list of the
various access modes that are possible. (The appendix gives a formal description of the effective address calculation process.)
It should be noted from Figure 10 that all the
common access modes are included (direct,
indirect, immediate, relative, indexed, and indexed indirect) plus several relatively uncommon ones. Relative (to PC) access is used to
simplify program loading, while immediate
mode speeds up execution. The relatively uncommon access modes, auto-increment and
auto-decrement, are used for two purposes: access to a stack under control of the registers*
and access to bytes or words organized as
strings or vectors. The indirect access mode allows a stack to hold addresses of data (instead
of data). This mode is desirable when manipulating longer and variable-length data-types
(e.g., strings, double fixed, and triple floating
point). The register auto-increment mode may
be used to access a byte string; thus, for example, after each access, the register can be
made to point to the next data item. This is used
for moving data blocks, searching for particular
elements of a vector, and byte-string operations
(e.g., movement, comparisons, editing).
This addressing structure provides flexibility
while retaining the same, or better, coding efficiency than classical machines. As an example
of the flexibility possible, consider the variations possible with the most trivial word instruction MOVE (Table 1). The MOVE instruction is coded in conventional 2-address, 1-address (general register) and 0-address (stack)
computers. The 2-address format is particularly
nice for MOVE, because it provides an efficient
encoding for the common operation: A ← B
(note that the stack and general registers are not
involved). The vector move A[I] ← B[I] is also
efficiently encoded. For the general register
(and 1-address format), there are about 13
MOVE operations that are commonly used. Six
moves can be encoded for the stack (about the
same number found in stack machines).

* Note that, by convention, a stack builds toward register 0, and when the stack crosses 400₈, a stack overflow occurs.

[Figure 10. Address calculation formats. The source field S consists of subfields sm, sd, and sr, and the
destination field D of subfields dm, dd, and dr, where:
  r = register specification R[r]
  d = defer (indirect) address bit
  m = mode (00 = R[r]; 01 = R[r], next R[r] + ai; 10 = R[r] ← R[r] - ai, next R[r]; 11 = indexed with next word)
The following access modes can be specified:
  Direct to a register R[r].
  Indirect to a register, R[r] for address of data.
  Auto increment via register (pop) - use register as address, then increment register.
  Auto increment via register (pop) - defer.
  Auto decrement via register (push) - decrement register, then use register as address.
  Auto decrement indirect - decrement register, then use register as the address of the address of data.
  Immediate data - next full word is the data (r = PC).
  Direct data - next full word is the address of data (r = PC).
  Direct indexed - use next full word indexed with R[r] as address of data.
  Direct indexed - indirect - use next full word indexed with R[r] as the address of the address of data.
  Relative access - next full word plus PC is the address (r = PC).
  Relative indirect access - next full word plus PC is the address of the address of data (r = PC).
Note 1: Address increment\ai value is 1 or 2.]
Instruction Formats. There are several instruction decoding formats depending on
whether zero, one, or two operands have to be
explicitly referenced. When two operands are
required, they are identified as source S and
destination D and the result is placed at destination D. For single operand instructions (unary
operators), the instruction action is D ← u D;
and for two operand instructions (binary operators), the action is D ← D b S (where u and b
are unary and binary operators, e.g., ¬, - and
+, -, ×, /, respectively). Instructions are specified by a 16-bit word. The most common binary
operator format (that for operations requiring
two addresses) uses bits 15:12 to specify the operation code, bits 11:6 to specify the destination
D, and bits 5:0 to specify the source S. The
other instruction formats are given in Figure 11.

[Figure 11. PDP-11 instruction formats (simplified).
Binary arithmetic and logical operations (see note):
  Format: | bop | S | D |
  Form: D ← S b D
  Example: ADD (:= bop = 0010) → (CC,D ← D + S);
Unary arithmetic and logical operations:
  Format: | uop | D |
  Form: D ← u D;
  Examples: NEG (:= uop = 0000101100) → (CC,D ← -D); negate
            ASL (:= uop = 0000110011) → (CC,D ← D × 2); shift left
Branch (relative) operators:
  Format: | brop | offset |
  Form: if brop condition, then (PC ← PC + offset);
  Example: BEQ (:= brop = 03₁₆) → (Z → (PC ← PC + offset))
Jump:
  Format: | 0 000 000 001 | D |
  Form: PC ← D + PC
Jump to subroutine:
  Format: | 0 000 100 | sr | D |
  Save R[sr] on stack, enter subroutine at D + PC
Miscellaneous operations:
  Format: | op code |
  Example: HALT (:= instruction = 0) → (RUN ← 0);
Note: These instructions are all one word. D and/or S may each
require one additional immediate data or address word.
Thus, instructions can be one, two, or three words long.]

[Figure 12. PDP-11 instruction interpretation process state diagram.]

Instruction Interpretation Process. The
instruction interpretation process is given in
Figure 12, and follows the common fetch-execute cycle. There are three major states: (1)
interrupting - the PC and PS are placed on the
stack accessed by the Stack Pointer\SP, and the
new state is taken from an address specified by
the source requesting the trap or interrupt; (2)
trace (controlled by T-bit) - essentially one instruction at a time is executed as a trace trap
occurs after each instruction; and (3) normal instruction interpretation. The five (lower) states
in the diagram are concerned with instruction
fetching, operand fetching, executing the operation specified by the instruction and storing
the result. The nontrivial details for fetching
and storing the operands are not shown in the
diagram but can be constructed from the effective address calculation process (appendix). The
state diagram, though simplified, is similar to 2-
and 3-address computers, but is distinctly different from a 1-address (1-accumulator) computer.
The ISP description (appendix) gives the operation of each of the instructions, and the more
conventional diagram (Figure 11) shows the decoding of instruction classes. The ISP description is somewhat incomplete; for example, the
add instruction is defined as:

ADD (:= bop = 0010₂) → (CC,D ← D + S)

Addition does not exactly describe the changes
to the Condition Codes CC (which means
whenever a binary opcode [bop] of 0010₂ occurs
the ADD instruction is executed with the above
effect). In general, the CC are based on the result, that is, Z is set if the result is zero, N if
negative, C if a carry occurs, and V if an overflow was detected as a result of the operation.
Conditional branch instructions may thus follow the arithmetic instruction to test the results
of the CC bits.
Examples of Addressing Schemes
Use as a Stack (Zero-Address) Machine.

Table 2 lists typical 0-address machine instructions together with the PDP-11 instructions that
perform the same function. It should be noted
that translation (compilation) from normal infix expressions to reverse Polish is a comparatively trivial task. Thus, one of the primary
reasons for using stacks is for the evaluation of
expressions in reverse Polish form.
Consider an assignment statement of the
form:

D ← A + B/C

which has the reverse Polish form:

D A B C / + ←

and would normally be encoded on a stack machine as follows:

Load stack address of D
Load stack A
Load stack B
Load stack C
/
+
Store.

Table 1. Coding for the MOVE Instruction to Compare with Conventional Machines

Assembler Format*          Effect                      Description

2-Address Machine Format
MOVE B, A                  A ← B                       Replace A with contents of B
MOVE #N, A                 A ← N                       Replace A with number N
MOVE B(R2), A(R2)          A[I] ← B[I]                 Replace element of a vector
MOVE (R3)+, (R4)+          A[I] ← B[I]; I ← I + 1      Replace element of a vector, move to next element

General-Register Machine Format
MOVE A, R1                 R1 ← A                      Load register
MOVE R1, A                 A ← R1                      Store register
MOVE @A, R1                R1 ← M[A]                   Load or store indirect via element A
MOVE R1, R3                R1 ← R3                     Register-to-register transfer
MOVE R1, A(R1)             A[I] ← R1                   Store indexed (or load indexed)
MOVE @A(R0), R1            R1 ← M[A[I]]                Load (or store) indexed indirect
MOVE (R1), R3              R1 ← M[R2]                  Load indirect via register
MOVE (R1)+, R3             R3 ← M[I]                   Load (or store) element indirect via register, move to next element

Stack Machine Format
MOVE #N, -(R0)             S ← N                       Load stack with literal
MOVE A, -(R0)              S ← A                       Load stack with contents of A
MOVE @(R0)+, -(R0)         S ← M[S]                    Load stack with memory specified by top of stack
MOVE (R0)+, A              A ← S                       Store stack in A
MOVE (R0)+, @(R0)+         M[S2] ← S1                  Store stack top in memory addressed by stack top - 1
MOVE (R0), -(R0)           S ← S                       Duplicate top of stack

*Assembler Format
()  Denotes contents of memory addressed by
-   Decrement register first
+   Increment register after
@   Indirect
#   Literal
However, with the PDP-11, there is an address method for improving the program encoding
and run time, while not losing the stack
concept. An encoding improvement is made by
doing an operation to the top of the stack from
a direct-memory location (while loading). Thus,
the previous example could be coded as:

Load stack B
Divide stack by C
Add A to stack
Store stack D
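
The difference between the two encodings can be made concrete with a small interpreter. The following Python sketch is only an illustration: the opcode names (`push_addr`, `div_mem`, and so on) and the dictionary standing in for memory are hypothetical and are not PDP-11 mnemonics. It runs both encodings of D ← A + B/C and reports how many instructions each needs.

```python
def run(program, memory):
    """Interpret a toy stack program; return the number of instructions executed."""
    stack = []
    for op, *arg in program:
        if op == "push_addr":        # load stack with an address (a name here)
            stack.append(arg[0])
        elif op == "push":           # load stack with the contents of a location
            stack.append(memory[arg[0]])
        elif op == "div":            # pure stack divide: next-to-top / top
            top = stack.pop()
            stack.append(stack.pop() / top)
        elif op == "add":            # pure stack add
            top = stack.pop()
            stack.append(stack.pop() + top)
        elif op == "store":          # store top at the address beneath it
            value = stack.pop()
            memory[stack.pop()] = value
        elif op == "div_mem":        # divide stack top by a memory location
            stack.append(stack.pop() / memory[arg[0]])
        elif op == "add_mem":        # add a memory location to stack top
            stack.append(stack.pop() + memory[arg[0]])
        elif op == "store_mem":      # store stack top directly in a location
            memory[arg[0]] = stack.pop()
        else:
            raise ValueError(op)
    return len(program)

# Pure reverse Polish encoding: D A B C / + <-
PURE = [("push_addr", "D"), ("push", "A"), ("push", "B"), ("push", "C"),
        ("div",), ("add",), ("store",)]
# Encoding that operates on the stack top from direct memory locations
MIXED = [("push", "B"), ("div_mem", "C"), ("add_mem", "A"), ("store_mem", "D")]

if __name__ == "__main__":
    for name, program in (("pure", PURE), ("mixed", MIXED)):
        memory = {"A": 2, "B": 12, "C": 4}
        count = run(program, memory)
        print(name, count, memory["D"])
```

Both programs leave the same value in D; the second simply folds each load into the operation that consumes it.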
Use as a 1-Address (General Register) Machine. The PDP-11 is a general register
computer and should be judged on that basis.
Benchmarks have been coded to compare the
PDP-11 with the larger DEC PDP-10. A 16-bit
processor performs better than the DEC PDP-10 in terms of bit efficiency, but not with time
or memory cycles. A PDP-11 with a 32-bit-wide
memory would, however, decrease time by
nearly a factor of 2, making the times essentially
comparable.
Use as a 2-Address Machine. Table 3 lists
typical 2-address machine instructions together
with the equivalent PDP-11 instructions for
performing the same operations. The most useful instruction is probably the MOVE instruction because it does not use the stack or general
registers. Unary instructions that operate on
and test primary memory are also useful and
efficient instructions.

Table 2. Stack Computer Instructions and Equivalent PDP-11 Instructions

Common Stack Instruction                                  Equivalent PDP-11 Instruction

Place address value A on stack                            MOVE #A, -(R0)*
Load stack from memory address specified by stack         MOVE @(R0)+, -(R0)
Load stack from memory location A                         MOVE A, -(R0)
Store stack at memory address specified by stack          MOVE (R0)+, @(R0)+
Store stack at memory location A                          MOVE (R0)+, A
Duplicate top of stack                                    MOVE (R0), -(R0)
+; add two top data of stack to stack                     ADD (R0)+, @R0
-, ×, /; subtract, multiply, divide                       See add
-; negate top data of stack                               NEG @R0
Clear top data of stack                                   CLR @R0
∨; "inclusive or" two top data of stack                   BSET (R0)+, @R0
∧; "and" two top data of stack                            COM @R0
                                                          BCLR (R0)+, @R0
¬; complement of stack                                    COM @R0
Test top of stack (set branch indicators)                 TST @R0
Branch on indicator                                       BR (=, ≠, <, ≤, >, ≥)
Jump unconditional                                        JUMP
Add addressed location A to top of stack (not common      ADD A, @R0
for stack machine); equivalent to: load stack, add
Swap top two stack data                                   MOVE (R0)+, R1
                                                          MOVE (R0)+, R2
                                                          MOVE R1, -(R0)
                                                          MOVE R2, -(R0)
Reset stack location to N                                 MOVE N, R0

* Stack pointer has been arbitrarily used as register R0 for this example.

Table 3. Two-Address Computer Instructions and Equivalent PDP-11 Instructions

Two-Address Computer                                      PDP-11

A ← B; transfer B to A                                    MOVE B, A
A ← A + B; add                                            ADD B, A
-, ×, /                                                   See add
A ← -A; negate                                            NEG A
A ← A ∨ B; inclusive or                                   BSET B, A
A ← ¬A; not                                               COM A
Jump unconditional                                        JUMP
Test A, and transfer to B                                 TST A
                                                          BR (=, ≠, >, ≥, <, ≤) B

Extensions of the Instruction Set for Real
(Floating-Point) Arithmetic

The most significant factor that affects performance is whether a machine has operators
for manipulating data in a particular format.
The inherent generality of a stored program
computer allows any computer, by subroutine, to
simulate another, given enough time and memory. The biggest and perhaps only factor that
separates a small computer from a large computer is whether floating-point data is understood by the computer. For example, a small
computer with a cycle time of 1.0 microsecond
and 16-bit memory width might have the following characteristics for a floating-point add,
excluding data accesses:

Programmed:                                           250 μs
Programmed (but special normalize and
differencing of exponent instructions):               75 μs
Microprogrammed hardware:
Hardwired:

It should be noted that the ratios between
programmed and hardwired interpretation vary by roughly two orders of magnitude. The
basic hardwiring scheme and the programmed
scheme should allow binary program compatibility, assuming there is an interpretive program for the various operators in the Model 20.
For example, consider one scheme that would
add eight 48-bit registers that are addressable in
the extended instruction set. The eight floating
registers F would be mapped into eight double-length (32-bit) registers D. In order to access the
various parts of F or D registers, registers F0
and F1 are mapped onto registers R0 to R2 and
R3 to R5.
Since the instruction set operation code is almost completely encoded already for byte and
word-length data, a new encoding scheme is
necessary to specify the proposed additional instructions. This scheme adds two instructions:
enter floating-point mode and execute one
floating-point instruction. The instructions for
floating-point and double-word data are shown
in Table 4.

Table 4. Floating-Point and Double-Word Data Instructions

Op                        Floating Point/f      Double Word/d

Binary ops (bop' S D)
  ←                       FMOVE                 DMOVE
  +                       FADD                  DADD
  -                       FSUB                  DSUB
  ×                       FMUL                  DMUL
  /                       FDIV                  DDIV
  compare                 FCMP                  DCMP
Unary ops (uop' D)
  -                       FNEG                  DNEG

LOGICAL DESIGN OF S(UNIBUS) AND Pc

The logical design level is concerned with the
physical implementation and the constituent
combinational and sequential logic elements
that form the various computer components
(e.g., processors, memories, controls). Physically, these components are separate and connected to the Unibus following the lines of the
PMS structure.

Unibus Organization

Figures 4 and 5 of Chapter 14 diagram the Pc
and the entering signals from the Unibus. The
control unit for the Unibus, housed in Pc for
the Model 20, is not shown in the figure.
The PDP-11 Unibus has 56 bidirectional signals conventionally used for program-controlled data transfers (processor to control),
direct memory data transfers (processor or control-to-memory), and control-to-processor interrupt. The Unibus is interlocked; thus,
transactions operate independently of the bus
length and response time of the master and
slave. Since the bus is bidirectional and is used
by all devices, any device can communicate with
any other device. The controlling device is the
master, and the device to which the master is
communicating is the slave. For example, a
data transfer from processor (master) to memory (always a slave) uses the Data Out dialogue
facility for writing and a transfer from memory
to processor uses the Data In dialogue facility
for reading.
Bus Control. Most of the time the processor
is bus master fetching instructions and operands from memory and storing results in memory. Bus mastership is determined by the
current processor priority and the priority line
upon which a bus request is made and the physical placement of a requesting device on the
linked bus. The assignment of bus mastership is
done concurrent with normal communication
(dialogues).
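
The text names three inputs to the choice of bus master: the current processor priority, the priority line on which the request is made, and the physical placement of the requesting device. The Python sketch below shows one way such a decision could be expressed. The ordering rules it uses are assumptions made for illustration only (a larger line number is treated as more urgent, a request must exceed the processor priority to be granted, and a smaller position number is treated as closer to the processor); the `Request` type and `next_master` function are likewise hypothetical.

```python
from typing import NamedTuple, Optional, Sequence

class Request(NamedTuple):
    device: str
    line: int       # priority line of the request (assumed: larger = more urgent)
    position: int   # placement on the linked bus (assumed: smaller = closer to Pc)

def next_master(requests: Sequence[Request],
                processor_priority: int) -> Optional[Request]:
    """Pick the request that would be granted mastership, or None if the
    processor keeps the bus under the assumed rules."""
    eligible = [r for r in requests if r.line > processor_priority]
    if not eligible:
        return None
    # Highest line first; among equals, the device placed nearest the processor.
    return min(eligible, key=lambda r: (-r.line, r.position))

if __name__ == "__main__":
    pending = [Request("disk", 5, 2), Request("tty", 4, 1), Request("tape", 5, 3)]
    print(next_master(pending, processor_priority=3))
    print(next_master(pending, processor_priority=5))
```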
Unibus Dialogues

Three types of dialogues use the Unibus. All
the dialogues have a common protocol that first
consists of obtaining the bus mastership (which
is done concurrent with a previous transaction)
followed by a data exchange with the requested
device. The dialogues are: Interrupt; Data In
and Data In Pause; and Data Out and Data Out
Byte.
Interrupt. Interrupt can be initiated by a
master immediately after receiving bus mastership. An address is transmitted from the master
to the slave on Interrupt. Normally, subordinate control devices use this method to transmit
an interrupt signal to the processor.
Data In and Data In Pause. These two bus
operations transmit slave's data (whose address
is specified by the master) to the master. For the
Data In Pause operation, data is read into the
master and the master responds with data
which is to be rewritten in the slave.

Data Out and Data Out Byte. These two
operations transfer data from the master to the
slave at the address specified by the master. For
Data Out, a word at the address specified by the
address lines is transferred from master to slave.
Data Out Byte allows a single data byte to be
transmitted.
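
The data dialogues can be pictured from the slave's side as four operations on addressable storage. The following Python sketch models a memory slave under stated assumptions: the class and method names are hypothetical, the low-order byte of a word is assumed to sit at the even address, and the handshaking and timing of the real bus are not modeled at all.

```python
class MemorySlave:
    """Toy slave: byte-addressable storage answering master-initiated dialogues."""

    def __init__(self, size):
        self.bytes = bytearray(size)

    def data_in(self, address):
        """Data In: return the 16-bit word at the (even) address to the master."""
        a = address & ~1
        return self.bytes[a] | (self.bytes[a + 1] << 8)

    def data_out(self, address, word):
        """Data Out: store a 16-bit word sent by the master."""
        a = address & ~1
        self.bytes[a] = word & 0xFF
        self.bytes[a + 1] = (word >> 8) & 0xFF

    def data_out_byte(self, address, byte):
        """Data Out Byte: store a single byte at the given byte address."""
        self.bytes[address] = byte & 0xFF

    def data_in_pause(self, address, modify):
        """Data In Pause: the master reads a word, then supplies the word
        to be rewritten; `modify` stands in for the master's computation."""
        old = self.data_in(address)
        self.data_out(address, modify(old) & 0xFFFF)
        return old

if __name__ == "__main__":
    mp = MemorySlave(16)
    mp.data_out(4, 0x1234)
    mp.data_out_byte(5, 0xAB)
    print(hex(mp.data_in(4)))
```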
Processor Logical Design

The Pc is designed using TTL logical design
components and occupies approximately eight
8 inch X 12 inch printed circuit boards. The Pc
is physically connected to two other components, the console and the Unibus. The control for the Unibus is housed in the Pc and
occupies one of the printed circuit boards. The
most regular part of the Pc is the arithmetic and
state section. The 16-word scratchpad memory
and combinational logic data operators, D
(shift) and D (adder, logical ops), form the most
regular part of the processor's structure. The
16-word memory holds most of the 8-word processor state found in the ISP, and the 8 bits that
form the Status word are stored in an 8-bit register. The input to the adder-shift network has
two latches which are either memories or gates.
The output of the adder-shift network can be
read to either the data or address parts of the
Unibus, or back to the scratchpad array.
The instruction decoding and arithmetic control are less regular than the above data and
state and these are shown in the lower part of
the figure. There are two major sections: the instruction fetching and decoding control and the
instruction set interpreter (which, in effect, defines the ISP). The latter control section operates
on, hence controls, the arithmetic and state
parts of the Pc. A final control is concerned
with the interface to the Unibus (distinct from
the Unibus control that is housed in the Pc).
CONCLUSIONS

In this paper we have endeavored to give a
complete description of the PDP-11 Model 20
computer at four descriptive levels. These present an unambiguous specification at two levels
(the PMS structure and the ISP), and, in addition, specify the constraints for the design at the
top level, and give the reader some idea of the
implementation at the bottom level logical design. We have also presented guidelines for
forming additional models that would belong to
the same family.

ACKNOWLEDGEMENTS

The authors are grateful to Mr. Nigberg of
the technical publication department at DEC
and to the reviewers for their helpful criticism.
We are especially grateful to Mrs. Dorothy
Josephson at Carnegie-Mellon University for
typing the notation-laden manuscript.

APPENDIX. DEC PDP-11 INSTRUCTION SET PROCESSOR DESCRIPTION (IN ISPL)

The following is a cursory description of the instructions in ISPL, the initial
notation of Bell and Newell [1971]. Only the processor state and a brief description of the instructions are given.

Primary Memory State

M\Mb\Memory[0:2^16 - 1]<7:0>                     Byte memory
Mw[0:2^15 - 1]<15:0> := M[0:2^16 - 1]<7:0>       Word memory mapping

Processor State (9 words)

R\Registers[0:7]<15:0>                Word general registers
SP<15:0> := R[6]<15:0>                Stack pointer
PC<15:0> := R[7]<15:0>                Program counter
PS<15:0>                              Processor state register
Priority\P<2:0> := PS<7:5>            Under program control; priority level of
                                      the process currently being interpreted; a
                                      higher level process may interrupt or trap
                                      this process.
CC\Condition-Codes<3:0> := PS<3:0>
Carry\C := CC<0>                      A result condition code indicating an arithmetic
                                      carry from bit 15 of the last operation.
Negative\N := CC<3>                   A result condition code indicating last result
                                      was negative.
Zero\Z := CC<2>                       A result condition code indicating last result
                                      was zero.
Overflow\V := CC<1>                   A result condition code indicating an arithmetic
                                      overflow of the last operation.
Trace\T := ST<4>                      Denotes whether instruction trace trap is to
                                      occur after each instruction is executed.
Undefined<7:0> := PS<15:8>            Unused
Run                                   Denotes normal execution.
Wait                                  Denotes waiting for an interrupt.
Instruction Set

The following instruction set will be defined briefly and is incomplete. It is intended to give the
reader a simple understanding of the machine operation.

MOV (:= bop = 0001) → (CC,D ← S);              Move word
MOVB (:= bop = 1001) → (CC,Db ← Sb);           Move byte

Binary Arithmetic: D ← D b S;
ADD (:= bop = 0110) → (CC,D ← D + S);          Add
SUB (:= bop = 1110) → (CC,D ← D - S);          Subtract
CMP (:= bop = 0010) → (CC ← D - S);            Word compare
CMPB (:= bop = 1010) → (CC ← Db - Sb);         Byte compare
MUL (:= bop = 0111) → (CC,D ← D × S);          Multiply; if D is a register, then
                                               a double-length operator
DIV (:= bop = 1111) → (CC,D ← D / S);          Divide; if D is a register, then a
                                               remainder is saved

Unary Arithmetic: D ← u S;
CLR (:= uop = 050₈) → (CC,D ← 0);              Clear word
CLRB (:= uop = 1050₈) → (CC,Db ← 0);           Clear byte
COM (:= uop = 051₈) → (CC,D ← ¬D);             Complement word
COMB (:= uop = 1051₈) → (CC,Db ← ¬Db);         Complement byte
INC (:= uop = 052₈) → (CC,D ← D + 1);          Increment word
INCB (:= uop = 1052₈) → (CC,Db ← Db + 1);      Increment byte
DEC (:= uop = 053₈) → (CC,D ← D - 1);          Decrement word
DECB (:= uop = 1053₈) → (CC,Db ← Db - 1);      Decrement byte
NEG (:= uop = 054₈) → (CC,D ← -D);             Negate
NEGB (:= uop = 1054₈) → (CC,Db ← -Db);         Negate byte
ADC (:= uop = 055₈) → (CC,D ← D + C);          Add the carry
ADCB (:= uop = 1055₈) → (CC,Db ← Db + C);      Add to byte the carry
SBC (:= uop = 056₈) → (CC,D ← D - C);          Subtract the carry
SBCB (:= uop = 1056₈) → (CC,Db ← Db - C);      Subtract from byte the carry
TST (:= uop = 057₈) → (CC ← D);                Test
TSTB (:= uop = 1057₈) → (CC ← Db);             Test byte
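
As the body of the paper notes, an ISP entry such as ADD → (CC,D ← D + S) does not spell out how the individual condition codes change. The Python sketch below gives one conventional reading for a 16-bit two's-complement add, following the paper's statement that Z is set on a zero result, N on a negative result, C on a carry, and V on an overflow. The function name `add16` and the order of the returned tuple are illustrative assumptions, not part of the ISP description.

```python
MASK16 = 0xFFFF   # 16-bit word mask
SIGN16 = 0x8000   # sign bit (bit 15)

def add16(d, s):
    """Return (result, n, z, v, c) for the 16-bit add D <- D + S."""
    d &= MASK16
    s &= MASK16
    total = d + s
    result = total & MASK16
    n = int(bool(result & SIGN16))    # N: result is negative
    z = int(result == 0)              # Z: result is zero
    c = int(total > MASK16)           # C: carry out of bit 15
    # V: both operands have the same sign and the result's sign differs
    v = int(bool(~(d ^ s) & (d ^ result) & SIGN16))
    return result, n, z, v, c

if __name__ == "__main__":
    print(add16(0x7FFF, 0x0001))   # largest positive + 1: overflow, no carry
    print(add16(0xFFFF, 0x0001))   # -1 + 1: zero result with carry
```

A conditional branch following the add would then test these bits rather than the result itself.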

Shift Operations: D ← D × 2^n;
ROR (:= sop = 060₈) → (C□D ← C□D / 2 {rotate});            Rotate right
RORB (:= sop = 1060₈) → (C□Db ← C□Db / 2 {rotate});        Byte rotate right
ROL (:= sop = 061₈) → (C□D ← C□D × 2 {rotate});            Rotate left
ROLB (:= sop = 1061₈) → (C□Db ← C□Db × 2 {rotate});        Byte rotate left
ASR (:= sop = 062₈) → (CC,D ← D / 2);                      Arithmetic shift right
ASRB (:= sop = 1062₈) → (CC,Db ← Db / 2);                  Byte arithmetic shift right
ASL (:= sop = 063₈) → (CC,D ← D × 2);                      Arithmetic shift left
ASLB (:= sop = 1063₈) → (CC,Db ← Db × 2);                  Byte arithmetic shift left
ROT (:= sop = 064₈) → (C□D ← D × 2^S);                     Rotate
ROTB (:= sop = 1064₈) → (C□Db ← D × 2^S);                  Byte rotate
LSH (:= sop = 065₈) → (CC,D ← D × 2^S {logical});          Logical shift
LSHB (:= sop = 1065₈) → (CC,Db ← Db × 2^S {logical});      Byte logical shift
ASH (:= sop = 066₈) → (CC,D ← D × 2^S);                    Arithmetic shift
ASHB (:= sop = 1066₈) → (CC,Db ← Db × 2^S);                Byte arithmetic shift
NOR (:= sop = 067₈) → (CC,D ← normalize(D));               Normalize
    (R[r'] ← normalize_exponent(D));
NORD (:= sop = 1067₈) → (Db ← normalize(Dd));              Normalize double
    (R[r'] ← normalize_exponent(D));
SWAB (:= sop = 3) → (CC,D ← D<7:0, 15:8>)                  Swap bytes

Logical Operations
BIC (:= bop = 0100) → (CC,D ← D ∧ ¬S);         Bit clear
BICB (:= bop = 1100) → (CC,Db ← Db ∧ ¬Sb);     Byte bit clear
BIS (:= bop = 0101) → (CC,D ← D ∨ S);          Bit set
BISB (:= bop = 1101) → (CC,Db ← Db ∨ Sb);      Byte bit set
BIT (:= bop = 0011) → (CC ← D ∧ S);            Bit test under mask
BITB (:= bop = 1011) → (CC ← Db ∧ Sb);         Byte bit test under mask

Branches and Subroutines Calling: PC ← f;
JMP (:= sop = 0001₈) → (PC ← D');                                 Jump unconditional
BR (:= brop = 01₁₆) → (PC ← PC + offset);                         Branch unconditional
BEQ (:= brop = 03₁₆) → (Z → (PC ← PC + offset));                  Equal to zero
BNE (:= brop = 02₁₆) → (¬Z → (PC ← PC + offset));                 Not equal to zero
BLT (:= brop = 05₁₆) → (N ⊕ V → (PC ← PC + offset));              Less than (zero)
BGE (:= brop = 04₁₆) → (N ≡ V → (PC ← PC + offset));              Greater than or equal (zero)
BLE (:= brop = 07₁₆) → (Z ∨ (N ⊕ V) → (PC ← PC + offset));        Less than or equal (zero)
BGT (:= brop = 06₁₆) → (¬(Z ∨ (N ⊕ V)) → (PC ← PC + offset));     Greater than (zero)
BCS/BHIS (:= brop = 87₁₆) → (C → (PC ← PC + offset));             Carry set; higher or same (unsigned)
BCC/BLO (:= brop = 86₁₆) → (¬C → (PC ← PC + offset));             Carry clear; lower (unsigned)
BLOS (:= brop = 83₁₆) → (C ∧ Z → (PC ← PC + offset));             Lower or same (unsigned)
BHI (:= brop = 82₁₆) → ((¬C ∨ Z) → (PC ← PC + offset));           Higher than (unsigned)
BVS (:= brop = 85₁₆) → (V → (PC ← PC + offset));                  Overflow
BVC (:= brop = 84₁₆) → (¬V → (PC ← PC + offset));                 No overflow
BMI (:= brop = 81₁₆) → (N → (PC ← PC + offset));                  Minus
BPL (:= brop = 80₁₆) → (¬N → (PC ← PC + offset));                 Plus
JSR (:= sop = 0040₈) →                                            Jump to subroutine by putting
    (SP ← SP - 2; next                                            R[sr], PC on stack and loading
    M[SP] ← R[sr];                                                R[sr] with PC, and going to
    R[sr] ← PC; PC ← D);                                          subroutine at D
RTS (:= i = 000200₈) → (PC ← R[dr];                               Return from subroutine
    R[dr] ← M[SP]; SP ← SP + 2);
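
The signed branch conditions listed above depend only on N, Z, and V. As a hedged illustration, the Python sketch below derives those three flags from a 16-bit subtraction and evaluates the BEQ, BNE, BLT, BGE, BLE, and BGT predicates exactly as they are written in the listing. The helper names `compare16` and `taken` are hypothetical, and the unsigned branches are left out.

```python
def compare16(a, b):
    """Return (n, z, v) for the 16-bit two's-complement difference a - b."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    n = bool(result & 0x8000)         # N: difference is negative
    z = result == 0                   # Z: difference is zero
    # V: operands differ in sign and the result's sign differs from a's
    v = bool((a ^ b) & (a ^ result) & 0x8000)
    return n, z, v

# Predicates over (N, Z, V), written as in the ISP listing
SIGNED_BRANCHES = {
    "BEQ": lambda n, z, v: z,
    "BNE": lambda n, z, v: not z,
    "BLT": lambda n, z, v: n != v,                  # N xor V
    "BGE": lambda n, z, v: n == v,                  # N equivalent to V
    "BLE": lambda n, z, v: z or (n != v),
    "BGT": lambda n, z, v: not (z or (n != v)),
}

def taken(mnemonic, a, b):
    """Would the branch be taken after comparing a with b?"""
    return SIGNED_BRANCHES[mnemonic](*compare16(a, b))

if __name__ == "__main__":
    print(taken("BLT", 0x8000, 0x0001))   # -32768 compared with 1
    print(taken("BGT", 5, 3))
```

Using N ⊕ V rather than N alone is what keeps the signed comparisons correct when the subtraction overflows.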

Miscellaneous Processor State Modification:
RTI (:= i = 2₈) → (PC ← M[SP];                     Return from interrupt
    SP ← SP + 2; next
    PS ← M[SP];
    SP ← SP + 2);
HALT (:= i = 0) → (Run ← 0);
WAIT (:= i = 1) → (Wait ← 1);
TRAP (:= i = 3) → (SP ← SP + 2; next               Trap to M[34₈]; store status
    M[SP] ← PS;                                    and PC
    SP ← SP + 2; next
    M[SP] ← PC;
    PC ← M[34₈];                                   Enter new process
    PS ← M[12]);
EMT (:= brop = 82₁₆) → (SP ← SP + 2; next          Emulator trap
    M[SP] ← PS;
    SP ← SP + 2; next
    M[SP] ← PC;
    PC ← M[30₈];
    PS ← M[32₈]);
IOT (:= i = 4) → (see TRAP)                        I/O trap to M[20₈]
RESET (:= i = 5) → (not described)                 Reset to external devices
OPERATE (:= i<5:15> = 5) →                         Condition code operate
    (i<4> → (CC ← CC ∨ i<3:0>);                    Set codes
    ¬i<4> → (CC ← CC ∧ ¬i<3:0>));                  Clear codes
end Instruction execution

Cache Memories for PDP-11
Family Computers
WILLIAM D. STRECKER

INTRODUCTION

One of the most important concepts in computer systems is that of a memory hierarchy. A
memory hierarchy is simply a memory system
built of two (or more*) memory technologies.
The first technology is selected for fast access
time and necessarily has a high per-bit cost.
Relatively little of the memory system consists
of this technology. The second technology is selected for low per-bit cost and necessarily has a
slow access time. The bulk of the memory system consists of this technology. The use of the
hierarchy is coordinated by user software, system software, or hardware so that the overall
characteristics of the memory system approximate the fast access of the fast technology, and
the low per-bit cost of the low cost technology.
An example of a user software managed hierarchy is core/disk overlaying; an example of a
system software managed hierarchy is core/disk
demand paging. The prime example of a hardware managed hierarchy is a bipolar cache/core
memory system.

Until recently, the concept of cache memory
appeared only in very large scale, performance-oriented computer systems such as the IBM
360/85 [Conti, 1969; Conti et al., 1968] and 370
models 155 and larger. Recently a small cache
was announced as an option for the DG Eclipse
[Data General, 1974] computer system. A
larger, internal cache memory is part of a recently announced Digital PDP-11 family computer system: the PDP-11/70 [DEC, 1975]. The
content of this paper is a summary of the research done on the feasibility of using a bipolar
cache/core hierarchy in PDP-11 family computer systems.
* Memory hierarchies can, of course, consist of three or more technologies. Discussion and analysis of these multilevel
hierarchies is a fairly obvious generalization of the discussion and analysis given here.

CACHE MEMORY

A cache memory is a small, fast, associative
memory located between the central processor
Pc and the primary memory Mp. Typically the
cache is implemented in bipolar technology
while Mp is implemented in MOS or magnetic
core technology. Stored in the cache are address
data AD pairs consisting of an Mp address and
a copy of the contents of the Mp location corresponding to that address.
The operation of the cache is as follows.
When the Pc addresses Mp, the address is first
compared against the addresses stored in the
cache. If there is a match, the access is performed on the data portion of the matched AD
pair. This is called a hit and is performed at the
fast access time of the cache. If there is no
match - called a miss - Mp is accessed as usual.
Generally, however, an AD pair corresponding
to the latest access is stored in the cache, usually
displacing some other AD pair. It is the latter
procedure which tends to keep the contents of
the cache corresponding to the Mp locations
most commonly accessed by the Pc. Because
programs typically have the property of locality
(i.e., over short periods of time most accesses
are to a small group of Mp locations), even relatively small caches have a majority of Pc accesses resulting in hits. The performance of a
cache is described by its miss ratio - the fraction
of all Pc references which result in misses.
CACHE ORGANIZATION

There are a number of possible cache organizational parameters. These include:
1. The size of the cache in terms of data storage.
2. The amount of data corresponding to each address in the AD pair.
3. The amount of data moved between Mp and the cache on a miss.
4. The form of address comparison used.
5. The replacement algorithm which decides which AD pair to replace after a miss.
6. The time at which Mp is updated on write accesses.

The most obvious form of cache organization
is fully associative, with the data portion of the
AD pair corresponding to the basic addressable
unit of memory (typically a byte or word), as
indicated by the system architecture. On a miss,
this basic unit is brought into the cache from
Mp. However, for several reasons, this is not
always the most attractive organization. First,
because procedures and data structures tend to
be sequential, it is often desirable, on a miss, to
bring a block of adjacent Mp words into the
cache. This effectively gives instruction and
data pre-fetching. Second, when associating a
larger amount of data with an address, the relative amount of the cache storage which is used
to store data is increased. The number of words
moved between Mp and the cache is termed the
block size. The block size is also typically the
size of the data in the AD pair* and is assumed
to be that for this discussion.
* In a few complex cache organizations, such as that used in the IBM 360/85, the size of the D portion of the AD pair (called a
sector in the 360/85) is larger than the block size. That potential will be ignored in this discussion.

In a fully associative cache, any AD pair can
be stored in any cache location. This implies
that, for a single hardware address comparator,
the Mp address must be compared serially
against the address portions of the AD pairs, which is too slow. Alternatively, there must be a
hardware comparator for each cache location, which is too expensive. An alternative form of
cache organization which allows for an intermediate number of comparators is termed set
associative.
A set associative cache consists of a number
of sets which are accessed by indexing rather
than by association. Each of the sets contains
one or more AD pairs (of which the data portion is a block). There are as many hardware
comparators as there are AD pairs in a set. The
understanding of the operation of a set associative cache is aided by Figure 1. The n-bit Mp
address is divided into three fields of l, i, and b
bits. Assume that there are 2^i sets. The i-bit index field selects one of these sets. The A portion
of each AD pair is compared against the l-bit
label field* of the Mp address. If there is a
match, the b-bit byte field selects the byte (or
other sub-unit) in the D portion of the matched
AD pair.
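
The division of the Mp address into label, index, and byte fields amounts to a pair of shifts and masks. The Python sketch below shows the arithmetic; the function name `split_address` and the field widths in the example (i = 8, b = 2) are chosen only for illustration and are not taken from any particular machine.

```python
def split_address(address, index_bits, byte_bits):
    """Split an Mp address into (label, index, byte) fields.

    The low `byte_bits` bits select the byte (or other sub-unit) within a
    block, the next `index_bits` bits select the set, and the remaining
    high-order bits form the label compared against the stored A portions.
    """
    byte = address & ((1 << byte_bits) - 1)
    index = (address >> byte_bits) & ((1 << index_bits) - 1)
    label = address >> (byte_bits + index_bits)
    return label, index, byte

if __name__ == "__main__":
    label, index, byte = split_address(0xBEEF, index_bits=8, byte_bits=2)
    print(hex(label), hex(index), byte)
```

Because the index is implied by where an AD pair is stored, only the label needs to be kept alongside the data.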

Figure 1. Address fields for a set associative cache. [Diagram: the n-bit Mp address divided into LABEL, INDEX, and BYTE fields of l, i, and b bits.]

If there is no match, Mp is accessed and (generally) a new AD pair is moved into the cache.
Which of the AD pairs in the set is to be replaced
is selected by the replacement algorithm. Typical replacement algorithms are: first in, first out
(FIFO); least recently used (LRU); or random
(RAND).
There are two limiting cases of the set associative organization. When the number of sets is
the cache size in blocks, only a single hardware
comparator is needed. The resulting organization is called direct mapped. It is the simplest
form of cache organization. When there is only
one set, clearly a fully associative cache results.
So far in the discussion there has been no distinction made between read and write accesses.
When the Pc makes a write access, ultimately
Mp must be updated. There are two obvious
times when this can be done. First is at the time
the write access is made. Both Mp and the cache
(if there is a hit) are updated simultaneously.
This strategy is termed write-through. Alternatively, only the cache can be updated on a write
hit, and only when the updated AD pair is replaced on some future miss is Mp updated. This
strategy is termed write-back. The choice between these two strategies involves systems considerations which are beyond the scope of this
paper.†
There are other possible asymmetries in the
handling of reads and writes. One possibility is
that after a write miss an AD pair corresponding to that access is not stored in the cache. This
is termed no-write-allocate. The alternative is,
of course, termed write-allocate.

CACHE MEMORY SIMULATION

The understanding of memory hierarchies
(and programs) has not reached the point where
cache performance can be predicted analytically
as a function of cache organizational parameters. As a consequence, the studying of cache
memory behavior is done through simulation.
(Some cache simulation results for other computer architectures are reported in [Conti et al.,
1968; Meade, 1970; Bell et al., 1974; Gibson,
1967].) For the purposes of this study, a two
part simulator was constructed.
The first part was a PDP-11 simulator. This is
a PDP-11 program which runs other PDP-11
programs interpretively. A variety of properties
of the interpreted programs can be collected, including the sequence of generated Mp addresses. The latter is termed an address trace.
The address trace is processed by the second
part, the cache simulator. This is parameterized
by cache organization and determines the miss
ratio for a given address trace.

* Note that, in a set associative cache, only the label field must be stored in the cache AD pair, not the entire Mp address.
† For the PDP-11/70 system, write-through was chosen. The main impact of this is that each write access, as well as each read
miss, results in an Mp access. Data suggests that, in PDP-11s, about 10 percent of Pc accesses are writes.
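
A trace-driven cache simulator of the kind described can be quite small. The Python sketch below is not the simulator used in the study; it is a minimal illustration, with hypothetical names and default parameter values, of how cache size, block size, set size, replacement algorithm, and write allocation can parameterize a miss-ratio calculation over an address trace. Addresses are word addresses, and the timing of Mp updates (write-through versus write-back) is not modeled because it does not affect the miss ratio here.

```python
import random
from collections import OrderedDict

def miss_ratio(trace, cache_words=1024, block_words=2, set_size=2,
               policy="LRU", write_allocate=True, seed=0):
    """Return misses / accesses for a trace of (word_address, is_write) pairs."""
    n_sets = cache_words // (block_words * set_size)
    # One ordered mapping per set: label -> None, oldest entry first.
    sets = [OrderedDict() for _ in range(n_sets)]
    rng = random.Random(seed)
    accesses = misses = 0
    for address, is_write in trace:
        accesses += 1
        block = address // block_words
        index, label = block % n_sets, block // n_sets
        entries = sets[index]
        if label in entries:                     # hit
            if policy == "LRU":
                entries.move_to_end(label)       # now the most recently used
            continue
        misses += 1
        if is_write and not write_allocate:
            continue                             # no-write-allocate: nothing loaded
        if len(entries) >= set_size:             # set full: choose a victim
            if policy == "RAND":
                victim = rng.choice(list(entries))
            else:                                # FIFO or LRU: oldest entry
                victim = next(iter(entries))
            del entries[victim]
        entries[label] = None
    return misses / accesses if accesses else 0.0

if __name__ == "__main__":
    # Synthetic trace: two passes over 512 consecutive words, all reads.
    trace = [(a, False) for a in range(512)] * 2
    for size in (256, 512, 1024):
        print(size, miss_ratio(trace, cache_words=size, set_size=1))
```

Setting `set_size=1` gives the direct mapped case, and making `set_size` equal to the number of blocks gives the fully associative case.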


CACHE SIMULATION RESULTS

Since the performance of cache memory is a
function not only of cache organization parameters but also of the program run, it is desirable
to run cache simulations with a wide variety of
programs. Multiplying these by a wide variety
of a cache's organizational parameters to be
simulated resulted in a considerable amount of
simulation data of which only the highlights are
reported here.
The first experiment was to determine the approximate overall size of the cache memory.
Plots of the miss ratio against cache size for several programs* are given in Figure 2. (All sizes
in both the figures and the discussion are in 16-bit
PDP-11 words.) A block size of two and a set
size of one were held constant. In general, the
miss ratio falls rapidly for caches up to 1024
words and falls less rapidly thereafter.
Figure 3 depicts the effect of set size (associativity) on cache performance. In order to clarify the results, Figures 3 through 6 only contain
simulation data for a single program (the
Macro assembler) which had the highest miss
ratio in Figure 2. As expected, a larger set size
reduces the miss ratio. The largest improvement
occurs in going from set size one to set size two.
Although not shown, even going to a fully associative cache has little further effect on the miss
ratio.

Figure 2. Effect of cache size on miss ratio. [Plot: miss ratio (0.1 to 0.4) against cache size (256, 512, 1024, 2048 words) for block size = 2 and set size = 1, with one curve per program, including the Macro assembler, the FORTRAN compiler, and FORTRAN executions (GAUSS and FFT).]

* These programs are system and user programs running under the PDP-11 DOS operating system. They include a Macro
assembler, FORTRAN compiler, PIP (a file utility program), and FORTRAN executions of numerical applications. The
range of miss ratios is typical for the much wider group of programs actually simulated. Indeed, the miss ratio for the Macro
assembler for a given cache size was the worst of any program simulated.

Figure 5. Effect of replacement algorithm and write allocation on miss ratio. [Plot: miss ratio for the FIFO, RAND, and LRU replacement algorithms.]

In Figure 4, the impact of block size is shown.
Especially in smaller caches, going to a larger
block significantly reduces the miss ratio. This
is a result of a smaller cache depending more on
the pre-fetching effect for its performance.
The effect of write allocation and replacement algorithm is given in Figure 5. For the
program considered, there is a negligible performance difference across the different strategies.
In Figure 6, the effect of periodically clearing
the cache is depicted. This approximates the effect on the cache of rapid context switching in
that, when a new program is brought in, the
cache appears "clear" to it. Even completely
clearing the cache every 300 Pc accesses only
degrades the miss ratio to 0.3. This represents a
worst case condition that would be unrealized
in practice. For example, the "new program"
brought in every 300 Pc references might be an
interrupt handler. Any program running that
often would typically find that the cache always
contained information relevant to it. Indeed,
for the cache organization given, it is impossible
in 300 accesses to significantly clear a 1024-word cache.

Figure 6. Effect of clear interval on
miss ratio.
CONCLUSIONS

The performance goals of the PDP-11/70
computer system required the typical miss ratio
to be 0.1 or less. Analysis of the preceding data,
with emphasis on the breaks in the curves, suggested that the optimal organization was a
cache size of 1024 words, block size of two
words, and a set size of two. Because the data
suggests that the replacement algorithm and
write allocation strategies have negligible effect,
a no-write-allocate strategy and a random replacement algorithm were selected.
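
The selected organization can be sketched as a small simulation. The following is a minimal illustration in Python, not the simulator used for the study: the class name, its methods, and the choice to count only read misses are assumptions made for this example, and the address trace is a simple sequential pass rather than a real program trace.

```python
import random


class SetAssociativeCache:
    """Toy cache model: set associative, random replacement, no write allocation.

    Defaults follow the organization named in the text (1024 words,
    two-word blocks, set size of two). Everything else is illustrative.
    """

    def __init__(self, cache_words=1024, block_words=2, set_size=2, seed=0):
        self.block_words = block_words
        self.set_size = set_size
        self.num_sets = cache_words // (block_words * set_size)
        # Each set holds the tags of the blocks currently resident in it.
        self.sets = [[] for _ in range(self.num_sets)]
        self.rng = random.Random(seed)
        self.reads = 0
        self.read_misses = 0

    def _locate(self, word_address):
        block = word_address // self.block_words
        return self.sets[block % self.num_sets], block // self.num_sets

    def read(self, word_address):
        """Return True on a hit; on a miss, load the whole block."""
        tags, tag = self._locate(word_address)
        self.reads += 1
        if tag in tags:
            return True
        self.read_misses += 1
        if len(tags) < self.set_size:
            tags.append(tag)
        else:
            # Random replacement: overwrite one resident block at random.
            tags[self.rng.randrange(self.set_size)] = tag
        return False

    def write(self, word_address):
        """No write allocation: a write miss does not load the block."""
        tags, tag = self._locate(word_address)
        return tag in tags

    def miss_ratio(self):
        return self.read_misses / self.reads if self.reads else 0.0


cache = SetAssociativeCache()
for address in range(512):  # one sequential pass over 512 words
    cache.read(address)
sequential_miss_ratio = cache.miss_ratio()
```

On this sequential pass, every first word of a two-word block misses and every second word hits, which is the pre-fetching effect of the block size discussed above.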

Buses, The Skeleton of
Computer Structures
JOHN V. LEVY

INTRODUCTION

A bus is a communication pathway connecting two or more electrical devices. In the
context of minicomputer design, buses are the
physical and electrical structures that determine
how the building blocks are interconnected.
In every computer system, there are many
buses: internal pathways connect the registers
and arithmetic logic of a central processor; input/output pathways connect processors, memories, and peripheral devices; and external
communication buses attach computer systems
to the telephone and other data communication
pathways. In this chapter, the discussion is restricted to buses that interconnect computer
system components that are designed by different engineering groups.
This particular approach may sound out of
place, but one of the most important functions
of a bus is to provide a well specified interface
between complex subsystems. We exclude from
discussion internal processor register transfer
buses, as well as external buses whose specifications are determined by engineers not involved
in the minicomputer design process. Although
none of the examples in this chapter is drawn
from multiprocessor systems, most of the design experience presented is relevant to such
systems.
What Does a Bus Do?

A bus is a communication medium. Each one
exists in order to transfer information from
place to place within a computer system. In this
chapter, we attempt to illustrate the complexities of bus design by drawing on the real
history of some PDP-11 Family designs.* In
computer systems being manufactured and
sold, the success of bus designs is measured by
the following criteria:

1. Does the bus successfully establish the
   communication pathway required?
2. Is the bus well specified (and well documented), so that a series of interfaces designed either concurrently or over a
   period of time by different engineers will
   in fact be compatible?
3. Does the bus avoid imposing unnecessarily strict performance constraints on
   the system?
4. Is the cost of the bus and its connections
   commensurate with the computer system
   and the bus' role in it?
5. Does the bus design anticipate expansion of the system in the future (without
   excessive cost)?
6. Can the bus be manufactured and tested
   in high volume production without excessive hand-crafting or tuning?

* All of the real buses presented as examples are proprietary products of Digital Equipment Corporation, protected by
United States and foreign patents.

Beyond the scope of this chapter are some additional functions of buses, such as providing a
means to diagnose and repair the system components connected to it and to allow measurement of system loads and performance.
Why Buses Are Important

As the above list of criteria suggests, there are
many ways in which poor bus design can spoil
the performance or cost/performance ratio of
an otherwise well designed computer system.
Failure to anticipate future expansion of a computer system is a common problem in bus designs. The PDP-11 Unibus, a very successful
bus, first became inadequate as the main interconnection pathway when processor and memory speeds surpassed the bandwidth capability
of the Unibus. Later, the Unibus 18-bit memory
address width became a limitation.
Computer design is driven by advances in
semiconductor technology. Every time the cost
of the components of a computer subsystem decreases by, say, 50 percent, the subsystem is
redesigned to take advantage of the lower cost.
At present, the performance/cost (or storage
capacity/cost) ratio for logic and memory is increasing at a rate of up to 100 percent per year.

But the bandwidth/cost and other performance
ratios of interconnections are steady or decreasing slightly. As a result, bus designs tend to persist in time across several redesigns of the other
computer system components. This justifies the
extensive engineering effort required in the initial design of a bus.
How Buses Are Designed

To design a bus, the engineer must first find
out what system components are to be interconnected. Then, studying the requirements of
communications between these components,
the engineer chooses a structure. Finally, the
cost constraints and available technologies lead
to a choice of implementation.
The five-function model given below is not a
set of bus designs but a functional model that
results from taking the commonly used minicomputer building blocks and asking: What
communications need to occur between this
component and each other component? The
model shows the five types of communications
which were the answers to that question. The
five functional pathways are the maximum
number of interconnections that would be useful in a conventional single-processor minicomputer. Real bus designs combine these
functions in cost-effective implementations.
After choosing the structure and functions of
buses, the engineer must write a specification.
This is crucial to the success of bus design if it is
to be interfaced by a number of different engineers. As an example of the detail that can go
into a bus specification, Figures 1, 2, and 3
show how the Massbus Data Read operation
has been specified in a DEC internal engineering document.
After writing a specification, the engineer
builds a prototype and tests it. If other engineers concurrently build interfaces to the bus,
discrepancies, errors, and misunderstandings
will be uncovered sooner. Finally, it is important that the specification be maintained, updating it to conform to the latest known design
constraints. A very useful appendix to a bus
specification is a list of the design problems that
came up during the engineering of connections
to it and the details of how they were resolved.
This was done for the Massbus, in a section of
the specification called "Design Notes."

DATA BUS READ SEQUENCE

1. A read command is loaded into the Control register of the drive. If the
   command is valid, the drive enables its data bus receivers and drivers
   and asserts OCC.
2. Not more than 100 microseconds after step 1, the controller asserts
   RUN.
3. After a cable delay, the drive receives the RUN assertion. Disk drives
   now begin searching for the desired sector. Tape drives begin tape motion.
4. When the drive has read the first data word, it generates parity for the
   word; the data and DPA are gated onto the data lines and SCLK is
   asserted.
5. After a cable delay, the controller receives the SCLK assertion.
6. The drive negates SCLK no less than T nanoseconds after asserting it,
   where T is either 225 nanoseconds or 30 percent of the nominal burst
   data period of the drive, whichever is greater. The Data lines should be
   maintained valid for no less than one half of the SCLK interval after
   SCLK is negated.
7. After a cable delay, the controller receives the SCLK negation. The
   controller strobes the D lines and DPA and checks parity.
8. If there is more data to be read in this block, then not less than T
   nanoseconds after step 6, the drive gates out the next data word onto
   the D lines, generates DPA, and asserts SCLK. Steps 5, 6, and 7 then
   follow.
9. After the negation of SCLK (step 6) on the last word of data in the
   block, the drive asserts EBL.
10. After a cable delay, the controller receives the EBL assertion. At this
    time, the controller must decide whether or not to have the drive read
    the next block of data without disconnecting from the data bus (the
    controller may already have negated the RUN line).
11. If the controller decides not to read the next block, it negates the RUN
    line not later than 500 nanoseconds after step 10.
12. After a cable delay, the drive receives the RUN negation (the RUN line
    may already have been negated).
13. Not less than 1500 nanoseconds after step 9, the drive negates EBL. At
    this time the drive strobes the RUN line. If RUN has been negated, the
    drive disconnects from the data bus (the DRY bit should be set and
    OCC negated at this time).
14. After a cable delay, the controller receives the EBL negation (the controller may now generate an end-of-transfer interrupt and start another
    data transfer).

Figure 1. The Massbus Data Read operation as
described in the Massbus specification.
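
The timing rules quoted for the Massbus Data Read (the minimum SCLK asserted time T, and the minimum spacing between SCLK assertions) lend themselves to a mechanical check. The sketch below is only an illustration in Python; the function names and the representation of a pulse train as (assert time, negate time) pairs in nanoseconds are assumptions made for this example and are not taken from the Massbus specification.

```python
def sclk_min_assert_ns(burst_period_ns):
    """T: 225 ns or 30 percent of the burst data period P, whichever is greater."""
    return max(225.0, 3 * burst_period_ns / 10)


def sclk_min_repeat_ns(burst_period_ns):
    """Minimum time from one SCLK assertion to the next: 500 ns or P."""
    return max(500.0, float(burst_period_ns))


def sclk_train_ok(pulses_ns, burst_period_ns):
    """Check a list of (asserted, negated) times against the quoted minimums."""
    t_min = sclk_min_assert_ns(burst_period_ns)
    repeat_min = sclk_min_repeat_ns(burst_period_ns)
    previous = None
    for asserted, negated in pulses_ns:
        # SCLK must stay asserted for at least T.
        if negated - asserted < t_min:
            return False
        if previous is not None:
            prev_asserted, prev_negated = previous
            # The next word is gated out not less than T after the negation.
            if asserted - prev_negated < t_min:
                return False
            # Assertion-to-assertion spacing must respect the minimum period.
            if asserted - prev_asserted < repeat_min:
                return False
        previous = (asserted, negated)
    return True
```

For a hypothetical drive with P = 1000 ns, `sclk_min_assert_ns(1000)` gives 300 ns, so a pulse asserted for only 100 ns would fail the check.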

Figure 2. The Data Read flowchart in the
Massbus specification. (T = 225 ns or 0.3P,
whichever is greater; P = nominal burst data
period of drive. NOTE: Minimum time from one
assertion of SCLK to the next is either 500 ns
or P, whichever is greater; maximum unspecified.)

FUNCTIONS OF BUSES IN COMPUTER
SYSTEMS: A FIVE-FUNCTION MODEL

The functional building blocks of computers
are central processing units, primary memory,
input/output controllers, and peripheral units.
Peripherals tend to be classed as either secondary memory or transducers (usually terminals).
Figure 4 shows these components in a traditional single-processor minicomputer system.
Five different paths are shown interconnecting
these components. These paths do not represent
actual buses. Instead, we have considered each
pair of components in the system and asked
whether they need to communicate with each
other. If so, a pathway between the pair has
been inserted. This leads to a model which has
more interconnection pathways than a typical
computer has.

Figure 3. The timing diagram of a Data Read in the Massbus specification.
(Signals shown: RUN, EBL, SCLK, OCC, D (0:17), and DPA, with data words 1 through 4.
(C) = at the controller; (D) = at the drive; (T) = transmitting; (R) = receiving.
U = unspecified maximum; T = 225 ns or 30% of P, whichever is greater; P = nominal burst data period of drive.
Notes: 1. 100 µs max. 2. 200 µs max.)

Figure 4. A five-function model of computer buses.
(Legend: A = address per word; B = block transfer; C = control; D = device; E = external.)

Table 1. Requirements for the Five Pathway Types

Requirement         A                  B                   C                 D                      E
                    CPU-Memory         Controller-Memory   CPU-Controller    Controller-Peripheral  Controller-External
Memory Address      Large: 2^22 (one   Large: 2^22 (one    None              None                   None
                    address per word)  address per block)
Maximum Number of   Small: 2^4         Small: 2^4          Medium: 2^6       Small-large: 2^8       Small-large: 2^8
Connections
Latency Tolerance   Low (0.5 µs)       High (50 µs)        Medium (5 µs)     Medium-high            Medium-high
Bandwidth           High (5 Mbytes/s)  Medium              Low               Low-high               Low-high
                                       (1.2 Mbytes/s)      (0.1 Mbytes/s)
Length              Short (3 meters)   Medium (30 meters)  Medium            Medium-long            Medium-long
                                                           (30 meters)       (to 300 meters)

In real computer systems, the functions of these
pathways are combined into multifunction
buses in order to get economical designs.
There are five types of interconnection shown
in Figure 4, labeled A, B, C, D, and E. These
labels have the mnemonic value given in the figure legend.
Pathway A, connecting the central processor
(CPU) with the memory, is used to transfer instructions and data. This pathway is distinguished by requiring one address per word.
Pathway B connects one or more mass storage and communication controllers to the
primary memory. It is distinguished by being a
block transfer medium. Only one memory address per block transfer is needed because the
data is stored in consecutive memory locations.
Pathway C is the control pathway. I/O commands are sent over this path from the CPU to
the I/O controllers, and status information is
returned from the controllers. I/O controllers
can also cause an interruption to the CPU over
this path. Small amounts of data are sometimes
transferred over this path, for example, characters moved to and from a console terminal.
Pathways labeled D connect I/O controllers
with their peripheral devices. In Figure 4, Pathway D1 represents a disk connection and D2 a
multiple terminal connection path. The terminal interconnection does not normally transfer
blocks of data. Both D1 and D2 carry control
information as well as data.
Finally, pathway E represents a connection
to external communication lines. Usually, the
computer designer does not have control over
the specification of such external pathways.
Five key parameters or requirements for
these pathways affect cost and performance and
are often traded against each other. Table 1
summarizes these requirements for the five
types of pathways.
Memory addressing means selecting a word or
block of words within the address space of the
memory subsystem. Memory address bits are
no different from data bits, from the standpoint
of the bus designer. Both must be transmitted
from one bus connection to another. However,
type A pathways must transmit one address per
word accessed, while type B pathways need only
send one address per block of words. This difference can be exploited to gain lower cost
buses in systems which implement separate
buses for the A and B path functions.
The maximum number of connections to a bus
tells us how many signals must be used to select
a destination for a data transfer on the bus.
Typically, a bus will carry some number, n, of
"select" signals, and therefore be able to deliver
data to as many as 2^n connections. On a type A
pathway, a CPU accesses connections which
contain memory. We do not typically need
more than four "select" signals, allowing up to
16 memory connections. In the case of multiprocessor shared-memory systems, it may be
necessary for some additional select codes to be
available to identify the processor that is the
destination for data from memory.
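
The relationship between select signals and connections is simple arithmetic, sketched here in Python. The function names are chosen for this example only.

```python
def max_connections(select_signals):
    """Number of destinations that n binary select signals can distinguish."""
    if select_signals < 0:
        raise ValueError("select_signals must be non-negative")
    return 2 ** select_signals


def select_signals_needed(connections):
    """Smallest n such that 2**n is at least the number of connections."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    # (k - 1).bit_length() equals ceil(log2(k)) for integers k >= 1.
    return (connections - 1).bit_length()
```

Four select signals give `max_connections(4) == 16`, the figure quoted above for memory connections; adding a seventeenth connection would require a fifth signal.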
Comments on Unibus Addressing

Transfers on the Unibus are not directed by the selection mechanism just
described. Instead, there is the single
concept of memory addresses. Each
data transfer (type A or type B) on the
Unibus is directed to or from a 1- or 2-byte section of memory. The memory
address is broadcast to all connections.
If one of the connections recognizes the
address as being one of its own, then it
participates in the data transfer. This
anonymity allows a very large number
of connections to be made to the
Unibus, with each connection implementing a locally determined number
of memory bytes.
For control transfers (type C), the
Unibus has a concept called the "I/O
page." A block of memory addresses
(the I/O page) is reserved for use in accessing control and status registers in
peripheral controllers and in the central processor. The uppermost 8,192
bytes of memory are never implemented in real memory. Instead, small
segments are assigned (by administrative procedures) to each I/O controller type. Each controller responds
to data transfers to and from addresses
within its assigned segment.
No fixed amount of address space
need be allocated to a given controller.
If two controllers of the same type are
connected to a Unibus, one of them is
assigned to a "floating" address segment, an area reserved for such conflict
resolution.
Unibus I/O controllers that perform
Direct Memory Access (DMA) do so
by making data transfers to memory at
addresses below the I/O page. Block
transfers are performed a word at a
time to or from successive memory addresses, with the incrementing address
being maintained by the I/O controller.
An I/O controller on the Unibus
causes an interruption by doing a special control transfer whose destination
is always the CPU. The interrupting
controller transmits an "interrupt vector" as the data. The address lines of
the Unibus are not used in this transfer.

Latency tolerance refers to how long a delay
(latency) a connection can tolerate, after it decides to make a data (word) transfer, until the
transfer is complete. Bandwidth refers to how
many data (word) transfers per second can be
made.
Latency is different from bandwidth: latency
refers to the interval, for any one data word
transfer, from the time it is initiated until it is
completed. Bandwidth is the repetition rate at
which the initiation and completion of word
transfers can be sustained over a given period of
time. In particular, peak bandwidth - the maximum possible repetition rate - is a parameter
which strongly affects the cost of a bus, and is
the bandwidth we refer to here.
Type A pathways require both low latency
and high bandwidth. The performance of a
CPU-memory system depends heavily on the
rate (bandwidth) at which words can be delivered to the central processor. Furthermore, the
CPU instruction execution and memory access
times are typically closely matched. Therefore,
the performance of the system is also very dependent on low latency in the CPU-memory
pathway. In this type of pathway, effective
bandwidth and latency are directly (inversely)
related to each other.
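
The inverse relationship between latency and bandwidth can be made concrete with a small calculation. The sketch below, in Python, assumes a pathway on which a fixed number of word transfers may be in progress at once; the function names, the 2-byte word, and the sample latency are illustrative choices, not figures for any particular bus.

```python
def peak_word_rate(latency_us, transfers_in_flight=1):
    """Peak word transfers per second.

    Each transfer takes `latency_us` microseconds from initiation to
    completion, and at most `transfers_in_flight` transfers overlap.
    """
    if latency_us <= 0 or transfers_in_flight < 1:
        raise ValueError("latency must be positive and at least one transfer allowed")
    return transfers_in_flight * 1_000_000 / latency_us


def peak_mbytes_per_second(latency_us, bytes_per_word=2, transfers_in_flight=1):
    """Peak bandwidth in Mbytes/s for the same model."""
    words = peak_word_rate(latency_us, transfers_in_flight)
    return words * bytes_per_word / 1_000_000


# With no overlap, halving the latency doubles the peak bandwidth.
non_overlapped = peak_mbytes_per_second(0.5)
overlapped = peak_mbytes_per_second(0.5, transfers_in_flight=2)
```

When transfers cannot overlap, bandwidth is simply the reciprocal of latency; a pathway that allows overlapped transfers can raise bandwidth without reducing the latency of any one transfer.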
On a type B pathway, high bandwidth is also
typically required. Usually, this is the path by
which disk and other mass storage data is
moved to and from memory. In most cases, the
rate at which data is transferred is determined
by the disk subsystem. In minicomputer systems developed through 1977, the bandwidth
required has not exceeded 1.2 megabytes per
second for an individual disk controller-to-memory pathway.
Type B pathways, on the other hand, tolerate
relatively long latencies. If there is sufficient
buffering of data at the controller, system performance is relatively insensitive to delays of as
much as 100 to 1000 microseconds in starting up
a block transfer. The insensitivity is due to the
dominance of relatively long delays already present in disk data accessing. (Mechanical positioning, both rotational and radial, may take
tens of milliseconds in a typical disk access.)
Type C pathways - the control and interruption links - do not require high bandwidth
compared with CPU instruction and DMA
data activity. I/O control commands are issued
relatively infrequently compared with the instruction execution rate in the CPU. Interruptions typically occur even less frequently.
However, latency tolerance is not very high on
the control pathway: it is important for interruptions to be delivered promptly, and CPU instructions that access I/O control and status
registers usually are prevented from completing
until the access has been completed. Therefore,
Table 1 shows latency tolerance as "medium"
(1 to 10 microseconds) for type C pathways: it is
permissible to take a little longer to complete an
I/O control instruction than other instructions,
but not so long as initiating a block transfer
from a disk.

Type D and E pathways handle interactions
which are a mixture of type B and type C.
Therefore, their requirements for latency and
bandwidth vary over the range shown for types
B and C.
Length refers to the maximum possible distance along the pathway from one connection
to another. Maximum length is important because it affects both performance and cost of a
bus. The CPU to memory pathway (type A) has
been shrinking in length in recent computer designs because of the relationship between latency and length. The speed of light (or, more
properly, of signals in a wire) sets the minimum
delay between request and response. As a result,
we see memories and central processors more
frequently packaged together or in very close
proximity. Fortunately, the continual size reduction of a given amount of CPU logic or
memory has encouraged this trend. The current
length range of a type A pathway for minicomputers is approximately 0.1 to 3 meters.
High speed block transfer I/O controllers
also tend to be packaged closer to the memory
in recent system designs. But since there may be
many controllers, the length of the type B pathway may have to be two to ten times longer
than the CPU-memory pathway (0.2 to 30 meters).

Design Tradeoffs

Control pathways connecting the central processor to all I/O controllers often have to be
extended out of the CPU-memory package to
reach peripheral subsystem packages. These
tend to be the longest pathways in a system.
Frequently, the design choice in connecting a
peripheral to a minicomputer system is between: (1) extending the main types B and C
buses out to reach the farthest peripherals and
(2) designing type D buses that extend from a
centrally packaged controller to a remote peripheral. Alternative (2) gives maximum flexibility and performance. But it costs more than
(1) and may lead to a proliferation of buses in
the computer system. Figure 5 shows the two
alternatives.
All parameters shown in Table 1 contribute
to cost. The cost of a computer system could be
allocated in a simple way to power, logic, memory, electromechanical parts, and package. As
applied to the cost of buses, these become
power, logic complexity, and cable/connector
costs.

Figure 5. A design tradeoff for types B and C
pathways.
(a) Types B and C pathways contained within the
"mainframe" package; longer type D paths.
(b) Single types B and C pathways, extending out of
the "mainframe" package; short type D paths.

Increasing memory addressing requirements
leads to more signals in the pathway. Each signal adds to power and cable costs. Lower bandwidth can be traded for wider memory
addresses by time-multiplexing the addresses
with data. Increasing the maximum number of
connections adds to the electrical load and leads
to increased power in the bus drivers or to lower
bandwidth, as it takes longer for signals to
settle. Also, more signals are required (logarithmically increasing with the number of connections) to select the destination of a transfer.
Increasing maximum length also requires more
bus drive power for a given signal level and increases the bus cost. Since longer buses have
greater propagation delays, we can trade lower
bandwidth and higher latency for increased
length. Both length and load (connections) contribute to signal decay, and therefore these two
are often traded against each other. For example, each section of a Unibus is rated for a
maximum length of 50 feet or a maximum of 20
bus "loads." Exceeding either limit requires insertion of a "bus repeater" circuit. A Unibus
with fewer loads could be operated at longer
lengths than the maximum 50 feet, but configuration rules with fixed limits are easier to
understand.
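
The Unibus configuration rule just quoted is easy to state as a check. The following Python sketch is an illustration only: the function names are invented for this example, and the count of sections is a lower bound that ignores any load or cable length contributed by the repeaters themselves.

```python
import math

MAX_SECTION_FEET = 50   # rated maximum length of one Unibus section
MAX_SECTION_LOADS = 20  # rated maximum number of bus loads per section


def section_within_limits(length_feet, loads):
    """True if a single section respects both fixed limits."""
    return length_feet <= MAX_SECTION_FEET and loads <= MAX_SECTION_LOADS


def min_sections(total_feet, total_loads):
    """Lower bound on the number of sections needed for a configuration.

    One bus repeater is inserted between each pair of adjacent sections.
    """
    by_length = math.ceil(total_feet / MAX_SECTION_FEET)
    by_loads = math.ceil(total_loads / MAX_SECTION_LOADS)
    return max(1, by_length, by_loads)
```

For example, a hypothetical configuration of 30 feet and 45 loads needs at least three sections (two repeaters) even though its length alone would fit in one.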
By accepting increased cost, some performance parameters can be enhanced as follows.
Decreased latency and increased bandwidth can
be achieved by using higher power driver and
receiver circuits (such as ECL) which have
lower propagation delays in their logic gates.
Bandwidth can be increased by providing more
buffering logic (complexity) at each connection.
For a given level of reliability, the data clocking
rate can be increased with either faster logic
(higher power) or more logic parallelism (complexity). More data transmission parallelism
would mean higher cable and connector costs.
Lower latency can sometimes be achieved by
distributing the task of arbitration among the
connections. More logic is then required at each
connection.

There are also considerations of physical and
electrical environment that affect costs. To
compensate for noisy environments, error detection and correction circuits may be added at
each connection, adding to the complexity. Or
shielded or twisted-pair cables may be included,
adding to the cost of the interfaces. For physically stressful environments, cable costs may
become dominant as the cables are armored,
strengthened, or given noncorrosive wrapping.
In general, we can trade reduced bandwidth for
increased immunity to electrical noise, since
most noise-induced errors can be overcome by
repetition and redundant signaling. (At this
tradeoff, bus design merges with applied communication theory.)
EVOLUTION OF THE HIGH
PERFORMANCE PDP-11 SYSTEMS

The Unibus, introduced with the PDP-11 in
1970, is a novel bus structure because it is a
single bus to which all system components are
attached. It can be extended indefinitely; moreover, memory modules need not operate synchronously with the rest of the system.
In this section the evolution of the high performance descendants of the PDP-11/20 is
traced, with emphasis on the development of
buses in response to design goals for each
model.
PDP-11/20

The Unibus design is integral to the PDP-11
architecture in the handling of interrupts (the
priority level of the central processor affects arbitration) and in the I/O page concept (control
registers appear as memory locations). But the
important aspect of Unibus design, as a bus, is
its support of modularity.
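
The I/O page concept, in which control registers appear as memory locations and each connection decides for itself whether a broadcast address is its own, can be sketched as follows. This Python fragment assumes the 18-bit Unibus address width and the 8,192-byte I/O page mentioned in this chapter; the `Connection` class and the device names, base addresses, and sizes in the usage lines are hypothetical.

```python
UNIBUS_ADDRESS_BITS = 18
IO_PAGE_BYTES = 8192
ADDRESS_SPACE_BYTES = 1 << UNIBUS_ADDRESS_BITS
IO_PAGE_BASE = ADDRESS_SPACE_BYTES - IO_PAGE_BYTES  # uppermost 8,192 bytes


def is_io_page(address):
    """True if the address falls in the region reserved for control registers."""
    return IO_PAGE_BASE <= address < ADDRESS_SPACE_BYTES


class Connection:
    """A bus connection that responds to a contiguous range of addresses."""

    def __init__(self, name, base, size_bytes):
        self.name = name
        self.base = base
        self.size_bytes = size_bytes

    def recognizes(self, address):
        return self.base <= address < self.base + self.size_bytes


def broadcast(address, connections):
    """Names of the connections that recognize a broadcast address."""
    return [c.name for c in connections if c.recognizes(address)]


# Hypothetical configuration: one memory module and one controller.
connections = [
    Connection("memory", 0, 32 * 1024),
    Connection("controller_x", IO_PAGE_BASE + 0o100, 8),
]
```

Here `broadcast(IO_PAGE_BASE + 0o102, connections)` selects only the hypothetical controller, and an address that no connection recognizes selects nothing.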
When the PDP-11/20 (Figure 6) was designed, it was natural to offer a bus that could
be interfaced to many types of equipment, including users' laboratory devices. Digital offered Unibus interfacing modules (such as the
DR11 series) which users of the PDP-11 could
easily adapt to their own equipment.
The standardization of interfacing was also a
deliberate attempt to prolong the service lives of
Digital's peripheral equipment. As new members of the PDP-11 family were introduced,
older peripherals could still be attached to the
Unibus without electrical modifications.
The asynchronous data transfer of the Unibus has allowed DEC to introduce a series of
memory subsystems with progressively increasing speeds without changing the Unibus timing
or data transfer protocol. In a single system,
various memory technologies can be intermixed.
Figure 6. The PDP-11/20 Unibus configuration.

PDP-11/45

The goal of the PDP-11/45 project (Figure 7)
was to design a very fast central processor to
match the speed of the 300-nanosecond semiconductor memory which was becoming available.
The PDP-11/45 design places the semiconductor memory in close proximity to the
CPU and provides a private type A path, the
Fastbus. This eliminates many of the access delays present when a Unibus was between the
CPU and memory. For compatibility, however,
it was necessary for the semiconductor memory
to be accessible to DMA transfers from outside
the CPU. Therefore, another Unibus was
brought out of the CPU cabinet.
With higher CPU speed came the need for
larger memory sizes. While the PDP-11/20 can
have up to 64 Kbytes of memory (less 8 Kbytes
reserved for the I/O page), the PDP-11/45 introduced a memory management unit (the
KT11) that allows addressing of up to 256
Kbytes. The Unibus design, with foresight, had
been implemented with two spare address lines,
allowing immediate use of the 18 bits of physical memory address from the PDP-11/45.
By 1973, the IBM 3330 disk technology (100
megabytes per spindle) had become available at
a cost attractive to minicomputer system users.
The Massbus was developed specifically to interface this and other high data rate devices
which were being planned. The RH11 controller connects the Massbus to the two Unibuses of PDP-11/45 systems as shown in Figure
7a. The upper Unibus, Unibus C, was to carry
the control and interruption (type C) transactions; the lower Unibus, Unibus B, was reserved
exclusively for DMA (type B) data transfers.
For this purpose, a special stand-alone Unibus
Arbitrator module was developed because Unibus B has no processor present to perform Unibus arbitration. (Note, however, that the BR
signals are not used on Unibus B, because there
is no CPU to be interrupted).
Unfortunately, the configuration shown in
Figure 7a could not be used for two reasons:

1. DMA transfers from the RH11 controller cannot reach memory modules attached to Unibus C if all block transfers
   are made on Unibus B. (The proposed
   solution of having the RH11 DMA port
   selected by program control was rejected
   because of the complexity of determining in software which memory is connected to which bus.)
2. DMA transfers from controllers on Unibus C cannot reach the semiconductor
   memory unit.

The second problem was fatal. The central
processor is capable of dealing with only one
I/O page, and that is on Unibus C. Therefore,
old DMA controllers had to be attached to
Unibus C. In fact, all controllers had to attach
to Unibus C, because that is the only interruption path. Since compatible use of old peripherals was essential to success of the family,
the PDP-11/45 was configured only as shown in
Figure 7b. Unibus B, when connected to Unibus C (with the separate arbitrator module removed) becomes part of the single Unibus
system.

Figure 7. PDP-11/45 configurations. (a) Proposed configuration. (b) Actual configuration.
PDP-11/70

By 1974, semiconductor memory costs had
become much lower. Therefore, a cache memory became a feasible cost/performance enhancement to the PDP-11/45 (Chapter 10).
Without great modification to the CPU logic, a
cache memory was added with a width of 32
bits - twice the word size of the PDP-11 (Figure
8). The cache effectively interfaced to the PDP-11/70 CPU over the same Fastbus that was present in the PDP-11/45.
In order to gain memory bandwidth for increases in both CPU and DMA performance, a
new memory bus was added, with a 32-bit wide
data path. Closely related to the memory bus
was a backplane interconnection, which can
carry 32 bits at a time to the RH70 controllers
(up to four of them). In Figure 8 the RH70-to-memory path is shown going through the cache
because of a look-aside feature of the cache
memory.
The Massbus had been designed to provide
very high block transfer bandwidth, while keeping the control registers accessible to the central
processor at all times. The successful splitting of
the type C path (the Unibus) from the type B
path (the backplane data path) in the PDP-11/70 matched well with the Massbus design
goals, and this match accounts in part for the
relatively long life of the PDP-11/70 system in
its marketplace.
The PDP-11/70 also required more memory
addressing capacity to balance its increased
speed. The KT11 memory management unit
was easily expanded to address 4 megabytes of
memory, and the RH70 controllers were designed to generate the required 22 bits of memory address directly.
Slower speed peripherals are still interfaced
to the Unibus. In doing DMA transfers from
them, it is necessary to transform the 18-bit address on the Unibus into a 22-bit main memory
address. To do this, a Unibus Map module is
inserted between the Unibus and the cache
memory. This path carries 16 data bits at a
time, and the bandwidth demands are relatively
low.
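
The address widths mentioned in this section, and the idea of transforming an 18-bit Unibus address into a 22-bit memory address, can be illustrated with a short Python sketch. The `ToyUnibusMap` class below is a hypothetical page-relocation table invented for this example; its page size and interface are assumptions and do not describe the register layout of the actual Unibus Map module.

```python
def address_space_bytes(address_bits):
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits


class ToyUnibusMap:
    """Hypothetical relocation of 18-bit bus addresses into a 22-bit space."""

    PAGE_BITS = 13  # assumed 8-Kbyte relocation pages (illustrative only)

    def __init__(self):
        self.page_base = {}  # bus page number -> 22-bit base address

    def map_page(self, bus_page, memory_base):
        if not 0 <= bus_page < (1 << (18 - self.PAGE_BITS)):
            raise ValueError("bus page out of range")
        if not 0 <= memory_base < (1 << 22):
            raise ValueError("memory base does not fit in 22 bits")
        self.page_base[bus_page] = memory_base

    def translate(self, bus_address):
        if not 0 <= bus_address < (1 << 18):
            raise ValueError("bus address does not fit in 18 bits")
        page = bus_address >> self.PAGE_BITS
        offset = bus_address & ((1 << self.PAGE_BITS) - 1)
        memory_address = self.page_base[page] + offset
        if memory_address >= (1 << 22):
            raise ValueError("translated address exceeds 22 bits")
        return memory_address


unibus_map = ToyUnibusMap()
unibus_map.map_page(1, 0o10000000)            # relocate bus page 1 upward
translated = unibus_map.translate(0o20100)    # an address within bus page 1
```

The helper confirms the sizes quoted in the text: 16 address bits give 64 Kbytes, 18 bits give 256 Kbytes, and 22 bits give 4 megabytes.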

Figure 8. The PDP-11/70 configuration.

Figure 9. The VAX-11/780 organization, based on the
SBI.

VAX-11/780

The VAX-11/780 (Figure 9), which emerged in late
1977, returns to a single central bus organization, based on the Synchronous Backplane Interconnect (SBI).
The SBI was originally conceived in 1974 for
use on a PDP-11 processor and was later
planned for use on a PDP-10 processor. Those
processors were not released, but the SBI was
carried into the VAX-11/780 design and tailored for the 32-bit environment.*

*The VAX-11/780 SBI is the subject of a patent application filed by Digital Equipment Corporation.


High DMA bandwidth is obtained by the short SBI
time-slot and by splitting the memory read operation,
which releases the bus during the memory read-access delay. To help overcome the delay associated with having to do a full bus
transaction to start a memory read cycle, the
memory control logic is capable of receiving
and storing a queue of up to 4 memory read and
write requests while it is working on one of the
requests.
Compatibility with existing PDP-11 peripherals is provided by controllers that adapt the SBI
to a Unibus (the Unibus Adaptor (UBA) in Figure 9) and to several Massbuses (MBA).
On the SBI, the 1-gigabyte address space is
divided in half with the Unibus I/O page concept extended to cover the upper half. Within
this rather large address space are contained
control registers for all peripherals, an 18-bit
memory address space mapped onto the Unibus, and a number of internal status and control registers, such as those that contain error-reporting information.
Figure 10 shows a historical summary of the
buses used in the PDP-11 computers.
ARBITRATION METHODS

Since data transfer requests on a bus can
originate from more than one source, there
must be a means of deciding which source is to
use the bus next. This process is called arbitration.
A connection follows a two-step procedure to
transfer data on a bus:
1. Arbitration. Obtain the use of the bus.
2. Data Transfer. Transfer data on the bus.

To assist our examination of arbitration
methods, we define twelve categories, using
three discriminating criteria. The criteria are:
1. Where? Location of the arbitration logic (Centralized or Distributed).
2. How? Allocation rules (Priority, Democratic, or Sequential).
3. When? Timing relationship of arbitration to data transfer (Fixed or Variable).

Figure 10. Genealogy of PDP-11 Family buses.

Centralized arbitration means that a signal
must pass from a requesting connection to a
common arbitration point, and a response signal must return to the requesting connection before it may transfer data. In distributed
arbitration there is no single common arbitration point. The Unibus, for example, has
centralized arbitration (with the exception
noted below). A contention-arbitrated serial
bus, like the Ethernet [Metcalfe and Boggs,
1975], has distributed arbitration. The resolution of conflicting requests is accomplished in
all arbitration methods by allocation rules. Priority arbitration means that in case of an apparent tie in requests for the data transfer facilities, the rules always let one connection (or group) go ahead of another connection (or group). Democratic allocation
means that there are no priority rules. An apparent tie is resolved arbitrarily or by some
"fairness" rule which attempts to keep any one
connection from monopolizing use of the data
transfer facilities. Sequential allocation ensures
that there are never any apparent ties by giving
request opportunities to only one connection at
a time. (The sequence is not necessarily round-robin.)
The Unibus has priority allocation, by
groups. Most contention-arbitrated serial buses
have democratic allocation. Centralized, sequential (polled) buses are frequently used as
type D pathways to connect character terminals
to a concentrator (see Example 4, to follow).
Finally, there is the question of the timing
relationship between the arbitration of a
request and the data transfer that occurs as a
result of the request. Arbitration fixed with respect to data transfer means that a connection
must request the data transfer facilities at a
fixed time relative to the data transfer. This category includes buses in which the same signal
lines are used for data transfer and for arbitration.
Arbitration variable with respect to data
transfer means that a connection may request
use of the data transfer facilities at any time,
independent of the current state of the data
transfer facilities.
The Unibus has variable arbitration. Polled
buses have fixed arbitration because data transfer always occurs in the time slot immediately
after the arbitration logic has polled a requesting connection. Contention-arbitrated serial
buses have fixed arbitration, too, in that the
data transfer is the request for use of the bus.
Table 2 summarizes the categories of arbitration methods; descriptions of five example
buses follow.
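The three criteria can be read as a small product of enumerations. The sketch below is only an illustration: the type names Where, How, and When, the EXAMPLES mapping, and the describe helper are invented here. It enumerates the twelve categories and records only the classifications stated in the text so far.

```python
from enum import Enum
from itertools import product


class Where(Enum):
    CENTRALIZED = "Centralized"
    DISTRIBUTED = "Distributed"


class How(Enum):
    PRIORITY = "Priority"
    DEMOCRATIC = "Democratic"
    SEQUENTIAL = "Sequential"


class When(Enum):
    FIXED = "Fixed"
    VARIABLE = "Variable"


# Every combination of the three criteria is one category: 2 x 3 x 2 = 12.
CATEGORIES = list(product(Where, How, When))

# Classifications taken from the preceding paragraphs (names are illustrative).
EXAMPLES = {
    "Unibus": (Where.CENTRALIZED, How.PRIORITY, When.VARIABLE),
    "Contention-arbitrated serial bus": (Where.DISTRIBUTED, How.DEMOCRATIC, When.FIXED),
    "Polled bus": (Where.CENTRALIZED, How.SEQUENTIAL, When.FIXED),
}


def describe(name: str) -> str:
    """Return a one-line description of a bus's arbitration category."""
    where, how, when = EXAMPLES[name]
    return f"{name}: {where.value}, {how.value}, {when.value}"


if __name__ == "__main__":
    print(len(CATEGORIES), "categories")
    for bus in EXAMPLES:
        print(describe(bus))
```

The Unibus entry records only its primary allocation rule; its sequential aspect is discussed in Example 1.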
Example 1: Unibus

Figure 11 shows a simplified diagram of the
Unibus arbitration section with two controllers
sharing a Bus Request (BR) line. When Controller 1 wants to use the bus for an interruption
transaction, it asserts the shared BR signal line.
When the processor is in a state capable of receiving an interruption, the arbitrator asserts
the Bus Grant (BG) signal.
The arbitration logic of Controller 1 is shown
in Figure 12. The timing of an arbitration sequence is shown in Figure 13. Controller 1 receives the assertion of BG and may make a data
transfer as soon as the ongoing data transfer is
complete. Controller 1 acknowledges its selection by asserting the Selection Acknowledge
(SACK) signal. Controller 1 can use any BG assertion that arrives after the controller has asserted BR to perform an interruption
transaction. The serial wiring of BG could be
called a kind of priority arbitration, but it is
preferable to think of it as a sequential type of
allocation, in which the sequence begins on demand and always starts at the controller closest
to the processor and arbitrator.
The Unibus actually has four groups of controllers, each group connected to a Bus Request
line (called BR4, BR5, BR6, or BR7) and wired
as shown in Figure 11. In addition, every controller capable of doing DMA data transactions
is connected into a fifth group called Non-Processor Request (NPR) for data. All five groups
share a common SACK line.
Memory modules do not participate in arbitration on the Unibus since they never initiate
data transfers.
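The serial wiring of BG within one group can be sketched as a scan that starts at the controller closest to the arbitrator. This is a minimal model, not the Unibus logic itself: the function name pass_grant and the list-of-booleans representation are assumptions made for illustration.

```python
from typing import List, Optional


def pass_grant(requesting: List[bool]) -> Optional[int]:
    """Pass a Bus Grant down a serially wired group of controllers.

    requesting[i] is True if controller i has asserted the shared BR line;
    index 0 is the controller closest to the processor and arbitrator.
    A controller that has asserted BR keeps the grant (it would then assert
    SACK); any other controller passes BG on to its neighbor.
    Returns the index of the selected controller, or None if the grant
    reaches the end of the chain unused.
    """
    for position, wants_bus in enumerate(requesting):
        if wants_bus:
            return position
    return None


if __name__ == "__main__":
    # Controllers 1 and 2 both request; the one nearer the arbitrator wins.
    print(pass_grant([False, True, True]))
```

In this model the position on the chain alone decides between simultaneous requests in a group, which is why the text prefers to call the scheme sequential.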


Table 2. The Twelve Categories of Arbitration Methods

Arbitration Category       Fixed with Respect          Variable with Respect
                           to Data Transfer            to Data Transfer

Central, Priority          SBI                         Unibus, LSI-11 Bus (plus some aspects
                                                       of distributed, sequential below)
Central, Democratic        -                           -
Central, Sequential        Polled Character-Input      -
Distributed, Priority      -                           -
Distributed, Democratic    -                           -
Distributed, Sequential    -                           -

NOTE
The Massbus has no arbitration at all, because all control
transfers originate from one point.

Figure 11. A simplified diagram of the Unibus arbitration section.

Figure 12. The arbitration logic of a controller attached to the Unibus.

Figure 13. The timing of a Unibus arbitration sequence.


In the most general case, a single controller
on a Unibus can participate in three types of
transactions:
1. As the target of a control data transfer (type C), the controller behaves as if it were a memory. It receives commands (as data writes) into control registers and transmits status (as data reads) from status registers this way. The controller does not request the bus for these transactions: it is the "slave" of the processor which obtained the bus for this purpose.
2. As the originator of a DMA, type B data transfer, the controller moves data to or from memory. To obtain the bus for this purpose, it asserts the shared NPR line, and waits for a Non-Processor Grant (NPG) signal to be passed to it from the left.
3. As an interruption source (type C), the controller sends an interrupt vector to the processor. To obtain the bus for this purpose, the controller asserts one of the four BR lines (BR4, BR5, BR6, or BR7), and waits for the corresponding BG signal (BG4, BG5, BG6, or BG7) to be passed to it from the Arbitrator. Each controller is assigned a single BR level at the time of its installation in the system. Thereafter, it never blocks any of the other three BG signals.

Some controllers, such as simple terminal interfaces, do no DMA transfers, but perform an
interruption transaction for each character of
input or output.
The priority arbitration of the Unibus is affected directly by the priority state of the CPU.
The CPU program execution priority (PRI)
varies from 0 to 7. The Unibus Arbitrator
grants use of the bus to non-CPU connections
by the following rules:
1. At any time, when assertion of NPR is received, assert NPG. (Interpretation: a controller may do DMA data transfers at any time.)
2. Whenever the CPU is between instructions (i.e., is interruptible), then:
a. If PRI < 7 and BR7 is asserted, then assert BG7, else
b. If PRI < 6 and BR6 is asserted, then assert BG6, else
c. If PRI < 5 and BR5 is asserted, then assert BG5, else
d. If PRI < 4 and BR4 is asserted, then assert BG4.
(Interpretation: when the CPU is interruptible, it will accept interruptions
from a controller in a group whose priority is greater than the current program
execution priority of the CPU.)
The priority arbitration rules of the Unibus
involve both the processor priority and the relative priorities of the BR signals among themselves. Assertion of BR7, for example, blocks
the grant signals BG6, BG5, and BG4 until all
controllers asserting BR7 have accomplished
their interruption transactions. Therefore, we
classify the Unibus arbitration method as centralized and variable, with a mixture of priority
and sequential allocation rules.
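The arbitrator's grant rules can be sketched as a single decision function. This is a simplified model under stated assumptions: the name unibus_arbitrate and its parameters are invented here, and timing, SACK, and the serial passing of grants within a group are left out.

```python
from typing import Iterable, Optional, Tuple


def unibus_arbitrate(
    pri: int,
    br_asserted: Iterable[int],
    npr_asserted: bool,
    cpu_between_instructions: bool,
) -> Tuple[bool, Optional[int]]:
    """Decide which grants the arbitrator asserts.

    pri                      -- CPU program execution priority, 0 to 7
    br_asserted              -- BR levels (4 to 7) currently asserted
    npr_asserted             -- True if the shared NPR line is asserted
    cpu_between_instructions -- True if the CPU can accept an interruption

    Returns (npg, bg_level): whether NPG is asserted, and the BG level
    asserted (7, 6, 5, or 4) or None.
    """
    # Rule 1: NPR is granted at any time.
    npg = npr_asserted

    # Rule 2: BR levels are examined from highest to lowest, and only a
    # level above the current CPU priority can be granted.
    bg_level = None
    if cpu_between_instructions:
        requests = set(br_asserted)
        for level in (7, 6, 5, 4):
            if pri < level and level in requests:
                bg_level = level
                break
    return npg, bg_level


if __name__ == "__main__":
    # CPU at priority 5 with BR4 and BR6 asserted: only BG6 is issued.
    print(unibus_arbitrate(5, {4, 6}, False, True))
```

Scanning from level 7 downward reproduces the "else" chain in rule 2: a pending BR7 that qualifies shuts out BG6, BG5, and BG4.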
Example 2: The LSI-11 Bus

The LSI-11 Bus serves the same functions for
the LSI-11 system that the Unibus serves for
most of the other PDP-11 processors. The LSI-11 Bus is constrained to use fewer conductors
and, therefore, less power and logic than the
Unibus. It achieves the reduction from 56 signals to 36 signals primarily by time-multiplexing memory addresses and data on the same
conductors (accepting lower bandwidth in order to achieve lower cost).
Arbitration for DMA transfers is essentially
identical to that of the Unibus (Figures 11 and
12). The corresponding signal names on the
LSI-11 Bus are SACK (for Unibus SACK),
DMR (for NPR), and DMG (for NPG).
Arbitration for the interruption transaction
has only one priority-group for all interrupting
controllers. When a controller wants to interrupt the processor, it asserts the Interrupt
Request (IRQ) signal. This is similar to the BR
signals on the Unibus. However, the LSI-11 Bus
interruption transaction more closely resembles
a data transfer, so it will be described in the section on data transfer synchronization. Arbitration on the LSI-11 Bus, like that on the Unibus, is
classed as centralized and variable with a mixture of priority and sequential allocation rules.
However, only one level of priority is used for
interruption transactions.
Example 3: Synchronous Backplane
Interconnect (SBI), the VAX-11/780
Memory Bus

This memory bus is distinguished by its limited length and its master clock which synchronizes all transactions on the bus. (The bus does
not extend beyond the etched backplane of the
computer cabinet.) The functions of the SBI are
the same as those of the Unibus. However, the
SBI differs in physical configuration because
every controller must be directly connected to
the backplane. Another difference between
Unibus and SBI is that all transactions on the
SBI are of fixed duration, which gives much
higher bandwidth for data transfer. (The SBI is
rated at 13.3 megabytes per second, while the
Unibus is capable of approximately 1.7 megabytes per second when operating with equivalent speed memory.) To achieve this bandwidth,
it was necessary to split the memory read operation into two bus transactions - one to transmit an address to the memory, another to
transmit data back to the requesting connection. In this way the SBI can accommodate
memories of various cycle times, as can the Unibus, but the requesting connection does not occupy the bus facilities for the duration of the
cycle.
Arbitration on the SBI is distributed, priority, and fixed. Figure 14 shows a simplified diagram of the signals involved in SBI arbitration.

A master clock, represented here by a single
signal, defines a sequence of time-slots on the
bus. Each slot (200 nanoseconds in the VAX-11/780) is of long enough duration to complete
a transfer of data from one connection to any
other connection, but not for a reply signal to
be sent back.
There are four Transfer Request (TR) signals
in this simplified example: TR0, TR1, TR2, and
TR3. Each TR signal "belongs" to one connection; that is, only one connection is permitted to assert the signal.
Each TR signal has a priority associated with
it: TR0 has highest priority. A connection
requests the use of the SBI data transfer facilities by the following procedure:
1. At the beginning of the next time-slot (after deciding to transfer data), assert the TR signal that belongs to this connection.
2. At the end of the time-slot, sense the state of all of the higher priority TR lines.
3. If none of the higher priority TR lines is asserted, then at the beginning of the next slot negate "my own" TR signal and begin transmitting data. If any of the higher priority TR lines is asserted, then do not negate "my own" TR signal, and go back to step 2.

Figure 14. A simplified diagram of SBI arbitration signals.

Figure 15 shows a timing diagram for a
sample set of data transfers on the simplified
SBI of Figure 14. In this example, connection
number 3 (corresponding to TR3) requests the
bus during slot 1, and connection numbers 1
and 2 (corresponding to TR1 and TR2) request
the bus during slot 2.
At the end of slot 1, connection 3 detects no
higher priority TR signals, so it negates TR3
and transmits data during slot 2.
At the end of slot 2, connection 2 senses that
TR1 is asserted, and therefore waits, leaving
TR2 asserted. At the same time, connection 1
senses no higher priority TR signals, so it negates TR1 and transmits data during slot 3.
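The slot-by-slot behavior just described can be reproduced with a short simulation. The sketch below is a simplified model under stated assumptions: the function simulate_sbi and its arguments are invented for illustration, each connection makes a single one-slot request, and the TR0 hold signal and the lowest priority connection are not modeled.

```python
from typing import Dict, Optional, Set


def simulate_sbi(request_slot: Dict[int, int], last_slot: int) -> Dict[int, int]:
    """Simulate distributed, priority, fixed arbitration on a simplified SBI.

    request_slot maps a connection number to the slot in which it first
    asserts its own TR line; a lower number means higher priority.
    Returns a mapping from slot number to the connection transmitting in it.
    """
    asserted: Set[int] = set()       # TR lines currently asserted
    winner: Optional[int] = None     # connection that won at the end of the last slot
    transmissions: Dict[int, int] = {}

    for slot in range(1, last_slot + 1):
        # Beginning of slot: the previous winner negates its TR and transmits.
        if winner is not None:
            asserted.discard(winner)
            transmissions[slot] = winner
        # Connections that decided to transfer assert their own TR line.
        for connection, first in request_slot.items():
            if first == slot:
                asserted.add(connection)
        # End of slot: every requester senses the higher priority TR lines;
        # only one with none asserted may transmit in the next slot.
        winner = None
        for connection in sorted(asserted):
            if not any(higher in asserted for higher in range(1, connection)):
                winner = connection
    return transmissions


if __name__ == "__main__":
    # Connection 3 requests in slot 1; connections 1 and 2 request in slot 2.
    print(simulate_sbi({3: 1, 1: 2, 2: 2}, last_slot=5))
```

With these requests the model has connection 3 transmitting in slot 2, connection 1 in slot 3, and connection 2 in slot 4, matching the sequence described for Figure 15.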
Some transactions on the SBI require that a
connection transmit on two or more consecutive slots. A connection that requires a slot
beyond its first one asserts TR0 at the beginning
of its first data transfer slot. TR0, the highest
priority TR signal, is not assigned to any one
connection.
The example in Figure 15 shows connection 2
doing a two-slot data transfer. After waiting for
connection 1 to transfer, connection 2 "holds"
the bus for slot 5 by asserting TR0 (hold signal)
at the beginning of slot 4. In the SBI of the
VAX-11/780, connections are limited to transmitting in no more than three consecutive slots.
We have shown four connections in this example, although only three TR signals are assigned. The lowest priority connection, number
4, does not have a TR signal assigned to it because there is no need to sense a TR signal from
this lowest priority connection. Connection 4
transmits only when no other connection is requesting the next slot. Connection 4 gains an
advantage by being lowest priority: it may
transmit in any slot not used by the other SBI
connections without asserting a TR signal of its
own in the preceding slot. This gives it a shorter
memory-access latency. For this reason, the
CPU is usually given lowest priority on the SBI.
The master clock is crucial to the operation
of the SBI. In the VAX-11/780, the slots are
defined by combining three clocks into four
equal-interval phase markers. All transmitted
TR signals are asserted at the beginning of
phase 1, and all received TR signals are sensed
at the beginning of phase 4, three-fourths of the
way through the nominal slot period. This guarantees that signals from nearby connections are
not sensed too early and that distant TR signals
are sensed early enough.
Figure 15. Timing diagram of arbitration for an example set of data transfers on a simplified SBI.

Example 4: A Polled Character-Input Bus
(Type D)

Figure 16 shows a diagram of a hypothetical
simple character-input bus. The controller at
the left end accepts all input from the keyboards. It "asks" each keyboard in turn
whether it has a character to send, and if so, the
controller accepts the character during the next
time slot. This arbitration scheme is centralized,
sequential, and fixed with respect to data transfer.

Figure 16. A hypothetical polled character-input bus.

Three signals are broadcast from the controller to all terminals. One is the Clock, which
defines the time-slots. The other two signals,
called Unit 0 and Unit 1, send out a two-bit
code which selects one of the four keyboards
during each slot. The coding is binary.
The controller changes the Unit Select signals
at the beginning of each slot. The keyboard selected, if it contains a character to be transmitted, asserts the Send signal, and transmits
the character at the beginning of the next slot.
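The polling sequence can be sketched as a loop over time-slots. This is a hypothetical model of the hypothetical bus: the function poll_keyboards, the choice to start polling at keyboard 0, and the strictly ascending binary sequence are assumptions made here for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple


def poll_keyboards(
    pending: Dict[int, List[str]], slots: int, units: int = 4
) -> List[Tuple[int, int, str]]:
    """Simulate a centralized, sequential, fixed (polled) character-input bus.

    pending maps a keyboard number to the characters it is waiting to send.
    In each slot the controller selects one keyboard with the two-bit Unit
    code; a selected keyboard holding a character asserts Send and transmits
    that character in the following slot.
    Returns a list of (transfer_slot, keyboard, character) tuples.
    """
    queues = {unit: deque(chars) for unit, chars in pending.items()}
    transfers = []
    for slot in range(slots):
        selected = slot % units          # value carried on Unit 1 and Unit 0
        queue = queues.get(selected)
        if queue:                        # keyboard has a character: Send asserted
            transfers.append((slot + 1, selected, queue.popleft()))
    return transfers


if __name__ == "__main__":
    # Keyboard 1 has two characters to send and keyboard 2 has one.
    for transfer in poll_keyboards({1: ["a", "b"], 2: ["c"]}, slots=8):
        print(transfer)
```

Because each keyboard is selected only once in every four slots, no keyboard in this model can use more than one quarter of the transfer slots.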
In the timing diagram shown in Figure 17,
keyboard 1 transmits two characters and keyboard 2 transmits one character. In this type of
arbitration scheme, the polling (sequential sampling) of possible sources of data (the keyboards) eliminates the need for contention or
priority rules. The logic of each connection is
simple, but the scheme in this example limits
each connection (keyboard) to using a maximum of 25 percent of the data transfer bandwidth.

Figure 17. Timing diagram of a polled character-input bus.

Example 5: Massbus

The Massbus is a peripheral-to-controller
(type D) bus that has no arbitration at all. As in
the previous example, a single controller at one
end of the bus receives or sends on each data
transfer. Control information is transferred as
on the Unibus, but the "master" of the transfer
is always the controller. Data blocks are transferred using a peripheral-generated clock, and
the transfers are always initiated by writing a
control word into a register in the peripheral.
Interruptions to the CPU are generated by
the controller on demand from any peripheral.
For this purpose an Attention signal exists in
the control section of the Massbus. Each peripheral is capable of asserting this signal.

SYNCHRONIZATION OF DATA TRANSFERS

Synchronization of a data transfer is coordinating the timing between two bus connections
which are involved in a data transfer. The
method by which data transfer is coordinated
can be very different from the arbitration
method.
To classify the methods of data transfer synchronization, we use two criteria:
1. Source. The location of the source of the synchronizing signals (centralized, one of the sending or receiving connections, or both connections).
2. Periodicity. The type of synchronizing signals (periodic or aperiodic).

Table 3 shows the six resulting categories and
how the examples fit into them.
The location of the synchronizing signal or
signals may be at one of the connections sending or receiving data (one), at both of the connections (both), or at neither (centralized). The
Unibus data transfer is synchronized by signals
from both the sending and receiving connections.
The synchronizing signal may be a clock (periodic), or it may be something else (aperiodic).
The Unibus uses an aperiodic "handshake."

Table 3. Data Transfer Synchronization Methods

Location of              Periodicity
Signal Source            Periodic                        Aperiodic

Centralized              SBI, polled character-input     No examples
One connection           Massbus Data                    No examples
Both connections         No examples                     Unibus, LSI-11 Bus, Massbus Control

Figure 18. Unibus data transfer section.

Example 1: Unibus

DMA (type B) and CPU-memory (type A)
data transfers on the Unibus are accomplished
with the same data timing. The interrupt-vector
transaction timing is similar and thus is omitted
from this discussion.
Figure 18 shows the data transfer section of a
Unibus with two connections: a controller or
CPU (the "master" in a data transfer), and a
memory (the "slave"). (For control and status
register transfers (type C), a controller plays the
role of memory or slave.) The timing of transfers on a Unibus is shown in Figure 19. Bus
Busy (BBSY) indicates that the data transfer facilities are in use. Control and Address signals
are a group that specify the kind of transfer and
the memory address. Master Sync (MSYN) is
asserted by the master (the CPU or controller)
to indicate that Control and Address signals are
present.
Slave Sync (SSYN) is asserted by the slave
connection (memory) to indicate that data is
present on the Data lines.
Unibus Data-Out moves data from the requesting connection into memory.

Figure 19. Timing diagram of transfers on Unibus.

Having received permission from the arbitrator and acknowledged it by asserting Selection
Acknowledge (SACK), the connection waits for
Bus Busy (BBSY) to be negated. It then asserts
BBSY and negates SACK. This connection now
"owns" the data transfer section of the Unibus.
Next, it must wait for SSYN to be negated to
prevent its own logic from mistakenly sensing
SSYN in the asserted state too early.
Next, the master connection transmits the
Address and Control signals and the Data. It
then waits for an interval, the deskew time, before asserting MSYN, to compensate for the
variable delay in transmission of different signals from one connection to another. An additional set-up time is inserted to allow all slave
connections time to sense and compare against
the Address and Control signals.
The slave connection senses the Address and
Control signals at all times. In this example, the
address being transmitted by the controller
matches one of the memory addresses "owned"
by this memory connection. Therefore, this
slave responds to the assertion of MSYN by
sensing and storing the signals on the Data
lines.
Having captured the data, the slave asserts
the SSYN signal. When the master receives the
assertion of SSYN it knows that the data transfer has been completed.
The master then stops transmitting the Address and Control signals, the Data signals, MSYN, and
BBSY.
Unibus Data-In is a read from memory. The
timing is similar to Unibus Data-Out, except
that data is transmitted on the data lines by
memory. The second part of Figure 19 shows
the Data-In timing.
Data transfer on the Unibus is aperiodic - there is no clock. Synchronization occurs by a
"handshake" interaction between the MSYN
and SSYN signals. In fact, two round-trips of
signaling occur. We could look at this signaling
in tabular form (Table 4).
The sequence of four events insures a fully
"interlocked" data transfer. The timing of a
transfer is variable, depending on the speed of
the slave's memory (for Data-In) and on the
speed of the logic at both connections. On the
Unibus, 75 nanoseconds are allowed for deskew
time and an additional 75 nanoseconds for setup, where noted.
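The interlocked Data-Out sequence can be sketched as four steps, each taken only in response to the signal change before it. This is an illustrative model, not the Unibus logic: the class UnibusLines, the function data_out, and the dictionary standing in for memory are assumptions, and real propagation delays are reduced to one returned figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DESKEW_NS = 75   # deskew allowance quoted in the text
SETUP_NS = 75    # additional set-up allowance quoted in the text


@dataclass
class UnibusLines:
    """The subset of Unibus data transfer signals used in this sketch."""
    msyn: bool = False
    ssyn: bool = False
    address: Optional[int] = None
    data: Optional[int] = None
    trace: List[str] = field(default_factory=list)


def data_out(bus: UnibusLines, memory: Dict[int, int], address: int, value: int) -> int:
    """Perform one fully interlocked Data-Out (write) transfer.

    Returns the delay in nanoseconds the master inserts before asserting MSYN.
    """
    if bus.ssyn:
        raise RuntimeError("master must wait for SSYN to be negated")

    # Master: drive Address, Control, and Data, then wait before MSYN.
    bus.address, bus.data = address, value
    wait_ns = DESKEW_NS + SETUP_NS
    bus.msyn = True
    bus.trace.append("MSYN asserted")

    # Slave: on MSYN assertion, capture the data and reply with SSYN.
    memory[bus.address] = bus.data
    bus.ssyn = True
    bus.trace.append("SSYN asserted")

    # Master: on SSYN assertion, the transfer is complete; release the lines.
    bus.msyn = False
    bus.address = bus.data = None
    bus.trace.append("MSYN negated")

    # Slave: on MSYN negation, negate SSYN, completing the second round-trip.
    bus.ssyn = False
    bus.trace.append("SSYN negated")
    return wait_ns


if __name__ == "__main__":
    lines, store = UnibusLines(), {}
    print(data_out(lines, store, 0o1000, 0o177), store, lines.trace)
```

The trace lists the same four events as Table 4, in the order that makes up the two round-trips of signaling.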
Table 4. Synchronization of Unibus Data Transfer

                   Data-Out                      Data-In

MSYN assertion     Address and Control and       Address and Control present
                   Data present
SSYN assertion     Data captured (by slave)      Data present
MSYN negation      Stop transmitting Data        Data captured (by master);
                   and BBSY                      stop transmitting BBSY
SSYN negation                                    Stop transmitting Data

Example 2: LSI-11 Bus

Data transfers on the LSI-11 Bus also serve
the functions of pathway types A and B. Synchronization is from both sender and receiver
and is aperiodic. The CPU-memory (type
A) transfers are described below.

The signals involved in data transfers between the central processor and memory are
DAL, SYNC, DIN, DOUT, and RPLY. These
are similar to the Unibus signals shown in Figure 18. The processor initiates all data transfers
of this type. Type C (control and status) transfers are also made using the synchronization described next, with a controller playing the part
of memory in the transfer.
Figures 20 and 21 show the timing of data
transfers. The 16 DAL signals are used to transmit address and then data, one after the other.
SYNC is the signal which tells all memory devices on the bus to examine the DAL lines and
to test for a matching address. DIN and DOUT
initiate the memory read and memory write cycles, for Data-In and Data-Out transfers, respectively. RPLY, which is similar to the
Unibus SSYN signal, indicates the presence of a
response from the memory.
Before proceeding with a transfer, the CPU
must wait until both SYNC and RPLY have
been negated, to be sure that no other transfer is
in progress on the bus.

Figure 20. LSI-11 Bus Data-In and Data-Out synchronization.

Figure 21. LSI-11 Bus Data-In-Out synchronization.

The CPU transmits the memory address on
the DAL lines. After waiting for a fixed interval, to allow for deskew and set-up time at the
memory, the processor asserts SYNC.
The memory senses the DAL lines when it receives the assertion of SYNC. The memory
matches the address received and decides that
the data word being addressed is in this memory
module.
After another fixed delay, to guarantee that
the SYNC assertion always arrives at the memory first, the processor asserts DIN and stops
transmitting the address on the DAL lines.
As soon as the memory receives the DIN assertion, it knows that a read cycle is desired. It
retrieves the data word and transmits it on the
DAL lines. Meanwhile, it may assert the RPLY
signal as much as 125 nanoseconds before
transmitting the data.
When the processor receives the RPLY assertion, it waits at least 200 nanoseconds to be sure
that the data has arrived, and then senses and
stores the data. Then the processor negates
DIN.
As soon as the memory receives the DIN negation, it stops asserting RPLY. Not more than
100 nanoseconds later, the memory stops transmitting the data on the DAL lines.
When the processor receives the negation of
RPLY, it negates SYNC. The bus is now available for the next data transfer.
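The Data-In sequence above, with address and data sharing the DAL lines, can be sketched as an ordered list of steps. This is an illustrative model only: the function lsi11_data_in, the dictionary standing in for memory, and the trace strings are assumptions, and the fixed delays mentioned in the text are not modeled.

```python
from typing import Dict, List, Optional, Tuple


def lsi11_data_in(memory: Dict[int, int], address: int) -> Tuple[int, List[str]]:
    """Walk through one LSI-11 Bus Data-In (read) transfer.

    Returns the word read and a trace of the signal events in order.
    """
    trace: List[str] = []
    dal: Optional[int]

    # Processor: place the address on the shared DAL lines, then assert SYNC.
    dal = address
    trace.append("address on DAL")
    latched_address = dal            # memory senses DAL when SYNC is asserted
    trace.append("SYNC asserted")

    # Processor: assert DIN and stop driving the address.
    dal = None
    trace.append("DIN asserted")

    # Memory: retrieve the word, drive it on DAL, and assert RPLY.
    dal = memory[latched_address]
    trace.append("RPLY asserted")

    # Processor: capture the data, then negate DIN.
    value = dal
    trace.append("DIN negated")

    # Memory: negate RPLY and stop driving the data.
    dal = None
    trace.append("RPLY negated")

    # Processor: negate SYNC; the bus is free for the next transfer.
    trace.append("SYNC negated")
    return value, trace


if __name__ == "__main__":
    word, events = lsi11_data_in({0o2000: 0o52}, 0o2000)
    print(oct(word), events)
```

Reusing one variable for DAL mirrors the time-multiplexing of address and data: the address has to be latched at SYNC because the same lines carry the data a moment later.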
The second part of Figure 20 shows the timing of a Data-Out (write to memory) transfer.
Figure 21 shows the timing of another type of
LSI-11 Bus data transfer, the Data-In-Out operation. In this transfer, a data word is read
from memory, sent to the CPU, and then a
word is sent back to the same memory location.
This operation is useful for certain PDP-11 instructions such as "increment memory" (INC),
which modifies a single word in memory, and
ADD, which stores a result at the address of the
second operand. Bus transmission time is saved
by not requiring the address to be sent a second
time for the Data-Out portion of the cycle. On
the other hand, the CPU may delay the operation by an arbitrary amount of time, while the
word to be written is generated.
Figure 22 shows the timing of the interruption transaction on the LSI-11 Bus. This
transaction includes both arbitration and the
transfer of a data word (an interrupt vector)
from a controller to the CPU.
All controllers share the single Interruption
Request (IRQ) line. It is similar to the Unibus
BR signals, causing an interruption when asserted.

Figure 22. LSI-11 Bus interruption transaction synchronization.

The Interruption Acknowledge (IAK) signal
is similar to the Unibus BG signals. IAK is
wired from the processor (arbitrator) serially
through all controllers, just like a Unibus priority group.
A controller may assert IRQ at any time.
When the processor is ready to receive an interrupt vector, it begins a sequence which resembles a Data-In transfer. However, the SYNC
signal is not used and no address is sent out on
the DAL lines.

CLOCK. The 10 signals are used to identify the
destination of the transfer when the information transferred is data. The other use of the 10
signals is explained below.
The Data lines carry 32 bits of information.
This information is either: (1) 32 bits of data, or
(2) 28 bits of address and 4 bits of command
code. The Flag signal is asserted to indicate case
(2). In this case, the destination of the transfer is
determined by the 28 address bits, in a way similar to Unibus addressing. For these transfers,
the 10 lines carry the identity of the source of
the transfer. The connection receiving a Read
command saves this source 10 value, so it can
use it as a destination 10 on a later data transfer.
Figure 23 shows the timing of the two SBI
transfers which make up a read operation from
memory. Remember that there is a master clock
which defines a series of time-slots. The Transfer Request (TR) signals are shown again to illustrate the fixed time relationship of arbitration before a transfer.
In Figure 23, the controller (connection 1)
decides at the beginning of slot 1 to initiate a

TRO. _ _ _ _ _ _ _ _ _ _ _ _ _ _ __

Example 3: Synchronous Backplane
Interconnect (S B I)

The SBI synchronization method is centralized and periodic. There is only one sequence of
events which causes information transfers on
the SBI, and that sequence is quite simple.
However, the information transferred from one
connection to another has two possible interpretations: Command and Address, or Data. A
memory read or write operation always consists
of two sequences: one to transfer a command to
the memory connection, the other to transfer
data. The read operation is split, allowing other
transactions to take place while a memory is accessing data.
There are four groups of signals used to effect
data transfer: 10, DATA, FLAG, and

Figure 23. The timing of two SBI transfers which make
up a read operation. (Signals shown: TR0, TR1, TR2, ID, DATA, and FLAG.)


memory read operation. In slot 2 it transmits
the following bits:
ID      1, the identity of the source connection.
DATA    Read command code, plus 28 bits of memory address.
FLAG    asserted, indicating that DATA contains command
        and address.

At the end of slot 2, the memory connection
senses all of these bits and captures them in a
buffer register. In fact, every connection on the
SBI captures all of these bits on every slot. Subsequently, each connection matches the ID bits
to determine if it should respond.
In this case, the memory connection detects
that the address refers to memory contained in
itself, and it therefore begins a read cycle.
The memory connection asserts its TR signal
(TR2) one slot before it is ready to transmit
data. The memory transmits its data to the requesting controller in the next slot (slot 7):

ID      1, the identity of the destination connection.
DATA    32 bits of data from memory.
FLAG    negated, indicating that DATA carries data.

At the end of slot 7, all connections to the
SBI capture this information, and controller 1
recognizes the match between the ID bits and
its own identity. A memory read has now been
finished.
On the SBI, a memory may wait a variable
number of slots before replying to a Read command. Clearly there is a performance penalty
for memories that require slightly more than an
integral number of slot-times to access a word.
Therefore, the SBI clock is "tuned" to be an
integral submultiple of the access time of the
memory subsystem we intend to use. However,
we could attach a variety of memory subsystems with different access times to one SBI,
without serious performance degradation, as
long as the memory access times are sufficiently
large multiples of the slot-time.
The VAX-11/780 system uses a slot-time of
200 nanoseconds and has a memory subsystem
access time of just under 800 nanoseconds (including error detection). The four-slot access
time shown in Figure 23 is typical of this system.
Figure 24 shows the timing of a memory
write operation on the SBI. The controller, connection 1, transmits in the two consecutive slots
following arbitration. In the first slot (slot 10),
FLAG is asserted to indicate that the Write
command and address information is present.
In slot 11, the data is transmitted. The memory
connection must be prepared to accept and capture the sequence of two transmissions.
During slots 10 and 11, the ID lines contain
the identification of the controller, allowing the
memory to verify that both transmissions came
from the same source.

Figure 24. The timing of a memory write operation on
the SBI. (Signals shown: TR0 "hold" asserted by 1, TR1, TR2, ID, DATA,
FLAG, and CLOCK, over time slots 10 through 13.)


The two-slot write operation is kept contiguous by using the highest priority TR0
"hold" signal to obtain use of the second slot.
The SBI minimizes the slot interval and maximizes bandwidth by eliminating all round-trip
delays.
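The slot arithmetic quoted in this example (a 200 nanosecond slot-time and a memory access time of just under 800 nanoseconds) can be checked with a small sketch. It is a simplification that assumes a memory can reply only on a slot boundary and ignores arbitration; the function names are illustrative, not taken from any DEC document.

```python
SLOT_NS = 200  # slot-time quoted for the VAX-11/780


def slots_needed(access_ns, slot_ns=SLOT_NS):
    """Whole slots that elapse before the memory can reply (ceiling division)."""
    return -(-access_ns // slot_ns)


def idle_ns(access_ns, slot_ns=SLOT_NS):
    """Time the memory sits ready while waiting for the next slot boundary."""
    return slots_needed(access_ns, slot_ns) * slot_ns - access_ns


for access in (790, 810):
    print(access, slots_needed(access), idle_ns(access))
```

A 790 ns memory fits in four slots with 10 ns to spare, while an 810 ns memory needs a fifth slot and idles for 190 ns. That is the penalty for needing "slightly more than an integral number of slot-times."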


Example 4: Polled Character-Input Bus

Data transfer on this bus was described in the
section on arbitration methods. The synchronization method is centralized and periodic.
Data transfer occurs in time-slots just as on
the SBI. The time-slots are defined by a master
clock, and the receiver (always the controller)
must accept the data at the end of the time-slot.
In contrast to the SBI, this bus preallocates one
of every four slots to each keyboard connection.
The controller must keep internally an indication of which character is received from
which keyboard.
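Because the slots are preallocated, the controller can tell the sender from the slot number alone. A minimal sketch, assuming four keyboards numbered 0 to 3 and assuming that keyboard k owns slots k, k+4, k+8, and so on (the text fixes only the one-in-four ratio, not the numbering):

```python
from collections import defaultdict

KEYBOARDS = 4  # one of every four slots belongs to each keyboard


def keyboard_for_slot(slot):
    """Assumed numbering: keyboard k owns slots k, k + 4, k + 8, ..."""
    return slot % KEYBOARDS


def demultiplex(samples):
    """Sort (slot_number, character) samples into per-keyboard streams.

    A character of None stands for a slot in which nothing was sent.
    """
    streams = defaultdict(list)
    for slot, char in samples:
        if char is not None:
            streams[keyboard_for_slot(slot)].append(char)
    return dict(streams)
```

The dictionary returned by `demultiplex` plays the role of the controller's internal record of which character came from which keyboard.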
Example 5 (a): Massbus Control Section

The Massbus actually consists of two sections: a Control Section for reading and writing
the contents of registers in the peripherals, and
a Data Section for moving blocks of data. All
transfers are between the controller and one of
the (up to eight) peripherals. The two sections
operate independently, except that a Control
Section write into a control register of a peripheral is required to initiate a block transfer on
the Data Section.
The Control Section of the Massbus is a miniature Unibus. However, the controller is always the master, and one of the peripherals is
always the slave in the transfer. Figure 25 shows
the Control Section signals involved in data
(i.e., control and status register) transfers. The
Demand (DEM) signal takes the place of
MSYN, and Transfer (TRA) takes the place of
SSYN. Instead of Address and Control lines,
there is an eight-bit address on the Massbus
Control Section: three bits of Drive Select (DS),
and five bits of Register Select (RS). Thus, each
of eight peripherals (drives) may contain up to
32 two-byte registers. The Controller to Drive
(CTOD) signal, when asserted, indicates that
the transfer is a write into a peripheral register.
Control information is transferred 16 bits at a
time on the C lines. Timing of these transfers is
equivalent to that shown for the Unibus in Figure 19.

Figure 25. The Control Section signals of the Massbus.
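The eight-bit Control Section address can be modeled as two packed fields. The sketch below assumes, purely for illustration, that DS forms the high three bits and RS the low five; on the real bus these are separate groups of lines, so the ordering is an artifact of the model.

```python
DS_BITS = 3  # Drive Select: eight peripherals
RS_BITS = 5  # Register Select: 32 registers per peripheral


def pack_address(drive, register):
    """Combine drive and register numbers into one 8-bit value."""
    if not 0 <= drive < (1 << DS_BITS):
        raise ValueError("drive must be 0..7")
    if not 0 <= register < (1 << RS_BITS):
        raise ValueError("register must be 0..31")
    return (drive << RS_BITS) | register


def unpack_address(address):
    """Split an 8-bit value back into (drive, register)."""
    return address >> RS_BITS, address & ((1 << RS_BITS) - 1)
```

With these widths the bus can name 8 x 32 = 256 sixteen-bit registers in all.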
There is also a shared Attention (ATTN) signal in the Control Section that may be asserted
at any time by a peripheral which requires CPU
intervention. The controller normally creates an
interruption to the CPU soon after ATTN is asserted.
Timing of normal Read transfers is shown in
Figure 26. It is equivalent to a Unibus Data-In
transfer (compare with Figure 19, second part).
There is one special case which uses different
timing on the Massbus Control Section. In order to determine which of the peripherals has
caused an Attention interruption, the CPU
reads the Attention Summary pseudo-register
via the controller. This is a special "register"
which is composed of one bit stored in each peripheral. Figure 27 shows the timing for reading
this register. When the RS lines carry a code of
04, and the direction of transfer is drive to controller (CTOD negated), each peripheral (drive)
transmits its Attention Active (ATA) bit onto
one of the Control (C) lines. Peripheral number
0 transmits its ATA on C0, peripheral 1 on C1,
and so on.
The timing of this transfer is different because the TRA signal is driven by more than
one peripheral. There is no way of knowing
when all peripherals have asserted their ATA
bits, so the controller must wait the maximum
possible access time. This maximum delay
"time-out" is present in the controller logic for
normal reads and writes, to guard against possible nonresponse from an addressed peripheral
or register. The Attention Summary read operation makes use of this time-out interval to terminate its wait for the ATA bits.
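The way the Attention Summary word is assembled and decoded can be sketched as follows. The function names are hypothetical, and the model covers only the bit assignment (peripheral n on line Cn), not the bus timing.

```python
def attention_summary(ata_bits):
    """Build the word seen on the C lines: peripheral n drives line Cn."""
    word = 0
    for n, active in enumerate(ata_bits):
        if active:
            word |= 1 << n
    return word


def peripherals_requesting(word, count=8):
    """List the peripheral numbers whose ATA bit is set in the word."""
    return [n for n in range(count) if word & (1 << n)]
```

For example, if peripherals 1 and 5 have ATA set, the controller reads a word with bits 1 and 5 set, and the CPU recovers the list [1, 5] from it.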

Figure 26. Timing of a control read in the control
section of the Massbus. (Signals shown: CTOD, DS and RS, TRA, and STROBE.)

Figure 27. Timing of a control read from Attention
Summary pseudo-register. (Signals shown: CTOD; RS = 04, with DS not used;
C0, C8, and C15; the time-out interval.)

Example 5 (b): Massbus Data Section

The Massbus Data Section is shown in Figure 28. It contains 18 Data (D) lines, which
carry data in both directions. Two clock signal
lines, Synchronizing Clock (SCLK) and Write
Clock (WCLK), carry a clock from and back to
the peripheral, respectively. The RUN and
End-of-Block (EBL) signals control the termination of a block data transfer. The Exception
(EXC) signal is used to indicate error conditions.
Data in the Massbus Data Section is always
transferred in multiple-word blocks. The data
read from or written to a mass storage device,
such as a disk drive, must be synchronized with
the mechanical motion of the recording medium. Therefore, the clock (SCLK) originates
in the peripheral.
A Massbus Data Read begins when a control
register in the selected peripheral is written with
a Read command code. Figure 29 shows the
timing of a Massbus Data Read. The controller
asserts the RUN signal as soon as it is ready to
receive data.
When the peripheral has received the RUN
assertion, it begins reading data from its storage
medium. The peripheral asserts SCLK when a
new data word is present on the D lines. The
peripheral continues to assert and negate the
SCLK signal at the characteristic data rate.
Each time the controller receives the negation
of SCLK, the controller captures and stores the
data word from the D lines.
Note that the peripheral does not receive any
positive indication that the data word was received by the controller: the data transfer is
"open loop."
At the end of the block of data words, the
peripheral asserts EBL to indicate that it has
reached the end of a data block.
When the controller receives the EBL assertion, it decides whether to continue (usually by
inspecting a word count register). Within
slightly over one microsecond, the controller
must negate RUN or else accept another block
of data.
As the peripheral negates EBL, it senses the
RUN signal. If it is negated (as shown in Figure
29), the peripheral disconnects itself from the
Massbus Data Section. Otherwise, the peripheral would transmit the next block of data.
If the number of words desired by the controller is less than an integral number of data
blocks, the controller may negate RUN before
EBL is asserted. The controller then simply ignores the remaining data words being transmitted.
Figure 30 shows the timing of a Massbus
Data Write. As for a data read, the peripheral
controls the rate at which data is transmitted.
However, this time the data is coming from the
controller, which asserts the WCLK signal
whenever it puts data onto the D lines.
The controller must have a data word ready
each time it receives the negation of SCLK.
Otherwise a "data overrun" condition occurs,
which causes abnormal termination of the
transfer.

Figure 28. Massbus Data Section. (Signals between controller and
peripheral: SCLK, WCLK, D (18 lines), EBL, RUN, and EXC.)

Figure 29. Timing of a Massbus Data Read.

Figure 30. Timing of a Massbus Data Write.

ERROR CONTROL STRATEGIES

Unfortunately, buses do not always succeed
in delivering to the receiving connection what
was transmitted from the sending connection.
Some of the causes of errors are logic failures,
electromagnetic interference, broken conductors, shorted conductors, and power failures. In this section, we examine the following
five categories of countermeasures to these errors:

1.  Check bits. Extra information is sent
    which allows the receiver to detect and
    sometimes to correct errors in the data.
2.  Acknowledgement. A reply from the receiver to the sender tells whether the
    data appeared "good."
3.  Time-out. Failure of an expected acknowledgement to be received by the
    sender within a time limit indicates unsuccessful data transmission.
4.  Retry. A transfer which was unsuccessful
    is attempted one or more additional
    times.
5.  Error reporting and logging. Failures of
    all categories are recorded and reported
    to higher level (usually software) logic.
    Logging means recording the errors in a
    file which can be read later by a service
    engineer.

Depending on the cost and service objectives,
a real bus should have a data transfer procedure
with all of the following steps:

1.  Arbitration. Obtain the use of the bus.
2.  Data transfer. Transfer data (and check
    bits) on the bus.
3.  Check. Check for error-free transfer, and
    transfer an acknowledgement.
4.  Retry. If the check or acknowledgement
    fails, repeat steps 1 through 3.
5.  Log. If all retries fail, enter a failure report in the log file, and send a message to
    higher level logic (software routines).

Table 5 summarizes the error-control methods used in the five example buses.

Table 5. Error Control Methods Used By Example Buses

                              Check Bits
Bus                           (Parity)     ACK          Time-Out   Retry   Log

1.  Unibus                    No           Yes (SSYN)   Yes        a       b
2.  LSI-11 Bus                No           Yes (RPLY)   Yes        b       b
3.  SBI                       Yes          Yes (CNF)    Yes        Yes     b
4.  Polled Character-Input
5a. Massbus Control           Yes          Yes (TRA)    Yes        a       b
5b. Massbus Data              Yes          Yes (EXC)    Yes        a       b

a.  Retry is implemented by software in some PDP-11 operating systems.
b.  Logging is implemented at various levels by operating system software.
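The five-step transfer procedure listed in this section can be sketched as a retry loop. The `FlakyBus` class is a hypothetical stand-in for real hardware: it simply fails to acknowledge a set number of transfers. The retry limit of two is likewise an arbitrary choice for the illustration.

```python
class FlakyBus:
    """Hypothetical bus model: the first `failures` transfers are not acknowledged."""

    def __init__(self, failures):
        self.failures = failures
        self.arbitrations = 0

    def arbitrate(self):
        self.arbitrations += 1            # step 1: obtain the use of the bus

    def transfer(self, word):
        """Steps 2 and 3: send the word, return True if it was acknowledged."""
        if self.failures > 0:
            self.failures -= 1
            return False                  # bad check, or acknowledgement timed out
        return True


def send_word(bus, word, max_retries=2):
    """Return (success, log) after at most 1 + max_retries attempts."""
    log = []
    for _ in range(1 + max_retries):
        bus.arbitrate()
        if bus.transfer(word):
            return True, log
        # step 4: fall through and repeat steps 1 through 3
    # step 5: all retries failed, so record the failure for higher level logic
    log.append(f"transfer of {word:#x} failed after {1 + max_retries} attempts")
    return False, log
```

Note that each retry arbitrates again, as step 4 requires, so a failing connection does not hold the bus while it recovers.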
Example 1: Unibus

Data transfer on the Unibus is not checked.
However, two lines are used by memory connections to signal whether a parity error has
been detected while reading a word from memory.
A controller or CPU on the Unibus times out
20 microseconds after MSYN has been asserted, if assertion of SSYN has not been received. Time-out occurs whenever an invalid or
nonexistent memory address is given as the target of a Unibus transfer.
Example 2: LSI-11 Bus

This bus does not have check bits for data
transfers. However, it has two lines (DAL 17
and 16) that can be used for transmitting the
results of memory parity error checking.


The LSI-11 Bus also has time-outs specified
for responses to the assertion of DIN and
DOUT. If a memory does not respond within
10 microseconds, the CPU or controller assumes that the address is invalid.
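Both time-outs described so far (20 microseconds on the Unibus, 10 microseconds on the LSI-11 Bus) follow the same pattern, sketched below. The function name and return strings are illustrative, and treating a reply that arrives exactly at the limit as too late is an assumption.

```python
UNIBUS_TIMEOUT_US = 20   # after MSYN, waiting for SSYN
LSI11_TIMEOUT_US = 10    # after DIN or DOUT, waiting for the reply


def master_wait(reply_delay_us, timeout_us):
    """Decide the outcome of one transfer from the master's point of view.

    reply_delay_us is the time until the slave answers, or None when no
    connection responds to the address at all.
    """
    if reply_delay_us is not None and reply_delay_us < timeout_us:
        return "transfer complete"
    return "time-out: address treated as invalid"
```

A nonexistent address and a slave that is merely too slow look identical to the master, which is why the text equates a time-out with an invalid address.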


Example 3: SBI

Data transfers on the SBI carry several parity
check bits. Parity is generated at the sending
connection and is checked at the receiving connection.
The SBI also does acknowledgement on every
data transfer. A code is returned to the sending
connection two time-slots after the data was
sent. Separate Confirm (CNF) lines are used to
carry this code. The code indicates one of four
possible events:
1.  No Response. There is no connection responding to this address or ID value.
2.  Parity Error. The parity check shows an
    error in transmission; transfer is rejected
    by the receiving connection.
3.  Busy. (For commands only.) The receiving connection (memory) addressed cannot accept another command now.
4.  Accepted. Parity checks "good" and the
    command or data is accepted.

The Confirm code itself is error-protected.
The No Response code is the one with all CNF signals
negated. The other codes differ from each other
and from the No Response code in at least two
bit positions. Therefore, an error in one CNF
bit results in an invalid code.
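The distance property described above can be demonstrated with a small sketch. The three-line encoding used here is hypothetical, chosen only because it satisfies the stated rules (No Response is all zeros, every pair of codes differs in at least two positions); it is not claimed to be the SBI's actual assignment.

```python
from itertools import combinations

CNF_LINES = 3  # assumed width, for illustration only
CNF_CODES = {
    "NO_RESPONSE": 0b000,   # all CNF signals negated, as in the text
    "ACCEPTED": 0b011,
    "BUSY": 0b101,
    "PARITY_ERROR": 0b110,
}


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def minimum_distance(codes=CNF_CODES):
    """Smallest pairwise distance across the whole code set."""
    return min(hamming(a, b) for a, b in combinations(codes.values(), 2))


def decode(bits):
    """Return the code name, or None if the bit pattern is not a valid code."""
    for name, code in CNF_CODES.items():
        if code == bits:
            return name
    return None


def single_bit_errors(code):
    """All patterns reachable from a code by flipping exactly one line."""
    return [code ^ (1 << i) for i in range(CNF_LINES)]
```

With a minimum distance of two, every single-bit error lands on a pattern that `decode` rejects, so the error is detected rather than mistaken for a different confirmation.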
Figure 31 shows the timing of SBI data
transfer acknowledgements. The example in
this figure is a data word transfer from memory
(the second half of a read operation). The CNF
lines are always reserved for a reply from a receiving connection exactly two slots after a data
transfer.
The error-control philosophy on the SBI says
that if any connection detects bad parity on a
data transfer, then the validity of the data transfer is suspect. Therefore, any connection may
assert a Parity Error Confirm code at the beginning of slot 4 in Figure 31.

Figure 31. Timing of SBI data transfer
acknowledgements, including parity check.

As implemented in the VAX-11/780, the SBI
also uses time-outs, in case the memory does
not respond within a fixed number of slots. The
CPU or controller causes an interruption, possibly leading to software-driven retry or logging
of the event. The VAX-11/780 CPU also does
microprogram-controlled retry of transfer
requests that receive the Busy confirmation
code.

Example 4: Polled Character-Input Bus

Since this example is hypothetical, we cannot
claim to explain its actual error-control methods. It is reasonable, however, to add one data
signal to carry a parity check bit for each character. A time-out is not relevant here, but an
acknowledgement could be implemented by
having the controller send a Confirm signal
back to the keyboard during the slot following

the data transfer (Figure 32). If the Confirm signal does not indicate "good transfer," the keyboard can send the character again 4 slots later
(when its turn comes around again).

Figure 32. Timing of a plausible error-checking scheme
with acknowledgement and retry for polled character-input bus.
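The scheme suggested for this hypothetical bus can be sketched as follows. Several details are assumptions, since the text leaves them open: even parity, a corruption model that flips only the low-order data bit, and no limit on the number of retries.

```python
KEYBOARDS = 4  # a keyboard's turn comes around every four slots


def even_parity(char_code):
    """Parity bit that makes the total number of 1 bits even (assumed sense)."""
    return bin(char_code).count("1") % 2


def controller_confirm(char_code, parity):
    """Controller's check: True means 'good transfer' is confirmed."""
    return even_parity(char_code) == parity


def send_with_retry(char_code, first_slot, corrupt_slots=()):
    """Return the slot in which the character is finally confirmed.

    corrupt_slots lists slots in which the low data bit is flipped in transit.
    """
    slot = first_slot
    parity = even_parity(char_code)       # computed once, at the keyboard
    while True:
        received = char_code ^ 0x01 if slot in corrupt_slots else char_code
        if controller_confirm(received, parity):
            return slot
        slot += KEYBOARDS                 # no Confirm: resend on the next turn
```

A character sent in slot 2 and corrupted there is confirmed in slot 6, one full polling cycle later.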
Example 5a: Massbus Control Section

The Massbus Control Section closely resembles the Unibus in timing, but it does carry one
data parity check signal. If an error occurs on
reading a control register, the controller passes
the "bad parity" indication on to the CPU, with
consequences the same as a memory parity error.

If an error occurs on writing a control register, the peripheral ignores the data word and
asserts the Attention signal. "Control Bus Parity Error" is displayed in the Peripheral Error
Status Register.
The Massbus Control Section also has the
same acknowledgement and time-out properties
as the Unibus, with the exception of reading the
Attention Summary pseudo-register, which always uses the time-out to terminate the read
cycle.

Figure 33. Timing of Exception signal in Massbus Data
Write operation.

Example 5b: Massbus Data Section

The Massbus Data Section carries a parity
check bit with each 18-bit word of data. A signal called Exception (EXC) can be asserted
from either end to indicate a bad data transfer
or other exceptional conditions. Figure 33
shows an example of a Massbus Data Write operation that suffers a parity error during the
transmission of the second word. The peripheral asserts the EXC signal as soon as the error
is detected. Although this is too late to stop the
next word from being transmitted, the peripheral stops accepting data words, and it terminates the block transfer early. The entire block
has to be retransmitted. In this example, the
controller displays a "Transfer Error" when it
interrupts the CPU for "end-of-transfer" service.

Two time-outs are used on the Massbus Data
Section, both in the controller. One starts timing at the assertion of RUN and waits up to
seven seconds for the SCLK signal to make a
transition. This long time is required for ANSI
standard magnetic tapes which may have up to
25 feet of inter-record gap.


A shorter time-out, approximately 100 microseconds, is used to detect a failure in a peripheral after at least one SCLK signal
transition has been received. If this limit is
reached, the controller asserts EXC to tell the
peripheral to disconnect.
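The two controller time-outs can be modeled as a two-stage watchdog. In this sketch times are integer microseconds measured from the assertion of RUN, the limits are the approximate figures quoted above, and the result labels are invented for the illustration. What happens after the last listed transition (normal end of block) is not modeled.

```python
FIRST_TRANSITION_LIMIT_US = 7_000_000   # up to seven seconds for the first SCLK transition
LATER_TRANSITION_LIMIT_US = 100         # about 100 microseconds between later transitions


def watchdog(transitions_us):
    """Return (reason, time_us) for the first time-out that fires, else None.

    transitions_us is an ascending list of SCLK transition times.
    """
    if not transitions_us or transitions_us[0] > FIRST_TRANSITION_LIMIT_US:
        return ("no first SCLK transition", FIRST_TRANSITION_LIMIT_US)
    for previous, current in zip(transitions_us, transitions_us[1:]):
        if current - previous > LATER_TRANSITION_LIMIT_US:
            # At this point the controller would assert EXC.
            return ("SCLK stalled", previous + LATER_TRANSITION_LIMIT_US)
    return None
```

The long first limit tolerates a tape still crossing an inter-record gap, while the short second limit catches a peripheral that fails in the middle of a block.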
ACKNOWLEDGEMENTS

The chapter author wishes to acknowledge
the patience of J. Craig Mudge, the editor who
provided the impetus to produce this chapter,
and of Heidi Baldus, who spent a great many
hours overseeing the production of this work,
many of them on the telephone at a distance of
3000 miles from the author.
Robert Chen and Alice Parker contributed
greatly by their detailed reviews of the first
draft. Others who helped were Sas Durvasula,
Robert E. Stewart, Harold Stone, Mike Riggle
and Don Vonada. George Herbster, patent attorney and friend to many engineers, provided
reference materials on short notice.
APPENDIX: A GLOSSARY OF TERMS

The definitions below are offered as an aid to
understanding the technical meaning of some
words used in this chapter.

Assert (transitive verb) - to cause a signal to take
the "true" or asserted state.
Asserted (nominal) - to be in the "true" state.
Assertion (noun) - the transition from negated
to asserted.
Bandwidth (noun) - data transfer rate measured
in information units (e.g., bits, bytes, or words)
per unit time.
Connection (noun) - an attachment to a bus and
the logic and functions of the attached subsystem. Synonyms: node, interface.
Interval (noun) - an extent in time. Synonym:
period.

Negate (transitive verb) - to cause a signal to
take the "false" or negated state.

Negated (nominal) - to be in the "false" state.
Negation (noun) - the transition from asserted
to negated.
Read (transitive verb) - to move data from a register, memory, or secondary storage.
Sense (transitive verb) - to capture data from
bus signal lines. Synonyms: receive, gate in,
strobe.
Slot (noun) - a particular interval.
Time-out (intransitive verb) - to wait for the end
of an interval and to take an action associated
with the failure of some event to occur within
the interval.
Transfer (transitive verb) - to move data (a data
word).
Transmit (transitive verb) - to place data on bus
signal lines. Synonyms: drive, gate out.
When (adverb) - at the instant that.
Whenever (adverb) - every time that.
While (adverb) - throughout the interval that.
Write (transitive verb) - to move data into a register, memory, or secondary storage.

ANNOTATED BIBLIOGRAPHY

For further reading on bus design in general,
the following references will provide an entry
into some of the published literature.
Blaauw, Gerrit A., Digital System Implementation,
Chapter 9, "Communication," pp. 286-316; Prentice-Hall (1976). [I/O channel architecture, data
synchronization]
Chen, Robert C.H., "Bus Communications Systems," Ph.D. thesis, Department of Computer
Science, Carnegie-Mellon University (January
1974). [synchronization, arbitration, and deadlock]
Enslow, Philip H., Jr. (ed.), Multiprocessors and Parallel Processing, Chapter 2, "Systems Hardware,"
pp. 26-80; Wiley (1974). [Multiprocessor bus organization; Unibus; tradeoffs in bus design; I/O topology]
Ornstein, S.M., W.R. Crowther, M.F. Kraley, R.D.
Bressler, A. Michel, and F.E. Heart, "Pluribus - a
reliable multiprocessor," AFIPS Conference Proceedings, Vol. 44 (1975), National Computer Conference, pp. 551-559. [Multiprocessor IMP for
ARPANET]
Thurber, Kenneth J., E. Douglas Jensen, Larry A.
Jack, Larry L. Kinney, Peter C. Patton, and Lynn
C. Anderson, "A systematic approach to the design of digital bussing structures," AFIPS Conference Proceedings (1972), Vol. 41, Part II, Fall
Joint Computer Conference. [polling, arbitration
methods, data synchronization; annotated bibliography with 93 entries]

The following four references cover the Unibus and some bus related aspects of the PDP-11 architecture.
Bartee, T., Digital Computer Fundamentals (Fourth
Edition), Chapter 10, section 10.6, "Interconnecting System Components," and section
10.7, "Interfacing - Buses," pp. 455-470,
McGraw-Hill (1977). [bus structures, including
Unibus]
Cohen, J., Janson, P., McFarland, H., and Young, J.
Jr., Data Processing System, U.S. patent
3,710,324 (9 Jan 1973). [PDP-11 system and
Unibus]
Sutherland, Ivan E., and Carver A. Mead, "Microelectronics and Computer Science," Scientific
American, Vol. 237, no. 3 (September 1977), pp.
210-228. [interconnections, Unibus]
Tanenbaum, Andrew S., Structured Computer Organization, Chapter 4, section 4.12, "The PDP-11/40 Microprogramming Level," pp. 196-204,
Prentice-Hall (1976). [PDP-11/40 internal organization and Unibus operation]
Cohen, J., Janson, P., McFarland, H., and Young, J.
Jr., Data Processing System, U.S. patent
3,815,099 (4 Jun 1974). [PDP-11 memory and peripherals]

The following references are the patents covering the Massbus design.
Jenkins, S., Secondary Storage Facility for Data Processing System, U.S. patent 4,047,157 (6 Sept
1977). [Dual-Unibus RH11 Massbus controller]
Levy, J., Jenkins, S., Ku, V., McLean, P., and Hastings, T., Drive Condition Detecting Circuit for
Secondary Storage Facilities in Data Processing
Systems, U.S. patent 3,911,400 (7 Oct 1975).
[Massbus Attention Summary register]
Levy, J., Jenkins, S., and McLean, P., Secondary
Storage Facility for Data Processing Systems,
U.S. patent 3,999,163 (21 Dec 1976). [Massbus]

McLean, P., Jenkins, S., and Ku, V., Diagnostic Circuit for Data Processing System, U.S. patent
3,911,402 (7 Oct 1975). [Massbus peripheral maintenance register]
Sergeant, O., Levy, J., Lignos, D., and Griggs, K.,
Drive for Connection to Multiple Controllers in a
Digital Data Secondary Storage Facility, U.S.
patent 4,007,448 (8 Feb 1977). [Dual-Massbus disk
drive]

The Honeywell Megabus, described in the
first reference below, was an independent development that has some ideas similar to the SBI
and the Unibus. The second reference has a
short description of the SBI. The third reference
contains an intellectual precursor to the SBI,
the "z-bus", which was implemented only in a
simulation.
Conway, J.W., "Approach to Unified Bus Architecture Sidestepping Inherent Drawbacks," Computer Design (Jan 1977). [Honeywell Megabus]
Digital Equipment Corporation, VAX-11/780 Architecture Handbook, (1977), Chapter 2, section 2.2,
"The Synchronous Backplane Interconnect," p.
23. [SBI]
Levy, John V., "Computing with Multiple Microprocessors," Report SLAC-161, Stanford Linear
Accelerator Center, (Apr 1973); (Ph.D. thesis,
Computer Science Department, Stanford University). [Z-machine and z-bus]

The next three references relate to a relatively
new development, the contention-arbitrated serial bus. These are distributed-arbitration buses
which have a single signal used for both arbitration and for data transfer. Further references
can be found in these publications.
MacLaren, Don, Contention-arbitrated serial buses,
Digital Equipment Corporation R&D Group internal memo (13 Sep 1977). [with 8 references]
Metcalfe, Robert M., Packet Communication, Report MAC TR-114, Massachusetts Institute of
Technology Project MAC, (December 1973).
Metcalfe, Robert M. and David R. Boggs, "Ethernet: Distributed packet switching for local computer networks," report CSL 75-7, Xerox Palo
Alto Research Center (November 1975).

A Minicomputer-Compatible
Microcomputer System:
The DEC LSI-11
MARK J. SEBERN

INTRODUCTION

In recent years, minicomputers have found
application in a wide range of areas. In so
doing, they have displaced larger computer systems in many traditional maxicomputer markets. At the same time, they have opened up
many new markets, primarily because of their
low cost, small size, and general ease of use.
Still, in spite of this remarkable success, minicomputers are not without competition. In cost-sensitive areas, the minicomputer is being eased
out of its dominant position by a new generation of LSI microcomputers; the new "processors on a chip" have found a warm reception
from designers seeking inexpensive computing
power. That warm reception sometimes cools,
however, when the user finds himself with a collection of components, instead of a complete
computing system. The discovery that he is
largely on his own when it comes to software
and debugging support has a similarly chilling
effect. The entry into the world of programming
PROMs, using FORTRAN cross-assemblers
and simulators, and writing even simple software routines from scratch can be a traumatic
experience indeed. Still, the advantages of LSI
microcomputers are very real, and many users

have found the difficulties well worthwhile.
Even so, some cannot help but wonder why
they cannot simply have the best of both
worlds: the cost and size of the microcomputer,
and the ease of use and performance of the
minicomputer systems with which they are familiar.
Therefore, the appearance of a new LSI microcomputer system that is fully compatible
with a line of 16-bit minicomputers is an event
of some significance. This new microcomputer,
the DEC LSI-11 (see Figure 1), is a complete
4K PDP-11 on a 21.6 cm X 26.7 cm (8.5 inch X
10.5 inch) board; priced to compete with other
LSI microcomputers, it offers true minicomputer performance and maxicomputer support. The LSI-11, while not meant to be yet
another low-end minicomputer, does bring
many minicomputer strengths to the new
microcomputer applications for which it is intended.
To provide minicomputer performance at a
microcomputer price, the LSI-11 was designed
to optimize system costs, rather than component costs. A one-chip central processor,
then, was not necessarily superior to a four-chip
one; the choice was made on the basis of total
system cost and performance. On this basis, a
microprogrammed processor was selected, permitting the inclusion of features like a "zero
cost" real-time clock and automatic dynamic
memory refresh. The built-in ASCII programmer's console was also made feasible by the
LSI-11's microprogrammed nature.
Awareness of system costs and performance,
then, was a primary motivation in the LSI-11
design. System issues include cost and ease of
interconnection, the customer's investment in
training and software, and the availability of
design support for both hardware and software.
The impact of these system concerns should become apparent in the following sections which
detail the LSI-11 design. Two viewpoints are
taken in this description: the first section treats
the internals of the LSI-11 from the computer
designer's point of view, while the second considers the system from the user's perspective.
The former examines the architecture, organization, and implementation of the LSI-11, while
the latter discusses interfacing, special features,
and PDP-11 compatibility. Together, these two
viewpoints will provide the reader with an introduction to the DEC LSI-11, the first microprogrammed minicomputer-compatible LSI
microcomputer, which provides minicomputer
performance at a microcomputer price.

Figure 1. On one 21.6 cm X 26.7 cm board, the LSI-11
provides a complete PDP-11 processor, 4 Kwords of
16-bit memory, an ASCII console, a real-time clock, an
automatic dynamic memory refresh, and interface bus
control.

THE COMPUTER DESIGNER'S VIEW

For the purpose of this discussion, the design
of the LSI-11 will be studied at the following
three levels: (1) architecture - the machine as
seen by the programmer, (2) organization - the
block diagram view of subsystems and their interconnection, and (3) implementation - the actual fabrication and physical arrangement of
the various pieces at the component level.
Architecture

Instruction Set. The architectural level of a
computer system includes its instruction set, address space, and interrupt structure. The basic
LSI-11 instruction set is that of the PDP-11/40,
without memory mapping. These instructions
include several operations not found in other
small PDP-11 processors, such as Exclusive-Or
(XOR), Sign-Extend (SXT), Subtract One and
Branch (SOB), etc. Full integer multiply/divide
(Extended Instruction Set or EIS) and floating-point arithmetic (Floating Instruction Set or
FIS) may be provided by the addition of a
single control read-only memory chip (to be discussed later). Unlike other PDP-11s, the LSI-11 has
two special operation codes which facilitate access to the processor's program status word
(PSW). The instruction set is, then, more comprehensive than that of the PDP-11/05, while
the execution times (see Table 1) are a little
slower.
To take advantage of the microprogrammed
nature of the LSI-11, it may at times be desirable to invoke a user-written microroutine. This

is made possible by a set of reserved instructions which cause branching to a fixed microaddress. These reserved instructions cause an
illegal instruction trap to occur if user microcode is not present.

Table 1. LSI-11 Instruction Timing

                        Execution
Instruction             Time (µs)     Comments

ADD R1, R2              3.5           Register-register
MOV R1, R2              3.5
MOV A(PC), B(R2)        11.55         PC-relative, indexed
TSTB (R1)+              5.25          Auto-indexed
JMP (R1)                4.2           Indirect
JSR PC, A(R1)           8.05          Subroutine call
Bxx L                   3.5           Conditional branch
RTI                     8.75-9.45     Rtn from interrupt
MUL*                    24-64
FADD*                   42.1
FMUL*                   52.2-93.7
FDIV*                   151-232

NOTES
R1, R2 = Registers
A, B   = Index constants
Bxx    = Any conditional branch
L      = 8-bit offset
*Third MICROM installed for EIS/FIS.

Address Space. Like other microcomputers
without memory mapping facilities, the LSI-11
virtual and physical address spaces are the
same, both being 16 bits, or 64 Kbytes. (Since
two 8-bit bytes make one 16-bit word, this is
equivalent to 32 Kwords.) As in other members
of the PDP-11 family, the top 4 Kwords of the
address space are normally reserved for peripheral device control and data registers. Thus the
nominal maximum main memory size is 28 K
16-bit words.
Interrupt Structure. The LSI-11 interrupt
structure is a subset of the full PDP-11 interrupt
system. Like other PDP-11 processors, the LSI-11
features arbitration between multiple peripheral devices and automatic-service routine "vectoring." It differs, however, in having only a
single interrupt level. Interrupts on the LSI-11
are either enabled or masked, these states being
equivalent to PDP-11 processor levels 0 and 4.
With this exception, however, interrupt operation follows the same familiar sequence. Upon
interrupt, the processor stores the current processor status word
(PSW) and program counter (PC) on the stack
and picks up a new PSW and PC from a memory location (vector) specified by the interrupting device.
Organization
PMS Level Description. The "organization" of a computer system denotes the collection of building blocks that comprise it, and the
logical and physical links that connect them. A
block diagram of the LSI-11 organization is
shown in Figure 2. The LSI-11 CPU, being a
microprogrammed processor, is partitioned
logically and physically into three main sections
- data path, control logic, and micromemory.
Each of these units is, in fact, a separate LSI
chip. Interconnection of these chips is through
the microinstruction bus (MIB).
The Data Chip. The data chip contains an
8-bit register file and arithmetic logic unit
(ALU). The chip also provides a 16-bit interface
to the data/address lines (DAL) upon which the
external LSI-11 bus is built.
The register file consists of 26 8-bit registers;
of these registers, 10 may be addressed directly
by the microinstruction, 4 may be addressed either directly or indirectly, and the remaining 12
may be addressed only indirectly. Indirect addressing is accomplished by means of a special
3-bit register known as the G register, which
may be easily loaded from the register address
field of the PDP-11 instruction. Addressing of
the register file is illustrated in Table 2.
The 12 indirectly addressed 8-bit registers are
used to realize the 6 PDP-11 general purpose
registers, R0 through R5. The 4 registers which
may be addressed either directly or indirectly
contain the PDP-11 program counter (PC) and
stack pointer (SP), since they provide special
processor functions and are accessed very frequently. The 5 remaining pairs of directly addressed registers are used for microprogram
workspace and normally contain the following:
(1) the PDP-11 macroinstruction, (2) the bus
address, (3) the source operand, (4) the destination operand, and (5) the macro PSW and other
status information.

Table 2.  Micromachine Register File Addressing

File        Directly    Indirectly   PDP-11
Registers   Addressed   Addressed    Equivalent
0-1                     x            R0
2-3                     x            R1
4-5                     x            R2
6-7                     x            R3
10-11                   x            R4
12-13                   x            R5
14-15       x           x            R6(SP)
16-17       x           x            R7(PC)
20-21       x                        IR
22-23       x                        BA
24-25       x                        SRC
26-27       x                        DST
30-31       x                        PSW

NOTES
SP  = Stack Pointer
PC  = Program Counter
IR  = Instruction Register
BA  = Bus Address
SRC = Source Operand
DST = Destination Operand
PSW = Processor Status Word

Figure 2.  Organization of the LSI-11 CPU.

The 8-bit ALU operates on two operands addressed by the microinstruction. When a full-word operation is specified, the data path is
cycled twice, with the low order bit of each register address complemented during the second
cycle. Thus a 16-bit macrolevel register is realized by two consecutive 8-bit registers in the
register file. An 8-bit operand may also be sign-extended and used in a 16-bit operation, or an
8-bit literal value from the microinstruction
may be used as one of the operands.
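
The two-pass operation described above can be sketched in a few lines. This is only an illustration, not the actual data-chip logic: the function names, the list used as a register file, and the assumption that the low-order byte sits at the even register address are all hypothetical.

```python
def add8(a, b, carry_in=0):
    """One pass through an 8-bit ALU: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16(regs, dst, src):
    """Add the 16-bit register pair at src into the pair at dst.

    The data path is cycled twice; on the second cycle the low-order
    bit of each register address is complemented (dst ^ 1, src ^ 1).
    Assumption: dst and src are even, and the even register holds the
    low-order byte.
    """
    carry = 0
    for cycle in (0, 1):
        d, s = dst ^ cycle, src ^ cycle
        regs[d], carry = add8(regs[d], regs[s], carry)
    return carry


regs = [0] * 26                 # 26 eight-bit registers, as in the text
regs[0], regs[1] = 0xFF, 0x12   # pair 0-1 holds 0x12FF
regs[2], regs[3] = 0x01, 0x00   # pair 2-3 holds 0x0001
carry_out = add16(regs, dst=0, src=2)
```

The carry produced by the first pass feeds the second, which is why a full-word operation costs two microcycles rather than one.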
In addition to the register file and ALU, the
data chip contains storage for several condition
codes. These include flags for zero or negative
results, as well as for carry or overflow; 4- or 8-bit carry flags are also provided for use in decimal arithmetic. Special flag-testing circuitry is
also provided for efficiency in executing PDP-11 conditional branch instructions.
The Control Chip. The control chip generates MICROM addresses and control signals
for external I/O operations. It contains an 11-bit location counter (LC), which is normally incremented after each MICROM access. The LC
may also be loaded by "jump" instructions, or
by the output of the programmable translation
array. A one-level subroutine capability is also
provided by an 11-bit return register (RR),
which may be used to save or restore the LC
contents.
The programmable translation array (PTA),
the heart of the control chip, consists of two
programmable logic arrays (PLAs); the PTA
generates new LC addresses which are a function of the microprocessor state and of external
signals. Included in the microprocessor state is
the 16-bit macroinstruction currently being
interpreted; in this way, much of the macromachine emulation may be done with the high
efficiency provided by the PTA. The combinational logic of the two PLAs allows the
PTA to arbitrate interrupt priorities, translate
macroinstructions, and, in general, to replace
the conventional "branch-on-microtest" microprimitive. Since the microlocation counter is
one of the PTA inputs, it is normally unnecessary to specify explicitly the desired translation
or multiway branch; this information is implicit
in the address of the microinstruction which invokes the PTA. External condition handling is
made possible by four microlevel interrupt lines
which are input to the PTA. Also feeding the
PTA are three internal status flags which are set
and reset under microprogram control.
The MICROM Chip. The micro read-only
memory, or MICROM, serves as the control
store for the microprocessor. The microinstruction width is 22 bits. Sixteen of these bits
comprise the traditional microinstruction; one
is used to latch a subroutine return address, and
one to invoke programmed translations; the remaining four bits (which drive TTL-compatible
outputs) perform special system-defined functions.
Each MICROM chip contains 512 words, or
one-fourth of the 2 K microaddress space.
Proper "chip-select" decode is accomplished by
masking a 2-bit select code (along with the
microcode) into each MICROM at the time of
manufacture; no external selection logic is required.
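
As a rough illustration of the chip-select arithmetic, an 11-bit microaddress can be split into a 2-bit MICROM select code and a 9-bit word offset. The sketch below assumes that the select code corresponds to the two high-order address bits; the text does not say which bits are compared, so that mapping, like the function name, is an assumption.

```python
WORDS_PER_MICROM = 512      # words in one MICROM chip
MICROADDRESS_SPACE = 2048   # 2 K microaddress space (11-bit LC)


def split_microaddress(lc):
    """Return (select_code, word_offset) for an 11-bit microaddress.

    Assumption: the select code masked into each MICROM matches the
    two high-order bits of the location counter.
    """
    if not 0 <= lc < MICROADDRESS_SPACE:
        raise ValueError("microaddress must fit in 11 bits")
    return lc // WORDS_PER_MICROM, lc % WORDS_PER_MICROM


# Microaddress 3002 (octal) falls in the fourth 512-word page (select code 3).
select, offset = split_microaddress(0o3002)
```

Because each chip recognizes its own select code, four MICROMs can share the bus with no external decoder, which is the point made in the paragraph above.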
The Microinstruction Bus. As seen in
Figure 2, microinstructions and microaddresses
share the microinstruction bus lines (MIB
00:21). Instructions thus fetched are executed
by the data chip while the next microaddress is
computed by the control chip. The bus design,
then, allows fully pipelined microinstruction execution, with data and control operations overlapped.


Microinstruction Repertoire. Using the accepted distinction between horizontal (unencoded) and vertical (highly encoded) microorder codes, the LSI-11 may be classified as an
extremely vertical machine. In fact, the microinstruction set strongly resembles the PDP-11
code it emulates; the two differ largely in addressing modes, not in primitive operations.
(Microinstruction formats are depicted in Figure 3, while a number of operation codes are
tabulated in Table 3.) This similarity of instruction sets is not accidental; while general-purpose emulation machines have a place, a
micromachine designed with the macro order
code in mind usually offers better performance.
Thus while many operations are general purpose, like Add, Subtract, Compare, Decrement,
And, Test, Or, Exclusive-Or, etc., others serve
primarily in the emulation of the macrolevel
PDP-11 instruction set, such as Read and Increment Word By 2 and so on. I/O primitives
allow for Read, Write, and Read-Modify-Write
operations, as well as special polling transactions.

Figure 3.  Microinstruction formats (jump, conditional jump, literal, and register formats).

Implementation
LSI Technology. The "implementation" of
the LSI-11, or how it is actually put together, is
a combination of both custom large-scale integration (LSI) and medium- and small-scale
transistor-transistor logic (TTL) integration.
The control, data, and MICROM chips are fabricated in n-channel silicon-gate four-phase
MOS. This technology was chosen as a reasonable compromise between performance expectations and development risks. Existing n-channel components exhibited the desired performance range, while other technologies (such
as CMOS silicon-on-sapphire) were perceived
as too risky for production during 1975 and
1976.
The micromachine operates with a nominal
cycle time of 350 nanoseconds. A simple primitive operation such as a register-to-register 8-bit
addition requires only one cycle, a marked
speed advantage over other available MOS
"processors on a chip." A comparable 16-bit
operation takes only two cycles. This intrinsic
performance of the LSI-11 "inner machine"
means extra flexibility when an application suggests the use of user-written microcode.
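
The 350-nanosecond cycle time can be related to the macroinstruction times of Table 1 with a little arithmetic. The sketch below assumes that the Table 1 figures are exact multiples of the nominal cycle; the dictionary and function names are illustrative only.

```python
CYCLE_NS = 350  # nominal microcycle time from the text

# Selected Table 1 execution times, converted from microseconds to nanoseconds.
TABLE1_NS = {
    "ADD R1, R2": 3500,
    "MOV A(PC), B(R2)": 11550,
    "TSTB (R1)+": 5250,
    "JMP (R1)": 4200,
    "JSR PC, A(R1)": 8050,
}


def microcycles(time_ns):
    """Number of whole microcycles in an execution time given in ns."""
    cycles, remainder = divmod(time_ns, CYCLE_NS)
    if remainder:
        raise ValueError("time is not a whole number of microcycles")
    return cycles


cycles_per_instruction = {op: microcycles(t) for op, t in TABLE1_NS.items()}
```

Under that assumption a register-to-register ADD occupies ten microcycles, of which only two are the 16-bit addition itself; the remainder presumably goes to instruction fetch, decoding, and condition-code handling.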
The CPU Module. The LSI-11 CPU, a
quad-height (21.6 cm X 26.7 cm) module, consists of the microprogrammed processor and a 4
Kword memory, together with bus transceivers
and control logic. The processor itself consists
of four 40-pin LSI parts - one control chip, one
data chip, and two MICROM chips. These two
MICROMs handle emulation of the basic PDP-11 instruction set. In addition, one extra 40-pin
socket is provided to allow the installation of a
third MICROM, implementing the extended-arithmetic and floating-point instructions. Optionally, a custom MICROM containing user
microcode may be installed in its place.

Table 3.  Some LSI-11 Microinstructions

Arithmetic Operations
Add word (byte, literal)
Test word (byte, literal)
Increment word (byte) by 1
Increment word (byte) by 2
Negate word (byte)
Conditionally increment (decrement) byte
Conditionally add word (byte)
Add word (byte) with carry
Conditionally add digits
Subtract word (byte)
Compare word (byte, literal)
Subtract word (byte) with carry
Decrement word (byte) by 1

Logical Operations
And word (byte, literal)
Test word (byte)
Or word (byte)
Exclusive-Or word (byte)
Bit clear word (byte)
Shift word (byte) right (left) with (without) carry
Complement word (byte)

General Operations
MOV word (byte)
Jump
Return
Conditional jump
Set (reset) flags
Copy (load) condition flags
Load G low
Conditionally MOV word (byte)

Input/Output Operations
Input word (byte)
Input status word (byte)
Read
Write
Read (write) and increment word (byte) by 1
Read (write) and increment word (byte) by 2
Read (write) acknowledge
Output word (byte, status)

The 4 Kword memory on board the CPU
module consists of sixteen 4 K dynamic n-channel random-access memories (RAMs). This
memory is implemented so as to logically appear on the external LSI-11 bus, while being
physically resident on the CPU module. Accessibility to the bus allows external Direct Memory Access (DMA) transfers to take place to
and from the basic 4-Kword memory. Furthermore, an optional jumper allows the CPU module memory to occupy either the first or second
4 K block of the bus address space. That is, it
may respond to address 000000-017776 or
020000-037776 as desired.
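
The address ranges quoted above, together with the reserved top 4 Kwords of the address space, can be checked with a small address classifier. This is a sketch under stated assumptions: the function name, the region labels, and the boolean standing in for the jumper are hypothetical, and addresses are taken to be 16-bit byte addresses written in octal, as in the text.

```python
BLOCK_BYTES = 4 * 1024 * 2          # 4 Kwords of 16 bits = 020000 octal bytes
IO_PAGE_START = 28 * 1024 * 2       # 28 Kwords = 160000 octal
ADDRESS_LIMIT = 1 << 16             # 16-bit address space, 64 Kbytes


def classify(address, second_block=False):
    """Classify a 16-bit byte address.

    second_block models the optional jumper that moves the CPU module
    memory from the first to the second 4 K block (hypothetical name).
    """
    if not 0 <= address < ADDRESS_LIMIT:
        raise ValueError("address must fit in 16 bits")
    if address >= IO_PAGE_START:
        return "io_page"
    base = BLOCK_BYTES if second_block else 0
    if base <= address < base + BLOCK_BYTES:
        return "cpu_module_memory"
    return "other_bus_memory"


first = classify(0o017776)                       # last word of block 0
moved = classify(0o020000, second_block=True)    # first word of block 1
```

The constants reproduce the figures in the text: 4 Kwords span 000000-017776, and the 28 Kwords below the peripheral registers end at 157776.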
Available Memory Options. The LSI-11
macromemory is available in several forms;
these include semiconductor random-access
memories (RAM), ROM (or PROM), and magnetic core.
Both static and dynamic semiconductor
memories are available. The MSV11-A is a
1024-word static RAM, packaged on a double-height (21.6 cm X 12.7 cm) module. It may be
used when dynamic memory is not desired. The
MSV11-B is a 4-Kword dynamic memory,
again packaged on one double-height module.
The availability of automatic memory refresh
(discussed in a later section) will in many cases
make the dynamic memory a more attractive alternative than core or static semiconductor
RAM.
The use of a ROM for program storage is often desirable; not only is the program safe from
unintentional modification, but no external device is needed to load the system each time it is
started. The LSI-11 instruction set is well suited
to ROM program storage, since program and
data are easily separable. To take advantage of
this, the LSI-11 series includes a ROM module
(designated the MRV11-AA); either a masked
ROM or a programmable ROM (PROM) may
be used. This memory uses standard 256 X 4 or
512 X 4 ROM or PROM chips, to a maximum
of 2 Kwords or 4 Kwords depending on the
chips employed. Programmable ROMs may be
used for program development, and less expensive masked ROMs substituted for production
use.
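
The capacity figures above imply a chip count that is easy to work out. The following sketch is only arithmetic on the numbers quoted in the text (chip organizations and maximum capacities); the function name is illustrative and nothing here is taken from the module's actual layout.

```python
def chips_needed(words, word_bits, chip_words, chip_bits):
    """Chips required to build a words x word_bits memory from
    chip_words x chip_bits parts (ceiling division on both axes)."""
    chips_across = -(-word_bits // chip_bits)   # chips side by side per word
    banks = -(-words // chip_words)             # groups of chips stacked in depth
    return chips_across * banks


small = chips_needed(2 * 1024, 16, 256, 4)   # 2 Kwords from 256 x 4 parts
large = chips_needed(4 * 1024, 16, 512, 4)   # 4 Kwords from 512 x 4 parts
```

Both configurations come to the same number of packages, which would explain why the maximum capacity depends only on the chips employed.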


For applications that require nonvolatile
READ/WRITE memory, a 4-Kword core
memory (the MMV11-A) is available. This
memory occupies two quad-height modules,
and must overhang the last slot in a backplane
unit.
THE USER'S OUTLOOK
Interfacing to the LSI-11

The LSI-11 Bus. The LSI-11 bus (Table 4)
serves as the link between the processor, memory, and peripheral devices. Narrower (in terms
of the number of signal lines) than some other
minicomputer buses, it was designed to allow
low cost peripheral interfaces for microcomputer applications, rather than to support
the wide range of peripheral configurations
common in large minicomputer systems. The
wider PDP-11 Unibus, for example, is better
suited to larger systems in which CPU and
interconnection comprise a smaller part of the
total system cost.

Table 4.  The LSI-11 Bus

Bus Signal      Signal Function
BDAL 00:15 L    Buffered data/address lines (time-multiplexed)
BDIN L          Data input transfer control line
BDOUT L         Data output transfer control line
BSYNC L         Synchronizing control signal; asserted by bus master (normally CPU)
BRPLY L         Reply control signal; returned by bus slave (memory or peripheral device)
BWTBT L         Write/Byte control:
                  At address time, specifies a write
                  At data time, a byte output
BBS7 L          Marks an address in the range 28 K - 32 K, the "I/O page"
BREF L          Signals a refresh transaction; overrides normal memory addressing for
                dynamic memories
BIRQ L          Interrupt request from device
BIAK I L        Interrupt grant in
BIAK O L        Interrupt grant out; used with BIAK I to arbitrate interrupt priority
BDMR L          Direct Memory Access (DMA) request line
BDMG I L        DMA grant in
BDMG O L        DMA grant out; like BIAK
BSACK L         Bus DMA acknowledge
BHALT L         Forces entry to ASCII console microcode
BEVNT L         External event line; used with real-time clock
BINIT L         Bus initialize signal
BPOK H          Power OK line from supply
BDCOK H         DC power OK, from supply

To reduce the number of bus signals, sixteen
bidirectional lines (BDAL 00:15) are time-multiplexed between data and address. Transfers on these lines are sequenced by several control lines. BSYNC signals that a bus transaction
is in progress and clocks address decoding logic;
BDIN and BDOUT request input and output
transfers, respectively; BWTBT is used to distinguish word and byte output transfers;
BRPLY is returned by the bus slave when data
is ready or has been accepted. A special address
line, BBS7, indicates that the bus address is in
the range of 28 K-32 K; this simplifies peripheral device design by indicating that the "I/O
page" is being addressed.
Two bus signals, BIRQ and BIAK, are used
to control processor interrupts. An interrupting
device asserts BIRQ and waits for an interrupt
transaction from the CPU. When the proper
conditions have been met, the CPU, which remains bus master, strobes the interrupting devices by asserting BIAK. During this bus cycle,
BIAK is "daisy-chained" through all peripherals, allowing priority arbitration to take place.
The selected device then places an interrupt vector address on the bus and returns BRPLY, terminating the transaction. In a similar manner,
BDMR, BDMG, and BSACK are used to control requests for direct memory access transactions by other peripherals desiring to become
bus master. The lines BINIT, BPOK, and
BDCOK are used for system reset and power-fail/restart.
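
The daisy-chained grant described above can be modeled very simply. The sketch below assumes that devices are listed in the order in which the grant reaches them and that the first requesting device absorbs it; the device names and vector values are invented for illustration.

```python
from typing import NamedTuple, Optional


class Device(NamedTuple):
    name: str
    requesting: bool   # True if the device is asserting BIRQ
    vector: int        # interrupt vector address the device would supply


def grant_interrupt(chain) -> Optional[Device]:
    """Pass the grant (BIAK) down the chain in order.

    A device that is not requesting forwards the grant from its
    BIAK I input to its BIAK O output; the first requesting device
    keeps the grant and is selected. Returns None if no device
    is requesting.
    """
    for device in chain:
        if device.requesting:
            return device      # grant stops here; device supplies its vector
    return None


# Hypothetical configuration, listed in the order the grant arrives.
chain = [
    Device("clock", False, 0o100),
    Device("adc", True, 0o300),
    Device("terminal", True, 0o060),
]
selected = grant_interrupt(chain)
```

In this model priority is purely positional: when two devices request at once, the one earlier in the chain wins and the other keeps its request asserted until a later grant.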
Three other bus lines perform additional system functions; these are BREF, BHALT, and
BEVNT. BHALT is used to stop PDP-11 emulation and enter console mode; BREF and
BEVNT are used for microcode refresh of dynamic memories and real-time clock operation,
to be discussed in a later section.
Standard Modules. To assist the system
designer, the LSI-11 series includes several
standard interface modules. Currently available
are both serial and parallel I/O interfaces. The
DLV-11 handles a single asynchronous serial
line at speeds of 50-9600 baud, while the DRV-11 provides a full 16-bit parallel interface complete with two interrupt control units. The
DRV-11 is completely compatible with the DR11-C interface used with other PDP-11s. In order to facilitate program loading when volatile
memory is used, a flexible disk drive and interface is also available. This unit, the RXV-11,
employs industry-standard media and formatting.
An Interfacing Example. The design of a
simple interface to the LSI-11 system is pictured
in Figure 4. Here, the problem is to interface an
analog-to-digital (A/D) converter and a four-digit light-emitting-diode (LED) display. The
A/D converter is presumed to have a resolution
of 8-16 bits, and the LED display is driven as
four binary-coded-decimal (BCD) digits of four
bits each. To simplify the design further, the
standard DRV-11 parallel interface module is
employed.

Figure 4.  An interfacing example.

On the input side, the data lines from the
A/D converter are connected to the input lines
(IN 00:15) of the DRV-11, and the End-of-Conversion signal (EOC) from the A/D is fed to
one of the interface's interrupt request lines
(INT REQ A). If the processor enables the interrupt control in the interface, the EOC signal
will now cause an interrupt, and the CPU may
read in the data. To initiate sampling of the
analog input signal, a control line (Start Conversion) is needed; this is controlled by an output line (CSR0) from the DRV-11.
On the output side, the data lines (OUT
00:15) from the DRV-11 are fed directly to the
seven-segment decoder drivers which control
the LED displays. The processor may then
write out a single 16-bit word containing four
BCD digits, and the data will appear in the display. Since a second interrupt input (INT REQ
B) is available, an operator pushbutton is connected to this line; by interrupting the processor, the user may request a new sample from the
A/D converter or perform some other function.
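
The 16-bit display word mentioned above holds four BCD digits of four bits each. As a minimal sketch, assuming the most significant digit occupies the high-order four bits (the text does not specify the digit order), a value from 0 to 9999 could be packed as follows; the function name is hypothetical.

```python
def pack_bcd4(value):
    """Pack a number 0..9999 into one 16-bit word of four BCD digits.

    Assumption: the thousands digit goes in bits 15-12 and the units
    digit in bits 3-0.
    """
    if not 0 <= value <= 9999:
        raise ValueError("value must have at most four decimal digits")
    word = 0
    for digit_char in f"{value:04d}":
        word = (word << 4) | int(digit_char)
    return word


display_word = pack_bcd4(1234)   # equals 0x1234
```

Written to the parallel interface's output lines, such a word would present one digit to each seven-segment decoder driver.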
To aid the designer in applying the LSI-11,
detailed interfacing information is available
[DEC, 1975a; DEC, 1975b]; these manuals
cover both the standard interface modules and
the methods used to interface directly to the
LSI-11 Bus (Figure 5). In most cases, peripheral
interface design is a little simpler than in the
case of the traditional PDP-11 Unibus.
Special Features

Several special features of value in low cost
systems have been implemented in the LSI-11
microcode. These include an ASCII console, a
real-time clock, an automatic dynamic memory
refresh, flexible power-up options, and internal
maintenance features.
ASCII Console. The LSI-11 ASCII console
serves to replace the conventional "lights and
switches" front panel often associated with
minicomputer operation. The ASCII console
functions with a standard terminal device which
communicates over a serial or parallel link at
any desired rate. The available functions are
very similar to those of PDP-11 octal debugging
technique (ODT), which is familiar to users of
other PDP-11 systems. These include examination and alteration of the contents of memory and processor registers, calculation of
effective addresses for PC-relative and indirect
addressing, and the control functions of Halt,
Single-Step, Continue, and Restart. Internal
processor registers are also accessible, making
possible a determination of the type of entry to
the console routines (Halt instruction, etc.).
The advantages of the ASCII console include
low cost, remote diagnostic capability, and
high-level operator interface. The user retains
all the direct hardware control of a conventional front panel, while being freed from
tedious switch register operation. This use of
the terminal device in no way conflicts with its
normal use by the program being debugged.
The ASCII console routines also allow the user
to boot load from a specified device in a byte
transfer mode. All together, the ASCII console
routines occupy about 340 words of microcode;
since this space is available in the second
MICROM, the console functions are made possible at no extra cost.

Figure 5. The LSI-11 series contains the LSI-11 CPU (center), together with parallel and
serial interfaces, and RAM and ROM memory modules. These modules may be housed in a
backplane assembly, connected by the LSI-11 bus.

Real-Time Clock. Many low-end configurations require a real-time clock, driven by
the power-line frequency or other timing signal,
which is normally implemented with external
control logic. To save this expense, such a device has been programmed into the LSI-11 processor microcode. To use this clock, the user
need only connect the timing signal to the processor through the bus line BEVNT. Once connected, this clock is identical to the KW11-L
line clock when used in an interrupt mode, except that it may not be turned on and off. An
optional jumper disables the real-time clock if
its operation is not desired.
Automatic Dynamic Memory Refresh.
One disadvantage of using dynamic MOS memories is the necessity of refreshing their contents
at appropriate intervals. This refresh operation
is needed to replace the stored charge in each
memory cell which has been lost through leakage current. In typical dynamic MOS memories, each cell must be refreshed every 2
milliseconds. Most dynamic memories are implemented in such a way that any normal memory access refreshes a group of cells (or "row")
on all selected memory chips. One access must
then be made to each row of every memory
chip; the 4 K memories used in the LSI-11 system require that 64 accesses be made. Normally, the logic to control the refresh operation
would include a 6-bit counter, a clock, and
memory access arbitration circuitry.
In order to minimize this control circuitry,
the LSI-11 CPU microcode features automatic
refresh control. When enabled by an optional
jumper, the CPU takes a refresh trap approximately every 1.6 ms. At this time, it performs
64 memory references while asserting a special
bus signal, BREF. This signals all dynamic
memories to cycle at the same time. Direct
Memory Access (DMA) requests are arbitrated
between bus refresh cycles to reduce DMA
latency. External interrupts, however, are
locked out during the burst refresh time, temporarily increasing interrupt latency. (When this
latency cannot be tolerated, external refresh
circuitry can drive the bus and assert BREF, allowing use of either refresh method with the
same memory modules.) The automatic refresh
feature is not needed, of course, in systems
without dynamic memories.
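
The refresh figures quoted above fit together as follows. This sketch only restates the arithmetic from the text (64 rows, a 2-millisecond retention limit, a trap roughly every 1.6 ms); the names are illustrative, and the 64-cells-per-row figure is derived here on the assumption that a 4 K memory chip holds 4096 cells.

```python
CELLS_PER_CHIP = 4096           # assumed size of a "4 K" memory chip
ROWS_PER_CHIP = 64              # accesses needed per refresh, from the text
RETENTION_LIMIT_MS = 2.0        # each cell must be refreshed this often
REFRESH_TRAP_INTERVAL_MS = 1.6  # approximate microcode refresh trap period


def counter_bits(rows):
    """Bits needed for a counter that steps through every row address."""
    return (rows - 1).bit_length()


cells_per_row = CELLS_PER_CHIP // ROWS_PER_CHIP
bits = counter_bits(ROWS_PER_CHIP)
refresh_in_time = REFRESH_TRAP_INTERVAL_MS <= RETENTION_LIMIT_MS
margin_ms = RETENTION_LIMIT_MS - REFRESH_TRAP_INTERVAL_MS
```

Since all 64 rows are visited in one burst per trap, each cell is rewritten about every 1.6 ms, leaving a margin of roughly 0.4 ms against the 2-millisecond limit.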
Power-Fail/Restart Options. The flexibility of the LSI-11 system is further enhanced
by the availability of several power-fail/restart
options. The power-fail sequence, which is normally of use only with nonvolatile main memory, is compatible with other members of the
PDP-11 family. Upon sensing a warning signal
from the power supply, the power-fail trap is
taken. The current PSW and PC are pushed on
the processor stack, and a new PC and PSW are
taken from a vector at octal location 24. Normally, the routine thus invoked would save processor registers, set up a restart routine, and
HALT. When volatile memory is used, the register may not be saved; in this case, the power-fail trap allows an orderly system shut-down to
occur.
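
The trap sequence just described can be sketched as a few operations on a simplified machine state. The dictionary-based memory, the state field names, and the sample values are assumptions made for illustration; the word order shown (PSW pushed first, new PC at the vector address and new PSW in the following word) is the usual PDP-11 convention.

```python
POWER_FAIL_VECTOR = 0o24   # octal location 24, from the text


def take_trap(memory, state, vector):
    """Push PSW and PC on the stack, then load a new PC and PSW.

    memory is a dict mapping even byte addresses to 16-bit words;
    state holds 'pc', 'psw', and 'sp' (all hypothetical names).
    """
    state["sp"] -= 2
    memory[state["sp"]] = state["psw"]     # old PSW pushed first
    state["sp"] -= 2
    memory[state["sp"]] = state["pc"]      # old PC ends up on top of stack
    state["pc"] = memory[vector]           # new PC from the vector
    state["psw"] = memory[vector + 2]      # new PSW from the next word


# Example values are invented: handler at 001000, new PSW of 000340.
memory = {POWER_FAIL_VECTOR: 0o1000, POWER_FAIL_VECTOR + 2: 0o340}
state = {"pc": 0o2000, "psw": 0o000, "sp": 0o500}
take_trap(memory, state, POWER_FAIL_VECTOR)
```

With nonvolatile memory the saved PC and PSW survive the outage, so a restart routine can later pop them and resume where execution stopped.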
Four power-up options are selected by two
jumpers on the LSI-11 CPU module. The first
of these is to load a previously set-up PSW and
PC from the vector at location 24. Normally
used with nonvolatile memory to continue
execution from the power-fail point, this option
is compatible with the normal PDP-11 power-up sequence. If ROM program storage is
employed, this option allows the program to be
started at an arbitrary address. If the BHALT
line on the bus (the HALT switch) is asserted
during this power-up sequence, the console
microcode will be entered immediately after
loading the PSW and PC.


The second power-up option causes an unconditional entry to the ASCII console routines. This allows remote system startup
without the necessity of controlling the bus Halt
line. The processor may then be started, as
usual, by an ASCII console command.
The last two options allow program execution to begin at a specified address in either
macrocode or microcode. Option three sets the
macro PC to 173 000 octal and starts normal
execution. Option four causes a jump to microcode location 3002 octal, in the fourth
MICROM page. Here, the CPU expects to find
a user-written microcode routine to perform a
special power-up sequence. The state of the
BHALT line is not checked in this last case until
the execution of the first macrocode instruction
is completed.
The Maintenance Instruction. For ease in
hardware checkout, a special maintenance instruction is included in the LSI-11 repertoire.
This instruction stores the contents of five internal registers in a specified block in the main
memory. The information may then be used by
a diagnostic program to probe the internal operation of the microlevel processor.
The LSI-11 as a Member of the PDP-11
Family
Upward Compatibility. Because the basic
instruction set of the LSI-11 processor is that of
the entire PDP-11 family, the user has an extremely large range of compatible processing
systems at his disposal. This range extends from
the LSI-11 on the low end to the PDP-11/70 on
the high end. The consistency of the instruction
set provides economies in training and documentation costs, as well as the ability to carry
specific application programs, or even complete
operating systems, from one family member to
another. Thus, a user currently employing a
small PDP-11, like the PDP-11/05, can easily
convert to the low cost LSI-11 without losing a
past investment in software development. This
compatibility also eases the program development problems often associated with microcomputer systems; assembly, compilation, and
initial debugging may be done on any PDP-11
system, with the generated code loaded into an
LSI-11 system for testing and final debug.
Through the use of the LSI-11 ASCII console, a
central PDP-11 system may initialize, load, and
start up a remote LSI-11 system over an asynchronous serial line or other link.
Software Support. Other members of the
PDP-11 family, beginning with the Model 20
(Chapter 9), have been in service for some time.
Thus the system designer has at immediate
hand a large number of language processors,
utility routines, and application programs.
Many of these programs will run with little or
no modification on an LSI-11 system. This existing library of software provides the user with
a head start in the application of microcomputers, at little or no development cost.
Network Capability. Since the LSI-11
shares a common set of data-types and file
structures with other PDP-11 systems, many
communication problems disappear. When
linked through line protocols such as DDCMP
(digital data communications message protocol)
[DEC, 1974; DEC, 1974a], LSI-11s may exchange programs and files with other PDP-11s
without adjustments for differing word sizes,
operating systems, file structures, etc. This fact
makes the LSI-11 the ideal choice for a network
node processor. Used with distributed programming systems such as RSX-11, RSTS, or
RT-11, the individual LSI-11 processors may
not even require their own mass storage devices,
but rather share those of other network nodes.
A monitoring network might then consist of a
large central PDP-11 with disks, magnetic tape
units, and other peripherals, together with several remote LSI-11s which would directly control transducers and communication lines. Yet,
even in such a functionally differentiated system, all processors would be homogeneous in
instruction set; the distributed nature of the network need not even be visible to the user.
SUMMARY

The LSI-11, then, is the first of a new class of
microcomputers and offers the user most of the
advantages of a full-blown minicomputer at a
significantly lower cost. It is, in fact, the first
member of the PDP-11 family ever offered as a
single-board component to original equipment
manufacturers and others. Gaining power and
flexibility from its microprogrammed design,
the LSI-11 provides a number of important system features not yet found in other LSI microcomputers. With its minicomputer-compatible
instruction set, the LSI-11 offers a new level of
microcomputer accessibility and ease of use.
Whether seen as low-end minicomputers or
high-end microcomputers, machines like the
LSI-11 serve to bridge the gap which has separated minicomputer performance and convenience from microcomputer economy and
flexibility.
And so, the computer revolution continues;
from the maxi to the mini to the micro, the
number and breadth of computer applications
continue to grow. The DEC LSI-11, a microprogrammed minicomputer-compatible microcomputer system, contributes to this growth.
The LSI-11 is an important step in this continuing evolution; it will certainly not be the last.
For both designers and users of this new generation of computer systems, there remain many
interesting days ahead.

ACKNOWLEDGEMENTS

The author wishes to express his gratitude to
the many people who helped in the preparation
and review of this paper, especially S. Teicher,
M. Titelbaum, D. Dickhut, R. Olsen, and R.
Eckhouse.

Design Decisions for the
PDP-11/60 Mid-Range Minicomputer
J. CRAIG MUDGE

INTRODUCTION

Design evolution of a minicomputer family
usually proceeds along three basic dimensions:
cost, functionality, and size. That is, the minicomputer becomes cheaper, more powerful,
and smaller with time. The underlying hardware technology is the dominant factor in determining the evolution. In contrast to the
evolution of large computers, market factors
have less influence on the growth pattern of
minicomputers. However, minicomputer software characteristics are affected by the market.
These requirements rapidly feed down to modify the hardware, given that the technology will
support user needs.
The DEC PDP-11/60 serves to demonstrate
minicomputer designing with improved technologies. Being a mid-range machine, i.e., neither the lowest in cost nor the highest in
performance, its design is a rich source of
tradeoff examples. Its cache design illustrates a
price/performance trade; the decreasing cost of
read-only memories (ROMs) shows how
hardware-microcode tradeoffs change over
time, and its integral floating-point arithmetic
unit exemplifies a software-hardware tradeoff.

DESIGN STYLES

Equipment history reveals that a member is
added to a minicomputer family whenever technology advances by a factor of 2; for example,
doubling of bit density on a memory chip. Over
the past 15 years, such an improvement has occurred about every two years.
These advances in technology can be translated into either of two fundamentally different
design styles. One provides essentially constant
functionality at a minimal price (which decreases over time); the second keeps cost constant and increases functionality. (Here, and in
the discussion to follow, the definition of functionality has been broadened from its conventional single component, speed, to include
components such as extended instructions and
self-checking.) Both design approaches coordinate with the basic marketing philosophy of the
minicomputer industry: more computation for
more users at less cost. There have been ten
models, or implementations, of the PDP-11 architecture since the unit was introduced in 1970
(Chapter 9). Figure 1 illustrates how the two design styles affected successive implementations
within this minicomputer family.

[Figure 1 plot: cost versus time. Curves are labeled CONSTANT COST and CONSTANT FUNCTIONALITY, DECREASING COST.]

Figure 1. Minicomputer family evolution. Advances in
technology translate into two design styles: constant
cost/increasing functionality and constant functionality/decreasing cost. The PDP-11/60 represents former
design style. Functionality added to PDP-11/40 is depicted by shaded area. Tradeoffs discussed occur within
this area.

Figure 2. Internal structure. Cache placement between
Unibus and CPU permits faster execution and allows use
of standard memories. However, DMA monitoring mechanism is needed for traffic on path CBA. Module count is
six for CPU and cache, one for writable control store, one
for microdiagnostics unit, and four for floating-point processor. This processor operates in parallel with CPU execution of nonfloating-point instructions; instruction times
are 1.02 µs for double-precision add and 1.53 µs for
single-precision multiply. Writable control store uses
1024 control words that are reloadable and that control
170 ns inner machine. Machine is design optimized for
user environment characterized by real-time operating
system and FORTRAN.

Lower cost members trace the decreasing
cost/constant functionality curve. (This is the
11/20, 11/05, and LSI-11 or 11/03 line.) The
horizontal line in Figure 1 connects the constant cost/increasing functionality designs.
(Not shown are "growth-path" members that
provide greater performance at slightly increased costs; 11/45, 11/55, and 11/70
machines trace an upward growth-path curve.)
Shaded area in the figure represents the added
functionality possible through technology advances. Mid-range minicomputers attempt to
optimize price/functionality and, hence, offer
an excellent vantage point for discussing design
tradeoffs made under the constant-cost design
style.
In addition to the capabilities provided by
technological advances, a mature family architecture and user base allows the minicomputer
designer to include those capabilities that were
not considered feasible in the original architecture. These features may not have been included because they were too costly to
implement, not sufficiently general purpose to
justify their inclusion, or not perceived as being
essential to users. Reliability, maintainability,
the integral floating-point unit, and the writable
control store (WCS) option represent such
capabilities.
Internal structure of the 11/60 (Figure 2) incorporates a 2048-byte cache, memory management unit (for virtual-to-physical address
translation), and an integral floating-point unit
as standard components. The unit can perform
a register-to-register add instruction in an average time of 530 ns; internal cycle time is 170 ns.
Available as options are a floating-point processor, which implements at higher speed the
same 46 instructions as the integral unit, a writable control store, and a microdiagnostic unit.
ADVANCES IN MEMORY TECHNOLOGY

Improvements in memory technology have
been the principal forces in minicomputer developments. Memory is the most basic component of a computer, and it is utilized
throughout the design. In addition to obvious
uses as main program and data memory, and as
file storage devices (disks and tapes), memory is
also located within the central processor in the
form of registers, state indicators, control, and
buffer storage between the central processor
and main (primary) memory. In input/output
(I/O) devices, there are buffers and staging
areas. Memory can be substituted for nearly all
logic by substituting table lookup for computation.
The constantly increasing bit density mentioned previously has been the most dramatic
development in memories. For example, bipolar read-write or random-access memory
(RAM) chips have advanced as follows.

Year When First        Number
Widely Available       of Bits

1969-70                16
1971-72                64
1973                   256
1975                   1024
1977                   4096

Cost reductions have paralleled bit density
increases. A consequence of high density RAM
technology is that cache memories are now extensively used in mid- and upper-range minicomputers. Bipolar ROM densities have led
RAM densities by about a year. Thus, the 2048-bit ROM, organized as 512 X 4, was available in
1975.
These factors have made microprogrammed
control increasingly attractive to the minicomputer designer. While large-scale computers
utilized extensive microprogramming during
the 1960s, it was not a cost-effective choice for
the minicomputer designer because of the prohibitive cost of the read-only storage technology then available.

[Figure 3 plot: cost versus functionality, with X1, X2, X3, and PDP-11 marked on the functionality axis.]

Figure 3. Semiconductor technology trends in control
implementations. Cost comparisons, at three different
points in time, of conventional hardwired control and advanced microprogrammed control show two important
trends. First, at fixed point in time in 1970s (e.g., time
t3), microprogrammed control is less expensive above
certain level of complexity (x3). For simplest type of machine, random logic gives most economical design. Microprogrammed design has base cost associated with
address sequencing and memory selection circuitry. Microprogrammed control cost increases slowly with number of sequencing cycles, which are added as complexity
increases, because each additional cycle requires one additional word of control store. Second, because rate of
cost-decrease for memories is greater than the rate for
random logic, crossover points move with time, gradually
shifting in favor of microprogrammed control. When
11/20 was designed (time t1) hardwired controls were
cheaper. Its successor, the 11/40, was designed at time
t2 and used microprogramming. The 11/60, at time t3,
used increased microprogramming.

Both hardwired control devices and microprogrammed control devices have curves that
trace increases in cost as they implement increasing functionality (Figure 3). However, the
rate of cost increase is less for microprogrammed controls than for hardwired controls. Davidow [1972] demonstrates that a
factor of 4 difference exists between the two
slopes.
At some point, the two related hardwired and
microprogrammed curves cross. Beyond that
intersection, microprogrammed controls are
more economical to use in a design. Both of
these curves are moving downward in cost with
time, but the curve for microprogrammed controls is moving downward at a faster rate. Thus,
the intersection point of the two curves is gradually shifting in favor of microprogrammed
controls because the two technologies are moving at different rates. The PDP-11 family offers
an example of this trend. At the time the 11/20
was designed, the crossover point was to the
right of the PDP-11 instruction set on the abscissa. Hence, the 11/20 used hardwired controls. However, all subsequent implementations
have used a ROM-controlled microprogrammed processor. O'Loughlin [1975] contrasts the control implementations of four
members of the family.
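
The crossover argument can be illustrated with a small linear cost model. This is only a sketch: the base cost and slopes below are hypothetical numbers chosen so that the hardwired slope is four times the microprogrammed slope, as Davidow's factor suggests; they are not measurements from any PDP-11 design.

```python
def hardwired_cost(x, slope=4.0):
    """Cost of hardwired control at functionality level x (hypothetical linear model)."""
    return slope * x


def microprogrammed_cost(x, base=30.0, slope=1.0):
    """Cost of microprogrammed control: a fixed base cost (address sequencing,
    memory selection) plus a shallower per-function slope (hypothetical values)."""
    return base + slope * x


def crossover(base=30.0, hard_slope=4.0, micro_slope=1.0):
    """Functionality level at which the two cost lines intersect."""
    return base / (hard_slope - micro_slope)


# As the base cost of the control store falls, the crossover moves toward
# simpler machines, which is the shift described in the text.
for base in (60.0, 30.0, 15.0):
    print(base, crossover(base=base))
```

In this model, a machine whose functionality lies to the left of the crossover is cheaper to build with hardwired control, and one to the right is cheaper to microprogram.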
Instruction decode on the 11/60 provides an
example of a different use of ROMs. For the
secondary decode (the primary is done by combinational logic), part of the instruction register
addresses a ROM in which control-store-address offsets are stored. This data-table approach offers both a component saving and a
more systematic design. Another example is a
ROM-stored table that inspects memory addresses to detect those that refer to locations internal to the processor.
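
A table-driven secondary decode of this kind might be sketched as follows. The field position, table size, table contents, and base address are all assumptions made for illustration; the actual 11/60 ROM layout is not given here.

```python
# Hypothetical decode ROM: 16 entries, each a control-store address offset.
# The real 11/60 table contents are not described in the text.
DECODE_ROM = [i * 0o10 for i in range(16)]


def secondary_decode(instruction, base=0o400):
    """Return a control-store address for a 16-bit instruction word.

    Assumes (for illustration only) that the top four bits of the
    instruction register address the ROM and that the stored offset is
    added to a fixed base address.
    """
    field = (instruction >> 12) & 0xF
    return base + DECODE_ROM[field]


print(oct(secondary_decode(0o010203)))
```

The point of the approach is that changing the dispatch behavior means changing table contents rather than redesigning combinational logic.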
Other advances in semiconductor technology
that have affected the minicomputer designer's
task include the development of 3-state logic devices and greater levels of gate integration in
logic chips. Widely available in 1975, 3-state
logic encourages bus-oriented designs. Six 3-state buses are implemented in the 11/60.
Examples are the 48-bit-wide control signal bus
in the CPU and the 60-bit-wide fraction data
and 10-bit-wide exponent data buses in the
floating-point processor.
Increased gate integration in logic chips had
its major impact on constant-cost minicomputers when the design evolution moved
from the 11/20 to the 11/40. The latter machine
made heavy use of medium-scale integration
(MSI). The MSI available to 11/60 designers
had negligible density gains over that available
to the 11/40 designers. However, after the basic
technology decision for the 11/60 was made, a
significant step in gate integration occurred.
The bit-slice technology, as typified by the 4-bit-wide bipolar Am2901 microprocessor slice,
became widely available. A 1977 technology decision for a mid-range minicomputer would
clearly choose bit-slice components. For the
11/60, however, improvements came from the
introduction of 3-state logic and from availability of a wider range of Schottky logic components.
Three semiconductor technology advances
contributed to the 11/60 price/performance design in differing degrees. Most important was
the cost reduction in ROMs, next was the density improvement in RAMs, and third was the
(minor) increase in random logic density.
PRICE/PERFORMANCE BALANCE

Two components, the cache memory and the
medium-bandwidth I/O structure, demonstrate
the price/performance balance characteristic of
the 11/60 mid-range minicomputer.
Cache is now a well-proven technique in
computer memory implementation. Its purpose
is to achieve the effect of an all-high-speed
memory by using two memories - one slow
(primary) and one fast (cache) - and by taking
advantage of the fact that, most of the time,
data being used is in the fast or cache memory.
Programs typically have the property of locality; that is, over short periods of time, most accesses are to a small number of memory
locations. The hardware algorithm managing
the cache attempts to keep copies of these locations in the cache. The term "hit ratio" is used
to describe the proportion of requests for data
or instructions that are satisfied by reference
only to the cache. Alternatively, "miss ratio" is
the complement of hit ratio. Performance is determined by the hit ratio, which is a function of
several cache organizational parameters, including: (1) cache size, (2) block size (amount of
data moved between the slow or primary memory and the cache), and (3) form of address
comparison used.
Strecker (Chapter 10) describes the research
that led to the use of a cache memory in the
11/70. His simulation models were also used in
the 11/60 design. By comparing the designs of
these machines, several tradeoffs made to obtain a lower cost memory system appropriate to
the mid-range 11/60 can be noted.
The first parameter to be determined was the
amount of data to be moved between primary
memory and cache. This decision was closely
related to the width of the internal memory bus
connecting I/O devices to primary memory.
Since the 11/70 was planned to support several
high speed Direct Memory Access (DMA) devices, (e.g., swapping disks operating concurrently), its designers provided a 32-bit bus to
memory to supplement the 16-bit-wide Unibus.
Because the target 11/60 users do not require
such a large I/O bandwidth, the Unibus is used
for DMA traffic. The 11/70 cache has a block
size of two 16-bit words and transfers 32 bits
from memory to cache across its dedicated
memory bus. Since the 11/60 uses the 16-bit
Unibus as its memory bus, the simplest block
size - one 16-bit word - was chosen. Note that a
2-word block size can be achieved with a 16-bit
bus; the bus is cycled twice to effect a 2-word
transfer. Cache simulations showed that this
bus cycling would raise the hit ratio of the
11/60 from 87 to 92 percent. However, the associated performance gain was judged not to be
worth the significant added cost of the extra
control logic needed to cycle the bus twice.
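
The size of the gain at stake can be estimated with the usual weighted-average access time. In this sketch only the hit ratios of 0.87 and 0.92 come from the simulations cited above; the cache and primary memory access times are hypothetical placeholders.

```python
def effective_access_ns(hit_ratio, cache_ns, memory_ns):
    """Average access time when a fraction hit_ratio of references is served
    by the cache and the remainder goes to primary memory."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


# Hypothetical timings, chosen only for illustration.
CACHE_NS = 170.0
MEMORY_NS = 1000.0

one_word_block = effective_access_ns(0.87, CACHE_NS, MEMORY_NS)
two_word_block = effective_access_ns(0.92, CACHE_NS, MEMORY_NS)
print(one_word_block, two_word_block, one_word_block / two_word_block)
```

Under these assumed timings the 2-word block gives an improvement on the order of 15 to 20 percent in average access time, which then has to be weighed against the added control logic.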
The next decision concerned the size of the
cache. Simulation results showed that the miss
ratio decreases rapidly for cache sizes up to
1024 words and less rapidly for larger sizes. But
how should the 1024 words be partitioned? Because a full-associative cache requires an expensive content-addressed memory, the
partitioning choice for minicomputers is for a
set-associative cache. Since a complete discussion of associativity and replacement is beyond the scope of this article, the reader is
referred to the papers by Meade [1971] and
Strecker (Chapter 10).
Degree of associativity and total cache size
were dominated by the form factors of two candidate RAM chips (256 X 1 and 1024 X 1).
These factors are illustrated in Figure 4. The
following list shows the clear price/performance
advantage of the chosen 1024-word, set-size-of-one cache.

RAM Chip     Set     Cache Size     RAM Chip     Hit
Capacity     Size    (words)        Count        Ratio

256 X 1      1       256            n            0.70
256 X 1      1       512            2n           0.75
256 X 1      2       512            2n           0.82
1024 X 1     1       1024           n            0.87

256 X 1      2       1024           4n           0.93
1024 X 1     2       2048           2n           0.93


The resulting structure is shown in Figure 5.
This simple, direct-mapped organization should
dominate minicomputer cache designs in the
near-term future. By using the design evolution
model shown in Figure 1, it is projected that the
two candidate RAM chips for the successor to
the 11/60 cache will be the 1024 X 1 and 4096 X
1 chips. Obviously, the design choice for that
new class of machine will be a 4096-word direct-mapped cache.
Since simulation data show negligible performance difference between various write-allocation strategies, the lowest cost strategy,
that of allocate-on-write, was implemented. Because the 11/60 utilized a set-size-of-one cache,
there was no need to decide upon a replacement
algorithm. The 11/70 uses a random-replacement algorithm.
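
A direct-mapped organization with allocate-on-write can be sketched in a few lines. The sizes follow the text (1024 words, seven tag bits from an 18-bit address); the exact bit assignment (bit 0 as byte select, the next ten bits as index) and the write-through behavior are assumptions made for this illustration.

```python
INDEX_BITS = 10          # 1024 cache words
ADDRESS_MASK = 0o777777  # 18-bit address


def split_address(byte_address):
    """Split an 18-bit byte address into (tag, index).

    Assumed layout: bit 0 selects the byte within a word, the next ten bits
    index the cache, and the high-order seven bits form the tag.
    """
    word = (byte_address & ADDRESS_MASK) >> 1
    return word >> INDEX_BITS, word & ((1 << INDEX_BITS) - 1)


class DirectMappedCache:
    """Tag store only; data values are omitted to keep the sketch short."""

    def __init__(self):
        self.valid = [False] * (1 << INDEX_BITS)
        self.tags = [0] * (1 << INDEX_BITS)
        self.hits = 0
        self.misses = 0

    def read(self, byte_address):
        """Return True on a hit. On a miss, the single candidate slot is refilled."""
        tag, index = split_address(byte_address)
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.valid[index] = True
        self.tags[index] = tag
        return False

    def write(self, byte_address):
        """Allocate-on-write: the slot takes the written word's tag.
        (Primary memory is assumed to be written as well.)"""
        tag, index = split_address(byte_address)
        self.valid[index] = True
        self.tags[index] = tag
```

Because each address maps to exactly one slot, no replacement algorithm appears anywhere in the sketch: a miss simply overwrites whatever the slot held.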
The next decision to be made concerned
placement of cache. Two choices were evaluated. The cache could be placed between the
Unibus and the primary memory or between

320

TAG

THE PDP-11 FAMI LV

DATA

the Unibus and the central processor. The latter
was chosen because of the following advantages.
l.
SETO
TAG

WORD
PDp·ll/S0 CACHE
CONSTRUCTED
FROM n 1024 X 1
RAM CHIPS

SET 1

DATA

TAG

DATA

DDJDDJ
WORD

WORO

WORO

2.

WORD

PDP·II170 CACHE CONSTRUCTEO FROM
4n 256 X 1 RAM CHIPS

Figure 4.
Cache comparison. Simple direct mapped
cache of the 11/60 contrasted with the 11170 cache
illustrates a price-performance tradeoff. The 11170
cache has a block size of two (two words are transferred
from primary memory) and a set size of two (a word may
be placed in either set). Component savings of the simpler organization are clear; only one address comparator
is needed. no multiplexer is required to select the output
of the data store. and only one set of parity checkers is
needed. Hit ratio of the simpler 11/60 cache is 0.87 as
compared with 0.93 for the 11170 cache. which required
five times the component count.

Figure 5.
Direct-mapped cache. Mapping occurs from
128 Kwords of primary memory to 1024-word cache.
High-order seven bits of an 18-bit address are stored in
tag store to ensure uniqueness in mapping. Tag store
also holds a valid bit and parity bits. Cache word format
(27 bits in total) is as shown in the bit map.

3.

Machine execution is faster since the
high speed cache is local to the central
processor. Time delays associated with
synchronization and transmission on the
Unibus are avoided.
Instead of designing specific 11/60 memory modules, existing memory subsystems that interface to the Unibus
could be used. Moreover, as faster
Unibus-interfaced memories become
available, they can be installed on the
machine without change.
DMA traffic interferes with processor
activity to a lesser extent. DMA activity
takes place over the path labeled ABC in
Figure 2. Processor speed is degraded by
interference with I/O operations only
when the cache needs to reference the
primary memory, using path ABD in
Figure 2. This happens only in the event
of a read miss, typically less than 13 percent of the time, and on write operations
(10 percent of memory references).

The disadvantage of this placement is that a
mechanism to monitor DMA traffic must be
added to the cache to avoid the "stale data"
problem. (When the processor reads a location
that has been written by DMA, it must receive
the information from primary memory.) The alternative placement avoids this extra mechanism by handling both DMA and processor
requests with the same mechanism. However,
there is more interference between the processor
and I/O activity.
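
The stale-data problem and one way a monitoring mechanism could deal with it are sketched below. The text does not say whether the 11/60 invalidates or updates a cached word when DMA writes to it; invalidation is assumed here, and the class and method names are hypothetical.

```python
CACHE_WORDS = 1024


class SnoopedCache:
    """Direct-mapped cache in front of a memory that DMA devices can also write."""

    def __init__(self, memory):
        self.memory = memory                 # dict: word address -> value
        self.lines = [None] * CACHE_WORDS    # each entry: (word_address, value) or None

    def cpu_read(self, word_address):
        slot = word_address % CACHE_WORDS
        line = self.lines[slot]
        if line is not None and line[0] == word_address:
            return line[1]                   # hit: served from the cache
        value = self.memory.get(word_address, 0)
        self.lines[slot] = (word_address, value)
        return value

    def dma_write(self, word_address, value):
        """DMA writes go to primary memory; the monitor invalidates any cached copy."""
        self.memory[word_address] = value
        slot = word_address % CACHE_WORDS
        line = self.lines[slot]
        if line is not None and line[0] == word_address:
            self.lines[slot] = None          # assumed policy: invalidate, do not update


memory = {5: 1}
cache = SnoopedCache(memory)
cache.cpu_read(5)          # loads the word into the cache
cache.dma_write(5, 9)      # a device overwrites the same location
print(cache.cpu_read(5))   # the processor must see the new value from memory
```

Without the check in `dma_write`, the last read would return the old cached value, which is exactly the stale-data case described above.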
Increased memory chip density and the cache
performance tradeoff resulted in a significant
component reduction. The 11/70 cache occupies four printed circuit boards (approximately 440 chips); the 11/60 occupies less than
one board (approximately 85 chips). This factor
of 5 component reduction is due to: (1) absence
of the 32-bit bus, (2) simpler cache organization, and (3) semiconductor technology advances. These three factors contributed in
approximately equal proportions.
FREQUENCY-DRIVEN DESIGN

Because the 11/60 implemented a stable, mature instruction set, several years of programming experience were incorporated into the
system design. A simulator program was used
to gather execution statistics on a range of programs. Frequency distributions of operation
codes and addressing modes drove the design of
the base 11/60 and the floating-point processor
option.
Functions implemented in hardware, as
opposed to microcode, require less time to
execute. However, microprogrammed implementations are less expensive, as shown in
Figure 3. Frequency distributions of operation
codes guided the tradeoff. A balanced mixture
of hardwired and microprogrammed implementation of functions produced a central processor
that approached the speed of a computer with
completely hardwired control functions, but at
a lower cost.
Frequency distributions of floating-point
operands were also used. Sweeney [1965]
analyzed the execution of more than one million floating-point additions and tabulated the
behavior of preshift alignment and postshift
normalization. Both distributions are highly
skewed toward low numbers of shifts. By exploiting these data, the floating-point processor
performs a double-precision add in 1.02 microseconds as compared with 1.68 microseconds
on a comparable unit that uses a conventional
algorithm.
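
The benefit of designing for a skewed shift distribution can be illustrated with an expected-cost calculation. The distribution and cycle counts below are invented for the example and are not Sweeney's data; only the general shape (most shifts are small) follows the text. For reference, the reported add times of 1.68 and 1.02 microseconds correspond to a ratio of about 1.65.

```python
def expected_cycles(distribution, cost):
    """Expected cost of a shift, given {shift_amount: probability} and a
    function mapping a shift amount to a cycle count."""
    return sum(p * cost(shift) for shift, p in distribution.items())


# Hypothetical distribution, heavily skewed toward small shift amounts.
SHIFT_DISTRIBUTION = {0: 0.5, 1: 0.25, 2: 0.125, 8: 0.125}


def uniform_cost(shift):
    """Conventional design: every shift takes the same (worst-case) time."""
    return 4


def skewed_cost(shift):
    """Frequency-driven design: small shifts are fast, large ones pay a penalty."""
    return 1 if shift <= 2 else 6


print(expected_cycles(SHIFT_DISTRIBUTION, uniform_cost))
print(expected_cycles(SHIFT_DISTRIBUTION, skewed_cost))
```

The frequency-driven version is slower than the conventional one in its rare worst case, yet faster on average because the common cases dominate the sum.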
To measure the price/performance advantage claimed for the frequency-driven design
approach in the base 11/60, a similar machine
was needed for comparison. Obviously, such a
machine, realized in the same semiconductor
technology and designed so that the hardware
resources were divided equally among all instructions, was not available. However, data
was available on floating-point implementations. The floating-point processor design was a
four printed circuit board unit that exploited
the frequency distributions of operation codes,
addressing modes, and shift amounts. A theoretical comparison was made with another four-board
design that did not use a frequency-driven approach. The 11/60 floating-point processor was estimated to exhibit a performance
gain of 30 to 40 percent on the standard set of
benchmark programs used throughout the design process.
INTEGRAL FLOATING-POINT
ARITHMETIC UNIT

Addition of an integral floating-point arithmetic unit to the 11/60 was a direct consequence of market feedback. In particular, it
was determined that the majority of the
machine's users would use FORTRAN IV as a
source language. In addition, among those
using that language, many were not interested
in heavy floating-point computation because integer arithmetic dominated their applications.
The FORTRAN IV-PLUS compiler has been
optimized for execution speed (as opposed to
compile speed) - typically a factor of three over
other available FORTRAN IV compilers. This
compiler, however, employs the instruction set
and auxiliary registers of the PDP-11 floating-point processors. Thus, to take advantage of the
compiler's efficiency without burdening the
user with the cost of a fast floating-point processor, the central processor must provide those
floating-point instructions. This is done by
emulating the 46 instructions, including the 64-bit data operations, of the full floating-point instruction set using the 16-bit-wide data path of
the base 11/60. For users who require
FORTRAN IV but have low floating-point
content in their programs, the integral floating-point unit is all that is necessary.

Additional microcode and register space
added a few percent to the CPU cost. However,
for that small cost increase, FORTRAN IV performance on integer programs was increased by
300 percent - a dramatic increase.

CABINET-LEVEL INTEGRATION

Physical packaging of minicomputer systems
involves another set of tradeoffs. Several levels
of size integration are available, ranging from
the chip level (LSI-11), through the board level
(11/04) and the box level (11/34), to the cabinet
level (11/60).
At the cabinet level, packaging techniques are
generally traditional. System fabrication is frequently the result of determining methods to install subassemblies into standard racks. At this
configuration level, generalized subassemblies
are usually chosen for certain functions.
This generally incurs a cost. For instance,
there may be a great deal of unused space in
conventional industrial racks; in most cases this
excess space is simply covered with blank paneling. The cooling system, however, must be designed as if all the racks within the cabinet were
occupied with subassemblies.
It was projected that the majority of the configurations sold would be system oriented; thus,
design optimization at the cabinet level would
be worthwhile. Therefore, the standard 11/60 is
cabinet packaged. Figure 6 shows how the
CPU, memory, disk units, power supplies, and
expansion backplane are packaged to gain the
advantages that stem from cabinet level integration. This integration also yielded added
volume, allowing a more powerful blower system to be installed. Acoustic sound power emittance is very low, considering that the rated
operating environment is DEC Standard 102
Class C (122°F) for the processor. Improved
power efficiency, appearance for the office environment, and subassembly accessibility are also
provided.

USER MICROPROGRAMMING OPTION

User microprogramming was incorporated in
the system to meet growing market demands.
The option allows the user to create instructions
that tailor the central processor, particularly the
data flow, to his particular application.
Many potential applications of microprogramming were considered during the design of the data path and control sections of the
11/60. They ranged from instruction set extensions, e.g., translation, string, and decimal
arithmetic operations, to application kernels,
such as node manipulation in list processing
and fast Fourier transform in signal processing.
Merely substituting RAM for ROM control
does not result in a microprogrammable computer. A microprogrammable computer system
should have the following:

1. Extra address space in the control store.
2. Generality in the data path's processing
elements.
3. A means to load the writable control
store (WCS).
4. User-oriented hardware documentation.
5. Software to support writing and debugging microprograms.
6. Integration of hardware and software
protocols.

All these capabilities were designed into the
11/60 WCS option.

[Figure 6 drawing: cabinet layout.]
LEGEND:
A - DISK DRIVES
B - MAINTENANCE CONSOLE
C - CARD CAGE SWUNG INTO MAINTENANCE-ACCESS POSITION
D - CARD CAGE IN CLOSED POSITION
E - REAR ACCESS MODULAR POWER SUPPLIES
F - BLOWER SYSTEM

Figure 6. Cabinet packaging. Primary design goals
were reliability and maintainability. System logic is
mounted on swing-out card cages C and D for easy access. Rear access power supplies E are modular. Cable
routing reduces electrical noise and crosstalk. Blower
system F keeps all devices cool. Keypad B with numerical display facilitates machine control and maintenance.
Disks A are top- or front-loading units.

A previously reserved operation code,
0767XX in the PDP-11 instruction set, has been
allocated for users. Its designation is XFC, extended function code. When this code is recognized, the CPU transfers control to the upper
1024-word block of the 4096-word microprogram address space. User-written microcode
may take over from there.
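
Recognizing the XFC code amounts to a mask-and-compare on the 16-bit instruction word. The sketch below assumes the six-digit octal form 0767XX with XX as the low six bits; what the microcode does with those bits, and the exact entry address within the user block, are not specified in the text and are left open here.

```python
XFC_MASK = 0o177700       # ignore the low six bits (the "XX" digits)
XFC_CODE = 0o076700       # reserved operation code 0767XX

MICRO_ADDRESS_SPACE = 4096
USER_BLOCK_WORDS = 1024
USER_BLOCK_START = MICRO_ADDRESS_SPACE - USER_BLOCK_WORDS   # upper 1024-word block


def is_xfc(instruction):
    """True if a 16-bit instruction word is an extended function code."""
    return (instruction & XFC_MASK) == XFC_CODE


def xfc_low_bits(instruction):
    """The six low-order bits, available to the user's microcode."""
    return instruction & 0o77


def in_user_block(micro_address):
    """True if a microprogram address lies in the upper 1024-word block."""
    return USER_BLOCK_START <= micro_address < MICRO_ADDRESS_SPACE


print(is_xfc(0o076712), oct(xfc_low_bits(0o076712)), USER_BLOCK_START)
```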
A second (asynchronous) type of entry to
user's microcode is also provided. This occurs
when a WCS-serviced interrupt is recognized by
the base machine. Thus, a user can write interrupt service routines in microcode and invoke
them without the usual interrupt overhead. Such
routines may even be complete I/O channel
emulations.
Implementation of the basic 11/60 demonstrated flexibility of microprogramming. The
techniques were used in such diverse functions
as console service, error logging, floating-point
arithmetic, and cache initialization.
Microprogramming does not always result in
significant performance gains. Well-suited applications can gain by a factor of 5; poorly
suited ones may give only minimal improvement. This is supported by measurements on
digital signal processing software reported by
Morris and Mudge [1977]. Prospective users
must carefully analyze the execution behavior
of the application to determine which parts are
"hot spots," i.e., most frequently executed. For
the average application, an overall factor of 2
improvement should be expected. This average,
found to be a useful rule of thumb, is derived by
assuming that all hot spots are microprogrammed and the remainder of the program
is left unchanged.
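
The relation between the hot-spot gain and the overall gain can be checked with a short calculation. The formula below is the standard one for speeding up only part of a program; the hot-spot fraction of 62.5 percent is a hypothetical value, chosen because it makes a factor of 5 on the hot spots yield an overall factor of 2.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Overall speedup when a fraction hot_fraction of the original execution
    time is accelerated by hot_speedup and the remainder is unchanged."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Hypothetical profile: hot spots account for 62.5 percent of execution time.
print(overall_speedup(0.625, 5.0))

# Even an unbounded gain on the hot spots cannot beat 1 / (1 - hot_fraction).
print(1.0 / (1.0 - 0.625))
```

The second figure shows why the execution profile matters more than the microcode gain itself: the unchanged remainder of the program sets the ceiling.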
Two user-microprogramming options are
available. The first is composed of the writable
control store module, software tools, and associated manuals. The second is a board containing control logic and sockets ready for the
insertion of custom-programmable ROMs
(PROMs) containing microprograms developed
with the writable control store. This extended
control store (ECS) option is designed for situations where microcode integrity and/or multiple installations are required.
A novel structuring of the writable control
store allows it to be used to store data. Availability of data storage local to a processor, i.e.,
not accessed through a main, general purpose
memory bus, can increase system speed. Such
local store is usually implemented in some special technology that has low capacity but high
performance. Writable control store has been
structured so that the 48-bit microinstruction
storage words can be read and written as 16-bit
data words. In addition to conventional writable control store hardware, logic is available to
realize a local store address register (LSAR)
and a local store data register (LSDR).
Thus, the microprogrammer has fast local
store available. This storage is block-oriented.
A three-cycle overhead is needed to start a
block read (or block write); then, words are
read (or written) at the rate of one per microcycle. The microprogram can be logically partitioned into two sections: control store - 48-bit
control words; and local store - 16-bit data
words (three per microword). A common partitioning would be 512 words of control store and
1536 words of local store.
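
The partitioning arithmetic and the block-transfer overhead can be written out directly. The 1024-word capacity, the three data words per microword, and the three-cycle startup come from the text; treating one microcycle as 170 ns (the internal cycle time quoted earlier) is an assumption of this sketch.

```python
WCS_WORDS = 1024                 # 48-bit words in the writable control store
DATA_WORDS_PER_MICROWORD = 3     # 48 bits / 16 bits
MICROCYCLE_NS = 170              # assumed equal to the quoted internal cycle time


def partition(control_words):
    """Return (control store words, local store words) for a given split."""
    if not 0 <= control_words <= WCS_WORDS:
        raise ValueError("control_words must be between 0 and 1024")
    local_words = (WCS_WORDS - control_words) * DATA_WORDS_PER_MICROWORD
    return control_words, local_words


def block_transfer_cycles(n_words, startup_cycles=3):
    """Microcycles to read or write a block: fixed startup plus one per word."""
    return startup_cycles + n_words


print(partition(512))
print(block_transfer_cycles(16), block_transfer_cycles(16) * MICROCYCLE_NS)
```

The fixed startup cost is why the store is described as block-oriented: it is amortized well over long blocks and poorly over single-word accesses.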

RELIABILITY AND MAINTAINABILITY

Design decisions to allocate a portion of the
cost of the 11/60 to reliability and maintainability, rather than to further improving performance, were motivated by user and market
needs. Prime considerations were the increasing
labor cost associated with maintenance and the
growing use of minicomputers in applications
demanding more reliability.
The first goal was to increase the mean time
between failures (MTBF) by: (1) reducing the
occurrence and impact of normally fatal hardware malfunctions, (2) providing error statistics, and (3) providing operating alternatives to
keep the system running after failures occur, albeit at a lower performance.
The second goal was to reduce the mean time
to repair (MTTR) when hardware malfunctions
occur by: (1) hardware design and packaging
that facilitate error diagnosis and repair during
scheduled and nonscheduled maintenance, (2)
continuous logging of hardware errors during
system operation, and (3) provision of software
and microdiagnostic tools for problem isolation.

MTBF

Reducing the incidence of fatal hardware
malfunctions was a joint effort by engineering
and manufacturing. The Schottky transistor-transistor logic (TTL) used in the machine, having been in widespread use for over five years, is
a well proven family of devices. Moreover, conservative electrical design practices were followed.
Plotted against time, chip failure rate tends to
follow a bathtub-shaped curve, high at either
end of the life cycle. The 11/60 production process
includes extensive thermal cycling to ensure
that "infant mortality" cases are discovered
early during manufacturing.
The cabinet is designed to minimize buildup
of hot air over the processor boards. Power supplies are mounted at the rear of the cabinet
away from the logic, so that radiant heating effects
are minimized. A blower system cools the
logic card cage by drawing fresh, filtered air
down over the printed circuit boards such that
no board receives exhaust air from another.
Other physical packaging measures to reduce hardware
problems include cable troughs, impact-absorbing casters, and special cabinet grounding. A filter is attached to the maintenance console to reduce electrostatic noise interference.
Console microcode double checks every entry
to verify data received from the keypad. A significant proportion of the 11/60 microcode
(Table 1) is devoted to logging microlevel state
upon the occurrence of a detected error. This
logged state can be accessed via a maintenance
examine and deposit (MED) instruction. Logged information is used by an operating system
to compile error records, which aid in tracking
down intermittent errors.
To reduce the impact of hardware malfunctions on the user environment, a number of fail-soft capabilities have been implemented.

1. If the cache fails, it is turned off and the
still-functioning primary memory is used
to keep the system running.
2. If a parity error occurs in WCS, the processor disables that control store. Then
the operating system is notified, and program execution can continue using the
basic PDP-11 instructions.
3. Systems can be programmed to fall back
onto the integral floating-point unit if an
error is detected in the floating-point
processor.
4. The bootstrap loader permits system
loading from an alternative device if the
primary bootstrapping device is disabled.

MTTR

Error diagnosis is the most time-consuming
problem facing the field service engineer. Special diagnostic tools, both hardware and software, have been designed to reduce the time
spent in error isolation.


Table 1. Control Store Usage by Category

Category                                          Number of     Percentage
                                                  Microwords    of Total

A  PDP-11 Instruction Set                            840
     Initialization                                   95             4
     Operand fetch, execution, and operand store     515            20
     Infrequent intraprocessor transfers             230             9
B  Integral Floating-Point Instruction Set          1010            40
C  Reliability and Maintainability
     Error logging, MED, and cache fail-soft         190             7
     Console, boot, and initial diagnostic           230             9
D  Support of Options
     Writable control store                           60             2
     Floating-point processor                         80             3
E  Reserved for Future Changes and Additions         150             6

   Total                                            2560           100

Total address space for microprograms is 4096 words, of which the 2560 categorized in the table are
implemented in ROM.
Note the increased utilization of microprogramming in the 11/60, as compared to the 11/40. Category A,
totaling 840 words, was implemented in 256 words for the 11/40. The two machines have comparable
microword widths.
The third subcategory in Category A illustrates the use of microprogramming in the frequency-driven
design approach. Examples of infrequent intraprocessor transfers are error handling and data transfer to and
from internal addresses, e.g., memory management relocation registers.
One of the benefits of a microprogrammed implementation of control is the ease with which engineering
change orders (ECO) can be implemented. Space in Category E is reserved for such use and for the further
correction of undetected errors in the microcode itself.
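
The arithmetic behind the notes to Table 1 can be checked with a few lines of Python. This is only a sketch: the word counts are copied from the table, the grouping by letter follows the table's categories, and the variable and function names are invented here for illustration.

```python
# Microword counts per subcategory, copied from Table 1.
CONTROL_STORE_WORDS = {
    ("A", "Initialization"): 95,
    ("A", "Operand fetch, execution, and operand store"): 515,
    ("A", "Infrequent intraprocessor transfers"): 230,
    ("B", "Integral floating-point instruction set"): 1010,
    ("C", "Error logging, MED, and cache fail-soft"): 190,
    ("C", "Console, boot, and initial diagnostic"): 230,
    ("D", "Writable control store"): 60,
    ("D", "Floating-point processor"): 80,
    ("E", "Reserved for future changes and additions"): 150,
}

ADDRESS_SPACE_WORDS = 4096      # total microprogram address space (table notes)
WORDS_11_40_CATEGORY_A = 256    # 11/40 figure quoted in the table notes


def category_total(words, category):
    """Sum the microwords of every subcategory filed under one letter."""
    return sum(n for (cat, _), n in words.items() if cat == category)


rom_words = sum(CONTROL_STORE_WORDS.values())
category_a = category_total(CONTROL_STORE_WORDS, "A")
growth_factor = category_a / WORDS_11_40_CATEGORY_A
unimplemented = ADDRESS_SPACE_WORDS - rom_words

print(rom_words, category_a, round(growth_factor, 2), unimplemented)
```

The growth factor for Category A comes out a little above 3, which is consistent with the "factor of 3" increase in microprogramming cited in the summary.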

The focal point of the hardware maintainability effort is the microdiagnostic unit. This single board tests the logic on five of the six processor boards. When faults are detected, an error code is displayed on light-emitting diodes (LEDs). A fault directory can then be used to determine which boards are to be replaced. The unit requires only a small portion of the internal machine (the microword sequencing) to be operational.

In addition, a number of on-board diagnostic aids are included in the CPU design. These include LEDs to display the contents of the next micro address register, a single-step mode, and a micro break function.
Software diagnostic programs are used to
diagnose errors in system peripherals as well as
in all CPU subsystems, such as memory management unit and cache. User mode diagnostic
programs allow peripheral diagnosis to occur
while the system is available for other users.
Conventional standalone diagnostic programs
can also be used.
Physical packaging facilitates quick repair.
Hinged card cages and modular power supplies
allow easy access and module change.

SUMMARY
The design of a mid-range minicomputer has
been used as a concrete illustration of tradeoffs
made to effect a price/performance balance.

Designers use technology advances, e.g., doubling of density on a memory chip, to produce
new designs in one of two design styles: constant cost/increasing functionality or constant
functionality/decreasing cost. Increased use of
microprogramming, a factor of 3 in this case
study, is a trend that was observed.
By choosing a less powerful cache organization, the 11/60 design obtained a factor of 5
component reduction. Cache design also illustrates how some design parameters are highly
interdependent. The frequency-driven design
approach used on the floating-point processor
can lead to a 40 percent performance gain.
Examples of added functionality in the constant-cost style of design include greater reliability and maintainability, and user microprogramming.

Impact of Implementation
Design Tradeoffs on Performance:
The PDP-11, A Case Study
EDWARD A. SNOW and DANIEL P. SIEWIOREK

INTRODUCTION

As semiconductor technology has evolved,
the digital systems designer has been presented
with an ever increasing set of primitive components from which to construct systems:
standard SSI, MSI, and LSI as well as custom
LSI components. This expanding choice makes
it more difficult to arrive at a near-optimal
cost/performance ratio in a design. In the case
of highly complex systems, the situation is even
worse since different primitives may be cost-effective in different subareas of such systems.
Historically, digital system design has been
more an art than a science. Good designs
evolved from a mixture of experience, intuition,
and trial and error. Only rarely have design
methodologies been developed (e.g., two level
combinational logic minimization, wire-wrap
routing schemes, etc.). Effective design methodologies are essential for the cost-effective design
of more complex systems. In addition, if the
methodologies are sufficiently detailed, they
can be applied in high level design automation
systems [Siewiorek and Barbacci, 1976].
Design methodologies may be developed by
studying the results of the human design process. There are at least two ways to study this

process. The first involves a controlled design
experiment where several designers perform the
same task. By contrasting the results, the range
of design variation and technique can be established [Thomas and Siewiorek, 1977]. However,
this approach is limited to a fairly small number
of design situations due to the redundant use of
the human designers.
The second approach examines a series of existing designs that meet the same functional
specification while spanning a wide range of design constraints in terms of cost, performance,
etc. This paper considers the second approach
and uses the DEC PDP-11 minicomputer line as
a basis of study. The PDP-11 was selected due
to the large number of implementations (eight
are considered here) with designs spanning a
wide range in performance (roughly 7:1) and
component technology (bipolar SSI, MSI,
MOS custom LSI). The designs are relatively
complex and seem to embody good design
tradeoffs as ultimately reflected by their
price/performance and commercial success.
The design tradeoffs considered fall into
three categories: circuit technology, control unit
implementation, and data path topology. All
three have had considerable impact on performance. Attention here is focused mainly upon the
CPU. Memory performance enhancements
such as caching are considered only in so far as
they affect CPU performance.
This paper is divided into two major parts.
The first part presents an archetypal implementation followed by the model-specific variations
from the archetype. These variations represent
the design tradeoffs. The second part presents
methodologies for determining the impact of
various design parameters on system performance. The magnitude of the impact is quantified
for several parameters and the use of the results
in design situations is discussed.
The PDP-11 Family is a set of small- to medium-scale stored program central processors
with compatible instruction sets. The 11 Family
evolution in terms of increased performance,
constant cost, and constant performance successors is traced in Figure 1. Since the 11/45,
11/55 and 11/70 use the same processor, the
KB11, only the 11/45 is treated in this study.
IMPLEMENTATION OF MEDIUM
PERFORMANCE PDP-11s

The broad middle range of PDP-11s have
comparable implementations yet their performances vary by a factor of 2. The processors making up this group are the PDP-11/04, 11/10,*
11/20, 11/34, 11/40, and 11/60. This section
discusses the features common to these implementations and the variations found between
machines which provide the dimensions along
which they may be characterized.
Common Implementation Features

All PDP-11 implementations, be they low, medium, or high performance, can be decomposed into a set of data paths and a control unit. The data paths store and operate upon byte and word data and interface to the Unibus, permitting them to read from and write to memory and peripheral devices. The control unit provides all the signals necessary to evoke the appropriate operations in the data paths and Unibus interface. Mid-range PDP-11s have comparable data path and control unit implementations allowing them to be contrasted in a uniform way. In this section, a basis for comparing these machines is established and used to characterize them.

Figure 1. PDP-11 Family tree. (Horizontal axis: time. Models shown include the LSI-11, PDP-11/04, PDP-11/20, PDP-11/40, PDP-11/60, and PDP-11/70.)

Data Paths. An archetype may be constructed from which the data paths of all mid-range PDP-11s differ but minimally. This archetype is diagrammed in Figure 2. All major registers and processing elements as well as the
links and switches which interconnect them are
indicated. The data path illustrations for individual implementations are grouped with Figure 2 at the end of the chapter. These figures are
laid out in a common format to encourage comparison. Note that with very few exceptions, all
data paths are 16 bits wide (PDP-11 word size).

*The 11/05 and the 11/10 are identical machines sold to different markets. This chapter refers to the machine as the 11/10.

Figure 2. Archetypal medium-range PDP-11 data paths. NOTE: All data paths are 16 bits wide unless otherwise indicated.

The heart of the data paths is the arithmetic logic unit or ALU through which all data circulates and where most of the processing actually takes place. Among the operations performed by the ALU are addition, subtraction, one's and two's complementation, and logical ANDing and ORing.
The inputs to the ALU are the A leg and the
B leg. The A leg is normally fed from a multiplexer (A leg MUX) which may select from an
operand supplied to it from the Scratchpad
Memory (SPM) and possibly from a small set of
constants and/or the Processor Status register
(PS). The B leg also is typically fed from its own
MUX (B leg MUX), its selections being from
the B Register and certain constants. In addition, the B leg MUX may be configured so that
byte selection, sign extension, and other functions may be performed on the operand which it
supplies to the ALU.
Following the ALU is a multiplexer (the A
MUX) typically used to select between the output of the ALU, the data lines of the Unibus,
and certain constants. The output of the A
MUX provides the only feedback path in all
mid-range PDP-11 implementations except the
11/60 and acts as an input to all major processor registers.
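
The flow from the A leg and B leg multiplexers through the ALU and out of the A MUX can be sketched in Python as below. This is a simplified model rather than any one machine's logic: the selector names, the function names, and the reduced set of ALU operations are assumptions chosen for illustration.

```python
MASK16 = 0xFFFF  # data paths are 16 bits wide


def alu(op, a, b):
    """A subset of the ALU operations named in the text, on 16-bit words."""
    if op == "ADD":
        return (a + b) & MASK16
    if op == "SUB":
        return (a - b) & MASK16
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "NOT_A":            # one's complement of the A leg
        return ~a & MASK16
    if op == "NEG_A":            # two's complement of the A leg
        return -a & MASK16
    raise ValueError(op)


def data_path_cycle(spm_out, b_reg, a_leg_sel, b_leg_sel, alu_op, a_mux_sel,
                    bus_data=0, constant=0):
    """One pass: leg multiplexers feed the ALU, the A MUX picks the result."""
    a_leg = {"SPM": spm_out, "CONST": constant}[a_leg_sel]
    b_leg = {"B": b_reg, "CONST": constant}[b_leg_sel]
    alu_out = alu(alu_op, a_leg, b_leg)
    # The A MUX output is the feedback path into the processor registers.
    return {"ALU": alu_out, "BUS": bus_data, "CONST": constant}[a_mux_sel]


result = data_path_cycle(0x00F0, 0x0F00, "SPM", "B", "OR", "ALU")
```

Selecting "BUS" at the A MUX instead of "ALU" models loading a register directly from the Unibus data lines.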
The internal registers lie at the beginning of the data paths. The Instruction Register (IR) contains the current instruction. The Bus Address register (BA) holds the address placed on the Unibus by the processor. The Processor Status register (PS) contains the processor priority, memory management unit modes, condition code flags, and instruction trace trap enable bit. The Scratchpad Memory (SPM) is an array of 16 individually addressable registers which include the general registers (R0-R7) plus a number of internal registers not accessible to the programmer. The B Register (B Reg) is used to hold the B leg operand supplied to the ALU.
The variations from this archetype are minor
as discussed in the section entitled "Characterization of Individual Implementations." Variations encountered include routings for Bus
Address and Processor Status register, the point
of generation for certain constants, the positioning of the byte swapper, sign extender, and
rotate/shift logic, and the use of certain auxiliary registers present in some designs and not
others. In general, these variations are all peripheral to the major elements and interconnections of the data paths.
Control Unit. The control unit for all PDP-11 processors (with the exception of the PDP-11/20) is microprogrammed [Wilkes and Stringer, 1953]. The considerations leading to the use of this style of control implementation in the PDP-11 are discussed in [O'Loughlin, 1975]. The major advantage of microprogramming is flexibility in the derivation of control signals to gate register transfers, synchronization with Unibus logic, control of microcycle timing, and evocation of changes in control flow. The way in which a microprogrammed control unit accomplishes all of these actions impacts performance.
Figure 3 represents the archetypal PDP-11 microprogrammed control unit. The contents of the Microaddress Register determine the current control unit state and are used to access the next microinstruction word from the control store. Pulses from the clock generator strobe the Microword and Microaddress Registers loading them with the next microword and next microaddress respectively. Repeated clock pulses thus cause the control unit to sequence through a series of states. The period spent by the control unit in one state is called a microcycle (or simply cycle when this does not lead to confusion with memory or instruction cycles), and the duration of the state as determined by the clock is known as the cycle time. The Microword Register shortens cycle time by allowing the next microword to be fetched from the control store while the current microword is being used.

Figure 3. Archetypal microprogrammed PDP-11 control unit.

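
The sequencing just described can be sketched in Python as follows. This is a minimal model under stated assumptions: the `Microword` type, its two fields, and the three-word control store are hypothetical, and branch modification of the next address is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microword:
    controls: str       # stands in for the data path control fields
    next_address: int   # next-microaddress field


def run(control_store, start_address, pulses):
    """Clock the control unit and return the control fields it issues."""
    microaddress = start_address
    issued = []
    for _ in range(pulses):
        # The control store is read at the current microaddress while the
        # previously loaded microword is still driving the data paths.
        fetched = control_store[microaddress]
        # A clock pulse strobes both registers at once.
        microword_register = fetched
        microaddress = fetched.next_address
        issued.append(microword_register.controls)
    return issued


# Hypothetical three-state loop, loosely modeled on a fetch phase.
store = {
    0: Microword("BA <- PC", 1),
    1: Microword("IR <- BUSDATA", 2),
    2: Microword("PC <- PC + 2", 0),
}
trace = run(store, start_address=0, pulses=4)
```

Each loop iteration corresponds to one microcycle, so the length of `trace` is the microcycle count for the sequence.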
Most of the fields of the microword supply
signals for conditioning and clocking the data
paths. Many of the fields act directly or with a
small amount of decoding, supplying their signals to multiplexers and registers to select routings for data and to enable registers to shift,
increment, or load on the master clock. Other
fields are decoded based upon the state of the
data paths. An instance of this is the use of auxiliary ALU control logic to generate function
select signals for the ALU as a function of the
instruction contained in the IR. Performance as
determined by microcycle count is in large measure established by the connectivity of the data
paths and the degree to which their functionality can be evoked by the data path control
fields of the microprogram word.
The complexity of the clock logic varies with
each implementation. Typically, the clock is
fixed at a single period and duty cycle; however,
processors such as the 11/34 and 11/40 can select from two or three different clock periods
for a given cycle depending upon a field in the
Microword Register. This can significantly improve performance in machines where the
longer cycles are necessary only infrequently.
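
A short worked example shows why selectable clock periods help. The three periods below are the 11/40 values listed in Table 3; the mix of how often each period is needed is purely an assumption made for illustration, as are the names used.

```python
def mean_cycle_ns(cycle_mix):
    """Average microcycle length for a mix of {period_ns: cycle_count}."""
    cycles = sum(cycle_mix.values())
    return sum(period * count for period, count in cycle_mix.items()) / cycles


# Assumed mix: most microcycles fit the shortest period.
assumed_mix = {140: 6, 200: 3, 300: 1}

variable_clock = mean_cycle_ns(assumed_mix)
fixed_clock = max(assumed_mix)   # a single period must fit the slowest cycle
speedup = fixed_clock / variable_clock
```

Under this assumed mix the average cycle is 174 ns against 300 ns for a single fixed period. The gain shrinks as the long cycles become more frequent.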
The clock logic must provide some means for
synchronizing processor and Unibus operation
since the two operate asynchronously with respect to one another. Two alternate approaches
are employed in mid-range implementations.
Interlocked operation, the simpler approach,
shuts off the processor clock when a Unibus operation is initiated and turns it back on when
the operation is complete. This effectively keeps
microprogram flow and Unibus operation in
lockstep with no overlap. Overlapped operation
is a somewhat more involved approach which
continues processor clocking after a DATI or
DATIP is initiated. The microinstruction requiring the result of the operation has a function bit set which turns off the processor clock
until the result is available. This approach
makes it possible for the processor to continue
running for several microcycles while a data
transfer is being performed, improving performance.
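
The difference between the two synchronization schemes can be sketched with a simple timing model. This is an illustration only: the bus latency and cycle time in the example are hypothetical, the function names are invented, and the cycle that initiates the transfer is ignored in both cases.

```python
def interlocked_ns(bus_ns, independent_cycles, cycle_ns):
    """Clock stops for the whole bus operation; other work runs afterward."""
    return bus_ns + independent_cycles * cycle_ns


def overlapped_ns(bus_ns, independent_cycles, cycle_ns):
    """Independent microcycles run during the transfer; the clock is held
    off only for whatever part of the transfer is still outstanding."""
    return max(bus_ns, independent_cycles * cycle_ns)


# Hypothetical numbers: a 900 ns transfer and 200 ns microcycles.
short_work = (interlocked_ns(900, 2, 200), overlapped_ns(900, 2, 200))
long_work = (interlocked_ns(900, 6, 200), overlapped_ns(900, 6, 200))
```

In this model the saving from overlap is the smaller of the bus latency and the time spent in independent microcycles.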
The sequence of states through which the
control unit passes would be fixed if not for the
branch on microtest (BUT) logic. This logic
generates a modifier based upon the current
state of the data paths and Unibus interface
(contents of the Instruction Register, current
bus requests, etc.) and a BUT field in the microword currently being accessed from the control
store which selects the condition on which the
branch is to be based. The modifier (which will
be zero in the case that no branch is selected or
that the condition is false) is ORed in with the
next microinstruction address so that the next
control unit state is not only a function of the
current state but also a function of the state of
the data paths as well. Instruction decoding and
addressing mode decoding are two prime examples of the application of BUTs. Certain code
points in the BUT field do not select branch
conditions, but rather provide control signals to
the data paths, Unibus interface, or the control
unit itself. These are known as active or working BUTs.
The JAM logic is a part of the microprogram
flow-altering mechanism. This logic forces the
Microaddress Register to a known state in the
event of an exceptional condition such as a
memory access error (bus timeout, stack overflow, parity error, etc.) or power up by ORing
all one's into the next microaddress through the
BUT logic. A microroutine beginning at the all-one's address handles these trapped conditions.
The old microaddress is not saved (an exception
to this occurs in the case of the PDP-11/60);
consequently, the interrupted microprogram sequence is lost and the microtrap ends by restarting the instruction interpretation cycle with the
fetch phase.
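
The BUT and JAM mechanisms described above both work by ORing bits into the next microaddress, which the following Python sketch illustrates. The 8-bit microaddress width, the condition names, and the state dictionary are assumptions for illustration and are not taken from any particular PDP-11 implementation.

```python
MICROADDRESS_BITS = 8                       # assumed width (256-word store)
ALL_ONES = (1 << MICROADDRESS_BITS) - 1

# Hypothetical branch conditions: each maps machine state to a modifier.
BUT_CONDITIONS = {
    "NONE": lambda state: 0,
    "DEST_MODE": lambda state: (state["ir"] >> 3) & 0o7,    # IR<5:3>
    "SERVICE": lambda state: 1 if state["bus_request"] else 0,
}


def next_microaddress(next_field, but_select, state, jam=False):
    """OR the BUT modifier (and, on JAM, all ones) into the next address."""
    modifier = BUT_CONDITIONS[but_select](state)
    if jam:
        modifier |= ALL_ONES     # exceptional condition forces the trap address
    return (next_field | modifier) & ALL_ONES


state = {"ir": 0o050267, "bus_request": False}
branch_target = next_microaddress(0o140, "DEST_MODE", state)
no_branch = next_microaddress(0o140, "SERVICE", state)
trap_target = next_microaddress(0o140, "DEST_MODE", state, jam=True)
```

Because the modifier is ORed rather than added, the next-address field has to contain zeros in every bit position the modifier may set. That constraint shapes how microroutines are laid out in the control store.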
The structure of the microprogram is determined largely by the BUTs available to implement it and by the degree to which special cases
in the instruction set are exploited by these
BUTs. This may have a measurable influence
on performance as in the case of instruction decoding. The fetch phase of the instruction cycle
is concluded by a BUT that branches to the appropriate point in the microcode based upon
the contents of the Instruction Register. This
branch can be quite complex since it is based
upon source mode for double operand instructions, destination mode for single operand instructions, and operation code for all other
types of instructions. Some processors can perform the execute phase of certain instructions
like set/clear condition code during the last
cycle of the fetch phase meaning that the fetch
or service phases for the next instruction might
also be entered from BUT IRDECODE. Complicating the situation is the large number of
possibilities for each phase. For instance, there
are not only eight different destination addressing modes, but also subcases for each that vary
for byte and word and for memory modifying,
memory nonmodifying, MOV, and JMP/JSR
instructions.
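
The fields that such a decode branch examines can be pulled out of a double operand instruction word as sketched below. The field positions follow the standard PDP-11 double operand layout, which agrees with the IR<8:6> and IR<2:0> register fields used in this chapter. The function name and dictionary keys are invented for illustration.

```python
def decode_double_operand(ir):
    """Split a 16-bit PDP-11 double operand instruction into its fields."""
    return {
        "byte": bool((ir >> 15) & 1),   # byte/word flag for this format
        "opcode": (ir >> 12) & 0o7,
        "src_mode": (ir >> 9) & 0o7,    # IR<11:9>
        "src_reg": (ir >> 6) & 0o7,     # IR<8:6>
        "dst_mode": (ir >> 3) & 0o7,    # IR<5:3>
        "dst_reg": ir & 0o7,            # IR<2:0>
    }


# BIS R2, X(R7): word instruction, source mode 0, destination mode 6.
fields = decode_double_operand(0o050267)
```

A decode branch for a double operand instruction would dispatch first on `src_mode`, and a later destination branch on `dst_mode` together with the byte flag and whether the instruction modifies its destination.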
Some PDP-11 implementations such as the 11/10 make as much use of common microcode as possible to reduce the number of control states. This allows much of the IR decoding to be deferred until some time into a microroutine which might handle a number of different cases. For instance, byte and word operand addressing is done by the same microroutine in a number of PDP-11s. With the cost of control states dropping with the cost of control store ROM, there has been a trend toward providing separate microroutines optimized for each special case as in the 11/60. Thus, more special cases must be broken out at the BUT IRDECODE, making the logic to implement this BUT increasingly involved. There is a payoff, though, because there is a smaller number of control states for IR decoding and fewer BUTs. Performance is boosted as well since frequently occurring special cases such as MOV register to destination can be optimized.
Typical Instruction Interpretation Cycle.
To get a feel for the PDP-11 data paths and
control unit in operation, consider the interpretation of a representative instruction by the
archetypal PDP-11. The instruction to be followed is a word bit set (BIS), an instruction
which takes its source operand, logically ORs it
with the destination operand, and returns the
result to the destination. Register addressing
with register 2 is used for the source; indexed
addressing with register 7 is used for the destination.
What follows is the sequence of microinstructions evoked during the execution of the
macroinstruction described in Table 1. Each
microinstruction is numbered and consists of
the register transfers and any Unibus operation
or branch on microtest initiated by the microword.

Table 1. Microinstructions Evoked During Execution of Macroinstruction

Phase: FETCH

  Cycle 1.  BA ← PC; DATI; CLKOFF
            A read operation is initiated to fetch the instruction addressed by the Program Counter.
  Cycle 2.  IR ← BUSDATA
            The instruction is placed in the Instruction Register.
  Cycle 3.  PC ← PC + 2; BUT IRDECODE
            The Program Counter is incremented to address the next location in the instruction stream (in this case the location containing the index for the destination). The instruction (held in the IR) is decoded by the BUT and found to be a double operand instruction causing a branch to the microcode for source mode 0.

  Branch taken at BUT IRDECODE: double operand word, source mode zero.

Phase: SOURCE

  Cycle 4.  SRCOPR ← RS; BUT DESTINATION
            The contents of the register addressed by the source field of the instruction (register 2) are copied into the Scratchpad Register reserved for source operands. The next state is determined by the destination addressing mode and the fact that BIS is a word instruction which modifies its destination.

  Branch taken at BUT DESTINATION: modifying word; destination mode 6.

Phase: DESTINATION

  Cycle 5.  BA ← PC; DATI
            A read operation is initiated to get the index word (pointed to currently by the Program Counter) for the effective address of the destination operand.
  Cycle 6.  PC ← PC + 2; CLKOFF
            The Program Counter is incremented to point to the next instruction. Note that this cycle is overlapped with the DATI started in cycle 5.
  Cycle 7.  B ← BUSDATA
            The index is stored for use in the next cycle.
  Cycle 8.  BA ← RD + B; DATIP; CLKOFF
            The index is added to the contents of the destination register to form the effective address of the destination operand. A DATIP is performed to read the operand since the operand is to be modified and then restored to its original location in memory.
  Cycle 9.  B ← BUSDATA
            The destination operand is stored so it is available to the B leg of the ALU.

Phase: EXECUTE

  Cycle 10. BUSDATA ← SRCOPR OP B; DATO; CLKOFF; BUT SERVICE
            The source and destination operands are logically ORed together and put out on the Unibus to be written into the memory location from which the destination operand was read. (Note that the destination address is still in BA.) Upon completion of the DATO, the control unit branches into the service phase if a serviceable condition is pending; otherwise, it branches back to repeat the fetch phase for the next instruction. Although it performs an execute phase function, this microinstruction is part of the same destination mode microroutine that generated cycles 5 through 9.

  Branch taken at BUT SERVICE: no service request, next fetch; service request, service phase.

Notation used in microinstructions for Table 1:

  B       = B Register
  BA      = Bus Address register
  BUSDATA = Unibus data lines
  CLKOFF  = Stop the processor clock until a Unibus transaction is completed; used for processor/Unibus overlap
  IR      = Instruction Register
  PC      = Program Counter (Scratchpad Register 7)
  RD      = Scratchpad Register addressed by macroinstruction destination field (IR<2:0>)
  RS      = Scratchpad Register addressed by macroinstruction source field (IR<8:6>)
  SRCOPR  = Scratchpad Register 10 (not accessible to programmer); used as a temporary for source operands
  a OP b  = Operand a (on the A leg of the ALU) and operand b (on the B leg of the ALU) are combined according to the operation specified by the macroinstruction. The ALU function is selected by the auxiliary ALU logic as described in the subsection "Control Unit."
  a ← b   = Register a is loaded with operand b

At a detailed level, the instruction interpretation process of each PDP-11 implementation varies significantly from that outlined in
Table 1; however, the scenario is still highly representative of the operation of the control unit
and data paths in the designs to be considered.
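
The ten microcycles of Table 1 can be replayed in Python to see the register transfers in sequence. This is a functional sketch only: it models the register transfers but none of the timing (CLKOFF, overlap), the instruction decoding is reduced to extracting the two register fields, and the memory contents and addresses in the example are hypothetical.

```python
MASK16 = 0xFFFF
PC, SRCOPR = 7, 10   # scratchpad indices given in the Table 1 notation


def interpret_bis_indexed(spm, memory):
    """Replay cycles 1-10 of Table 1; returns the destination address."""
    # FETCH
    ba = spm[PC]                          # 1: BA <- PC; DATI; CLKOFF
    ir = memory[ba]                       # 2: IR <- BUSDATA
    spm[PC] = (spm[PC] + 2) & MASK16      # 3: PC <- PC + 2; BUT IRDECODE
    rs = (ir >> 6) & 0o7                  # source register field, IR<8:6>
    rd = ir & 0o7                         # destination register field, IR<2:0>
    # SOURCE (mode 0, register)
    spm[SRCOPR] = spm[rs]                 # 4: SRCOPR <- RS; BUT DESTINATION
    # DESTINATION (mode 6, indexed)
    ba = spm[PC]                          # 5: BA <- PC; DATI
    spm[PC] = (spm[PC] + 2) & MASK16      # 6: PC <- PC + 2; CLKOFF
    b = memory[ba]                        # 7: B <- BUSDATA (index word)
    ba = (spm[rd] + b) & MASK16           # 8: BA <- RD + B; DATIP; CLKOFF
    b = memory[ba]                        # 9: B <- BUSDATA (destination operand)
    # EXECUTE
    memory[ba] = spm[SRCOPR] | b          # 10: BUSDATA <- SRCOPR OP B; DATO
    return ba


spm = [0] * 16
spm[2] = 0o000017                          # R2, the source operand
spm[PC] = 0o1000
memory = {0o1000: 0o050267,                # BIS R2, X(R7)
          0o1002: 0o000020,                # index word X
          0o1024: 0o170000}                # destination operand
destination = interpret_bis_indexed(spm, memory)
```

Because the destination register is R7, the index is added to the Program Counter after it has already been advanced past the index word in cycle 6, which is why cycle 6 must precede cycle 8.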
Characterization of Individual
Implementations

A set of common implementation features
may be used to characterize each mid-range
PDP-11 to provide the raw data upon which
comparisons may be based. A summary of these
characteristics is given in Tables 2 and 3.
PDP-11/20. The 11/20 was the first of the PDP-11 family. The 11/20 is atypical in a number of important aspects. Because the semiconductor read-only memory technology which makes microprogramming economically attractive was unavailable when the PDP-11/20 was designed, control was implemented in random logic in contrast to the microprogrammed control used in all the succeeding members of the PDP-11 family. This causes control to be forced into a very stylized form so as to minimize the number of control unit states. Finally, the Unibus control generates a number of signals controlling the operation of the data paths. This makes it necessary for the Unibus and processor control unit to operate in tight lockstep with each other with no possibility of asynchronous data transfer.
The absence of MSI also has significant impact on the implementation of the data paths (Figures 4 and 5). The extensive use of SSI logic has several ramifications beyond increased cost and complexity. The A leg and B leg MUXs are set up to act as latches in addition to acting as data selectors (Figure 5). One may think of a B leg being placed between the B leg MUX and the ALU. The ALU is a simple adder in contrast to the multifunctioned TTL MSI 74181 ALUs used in every other medium performance PDP-11. Logical operations are carried out in the A leg MUX/latch. The MUX can select either the true or complemented form of operands to support logical NOT. Logical OR is accomplished by gating the two operands into the MUX simultaneously (one operand may have been latched beforehand). Logical AND is performed by making use of DeMorgan's Rule ($A \wedge B \equiv \neg[\neg A \vee \neg B]$). Since there is no logic for complementing the output of the A leg MUX/latch, two cycles are necessary: the first to form $\neg A \vee \neg B$, the second to run it through the A leg MUX again to form the complement.
The rotate/shift/byte swap logic is built into the MUX following the adder. A final peculiarity of the 11/20 is the separate paths provided from the Unibus for the IR and PS. Interestingly enough, even with all of these rather striking differences in implementation, the PDP-11/20 still shows a strong kinship to its successors.
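
The two-cycle AND of the 11/20 can be sketched in Python as two passes through a multiplexer that can only OR its gated inputs, in true or complemented form. The function names are invented for illustration and the model ignores the latching details of the actual MUX.

```python
MASK16 = 0xFFFF


def a_leg_mux(operands, complemented):
    """OR together the gated operands, each in true or complemented form."""
    out = 0
    for value in operands:
        out |= (~value & MASK16) if complemented else value
    return out


def and_via_de_morgan(a, b):
    """Form A AND B using only complement-and-OR, as the text describes."""
    first_pass = a_leg_mux([a, b], complemented=True)    # cycle 1: NOT A OR NOT B
    return a_leg_mux([first_pass], complemented=True)    # cycle 2: complement it


result = and_via_de_morgan(0o170017, 0o000377)
```

The second pass is needed only because the MUX output itself cannot be complemented. A data path with a full-function ALU performs the same AND in a single cycle.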
PDP-11/40. The PDP-11/40 was designed to improve upon the performance of the PDP-11/20 without an increase in price by taking advantage of the TTL MSI technology arising after the introduction of the 11/20. With the exception of the PDP-11/60 (and the 11/20 which exceeds the 11/40 in cost), the 11/40 is both the fastest and most expensive mid-range PDP-11 processor.
The data paths of the 11/40 (Figure 6) correspond closely to those of the archetype except in
the immediate vicinity of the ALU. What has
been indicated as the A leg MUX is really the
negative-logic wired OR of a number of signals.
Options such as the Floating-Point Processor
are added by simply tying them into the D
MUX output and A leg. Two paths exist out of
the PS: one running to the A leg MUX as in the
archetype and a second running directly to the
Unibus as in the 11/20. A path from the A leg
MUX directly to the D MUX (equivalent to the
A MUX of other models) exists allowing the
ALU (and thus the propagation delay incurred
by passing through it) to be bypassed in those
cases where the contents of the SPM or PS are
to be routed directly back to the B Register or
SPM. Single-bit shifts and rotates right are handled in the D MUX in a fashion similar to the

Table 2. PDP-11 Circuit Technology and Data Paths

LSI-11
  Performance relative to LSI-11: 1.000
  Logic family: N-channel MOS
  Level of integration: LSI
  Scratchpad memory: 26 registers × 8 bits; 1 write/2 read ports
  ALU: 8-bit nMOS ALU
  Sign extension: Not needed; done in microcode
  Rotate/shift: In ALU
  Byte swap: Not needed; done in microcode
  Other features: 8-bit-wide data paths, 16-bit operands require two cycles; non-Unibus, data/address lines MUXed

11/04
  Performance relative to LSI-11: 1.455
  Logic family: TTL
  Level of integration: MSI
  Scratchpad memory: 16 × 16 with SP Reg for write after read
  ALU: 74181s with 74182 carry lookahead
  Sign extension: In B leg MUX
  Rotate/shift: B Reg is bidirectional shift register
  Byte swap: Before SPM

11/10
  Performance relative to LSI-11: 1.436
  Logic family: TTL
  Level of integration: MSI
  Scratchpad memory: 16 × 16; read and write may not take place within same cycle
  ALU: 74181s with 74182 carry lookahead
  Sign extension: In B leg MUX
  Rotate/shift: B Reg is bidirectional shift register
  Byte swap: None; performed as 8 shifts
  Other features: Complementor at ALU A leg for subtract instruction

11/20
  Performance relative to LSI-11: 1.667
  Logic family: TTL
  Level of integration: SSI
  Scratchpad memory: 16 × 16 with input latches for write after read
  ALU: 7482 adders, ripple carry plus combinational logic
  Sign extension: In B leg MUX/latch
  Rotate/shift: Following adder
  Byte swap: Following adder
  Other features: Bus data has own path to IR and PS; PS has own path out to bus data, no other outgoing paths

11/34
  Performance relative to LSI-11: 1.942
  Logic family: TTL, TTL/S
  Level of integration: MSI
  Scratchpad memory: 16 × 16; write while read
  ALU: 74S181s with 74S182 carry lookahead
  Sign extension: Following A MUX
  Rotate/shift: B Reg is bidirectional shift register
  Byte swap: Following A MUX; speeds odd-byte accesses
  Other features: B extension register (BX Reg) for EIS instructions

11/40
  Performance relative to LSI-11: 2.819
  Logic family: TTL
  Level of integration: MSI
  Scratchpad memory: 16 × 16; D Reg and multiphase cycle allow write after read
  ALU: 74181s with 74182 carry lookahead
  Sign extension: In B leg MUX
  Rotate/shift: To left in ALU; to right in D MUX
  Byte swap: In B leg MUX
  Other features: Bypass from A leg MUX around ALU and D Reg; two paths into BA

11/45
  Performance relative to LSI-11: 6.820 (with bipolar memory)
  Logic family: TTL/S
  Level of integration: MSI
  Scratchpad memory: Two banks of 16 × 16 for 1 write/2 read ports; read and write may not occupy same cycle
  ALU: 74S181s with 74182 carry lookahead
  Sign extension: In ALU
  Rotate/shift: To left in ALU; to right in SHFMUX
  Byte swap: In SHFMUX
  Other features: PC broken out separately from scratchpads; multiple paths into ALU; Fastbus supports semiconductor memory

11/60
  Performance relative to LSI-11: 3.727 (87% cache hit ratio)
  Logic family: TTL/S
  Level of integration: MSI
  Scratchpad memory: Two banks of 32 registers × 16 bits; only R0-R7 and user R6 duplicated; write after read
  ALU: 74S181s with 74182 carry lookahead
  Sign extension: In shift tree
  Rotate/shift: In shift tree
  Byte swap: In shift tree
  Other features: Shift tree allows multibit shifts; Scratchpad C for constants, bus input, and status logging; 3-state logic used extensively

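Table 3. PDP-11 Control Unit and Physical Assembly

LSI-11
  Control derivation: Vertical microcode
  Cycle time(s): 400 ns
  Processor/Unibus synchronization: Interlocked
  Control store size (bits × words): 22 × 1024 (expandable to 2048)
  Control store words used: 994
  Other features: No next microaddress in microword; microwords are selected sequentially until a branch, jump, or translate is encountered
  Circuit boards: 1 quad (4 positions)
  Integrated circuit packages: 48
  Integrated circuit types: 24

11/04
  Control derivation: Horizontal microcode
  Cycle time(s): [illegible in source]
  Control store size (bits × words): 40 × 256
  Control store words used: 249
  Circuit boards: 1 hex (6 positions)
  Integrated circuit packages: 138
  Integrated circuit types: 40

11/10
  Control derivation: Horizontal microcode
  Cycle time(s): 300 ns (150 ns for fast shift)
  Control store size (bits × words): 40 × 256
  Control store words used: 249
  Circuit boards: 2 hex (12 positions)
  Integrated circuit packages: 203
  Integrated circuit types: 60

11/20
  Control derivation: Random logic
  Cycle time(s): 280 ns
  Processor/Unibus synchronization: Interlocked
  Other features: Control states are encoded in major and minor state shift registers
  Circuit boards: 6 quad, 6 double, 2 single (38 positions)
  Integrated circuit packages: 523
  Integrated circuit types: 27

11/34
  Control derivation: Horizontal microcode
  Cycle time(s): 180, 240 ns
  Control store size (bits × words): 48 × 512
  Control store words used: 488
  Circuit boards: 2 hex (12 positions)
  Integrated circuit packages: 231
  Integrated circuit types: 54

11/40
  Control derivation: Horizontal microcode
  Cycle time(s): 140, 200, 300 ns
  Processor/Unibus synchronization: Overlapped
  Control store size (bits × words): 56 × 256
  Control store words used: 251
  Circuit boards: 4 hex, 1 quad (28 positions)
  Integrated circuit packages: 417
  Integrated circuit types: 53

11/45
  Control derivation: Horizontal microcode
  Cycle time(s): 150 ns
  Control store size (bits × words): 64 × 256
  Control store words used: 256
  Other features: Forks and microbranches may be enabled together, microbranches taking precedence
  Circuit boards: 7 hex, 1 quad (46 positions)
  Integrated circuit packages: 6?6 (middle digit illegible in source)
  Integrated circuit types: 78

11/60
  Control derivation: Horizontal microcode
  Cycle time(s): 170 ns
  Control store size (bits × words): 48 × 2560 (excluding user control store space)
  Control store words used: 2410 (including integral floating point)
  Other features: Multilevel microsubroutines; page-addressed microstore; extensive use of residual control; control store available to user through WCS
  Circuit boards: 6 hex (36 positions)
  Integrated circuit packages: 648
  Integrated circuit types: 74

Entries that survive in the source but whose rows cannot be recovered from the damaged layout:
  Processor/Unibus synchronization for the 11/04, 11/10, 11/34, 11/45, and 11/60 (two entries read Overlapped, three read Interlocked)
  Other features: Microword is not buffered
  Other features: BUT field is buffered, BUT must be placed one microinstruction ahead of where it is to take place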


Figure 4. PDP-11/20 data paths.
Note: All data paths are 16 bits wide unless otherwise indicated.

Figure 5. Detail of central part of PDP-11/20 data paths. One-bit (03) slice (adapted from
Kell Processor Manual). Blocks shown: A leg MUX/latch, rotate/shift MUX, B leg MUX/latch.
Key: "SIGNAL NAME" H = signal is asserted (1) when high;
"SIGNAL NAME" L = signal is asserted (1) when low.

Figure 6. PDP-11/40 data paths.
Note: All data paths are 16 bits wide unless otherwise indicated.

11/20. Rotate/shifts to the left, however, are
performed in the ALU. Sign extension and byte
swapping are performed in the B leg MUX.
Since the Scratchpad Register may not be read
and written simultaneously, the D Register
(D Reg) is used to hold results generated while
the SPM is being read in one processor clock
phase so that during a later phase they may be
written back into the Scratchpad. In this way
the D Register permits read-write access of the
SPM within a single cycle. A final feature is the
presence of two paths into the Bus Address register, one from the A leg MUX and one from
the ALU. This is of benefit in such operations
as autoincrement and autodecrement addressing modes in which the contents of a register
can be modified and either the premodification
(autoincrement) or postmodification (autodecrement) value of the register can be put into
the Bus Address register in a single cycle.
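
The benefit of the two Bus Address paths can be sketched in a few lines of Python. This is an illustrative model only: the function name `address_and_update` and the fixed step of 2 (a word operand) are assumptions made for the example, not details taken from the 11/40 microcode.

```python
MASK16 = 0xFFFF

def address_and_update(reg_value, mode, step=2):
    """Return (bus_address, new_register_value) for one addressing cycle.

    The A leg MUX carries the unmodified register; the ALU output
    carries the modified one. Either may be loaded into BA.
    """
    if mode == "autoincrement":
        alu_out = (reg_value + step) & MASK16
        return reg_value, alu_out   # BA from the A leg (premodification value)
    if mode == "autodecrement":
        alu_out = (reg_value - step) & MASK16
        return alu_out, alu_out     # BA from the ALU (postmodification value)
    raise ValueError("unsupported mode: " + mode)
```

In both modes the address and the updated register value are available from the same pass through the ALU, which is the point of providing two paths.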
The 11/40 microprogrammed control unit is
quite elaborate to gain full benefit of the potential of the data paths. Among its features are
overlapped processor/Unibus operation and
three selectable microcycle clock periods. The
latter feature increases performance considerably
since the maximum cycle time of 300 nanoseconds
is needed only when a full circle from
Scratchpad through ALU and back to Scratchpad is made. In cycles which do not write into
the Scratchpad, a 200-nanosecond cycle may be
selected. When the data paths are unused and
only microbranching is involved, an even
shorter cycle time of 140 nanoseconds is possible. A final unique feature of the 11/40 is a
variation in the branch on microtest logic from
that of the archetypal control unit. To increase
microbranch speed, the microword BUT select
field is buffered in the Microword Register
rather than being routed directly from the control store to the BUT logic. This causes a
one-cycle delay in processing the branch and forces
all BUTs to be placed one microinstruction
ahead of where they are to take effect. In some
cases, dummy steps are required to provide sufficient lead time for BUT action to occur, somewhat offsetting the speedup of this
arrangement.
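
The effect of the three clock periods can be illustrated with a small sketch. The two boolean flags and the function names are assumptions introduced for this example; the real selection is made by fields in the 11/40 microword that are not described here.

```python
def microcycle_ns(uses_data_paths, writes_scratchpad):
    """Pick the 11/40 microcycle length for one microinstruction."""
    if not uses_data_paths:
        return 140      # microbranching only
    if writes_scratchpad:
        return 300      # full circle: Scratchpad -> ALU -> Scratchpad
    return 200          # data paths used, no Scratchpad write

def total_ns(steps):
    """Sum the cycle times of a sequence of (uses_data_paths, writes_scratchpad)."""
    return sum(microcycle_ns(u, w) for u, w in steps)
```

For a hypothetical three-step sequence with one cycle of each kind, the total is 640 nanoseconds rather than the 900 nanoseconds a fixed 300-nanosecond clock would need.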
One way in which the 11/40 uses its processor/Unibus overlap feature to advantage is by
prefetching words from memory whenever possible. At the end of the fetch phase, a check is
made to see if the next memory reference fetches an instruction or operand index. If it does,
the read access is begun immediately using the
contents of the PC as the address. Exceptions to
this are when the PC is used as a destination or
when a service request is pending, both of which
mean that the current value of the PC will not
be the address of the next instruction. Starting
the access early allows it to proceed in parallel
with the execution of the current instruction.
This reduces the time the processor idles waiting for the accessed word. Updating of the PC is
deferred until the proper point in the instruction interpretation process is reached. This
guarantees that references to the PC will result
in the proper value being used.
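
The prefetch rule described above reduces to a single condition. The sketch below restates it in Python; the function and parameter names are hypothetical and stand in for tests that the real control unit performs in hardware and microcode.

```python
def may_prefetch(next_ref_is_instruction_or_index,
                 pc_is_destination,
                 service_request_pending):
    """Decide whether the 11/40 may start an early read using the PC."""
    if not next_ref_is_instruction_or_index:
        return False    # nothing useful to fetch with the PC
    if pc_is_destination or service_request_pending:
        return False    # the PC will not hold the next instruction address
    return True
```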
PDP-11/10. The PDP-11/10 was designed
as a minimal cost processor. The implementation is again TTL MSI but stripped to the bare
essentials without the elaboration of the 11/40.

Figure 7. PDP-11/10 data paths.
Note: All data paths are 16 bits wide unless otherwise indicated.

The data paths of the 11/10 (Figure 7) follow
the conventions of the archetype closely. A constant zero may be selected onto the A MUX in
addition to ALU or Unibus data. The ALU A
leg multiplexer allows selection of the PS, some
constants, and some internal addresses as well
as the Scratchpad memory. The B Register is
implemented as a universal bidirectional shift
register so that single-bit shifts and rotates may
be performed without additional logic. The
ALU B leg multiplexer includes the constants
one and zero and permits sign extension of the
low order byte of the B Register. The Scratchpad Memory may not be both read and written
in the same cycle; thus, operations such as incrementing the PC, which take only a single
microcycle on other processors, take two microcycles to complete on the 11/10. A byte
swapping path is absent in the 11/10. As a consequence, odd-byte addressing and swapping
must be accomplished by a series of eight shifts
or rotates.
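
A short sketch shows why eight single-bit rotates are enough to exchange the two bytes of a word. It assumes a plain 16-bit rotate that does not pass through a carry bit; the function names are illustrative and do not correspond to 11/10 microcode.

```python
def rotate_left_16(word):
    """Rotate a 16-bit word left by one bit position."""
    return ((word << 1) | (word >> 15)) & 0xFFFF

def swap_bytes_by_rotation(word):
    """Exchange the high and low bytes using eight single-bit rotates."""
    for _ in range(8):
        word = rotate_left_16(word)
    return word
```

Each byte swap therefore costs eight shift steps on a machine with no swap path, which is the overhead the later designs remove with a dedicated multiplexer.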
The 11/10 control unit has a relatively austere implementation. There is no Microword
Register in the control unit although there is
necessarily a Microaddress Register. As a consequence, the output of the control store is used
directly to condition the data paths. This precludes the overlap of current microinstruction
execution with next microinstruction fetch.
Hence, the propagation delay of the control
store must be added to that of the data paths in
setting the microcycle time, causing it to be a
relatively long 300 nanoseconds. The simplicity
of the data paths allows the use of a microword
only 40 bits wide. The microcode contains very
few frills and gains very little in performance
from special cases. A notable example of this is
the jump address calculation for JMP and JSR
instructions. The 11/10 uses the same section of
microcode for JMP and JSR destination modes
as it uses to fetch conventional destination operands. This costs an extra memory reference
over the separate microroutines used in other
PDP-11 processors because, in addition to the
effective address of the jump being calculated,
its contents are also fetched (the microprogram
logic precludes using this operand as a prefetched instruction even though this is effectively what it is). Overlapped processor/Unibus
operation allows some of the extra microcycles
necessitated by the data paths to be effectively
hidden by putting them in parallel with Unibus
accesses. The other concession to performance
is clock speed doubling during shift operations
to partially compensate for the performance
lost in the absence of a byte swapper.
PDP-11/04. The PDP-11/04 is the simplest
PDP-11 except for the LSI-11. Although
simple, the 11/04 embodies a very good set of
design tradeoffs. Figure 8 diagrams the 11/04
data paths. The Scratchpad Memory has a register (SP Reg, part of the SPM shown in Figure
8) sitting between it and the A MUX. This register allows the Scratchpad to support
read-modify-write accesses, saving a microcycle in
each such access over the 11/10. A multiplexer
sitting before the SPM implements the swap
byte operation, allowing the halves of a word to
be interchanged. This improves byte operation
performance considerably over the 11/10 and
obviates the need for the 11/10's fast shift logic.
Also eliminated is overlapped processor/Unibus operation because the savings from [...]
number of microcycles.

Figure 8. PDP-11/04 data paths.
Note: All data paths are 16 bits wide unless otherwise indicated.

The A MUX (the major data bus and the
multiplexer which drives it) can select the PS
and a number of constants in addition to ALU
output and Unibus data. Between the SPM and
ALU is a one's complementor so that the 74181
ALU may be used to perform the B leg minus A
leg operation used in the "subtract" instruction,
in addition to the A leg minus B leg operation
used in the "compare" instruction. The A leg
MUX also directly drives the Unibus address
lines without a Bus Address register (if processor/Unibus overlap had been used, a BA register would have been necessary). Between the B
Register and ALU is a multiplexer which allows
the B Register, sign-extended low order byte of
the B Register, or the constants zero or one to
be selected into the B leg of the ALU in a manner identical to that of the B leg MUX of the
11/10. The B Register is also identical to that of
the 11/10 in that it is a bidirectional shift register implementing rotate/shifts.

Figure 9. PDP-11/34 data paths.
Note: All data paths are 16 bits wide unless otherwise indicated.

The final contributor to increased performance of the 11/04 is the decrease in cycle time
from 300 nanoseconds in the 11/10 to 260 nanoseconds, made possible in part by pipelining
the microword fetch. On the whole, the 11/04 is
superior in performance to the 11/10 in all cases
except the fetch phase and certain addressing
modes where the use of its processor/Unibus
overlap capability is sufficient to put the 11/10
ahead.
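
The role of the 11/04's one's complementor can be illustrated with two's complement arithmetic. The sketch below is a behavioral model only: it assumes the subtraction is formed as an addition of a complemented operand with a carry-in of one, and the function names are hypothetical rather than taken from the 74181 function table.

```python
MASK16 = 0xFFFF

def ones_complement(x):
    """Invert all 16 bits of x."""
    return ~x & MASK16

def a_minus_b(a, b):
    """A leg minus B leg: A + (not B) + 1, truncated to 16 bits."""
    return (a + ones_complement(b) + 1) & MASK16

def b_minus_a_with_complementor(a, b):
    """B leg minus A leg, formed by complementing the A leg before the add."""
    return (ones_complement(a) + b + 1) & MASK16
```

With the complementor on the A leg, both orders of subtraction are available without moving operands between legs.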
PDP-11/34. The PDP-11/34 is an elaboration of the 11/04. The 11/34 data paths (Figure
9) bear close resemblance to those of the 11/04.
The 11/04 complementor has been replaced in
the 11/34 by additional microcode which reverses the placement of source and destination
operands on the A and B legs of the ALU during the subtract instruction from that of the
other double operand instructions. This frees
the 11/34 from performing the adjustments that
must be made in the data paths of the other PDP-11
processors to make the subtract instruction operate correctly under the restrictions of the
74181 ALU. Added is a B Extension register
(BX register) which, when concatenated with
the B Register, forms a 32-bit register for
double-width operands and results manipulated
by extended instruction set operations such as
multiply and divide. Also notable is the relocation of the byte swapper to the tail of the A
MUX allowing odd-byte accessing to occur as
data is entered from or placed upon the Unibus
without the customary extra microcycle needed
in other implementations to right adjust the
byte. Included with the byte swapper is the sign
extension logic. Schottky TTL is used in critical
places in the data paths, notably the ALU, to
speed up microcycle time from the 260 nanoseconds of the 11/04 to 180 nanoseconds. Additional hardware for memory management (not
shown in Figure 9) and extended instruction set
microcode are standard features.
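
The way two 16-bit registers can act as one 32-bit register is sketched below. The choice of B as the high half and BX as the low half is an assumption made for the example, as are the function names; the source states only that the two registers are concatenated.

```python
MASK16 = 0xFFFF

def concat_b_bx(b, bx):
    """Form a 32-bit value from B (assumed high half) and BX (assumed low half)."""
    return ((b & MASK16) << 16) | (bx & MASK16)

def shift_right_pair(b, bx):
    """Shift the 32-bit pair right one bit; the low bit of B moves into BX."""
    combined = concat_b_bx(b, bx) >> 1
    return (combined >> 16) & MASK16, combined & MASK16
```

Shifting the pair as a unit is the kind of step a multiply or divide microroutine repeats once per bit.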
The 11/34 microprogrammed control unit
makes some concessions to the improved performance of the data paths. In addition to the
normal 180-nanosecond cycle, there is a 240-nanosecond cycle used primarily for Unibus operations. Again, there is no processor/Unibus
overlap feature because considerations of simplicity (i.e., cost) outweighed the incremental
improvement in performance that would be netted. Because of its additional logic, the
PDP-11/34 has a wider microword than the 11/04
(48 bits versus 40 bits). Also, since many more
cases are broken out by the BUT IRDECODE
in the 11/34 than in the machines preceding it,
the size of the control store has been increased
to 512 words, double that of earlier horizontally
microprogrammed implementations.

*The PDP-11/70 also uses a cache.

Figure 10. PDP-11/60 data paths.
Notes:
1. All data paths are 16 bits wide unless otherwise indicated.
2. PS is implemented separately from data paths.

PDP-11/60. The PDP-11/60 is the latest
implementation covered in this paper and in
many ways the most distinctive. Its design exploits
advances in circuit technology occurring since
the introduction of the earlier models, giving it a
number of features which set it apart from other
PDP-11 family members. Two major enhancements are a larger microcode addressing space,
making an integral floating-point instruction
set and a writable control store option feasible,
and a cache memory.* Both are possible due to
increases in the density and decreases in the cost
of bipolar ROM and RAM (see Chapter 13).
As illustrated in Figure 10, the 11/60 data
paths show significant differences from those of
other midrange implementations. A major difference is the presence of three Scratchpad
Memories feeding the ALU. Scratchpads A and
B are 32-word X 16-bit register arrays, each
having twice the number of registers of the
single Scratchpad found in other mid-range designs. As with the 11/45 (see the section entitled
"Implementation of a High-Performance PDP-11"), the contents of the general registers are
kept in both Scratchpads allowing different registers to be read onto the A and B legs of the
ALU simultaneously within the same cycle.
This speeds register-to-register operations. The
additional registers in the A and B Scratchpads
are used as floating-point registers by the integral floating-point microcode, working storage by user microprograms, and console,
maintenance, and status registers by the processor. Scratchpad C is a 16-word X 16-bit array
which holds bus data and constants used by the
processor and takes the place of the constants
ROM on the B leg of other midrange implementations. During exceptional situations these
constants may be overwritten with other information but must be restored before execution of
the base machine microcode may be resumed.
The 11/60 is the first PDP-11 implementation
to make use of three-state devices to eliminate
many of the multiplexers used in other designs
(the 11/40 uses open-collector logic on the A leg
bus to the same effect). For instance, instead of
actual A leg and B leg MUXs, the 11/60 uses
registers and combinational elements with
three-state outputs that can be independently
enabled onto a common bus for each ALU leg.
The ALU itself is the conventional 181 type
used in all of the other MSI implementations.
As in the 11/40, the D Register (D Reg) latches
the ALU output so that results may be rewritten to the Scratchpads during a later clock
phase of the microcycle in which they are generated. The output of the D Register is the major,
but not sole, feedback route in the data paths.
The Bus Address register (BA) is loaded from
the A leg bus as in the 11/04 and 11/34. The
Address Out bus is driven by the BA and supplies addresses to the memory subsystem
(cache, relocation hardware, and Unibus interface). The Data In (DIN) bus routes data into
the processor from the memory subsystem, internal registers accessed via Unibus addresses
such as the PS, and constants emitted by the
microinstruction word. Scratchpad C and the
Instruction Register are loaded directly from
DIN in a manner reminiscent of the 11/20. A
register in SPM C is set aside specifically for
transfers from memory to the data paths. Results are routed from the data paths back to the
memory subsystem and internal registers via a
separate bus data out (DOUT) bus.
As compared to the other mid-range machines, several data path elements are unique to
the 11/60. The counter (CNTR) is an iteration
counter used by the Extended Instruction Set
and floating-point microcode. The Shift Register and Shift Register guard (shown together as
the SR in Figure 10) can be loaded in parallel
with D Reg and shifted one position right or
left. Either all or the low order seven bits of the
SR may be gated onto the A leg bus through the
X MUX (not shown). The shift tree is a network of multiplexers used for byte swapping,
sign extension, and field isolation and positioning. It is unusual in that it allows right shifts of
from 1 to 14 bit positions combinationally in a
single microcycle.
The PDP-11/60 control unit is horizontally
microprogrammed in much the same manner as
the other midrange implementations. Extensive
use of Schottky logic throughout the processor
allows a fixed 170-nanosecond microcycle time.
Processor/Unibus communication is interlocked unlike either the 11/40 or 11/45. There
are several significant differences from the more
conventional implementations. Many of these
differences are generalizations of the microprogram flow control mechanism to allow more
functions of the base machine to be performed
by microcode rather than hardwired logic and
to create a user microprogramming environment which can be put to uses beyond executing
the PDP-11 instruction set. The 11/60 has a
larger and more generalized set of BUTs than
earlier machines. Also included for the first
time in a horizontally microprogrammed machine is a multilevel microsubroutine
call/return capability.
Increased reliance on microcode has expanded the control store to 4,096 words by 48
bits. Of this, 2,560 words are used to implement
the basic machine. The remaining 1,536 words
are available to the user through a ROM control store option; 1,024 are available through a
writable control store option. Since addressing
the microstore requires 12 bits, a page-addressing scheme has been adopted to avoid widening
the microword. Page size is 512 words reducing
microaddresses to 9 bits within a page. Microbranches across a page boundary require that
an additional 3-bit page field be specified.
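
The page-addressing arithmetic can be checked with a short sketch. The sizes (4,096 words, 512-word pages, 9-bit in-page addresses, 3-bit page field) come from the text; the function names are illustrative only.

```python
STORE_WORDS = 4096              # 12-bit microaddress space
PAGE_BITS = 9                   # 512 words per page
PAGE_SIZE = 1 << PAGE_BITS

def split_microaddress(addr):
    """Split a 12-bit microaddress into (3-bit page, 9-bit offset)."""
    if not 0 <= addr < STORE_WORDS:
        raise ValueError("microaddress out of range")
    return addr >> PAGE_BITS, addr & (PAGE_SIZE - 1)

def needs_page_field(current, target):
    """A microbranch must specify a page field only when it leaves the page."""
    return split_microaddress(current)[0] != split_microaddress(target)[0]
```

A branch that stays within its page carries only the 9-bit address, so the common case pays nothing for the larger store.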
Another concept used extensively in the
11/60 to reduce microword size is residual control. In this technique relatively static control
information is kept in set-up registers separate
from the microword. The microprogram must
load these registers to affect the data path elements which they control. Set-up registers are
used in the 11/60 to gate registers onto DIN
bus, enable data into registers from the DOUT
bus, select SR functions, and control certain actions of the shift tree.
The overlapping of a number of different
control fields by bit steering is a final means of
keeping the microword relatively narrow. Certain bits in the microword control the interpretation of corresponding microword fields.
This allows a single field to control several different functions. The one drawback of this technique is that these functions become mutually
exclusive within a single microword since their
simultaneous use would involve two different
interpretations of the same microfield.
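
Bit steering can be illustrated with a deliberately tiny, hypothetical microword. The 5-bit layout below (one steering bit above a 4-bit shared field) and the two function names it selects between are invented for the example and do not describe the actual 11/60 microword.

```python
def decode_shared_field(microword):
    """Interpret a 4-bit field according to a 1-bit steering flag.

    Hypothetical layout: bit 4 is the steering bit, bits 3..0 the shared field.
    """
    steer = (microword >> 4) & 1
    field = microword & 0xF
    if steer == 0:
        return {"alu_function": field}
    return {"shift_count": field}
```

Because the same four bits serve both purposes, a single microword in this sketch can specify an ALU function or a shift count but never both, which is the mutual exclusion the text describes.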
Hardwired logic in the memory subsystem
detects internal addresses in a manner similar to
other PDP-11 processors. However, the actual
access to these registers is accomplished
through microcode instead of additional control logic. Internal address access has been
added to the exceptional conditions detected by
the JAM logic of the 11/60. If the JAM microroutine finds that a microtrap has been caused
by an internal address access, an intraprocessor
transfer to or from the addressed register is performed. Unlike other JAM sequences, such
transfers are terminated by resuming the interrupted microprogram. Microcoded register access requires much more time than the
corresponding hardwired access. Reading the
PS, for instance, takes 33 microcycles or 5.610
microseconds using microcode where a single
microcycle suffices for the hardwired approach.
This is justified, however, by the decreased cost
of microcode versus hardwired logic and by the
infrequent access made to these registers.
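
The 5.610-microsecond figure follows directly from the fixed 170-nanosecond microcycle. A minimal check, with an illustrative function name:

```python
CYCLE_NS = 170  # fixed 11/60 microcycle time

def access_time_us(microcycles, cycle_ns=CYCLE_NS):
    """Convert a microcycle count to microseconds."""
    return microcycles * cycle_ns / 1000
```

Thirty-three microcycles give 5.61 microseconds, against 0.17 microseconds for a single-microcycle hardwired access.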
Like the 11/40, the 11/60 prefetches instructions and operand indices whenever possible.
Unlike the 11/40, the PC is incremented at the
time the prefetch is performed. Because of this,
prefetching cannot be done when the current instruction uses the PC as either a source or destination register. A second difference is that
service requests are not polled until the end of
the current instruction, when the next instruction may already be prefetched and the PC updated. When this occurs, two microcycles must
be spent to decrement the PC to restore its old
value before proceeding with the service phase.
IMPLEMENTATION OF A MINIMAL COST
PDP-11

The LSI-11 (Chapter 12) is designed for the
low-end market where there is more concern for
low cost than high performance. Integrated circuit package count and printed circuit board
area, the main determinants of manufacturing
cost, are kept low through an n-channel MOS
LSI technology implementation of the CPU.
The result is a PDP-11 processor with four kilowords of semiconductor memory on a single 8.5
X 10.5-inch (standard DEC quad height)
printed circuit board which can execute the entire PDP-11/40 instruction set.


The constraints imposed by current semiconductor technology dictate much of the implementation of the LSI-11. The entire CPU
consists of four LSI packages plus a number of
standard TTL SSI and MSI packages for clock
generation and bus interfacing. A system control chip provides microinstruction addressing
logic plus an interface to external signals used in
bus control. A data paths chip contains the registers and arithmetic logic unit of the machine.
Two chips are microcode ROMs (MICROMs).
Each contains 512 microinstruction words with
a width of 22 bits. An optional third MICROM
adds the Extended Instruction Set/floating-point instruction set option of the PDP-11/40.
To decrease the complexity of the machine, the
traditional Unibus was abandoned in favor of a
scheme requiring fewer bus lines. Most notable
is the multiplexing of both data and addresses
onto a single set of 18 data/address lines,
DAL<17:00>. A significant savings over the 34
lines dedicated to data and address in the
Unibus results at the expense of bus cycle speed.
The 22-bit microinstruction word of the LSI-11
is quite narrow compared to the microwords
of the horizontally microprogrammed PDP-11s
which range from 40 to 64 bits wide. Four bits
are not decoded and provide direct TTL-compatible signals which are used by logic external
to the CPU chips. Another two bits are used
within the CPU chips to control next microinstruction addressing. The remaining 16 bits
are decoded as a microinstruction by the CPU
chips. LSI-11 microinstructions differ little in
form from conventional minicomputer instructions with their operation code and operand
(which may be register, microcode address, or
literal) fields. These require a great deal more
decoding than the horizontal microinstructions
of other designs.
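
The 4 + 2 + 16 division of the 22-bit microword can be sketched as a simple field extraction. The bit positions used here (TTL signals in the top four bits, next-address control in the following two, microinstruction in the low 16) are an assumption for illustration; the text gives only the field widths, not their placement.

```python
def split_microword(word):
    """Split a 22-bit LSI-11 microword into its three fields (assumed layout)."""
    if not 0 <= word < (1 << 22):
        raise ValueError("microword must fit in 22 bits")
    return {
        "ttl_signals": (word >> 18) & 0xF,           # 4 undecoded bits
        "next_address_control": (word >> 16) & 0x3,  # 2 bits
        "microinstruction": word & 0xFFFF,           # 16 decoded bits
    }
```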
The LSI-11 microstore is larger than the control store of any other PDP-11 except the 11/60.
Since LSI-11 microinstructions lack the possibilities for parallelism inherent in the horizontal
microinstructions, more LSI-11 microinstructions are needed to code a given operation. In addition, certain functions which are
handled with combinational logic in other
PDP-11 control units and data paths are microcoded in the LSI-11. Finally, the LSI-11 has
more elaborate console microcode than the
other implementations. As a result, the LSI-11
has 22,528 bits of microstore versus 14,336 bits
for the PDP-11/40, 16,384 bits for the PDP-11/45, and 122,880 bits for the PDP-11/60. The
narrow microword is used in spite of its attendant problems due to the limitation imposed by
the packaging of the MOS CPU chips. Only 40
pins are available to carry power and signals to
and from each chip, limiting the number of lines
available for transmitting the microword from
the MICROMs to the control and data path
chips.
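
The microstore sizes quoted above are the products of the microword widths and word counts given in Table 3. A quick check, with an illustrative dictionary and function name:

```python
# (microword width in bits, number of words), as listed in Table 3
CONTROL_STORES = {
    "LSI-11": (22, 1024),
    "PDP-11/40": (56, 256),
    "PDP-11/45": (64, 256),
    "PDP-11/60": (48, 2560),
}

def microstore_bits(model):
    """Total control store capacity in bits for the named model."""
    width, words = CONTROL_STORES[model]
    return width * words
```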
Technology also imposes a serious constraint
on instruction decoding. The equivalent of a
branch on microtest allows only eight bits to be
decoded at a time. This is sufficient for decoding the majority of instructions; however, the
remainder require additional decoding which
may consume as many as eight microcycles.
This is in marked contrast with all other PDP-11s
which require only a single microcycle to do
the initial instruction decode at the end of the
fetch phase (BUT IRDECODE).* The effect
that this has on the average duration of the LSI-11
fetch phase is evident from Table 4.
*The 11/60 requires two microcycles to decode certain instructions.

Figure 11 details the data paths around which
the operands of the macroinstruction level machine circulate. As with the medium-performance implementations, the ALU is the hub of
activity, operating upon quantities supplied
from the Scratchpad memory. The A MUX selects from the output of the ALU, the high or
low byte of the data/address lines, and the processor flags. The selected quantity is fed back to
be rewritten into the Scratchpad. Constants
supplied as literals from the microinstruction
word may be gated into the data paths through
the B leg MUX to the ALU. Additional paths
exist for transmitting information in and out on
the data/address lines.

Figure 11. LSI-11 data paths.
Notes:
1. All data paths are 8 bits wide unless otherwise indicated.
2. IR maintained within SPM.

Table 4. Average PDP-11 Instruction Execution Times in Microseconds

                                                                   Speed
                                                                   Relative
Model                      Fetch   Source  Dest.   Execute  Total  to LSI-11
LSI-11                     2.514   0.689   1.360   1.320    5.883  1.000
PDP-11/04                  1.940   0.610   0.811   0.682    4.043  1.455
PDP-11/10                  1.500   0.573   0.929   1.094    4.096  1.436
PDP-11/20                  1.490   0.468   0.802   0.768    3.529  1.667
PDP-11/34                  1.630   0.397   0.538   0.464    3.029  1.942
PDP-11/40                  0.958   0.260   0.294   0.575    2.087  2.819
PDP-11/45                  0.363   0.101   0.213   0.185    0.863  6.820
  (bipolar memory)
PDP-11/60                  0.541   0.185   0.218   0.635    1.578  3.727
  (87 percent cache
  hit ratio)
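
The columns of Table 4 are related in a simple way: the total is the sum of the four phase times, and the relative speed is the LSI-11 total divided by the model's total. The sketch below recomputes both from the printed figures; small differences in the last digit are expected because the printed values are rounded. The names `TABLE_4`, `phase_sum`, and `relative_speed` are illustrative.

```python
# (fetch, source, dest, execute, total, speed relative to LSI-11), microseconds
TABLE_4 = {
    "LSI-11":    (2.514, 0.689, 1.360, 1.320, 5.883, 1.000),
    "PDP-11/04": (1.940, 0.610, 0.811, 0.682, 4.043, 1.455),
    "PDP-11/10": (1.500, 0.573, 0.929, 1.094, 4.096, 1.436),
    "PDP-11/20": (1.490, 0.468, 0.802, 0.768, 3.529, 1.667),
    "PDP-11/34": (1.630, 0.397, 0.538, 0.464, 3.029, 1.942),
    "PDP-11/40": (0.958, 0.260, 0.294, 0.575, 2.087, 2.819),
    "PDP-11/45": (0.363, 0.101, 0.213, 0.185, 0.863, 6.820),
    "PDP-11/60": (0.541, 0.185, 0.218, 0.635, 1.578, 3.727),
}

def phase_sum(model):
    """Sum of the fetch, source, destination, and execute times."""
    return sum(TABLE_4[model][:4])

def relative_speed(model):
    """LSI-11 total time divided by this model's total time."""
    return TABLE_4["LSI-11"][4] / TABLE_4[model][4]
```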
Significant differences exist between the data
paths of the LSI-11 and the mid-range machines. One major difference is in the width of
the data paths. The LSI-11 is the only member
of the PDP-11 family with data paths 8 bits
rather than 16 bits wide. This is necessitated by
limitations in current semiconductor chip density. Bus paths in particular occupy large
amounts of chip real estate dictating their reduction in width. Since only 8 bits of data can
be processed at a time, 2 microcycles are required to accomplish any 16-bit operation. A
second effect is the elimination of logic that
would otherwise be necessary to configure the
data paths for both byte and word operations.
A last unique characteristic is the absence of a B
Register for feeding the B leg of the ALU. Instead, the B leg is fed from a second read port
into the Scratchpad Memory. In this, the LSI-11
bears a curious resemblance to the PDP-11/45 and 11/60. The difference is that while
the LSI-11 uses this feature to eliminate cycles
that would be needed to load a B Register, there
is not sufficient logic to allow source and destination registers to be accessed simultaneously.
Consequently, multiple cycles are still required
to set up register/register operations on the
LSI-11.
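
Why a 16-bit operation takes two passes through an 8-bit data path can be seen in a sketch of a word addition. This is a behavioral illustration under the assumption that the carry out of the low byte is fed into the high-byte pass; the function names are hypothetical.

```python
def add8(a, b, carry_in=0):
    """One pass through an 8-bit adder: returns (sum byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def add16_in_two_passes(x, y):
    """Add two 16-bit words as a low-byte pass followed by a high-byte pass."""
    low, carry = add8(x & 0xFF, y & 0xFF)
    high, carry = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)
    return (high << 8) | low, carry
```

Each call to `add8` corresponds to one microcycle in this model, so every word operation costs two.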
The final important performance factor is
again a direct result of the circuit technology
employed. NMOS logic is not as fast as the
bipolar logic found in every other PDP-11 implementation so that the microcycle time of the
LSI-11 is 400 nanoseconds or one-third slower
than the next slowest PDP-11. Coupled with the
larger number of microcycles necessary to execute a given macroinstruction, this causes the
LSI-11 to lag in performance.
IMPLEMENTATION OF A HIGH
PERFORMANCE PDP-11

The PDP-11/45 was designed for maximum
performance and followed the 11/20 to become
the second member of the PDP-11 family. Maximum performance is achieved with a complex
set of data paths allowing highly parallel operation and an optional high-speed semiconductor memory (bipolar or MOS) with its
own path into the processor called the Fastbus.
The extensive use of Schottky TTL in the processor makes possible a 150-nanosecond cycle
time, half as long as that in some mid-range designs.
The complexity of the PDP-11/45 data paths
is evident from Figure 12 even with several of
the special purpose registers and buses omitted
for clarity. The overall organization still bears
some resemblance to the mid-range PDP-11
data paths, however. The ALU remains the hub
of data path activity with its output the primary
feedback path to the processor registers, although not the only one as in other implementations. The ALU is based upon the Schottky
equivalent of the 74181 chip used in most other
PDP-11 designs. The difference begins with the
multiplexers driving the A and B legs of the
ALU. These MUXs allow operands to be
routed directly to the proper leg without using
additional cycles to move operands from register to register. K0 MUX and K1 MUX (combined in Figure 12) are multiplexers used in
conjunction with the B MUX to gate constants,
trap vector addresses, and branch offsets into
the B leg of the ALU.

Figure 12. PDP-11/45 data paths.
Note: All data paths are 16 bits wide unless otherwise indicated.

Among the registers supplying the A MUX
and B MUX are the source and destination operand registers (S Reg and D Reg, respectively).
These, in turn, are supplied by the SR MUX
and DR MUX which select data from individual Scratchpad Registers or the Program
Counter. Besides holding operands from the
general registers, the S Reg and D Reg act as
working registers. In particular, D Reg is a shift
register used to accumulate the less significant
half of results during multiply and divide.
Separate Scratchpads are maintained so that
source and destination general registers may be
read simultaneously and independently. This
necessitates both Scratchpads being written together to keep their contents identical. Each
Scratchpad is organized as 16 words of 16 bits
each. Fifteen words in each Scratchpad are actually used: two sets of general registers (R0
through R5) and three sets of stack pointers
(R6). Register set selection is controlled by status bits in the PS.
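
The duplicated-Scratchpad arrangement can be modeled as two identical arrays that are always written together and read independently. The class and attribute names below are illustrative; only the sizes (16 words, of which 15 are used) come from the text.

```python
GENERAL_SETS = 2        # two sets of R0 through R5
STACK_POINTERS = 3      # three copies of R6
WORDS_USED = GENERAL_SETS * 6 + STACK_POINTERS

class DualScratchpad:
    """Two identical register banks: one read for source, one for destination."""

    def __init__(self, words=16):
        self.source_bank = [0] * words
        self.dest_bank = [0] * words

    def write(self, index, value):
        # Both banks are written together so their contents stay identical.
        self.source_bank[index] = value & 0xFFFF
        self.dest_bank[index] = value & 0xFFFF

    def read_pair(self, src_index, dst_index):
        # Two different registers can be read in the same cycle.
        return self.source_bank[src_index], self.dest_bank[dst_index]
```

The cost of the second read port is a second copy of the registers, traded for one fewer cycle on every register-to-register operation.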
The Program Counter is not maintained in
the Scratchpad Registers as in other PDP-II s.
Rather, it is held separately so that it may be
routed directly to the BA MUX while the S Reg
and D Reg are occupied with other operations.
Moreover, two Program Counters are implemented. PCB holds the current value of the Program Counter and is used as a general register
or bus address. PCA holds the new value of the
Program Counter allowing the PC to be updated while the old PC value is still in use, after
which PCB is clocked to load it with the new
value contained in PCA.
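
The PCA/PCB arrangement just described can be pictured with a small software model. This is only an illustrative sketch: the class name, method names, the 16-bit mask, and the increment of 2 are assumptions made for the example and are not taken from the 11/45 engineering documents.

```python
class DualProgramCounter:
    """Toy model of the PCA/PCB pair (names and widths are assumptions)."""

    def __init__(self, start: int) -> None:
        self.pcb = start & 0xFFFF  # current PC value, visible to the data paths
        self.pca = start & 0xFFFF  # staged "new" PC value

    def stage_increment(self, step: int = 2) -> None:
        # Compute the next PC into PCA; PCB still shows the old value.
        self.pca = (self.pcb + step) & 0xFFFF

    def clock_pcb(self) -> None:
        # Clock PCB so that it loads the value staged in PCA.
        self.pcb = self.pca


pc = DualProgramCounter(0o1000)
pc.stage_increment()
old_value_still_visible = pc.pcb  # the old PC remains usable at this point
pc.clock_pcb()                    # now PCB holds the updated PC
```

The point of the model is only the ordering: the new value exists in PCA while the old value in PCB is still in use, and the copy into PCB happens as a separate, later step.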
The SHF MUX can right shift or byte swap
data from the ALU before it is clocked into the
Scratchpads. It also provides a route from PCB
to the S Reg and/or D Reg when the PC is used
as a general register. This arrangement precludes the shifting or byte swapping of data
being loaded into the PC that is possible with
data destined for one of the other general registers residing in the Scratchpads. As a consequence, arithmetic shift left and byte swap
operations on the PC do not cause the PC to be
modified, although the condition codes are updated as though it were.
Processor access to the Unibus, Fastbus, and
internal registers is via the Bus Register MUX
(BR MUX), the bus register (BR and BRA),
and the Data Out MUX (D MUX). The BR
and BRA (the duplication is due to electrical
loading considerations) are logically a single
register as shown in Figure 12. They receive all
incoming data and transmit almost all outgoing
data in addition to accumulating the more significant half of results during multiply and divide. The BR MUX selects the input to the BR
(and BRA) from among the two external buses
and internal input bus for input to the processor
and from the SHF MUX for output from the
processor via the BR and D MUX to the external buses and internal output bus. The internal
buses connect a number of special registers and
an optional Floating-Point Processor to the
data paths. Of these, only the PS is indicated in
Figure 12. The Instruction Register (duplicated
as IR and AF IR, again for electrical loading
reasons) is also loaded from the BR MUX but
is clocked only when an instruction is fetched.
Bus addresses are applied directly to the
Unibus or to an optional memory mapping unit
by the Bus Address multiplexer (BA MUX). No
Bus Address register is needed since memory
access and processor clocking are fully interlocked except during an overlapped fetch in
which case the PCB is held selected while operations continue in other parts of the data paths.
The PDP-11/45 control unit is horizontally
microprogrammed and is for the most part
quite similar to the archetype described for mid-range PDP-11 implementations. The control
store is 256 words × 64 bits. The relatively wide
microword is necessary for generating the large
number of control signals used in conditioning
and clocking the complicated data paths. An
additional source of complexity is the timing
logic needed to produce and use the five processor clock phases.
There are two classes of microsequence-altering functions corresponding to the BUTs of
other PDP-11s. The first class consists of simple
branches having four or fewer possible branch
addresses. These operate in the same fashion as
BUTs. The second class of branches consists of
three complex instruction decoding functions
called forks. The first, fork A, does the initial
instruction decode and corresponds to the BUT
IRDECODE of other implementations. Fork B
dispatches to an execute phase microroutine
following a destination operand fetch. Fork C
dispatches to a destination phase microroutine
following a source operand fetch. A fork enable
field in the microword is used to enable one
fork at most during a cycle. When a fork and
branch are combined in the same cycle, the fork
is disabled if the branch is taken. This permits
the implementation of certain functions without
the use of additional cycles.
The 11/45 microcode is structured to take
full advantage of the data paths and processor/Unibus overlap. Besides intensively exploiting special cases in the addressing modes and
instruction set, the microprogram implements
operand and instruction fetch overlap in much
the same way as the 11/40. The one difference
between the two prefetch mechanisms is that
the 11/45 updates the PC value in PCB and
stores it in PCA at the time the prefetch is
started. References to the PC work correctly because PCB holds the old PC value until it is updated at the appropriate time.
All the design decisions described above are
directed toward implementing the fastest system possible. Tradeoffs involving circuit technology and control unit and data path
organization have all been made with this end
in mind.
MEASURING THE EFFECT OF DESIGN
TRADEOFFS ON PERFORMANCE

There are two alternative approaches to the
problem of determining just how the particular
binding of different design decisions affects the
performance of each machine:
1. Top-down approach. Attempt to isolate
   the effect of a particular design tradeoff
   over the entire space of implementations
   by fitting the individual performance figures for the whole family of machines to
   a mathematical model which treats the
   design parameters as independent variables and performance as the dependent
   variable.

2. Bottom-up approach. Make a detailed
   sensitivity analysis of a particular
   tradeoff within a particular machine by
   comparing the performance of the machine both with and without the design
   feature while leaving all other design features the same.

Each approach has its assets and liabilities
for assessing design tradeoffs. The first method
requires no information about the implementation of a machine, but does require a sufficiently large collection of different
implementations, a sufficiently small number of
independent variables, and an adequate mathematical model in order to explain the variance
in the dependent variable to some reasonable
level of statistical confidence. The second
method, on the other hand, requires a great deal
of knowledge about the implementation of the
given system and a correspondingly great
amount of analysis to isolate the effect of the
single design decision on the performance of the
complete system. The information that is
yielded is quite exact, but applies only to the
single point chosen in the design space and may
not be generalized to other points in the space
unless the assumptions concerning the machine's implementation are similarly generalizable. In the following subsections the first
method is used to determine the dominant
tradeoffs, and the second method is used to estimate the impact of individual implementation
tradeoffs.
Quantifying Performance

Measuring the change in performance of a
particular PDP-11 processor model due to design changes presupposes the existence of some
performance metric. Average instruction execution time was chosen because of its obvious
relationship to instruction stream throughput.
Neglected are such overhead factors as Direct
Memory Access, interrupt servicing, and, on
the LSI-11, dynamic memory refresh. Average
instruction execution times may be obtained by
benchmarking or by calculation from instruction frequency and timing data. The latter
method was chosen due to its freedom from the
extraneous factors noted above and from the
normal clock rate variations found from machine to machine of a given model. This method
also allows the designer to calculate the change
in average instruction execution time that
would result from some change in the implementation. Such frequency-driven design has
already been applied in practice to the PDP-11/60 (Chapter 13).
The instruction frequencies are tabulated in
Appendix A and include the frequencies of the
various addressing modes. These figures were
calculated from measurements made by Strecker [1976a] on 7.6 million instruction executions traced in ten different PDP-11 instruction
streams encountered in various applications.
While there is a reasonable amount of variation
of frequencies from one stream to the next, the
figures in Appendix A should be representative.
Instruction times are tabulated in Appendix
B. These times were calculated from the engineering documents for each machine. The times
vary from those published in the PDP-11 processor handbooks for two reasons. First, in the
handbooks, times have been redistributed
among phases to ease the process of calculating
instruction times. In the appendix an attempt
has been made to characterize each phase accurately.
Second, there are inaccuracies in the handbooks
arising from conservative timing estimates and
engineering revisions. The figures included here
may be considered more accurate.
A performance figure is derived for each machine by weighting its instruction times by frequency. The results, given in Table 4, form the
basis of the analyses to follow.
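
The frequency weighting described above amounts to an expected-value calculation. The sketch below shows the idea for the source phase only. The source-mode frequencies are those listed in Appendix A; the per-mode times are hypothetical placeholders chosen for the example, not values from Appendix B, and the function name is likewise an assumption.

```python
def average_time(frequencies, times):
    """Expected time: sum of (frequency * time) over all listed components."""
    missing = set(frequencies) - set(times)
    if missing:
        raise KeyError(f"no time given for components: {sorted(missing)}")
    return sum(frequencies[name] * times[name] for name in frequencies)


# Source-mode frequencies per instruction, from Appendix A.
source_mode_freq = {0: 0.1377, 1: 0.0338, 2: 0.1587, 3: 0.0122,
                    4: 0.0352, 5: 0.0000, 6: 0.0271, 7: 0.0022}

# HYPOTHETICAL source-phase times in microseconds, for illustration only.
hypothetical_times_us = {0: 0.0, 1: 1.0, 2: 1.0, 3: 2.0,
                         4: 1.0, 5: 2.0, 6: 2.0, 7: 3.0}

source_phase_us = average_time(source_mode_freq, hypothetical_times_us)
print(f"average source-phase contribution: {source_phase_us:.4f} microseconds")
```

Summing the same kind of product over the fetch, source, destination, and execute phases gives the average instruction execution time for a machine.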

Analysis of Variance of PDP-11
Performance: Top-Down Approach

The first method of analysis described is employed in an attempt to explain most of the variance in PDP-11 performance in terms of two
parameters:

1. Microcycle time. The microcycle time is
   used as a measure of processor performance which excludes the effect of the
   memory subsystem.

2. Memory read pause time. The memory
   read pause time is defined as the period
   of time during which the processor clock
   is suspended during a memory read. For
   machines with processor/Unibus overlap, the clock is assumed to be turned off
   by the same microinstruction that initiates the memory access. Memory read
   pause time is used as a measure of the
   memory subsystem's impact on processor performance. Note that this time is
   less than the memory access time since
   all PDP-11 processor clocks will continue to run at least partially concurrently with a memory access.

The choice of these two factors is motivated
by their dominant contribution to, and (approximately) linear relationship with, performance. Keeping the number of independent
variables low is also important due to the small
number of data points being fit to the model.
The model itself is of the form:

$$t_i = k_1 c_{1i} + k_2 c_{2i}$$

where $t_i$ is the average instruction execution
time of machine i from Table 5. The microcycle
time of machine i is $c_{1i}$ (for machines with selectable microcycle times, the predominant time is
used). The memory read pause time of machine i is $c_{2i}$.


This model is only an approximation since it
assumes $k_1$ and $k_2$ will be constant over all machines. In general this will not be the case. $k_1$ is
the number of microcycles expected in a canonical instruction. This number will be a function
mainly of data path connectivity, and strictly
speaking, another factor should be included to
take that variability into account; however,
since the data path organizations of all PDP-11
implementations considered here (excepting the
11/03, 11/45, and 11/60) are comparable, the
simplifying assumption of calling them all identical at the price of explaining somewhat less of
the variance is made. The number of memory
accesses expected in a canonical instruction is
$k_2$; it also exhibits some variability from machine to machine. A small part of this is due to
the fact that some PDP-11s actually take more
memory cycles to perform a given instruction
than do others (this is really only a factor in
certain 11/10 instructions, notably JMP and
JSR, and the 11/20 MOV instruction). A more
important source of variability is the
Unibus/processor overlap logic incorporated
into some PDP-11 implementations, which effectively reduces the actual contribution of the
$k_2 c_{2i}$ term by overlapping more memory access time with processor operation than is excluded
from the memory read pause time.
Given the model and the dependent and independent data for each machine (Table 5), a
linear regression is applied to determine the
coefficients $k_1$ and $k_2$ and to find out how much
of the variance is explained by the model.
Applying the regression over all eight processors: $k_1 = 11.580$, $k_2 = 1.162$, $R^2 = 0.904$. $R^2$ is
the amount of variance accounted for by the
model, or 90.4 percent. If the regression is applied to just the six mid-range processors, $k_1 = 10.896$,
$k_2 = 1.194$, $R^2 = 0.962$. $R^2$ increases to
96.2 percent partly because the LSI-11 and
11/45 can be expected to have different k
coefficients than the mid-range machines and
do not fit the model as well. Note that if two
mid-range machines, the 11/04 and the 11/40,
are eliminated instead of the LSI-11 and 11/45,
$R^2$ decreases to 89.3 percent rather than increasing. The k coefficients are close to what
should be expected for average microcycle and
memory cycle counts. Since $k_1$ is much larger
than $k_2$, average instruction time is more sensitive to microcycle time than to memory read
pause time by a factor of $k_1/k_2$, or approximately 10. The implication for the designer is
that much more performance can be gained or
lost by perturbing the microcycle time than
memory read pause time.
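
As a worked check of the model, the all-processor coefficients can be applied to the PDP-11/04 entries of Table 5 (microcycle time 0.260 microsecond, memory read pause time 0.940 microsecond). The arithmetic below is illustrative only; the choice of the 11/04 as the example machine is arbitrary.

$$t_{11/04} \approx k_1 c_1 + k_2 c_2 = 11.580(0.260) + 1.162(0.940) \approx 3.011 + 1.092 = 4.103\ \mu\text{s}$$

The tabulated average instruction execution time for the 11/04 is 4.043 microseconds, so for this machine the model overestimates by roughly 1.5 percent.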
Although this method lacks statistical rigor,
it is reasonably safe to say that memory and microcycle speed do have by far the largest impact
on performance and that the dependency is
quantifiable to some degree.
Table 5. Top-Down Model Parameters in Microseconds

                               Independent Variables         Dependent Variable
                               --------------------------    -------------------
                               Microcycle    Memory Read     Average Instruction
Machine                        Time          Pause Time      Execution Time

LSI-11                         0.400         0.400           5.883
PDP-11/04                      0.260         0.940           4.043
PDP-11/10                      0.300         0.600           4.096
PDP-11/20                      0.280         0.370           3.529
PDP-11/34                      0.180         0.940           3.029
PDP-11/40                      0.140         0.500           2.087
PDP-11/45 (bipolar memory)     0.150         0.000           0.863
PDP-11/60 (87 percent
  cache hit ratio)             0.170         0.140           1.578

Measuring Second Order Effects: Bottom-Up Approach

It is much harder to measure the effect of
other design tradeoffs on performance. The approximate methods employed in the previous
section cannot be used because the effects being
measured tend to be swamped out by first order
effects and often either cancel or reinforce one
another, making linear models useless. For these
reasons, such tradeoffs must be evaluated on a
design-by-design basis as explained above. This
subsection evaluates several design tradeoffs in
this way.

Effect of Adding Processor/Unibus Overlap to the 11/04. Processor/Unibus overlap is
not a feature of the 11/04 control unit. Adding
this feature involves altering the control
unit/Unibus synchronization logic so that the
processor clock continues to run until a microcycle requiring the Unibus data from a DATI
or DATIP is detected. A Bus Address register
must also be added to drive the Unibus lines
after the microcycle initiating the DATIP is
completed. This alteration allows time to be
saved in two ways. First, processor cycles may
be overlapped with memory read cycles as explained in the subsection on control units. Second, since Unibus data is not read into the data
paths during the cycle in which the DATIP occurs, the path from the ALU through the A
MUX and back to the registers is freed. This
permits certain operations to be performed in
the same cycle as the DATIP. For example, the
microword BA ← PC; DATI; PC ← PC + 2
could be used to start fetching the word pointed
to by the PC while simultaneously incrementing
the PC to address the next word. The cycle following could then load the Unibus data directly
into a Scratchpad register rather than loading
the data into the B Register and then into the
Scratchpad on the following cycle as is necessary without overlap logic. A savings of two microcycle times would result.
DATI and DATIP operations are scattered
liberally throughout the 11/04 microcode; however, only those cycles in which an overlap
would produce a time savings need be considered. An average of 0.730 cycles can be saved or
overlapped during each instruction. If all of the
overlapped time is actually saved, 0.190 microsecond or 4.70 percent will be pared from the
average instruction execution time. This
amounts to a 4.93 percent increase in performance.
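
The figures in this estimate follow from three numbers given in the text and in Table 5. The short calculation below retraces them; the variable names are chosen for the example, and the small differences in the last digit come from rounding the time saved to 0.190 microsecond before dividing.

```python
cycles_saved_per_instruction = 0.730   # from the text
microcycle_us = 0.260                  # 11/04 microcycle time, Table 5
avg_instruction_us = 4.043             # 11/04 average instruction time, Table 5

# Time saved per instruction if every overlapped cycle is actually saved.
time_saved_us = cycles_saved_per_instruction * microcycle_us

# Fraction of the average instruction time that disappears.
fraction_saved = time_saved_us / avg_instruction_us

# Performance is instructions per unit time, so the gain is old_time / new_time - 1.
performance_gain = avg_instruction_us / (avg_instruction_us - time_saved_us) - 1.0

print(f"time saved:        {time_saved_us:.3f} microsecond")
print(f"share of avg time: {100 * fraction_saved:.2f} percent")
print(f"performance gain:  {100 * performance_gain:.2f} percent")
```

The distinction between the last two figures matters throughout this section: a reduction of x in execution time corresponds to a performance increase of x / (1 - x), which is always slightly larger.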
Effect of Adding a Byte Swapper to the
11/10. It is evident that the lack of a byte
swapper on the PDP-11/10 has a negative effect
on performance. In this subsection, the performance gained by the addition of a byte swapper either before the B Register or as part of the
B leg multiplexer is calculated. Adding a byte
swapper would change five different parts of the
instruction interpretation process: the source
and destination phases where an odd-byte operand is read from memory, the execute phase
where a swap byte instruction is executed in
destination mode 0 and in destination modes 1
through 7, and the execute phase where an odd-byte address is modified. In each of these cases
seven fast shift cycles would be eliminated and
the remaining normal speed shift cycle could be
replaced by a byte swap cycle, resulting in a savings of seven fast shift cycles or 1.050 microseconds. None of this time is overlapped with
Unibus operations; hence, all would be saved.
This savings is effected, however, only when a
byte swap or odd-byte access is actually performed. The frequency with which this occurs is
just the sum of the frequencies of the individual
cases noted above or 0.0640. Multiplying this by the
time saved per occurrence gives a savings of
0.0672 microsecond or 1.64 percent of the average instruction execution time. The insignificance of this savings could well be used to
support the decision for leaving the byte swapper out of the PDP-11/10.

Effect of Caching on the 11/60. The PDP-11/60 uses a cache to decrease its effective
memory read pause time. The degree to which
this time is reduced depends upon three factors:
the cache read hit pause time, the cache read
miss pause time, and the ratio of cache read hits
to total memory read accesses. A write-through
cache is assumed; therefore, the timing of memory write accesses is not affected by caching and
only read accesses need be considered. The performance of the 11/60 as measured by average
instruction execution time is modeled exactly as
a function of the above three parameters by the
equation:

$$t = k_1 + k_2 \left[ a k_3 + (1 - a) k_4 \right]$$

where $t$ is the average instruction execution
time; $a$ is the cache hit ratio; $k_1$ is the average
execution time of a PDP-11/60 instruction excluding memory read pause time but including
memory write pause time (1.339 microseconds);
$k_2$ is the number of memory reads per average
instruction (1.713); $k_3$ is the memory read pause
time for a cache hit (0.000 microseconds); and
$k_4$ is the memory read pause time for a cache
miss (1.075 microseconds).
The above equation can be rearranged to
yield:

$$t = (k_1 + k_2 k_4) - k_2 (k_4 - k_3)\, a$$

The first term and the coefficient of the second term in the equation above evaluate to
3.181 microseconds and 1.842 microseconds, respectively, with the given k parameter values.
This reduces the average instruction time to a
function of the cache hit ratio, making it possible to compare the effect of various caching
schemes on 11/60 performance in terms of this
one parameter.
The effect of various cache organizations on
the hit ratio is described for the PDP-11 Family
in general (Chapter 10) and for the PDP-11/60
in particular in Mudge (Chapter 13). If no cache
is provided, the hit ratio is effectively zero and
the average instruction execution time reduces
to the first term in the model or 3.181 microseconds. A set associative cache with a set size
of 1 word and a cache size of 1,024 words has
been found through simulation to give a 0.87 hit
ratio. An average instruction time of 1.578 microseconds results in a 101.52 percent improvement in performance over that without the
cache.
The cache organization described above is
that actually employed in the 11/60. It has the
virtue of being relatively simple to implement
and therefore reasonably inexpensive. Set size
or cache size can be increased to attain a higher
hit ratio at a correspondingly higher cost. One
alternative cache organization is a set size of 2
words and a cache size of 2,048 words. This organization boosts the hit ratio to 0.93 resulting
in an instruction time of 1.468 microseconds, an
increase in performance of 7.53 percent. This
increased performance must be paid for, however, since twice as many memory chips are
needed. Because the performance increment derived from the second cache organization is
much smaller than that of the first while the
cost increment is approximately the same, the
first organization is more cost-effective.
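
The cache comparison above can be retraced by evaluating the 11/60 model at the three hit ratios mentioned (no cache, 0.87, and 0.93). The sketch below uses the parameter values quoted in the text; the function names and constant names are assumptions made for the example.

```python
K1_US = 1.339       # execution time excluding memory read pause (microseconds)
K2_READS = 1.713    # memory reads per average instruction
K3_HIT_US = 0.000   # read pause time on a cache hit (microseconds)
K4_MISS_US = 1.075  # read pause time on a cache miss (microseconds)


def avg_instruction_time_us(hit_ratio):
    """Average 11/60 instruction time as a function of cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must lie between 0 and 1")
    read_pause = hit_ratio * K3_HIT_US + (1.0 - hit_ratio) * K4_MISS_US
    return K1_US + K2_READS * read_pause


def gain_percent(slower_us, faster_us):
    """Performance increase in percent when time drops from slower to faster."""
    return (slower_us / faster_us - 1.0) * 100.0


t_none = avg_instruction_time_us(0.00)
t_small = avg_instruction_time_us(0.87)   # 1,024-word cache, set size 1
t_large = avg_instruction_time_us(0.93)   # 2,048-word cache, set size 2

print(f"no cache:   {t_none:.3f} microseconds")
print(f"hit 0.87:   {t_small:.3f} microseconds, "
      f"gain {gain_percent(t_none, t_small):.1f} percent")
print(f"hit 0.93:   {t_large:.3f} microseconds, "
      f"further gain {gain_percent(t_small, t_large):.2f} percent")
```

Seen this way, the first thousand words of cache roughly double performance, while the second thousand add only a few percent, which is the basis of the cost-effectiveness argument.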
Design Tradeoffs Affecting the Fetch
Phase. The fetch phase holds much potential
for performance improvement since it consists
of a single short sequence of micro-operations
that, as Table 4 clearly shows, involves a sizable
fraction of the average instruction time due to
the inevitable memory access and possible service operations. In this subsection, two approaches to cutting this time are evaluated for
four different processors.

The Unibus interface logic of the PDP-11/04
and 11/34 is very similar. Both insert a delay
into the initial microcycle of the fetch phase to
allow time for Bus Grant arbitration circuitry
to settle so that a microbranch can be taken if a
serviceable condition exists. If the arbitration
logic were redesigned to eliminate this delay,
the average instruction execution time would
drop by 0.220 microsecond for the 11/04 and
0.150 microsecond for the 11/34.* The resulting
increases in performance would be 5.75 percent
and 5.21 percent, respectively.
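
These percentages follow from the average instruction execution times in Table 5 (4.043 microseconds for the 11/04 and 3.029 microseconds for the 11/34), taking the performance increase as the ratio of old to new execution time. As a worked check:

$$\frac{4.043}{4.043 - 0.220} - 1 = \frac{4.043}{3.823} - 1 \approx 0.0575 \qquad \frac{3.029}{3.029 - 0.150} - 1 = \frac{3.029}{2.879} - 1 \approx 0.0521$$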
Another example of a design feature affecting
the fetch phase is the operand/instruction fetch
overlap mechanism of the 11/40, 11/45, and
11/60. From the normal fetch times in Appendix B and the actual average fetch times given in
Table 4, the savings in fetch phase time alone
can be calculated to be 0.162 microsecond for
the 11/40, 0.087 microsecond for the 11/45, and
0.118 microsecond for the 11/60 or an increase
of 7.77 percent, 10.07 percent, and 8.11 percent
over what their respective performances would
be if fetch phase time were not overlapped.
These examples demonstrate the practicality
of optimizing sequences of control states that
have a high frequency of occurrence rather than
just those which have long durations. The 11/10
byte swap logic is quite slow, but is utilized infrequently causing its impact upon performance
to be small while the bus arbitration logic of the
11/34 exacts only a small time penalty, but does
so each time an instruction is executed and results in a larger performance impact. The usefulness of frequency data should thus be
apparent since the bottlenecks in a design are
often not where intuition says they should be.
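
The contrast drawn here between a slow but rare operation and a small but universal delay can be put in numbers using figures already quoted in this section. In the sketch below the function name is an assumption; the frequency of 1.0 for the arbitration delay reflects the statement that it is incurred on every instruction.

```python
def impact(frequency, penalty_us, avg_instruction_us):
    """Return (expected penalty per instruction, share of average time in percent)."""
    expected_us = frequency * penalty_us
    return expected_us, 100.0 * expected_us / avg_instruction_us


# 11/10 byte swap: 1.050 microseconds per occurrence, frequency 0.0640,
# average instruction time 4.096 microseconds.
byte_swap = impact(0.0640, 1.050, 4.096)

# 11/34 bus arbitration delay: 0.150 microsecond on every instruction,
# average instruction time 3.029 microseconds.
arbitration = impact(1.0, 0.150, 3.029)

print(f"11/10 byte swap:   {byte_swap[0]:.4f} us, {byte_swap[1]:.2f} percent")
print(f"11/34 arbitration: {arbitration[0]:.4f} us, {arbitration[1]:.2f} percent")
```

The per-occurrence penalty of the byte swap is seven times that of the arbitration delay, yet its expected cost per instruction is less than half as large.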
SUMMARY AND USE OF THE
METHODOLOGIES

The PDP-11 offers an interesting opportunity
to examine an architecture with numerous implementations spanning a wide range of price
and performance. The implementations appear
to fall into three distinct categories: the mid-range machines (PDP-11/04, 11/10, 11/20,
11/34, 11/40, 11/60); an inexpensive, relatively
low performance machine (LSI-11); and a comparatively expensive, but high performance machine (PDP-11/45). The mid-range machines
are all minor variations on a common theme
with each implementation introducing much
less variability than might be expected. Their
differences reside in the presence or absence of
certain embellishments rather than in any major
structural differences. This common design
scheme is still quite recognizable in the LSI-11
and even in the PDP-11/45. The deviations of
the LSI-11 arise from limitations imposed by
semiconductor technology rather than directly
from cost or performance considerations, although the technology decision derives from
cost. In the PDP-11/45, on the other hand, the
quantum jump in complexity is motivated
purely by the desire to squeeze the maximum
performance out of the architecture.
From the overall performance model presented in the section on top-down performance
analysis, it is evident that instruction stream
processing can be sped up either by improving
the performance of the memory subsystem or
the performance of the processor. Memory subsystem performance depends upon the number of
memory accesses in a canonical instruction and
the effective memory read pause time. There is
not much that can be done about the first number since it is a function of the architecture and
thus largely fixed. The second number may be
improved, however, by the use of faster memory components or techniques such as caching.
Performance of the PDP-11 processor itself
can be enhanced in two ways: by cutting the
number of processor cycles to perform a given
function or by cutting the time used per microcycle. Several approaches to decreasing the effective microcycle count have been
demonstrated:
1. Structure the data paths for maximum
   parallelism. The PDP-11/45 can perform
   much more in a given microcycle than
   any of the mid-range PDP-11s and, thus,
   needs fewer microcycles to complete an
   instruction. To obtain this increased
   functionality, however, a much more
   elaborate set of data paths is required in
   addition to a highly developed control
   unit to exercise them to maximum potential. Such a change is not an incremental one and involves rethinking
   the entire implementation.

2. Structure the microcode to take best advantage of instruction features. All processors except the 11/10 handle
   JMP/JSR addressing modes as a special
   case in the microcode. Most do the same
   for the destination modes of the MOV
   instruction because of its high frequency.
   Varying degrees of sophistication in instruction dispatching from the BUT IRDECODE at the end of every fetch are
   evident in different machines, resulting in
   various performance improvements.

3. Cut effective microcycle count by overlapping processor and Unibus operation.
   The PDP-11/10 demonstrates that a
   large microcycle count can be effectively
   reduced by placing cycles in parallel with
   memory access operations whenever
   possible.

*The fetch-phase delay figures quoted above for the 11/04 and 11/34 are typical. Since the delay is set by an RC circuit and Schmitt trigger, the delay may vary considerably from
machine to machine of a given model.

Increasing microcycle speed is perhaps more
generally useful since it can often be applied
without making substantial changes to an entire
implementation. Several of the mid-range PDP-11s achieve most of their performance improvement by increasing microcycle speed in the following ways:

1. Make the data paths faster. The PDP-11/34 demonstrates the improvement in
   microcycle time that can result from the
   judicious use of Schottky TTL in such
   heavily travelled points as the ALU. Replacing the ALU and carry-lookahead
   logic alone with Schottky equivalents
   saves approximately 35 nanoseconds in
   propagation delay. With cycle times running 300 nanoseconds and less, this
   amounts to better than a 10 percent increase in speed.

2. Make each microcycle take only as long
   as necessary. The 11/34 and 11/40 both
   use selectable microcycle times to speed
   up cycles which do not entail long data
   path propagation delays.
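
The "better than 10 percent" figure in the first item can be checked at the upper end of the quoted range. Assuming, purely for illustration, a 300-nanosecond cycle from which the full 35 nanoseconds is removed:

$$\frac{35}{300} \approx 0.117 \qquad \frac{300}{300 - 35} = \frac{300}{265} \approx 1.132$$

That is, the cycle shortens by about 11.7 percent and the cycle rate rises by about 13.2 percent. For cycles shorter than 300 nanoseconds the same 35-nanosecond saving is proportionally larger.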

Circuit technology is perhaps the single most
important factor in performance. It is only stating the obvious to say that doubling circuit
speed doubles total performance. Aside from
raw speed, circuit technology dictates what it is
economically feasible to build, as witnessed by
the SSI PDP-11/20, the MSI PDP-11/40, and
the LSI-11. Just the limitation of a particular
circuit technology at a given point in time may
dictate much about the design tradeoffs that
can be made, as in the case of the LSI-11.
Turning to the methodologies, the two presented in the previous section can be used at
various times during the design cycle. The top-down approach can be used to estimate the performance of a proposed implementation or to
plan a family of implementations, given only
the characteristics of the selected technology
and a general estimate of data path and memory cycle utilization. The bottom-up approach can be used to perturb an existing or
planned design to determine the performance
payoff of a particular design tradeoff. The relative frequencies of each function (e.g., addressing modes, instructions, etc.), while required for
an accurate prediction, may not be available.
There are, however, alternative ways to estimate relative frequencies. Consider the three
following situations:
1. At least one implementation exists. An
   analysis of the implementation in typical
   usage (i.e., benchmark programs for a
   stored program computer) can provide
   the relative frequencies.

2. No implementation exists, but similar systems exist. The frequency data may be
   extrapolated from measurements made
   on a machine with a similar architecture.
   For example, the Gibson Mix [Bell and
   Newell, 1971] provided the relative frequencies of IBM 7090 functions from
   which the relative frequencies of IBM
   360 functions were estimated.

3. No implementation exists, and there are
   no prior similar systems. From knowledge of the specifications, a set of most-used functions can be estimated (e.g., instruction fetch, register and relative addressing, move and add instructions for
   a stored program computer). The design
   is then optimized for these functions.
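
For the first situation, where an implementation exists and can be traced, relative frequencies reduce to counting. The sketch below shows the idea on a made-up trace; the trace contents and the function name are hypothetical, and a real study would use traces of millions of instructions, as in the measurements behind Appendix A.

```python
from collections import Counter


def relative_frequencies(trace):
    """Map each distinct item in an execution trace to its relative frequency."""
    counts = Counter(trace)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty trace")
    return {item: n / total for item, n in counts.items()}


# HYPOTHETICAL eight-instruction trace, for illustration only.
hypothetical_trace = ["MOV", "ADD", "MOV", "BNE", "CMP", "MOV", "BNE", "TST"]

freqs = relative_frequencies(hypothetical_trace)
for opcode, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{opcode:4s} {f:.4f}")
```

The same counting applies to addressing modes or any other function whose frequency is needed, provided the trace records it.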

Of course, the relative frequency data should
always be updated to take into account new
data.
Our purpose in writing this paper has been
twofold: to provide data about design tradeoffs
and to suggest design methodologies based on
this data. It is hoped that the design data will
stimulate the study of other methodologies
while the results of the design methodologies
presented here have demonstrated their usefulness to designers.

APPENDIX A: INSTRUCTION TIME COMPONENT FREQUENCIES

                                    Frequency

Fetch                               1.0000

Source Mode                         0.4069
  0  R                              0.1377
  1  @R or (R)                      0.0338
  2  (R)+                           0.1587
  3  @(R)+                          0.0122
  4  -(R)                           0.0352
  5  @-(R)                          0.0000
  6  X(R)                           0.0271
  7  @X(R)                          0.0022
No Source                           0.5931
NOTE: Frequency of odd-byte addressing (SM 1-7) = 0.0252.

Destination                         0.6872

Data Manipulation Mode              0.6355
  0  R                              0.3146
  1  @R or (R)                      0.0599
  2  (R)+                           0.0854
  3  @(R)+                          0.0307
  4  -(R)                           0.0823
  5  @-(R)                          0.0000
  6  X(R)                           0.0547
  7  @X(R)                          0.0080
NOTE: Frequency of odd-byte addressing (DM 1-7) = 0.0213.

                                    Frequency

Jump (JMP/JSR) Mode                 0.0517
  0  R                              0.0000 (ILLEGAL)
  1  @R or (R)                      0.0000
  2  (R)+                           0.0000
  3  @(R)+                          0.0079
  4  -(R)                           0.0000
  5  @-(R)                          0.0000
  6  X(R)                           0.0438
  7  @X(R)                          0.0000

Execute Instruction                 1.0000

Double Operand                      0.4069
  ADD                               0.0524
  SUB                               0.0274
  BIC                               0.0309
  BICB                              0.
  BIS                               0.0012
  BISB                              0.0013
  CMP                               0.0626
  CMPB                              0.0212
  BIT                               0.0041
  BITB                              0.0014
  MOV                               0.1517
  MOVB                              0.0524
  XOR                               0.

                                    Frequency

Single Operand                      0.2286
  CLR                               0.0186
  CLRB                              0.0018
  COM                               0.
  COMB                              0.
  INC                               0.0224
  INCB                              0.
  DEC                               0.0809
  DECB                              0.
  NEG                               0.0038
  NEGB                              0.
  ADC                               0.0070
  ADCB                              0.
  SBC                               0.
  SBCB                              0.
  ROR                               0.0036
  RORB                              0.
  ROL                               0.0059
  ROLB                              0.
  ASR                               0.0069
  ASRB                              0.
  ASL                               0.0298
  ASLB                              0.
  TST                               0.0329
  TSTB                              0.0079
  SWAB                              0.0038
  SXT                               0.

No Destination                      0.3128

Branch                              0.2853
  All Branches (true)               0.1744
  All Branches (false)              0.1109
  SOB (true)                        0.
  SOB (false)                       0.

Jump                                0.0517
  JMP                               0.0272
  JSR                               0.0245

Control, Trap, and Miscellaneous    0.0270
  Set/Clear Condition Codes         0.0017
  MARK                              0.
  RTS                               0.0236
  RTI                               0.
  RTT                               0.
  IOT                               0.
  EMT                               0.0017
  TRAP                              0.
  BPT                               0.

NOTES:
Frequency of destination odd-byte addressing (DM 1-7) = 0.0213.
Execution frequencies indicated as 0. have an aggregate frequency <0.0050.

Appendix B: Instruction Execution Times for PDP-11 Models

Microcycle time (µs):
  LSI-11        0.40
  PDP-11/04     0.26
  PDP-11/10     0.30
  PDP-11/20     0.28
  PDP-11/34     0.18/0.34
  PDP-11/40     0.14/0.20/0.30
  PDP-11/45     0.15
  PDP-11/60     0.17

Fetch

1/5

2AO

1/3

1.94

1/5

1/4

1/3

1.63

1/4

1.12

1/3

1/3 051

Source*
OR
1 (n R or (R)

0/1

OAO

012

0.52

0/2 0.60
1/3 1.50
1/5 1.50

0/1
1/1

0.18 1
1.12

0/0 0.0

1.30

112 0.30
112 0.30

112 0.34
112 0.34

2/3
1/2

2.42
1.30

2/5 0.75

2/3

2A2

2/4 2.60
3/5 3.72

0.0
0.78
0.84
1.72
0.84
1.72
1.34

0/0 0.0

112

0/0
1/3
1/3
2/5
1/3
2/5
2/5

2/6 0.90
2/4 0.60

2/5 0.85
1/3 0.51
2/6 1.02
2/4 0.68

317

2.12

317

317

0/1
1/1

/0
0.0
1/3 0.78

Microcycle

2 (R)

3

+

(n (R)

4

+

1/3 1.60 1
1/4 2.003
217 3.60 1
1/52A02

(R)

2/8

4.00 1

6 X (R)

2/9

4AO 1

7

3/126.00 1

5 (n

(R)

(n X (R)

Destination
OR
1 (n R or (R)

+

0/1

OAO

1/4 2.00

6 X (R)

1/5 2AO 1
2/8 4.00
1/6 2.80 1
2/9 4.40
2/104.80

7

3/136AO

2 (R)
3

(II

4

~

(R)

+

(R)

5 (n

(R)

(n X (R)

1/2 1.46
1/3 1.72
2/5 3.18
1/31.72
2/5 3.18
2/6 3A4
3/8 4.9

217

1.50

0/0 0.0
1/4
1/4
217

2.70

1/4 1.50
2/6 2.70

1/4 1.49

2.70

217
217

3/9

3.90

3/103.91

012

112

1/4 1.50
2/6 2.70

2.70
2.70

0.60 1
1/3 1.50
1/5 1.50

0/1 0.28
1/4 1.39
1/4 1.39

217

217

2/4 2.92
2/5 3.18

217

317

3/9

4.64

lA9
lA9
2.70

217

0/1 0.26
1/1 1.20
1/2 1.46
2/4 2.92
lA6

lA9

2.70

1121.301
2/3 2A2
112 1.30

2.60

1/4

1.39

2.70

217
217

2.60
2.60

3.90

3/103.81

0.18 1,2
1.12

2/3 2.42
2/4 2.60
3/5 3.72

1/3

OA5

1.05

0/0 0.0
112 0.3
112 0.3

1/30.84

2/5 1.70
1/3 0.84
2/5 1.70
2/5 1.78
317

OA5

2/5 0.15
1/3

OA5

2/6 0.9
2/5 0.75
3/8 1.2

256


1.19

0/0 0.0
1/2 0.34
112

0.34

2/5 0.85
1/3 0.51
2/6 1.02
2/5 0.85
3/8 1.36

Jump (JMP/JSR)

1 «(I R or (R)
2 (R) +
3(n (R)

4

+

(R)

5 (1/ - (R)

6X (R)
7 (1/ X (R)

MOV
MOVB
ADD
SUB
BIC
BICB
BIS
BISB
BIT
BITB
CMP

0/3
0/5
1/5
0/5
1/6

1.20
2.00
2.40
2.00
2.80

117

3.20

2/104.80
1/3 1.602
1/2 1.20 1
1/3 1.60 3
1/3 1.60 3
1/3 1.60 3
112 1.20 3
1/3 1.60 3
1/2 1.20 3
0/2 0.80
0/1

OAO

0/2 0.80

*Format r/m t.tt n (r

=

0/2 0.52
0/3 0.78
1/31.72
0/3 0.78
1/31.72
1/4 1.98
2/6

3A4

317

1/2
1/2
1/2

1.06
1.06
1.06
1.06
1.06

2/103.54
1/3 0.80
1/3 0.80
1/3 0.80
1/3 0.80

112

1/2

1,2
1,2
1
1
1

1/1 0.90
1/3 0.90
2/52.10

0/4
0/4

1.12
1.12

117

2.33

112

0.90

0/4

1.12

2/42.10
2/5 2.10

117
117

2.33
2.33

3.30

1/4 1.80 1
1/4 1.80 1
1/4 1.80 1
1/4 1.80 1
1/4 1.80 1

1121.061

1/41.801

1/2 1.06 1
1/2 1.06 1
0/1 0.26
0/1 0.26
0/1 0.26

1/4
1/4

1.80 1
1.80 1

012

0.60

0/2 0.60
0/2 0.60

number of memory reads or writes, m

=

1/5
1/5

lAO
lAO

0/0 0.0

1

012 0.36
112 1.30

1
1
1
1
1
1
1
1

1/3 0.80
1/3 1.80
0/4 1.12
0/4 112
012 0.56 1

0/1
1/2
1/2
2/4
1/1
1/1
1/1
1/1
1/1
1/1
1/1
1/1
0/1
0/1
0/1

0.18
1.30
1.30
2.60
0.78
0.78
0.78
0.78
0.78
0.78
0.78
0.78
0.18
0.18
0.18

number of microcycles; t.tt

1
1
1
1
1
1
1
1
1

=

0/2 0.34
0/3 0.64

012
012

112
0/2

0.94
OA4

1/2
1/4
2/4
1/3
1/3
1/3
1/4
1/3
1/3
1/3
1/3

0.94
0.84
1.34
0.64
0.64
0.54
0.68
0.54
0.54
054
0.54

1/4 0.6
0/2 0.3
1/5 0.75
1/3

4
4
1,2
1
1,2
1,2
1,2
1,2

0/3 OA8 3
0/3 OA8 3
0/3 OA8 3
time in f.ls; n

=

0.3
0.3

OA5

2/6 0.90
1/0 0.0 1,3
1/2 0.3
112 0.3
112 0.3
112 0.3
112 0.3
112 0.3
112 0.3
0/1 0.151,2
0/1 0.151,2
0/1 0.15 1,2

footnotes number).

0/1

0.17

012
112

0.34
0.34

0/1 0.17
1/3 0.51
112

0.34

2/5

0.85
1.17
1.17
1.17
1.34
117
1.17
1.17
1.17
0.17
0.17
0.17

112

1/2
112

1/3
112

1/2
1/2
112

0/1
0/1
0/1

1,6
4
1.6
1,7
1,6,C
1,6,C
1,6,C
1,6,C

1,B

Microcycle
(µs)

LSI-11
0.40

PDP-11/10
0.30

PDP-11/20
0.28

PDP-11/34
.18/.34

PDP-11/40
.14/.20/.30

PDP-11/45
0.15

PDP-11/60
0.17

CMPB                    0/1  0.40
XOR                     1/3  1.60 3
CLR(B), COMB            1/3  1.60 2
COM                     1/4  2.00 2
INC, DEC                1/5  2.40 3
INCB, DECB              1/4  2.00 3
ADC                     1/5  2.40 3
ADCB                    1/4  2.00 3
SBC                     1/5  2.40 3
SBCB                    1/4  2.00 3
ROL, ASL                1/4  2.00 3
ROLB, ASLB              1/3  1.60 3
ROR                     1/8  3.60 3
RORB                    1/5  2.40 3
ASR                     1/9  4.00 3
ASRB                    1/8  3.60 4
TST                     0/4  1.60
TSTB                    0/3  1.20
NEG                     1/4  2.00 2
NEGB                    1/3  1.60 2
SWAB                    1/3  1.60 2
SXT                     1/6  2.80 3
BRANCH
BRANCH (TRUE)           0/4  1.60
BRANCH (FALSE)          0/4  1.60
SOB (TRUE)              0/8  3.20
SOB (FALSE)             0/6  2.40
JUMP
JMP                     0/2  0.80
JSR                     1/6  2.80 9
SET/CLEAR CC            0/3  1.20
MARK                    1/16 6.80
RTS                     1/6  2.80
RTI                     2/15 6.80 5,6
RTT                     2/15 6.80 5,7
IOT, EMT, TRAP, BPT     2/33 14.80
E

* Format

PDP-11/04
0.26
0/1

0.26

1/2 1.06 1
1/2 1.06 1
112 1.06 1
1/2 1.06 1
1/2 1.06 1
1/2 1.06 1
1/2 1.06 1
1/2 106 1
1/3 1.32 1
1/3 1.32 1
1/3 1.32 1
1/3 1.32 1
1/3 1.32 1
1/3 1.32 1
0/1 0.26
0/1 0.26
112 1.06 1
112 1.06 1
1/3 1.32 1

012

0.60

1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
1/5 2.10 1
0/3 0.90
0/3 0.90
1/5 2.10 1
1/5 2.10 1
1/123.151,2

012
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
012
0/2

1/3

0.56
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.84
0.56
0.56

1
1
1
1
1
1
1
1
1.2
1,2
1.2
1.2
1.2
1.2

0.841

0/1
1/1
1/1
1/1
1/1
1/1
1/1
1/1
1/1
1/1
112
1/2
112
112
112
112
0/1
0/1
112
112
1/1
1/1

0.18
0.78
0.78
0.78
0.78
0.78
0.78
0.78
0.78
0.78
0.96
0.96
0.96
0.96
0.96
0.96
0.18
0.18
0.96
0.96
0.78
0.78

0/3
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2

1
1
2
1

1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
0/4
0/4
1/3
1/3
1/3
1/4

0.48 3
0.62
0.62
0.62
0.62
0.62
0.62
0.62
0.62
0.62
0.62
0.84
0.84
0.84
0.84
0.62
0.62
0.54
0.54
0.54
0.62

1.2
1.2
1.2
1.2
1.2
1.2
1.2
1,2
1,2
1.2
5
5
5
5
1,2
1.2
1.2
1.2
1
1.2

0/1
1/2
1/2
112
1/2
1/2
1/2
112
112
1/2
1/2
1/2
1/2
112
112
112
0/1
0/1
1/4
1/4
1/2
1/2

0.15
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.15
0.15
0.6
0.6
0.3
0.3

1.2
1
1
1

1.5
1.5
1.5
1.5
1.2
1.2

4
4
1

0/1 0.17
1/3 1.34
1/3 1.34
1/3 1.34
1/3 1.34
1/3 1.34
1/3 1.34
1/3 1.34
1/4 1.51
1/4 1.51
1/3 1.34
1/3 1.34
1/4 1.51
1/4 1.51
1/5 1.68
1/5 1.68
0/2 0.34
0/2 0.34
1/4 1.51
1/4 1.51
1/5 1.68
1/6 1.85

1.B

7
2.7
2.7
2.7.8
2.7.8
2.7.8
2.7.8
6.8
6.8
2.7.8
2.7.8
6

6
7.9
7.9
2.5
2.5
7.8
7.8
7
7


0/3

0.78
0.0

0/3
0/3

0.90
0.30

0/4
0/0

1.12
0.0

117
012

0.0
2.36
0.52

012
1/9
0/3

0.60
3.30
0.90

0/0 0.0
1/102.80
0/0 0.0

1/5
2/6

2.24
3.44

117
2/9

2.10
2.70

1/6
2/9

5.8

2/126.08 2/136.3

2.05
3.26

2121 6.62

0/3 0.54
0/0 0.0
0/4 0.78
012 0.42

0/3
012
0/5
0/5

0.64
0.28
1.24
.92

0/1 0.15
0/0 0.0 6
0/3 0.45 6
012 0.3 6

0/4 0.68
0/2 0.34
0/101.70
0/7 1.19

0/1
1/5
012
1/8
1/4
2/6
2/6
2/13

012
1/6
0/2
1/6
1/4
2/6
2/6
2/14

0.34
1.48
0.6
1.54
1.28
2.32
2.32
4.18

0/1 0.15
1/5 0.75
0/2 0.15
1/4 0.6 6
1/4 0.6
217 1.05
217 1.05
2/111.65 7

0/1 0.17
1/6 1.85
3.A
0/8 1.19
1/9 1.53
1/4 .68
2/101.70
2/193.23
2/22
540

0.18
1.50
0.36
2.38
1.66
2.96
2.96
5.42

3
3


*Format: r/m t.tt n (r = number of memory reads or writes; m = number of microcycles; t.tt = time in µs; n = footnote number).

THE PDP-11 FAMILY

LSI-11 NOTES

Fetch:
All single-operand instructions except SWAB, SXT, MFPS, and MTPS add 1 µcycle (+0.400 µs).
XOR, JMP, RTS, RTI, RTT, set/clear condition codes add 1 µcycle (+0.400 µs).
SWAB adds 2 µcycles (+0.800 µs).
SXT adds 5 µcycles (+2.000 µs).
BPT, IOT add 6 µcycles (+2.400 µs).
MARK adds 8 µcycles (+3.200 µs).

Source:
(1) Byte addressing subtracts 1 µcycle (-0.400 µs).
(2) Byte addressing adds 1 µcycle (+0.400 µs).
(3) If register ≠ R6 or R7, byte addressing adds 1 µcycle (+0.400 µs).

Destination:
For MOV: DM0 subtracts 1 µcycle (-0.400 µs). DM1-7 subtracts 2 µcycles and memory read (-1.200 µs).
Byte addressing (DM1-7) subtracts 1 µcycle (-0.400 µs).
(1) If register = R6 or R7, byte addressing adds 2 µcycles (+0.800 µs) additive to the time noted directly above.

Execute:
(1) DM0 adds 1 µcycle and subtracts memory write (+0.000 µs).
(2) DM0 subtracts memory write (-0.400 µs).
(3) DM0 subtracts 1 µcycle and memory write (-0.800 µs).
(4) DM0 subtracts 3 µcycles and memory write (-1.600 µs).
(5) If new PS has bit 7 clear, add 1 µcycle (+0.400 µs).
(6) If new PS has bit 4 set, add 9 µcycles (+3.600 µs).
(7) If new PS has bit 4 set, add 10 µcycles (+4.000 µs).
(8) If new PS has bit 4 set, add 1 µcycle (+0.400 µs).
(9) If register not 7, then 1/15 (6.40 µs).

Times Assumed for All Calculations:
(1) Microcycle time is 0.400 µs.
(2) Microcycle time is extended by 0.400 µs during DATI/DATIP/DATO/DATOB.
(Note: 1 extra wait µcycle is actually generated for each memory access; however, these µcycles have not been tallied in the microcycle counts above.)

PDP-11/04 NOTES

Source:
Odd-byte addressing (SM1-7) adds 2 µcycles (+0.520 µs).

Destination:
Odd-byte addressing (DM1-7) adds 2 µcycles (+0.520 µs).

Execute:
(1) Destination odd-byte addressing (DM1-7) adds 2 µcycles (+0.520 µs). DM0 subtracts memory write (-0.540 µs).
(2) DM0 subtracts 1 additional µcycle (-0.260 µs).

Times Assumed for All Calculations:
(1) Microcycle time is 0.260 µs.
(2) Microcycle time is extended by 0.220 µs by bus priority arbitration delay during BUT SERVICE.
(3) Microcycle time is extended by 0.940 µs during DATI/DATIP (MOS memory).
(4) Microcycle time is extended by 0.540 µs during DATO/DATOB (MOS memory).
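
The timing assumptions above suggest how a tabulated phase time can be recomputed from its r/m entry. The following is a minimal Python sketch, not part of the original appendix: the function name `phase_time` and the split of r into separate read and write counts are illustrative assumptions, and the constants are the PDP-11/04 values quoted in the notes. It reproduces entries such as "1/2 1.46", "1/3 1.72", and "2/5 3.18", which appear to belong to the PDP-11/04 column.

```python
# Timing constants quoted in the PDP-11/04 notes (microseconds).
MICROCYCLE = 0.260       # basic microcycle time
READ_EXTENSION = 0.940   # added per DATI/DATIP (MOS memory)
WRITE_EXTENSION = 0.540  # added per DATO/DATOB (MOS memory)


def phase_time(microcycles, reads=0, writes=0):
    """Return the time in microseconds for one instruction phase.

    Hypothetical helper: microcycles is the 'm' of an r/m entry, and
    reads/writes together make up its 'r'.
    """
    total = (microcycles * MICROCYCLE
             + reads * READ_EXTENSION
             + writes * WRITE_EXTENSION)
    return round(total, 3)


if __name__ == "__main__":
    # Entries of the form r/m, treated here as memory reads only.
    for reads, cycles in [(1, 2), (1, 3), (2, 5)]:
        print(f"{reads}/{cycles} -> {phase_time(cycles, reads=reads):.2f} us")
```

The sketch ignores the bus arbitration delay during BUT SERVICE, so it applies to source and destination phases rather than to the fetch phase.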


PDP-11/10 NOTES

Source:
Odd-byte addressing (SM1-7) adds 7 fast shift (0.150 µs/µcycle) and 1 regular µcycle for a total of +1.350 µs.

Destination:
Odd-byte addressing (DM1-7) adds 7 fast shift (0.150 µs/µcycle) and 1 regular µcycle for a total of +1.350 µs.
(1) MOV subtracts 1 µcycle (-0.300 µs).

Execute:
(1) Destination odd-byte addressing (DM1-7) adds 7 fast shift µcycles (0.150 µs/µcycle) for a total of +1.050 µs. DM0 subtracts 2 µcycles and memory write (-1.200 µs).
(2) Byte swap consists of 7 fast shift (0.150 µs/µcycle) and 1 regular µcycle for a total of +1.350 µs.

Times Assumed for All Calculations:
(1) Microcycle time is 0.300 µs.
(2) A CLKOFF following a DATI/DATIP/DATO/DATOB extends µcycle time by 0.600 µs minus 0.300 µs for each µcycle that the CLKOFF is removed from the cycle initiating the bus transaction.

PDP-11/20 NOTES

Source:
Odd-byte addressing (SM1-7) adds 2 µcycles (+0.560 µs).

Destination:
Odd-byte addressing (DM1-7) adds 2 µcycles (+0.560 µs).
Non-modifying instruction (CMP(B), BIT(B), TST(B)) adds 0 µcycles (+0.100 µs for DATI in place of DATIP).

Execute:
(1) DM0 subtracts 1 µcycle and memory write (-0.280 µs). PS as destination adds 1 µcycle (+0.280 µs).
(2) Odd-byte addressing (DM1-7) adds 2 µcycles (+0.560 µs).

Times Assumed for All Calculations:
(1) Microcycle time is 0.280 µs.
(2) Microcycle time is extended by 0.370 µs during DATI.
(3) Microcycle time is extended by 0.270 µs during DATIP.
(4) Microcycle time is extended by 0.000 µs during DATO/DATOB.

PDP-11/34 NOTES

Source:
(1) DM0 subtracts 1 µcycle (-0.180 µs).

Destination:
MOV(B) and DM1-7 changes long to short µcycle and subtracts memory read (-1.000 µs).
(1) MOV(B) subtracts an additional µcycle (-0.180 µs).
(2) Single-operand instruction except NEG(B) subtracts 1 µcycle (-0.180 µs).

Execute:
(1) DM0 subtracts memory write and changes long to short µcycle (-0.600 µs).
(2) DM0 subtracts memory write, changes long to short µcycle, and adds 1 µcycle (-0.420 µs).

Times Assumed for All Calculations:
(1) Microcycle times are 0.180 and 0.240 µs.
(2) Microcycle time is extended by 0.150 µs by bus priority arbitration delay during BUT SERVICE.
(3) Microcycle time is extended by 0.940 µs during DATI/DATIP (MOS memory).
(4) Microcycle time is extended by 0.540 µs during DATO/DATOB (MOS memory).
(5) Memory management unit delay is not included (+0.120 µs/memory cycle when enabled).
PDP-11/40 NOTES

Source:
Odd-byte addressing (SM1-7) adds 2 µcycles (+0.340 µs).

Destination:
Odd-byte addressing (DM1-7) adds 2 µcycles (+0.340 µs).
(1) Single-operand instruction or SM0 subtracts 0 µcycles (-0.440 µs).

Execute:

(I)
(2)
(3)
(4)

(5)

If (single-operand instruction or SM0 and
double-operand instruction except
MOVB), DM0, destination ≠ register 7,
and no service request pending, then next
fetch is overlapped (-1 µcycle/-0.640 µs
from next fetch).
If DM0, phase takes 3 µcycles and memory
write is not done (0.480 µs).
If odd-byte addressing (DM1-7), phase
takes 5 µcycles (1.020 µs).
If odd-byte addressing (DM1-7), phase
takes 5 µcycles (0.820 µs).
If byte instruction and DM1-7, phase takes
4 µcycles (0.880 µs). For DM0: If word instruction, phase takes 2 µcycles (0.340 µs).
If byte instruction, phase takes 4 µcycles
(0.680 µs).
For DM0: If word instruction, phase takes
3 µcycles (0.740 µs). If byte instruction,
phase takes 4 µcycles (0.880 µs). In neither
case is memory write done.

Times Assumed for All Calculations:
(1) Microcycle times are 0.140, 0.200, and 0.300 µs.
(2) A CLKOFF following a DATI/DATIP extends µcycle time by 0.500 µs minus sum of cycle times between DATI/DATIP (exclusive) and CLKOFF (inclusive).
(3) A CLKOFF following a DATO/DATOB extends µcycle time by 0.200 µs minus sum of cycle times between DATO/DATOB (exclusive) and CLKOFF (inclusive).
(4) Memory management unit delay is not included (+0.150 µs/memory cycle when enabled).

PDP-11/45 NOTES

Fetch:
Execute phase of previous instruction may be overlapped with fetch. Consult execute phase note for effect on timing.

Destination:
MOV and DM1-7 subtracts memory read (-0.000 µs). Odd-byte addressing (DM1-7) adds 1 µcycle (+0.150 µs).
(1) Single-operand instruction or SM0 subtracts 1 µcycle (+0.150 µs).

Execute:
(1) For DM0:
    If double-operand instruction, destination ≠ register 7, and SM1-7:
        If odd-byte addressing, then phase takes 2 µcycles (0.300 µs), else phase takes 1 µcycle (0.150 µs). If no service request is pending, then next fetch is overlapped (-1 µcycle/-0.150 µs from next fetch).
    If double-operand instruction, destination = register 7, and SM1-7:
        Phase takes 2 µcycles (0.300 µs).
    Otherwise (single-operand instruction or SM0):
        Phase takes 1 µcycle (0.150 µs). If destination ≠ register 7 and no service request is pending, then next fetch is overlapped (-2 µcycles/-0.300 µs from next fetch).
    No memory write is done.
(2) For DM1-7, if destination fetch is via Fastbus and no service request is pending, then next instruction fetch is overlapped (-1 µcycle/-0.150 µs from next fetch).
(3) DM1-2 adds 1 µcycle (+0.150 µs). If no service request is pending, then next fetch is overlapped (-1 µcycle/-0.150 µs from next fetch).
(4) DM0 subtracts 2 µcycles and memory write (-0.300 µs).
(5) Odd-byte addressing adds 1 µcycle (+0.150 µs).
(6) If no service request is pending, then next fetch is overlapped (-1 µcycle/-0.150 µs from next fetch).
(7) IOT 1.65 µs, BPT 1.8 µs.

Times Assumed for All Calculations:
(1) Microcycle time is 0.150 µs.
(2) Memory access time does not influence microcycle times (bipolar memory).
(3) Memory management unit delay is not included (+0.090 µs/memory cycle when enabled).
PDP-11/60 NOTES

Fetch:
The following instructions take 1 additional µcycle (+0.170 µs) to decode: XOR, SWAB, SXT, JSR, set/clear condition codes, MARK, SOB, RTS, RTI, RTT, IOT, EMT, TRAP, BPT, MFPI(D), MTPI(D).
Fetch or execute phase of previous instruction may be overlapped with fetch. Consult execute phase notes for effect on timing.

Source:
For SM1-7: Word instruction except MOV and DM1-7 adds 1 µcycle (+0.170 µs). Byte instruction adds 2 µcycles (+0.340 µs).

Destination:
Byte addressing (DM1-7) adds 2 µcycles (0.340 µs).
(1) Single-operand instruction except SWAB or SXT or SM0 and double-operand instruction except XOR subtracts 1 µcycle (-0.170 µs).

Execute:
(1) If SM0, DM0, source ≠ register 7, and destination ≠ register 7, then fetch overlap is attempted. If no service request is pending at conclusion of instruction, then next fetch is overlapped (-2 µcycles/-0.340 µs from next fetch); otherwise, add 2 µcycles (+0.340 µs) to service phase following instruction for PC rollback, add 1 memory read (+0.000 µs) to next fetch for instruction refetch.
(2) If DM0 and destination ≠ register 7, then fetch overlap is attempted. If no service request is pending at conclusion of instruction, then next fetch is overlapped (-2 µcycles/-0.340 µs from next fetch); otherwise, add 2 µcycles (+0.340 µs) to service phase following instruction for PC rollback, add 1 memory read (+0.000 µs) to next fetch for instruction refetch.
(3) If no service request is pending, then next fetch is overlapped (-2 µcycles/-0.340 µs from next fetch); otherwise, subtract 1 µcycle (-0.170 µs) from execute.
(4) For DM0: SM0 subtracts memory write (-0.830 µs). SM1-7 subtracts 1 µcycle and memory write (-1.000 µs).
(5) DM0 subtracts 1 µcycle (-0.170 µs).
(6) DM0 subtracts 1 µcycle and memory write (-1.000 µs).
(7) DM0 subtracts 2 µcycles and memory write (-1.170 µs).
(8) DM1-7 and byte addressing adds 1 µcycle (+0.170 µs).
(9) DM1-7 and byte addressing adds 3 µcycles (+0.510 µs).
(A) DM3, 5-7 adds 1 µcycle (+0.170 µs).
(B) SM1-7, DM0, and word addressing adds 1 µcycle (+0.170 µs).
(C) SM0, DM1-7, and byte addressing adds 1 µcycle (+0.170 µs).
(D) SM0 adds 1 µcycle (+0.170 µs).
(E) If new PC odd: Microcontrol transfers to writable control store if present and instruction timing does not apply; otherwise, trap sequence continues normally with 3 extra µcycles (+0.510 µs).

Times Assumed for All Calculations:
(1) Microcycle time is 0.170 µs.
(2) Microcycle time is extended by 0.000 µs during DATI/DATIP with cache hit (all tabulated times assume cache hit on read).
(3) Microcycle time is extended by 1.075 µs during DATI/DATIP with cache miss.
(4) Microcycle time is extended by 0.830 µs during DATO/DATOB.
(5) Memory Management unit adds no delay when enabled.

Accessing the following internal addresses invokes microcode which adds additional microcycles in all phases:
772300-16    Kernel Page Descriptor Registers
772340-56    Kernel Page Address Registers
777540       Writable Control Store Status Register
777542       Writable Control Store Address Register
777544       Writable Control Store Data Register
777570       Console Switch and Display Register
777572       Memory Management Status Register 0
777574       Memory Management Status Register 1
777576       Memory Management Status Register 2
777600-16    User Page Descriptor Registers
777640-56    User Page Address Registers
777744       Memory System Error Register
777746       Cache Control Register
777752       Cache Hit/Miss Register
777766       CPU Error Register
777770       Microprogram Break Register
777774       Stack Limit Register
777776       Processor Status Word

Turning Cousins into Sisters:
An Example of Software Smoothing
of Hardware Differences
RONALD F. BRENDER

INTRODUCTION

In 1970, the PDP-11 was Digital Equipment
Corporation's newly announced minicomputer
and its first offering in the 16-bit world. Among
the many software components needed to complement the hardware, a FORTRAN system
was high on the list. A FORTRAN project was
begun in 1970 and the first release of the resulting product took place in mid-1971. In the succeeding years, the number of PDP-11 CPUs and
related options increased dramatically to provide a wide range of price/performance alternatives. What makes the original FORTRAN
interesting, even today, is the extent to which
the basic implementation approach was able to
be extended gracefully to span the entire family
with modest incremental effort.
This paper describes the design concepts,
threaded code and a FORTRAN virtual machine, used to implement the original PDP-11
FORTRAN product. As the PDP-11 family of
processors expanded with new models and options, these original design concepts proved
both stable enough and flexible enough to be
employed successfully across the entire family.

When this FORTRAN was finally superseded in early 1975, it had two successors. One,
called FORTRAN IV, continued the threaded
code and virtual machine concepts of the earlier
product with similar execution performance
across the PDP-11 family, but offered much faster compilation rates in smaller memory. The
other successor, called FORTRAN IV-PLUS,
produced direct PDP-11 code and obtained significantly improved execution performance for
the PDP-11/45, PDP-11/70, and PDP-11/60
with FP11 floating-point hardware relative to
both of the other FORTRANs.
In the Beginning

The PDP-11/20 was a significant advance
over other minicomputers of its time, but was a
bare machine architecture by today's standards.
There was no floating-point hardware of any
kind (even as an option) and integer multiply
and divide operations were available only by
means of an I/O bus option, the Extended
Arithmetic Element (EAE). (The EAE also provided multiple-bit arithmetic shift operations;
the PDP-11/20 instructions provided only
single-bit shifts.)
The first disk-based operating system, DOS,
was designed for a minimum standard system
that included 8 Kwords (16 Kbytes) of memory.
After allowing typically 2 Kwords for the resident parts of the monitor, only 5 K to 6 K remained for other use. Consequently, size
constraints played a major role in the FORTRAN system design and implementation.
There were not many competitors at the time,
but at least one, the IBM 1130, offered a disk-based operating system and FORTRAN system. To meet this competition, an important
goal was to deliver the PDP-11 FORTRAN system to the market as quickly as possible, even at
the cost of performance, if necessary.

Neither Compiler nor Interpreter, but Threaded Code

The fundamental design strategy to be determined was the structure of the executing code,
the "run-time environment" [DEC, 1974b;
DEC, 1974c].
We were leery of a compiler that generated
direct machine code primarily because of the
size of compiled code. Much of the compiled
code would necessarily consist of calls to floating-point and other support routines, and on
the PDP-11, each subroutine call required two
words of memory, not counting argument
transmission.
An interpreter would easily solve the space
problem, but this had its own disadvantages.
The basic interpreter loop overhead was a concern, but not crucial at that stage in our deliberations. However, a disadvantage of interpreters
is that they must be "always present" even
though not all of the capabilities are being used.
For example, routines for complex arithmetic
are part of the interpreter even though the particular program in use does not perform complex arithmetic. Further, we wanted to maintain
the traditional FORTRAN features of independent compilation and linking of routines,
and easy writing of routines in assembler for inclusion in the program.
The solution was threaded code [Bell, J.,
1973]. Threaded code is a kind of combination
of an interpreter and compiled code with most
of the best features of each. On the PDP-11 it
works in the following way.
The "compiled code" consists simply of a sequence of service routine addresses. A single
register (we used R4) is chosen to contain a
pointer to the next address in the sequence to be
invoked. Each service routine completes by
transferring control to the next routine in the
sequence and simultaneously advancing the
pointer.
To illustrate, consider a service routine whose
purpose is to perform floating-point addition of
two real values found in a stack (we used R6,
the hardware stack pointer, for the value stack)
and leave the result on the top of the stack in
place of the parameters. The service routine
would look like the following.*
$ADR:   <<  >>
        JMP     @(R4)+

*The brackets << and >> are used in examples in place of code to indicate the purpose of code that is too bulky and/or not
relevant for the example.
In the PDP-11 MACRO assembler language [DEC, 1976], identifiers may consist of up to six characters from among the
letters, numerals, "." and "$". Identifiers created by the FORTRAN compiler include either a period or dollar sign to
assure that they are distinct from FORTRAN language identifiers.
In the PDP-11 MACRO assembler language, a colon follows a label and separates the label from assembler instructions.

The JMP instruction with deferred autoincrement addressing mode provides just the
combination needed to sequence through the
table of addresses. It is a single one-word instruction.
The instruction corresponds to the basic loop
of an interpreter. Consequently, there is no centralized interpreter: the interpreter is distributed
throughout every one of the service routines.
Arguments to a service routine can also be
placed in-line following the routine address.
The routine picks up the arguments using the
pointer register, each time advancing the
pointer for the next use. For this, both the autoincrement and deferred auto-increment addressing modes are ideal.
For example, the following service routine
copies onto the stack the value of an integer
variable whose address follows the call:
$PUSHV: MOV     @(R4)+,-(SP)
        JMP     @(R4)+
Similarly, the following routine pops a value
from the stack and stores it in the variable
whose address follows the call:
$POPV:  MOV     (SP)+,@(R4)+
        JMP     @(R4)+

Using the two primitives $PUSHV and
$POPV, the FORTRAN assignment statement:
        I = J
can be implemented by "compiling" code as
follows:*
        $PUSHV          ; Address of $PUSHV routine
        J               ; Address of storage for J
        $POPV           ; Address of $POPV routine
        I               ; Address of storage for I

The principal disadvantage of a normal interpreter is avoided by representing the address of
a service routine in symbolic fashion as the
name of a module to be obtained from a library
of routines. Only those routines that are actually referred to are included in the program
when it is linked for execution.
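
The mechanism described above can be imitated in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the PDP-11 implementation: the names `run_threaded`, `pushv`, and `popv` are hypothetical, variables are looked up by name in a dictionary instead of by address, and a central `while` loop stands in for the `JMP @(R4)+` that each real service routine executes itself.

```python
def run_threaded(code, memory):
    """Execute a threaded-code list; `code` mixes routines and in-line arguments."""
    stack = []
    state = {"r4": 0}  # plays the role of R4, the pointer into the code

    def fetch():
        # Equivalent of (R4)+ : read the word at the pointer, then advance it.
        word = code[state["r4"]]
        state["r4"] += 1
        return word

    while state["r4"] < len(code):
        routine = fetch()  # stands in for JMP @(R4)+ dispatching the next routine
        routine(fetch, stack, memory)
    return memory


def pushv(fetch, stack, memory):
    """Like $PUSHV: push the value of the variable named by the in-line argument."""
    stack.append(memory[fetch()])


def popv(fetch, stack, memory):
    """Like $POPV: pop a value into the variable named by the in-line argument."""
    memory[fetch()] = stack.pop()


if __name__ == "__main__":
    # Threaded code for the FORTRAN statement I = J.
    program = [pushv, "J", popv, "I"]
    print(run_threaded(program, {"I": 0, "J": 42}))
```

Because the program list refers to routines by name, only the routines a program actually mentions need to exist, which mirrors the library-linking advantage noted in the text.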
We complete this introduction by briefly illustrating how flow of control and changing
modes is accomplished.
A simple transfer of control, e.g., the FORTRAN statement:
GOTO 100
can be compiled to:
$GOTO,.100
using the service routine:
$GOTO:  MOV     (R4),R4
        JMP     @(R4)+

The implementation of the FORTRAN computed GOTO statement is illustrated in
Figure 1. Notice that the count of the number
of labels is included in the arguments to the service routine. The service routine checks that the
index value is in the correct range; if it is not, an
error is reported and control continues in-line
(no transfer takes place). In this example, register 1 (R1) is used as a temporary location within
the service routine.
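
The range check performed by the computed GOTO service routine can be summarized in a short Python sketch. It is a simplified model under stated assumptions rather than a transcription of the routine: the function name `computed_goto` is hypothetical, the label list is an ordinary Python list, and returning `None` stands for "report the error and continue in-line".

```python
def computed_goto(index, labels):
    """Return the target label, or None when the index is out of range.

    None stands for "report an error and continue in-line".
    """
    count = len(labels)        # the label count passed ahead of the labels
    if index <= 0 or index > count:
        return None            # out of bounds: no transfer takes place
    return labels[index - 1]   # FORTRAN index values start at 1


if __name__ == "__main__":
    # GOTO (100,200,300) I
    for i in (0, 1, 2, 3, 4):
        print(i, computed_goto(i, [100, 200, 300]))
```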
To enter threaded code mode when executing
normal code, the following call is executed:
JSR R4,$POLSH

*In subsequent examples, the arguments of a service routine will be written on the same line as the routine address. Thus, the
above would appear as:
        $PUSHV,J
        $POPV,I
This is more compact and suggestive of conventional assembler notations; the effect is identical to the previous example.

FORTRAN SOURCE
        GOTO (100,200,300) I
100
200
300

THREADED CODE
        $CGOTO,I,3,.100,.200,.300
.100:
.200:
.300:

COMPUTED GOTO SERVICE ROUTINE
$CGOTO: MOV     @(R4)+,R1       ; Fetch value of index
        BLE     1$              ; Error if less or equal zero
        CMP     R1,(R4)         ; Compare with label count
        BGT     1$              ; Error if greater
        ASL     R1              ; *2 for word offset
        ADD     R1,R4           ; Pointer to target label
        MOV     (R4),R4         ; Fetch target label
        JMP     @(R4)+          ; Continue ...
1$:     ERROR   "Computed GOTO value out of bounds"
        MOV     (R4)+,R1        ; Fetch label count, adjust R4
        ASL     R1              ; *2 for word offset
        ADD     R1,R4           ; Pointer to next in line
        JMP     @(R4)+          ; Continue ...

Figure 1. Threaded code for FORTRAN computed GOTO statement.

Threaded mode begins immediately following this call. The service routine is:
$POLSH: MOV     (SP)+,R4
        JMP     @(R4)+

Leaving threaded mode requires no service
routine at all; the operator is simply the address
of the immediately following word of memory.

A Virtual Machine

By now it should be apparent that we have
the beginning of a FORTRAN virtual machine.
Instructions in this machine language are encoded as the addresses of the service routines.
The PDP-11 instruction set provides the
pseudo-microinstruction set used to emulate the
FORTRAN machine. Register 4 (R4) is the virtual program counter.
For a complete characterization of a virtual
machine, it is necessary to identify the complete
state of the machine, that is, all of the values
that must be preserved in order to interrupt the
execution of the machine, apply the machine to
another purpose, and later resume the original
execution as though the interruption had not
occurred. In this sense, the state clearly includes
the stack pointer (SP) register and the program
counter (R4) register as well as the memory regions occupied by the program, variables, and
values on the stack. In the actual implementation, some virtual machine instructions also left
values in general register 0 (R0) or in the processor condition codes for use by the subsequent virtual machine instruction. Thus, these
values must also be considered part of the virtual machine state. However, the remaining
general registers of the PDP-11 are not part of
the state even though they are used freely by
individual instructions to hold temporary values during the execution of a single virtual instruction, as illustrated in Figure 1.
This FORTRAN machine went through two
phases of development. In the first phase, the
virtual machine specification did not change;
rather, the implementation was broadened to
take advantage of newer models of the PDP-11
family. Increased performance was achieved
through improved performance of the new
CPU and the floating-point hardware options.
In the second phase, the virtual machine specification itself was extended to achieve greater
performance across all of the PDP-11 family
processors.
FORTRAN MACHINE - PHASE 1

The introduction described the basic technique, threaded code, by which it was possible
to produce a FORTRAN processor for the first
PDP-11 processor, the PDP-11/20. This section
focuses on the design of the FORTRAN virtual
machine proper and how it was implemented
across the range of PDP-11 CPUs.
The major part of the FORTRAN virtual
machine was relatively ad hoc in form, more or
less closely following the form of the FORTRAN language. The previous example of the
computed GOTO statement is representative of
the approaches taken. This correspondence between the language and the virtual machine
greatly simplified the compiler. Variations in
the order of arguments and/ or the introduction
of extra arguments (such as the label list count)
were made to aid the speed and/or the error
checking capability of the supporting service
routines.
One part of the machine had a more regular
structure - assignment statements and expression evaluation. We will focus our attention on
this part of the machine because this is where
the majority of FORTRAN execution time is
spent.
Many details of the machine are easily
sketched. It was a stack-oriented machine - values were pushed onto the stack, and operators
took their operands from the stack and replaced
them with the result. The hardware stack
pointer (SP) was used to control the value stack.
Consideration was given to using the PDP-11
general registers as fast top-of-stack locations.
However, this was rejected because it violated
the inherent simplicity of the pure stack model
and because analysis showed that the extra
overhead of managing these locations substantially eliminated any benefits.
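
The pure stack model described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the FORTRAN run-time system: the function `run_stack_program` and the operation names "push", "add", "mul", and "pop" are hypothetical stand-ins for push routines and arithmetic operators, and Python floats replace the PDP-11 data formats.

```python
def run_stack_program(program, variables):
    """Run (operation, argument) pairs against a value stack; return variables."""
    stack = []
    for operation, argument in program:
        if operation == "push":      # push the value of a variable
            stack.append(variables[argument])
        elif operation == "add":     # replace two operands with their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif operation == "mul":     # replace two operands with their product
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif operation == "pop":     # store the top of stack into a variable
            variables[argument] = stack.pop()
        else:
            raise ValueError(f"unknown operation: {operation}")
    return variables


if __name__ == "__main__":
    # X = A + B * C
    program = [("push", "A"), ("push", "B"), ("push", "C"),
               ("mul", None), ("add", None), ("pop", "X")]
    print(run_stack_program(program, {"A": 2.0, "B": 3.0, "C": 4.0, "X": 0.0}))
```

Every operator finds its operands on the stack and leaves its result there, so no operator needs to know where its inputs came from; that uniformity is the simplicity the text refers to.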
Naming conventions were adopted for the
operators as a mnemonic convenience. The
arithmetic operators were named as illustrated
in Figure 2. For example, $ADR designated the
routine to add two single-precision (real) operands, while $ADC designated the routine to
add two complex operands, and so on.
Throughout this design process the size of the
generated code continued to be the most important factor. This led to the most unusual aspect
of the machine design.
FORM:   $oot

WHERE   oo =    AD      For addition
                SB      For subtraction
                ML      For multiplication
                DV      For division
                PW      For exponentiation (raising to a power)

        t  =    B       For byte data
                L       For logical data
                I       For integer data
                R       For real data
                D       For double-precision data
                C       For complex data

NOTE:
"$PW" has a 2-letter suffix. The first indicates the base data-type,
the second the exponent data-type.

Figure 2. FORTRAN Phase 1 arithmetic instructions.

To push a value onto the stack required two
words: one for the push instruction and one for
the address of the variable. To reduce this to a
single word, the compiler produced a service
routine for each variable that would push the
value of the variable onto the stack. Such a routine was called a push routine. In this way, the
compiler reduced the size of the compiled code
by producing specialized service routines that
complemented the general service routines obtained from the FORTRAN library.
For example, the push routine for an integer
variable, I, would be:

$P.I:   MOV     I,-(SP)
        JMP     @(R4)+

The push routine for a complex variable, C,
would be:*

$P.C:   MOV     #C+8,R0
        MOV     -(R0),-(SP)
        MOV     -(R0),-(SP)
        MOV     -(R0),-(SP)
        MOV     -(R0),-(SP)
        JMP     @(R4)+

*Note that since the stack of the PDP-11 grows downward in memory, values must be copied from high address toward low
address to obtain a correct copy on the stack.

Of course, each push routine itself took
space: three words for an integer variable and
five words for a real variable. Consequently, the
breakeven point was three uses for an integer
variable and five uses for a real variable.
Three uses of an integer variable were
deemed likely to be achieved in most programs,
especially in larger and more complex programs
where space would be most critical. The five
uses for a real variable were reduced by some
complex merging of code for multiple push routines for real, complex, and double-precision
variables. The compiler also maintained a bit in
the symbol table entry for each variable indicating that a push routine was actually
needed. (It is fairly common for a particular
subroutine to reference only a few variables out
of a large COMMON block.)
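
The breakeven figures follow from simple word counting. The sketch below is illustrative only (the function names are hypothetical, not part of the compiler): it assumes two words per use without a push routine, and one word per use plus the one-time cost of the routine with it, as the text describes.

```python
# Illustrative word counting for the push-routine tradeoff.
# Assumption: each use costs 2 words without a push routine
# (push instruction + address) and 1 word with one.

def words_without_push_routine(uses):
    return 2 * uses

def words_with_push_routine(uses, routine_words):
    return uses + routine_words

def breakeven_uses(routine_words):
    """Smallest number of uses at which the push routine costs no extra space."""
    uses = 1
    while words_with_push_routine(uses, routine_words) > words_without_push_routine(uses):
        uses += 1
    return uses

integer_breakeven = breakeven_uses(3)  # integer push routine: three words
real_breakeven = breakeven_uses(5)     # real push routine: five words
```

Under these assumptions the breakeven point equals the size of the push routine in words, which matches the three and five uses quoted above.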
Pop routines for each variable were also considered, but rejected. There are typically more
uses of a variable's value than assignments of
new values. Consequently, the breakeven point
is less likely to be consistently achieved. Instead, general pop routines for each data-type
(actually, each size of data value - 1, 2, 4, or 8
bytes) were used.
Figure 3 presents a complete example of the
compiled code produced by the compiler for
two sample assignment statements. The figure
includes push routines automatically generated
by the compiler, as well as the allocation of
storage for the variables of the program. All
service routines not shown are obtained from
the FORTRAN library when the program is
linked for execution.
It should be apparent from this figure that
the compiled code corresponds to the well-known Polish postfix notation, which is a rearrangement of expression information suitable
for stack evaluation disciplines.
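
The postfix structure of the threaded code in Figure 3 can be mimicked in a few lines of Python. This is a sketch of the idea only, not the DEC implementation: the helper names (`make_push`, `make_pop`, `binary`, `run`) and the sample values in `memory` are assumptions made for illustration. The loop in `run` stands in for the `JMP @(R4)+` dispatch that ends each service routine.

```python
# A toy model of the Phase 1 threaded code: a thread is a list of routines,
# each of which operates on a shared value stack.

def make_push(memory, name):
    """Per-variable push routine, the analogue of $P.name."""
    def push(stack):
        stack.append(memory[name])
    return push

def make_push_const(value):
    def push(stack):
        stack.append(value)
    return push

def make_pop(memory, name):
    def pop(stack):
        memory[name] = stack.pop()
    return pop

def binary(fn):
    """Operator routine: replace the top two stack values with the result."""
    def op(stack):
        right = stack.pop()
        left = stack.pop()
        stack.append(fn(left, right))
    return op

def run(thread):
    stack = []
    for routine in thread:  # plays the role of JMP @(R4)+
        routine(stack)
    return stack

memory = {"K": 6, "A": 1.0, "B": 5.0, "C": 4.0, "X2": 0.0}  # sample values
add = binary(lambda a, b: a + b)
sub = binary(lambda a, b: a - b)
mul = binary(lambda a, b: a * b)
div = binary(lambda a, b: a / b)
power = binary(lambda a, b: a ** b)

thread = [
    # K = K + 1
    make_push(memory, "K"), make_push_const(1), add, make_pop(memory, "K"),
    # X2 = (A - (B**2 - 4.*A*C))/(2.*A)
    make_push(memory, "A"), make_push(memory, "B"), make_push_const(2), power,
    make_push_const(4.0), make_push(memory, "A"), mul,
    make_push(memory, "C"), mul, sub, sub,
    make_push_const(2.0), make_push(memory, "A"), mul, div,
    make_pop(memory, "X2"),
]
leftover = run(thread)
```

Each entry in `thread` corresponds to one word of threaded code in the figure (the pop entries stand for the pop routine plus its address word), and the stack is empty again once both statements have executed.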
The Virtual Machine Across the PDP-11
Family

Even as the FORTRAN system was in its
early development phase, new models of the
PDP-11 family were under development by the
hardware groups. The next in line was the PDP-11/45 with a floating-point hardware option.
How could the software development group
that had just produced a FORTRAN tailored
for an 8 K PDP-11/20 without even integer
multiply/divide instructions respond with another FORTRAN for the high-performance
PDP-11/45 with optional hardware floating
point? Fortunately, the virtual FORTRAN machine approach made it relatively easy. All that
was needed was to re-implement the virtual machine using the new and more extensive "microcode." The compiler did not even have to be
changed at all! How this was accomplished is
discussed below.

FORTRAN SOURCE

        K = K + 1
        X2 = (A - (B**2-4.*A*C))/(2.*A)
        END

THREADED CODE

$START: JSR     R4,$POLSH
        $P.K                    ; Push K
        $P.1                    ; Push 1
        $ADI                    ; Add integer giving K + 1
        $POPI,K                 ; Pop to K
        $P.A                    ; Push A
        $P.B                    ; Push B
        $P.2                    ; Push 2
        $PWRI                   ; B**2
        $P.4.                   ; Push 4.
        $P.A                    ; Push A
        $MLR                    ; 4.*A
        $P.C                    ; Push C
        $MLR                    ; 4.*A*C
        $SBR                    ; B**2-4.*A*C
        $SBR                    ; (A-(B**2-4.*A*C))
        $P.2.                   ; Push 2.
        $P.A                    ; Push A
        $MLR                    ; 2.*A
        $DVR                    ; (...)/(2.*A)
        $POP2,X2                ; Pop to X2

; PUSH ROUTINES

$P.K:   MOV     K,-(SP)
        JMP     @(R4)+
$P.1:   MOV     #1,-(SP)
        JMP     @(R4)+
$P.A:   MOV     #A+4,R0
        BR      $F
$P.B:   MOV     #B+4,R0
        BR      $F
$P.2:   MOV     #2,-(SP)
        JMP     @(R4)+
$P.4.:  MOV     #$R.4.+4,R0
        BR      $F
$P.C:   MOV     #C+4,R0
        BR      $F
$P.2.:  MOV     #$R.2.+4,R0
$F:     MOV     -(R0),-(SP)     ; Shared code for pushing
        MOV     -(R0),-(SP)     ; the values of A, B, C and
        JMP     @(R4)+          ; the constants 2. and 4.

; STORAGE ALLOCATION

K:      .BLKW   1
A:      .BLKW   2
B:      .BLKW   2
$R.4.:  .FLT2   4.
C:      .BLKW   2
$R.2.:  .FLT2   2.
        .END    $START

Figure 3. Example of code generation.
The PDP-11/20, with its EAE option, required two implementations of the virtual machine. The PDP-11/45 added two more: one for
the floating-point option and another because it
added instructions for integer multiply/divide
and multiple bit shifting as part of the standard
instruction set.*
Later the PDP-11/40 added a fifth variation
for its Floating Instruction Set (FIS) option.†
By the time we were done, there were five versions of the FORTRAN machine which corresponded to the family processors as follows:

1. Basic    PDP-11/20, PDP-11/40

2. EAE      PDP-11/20 with EAE, PDP-11/40 with EAE
            Integer multiply/divide

3. EIS      PDP-11/40 with EIS, PDP-11/45
            Integer multiply/divide

4. FIS      PDP-11/40 with EIS and FIS
            Integer multiply/divide and
            single-precision floating point

5. FP11     PDP-11/45 with FP11
            Integer multiply/divide and
            single/double precision floating point

Later processors (PDP-11/70, 11/60, 11/34,
11/05, 11/04, and LSI-11) have all matched one
of these five categories.
Figure 4 illustrates the general logical structure of a typical floating-point service routine.
As presented in this logically extreme form, it
consisted of five completely independent implementations. They were combined in a single
source file to help manage and minimize the
proliferation of files. (This also significantly
aided maintenance.) This one file would be assembled five times, each time with a different
conditional assembly parameter, to produce the
five different object files that implemented the
same operation on the different systems.

$ADR:
        .IF NDF EAE!EIS!FIS!FPP
        <basic implementation>
        .ENDC
        .IF DF EAE
        <EAE implementation>
        .ENDC
        .IF DF EIS
        <EIS implementation>
        .ENDC
        .IF DF FIS
        <FIS implementation>
        .ENDC
        .IF DF FPP
        <FPP implementation>
        .ENDC
        .END

NOTE:
In the PDP-11 MACRO assembler language, ".IF" introduces a sequence of statements (instructions) that
are included in a given assembly only if a specified
condition is satisfied. The statement ".ENDC" terminates the sequence. Also, conditional sequences can
be nested within other conditional sequences, as illustrated in other figures. In this figure, the condition
"DF EAE" is satisfied if the name EAE has a defined
value, "DF EIS" is satisfied if EIS is defined, and so
on. The condition "NDF EAE!EIS!..." is satisfied if
none of the given names has a defined value.

Figure 4. General logical structure of
conditionalized FORTRAN operator routine.

*These Extended Instruction Set (EIS) operations were similar in function to the capability of the EAE, but were an integral
part of the instruction set instead of an I/O bus add-on. This was more efficient since the initialization necessary to begin
execution of these functions was less.
†On the PDP-11/40, the EIS instructions were an option also.
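
The effect of assembling one source file five times can be sketched as a selection function. This is a hedged illustration of the conditional structure in Figure 4, not assembler behavior in detail: the function name and the block labels (`"basic"`, `"EAE"`, and so on) are hypothetical stand-ins for the bodies of the conditional sections.

```python
# Which sections of the Figure 4 source survive a given assembly,
# modeled as a function of the set of defined names.

OPTIONS = ["EAE", "EIS", "FIS", "FPP"]

def assembled_blocks(defined):
    defined = set(defined)
    blocks = []
    if not defined & set(OPTIONS):   # .IF NDF EAE!EIS!FIS!FPP
        blocks.append("basic")
    for name in OPTIONS:             # .IF DF <name>
        if name in defined:
            blocks.append(name)
    return blocks

# One assembly per target system, each defining at most one name.
builds = {target: assembled_blocks(target)
          for target in [(), ("EAE",), ("EIS",), ("FIS",), ("FPP",)]}
```

With at most one name defined per assembly, exactly one implementation is selected each time, which is how a single file yields five different object files.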
In practice, the separation of implementations was not as complete as shown. Some instructions, such as the computed GOTO,
remained independent of the hardware configuration. Generally, the EIS and EAE versions were localized variations of the basic (no
option) implementation, while the FP11 and
FIS versions tended to be totally distinct.
A more representative illustration of the kind
of conditionalization used is shown in Figure 5.
Notice that the conditional use of EIS or EAE
operations is nested within an outer conditionalization for neither FIS nor FP11. The FIS
and FP11 versions are distinct.
The FORTRAN Machine and the
PDP-11/40 FIS

Because of the incompatibility in operand addressing capability between the FP11 and FIS,
the FIS option of the PDP-11/40 seems at best
an architectural curiosity and at worst an unfathomable aberration. In a broader perspective, however, it was an excellent
compromise between goals and constraints for
the combined hardware and software system at
the time it was introduced.
The marketing requirement was simple.
There must be at least a single-precision floating-point option for the PDP-11/40 to maintain
competitive FORTRAN performance and it
must sell for no more than a given (relatively
low) price. The cost constraint, combined with
other engineering factors, precluded the implementation of even a simple subset of the FP11
instruction set.
Consultation between the hardware and software engineers led to the resulting Floating Instruction Set. The FIS provided four single-precision floating-point instructions (add, subtract, multiply, and divide) which corresponded
exactly with the FORTRAN virtual machine
requirements. As seen in Figure 5, the FIS version of the FORTRAN $ADR service routine
consists of just two single-word instructions
(compared to the FP11 variant that occupies
five words).
The FIS option for the PDP-11/40 accomplished everything that it was supposed to accomplish.

$ADR:
        .IF NDF FIS!FPP
        <<basic implementation>>
        .IF DF EAE
        <<EAE variation>>
        .ENDC
        .IF DF EIS
        <<EIS variation>>
        .ENDC
        .IF NDF EIS!EAE
        <  >
        .ENDC
        <  >
        .ENDC                   ;NDF FIS!FPP
        .IF DF FIS
        FADD    SP
        JMP     @(R4)+
        .ENDC                   ;DF FIS
        .IF DF FPP
        SETF
        LDF     (SP)+,F0
        ADDF    (SP)+,F0
        STF     F0,-(SP)
        JMP     @(R4)+
        .ENDC                   ;DF FPP
        .END

Figure 5. Partial detail of implementation of $ADR.
FORTRAN MACHINE - PHASE 2

While the FORTRAN product successfully
"supported" the full range of the PDP-11 family, the design tradeoffs made for the original
and low end of the family were not valid at the
high end. Benchmark competition of FORTRAN on the PDP-11/45 with FP11 became
significant even though the underlying hardware was the fastest available by clear margins.
The reason is easy to understand. The FORTRAN virtual machine and its implementation
did not fully exploit the hardware capability.
To illustrate, consider the execution of the
statement, I = I + 1, as shown in Figure 3. This
statement compiled to five words of threaded
code (not counting the overhead of service or
push routines), and required 18 memory cycles
to execute. In contrast, the single PDP-11 instruction, INC I, would obtain the same effect
with only two words of code and three memory
cycles to execute. Similar overheads existed for
floating-point operations. As shown in Figure
5, the basic arithmetic operators had to copy
their operands from the stack into the FP11 registers to do the operation, and then immediately
return the result to the stack.
On the PDP-11/20, integer execution times of
20 microseconds instead of 4 microseconds did
not matter much when floating-point times
were typically 300 to 1000 microseconds.
However, with FP11 times under 10 microseconds for these operations, the tradeoffs are
much different.
Since the existing compiler was based totally
on the threaded code implementation, a complete new compiler that generated direct PDP-11 code would be needed to fully exploit the
hardware potential. In the meantime, something was needed to immediately improve performance and relieve the competitive pressure.
That something was provided, not by discarding threaded code, but by extending the
FORTRAN virtual machine architecture. The
extension devised was based on a combination
of systematic and ad hoc pragmatic considerations.
The primary considerations were to:

1. Focus attention on operations for integer, real, and double-precision data-types. Logical and complex data-types
   do not occur frequently enough to merit
   much concern [Knuth, 1971].

2. Limit the impact on the compiler to as
   small a portion as possible to limit the
   programming effort. Fortunately, expression handling and assignment statements were well modularized in the
   implementation.
Addressing Modes

The principal concept that formed the basis
of the extended machine was the recognition
that operands could be in any of a number of
locations and that arithmetic operators should
be able to take operands from any of them and
deliver the result to any of them, instead of just
the stack. The principal locations identified
were:
• The stack.
• In memory at an address given as a parameter.
• In memory at an address given in R0 as a
result of an array subscripting operation.
Other "locations" were formalized for particular groups of operators as will be seen later.
Conceptually, these locations became addressing modes associated with each operator.
However, any kind of decoding of addressing
modes during execution would destroy the performance objective. Consequently, each combination of operator and addressing modes was
implemented by a unique threaded service routine.
At this point, a new consideration came into
play. Not only would each routine take some
memory, but the number of global symbols that
must be handled by the linking loader would
rise dramatically. (The system linking loader
maintained its global symbol table in free main
memory; hence, the number of symbols that
could be handled was limited by main memory
size. Fortunately, the minimum system main
memory requirement had independently increased from 8 K words to 12 K words; otherwise, the approach would not have been
acceptable.) The above three modes for each of
three operand locations for each of the four
basic operations for each of the three important
data-types required 3 * 3 * 3 * 4 * 3 or 324 new
service routines. Care would be needed to keep
this explosive cross-product in bounds.
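
The size of this cross-product is easy to confirm by enumeration. The sketch below is illustrative: it assumes the mode letters C (core address given as a parameter), R (address in R0), and S (stack), the operation letters A, S, M, D, and the type letters I, R, D, following the naming convention shown in the document's figures. The function name is hypothetical.

```python
from itertools import product

MODES = "CRS"   # core address parameter, address in R0, stack
OPS = "ASMD"    # add, subtract, multiply, divide
TYPES = "IRD"   # integer, real, double precision

def service_routine_names():
    """Enumerate one name per (left, right, destination, operation, type)."""
    return ["$" + left + right + dest + op + dtype
            for left, right, dest, op, dtype
            in product(MODES, MODES, MODES, OPS, TYPES)]

names = service_routine_names()
count = len(names)
```

Every name is a distinct global symbol for the linking loader to carry, which is why the symbol-table limit mentioned above became a design consideration.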
The memory size increase was offset by the
fact that in many cases the push routines of a
variable were no longer needed. This can be appreciated better by looking at some examples.

The Extended Machine

Figures 6 through 11 detail most of the extended machine and give numerous sample
code sequences.

FORM:   $sbXz, sarg, barg

WHERE   s =     C       If subscript is in memory (core) and directly
                        addressable (i.e., not a parameter or array element)
                R       If subscript is pointed at by R0 at execution time
                S       If subscript on execution stack
                P       If subscript is a parameter
                        If subscript is contents of R0 (i.e., results of
                        function call)

        b =     C       If array is not a parameter
                A       If array is a parameter

        z =     1,2,4,8 The array element size in bytes

        sarg =  Argument address if s = C
                Argument list offset if s = P
                Not present otherwise

        barg =  Array address minus element size if b = C
                Address of array descriptor block (ADB) if b = A

SPECIAL CASES

$CCX0, address
        Is generated when the subscript is a constant and the array is not a FORTRAN
        dummy argument. The final address is computed at compile time and is the argument.

$KAX0, scaled-constant, adb-address
        Is generated when the subscript is a constant and the array is a FORTRAN dummy
        argument; the constant subscript is converted to a byte offset at compile time.

Figure 6. One-dimensional array subscripting instructions.

ASSUME
        SUBROUTINE SUB(A,I)
        DIMENSION A(10), B(10), M(10)

FORTRAN SOURCE          COMPILED CODE

B(J)                    $CCX4,J,B-4
B(I)                    $PCX4,4,B-4
B(5)                    $CCX0,B+20
A(5)                    $KAX0,20,$A.A
B(M(2))                 $CCX0,M+2
                        $RCX4,B-4

NOTE:
$A.A is the address of an array descriptor block for A.

Figure 7. Example of subscripting operations.

FORM:   $lrdot, larg, rarg, darg

WHERE   l =     C       If argument is in memory (core) and directly addressable
                        (i.e., not a parameter or array element)
                R       If argument is pointed to by R0 at execution time
                        (i.e., as the result of a subscripting operation)
                S       If argument is contained on the execution stack (SP)
                D       If D (destination) is C and is the same argument

        r =     C       (As above)
                R       (As above)
                S       (As above)
                K       If argument is integer constant
                1       If argument is integer constant 1
                        (i.e., special case of K)

        d =     C       (As above)
                R       (As above)
                S       If result is to be placed on execution stack

        o =     A       For addition
                S       For subtraction
                M       For multiplication
                D       For division

        t =     I       For integer data
                R       For real data
                D       For double-precision data

        larg, rarg, darg
                Argument address if addressing mode = C
                Constant value if addressing mode = K
                Not present otherwise

Figure 8. Arithmetic instructions.

ASSUME
        DIMENSION L(10)

FORTRAN SOURCE          COMPILED CODE

A = B + C               $CCCAR,B,C,A
A = B + C*D             $CCSMR,C,D
                        $CSCAR,B,A
I = J + 5               $CKCAI,J,5,I
I = I - 5               $DKCSI,5,I
J = J + 1               $D1CAI,J
L(J + 1) = J + 2        $C1SAI,J
                        $SCX2,L-2
                        $CKRAI,J,2
I = L(I) + 2            $CCX2,I,L-2
                        $RKCAI,2,I

Figure 9. Example of arithmetic operations.

Move instructions are two address instructions. Data of
any type may be moved.

FORM:   $sdVt, sarg, darg

WHERE   s =     C       If argument is in memory (core) and directly addressable
                R       If argument address in R0 at execution time
                S       If argument on stack
                G       If argument contained in R0-R3 (as result of function call)
                K       If argument is in core, directly addressable, and an
                        integer constant (i.e., special case of C)
                1       If argument is integer constant 1

        d =     C       (As above)
                R       (As above)

        t =     B       For byte data
                L       For logical data
                I       For integer data
                R       For real data
                D       For double-precision data
                C       For complex data

        sarg, darg
                Argument address if address mode = C
                Constant value if address mode = K
                Not present otherwise

Figure 10. Move instructions.

ASSUME
        DIMENSION ARRAY(10)

FORTRAN SOURCE          COMPILED CODE

A = B                   $CCVR,B,A
I = 1                   $1CVI,I
B = ARRAY(J)            $CCX4,J,ARRAY-4
                        $RCVR,B
ARRAY(1) = ARRAY(I+1)   $C1SAI,I
                        $SCX4,ARRAY-4
                        $GET3
                        $CCX0,ARRAY+0
                        $SRVR

Figure 11. Example of move instructions.

There were three principal groups of extended operations dealing with one-dimensional array subscript calculation, arithmetic
operations, and general data movement. Once
again, naming conventions were used for mnemonic aids. Generally, the first two or three letters (after the "$") designated an addressing
mode, the next letter designated the kind of operation and the final letter designated the data-type. For example, the $ADR routine used in
previous figures acquired the name $SSSAR in
this new scheme.
As an example, consider the FORTRAN
statement:

        I=J+K+L

This would be compiled to:

        $CCSAI,J,K              ; Add J,K and
                                ; put result on stack
        $SCCAI,L,I              ; Add stack,L and
                                ; put result in I

The PDP-11 code for these service routines is:

$CCSAI: MOV     @(R4)+,-(SP)
        ADD     @(R4)+,@SP
        JMP     @(R4)+

$SCCAI: ADD     @(R4)+,@SP
        MOV     (SP)+,@(R4)+
        JMP     @(R4)+

Notice that no push routines are needed for any
of the variables.
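
The two service routines above can be mimicked in Python to show how operand addresses travel inline in the thread instead of through push routines. This is a sketch under stated assumptions: variable names stand in for addresses, the `run` function and its nested helpers are hypothetical, and `next_word` plays the role of the `@(R4)+` reference that consumes the next thread word.

```python
# Toy model of Phase 2 threaded code with inline operand addresses.

def run(thread, memory):
    stack = []
    pc = 0

    def next_word():
        nonlocal pc
        word = thread[pc]
        pc += 1
        return word

    def ccsai():
        # MOV @(R4)+,-(SP) ; ADD @(R4)+,@SP
        stack.append(memory[next_word()])
        stack[-1] += memory[next_word()]

    def sccai():
        # ADD @(R4)+,@SP ; MOV (SP)+,@(R4)+
        stack[-1] += memory[next_word()]
        memory[next_word()] = stack.pop()

    routines = {"$CCSAI": ccsai, "$SCCAI": sccai}
    while pc < len(thread):
        routines[next_word()]()  # JMP @(R4)+
    return stack

memory = {"I": 0, "J": 2, "K": 3, "L": 4}  # sample values
thread = ["$CCSAI", "J", "K", "$SCCAI", "L", "I"]  # I = J + K + L
leftover = run(thread, memory)
```

The whole statement occupies six thread words, and the only routines involved are the two shared service routines, so no per-variable code is generated.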
All subscripting operations resulted in the address of the array element being left in R0 at
execution time. Only one-dimensional arrays
were handled. Two- and three-dimensional arrays continued to be handled as in the more
general Phase 1 implementation.
These forms can occur on both left- and
right-hand sides of assignment statements.
The arithmetic instructions are three address
instructions, taking two arguments and putting
the result in a designated place. These instructions are limited to +, -, *, / on integer, real,
and double-precision data.
Ad Hoc Special Cases

Within this general framework, a number of
additional ad hoc addressing modes were incorporated.
For each of the arithmetic operators and each
of the three data-types, the first operand addressing mode could be given as D to designate
that it was the same as the destination core address and the destination parameter was eliminated. This was not done for the second operand based on the simple observation that programmers will almost always write assignments
as:

        A = A + ...

instead of:

        A = ... + A

This added 12 more service routines.
For the integer operators only, the second
operand could be given as K to designate that it
was a constant given as the parameter instead of
the address of the value. This was not done for
the first operand for reasons similar to the case
above.
For integer add and subtract operators only,
the second operand could be given as 1 to designate that it is the constant value 1 and no parameter is present. This is simply a frequent
special case of the previous use of K.
By combining the above, the FORTRAN
statement:

        K=K+1

is compiled to:

        $D1CAI,K

where the service routine is simply:

$D1CAI: INC     @(R4)+
        JMP     @(R4)+

This code occupies two words and requires
five memory cycles to execute. This is not quite
as good as the two words and three cycles
needed for direct PDP-11 code, but far better
than the five words and 18 cycles required by
the earlier implementation.
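
The way these special cases combine can be sketched as a small selection function for integer assignments of the form dest = left op right. This is not the compiler's actual logic: the function name is hypothetical, and only core-resident variables and integer constants are handled. The results agree with the sample code shown in Figure 9.

```python
# Choose a Phase 2 routine name and parameter list for an integer
# assignment with a core-resident destination.

def select_integer_routine(dest, left, op, right):
    op_letter = {"+": "A", "-": "S", "*": "M", "/": "D"}[op]
    operands = []
    if left == dest:
        left_mode = "D"      # first operand is the destination itself
    else:
        left_mode = "C"
        operands.append(left)
    if isinstance(right, int):
        if right == 1 and op in "+-":
            right_mode = "1"  # constant 1: no parameter at all
        else:
            right_mode = "K"  # constant carried inline as the parameter
            operands.append(right)
    else:
        right_mode = "C"
        operands.append(right)
    operands.append(dest)
    return "$" + left_mode + right_mode + "C" + op_letter + "I", operands

increment = select_integer_routine("K", "K", "+", 1)
add_constant = select_integer_routine("I", "J", "+", 5)
subtract_constant = select_integer_routine("I", "I", "-", 5)
```

Each special case removes one parameter word from the thread, which is how K = K + 1 shrinks to the routine name plus a single address.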
General Results

Execution improvement varied, of course,
with the particular programs used. Over a large
set of programs, the following guidelines were
obtained.
• Programs that were floating-point intensive increased in speed by factors of 1.1
to 1.6, with 1.3 being representative.
• Programs that were integer intensive increased in speed by factors of 1.4 to 2.4,
with 2.0 being representative. (One particularly simple benchmark increased in
speed by a factor of 4!)
Moreover, because of the reduced need for push
routines, most programs increased in size by
less than 10 percent.

The improvement for integer operations was
better than for floating-point operations for
several reasons. Integer operations were more
easily "optimized" because they took place in
the basic CPU general registers. The FP11 has a
separate set of floating-point registers, and
floating-point computations must be performed
only in those registers. Also, the FP11 operates
in either single-precision or double-precision
mode depending on a status bit; the compiler
implementation was not suitable for tracking
the state of this bit and, hence, each floating-point operation continued to bear the overhead
of reestablishing the state as needed by that operation. (This is the purpose of the SETF instruction shown in Figure 5.)
The performance improvements of the Phase
2 system with its extended virtual machine were
obtained with a design, development, and testing effort of about three man-months. For that
effort, PDP-11 FORTRAN regained a strong
competitive position that held reasonably well
until FORTRAN IV-PLUS, an optimizing
PDP-11 code-generating system, replaced it 18
months later (in early 1975).

REAL MICROCODE AND THE FORTRAN
MACHINE

Clearly, the FORTRAN virtual machine described above could be implemented in "real"
microcode instead of the PDP-11 instruction
set. This was considered during the design planning for the PDP-11/60 which features a writable control store microprogramming option
[DEC, 1977a]. But, while the analysis showed
that a significant improvement could be obtained, the result, at best, would be comparable
to the performance already achieved by the
FORTRAN IV-PLUS product. Consequently,
it was not done.
The analysis proceeded along the following
lines. Execution time was considered in three
categories: instruction fetch and decode, operand fetch and/or store, and execution time
proper. Since the analysis is a comparison of
different FORTRAN implementations for a
given machine, the basic execution times are assumed to be the same and neglected. The resulting comparison, thus, shows the number of
words of memory and the number of memory
cycles for each implementation.
For this presentation we shall consider the
following two FORTRAN statements as reasonably representative of FORTRAN as a
whole.

        I=J*K+L
        A(I) = B(J) + 4

For these statements, the size and memory
cycles are easily determined by examination of
the code generated by FORTRAN and FORTRAN IV-PLUS, respectively. These values are
shown in Table 1.

Table 1. Comparison of Size and Time Requirements of Sample Statements with
Different Implementation Techniques

                        I=J*K+L                 A(I) = B(J) + 4
Technique               Size      Time          Size      Time

PDP-11 threads          6 words   20 cycles     9 words   38 cycles
FORTRAN IV-PLUS         8 words   12 cycles     14 words  20 cycles
Micro-threads           6 words   12 cycles     9 words   22 cycles
Model                   7 words   11 cycles     9 words   17 cycles

For the hypothesized micro-thread implementation, the code size is unchanged from
FORTRAN, while the memory cycle count is
reduced by eliminating the instruction fetches
that occur in the service routines. These results
are also shown in the table. Comparison of the
results shows that the micro-thread implementation is faster (as expected), but also that its
speed is no better than that of FORTRAN IV-PLUS. Could this be coincidence or is there reason to believe these results should be obtained?
To answer this, we formulated a simple intuitive model for the expected size and speed of
code on an idealized FORTRAN machine. To
estimate the code size:
• Count one unit for each variable that is
referenced (e.g., A(I) counts as two).
• Count one unit for each operation performed (e.g., assignment or subscripting
are unit operations).
To estimate the memory cycles for execution:
• Count one unit for each variable that is
referenced.
• Count one unit for each operation performed.
• Count one, two, or four units for each
value fetch or store operation depending
on the size of the data.
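
The counting rules above can be written down directly. The sketch below is an interpretation, not the authors' worksheet: the function names are hypothetical, and two choices are assumptions made so that the counts reproduce the Model row of Table 1, namely that the constant 4 counts as a variable reference and that it is fetched as a two-word real value.

```python
# Units fetched or stored per value, by data size in words (assumed mapping).
UNITS = {"integer": 1, "real": 2, "double": 4}

def model_size(references, operations):
    return references + operations

def model_cycles(references, operations, value_types):
    return references + operations + sum(UNITS[t] for t in value_types)

# I = J*K + L: references I, J, K, L; operations *, +, assignment;
# three integer fetches and one integer store.
size_1 = model_size(4, 3)
cycles_1 = model_cycles(4, 3, ["integer"] * 4)

# A(I) = B(J) + 4: references A, I, B, J and the constant (assumed);
# operations: two subscriptings, +, assignment; values moved: I and J
# (integer), B(J), the constant, and A(I) (real, assumed).
size_2 = model_size(5, 4)
cycles_2 = model_cycles(5, 4, ["integer", "integer", "real", "real", "real"])
```

Under these assumptions the model gives 7 words and 11 cycles for the first statement and 9 words and 17 cycles for the second.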
This very simple model is appropriate only
for compilers that produce code based only on
isolated source information, which is true of the
original FORTRAN. Optimizing compilers,
such as FORTRAN IV-PLUS, do better than
suggested by this model by eliminating or simplifying operations (for example, by constant
expression elimination or moving invariant
computations out of loops, and/or by keeping
values in registers instead of main memory, especially across loops). Consequently, the model
serves primarily as a relatively implementation-independent frame of reference for comparing
alternative implementations.

The sizes and cycle counts from this model
for the sample statements are also shown in
Table 1. These values are quite similar to values
for both the micro-thread and FORTRAN IV-PLUS implementations.
We interpreted these results as a clear demonstration that a micro-threaded implementation
could not significantly outperform the existing
FORTRAN IV-PLUS implementation. Further, effort expended for greater performance
would be better directed toward improved optimization in FORTRAN IV-PLUS (which
would benefit existing hardware products) or
toward faster hardware per se.*

*Note that Digital did both. FORTRAN IV-PLUS V2 and the FP11-C were both released in early 1976 with each offering
significant performance improvements.

There is also a broader interpretation of the
results that is worth reflection. The threaded
implementation was designed to be a good
FORTRAN architecture. Yet, when implemented in microcode in a manner comparable
with the host PDP-11 architecture, the performance is close to that achieved by the FORTRAN IV-PLUS compiler and also close to
that of an "ideal" model. One is led to speculate
that the PDP-11 with FP11 is also a good FORTRAN architecture.
ACKNOWLEDGEMENTS

Many individuals contributed to the design,
implementation, and evolution of the PDP-11
FORTRAN product. The following were particularly involved in those aspects described in
this paper. Jim Bell, Dave Knight, and the author participated in the initial design evaluation
that led to the basic virtual machine. Dave was
project leader for the first versions of the product. Rich Grove participated in the support of
the FP11 and FIS options. The extended virtual
machine design and implementation, and the
microcode feasibility analysis were done by the
author. Finally, Craig Mudge assisted in the
preparation of this paper with valuable discussion and criticism, and by not accepting
"no" for an answer.

The Evolution of the PDP-11
C. GORDON BELL and J. CRAIG MUDGE

A computer is not solely determined by its
architecture; it reflects the technological, economic, and organizational aspects of the environment in which it was designed and built. In
the introductory chapters the nonarchitectural
design factors were discussed: the availability
and price of the basic electronic technology, the
various government and industry rules and
standards, the current and future market conditions, and the manufacturing process.
In this chapter one can see the result of the
interaction of these various forces in the evolution of the PDP-11. Twelve distinct models
(LSI-11, PDP-11/04, 11/05, 11/20, 11/34,
11/34C, 11/40, 11/45, 11/55, 11/60, 11/70, and
VAX-11/780) exist in 1978.
The PDP-11 has been successful in the marketplace: over 50,000 were sold in the first eight
years that it was on the market (1970-1977). It
is not clear how rigorous a test (aside from the
marketplace) the design has been given, since a
large and aggressive marketing organization,
armed with software to correct architectural inconsistencies and omissions, can save almost
any design.

Many ideas from the PDP-11 have migrated
to other computers with newer designs. Although some of the features of the PDP-11 are
patented, machines have been made with similar bus and instruction set processor structures.
Many computer designers have adopted a unified data and address bus similar to the Unibus
as their fundamental architectural component.
Many microprocessor designs incorporate the
PDP-11 Unibus notion of mapping I/O and
control registers into the memory address
space, eliminating the need for I/O instructions
without complicating the I/O control logic.
It is the nature of computer engineering to be
goal-oriented, with pressure to produce deliverable products. It is therefore difficult to plan
for an extensive lifetime. Nevertheless, the
PDP-11 evolved rapidly over a much wider
range than expected. An outline of a family
plan was set forth in a memo on April 3, 1969,
by Roger Cady, head of the PDP-11 engineering group at the time (Table 1). The actual evolution is shown in tree form in Figure 1 and is
mapped onto a cost/performance representation in Figure 2.

Table 1. PDP-11 Family Projection as of April 3, 1969

Columns: Model; Processor; Logic Power; Arithmetic Power; Speed (µs); Price ($K); Configuration; Software (Paper Tape, Disk)

Models: 11/10, 11/20, 11/30, 11/40, 11/45, 11/50, 11/55, 11/65

                     Logic  Arithmetic  Speed  Price
Model   Processor    Power  Power       (µs)   ($K)   Configuration

11/40   KB11         2*     10-20       1.2    13     Adds *, /, normalize, etc.; possible
                                                      microprogrammed processor; no EAE
                                                      saves $1,000
11/45   KB11         2*     10-20       1.2    15     11/45 with memory protect/relocate;
                                                      maximum core 262 Kbyte; maximum
                                                      physical memory (using disk) 2^22 bytes
11/50   KC11         2*     50-100      1.2    25     Adds hardware floating point; 32-bit
                                                      processor, 16-bit memory (16 Kbyte)
11/55   KC11         2*     50-100      1.2    27     With memory protect/relocate
11/65   KD11         4      100-200     1.2    45     32-bit separate memory bus, 32-bit
                                                      processor

Entries for the low-end models (11/10, 11/20, 11/30; processor KA11):
Logic and arithmetic power: 0.7, 0.7
Speed and price entries: 2-3, 4, 2.2, 5.2, 9.3
Configurations: Technologically cost reduced 11/20 with MOS; Pc, 1-Kbyte ROM,
128 byte R/W, turnkey console; Pc, 8-Kbyte core, console, TTY

Software entries:
Paper tape: Assembler, editor, math utility; FOCAL, BASIC, ASA BASIC‡ (FORTRAN);
possible 16-Kbyte FORTRAN IV; improved assembler
Disk: 8-like monitor (system builder w/ODT, DDT, PIP)†; FORTRAN IV;
Super monitor**; 65-Kbyte virtual memory/user for either small or large disk

NOTES:
*If microprogrammed, then logical power could be tailored to user and go to 20-50, 40-100 for 11/65.
‡Business language system under consideration. Possible by-product of FOCAL.
**Super monitor for 11/45, 11/55, 11/65 is priority multi-user real-time system.

Figure 1. The PDP-11 Family tree.

Figure 2. PDP-11 models price versus time with lines
of constant performance.

EVALUATION AGAINST THE ORIGINAL
GOALS

In the original 1970 PDP-11 paper (Chapter 9), a set of design goals and constraints was given, beginning with a discussion of the weaknesses frequently found in minicomputers. The designers of the PDP-11 faced each of these known minicomputer weaknesses, and their goals included a solution to each one. This section reviews the original goals, commenting on the success or failure of the PDP-11 in meeting each of them.
The weaknesses of prior designs that were noted were limited addressability, a small number of registers, absence of hardware stack facilities, limited interrupt structures, absence of byte string handling and read-only memory facilities, elementary I/O processing, absence of growth-path family members, and high programming costs.
The first weakness of minicomputers was their limited addressing capability. The biggest (and most common) mistake that can be made in a computer design is that of not providing enough address bits for memory addressing and management. The PDP-11 followed this hallowed tradition of skimping on address bits, but it was saved by the principle that a good design can evolve through at least one major change. For the PDP-11, the limited address problem was solved for the short run, but not with enough finesse to support a large family of minicomputers. That was indeed a costly oversight, resulting in both redundant development and lost sales. It is extremely embarrassing that the PDP-11 had to be redesigned with memory management* only two years after writing the paper that outlined the goal of providing increased address space. All earlier DEC designs suffered from the same problem, and only the PDP-10 evolved over a long period (15 years) before a change occurred to increase its address space. In retrospect, it is clear that another address bit is required every two or three years, since memory prices decline about 30 percent yearly, and users tend to buy constant price successor systems.

*The memory management served two other functions besides expanding the 16-bit processor-generated addresses into 18-bit Unibus addresses: program relocation and protection.
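
The rule of thumb about address bits can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that a buyer spends a constant amount, that price per byte falls by a fixed fraction each year, and that one more address bit is needed each time the affordable memory doubles. The function name is hypothetical.

```python
import math

def years_per_address_bit(annual_price_decline):
    """Years until a constant budget buys twice the memory.

    Doubling the memory is taken here to mean one more address bit.
    annual_price_decline is a fraction, e.g. 0.30 for 30 percent.
    """
    price_ratio_per_year = 1.0 - annual_price_decline
    return math.log(2.0) / -math.log(price_ratio_per_year)

if __name__ == "__main__":
    years = years_per_address_bit(0.30)
    print(f"{years:.2f} years per added address bit")
```

Under these assumptions a 30 percent yearly decline gives a doubling time of a little under two years, which sits at the fast end of the "two or three years" quoted in the text.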
A second weakness of minicomputers was their tendency to skimp on registers. This was corrected for the PDP-11 by providing eight 16-bit registers. Later, six 64-bit registers were added as the accumulators for floating-point arithmetic. This number seems to be adequate: there are enough registers to allocate two or three registers (beyond those already dedicated to program counter and stack pointer) for program global purposes and still have registers for local statement computation.* More registers would increase the context switch time and worsen the register allocation problem for the user.
A third weakness of minicomputers was their lack of hardware stack capability. In the PDP-11, this was solved with the autoincrement/autodecrement addressing mechanism. This solution is unique to the PDP-11, has proved to be exceptionally useful, and has been copied by other designers. The stack limit check, however, has not been widely used by DEC operating systems.
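
How autodecrement and autoincrement addressing yield a stack can be sketched with a toy model. The class below is a hypothetical illustration, not a description of any implementation: it assumes a stack of 16-bit words growing toward lower addresses, a push that behaves like a move to -(SP), and a pop that behaves like a move from (SP)+. The stack limit check mentioned above is not modeled.

```python
class TinyStackMachine:
    """Toy model of a stack built from autodecrement/autoincrement addressing."""

    def __init__(self, memory_bytes=64):
        self.mem = bytearray(memory_bytes)
        self.sp = memory_bytes  # empty stack: SP points just past the top of memory

    def push(self, word):
        # Autodecrement: SP is decremented by the operand size first,
        # then used as the address of the store.
        self.sp -= 2
        self.mem[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")

    def pop(self):
        # Autoincrement: SP is used as the address of the load,
        # then incremented by the operand size.
        word = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")
        self.sp += 2
        return word

if __name__ == "__main__":
    m = TinyStackMachine()
    m.push(0o123)
    m.push(0o456)
    print(oct(m.pop()), oct(m.pop()), m.sp)
```

Because the same two addressing modes work with any general register, the same pattern gives any number of independent stacks or queues without dedicated stack instructions.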
A fourth weakness, limited interrupt capability and slow context switching, was essentially solved by the Unibus interrupt vector design. The basic mechanism is very fast, requiring only four memory cycles from the time an interrupt request is issued until the first instruction of the interrupt routine begins execution. Implementations could go further and save the general registers, for example, in memory or in special registers. This was not specified in the architecture and has not been done in any of the implementations to date. VAX-11 provides explicit load and save process context instructions.
A fifth weakness of earlier minicomputers, inadequate character handling capability, was met in the PDP-11 by providing direct byte addressing capability. String instructions were not provided in the hardware, but the common string operations (move, compare, concatenate) could be programmed with very short loops. Early benchmarks showed that this mechanism was adequate. However, as COBOL compilers have improved and as more understanding of operating systems string handling has been obtained, a need for a string instruction set was felt, and in 1977 such a set was added.
A sixth weakness, the inability to use read-only memories as primary memory, was avoided in the PDP-11. Most code written for the PDP-11 tends to be reentrant without special effort by the programmer, allowing a read-only memory (ROM) to be used directly. Read-only memories are used extensively for bootstrap loaders, program debuggers, and for simple functions. Because large read-only memories were not available at the time of the original design, there are no architectural components designed specifically with large ROMs in mind.
A seventh weakness, one common to many minicomputers, was primitive I/O capabilities. The PDP-11 answers this to a certain extent with its improved interrupt structure, but the completely general solution of I/O computers has not yet been implemented. The I/O processor concept is used extensively in display processors, in communication processors, and in signal processing. Having a single machine instruction that transmits a block of data at the interrupt level would decrease the central processor overhead per character by a factor of 3; it should have been added to the PDP-11 instruction set for implementation on all machines. Provision was made in the 11/60 for invocation of a micro-level interrupt service routine in writable control store (WCS), but the family architecture is yet to be extended in this direction.

*Since dedicated registers are used for each Commercial Instruction Set (CIS) instruction, this was no longer true when CIS was added.
Another common minicomputer weakness was the lack of system range. If a user had a system running on a minicomputer and wanted to expand it or produce a cheaper turnkey version, he frequently had no recourse, since there were often no larger and smaller models with the same architecture. The PDP-11 has been very successful in meeting this goal.
A ninth weakness of minicomputers was the high cost of programming caused by programming in lower level languages. Many users programmed in assembly language, without the comfortable environment of high-level languages, editors, file systems, and debuggers available on bigger systems. The PDP-11 does not seem to have overcome this weakness, although it appears that more complex systems are being successfully built with the PDP-11 than with its predecessors, the PDP-8 and the PDP-15. Some systems programming is done using higher level languages; however, the optimizing compiler for BLISS-11 at first ran only on the PDP-10. The use of BLISS has been slowly gaining acceptance. It was first used in implementing the FORTRAN-IV PLUS (optimizing) compiler. Its use in PDP-10 and VAX-11 systems programming has been more widespread.
One design constraint that turned out to be expensive, but worth it in the long run, was the necessity for the word length to be a multiple of eight bits. Previous DEC designs were oriented toward 6-bit characters, and DEC had a large investment in 12-, 18-, and 36-bit systems, as described in Parts II and V.
Microprogrammability was not an explicit design goal, partially because fast, large, and inexpensive read-only memories were not available at the time of the first implementation. All subsequent machines have been microprogrammed, but with some difficulty because some parts of the instruction set processor, such as condition code setting and instruction register decoding, are not ideally matched to microprogramming.
The design goal of understandability seems to have received little attention. The PDP-11 was initially a hard machine to understand and was marketable only to those with extensive computer experience. The first programmers' handbook was not very helpful. It is still unclear whether a user without programming experience can learn the machine solely from the handbook. Fortunately, several computer science textbooks [Gear, 1974; Eckhouse, 1975; Stone and Siewiorek, 1975] and other training books have been written based on the PDP-11.
Structural flexibility (modularity) for hardware configurations was an important goal. This succeeded beyond expectations and is discussed extensively in the Unibus Cost and Performance section.

EVOLUTION OF THE INSTRUCTION SET
PROCESSOR

Designing the instruction set processor level of a machine - that collection of characteristics such as the set of data operators, addressing modes, trap and interrupt sequences, register organization, and other features visible to a programmer of the bare machine - is an extremely difficult problem. One has to consider the performance (and price) ranges of the machine family as well as the intended applications, and difficult tradeoffs must be made. For example, a wide performance range argues for different encodings over the range; for small systems a byte-oriented approach with small addresses is optimal, whereas larger systems require more operation codes, more registers, and larger addresses. Thus, for larger machines, instruction coding efficiency can be traded for performance.
The PDP-11 was originally conceived as a small machine, but over time its range was gradually extended so that there is now a factor of 500 in price ($500 to $250,000) and memory size (8 Kbytes to 4 Mbytes*) between the smallest and largest models. This range compares favorably with the range of the IBM System 360 family (16 Kbytes to 4 Mbytes). Needless to say, a number of problems have arisen as the basic design was extended.

Chronology of the Extensions

A chronology of the extensions is given in Table 2. Two major extensions, the memory management and the floating point, occurred with the 11/45. The most recent extension is the Commercial Instruction Set, which was defined to enhance performance for the character string and decimal arithmetic data-types of the commercial languages (e.g., COBOL). It introduced the following to the PDP-11 architecture:

1. Data-types representing character sets, character strings, packed decimal strings, and zoned decimal strings.
2. Strings of variable length up to 65 Kcharacters.
3. Instructions for processing character strings in each data-type (move, add, subtract, multiply, divide).
4. Instructions for converting among binary integers, packed decimal strings, and zoned decimal strings.
5. Instructions to move the descriptors for the variable length strings.

*Although 22 bits are used, only 2 megabytes can be utilized in the 11/70.

Table 2. Chronology of PDP-11 Instruction Set Processor (ISP) Evolution

Model(s)                            Evolution
11/20                               Base ISP (16-bit virtual address) and PMS (16-bit processor physical memory address); Unibus with 18-bit addressing
11/20                               Extended Arithmetic Element (hardware multiply/divide)
11/45 (11/55, 11/70, 11/60, 11/34)  Floating-point instruction set with 6 additional registers (46 instructions) in the Floating-Point Processor
11/45 (11/55, 11/70)                Memory management (KT11C). 3 modes of protection (Kernel, Supervisor, User); 18-bit processor physical addressing; 16-bit virtual addressing in 8 segments for both instruction and data spaces
11/45 (11/55, 11/70)                Extensions for second set of general registers and program interrupt request
11/40 (11/03)                       Extended Instruction Set for multiply/divide; floating-point instruction set (4 instructions)
11/40 (11/34, 11/60)                Memory management (KT11D). 2 modes of protection (Kernel, User); 18-bit processor physical addressing; 16-bit virtual addressing in 8 segments
11/70                               22-bit processor physical addressing; Unibus map for peripheral controller 22-bit addressing
11/70 (11/60)                       Error register accessibility for on-line diagnosis and retry (e.g., cache parity error)
11/03 (11/04, 11/34)                Program access to processor status register via explicit instruction (versus Unibus address)
11/03                               One level program interrupt
11/60                               Extended Function Code for invocation of user-written microcode
VAX-11/780                          VAX architectural extensions for 32-bit virtual addressing; VAX ISP
11/03                               Commercial Instruction Set (CIS)
11/70mP                             Interprocessor Interrupt and System Timers for multiprocessor

The initial design did not have enough operation code space to accommodate instructions for new data-types. Ideally, the complete set of operation codes should have been specified at initial design time so that extensions would fit. With this approach, the uninterpreted operation codes could have been used to call the various operation functions, such as a floating-point addition. This would have avoided the proliferation of run-time support systems for the various hardware/software floating-point arithmetic methods (Extended Arithmetic Element, Extended Instruction Set, Floating Instruction Set, Floating-Point Processor). The extracode technique was used in the Atlas and Scientific Data Systems (SDS) designs, but these techniques are overlooked by most computer designers. Because the complete instruction set processor (or at least an extension framework) was unspecified in the initial design, completeness and orthogonality have been sacrificed.
At the time the PDP-11/45 was designed, several operation code extension schemes were examined: an escape mode to add the floating-point operations, bringing the PDP-11 back to being a more conventional general register machine by reducing the number of addressing modes, and finally, typing the data by adding a global mode that could be switched to select floating point instead of byte operations for the same operation codes. The floating-point instruction set, introduced with the 11/45, is a version of the second alternative.
It also became necessary to do something about the small address space of the processor. The Unibus limits the physical memory to the 262,144 bytes addressable by 18 bits. In the PDP-11/70, the physical address was extended to 4 Mbytes by providing a Unibus map so that devices in a 256-Kbyte Unibus space could transfer into the 4-Mbyte space via mapping registers. While the physical address limits are acceptable for both the Unibus and larger systems, the address for a single program is still confined to an instantaneous space of 16 bits, the user virtual address. The main method of dealing with relatively small addresses is via process-oriented operating systems that handle many small tasks. This is a trend in operating systems, especially for process control and transaction processing. It does, however, enforce a structuring discipline in (user) program organization. The RSX-11 series of operating systems for the PDP-11 are organized this way, and the need for large addresses is lessened.
The initial memory management proposal to
extend the virtual memory was predicated on
dynamic, rather than static, assignment of
memory segment registers. In the current memory management scheme, the address registers
are usually considered to be static for a task (although some operating systems provide functions to get additional segments dynamically).
With dynamic assignment, a user can address
a number of segment names, via a table, and
directly load the appropriate segment registers.
The segment registers act to concatenate additional address bits in a base address fashion.
There have been other schemes proposed that
extend the addresses by extending the length of
the general registers - of course, extended addresses propagate throughout the design and include double length address variables. In effect,
the extended part is loaded with a base address.
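
The base-address behavior of the segment registers can be sketched as follows. This is a simplified illustration rather than a description of the KT11 hardware: it takes from Table 2 only the 16-bit virtual address, the 8 segments, and the 18-bit physical address. The 64-byte granularity of a base register, the function name, and the example register contents are assumptions made for the sketch, and length and protection checks are omitted.

```python
SEGMENTS = 8                  # Table 2: 16-bit virtual addressing in 8 segments
OFFSET_BITS = 16 - 3          # 3 high-order bits select the segment register
BASE_UNIT = 64                # assumed: a base register counts 64-byte blocks
PHYSICAL_MASK = (1 << 18) - 1 # 18-bit processor physical address

def translate(virtual_address, segment_bases):
    """Map a 16-bit virtual address to an 18-bit physical address."""
    segment = (virtual_address >> OFFSET_BITS) & (SEGMENTS - 1)
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    return (segment_bases[segment] * BASE_UNIT + offset) & PHYSICAL_MASK

if __name__ == "__main__":
    bases = [0] * SEGMENTS
    bases[1] = 0o1000   # hypothetical: segment 1 based at physical 0o100000
    bases[7] = 0o7600   # hypothetical: segment 7 based at physical 0o760000
    print(oct(translate(0o20004, bases)))
    print(oct(translate(0o160000, bases)))
```

With static assignment the operating system loads `segment_bases` once per task; the dynamic scheme described above would let a program reload individual entries from a table of named segments while it runs.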
With larger machines and process-oriented operating systems, the context switching time becomes an important performance factor. By providing additional registers for more processes, the time (overhead) to switch context from one process (task) to another can be reduced. This option has not been used in the operating system implementations of the PDP-11s to date, although the 11/45 extensions included a second set of general registers. Various alternatives have been suggested, and to accomplish this effectively requires additional operators to handle the many aspects of process scheduling. This extension appears to be relatively unimportant since the range of computers coupled with networks tends to alleviate the need by increasing the real parallelism (as opposed to the apparent parallelism) by having various independent processors work on the separate processes in parallel. The extension of the PDP-11 for better control of I/O devices is clearly more important in terms of improved performance.
Architecture Management

In retrospect, many of the problems associated with PDP-11 evolution were due to the lack of an ongoing architecture management function. As can be seen from Table 1, the notion of planned evolution was very strong at the beginning. However, a formal architecture control function was not set up until early in 1974. In some sense this was already too late - the four PDP-11 models designed by that date (11/20, 11/05, 11/40, 11/45) had incompatibilities between them. The architecture control function since then has ensured that no further divergence (except in the LSI-11) took place in subsequent models, and in fact resulted in some convergence. At the time the Commercial Instruction Set was added, an architecture extension framework was adopted. Insufficient encodings existed to provide a large number of additional instructions using the same encoding style (in the same space) as the basic PDP-11, i.e., the operation code and operand addressing mode specifiers within a single 16-bit word. An instruction extension framework was adopted which utilized a full word as the opcode, with operand addressing mode specifiers in succeeding instruction stream words along the lines of VAX-11. This architectural extension permits 512 additional opcodes, and instructions may have an unlimited number of operand addressing mode specifiers. The architecture control function also had to deal with the Unibus address space problem.
With VAX-11, architecture management has been in place since the beginning. A definition of the architecture was placed under formal change control well before the VAX-11/780 was built, and both hardware and software engineering groups worked with the same document. Another significant difference is that an extension framework was defined in the original architecture.
An Evaluation

The criteria used to decide whether or not to include a particular capability in an instruction set are highly variable and border on the artistic.* Critics ask that the machine appear elegant, where elegance is a combined quality of instruction formats relating to mnemonic significance, operator/data-type completeness and orthogonality, and addressing consistency. Having completely general facilities (e.g., registers) which are not context dependent assists in minimizing the number of instruction types and in increasing understandability (and usefulness). The authors feel that the PDP-11 has provided this.

*Today one would use the S, M, and R measures and methodology defined in Appendix 3.

At the time the Unibus was designed, it was felt that allowing 4 Kbytes of the address space for I/O control registers was more than enough. However, so many different devices have been interfaced to the bus over the years that it is no longer possible to assign unique addresses to every device. The architectural group has thus been saddled with the chore of device address bookkeeping. Many solutions have been proposed, but none was soon enough; as a result, they are all so costly that it is cheaper just to live with the problem and the attendant inconvenience.
Techniques for generating code by the human and compiler vary widely and thus affect instruction set processor design. The PDP-11 provides more addressing modes than nearly any other computer. The eight modes for source and destination with dyadic operators provide what amounts to 64 possible ADD instructions. By associating the Program Counter and Stack Pointer registers with the modes, even more data accessing methods are provided. For example, 18 varieties of the MOVE instruction can be distinguished as the machine is used in two-address, general register, and stack machine program forms. (There is a price for this generality - namely, fewer bits could have been used to encode the address modes that are actually used most of the time.)
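
The count of 64 ADD forms follows directly from pairing eight source modes with eight destination modes. The sketch below enumerates the pairs and packs one instruction word. The field layout (a 4-bit operation code followed by two 6-bit operand specifiers, each a 3-bit mode and a 3-bit register), the octal operation code 06 for ADD, and the mode names are taken from the standard PDP-11 instruction format rather than from this chapter, and should be read as assumptions of the sketch.

```python
from itertools import product

MODES = [
    "register", "register deferred",
    "autoincrement", "autoincrement deferred",
    "autodecrement", "autodecrement deferred",
    "index", "index deferred",
]
ADD_OPCODE = 0o06  # assumed 4-bit operation code for ADD

def encode_add(src_mode, src_reg, dst_mode, dst_reg):
    """Pack opcode, source specifier, and destination specifier into 16 bits."""
    return ((ADD_OPCODE << 12) | (src_mode << 9) | (src_reg << 6)
            | (dst_mode << 3) | dst_reg)

# Every (source mode, destination mode) pair is a distinct form of ADD.
add_forms = list(product(range(len(MODES)), repeat=2))

if __name__ == "__main__":
    print(len(add_forms), "source/destination mode pairs")
    print(f"{encode_add(0, 0, 0, 1):06o}")  # register-to-register form
```

The parenthetical remark about the price of generality is visible in the layout: 12 of the 16 bits go to operand specifiers, whichever modes a program actually uses.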
How the PDP-11 Is Used

In general, the PDP-11 has been used mostly as a general register (i.e., memory to registers) machine. This can be seen by observing the use frequency from Strecker's data (Chapter 14). In one case, it was observed that a user who previously used a one-accumulator computer (e.g., PDP-8) continued to do so. A general register machine provides the greatest performance, and the cost (in terms of bits) is the same as when used as a stack machine. Some compilers, particularly the early ones, are stack oriented since the code production is easier. In principle, and with much care, a fast stack machine could be constructed. However, since most stack machines use primary memory for the stack, there is a loss of performance even if the top of the stack is cached. While a stack is the natural (and necessary) structure to interpret the nested block structure languages, it does not necessarily follow that the interpretation of all statements should occur in the context of the stack. In particular, most register transfer statements are of the simple 2- and 3-address forms:

D ← S

and

D1(index 1) ← f(S2(index 2), S3(index 3)).

These do not require the stack organization. In effect, appropriate assignment allows a general register machine to be used as a stack machine for most cases of expression evaluation. This has the advantage of providing temporary, random access to common subexpressions, a capability that is usually hard to exploit in stack architectures.

THE EVOLUTION OF THE PMS
(MODULAR) STRUCTURE

The end product of the PDP-11 design is the computer itself, and in the evolution of the architecture one can see images of the evolution of ideas. In this section, the architectural evolution is outlined, with a special emphasis on the Unibus.
The Unibus is the architectural component that connects together all of the other major components. It is the vehicle over which data flow between pairs of components takes place. Its structure is described in Chapter 11.
In general, the Unibus has met all expectations. Several hundred types of memories and peripherals have been interfaced to it; it has become a standard architectural component of systems in the $3K to $100K price range (1975). The Unibus does limit the performance of the fastest machines and penalizes the lower performance machines with a higher cost. Recently it has become clear that the Unibus is adequate for large, high performance systems when a cache structure is used, because the cache reduces the traffic between primary memory and the central processor: only about one-tenth of the memory references are outside the cache. For still larger systems, supplementary buses were added for central processor to primary memory and primary memory to secondary memory traffic. For very small systems like the LSI-11, a narrower bus was designed.
The Unibus, as a standard, has provided an architectural component for easily configuring systems. Any company, not just DEC, can easily build components that interface to the bus. Good buses make good engineering neighbors, since people can concentrate on structured design. Indeed, the Unibus has created a secondary industry providing alternative sources of supply for memories and peripherals. With the exception of the IBM 360 Multiplexer/Selector Bus, the Unibus is the most widely used computer interconnection standard.
The Unibus has also turned out to be invaluable as an "umbilical cord" for factory diagnostic and checkout procedures. Although such a capability was not part of the original design, the Unibus is almost capable of controlling the system components (e.g., processor and memory) during factory checkout. Ideally, the scheme would let all registers be accessed during full operation. This is possible for all devices except the processor. By having all central processor registers available for reading and writing in the same way that they are available from the console switches, a second system can fully monitor the computer under test.
In most recent PDP-11 models, a serial communications line, called the ASCII Console, is connected to the console, so that a program may remotely examine or change any information that a human operator could examine or change from the front panel, even when the system is not running. In this way computers can be diagnosed from a remote site.

Difficulties with the Design

The Unibus design is not without problems.
Although two of the bus bits were set aside in
the original design as parity bits, they have not
been widely used as such. Memory parity was
implemented directly in the memory; this phenomenon is a good example of the sorts of
problems encountered in engineering optimization. The trading of bus parity for memory parity exchanged higher hardware cost and
decreased performance for decreased service
cost and better data integrity. Because engineers
are usually judged on how well they achieve
production cost goals, parity transmission is an
obvious choice to pare from a design, since it
increases the cost and decreases the performance. As logic costs decrease and pressure to include warranty costs as part of the product
design cost increases, the decision to transmit
parity may be reconsidered.
Early attempts to build tightly coupled multiprocessor or multicomputer structures (by mapping the address space of one Unibus onto the
memory of another), called Unibus windows,
were beset with a logic deadlock problem. The
Unibus design does not allow more than one
master at a time. Successful multiprocessors required much more sophisticated sharing mechanisms such as shared primary memory.
Unibus Cost and Performance

Although performance is always a design goal, so is low cost; the two goals conflict directly. The Unibus has turned out to be nearly optimum over a wide range of products. It served as an adequate memory-processor interconnect for six of the ten models. However, in the smallest system, DEC introduced the LSI-11 Bus, which uses about half the number of conductors. For the largest systems, a separate 32-bit data path is used between processor and memory, although the Unibus is still used for communication with the majority of the I/O controllers (the slower ones). Figure 1 summarizes the evolution of memory-processor interconnections in the LSI-11 Family. Levy (Chapter 11) discusses the evolution in more detail.
The bandwidth of the Unibus is approximately 1.7 megabytes per second or 850 K transfers/second. Only for the largest configurations, using many I/O devices with very high data rates, is this capacity exceeded. For most configurations, the demand put on an I/O bus is limited by the rotational delay and head positioning of disks and the rate at which programs (user and system) issue I/O requests.
An experiment to further the understanding of Unibus capacity and the demand placed against it was carried out. The experiment used a synthetic workload; like all synthetic workloads, it can be challenged as not being representative. However, it was generally agreed that it was a heavy I/O load. The load simulated transaction processing, swapping, and background computing in the configuration shown in Figure 3. The load was run on five systems, each placing a different demand on the Unibus. Each run produced two numbers: (1) the time to complete 2,000 transactions, and (2) the number of iterations of a program called HANOI that were completed.

Figure 3. The synthetic workload used to measure Unibus capacity. [Diagram; recoverable labels: background computation (HANOI benchmark looping); transaction processing No. 1 and No. 2, 1,000 transactions each, each transaction involving 8 reads and 2 writes (a total of 4,064 words per transaction) and 12 ms of processing; swapping every 100 ms (one 15K write, one 10K read, one 15K read); RSX-11M executive, with MCR task SHF loaded from the RK05 every 100 ms; devices RP04, RS03, and RK05.]

System                 Benchmark Time (minutes)*   Number of HANOI Iterations
1. 11/60 cache on      15                          12
2. 11/60 cache off     15                          2
3. 11/40               15                          3
4. 11/70 MBCBUS        15                          23
5. 11/70 Unibus        26                          38

*2,000 transactions plus swapping plus HANOI.

The results were interpreted as follows:

1. I/O throughput. For this workload the Unibus bandwidth was adequate. For systems 1 through 4 the I/O activity took the same amount of time.
2. 11/70 Unibus. The run on this system (no use was made of the 32-bit wide processor/memory bus) took longer because of the retries caused by data lates (approximately 19,000) on the moving head disk (RP04). The extra time taken for the benchmark allowed more iterations of HANOI to occur. The PDP-11/70 Unibus had a bandwidth of about 1 megabyte per second. It was less than the usual Unibus (about 1.7 megabytes per second) because of the map delay (100 nanoseconds), the cache cycle (240 nanoseconds), and the main memory bus redriving and synchronization.
3. 11/60 Cache. Systems 1 and 2 clearly show the effectiveness of a cache. Most memory references for HANOI were to the cache and did not involve the Unibus, which was the PDP-11/60's I/O Bus. Systems 2 and 3 were essentially equivalent, as expected. There are two reasons for the 11/40 having slightly more compute bandwidth than an 11/60 with its cache off. First, the 11/40 memory is faster than the 11/60 backing store, and second, the 11/40 processor relinquishes the Unibus for a direct memory access cycle; the 11/60 processor must request the Unibus for a processor cycle.

There are several attributes of a bus that affect its cost and performance. One factor affecting performance is simply the data rate of a
single conductor. There is a direct tradeoff involving cost, performance, and reliability.
Shannon [1948] gives a relationship between the
fundamental signal bandwidth of a link and the
error rate (signal-to-noise ratio) and data rate.
The performance and cost of a bus are also affected by its length. Longer cables cost proportionately more, since they require more
complex circuitry to drive the bus.
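
The relationship attributed to Shannon is not written out in the text. It is commonly stated as the Shannon-Hartley capacity formula, C = B log2(1 + S/N), and the sketch below assumes that form. The function name and the example figures are hypothetical and do not describe any Unibus conductor.

```python
import math

def channel_capacity(bandwidth_hz, signal_to_noise):
    """Upper bound on the error-free data rate of one link, in bits/sec.

    signal_to_noise is a power ratio (not decibels).
    """
    return bandwidth_hz * math.log2(1.0 + signal_to_noise)

if __name__ == "__main__":
    # Hypothetical link: 1 MHz of bandwidth at two different noise levels.
    for snr in (3.0, 15.0):
        print(snr, channel_capacity(1e6, snr))
```

Read this way, the tradeoff in the paragraph above is between spending on a cleaner or wider-band conductor (cost), running closer to the limit (performance), and leaving margin below it (reliability).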
Since a single-conductor link has a fixed data
rate, the number of conductors affects the net
speed of a bus. However, the cost of a bus is
directly proportional to the number of conductors. For a given number of wires, time domain multiplexing and data encoding can be
used to trade performance and logic complexity. Since logic technology is advancing faster than wiring technology, it seems likely that
fewer conductors will be used in all future systems, except where the performance penalty of
time domain multiplexing is unacceptably
great.
If, during the original design of the Unibus,
DEC designers could have foreseen the wide
range of applications to which it would be applied, its design would have been different. Individual controllers might have been reduced in
complexity by more central control. For the
largest and smallest systems, it would have been
useful to have a bus that could be contracted or
expanded by multiplexing or expanding the
number of conductors.
The cost-effectiveness of the Unibus is due in
large part to the high correlation between memory size, number of address bits, I/O traffic,
and processor speed. Gene Amdahl's rule of
thumb for IBM computers is that 1 byte of
memory and 1 byte/sec of I/O are required for
each instruction/sec. For traditional DEC applications, with their emphasis on scientific and
control applications, there is more computation
required per memory word. Further, the PDP-11 instruction sets do not contain the extensive
commercial instructions (character strings) typical of IBM computers, so a larger number of
instructions must be executed to accomplish the
same task. Hence, for DEC computers, it is better to assume 1 byte of memory for each 2 instructions/sec, and that 1 byte/sec of I/O
occurs for each instruction/sec.
In the PDP-11, an average instruction accesses 3-5 bytes of memory, so assuming 1 byte
of I/O for each instruction/sec, there are 4-6
bytes of memory accessed on the average for
each instruction/sec. Therefore, a bus that can
support 2 megabytes/sec of traffic permits instruction execution rates of 0.33-0.5 mega-instructions/sec. This implies memory sizes of
0.16-0.25 megabytes, which matches well with
the maximum allowable memory of 0.064-0.256
megabytes. By using a cache memory on the
processor, the effective memory processor rate
can be increased to balance the system further.
If fast floating-point instructions were added to
the instruction set, the balance might approach
that used by IBM and thereby require more
memory (an effect seen in the PDP-11/70).
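The balance arithmetic in this paragraph can be checked with a few lines of Python. This is a sketch of the rule of thumb as stated in the text, with the function and parameter names chosen here for illustration; the 2 megabyte/sec bus rate and the 3-5 byte range are the figures quoted above.

```python
from typing import Tuple


def bus_balance(bus_bytes_per_sec: float,
                mem_bytes_per_instr: float,
                io_bytes_per_instr: float = 1.0,
                mem_bytes_per_ips: float = 0.5) -> Tuple[float, float]:
    """Return (instructions/sec, balanced memory size in bytes).

    Defaults follow the DEC rule of thumb in the text: 1 byte of I/O per
    instruction and 1 byte of memory per 2 instructions/sec.
    """
    bytes_per_instr = mem_bytes_per_instr + io_bytes_per_instr
    instr_per_sec = bus_bytes_per_sec / bytes_per_instr
    return instr_per_sec, instr_per_sec * mem_bytes_per_ips


if __name__ == "__main__":
    for mem_bytes in (3.0, 5.0):  # the 3-5 byte range quoted in the text
        ips, mem = bus_balance(2_000_000, mem_bytes)
        print(f"{mem_bytes:.0f} bytes/instr: {ips / 1e6:.2f} M instr/sec, "
              f"{mem / 1e6:.2f} megabytes")
```

The low end of the memory range is one-sixth of a megabyte, which rounds to 0.17; the text quotes it as 0.16.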
The task of I/O is to provide for the transfer
of data from peripheral to primary memory
where it can be operated on by a program in a
processor. The peripherals are generally slow,
inherently asynchronous, and more error-prone
than the processors to which they are attached.
Historically, I/O transfer mechanisms have
evolved through the following four stages:
1.

Direct sequential I/O under central processor control. An instruction in the processor causes a data transfer to take
place with a device. The processor does
not resume operation until the transfer is
complete. Typically, the device control
may share the logic of the processor. The
first input/output transfer (IOT) instruction in the PDP-1 is an example; the IOT
effects a transfer between the Accumulator and a selected device. Direct I/O
simplifies programming because every
operation is sequential.

2.

Fixed buffer, 1-instruction controllers. An
instruction in the central processor
causes a data transfer (of a word or vector), but in this case, it is to a buffer of
the simple controller and thus at a speed
matching that of the processor. After the
high speed transfer has occurred, the
processor continues while an asynchronous, slower transfer occurs between the
buffer and the device. Communication
back to the processor is via the program
interrupt mechanism. A single instruction to a simple controller can also cause
a complete block (vector) of data to be
transmitted between memory and the peripheral. In this case, the transfer takes
place via the direct memory access
(DMA) link.

3.

Separate I/O processors - the channel.
An independent I/O processor with a
unique ISP controls the flow of data between primary memory and the peripheral. The structure is that of the
multiprocessor, and the I/O control program for the device is held in primary
memory. The central processor informs
the I/O processor about the I/O program location.

4.

I/O computer. This mechanism is also
asynchronous with the central processor,
but the I/O computer has a private
memory which holds the I/O program.
Recently, DEC communications options
have been built with embedded control
programs. The first example of an I/O
computer was in the CDC 6600 (1964).
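The difference between the first two stages can be sketched as a simple accounting of how long the central processor is held up. This is a toy model under assumed figures, not a measurement of any PDP machine: the function names and all microsecond values are hypothetical.

```python
def direct_io_stall_us(words: int, device_us_per_word: int) -> int:
    """Stage 1: the processor waits for the device on every word."""
    return words * device_us_per_word


def buffered_io_stall_us(words: int, buffer_us_per_word: int,
                         interrupt_us: int) -> int:
    """Stage 2: the processor pays only for the fast transfer to the
    controller buffer plus servicing one completion interrupt per word."""
    return words * (buffer_us_per_word + interrupt_us)


if __name__ == "__main__":
    # Hypothetical device: 1000 us per word; buffer write 2 us; interrupt 20 us.
    print("direct:  ", direct_io_stall_us(100, 1000), "us of processor time")
    print("buffered:", buffered_io_stall_us(100, 2, 20), "us of processor time")
```

In stages 3 and 4, more of the remaining per-transfer work would move from the central processor to the I/O processor or I/O computer.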

The authors believe that the single-instruction controller is superior to the I/O processor
as embodied in the IBM Channel mainly because the latter concept has not gone far
enough. Channels are costly to implement, sufficiently complex to require their own programming environment, and yet not quite powerful
enough to assume the processing, such as file
management, that one would like to offload
from the processor. Although the I/O traffic
does require central processor resources, the addition of a second, general purpose central processor is more cost-effective than using a central
processor-I/O processor or central processor-multiple I/O processor structure. Future I/O
systems will be message-oriented, and the various I/O control functions (including diagnostics and file management) will migrate to the
subsystem. When the I/O computer is an exact
duplicate of the central processor, not only is
there an economy from the reduced number of
part types but also the same programming environment can be used for I/O software development and main program development.
Notice that the I/O computer must implement
precisely the same set of functions as the processor doing direct I/O.*

MULTIPROCESSORS

It is not surprising that multiprocessors are
used only in highly specialized applications
such as those requiring high reliability or high
availability. One way to extend the range of a
family and also provide more performance alternatives with fewer basic components is to
build multiprocessors. In this section some factors affecting the design and implementation of
multiprocessors, and their effect on the PDP-11, are examined.
It is the nature of engineering to be conservative. Given that there are already a number of
risks involved in bringing a product to the market, it is not clear why one should build a higher
risk structure that may require a new way of
programming. What has resulted is a sort of
deadlock situation: people cannot learn how to
program multiprocessors and employ them in a
single task until such machines exist, but manufacturers will not build the machine until they
are sure that there will be a demand for it, i.e.,
that the programs will be ready.

*The I/O computer is yet another example of the wheel of reincarnation of display processors (see Chapter 7).

There is little or no market for multiprocessors even though there is a need for increased reliability and availability of machines.
IBM has not promoted multiprocessors in the
marketplace, and hence the market has lagged.
One reason that there is so little demand for
multiprocessors is the widespread acceptance of
the philosophy that a better single-processor
system can always be built. This approach
achieves performance at the considerable expense of spare parts, training, reliability, and
flexibility. Although a multiprocessor architecture provides a measure of reliability,
backup, and system tunability unreachable on a
conventional system, the biggest and fastest machines are uniprocessors - except in the case of
the Bell Laboratories Safeguard Computer [Bell
Laboratories, 1975].
Multiprocessor systems have been built out
of PDP-11s. Figure 4 summarizes the design
and performance of some of these machines.
The topmost structure was built using 11/05
processors, but because of inadequate arbitration techniques in the processor, the expected performance did not materialize. Table 3
shows the expected results for multiple 11/05
processors sharing a single Unibus and compares them with the PDP-11/40.
From the results of Table 3 one would expect
to use as many as three 11/05 processors to
achieve the performance of a model 11/40.
More than three processors will increase the
performance at the expense of the cost-effectiveness. This basic structure has been applied
on a production basis in the GT40 series of
graphics processors for the PDP-11. In this
scheme, a second display processor is added to
the Unibus for display picture maintenance. A
similar structure is used for connecting special
signal-processing computers to the Unibus, although these structures are technically coupled
computers rather than multiprocessors.

Figure 4. PDP-11 multiprocessor PMS structures: (a) Multi-Pc structure using a single Unibus; (b) Pc with P.display using a single Unibus; (c) multiprocessor using multiport Mp; (d) C.mmp, the CMU multi-miniprocessor computer structure.

As an independent check on the validity of
this approach, a multiprocessor system has
been built, based on the Lockheed SUE [Ornstein et al., 1972]. This machine, used as a high
speed communications processor, is a hybrid
design: it has seven dual-processor computers
with each pair sharing a common bus as outlined above. The seven pairs share two multiport memories.

Table 3. Multiple PDP-11/05 Processors Sharing a Single Unibus

Number and        Processor                                   
Processor         Performance    Processor   Price*/        System   Price†/
Model             (Relative)     Price       Performance    Price    Performance

1-11/05           1.00           1.00        1.00           3.00     1.00
2-11/05           1.85           1.23        0.66           3.23     0.58
3-11/05           2.4            1.47        0.61           3.47     0.48
1-11/40           2.25           1.35        0.60           3.35     0.49

*Processor cost only.
†Total system cost assuming one-third of system is processor cost.
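The price/performance columns of Table 3 can be recomputed from the performance and processor price columns. The sketch below assumes, following the table footnote, that the rest of the system costs twice the single 11/05 processor price (2.00 in relative units) and that system price/performance is normalized to the single 11/05 system. The names are illustrative, and the input values are as read from Table 3.

```python
REST_OF_SYSTEM = 2.00  # assumption: processor is one-third of a 1-11/05 system

# (configuration, relative performance, relative processor price) from Table 3
ROWS = [
    ("1-11/05", 1.00, 1.00),
    ("2-11/05", 1.85, 1.23),
    ("3-11/05", 2.40, 1.47),
    ("1-11/40", 2.25, 1.35),
]


def price_performance(perf: float, proc_price: float) -> tuple:
    """Return (processor price/perf, system price, system price/perf)."""
    system_price = proc_price + REST_OF_SYSTEM
    baseline = (1.00 + REST_OF_SYSTEM) / 1.00  # 1-11/05 system price/perf
    return proc_price / perf, system_price, (system_price / perf) / baseline


if __name__ == "__main__":
    for name, perf, price in ROWS:
        proc_pp, sys_price, sys_pp = price_performance(perf, price)
        print(f"{name}: {proc_pp:.2f}  {sys_price:.2f}  {sys_pp:.2f}")
```

Under these assumptions the computed values agree with the table to within rounding; the 11/40 system figure comes out as 0.496, which the table shows as 0.49.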
The second type of structure given in Figure 4
is a conventional, tightly coupled multiprocessor using multiple-port memories. A
number of these systems have been installed,
and they operate quite effectively. However,
they have only been used for specialized applications because there has been no operating system support for the structure.

PDP-11 Based Multiprocessor: Carnegie-Mellon University Research Computers

The PDP-11 architecture has been employed
to pioneer new ideas in the area of multiprocessors. The three multiprocessors built at
Carnegie-Mellon University (CMU) are discussed: C.mmp [Wulf and Bell, 1972], a 16-processor multiprocessor; C.vmp [Siewiorek et al.,
1976], a triplicated, voting multiprocessor computer for high reliability; and Cm* (Chapter
20), a set of computer modules based on LSI-11.
The three CMU multiprocessors are good examples of multiprocessor development directions because it is quite likely that technology
will force the evolution of computing structures
to converge into three styles of multiprocessor
computers: (1) C.mmp style, for high performance, incremental performance, and availability
(maintainability); (2) C.vmp style for very high
availability motivated by increasing maintenance costs, and (3) loosely coupled computers
like Cm* to handle specialized processing, e.g.,
front end, file, and signal processing. This argument is based on history, present technology,
and resulting price extrapolations:

1.

MOS technology appears to be increasing in both speed and density faster than
the technology (such as ECL) from
which high performance machines are
usually built.

2.

Standards in the semiconductor industry
tend to form more quickly for high volume products. For example, in the 8-bit
microcomputer market, one type supplies about 50 percent of the market and
three types supply over 90 percent.

3.

The price per chip of the single MOS
chip processors decreases at a substantially greater rate than for the low
volume, high performance special designs. Chips in both designs have high
design costs, but the single-MOS-chip
processors have a much higher volume.

4.

Several 16-bit processor-on-a-chip processors, with an address space matching
and appropriate data-types matching the
performance, exist in 1978. Such a commodity can form the basis for nearly all
future computer designs.

5.

The performance (instructions per se