HANDBOOK OF AUTOMATION, COMPUTATION, AND CONTROL

Volume 1

CONTROL FUNDAMENTALS

Prepared by a Staff of Specialists

Edited by

EUGENE M. GRABBE
SIMON RAMO
DEAN E. WOOLDRIDGE

The Ramo-Wooldridge Corporation
Los Angeles, California

NEW YORK · JOHN WILEY & SONS, INC.
London · Chapman & Hall, Limited

Copyright © 1958, by John Wiley & Sons, Inc.

All Rights Reserved. This book or any part thereof must not be reproduced in any form without the written permission of the publisher.

Library of Congress Catalog Card Number: 58-10800

Printed in the United States of America

CONTRIBUTORS

E. L. ARNOFF, Case Institute of Technology, Cleveland, Ohio (Chapter 15)

J. E. BARNES, Jr., General Electric Company, Schenectady, New York
(Chapter 26)

C. E. BRADFORD, General Electric Company, Pittsfield, Massachusetts
(Chapter 22)

J. M. CAMERON, National Bureau of Standards, Washington, D. C. (Chapter 14)
R. F. CLIPPINGER, Datamatic, A Division of Minneapolis-Honeywell Regulator
Company, Newton Highlands, Massachusetts (Co-Editor, Chapter 14)

A. B. CLARKE, University of Michigan, Ann Arbor, Michigan (Chapter 13)
A. H. COPELAND, SR., University of Michigan, Ann Arbor, Michigan (Chapters
11 and 12)

P. G. CUSHMAN, General Electric Company, Pittsfield, Massachusetts
(Chapter 23)

M. W. DE MERIT, General Electric Company, Schenectady, New York (Chapter 22)
J. B. DIAZ, University of Maryland, College Park, Maryland (Chapter 14)

B. DIMSDALE, Service Bureau Corporation, Los Angeles, California (Chapter 14)
P. ELIAS, Massachusetts Institute of Technology, Cambridge, Massachusetts
(Chapter 16)

B. FRIEDMAN, University of California, Berkeley, California (Chapter 14)
W. M. GAINES, General Electric Company, Tempe, Arizona (Editor, Part E;
Chapters 19 and 25)
G. E. HAY, University of Michigan, Ann Arbor, Michigan (Chapters 4 and 5)

E. ISAACSON, New York University, New York City, New York (Chapter 14)
S. J. JENNINGS, General Electric Company, Evendale, Ohio (Chapter 20)

W. KAPLAN, University of Michigan, Ann Arbor, Michigan (Co-Editor, Part A;
Chapters 5, 7, 8, 9, and 10)

J. H. LEVIN, Datamatic, A Division of Minneapolis-Honeywell Regulator Company,
Newton Highlands, Massachusetts (Co-Editor, Chapter 14)
D. L. LIPPITT, General Electric Company, Schenectady, New York (Chapter 24)

R. C. LYNDON, University of Michigan, Ann Arbor, Michigan (Chapters 2 and 3)
M. MANNOS, Datamatic, A Division of Minneapolis-Honeywell Regulator Company, Newton Highlands, Massachusetts (Chapter 14)

P. MERTZ, Bell Telephone Laboratories, New York City, New York (Chapters 17
and 18)

S. G. REQUE, General Electric Company, Tempe, Arizona (Chapter 21)
R. RICHTMYER, New York University, New York City, New York (Chapter 14)
E. H. ROTHE, University of Michigan, Ann Arbor, Michigan (Chapter 6)

W. E. SOLLECITO, General Electric Company, Schenectady, New York (Chapter 21)
R. M. THRALL, University of Michigan, Ann Arbor, Michigan (Co-Editor, Part A;
Chapter 1)

A. A. WINKELJOHANN, General Electric Company, Evendale, Ohio (Chapter 20)

FOREWORD

The proliferation of knowledge now makes it most difficult for scientists or
engineers to keep ahead of change even in their own fields, let alone in contiguous
fields. One of the fields where recent change has been most noticeable, and in fact
exponential, has been automatic control. This three-volume Handbook will aid
individuals in almost every branch of technology who must constantly refresh their
memories or refurbish their knowledge about many aspects of their work.
Automation, computation, and control, as we know them, have been evolving
for centuries, but within the last generation their impact has been felt in nearly
every segment of human endeavor. Feedback principles were exploited by Leonardo
da Vinci and applied by James Watt. Some of the early theoretical work of importance was contributed by Lord Kelvin, who also, together with Charles Babbage,
pointed the way to the development of today's giant computational aids. Since
about the turn of the present century, the works of men like Minorsky, Nyquist,
Wiener, Bush, Hazen, and von Neumann gave quantum jumps to computation and
control. But it was during and immediately following World War II that quantum
jumps occurred in abundance. This was the period when theories of control, new
concepts of computation, new areas of application, and a host of new devices appeared with great rapidity. Technologists now find these fields charged with challenge, but at the same time hard to encompass. From the activities of World War II
such terms as servomechanism, feedback control, digital and analog computer,
transducer, and system engineering reached maturity. More recently the word
automation has become deeply entrenched as meaning something about the field
on which no two people agree.
Philosophically minded technologists do not accept automation merely as a third
Industrial Revolution. They see it, as they stand about where the editors of this
Handbook stood when they projected this work, as a manifestation of one of the
greatest Intellectual Revolutions in Thinking that has occurred for a long time. They
see in automation the natural consequence of man's urge to exploit modern science
on a wide front to perform useful tasks in, for example, manufacturing, transportation, business, physical science, social science, medicine, the military, and government. They see that it has brought great change to our conventional way of thinking about the human use of human beings, to quote Norbert Wiener, and in turn
about how our engineers will be trained to solve tomorrow's engineering problems.
They even see that it has precipitated some deep thinking on the part of our industrial and union leadership about the organization of workers in order not to hold
captive bodies of workmen for jobs that automation, computation, and control have
swept or will soon sweep away.
Perhaps the important new face on today's technological scene is the degree to
which the broad field needs codification and unification in order that technologists
can optimize their role to exploit it for the general good. One of the early instances
of organized academic instruction in the field was at The Massachusetts Institute
of Technology in the Electrical Engineering Department in September 1939, as a
course entitled Theory and Application of Servomechanisms. I can well recollect
discussions around 1940 with the late Dr. Donald P. Campbell and Dr. Harold L.
Hazen, which led temporarily to renaming the course Dynamic Analysis of Automatic Control Systems because so few students knew what "servomechanisms"
were. But when the GI's returned from war everybody knew, and everyone wanted
instruction. Since that time engineering colleges throughout the land have elected
to offer organized instruction in a multitude of topics ranging from the most abstract mathematical fundamentals to the most specific applications of hardware.
Textbooks are available on every subject along this broad spectrum. But still the
practicing control or computer technologist experiences great difficulty keeping
abreast of what he needs to know.
As organized instruction appeared in educational institutions, and as industrial
activity increased, professional societies organized groups in the areas of control and
computation to meet the needs of their members to tell one another about technical
advances. Within the past five years several trade journals have undertaken to
report regularly on developments in theory, components, and systems. The net
effect of all this is that the technologist is overwhelmed with fragmentary, sometimes contradictory, redundant information that comes at him at random and in
many languages. The problem of assessing and codifying even a portion of this
avalanche of knowledge is beyond the capabilities of even the most able technologist.
The editors of the Handbook have rightly concluded that what each technologist
needs for his long term professional growth is to have a body of knowledge that is
negotiable at par in any one of a number of related fields for many years to come. It
would be ideal, of course, if a college education could give a prospective technologist
this kind of knowledge. It is in the hope of doing this that engineering curricula
are becoming more broadly based in science and engineering science. But it is unlikely that even this kind of college training will be adequate to cope with the consequences of the rapid proliferation of technology as is manifest in the area of
automation, computation, and control. Hence, handbooks are an essential component of the technical literature when they provide the unity and continuity that
are requisite.
I can think of no better way to describe this Handbook than to say that the
editors, in both their organization of material and selection of substance, have
given technologists a unified work of lasting value. It truly represents today's
optimum package of that body of knowledge that will be negotiable at par by
technologists for many years to come in a wide range of disciplines.
GORDON S. BROWN

Massachusetts Institute of Technology

PREFACE

Accelerated advances in technology have brought a steady stream of
automatic machines to our factories, offices, and homes. The earliest
automation forms were concerned with doing work, followed by the controlling function, and recently the big surge in automation has been
directed toward data handling functions. New devices ranging from
digital computers to satellites have resulted from military and other
government research and development programs. Such activity will continue to have an important impact on automation progress.
One of the pressures for the development of automation has been the
growing complexity and speed of business and industrial operations. But
automation in turn accelerates the tempo of whatever it touches, so that
we can expect future systems to be even larger, faster, and more complex.
While a segment of engineering will continue to mastermind, by rule of
thumb procedures, the design and construction of automatic equipment
and systems, a growing percentage of engineering effort will be devoted to
activities that may be classified as problem solving. The activities of the
problem solver involve analysis of previous behavior of systems and equipment, simulation of present situations, and predictions about the future.
In the past, problem solving has largely been practiced by engineers and
scientists, using slide rules and hand calculators, but with the advent of
large-scale data processing systems, the range of applications has been
broadened considerably to include economic, government, and social activities. Air traffic control, traffic simulation, library searching, and language
translation, are typical of the problems that have been attacked.
This Handbook is directed toward the problem solvers-the engineers,
scientists, technicians, managers, and others from all walks of life who are
concerned with applying technology to the mushrooming developments in
automatic equipment and systems. It is our purpose to gather together
in one place the available theory and information on general mathematics,
feedback control, computers, data processing, and systems design. The
emphasis has been on practical methods of applying theory, new techniques
and components, and the ever broadening role of the electronic computer.
Each chapter starts with definitions and descriptions aimed at providing
perspective and moves on to more complicated theory, analysis, and applications. In general, the Handbook assumes some engineering training and
will serve as an information source and refresher for practicing engineers.
For management, it will provide a frame of reference and background material for understanding modern techniques of importance to business and
industry. To others engaged in various ramifications of automation systems, the Handbook will provide a source of definitions and descriptive
material about new areas of technology.
It would be difficult for any one individual or small group of individuals
to prepare a handbook of this type. A large number of contributors, each
with a field of specialty, is required to provide the engineer with the desired
coverage. With such a broad field, it is difficult to treat all material in a
homogeneous manner. Topics in new fields are given in more detail than
the older, established ones since there is a need for more background
information on these new subjects. The organization of the material is in
three volumes as shown on the inside cover of the Handbook. Volume 1 is
on Control Fundamentals, Volume 2 is concerned with Computers and Data
Processing, and Volume 3 with Systems and Components.
In keeping with the purpose of this Handbook, Volume 1 has a strong
treatment of general mathematics which includes chapters on subjects not
ordinarily found in engineering handbooks. These include sets and relations, Boolean algebra, probability, and statistics. Additional chapters are
devoted to numerical analysis, operations research, and information theory.
Finally, the present status of feedback control theory is summarized in
eight chapters. Components have been placed with systems in Volume 3
rather than with control theory in Volume 1, although any discussion of
feedback control must, of necessity, be concerned with components.
The importance of computing in research, development, production, real
time process control, and business applications, has steadily increased.
Hence, Volume 2 is devoted entirely to the design and use of analog and
digital computers and data processors. In addition to covering the status
of knowledge today in these fields, there are chapters on unusual computer
systems, magnetic core and transistor circuits, and an advanced treatment
of programming. Volume 3 emphasizes systems engineering. A part of the
volume covers techniques used in important industrial applications by
examining typical systems. The treatment of components is largely concerned with how to select components among the various alternates, their
mathematical description and their integration into systems. There is also
a treatment of the design of components of considerable importance today.
These include magnetic amplifiers, semiconductors, and gyroscopes.
We consider this Handbook a pioneering effort in a field that is steadily
pushing back frontiers. It is our hope that these volumes will not only
provide basic information on new fields, but will also inspire work and
further research and development in the fields of automatic control. The
editors are pleased to acknowledge the advice and assistance of Professor
Gordon S. Brown and Professor Jerome S. Wiesner of the Massachusetts
Institute of Technology, and Dr. Brockway McMillan of the Bell Telephone Laboratories, in organizing the subject matter. To the contributors
goes the major credit for providing clear, thorough treatments of their
subjects. The editors are deeply indebted to the large number of specialists
in the control field who have aided and encouraged this undertaking by
reviewing manuscripts and making valuable suggestions. Many members
of the technical staff and secretarial staff of The Ramo-Wooldridge Corporation have been especially helpful in speeding the progress of the Handbook.
EUGENE M. GRABBE
SIMON RAMO
DEAN E. WOOLDRIDGE

August 1958

CONTENTS

A. GENERAL MATHEMATICS

Chapter 1. Sets and Relations  1-01
1. Sets 1-01
2. Relations 1-05
3. Functions 1-06
4. Binary Relations on a Set 1-07
5. Equivalence Relations 1-07
6. Operations 1-08
7. Order Relations 1-09
8. Sets of Points 1-10
References 1-11

Chapter 2. Algebraic Equations  2-01
1. Polynomials 2-01
2. Real Roots 2-03
3. Complex Roots 2-04
References 2-06

Chapter 3. Matrix Theory  3-01
1. Vector Spaces 3-01
2. Linear Transformations 3-03
3. Coordinates 3-04
4. Echelon Form 3-05
5. Rank, Inverses 3-07
6. Determinants, Adjoint 3-08
7. Equivalence 3-09
8. Similarity 3-10
9. Orthogonal and Symmetric Matrices 3-13
10. Systems of Linear Inequalities 3-14
References 3-17
Chapter 4. Finite Difference Equations  4-01
1. Definitions 4-01
2. Linear Difference Equations 4-03
3. Homogeneous Linear Equations with Constant Coefficients 4-04
4. Nonhomogeneous Linear Equations with Constant Coefficients 4-05
5. Linear Equations with Variable Coefficients 4-07
References 4-08

Chapter 5. Differential Equations  5-01
1. Basic Concepts 5-01
2. Equations of First Order and First Degree 5-02
3. Linear Differential Equations 5-04
4. Equations of First Order but not of First Degree 5-07
5. Special Methods for Equations of Higher than First Order 5-09
6. Solutions in Form of Power Series 5-10
7. Simultaneous Linear Differential Equations 5-12
8. Numerical Methods 5-14
9. Graphical Methods: Phase Plane Analysis 5-15
10. Partial Differential Equations 5-20
References 5-22

Chapter 6. Integral Equations  6-01
1. Definitions and Main Problems 6-01
2. Relation to Boundary Value Problems 6-03
3. General Theorems 6-05
4. Theorems on Eigenvalues 6-06
5. The Expansion Theorem and Some of Its Consequences 6-07
6. Variational Interpretation of Eigenvalue Problem 6-08
7. Approximation Methods 6-10
References 6-17

Chapter 7. Complex Variables  7-01
1. Functions of a Complex Variable 7-01
2. Analytic Functions. Harmonic Functions 7-04
3. Integral Theorems 7-05
4. Power Series. Laurent Series 7-08
5. Zeros. Singularities. Residues. Argument Principle 7-11
6. Analytic Continuation 7-16
7. Riemann Surfaces 7-17
8. Elliptic Functions 7-18
9. Functions Defined by Linear Differential Equations 7-21
10. Other Transcendental Functions 7-25
References 7-28

Chapter 8. Operational Mathematics  8-01
1. Heaviside Operators 8-01
2. Application to Differential Equations 8-05
3. Superposition Principle. Response to Unit Function and Delta Function 8-06
4. Appraisal of the Heaviside Calculus 8-07
5. Operational Calculus Based on Integral Transforms 8-07
6. Fourier Series. Finite Fourier Transform 8-10
7. Fourier Integral. Fourier Transforms 8-15
8. Laplace Transforms 8-17
9. Other Transforms 8-18
References 8-19

Chapter 9. Laplace Transforms  9-01
1. Fundamental Properties 9-01
2. Transforms of Derivatives and Integrals 9-03
3. Translation. Transform of Unit Function, Step Functions, Impulse Function (Delta Function) 9-06
4. Convolution 9-08
5. Inversion 9-09
6. Application to Differential Equations 9-10
7. Response to Impulse Functions 9-15
8. Equations Containing Integrals 9-18
9. Weighting Function 9-18
10. Difference-Differential Equations 9-20
11. Asymptotic Behavior of Transforms 9-21
References 9-21

Chapter 10. Conformal Mapping  10-01
1. Definition of Conformal Mapping. General Properties 10-01
2. Linear Fractional Transformations 10-05
3. Mapping by Elementary Functions 10-06
4. Schwarz-Christoffel Mappings 10-08
5. Application of Conformal Mapping to Boundary Value Problems 10-09
References 10-11

Chapter 11. Boolean Algebra  11-01
1. Table of Notations 11-01
2. Definitions of Boolean Algebra 11-01
3. Boolean Algebra and Logic 11-05
4. Canonical Form of Boolean Functions 11-08
5. Stone Representation 11-09
6. Sheffer Stroke Operation 11-10
References 11-11

Chapter 12. Probability  12-01
1. Fundamental Concepts and Related Probabilities 12-01
2. Random Variables and Distribution Functions 12-04
3. Expected Value 12-06
4. Variance 12-11
5. Central Limit Theorem 12-13
6. Random Processes 12-18
References 12-20

Chapter 13. Statistics  13-01
1. Nature of Statistics 13-01
2. Probability Background 13-02
3. Important Probability Distributions 13-04
4. Sampling 13-06
5. Bivariate Distributions 13-13
6. Tests for Goodness of Fit 13-16
7. Sequential Analysis 13-16
8. Monte Carlo Method 13-17
9. Statistical Tables 13-18
References 13-21

B. NUMERICAL ANALYSIS

Chapter 14. Numerical Analysis  14-01
1. Interpolation, Curve Fitting, Differentiation, and Integration 14-01
2. Matrix Inversion and Simultaneous Linear Equations 14-13
3. Eigenvalues and Eigenvectors 14-28
4. Digital Techniques in Statistical Analysis of Experiments 14-48
5. Ordinary Differential Equations 14-55
6. Partial Differential Equations 14-64
References 14-88

C. OPERATIONS RESEARCH

Chapter 15. Operations Research  15-01
1. Operations Research and Mathematical Models 15-02
2. Solution of the Model 15-10
3. Inventory Models 15-21
4. Allocation Models 15-31
5. Waiting Time Models 15-73
6. Replacement Models 15-86
7. Competitive Problems 15-99
8. Data for Model Testing 15-115
9. Controlling the Solution 15-120
10. Implementation 15-123
References 15-124

D. INFORMATION THEORY AND TRANSMISSION

Chapter 16. Information Theory  16-01
1. Introduction 16-01
2. General Definitions 16-02
3. Simple Discrete Sources 16-08
4. More Complicated Discrete Sources 16-19
5. Discrete Noiseless Channels 16-24
6. Discrete Noisy Channels I. Distribution of Information 16-26
7. Discrete Noisy Channels II. Channel Capacity and Interpretations 16-32
8. The Continuous Case 16-39
References 16-46

Chapter 17. Smoothing and Filtering  17-01
1. Definitions: Smoothing and Prediction. Symbols 17-01
2. Definitions: Correlation 17-05
3. Relationship between Correlation and Signal Structure 17-09
4. Design of Optimum Filter 17-13
5. Extensions of Procedure 17-19
6. Network Synthesis 17-25
References 17-32

Chapter 18. Data Transmission  18-01
1. Introduction and Symbols 18-01
2. Formation and Use of the Electrical Signal 18-07
3. Transmission Impairment 18-18
References 18-30

E. FEEDBACK CONTROL

Chapter 19. Methodology of Feedback Control  19-01
1. Symbols for Feedback Control 19-01
2. General Feedback Control System Definitions 19-04
3. Feedback Control System Design Considerations 19-12
4. Selection of Method of Synthesis for Feedback Controls 19-19
References 19-21

Chapter 20. Fundamentals of System Analysis  20-01
1. Representation of Physical Systems 20-01
2. Classical Methods of Analysis 20-28
3. Block Diagrams 20-56
4. System Types 20-66
5. Error Coefficients 20-70
6. Analysis of A-C Servos: Carrier Systems 20-79
References 20-84

Chapter 21. Stability  21-01
1. Introduction 21-01
2. Classical Solution Approach 21-02
3. Routh's Criterion 21-05
4. Nyquist Stability Criterion 21-09
5. Bode Attenuation Diagram Approach 21-29
6. Root Locus Method 21-46
7. Miscellaneous Stability Criteria 21-71
8. Closed Loop Response from Open Loop Response 21-72
References 21-81

Chapter 22. Relation between Transient and Frequency Response  22-01
1. Introduction 22-01
2. Response Characteristics Defined 22-02
3. Relation between Transient Response and Location of Roots of Characteristic Equation 22-03
4. Relation between Closed Loop and Open Loop Roots 22-15
5. Design Charts Relating Open Loop Frequency Response and Transient Response 22-18
6. Approximate Relations-Rules of Thumb 22-43
7. Numerical and Graphical Techniques of Relating Transient and Frequency Response 22-43
References 22-61

Chapter 23. Feedback System Compensation  23-01
1. Design Criteria and Techniques 23-01
2. Compensating Components: D-C Systems 23-18
3. Compensating Networks: A-C Systems 23-48
4. Open-Closed Loop Control 23-54
References 23-56

Chapter 24. Noise, Random Inputs, and Extraneous Signals  24-01
1. Introduction 24-01
2. Mathematical Description of Noise 24-02
3. Measurement of Noise 24-06
4. System Response to Noise 24-11
5. System Design in the Presence of Noise 24-15
References 24-19

Chapter 25. Nonlinear Systems  25-01
1. Definitions 25-01
2. General Nonlinear System Problem 25-03
3. Methods of Analysis: Linearization 25-07
4. Methods of Analysis: Describing Function 25-13
5. Methods of Analysis: Phase Plane, Graphical Solution of System Equations 25-36
6. Other Methods of Analysis 25-43
7. Nonlinear System Compensation 25-48
References 25-66

Chapter 26. Sampled-Data Systems and Periodic Controllers  26-01
1. Description and Definition of Sampled-Data System 26-01
2. Methods of Transient Analysis 26-06
3. Sampled-Data System Stability 26-15
4. Sampled-Data System Synthesis 26-20
References 26-32

INDEX

MATHEMATICS

A. GENERAL MATHEMATICS
R. M. Thrall and W. Kaplan, Editors

1. Sets and Relations, by R. M. Thrall
2. Algebraic Equations, by R. C. Lyndon
3. Matrix Theory, by R. C. Lyndon
4. Finite Difference Equations, by G. E. Hay
5. Differential Equations, by G. E. Hay and W. Kaplan
6. Integral Equations, by E. H. Rothe
7. Complex Variables, by W. Kaplan
8. Operational Mathematics, by W. Kaplan
9. Laplace Transforms, by W. Kaplan
10. Conformal Mapping, by W. Kaplan
11. Boolean Algebra, by A. H. Copeland, Sr.
12. Probability, by A. H. Copeland, Sr.
13. Statistics, by A. B. Clarke

A. GENERAL MATHEMATICS

Chapter 1. Sets and Relations
R. M. Thrall

1. Sets 1-01
2. Relations 1-05
3. Functions 1-06
4. Binary Relations on a Set 1-07
5. Equivalence Relations 1-07
6. Operations 1-08
7. Order Relations 1-09
8. Sets of Points 1-10
References 1-11

1. SETS

A set is a collection of objects of any sort. The words class, family,
ensemble, aggregate are synonyms for the term set.
Each object in a set is called an element (member) of the set. If S denotes the set and b an element of S, one writes:
b ∈ S;

this is read: "b belongs to S." If b does not belong to S, one writes: b ∉ S.
Sets will generally be designated by capital letters, elements by lower
case letters.
IMPORTANT EXAMPLES OF SETS. Z, the set of positive integers z; Z consists of the numbers 1, 2, 3, ...;
J, the set of all integers j (including 0 and the negative integers);
Q, the set of rational numbers q (fractions a/b, where a is an integer and b is a positive integer);
R, the set of all real numbers r (numbers which are expressible as unending decimals);
C, the set of all complex numbers c (numbers of form x + y√-1, where x and y are real).
In geometry one employs sets of points; for example, all points on a specified line or all points inside a circle. Geometric diagrams can be helpful in reasoning about sets which may have no reference to geometry (Fig. 1).

FIG. 1. Set and subset. T is a subset of S, T ⊂ S.

A set can be designated by listing its elements between braces. Thus

S = {1, -3, 7}

is the set whose elements are the numbers 1, -3, and 7. For infinite sets
one still uses braces, but instead of writing all the elements one gives a
rule for set membership. For example, Z = {z | z is a positive integer} is an abbreviation for "Z is the set of all z for which z is a positive integer." This set is also sometimes designated (less precisely) by

Z = {1, 2, 3, ..., n, ...}.
Two sets are said to be equal if they have exactly the same members.
For example, if

A = {1, 3, 5, 7},   B = {3, 1, 5, 1, 7},

then A = B. Neither the order in which the elements are written down nor the number of times that an element is repeated within the braces is significant. If A is not equal to B, one writes: A ≠ B.
Subsets. The set T is said to be a subset of the set S (Fig. 1) if every member of T is also a member of S; in symbols,

T ⊂ S or S ⊃ T.

If both T ⊂ S and S ⊂ T, S and T are equal. If S ⊃ T and S ≠ T, i.e., if S contains every element of T and at least one element not in T, one says that S contains T properly or that T is a proper subset of S; in symbols,

S > T or T < S.

[Pages covering the remainder of Sect. 1 and the opening of Sect. 2, RELATIONS, are missing from the source.]

... the transpose of the relation "is the wife of" is the relation "is the husband of."
Range and Domain. Let R be a relation between elements a of A and b of B. For each a in A one denotes by R(a) the set of all b in B for which aRb; R(a) is called the image of a under R. For each subset S of A one denotes by R(S) the set of all b in B for which aRb for at least one a in S; R(S) is called the image of S under R. The image of A under R, the set R(A), is called the range of R.
The counterimage of an element b under R is the set of all elements a for which aRb; this is the same as the set RT(b). The counterimage of a subset U of B is the set of all a in A for which aRb for at least one b in U; this is the same as the set RT(U). The set RT(B) is called the domain of R.
A relation is sometimes called a correspondence between its domain and its range.
Product of Relations. If R is a relation between elements a of A and b of B and S is a relation between elements b of B and elements c of C, the product relation (or composition) RS is defined as a relation between elements a of A and c of C, as follows: aRSc whenever for some b in B one has aRb and bSc.
EXAMPLE. The product can be illustrated by a communications network. Let A, B, C be three sets of stations, let aRb mean that a can communicate with b and let bSc mean that b can communicate with c. Then aRSc means that there is a two-stage communication link from a through some intermediate station b to c. For products of three relations one has the associative law A(BC) = (AB)C.
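For finite sets, a relation can be handled concretely as a set of ordered pairs, and the product just defined becomes a short computation. The following sketch is illustrative only (the helper names and station labels are not from the text); it mirrors the communications example in Python.

```python
def compose(R, S):
    """Product RS: all pairs (a, c) such that aRb and bSc for some b."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def transpose(R):
    """Transpose R^T: b R^T a if and only if aRb."""
    return {(b, a) for (a, b) in R}

# a1 and a2 can each reach station b1; b1 can reach c1, b2 can reach c2.
R = {("a1", "b1"), ("a2", "b1")}
S = {("b1", "c1"), ("b2", "c2")}

print(compose(R, S))   # {('a1', 'c1'), ('a2', 'c1')}: the two-stage links
print(transpose(R))    # {('b1', 'a1'), ('b1', 'a2')}
```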
3. FUNCTIONS

A relation F between elements a of A and b of B is said to be a function if, for every a in A, F(a) is either empty or contains just one element. If in addition F has domain A, one says that F is a function on A into B. In this case one identifies F(a) with its unique element b and writes: b = F(a), or a → b under F.
The terms mapping and transformation are synonyms for function. A function F on A into B can be defined directly as a correspondence which assigns to each element a in A a unique image b = F(a) in B. The set B is called a codomain of F. The range F(A) is a subset of B. If F(A) = B, one says that F is a function on A onto B.
If both F and FT are functions, one says that F is a one-to-one function.
The transpose FT is then termed the inverse of F and is denoted by F^-1. A one-to-one function on A onto B is called a one-to-one correspondence between A and B. In this case one has

(FFT)(a) = FT(F(a)) = a for all a in A;

FFT is the identity function EA on A onto A: EA(a) = a; similarly, FTF = EB.

In classical analysis, a function F is often denoted by F(x). The symbol
F(x) then has two meanings: the value of the function for a particular x,
and the function as a whole. Similarly, F(x, y) denotes a function of two
variables and also the value of the function for given x, y.
4. BINARY RELATIONS ON A SET
A relation R on a set A is said to be

Identical if R = EA, i.e., if aRb is equivalent to a = b;
Reflexive if R ⊃ EA, i.e., if aRa for all a in A;
Irreflexive if R ∩ EA = 0, i.e., if aRa for no a in A;
Transitive if R^2 ⊂ R, i.e., if aRb and bRc imply aRc;
Symmetric if R = RT, i.e., if aRb implies bRa;
Antisymmetric if R ∩ RT ⊂ EA, i.e., if aRb and bRa imply a = b;
Asymmetric if R ∩ RT = 0, i.e., if for no a, b is aRb and bRa;
Acyclic if R^n ∩ EA = 0 for all n, i.e., if a1Ra2, a2Ra3, ..., a(n-1)Ran imply a1 ≠ an;
Complete if R ∪ RT = A × A, i.e., if for each pair (a, b) either aRb or bRa;
Trichotomous if R ∪ RT ∪ EA = A × A and R ∩ RT = 0, i.e., if for each pair (a, b) exactly one of the relations aRb, bRa, a = b holds.
Note that a relation is asymmetric if and only if it is antisymmetric and irreflexive.
EXAMPLES. The parallel relation on lines in a plane is symmetric, irreflexive, but not transitive (unless a line is defined to be parallel to itself). The relation < on the real numbers is asymmetric, transitive, trichotomous, and acyclic; whereas the relation ≤ is antisymmetric, reflexive, transitive, and complete. The relation "is at least as good as" is reflexive, transitive, but not antisymmetric if there are two objects judged equally good.
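On a finite set these defining conditions can be verified mechanically from the set-of-pairs representation. A minimal illustrative sketch (the relation tested, "same parity," is the one used in Sect. 6 below):

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)            # R contains E_A

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)       # R = R^T

def is_transitive(R):
    return all((a, d) in R
               for (a, b) in R for (c, d) in R if b == c)   # R^2 within R

A = {1, 2, 3, 4}
R = {(a, b) for a in A for b in A if (a - b) % 2 == 0}      # same parity
print(is_reflexive(R, A), is_symmetric(R), is_transitive(R))  # True True True
```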
5. EQUIVALENCE RELATIONS

By a partition of a set A is meant a subdivision of A into subsets, no two of which have an element in common. By an equivalence relation R in A is meant a relation R in A which is reflexive, symmetric, and transitive.
Each partition of A determines an equivalence relation in A; aRb holds when a and b are in the same subset of the partition. Conversely, each equivalence relation determines a partition of A; the subsets of the partition are the sets R(a), i.e., the sets of form {b | bRa}. The sets R(a) are called equivalence classes.
Equivalence is the basis of classification; the equivalence classes contain elements which, although not identical, can be regarded as alike or interchangeable for some purpose. EXAMPLE. The sorting of nuts and bolts is based on the equivalence relation "has the same size and shape as." A property shared by all elements of each equivalence class is called an invariant. More formally, let R be an equivalence relation on a set A. A function F on A is said to be an invariant relative to R if aRb implies F(a) = F(b). For example, if R is the relation of congruence on a set A of triangles, then the function F(a) = area of triangle a is an invariant.
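A complete invariant determines the partition, so equivalence classes can be produced by grouping elements on the invariant's value. An illustrative sketch (the helper partition_by is hypothetical), using the parity relation of Sect. 6:

```python
def partition_by(A, invariant):
    """Equivalence classes of A under aRb iff invariant(a) == invariant(b)."""
    classes = {}
    for a in A:
        classes.setdefault(invariant(a), []).append(a)
    return list(classes.values())

A = [1, 2, 3, 4, 5, 6]
print(partition_by(A, invariant=lambda a: a % 2))   # [[1, 3, 5], [2, 4, 6]]
```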
A set of invariants F, G, ... relative to a relation R is said to be complete if

F(a) = F(b), G(a) = G(b), ... together imply aRb.

The language "a necessary and sufficient condition for K is that P1, P2, ... all hold" frequently states that P1, P2, ... are a complete set of invariants for an equivalence relation associated with K. Many of the theorems of elementary geometry fall in this class.
One sometimes is interested in choosing from each equivalence class a
representative from which one or more invariants can be easily calculated.
Such representatives are said to be in normal form or standard form. More
technically, let R be an equivalence relation on a set A. A function which
assigns to each equivalence class R(a) one of its members is called a canonical form relative to R. Thus in matrix theory (Chap. 3) one has canonical
forms for row equivalence, equivalence, congruence, orthogonal congruence,
and similarity. It is customary to select a representative which displays a
complete set of invariants.
6. OPERATIONS
A function F assigning to each ordered pair (a, b), with a in A and b in B, an element c of set C is called a (binary) operation on A × B. If F(a, b) = c, one also writes aFb = c. If A = B, F is called an operation on A. If also C = A, F is called an interior operation; otherwise F is called an exterior operation.
EXAMPLES. Addition and multiplication of numbers are interior operations. The scalar product of two vectors is an exterior operation.
Let F be an interior operation on A and let R be an equivalence relation on A. One says that F has the substitution property relative to R, if aRa' and bRb' imply (aFb)R(a'Fb'). EXAMPLE. Let A be the set of all integers; let aRb mean that a and b have the same parity (both even or both odd). Then aRa' and bRb' imply that (a + b)R(a' + b'). Thus addition has the substitution property relative to R.
An exterior operation is said to have the substitution property relative to R if aRa' and bRb' imply (aFb) = (a'Fb').
7. ORDER RELATIONS
A relation ≤ on a set A is said to be a partial order if
(i) a ≤ a (reflexivity),
(ii) a ≤ b and b ≤ c imply a ≤ c (transitivity),
(iii) a ≤ b and b ≤ a imply a = b (antisymmetry).
If a ≤ b and a ≠ b, one writes: a < b. The relation < is then asymmetric:
(iv) for no a, b is a < b and b < a;
it is also transitive. If a ≤ b and b < c, one writes: b ≥ a and c > b (transposition). An element a of A is said to be an upper bound for the subset B of A if b ≤ a for all b in B; if also a ≤ c for every upper bound c of B, one says that a is a least upper bound (l.u.b.) for B. An upper bound for B which belongs to B is called a maximal element of B.
If in these definitions one replaces ≤ by ≥, the resulting concepts are called lower bound, greatest lower bound (g.l.b.), and minimal element, respectively.
The least upper bound (greatest lower bound) of a set, if it exists, is unique.
The partial order is said to be a linear order or chain order, if it is complete:
(v) for every a, b either a ≤ b or b ≤ a.
EXAMPLES. The relation a ≤ b between real numbers is a linear order; there is no maximal or minimal element. The complex numbers x + yi can be partially ordered by the definition: a + bi ≤ c + di if a < c or if a = c and b = d. Numbers with the same real part are not compared.
A partially ordered set A is said to be a lattice if each subset containing
two elements has a least upper bound and a greatest lower bound.
EXAMPLE. Let A be the class of all subsets of a given set B and let S ≤ T if S is a subset of T, i.e., if S ⊂ T. Then A is a lattice and l.u.b. {S, T} = S ∪ T, g.l.b. {S, T} = S ∩ T. One extends this notation to lattices generally and uses a ∪ b (a cup b) for l.u.b. {a, b}, a ∩ b (a cap b) for g.l.b. {a, b}. If every subset of A has a g.l.b. and a l.u.b., then A is called a complete lattice. The example given of a class of all subsets of a given set is a complete lattice.

In a lattice the operations ∪ and ∩ have the following properties. For all a, b, c in A:

a ∪ b = b ∪ a and a ∩ b = b ∩ a (commutative laws);
(a ∪ b) ∪ c = a ∪ (b ∪ c) and (a ∩ b) ∩ c = a ∩ (b ∩ c) (associative laws);
a ∪ a = a, a ∩ a = a (idempotent laws);
a ∩ (a ∪ b) = a, a ∪ (a ∩ b) = a (absorptive laws).
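For the lattice of subsets in the example above, these laws, together with the distributive law stated next, can be spot-checked with Python's built-in set operations (| is ∪, & is ∩); the particular sets chosen are arbitrary:

```python
a, b, c = {1, 2}, {2, 3}, {3, 4}

print(a | b == b | a and a & b == b & a)         # commutative laws
print((a | b) | c == a | (b | c))                # associative law
print(a | a == a and a & a == a)                 # idempotent laws
print(a & (a | b) == a and a | (a & b) == a)     # absorptive laws
print(a | (b & c) == (a | b) & (a | c))          # distributive law
```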

A lattice is said to be distributive if for all a, b, c in A

a ∪ (b ∩ c) = (a ∪ b) ∩ (a ∪ c)

or, equivalently, if a ∩ (b ∪ c) = (a ∩ b) ∪ (a ∩ c) for all a, b, c.
If A has a minimal element and a maximal element, one ordinarily denotes them by 0, 1 respectively. Two elements a and b are said to be complements of each other if a ∩ b = 0 and a ∪ b = 1. A lattice is said to be complemented if each of its elements has a complement. In a distributive lattice, no element can have more than one complement. A Boolean algebra is a complemented, distributive lattice. (See Chap. 11.)
EXAMPLE. The partially ordered set formed of the class of all subsets of a given set B forms a Boolean algebra. The minimal and maximal elements are the empty set ∅ and the set B; the complement is the same as that defined in Sect. 1.
REMARK. A relation ≲ which satisfies only conditions (i) and (ii) is called a preorder or quasi-order. An example is the relation "is at least as good as" between automobiles.

8. SETS OF POINTS
Sets of real numbers can be interpreted as sets of points on the real line, or number axis. For fixed a, b, a < b,

{x | a < x < b} is an open interval,
{x | a ≤ x ≤ b} is a closed interval,
{x | a ≤ x < b} or {x | a < x ≤ b} is a half-open interval.

For fixed a, ε, ε > 0, the set

{x | a - ε < x < a + ε}

is the ε-neighborhood of a. An arbitrary set of real numbers is open if each element of the set has an ε-neighborhood contained in the set. A set is closed if its complement is open. A number a is a limit point of a set A if every ε-neighborhood of a contains at least one element of A differing from a.

Sets of ordered number pairs (x, y) can be interpreted as sets of points in the xy-plane. For fixed (a, b) and ε > 0 the set

{(x, y) | (x - a)^2 + (y - b)^2 < ε}

is the ε-neighborhood of (a, b). A set of points in the xy-plane is open if each point (a, b) in the set has an ε-neighborhood contained in the set. A set is closed if its complement is open. A point (a, b) is a limit point of a set A if every ε-neighborhood of (a, b) contains at least one element of A differing from (a, b). An open set is called an open region or domain if each two points of the set can be joined by a broken line within the set. A point (a, b) is a boundary point of set A if every ε-neighborhood of (a, b) contains at least one point of A and at least one point not in A. The boundary of A is the set of all boundary points of A. A closed region is a set formed of the union of an open region and its boundary. A point (a, b) of a set A is called an isolated point of A if some ε-neighborhood of (a, b) contains no element of A other than (a, b).

REFERENCES

The references for this chapter fall into several levels. The most elementary discussions of foundations and set theory are found in Refs. 1 and 4. References 5 and 7 are basic graduate level texts in the foundations of mathematics; Ref. 6 is at the same level for general set theory. Reference 3 (Chap. 11) gives a simple introduction to lattice theory and Boolean algebra. Reference 2 is a treatise on all phases of lattice theory, including preorder, partial order, and Boolean algebra.

1. C. B. Allendoerfer and C. O. Oakley, Principles of Mathematics, McGraw-Hill, New York, 1955.
2. Garrett Birkhoff, Lattice Theory, American Mathematical Society, New York, 1948.
3. Garrett Birkhoff and Saunders MacLane, A Survey of Modern Algebra, Macmillan, New York, 1941.
4. J. G. Kemeny, J. L. Snell, and G. L. Thompson, Introduction to Finite Mathematics, Prentice-Hall, Englewood Cliffs, New Jersey, 1957.
5. R. B. Kershner and L. R. Wilcox, The Anatomy of Mathematics, Ronald, New York, 1950.
6. Erich Kamke, Theory of Sets, Dover, New York, 1950.
7. R. L. Wilder, Introduction to the Foundations of Mathematics, Wiley, New York, 1952.

A. GENERAL MATHEMATICS

Chapter 2. Algebraic Equations
R. C. Lyndon

1. Polynomials 2-01
2. Real Roots 2-03
3. Complex Roots 2-04
References 2-06

1. POLYNOMIALS

A polynomial may be defined as a function f = f(x) defined by an equation

f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0,

where the coefficients a_0, a_1, ..., a_n are constants (real or complex) and x is variable (real or complex). The leading coefficient a_n will be assumed ≠ 0. The degree of f is n. A polynomial a_2 x^2 + a_1 x + a_0 of degree 2 is quadratic; a polynomial a_1 x + a_0 of degree 1 is linear; we accept the constant polynomials f(x) = a_0, although the zero polynomial f(x) ≡ 0 must be tacitly excluded from certain contexts.
An algebraic equation of degree n is an equation of form: polynomial of degree n in x = 0; that is, of form

f(x) = a_n x^n + ... + a_0 = 0.

A root of such an equation is a value of x which satisfies it; a root of the equation is called a root of f(x) or a zero of f(x). Thus r is a root of f(x) if and only if f(r) = 0.

The fundamental theorem of algebra asserts that an algebraic equation of degree n (n = 1, 2, ...) has at least one root (real or complex) (Refs. 1, 2). From this it follows that an algebraic equation of degree n has exactly n roots (some of which may be repeated, see below).
The operations of addition, subtraction, multiplication, and division of polynomials will be assumed to be familiar.
Synthetic division is an abbreviation of division by a linear polynomial, x - c. As an illustration, the division of 3x^2 - 7x + 11 by x - 2 is carried out in long form and by synthetic division.

Long division:

          3x -  1
x - 2 ) 3x^2 - 7x + 11
        3x^2 - 6x
        ---------
              -x + 11
              -x +  2
              -------
                    9

Synthetic division:

2 | 3  -7  11
  | 0   6  -2
  -----------
    3  -1   9

Either method yields the quotient 3x - 1 and remainder 9, so that

3x^2 - 7x + 11 = (3x - 1)(x - 2) + 9.

In the synthetic process, on the first line one replaces x - 2 by +2, and 3x^2 - 7x + 11 by the numbers 3, -7, 11. A zero is placed below the 3 and added to yield 3; the result is multiplied by 2 to yield 6; the 6 is added to -7 to yield -1; the -1 is multiplied by 2 to yield -2; the -2 is added to 11 to yield 9. On the third line the coefficients of the quotient, 3x - 1, and the remainder, 9, appear in order.
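The synthetic process is a three-row tabulation of what is now usually called Horner's scheme, and it programs directly. A minimal illustrative sketch (the function name is hypothetical; coefficients are listed from the leading term down):

```python
def synthetic_division(coeffs, c):
    """Divide the polynomial by (x - c); return (quotient coefficients, remainder)."""
    row = [coeffs[0]]                  # bring down the leading coefficient
    for a in coeffs[1:]:
        row.append(a + c * row[-1])    # multiply by c, add the next coefficient
    return row[:-1], row[-1]

q, r = synthetic_division([3, -7, 11], 2)
print(q, r)     # [3, -1] 9  ->  quotient 3x - 1, remainder 9
```

By the theorem stated next, the remainder 9 is also the value f(2).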
REMAINDER THEOREM. If a polynomial f(x) is divided by x - c, then the remainder is f(c).
FACTOR THEOREM. c is a root of f(x) if and only if x - c is a factor of f(x) (Ref. 2).
APPLICATION. If one root, c, of f(x) has been found, the remaining roots of f(x) will be roots of the quotient polynomial f(x) ÷ (x - c), which is of degree n - 1. Repetition of this reasoning leads to a representation of f(x) as a constant times a product of linear factors (x - c_1), (x - c_2), .... Since f(x) is of degree n there must be exactly n such factors:

f(x) = a_n (x - c_1)(x - c_2) ... (x - c_n).

Thus f(x) has n roots c_1, c_2, ..., c_n, some of which may be equal. If c_1 is repeated m times, so that (x - c_1)^m is a factor of f(x) (and (x - c_1)^(m+1) is not a factor), then c_1 is a root of multiplicity m.

Repeated Roots. If c is a repeated root of f(x) (a root of multiplicity 2 or more), then c will also be a root of f'(x), the derivative of f(x),

f'(x) = n a_n x^(n-1) + (n - 1) a_(n-1) x^(n-2) + ... + a_1.

To find the repeated roots, one can proceed as follows. Let

f_0(x) = f(x), f_1(x) = f'(x),

and by division obtain

f_0(x) = q_1(x) f_1(x) + f_2(x),

where f_2(x) is of degree lower than that of f_1(x). Continue, taking

f_(t-1)(x) = q_t(x) f_t(x) + f_(t+1)(x),

until f_(t+1)(x) = 0. Then the repeated roots of f(x) are the roots of f_t(x). If f_t(x) is a (non-zero) constant, f(x) has no repeated roots. Otherwise all repeated roots of f(x) can be found as the roots of f_t(x), which has degree lower than that of f (Ref. 2).
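The division chain above is the Euclidean algorithm applied to f and f'. A numerical sketch using numpy's polynomial division (the tolerance and helper names are illustrative; coefficients run from the leading term down):

```python
import numpy as np

def strip_leading_zeros(p, tol=1e-9):
    """Remove numerically-zero leading coefficients."""
    nz = np.flatnonzero(np.abs(p) > tol)
    return np.array(p, float)[nz[0]:] if nz.size else np.array([])

def repeated_root_factor(f):
    """Carry out f_(t-1) = q_t f_t + f_(t+1) until the remainder vanishes;
    the polynomial returned has the repeated roots of f as its roots."""
    f0 = np.array(f, float)
    f1 = np.polyder(f0)
    while True:
        _, rem = np.polydiv(f0, f1)
        rem = strip_leading_zeros(rem)
        if rem.size == 0:
            return f1
        f0, f1 = f1, rem

# f(x) = (x - 1)^2 (x + 2) = x^3 - 3x + 2; the repeated root is 1.
ft = repeated_root_factor([1, 0, -3, 2])
print(np.roots(ft))    # approximately [1.]
```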
2. REAL ROOTS

In this section f(x) denotes a polynomial with real coefficients. If f(x) is of odd degree, f(x) has at least one real root, whereas x^2 + 1, for example, has no real roots. Two problems will be considered: (1) establishing existence of real roots, perhaps within prescribed intervals; (2) computing to a satisfactory accuracy the value of a root that has been approximately located.
Graphical Methods. One plots the graph of y = f(x). The roots of odd multiplicity are the values of x at which the curve crosses the x-axis, while at roots of even multiplicity the curve is tangent to the x-axis. If f(x1) and f(x2) have opposite signs, there is a root between x1 and x2. In practice, one could use synthetic division to compute the values of f(x) for a number of values of x within some interval a ≤ x ≤ b. The values a and b can be chosen so that all roots lie between a and b; in particular, all real roots lie in the interval

-(M/|a_n| + 1) ≤ x ≤ M/|a_n| + 1,

where M is the largest of the numbers |a_0|, |a_1|, ..., |a_(n-1)|. Narrower bounds can often be found by inspection. If in computation of f(b) by synthetic division, the third row consists of non-negative numbers, then no real root exceeds b. An alternative criterion is Newton's rule: if the values f(b), f'(b), ..., f^(n)(b) of the successive derivatives are all non-negative, then no root exceeds b. These last two rules can be applied to the equation obtained by replacing x by -x, in order to obtain a lower bound a. The following rule is sometimes useful: if g(x) = x^n f(1/x) and if g has all of its real roots between -b and +b, then f has no real roots between -(1/b) and (1/b).
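The interval bound is one line to compute. An illustrative sketch:

```python
import numpy as np

def real_root_bound(coeffs):
    """coeffs = [a_n, ..., a_1, a_0]; all real roots lie in [-B, B] with
    B = M/|a_n| + 1, M = max(|a_0|, ..., |a_(n-1)|)."""
    a = np.abs(np.array(coeffs, float))
    return a[1:].max() / a[0] + 1.0

coeffs = [1, 0, -3, 2]                 # x^3 - 3x + 2, roots 1, 1, -2
print(real_root_bound(coeffs))         # 4.0
print(np.roots(coeffs))                # all within [-4, 4]
```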
Derivative. The value f'(c) of the derivative at c gives the rate of increase (decrease, if f'(c) < 0) of f(x) at x = c. At an extremum (relative maximum or minimum) of f(x), f'(x) is zero; there can be at most n - 1 such values of x (critical points of f(x)).
ROLLE'S THEOREM. Between each two real roots of f(x) there is at least one critical point.
Descartes and Sturm Tests. Zero is a root of f(x) only if a_0 = 0. By division by x or some power of x, all zero roots can be removed. Information about the number of positive roots is given by:
DESCARTES'S RULE. The signs of the coefficients a_n, a_(n-1), ..., a_0 in order, omitting possible zeros, form a string of +'s and -'s. The number v of alternations in sign is defined as the number of consecutive pairs + - or - +. The number p of positive roots is no greater than v, and v - p is even. (Negative roots of f(x) are the positive roots of f(-x).)
EXAMPLES. x^2 + x + 1 = 0, v = 0, no positive roots; x^2 - 2x + 3 = 0, v = 2, 0 or 2 positive roots; x^2 + 2x - 3 = 0, v = 1, 1 positive root. A more precise criterion is given by:
STURM'S THEOREM. Write f_0(x) = f(x), f_1(x) = f'(x) and, stepwise, f_(t-1)(x) = q_t(x) f_t(x) - f_(t+1)(x), where f_(t+1)(x) is of lower degree than f_t(x). Continue until some f_(m+1)(x) = 0. Now suppose a < b, f(a) ≠ 0, f(b) ≠ 0. Let v(c) be the number of alternations in sign in the sequence of values f_0(c), f_1(c), ..., f_m(c) (zeros omitted). Then v(a) - v(b) is the exact number of distinct real roots between a and b (Ref. 2).
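Sturm's chain can be built with the same polynomial division used for repeated roots above; only the sign of the remainder changes. A numerical sketch (the zero tolerance is illustrative and matters in floating point):

```python
import numpy as np

def strip(p, tol=1e-9):
    nz = np.flatnonzero(np.abs(p) > tol)
    return np.array(p, float)[nz[0]:] if nz.size else np.array([])

def sturm_chain(f):
    chain = [strip(f), strip(np.polyder(np.array(f, float)))]
    while chain[-1].size > 1:            # stop once a constant is reached
        _, rem = np.polydiv(chain[-2], chain[-1])
        rem = strip(rem)
        if rem.size == 0:                # exact division ends the chain
            break
        chain.append(-rem)               # note the minus sign in the theorem
    return chain

def sign_changes(values, tol=1e-9):
    signs = [v > 0 for v in values if abs(v) > tol]    # zeros omitted
    return sum(s != t for s, t in zip(signs, signs[1:]))

def distinct_real_roots(f, a, b):
    chain = sturm_chain(f)
    v = lambda x: sign_changes([np.polyval(p, x) for p in chain])
    return v(a) - v(b)

# x^3 - x has roots -1, 0, 1; exactly two lie between -2 and 0.5.
print(distinct_real_roots([1, 0, -1, 0], -2.0, 0.5))   # 2
```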
Newton's Method. If x_1 is an approximate value of a root of f(x), then one sets

x_(k+1) = x_k - f(x_k)/f'(x_k),   k = 1, 2, ....

The sequence of numbers thus defined converges to a real root of f(x), provided f(x_1)f''(x_1) > 0 and it is known that x_1 lies in an interval containing a root of f(x) but none of f'(x) or of f''(x) (Ref. 2).
3. COMPLEX ROOTS

Let f(z) = a_n z^n + ... + a_0 be a polynomial in the complex variable z, z = x + iy, i = √-1. The coefficients are allowed to be real or complex. If they are real, complex roots of f(z) come in conjugate pairs, x ± yi, so that the total number of nonreal complex roots is even.
If f(z) is of degree 2, 3, or 4, explicit algebraic formulas for all roots are available (Ref. 1). It is proved in Galois theory that similar formulas for equations of higher degree do not exist (Ref. 1).
Equations for Real and Imaginary Parts. Replacement of z by x + iy in the equation f(z) = 0 and equating real and imaginary parts separately to zero leads to two simultaneous equations in the real variables x, y. These can be solved by elimination.
EXAMPLE. z^3 - z + 1 = 0. Replacement of z by x + iy leads to the equations x^3 - 3xy^2 - x + 1 = 0, 3x^2 y - y^3 - y = 0. To find nonreal roots, one assumes y ≠ 0 and is led to the equations 8x^3 - 2x - 1 = 0, y^2 = 3x^2 - 1. The first has one real root x = 0.66. Hence 0.66 ± 0.55i are the nonreal roots of the equation.
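The example can be confirmed numerically; numpy's companion-matrix root finder (a different method from the elimination above, used here only as a check) gives:

```python
import numpy as np

print(np.roots([1, 0, -1, 1]))   # z^3 - z + 1
# about: -1.3247, 0.6624 + 0.5623j, 0.6624 - 0.5623j,
# in agreement with the rounded values 0.66 +/- 0.55i found above.
```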
Application of Argument Principle. The argument principle, when applied to the polynomial f(z), states that the total change in the argument (polar angle) of the complex number w = f(z), as z traces out a simple closed path (circuit) C, equals 2π times the number of zeros of f(z) inside C (provided f(z) ≠ 0 on C). (See Chap. 7, Sect. 5.) The path C can be chosen as a circle, semicircle, square, or other convenient shape, and the variation of the argument of w can be evaluated graphically. One can pass to the limit from a semicircle in order to find the number of roots in a half-plane. This is the basis of the Nyquist criterion (Chap. 21).
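The graphical evaluation of the change in argument can be imitated by sampling the circuit densely and summing the small increments of arg w. The sketch below (illustrative) counts the zeros of a polynomial inside a circle:

```python
import numpy as np

def zeros_inside_circle(coeffs, center=0.0, radius=1.0, samples=4000):
    t = np.linspace(0.0, 2.0 * np.pi, samples)
    z = center + radius * np.exp(1j * t)          # the circuit C
    w = np.polyval(coeffs, z)                     # w = f(z), assumed nonzero on C
    change = np.sum(np.angle(w[1:] / w[:-1]))     # total change of argument
    return int(round(change / (2.0 * np.pi)))

# z^3 - z + 1: the conjugate pair has modulus about 0.87, the real root 1.32.
print(zeros_inside_circle([1, 0, -1, 1], radius=1.0))   # 2
```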
In general, no root can lie outside the circle with center at z = 0 and radius 1 + (M/|a_n|), where M is the largest of |a_0|, |a_1|, ..., |a_(n-1)| (Ref. 2).
Hurwitz-Routh Criterion. This is a rule for determining whether all roots of f(z) lie in the left half-plane (i.e., have negative real parts). For a given sequence c0, c1, ..., cn, ..., one denotes by Δk the k × k determinant

Δ1 = c1,

Δ2 = | c1  c0 |        Δ3 = | c1  c0  0  |
     | c3  c2 |,            | c3  c2  c1 |,
                            | c5  c4  c3 |

and in general

Δk = | c1       c0       0      0    ...  0  |
     | c3       c2       c1     c0   ...  0  |
     | ......................................|
     | c(2k-1)  c(2k-2)  ...            ck   |

(the entry in row i, column j is c(2i-j), with cm = 0 for m < 0). For a given polynomial f(z) = c0 z^n + c1 z^(n-1) + ... + cn with real coefficients and c0 > 0, one forms Δ1, ..., Δ(n-1), with ck replaced by 0 for k > n. All roots of f(z) lie in the left half-plane if and only if Δ1 > 0, Δ2 > 0, ..., Δ(n-1) > 0 (Ref. 3).
Graeffe's Method. Graeffe's method is efficient for finding a complex root, or successively all roots, of a polynomial f(z). For simplicity, suppose that f(z) has no repeated roots, as can always be arranged by the methods indicated above (Sect. 1). One must further suppose that f(z) has a single root r0 of maximum absolute value; if this fails for f(z), it will hold for the new polynomial g(z) = f(z + c) for all but certain special values of c. It is necessary to have some rough idea of the argument of the root r0; for example, if r0 is real, to know whether it is positive or negative.
Starting with the polynomial f(z) = f1(z) = z^n + a1 z^(n-1) + ..., one forms f1(-z). The product f1(z)f1(-z) contains only even powers of z, hence is of the form f1(z)f1(-z) = f2(z^2). Similarly, f3(z) is formed from f2(z): f3(z^2) = f2(z)f2(-z), and the process is continued to form a sequence of polynomials fk(z) = z^n + ak z^(n-1) + .... (As justification note that fk has roots which are the 2^(k-1)th powers of the roots of f; that -ak is the sum of the roots of fk; and hence that the ratio of -ak to r0^(2^(k-1)) approaches 1 as k → ∞.) One chooses a value zk of the 2^(k-1)th root of -ak; the choice of zk is made to agree as closely as possible in argument with the initial estimate for the argument of r0. The successive values z1, z2, ... can be expected to approach r0 rapidly.
After the root of largest absolute value has been found, one could divide out the corresponding factor and proceed to find the root of next largest absolute value. In practice, it is generally more efficient to use an elaboration of Graeffe's method (Ref. 7).
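A sketch of the root-squaring pass, tracking only the absolute value of the dominant root (the sign bookkeeping keeps each iterate monic; following the argument of r0, as the text describes, is omitted here):

```python
import numpy as np

def graeffe_dominant_modulus(f, passes=6):
    f = np.array(f, float) / f[0]                  # make f monic
    n = len(f) - 1
    for _ in range(passes):
        m = f * (-1.0) ** np.arange(n + 1)         # coefficients of (+/-) f(-z)
        g = np.convolve(f, m)                      # product: only even powers survive
        f = g[::2] / g[0]                          # polynomial whose roots are squared
    # -a1 of the current polynomial is close to r0 ** (2 ** passes)
    return abs(f[1]) ** (1.0 / 2.0 ** passes)

print(graeffe_dominant_modulus([1.0, 0.0, -1.0, 1.0]))   # about 1.3247
```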

REFERENCES

1. Garrett Birkhoff and Saunders MacLane, A Survey of Modern Algebra (revised edition), Macmillan, New York, 1953.
2. L. E. Dickson, First Course in the Theory of Equations, Wiley, New York, 1922.
3. E. A. Guillemin, The Mathematics of Circuit Analysis, Wiley, New York, 1949.
4. C. C. MacDuffee, Theory of Equations, Wiley, New York, 1954.
5. J. V. Uspensky, Theory of Equations, McGraw-Hill, New York, 1948.
6. L. Weisner, Introduction to the Theory of Equations, Macmillan, New York, 1938.
7. F. A. Willers, Practical Analysis, Dover, New York, 1948.

A. GENERAL MATHEMATICS

Chapter 3. Matrix Theory
R. C. Lyndon

1. Vector Spaces 3-01
2. Linear Transformations 3-03
3. Coordinates 3-04
4. Echelon Form 3-05
5. Rank, Inverses 3-07
6. Determinants, Adjoint 3-08
7. Equivalence 3-09
8. Similarity 3-10
9. Orthogonal and Symmetric Matrices 3-13
10. Systems of Linear Inequalities 3-14
References 3-17

1. VECTOR SPACES

Let F denote the rational number system, or the real number system, or the complex number system; in the following, elements of F are termed scalars and are denoted by small Roman letters a, b, c, .... A vector space V over F is defined (Ref. 9) as a set of elements called vectors, denoted by small Greek letters α, β, γ, ..., for which the operations of addition: α + β and multiplication by scalars: aα are defined and satisfy the following rules:
(i) For each pair α, β in V, α + β is an element of V and α + β = β + α, α + (β + γ) = (α + β) + γ;

(ii) For each α in V and each a in F, aα is an element of V and, for arbitrary b in F and β in V,

a(α + β) = aα + aβ,   (a + b)α = aα + bα,   a(bα) = (ab)α,   1α = α;

(iii) For given α, β in V, there is a unique vector γ in V such that α + γ = β. In particular, there is a unique vector, denoted by 0, such that α + 0 = α for all α in V.

When F is the real number system, V is called a real vector space; when F is the complex number system, V is a complex vector space. The system F can be chosen more generally as a field (Ref. 9). The vectors of mechanics in 3-dimensional space form a real vector space V. In terms of a coordinate system, the elements of V are ordered triples (x, y, z) of real numbers; addition and multiplication by real scalars are defined as follows:

(x1, y1, z1) + (x2, y2, z2) = (x1 + x2, y1 + y2, z1 + z2),
a(x, y, z) = (ax, ay, az).

A vector α is said to be a linear combination of vectors α1, ..., αn if

α = a1α1 + ... + anαn

for appropriate choice of a1, ..., an. An ordered set {α1, ..., αn} is said to be independent if no member of the set is a linear combination of the others or, equivalently, if

a1α1 + ... + anαn = 0

implies a1 = 0, ..., an = 0. If the ordered set S = {α1, ..., αn} is independent and α is a linear combination of its elements (is linearly dependent on S), then the scalars a1, ..., an can be chosen in only one way so that α = Σi aiαi.
If there is a finite set S = {α1, ..., αn} such that every α in V is linearly dependent on S, then V is said to be of finite dimension. For the remainder of this chapter, only vector spaces of finite dimension will be considered; this is, however, not the only case of importance. If S = {α1, ..., αn} is independent and every α of V is a linear combination of these vectors, then S is said to constitute a basis for V. Every finite dimensional vector space has at least one basis; all bases have the same number, n, of elements; n is the dimension of V.
A subset W of V is said to be a subspace of V if, with the operations as
defined in V, W is itself a vector space. A subset W will be a subspace of
V if, whenever α, β are in W, α + β is in W, and aα is in W for every a in
F. In particular, {0} is a subspace, as is V itself. The intersection (Chap.
1, Sect. 1) W ∩ U of two subspaces of V is a subspace of V; it is the largest
subspace contained in both W and U. The union W ∪ U is not usually a
subspace; the smallest subspace containing W and U is rather their (linear)
sum W + U, consisting of all vectors α + β, α in W, and β in U. If
W ∩ U = 0, then W + U is called a direct sum, and is often denoted by
W ⊕ U; in this case every vector in W + U is expressible
uniquely as α + β, α in W, β in U. For any set of vectors {α1, ..., αn},
the set of all their linear combinations constitutes a subspace, the subspace spanned by them. Every independent set is a subset of a basis.
From this it follows that, for each subspace W, there exists U (in general,
many) such that V is the direct sum of W and U.

2. LINEAR TRANSFORMATIONS

Let f be a transformation (function, mapping) (Chap. 1, Sect. 3) of
vector space V into a second space V′; f is said to be linear if for all α, β,
a, b,

f(aα + bβ) = af(α) + bf(β).

The image of V under f, denoted by f(V), is the set of all vectors f(α) for
α in V; f(V) is a subspace of V′. If f(V) = V′, f is said to map V onto V′.
The null space of f, denoted by N(f), is the set of all vectors α in V such
that f(α) = 0; N(f) is a subspace of V. If N(f) contains only the element 0, f is said to be nonsingular; this is equivalent to the condition that
f be one-to-one (Chap. 1, Sect. 3); a nonsingular transformation is termed
an isomorphism of V onto f(V). The rank of f is defined as the dimension
of f(V); this equals the dimension of V minus that of N(f). The mapping
f is nonsingular if and only if its rank is maximal, that is, equals the dimension of V. If W is chosen so that V is the direct sum of N(f) and W,
and W has dimension greater than 0, then the restriction of f to W is a
nonsingular mapping of W onto f(V), that is, an isomorphism of W onto
f(V). If f is an isomorphism of V onto V′, then the inverse transformation
f⁻¹ is a linear transformation of V′ onto V.
The set of all linear transformations of V into V′ becomes itself a vector
space over F, if addition and multiplication by scalars are defined by the
rules:
f + g is the transformation such that (f + g)α = f(α) + g(α) for all α
in V;
af is the transformation such that (af)α = a[f(α)] for all α in V.
If f maps V into V′ and g maps V′ into V″, following f by g defines the
composite transformation fg of V into V″; explicitly, fg(α) = g[f(α)]. If f,
g are linear, so also is fg (Refs. 2, 8, 9).

3. COORDINATES

Let {α1, ..., αn} be a basis for the vector space V, so that every vector
α in V can be written uniquely in the form Σ aiαi. The ai are the coordinates of α relative to the chosen basis; the ai are also termed components,
but this word is sometimes used for the terms aiαi. The choice of a definite
basis is often necessary for computation. With a fixed basis understood,
one can replace each vector α by the corresponding n-tuple (a1, ..., an);
then α + β corresponds to (a1 + b1, ..., an + bn) and cα to (ca1, ..., can).
A basis that is natural at one stage of a problem may not be the most
advantageous at a later stage, so that one must be prepared to change
bases.
If a basis α1, ..., αn is chosen for V and a basis α′1, ..., α′m for V′,
then each linear transformation of V into V′ can be assigned coordinates
as follows. The transformation f is fully determined by the images f(αi)
of the basis elements for V. If f(αi) = Σj aij α′j, then f may be characterized
by the n·m scalars aij, where i = 1, ..., n, j = 1, ..., m. These numbers
are usually thought of as arranged in a rectangular array, or matrix

      | a11  a12  ...  a1m |
A  =  | a21  a22  ...  a2m |  = (aij).
      | ..................  |
      | an1  an2  ...  anm |

One terms A the matrix representing the transformation f relative to the
given bases in V and V′.
If g is a second transformation from V into V′, with matrix B = (bij),
it is clear that the transformation f + g will have the matrix (aij + bij).
Accordingly, one defines the sum of two n by m matrices as follows:

A + B = (aij) + (bij) = (aij + bij).

Similarly, the product cA, which represents cf, is defined as the matrix
(caij).
Now let f be a linear transformation of V into V′, g a linear transformation of V′ into V″, where V, V′ have bases as before, and V″ has a basis
α″1, ..., α″p. Relative to these bases, f is represented by an n by m


matrix A = (aij), g by an m by p matrix B = (bij), fg by an n by p matrix
C = (cij). Since

(fg)(αi) = Σj Σk aij bjk α″k,

one finds

cik = Σ (j = 1 to m) aij bjk;

correspondingly, one defines the product of two matrices A and B (where
the number of columns of A equals the number of rows of B) to be the
matrix C = AB, where the elements cik of C are given by the above "row-by-column" rule. Multiplication of matrices is not commutative, but is
associative and distributive: A(BC) = (AB)C, A(B + C) = AB + AC,
(A + B)C = AC + BC.
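In modern notation the row-by-column rule translates directly into a short Python sketch (the helper name mat_mul and the list-of-rows representation are illustrative assumptions, not part of the text):

    def mat_mul(A, B):
        # Row-by-column rule: c_ik = sum over j of a_ij * b_jk.
        n, m, p = len(A), len(B), len(B[0])
        assert all(len(row) == m for row in A), "columns of A must equal rows of B"
        return [[sum(A[i][j] * B[j][k] for j in range(m)) for k in range(p)]
                for i in range(n)]

For instance, mat_mul([[1, 2]], [[3], [4]]) returns [[11]], the 1 by 1 product of a 1 by 2 matrix and a 2 by 1 matrix.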
If α = Σ aiαi is a vector with coordinate representation (a1, ..., an), one
can regard the n-tuple as a 1 by n matrix. The product αA can then be
evaluated as that of a 1 by n matrix and an n by m matrix. The result is
the 1 by m matrix

αA = (Σ (i = 1 to n) ai ai1, ..., Σ (i = 1 to n) ai aim),

which represents f(α):

f(α) = f(Σi aiαi) = Σi ai f(αi) = Σi Σj ai aij α′j.

This shows that, when bases are chosen in V and V′, each matrix A is the
matrix of a linear transformation (Refs. 2, 9).
4. ECHELON FORM

The matrix A associated with a linear transformation f from V to V′
can be given an especially simple form by suitable choice of basis for V,
for V′, or for both. We consider the effect of a change of basis for V.
Every change of basis for V can be effected by a sequence of elementary
transformations of the following types: (1) replacement of αi by a scalar
multiple cαi, c ≠ 0; (2) renumbering, interchanging αi and αj; (3) adding to
αi some multiple of αj, j ≠ i, so that αi is replaced by αi + cαj (and αj is
left unchanged). The effect of each transformation is to carry out the
analogous operation on the rows of the matrix A = (aij). Thus (1) multiplies each element of the ith row by c, (2) interchanges ith and jth rows,
(3) replaces the ith row by (ai1 + caj1, ..., aim + cajm).


A matrix is said to be in (strict) echelon form (Ref. 9) if:
(i) The leading element (first nonzero element) in each nonzero row appears farther to the right than that of any preceding row;
(ii) The leading elements are all 1;
(iii) Only zeros appear in the same column with a leading element;
(iv) All zero rows (if any) appear at the bottom.
By a zero row (or column) is meant one consisting wholly of zeros.
EXAMPLE. The following matrix is in echelon form.

      | 0  1  3  0  0  5 |
A  =  | 0  0  0  1  0  7 |
      | 0  0  0  0  1  2 |
      | 0  0  0  0  0  0 |

Each matrix can be reduced to echelon form by elementary transformations on its rows, as follows:
Step 1. If the first column is a zero column, leave it untouched and
proceed to the matrix formed by the remaining columns. If the first
column is not a zero column, permute rows so that a11 ≠ 0. Dividing
this row by a11 gives a new matrix with a11 = 1. Subtracting suitable
multiples of this row from the other rows makes all ai1 = 0 for i ≠ 1.
The matrix now has a first column which is a zero column, or else it has
all zeros except for a 1 in the top position. Leave the first row and column
untouched and proceed to the matrix formed by the elements not in the
first row or column.
Step 2. Repeat this process as long as possible. The resulting matrix
will satisfy (i), (ii), and (iv).
Step 3. To obtain (iii), subtract suitable multiples of each row from
earlier rows to convert the elements in these rows above the leading element of the given row into zeros.
The result may be stated as follows:
Every matrix is row-equivalent to an echelon matrix and (it can be shown)
to a unique echelon matrix.
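Steps 1 to 3 can be carried out mechanically; the following Python sketch (the name echelon, the floating-point representation, and the tolerance eps are illustrative assumptions) reduces a matrix, given as a list of rows, to the strict echelon form just described:

    def echelon(A, eps=1e-12):
        # Reduce a matrix (list of rows of floats) to strict echelon form
        # by elementary row transformations, following Steps 1-3.
        A = [row[:] for row in A]              # work on a copy
        n = len(A)
        m = len(A[0]) if n else 0
        r = 0                                  # row where the next leading 1 goes
        for c in range(m):
            # Step 1: find a row with a nonzero entry in column c; swap it up.
            pivot = next((i for i in range(r, n) if abs(A[i][c]) > eps), None)
            if pivot is None:
                continue                       # zero column: leave it untouched
            A[r], A[pivot] = A[pivot], A[r]
            lead = A[r][c]
            A[r] = [x / lead for x in A[r]]    # make the leading element 1
            # Step 3: clear the rest of column c, above and below the leading 1.
            for i in range(n):
                if i != r and abs(A[i][c]) > eps:
                    f = A[i][c]
                    A[i] = [x - f * y for x, y in zip(A[i], A[r])]
            r += 1
        return A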
Application to Systems of Equations (Ref. 9). A system of m
linear equations in n unknowns

Σ (j = 1 to n) aij xj = ci    (i = 1, ..., m)

can be replaced by a single matrix equation

AX = C,


where A = (aij) and X, C are column vectors:

X = col (x1, ..., xn),   C = col (c1, ..., cm).

Let B be the augmented matrix of the system, obtained by adjoining −C
as (n + 1)st column to A. The usual manipulations of equations employed
to successively eliminate (so far as possible) the unknowns x1, x2, ..., xn
correspond to elementary transformations on the matrix B. If the result
were the echelon matrix of the above example, one would have obtained
the equivalent system:

x2 + 3x3 + 5 = 0,   x4 + 7 = 0,   x5 + 2 = 0.

Since x1, x3 do not appear in leading terms, they can be assigned arbitrary
values; the general solution can be obtained immediately from the given
equations:

x1 arbitrary,   x2 = −5 − 3x3,   x3 arbitrary,   x4 = −7,   x5 = −2.

If a row (0 0 ... 0 1) had appeared, there would be an equation 1 = 0, as a
consequence of the original system, which would therefore be inconsistent
and have no solution.
5. RANK, INVERSES

The rank of a linear transformation f of V into V′ was defined (Sect. 2)
as the dimension of the image space f(V). If f has matrix A, then f(V) is
the row-space of A; that is, the subspace of V′ spanned by the vectors consisting of the rows of A. The rank of A is defined as the dimension of the
row-space of A; hence the rank of A equals the rank of f. It can be shown
that the rank of A also equals the dimension of the column-space of A.
The rank is unaltered by elementary transformations and can be determined by inspection for an echelon matrix, where it is simply the number
of nonzero rows.
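In code the rank can be read off from the echelon form; a minimal sketch, reusing the hypothetical echelon routine of Sect. 4:

    def rank(A, eps=1e-12):
        # Rank = number of nonzero rows of the echelon form.
        return sum(1 for row in echelon(A) if any(abs(x) > eps for x in row))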
Let f be a one-to-one linear transformation of V onto V′, so that f has a
linear inverse f⁻¹ (Sect. 2). The spaces V, V′ must have the same dimension m, and f, f⁻¹ are represented by nonsingular square matrices A, B such
that AB = BA = I, where

          | 1  0  ...  0 |
I = Im =  | 0  1  ...  0 |
          | ...........  |
          | 0  0  ...  1 |

is the m by m identity matrix. If A is an arbitrary square nonsingular
matrix, there exists a unique inverse A⁻¹ such that AA⁻¹ = I (which implies A⁻¹A = I). Hence B must be A⁻¹. The echelon matrix for a square
nonsingular matrix A is I; the inverse A⁻¹ may be obtained by applying
to I the same sequence of elementary transformations that carry A into
its echelon form I. The inverse has the properties

(cA)⁻¹ = c⁻¹A⁻¹,   (AB)⁻¹ = B⁻¹A⁻¹.
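The procedure of applying the same transformations to I amounts to reducing the augmented matrix [A | I]; a Python sketch under the same assumptions as before (echelon is the illustrative routine of Sect. 4):

    def inverse(A, eps=1e-12):
        # Reduce [A | I]; the row operations that carry A to I carry I to A^-1.
        n = len(A)
        aug = [list(row) + [float(i == j) for j in range(n)]
               for i, row in enumerate(A)]
        red = echelon(aug)
        if any(abs(red[i][i] - 1.0) > eps for i in range(n)):
            raise ValueError("matrix is singular")
        return [row[n:] for row in red]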
6. DETERMINANTS, ADJOINT

By a permutation p of the set of integers 1, 2, ..., m is meant a function
p: k → k′ = p(k) which is a one-to-one transformation of this set onto
itself (Ref. 2). Each such permutation is classified as even or odd according
as the polynomials in m variables

Π (i < j) (xi − xj)   and   Π (i < j) (x p(i) − x p(j))

are the same or negatives of each other.
EXAMPLE. If m = 3, and p(1) = 3, p(2) = 1, p(3) = 2, then p is even,
since

(x3 − x1)(x3 − x2)(x1 − x2) = (x1 − x2)(x1 − x3)(x2 − x3).

One denotes by sgn p the value 1 if p is even, the value −1 if p is odd.
The determinant (Refs. 1, 9) det A of a square m by m matrix A = (aij)
is defined to be the scalar

det A = Σp sgn p · a1p(1) · a2p(2) · ... · amp(m),

where the sum is over all permutations p of 1, 2, ..., m. If A is singular,
det A = 0. For nonsingular A, det A ≠ 0 and det A is (−1)^h times the
product of the scalars c appearing in the elementary transformations of type
(1) (Sect. 4) used in reducing A to the echelon form I, where h is the number of transformations of type (2).


Let Aij denote the submatrix of A obtained by deleting the ith row and
jth column. Then for any fixed i,

det A = Σ (j = 1 to m) (−1)^(i+j) aij det Aij;

there is an analogous result for expansion according to a fixed column j.
One calls det Aij the minor of aij, and the expansions of det A are called
expansions by minors.
EXAMPLE.

det | a11  a12  a13 |
    | a21  a22  a23 |  = a11(a22 a33 − a23 a32) − a12(a21 a33 − a23 a31)
    | a31  a32  a33 |      + a13(a21 a32 − a22 a31).
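Expansion by minors gives a direct, if inefficient (of order m!), recursive computation; a Python sketch expanding along the first row (the name det is illustrative):

    def det(A):
        # Expansion by minors along the first row:
        # det A = sum over j of (-1)^j * A[0][j] * det(minor of A[0][j]).
        if len(A) == 1:
            return A[0][0]
        total = 0
        for j in range(len(A)):
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det(minor)
        return total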

The adjoint (adj) A of a square matrix A is the matrix B = (bij), where

bij = (−1)^(i+j) det Aji

(note the reversal of indices). One has the rule

adj A · A = (det A) · I

and, if det A ≠ 0,

A⁻¹ = (det A)⁻¹ · adj A,   adj A = (det A) · A⁻¹.

CRAMER'S RULE. If det A ≠ 0, the system

Σ (j = 1 to m) aij xj = ci    (i = 1, ..., m)

has a unique solution

xi = det A(i) / det A,

where A(i) is the matrix obtained from A by replacing the ith column by
c1, ..., cm (Refs. 2, 9).
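Cramer's rule likewise translates directly into code; a sketch using the det routine above (all names illustrative), practical only for small systems:

    def cramer_solve(A, c):
        # Solve the system with matrix A and right-hand side c by Cramer's rule.
        d = det(A)
        if d == 0:
            raise ValueError("det A = 0: Cramer's rule does not apply")
        x = []
        for i in range(len(A)):
            # A(i): A with its ith column replaced by c1, ..., cm.
            Ai = [row[:i] + [c[k]] + row[i + 1:] for k, row in enumerate(A)]
            x.append(det(Ai) / d)
        return x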

7. EQUIVALENCE

Let f be a linear transformation of V into V′. It has been seen (Sect. 4)
that the matrix A for f can be put in echelon form by a suitable change of
basis in V. If V′ is not the same space as V, one can further simplify A
by independently changing the basis for V′. This effects elementary transformations on the columns of A; by successive subtractions of multiples of
earlier columns from later ones, followed possibly by a renumbering of the
basis, A can be reduced to the form

Jr = | Ir  0 |
     | 0   0 |

where Ir is the r by r identity matrix (Sect. 5) and the 0's stand for rows
and columns consisting wholly of zeros; Jr is a rectangular n by m matrix,
just as was the given matrix A. For the matrix in echelon form in the
example of Sect. 4 the matrix Jr would be

| 1  0  0  0  0  0 |
| 0  1  0  0  0  0 |
| 0  0  1  0  0  0 |
| 0  0  0  0  0  0 |

The effect of a change of basis in V is to replace A by PA, where P is a
nonsingular n by n matrix; the effect of a change of basis in V′ is to replace A by AQ, where Q is a nonsingular m by m matrix. The matrix B is
said to be equivalent to matrix A if B = PAQ for some nonsingular P and
Q. This is a proper equivalence relation (Chap. 1, Sect. 5). The reasoning
given above then gives the conclusion: Every A is equivalent to a unique
matrix of the form Jr. In other words, the matrices Jr (for various r, m,
and n) are a set of canonical forms under equivalence (Ref. 9).
8. SIMILARITY

One now considers the possible matrices A representing a linear transformation f of the vector space V into itself. The field F of scalars will be
assumed to be the complex number system. Since V′ = V, one can no
longer change bases in V and V′ independently. Indeed, let α′i = Σj pij αj
be equations defining a new basis α′1, ..., α′n in V. Then P = (pij) is a
nonsingular matrix with inverse P⁻¹ = (qij), and αk = Σh qkh α′h. Let f
have the matrix A = (ajk) relative to the basis αi, so that f(αj) = Σk ajk αk.
Then

f(α′i) = Σh (Σj Σk pij ajk qkh) α′h,

and f has the matrix PAP⁻¹ relative to the basis α′1, ..., α′n. The square
matrix B is said to be similar to square matrix A if B = PAP⁻¹ for some
nonsingular matrix P. Hence change of basis in V replaces the matrix of
f by a similar matrix. Similarity is an equivalence relation (Chap. 1, Sect.
5) in the class of square matrices (Ref. 9).
If A can be reduced to a similar matrix of sufficiently simple form, most


of the important properties of A can be read off. The ideal situation is
that in which A is similar to a diagonal matrix; that is, a matrix (aij) in
which aij = 0 for i ≠ j. Unfortunately, not every A is similar to a diagonal
matrix, and the various canonical forms are approximations to the diagonal
form.
If A is similar to

                            | λ1  0   ...  0  |
B = diag (λ1, ..., λn)  =   | 0   λ2  ...  0  |
                            | ..............  |
                            | 0   0   ...  λn |

then in terms of the new basis α1, ..., αn associated with B one has

f(α1) = α1B = λ1α1, ..., f(αn) = αnB = λnαn.
In general, if a vector α ≠ 0 is such that αA = λα for some scalar λ, then
λ is called an eigenvalue (characteristic value, latent root) of A, and α is
called an eigenvector belonging to λ. The characteristic polynomial for A is
the polynomial φ(x) = det (xI − A); this is a polynomial

φ(x) = c0 + c1x + ... + cn x^n

of degree n, and its n roots (real or complex) are the eigenvalues of A. In
particular,

(−1)^n c0 = det A = λ1·λ2· ... ·λn,
−c(n−1) = a11 + ... + ann = λ1 + ... + λn = trace of A, and cn = 1.

The HAMILTON-CAYLEY THEOREM (Ref. 9) states that A satisfies its characteristic equation:

φ(A) = c0 I + c1 A + ... + cn A^n = 0.

If the roots of φ(x) are distinct, then A is similar to B = diag (λ1, ..., λn).
In fact, let φ(x) be a factor of the kth power of a polynomial ψ(x), whose
roots are the distinct numbers λ1, ..., λp; if ψ(A) = 0, then A is similar
to a diagonal matrix; if ψ(A) ≠ 0, then A is not similar to a diagonal
matrix.
In the general case of repeated roots, the matrix A is similar to a matrix
B in Jordan normal form; that is, a matrix (in partitioned form, see Ref. 9,
Sect. 2.8)

B = diag (B1, ..., Bs),


where the Bi are square matrices of form

      | λi  1   0   ...  0  |
      | 0   λi  1   ...  0  |
Bi =  | ..................  |
      | 0   0   ...  λi  1  |
      | 0   0   ...  0   λi |

and λ1, ..., λs are not necessarily distinct. In the matrix B each characteristic root λ appears on the diagonal a number of times equal to its multiplicity.
An alternative rational canonical form for matrix A has the form B =
diag (B1, ..., Bp), where Bi has the form of a companion matrix

      | 0    1    0    ...  0   |
      | 0    0    1    ...  0   |
Bi =  | ......................  |
      | 0    0    0    ...  1   |
      | ci1  ci2  ci3  ...  cik |

If A has rational (real) entries, the Bi can be chosen so that the cij are
rational (real).
If A is a real matrix, the eigenvalues λ need not be real but, since φ(x)
has real coefficients, they will occur in conjugate complex pairs. In this
connection it is useful to note that the matrices

| r cos θ   −r sin θ |        | r(cos θ + i sin θ)          0           |
| r sin θ    r cos θ |        | 0            r(cos θ − i sin θ)         |

are similar.
When A is of small degree or is otherwise especially simple, its eigenvalues and eigenvectors can be found by explicit calculation from the definitions given above; often they can be found from the physical interpretation of the problem. Determination of eigenvalues is a problem in solving
an algebraic equation (Chap. 2), but other methods are available (Ref. 4).
If λ is an eigenvalue having absolute value greater than that of all other
eigenvalues and α is any reasonable approximation to an eigenvector belonging to λ (α must not lie in the subspace spanned by the eigenvectors
of the remaining eigenvalues), then the sequence α = α1, α2, ..., αn, ...,
where αi+1 = αiA/ci and ci is the first nonzero coefficient of αi, will converge to an eigenvector for λ.
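This iteration (the power method) is easily programmed; a minimal numpy sketch using the chapter's row-vector convention αA (the step count and starting vector are arbitrary choices):

    import numpy as np

    def power_iteration(A, a, steps=50):
        # a_{i+1} = a_i A / c_i, with c_i the first nonzero coefficient of a_i;
        # converges to an eigenvector of the dominant eigenvalue.
        a = np.asarray(a, dtype=float)
        for _ in range(steps):
            c = a[np.flatnonzero(a)[0]]
            a = (a @ A) / c
        return a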


9. ORTHOGONAL AND SYMMETRIC MATRICES

Let V be a real vector space, with basis α1, ..., αn, so that each vector
has coordinates (a1, ..., an). The inner product (Ref. 9) (α, β) of the
vectors α = (a1, ..., an), β = (b1, ..., bn) is defined as the scalar

(α, β) = a1b1 + ... + anbn.

The norm of α is the scalar |α| = (α, α)^(1/2). The angle θ between α, β is defined by the equation

(α, β) = |α||β| cos θ.

These definitions are relative to the given basis but are unaffected if a
new orthonormal basis is introduced; that is, a basis α′1, ..., α′n such that
(α′i, α′j) = δij = 1 or 0 according as i = j or i ≠ j. If

α′i = Σj aij αj,

then the matrix A = (aij) has as its inverse the transposed matrix Aᵀ =
(bij), where bij = aji; that is, AAᵀ = I. A matrix with this property is
called orthogonal. Since det A = det Aᵀ, and det A · det Aᵀ = det I = 1,
one concludes that det A = ±1. When det A = 1, A is called proper
orthogonal and is a product of rotations; if det A = −1, A is a product of
rotations and one reflection, so that orientation is reversed. The eigenvalues
of an orthogonal matrix all have absolute value equal to 1.
A real matrix A = (aij) is termed symmetric if A = Aᵀ; if, further, the
quadratic form Σ(i,j) aij xi xj is > 0 except when x1 = ... = xn = 0, then A
and the quadratic form are said to be positive definite. The x's can be
interpreted as coordinates of a vector α with respect to a given basis; then
Σ(i,j) aij xi xj = (αA, α). If a new basis is chosen (not necessarily orthonormal), the form is replaced by a new quadratic form. When A is positive
definite, the new basis can be chosen so that (αA, α) has the form Σi xi²;
this is equivalent to the statement that A can be written as PPᵀ, where P
is nonsingular. If A is symmetric, but not necessarily positive definite,
the new basis can be chosen so that (αA, α) has the form

y1² + ... + yr² − y(r+1)² − ... − ys²,

where the numbers r, s are uniquely determined by A. This is equivalent
to the statement that there exists a nonsingular matrix P such that PAPᵀ
= B, where B = (bij), bij = 0 for i ≠ j, bij = 0 or ±1 for i = j. (One
terms B congruent to A.)
The eigenvalues of a symmetric matrix A are all real, and A is similar
to a real diagonal matrix C; indeed C = PAP⁻¹, where P may be chosen
to be orthogonal (Ref. 9).


An analogous theory holds for complex vector spaces. The inner product
is defined as

(α, β) = a1 b̄1 + ... + an b̄n,

so that (α, α) = Σi |ai|² > 0; the norm of α is defined to be (α, α)^(1/2).
Orthogonal matrices are replaced by unitary matrices, defined by the condition AĀᵀ = I, where the bar denotes replacement of each entry by its
conjugate. Symmetric matrices are replaced by Hermitean matrices, defined by the condition A = Āᵀ.
10. SYSTEMS OF LINEAR INEQUALITIES

Let V be a real vector space with fixed basis {α1, ..., αn} as in Sect. 9,
so that each vector α has coordinates (a1, ..., an). If the vector β has
coordinates (b1, ..., bn), then one writes

β > α or α < β   if   ai < bi   (i = 1, ..., n),
β ≥ α or α ≤ β   if   ai ≤ bi   (i = 1, ..., n),
β ≩ α or α ≨ β   if   α ≤ β but α ≠ β.

The relation ≥ is a partial order; the relations < and ≩ are antisymmetric
and transitive but not reflexive (Chap. 1, Sects. 4 and 7). A vector α is
said to be
non-negative if α ≥ 0,
positive if α ≩ 0,
strictly positive if α > 0.
The set Q of all non-negative vectors is called the positive orthant in V.
A positive vector α such that a1 + ... + an = 1 is called a probability
vector.
For fixed α and real number k, the set of all vectors ξ = (x1, ..., xn)
such that

(α, ξ) + k = a1x1 + ... + anxn + k ≥ 0

is a closed set (Chap. 1, Sect. 8) called a half-space H. For example, the
solutions of 2x1 + 3x2 − 6 ≥ 0 form the half-space (half-plane) in two-dimensional space, as shaded in Fig. 1. Similarly, the solutions of (α, ξ) +
k > 0 constitute an open half-space H°. The solutions of (α, ξ) + k = 0
constitute a hyperplane Π which is the boundary of both H and H°. By a
system of linear inequalities is meant a set of relations

(ακ, ξ) + kκ Rκ 0,

where the index κ ranges over a given set (possibly infinite), and for each
κ, Rκ is one of the relations >, ≩, ≥, =. For example,

x1 + 5x2 > 0,   x1 + x3 = 0

is a system of linear inequalities. By a solution of the system of inequalities is meant a vector ξ = (x1, ..., xn) which satisfies all the inequalities.

FIG. 1. Half-space in two dimensions.

With each inequality is associated a half-space (or hyperplane) Hκ. The
set of all solutions of the system is the intersection of all Hκ.
Convexity. The vector α is said to be a convex combination of vectors
α1, ..., αm if

α = p1α1 + ... + pmαm,   p1 + ... + pm = 1,   pi ≥ 0 (i = 1, ..., m).

A nonempty set K in V is said to be convex if it contains all convex combinations of its vectors. If K is interpreted as a point set in n-dimensional
space, K is convex if and only if, for each pair of points α1, α2 in K, the
line segment joining α1 to α2 lies in K. (See Fig. 2.) A half-space H is
said to be a support for a convex set K in V if K is a subset of H. If,
moreover, Π contains n − 1 independent vectors of K, H is called an extreme support for K.
Let T be a subset of V. The set of all convex combinations of vectors
in T is a convex set, called the convex closure of T. A convex set K is said
to be finitely generated if it is the convex closure of a finite subset of K.

FIG. 2. Convex set.

A set T is said to be bounded if, for some constant M,

|a1| + ... + |an| ≤ M for all α in T.

DOUBLE DESCRIPTION THEOREM. If K is a finitely generated convex set
in V, then K is the intersection of a finite number of supports; moreover, if K
spans V, then K is the intersection of its extreme supports. Conversely, if the
intersection of a finite collection of half-spaces is nonempty and bounded, then
it is finitely generated.
This theorem, when formulated in algebraic terms, is known as Farkas'
Lemma:
FARKAS' LEMMA (strong nonhomogeneous form). Let V be n-dimensional real vector space, let V′ be m-dimensional real vector space; fixed
bases are assumed chosen in each. Let A be an n by m matrix, let δ be a
vector in V′, let k be a scalar, and suppose that there is at least one vector
φ in V such that φA ≥ δ. Then a vector α in V will satisfy the condition
(α, φ) ≥ k for all φ for which φA ≥ δ if and only if there exists a vector
η ≥ 0 in V′ such that α = ηAᵀ and (η, δ) ≥ k.
FARKAS' LEMMA (weaker homogeneous form, k = 0, δ = 0). Let A be
an n by m matrix. Then a vector α in V will satisfy the condition
(α, φ) ≥ 0 for all φ for which φA ≥ 0 if and only if there exists a vector
η ≥ 0 in V′ such that α = ηAᵀ.
Farkas' Lemma can be used as a foundation for the Minimax Theorem in
game theory and for the Duality Theorem in linear programming. These
theorems can also be deduced from the following one:


THEOREM. Let A = (aij) and B be n by m matrices with aij > 0 for all i, j.
Then there exist probability vectors ξ in V and η in V′ and a unique scalar k
such that

(kA − B)η ≤ 0   and   ξ(kA − B) ≥ 0;

in the first inequality η is regarded as an m by 1 matrix.
For all of Sect. 10, see Refs. 5, 6, 9.

REFERENCES
1. A. C. Aitken, Determinants and Matrices, Interscience, New York, 1954.
2. Garrett Birkhoff and Saunders MacLane, A Survey of Modern Algebra (Revised
edition), Macmillan, New York, 1953.
3. M. Bocher, Introduction to Higher Algebra, Macmillan, New York, 1930.
4. R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge
University Press, Cambridge, England, 1938.
5. T. C. Koopmans (Editor), Activity Analysis of Production and Allocation (Cowles
Commission Monograph No. 13), Wiley, New York, 1951.
6. H. W. Kuhn and A. W. Tucker (Editors), Contributions to the Theory of Games,
Vols. I, II, III (Annals of Mathematics Studies Nos. 24, 28, 38), Princeton University
Press, Princeton, N. J., 1950, 1953, 1956.
7. C. C. MacDuffee, The Theory of Matrices, Chelsea, New York, 1946.
8. C. C. MacDuffee, Vectors and Matrices, Mathematical Association of America,
Buffalo, N. Y., 1943.
9. R. M. Thrall and L. Tornheim, Vector Spaces and Matrices, Wiley, New York, 1957.
10. J. H. M. Wedderburn, Lectures on Matrices, American Mathematical Society,
New York, 1934.

A

GENERAL MATHEMATICS

Chapter 4

Finite Difference Equations

G. E. Hay

1. Definitions                                                     4-01
2. Linear Difference Equations                                     4-03
3. Homogeneous Linear Equations with Constant Coefficients        4-04
4. Nonhomogeneous Linear Equations with Constant Coefficients     4-05
5. Linear Equations with Variable Coefficients                    4-07
References                                                         4-08

1. DEFINITIONS

By a difference equation is meant an equation relating the values of an
unspecified function f at x, x + h, x + 2h, ..., x + nh, where h is fixed.
For example,

(1)  f(x + 3) − f(x + 2) − xf(x + 1) − 2f(x) = x²

is a difference equation, in which h = 1, n = 3. The variable x will generally be assumed to vary over the discrete set of real values x0 + ph
(p = 0, ±1, ±2, ...), where x0 is a constant. By proper choice of origin
x0 and scale one can make x0 = 0, h = 1, so that x varies over the integers
0, ±1, ±2, .... In the subsequent discussion, this simplification will be
assumed made, so that x varies over the integers and the difference equation thus relates the values of f at x, x + 1, ..., x + n. For the case
when x varies continuously, see the Remarks at the end of Sect. 3. The values
of f are assumed to be real, although much of the theory extends to the
case in which f has complex values.


A general difference equation is constructed from a function ψ(x, y0, y1,
..., yn) of the integer variable x and the n + 1 real variables y0, ..., yn.
The difference equation is the equation

(2)  ψ(x, f(x), f(x + 1), ..., f(x + n)) = 0.

By a solution of the difference equation is meant a function f which satisfies
it identically. When the equation takes the form

(3)  f(x + n) = φ(x, f(x), ..., f(x + n − 1))

and φ(x, y0, ..., y(n−1)) is defined for all values of x, y0, ..., y(n−1), eq. (3) is
simply a recursion formula. If f(0), f(1), ..., f(n − 1) are given arbitrary
values, then eq. (3) determines successively f(n), f(n + 1), ...; thus there
is a unique solution for x ≥ 0 with the given initial values f(0), f(1), ...,
f(n − 1).
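Viewed as a recursion formula, eq. (3) is immediate to program; a Python sketch (the names iterate and phi are illustrative), with the Fibonacci equation f(x + 2) = f(x) + f(x + 1) as example:

    def iterate(phi, initial, steps):
        # Unique solution of f(x + n) = phi(x, f(x), ..., f(x + n - 1))
        # for x >= 0 with initial values f(0), ..., f(n - 1).
        f = list(initial)
        n = len(initial)
        for x in range(steps):
            f.append(phi(x, *f[x:x + n]))
        return f

    print(iterate(lambda x, y0, y1: y0 + y1, [0, 1], 10))
    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]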
The first difference of f(x) is Δf = f(x + 1) − f(x); the second difference
is Δ²f = Δ(Δf) = f(x + 2) − 2f(x + 1) + f(x); the kth difference is Δᵏf.
A difference eq. (2) can be written in terms of f and its differences. For
example, eq. (1) is equivalent to the equation

(1′)  Δ³f + 2Δ²f + (1 − x)Δf − (x + 2)f = x².

Conversely, an equation relating f, Δf, ..., Δⁿf can be written in form (2).
Thus, eq. (2) is the general form for difference equations, and this form
will be used throughout this section, in preference to an equation relating
the differences of f.
The order of the difference eq. (2) is defined as the distance between the
most widely separated x-values at which the values of f are related. If ψ
definitely depends on f(x) and f(x + n), then the order is n. However,
the order may be less than n. For example, the equation

(4)  f(x + 4) − 2f(x + 3) − f(x + 1) = 0

has order 3, since the most widely separated values are x + 1 and x + 4.
The substitution g(x) = f(x + 1) reduces this to an equation relating g(x),
g(x + 2), g(x + 3).
OPERATOR NOTATION. If y is a function of x, one writes

(5)  Eᵏy = y(x + k)    (k = 0, 1, 2, ...).

Thus E⁰y = y(x), E¹y = Ey = y(x + 1). The difference eq. (2) can thus
be written

(2′)  ψ(x, y, Ey, ..., Eⁿy) = 0.


2. LINEAR DIFFERENCE EQUATIONS

By a linear difference equation is meant an equation of form

(6)  an f(x + n) + a(n−1) f(x + n − 1) + ... + a1 f(x + 1) + a0 f(x) = v(x),

where a0, ..., an, v(x) are given functions of the integer variable x. In
terms of the operator E of Sect. 1, the equation can be written

(6′)  (an Eⁿ + ... + a1 E + a0)y = v(x),

where y = f(x). It can be written more concisely as follows:

(7)  ψ(E)y = v(x),

where ψ(E) is a linear difference operator:

(8)  ψ(E) = an Eⁿ + ... + a1 E + a0.

If v(x) ≡ 0, eq. (6) is termed homogeneous; otherwise it is nonhomogeneous.
In case a0 ≠ 0, an ≠ 0, the equation is of order n. In case a0 ≡ a1 ≡ ... ≡
a(m−1) ≡ 0, but am an ≠ 0, then it is of order n − m; the substitution g(x) =
f(x − m) then reduces eq. (6) to a linear equation for g of form (6), with
nonvanishing first and last coefficients.
Linear Independence. Let y1(x), ..., yp(x) be functions of x defined
for a < x < b. The functions are said to be linearly independent if a relation

b1 y1(x) + ... + bp yp(x) ≡ 0,

with constant b1, ..., bp, can hold only if b1 = b2 = ... = bp = 0. Otherwise, the functions are said to be linearly dependent.
General Solution. Let the difference eq. (6) be given, with v(x) ≡ 0
and a0(x)an(x) ≠ 0 for a < x < b; all coefficients are assumed defined for
a < x < b. Then the equation has order n, there are n linearly independent solutions y1(x), ..., yn(x) for a < x < b, and

(9)  y = c1 y1(x) + ... + cn yn(x),   a < x < b,

where c1, ..., cn are arbitrary constants, is the general solution; that is,
all solutions are given by eq. (9). If v(x) is not identically 0, but the other hypotheses
hold, then the general solution has form

(10)  y = c1 y1(x) + ... + cn yn(x) + V(x),

where V(x) is a solution of the nonhomogeneous equation and c1 y1(x) +
... + cn yn(x) is the general solution of the related homogeneous equation,
that is, the homogeneous equation obtained by replacing v(x) by 0.
EXAMPLE. The functions y1 ≡ 1, y2 ≡ x are linearly independent solutions of the equation (E² − 2E + 1)y = 0, so that y = c1 + c2 x is the
general solution; y = 2^(x−1) is a solution of the equation (E² − 2E + 1)y =
2^(x−1), so that y = c1 + c2 x + 2^(x−1) is the general solution.

3. HOMOGENEOUS LINEAR EQUATIONS WITH CONSTANT
COEFFICIENTS

The equations considered have form

(11)  ψ(E)y = 0,

where

(12)  ψ(E) = an Eⁿ + ... + a1 E + a0,

the coefficients an, ..., a0 are constants, and a0 an ≠ 0. Associated with
eq. (12) is the characteristic polynomial

ψ(λ) = an λⁿ + ... + a1 λ + a0

in the complex variable λ. The equation

(13)  ψ(λ) = 0

is an algebraic equation of degree n, called the characteristic equation or
auxiliary equation associated with eq. (11). The characteristic equation
has n roots λ1, ..., λn called characteristic roots. (See Chap. 2.) These
may be real or complex; since the coefficients are assumed real, the complex
roots come in conjugate pairs.
From the set of characteristic roots one obtains a set of n solutions of the
difference eq. (11) by the following rules:
I. To each simple real root λ one assigns the function λ^x;
II. To each real root λ of multiplicity k one assigns the k functions λ^x,
xλ^x, ..., x^(k−1)λ^x;
III. To each pair of simple complex roots α ± βi = ρ(cos φ ± i sin φ)
one assigns the functions ρ^x cos φx, ρ^x sin φx;
IV. To each pair of complex roots α ± βi = ρ(cos φ ± i sin φ) of multiplicity k one assigns the 2k functions

ρ^x cos φx, xρ^x cos φx, ..., x^(k−1)ρ^x cos φx,
ρ^x sin φx, xρ^x sin φx, ..., x^(k−1)ρ^x sin φx.

In all one obtains n functions y1(x), ..., yn(x) which are linearly independent solutions of eq. (11) for all x, so that

y = c1 y1(x) + ... + cn yn(x)

is the general solution.
EXAMPLE. (E⁴ − 8E³ + 25E² − 36E + 20)y = 0.
The characteristic equation is

λ⁴ − 8λ³ + 25λ² − 36λ + 20 = 0.


The roots are 2, 2, 2 ± i. Hence the general solution is

y = 2^x (c1 + c2 x) + 5^(x/2) (c3 cos φx + c4 sin φx),

where φ = arctan ½.
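The characteristic roots can be checked numerically; a short numpy sketch for this example:

    import numpy as np

    # Characteristic equation: L^4 - 8 L^3 + 25 L^2 - 36 L + 20 = 0
    print(np.roots([1, -8, 25, -36, 20]))
    # roots: 2 (double) and 2 +/- i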
REMARKS. The variable x has heretofore assumed only integral values.
If x is allowed to take on all real values, then the difference equation becomes a functional equation. The methods of this section are still applicable and provide the general solution of eq. (11) subject only to the following two modifications: (a) the arbitrary constants c1, c2, ... may be replaced by arbitrary periodic functions of x, of period 1; (b) if λ is a negative
characteristic root of multiplicity k, the corresponding solutions become

(−λ)^x cos πx, x(−λ)^x cos πx, ..., x^(k−1)(−λ)^x cos πx.
4. NONHOMOGENEOUS LINEAR EQUATIONS WITH CONSTANT
COEFFICIENTS

The equation considered is

(14)  ψ(E)y = v(x),

where ψ(E) satisfies the same conditions as in Sect. 3. By the rule stated
at the end of Sect. 2, the general solution of eq. (14) has the form

(15)  y = c1 y1(x) + ... + cn yn(x) + V(x),

where V(x) is a particular solution and the other terms are the general solution of the related homogeneous equation ψ(E)y = 0.
The procedures for finding the particular solution V(x) can be described
concisely by means of an operational calculus which parallels that used for
differential equations (Chap. 8). The operators ψ(E) with constant coefficients can be added, subtracted, multiplied, and multiplied by constants
just as polynomials. The operators can be converted into operators χ(Δ)
by the relation

(16)  Δ = E − 1.

For example, E² − 3E + 2 = Δ² − Δ. The powers Δ, Δ², ..., Δᵏ are the
first, second, ..., kth differences, as defined in Sect. 1.
If y = V(x) is a solution of eq. (14), one writes

(17)  V(x) = (1/ψ(E)) v(x) = [ψ(E)]⁻¹ v(x).

TABLE 1. RULES FOR PARTICULAR SOLUTIONS

No.  ψ(E)                           v(x)                             V(x) = [ψ(E)]⁻¹ v
 1.  ψ(E)                           c1 v1(x) + c2 v2(x)              c1 [ψ(E)]⁻¹ v1 + c2 [ψ(E)]⁻¹ v2
                                    (c1, c2 const.)
 2.  ψ1(E) ψ2(E)                    v(x)                             (1/ψ1(E)) [(1/ψ2(E)) v]
 3.  Δ = E − 1                      v(x), x = 0, ±1, ...             Δ⁻¹ v = Σ (k = 0 to x−1) v(k)
 4.  Δ² = E² − 2E + 1               v(x), x = 0, ±1, ...             Δ⁻² v = Σ (k = 0 to x−1) Σ (s = 0 to k−1) v(s)
 5.  Δ                              C(x, n)                          Δ⁻¹ C(x, n) = C(x, n + 1)
 6.  ψ(E)                           a^x, ψ(a) ≠ 0                    a^x / ψ(a)
 7.  ψ(E)                           a^x u(x)                         a^x [1/ψ(aE)] u(x)
 8.  (E − a)ᵏ φ(E), φ(a) ≠ 0        a^x                              x(x − 1) ... (x − k + 1) a^(x−k) / [φ(a) k!]
 9.  ψ(E), ψ(a) ≠ 0                 a^x u(x), u a polynomial         a^x [p(a) + (a/1!) p′(a) Δ + ...
                                    of degree s                        + (a^s/s!) p⁽ˢ⁾(a) Δˢ] u(x),  p(λ) = 1/ψ(λ)
10.  (E − a)ᵏ φ(E), φ(a) ≠ 0        a^x u(x), u a polynomial         a^(x−k) [q(a) Δ⁻ᵏ + (a/1!) q′(a) Δ^(1−k) + ...
                                    of degree s                        + (a^s/s!) q⁽ˢ⁾(a) Δ^(s−k)] u(x),  q(λ) = 1/φ(λ)
11.  E − a                          v(x)                             a^(x−1) Δ⁻¹ (a^(−x) v)
12.  (E − a)ᵏ                       v(x)                             a^(x−k) Δ⁻ᵏ (a^(−x) v)
13.  (E − a)² + β²,                 v(x)                             (ρ^(x−1)/β) [sin(φx − φ) · Δ⁻¹(ρ^(−x) v cos φx)
     α + iβ = ρ(cos φ + i sin φ)                                       − cos(φx − φ) · Δ⁻¹(ρ^(−x) v sin φx)]


Thus the inverse operator [ψ(E)]⁻¹, when applied to v(x), yields one solution of eq. (14). The rules for finding particular solutions can now be
summarized in a table, which evaluates [ψ(E)]⁻¹ v for various choices of ψ
and v. This is carried out in Table 1. The last column gives one choice
V(x) of [ψ(E)]⁻¹ v; the general solution is given by eq. (15).
The binomial coefficient C(x, n) of Rule 5, Table 1, is defined for n = 1, 2, ....
When n = 0, it is defined to equal 1, and Rule 5 remains valid. Corresponding to this inverse rule is the direct rule:

(18)  Δ C(x, n) = C(x, n − 1),   n ≥ 1.

A general power of x can be expanded in terms of these coefficients:

(19)  xⁿ = Σi Tni (x)i,   where (x)i = i! C(x, i),

and the Tni are Stirling numbers of the second kind. They are tabulated
on page 170 of Ref. 2. If the polynomial u(x) is expanded in terms of the
coefficients by eq. (19) and Rule 5 or eq. (18) is applied, then Rules 9, 10
are easier to use.
A general expression 1/ψ(E) can be regarded as a rational function of E
and expanded in partial fractions, just as if E were a numerical variable.
Rules 11, 12, 13 then permit evaluation of the terms. For example,

1/(E² − 3E + 2) v(x) = 1/[(E − 1)(E − 2)] v(x)
  = [1/(E − 2) − 1/(E − 1)] v(x)
  = 2^(x−1) Δ⁻¹(2^(−x) v) − Δ⁻¹ v.

Rule 12 is needed for multiple roots. Rule 13 is needed for complex roots;
it can be generalized to take care of repeated complex roots (Ref. 2).
5. LINEAR EQUATIONS WITH VARIABLE COEFFICIENTS

The general solution of the first order linear equation

(20)  [E − p(x)]y = v(x),   x ≥ a,

where p(x) ≠ 0 for x ≥ a, is

(21)  y = q(x) [c + Σ (s = a to x−1) v(s)/q(s + 1)],

where

(22)  q(x) = p(a)p(a + 1) ... p(x − 1) for x > a,   q(a) = 1.
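Equations (20) to (22) can be evaluated directly; a Python sketch (the function name and tabulation scheme are illustrative assumptions), which one may verify satisfies the recursion y(x + 1) = p(x)y(x) + v(x):

    def solve_first_order(p, v, c, a, x_max):
        # y(x) = q(x) * (c + sum over s = a..x-1 of v(s)/q(s+1)),
        # with q(a) = 1 and q(x) = p(a) p(a+1) ... p(x-1) for x > a.
        q, acc, y = 1.0, 0.0, {}
        for x in range(a, x_max + 1):
            y[x] = q * (c + acc)
            q_next = q * p(x)          # q(x + 1)
            acc += v(x) / q_next
            q = q_next
        return y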


Laplace's Method. For equations

(23)  an(x)f(x + n) + ... + a0(x)f(x) = 0

with polynomial coefficients, one seeks a solution

(24)  y(x) = ∫ (a to b) t^(x−1) v(t) dt,

where a, b, and v(t) are to be determined.
Let (x)n denote n! C(x, n), so that (x)n = x(x − 1) ... (x − n + 1) for n =
1, 2, ..., and let (x)0 = 1. It follows from eq. (19) that an arbitrary polynomial can be expressed as a linear combination of the polynomials (x)n.
Hence the coefficients ak(x) can be considered as linear combinations of the
(x)n. Now by integration by parts one obtains from eq. (24) the relation

(25)  (x + m − 1)m E^r y = [Σs (−1)^(s+1) (x + m − 1)(m−s) t^(x−1+s) D^(s−1){tⁿ v(t)}] (a to b)
        + (−1)^m ∫ (a to b) t^(x+m−1) D^m{tⁿ v(t)} dt,

where D^s = d^s/dt^s. Hence the difference eq. (23) takes the form

(26)  [F(x, v, t)] (a to b) + ∫ (a to b) t^(x−1) G(v, t) dt = 0.

The function v(t) is chosen so that G(v, t) ≡ 0. In fact, the equation
G(v, t) = 0 is usually a homogeneous linear differential equation for v(t).
The constants a and b are then chosen so that F(x, v, t) vanishes when
t = a and t = b, so that eq. (26) is satisfied. Once a, b and v(t) have been
determined in this way, eq. (24) then yields y(x).
For further details see Ref. 2, Sect. 174.

REFERENCES
1. T. Fort, Finite Differences, Oxford University Press, Oxford, England, 1948.
2. C. Jordan, Calculus of Finite Differences, 2nd edition, Chelsea, New York, 1950.
3. N. E. Nörlund, Vorlesungen über Differenzenrechnung, Springer, Berlin, 1924.

A

GENERAL MATHEMATICS

Chapter 5

Differential Equations

G. E. Hay and W. Kaplan

1. Basic Concepts                                            5-01
2. Equations of First Order and First Degree                 5-02
3. Linear Differential Equations                             5-04
4. Equations of First Order but not of First Degree          5-07
5. Special Methods for Equations of Higher than First Order  5-09
6. Solutions in Form of Power Series                         5-10
7. Simultaneous Linear Differential Equations                5-12
8. Numerical Methods                                         5-14
9. Graphical Methods-Phase Plane Analysis                    5-15
10. Partial Differential Equations                           5-20
References                                                   5-22

1. BASIC CONCEPTS

An ordinary differential equation is an equation of form

(1)  ψ(x, y, y′, ..., y⁽ⁿ⁾) = 0,

expressing a relationship between an unspecified function y of x and its
derivatives y′ = dy/dx, ..., y⁽ⁿ⁾ = dⁿy/dxⁿ. An example is the following:

y′ − xy = 0.

The order of the equation is n, which is the order of the highest derivative
appearing. A solution is a function y of x, a < x < b, which satisfies the
equation identically. For many equations one can obtain a function
(2)  y = f(x, c1, ..., cn),

expressing y in terms of x and n independent arbitrary constants c1, ..., cn
such that, for each choice of the constants, eq. (2) is a solution of eq. (1),
and every solution of eq. (1) is included in eq. (2). When these conditions
are satisfied, eq. (2) is called the general solution of eq. (1). A particular
solution is the general solution with all of the n arbitrary constants given
particular values.
If eq. (1) is an algebraic equation in y⁽ⁿ⁾ of degree k, then the differential
eq. (1) is said to have degree k. For example, the equation

(3)  (y‴)² + y″y‴ + y⁴ = e^x

has order 3 and degree 2. When the degree is 1, the equation has the form

(4)  p(x, y, ..., y⁽ⁿ⁻¹⁾)y⁽ⁿ⁾ + q(x, y, ..., y⁽ⁿ⁻¹⁾) = 0

or, where p ≠ 0, the equivalent form

(5)  y⁽ⁿ⁾ = F(x, y, ..., y⁽ⁿ⁻¹⁾),   F = −q/p.

The EXISTENCE THEOREM asserts that, if in eq. (5) F is continuous in an
open region R of the space of the variables x, y, ..., y⁽ⁿ⁻¹⁾, and (x0, y0, ...,
y0⁽ⁿ⁻¹⁾) is a point of R, then there exists a solution y(x) of eq. (5), |x − x0| < h,
such that

(6)  y = y0, y′ = y′0, ..., y⁽ⁿ⁻¹⁾ = y0⁽ⁿ⁻¹⁾   for x = x0.

Thus there exists a solution satisfying initial conditions (6). If F has continuous partial derivatives with respect to y, y′, ..., y⁽ⁿ⁻¹⁾ in R, then the
solution is unique.
2. EQUATIONS OF FIRST ORDER AND FIRST DEGREE

An equation of first order and first degree can be written in either of the
equivalent forms

(7)  y′ = F(x, y),
(8)  M(x, y) dx + N(x, y) dy = 0.

For equations of special form, explicit rules can be given for finding the
general solution. Some of the most important types are listed here.
Equations with Variables Separable. If in eq. (8) M depends only
on x, N only on y, then eq. (8) is said to have the variables separable. The
equation may then be written with the x's separated from the y's, and the
general solution may be obtained by integration.

EXAMPLE. y′ = 3x²y. An equivalent separated form is 3x² dx − y⁻¹ dy = 0. Hence

∫3x² dx − ∫y⁻¹ dy = c.

Integrating and solving for y, one finds y = c1 e^(x³) as the general solution,
where c1 = e^(−c).
Homogeneous Equations. A function F(x, y) is said to be homogeneous of degree n if F(λx, λy) ≡ λⁿF(x, y). The differential eq. (7) is said
to be homogeneous if F(x, y) is homogeneous of degree 0. To solve such
a differential equation write y = vx and express the differential equation in
terms of v and x. The resulting differential equation has variables separable
and can be solved as above. In general, y′ = F(x, y) becomes

xv′ + v = F(x, vx) = x⁰F(1, v) = G(v),

dx/x + dv/(v − G(v)) = 0.

Exact Equations. The differential eq. (8) is exact if for some function
u(x, y)

(9)  ∂u/∂x = M(x, y),   ∂u/∂y = N(x, y),

so that du = M dx + N dy. The equation is exact if and only if ∂M/∂y ≡
∂N/∂x. The general solution is given (implicitly) by u(x, y) = c.
EXAMPLE. (3x² − 2xy) dx + (2y − x²) dy = 0. Here ∂M/∂y = −2x
= ∂N/∂x, so that the equation is exact. Then

∂u/∂x = 3x² − 2xy,   ∂u/∂y = 2y − x².

From the first equation, u = x³ − x²y + g(y), where g(y) is an arbitrary
function of y. Substitution in the second equation yields the relation
−x² + g′(y) = 2y − x², so that g(y) = y² + c. Hence the general solution is x³ − x²y + y² = c.
Integrating Factors. If eq. (8) is not exact, it may be possible to
make it exact by multiplying by a function φ(x, y), called an integrating
factor.
EXAMPLE. The equation (3xy + 2y²) dx + (x² + 2xy) dy = 0 is not
exact, but after multiplication by x becomes the exact equation

(3x²y + 2xy²) dx + (x³ + 2x²y) dy = 0.

The general solution is x³y + x²y² = c. The integrating factor is x.


Linear Equations. A differential equation is linear if it is of the first
degree in the dependent variable and its derivatives. If such an equation
is also of the first order, it may be written in the form

(10)  y′ + p(x)y = q(x).

Here u = e^(∫p dx) is an integrating factor and the general solution is

(11)  y = u⁻¹ (∫q(x)u dx + c),   u = e^(∫p dx).

EXAMPLE. y′ + x⁻¹y = 4x². Here u = x and eq. (11) gives

y = x⁻¹(x⁴ + c) = x³ + cx⁻¹.
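Such integrating-factor solutions can be checked with a computer algebra system; a sympy sketch for the example above:

    import sympy as sp

    x = sp.symbols('x')
    y = sp.Function('y')

    # y' + y/x = 4x^2; integrating factor u = exp(integral of dx/x) = x
    sol = sp.dsolve(y(x).diff(x) + y(x)/x - 4*x**2, y(x))
    print(sol)   # y(x) = C1/x + x**3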

3. LINEAR DIFFERENTIAL EQUATIONS

The linear differential equation of order n can be written in the form

(12)  a0 Dⁿy + a1 Dⁿ⁻¹y + ... + a(n−1) Dy + an y = Q(x),

where the coefficients a0, ..., an may depend on x, and Dᵏy ≡ dᵏy/dxᵏ.
When the aj are constant, eq. (12) is said to have constant coefficients.
When Q(x) ≡ 0, the equation is said to be homogeneous. The homogeneous
equation obtained from eq. (12) by replacing Q(x) by 0 is called the related
homogeneous equation. It will generally be assumed that a0 ≠ 0 throughout
the interval of x considered.
The general solution of eq. (12) is given by

(13)  y = c1 y1(x) + ... + cn yn(x) + y*(x),

where y*(x) is one particular solution and y1(x), ..., yn(x) are particular
solutions of the related homogeneous equation which are linearly independent; that is, a relation

b1 y1(x) + b2 y2(x) + ... + bn yn(x) ≡ 0,

with constant b1, ..., bn, can hold only if b1 = 0, ..., bn = 0. When

Q(x) ≡ 0, one can choose y*(x) to be 0.
Homogeneous Linear Equations with Constant Coefficients. The
equation has the form

(14)  a0 Dⁿy + a1 Dⁿ⁻¹y + ... + an y = 0,   a0 ≠ 0,

where a0, ..., an are constants. Particular solutions are obtained by setting y = e^(rx). Substitution in eq. (14) leads to the equation for r:

(15)  a0 rⁿ + a1 rⁿ⁻¹ + ... + an = 0.

This is called the auxiliary equation or characteristic equation. In general
it has n roots, real or complex, some of which may be coincident (Chap. 2).
From these roots one obtains n linearly independent solutions of the differential eq. (14) by the following rules:
I. To each real root r of multiplicity k one assigns the functions e^(rx),
xe^(rx), ..., x^(k−1)e^(rx).
II. To each pair of conjugate complex roots α ± βi of multiplicity k one
assigns the 2k functions

e^(αx) cos βx, e^(αx) sin βx, xe^(αx) cos βx, xe^(αx) sin βx,
..., x^(k−1)e^(αx) cos βx, x^(k−1)e^(αx) sin βx.

The n functions y1(x), ..., yn(x) thus obtained are linearly independent and

y = c1 y1(x) + ... + cn yn(x)

is the general solution of eq. (14).
EXAMPLE 1. D²y − 3Dy + 2y = 0. The auxiliary equation is r² − 3r
+ 2 = 0, the roots are 1, 2; the general solution is y = c1 e^x + c2 e^(2x).
EXAMPLE 2. D⁶y − 9D⁴y + 24D²y − 16y = 0. The auxiliary equation is r⁶ − 9r⁴ + 24r² − 16 = 0, the roots are ±1, ±2, ±2; the general
solution is y = c1 e^x + c2 e^(−x) + e^(2x)(c3 + c4 x) + e^(−2x)(c5 + c6 x).
EXAMPLE 3. D⁴y + 4D³y + 12D²y + 16Dy + 16y = 0. The auxiliary
equation is r⁴ + 4r³ + 12r² + 16r + 16 = 0, the roots are −1 ± i√3,
−1 ± i√3. The general solution is y = e^(−x)[(c1 + c2 x) cos √3 x
+ (c3 + c4 x) sin √3 x].

Nonhomogeneous Linear Equations with Constant Coefficients.
The equation considered is

(16)  a0 Dⁿy + a1 Dⁿ⁻¹y + ... + an y = Q(x),   a0 ≠ 0,

where a0, ..., an are constants and Q(x) is, for example, continuous for
a < x < b. The general solution of the related homogeneous equation is
found as in the preceding paragraphs; it is called the complementary function. Here are presented methods for finding a particular solution y*(x)
of eq. (16). As indicated in eq. (13), addition of the complementary function and y*(x) gives the required general solution of eq. (16).
Method of Undetermined Coefficients. If Q(x) is of form

(17)  e^(αx)[p(x) cos βx + q(x) sin βx],

where p(x) and q(x) are polynomials of degree at most h, then there is a
particular solution

(18)  y* = x^k e^(αx)[φ(x) cos βx + ψ(x) sin βx],

where φ(x) and ψ(x) are polynomials of degree at most h and α ± βi is a
root of multiplicity k (possibly 0) of the auxiliary equation. If β = 0, Q is
of form e^(αx)p(x); also p and q may reduce to constants (h = 0). The coefficients of the polynomials φ, ψ can be considered as undetermined coefficients; substitution of eq. (18) in eq. (16) leads to relations between these
coefficients from which all can be determined. As an example consider the
equation

(D² + 1)y = 3 cos x.

Here α = 0, β = 1, h = 0. Since ±i are roots of the auxiliary equation,
k = 1 and

y* = x(A cos x + B sin x).

Substitution in the differential equation leads to the relation

2(−A sin x + B cos x) ≡ 3 cos x.

Hence B = 3/2, A = 0; y* = (3/2)x sin x and the general solution is

y = (3/2)x sin x + c1 cos x + c2 sin x.

Superposition Principle. If in eq. (16) Q(x) is a linear combination
of functions Q1(x), ..., QN(x) and y1*(x), ..., yN*(x) are particular solutions of the respective equations obtained by replacing Q(x) by Q1(x), ...,
QN(x), then the corresponding linear combination of y1*(x), ..., yN*(x)
is a solution of eq. (16); that is, if

Q(x) = b1 Q1(x) + ... + bN QN(x),

then

y*(x) = b1 y1*(x) + ... + bN yN*(x)

is a particular solution of eq. (16). For example, particular solutions of

(D² + 1)y = 3 cos x,   (D² + 1)y = e^(2x)

are found by undetermined coefficients to be (3/2)x sin x, (1/5)e^(2x) respectively.
Hence a particular solution of

(D² + 1)y = 12 cos x + 10e^(2x)

is given by 6x sin x + 2e^(2x).

Variation of Parameters. Let the complementary function be

c1 y1(x) + ... + cn yn(x).

Then a particular solution is

(19)  y*(x) = y1(x)v1(x) + ... + yn(x)vn(x),

where

vi(x) = ∫wi(x) dx   (i = 1, ..., n),

and w1(x), ..., wn(x) are defined by the linear equations

(20)  y1(x)w1(x) + ... + yn(x)wn(x) = 0,
      y′1(x)w1(x) + ... + y′n(x)wn(x) = 0,
      ...
      y1⁽ⁿ⁻¹⁾(x)w1(x) + ... + yn⁽ⁿ⁻¹⁾(x)wn(x) = Q(x)/a0.

The determinant of coefficients of eqs. (20) is the Wronskian determinant

          | y1        ...  yn        |
(21)  W = | y′1       ...  y′n       |
          | ......................   |
          | y1⁽ⁿ⁻¹⁾   ...  yn⁽ⁿ⁻¹⁾   |

Under the assumptions made, W cannot equal 0 for any x of the interval
considered, so that eqs. (20) have a unique solution (Chap. 2). This
method is applicable if a0, ..., an are functions of x, provided a0(x) ≠ 0.
Operational Methods. The operational methods based on the Heaviside calculus provide another powerful tool for obtaining solutions of nonhomogeneous linear equations with constant coefficients (see Chap. 8).
Closely related are the methods based on the Laplace transform (Chap. 9).
4. EQUATIONS OF FIRST ORDER BUT NOT OF FIRST DEGREE

The equations considered have form

(22)  ψ(x, y, p) = 0,

where p = dy/dx. Equation (22) can be solved for p, except where ψp = 0.
The locus defined by the two equations

(23)  ψ(x, y, p) = 0,   ψp(x, y, p) = 0

is called the singular locus. It may contain curves y = f(x) which are
solutions of eq. (22); such solutions are called singular solutions. The solutions of eq. (22) (with the possible exception of the singular solutions) can
often be obtained by one of the following special methods.
Factorization. If eq. (22) can be factored in the form

(24)  [p − F1(x, y)][p − F2(x, y)] ... [p − Fk(x, y)] = 0,


then its solutions are obtained by combining all solutions of the first degree
equations

(25)  p − Fi(x, y) = 0   (i = 1, ..., k; p = dy/dx).

For example, the equation

p² − (2x + y)p + 2xy = 0

can be factored into the equations

p = 2x,   p = y;

the solutions are y = x² + c1, y = c2 e^x. If ψ(x, y, p) is of second degree in
p, the expressions for p and the equivalent factorization (24) can be obtained by the quadratic formula.
Solving for y or x. If eq. (22) is of first degree in y, one can solve for y
to obtain an equation

(26)  y = F(x, p).

Differentiation of this equation with respect to x yields a relation of form

(27)  dp/dx = G(x, p),

that is, a first order equation relating p and x. If the general solution of
eq. (27) is given by

(28)  Φ(x, p) = c,

then the equations

(29)  y = F(x, p),   Φ(x, p) = c

together define solutions of eq. (22); p may be eliminated between the equations or treated as a parameter. As an example, consider the Clairaut
equation:

(30)  y = xp + F(p).

The method described leads to the "general solution"

(31)  y = cx + F(c).

There is, in general, a singular solution defined by the equations

(32)  x + F′(p) = 0,   y = xp + F(p).

If eq. (22) is solvable for x, one can differentiate with respect to y, replacing dx/dy by 1/p; one obtains the solutions in the form

(33)  Φ(y, p) = c,   x = F(y, p).

5. SPECIAL METHODS FOR EQUATIONS OF HIGHER THAN FIRST ORDER

Equations with Dependent Variable Missing. Let the given equation be

(34)  F(x, y′, y″, ..., y⁽ⁿ⁾) = 0,

so that y does not appear. Set p = dy/dx. Then

y″ = dp/dx,   y‴ = d²p/dx², ...,

and so eq. (34) becomes

(34′)  F(x, p, dp/dx, ..., dⁿ⁻¹p/dxⁿ⁻¹) = 0,

an equation of order n − 1 for p in terms of x. If its solutions are known,
then the solutions of eq. (34) are obtained from the relation y = ∫p dx.
EXAMPLE. Consider the equation

x³y″ − x²y′ = 3 − x².

The substitution p = y′ leads to the first order linear equation

x³ dp/dx − x²p = 3 − x².

Its general solution is found (Sect. 2) to be

p = −1/x² + 1 + c1 x.

Hence integration yields y:

y = 1/x + x + (c1/2)x² + c2.

Equations with Independent Variable Missing. Let the given
equation be the nth order equation

(35)  F(y, y′, y″, ..., y⁽ⁿ⁾) = 0,


so that x does not appear. Set p = y′. Then

y″ = dp/dx = (dp/dy)(dy/dx) = p dp/dy,
y‴ = p² d²p/dy² + p (dp/dy)², ....

Thus eq. (35) becomes an equation of order n − 1. If its solutions are
known, in the form

p = φ(y, c1, ..., c(n−1)),

then

dy/dx = φ,   dy/φ(y, c1, ...) = dx.

Thus integration yields an implicit form of the solutions of the given equation.
Linear Equations with One Known Solution. Let a linear equation
be given:

(36)  a0(x)y⁽ⁿ⁾(x) + ... + an(x)y = Q(x).

Let y1(x) be a solution of the related homogeneous equation. Then the
substitutions

(37)  y = y1(x)v,   w = v′

lead to an equation of order n − 1 for w. If w has been found, integration
and multiplication by y1(x) yields y.
6. SOLUTIONS IN FORM OF POWER SERIES

Formation of Taylor Series. Let an equation of order n be given:

(38)  y⁽ⁿ⁾ = F(x, y, y′, ..., y⁽ⁿ⁻¹⁾)

and let F be expressible in an absolutely convergent power series in powers
of x, y, y′, ... for |x| < a, |y| < b, .... In Fig. 2 the roots are λ, λ̄ =
α ± βi with α > 0, β > 0; in Fig. 3 the roots
are λ1, λ2 with λ1 < 0 < λ2; in Fig. 4 the roots are ±βi, β > 0; in Fig. 5
the roots are λ1, λ2 with λ1 < λ2 < 0. The solutions of the system (61)
will have the same appearance near (0, 0) as the solutions of eqs. (67),
except in borderline cases; of the four cases illustrated, only that of Fig. 4
is of borderline type. For a full discussion, see Ref. 2.

FIG. 4. Solution near center type singular point.

FIG. 5. Solution near node type singular point.

Limit Cycles. Of much importance for applications are the solutions
represented by closed curves in the xy-plane. These are termed limit cycles.
For the parametric eqs. (61) such a solution is represented by equations
x = p(t), y = q(t), where p and q have a common period T. A typical
solution family containing a limit cycle C is illustrated in Fig. 6. The
cycle C is stable in this case; that is, all solutions starting near C approach
C as time t increases. In many cases simple properties of the isoclines allow
one to conclude existence of limit cycles in particular regions. A theorem
of Bendixson states that a region in which Fx + Gy > 0 can contain no
limit cycle of eqs. (61) (Refs. 2, 3).

FIG. 6. Limit cycle.
Phase Plane. For the motion of a particle of mass m on a line, classical
mechanics gives an equation of the form

(69)  m d²x/dt² = F(t, x, dx/dt).

When F is independent of t, the substitution v = dx/dt leads to an equation

(70)  mv dv/dx = F(x, v),

which can be analyzed as above. The pair (x, v) represents a phase of the
mechanical system and the xv-plane is termed the phase plane. Second
order equations arising in other contexts can be treated similarly and the
term phase is used for the pair (x, v) or (x, y) regardless of the physical
significance of the variables. An especially simple graphical discussion can
be given for the conservative case of eq. (69):

(71)  m d²x/dt² = F(x).

See Ref. 2.
10. PARTIAL DIFFERENTIAL EQUATIONS

This section presents a brief discussion of partial differential equations
of second order. Some further information is given in Chap. 6. (See
Refs. 4, 10.)

Classification. Consider an equation

(72)   A(∂²u/∂x²) + 2B(∂²u/∂x∂y) + C(∂²u/∂y²) + D(∂u/∂x) + E(∂u/∂y) + Fu + G = 0,

where u is an unknown function of x and y and the coefficients A, ⋯, G
are given functions of x and y (perhaps constants). The eq. (72) is termed

elliptic if B² − AC < 0,
parabolic if B² − AC = 0,
hyperbolic if B² − AC > 0.

The three types are illustrated by the

Laplace equation:   ∂²u/∂x² + ∂²u/∂y² = 0,
heat equation:   ∂u/∂t − k²(∂²u/∂x²) = 0,
wave equation:   ∂²u/∂t² − k²(∂²u/∂x²) = 0.

Attention will be restricted to the three special types.
Dirichlet Problem. One seeks a solution u(x, y) of the Laplace equation in an open region D, with given boundary values on the boundary of
D. This problem can be treated by conformal mapping (Chap. 10, Sect. 5).

Heat Equation. A typical problem is the following. One seeks a solution u(x, t) of the heat equation u_t − k²u_xx = 0 for t > 0, 0 < x < 1, with
given initial values φ(x) = u(x, 0) and boundary values u(0, t) = 0,
u(1, t) = 0. To obtain a solution one can employ the method of separation of variables. One seeks solutions of the differential equation and
boundary conditions of form
(73)   u = f(x)g(t).

From the differential equation one finds that one must have

g′(t)/g(t) − k²f″(x)/f(x) = 0.

Hence g′/g must be a constant λ, and f″/f must equal λ/k²:

(74)   g′(t) − λg(t) = 0,   k²f″(x) − λf(x) = 0.

From the boundary conditions at x = 0 and x = 1 one finds that

(75)   f(0) = f(1) = 0.

From eqs. (74) and (75) one concludes that f(x) and λ must have the form

(76)   f(x) = b sin nπx,   λ = −k²n²π²,   n = 1, 2, ⋯.

From eqs. (74) g(t) has the form const·e^{λt}. Hence particular solutions of form
(73) have been found:

(77)   uₙ = bₙe^{−k²n²π²t} sin nπx,   n = 1, 2, ⋯.

Each linear combination of the functions (77) is also a solution of both the
heat equation and the boundary conditions at x = 0 and x = 1. Accordingly, each convergent series

(78)   u = Σ_{n=1}^∞ bₙe^{−k²n²π²t} sin nπx

also represents a solution. By proper choice of the constants bₙ the initial
values can be satisfied. One requires that

(79)   φ(x) = Σ_{n=1}^∞ bₙ sin nπx.

Thus the bₙ are determined from the expansion of φ(x) in its Fourier sine
series (Chap. 8, Sect. 8). With the bₙ so chosen, eq. (78) represents the
desired solution of the given problem.
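As a numerical illustration, the following minimal Python sketch computes the coefficients bₙ of eq. (79) by the midpoint rule and sums the truncated series (78); the initial values φ(x) = x(1 − x), the constant k, and the truncation point N are illustrative choices.

import numpy as np

def heat_series(phi, k=1.0, N=50, M=400):
    """Solution of u_t - k^2 u_xx = 0 on 0 < x < 1 with u = 0 at both
    ends, truncating the series (78) after N terms."""
    xq = (np.arange(M) + 0.5) / M                  # quadrature nodes
    n = np.arange(1, N + 1)
    # b_n = 2 * integral_0^1 phi(x) sin(n pi x) dx   (midpoint rule)
    b = 2.0 * (phi(xq)[None, :] * np.sin(np.outer(n, np.pi * xq))).mean(axis=1)
    def u(x, t):
        terms = b * np.exp(-(k * n * np.pi) ** 2 * t) * np.sin(n * np.pi * x)
        return terms.sum()
    return u

u = heat_series(lambda x: x * (1 - x))
print(u(0.5, 0.0), u(0.5, 0.1))    # starts near phi(0.5); decays as t grows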
Wave Equation. One seeks a solution u(x, t) of the wave equation
u_tt − k²u_xx = 0 for 0 < x < π, t > 0 with given initial values u(x, 0) =
φ(x) and initial velocities u_t(x, 0) = ψ(x) and given boundary values
u(0, t) = u(π, t) = 0. This is the problem of the vibrating string. The
method of separation of variables can be used as above and one obtains
the solution in the form of a series

(80)   u = Σ_{n=1}^∞ sin nx [αₙ sin knt + βₙ cos knt],

where αₙ and βₙ are determined from the expansions:

(81)   φ(x) = Σ_{n=1}^∞ βₙ sin nx,   ψ(x) = Σ_{n=1}^∞ nkαₙ sin nx.

Relaxation Methods. One can obtain an approximation to the solution of a partial differential equation by replacing it by a corresponding
difference equation. The method has been especially successful for the
Dirichlet problem, which is discussed here. The differential equation
u_xx + u_yy = 0 is replaced by the equation

(82)   u(x + h, y) + u(x, y + h) + u(x − h, y) + u(x, y − h) − 4u(x, y) = 0.

If the given region is the square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, one chooses h =
1/n for some positive integer n and requires eq. (82) to hold at the lattice
points (k₁h, k₂h), 0 < k₁ < n, 0 < k₂ < n. The values of u on the boundary (x = 0 or 1, y = 0 or 1) are given, and eq. (82) becomes a system of
simultaneous linear equations for the unknowns u(k₁h, k₂h). These can be
solved by the relaxation method. One chooses an initial set of values for
the unknowns, then obtains a next approximation by replacing u(x, y) by

(83)   ¼[u(x + h, y) + u(x, y + h) + u(x − h, y) + u(x, y − h)]

at each lattice point. Repetition of the process generates a sequence
uₙ(x, y) which can be shown to converge to the solution of eq. (82). As
h → 0, the solution of eq. (82) can be shown to converge to the desired
solution of the Dirichlet problem (Ref. 10).
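The iteration (83) is easily carried out on a machine. The following minimal Python sketch (grid size, sweep count, and boundary data are illustrative choices) replaces every interior lattice value by the average of its four neighbors on each sweep, the simultaneous-displacement variant of relaxation.

import numpy as np

def relax_dirichlet(boundary, n=20, sweeps=2000):
    """Iterate the averaging rule (83) on an (n+1) x (n+1) lattice."""
    xs = np.linspace(0.0, 1.0, n + 1)
    u = np.zeros((n + 1, n + 1))
    u[0, :] = boundary(0.0, xs)         # x = 0 edge
    u[n, :] = boundary(1.0, xs)         # x = 1 edge
    u[:, 0] = boundary(xs, 0.0)         # y = 0 edge
    u[:, n] = boundary(xs, 1.0)         # y = 1 edge
    for _ in range(sweeps):
        # simultaneous replacement of interior values by the average (83)
        u[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1]
                                + u[1:-1, 2:] + u[1:-1, :-2])
    return u

# Harmonic boundary data x^2 - y^2: the exact solution is x^2 - y^2,
# which vanishes at the center of the square.
u = relax_dirichlet(lambda x, y: x * x - y * y)
print(u[10, 10])                        # near zero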

REFERENCES

1. R. P. Agnew, Differential Equations, McGraw-Hill, New York, 1942.
2. A. Andronow and C. E. Chaikin, Theory of Oscillations, Princeton University Press,
Princeton, N. J., 1949.
3. E. A. Coddington and N. Levinson, Theory of Ordinary Differential Equations,
McGraw-Hill, New York, 1955.
4. R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. I, Interscience,
New York, 1953.
5. E. L. Ince, Ordinary Differential Equations, Longmans, Green, London, 1927.
6. E. Kamke, Differentialgleichungen, Lösungsmethoden und Lösungen, Vol. 1, 2nd
edition, Akademische Verlagsgesellschaft, Leipzig, 1943.
7. E. Kamke, Differentialgleichungen reeller Funktionen, Akademische Verlagsgesellschaft, Leipzig, 1933.
8. E. D. Rainville, Elementary Differential Equations, Macmillan, New York, 1952.
9. E. D. Rainville, Intermediate Differential Equations, Wiley, New York, 1943.
10. R. V. Southwell, Relaxation Methods in Engineering Science, Oxford University
Press, Oxford, England, 1946.
11. E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, 4th edition,
Cambridge University Press, Cambridge, England, 1940.

A  GENERAL MATHEMATICS

Chapter 7

Integral Equations

E. H. Rothe

1. Definitions and Main Problems                           6-01
2. Relation to Boundary Value Problems                     6-03
3. General Theorems                                        6-05
4. Theorems on Eigenvalues                                 6-06
5. The Expansion Theorem and Some of Its Consequences      6-07
6. Variational Interpretation of the Eigenvalue Problem    6-08
7. Approximation Methods                                   6-10
References                                                 6-17

1. DEFINITIONS AND MAIN PROBLEMS

A linear integral equation of first kind is an equation of form

(1)   ∫_a^b K(s, t)x(t) dt = f(s);

f(s) and K(s, t) are considered to be given, and a function x(t) satisfying
eq. (1) is called a solution of the integral equation.

Fredholm Integral Equation. This is the linear integral equation of
second kind and has the form

(2)   x(s) − λ∫_a^b K(s, t)x(t) dt = f(s).

Here K(s, t) and f(s) are given real functions, and A is a given real constant; a solution of the integral equation is a function xes) satisfying eq.
(2) for a ~ s ~ b.
Volterra Integral Equation. If in eq. (2) the upper limit b is replaced
by the variable s, the resulting equation
(3)

f.

8

xes) - A

]{(S, t)x(t) dt

= f(s)

a

is called a Volterra integral equation. Equation (3) can be considered as a
special case of eq.(2); namely, the case for which K(s, t) = 0 for t ~ s.
The preceding definitions relate to integral equations for functions of
one real variable. There are analogous definitions for functions of two or
more real variables. It is also of importance to allow x, K, f to take on
complex values and to allow A to be complex. For simplicity the results
will be formulated for functions of one variable; essentially no change is
required to extend the results to functions of several variables. Only functions with real values will be considered here. The discussion will furthermore be restricted to the integral equation of second kind; for the equation
of first kind, see Ref. 10, Chap. 2.
REMARK. The equations defining Laplace and Fourier transforms can
be regarded as integral equations of first kind. Solving the equations is
equivalent to finding the inverse transforms. See Chaps. 8, 9.
The function K(s, t) in eq. (2) is called the kernel of the integral equation.
The eq. (2) is said to be homogeneous if f(s) = 0; otherwise it is nonhomogeneous. The homogeneous equation
(4)

f.

xes) - A

b

K(s, t)x(t)

=0

a

obtained from eq. (2) by replacingf(s) by 0 is called the homogeneous equation associated with eq. (2).
A number A such that eq. (4) has a solution x = ¢(s) not identically 0 is
called a characteristic value or eigenvalue of eq. (4) or of the kernel K(s, t);
the solution ¢(s) is called 'an eigenfunction associated with A. For each
eigenvalue A there may be several associated eigenfunctions. From the
definition it follows that 0 cannot be an eigenvalue.
The eigenvalue problem associated with eq. (4) is the determination of
whether, for a given kernel, eigenvalues exist, what they are, and what
the corresponding eigenfunctions are.
The expansion problem associated with eq. (3) is the determination of
the possibility of expanding every function g(s) of a given class in an in-

g(s) = Σ_{α=1}^∞ c_αφ_α(s),

where the φ_α(s) are eigenfunctions.

The solvability problem associated with eq. (2) is the determination of
whether eq. (2) has a solution x(s) and whether the solution is unique.
2. RELATION TO BOUNDARY VALUE PROBLEMS

The problems described in Sect. 1 arise naturally in the analysis of boundary value problems associated with partial differential equations.

A TYPICAL EXAMPLE is presented. The equation is the wave equation

(5)   ∇²u − ∂²u/∂t² = 0,   u = u(x, y, z, t),

where ∇² is the Laplacian operator:

(6)   ∇²u = ∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z².

A finite domain D, with smooth boundary B, is given in x, y, z space; a
function g(x, y, z) is given in D. One seeks a function u satisfying eq. (5)
for t ≥ 0 and for (x, y, z) in D and satisfying the boundary conditions

(7)   u(x, y, z, t) = 0,   (x, y, z) on B, t ≥ 0;
(8)   u(x, y, z, 0) = g(x, y, z),   (x, y, z) in D;
(9)   u_t(x, y, z, 0) = 0,   (x, y, z) in D.

The classical attack on this problem is to "separate" the time and space
variables; that is, to set

(10)   u(x, y, z, t) = S(x, y, z)T(t).

Then one is led to the boundary value problems:

(11)   ∇²S + λS = 0,   S = 0 for (x, y, z) on B;
(12)   T″(t) + λT(t) = 0,   T′(0) = 0.

If S and T satisfy eqs. (11), (12) for some constant λ, then u = ST satisfies eqs. (5) and both (7) and (9), but not necessarily eq. (8).

Integral Equation. The problem (11) can be replaced by an integral
equation by the following reasoning. It is shown in the theory of partial
differential equations (Ref. 5) that there exists a uniquely determined function K(s, σ) of the two points

s: (x, y, z),   σ: (ξ, η, ζ)

in D, the so-called Green's function, with the following properties: K has
continuous second partial derivatives as long as s ≠ σ; K(s, σ) = 0 for s
on B, σ in D; if φ(σ) has continuous first partial derivatives in D and D₁ is
an arbitrary subdomain of D (with smooth boundary), then

(13)   ∇²∫∫∫_{D₁} K(x, y, z, ξ, η, ζ)φ(ξ, η, ζ) dξ dη dζ = −φ(x, y, z)

for (x, y, z) in D₁. Identifying φ with −λS and D₁ with D, one sees that
the boundary value problem (11) is equivalent to the homogeneous integral equation in these variables:

(14)   S(x, y, z) − λ∫∫∫_D K(x, y, z, ξ, η, ζ)S(ξ, η, ζ) dξ dη dζ = 0.

Solution. Let eq. (14) have a sequence of eigenfunctions S_α(x, y, z)
associated with the positive eigenvalues λ_α (α = 1, 2, ⋯). Then S_α
satisfies eq. (11) with λ = λ_α; for λ = λ_α eq. (12) has the solution T_α =
cos √λ_α t, so that u = S_αT_α satisfies eqs. (5), (7), (9). To satisfy eq. (8),
one notes that each series

(15)   u = Σ_{α=1}^∞ c_αS_α(x, y, z)T_α(t) = Σ c_αS_α cos √λ_α t

also satisfies eqs. (5), (7), (9), if the c's are constants and the series satisfies
appropriate convergence conditions. The condition (8) now becomes

(16)   g(x, y, z) = Σ_{α=1}^∞ c_αS_α(x, y, z);

thus one is led to the expansion problem. If g can be expanded as in eq.
(16), then (15) defines a solution of the given problem.
Suppose that the 0 on the right-hand side of eq. (5) is replaced by
F₀(x, y, z, t); this corresponds to an external force. If F₀(x, y, z, t) =
F(x, y, z)T(t), where T(t) satisfies eq. (12) for some λ = λ₀, then the substitution of eq. (10) leads to the nonhomogeneous integral equation

(17)   S(x, y, z) − λ₀∫∫∫_D K(x, y, z, ξ, η, ζ)S(ξ, η, ζ) dξ dη dζ = f(x, y, z),

where

(18)   f(x, y, z) = ∫∫∫_D K(x, y, z, ξ, η, ζ)F(ξ, η, ζ) dξ dη dζ.

3. GENERAL THEOREMS

In what follows K(s, t) will be assumed continuous for a ≤ s ≤ b,
a ≤ t ≤ b. Such a continuity condition is not always satisfied, e.g., for
the Green's function of Sect. 2. See Ref. 5 (pp. 543 ff.) and Ref. 6 (pp.
355 ff.) for reduction of the discontinuous case to the continuous case.

Definitions. The following definitions relate to functions of t defined
and continuous for a ≤ t ≤ b. If x, y are two such functions, their scalar
product is

(19)   (x, y) = ∫_a^b x(t)y(t) dt.

The norm ‖x‖ of x = x(t) is defined as (x, x)^{1/2}. Functions x₁, ⋯, xₙ are
linearly independent if

(20)   c₁x₁(t) + ⋯ + cₙxₙ(t) ≡ 0,

with constant c₁, ⋯, cₙ, implies c₁ = 0, ⋯, cₙ = 0; if the functions are
not linearly independent, they are termed linearly dependent. An infinite
system of functions

(21)   φ₁(s), φ₂(s), ⋯, φₙ(s), ⋯

is called linearly independent if φ₁, ⋯, φ_k are linearly independent for
every k.

Two functions x, y are said to be orthogonal if (x, y) = 0. The system
(21) is orthogonal if (φ_α, φ_β) = 0 for α ≠ β. A system of orthogonal functions none of which is identically zero is necessarily linearly independent.
The system (21) is called orthonormal if it is orthogonal and normalized,
that is, ‖φ_α‖ = 1 for all α.

If the system (21) is linearly independent, it can be orthogonalized and
normalized; that is, an orthonormal system {ψ_α} can be found such that,
for every n, ψₙ is a linear combination of φ₁, ⋯, φₙ and φₙ is a linear
combination of ψ₁, ⋯, ψₙ. For details, see Ref. 4, p. 50.

THE SCHWARZ INEQUALITY states that for every x, y, |(x, y)| ≤ ‖x‖·‖y‖,
with equality if and only if x, y are linearly dependent.

THE BESSEL INEQUALITY states that, if {φ_α} is an orthonormal system,
then for every x

(22)   Σ_{α=1}^∞ |(x, φ_α)|² ≤ ‖x‖².


Now consider three related integral equations:

(23)   φ(s) − λ∫_a^b K(s, t)φ(t) dt = f(s),

(24)   φ(s) − λ∫_a^b K(s, t)φ(t) dt = 0,

(25)   ψ(s) − λ∫_a^b K(t, s)ψ(t) dt = 0.

Equation (24) is the homogeneous equation related to (23); eq. (25) is
called the adjoint or transposed equation of (24).

THEOREM 1. If λ is an eigenvalue of eq. (24), then λ is an eigenvalue of
eq. (25). There are at most a finite number k of linearly independent eigenfunctions of eq. (24) associated with eigenvalue λ; this maximal number k is
the same for eq. (25).

The number k is called the multiplicity of the eigenvalue λ.

THEOREM 2. Equation (23) has a solution if and only if f is orthogonal to
all solutions of the adjoint eq. (25).

Conclusions Based on Theorems 1 and 2. Let λ not be an eigenvalue
of K(s, t). Then λ is also not an eigenvalue of K(t, s); that is, ψ ≡ 0 is the
only solution of eq. (25). Hence (f, ψ) = 0 for all solutions ψ of eq. (25),
and eq. (23) has a solution for arbitrary f. For each f, the solution is unique;
for the difference φ of two solutions is a solution of eq. (24), hence φ ≡ 0.

Let λ be an eigenvalue of K(s, t). Then eq. (25) is satisfied for at least
one ψ not identically zero and eq. (23) is not satisfied for some f, in particular
for f = ψ. (In the problem of Sect. 2 this case arises if the frequency λ₀ of
the time factor T of F₀(x, y, z, t) is an eigenvalue of the homogeneous eq.
(14); this is the case of resonance.)

One is thus led to the following alternative of Fredholm: either (i) the
nonhomogeneous eq. (23) has a solution for arbitrary f or (ii) the homogeneous eq. (24) has at least one (not identically vanishing) solution.
Case (i) can also be characterized by the statement: eq. (23) has at most
one solution for each f; for the uniqueness implies existence of a solution.
4. THEOREMS ON EIGENVALUES

The kernel K(s, t) is said to be symmetric if K(s, t) ≡ K(t, s). This case
occurs in many applications; for example, in the problem of Sect. 2.

THEOREM 3. A symmetric kernel has at least one and at most a countable
infinity of eigenvalues. Eigenfunctions corresponding to distinct eigenvalues
are orthogonal. The eigenvalues can be numbered to form a sequence {λ_α}, in
which each eigenvalue is repeated as many times as its multiplicity, and such
that |λ₁| ≤ |λ₂| ≤ ⋯; if there are infinitely many eigenvalues, then |λ_α| → ∞
as α → ∞. An eigenfunction φ_α can be assigned to each λ_α in such a fashion
that the sequence {φ_α} is orthonormal and every eigenfunction φ is a linear
combination of a finite number of the φ_α's.

The sequence {φ_α} is called a full system of eigenfunctions of the kernel.

REMARK. While restricting K(s, t) to be real, one can consider complex
eigenvalues λ and eigenfunctions x(t) = x₁(t) + ix₂(t). Some kernels have
only complex eigenvalues; some kernels have no eigenvalues at all. A symmetric kernel has only real eigenvalues.
5. THE EXPANSION THEOREM AND SOME OF ITS CONSEQUENCES

THEOREM 4. Let {φ_α} be a full system of eigenfunctions for the symmetric
kernel K(s, t). Then in order that a function g(s) can be expanded in a uniformly convergent series

(26)   g(s) = Σ_α c_αφ_α(s),

where

(27)   c_α = (g, φ_α),

it is sufficient that g(s) can be written in the form

(28)   g(s) = ∫_a^b K(s, t)G(t) dt,

where G(t) is continuous.

In many applications the form (28) for the function g(s) to be expanded
arises in a natural way. For example, the function (18) is of this form.

The coefficients (27) can be written in a different form which is often
useful. From eq. (28) and from the facts that φ_α satisfies eq. (24) with
λ = λ_α and that K is symmetric, one deduces the expression

(29)   c_α = (G, φ_α)/λ_α.

As a first application of the expansion theorem, let λ be a number which
is not an eigenvalue, and seek to expand the solution x = φ(s) of eq. (23)
in terms of the eigenfunctions. To do this, note that by eq. (23) x − f is
of form (28) with G = λx. By Theorem 4 and eq. (29) one deduces the
expansion

(30)   x(s) − f(s) = λ Σ_α [(x, φ_α)/λ_α]φ_α(s).

If this relation is multiplied by φ_β(s) and integrated from a to b, one obtains a linear equation for (x, φ_β). Solving this equation and substituting
the result in eq. (30) gives the desired formula

(31)   x(s) = f(s) + λ Σ_α [(f, φ_α)/(λ_α − λ)]φ_α(s).

(The series is meaningless if λ is one of the λ_α, unless (f, φ_α) = 0; this is
in agreement with Theorem 2 of Sect. 3.)

A second application concerns the "quadratic form"

(32)   I{x, x} = ∫_a^b ∫_a^b K(s, t)x(s)x(t) dt ds,

whose importance will become clear in the next section. If one applies the
expansion theorem to the integral of K(s, t)x(t), one obtains the formula

(33)   I{x, x} = Σ_α k_α²/λ_α,   k_α = (x, φ_α).

The transition from eq. (32) to (33) is the analogue of choosing coordinates
which represent a conic section in its "principal axis" form.
6. VARIATIONAL INTERPRETATION OF THE EIGENVALUE PROBLEM

In this section the hypotheses and notations are the same as those of
Sect. 5. It is convenient to denote the positive λ_α's by

(34)   0 < p₁ ≤ p₂ ≤ p₃ ≤ ⋯

and the absolute values of the negative ones by

(35)   0 < n₁ ≤ n₂ ≤ n₃ ≤ ⋯.

There may be no p's or no n's; as remarked in Sect. 1, 0 is not an eigenvalue. Equation (33) now becomes

(36)   I{x, x} = Σ_j kⱼ²/pⱼ − Σ_j lⱼ²/nⱼ,

where kⱼ = (x, ψⱼ), lⱼ = (x, χⱼ), and ψⱼ is the eigenfunction associated with
pⱼ, χⱼ the eigenfunction associated with −nⱼ. From eq. (36) and Bessel's
inequality (22) one now concludes:

THEOREM 5. If there are positive eigenvalues of the symmetric kernel
K(s, t), then

(37)   I{x, x} ≤ ‖x‖²/p₁,

where p₁ is the smallest positive eigenvalue. The maximum of I{x, x} for x
within the class of x having norm 1 is attained when x = ψ₁ and equals 1/p₁.
If there is a positive eigenvalue pₙ, then the maximum of I{x, x} within the
class of x for which

(38)   ‖x‖ = 1,   (x, ψⱼ) = 0   (j = 1, ⋯, n − 1)

is attained when x = ψₙ and equals 1/pₙ.

If K(s, t) is replaced by −K(s, t), one obtains a characterization of the
negative eigenvalues and corresponding eigenfunctions.

The characterization of eigenvalues in Theorem 5 is recursive; that is, in
order to characterize pₙ and ψₙ one has to know ψ₁, ⋯, ψₙ₋₁. A direct
characterization is obtainable as follows. Let M{y₁, ⋯, yₙ₋₁} denote the
least upper bound of I{x, x} among all x with ‖x‖ = 1 such that

(39)   (x, yⱼ) = 0   (j = 1, ⋯, n − 1).

It can be shown that, among all choices of y₁ = y₁(t), ⋯, yₙ₋₁ = yₙ₋₁(t),
M has its smallest value, namely 1/pₙ, when y₁ = ψ₁, ⋯, yₙ₋₁ = ψₙ₋₁.
See Ref. 4, p. 132.
Rayleigh-Ritz Quotient. This is the quotient

(40)   Q{x} = I{x, x} ÷ ∫_a^b [∫_a^b K(s, t)x(t) dt]² ds.

Assume that there are at most a finite number of negative eigenvalues
and assume all the eigenvalues are numbered so that λ₁ ≤ λ₂ ≤ λ₃ ≤ ⋯.
From eq. (33) one finds

(41)   I{x, x} = Σ_α k_α²/λ_α ≥ λ₁ Σ_α (k_α/λ_α)².

From the expansion theorem of Sect. 5, with G(t) = x(t), one deduces that

(42)   ∫_a^b [∫_a^b K(s, t)x(t) dt]² ds = Σ_α (k_α/λ_α)².

From eqs. (40), (41), (42) one thus obtains the inequality

(43)   Q{x} ≥ λ₁.

Furthermore one can show that Q{x} takes on its minimum λ₁ when
x = φ₁. Thus the smallest eigenvalue and associated eigenfunction are obtainable by minimizing Q{x}. This is the basis of a very effective computational procedure.

The quotient Q{x} can be written in another way, more familiar in the
theory of differential equations. One sets

(44)   u(s) = ∫_a^b K(s, t)x(t) dt,

so that

(45)   Q{x} = ∫_a^b u(t)x(t) dt ÷ ∫_a^b u² dt.

The analogous definition, and integration by parts, for the problem of Sect.
2 leads to the expression

(46)   Q{x} = ∫∫∫_D (u_x² + u_y² + u_z²) dx dy dz ÷ ∫∫∫_D u² dx dy dz,

where u is the solution of the problem

(47)   ∇²u = −x,   u = 0 on B.
7. APPROXIMATION METHODS

The first four methods to be described are devices for replacing the integral equation by a system of linear algebraic equations.

Approximation of Integrals. Let a subdivision of the interval
a ≤ t ≤ b be given:

a = t₁ < t₂ < ⋯ < tₙ < tₙ₊₁ = b

and let δ = max (tⱼ₊₁ − tⱼ). Then for continuous h(t), the difference

∫_a^b h(t) dt − Σ_{j=1}^n h(tⱼ)(tⱼ₊₁ − tⱼ)

can be made as small as desired, in absolute value, by making δ sufficiently
small. Hence one can take the sum as approximation to the integral. If
this is done for the Fredholm eq. (2), one obtains the approximating
equation

(48)   x(s) − λ Σ_{j=1}^n K(s, tⱼ)x(tⱼ)(tⱼ₊₁ − tⱼ) = f(s).

If one now writes

(49)   xᵢ = x(tᵢ),   aᵢⱼ = K(tᵢ, tⱼ)(tⱼ₊₁ − tⱼ),   bᵢ = f(tᵢ)

for i = 1, ⋯, n, then at s = tᵢ eq. (48) becomes

(50)   xᵢ − λ Σ_{j=1}^n aᵢⱼxⱼ = bᵢ   (i = 1, ⋯, n).

This is a system of linear equations for x₁, ⋯, xₙ. A solution can be regarded as giving the values of the desired x(s) at t₁, ⋯, tₙ; one can interpolate linearly between these points to obtain an approximation to x(s).
The first proof by Fredholm of the main theorems of Sect. 3 was based on
eq. (50) and subsequent passage to the limit (n → ∞, δ → 0).

For numerical purposes the procedure may be improved by using better
approximations for the integral such as those given by the trapezoidal rule,
Simpson's rule, or Gauss's quadrature (Ref. 9, Chap. 7). Each of these
methods replaces the integral by a sum Σⱼ h(tⱼ)Aⱼ with properly chosen
abscissas tⱼ and "weights" Aⱼ. For more details and also the question of
convergence, see Ref. 1 (pp. 105 ff.), Ref. 3 (pp. 437 ff.), Ref. 9 (p. 455).
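The passage from eq. (48) to eq. (50) translates directly into a short computation. The following minimal Python sketch uses equal weights at midpoints; the kernel K(s, t) = st, λ = 1, and the right side f(s) = 2s/3 (chosen so that the exact solution is x(s) = s) are illustrative choices.

import numpy as np

def fredholm_quadrature(K, f, lam, a, b, n=100):
    """Solve x(s) - lam * integral_a^b K(s,t)x(t) dt = f(s) via eq. (50)."""
    t = np.linspace(a, b, n + 1)[:-1] + (b - a) / (2 * n)   # midpoints
    w = (b - a) / n                                         # equal weights
    A = np.eye(n) - lam * K(t[:, None], t[None, :]) * w
    return t, np.linalg.solve(A, f(t))

# With K = s*t and lambda = 1: x(s) - s * integral t x(t) dt = f(s);
# taking x(s) = s gives f(s) = s - s/3 = 2s/3.
t, x = fredholm_quadrature(lambda s, u: s * u,
                           lambda s: 2 * s / 3, 1.0, 0.0, 1.0)
print(np.abs(x - t).max())          # near zero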
Method of Degenerate Kernels. A kernel A(s, t) is called degenerate
if it can be written as a finite sum of products of a function of s by a function of t; that is, if it is of the form

(51)   A(s, t) = Σ_{j=1}^n Aⱼ(s)Bⱼ(t).

Every continuous kernel K(s, t) can be approximated by a continuous degenerate kernel A(s, t); that is, for every ε > 0 there exists a continuous A(s, t)
such that |K(s, t) − A(s, t)| < ε for a ≤ s ≤ b, a ≤ t ≤ b. One therefore
obtains an approximate solution of eq. (2) by replacing K by A. For the
question of convergence one is referred to Ref. 4 (pp. 118 ff.), Ref. 1
(Abschnitt IV), and Ref. 3 (p. 464).

If K is replaced by A, the Fredholm eq. (2) is replaced by the equation

(52)   x(s) − λ Σ_{j=1}^n Aⱼ(s)∫_a^b Bⱼ(t)x(t) dt = f(s),

whose solution is found by solving a system of linear equations. To see
this, one multiplies eq. (52) by Bᵢ(s) and integrates with respect to s from
a to b. With the notations

∫_a^b Aⱼ(t)Bᵢ(t) dt = aᵢⱼ,   ∫_a^b f(t)Bᵢ(t) dt = bᵢ,

one obtains the system

(53)   xᵢ − λ Σ_{j=1}^n aᵢⱼxⱼ = bᵢ   (i = 1, ⋯, n).

It can be verified that if x₁, ⋯, xₙ is a solution of this linear system, then

(54)   x(s) = f(s) + λ Σ_{j=1}^n xⱼAⱼ(s)

is a solution of eq. (52) and, conversely, every solution of eq. (52) is obtained in this way.
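As a numerical illustration, the following minimal Python sketch forms the coefficients aᵢⱼ and bᵢ by the midpoint rule, solves the system (53), and assembles the solution (54); the two-term kernel A(s, t) = 1 + st, λ = ½, and f(s) = s are illustrative choices.

import numpy as np

def degenerate_solve(A_list, B_list, f, lam, a=0.0, b=1.0, M=400):
    """Solve eq. (52) for a degenerate kernel sum A_j(s) B_j(t)."""
    t = np.linspace(a, b, M + 1)[:-1] + (b - a) / (2 * M)
    w = (b - a) / M
    n = len(A_list)
    # a_ij = integral A_j(t) B_i(t) dt,  b_i = integral f(t) B_i(t) dt
    aij = np.array([[np.sum(Aj(t) * Bi(t)) * w for Aj in A_list]
                    for Bi in B_list])
    bi = np.array([np.sum(f(t) * Bi(t)) * w for Bi in B_list])
    xi = np.linalg.solve(np.eye(n) - lam * aij, bi)          # eq. (53)
    return lambda s: f(s) + lam * sum(x * Aj(s)              # eq. (54)
                                      for x, Aj in zip(xi, A_list))

x = degenerate_solve([lambda s: 1 + 0 * s, lambda s: s],
                     [lambda t: 1 + 0 * t, lambda t: t],
                     lambda s: s, 0.5)
print(x(0.0), x(1.0))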
The Ritz-Galerkin Method. This is a method for finding approximations to the eigenvalues and eigenfunctions of the homogeneous eq. (4)
with symmetric kernel. Let {v_α} be an orthonormal system. Such a system is called complete (in the class of continuous functions on the interval
a ≤ s ≤ b) if for every continuous function x(s) the sums

(55)   Σ_{i=1}^n xᵢvᵢ(s),   xᵢ = (x, vᵢ),

converge "in the mean" to x(s); that is, if

lim_{n→∞} ‖x − Σ_{i=1}^n xᵢvᵢ‖ = 0.

EXAMPLE. The functions

1/√(2π),   cos s/√π,   sin s/√π,   cos 2s/√π,   sin 2s/√π, ⋯

form a complete orthonormal system for −π ≤ s ≤ π; see Chap. 8, Sect. 8.

Now let φ₁(s) be a normalized eigenfunction of eq. (4) corresponding to
the smallest positive eigenvalue λ₁. (If there are no positive eigenvalues,
one follows a similar procedure starting with the negative eigenvalue of
smallest absolute value.) One now seeks an approximation φ to φ₁ of form

(56)   φ = Σ_{i=1}^n cᵢvᵢ(s),

where {v_α} is a complete orthonormal system. In order to determine the
cᵢ, note (Theorem 5, Sect. 6) that 1/λ₁ is the maximum of I{x, x} when
‖x‖ = 1, and that this maximum is reached for x = φ₁. Restricting attention now to functions of form (56), one finds

(57)   I{φ, φ} = Σ_{α=1}^n Σ_{β=1}^n b_{αβ}c_αc_β,

with

(58)   b_{αβ} = ∫_a^b ∫_a^b K(s, t)v_α(s)v_β(t) ds dt.

The condition ‖φ‖ = 1 becomes

(59)   c₁² + ⋯ + cₙ² = 1.

Maximizing the quadratic form (57) with side condition (59) can be analyzed by the method of Lagrange multipliers (Ref. 4). One obtains the
equations

(60)   cᵢ − λ Σ_{j=1}^n bᵢⱼcⱼ = 0   (i = 1, ⋯, n)

which, together with eq. (59), determine the cᵢ and λ. In particular, λ is
a root of the algebraic equation obtained by setting the determinant of
eq. (60) equal to zero. If λ₁* is the smallest positive root of this equation,
then λ₁* is an approximation to λ₁ and λ₁* ≥ λ₁; for λ = λ₁* eqs. (59) and
(60) determine c₁, ⋯, cₙ and, by eq. (56), a desired approximation φ of
the eigenfunction φ₁.
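As a numerical illustration, the following minimal Python sketch builds the matrix bᵢⱼ of eq. (58) for the orthonormal sine system on (0, 1) and obtains λ₁* as the reciprocal of the largest eigenvalue of that matrix, in accordance with eq. (60); the kernel K(s, t) = min (s, t) is an illustrative choice.

import numpy as np

def ritz_galerkin_lambda1(K, n=5, M=400):
    s = (np.arange(M) + 0.5) / M
    w = 1.0 / M
    # v_i(s) = sqrt(2) sin(i pi s): orthonormal on (0, 1)
    V = np.sqrt(2) * np.sin(np.outer(np.arange(1, n + 1), np.pi * s))
    Kmat = K(s[:, None], s[None, :])
    b = V @ (Kmat * w) @ V.T * w          # b_ij of eq. (58), midpoint rule
    mu = np.linalg.eigvalsh(b)
    return 1.0 / mu.max()                 # lambda_1* >= lambda_1

# K(s, t) = min(s, t): exact lambda_1 = (pi/2)^2 ~ 2.4674
print(ritz_galerkin_lambda1(np.minimum))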
Method of Enskog. The method will be discussed for the Fredholm
eq. (2) with symmetric kernel, with λ not an eigenvalue. (For less restrictive assumptions, see Ref. 7, p. 109.) It is based on a complete linearly
independent system v₁, v₂, ⋯ with the additional property that the functions

(61)   yₙ(s) = vₙ(s) − λ∫_a^b K(s, t)vₙ(t) dt

are orthonormal and complete. Such a system can be constructed as follows: Let w₁, w₂, ⋯ be a complete linearly independent system (e.g., the
system of sines and cosines given above). One then defines

(62)   zₙ(s) = wₙ(s) − λ∫_a^b K(s, t)wₙ(t) dt.

It can be proved that the zₙ are likewise linearly independent and complete. From the zₙ one constructs an equivalent orthonormal system
y₁, y₂, ⋯ (Sect. 3), so that relations

(63)   yₙ(s) = Σ_{m=1}^n c_{nm}z_m(s)

hold, with constant c_{nm}. One now defines:

(64)   vₙ(s) = Σ_{m=1}^n c_{nm}w_m(s).

It follows from eq. (62) that eq. (61) holds. Moreover, the system {vₙ} is
a complete linearly independent system.

Having a system {vₙ} of the properties indicated, one can find an approximate solution x(s) of the Fredholm eq. (2) of form

x(s) = Σ_{i=1}^n cᵢyᵢ(s).

Multiply eq. (2) by vᵢ(s) and integrate with respect to s from a to b, to
obtain the relations:

(f, vᵢ) = ∫_a^b x(s)vᵢ(s) ds − λ∫_a^b ∫_a^b K(s, t)x(t)vᵢ(s) dt ds
        = ∫_a^b x(s)[vᵢ(s) − λ∫_a^b K(t, s)vᵢ(t) dt] ds.

Because of the symmetry of the kernel, the expression in brackets is yᵢ(s).
Hence

(f, vᵢ) = ∫_a^b x(s)yᵢ(s) ds = cᵢ.

Iteration is the basis of the following methods.

Successive Approximations. The Fredholm equation can be written
in the form

(65)   x(s) = f(s) + λ∫_a^b K(s, t)x(t) dt.

This form suggests defining successive approximations x⁽ⁱ⁾(s) to the solution x(s) as follows:

(66)   x⁽⁰⁾(s) = f(s),   x⁽ⁱ⁺¹⁾(s) = f(s) + λ∫_a^b K(s, t)x⁽ⁱ⁾(t) dt,

where i = 0, 1, 2, ⋯. One can prove by induction that

(67)   x⁽ⁿ⁾(s) = f(s) + Σ_{i=1}^n λⁱ∫_a^b K⁽ⁱ⁾(s, t)f(t) dt,

where the so-called iterated kernels are defined by the relations

(68)   K⁽¹⁾(s, t) = K(s, t),   K⁽ⁱ⁺¹⁾(s, t) = ∫_a^b K(s, u)K⁽ⁱ⁾(u, t) du,

for i = 1, 2, ⋯. It can be proved that

(69)   x(s) = lim_{n→∞} x⁽ⁿ⁾(s) = f(s) + Σ_{i=1}^∞ λⁱ∫_a^b K⁽ⁱ⁾(s, t)f(t) dt

exists if |λ| is less than [(b − a) max |K(s, t)|]⁻¹; the series, known as
Neumann's series, converges uniformly for a ≤ s ≤ b. The function x(s)
defined by eq. (69) is the solution of (65) for λ restricted as stated. For
the Volterra eq. (3) the Neumann series converges for all λ and the solution is valid for all λ.
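As a numerical illustration, the following minimal Python sketch carries out the iteration (66) on a discretized kernel; K(s, t) = st, λ = ½ (which satisfies the convergence restriction), and f(s) = 5s/6 (chosen so that the exact solution is x(s) = s) are illustrative choices.

import numpy as np

def neumann_solve(K, f, lam, a=0.0, b=1.0, M=200, iters=60):
    t = np.linspace(a, b, M + 1)[:-1] + (b - a) / (2 * M)
    w = (b - a) / M
    Kmat = K(t[:, None], t[None, :])
    x = f(t).copy()                        # x^(0) = f
    for _ in range(iters):
        x = f(t) + lam * (Kmat @ x) * w    # x^(i+1) from eq. (66)
    return t, x

# Here (b - a) max|K| = 1, so |lambda| = 1/2 lies inside the bound.
t, x = neumann_solve(lambda s, u: s * u, lambda s: 5 * s / 6, 0.5)
print(np.abs(x - t).max())                 # near zero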
The Schwarz Constants. Write I{x, x; K} for the quadratic form
I{x, x} defined by eq. (32), to express more clearly the dependence on K.
The Schwarz constants are then defined as follows:

(70)   a₀ = (x, x),   aᵢ = I{x, x; K⁽ⁱ⁾}   (i = 1, 2, ⋯),

where the K⁽ⁱ⁾ are defined by eq. (68). These constants (which obviously
depend on the choice of the function x) are important for the theory as
well as for estimating eigenvalues. Note the following facts, supposing always that K(s, t) is symmetric.

If P is an arbitrary real number, subject only to the restriction that

(71)   aᵢ − Paᵢ₊₁ ≠ 0,

and if

(72)   Q = (aᵢ₋₁ − Paᵢ)/(aᵢ − Paᵢ₊₁),

then the interval with end points P, Q contains at least one eigenvalue, provided that at least one of the following assumptions is satisfied: (a) i is
even, (b) K is a positive definite kernel, that is, I{x, x} > 0 unless x ≡ 0.
(For a proof, see Ref. 1, p. 30.) The quotients Q are termed Temple
quotients. Setting P = 0 in eq. (72) leads to consideration of the quotients
Qᵢ = aᵢ₋₁/aᵢ. It can be shown that the sequence {|Q₂ᵢ₋₁|} is monotone
nonincreasing and converges to |λ₁|, where λ₁ is the eigenvalue of smallest
absolute value. (For further applications of the Schwarz constants, see
Refs. 1, 3, 9.)
Method of Steepest Descent. The basis of this method is the fact
that x is a solution of the Fredholm eq. (2) with symmetric kernel if and
only if x minimizes the expression

(73)   F{x} = ½[(x, x) − λ∫_a^b ∫_a^b K(s, t)x(s)x(t) ds dt] − (f, x).

Let x₀ be a first approximation to x. One seeks a better approximation
x₁ = x₀ + h, and tries to choose h so that in going from x₀ to x₁ the value
of F descends as rapidly as possible. With the notation

(74)   L[x] = x − λ∫_a^b K(s, t)x(t) dt,

one finds that

(75)   F{x₀ + h} = F{x₀} + (L[x₀] − f, h) + ½(L[h], h).

Now if F were a function of a finite number of real variables, the analogue
of the second term on the right side of eq. (75) would be the scalar product
of grad F with h. One therefore defines here

(76)   g[x] = L[x] − f

as the gradient of F. This suggests that, as in the case of a function of a
finite number of real variables, the direction of steepest descent is given by
the negative gradient; this can be proved to be true. One therefore sets
h = −ag[x₀], where a is a real constant to be determined. Replace h by
−ag[x₀] in eq. (75); then F{x₀ + h} becomes a function of the real variable
a. Now determine a by minimizing this function by the ordinary methods
of calculus. The result for the desired next approximation x₁ = x₀ + h is

(77)   x₁ = x₀ − [‖g[x₀]‖²/(L[g[x₀]], g[x₀])] g[x₀].

If one repeats the procedure starting with x₁ instead of x₀, one obtains a
new approximation x₂; continuing thus, one obtains a sequence x₁, x₂, ⋯,
xₙ, ⋯. If the kernel K is symmetric and positive definite and |λ| is less
than |λ_α| for every eigenvalue λ_α, then xₙ converges in the mean to the
solution x of the Fredholm eq. (2). For proofs and details, see Ref. 8 (pp.
103 and 136). The method can also be applied to finding eigenvalues (Ref.
8, p. 142).
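As a numerical illustration, the following minimal Python sketch applies the step (77) to a discretized symmetric positive definite kernel; the kernel K(s, t) = min (s, t), λ = −1 (smaller in absolute value than every eigenvalue), the right side, and the iteration count are illustrative choices.

import numpy as np

M = 200
s = (np.arange(M) + 0.5) / M
w = 1.0 / M
K = np.minimum.outer(s, s)            # symmetric, positive definite
lam = -1.0                            # |lam| below every |lambda_alpha|
f = s.copy()

def L(x):                             # L[x] = x - lam * integral K x dt
    return x - lam * (K @ x) * w

x = np.zeros(M)
for _ in range(50):
    g = L(x) - f                               # gradient, eq. (76)
    a = (g @ g * w) / (L(g) @ g * w)           # optimal step from (77)
    x = x - a * g
print(np.abs(L(x) - f).max())                  # residual of eq. (2)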


REFERENCES

1. H. Bückner, Die praktische Behandlung von Integralgleichungen, Ergebnisse der
angewandten Mathematik, Vol. 1, Springer Verlag, Berlin, Göttingen, Heidelberg, 1952.
2. L. Collatz, Eigenwertprobleme und ihre numerische Behandlung, Chelsea Publishing
Company, New York, 1948.
3. L. Collatz, Numerische Behandlung von Differentialgleichungen, Die Grundlehren
der mathematischen Wissenschaften in Einzeldarstellungen, Vol. LX, 2nd edition, Springer
Verlag, Berlin, Göttingen, Heidelberg, 1955.
4. R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. 1, Interscience
Publishers, New York-London, 1953.
5. Ph. Frank and R. v. Mises, Die Differential- und Integralgleichungen der Mechanik
und Physik, 2nd edition, Vieweg, Braunschweig, 1930 (republished Rosenberg, New
York, 1943).
6. E. Goursat, Cours d'Analyse Mathématique, Vol. 3, 3rd edition, Gauthier-Villars,
Paris, 1923.
7. G. Hamel, Integralgleichungen, Einführung in Lehre und Gebrauch, Springer, Berlin,
1937 (Edwards Brothers, Ann Arbor, Mich., 1946).
8. L. V. Kantorovich, Functional Analysis and Applied Mathematics, National Bureau
of Standards Report 1509, 1952 [translated from Uspekhi Matemat. Nauk, 3 (6), 89-185
(1948)].
9. Z. Kopal, Numerical Analysis, Wiley, New York, 1955.
10. W. Schmeidler, Integralgleichungen mit Anwendungen in Physik und Technik, I.
Lineare Integralgleichungen, Akademische Verlagsgesellschaft Geest u. Portig, Leipzig,
1950.

A  GENERAL MATHEMATICS

Chapter 7

Complex Variables

W. Kaplan

1. Functions of a Complex Variable                        7-01
2. Analytic Functions. Harmonic Functions                 7-04
3. Integral Theorems                                      7-05
4. Power Series. Laurent Series                           7-08
5. Zeros. Singularities. Residues. Argument Principle     7-11
6. Analytic Continuation                                  7-16
7. Riemann Surfaces                                       7-17
8. Elliptic Functions                                     7-18
9. Functions Defined by Linear Differential Equations     7-21
10. Other Transcendental Functions                        7-25
References                                                7-28

1. FUNCTIONS OF A COMPLEX VARIABLE

Complex Numbers. Throughout Chap. 7, z = x + iy and w = u + iv
denote complex numbers; i is the imaginary unit, i² = −1; x, y, u, v are
arbitrary real numbers; x is the real part of z, y the imaginary part of z:

(1)   x = Re (x + iy),   y = Im (x + iy).

The complex numbers z can be represented geometrically by the points
(x, y) of an xy-plane (or z-plane), as in Fig. 1. The polar coordinates (r, θ)
of z are termed respectively the modulus (or absolute value) of z and argument (or amplitude) of z:

(2)   r = |z| = mod z;   θ = arg z = amp z;   z = r(cos θ + i sin θ).

The conjugate of z = x + iy is:

(3)   z̄ = x − iy.

Algebraic properties of complex numbers are discussed in Chap. 2. In
general, complex numbers are combined as are real numbers, with the relation i² = −1 used to simplify the results. Addition is the same as vector
addition (Fig. 1).

FIG. 1. The complex z-plane.

Multiplication of z₁ by z₂ yields a number z₁·z₂ whose modulus is
|z₁|·|z₂| and whose argument is arg z₁ + arg z₂.

Useful Rules.

(4)   (z₁ + z₂)¯ = z̄₁ + z̄₂;   z + z̄ = 2 Re (z);   z − z̄ = 2i Im (z);
      |z₁ + z₂| ≤ |z₁| + |z₂|;
      zⁿ = [r(cos θ + i sin θ)]ⁿ = rⁿ(cos nθ + i sin nθ),   n = ±1, ±2, ⋯.

Complex Functions. By a function of the complex variable z will be
meant an assignment of a value w to each z of a certain set D in the z-plane
(see Chap. 1, Sects. 1 and 3); one then writes:

(5)   w = f(z).

(Some formulas will assign several values of w to each z in D. One then
speaks of a "multiple-valued function.") The set D is generally an open
region (e.g., interior of a circle); see Chap. 1, Sect. 8. From the equation
u + iv = f(x + iy) one deduces two equations of the form

(6)   u = u(x, y),   v = v(x, y)   ((x, y) in D),

and conversely a pair (6) of functions of two variables determines a complex function (5) of z.

Limits and continuity for complex functions are defined as for real functions. The phrase "z approaches z₀" is interpreted to mean: |z − z₀| → 0,
or that the distance from z to z₀ becomes arbitrarily small. The basic
theorems on sums, products, quotients hold without change from the real
case. Continuity of w = f(z) is equivalent to continuity of both u(x, y)
and v(x, y) in (6).

Each complex function w = f(z) can be interpreted as a mapping (Chap.
10) of the set D into a set E in the w-plane. If f(z) is continuous, then as
z traces a curve in the z-plane, w traces a curve in the w-plane.

Derivatives of complex functions are defined as for real functions:

(7)   (d/dz)f(z) = f′(z) = lim_{Δz→0} [f(z + Δz) − f(z)]/Δz,

and the formal rules of differentiation carry over. Higher derivatives
f″(z), ⋯ are defined similarly.

Definite integrals of complex functions are defined as line integrals:

(8)   ∫_{z₁}^{z₂} f(z) dz = ∫_C (u + iv)(dx + i dy) = ∫_C u dx − v dy + i∫_C v dx + u dy.

Here C is a continuous path of finite length from z₁ to z₂. Again the
formal rules carry over.
Examples of Complex Functions are the following:

(9)   polynomials: w = a₀zⁿ + ⋯ + aₙ₋₁z + aₙ;

      rational functions: w = (a₀zⁿ + ⋯ + aₙ)/(b₀z^m + ⋯ + b_m);

      exponential function: w = e^z = eˣ(cos y + i sin y) = exp z;

      logarithm: w = log z = log |z| + i arg z   (z ≠ 0);

      power function: w = z^a = exp (a log z);

      trigonometric functions: sin z = (e^{iz} − e^{−iz})/2i,   cos z = (e^{iz} + e^{−iz})/2;

      hyperbolic functions: sinh z = (e^z − e^{−z})/2,   cosh z = (e^z + e^{−z})/2;

      inverse trigonometric functions: sin⁻¹ z = (1/i) log (iz ± √(1 − z²)),
      cos⁻¹ z = (1/i) log (z ± i√(1 − z²)).

The logarithm is a multiple-valued function and can be made single-valued (so that continuity can be discussed) by properly restricting z and
the choice of arg z. The principal value is:

(10)   log z = log |z| + iθ   (r > 0, −π < θ ≤ π),

a function continuous except for θ = π.

If a is a rational number (e.g., ½), z^a has a finite number of values. For
example, z^{1/2} = √z has two values:

(11)   z^{1/2} = e^{(1/2) log z} = e^{(1/2)(log r + iθ)} = √r e^{iθ/2},

if θ is one choice of arg z.

Identities satisfied by the exponential function, logarithm, and trigonometric functions:

(12)   log zⁿ = n log z,   sin (z₁ + z₂) = sin z₁ cos z₂ + cos z₁ sin z₂,
       sin² z + cos² z = 1, ⋯.

In the case of the logarithm, the identities are true only for proper choice
of value of each logarithm concerned. The rules for differentiation also
carry over:

(13)   (d/dz) sin z = cos z, ⋯.

2. ANALYTIC FUNCTIONS. HARMONIC FUNCTIONS

The function w = f(z) is said to be analytic (regular, holomorphic) in
an open region D if it has a derivative f′(z) in D. The function f(z) is
analytic in D if and only if u = Re (f(z)) and v = Im (f(z)) have continuous
first partial derivatives in D and the Cauchy-Riemann equations hold in D:

(14)   ∂u/∂x = ∂v/∂y,   ∂u/∂y = −∂v/∂x.

Furthermore, if f(z) is analytic,

(15)   f′(z) = ∂u/∂x + i(∂v/∂x) = ∂v/∂y + i(∂v/∂x) = ⋯.

If f(z) is analytic in D, the derivatives of all orders of f, u, v exist and
are continuous in D. From eq. (14) one deduces that

(16)   ∂²u/∂x² + ∂²u/∂y² = 0,   ∂²v/∂x² + ∂²v/∂y² = 0;

that is, u and v are harmonic functions. Relations (14) are described by
the statement: "u and v form a pair of conjugate harmonic functions."
One says "v is conjugate to u," but should note that u is conjugate to −v.
In polar coordinates (14) and (15) become

(17)   ∂u/∂r = (1/r)(∂v/∂θ),   (1/r)(∂u/∂θ) = −∂v/∂r;

(18)   f′(z) = e^{−iθ}(∂u/∂r + i(∂v/∂r)).

All the functions (9) are analytic, provided the logarithms are restricted
so as to be continuous and division by zero is excluded. A function analytic for all z is called an entire function or an integral function; examples
are polynomials and e^z.

A function cannot be analytic merely at a single point or merely along
a curve. The definition requires always that analyticity holds in an open
region. The phrases "analytic at z₀" or "analytic along curve C" are
understood to mean "analytic in an open region containing z₀" or "analytic in an open region containing curve C." If f is analytic in an open
region D, then the values w = f(z) form an open region in the w-plane.
3. INTEGRAL THEOREMS

The open region D is termed simply connected if every simple closed path
C in D (Fig. 2) has its interior in D. If D is not simply connected it is
multiply connected; for example, the region between two concentric circles
is multiply connected; it is doubly connected, because its boundary is formed
of two pieces or "components."

FIG. 2. Simply connected region.
All paths in the following line integrals are assumed to be "rectifiable,"
i.e., to have finite length.

CAUCHY INTEGRAL THEOREM. If f(z) is analytic in a simply connected
open region D, then

∮_C f(z) dz = 0

on every simple closed path C in D or, equivalently, ∫ f(z) dz is independent
of path in D.

MORERA'S THEOREM (converse of Cauchy theorem). If f(z) is continuous
in the open region D and

∮_C f(z) dz = 0

on every simple closed path C in D, then f(z) is analytic in D.

An indefinite integral of f(z) is a function F(z) whose derivative is f(z).
If f(z) is continuous in D and has an indefinite integral F(z), then

(19)   ∫_{z₁}^{z₂} f(z) dz = F(z₂) − F(z₁);

in particular, the integral is independent of path, so that f(z) must be analytic; since F′(z) = f(z), F(z) must also be analytic. If f(z) is a given analytic function in D, then existence of an indefinite integral of f(z) can be
proved, provided D is simply connected. In particular,

(20)   F(z) = ∫_{z₀}^z f(z) dz   (z₀ in D)

has meaning, since the integral is independent of path, and F′(z) = f(z),
so that F is an indefinite integral.
CAUCHY INTEGRAL FORMULAS. Let f(z) be analytic in D. Let C be a
simple closed path in D and having its interior in D. Let z₀ be interior to C
(Fig. 2). Then

(21)   f(z₀) = (1/2πi)∮_C f(z)/(z − z₀) dz,   f′(z₀) = (1/2πi)∮_C f(z)/(z − z₀)² dz, ⋯,
       f⁽ⁿ⁾(z₀) = (n!/2πi)∮_C f(z)/(z − z₀)ⁿ⁺¹ dz.

At the heart of this theorem is the special case f(z) ≡ 1:

∮_C dz/(z − z₀) = 2πi,   ∮_C dz/(z − z₀)ⁿ = 0,   n = 2, 3, ⋯.
Cauchy's theorem and integral formulas can be extended to multiply
connected domains. Let D be a domain bounded by curves C₁, C₂, ⋯, C_k
as in Fig. 3.

FIG. 3. Multiply connected region.

Let f(z) be analytic in a somewhat larger region, including
all of D and its boundary. Then

(22)   ∮_{C₁} f(z) dz + ∮_{C₂} f(z) dz + ⋯ + ∮_{C_k} f(z) dz = 0;

that is, the integral of f(z) around the complete boundary B of D is zero,
provided one integrates on the boundary in the direction which keeps the
region D "on the left":

(23)   ∮_B f(z) dz = 0.

Under the same conditions, if z₀ is in D,

(24)   f(z₀) = (1/2πi)∮_B f(z)/(z − z₀) dz,   f⁽ⁿ⁾(z₀) = (n!/2πi)∮_B f(z)/(z − z₀)ⁿ⁺¹ dz.

CAUCHY INEQUALITIES. Under the hypotheses stated above for eqs. (21),
let |f(z)| ≤ M on C and let C be a circle with center z₀ and radius R. Then

(25)   |f⁽ⁿ⁾(z₀)| ≤ Mn!/Rⁿ   (n = 0, 1, 2, ⋯).

LIOUVILLE THEOREM. If f(z) is analytic for all finite z and |f(z)| ≤ M,
where M is a constant, for all z, then f(z) is identically constant.

MAXIMUM PRINCIPLE. Let f(z) be analytic in the open region D. If
|f(z)| has a weak relative maximum at a point z₀ of D (that is, if |f(z)|
≤ |f(z₀)| for z sufficiently close to z₀), then f(z) is identically constant.

For proofs of these theorems see Refs. 2, 3, 8.

4. POWER SERIES. LAURENT SERIES

Infinite series whose terms are complex numbers are defined as for real
numbers and, in general, the theory of convergence is the same. In particular, a series Σ_{n=1}^∞ bₙ of complex numbers is termed absolutely convergent
if the series of real numbers Σ|bₙ| converges. Absolute convergence implies
convergence.

Power Series. A power series in z has the form

(26)   Σ_{n=0}^∞ cₙ(z − z₀)ⁿ,

where z₀ is fixed. Each such series has a radius of convergence ρ, 0 ≤ ρ
≤ +∞. If ρ = 0, the series converges only for z = z₀. Otherwise, the
series converges (in fact, absolutely) for |z − z₀| < ρ, i.e., inside the circle
of convergence (whose radius ρ may be infinite). Outside this circle, for
|z − z₀| > ρ, the series diverges. On the circle |z − z₀| = ρ, the series
may converge at some points and diverge at others. The radius can be
evaluated by the formulas:

(27)   ρ = lim_{n→∞} |cₙ/cₙ₊₁|,   ρ = lim_{n→∞} 1/ⁿ√|cₙ|,

provided the limit exists, and in any case by the formula

(28)   ρ = lim inf_{n→∞} 1/ⁿ√|cₙ|,

where lim inf denotes the lower limit.
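As a numerical illustration, the following minimal Python sketch applies the formulas (27) to the coefficients cₙ = 1/2ⁿ⁺¹ of the expansion of 1/(2 − z) about z = 0, whose radius of convergence is 2; the truncation point is an illustrative choice.

import numpy as np

n = np.arange(40)
c = 1.0 / 2.0 ** (n + 1)
print(np.abs(c[:-1] / c[1:])[-1])             # ratio formula: exactly 2
print(1.0 / np.abs(c[-1]) ** (1.0 / n[-1]))   # root formula: tends to 2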
Let the power series (26) have radius of convergence ρ > 0, so that its
sum is a well-defined function f(z) inside the circle of convergence. One
can then prove that the series converges uniformly to f(z) in each circle
|z − z₀| ≤ ρ′ < ρ, so that f(z) is continuous. Furthermore, the differentiated series Σncₙ(z − z₀)ⁿ⁻¹ converges uniformly in each circle |z − z₀|
≤ ρ′ < ρ. From this it follows that the differentiated series converges to
f′(z) and that f′(z) is continuous. Hence f(z) is itself analytic for |z − z₀|
< ρ. Every power series defines an analytic function inside its circle of convergence. In general, all derivatives of f(z) can be evaluated by repeated
differentiation of the series. One hence concludes that

(29)   cₙ = f⁽ⁿ⁾(z₀)/n!;

that is, the power series is the Taylor series of f(z). From this it follows
that equality of the sum of two power series:

Σ aₙ(z − z₀)ⁿ = Σ bₙ(z − z₀)ⁿ,   |z − z₀| < ρ,

implies equality of corresponding coefficients:

aₙ = bₙ   (n = 0, 1, 2, ⋯).

Now let f(z) be given as an analytic function in an open region D of
arbitrary shape and let z₀ be a point of D. With z₀ as center one can then
construct a circle of maximum radius r₀ having its interior in D (Fig. 4).

FIG. 4. Taylor series expansion.

Within this circle f(z) can be represented by a power series, its Taylor series
about z₀:

(30)   f(z) = Σ_{n=0}^∞ [f⁽ⁿ⁾(z₀)/n!](z − z₀)ⁿ,   |z − z₀| < r₀;

the series may have a radius of convergence ρ larger than r₀. From this
theorem one deduces the following expansions:

(31)   e^z = Σ_{n=0}^∞ zⁿ/n!,  all z;   sin z = z − z³/3! + ⋯,  all z;

       cos z = 1 − z²/2! + ⋯,  all z;

       log (1 + z) = z − z²/2 + z³/3 − ⋯,   |z| < 1;

       1/(1 − z) = 1 + z + z² + ⋯,   |z| < 1;

       (1 + z)^k = 1 + kz + [k(k − 1)/2!]z² + ⋯,   |z| < 1.

Laurent Series. A series of form

Σ_{n=1}^∞ bₙ/(z − z₀)ⁿ

is reducible by the substitution z′ = 1/(z − z₀) to the form of an ordinary
power series and accordingly converges for |z′| < ρ, i.e., for |z − z₀| > ρ₁
= 1/ρ. If now a series Σ_{n=0}^∞ aₙ(z − z₀)ⁿ converges for |z − z₀| < ρ₂ and
ρ₁ < ρ₂, then the sum

Σ_{n=1}^∞ bₙ/(z − z₀)ⁿ + Σ_{n=0}^∞ aₙ(z − z₀)ⁿ

has meaning for ρ₁ < |z − z₀| < ρ₂, that is, in a certain annular region D
(Fig. 5). Here ρ₁ may be 0 and ρ₂ may be +∞. Let the sum be f(z), so
that f(z) is analytic in D. If one writes bₙ = a₋ₙ (n = 1, 2, ⋯), then
one has

(32)   f(z) = Σ_{n=−∞}^∞ aₙ(z − z₀)ⁿ.

This series is termed the Laurent expansion of f(z). The coefficients can
be shown to be uniquely determined as follows:

(33)   aₙ = (1/2πi)∮_C f(z)/(z − z₀)ⁿ⁺¹ dz,   n = 0, ±1, ±2, ⋯,

where C is any path about the ring, as in Fig. 5.

FIG. 5. Laurent expansion.

If f(z) is an arbitrary function analytic in a ring domain D, then one
can compute the coefficients aₙ by eq. (33) and form the Laurent series,
which will then converge to f(z) in D. In practice there are easier ways of
obtaining the coefficients. One way is to write f(z) as the sum of two functions f₂(z), f₁(z), the first analytic for |z − z₀| < ρ₂, the second analytic for
|z − z₀| > ρ₁ and approaching 0 as |z| → ∞. Under the substitution
ζ = 1/(z − z₀), f₁(z) becomes a function of ζ analytic for |ζ| < 1/ρ₁, so
that f₁(z) = Σ_{n=1}^∞ bₙζⁿ or

(34)   f₁(z) = Σ_{n=1}^∞ bₙ/(z − z₀)ⁿ,   |z − z₀| > ρ₁.

For f₂(z) one has a Taylor series about z₀. Addition of the two series provides the desired Laurent series. For example, if f(z) = 1/(z − 1)(z − 2)
and D is the ring 1 < |z| < 2, then one can choose f₁(z) = −1/(z − 1),
f₂(z) = 1/(z − 2).
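As a numerical illustration, the following minimal Python sketch evaluates eq. (33) for this example by numerical integration around the circle |z| = 3/2; the partial fractions above give aₙ = −1 for n ≤ −1 and aₙ = −1/2ⁿ⁺¹ for n ≥ 0.

import numpy as np

def laurent_coeff(f, n, z0=0.0, radius=1.5, M=4000):
    """a_n of eq. (33) by the trapezoidal rule on |z - z0| = radius."""
    th = 2 * np.pi * np.arange(M) / M
    z = z0 + radius * np.exp(1j * th)
    dz = 1j * radius * np.exp(1j * th) * (2 * np.pi / M)
    return np.sum(f(z) / (z - z0) ** (n + 1) * dz) / (2j * np.pi)

f = lambda z: 1.0 / ((z - 1) * (z - 2))
print(laurent_coeff(f, -1))   # ~ -1   (from -1/(z - 1) expanded in 1/z)
print(laurent_coeff(f, 0))    # ~ -1/2 (from 1/(z - 2) expanded about 0)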
5. ZEROS. SINGULARITIES. RESIDUES. ARGUMENT PRINCIPLE

Zeros. Let f(z) be analytic in domain D and let f(z₀) = 0. Then z₀ is
called a zero of f(z). If f(z) is not identically zero, then each zero has a
definite order (or multiplicity) n, a positive integer, and f(z) = (z − z₀)ⁿg(z),
where g is analytic in D and g(z₀) ≠ 0. The order n is the smallest value
of k such that f⁽ᵏ⁾(z₀) ≠ 0. If f(z) is not identically zero, then each zero
of f(z) is isolated; that is, for each zero z₀ one can choose a circular region
|z − z₀| < a containing no other zero.

Singularities. Let f(z) be not identically zero and have a zero of order
n at z₀. Then h(z) = 1/f(z) is analytic in some circular region |z − z₀| < a
except at the center z₀. By definition, h(z) has a pole of order n at z₀. One
can write h(z) = (z − z₀)⁻ⁿp(z), where p(z) is analytic for |z − z₀| < a
and p(z₀) ≠ 0. Since f(z₀) = 0, |h(z)| → ∞ as z → z₀. One conventionally assigns the value ∞ to h(z) at z₀.

In general, let f(z) be analytic in a punctured disk: 0 < |z − z₀| < a,
but not at z₀. Then f(z) is said to have an isolated singularity at z₀. One
can form a Laurent expansion of f(z) in the ring domain ρ₁ = 0 < |z − z₀|
< a = ρ₂. Three cases can then arise.

I. No negative powers in the Laurent series. Then

f(z) = Σ_{n=0}^∞ aₙ(z − z₀)ⁿ,

so that f(z) can be treated as a function analytic for |z − z₀| < a without
exception. The singularity is termed removable. The new value of f(z) at
z₀ is a₀ = lim_{z→z₀} f(z).

II. A finite number of negative powers in the Laurent series. Here, for
proper choice of N,

(35)   f(z) = a_{−N}/(z − z₀)^N + ⋯ + a₋₁/(z − z₀) + a₀ + a₁(z − z₀) + ⋯
            = g(z)/(z − z₀)^N.

Hence f(z) has a pole of order N at z₀.

III. Infinitely many negative powers in the Laurent series. In this case
f(z) is said to have an essential singularity at z₀.

By a theorem of Riemann, the three cases can be distinguished as follows: I. |f(z)| is bounded for 0 < |z − z₀| < b for some b. II. |f(z)| → ∞
as z → z₀. III. Neither |f(z)| nor 1/|f(z)| is bounded in each punctured
disk 0 < |z − z₀| < b. In Case III, by a theorem of Weierstrass and
Casorati, f(z) comes arbitrarily close to every complex number in every
neighborhood of z₀.

If f(z) is analytic for |z| > R, then f(z) is considered to have an isolated
singularity at z = ∞. A Laurent expansion is available, with ρ₁ = R and
ρ₂ = ∞. The classification is similar to the above, with "negative" replaced
by "positive." Also the type of singularity of f(z) at ∞ is the same as that
of f(1/z) at z = 0.

A function analytic for all finite z except for poles is termed a meromorphic
function.

Residues. The residue of f(z) at an isolated singularity z₀ is defined as

(36)   Res [f(z), z₀] = (1/2πi)∮_C f(z) dz,

where C is a circle |z − z₀| = c, enclosing no singularity other than z₀, and
the integration is in the counterclockwise direction. The residue of f(z)
at z = ∞, denoted by Res [f(z), ∞], is defined by the same integral, where
C is a circle |z| = c outside of which f(z) has no singularity other than ∞
and where the integration is in the clockwise direction. If z₀ is finite,

(37)   Res [f(z), z₀] = a₋₁,

where a₋₁ is the coefficient of (z − z₀)⁻¹ in the Laurent expansion about
z₀. If z₀ is ∞,

(38)   Res [f(z), ∞] = −a₋₁,

where a₋₁ is the coefficient of z⁻¹ in the Laurent expansion of f(z) for
|z| > R.

The CAUCHY RESIDUE THEOREM asserts that, if f(z) is analytic in an open
region containing the path C, then

(39)   ∮_C f(z) dz = 2πi·(sum of residues of f(z) inside C),

provided f(z) is analytic inside C except for a finite number of isolated singularities. Similarly,

(40)   ∮_C f(z) dz = 2πi·(sum of residues of f(z) outside C, including Res [f(z), ∞]),

provided f(z) is analytic outside C except for a finite number of isolated singularities. Hence, if f(z) is analytic for all z, except for a finite number of singularities, the sum of all residues of f(z), including that at ∞, is 0.

Calculation of residues may be simplified by the following rules:

1. At a pole z₀ of first order,

(41)   Res [f(z), z₀] = lim_{z→z₀} (z − z₀)f(z).

2. At a pole z₀ of order N (N = 2, 3, ⋯),

(42)   Res [f(z), z₀] = lim_{z→z₀} g⁽ᴺ⁻¹⁾(z)/(N − 1)!,

where g(z) = (z − z₀)^N f(z).

3. Let

(43)   f(z) = A(z)/B(z),

where A(z) and B(z) are analytic at z₀. If A(z₀) ≠ 0 and B(z) has a zero
of first order at z₀, then

(44)   Res [f(z), z₀] = A(z₀)/B′(z₀).

If A(z₀) ≠ 0 and B(z) has a zero of second order at z₀, then

(45)   Res [f(z), z₀] = [6A′(z₀)B″(z₀) − 2A(z₀)B‴(z₀)]/3[B″(z₀)]².

If A(z₀) ≠ 0 and B(z) has a zero of third order at z₀, then

(46)   Res [f(z), z₀] = [120A″(B‴)² − 60A′B‴Bⁱᵛ − 12AB‴Bᵛ + 15A(Bⁱᵛ)²]/40(B‴)³,

where all quantities are evaluated at z₀. If A(z) has a first order zero at
z₀ and B(z) a second order zero, then

(47)   Res [f(z), z₀] = 2A′(z₀)/B″(z₀).
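As a numerical illustration, the following minimal Python sketch checks rule (44) against the defining integral (36) for A(z) = e^z and B(z) = z² + 1, which has a first order zero at z₀ = i; the contour radius and number of quadrature points are illustrative choices.

import numpy as np

def residue_contour(f, z0, radius=0.3, M=4000):
    """Residue by the defining integral (36) on |z - z0| = radius."""
    th = 2 * np.pi * np.arange(M) / M
    z = z0 + radius * np.exp(1j * th)
    dz = 1j * radius * np.exp(1j * th) * (2 * np.pi / M)
    return np.sum(f(z) * dz) / (2j * np.pi)

f = lambda z: np.exp(z) / (z ** 2 + 1)
print(residue_contour(f, 1j))         # contour definition (36)
print(np.exp(1j) / (2 * 1j))          # rule (44): A(z0)/B'(z0)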

ARGUMENT PRINCIPLE. Let f(z) be analytic in an open region D containing
the simple closed path C; let f(z) have at most a finite number of singularities
inside C, all of which are poles, and let f(z) ≠ 0 on C. Then

(48)   (1/2πi)∮_C [f′(z)/f(z)] dz = number of zeros of f inside C
                                   − number of poles of f inside C,

where zeros and poles are counted according to multiplicity.

The left-hand side of eq. (48) is termed the logarithmic residue of f(z) on
C. It can be written as

(1/2πi)∮_C d log f.

As z traces C, w = f(z) traces a path C_w in the w-plane. The integral
∮ d log f(z) equals i times the total change in the argument of w as the
path C_w is traced. Hence it equals 2πi times the "winding number" of
C_w about w = 0, i.e., the number of times that C_w effectively winds about
w = 0 in the positive direction.

THE FUNDAMENTAL THEOREM OF ALGEBRA (see Chap. 2). From the
argument principle one deduces that every polynomial in z of degree N has
precisely N zeros in the complex plane.

ROUCHÉ'S THEOREM may also be deduced: if both f₁(z) and f₂(z) are
analytic in a simply connected open region containing the simple closed
path C and |f₁(z) − f₂(z)| < |f₂(z)| on C, then f₁(z) and f₂(z) have the same
number of zeros inside C.
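As a numerical illustration, the following minimal Python sketch evaluates the left-hand side of eq. (48) for the polynomial z³ − z + 1 (analytic, so there are no poles) around |z| = 2 and recovers the number of zeros inside; the polynomial and radius are illustrative choices.

import numpy as np

def winding_zero_count(f, fprime, radius=2.0, M=8000):
    """(1/2 pi i) contour integral of f'/f around |z| = radius, eq. (48)."""
    th = 2 * np.pi * np.arange(M) / M
    z = radius * np.exp(1j * th)
    dz = 1j * radius * np.exp(1j * th) * (2 * np.pi / M)
    return np.sum(fprime(z) / f(z) * dz) / (2j * np.pi)

p = lambda z: z ** 3 - z + 1
dp = lambda z: 3 * z ** 2 - 1
print(winding_zero_count(p, dp).real)      # ~ 3.0, all three zeros inside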
Evaluation of Definite Integrals by Residues. A great variety of
definite integrals can be evaluated with the aid of residues. For example,
if R(u, v) is a rational function of u and v, then

(49)   ∫₀^{2π} R(sin θ, cos θ) dθ = ∮_{|z|=1} R[(z² − 1)/(2iz), (z² + 1)/(2z)] dz/(iz),

and the integral on the right can be computed by residues. Also, in general,

(50)   ∫_{−∞}^∞ f(x) dx = 2πi·{sum of residues of f(z) in the half-plane y > 0},

provided f(z) is analytic for y ≥ 0 except for a finite number of points in
y > 0, f(z) = e^{imz}g(z) with m ≥ 0, g(z) is rational, and
g(z) has a zero at ∞. For further applications one is referred to Chap. VI
of Ref. 12.
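As a numerical illustration, the following minimal Python sketch applies eq. (50) to f(x) = 1/(1 + x⁴), whose upper half-plane poles e^{iπ/4} and e^{3iπ/4} are simple with residues 1/4z³ by rule (44), and checks the result π/√2 by direct quadrature on a long interval.

import numpy as np

poles = np.exp(1j * np.pi * np.array([0.25, 0.75]))   # upper half-plane
residues = 1.0 / (4 * poles ** 3)                     # rule (44), B = 1 + z^4
print((2j * np.pi * residues.sum()).real)             # pi/sqrt(2) ~ 2.2214

# Crude check by a Riemann sum on a long finite interval:
x = np.linspace(-100.0, 100.0, 400001)
print(np.sum(1.0 / (1 + x ** 4)) * (x[1] - x[0]))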


6. ANALYTIC CONTINUATION

Let f₁(z) be analytic in the open region D₁, f₂(z) in D₂. If D₂ and D₁
have a common part and f₁(z) = f₂(z) in that common part, then f₂(z) is
said to be a direct analytic continuation of f₁(z) from D₁ to D₂. Given
f₁(z), D₁, D₂, the function f₂(z) may or may not exist; however, if it does
exist, there can be only one such function (uniqueness of analytic continuation).

Let D₁, D₂, ⋯, Dₙ be regions such that each has a common part with
the next and let fⱼ(z) be analytic in Dⱼ (j = 1, ⋯, n). If fⱼ(z) = fⱼ₊₁(z)
(j = 1, ⋯, n − 1) in the common part of Dⱼ, Dⱼ₊₁, then one says that
f₁(z) has been continued analytically from D₁ to Dₙ via D₂, ⋯, Dₙ₋₁ and
calls fₙ(z) an (indirect) analytic continuation of f₁(z). Given f₁(z) and the
regions D₁, ⋯, Dₙ, there is at most one analytic continuation of f₁(z) to
Dₙ via D₂, ⋯, Dₙ₋₁. There may exist other continuations of f₁(z) to Dₙ
via other chains of regions.

Given a function f(z) analytic in region D, one can form all possible
continuations of f(z) to other regions. The totality of such continuations
is said to form an analytic function in the broad sense (Weierstrassian analytic function). In this sense log z, √z, sin⁻¹ z can each be considered as
one analytic function. The importance of the concept is illustrated by the
fact that every identity satisfied by f(z) will be satisfied by all its analytic
continuations. The term "identity" includes linear differential equations
with polynomial coefficients.

Example of Analytic Continuation. The functions

f₁(z) = Σ_{n=0}^∞ zⁿ/2ⁿ⁺¹,   |z| < 2,   f₂(z) = Σ_{n=0}^∞ (z + 1)ⁿ/3ⁿ⁺¹,   |z + 1| < 3,

are analytic continuations of each other. Indeed, both are power series
expansions of f(z) = 1/(2 − z) and have the same sum for |z| < 2. Also
f₂(z) can be regarded as the Taylor series of f₁(z) about z = −1. This
series happens to converge outside of |z| < 2 and hence provides an analytic continuation.

Analytic Continuation from Reals. Let f₁(z) be defined only for
y = 0, a < x < b, i.e., only when z is real and between a and b. Let f₂(z)
be analytic in an open region D which includes the interval of definition
of f₁(z). If f₂(z) = f₁(z) on this interval, then f₂(z) is said to be an analytic
continuation of f₁(z) from reals. Again continuation, if possible, is unique.

EXAMPLES. e^z as a continuation of eˣ, sin z as a continuation of sin x,
log z as a continuation of log x.

COMPLEX VARIABLES

7-17

7. RIEMANN SURFACES
The function w = zY2 can be considered as an analytic Junction in the
broad sense; that is, it is formed of several functions which are analytic
continuations of each other. The resulting totality has the defect that it
is two-valued: for each z, there are two possible values (except for z = 0).
To remedy this defect one regards w = zY2 as a function defined not in the
z-plane, but on a Riemann surface over the z-plane. In this case, the Riemann surface can be constructed as follows. One takes two copies of the
z-plane, calling them Sheet I and Sheet II. Each sheet is considered as
cut open along a branch line, the positive real axis. Sheet II is placed
directly over Sheet I, with axes in the same position, and then the two
sheets are attached by joining upper edge of the cut line of each sheet to
the lower edge of the cut line of the other, as suggested in Fig. 6. Un-

FIG. 6.

Riemann surface of Z72.

FIG. 7.

Branch line for w = Z72.

fortunately this cannot be carried out in space. For each point in the
,z-plane, one has then two points in the Riemann surface, one in each sheet.
As one traces a path about z = 0 in the z-plane, one can describe a corresponding path in the Riemann surface by assigning a sheet to each position; no change of sheet can be made except when crossing the branch
line, and a change of sheet must be made at such a crossing (Fig. 7). A
closed path in the z-plane will not in general lead to a closed path on the
Riemann surface. A path which closes up after two encirclements of the
origin will be closed on the Riemann surface. The origin itself appears as
a point common to the two sheets and is termed a branch point.
On the Riemann surface just constructed one can now define yz as a
single-valued function as follows: Vz = Vr eiO / 2 , 0 < 8 < 27r, on Sheet I;
z = yr eiO / 2 , 27r < 8 < 47r, on Sheet II. Above the branch line continuity
determines the proper value to be assigned.
The procedure described can be generalized to
w =~;;,

w

=

y

(z - 1) (z - 2) (z - 3)

7-18

GENERAL MATHEMATICS

and to all algebraic functions. In general n sheets will be required and
several branch lines and branch points. The surface for

w=

V Cz

- 1) Cz - 2) Cz - 3)

is suggested in Fig. 8.
The procedure can be extended to nonalgebraic functions, but in general infinitely many sheets are required. An important case is log z, for
z-plane

z-plane

II
-II -1

4

a

1 II

0

1 II

'1lffii'~~~If1ii~I~~ffiH,fffitllllllll!lIlIlIlIlI>
-II -I

II

FIG. 8. Riemann surface of
= [(z - l)(z - 2)(z - 3)]~.

w

FIG. 9.

Riemann surface of
log z.

°

which sheets 0, ±I, ±II, ... are needed, as in Fig. 9. In this case z =
is a logarithmic branch point and is not regarded as a point of the Riemann
surface.
8. ELLIPTIC FUNCTIONS

Let fCz) be a meromorphic function Canalytic except for poles); fCz) is
said to have period w, W ~ 0, if fCz + w) = fCz) for all z; fCz) is called an
elliptic or doubly periodic function if f is not constant and has periods WI, W2'
and if WdW2 is not real. It then follows that nlwl
n2w2 are also periods,
for every choice of the integers nI, n2. For proper choice of WI, W2 these
are all the periods of f and it will always be assumed that WI, W2 are so
chosen.' The numbers Q = nlwl + n2w2 form the vertices of a paving of
the plane by parallelograms, anyone of which can be chosen as a period
parallelogram of fCz); it is convenient to exclude the points on a pair of
adjacent sides from each period parallelogram. It can be proved that fCz)
has a finite number N of poles Ccounted according to multiplicity) in a
period parallelogram; N is the order of fCz) as an elliptic function; N is
always at least 2. In general, fCz) - a has N zeros in the parallelogram.
Jacobian Elliptic Functions. Examples of elliptic functions are provided by the functions
sn z, cn z, dn z

+

COMPLEX VARIABLES

7-19

of Jacobi. These can be defined as follows. For fixed k, 0
(51)

F(w) =

i

w
0

VI _

 0, but r can be continued analytically
and becomes a meromorphic function with poles of order 1 at 0, -1, -2,
Identities satisfied by r(z) are the following:
(84)

r(z
r(n

(85)

+ 1)

+ 1)

= n!

= zr(z);
(n

r(z) r( -z) =

(86)

r(z) = lim
n ~ z(z

(87)

= 1, 2, 3, ... );
z sin
n!n Z

7rZ

+ 1) ... (z + n) ;
ze'Yz IT [(1 + -=) e- z1n ] ,
00

_1_
r(z)

(88)

=

n

n=l

where
(89)

'Y

= lim
m ~

00

(f ~n - log m) =

0.5772 1566 49 ...

n=l

is the Euler-Mascheroni constant.
The Beta Function.

fa t
1

(90)

B(z, w) =

Z

-

1

Re z

(1 - t)w-l dt,

>

0, Re w

> o.

This is expressible in terms of the r-function:

r(z)r(w)
B(z w) =
..
,
r(z + w)

(91)

The Incomplete Gamma Function.

(92)

1'(a, z)

= fa'e-'t a - . dt,

Re

a>

O.

This is expressible in terms of the Whittaker function of the preceding
section:
(93)
The Error Functions.

(94)
(95)

Erf (z) =

Erfc (z) =

f

Z

OCi

fa'e-" dt;
~

7r

e- t dt = - - Erf (z).
2

COMPLEX VARIABLES

7-27

These functions are also expressible in terms of the Whittaker function:
(96)
The LogarithIllic Integral Function.

(97)

Ii (z) =

i

z

- dt = - (- log z) Y22Z 722W -Y2,o( - log z).
o log t

The Exponential Integral Function.

_foo e-

Ei (z) =

(98)

-z

t

dt = Ii (e Z ).

t

The Sine and Cosine Integral Functions.
(99)

si z =

I

sin t

z

-

00

Si z =

(100)

(101)

1
dt = - [Ei (iz) - Ei (-iz)]
t
2i
'

Ci z = -

f

oo

z

i

o

cos t
-

t

Z

sin t

7T'

t

2

- - dt = -

+ si z·

dt = ![Ei (iz)

'

+ Ei (-iz)].

In eq. (97) z is first taken as real and positive, but analytic continuation
then gives meaning to the function, as a multiple-valued function, for all
z ~ O. Similarly, in eq. (98), z is first to be real and negative. The functions si z and Si z are entire functions; Ci (z) - log z is also entire.
The RieIllann Zeta Function.
00

(102)

t(z) =

1

L-'z

Re z

>

1.

n=ln

This function can be continued analytically and becomes a function singlevalued and analytic for all z except z = 1, where r(z) has a pole of first
order. One has the integral representation:
(103)

1

t(z) = r(z)

i

oo
0

t
eZ

z 1
_

1 dZ J

Re z

>

1.

7-28

GENERAL MATHEMATICS

REFERENCES
1. Higher Transcendental Functions, Vols. 1, 2, 3, prepared by the staff of the Bateman
manuscript project, McGraw-Hill, New York, 1954.
2. L. V. Ahlfors, Complex Analysis, McGraw-Hill, New York, 1953.
3. R. V. Churchill, Introduction to Complex Variables and Applications, McGraw-Hill,
New York, 1948.
4. H. Hancock, Lectures on the Theory of Elliptic Functions, Vol. 1, Wiley, New York,
1910.
5. H. Hancock, ElUptic Integrals, Wiley, New York, 1917.
6. A. Hurwitz and R. Courant, Funktionentheorie, 3rd edition, Springer, Berlin, 1929.
7. E. Jahnke and F. Emde, Tables of Functions, 3rd edition, Teubner, Berlin, 1938.
8. W. Kaplan, A First Course in Functions of a Complex Variable, Addison-Wesley,
Cambridge, Mags., 1953.
9. K. Knopp, Theory of Functions, Vols. 1, 2, translated by F. Bagemihl, Dovel',
New York, 1945.
10. F. Oberhettinger and "Y. Magnus, Anwendung der Elliptischen Funktionen in
Physik und Technik, Sp!'inger, Berlin, 1949.
11. E. C. Titchmarsh, 'l'he Theory of Functions, 2nd edition, Oxford University Press,
Oxford, England, 1939.
12. E. T. Whittaker and G. M. Watgon, A Course of Modern Analysis, 4th edition,
Cambridge University Press, Cambridge, En,?;1and, 1940.

A

GENERAL MATHEMATICS

Chapter

8

Operational Mathematics

w. Kaplan
1. Heaviside Operators

8-01

2. Application to Differential Equations

8-05

3. Superposition Principle.

8-06

Response to Unit Function and Delta Function

4. Appraisal of the Heaviside Calculus

8-07

5. Operational Calculus Based on Integral Transforms
6. Fourier Series. Finite Fourier Transform

8-07
8-10

7. Fourier Integral.

8-15

Fourier Transforms

8. Laplace Transforms

8-17

9. Other Transforms

8·18

References

8- 19

RI1) + (1/<1>2) = (1/<1>2) + (1/<1>1), as is multiplication, provided the coefficients are constant.
The ratio of two polynomial operators is defined by the equation:

<1>1 (D)
[ 1 ]
<1>2 (D) f = <1>1 (D) <1>2(D)f .

(10)

Here the order chosen is essential: it is not true that
(11)

even if the coefficients are constant.
linear.

All operators defined thus far are

Integral Representation of Inverse Operators with Constant
Coefficients. One has the formulas:
1
-f(t)
=

(12)

(13)

D

feu) du,

0

- -1f ( t ) = eat
D-a
1

(14)

it
it
it

f t - eat
(D_a)k ()-

e-aUf(u) du,

0

0

e-au(t - U)k-l
u du
(k-l)!
f()
,

where a is constant and k = 1, 2, .... Now <1>(D) can be factored as in
algebra:
(15)

where rr, "', rn are the roots of the characteristic equation
(16)

Correspondingly,
(17)

Thus if n = 2

TIt t
_1_f=
1
[
1
f] = e
e<-T1 +T2)U rUe-T2Vf(v) dv du.
<1> (D)
ao(D - rl) CD - r2)
ao)o
Jo

r

In general, computation of [1/<1>(D)]f is reduced to a repeated integration.

GENERAL MATHEMATICS

8-04

If ¢(r) has complex roots, quadratic factors appear in eq. (17); for these one
has the rule:
1
eatit au
,
(18)
2
2f = e- sin bet - u)f(u) duo
(D - a) + b
b 0
If II ¢(D) is expanded in partial fractions as in algebra, then the corresponding operator identity is valid; for example,

t
= !e'i

e~f(u) du

t

- !e-'i e"f(u) duo

HEAVISIDE EXPANSION THEOREM. More generally, a ratio ¢1(D)/¢2(D)
can be replaced by its partial fraction expansion. If in particular the degree of ¢2 exceeds that of ¢1, and ¢2(r) has simple roots rl, r2, ... , r n ,
then by Chap. 7, Sect. 5,
¢1 (r)

~ ¢1 (rk)
1
L..J----'

¢2(r)

k=1 ¢' 2(rk) r - rk

-- =
(19)
¢1 (D)

=

¢2(D)

i=

1
k=1 ¢'2(rk) D - rk
¢1 (rk)

This is in essence the Heaviside expansion theorem.
Power Series Operators. The formal relation
1

1

--=----D - a

does not agree with the definition of I/(D - a). However, if the operator
is applied to a polynomial in t, one obtains a particular solution of the
corresponding differential equation (with modified initial conditions). For
example,

is a solution of
dx

--ax=t
dt

for which x(O) = -2a-3 •

2

OPERATIONAL MATHEMATICS

8-05

One can also expand in inverse powers of D:
a
a2
D-a=D+D2 +D3
1

(20)

1

-····

In this case the rule can be proved to be correct.
OCJ

The power series

L

hnDn / n! can be interpreted as the operator ehD •

o
One then finds, under appropriate conditions,

(21)
2. APPLICATION TO DIFFERENTIAL EQUATIONS

The general solution of a linear differential equation,

cjJ(D)x = J(t),

(22)

is formed of the complementary Junction xc(t), which is the general solution
of the homogeneous equation cjJ(D)x = 0 and of a particular solution xp(t)
of the given equation:
(23)
[cf. Chap. 5, Sect. 3]. The Heaviside operators provide simple ways of
finding xp(t), namely as the function
1
xp(t) = -J(t);

(24)

cjJ(D)

this is the solution with zero initial conditions. If 1/cjJ(D) is expanded in
partial fractions, one can then apply the integral formulas (12), (13), (14),
(18).
The procedure can be extended to simultaneous equations. For example,

+ (D + l)x +

Dx
(D

1)y = F(t)
2Dy = G(t)

can be solved formally:

x =

2D
D2

+

1

F(t) -

D-1
D2

+1

G(t)

'

Y=

D
D2

+1

G(t) -

D+l
F(t)
D2 + 1

and it can be verified that these provide the solution for which x = 0,
y = 0 when t = o.

GENERAL MATHEMATICS

8-06

3. SUPERPOSITION PRINCIPLE.
DELTA FUNCTION

RESPONSE TO UNIT FUNCTION AND

The Heaviside unit Junction u(t) is defined to equal 0 for t < 0 and to
equal 1 for t ~ O. The solution of the differential equation cP(D)x = u(t)
with zero initial values, i.e., the function (1/ cP(D) )u(t) = A (t) is known as
the indicial admittance or step response.
The superposition principle states that the response of a linear system to
a linear combination cdl (t)
C2J2(t) equals the corresponding linear combination CIXl(t)
C2X2(t) of the responses Xl(t) toh(t), X2(t) tof2(t). In
the typical case x(t) and J(t) are related by a differential equation cP(D)x =
J(t) and the superposition principle is equivalent to the statement that
1/cP(D) is a linear operator.
One can apply the superposition principle to show that (when cP(D) has
constant coefficients) the response to a general J(t) is deducible from the
indicial admittance, i.e., the response to u(t). Indeed, the response to
u(t - h), for h ~ 0, is A(t - h); one can approximateJ(t) by a linear combination ~kCkU(t - tk), where Ck = J(tk+l) - J(tk). A passage to the limit
gives the Duhamel theorem

+

+

x(t) =

(25)

f.

t

J(s)A'(t - s) ds.

[It is assumed that J(t) is 0 for t < 0 and the solution x(t) has 0 initial
values]. If J(t) is constant, equal to 1/ E for 0 ~ t ~ E, and then equal to 0
for t> E, the response is [A(t) - ACt - E)]/E. The limiting case of such
anJ(t), as E ~ 0, is an "ideal function," the delta Junction oCt), also termed
the unit impulse Junction. The response to oCt) is interpreted as A'(t) =
h(t). Accordingly,
x(t) =

(26)

f.

t

J(s)h(t - s) ds.

For some linear systems the response to u(t) appears as [cP(D)/1f(D)]u,
where cP and 1f are polynomials. If 1f has simple roots ba (a = 1, "', k),
then by eqs. (19)

cP(D)
k cP(b a)
1
k cP(b a) ebat - 1
- u = 2:-u = 2:-u(t)
1f(D)
a=l1f'(ba) D - ba
a=l1f'(b a)
ba
and hence
(27)

OPERATIONAL MATHEMATICS

8-07

4. APPRAISAL OF THE HEAVISIDE CALCULUS

The operational methods described in the preceding section provide a
valuable tool for solution of linear differential equations. The method has
two principal drawbacks: it is very awkward to obtain solutions with specified initial values other than 0; further development of the method leads
to symbolic expressions whose meaning has to be studied afresh in each
case. Great ingenuity has been employed to remedy these defects but a
satisfactory general theory within the Heaviside framework has not been
found.
On the other hand, it has been discovered that all the goals of the
Heaviside calculus can be achieved without reference to differential operators or their inverses and, indeed, without any symbolic calculus. The
means to this end is the Laplace transform (see Chap. 9); the closely related
Fourier transforms can also serve the purpose. By means of these the
questions about initial conditions are easily disposed of, and justification
of formal rules becomes simple.
The transformations referred to do not merely serve as a substitute for
the Heaviside calculus. Deeper study shows that they lie at the very basis
of that calculus and must inevitably enter in a full justification of the
operational rules.
5. OPERATIONAL CALCULUS BASED ON INTEGRAL TRANSFORMS

One considers equations ,of form
(28)

F(y)

=

f

b

f(t)K(t, y) dt.

a

Such an equation assigns a function F(y) to each function f(t), whenever
the integral has meaning. One calls F the integral transform of f with respect to the particular transformation (28) and writes:
(29)

F = T[f].

The relation between f and F is much like that between independent and
dependent variables; here the variables are functions.
Because of the form of eq. (28), the transformation T is linear:
(30)

The transformation (28) is said to have a (single-valued) inverse if, for
each F of a certain class, there is precisely one f such that T[f] = F. One
writes:
(31)

f =

T- 1 [F]

GENERAL MATHEMATICS

8-08

and calls f the inverse transform of F. Because T is linear, T- 1 must also
be linear.
Convolution. If to each pair of functions iI, f2 one can associate (in a
unique manner) a third function fa such that
T[f3] = T[iI]' T[12]'

(32)

then one calls fa the convolution of iI, 12 and writes:
(33)

fa =

iI *12·

The convolution must then obey simple laws:
(34)

*iI ; iI * (12 + fa) = iI * f2 + fl * f3;
= eiI * 12 = e (iI * h) ; iI * (12 * fa) = (iI * h) * f 3·

iI * f2 =
iI * ef2

f2

Solution of Differential Equations. Suppose the transformation
T has the property that, for a certain polynomial differential operator cp(D)
and for f(t) in a certain class of functions, one has an identity:

T[cp(D)f] = H(y)T[J] = H(y)F(y),

(35)

where H(y) is a function of y associated with the operator cpo
solve a differential equation

Then to

cp(D)x = get)

(36)

for x = f(t) in the class

ref~rred

to, one forms the transformed equation

T[cp(D)x]

= T[g]

or equivalently, by eqs. (35),
H(y)F(y) = G(y).

(37)

Accordingly,
(38)

F( ) = G(y) ,
y
H(y)

f(t) = T-1

[G(Y)].
H(y)

One can try to find the inverse transform of G(y)/H(y) with the aid of
tables of functions and their transforms. One can also seek
T-1

(39)

[_1 ] =

w(t).

H(y)

Then eq. (38) gives
(40)

T[f(t)] = T[w(t)]· T[g(t)] = T[w

so that
(41)

f(t) = wet)

* get).

* g],

OPERATIONAL MATHEMATICS

8-09

The crucial question is choice of the transformation '1' so that eqs. (35)
hold. For differential" equations with constant coefficients it is sufficient
to choose 'T so that
T[Df(t)] = H(y)F(y).

(42)

For then
(43)

T(aoDn

+ ... + an-1D + an)! =

(aolIn

+ ... + a1H + an)F(y).

Fourier Integral. Now associated with the operator D are certain
functions f such that Dj is a constant times f; these are precisely the functions ke at • It is known that an "arbitrary" function f is expressible as a
"sum" of functions of this form. For example, under appropriate conditions,

(44)

r(t)

~ foo F(w)ei"' dw;
-00

this is the representation of f as a Fourier integral.
(45)

F(w) = -1
271"

One finds that

j'OO f(t)e- iwt dt,
-00

so that F(w) can be considered as ~T[f], a linear integral transform of T;
except for a constant multiplier, this is the Fourier transform of f. The
fact that De iwt = ie iwt is reflected in the formula .'
Df = f'(t) =

fOO iwF(w)eiwt dw,
-00

which follows from eq. (44).
(47)

Hence
T[Dfl = iwF(w).

Thus the transformation T defined by eq. (45) has the property desired.
The functions f representable as Fourier integrals must be small for
large positive or negative l (see Sect .. g). For functions not satisfying such
a condition other representations can be used. If f is defined only for
t ~O and does nQt grow too rapidly as t ~ 00, one can use the Laplace
transform. If.f is defined for all t and has period 271". then .f can be represented by a Fourier series; associated with this series is the finite Fourier
transform.
If cp(D) does not have constant coefficients, the transformation T must
be related specially to the particular operator cp. Associated with cp are the
"characteristic functions" f for which cpD(f) is a constant times f. Representation of an arbitrary function as a series or integral of such characteristic functions leads to a corresponding integral transformation.

8-10

GENERAL MATHEMATICS

6. FOURIER SERIES.

FINITE FOURIER TRANSFORM

Let J(t) be defined for all real t. One says that J(t) has period T ~ 0 if
+ T) = J(t) for all t. A function J(t) given only for a < t < b can
always be defined outside this interval so as to have period T = b - a
(periodic extension of J( t) ) .
Let J(t) have period T and let w = 2rr/T. The Fourier series of J(t) is
defined as the series:

I(t

a

~

(48)

2

where
(49)

an = -2

T

00

+ L: (an cos nwt + bn sin nwt),
n=l

iT

i
T

2
bn = -

J( t) cos nwt dt,

0

T

J(t) sin nwt dt.
0

Because of the periodicity of J(t), the interval of integration in eqs. (4J)
can be replaced by any other interval of length T, e.g., from - Y2T to Y2T.

T

FIG. 1.

2T

Piecewise continuous function of period T.

It is assumed that the integrals in eqs. (49) have meaning. For this it is
sufficient that J(t) be piecewise continuous, i.e., continuous except for jump
discontinuities (Fig. 1).
Convergence. The Fourier series of J(t) converges to J(t) under very
general conditions: for example, wherever J(t) is continuous and has a
derivative to the left and to the right. At a jump discontinuity to the
series converges to
![f(to+) + J(t o - )],
where
(50)

JCto+) = lim J(t) ,
t---->to+

JCto-) = lim J(t) ,
t---->to-

OPERATIONAL MATHEMATICS

8-11

provided [J(to + h) - J(t o+ )]/h and [J(to -) - J(to - h)]/h have limits as
h ~ 0+. For example, if J(t) = t for -1 < t < 1 and J(t) has period 2,
then the corresponding Fourier series converges to 0 at t = 1, t = -1,
t = 3, t = -3, .... It is common practice to redefine J(t) as Y2[J(to+) +
J(t o -)] at each jump discontinuity.
If J(t) is merely continuous, there is no general theorem on convergence.
However, one has a "convergence in the mean," that is, if sn(t) denotes
the sum of the first n terms of the series (48), then the "mean square error"

iT
T

-1

[J(t) - sn(t)]2 dt

0

tends to 0 as n ~ 00. This result holds considerably more generally,
e.g., if J is merely piecewise continuous.
If J(t) has a continuous derivative over an interval to ~ t ~ tI , then the
Fourier series of J(t) converges uniJormly to J(t) over this interval; i.e.,
max

(51)

to

IJ(t) - snU) I ~ 0

as n ~

00.

;£ t ;£ tl

In general, if a series of form (48), i.e., a "trigonometric series," converges
uniformly to J(t) for 0 ~ t ~ T, then the series must be the Fourier series
of J(t).
A function is determined uniquely by its "Fourier coefficients" ao, aI,
.. " bI , . . . ; that is, if J(t) and get) have the same Fourier coefficients, then
J(t) = get) except perhaps at points of discontinuity.
Fourier Cosine and Sine Series. If J(t) is even [J(t) = J( -t)], then
all bn are 0 and J(t) is represented by a Fourier cosine series; that is,
(52)

a

J(t) = ~
2

+ L: an cos nwt, an = -4
00

n=I

T

iT'2

J(t) cos nwt dt,

0

provided the convergence conditions are satisfied. If J(t) is given merely
between 0 and Y2T, eqs. (52) are still valid; for J(t) can be extended to all t
to be even and have period T. Similar remarks apply to representation of
an odd function [J(t) = -J( -t)] by a Fourier sine series:
00

(53)

L: bn sin nwt,

J(t) =

n=I

bn = -4

T

iT'2

J(t) cos nwt dt.

0

The identities:
(51)

1.
.
sin a = - (eta - e- ta )
2i
'

GENERAL MATHEMATICS

8-12

lead to a rewriting of the formulas (49) in complex form. Under conditions
for convergence,
00

T1

n=

C

iT .
0

f(t)e- mwt dt,

n

= 0, ±1, ....

n=-oo

One can interpret the doubly infinite

Finite Fourier TransforIn.

sequence of numbers

.£

T

f(t)e- inwt dt,

n = 0, ±1, ±2, "',

as a function of n, (n), defined only when n is an integer. The equation
(n) =

(56)

.£

T

f(t)e- inwt dt

can then be regarded as a special case of the linear integral transformation
(28); the variable y is replaced by n and is restricted to integer values.
The notations:
(57)

(n) = cp[J(t)]

or

 = cp[f]

will be used to denote the functional transformation CP, the finite Fourier
transformation, which assigns the function (n) to the function J(t). cP is
then defined at least for all J(t) which are piecewise continuous for 0 ~ t
~ T.
As in Sect. 5, cP is linear:
(58)

Inverse TransforIn. If cp[f] = , then one writes: J = cp-l[]. The
inverse transformation is then uniquely defined by the theorem stated
above concerning functions having the same Fourier coefficients. It is a
less simple matter to describe those functions ¢(n) for which <1>-1 exists.
One class of such functions ¢(n) consists of those for which the series
~T-l(n)einwt converges uniformly for 0 ~ t ~ T. The sum of the series
is then a function J(t) which serves as cp-l[];

1

(59)

<1>-1[¢]

T
Convolution.

Given

00

L:

=-

(n)einwt •

n=-oo

!I (t), J2(t) having period T, their convolution is

defined as:

.£ !I
T

(60)

Ja(t) =

(s)h(t - s) ds;

OPERATIONAL MATHEMATICS

one writes:
property:

8-13

h (t) = f 1 (t) * 12 (t) . One can then prove the characteristic

(G 1)

cfJ[ft

* 12]

= cfJ[fd . cfJ[h]·

If f(t) has a continuous derivative
T, then an integration by parts proves that

Transformation of Derivatives.

for 0

~

t

~

(62)

cfJ[f'(t)] = f(T) - f(O)

+ inwcfJ[f(t)].

This rule can be made the basis for application of the finite Fourier transform to boundary value problems. Interest will be concentrated here on
the periodic case: f(T) = f(O), for which the rule becomes
cfJ[f'(t)] = inwcfJ[f(t)].

(63)

Similarly, if
order,

f is periodic and has continuous derivatives through the kth

(64)
this relation remains true if f(k-l) (t) is continuous and f(k) is continuous
except at a finite number of points at which left and right hand kth derivatives exist. From eq. (64) it follows that, for every polynomial operator
1f;(D) = aoDn + ... + an_1D + an with constant coefficients
(65)

cfJ{1f;(D)[f(t)]} = 1f;(inw)cfJ[f(t)].

Steady-State Solutions of Differential Equations. Letf(t) be piecewise continuous and have period T. Let ao, "', an be constants and let
1f;(D) = aoDn + ... + an-1D + an. It can then be shown that in general
the differential equation

1f;.(D)x = f(t)

(66)

has a solution x = X(t) having period T; X(t) has continuous derivatives
through the (n - l)st order and an nth derivative which is continuous.
where F(t) is continuous. If 1f;(p) has no root of the form inw for some n,
there is precisely one such periodic solution; it will be assumed in the following that 1f;(inw) ~ 0 for every n. Applying the finite Fourier transformation to eq. (66), one finds by eq. (65)
(67)

cfJ[X]

(n) einwt •

n=-oo

!f;(inw)

One can attempt to reduce this to a simpler form by developing a table of
finite Fourier transforms and inverse transforms. One can also apply the
convolution formula to eq. (67):
(69)

X = g *J =

i

T

J(s)g(t - s) ds,

where g = cp-I[I/!f;(inw)]. To find g, decompose 1/!f;(inw) into partial fractions and apply linearity. The problem is reduced tio finding inverses of
(inw - a)-k (k = 1,2, ... ). One finds:
cp-I [

1

•
~nw

]

- a

= k a eat ,

(70)

where ka = (1 - eaT )-I. In particular, if !f;(p) has simple roots PI,
Pm, so that
(71)

one finds that
m

(72)

=

X(t)

L AjH~(t, Pj)
j=1

where
HI(t, p) = ePt[QI(t, p)
(73)

QI(t, p) =

i

+ kpePTQJ(T, p)],

t

J(s)e- P8 ds,

0

~t~

T.

The operators QJ and HI can be tabulated for various functions J(t) of
interest, so that the corresponding periodic solutions can be found easily.
For tables and illustrations of applications see Ref. 11.

OPERATIONAL MATHEMATICS
7. FOURIER INTEGRAL.

8-15

FOURIER TRANSFORMS

Fourier Integral. By allowing the period 'P to become infinite, one is
led to the following integral analogue of the Fourier series expansion:

f(t) = 1.oo[a(w) cos wt

+ b(w) sin wt] dw,

(74)
1

a(w) = 7r'

f-:

f

1

00

b(w) = -

J(t) cos wt dt,

7r'

-00

f

00

J(t) sin wt dt.

-00

The "coefficients" aCw), b(w) exist if J(t) is, for example, piecewise continuous, and

IJCt) Idt exists. The representation of J(t) as a Fourier inte-

gral is then valid under the same conditions as for Fourier series, e.g.,
wherever f' (t) exists. Also, under the conditions described in Sect. 8, the
integral equals Y2[J(t o+) + J(t o -)] at each jump discontinuity of J. One
can write eqs. (74) in complex form:

f(t) =

(75)

Joo A(w)e

iw ,

1

dw,

A(w)

=-

27r'

-00

f

00

.

J(t)e- twt dt;

-00

the first integral must, however, be treated as a principal value, i.e., as

f ...
b

lim

as b

-7 00.

-b

Under conditions analogous to those for Fourier series one is led to representation of a function JCt) in the interval 0 ~ t < 00 by a Fourier cosine

integral

!a

ooa(w) cos wi dw.

It is customary to define the Fourier cosine

transform of J(t) as

/2

00

F,(w) = \/; 1. f(t) cos wt dt

(7G)

so that the Fourier cosine integral representation of J reads

/2

00

f(t) = \ / ; 1. F,(w) cos wt dw;

(77)

thus J is also the Fourier cosine transform of F c' Similar formulas hold
for the Fourier sine transform:

(2

(78)

00

F,(w) = \/; 1. f(t) sin wt dt,

J(t)

/2

00

\/; 1. F.(w) sin wt dw.

GENERAL MATHEMATICS

8-16

Similarly one defines the (exponential) Fourier transform ofj as

F(w) =

(79)

1 fifI'J j(t)e-iwt dt,
V271" -rYJ

so that by eqs. (75)
1

rYJ
f
!(w)e
v271" -rYJ

jet) = _ /-

(80)

iwt

dw.

Properties of the Fourier Transform. For simplicity, the numerical
factor is dropped and the Fourier transform is defined as

CPrYJ[f] = jrYJj(t)e- iwt dt;
-rYJ

(81)

then cI>rYJ[f] is a linear operator. If j has a continuous derivative 1'(t) and

jet), 1'(t) satisfy the conditions stated above, then
(82)

A convolution is defined as follows:

j

(83)

* g = frYJj(S)g(t

- s) ds = h(t)

-rYJ

and one has the characteristic property:
(84)

it is assumed here that j, g satisfy the conditions given above. An inverse
operator is defined by the condition: rYJ -l[F] = j, if CPrYJ[f] = F. The function j can be shown to be uniquely defined by its transform F.
The applications of the Fourier transform to differential equations parallel those for the finite Fourier transform, as described in Sect. 8 above; eq.
(65) is replaced by
(85)

Application of the transform to the equation 1/;(D)x = jet) yields a solution
in the form of a Fourier integral:
(86)

1 frYJ F(w).
.
X(t) = - - e~wt dw
271" -rYJ 1/;(iw)
,

or as a convolution:
(87)

X(t) = j

* g,

OPERATIONAL MATHEMATICS

8-17

°

If J(t) = for t < 0, the same solution is obtainable by Laplace transforms;
see Sect. 10 and Chap. 9 below.
References to tables of Fourier transforms are given at the end of this
chapter (Refs. 1,5,6).
8. LAPLACE TRANSFORMS

The Laplace transform of J(t) , t

~

0, is defined as

.

(88)

00

F(s) = L[f] = 1"0 J(t)e- st dt.

It is convenient to allow s to be complex: s = u
reads:

+ iw.

Equation (88) then

(89)

hence for each fixed u the Laplace transform of J is the same as the Fourier
transform of J(t)e- qt , where J is considered tobe for t < 0:

°

(90)

Accordingly, the Laplace transform is well defined if u is chosen so that
(91)
exists, and for such u one can invert:
1

J(t)e- l1t = 27J'

f

00

F(u

+ iw)e

.

twt

dw,

-00

(92)

1
J(t) = L-l[F(s)] = 27J'

f

00

F(s)e st dw,

t>

0;

-00

+

in the last integral s = u
iw, u has any value such that (91) exists,
and the integral itself is a principal value. The integral can be interpreted
as an integral in the complex s-plane along the line u = const., w going
from -00 to +00 (Fig. 2). Since ds = idw on the path,
(93)

J(t) =

~

rF(s)e

27J'~ Jc

st

ds,

C being the line u = const. The conditions for equality of left and right
sides of (93) are the same as for Fourier integrals. At t = 0, J(t) will

8-18

GENERAL MATHEMATICS

in general have a jump, because of the convention that J(t) be 0 for t < 0,
and accordingly the right hand side gives )1J(O+).
The validity of eqs. (88) and (92) depends on choosing (J' so that (91)
exists. It can be shown that for each J(t) there is a value (J'o, - 00 ~ (J'o ~
w
s-plane

c

FIG. 2.

Path of integration for inverse of Laplace transform.

+00, called the abscissa oj absolute convergence, such that the integral (91)
exists for (J' > (J'o. If (J'o = -00, all values of ..(J' are allowable; if (J'o = +00,
no values are allowed.
Further properties of the Laplace transform and its applications are discussed in Chap. 9.
9. OTHER TRANSFORMS

The two-sided Laplace transJorm is defined as
(94)

L 1 [f] = F(s) = jooJ(t)e- st dt.
-00

Hence it differs from the (one-sided) Laplace transform only in the lower
limit of integration; thus
(95)

with no requirement that J(t) be 0 for t < o. The two-sided transform is
thus a generalization of the one-sided transform.
The Laplace-Stieltjes transJorm of get) is defined as
(96)

G(s)

~ .£00e-,t dg(t).

The integral on the right is an improper Stieltjes integral; it has meaning
if get) is expressible as the difference of two monotone functions and if the
limit as b ~ +00 of the integral from 0 to b exists. If g'(t) = J(t) exists,

OPERATIONAL MATHEMATICS

8-19

then G(s) is the Laplace transform of f(t). If get) is a step function with
jumps at t1 , t2 , " ' , the integral'reduces to a series };cje- tjs • For furtheI
information one is referred to the book of Widder (Ref. 10).
Other integral transforms have been defined and studied. These have
found their main applications in the boundary value problems associated
with partial differential equations; they could conceivably be applied to
ordinary linear differential equations with variable coefficients, on the basis
of the analysis of Sect. 5.
The Legendre transform is an example which assigns to each f(t), -1 ~
t ~ 1, the function
(97)

T[f] = cP(n) =

f

1

f(t)P net) dt,

n=O,I,2,···,

-1

where P net) is the nth Legendre polynomial. The transformation has the
property
T[R{f}] = -n(n+ l)cP(n),
(98)

R{f} =

~

dt

[(1 - t

2

)

~f(t)].
dt

Hence the transform can be applied to differential equations of form
(99) (aoR m + alRm-l + ... + am-1R + am)x = f(t),
-1 ~ t ~ 1,
where ao, ... , am are constants. For details on the Legendre transform
see Ref. 12.
The Mellin transform, Bessel transforms, Hilbert transform, and others
are defined and their properties are listed in the volumes of the Bateman
project (Ref. 1).
REFERENCES
1. Tables of Integral Transforms, Vols. 1, 2, prepared by the staff of the Bateman
manuscript project, McGraw-Hill, New York, 1954.
2. R. V. Churchill, Modem Operational Mathematics in Engineering, McGraw-Hill,
New York, 1944.
3. G. Doetsch, Theorie und Anwendung der Laplace Transformation, Springer, Berlin,
1937.
4. G. Doetsch, Handbuch der Laplace Transformation, Vol. I, Birkhiiuser, Basel, 1950.
5. G. Doetsch, H. Kniess, and D. Voelker, Tabellen zur Laplace Transformation,
Springer, Berlin, 1947.
6. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. I, Chapman
and Hall, London, 1942.
7. T. von Karman and M. A. Biot, Mathematical Methods in Engineering, McGrawHill, New York, 1940.

,

8-20

GENERAL MATHEMATICS

8. D. F. Lawden, Mathematics of Engineering Systems, Wiley, New York, 1954.

9. B. van der Pol and H. Bremmer, Operational Mathematics Based on the Two-sided
Laplace Integral, Cambridge University Press, Cambridge, England, 1950.
10. D. V. Widder, The Laplace Transform, Princeton University Press, Princeton,
N. J., 1941.
11. W. Kaplan, Operational Methods for Linear Systems, Addison-Wesley, Cambridge, Mass., 1958.
12. R. V. Churchill, The Operational Calculus of Legendre Transforms, J. Math;
Phys., 33, 165--178 (1954).

/

A

GENERAL MATHEMATICS

Chapter

9

Laplace Transforms

w.

Kaplan

1. Fundamental Properties

9-01

2. Transforms of Derivatives and Integrals

9-03

3. Translation. Transform of Unit Function, Step Functions, Impulse
Function (Delta Function)

9-06

4. Convolution

9-08

5. Inversion

9-09

6. Application to Differential Equations

9-10

7. Response to Impulse Functions

9-15

8. Equations Containing Integrals

9-18

9. Weighting Function

9-18

10. Difference-Differential Equations

9-20

11. Asymptotic Behavior of Transforms

9-21

References

9-21

1. FUNDAMENTAL PROPERTIES

Of the various operational methods described in Chap. 8 those based on
the Laplace transform have proved to be the most fruitful.
Basic Definitions and Properties. Let f(t) be a function of the real
variable t, defined for t ~ O. The Laplace transform of J(t) is a function
F(s) of the complex variable s = u + iw:
(1)

L[fl

=

F(s)

= ,£oof(t)e-" dt.
9-01

9-02

GENERAL MATHEMATICS

It is convenient to allow J(t) itself to have complex values: J(t) = it (t) +
ihCt), though for most applications J will be real.
It will be assumed that J(t) is piecewise continuous (Chap. 8), although
the theory can be extended to more general cases. It can be shown that
there is a number 0"0, -00 ~ 0"0 ~ +00, such that
(2)

i

OO

If(t) Ie-at dt

exists for 0" > 0"0 and does not exist for 0" < 0"0. If 0"0 = - 00, the integral
exists for all 0"; if 0"0 = +00, it exists for no 0"; 0"0 is called the abscissa oj
absolute convergence of L[f]. If 0" > 0"0, then the Laplace transform of J
does exist. Accordingly, there is a certain half-plane in the complex s-plane
for which L[f] = F(s) is defined (Fig. 1). Furthermore, F(s) is an analytic
Junction oj s in this halJ-plane (Chap. 7, Sect. 2).
REMARK. For existence of F(s), it is sufficient that the integral in (1)
w

0"0

0"

FIG. 1. Domain of definition of F(s)

= L[fl.

have meaning. It can be shown that there is a number 0"1, the abscissa oj
(conditional) convergence, for which this integral exists, and 0"1 ~ 0"0. For
most applications 0"1 = 0"0 and for most operations on F(s) it is simpler to
restrict 0" to be greater than 0"0.
EXAMPLES OF LAPLACE TRANSFORMS. These are given in Table 1.' For
extensive tables one is referred to Refs. 1, 5, 6, Chap. 8.
Existence. For practical purposes the condition that the Laplace
transform exist for some 0" is that the function J(t) should not grow too
t2
do not have Laplace transforms.'
rapidly as t -7 +00. For example, e ,
In general, a function of exponential type, i.e., a function for which IJ(t) I
< ekt for some k and for t sufficiently large, has a Laplace transform
F(s).

ee\

LAPLACE TRANSFORMS

9-03

Linearity. The Laplace transform is a linear operator. More precisely,
if L[h(t)] = FI(S) exists for u > UI and L[h(t)] = F2(S) exists for U > U2,
then for every pair of constD,nts Cl, C2 L[cdi + c2f2] exists for u >
max (ut, (2) and
(3)

2. TRANSFORMS OF DERIVATIVES AND INTEGRALS

Rules.
(4)

L[f'(t)] = sL[f(t)] - f(O),

(5)

L[f"(t)] = s2L[f(t)] - 1'(0) - sf(O) ,

(6)

L[f(n)(t)] = snL[f] - [f(n-l)(o)

(7)

L

LC

f(t) dt]

+ sj(n-2)(0) + ... +

sn-lj(O)],

~ ~ L[fl·

The first rule is basic here, the others being consequences of it; it is valid
if (for some u) jet) and I'(t) have Laplace transforms and jet), I'Ct) are
continuous for t ~ O. More generally, eq. (4) is valid if only J(t) is continuous and I' (t) is continuous except for jump discontinuities. Similarly,
eq. (6) is valid if the Laplace transforms exist and all derivatives concerned
are continuous except perhaps the nth, which is allowed to have jump disdiscontinuities. Rule (7) is valid if J is piecewise continuous and the transforms exist.
EXAMPLE. If J = sin t, j' = cos t, J(O) = 0, so that L[cos t] = sL[sin t]
= S/(S2 + 1).
Of great importance is the special case of eq. (6):

(8)

L[J(n) (t)] = sn L[f],

if J(O) = I' (0) = ... = J(n-l) (0) = O.

Hence, if one restricts to functions with 0 initial values, differentiation with
respect to t corresponds to multiplication by s.

'0

TABLE

1.

F(s) = L[f] = Lf(t)e-st dt

f(t)

1

b,a:....

LAPLACE TRANSFORMS

Range of u

u>o

1
eat

l/s
1/(s - a)

3

tn (n > -1)

r(n
1)
of
sn+l. or, 1 n = 0, 1, 2,

4

tnefLt (n > -1)

r(n + 1)
of
c;-a)nH or, n = 0,1,2,

5

cos at

s/(s2

6

sin at

7

cosh at

8

sinh at

9

t n cos at (n > -1)

10

tn sin at (n > -1)

11

cos 2 t

2 ; + S2 + 4

S)

(}">O

12

sin 2 t

1 (1
2 s

S)

(}">O

13

sin at sin bt

2

u > Re (a)

+

n!

0

0

0,

u>O

sn+l
n!

(}" > Re (a)

Q
m

(}" > IImal

m

)

(}" > IImal

r-

siCs!' - a2 )
a/(s2 - a2 )

(}" > iReal

3:

(}" > IReal

::z:

I

000,

(s _ a)n+1

+ a2)
al(s?' + a
2

+ 1) (s + ai)n+l + (s - ai)n+l
(S2 + a2)n+l
rCn + 1) (s + ai)n+l - (s - ai)n+1
--(S2 + a2)n+l
2i
r(n

.:.,

1 (1

S2

+4

2abs

[S2

+ (a + b)2][S2 + (a -

b)2]

(}" > IImal

Z

;;0

»

»-4
m

3:

»

::::!

n

(J)

(}" > IImal

(}" > Max (a, (3)
a = 11m (a + b) I
(3 = IIm(a - b)/

14

eat sin (bt

(8 - a) sin c + b cos C
(8 - a)2 + b2

+ c)

u > Max (a, (3)
a = Re (a + bi)
{3

15

16
17

1 for 2n ~ t < 2n + 1
o for 2n + 1 ~ t < 2n + 2
n = 0,1,2, ... (square wave)
1 for a ~ t ~ b < 00
o for 0 ~ t < a and t > b
o for 0 ~ t < b,
1 for t

18
19

20

t, 0
1, t

~
~

~

b

t ~ 1
1

t, 0 ~ t ~ 1
2 - t, 1 ~ t
0, t ~ 2

~

2

a
a - bit - (2n

+ l)b I
for 2nb ~ t ~ (2n + 2)b,

aCt - nb) for nb ~ t
(sawtooth wave)

8(1

+ e-8)

< (n + l)b

= Re (a

- bi)

u>O

e-as _ e- bs

8

all u

e- OS

8
1 - e- s
-8-2(1 - e-8 )
82

n = 0, 1, ... , b > 0,
a real (triangular wave)
21

1

u>O
u>O

r-

»-c
r»n
m

all u

-f
:;0

»

Z

en
"T1
0

a 1 + e- bs
b
82

u>O

a(l + b8 - ebs)
82 (1 _ eb8 )

u>O

:;0

~

en

'0

6
til

9-06

GENERAL MATHEMATICS

3. TRANSLATION. TRANSFORM OF UNIT FUNCTION, STEP FUNCTIONS,
IMPULSE FUNCTION (DELTA FUNCTION)

Translation. In Laplace transform theory it is convenient to consider
each function J(t) to be defined as 0 for t < o. Hence for c ~ 0 J(t - c) is
o for t < c and coincides with a translated J(t) for t > c (Fig. 2). One finds

L[J(t - c)] = fooJ(t - c)e-st dt

(9)

c

= e-CSL[f].
x
I(t)

I(t - c)

c

FIG. 2.

Translated function.

Unit Function. N ow let u(t) = 0 for t ~ 0, u(t) = 1 for t > 0; u(t) is
called the unit function (of Heaviside). By entry 1 of Table 1,
L[u(t)]

(10)

1

= -.
8

Hence for c

~

0

e- Cs

(11)

L[u(t - c)]

= -,
s

where u(t - c) is the translated unit function with jump at t = c (Fig. 3);
cf. entry 17 of Table 1. A square pulse of height h (Fig. 4) can be reprex

x
u(t - c)

h

a
FIG. 3. Translated unit function.

FIG. 4.

b

Square pulse.

LAPLACE TRANSFORMS

9-07

sen ted as a combination of two unit functions:
(12)

o~

J(t) = h[u(t - a) - u(t - b)],

a

< b;

he"nce its transform is
h
L[f] = - (e- as
s

(13)

-

e- bs ).

x

FIG. 5.

Step function.

A general step Junction (Fig. 5) can be regarded as a superposition of such
square pulses:
(14)

J = hl[u(t) - u(t - al)]

+ h2[u(t -

al) - u(t - a2)]

+ ... ;

hence (if the pulses do not grow too rapidly, so that £[J] exists)
(15)

1
L[f] = - [hI (1 - e-a1S )
s

+ h2(e- alS

- e-a,S)

+ ... ].

Impulse Function. The unit impulse Junction at t = 0 is defined as
the limit as e -; 0 of a square pulse form t = 0 to t = e and having unit
area, i.e., the limit as e -; 0+ of

1
- [u(t) - u(t - E)].

(16)

E

The limit does not exist in the ordinary sense; it can be considered as defining an "ideal" function, the delta function o(t). One can consider oCt)
to be 0 except near t = 0 where oCt) is large and positive and has an integral
equal to 1. Now

L [

U(t) - u(t E

E)] __ 1 - e- - -; 1 as
ES

ES

and accordingly one defines:
(17)

L[o(t)] = 1.

E -;

0,

9-08

GENERAL MATHEMATICS

The unit impulse function at t = c is defined as oCt - c) and one finds
L[o(t - c)] = e- C8 •

(18)

It should be noted that L[u(t)] = L[o(t)]/s, so that by eq. (7) u(t)" can
be thought of as an integral of o(t): u(t) = fot oCt) dt. This in turn suggests
interpretation of oCt) as u'(t).
4. CONVOLUTION

Let f(t) and get) be piecewise continuous for t
convolution of f and g is defined as

~

0. Then the (Laplace)

t

f

(19)

f. f(u)g(t -

*g =

u) du = h(t).

It can be verified that h(t) is continuous for t
t

h(t) =

(20)

.

If now, for some u,

f."

1

Jro g(u)f(t -

f.'"

~

u) du = g *f.

II(t) 1e-" dt and

f."

1g(t) 1e-"

h(t) 1 e-" dt exists, so that L[f], L[g], L[h] exist and

L[h]

(21)

= L[f * g] = L[f]L[g].

Properties of the Convolution.

These are:

(23)

= f * g + f * h;
f * (cg) = (cf) * g = c(f * g), c

(24)

f * (g * h) = (f * g) * h.

(22)

0, also that

.

f * (g

+ h)

Special Convolutions.

=

const.;

The following are useful:

(25)
(26)

eat

tn-Ieat

* eat * ... * eat = _ __

(n - I)!

(27)

(28)

eat --:- ebt
eat * ebt = _ __

a-b

(a ;;e b).

(n factors);

dt exist, then

LAPLACE TRANSFORMS

9-09

5. INVERSION

If L[f] = F(s), one writesJ = L- 1 [F], thereby defining the inverse Laplace
transJorm. The inverse is uniquely determined; more precisely, if L[f] =
L[g] and J, 9 are piecewise continuous, then J = g, except perhaps at points
of discontinuity.
If J = L-1 [F], then as for Fourier series and integrals (Chap. 8, Sects.
8,9),
1
00
1
(29) J(t) = F(s)e Bt dw = lim F(s)e Bt dw,
s = (j iw,
27r -00
b-+oo 27r -b

fb

f

+

at every t for which J has left- and right-handed derivatives; in the integrals (j is chosen greater than the abscissa of absolute convergence of L[f].
Under the conditions desr,ribed in Chap. 8, Sect. 8, the integral represents
Y2[f(to+)
JUo -)] at each jump discontinuity to. In general J(t) is defineq to be 0 for t < 0, which will force a discontinuity at t = 0 unless
J(t) ~ 0 as t - t 0+; the integral thus gives Y2J(O+) at t = O.
Conditions for Existence. Given F(s) as a function of the complex
variable s, one can ask whether L -l[F] exists, i.e., whether F is the Laplace
transform of some J(t). For this to hold, F(s) must be analytic in some
half-plane (j > (jO, but this alone is not sufficient. If F(s) is analytic at
s = 00 and has a zero there (Chap. 7, Sect. 5), so that

+

lsi>

(30)

R,

then F(s) is a Laplace transform:
(31)

L -l[F(s)] = J(t) =

00

tn

n=O

n.

2: an +l ,

;

J(t) is of exponential type and is an entire function of t (Chap. 7, Sect. 4).
Furthermore,
(32) ;

J(t) =

~

27r1, c

feBtF(S) ds,

where C is a circle: lsi = Ro > R. If in addition, F(s) is analytic for all
finite s except at SI, "', Sn, then J(t) equals the sum of the residues of
F(s)eBtat SI, "', Sn (Chap. 7, Sect. 5).
More general conditions that F(s) be a transform can be given. If, for
example, F(s) is analytic for (j > (jO ~ 0 and is representable in the form
(33)

F(s)

c

J.I.(s)

=-+s S1+0

(0

>

0),

9-10

GENERAL MATHEMATICS

where IJ.t(s) I is bounded for (T ~ (Tl > (To, then F(s) is the Laplace transform of f(t), where f(t) is given by eqs. (29), with (T = (Tl.
If F(s) is a proper rational function of s: F(s) = P(s)/Q(s), then eq. (32)
is applicable and the integral can be computed by residues. If in particular
Q(s) has only simple roots SI, ... ; Sn, then by Chap. 7, Sect. 5, estp /Q has
residue exp (Skt)P(Sk)/Q' (Sk) at Sk, so that
L -1 [pes)] =
Q(s)

(34)

±

k=I

eSktP(sk).
Q'(Sk)

This corresponds to the Heaviside expansion formula (Chap. 8, Sect. 1).
Particular inverse transforms can be read off Table 1 (Sect. 1) or the
accompanying Table 2. Others can be deduced from these by linearity
and the various rules such as (4)-(7), (9), and with the aid of convolutions.
Extensive lists are given in Refs. 1, 5, 6 of Chap. 8.
Rules for Finding Laplace Transforms and Their Inverses. If
f(t) has period T, then
L[f] =

(35)

1

1

+ e-

S

TiT

e-stf(t) dt.

0

For general f(t) with transform F(s),

~ ~F

G)'

(36)

L[f(atl]

a

(37)

L[e-atf] = F(s

(38)

L[tnf] = (-l)nF(n) (s),

(39)

L[t-nf] =

> 0;

+ a);
n = 1, 2, "';

foo .. 'fooF(S) ds ... ds
,

(n=1,2,···).

S

n times

6. APPLICATION TO DIFFERENTIAL EQUATIONS

Characteristic Function. Let ao, "', an be constants, with ao ~ O.
The function V (s) = aos n + ... + an will be termed the characteristic
function associated with the differential equation

(40)

dnx
ao - n
dt

Transfer Function.
(41)

dx

+ ... + an-l - +
dt

anx = f(t).

The function
1

1

Yes) = = ----Yes)
aosn + ... + an

will be termed the tranf?fer function.

LAPLACE TRANSFORMS

9-11

Solutions. Let J(t) be piecewise continuous for t ~ 0 and have an
absolutely convergent Laplace transform for u > uo. A solution x(t) of
eq. (40) satisfying given initial conditions

x(O)

(42)

= ao,

X' (0) =

aI, "', x(n-l) (0) = an-l

is obtained as follows. One forms the Laplace transform of both sides of
eq. (40), applies the rule (6), and obtains the transformed equation
V(s)X(s) - Q(s) = F(s),

(43)

where X = L[x], F = L[f] and
(44)

Q(s) = aoaosn- l

+ (aOal + alaO)sn-2 + ... + (aOan-l + ... + an-lao).
Accordingly,
(45)

Q(s)
Xes) = Yes)

(46) x(t) = L -l[Y(S)Q(s)

F(s)

+-

Yes)

+

=

+ Y(s)F(s),

Y(s)Q(s)

Y(s)F(s)] = L -l[Y(S)Q(s)]

+ L -l[Y(s)F(s)].

Since Y(s)Q(s) is a proper rational function, its inverse can be found by
residues as in Chap. 9, Sect. 5. The inverse of Y(s)F(s) can be found in a
variety of ways. In particular, Yes) has an inverse transform yet) and
(47)

L-'[Y(s)F(s)] = y(t) • I(t) = ,fY(U)/(t - u) duo

Thus both terms in eqs. (46) are well defined and it can be shown that
x(t) is the solution sought; x(t) has continuous derivatives through the
(n - l)st order and an nth derivative which is continuous except where
J(t) is discontinuous.
The formula (47) defines y * J if J(t) is piecewise continuous for t ~ 0,
even though J(t) may grow very rapidly as t ~ +00. If Yes) has only
simple roots Sl, " ' , Sn, so that
(48)

Yes) =

n

A.

2: _J_.
j=l S -

n

yet) =

Sj

2: Aje

Sjt

,

j=l

then
(49)

If V has multiple roots, each multiple root

Sj

gives rise to terms of form

9-12

GENERAL MATHEMATICS
TABLE

2.

INVERSE LAPLACE TRANSFORMS

F(s)

L-l[F(s)]

c

!?. e(-b/a)t

+b
ps + q
(s + a)(s + (3)

1 as
2

3

4

= f(t)

a

(q - pa)e- at - (q - p(3)e- pt
{3-a
'

+

ps
q
(s + a)2

e-at[p

p3 + q
2
as 2+bs + c ,b -4ac>O

1
.
- - [(q - pa)e- at - (q - p(3)e-Pt ],
p.

+

(q - ap)t]

a = b + p., {3 = b - J.1.,
2a
2a
5

as 2

ps

+

q

+ bs + c

,b2 - 4ac

<0

e(-b/2a)t
J.1.

6

(s

pS2 + qs + r
+ a)(s + (3)(s + 1')'

pS2 + qs + r
+ a)2(s + (3) ,

(s

a ¢ {3

+ qs + r
+ a)3

(s

yb 2 -

V4ac - b2

-1
ABC [A (pa 2

-

+ r)e-at
q{3 + r)e- Bt
qa

ql' + r)e-"Y t],
A = {3 - 1', B = l' - a, C = a - {3
2
p{32 - q{3 + r e- pt + [pa - qa + r t
({3 - a)2
({3 - a)
2
pa - 2a{3p + q{3 -at
({3 - a)2
e

rJ

+

8 pS2

=

pe-at

4ac

[Ea cos.!!:.-.2a t + 2aqap.- pb sin.!!:.-.2a tJ '

+ B(p{32 + C(pl'2 -

a, {3, l' distinct

7

=

J.1.

+ (q -

+ (pa 2 -

2pa)te- at
2

qa

t
+ r) '2e-at

LAPLACE TRANSFORMS
TABLE

2.

9

10

(s

(Continued)

INVERSE LAPLACE TRANSFORMS

L-l[F(S)] = J(t)

F(s)
pS2

9-13

+ qs + r

~e-at

+ ex) (as 2 + bs + c)'
aa 2 - bex + c ~ 0

N
M

pS3 + qs2 + rs + u
(as 2 + bs + C)(AS2 + Bs + C)

as 2 + bs + c and
As2 + Bs + C having no
common roots

+ ~L-l

[ Bs + C ],
N
as 2 + bs + c
= pex 2 - qex + r, N = aex 2 - bex
B = (aq - bp)a+ pc - ar,
C = (ar - pc)a + qc - br

L-l [~ + qo ]
as 2 + bs + C

+ L-l [

+ c,

PIS + q1 ] .
AS2 + Bs + C

To find po, qo, PI, ql, compute:

+
+

Ao = a(ar - cp)
b(bp - aq),
J..I.o = a(au - cq)
bcp,
ero = a(bu - cr) + c2p, (3 = aB - bA
"I = aC - cA, Do = a'Y 2 - b{3'Y + c{32,
Al = A(Ar - Cp)
B(Bp - Aq),

+

J..I.1

=

erl

=

15 1 =

A(Au - Cq) + BCp,
A(Bu - Cr) + C2p,
A'Y2 - B{3'Y + C{32.

Then
po =
PI =

11

ps + q
2
(as2 + bs + C)2' b - 4ac

Ao'Y - J..I.o{3

Do
-Al'Y

.

+ J..I.tf3 ,

15 1

q1

e- at

< 0 '2dif33 [p{32t sin {3t + (q ex =

~,

2a

{3

=

J..I.o'Y - ero{3

, qo =

-J..I.I'Y

=

,

+ ertf3

15 1

exp )(sin {3t - {3t cos (3t)]

V4ac 2a

Do

2

b

GENERAL MATHEMATICS

9-14

The corresponding term in yet) is Atk- I e8 jl/(k - I)!

A(s ~ Sj)-k in Yes).
and in L -I[YF] is

(50)
Particular Solutions. If all the initial constants ao, " ' , an-I are 0,
then Q(s) = and x = L -I[YF] is the solution sought. This particular
solution can be found by eqs. (47), which requires knowledge of yet) and
hence of the roots of V(s). This can cause difficulty. An alternative is to
employ eqs. (29):

°

1
(51)

x(t)

=-

27r

f

00

Y(s)F(s)e Bt dw,

0"

= const. >

0"0'

-00

It may be possible to simplify this by residues or series expansions.
If J(t) is of form ebtp(t), where pet) is a polynomial of degree m in t, a
particular solution can be found explicitly without finding the roots of Yes).
If V(b) ~ 0, the particular solution is
(52)

x(t)

= ebt [Y(b)P(t)

+ Y'(b) p'(t)
I!

Y"(b)

y(m) (b)

2!

m!

+ - - p"(t) + ... +

If V(b) == 0, then Yes) = (s - b)kW(s), Web) ~ 0. Let Z(s)
and let PI (t) be the polynomial obtained by integrating
(53)

Z(b)p(t)

+ Z'(b)p'(t) + ... +

z(m)

]
p(m)(t) .

= I/W(s)

(b)

m! .

p(m)(t)

°

k times from to t. Then x = ebtpi (t) is a particular solution of eq. (40).
In both cases it can be verified that
L[x]

= Y(s)L[ebtp]

+ Y(s)R(s),

where R is a polynomial of degree less than that of V (in fact less than that
of W (s) in the second case).
Shnultaneous Equations. Similar methods are employed for simultaneous linear differential equations with constant coefficients in unknowns
xl, X2, •••• One applies the Laplace transformation to the equations,
thereby obtaining equations for XI(s), X 2 (s), ... ; in forming these new
equations, certain initial conditions for xl, X2, ••• are assumed. The equations are simultaneous algebraic equations for XI(s), X 2 (s), ... and can

LAPLACE TRANSFORMS

9-15

be solved by elimination or determinants. When X j(8) is known, Xj(t) can
be found by forming the inverse transforms.
EXAMPLE.

d2 x

dy

-dt2 + 2 -dt + y

= 13e2t

dx
- - 2x
dt

'

d2 y

when t = 0, x = 1, y = 0, dx/dt = 0, dy/dt = 1.
.
82 X(8)
.

(8 - 2)X(8)

X=
Y

84
(8 -

.

15

= 2 + --,
8-2

2

2

+

1582

-

178

+

26

+ 1)(8 + 2)(8 + 1)

1

8- 2

2t

2

8 -

2

3

Y=---

y=e

Hence

= 8 + --,

+ (8 + 38 + 5) Y(8)

2

2
19
X=--8- 2
2(8 + 1)

2t

'

13

+ (28 + 1) Y(8)

(8 - 2)(8

x=2e

= 15e2t •

+ 8 + 88 + 58 + 54 ,
2)(8 + 1)(8 + 2)(8 + 1)

83

=

dy

+ -dP + 3 -dt + 5y

+

21
5(s

19
2(s

+ 2)
28

,

+

+ 1) + 5(8 + 2) +

19 - t
21 -2t
- - e +-e
2
5

19 - t
28 -2t
- - e +-e
2
5

438 - 51
10(S2

298
10(s2

+ 1) '

+

7

+ 1) '

- 51 sin t
+ 43 cos t 10
'

t + 7 sin t
+29-cos
----

10

7. RESPONSE TO IMPULSE FUNCTIONS

For many applications it is important to consider the response of a linear
system to the impulse function oCt) or to other ideal functions such as
O/(t), o"(t), ....
EXAMPLE

1.

Consider the equation

dx

-dt + x =

oCt)

'

x(O) = ao.

If one applies the Laplace transform mechanically to both sides and em-

GENERAL MATHEMATICS

9-16

ploys the rule: L[o(t)] = 1 (Sect. 3), one finds:
1

+ ao

Xes) = - - ,

xU) = (1

s+l

Hence x(t) has a discontinuity at t =
'initial value ao to the value 1 + ao.
EXAMPLE 2. Similarly,
d2 x
dt2

+ ao)e- t.

°

(Fig. 6); x jumps from the assigned

dx

+ dt

= oCt)

is found to have the solution (Fig. 7) x = 1 + ao

+ al

-

(1

+ al)e7"t, with

x
x

1

+ ao

FIG. 6.

Response of first order system
to a-function.

FIG. 7.

Response of second order system to a-function.

ao = x(O), al = x'(O). Here there is no discontinuity of x(t) at t = 0, but
x'(t) has a discontinuity, jumping from the assigned initial slope of al to
the slope 1 + al.
It should be noted that the second example can be written as follows:
dx
dt = y,

dy
dt

+y

= o(t);

thus its solution is an integral of the solution of the first example. Each
such integration reduces the type of discontinuity. In general,
V(D)x = oCt),

V(D) = aoDn

+ ... ,

ao

~

0,

has a solution which has a jump in the (n - l)st derivative at t = 0, but
no jumps in the derivatives of lower order. For the equation
V(D)x = oCt - c),

c>

0,

a similar conclusion holds, with the discontinuity occurring at t = c.

LAPLACE TRANSFORMS

9-17

One can interpret oCt - c) as dd u(t - c), if one forms the transforms by

.

t

the rule L[1'] = sL[j], ignoring the discontinuity which would make the
rule inapplicable. For then
L [:t u(t - e) ] = sL[u(t - e)] = e-",

in accordance with eq. (18). This suggests the general procedure.
General Procedure. For the differential equation
df
V(D)x = -,
dt
in which f has a jump discontinuity at t = c but has otherwise continuous
derivatives, one should take transforms ignoring the discontinuity:
V(s)L[x] = sL[f] - f(O).
Under similar conditions on f, a similar procedure can be used for higher
derivatives, and for the general equation
V(D)x = W(D)f.
If the order of V(D) is less than that of WeD), x will itself be an ideal
function; otherwise x will merely show some discontinuity at t = c. Similar
remarks apply when there are several jump discontinuities.
Let f have continuous derivatives of all orders except for t = c, at which
the derivatives have limiting values to the left and to the right. Then f
can be written as !l (t) + klU(t - c), wherefl(t) is continuous at t = c; correspondingly, 1'Ct) = f't (t) + kloCt - c), where f'l (t) is discontinuous at
t = c. Thus

1'(t) = !2(t)

+ k 2u(l -

f"(t) = 1'2(t)

+ k20(t

= faCt)

c)

- c)

+ kau(t -

c)

+ klo(t -

c)

+ klo'(t

- c)

+ k20(t -

c)

+ klo'(t -

c),

Computation of lAf'], L[f"], "', as described above, is then equivalent to
that obtained by writing
L[1'] = L[!2]
L[f"] = L[fa]

+ k2L[u(t

- c)]

+ kaL[u(t -

c)]

+ klL[o(t

- c)]

+ k2L[0(t -

c)]

+ klL[o'(t -

c)]

9-18

GENERAL MATHEMATICS

if one agrees that

(m =

(54)

1,2, ... ).

The justification for· the rules adopted lies in the fact that they give a
reasonable limiting form for the response x(t), and they meet the needs of
the physical situations to which they are applied.
8. EQUATIONS CONTAINING INTEGRALS

The method of Sect. 6 is applicable to "integro-differential equations"
such as the following:
(55)

ao dx
dt

+ alX + a2

rtx dt = J(t).

Jo

One need only apply the Laplace transformation to both sides and employ
rule (7):
(56)
from which one can solve as before for Xes).
One can also differentiate eq. (55) to obtain an equation of second order:
d2x
dx
(57)
ao - 2 + al - + a2 x = f' (t) ;
dt
dt
from eq. (55), aox'(O) + alx(O) = J(O), so that one initial condition for eq.
(57) is fixed. If J(t) has discontinuities, f'(t) has to be treated as an ideal
function (Sect. 7); in such a case, it is simpler to use eq. (56).
It should be remarked that eq. (55) is equivalent to the system
dy
-=X

dt

'

with the initial conditions: x(O) = ao, yeO) = O. By similar devices integrals can be eliminated formally in most cases.
9. WEIGHTING FUNCTION

It has been seen that, for proper initial conditions, various problems lead to relations of form

(58)    X(s) = Y(s)F(s),

where F(s) is the Laplace transform of a driving function or "input" f(t) and X(s) is the Laplace transform of the "output" x(t). In such cases


Y(s) is termed the transfer function; i.e., in general, the transfer function is the ratio of the Laplace transforms of output and input.
If y(t) is the inverse Laplace transform of Y(s), then as in Sect. 6
(59)    x(t) = y(t) * f(t) = ∫_0^t f(t − u) y(u) du.

Accordingly, x(t) is a weighted average of f(t) over the interval from 0 to t, the value at t − u receiving weight y(u). Since f(t) = 0 for t < 0, one can also write

(60)    x(t) = ∫_{−∞}^{t} f(t − u) y(u) du,

so that the average is over the entire "past" of f(t).
Graphical Computation. One can then compute x(t) at each t graphically as suggested in Fig. 8. Here y(u) is graphed against u, with the positive u-axis to the left and the origin above the point t on the t-axis.

FIG. 8. Response as a weighted average.

The value of f at t − u is multiplied by the value of y above t − u and the result is integrated to yield x(t) at the t chosen. As the graph of y(u) is moved parallel to the t-axis, the average at successive times t can be found.
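The weighted-average formula (59) lends itself directly to numerical evaluation. The following sketch (NumPy assumed; the choices y(u) = e^{−u} and f(t) = sin t are illustrative) approximates the convolution by a Riemann sum and compares it with the closed form:

# x(t) = integral_0^t f(t - u) y(u) du, approximated by a Riemann sum.
import numpy as np

dt = 0.001
u = np.arange(0.0, 10.0, dt)
y = np.exp(-u)                        # weighting function y(u)
f = np.sin(u)                         # input f(t), zero for t < 0

x = np.convolve(f, y)[:len(u)] * dt   # discrete convolution times du

# Compare with the closed form (sin t - cos t + exp(-t)) / 2 at t = 5:
i = int(5.0 / dt)
print(x[i], (np.sin(5) - np.cos(5) + np.exp(-5)) / 2)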
Weighting Function. The function y(t) = L^{−1}[Y(s)] is termed the "weighting function." In view of the discussion given, this term would be justified only if

(61)    ∫_0^∞ y(t) dt = 1.

But

Y(s) = ∫_0^∞ y(t) e^{−st} dt,    Y(0) = ∫_0^∞ y(t) dt,

provided Y(s) is defined for s = 0. Hence if Y(0) = 1, the "total weight" is 1, as desired. If Y(0) ≠ 1, one can redefine the input as a constant times f(t) and achieve the same result.


Response to Unit Impulse. If ε is very small and f(t) is a square pulse of height 1/ε from t = 0 to t = ε, then eq. (59) shows that approximately

x(t) = (1/ε) y(t) · ε = y(t);

as ε → 0, this can be shown to be the limiting relation. Thus the weighting function is the response to the unit impulse function δ(t). This also follows from eq. (58), since if f(t) = δ(t), L[f] = F(s) = 1.
One can also remark that, if f(t) is the unit function u(t), then L[f] = F(s) = 1/s, so that by eq. (58)

X(s) = Y(s)/s,    Y(s) = sX(s),    y(t) = dx/dt;

for, by eq. (59), x(0) = 0. Thus the weighting function can be interpreted as the derivative of the response to the unit function. If one denotes by A(t) the response to the unit function, so that L[A] = Y(s)/s, then for an arbitrary driving function f(t),

(62)    X(s) = s(Y(s)/s)F(s),    x(t) = (d/dt) ∫_0^t f(t − u)A(u) du.

Equations (59) and (62) are equivalent to eqs. (25), (26) of Chap. 8, Sect. 5.
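This relation is easy to verify numerically. The following sketch (NumPy assumed; Y(s) = 1/(s + 1) is an illustrative transfer function, so that y(t) = e^{−t} and A(t) = 1 − e^{−t}) checks that the weighting function equals dA/dt:

# Check that y(t) = dA/dt for a first order system.
import numpy as np

t = np.linspace(0.0, 5.0, 501)
A = 1.0 - np.exp(-t)                  # response to the unit function u(t)
y = np.exp(-t)                        # weighting function

dA = np.gradient(A, t)                # numerical dA/dt
print(np.max(np.abs(dA - y)))         # small except for end-point error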
.
10. DIFFERENCE-DIFFERENTIAL EQUATIONS

Because of the transformation rule L[f(t − c)] = e^{−cs} L[f] (Sect. 3), Laplace transforms can be applied to solve linear difference-differential equations, i.e., equations of form

(63)    Σ_{k=0}^{n} Σ_{m=0}^{M} a_mk f^{(k)}(t − mτ) = g(t);

it will be assumed that the coefficients a_mk are constants and that a solution f(t) is to be found which is equal to 0 for t ≤ 0 and satisfies eq. (63) for t > 0. Under these conditions

(64)    L[f^{(k)}(t − mτ)] = s^k e^{−mτs} F(s),

and the transformed equation corresponding to eq. (63) is

(65)    (Σ_{k=0}^{n} Σ_{m=0}^{M} a_mk s^k e^{−mτs}) F(s) = G(s).


This can be solved for F(s) and the solution sought is L^{−1}[F(s)]. Validity of this process requires in particular that for some σ0 the term in parentheses in eq. (65) have no zeros in the complex s-plane for σ > σ0. For discussion of the questions involved here see Ref. 1.
Instead of requiring that f(t) be ≡ 0 for t < 0, one can impose the condition that f(t) coincide with a given function f0(t) in an "initial interval" −Mτ ≤ t ≤ 0. This case can be reduced to the previous one by first extending the definition of f0(t) to the range t > 0, while preserving continuity, and introducing a new unknown function f1(t) = f(t) − f0(t).
11. ASYMPTOTIC BEHAVIOR OF TRANSFORMS

In general the behavior of f(t) at t = 0 is related to that of F(s) = L[f] as s → ∞ along the real axis, while the behavior of f(t) at t = +∞ is related to that of F(s) as s → 0 (or s → σ0) along the real axis. A full discussion is given in the book of Doetsch (Ref. 3, Chap. 8), pp. 186-277.
If

(66)    F(s) = a1/s + G(s)/s²,

where |G(s)| < M for σ > σ0, then

(67)    lim_{t→0+} f(t) = lim_{s→∞} sF(s)    (s real).

If f(t) and f'(t) have convergent Laplace transforms for σ > 0 and f(t) has a limit as t → +∞, then

(68)    lim_{t→∞} f(t) = lim_{s→0} sF(s)    (s real).
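A quick symbolic check of eqs. (67) and (68) (SymPy assumed; F(s) = 1/(s(s + 1)), i.e., f(t) = 1 − e^{−t}, is an illustrative choice):

import sympy as sp

s = sp.symbols('s', positive=True)
F = 1 / (s * (s + 1))

print(sp.limit(s * F, s, sp.oo))   # 0 = f(0+)
print(sp.limit(s * F, s, 0))       # 1 = f(+oo)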

REFERENCES
1. R. Bellman and J. M. Danskin, The Stability Theory of Differential Difference
Equations, Proceedings of the Symposium on Nonlinear Circuit Analysis, Vol. II, pp.
107-128, Polytechnic Institute of Brooklyn, New York, 1953.
See also the list following Chap. 8.

A. GENERAL MATHEMATICS

Chapter 10

Conformal Mapping

W. Kaplan

1. Definition of Conformal Mapping. General Properties   10-01
2. Linear Fractional Transformations   10-05
3. Mapping by Elementary Functions   10-06
4. Schwarz-Christoffel Mappings   10-08
5. Application of Conformal Mapping to Boundary Value Problems   10-09
References   10-11

1. DEFINITION OF CONFORMAL MAPPING. GENERAL PROPERTIES

Definitions. Let u = f(x, y), v = g(x, y) be two real functions of the real variables x, y, both defined in an open region D of the xy-plane. As (x, y) varies in D (Fig. 1), the corresponding point (u, v) varies in a set D1 and one says that the equations

(1)    u = f(x, y),    v = g(x, y)

define a transformation or mapping T of D onto D1 (Chap. 1, Sect. 3). If for each (u, v) in D1 there is precisely one (x, y) in D such that u = f(x, y), v = g(x, y), then the transformation T is said to be one-to-one, and T has an inverse T^{−1}, defined by equations

(2)    x = φ(u, v),    y = ψ(u, v),

obtained by solving eqs. (1) for x and y in terms of u and v.
Now let T, defined by eqs. (1), be a mapping of D onto D1. In addition, let f(x, y) and g(x, y) have continuous first partial derivatives in D. The


mapping T is said to be conformal if, for each pair of curves C1, C2 meeting at a point (x0, y0) of D, the corresponding curves C1*, C2* meeting at (u0, v0) form an angle α at (u0, v0) equal to that formed by C1, C2 at (x0, y0). It is assumed that C1, C2 are directed curves and have well-defined tangent vectors at (x0, y0), so that C1*, C2* also have tangent vectors at (u0, v0). The angle α is then measured between the tangent vectors. It is customarily a signed angle and measured, e.g., from C1 to C2 and, correspondingly, from C1* to C2*.

FIG. 1. Conformal mapping: (a) z-plane, (b) w-plane.

Conformality then means that the corresponding angles are equal and have the same sense, as in Fig. 1. To emphasize this, one can write more explicitly that T is to be conformal and sense-preserving. For most applications T is assumed to be one-to-one. Conformality of T then implies conformality of T^{−1}.
THEOREM 1. Let (1) define a mapping T of D onto D1. Let f(x, y) and g(x, y) have continuous first partial derivatives in D. Then T is conformal and sense-preserving if and only if the Cauchy-Riemann equations

(3)    ∂u/∂x = ∂v/∂y,    ∂u/∂y = −∂v/∂x

hold in D and the Jacobian ∂(u, v)/∂(x, y) ≠ 0 in D.
By virtue of this theorem, the theory of conformal mapping is related to the theory of analytic functions of a complex variable (Chap. 7). One can use complex notation:

(4)    z = x + iy,    w = u + iv,    i = √(−1),

and the transformation T is then simply a complex function w = F(z) defined in D. The mapping w = F(z) is conformal precisely when F is analytic in D and F'(z) ≠ 0 in D. (See Ref. 2.)


REMARK. If F'(z) is 0 at a point z0, then z0 is termed a critical point of F(z). A function w = F(z) cannot define a conformal mapping of any open region D containing a critical point z0. The behavior of F(z) near a critical point is typified by the behavior of z^n near z = 0, for n = 2, 3, ···; except for w = 0, each w has n inverse values w^{1/n}. Curves meeting at angle α at z = 0 are transformed onto curves meeting at angle nα at w = 0. The absence of critical points does not guarantee that F(z) describes a one-to-one mapping; all that can be said is that, if F'(z0) ≠ 0, then w = F(z) does define a one-to-one conformal mapping of some sufficiently small region containing z0.
Geometrical Meaning of Conformality. Let w = F(z) define a one-to-one conformal mapping of D on D1. Then each geometrical figure in D will correspond to one in D1 which is similar in a certain sense; if the first figure is bounded by smooth arcs, the second will be bounded by similar arcs and corresponding pairs of arcs form the same angle (Fig. 2).

FIG. 2. Behavior of mapping in the interior and on the boundary.

The lines x = const., y = const. in D form two families of curves meeting at right angles; hence these correspond to curves in D1 formed of one family and of its family of orthogonal trajectories (Fig. 3). Similarly the curves u = const. form orthogonal trajectories of the curves v = const. On the boundary of D conformality may break down. In general there is some sort of continuous correspondence between boundary points of D and those of D1. If D and D1 are each bounded by several simple closed curves, and F is one-to-one, then the mapping F and its inverse can indeed be extended continuously to the boundaries. Commonly there are points at which conformality is violated in that two boundary arcs of D meeting at angle α correspond to boundary arcs of D1 meeting at angle β ≠ α; in particular this can mean a folding together of the boundary, as suggested in Fig. 2.


As in Chap. 7, Sect. 5, one can adjoin the number ∞ to the complex plane to form the extended plane. The mapping w = F(z) is said to be conformal in a region containing z = ∞ if F(1/z) is conformal in a region containing z = 0. Similarly, one can discuss conformality in a neighborhood of a point z0 at which F(z0) = ∞, so that F(z) has a pole, in terms of the conformality of 1/F(z) near z0.

FIG. 3. Level curves of x and y: (a) z-plane, (b) w-plane.

Conformal Equivalence. Two regions D, D1 are said to be conformally equivalent if there is a one-to-one conformal mapping w = F(z) of D on D1 (so that the inverse function maps D1 conformally on D). Conformally equivalent regions must have the same connectivity; i.e., if D is simply connected, then so is D1; if D is doubly connected, so is D1. However, having the same connectivity does not guarantee conformal equivalence. If D is simply connected, then D is conformally equivalent to one and only one of the following three: (a) the interior of a circle; (b) the finite plane; (c) the extended plane. In particular, one has the following theorem.
THEOREM 2 (RIEMANN MAPPING THEOREM). Let D be a simply connected region of the finite z-plane, not the whole finite plane. Let z0 be a point of D, and let α be a given real number. Then there exists a one-to-one conformal mapping w = F(z) of D onto the circle |w| < 1 such that F(z0) = 0 and arg F'(z0) = α. Furthermore, F(z) is uniquely determined.
From this theorem it follows that the one-to-one conformal transformations of D onto |w| < 1 depend on three real parameters: x0 = Re(z0), y0 = Im(z0), and α. These parameters can be chosen in other ways. For example, three boundary points of D can be made to correspond to 3 points on |w| = 1 (in the same "cyclic order").


2. LINEAR FRACTIONAL TRANSFORMATIONS

Each function

(5)    w = (az + b)/(cz + d),    ad − bc ≠ 0,

where a, b, c, d are complex constants, defines a linear fractional transformation. Each such transformation is a one-to-one conformal mapping of the extended z-plane onto the extended w-plane. Special cases of eqs. (5) are the following:
Translations. The general form is

(6)    w = z + b.

Each point z is displaced through the vector b.
Rotation Stretchings. The general form is

(7)    w = A e^{iα} z,    A > 0, α real.

The value of w is obtained by rotating z about the origin through angle α and then increasing or decreasing the distance from the origin in the ratio A to 1.
Linear Integral Transformations. The general form is

(8)    w = az + b.

Each transformation (8) is equivalent to a rotation stretching followed by a translation.
Reciprocal Transformation.

(9)    w = 1/z.

FIG. 4. The transformation w = 1/z.

Here |w| = 1/|z| and arg w = −arg z. Hence w is obtained from z by "inversion" in the circle |z| = 1 followed by reflection in the x-axis (Fig. 4).
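The circle-to-circle property stated below can be observed numerically. The following sketch (NumPy assumed; the circle |z − 3| = 1 is an illustrative choice) maps a circle by w = 1/z and confirms that the image is again a circle:

import numpy as np

theta = np.linspace(0.0, 2 * np.pi, 2000)
z = 3 + np.exp(1j * theta)          # the circle |z - 3| = 1
w = 1 / z

# The image is symmetric about the real axis; fit its center and radius.
c = (w.real.max() + w.real.min()) / 2 + 0j
r = np.abs(w - c)
print(r.min(), r.max())             # nearly equal: the image is a circle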
Important Conformal Mappings. The general linear fractional transformation (5) can be composed of a succession of transformations of the special types:

(10)    w = a/c + ((bc − ad)/c) ζ,    ζ = 1/Z,    Z = cz + d.


If one includes straight lines as "circles through ∞," then each transformation (5) maps each circle onto a circle. By considering special regions bounded by circles and lines one obtains a variety of important conformal mappings, as illustrated in Table 1. The first three entries in the table depend on 3 real parameters and provide all conformal mappings of D on D1 in each case.

TABLE 1. IMPORTANT CONFORMAL MAPPINGS

F(z) = e^{iα}(z − z0)/(1 − z̄0 z), α real, |z0| < 1:   D: |z| < 1;   D1: |w| < 1.
F(z) = (az + b)/(cz + d), a, b, c, d real, ad − bc > 0:   D: Im(z) > 0;   D1: Im(w) > 0.
F(z) = e^{iα}(z − z0)/(z − z̄0), α real, Im(z0) > 0:   D: Im(z) > 0;   D1: |w| < 1.
F(z) = 1/z:   D: region between the circles |z − a| = a and |z − b| = b, 0 < a < b;   D1: strip 1/(2b) < Re(w) < 1/(2a).

3. MAPPING BY ELEMENTARY FUNCTIONS

The Function w = z². One can choose D as the half-plane Im(z) > 0. The corresponding region D1 consists of the w-plane minus the ray: u ≥ 0, v = 0. The points (x, 0) on the boundary of D correspond to the points (u, 0) on the boundary of D1, both (x, 0) and (−x, 0) corresponding to (u, 0), with u = x². It should be noted that F'(z) = 2z is 0 at z = 0, so that this point is critical; conformality fails here, and in fact the edges of D, forming a 180° angle at z = 0, are transformed onto overlapping edges of D1 which form a 360° angle.
For w = z² one can also choose D as a sector α < arg z < β, provided β − α < π; the region D1 is the sector 2α < arg w < 2β. A third choice of D is a hyperbolic region: xy > 1, x > 0; D1 is then a half-plane, v > 2. A fourth choice of D is a strip: a < x < b, where a > 0; D1 is then a region bounded by two parabolas: 4a²u + v² = 4a⁴, 4b²u + v² = 4b⁴.
The Function w = z^n. Analogous choices of regions can be made for w = z^n (n = 2, 3, 4, ···). The sector D: α < arg z < β, with β − α < 2π/n, corresponds to the sector D1: nα < arg w < nβ. If n is allowed to be fractional or irrational, w = z^n becomes a multiple-valued analytic function (Chap. 7, Sects. 6 and 7) and one must select analytic branches. For such a branch the mapping of sectors is similar to that when n is an integer.
The General Polynomial w = a0 z^n + ··· + a_{n−1} z + a_n. Suitable regions can be obtained by means of the level curves of u = Re(w) and v = Im(w). In particular the level curves of u and v which pass through the critical points of F(z) divide the z-plane into open regions, each of which is mapped in one-to-one fashion on a region of the w-plane. This is illustrated in Fig. 5 for w = z³ − 3z + 3. The critical points are at z = ±1, at which v = 0. The level curve v = 0 divides the z-plane into six regions, in each of which w = F(z) describes a one-to-one conformal mapping of the region onto a half-plane. Adjacent regions, such as I and IV, can be merged along their common boundary to yield a region mapped by w = F(z) on the w-plane minus a single line.

FIG. 5. Mapping by w = z³ − 3z + 3.

The Exponential Function w = e^z. This maps each infinite strip a < y < b conformally onto a sector a < arg w < b, provided b − a ≤ 2π; in particular each rectangle c < x < d, a < y < b in the strip corresponds


to the part of the sector lying between the circles |w| = e^c and |w| = e^d. Similarly, the inverse of the exponential function, w = log z, maps a sector on an infinite strip. When b − a = π/2, the sector is a quadrant; when b − a = π, the sector is a half-plane.
The Trigonometric Function w = sin z. This maps the infinite strip −π/2 < x < π/2 on the finite w-plane minus the portion |Re(w)| ≥ 1 of the real axis.
The Rational Function w = z + (1/z) = (z² + 1)/z. This maps the exterior of the circle |z| = 1 on the w-plane minus a slit from −2 to +2. The same function maps the upper half-plane Im(z) > 0 on the w-plane minus the portion |Re(w)| ≥ 2 of the real axis.
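The slit mapping is easily seen numerically. The following sketch (NumPy assumed) maps the unit circle by w = z + 1/z onto the segment from −2 to +2, and maps a larger circle onto an ellipse enclosing the slit:

import numpy as np

theta = np.linspace(0.0, 2 * np.pi, 1000)
z = np.exp(1j * theta)              # |z| = 1
w = z + 1 / z
print(np.max(np.abs(w.imag)))       # ~ 0: the image lies on the real axis
print(w.real.min(), w.real.max())   # -2 ... 2: the slit

z2 = 1.5 * np.exp(1j * theta)       # a circle in the exterior, |z| = 1.5
w2 = z2 + 1 / z2                    # image: an ellipse
print(w2.real.max(), w2.imag.max()) # semi-axes 1.5 + 1/1.5 and 1.5 - 1/1.5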
Let the real constants h1, ···, h_{n+1}, x1, ···, xn satisfy the conditions

(12)    x1 < x2 < ··· < xn,    h_m > 0 for some m, 1 ≤ m ≤ n + 1.

Then

(13)    f(z) = h1 log (z − x1) − h_{n+1} log (z − xn) + Σ_{k=2}^{n} h_k log [(z − x_k)/(z − x_{k−1})]

maps the half-plane Im(z) > 0 one-to-one conformally on a region D1 consisting of a strip between two lines v = const. minus several rays of form v = const. If the strip D1 has width ≤ 2π, the function F(z) = exp [f(z)] maps the upper half-plane conformally and one-to-one on a sector minus certain rays and segments on which arg w = const. (See Chap. 7, Ref. 7, pp. 605-606.)
4. SCHWARZ-CHRISTOFFEL MAPPINGS

These are defined by the equation

(14)    w = f(z) = A ∫_{x0}^{z} (z − x1)^{−k1} ··· (z − xn)^{−kn} dz + B,

where A, B are complex constants, x0, x1, ···, xn, k1, ···, kn are real constants, and −1 ≤ kj ≤ 1. The function f(z) is analytic for Im(z) > 0, with the powers of (z − xj) interpreted as principal values. Every one-to-one conformal mapping of the half-plane D onto the interior of a polygon can be represented in the form (14); this applies more generally to every one-to-one conformal mapping of the half-plane onto a simply connected region whose boundary consists of a finite number of lines, line segments, and rays.
Polygon. When the function maps D onto a polygon, the points x1, ···, xn (and possibly ∞) on the x-axis correspond to vertices of the polygon,


and the corresponding exterior angles are k1π, ···, knπ. If there is an (n + 1)st vertex, corresponding to z = ∞, then necessarily k1 + ··· + kn ≠ 2; in general, 1 < k1 + ··· + kn < 3.
Convex Polygon. When the function (14) maps D onto a convex polygon, all exterior angles are between 0 and π and the sum of the exterior angles is 2π; accordingly,

(15)    0 < kj < 1    and    k1 + ··· + kn ≤ 2.

When k1 + ··· + kn < 2, there is an (n + 1)st vertex corresponding to z = ∞. In general, for every choice of the numbers k1, ···, kn such that (15) holds, eq. (14) describes a one-to-one conformal mapping of the half-plane Im(z) > 0 onto the interior of a convex polygon.
Rectangle. For the special case

(16)    w = f(z) = ∫_0^z [(1 − z²)(1 − k²z²)]^{−1/2} dz,    0 < k < 1,

the mapping is onto a rectangle with vertices ±K, ±K + iK', where

(17)    K = ∫_0^1 [(1 − t²)(1 − k²t²)]^{−1/2} dt,    K' = ∫_0^1 [(1 − t²)(1 − k'²t²)]^{−1/2} dt,    k' = √(1 − k²).

In this case F(z) is an elliptic integral of the first kind (Chap. 7, Sect. 8), and its inverse is the elliptic function z = sn w.
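The constants K and K' are the complete elliptic integrals of the first kind, and they can be evaluated numerically. A brief sketch (SciPy assumed; k = 0.5 is an illustrative modulus) checks the library value of K against the integral in (17):

import numpy as np
from scipy.special import ellipk
from scipy.integrate import quad

k = 0.5
K = ellipk(k**2)                    # SciPy's ellipk takes the parameter m = k^2
Kp = ellipk(1 - k**2)               # K' uses the complementary modulus

val, _ = quad(lambda t: 1 / np.sqrt((1 - t**2) * (1 - k**2 * t**2)), 0, 1)
print(K, val, Kp)                   # K and the integral agree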
A great variety of conformal mappings have been studied and classified.
See Ref. 1 for an extensive survey.
5. APPLICATION OF CONFORMAL MAPPING TO BOUNDARY VALUE
PROBLEMS

The applications depend primarily on the following formal rule. If V(x, y) is given in a region D and w = f(z) is a one-to-one conformal mapping of D on a region D1, then

(18)    ∂²V/∂x² + ∂²V/∂y² = |f'(z)|² (∂²V/∂u² + ∂²V/∂v²).

In particular, V is harmonic in terms of x and y:

(19)    ∂²V/∂x² + ∂²V/∂y² = 0,

if and only if V is harmonic when expressed in terms of u and v.


The boundary value problems considered require determination of U in D when U is required to satisfy some conditions on the boundary of D and to satisfy an equation

(20)    ∂²U/∂x² + ∂²U/∂y² = h(x, y)

for given h(x, y), in D. It follows from eq. (18) that a conformal mapping w = f(z) amounts to a change of variable reducing the problem to one of similar form in the region D1. It is in general simpler to solve the problem for a special region such as a circle or a half-plane. Hence one tries to find a conformal mapping of D onto such a special region D1. Once the problem has been solved for U in D1, U can be expressed in terms of (x, y) in D and the problem has been solved for D.
For most cases D has a boundary B consisting of a finite number of smooth closed curves C1, ···, Cn, the case n = 1 being the most common. The most important boundary value problems are then the following.
I. Dirichlet Problem. The values of U on B are given; U is required to be harmonic in D and to approach these values as limits as z approaches the boundary.
II. Neumann Problem. Again U is harmonic in D, but on B the values of ∂U/∂n are given, where n is an exterior normal vector on B.
Both problems can be generalized by requiring that U satisfy a Poisson eq. (20) in D. In general this case can be reduced to the previous one by introducing a new variable W, where

(21)    W = U − U0,    U0 a particular solution of eq. (20),

so that W is harmonic. Furthermore, the Neumann problem can be reduced to the Dirichlet problem by consideration of the harmonic function V(x, y) conjugate to U (Chap. 7, Sect. 2).
To solve the Dirichlet problem for a simply connected region D, one seeks a one-to-one conformal mapping of D on the circular region |w| < 1. This reduces the problem to a Dirichlet problem for the circular region. If p(u, v) are the new boundary values, its solution is given by

(22)    U = (1/2π) ∫_0^{2π} p(cos φ, sin φ) [(1 − r²)/(1 + r² − 2r cos (φ − θ))] dφ,

where r, θ are polar coordinates in the uv-plane.
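The Poisson integral (22) is directly computable. The following sketch (NumPy assumed; boundary values p = cos φ are an illustrative choice whose harmonic extension is r cos θ) evaluates the integral by quadrature:

import numpy as np

def dirichlet_disk(p, r, theta, n=2000):
    # (1/2pi) * integral of p(phi) times the Poisson kernel, by the mean
    phi = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    kernel = (1 - r**2) / (1 + r**2 - 2 * r * np.cos(phi - theta))
    return np.mean(p(phi) * kernel)

r, theta = 0.5, 0.7
U = dirichlet_disk(np.cos, r, theta)
print(U, r * np.cos(theta))          # agree to quadrature accuracy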


If D is multiply connected, it is also possible to map D conformally on
a standard type of domain, for which solution of the Dirichlet problem is
known. For details, see Ref. 2.

REFERENCES
1. H. Kober, Dictionary of Conformal Representations, Dover, New York, 1952.
2. Z. Nehari, Conformal Mapping, McGraw-Hill, New York, 1952.
See also the list at the end of Chap. 7.

A. GENERAL MATHEMATICS

Chapter 11

Boolean Algebra

A. H. Copeland, Sr.

1. Table of Notations   11-01
2. Definitions of Boolean Algebra   11-01
3. Boolean Algebra and Logic   11-05
4. Canonical Form of Boolean Functions   11-08
5. Stone Representation   11-09
6. Sheffer Stroke Operation   11-10
References   11-11

1. TABLE OF NOTATIONS
Table 1 lists notations in current use. There are some inconsistencies
between different systems, and care is needed to ensure proper understanding. The list is not exhaustive and there are other notations even for the
crucial relations; for example, "a and b" is sometimes denoted by "ab."
The grouping under mathematics, engineering, and logic is somewhat arbitrary.
2. DEFINITIONS OF BOOLEAN ALGEBRA

First Definition. A study of the rules governing the operations on sets (Chap. 1) leads to a type of algebraic system in which the basic operations are ∪ and ∩, frequently called "or" and "and," corresponding to union and intersection of sets. In addition, the system can be partially ordered (the relation of set inclusion) and each object of the system has a complement.

TABLE 1. TABLE OF SYMBOLS, BOOLEAN ALGEBRA

Operation                          Mathematics (Set Theory)   Engineering   Logic
Union ("or")                              ∪                       +            ∨
Intersection ("and")                      ∩                       ·            ∧
Symmetric difference (exclusive "or")     ⊕                       ⊕            (none)
Complement (negation)                     ′                       ′            ~
Order (material implication)              ⊂ or ⊆                  ≤            ⊃ or ⇒
Sheffer stroke                            |                       |            |
Existential quantifier                    ∪ or Σ                  Σ            ∃ or Σ
Universal quantifier                      ∩ or Π                  Π            ∀ or Π
A Boolean algebra B is a set of elements x, y, z, ··· with two binary operations ∪ and ∩, an order relation ≤, and an operation ′ of forming the complement, such that:

(1)    x ∪ x = x,    x ∩ x = x,
(2)    x ∪ y = y ∪ x,    x ∩ y = y ∩ x,
(3)    x ∪ (y ∪ z) = (x ∪ y) ∪ z,    x ∩ (y ∩ z) = (x ∩ y) ∩ z,
(4)    x ∩ (y ∪ z) = (x ∩ y) ∪ (x ∩ z),    x ∪ (y ∩ z) = (x ∪ y) ∩ (x ∪ z),
(5)    x ≤ x,
(6)    x ≤ y and y ≤ z imply x ≤ z,
(7)    x ≤ y and y ≤ x imply x = y,
(8)    B contains two elements 0 and 1 such that 0 ≤ x ≤ 1 for all x in B,
(9)    0 ∩ x = 0,
(10)   0 ∪ x = x,
(11)   1 ∩ x = x,    1 ∪ x = 1,
(12)   x ∩ x′ = 0,    x ∪ x′ = 1,    (x ∩ y)′ = x′ ∪ y′,    (x ∪ y)′ = x′ ∩ y′,
(13)   (x′)′ = x.


The properties (1) to (13) can be regarded as a set of postulates, from
which all other properties are to be deduced. Some of the postulates are
consequences of others, so that the list could be considerably reduced (Refs.
1,3,5).
The definition given here is easily verified to be equivalent to that given
in Chap. 1, Sect. 7, in terms of lattices (Ref. 3).
Second Definition. An alternative definition is based upon the set operation of symmetric difference, also known as "exclusive or." The symmetric difference of two sets X, Y, denoted by X ⊕ Y, is the set of all elements in X, or in Y, but not in both. In symbols,

(14)    X ⊕ Y = {s | s ∈ X ∪ Y and s ∉ X ∩ Y}.

This is pictured in Fig. 1.
From the definition, a number of properties can be verified. For example, X ⊕ X = 0 (here the empty set plays the role of the 0 of a Boolean algebra), (X ⊕ Y) ⊕ Z = X ⊕ (Y ⊕ Z). The proof of the second rule is suggested in Fig. 2.

FIG. 1. Symmetric difference.

FIG. 2. Three-term symmetric difference.
In an arbitrary Boolean algebra one can define x ⊕ y in terms of the other operations:

(15)    x ⊕ y = (x ∩ y′) ∪ (x′ ∩ y).

From (1), ···, (13) and (15) a number of rules can then be deduced by algebraic means alone.
It is possible to consider ⊕ and ∩ as the basic operations and express ∪, ′, and ≤ in terms of these two:

(16)    x ∪ y = (x ⊕ y) ⊕ (x ∩ y),
(17)    x′ = 1 ⊕ x,
(18)    x ≤ y if x ∩ y = x.

Pursuing this point of view further, one is led to a second definition of a Boolean algebra.


Alternative Definition. A Boolean algebra B is a set of elements x, y, z, ··· with two binary operations ⊕, ∩ satisfying the laws:

(19)    x ⊕ y = y ⊕ x,    x ∩ y = y ∩ x,
(20)    (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z),    (x ∩ y) ∩ z = x ∩ (y ∩ z),
(21)    x ∩ (y ⊕ z) = (x ∩ y) ⊕ (x ∩ z),
(22)    x ∩ x = x,
(23)    B contains two elements 0 and 1 such that for all x in B, x ⊕ 0 = x, x ⊕ x = 0, and x ∩ 1 = x.

If the rules (19) to (23) are regarded as postulates and the relations (16), (17), (18) as definitions of ∪, ′, and ≤, then one can prove all the laws (1) to (13). Conversely, from (1) to (13) and the definition (15), one can prove (19) to (23). Hence the two definitions of a Boolean algebra are equivalent.
Relation to Set Theory. Although Boolean algebras arise naturally
in set theory, that is not the only source of such systems. They arise in
logic and in other mathematical contexts. It is natural to ask whether
every Boolean algebra can be interpreted as an algebra of all subsets of a
given set. This is not true as stated, but there is a close relationship between
each Boolean algebra and an algebra of sets (Sect. 5).
EXAMPLE 1. A very simple but nevertheless useful Boolean algebra is one in which B contains only 0 and 1. The properties are given in Tables 2 and 3. This Boolean algebra is used in switching circuits: x = 1 means that a certain switch is closed and x = 0 means that the switch is open. Two switches in parallel correspond to x ∪ y; two switches in series correspond to x ∩ y.

TABLE 2. x ∪ y          TABLE 3. x ∩ y

x\y   0   1              x\y   0   1
 0    0   1               0    0   0
 1    1   1               1    0   1
EXAMPLE 2. A somewhat more general Boolean algebra is used in the design of electronic digital computers. This can be described as follows. The elements of B are all ordered n-tuples x = (x1, x2, ···, xn), where each xk is 0 or 1; if y = (y1, ···, yn), then x ∪ y, x ∩ y are defined as follows:

x ∪ y = (x1 ∪ y1, x2 ∪ y2, ···, xn ∪ yn),
x ∩ y = (x1 ∩ y1, x2 ∩ y2, ···, xn ∩ yn),

where xk ∪ yk, xk ∩ yk are evaluated as in Tables 2 and 3. The 0 and 1 of B are defined as follows:

0 = (0, 0, ···, 0),    1 = (1, 1, ···, 1).

Electronic devices can be constructed to perform the Boolean operations on the n-tuples x. The operations of ordinary arithmetic can be defined in terms of the Boolean operations together with the operation of shifting the decimal point.
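In present-day terms the n-tuples of Example 2 are machine words, and the componentwise operations are the bitwise operators. A brief sketch (plain Python; n = 8 and the word values are illustrative):

n = 8
ONE, ZERO = (1 << n) - 1, 0          # the elements 1 and 0 of B

x, y = 0b10110010, 0b01110001
print(bin(x | y))                    # x U y, componentwise "or"
print(bin(x & y))                    # x n y, componentwise "and"
print(bin(ONE ^ x))                  # x', the complement
print(bin(x ^ y))                    # x (+) y, the symmetric difference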
3. BOOLEAN ALGEBRA AND LOGIC

Algebra of Sentences. Let x, y, ··· stand for declaratory sentences. For example, x might stand for "greed is evil" and y for "lead is heavy." From two sentences x, y one can form the new sentence "x and y"; this is denoted by x ∩ y. In the example given, x ∩ y is the sentence "greed is evil and lead is heavy." From x and y one can also form the sentence "x or y"; this is understood to mean: x or y, but not both; the new sentence is denoted by x ⊕ y. One can also form the statement "x and/or y," meaning: x or y or both; this is denoted by x ∪ y. Finally, one can form the negation of a sentence x: "lead is heavy" when negated becomes "lead is not heavy." The negation of x is denoted by x′.
One can now verify that in the normal logical procedures for manipulating sentences, the operations ∪, ∩, ⊕, ′ obey all the rules of a Boolean algebra. Two sentences are regarded as equal if they are logically equivalent. In this sense, all false sentences can be considered equal and identified with the 0 of the Boolean algebra; a universal truth ("tautology") can serve as the 1. In logic a table showing the Boolean algebra relationship of variables is called a truth table. The order relation x ≤ y can be interpreted to mean: "x implies y." For example, if x is the sentence "t is an even integer" and y is the sentence "2t is an even integer," then x ≤ y, but y ≤ x is false, so that x < y. The implication defined here is essentially strict implication (see below) (Refs. 4, 5).
Propositional Functions. The sentence "t is an even integer" contains a variable, t. Accordingly, the sentence can be regarded as a function of t. For each value of t, the function becomes a definite sentence or proposition. Hence the function is termed a propositional function. It can be denoted by f, with f(t) denoting the value for each t. For example, f(4) is the true sentence "4 is an even integer," while f(3) is the false sentence "3 is an even integer." The t's for which f(t) is true form a set. Similarly, the sentence "t is a human being" is a propositional function which in turn determines a set; namely, the set of all human beings. If f, g, ··· are propositional functions, then one can form new propositional functions f ∪ g, f ∩ g, f ⊕ g, f′, ··· as above. If X_f, X_g, ··· are the sets corresponding to these propositional functions, then the operations on the functions correspond precisely to the operations ∪, ∩, ⊕, ′ on sets. For example, f ∩ g is true when f and g are true; therefore an object belongs to X_{f∩g} when it belongs to X_f and to X_g, that is, to X_f ∩ X_g. Thus the calculus of propositional functions can be interpreted as a Boolean algebra of sets. The zero element represents a propositional function which is false for all values of the variable; the set 1 corresponds to a function which is true for all values.
Conversely, each set X gives rise to a propositional function: t is an element of X. This function is true precisely when t belongs to X. A Boolean algebra of sets thus leads to a Boolean algebra of propositional functions. Because of the parallel between propositional functions and sets, one can employ geometric set diagrams, as in Figs. 1 and 2, to reason about propositional functions. In logic they are called "Venn diagrams."
Quantifiers. The operation of forming the intersection of many sets has an analogue for sentences or propositional functions. As for sets (Chap. 1, Sect. 1),

∩_{t=1}^{n} x_t    denotes    x_1 ∩ x_2 ∩ ··· ∩ x_n.

When the x's are sentences, this is the new sentence: "every one of the x's" or "for every t, x_t." The range of t may be over an infinite set. When the range is understood, one writes simply ∩_t x_t. Similarly, if f(t) is a propositional function and t ranges over all values for which f(t) has meaning, then ∩_t f(t) is read: "for every t, f(t)." Alternative notations for ∩_t f(t) are ∀_t f(t) and Π_t f(t). One terms ∩ a quantifier. There is an analogous interpretation of ∪_t x_t and ∪_t f(t); the first is read "for some t, x_t" and the second "there exists a t such that f(t)." An alternative notation for ∪ is ∃; ∪ is also called a quantifier.

Implication. The statement "x implies y" is capable of various interpretations, of which three will be discussed here: material implication, conditional implication, and strict implication. Throughout, x, y, ··· denote sentences forming a Boolean algebra B.
Material implication. From the sentences x, y one forms the new sentence "x implies y" as the sentence x′ ∪ y. This is called material implication. One often writes x ⊃ y or x ⇒ y for this implication:

(24)    (x ⊃ y) = x′ ∪ y = (x ⇒ y).

If x and y are propositional functions x(t), y(t), then they can be represented as sets X, Y. The sentence is then a propositional function which is true for all t if X′ ∪ Y = 1; that is, if X ⊂ Y. The notation x ⊃ y is therefore unfortunate. Material implication is the basis for most mathematical arguments, but it is criticized as permitting such statements as "if Iceland is an island, then fish can swim" to be judged true.
Conditional implication. For each pair of sentences x, y a new sentence y/x is formed and is read "if x then y" or "y if x." It will be assumed that x ≠ 0. This is called conditional implication. The significance of the new sentence is indicated by certain postulates:

(25)    x/x = 1,
(26)    y/x = 0 implies y ∩ x = 0,
(27)    (y ∩ z)/x = (y/x) ∩ (z/x),
(28)    z/(x ∩ y) = (z/x)/(y/x),
(29)    (1 ⊕ y)/x = 1 ⊕ (y/x),
(30)    for every x, y there is a z such that z/x = y, if x ≠ 0.

Conditional implication is designed to fit the needs of the theory of probability. When x is false, it may happen that y/x is true or that y/x is neither true nor false.
One can verify that postulates (25) to (28) are satisfied by material implication, but that (29), (30) are not. However, (29) is a reasonable demand to make on an implication, and it is valuable in the theory of probability. Postulate (30) requires that B contain sufficiently many sentences so that one can always solve the equation z/x = y for the sentence z. A Boolean algebra which has an operation y/x satisfying postulates (25) to (30) is called an implicative Boolean algebra. It can be shown that an implicative Boolean algebra cannot be atomic (Sect. 5) but that one can always construct an implicative Boolean algebra containing any given Boolean algebra.
Strict implication is defined as follows. The strict implication "x implies y" holds if and only if the material implication is a tautology (i.e., x ⊃ y = 1), and this is true if and only if y/x = 1. When x and y are interpreted as sets, the equation x ⊃ y = 1 can be interpreted as stating that x is contained in y. This relation has the following alternative notations: x ≤ y, y ≥ x, x ⊆ y, y ⊇ x, x ⊂ y, y ⊃ x. The last two notations are unfortunate since they almost reverse the interpretation of the implication symbol.

4. CANONICAL FORM OF BOOLEAN FUNCTIONS

Let a Boolean algebra B be given, with operations ∪, ∩, and ′ as in the first definition of Sect. 2. By a Boolean function or Boolean polynomial in n variables x1, ···, xn is meant an expression constructed from the n variable elements x1, ···, xn by the three operations ∪, ∩, ′. For example,

(x ∪ y) ∩ (x′ ∪ z′)

is a Boolean polynomial in three variables. It would appear at first that such expressions can be made arbitrarily long and hence that, for fixed n, there are infinitely many polynomials. However, by the rules of the algebra, each polynomial can be simplified, and there are precisely 2^{2^n} polynomials for each n. For example, there are four polynomials in one variable x: x, x′, 0 = x ∩ x′, 1 = x ∪ x′.
If two Boolean polynomials in x1, ···, xn are given, one may wish to determine whether they are the same; that is, whether one can be reduced to the other by applying the algebraic rules. In order to decide this, one reduces both polynomials to a canonical form, as described below. If both have the same canonical form, they are the same; otherwise, they are unequal polynomials.
Definitions of Canonical Form and Minimal Polynomials. By a minimal polynomial in x1, ···, xn is meant an intersection of n letters in which the ith letter is either xi or x′i.
EXAMPLES. There are four minimal polynomials in x, y:

x ∩ y,    x′ ∩ y,    x ∩ y′,    x′ ∩ y′.

Similarly, there are eight minimal polynomials in x, y, z:

x ∩ y ∩ z,    x ∩ y ∩ z′,    x ∩ y′ ∩ z,    x ∩ y′ ∩ z′,
x′ ∩ y ∩ z,    x′ ∩ y ∩ z′,    x′ ∩ y′ ∩ z,    x′ ∩ y′ ∩ z′.

There are 2^n such minimal polynomials in x1, ···, xn.
By a polynomial in canonical form is meant a polynomial which is either 0 or else is a union of distinct minimal polynomials. (The order of the terms can be specified, but this is of no importance since ∪ is commutative.) For example,

(x ∩ y) ∪ (x′ ∩ y′),    (x ∩ y) ∪ (x ∩ y′) ∪ (x′ ∩ y′)

are in canonical form. Every polynomial can be written in a unique canonical form, so that equality of two polynomials holds if and only if they have the same canonical form (Ref. 3).
Reduction to Canonical Form. A given polynomial can be reduced to canonical form by the following steps:
(i) Moving all primes inside parentheses by (12);
(ii) Moving all caps (∩'s) to the inside of parentheses by the first rule (4);
(iii) Simplification of terms by rules (1), (2), (9), (10), (11), (13), so that one finally obtains a union of terms, each of which is a minimal polynomial in some of the x's;
(iv) Adjoining missing x's to the minimal polynomials by inserting x ∪ x′ = 1 for each such x;
(v) Applying steps (ii) and (iii) again.
EXAMPLE.

[x ∩ (y ∪ z)] ∪ [(x ∪ y) ∩ (y′ ∪ z)′]
  = [(x ∩ y) ∪ (x ∩ z)] ∪ [(x ∪ y) ∩ (y ∩ z′)]
  = (x ∩ y) ∪ (x ∩ z) ∪ (x ∩ y ∩ z′) ∪ (y ∩ y ∩ z′)
  = [(x ∩ y) ∩ (z′ ∪ z)] ∪ [(x ∩ z) ∩ (y′ ∪ y)] ∪ (x ∩ y ∩ z′) ∪ [(y ∩ z′) ∩ (x′ ∪ x)]
  = (x ∩ y ∩ z′) ∪ (x ∩ y ∩ z) ∪ (x ∩ y′ ∩ z) ∪ (x ∩ y ∩ z) ∪ (x ∩ y ∩ z′) ∪ (x′ ∩ y ∩ z′) ∪ (x ∩ y ∩ z′)
  = (x ∩ y ∩ z′) ∪ (x ∩ y ∩ z) ∪ (x ∩ y′ ∩ z) ∪ (x′ ∩ y ∩ z′).
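The canonical form is exactly the union of the minimal polynomials on which the function takes the value 1, so it can be read off from a truth table. A brief sketch (plain Python; the helper canonical is an illustrative construction, not from the text) reproduces the example above:

from itertools import product

def canonical(f, names):
    # union of the minimal polynomials on which f is true
    terms = []
    for values in product((1, 0), repeat=len(names)):
        if f(*values):
            term = " n ".join(v if b else v + "'"
                              for v, b in zip(names, values))
            terms.append("(" + term + ")")
    return " U ".join(terms) if terms else "0"

# [x n (y U z)] U [(x U y) n (y' U z)'], with (y' U z)' = y n z'
print(canonical(lambda x, y, z: (x and (y or z)) or
                                ((x or y) and (y and not z)),
                ["x", "y", "z"]))
# (x n y n z) U (x n y n z') U (x n y' n z) U (x' n y n z')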

5. STONE REPRESENTATION

Let a Boolean algebra B be given. Then it is possible to find a set S
and to define a one-to-one correspondence between the elements x, y, ...
of B and the certain subsets X, Y, ... of S in such a fashion that if x
corresponds to X and y to Y, then x U y corresponds to X U Y, x n y to
X
Y, x' to X', 0 to the empty subset 0, and 1 to S itself. Thus every
Boolean algebra can be represented as (is isomorphic to) the Boolean algebra of certain subsets of a set S. This is the Stone representation.
If B has only a finite number m of elements, then B can always be represented as the Boolean algebra of all subsets of a given set S. Furthermore, m must be of the form 2^n, where n is the number of elements in S. If B1 and B2 are Boolean algebras both having m elements, where m is finite, then B1 and B2 are isomorphic.

STONE REPRESENTATION THEOREM. An infinite Boolean algebra B can be represented as the Boolean algebra of all subsets of a set S if and only if B is atomic, complete, and distributive. These properties are defined as follows:
An element a of B is called an atom if the intersection x ∩ a of a with an arbitrary element x of B is either a or 0. If, for each x other than 0 in B, there is an atom a such that x ∩ a = a, then B is said to be atomic. In the representation of B as a class of sets, the atoms correspond to sets each containing one point.
A Boolean algebra B is said to be complete if every subset A of B has a least upper bound (Chap. 1, Sect. 7). The least upper bound is then unique; it can be denoted by ∪(A) and is also called the union of A.
A Boolean algebra B is said to be distributive if, whenever ∪(A) exists,

(31)    β ∩ ∪(A) = ∪_{α ∈ A} (β ∩ α)

for every β in B.
6. SHEFFER STROKE OPERATION

In a Boolean algebra B let

(32)    x | y = x′ ∪ y′.

If x and y are sentences, x | y is the sentence "either not x or not y." One can then prove that

(33)    x | x = x′ = 1 ⊕ x,
(34)    (x | y) | (x | y) = x ∩ y,
(35)    (x | x) | (y | y) = x ∪ y,
(36)    x | (y | y) = x′ ∪ y,
(37)    x | (x | x) = 1.

Accordingly, all the operations of the Boolean algebra can be expressed in terms of the Sheffer stroke operation. This proves to be of value in the design of electronic digital computing machines, which compute in the scale of two (see Ref. 6).
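Identities (33) to (35) can be verified by exhaustion over {0, 1}. A brief sketch (plain Python) rebuilds "not," "and," and "or" from the stroke alone:

def stroke(x, y):                    # the Sheffer stroke on {0, 1}
    return 1 - (x & y)

def not_(x):    return stroke(x, x)                      # eq. (33)
def and_(x, y): return stroke(stroke(x, y), stroke(x, y))  # eq. (34)
def or_(x, y):  return stroke(stroke(x, x), stroke(y, y))  # eq. (35)

for x in (0, 1):
    for y in (0, 1):
        assert and_(x, y) == (x & y) and or_(x, y) == (x | y)
assert not_(0) == 1 and not_(1) == 0
print("eqs. (33)-(35) verified on {0, 1}")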


REFERENCES
1. Digital Computers and Data Processing, J. W. Carr and N. R. Scott, Editors, University of Michigan, Ann Arbor, 1955, especially Article III.4.1.
2. High-Speed Computing Devices, Engineering Research Associates, McGraw-Hill, New York, 1950.
3. G. Birkhoff, Lattice Theory, American Mathematical Society, New York, 1940.
4. I. M. Copi, Symbolic Logic, Macmillan, New York, 1951.
5. P. C. Rosenbloom, The Elements of Mathematical Logic, Dover, New York, 1950.
6. M. Phister, Jr., Logical Design of Digital Computers, Wiley, New York, 1958.

A. GENERAL MATHEMATICS

Chapter 12

Probability

A. H. Copeland, Sr.

1. Fundamental Concepts and Related Probabilities   12-01
2. Random Variables and Distribution Functions   12-04
3. Expected Value   12-06
4. Variance   12-11
5. Central Limit Theorem   12-13
6. Random Processes   12-18
References   12-20

1. FUNDAMENTAL CONCEPTS AND RELATED PROBABILITIES

Postulates. The probability that an event will occur is a real number between 0 and 1. If x denotes the sentence "the event will occur," then Pr(x) denotes the probability that the event will occur. Thus Pr(x) is the probability associated with the sentence x. Consider a Boolean algebra B of sentences (see Chap. 11) in which 0 is interpreted as the sentence associated with an impossible event and 1 is interpreted as the sentence associated with a certain event. This treatment will (a) show how some probabilities can be computed from others; (b) study the relations between probabilities of sentences connected by the words and, or, not, if (denoted respectively by ∩, ∪, ′, /).


Assume that the following postulates hold:
(1) Pr(x) is a non-negative real number if x is in B.
(2) If x1, x2, ··· are in B and xi ∩ xj = 0 when i ≠ j (i, j = 1, 2, ···), then ∪_{k=1}^{∞} xk is in B and

Pr(∪_{k=1}^{∞} xk) = Σ_{k=1}^{∞} Pr(xk).

(3) Pr(1) = 1.
(4) Pr(x ∩ y) = Pr(x) Pr(y/x).
If xi ∩ xj = 0, i.e., if xi, xj cannot both occur, then xi, xj are said to be mutually exclusive and the events associated with them are also said to be mutually exclusive. Thus postulate (2) states that the probability that at least one of a set of mutually exclusive events will occur is the sum of their probabilities. The following theorems are consequences of the above postulates.
THEOREM 1. 0 ≤ Pr(x) ≤ 1.
THEOREM 2. Pr(0) = 0.
THEOREM 3. Pr(∪_{k=1}^{n} xk) = Σ_{k=1}^{n} Pr(xk) if xi ∩ xj = 0 whenever i ≠ j.
THEOREM 4. Pr(x ∪ y) = Pr(x) + Pr(y) − Pr(x ∩ y).
THEOREM 5. Pr(x′) = 1 − Pr(x).
THEOREM 6. Pr(x ∩ y′) = Pr(x) − Pr(x ∩ y).
THEOREM 7. If x1, x2, ···, xn are mutually exclusive (i.e., xi ∩ xj = 0 when i ≠ j) and exhaustive (i.e., x1 ∪ x2 ∪ ··· ∪ xn = 1) and equally likely (i.e., all Pr(xk) are equal), then Pr(xk) = 1/n for k = 1, 2, ···, n.
EXAMPLE 1. As an illustration of Theorem 7 consider a coin which is about to be tossed, and let x1 be the sentence "the coin will turn up heads" and x2 be the sentence "the coin will turn up tails." If the coin is not loaded, one says that it is honest and assumes that the hypotheses of Theorem 7 hold. Then Pr(x1) = Pr(x2) = 1/2.
EXAMPLE 2. Next consider an honest die which is about to be thrown and let xk be the sentence "the face numbered k will turn up," where k = 1, 2, ···, 6. Again one assumes that the hypotheses of Theorem 7 hold and concludes that Pr(xk) = 1/6 for k = 1, 2, ···, 6. The probability that the die will turn up an odd number is given by Theorem 3. Thus

Pr(x1 ∪ x3 ∪ x5) = Pr(x1) + Pr(x3) + Pr(x5) = 1/2.

EXAMPLE 3. Next let x = x1 ∪ x3 ∪ x5, y = x1 ∪ x2 ∪ x3, and note that Pr(x) is the probability that the die will turn up an odd number and Pr(y) is the probability that it will turn up a number less than 4. It will be instructive for the reader to check that

x ∩ y = x1 ∪ x3,    x ∪ y = x1 ∪ x2 ∪ x3 ∪ x5,

and also to check Theorems 4, 5, and 6 for this x and y. To compute the conditional probability Pr(y/x), i.e., the probability that the die will turn up a number less than 4 if it turns up an odd number, use postulate (4). Thus

Pr(x ∩ y) = 1/3 = Pr(x)Pr(y/x) = (1/2)Pr(y/x),

and hence Pr(y/x) = 2/3.
EXAMPLE 4. Next consider three boxes and let xk denote the sentence "the kth box will be selected," where k = 1, 2, 3. If one of the boxes is selected at random, this is interpreted to mean that the hypotheses of Theorem 7 hold and hence that

Pr(x1) = Pr(x2) = Pr(x3) = 1/3.

Suppose further that the first box contains two silver coins, the second contains one silver coin and one gold coin, the third contains two gold coins, and that a coin is drawn at random from the box which has been selected. Let y denote the sentence "a gold coin will be drawn from the box which has been selected." Then

Pr(y) = Pr(x1)Pr(y/x1) + Pr(x2)Pr(y/x2) + Pr(x3)Pr(y/x3) = (1/3)(0) + (1/3)(1/2) + (1/3)(1) = 1/2.

Now suppose that this experiment has been performed and that the coin has been examined and found to be gold. On the basis of this information, what is the probability that the coin came from the third box containing the two gold coins? One interprets the answer to this question as the conditional probability Pr(x3/y), i.e., the probability that the third box was drawn if the coin was observed to be gold. It will be instructive for the reader to verify that Pr(x3/y) = 2/3 with the aid of the following theorem, which is called Bayes's theorem and which is a consequence of the above postulates.
THEOREM 8. BAYES'S THEOREM. If x1, x2, ···, xn are mutually exclusive, exhaustive, and distinct from 0, then for any y one has

Pr(xi/y) = Pr(xi)Pr(y/xi) / Σ_{k=1}^{n} Pr(xk)Pr(y/xk)

if the denominator is not 0.
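Example 4 can be checked both by the formula and by simulation. A brief sketch (plain Python; the trial count is an illustrative choice):

import random

prior = [1/3, 1/3, 1/3]              # Pr(x1), Pr(x2), Pr(x3)
like  = [0.0, 0.5, 1.0]              # Pr(y/xk): chance of gold from box k

posterior = prior[2] * like[2] / sum(p * q for p, q in zip(prior, like))
print(posterior)                     # 2/3, by Theorem 8

hits = gold = 0
for _ in range(100_000):
    box = random.randrange(3)
    if random.random() < like[box]:  # a gold coin was drawn
        gold += 1
        hits += (box == 2)
print(hits / gold)                   # ~ 2/3, by simulation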

Independence. The sentences x1, x2, ···, xn are said to be independent if

Pr(x1 ∩ x2 ∩ ··· ∩ xn) = Pr(x1)Pr(x2) ··· Pr(xn)

and if a similar equation holds for every subset of x1, x2, ···, xn. Thus when n = 3 one has

Pr(x1 ∩ x2 ∩ x3) = Pr(x1)Pr(x2)Pr(x3),
Pr(x1 ∩ x2) = Pr(x1)Pr(x2),
Pr(x2 ∩ x3) = Pr(x2)Pr(x3),
Pr(x1 ∩ x3) = Pr(x1)Pr(x3).

If x1, x2 are independent and Pr(x1) ≠ 0, Pr(x2) ≠ 0, then

Pr(x1 ∩ x2) = Pr(x1)Pr(x2) = Pr(x1)Pr(x2/x1) = Pr(x2)Pr(x1/x2),

and hence

Pr(x2/x1) = Pr(x2)    and    Pr(x1/x2) = Pr(x1).

2. RANDOM VARIABLES AND DISTRIBUTION FUNCTIONS

Consider a physical experiment which is designed to result in a real number. This number is subject to certain random fluctuations, since in all physical experiments one expects experimental errors to be present. The result of the experiment is interpreted as a random variable X. For a mathematical definition of a random variable, see below. Let x_λ (for any real number λ) denote the sentence "the experiment will produce a number less than λ," i.e., the sentence "X is less than λ." Then the probability Pr(x_λ) is a function F of the real variable λ, called the distribution function of X. Thus Pr(x_λ) = F(λ).
If λ1 < λ2, then

Pr(x_{λ2} ∩ x′_{λ1}) = Pr(x_{λ2}) − Pr(x_{λ1} ∩ x_{λ2}) = F(λ2) − F(λ1)

is the probability that X is greater than or equal to λ1, but less than λ2. Thus when F is known, one can find the probability that X lies in a given interval.
In Chap. 11 it was noted that the elements of a Boolean algebra can be interpreted as sets of points of some space. Thus one interprets x_{λ2} ∩ x′_{λ1} as a set and Pr(x_{λ2} ∩ x′_{λ1}) as the probability of obtaining a point of this set, that is, the probability that the experiment will select a point ξ of


this set. Imagine that the number which the experiment produces is determined by the point ξ selected and hence that X is a function of ξ. Then x_λ is the set of all points ξ for which X(ξ) < λ. The only restrictions placed on the function X are that it is real valued and that each of the sets x_λ shall belong to B. Such a function is said to be measurable with respect to B. The measure of a set x_λ is defined as the probability Pr(x_λ). A random variable X is a function which is measurable with respect to B.
Let X be a random variable, let x_{λ+} be the set of points ξ for which X(ξ) ≤ λ, and denote Pr(x_{λ+}) by F(λ+). Then it can be proved (using postulate 2) that x_{λ+} is in B for all real λ. Moreover F(λ+) is the limit of F(μ) as μ approaches λ through values greater than λ. If μ approaches λ through values less than λ, then the limit of F(μ) is F(λ). Furthermore, F is a nondecreasing function for which

lim_{λ→−∞} F(λ) = F(−∞) = 0,    lim_{λ→+∞} F(λ) = F(+∞) = 1.

The above properties characterize the distribution function of a random variable.
EXAMPLE 1. As an illustration of a random variable let x be any element of B and let

ψ_x(ξ) = 1 if ξ is in the set x,    ψ_x(ξ) = 0 if ξ is not in the set x.

Then ψ_x is called the characteristic function of the set x and is interpreted as the random variable which takes on the value 1 when x succeeds and the value 0 when x fails. The distribution function F of the random variable ψ_x is the following (see Fig. 1):

F(λ) = 0 if λ ≤ 0;    F(λ) = Pr(x′) if 0 < λ ≤ 1;    F(λ) = 1 if 1 < λ.

FIG. 1. Distribution function for Example 1.


EXAMPLE 2. Consider a die and let xk denote the sentence "the face numbered k will turn up." Let

X = ψ_{x1} + 2ψ_{x2} + 3ψ_{x3} + 4ψ_{x4} + 5ψ_{x5} + 6ψ_{x6}.

If the face numbered k does turn up, then this will assign the value 1 to ψ_{xk} and the value 0 to the remaining characteristic functions, and hence X will take on the value k. Thus X is the random variable which takes on the value which the die turns up (see Fig. 2).

FIG. 2. Distribution function for Example 2, random tossing of a die.

It can be proved that sums, products, and differences of random variables are again random variables. Furthermore, any real number is a random variable.
EXAMPLE 3. The number √2 is the random variable whose distribution function F is given by (see Fig. 3):

F(λ) = 0 if λ < √2;    F(λ) = 1 if √2 ≤ λ.

FIG. 3. Distribution function for Example 3.

3. EXPECTED VALUE

If X is a random variable associated with some experiment and if the experiment is repeated a large number of times, then one should expect the average of the numbers obtained to be very close to some fixed number E(X), which is called the expected value of X. In order to make this idea more precise the following definition is introduced. The random variables X1, X2, ···, Xn are said to be independent provided x_{1,λ1}, x_{2,λ2}, ···, x_{n,λn} are independent for all λ1, λ2, ···, λn, where x_{k,λk} is the set of points for which Xk(ξ) < λk.
As an illustration of independent random variables, consider a pair of honest dice. Let X1 denote the random variable which takes on the value resulting from the throw of the first die and X2 denote the random variable corresponding to the second die. It is reasonable to assume that X1 and X2 are independent. Thus we assume that the occurrence of a number less than λ1 = 3 on the first die and the occurrence of a number less than λ2 = 5 on the second die are independent events; similarly for other choices of λ1 and λ2. Next let X3 = X1 + X2. Then X1 and X3 are dependent random variables.
Weak Law of Large Numbers. Now let X be an arbitrary random variable and let X1, X2, ···, Xn be independent random variables all having the same distribution function as X. Let x_{ε,n} be the set of points ξ for which

| (X1(ξ) + X2(ξ) + ··· + Xn(ξ))/n − E(X) | < ε.

Then x_{ε,n} is interpreted as the sentence "the average of X1, X2, ···, Xn will differ from E(X) by less than ε." One might expect that

lim_{n→∞} Pr(x_{ε,n}) = 1 for every ε > 0,

and that there is only one choice of E(X) for which this limiting probability is 1. This, as a matter of fact, is the case, and this result is called the weak law of large numbers. Roughly, the weak law of large numbers states that if an experiment is repeated a large number of times then it is very likely that the average of the results will differ only slightly from the expected value. The expected value E(X) exists for a large class of random variables but not for all random variables.
Properties of Expected Value.
THEOREM 9. E(λX + μY) = λE(X) + μE(Y) if λ, μ are real numbers and X, Y are random variables for which E(X), E(Y) exist.
THEOREM 10. If E(X), E(Y) exist and X(ξ) ≤ Y(ξ) for all ξ, then E(X) ≤ E(Y).
THEOREM 11. If ψ_x is the characteristic function of the set x, then E(ψ_x) = Pr(x).
With the aid of Theorems 9 and 11 one can compute the expected value for certain random variables called simple random variables.


A random variable X is simple if it has the form

X = λ1 ψ_{x1} + λ2 ψ_{x2} + ··· + λn ψ_{xn},

where each λk is a real number and each ψ_{xk} is the characteristic function of the set xk.
THEOREM 12. For such an X, E(X) = λ1 Pr(x1) + λ2 Pr(x2) + ··· + λn Pr(xn).

Theorem 10 is used to approximate the expected value for a much larger class of random variables called bounded random variables. The real numbers λ, μ are called bounds for a random variable X if λ ≤ X(ξ) ≤ μ for all ξ. When the bounds exist, X is said to be bounded.
THEOREM 13. If λ, μ are bounds for X and if λ = λ0 < λ1 < ··· < λn = μ, then E(X) exists and

Σ_{k=1}^{n} λ_{k−1}[F(λk) − F(λ_{k−1})] ≤ E(X) ≤ Σ_{k=1}^{n} λk[F(λk) − F(λ_{k−1})],

where F is the distribution function of X. If each λk − λ_{k−1} < ε, then the extreme members of the inequalities differ by at most ε.
Theorem 13 is readily established as follows. Let φk be the characteristic function of the set of points ξ for which λ_{k−1} ≤ X(ξ) < λk; then

Σ_{k=1}^{n} φk = 1,    Σ_{k=1}^{n} X φk = X,    λ_{k−1} φk ≤ X φk ≤ λk φk,

and hence the inequalities follow from Theorems 9 to 11. The difference between the extreme members of the inequalities is

Σ_{k=1}^{n} (λk − λ_{k−1})[F(λk) − F(λ_{k−1})] < Σ_{k=1}^{n} ε[F(λk) − F(λ_{k−1})] = ε[F(μ) − F(λ)] = ε.

If F has a continuous derivative f (i.e., dF(λ)/dλ = f(λ)), then

F(λk) − F(λ_{k−1}) = f(μk)(λk − λ_{k−1}),

where λ_{k−1} < μk < λk, and

lim_{ε→0} Σ_{k=1}^{n} λk[F(λk) − F(λ_{k−1})] = lim_{ε→0} Σ_{k=1}^{n} λk f(μk)(λk − λ_{k−1}) = ∫_λ^μ u f(u) du.
Thus:
THEOREM 14. If the distribution function F of a random variable X has a continuous derivative f and λ, μ are bounds of X, then E(X) exists and

E(X) = ∫_λ^μ u f(u) du = ∫_λ^μ u dF(u).

THEOREM 15. If the distribution function F of a random variable X has a derivative f, then E(X) exists and

E(X) = ∫_{−∞}^{∞} u f(u) du = ∫_{−∞}^{∞} u dF(u)

whenever the integral exists.
Stieltjes and Lebesgue Integrals. The two cases which arise most frequently in practice are the simple random variables and the random variables whose distribution functions have continuous derivatives. In the first case the expected value is computed by means of Theorem 12 and in the second case by Theorem 15. The integral on the right of the equation of Theorem 15 can be assigned a meaning even when f does not exist. In the case of a bounded variable this integral is defined to be the limit of the approximations given in Theorem 13. A meaning can also be assigned in certain unbounded cases. This integral is called a Stieltjes integral. Another integral expression for E(X) is

E(X) = ∫ X dPr.

This is called a Lebesgue integral and it is also defined in terms of the approximations of Theorem 13.
The terms expectation and mean are often used as synonyms for expected value.
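The bracketing sums of Theorem 13 are easily formed numerically. A brief sketch (NumPy assumed; the uniform distribution F(u) = u on [0, 1], for which E(X) = 1/2, is an illustrative choice):

import numpy as np

F = lambda u: np.clip(u, 0.0, 1.0)       # distribution function
lam = np.linspace(0.0, 1.0, 51)          # lambda_0 < ... < lambda_n
dF = F(lam[1:]) - F(lam[:-1])

lower = np.sum(lam[:-1] * dF)            # lower Stieltjes sum
upper = np.sum(lam[1:] * dF)             # upper Stieltjes sum
print(lower, upper)                      # 0.49 <= E(X) = 0.5 <= 0.51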
Probability Density and Joint Distribution. The derivative f of F is called the probability density. When the density is given, the distribution function can be computed by the formula

F(λ) = ∫_{−∞}^{λ} f(u) du.

See Figs. 4 and 5, Sect. 5.
The joint distribution of two random variables X1, X2 is a function F such that F(λ1, λ2) is the probability that X1 < λ1 and X2 < λ2. If the joint distribution has a density f, then

F(λ1, λ2) = ∫_{−∞}^{λ1} ∫_{−∞}^{λ2} f(u1, u2) du2 du1;

if F1, F2 are the distribution functions of X1, X2, and f1, f2 are the corresponding densities, then

F1(λ1) = ∫_{−∞}^{λ1} ∫_{−∞}^{∞} f(u1, u2) du2 du1 = ∫_{−∞}^{λ1} f1(u1) du1,
F2(λ2) = ∫_{−∞}^{λ2} ∫_{−∞}^{∞} f(u1, u2) du1 du2 = ∫_{−∞}^{λ2} f2(u2) du2,
f1(u1) = ∫_{−∞}^{∞} f(u1, u2) du2,    f2(u2) = ∫_{−∞}^{∞} f(u1, u2) du1.

The expected value of the product X1X2 is

E(X1X2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} u1 u2 f(u1, u2) du1 du2.

If X1, X2 are independent, then

f(u1, u2) = f1(u1) f2(u2)    and    F(λ1, λ2) = F1(λ1) F2(λ2).

Furthermore:
THEOREM 16. If X1, X2 are independent, then E(X1X2) = E(X1)E(X2).
Two random variables X1, X2 for which E(X1X2) = E(X1)E(X2) are said to be uncorrelated. Thus Theorem 16 states that independent random variables are uncorrelated. This result holds even when there is no joint probability density. The converse is not true. That is, random variables may be uncorrelated, but not independent.


As an illustration of a pair of random variables which are dependent but uncorrelated, consider an honest die whose faces are numbered respectively −3, −2, −1, 1, 2, 3. Let X₁ denote the random variable which takes on the value resulting from a throw of this die and let X₂ = X₁². Then

$$E(X_1) = 0, \qquad E(X_2) = \tfrac{14}{3}, \qquad E(X_1X_2) = E(X_1^3) = 0 = E(X_1)E(X_2).$$

Hence X₁ and X₂ are uncorrelated, but they are clearly dependent.
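A short simulation makes the die example concrete. The sketch below (illustrative only; the faces and sample size are the only inputs) estimates the three expectations by relative frequencies and shows the product moment agreeing with E(X₁)E(X₂) even though X₂ is a function of X₁.

```python
# Monte Carlo check that X1 and X2 = X1**2 are uncorrelated but dependent.
# Illustrative sketch; sample size is an arbitrary choice.
import random

faces = [-3, -2, -1, 1, 2, 3]            # the "honest die" of the example
N = 100_000
xs = [random.choice(faces) for _ in range(N)]

e_x1  = sum(xs) / N                      # near 0
e_x2  = sum(x * x for x in xs) / N       # near 14/3
e_x12 = sum(x ** 3 for x in xs) / N      # near 0 = E(X1) * E(X2)

print(e_x1, e_x2, e_x12)
# Dependence is plain: knowing X1 determines X2 exactly, yet the
# correlation E(X1 X2) - E(X1)E(X2) is (up to sampling error) zero.
```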
4. VARIANCE

THEOREM 17. If f is a function of a real variable λ with at most a finite number of discontinuities and if X is a random variable with distribution function F, then f(X) is a random variable and

$$E(f(X)) = \int_{-\infty}^{\infty} f(u)\,dF(u)$$

whenever the integral exists.
A special case of this formula is the following. If E(X) = μ and

$$E\bigl((X - \mu)^2\bigr) = \int_{-\infty}^{\infty} (u - \mu)^2\,dF(u) = E(X^2) - E^2(X) = \sigma^2(X),$$

then σ²(X) is called the variance of X, and the positive square root of the variance, σ(X), is called the standard deviation of X.
The Properties of Variance.
THEOREM 18. σ²(X + λ) = σ²(X), σ²(λX) = λ²σ²(X).
THEOREM 19. If X₁, X₂, ⋯, Xₙ are independent random variables, then

$$\sigma^2(X_1 + X_2 + \cdots + X_n) = \sigma^2(X_1) + \sigma^2(X_2) + \cdots + \sigma^2(X_n),$$

$$\sigma^2\!\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{\sigma^2(X_1) + \sigma^2(X_2) + \cdots + \sigma^2(X_n)}{n^2}.$$

If x_ε is the set of all points ξ such that |X(ξ) − μ| < ε, where μ = E(X), then x′_ε is the set of all ξ such that |X(ξ) − μ| ≥ ε. Moreover the inequality

$$\varepsilon^2\,\psi_{x'_\varepsilon}(\xi) \le (X(\xi) - \mu)^2$$

can readily be verified when ξ is in x_ε and when ξ is in x′_ε, and hence this inequality holds for all ξ. Thus by Theorems 9 to 11 it follows that

$$E(\varepsilon^2\psi_{x'_\varepsilon}) = \varepsilon^2 \Pr(x'_\varepsilon) \le E\bigl((X - \mu)^2\bigr) = \sigma^2(X),$$

and therefore

$$\Pr(x'_\varepsilon) \le \frac{\sigma^2(X)}{\varepsilon^2}.$$

This inequality is called Tchebysheff's inequality. By combining Tchebysheff's inequality with Theorem 19, one obtains:
THEOREM 20. If X₁, X₂, ⋯, Xₙ are independent random variables with common mean μ and common variance σ², and if x_{ε,n} is the set of points ξ for which

$$\left|\frac{X_1(\xi) + X_2(\xi) + \cdots + X_n(\xi)}{n} - \mu\right| < \varepsilon,$$

then

$$\Pr(x_{\varepsilon,n}) \ge 1 - \frac{\sigma^2}{n\varepsilon^2}$$

and

$$\lim_{n\to\infty} \Pr(x_{\varepsilon,n}) = 1 \quad \text{for every } \varepsilon > 0.$$

The Strong Law of Large Numbers. The first part of Theorem 20 gives a crude approximation for the probability that the average will differ from the common mean by less than ε. Recall that the second part of this theorem is the weak law of large numbers. The reasoning by which one arrived at this result is of course circular, but this circularity can be avoided. The strong law of large numbers is the following:
THEOREM 21. If X₁, X₂, ⋯, Xₙ, ⋯ are independent random variables with common expected value μ and common variance σ², and if x is the set of points ξ for which

$$\lim_{n\to\infty} \frac{X_1(\xi) + X_2(\xi) + \cdots + X_n(\xi)}{n} = \mu,$$

then Pr(x) = 1.
Even though Pr(x) = 1 it is not in general true that x = 1. If an
element x of B is such that Pr(x) = 1, then x is said to be almost certain.
The strong law of large numbers states that it is almost certain that the
limit of the average is the common expected value.
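The content of Theorems 20 and 21 can be watched numerically. The following sketch (an illustration under an assumed coin-tossing model, not part of the text) tracks the running average of independent 0-1 variables with p = ½; the strong law makes the path almost certain to approach μ = ½, and the Tchebysheff bound of Theorem 20 caps the probability of a large deviation at each fixed n.

```python
# Running averages of independent Bernoulli(1/2) variables, illustrating
# the laws of large numbers.  Sketch only; p, eps, and n are arbitrary.
import random

p, eps = 0.5, 0.05
sigma2 = p * (1 - p)                     # common variance
total = 0
for n in range(1, 100_001):
    total += 1 if random.random() < p else 0
    if n in (100, 1_000, 10_000, 100_000):
        avg = total / n
        bound = sigma2 / (n * eps ** 2)  # Theorem 20: Pr(|avg - p| >= eps) <= bound
        print(n, avg, "Tchebysheff bound:", min(1.0, bound))
```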
The following example will help one understand the distinction between
certain and almost certain. Let X be a random variable with distribution function F defined as follows:

$$F(\lambda) = \begin{cases} 0 & \text{if } \lambda \le 0, \\ \lambda & \text{if } 0 \le \lambda \le 1, \\ 1 & \text{if } 1 \le \lambda. \end{cases}$$

Then it is almost certain, but not entirely certain, that X will take on a value distinct from ½.


5. CENTRAL LIMIT THEOREM

Distribution of Sums and Averages of Independent Random Variables. Consider

$$e^{iXt} = \cos Xt + i\sin Xt,$$

where i² = −1 and t is a parameter. This exponential converts the real random variable X into a complex valued random variable. The expected value of the latter random variable is defined in a natural way to be

$$E(e^{iXt}) = E(\cos Xt) + iE(\sin Xt) = \phi_X(t).$$

The advantage of the exponential is that it converts a sum into a product and hence enables one to make use of the condition of independence. Thus if X, Y are independent, it can be shown that e^{iXt}, e^{iYt} are independent and hence by Theorem 16

$$\phi_{X+Y}(t) = E(e^{iXt}e^{iYt}) = \phi_X(t)\,\phi_Y(t).$$

The advantage of the factor i is that it produces a bounded random variable and insures the existence of the expected value for all real values of t. The advantage of the parameter t is that it produces a function in terms of which one can compute the distribution function. Thus φ_X is a function of the parameter t called the characteristic function of the random variable X. Unfortunately the phrase "characteristic function" has two distinct meanings in the theory of probability, namely, characteristic function of a set of points and characteristic function of a random variable.
Computation of the Characteristic Function of a Simple Random
Variable. Let

$$X = \sum_{k=1}^{n} \lambda_k\,\psi_{x_k},$$

where x₁, x₂, ⋯, xₙ are mutually exclusive and exhaustive. Then

$$e^{iX(\xi)t} = \sum_{k=1}^{n} e^{i\lambda_k t}\,\psi_{x_k}(\xi)$$

for all ξ, since if ξ lies in xₖ, then ψ_{xₖ}(ξ) = 1 and the remaining characteristic functions have the value 0, and hence both sides of the equation become e^{iλₖt}. From this it follows that

$$E(e^{iXt}) = \sum_{k=1}^{n} \Pr(x_k)\,e^{i\lambda_k t} = \phi_X(t).$$

As a special case consider the simple random variable ψ_x. One can write

$$\psi_x = 0\cdot\psi_{x'} + 1\cdot\psi_x,$$

and hence

$$\phi_{\psi_x}(t) = q + pe^{it},$$

where p = Pr(x), q = Pr(x′).
Next compute the characteristic function for the sum ψ_{x₁} + ψ_{x₂} + ⋯ + ψ_{xₙ}, where x₁, x₂, ⋯, xₙ are independent and Pr(xₖ) = p, Pr(x′ₖ) = q for each k. Then

$$\phi_{\psi_{x_1}+\psi_{x_2}+\cdots+\psi_{x_n}}(t) = \prod_{k=1}^{n} \phi_{\psi_{x_k}}(t) = (q + pe^{it})^n.$$
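The multiplicative property is easy to verify numerically for simple random variables. The sketch below (illustrative; the chosen p, n, and t are arbitrary) raises the characteristic function of a single ψₓ to the nth power and compares it with the characteristic function of the sum computed directly from the binomial probabilities.

```python
# Check phi_sum(t) = (q + p e^{it})^n for a sum of n independent
# indicator variables.  Illustrative sketch; p, n, t are arbitrary.
import cmath
from math import comb

p, n, t = 0.3, 8, 0.7
q = 1 - p

lhs = (q + p * cmath.exp(1j * t)) ** n   # product form from independence

# Direct form: sum over the binomial distribution of the number of successes.
rhs = sum(comb(n, k) * p**k * q**(n - k) * cmath.exp(1j * k * t)
          for k in range(n + 1))

print(lhs, rhs)                          # agree to rounding error
```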

If X has the distribution function F, then the characteristic function is

$$\phi_X(t) = E(e^{iXt}) = \int_{-\infty}^{\infty} e^{i\lambda t}\,dF(\lambda).$$

This transforms the function F into the function φ_X (essentially the Laplace-Fourier transform). The inverse transform is

$$\tfrac{1}{2}\bigl(F(\lambda) + F(\lambda+)\bigr) = \frac{1}{2\pi i}\,\lim_{h\to\infty}\int_{-h}^{h} \frac{e^{it} - \phi_X(t)e^{-i\lambda t}}{t}\,dt.$$

To see why this is the case, note that

$$\frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{it} - e^{i\mu t}e^{-i\lambda t}}{t}\,dt = \begin{cases} 1 & \text{if } \mu < \lambda, \\ \tfrac{1}{2} & \text{if } \mu = \lambda, \\ 0 & \text{if } \lambda < \mu. \end{cases}$$
This formula is verified by converting the integral into integrals of the form

$$\int_{-\infty}^{\infty} \frac{\sin mt}{t}\,dt$$

by means of the relation e^{imt} = cos mt + i sin mt. Now compute the inverse transform for a simple random variable

$$X = \sum_{k=1}^{n} \lambda_k\,\psi_{x_k},$$

where x₁, x₂, ⋯, xₙ are mutually exclusive and exhaustive. Since

$$\sum_{k=1}^{n} \Pr(x_k) = 1,$$

then

$$\frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{it} - \phi_X(t)e^{-i\lambda t}}{t}\,dt = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{\sum_{k=1}^{n}\Pr(x_k)e^{it} - \sum_{k=1}^{n}\Pr(x_k)e^{i\lambda_k t}e^{-i\lambda t}}{t}\,dt = \sum_{\lambda_k<\lambda} \Pr(x_k)$$

if λ ≠ any λₖ. The final sum is the probability that X will take on a value less than λ, and hence this sum is equal to

$$F(\lambda) = F(\lambda+) = \tfrac{1}{2}\bigl(F(\lambda) + F(\lambda+)\bigr).$$

If λ equals some λₖ, then the corresponding term Pr(xₖ)/2 must be added, and again the result is ½(F(λ) + F(λ+)). The proof for an arbitrary random variable X consists in approximating X by a simple random variable. In the general case the integral from −∞ to ∞ may not exist, and
one has to resort to integrating from -h to h and then passing to the limit.
Binomial and Poisson Distributions. If Fₙ is the distribution function of

$$X = \psi_{x_1} + \psi_{x_2} + \cdots + \psi_{x_n},$$

where x₁, x₂, ⋯, xₙ are independent and Pr(xₖ) = p, Pr(x′ₖ) = q for each k, then Fₙ(λ) is the probability that less than λ of the events x₁, ⋯, xₙ will succeed. If λ ≠ any k, then

$$F_n(\lambda) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{it} - \phi_X(t)e^{-i\lambda t}}{t}\,dt = \sum_{k<\lambda} \binom{n}{k} p^k q^{n-k},$$

where (ⁿₖ) is the number of combinations of n things taken k at a time. If λ = some k, then the corresponding term (ⁿₖ)pᵏqⁿ⁻ᵏ/2 must be added.


This is called the binomial distribution. To obtain an approximation to this distribution for small p and large n, set p = μ/n and let n become infinite. The limiting distribution F is given by

$$F(\lambda) = \sum_{k<\lambda} \frac{\mu^k}{k!}\,e^{-\mu}.$$

This limiting distribution is the Poisson distribution.
Normal Distribution. The normal distribution may be described through its characteristic function φ(t) = e^{−t²/2}. To find the corresponding distribution function Φ, compute its derivative. Thus

$$\frac{d\Phi}{d\lambda} = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-t^2/2}\,e^{-i\lambda t}\,dt.$$

Since t² + 2iλt = (t + iλ)² + λ², set u = t + iλ, du = dt, and obtain

$$\frac{d\Phi}{d\lambda} = \frac{e^{-\lambda^2/2}}{2\pi}\int_{-\infty}^{+\infty} e^{-u^2/2}\,du = \frac{A\,e^{-\lambda^2/2}}{2\pi},$$

where

$$A = \int_{-\infty}^{+\infty} e^{-u^2/2}\,du = \sqrt{2\pi}.$$

Hence

$$\frac{d\Phi}{d\lambda} = \frac{1}{\sqrt{2\pi}}\,e^{-\lambda^2/2}.$$

FIG. 5. Probability density for the normal distribution function.
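The inversion just carried out analytically can be mimicked by quadrature. The sketch below (a numerical illustration; the truncation limit and step size are ad hoc choices) evaluates (1/2π)∫e^{−t²/2}e^{−iλt} dt by a simple Riemann sum and compares it with e^{−λ²/2}/√(2π).

```python
# Numerical inversion of the characteristic function e^{-t^2/2}:
# density(lam) = (1/2pi) * integral of e^{-t^2/2} e^{-i lam t} dt.
# Sketch only; truncation at |t| = 10 and step 1e-3 are ad hoc.
import cmath, math

def density(lam, h=10.0, dt=1e-3):
    n = int(2 * h / dt)
    s = 0.0 + 0.0j
    for k in range(n):
        t = -h + (k + 0.5) * dt
        s += cmath.exp(-t * t / 2) * cmath.exp(-1j * lam * t) * dt
    return (s / (2 * math.pi)).real

for lam in (0.0, 1.0, 2.0):
    exact = math.exp(-lam * lam / 2) / math.sqrt(2 * math.pi)
    print(lam, density(lam), exact)
```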

6. RANDOM PROCESSES
A continuous random process is a function X which assigns to every real
number t, a random variable Xt. If t ranges only over the integers then
the process X is said to be discrete and if t ranges only over the positive
integers then the process X is simply a sequence of random variables.
Consider complex valued random variables, i.e., random variables of the
form X = Xl + iX2 where Xl, X 2 are real. The complex conjugate of
X is X I - iX2 and is denoted by X. The inner product of two random
variables X, Y is denoted by (X, Y) and defined by the equation
(X, Y) = E(XY).

The covariance function R of a process X is defined by the equation
R(t, T)

= (Xt+n Xt).

If R depends only on T and not on t, then the process is said to be stationary
in the wide sense. A physical example of such a process is the phenomenon
of noise. In the mathematical model (i.e., the process X) the variable t
is interpreted as time. The process can be envisioned as being composed
of simple harmonic oscillations in which the amplitudes associated with the
various frequencies are selected in accordance with a certain random procedure. A simple harmonic oscillation of frequency A is represented by
e27ri"At and the (complex) amplitude associated with the frequencies between A and A dA is denoted by dY"A, and hence the contribution of such
frequencies to the process is

+

e27ri"At dY"A.
Here Y is a process which assigns' to each real number A a random variable


Y_λ. The process X is obtained by adding the contributions associated with the various frequencies. Hence

$$X_t = \int_{-\infty}^{\infty} e^{2\pi i\lambda t}\,dY_\lambda.$$

Thus the spectrum of the process X is described by the process Y.
The expected value of the square of the amplitude associated with the frequencies between λ and λ + dλ is denoted by dF(λ) and is defined by

$$dF(\lambda) = (dY_\lambda, dY_\lambda) \ge 0.$$

Thus F is a monotone nondecreasing function. A property of the process Y, called the property of orthogonal increments, is the following:

$$(dY_\lambda, dY_\mu) = 0$$

if the intervals dλ, dμ have no common points. Hence

$$R(\tau) = (X_{t+\tau}, X_t) = \left(\int_{-\infty}^{\infty} e^{2\pi i\lambda(t+\tau)}\,dY_\lambda,\ \int_{-\infty}^{\infty} e^{2\pi i\mu t}\,dY_\mu\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{2\pi i\lambda(t+\tau)}e^{-2\pi i\mu t}\,(dY_\lambda, dY_\mu)$$

$$= \int_{-\infty}^{\infty} e^{2\pi i\lambda(t+\tau)}e^{-2\pi i\lambda t}\,dF(\lambda) = \int_{-\infty}^{\infty} e^{2\pi i\lambda\tau}\,dF(\lambda).$$

As a special case of this formula,

$$R(0) = \int_{-\infty}^{\infty} dF(\lambda).$$
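A discrete analogue of the spectral representation is easy to simulate. In the sketch below (illustrative only; the frequency grid and amplitude variances are arbitrary choices), X_t is built as a finite sum of harmonic terms with independent zero-mean complex amplitudes, and the empirical covariance is compared with the theoretical sum, which depends only on τ.

```python
# Finite-sum analogue of X_t = integral of e^{2 pi i lam t} dY_lam with
# orthogonal increments.  Sketch; frequencies and variances are arbitrary.
import cmath, random

lams = [0.05, 0.12, 0.30]                # frequencies lambda_k
dF   = [1.0, 0.5, 0.25]                  # dF_k = E|dY_k|^2

def sample_path(ts):
    # independent amplitudes dY_k with E(dY_k) = 0 and E|dY_k|^2 = dF_k
    dY = [cmath.exp(2j * cmath.pi * random.random()) * (f ** 0.5) for f in dF]
    return [sum(cmath.exp(2j * cmath.pi * l * t) * y for l, y in zip(lams, dY))
            for t in ts]

ts, tau, runs = list(range(50)), 3, 2000
acc = 0.0 + 0.0j
for _ in range(runs):
    x = sample_path(ts)
    acc += sum(x[t + tau] * x[t].conjugate() for t in range(40)) / 40
emp = acc / runs
theory = sum(cmath.exp(2j * cmath.pi * l * tau) * f for l, f in zip(lams, dF))
print(emp, theory)                       # R(tau) is independent of t
```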

The following example of a one-dimensional Brownian motion will aid
in visualizing a random process. A tiny mirror is suspended by a fiber.
Particles of air bombard the mirror and cause it to turn through an angle.
A beam of light is reflected by the mirror and the position of the reflection
enables the observer to measure the angle X(t) through which the mirror
has turned at time t. Since X(t) is produced by the average effect of a
number of bombardments, one might expect X(t) to have a normal distribution. That is, the probability that X(t) < λ is

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\lambda} e^{-x^2/2\sigma^2}\,dx,$$

where σ² is the variance and is assumed to be independent of t. From this formula one can readily show that E(X(t)) = 0. The zero angle is the angle

in which the fiber is untwisted. If t and t + τ are two times at which the mirror is observed, the joint probability that X(t) < λ₁ and X(t + τ) < λ₂ is

$$\frac{1}{2\pi\sigma^2\sqrt{1 - r^2(\tau)}}\int_{-\infty}^{\lambda_1}\int_{-\infty}^{\lambda_2} e^{-[x^2 - 2r(\tau)xy + y^2]/2\sigma^2[1 - r^2(\tau)]}\,dx\,dy.$$

This is called the bivariate normal distribution. From this formula one can show that the covariance is

$$(X(t), X(t+\tau)) = \sigma^2 r(\tau),$$

and hence that the process is stationary in the wide sense. If it is known
and hence that the process is stationary in the wide sense. If it is known
that X(t) = a then the probability that X(t + 1') < A is

fA

1

O"V 271"(1

- r2(1'»

..., .

e-[x-ar(r)] M/20'M[1-r~(T)] dx.

-ao

Any information concerning the motion previous to time t is irrelevant to
this probability. A process having this property is said to be Markovian.
The assumption that the above process is Markovian implies that r(τ) = e^{−kτ}, where k > 0.
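The Markov property together with r(τ) = e^{−kτ} corresponds exactly to a one-step autoregressive recursion in discrete time. The sketch below (illustrative; k, σ, and the time step are arbitrary) simulates such a stationary Gauss-Markov process and checks that the sample covariance decays like σ²e^{−kτ}.

```python
# Stationary Gauss-Markov process with covariance sigma^2 * exp(-k * tau).
# Simulated by the exact one-step recursion; parameters are arbitrary.
import math, random

k, sigma, dt, n = 0.5, 1.0, 0.1, 200_000
r = math.exp(-k * dt)                    # one-step correlation
x = random.gauss(0.0, sigma)
xs = []
for _ in range(n):
    x = r * x + random.gauss(0.0, sigma * math.sqrt(1 - r * r))
    xs.append(x)

for lag in (0, 5, 10, 20):               # tau = lag * dt
    cov = sum(xs[i] * xs[i + lag] for i in range(n - lag)) / (n - lag)
    print(lag * dt, cov, sigma**2 * math.exp(-k * lag * dt))
```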

REFERENCES
1. H. Cramér, The Elements of Probability Theory, Wiley, New York, 1950.
2. J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
3. W. Feller, Probability Theory and Its Applications, Wiley, New York, 1950.
4. A. N. Kolmogoroff, Foundations of the Theory of Probability, Chelsea, New York, 1950.
5. P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, Paris, 1937.
6. J. V. Uspenski, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937.
7. Ming Chen Wang and G. E. Uhlenbeck, On the theory of Brownian motion. II. Revs. Mod. Phys., 17, 323-342 (1945).

A

GENERAL MATHEMATICS

Chapter 13

Statistics

A. B. Clarke

1. Nature of Statistics                     13-01
2. Probability Background                   13-02
3. Important Probability Distributions      13-04
4. Sampling                                 13-06
5. Bivariate Distributions                  13-13
6. Tests for Goodness of Fit                13-16
7. Sequential Analysis                      13-16
8. Monte Carlo Method                       13-17
9. Statistical Tables                       13-18
   References                               13-21

1. NATURE OF STATISTICS
The basic assumption underlying the application of the mathematical
theory of probability and statistics to physical situations is the following:
If a physical "experiment" is repeated under "identical" conditions and
"without bias," the observed relative frequency of success of any physical
"event" approaches as a limit the probability assigned to this event by
some underlying probability distribution.
Probability theory is the study of probability distributions as mathematical entities. Statistics is the analysis of probability distributions on
the basis of a number of experimental observations; the distribution is
in general not fully known to start with, and one seeks properties of the
distribution on the basis of the observations. Since an infinite number
of experiments would usually be required to determine a distribution with
precision, it is only rarely possible to answer a statistical question with 100
per cent surety. Accordingly the answer to each statistical question should
consist of two parts: (a) the best possible answer to the question and (b)
the amount of confidence that can be placed in the correctness of this
answer. The omission of (b) greatly diminishes the value of the conclusion.
2. PROBABILITY BACKGROUND

The basic probability theory required for statistics is reviewed in Chap.
12. For the sake of convenience the principal definitions are recalled here.
(See Refs. 2, 6.)
Sample Space. The sample space S is the collection of all possible outcomes of a physical experiment; the individual outcomes are sample points. By an event is meant a certain type of outcome; in other words, a certain set A of sample points. A class α of events is assumed specified. To each event A of class α is assigned a probability, Pr(A), which is a real number between 0 and 1. One has Pr(∅) = 0, Pr(S) = 1, and Pr(A ∪ B) = Pr(A) + Pr(B), provided A, B have no points in common (are mutually exclusive events).
A sample space S is discrete if its points form a finite or infinite sequence ξ₁, ξ₂, ⋯. For discrete spaces a probability is usually defined for each point, and then for each subset A as the sum of the probabilities of the points in A.
Random Variables. A random variable is a function X = X(ξ) which assigns to each sample point ξ a real number x in such a fashion that, for each a, the set A for which x ≤ a has a probability; thus Pr(X ≤ a) is well defined. With each random variable X is associated a distribution F(x); F(a) = Pr(X ≤ a). F(x) is nondecreasing, F(−∞) = 0, F(+∞) = 1.
If X₁, ⋯, Xₙ are random variables associated with the same experiment, then their joint distribution is F(x₁, ⋯, xₙ), where F(a₁, ⋯, aₙ) is the probability assigned to the set where X₁ ≤ a₁, ⋯, Xₙ ≤ aₙ. The random variable X has a density f if

(1)  $$F(x) = \int_{-\infty}^{x} f(t)\,dt;$$

the random variables X₁, ⋯, Xₙ have a joint density f if

(2)  $$F(x_1, \cdots, x_n) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n} f(t_1, \cdots, t_n)\,dt_n\cdots dt_1.$$

When the range or collection of values of X forms a discrete sequence x₁, x₂, ⋯, then

(3)  $$F(a) = \sum_{x_i \le a} \Pr(X = x_i) = \sum_{x_i \le a} f(x_i),$$

where Pr(X = xᵢ) = f(xᵢ) is the probability assigned to the set of sample points for which X = xᵢ. This can be generalized to joint distributions.
Random variables X₁, ⋯, Xₙ are mutually independent if

(4)  $$F(x_1, \cdots, x_n) = F_1(x_1)F_2(x_2)\cdots F_n(x_n),$$

where F is the joint distribution and Fᵢ(xᵢ) is the distribution of Xᵢ.
Throughout the following it will be assumed that either the range of each random variable is discrete or else each distribution has a density (continuous case).
The expectation or mean of a random variable X is

(5)  $$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \ \ \text{(continuous case)}, \qquad E(X) = \sum_i x_i f(x_i) \ \ \text{(discrete case)}.$$

If φ is a continuous function of x, then

(6)  $$E(\varphi(X)) = \int_{-\infty}^{\infty} \varphi(x) f(x)\,dx \ \ \text{(continuous case)}, \qquad E(\varphi(X)) = \sum_i \varphi(x_i) f(x_i) \ \ \text{(discrete case)}.$$
Moments. The moments of X about the origin are the numbers

(7)  $$\mu'_k = E(X^k), \qquad k = 1, 2, \cdots.$$

The moments of X about the mean are defined by

(8)  $$\mu_k = E\bigl((X - \mu)^k\bigr), \qquad k = 2, 3, \cdots,$$

where μ = E(X) = μ′₁. The quantity σ² = μ₂ is the variance of X, while σ = √μ₂ is the standard deviation of X.
By expanding the quantity (x − μ)ᵏ by the binomial formula and applying eq. (8), one obtains an expression for the μₖ in terms of μ′₁, ⋯, μ′ₖ. In particular,

(9)  $$\mu_2 = \mu'_2 - {\mu'_1}^2.$$

The mean μ is a measure of the location of the "center" of the distribution, while the variance σ² is a measure of the "spread" of the distribution.


Other possible measures of central tendency are:
Median: a point x₀ such that Pr(X ≤ x₀) = Pr(X ≥ x₀);
Mode: a point x₀ where f(x) is a maximum;
Midrange: ½(a + b), if a ≤ x ≤ b is the smallest interval containing all x for which f(x) > 0.

Other measures of the spread of the distribution are:
Mean deviation from the mean = E(|X − μ|);
Probable error: a number a such that Pr(|X − μ| ≤ a) = ½.

For comparison and tabulating purposes it is useful to describe a random variable in a manner independent of origin and scale. These requirements are met by the standardized variable X* = (X − μ)/σ, which has mean 0, has standard deviation 1, is dimensionless, and is invariant under any linear change of variable: X′ = aX + b.
3. IMPORTANT PROBABILITY DISTRIBUTIONS

Binomial or Bernoulli Distribution. If X represents the number of "successes" in n independent trials of an experiment, with probability p of "success" each time, then X takes on the values 0, 1, 2, ⋯, n with probabilities

(10)  $$f(x) = \binom{n}{x} p^x q^{n-x}, \qquad q = 1 - p.$$

Hence the sample space S has 2ⁿ points ξ, each representing one particular succession of successes and failures. The random variable X assigns to each ξ the number of successes in ξ. The mean and standard deviation are found to be

(11)  $$\mu = np, \qquad \sigma = \sqrt{npq}.$$

Poisson Distribution. A discrete random variable X with values 0, 1, 2, ⋯ is said to have a Poisson distribution if the corresponding function f(x) has the form

(12)  $$f(x) = \frac{a^x}{x!}\,e^{-a} \qquad (x = 0, 1, 2, \cdots),$$

where a is a positive constant. One finds

(13)  $$\mu = a, \qquad \sigma = \sqrt{a}.$$

For large n and small p the binomial distribution (10) is well approximated by the distribution (12), with a = np.


If a number of events occur independently in space or time and if X represents the number of these events occurring in any given space or time interval, then the Poisson distribution is a good model for the distribution of X. Examples are the number of red corpuscles on a microscope slide, the rate of emission of electrons or α-particles, and the number of incoming calls to a telephone exchange.
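The quality of the approximation of eq. (10) by eq. (12) is quickly examined. The sketch below (illustrative; n and p are arbitrary small-p choices) tabulates the binomial probabilities beside the Poisson probabilities with a = np.

```python
# Binomial (10) versus Poisson (12) with a = np, for large n and small p.
# Illustrative sketch; n and p are arbitrary.
from math import comb, exp, factorial

n, p = 200, 0.02
a = n * p                                # Poisson parameter
for x in range(9):
    binom   = comb(n, x) * p**x * (1 - p)**(n - x)
    poisson = a**x * exp(-a) / factorial(x)
    print(x, round(binom, 5), round(poisson, 5))
```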
Normal Distribution. Let X be a continuous random variable with density

(14)  $$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}.$$

Then X is said to have a normal distribution; its mean and standard deviation are μ and σ. One terms φ(x) the normal density function of mean μ and standard deviation σ; the corresponding distribution

(15)  $$\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt$$

is the normal distribution function. The function Φ is tabulated for μ = 0 and σ = 1, and any other case is reduced to this by replacing X by its standardized variable X* (Sect. 2). See Table 2, Sect. 9.
For large values of n, the binomial distribution may be approximated by the normal distribution having μ = np, σ = √(npq). More precisely, if X has a binomial distribution, then as n → ∞,

(16)  $$\Pr\left(\frac{X - np}{\sqrt{npq}} \le t\right) = \Pr(X^* \le t) \to \Phi(t).$$
The χ²-Distribution. Let X be a continuous random variable with values in the range 0 ≤ x < ∞. Then X is said to have a χ²-distribution with n degrees of freedom (n = 1, 2, ⋯), if X has density

(17)  $$f(x) = \frac{x^{(n-2)/2}\,e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}, \qquad x \ge 0.$$

One finds

(18)  $$\mu = n, \qquad \sigma = \sqrt{2n}.$$

This type of distribution is of great importance in the theory of sampling of normal populations (Sect. 4). See Table 3, Sect. 9.


Student t-Distribution. Let X be a continuous random variable with density

(19)  $$s_n(x) = \frac{\Gamma\!\left(\dfrac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}.$$

Then X is said to have a Student t-distribution with n degrees of freedom (n = 1, 2, ⋯). One finds

(20)  $$\mu = 0, \qquad \sigma = \sqrt{\frac{n}{n-2}} \qquad (n > 2).$$

As n → ∞, sₙ(x) approaches the normal density function of mean 0 and standard deviation 1. The t-distribution is of value in sampling theory (Sect. 4). See Table 4, Sect. 9.
4. SAMPLING

In a great variety of practical problems a precise answer is obtainable only by making a very large number of measurements. For the sake of economy, one makes a smaller number of measurements and estimates the true answer from these. The theory of such methods of estimation is called sampling. Examples. The average height of 1,000,000 soldiers can be estimated by averaging the heights of a selected 1000 soldiers. The outcome of a presidential election can be estimated by polling a small number of voters.
The successive measurements in an experiment yield a random sequence x₁, ⋯, xₙ called a sample.
EXAMPLE. The measurements of the height of 1000 soldiers yield 1000 numbers. One can regard each soldier as a sample point ξ, the aggregate of all 1,000,000 soldiers as the sample space S. If the heights follow some definite pattern, then there will be a definite probability that the height x₁ be less than a fixed value. Hence, there is a distribution function F₁(x₁) associated with x₁, and x₁ can be regarded as the value of a random variable X₁. Similar statements apply to the measurements X₂, ⋯, Xₙ.
If the measurements are independent (i.e., each one is made without considering the others), all measurements have the same distribution F(x) and X₁, ⋯, Xₙ are random variables with joint distribution

(21)  $$F(x_1)F(x_2)\cdots F(x_n).$$

The assumptions of the example considered will be assumed to hold
generally. A sample space is assumed given, with associated probabilities.


A measurement x is a value of a random variable X; the probability that X ≤ a is F(a), where F is the distribution of X. Successive measurements yield random variables X₁, ⋯, Xₙ. It will be assumed that these are independent, so that eq. (21) gives the joint distribution.
Sample Moments. The sample mean or average is the number

(22)  $$\bar{x} = \frac{x_1 + \cdots + x_n}{n}.$$

The sample moments about the origin and about the mean are defined respectively as

(23)  $$m'_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k, \qquad m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k,$$

so that x̄ = m′₁. The number s² = m₂ is the sample variance. One has the formula

(24)  $$s^2 = m'_2 - \bar{x}^2.$$

One can regard x̄ and s² as estimates for the mean μ and variance σ² of X; x̄ and s², and indeed all the moments, are random variables, being functions of X₁, ⋯, Xₙ.
From the fact that all Xᵢ have a common distribution F(x), one can deduce properties of the distribution of the various moments. For example,

(25)  $$E(\bar{x}) = E\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n}\sum E(X_i) = \frac{1}{n}\,n\mu = \mu.$$

Similarly,

(26)  $$E(s^2) = \frac{n-1}{n}\,\sigma^2.$$

Unbiased Estimate. A sample estimate is termed unbiased if its expectation is equal to the parameter being estimated. Equation (25) shows that x̄ is an unbiased estimate of μ; eq. (26) shows that s² is not an unbiased estimate of σ², although [n/(n − 1)]s² is such an unbiased estimate. Unbiasedness is a useful property of an estimate, but it is not as important as some other properties. The bias in s² need be considered only if n is sufficiently small (less than 20, for example), so that (n − 1)/n is appreciably different from 1.
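The bias of eq. (26) shows up plainly in simulation. The sketch below (illustrative; the normal population and sample size are arbitrary) averages s² over many samples and compares it with σ² and with the corrected estimate [n/(n − 1)]s².

```python
# E(s^2) = (n-1)/n * sigma^2: the sample variance is biased low.
# Illustrative sketch; population and sample size are arbitrary.
import random

mu, sigma, n, trials = 0.0, 2.0, 5, 50_000
acc = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    acc += sum((x - xbar) ** 2 for x in xs) / n    # s^2 of eq. (23)
mean_s2 = acc / trials
print(mean_s2)                        # near (n-1)/n * sigma^2 = 3.2
print(mean_s2 * n / (n - 1))          # corrected: near sigma^2 = 4.0
```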


Computational Procedures
Data Classification. The computation of sample moments for large
samples is simplified by the classification of the data. In this procedure
the sample range, the interval from the smallest to the largest sample value,
is divided into approximately fifteen class intervals of equal width (the
class width). The number of measurements whose Xi value lies in each
class interval, the frequency of the class interval, is then recorded, as well
as the midpoint of each interval, the class mark. In the subsequent computation one then replaces each sample value Xi by the class mark of the
corresponding class interval; usually a negligible error is introduced by this
replacement. Example. In measuring height of a population to the
nearest 0.1 in. one can choose class intervals 1 in. in width; to avoid ambiguity the end points of the class intervals should be 60.05 in., 61.05 in.,
.. " for instance, rather than 60 in., 61 in., ....
Computation. If there are h class intervals with frequencies fⱼ and class marks xⱼ (j = 1, ⋯, h), then the moments are computed as follows:

(27)  $$\bar{x} = m'_1 = \frac{1}{n}\sum_{j=1}^{h} f_j x_j,$$

(28)  $$m'_2 = \frac{1}{n}\sum_{j=1}^{h} f_j x_j^2,$$

(29)  $$s^2 = \frac{1}{n}\sum_{j=1}^{h} f_j (x_j - \bar{x})^2 = m'_2 - \bar{x}^2.$$

The computation can be further simplified by coding the data; that is, by introducing new measurements yⱼ by a linear change of variables:

(30)  $$x_j = a y_j + b \qquad (a \ne 0),$$

where the coefficients a, b are chosen to simplify the yⱼ data. The new mean and variance ȳ and s_y² are related to the old, x̄ and s_x², by the equations

(31)  $$\bar{x} = a\bar{y} + b, \qquad s_x^2 = a^2 s_y^2.$$

If a is chosen to be the class width and b is taken to be one of the class marks (usually chosen near the middle of the range), then the yⱼ are integers, positive or negative, so that the computation is considerably simplified. After ȳ and s_y² are computed, x̄ and s_x² are found from eq. (31). The procedure is illustrated in tabular form in Table 1.
TABLE 1. COMPUTATION OF SAMPLE MEAN AND VARIANCE

Class Intervals    Class Mark      Frequency   Coded Mark
(aⱼ − aⱼ₋₁ = a)    xⱼ              fⱼ          yⱼ = (xⱼ − b)/a     fⱼyⱼ     fⱼyⱼ²

a₀--a₁             x₁ = a₀ + ½a     2            −5                −10       50
a₁--a₂             x₂ = a₁ + ½a     6            −4                −24       96
a₂--a₃             ⋯               ⋯            −3                 ⋯        ⋯
a₃--a₄             ⋯               ⋯            −2                 ⋯        ⋯
a₄--a₅             ⋯               ⋯            −1                 ⋯        ⋯
a₅--a₆             b = a₆ + ½a(?)   ⋯             0                 0        0
a₆--a₇             ⋯               ⋯             1                 ⋯        ⋯
a₇--a₈             ⋯               ⋯             2                 ⋯        ⋯
⋯                  ⋯               ⋯             ⋯                 ⋯        ⋯
aₕ₋₁--aₕ           ⋯               ⋯             ⋯                 ⋯        ⋯

Totals                              n                               nȳ       nm′₂,y

s_y² = m′₂,y − ȳ²,   x̄ = aȳ + b,   s_x² = a²s_y².
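The coding procedure of eqs. (27)-(31) amounts to a few lines of arithmetic. The sketch below (with made-up class marks and frequencies, used only for illustration) computes ȳ and s_y² from coded marks and then decodes by eq. (31).

```python
# Classified-data mean and variance via coding, eqs. (27)-(31).
# Class marks, frequencies, a, and b below are made-up illustrative data.
marks = [61.0, 62.0, 63.0, 64.0, 65.0]   # class marks x_j (class width a = 1)
freqs = [2, 6, 9, 5, 3]                  # frequencies f_j
a, b = 1.0, 63.0                         # b = a class mark near the middle

n  = sum(freqs)
ys = [(x - b) / a for x in marks]        # coded marks, small integers
ybar = sum(f * y for f, y in zip(freqs, ys)) / n
m2y  = sum(f * y * y for f, y in zip(freqs, ys)) / n
sy2  = m2y - ybar ** 2                   # eq. (29) applied to the y data

xbar = a * ybar + b                      # eq. (31)
sx2  = a * a * sy2
print(xbar, sx2)
```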

Distribution of Sample Moments

If some information is known concerning the distribution F(x) of the random variable X being measured, then one can draw conclusions as to the distributions of the sample moments. These conclusions in turn permit one to make statements as to the accuracy of the sample moments as estimates of the true moments. For example, suppose that the variable X is distributed uniformly over an interval of length 1; that is, F′(x) = f(x) = 1 for c ≤ x ≤ c + 1, and f(x) = 0 otherwise. If c is unknown, each sample will give information as to its value. A single measurement X then allows one to conclude that X − 1 ≤ c ≤ X; the mean c + ½ would be estimated as X, and one knows that, with probability 1, the mean lies between X − ½ and X + ½.
One now proceeds to list properties of the distribution of sample moments when various assumptions are made concerning the distribution F(x). These results are applied below to estimation of accuracy of the estimates.
Distribution of x̄ When σ Is Known. If X is normally distributed, then x̄ is also normally distributed, with mean μ and variance σ²/n (Sect. 3). Equivalently, one can state that

(32)  $$x^* = \frac{\bar{x} - \mu}{\sigma}\sqrt{n}$$

has a normal distribution of mean 0 and variance 1. The conclusion is approximately true even if X does not have a normal distribution, provided n is large. (See Chap. 12.)
Distribution of x̄ When σ Is Unknown. Let s = √s², the sample standard deviation, and let

(33)  $$t = \frac{\bar{x} - \mu}{s}\sqrt{n - 1},$$

so that t can be considered as a random variable. If X is normally distributed, then t has a Student t-distribution with n − 1 degrees of freedom. Again the conclusion is approximately true even if X does not have a normal distribution, provided n is large. Furthermore, the t-distribution approaches the normal distribution of mean 0 and variance 1 as n → ∞.
Distribution of s When σ Is Unknown. Let

(34)  $$u = \frac{ns^2}{\sigma^2}.$$

If X is normally distributed, then u has a χ²-distribution with n − 1 degrees of freedom. Again the conclusion is approximately true for large n, regardless of the form of F(x).
Confidence Intervals and Hypothesis Testing

The results described are now applied to obtain estimates for the accuracy of x̄ and s² as estimates of μ and σ². The accuracy will be described in the terminology of confidence intervals. The statement "the interval (a, b) is a 95 per cent confidence interval for μ" means that Pr(A ≤ μ ≤ B) is 0.95, where A, B are random variables with observed values a, b. One can also say "either a ≤ μ ≤ b or an event of probability only 0.05 has occurred in the sampling."
Confidence Intervals for μ When σ Is Known. The 95 per cent interval is obtained from the fact that (x̄ − μ)√n/σ has a normal distribution of mean 0 and variance 1. By means of tables (Sect. 9) one determines the number t₀.₉₅ on the normal density curve such that 95 per cent of the area lies between −t₀.₉₅ and t₀.₉₅; that is,

(35)  $$\Pr\left(-t_{0.95} \le \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \le t_{0.95}\right) = 0.95,$$

so that x̄ − t₀.₉₅σ/√n ≤ μ ≤ x̄ + t₀.₉₅σ/√n is a 95 per cent confidence interval for μ.
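Equation (35) turns into a two-line computation once t₀.₉₅ is read from the normal table (t₀.₉₅ = 1.96). The sketch below (illustrative; the data are invented) produces the 95 per cent confidence interval for μ when σ is known.

```python
# 95 per cent confidence interval for mu with sigma known, eq. (35).
# Illustrative sketch; the sample values and sigma are invented.
from math import sqrt

xs = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
sigma = 0.3                              # assumed known
t95 = 1.96                               # from the normal table (Table 2)

n = len(xs)
xbar = sum(xs) / n
half = t95 * sigma / sqrt(n)
print(xbar - half, xbar + half)          # the interval (a, b) for mu
```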

The χ²-test of goodness of fit (Sect. 6) is reliable provided n > 50 and k > 5.
Frequently in such a problem the hypothetical distribution is not completely specified, but contains some adjustable parameters. For example, one might wish to test whether a sample comes from a normal population, in which case the mean and variance of the population must first be estimated from the sample. It can be shown that the χ²-test usually remains valid, provided one further degree of freedom is subtracted for each parameter estimated. More precisely, in order for the test to be valid, the parameters must be estimated by the method of maximum likelihood. See Refs. 4, 5.
7. SEQUENTIAL ANALYSIS

The usual method of collecting data consists of the determination of a
fixed number of observations and their subsequent statistical analysis.
Frequently a considerable reduction in the number of observations required can be made by making the observations in sequence and reanalyzing the data after each observation. Such a process is known as a
sequential analysis and is particularly useful for such problems as production
testing.
EXAMPLE. Consider a population whose density function f(x; θ) depends on some parameter θ (mean, variance, etc.) whose value is not known; let us suppose that θ can take only one of two given values θ₀, θ₁. The problem is to decide which value is the correct one. In such a decision problem, errors can be made in two ways: by deciding that θ₁ is correct when θ₀ is actually the true value of θ, or by deciding that θ₀ is correct when θ₁ is actually the true value. Denote the probabilities to be assigned to these two types of errors by α and β respectively. The values of α and β can be preassigned by an experimenter, and clearly both should be small if one wants to have great confidence in one's decision; however, the smaller α and β are taken to be, the more observations will be required to come to a decision.
Let x₁, x₂, ⋯ be the sequence of observed values, and let f(x; θⱼ) denote the density function of the population when θⱼ is the true value of θ, j = 0, 1. Define the quantities

(51)  $$p_{jn} = \prod_{i=1}^{n} f(x_i;\,\theta_j) \qquad (j = 0, 1).$$

Each pⱼₙ can be found from the preceding one after each observation by multiplying by the corresponding f(xₙ; θⱼ). The decision rule is then the following.
If

(52)  $$\frac{p_{1n}}{p_{0n}} \le \frac{\beta}{1-\alpha},$$

accept θ₀ as the true value of θ; if

$$\frac{p_{1n}}{p_{0n}} \ge \frac{1-\beta}{\alpha},$$

accept θ₁; if neither inequality holds, take another observation and repeat the test.
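The decision rule is mechanized below for the simplest case, testing between two means of a normal population with known variance. This is a hedged sketch of the procedure just described (the densities, θ values, and error rates are invented for illustration), with p₁ₙ/p₀ₙ updated one observation at a time.

```python
# Sequential probability ratio test between theta0 and theta1 for a
# normal population with unit variance.  Illustrative sketch; all
# numerical settings are invented.
import math, random

theta0, theta1, alpha, beta = 0.0, 1.0, 0.05, 0.05
low, high = beta / (1 - alpha), (1 - beta) / alpha   # bounds of rule (52)

def f(x, theta):                         # density f(x; theta)
    return math.exp(-(x - theta) ** 2 / 2) / math.sqrt(2 * math.pi)

ratio, n = 1.0, 0
true_theta = theta1                      # the value generating the data
while low < ratio < high:
    x = random.gauss(true_theta, 1.0)
    ratio *= f(x, theta1) / f(x, theta0) # p_{1n}/p_{0n} updated in place
    n += 1
print("accept theta1" if ratio >= high else "accept theta0", "after", n, "obs")
```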

TABLE 2. THE NORMAL DISTRIBUTION FUNCTION (Ref. 10)

$$\Phi(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-x^2/2}\,dx \qquad \text{for } 0.00 \le u \le 2.99.$$

  u     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
  .0   .5000   .5040   .5080   .5120   .5160   .5199   .5239   .5279   .5319   .5359
  .1   .5398   .5438   .5478   .5517   .5557   .5596   .5636   .5675   .5714   .5753
  .2   .5793   .5832   .5871   .5910   .5948   .5987   .6026   .6064   .6103   .6141
  .3   .6179   .6217   .6255   .6293   .6331   .6368   .6406   .6443   .6480   .6517
  .4   .6554   .6591   .6628   .6664   .6700   .6736   .6772   .6808   .6844   .6879
  .5   .6915   .6950   .6985   .7019   .7054   .7088   .7123   .7157   .7190   .7224
  .6   .7257   .7291   .7324   .7357   .7389   .7422   .7454   .7486   .7517   .7549
  .7   .7580   .7611   .7642   .7673   .7703   .7734   .7764   .7794   .7823   .7852
  .8   .7881   .7910   .7939   .7967   .7995   .8023   .8051   .8078   .8106   .8133
  .9   .8159   .8186   .8212   .8238   .8264   .8289   .8315   .8340   .8365   .8389
 1.0   .8413   .8438   .8461   .8485   .8508   .8531   .8554   .8577   .8599   .8621
 1.1   .8643   .8665   .8686   .8708   .8729   .8749   .8770   .8790   .8810   .8830
 1.2   .8849   .8869   .8888   .8907   .8925   .8944   .8962   .8980   .8997   .90147
 1.3   .90320  .90490  .90658  .90824  .90988  .91149  .91309  .91466  .91621  .91774
 1.4   .91924  .92073  .92220  .92364  .92507  .92647  .92785  .92922  .93056  .93189
 1.5   .93319  .93448  .93574  .93699  .93822  .93943  .94062  .94179  .94295  .94408
 1.6   .94520  .94630  .94738  .94845  .94950  .95053  .95154  .95254  .95352  .95449
 1.7   .95543  .95637  .95728  .95818  .95907  .95994  .96080  .96164  .96246  .96327
 1.8   .96407  .96485  .96562  .96638  .96712  .96784  .96856  .96926  .96995  .97062
 1.9   .97128  .97193  .97257  .97320  .97381  .97441  .97500  .97558  .97615  .97670
 2.0   .97725  .97778  .97831  .97882  .97932  .97982  .98030  .98077  .98124  .98169
 2.1   .98214  .98257  .98300  .98341  .98382  .98422  .98461  .98500  .98537  .98574
 2.2   .98610  .98645  .98679  .98713  .98745  .98778  .98809  .98840  .98870  .98899
 2.3   .98928  .98956  .98983  .9²0097 .9²0358 .9²0613 .9²0863 .9²1106 .9²1344 .9²1576
 2.4   .9²1802 .9²2024 .9²2240 .9²2451 .9²2656 .9²2857 .9²3053 .9²3244 .9²3431 .9²3613
 2.5   .9²3790 .9²3963 .9²4132 .9²4297 .9²4457 .9²4614 .9²4766 .9²4915 .9²5060 .9²5201
 2.6   .9²5339 .9²5473 .9²5604 .9²5731 .9²5855 .9²5975 .9²6093 .9²6207 .9²6319 .9²6427
 2.7   .9²6533 .9²6636 .9²6736 .9²6833 .9²6928 .9²7020 .9²7110 .9²7197 .9²7282 .9²7365
 2.8   .9²7445 .9²7523 .9²7599 .9²7673 .9²7744 .9²7814 .9²7882 .9²7948 .9²8012 .9²8074
 2.9   .9²8134 .9²8193 .9²8250 .9²8305 .9²8359 .9²8411 .9²8462 .9²8511 .9²8559 .9²8605

Example: Φ(2.57) = .9²4915 = .994915 (the notation .9² indicates a repeated 9).

TABLE 3. THE χ² DISTRIBUTION

P = the probability of a χ² deviation greater than the tabulated value.

 n   P=0.99    0.98     0.95    0.90    0.80    0.70    0.50    0.30    0.20    0.10    0.05    0.02    0.01
 1   0.000157  0.000628 0.00393 0.0158  0.0642  0.148   0.455   1.074   1.642   2.706   3.841   5.412   6.635
 2   0.0201    0.0404   0.103   0.211   0.446   0.713   1.386   2.408   3.219   4.605   5.991   7.824   9.210
 3   0.115     0.185    0.352   0.584   1.005   1.424   2.366   3.665   4.642   6.251   7.815   9.837  11.341
 4   0.297     0.429    0.711   1.064   1.649   2.195   3.357   4.878   5.989   7.779   9.488  11.668  13.277
 5   0.554     0.752    1.145   1.610   2.343   3.000   4.351   6.064   7.289   9.236  11.070  13.388  15.086
 6   0.872     1.134    1.635   2.204   3.070   3.828   5.348   7.231   8.558  10.645  12.592  15.033  16.812
 7   1.239     1.564    2.167   2.833   3.822   4.671   6.346   8.383   9.803  12.017  14.067  16.622  18.475
 8   1.646     2.032    2.733   3.490   4.594   5.527   7.344   9.524  11.030  13.362  15.507  18.168  20.090
 9   2.088     2.532    3.325   4.168   5.380   6.393   8.343  10.656  12.242  14.684  16.919  19.679  21.666
10   2.558     3.059    3.940   4.865   6.179   7.267   9.342  11.781  13.442  15.987  18.307  21.161  23.209
11   3.053     3.609    4.575   5.578   6.989   8.148  10.341  12.899  14.631  17.275  19.675  22.618  24.725
12   3.571     4.178    5.226   6.304   7.807   9.034  11.340  14.011  15.812  18.549  21.026  24.054  26.217
13   4.107     4.765    5.892   7.042   8.634   9.926  12.340  15.119  16.985  19.812  22.362  25.472  27.688
14   4.660     5.368    6.571   7.790   9.467  10.821  13.339  16.222  18.151  21.064  23.685  26.873  29.141
15   5.229     5.985    7.261   8.547  10.307  11.721  14.339  17.322  19.311  22.307  24.996  28.259  30.578
16   5.812     6.614    7.962   9.312  11.152  12.624  15.338  18.418  20.465  23.542  26.296  29.633  32.000
17   6.408     7.255    8.672  10.085  12.002  13.531  16.338  19.511  21.615  24.769  27.587  30.995  33.409
18   7.015     7.906    9.390  10.865  12.857  14.440  17.338  20.601  22.760  25.989  28.869  32.346  34.805
19   7.633     8.567   10.117  11.651  13.716  15.352  18.338  21.689  23.900  27.204  30.144  33.687  36.191
20   8.260     9.237   10.851  12.443  14.578  16.266  19.337  22.775  25.038  28.412  31.410  35.020  37.566
21   8.897     9.915   11.591  13.240  15.445  17.182  20.337  23.858  26.171  29.615  32.671  36.343  38.932
22   9.542    10.600   12.338  14.041  16.314  18.101  21.337  24.939  27.301  30.813  33.924  37.659  40.289
23  10.196    11.293   13.091  14.848  17.187  19.021  22.337  26.018  28.429  32.007  35.172  38.968  41.638
24  10.856    11.992   13.848  15.659  18.062  19.943  23.337  27.096  29.553  33.196  36.415  40.270  42.980
25  11.524    12.697   14.611  16.473  18.940  20.867  24.337  28.172  30.675  34.382  37.652  41.566  44.314
26  12.198    13.409   15.379  17.292  19.820  21.792  25.336  29.246  31.795  35.563  38.885  42.856  45.642
27  12.879    14.125   16.151  18.114  20.703  22.719  26.336  30.319  32.912  36.741  40.113  44.140  46.963
28  13.565    14.847   16.928  18.939  21.588  23.647  27.336  31.391  34.027  37.916  41.337  45.419  48.278
29  14.256    15.574   17.708  19.768  22.475  24.577  28.336  32.461  35.139  39.087  42.557  46.693  49.588
30  14.953    16.306   18.493  20.599  23.364  25.508  29.336  33.530  36.250  40.256  43.773  47.962  50.892

For degrees of freedom greater than 30, the expression √(2χ²) − √(2n′ − 1) may be used as a normal deviate with unit variance, where n′ is the number of degrees of freedom.
Reproduced from Statistical Methods for Research Workers, 6th ed., with the permission of the author, R. A. Fisher, and his publisher, Oliver and Boyd, Edinburgh.

TABLE 4. STUDENT'S t DISTRIBUTION *

Probability of a deviation greater than t

 n     .005     .01     .025     .05      .1      .15
 1   63.657   31.821   12.706   6.314   3.078   1.963
 2    9.925    6.965    4.303   2.920   1.886   1.386
 3    5.841    4.541    3.182   2.353   1.638   1.250
 4    4.604    3.747    2.776   2.132   1.533   1.190
 5    4.032    3.365    2.571   2.015   1.476   1.156
 6    3.707    3.143    2.447   1.943   1.440   1.134
 7    3.499    2.998    2.365   1.895   1.415   1.119
 8    3.355    2.896    2.306   1.860   1.397   1.108
 9    3.250    2.821    2.262   1.833   1.383   1.100
10    3.169    2.764    2.228   1.812   1.372   1.093
11    3.106    2.718    2.201   1.796   1.363   1.088
12    3.055    2.681    2.179   1.782   1.356   1.083
13    3.012    2.650    2.160   1.771   1.350   1.079
14    2.977    2.624    2.145   1.761   1.345   1.076
15    2.947    2.602    2.131   1.753   1.341   1.074
16    2.921    2.583    2.120   1.746   1.337   1.071
17    2.898    2.567    2.110   1.740   1.333   1.069
18    2.878    2.552    2.101   1.734   1.330   1.067
19    2.861    2.539    2.093   1.729   1.328   1.066
20    2.845    2.528    2.086   1.725   1.325   1.064
21    2.831    2.518    2.080   1.721   1.323   1.063
22    2.819    2.508    2.074   1.717   1.321   1.061
23    2.807    2.500    2.069   1.714   1.319   1.060
24    2.797    2.492    2.064   1.711   1.318   1.059
25    2.787    2.485    2.060   1.708   1.316   1.058
26    2.779    2.479    2.056   1.706   1.315   1.058
27    2.771    2.473    2.052   1.703   1.314   1.057
28    2.763    2.467    2.048   1.701   1.313   1.056
29    2.756    2.462    2.045   1.699   1.311   1.055
30    2.750    2.457    2.042   1.697   1.310   1.055
 ∞    2.576    2.326    1.960   1.645   1.282   1.036

The probability of a deviation numerically greater than t is twice the probability given at the head of the table.

* This table is reproduced from Statistical Methods for Research Workers, with the generous permission of the author, Professor R. A. Fisher, and the publishers, Messrs. Oliver and Boyd.


REFERENCES
1. I. W. Burr, Engineering Statistics and Quality Control, McGraw-Hill, New York, 1953.
2. H. Cramér, The Elements of Probability Theory, Wiley, New York, 1954.
3. W. J. Dixon and F. J. Massey, Introduction to Statistical Analysis, McGraw-Hill, New York, 1951.
4. P. G. Hoel, Introduction to Mathematical Statistics, Wiley, New York, 1947.
5. A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill, New York, 1950.
6. J. Neyman, First Course in Probability and Statistics, Henry Holt, New York, 1950.
7. A. Wald, Sequential Analysis, Wiley, New York, 1947.
8. G. U. Yule and M. G. Kendall, Introduction to the Theory of Statistics, Griffin and Co., London, 1937.
9. Symposium on Monte Carlo Methods, H. A. Meyer, Editor, Wiley, New York, 1956.
10. A. Hald, Statistical Tables and Formulas, Wiley, New York, 1952.

B

NUMERICAL ANALYSIS

Richard F. Clippinger and Joseph H. Levin, Editors

14. Numerical Analysis, by Bernard Dimsdale, Murray Mannos, J. M. Cameron, R. F. Clippinger, J. B. Diaz, Bernard Friedman, Eugene Isaacson, and Robert Richtmyer

B

NUMERICAL ANALYSIS

Chapter

14

Numerical Analysis
Richard F. Clippinger
and Joseph H. Levin, Editors

1. Interpolation, Curve Fitting, Differentiation, and Integration, by Bernard Dimsdale    14-01
2. Matrix Inversion and Simultaneous Linear Equations, by Murray Mannos                   14-13
3. Eigenvalues and Eigenvectors, by Murray Mannos                                         14-28
4. Digital Techniques in Statistical Analysis of Experiments, by Joseph M. Cameron        14-48
5. Ordinary Differential Equations, by Richard F. Clippinger                              14-55
6. Partial Differential Equations, by J. B. Diaz, Richard F. Clippinger, Bernard Friedman, Eugene Isaacson, and Robert Richtmyer    14-64
   References                                                                             14-88

1. INTERPOLATION, CURVE FITTING, DIFFERENTIATION,
AND INTEGRATION

Bernard Dimsdale

Definitions. Suppose f(x) is a function about which the following is known: at each of n + 1 points x₀, x₁, ⋯, xₙ, called the basic set of points, the numerical value of f or of one of its derivatives is known. It is to be noted that x may represent one or more independent variables. Suppose g(x; a₀, a₁, ⋯, aₙ) is given analytically and the a's are determined so that g has the same numerical property as f at each point of the basic set. Then g is called an interpolating function for f, and R = f − g is called the remainder.
In the event that g is linear with respect to the a's, that is,

$$g(x; a) = a_0 g_0(x) + a_1 g_1(x) + \cdots + a_n g_n(x),$$

the interpolating function is called linear, and the functions g₀, g₁, ⋯, gₙ are called basic interpolating functions. In the further event that x is a single variable and gᵢ(x) = xⁱ, the function g is called an interpolating polynomial.
If a function g(x; a₀, ⋯, aₘ) is given analytically for m ≤ n, any requirement whatsoever on f − g over the basic set establishes g as a curve-fitting function. If that requirement is that

$$\sum_{i=0}^{n} [f(x_i) - g(x_i; a)]^2\,w(x_i)$$

be minimal, then g is a least square fit to f, relative to the weight function w, which is presumed to be positive. Again g may be nonlinear, linear, or polynomial.
Interpolation
General Solution of Interpolating Problem. For nonlinear g the definitions imply that the a's can be determined by solving n + 1 simultaneous nonlinear algebraic equations. For linear g the equations for a are linear, and the problem is solved when an (n + 1)st order matrix is inverted, which of course presupposes that it is not singular. No element of this matrix depends on the values of f or its derivatives, so that the inverted matrix can be used for all those functions f for which the conditions of interpolation, the basic interpolating functions, and the basic set of points are the same.
Interpolating Polynomials for Arbitrary Basic Point Sets. If the derivatives of f are not involved in the interpolation, then

$$g(x) = \sum_{i=0}^{n} f(x_i)\,\frac{h_i(x)}{h_i(x_i)}, \qquad R(x) = \frac{f^{(n+1)}(\xi)\,h(x)}{(n+1)!},$$

where h(x) is the product of all x − xᵥ, v = 0, 1, ⋯, n; hᵢ(x) is the same except that the factor x − xᵢ is deleted; f^{(n+1)}(ξ) is the (n + 1)st derivative of f(x); ξ is an unknown function of x, but is some number between the least and the greatest of the basic set of points. This is Lagrange's formula, and it has been put in practicable computing form by Aitken. Form the table

x₀   f₀,₀
x₁   f₁,₀   f₁,₁
x₂   f₂,₀   f₂,₁   f₂,₂
x₃   f₃,₀   f₃,₁   f₃,₂   f₃,₃

where fₖ,₀ = f(xₖ) and

$$f_{k,j+1} = f_{j,j} + \frac{(x_j - x)(f_{j,j} - f_{k,j})}{x_k - x_j}, \qquad k > j \ge 0.$$

If sufficient information about some derivative, say the pth, is available
to show that R = !(X) = O. Thus
four points would have been sufficient.
In the event that derivatives are also given, Neville's procedure applies
(see Ref. 1).
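Aitken's tabular scheme is conveniently programmed; the sketch below (an illustration, not taken from the handbook) builds the triangular array fₖ,ⱼ for a given abscissa x and returns fₙ,ₙ, the value of the interpolating polynomial there.

```python
# Aitken's iterated linear interpolation at a point x.
# Illustrative sketch; the test function is arbitrary.
def aitken(xs, fs, x):
    n = len(xs)
    f = [[v] for v in fs]                # f[k][0] = f(x_k)
    for j in range(n - 1):
        for k in range(j + 1, n):
            # f_{k,j+1} = f_{j,j} + (x_j - x)(f_{j,j} - f_{k,j})/(x_k - x_j)
            f[k].append(f[j][j] + (xs[j] - x) * (f[j][j] - f[k][j])
                        / (xs[k] - xs[j]))
    return f[n - 1][n - 1]

xs = [1.0, 1.5, 2.0, 3.0]
fs = [1.0, 2.25, 4.0, 9.0]               # f(x) = x^2, so the cubic is exact
print(aitken(xs, fs, 2.5))               # 6.25
```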
Interpolating Polynomials for Uniformly Spaced Points. In the event that the basic set of points has the property that x_{p+1} − x_p = h, where h does not change with p, the procedure to be followed, if derivatives do not enter, involves a difference table as follows:

x_p        f_p
                      Δf_p
x_{p+1}    f_{p+1}              Δ²f_p
                      Δf_{p+1}            Δ³f_p
x_{p+2}    f_{p+2}              Δ²f_{p+1}
                      Δf_{p+2}
x_{p+3}    f_{p+3}

where Δᵏf_q = Δᵏ⁻¹f_{q+1} − Δᵏ⁻¹f_q and f_j = f(x_j); that is, any element with a Δ is the difference of its two adjacent left neighbors, and is obtained by subtracting the upper one from the lower one, and the subscripts on f are constant along a line running diagonally downward to the right.
Let

$$u = \frac{x - x_0}{h}, \qquad (u)_r = \frac{u(u-1)\cdots(u-r+1)}{r!};$$

then

$$g(x) = f_0 + (u)_1\,\Delta f_0 + (u)_2\,\Delta^2 f_0 + \cdots + (u)_n\,\Delta^n f_0$$

and

$$R(x) = (u)_{n+1}\,h^{n+1}\,f^{(n+1)}(\xi).$$
If x > a, then |f^{(n)}(x)| ≤ 1.36 n!. If it is required that the remainder term shall not exceed 10⁻¹⁰, then for the above n's the h's are 0.0008, 0.013, 0.03, 0.06, and the number of evaluations of integrand per unit b − a is 1250, 77, 33, 16 respectively.
Gauss's Formula. For any n let

$$x_i = a + (b-a)\xi_i, \qquad i = 0, 1, \cdots, [n/2],$$

$$x_i = b - (b-a)\xi_{n-i}, \qquad i = [n/2]+1, \cdots, n.$$

Then, for n = 2N,

$$\int_a^b f(x)\,dx = (b-a)\Bigl[A_N f_N + \sum_{i=0}^{N-1} A_i\,(f_i + f_{2N-i})\Bigr];$$

for n = 2N + 1,

$$\int_a^b f(x)\,dx = (b-a)\sum_{i=0}^{N} A_i\,(f_i + f_{2N+1-i}),$$

where the A's and the ξ's are given in Table 4.
TABLE 4. VALUES OF Aᵢ AND ξᵢ IN GAUSS'S FORMULA

n = 1:  ξ₀ = 0.21132 48654    A₀ = 0.5
n = 2:  ξ₀ = 0.11270 16654    A₀ = 5/18
        ξ₁ = 0.5              A₁ = 4/9
n = 3:  ξ₀ = 0.06943 18442    A₀ = 0.17392 74226
        ξ₁ = 0.33000 94782    A₁ = 0.32607 25774
n = 4:  ξ₀ = 0.04691 00770    A₀ = 0.11846 34425
        ξ₁ = 0.23076 53449    A₁ = 0.23931 43352
        ξ₂ = 0.5              A₂ = 0.28444 44444
n = 5:  ξ₀ = 0.03376 52429    A₀ = 0.08566 22462
        ξ₁ = 0.16939 53068    A₁ = 0.18038 07865
        ξ₂ = 0.38069 04070    A₂ = 0.23395 69672
n = 6:  ξ₀ = 0.02544 60438    A₀ = 0.06474 24831
        ξ₁ = 0.12923 44072    A₁ = 0.13985 26957
        ξ₂ = 0.29707 74243    A₂ = 0.19091 50253
        ξ₃ = 0.5              A₃ = 0.20897 95918

The remainder term is of order h^{2n+1}. For further development, see Hobson (Ref. 6).

Other Integration Methods. Tchebysheff has developed a method in which the numerical integral has the form

$$c\,(f_0 + f_1 + \cdots + f_n),$$

which is useful if f represents data subject to uniform errors, since no error is weighted more than another.
For multiple integration the methods given here may be applied repeatedly. If the number of repeated integrations is quite large, the Monte Carlo method is useful.
For integrals over an infinite range and for infinite integrands, transformations of the variable of integration can frequently be found which remove the difficulty.

2. MATRIX INVERSION AND SIMULTANEOUS
LINEAR EQUATIONS

Murray Mannos

General Remarks. The development of large scale electronic digital computers has made it numerically possible to invert many large size matrices and to solve large systems of linear equations heretofore considered impractical because of their large size. Problems being attacked by matrix inversion include:
(a) The numerical solution of a differential equation, a partial differential equation, or an integral equation satisfying boundary conditions is often achieved by resolving the problem into a large approximating set of algebraic equations.
(b) A nonlinear problem is frequently replaced by a sequence of linear systems yielding successively improved approximations to the original problem.
(c) Large systems of linear equations, at least in part, are serving as preliminary models for economic and business type problems. The object in linear programming (see Chap. 15), for example, is to maximize (minimize) a linear objective function such as profit (cost) subject to the restraints imposed by a system of linear equations (or inequalities). If the inverse of the matrix of coefficients of a linear system of equations is already known, the solution to the system is obtained by merely multiplying the inverse by the column vector whose components consist of the constants on the right-hand side of the equalities. In the revised simplex technique (Ref. 7) designed for solving linear programming problems it is the inverse of certain basic column vectors that is calculated at each iteration or stage of the algorithm.
Practical ways of solving systems of linear equations are divided into two categories: the direct and the indirect methods.
(a) The direct method yields an exact solution in a finite number of steps provided no roundoff errors are permitted.
(b) The indirect method usually involves an infinite number of iterations to get an exact solution. In practice one accepts the fact that one cannot get a precise answer but must be satisfied with a result sufficiently close to the exact result. At this point in the indirect method the calculation is broken off. To be really sure that the answer is sufficiently close, either some estimate of roundoff errors must be made or the closeness must be determined perhaps by some physical considerations. Severity of roundoff errors may easily render results useless.
The discussion will be confined to matrices whose elements are real and to linear systems whose coefficients are real. Many of the methods and results described apply equally well to complex elements and coefficients simply by making appropriate word changes. Furthermore, any matrix of order n with complex coefficients may be represented by a real matrix of order 2n.
No "best method" for either inverting matrices or solving linear systems of equations can be recommended. For a given technique, a matrix or a linear system of equations can always be constructed which will not work too well but which may work better with some other technique. In some cases it is a combination of methods, perhaps a direct method followed by an indirect one, that works well for a system of linear equations.
Ill-conditioned matrices, of which the favorite example seems to be the Hilbert matrix, impose an extremely stringent test upon the accuracy of any given matrix inversion technique. A measure of the ill-conditioning of a matrix may be looked upon as the relative smallness of its determinant compared with that of its individual elements. This will suffice here, although more sophisticated measures could be used to interpret the notion of ill-conditioned matrices. The Hilbert matrix is denoted by H = (h_{ij}) where h_{ij} = 1/(i + j − 1) (i, j = 1, 2, ⋯, n).
Having obtained by a given technique a not entirely satisfactory approximation for the inverse of a matrix or for a solution to a system of linear equations, one may consider using techniques for improving the inverse of the matrix or the solution to the linear system of equations as the case may be.
To facilitate the evaluation of procedures for matrix inversion or solution of linear systems for use on digital computers, a summary table of approximate storage requirements and number of operations is presented at the end of the section.


Matrix Inversion

Each nonsingular square matrix A of order n has an inverse A⁻¹ such that

(1)  $$AA^{-1} = A^{-1}A = I.$$

If for A = (a_{ij}) the elements a_{ij} (i, j = 1, ⋯, n) are real, then the elements b_{ij} of A⁻¹ = (b_{ij}) (i, j = 1, ⋯, n) are also real. If the a_{ij} of the matrix A are specified, the problem is to find the numbers b_{ij} of A⁻¹. For certain types of matrices this is relatively simple.
(a) If D = (d_{ij}) is a diagonal matrix, that is, d_{ij} = 0, i ≠ j, and d_{ii} ≠ 0 (i = 1, 2, ⋯, n), then the elements of its inverse D⁻¹ = (b_{ij}) are b_{ij} = 0, i ≠ j, and b_{ii} = 1/d_{ii} (i = 1, 2, ⋯, n).
(b) If T = (a_{ij}) is a nonsingular lower triangular matrix, that is, a_{ij} = 0, i < j, and a_{ii} ≠ 0 (i = 1, 2, ⋯, n), the elements of its inverse T⁻¹ = B = (b_{ij}) can be obtained essentially by solving a series of linear equations in one unknown. Multiplying each of the columns of B by the first row of T yields

$$a_{11}b_{11} = 1; \qquad a_{11}b_{1j} = 0 \qquad (j = 2, \cdots, n).$$

This yields b₁₁ = 1/a₁₁ and b₁ⱼ = 0 (j = 2, ⋯, n). Similarly, multiplying B by the second row of T gives

$$a_{21}b_{1j} + a_{22}b_{2j} = \delta_{2j} \qquad (j = 1, \cdots, n).$$

Substituting the known b₁ⱼ (j = 1, ⋯, n) into the latter equations yields the values b₂ⱼ (j = 1, ⋯, n) from the resulting n linear equations, one in each of these unknowns. By continuing in this way, multiplication of each of the columns of B by the nth row of T gives

$$a_{n1}b_{1n} + a_{n2}b_{2n} + \cdots + a_{nn}b_{nn} = 1;$$

$$a_{n1}b_{1j} + a_{n2}b_{2j} + \cdots + a_{nn}b_{nj} = 0 \qquad (j = 1, \cdots, n-1).$$

Substituting the known b_{ij} (i = 1, ⋯, n − 1; j = 1, ⋯, n) yields the values b_{nj} (j = 1, ⋯, n) of the last row of B.
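The row-by-row solution just described translates directly into code. The sketch below (illustrative, not from the handbook) carries out exactly those substitutions.

```python
# Inverse of a nonsingular lower triangular matrix T by forward
# substitution, row by row as in (b).  Illustrative sketch.
def invert_lower_triangular(T):
    n = len(T)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Row i of T times column j of B must equal delta_ij.
            s = sum(T[i][k] * B[k][j] for k in range(i))
            rhs = 1.0 if i == j else 0.0
            B[i][j] = (rhs - s) / T[i][i]
    return B

T = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
print(invert_lower_triangular(T))   # check: T times the result is I
```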
(c) An old standard method for inverting matrices is given by A⁻¹ = (1/det A)(⋯), where the expression in parentheses is the transpose of the matrix of cofactors of the elements a_{ij} of the given matrix A. This method is not to be recommended as practical for n greater than 3 or 4.
(d) If one has already computed the characteristic polynomial, or better still the minimum polynomial of a matrix,

$$m(x) = x^m + a_1 x^{m-1} + \cdots + a_{m-1}x + a_m, \qquad a_m \ne 0,$$

then

$$A^{-1} = (-1/a_m)(A^{m-1} + a_1 A^{m-2} + \cdots + a_{m-1}I),$$

since A satisfies its minimum equation. In general it may be as much trouble calculating the characteristic or minimum polynomial as it is to invert the matrix itself.
(e) Let Aᵢ denote the ith row of the nonsingular matrix A and Iᵢ the ith row of the identity matrix. Then

$$A_i = \sum_{j=1}^{n} a_{ij} I_j \qquad (i = 1, \cdots, n).$$

If one has solved for the Iⱼ's in terms of the Aᵢ's, then

$$I_j = \sum_{k=1}^{n} b_{jk} A_k,$$

and the matrix of coefficients of the latter equation is the desired inverse, i.e., A⁻¹ = (b_{ij}). In general, this method is more cumbersome than a number of the methods described below.
Jordan-Gauss Method. Write the matrix A with the identity matrix beside it as shown:

(2)  $$\left[\begin{array}{cccc|cccc} a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0 \\ \vdots & & & \vdots & \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1 \end{array}\right]$$

A series of elementary row operations will be applied to A, and these will also be applied in the same order to I. When A has been reduced to I by a series of elementary row transformations, then I will in turn be transformed into A⁻¹ by the same transformations, and the process will be finished. If A is nonsingular, then for some i = 1, ⋯, n it follows that a_{i1} ≠ 0. One can by an exchange of rows guarantee that the element in the first row of the first column is different from zero.
In case the matrix (a_{ij}) has been altered by an exchange of rows, one now denotes the left-hand matrix of (2) by (b_{ij}). Then adding to the ith row −b_{i1}/b₁₁ times the first row (i = 2, ⋯, n), the new left-hand matrix of (2) takes the form

(3)  $$\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ 0 & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & c_{n2} & \cdots & c_{nn} \end{bmatrix}.$$

The minor of order n − 1 in the lower right-hand corner of the matrix (3) has rank n − 1, so that at least one of the elements c_{2j} ≠ 0 (j = 2, ⋯, n). Applying the same argument as before to this minor, all elements below the diagonal element of column 2 of the left-hand matrix in (2) may be reduced to zero. Similarly the element in the first row, second column may be reduced to zero. The first column remains unchanged while the second one has been altered to the desired form.
By continuing in this way the left-hand side of (2) may be reduced to the diagonal form

$$\begin{bmatrix} b_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & z_{nn} \end{bmatrix}$$

with each diagonal element different from 0. By dividing the first row of (2) by b₁₁, the second row by d₂₂, etc., the left-hand matrix of (2) is finally reduced to the identity and the right-hand matrix is now A⁻¹.
The diagonal elements b₁₁, d₂₂, ⋯ of the first, second, ⋯ columns which are used to reduce the remaining elements of their respective columns to zero are referred to as pivots. Care should be exercised whenever possible not to select a pivot which is too small or too large; otherwise, loss of significance among other difficulties may arise. Numerous variations of the use of elementary row operations for inverting matrices exist in the literature (Ref. 8).
Partition Method. Let the nonsingular n × n matrix A be partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where A₁₁ is an m × m minor (m < n) which is likewise nonsingular. Then the inverse A⁻¹ of A is given by the matrix

$$A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$

where

$$B_{11} = A_{11}^{-1} + X\Delta^{-1}Y, \qquad B_{12} = -X\Delta^{-1}, \qquad B_{21} = -\Delta^{-1}Y, \qquad B_{22} = \Delta^{-1},$$

and

$$X = A_{11}^{-1}A_{12}, \qquad Y = A_{21}A_{11}^{-1}, \qquad \Delta = A_{22} - A_{21}A_{11}^{-1}A_{12}.$$

Inverting a matrix of order n has thus been reduced to inverting a matrix of order m and another of order n − m. However, one has to pay the price of performing a number of matrix multiplications afterwards.
Morris Escalator Method. By starting with the inverse of the 2 × 2 principal minor M₂₂ in the upper left-hand corner of the nonsingular matrix A, one may by the partition method obtain the inverse of the 3 × 3 principal minor M₃₃ in the upper left-hand corner of A. Then M₃₃⁻¹ is used to compute the inverse M₄₄⁻¹ of the 4 × 4 principal minor in the first four rows and columns. Step by step, one dimension at a time, the partition procedure is carried out until A⁻¹ is obtained. The process is uninterrupted until the inverse of one of the M_{ii} fails to exist, a fact which is established by noting that the corresponding Δᵢ = 0. This situation is remedied by interchanging the ith row with an appropriate row, say the jth, of the remaining n − i rows of A, computing the inverse of the new i × i principal minor in the left-hand corner, and then continuing as before. In order to obtain A⁻¹ one must interchange the ith and jth columns of the resulting inverse so obtained. If several of the inverses of principal minors encountered fail to exist, a similar procedure applies in each instance.
Gram-Schmidt Orthogonalization Method. Premultiplication of
the nonsingular matrix A by an appropriate matrix P transforms A into an
orthogonal matrix O, i.e.,

(4)
$$PA = O.$$

Since the inverse of an orthogonal matrix is its own transpose, it follows
from eq. (4) that

$$A^{-1} = A'P'P,$$

where P = DN. Here N is the unit lower triangular matrix which carries
out the Gram-Schmidt process on the rows of A; that is, Q = NA, where the
rows of Q are built up by the recursion

$$Q_i = A_i - \sum_{j=1}^{i-1} c_{ij} Q_j, \qquad c_{ij} = \frac{A_i Q_j'}{Q_j Q_j'} \qquad (j = 1, 2, \cdots, i-1),$$

and D is the diagonal matrix

$$D = \begin{bmatrix} 1/|Q_1| & 0 & \cdots & 0 \\ 0 & 1/|Q_2| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/|Q_n| \end{bmatrix}.$$

Here A_i denotes the ith row of A, Q_j denotes the jth row of Q = NA, and
|Q_i| denotes the length of the ith row of Q considered as a row vector.
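A brief sketch of this inversion in code (illustrative; classical Gram-Schmidt on the rows of A, with N and D accumulated explicitly):

```python
import numpy as np

def gram_schmidt_inverse(A):
    """Invert A via PA = O with P = DN (Gram-Schmidt on the rows of A)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    Q = np.zeros_like(A)
    N = np.eye(n)                          # unit lower triangular, Q = N A
    for i in range(n):
        Q[i] = A[i]
        for j in range(i):
            c = (A[i] @ Q[j]) / (Q[j] @ Q[j])   # c_ij = A_i Q_j' / Q_j Q_j'
            Q[i] -= c * Q[j]
            N[i] -= c * N[j]               # same row combination applied to N
    D = np.diag(1.0 / np.linalg.norm(Q, axis=1))
    P = D @ N                              # P A has orthonormal rows
    return A.T @ P.T @ P                   # A^{-1} = A' P' P

A = np.array([[2., 1.], [1., 3.]])
print(np.allclose(gram_schmidt_inverse(A), np.linalg.inv(A)))  # True
```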
Inversion of Modified Matrices. If the inverse of a matrix A is
known, the inverse of a matrix differing from A in only an element, a row,
or a column can be found as a result. If the matrix differs from A by several
elements, rows, or columns, its inverse may be realized by repeated application of this method. The method is based on the matrix identity

(5)
$$(A + xy')^{-1} = A^{-1} - \frac{(A^{-1}x)(y'A^{-1})}{1 + y'A^{-1}x},$$

where x and y are arbitrary column vectors. The matrix xy' can be made
to consist of all zeros except the element in the ith row and jth column,
where it is to contain a fixed value c. This is easily achieved by taking
x = ce_i and y = e_j, where e_i is the unit column vector containing a 1 in
the ith position and 0 elsewhere. By taking y = e_i the matrix xy' has x
for its ith column and all other columns consist of zeros. Hence, if the
vector x stands for the vector difference of the ith column of the matrix
whose inverse is desired and the ith column of A, the required inverse is
obtained from eq. (5). A similar argument applies if the matrices differ
only in one row.
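In code, the rank-one update of eq. (5) may be rendered as follows (an illustrative sketch; in later literature this identity is commonly called the Sherman-Morrison formula, a name the text does not use):

```python
import numpy as np

def modified_inverse(A_inv, x, y):
    """Inverse of (A + x y') from the known inverse of A, by eq. (5)."""
    Ax = A_inv @ x                         # column vector A^{-1} x
    yA = y @ A_inv                         # row vector y' A^{-1}
    return A_inv - np.outer(Ax, yA) / (1.0 + y @ Ax)

# change the element in row i, column j of A by c: take x = c e_i, y = e_j
A = np.array([[4., 1.], [2., 3.]])
A_inv = np.linalg.inv(A)
i, j, c = 0, 1, 0.5
x = c * np.eye(2)[i]
y = np.eye(2)[j]
B = A.copy(); B[i, j] += c
print(np.allclose(modified_inverse(A_inv, x, y), np.linalg.inv(B)))  # True
```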
Improving a Computed Inverse (Hotelling and Bodewig, see Refs.
9 and 10). Suppose that the matrix C_0 is considered a sufficiently good
approximation to the inverse of the matrix A so that B = I - AC_0 has
very small elements. If necessary, for some specific purpose, the computed
inverse can be improved by forming the sequence

$$C_k = C_{k-1}\left(I + B^{2^{k-1}}\right), \qquad k = 1, 2, \cdots.$$

Actually, the sequence converges to A^{-1}, and so A^{-1} is expressible in the
following form of an infinite product
(6)
$$A^{-1} = C_0 \prod_{k=1}^{\infty} \left(I + B^{2^{k-1}}\right).$$

Very frequently the improvement found by computing C_0(I + B) or
perhaps C_0(I + B)(I + B²) is sufficiently satisfactory. Although there
are a number of variations other than eq. (6) for expressing A^{-1}, the
present scheme has some merit when using an electronic digital computing
machine, since it is only necessary to keep successively squared powers of
B, adding each to the identity matrix I and premultiplying by the last
computed approximation to A^{-1}.
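A sketch of the scheme in code (illustrative; it keeps only the successively squared powers of B, exactly as the text prescribes):

```python
import numpy as np

def improve_inverse(A, C0, steps=3):
    """Improve an approximate inverse C0 of A by eq. (6):
    C_k = C_{k-1} (I + B^(2^(k-1))) with B = I - A C0."""
    n = A.shape[0]
    I = np.eye(n)
    B = I - A @ C0                         # small if C0 is a good inverse
    C = C0
    for _ in range(steps):
        C = C @ (I + B)                    # premultiply by latest approximation
        B = B @ B                          # keep only the squared power of B
    return C

A = np.array([[4., 1.], [2., 3.]])
C0 = np.linalg.inv(A) + 0.01              # a deliberately perturbed inverse
print(np.abs(improve_inverse(A, C0) - np.linalg.inv(A)).max())  # tiny
```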
Systems of Linear Equations: Direct Methods

Direct methods arrive at an exact solution in a finite sequence of arithmetical operations.
Elimination. Given a set of m linear equations in n unknowns

(7)
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ \cdots\cdots\cdots\cdots\cdots\cdots & \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

or more briefly in matrix notation

Ax = b,
the augmented matrix (A|b) is operated on by a sequence of elementary
row operations which reduce the matrix of coefficients A to echelon form
(see Chap. 3). If a row of the reduced form of (A|b) is of the form (0,
0, ..., 0, c), where c ≠ 0, the system (7) is inconsistent; otherwise, it is
consistent. Arbitrary values are assigned to those x's which do not correspond to a leading coefficient of 1 in some row, while the remaining x's
may be solved for in terms of these parameters, one at a time, each as a linear
equation in one unknown whose coefficient is 1.
Note. In the remainder of this section only the case with m = n and
the matrix A nonsingular will be considered.
Use of Cramer's Rule. Let A^{(k)} denote the matrix constructed from A
in (7) by replacing column k by the column b of right-hand coefficients.
Then the unique solution to (7) is given by Cramer's rule in the following
form as a ratio of determinants:

$$x_k = \frac{\det A^{(k)}}{\det A} \qquad (k = 1, \cdots, n).$$

For n > 3 or 4 this method is not to be recommended as efficient.
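The rule is direct to state in code (a short illustrative sketch; for large n the repeated determinant evaluations make it far costlier than elimination):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule: x_k = det A^(k) / det A."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for k in range(len(b)):
        Ak = A.copy()
        Ak[:, k] = b                       # replace column k by b
        x[k] = np.linalg.det(Ak) / d
    return x

A = np.array([[2., 1.], [1., 3.]])
b = np.array([3., 5.])
print(cramer_solve(A, b), np.linalg.solve(A, b))  # identical answers
```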


Known Inverse. If the inverse A^{-1} of A has already been calculated
by any of the previously described or perhaps other methods, the solution
in matrix form is given by

$$x = A^{-1}b.$$

However, if A^{-1} must be computed for the sole purpose of getting x, the
method is not always efficient for large values of n.
Conjugate Gradient Method. Most of the iterative schemes involve an
infinite number of iterations and so are classified as indirect methods.
However, an outstanding iterative scheme called the conjugate gradient
method involves but a finite number of iterations and so is classified as
a direct method. Because of the way in which the algorithm for this
scheme is built up, it seems more appropriate to discuss it after the gradient
method, an indirect method. The elegant finite algorithm for the conjugate
gradient method seems to have been independently discovered by
Stiefel, Hestenes, and Lanczos (Ref. 11). For a linear system Ax = b,
det A ≠ 0, of n equations the algorithm starts with an initial guess x_0,
building up successive approximations x_1, ..., x_n, and finally terminates
after at most n of these steps or iterations. The corresponding residual
vectors

$$r_i = b - Ax_i \qquad (i = 0, 1, \cdots, n)$$

so formed are mutually orthogonal. If r_i ≠ 0
(i = 0, 1, ..., n - 1), then r_n orthogonal to each r_i means r_n must be the
null vector 0, since n + 1 linearly independent vectors of dimension n
cannot exist.
Systems of Linear Equations: Indirect Methods

By and large this discussion includes most iterative methods, since it
takes an infinite number of steps to carry through the whole process. An
iteration for solving a system of linear equations is a set of rules for operating on an approximate solution (x_1^(k), ..., x_n^(k)) to obtain an improved
or more precise solution (x_1^(k+1), ..., x_n^(k+1)). The sequence of approximate solutions so defined must converge to the actual solution of the
given system of equations. In some cases it is a pronounced advantage
to start out with a rather good initial approximation (x_1^(0), ..., x_n^(0)),
whereas in others this is not necessarily true. It is frequently advantageous
to improve the solution obtained by a direct method by a few iterations,
since the direct solution usually is afflicted with roundoff errors.
Seidel Method. One starts off with a guess (x_1^(0), x_2^(0), ..., x_n^(0))
as the initial solution to the linear system (7). Substituting in the first
equation of (7) the values x_2^(0) for x_2, x_3^(0) for x_3, ..., and finally x_n^(0)
for x_n, and then solving for x_1 yields a new value x_1^(1) as the first component
of the next approximate solution. Next, by substituting in the second
equation of (7) the newly gained value x_1^(1) for x_1 and x_3^(0) for x_3, ...,
x_n^(0) for x_n, and then solving for x_2, one obtains a new value x_2^(1) as the
second component of the next approximation. Continuing in this fashion
and finally substituting in the nth or last equation of (7) the values x_1^(1) for
x_1, x_2^(1) for x_2, ..., x_{n-1}^(1) for x_{n-1} and solving for x_n yields the final
component x_n^(1) of the new iteration (x_1^(1), x_2^(1), ..., x_n^(1)). The approximation (x_1^(1), x_2^(1), ..., x_n^(1)) is used in the next iteration
to obtain the improved approximation (x_1^(2), x_2^(2), ..., x_n^(2)). One
continues in this way.

The process is very well adapted to machine usage. Convergence is
assured either when the matrix of coefficients A is positive definite or when
the diagonal element of the ith row dominates the rest of the row for each
i, that is, when

$$|a_{ii}| > \sum_{j \neq i} |a_{ij}| \qquad (i = 1, 2, \cdots, n).$$

Convergence is also guaranteed for additional types of matrices, and there
are a number of variations of this procedure. In particular, the "back
and forth" Seidel method due to Aitken and Rosser was especially designed to handle those cases in which convergence of the regular Seidel
method was erratic.
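A compact sketch of the Seidel iteration in code (illustrative names; the loop updates each component in place, always using the newest available values, exactly as described above):

```python
import numpy as np

def seidel_solve(A, b, x0, iterations=25):
    """Seidel iteration: solve the ith equation for x_i, always using
    the newest available values of the other components."""
    x = np.array(x0, dtype=float)
    n = len(b)
    for _ in range(iterations):
        for i in range(n):
            s = A[i] @ x - A[i, i] * x[i]  # sum of the off-diagonal terms
            x[i] = (b[i] - s) / A[i, i]
    return x

A = np.array([[4., 1., 1.], [1., 5., 2.], [0., 2., 6.]])  # diagonally dominant
b = np.array([6., 8., 8.])
print(seidel_solve(A, b, np.zeros(3)))     # approaches np.linalg.solve(A, b)
```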
Relaxation Method. First write the system (7) in the form

(8)
$$\begin{aligned} b_1 - a_{11}x_1 - a_{12}x_2 - \cdots - a_{1n}x_n &= 0 \\ b_2 - a_{21}x_1 - a_{22}x_2 - \cdots - a_{2n}x_n &= 0 \\ \cdots\cdots\cdots\cdots\cdots\cdots & \\ b_n - a_{n1}x_1 - a_{n2}x_2 - \cdots - a_{nn}x_n &= 0 \end{aligned}$$

and assume that none of the diagonal elements a_{ii} (i = 1, ..., n) is equal
to zero. Then take x^(0) = (x_1^(0), x_2^(0), ..., x_i^(0), ..., x_n^(0)) as an initial
guess to the solution. If it should accidentally happen that x^(0) satisfies
(8), one is finished. If not, define the residual vector by r^(0) = (r_1^(0),
r_2^(0), ..., r_n^(0)), where r_i^(0) (i = 1, 2, ..., n) is the value or residual obtained by substituting x^(0) in the left-hand side of the ith equation of (8).
Suppose that r_i^(0) is a component of largest magnitude in r^(0). The object
then is to reduce the residual r_i^(0) to 0 by altering the value of the ith
component x_i^(0) of x^(0) while keeping the remaining components of x^(0)
fixed. The next trial solution x^(1) is constructed as follows:

$$x_k^{(1)} = x_k^{(0)} \qquad (k = 1, \cdots, n; \; k \neq i),$$
$$x_i^{(1)} = x_i^{(0)} + \frac{r_i^{(0)}}{a_{ii}}.$$

This effects a new set of residuals r^(1) with ith residual equal to 0. Select


the residual of maximum magnitude and similarly apply the same scheme
as above to obtain x^(2). This process is repeated again and again so as
ultimately to reduce all residuals as close to 0 as possible.

It is sometimes possible to speed up convergence by picking residuals
not necessarily of maximum magnitude. In fact, by varying several of
the variables at one time it may be possible to speed up convergence considerably. However, it would be very difficult to write a code including
many such variations and tricks.
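A minimal sketch of the basic relaxation step in code (illustrative; it always relaxes the residual of largest magnitude, the simplest of the strategies just discussed):

```python
import numpy as np

def relax_solve(A, b, x0, tol=1e-10, max_steps=200):
    """Relaxation: repeatedly zero the residual of largest magnitude
    by adjusting the corresponding component of x."""
    x = np.array(x0, dtype=float)
    for _ in range(max_steps):
        r = b - A @ x                      # residuals of the system (8)
        i = np.argmax(np.abs(r))           # component of largest magnitude
        if abs(r[i]) < tol:
            break
        x[i] += r[i] / A[i, i]             # drives the ith residual to zero
    return x

A = np.array([[4., 1., 1.], [1., 5., 2.], [0., 2., 6.]])
b = np.array([6., 8., 8.])
print(relax_solve(A, b, np.zeros(3)))      # approaches np.linalg.solve(A, b)
```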
Note. In the following sections it is often convenient to introduce a
measure or metric different from the usual one in order to cut down on the
amount of computation required.
Approximations. Let A be a symmetric positive definite matrix; then
the length of a vector x with respect to the metric A is defined as |x|_A =
(x'Ax)^{1/2}, and any two vectors x and y are conjugate or A-orthogonal if
x'Ay = 0. These are extensions of the usual definitions of length and
orthogonality. The latter may be obtained from the new definitions by
taking A = I.

With respect to the usual metric, |Ax - b|² = 0 if and only if Ax - b =
0. This means that solving Ax = b is equivalent to finding an x such that
|Ax - b|² is minimized, since it is known that 0 is its minimum value.
Likewise, with respect to the generalized metric B, |Ax - b|_B² = 0 if
and only if Ax - b = 0, since B must be positive definite.
Now let

(9)
$$f(x) = |Ax - b|_B^2$$

and consider the family of hyperellipsoids

(10)
$$f(x) = k,$$

where k may take on any constant value. Then the solution of the system
Ax = b is the common center of the family of ellipsoids eq. (10). The
game then is to construct a set of approximations x^(0), x^(1), ... which get
us to or close to this center. The more rapidly this happens, the less
computation is involved.
Gradient Method. Start with a guess x^(0) as an initial approximation
to the solution of Ax = b. The ellipsoid of the family (10) obtained by
setting k = f(x^(0)) passes through the point x^(0) in n-dimensional space.
Then proceed in the direction of the gradient of -f(x) at x^(0), that is,
along the inner normal to the ellipsoid f(x) = f(x^(0)). It is known that
f(x) decreases most rapidly along the latter direction, and so it is natural
to proceed in this direction until one arrives at the minimum of f(x) along
this inner normal. This happens at that point x^(1) where the inner normal
becomes a tangent to one of the family of ellipsoids in eq. (10). Similarly,
proceed along the inner normal of the ellipsoid f(x) = f(x^(1)) until the


minimum of f(x) in this direction is reached. Continue in this way and
work in closer and closer to the common center of the family of ellipsoids
in (10).

The algebraic procedure for solving Ax = b with respect to the metric
B according to the geometric scheme described above is as follows:

(a) Compute C = A'BA and c = A'Bb.

(b) Make an initial guess x^(0).

(c) Use the following algorithm to obtain the approximation x^(i+1)
from that of x^(i): (i) Calculate the vector z^(i) in the direction of the gradient
of f(x) at x^(i), i.e., z^(i) = Cx^(i) - c. (ii) Calculate

$$a_i = \frac{|z^{(i)}|^2}{z^{(i)'} C z^{(i)}}.$$

(iii) Obtain x^(i+1) = x^(i) - a_i z^(i), where the coefficient a_i determines the
minimum value of f(x) in eq. (9) along the inner normal to f(x) = f(x^(i))
at x^(i).

If A is a symmetric positive definite matrix, it is most convenient to
choose the metric B = A^{-1}, for then A replaces C and b replaces c throughout the above algorithm, with a resulting simplification.
A considerable advantage of the gradient method is that there need
not be an accumulation of roundoff error, since the vector z^(i) along the
gradient can be recalculated for each iteration. The function f(x) in eq.
(9) may be regarded as a measure of the closeness of an approximation
x to the true solution A^{-1}b. For the gradient method it is true that
f(x^(i+1)) < f(x^(i)) for each i and that f(x^(i)) approaches 0 in the limit;
that is, x^(i) converges steadily toward the true solution A^{-1}b. However,
it is still true that the convergence may be slow or, in other words, it may
take many iterations to get close to the center of the ellipsoids. A number
of variations of the gradient method have been devised to try to speed up
the convergence.
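Under the convenient choice B = A^{-1} for symmetric positive definite A (so that C = A and c = b), the method reads as follows in code (an illustrative sketch):

```python
import numpy as np

def gradient_solve(A, b, x0, iterations=100):
    """Gradient method for symmetric positive definite A with the
    metric B = A^{-1}, so that C = A and c = b in the algorithm."""
    x = np.array(x0, dtype=float)
    for _ in range(iterations):
        z = A @ x - b                      # gradient direction of f at x
        denom = z @ A @ z
        if denom == 0.0:                   # z = 0 means x is the solution
            break
        a = (z @ z) / denom                # minimizes f along the inner normal
        x = x - a * z
    return x

A = np.array([[3., 1.], [1., 2.]])
b = np.array([5., 5.])
print(gradient_solve(A, b, np.zeros(2)))   # approaches np.linalg.solve(A, b)
```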
Conjugate Gradient Method. First consider the case where A is
symmetric and positive definite. The object in the conjugate gradient
method as in the gradient method is to get to the common center of the
family of ellipsoids eq. (10). However, the route taken in the conjugate
gradient method is different from that of the gradient method and is so
modified as to get to the center of the family eq. (10) in but a finite number of steps, namely, at most n iterations.
The procedure in three dimensions will be described. The discussion in
higher dimensions follows along similar lines.
As before, make an initial guess x^(0) and proceed from x^(0) along the
negative gradient of f(x), or what is the same, along the inner normal of
the three-dimensional ellipsoid f(x) = f(x^(0)). Take as the next approximation the point x^(1), which is the midpoint of the resulting chord of
the ellipsoid f(x) = f(x^(0)). Consider the diametral plane through x^(1)
containing the locus of midpoints of the chords of f(x) = f(x^(0)) which
are parallel to the direction of the inner normal. The diametral plane so
formed cuts out a two-dimensional elliptic cross section from the ellipsoid
f(x) = f(x^(0)). The common center of the ellipsoids (10) of interest lies
in this two-dimensional elliptic cross section, and the method is designed
so that all subsequent approximating points shall remain trapped in this
cross section. The diametral plane of f(x) = f(x^(0)) is likewise a diametral
plane of the interior ellipsoid f(x) = f(x^(1)) of the family (10) and cuts
it in a two-dimensional elliptic cross section lying within the previous one
cut from f(x) = f(x^(0)). Next proceed from x^(1) along the gradient of
f(x) within the last elliptic cross section formed and take for x^(2) the
midpoint of the chord so formed in the ellipse. In other words, instead of
proceeding from x^(1) along the inner normal of the ellipsoid f(x) = f(x^(1))
as in the gradient method, proceed along the inner normal of its cross
section made by the diametral plane through x^(1).

Again the locus of centers of chords parallel to the chord through x^(1)
and x^(2) forms a diameter of the elliptic cross section of f(x) = f(x^(1)),
which contains not only x^(2) but also the center of the family (10). Next
proceed from x^(2) along this diameter, choosing its center as the new and
final approximation x^(3). Barring roundoff error, x^(3) yields the exact
solution to a linear system of three equations in three unknowns. If either
of the chords mentioned above passing through x^(0), x^(1) happens to pass
also through the center of the family (10), the process will end in only one
or two iterations, respectively, instead of three. This will be indicated by
the residual r^(1) = 0 or r^(2) = 0, respectively.
Algorithms for a symmetric positive definite matrix of order n and for the
general n-dimensional case, respectively, will be given below, where p^(i)
denotes a vector in the direction from x^(i) to x^(i+1).

(11)

(a) Pick x^(0); then let p^(0) = r^(0) = b - Ax^(0),

(b) a_i = |r^(i)|² / p^(i)'Ap^(i),

(c) x^(i+1) = x^(i) + a_i p^(i),

(d) r^(i+1) = r^(i) - a_i Ap^(i),

(e) b_i = |r^(i+1)|² / |r^(i)|²,

(f) p^(i+1) = r^(i+1) + b_i p^(i),


where the coefficient a_i is selected to make x^(i+1) the appropriate distance
from x^(i), and b_i to keep p^(i+1) in the appropriate direction as described
above.

The algorithm eqs. (11) may be applied to a matrix which is symmetric
and positive semidefinite as well as to a symmetric positive definite matrix.
In the case where A is a general matrix, the system Ax = b is replaced
by the equivalent system

(12)
$$A'Ax = A'b,$$

where A'A is symmetric and positive semidefinite. The algorithm
(11) could thus be applied to eq. (12), but in order to avoid the roundoff
errors due to computing A'A, it is better to use the following algorithm,
which leads to theoretically equivalent results.

(a) Pick x^(0); then let r^(0) = b - Ax^(0), p^(0) = A'r^(0),

(b) a_i = |A'r^(i)|² / |Ap^(i)|²,

(c) x^(i+1) = x^(i) + a_i p^(i),

(d) r^(i+1) = r^(i) - a_i Ap^(i),

(e) b_i = |A'r^(i+1)|² / |A'r^(i)|²,

(f) p^(i+1) = A'r^(i+1) + b_i p^(i).

The conjugate gradient method has numerous advantages in addition
to those already mentioned. One may start all over again with the last
approximation obtained as the initial approximation in order to nullify
the effects of accumulated roundoff errors. Also, each successive approximation is better than its predecessor. It is very important to note
that the given matrix is unchanged during the procedure so that the original data are used again and again. This permits use of special properties
of the given matrix such as its particular form or sparseness. A number
of variations of this technique have been devised.
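A sketch of algorithm (11) in code, under the stated assumption that A is symmetric and positive definite (names illustrative; the iteration stops after at most n steps, or earlier if a residual vanishes):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None):
    """Conjugate gradient method, eqs. (11), for symmetric positive
    definite A; terminates after at most n iterations."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                          # (a) initial residual
    p = r.copy()
    for _ in range(n):
        rr = r @ r
        if rr == 0.0:                      # zero residual: exact solution
            break
        a = rr / (p @ A @ p)               # (b)
        x = x + a * p                      # (c)
        r = r - a * (A @ p)                # (d)
        bi = (r @ r) / rr                  # (e)
        p = r + bi * p                     # (f)
    return x

A = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
b = np.array([1., 2., 3.])
print(np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b)))  # True
```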
A great many of the most important works in the field are to be found
in the extensive bibliographies of works by Forsythe and Householder
(Refs. 8, 9, 12, and 13).
Computer Storage Requirements and Number of Operations.

Storage requirements for a given problem will vary in general with the
machine, with the programmer, and with the layout of the program.
Hence, in Table 5 the number of storage locations required for the program
of a given technique of matrix inversion or solution of a linear system shall
simply be denoted by the symbol w.

A multiplication or a division will be identified simply as a multiplication.
Likewise an addition or a subtraction will be identified as an addition.
Since a multiplication requires from about 2 to 10 times as much time as
an addition on most computers, greater weight should be accordingly
apportioned to the number of multiplications. If the number of multiplications required for a given technique turns out to be, for example, 2n³ + 3n
+ 1, then 3n + 1 is negligible compared with 2n³ when n is sufficiently large.
One says the number of multiplications required in this case is of the order
2n³, and this is simply indicated by 2n³.

In the case of the indirect procedures such as the Seidel, relaxation, and
gradient methods, the number of iterations necessary for a satisfactory
solution varies from problem to problem. In fact, the number of iterations
required depends upon the original system of equations, the choice of the
initial solution, and the accuracy stipulated beforehand. In these cases
storage requirements and the number of operations are given for one iteration. For the conjugate gradient method these will be given totally for
all n iterations.

TABLE 5. COMPUTER STORAGE REQUIREMENTS AND NUMBER OF OPERATIONS
FOR MATRIX INVERSION AND LINEAR SYSTEMS OF EQUATIONS

n = the order of the matrix involved
w = the number of storage locations required for the computer program
of a given technique

Matrix Inversion

  Method              Storage Requirements     Multiplications    Additions
  Jordan-Gauss        n² + w                   n³                 n³
  Morris escalator    2n² + w                  (3/2)n³            (3/2)n³
  Gram-Schmidt        (3/2)n² + w              2n³                2n³
  Modified matrix     n² + 2n + w
    (a) one element                            n²                 n²
    (b) one row or column                      2n²                2n²
    (c) whole matrix                           2n³                2n³

Linear Systems of Equations

  Elimination                     n² + n + w           n³/3       n³/3
  Seidel (one iteration)          n² + n + w           n²         n²
  Relaxation (one iteration)      n² + n + w           n²         n²
  Gradient (one iteration)        n² + 5n + 1 + w      2n²        2n²
  Conjugate gradient (one iteration):
    Symmetric positive definite   2n² + 6n + 2 + w     n²         n²
    General case                  4n² + 5n + 2 + w     3n²        3n²


3. EIGENVALUES AND EIGENVECTORS

Murray Mannos

General Remarks. The characteristic equation of a matrix together
with the corresponding eigenvalues (characteristic values) and eigenvectors
(characteristic vectors) plays a fundamental role in the theory of mechanical
or electrical vibrations. Examples: the flutter vibrations of an airplane
wing, the elastic vibrations of a skyscraper or bridge, the buckling of an
elastic structure, the transient oscillations of an electric network, and
mechanical wave vibrations of molecules and atoms. Similar remarks
concerning direct and indirect methods, roundoff errors, etc., apply to
the finding of the eigenvalues and eigenvectors as to the inverting of a
matrix and the solution of a linear system of equations (see Refs. 8, 9,
and 12-14).

In practice, it usually happens that all the eigenvalues of a matrix are
distinct. This gives rise to a matrix A which can be diagonalized by a
similarity transformation. Under a similarity transformation the eigenvalues of A remain invariant. A symmetric matrix can be diagonalized
by an orthogonal transformation, and similarly a Hermitian matrix can
be diagonalized by a unitary transformation. Hence, these types of matrices are frequently singled out for special treatment by somewhat less
general methods than apply to the most general type of matrix. Matrices
which cannot be diagonalized by means of a similarity transformation,
or whose eigenvalues are multiple or very closely spaced, cause the procedures to become more complex. Results concerning the bounds of eigenvalues are sometimes useful in helping to isolate them. In numerous cases
it suffices to find either the dominant or the least eigenvalue.

The elements of the matrix A will usually be complex numbers, but they
may be confined to real numbers in some instances. The matrix A
itself will always be of finite order.
Approximations for digital computer storage requirements and number of
operations for finding the eigenvalues and eigenvectors of a matrix cannot
be given as readily as in the cases of matrix inversion and the solution of
systems of linear equations. This is because the solution of an eigenvalue
problem often consists of a number of major segments, such as an
iteration, the reduction of a matrix to a direct sum of triple diagonal
matrices whose sizes depend on the original matrix, the solution of complex
equations, the evaluation of transcendental functions at specific places,
or the consideration of a sequence of Sturm functions. In the case of the
triple diagonal method, consideration of the computer aspects has been
broken down in terms of the more important segments. Similarly, computer information for one step of the reduction process for finding eigenvalues of a symmetric matrix by the Jacobi method is also given in Table
6 at the end of this section.
Characteristic Polynomial. The characteristic polynomial f(λ) of a
matrix A of order n over the complex number system may be defined as

(13)
$$f(\lambda) = \det(\lambda I - A) = \det \begin{bmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{bmatrix} = \lambda^n + c_1 \lambda^{n-1} + \cdots + c_n.$$

The matrix λI - A has elements which are polynomials in λ with complex coefficients. The characteristic polynomial may be found by the following methods:
1. The theory of determinants for such matrices is developed along the
same lines as for those matrices whose elements are real or complex numbers.
Hence, det (λI - A) can be expanded along any row or column to obtain
the characteristic polynomial. This method is not to be recommended for
n > 3.
2. The coefficients c_1, c_2, ..., c_n of the characteristic polynomial in eq.
(13) may be obtained from subdeterminants of the matrix A itself: c_1 =
-(a_{11} + a_{22} + ... + a_{nn}) is the negative of the sum of the diagonal elements of A, or simply the negative of the trace of A; c_2 is the sum of the
determinants of the 2 × 2 principal minors of A (i.e., the totality of minors
having two of their elements on the diagonal of A); c_3 is the negative of
the sum of the determinants of the 3 × 3 principal minors of A; ...; c_n
= (-1)^n det A. Likewise, this method is not to be recommended for
n > 3.
3. A finite iterative scheme based on repeated premultiplication by the
matrix A yields the coefficients c_1, c_2, ..., c_n of (13) also. This is the so-called Souriau-Frame algorithm (a sketch in code is given after this list of
methods):

$$A_1 = A, \qquad c_1 = -\operatorname{trace} A_1,$$

$$A_k = A(A_{k-1} + c_{k-1} I), \qquad c_k = -\frac{\operatorname{trace} A_k}{k} \qquad (k = 2, 3, \cdots, n).$$

4. Another way of finding the characteristic polynomial of a matrix A is
to build it up one degree at a time by finding the characteristic polynomials
of the upper left-hand minors of A in increasing size.

Let M_i denote the upper left-hand minor of A of order i, I_i the unit
matrix of order i, and f_i(λ) the characteristic polynomial of M_i. Since

$$(\lambda I_i - M_i)\operatorname{adj}(\lambda I_i - M_i) = f_i(\lambda) I_i,$$

it follows from a consideration of the last column that

(14)
$$(\lambda I_i - M_i) \begin{bmatrix} b_{1i}(\lambda) \\ \vdots \\ b_{i-1,i}(\lambda) \\ f_{i-1}(\lambda) \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ f_i(\lambda) \end{bmatrix},$$

where

$$\bigl(b_{1i}(\lambda), \cdots, b_{i-1,i}(\lambda), f_{i-1}(\lambda)\bigr)'$$

is the ith or last column of adj (λI_i - M_i).

From the first i - 1 rows of the expressions in eq. (14) the coefficients
of the polynomials b_{ki}(λ) (k = 1, 2, ..., i - 1) are determined by comparing the various powers of λ. The leading coefficient of each of the b_{ki}(λ)
(k = 1, 2, ..., i - 1) is determined by comparing coefficients of λ^{i-1}.
Then by using these known coefficients and by comparing coefficients of
λ^{i-2}, the second coefficients of each of the polynomials b_{ki}(λ) (k = 1, 2,
..., i - 1) are obtained. By continuing in this way the b_{ki} (k = 1, 2, ...,
i - 1) are completely determined. If the known b_{ki} (k = 1, 2, ..., i - 1)
are now substituted in the resulting equation formed by setting the ith or
last rows of eq. (14) equal, the polynomial f_i(λ) is determined.

One first forms f_1(λ) = λ - a_{11} and uses the above technique to find
f_2(λ) from f_1(λ), etc., until finally f(λ) = f_n(λ) is obtained from f_{n-1}(λ).
5. The method of finite iterations may be used to obtain a polynomial
equation from which some of the eigenvalues of a matrix A may be obtained.
Let x ≠ 0 be an arbitrary vector and form Ax. If x and Ax are linearly
independent, form A²x. If x, Ax, and A²x are linearly independent, form
A³x, etc. Continue in this way until one ultimately comes to a sequence x,
Ax, A²x, ..., A^k x which is linearly dependent. This must happen for
k ≤ n, since at most n vectors are linearly independent. That is,

(15)
$$A^k x + c_1 A^{k-1} x + \cdots + c_k x = 0.$$

Form the corresponding polynomial

(16)
$$P_k(\lambda) = \lambda^k + c_1 \lambda^{k-1} + \cdots + c_k.$$

The polynomial P_k(λ) of eq. (16) is a factor of the minimum polynomial
m(λ) of A, which will be defined explicitly in the subsection on eigenvalues
and eigenvectors. If k = m, where m is the degree of m(λ), then P_k(λ)
coincides with the minimum polynomial m(λ). Finally, if k = n, then
P_k(λ), the minimum polynomial m(λ), and the characteristic polynomial
f(λ) all coincide.

The coefficients c_1, c_2, ..., c_k are obtained from eq. (15) by forming a
set of linear equations resulting from a comparison of components.
6. The necessity of testing for linear dependence and of solving a system of linear equations are disadvantages of the method of finite iterations.
However, the polynomial eq. (16) may be obtained while avoiding these
disadvantages by the so-called method of minimized iterations due to
Lanczos (Ref. 47).

Lanczos employs a finite algorithm involving the sequence of polynomials

$$P_0(\lambda) = 1,$$
$$P_1(\lambda) = (\lambda - a_0)P_0(\lambda),$$
$$P_i(\lambda) = (\lambda - a_{i-1})P_{i-1}(\lambda) - b_{i-2}P_{i-2}(\lambda) \qquad (i = 2, 3, \cdots),$$

and the vectors given by the equations

(17)
$$x_{i-1} = P_{i-1}(A)x_0, \qquad y_{i-1} = P_{i-1}(A')y_0,$$

where

(18)
$$a_{i-1} = \frac{y_{i-1}'\,A\,x_{i-1}}{y_{i-1}'\,x_{i-1}}, \qquad b_{i-2} = \frac{y_{i-1}'\,x_{i-1}}{y_{i-2}'\,x_{i-2}},$$

and x_0 ≠ 0, y_0 ≠ 0 are not orthogonal but otherwise arbitrary vectors.
The algorithm proceeds to calculate the vectors x_{i-1} and y_{i-1} until one
of them becomes zero and the process terminates. From x_0 and y_0 one gets
a_0 from the left-hand equation of (18) by setting i = 1. This determines
the polynomial P_1(λ), and in turn one gets the vectors x_1 and y_1 from eq.
(17) by setting i = 2. From x_1 and y_1 one gets the coefficients a_1 and b_0
from (18) by setting i = 2. This in turn determines P_2(λ), from which one
determines the vectors x_2 and y_2 by the use of eq. (17) with i = 3. Continuing in this manner ultimately shows that either the vector x_k = 0 or
y_k = 0 for some k. When this occurs the polynomial P_k(λ), whose coefficients are now determined, is singled out. The polynomial P_k(λ) as before
is a factor of the minimum polynomial m(λ) and coincides with m(λ) if
k = m, and with the characteristic polynomial f(λ) if k = n.
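As promised under method 3, the Souriau-Frame algorithm may be sketched in code as follows (illustrative; it returns the coefficients c_1, ..., c_n of eq. (13)):

```python
import numpy as np

def souriau_frame(A):
    """Souriau-Frame algorithm: coefficients c_1, ..., c_n of the
    characteristic polynomial via A_k = A(A_{k-1} + c_{k-1} I)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    Ak = A.copy()
    coeffs = []
    for k in range(1, n + 1):
        ck = -np.trace(Ak) / k             # c_k = -(trace A_k)/k
        coeffs.append(ck)
        Ak = A @ (Ak + ck * np.eye(n))     # next member of the sequence
    return coeffs

A = np.array([[2., 1.], [1., 3.]])
print(souriau_frame(A))                    # [-5.0, 5.0]: f(x) = x^2 - 5x + 5
```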
Determination of Eigenvalues and Eigenvectors. f(λ) = 0 is called
the characteristic equation of the matrix A, and the n roots of this equation
are called the eigenvalues of the matrix A. From eq. (13) it follows that if
λ is an eigenvalue of A, then det (λI - A) = f(λ) = 0, so that the system
of linear equations

$$Ax = \lambda x$$

has a nontrivial solution x ≠ 0, and any such solution x ≠ 0 is called an
eigenvector of the matrix A.

Once the coefficients of the characteristic polynomial have been determined, the characteristic equation can be solved by Graeffe's, Bernoulli's,
or any other known method for solving a polynomial equation to obtain
the eigenvalues of A. If n is fairly high, a large amount of precision in the
calculations must be exercised or roundoff error may easily invalidate the
results.

Apart from multiplicity it is possible to find the eigenvalues of A by
considering a polynomial equation of lower degree than n. In this connection the minimum polynomial of the matrix A will be defined below. By
the well-known Cayley-Hamilton theorem it follows that f(A) = 0. In
general, however, A satisfies polynomial equations of lower degree than
n = deg f(λ). One denotes by m(λ) that polynomial of lowest degree with
leading coefficient 1 such that m(A) = 0. This polynomial is unique.
Furthermore, the minimum polynomial m(λ) divides the characteristic
polynomial f(λ), and each of the eigenvalues of A is a root of m(λ) = 0.
The multiplicity of such a root λ of m(λ) = 0 is less than or equal to the
multiplicity of λ as a root of f(λ) = 0. Hence, if a certain procedure leads
to the construction of the minimum polynomial of A, it may be sufficient
to obtain the necessary information concerning the eigenvalues of A from
m(λ), which may be of considerably lower degree than the characteristic
polynomial of A and so easier to work with. If one denotes by g(λ) the
greatest common divisor of the polynomial elements of adj (λI - A), it
may be shown that

$$m(\lambda) = \frac{f(\lambda)}{g(\lambda)}.$$
Direct Methods

Apart from roundoff errors, the procedures described under the heading
of direct methods terminate in a finite number of steps with exact results.
The Escalator Method. If the eigenvalues of a symmetric matrix
A_i of order i are known and distinct, and the eigenvectors are also known,
the symmetric matrix

$$A_{i+1} = \begin{bmatrix} A_i & a_{i+1} \\ a_{i+1}' & a_{i+1,i+1} \end{bmatrix},$$

obtained by bordering A_i with an additional row and column, also has eigenvalues and eigenvectors which can be found in terms of the eigenvalues and
eigenvectors of A_i. Furthermore, the eigenvalues of A_{i+1} are distinct and
interlace with those of A_i.

Let λ_k (k = 1, 2, ..., i) denote the eigenvalues of A_i, and u_k denote the
eigenvectors of A_i. Then the eigenvalues of A_{i+1} are obtained by solving
the equation

$$a_{i+1}'\,U(\mu I - \Lambda)^{-1}U'\,a_{i+1} + a_{i+1,i+1} = \mu$$

for the i + 1 values of μ which satisfy this equation.

The eigenvector v_k (k = 1, 2, ..., i + 1) of A_{i+1} corresponding to μ_k
(k = 1, 2, ..., i + 1) may be given by

$$v_k = \bigl(U(\mu_k I - \Lambda)^{-1}U'a_{i+1},\; 1\bigr), \qquad k = 1, 2, \cdots, i + 1,$$

where U = (u_1, u_2, ..., u_i), Λ is the diagonal matrix with diagonal elements
λ_1, ..., λ_i, and U(μ_k I - Λ)^{-1}U'a_{i+1} gives the first i components of v_k.
Starting with the matrix (a_{11}), which has the eigenvalue a_{11} and the
eigenvector 1, yields the eigenvalues and eigenvectors of the matrix

$$A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$

and continuing step by step finally yields the eigenvalues and eigenvectors
of the matrix A itself. It should be observed that it is necessary to calculate the eigenvalues and eigenvectors of each of the submatrices A_i (i = 2, 3,
..., n - 1) as well as of the matrix A itself.
Triple Diagonal Method. Let A be a real symmetric matrix. The
method consists of first reducing the matrix A to a triple diagonal form by
means of a specially formed orthogonal transformation to be described below. Then the eigenvalues of the resulting matrix S, which are the same
as those of A, are obtained with the aid of a Sturm sequence of functions
consisting of the determinants of the first principal minors or upper left-hand corner minors of the matrix λI - S. Then also the eigenvectors of S
associated with an eigenvalue λ are obtained directly from the solution of
the homogeneous equations (λI - S)x = 0 because of their exceedingly
simple form. From the eigenvectors of S, one then constructs the eigenvectors of A itself.

1. In the triple diagonal form of a matrix each element not on the main
diagonal, the diagonal just above it, or the diagonal just below it is 0. To
obtain this form one attempts by appropriate orthogonal transformations
to reduce to 0 all elements of the first row beyond the second column and
likewise all elements of the first column beyond the second row. If all
these elements are already 0, no manipulation is required. If not, one next
looks at the element a_{12}. If a_{12} = 0 and a_{1j} is the first nonzero element of
the first row following a_{12}, interchange the second and jth columns and do
likewise with the second and jth rows. Thus the new element in the first
row, second column is nonzero. If a_{12} ≠ 0 to begin with, look at the element a_{13}. If a_{13} = 0, make an exchange similar to the one above so as to
bring a nonzero element into its position. If a_{13} ≠ 0, postmultiply A by
the orthogonal matrix R_{23} and premultiply A by R_{23}^{-1} = R'_{23}, where

(19)
$$R_{23} = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & c & -s & 0 & \cdots & 0 \\ 0 & s & c & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

and

$$c = \left[1 + \left(\frac{a_{13}}{a_{12}}\right)^2\right]^{-1/2}, \qquad s = \left(\frac{a_{13}}{a_{12}}\right)c.$$

This amounts to a rotation in the x_2x_3-plane, and c, s are the cosine and
sine, respectively, of appropriate angles for making the element a'_{13} of the
matrix R_{23}^{-1}AR_{23} = (a'_{ij}) equal to 0 and hence also making a'_{31} = 0.
Also a'_{12} has larger magnitude than a_{12}, and so a'_{12} ≠ 0 also. Furthermore,
a'_{1j} = a_{1j} and a'_{j1} = a_{j1} (j = 4, ..., n).
If a'_{14} ≠ 0, one may interchange the third and fourth columns and the third and fourth rows of
(a'_{ij}) and apply the same type of transformation as before, giving rise
to an additional 0 in the first row and column of the newly formed matrix.
If a'_{14} = 0, one looks at a'_{15}, etc. By continuing in this fashion one forms
a new matrix whose first row and first column, except possibly for the first
two elements in each case, consist of zeros.
2. The same scheme can next be applied to the resulting submatrix of
order n - 1 in the lower right-hand corner. Here instead of R_{23} one uses
R_{34}, where

(20)
$$R_{34} = \begin{bmatrix} 1 & 0 \\ 0 & U_{34} \end{bmatrix},$$

U_{34} being the matrix of order n - 1 which plays the same role for the submatrix as R_{23} does for A, to reduce all elements in the first row and column of the (n-1)st order submatrix to zero except possibly for the first two elements. No elements
in the first row or column of the nth order matrix are affected by this.
Continue in this way and, if necessary, finally use

$$R_{n-1,n} = \begin{bmatrix} I_{n-3} & 0 \\ 0 & U_{n-1,n} \end{bmatrix}$$

to effect the final reduction to the following triple diagonal form:

$$S = \begin{bmatrix} a_1 & b_1 & 0 & 0 & \cdots & 0 & 0 \\ b_1 & a_2 & b_2 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & a_3 & b_3 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-2} & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \cdots & 0 & b_{n-1} & a_n \end{bmatrix}.$$

It follows that

(21)
$$S = T'AT,$$

where T consists of a finite product of orthogonal matrices of the type of eqs.
(19), (20), etc., and also of the type obtained from interchanging two columns of the identity matrix.


3. If any b_i = 0, the eigenvalues and eigenvectors of S can be obtained
by a consideration of the eigenvalues and eigenvectors of each of the two
individual submatrices thus formed: one above and to the left of the vanishing b_i, and the other to the right and below, and both of lower order than
S. Further such subdivisions or simplifications are possible as several
additional b's may vanish. It will suffice then to treat the case b_i ≠ 0
(i = 1, 2, ..., n - 1).

4. Let

(22)
$$P_0(\lambda) = 1,$$
$$P_1(\lambda) = \lambda - a_1,$$
$$P_i(\lambda) = (\lambda - a_i)P_{i-1}(\lambda) - b_{i-1}^2 P_{i-2}(\lambda) \qquad (i = 2, 3, \cdots, n).$$

By expanding the determinant of the first principal minor of λI - S whose
order is i in terms of the ith row and ith column, one obtains the last line
of eq. (22).
If b_i ≠ 0 (i = 1, 2, ..., n - 1), the polynomials P_n(λ) = det (λI - S),
P_{n-1}(λ), ..., P_1(λ), P_0(λ) = 1 form a Sturm sequence. This means that
the eigenvalues of S are distinct and may be isolated. Suppose that c < d
are two real numbers which are not roots of P_n(λ). Then the number of
variations in sign of P_n(c), P_{n-1}(c), ..., P_1(c), 1 minus the number of
variations in sign of P_n(d), P_{n-1}(d), ..., P_1(d), 1 yields the exact number
of eigenvalues of A between c and d.
5. Once an eigenvalue λ is determined, the homogeneous equations
(λI - S)x = 0 can be solved to obtain the associated eigenvector x. The
equations when written out have the form

(23)
$$x_2 = \frac{1}{b_1}(\lambda - a_1)x_1,$$
$$x_3 = \frac{1}{b_2}\left[(\lambda - a_2)x_2 - b_1 x_1\right],$$
$$x_i = \frac{1}{b_{i-1}}\left[(\lambda - a_{i-1})x_{i-1} - b_{i-2}x_{i-2}\right] \qquad (i = 4, \cdots, n).$$

It follows from eqs. (23) that if x_1 is taken as an arbitrary nonzero real
number, then x_2, x_3, ..., x_n can be obtained in turn. The last equation of
the system (λI - S)x = 0 may be used as a check. When this has been done for each of the λ's
and one has all the eigenvectors of S, one must turn attention to finding
the eigenvectors of A.


6. Sx = λx by virtue of eq. (21) implies (T'AT)x = λx or A(Tx) =
λ(Tx). Hence Tx, where x is an eigenvector of S associated with λ, is the
eigenvector of A associated with the eigenvalue λ.
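The Sturm sequence of step 4 is easy to evaluate in code; the following sketch (illustrative) counts the eigenvalues of S in an interval (c, d) from the sign variations of eq. (22):

```python
def sturm_sign_variations(a, b, lam):
    """Sign variations of the Sturm sequence P_0(lam), ..., P_n(lam) of
    eq. (22) for the triple diagonal matrix with diagonal a, off-diagonal b."""
    seq = [1.0, lam - a[0]]                # P_0 and P_1
    for i in range(1, len(a)):
        seq.append((lam - a[i]) * seq[-1] - b[i - 1] ** 2 * seq[-2])
    signs = [s for s in seq if s != 0.0]   # exact zeros carry no variation
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

def eigenvalues_between(a, b, c, d):
    """Exact number of eigenvalues of S in (c, d), per the Sturm count."""
    return sturm_sign_variations(a, b, c) - sturm_sign_variations(a, b, d)

a = [2.0, 2.0, 2.0]                        # diagonal elements a_i of S
b = [1.0, 1.0]                             # off-diagonal elements b_i of S
print(eigenvalues_between(a, b, 0.0, 3.0)) # prints 2: roots 2 - sqrt(2) and 2
```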
Adjoint of λI - A and Eigenvectors. Here one assumes that the
eigenvalues λ_i (i = 1, 2, ..., n), not necessarily distinct, have already
been found. The adj (λI - A), its derivative, or perhaps one of its higher
derivatives, when evaluated at λ = λ_i, presents fertile territory for finding
the eigenvectors associated with λ_i.

The adj (λI - A) is a matrix whose elements are polynomials in λ but
may also be viewed as a polynomial in λ with matrix coefficients. If one
writes

$$F(\lambda) = \operatorname{adj}(\lambda I - A) = F_0\lambda^{n-1} + F_1\lambda^{n-2} + \cdots + F_{n-1},$$

then from

$$F(\lambda)(\lambda I - A) = f(\lambda)I = I\lambda^n + c_1 I\lambda^{n-1} + \cdots + c_n I$$

one may determine the matrix coefficients F_0, F_1, ..., F_{n-1} by expanding
and comparing coefficients of λ. These are

$$F_0 = I,$$
$$F_1 = F_0 A + c_1 I,$$
$$F_2 = F_1 A + c_2 I,$$
$$\cdots$$
$$F_{n-1} = F_{n-2}A + c_{n-1}I.$$
If λ_i is a simple root of f(λ) = 0, then F(λ_i) is of rank 1, and a nonzero
column of F(λ_i) is an eigenvector of A associated with λ_i. There exists in
this case only one linearly independent eigenvector of A associated with λ_i.
If λ_i is a root of f(λ) = 0 of multiplicity 2, there can exist two linearly independent eigenvectors of A associated with λ_i. But this need not be the
case, as there may exist only one linearly independent eigenvector associated with λ_i. In the latter case F(λ_i) again is of rank 1, and any nonzero
column of F(λ_i) is an eigenvector associated with λ_i. On the other hand, if
there exist two linearly independent eigenvectors associated with λ_i, F(λ_i)
turns out to be the zero matrix. But F'(λ_i), the derivative of F(λ) at λ_i, is
of rank 2, and any two linearly independent columns of F'(λ_i) are such
eigenvectors associated with λ_i. Likewise, if λ_i is a triple root of f(λ) = 0,
there can be three linearly independent eigenvectors associated with λ_i, but
here again this need not be the case. There may be only two linearly independent eigenvectors or even only one. Again, if only one linearly independent eigenvector is associated with λ_i, any nonzero column of F(λ_i),
which has rank 1, is the desired eigenvector associated with λ_i. If there are
two linearly independent eigenvectors associated with λ_i, F(λ_i) = 0 and
F'(λ_i) is of rank 2, and any two linearly independent columns yield the
desired eigenvectors. Lastly, if there are three linearly independent eigenvectors, F(λ_i) = F'(λ_i) = 0. But F''(λ_i) is of rank 3, and any three linearly
independent columns of F''(λ_i) are the desired eigenvectors associated with
λ_i. This procedure can be extended all the way to a root of f(λ) = 0 having
multiplicity n.
Indirect Methods

Here the number of arithmetic operations necessary to arrive at exact
answers is infinite. The procedures are iterative and the eigenvalues and
eigenvectors of a matrix A are found without explicitly calculating the
characteristic polynomial of A.
Iterative Procedures for Hermitian Matrices. It is easier to handle
the case of a Hermitian matrix, since it has real eigenvalues; eigenvectors
associated with distinct eigenvalues are mutually orthogonal; and, because
it can be diagonalized, the multiplicity of each eigenvalue λ equals the
number of linearly independent eigenvectors associated with λ.

Assume, for the time being, that the eigenvalues of a given Hermitian
matrix A are distinct. Also, all the eigenvalues of A + pI, which is also
Hermitian, can be made positive by picking p sufficiently large, so that there
is no restriction in assuming that the matrix A has a single dominant eigenvalue, i.e., an eigenvalue whose absolute value is greater than that of any
other eigenvalue of A. One first concentrates attention upon a method of
finding the dominant eigenvalue and its associated eigenvector. Several
methods are then available for finding the remaining eigenvalues of A.

The procedure starts with an initial vector x_0 and by repeated premultiplication by A builds up the sequence of vectors

(24)
$$x_p = Ax_{p-1} = A^p x_0 \qquad (p = 1, 2, \cdots).$$

In the nonexceptional case, for p sufficiently large the direction of the
vector x_p will approach the direction of the eigenvector u_1 associated with
the dominant eigenvalue λ_1. In the exceptional cases, x_p will approach
either some u_i associated with the eigenvalue λ_i (i = 2, ..., n) or else 0.
The latter rarely happens, but at any rate an x_0 can easily be picked so
that the former case will apply. Again, if p is sufficiently large, the ratio
of the ith component (i = 1, 2, ..., n) of x_{p+1} to that of x_p can be made
arbitrarily close to the dominant eigenvalue. The closeness with which
these ratios agree may be regarded as a measure of the accuracy of the
approximation to λ_1. An error made during the course of the computation
of x_p will not lead to an erroneous result, since subsequent multiplication
by A will pull the computation back into line.

One may alternatively calculate λ_1 by means of a ratio of numbers as
defined below. Let

$$a_p = x_0' x_p \qquad (p = 1, 2, \cdots);$$

then

$$\lambda_1 = \lim_{p \to \infty} \frac{a_{p+1}}{a_p}.$$

If next one desires to find the minimum eigenvalue λ_n of A together with
its associated eigenvector, one may consider the matrix cI - A where
c > λ_1. The matrix cI - A is Hermitian, and the same techniques may
be applied to it to find its maximum eigenvalue and its associated eigenvector. To get the minimum eigenvalue λ_n of A, one simply changes the
sign of the maximum eigenvalue of cI - A and adds c. The eigenvector
associated with the maximum eigenvalue of cI - A is also the eigenvector
associated with the minimum eigenvalue λ_n of A.
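A sketch of this iteration in code (illustrative; the vector is rescaled each step so that its components stay bounded, and the eigenvalue estimate shown uses the Rayleigh quotient x'Ax/x'x, a variant of the component ratios described above):

```python
import numpy as np

def dominant_eigen(A, x0, iterations=100):
    """Power iteration, eq. (24): repeated premultiplication by A gives
    the dominant eigenvalue and its eigenvector (nonexceptional case)."""
    x = np.array(x0, dtype=float)
    lam = 0.0
    for _ in range(iterations):
        y = A @ x
        lam = (x @ y) / (x @ x)            # Rayleigh-quotient estimate of lambda_1
        x = y / np.linalg.norm(y)          # rescale to avoid overflow
    return lam, x

A = np.array([[2., 1.], [1., 3.]])         # real symmetric, hence Hermitian
lam, u = dominant_eigen(A, [1., 1.])
print(lam)                                 # approaches (5 + sqrt(5))/2 = 3.618...
```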
After λ_1 and u_1 have been calculated, the determination of the remaining n - 1 eigenvalues and their associated eigenvectors of the nth order
matrix A may be done in terms of a matrix whose order is n - 1 instead
of n. If the normalized form of u_1 is denoted by u_1*, i.e., u_1* = u_1/|u_1|,
from the vector u_1* a unitary matrix U may be constructed so that

$$U^{-1}AU = \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix},$$

where A_1 is a Hermitian matrix of order n - 1 whose eigenvalues λ_2, λ_3,
..., λ_n are the n - 1 remaining unknown eigenvalues of A. The dominant
eigenvalue λ_2 of A_1 and its associated eigenvector v_2 can be found as previously. The eigenvector u_2* associated with the eigenvalue λ_2 of A is the
vector U(0, v_2)'. The following construction of the unitary matrix U is due
to Feller and Forsythe. One writes u_1* as follows:

$$u_1^* = \begin{pmatrix} a \\ z \end{pmatrix},$$

where a is a complex number and z is an n - 1 dimensional vector with
complex components. Then

$$U = \begin{bmatrix} a & -\bar{z}' \\ z & I_{n-1} - kz\bar{z}' \end{bmatrix},$$

where k = (1 - a)/(1 - a\bar{a}).

Next one replaces A_1 by the matrix

$$\begin{bmatrix} \lambda_2 & 0 \\ 0 & A_2 \end{bmatrix},$$

where A_2 is a Hermitian matrix of order n - 2 having eigenvalues λ_3, ...,
λ_n, and then one repeats the previous step. This is continued until all
eigenvalues and eigenvectors of A are obtained.

Another way to find the eigenvalues λ_2, ..., λ_n and their associated
eigenvectors, once λ_1 and u_1* are known, is to form the new Hermitian
matrix

(25)
$$A_1 = A - \lambda_1 u_1^* \bar{u}_1^{*\prime}$$

of order n also. The eigenvalues of A_1 are 0, λ_2, ..., λ_n. The known eigenvector u_1* is associated with the eigenvalue 0 of A_1, while the unknown
eigenvector u_i* is associated with the eigenvalue λ_i (i = 2, ..., n) of A_1 as
well as of A. Thus the dominant eigenvalue λ_2 of A_1 and its associated
eigenvector u_2* can be found as before by forming powers of A_1 instead of
powers of A as in eq. (24).

Next one forms the Hermitian matrix

$$A_2 = A_1 - \lambda_2 u_2^* \bar{u}_2^{*\prime}$$

of order n, which has eigenvalues 0, 0, λ_3, ..., λ_n; the unknown u_i* is
associated with the λ_i (i = 3, ..., n) of A_2 as well as of A. Thus one obtains λ_3 and its associated eigenvector u_3*. Again one continues in this
fashion until all eigenvalues of A and their associated eigenvectors are
found.
Multiple Roots. So far the possibility of multiple roots has not been
considered. Suppose, as before, one starts with x_0 and builds up sequence
(24); one obtains as before an eigenvector associated with λ_1. A distinct
starting vector y_0 may be selected to build up a new sequence which will
again lead to the eigenvalue λ_1. But it may happen that y_0 leads to an
eigenvector which is linearly independent of the one to which x_0 leads. In
this case λ_1 is a multiple eigenvalue. If y_0, as in the case of distinct eigenvalues, leads only to an eigenvector which is linearly dependent on, or simply
a multiple of, the eigenvector to which x_0 leads, then λ_1 is a simple eigenvalue. If x_0 and y_0 lead to linearly independent eigenvectors, and a third
arbitrary vector z_0 leads to an eigenvector which is linearly dependent upon
the first two eigenvectors, λ_1 is an eigenvalue of multiplicity 2; whereas,
if z_0 leads to an eigenvector linearly independent of the first two calculated eigenvectors, λ_1 is at least of multiplicity 3. One can continue this
process for eigenvalues of higher multiplicity also.


Let λ_1 be a root of multiplicity 2. Then in the two-dimensional vector
space generated by the two linearly independent eigenvectors obtained, one
may select u_1*, u_2*, which are orthogonal and of unit length, and which
are eigenvectors associated with λ_1. By starting with u_1* and u_2* one may
similarly, as before, build up a unitary matrix U such that

$$U^{-1}AU = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & A_2 \end{bmatrix},$$

where A_2 is a Hermitian matrix of order n - 2 containing the remaining
eigenvalues λ_3, ..., λ_n of A. Similarly, one proceeds in the case of multiple eigenvalues, as outlined before.
A number of additional variations for obtaining the eigenvalues and eigenvectors of A are possible.
Iterative Process for General Type Matrices. If the matrix A can
be diagonalized, the method of successively premultiplying by A applies,
with some small appropriate modifications, to this case as well as to the
case of the Hermitian matrix. No longer are the eigenvalues of A necessarily real. There may be several distinct dominant eigenvalues. The
eigenvectors of A can no longer be assumed mutually orthogonal. In order
to get around this situation, one introduces the concept of row eigenvectors
as well as column eigenvectors. Associated with each eigenvalue λ_i of A
is the row eigenvector u^(i), where u^(i)A = λ_i u^(i), and the column eigenvector u_i, where Au_i = λ_i u_i (i = 1, 2, ..., n). In this case

$$u^{(i)} u_j = 0, \qquad i \neq j.$$

Here u^(i) and u_i (i = 1, 2, ..., n) need not be unit vectors; it is only required that

$$u^{(i)} u_i = 1 \qquad (i = 1, 2, \cdots, n).$$

One again starts with an arbitrary initial vector x_0 and forms the sequence
of eq. (24). A unique dominant eigenvalue and its associated eigenvector
are found exactly as before. Whereas in the case of a Hermitian matrix
one forms the matrix A_1 as in eq. (25) in order to study the remaining eigenvalues and their associated eigenvectors, one now forms the matrix

$$A_1 = A - \lambda_1 u_1 u^{(1)}.$$

Finding the eigenvalues in the case where several eigenvalues are dominant is more complicated, as these are not computed as a simple ratio but
rather as described below. Suppose that x_{1p}, x_{2p}, ..., x_{kp} are k linearly
independent vectors obtained from the sequence (24). For p sufficiently
large, these are arbitrarily close to the actual eigenvectors. It is desired
to find the eigenvalues λ_1, λ_2, ..., λ_k associated with these. Take z as an
arbitrary vector and form

$$a_{ip} = z' x_{ip} \qquad (i = 1, 2, \cdots, k);$$

then

$$\det \begin{bmatrix} 1 & a_{1p} & a_{2p} & \cdots & a_{kp} \\ \lambda & a_{1,p+1} & a_{2,p+1} & \cdots & a_{k,p+1} \\ \lambda^2 & a_{1,p+2} & a_{2,p+2} & \cdots & a_{k,p+2} \\ \vdots & \vdots & \vdots & & \vdots \\ \lambda^k & a_{1,p+k} & a_{2,p+k} & \cdots & a_{k,p+k} \end{bmatrix} = 0$$

has k roots which are close approximations to the eigenvalues λ_1, λ_2, ..., λ_k.
The great majority of matrices appearing in applications have distinct
eigenvalues and so can be diagonalized. Therefore, the method of iterating by premultiplication by a given matrix is applicable. In the rare case
in which A has a root of multiplicity r whose associated eigenvectors
number less than the full complement of r, it is not possible to diagonalize
A. Nevertheless, even in this case, it is still possible to use this iteration
scheme to find the dominant eigenvalue and the associated eigenvector of
a matrix A having but a single dominant eigenvalue. One must, however,
consider the linear dependence of a finite number of successive x_p's in
sequence (24) for p sufficiently large to obtain the dominant eigenvalue
λ_1. The associated eigenvector may be obtained as a linear combination
of a finite number of the x_p's whose components contain powers of λ_1.
Jacobi Method. The technique applies to Hermitian and so to real
symmetric matrices too. The method hinges on the fact that a 2 × 2
Hermitian matrix

$$H = \begin{bmatrix} a_{11} & ae^{i\psi} \\ ae^{-i\psi} & a_{22} \end{bmatrix}, \qquad a > 0,$$

can be reduced to diagonal form D by a unitary transformation U^{-1}HU
= D, where

(26)
$$U = \begin{bmatrix} e^{i\psi/2}\cos\theta & -e^{i\psi/2}\sin\theta \\ e^{-i\psi/2}\sin\theta & e^{-i\psi/2}\cos\theta \end{bmatrix}$$

and θ is an angle in the first quadrant which satisfies tan 2θ = 2a/(a_{11} - a_{22}).
If ψ = 0, i.e., H is real symmetric, the matrix (26) reduces to the familiar
form

$$U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$

which corresponds to a rotation in the plane.
If the nth order Hermitian matrix A = (a_{ij}) is written in the form

$$A = \begin{bmatrix} H & A_1 \\ \bar{A}_1' & A_{22} \end{bmatrix},$$

where H is the 2 × 2 minor above, then the unitary matrix

(27)
$$U_1 = \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix},$$

where U is the matrix (26), transforms A into a matrix B = (b_{ij}) where
b_{12} = b_{21} = 0; furthermore, the sum of the squares of the diagonal elements of B exceeds the corresponding sum of A by the positive quantity
2a². If it is desired to transform A into a matrix B such that b_{ij} = b_{ji}
= 0, the elements of the matrix U in eq. (26) must be positioned in the
ith and jth rows as well as in the ith and jth columns of U_1 in eq. (27).

One might hope that after applying the product of a finite number of
the above unitary transformations one might reduce the matrix A to diagonal form, in which case the sum of the squares of the diagonal elements
will have the maximum possible value. Unfortunately, this is not true, as
some of the elements which have previously been reduced to zero will not
remain so while some additional elements are likewise being reduced to
zero.

The procedure is to reduce to zero a pair of off-diagonal elements of
greatest modulus. It is the infinite product of all these transformations
which will reduce A to the diagonal form Λ whose diagonal contains the
eigenvalues of A. The infinite product of unitary matrices of the type
(27) converges to a matrix whose columns are the eigenvectors of the matrix
A.
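For the real symmetric case (ψ = 0) the procedure may be sketched in code as follows (illustrative; each pass annihilates the off-diagonal pair of greatest modulus with the plane rotation of eq. (26)):

```python
import numpy as np

def jacobi_eigen(A, tol=1e-12, max_rotations=100):
    """Jacobi method for a real symmetric matrix: repeated plane rotations
    zeroing the off-diagonal pair of greatest modulus."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    V = np.eye(n)                          # accumulates the eigenvectors
    for _ in range(max_rotations):
        off = np.abs(A - np.diag(np.diag(A)))
        i, j = np.unravel_index(np.argmax(off), off.shape)
        if off[i, j] < tol:
            break
        theta = 0.5 * np.arctan2(2.0 * A[i, j], A[i, i] - A[j, j])
        c, s = np.cos(theta), np.sin(theta)
        R = np.eye(n)                      # rotation of the type U1 in eq. (27)
        R[i, i] = R[j, j] = c
        R[i, j], R[j, i] = -s, s
        A = R.T @ A @ R                    # zeros the (i, j) and (j, i) entries
        V = V @ R
    return np.diag(A), V                   # eigenvalues and eigenvectors

A = np.array([[2., 1.], [1., 3.]])
vals, vecs = jacobi_eigen(A)
print(np.sort(vals))                       # [1.381..., 3.618...]
```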
Eigenvalues of Special Matrices

The types of eigenvalues to which certain important classes of matrices
give rise are worth noting.

  Matrix A                                          Every Eigenvalue λ_i of A
  (a) Real and symmetric                            (a) Real
  (b) Real, symmetric, and positive definite        (b) Real and positive
  (c) Real, symmetric, and positive semidefinite    (c) Real and non-negative
  (d) Orthogonal                                    (d) |λ_i| = 1 for every i


In (a), (b), and (c), if a real symmetric matrix is replaced by a Hermitian
matrix, the conclusions still remain valid. In (d), if A is unitary, the conclusion drawn there still holds.

Some additional properties concerning dominant roots of important
classes are listed below:

(i) If A is real and symmetric, the maximum eigenvalue λ_max is given by

$$\lambda_{\max} = \max_{x \neq 0} \frac{x'Ax}{x'x},$$

and the minimum eigenvalue λ_min is given by

$$\lambda_{\min} = \min_{x \neq 0} \frac{x'Ax}{x'x}.$$

(ii) If A is a real positive matrix (i.e., A has positive elements), λ_max is a
real number.

Bounds on Eigenvalues

It is often a helpful guide to establish bounds for the eigenvalues of a
matrix at the outset, as this may influence the procedure. It is extremely
advantageous when this leads to the isolation of some of the eigenvalues
of a given matrix. Some of the criteria for determining bounds are easily
applied. A number of such results will be stated, and in some cases additional information will be given concerning the associated eigenvectors.
First the case of matrices with complex elements will be treated, and subsequently this will be specialized to matrices with positive and also matrices with non-negative elements. However, when results on bounds apply
to a large class of matrices, the bounds cannot be expected to be as sharp
as those applying to a smaller, more specialized class of matrices. The following cases are of interest.
1. Let A be an arbitrary matrix of order n with complex elements. Then

$$|\lambda| \leq nM,$$

where λ is any eigenvalue of A, and M is the maximum of the moduli of the
elements a_{ij} (i, j = 1, ..., n) of A. This result is due to Hirsch in 1902.

2. Let

$$R_i = \sum_{j=1}^{n} |a_{ij}| \qquad \text{and} \qquad T_j = \sum_{i=1}^{n} |a_{ij}|.$$

Also let R = max R_i (i = 1, ..., n) and T = max T_j (j = 1, ..., n).
Then

$$|\lambda| \leq \min(R, T).$$


A number of variations on these two bounds (1) and (2) exist. Some
of these variations are a bit sharper, but these are simple to apply and will
be sufficient for the purposes at hand.

3. Let P_i denote the sum of the moduli of the off-diagonal elements of
the ith row of the matrix A and Q_j the sum of the moduli of the off-diagonal
elements of the jth column of A. That is,

$$P_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}| \qquad \text{and} \qquad Q_j = \sum_{\substack{i=1 \\ i \neq j}}^{n} |a_{ij}|.$$

Then a result due to Levy and Hadamard states that each eigenvalue of A
lies in at least one of the circles

(28)
$$|z - a_{ii}| \leq P_i \qquad (i = 1, \cdots, n)$$

and in at least one of the circles

$$|z - a_{jj}| \leq Q_j \qquad (j = 1, \cdots, n).$$

In other words, if one takes the diagonal element a_{ii} and draws a circle
with a_{ii} as the center and P_i (i = 1, ..., n) as radius, all the eigenvalues
of A will be trapped in these n circles. A similar remark applies to the n
circles with the Q_j (j = 1, ..., n) as radii. It is to be noted that an eigenvalue of A may be in several of the n circles.
4. An interesting offshoot of this result is the following: If one of the n
circles is isolated from the remaining n - 1 circles, that is, has no point in
common with the remaining n - 1 circles, exactly one eigenvalue of A
will be found in the isolated circle. More generally, Gersgorin showed that
when m circles intersect in a connected region isolated from the remaining
n - m circles, the connected region thus formed contains exactly m eigenvalues of A. (A code sketch of this circle criterion is given at the end of
this section.)

5. The following results concerning the number of associated eigenvectors
are noteworthy. If an eigenvalue λ of the matrix A lies in only one of the n
circles (28), λ has only one linearly independent eigenvector associated
with it. This result is due to Taussky (Ref. 15). Stein has shown that if
an eigenvalue λ has associated with it m ≤ n linearly independent eigenvectors, λ lies in at least m of the circles (28).
6. Before passing to the case of positive and non-negative matrices, it is
worth noting a result of Frobenius which gives a connection between the
eigenvalues of a matrix with complex elements and a dominating matrix
with non-negative elements. Let B = (b ij ) be a matrix with complex elements and A = (aij) be a matrix ,vith non-negative elements such that
Ibij I ~ aij (i, j = 1, "', n). Then the characteristic circle of A contains
the characteristic circle of B. (The characteristic circle of a matrix is the

NUMERICAL ANALYSIS

14-46

smallest circle about the origin containing the eigenvalues of the given
matrix.)
7. Turning attention next to real matrices with non-negative elements, one
can draw some additional and sharper conclusions. If A is a matrix whose
elements aij ~ 0 (i, j = 1, ... , n) then (a) A has a real eigenvalue Ad ~ 0
which is dominant (there may be other dominant eigenvalues), (b) Ad has
an associated eigenvector x ~ 0, i.e., all components of x are non-negative,
and (c) Ad does not decrease when an element of A increases.
The above results are due to Herstein and Debreu, who paralleled for the
case of non-negative matrices the results of Frobenius given below.
8. These results grow sharper when one further restricts the matrix A
to be indecomposable. A non-negative matrix A is called indecomposable
if A cannot be transformed to a matrix of the form

by the same permutations of rows and columns where An and A22 are
square submatrices of A.
If A is a non-negative indecomposable matrix, then (a) A has a real
simple eigenvalue Ad > 0 which is dominant; (b) Ad has an associated eigenvector x > 0, i.e., all the components of x are strictly positive; and (c) Ad
increases when an element of A increases. These important results were
first demonstrated by Frobenius nearly a half century ago.
9. If the matrix A is still further restricted so that all its elements are
positive, that is, aij > 0 (i, j = 1, ... , n), then the statement (a) above
can be strengthened to include the fact that Ad is the only dominant eigenvalue of A.
10. Again, suppose A is a positive matrix and let
n

Ri =

:E aij

(i

= 1, ... , n) , R = max {Rl , R2 , ... ,n,
R }

j=l

Frobenius first noted that

r

~ Ad ~

R.

Also Ad = r = R if and only if all Ri are equal; otherwise, the inequality

r < Ad < R
holds.

Suppose that not all the Ri (i = 1, ... , n) are equal and let

o = max {Ri/Rj},
Ri 0, 0 < 1, and
Frobenius' result as follows

(J'

<

1.

Ledermann improved the bounds on

and Ostrowski further sharpened the bounds with the inequalities
r

+ < (~ -

1) ; ;

Ad ;;; R -

«I -

q).

In fact, the right-hand side of Ostrowski's inequalities applies to matrices
with complex elements when in the definitions of Rand K one uses the
modulus of the elements. More recently, Brauer announced further improvement of the above bounds, stating that the best possible bounds have
been attained. That is, in order to get sharper bounds one would have to
restrict further the class of positive matrices.
11. Some specialized examples of non-negative matrices are the stochastic
matrices and the oscillation matrices. The eigenvalues of the former play
an important role in the theory of stochastic processes while the latter type
matrices are applicable in the theory of small oscillations of mechanical
systems.
The matrix A = (aij) is called stochastic if aij ~ 0 (i, f = 1, .. " n) and if
(i = 1, "', n).
If aij > 0 (i, f = 1, "', n), the matrix A is called a positive stochastic.
matrix. All the eigenvalues of a stochastic matrix lie within or on the
boundary of the unit circle. Also A = 1 is a dominant eigenvalue of any
stochastic matrix. Previous results on non-negative and positive matrices
may be directly applied to stochastic matrices.
The matrix A of order n is said to be completely non-negative (completely
positive) if all minors of all orders from 1 to n of A are non-negative (positive).
If A is completely non-negative and there exists a positive integer k such
that A k is completely positive, then A is said to be an oscillation matrix. A
non-negative matrix A will specialize to an oscillation matrix if and only if
det A ~ 0, ai,i+l > 0 and ai+l,i > 0 (i = 1, .. " n - 1). The eigenvalues
of an oscillation matrix have the interesting property that they are all
strictly positive and simple.
For an extensive bibliography on the bounds of eigenvalues, see Ref. 15.

NUMERICAL ANALYSIS

- 14-48

TABLE

6.

COMPUTER STORAGE REQUIREMENTS AND NUMBER OF OPERATIONS FOR
FINDING EIGENVALUES AND EIGENVECTORS

n
w, w', w"

= order of matrix involved
= program storage requirements
Storage
Requirements

Method

' Multiplications

Additions

Triple Diagonal
Triple diagonal form S
Eigenvalues of A a
Eigenvectors of S
Eigenvectors of A
Total for eigenvalues
and eigenvectors b

+ n/2 + w
2n - 1 + w'
2n + w"
tn + tn - 1 + w'"
n 2/2
2

2

!n3

in3

3n 2
2n3

2n2
n3

J~n3

fn 3

4n

2n

Jacobi (Symmetric matrix)
Eigenvalues
(one step of reduction)

n2 + 4

+w

a In finding the eigenvalues of A, the number of operations depends on the stipulated
requirements for accuracy. If n is sufficiently large, the number of operations required
to find the eigenvalues, once the matrix is in triple diagonal form, is negligible compared
with the number of operations required to reduce the original matrix to triple diagonal
form.
b w'" is the sum of w, w', w", and the number of cell locations used in finding the eigenvectors of A.

4. DIGITAL TECHNIQUES IN STATISTICAL ANALYSIS
OF EXPERIMENTS

Joseph M. Cameron

Introduction. In scientific experiments a variable is measured under
several different conditions with a view to assessing the effect of these conditions on the variable under study. There may be factors present in the
measurement process which, if not balanced out or their effect reduced by
randomization or replication, may invalidate the estimates of the effects
the experiment seeks to measure. The branch of statistics called the design
of experiments is concerned with the construction of experimental arrangements that permit the balancing out of such extraneous factors and at the
same time minimizing (for a given number of observations) the uncertainties in the estimates of the effects under study.

NUMERICAL ANALYSIS

14-49

In most applications the analysis required is the usual least squares analysis for estimating the parameters postulated to represent the data. In a
designed experiment the normal equations that arise in the estimation of
the parameters take on a particularly simple form and the calculations
have been systematized and given the name analysis of variance. Example.
Consider a set of measurements Xb X2, •• " Xn all postulated to be estimates
of a single quantity. The least squares estimate for that quantity is, of
course, the average £ = ~xdn. One can also compute from the data a
measure of the dispersion of the results about this average. Perhaps the
most common such measure is the standard deviation, V~(Xi - £)2/(n - 1).
In the analysis of variance one deals not with the standard deviation but
rather with its square, which is a quadratic form in the deviations divided
by the number of independent deviations, called the number of degrees of
freedom.
The analysis of variance in its general form is a technique for (a) computing estimates of the parameters involved in the problem and (b) computing the value of quadratic forms, called sums of squares, assignable to
certain groupings of the parameters, each sum of squares carrying with it a
certain number of degrees of freedom (the rank of the quadratic form).
Thus in the case of k averages each based on n measurements, the parameters to be estimated are the grand average and the (k - 1) independent
deviations of the individual averages about this grand average. Three
sums of squares are to be calculated: one for the grand average (with one
degree of freedom), one for the deviation of the individual averages about
the grand.average [with (k - 1) degrees of freedom], and one for the deviations of the observations about their own group averages [with ken - 1)
degrees of freedom].
Several examples of the analysis of variance are presented to illustrate
the different techniques of computation that are available. The advantage
of one over another probably depends on the nature of the computing device used.
The availability of modern high-speed digital computers makes it feasible to analyze experimental data involving a much greater number of
factors, each factor occurring at more levels than would otherwise be the
case. The types of calculations described above and in the succeeding
pages, because of their systematic nature, lend themselves particularly well
to treatment on automatic digital computers.
Analysis of Factorial Designs Using Hartley Method. An experiment in which the effects of several factors on a variable are studied by
making measurements at all possible combinations of the several states or
levels for each of the factors is called a factorial experiment. Example.
Four temperatures of heat treating can be combined with three time periods

NUMERICAL ANALYSIS

14-50

to give rise to twelve conditioning treatments for some alloy. This would
be a factorial design with two factors, one at four levels and the other at
three levels.
The most general method for the analysis of factorials was developed by
Hartley (Ref. 22). His method depends on three operators which he has
labeled 2:;, D, and ( )2, defined as follows:
~t

Sum over all levels t = 1, 2, "', T for each combination of the
other subscripts.
D t Difference between T times the original values and the total in
,the set ~t to which the original value contributed.
)2. Sum of squares of items indicated in the parentheses.
Procedure. The use of this technique will be illustrated for a two-factor
factorial having one factor at k levels and the other at n levels. Denote
by Xij the observation at the ith level of the first factor and jth level of the
second factor. Let X.j denote the set of sums
Xi} = X.j, there being n

L:
i

such sums. In Table 7 the plan of the calculations is shown. Table 8
shows the analysis of variance table derived from the results of Table 7.
TABLE

7.

PLAN OF CALCULATIONS USING HARTLEY TECHNIQUE, Two-FACTOR
FACTORIAL EXPERIMENT

Level of factor B

Al

A2

Ak

BI

Xu

X21

Xkl

B2

Xl2

X22

Xk2

Bn

Xln

X2n

Xkn

}";j
Dj

Xl

X2.

}";i}";j

Xk.

nXU -

Xl.

nX21 -

X2.

nXkl -

Xk.

nXl2 -

Xl.

nX22 -

X2.

nXk2 -

Xk.

nXln -

Xl.

nX2n -

X2.

nXkn -

Xk.

Di}";j
knxu -

kXI. -

X ..

kXI. -

nX.I

+ x.

}";iDj

kXk. -

X ..

knXkl -

kXk. -

nX.I

knXkn -

kXk. -

nX.n

X •.
nX.I -

X ..

nX.2 -

X ..

nX.n -

X ..

+ X ..

DiDj
knXln --: kXl. -

nX.n

+ X ..

+ X ..

The estimates of the parameters are obtained by dividing the entries in
the sets ~i2:;j, ~iDj, Di~j, and DiDj by nk giving in that order the grand

NUMERICAL ANALYSIS

'fABLE 8.

14-51

ANALYSIS OF VARIANCE FOR Two-FACTOR FACTORIAL EXPERIMENT
No. of
Items

Sum of
Squares

1

Degrees of
Freedom

(~i~j)2
(~iDj)2

n

(~i~j)2/nk
(~iDj)2/n(nk)

n-1

(Di~j)2

k

(Di~j)2/k(nk)

k-1

(DiDj) 2

nk

(DiDj)2/(nk) 2

(n - 1)(k - 1)

~~xil

nk

1

Sum of Squares Is
Associated with:
Grand average
Effect of different levels
of factor B
Effect of different levels
of factor A
Interaction: lack of constancy between levels
of A as level of B is
varied
Total (for check)

average, differences among levels of factor B, differences among levels of
factor A, and the differences due to lack of constancy of the different
levels of factor A as the level of factor B is changed. This technique can
be extended to cover the case of three or more factors by using the basic
operations of ~, D, or ()2 and is adaptable to other designs as well (see
Ref. 22).
An alternate procedure necessary when the experiment is run in blocks
containing only a fraction of the total number of observations or when a
fractional replication design is used is based on the technique described in
Ref. 23, and is discussed below. Still another procedure is given in Ref. 16
based on the computation of individual degrees of freedom with orthogonal
polynomials tabled in Refs. 20 and 21.
Balanced Incolllplete BlocI{s. When there are more objects or treatments than can be compared under the same conditions, i.e., on a given
batch of material, in a given time period, or other factor which limits the
uniformity of conditions to a few tests, it is necessary to schedule the
measurements so that all comparisons of interest may be estimated from
the data. The class of designs constructed for such a case is called incomplete block designs, the block being the group of tests within which the
environmental or other factor is assumed not to change. The analysis of
these block designs will be illustrated for the case of the balanc,ed incom-·
plete block design (see Refs. 16-23).
Observations have index Xbktr referring to B blocks with K units per block
and T treatments with R repetitions of each. The data are entered so
that the observations from the first block come first, followed by those
from the second block, etc.
Step I. Compute total sum of squares of original values, ~Xbk2.

14-52

NUMERICAL ANALYSIS

Step II. Consider only indices band k.
( )2 Applied to
Result Gives BK Times

Operation

Number
of Items

Result

2;k
Dk
2;b2;k
Db2;lc

B
BK
1
B

Xb.
KXbk - Xb.
X..
BXb. - X..

Correction factor
Unadjusted blocks sums
of squares

Step III. Now consider only indices t and r. The values of Dk are now
rearranged into T groups with R values each so that the R values corre. sponding to the first treatment come in a group followed by a similar
grouping for each of the remaining treatments. Call these values d fr and
denote operations after rearrangement with asterisk. 2;r *Dk results in
d t . = KXt. - B t
(2;r *Dk)2 =

K(K - l)TR
(T - 1)

X sum of squares for treatments (adjusted),
.

where B t = sum of block totals for blocks containing treatment t.
Analysis of Variance
Total
Blocks (unadjusted)
Treatments (adjusted)
Error

Sum of Squares
2;Xbk2 - x2 • ./BK
(Db2;k) 2/ BK
(T -

1)(~r*Dk)2

K(K - l)TR
By subtraction

Degrees of Freedom
BK-l
B-1
T-l

BK - B - T

+1

Analysis of Factorials by Using Relations alllong the Indices
Associated with the Treatlllents. To illustrate the method assume
there are three factors A, B, and C having levels n + 1, n + 1, and n + 1

respectively. Each observation is tagged with an index XIX2Xa, where Xl =
0,1, "', n, X2 = 0, 1, "', n, and Xa = 0, 1, "', n, where n is a prime.
For the main effect of A form the (n + 1) sums of values whose indices
satisfy
Xl =
mod (n + 1)

°

Xl

= 1 mod

(n

+ 1)

Xl

= n mod (n

+ 1)

NUMERICAL ANALYSIS

Denote these sums by AI, A 2 ,
effect of A is given by

+

A n +l • The sum of squares for the main

•• "

(~A)2

~A2

(n

14-53

1)2

- - - - (degrees of freedom = n).
(n + 1)3

Similar computations give the sum of squares for the main effects of B
and C.
For the two-factor interactions the sums of values whose indices satisfy
the equations below are computed.

:1 +
[+
Xl

:1 +
[+
Xl

X2 ~ 0 mod (n

X2

=

n mod (n

+ 1)
+

nX2 ~ 0 mod (n

nX2

=

n mod (n

1)

+ 1)
+ 1)

From the (n + 1) sums corresponding to Xl + aX2 = 0, 1, "', n mod
1) are computed the sum of squares associated with the n degrees
of freedom for ABO! and the AB interaction is given by the total of such
sums over all values of a. For the three-factor interaction one computes
the (n
l)n2 sums of values for which the indices satisfy
(n

+

+

Xl

+ aX2 + f3X3 =

0, 1, "', n mod (n

+

1),

where a = 1, 2, "', n, and f3 = 1, 2, "', n. Each group of (n + 1)
sums give the sum of squares associated with the n degrees of freedom for
the effect ABO!Cf3. For each group one computes:
~

(Sums)2

Number of items in each sum

(Grand total)2
Total number of items

The extension to higher order interactions is straightforward.
This technique is ideally adapted to analysis of variance of factorials
where block confounding occurs or to the analysis of fractional replication
of factorials. Example. A 3 4 design in blocks of 9 with ABD, ACD 2 ,
AB 2 C2 , and BC 2D 2 confounded with blocks is computed in the manner

14-54

NUMERICAL ANALYSIS

described to get the usual analysis except for the combination of the sums
of squares for the three-factor interactions which involve confounding with
blocks. For example the ABD interaction is given by the sum of squares
associated with AB2D, ABD 2, and AB2D2 each of which has two degrees
of freedom. The sum of squares associated with ABD is assigned to
blocks.
For fractional factorials (with or without block confounding) the analysis
is carried out as if it were a complete design' with fewer factors by suppressing one or more of the indices. The individual components, A, "',
B, "', AB, AB 2, "', ABC, ABC2, "', are computed, and an identification is made according to the identity relationships (and block confounding,
if any). (For further details see Ref. 23.)
Analysis of Variance for 2 n Factorials. An example for a 22 experiment will illustrate this procedure. Enter observations in the order designated.
Observed
Values

=

Xoo

a=

XOI

(1)

b = XI0
ab = Xu

First Sums and
Differences, Dl

+a
b + ab

Second Sums and
Differences, D2

D 22/2 n Will Give

+ a + b + ab
a - (1) + ab - b

(1)

(1)

Corree. for mean
A

a - (1)
ab - b

b + ab - (1) - a
ab - b - a + (1)

B
AB

In general:
(a) Form a column of sums of the 2n - l pairs followed by 2n - l differences
between the first and second element of a pair.
(b) Repeat this operation on the column so formed until the nth such
column is formed.
(c) Then square the entries in the nth column and divide by 2n to get
analysis of variance table in the order A, B, AB, C, AC, BC, ABC, ....
The observations are entered so that their subscripts form an increasing
sequence when regarded as binary numbers; e.g., for n = 3 the observations
are in the order Xooo XOOl XOlO XOll XlOO XlOl X110 X11l·
Analysis of Fractional Replication of 2n Factorials. Arrange the
(1/2k)2n = 28 observations in the proper order for a 28 factorial (suppressing the other indices) and carry out the analysis as above. Identify the
results of the analysis by using the identity relationships and the block
confounding in the manner shown in the following example.
EXAMPLE.

7.i replication of 26 in blocks of 8.
Fundamental identity: I = ABEF = ACDF = BCDE.
Block confounding: CD.

NUMERICAL ANALYSIS

Block Treatment
(1)
1
af
1
be
1
abef
1
cef
2
ace
2
bcf
2
2
abc
def
2
2
ade
2
bdf
2'
abd
cd
1
acdf
1
be de
1
abcdef
1
a

Index a
000000
0001 01
0010 10
001111
0100 11
010110
011001
0111 00
1000 11
1001 10
1010 01
1011 00
110000
1101 01
1110 10
1111 11

14-55

Identification
Mean
A=A
B=B
AB = AB +EF
C=C
AC = AC DF
BC = 13C DE
ABC = error
D=D
AD = AD CF
BD = BD+ CE
ABD = error
CD = CD
AF
ACD = F
BCD = E
ABCD = AE+BF

+
+

+

+

+ BE + blocks

Only the first four indices are used.

5. ORDINARY DIFFERENTIAL EQUATIONS

Richard F. Clippinger

Definitions and Introduction. An ordinary differential equation of
nth order is a relation between an independent variable x, a dependent
variable Yb and derivatives of Yl up to order n, (dnYddxn = Yl (n)):

F(x, Yl(X), y'!(x), "', Yl(n)(X)) =

o.

By the introduction of new variables, it is possible to obtain a system of n
equations of first order:
Gl(X, Yl(X), Y2(X), "', Yn(x), y'!(x), "', y'n(x)) = 0
G2(x, Yl(X), Y2(X), "', Yn(X), y'!(x), "', y'n(x)) = 0
Gn(X, Yl (x), Y2(X), "', Yn(X), Y'! (x), "', y' n(x)) = 0
which theoretically can usually be solved in the form:
Y'! =

11 (x,

Yl (x), "', Yn(x)).

y' n = In(x, Yl (x), "', Yn(x)).

NUMERICAL ANALYSIS

14-56

With vector notation, this system takes the form:
y'(x) = f(x, y(x)),

(29)

where y is a vector whose components are Yi(X), i = 1, 2, "', n, and f is
a vector whose components are h(x, YI (x), "', Yn(x)), j = 1, 2, "', n.
Vector notation will be used throughout this section covering systems of
equations which can be put in this form. The reader who is not familiar
with vectors can take the case where y(x) is a single function of x and use
this section as·a guide to the solution of one first order equation.
At the end of this section is a summary table of some useful numerical
methods for solving differential equations on a digital computer (see Table
11). Some important characteristics of each of these methods a~e listed.
The prospective user may employ this table as a quick guide in selecting
the most suitable method for the problem at hand.
Requireluents for Solution. A solution of eq. (29) is a vector y(x)
which satisfies eq. (29). It necessarily possesses a first derivative.
The differential equations used by engineers nearly always possess solutions which have continuous derivatives of many or all orders or indeed
are analytic (i.e., the Taylor series converges) except at isolated points.
They are said to be piecewise continuous and have piecewise continuous
derivatives. The isolated discontinuities are of practical importance since
engineer's derivatives are such quantities as current, voltage, velocity, and
acceleration which he must limit to avoid damage to his equipment. Methods of solving differential equations that are awkward at discontinuities
are of restricted value to him.
NUlllerical Solution. The Taylor series for y(x) in the neighborhood
of some point Xo:
y(x) = y(xo)

+

y'(xo)(x - xo)

+ ... +

y(m) (xo)(x - xo)m 1m!

+ .. "

enables one to approximate y by an mth degree polynomial in x - Xo.
Most numerical methods of solving differential equations depend directly or
indirectly on this fact.
Consider a set of points
Xi+j = Xj

+ ih,

i

= 0, ± 1, ±2, ....

These points are equally spaced along the x-axis and the distance between
neighboring points is h, called the grid size.
Write the Taylor series of y, hy', h2 y", etc., at each of these points:

+ ihy'j + ... + imhmy/m) 1m! + R m+b
= hy'j +
+ ... + im-1hmy/m) I(m - 1)! + R m+b
2
= h Y"j + ... + i m- 2 hmy/m) I(m - 2)! + R m+b

(30a)

Yi+j = y(Xi+j) = Yj

(30b)

hy'i+j

ih 2 y/,

(30c)

2

h y"i+j

NUMERICAL ANALYSIS

14-57

where Rm+l is a generic notation for a rel1winder which contains hm+1 as a
factor. Equations (30) can be used in an endless variety of ways to obtain
procedures for the numerical solutions of eq. (29).
Solutions for Known Yi and y'i. Yi and y'i are known at several past
points (i.e., i = 0, -1, -2, ... , - I) and Yj+l is desired. Solve eqs. (30a)
and (30b) at i = -1, ... , - I for 21 of the quantities:

h2Yj/2!, h3y/3) /3!, ... , h2I +l y/21 +1) / (21

+ I)!,

and substitute into eq. (30a) for Yj+l and obtain a formula accurate to
terms of degree 21 + 2 in h. Thus we have Table 9.
TABLE
Formula

I

Yj+l

Yj

9.
hy'j

EXTRAPOLATION FORMULAS
Yj-l

hy'j_l

2
3
4

2
3

Yj-2

hy'j_2

Yj-3

hy'j-3

-4
-18
12~

- --a-

4
9
16

5
9
-36

2

18
72

10
64

3
48

Error
y(2)h 2/2
y(4)h 4/8

(Euler method)

0

47

----:3

4

h 611(6) /20
h 8y(8)/70

First Order NI ethod. Formula 1 of Table 9 is the simplest and best known
of all solution methods and is due to Euler. The value of Yj+l is
(31)
then the value of y'j+l is obtained from eq. (29). It can be shown that the
approximate solution obtained in this fashion converges to the exact solution as the grid size approaches zero, the error at a given point being proportional to h. This is called a first order method. The principal attraction of
this method is its simplicity. Its principal disadvantage whether for hand
or electronic digital computation is that it requires a small grid size to obtain a given accuracy.
Studying the Stability of the Method. The most illuminating test
of any method of solving differential equations (ordinary or partial) is to
perturb the solution and study the local properties of the perturbed solution. To illustrate, consider Euler's method for solving a single eq. (29).
Suppose that a small error € is made at Xo and that Zj is the Euler solution
of eq. (29) with this error at Xo.
Let
1]j = Zj - Yj·
Then, since Yj and Zj each satisfy eq. (31), one finds, with the mean value
theorem, that

~Hl = ~i (1 + h :~ (x;, Yi + O~i)) ,

0<

0

< 1.

14-58

NUMERICAL ANALYSIS

Consider now a small enough neighborhood of Xo so that second order
effects may be neglected, i.e., that ajlay may be taken to be a constant d.
Then
1]j+l = 1]j(l

+ hd)

= 1]0(1

+ hd)j+l

= e[(l

+ hd)l/hd]hd(j + 1).

If attention is focused on a fixed point,
x

=

Xo

+ (j + l)h,

and if h is allowed to approach zero, 1]j+l approaches e exp (x - xo)d.
The error at x, due to the error € at Xo, thus remains finite as the grid size
goes to zero, and the method is said to be locally stable. The error grows
with x if ajlay is positive; otherwise it decreases.
The same method shows that the other extrapolation formulas 2, 3, and
4 of Table 9 cannot by themselves be used to solve differential equations
because they are locally unstable, i.e., the error at x due to a given error
at Xo becomes infinite as the grid size goes to zero.
Solutions for Known Yj+l- If Yj+l is obtained in some fashion, y'j+l
can be found from eq. (29). Using eq. (30) for hy'i+l in addition to the
equations used to obtain Table 9 results in Table 10.
TABLE

Formula

I

5
6
7

0
1
2

Yi+l

hy'i+l

10.

Yi

1

2"
1

"3
a
IT

0
27

-IT

EXTRAPOLATION FORMULAS
hy'i
1
1f
4

a

27

IT

Yi-l
1
27

IT

hy'i-l

Yi-2

hy'i-2

Yi-3

hy'i-3

(Trapezoidal formula)
1
(Simpson's rule)
a
27
a
1
IT
TT

Error
_h 3y(3)j4
-h 5y(5)/90
-1~()h7y(7)

Heun's second order method has its basis in formula 5, Table 10, the
trapezoidal formula. It is a considerable improvement on Euler's method
since a much larger grid size may be used. It is just as stable as Euler's
method, requires no past history, and calls for substitution in eq. (29) only
once per point.
One uses Euler's formula for a first value of Yj+l, eq. (29) to find y'j+l
and then Heun's formula for a better value of Yj+l. It is not necessary to
recompute y'i+l' The process may be iterated if desired.
The procedure which Milne (Ref. 24) recommends most highly for solving
ordinary differential equations uses
Yi+l = Yj-3 + 4hy'j_l + 8hI3(y'j_l - 2y'i-2 + y'j-3) + ~ gh 5y(5)

to extrapolate and formula 6, Table 10, which is Simpson's rule,
Yj+l = Yj-l + hI3(y'i-l + 4y'j + y'j+l) - h 5y(5) 190
to recalculate.

NUMERICAL ANALYSIS

14-59

Solution by formulas 2 and 6. A procedure which requires less past history
and therefore is better for starting and at discontinuities uses formula 2,
Table 9, to find a third order approximation to Yj+l and Simpson's rule to
recalculate. Either procedure calculates derivatives only once per point.
Formula 7, Table 10, is unstable and therefore useful for extrapolation
but not for recalculation.
Method of Adallls and Bashforth (Ref. 25). This approach is best
expressed in terms of differences:
\1Y'j = y'j - y'i-I,
\1 2Y'j = \1(y'i - y'i-l) = y'i - 2y'j-1

+ y'i-2,

\1n y 'j = \1(\1n-I y'i - \1 n - Iy'j_I)'
Yj+l = Yj

+ h(y'j + \1y'j/2 + 5\12Y'j/12 + 3\13 y'j/8 + 251\14y'j/720 + ... ).

For solutions whose derivatives of some order are everywhere continuous, this method has the advantage of yielding arbitrarily high order of
approximation with only one evaluation of derivatives per point. For automatic computer use, it has several disadvantages which lead to its rare use.
A special starting process is required; it is awkward to change grid size; at
each isolated discontinuity, the special starting process must be used again.
The Runge-Kutta l\1ethod. Like Euler's and Heun's methods, this
method avoids these difficulties (Refs. 26 and 27). It has several forms.
One of the best known, which has a truncation error proportional to h 5 , is:
Yi+l = Yj

+ (leI + 2le2 + 2le3 + le4)/6

leI = hf(Xh Yj),
le 2 = hf(Xj

+ h/2, Yj + led2),

+ h/2, Yi + le 2/2),
hf(Xj + h, Yi + le 3).

le 3 = hf(Xi
le 4 =

The Runge-Kutta method was recently adapted to automatic computers
by Gill in a form which concentrates on saving memory and reducing roundoff error (Ref. 28).
All forms of the Runge-Kutta method have the disadvantage that the
derivatives must be evaluated several times, four in these two cases.
Fourth Order Method. This method has been used extensively on
automatic computers since 1946 and has been carefully studied by Dims-

14-60

NUMERICAL ANALYSIS

dale and Clippinger (Ref. 29). It consists in extrapolating for Yj+2 by the
third order formula using one past point (see Table 11):
(32a) Yj+2 = Yj-2 + 4(Yj-2 - Yj) + 4h(2y'j + y'j-2) + 2h 4y/4).
The derivative y'j+2 is then found and also Yi+l by
(32b)

Yj+l = (Yj

+ Yi+2)/2 + (h/4)(y'j -

y'i+2) - h4y/ 4)/24.

The derivative y'j+l is then found, and Yj+2 is redetermined by Simpson's
rule:
(32c)
Yj+2 = Yj + h/3(y'j + 4y'j+l + y'j+2) + h5y/5) /90.
Isolated discontinuities are made to fall at odd-numbered grid points by
adjusting h. To start, or at points where the grid size is altered, eqs. (32c)
and (32b) are iterated, and eq. (32a) is not used. Thus, like Runge-Kutta's
or Gill's methods, this method requires no past history, and is well suited to
starting, discontinuities, and change of grid size. By the addition of a
single point from past history, it achieves the efficiency of Adam's, Milne's,
and other methods requiring only one evaluation of derivatives per point.
Higher Derivatives. Sometimes eq. (29) can be easily differentiated.
In this case a fourth order, stable procedure requiring no past history is
obtained by eliminating h3y(3), and h4y(4) from eqs. (30a), (30b), and (30c)
at i = 0, 1:
Yj+l = Yj + h(y'j + y'j+l)/2 + h2(y"j - y"j+l)/12 + h5y(5) /720.
By adding a single point from past history, one obtains the predictor,
Yj+l = 32Yj - 31Yj-l - 2h(8y'j

+ 7y'j-l)
+ h2(9Y"j

- 4y"j_l)/2

+ h6y(6) /720,

and the seventh order corrector,
Yj+l = yj-l

+ 2yj + 3h(y'j+l - y'j-l)/8
+ h2(8Y"j - y"j-l

- y"j+l)/24

+ h8y/8) /60450,

which can be used except at the start, at discontinuities, and at grid change
points.
Method of Brock and Murray. A method which takes advantage of the
fact that differential equations are locally linear with constant coefficients
and therefore have solutions which are locally linear combinations of exponentials has been developed by Brock and Murray (Ref. 48).
Extrapolation to Zero Grid Size. If a method has a local error
proportional to hn+I, it has an error at a given x proportional to hn, since
the number of local errors made going from Xo to x is (x - xo)/h. By call-

NUMERICAL ANALYSIS

14-61

ing E the error at x, and the exact answer, y, tl:en,
(33)

E

= Y - Y = ahn + bhn+1 + 1'/,

where the remainder, 1'/, goes to zero as hn+2. If eq. (20) is solved numerically at two grid sizes, hI and h2' one may write eq. (33) at both grids and
solve for y:
(34)

y = Yl

+

(Yl - Y2)r n/(l

- rn)

+ bh2n+lrn(1
-

- r)/(l - rn)
(1'/1 -

rn 1'/2)/(1 - r n ),

where r = ht/h 2 • Richardson (Ref. 30), who invented this procedure,
called it "extrapolation to zero grid size." Looking at the next to last
term, one sees that it would be more apt to call it "increasing the order
of accuracy from n to n + 1." Equation (34) is useful in many ways. For
example: (a) One can solve (29) at two grid sizes and use eq. (34) to get a
better answer at common points. (b) One can solve (29) at one grid size
and occasionally take a step at two grid sizes by using the second term to
estimate the error and adjust the grid size. (With this procedure it is important to use methods which depend on little past history.) (c) One can
take every step at two grids, use eq. (34) to improve the accuracy before
proceeding, and also use the second term to adjust the grid size.
Boundary Value Problems or Distributed Conditions. It may
happen that not all components of yare specified at one value of x. Instead, some of the components of Y may be given in terms of the others at
two or more points.
A pproach A. Perhaps the most obvious approach to this problem is to:
1. Assume initIal conditions at Xo.
2. Solve the problem.
3. Assume other initial conditions.
4. Resolve the problem.
5. Interpolate between the initial conditions for initial conditions which
will satisfy one of the other given conditions at some other point. (This
is based on the theorem that the solutions of differential equations are,
under suitable conditions, continuous functions of their values at particular points.)
.
6. Reiterate this process until all conditions are satisfied. If there are
many conditions to be satisfied by varying the same number of components of Y at Xo as parameters, the interpolation process becomes quite
complicated. If convergence is also slow, it may be necessary to solve the
differential equation thousands of times, treating the different equations
and distributed conditions as simultaneous equations for all the variables
at all the points.

NUMERICAL ANALYSIS

14-62

Approach B is to consider all the distributed conditions and the approximating equations simultaneously. For instance, consider the second order
system:
.
Y' = f(x, y, z),

(35)

Z'

= g(x, y, z),

with the distributed conditions:
(36)

yea) = A,

ky(b)

+ lz(b) + my'(b) + z'(b)

= O.

One might use Heun's approximating difference equations:
(37)

Yj+l - Yj = (f(xj, Yj, Zj)
Zj+l - Zj = (g(xj, Yj, Zj)

+ f(Xj+l' Yj+l, Zj+l)) (h/2),
+ g(Xj+b Yj+l, Zj+l))(h/2).

Replacing a, b by Xo, Xn one would write the side conditions eq. (36) in the
form
Y(Xo) = A,
(38)
m(Yn - Yn-l) + (zn - Zn-l) = (h/2)[mfn-l + gn-l - kYn - lZn],
where the second eq. (36) is replaced by one equivalent to it to third order
and fn is written for f(x n, Yn, zn).
Equations (37} and (38) are 2(n + 1) simultaneous equations for the
2(n + 1) unknowns Yj, zj, j = 0, 1, 2, .. " n. They are not linear; however,
h appears as a factor of the right members and the left members are linear.
It is therefore natural and quite practical to define an iterative process,
writing Y/ and Zji for the ith approximation to Yj and Zj:

Yj+l i
(39)

-

Y/ = (h/2)(f/-I

+ h+I i-I),

Zj+l i - z/ = (h/2)(g/-I + gj+l i-I),
m(Yn i - Yn-l i)
Zn i - Zn-I i

+

= (h/2) (mfn-l i - I + gn-l i - I

-

kYn i - I

-

Zn i-I).

Approach C, useful if f and g are readily differentiable, is to perturb eqs.
(35) by introducing 'YJ = Y - Ti, t = Z - z where Ti, z is some approximate
solution:
af
af
(40)
'YJ' =-'YJ+-t,
ay
az
It would be possible to use eqs. (39) to find Ti and z and then, by evaluating the derivatives af/ay, etc., at Ti, z solve eq. (40) as linear equations for
'1], t subject to initial conditions 'YJ(xo) = t(xo) = 0.

NUMERICAL ANALYSIS

14-63

COlllputcr Storagc Rcquirclllcnts and NUlllbcr of Opcrations.

The columns of Table 11 provide a guide to the use of the methods listed.
Similar remarks apply to this table as in the concluding paragraph of Sect. 2
relative to content and notation of Table 5 in that section.
The presence of past history requirements is an important consideration
for digital computers because this generally means programming special
starting programs for use at boundary points, at points of discontinuity of
the solution, or at points where the grid size changes. For this reason
formulas requiring no past history are in general easiest to program.
It is important in evaluating a digital computer procedure to be able to
estimate the number of operations, multiplication times, or other index of
the computing time. However, in practical problems in differential equations, this time is almost completely dominated by the time to compute
the derivative f(x, y) in eq. (29), and this, of course, cannot be determined
except in the context of a specific problem. The next best guide to the
volume of computations is the number of times the derivative must be
computed per integration step, and this is listed in the last column of
Table 11.
TABLE

11.

Method
Extrapolation Formulas
(Tables 9, 10)
Formula 1 (Euler)
Formula 5 (He un)
Adams-Bashforth
Runge Kutta
Gill
Fourth Order Method

COMPUTER REQUIREMENTS IN SOLUTION OF ORDINARY
DIFFERENTIAL EQUATIONS

Order
of Error

h2
h3

Arbitrary
h5
hU

hO

Predictor-corrector formulas
h5
Milne
Dimsdale-Clippinger
h5
(using 3 iterations)
Dimsdale-Clippinger
h6
(using extrapolator)
5th order predictor-7th
h8
order corrector
Order of hn+l,
Extrapolation to Z'ero grid
at least, if
siz-e
error of formula used is
of order h n

Past
History
Required

Computer
Storage a

None
None
k points, where
k is arbitrary
None
None
None

2n +w
3n
w
n(k
1)
w

3 points

5n +w

None

+
+ +
4n + w

3n +w
On +w

On

+w

1 point

On +w

1 point

On

+w

Derivative
Evaluated.
Times/Step

4
4

2
(1st derive once and
2nd derive once)

3

2
3n

a n is dimension of vector y, and w is undetermined amount of working storage and
program storage.

14-64

NUMERICAL ANALYSIS

6. PARTIAL DIFFERENTIAL EQUATIONS

J. B. Diaz
R. F. Clippinger
Bernard Friedman
Eugene Isaacson
Robert Richtmyer

Introduction. A variety of physical problems, when analyzed from a
mathematical point of view, lead to the consideration of boundary value
problems for differential equations. In many cases, the physical quantity
of interest is found to be represented by a function which satisfies a differential equation in a certain domain of the independent variables. Besides
the differential equation (which may be ordinary or partial, depending
upon whether the independent variables are one or more than one, respectively) the "unknown" function is required to satisfy certain other conditions, which will be referred to collectively as boundary conditions. Generally speaking, these additional boundary conditions select, from the totality of the solutions of the differential equation in question, the solutions
which correspond to the actual physical situation under study. Example.
The determination of the steady-state temperature in a plane circular plate
of unit radius, whose periphery is maintained at a given temperature,
amounts to the determination of a real-valued function u(x, y) satisfying
the partial differential equation

a2ujax 2

+ a2ujay2

= 0 for 0

~ x2

+ y2 < 1,

and the boundary condition
u(x, y) = !(x, y)

for

x 2 + y2 = 1,

where! is a prescribed function. (f is essentially the preassigned temperature distribution on the periphery.)
An equation involving a function of two or more variables and its partial
derivatives is called a partial differential equation. The order of a partial
differential equation is the order of the highest order derivative which actually appears in it. A partial differential equation is linear, if it is of the
first degree when considered as a polynomial in the unknown function and
its partial derivatives (otherwise the equation is called nonlinear). Example. The equation a2ujax 2 + a2ujay2 = 0 is a linear second order equation, while the equation (aujax)2 + u = 0 is a nonlinear first order equation.

NUMERICAL ANALYSIS

14-65

This section will consider linear and second order partial differential
equations starting with some mathematical background and leading to a
discussion of numerical methods suitable for digital computer use. The
section will conclude with a summary table giving some significant attributes of the methods listed from the point of view of digital computer
solution (see Table 12).
First Order Partial Differential Equations

Consider

F(x, y, z, p, q) = 0,
az
az
p =-,
q =-,
ay
ax

(41)

a par:tial differential equation of first order for z as a function of x, y. The
general solution of this problem depends on an arbitrary function. Lagrange showed that the general solution could be deduced from a "complete" solution, i.e., a two-parameter family of particular solutions. Lagrange and Charpit also showed that such a complete solution could be
deduced from the solution of the system of ordinary equations for x, y, z,
p, q in terms of a parameter:
dx

- =

x'

y'

Q =-,

dt

(42)

=

= pet)

z' = Pp
p' = _

q'

aF(x, y, z, p, q)
ap

aF
aq

+ Qq,
(aF + p aF),
ax
az

= _ (aF + q aF).

ay

az

Cauchy showed that any particular solution of eqs. (41) is composed of
curves he called characteristics obtained by integrating eqs. (42).
Let
(43)

Xo = f(s),

Yo = g(s),

Zo = h(s)

be the parametric equations of a curve through which a particular solution

NUMERICAL ANALYSIS

14-66

of eqs. (41) is to be found. Then Po(s) and qo(s) must satisfy the differential equation,
F(f, g, h, Po(s), qo(s)) = 0,

(44)
and the condition
(45)

fpo(s)

+

gqo(s) - h = O.

The solution of eqs. (42) subject to initial conditions (43), (44), and (45)
can be represented by
x = x(u, t),
y = y(u, t),

z = z(u, t),
P

= p(u, t),

q = q(u, t).
Thus the problem of finding the solution of eqs. (41) passing through
curve (43) is reduced to the solution of ordinary equations which can be
done by the methods of Sect. 5.
To illustrate, consider the linear equation,
x+y+z+p+q=O

subject to the conditions
y

= z = 0,

when
(46)

O~x~1.

When y is zero and x is outside the range (46), z is not defined.
Cauchy's method yields the solution

+

x = s

t,

y = t,

+ (s -

z = -2t
P

=

e- t -

q= - 1

2)(e- t

-

1),

1,

+

(1 - s) e-t.

O~s~1.

Eliminating sand t yields
z = -2y

+ (x -

y - 2)( -1

+ e-

Y

).

NUMERICAL ANALYSIS

14-67

Cauchy's method shows that, in general, if Z is given along some arc of
curve C terminated at points A and B, then Z is determined in a strip
bounded by a characteristic through A and a characteristic through B.
Call this strip the region of determinacy, in our example, t~e strip between
y = x and y = x - 1. Any method other than Cauchy's must therefore determine these characteristics one way or another to determine the region of determinacy.
Practically, it may be difficult to obtain aFlax, aFlay, and aFlaz. In

that case, it is not possible to obtain p and q by using the last two equations
(42). As an alternative, let the characteristic curves s = constant and the
curves t = constant be used as a curvilinear coordinate network. The
transformation from Cartesian coordinates x, y to coordinates s, t is governed by the relations
tx = Ys/ A,

ty = -XsIA,
(47)

Sx

= -YtI A ,

Sy

= xtl A,

where tx is atl ax, etc.
By definition and eqs. (47),
p

(48)

=

azlax

=

(ZtYs - zsYt)1 A,

q = azlay = (-ZtXs

+ zsxt)1 A.

Start along t = 0 and use Euler's method and the first three of eqs. (42)
to get x, y, Z at each point on t = h. Use numerical differentiation to obtain Xs , Ys, and Zs at each point on t = h. Use eqs. (48) to get p, q on the
same curve. If more accuracy is desired, H eun' s method may now be used
to obtain better values on t = h. The same process may now be repeated
for t = -h, 2h, -2h, etc.
Second Order Partial Differential Equations

The general linear partial differential equation of the second order in the
two independent real variables x and y is
(49)

a 2u

a -2
ax

a2u
a2u
au
au
+ 2b-- + c - + d - + e - +fu =
ax ay

ay2

ax

ay

g,

where the letters a, b, ... , g denote real-valued functions of x and y. The
equation is called homogeneous if the "nonhomogeneous term" 9 is identically zero. The linear homogeneous equation has the property that the

NUMERICAL ANALYSIS

14-68

"superposition principle of solutions" holds, i.e., that if u and v are solutions, any linear combination Au + Bv, with constant coefficients A and
B, is also a solution.
Classification. The partial differential eq. (49) can be reduced to
certain typical, or canonical forms by means of a suitable change of variables:
~

(50)

= Hx, y),.

'f} = 'f}(x, y).

Consider first the equation with constant coefficients
a2u
a2u
a2u
(51)
A -2+ 2B--+ C - = 0
ax
ax ay
ay2
'
and make the change of variables
~

(52)

= ax

+ {3y,

'f} = 'YX

+ oy,

with a, {3, '1', 0 real constants. In the new variables
(53)

(Aa 2 + 2Ba{3

~,

~u

'f}, one has
~u

+ C(32) -a~2 + (A'Y2 + 2B'Yo + C0 2)-2
a'f}
a2u

+ 2(Aa'Y + B[ao + {3'Y] + C{3o) -a~ a'f} = o.
Since eq. (51) is assumed to be of second order, not all three real constants
A, B, C are zero, i.e., A 2 + B2 + C2 > o. It will now be supposed further
that A ~ O. There is no loss of generality, since if A = 0 and C = 0, too,
then B ~ 0, and the equation is already in "canonical" form (see eq. 54
below); whereas if A = 0 and C ~ 0 one has merely to interchange the
roles of x and y. The classification into three types is as follows:

B2 - AC

> 0,

hyperbolic type,

B2 - AC

< 0,

elliptic type,

B2 - AC = 0,

parabolic type.

(The reason for the designations elliptic, hyperbolic, and parabolic is obvious from analytic geometry, the reduction of a quadratic bilinear form
Ax2 + 2Bxy + Cy2 to a sum of squares.)
HYPERBOLIC CASE. When B2 - AC
a=

>

0, by choosing {3 = 0 = 1,

'1'=

NUMERICAL ANALYSIS

14-69

in eqs. (52) and (53), and by dividing (53) by a nonzero constant, one obtains the canonical form
(54)

whereas by choosing
a=

-C
VB2 - AC

,

o = 1,

'Y = 0,

one obtains similarly the canonical form

a2 u a2u
---=0.
ae

ELLIPTIC CASE.

a=

-C
VAG - B2

a1]2

When B2 - AC

< 0,

by choosing

,

0=1,

'Y = 0,

one obtains the canonical form

a2 u

a2 u
a~2 + a1]2

= 0.

PARABOLIC CASE. When B2 - AC = 0, bychoosing,B = 1, a ~ -BjA
and 0 = 1, 'Y = - Bj A, one obtains the canonical form

a2 u
ae

-=0.

In the general case of an eq. (49) with variable coefficients, it is said to
be of elliptic, hyperbolic, or parabolic type at a given point (xo, Yo) according
to whether b2 (xo, Yo) - a(xo, Yo)c(xo, Yo) is < 0, > 0, or = 0, respectively.
If the coefficients a, ... , g are sufficiently smooth in a neighborhood of
(xo, Yo), and eq. (49) is elliptic at each point of the neighborhood, there is a
sufficiently small subneighborhood of (xo, Yo) in which one can introduce
new variables by means of eq. (50) (not necessarily a linear change of
variables as in the case eq. (51) of constant coefficients) so that eq. (49) becomes, in this subneighbdrhood,

2
au

-2
a~

2

au
)
+ -aa1]2u + ( Linear terms in -au
, - , and u
a~ a1]

= 0.

A similar statement applies in the hyperbolic and parabolic cases.

NUMERICAL ANALYSIS

14-70

Of course, eq. (49) with variable coefficients may be of different type at
different points of a domain, i.e., it may be of "mixed" type, as occurs in
the linearized equation for the potential function of a two-dimensional
compressible flow. Example.- The equation ya 2ujay2 + a2ujax 2 = 0 IS
elliptic for y > 0, parabolic for y = 0, and hyperbolic for y < o.
Representative equations commonly studied are:
(a) Elliptic
(b) Hyperbolic
(c) Parabolic

+

a2ujax 2 a2ujay2 = 0
a2ujax 2 - a2ujay2 = 0
a2ujax 2 - aujay = 0

Laplace
Vibrating string
Heat

The variable t is usually written instead of the variable y in the last two
equations. For a more detailed discussion of the canonical forms of eq.
(49), as well as for the classification into canonical forms of higher order
equations and systems of equations see Refs. 36-38.
Difference Equations. In numerical investigations it is often necessary
to replace the partial differential equation occurring in a given boundary
value problem py a suitable equa.tion involving differences rather than
derivatives of the unknown function (see Chap. 4). The basic principle
usually employed is none other than the fact that any partial derivative is
the limit of a certain difference quotient. For a function of one variable,
f(x), the difference quotients in the plus x and minus x directions, fx and
fx, are defined by
fx(x)

f(x

+ h)

- f(x)

h

and fx(x)

f(x) - f(x - h)
h

where h > o. The second differences of f(x) are defined as the differences
of the first differences. There are three second differences, fxx, fxx( = fxx) ,
and fxx. The second difference fxx is the most "symmetric" of the three:
. f(x
fxx(x)· =

+ h) + f(xh

- h) - 2f(x)

2

.

The corresponding differences for functions of several independent variables are defined as above, upon holding fixed all the variables but one at a
time. For example, for a function of two variables u(x, y):

+

u(x
h, y) - u(x, y)
ux(x, y) = - - - - - - - - h

and
uxxCx, y) =

u(x

+ h, y)

- 2u(x, y)
h2

+ 'u(x

- h, y)
' etc.

NUMERICAL ANALYSIS

14-71

Laplace equation. By taking the difference equation U xx + Uyfj = 0 as a
"difference approximation" to Laplace's differential equation a 2ujax 2 +
a2ujay2 = 0, one obtains the difference equation
(55)

U(x

+ h, y) + u(x -

h, y)

+ u(x, y + h) + u(x, y 4

h)
= u(x, V).

Vibrating string. By taking the difference equation U xx - 'llllfj = 0 as a
difference approximation to the vibrating string equation a 2ujax 2 a 2ujay2 = 0, one obtains the difference equation
u(x

+ h, y) + u(x -

h, y) - u(x, y

+ h)

- u(x, y - h) =

o.

In the case of the heat equation a 2ujax 2 - aujay = 0, one has
the alternative difference equations U xx - U y = 0 and UX,i: - 'llfj = o.
Exactly the same procedure is applicable to first and to higher order
partial differential equations, as well as to systems of equations. An alternative approach to the numerical treatment of first order partial differential equations can be based on the fact demonstrated in the previous subsection that the solution of a first order partial differential equation and
the solution of the characteristic system of ordinary differential eqs. (42)
corresponding to the given first order partial differential equation are
equivalent tasks.
Heat.

Note. The following three subsections represent results obtained at the Institute of Mathematical Sciences, New York University, under the sponsorship of
the United States Atomic Energy Commission Contract AT(30-1)1480. Reproduction in whole or in part permitted for any purpose of the United States Government.
Elliptic Partial Differential Equations

Consider a partial differential equation of second order for a function u
of n variables Xb X2, ••• , X n . One writes the equation as follows:
(56)

The coefficients aij, bi , c are assumed to be constant.
called elliptic in a region R if the quadratic form
(57)

This equation is

Laij~i~j
i.j

is non-negative definite for all values of the

~i

such that (6,

~2, ••• , ~n)

is

14-72

NUMERICAL ANALYSIS

a point in R. A typical example of an elliptic difference equation is Poisson's equation, that is,

Note that the condition (57) for ellipticity depends only on the highest
order derivative terms of eq. (56).
Dirichlet and Neulllann Prohlellls. A typical problem involving
elliptic differential equations is one that requires the solution of a boundary
value problem. For example, a typical problem would be to solve

Lu

=1

in a region R given. that u is a prescribed function Uo on the boundary B of
the region R. Such a problem is called a Dirichlet problem for eq. (56). If,
instead of the values of u, the values of au/av, the normal derivative of u,
are prescribed on B, the problem is called a Neumann problem. A more
general problem is that in which eq. (56) has to be solved, given that the
values of

are prescribed on B. Here hI and h2 are known functions. If h2 = 0, it is
a Dirichlet problem; if hI = 0, a Neumann problem; if neither is identically zero, it is a mixed problem.
Choice of Method. The standard procedure for solving a partial
differential equation numerically is to place a rectangular mesh on R, to
replace the differential equation at each mesh point by a finite difference
approximation, and thus obtain a set of linear equations. The main difficulty in this procedure occurs in the process of solving the set of linear
equations. Inverting the matrix of this set of linear equations is usually
not convenient because the matrix is generally ill conditioned. A "marching" process such as that used for solving hyperbolic equations by which
the values in the lines of the mesh are determined in succession from the
values on the preceding lines is not feasible because the values on any line
depend on the values of two preceding lines and the boundary data are not
sufficient to determine the values on two successive lines. Because of these
considerations, the method most frequently used for solving the linear
equations is an iteration or relaxation method.
Iteration Procedure. In order to discuss iteration methods, some
notation is needed. Suppose one wishes to solve a system of equations in
p unknowns. Let x denote a p-dimensional vector whose components are
the p unknowns xl, X2, ••• , xp; let the p X p matrix, K, of the coefficients

NUMERICAL ANALYSIS

14-73

of the unknowns be nonsingular, and let b be a vector whose components
are the p nonhomogeneous terms in the set of equations. Then write the
system of equations as follows:
(58)

Kx

= b.

To solve this system by iteration, put
K = N - P,

where Nand P are any matrices whose difference is K, and write (58) as
Nx

=

Px

+ b.

Make an estimate of the value of x and call it
possibly given by the vector x(l) such that
N x(l)

=

Px(O)

+

Nx(n+l)

=

px(n)

+

b,

A better estimate is

b.

This process can be continued indefinitely.
(n = 0, 1, 2, ... ) as the solution of
(59)

x(O).

Define the vector

x(n+l)

n = 0, 1,2, "',

and hope that the sequence of vectors x(n) converges in the limit to the
desired vector x.
The iteration method defined here is completely general in that the splitting of the matrix K into two matrices Nand P was arbitrary. Each distinct split gives a different iteration procedure. There are, however, two
restrictions on the ways of splitting 1(.
(a) To find x(n+l) from eq. (59) more easily than to find x from (58), N
must be a matrix with an easily found inverse. For example, N might be a
diagonal matrix or a lower triangular matrix.
(b) For the iteration scheme to converge, it is required that all the eigenvalues of the matrix N- 1P be in absolute value less than 1. It can be
shown that this is a necessary and sufficient condition for the sequence x(n)
to converge to x, no matter what the original guess x(O) is.
Richardson and Lieblllann Iteration Methods. The ideas of the
preceding section will be illustrated by applying them to the solution of
Poisson's equation
(60)

a2 u
ax2

°

+

a2 u
ay2 = I(x, y)

inside the unit square when ~ x ~ 1, 0 ~ y ~ 1. Assume that the values of u(x, y) are given on the boundary of the square.

14-74

NUMERICAL ANALYSIS

Put a square mesh of width lip over the unit square and let

u(;.;)

=

Ui;,

i, j = 0, 1, "', p.

By the use of the well-known finite difference approximation for the Laplacian (see eq. 55), eq. (60) becomes
(61)

ui-l,i

+ ui,i-l

-

4Ui,i

+ Ui+l,i
1
+ ui,i+l = "2
fij,
P

i,

f

=

1, 2, "', p - 1.

Since the values of UOj, Up;" (J = 0, 1, "', p) and of UiO, Uip (i = 0, 1, .. " p)
are given, (61) is a system of (p - 1)2 equations for the (p - 1)2 unknowns
uii (i, j = 1, 2, "', p - 1).
In Richardson's method for solving eq. (61), the following iteration scheme
is used:

+ u- -+l(n)
l,)

1

-

-f-2

l),

···0

n = 0, 1,2,

P

The values of Ui/O) are of course the initial guess to the solution of eq. (60).
If this method is compared with that in eq. (59), it is apparent that the
split is such that N is a diagonal matrix.

One disadvantage of Richardson's method when an electronic computer
is used is that all the previous values of Ui/ n ) must be stored until all the
new values of Ui/ n +1) are found. This disadvantage is avoided in Liebmann's method where the new value of Ui/ n +l) is calculated by using as
many new values as are available. Thus, if the values of Uij are calculated
in order along each row from left to right and the rows in order from bottom to top, the following iteration scheme would be used:
(63)

4u-l,)_(n+l)

= u-l -1 ,)_(n+l)

+ U- -

t,)-

l(n+l)

+ U-+l
l

_en)

,)

+ U- -+l(n) l,)

1

-f-2

l)'

P

It can be proved that the method defined by eq. (63) would converge twice
as fast as that defined by eq. (62).
The rate of convergence can be still further improved by using an extrapolation parameter a, thus obtaining what is called Liebmann's extrapolated

NUMERICAL ANALYSIS

14-75

method. The iteration scheme is now this:
4Ui/n+1)

= 4(1 -

a)ui/ n )

+ a [ Ui_l.j (n+l) + Ui,j-l (n+l) + Ui+l,j (n) + Ui,j+l (n)

1f ]

-

p2 ij

.

The value of a for which convergence is fastest is found by solving the equation

a²t_m² - 4a + 4 = 0,

where t_m is the largest eigenvalue of the Richardson scheme eq. (62). For
the case considered, t_m = cos(π/p). For a rectangular mesh with p divisions in one direction and q divisions in the other,

t_m = ½[cos(π/p) + cos(π/q)].

In general the use of the extrapolated Liebmann method with the best
value of a will be much faster than the unextrapolated Liebmann method.
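A minimal sketch of the extrapolated Liebmann sweep follows, for Laplace's equation (f = 0) on the unit square; the mesh size, boundary values, and sweep count are assumptions made for the illustration, and a is obtained from the equation above with t_m = cos(π/p):

    import numpy as np

    p = 20                               # mesh divisions (illustrative)
    u = np.zeros((p + 1, p + 1))
    u[p, :] = 1.0                        # assumed boundary values: u = 1 on one edge
    f = np.zeros((p + 1, p + 1))         # right-hand side of eq. (60); zero here

    t_m = np.cos(np.pi / p)              # largest eigenvalue of the Richardson scheme
    a = 2.0 / (1.0 + np.sqrt(1.0 - t_m ** 2))   # smaller root of a^2 t_m^2 - 4a + 4 = 0

    for sweep in range(200):
        for i in range(1, p):            # sweep the mesh in order, so that new
            for j in range(1, p):        # values are used as soon as available
                liebmann = 0.25 * (u[i - 1, j] + u[i, j - 1] + u[i + 1, j]
                                   + u[i, j + 1] - f[i, j] / p ** 2)
                u[i, j] = (1.0 - a) * u[i, j] + a * liebmann

Setting a = 1 reduces the sweep to Liebmann's method, eq. (63); replacing the in-place update by one that reads only old values gives Richardson's method, eq. (62).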
Line Iteration Schemes. Another iteration method which is useful
in many cases is given by the following scheme:
(64)    4u_{i,j}^(n+1) - u_{i,j-1}^(n+1) - u_{i,j+1}^(n+1) = u_{i-1,j}^(n+1) + u_{i+1,j}^(n) - (1/p²)f_{ij}.

In this scheme, instead of solving for the value of u at a single point ij, solve
for all values of u on the ith column in terms of the values of u on the
(i − 1)-th and (i + 1)-th columns. That is why eq. (64) has been written
with the left-hand side containing all the u-values on the ith column. Since
at each step the value of the right-hand side is known for all values of j,
the three-term relation defined by eq. (64) is solved for the values of u on
the ith column. (The method of solving the three-term relation is explained in the subsection on Hyperbolic Partial Differential Equations.)
Instead of solving for the values of U on a column, one may solve for the
values of U on a row. In that case use the following scheme:

(65)    4u_{i,j}^(n+1) - u_{i-1,j}^(n+1) - u_{i+1,j}^(n+1) = u_{i,j-1}^(n+1) + u_{i,j+1}^(n) - (1/p²)f_{ij}.

Again this three-term recurrence scheme is solved for the values of U on the
jth row starting with j = 1.
The Method of Peaceman and Rachford (Ref. 35). This seems to
be one of the quickest iterative methods for solving an elliptic differential


equation. It is essentially a line iteration scheme which uses columns and
rows alternately. The explicit description of the method is contained in
the following formulas:
u_{i-1,j}^(2n+1) - (2 + ρ_n)u_{i,j}^(2n+1) + u_{i+1,j}^(2n+1)
        = -u_{i,j-1}^(2n) + (2 - ρ_n)u_{i,j}^(2n) - u_{i,j+1}^(2n) + (1/p²)f_{ij};

u_{i,j-1}^(2n+2) - (2 + ρ_n)u_{i,j}^(2n+2) + u_{i,j+1}^(2n+2)
        = -u_{i-1,j}^(2n+1) + (2 - ρ_n)u_{i,j}^(2n+1) - u_{i+1,j}^(2n+1) + (1/p²)f_{ij}.

Here ρ_n is an extrapolation parameter which is to be determined so that the
method will converge as quickly as possible. In the present case Peaceman
and Rachford suggest putting ρ_n = ρ_k if n ≡ k (mod p), where

ρ_k = 4 sin²[(2k + 1)π/4p].

Variational Principle. An important characteristic of elliptic differential equations is that they can be obtained as the Euler equations of
problems in the calculus of variations. Physically, this implies that the
problem possesses an energy integral whose minimum value is given by the
solution of the elliptic partial differential equation. For example, in Dirichlet's problem the integral

(66)    ∬_R (∇u)² dx dy

must be a minimum in the domain of all functions u satisfying the preassigned boundary conditions. In problems with mixed boundary conditions
the integral (66) must be modified (for details see Ref. 37). As another
example, in elasticity problems involving plates, the integral

(67)    ∬ (Δu)² dx dy

must be a minimum in the domain of all functions u satisfying the preassigned boundary conditions.
For numerical purposes the energy integral can be approximated by a
sum involving the values of the unknown function u at a set of points inside the region R. Then choose values of u so that the sum will be a minimum. For example, (66) would be approximated by

(68)    Σ_i Σ_j [(u_{i,j} - u_{i-1,j})² + (u_{i,j} - u_{i,j-1})²],

and (67) by

(69)    Σ_i Σ_j [u_{i,j} - ¼(u_{i,j+1} + u_{i,j-1} + u_{i+1,j} + u_{i-1,j})]².

As these illustrations show, the sum is a quadratic form in the values Uij.
By differentiation with respect to Uij a set of linear equations is obtained
whose solution will make the sum a minimum. Simple algebra shows that
when this method is applied to eqs. (68) and (69) the standard difference
equations for Laplace's equation or the biharmonic equation are obtained.
Any iteration method which at each step reduces the value of the sum
must automatically converge to a minimum value. Use of this fact easily
shows that the various schemes proposed above do converge to a solution.
The variational principle is also useful in determining how the boundary
conditions should be taken into account.
Hyperbolic Partial Differential Equations

The equation of the vibrating string will be used to illustrate some finite
difference methods for solving problems involving a second order hyperbolic partial differential equation.
If the end points of the string are held fixed at x = 0 and x = 1, the
deflection of the string u(x, t) is determined from the initial deflection
u(x, 0) = f(x), and the initial velocity u_t(x, 0) = g(x).
The conditions describing the motion are:

P.D.E.    (1/c²)u_tt = u_xx,    for 0 < x < 1, t > 0,

(70)    I.C.    u(x, 0) = f(x),  u_t(x, 0) = g(x),    for 0 ≤ x ≤ 1,

B.C.    u(0, t) = u(1, t) = 0,    for t ≥ 0.

Choose a mesh width h = 1/N in x and a time step k, let λ = k/h, and write U_{i,j} for the approximate solution at x = ih, t = jk. The difference equations used to determine U are

(71)    U_{i,j+1} = c²λ²U_{i+1,j} + 2(1 - c²λ²)U_{i,j} + c²λ²U_{i-1,j} - U_{i,j-1},

with starting and boundary values taken from eq. (70) (the explicit form is written out in the subsection on Roundoff and Truncation Errors below). If λ > 1/c, the solution
U will not converge to u for all initial displacements. This may be verified
by noting that the solution of the initial value problem for the infinite
string is given by
(72)    u(x, t) = ½[f(x + ct) + f(x - ct)] + (1/2c) ∫_{x-ct}^{x+ct} g(ξ) dξ.

Formula (72) shows that the solution u(x, t) depends solely on the initial
data in the interval (x − ct, x + ct). The solution U(x, t) depends solely
on the initial data in the interval [x − (t/λ), x + (t/λ)]. Hence if λ > 1/c,
it is possible to vary f and g in the intervals [x − ct, x − (t/λ)] and
[x + (t/λ), x + ct] in such a way that the solution u(x, t) is changed, but
yet U(x, t) is unaffected. Hence if λ > 1/c, the solution U(x, t) cannot
converge as h, k → 0 since it would have to converge to different values.
Hence it is necessary for convergence that λ ≤ 1/c, i.e., the "domain of
dependence" for the solution of the finite difference equation should contain the domain of dependence of the solution of the differential equation.
In fact, if λ ≤ 1/c, U does converge to u as h, k → 0. The proof of convergence may be made to rest upon the Fourier series representations of
the solutions of eqs. (71) and (70), namely,
U(x, t) = Σ_{n=1}^∞ (A_n cos μ_n t + B_n sin μ_n t) sin nx,

where μ_n is determined from the condition sin(μ_n k/2) = λc sin(nh/2); and

u(x, t) = Σ_{n=1}^∞ (a_n cos nct + b_n sin nct) sin nx.


Roundoff and Truncation Errors. The calculation of U is in practice
effected by rounding to a finite number of decimal places. The equations
which determine U are

U_{i,0} = f_i + R_{i,0},

U_{i,1} = (c²λ²/2)f_{i+1} + (1 - c²λ²)f_i + (c²λ²/2)f_{i-1} + kg_i + R_{i,1},

U_{i,j+1} = c²λ²U_{i+1,j} + (2 - 2c²λ²)U_{i,j} + c²λ²U_{i-1,j} - U_{i,j-1} + R_{i,j+1},

U_{0,j} = U_{N,j} = 0,

where f_i = f(ih) and g_i = g(ih),

and R_{p,q} is the roundoff error. The truncation error T_{p,q} is defined by
substituting the exact solution u into eq. (71) as follows:

u_{i,0} = f_i + T_{i,0},

u_{i,1} = (c²λ²/2)f_{i+1} + (1 - c²λ²)f_i + (c²λ²/2)f_{i-1} + kg_i + T_{i,1},

u_{i,j+1} = c²λ²u_{i+1,j} + (2 - 2c²λ²)u_{i,j} + c²λ²u_{i-1,j} - u_{i,j-1} + T_{i,j+1}.

It is easily verified that T_{i,0} = 0, T_{i,1} = O(k³), T_{i,j} = O(k⁴), where O(kⁿ)
represents a quantity which is bounded in absolute value for all sufficiently
small k by Mkⁿ with some constant M.
It is reasonable to require that the roundoff error be of the same order
of magnitude as the truncation error or smaller, in order that the number
of digits carried in the calculation be appropriate for the interval size.
With this restriction, the total error e_{i,j} is O(T²k²) for any finite time T,
where e_{i,j} = U_{i,j} − u_{i,j} and 0 ≤ j ≤ T/k.

Implicit Schemes. The restriction k/h = λ ≤ 1/c may be relaxed by
using an implicit scheme. That is, it is possible to take larger time steps
at the expense of more involved calculations as follows:

P.D.E.    (U_{i,j+1} - 2U_{i,j} + U_{i,j-1})/k²
            = (c²/h²)[a²(U_{i+1,j+1} - 2U_{i,j+1} + U_{i-1,j+1})
                      + (1 - 2a²)(U_{i+1,j} - 2U_{i,j} + U_{i-1,j})
                      + a²(U_{i+1,j-1} - 2U_{i,j-1} + U_{i-1,j-1})].


The equations, when solved for the unknowns at the (j + 1)-st time step,
have the form

(73)    (acλ)²U_{i+1,j+1} - [1 + 2(acλ)²]U_{i,j+1} + (acλ)²U_{i-1,j+1} = W_{i,j},
                for i = 1, 2, ..., N - 1,

where W involves information on the two preceding lines (j and j - 1).
The labor involved in solving eq. (73) is minimal since the (N - 1) ×
(N - 1) matrix of coefficients is in triple diagonal form. At the same time
the condition on λ which insures convergence of U to u as h, k → 0 is

λ²c² ≤ 1/(1 - 4a²)    for a² < ¼,

and no restriction for ¼ ≤ a².

Solution of Triple Diagonal Systems. The equations

b₁x₁ + c₁x₂                                         = y₁
a₂x₁ + b₂x₂ + c₂x₃                                  = y₂
       a₃x₂ + b₃x₃ + c₃x₄                           = y₃
       ......................................
       a_{N-1}x_{N-2} + b_{N-1}x_{N-1} + c_{N-1}x_N = y_{N-1}
                        a_N x_{N-1} + b_N x_N       = y_N

may be solved by eliminating the unknowns in succession from the equations. By starting at the top, the system can be put in the form

x₁ + C₁x₂                 = Y₁
     x₂ + C₂x₃            = Y₂
     ....................
     x_{N-1} + C_{N-1}x_N = Y_{N-1}
               x_N        = Y_N

The numbers C_K and Y_K may be recursively computed from the formulas

(74)    C_K = c_K/(b_K - a_K C_{K-1}),    Y_K = (y_K - a_K Y_{K-1})/(b_K - a_K C_{K-1}),

for K = 2, 3, ..., N, starting with C₁ = c₁/b₁ and Y₁ = y₁/b₁.


It is now easy to solve for the x_K, beginning with x_N, as follows:

(75)    x_N = Y_N,    x_K = Y_K - C_K x_{K+1},

for K = N - 1, N - 2, ..., 1.
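The recursion of eqs. (74) and (75) passes directly into code. A minimal sketch, with an illustrative three-equation system; list indexing starts at zero, so the subscripts are shifted by one from the text:

    # Forward elimination (eq. 74) followed by back substitution (eq. 75).
    # a[0] and c[N-1] are never referenced.
    def solve_tridiagonal(a, b, c, y):
        n = len(b)
        C = [0.0] * n                    # the numbers C_K of eq. (74)
        Y = [0.0] * n                    # the numbers Y_K of eq. (74)
        C[0] = c[0] / b[0]
        Y[0] = y[0] / b[0]
        for k in range(1, n):
            denom = b[k] - a[k] * C[k - 1]
            C[k] = c[k] / denom
            Y[k] = (y[k] - a[k] * Y[k - 1]) / denom
        x = [0.0] * n
        x[n - 1] = Y[n - 1]              # x_N = Y_N
        for k in range(n - 2, -1, -1):   # x_K = Y_K - C_K x_{K+1}
            x[k] = Y[k] - C[k] * x[k + 1]
        return x

    # The system 2x1 - x2 = 1, -x1 + 2x2 - x3 = 0, -x2 + 2x3 = 1:
    print(solve_tridiagonal([0, -1, -1], [2, 2, 2], [-1, -1, 0], [1, 0, 1]))
    # prints [1.0, 1.0, 1.0]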
The von Neumann Criterion for Convergence. A quick method
for heuristically testing the convergence of a finite difference method has
been attributed to von Neumann.
In the case of linear differential equations with variable coefficients, the
method consists in replacing the coefficients by constants and then finding
all solutions of the difference equation of the form

U(x, t) = e^{γt} e^{iβx},    with β real.

If |e^{γt}| ≤ 1 for t ≥ 0, for all β and for all admissible values of the coefficients, the finite difference method is said to be stable, otherwise not.
The von Neumann "test" for convergence is the same as for stability.
In practice, this test for convergence is as simple as any a priori calculation
could be; in addition, it has been shown to be a sufficient condition for convergence for a large number of cases.
Parabolic Partial Differential Equations

Finite difference methods for parabolic equations are similar to those for
hyperbolic equations. The present discussion will be restricted to equations of the first order in time and second order in one or more space variables.
Illustrative methods will be given for:
(a) The linear heat flow equation in one dimension.
(b) A quasilinear equation in one space variable and time.
(c) A linear parabolic equation in two space variables and time.
For diffusion or heat flow in one dimension there is an initial value problem consisting of a partial differential equation (P.D.E.), an initial condition (I.C.), and boundary conditions (B.C.) for a function u(x, t). In the
simplest case, these are:

P.D.E.    (1/σ)u_t = u_xx,    for 0 < x < 1, t > 0,

(76)    I.C.    u(x, 0) = f(x),    for 0 ≤ x ≤ 1,

B.C.    u(0, t) = u(1, t) = 0,    for t > 0.

Choose a mesh width h = 1/N in x and a time step k, and write U_{i,j} for the approximation to u(ih, jk). The simplest explicit difference equations are

(77)    (U_{i,j+1} - U_{i,j})/(σk) = (1/h²)(U_{i+1,j} - 2U_{i,j} + U_{i-1,j}),
        U_{i,0} = f_i,    U_{0,j} = U_{N,j} = 0.

If the condition

(78)    σk/h² ≤ ½

is violated as h, k → 0, the solution of the
difference eqs. (77) diverges for all but special cases in which the initial
function f(x) has a terminating Fourier series. One says that the equations
are unstable under these circumstances and that (78) is a condition for
stability. For general discussions of convergence and stability, see Refs. 39,
40, and 43.
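As an instance of the von Neumann test described earlier, substituting U = e^{γt}e^{iβx} into the first of eqs. (77) gives an amplification factor per time step of 1 − 4(σk/h²) sin²(βh/2), and |e^{γk}| ≤ 1 for all real β exactly when condition (78) holds. A minimal numerical check of this (the sampling of β and the two ratio values are illustrative choices):

    import numpy as np

    # Amplification factor of the explicit scheme (77) per time step:
    #   g(beta) = 1 - 4 * (sigma k / h^2) * sin^2(beta h / 2).
    def explicit_scheme_is_stable(ratio):        # ratio = sigma k / h^2
        beta_h = np.linspace(0.0, np.pi, 1001)   # sample beta h over one period
        g = 1.0 - 4.0 * ratio * np.sin(beta_h / 2.0) ** 2
        return bool(np.all(np.abs(g) <= 1.0))

    print(explicit_scheme_is_stable(0.4))   # True:  condition (78) satisfied
    print(explicit_scheme_is_stable(0.6))   # False: condition (78) violated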
The convergence as k → 0 is slower, at least in a formal sense, than it
is for the hyperbolic problem, and its rate depends upon the smoothness
of the initial function f(x). If condition (78) is satisfied and f(x) is analytic
for 0 ≤ x ≤ 1, the error e_{i,j} of the approximation (77) is O(Tk) for t in a
finite interval 0 ≤ t ≤ T. One can of course also write e_{i,j} = O(Th²) because of the relation (78).
The method is more accurate in the special case in which h and k are so
chosen that

(79)    σk/h² = 1/6.

It is easy to verify by Taylor's series expansions that in this case there is
a cancellation of the first order error terms coming from the two members
of the first eq. (77). In consequence, if f(x) is analytic, e_{i,j} = O(Tk²) =
O(Th⁴).
A condition of the form (78) is perhaps not unexpected from the point
of view of the domain of dependence of the differential equation, which is
not confined to a small interval as it was for the vibrating string problem.
That is, u(x, t), for any t > 0, depends on all the initial data, i.e., on the
values of f(x) for the entire interval 0 ≤ x ≤ 1. For any finite values of h
and k the difference equations of course possess a restricted domain of
dependence, but as the mesh is refined this domain opens out (because eq.
78 requires that k vary as h²) so as to include all past values of the function.
Implicit Equations. Implicit difference equations can be constructed
for the heat flow problem in many ways. For example, one can replace the
first eq. (77) by the equation

(80)    (U_{i,j+1} - U_{i,j})/(σk) = (1/h²)[a(U_{i+1,j+1} - 2U_{i,j+1} + U_{i-1,j+1})
                                            + (1 - a)(U_{i+1,j} - 2U_{i,j} + U_{i-1,j})],

where a is a constant. The resulting method reduces to the foregoing explicit method for a = 0, to the so-called Crank-Nicholson method (Ref. 41)
for a = ½, and to the method of Laasonen (Ref. 42) for a = 1.
The condition for convergence of the solutions of eq. (80) as h, k → 0 is

(81a)    σk/h² ≤ 1/(2 - 4a)    if 0 ≤ a < ½;

(81b)    no restriction if ½ ≤ a ≤ 1.
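For a > 0 the unknowns of eq. (80) at the (j + 1)-st time level form a triple diagonal system, so a time step can be carried out with the recursion of eqs. (74)-(75) (the solve_tridiagonal sketch given earlier). Collecting the unknowns on the left gives −arU_{i−1,j+1} + (1 + 2ar)U_{i,j+1} − arU_{i+1,j+1} = known terms, with r = σk/h². A minimal sketch with illustrative parameters:

    import numpy as np

    # One step of the implicit scheme (80), boundary values U_0 = U_N = 0;
    # uses solve_tridiagonal from the sketch above.
    def implicit_step(U, r, a):
        n = len(U) - 2                       # number of interior points
        lower = [-a * r] * n
        diag = [1.0 + 2.0 * a * r] * n
        upper = [-a * r] * n
        rhs = [U[i] + (1.0 - a) * r * (U[i + 1] - 2.0 * U[i] + U[i - 1])
               for i in range(1, n + 1)]
        interior = solve_tridiagonal(lower, diag, upper, rhs)
        return np.array([0.0] + interior + [0.0])

    h = 0.1
    U = np.sin(np.pi * np.arange(0.0, 1.0 + h / 2.0, h))   # assumed initial values
    U = implicit_step(U, r=2.0, a=0.5)   # Crank-Nicholson step; r > 1/2 is permitted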

A linear parabolic equation in two space variables and time, with coefficients satisfying

A > 0,    C > 0,    AC - B² > 0,

can be approximated by an analog of the general implicit eq. (80). Because
of the number of variables it is convenient to introduce a slightly different
notational convention for this problem by calling the increments Δt, Δx, Δy
and by relating the time variable t to a superscript n as follows. Let

U_{j,l}^n = U(jΔx, lΔy, nΔt).
φ = f(x₁, x₂, ..., x_n) + λ₁g₁(x₁, x₂, ..., x_n) + ... + λ_m g_m(x₁, x₂, ..., x_n),

where λ₁, λ₂, ..., λ_m are undetermined multipliers called Lagrangian multipliers.


Then, in order to determine the extremal values of u = f(x₁, x₂, ..., x_n),
all that is necessary is to obtain the solution of the system of eqs. (10) for
the unknowns x₁, x₂, ..., x_n, λ₁, λ₂, ..., λ_m. (See Ref. 5.)
EXAMPLE. Find the point in the plane x + 2y + 3z = 14 nearest the
origin. The problem may be converted to that of finding values of x, y,
and z which minimize the square of the distance from the origin,

u = D² = x² + y² + z²,

subject to

g = x + 2y + 3z - 14 = 0.

Form:

φ = x² + y² + z² + λ(x + 2y + 3z - 14).

Take partial derivatives of φ with respect to x, y, z, and λ:

∂φ/∂x = 2x + λ,        ∂φ/∂y = 2y + 2λ,

∂φ/∂z = 2z + 3λ,       ∂φ/∂λ = x + 2y + 3z - 14.

Setting these four partial derivatives equal to zero and solving the system
of four simultaneous equations then yields

x = 1,    y = 2,    z = 3,    and    λ = -2;

that is, the point in the plane x + 2y + 3z = 14 nearest the origin is
(1, 2, 3).
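Since the four conditions obtained by setting the partial derivatives of φ to zero are linear in x, y, z, and λ, the example can be checked by solving them as a 4 × 4 linear system; a minimal sketch:

    import numpy as np

    # Rows: 2x + lambda = 0, 2y + 2 lambda = 0, 2z + 3 lambda = 0,
    # and the constraint x + 2y + 3z = 14; unknowns (x, y, z, lambda).
    A = np.array([[2.0, 0.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0, 2.0],
                  [0.0, 0.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0, 0.0]])
    rhs = np.array([0.0, 0.0, 0.0, 14.0])
    print(np.linalg.solve(A, rhs))           # [ 1.  2.  3. -2.]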
Other examples of the use of Lagrangian multipliers can be found in
Ref. 6 and in texts on advanced calculus.
Modified Lagrangian Multiplier Method (Ref. 1, Chap. 10). Many
practical extremal problems have the added restriction that all variables
must be non-negative; for example, it makes no sense to produce -n units
of a given product. Furthermore, the restrictions may be given in the form
of inequalities instead of equalities. Since the Lagrangian multiplier
method does not guarantee the non-negativity of the solution variables, a
modification must be made.
EXAMPLE. Consider an economic lot size inventory problem (of the
type described above) involving two products, with a restriction on the
total available warehouse space. If W₁ and W₂ are the respective unit
storage requirements, and an average inventory level is assumed equal to
one-half the lot size q, the total space requirement can be written as

(12)    ½(W₁q₁ + W₂q₂) ≤ S.


If W₁ = 5 cu ft, W₂ = 35 cu ft, and S = 14,000 cu ft, eq. (12) becomes

(13)    ½(5q₁ + 35q₂) ≤ 14,000,

or, equivalently,

(14)    5q₁ + 35q₂ ≤ 28,000.

The problem (for two products) can then be stated as:
Problem. Determine non-negative values of q₁ and q₂ which minimize

TEC = (½C_I1 Tq₁ + (R₁/q₁)C_s1) + (½C_I2 Tq₂ + (R₂/q₂)C_s2)

subject to the restriction of eq. (14).
Solution. Define an undetermined multiplier λ such that

(15)    λ < 0    when S - ½ΣW_iq_i = 0;
        λ = 0    when S - ½ΣW_iq_i > 0.

Form

(16)    φ = TEC + λ(S - ½ΣW_iq_i);

that is,

(17)    φ = (½C_I1 Tq₁ + (R₁/q₁)C_s1) + (½C_I2 Tq₂ + (R₂/q₂)C_s2)
              + λ(S - ½W₁q₁ - ½W₂q₂).

Since λ(S - ½ΣW_iq_i) is always identically zero by eq. (15), φ = TEC.
Taking partial derivatives of φ with respect to q₁ and q₂ yields

(18a)    ∂(TEC)/∂q₁ = ½C_I1 T - (1/q₁²)C_s1R₁ - ½λW₁

and

(18b)    ∂(TEC)/∂q₂ = ½C_I2 T - (1/q₂²)C_s2R₂ - ½λW₂.

Setting eqs. (18) equal to zero yields

(19a)    q₁* = √[2C_s1R₁/(C_I1 T - λW₁)]

and

(19b)    q₂* = √[2C_s2R₂/(C_I2 T - λW₂)].


For each product, the quantities R_i, C_si, C_Ii, W_i, and T are known, but
λ is still unknown. However, for any arbitrarily assigned value of λ, the q_i
and, hence, ½ΣW_iq_i can be calculated. If ½ΣW_iq_i exceeds S (see eqs. 12
and 13), the lot sizes are too large. In this case, decrease λ repeatedly and
recompute until ½ΣW_iq_i = S has been obtained. If ½ΣW_iq_i < S for all
negative λ, set λ = 0 in eq. (19). The resulting q_i's will allow the smallest
possible total costs for the company with existing warehouse space S.
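The search just described is easy to mechanize. A minimal sketch, using eqs. (19) and the data assumed in Table 1 below; the step by which λ is decreased is an illustrative choice:

    import math

    R  = [2400, 4800]        # requirements R_i
    Cs = [100.0, 25.0]       # setup costs C_si
    CI = [0.060, 0.035]      # holding costs C_Ii
    W  = [5.0, 35.0]         # unit storage requirements, cu ft
    T, S = 12.0, 14000.0     # months, available cu ft

    lam = 0.0
    while True:
        # eq. (19): q_i = sqrt(2 C_si R_i / (C_Ii T - lambda W_i))
        q = [math.sqrt(2.0 * Cs[i] * R[i] / (CI[i] * T - lam * W[i]))
             for i in range(2)]
        storage = 0.5 * sum(W[i] * q[i] for i in range(2))
        if storage <= S:                 # warehouse restriction met
            break
        lam -= 0.0001                    # decrease lambda and recompute
    print(lam, [round(qi) for qi in q])  # stops a little below -0.0024 (cf. Table 1)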
TABLE 1. STORAGE SET BY VARIOUS λ VALUES a

    λ          q₁*      q₂*      ½(5q₁ + 35q₂)
 -0.0000       816      756         15,270
 -0.0012       813      721         14,650
 -0.0024       810      690         14,100
 -0.0036       806      663         13,618
 -0.0060       800      617         12,790
 -0.0084       794      580         12,135
 -0.0120       784      535         11,323

a Assumes: T = 12 months and

 Product     R_i      C_si     C_Ii
   X₁        2400     $100     0.060
   X₂        4800     $ 25     0.035

Values of ½(5q₁ + 35q₂) are calculated in Table 1 in order to determine
the correct value of λ. As indicated in Table 1, λ should be approximately
equal to -0.0024, so that

q₁* = 810    and    q₂* = 690.

Note. For this example, without any restriction on storage space (minimizing TEC, rather than φ),

q₁* = 816    and    q₂* = 756.

Another approach to a modified Lagrangian multiplier technique can be
found in Ref. 7. Such modified Lagrangian multiplier techniques, when
applicable, are most cumbersome and impractical for a large number of
variables. (See Ref. 1, Chap. 10.) Where the objective function and the
restrictions are linear (and for some other special cases), the techniques of
linear programming are applicable (see Sect. 4).
Other Analytic Methods of Solution. There are many other analytic
methods of solution much more sophisticated than those presented here.
A number of the models arising in specific problems require the development of special methods for their solution. For these more sophisticated


and special methods, see journals such as Operations Research, Management
Science, Econometrica, Naval Research Logistics Quarterly, and the publications of the RAND Corporation (Santa Monica, Calif.). See also Refs. 7,
8, and 9.
Numerical Solutions

Numerical techniques of deriving a solution from a model consist of
substituting numbers for the symbols in the model and finding that set
of substituted numbers which yields the maximum effectiveness. Some
numerical procedures are trial-and-error procedures into which one seeks
to build some rationale for the selection of subsequent trials. Others are
so-called iterative procedures in which one converges to an optimum solution through successively better steps.
Newton's Method. An example of a quite useful trial-and-error
procedure is Newton's method for solving equations, which is a procedure
for determining, within any desired degree of accuracy, the roots of an
algebraic equation. The method is based on the fact that, for a short distance, the tangent to a smooth curve is a good approximation to the curve.
Newton's method may be formulated as follows. Let f(X) = 0 be the
equation under consideration. A root of this equation is the abscissa of a
point at which the curve Y = f(X) crosses the X-axis.
Start with a trial solution, say Xo (see Fig. 1). This value Xo determines
a point P on the curve whose coordinates are (Xo, Yo). The tangent to the
FIG. 1. Figure for Newton's method (Ref. 1).

curve at P is then drawn and will intersect the X-axis at (X₁, 0). If the
curve and the tangent are nearly coincident over the range (X₀, X₁), the
value X₁ will be the first approximate root of the equation. Furthermore,


using the fact that the slope of the tangent at P is given by f'(X₀), namely
the derivative of f(X) evaluated at X = X₀, yields

(20)    X₁ = X₀ - f(X₀)/f'(X₀).

The procedure may be repeated as many times as necessary where, in
general,

(21)    X_{n+1} = X_n - f(X_n)/f'(X_n).
Whether and how fast the process will converge depends on the function
f(X) and the initial value X₀. Conditions favorable to convergence are
evidently that f(X₀) be small and f'(X₀) be large.
To illustrate Newton's method, consider

f(X) = X³ - 3X² + 4X - 2.

Although there are many devices which can be used to locate integers between or at which roots will lie, arbitrarily take Xo = 2 as the trial solution.
For the particular f(X) chosen, X = 1 is obviously a solution, and that is
what we wish to approximate by Newton's method. The deviation from
the value X = 1 will, of course, measure the degree of accuracy of this
approximation.
Now f'(X) = 3X² - 6X + 4, so that

f(2) = 8 - 12 + 8 - 2 = 2,    f'(2) = 12 - 12 + 4 = 4.

Hence, using eq. (21) yields

X₁ = 2 - 2/4 = 1.5.

By continuing in this manner,

f(1.5) = (3/2)³ - 3(3/2)² + 4(3/2) - 2 = 5/8,

and

f'(1.5) = 3(3/2)² - 6(3/2) + 4 = 7/4,

so that

X₂ = 1.5 - 5/14 = 1.143.


Continuing once more gives

f(1.143) = 0.147,    f'(1.143) = 1.060,

so that

X₃ = 1.143 - 0.147/1.060 = 1.004.

One could continue in this manner, measuring at each stage of the iterative procedure the value of f(X_i) to indicate how quickly one is converging
to a solution [obviously, at a point of solution X*, f(X*) = 0], and, hence,
obtain this solution within any prescribed degree of accuracy.
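A minimal sketch of the iteration of eq. (21) applied to this example; the number of steps printed is an arbitrary choice:

    # Newton's method, eq. (21): X_{n+1} = X_n - f(X_n)/f'(X_n).
    f = lambda x: x**3 - 3*x**2 + 4*x - 2
    fprime = lambda x: 3*x**2 - 6*x + 4

    x = 2.0                              # trial solution X_0
    for _ in range(5):
        x = x - f(x) / fprime(x)
        print(round(x, 4))               # 1.5, 1.1429, 1.0055, ...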
Excellent examples of converging iterative procedures are to be found in
the several techniques of linear programming. These are discussed in
Sect. 4.

The Monte Carlo Technique
In many mathematical models, it is necessary to evaluate certain terms
in the model before a solution can be derived. Especially where probability concepts are involved, it may not be possible or practical to evaluate a
given function (within a model) by mathematical analysis. Such expressions, however, can be evaluated by the Monte Carlo technique. Specifically, the Monte Carlo technique is a procedure by which one can obtain
approximate evaluations of mathematical expressions which are built up
of one or more probability distribution functions.
The Monte Carlo technique consists of simulating an experiment to
determine some probabilistic property of a population of objects or events
by the use of random sampling applied to the components of the objects
or events. This statement can best be clarified by means of examples.
The Random Walk Problem. The discovery of the Monte Carlo
technique is said to be due to a legendary mathematician observing the
perambulation of a saturated drunk. The mathematician wondered how
many steps the drunkard would have to take, on the average, to reach a
specified distance from his starting point, if it were assumed that, at each
step, there was an equal probability of the drunkard stepping off in any
direction.
EXAMPLE. To illustrate how the Monte Carlo technique can be applied
to this problem of the "random walk," an estimate can be obtained of the
probable distance traveled after five steps of equal size. (It is further assumed, for simplicity of presentation, that these steps are at 45°, 135°,
225°, or 315°.) To do this, refer to Table 2, which is a portion of a two-digit random number table.

TABLE 2. RANDOM NUMBERS (Ref. 1)

(A page of two-digit random numbers.)


Use the following symbolism:
1. The lamppost is represented by the origin of the X- and Y-axes. See Fig. 2.

FIG. 2. Plotting of points (x_n, y_n) (Ref. 1).

2. The first digit of the two-digit random number selected from the table
represents one unit of X, positive if even or zero, negative if odd.
3. The second digit of the same two-digit random number selected represents one unit of Y, positive if even or zero, and negative if odd.
4. (x_n, y_n) represents the position of the drunkard at the end of the nth
phase.
5. d_n equals the distance of the drunkard from the lamppost at the end
of the nth phase; that is, d_n² = x_n² + y_n².
To start at random, select a two-digit number, say in column 10 and
row 6 of Table 2, and, by reading down, obtain the following five numbers:
36, 35, 68, 90, and 35. These numbers may then be arranged and the
drunkard's moves obtained as shown in Table 3. The points (x_n, y_n) may
also be plotted as in Fig. 2.
TABLE 3

 Phase      First     Second     Point Location
   n        Digit     Digit        (x_n, y_n)
   1          3         6           (-1, 1)
   2          3         5           (-2, 0)
   3          6         8           (-1, 1)
   4          9         0           (-2, 2)
   5          3         5           (-3, 1)


In this example, then, one estimate is that the drunkard will be 3.16 units
from the lamppost at the end of the fifth phase. This is obtained as follows:

d₅² = x₅² + y₅² = 9 + 1 = 10,
d₅ = √10 = 3.16.

This procedure must be repeated for different random numbers in the
table so that a group of estimates of the desired distance is obtained. The
estimates in this group can then be averaged to yield an average estimated
distance from the lamppost. In general, the estimates will improve as the
number of such samples is increased. The accuracy of the estimate will be
proportional to the square root of the number of samples.
More generally, from many such simulated trials, the probability of the
drunkard's being a specified distance from the lamppost for any number n
of irregular zigzag phases is estimated.
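A minimal sketch of the simulated experiment; the random digits of Table 2 are replaced by a library generator, and the trial count is an illustrative choice. Averaging d² over many trials and taking the square root reproduces the analytic value a√n quoted in the next paragraph:

    import random

    # One walk of n phases: each phase moves one unit in X and one in Y, the
    # signs set by digit parity as in the text (even or zero digit gives +1,
    # odd gives -1), so the four diagonal directions are equally likely.
    def distance_after(n_phases):
        x = y = 0
        for _ in range(n_phases):
            x += 1 if random.randint(0, 9) % 2 == 0 else -1
            y += 1 if random.randint(0, 9) % 2 == 0 else -1
        return (x * x + y * y) ** 0.5

    trials = 10000
    rms = (sum(distance_after(5) ** 2 for _ in range(trials)) / trials) ** 0.5
    print(rms)   # near a*sqrt(n) = sqrt(2)*sqrt(5) = sqrt(10) = 3.16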
As a point of interest and as a basis for the reader comparing his own
Monte Carlo solutions, it might be pointed out that, for this example, an
analytic solution is obtainable and is given by

d = a√n,

i.e., the most probable distance of the drunkard from the lamppost, after a
large number of irregular phases of his walk, is equal to the average length a
of each straight track he walks, times the square root of the number n of
phases of his walk.
For an illustration of the use of the Monte Carlo technique for the solution of problems involving normal distributions, see Sect. 6.
The use of the Monte Carlo technique for any probability distribution
function can be found in Ref. 1, Chap. 7. Reference 1 discusses only the
normal distribution, but the treatment is general and applicable to any
probability distribution function. For a discussion of the nature of tables
of random numbers and a bibliography of tables and works on this subject,
see Ref. 10. Examples of other uses of the Monte Carlo technique can be
found in Ref. 1, Chaps. 7, 14, and 17. See also Refs. 5, 11-13.
3. INVENTORY MODELS

Problem Statement. Inventory problems are concerned with minimizing the sum of costs such as those due to (a) carrying inventory, (b)
setup, (c) shortage, (d) obsolescence, and (e) change of work force level.
Inventory problems require the determination of (a) how many (or much)
to order (i.e., produce or purchase) and/or (b) when to order.


This section will introduce the kind of analysis that yields symbolic
models of inventory processes. The mathematical models and solutions
presented here pertain to specific inventory situations and progress from
the most elementary to somewhat complex ones. For a complete definition
and classification of the characteristics of inventory problems, see Ref. 1,
Chap. 7.
Decisions. The general class of inventory problems to be considered
involves decisions concerning inventory levels. These decisions can be
classified as follows: (1) The time at which orders for goods are to be placed
is fixed. The quantity to be ordered must be determined. (2) Both the
order quantity and order time must be determined.
Cost. The costs associated with inventory are of three types: (1) setup
cost, the fixed cost per lot of obtaining goods (purchasing or manufacturing); (2) inventory holding cost, including cost of money spent in obtaining
the part, storage, obsolescence, handling, taxes, and insurance; (3) shortage
cost, cost resulting from a delay in supplying the goods or an inability to
fill the order at the time of request.
Variables. The three major classes of variables in an inventory problem
are: (1) cost variables; (2) demand variables, i.e., relative to customer demand for goods; (3) order variables, i.e., relative to obtaining the necessary
goods.
Elementary Inventory Models (see Ref. 1, Chap. 8)

Symbols. The following symbols are used throughout the discussion of
the elementary inventory models.

q        input, or quantity ordered
q_i      input which occurs at the beginning of the ith time interval
q*       optimum order quantity
r        requirements per time interval
r_i      requirements for the ith time interval
S        inventory level
S_i      inventory level at beginning of ith interval
s_i      inventory level at end of ith interval. Note. s_i = S_i - r_i, and S_i = s_{i-1} + q_i
S*       optimum inventory level at the beginning of a time interval
t        an interval of time
t_s      interval between placing orders, in units of time
t_s*     optimum interval between placing orders
T        period for which a policy is being established
R        total requirement for period T
C₁       holding cost per unit of goods for a unit of time
C₂       shortage cost per unit of goods for a specified period
C_s      setup cost per production run
TEC      total expected relevant cost
TEC*     minimum (optimum) total expected relevant cost
P(r)     probability of requiring r units, where r is a discrete variable
f(r)     probability density function of r, where r is a continuous variable
P(r ≤ S) probability of requiring S units or less, where r is a discrete variable
F(r)     cumulative probability function of r, where r is a continuous variable
F(S)     = ∫₀^S f(r) dr, probability of requiring S or less units, where r is a continuous variable

Model I. (See Fig. 3.) Given: (a) Demand is fixed and known. (b)
Withdrawals from stock are continuous and at a constant rate. (c) No
shortages are permitted.
The variable costs are C₁ and C_s (see Symbols above).

FIG. 3. Inventory situation for Model I (Ref. 1).

Problem. To determine: (1) how often to make a production run; (2)
how many units should be made per run.
Cost Equation.

(22)    TEC(q) = ½C₁Tq + C_sR/q.

Solution.

(23)    q* = √(2C_sR/TC₁),

(24)    t_s* = √(2C_sT/RC₁),

(25)    TEC* = √(2RTC₁C_s).

Note that Model I is a special case of Model II, wherein C2 = 00. Accordingly, by letting C2 ~ 00 in eqs. (27-30), one readily obtains eqs. (2325).


Model II. (See Fig. 4.) Given: (a) Demand is known and fixed. (b)
Shortages are permitted.

FIG. 4. Inventory situation for Model II (Ref. 1).

Cost Equation.

(26)    TEC(q, S) = S²C₁T/(2q) + (q - S)²C₂T/(2q) + C_sR/q.

Solution.

(27)    q* = √(2C_sR/TC₁) √[(C₁ + C₂)/C₂],

(28)    S* = √(2C_sR/TC₁) √[C₂/(C₁ + C₂)],

(29)    t_s* = √(2C_sT/RC₁) √[(C₁ + C₂)/C₂],

(30)    TEC* = √(2RTC₁C_s) √[C₂/(C₁ + C₂)].

Model III. Given: (a) Estimated variable demands and inputs, (b) discrete units, (c) shortages permitted (finite cost of shortage), (d) discontinuous distribution over time of withdrawals and input at a discontinuous
rate, (e) known and constant reorder cycle time.
In this model and in Model VI, the cost of carrying an inventory of parts
until they are used is not taken into consideration. Rather, in this elementary inventory situation, the cost of having excess parts that are never used
is balanced against the cost of being short of parts when needed.
Problem. To determine how many units of a given part should be ordered
at the time of the initial purchase order. Here, one is balancing the cost of


having excess parts that are never used against the cost of being short of
parts when needed. No consideration is given to the cost of carrying the
inventory of parts until they are used.
Cost Equation.

(31)    TEC(S) = C₁ Σ_{r=0}^{S} P(r)(S - r) + C₂ Σ_{r=S+1}^{∞} P(r)(r - S).

Solution. The optimum value, S*, is given by that value of S which
satisfies the inequalities

(32)    P(r ≤ S - 1) < C₂/(C₁ + C₂) < P(r ≤ S).

For further discussion, derivation of this solution, and an example of its
use, see Ref. 1, Chap. 8.
Model IV. Given: (a) Estimated variable demand and inputs; (b) continuous (rather than discrete) units; (c) shortages permitted, i.e., finite
cost of shortage; (d) continuous distribution over time of withdrawals and
input at a continuous rate; (e) known and constant reorder cycle time; (f)
negative orders, i.e., returns, not considered.
Problem. To determine the initial order quantity, where one balances
the holding cost against the shortage cost.
Cost Equation.

(33)    TEC(S) = C₁ ∫₀^S (S - r)f(r) dr + C₂ ∫_S^∞ (r - S)f(r) dr.

Solution. (See Ref. 1, Chap. 8.) The total expected cost is minimum for
that value S which satisfies the condition

(34)    F(S) ≡ ∫₀^S f(r) dr = C₂/(C₁ + C₂).
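A minimal sketch of the rule of eq. (34). The exponential demand density assumed here is purely illustrative, chosen because its cumulative distribution can be inverted in closed form:

    import math

    # Eq. (34): choose S so that F(S) = C2/(C1 + C2).  With the assumed
    # density f(r) = (1/m) e^(-r/m), F(S) = 1 - e^(-S/m), so S solves directly.
    def model_IV_S(C1, C2, mean_demand):
        fractile = C2 / (C1 + C2)
        return -mean_demand * math.log(1.0 - fractile)

    print(model_IV_S(C1=1.0, C2=9.0, mean_demand=100.0))   # about 230.3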

Model V. Given: Conditions of Model IV plus a significant reorder
lead time, i.e., one must take into account the lapse of time between the
placing of an order and the receipt of the goods.
Problem. To determine how much (many) should be ordered for the kth
day hence (where the reorder lead time is k days).
Cost Equation.
Let

k        = number of days in the order lead time,
S₀       = the stock level at the end of the period preceding the
           placing of the order,

q₁, q₂, ..., q_{k-1} = quantities already ordered and due to arrive on the
           1st, 2nd, ..., (k - 1)st days hence,
q_k      = quantity to be ordered for delivery k days hence,
R'       = Σ_{i=1}^{k} r_i, the total requirement over the order lead time,
S'       = total of amount available in stock at end of previous
           period and amounts ordered over the present k-day
           period; i.e.,

S' = S₀ + Σ_{i=1}^{k-1} q_i + q_k.

The problem is to determine the value of q_k which will minimize the total
expected cost over the lead time period, i.e., k days. However, since orders
in the amounts q₁, q₂, ..., q_{k-1} have already been placed, the total expected
cost for the first k - 1 days has already been determined and is no longer
subject to control. Hence, equivalently, the problem is one of determining
the value of q_k which will minimize the total expected cost for the kth day
only.
Solution. The stock at the end of the kth period can be expressed as

(35)    S_k = S₀ + Σ_{i=1}^{k-1} q_i + q_k - Σ_{i=1}^{k} r_i.

Then, since

(36)    S' = S₀ + Σ_{i=1}^{k-1} q_i + q_k

and

(37)    S_k = S' - R',

the total expected cost for the kth day will be given by

(38)    TEC(S') = C₁ ∫₀^{S'} (S' - R')f(R') dR' + C₂ ∫_{S'}^∞ (R' - S')f(R') dR'.
Equation (38) is equivalent to eq. (33); therefore the optimum value of
S' is given by (see eq. 34)

(39)    F(S') ≡ ∫₀^{S'} f(R') dR' = C₂/(C₁ + C₂).

Once having determined the optimum value of S', namely S'*, q_k* can be
determined from eq. (36), i.e.,

(40)    q_k* = S'* - (S₀ + Σ_{i=1}^{k-1} q_i).


See Ref. 1, Chap. 8, for further discussion and an example of the use of
this solution.
Model VI. (See Fig. 5.) Given: Conditions of Model III except that
withdrawals from stock are continuous and at a constant rate.

FIG. 5. Illustration for Model VI (Ref. 1).

Problem. To determine how many parts should be ordered at the time of
the initial purchase order.
Cost Equation. (a) For r ≤ S.
For a given value of r, the average number of units in stock over the
order cycle period is given by

(41)    ½[S + (S - r)] = S - r/2.

The expected cost, for a specific value of r (r ≤ S), will then be

(42)    C₁P(r)(S - r/2).

Therefore, the total expected cost for all r ≤ S will be

(43)    C₁ Σ_{r=0}^{S} P(r)(S - r/2).

Cost Equation. (b) For r > S.
Here, as seen from Fig. 5, there will be no shortages t₁/(t₁ + t₂) part of
the time, while shortages will occur t₂/(t₁ + t₂) part of the time. Now

(44)    t₁/(t₁ + t₂) = S/r    and    t₂/(t₁ + t₂) = (r - S)/r.

Furthermore, the average amount stocked is ½S, and the average amount
short is ½(r - S). Therefore, the holding cost for each value of r over
the period S/r is given by

(45)    C₁(S/2)(S/r) = C₁S²/(2r),


while the shortage cost for each r, over the period (r - S)/r, is given by

(46)    C₂[(r - S)/2][(r - S)/r] = C₂(r - S)²/(2r).

Therefore, the total expected cost will be given by

(47)    TEC(S) = C₁ Σ_{r=0}^{S} P(r)(S - r/2) + C₁ Σ_{r=S+1}^{∞} P(r)S²/(2r)
                  + C₂ Σ_{r=S+1}^{∞} P(r)(r - S)²/(2r).

Solution. The optimum value of S is that which satisfies the condition

(48)    P[r ≤ (S - 1)] + (S - ½) Σ_{r=S}^{∞} P(r)/r
            < C₂/(C₁ + C₂)
            < P(r ≤ S) + (S + ½) Σ_{r=S+1}^{∞} P(r)/r.

See Ref. 1, Chap. 8, for further discussion, an example of the use of this
solution, and a case study employing this model.
Inventory Models with Price Breaks

In this section, decision rules are given for the optimum lot size (or optimum purchase quantity) as derived for a class of inventory problems in
which the unit manufacturing (or purchase) cost is variable, that is, subject to quantity discounts or price breaks. Specifically, this section will
generalize on Model I (see Elementary Inventory Models), which describes
a system in which demand is fixed and known, withdrawals from stock are
continuous and at a constant rate, and no shortages are permitted. (See
Fig. 3.)
Symbols. The following symbols are used:

k_i     cost per unit of manufacturing or purchasing for range i
P       monthly holding cost expressed as a decimal fraction of the value of the unit
C_s     setup cost per production run or, for purchased parts, the setup cost
        associated with the procurement of the purchased items
TEK     total expected cost
TEK*    minimum (optimum) total expected cost

As before,

T       the period of time for which the decision rules are being determined
R       total requirement during period T
t_s     interval between placing orders
q       input, or quantity ordered
q*      optimum order quantity, i.e., economic lot size or economic purchase quantity


Finally, let the price break situation be described by the following:

Range       Quantity            Unit Purchase Price
R₁          1 ≤ q₁ < b₁             k₁
R₂          b₁ ≤ q₂ < b₂            k₂
...         ...                     ...
R_n         b_{n-1} ≤ q_n           k_n

where b_j (j = 1, 2, ..., n - 1) are those quantities which determine the
price breaks.
price breaks.
Problem. The problem can be stated as one of determining: (1) how
often should parts be purchased; (2) how many units should be purchased
at any one time.
Basic Cost Equation. The basic cost equation for the period T for any
one value of the unit purchase cost k_i is given by

(49)    TEK = C_sR/q + k_iR + ½C_sTP + ½k_iTPq,

while the basic solution is given by

(50)    q* = √(2C_sR/k_iTP)

and

(51)    TEK* = k_iR + ½C_sTP + √(2C_sRk_iTP).
Solution. Decision Rules. (See Ref. 1, Chap. 9.)

One Price Break.
1. Compute q₂* from eq. (50), by using k₂. If q₂* ≥ b₁, then the optimum purchase quantity is q₂*, that is, q* = q₂*.
2. If q₂* < b₁, compute TEK*(k₁) from eq. (51) [or, equivalently,
TEK(q₁*) from eq. (49)] and compare this with TEK(b₁) as given by
eq. (49).
If TEK(q₁*) < TEK(b₁), then q* = q₁*.
If TEK(q₁*) > TEK(b₁), then q* = b₁.
Two Price Breaks.
1. Compute q₃*. If q₃* ≥ b₂, then q* = q₃*.
2. If q₃* < b₂, compute q₂*. If q₃* < b₂ and b₁ ≤ q₂* < b₂, proceed as
in the case of one price break, i.e., compare TEK*(k₂) with TEK(b₂) to
determine the optimum purchase quantity.
3. If q₃* < b₂ and q₂* < b₁, compute TEK*(k₁) and compare it with
TEK(b₁) and TEK(b₂) to determine the optimum purchase quantity.


(n - 1) Price Breaks.
1. Compute q_n*. If q_n* ≥ b_{n-1}, then q* = q_n*.
2. If q_n* < b_{n-1}, compute q_{n-1}*. If q_{n-1}* ≥ b_{n-2}, i.e., b_{n-2} ≤ q_{n-1}*
< b_{n-1}, proceed as for one price break, i.e., compare TEK*(k_{n-1}) with
TEK(b_{n-1}) to determine q*.
3. If q_{n-1}* < b_{n-2}, compute q_{n-2}*. If q_{n-2}* ≥ b_{n-3}, proceed as for
two price breaks, i.e., compare TEK*(k_{n-2}) with TEK(b_{n-2}) and
TEK(b_{n-1}) to determine q*.
4. If q_{n-2}* < b_{n-3}, compute q_{n-3}*. If q_{n-3}* ≥ b_{n-4}, compare
TEK*(k_{n-3}) with TEK(b_{n-3}), TEK(b_{n-2}), and TEK(b_{n-1}).
5. Continue in this manner until q_{n-j}* ≥ b_{n-j-1} (0 ≤ j ≤ n - 1),
and then compare TEK*(k_{n-j}) with TEK(b_{n-j}), TEK(b_{n-j+1}), ...,
TEK(b_{n-1}) to determine the economic purchase quantity q*. Note. Define b₀ = 1 for this step.
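A minimal sketch of these decision rules; the price schedule and cost figures in the usage line are illustrative assumptions:

    import math

    def TEK(q, k, Cs, R, T, P):                       # eq. (49)
        return Cs * R / q + k * R + 0.5 * Cs * T * P + 0.5 * k * T * P * q

    # ks lists the unit prices k1, ..., kn; bs the break quantities b1, ..., b_{n-1}.
    def best_quantity(ks, bs, Cs, R, T, P):
        n = len(ks)
        lower = [1] + list(bs)                        # lower end of each range (b0 = 1)
        for i in range(n - 1, -1, -1):                # start from the cheapest range
            q = math.sqrt(2.0 * Cs * R / (ks[i] * T * P))   # eq. (50)
            if q >= lower[i]:                         # q* is attainable in range i
                candidates = [(TEK(q, ks[i], Cs, R, T, P), q)]
                # compare with buying exactly at each higher break quantity
                candidates += [(TEK(lower[j], ks[j], Cs, R, T, P), lower[j])
                               for j in range(i + 1, n)]
                return min(candidates)[1]
        return lower[0]

    # One price break at b1 = 500: prices 5.00 below, 4.50 at or above.
    print(best_quantity([5.00, 4.50], [500], Cs=25.0, R=1000, T=12.0, P=0.01))
    # prints 500: the break quantity wins even though q2* < b1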
Inventory Models with Restrictions

In some inventory situations it is necessary to consider restrictions on
production facilities, storage space, time, or money. When such restrictions are introduced in situations involving more than one product, it is
necessary to allocate the limited available resources among the products.
Models have been developed which enable one to determine how much of
each item to produce (or purchase) under the specified restrictions. Such
models are developed and solved in Ref. 1, Chap. 10. A brief description
of the approach to the solution of such models is given in Sect. 2, Modified
Lagrangian Multipliers. See also Refs. 14-16.
Other Inventory Models

Arrow, Harris, and Marschak (Ref. 17), Eisenhart (Ref. 18), Tompkins
(Ref. 19), and others have treated the problem of determining the optimum
buffer stock needed to protect against shortages, where demand is uncertain. Whitin (Ref. 20) has investigated the interaction between buffer
stocks and lot sizes. Dvoretzky, Kiefer, and Wolfowitz (Refs. 21 and 22)
have shown the conditions under which optimum inventory levels can be
found.
Multistorage Points. Berman and Clark (Ref. 23) have developed
and solved specific models for systems in which a central warehouse supplies a number of field warehouses which, in turn, supply distributors.
Dynamic Models. The dynamic problem of inventory is one in which
consideration must be given to the effect of a decision in the current period
on subsequent periods.
A servomechanism approach to the dynamic inventory problem which
utilizes feedback rules to adjust production to sales has been developed


and applied at Carnegie Institute of Technology (Refs. 8 and 24) for situations of uncertain demand. This procedure applies Norbert Wiener's autocorrelation methods (see Chap. 17). A related method has been developed
by Vassian (Ref. 26).
A number of persons have developed approaches with linear programming techniques. Such linear programming models are designed primarily
for situations with important seasonal fluctuations in demands. Charnes,
Cooper, and Farr (Ref. 27) have treated this case while further assuming
that demand is known. See also Dannerstedt (Ref. 28).
Bellman (Refs. 29-32) has developed "dynamic programming" which
makes it possible to approach these problems through the calculus of variations. See also Bellman, Glicksberg, and Gross (Ref. 25). Holt, Modigliani, and Simon (Ref. 8) have developed "quadratic programming" and
applied it to setting overall production levels for cases in which the cost
functions are quadratic.
For excellent summaries of the great amount of pertinent research and
application in the inventory area, see Whitin (Refs. 33 and 34) and Simon
and Holt (Ref. 35). See also Ref. 1, Chaps. 8-10.
4. ALLOCATION MODELS

Types of Problems. Allocation models are used to solve a class of problems which arise when (a) a number of activities are to be performed and
there are alternative ways of doing them, and (b) resources or facilities are
not available for performing each activity in the most effective way. The
problem is to combine activities and resources in such a way as to
maximize overall effectiveness. These problems are divisible into two
types:
1. An amount of work to be done is specified. Certain resources are
available; i.e., a fixed capacity and/or material for doing the job is available and, hence, constitutes a restriction or limitation. The problem is to
use these limited facilities and/or materials to accomplish the required work
in the most economical manner.
2. The facilities and/or materials which are to be used are considered to
be fixed. The problem is to determine what work, if performed, will yield
the maximum return on use of the facilities and/or materials.
Linear Programming. Generally speaking, linear programming techniques can be used to solve a special class of allocation problems for which
the following conditions are satisfied:
1. There must exist an objective, such as profit, cost, or quantities, which
is to be optimized and which can be expressed as, or represented by, a
linear function.
2. There must be restrictions on the amount or extent of attainment of
the objective, and these restrictions must be expressible as, or representable by, a system of linear equalities or inequalities.
The general linear programming problem may be expressed mathematically as follows:
PROBLEM-STATEMENT I. Find the values of X₁, X₂, X₃, ..., X_n which
maximize (minimize)

(52)    Z = C₁X₁ + C₂X₂ + ... + C_nX_n,

subject to the conditions that

(53)    X_j ≥ 0,    j = 1, 2, ..., n,

and

(54)    a_{i1}X₁ + a_{i2}X₂ + ... + a_{in}X_n = b_i,    i = 1, 2, ..., m,

where a_{ij}, b_i, and C_j are given constants (i = 1, 2, ..., m; j = 1, 2, ..., n).
PROBLEM-STATEMENT II. Given the column vectors from eq. (54),

(55)    P_j = (a_{1j}, a_{2j}, ..., a_{mj})',    j = 1, 2, ..., n,
        P₀ = (b₁, b₂, ..., b_m)',

the problem can also be stated as follows: Determine non-negative values of
X₁, X₂, ..., X_n which maximize (minimize) the linear functional

(52a)    Z = X₁C₁ + X₂C₂ + ... + X_nC_n

subject to

Σ_{j=1}^{n} X_jP_j = P₀.


Solution of Linear Programming Problems

Among the several techniques which can be used to solve linear programming problems, the most important ones are the simplex technique and the
transportation technique. There is also a special linear programming problem, called the assignment problem, for which special techniques greatly reduce the tremendous amount of computation that would otherwise follow
from the use of either the transportation or simplex techniques. The assignment problem is discussed in Ref. 1, Chap. 12.
Solution of Linear Programming Problems by the
Simplex Technique

The solution of linear programming problems by the simplex technique
may best be illustrated by the solution of a specific problem. The problem, simplified for purposes of illustration, may be stated as follows.
PROBLEM. A manufacturer wishes to maximize the profits associated
with producing two products, R and S. Products R and S are manufactured by a two-stage process in which all initial operations are performed
in machine center I and all final operations may be performed in either
machine center IIA or in machine center IIB. Machine centers IIA and
IIB are different from each other in the sense that, in general, for any given
product they yield different unit rates and different unit profits. In addition, a certain amount of overtime has been made available in machine
center IIA for the manufacture of products R and S. Since the use of overtime results in changes (decreases) in unit profits (but not in unit rates),
let us denote separately, by machine center IIAA, any overtime use of
machine center IIA.
The unit times required to manufacture products R and S, the hours
available in each machine center, and the unit profits are given in Table 4.
In this table, R₁, R₂, and R₃ denote the three possible combinations for
producing R, and similarly, S₁, S₂, and S₃ are defined for product S.
TABLE 4. UNIT TIMES REQUIRED TO MANUFACTURE PRODUCTS R AND S

                              Product R              Product S           Hours
 Operation  Machine      R₁     R₂     R₃       S₁     S₂     S₃       Available
            Center
     1         I         0.01   0.01   0.01     0.03   0.03   0.03        850
     2         IIA       0.02                   0.05                      700
               IIAA             0.02                   0.05               100
               IIB                     0.03                   0.08        900

 Profit per part
 (in dollars)            0.40   0.28   0.32     0.72   0.64   0.60


The problem is to determine how much of each product should be made
through the use of each possible combination of machine centers so as to
maximize the total profits, keeping in mind the prescribed limitations
on the capacities of the machine centers. The assumption here is that one
can sell all that one can produce. This is a simplification which may be
removed very easily by imposing additional restrictions in the form of
maximum permissible quantities of each product. (See Ref. 36.)
Simplex Solution. The simplex technique is a procedure which,
through a series of repetitive arithmetic operations, progressively approaches, and ultimately reaches, an optimum solution. The procedure
may be summarized briefly as follows:
1. The problem is first set up in mathematical form in which all relevant
initial relationships and restrictions are stated.
2. The problem is then set up in tabular form.
3. An initial (feasible) solution is determined.
4. Alternative changes to this solution are evaluated.
5. A new solution is determined by introducing the "most favorable"
alternative change.
6. Steps 4 and 5 are repeated to derive successively better solutions.
7. When, at any stage, step 4 evaluates no alternative choice favorably,
the procedure is complete and gives an optimal solution.
More explicitly, the simplex technique is carried out as indicated in the
following steps:
Step 1. Rephrase the problem in mathematical form. Let X₁, X₂, X₃,
X₄, X₅, X₆ denote the amounts to be made of products R₁, R₂, R₃, S₁, S₂,
S₃, respectively. Then the total profit Z will be given by

(56)    Z = 0.40X₁ + 0.28X₂ + 0.32X₃ + 0.72X₄ + 0.64X₅ + 0.60X₆.

Furthermore, the restrictions of the problem will be given by

(57)    0.01X₁ + 0.01X₂ + 0.01X₃ + 0.03X₄ + 0.03X₅ + 0.03X₆ ≤ 850,
        0.02X₁ + 0.05X₄ ≤ 700,
        0.02X₂ + 0.05X₅ ≤ 100,
        0.03X₃ + 0.08X₆ ≤ 900.

Therefore, the problem may now be restated as follows: Determine the values of X_j ≥ 0 (where j = 1, 2, ..., 6) which maximize eq. (56) subject to
the restrictions of eqs. (57).
The restrictions X_j ≥ 0, j = 1, 2, ..., 6, arise from the fact that, since
the manufacturing process is irreversible, one must preclude the appearance of negative values for these variables.


Step 2. Reduce the system of inequations (i.e., the restrictions) to an
equivalent system of equations by introducing new non-negative variables
X₇, X₈, X₉, X₁₀. These new variables, X₇, X₈, X₉, and X₁₀, are variously
called "disposal activities," "pseudo variables," or "slack variables." In
this problem, it can be seen that positive values of these slack variables
represent underutilization of capacity in machine centers I, IIA, IIAA, and
IIB respectively. The introduction of these slack variables results in the
system of equations:

(58)    0.01X₁ + 0.01X₂ + 0.01X₃ + 0.03X₄ + 0.03X₅ + 0.03X₆ + X₇ = 850,
        0.02X₁ + 0.05X₄ + X₈ = 700,
        0.02X₂ + 0.05X₅ + X₉ = 100,
        0.03X₃ + 0.08X₆ + X₁₀ = 900.

Step 3. Complete the transformation of the given set of eqs. (56) and
(58) into the standard form used in the simplex technique by making the
following set of transformations. Rearrange eqs. (58) so that corresponding X_j's appear in the same column. Then let the symbol P_j denote the
column of coefficients of X_j (j = 1, 2, ..., 10), and P₀ denote the right-hand column of numbers in the system of eqs. (58).
Assuming a zero profit or cost associated with each slack variable X₇,
X₈, X₉, X₁₀, the linear programming example may now be restated as follows: Determine the values of a set of non-negative X_j (where j = 1, 2,
..., 10) which maximize the linear form (functional)

(56a)    Z = 0.40X₁ + 0.28X₂ + 0.32X₃ + 0.72X₄ + 0.64X₅
               + 0.60X₆ + 0·X₇ + 0·X₈ + 0·X₉ + 0·X₁₀

subject to the restrictions

(58a)    Σ_{j=1}^{10} X_jP_j = P₀.

Step 4. Exhibit the column vectors P_j in a systematic, i.e., tabular, form.
This is done in Table 5 by means of eqs. (58), all blank spaces in the table
representing zeros.
It should be noted that eqs. (58) can be generated simply by multiplying
each coefficient in any P_j column by the corresponding X_j and then reading
across the rows. (The vertical line shows where to place the equal
signs.)
The square submatrix formed by {P₇, P₈, P₉, P₁₀}, which consists of
elements that are equal to 1 on the main diagonal and that are everywhere
else equal to zero, is of special importance. This matrix is called the unit
TABLE 5. COLUMN VECTORS FOR SIMPLEX SOLUTION

  P₁     P₂     P₃     P₄     P₅     P₆     P₇   P₈   P₉   P₁₀ |  P₀
 0.01   0.01   0.01   0.03   0.03   0.03    1                  |  850
 0.02                 0.05                       1             |  700
        0.02                 0.05                     1        |  100
               0.03                 0.08                   1   |  900

or identity matrix. The set of vectors which form the identity matrix are,
in turn, said to be a unit basis of the particular space of interest, which is,
in this problem, a four-dimensional space. The basis vectors are linearly
independent vectors in terms of which every point in the n-dimensional
(here, n = 4) space may be uniquely expressed and in terms of which a
solution (or solutions) will be stated.
Step 5. The columns of Table 5 are now rearranged as shown in Table 6a.
Then, a column labeled "Basis" is inserted to the left of the P₀ column and,
in this column, the basis vectors are listed. For this example, the slack
vectors form the unit basis. In some problems for which some of the restrictions are stated either in terms of equalities or in terms of inequalities
which impose minimum limits, so-called artificial vectors will have to be
introduced in order to form a unit basis (see Ref. 36). It should be noted
also that structural vectors may be such that they may be included in the
unit basis.
Next, a row of C_j's is added, where the C_j's are defined as the coefficients
of the corresponding X_j's in the expression for Z given in eq. (56). Then,
a column of C_i's is added, these corresponding to the C_j's, but having the
subscript i to denote the row, rather than the subscript j, which is used to
denote the column. The expression for Z can now be written as
(59)    Z = Σ_{j=1}^{10} C_jX_j.

Step 6. Next, add a row of numbers labeled Z_j, where j denotes the appropriate column. Letting X_{ij} denote the element in the ith row and jth
column of the table, the Z_j's (including Z₀) are defined by

(60)    Z_j = Σ_i C_iX_{ij}.

TABLE

6.

SIMPLEX METHOD

(Ref. 1)

(a) First Feasible Solution
Ci
~
J

Basis

Po

P7

Ps

Pg

PIO

0.40

0.28

0.32

0.72

0.64

PI

P2

P3

P4

Ps

0.60

P6
- ......

-

P7

850

Ps
Pg

700

PIO

0.01

1

-

1

100

.

0.01

0.02

-

0.03

0.03

_10.051

0.03

o
"'tJ

-

0.05

0.02

1

m

::0

~

0.08

0.03

1

900

0.01

(5

Zj

Z

Ul

. Zj - Cj

-0.40

-0.28

-0.32

-0.72*

-0.64

-0.60

::0

m
m

Ul

»

::0
()

(b) Second Feasible Solution

:I:
I

0.72

~

f-

P7

430

P4

14,000

Pg

100

PIO

900

1

-0.002

-0.6

0.01

1

0.4

20

0.03

0.01

10.05\

0.02

1

0.08

0.03

1

Zj

10,080

14.4

0.288

Zj - Cj

10,080

14.4

-0.112

0.03

0.72
-0.28

-0.32

-0.64*

-0.60

01

W
~

TABLE

6.

SIMPLEX METHOD

(Ref.- I)-Continued

01

(c) Third Feasible Solution
Ci
~
Basis

-+

Po

0.72

P1
P4

14,000

0.64

Ps

2,000

-

P10

370

P1

Ps

P9

1

-0.6

-0.6

P10

20

..,-.

W

-

-,

0.28

0.32

0.72

0.64

PI

P2

Pa

P,

Ps

-0.002

-0.002

20

(X)

~

0.40

0.01

0.60~

- --p&---

0.03

0.4

1
0.4

1

900

-

o
""'C

1
0.03

m

10.081

:::c

>
--I

0Z

Zj - Cj

11,360

14.4

12.8

-0.112
- - -

-0.024

----~-

-0.32
---

-

-0.60*

:::c
m
m

---

-

(J)

(J)

>
:::c

(d) Fourth Feasible Solution

(")

:::I:

P7

-+

"-0.72

P4

14,000

0.64

Ps

2.000

0.60

1

32.5

P6

11.250

Zj - Cj

18.110

~-'---------

-0.6

-0.6

3

-8

-0.002

-0.002

[QJJ

20

1

3

12!

14.4

12.8

1

0.4

20

---

-1
800

7!

8

-0.112* -0.024

1

-0.095
-

(e) F1Jih Feasible Solution
Ci
~
J

Basis

P7

Po

102.5

0.40

PI

35,000

0.64

Po

2,000

.... 0.60

P6

11 ,250

~

Zj - Cj

P7

Ps

1

-'2

1

Pg

PIO

-0.6

-i

0.40

0.28

0.32

0.72

0.64

0.60

PI

P2

P3

p.

P6

Ps

-0.002

-m

0.005

t

1

50
20

rn

-4-2

22,030

20

12.8

1

0.4

-

-0.024

7t

-0.095*

1

m

:;:c

~

(5
Z
en

0.28
--

----

o-c

:;:c

m

en
m

»
:;:c

(f) Sixth Feasible Solution

n

:::c

P7

140

·0.40

PI

35,000

0.64

Po

2,000

r~

0.32

P3

30,000

Zj - Cj

24,880

1

-v1

-0.6

-!

t

1

50

@]]

20

,.100

20
-

12.8

¥

-do

0.005

-0.002

1

t

1

-0.024*
--

0.28

0.2st

01

W
-0

111

1o

TABLE

6.

SIMPLEX METHOD

(Ref. I)-Continued

(g) Maximum Feasible Solution
Ci
~
J

Basis

Po

P7

150
35,000

0.28

PI
P2

0.32

Pa

30,000

Zj - Cj

25,000

0040

P7

Pg

1

-2"

1

P9

PIO

1

-a1

-2"

50

5,000

0040

0.28

0.32

0.72

0.64

0.60

PI

P2

Pa

P4

P5

P6

1

mro

1

21fO

m

;:0

»-I
5
z
en

1

i

1

50

200

o
."

;:0

m
m

en

5

1

»

'2

;:0

100
-3-

20

14

loj

()

8

1

3

0.28

0.06

:J:

0.25i
I

OPERATIONS RESEARCH

15-41

Step 7. A row labeled Zj - Cj is entered into the table and for any
column, say jo, consists of the corresponding CjO subtracted from the value
of ZjO which was entered in the previous row.
Steps 1 through 7 complete the first phase of the simplex technique calculations and result in what is known as a feasible solution to the problem,
namely a solution which satisfies all the restrictions but which does not
necessarily yield the optimum result. This feasible solution is given by the
column vector Po (Table 6a) in terms of the basis vectors P 7 , P s , P g , P lO ,
namely,
(61)

X 7 = 850;

Xs = 700;

Xg = 100;

X 10 = 900.

That is, the initial feasible program consists of "Do not use any of the time
available in any of the machine centers; i.e., do nothing," thus resulting in
a net profit of Z = O.
Optimum Solution Criteria. Having obtained a feasible solution,
one can proceed to the optimum solution by considering the following mutually exclusive and collectively exhaustive possibilities:
Ml. Maximum Z = 00 (i.e., maximum Z is infinitely large) and has been
obtained by means of the present program.
M2. IVlaximum Z is finite and has been obtained by means of the present program.
M3. An optimum program has not yet been achieved and a higher value
of Z may be possible.
The simplex technique is such that possibilities M1 or M2 must be
reached in a finite number of steps. Furthermore, if one remembers that
X ij denotes the element in the ith row and jth column of the table, the
technique is such that, for a given tableau (i.e., table or matrix):
Cl. If there exist any Zj - Cj < 0, either Ml or M3 holds: (a) if all
Xij ~ 0 in that column (for which Zj - Cj < 0), then Ml is true; (b) if
some Xij > 0, further calculations are required, i.e., M3 holds.
C2. If all Zj - Cj ~ 0, a maximal Z has been obtained (M2).
Iterative Procedure to an Optimum Solution. In the example
(Table 6a), Zl - C1 < 0 (as are Z2 - C2 through Z6 - C6) and, furthermore, some of the coefficients under PI are greater than zero. Hence, by
condition C1b, further calculations are required (i.e., condition M3
holds).
To discover new solutions, it is possible to proceed in a purely systematic
fashion by the simplex technique. Furthermore, any new solution so obtained will never decrease the value of the objective functional (although
an increase need not occur), and, as stated earlier, the optimal solution, if
one exists, must be reached in a finite number of steps. Hence, the simplex
technique is a converging iterative procedure.

15-42

OPERATIONS RESEARCH

Step 8. Of all the Zj - Cj < 0, choose the most negative. (In the particular example, this is Z4 - C4 = -0.72 and is so indicated by an asterisk
in Table 6a.) This determines a particular P j (namely P 4) which will be
introduced into the column labeled "Basis" in Table 6b.
Step 9. Determine the vector which this P j will replace by dividing all
the positive Xij appearing in the Pj column into the corresponding X iO
which appears in the same row under Po. (Since all the components of Po
must be non-negative, all these ratios must, in turn, be non-negative.) The
smallest of these ratios then determines the vector to be replaced. In the
present example, P 4 i's to replace one of the vectors P 7 , P s , P g , or P IO '
Under P 4 , there are two positive Xij, namely X 7 ,4 = 0.03 and X S ,4 = 0.05.
The division of these Xii's into the corresponding XiO'S which appear under
Po gives a minimum of 14,000 (Le., 700/0.05). Thus, P g is the vector to
be replaced by P 4, so that a new basis is formed consisting of the vectors
P 7 , P 4 , P g, and P IO (see Table 6b).
Step 10. Let subscript k denote "coming in," subscript r denote "going
out," X'ij denote the elements of the new matrix, and
(62)

. X iO
cp=mm-.
i
X ik

[i.e., cp is the minimum of all ratios (XiO/X ik ) for Xik > 0]. The elements
of the new matrix (X'ij) are calculated as follows. The elements, X' kj, of
the row corresponding to the vector just entered into the unit basis are
calculated by

X rj

X'kj = - .

(63)

X rk

The other elements (X'ij) of the new matrix are calculated by
(64)

Xrj) X Ok
X' tJ = X tJ.. - ( -X
rk' t ,
0

0

where eq. (64) also applies to the XiO'S appearing under Po and to the Zj
- Cj in the entire bottom row (but not to the Z/s in the second to the
last row).
The new value of the profit function will be given by
(65)

or, since Co = 0, the profit function will be given by
(66)

OPERATIONS RESEARCH

15-43

For example, starting with Table 6a and proceeding to Table 6b, the
most negative Zi - Ci is Z4 - C 4 = -0.72. Therefore Ie = 4. Hence,
from eq. (62),
¢

O for all X i4 > 0,
= min X'
_t
i

X i4

i.e.,
850

)

700

= min ( = 28,333; = 14,000 = 14,000.
0.03
0.05

¢

Therefore, P4 will replace P s ; or, in our notation, Ie = 4, r = 8.
The elements in the P 4 row of Table 6b are then computed by eq. (63)

XSj _ (XSi)
•

_
X I 4i -X S4

0.05

Therefore,

X /40 =

XSO) =
(-0.05

X /41 =

XSl)
(-0.05

(700)
= 14,000,
0.05

= (0.02)
= 0.4, etc.
0,05

For the elements of the other rows, where Ie = 4, r = 8 are substituted
into eq. (64),
X' tJ.. = X tJ.. -

-0.05 X
(-XXSi) X'4 = X .. - (XSi)
tJ

t

S4

t

'4.

Therefore,
X' 70

= X 70

-

XSO) (X
(0.05
-

74 )

= 850 - (700)
(0.03)
0.05

= 850 - (14,000)(0.03) = 850 - 420 = 430.
and

= (-0.40) - ( -0.02) (-0.72)
0.05
= (-0.4) - (0.4)(-0.72) = -0.4

+ 0.288 =

-0.112, etc.

OPERATIONS RESEARCH

15-44

Finally, the new value of the profit functional will be given (see Table 6b)
by
(Zo - Co)' = (Zo - Co) - cP(Z4 - C4)

=

0 - 14,OOO( -0.72)

=

+ 10,080.

The results are shown in Table 6b.
Step 11. The process is then repeated until such time as either condition
Ml or condition M2 holds. For the present example, the solution is obtained after six iterations, i.e., six tableaux or matrices after the first (see
Tables 6a-g). The final tableau, Table 6g, yields the optimum solution.
(If any other optimum solutions existed, they would be indicated by Zj
- Cj = 0 for j's other than those appearing in the basis. Here, Zj - Cj
= 0 for j = 1,2,3, and 7 only. Hence no other optimum solutions exist.)
This optimum solution is also stated, both in terms of the number of parts
and hours required, in Tables 7 and 8.
TABLE 7.

OPTIl\WM PROGRAM (NUMBER

Total

35,000
5,000
30,000
R = 70,000
$25,000

Total profit
TABLE 8.
Machine
Center

1
2

I
IIA
IIAA
IIB

o parts

parts
parts
parts
parts

o parts
o parts
8=0

+

0=$25,000

OPTIMUM PROGRAM (HOURS)
Product 8

Product R
Operation

PARTS)
Product 8

Product R

R1 (Centers I-IIA)
R2 (I-IIAA)
R3 (I-IIB)

OF

R1

R2

R3

81 82 83

350
700

50

300

0
0

100

0

0

0
900

0

SurHours
Used

Hours
Avail.

plus
Hours

700
700
100
900

850
700
100
900

150
0
0
0

Thus, one readily sees that the optimum (most profitable) program under
the prescribed conditions consists of manufacturing 70,000 units of product R to the complete exclusion of product S. Furthermore, by e"q. (56)
and also by (Zo - Co) in the optimum tableau, the total profits will be

+ 0.28(5,000) + 0.32(30,000) + 0.72(0)
+ 0.64(0) + 0.60(0)

Z = 0.40(35,000)

= $25,000.

15-45

OPERATIONS RESEARCH

Alternate Step 8. One should note at this point that the improvement
from one tableau to the next is given by -cp(Zk - Ck), see eq. (65). Furthermore, in practice, one need not select the most negative number (Zj - Cj)
but, rather, that negative number which yields the greatest improvement. Thus,
in the example,
-CP(ZI - Cl )

-

-CP(Z2 - C2)

-

-cp(Za - C-q)

-

-CP(Z4 - C4)

-

-CP(Z5 - C5)

-

-cp(Z6 - C6) = -

COO)
COO)
COO)
COO)
COO)

(-0.40) = 14,000,
0.02

k= 1

(-0.28) = 1400
0.02
'
,

k= 2

(-0.32) = 9600
0.03
'
,

k=3

(-0.72) = 10,080,
0.05

(- 0.64) = 1 280
'
,
0.05

(900)
0.08 (-0.60) = 6,750,

k=4
k=5
k = 6.

Therefore, instead of introducing P 4 into the basis, a greater gain is achieved
at this step through the introduction of Pl. In this particular example,
following alternate Step 8 enables one to reach the optimum solution, Table
6g, in three less iterations.
Further Restrictions in Linear Progralllllling Problellls. Once
having established the solution to a given linear programming problem, one
may wish to consider (or evaluate) further restrictions on the variables.
Thus, by referring to the preceding example, these restrictions may be in
the form of: (1) minimum requirements for product S, (2) changes in the
amount of time available in the machine centers, (3) changes in the prices
of the various products, (4) changes in the unit production rates, e.g., due
to the "introduction" of new equipment.
The simplex technique is such that, in general, new optimum solutions
can easily be constructed in terms of such added restrictions by' making use
of the optimum solution to the original problem. For a full discussion of
this point, see Ref. 1, Chap. II.
Solution of Minilllization Problellls by the Silllplex Technique.

To solve minimization problems by the simplex technique one may, in
Step 8, select either (1) the most positive Zj - CJ", or alternately (2) the

OPERATIONS RESEARCH

15-46

most negative Cj - Zj, and then proceed as before to the solution of the
problem.
The Transportation Problem

A linear programming problem, for which a special technique has been
developed is the so-called transportation problem which may be stated as
follows.
PROBLEM. Determine Xij ~ 0 which minimize
m

(67)

Z

n

= 2: 2: CijXij,
i=lj=l

such that
(68)

n

2: Xij =

Ai

(i = 1, 2, "', m)

Bj

(j = 1, 2, "', n).

j=l

and
(69)

m

2: Xij =
i=l

The transportation problem is obviously a special case of the general
linear programming problem; hence, it can be solved by the simplex
.
technique.
However, a special solution technique, far simpler than the simplex
technique, has been developed for solving transportation problems and,
quite appropriately, it is called the transportation technique (Ref. 39).
The procedure in the transportation technique is outlined as follows:
1. The problem is set up in tabular form.
a. All requirements are explicitly stated.
b. All permissible slack in the system is explicitly stated.
c. All appropriate costs and/or revenues are determined.
d. An objective function is determined.
e. The computational framework is established.
2. An initial solution is determined.
a. The initial solution must be technically feasible, i.e., it must meet all
restrictions.
3. Alternative choices are evaluated.
a. Changes in the solution are made one at a time.
b. The evaluation is of the complete effect of each unit change.
4. The "most favorable" alternative is selected.
5. The number of units to be included in this change is determined.
a. Owing to the linear nature of the model, each unit contributes the
same cost or profit difference.

. 15-47

OPERATIONS RESEARCH

b. The limit on the number of units involved in the particular change is
technical feasibility (non-negativity requirements).
6. A new solution is determined.
a. The elements to change and the number of units to include have been
previously determined.
7. Steps 3 through 6 are repeated. The process is a converging iterative
one.
S. When Step 3 evaluates no alternative favorably, the procedure is complete and one has an optimal solution.
EXAMPLE. This example, taken from Ref. 37a, deals with the problem
of moving empty freight cars from three "excess" origins to five "deficiency" destinations in such a manner that; subject to the given restrictions, the total cost of the required movement will be a minimum. The
specific conditions of the problem and the unit (per freight car) shipping
costs are given in Tables 9 and 10.
Table 9 states that origins 8 1 , 8 2 , and 8 3 have surpluses of 9, 4, and S
empty freight cars, respectively, while destinations D 1 ; D2 , D3 , D4 , and D5
TABLE

9.

PHYSICAL PROGRAM REQUIREMENTS

~ Destina-

~

DI

D2

D3

D4

Do

Surpluses

Origins

81

S2
S3
Deficiencies

-- -- -X I2 X I3 X l4 X I5
-----X 2I X 22 X 23 X 24 X 25
-----X31 X 32 X33 X 34 X35

Xu

3

-----5

4

6

3

9
4
8
21

are in need of 3, 5, 4, 6, and 3 cars, respectively. For simplicity, it ha.s been
assumed that the problem is self-contained, i.e., that the number of excess
cars is equal to the number of deficiencies. Any transportation problem
ca.n be made self-contained through the introduction of dummy origins or
destinations.
Table 10 lists the unit costs Cij of sending an empty freight car from the
ith origin to the jth destination.

15-48 .

OPERATIONS RESEARCH
TABLE

I~

10.

UNIT SHIPPING COSTS

I

tians

D1

D3

D2

D4

D5

Origins

Cn

C12

C14

C13

C15

S1

-10
C21

-5

-20

C23

C22

-9
C24

-10

C25

S2

-2

C31

-10
C32

-8
C33

-6

-30

C34

C35

S3

-1

-20

-7

-10

-4

The solution to this problem by the transportation technique is obtained
as indicated in the following steps.
Step 1. Set up the tables listing the physical program requirements
(Table 9) and unit shipping costs (Table 10).
Step 2. Obtaining a First Feasible Solution. Write down an initial (feasible) solution, namely one which satisfies the movement requirements. (If
a feasible solution also minimizes the total cost, it is then called an optimum
feasible or, in this case, a minimal feasible solution). This can easily be done
by applying a technique which has been developed by Dantzig, Ref. 38, and
which Charnes and Cooper, Ref. 39, refer to as "the northwest corner rule,"
The northwest corner rule may be stated as follows:
1. Start in the upper left-hand corner of Table 9 (requirements) and
compare the amount available at 8 1 with the amount required at D 1 • (a)
If Dl < S1, i.e., if the amount needed at Dl is less than the number of units
available at 81, set X 11 equal to Dl and proceed to cell X 12 , i.e., proceed
horizontally. (b) If Dl = SI, set X 11 equal to Dl and proceed to cell X 22 ,
i.e., proceed diagonally. (c) If Dl > 81, set X 11 equal to SI and proceed
to X 21 , i.e., proceed vertically.
2. Continue in this manner, step by step, away from the upper left corner until, finally, a value is reached in the lower right corner. Thus, in the
present example (see Table 11), proceed as follows:
(a) Set X 11 equal to 3, namely, the smaller of the amount available at
8 1 (9) and that needed at Dl (3).

OPERATIONS RESEARCH
TABLE

I~
Hons

11.

15-49

FIRST FEASIBLE SOLUTION

Dl

D2

D3

D4

Do

Total
Surpluses

Origins

81

----

®

®

CD

9

------

®

82

CD
®

®

8

6

3

21

4

------

83

- -- - - -

Total deficiencies

5

3

4

(b) Proceed to X 12 (rule la). Compare the number of units still available at 8 1 (namely 6) with the amount required at D2 (5) and, accordingly,
let X 12 = 5.
(c) Proceed to X 13 (rule la), where there is but one unit left at 8 1 while
four units are required at D 3 . Thus set X 13 = 1.
(d) Then proceed to X 23 (rule lc). Here X 23 = 3.
(e) Continue and set X 24 = 1, X 34 = 5, and, finally, in the southeast
corner, set X35 = 3.
The feasible solution obtained by this northwest corner rule is shown in
Table 11 by the circled values of the Xij. That this set of values is a feasible solution is easily verified by checking the respective row and column
requirements. The corresponding total cost of this solution is obtained by
multiplying each circled X ij in Table 11 by its corresponding Cij in Table
10 and summing the products. For any cell in which no circled number
appears, the corresponding X ij is equal to zero. That is, the total cost is
given by:
5

(70)

Total cost =

3

3

5

L: L: CijXi.i = L: L: CijXij.
j=1 i=1

i=1 j=1

The total cost associated with the first feasible solution is computed as
follows:
T.C. = X l1 Cl1

+ X 12 C12 + X 13 C13 + X 23 C23 + X 24 C24 + X 34C34

+ X 35C35

15-50

OPERATIONS RESEARCH

+ (5)(-20) + (1)(-5) + (3)(-8) + (1)(-30)
+ (5) (-10) + (3)( -4)

T.e. = (3)(-10)

-$251 (minus sign means "cost" rather than "profit").
Step 3. Evaluation of Alternative Possibilities. Evaluate alternative possibilities, i.e., evaluate the opportunity costs associated with not using the
cells which do not contain circled numbers. Such an evaluation is illustrated by means of the program given in Table 11 and is exhibited in Table
12 (noncircled numbers only). This evaluation is obtained as follows. (For
an alternative method of evaluation, see Ref. 1.)
TABLE

12.

FIRST FEASIBLE SOLUTION (WITH EVALUATIONS):

I~
tions

Dl

D2

D3

D4

D6

C = 251

Total

Origins

SI

-®

CD

®

1-181

-11

9

®

CD

-18

4

17

19

®

®

8

5

4

6

- -- -

S2

-11

-13

- -- S3

8

Total

3

--

3

21

1. For any cell in which no circled number appears, describe a path in
this manner. Locate the nearest circled-number cell in the same row which
is such that another circled value lies in the same column.
Thus, in Table 12, if one starts with cell SaD! (row 3, column 1), the
value ® at SaD4 (row 3, column 4) satisfies this requirement; i.e., it is the
closest circled-number cell in the third row which has another circled value,
CD at S2D4, in the same column (column 4). The circled number ® in
position SaD5 fails to meet this requirement.
2. Make the horizontal and then the vertical moves so indicated. In
the example, move from SaD! to SaD4 (see Table 12).
3. Having made the prescribed horizontal and vertical moves, repeat the
procedure outlined in Steps 1 and 2. For the example, this now gives cells
S2Da and SlDa respect'ively; accordingly, one moves from CD at S2D4 to
CD at SlDa by way of ® at S2Da.

OPERATIONS RESEARCH

15-51

4. Continue in this manner, moving from one circled number to another
by, first, a horizontal move and then a vertical move until; by only a horizontal move, that column is reached in which the cell being evaluated is
located. (The fewest steps possible should be used in this circumambulatory procedure.) Thus, to continue the example, this step is from CD at
S1D3 to ® at SIDl.
S. Finally, move to the cell being evaluated (here, S3Dl)' This completes the path necessary to evaluate the given celL (Note. For the purposes of evaluation, the path ends, rather than starts, with the cell being
evaluated.)
6. Form the sum, with alternate plus and minus signs, of the unit costs
associated with the cells being traversed. (These unit costs are given in
Table 10.) This is the (noncircled) evaluation to be entered into the appropriate cell in Table 12. Thus, for the example, one has for the evaluation of cell S3D1 :
Path (Table 12)
Unit cost (Table 10)
Evaluation (S3Dl)

S3 D 4
-10
+(-10)

-5
-(-5)

SlDl
-10
+(-10)

SaDl
-1

-(-1) = +8

Accordingly, one enters +8 in cell S3D1 of Table 12.
7. Repeat the procedure outlined until all cells not containing circled
numbers are evaluated.
Step 4. I terative Procedure toward an Optimum Solution. If the noncircled numbers (the evaluations) are all non-negative, an optimum has been
achieved. If one or more noncircled numbers are negative, further improvement with respect to the objective function is possible (e.g., the negative numbers in SID4, S2D2, etc., in Table 12). (At this stage, it should be
quite apparent that one must be careful to circle the values of Xii obtained
in a feasible solution in order to distinguish them from the "evaluation"
numbers which are also in the same table.)
Improvement is obtained by an iterative procedure in which one proceeds as follows:
(a) Of the one or more negative values which &ppear, select the most
negative one, say - N. If there are more than one such values, anyone of
these may be selected arbitrarily.
(b) Retrace the path used to obtain this most negative value.
(c) Select those circled values which were preceded by a plus sign in the
alternation between plus and minus and, of these, choose the one with the
smallest value written in its circle, say m.
(d) One is now ready to form a new table, wherein one replaces the most
negative value, - N, by this smallest value, m.
(e) Circle the number m and then enter all the other circles (except the

15-52

OPERATIONS RESEARCH

one which contained the value m in the previous program) in their previous
cells, but without any numbers inside.
The improvement in cost from one program to the next will then be
equal to mN. Furthermore, as with the simplex technique, one need not
select the most negative number. It is permissible, and sometimes advantageous, to select the first negative number which appears. Since the improvement from one program to the next is given by mN, a study of Table
12 shows that selections of S2DI, S2D 5 , S2D5 , or SlD 5 would have resulted
in improvements of 33, 39, 18, and 11 respectively, as compared with the
TABLE

13.

ITERATIVE PROCEDURE TOWARD SOLUTION

(a) Value to Be Moved

~tin~
tions
Origins '"

Dl

D2

~

SI

D3

D4

D6

Total

-- -0

0

S2

0

CD

9

----

4

0

----

S3
Total

3

5

0

0

8

6

3

21

---4

(b) Second Feasible Solution: C = 233

I~
tions

Dl

D2

Da

D4

D6

Total

CD

7

9

18

0

4

Origins

SI

S2

- - -®

®

®

- -- -11 1-131 CD

--

Sa
Total

-10

-1

1

®

®

8

3

5

4

6

3

21

OPERATIONS RESEARCH

15-53

improvement of 18 resulting from the selection of SlD 4 • Another alternative is to examine all products, mN, and select that negative numbered
cell which results in the greatest improvement, in this case, S2D5.
Thus in Table 12 the most negative number is -18 and appears in both
cells SlD 4 and S2D5 (i.e., - N = -18). For such ties, one may arbitrarily
select either of the cells containing this most negative number. Here, cell
SlD 4 is chosen. Retracing the path used to obtain the -18 value in cell
SlD 4, one then obtains +SlD3, -S2D3, +S2D4, -SlD 4. Of those preceded by a plus sign, namely SlD 3 and S2D4, both have the circled value
CD in their cells. Consequently, either one of these may be chosen as the
circled value to be moved. In this case, cell S2D4 is arbitrarily chosen.
The circled value CD is then entered into cell SlD 4 (see Table 13a, i.e., that
cell where 1-181 appeared in Table 12). (Therefore, the improvement over
the program given in Table 12 will be 1 X 18 = 18 cost units. That is,
the next program (Table 13b) will cost 251 - 18 = 233 cost units.) The
other circles (without numbers) are then entered in the same positions as
before (see Table 13a).
Step 5. A new feasible solution is obtained by filling in the circles according to the given surplus-deficiency (input-output) specifications. This
solution is given by the circled values in Table 13b.
Step 6. The program is then evaluated, as before, and negative (noncircled) numbers still appear.
Step 7. The process is successiyely repeated (Tables 14, 15, and 16)
until, finally, in Table 16 the evaluation of the corresponding program
given therein results in all (noncircled) numbers being non-negative. An
optimum feasible solution, or program, therefore, has been reached.
TABLE

14.

~
tions

THIRD FEASIBLE SOLUTION:

C

=

181

I
Dl

D2

D3

D4

D6

Total

Origins

81

- -- -- ®

CD

0

CD

7

9

0

13

31

13

4

®

8

3

21

- ----- -

82

2

83

- -- -- 1
101 -1
®

Total

1-

3

5

4

-6

15-54

OPERATIONS RESEARCH
TABLE

15.

C

FOURTH FEASIBLE SOLUTION:

=

151

~ Destina-

~

Dl

D2

D3

D4

81

10

-CD CD CD

82

12

83

®

Total

3

D5

Total

Origins

TABLE

16. '

9
7
-4
13
31
13
CD
- -- -- -- - ----1
8
® ®

81

-5

4

6

21

3

OPTIMUM FEASIBLE SOLUTION:

C

=

150

~ Destina-

~

Dl

D2

D3

D4

D5

Total

®

7

9

30

12

4

Origins

- - - -- -

81

10

82

11

1

CD

CD
CD

12
1

CD

®

8

5

4

6

3

21

- --

83

®

Total

3

--

-

- -- -- -

Alternate Optimum Programs. If any of the evaluation numbers in
the optimum tableau are zero, alternate optimum tableaux exist. These
alternate optimum solutions are obtained by essentially the same procedure as that which was just given. The only variation is that the zeros
(if any) which appear in the optimum feasible solutions are now treated in
exactly the same manner as were the negative values.
Furthermore, given such alternate optimum programs, say {PI!, {P 2 },
... , {P n }, where {P n } refers to the set of Xij which form the nth optimum
program, then

(71)

OPERATIONS RESEARCH

15-55

is also an optimum program provided the ai are non-negative constants
such that
n

2: ai = al + a2 + a3 + ... + an =

(72)

1.

i=l

For example, the cost minimization problem represented by Table 17 has
two optimum programs, namely those given in Tables 18 and 19. Table 19
, is obtained from Table 18 (and vice versa) by treating the zero in cell S3D5
of Table 18 (or cell S3D4 of Table 19) as the "most negative number" and
proceeding as before.
TABLE

I~
tions

D1

17.

UNIT COST MATRIX

D2

D3.

D4

D6

-3

0

5

-1

7

-1

6

5

18

Total

Origins

81

- -- -2

-1

-4

- -- -- -2
-5
-3
+1
- -- -- -2
-1
-4
-3

82
83

- -- - - -

2

Total

TABLE

~
tions

18.

2

4

5

OPTIMUM PROGRAM FOR TABLE

D1

D2

D3

D4

17

D6

Total

®

5

®

7

0

6

5

18

Origins

81

82

- -- ®

2
2
4
- -- - - -- 1
2
®

®

83

2

Total

2

- -- -- 2
® CD

- -- 2

5

4

15-56

OPERATIONS RESEARCH
TABLE

19.

ALTERNATE OPTIMUM PROGRAM FOR TABLE

I~
tions

Dl

17

D3

D4

Do

Total

®

2

2

®

5

1

2

(1)

CD

7

2

®

0

CD

6

2

5

4

5

18

D2

Origins

SI

-4

- --

S2

®

S3

2

Total

2

-

- -- -

- - - -- -

An infinite number of derived optimum programs can now be obtained
by forming what are called "convex linear combinations" of the two basic
optimum programs. Thus, if we select two positive fractions whose sum
is unity, e.g., 74 and ~4, we can obtain a new optimum program by multiplying every element of the first program by 74 and every element of the
second program by % and then adding corresponding cells. This yields
the derived optimum program of Table 22 and is obtained as shown in
Tables 20 and 21. Similarly, other optimum programs could be derived
for other non-negative fractions whose sum is equal to 1. Note. In general, derived optimum programs will involve fractional answers. These
programs are for use only where nonintegral answers are realistic.

~
tions

TABLE

20.

D1

D2

i

TABLE

Da

18

=

D4

Do

Total

Origins

81

S2
Sa
Total

------

--

CD
CD - - - - - CD CD
CD CD
------- 2
2
.5
4
5
CD

------- -

5

7
6
18

OPERATIONS RESEARCH

I~
tions

TABLE

21.

D1

D2

i

19

TABLE

D3

15-57

=

D4

Ds

Total

Origins

- - --- - -

CD

S1

CD

S2
S3
Total

TABLE

22.

A

2

®

- -- - - -

®
- - --2

5

4

DERIVED OPTIMUM PROGRAM:

I~
tions

D1

CD
CD
CD

5

- - - - - - - - -----

D2

D3

6

5

1 TABLE

D4

7

18

20

+i

TABLE

Ds

Total

®

5

CD

7

CD CD

6

21

Origins

-- -----®

S1

S2

®

- -- -

®

S3
Total

--

2

@

-----2

5

4

5

18

Solution of MaxiInization Problellls by the Transportation Technique. Although the exposition just given treats only a (linear) minimization problem, it should be obvious that the transportation technique is
equally applicable to (linear) maximization problems. The only difference
in solving maximization problems lies in the preparation of the "profit"
matrix. Whereas in the minimization problem all costs are entered with a
negative sign, here all profits (or whatever units are involved in the maximization problem) are entered without any modification of signs. Once
the initial datum matrix is obtained, one proceeds to the solution exactly
as previously outlined.

OPERATIONS RESEARCH

15-58

Many variations on
the transportation technique are available for the solution of transportation problems. One has already been cited with respect to the selection of
the cell to be introduced in~o the new basis. A second variation, designed
to decrease the number of iterations, involves a rearrangement of the cost
matrix. By using the problem cited in Tables 9 and 10, this may be illustrated as follows.
1. Form a new matrix in which the first row and first column correspond
to the cell yielding the least cost. In the example, this is S3D!. Enter the
totals of 8 for S3 and 3 for D! in the new matrix. Place the smallest of
these two numbers in that cell, S3D!.
Variations on the Transportation Technique.

~

Dl

S3

3

i

Total

:

- -- -- 8
- -- -- --

Total

3

- -- -- -

2. This satisfies the requirement for D!, but still leaves 5 units available
at S3. Hence, select the next least unit cost which involves S3. In the
example, this is -4 in cell S3D5' Therefore, list D5 in the second column
and enter the corresponding total (requirement) of 3. Compare the requirement of 3 units at D5 with the remaining availability of 5 units at S3,
and assign 3 units to cell S3D5.

~

Dl

S3

3

D5

- -- -- 3
- -- -- -

- -- -- Total

Total

3

- -- -- 3

8

OPERATIONS RESEARCH

15-59

3. Since 2 units are still available at 8 3 , select the third highest cost,
namely -7 in 83D3. Enter D3 in the third column along with its total
requirement of 4 units. Comparing the requirement of 4 units at D3 with
the remaining availability of 2 (8 - 3 - 3) units at 8 3 , assign 2 units to
cell 83D3' thereby using all available units at 8 3 but leaving 2 units still to
be assigned to D3 •

~

Dl

83

3

D6

D3

Total

- -- -- 3

2

8

- -- -- - - - -- -

- -- -- Total

3

3

4

4. Compare the costs associated with D3 (C 13 = -5 and C23 = -8)
and select 8 1 as the entry for the second rowand, with it, enter the availability at 8 1 , namely 9. Compare this availability at 8 1 (i.e., 0) with the
remaining requirement at D3 (i.e., 2 = 4 - 2) enter 2 units in cell SID3,
and thereby satisfy the requirement at D3 •

~

Dl

83

3

D5

D3

Total

-2
3
- ---- - -

81

9

- -- - - -

- -- -- Total

8

3

3

4

OPERATIONS RESEARCH

15-60

5. By proceeding in this fashion, the following matrix is obtained:

~

Dl

S3

3

D5

D3

D4

.

D2

Total

- -- -- 3

2

8

.---- - - - - -

SI

2

6

1

9

4

4

5

21

---- ----

S2
Total

3

- -- 3

4

6

The cost for this initial feasible solution is given by
3( -1)

+ 3( -4) + 2( -7) + 2(05) + 6( -9) + 1( -20) + 4( -10),

i.e., neglecting the minus sign which indicates cost,
T.C. = $153,
as compared with the first feasible solution of $251 obtained by the northwest corner rule (and with the optimum solution of $150). Such a reshuffling of the cost matrix generally leads to a better (i.e., lower cost or
higher profit) first feasible solution so that the optimum solution is usually
reached after a smaller number of alterations.
The reader should note that this first feasible solution costing $153 could
have been obtained without reshuffling the matrix. One simply starts in
the c~ll of lowest cost (here S3Dl) and proceeds accordingly.
For further details of the transportation technique, including a discussion of so-called degenerate cases, see Ref. 39. The mathematical derivation of the transportation technique is given in Ref. 40, Chap. 23.
Alternate Method of Evaluating Cells in Transportation Technique. An alternate evaluation technique (or procedure) is presented by

means of the problem represented by Tables 23 and 24, namely the unit
cost table and the table listing the first feasible solution of the transportation problem given earlier (see Tables 10 and 11). The evaluation technique presented here is a variation of that originally designed by Dantzig
in Koopmans (Ref. 40, Chap. XXI), and is part of the procedure described
in Henderson and Schlaifer (Ref. 41). The discussion of determining the
costs of deviating from the optimum solution is given in Ref. 41. The first

OPERATIONS RESEARCH
TABLE

I~
tions

23.

15-61

UNIT SHIPPING COSTS

D1

D2

D3

D4

Ds

-5
-8

-9

-10

-30

-20

-7

-10

-10
-6
-4

Origins

Sl
S2
S3

-20 .

-10
-2
-1

TABLE

~
tions

24.

D1

FIRST FEASIBLE SOLUTION

D2

D3

D4

Ds

Total

Origins

Sl

-3

S2
S3
Total

3

5
1
- -- -- 1
3
- -- -- 5

3

8

4

3

21

5

6

9
4

part of the evaluation procedure is to form a new table (Table 25) corresponding to Table 24, but listing the unit costs rather than the amounts to
be shipped. These costs are given by the boldface numbers in Table 25.
Add to Table 25 a column labeled "Row Values" and a row labeled
"Column Values" and calculate these values as follows:
1. Assign an arbitrary value to some one row or some one column. For
purposes of illustration, let us assign the value 0 to row 8 1 •
2. Next, for every cell in row 8 1 which contains a circled number representing part of the feasible solution, assign a corresponding column value
(which may be positive, negative, or zero) which is such that the sum of the
column value and row value is equal to the unit cost rate.
More generally, if ri is the row value of the ith row, Cj the column value
of the jth column, and Cij the unit cost for the cell in the ith row and jth

15-62

OPERATIONS RESEARCH
TABLE

25.

UNIT COSTS AND FICTITIOUS COSTS CORRESPONDING TO
FIRST FEASIBLE
.. ,:; SOLUTION

..

I'~
tions

Dl

D2

D3

D4

D6

Row
Values

Origins

81
82 !
83

--10 -20
-5 -27 -21
- -- - --- - _-13 -23
-8 -30 -24
- -- -10
12
-3
-4
7

- -- -

Cqlumn Values -10

-20

-5

-27

0
-3
17

-21

I

column which contains a circled number, then all row and column values
are obtained by the equation
(73)
Thus, by assuming r1 = 0, it can be immediately determined from eq. (73)
that
C1 = -10;
C2 = -20;
C3 = -5.
3. Next, since C3 = -5 and C23 = -8, determine that r2 = -3.
4. Since r2 = -3 and C24 = -30, then C4 = -27.
5. From C4 = -27 and C34 = -10, then r3 = + 17 is obtained.
6. Finally, for r3 = +17 and C35 = -4, C5 = -21 is obtained.
This procedure for assigning row and column values can be used for any
solution-matrix which is nondegenerate, i.e., given a matrix of m rows and
n columns, where the solution consists of exactly m +"n - 1 nonzero elements. (Any solution consisting of less than m + n - 1 nonzero elements
is said to be degenerate. Simple methods for dealing with degeneracy may
be found in Charnes and Cooper (Ref. 39), Henderson and Schlaifer (Ref.
41), and Dantzig (Ref. 38).)
After all row and column values for Table 25 have been computed, th~
table can be completed by filling in the remaining cells according to eq.
(73). This results in the lightface figures given in Table 25.
After Table 25 has been completed, the cell evaluations may be obtained
as follows. Form a new table (Table 26) which consists of the unit cost

OPERATIONS RESEARCH

15-63

rates of Table 23 subtracted from the number in the corresponding cell of
Table 25. That is, in symbolic notation,
{Table 26} = {Table 25} - {Table 23}.
The cells corresponding to movements which are part of the solution will
contain zeros. These zeros are given in boldface type in Table 26. The
TABLE

26.

CELL EVALUATIONS FOR TilE FIRST FEASIBLE SOLUTION

I~
tions

,

D2

D1

D3

D4

Do

Origins

81

0

0

0

-18

-11

82

-11

-13

0

0

-18

83

8

17

19

0

0

resulting numbers for the remaining cells are given in lightface type and
are the cell evaluations to be used in determining a better program or solution. (Comparison with Table 12 will show this to be true.)
When these cell evaluations have been determined, proceed as previously
outlined in the section.
Geollletric Interpretation of the Linear Progralllllling Problelll

A geometric interpretation of the linear programming problem may be
given by means of the following specific two-dimensional example.
PROBLEM 1.
To determine X,' Y ~ 0 which maximize Z = 2X + 5 Y
subject to
X ~4,

Y

(74)

X

~

3,

+ 2Y ~ 8.

The system of linear inequalities which constitute the restrictions results
in the convex set of points given by polygon OABeD of Fig. 6. That is,
any point (X, Y) on or within the polygon satisfies the entire system of inequalities (74). Hence, there exist an infinite number of solutions to system (74). The linear programming problem then is to select, from this

15-64

OPERATIONS RESEARCH
y

X=4

I
I

I
I
I

I

I
I
I

i--_~~B_____ :______________

y= 3

I

FIG. 6.

~~"x

Region satisfying restrictions (23) for non-negative X and Y (Ref. 1).

y

FIG. 7.

Family of parallel straight lines, Z = 2X

+ 5Y (Ref. 1).

y

...................

A....,...-----::::~

.....................
......................

....................

................

..........

.................. C . . . . . ~:::{'~+2Y=8
..........

.................... .........

............................. ___ ___
.............................. . . . . . .................... 2X+5Y= 19
...... ...... D
........................................................... . . . . .

--~~--------------~~--------~~'>~->~---->~----X

o ...............

. . . . . . . . . . 2X+5Y=8

...........

FIG. 8.

2X+5Y= 15

...................... ~X+5Y=O

Figure for geometric solution of linear programming problem (Ref. 1).

OPERATIONS RESEARCH

15-65

infinite number of points, the one or more points which will maximize the
function Z = 2X + 5 Y.
The function Z = 2X + 5 Y is a one-parameter family of parallel lines;
i.e., the function represents a family of parallel straight lines (of slope
-7~) such that Z increases as the line gets farther removed from the origin; see Fig. 7. The problem may then be thought of as one of determining
that line of the family, 2X
5Y = Z, which is farthest away from the
origin but which still contains at least one point of the polygon OABCD.
Figure 8 shows how several members of the family Z = 2X
5 Yare
related to the polygon OABCD and, in particular, shows that the solution
is given by the coordinates of point B. Point B is the intersection of Y = 3
and X
2Y = 8. Hence, B is given by (2, 3) and, in turn, Zmax = 2(2)
5(3) = 19.
GeoInetric Interpretation of the SiInplex Method. In order to
exhibit, geometrically, what happens when one solves the problem by means
of the simplex technique, the simplex solution of the example of Fig. 8 is
given in Tables 27 a-c. We see from those tables that the solution progresses
from the point (X == Xl = 0, Y == X 2 = 0) to the point (X == Xl = 0, Y
:= X 2 = 3) to the point (X == Xl = 2, Y == X 2 = 3); i.e., referring to
Fig. 8, from point 0 (origin) to point A to point B.
Mathematically, polygon OABCD (Fig. 6) constitutes a convex set of
points; i.e., given any two points in the polygon, the line segment joining
them is also in the polygon. An extreme point of a convex set is any point
in the convex set which does not lie on a line segment joining some two other
points of the set. Thus, the extreme points of polygon OABCD are points
0, A, B, C, and D. The optimum solution to the linear programming
problem will be at an extreme point and this optimum (extreme) point is
reached by proceeding from one extreme point to another. Note that, in
the example discussed here, the solution proceeded from extreme point 0
(Table 27a) to extreme point A (Table 27b) and, finally, to extreme point
B (Table 27c).
l\lore Than One OptiInuIn Solution. If the example is now changed
slightly to read:
PROBLEM 2. To determine X, Y ~ 0 which maximize Z = X + 2 Y
subject to the restrictions

+

+

+

+

X

X

~4,

Y

~

3,

+ 2Y ~ 8,

then Fig. 9 shows that the solution is given by either extreme point B or
extreme point C. This is because X
2Y = 8 is both a boundary line of

+

27.

TABLE

(a) Feasible Solution

SIMPLEX METHOD

Correspond~ng

to X = 0, Y = 0 in Fig. 8

",

2Z1

0

0

0

0

2

5

Basis

Po

Pa

P4

P5

PI'

P2

0

Pa

4

1

0

0

1

0

0

P4

3

0

1

0

0

1

0

P5

8

0

0

1

.1

2

Zj

0

0

0

0

0

Zj - Cj

0

0

0

0

-2

I

(b) Feasible Solution Corresponding to X = 0, Y

~

=

0
---5
"

3'in Fig., 8
"

0

0

0

0

2

5

Basis

Po

Pa

P4

P5

PI

P2

0

P3

4

1

0

0

1

0

5

P2

3

0

1

0

0

0

P5

2

1

1

Zj - Cj

15

-2
0
--5
0

1
--0

0

-2

Ci

(c) Maximum Feasible Solution Corresponding to X

~

0

Ci

I

=

2, Y

0
=

3 in Fig. 8 -

0

0

0

2

5

P5

PI

P2

Basis

Po

Pa

P4

0

Pa

2

1

2

-1

0

0

5

P2

3

0

1

0

0

1

2

PI

2

0

-2

1

1

0

Zj - Cj

19

0

1

2

0

0

15-66

OPERATIONS RESEARCH

15-67

the polygon OABCD and also a member of the family of parallel lines Z
= X + 2 Y. Hence B = (2, 3) and C = (4, 2) both constitute solutions
and yield the answer Zmax = 8.
y

o ............
X+2Y=O
FIG. 9. Geometric solution of linear programming problem with more than one optimum
solution (Ref. 1).

Furthermore, any convex linear combination of Band C will also be a
solution, namely, any point on the line segment BG.
The Dual Problem of Linear Programming

By a dual theorem of linear programming, one has a choice of two problems to solve instead of just one. This is because every linear programming
problem has a dual problem such that one involves maximizing a linear
function and the other involves minimizing a linear function. Furthermore, if one solves a linear programming problem by the simplex technique,
a tableau corresponding to an optimum solution automatically contains a
solution to the dual problem. Thus, one is free to work with either the
stated problem or its dual.
The dual problem of linear programming is illustrated by the example
given earlier (eq. 74), namely:
PROBLEM.
Determine X, Y ~ 0 which maximize Z = 2X + 5 Y subject
to
X ~ 4,

Y

(74)

X

~

3,

+ 2Y ~ 8.

This problem may be displayed in tabular form as is done in Table 28, that
is, the restrictions may be read off by interpreting a light vertical line as +
and the heavy vertical line as ~. Furthermore, the function to be maximized is given by the bottom row, na,mely 2X + 5Y. To obtain the dual
problem, extend Table 28 as is done in Table 29. Then, by reading down
each column as indicated, obtain the dual problem, namely:

OPERATIONS RESEARCH

15-68
TABLE

28.

TABULAR FORM OF PROBLEM

Max

TABLE

29.

x

y

1

o

4

o

1

3

1

2

8

----12

5

DUAL PROBLEM IN TABULAR FORM

x

y

Min

- -- -- - - 1
0
4
WI
- -- -- - - 0
1
W2
3
- -- -- - - 1
2
8
W3

- -- Max

DUAL PROBLEM

subject to
(75)

2

(see Table 29).

5

Minimize g = 4WI

+ 3W2 + 8Wa

+ Wa ~ 2,
W 2 + 2Wa ~ 5.
WI

The inequalities, ~, are converted to equalities by the subtraction of
non-negative slack variables. Then, since -1 cannot be entered into a
basis, one may also add artificial variables to provide for the basis. Thus,
WI + Wa ~ 2 is first converted into WI + Wa - W 4 = 2. Then the artificial variable W6 may be added to provide WI + Wa - W4 + W6 = 2.
For a detailed discussion, see Charnes, Cooper, and Henderson, Ref. 36.

OPERATIONS RESEARCH

15-69

If one returns to the simplex solution of the maximization problem of
Table 27c, one sees that the following results are given:
Zmax = 19,

and

(76)

Cl

0,

Xl = 2,

Zl -

X2

=

3,

Z2 - C2 = 0,

Xa

= 2,

Za - Ca = 0,

X4

=

0,

Z4 - C4 = 1,

X5 = 0,

Z5 -

=

C5 = 2.

Now, X a, X 4 , and X5 correspond to slack variables. Hence, if one starts
with the first slack variable and renumbers the Zj - Cj in order, and denotes these reordered Zj - Cj by Z'j - C'j, one obtains

(77)

Z'! - C,! = 0

(corresponding to former Za - Ca),

Z'2 - C'2 = 1

(corresponding to former Z4 - C4),

Z'a - C'a = 2

(corresponding to former

Z5 -

C5 ),

z'4

-

C' 4 = 0

(corresponding to former

Zl -

Cl ),

z' 5

-

C' 5 = 0

(corresponding to former Z2 - C2).

Setting W j = Z'j - C'j gives the solution to the dual minimization problem; that is, if the minimization problem were to be solved by the simplex
technique, the following results would be obtained:
gmin = 19,

and
WI = 0,

bl ) = 2,

= 1,

-(g2 - b2) = 0,

Wa = 2,

- (ga - ba) = 0,

W4

= 0,

-(Y4 - b4) = 2,

W5

=

W2

(78)

-(gl -

0,

-(g5 - b5)

= 3,

where the bj are the corresponding coefficients of the Wj in the minimization
function.
Conversely, given the solution to the minimization problem (i.e., given
eq. 78), the solution to the dual maximization problem can be determined
by starting with the first slack variable W 4 and relabeling the - (gj - bj )
in order. Hence, solution eqs. (76) would result.

15-70

OPERATIONS 'RESEARCH

For dual problems" it can be shown that Zmax = gmin; in other words,
that the two problems ani equivalent (see Ref. 36 and Ref. 40, Chap. XIX).
Hence, in solving a linear programming problem, one is free to work with
either the stated problem or its dual. Since, as a rule of thumb, the number
of iterations required to solve a linear programming problem is equal to one to
one and a half times the number of rows (i.e., restrictions), one can, by an
appropriate choice, facilitate the computation somewhat, especially in
such cases where there exists a sizable difference in the number of rows for
each of the two problems.
A Short Cut in Solving Linear Progralllllling Problellls

One of the many advantages of both the transportation and simplex
techniques is that judgment can be used to good advantage in facilitating
the computations required in order to arrive at an optimum solution. In
the transportation problem involving m rows and n columns, the use of
judgment (or a good guess) simply requires designating m
n - 1 cells
which are expected to correspond to a solution. After these m + n - 1
cells have been selected, proceed as in the transportation technique, first
filling in these cells with circled numbers and then "evaluating" the remaining cells to determine whether or not the solution is an optimum one.
Consider the problem of Fig. 6 and eq. (74). It will be shown that given
a "good" guess, the corresponding simplex matrix can be constructed.
Then one proceeds to the optimum solution, if the solution guessed is not
already optimum. This demonstrates how one may utilize judgment in
the general linear programming problem (using the simplex technique).
PROBLEM.
To determine X, Y ~ 0 which maximize Z = 2X
5Y
subject to
X ~4,

+

+

Y

(79)

X

~

3,

+ 2Y ~ 8.

Converting this system of inequalities to equalities by means of slack variables Sa, S4, and S5 yields

+ Sa
Y + S4
X + 2Y + S5
X

(80)

= 4,
= 3,
= 8.

N ow, suppose that one "guesses" or has reason to believe that the optimum
solution is such that it will not involve X; i.e., that the final solution will
consist of Y, Sa, and S5' This means, accordingly, that X = 0 and S4 = o.

OPERATIONS RESEARCH

15-71

Hence, to obtain the "solution," i.e., the elements of the basis that would
appear in the Po column of the simplex tableau, one needs only to set X = 0
and S4 = 0 in eqs. (80), yielding
S3 = 4,
Y

= 3,

+ S5

= 8,

(81)
2Y

so that

Y = 3,

(82)

S3 = 4,

S5 = 2.

These values are then entered in the simplex tableau (see Table 30) under
the column labeled Po. Note that P 2 corresponds to Y.
TABLE

30.

FEASIBLE SOLUTION, SHORT-CUT ApPROACH

~
Ci

2

5

Basis

Po

P3

P4

P6

PI

P2

0

P3

4

1

0

0

1

0

5

P 2(Y)

3

0

1

0

0

1

0

P6

2

0

-2

1

1

0

Zj

15

0

5

0

0

5

Zj - Cj

15

0

5

0

-2

0

Next, construct the body of the simplex matrix. Since each value of
Zj - Cj corresponds to the mini~u:m cost of deviating from the optim.um
program by one unit of Xii one can determine, for each j, the corresponding
Zj - Cj and the Xij which appear in that column. For example, consider
that one will deviate from the program of Y = 3, S3 = 4, and'S5 = 2 by
insisting that X = 1. One then needs to determine the changes in Y, S3,
and S5 which result from the unit change in X. Therefore, solve
1
(83) ,

+ S3

= 4,

Y = 3,
1

+ 2Y + S5

= 8,

which result from eqs. (80) by letting X = 1 and S4 =

o.

15-72

OPERATIONS RESEARCH

Solving eqs. (83) yields
(84)

x

= 1,

Y = 3,

Sa = 3,

8 5 = 1.

Comparing eqs. (82) with (84) then shows that the following changes in
Y, Sa, and 8 5 occur because of a unit change in X:
(85)

~Y

= 0,

~Sa =

1, -

Therefore, in setting up a simplex tableau (see Table 30), these values
would be inserted under the column labeled P 1 which corresponds to the
variable X.
Similarly, for 8 4 solve
Sa = 4,

+ 1 = 3,
2Y + 8 5 = 8.

(86)

Y

This yields
(87)

Y

= 2,

8 a = 4,

8 5 = 4,

so that
(88)

~Y

= 1,

~Sa

= 0,

~S5

= -2.

Insert these values in column P 4 of Table 30.
Next, since P 2 , P a, and P5 are in the basis, complete the corresponding
columns (as is done in Table 30) by inserting D's and l's in the appropriate
places.
_
Finally, compute the Zj - Cj's to determine whether the "solution" is
optimum. This is done as at the outset of any simplex solution; i.e., first
compute Zj by
(89)

Zj

=

2: CiXij
i

and then subtract the corresponding Cj. Since P 2 , P a, and P 5 are in the
basis, Z2 - C2, Za - Ca, and Z5 - C5 are all equal to zero. Additionally,
applying eq. (89), yields
Z1 -

Z4 -

+ 0(5) + 1(0) - 2
+ 1(5) + (-2)(0) -

C1 = 1(0)

C4 = 0(0)

= -2,

°= 5.

Thus Table 30 is completed and, not having an optimum solution (owing
to Z1 - C1 being negative), one can proceed to obtain the optimum solution as before.
The reader should note that Table 30 is identical with Table 27b and was

OPERATIONS RESEARCH

15-73

generated without a tableau such as is given in Table 27a. The same technique can also be applied to larger size problems so that, with a good estimate of the variables which will make up the solution, a great amount of
computation might be eliminated.
The Assignment Problem (See Ref. 1, Chap. 12)

The assignment problem is a special linear programming problem which
may be stated mathematically as follows:
Determine X ij which minimize
T

= L: aijXij
i,j

subject to
Xij
n

=

i, j = 1, 2, "', n

n

)' X··=)"
..
lJ
. X lJ
i=l

Xi?,

=

1,

i = 1, "', n; j = 1, "', n.

j=l

In other words, the assignment problem is such that:
(a) Xij = 1, if the ith facility is assigned to the jth job; 0, otherwise.
(b) Each row and column of the solution matrix will have one element
unity and all other elements zero.
For both the assignment problem and the transportation problem, socalled "methods of reduced matrices" exist which enable one to obtain the
optimum solution with great ease.
5. WAITING TIME MODELS

ProhleIll Statement. A waiting time problem arises when either units
requiring service or the facilities which are available for providing service
stand idle, i.e., wait. Problems involving waiting time fall into two different types, depending on their structure.
a. Waiting line problems involve arrivals which are randomly spaced
and/ or service time of random duration. This class of problems includes
situations requiring either determination of the optimum number of service
facilities or the optimum arrival rate (or times of arrival), or both. The
solution of these "facility and scheduling" problems is obtained through
what is called waiting line theory or (from the British) queuing theory.
Queuing theory dates back to the work of A. K. Erlang, who in 1908
published Use of Waiting-Line Theory in the Danish Telephone System. In
Erlang's and subsequent work up to approxim9tely 1945, applications were
restricted in the main to the operation of telephone systems. Since then
the theory has been extended and applied to a wide variety of phenomena.
See Ref. 42 and Ref. 1, Chap. 14.

15-74

OPERATIONS RESEARCH

Referenoe 42 also contains an excellent list of activities to which queuing
theor,y has been applied, a description of the use of the Monte Carlo technique in solving queuing problems, and a comprehensive list of references.
b. Sequencing. The second type of waiting time problem is not concerned
with either controlling the times of arrivals or the number of facilities, but
rather is concerned with the order or sequence in which service is provided
to availaqle units by a series of service points. This is the so-called sequencing problem. See Ref. 1, Chap. 16.
For a discussion of related problems such as the (assembly) line-balancing
problem and the traveling salesman (or routing) problem, see Ref. 1.
Problem Characteristics of Queuing Models

Every queuing or waiting line problem can be characterized by the following factors:
1. Input, the manner in which units arrive and become part of the waiting line.
2. Stations, the number of service units (or channels) operating on the
units requiring service.
3. Service policy, limitations on ,the amount of service that can be rendered~or is allowed.
4. Queue discipline, the order in which units are served, e.g., first come,
first served; random selection for service; priority.
5. Output, the service provided and its duration. To specify a queue
completely, all five factors must be described.
Notation (see Ref. 42).
X
p.
C
Cf

n
k
p

Pn(t)

pn

mean arrival rate (number of arrivals p~r unit time)
mean service rate per channel
the number of service channels
mean number of free service channels
number of units (customers) in the syst.em
number of phases in the Erlang service case
utilization factor for service facility: p = X/CIl
the probability that there be, at time t, exactly n units in the system, both
waiting and in service
the steady-state (time-independent) probability that there be n units in the
system, both waiting and in service:
n=oo

:E P net)
n=O

Cp

:E

Pn = 1

n=O

traffic intensity in erlangs:
n=c-l

X

Cp

P(=O)
P(>O)

n=OO

=

=- =
p.

C -

the probability of no waiting
the probability of any waiting

cf

=

:E
n=O

n=oo

nPn

+ n=c
:E CPn

OPERATIONS RESEARCH

15-75

P(>r)
L

the probability of waiting greater than time r
the average number of units in the system, both waiting and in service:

Lq

the average number of units waiting in the queue:
00

Lq =

L (n
n=c

- c)Pn

=L -

C

+ Cj

W

the average waiting time in the system:

A(t)
B(t)
bk(t)

cumulative distribution of times between arrivals with density function aCt)
cumulative distribution of service or holding times with density function bet)
probability density for kth Erlang distribution

Input. Arrivals or inputs into a queuing system may occur at intervals
of regular length. For such cases the cumulative distribution of time intervals between arrivals is given by the uniform distribution

=

A(t)

Dfort

< to;

1 fort

~ to.

If the input distribution is of Poisson type, the time interv[\,ls between
arrivals are exponentially distributed. The cumulative distribution is then
given by
A(t) = 1 - e-Xt •

An intermediate type of input may be described by the Erlangian frequency distribution of times between arrivals
b (t) =
k

[(Xk)k] e-Mttk - 1 •
r(k)

This yields the exponential distribution when k = 1 and the uniform distribution when k becomes infinitely large.
As Saaty points out (Ref. 42), the normal distribution also produces a
good fit to arrival data in some practical problems.
Output (Service or Holding Times). Distributions of service or holding times are defined as for arrivals or inputs. In practice, Poisson inputs and exponential service times occur very frequently.
Assumptions Leading to a Poisson Input. (See Ref. 42.) One has a Poisson input when the following assumptions are satisfied:
1. The total number of arrivals during any given time interval is independent of the number of arrivals that have already occurred prior to the beginning of the interval.


2. For any interval (t, t + dt), the probability that exactly one arrival will occur is λ dt + O(dt²), where λ is a constant, while the probability that more than one arrival will occur is of the order of dt² and may be neglected.
For a further discussion of the Poisson input and properties of a Poisson
process, see Refs. 42 and 43.
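Since the gaps of a Poisson stream are exponential, such an input is easy to simulate. The following is a minimal sketch (the rates and horizon are illustrative values, not taken from the text):

import random

# Simulate Poisson arrivals by drawing exponential inter-arrival gaps.
def poisson_arrival_times(lam, horizon, rng=random.Random(42)):
    """Return arrival instants in [0, horizon) for a Poisson stream."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)   # gap ~ Exponential(lam), mean 1/lam
        if t >= horizon:
            return times
        times.append(t)

arrivals = poisson_arrival_times(lam=2.0, horizon=100.0)
print(len(arrivals) / 100.0)        # empirical rate, close to lam = 2.0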
Assumptions Leading to an Exponential Holding Time Distribution. If a channel is occupied at time t, the probability that it will become free during the following time interval dt is given by μ dt, where μ is a constant. (See Ref. 42.)
It follows that the frequency function of the service times is μe^{−μt}, while the mean duration of service is 1/μ, since the expected value of t is

E(t) = ∫₀^∞ t μe^{−μt} dt = 1/μ.

Queuing Models

To date, there have been essentially two different theoretical approaches to queuing: one through differential difference equations, due to Erlang, and the other through integral equations, as studied by Lindley. The first approach may be illustrated by means of a single channel queuing system with both λ and μ constant. A Poisson input, exponential holding time, first-come, first-served, single channel queue is assumed.
Differential Difference Equations. If the operation starts with no items in the queue, then the following equations describe the given system. (See Ref. 1.)

P₀(t + dt) = P₀(t)(1 − λ dt) + P₁(t)μ dt   (n = 0),
P_n(t + dt) = P_n(t)(1 − λ dt − μ dt) + P_{n−1}(t)λ dt + P_{n+1}(t)μ dt   (n ≥ 1).

By transposing and passing to the limit with respect to dt, these equations become

dP₀(t)/dt = −λP₀(t) + μP₁(t)   (n = 0),
dP_n(t)/dt = −(λ + μ)P_n(t) + λP_{n−1}(t) + μP_{n+1}(t)   (n ≥ 1).

The time-independent steady-state solution is obtained either by solving these time-dependent transient equations and letting t → ∞ in the solution, or by setting the derivatives with respect to time equal to zero and solving the resulting steady-state equations. The latter approach yields, successively:

p₁ = ρp₀,  p₂ = ρ²p₀,  p₃ = ρ³p₀,  ···.


By mathematical induction, these formulas then reduce to the single equation

p_n = ρⁿ(1 − ρ),

where ρ = λ/μ, since c = 1.
The expected number of units in the system is given by

L = Σ n p_n = (1 − ρ) Σ n ρⁿ = ρ/(1 − ρ).

The expected number of units in the line is given by

L_q = L − ρ = ρ²/(1 − ρ).
The expected waiting time is given by (see Ref. 42)

W = −∫₀^∞ τ dP(>τ),  where P(>τ) = ρe^{μ(ρ−1)τ},

and

W = λ/[μ(μ − λ)] = ρ/[μ(1 − ρ)].

The expected number waiting, of those delayed, is 1/(1 − ρ).

The expected waiting time of those delayed is

W/P(>0) = 1/[μ(1 − ρ)].
See Refs. 42-44.
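The single channel results above are immediate to evaluate. A minimal sketch follows (the rates and function name are illustrative assumptions of this sketch, not the handbook's):

# Poisson-input, exponential-holding-time, single channel (c = 1) measures.
def mm1_measures(lam, mu):
    rho = lam / mu                      # utilization factor, must be < 1
    assert rho < 1, "steady state requires rho < 1"
    return {
        "p0": 1 - rho,                  # p_n = rho**n * (1 - rho)
        "L": rho / (1 - rho),           # expected units in system
        "Lq": rho**2 / (1 - rho),       # expected units in line
        "W": rho / (mu * (1 - rho)),    # expected waiting time
    }

print(mm1_measures(lam=2.0, mu=3.0))
# {'p0': 0.333..., 'L': 2.0, 'Lq': 1.333..., 'W': 0.666...}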
(b) Constant Holding Time Distribution (Refs. 42 and 45). The steady-state equations are:

p₀ = 1 − ρ,
p₁ = (1 − ρ)(e^ρ − 1),
p_n = (1 − ρ) Σ_{k=1}^{n} (−1)^{n−k} e^{kρ} [(kρ)^{n−k}/(n − k)! + (kρ)^{n−k−1}/(n − k − 1)!]   (n ≥ 2).

Here,

P(>τ) = ρ Σ_{i=0}^{k} e^{ρ(μτ−i)} [−ρ(μτ − i)]^i / i!,

where k is the largest integer less than or equal to μτ.
W = λ/[2μ²(1 − λ/μ)] = ρ/[2μ(1 − ρ)].

Finally, the expected waiting time of those delayed is 1/[2μ(1 − ρ)].
(c) Poisson Input, Erlangian Holding Time Distribution. The probability density function for the kth Erlang holding time distribution is given by

b_k(t) = [(μk)^k/Γ(k)] e^{−μkt} t^{k−1}.


The steady-state equations are (Ref. 45)

λp₀ = μp₁   (n = 0),
(λ + μ)p_n = μp_{n+1} + λp_{n−k}   (n ≥ 1).

Here

L = ρ(ρ + 2k − ρk)/[2k(1 − ρ)],

W = ρ(k + 1)/[2μk(1 − ρ)].
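Since the Erlang family spans the exponential (k = 1) and constant (k → ∞) holding times, the W formula above can be checked at both extremes. A small sketch with illustrative rates:

# Expected wait under a kth Erlang holding time, from the formula above.
# k = 1 reproduces the exponential case; very large k approaches the
# constant-holding-time value rho/(2*mu*(1 - rho)).
def erlang_wait(lam, mu, k):
    rho = lam / mu
    return rho * (k + 1) / (2 * mu * k * (1 - rho))

lam, mu = 2.0, 3.0
print(erlang_wait(lam, mu, 1))        # 0.666... = exponential-service wait
print(erlang_wait(lam, mu, 10**6))    # ~0.333... -> constant-service wait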

2. Priority Discipline: Arbitrary Holding Time, Nonpreemptive Service (Refs. 42 and 46).
(a) Finite Number of Priorities, N. Assume a system with Poisson input for the kth priority with arrival rate λ_k, arbitrary holding time with service rate μ_k, and a priority queue discipline. Items of different types enter the system with assigned priorities for service. Whenever the system is free to service an item, it selects items of highest priority on a first-come, first-served basis. However, if an item of higher priority enters the system while one of lower priority is in service, this service is not preempted, i.e., sent back to the waiting line. For this situation,

W_k = W₀/[(1 − σ_{k−1})(1 − σ_k)],

where

σ_k = Σ_{i=1}^{k} ρ_i,  ρ_i = λ_i/μ_i,  σ_k < 1,

λ = Σ_{i=1}^{N} λ_i,

W₀ = ½ λ ∫₀^∞ t² dF(t),

F(t) = (1/λ) Σ_{i=1}^{N} λ_i F_i(t),

and where F_k(t) is the cumulative holding time distribution function for the kth priority.
The expected length of the line is given by

L = Σ_{i=1}^{N} λ_i W_i.
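A sketch of this nonpreemptive-priority computation (the formula above, often credited to Cobham); the two-class rates and second moments below are illustrative assumptions:

# W_0 = (lambda/2) * integral of t^2 dF(t) = half the sum of class
# arrival rates times class second moments of holding time.
def priority_waits(lams, mus, second_moments):
    """lams[i], mus[i]: rates for priority i (0 = highest);
    second_moments[i]: E[t^2] of the class-i holding time."""
    w0 = 0.5 * sum(l * m2 for l, m2 in zip(lams, second_moments))
    waits, sigma_prev = [], 0.0
    for lam, mu in zip(lams, mus):
        sigma = sigma_prev + lam / mu
        waits.append(w0 / ((1 - sigma_prev) * (1 - sigma)))
        sigma_prev = sigma
    return waits

# Exponential holding times have E[t^2] = 2 / mu^2.
lams, mus = [0.5, 0.8], [2.0, 2.0]
m2 = [2 / mu**2 for mu in mus]
print(priority_waits(lams, mus, m2))  # class-0 wait < class-1 wait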


(b) Two Priorities, Preemptive Service, Exponential Holding Time. (See Refs. 42 and 47.) Priority 1 and 2 calls arrive at a single channel with arrival rates λ₁ and λ₂, respectively. Both priorities have Poisson arrival distributions. Priority 1 calls in the queue enter the channel before all priority 2 calls in queue and replace any priority 2 call in the channel on their arrival. The priority 2 call in the channel then reenters the queue. Priority 1 and 2 calls have exponential service time distributions with service rates μ₁ and μ₂, respectively. Let ρ₁ = λ₁/μ₁ and ρ₂ = λ₂/μ₂, where λ₁/μ₁ + λ₂/μ₂ < 1.
Let P_nᵐ be the probability that n priority 1 calls and m priority 2 calls are in the queue. The steady-state equations are:

μ₁P_{n+1}ᵐ − (μ₁ + λ₁ + λ₂)P_nᵐ + λ₁P_{n−1}ᵐ + λ₂P_nᵐ⁻¹ = 0   (m, n > 0),
μ₁P₁ᵐ + μ₂P₀ᵐ⁺¹ − (μ₂ + λ₁ + λ₂)P₀ᵐ + λ₂P₀ᵐ⁻¹ = 0   (n = 0, m > 0),
μ₁P_{n+1}⁰ − (μ₁ + λ₁ + λ₂)P_n⁰ + λ₁P_{n−1}⁰ = 0   (m = 0, n > 0),
μ₁P₁⁰ + μ₂P₀¹ − (λ₁ + λ₂)P₀⁰ = 0   (m = n = 0).

The expected number waiting, first priority, is given by

ρ₁/(1 − ρ₁).

The expected number waiting, second priority, is given by

ρ₂[1 − ρ₁ + (μ₁/μ₂)ρ₁] / [(1 − ρ₁)(1 − ρ₁ − ρ₂)].

(c) Continuous Number of Priorities (Ref. 42). For an excellent discussion of results for a single channel priority queuing system with application to machine breakdowns, see Ref. 48. The number of available machines is assumed to be infinite. Priorities are assigned according to the length of time needed to service a machine, with higher priorities being assigned to shorter jobs. Since the length of service time may correspond to any real number, a continuous number of priorities exists.
3. Random Selection for Service: Impatient Customers, Exponential Holding Time. (See Refs. 42 and 49.)
Assumptions. Poisson input, exponential holding time, random selection for service; items leave after a wait of time T₀.
Steady-State Equations.

−λp₀ + (μ + C₁)p₁ = 0   (n = 0),
λp_{n−1} − (λ + μ + C_n)p_n + (μ + C_{n+1})p_{n+1} = 0   (n ≥ 1),

where C_n is the average rate at which customers leave when there are n customers in the system and where p₀ is obtained from

Σ_{n=0}^{∞} p_n = 1.

Solution.

p_n = λⁿp₀ Π_{k=1}^{n} (μ + C_k)⁻¹   (n = 1, 2, ···),

C_n = μ exp(−μT₀/n) / [1 − exp(−μT₀/n)].
4. Limited Source: Exponential Holding Time. (See Refs. 42 and 43.)
Assumptions. Input from a source having only a finite number m of customers. Exponential service time, single channel (servicing of m machines).
Steady-State Equations.

mλp₀ = μp₁   (n = 0),
[(m − n)λ + μ]p_n = (m − n + 1)λp_{n−1} + μp_{n+1}   (1 ≤ n ≤ m − 1),
μp_m = λp_{m−1}   (n = m).

Solution.

p₀ = 1 − Σ_{n=1}^{m} p_n,

L = m − [(λ + μ)/λ](1 − p₀).
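The balance equations above can be solved recursively and normalized; a sketch with illustrative rates (here the expected number in the system is computed directly from the resulting distribution):

def limited_source(lam, mu, m):
    p = [1.0]                                      # unnormalized p_0
    for n in range(1, m + 1):
        p.append(p[-1] * (m - n + 1) * lam / mu)   # from the balance equations
    total = sum(p)
    p = [x / total for x in p]
    L = sum(n * pn for n, pn in enumerate(p))      # expected units in system
    return p, L

p, L = limited_source(lam=0.1, mu=1.0, m=5)
print(round(p[0], 4), round(L, 4))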

5. Constant Input: Exponential Holding Time. (See Refs. 42 and 50.) For constant input at intervals of length θ,

p_n = p₀(1 − p₀)ⁿ,

where p₀ is given by

1 − p₀ = exp(−μp₀θ).

Furthermore,

P(>τ) = (1 − p₀) exp(−μp₀τ),

and

W = −∫₀^∞ τ dP(>τ) = (1 − p₀)/(μp₀).
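The implicit equation for p₀ above can be solved by fixed-point iteration. A sketch with illustrative values:

import math

# Solve 1 - p0 = exp(-mu * p0 * theta) for p0, then evaluate W.
def constant_input_p0(mu, theta, iters=100):
    p0 = 0.5                                   # any start in (0, 1)
    for _ in range(iters):
        p0 = 1.0 - math.exp(-mu * p0 * theta)
    return p0

mu, theta = 1.0, 2.0                           # service rate, arrival spacing
p0 = constant_input_p0(mu, theta)
print(round(p0, 6), round((1 - p0) / (mu * p0), 6))   # p0 and W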

6. Queue Length-Dependent Parameters and Time-Dependent Parameters. For an excellent resume of queuing results for queue length-dependent and time-dependent parameters, see Ref. 42. See also Refs. 51-53.


Queuing Theory Formulas. Two Channels in Series (See Ref. 54.)
Exponential Holding Times. (See Ref. 42.)
Assumptions. Poisson input with mean λ; two channels in series with exponential holding times, μ₁ and μ₂, respectively. After finishing service at the first gate, the customer moves on to the second gate.
(a) Unlimited Input. The average distribution of customers throughout the system is given in the following table.

                                  Channel 1       Channel 2       Total System
Average number of customers
  waiting for service             x₁²/(1 − x₁)    x₂²/(1 − x₂)    x₁²/(1 − x₁) + x₂²/(1 − x₂)
Average number of customers
  being served                    x₁              x₂              x₁ + x₂
Average total number of
  customers                       x₁/(1 − x₁)     x₂/(1 − x₂)     x₁/(1 − x₁) + x₂/(1 − x₂)

The steady-state solution giving the probability that there are n₁ customers waiting at the first gate and n₂ at the second is given by

P(n₁, n₂) = x₁^{n₁}(1 − x₁) x₂^{n₂}(1 − x₂),

where

x₁ = λ/μ₁ < 1  and  x₂ = λ/μ₂ < 1.

The probabilities of having n customers waiting at the first channel and at the second channel are, respectively,

P₁(n) = x₁ⁿ(1 − x₁),  P₂(n) = x₂ⁿ(1 − x₂).
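The product form above is immediate to evaluate. A minimal sketch (rates are illustrative):

# Two channels in series: the joint steady-state probability factors
# into two independent geometric terms.
def tandem_probability(lam, mu1, mu2, n1, n2):
    x1, x2 = lam / mu1, lam / mu2
    assert x1 < 1 and x2 < 1
    P1 = x1**n1 * (1 - x1)            # n1 at the first gate
    P2 = x2**n2 * (1 - x2)            # n2 at the second gate
    return P1 * P2                    # joint probability P(n1, n2)

print(tandem_probability(lam=1.0, mu1=2.0, mu2=4.0, n1=0, n2=0))
# 0.375 = (1 - 1/2)(1 - 1/4): probability both gates are empty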

(b) Limited Input. For a resume of results for limited input, see Ref. 42.
Queuing Theory Formulas. Three Channels in Series
Results for the case of three channels in series can be found in Ref. 54.
Queuing Theory Formulas. Multiple Channels in Parallel, Poisson Input
An excellent resume of results for both a finite and infinite number of
channels can be found in Ref. 42. For the case with a finite number of
channels, this includes: (a) identical exponential holding times, (b) identical constant holding times, (c) (priority discipline) different Poisson inputs,
a finite number of priorities with the same exponential holding time (nonpreemptive), and (d) (limited source) exponential holding time.


The resume for an infinite number of channels covers: (a) exponential
holding time and (b) limited source, exponential holding time.
Sequencing Models

For a detailed discussion of sequencing models, see Ref. 1, Chap. 16.
Only a few results are presented here.
1. Two Stations and n Jobs, No Passing. Consider the case of n jobs to be processed on two machines, A and B, with each job requiring the same sequence of operations and no passing allowed. The order (sequence) in which jobs are processed on machine A must be retained in processing these same jobs on machine B. It is assumed that material can be held between work stations so that, in the meantime, the preceding work station is left clear to start work on another job. It is further assumed (without loss of generality) that all jobs must first go to machine A and then machine B.
Let A_i = time required by job i on machine A,
B_i = time required by job i on machine B,
T = total elapsed time for jobs 1, 2, ···, n,
X_i = idle time on machine B from end of job i − 1 to start of job i.
The sequencing problem is to minimize T, the total elapsed time.
The total elapsed time may be expressed as

T = Σ_{i=1}^{n} B_i + Σ_{i=1}^{n} X_i.

For any given set of items, Σ_{i=1}^{n} B_i is constant; therefore, the problem of minimizing T is equivalent to that of minimizing

D_n(S) = Σ_{i=1}^{n} X_i,

where D_n(S) is a function of the sequence S.
Procedure for Finding the Optimum Sequence. A procedure for finding the optimum sequence for two stations, n jobs, and no passing is due to Johnson (Ref. 55) and can be described by means of the example represented in Table 31.

TABLE 31. MACHINE TIMES (IN HOURS) FOR FIVE JOBS AND TWO MACHINES

i     A_i   B_i
1     3     6
2     7     2
3     4     7
4     5     3
5     7     4


Step 1. Examine the A_i's and B_i's and find the smallest value [min (A_i, B_i)]. In this illustrative case, this value is B₂ = 2.
Step 2. If the value determined falls in the A_i column, schedule this job first on machine A. If the value falls in the B_i column (as it does in this case), schedule the job last on machine A. Hence, job 2 goes last on machine A.
Step 3. Cross off the job just assigned and continue by repeating the procedure given in steps 1 and 2. In case of a tie, choose any job among those tied. In this illustrative case, once job 2 is assigned, the minimum value which remains is 3, and it occurs in A₁ and B₄. There is a choice, so arbitrarily select A₁. Then job 1 goes on machine A first. Now B₄ is the minimum remaining value. Hence, job 4 goes on machine A next to last. The minimum remaining value is 4, and it occurs in A₃ and B₅. Then job 3 can be put on machine A second and job 5 third to the last. The resulting sequence is optimum and is 1, 3, 5, 4, 2.
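A minimal sketch of this procedure applied to the Table 31 data (function name and data representation are mine; ties are broken arbitrarily, as the text permits):

def johnson_two_machines(jobs):
    """jobs: dict {job: (A_time, B_time)} -> an optimal job sequence."""
    front, back, remaining = [], [], dict(jobs)
    while remaining:
        job = min(remaining, key=lambda j: min(remaining[j]))
        a, b = remaining.pop(job)
        if a <= b:
            front.append(job)       # smallest value lies in the A column
        else:
            back.insert(0, job)     # smallest value lies in the B column
    return front + back

jobs = {1: (3, 6), 2: (7, 2), 3: (4, 7), 4: (5, 3), 5: (7, 4)}
print(johnson_two_machines(jobs))   # [1, 3, 5, 4, 2], as in the text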
2. Three Stations and n Jobs, No Passing.
Let Y_i = the idle time on the third machine before it starts work on the ith job,
C_i = working time of the third machine on the ith job.
The total elapsed time for three stations and n jobs (no passing) is given by

T = Σ_{i=1}^{n} C_i + Σ_{i=1}^{n} Y_i.

Since Σ_{i=1}^{n} C_i is fixed, the problem is to minimize Σ_{i=1}^{n} Y_i. Johnson (Ref. 55) has found an optimum solution to this problem for the special case where either (1) min A_i ≥ max B_i or (2) min C_i ≥ max B_i. The first of these conditions is satisfied by means of an exact equality in the illustrative data given in Table 32.
TABLE 32. MACHINE TIMES (IN HOURS) FOR FIVE JOBS AND THREE MACHINES

i     A_i   B_i   C_i
1     8     5     4
2     10    6     9
3     6     2     8
4     7     3     6
5     11    4     5

To obtain an optimal sequence a new table, such as Table 33, is formed.
Then the procedure (described in the preceding) for obtaining an optimum
sequence for two stations is applied to Table 33. In this case, the following


TABLE 33. SUMS OF MACHINE TIMES (IN HOURS) FOR FIVE JOBS FOR FIRST AND INTERMEDIATE MACHINES AND FOR INTERMEDIATE AND LAST MACHINES

i     A_i + B_i   B_i + C_i
1     13          9
2     16          15
3     8           10
4     10          9
5     15          9

sequences arise and are optimum for the originally cited three-station problem:

3, 2, 1, 4, 5        3, 2, 5, 1, 4
3, 2, 1, 5, 4        3, 2, 4, 1, 5
3, 2, 4, 5, 1        3, 2, 5, 4, 1
In situations where the conditions min A_i ≥ max B_i or min C_i ≥ max B_i do not hold, no general procedure is available as yet for obtaining an optimum sequence. It follows that no general solution is yet available for the more general problem of n jobs and m machines, each job following an identical route with no passing allowed. However, the following statement holds: For optimum sequences (the criterion being the total elapsed time), the total idle time of the last machine must be minimized.
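When one of Johnson's conditions holds, the three-machine case reduces to the two-machine procedure applied to the sums A_i + B_i and B_i + C_i, exactly as in Tables 32 and 33. A self-contained sketch:

def johnson(jobs):  # the two-machine rule, as in the earlier sketch
    front, back, remaining = [], [], dict(jobs)
    while remaining:
        j = min(remaining, key=lambda k: min(remaining[k]))
        a, b = remaining.pop(j)
        front.append(j) if a <= b else back.insert(0, j)
    return front + back

jobs3 = {1: (8, 5, 4), 2: (10, 6, 9), 3: (6, 2, 8), 4: (7, 3, 6), 5: (11, 4, 5)}
# Condition (1): min A_i >= max B_i (here 6 >= 6, an exact equality).
assert (min(a for a, b, c in jobs3.values())
        >= max(b for a, b, c in jobs3.values()))
reduced = {j: (a + b, b + c) for j, (a, b, c) in jobs3.items()}
print(johnson(reduced))   # [3, 2, 5, 4, 1], one of the optimum sequences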
3. Identical Routing, Passing Permitted. Although each of n jobs may have to pass through each of m stations according to a specific route, the process characteristics do not always require that the order in which the n jobs pass through each of the stations be identical, i.e., passing is permitted. Bellman (Ref. 56) and Johnson (Ref. 55) have shown that, for two or three station processes, the optimum sequence always involves the same ordering of jobs over each station. This result does not necessarily hold where more than three stations are involved.
4. Different Routing. In many production operations, particularly in
job shops, the various jobs which must be done require different routing
through the work stations or centers.
The problem of determining the optimum sequence for two jobs which
have to be processed on m machines using two different routes has been
treated by Akers and Friedman (Ref. 57) who, by means of Boolean algebra, have developed a technique for eliminating sequences which are technologically unfeasible. Their technique yields a subset of sequences, one
or more of which is optimum.


The Akers-Friedman technique can also be extended to apply to the case
of n jobs and m stations. See Refs. 56 and 57.
6. REPLACEMENT MODELS

Problem Statement. The theory of replacement is concerned with the prediction of replacement costs and the determination of the most economic replacement policy. There are two basic types of replacement problems, concerned with (a) items that deteriorate with age and/or use and (b) items with probabilistic life spans and with efficiencies that do not decline over their life spans.
For type (a), items that deteriorate or degenerate with age and use, the problem is to determine when to replace equipment so as to minimize the sum of costs due to loss of efficiency, on the one hand, and cost of new equipment, on the other hand.
For items whose efficiency declines over their life spans (e.g., machine tools, vehicles), prediction of costs involves determining those factors which contribute to increased operating cost: forced idle time, increased scrap, increased repair, etc.
The alternative to the increased cost of operating aging equipment is the cost of replacing old equipment with new. There is some age at which replacement of old equipment is more economical than continuation at the increased operating cost. At that age, the saving from the use of new equipment more than compensates for its initial cost.
For type (b), items that do not essentially deteriorate with age and use but which have probabilistic life spans (e.g., light bulbs or radio tubes), the problem is to determine when and how to replace the items (i.e., individually or in groups) so as to minimize the sum of the costs of (1) the items, (2) replacing items after failure, and (3) group replacements.
For a group of items with a probabilistic life span, the prediction of costs involves the estimation of the probability distribution of life spans and calculation from these of the predicted number of failures as a function of the age of the group of items. For several schemes for approximating the number of failures, see Refs. 43 and 58-62.
For a complete discussion of both types of replacement problems, see Ref. 1, Part VII.
Replacement of Items That Deteriorate

The measure of efficiency used as a basis for determining optimum replacement decision rules is the discounted value of all future costs associated with
any replacement policy. Discounted cost is the amount required at the
time of the policy decision to build up a fund at compound interest large
enough to pay the pertinent cost when due.


In general, the costs included in the replacement decisions cited here are
all costs that depend upon the choice or age of the machine. See Ref. 1,
Chap. 17, for a discussion of the relevant costs in replacement theory considerations.
Cost Equation. Consider a series of time periods 1, 2, 3, 4, ···, of equal length, and let the costs incurred in these periods be C₁, C₂, C₃, C₄, ···, respectively. It is assumed throughout that, relevant to items that deteriorate, these costs are monotonically increasing with time. Assume that each cost is paid at the beginning of the period in which it is incurred, that the initial cost of new equipment is A, and that the cost of investment is 100r% per period.
The discounted value K_n of all future costs associated with a policy of replacing equipment after each n periods is given by

(90)  K_n = [A + C₁ + C₂/(1 + r) + C₃/(1 + r)² + ··· + C_n/(1 + r)^{n−1}]
            + [(A + C₁)/(1 + r)ⁿ + C₂/(1 + r)^{n+1} + ··· + C_n/(1 + r)^{2n−1}] + ···.

Equation (90) may also be written as

(91)  K_n = [A + Σ_{i=1}^{n} C_i/(1 + r)^{i−1}] / {1 − [1/(1 + r)]ⁿ}

or, if

(92)  X = 1/(1 + r),

then

(93)  K_n = [A + Σ_{i=1}^{n} C_i X^{i−1}] / (1 − Xⁿ).
Now, if the best policy is replacement every n time periods, the two inequalities

(94)  K_{n−1} − K_n > 0  and  K_{n+1} − K_n > 0

must hold. Furthermore, for the case where the C_n are monotonic increasing, these conditions are sufficient as well as necessary ones for K_n to be a minimum.


From eq. (93), K_{n−1} − K_n > 0 is equivalent to (see Ref. 1)

(95)  C_n < (1 − X)K_{n−1},

and K_{n+1} − K_n > 0 is equivalent to

(96)  C_{n+1} > (1 − X)K_n.

These inequalities, (95) and (96), may also be written as:

(97)  C_n < [(A + C₁) + C₂X + ··· + C_{n−1}X^{n−2}] / [1 + X + X² + ··· + X^{n−2}]

and

(98)  C_{n+1} > [(A + C₁) + C₂X + ··· + C_nX^{n−1}] / [1 + X + X² + ··· + X^{n−1}],

where the right-hand terms are the weighted averages of all costs up to and including the (n − 1)st and the nth periods, respectively.
Decision Rules. As a result of these two inequalities, the following
decision rules for minimizing costs may be stated:
1. Do not replace if the next period's cost is less than the weighted average
of previous costs.
2. Replace if the next period's cost is greater than the weighted average
of previous costs.
For further discussion and a geometric interpretation of these decision
rules, and also an illustration of their use, see Ref. 1, Chap. 17.
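A minimal sketch of this computation, evaluating eq. (93) for each candidate n and taking the minimum (A, r, and the rising cost series are illustrative):

def best_replacement_age(A, r, costs):
    """costs[i] = C_{i+1}, assumed monotonically increasing."""
    X = 1.0 / (1.0 + r)                    # discount factor, eq. (92)
    best = None
    for n in range(1, len(costs) + 1):
        num = A + sum(C * X**i for i, C in enumerate(costs[:n]))
        Kn = num / (1.0 - X**n)            # eq. (93)
        if best is None or Kn < best[1]:
            best = (n, Kn)
    return best

n, Kn = best_replacement_age(A=100.0, r=0.10, costs=[10, 12, 16, 24, 40, 70])
print(n, round(Kn, 2))                     # optimum replacement age and cost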
Replacement of Items That Deteriorate by Different Equipment.
Here, one considers the replacement of equipment by new or alternate pieces of equipment other than those currently in use.
Let
K′_n = minimum discounted value of all future costs of new equipment,
D₁, D₂, ···, D_m = costs in each future period incurred with present equipment,
X = 1/(1 + r), the discount factor,
π_m = discounted value of all future costs if present equipment is discarded after m periods.
Cost Equation.

(99)  π_m = D₁ + D₂X + ··· + D_mX^{m−1} + K′_nX^m.

Therefore

(100)  π_{m+1} − π_m = D_{m+1}X^m + K′_n(X^{m+1} − X^m),

and

(101)  π_m − π_{m−1} = D_mX^{m−1} + K′_n(X^m − X^{m−1}).

The condition π_{m−1} − π_m > 0 is equivalent to

(102)  D_m < (1 − X)K′_n,

whereas the condition π_{m+1} − π_m > 0 is equivalent to

(103)  D_{m+1} > (1 − X)K′_n.

Conditions (102) and (103) show that the minimum cost is achieved by continuing the use of the old equipment until the cost for the next period is greater than (1 − X)K′_n, where (1 − X)K′_n is the weighted average of the costs of using the equipment for n periods between replacements.
Replacement of Items That Fail

The second class of replacement problems is concerned with items that
do not deteriorate markedly with service but which ultimately fail after a
period of use. The period between installation and failure is not constant
for any particular type of equipment but will follow some frequency distribution. This section is concerned only with items that fail with increasing
probability as they age. Furthermore, it is assumed hereafter that all failures will be replaced. The problem, therefore, is to plan the replacement
of items that have not failed.
Replacing a used but still functioning item with a new item is justified
only if the cost of replacement is higher after failure than before, and if
installing the new item reduces the probability of failure.
The replacement policy will depend upon the probability of failure. It
is therefore of considerable importance to estimate the probability distribution of failures. Statistical techniques used in such "life testing" are
being developed rapidly and a growing literature on the subject is becoming available. See Refs. 63-65. The costs of replacement before and after
failure are the other important factors.
In this section, the cost of the alternatives of replacement or retention is
considered and two policies are developed that minimize expected costs as
a function of the cost of replacement, cost of failure, and probability of
failure.
Mortality Curves. The initial information on the life characteristics of a light bulb, for example, may be shown in the form of a mortality curve. A group of N light bulbs is installed, and at the end of t equal time intervals the number of bulbs surviving equals some function of t, say S(t). The proportion of the initial bulbs remaining is, then,

(104)  s(t) = S(t)/N.


A typical mortality table is shown in Table 34 giving, at regular intervals
of time, the number of survivors out of an original group of 100,000 bulbs.
Specifically, the mortality curve would result from column 2 in Table 34,
namely that given by S(t), and is given in Fig. 10.
TABLE 34. LIFE CHARACTERISTICS OF A LIGHT BULB: ORIGINAL POPULATION OF 100,000 UNITS

(1)        (2)          (3)               (4)           (5)
Time                    Reduction in      Probability   Conditional
Units      Survivors    Survivors         of Failure    Probability of Failure
Elapsed, t S(t)         S(t − 1) − S(t)   p(t)          v_{t,0}
 0         100,000
 1         100,000      0                 0             0
 2          99,000      1,000             0.01          0.0100
 3          98,000      1,000             0.01          0.0101
 4          97,000      1,000             0.01          0.0102
 5          96,000      1,000             0.01          0.0103
 6          93,000      3,000             0.03          0.0312
 7          87,000      6,000             0.06          0.0645
 8          77,000      10,000            0.10          0.1149
 9          63,000      14,000            0.14          0.1818
10          48,000      15,000            0.15          0.2381
11          32,000      16,000            0.16          0.3333
12          18,000      14,000            0.14          0.4375
13          10,000      8,000             0.08          0.4444
14           6,000      4,000             0.04          0.4000
15           3,000      3,000             0.03          0.5000
16           2,000      1,000             0.01          0.3333
17           1,000      1,000             0.01          0.5000
18               0      1,000             0.01          1.0000

Column (1), number of elapsed periods.
Column (2), survivors at end of period, based on figures supplied by a major light bulb manufacturer.
Column (3), rate of change of column (2).
Column (4), column (3) divided by 100,000.
Column (5), column (3) divided by value in column (2) for previous period.

Life Span. Perhaps a more familiar presentation of the life characteristics of a group of items is in the form of a probability distribution of life spans. Such a probability distribution may be derived from the mortality table by taking

(105)  p(t) = [S(t − 1) − S(t)]/N,

the proportion of units failing in time period t. (See Table 34.) This probability function, p(t), is plotted against t in Fig. 11.

FIG. 10. Number of survivors after t periods of time. (Data from Table 34.)

FIG. 11. Probability of failure in tth period of bulb installed at beginning of first period. (Data from Table 34.)


Conditional Probability of Failure. Another descriptive notion of life characteristics is the conditional probability of failure or its complement, the probability that an item at time t will survive to time t + 1. This probability is given by

(106)  v_{t,0} = [S(t − 1) − S(t)]/S(t − 1) = 1 − S(t)/S(t − 1)

and is the proportion of surviving units failing in the subsequent period. (See Table 34.) This conditional probability function is plotted against t in Fig. 12.
FIG. 12. Conditional probability of failure in tth period. (Data from Table 34.)

Replacement Process. It is assumed here that failures occur only at the end of a unit period of time. During the first t − 1 time intervals, all failures occurring during any given time interval are replaced at the beginning of the next time interval. At the end of the tth time interval, all units are replaced regardless of their ages. The problem is to determine that value of t which will minimize total cost.
Rate of Replacement. The general expression for the number of units failing in time interval t is

(107)  f(t) = N{p(t) + Σ_x p(x)p(t − x) + Σ_b [Σ_x p(x)p(b − x)] p(t − b) + ···},

where N = total units in the installation,
p(x) = probability of failure at age x.


Table 35 illustrates the use of eq. (107) to determine the total number of
failures in each time period t, based upon the data of Table 34.
TABLE 35. TOTAL FAILURES (REPLACEMENTS) IN EACH PERIOD t ᵃ

(1)      (2)        (3)          |  (1)      (2)        (3)
         Replacements            |           Replacements
Period   Current    Cumulative   |  Period   Current    Cumulative
         f(t)       Σf(t)        |           f(t)       Σf(t)
 1       0          0            |  21       12,047     162,167
 2       1,000      1,000        |  22       11,706     173,873
 3       1,000      2,000        |  23       10,820     184,693
 4       1,010      3,010        |  24       9,697      194,390
 5       1,020      4,030        |  25       8,700      203,090
 6       3,030      7,060        |  26       8,288      211,378
 7       6,040      13,100       |  27       8,413      219,791
 8       10,090     23,190       |  28       8,862      228,653
 9       14,201     37,391       |  29       9,523      238,176
10       15,392     52,783       |  30       10,100     248,276
11       16,665     69,448       |  31       10,413     258,689
12       15,000     84,448       |  32       10,507     269,196
13       9,480      93,928       |  33       10,348     279,544
14       6,175      100,103      |  34       9,999      289,543
15       6,160      106,263      |  35       9,636      299,179
16       5,521      111,784      |  36       9,079      308,258
17       7,309      119,093      |  37       9,220      317,478
18       9,317      128,410      |  38       9,271      326,749
19       10,181     138,591      |  39       9,447      336,196
20       11,529     150,120      |  40       9,669      345,865

Column (1), periods since original installation.
Column (2), calculated as described in text.
Column (3), cumulative sum of values in column (2).
ᵃ Data based on Table 34.

A second method for determining the number of failures in each period t, based upon the conditional probability of failure and using vector algebra, is given in Ref. 1, Chap. 17.
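Equation (107) is equivalent to the discrete renewal recursion f(t) = Np(t) + Σ_{x=1}^{t−1} f(x)p(t − x): each period's expected failures come from the original installation plus earlier replacements. A sketch using the Table 34 life distribution (column 4):

p = {2: 0.01, 3: 0.01, 4: 0.01, 5: 0.01, 6: 0.03, 7: 0.06, 8: 0.10,
     9: 0.14, 10: 0.15, 11: 0.16, 12: 0.14, 13: 0.08, 14: 0.04,
     15: 0.03, 16: 0.01, 17: 0.01, 18: 0.01}

def expected_failures(N, periods):
    f = {0: 0.0}
    for t in range(1, periods + 1):
        f[t] = N * p.get(t, 0.0) + sum(f[x] * p.get(t - x, 0.0)
                                       for x in range(1, t))
    return f

f = expected_failures(N=100_000, periods=10)
print([round(f[t]) for t in range(1, 11)])
# approx. 0, 1000, 1000, 1010, 1020, 3030, 6040, 10090, 14201, 15392,
# matching the first rows of Table 35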
Cost of ReplaceInent. A second fundamental requirement of a useful
replacement policy is that the cost of replacement after failure be greater
than the cost of replacement before failure. This difference in cost is the
source of savings required to compensate for the expense of reducing the
probability of failure by replacing surviving units. Group replacing of units
can cost less than replacement of failures by virtue of labor savings, volume
discounts on materials, or for other reasons.


Cost Equation. Let K(t) = total cost from time of group installation until the end of t periods. Then, if the entire group is replaced at intervals of length t periods,

K(t)/t = average cost per period of time.

Let C₁ = unit cost of replacement in a group,
C₂ = unit cost of individual replacement after failure,
f(X) = number of failures in the Xth period,
N = number of units in the group.
Then the total cost, K(t), will be given by

(108)  K(t) = NC₁ + C₂ Σ_{X=1}^{t−1} f(X).
Therefore, the cost per period is given by

(109)  K(t)/t = NC₁/t + (C₂/t) Σ_{X=1}^{t−1} f(X).

Minimization of Costs. Costs are minimized for a policy of group replacing after t periods if

(110)  K(t)/t < K(t − 1)/(t − 1)

and if

(111)  K(t)/t < K(t + 1)/(t + 1).

By using eq. (109), conditions (110) and (111) may be rewritten respectively as

(112)  C₂f(t − 1) < [NC₁ + C₂ Σ_{X=1}^{t−2} f(X)]/(t − 1)

and

(113)  C₂f(t) > [NC₁ + C₂ Σ_{X=1}^{t−1} f(X)]/t.

Conditions (110) and (111) and, in turn, conditions (112) and (113), are necessary conditions for optimum group replacement. They are not sufficient, as illustrated by the function F(t) = t sin t, 0 ≤ t ≤ 4π, which satisfies these conditions for not one but two values of t, although the function has but one true (as opposed to relative) minimum point.

TABLE 36. AVERAGE COSTS FOR ALTERNATIVE GROUP REPLACEMENT POLICIES: C₁/C₂ = 0.25
(Data from Table 35)

(1)   (2)        (3)            (4)                 (5)                (6)          (7)
      f(t)       Σ_{X=1}^{t}    (t−1)f(t−1) −       tf(t) −            Total Cost   Average Cost
t     (Current)  f(X) (Cum.)    Σ_{X=1}^{t−2} f(X)  Σ_{X=1}^{t−1} f(X) K(t)         per Period, K(t)/t
 1    0          0              0                   0                  25,000C₂     25,000C₂
 2    1,000      1,000          0                   2,000              25,000C₂     12,500C₂
 3    1,000      2,000          2,000               2,000              26,000C₂     8,667C₂
 4    1,010      3,010          2,000               2,040              27,000C₂     6,750C₂
 5    1,020      4,030          2,040               2,090              28,010C₂     5,602C₂
 6    3,030      7,060          2,090               14,150             29,030C₂     4,838C₂
 7    6,040      13,100         14,150              35,220             32,060C₂     4,580C₂ ᵃ
 8    10,090     23,190         35,220              67,620             38,100C₂     4,762C₂
 9    14,201     37,391         67,620              104,619            48,190C₂     5,354C₂
10    15,392     52,783         104,619             116,529            62,391C₂     6,239C₂

Column (1), number of periods between group replacements.
Column (2), number of replacements from Table 35.
Columns (3), (4), (5), calculated as indicated in column headings.
Column (6), K(t) = NC₁ + C₂ Σ_{X=1}^{t−1} f(X), calculated with C₁ = 0.25C₂.
Column (7), column (6) divided by column (1).
ᵃ Minimum; therefore t = 7.


Conditions (112) and (113) may be interpreted as follows:
1. One should not group replace at the end of the tth period if the cost of
individual replacements at the end of the tth period is less than the average
cost per period through the end of t periods.
2. One should group replace at the end of the tth period if the cost of
individual replacements for the tth period is greater than the average cost
per period through the end of t periods.
The use of these decision rules for the light bulb example (see Table 34)
is illustrated in Table 36. For a full discussion of this replacement model
and the solution of the light bulb example, see Ref. 1.
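A sketch of this search, computing K(t)/t from eq. (108) over the Table 35 failure stream (variable names are mine; C₁/C₂ = 0.25 as in Table 36):

def best_group_interval(N, C1, C2, f):
    """f: list of expected failures, f[0] = f(1); returns (t*, K(t*)/t*)."""
    best, cum = None, 0.0
    for t in range(1, len(f) + 1):
        K = N * C1 + C2 * cum          # eq. (108): failures through t - 1
        if best is None or K / t < best[1]:
            best = (t, K / t)
        cum += f[t - 1]                # now cumulative through t
    return best

f = [0, 1000, 1000, 1010, 1020, 3030, 6040, 10090, 14201, 15392]
print(best_group_interval(N=100_000, C1=0.25, C2=1.0, f=f))
# (7, 4580.0): group replace every 7 periods, as in Table 36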
Solution of Replacement Problems by Monte Carlo Technique

In the determination of optimum group replacement policies for items that fail, one needs to determine that value of t such that

(114)  K_t = (T/t)[NC₁ + C₂φ(t)]

will be a minimum, where
N = number of units in the group,
C₁ = unit cost of replacement in a group,
C₂ = unit cost of individual replacement after failure,
φ(t) = number of failures in time t,
and T is the total time period under consideration.
Failure Equations. If f(t) is the probability density function of failure, then the expected number of first generation failures in time t is given by

(115)  φ₁(t) = N ∫₀ᵗ f(t) dt = NF(t),

where

(116)  F(t) = ∫₀ᵗ f(t) dt.

Similarly, the number of second generation failures is given by

(117)  φ₂(t) = N ∫₀ᵗ F′(α)F(t − α) dα.

The number of third generation failures is given by

(118)  φ₃(t) = N ∫_{α=0}^{t} ∫_{β=α}^{t} F′(α)F′(β − α)F(t − β) dβ dα.

That is,

(119)  φ(t) = NF(t) + N ∫₀ᵗ F′(α)F(t − α) dα + N ∫_{α=0}^{t} ∫_{β=α}^{t} F′(α)F′(β − α)F(t − β) dβ dα + ···.

Unless simplifying assumptions are made relative to second and higher generation failures, it is almost impossible to obtain an analytic solution for φ(t) as given by eq. (119). However, by the use of the Monte Carlo technique one can solve for values of φ(t) without making any simplifying assumptions. That is, one can determine φ(t) for many values of t and then construct K(t) as a function of t in order to determine the optimum group replacement policy.
EXAMPLE. The Use of the Monte Carlo Technique in Solving Replacement Problems. Assume that one wishes to determine the optimum group replacement policy for a group of light bulbs whose life pattern follows a normal distribution, the mean and standard deviation of which are 30 and 10 days, respectively. (That is, μ = 30 days and σ = 10 days.) Furthermore, assume that

C₁ = $0.50,  C₂ = $1.00,  N = 10,  T = 360 days,

where T is the total time period under consideration.
For purposes of illustration only, further assume that if group replacement is used, it can be done only at the end of 10, 20, 30, or 40 days.
A chart can then be set up and, by use of a table of random normal numbers, the total expected number of failures can be determined for each value of t (t = 10, 20, 30, or 40 days). Tables of random normal numbers are based on a mean of 0 and a standard deviation of 1. Hence, any number selected from the table of random normal numbers must first be multiplied by 10 and then added to (if positive) or subtracted from (if negative)


30. Thus, if the first random normal number selected from the table is 0.464, the adjusted random normal number will be

(0.464)(10) + 30 = 34.64.

That is, in the simulation of the light bulb system, the first bulb will last 34.64 (or 35) days before failing.
The next number from the table of random normal numbers is, say, 0.137, which is adjusted to (0.137)(10) + 30 = 31.37. Therefore, the replacement to the first bulb will last 31 days, that is, until 35 + 31 = 66 days after the start of the analysis. Therefore one can expect the first bulb to burn out after 35 days and its replacement to last through the balance of the 40-day period under discussion.
This procedure is carried out for all ten lighting fixtures in the installation, and the expected number of failures for each of the intervals 10, 20, 30, and 40 days is determined as in Table 37.
TABLE 37. FAILURE TABLE FOR N = 10

Fixture:  1       2        3    4           5       6       7       8    9       10      Total Failures
t = 10    35      31       55   27          29      33      27      43   32      20      0
t = 20    35      31       55   27          29      33      27      43   32      20      0
t = 30    35      31       55   27, 38      29, 64  33      27, 59  43   32      20, 55  4
t = 40    35, 66  31, 62   55   27, 38, 75  29, 64  33, 47  27, 59  43   32, 36  20, 55  10

Entries are the simulated days on which each fixture's successive bulbs fail; the last column counts only those failures occurring before the group replacement at day t.

The entire procedure was repeated nine more times, and the ten samples (each of sample size N = 10) gave the results shown in Table 38.

TABLE 38. SUMMARY TABLE FOR TEN SAMPLES WITH N = 10

t     Total Number of Failures     Average Number of Failures, φ(t)
10    1                            0.1
20    15                           1.5
30    51                           5.1
40    96                           9.6

From Table 38 and eq. (114), one can determine and compare the cost of group replacement for the 10-, 20-, 30-, and 40-day periods. These total expected costs over time period T (360 days) are:

K₁₀ = (360/10)[(10)(0.50) + (0.1)(1.00)] = $183.60,
K₂₀ = (360/20)[(10)(0.50) + (1.5)(1.00)] = $117.00,
K₃₀ = (360/30)[(10)(0.50) + (5.1)(1.00)] = $121.20,
K₄₀ = (360/40)[(10)(0.50) + (9.6)(1.00)] = $131.40.

Thus, if one is to group replace at 10-, 20-, 30-, or 40-day intervals, one would do so every 20 days. (This assumes, of course, that, in practice, one has taken a sufficiently large sample of random normal numbers.)
If one did not group replace, one could expect to replace each bulb, on the average, every 30 days. Accordingly, the total expected cost over time period T of not group replacing, call it K_∞, is

K_∞ = 10(360/30)(1.00) = $120.

Therefore, under the assumptions of this illustration, one should group replace every 20 days (since K_∞ > K₂₀).
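A sketch of this experiment (seed and sample counts are illustrative; the guard against non-positive lifetimes is my assumption, since a normal deviate can fall below zero):

import random

# Simulate normal bulb lives (mu = 30, sigma = 10), count failures
# before each candidate group-replacement day, and evaluate
# K_t = (T/t) * (N*C1 + C2*phi(t)), eq. (114).
def mc_failures(t, n_fixtures=10, n_samples=10, rng=random.Random(1)):
    total = 0
    for _ in range(n_samples * n_fixtures):
        day = 0.0
        while True:
            day += max(1.0, rng.normalvariate(30, 10))  # next bulb's life
            if day >= t:
                break
            total += 1                                   # failure before day t
    return total / n_samples                             # phi(t), averaged

N, C1, C2, T = 10, 0.50, 1.00, 360
for t in (10, 20, 30, 40):
    phi = mc_failures(t)
    print(t, round((T / t) * (N * C1 + C2 * phi), 2))    # K_t; minimum near t = 20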
Other Models

Although the solutions presented apply only to the particular model
described earlier, models of other characteristics may be approached in the
same way. For example, a model could be concerned with group replacement in which new bulbs are used for group replacement only, and used
bulbs replace failures in between group replacements. A different model
is needed when surviving bulbs are replaced at a fixed age, rather than at
fixed intervals of time. The considerations of this chapter have been limited to demonstrating an approach to two basic replacement problems, one
involving deterioration, and the other involving probabilistic life spans of
equipment.
For a discussion of the models which have been developed and solutions
obtained for various sets of assumptions about the conditions of the problem, see Ref. 1, Chap. 17. For a useful review of equipment replacement
rules from an industrial point of view, see Ref. 66.
7. COMPETITIVE PROBLEMS

Introduction

A competitive problem is one in which the efficiency of one's decision is
affected by the decisions of one's competitors. Such problems include, for
example, competitive advertising for a relatively fixed market or bidding
for a given set of contracts.
Game Problems. The most publicized competitive problem in O.R. is the "game" as developed by the late John von Neumann and discussed in his Zur Theorie der Gesellschaftsspiele (Ref. 67) in 1928 and, jointly with


Oskar Morgenstern, in their Theory of Games and Economic Behavior (Ref.
68) in 1944.
For many decades, economists tended to take as their standard model for their science the situation of Robinson Crusoe, marooned on an uninhabited island and concerned with behaving in such a manner as to maximize the goods he could obtain from nature. It was generally felt that it would be possible to gain insight into the behavior of groups of individuals by starting with a detailed analysis of the behavior in this simplest possible case: the case of a single individual all alone and struggling against nature. This line of attack on economic problems, however, suffers from the defect that in going from a one-man society to even a two-man society, qualitatively different situations arise which could not have been foreseen from the one-man case. Von Neumann was led to believe that group economics could more profitably be viewed as analogous to parlor games of strategy.
Von Neumann's game is characterized by a fixed set of rules and a known number of competitors whose possible choices are also known. Furthermore, the payoff for each combination of choices is also assumed to be known. The solution to von Neumann's game is obtained by a principle of conservatism called the minimax principle, namely one which will maximize the minimum expected gain or minimize the maximum expected loss.
Very little has been accomplished by way of applying the von Neumann
theory of games. Military applications have been referred to but have
not been made public. Several authors have explored the possibility of
applying game theory to industrial problems, but they have not dealt with
actual applications. What then is the significance and value of game
theory? This can best be answered by quoting Williams' (Ref. 69, p. 217)
succinct appraisal:
While there are specific applications today, despite the current limitations of the
theory, perhaps its greatest contribution so far has been an intangible one; the
general orientation given to people who are faced with over-complex problems.
Even though these problems are not strictly solvable-certainly at the moment
and probably for the indefinite future-it helps to have a framework in which to
work on them. The concept of a strategy, the distinction among players, the role
of chance events, the notion of matrix representations of the payoffs, the concepts
of pure and mixed strategies, and so on, give valuable orientation to persons who
must think about complicated conflict situations. *

Bidding Problems. A second type of competitive problem is one in which bidding takes place. Bidding problems differ from game problems in that: (a) the number of competitors is not usually known, (b) the number of choices is not known (since one can bid over a large range), and (c) the payoffs are not usually known but, rather, are subject to estimation

* Reprinted by permission from The Compleat Strategyst by J. D. Williams, copyright 1954. McGraw-Hill Book Co.


(e.g., in bidding for mineral rights). Furthermore, in some bidding situations (e.g., those in which one bids a dollar amount plus a percentage of
the royalties), one may not be able to determine readily whether or not a
given bid would have won or lost.
Only a limited theory of bidding exists to date, although the concepts
and techniques of statistical decision theory hold great promise in this area.
A major research contribution has been made by Friedman (Ref. 70 and
Ref. 1, Chap. 19). The number of applications of bidding theory has been
very limited; however, in at least one instance, the results obtained have
been spectacularly successful. Bidding models will not be discussed here.
See Refs. 1 and 70.
The Theory of Games
Definitions.
Game, a set of rules and conventions for playing.
Play, a particular possible realization of the rules.
Move, a point in a game at which one of the players selects an alternative from some set of alternatives.
Choice, that particular alternative selected.
Strategy, a player's predetermined method for making his choices during the play.
Classification of Games.
1. Players, the number of sets of opposing interests: (a) one-person, (b) two-person, (c) n-person (n > 2).
2. Payment: (a) zero-sum game, a game in which the sum of the payoffs, counting winnings as positive and losses as negative, to all players is zero; (b) nonzero-sum game, a game in which the sum of the payoffs to all players is not zero.
3. Number of moves: (a) finite, (b) infinite.
4. Number of choices: (a) finite, (b) infinite.
5. Amount of information regarding opponent's choices: (a) all, (b) part, (c) none.
One-Person Games. One-person zero-sum games are trivial games which say "do nothing," since there is no gain to be made by the one participant in the game. One-person nonzero-sum games are the ordinary maximization and minimization problems solvable by calculus and other optimization techniques. Thus, in order to study the characteristic properties of games of strategy, it is necessary to go to games which involve more than one player. The discussion here will center mainly on two-person zero-sum games.
Two-Person, Zero-Sum Games. Analysis of the very simplest of games shows that there are two general kinds, which may be illustrated by two kinds of coin matching.


Single Strategy Games. Assume that one is matching dimes and quarters where, if both coins are the same, it is a standoff, but if the coins differ, the quarter takes the dime. In this game it is safest always to play a quarter, for then one can never lose, whereas one may lose by playing a dime. Such games in which each opponent will find it safest to stick to one strategy are called single strategy games.
Mixed Strategy Games. The second general type of game may be illustrated by the usual penny-matching situation in which each player chooses either heads or tails. If the coins match, the matcher wins; if they do not match, the matchee wins. In this case, if either player sticks to one strategy, he may consistently lose. The only safe way to play the game is to play heads or tails in a completely random manner, as, for instance, by flipping the penny in the air just before one plays it. Such games are called games of mixed strategy.
Payoff Matrix. Games can have any number of strategies. In principle,
once each player has chosen one of the sets of strategies available to him,
it is possible to calculate the probable outcome of the game. The net payoffs can then be arranged in a two-dimensional matrix, the payoff matrix.
From the payoff matrix, one can then find whether the game is a single or
a mixed strategy game and, if mixed, in what proportions to mix the
playing.
For further discussion of the definitions, classification of games, and
examples of the construction of payoff matrices, see Ref. 71, Chap. 1.
Single Strategy, Two-Person, Zero-Sum Games
Minimax Principle. Consider the game whose payoff matrix with respect to player P₁ is

A = (a_ij),  i = 1, 2, ···, m;  j = 1, 2, ···, n.

If player P₁ chooses the number i (i.e., adopts the ith strategy, i = 1, 2, ···, m), he is certain to receive at least

min_j a_ij,  j = 1, 2, ···, n.

Since he can choose i as he pleases, he can, in particular, choose i so as to make min_j a_ij as large as possible. Thus player P₁ can choose i so as to receive at least

max_i min_j a_ij.


Similarly, player P₂ can choose j so as to make certain that he will receive at least

max_j min_i (−a_ij),

since for two-person, zero-sum games the payoff matrix with respect to player P₂ will consist of elements (payments) which are the negative of the elements of matrix A. That is, player P₂ can choose j so as to make certain that he will receive at least

−min_j max_i a_ij

or, equivalently, that player P₁ will get at most

min_j max_i a_ij.

Saddle Point. In summary, P₁ can guarantee that he will receive at least

max_i min_j a_ij,

and P₂ can prevent P₁ from receiving more than

min_j max_i a_ij.

If

max_i min_j a_ij = min_j max_i a_ij = a_{i₀j₀} = v,

P₁ will settle for v and P₂ will settle for −v. Games for which the equation above holds are called games with a saddle point. More specifically, (i₀, j₀) is called a saddle point and a_{i₀j₀} is called the value of the game for player P₁. Furthermore, the best strategy for player P₁ is i₀, and the best strategy for player P₂ is j₀. (See Ref. 71, Chap. 1.)
It should be noted that a saddle point of a matrix is a pair of integers (i₀, j₀) such that a_{i₀j₀} is at the same time the minimum element of its row and the maximum element of its column.
Every single strategy two-person, zero-sum game has a saddle point. This saddle point provides the solution of the game by designating the best strategies for each player and the value of the game. EXAMPLE. The game represented by

4  3  7  6
5  2  1  0
0  1  3  4
2  2  1  5

has a saddle point at (1, 2). Its solution consists of the strategies 1 and 2, respectively, and the value of the game is 3.
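A saddle point can be found mechanically, since a_{i₀j₀} must be simultaneously its row's minimum and its column's maximum. A sketch over the example matrix (the code uses zero-based indices):

def saddle_points(A):
    return [(i, j)
            for i, row in enumerate(A)
            for j, a in enumerate(row)
            if a == min(row) and a == max(r[j] for r in A)]

A = [[4, 3, 7, 6],
     [5, 2, 1, 0],
     [0, 1, 3, 4],
     [2, 2, 1, 5]]
print(saddle_points(A))   # [(0, 1)] -> row 1, column 2; value A[0][1] = 3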


Stated in another manner, every game which contains a saddle point is a single strategy game (see Ref. 71). Games without saddle points, such as the game represented by

 1   −1
−1    1

are mixed strategy games.
Mixed Strategy, Two-Person, Zero-Sum Games

Consider the (penny-matching) game whose payoff matrix is

                  P₂
              Heads   Tails
P₁   Heads      1      −1
     Tails     −1       1

Such a matrix has no saddle point and, hence, is not a single strategy game. Furthermore, one can readily see that it makes little difference to player P₁ whether he chooses strategy 1 (heads) or strategy 2 (tails), for, in either case, he will receive 1 or −1 according as P₂ makes the same or opposite choice. Player P₁ must play the game by making his selections by means of some chance device. The procedures for determining optimum mixed strategies are discussed below.
Dominance. If a = (a₁, a₂, ···, a_n) and b = (b₁, b₂, ···, b_n) are vectors (or rows or columns of a matrix), and if a_i ≥ b_i (for i = 1, 2, ···, n), one says that a dominates b. If a_i > b_i (for i = 1, 2, ···, n), one says that a strictly dominates b.
Convex Linear Combination. Let

x⁽¹⁾ = (x₁⁽¹⁾, ···, x_n⁽¹⁾),
x⁽²⁾ = (x₁⁽²⁾, ···, x_n⁽²⁾),
···
x⁽ʳ⁾ = (x₁⁽ʳ⁾, ···, x_n⁽ʳ⁾),
x = (x₁, ···, x_n).

Let

α = (α₁, ···, α_r)

be such that α_i ≥ 0 (i = 1, 2, ···, r) and α₁ + α₂ + ··· + α_r = 1. Then x is a convex linear combination of x⁽¹⁾, ···, x⁽ʳ⁾ with weights α₁, ···, α_r if

x_j = α₁x_j⁽¹⁾ + α₂x_j⁽²⁾ + ··· + α_rx_j⁽ʳ⁾,  for j = 1, 2, ···, n.


Thus, the point (0, 15) is a convex linear combination (with weights 1/6, 1/3, and 1/2) of the points (6, 12), (−9, 15), and (4, 16).
THEOREM. Let Γ be a rectangular game whose matrix is A; suppose that, for some i, the ith row of A is dominated by some convex linear combination of the other rows of A; let A′ be the matrix obtained from A by omitting the ith row; and let Γ′ be the rectangular game whose matrix is A′. Then the value of Γ′ is the same as the value of Γ; every optimum strategy for P₂ in Γ′ is also an optimum strategy for P₂ in Γ; and if w is any optimum strategy for P₁ in Γ′ and x is the ith-place extension of w, then x is an optimum strategy for P₁ in Γ. Moreover, if the ith row of A is strictly dominated by the convex linear combination of the other rows of A, then every solution of Γ can be obtained in this way from a solution of Γ′. (See Ref. 71, Chap. 2.)
Note. A similar theorem applies to dominating columns. (See Ref. 71, Chap. 2.)
EXAMPLE. The following example of the application of this theorem is cited in Ref. 71 (p. 50):

3  2  4  0
3  4  2  4
4  2  4  0
0  4  0  8

Row 1 is dominated by row 3, yielding

3  4  2  4
4  2  4  0
0  4  0  8

Column 1 dominates column 3, resulting in

4  2  4
2  4  0
4  0  8

Column 1 dominates a convex linear combination of columns 2 and 3, namely:

4 > ½(2) + ½(4),  2 = ½(4) + ½(0),  4 = ½(0) + ½(8).


Thus, the first column can be omitted, yielding

2  4
4  0
0  8

Row 1 is now dominated by a convex linear combination of rows 2 and 3, since

2 = ½(4) + ½(0),  4 = ½(0) + ½(8).

Therefore, the matrix reduces to

4  0
0  8

As will be seen later, the solution to this latter matrix consists of the mixed strategy (2/3, 1/3) for each player and a game value of 8/3. Therefore, the value of the original game is 8/3, and the optimum strategy for the original game is (0, 0, 2/3, 1/3) for each player.
General Theorems for Rectangular Games (Refs. 1 and 71)
THEOREM 1. Every rectangular game has a specific value g. This value is unique. Furthermore, there exists for player P₁ a best strategy, i.e., there exist non-negative frequencies x₁, x₂, ···, x_m such that x₁ + x₂ + ··· + x_m = 1 and such that if he plays plan I with frequency x₁, plan II with frequency x₂, ···, plan M with frequency x_m, then he can assure himself at least an expected gain of g, which is the value of the game.
Similarly, for player P₂, there exists a best strategy y = (y₁, y₂, ···, y_n), y₁ + y₂ + ··· + y_n = 1, such that if P₂ played plans I, II, ···, N with the above frequencies, respectively, he (P₂) can assure himself at most a loss of g.
THEOREM 2. The unknowns x₁, x₂, ···, x_m, y₁, y₂, ···, y_n, and g (for the solution of a game) can be determined from the following relations:

x₁ + x₂ + ··· + x_m = Σ_{i=1}^{m} x_i = 1,  x_i ≥ 0  (i = 1, 2, ···, m);

y₁ + y₂ + ··· + y_n = Σ_{j=1}^{n} y_j = 1,  y_j ≥ 0  (j = 1, 2, ···, n);

Σ_{i=1}^{m} x_i a_ij ≥ g  (j = 1, 2, ···, n);

Σ_{j=1}^{n} a_ij y_j ≤ g  (i = 1, 2, ···, m).


THEOREM 3. Let x* = (x₁*, x₂*, ···, x_m*) and y* = (y₁*, y₂*, ···, y_n*) be any optimal strategies for P₁ and P₂, respectively, for a game whose value is g. If, for any i,

Σ_{j=1}^{n} a_ij y_j* < g,

then x_i* = 0. Similarly, if for any j,

Σ_{i=1}^{m} x_i* a_ij > g,

then y_j* = 0.
Solutions of Rectangular Games

Two-by-Two Games. To solve two-by-two rectangular games, first look for a saddle point. If one exists, the game is a single strategy game and the solution is immediately given as discussed above. If no saddle point exists, the game is a mixed strategy game and is solvable by either of the following methods.
Algebraic Solution. Given:

A = | a  b |
    | c  d |

Let x and 1 − x be the frequencies with which P₁ plays plans I and II, respectively. Then, if player P₂ plays plan I, P₁ can expect

a(x) + c(1 − x) = c + (a − c)x.

On the other hand, if player P₂ plays plan II, P₁ can expect

b(x) + d(1 − x) = d + (b − d)x.

The solution of any two-by-two game is given by the minimax principle, namely, by solving

c + (a − c)x = d + (b − d)x.

EXAMPLE. Given:

A = | −3  7 |
    |  6  1 |


Then

−3(x) + 6(1 − x) = 7(x) + 1(1 − x)

yields

x = 1/3,  (1 − x) = 2/3.

Similarly, one determines that

y = 2/5  and  (1 − y) = 3/5.

Finally, the value of the game is given by (since x = 1/3)

g = −3(1/3) + 6(2/3) = 3.

For this and other algebraic procedures, see Ref. 1, Chap. 18.
Method of Oddments (Two-by-Two Game). The method of oddments for two-by-two games is given by Williams (Ref. 69).
EXAMPLE. The method may be stated by means of the game whose payoff matrix is

Plan      I    II
I        −3     7
II        6     1

To determine the optimum frequencies for P₁, subtract the numbers in the second column from those in the first column. This gives:

I    −10
II     5

One of the two numbers will always be negative. Ignore the minus sign for the purpose of computing oddments.


Then, the oddment for P₁(I) is given by the number opposite plan II, namely 5, whereas the oddment for P₁(II) is given by the number opposite plan I, namely 10. Therefore, the oddments for P₁ are 5 and 10, respectively, or, equivalently, the optimum frequencies are

5/(5 + 10) = 1/3  and  10/(5 + 10) = 2/3.

Similarly, by subtracting rows, one can determine that the optimum frequencies for player P₂ are 2/5 and 3/5.
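A sketch of this oddments rule for a 2 × 2 game without a saddle point (exact fractions; the value is computed against P₂'s plan I):

from fractions import Fraction

def oddments_2x2(A):
    (a, b), (c, d) = A
    p1 = [abs(c - d), abs(a - b)]       # P1 oddments: row II diff, row I diff
    p2 = [abs(b - d), abs(a - c)]       # P2 oddments: col II diff, col I diff
    x = [Fraction(o, sum(p1)) for o in p1]
    y = [Fraction(o, sum(p2)) for o in p2]
    g = x[0] * a + x[1] * c             # value: expected payoff vs plan I
    return x, y, g

print(oddments_2x2([[-3, 7], [6, 1]]))
# ([1/3, 2/3], [2/5, 3/5], 3), matching the example above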
Two-by-n Games. To find the solution of a two-by-n game:
1. Look for a saddle point. If one exists, the game is a single strategy game and the solution is given by the saddle point.
2. If no saddle point exists, examine the payoff matrix for dominance and eliminate all dominated strategies (if any) for P₁ and all dominant strategies (if any) for P₂.
3. The matrix which remains will then contain a two-by-two submatrix with the property that its solution is also a solution to the two-by-n game. The pertinent two-by-two submatrix can be found in one of several ways, probably the easiest of which is the graphical method.
Graphical Solution of Two-by-n Games.
Given the game whose payoff matrix is:

                    (P₂)
        1    2    3    4    5    6    7
 1     −6    1    3    5    0   −4   −1
 2      7    3   −2    4   −3    0    1

Plot the payoffs for each strategy of P₂ on two parallel axes, as shown in Fig. 13. Then join the line segments which bound the figure from below and mark the highest point on this boundary. The lines which intersect at this point identify the strategies that player P₂ should use.
FIG. 13. Graphical solution of two-by-n game.

In the given example, these are strategies 5 and 6. (Note that strategies 2, 4, and 7 dominate strategy 6 and could have been eliminated immediately. Similarly, strategy 3 dominates 5 and could be eliminated.) Therefore, the appropriate two-by-two subgame is

        5    6
 1      0   −4
 2     −3    0

which, by the method of oddments, gives

x* = (3/7, 4/7)  and  y* = (4/7, 3/7),

and a game value of g = −12/7.


Hence, the solution to the original game is

x* = (3/7, 4/7)  and  y* = (0, 0, 0, 0, 4/7, 3/7, 0),

and, again, g = −12/7.
It should be noted that, for m-by-two games, one proceeds as above, marking, however, the line segments which bound the graph from above and then identifying the lowest point on this boundary. Note. This is merely a graphical application of the minimax principle. See Refs. 1, 69, and 71.
Three-by-Three Games. To find the solution of three-by-three games:
1. Look for a saddle point.
2. If none exists, examine the payoff matrix for dominance and reduce it accordingly.
3. If a three-by-three matrix remains, solve by the method of oddments to see if a three-by-three solution exists.
4. If the oddments method fails, try the two-by-two subgames for a solution.
EXAMPLE. Method of Oddments (Three-by-Three Game). Consider the game whose payoff matrix is

1
1

3

6
6
- -- -- 8
--:2

°

°

- -- -- -

3

4

6

5

To determine the optimum frequencies for player P1, subtract each column from the preceding column, yielding

   1    6   -6
   2   10   -2
   3   -2    1


The oddment for P1(1) is given by

   1   | 10  -2 |
       | -2   1 |

the numerical value of which is the difference between the diagonal products:

10(1) - (-2)(-2) = 6.
Similarly, P1(2) is given by

   2   |  6  -6 |
       | -2   1 |

namely,

6(1) - (-6)(-2) = -6,

and P1(3) is given by

   3   |  6  -6 |
       | 10  -2 |

or

(6)(-2) - (-6)(10) = 48.

Therefore, the oddments for P1 are

6 : 6 : 48

so that the optimum frequencies are

X* = (1/10, 1/10, 8/10).


Similarly, by subtracting rows, one determines the oddments for P2, namely,

38 : 14 : 8

and optimum frequencies

Y* = (19/30, 7/30, 4/30).

Furthermore, the value of the game is given by

g = [1(6) + 1(8) + 8(4)]/10 = 23/5.

Note. Every solution obtained by the method of oddments must be tested. It may well be that the three-by-three game does not have a three-by-three solution, but, rather, a two-by-two solution. See Ref. 69.
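The three-by-three oddments computation may be sketched as follows (Python, with illustrative names); in keeping with the note above, the returned frequencies are a candidate solution only and must still be tested against the payoff matrix.

from fractions import Fraction

def det2(m):
    # Difference between the diagonal products of a 2-by-2 array.
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def oddments_3x3(a):
    # Subtract adjacent columns to get a 3-by-2 matrix of differences.
    d = [[a[i][0] - a[i][1], a[i][1] - a[i][2]] for i in range(3)]
    # The oddment for P1(i) is the determinant with row i struck out.
    ox = [abs(det2([d[j] for j in range(3) if j != i])) for i in range(3)]
    # Likewise for P2: subtract adjacent rows, strike out column j.
    e = [[a[0][j] - a[1][j] for j in range(3)],
         [a[1][j] - a[2][j] for j in range(3)]]
    oy = [abs(det2([[e[r][c] for c in range(3) if c != j] for r in range(2)]))
          for j in range(3)]
    x = [Fraction(o, sum(ox)) for o in ox]
    y = [Fraction(o, sum(oy)) for o in oy]
    return x, y

# The example above: oddments 6:6:48 for P1 and 38:14:8 for P2.
print(oddments_3x3([[6, 0, 6], [8, -2, 0], [4, 6, 5]]))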
Three-by-n Games. To find the solution of three-by-n games:
1. Look for a saddle point.
2. If none exists, examine the payoff matrix for dominance and reduce it accordingly.
3. If a three-by-n matrix remains, the problem is then to find the solution by the earlier methods, since every three-by-n matrix has solutions which are either three-by-three or two-by-two (or a saddle point).
Solve the two-by-n subgames by the graphical method. If no two-by-two solutions exist, the solution must then be a three-by-three solution which can be obtained by successively trying each three-by-three subgame. See Refs. 1, 69, and 71.
Four-by-Four Games. For games which do not have a saddle point (i.e., mixed strategy games) and which, after removing rows and/or columns due to dominance, reduce to a four-by-four game, there is a method of oddments for obtaining the desired solution. For this method, see Williams (Ref. 69).
Other Solutions of Rectangular Games

There are a variety of other methods for solving rectangular games, a
few of which are cited here.
Matrix Solution of Games. Let

A = (a_ij) be the m × n matrix of a game,
B = (b_ij) be any square submatrix of A of order r > 1,
J_r = (1, 1, ..., 1), a 1 × r matrix,
C^T = the transpose of C, where C is any matrix,
adj B = the adjoint of B,
x_i ≥ 0, Σx_i = 1, X = (x_1, x_2, ..., x_m),
y_j ≥ 0, Σy_j = 1, Y = (y_1, y_2, ..., y_n),
X̄ = a 1 × r matrix obtained from X by deleting those elements corresponding to the rows deleted from A to obtain B,
Ȳ = a 1 × r matrix obtained from Y by deleting those elements corresponding to the columns deleted from A to obtain B.
Solution.
1. Choose a square submatrix B of A of order r (≥2) and calculate

X̄ = J_r adj B / [J_r (adj B) J_r^T] = (x̄_1, x̄_2, ..., x̄_r)

and

Ȳ = J_r (adj B)^T / [J_r (adj B)^T J_r^T] = (ȳ_1, ȳ_2, ..., ȳ_r).

2. If some x̄_i < 0 or some ȳ_j < 0, reject the chosen B and try another.
3. If x̄_i ≥ 0 and ȳ_j ≥ 0 for all i, j = 1, 2, ..., r, calculate

g = |B| / [J_r (adj B) J_r^T]

and construct X and Y from X̄ and Ȳ by adding zeros in the appropriate places.
Check whether

Σ_{i=1}^{m} x_i a_ij ≥ g,    for all j,

and whether

Σ_{j=1}^{n} y_j a_ij ≤ g,    for all i.

If one of the relations does not hold, try another B. If all relations hold, then X, Y, and g are the required solutions. See Refs. 1 and 71.
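A numerical sketch of this procedure is given below, assuming numpy and the writer's own helper names; it forms the adjoint by cofactors and applies steps 1 to 3 to each two-by-two submatrix of the two-by-seven game solved graphically earlier.

import numpy as np
from itertools import combinations

def adjugate(b):
    # Classical adjoint of a square matrix, built from cofactors.
    n = b.shape[0]
    c = np.zeros_like(b, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(b, i, axis=0), j, axis=1)
            c[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return c.T                    # adj B = transpose of the cofactor matrix

def try_submatrix(a, rows, cols):
    # Steps 1-3 for one choice of B; returns (X, Y, g) or None.
    b = a[np.ix_(rows, cols)]
    r = b.shape[0]
    jr = np.ones(r)
    denom = jr @ adjugate(b) @ jr                 # J_r (adj B) J_r^T
    if abs(denom) < 1e-12:
        return None                               # degenerate choice of B
    xbar = jr @ adjugate(b) / denom
    ybar = jr @ adjugate(b).T / denom
    if (xbar < 0).any() or (ybar < 0).any():
        return None                               # step 2: reject this B
    g = np.linalg.det(b) / denom
    x = np.zeros(a.shape[0]); x[list(rows)] = xbar
    y = np.zeros(a.shape[1]); y[list(cols)] = ybar
    # Step 3: sum_i x_i a_ij >= g for all j, sum_j y_j a_ij <= g for all i.
    if (x @ a >= g - 1e-9).all() and (a @ y <= g + 1e-9).all():
        return x, y, g
    return None

a = np.array([[-6, 1, 3, 5, 0, -4, -1],
              [7, 3, -2, 4, -3, 0, 1]], dtype=float)
for cols in combinations(range(7), 2):
    sol = try_submatrix(a, (0, 1), cols)
    if sol is not None:
        print(cols, sol)          # columns (4, 5), i.e., strategies 5 and 6
        break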
Iterative Method for Solving a Game. There is an approximate method of solving rectangular games which enables one to find the value of such games to any desired degree of accuracy and also to approximate the optimal strategies. See Ref. 71, Chap. 4, and Ref. 1, Chap. 18.
Solution of Rectangular Games by Linear Programming. It can
be shown that the problem of solving an arbitrary rectangular game can
be regarded as a special linear programming problem and, conversely, that
many linear programming problems can be reduced to problems in game
theory. Thus, the techniques for solving linear programming problems (e.g.,
the simplex technique), especially through the use of high-speed electronic
computers, can be applied to the solution of game theory problems. See Ref. 71,
Chap. 14.
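As a sketch of the reduction, P1's problem is to maximize v subject to Σ x_i a_ij ≥ v for every column j, Σ x_i = 1, and x_i ≥ 0. The Python rendering below uses scipy.optimize.linprog; the solver choice and the names are illustrative assumptions, while the formulation itself is the classical game/linear-programming equivalence cited in the text.

import numpy as np
from scipy.optimize import linprog

def game_value_lp(a):
    m, n = a.shape
    # Variables: x_1, ..., x_m, v.  linprog minimizes, so minimize -v.
    c = np.zeros(m + 1); c[-1] = -1.0
    # Constraints  v - sum_i x_i a_ij <= 0  for every column j.
    A_ub = np.hstack([-a.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.array([[1.0] * m + [0.0]])          # the x_i sum to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]     # v unrestricted in sign
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

a = np.array([[6, 0, 6], [8, -2, 0], [4, 6, 5]], dtype=float)
x, v = game_value_lp(a)
print(x, v)        # approximately (0.1, 0.1, 0.8) and 23/5, as found above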


Zero-Sum, n-Person Games

The theory of n-person games, n > 2, is not in an altogether satisfactory
state. For an excellent exposition on the elements of zero-sum n-person
games, see Ref. 71 and the original text on the subject, namely that of von
Neumann and Morgenstern (Ref. 68). A very brief discussion is to be
found in Ref. 1.
8. DATA FOR MODEL TESTING

Introduction. The type of evidence one uses to test a model depends very much on the kind of test one has in mind. In testing a model one asks, "What are the possible ways in which a model can fail to represent reality adequately and hence lose some of its potential usefulness?" Following are four ways in which one may question the adequacy of a model.
1. The model may assert a dependence of the effectiveness of the system
(the dependent variable) on one or more (independent) variables which,
as a matter of fact, do not affect the system's effectiveness. That is, the
model may fail by including variables which are not pertinent.
2. The model may fail to include a variable which does have a significant
effect on the system's effectiveness.
3. The model may inaccurately express the actual relationship which
exists between the measure of effectiveness and one or more of the pertinent
independent variables.
4. Finally, even if the model is an accurate picture of reality in the sense
of conforming to the foregoing three conditions, it may still fail to yield
good results if the parameters contained in it are not evaluated properly.
In testing the model, begin by testing it as a whole, i.e., by determining
the accuracy of its prospective or retrospective predictions of the system's
effectiveness. If this procedure shows that the model is not adequate, further testing will be required to find out which of the four types of deficiencies
mentioned here is present.
The design of the process of collecting data consists of the following
parts: (1) definition (including measurement), (2) sampling (including experimental designs), (3) data reduction, (4) use of the data in the test, (5)
examination of the result, and (6) possible redesign of the evidence.
Scientific Definitions. Scientific defining consists of specifying the
best conceivable (not necessarily obtainable) conditions under which, and
procedures by which, values of the variables can be obtained.
Concern with ideal (or optimum) observational conditions and procedures
is quite important if one wants to know how good are the results one eventually obtains. Further, and more important, the ideal conditions and procedures act as a standard by means of which one can evaluate the attainable


observational conditions and operations, determine their shortcomings,
and make any necessary adjustments in the resultant data. For a detailed
discussion of scientific defining, see Ref. 72.
The two most common types of quantitative variables are the enumerative and the metric. The enumerative variable requires counting for its
evaluation whereas the metric variable requires measurement.
Scientific Definitions of Enumerative Variables. Two types of
errors can arise in the counting operation, overenumeration and underenumeration. Overenumeration results either from counting the same unit
more than once or from counting units which should not be counted at all.
Underenumeration, on the other hand, results from the failure to count a
unit which should be counted. Furthermore, these errors can occur because of a failure to match elements with consecutive integers (e.g., overenumeration because of skipping numbers and underenumeration through
duplication of numbers).
It is desirable to design the best conceivable counting procedure, even if
the design cannot be carried out in practice. This involves specifying the
standard environment in which, and the standard operations by which,
the count can ideally be made, as well as providing an explicit definition
of the elements to be counted. Once this standard is specified, it will be
possible to use it to evaluate alternative practically realizable counting
procedures and to select the best of these. The standard also provides a
basis for estimating the error that is likely to occur in the practical counting procedure which is eventually used.
Scientific Definitions of Properties (Metric Variables). The idealized design of a procedure for measuring properties depends primarily on
the type of property involved. Scientific definitions of properties involve
specifying the following characteristics of the idealized measuring procedure:
1. Identification of the thing, event, or class of things or events which
should be observed.
2. Specification of the environment in which the observations should be
made.
3. Specification of the changes in the environment which should be made,
if any, during the observation period.
4. Specification of the operations to be performed and the instruments
and measure to be used by the observer.
5. Specification of the readings (data) to be made.
6. Specification of the analysis of the data.
The formal description of the measure to be used states what logical and
mathematical operations one wants to be able to perform on the data to be
obtained in evaluating a variable. For a complete discussion of the theory


of measurement see Refs. 73-79. The scientific definition (observational
standard) states how, ideally, one would go about collecting pertinent data.
The operational specification of the data collection process states how one
actually intends to collect and adjust the data. Errors can arise in each of
these three stages of planning relative to testing the model.
Sampling. In evaluating variables, one is either involved in measuring
the property of a single unit or in counting the members of, or measuring
the properties of, a class of units (a population). The definition of a property of a single unit specifies the conditions under which the observation
should be made. If these conditions can be met and observations can be
made without error, only one observation is required. But if the conditions are not met, observations are subject to error which can only be estimated if two or more observations are made. How many observations to
make, and where, are sampling questions. Since the standard conditions
specified in the definition can seldom be met in practice, one must choose
one of two courses: (1) An experimental design must be chosen which, by
techniques such as the analyses of variance and covariance, makes it possible to assess the magnitude of the deviations and ascribe them to specific
environmental factors, or (2) observations must be made on a subset (of
the population) which make it possible to draw inferences that are valid
for the whole population with the least possible bias. The subject of sampling is concerned with the selection of appropriate subsets.
In the main, sampling can be described as the selection of items from a
population. The "population" of objects, events, environments, and stimuli to be sampled should be specified in the definition of the variable being
evaluated. The population represents all the possible data of the relevant
kind that can be collected.
Evaluation of Samples and Sample Estimates. The decision which
must be made in designing a sampling procedure is concerned with the
method of drawing the sample and the method of making estimates about
the population from the sample. If a prescribed method is carried out correctly, there are two opposing considerations: (1) the probability that the
estimate made on the basis of the sample will actually deviate from the
true population value by an amount greater than some amount x; (2) the
cost of taking the sample.
In the main, the probability of deviations will decrease with an increase
in the sample size, but the cost of taking the sample will increase with an
increase in sample size.
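These opposing considerations can be made concrete under a normal-theory approximation, in which the probability that the sample mean deviates from the true value by more than x falls as the sample size n grows while a linear sampling cost rises. All numbers in the sketch below are hypothetical.

from math import erf, sqrt

def deviation_probability(x, sigma, n):
    # Normal approximation to P(|sample mean - true value| > x) for a
    # sample of n independent observations with standard deviation sigma.
    z = x * sqrt(n) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))

sigma, x, unit_cost = 10.0, 2.0, 0.25      # illustrative numbers only
for n in (25, 100, 400):
    print(n, round(deviation_probability(x, sigma, n), 4), n * unit_cost)
# the deviation probability falls (0.3173, 0.0455, 0.0001) as the cost rises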
Types of Sampling Designs. In unrestricted random sampling every
possible sample has the same chance of being chosen. Restricted random
sampling represents methods by which each possible sample does not have
an equal probability of being drawn. But in each case where random


sampling is used scientifically, the probability of selecting any sample is
known.
All the various schemes for sampling are based on very simple, practical
considerations. These are:
1. Items of the population may fall into recognizable groups (e.g., in
terms of location or dollar amounts on an invoice). If this is the case, it is
reasonable to think in terms of sampling from these groups, because in
general one reduces the variance of the estimates and (more important) one
can be selective in the amount of sampling that is done in each group. Invoices with large dollar amounts are more important than ones with small
dollar amounts; hence a larger sample of the more important items should
be taken.
2. Items of a population often fall into clusters (e.g., a shipment shown
on an invoice; people in a house, block, or town; items in a warehouse). If
one looks at some item in a cluster, one might just as well look at the rest
of the items. Hence the cluster becomes the basis of sampling, not the
original items. The use of clusters may increase the variance of the estimates but greatly decrease the costs of gathering the sample: the usual
economic balancing problem.
3. One does not have to plan completely in advance. One can let the
sample information that comes in dictate how the next steps are to be taken.
The following is a general classification of the principal types of sampling designs:

I. Fixed sampling design. The sampling design is fixed and not subject to change in terms of sample data.
   A. Unrestricted random sampling. A random sample is selected from the whole population by either
      1. Simple random sampling. Assigning a different number to each element in the population and using random numbers to select the sample, or
      2. Systematic random sampling. Where a population is ordered, selecting a starting place at random and then selecting subsequent elements at a fixed interval from the first and subsequent selections.
      Tables of random numbers can be found in Refs. 10 and 80-82. Details on the generation of such numbers can be found in Ref. 83.
   B. Restricted random sampling. The population is divided into subgroups (and possibly subsubgroups, etc.) and either some of these are selected and/or random samples from some or all of these are selected.
      1. Multistage random sampling. Random samples are drawn from subgroups which have themselves been selected (a) with equal probability, or (b) with probability proportionate to the relative size of the subgroup, or some other criterion.
      2. Stratified random sampling. A random sample is drawn from every subgroup of the population. The size of the sample from the subgroups may be (a) independent of the size of the subgroups (i.e., samples of equal size), (b) proportionate to the relative size of the subgroup, or (c) proportionate to the relative size of the subgroup and the dispersion of the elements within it (optimum allocation).
      3. Cluster sampling. A random sample of subgroups is selected, all elements of which are included in the final sample.
      4. Stratified cluster sampling. A combination of B2 and B3, where more than two stages of sampling are involved.
II. Sequential sampling. A small random sample is selected and analyzed, on the basis of which a decision is made as to whether or not to continue sampling and, if so, how. The samples may be either
   A. In groups, as in double or multiple sampling, or
   B. Single items taken one at a time.
For details on sequential sampling see Refs. 84-87.
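The fixed designs classified above are readily illustrated. The following Python sketch draws simple random, systematic, proportionate stratified, and cluster samples from a numbered population; the population, the strata, and the sample sizes are all hypothetical.

import random

population = list(range(1000))             # hypothetical numbered population

# I.A.1  Simple random sampling: draw by random numbers.
simple = random.sample(population, 50)

# I.A.2  Systematic random sampling: random start, fixed interval.
interval = len(population) // 50
start = random.randrange(interval)
systematic = population[start::interval]

# I.B.2(b)  Stratified random sampling, proportionate to subgroup size.
strata = {"small": population[:700], "large": population[700:]}
stratified = {name: random.sample(group, len(group) * 50 // len(population))
              for name, group in strata.items()}

# I.B.3  Cluster sampling: draw whole clusters, keep all their elements.
clusters = [population[i:i + 20] for i in range(0, len(population), 20)]
cluster_sample = [item for block in random.sample(clusters, 3) for item in block]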
The aspect of sampling called experimental design usually refers to a
sampling plan based on the variables in the model which is to be tested.
Instead of keeping everything fixed except one variable, it is possible to
design data collection systems at optimum locations of some of the variables
of the model. This sampling method assumes that the variables of the
model can be manipulated in reality-or at least in a realistic model. For
information on various types of experimental designs, see Refs. 88-91. For
comprehensive surveys of contemporary sampling theory see Refs. 72 and
92-95.
Reduction of Data. The observations made on the sampled items or
in a sample of situations provide the raw data on the basis of which variables are assigned values and hence provide the basis for testing all or
part of the model. In many cases the data require collation, editing, coding, punching, etc. Discussion of these phases of data processing can be
found in Ref. 72, Chap. X.
In general, the ultimate form to which the data must be transformed to
be useful in the testing process will be either an estimate of the value of a
parameter or an inferential "statistic" which describes a relationship between two or more variables. For example, in testing a lot-size model the
cost variables must first be evaluated in order to compute the total cost
"predicted" by the model. Once these predictions are obtained they are
compared with observed values in order to derive a "statistic" which can
be used to determine whether or not the model predicts well.


For a discussion of data reduction and the problems of estimation and
obtaining estimates of the variability of estimates, see Ref. 1, Chap. 20.
For statistical tests of the significance or nonsignificance of a variable, see
Refs. 96-105. These tests may make it possible to determine such matters
as: (a) whether a variable should or should not be included, (b) whether the
form of an analytic function is linear or some other type, (c) whether the
form of a probability function is normal or some other type, (d) whether
the model has failed to include a variable that ought to have been included.
For further discussion of procedures for testing the adequacy of models
and the solutions derived from them, see Ref. 74, Chap. 20.
9. CONTROLLING THE SOLUTION
Introduction. Many, if not most, O.R. projects deal with management
decisions that are recurrent. Hence the solution must be used over and
over again. But the systems which are dealt with in O.R. are seldom stable.
Their structure is subject to change. The relationships between the variables or system parameters which define the system, and the values of the parameters themselves, are usually subject to change.
In such situations the relationships and parameters used in the decision
rule must be adjusted for changes in the system as they occur. Costs may
change, the distribution of demand may change in some or all of its characteristics, and the relationships between variables may change over time.
Hence the values of the relationships and parameters should be periodically
reevaluated and the assumptions involved in the model (from which the
decision rule is derived) should be reexamined periodically. That is, the
solution must be controlled lest it lose some of its effectiveness because of
changes in the system.
Complete methodologies have not yet been developed for optimizing
control procedures. Enough is known, however, to design procedures
which are more likely to lead to success than either leaving this phase of
the project to chance or relying on others (management or operating personnel) to take care of them. For a complete discussion, see Ref. 1, Chap.
21.
Controlling the Solution. The effectiveness of a solution in an O.R.
problem may be reduced by changes in either values of the parameters of
the system or the relationships between them, or both. A previously insignificant parameter may become significant, or, conversely, a previously significant parameter may become insignificant. Changes in the values and functional relations of the parameters which remain significant may also affect the effectiveness of the solution.
Not every change in a parameter or relationship is significant. In general terms, a change is significant if (1) adjustment of the solution for the


change results in an improvement in effectiveness and (2) the cost of making the adjustment and carrying it out does not offset the improvement in
effectiveness.
Design of a control system, then, consists of three steps: (1) listing the
variables, parameters, and relationships that either are included in the solution or should be if their values were to change; (2) development of a procedure for detecting significant changes in each of the parameters and relationships listed; (3) specification of action to be taken or adjustments to be
made in the solution when a significant change occurs.
The last two steps are interrelated because determination of the significance of a change (step 2) depends on the cost of making the adjustment
specified in step (3).
Control of Parameters. The first step in designing a control procedure
involves listing all the variables and relationships which, if they were to
change in value, might affect the effectiveness of the solution.
The parameters which are listed should be classified into two types:
1. Variables whose values during the period covered by a decision can
be known in advance, such as the number of models in a line, the number
of work days in an accounting period, and the price for which an item is to
be sold. Control of such measures consists either (a) of establishing communication lines between those who know these values and those who use
the decision rules or (b) of providing the latter with source material (such
as a calendar in the case of the number of work days per accounting period).
2. Measures whose values cannot be known in advance, such as number
of units sold, number of hours worked, and arrival rate of trucks. These
values must be estimated in advance.
Essential to the control of any measure is the determination of whether
its true value or one or more of the characteristics of its estimate have
changed. This determination consists of testing the hypothesis that no
change has occurred in the variable or the characteristics of its estimate
(which are themselves variables).
Errors in Detecting Changes. Determination of whether or not such a
change has occurred is subject to two types of error: (I) asserting that a
change has occurred when it has not and (II) asserting that a change has
not occurred when it has. An understanding of the two types of error is
essential to comprehend what is involved in controlling a variable. See
Ref. 1, Chap. 21.
Detection of and Adjustment for Significant Changes. Ideally, in
the design of a control system for a variable, six interdependent decisions
should be made if possible. (In some situations there may be no choice
with regard to one or more of these decisions.) The decisions are as
follows:


1. The frequency of (i.e., period between) control checks.
2. The number of observations per control check, if more than one is
possible.
3. The way items should be selected for observation (i.e., the sampling
design), if more than one observation is specified.
4. The statistical testing procedure to be used to determine whether or
not a value has changed.
5. The specific decision rule based on the test.
6. The action to be taken, if the test indicates that a parameter's value
has changed.
Costs. Again, ideally, these decisions should be made in such a way as
to minimize the sum of the following costs:
1. The cost of taking the observations.
2. The cost of performing the test.
3. The expected cost of a type I error (i.e., the cost of changing a value
when it is not warranted).
4. The expected cost of type II errors (i.e., the cost of not changing a
value when it is warranted).
Unfortunately, at the present time the six decisions listed cannot be
made in such a way as to assure minimization of the sum of the four costs.
The design of an optimizing procedure can be specified in general terms;
i.e., a model can be constructed which expresses the total expected cost as
an abstract (but not as a concrete) function of the six decisions. In addition, some of the expressions which would appear in the model cannot be
evaluated. In most situations, for example, the expected costs associated
with type I and type II errors cannot be determined. For further reading,
see Refs. 106-109.
Further details on methods of controlling parameters are given in Refs.
110-118.
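As an illustration of such a control check, the following sketch tests the hypothesis that a parameter's value has not changed, using a two-sided normal test with a known standard deviation; the significance level alpha fixes the type I error rate (asserting a change when none occurred). All names and numbers are hypothetical.

from math import erf, sqrt

def parameter_changed(baseline, sigma, sample, alpha=0.05):
    # Test the hypothesis of "no change" in the parameter's true value,
    # assuming a normal sampling distribution with known sigma.
    n = len(sample)
    mean = sum(sample) / n
    z = abs(mean - baseline) * sqrt(n) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))   # two-sided p-value
    return p < alpha

# A control check on, say, mean daily demand (hypothetical figures).
print(parameter_changed(100.0, 15.0, [112, 108, 95, 121, 117, 104, 99, 125]))
# prints False: the observed shift is not significant at the 5 per cent level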
Controlling Relationships. Every probability distribution asserts a
relationship between the probability of an event and the values of other
variables. In the case of distributions which appear in the model and solution, as, for example, the distribution of demand, the parameters which
define the distribution must be controlled (e.g., the mean and variance) as
well as the form of the distribution (e.g., normal or Poisson). Both aspects
of the distribution should be subjected to control.
There are no "standard" procedures for controlling the form of a distribution. Such control may be obtained by periodically testing the
"goodness of fit" in the manner given by standard statistics texts. The
frequency with which such tests should be conducted depends on the rate
at which data are generated. The visual plotting of data, as they become
available, can frequently indicate when a check should be made. Examination of these charts can provide clues to changes in the parameters of the
distribution as well as to the form.
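A periodic goodness-of-fit check of this kind can be sketched with a chi-square test, as below; the observed and expected class counts are hypothetical, and when the parameters of the assumed form were estimated from the same data the degrees of freedom should be reduced accordingly (the ddof argument of the routine).

import numpy as np
from scipy import stats

# Observed counts per demand class, and the counts expected under the
# currently assumed form of the distribution (hypothetical figures).
observed = np.array([18, 30, 26, 15, 7, 4])
expected = np.array([16.5, 29.7, 26.7, 16.0, 7.2, 3.9])

chi2, p = stats.chisquare(observed, f_exp=expected)
print(chi2, p)     # a small p-value would signal a change in the form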
The control of relationships which do not take the form of probability
distributions also involves control over the form of the function which relates the variables and the values of the variables.
Every O.R. project has unique characteristics which create unique control problems but which also offer challenging opportunities for the development of unusual control procedures. There is a good deal of room for
scientific creativity in this phase of the research. For a full discussion and
illustration of the development of control procedures, see Ref. 1, Chap. 21.
10. IMPLEMENTATION
Concern of the O.R. Team. Once the solution has been derived and
tested, it is ready to be put to work. Conversion of the solution into operation should be of direct concern to the research team for two reasons:
1. No matter how much care has been taken in deriving a decision rule
and testing it, shortcomings may still appear when it is put into operation
or ways of improving the solution may become apparent. If adjustment
of the decision rule to take care of unforeseen operating problems is left in
the hands of those who do not understand how it was derived, the adjustment may seriously reduce its effectiveness. Operating personnel may,
for example, see no harm in making what appears to them to be a slight
change, but such a change may be critical.
2. Carrying out the solution may not be as obvious a procedure in the
context of complex operations as it initially appears to be to the researchers.
The solution must be translated into a procedure that is workable if its
potential is to be fully realized. The procedure must be as accurate a
translation of the solution as is practically feasible and only the researchers
can minimize the loss in the solution's effectiveness that is incurred in this
translation.
The nature of the implementation problem depends on whether the solution pertains to a one-time or repetitive decision. In the case of one-time
decisions the problem is simpler but by no means disappears.
Translation of the solution into the operating procedure involves answering three questions and proceeding accordingly. The three questions are:
(1) Who should do what? (2) When? (3) What information and facilities
are required to do it? On the basis of the answers to these questions the
operating procedure can be designed and any necessary training and transition can be planned and executed.
Implementation of a solution involves people taking action. These people must be identified, and the required action must be specified. The details cannot be enumerated without a thorough knowledge of the operations and the division of responsibility in the organization under study.
The analysis of the organization provides much of the needed information
and the rest should be provided by management and operating people working with the research team. This and the other phases of implementation
require continuous cooperation and communication among management,
operators, and researchers.
Each person who is given responsibility for initiating action in carrying
out the solution or using the decision rule should be instructed as to when
they are to take action. The tools required to do the job should be made
available to those who need them, and these people should be trained in
their use. The tools should not be too complex for the operating personnel
to use. It may be necessary, for example, to convert even a simple equation
into a nomograph or tables. In some cases the tools may require simplification even if such simplification results in a loss of some of the original solution's power. The solution or decision rule is generally used by personnel
whose mathematical sophistication is less than desired. Consequently, if
one wants to assure use of the recommended decision rules, one must frequently simplify them before handing them over to executives and operating personnel. In many cases this means that one must either translate
elegant solutions into approximations that are easy to use or sidestep the
elegance and move directly to a "quick-and-dirty" decision rule.
It should be realized that in one sense almost every solution in O.R. is an
approximation and is "quick and dirty" to some degree. This follows from
the fact that in constructing every model some simplifying assumptions
are made. Reality is too difficult to represent in all its complexity. These
simplifying assumptions reduce the generality of the model and solutions
derived from it. But this is only a polite way of saying that quickness and
dirtiness are involved. It is well for the operations researcher to realize
that an approximate solution which is used may be a great deal better
than a more exact solution which is not.
For further discussion of the problems of implementing the solution, see
Ref. 1, Chap. 21.
REFERENCES
1. C. W. Churchman, R. L. Ackoff, and E. L. Arnoff, Introduction to Operations
Research, Wiley, New York, 1957.
2. F. N. Trefethen, "A History of Operations Research," in Operations Research for
Management, J. F. McCloskey and F. N. Trefethen (Editors), The Johns Hopkins Press,
Baltimore, Md., 1954.
3. M. L. Hurni, Observations on Operations Research, J. Opns. Research Soc. Am.,
2 [3], 234-248 (1954).

OPERATIONS RESEARCH

15·125

4. H. F. Smiddy and L. Naum, Evolution of a "Science of Managing" in America,
Mgmt. Sci., 1 [1], 1-31 (1954).
5. J. H. Curtiss, "Sampling Methods Applied to Differential and Difference Equations," in Seminar on Scientific Computation, International Business Machines Corporation, New York, 87-109, Nov. 1949.
6. I. S. Sokolnikoff and E. S. Sokolnikoff, Higher Mathematics for Engineers and
Physicists, McGraw-Hill, New York, 1941.
7. R. E. Bellman, Dynamic Programming, Princeton University Press, Princeton,
N. J., 1957.
8. C. C. Holt, F. Modigliani, and H. A. Simon, A linear decision rule for production
and employment scheduling, Mgmt. Sci., 2 [1], 1-30 (1955).
9. C. C. Holt and H. A. Simon, Optimal decision rules for production and inventory
control, Proceedings of the Conference on Operations Research in Production and Inventory
Control, Case Institute of Technology, Cleveland, O., 1954.
10. The RAND Corporation, A Million Random Digits, The Free Press, Glencoe, Ill.,
1955.
11. H. Kahn, Applications of Monte Carlo, Project RAND, RM-1237-AEC, RAND
Corporation, Santa Monica, Calif., April 19, 1954.
12. G. W. King, The Monte Carlo method as a natural mode of expression in Operations Research, J. Opns. Research Soc. Am., 1 [2], 46-51 (1953).
13. U. S. Department of Commerce, National Bureau of Standards, Monte Carlo
Method, Applied Mathematics Seminar 12, June 11, 1951.
14. J. B. Crockett and H. Chernoff, Gradient methods of maximization, Pacific J.
Math., 5 (1955).
15. B. Klein, Direct use of extremal principles in solving certain optimizing problems
involving inequalities, J. Opns. Research Soc. Am., 3 [2], 168-175 (1955).
16. H. W. Kuhn and A. W. Tucker, "Nonlinear Programming," in Second Berkeley
Symposium on Mathematical Statistics and Probability, J. Neyman (Editor), University
of California Press, Berkeley, Calif., 481-492, 1951.
17. K. Arrow, T. Harris, and J. Marschak, Optimal inventory policy, Econometrica,
19 [3], 250-272 (1951).
18. C. Eisenhart, Some Inventory Problems, National Bureau of Standards, Techniques of Statistical Inference, A2-2C, Lecture 1, Jan. 6, 1948 (hectographed notes).
19. C. B. Tompkins, Lead time and optimal allowances-an extreme example, Conference on Mathematical Problems in Logistics, George Washington University, Appendix I to Quarterly Progress Rept. No. 1, Dec. 1949-Feb. 1950.
20. T. M. Whitin, The Theory of Inventory Management, 2nd edition, Princeton University Press, Princeton, N. J., 1957.
21. A. Dvoretzky, J. Kiefer, and J. Wolfowitz, On the optimal character of the (s, S)
policy in inventory theory, Econometrica, 21 [4], 586-596 (1953).
22. A. Dvoretzky, J. Kiefer, and J. Wolfowitz, The inventory problem, Econometrica,
20 [2], 187-222 (1952); [3], 450-466 (1952).
23. E. B. Berman and A. J. Clark, An optimal inventory policy for a military organization, RAND Rept. D-647, RAND Corporation, Santa Monica, Calif., March 30, 1955.
24. H. A. Simon, On the application of servomechanism theory in the study of production control, Econometrica, 20 [2], 247-268 (1952).
25. R. Bellman, 1. Glicksberg, and O. Gross, On the optimal inventory equation,
Mgmt. Sci., 2 [1], 83-104 (1955).
26. H. J. Vassian, Application of discrete variable servo theory to inventory control,
J. Opns. Research Soc. Am. 3 [3], 272-282 (1955).

15-126

OPERATIONS RESEARCH

27. A. Charnes, W. W. Cooper, and D. Farr, Linear programming and profit preference scheduling for a manufacturing firm, J. Opns. Research Soc. Am., 1 [3], 114-129
(1953).
28. G. Dannerstedt, Production scheduling for an arbitrary number of periods given
the sales forecast in the form of a probability distribution, J. Opns. Research Soc. Am.,
3 [3], 300-318 (1955).
29. R. Bellman, Some applications of the theory of dynamic programming, J. Opns.
Research Soc. Am., 2 [4], 275-288 (1954).
30. R. Bellman, Some problems in the theory of dynamic programming, Econometrica,
22 [1], 37-48 (1954).
31. R. Bellman, The theory of dynamic programming, Bull. Am. Math. Soc., 60 [6], 503-516 (1954).
32. R. Bellman, I. Glicksberg, and O. Gross, The theory of dynamic programming as
applied to a smoothing problem, J. Soc. Ind. Appl. Math., 2 [2], 82-88 (1954).
33. T. M. Whitin, Inventory control and price theory, Mgmt. Sci., 2, 61-68 (1955).
34. T. M. Whitin, Inventory control research: A survey, Mgmt. Sci., 1, 32-40 (1954).
35. H. A. Simon and C. C. Holt, The control of inventory and production rates-A
survey, J. Opns. Research Soc. Am., 2 [3], 289-301 (1954).
36. A. Charnes, W. W. Cooper, and A. Henderson, An Introduction to Linear Programming, Wiley, New York, 1953.
37. G. H. Symonds, Linear Programming: The Solution of Refinery Problems, Esso
Standard Oil Co., New York, 1955.
37a. A. Orden, Survey of research on mathematical solutions of programming problems, Mgmt. Sci., 1, 170-172 (1955).
38. G. B. Dantzig, Ref. 40, Chaps. I, II, XX, XXI, and XXIII.
39. A. Charnes and W. W. Cooper, The stepping stone method of explaining linear
programming calculations in transportation problems, Mgmt. Sci., 1 [1], 49-69 (1954).
40. T. C. Koopmans (Editor), Activity Analysis of Production and Allocation, Cowles
Commission Monograph No. 13, Wiley, New York, 1951.
41. A. Henderson and R. Schlaifer, Mathematical programming, Harvard Business Rev., 32, 73-100 (May-June 1954).
42. T. L. Saaty, Resume of queuing theory, Opns. Research, 5, 161-200 (1957).
43. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New
York, 1950.
44. E. Brockmeyer, H. L. Holstrom, and Arne Jensen, The life and works of A. K.
Erlang, Trans. Danish Acad. Tech. Sci., 2, Copenhagen, 1948.
45. Raymond, Haller and Brown, Inc., Queuing Theory Applied to Military Communication Systems, State College, Pa., 1956.
46. A. Cobham, Priority assignment in waiting line problems, J. Opns. Research Soc.
Am., 2, 70-76 (1954); also 3, 547 (1955).
47. J. Y. Barry, A priority queuing problem, Opns. Research, 4, 385-386 (1956).
48. E. Koenigsberg, Queuing with special service, Opns. Research, 4, 213-220 (1956).
49. D. Y. Barrer, A waiting line problem characterized by impatient customers and
indifferent clerks, J. Opns. Research Soc. Am., 3, 360-361 (1955).
50. R. Kronig, On time losses in machinery undergoing interruptions, Physica, 10,
215-224 (1943).
51. A. B. Clarke, The time-dependent waiting line problem, Univ. Mich. Engr. Research Inst., Rept. No. M720-1 R 39, 1953.
52. A. B. Clarke, A waiting line process of Markov type, Ann. Math. Stat., 27 [2],
452-459 (1956).

OPERATIONS RESEARCH

15-127

53. T. Homma, On a certain queuing process, Rept. Stat. Appl. Research, 4 [1] (1955).
54. R. R. P. Jackson, Queuing systems with phase type service, Opnal. Research
Quart., 5, 109-120 (1954).
55. S. M. Johnson, Optimal two- and three-stage production schedules with setup
times included, Nav. Research Log. Quart., 1 [1], 61-68 (1954).
56. R. Bellman, Mathematical aspects of scheduling theory, RAND Rept. P-651,
RAND Corporation, Santa Monica, Calif., April 11, 1955.
57. S. B. Akers, Jr., and J. Friedman, A non-numerical approach to production
scheduling problems, J. Opns. Research Soc. Am., 3, 429-442 (1955).
58. A. W. Brown, A note on the use of a Pearson type III function in renewal theory, Ann. Math. Stat., 11, 448-453 (1940).
59. N. R. Campbell, The replacement of perishable members of an operating system,
J. Roy. Stat. Soc., B7, 110-130 (1941).
60. W. Feller, On the integral equation of renewal theory, Ann. Math. Stat., 12, 243-267 (1941).
61. A. J. Lotka, A contribution to the theory of self-renewing aggregates, with special
reference to industrial replacement, Ann. Math. Stat., 10, 1-25 (1939).
62. A. J. Lotka, The Present Status of Renewal Theory, Waverly Press, Baltimore, Md.,
1940.
63. B. Epstein and M. Sobel, Life Testing. I, J. Am. Stat. Assoc., 48, 486-502
(1953).
64. B. Epstein and M. Sobel, Some theorems relevant to life testing from an exponential distribution, Ann. Math. Stat., 25, 373-381 (1954).
65. L. Goodman, Methods of measuring useful life of equipment under operational
conditions, J. Am. Stat. Assoc., 48, 503-530 (1953).
66. Tested Approaches to Capital Equipment Replacement, Special Rept. No. 1, American Management Association, New York, 1954.
67. J. von Neumann, Zur Theorie der Gesellschaftsspiele, Math. Ann., 100, 295-320
(1928).
68. J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior,
3rd edition, Princeton University Press, Princeton, N. J., 1953.
69. J. D. Williams, The Compleat Strategyst, McGraw-Hill, New York, 1954.
70. L. Friedman, Competitive Bidding Strategies, Ph.D. Dissertation, Case Institute
of Technology, Cleveland, O., 1957.
71. J. C. C. McKinsey, Introduction to the Theory of Games, McGraw-Hill, New York,
1952.
72. R. L. Ackoff, The Design of Social Research, University of Chicago Press, Chicago,
Ill., 1953.
73. N. R. Campbell, An Account of the Principles of Measurements and Calculations,
Longmans, Green and Co., New York, 1928.
74. C. W. Churchman, "A Materialist Theory of Measurement," in Philosophy for the
Future, R. W. Sellars, V. J. McGill, and M. Farber (Editors), Macmillan, New York,
1949.
75. E. Nagel, Measurement, Erkenntniss, 2, 313-333 (1931).
76. E. Nagel, On the Logic of Measurement, Thesis, Columbia University, New York,
1930.
77. F. F. Stephan, "Mathematics, Measurement, and Psychophysics," in Handbook
of Experimental Psychology, S. S. Stevens (Editor), Wiley, New York, 1951.
78. S. S. Stevens, On the problem of scales for the measurement of psychological
magnitudes, J. Univ. Sci., 9, 94-99 (1939).

15-128

OPERATIONS RESEARCH

79. S. S. Stevens, On the theory of scales of measurement, Science, 103, 677-680
(1946).
80. H. B. Horton, Random Decimal Digits, Interstate Commerce Commission, Washington, D. C., 1949.
81. M. G. Kendall, "Tables of Random Sampling Numbers," in Tracts for Computers,
No. 24, Cambridge University Press, Cambridge, England, 1940.
82. L. H. C. Tippett, "Tables of Random Sampling Numbers," in Tracts for Computers, No. 15, Cambridge University Press, Cambridge, England, 1927.
83. M. G. Kendall and B. B. Smith, Randomness and random sampling of numbers,
J. Roy. Stat. Soc., 101, 147-166 (1938).
84. C. W. Churchman, Statistical Manual: Methods of Making Experimental Inferences, Pittman-Dunn Laboratory, Frankford Arsenal, Philadelphia, Pa., 1951.
85. Statistical Research Group, Sequential Analysis of Statistical Data: Application,
Columbia University Press, New York, 1946.
86. A. Wald, Foundations of a general theory of sequential decision functions, Econometrica, 15, 279-313 (1947).
87. A. Wald, Sequential Analysis, Wiley, New York, 1947.
88. W. G. Cochran and G. M. Cox, Experimental Designs, Wiley, New York, 1950.
89. W. T. Federer, Experimental Design, Macmillan, New York, 1955.
90. R. A. Fisher, The Design of Experiments, Oliver and Boyd, London, 1949.
91. H. B. Mann, Analysis and Design of Experiments, Dover, New York, 1949.
92. W. E. Deming, Some Theory of Sampling, Wiley, New York, 1950.
93. M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sampling Survey Methods and
Theory, Wiley, New York, 1953.
94. F. F. Stephan, History of the uses of modern sampling, J. Am. Stat. Assoc., 43, 12-39 (1948).
95. F. Yates, Sampling Methods for Censuses and Surveys, Griffin, London, 1949.
96. W. J. Dixon and F. J. Massey, Jr., Introduction to Statistical Analysis, McGraw-Hill, New York, 1951.
97. R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, London,
1948.
98. A. Hald, Statistical Theory with Engineering Applications, Wiley, New York, 1952.
99. P. G. Hoel, Introduction to Mathematical Statistics, 2nd edition, Wiley, New York,
1954.
100. P. O. Johnson, Statistical Methods in Research, Prentice-Hall, New York, 1949.
101. F. J. Massey, Jr., The Kolmogorov-Smirnov test for goodness of fit, J. Am. Stat.
Assoc., 46, 68-78 (1951).
102. E. B. Mode, The Elements of Statistics, Prentice-Hall, New York, 1941.
103. G. W. Snedecor, Statistical Methods, 4th edition, Iowa State College Press, Ames,
Ia., 1946.
104. H. M. Walker, Elementary Statistical Methods, Holt, New York, 1943.
105. S. S. Wilks, Elementary Statistical Analysis, Princeton University Press, Princeton, N. J., 1949.
106. C. W. Churchman, Theory of Experimental Inference, Macmillan, New York,
1948.
107. J. Neyman and E. S. Pearson, On the problem of the most efficient tests of
statistical hypotheses, Phil. Trans., A231, 289-337 (1933).
108. A. Wald, "On the Principles of Statistical Inference," in Notre Dame Mathematical Lectures, No. 1, Notre Dame University, Notre Dame, Ind., 1942.
109. A. Wald, Statistical decision functions, Ann. Math. Stat., 20, 165-205 (1949).

OPERATIONS RESEARCH

15-129

110. A. J. Duncan, Quality Control and Industrial Statistics, Irwin, Chicago, Ill., 1952.
111. E. L. Grant, Statistical Quality Control, 2nd edition, McGraw-Hill, New York,
1952.
112. J. M. Juran (Editor), Quality Control Handbook, McGraw-Hill, New York, 1946.
113. C. W. Kennedy, Quality Control Methods, Prentice-Hall, New York, 1948.
114. S. B. Littauer, Social aspects of scientific method in industrial production, Phil.
Sci., 21, 93-100 (1954).
115. S. B. Littauer, Technological stability in industrial operations, Trans. N. Y.
Acad. Sci., ser. II, 13 [2], 66-72 (1950).
116. P. Peach, An Introduction to Industrial Statistics and Quality Control, 2nd edition, Edwards and Broughton, Raleigh, N. C., 1947.
117. J. G. Rutherford, Quality Control in Industry-Methods and Systems, Pitman,
New York, 1948.
118. W. A. Shewhart, Statistical Methods from the Viewpoint of Quality Control, U. S.
Department of Agriculture, Washington, D. C., 1939.

D. INFORMATION THEORY AND TRANSMISSION

16. Information Theory, by Peter Elias
17. Smoothing and Filtering, by Pierre Mertz
18. Data Transmission, by Pierre Mertz

Chapter 16

Information Theory

Peter Elias

1. Introduction  16-01
2. General Definitions  16-02
3. Simple Discrete Sources  16-08
4. More Complicated Discrete Sources  16-19
5. Discrete Noiseless Channels  16-24
6. Discrete Noisy Channels I. Distribution of Information  16-26
7. Discrete Noisy Channels II. Channel Capacity and Interpretations  16-32
8. The Continuous Case  16-39
References  16-46

1. INTRODUCTION

Basis of Information Theory. As used here, information theory is a
body of results based on a particular quantitative definition of amount of
information. This definition has a firm claim to unique importance in
connection with the engineering questions which arise in systems which
transmit and store information. It has proved interesting and sometimes
useful in other fields (Refs. 1-4). However, other definitions have also
been proposed (Ref. 5) and one of them has a long and useful history in
statistics (Ref. 6). Caution is therefore needed in applying this definition
to a situation in which the theorems which are its main justification in
transmission and storage problems do not apply.
Communication Theory. Information theory is a subdivision of a
broader field, the statistical theory of communication, which includes all the

probabilistic analysis of communications problems. This broad field includes in addition to information theory the analysis of random noise
(Ref. 7), work on optimum linear filtering and prediction (Ref. 8, see also
Chap. 17), statistical analysis of signal detection (Refs. 9, 10), and many
other applications of probabilistic ideas which make no use of an information measure.
Note on Terminology. Some authors, particularly in England, use
information theory in a very broad sense, to include theories of scientific
method and of statistical inference along with communications problems
(Ref. 11). They then use "communication theory," or "mathematical
theory of communication," or "theory of selective information," to denote
what is here defined as information theory.
Mathematical Character. Information theory is essentially a branch
of mathematics. Although the language has a physical ring, the words
information source, channel, coder, etc., are mathematical concepts physically
inspired. The theory can be presented as a formal series of definitions,
theorems, and comments. However, its relevance to a given problem is
then not very clear. Section 2 provides contextual definitions and qualitative results: the later sections are more formal.
2. GENERAL DEFINITIONS
A Communications System

Figure 1 shows the model of a communications system which is used in
information theory. At the transmitter the source produces an output
that is coded and fed into the channel. The channel output may be identiNoise

Message

FIG. L

Coded I
signal

Received
signal

Decoded
message

The model of a communications system which is used in information theory.

cal to the channel input, or it may be altered by noise or distortion. At
the receiver the channel output is decoded and used. The model derives
from communications, but it is applicable to the communications aspects
of other problems. Examples. The storage of a digital computer is a
channel, possibly noisy, with input and output separated by time. A
control system may be a channel with electrical input and mechanical
output.


The Information Source

For purposes of the theory, the source in a given analysis is the point
at which information enters the part of the system under consideration.
The source may not actually generate information, but may merely store
or relay it. Examples: a stack of telegrams waiting to be transmitted; a
reel of recorded magnetic tape. Whether the source under consideration
is a true generator of information or merely a storage point is not of concern.
Controlled and Uncontrolled Sources. A controlled source is one
which generates information at a rate controllable by the transmitter.
Examples. The stack of telegrams being read by a telegraph operator is a
controlled source; so is a speaker who may be slowed down by his audience
when he speaks too rapidly for them to take notes. An uncontrolled source
is one which produces information at a rate determined internally, which
cannot be adjusted to the coding and transmission facilities available.
Segmentation. The output of a source is a sequence of symbols. It
is convenient to break this sequence into segments at a number of different
levels. This process is called segmentation by the linguists. Example. The
output of a teletype system is a sequence of binary selections, "Mark"
and "Space." If these are denoted by "0" and "1," then 0 and 1 are
called the elements of the representation.
The group of consecutive elements which represents a single letter, number or mark is called a character or letter of the representation, and the set
of all possible characters or letters is called the alphabet. The alphabet in
a teletype system may be thirty-two characters in number, with each
character a group of five elements.
Words and Messages. In many alphabets one of the characters is
called the space, and given special significance. Sequences of characters
occurring between successive spaces are called words. A sequence of
words which is more or less independent of the preceding and succeeding
source output is called a message. Example. A single telegram might be
called a message. If successive source symbols are highly correlated the
whole (possibly infinite) source output is the message.
Other Levels. The above list of levels is not exhaustive, nor are all
these levels relevant to the description of a particular source. Examples.
In written English the additional levels of syllable, phrase, clause, sentence,
and paragraph are recognized, but the element level is not used.
In the analysis of spoken language a set of elements, the distinctive
features, have been introduced (Ref. 12). In a binary digital computer,
element and character coincide as being the symbols 0 and 1 used to represent the binary digits. In a binary-coded-decimal machine the elements
are the same, but the characters are groups of four, five, or six elements
representing one decimal digit or one alphanumeric symbol. The group of


digits which fits into one storage register is called a word, and one line of coding, consisting of several related words, called an instruction in computer terminology, might be called a message.
Choice of Levels. The choice of levels of segmentation is not standardized outside linguistics, nor is there any agreement on terminology.
When no particular type or level of segmentation' is implied, the output
of a source will be a sequence of symbols, selected from some finite alphabet.
When two levels are needed at once, as in discussion of word-by-word
translation of a sequence of letters, word will be used for the higher level
and symbol or letter for the lower level. In mathematical discussion, a
segment of a sequence will be called a message only if it is strictly statistically
independent of preceding segments.
Representations and Codes

If a source makes a series of binary selections, its output may be represented, for example, by a sequence of A's and B's or by a sequence of
0's and 1's. These are two representations of the source output. If the
first is taken as primary, then the second is called a coded version of the first.
Codebooks. To get from one representation to the other requires a dictionary, called an encoding codebook. This has two entries:

(1)    A → 0
       B → 1

To get back from the second representation to the first requires a decoding codebook with the entries

(2)    0 → A
       1 → B

Codes. A code is a transformation, which is defined by an encoding codebook or an equivalent set of rules. If a coded message is to be decoded, the inverse transformation as defined by the decoding codebook is also required. In the example of eqs. (1) and (2), the transformation is one-to-one on each symbol and defines its own inverse, so that only one codebook is required and it may be written with double-headed arrows:

(3)    A ↔ 0
       B ↔ 1

In representations with large alphabets the two codebooks may still be
useful even if only one is necessary. Example. It is convenient to find
a telephone number in a standard directory by looking up a name, but the
inverse operation is tedious, though unambiguous.


Transliteration. A code is called a transliteration if each input symbol is transformed directly into one output symbol, so that symbol-by-symbol coding is possible and the number of entries in the codebooks is equal to the size of the alphabet. The code of eq. (3) has this property, but not all one-to-one codes do. Example. The representation of the binary source output as a sequence of A's and B's may be coded into a representation as a sequence of 0's and 1's by the codebook

(4)    AA → 0
       AB → 10
       BA → 110
       BB → 111

To each input sequence corresponds one output sequence, which may be
decoded into its original form by reversing the arrows in the codebook
if the time origin of the coded version is known. The coded output is
a one-to-one transformation on the input sequences, but it is not a transliteration of the symbols A and B. Choose a different level of segmentation
(word, rather than character) and let the first representation consist of
sequences of the four words AA, AB, BA, BB. Then if the four words
0, 10, 110, and 111 are taken as the dictionary for the second representation,
the code becomes a transliteration.
Significant Codes. In assigning code numbers to objects (Example: the
items in a catalog) a distinction is made between significant and nonsignificant codes. Each code number may be considered as a coded version
of a description of the item in English. A significant code is one in which
transliteration is possible at some level of segmentation below that of the
entire code number. Example. The code number assigned to a garment in
a catalog may consist of a sequence of groups of decimal digits, the first
group denoting type of garment, the second size, the third color. Each
group may be independently decoded into English words, so that transliteration is possible if each group is considered as a word in the coded
version of the message.
A code that cannot be decoded piece by piece is called nonsignificant.
Example: a code assigning simple serial numbers to items in a catalog.
Coding and Decoding Delay. Note that coding and decoding delays
arise when the coding is not a transliteration. Example. In the codebook
of eq. (4), after the source has selected its first symbol, the coder must
wait until the second symbol has also been selected before it can encode
the pair. In decoding, after a 1 has been received, it is necessary to wait
for one or two more input symbols before the appropriate output pair
can be selected from the set AB, BA, BB.


Representation and Selection. The output of a coder is called a
representation of its input if it is obtained from the input by a one-to-one
transformation with at most a finite encoding delay. Note that this
definition agrees with the colloquial meaning of representation for a significant code, in which each segment of the output represents a corresponding
segment of the input.
A nonsignificant code requires a different interpretation. The code number corresponding to a telegraphic greeting is not a modified version of the
message, but an instruction as to where in the decoding codebook the
message will be found. The coded version does not represent the message
but selects it. This concept is basic for information theory (Ref. 13).
Coders. The coder in Fig. 1 matches the source to the channel. The
first requirement on the coder is that it match alphabets. It must transform
sequences of symbols from the source alphabet into sequences of symbols
in the alphabet which the channel will accept. This requirement does
not specify the coder completely. The two codebooks of eqs. (3) and (4)
both transform A's and B's into 0's and 1's, but they describe different
coders.
Statistical Matching. For economy of transmission facilities the
coder may be designed to minimize the number of channel symbols required, on the average, per source symbol. This requires knowledge of the
statistics of the source. Example. The codebook of eq. (4) is more complicated than the codebook of eq. (3) and introduces delay. However, if
A's occur 99 per cent of the time and B's only 1 per cent, the output coded
via eq. (4) will require only about 0.515 channel symbol per source symbol,
whereas the output coded via eq. (3) will require one channel symbol per
source symbol, and so will take nearly twice as long to transmit. Economical coding will be discussed in Sect. 3.
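The arithmetic of this comparison is easy to check. The short sketch below (Python; an illustrative addition, not part of the original text) assumes that eq. (3) is the one-binit-per-symbol codebook A → 0, B → 1, and evaluates the pair codebook of eq. (4):

    from itertools import product

    p = {"A": 0.99, "B": 0.01}
    # Codeword lengths of the pair codebook of eq. (4):
    # AA -> 0, AB -> 10, BA -> 110, BB -> 111.
    length = {"AA": 1, "AB": 2, "BA": 3, "BB": 3}

    binits_per_pair = sum(p[a] * p[b] * length[a + b]
                          for a, b in product("AB", repeat=2))
    print(binits_per_pair / 2)   # about 0.515 channel symbol per source symbol

Coding via eq. (3) costs exactly one channel symbol per source symbol, so for this skewed source the pair code nearly halves the required transmission time.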
Channels

A discrete channel, like a coder, accepts a sequence of symbols selected
from its input alphabet and produces a related sequence of symbols selected
from its output alphabet. The precise boundaries of the channel in a given
system are a matter of choice. Example. A teletype system may be
analyzed by using as a channel the medium of transmission of the electric
pulses. A second analysis of the same system might treat the channel as
running from input keyboard to output printer.
An essential difference between a channel and a coder is that the channel
output may not be an accurate representation of its input: some information about the message may get lost in transit. This may occur in two
ways.


a. Loss. A channel is lossy if it is possible to make finer distinctions
at its input than are preserved in its output. Example. Pulses of fixed
duration and of any amplitude greater than 0 may be used successfully to
trigger a circuit which provides an output pulse of fixed duration and
amplitude, no output being produced if the input pulse is smaller than 0.
A channel is made of this circuit by using as an input alphabet pulses of the
ten amplitude levels -5, -4, ..., -1, +1, ..., +5. This channel accepts
ten input symbols and produces two output symbols: it loses the additional
amplitude information present in the input. Such a channel is shown
schematically in Fig. 2a.

FIG. 2. Some examples of discrete channels: (a) lossy channel; (b) noisy channel, accurate reception; (c) noisy channel.
b. Noise. A channel is noisy if a given input sequence may be received
as any one of a number of possible output sequences, depending on random
action of the channel. In a lossy channel the received sequence is determined by the transmitted sequence; in a noisy channel this is not true.
Examples. Figure 2b shows a noisy channel in which the noise does not
bother the receiver, who is still able to tell what has been transmitted,
although the transmitter does not know exactly what has been received.
Figure 2c shows a noisy channel in which the channel noise prevents either
transmitter or receiver from knowing with certainty what happens at the
other end of the channel.
Decoders. Any of the channels of Fig. 2 can transmit information at a
definite rate with an arbitrarily small probability of error. This can be
done simply for the first two channels by merely lumping together some
of the input symbols and output symbols, respectively. It can also be
done for the channel of Fig. 2c, by making use of the proper coder and


decoder. The coder is still like the codebooks of eqs. (3) and (4), although
the matching job performed is more sophisticated. But the decoder is
different. Since the channel performs a transformation on the input
sequences which is many-to-many rather than one-to-one, the decoder
must perform a many-to-one transformation. In the channel, each input
sequence may produce many output sequences. The decoder must decode all of these (or at least all of them which occur with appreciable
probability) into the same output sequence, if it is to avoid making errors.
This will be discussed in Sect. 7.
3. SIMPLE DISCRETE SOURCES

Self-Information Measure

Let x denote a particular event, and let Prob {x} be its probability.
The amount of information associated with the occurrence of the event x
is defined to be
(5)      I(x) = -log Prob {x},

where the choice of logarithmic base corresponds to the choice of a unit
of information. The quantity I(x) is sometimes called self-information
(Ref. 14) to distinguish it from the mutual information relating two events,
discussed in Sect. 6.
Units. Logarithms to the base 2 are chosen for eq. (5). The resulting
unit is the bit, which is the amount of information associated with the
occurrence of an event of a priori probability one-half. Other information
units include the Hartley, which is the information given by an event of
probability 1/10, and the nat, or natural unit, which is the information
given by an event of probability 1/e, where e is the base of Napierian
logarithms, e = 2.71828....
Bits and Binits. In computer terminology bit is often used as a contraction of binary digit. This practice cannot be followed in information
theory, since the occurrence of a binary symbol with a priori probability
other than 1/2 does not provide a bit of information. The word binit
will therefore be used as an abbreviation for binary digit (Ref. 15).
Properties. The information measure has the following two important
properties.
1. Since Prob {x} ≤ 1 for any event x,

(6)      I(x) ≥ 0.

2. Let x and y be two statistically independent events, and let x, y
denote the event which is their joint occurrence. Then
(7)      I(x, y) = I(x) + I(y),


since the probability of the event x, y is, by hypothesis of independence,
the product of the probabilities of x and y.
Distribution of Information

Message Source. Consider a set M of n different messages, M =
{mi}, 1 ≤ i ≤ n, and a random process that generates sequences by
selecting messages from this set. (The word message implies that successive
selections are statistically independent. See Sect. 2.) Let xk be the kth
message selected in time sequence, -∞ < k < ∞. Then xk is a random
variable, taking values from the set M = {mi}, with

(8)      Prob {xk = mi} = pi

as the probability of selecting the ith message as the kth choice. As
eq. (8) implies, it is assumed that the process is stationary, so that pi is
independent not only of the earlier selections but also of the time index k.
Bar Plot of Distribution of Information. The amount of information associated with the selection of message mi is then also independent of
k and of prior selections: it is given by

(9)      I(mi) = -log pi.

The random variable I(xk) takes its values from the set {I(mi)} with
probabilities

(10)     Prob {I(xk) = I(mi)} = pi.

Because of eq. (9), if all the probabilities pi are different from one another,
then on a bar plot of the distribution of information, the bars which give
the probabilities of the different possible information values all terminate
on the single exponential given by

(11)     p = 2^-I.
Mean and Variance. The information distribution is completely
determined by the probabilities pi, via eq. (9). The most important parameters of the distribution are its mean value,

(12)     Ī = Σ (i=1 to n) pi I(mi),

and its variance,

(13)     σI² = Σ (i=1 to n) pi [I(mi) - Ī]².


EXAMPLE 1. The source illustrated in Fig. 3 selects messages from the
set M = {A, B, C} with probabilities {0.755, 0.185, 0.060} and information
values {0.405, 2.434, 4.059}. The information distribution has mean
value 1.00 bit/symbol, and standard deviation σI = 1.10 bits/symbol.

FIG. 3. Bar plot of information distribution.

The three message probabilities terminate on the exponential of eq. (11),
shown dotted.
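The numbers in Example 1 follow directly from eqs. (9), (12), and (13); a minimal computational sketch (an illustrative addition, not part of the original text):

    from math import log2, sqrt

    p = {"A": 0.755, "B": 0.185, "C": 0.060}
    I = {m: -log2(pm) for m, pm in p.items()}                  # eq. (9)
    mean = sum(p[m] * I[m] for m in p)                         # eq. (12)
    sigma = sqrt(sum(p[m] * (I[m] - mean) ** 2 for m in p))    # eq. (13)
    print(I)              # {'A': 0.405, 'B': 2.434, 'C': 4.059}, rounded
    print(mean, sigma)    # 1.00 bit/symbol and 1.10 bits/symbol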
EXAMPLE 2. The source illustrated by Fig. 4 has an alphabet M =
{0, 1}, with probabilities {1/2, 1/2}. This distribution also has a one-bit
mean value, but zero variance.

FIG. 5. The entropy function H(p(0), p(1)) = -p(0) log p(0) - p(1) log p(1).

operation is more complicated, and Ī no longer has the form of the entropy
function of a probability distribution.
The entropy function H(p1, p2) is illustrated in Fig. 5. Since p1 + p2 = 1,
this is actually a function of a single variable only.
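The curve of Fig. 5 is easily tabulated; a brief sketch of the entropy function for two outcomes (an illustrative addition):

    from math import log2

    def H(*p):
        """Entropy of a probability distribution, in bits: -sum of p_i log2 p_i."""
        return -sum(x * log2(x) for x in p if x > 0)

    print(H(0.5, 0.5))     # 1.0 bit, the maximum of the curve of Fig. 5
    print(H(0.99, 0.01))   # 0.0808 bit: a very skewed pair conveys little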

Binary Coding

The rate of a source is a significant parameter because it determines the
communications facilities required to transmit the source output after


proper coding. The source in Fig. 4 generates information at a rate
R = 1 bit/symbol, and each output symbol is just one binit. The curve
of Fig. 5 shows that one bit is the maximum average amount of information that one binit can convey. In this case, the rate R has the interpretation that the source output may be represented in binits so as to require
R = 1 output binits per source symbol. This interpretation can be
generalized to other sources.
FIRST BINARY CODING THEOREM (Controlled Sources). Given a discrete
message source which generates information at an average rate R bits per
message and given any δ > 0, it is possible to construct a representation of
sequences of messages as sequences of binary symbols so that, on the average,
less than R + δ output binary symbols are required per input symbol from the
source. It is not possible to find a representation using fewer than R output
binary symbols per source symbol (Ref. 16).
A code which satisfies the requirements of this theorem does the job of
statistical matching referred to in Sect. 2.
Shannon-Fano Coding. The general strategy in constructing efficient binary codes is to divide the message set into two subsets of nearly
equal probability and to use the first digit of the coded output sequence
to indicate in which half the selected message lies. Each half is divided
into two subsets again by the next digit, and the process terminates on
subsets which contain only one message (Ref. 17).
This procedure is not quite explicit, however. It will not be possible
to make all dichotomies equiprobable unless all the message probabilities
are powers of 1/2. If not, then there are many possible not-quite-perfect
codes, and it is difficult to choose among them. The following procedure,
called Huffman coding, is explicit, and it gives a "best possible" code
(Ref. 18).
Huffman Coding.
1. List all possible messages in order of decreasing probability, and
assign as the last digit in the coded output a 0 to the next-to-last message
and a 1 to the last message. These two messages will agree in all the
(as yet unknown) digits preceding the last one.
2. Merge the last two messages, adding their probabilities, and insert
the sum in its proper position in the list of message probabilities. Now
repeat step 1. Continue until all messages are merged, as sketched below.
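The two steps translate directly into a short program. The following sketch (one possible implementation, not from the original text; it uses a heap in place of the re-sorted list, which is equivalent) builds the codebook by prepending a digit at each merge:

    import heapq
    from itertools import count

    def huffman(probabilities):
        """Binary Huffman code: repeatedly merge the two least probable entries."""
        tiebreak = count()                  # keeps ties from comparing dicts
        heap = [(p, next(tiebreak), {m: ""}) for m, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)     # step 1: two least probable
            p1, _, c1 = heapq.heappop(heap)
            merged = {m: "0" + w for m, w in c0.items()}
            merged.update({m: "1" + w for m, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))  # step 2
        return heap[0][2]

    print(huffman({"A": 0.755, "B": 0.185, "C": 0.060}))
    # one optimal assignment: A -> '1', B -> '01', C -> '00' (lengths 1, 2, 2),
    # the same lengths as the codebook of eq. (15) below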
EXAMPLE. The process is illustrated for the message set of Fig. 3 in
Fig. 6, by a kind of a graph which is called a tree for obvious reasons. The
code for each message is read off starting at the left node and reading the
O's and 1's which label the branches along the (unique) path terminating
in the selected message. For Fig. 6 this leads to the codebook

(15)     p(A) = 0.755     A → 0
         p(B) = 0.185     B → 10
         p(C) = 0.060     C → 11.

FIG. 6. A coding tree.

Codewords. The Prefix Property. The output sequences of a codebook are called codewords. The code of eq. (15) and of Fig. 6 illustrates a
characteristic feature of Huffman codes called the prefix property: no
codeword is a prefix of any other longer codeword.
The prefix property is a sufficient condition to guarantee that a sequence
of codewords written down in order without spacing can be uniquely decoded into a sequence of source symbols. Decodability is required in
order that the output be a representation of the input, and spaces between
words are not permitted since, if they were used, the output alphabet
would be ternary and not binary. The prefix property is not necessary,
however. It is possible to construct codes, which can be decoded after
some delay, that do not satisfy this condition.
EXAMPLE. The codebook

(16)     A → 0
         B → 01
         C → 11

is decodable, but the codewords do not satisfy the prefix condition since
0 is a prefix of 01. There is no advantage to such codes in the binary coding
case, and there seems to be none in general (Refs. 19, 20, 21).
The Szilard-Kraft Inequality. Let wi be the number of binits in
the codeword for the ith message mi. Thus for the codebook of eq. (15)
one has

(17)     w1 = 1,   w2 = 2,   w3 = 2.


The smaller the wi are, the fewer output binits are required per input
symbol. However, if they are too small, the codewords cannot all be
different. Thus if wi = 1 for all i, the only distinct codewords are 0 and
1, which cannot distinguish three different messages. A condition on the
lengths of codewords is given by:
THE SZILARD-KRAFT INEQUALITY. Given a set of n messages, it is possible
to assign a codeword of length wi to the i-th message, and to satisfy the prefix
condition, if and only if the wi satisfy the inequality

(18)     Σ (i=1 to n) 2^-wi ≤ 1.

If the codeword lengths do not satisfy this condition, no decodable code

can be constructed (Refs. 22, 23).
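Testing a proposed set of lengths against the inequality is a one-line computation; a minimal sketch (an illustrative addition):

    def kraft_sum(lengths):
        """Left side of eq. (18); a prefix code with these lengths exists iff <= 1."""
        return sum(2.0 ** -w for w in lengths)

    print(kraft_sum([1, 2, 2]))   # 1.0 -- the lengths of eq. (17): admissible
    print(kraft_sum([1, 1, 1]))   # 1.5 -- three one-binit codewords: impossible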
Coding Implications. Suppose all pi are powers of 1/2, so that all
information values are integers. Then let wi = Ii = -log pi. This gives

(19)     Σ (i=1 to n) 2^-wi = Σ (i=1 to n) 2^-Ii = Σ (i=1 to n) pi = 1,

which satisfies the constraint of eq. (18). Thus a decodable code can be
constructed in which each message has a codeword length in binits equal to
its information content in bits. Then the average codeword length, which
is the average number of output binits per input message, is

(20)     w̄ = Σ (i=1 to n) pi wi = Σ (i=1 to n) pi Ii = Ī = R.

It can be shown that no smaller value of w̄ can be obtained from codeword lengths satisfying eq. (18). This proves the First Binary Coding
Theorem for these special cases, with δ in the theorem = 0.
EXAMPLE. Consider the following set of five messages and their codes.

    Message    Probability    Codeword    Ii    wi
    m1         1/2            0           1     1
    m2         1/4            10          2     2
    m3         1/8            110         3     3
    m4         1/16           1110        4     4
    m5         1/16           1111        4     4

General Case. In general the Ii are not integers, since the pi are not
powers of 1/2. However, a decodable code can always be constructed in
which wi is the smallest integer which is greater than or equal to Ii. Then


(21)     Ii ≤ wi < Ii + 1,
         pi Ii ≤ pi wi < pi Ii + pi,
         R = Ī ≤ w̄ < Ī + 1 = R + 1,

so that the average number of output binits per message is never more
than one in excess of the average number of bits per message. This means
that if the number of bits per message is large, the percentage excess is
small. One can always make the number of bits per message large by
coding sequences of input messages, taking all possible sequences of length
L messages as a new message set, containing n^L different messages.
EXAMPLE. For the codebook of eq. (15) and the source of Fig. 6, the
codeword lengths wi are given in eq. (17). One can compute w̄, the average
number of binits per source symbol:

(22)     w̄ = Σ (i=1 to n) pi wi = 1 × 0.755 + 2 × 0.185 + 2 × 0.060
           = 1.245 binits/symbol.

Now form all 3² = 9 possible pairs of messages selected by the source
of Fig. 6. Using the Huffman coding procedure, as illustrated by the tree
in Fig. 7, gives the following set of messages, probabilities, and codes.

FIG. 7. A coding tree for message pairs.

    Message    Probability    Code
    AA         0.5700         0
    AB         0.1397         11
    BA         0.1397         101
    AC         0.0453         1001
    CA         0.0453         10001
    BB         0.0342         100000
    BC         0.0111         1000011
    CB         0.0111         10000100
    CC         0.0036         10000101


Evaluating eq. (22) in this case gives w̄ = 2.0767. The average information per message is two bits: for since successive messages are statistically
independent, the information in a sequence of two messages is the sum of
the informations in each of the two, and the average of a sum is the sum
of the averages. The efficiency of this coding in bits per binit is about
0.96. In terms of the inequality of eq. (21) this might be as low as 2/3 or
as high as 1.00. The fact that the coding here has an efficiency well
above its lower bound is typical of coding results. It is due to the fact
that the entropy curve in Fig. 5 has a very broad maximum, so that the
message set must be quite skewed in probabilities before the efficiency of
coding drops very low.
General Case Continued. Define w̄L as the average number of binits
required to code a block of L source symbols. Rewriting inequality (21)
for the new message set gives

(23)     LR ≤ w̄L ≤ LR + 1,
         R ≤ w̄L/L ≤ R + 1/L.

Now w̄L/L is the average number of output binits per original input
message, so that the last line of eq. (23) satisfies the requirements of the
First Binary Coding Theorem for any L > 1/δ. This proves the theorem
in the general case of a message source with arbitrary information distribution. It also justifies a definition which assigns the same rate to the two
very different sources of Figs. 3 and 4, if these sources can be controlled.
Controlled Source Coding. The source of Fig. 4 may be controlled to
read out one binit per second. The source of Fig. 3 may be controlled so
that its coded output produces one binit per second. This will require that
the source be speeded up when it generates A's, and slowed down when
it generates B's or C's, but on the average it will be generating very nearly
a symbol per second. The average rate of the source, then, determines
the communications facilities required to transmit its encoded output.
The differences in the distributions of Figs. 3 and 4 affect only the amount
of delay and the size of the codebook required for efficient coding into
binits. This result holds for any controlled source, whose symbol rate
may be varied in order to keep its information rate constant.
Uncontrolled Sources. Here the source generates messages at a
constant rate. If there is any variance in its information distribution, the
rate at which it generates information will then fluctuate. In an efficient
code, wi, the number of output binits in the codeword for the message
mi, is still close to the message self-information I(mi). The average number


of binits per message w̄L is still near to the source rate R. But it is possible
(though highly improbable) that all L of the messages in a block will
be those of maximum self-information. Therefore it is not possible to
transmit all message sequences as they come along unless a channel is
used which can transmit binits at a rate equal to wmax times the (fixed)
rate at which the uncontrolled source generates messages. Here wmax is
the largest of the wi, i.e., the length of the longest codeword.
Minimax Coding. If all the message sequences generated by an uncontrolled source are to be transmitted unambiguously, the best code to
use is not the Huffman code, which minimizes w̄ but will have a large
wmax if the information distribution has appreciable variance. Rather it is
better to use a code that minimizes the value of wmax, a minimax problem
with a simple solution (Ref. 24). The codewords are taken of uniform
length wm, where wm is the integer such that

(24)     2^(wm - 1) < nm ≤ 2^wm,  or  wm - 1 < log nm ≤ wm.

Here nm is the number of messages. Since there are 2^wm codewords of
length wm, there are at least enough to label all the messages. Note that
this coding procedure is quite independent of the message probability or
information distribution and depends only on the number of messages in
the set.
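In code the minimax length is simply the number of binits needed to count the messages; a sketch with illustrative values (an addition, not from the text):

    from math import ceil, log2

    def minimax_length(n_messages):
        """Smallest uniform codeword length w_m with 2**w_m >= n_messages (eq. 24)."""
        return ceil(log2(n_messages))

    print(minimax_length(9))    # 4: nine messages need four-binit codewords
    print(minimax_length(32))   # 5: a power of two is labeled exactly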
Efficient Coding: Uncontrolled Source. The average rate R at which
a source generates information still has significance when the source is
uncontrolled. Not all sequences of L messages can be coded into about
LR binits, but almost all of them can. More precisely we have:
SECOND BINARY CODING THEOREM (Uncontrolled Sources). Given a
discrete message source which generates information at an average rate R
bits per message, and given any δ > 0 and any ε > 0, it is possible to construct
a representation of sequences of messages as sequences of binary symbols so
that, for each message sequence, less than R + δ output binary symbols
are required per input symbol from the source, except for a set of message
sequences whose total probability is less than ε.
The procedure is to code the messages in blocks of length L, coding
each block into a codeword of length wi < Ii + 1. The theorem follows
because the self-information of a sequence of messages is the sum of the
self-informations of the component messages (since statistical independence
is assumed), and because the sum of a large number L of identically distributed, statistically independent random variables is very likely to be very
near, percentagewise, to L times the mean of the distribution.


Sums of Random Variables. The Second Binary Coding Theorem
follows from the weak law of large numbers. Stronger results derive from
the Tchebysheff inequality and the central limit theorem. These three
results applied to self-information are:
1. WEAK LAW OF LARGE NUMBERS. For any ε > 0 and any a > 0,
an integer L0 can be found so large that the probability that a sequence of L > L0
messages will have an amount of self-information greater than L(Ī + a) is < ε.
2. TCHEBYSHEFF INEQUALITY. For any a > 0, the probability that a
sequence of L messages has an amount of self-information greater than L(Ī + a)
is < ε = σI²/La².
3. CENTRAL LIMIT THEOREM. For any a > 0, the probability that a
sequence of L messages has an amount of self-information greater than
L[Ī + (a/√L)] is asymptotically given by the expression
(25)     ε = [1/(√(2πL) σI)] ∫ (x = a√L to ∞) e^(-x²/2LσI²) dx
           = [1/(√(2π) σI)] ∫ (y = a to ∞) e^(-y²/2σI²) dy.
Here Ī is the mean and σI² is the variance of the self-information distribution of the messages.
Coding Interpretations. Each of these results translates into a result
for efficient coding, since by eq. (21) it is possible to assign distinct binary
codewords to all possible message sequences of length L so that the difference between codeword length in binits and information in bits is less than
unity for each sequence. Thus to every sequence in a set of sequences of
total probability ≥ 1 - ε, we can certainly assign codes of length
< L(Ī + a) + 1. The remaining sequences, of total probability ≤ ε, may
all be assigned the same codeword, and will cause ambiguity or error a
fraction ε of the time.
Storage and Delay. The central limit theorem shows that for fixed
error probability ε, the difference between w̄L/L, the binits per message,
and Ī, the bits per message, decreases with blocklength L only like 1/√L.
This implies that it may be necessary to use much longer blocks to get
efficient coding for an uncontrolled source than would be required for a
controlled source, for which the difference between w̄L/L and Ī = R decreases like 1/L, as in eq. (23).
Effective Number of Messages. From the weak law of large numbers,
there is a set of message sequences of total probability > 1 - ε, each
sequence of which has self-information within ±La of LĪ. Each sequence
in this set then has probability in the range

(26)     2^(-L(Ī + a)) ≤ p ≤ 2^(-L(Ī - a)),

and the total number of sequences in the probable set, for large L, lies
in the range

(27)     (1 - ε) 2^(L(Ī - a)) ≤ N ≤ 2^(L(Ī + a)),

no matter how small ε and a. Adding unity to L ultimately multiplies the
number of messages in the probable set by 2^Ī. This is what would happen
if there were just

(28)     neff = 2^Ī

different messages in the set, and neff is therefore called the effective number of messages, or the effective alphabet size. Since Ī ≤ log n, we have

(29)     neff ≤ n.

That is, the fact that all messages are not equiprobable produces a growth
in the probable sequence set as if a smaller equiprobable message set were
being used.
4. MORE COMPLICATED DISCRETE SOURCES

Most natural sources are more complicated than those discussed in
Sect. 3. A more general source is a random process which generates
sequences of symbols like the letters of English text, in which each symbol
is selected with a probability which depends on the values of the preceding
symbols.
Joint Probabilities. Consider a set S of n different symbols,

S = {si},   1 ≤ i ≤ n.

Let xk be the symbol selected at (integer) time k, -∞ < k < +∞. Then
xk is a random variable, taking values from the set S = {si}. The random
process is well defined if the joint probabilities

(30)     pj(xk, xk-1, ..., xk-j+1)

are known for all combinations of x values and all values of j. It will
be assumed that the process is stationary (and indeed ergodic), so that the
probabilities are independent of the time index k.


Conditional Probabilities. Knowledge of the joint probabilities of
eq. (30) is equivalent to knowledge of the conditional probabilities

(31)     q0(si),   q1(si | xk-1),   ...,   qj(si | xk-1, xk-2, ..., xk-j).

The two sets of probabilities of eqs. (30) and (31) are related by

(32)     q0(si) = p1(si),
         qj(si | xk-1, xk-2, ..., xk-j) pj(xk-1, xk-2, ..., xk-j)
             = pj+1(si, xk-1, xk-2, ..., xk-j).

Markov Sources

If for some integer N and all integers j > 0

(33)     qN+j(si | xk-1, xk-2, ..., xk-N-j) = qN(si | xk-1, xk-2, ..., xk-N),

so that a knowledge of N preceding symbols gives all the probabilistic
information available about the next symbol value, then the process is
a multiple Markov process of order N (Ref. 52). When N = 1, the process
is called a simple Markov process.
Self-Information. If the process is Markov of order N, then the
self-information provided when the symbol si occurs is the negative logarithm of its probability, but this is now a conditional probability depending on the values of the preceding N symbols. The self-information is
thus a random function of the N random variables xk-1, xk-2, ..., xk-N:

(34)     IN(si | xk-1, xk-2, ..., xk-N) = -log qN(si | xk-1, xk-2, ..., xk-N).

Average Self-Information. A self-information of the symbol si which
is not a random function may be obtained by averaging eq. (34) over the
conditional probability rN(xk-1, xk-2, ..., xk-N | si) that when si occurs
at time k, the preceding N symbols will have the values xk-1, xk-2, ...,
xk-N. By Bayes's theorem,

(35)     rN(xk-1, xk-2, ..., xk-N | si) = pN+1(si, xk-1, xk-2, ..., xk-N) / p1(si),

and

(36)     IN(si) = Σ rN(xk-1, ..., xk-N | si) IN(si | xk-1, ..., xk-N),

where the sum extends over all values of xk-1, ..., xk-N,
is defined as the average self-information of the symbol si for an Nth order
Markov process.
Markov process.
Source Rate. The average rate R at which a Markov source generates
information is equal to the average ĪN of the IN(si) over the probabilities
of the symbols p1(si). This gives the average self-information per symbol
of the process:

(37)     R = ĪN = Σ (i=1 to n) p1(si) IN(si).
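For a simple (order one) Markov source, eq. (37) can be evaluated from the transition probabilities alone, by averaging over the joint occurrence of a symbol and its predecessor. The sketch below uses a hypothetical two-symbol source; the transition matrix is an assumption for illustration, not taken from the text:

    import numpy as np

    # Hypothetical simple Markov source: T[j, i] = q1(s_i | preceding s_j).
    T = np.array([[0.9, 0.1],
                  [0.5, 0.5]])

    # Stationary first-order probabilities p1 satisfy p1 = p1 T.
    w, v = np.linalg.eig(T.T)
    p1 = np.real(v[:, np.argmax(np.real(w))])
    p1 = p1 / p1.sum()

    # Eq. (37), written as an average over the joint probabilities p1(s_j) q1(s_i|s_j):
    R = -sum(p1[j] * T[j, i] * np.log2(T[j, i])
             for j in range(2) for i in range(2) if T[j, i] > 0)
    print(p1, R)    # p1 = [0.833, 0.167]; R is about 0.56 bit per symbol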

Other Sources

If the source is not a Markov process of finite order, the self-information
of a symbol may not be well defined, since it may then be a function of an
infinite number of random variables. However, the quantity IN(si | xk-1,
xk-2, ..., xk-N) given by eq. (34) is still defined for each N, and it is
called the Nth order conditional self-information of si; the quantity
IN(si) defined by eq. (36) is called the Nth order average self-information
of the symbol si. It can be shown that for any process, for all si and any
N,

(38)     IN(si) ≥ 0,
         IN(si) ≥ IN+1(si);

the average information provided by the occurrence of si when the N
preceding symbols are known is a monotone decreasing function of
N. Further knowledge of the past, on the average, makes whatever symbol
happens next more probable, and therefore less informative.
Upper Bounds on Source Rate. If an information source has a
measured first-order probability distribution p1(si), the average self-information of each of its symbols is at most equal to -log p1(si), the value
it would have if successive symbols were statistically independent. If
the source has a given conditional distribution of order N, the average
self-information of each symbol is bounded above by the average self-information IN(si) of a Markov process of order N with the same Nth
order conditional distribution. Again, statistical dependence beyond what
is contained in the given distribution can only reduce the average self-information of each symbol.
Self-Information of a Symbol. From eq. (38) it follows that one can
define a limit,

(39)     I(si) = lim (N→∞) IN(si),

which converges for each si to a non-negative number. The limit, I(si),


is the average self-information added by the occurrence of the symbol si
when all the preceding symbols are known.
Note that no general lower bound better than 0 can be obtained. A
process which looks random on the basis of Nth order statistics may always
be deterministic on the basis of statistics of order N + 1 and have an
average rate of zero.
Source Rate. The average self-information Ī of the process is again
equal to its rate R, and is given by either of the two expressions

(40)     R = Ī = lim (N→∞) ĪN = lim (L→∞) (1/L) Σ (j=0 to L-1) Īj,

where ĪN, the average Nth order self-information of the process, is given
by eq. (37).
.
More General Sources. The only type of source of greater generality
than the multiple Markov process of finite order which has been studied
in detail is called a finite-state source (Ref. 25). Note. A finite-state
source (a) includes the Markov processes of finite order but is not included among them and (b) is still not general enough to generate all and
only the grammatical sentences in English word by word (Refs. 26, 27).
Because of the complexity of natural sources like written language, indirect
methods must be used to estimate their rate. A straightforward application of the definitions involves the measurement of probability distributions
of order so high that in all the written English there would be too small a
sample for an accurate estimate.
Coding and Delay

The first binary coding theorem still applies to more complicated sources.
It carries over unaltered to any discrete ergodic source, whether it be
multiple Markov, finite state, or still more general. This may be shown
by two coding methods.
1. Block Coding. Segment the source output into blocks of length
L, and code each block from a fixed codebook containing a binary sequence
for each of the n^L possible sequences of source symbols. Each source
sequence may be coded as before into a number of binits at most one
greater than its total information content in bits.
The average self-information in a block of L source symbols is equal to
Ī0, the average self-information of the first symbol in the block when no
past history is known, plus Ī1, the average self-information of the second
symbol when the first is known, etc., the last term being ĪL-1, the average


self-information of the Lth symbol when all preceding L - 1 are known.
Averaging over all sequences gives w̄L, the average number of binits per
block of L input symbols, as bounded by

(41)     Σ (j=0 to L-1) Īj ≤ w̄L ≤ Σ (j=0 to L-1) Īj + 1.

Dividing eq. (41) by L gives the average number of binits per source
symbol:

(42)     (1/L) Σ (j=0 to L-1) Īj ≤ w̄L/L ≤ (1/L) Σ (j=0 to L-1) Īj + 1/L.

The summation in eq. (42) is the average of the first L average self-informations of the process. Since Īj cannot increase with j, the average will in
general be greater than the limit Ī = R, but it will approach the limit
as L → ∞. Given any δ > 0, it is always possible to find an L so large
that the first coding theorem is satisfied, but the required L may be very
large even if the process is Markov of small order. Example. A simple
Markov process (of order one) has Ī0 = 10 bits per symbol, and Ī1 =
R = 1/10 bit per symbol. It will take L = 100 to code the output of this
source at 50 per cent efficiency, i.e., at two output binits per input bit.
2. Conditional Coding. A more complicated but more efficient
procedure is conditional coding. Here blocks of L2 symbols are encoded
by using one of n^L1 codebooks. The codebook to be used is determined
by the preceding L1 symbols. Such a process has an encoding delay of
L = L1 + L2 input symbols, and an average number of binits per source
symbol given by

(43)     (1/L2) Σ (j=L1 to L-1) Īj ≤ w̄L2/L2 ≤ (1/L2) Σ (j=L1 to L-1) Īj + 1/L2.

EXAMPLE. In the simple Markov example given for block coding,
conditional coding will give better than 50 per cent efficiency for L1 = 1,
L2 = 10, L = 11, which is much less delay and a much smaller codebook
than is required by the L = 100 above for block coding.
Correlated Information Values. In extending the second binary
coding theorem to more complicated uncontrolled sources, an additional
kind of delay arises. The occurrence of a symbol with self-information
value above the mean may make it more probable that the succeeding
symbol will also have self-information above the mean. Then it will be
necessary to encode much larger blocks of symbols in order to make it
highly probable that the percentage deviation of the self-information of the
block from its mean value will be small. Mathematically the problem
becomes one of adding correlated rather than statistically independent
random variables, and convergence to the mean may be much slower.
The second binary coding theorem itself extends to a very broad class of
sources, but the Tchebysheff inequality and the central limit theorem do
not apply in the form given above.
5. DISCRETE NOISELESS CHANNELS

Type I Channels. The simplest noiseless channel is one in which each of
the set S = {si} of symbols which may be applied as an input to the channel
is received unaltered at its output. Since there is a one-to-one correspondence between input and output symbols, they may be identified and
called channel symbols. If all channel symbols are of equal duration,
their number n and their common duration t completely specify the channel.
Such a channel will be called a channel of type I.
Channel Capacity. The capacity of a noiseless channel is the maximum
average rate at which information can be received over it. The capacity
may be measured in bits per symbol, denoted by C, or in bits per second,
denoted by Ct. If the common symbol duration is t seconds, then

(44)     tCt = C.

For a type I channel, inequality (38) and the following discussion show
that statistical dependence between successive symbols cannot increase
the rate of transmission. Therefore the capacity can be computed by
assuming statistical independence and maximizing the average rate R
in bits per symbol,

(45)     R = -Σ (i=1 to n) p1(si) log p1(si),

with respect to variations in the probability distribution p1(si). But this
rate is just the entropy of the p1(si) distribution, which has the maximum
value log n, attained when all p1(si) are equal to 1/n. Thus for a type
I channel,

(46)     C = log n bits per symbol,
         Ct = (1/t) log n bits per second.

Redundancy. If a source is connected to a type I channel and selects
channel symbols with unequal probabilities or with statistical dependence,
the rate R = Ī at which it generates information will be less than the
capacity C of the channel. The difference C - R is defined as the (absolute)
redundancy, in bits per symbol, of the source with respect to the channel.
The ratio of absolute redundancy to channel capacity, a number between
0 and 1, is defined as the relative redundancy of the source with respect
to the channel. In terms of the interpretation at the end of Sect. 3 of
R = Ī as the logarithm of the effective alphabet size of the source, the
redundancy is a measure of the reduction in logarithm of the size of the
effective alphabet, due to nonoptimum utilization of the channel.
Type II Channels. A more complicated noiseless channel, which will be
called type II, has a different duration ti for each channel symbol si.
It is again true that capacity is attained by using symbols chosen with
statistical independence and maximizing the average rate Rt (now in bits
per second) with respect to the symbol probabilities p1(si). But the rate
is now given by

(47)     Rt = [-Σ (i=1 to n) p1(si) log p1(si)] / [Σ (i=1 to n) p1(si) ti],

and the maximization leads to the condition that the instantaneous rates
at which each symbol transmits information all be equal to the channel
capacity Ct. Thus

(48)     -[log p1(si)] / ti = Ct,  or  p1(si) = 2^(-Ct ti).

The capacity Ct is determined, for given durations ti, by the normalization requirement that the probabilities of the symbols sum to one; this
gives

(49)     Σ (i=1 to n) 2^(-Ct ti) = 1,

and Ct is the (unique) real root of this equation. Redundancy with respect
to a channel of type II is defined as it was for a channel of type I, using
Rt and Ct rather than R and C. Notice that a source is redundant with
respect to a type II channel unless the probabilities with which it chooses
symbols are unequal, and are given by eq. (48).
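Eq. (49) must in general be solved numerically for Ct. A bisection sketch (the symbol durations are illustrative assumptions, not from the text):

    def type_ii_capacity(durations, tol=1e-12):
        """Solve sum of 2**(-C t_i) = 1 for C by bisection (eq. 49)."""
        f = lambda C: sum(2.0 ** (-C * t) for t in durations) - 1.0
        lo, hi = 0.0, 1.0
        while f(hi) > 0:                 # f decreases in C; widen until bracketed
            hi *= 2.0
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
        return (lo + hi) / 2.0

    print(type_ii_capacity([1.0, 1.0]))   # 1.0: equal durations, the type I result
    print(type_ii_capacity([1.0, 2.0]))   # about 0.694 bit per second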
Type III Channels. Shannon (Ref. 28) discusses a finite-state channel,
which will be called type III. Here the symbols are of different durations,
and the alphabet of symbols available at each instant depends on the
preceding symbols which have been sent over the channel. An expression
for the capacity of such a channel has been given (op. cit.). Type III
channels have storage. They will not be discussed here, except to note


that for each such channel there is a corresponding finite-state source,
which has no redundancy with respect to the given channel. This optimizing source no longer selects successive symbols with statistical independence.
Noiseless Channel Coding Theorems. The capacity C of a noiseless
channel is a rate at which some particular optimizing source can transmit
information over the channel. Since the source whose output may need
transmission will not usually be an optimal source, this is not a justification
for considering C to be an important channel parameter. The justification
is that given a channel of capacity C, any source of rate R < C, and no
source of rate R > C, may be so encoded as to permit reliable transmission
over the channel.
Binary coding theorems one and two can be interpreted as showing how
any source can be coded into a binary noiseless channel of type I. These
theorems can be generalized.
NOISELESS CHANNEL CODING THEOREMS

I. Controlled Source. Given a discrete controlled source and a discrete
noiseless channel which has capacity Ct bits per second, it is possible to control
the source to any average rate Rt < Ct and to encode its output for unambiguous
reception over the channel. This is not possible for any Rt > Ct.
II. Uncontrolled Source. Given: a discrete uncontrolled source of type I,
II, or III with average rate Rt bits per second; a discrete noiseless channel of
capacity Ct bits per second; and any δ > 0. If Rt < Ct, it is possible to
encode sequences of source symbols for transmission over the channel so that
the probability that such a sequence will be incorrectly decoded is < δ. This
is not possible if Rt > Ct.
6. DISCRETE NOISY CHANNELS. I. DISTRIBUTION OF INFORMATION

Mutual Information

Let x and y denote two related events, and let x, y denote the event
which is their joint occurrence. Let Prob {x}, Prob {y}, and Prob {x, y}
be the associated probabilities.
The self-information given by the occurrence of x is defined in eq. (5) as

(50)     I(x) = -log Prob {x}.

If y is now observed, and x and y are not statistically independent, the
probability of x a priori will be changed a posteriori to

(51)     Prob {x | y} = Prob {x, y} / Prob {y}.


This change in the probability of x changes the amount of information
required to select it to

(52)     I(x | y) = -log Prob {x | y} = -log [Prob {x, y} / Prob {y}].

The difference between eqs. (50) and (52) measures how the amount of
information required to select x is changed by the knowledge of y. This
difference is denoted by I(x; y), the amount of mutual information between
x and y. Then

(53)     I(x; y) = -log Prob {x} + log [Prob {x, y} / Prob {y}]
                 = -log Prob {y} + log [Prob {x, y} / Prob {x}]
                 = log [Prob {x, y} / (Prob {x} Prob {y})].

Mutual information is measured in the same units as self-information
(see Sect. 3).
Properties.
1. I(x; y) is symmetric:

(54)     I(x; y) = I(y; x).

This follows from the last line of eq. (53), and justifies the name "mutual
information."
2. I(x; y) vanishes if x and y are statistically independent. If not,
there is a decomposition generalizing eq. (7):

(55)     I(x, y) = -log Prob {x, y} = I(x) + I(y) - I(x; y),

showing that I(x; y) plays the role of a correlation (Ref. 2). If Prob
{x, y} is greater than Prob {x} Prob {y}, then I(x; y) is positive.
3. I(x; y) may be positive or negative, but cannot be greater than the
self-information of x or y:

(56)     I(x; y) ≤ I(x),    I(x; y) ≤ I(y).

This follows from eq. (53), since the conditional probabilities are at most
unity, and have nonpositive logarithms (Refs. 14, 29).
Notation. Any I function whose argument contains no semicolons is
interpreted as the negative logarithm of the probability of its argument:
thus I(x | y) = -log Prob {x | y}. Any I function whose argument contains
a semicolon between two sets of variables is interpreted as in eq. (53),
where x and y stand for the expressions to the left and right of the semicolon, and x, y stands for their conjunction.


Distribution of Mutual Information

Mutual information measures how much information one symbol provides about another. It can be used in the discussion of Sect. 5 on sources
which generate sequences of related symbols. Here it will be applied only
to the discussion of noisy channels.
Discrete Noisy Channel. Consider a simple noisy channel, as illustrated in Fig. 2. There is a set U = {ui}, 1 ≤ i ≤ nu, of symbols
which may be transmitted, and a set V = {vj}, 1 ≤ j ≤ nv, of symbols
which may be received. It will be assumed throughout that the channel
is without storage and that it and the source are stationary: the probability
of transmitting the symbol ui and receiving the symbol vj is independent
of time and of prior transmissions and receptions. Let xk and yk be the
transmitted and received symbols at (integer) time k, xk ∈ U and yk ∈ V.
Then the pair (xk, yk) is a stationary random variable, taking values from
the set U × V = {ui, vj} of ordered pairs of transmitted and received
symbols, with probabilities

(57)     Prob {xk = ui, yk = vj} = p(ui, vj).

Denote the first order probabilities by

(58)     p(ui) = Σ (j=1 to nv) p(ui, vj),    q(vj) = Σ (i=1 to nu) p(ui, vj),

and the conditionals by

(59)     q(vj | ui) = p(ui, vj) / p(ui).

Mutual Information of a Noisy Channel. The amount of information given by yk about xk is also a stationary random variable, which takes
values equal to the numbers I(ui; vj) with probabilities p(ui, vj).
The distribution of this random variable is completely determined by
p(ui, vj), through eqs. (53) and (58). Its most important parameter is its
mean value, the average rate R at which the received symbols give information about the transmitted symbols:

(60)     R = Σ (i=1 to nu) Σ (j=1 to nv) p(ui, vj) I(ui; vj).


Although for particular ui, vj the mutual information may be negative,
the average R of eq. (60) is always positive.
EXAMPLE 1. The Binary Erasure Channel. As illustrated in Fig. 8, this
channel accepts two input symbols, 0 and 1, and produces three output
symbols, 0, 1, and X. With probability p its output reproduces its input;
with probability q, its input is erased and an output X indicates the erasure;
0 and 1 are transmitted with equal probability.

FIG. 8. Binary erasure channel and mutual information distribution.

When a 0 or a 1 is received, the mutual information is

(61)     I(0; 0) = log [Prob {u1, v1} / (Prob {u1} Prob {v1})]
                 = log [Prob {v1 | u1} / Prob {v1}]
                 = log [p / (p/2)] = log 2 = 1 bit = I(1; 1).

When an X is received,

(62)     I(0; X) = log [Prob {v2 | u1} / Prob {v2}]
                 = log (q/q) = log 1 = 0 = I(1; X).
This gives the information distribution shown in Fig. 8, with the average
value

(63)     R = Σ (i=1 to 2) Σ (j=1 to 3) p(ui, vj) I(ui; vj)
           = (p/2) I(0; 0) + (q/2) I(0; X) + (p/2) I(1; 1) + (q/2) I(1; X)
           = 2(p/2) × 1 + 2(q/2) × 0 = p.


EXAMPLE 2. The Binary Symmetric Channel. As illustrated in Fig. 9,
this channel also accepts the two input symbols 0 and 1, but it only produces
the same two output symbols. With probability p its output reproduces
its input; with probability q = 1 - p its output is the incorrect symbol.
0 and 1 are transmitted with equal probability, and q is < 1/2.

FIG. 9. Binary symmetric channel and information distribution.

When the correct symbol is received, the mutual information is

(64)     I(0; 0) = log [Prob {v1 | u1} / Prob {v1}]
                 = log [p / (1/2)] = log 2p > 0 = I(1; 1).

When an error is made,

(65)     I(0; 1) = log [Prob {v2 | u1} / Prob {v2}]
                 = log [q / (1/2)] = log 2q < 0.

This gives the mutual information distribution illustrated for q = 1/9
in Fig. 9, with the average rate

(66)     R = p log 2p + q log 2q = p + q + p log p + q log q
           = 1 - H(p, q),

where H(p, q) is the entropy function illustrated in Fig. 5. For q = 1/9,

H(p, q) = H(8/9, 1/9) = 0.5032,

and

R = 0.4968 bit per symbol.
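Both examples can be checked numerically from eq. (60), using the channel's transition probabilities and equiprobable inputs; a sketch (an illustrative addition):

    from math import log2

    def rate(p_u, q_vu):
        """R of eq. (60): sum of p(u_i) q(v_j|u_i) log2[q(v_j|u_i)/q(v_j)]."""
        nv = len(q_vu[0])
        q_v = [sum(p_u[i] * q_vu[i][j] for i in range(len(p_u)))
               for j in range(nv)]
        return sum(p_u[i] * q_vu[i][j] * log2(q_vu[i][j] / q_v[j])
                   for i in range(len(p_u)) for j in range(nv)
                   if q_vu[i][j] > 0)

    p, q = 0.95, 0.05               # erasure channel of Fig. 8; outputs 0, X, 1
    print(rate([0.5, 0.5], [[p, q, 0.0], [0.0, q, p]]))    # 0.95 = p, eq. (63)

    q = 1.0 / 9.0                   # symmetric channel of Fig. 9
    print(rate([0.5, 0.5], [[1 - q, q], [q, 1 - q]]))      # 0.4968, eq. (66)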
Averages of Information Measures. In addition to the average rate
R, other averages of information measures must be considered.
Notation. An average of an information function I(ui; vj) over the
joint distribution p(ui, vj) is denoted by replacing the names "ui" and
"vj" of the symbols by the names "U" and "V" of the sets from which
the symbols are selected. A single capital denotes an average over a
univariate distribution. Thus
(67)     I(U; V) = R = Σ (i=1 to nu) Σ (j=1 to nv) p(ui, vj) I(ui; vj),

         I(U) = H({p(ui)}) = Σ (i=1 to nu) p(ui) I(ui),

         I(U | V) = Σ (i=1 to nu) Σ (j=1 to nv) p(ui, vj) I(ui | vj)
                  = -Σ (i=1 to nu) Σ (j=1 to nv) p(ui, vj) log p(ui | vj).

Equivocation. The average rate at which information about the transmitted symbols is supplied to the channel is the average self-information of
the transmitted symbols. This by eq. (56) is greater than the average
rate at which such information is received. The difference is

(68)     I(U) - I(U; V) = -Σ (i=1 to nu) Σ (j=1 to nv) p(ui, vj) log p(ui | vj)
                        = I(U | V) ≥ 0.

This quantity is the conditional entropy of the set U given V. It measures
the average amount of information about the transmitted symbol which
the receiver still lacks after noisy reception, and thus the average rate at
which it would be necessary to transmit additional information over an
extra channel in order to make the receiver certain of each transmitted
symbol. This quantity is also called the average equivocation of the
received symbols. Equivocation is present in the channels of Figs. 2a
and 2c, but not in Fig. 2b.
Irrelevance. The average rate at which the received symbols give
information (subject matter unspecified) is the average self-information
of the received symbols. From eq. (56), this is also greater than the
average rate at which information is received about the transmitted symbols.
The difference may be shown, as in eq. (67), to be

(69)     I(V) - I(U; V) = -Σ (i=1 to nu) Σ (j=1 to nv) p(ui, vj) log q(vj | ui)
                        = I(V | U) ≥ 0.

This quantity is the conditional entropy of V given U. It measures the
amount of received information not relevant to the transmitted information,
but relevant only to the channel noise.
The names "Spread," "Dispersion," and "Prevarication" have all been
used for this quantity. "Irrelevance" seems more appropriate in the case,
for example, of the channel of Fig. 2b, in which the receiver receives information which is irrelevant but not misleading. The channel of Fig. 2a
has no irrelevance, but that of Fig. 2c does.
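Equivocation is just as easy to evaluate from the joint probabilities. For the erasure channel of Fig. 8 with q = 0.05 and equiprobable inputs, the sketch below gives I(U | V) = q = 0.05 bit per symbol, exactly the gap between the one bit supplied per symbol and the R = p bits received:

    from math import log2

    def equivocation(p_joint):
        """I(U|V) of eq. (68): minus the sum of p(u_i, v_j) log2 p(u_i | v_j)."""
        q_v = [sum(row[j] for row in p_joint) for j in range(len(p_joint[0]))]
        return -sum(p * log2(p / q_v[j])
                    for row in p_joint for j, p in enumerate(row) if p > 0)

    p, q = 0.95, 0.05
    joint = [[p / 2, q / 2, 0.0],        # p(u_i, v_j): rows are inputs 0 and 1
             [0.0, q / 2, p / 2]]
    print(equivocation(joint))           # 0.05 bit per symbol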
7. DISCRETE NOISY CHANNELS. II. CHANNEL CAPACITY AND INTERPRETATIONS

The Noisy Channel Specification

Formally a noisy channel without storage is a graph, like those in Figs.
2, 8, and 9, in which each branch connects a transmitted symbol with a
received symbol and has a number on it. The numbers are the conditional
probabilities q(Vj IUi) which define statistically what the channel does to
given input symbols. These numbers are fixed for a given channel, but the
transmitter is free to decide how to use the input symbols. Only the channel
with no storage will be considered.
Transmitter Strategy. The transmitter strategy formally is a random
process, selected by the transmitter to generate sequences of transmitter
symbols for transmission over the noisy channel. If an input message
sequence is coded into a sequence of transmitter symbols, the random
process which generates the messages and the operation of the coder may
be combined to obtain the new random process which is the transmitter
strategy.
In Sect. 4, eq. (38) et seq., it was pointed out that the self-information
of a symbol can on the average only be reduced by statistical dependence
on preceding symbols. The same is true of the mutual information provided by a received symbol about transmitted symbols for a channel with
no storage. If the transmitter wants to maximize the average amount of
mutual information received, he can do no better than to select successive
transmitted symbols independently from some distribution P(Ui). Then
the problem of choosing a transmitter strategy reduces to the problem of
choosing a first order distribution p(ui) for the transmitted symbols. Then
p(ui) and the q(vj | ui) together determine channel operation completely.


Capacity of a Noisy Channel. The channel capacity C of a given noisy
channel is defined as the maximum value of the transmission rate R which
can be obtained by varying the transmitter strategy. For a channel with
no storage, this is the maximum R which can be obtained by varying p(ui),
with q(vj | ui) fixed. Thus from eq. (69),

(70)     C = max R = max Σ (i=1 to nu) Σ (j=1 to nv) p(ui) q(vj | ui) log [q(vj | ui) / q(vj)],

where the maxima are taken over the p(ui), the values of q(vj | ui) are held
fixed as the p(ui) are varied, and the variation of q(vj) is determined from
the relation

(71)     q(vj) = Σ (i=1 to nu) p(ui) q(vj | ui).

The maximization is carried out by differentiating eq. (70) for R with
respect to each of the p(ui), subject to the constraint that

(72)     Σ (i=1 to nu) p(ui) = 1.

However, this maximization may lead to negative values for some of the
p(ui). It is then necessary to eliminate one or more of the input symbols
by setting its probability at zero, and to maximize again with a smaller
input set, until a maximum R is obtained with all p(ui) non-negative (Ref. 30).
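For a channel with only two input symbols, the maximization of eq. (70) reduces to a one-dimensional search over p(u1). A numerical stand-in for the differentiation described above (an illustrative sketch, reusing the rate function sketched earlier):

    def capacity_two_input(q_vu, steps=10000):
        """Grid search for C = max over p(u_1) of R (eq. 70), two-input channel."""
        return max(rate([k / steps, 1.0 - k / steps], q_vu)
                   for k in range(1, steps))

    q = 1.0 / 9.0
    print(capacity_two_input([[1 - q, q], [q, 1 - q]]))
    # about 0.4968 = 1 - H(8/9, 1/9): the maximum falls at p(0) = p(1) = 1/2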
Interpretations of Capacity

One interpretation of the capacity C of a noisy channel is provided by
its definition. In Example 1, Fig. 8, the binary erasure channel has a rate
of transmission R = p when 0's and 1's are transmitted with equal probabilities, as shown by eq. (63). This is the maximum rate attainable for
this channel, and is therefore its capacity, C = p. One bit of information
per symbol is supplied to the channel, and on the average p bits of information about the transmitted symbol are received. But the transmission process
is not reliable. Since information is being supplied to the channel at a
rate greater than the channel capacity, not all of it can get through, and
the channel determines in a random fashion which bits will be lost and
which will be saved.
Feedback Interpretation. In the binary erasure channel, suppose that
the transmitter can look over the receiver's shoulder and can see which of
the transmitted symbols have been erased in the channel. This can be
accomplished by having a noiseless feedback channel from receiver to
transmitter. Every time a transmitted digit is erased the transmitter can
repeat it, going on to the next digit as soon as the first unerased version
of the preceding one has been received.


The transmitter is now supplying the channel with information at an
average rate of p bits per transmitted symbol. The repeated digits do
not give additional information about the message to be transmitted, but
only about where erasures have occurred in transmission. The receiver
receives information at the same average rate of p bits per symbol, and
receives each message symbol once. Here channel capacity has the interpretation it has in the case of a noiseless channel: it is the maximum
average rate at which information can be transmitted reliably over the
channel. In fact noiseless channel coding Theorem I: Controlled Source
(Sect. 5) applies verbatim to this noisy channel.
Coding Interpretation. A noiseless feedback channel is not usually
available. Fortunately it turns out that it is not needed. It is possible to
obtain reliable transmission over a noisy channel, at any rate less than
channel capacity, by proper encoding, without making use of any feedback information. This is the primary justification of "capacity" as a
significant parameter for a noisy channel. Indeed it is perhaps the most
important single justification for the definition of mutual information and
the whole structure of information theory. The formal expression of this
fact is the:
NOISY CHANNEL CODING THEOREM. Given: a discrete source of type I,
II, or III (Sect. 5) with average rate Rt bits per second; a discrete noisy channel
without storage of capacity Ct bits per second; and any δ > 0. If Rt < Ct,
it is possible to encode sequences of source symbols for transmission over the
channel so that the probability that such a sequence will be incorrectly decoded
is < δ. This is not possible if Rt > Ct.
Relation to Noiseless Case. This theorem is essentially the second
Noiseless Channel Coding Theorem, with a few minor modifications. Both
of these theorems may be strengthened to give relations between the error
probability 0, the difference Ct - R t between rate and capacity, and the
length L of the sequence of source symbols which must be encoded. Results
like the Tchebysheff Inequality and the Central Limit Theorem must be
applied to both the source self-information distribution and the channel
mutual information distribution. Some work has been done on these
problems recently (Refs. 31-34).
Implications. The Noisy Channel Coding Theorem shows that a lack
of reliability in a channel does not impose a corresponding lack of reliability
on the received and decoded messages. This alone is not surprising. For
example in the binary erasure channel, transmitting at rate R = 1/N bits
per symbol by repeating each message binit N times for transmission, the
error probability per message binit is

(73)     q^N,

since the receiver can decode each message binit unless all N repetitions
are erased. The probability in eq. (73) can be made arbitrarily small,
but only by letting R → 0.
The theorem also states, however, that error probability can be made
arbitrarily small without decreasing rate R, so long as R < C, the channel
capacity. This requires proper encoding of long sequences of source symbols.
Construction of Codes. There is as yet no analog to the Huffman
code for noisy channels. Considerable work has been done in designing
codes for the binary symmetric channel of Fig. 9 (Refs. 35-43). However,
no simple, explicit coding procedure has yet been found for transmitting
at rates arbitrarily close to channel capacity with arbitrarily small error
probability.
The only constructive procedure available transmits at rates less than
capacity. However, if the rate is kept fixed, it is possible for the receiver
to set the error probability as low as he desires, but this depends on how
much delay he is willing to tolerate. This procedure has been discussed
for the binary symmetric channel (Ref. 40). It will be illustrated here for
the binary erasure channel of Fig. 8.
Error-Free Coding for the Binary Erasure Channel

In the binary erasure channel of Fig. 8 the error probability can be reduced by using some of the input binits as information symbols and some
as check symbols. Such a coding procedure is illustrated in Fig. 10.
FIG. 10. Parity check coding for the binary erasure channel. (Panels: message; received noisy message with erasures marked x; decoded message.)


Parity Check Codes. Assume that the probability q of erasure per digit in the channel is 0.05. In Fig. 10, each group of four successive message digits has added to it by the coder a fifth digit for checking. The added digit is selected to be a 0 or a 1 so as to make the total number of 1's in the block of five coded digits even. A check digit of this type is called a parity check (Ref. 35). If the channel erases only one digit or none in each block of five, the receiver can correct the erasure by filling it in with a 0 or 1 so as to make the total number of 1's in the block even again.
The channel may erase two or more digits in a single block of five, as shown in the third block in Fig. 10. The receiver cannot correct that block. However, with q = 0.05, so that on the average only 5 per cent of the transmitted symbols are erased, the receiver will be able to correct more than three-quarters of the erasures. In fact, the average number of erasures per block of N = 5 remaining after correction is equal to Nq, the average number before correction, less Np^(N−1)q, the contribution to the average due to blocks containing a single erasure. This gives

(74)    Nq − Np^(N−1)q = Nq[1 − p^(N−1)] = Nq[1 − (1 − q)^(N−1)]
                       < Nq[1 − (1 − q)^N]
                       < Nq[1 − (1 − Nq)] = (Nq)².

Thus the number of erasures remaining is reduced from Nq to (Nq)², by a factor of Nq = 5(0.05) = 0.25.
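The arithmetic of eq. (74) is easy to confirm by simulation. The following Python sketch (the numpy library, the block length N = 5, the erasure probability q = 0.05, and the number of trial blocks are all assumptions of the illustration) erases digits at random and applies the single-parity-check correction:

```python
import numpy as np

rng = np.random.default_rng(0)
N, q, blocks = 5, 0.05, 200_000

# Erase each digit of each N-digit block independently with probability q.
erased = rng.random((blocks, N)) < q

# A block is correctable when it contains at most one erasure, since the
# missing digit can then be filled in to restore even parity.
per_block = erased.sum(axis=1)
remaining = np.where(per_block >= 2, per_block, 0)

print(per_block.mean())   # about Nq = 0.25 erasures per block before correction
print(remaining.mean())   # about 0.046, below the bound (Nq)**2 = 0.0625
```

The simulated residue agrees with the exact value Nq − Np^(N−1)q ≈ 0.046 and lies under the bound (Nq)².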
Behavior of First Stage. In the coding procedure illustrated in Fig. 10, input information is being supplied at an average rate of R = 0.80 bit per input binit. The capacity of the channel is C = p = 1 − q = 0.95 bit per symbol. The resultant probability of remaining erasures is < ¼ of the original probability of erasure in the channel. Without reducing the input information rate it is possible to reduce the probability of remaining erasures by iterating this kind of checking procedure. The second step in such an iteration is illustrated by Fig. 11.
Iteration. In Fig. 11, the size of the basic block has been increased from 5 to N₁ = 10 digits, the first nine being message digits and the tenth being a parity check on the preceding nine. After nineteen such blocks of ten, the coder adds a twentieth check block. The first digit in the check block is a parity check on the first digits in each of the nineteen preceding blocks, i.e., digits 1, 11, 21, …, 181 in time order. The second digit in the check block checks all the second digits in preceding blocks and so on; the last digit in the check block checks the nine preceding check digits, and it is in fact a parity check on the whole group of (10)(20) = N₁N₂ = 200 digits. This is visualized most easily as in Fig. 10, in which the blocks are aligned below one another, and each digit in the check block


checks the column above it. The last digit in the check block checks
both the row to its left and the column above it.
The receiver decodes each row which has no more than a single erasure
as soon as the check digit at the end of that row is received. If a row has
more than one erasure, it can still be decoded properly when the check
block arrives, if each column has only one erasure left after the first decoding step.
FIG. 11. Iterated checking for the binary erasure channel. (Panels: transmitted message; received noisy message with erasures marked x; after correction by rows; after correction by columns.)

Erasure Probability. Since none of the digits appearing in a single column have ever been together in a check group before, they are statistically independent, and the distribution of erasures in each column is binomial again. Define q as the erasure probability in the channel, q₁ as the average erasure probability remaining after correction of rows, q₂ as the average erasure probability remaining after checking by columns, p₁ = 1 − q₁, p₂ = 1 − q₂. Then

(75)    N₁q₁ = N₁q − N₁p^(N₁−1)q < (N₁q)².

For q = 0.05, N₁ = 10, this gives q₁ < ½q. For N₂ = 20, q₂ < ½q₁.
Further Iteration. The next step, keeping N₁ = 10 and N₂ = 20, is to add a check layer of 200 check digits after 39 layers of 20 blocks of ten digits each have been transmitted. This third order check will again multiply the erasure probability by a factor < ½. This procedure can


be repeated indefinitely, giving for the kth order check a remaining erasure probability

(76)    q_k < (N_k q_{k−1}) q_{k−1} < ½ q_{k−1},    N_k = 2N_{k−1} = 2^(k−1) N₁,

or q_k < 2^(−k) q.

In the limit as k → ∞, the remaining erasure probability becomes arbitrarily small. The rate of transmission in bits per symbol is just the fraction of input symbols which are message digits and not check digits. This is

(77)    R = (1 − 1/10)(1 − 1/20)(1 − 1/40) ⋯
          > 1 − (1/10 + 1/20 + 1/40 + ⋯)
          = 1 − (1/10)(1 + ½ + ¼ + ⋯)
          = 1 − 2/10 = 0.80.

Thus the rate is at least as great as the rate in the simple block checking
scheme of Fig. 10, but the error probability is as low as the receiver cares
to set it if the transmitter adds the check digits of all orders, and if the
receiver is willing to wait long enough for a sufficiently high order check to
come along before decoding.
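The growth of the iteration is summarized by a few lines of Python (a sketch under the values N₁ = 10 and q = 0.05 used above; the bound of eq. (76) is applied at each stage):

```python
q, N = 0.05, 10          # channel erasure probability and first-stage block length
rate = 1.0
for k in range(1, 9):
    rate *= 1 - 1/N      # message fraction contributed by stage k, eq. (77)
    q = (N * q) * q      # bound of eq. (76) on the remaining erasure probability
    N *= 2               # N_k doubles from stage to stage
    print(k, q, rate)
```

The printed q is halved at every stage while the rate product remains above 0.80, as eqs. (76) and (77) assert.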
Relation between Error Probability, Rate, and Delay. The iterative coding procedure just discussed is not optimum. However, it shares
two characteristics with optimum systems.
1. The reliability attained increases, for fixed rate, as the permitted
coding delay increases.
2. The reliability attained increases, for fixed delay, as the required
transmission rate decreases.
For any noisy channel there is a trading relationship between the probability P_e of residual error, the permissible delay N, the transmission rate
R, and the channel capacity C. Here N is the number of symbols delay
permitted between the transmission of a given symbol and the computation
of its decoded version. The best terms of trade can be shown to give an
approximately exponential decrease of error probability with delay:
(78)    P_e ≅ e^(−N x(C, R)),

in the sense that

(79)    lim_(N→∞) (−log P_e)/N = x(C, R)

exists for C > R as a positive number. The function x(C, R) is called the exponent of the error probability. For C − R small but positive, the exponent is approximately given by


(80)    x(C, R) ≅ (C − R)²/(2σ_I²),

where σ_I² is the variance of the mutual information distribution for the given channel, and for the transmitter distribution P(u_i) which attains capacity (Refs. 31, 32, 34, 41).
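The quantities entering eq. (80) are easily computed for a concrete channel. The following Python sketch (numpy assumed; the binary symmetric channel with crossover probability p = 0.1 and the rate R = 0.9C are arbitrary illustrative choices) evaluates the capacity, the variance σ_I², and the approximate exponent:

```python
import numpy as np

p = 0.1                                         # crossover probability
C = 1 + p*np.log2(p) + (1 - p)*np.log2(1 - p)   # capacity, bits per symbol

# Under the uniform (capacity-attaining) input, the mutual information
# I(x; y) = log2 [q(y|x)/q(y)] takes just two values:
vals = np.log2(np.array([2*(1 - p), 2*p]))      # received symbol correct / in error
probs = np.array([1 - p, p])
var_I = np.sum(probs * vals**2) - C**2          # sigma_I**2 of eq. (80)

R = 0.9 * C
print(C, var_I, (C - R)**2 / (2*var_I))         # approximate exponent x(C, R)
```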
8. THE CONTINUOUS CASE

Continuous Sources

A waveform like that of Fig. 12 shows two kinds of continuity. It takes on a continuum of amplitude values, and its amplitude changes continuously with time.
FIG. 12. A continuous waveform.

Quantization. The amplitude continuity may be removed by amplitude quantization, as in Fig. 13. This may be accomplished by a quantizer,
which has an amplitude transfer characteristic of the staircase type as
illustrated. The output is a waveform whose amplitude values are selected

FIG. 13. An amplitude-quantized waveform.

from a discrete set, but whose jumps from one value to another occur at
arbitrary times. The difference between the input signal and the quantized
output is the quantization noise (Ref. 44).
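The staircase operation is easily expressed in Python (numpy assumed; the step size of 0.25 and the test waveform are arbitrary choices of the illustration):

```python
import numpy as np

def quantize(x, step):
    # Staircase transfer characteristic: round each amplitude to the
    # nearest quantizing level, the levels being spaced `step` apart.
    return step * np.round(x / step)

t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2*np.pi*3*t) + 0.3*np.sin(2*np.pi*7*t)
y = quantize(x, step=0.25)

print((x - y).std())   # rms quantization noise, roughly step/sqrt(12) = 0.072
```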
Sampling. The time continuity may be removed by making periodic observations of the amplitude of the waveform, deriving from the con-


tinuous time function a sequence of sample values. The period of the
sampling is called the sampling interval. The resultant samples still have
amplitudes selected from a continuous set.
SAMPLING THEOREM. If a waveform x(t) is bandlimited to frequencies between 0 and W cycles per second, then it is completely determined by its samples x(kT) taken at a sampling interval T = 1/(2W) seconds. The function x(t) may be re-created from its sample values by the expansion

(81)    x(t) = Σ_(k=−∞)^(∞) x(kT) [sin π(2Wt − k)]/[π(2Wt − k)].
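The expansion may be checked numerically. The Python sketch below (numpy assumed; the band limit W, the test signal, and the truncation of the infinite sum to 801 terms are assumptions of the illustration) reconstructs the waveform between its samples:

```python
import numpy as np

W = 4.0                    # band limit, cycles per second
T = 1.0 / (2.0 * W)        # sampling interval of the Sampling Theorem

def x(t):                  # test signal with components at 1 and 3 cycles per second
    return np.sin(2*np.pi*1.0*t) + 0.5*np.cos(2*np.pi*3.0*t)

k = np.arange(-400, 401)   # truncation of the infinite expansion
samples = x(k * T)

def reconstruct(t):
    # eq. (81): np.sinc(u) is sin(pi u)/(pi u), so this is the stated sum
    return np.sum(samples * np.sinc(2*W*t - k))

for t in (0.13, 0.37, 0.50):
    print(x(t), reconstruct(t))    # the columns agree closely
```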

FIG. 15. Binary pulse-code modulation.

Self-Information of Continuous Signals. In a sampled sequence
which is not quantized, each sample value is selected from an infinite set
and may have infinite self-information. If the samples x(kT) are statistically independent and have the probability density p(x), then - log p(x)
and its average value

(82)    H(X) = −∫_{−∞}^{∞} p(x) log p(x) dx

are still definable in many cases, but they no longer represent information
values. The quantity H(X) is still called the entropy of the distribution
with density p(x), but is no longer the average self-information of the
source per sample. The entropy function in the continuous case is not
invariant under a change in the scale by which the amplitude x is measured.
The infinite self-information associated with a selection from a continuous set arises from the fact that the selection of a single real number
between 0 and 1 is equivalent to the selection of an infinite sequence of
binary digits, namely the binary expansion of the real number, and conversely.
Continuous Noisy Channels

Only stationary channels, without storage, and with bandwidth limited from 0 to W cycles per second will be considered. Such a channel is defined by a conditional probability density function q(y|x). For a given value x of the transmitted sample, q(y|x) gives the density of the distribution of possible received values y.


EXAMPLE. Additive Noise. Let z be a noise voltage selected with probability density r(z), and let the received signal y be the sum of the transmitted signal and the noise, y = x + z. Then

(83)    q(y|x) = q(x + z|x) = r(z) = r(y − x).

Thus a continuous channel with bandwidth W and additive noise is completely specified by the distribution of the noise which is added.
Mutual Information and Rate. If each transmitted sample value x is selected from a probability density p(x), and the channel is specified by the conditional density q(y|x), the joint density p(x, y) = p(x)q(y|x) defines both the channel and the transmitter strategy, in analogy to the discrete case. The probability density q(y) of the received sample values is then given by

(84)    q(y) = ∫_{−∞}^{∞} p(x, y) dx.

The random variable

(85)    I(x; y) = log [p(x, y)/(p(x)q(y))]

is again defined as the mutual information between x and y, and its average value

(86)    R = I(X; Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) log [p(x, y)/(p(x)q(y))] dx dy

is the average rate of transmission of mutual information, in bits per sample. This measure retains its informational significance while self-information does not, because mutual information is invariant to a change in the scale on which both the transmitted sample x and the received sample y are measured.
EXAMPLE. Additive Noise. As before, let y = x + z, where z is an added noise, statistically independent of x. Then by eqs. (83) and (86),

(87)    R = I(X; Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) log [q(y|x)/q(y)] dx dy
          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) log [r(y − x)/q(y)] dx dy
          = −∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) log q(y) dx dy + ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) log r(y − x) dx dy
          = H(Y) − H(Z).


Channel Capacity. The capacity C of a noisy continuous channel is the maximum value of the rate R which may be obtained in eq. (86) by varying the probability density p(x) with which the transmitted sample values are chosen. The variation is usually constrained so as to keep constant a given peak or mean square value of the time function x(t), which is determined through eq. (81) by the sample values. In general, finding C is a difficult variational problem.
EXAMPLE. Additive Noise. By eq. (87), the rate R in a channel with independent additive noise is the difference between the entropy of the distribution of received signal values y and the entropy of the distribution of the additive noise z. For a given channel, the noise distribution is fixed, so that maximizing the rate reduces to the maximization of H(Y) by variation of p(x): thus from eqs. (83) and (84),

(88)    max_{p(x)} H(Y) = max_{p(x)} {−∫_{−∞}^{∞} [∫_{−∞}^{∞} p(x) r(y − x) dx] log [∫_{−∞}^{∞} p(x) r(y − x) dx] dy},

subject to constraints on peak or average transmitter power. This problem is difficult.
Entropy of Gaussian Distribution. If the sample values x(kT) of a bandlimited function x(t) are selected with statistical independence from a distribution with density p(x), the mean square value of the time function x(t) is equal to the mean square value of the samples (Ref. 44), which is given by

(89)    x̄² = ∫_{−∞}^{∞} x² p(x) dx.

Thus x̄² is the signal power S. For x̄² = S fixed, the distribution with maximum entropy is Gaussian (Ref. 16), with

(90)    p(x) = [1/√(2πS)] e^{−x²/2S},

and entropy, from eq. (82), given by

(91)    H(X) = −[1/√(2πS)] ∫_{−∞}^{∞} e^{−x²/2S} (log [1/√(2πS)] − (x²/2S) log e) dx
             = log √(2πS) + ½ log e = ½ log 2πeS.
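Equation (91) may be verified by direct numerical integration of eq. (82). A Python sketch (numpy assumed; the signal power S = 2 is an arbitrary choice):

```python
import numpy as np

S = 2.0                                          # signal power (variance)
x = np.linspace(-40.0, 40.0, 400_001)
p = np.exp(-x**2 / (2*S)) / np.sqrt(2*np.pi*S)   # eq. (90)

H = -np.trapz(p * np.log2(p), x)                 # eq. (82), in bits
print(H, 0.5*np.log2(2*np.pi*np.e*S))            # both about 2.547 bits
```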


Gaussian Additive Noise. If an added bandlimited noise z is statistically independent of the signal x, and either x or z has zero mean value,

(92)    ȳ² = x̄² + z̄² = S + N,

where N is the mean noise power. Thus if the channel is given and the average transmitter power is constrained, the received power is determined by eq. (92). The entropy H(Y) in eq. (87) will then be maximized if q(y) is Gaussian with variance S + N.
However, the sum of two independent random variables cannot have a
Gaussian distribution unless both random variables themselves have
Gaussian distributions (Ref. 49). Thus only if the noise is Gaussian can
the transmitter select a (Gaussian) p(x) which will lead to a maximum
H(Y). The rate will then be
(93)    R = H(Y) − H(Z) = ½ log 2πe(S + N) − ½ log 2πeN
          = ½ log (1 + S/N) bits per sample.
Rewriting eq. (93) on a bits-per-second basis gives:
CAPACITY OF A CHANNEL WITH ADDITIVE GAUSSIAN NOISE. Given a channel bandlimited from 0 to W cycles per second, with an average transmitter power S, perturbed by additive white Gaussian noise of total power N, its capacity is

(94)    C = W log (1 + S/N) bits per second.
The restriction to white noise (noise which has a uniform spectral density
in the interval 0 to W cycles) is required in order that successive samples
of noise be statistically independent. If they are not, the capacity will
be greater than that given by eq. (94).
Dependence of Capacity on Bandwidth. Holding S fixed and increasing W, the noise power N increases with W, since noise power in

frequencies previously rejected now enters the channel. For thermal noise and shot noise, the noise power N is directly proportional to W:
(95)    N = N₀W watts,

where N₀ is the noise power per cycle of bandwidth. Substituting this in eq. (94) gives

(96)    C = W log (1 + S/N₀W),

which is plotted in Fig. 16.

FIG. 16. Channel capacity and bandwidth. (As W → ∞, C approaches the asymptote (S/N₀) log₂ e, i.e., 1.44 bits, or 1 nat, per second per unit of S/N₀.)
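The curve of Fig. 16 is reproduced by a few lines of Python (numpy assumed; S = N₀ = 1 is an arbitrary normalization):

```python
import numpy as np

S, N0 = 1.0, 1.0
for W in (0.1, 0.5, 1.0, 2.0, 10.0, 100.0):
    C = W * np.log2(1 + S/(N0*W))   # eq. (96), bits per second
    print(W, C)
# As W grows, C approaches the asymptote (S/N0)*log2(e), about 1.44 S/N0
# bits per second, as shown in Fig. 16.
```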
Interpretations of Capacity

Some interpretation of the capacity of a continuous noisy channel is
required, since the channel can accept input information at an infinite
rate, but can only transmit information about its input to the receiver at a
finite rate.
Discrete Input Interpretation. One interpretation is provided by
the fact that the noisy channel coding theorem still applies to the continuous noisy channel. The output of a discrete source may be encoded
for transmission over a continuous noisy channel at any rate less than
channel capacity, and the receiver can then decode the received signal
with arbitrarily small error probability. In this case the transmitted
signals may be continuous waveforms, but they are selected from a finite
set, and therefore have finite self-information (Ref. 45).
Quantization Interpretation. If the receiver cannot distinguish
between transmitted waveforms which are very near to one another, it
is not necessary to transmit the precise waveform generated by a continuous
source: some "near-by" waveform will do. The quantization process
discussed earlier shows one procedure for selecting a near-by waveform.
The transmitted waveforms of any finite duration are then a discrete set,
and one may be selected by the receiver with small error probability
despite the noisy channel. Other measures of distance, or fidelity of reproduction, have been introduced and studied (Refs. 16, 46).
Reduction of Ignorance Interpretation. A final interpretation also carries over from the discrete case. If successive samples are statistically independent, the receiver knows a priori that x will be selected from p(x). A posteriori the true value of x is selected from p(x|y), a narrower distribution with less entropy. The change in entropy,

(97)    H(X) − H(X|Y),

measures the average reduction in the receiver's ignorance of the value of the transmitted sample (Refs. 16, 29).


Generalizations. The analysis of the continuous case has been extended to cases of mixed type, i.e., to distributions which have discrete
probabilities as well as densities (Refs. 16, 50, 51), and some discussion
has been given of the nonbandlimited case (Ref. 46).

REFERENCES

1. Y. Bar-Hillel and R. Carnap, Semantic information, in W. Jackson, Editor, Communication Theory, Butterworths, London, 1953.
2. W. J. McGill, Multivariate information transmission, Trans. I.R.E., PGIT-4, 93-111, Sept. 1954.
3. S. Kullback, An application of information theory to multivariate analysis, Ann. Math. Stat., 23, 88-102, March 1952.
4. B. Mandelbrot, Simple games of strategy occurring in communication through natural languages, Trans. I.R.E., PGIT-3, 124-137, March 1954.
5. M. P. Schutzenberger, On some measures of information used in statistics, in C. Cherry, Editor, Information Theory, Butterworths, London, 1956.
6. R. A. Fisher, Theory of statistical estimation, Proc. Cambridge Phil. Soc., 22, 700-725 (1925).
7. S. O. Rice, Mathematical analysis of random noise, Bell System Tech. J., 23, 282-332 (July 1944); 24, 46-156 (Jan. 1945). Reprinted in N. Wax, Editor, Noise and Stochastic Processes, Dover, New York, 1954.
8. N. Wiener, The Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Technology Press and Wiley, New York, 1949.
9. D. Middleton and D. Van Meter, Detection and extraction of signals in noise from the point of view of statistical decision theory, J. Soc. Ind. Appl. Math., 3, 192-253 (Dec. 1955); 4, 86-119 (June 1956).
10. W. W. Peterson, T. G. Birdsall, and W. C. Fox, The theory of signal detectability, Trans. I.R.E., PGIT-4, 171-212, Sept. 1954.
11. C. Cherry, On Human Communication, Technology Press and Wiley, New York, 1957, esp. p. 247, footnote.
12. R. Jakobson, G. Fant, and M. Halle, Preliminaries to speech analysis, M.I.T. Acoust. Lab. Rept. 13, 1952.
13. R. V. L. Hartley, Transmission of information, Bell System Tech. J., 7, 535-563 (July 1928).
14. R. M. Fano, Statistical Theory of Information, Technology Press, Cambridge, Mass., 1957.
15. M. J. E. Golay, Bits and binits, Proc. I.R.E., 42, 1452 (Sept. 1954).
16. C. E. Shannon, A mathematical theory of communication, Bell System Tech. J., 1948, as reprinted in C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, Univ. of Illinois Press, Urbana, 1949. See p. 28.
17. R. M. Fano, The transmission of information, M.I.T. Research Lab. Electronics Tech. Rept. 65, March 1949.
18. D. A. Huffman, A method for the construction of minimum-redundancy codes, Proc. I.R.E., 40, 1098-1101 (Sept. 1952).
19. A. A. Sardinas and G. W. Patterson, A necessary and sufficient condition for unique decomposition of coded messages, I.R.E. Convention Record, Pt. 8, 104-109, March 1953.


20. B. Mandelbrot, On recurrent noise limiting coding, in Proceedings of the Symposium on Information Networks, Polytechnic Institute of Brooklyn, New York, 1955.
21. M. P. Schutzenberger, On an application of semi-group methods to some problems in coding, Trans. I.R.E., IT-2, 47-60, Sept. 1956.
22. L. K. Kraft, A Device for Quantizing, Grouping and Coding Amplitude-Modulated Pulses, S.M. Thesis, Elec. Eng. Dept., M.I.T., 1949.
23. B. McMillan, Two inequalities implied by unique decipherability, Trans. I.R.E., IT-2, 115-116, Dec. 1956.
24. B. Mandelbrot, Diagnostic et transduction en l'absence de bruit, Institut de Statistique de l'Université de Paris, Paris, 1955.
25. Shannon, op. cit. in Ref. 16, p. 22.
26. N. Chomsky, Three models for the description of language, Trans. I.R.E., IT-2, 113-124, Sept. 1956.
27. N. Chomsky, Syntactic Structures, Mouton and Co., London, 1957.
28. Shannon, op. cit. in Ref. 16, p. 26.
29. P. M. Woodward, Probability and Information Theory, with Applications to Radar, McGraw-Hill, New York, 1953.
30. S. Muroga, On the capacity of a discrete channel, J. Phys. Soc. Japan, 8, 484-494 (1953).
31. A. Feinstein, A new basic theorem in information theory, Trans. I.R.E., PGIT-4, 2-22, Sept. 1954.
32. C. E. Shannon, The rate of approach to ideal coding (abstract only), I.R.E. Convention Record, Pt. 4, 47, March 1955.
33. P. Elias, Coding for noisy channels, I.R.E. Convention Record, Pt. 4, 37-46, March 1955.
34. C. E. Shannon, Certain results in coding theory for noisy channels, Information and Control, 1, 6-25 (Sept. 1957).
35. R. W. Hamming, Error detecting and error correcting codes, Bell System Tech. J., 29, 147-160 (1950).
36. M. Plotkin, Binary codes with specified minimum distance, Univ. Penna. Moore School Research Div. Rept. 51-20, 1951.
37. M. J. E. Golay, Binary coding, Trans. I.R.E., PGIT-4, 23-28, 1954.
38. E. N. Gilbert, A comparison of signalling alphabets, Bell System Tech. J., 31, 504-522 (1952).
39. I. S. Reed, A class of multiple error-correcting codes and the decoding scheme, Trans. I.R.E., PGIT-4, 38-49, Sept. 1954.
40. P. Elias, Error-free coding, Trans. I.R.E., PGIT-4, 29-37, Sept. 1954.
41. P. Elias, Coding for two noisy channels, in C. Cherry, Editor, Information Theory, Butterworths, London, 1956.
42. D. Slepian, A class of binary signalling alphabets, Bell System Tech. J., 35, 203-234 (Jan. 1956).
43. D. Slepian, A note on two binary signalling alphabets, Trans. I.R.E., IT-2, 84-86, June 1956.
44. W. R. Bennett, Spectra of quantized signals, Bell System Tech. J., 27, 446-472 (July 1948).
45. C. E. Shannon, Communication in the presence of noise, Proc. I.R.E., 37, 10-21 (Jan. 1949).
46. A. N. Kolmogorov, On the Shannon theory of information transmission in the case of continuous signals, Trans. I.R.E., IT-2, 102-108, Dec. 1956.
47. P. Elias, Predictive coding, Trans. I.R.E., IT-1, 16-33, March 1955.


48. B. M. Oliver, J. R. Pierce, and C. E. Shannon, The philosophy of PCM, Proc. I.R.E., 36, 1324-1331 (1948).
49. H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Math. No. 36, Cambridge, England, 1937.
50. S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Stat., 22, 79-86 (March 1951).
51. K. H. Powers, A unified theory of information, M.I.T. Research Lab. Electronics Tech. Rept. 311, Feb. 1956.
52. J. L. Doob, Stochastic Processes, Wiley, New York, 1953, esp. p. 89.

D

INFORMATION THEORY AND TRANSMISSION

Chapter 17

Smoothing and Filtering

Pierre Mertz

1. Definitions: Smoothing and Prediction. Symbols   17-01
2. Definitions: Correlation   17-05
3. Relationship between Correlation and Signal Structure   17-09
4. Design of Optimum Filter   17-13
5. Extensions of Procedure   17-19
6. Network Synthesis   17-25
References   17-32

1. DEFINITIONS: SMOOTHING AND PREDICTION. SYMBOLS

Time Sequence of Data. A plot is shown in Fig. 1 of a small portion of a time sequence of data, f(t). Such a time sequence may also be represented by an electrical signal, in which the variable is a voltage or current. The sequence of data may be taken only at successive discrete intervals of time, instead of continuously. This is illustrated by the discrete ordinates indicated in Fig. 1.

FIG. 1. Variation of a physical quantity with time.


Stationary Time Sequence of Data. The variation of a physical quantity with time constitutes a continuous time sequence of data. The values form a distribution. If this distribution does not show a long-range trend with time, the time sequence is said to be stationary. (See Chap. 13.) A quasi-stationary time sequence is a distribution that is statistically stationary (i.e., shows no trend) in the short range but not in the long range.

Errors or Noise. There is usually a random error in the determination of a given physical quantity, or in its representation by a given electrical signal. This is illustrated by the erratic solid line of Fig. 2. The dotted line is the same plot as Fig. 1.

FIG. 2. Variation of a physical quantity, with superposed error, with time.
A random error may be considered as added to the actual physical
quantity. If there is no long time trend in the distribution of the random
error, it is also a stationary sequence.
In electrical signals the added sequence is usually called "noise," and
in industrial processes, "disturbances."
Smoothing Problem. For data with random errors such as Fig. 2, an averaging process could be used to reduce the error. This assumes that the physical sequence is "smoother" than the data sequence. In an electrical signal representing the data such an averaging process can be carried out by a filter.
Optimum Filter. There is likely to be some filter design which has an
optimum frequency-response characteristic. If the filter suppresses the
rapid departures too much, it also suppresses some real variations in the
physical quantity represented by the data. If, on the other hand, it does
not suppress them sufficiently, it is not reducing the error as much as is
feasible. For a classical theoretical analysis by Wiener see Ref. 1. The
optimum filter as designed by mathematical theory is not usually critical.
In practice an elementary filter is generally devised which approximates
it and gives almost equal performance.


Predicting Problem. It is occasionally desirable not only to smooth
the data, but also to extrapolate or predict the data. For example in Fig.
2 it may be needed, at time t, to predict the most likely signal which will
occur at time t + T. This is feasible because the variation of the physical
quantity is restrained by physical laws, and it does not have complete or
random liberty of action.
Wiener's analysis indicates that prediction may also be effected with a
filter. An optimum design is secured from nearly the same formulation
as that used for the smoothing process.
FIG. 3. Amplitude and phase characteristics vs. frequency of an optimum filter.

EXAMPLE. Prediction Filter. The transfer amplitude response and transfer phase shift of a smoothing and predicting filter designed according to the Wiener theory are presented in Fig. 3. The ordinates are shown as functions of the radian frequency ω.

FIG. 4. Signal and noise power spectra.

The filter is designed for a prediction time of 1 second, and for the signal and noise power spectrum illustrated in Fig. 4. More details regarding the problem and the design are given below.
Prediction: Discrete Data. When the data are discrete rather than
continuous, the solution is not simply embodied in the form of an electrical
filter. The solution describes instead a mathematical process that accomplishes an analogous averaging and predicting effect.
Basis of Treatment. In the present exposition, the Wiener theory is
followed (Ref. 1). Advantage is taken of the treatments of Levinson
(Ref. 2) and of Bode and Shannon (Ref. 3) in simplifying the presentation.
It is to be recognized that much other work has been done on the subject.
At the close of the chapter some of this work, particularly recent development, is noted.
Symbols

A, B            constants
A_n             numerator of partial fraction
A(ω)            amplitude part of transfer response, in nepers
a, b, c         portions of an integral
B(ω)            phase of transfer response, in radians
b_i             coefficient of p^i in polynomial
C               parameter
C, D, E, F      numerators of partial fractions, sometimes with subscript indices
C_j             capacitance of jth element in filter
e               base of Napierian logarithms
F(ω)            signal correlation function
f, f(t)         signal amplitude
g, g(t)         noise amplitude
g_j             coefficient of p in continued fraction
h(t)            total instantaneous wave amplitude
I, II           integrals, forming part of more extensive formulas
i               √−1
I(ω)            current, function of frequency
J, H            limits of summation indices
j, h            summation indices
K(ω)            transfer response function of frequency, of Wiener filter
K₀(x)           Bessel function of second kind, pure imaginary argument
k(t)            transfer response function of time, of Wiener filter
L_j             inductance of jth element in filter
M, N            limits of summation indices
m, n            indices
p               Heaviside operator = iω
Q(ω)            factor of Φ(ω)
q(t)            Fourier transform of Q(ω)
R               resistance
s               variable of integration, for radian frequency
T               prediction time
t               time variable
u               variable of integration, for radian frequency
V(ω)            voltage, function of frequency
v(t)            voltage, function of time
v₀              voltage amplitude
X, Y            numerators of partial fractions
Y(ω)            transfer response function of frequency; also admittance, function of frequency
Z(ω)            impedance, function of frequency
Z₁(p)           driving point impedance, function of operator p
Z_T(p)          transfer impedance, function of operator p
Z_T(ω)          transfer impedance, function of frequency
α, β            zeros of polynomials, with imaginary parts > 0
α*, β*          complex conjugates of corresponding α's and β's
γ               constant
θ               integration limit on time variable
κ               constant
λ(ω)            natural logarithm of Ψ(ω)
μ, ν            indices
τ               time variable, for correlation or integration
Φ(ω)            correlation spectrum, Fourier transform of φ(τ)
Φ_n(ω)          correlation spectrum of nth derivative of input function
Φ_ff(ω), Φ_gg(ω)    autocorrelation spectra
φ               phase shift
φ(τ)            correlation function
φ_n(τ)          correlation function of nth derivative of input function
φ_ff(τ), φ_gg(τ)    autocorrelation functions
φ_fg(τ), φ_gf(τ)    cross-correlation functions
Ψ(ω)            factor of Φ(ω) for which singularities have imaginary parts > 0
Ψ*(ω)           complex conjugate of Ψ(ω)
Ψ_n(ω), Ψ_n*(ω)     corresponding factors of Φ_n(ω)
ψ(τ), ψ*(τ)     Fourier transforms of Ψ(ω), Ψ*(ω)
ω               radian frequency
ω_j             zero of polynomial, with imaginary part > 0
ω_j*            complex conjugate of ω_j

2. DEFINITIONS: CORRELATION

Autocorrelation Function. The amplitude of a signal at any given time is not wholly independent of its value at other times. The correlation that exists may be expressed in terms of an autocorrelation function. In Fig. 5 the signal amplitude f is measured at times t and t + τ. The autocorrelation function φ(τ) of a signal is the average product of the signal at time t and the signal at time t + τ, averaged over a period of

time long enough to smooth out instantaneous fluctuations. With the overscribed bar to indicate averaging,

(1)    φ(τ) = \overline{f(t)f(t + τ)} = lim_{θ→∞} (1/2θ) ∫_{−θ}^{θ} f(t)f(t + τ) dt,

where τ = finite time shift.

FIG. 5. Data taken for determination of autocorrelation coefficient.

The autocorrelation function is a measure of the extent to which the value of f(t) at any given time can be used to predict f(t) at a time interval τ later.
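In sampled form the time average of eq. (1) is computed directly. The following Python sketch (numpy assumed; the first-order random sequence is an arbitrary stand-in for f(t), chosen because its true autocorrelation is known) estimates φ(τ):

```python
import numpy as np

rng = np.random.default_rng(1)

n, a = 100_000, 0.9
f = np.empty(n)
f[0] = rng.standard_normal()
for i in range(1, n):    # unit-variance first-order sequence
    f[i] = a*f[i-1] + np.sqrt(1 - a*a)*rng.standard_normal()

def phi(tau):
    # time average of f(t) f(t + tau), the sampled form of eq. (1)
    return np.mean(f[:n-tau] * f[tau:])

for tau in (0, 1, 2, 5, 10):
    print(tau, phi(tau), a**tau)    # estimate vs. theoretical value
```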
EXAMPLE. An autocorrelation function is illustrated in Fig. 6. This is the one assumed for the signal whose spectrum is illustrated in Fig. 4.

FIG. 6. Autocorrelation function.


The autocorrelation function in general is at a maximum at τ = 0, and it usually drops off to zero or nearly zero for large values of τ. It shows even symmetry about τ = 0. The peak is broad when f(t) contains primarily low frequencies and narrow when f(t) contains primarily high frequencies.

Power Density Spectrum. It is useful to deal with the Fourier transform of the autocorrelation function φ(τ), which may be called the autocorrelation spectrum, Φ(ω). This is

(2)    Φ(ω) = ∫_{−∞}^{∞} φ(τ) e^{−iωτ} dτ,

and reciprocally

(3)    φ(τ) = (1/2π) ∫_{−∞}^{∞} Φ(ω) e^{iωτ} dω.

Wiener (Refs. 1 and 4) has identified the autocorrelation spectrum Φ(ω) with the power density spectrum of the signal.

3. RELATIONSHIP BETWEEN CORRELATION AND SIGNAL STRUCTURE

In the factored forms of eqs. (12) and (13) the α's and β's are taken with imaginary parts > 0, and α*, β* are the complex conjugates of α, β. The α's are the zeros, and the β's the poles, of the function |Y(ω)|². A plot of these, for the function plotted in Fig. 7, is given in Fig. 13. There

FIG. 13. Singularities of rational correlation function. (○ zeros, × poles.)

are only poles, one being above, and the other below, the axis of real ω's. The response indicated by Fig. 9, with zero phase shift, is not represented by a rational function, and hence it cannot be described simply by zeros and poles in the complex frequency plane.
Location of Zeros and Poles for Minimum Phase Network. From Bode and Shannon (Ref. 3), the minimum phase network has the transfer response function

(13)    Y(ω) = K (ω − α₁)(ω − α₂)/[(ω − β₁)(ω − β₂)].
The zeros and poles are all in the upper half-plane of the complex frequency space. A plot of the zeros and poles which applies to the amplitude


response of Fig. 9 and phase shift of Fig. 11 is shown in Fig. 14. In this specific case there are no zeros and only one pole, which is in the upper half-plane.

FIG. 14. Singularities of physically realizable shaping network. (○ zeros, × poles.)

4. DESIGN OF OPTIMUM FILTER

Criterion of Optimization. A criterion of performance is necessary to judge when one filter design is better than another. That used by Wiener is based on a comparison between the filter output and the actual signal, freed of noise, at the extrapolated time. The difference, or error, is measured as a function of time, and its root-mean-square value determined. The optimum filter is taken as that for which the root-mean-square error (Chap. 13) is a minimum. This assumption is important to the development of the theory.
Wiener Solution, Smoothing and Prediction. The optimum filter proposed by Wiener has a frequency response which may be designated as K(ω). This has a Fourier transform k(t). The reciprocal relations between the two are:

(14)    K(ω) = ∫_{−∞}^{∞} k(t) e^{−iωt} dt,

(15)    k(t) = (1/2π) ∫_{−∞}^{∞} K(ω) e^{iωt} dω.

In the simple Wiener solution the transfer response of the optimum filter has the value

(16)    K(ω) = [1/(2πΨ(ω))] ∫_0^∞ e^{−iωt} dt ∫_{−∞}^{∞} [Φ_ff(u)/Ψ*(u)] e^{iu(t+T)} du.


Here Ψ(ω) and Ψ*(ω) are the factors of the correlation spectrum Φ(ω) of eqs. (8) and (9), taken, as in eqs. (12) and (13), such that the singularities of Ψ(ω) have imaginary parts > 0 and Φ(ω) = Ψ(ω)Ψ*(ω) (eq. 17). The factorization, with Φ(ω) assumed known, is given by Levinson (Ref. 2) as

(19)    λ(ω) = (1/π) ∫_0^∞ [ω log Φ(s)/(ω² − s²)] ds,

(20)    Ψ*(ω) = Φ(ω)/Ψ(ω),

where λ(ω) is the natural logarithm of Ψ(ω).

Identification with Minimum Phase Network. The connection of eq. (17) with eqs. (12) and (13) identifies Ψ(ω) with the transfer response characteristic Y(ω) of the minimum phase shaping network which has the amplitude response characteristic

(27)    |Y(ω)| = √Φ_ff(ω) = 1/√(ω² + 1).

The Fourier transform is found in Campbell and Foster (Ref. 7). This is tabulated with the argument p = iω instead of ω, so that

(28)    Φ_ff(p) = −1/(p² − 1).

From pair 444 (Ref. 7),

(29)    φ_ff(τ) = ½ e^{−|τ|}.

This is represented in Fig. 6, ignoring the factor ½.
For the noise alone:

(30)    Φ(ω) = 1/(ω² + 1) + e²/[1 + (ω/C)²],    C → ∞,

(31)    φ(τ) = ½[e^{−|τ|} + e²C e^{−C|τ|}].

For the conditions illustrated in Figs. 9 and 10, where it is assumed that the phase response of the shaping network is zero (Sect. 3),

(33)    Y(ω) = √Φ_ff(ω) = (ω² + 1)^{−1/2},

(34)    Y(p) = (1 − p²)^{−1/2}.

From pair 558 (Ref. 7),

(35)    ψ(τ) = (1/π) K₀(|τ|).

Here K₀ is the Bessel function of the second kind, with pure imaginary argument (the actual argument being i|τ|). It is tabulated in Watson (Ref. 8). This is illustrated in Fig. 10.
For the shaping network assumed as physically realizable:

(36)    Ψ(ω) = −i/(ω − i),

(37)    Ψ(p) = 1/(p + 1).

From pair 438 (Ref. 7),

(38)    ψ(t) = e^{−t},    t > 0.

This is illustrated in Fig. 12.

(39)    Ψ*(ω) = i/(ω + i),

(40)    Ψ*(p) = −1/(p − 1).

From pair 439 (Ref. 7),

(41)    ψ*(t) = e^{t},    t < 0.

(42)    Ψ(ω) = e[ω − (i√(1 + e²)/e)]/(ω − i),    Ψ*(ω) = e[ω + (i√(1 + e²)/e)]/(ω + i).

The partial fraction expansion needed in eq. (16) is

Φ_ff(ω)/Ψ*(ω) = 1/{e(ω − i)[ω + (i√(1 + e²)/e)]} = D/(ω − i) + E/[ω + (i√(1 + e²)/e)],

where

D = −i/(e + √(1 + e²)),    E = i/(e + √(1 + e²)).

Only the term in D survives the integrations of eq. (16), and

(43)    K(ω) = D e^{−T}/[Ψ(ω)(ω − i)]
             = e^{−T}/[(e + √(1 + e²))(√(1 + e²) + ieω)]
             = e^{−T} · [1/(e + √(1 + e²))] · exp[−i tan^{−1}(eω/√(1 + e²))]/√(1 + e² + e²ω²).

The form of eq. (43) indicates the nature of the variables in K(ω). The first factor depends only upon the prediction time T, and the second only upon the noise spectral density e². The denominator of the third factor is a combined function of noise density and frequency. All three of these quantities are real and hence affect only the amplitude response of K(ω). The numerator of the third factor is complex. It has a modulus of unity and expresses the phase shift of K(ω). The amplitude response and the phase are the two quantities plotted in Fig. 3.
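Equation (43) evaluates readily. The Python sketch below (numpy assumed; the prediction time T = 1 second follows the example, while the noise density e² = 1 is an arbitrary choice) yields the amplitude response and phase shift of the kind plotted in Fig. 3:

```python
import numpy as np

def K(w, T=1.0, e2=1.0):
    # eq. (43); e2 is the noise spectral density e**2, T the prediction time
    e, root = np.sqrt(e2), np.sqrt(1 + e2)
    amplitude = np.exp(-T) / ((e + root) * np.sqrt(1 + e2 + e2*w**2))
    phase = -np.arctan(e*w / root)
    return amplitude * np.exp(1j*phase)

for w in (0.0, 0.5, 1.0, 2.0):
    k = K(w)
    print(w, abs(k), np.angle(k))   # amplitude and phase vs. radian frequency
```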
EXAMPLE 2. A second illustration, by Wiener, assumes no noise. The function required of the filter is prediction. Consider

(44)    Φ(ω) = 1/(ω² + 1)².

Then

(45)    Ψ(ω) = −1/(ω − i)²,

(46)    ψ(t) = t e^{−t},    t > 0.


From eq. (18),

(47)    K(ω) = [1/Ψ(ω)] ∫_0^∞ ψ(t + T) e^{−iωt} dt,

(48)    K(ω) = e^{−T}(1 + T + iωT).

This solution is also reached from the alternative formulation of eq. (26). The solution consists of a term independent of ω and a term which comprises a differentiation of the input wave. This was also the case in the previous example, except that there the total band was limited by the denominator. In the present case the total band is infinite.
In polar form the solution is

(49)    K(ω) = e^{−T} √(1 + 2T + T² + ω²T²) exp {i tan^{−1}[ωT/(1 + T)]}.

5. EXTENSIONS OF PROCEDURE

Filters with Lag. On occasion the urgency in time of reproduction of a signal mixed with noise is not so great as the need for greatest feasible reduction of the noise in the reproduction.
In such cases the prediction time T is advantageously changed into a lag, that is, T becomes negative. The optimum filter formula of eqs. (16) and (26) still holds generally, but there are difficulties in carrying out the second integration, with the lower limit of zero.
Wiener (Ref. 1) suggests an approximation thus:

(50)    e^{iωT} ≅ [1 + (iωT/2ν)]^ν/[1 − (iωT/2ν)]^ν.

The single case is considered where the ratio of eq. (24) contains only one pole in the upper half-plane of ω. This is assumed of the first order, at ω₁. A possible pole in the lower half-plane may be at ω₂*. Then

(51)    [Φ_ff(ω) e^{iωT}/Ψ*(ω)] ≅ {[1 + (iωT/2ν)]^ν/[1 − (iωT/2ν)]^ν} · F/[(ω − ω₁)(ω − ω₂*)],

(52)    ≅ Σ_{n=1}^{ν} A_n/[1 − (iωT/2ν)]^n + X/(ω − ω₁) + Y/(ω − ω₂*).


In eq. (52) the first summation term in the partial fraction expansion is approximated only to n = N ≤ ν. Then
(53)    K(ω) ≅ {Σ_{n=1}^{N} A_n/[1 − (iωT/2ν)]^n + X/(ω − ω₁)}/Ψ(ω).

In the example which was illustrated by eqs. (36) to (43), Wiener gives the result as

(54)    K(ω) = (T²/2)/{[1 + (T/2)][(T/2)√(1 + e²) + e](1 + iω)}
             + 1/{(√(1 + e²) + eiω)[1 − (iωT/2)]}
             + [1 − (T/2)]/{[1 + (T/2)](e + √(1 + e²))(√(1 + e²) + eiω)}.
Where the noise level is high and the signal weak a simple formula is obtained for the optimum filter with lag.
Where the correlation is described in the nth derivative of the input function, its correlation spectrum Φ_n(ω) may be factored, as in eq. (17), into

(59)    Φ_n(ω) = Ψ_n(ω)Ψ_n*(ω).

Then the Wiener formula for the optimum filter is, for the prediction time T,

(60)    K(ω) = 1 + iωT + ⋯ + (iωT)^{n−1}/(n − 1)!
             + [ω^n/(2πΨ_n(ω))] ∫_0^∞ e^{−iωt} dt ∫_{−∞}^{∞} [Ψ_n(u)/u^n] e^{iut} [e^{iuT} − 1 − (iuT) − ⋯ − (iuT)^{n−1}/(n − 1)!] du.

This is the form of the equation in which no noise is assumed in the input. Where there is noise the Ψ_n(u) in the integrand would be replaced by the expression Φ_ff(ω)/Ψ*(ω) in eq. (16).
EXAMPLE. Assume that the correlation spectrum of eq. (44) describes the correlation in the second derivative of the input function and that no noise is present. That is,

(61)    Φ₂(ω) = 1/(ω² + 1)².

Then the solution for the filter for the prediction time T, as given by eq. (60), is

K(ω) = 1 + iωT + [ω²/(2πΨ₂(ω))] ∫_0^∞ e^{−iωt} dt ∫_{−∞}^{∞} [Ψ₂(u)/u²] e^{iut}(e^{iuT} − 1 − iuT) du.

As in eq. (46),

Ψ₂(ω) = −1/(ω − i)².

Let

I = −(1/2π) ∫_{−∞}^{∞} [e^{iut}/(u²(u − i)²)] (e^{iuT} − 1 − iuT) du
  = −(1/2π) ∫_{−∞}^{∞} [e^{iu(t+T)}/(u²(u − i)²) − e^{iut}/(u²(u − i)²) − iT e^{iut}/(u(u − i)²)] du.

The integration of the third term of the integrand is accomplished by the


combination of pairs 210 and 442 (Ref. 7) and gives

c = −Tt e^{−t} − T e^{−t} + T,    t > 0.

The integration of the first and second terms is accomplished by the further application of pair 210, and gives

a = −(t + T) e^{−(t+T)} − 2 e^{−(t+T)} − (t + T) + 2,    t + T > 0,
b = t e^{−t} + 2 e^{−t} + t − 2,    t > 0.

Thus

I = a + b + c = −t e^{−t}(e^{−T} + T − 1) − e^{−t}(T e^{−T} + 2 e^{−T} + T − 2),    t > 0.

Continuing to the second integration of eq. (60),

II = ∫_0^∞ I e^{−iωt} dt = ∫_0^∞ (A t e^{−t} + B e^{−t}) e^{−iωt} dt = −A/(ω − i)² − iB/(ω − i),

where A = −(e^{−T} + T − 1) and B = −(T e^{−T} + 2 e^{−T} + T − 2). Then

K(ω) = 1 + iωT + ω²A + ω²B + iω³B,

(62)    K(ω) = 1 + iωT + (iω)²(T e^{−T} + 3 e^{−T} + 2T − 3) + (iω)³(T e^{−T} + 2 e^{−T} + T − 2).

In this solution, there are three successive differentiations of the input wave to add to the constant term. As in the previous example, characterized by eq. (48), the absence of noise in the assumptions leads to an infinite band in the filter.
"Filters" for Discrete Data. The signal which has been assumed in
the discussion up to this point has been continuous. However, it could be
expressed as the amplitude of a succession of discrete pulses, such as the
maximum daily temperatures at a given location.
Where the data are discrete, in electrical form or not, they cannot be
passed through a physical electrical filter to carry out the operations which


have been discussed. The operations can, however, be expressed in terms of mathematical processes, with only small changes from the previous description.
For discrete data eq. (1) becomes

(63)    φ(n) = lim_{M→∞} [1/(2M + 1)] Σ_{m=−M}^{M} f(m) f(m + n),

and eq. (2) becomes

(64)    Φ(ω) = Σ_{n=−∞}^{∞} φ(n) e^{−iωn}.

The factorization problem of eq. (17) is modified somewhat, because Φ(ω) is more likely to be an empirically determined rather than an analytically expressed quantity.
Wiener (Ref. 1) has indicated various means for handling this problem; one method is the following. Equation (17) may be written:

(65)    log Φ(ω) = log Ψ(ω) + log Ψ*(ω).

The logarithm of Φ(ω) may also be expanded in the Fourier series

(66)    log Φ(ω) = Σ_{n=−∞}^{−1} a_n e^{−iωn} + a₀ + Σ_{n=1}^{∞} a_n e^{−iωn}.

With Φ, and therefore log Φ, real, the series shows symmetry between positive and negative n's. From the discussions regarding eqs. (10), (11), (12), and (17) to (20), and recognition that the first term at the right of eq. (66) shows no response at positive times (n > 0), and the third term no response at negative times (n < 0), the terms of eq. (66) may be identified with those of eq. (65). That is,

(a₀/2) + Σ_{n=1}^{∞} a_n e^{−iωn}

is identified with log Ψ(ω), and

(a₀/2) + Σ_{n=−∞}^{−1} a_n e^{−iωn}

with log Ψ*(ω).
This permits the computation of Ψ(ω) and Ψ*(ω) from the Fourier series of eq. (66). The Fourier series itself may be obtained empirically by numerical computation or by the use of a harmonic analyzer.
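The computation may be sketched in Python (numpy assumed; the sample spectrum is an arbitrary positive function chosen for the illustration). The coefficients a_n come from a discrete Fourier transform of log Φ(ω); the one-sided sum of eq. (66) is then exponentiated to give Ψ(ω):

```python
import numpy as np

M = 1024
w = 2*np.pi*np.arange(M) / M
Phi = 1.1 - np.cos(w) + 0.32*np.cos(w)**2      # sample spectrum, real and positive

a = np.fft.ifft(np.log(Phi)).real              # coefficients a_n of eq. (66)

log_psi = np.full(M, a[0]/2, dtype=complex)    # the (a0/2) term
for n in range(1, M//2):
    log_psi += a[n] * np.exp(-1j*w*n)          # one-sided sum: log Psi(w)

Psi = np.exp(log_psi)
print(np.max(np.abs(Phi - np.abs(Psi)**2)))    # nearly 0: Psi Psi* recovers Phi
```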
Other empirical solutions of the factorization problem have been presented. Some of these relate to the determination of the minimum phase


of a network which shows an empirical transfer amplitude response characteristic (Refs. 6 and 9).
Additional empirical solutions for the factorization have been presented,
related to other problems (Ref. 10).
Alternatives to Wiener Criterion of Optimization. Lee (Ref. 11) has explored the possibilities of an alternative to the Wiener criterion. He minimizes the integral square cross-correlation error (integrated with respect to time). He expresses the conditions in terms of an autocorrelation of the autocorrelation function (as if the latter were itself a signal) and a cross-correlation of the autocorrelation function and the cross-correlation function. In these terms the specifications for the optimum filters are completely analogous to those of Wiener.
Zadeh and also Middleton and van Meter have outlined possibilities in
the design of an optimum filter which uses the methods of decision theory.
(See Ref. 12.) This utilizes all the information known to the designer about
the signal and the noise, and optimizes the decision. This optimum minimizes the integrated risk of wrong decisions. The results obtained are
chiefly of conceptual value, because the computational problems are
formidable even in relatively simple situations.
Other authors have also applied criteria of or akin to those used in
decision theory (Ref. 13). The procedures are particularly effective conceptually where the signal interpretation occurs under extremely unfavorable noise conditions, such as for the signals from fringe areas of search
radars. The paper of Middleton and van Meter (Ref. 12) contains a
complete bibliography.
Zadeh and Ragazzini (Ref. 14) have extended the Wiener theory to the
case where the data are described by a nonstationary time series. Particularly they assume its approximation by a polynomial of a given order
in time, with unknown coefficients. Such cases have a certain practical
interest. The treatment uses an approach suggested in a report by Bode,
Blackman, and Shannon, in 1948, to the Research and Development
Board.
Chang (Ref. 15) has enlarged the Wiener criterion by considering also
integrated squares of errors in the frequency domain. The integration
further weights these errors according to arbitrary functions of the frequency. He develops two theorems, a minimization theorem and a separation theorem. The first is an extension of the Wiener theorem in terms of
frequency, and the second represents an alternative procedure to the
factorization methods of Wiener.
Still other authors (Ref. 16) have given consideration to extending the
hypotheses under which the Wiener criterion can be used. In particular
they have extended the treatment to include nonstationary noise and a
system having time-varying linear parameters.


Nonlinear Prediction. The discussion so far has centered on a filter
which performs linear operations on the signal input to it. That is, it
multiplies the Fourier components of that input by a given numerical
factor and shifts their phase by a given angle. Both of these vary with the
frequency of the component, but they are independent of the amplitude
of the component.
Some thought has been given by Bode and Shannon (Ref. 3) and others
(Ref. 17) to the possibilities of nonlinear prediction, in which the operation
on the signal would vary with the signal amplitude. At the expense of
functional complications, this permits improvement in the accuracy of
prediction under certain conditions. The problem has not been worked out
to anything like the analytical detail devoted to linear prediction.
6. NETWORK SYNTHESIS

Introduction. Many of the problems discussed so far lead to a solution in the form of an electrical filter. Specification of the transfer properties of this filter comes out as the end product of the solution in terms of frequency as K(ω), or of time as k(t). In an actual case a further step is necessary, namely the network synthesis. The filter or network must be built, and it needs specification in terms of components.
This is an art with an extensive background and innumerable ramifications. The scope of the present discussion is limited to electric networks, with some references for amplification. In data signal processing, on occasion the mechanical properties of equipment may affect signal propagation. Some discussions of electromechanical elements have been given by Everitt and Anner, and by Graham (Ref. 18).
The stage of analysis considered at this point bridges some of the steps from a theoretical toward a schematic design of the equipment. Some of the solutions which have been advanced in the present chapter, particularly to problems which are largely of prediction and exclude noise, are essentially formulas for analog computation (see Vol. 2, Analog Computers).
Components of Electric Networks. The elements composing electric networks within the limited scope of this treatment consist of resistances, inductances, and capacitances. These elements are marked by various relations between voltage across them and current flow.
The voltage may be set up as a function of time, as

v(t) = v₀ cos (ωt + φ),

or as a function of radian frequency, as

V(ω) = v₀ e^{iφ}.

Here it is taken as a complex quantity. The current may be similarly expressed.


The relationship between the two, for a resistance, is

V(ω) = R I(ω),

where R is the value of the resistance (idealized as independent of frequency). For an inductance the relation is

V(ω) = iωL I(ω),

and for a capacitance it is

V(ω) = I(ω)/(iωC).

The quantity

Z(ω) = V(ω)/I(ω)

is called the impedance, and its reciprocal

Y(ω) = 1/Z(ω) = I(ω)/V(ω)

is called the admittance.
The impedance (or admittance) can be expressed for any aggregation of interconnected elements ending in two terminals. As such it may be called the "driving point impedance" (or admittance) of that network at the specified pair of terminals. In an aggregation it is also possible to measure the voltage at one pair of terminals, and the current at another pair. Here the voltage to current ratio is called the transfer impedance between the two pairs of terminals, and the current to voltage ratio the transfer admittance.
Specification in Terms of Transfer Response. The properties required in filters such as have been specified in Fig. 3, or in eqs. (16), (26), (43), and others, have been expressed as a transfer response. When a function of frequency, it has been called K(ω), and when of time, k(t). According to the circumstances of the particular equipment considered, the response as a function of frequency may be set up as a transfer impedance or a transfer admittance. It may also be a transfer ratio merely of voltages, or of currents.
For simplicity, consideration here is limited to a case practical in vacuum tube circuitry, where the input to the filter is taken as a current, and the output from it a voltage. That is, K(ω) is identified with a transfer impedance, or Z_T(ω).
Cauer's Method of Synthesis. In Fig. 15 a vacuum tube gives output
I into the filter. The voltage V across the terminating resistance R drives
a succeeding vacuum tube. The figure shows a generally practical case
of the filter both starting and ending with a bridged capacitance. The
filter itself therefore comprises an odd number of elements. If in any case


it is desired to omit one of the end capacitances it may be assumed to approach zero.
For a filter of this type, the transfer impedance is a rational function of ω with n poles and no zeros, and can be written as

(67)    Z_T(ω) = Z₀/[(ω − ω₁)(ω − ω₂) ⋯ (ω − ω_n)].

Cauer (Ref. 19) has indicated a method for synthesizing the network from this function, which has been noted by Peless and Murakami (Ref. 20).

FIG. 15. Low-pass ladder filter.

For this, Z_T is changed to a function of p = iω, thus

(68)    Z_T(p) = R/(b_n p^n + b_{n−1} p^{n−1} + ⋯ + b₁p + 1).

The normalizing factor becomes R.
Then the driving point impedance Z₁(p) of Fig. 15 (excluding the terminating resistance R) is found by dividing the even part of the denominator by the odd part, and multiplying the whole by the normalizing factor R:

(69)    Z₁(p) = R (b_{n−1} p^{n−1} + b_{n−3} p^{n−3} + ⋯ + b₂p² + 1)/(b_n p^n + b_{n−2} p^{n−2} + ⋯ + b₁p).

Z₁ is then expanded into a continued fraction, as

(70)    Z₁(p) = R/(g₁p + 1/(g₂p + 1/(g₃p + ⋯))).

Z₁ is found from the elements of Fig. 15 as

(71)    Z₁(p) = 1/(pC₁ + 1/(pL₂ + 1/(pC₃ + ⋯))).

Thus

(72)    C₁ = g₁/R,    L₂ = g₂R,    C₃ = g₃/R,    ⋯,    C_n = g_n/R.
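The division and continued-fraction expansion of eqs. (68) to (72) mechanize readily. The following Python sketch is a minimal implementation (the three-element Butterworth denominator p³ + 2p² + 2p + 1 used to exercise it is a standard example, not drawn from the text):

```python
def cauer_elements(b, R=1.0):
    # b = [b1, ..., bn] of eq. (68).  The denominator 1 + b1 p + ... + bn p^n
    # is split into even and odd parts as in eq. (69); the continued fraction
    # of eq. (70) yields g1, g2, ..., and eq. (72) gives C1 = g1/R, L2 = g2 R, ...
    den = [1.0] + list(b)                    # coefficients, low order first
    even = [c if k % 2 == 0 else 0.0 for k, c in enumerate(den)]
    odd = [c if k % 2 == 1 else 0.0 for k, c in enumerate(den)]
    while even and even[-1] == 0.0:
        even.pop()
    while odd and odd[-1] == 0.0:
        odd.pop()

    gs, hi, lo = [], odd, even               # expand 1/Z1 = odd/even at p = infinity
    while lo and any(abs(c) > 1e-12 for c in lo):
        g = hi[-1] / lo[-1]                  # leading-term quotient: the g*p term
        gs.append(g)
        rem = hi[:]                          # remainder hi - g*p*lo
        for k, c in enumerate(lo):
            rem[k + 1] -= g * c
        while rem and abs(rem[-1]) < 1e-12:
            rem.pop()
        hi, lo = lo, rem

    # alternate C, L, C, L, ... as in eq. (72)
    return [g / R if k % 2 == 0 else g * R for k, g in enumerate(gs)]

# Three-element Butterworth filter, denominator p**3 + 2p**2 + 2p + 1:
print(cauer_elements([2.0, 2.0, 1.0]))       # [0.5, 1.333..., 1.5]; cf. Table 2, m = 0
```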

ILLUSTRATION. The solution of the illustration expressed by eq. (43) represents an especially simple case:

(73)    Z_T(p) = K/[(e/√(1 + e²))p + 1].

Here K represents a constant factor or gain adjustment to be set when lining up the equipment.

(74)    Z₁(p) = R/[(e/√(1 + e²))p],

(75)    C₁ = (e/√(1 + e²))/R.

Butterworth-Thomson Filters. A few general characteristics of the performance of filters like Fig. 15 may be noted. When a stepped wave signal, as indicated in Fig. 16a, is used as input, the output signal has the essential character of Fig. 16b. Where the input amplitude is current, and the output amplitude is voltage, this trace in Fig. 16b is called the indicial impedance of the filter.
The trace is distinguished by a rise time, which measures the duration between crossings of 0.1 and 0.9 of the final amplitude. (Somewhat different ranges are occasionally used.) The trace is also distinguished by an overshoot. This is measured as a per cent of the final amplitude.
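Both figures of merit are measured directly from a computed step response. A Python sketch (assuming numpy and scipy.signal; the three-element Butterworth filter with its 3-db point at 1 radian per second is an illustrative choice):

```python
import numpy as np
from scipy import signal

b, a = signal.butter(3, 1.0, analog=True)          # three-element Butterworth
t = np.linspace(0.0, 30.0, 30_001)
t, y = signal.step(signal.TransferFunction(b, a), T=t)

final = y[-1]
rise = t[np.argmax(y >= 0.9*final)] - t[np.argmax(y >= 0.1*final)]
overshoot = 100.0 * (y.max() - final) / final

print(rise, overshoot)     # roughly 2.3 normalized seconds and 8 per cent
```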


In the design of such filters for general purposes (and in the absence of a specific formulation like the Wiener equations) it is usually desirable to conserve the shape of the input signal as much as feasible. This is obtained with a short rise time and low overshoot.
It is similarly desirable to conserve frequency space. In terms of a response characteristic, such as illustrated in Fig. 9, the region where the response is large is called the passband. The region where it is small is called the elimination band. An intermediate region is called the rolloff band.

FIG. 16. Passage of step function signal through filter: (a) input, (b) output.

Conserving frequency space consists in limiting the total band, comprising passband and rolloff band together. For a given passband it means limiting the ratio of the rolloff bandwidth to the passband width. Such conservation is significant in that it represents a system cost, in money, bulk of apparatus, and other factors, to maintain transmission over a wider frequency band than necessary.
Within a given frequency space (passband plus rolloff band) rise time
and overshoot conditions are mutually antagonistic. A plot of one versus
the other, for some filter designs of the type of Fig. 15, is shown in Fig. 17.
Here the rise time is normalized in a manner discussed below.
A series of filter designs was presented by Butterworth (Ref. 21) characterized by a minimum of curvature of the response characteristic in the
passband and called maximally flat amplitude. This is plotted in Fig.
17 for m = 0 (Butterworth). Here n has the same meaning as in Fig. 15.
The design condition leads to generally short rise time, but fairly high
overshoot.
A similar series of designs was presented by Thomson (Ref. 22) characterized by a minimum of curvature in the phase characteristic and called
maximally flat envelope delay. It is plotted in Fig. 17 for m = 1 (Thomson).
This design condition leads to generally higher rise times, but lower overshoots.

FIG. 17. Design curves, transitional Butterworth-Thomson filters: overshoot, per cent, versus rise time, normalized seconds, for filters of 2 to 5 elements.

Transitional Butterworth-Thomson Filters. Peless and Murakami (Ref. 20) have prepared a series of designs intermediate between these two, by degrees indicated by the parameter m. Their rise time versus overshoot performance is plotted in Fig. 17.
In all these filter designs the ratio of rolloff bandwidth to the passband width tends to decrease as the number of elements is increased. This is why the better compromises between rise time and overshoot in Fig. 17 appear for the smaller numbers of elements.
A rough measure of the utilization is given by the ratio of the frequency for which the drop in response is large (say 30 db, or an amplitude ratio of 1 to 31.6) to the frequency for which the drop is small (say 3 db, or an amplitude ratio of 1 to 1.41). In the figure this quantity is called the cutoff flatness; it is plotted in dotted lines. Each line indicates a locus of compromises, between rise time and overshoot, for a given fixed degree of frequency band conservation.


The normalized rise time has been taken, in the Butterworth filters, to indicate the rise time of a filter whose response has dropped 3 db at a radian frequency of 1 radian per second (or a cyclic frequency of 1/(2π) cycles per second). The rise times for the other filters are for designs whose amplitude response characteristics in the elimination band are asymptotic to those of the Butterworth filters. Where m is positive, the 3-db points of the filters come at lower frequencies than for the Butterworth filter. The exact amounts are indicated by w in Tables 1 to 4.
TABLE 1. TWO-ELEMENT FILTERS

m       -0.2    0       0.2     0.4     0.6     0.8     1.0     1.2
w       1.054   1       0.949   0.902   0.859   0.820   0.786   0.756
L2w/R   0.747   0.707   0.673   0.643   0.618   0.596   0.577   0.561
C3wR    1.338   1.414   1.486   1.554   1.618   1.677   1.732   1.782

TABLE 2. THREE-ELEMENT FILTERS

m       -0.2    0       0.2     0.4     0.6     0.8     1.0     1.2
w       1.064   1       0.933   0.868   0.810   0.756   0.712   0.671
C1wR    0.525   0.500   0.478   0.458   0.441   0.425   0.411   0.398
L2w/R   1.393   1.333   1.287   1.252   1.224   1.201   1.184   1.168
C3wR    1.368   1.500   1.624   1.743   1.854   1.958   2.055   2.148

TABLE 3. FOUR-ELEMENT FILTERS

m       -0.2    0       0.2     0.4     0.6     0.8     1.0     1.2
w       1.064   1       0.924   0.845   0.774   0.712   0.659   0.617
L2w/R   0.399   0.383   0.368   0.354   0.342   0.330   0.320   0.311
C3wR    1.127   1.082   1.043   1.008   0.978   0.951   0.928   0.907
L4w/R   1.643   1.577   1.535   1.508   1.492   1.484   1.481   1.482
C5wR    1.353   1.531   1.698   1.856   2.004   2.143   2.273   2.395

TABLE 4. FIVE-ELEMENT FILTERS

m       -0.2    0       0.2     0.4     0.6     0.8     1.0     1.2
w       1.064   1       0.916   0.824   0.740   0.671   0.617   0.572
C1wR    0.321   0.309   0.298   0.288   0.279   0.270   0.262   0.255
L2w/R   0.928   0.894   0.864   0.836   0.811   0.788   0.767   0.747
C3wR    1.435   1.382   1.338   1.300   1.269   1.243   1.221   1.203
L4w/R   1.770   1.695   1.656   1.640   1.638   1.646   1.659   1.676
C5wR    1.323   1.545   1.751   1.945   2.125   2.295   2.453   2.602

Design Data. The information given above permits making a general compromise choice of filter for any given situation. The specific compromise choice depends upon what ultimate use is made of the signal. Where the timing indication is important, the compromise would favor low rise time as against low overshoot. Where amplitude indication is important, the reverse holds. Where frequency conservation is important as compared with a more favorable rise time versus overshoot compromise, or where other specifications indicate need of sharpness in cutoff, enough elements to do this may be chosen.
The element values, taken from the Peless-Murakami paper (Ref. 20), are listed in Tables 1 to 4. The values are normalized as for the rise times, and are also normalized to a termination resistance R. Thus the entry of 1 for CwR means that C alone is 1 farad divided by 2π times the cyclic frequency of the Butterworth 3-db cutoff, and again divided by R. If that cutoff is 1000 cycles, and R is 1000 ohms,

(76)    C = 1 × (2π × 10^3)^-1 × 10^-3 farad = 0.159 microfarad.

Similarly the entry 1 for Lw/R means that L alone is 1 henry times R divided by 2π times the cyclic frequency of the Butterworth 3-db cutoff. Again for a cutoff of 1000 cycles, and R of 1000 ohms,

(77)    L = 1 × (2π × 10^3)^-1 × 10^3 = 0.159 henry.

The even-element filters are all designed for C1 = 0.
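The scaling of eqs. (76) and (77) can be written as a short routine; the sketch below is illustrative only (the function name is an assumption, not from the handbook).

    from math import pi

    def denormalize(entry, kind, f_cutoff, R):
        """Convert a table entry to farads or henries.

        entry    -- tabulated value of CwR (kind='C') or Lw/R (kind='L'),
        f_cutoff -- Butterworth 3-db cutoff in cycles per second,
        R        -- termination resistance in ohms.
        """
        w = 2 * pi * f_cutoff           # radian cutoff frequency
        if kind == 'C':
            return entry / (w * R)      # eq. (76)
        return entry * R / w            # eq. (77)

    # The worked example above: 1000-cycle cutoff, R = 1000 ohms.
    print(denormalize(1.0, 'C', 1000.0, 1000.0))   # ~1.59e-7 farad = 0.159 microfarad
    print(denormalize(1.0, 'L', 1000.0, 1000.0))   # ~0.159 henry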
Tchebysheff-Darlington Filters. More general discussion of the
synthesis of networks has been presented by Darlington (Ref. 23) and by
Grossman (Ref. 24). The procedures there employed lead to the use of
mathematical contributions of Tchebysheff, and the filters have been
called Tchebysheff-Darlington filters. Essentially in the papers referred to,
the design is specifically applied to filters in which tolerances are placed
on a permissible variation in transmission over the passband, and on a
required completeness of suppression in the elimination band.

REFERENCES
1. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series,
Wiley, New York, 1949. (Published in a Classified Report in 1942.)
See also:
A. Kolmogoroff, Interpolation und Extrapolation von stationären zufälligen Folgen, Bull. acad. sci. U.R.S.S., Sér. math., 5, 3-14 (1941).
H. Jacot, Théorie de la prévision et du filtrage des séries aléatoires stationnaires selon Norbert Wiener, Ann. Télécommunications, 7, 241-249, 297-303, 325-335 (1952).
2. N. Levinson, A heuristic exposition of Wiener's mathematical theory of prediction
and filtering, J. Math. and Phys., 26 (2), 110-119 (1947). (Reprinted in Ref. 1.)


3. H. W. Bode and C. E. Shannon, A simplified derivation of linear least square
smoothing and prediction theory, Proc. I.R.E., 38, 417-425 (1950).
4. N. Wiener, Generalized harmonic analysis, Acta Math., 55, 117-258 (1930).
5. H. Nyquist, Certain topics in telegraph transmission theory, Trans. Am. Inst.
Elec. Engrs., 47, 617-644 (1928).
6. H. W. Bode, Network Analysis and Feedback Amplifier Design, Van Nostrand,
Princeton, N. J., 1945.
7. G. A. Campbell and R. M. Foster, Fourier Integrals for Practical Applications,
Collected Papers, American Telephone and Telegraph Company, New York, 1937;
also Bell System Tech. J., 7, 639-707 (1928).
8. G. N. Watson, A Treatise on the Theory of Bessel Functions, Macmillan, New York,
1944.
9. D. E. Thomas, Tables of phase associated with a semi-infinite unit slope of
attenuation, Bell System Tech. J., 26, 870-899 (1947).
10. E. O. Powell, An integral related to the radiation integrals, Phil. Mag., 34 (7),
600-607 (1943).
A. Fletcher, Notes on tables of an integral, Phil. Mag., 35 (7), 16-17 (1944).
F. W. Newman, The Higher Trigonometry, Superrationals of Second Order, Macmillan and Bowes, Cambridge, England, 1892.
A. Fletcher, J. C. P. Miller, and L. Rosenhead, An Index of Mathematical Tables,
Scientific Computing Service, Ltd., London, 1946.
11. Y. W. Lee, On Wiener filters and predictors, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 19-29, Polytechnic Institute of Brooklyn, New York.
12. L. A. Zadeh, General filters for separation of signal and noise, Proceedings of the
Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 31-49, Polytechnic Institute of Brooklyn, New York.
D. Middleton and D. van Meter, Detection and extraction of signals in noise from
the point of view of statistical decision theory, Pts. I and II, J. Soc. Ind. and Appl.
Math., 3, 192-253 (1955); 4, 86-119 (1956).
13. P. M. Woodward and I. L. Davies, Information theory and inverse probability in telecommunication, Proc. Inst. Elec. Engrs. (London), 99, Pt. III, 37-44 (1952).
I. L. Davies, On determining the presence of signals in noise, Proc. Inst. Elec. Engrs. (London), 99, Pt. III, 45-51 (1952).
D. O. North, Analysis of the Factors Which Determine Signal to Noise Discrimination
in Radar, Rept. PTR-6C, RCA Laboratories, June 1943.
G. W. Preston, The design of optimum transducer characteristics using the method
of statistical estimation, Proceedings of the Symposium on Information Networks, April
12-14, 1954, Vol. III, pp. 51-59, Polytechnic Institute of Brooklyn, New York.
L. A. Zadeh and J. R. Ragazzini, Optimum filters for the detection of signals in
noise, Proc. I.R.E., 40, 1223-1231 (1952).
J. L. Lawson and G. E. Uhlenbeck, Threshold Signals, Mass. Inst. Technol. Radiation Laboratory Series, Vol. 24, McGraw-Hill, New York, 1950.
T. G. Slattery, The detection of a sine wave in the presence of noise by the use of
a non-linear filter, Proc. I.R.E., 40, 1232-1236 (1952).
14. L. A. Zadeh and J. R. Ragazzini, An extension of Wiener's theory of prediction,
J. Appl. Phys., 21, 645-655 (1950).
15. S. S. L. Chang, Two network theorems for analytical determination of optimum-response physically realizable network characteristics, Proc. I.R.E., 43, 1128-1135
(1955).


16. R. C. Booton, An optimization theory for time-varying linear systems with
non-stationary statistical inputs, Proc. I.R.E., 40, 977-981 (1952).
R. C. Davis, On the theory of prediction of non-stationary stochastic processes,
J. Appl. Phys., 23, 1047-1053 (1952).
J. Bendat, A general theory of linear prediction and filtering, J. Soc. Ind. and Appl.
Math., 4, 131-151 (1956).
17. A. G. Bose, A theory for the experimental determination of optimum non-linear
systems, I.R.E. Convention Record, Pt. 4, pp. 21-30, March 1956.
R. Drenick, A non-linear prediction theory, Trans. I.R.E., PGIT-4, 146-152,
(Sept. 1954).
18. W. L. Everitt and G. E. Anner, Communication Engineering, McGraw-Hill, New
York, 1956.
R. E. Graham, Linear servo theory, Bell System Tech. J., 25, 616-651 (1946).
19. W. Cauer, Ausgangsseitig Leerlaufende Filter, ENT, 16, 161-163 (1939).
E. A. Guillemin, A summary of modern methods of network synthesis, in Advances
in Electronics, Vol. 3, pp. 261-303, Academic Press, New York, 1951.
20. Y. Peless and T. Murakami, Analysis and synthesis of transitional Butterworth-Thomson filters and band pass amplifiers, RCA Rev., 18, 60-94 (1957).
21. S. Butterworth, On the theory of filter-amplifiers, Exptl. Wireless and Wireless
Eng., 7, 536-541 (1930).
V. D. Landon, Cascade amplifiers with maximal flatness, RCA Rev., 5, 347-362
(1941).
22. "\V. E. Thomson, Networks with maximally-flat delay, Wireless Eng., 29, 255263 (1952).
J. Laplume, Amplificateurs moyenne frequence a distortion de phase reduite,
L'Onde Electrique, 31, 357-362 (1951).
23. S. "Darlington, Synthesis of reactance fourpoles, J. Math. Phys., 18, 257-353
(1939).
24. A. J. Grossman, Synthesis of Tchebysheff parameter symmetrical filters, Proc.
I.R.E., 45, 454-473 (1957).

D    INFORMATION THEORY AND TRANSMISSION

Chapter 18

Data Transmission

Pierre Mertz

1. Introduction and Symbols    18-01
2. Formation and Use of the Electrical Signal    18-07
3. Transmission Impairment    18-18
References    18-30

1. INTRODUCTION AND SYMBOLS

Basic Considerations. Data which are generated at a given point,
either as a result of collecting original information or at the output of a
computer as a result of the processing of other data, often have to be
transmitted to some other point in order to be used for further data processing or remote control. Two basic parameters determine the extent of the
undertaking which this transmission involves.
1. The order of magnitude of the distance between the points of origination and utilization.
2. The nature of the data that are to be transmitted. This includes the
information content of the data and the frequency band which is required
to handle it in the transmission medium. For fundamental discussion on
this point see Chap. 16. The treatment here analyzes principally the
current practical art, in which the efficiency of utilization of the frequency
band is much lower than ideal.
The adaptability of the data signals to transmission over available
facilities is a practical factor of great importance. There are extensive

systems of communication already set up, reaching over vast areas, which
are in current commercial use.
Transmission Distances Involved. The engineering effort required
to set up a system of data transmission varies considerably with the
distance involved. Some concrete illustrative gradations are:
A few inches or feet
One to a few hundred feet
One to a few miles
One to several hundred miles
Several hundred to several thousand miles
International or intercontinental facilities
This discussion stresses particularly lengths in the middle regions, from
a few miles to a thousand miles.
Nature of Transmission Facilities Available

These facilities need to meet certain requirements, discussed below. Of
these the frequency band which they are capable of transmitting is paramount, and is of chief concern here.
Local Wiring. The band which this wiring will handle is indefinite,
and it varies with the physical structure of the conductors and their inductive exposure to other electrical circuits. Bands have been handled
from less than 100 cycles to television bands of a few megacycles.
Telephone Facilities. These include all of the plant which has been
developed for telephonic purposes, and hence they comprise a wide variety
of facilities. They are sometimes nominally characterized as capable of a
3-kc band. This full width band is not usable for data transmission,
partly because some of the facilities cut off below this and partly because
the lower frequency region, below 1000 cycles, is not likely to be effectively
employed in the data transmission (Ref. 1). See Sect. 2 for more quantitative details regarding a usable band.
There are telegraph facilities of narrower band, but since these are
usually multiplexed on telephone facilities, they are not considered separately.
Program Transmission. These circuits are commercially used for the
interconnection of radio broadcast stations. They have frequency bandwidths, in round numbers, of 3, 5, 8, and 15 kc (Ref. 2). As in the case
of the telephone facility, the full band cannot be expected to be utilizable
for data transmission. Also the commercial demand for the 8- and 15-kc
bands is very low, so that there is at present a substantial network of only
the 3- and 5-kc bands.


Television Transmission. An extensive network exists at present
interconnecting television broadcast stations and studios and facilities for
theater television (Ref. 3). The bandwidth of these facilities generally
runs to a little over 4 Mc. However, on older coaxial cable facilities the
bandwidth is only 2.7 Mc. Some experimental facilities of broader frequency band than 4 Mc have been furnished for short period tests, but
not on a commercial basis.
Other Wide Band Conductor Transmission. For economy, telephone facilities are frequently gathered together in more or less large groups.
The combined signal for the entire group is then handled over a wire circuit much as a single signal (Ref. 4). Groups of 48-kc bandwidth, and
super-groups usually of 5 groups, or 240 kc, are handled in this manner.
Also, on other types of system, bands of 16 kc are found.
The use of these types of bands would, of course, require the development
of arrangements for extending them from the terminals, at offices of the
common carriers where they are located, to other premises.
Carrier current facilities are also multiplexed by power companies on
power lines. These are suitable for data transmission (Ref. 32).
Radio Facilities. Radio facilities naturally present certain elements
of flexibility in their use compared with facilities provided over conductors.
The limits to this flexibility are, however, set by allocation problems and
by the propagation characteristics of the frequency region used (Ref. 5).
The frequency bandwidths used run from those for individual channels to
large aggregations of multiplexed channels which may include television
channels (Ref. 6).
The utilizable bandwidth for the individual channels is not necessarily
set by the adjacent allocations. It is often actually set by multipath echo
effects. It tends to run from something under a telephone bandwidth
(3 kc), up to the general order of magnitude of television channels (6 Mc).
Radio channels that form part of a large aggregation, particularly those
leased from common carriers, tend to run at telephone or television bandwidth, and differ little from similar circuits over conductor facilities.
Similarly, group and super-group bands of intermediate width are transmitted, but the use of these again requires the development of arrangements for extending them from the terminals.
Nature of the Data

Data consist fundamentally of two types of information (Ref. 7).
1. Choices among a group of possible conditions. A single datum, such
as a room temperature, represents the single choice out of an established
gamut. The total possible number of choices in that gamut depends both
on the range of the gamut and on the precision of the indication within the


gamut. For example, a range of room temperatures may be established
between 50° and 90° F, and the indication may be given to individual
degrees. Then the datum represents one choice out of a possible 40, and
it may be this which is to be transmitted.
2. The timing of one or a series of events (Ref. 31). One might, for
example, send the equivalent of a clock ticking from one geographical
place to another, to assure the simultaneity of astronomical observations
made at these places.
In many cases in practice, both types of information may be needed.
For example, in air traffic control, both the position of a given plane and the
time at which it occupies that position, are needed. The position is indicated at the same time that it is occurring. Such a datum is said to be
sent in real time, as distinguished from sending it much later on as a component in some abstract calculation. Data sent in real time are characterized by becoming "stale," i.e., of losing their value, if delayed too long
in transmission.
Continuous Analog Data. In the room temperature example cited
above, the temperature may be represented by the position of the end of a
mercury thread, or by the angular position of a shaft (dial thermometer),
or again by the value of a given voltage. This position or voltage is not
the actual temperature, but may be identified as analogous to it. Such
data, where some different quantity varies proportionately (or according
to some other appropriate law of variation) to the quantity desired, are
called analog data. The demarcations between choices are not emphasized
in the datum quantity, but they are important in a statement of the indication.
Where the analog relationship between the utilized data and the original
quantity is not interrupted, the data are said to be continuous.
Discontinuous Analog Data. It may suffice to have the temperature
information once every 10 minutes instead of continuously. An analog
quantity may be set up in which the relationship to the original quantity
is interrupted when not needed. The results are called "discrete" or discontinuous analog data. This may make it possible to interlace other data
between the temperature readings.
Multiple Speed Analog Data. In the case of a clock it has been found
convenient to use the angular position of the shaft of the minute hand to
identify one out of 60 choices, or one minute in the hour. For general use,
however, a range of 12 hours is desirable. It is not convenient to use a
pointer that can identify one out of 720 possible choices. The problem
is solved by using two shafts, one geared to the other. The minute hand
identifies one out of 60 choices. The hour hand identifies one out of 12
choices, each of which corresponds to one group of the 60 choices of the


minute hand. The principle is sometimes extended by adding a third
shaft and hand to read seconds. It is even further extended in conventional
watthour and gas meters. All these are examples of "multiple speed" or
"multiple shaft" analog data.
Digital Data. The above process can be carried to its logical conclusion,
where each shaft distinguishes only one out of two choices. In this extreme
case the demarcations between choices are emphasized, and the choices
would more usually be indicated by two-position members rather than
by shafts. The choice may be considered as identified by a sequence of
binary indications or binary digits and the data are called digital. Less
extreme forms are sometimes used in which one out of three or more
discrete choices are indicated by each digit.
Digital information may be transmitted over a group of wires, by assigning one digit to each wire. This is known as parallel transmission. Or
the various digits may be assigned to successive ordered pulses (or spacing
intervals) on a single wire. This is called serial transmission, and the
series of digits may be ordered in either direction. Examples. The digits
indicating large values may come first (as in reading decimal digit Arabic
numerals), or those indicating small values may come first (as in adding or
multiplying operations with Arabic numbers).
Timing Data. This information can be indicated in a variety of ways.
More usually the desired time is indicated by the wave front of an appropriate transition, say in voltage.
Starting and Other Auxiliary Information. In the example just
given above, where successive digits are transmitted serially, it is usually
desirable to identify the start of the sequence by some auxiliary information.
At other times the auxiliary information is in the nature of a pilot, reference
or calibration datum against which the magnitude of the utilized data is
compared before actual use. Other auxiliary information is sometimes
needed for error checking or possible other purposes.
Error Standards. It is not generally expected that data transmission
will be completely perfect. For one or another reason, errors are caused.
Thus in engineering a given system there is some need to give thought to
what kind of error performance will be acceptable.
In the case of analog data although the boundaries between successive
choices are not emphasized, the spacing between the choices is important.
This spacing is obtained from the precision which is found useful in the
data. It is expected that this precision will be maintained in the transmission of the data. It is common to express the error expected or experienced, in terms of its root-mean-square (rms) value (see Chap. 13).
Occasionally a maximum error is noted, often say three times the rms


value. In a Gaussian distribution, errors larger than this occur with a
frequency of about 1 part in 370.
Timing errors are measured by a similar rms or maximum displacement,
but in a timing variable rather than an amplitude parameter (Ref. 31).
In the case of digital data an elementary measure of the error is the
frequency of occurrence of errors in the binary digits in the data at the
receiver. On occasion, a more sophisticated measure is desirable which
takes account of the distribution of the errors in time. This is because
in general, when one digit in a specific group of digits is in error, the usefulness of the entire group is vitiated. Thus errors in close succession, in
such cases, do not cause as much ultimate impairment as when they are
more scattered. Measures of impairment in such cases are not easily
established without some detailed knowledge of the entire scheme for
setting up and using the data that are transmitted.
Where special measures are incorporated into the signals for error checking, it is usually convenient to count the frequency of occurrence of both
the detected and undetected errors. Of these, the first are apt to constitute only a minor impairment but the second are serious. Undetected
errors are those not detected during the test, but obtained from some later
comparison between the signals actually sent and those actually received.

Symbols

a        relative echo amplitude
D        envelope delay
f        cyclic frequency
I        wave amplitude
k        normalizing constant, equal to mean square value of I
M        matrix of resistances
p(x)     probability density of variable x
R        resistance, with subscripts for specific cases
r        amplitude ratio
r(w)     amplitude ratio at radian frequency w
T        pulse duration time
t        time variable
V        voltage
v        instantaneous signal voltage
θ        pulse separation time (front edge to front edge)
λ        wavelength of ripple along cyclic frequency scale (= 1/f)
τ        echo delay
φ, φ(w)  phase shift at radian frequency w
w        radian frequency (= 2πf)


2. FORMATION AND USE OF THE ELECTRICAL SIGNAL

Encoding and Decoding

The first step in the preparation of a signal for transmission consists in
expressing the variable that is intended for transmission into some sort of
code that can be used to form the electrical configuration.
Analog Data. There is not very much latitude for coding such data,
aside from transferring from one type of physical quantity into another.
Thus a temperature or a distance may be transformed into a shaft rotation
or a voltage. The principal modification that can be introduced is the
insertion of some sort of nonlinear relation between the one quantity and
the other.
Digital Codes. A simple code into which an analog quantity may be converted is the binary digital code (see Chap. 16). A diagram of the 8 choices for a 3 binary digit code is illustrated in Fig. 1. The dark areas indicate, say, voltage (or current) "on," and the white areas, voltage (or current) "off." They are termed respectively marking and spacing.

FIG. 1. Diagram of 3-digit binary code selections (shaded: mark; open: space).

A variation of this code sometimes used to simplify an encoding mechanism is the reflected binary or Gray code (Ref. 8). This is shown in Fig. 2. The simplification in mechanism comes essentially because the change from any given choice to the next adjacent choice involves the change of only one binary digit. Other variations of this simple code type have been devised. One such is a coding to include negative values of the encoded quantity (Ref. 9).

FIG. 2. Diagram of 3-digit reflected binary (Gray) code selections.
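The single-digit-change property of Fig. 2 follows from the usual binary-to-Gray conversion rule; the sketch below (illustrative only, in Python; the handbook describes the code itself, not a program) generates the selections of Figs. 1 and 2.

    def gray_encode(n: int) -> int:
        """Convert a binary number to its reflected binary (Gray) equivalent."""
        return n ^ (n >> 1)

    def gray_decode(g: int) -> int:
        """Invert the Gray coding by cumulative exclusive-or of the digits."""
        n = 0
        while g:
            n ^= g
            g >>= 1
        return n

    # The 8 choices of a 3-digit code, binary (Fig. 1) and Gray (Fig. 2):
    for choice in range(8):
        print(choice, format(choice, '03b'), format(gray_encode(choice), '03b'))
    # Successive Gray words differ in exactly one digit, the property
    # that simplifies the encoding mechanism.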
More complicated codes have been devised in which present code designation is a function also of past history of the value being encoded, or of
more than one variable (Ref. 10).
These complications are conceived principally to condense the information to be transmitted into the most compact code possible. They do
involve an increase in cost of equipment, and a loss in time at both encoder and decoder that may be important where the transmission operates
in real time (see Sect. 1, Nature of the Data). Some economic study has
been made of such points (Ref. 11). See also Chap. 16.
Processes of Digital Encoding. Only some elementary principles can
be mentioned here (Ref. 9). More details of these processes are given in
Vol. 2, Chap. 20.
1. A basic method of encoding consists in laying out the analog input
along one dimension of a two-dimensional code matrix and reading the
coded output along the other dimension. This can be utilized for any
arbitrary code. As an illustration the diagrams of Figs. 1 and 2 may
represent plates, with holes punched through the shaded squares, in a
cathode ray tube (Ref. 12). The electron beam is deflected along the
horizontal coordinate by analog voltage input. A subsequent vertical
deflection then gives the coded signal, in serial form, on an electrode beyond
the plate. The beam goes through the punched holes, but is stopped where
no holes exist.
2. A second basic method consists in encoding the analog quantity
first into a unit-counting code. For each value of the analog quantity to
be transmitted a counting mechanism counts and cumulates unit increments up to a value nearest to the input quantity. A unit-counting code
is not efficient for transmission since the number of binary digits sent is
large. It can be converted into a binary digital code by successive scale
of two counting dividers (see Vol. 2, Chap. 20). If other codes are desired
a further conversion can be made.
3. A third basic method uses the general principle that any decoder
may be used for encoding by associating it with an appropriate inverse
feedback path. An arbitrary code indication is set up, say the last previous
transmission. This is decoded, and the result is compared with the input. The inverse path mechanism uses the error to step the code in the
direction to reduce the error. The stepping mechanism continues until the
error is less than the smallest choice interval.
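A minimal sketch of this third method, as a stepping loop around an arbitrary decode function (the names and the stepping logic here are illustrative assumptions, not the handbook's mechanism):

    def feedback_encode(x, decode, n_digits):
        """Step a code up or down until the decoded analog value is within
        half a choice interval of the input x (third basic method)."""
        code, top = 0, 2 ** n_digits - 1
        while True:
            error = x - decode(code)
            if abs(error) <= 0.5:          # within the smallest choice interval
                return code
            if error > 0 and code < top:
                code += 1
            elif error < 0 and code > 0:
                code -= 1
            else:
                return code                # input lies outside the gamut

    # A linear 3-digit gamut, decoding code c to the analog value c:
    print(feedback_encode(5.2, decode=float, n_digits=3))   # -> 5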
Coding mechanisms in which the present output depends on more than
the single present input exhibit a greater variety of types and will not
be discussed here.
Processes of Digital Decoding and Smoothing. The decoder in
general has two broad functions.
1. Decoding proper is to convert the digital indication into an analog
indication. In nearly all cases this appears as an individual analog indication when the code is received. In some cases this is the only function
needed.
2. Holding or storing and possibly smoothing the analog indications is needed where individual indications are required at more frequent intervals than the code permits, or where continuous analog indications are required.


The decoding function proper may be classified by types of mechanism,
as for the encoder. For the moment these are limited to the case where
a single input leads to a single output.
(a) In a basic type of decoder the choices indicated by the respective
binary digits lead to the single element of an arbitrarily prearranged matrix.
This element translates to its prearranged analog output.

FIG. 3. Relay matrix for 3-digit code.

An example of this is shown in Fig. 3 in terms of relays. The respective
digits operate relays 1, 2, and 3. Any given choice leads to some resistance
of the matrix M. These are chosen in advance to yield the desired
analog voltage V at the output, for the given choice.
(b) A variation of this is applicable to codes where the successively ordered digits contribute proportioned weights to a cumulation of the analog total. This occurs in the binary digital code. An example is shown in Fig. 4. The successively ordered digits choose respective resistances R1, R2, and R3. These are proportioned to cumulate currents, in the progressive ratios of 4, 2, and 1, in the output resistance, which must be low compared to the R's to keep the contributions independent. The output voltage V gives the analog for the binary digital choice.

FIG. 4. Relay matrix for 3-digit binary code.
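Numerically, the Fig. 4 arrangement amounts to a weighted cumulation of the digit currents; a minimal sketch (unit values and names assumed for illustration, not the handbook's circuit analysis):

    def decode_binary(digits, v_unit=1.0):
        """Analog output of a Fig. 4 type of decoder.

        digits is [d1, d2, d3], most significant first, each 0 (space)
        or 1 (mark).  The resistances are proportioned so the digits
        contribute currents in the ratios 4 : 2 : 1, cumulated in the
        (comparatively low) output resistance; v_unit is the output
        voltage produced by the least significant mark alone.
        """
        weights = [2 ** (len(digits) - 1 - i) for i in range(len(digits))]
        return v_unit * sum(w * d for w, d in zip(weights, digits))

    print(decode_binary([1, 0, 1]))   # -> 5.0, the analog for the choice 101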
(c) If instead of current contributions, successive pulse counts are
cumulated, the process leads to a translation from a binary digital to a
unit-counting code. This can then translate further to the final analog
quantity. Such an arrangement is the inverse of the second basic encoding
process.
(d) Finally an encoder may be inserted in an inverse feedback path
for conversion into a decoder. An arbitrary analog output is set up, say
the last value just previously decoded. This is encoded, and the code
indication is compared with the present input code. The inverse feedback
path uses the difference to change the analog value in the direction to
reduce the difference. Several steppings of the code may be needed until
identity is secured. This basic method is not easily applicable to arbitrary
codes.
Holding the analog signal requires some form of temporary storage
(Ref. 13). Where the error objective calls for a more accurate interpolation
between the discrete values, still more equipment is needed. The process
has been here called interpolation, but it is clear that after one discrete
value has been obtained and before the next is available, the process really
required is extrapolation or prediction.
The principles are described in Chap. 17 for the optimizing properties
required in the above processes. More than just an electrical filter may
be needed, because of the discrete character of the values.
Where the data are such that the best correlation occurs between successive values of the analog variable, a mere holding, or zero order prediction,
is optimum. Where, as is quite possible in practice, a good correlation holds
between successive rates of change (or velocities), a first order predictor
is better. This predicts from the velocity as derived from past data. Where
a good correlation held on the accelerations, a second order predictor would
be called for.
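As a sketch of these prediction orders operating on the discrete decoded values (illustrative only; the formulas are the standard zero, first, and second order difference extrapolations):

    def predict(samples, order):
        """Extrapolate the next analog value from past decoded samples."""
        x = samples
        if order == 0 or len(x) < 2:
            return x[-1]                      # zero order: hold the last value
        v = x[-1] - x[-2]                     # first difference (velocity)
        if order == 1 or len(x) < 3:
            return x[-1] + v                  # first order predictor
        a = x[-1] - 2 * x[-2] + x[-3]         # second difference (acceleration)
        return x[-1] + v + a                  # second order predictor

    print(predict([1.0, 2.0, 4.0], order=2))  # -> 7.0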
This second function of the decoder may be used by itself in cases where
the data were merely sampled as discrete analog data, and not digitally
encoded.
Error Detection Codes. It is possible to introduce deliberate redundance into the code used in the data transmission path. This establishes
auxiliary relationships. At the receiver a test may be made for these
auxiliary relationships. When they are found missing, the fact is an
indication of error in the transmission.
A simple form of this redundance is the parity check (Ref. 14). For this the message is divided into successive groups of binary digits, and an extra digit is provided at the end of each group for the redundant information. The number of marks in the group is noted as being even or odd.
If even, the added digit is made marking; if odd it is made spacing, to
make the total always odd. Hence the reception of a total even number
of marks indicates an error. Undetected errors can exist when two errors
conspire to maintain the total odd. The system can also be arranged
to make the correct total always even.
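A sketch of the odd-parity scheme just described (illustrative code; the five-digit group is assumed for the example):

    def add_parity(group):
        """Append the redundant digit that keeps the number of marks odd."""
        return group + [1 if sum(group) % 2 == 0 else 0]

    def check_parity(received):
        """An even total of marks indicates a transmission error."""
        return sum(received) % 2 == 1

    word = add_parity([1, 0, 1, 1, 0])    # 3 marks (odd): a space is added
    print(word, check_parity(word))       # [1, 0, 1, 1, 0, 0] True

    garbled = list(word)
    garbled[2] ^= 1                       # one digit inverted in transit
    print(check_parity(garbled))          # False: the error is detected

    garbled[0] ^= 1                       # a second error conspires
    print(check_parity(garbled))          # True: the error goes undetected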
Another example is the 2 out of 5 code. Here 5 binary digits are always
disposable for the signal, and 2 of these are always made marking. This
gives 10 combinations, which is very handy for translation from and to a
decimal digit code. When the receiver receives any other than 2 marks,
it indicates an error. Two errors can combine here also to evade detection.
The code may also be used with 3 marks out of the 5 binary digits.
The redundance may be increased to the point where the specific digit
in error may itself be located in the signal, and therefore corrected. This
is an error-correcting code. Some combinations of errors exist that can be
detected by this, but not corrected. Even rarer combinations are possible
which evade detection.
Modulation and Multiplexing Methods

Several steps must be considered in these processes.
Baseband Signal. The information which is to be transmitted from
one point to the other eventually appears in the form of an electric amplitude (say a voltage or a current) before it is propagated over the transmission medium. In a continuous analog system it appears, say, as a
continuously varying d-c voltage. In a discrete analog system it appears
as a succession of pulses of varying voltage amplitudes. In a binary
digital system it appears as a succession of pulses each individually of
either marking or spacing voltage amplitude.
The signal in this form is called a baseband signal. It has a frequency
spectral distribution of power which goes down to and includes zero frequency (or d-c). Its amplitude distribution depends upon the shaping
of the individual pulses or shape factor (where pulses are involved) and upon
the sequence of amplitudes which codifies the information or discrimination
factor (Ref. 15).
Nyquist has shown (Ref. 15) that the complex amplitude at each frequency
is equal to the product of these two factors, each of which is complex. In a
code that gives sufficient randomness to the signal, and that permits
positive and negative voltage values, with an average of zero, the long time
average of the power distribution in the discrimination factor is flat over
the frequency range. In such a system, therefore, the signal power distribution is equal to that for the shape factor, or for a single pulse (aside from a
normalizing factor).


For an idealized pulse with rectangular sides, such as shown at A in Fig. 5, the frequency band is infinite, as illustrated by the full line. However, most of the power is located below the frequency 1/T, where T is the pulse duration.
Practical pulses are in general rounded somewhat as shown at B of Fig. 5. For these, all but a negligible proportion of the power is located below frequency 1/T, as illustrated by the dotted line. The pulse form at C indicates that obtained from the full line spectrum, cut off to zero above frequency 1/T.

FIG. 5. Power spectra of various pulses.
Nominal Effective Band. Nyquist has further shown (Refs. 10 and 15) that the minimum frequency band required to transmit independent amplitudes for each pulse, where the pulse separation is θ, is 1/(2θ). This may be called a nominal effective band. In practice a somewhat larger band is generally used. If the successive pulses are set up edge to edge (that is, θ made equal to T), then in the full line of Fig. 5, the nominal effective band reaches from 0 to 1/(2T). The band actually used in practice usually reaches to 1/T, or twice as far. The part of the band between 1/(2T) and 1/T transmits a portion of the signal that has low power, and may be designated as rolloff band.
Occasionally a narrower rolloff band than that reaching to 1/T may be used. This results in oscillatory transients or "ringing" before and after each pulse of the signal.
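For example (numerical values chosen here only for illustration), pulses of duration T = 1 millisecond set up edge to edge (θ = T) have a nominal effective band of 1/(2T) = 500 cycles per second, while the band actually used in practice reaches to 1/T = 1000 cycles; the upper 500 cycles constitute the rolloff band.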
Where short pulses are transmitted at infrequent intervals, θ » T. In such cases a wider frequency band is used than necessary for the information, and 1/T » 1/θ. Additional pulses from other channels of information can be interlaced in between, to use the frequency space more fully. It is found in Sect. 3, Tolerances, that this leads to the need for high fidelity in the transmission.
Amplitude Modulation. Except for direct wire transmission over short distances it is not usually practicable to transmit a baseband signal of the spectral distribution illustrated in Fig. 5. This is because it involves transmission all the way down to and including d-c.

FIG. 6. Spectra of baseband and carrier pulses.

A simple procedure to avoid the need of d-c is to use the baseband
signal to form the envelope of a carrier wave, as shown at B in Fig. 6.
Here the spectrum becomes that of the carrier frequency and two symmetrical sidebands. Each of these has the same shape as the baseband. The
baseband or envelope signal is recovered at the receiver usually by the
use of a rectifier.
Certain precautions are needed when using a carrier signal in this way.
Interferences are likely to develop between the lower sideband and the
baseband, if the carrier is placed at such a low frequency that they overlap.
Each of these interferences can, if needed, be reduced at the source. The
more usual procedure, however, is to allocate the spectrum to avoid such
an overlap.
Another point to be noted is that over certain types of telephone facilities, in which another carrier wave is used for the transmission, the data

signal carrier may not be reproduced at its exact frequency. The received
signal may be displaced up to two cycles per second from that transmitted
(in some older facilities this may be some 20 or 30 cycles per second). Thus
the system cannot be designed in such fashion that the reproduction of
this exact frequency is critical. For example, the carrier frequency cannot
be depended upon for use as a synchronizing frequency.
Vestigial Sideband Transmission. The information carried in one sideband is duplicated in the other. Thus only one of the two is necessary for transmission, and a saving in frequency space required in the transmission medium is achieved by suppressing the other sideband. However, cutting off a band sharply at the carrier is difficult in data transmission, where the power spectrum contains frequencies close to and including the carrier. To solve this Nyquist (Ref. 15) has indicated a sloping cutoff as shown in Fig. 7. This retains a "vestige" of the suppressed band, and is, therefore, called vestigial sideband transmission. The cutoff through the carrier region introduces interfering spurious components to the signal. These are called "quadrature" components because in their interfering effect they add in quadrature to the undistorted signal. The interference usually causes an impairment in signal-to-noise ratio, which is discussed more fully in Sect. 3, Quadrature Component in Vestigial Sideband Transmission.

FIG. 7. Vestigial sideband spectrum.

When all the precautions which have been discussed are allowed for,
it is found that with double sideband modulation a signaling speed of about
650 signal elements (each of duration T = 1.54 milliseconds in Figs. 5
and 6) per second may be transmitted over a substantial proportion of telephone circuits (Ref. 16).
With vestigial sideband transmission about 1600 signal elements per second
have been transmitted over selected and suitably treated telephone circuits
(Ref. 17). This is slightly more than double that with double sideband.
The increase comes mostly from the use of the vestigial band, but in part
from selection and treatment of circuits.


Frequency Modulation. Characteristics of the carrier wave other
than its envelope amplitude may be varied in accordance with the baseband signal. A common example is the variation of its instantaneous
frequency. This can have certain advantages, for example, when transmitting over a medium whose amplitude response at the receiver varies from
instant to instant. A more detailed analysis also shows that transmission
by frequency modulation is less subject to impairment from noise than by
amplitude modulation (Ref. 18).
Other Methods. Still other characteristics of the carrier wave may be
varied to indicate the signal (Ref. 19). Phase modulation may be used.
Or the data signal (whether itself constituted of pulses or not) may be
transmitted over a medium that uses a pulse code form of modulation
(Ref. 20). In this case the instantaneous amplitudes of the data signal
are reproduced by a secondary pulse code. However, the requirements
for this mode of transmission have not as yet been worked out. The range
of different possibilities is very great.
Frequency Division Multiplex. In Fig. 6 a second carrier with its
sidebands may be placed at a higher frequency than B, and transmit an

independent signal. More carriers can, of course, continue to be added, as suggested in Fig. 8. The limit is the frequency band available on the transmission system. This is known as multiplexing by frequency division.

FIG. 8. Carrier signal spectra, frequency discrimination.
A consequence of this form of multiplexing lies in modulation products
which are generated between pairs (and larger groups) of the simultaneously
operating channels. These products can cause interference into the channels in which they fall. The modulation arises from nonlinearity in the
transmission process, at a possible variety of points according to the
details of the facility used. Engineering precautions are needed to keep
the interference down to an acceptable level.
Time Division Multiplex. Where a signal uses a basic pulse which is repeated at much longer intervals than its own duration, other independent

signals may use pulses which are inserted intermediate between these. A
scheme of five such channels is indicated in Fig. 9. The limit is, of course,
fixed by the relative durations of the individual pulses, the spacing interval
between pulses of the same channel and the guard space which is required
between pulses of one channel and of its nearest neighbors to prevent
mutual interference.
FIG. 9. Pulse signal profiles, time discrimination.

The choice of whether frequency or time division multiplexing is preferable in a given case depends upon the nature of the transmission impairments to be expected and upon the relative costs. Both methods are in extensive use.
Other Forms of Multiplexing. In a generalized study of multiplexing
(Ref. 21) it is found that n independent channels can be multiplexed on a
given signal through the superposition of n mutually orthogonal functions.
The arrangements discussed above represent two possible solutions, but
there are many others. A single example is the use of independent amplitude modulation channels on the sine and cosine waves of a carrier.
Although these other orthogonal function solutions offer possibilities in
the art, they have not at the present time received as much design effort
on actual embodiments as have the frequency division or the time division
multiplexing.
Auxiliary Signals

It was noted in Sect. 1, Nature of the Data, that when the data are
presented in certain ways it is desirable to include some starting or other
auxiliary information to mark out specific blocks of the information or to
give other reference conditions. This auxiliary information needs to be
distinguished in some way from the primary information. It comes
regularly in the organization of the transmission, so that while the system
is in normal operation the distinction need not be particularly conspicuous.
However, for one cause or another the transmission may occasionally be
interrupted. When this occurs, reestablishment is likely to be quicker, the
more distinctive the auxiliary signals are.


Multiplexing the auxiliary signal with the principal signal may be done in a large variety of ways. A simple form is used in the standard teletypewriter. Here the distinction is secured by setting up a pattern of marking and spacing in the binary signal that is not duplicated in any portion of any character. As indicated in Fig. 10, this pattern consists of a marking stop signal that in duration is equal to or greater than 1.4 signal elements, followed by a spacing start signal of one signal element duration. At a and b are shown stop signal elements of minimum duration, at c one of longer than minimum duration. There is, of course, some possibility that the excess over the minimum duration would bring the stop signal to 2 signal elements. This could then be duplicated in portions of characters, which could then be confused with the stop-start pattern. This does occur on occasion, and the return to normal operation takes several characters.

FIG. 10. Stop and start pulses in teletypewriter signal.

At the other extreme, distinctiveness in the auxiliary signal is obtained by the use of an extra, narrow band carrier channel for it. One example of this is illustrated in Fig. 11. Another method is to use amplitude discrimination (Ref. 17). This is illustrated by the signal of Fig. 12.

FIG. 11. Double sideband signal with auxiliary word start channel.

FIG. 12. Use of amplitude discrimination for word start channel.


With this arrangement the amplitude range permitted for the signal is
reduced in comparison with the power capacity of the transmission
medium, since the maximum capacity is used for the auxiliary signal.
Thus the effective signal-to-noise ratio is less than it could be if the principal
signal utilized the full capacity.
There are numerous other methods of introducing the auxiliary signals.
3. TRANSMISSION IMPAIRMENT
Electric circuits do not reproduce signals with complete fidelity. A
basic element of the engineering of a system consists, therefore, in establishing tolerances on the permissible impairment of the signal consistent with
acceptable performance.
The influence of limitation of the frequency bandwidth has already been
discussed in Sect. 2.
Noise

No electric communication circuit is ever free from varying currents and
voltages which are uncorrelated with the transmitted signal (except
possibly in some statistical manner) and which tend to be confused with
it at the receiver. These erratic waves have been perceived and studied
in telephony. They have there been named noise because they end in
audible noise in the receiver. The term has, however, been generally
extended in the art to cover the effect in other types of communication.
Since noise is unpredictable in detail, it has in general to be dealt with
in a statistical manner. Extensive study has been made of its statistical
and other properties (Ref. 22). The discussion here is confined to a simple
exposition of what one can expect in signal transmission media.
Single Frequency Noise. The noise wave may consist of a sustained
single frequency. On the time scale this is a simple harmonic variation in
voltage or current. On the frequency scale it is a single line spectrum.
Single Impulse Noise. On the other hand, the wave may consist of a
sharp impulse at a given time. On the frequency scale it consists of a
density of components which is uniform in amplitude out to some frequency beyond which it drops and approaches zero. This frequency depends upon the duration of the impulse. For an infinitesimal duration
the frequency is infinity. The phases of the components are closely correlated.
Cumulation to Gaussian Noise. It is clear that the single frequency
and the single impulse represent opposite extreme types of noise. In
practice one can encounter a cumulation of a number of different single
frequencies, each of different amplitude and phase. One can also encounter
a cumulation of different single impulses, each of different amplitude and
timing.


Each of these cumulations, as it becomes more extensive, and with
sufficient randomness in its components, approaches "white" Gaussian
noise (Ref. 23). White Gaussian noise may be defined in simple terms as
random noise which has a Gaussian distribution of amplitude in the time
domain and uniform distribution of power in the frequency domain. (To
keep the total power finite, the uniformity need extend only somewhat
beyond the frequency range under consideration for the signal.) The
term "white" stems from its analogy to Rayleigh-Jeans radiation in optics
(Ref. 24). This is somewhat of a misnomer, however, in that this radiation
has a uniform distribution of power in the frequency spectrum. A white radiation used by colorimetrists has uniform power distribution in the wavelength spectrum. Thus Rayleigh-Jeans radiation, which has more power than this in the blues and less in the reds, is not white but really blue.

FIG. 13. Normalized Gaussian and Rayleigh noise distributions.
The Gaussian distribution in a normalized form is expressed by the equation

(1)    p(I/√k) d(I/√k) = (1/√(2πk)) e^(-I²/2k) dI.

This is the probability that the instantaneous amplitude lies between I/√k and (I + dI)/√k, where k is the normalizing constant, equal to the mean square value of I.
A plot of this normalized distribution is illustrated in the full line of Fig. 13. The normalization consists in referring to the amplitude I in terms of its ratio to the rms value √k.
Noise encountered in practice is rarely apt to be exactly any one of the
three types which have been described. These are, however, much used
as idealizations for mathematical and engineering purposes.
Non-White and Non-Gaussian Noise. On occasions where the noise actually encountered is sufficiently different from the idealization to influence a conclusion it is necessary to deal with the non-white and non-Gaussian noise as such.
The deviation of noise from the idealized white Gaussian may be characterized in a number of different ways. It may be characterized by size,
as large or small; toward single frequency idealization or toward impulse
idealization; by variation in spectral distribution or in distribution of
amplitudes as a function of time; or other characteristics.
One frequently used variant is noise obtained from Gaussian noise by envelope rectification. This follows a Rayleigh distribution of amplitudes (Ref. 23). The Rayleigh distribution in normalized form is expressed by the equation

(2)    p(I/√k) d(I/√k) = (I/k) e^(-I²/2k) dI,    I > 0,
                       = 0,                      I < 0,

where the symbols have the same meaning as in eq. (1).
A plot of this distribution, in the normalized form, is illustrated in the
dotted lines of Fig. 13.
A second frequently used variant is filtered white Gaussian noise. This
modifies the power spectrum of the noise. A special case occurs when the
filter has a passband which becomes narrow compared to the spectrum
of the signal which is being disturbed. In this case the noise approaches
single frequency noise.
A third variant is the impulsive noise encountered in communications
circuits. This is cumulated from single impulses of the type which have
been considered. However, the number cumulated is small enough and
not sufficiently random in amplitude or time of occurrence, so that the
distribution of amplitudes (including zero amplitude, which is important)
is not Gaussian. This can occur for example in telephone circuits exposed
to some forms of dial-switching equipment or to static. It is necessary


for close engineering in such cases to determine the exact distribution of
amplitudes, and sometimes of timing instants.
Influence of Noise on Error. The effect of noise, of course, is that it
changes the received wave and tends to cause the signal for one set of
data to be confused with that for another.
CASE 1. Analog Data. The error may vary continuously from zero to
large amounts. One simple method of expressing the error is in terms of
its rms value. There are other methods (see Chap. 17), but they usually
lead to more complicated techniques of engineering. The optimum noise
performance may be obtained in a given system when both signal and
noise are filtered through the optimum filter (see Chap. 17).

FIG. 14. Effect of noise in causing errors in data pulse signals.

CASE 2. Digital Data. The received signal wave may vary over a
range without any misinterpretation resulting (Ref. 25). When the departure exceeds this range, however, a marking signal may be misinterpreted as a space, or a space as a mark.
A simple illustration of this is shown in the baseband signal of Fig. 14. At a is the assumed signal, consisting of a space, a mark, two spaces, two marks, and a space. The interpretation of marking or spacing is assumed to be made according to whether the wave amplitude falls above or below the critical value b, at sampling instants which are designated on the line b. The effect of a few noise pulses is indicated by dotted lines on a.
In the simple baseband system shown, the critical level b is at half the marking level, or 6 db below marking. Thus the critical signal-to-noise ratio, in terms of marking level to noise peaks, is 6 db. For a higher signal-to-noise peak ratio, no errors in transmission are caused by the noise. For a lower ratio, errors appear.
Because of the erratic nature of noise, this is not always a convenient
specification. If the noise has, say, a Gaussian distribution, no matter
what the critical level b is, there is some finite probability that it will be
exceeded (with the appropriate polarity) and cause an error. The engineering of the system then consists in first setting an acceptable error performance (see Sect. 1, Error Standards). This sets the acceptable probability
for the existence of noise pulses of a given polarity 6 db below marking


level. In Fig. 13 one can tell, for Gaussian noise, how far above rms a
level must be to occur with any given probability. This indicates how
far below the critical level b the rms of the noise must be set. If that
ratio is translated into db, then by adding 6 db one obtains the figure which
must be specified for the signal (marking level) to noise (rms) ratio for the system, in order to meet the desired error performance.
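This procedure reduces to a short calculation. A minimal sketch (in Python; it assumes Gaussian noise, counts only peaks of the damaging polarity through the tail integral Q(k) = ½ erfc(k/√2), and takes the 6-db critical level of the simple baseband system above):

    import math

    def tail_prob(k):
        # Probability that Gaussian noise exceeds k times its rms value,
        # one polarity only: Q(k) = erfc(k / sqrt(2)) / 2.
        return 0.5 * math.erfc(k / math.sqrt(2.0))

    def required_margin_db(error_prob):
        # Find k with tail_prob(k) = error_prob by bisection, then add the
        # 6 db between marking level and the critical level b.
        lo, hi = 0.0, 10.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if tail_prob(mid) > error_prob:
                lo = mid
            else:
                hi = mid
        k = 0.5 * (lo + hi)
        return 20.0 * math.log10(k) + 6.0

    for p in (1e-3, 1e-5, 1e-7):
        print(p, round(required_margin_db(p), 1))

For an error probability of 1 part in 10^5 this gives a required marking-level-to-rms-noise ratio of about 18.6 db.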
FIG. 15. Error probabilities from Gaussian and Rayleigh noise distributions. (Signal-to-rms-noise ratio, db, plotted against error probability, per cent and 1 part in 10² to 10⁸; solid line, Gaussian noise; dotted line, Rayleigh noise.)

For convenience this ratio in decibels has been plotted as the solid line
in Fig. 15. A dotted line is shown for the case of noise having a Rayleigh
distribution.
If the noise is of the impulsive type it is usually impractical to specify
it in terms of its rms value, as a ratio to the signal. In such cases it is
apt to be expeditious to make measurements on the noise itself, to determine the amplitude of its peaks at the error frequency which has been set.
Then the marking amplitude can be set 6 db above this.
The noise margin of 6 db which has been discussed here is a basic figure.
If the signal has three levels to be recognized, as in Fig. 12, the figure
has to be increased. It will also be found (see below) that other margins
need to be added to allow for other impairing effects on the signal.
Echoes and Equalization
Echoes and Transfer Functions. In general, no transmission medium
reproduces a sent signal wave shape exactly in all its characteristics. The
departures from exact faithfulness can be considered from two points of
view, sometimes one being more convenient and sometimes the other.


According to the first point of view the departures may be considered
as a succession of more or less delayed echoes of the original wave. Some
of the echoes are of the same polarity as the original wave, and some of the
opposite polarity.
According to the second point of view the signal and its transmitted
reproduction may be analyzed into their Fourier transforms. Each Fourier
component of the reproduction is obtainable from that of the original
by multiplication by an amplitude response factor and displacement by a
phase shift. The factor and phase shift vary from frequency to frequency
over the spectrum of the signals. Since the Fourier transform is unique,
the complete description in a linear system of a given case according to
the one viewpoint can also be matched exactly by a complete description
according to the other viewpoint. Whichever one is used is then simply a
matter of engineering convenience.
Practical experience indicates that the echo treatment leads quickly to
equalizer designs. This is because it forms an immediate bridge to functions of frequency, and equalizer designs are simple in such terms. The
design of filters and equalizers from transfer functions of time is usually
far more cumbersome.
Equalization. In practical transmission systems these distortions are
usually reduced by what is called equalization. A network is placed in
tandem with the system which again multiplies all the Fourier components,
each by an amplitude response factor, and displaces each by a phase shift.
The response factor of the network is designed to vary in the inverse way
from that of the system, so that the products of the two are as nearly as
possible constant over the frequency spectrum. For this reason the network is called an equalizer and the process called equalization. The phase
shifts are designed to add together to a total phase shift which is proportional to frequency. Since the perceptive mechanism of the ear is not
very responsive to phase shifts, the equalization of telephone circuits has
generally concerned itself almost exclusively with an equalization of the
amplitude response factor. This unconcern with the phase correction is
occasionally of importance in the use of telephone facilities for data transmission. It sometimes requires the insertion of phase correcting networks
to supplement the amplitude correction already existing for the telephone
use.
It is not usually possible or economically feasible in practice to correct
a system exactly. The considerations which are given below can apply
equally well to a residual departure, left after such correction has been
applied as is practical, or to an uncorrected facility.
Impairment of Noise Margin. It is clear that an echo partakes of
one property in common with noise, i.e., it changes the received wave and


tends to cause the signal for one set of data to be confused with that for
another. Where a given deviation has previously been set as acceptable,
the echo uses up some of this possibility for deviation and leaves less of it
as an allowance for the noise. In this sense, therefore, it impairs the noise
margin of the system. Where, as was noted above, a margin of 6 db was
necessary for the marking signal level over individual noise peaks to just
avoid potential error, a greater margin is needed in the presence of echo.
The amount of this excess margin which can be allotted to echo, in any
given case, is a matter of engineering judgment. It depends upon the
relative costs of reducing the echo and the alternative of reducing the
noise. One may say that, in general, an increase of 1 db in margin is small, and that a limitation on the echo as severe as this is economical where the

FIG. 16. Close-in and remote echoes in signal.

echo is fairly easy to reduce. At the other extreme one may say that an
increase of 10 db in margin is fairly large. It is likely to be economical
only where it is quite difficult to reduce the echo, or where the noise expected is very low. In the engineering of data systems it is convenient to
consider steps respectively of 1, 3, 6 and 10 db in the noise margin impairment.
The amplitude of echo that can cause a given impairment depends upon
how much it is delayed with respect to the original signal. As an illustration the signal A in Fig. 16 may be followed by a comparatively long
delayed echo at B. Here the impairment depends upon the echo amplitude,
and it does not vary much with small changes in the echo delay.
The signal may also be followed by a closely spaced echo as at C. The
sum of signal and echo is shown at D. Here the major effect of the echo
is to change somewhat the wave shape of the signal, but mostly it changes
signal amplitude. A substantial part of the effect of the echo consists
merely in changing the effective loss of the transmission facility somewhat.
This part of the effect can be compensated for by an adjustment of receiving
gain. The impairment from a closely following echo of a given amplitude
is less than from a long-delayed echo of the same amplitude. Also the
impairment can be expected to vary rather rapidly with delay, for the
short delays.
Relationship between Echoes and Equalization. This relationship,
suggested above, may be examined quantitatively (Refs. 7 and 26).

DATA TRANSMISSION

18-25

Consider a single Fourier component of the signal voltage, of frequency w/2π:

(3)  v = cos wt.

When this is transmitted over a system that generates an echo of relative amplitude a and relative delay T, it becomes

(4)  v = cos wt + a cos w(t - T)
       = cos wt + a cos wt cos wT + a sin wt sin wT,
(5)    = (1 + a cos wT) cos wt + a sin wT sin wt.

If the overall transmission is designated as

     v = r(w) cos [wt - φ(w)],

FIG. 17. Phase shift characteristics: (a) remote echo, (b) close-in echo. (Phase shift plotted against frequency; in (b) the nominal effective band, the reciprocal of twice the signal element duration, is marked.)

listed in the third column as an echo tolerance. It is converted into
decibels in the fourth column.
The tolerance may be placed instead on the ripple amplitude in the amplitude response characteristic. The numerical figure of the third column
expresses this ripple excursion, in each direction, in nepers. It is converted, in the fifth column, into decibels.
The tolerance may also be placed on the phase shift ripple. For this
purpose the quantity of the third column is assumed as measured in
radians. For convenience it is converted into degrees in the sixth column.
So far, nothing has been said concerning the absolute propagation time
of the system. When this is taken into consideration, it is found that the
phase ripple really occurs about a diagonal straight line through the origin,
as illustrated in Fig. 17a.
Where the echo delay is very short, and only a small portion of a ripple
cycle appears in the utilized frequency band, the straight line about which


the phase deviations are to be taken is not so easily identified. A more
or less arbitrary, but practical construction is given in Fig. 17b. Here
the straight line is drawn to intersect the actual phase at the frequency
which marks the top of the nominal effective band. This is the reciprocal
of twice the signal element duration. The phase departure is taken as
the maximum double excursion from this straight line, as indicated by
Δφ in the figure.
Note that in Fig. 17a the excursion Δφ measures double the ripple amplitude and consequently double the echo amplitude. In Fig. 17b the double ripple amplitude is not accessible within the scope of the plot, and is larger than Δφ. Figure 17a represents a remote echo, such as at B in Fig. 16, and Fig. 17b represents a close-up echo, such as at C in Fig. 16.
It is then clear that for a given excursion Δφ the echo amplitude in Fig. 17b (close-up echo) is larger than for Fig. 17a (remote echo).
This variation of the actual echo amplitude for a given Δφ corresponds
approximately to the variation in permissible echo amplitude for a given
impairment suggested in Fig. 16. Thus (Ref. 7) the specification of the
phase departure, for an allotted impairment, is roughly independent of
whether the departure occurs as a single long bend such as in Fig. 17b,
or as a fine structure ripple such as in Fig. 17a.
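The correspondence can be put in numbers directly from eq. (5), which is equivalent to an amplitude factor [(1 + a cos wT)² + (a sin wT)²]^½ and a phase shift arctan [a sin wT/(1 + a cos wT)]; for a small echo these reduce to ripples 1 + a cos wT and a sin wT. A minimal sketch (in Python; the echo amplitude and delay are illustrative values only):

    import math

    def echo_response(a, T, w):
        # Exact amplitude and phase of v = (1 + a cos wT) cos wt + a sin wT sin wt,
        # i.e., of eq. (5), for an echo of relative amplitude a and delay T.
        re = 1.0 + a * math.cos(w * T)
        im = a * math.sin(w * T)
        return math.hypot(re, im), math.atan2(im, re)

    a, T = 0.1, 1.0e-3          # 10 per cent echo, 1 millisecond delay (assumed)
    for f in (100.0, 300.0, 500.0):
        w = 2.0 * math.pi * f
        r, phi = echo_response(a, T, w)
        # Small-echo approximations: r ~ 1 + a cos wT, phi ~ a sin wT radians.
        print(f, round(r, 4), round(1.0 + a * math.cos(w * T), 4),
              round(phi, 4), round(a * math.sin(w * T), 4))

The ripple in each characteristic completes one cycle when wT changes by 2π, so a remote echo (large T) produces the fine structure of Fig. 17a and a close-in echo the slow bend of Fig. 17b.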
The tolerances which are listed in the last two columns of Table 1 were
set for a binary digital transmission. Tolerances for a continuous analog
transmission have in practice been found to be of the same order of magnitude.
However, tolerances for a discrete analog system, with time division
multiplexed channels interlaced, need to be much more severe (Ref. 27).
Envelope Delay Distortion. In practice it is often convenient to
measure the phase shift characteristic of a transmission system in terms
of its envelope delay. This represents the time of transmission of the
envelope of a carrier, as the carrier frequency is varied through the spectrum. It is measured as (Ref. 28)
(12)  D = dφ/dw,

where D = envelope delay, seconds,
      φ = phase shift, radians,
      w = radian frequency, radians per second.

When this differentiation is applied to the simplified eq. (10) the result is

(13)  D = (d/dw)(a sin wT) = aT cos wT.


The double excursions of the ripples in eq. (13) are

(14)  ΔD = 2aT.

It is found from eq. (13) that the application of a fixed tolerance on the
envelope delay ripple irrespective of the wavelength of the ripple (or corresponding echo delay T) leads to an exaggeratedly severe limitation on
echo amplitude a for large values of T.
In other words the use of the. envelope delay for the purpose of specifying
limits on phase distortion for data transmission tends to be unduly severe on
fine structure excursions in the characteristic. Thus when the envelope delay

criterion is used, it is necessary to be aware of this and appropriately ignore
the finer structure ripples in the characteristic.
In a general way it is found (Ref. 7) that a delay distortion of ±0.4 signal element duration gives a noise impairment, under unfavorable conditions, of about 3 db. If one takes distortions as roughly proportional to the permissible echoes the tolerance figures are given in Table 2.
TABLE 2. ENVELOPE DELAY TOLERANCES

Impairment, db      Tolerance, Signal Element
      1                     ±0.15
      3                     ±0.4
      6                     ±0.7
     10                     ±0.9
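Equations (13) and (14) are easily verified numerically: differentiating the phase ripple a sin wT gives an envelope delay ripple aT cos wT, whose double excursion is 2aT. A minimal sketch (in Python; echo amplitude, delay, and step sizes are illustrative):

    import math

    def envelope_delay(a, T, w, dw=1.0e-3):
        # Envelope delay D = d(phi)/dw of eq. (12), with the single-echo
        # phase ripple phi(w) = a sin wT, by central difference.
        phi = lambda x: a * math.sin(x * T)
        return (phi(w + dw) - phi(w - dw)) / (2.0 * dw)

    a, T = 0.05, 2.0e-3     # assumed echo amplitude and delay
    ripple = [envelope_delay(a, T, k * math.pi / (1000.0 * T))
              for k in range(2001)]    # sweep wT over a full ripple cycle
    print(max(ripple) - min(ripple), 2.0 * a * T)   # compare with eq. (14)

The printed values agree, and they display the point made in the text: for a fixed delay ripple tolerance ΔD, the permissible echo amplitude a falls as the echo delay T grows.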

Quadrature Component in Vestigial Sideband Transmission. An
interfering component similar in certain respects to an echo is generated
by the usual form of vestigial sideband transmission (Sect. 2, Amplitude
Modulation). This is the quadrature component, so called because this
interference adds in quadrature to the otherwise undistorted signal (Ref.
29).
Although the precise wave shape of this interfering component is different from that of an echo, it does use up signal amplitude range in much the same manner and requires an increase in the signal-to-noise margin.
It has been shown (Refs. 29, 30) that the amplitude of this interfering
component varies according to how much frequency space is allowed to
the vestigial sideband, and, to some degree, to the particular shape of the
cutoff. The wider this frequency space, the smaller will be the amplitude
of the quadrature component. In actual data transmission practice the
vestigial bandwidth used, as measured from the carrier to the frequency
at which the response drops to a very low value, tends to run from some
one-half to one-fourth of the nominal effective band.


The amplitude of the quadrature component can also be changed by
changing the depth of modulation of the signal. The depth of modulation
is reduced by sending a finite amplitude (instead of the more usual zero
amplitude) of carrier during a spacing signal. This reduces the quadrature
component, and to that extent it reduces the impairment which it causes
in the signal to noise ratio. It does, of course, also reduce the amplitude
range of the signal between marking and spacing, and to that extent also
impairs the signal-to-noise ratio. As the spacing carrier rises, this impairment also rises, and the quadrature component impairment drops. At some value there is a minimum total impairment. A typical case has been worked out by Sunde (Ref. 30) and is illustrated in Fig. 18.

FIG. 18. Noise impairment caused by quadrature component in vestigial sideband transmission. (Noise impairment, db, plotted against ratio of spacing to marking signal, 0 to 1.0.)
Level Changes

Transmission systems in general show some variation with time in overall net loss (or gain). This comes from a variety of causes, such as changes
in temperature (and therefore resistance) of conductors, variation in
battery supply, and aging or replacement of vacuum tubes.
Analog SysteIn. An amplitude modulated analog system is especially
vulnerable to received level change. A system engineered to a possible ±5 per cent error does not represent a very high performance. Yet if all the error is assigned to level change, this is required to be within less than ±½ db. This is a severe requirement for anything but a comparatively short direct wire circuit.


Because of this, an analog system usually uses a pilot channel of some
type to transmit a reference amplitude. Also, frequency modulation is
often preferred to amplitude modulation. Even in this case, however, in
some carrier facilities, as was mentioned above in Sect. 2, there is a change
in frequency which is analogous to a level change. Thus a pilot channel
may be needed to send a reference frequency.
Digital System. In a binary digital system a level change cuts into
the signal-to-noise margin somewhat in the same way as does an echo.
If no change is assumed in the critical level distinguishing a mark from a
space, then for the four grades of impairment considered before, the
allowances are given in Table 3.
TABLE 3. LEVEL CHANGE TOLERANCES

Impairment, db    r = Amplitude Ratio    (r + 1)/2 = Amplitude Tolerance    Tolerance, db
      1                  0.89                       0.95                         0.5
      3                  0.71                       0.86                         1.4
      6                  0.50                       0.75                         2.5
     10                  0.32                       0.66                         3.6

These are still fairly severe requirements, and usually some compensating
device is provided in the system. This can be an automatic adjustment
of the critical level at a given fraction of marking level, or an automatic
volume adjuster for the marking level, or possibly even both.
If a three-level signal is used, as in Fig. 12, the tolerances are correspondingly more severe.

REFERENCES
1. A. B. Clark, Telephone transmission over long cable circuits, Bell System Tech. J., 2, 67-94 (1923).
J. T. O'Leary, E. C. Blessing, and J. W. Beyer, An improved 3-channel carrier telephone system, Bell System Tech. J., 18, 49-75 (1939).
H. J. Fisher, M. L. Almquist, and R. H. Mills, A new single channel carrier telephone system, Bell System Tech. J., 17, 162-183 (1938).
C. W. Green and E. I. Green, A carrier telephone system for toll cables, Bell System Tech. J., 17, 80-105 (1938).
R. S. Caruthers, The Type N-1 carrier telephone system: Objectives and transmission features, Bell System Tech. J., 30, 1-32 (1951).
2. F. A. Cowan, R. G. McCurdy, and I. E. Lattimer, Engineering requirements for program transmission circuits, Bell System Tech. J., 20, 235-249 (1941).
R. A. Leconte, D. B. Penick, C. W. Schramm, and A. J. Wier, A carrier system for 8000-cycle program transmission, Bell System Tech. J., 28, 165-180 (1949).

DATA TRANSMISSION

18-31

3. F. A. Cowan, Networks for theater television, J. Soc. Motion Picture & Television Engrs., 62, 306-313 (1954).
S. Doba and A. R. Kolding, A new local video transmission system, Bell System Tech. J., 34, 677-712 (1955).
C. H. Elmendorf, R. D. Ehrbar, R. H. Klie, and A. J. Grossman, L-3 coaxial system design, Bell System Tech. J., 32, 781-832 (1953).
4. R. E. Crane, J. T. Dixon, and G. H. Huber, Frequency division techniques for a coaxial cable network, Trans. Am. Inst. Elec. Engrs., 66, 1451-1459 (1947).
K. E. Appert, R. S. Caruthers, and W. S. Chaskin, Application and transmission features of a new 12-channel open-wire carrier system, Trans. Am. Inst. Elec. Engrs., 73, Pt. I, 18-27 (1954).
5. Radio Spectrum Conservation, Report of the Joint Technical Advisory Committee, McGraw-Hill, New York, 1952.
6. A. A. Roetken, K. D. Smith, and R. W. Friis, The TD-2 microwave radio relay
system, Bell System Tech. J., 30, 1041-1077 (Pt. II) (1951).
7. P. Mertz, Transmission line characteristics and effects on pulse transmission, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 85-114, Polytechnic Institute of Brooklyn, New York.
8. W. M. Goodall, Television by pulse code modulation, Bell System Tech. J., 30,
33-49 (1951).
9. B. Lippel, A systematic survey of codes and coders, I.R.E. Convention Record,
Pt. 8, Information Theory, pp. 109-119, 1953.
10. A. E. Laemmel, Design of digital coding networks, Proceedings of the Symposium
on Information Networks, April 12-14, 1954, Vol. III, pp. 309-320, Polytechnic Institute
of Brooklyn, New York.
A. Feinstein, A new basic theorem of information theory, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 2-22, Sept. 1954.
P. Elias, Predictive coding, I.R.E. Trans. on Information Theory, IT-1, No. 1, pp. 16-33, March 1955.
D. Slepian, A class of binary signaling alphabets, Bell System Tech. J., 35, 203-234 (1956).
C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Ill., 1949.
R. M. Fano, The transmission of information, Mass. Inst. Technol., Research Lab. Electronics, Tech. Rept. No. 65, 1949.
11. B. M. Oliver, Efficient coding, Bell System Tech. J., 31, 724-750 (1952).
N. M. Blackman, Minimum-cost encoding of information, Trans. I.R.E., Professional Group on Information Theory, PGIT-3, pp. 139-149, March 1954.
12. R. W. Sears, Electron beam deflecting tube for pulse code modulation, Bell System Tech. J., 27, 44-57 (1948).
13. L. B. Wadel, Analysis of combined sampled and continuous-data systems on an
electric analog computer, I.R.E. Convention Record, Pt. 4, pp. 3-7, 1955.
G. Franklin, Linear filtering of sampled data, I.R.E. Convention Record, Pt. 4,
pp. 119-128, 1955.
S. P. Lloyd and B. McMillan, Linear least squares filtering and prediction of
sampled signals, Proceedings of the Symposium on Network Theory, April 13-15, 1955,
Vol. V, pp. 221-247, Polytechnic Institute of Brooklyn, New York.
R. M. Stewart, Statistical design and evaluation of filters for the restoration of
sampled data, Proc. I.R.E., 44, 253-257 (1956).
14. R. W. Hamming, Error detecting and error correcting codes, Bell System Tech. J.,
29, 147-160 (1950).

18-32

INFORMATION THEORY AND TRANSMISSION

M. J. E. Golay, Binary coding, Trans. I.R.E., Professional Group on Information
Theory, PGIT-4, pp. 23-28, Sept. 1954.
P. Elias, Error-free coding, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 29-37, Sept. 1954.
I. S. Reed, A class of multiple-error-correcting codes and the decoding scheme, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 38-49, Sept. 1954.
R. A. Silverman and M. Balser, Coding for constant-data-rate systems, Trans.
I.R.E., Professional Group on Information Theory, PGIT-4, pp. 50-63, Sept. 1954.
15. H. Nyquist, Certain topics in telegraph transmission theory, Trans. Am. Inst.
Elec. Engrs., 47, 617-644 (1928).
16. A. W. Horton and H. E. Vaughan, Transmission of digital information over telephone circuits, Bell System Tech. J., 34, 511-528 (1955).
17. J. V. Harrington, P. Rosen, and D. A. Spaeth, Some results on the transmission of pulses over telephone lines, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 115-130, Polytechnic Institute of Brooklyn, New York.
18. D. Middleton, On the theoretical signal to noise ratios in FM receivers: A comparison with amplitude modulation, J. Appl. Phys., 20, 334-351 (1949).
19. H. S. Black, Modulation Theory, Van Nostrand, Princeton, N. J., 1953.
20. W. M. Goodall, Telephony by pulse code modulation, Bell System Tech. J., 26,
395-409 (1947).
21. N. Marchand, Analysis of multiplexing and signal detection by function theory,
I.R.E. Convention Record, Pt. 8, pp. 48-53, March 1953.
22. P. L. Chessin, A bibliography on noise, I.R.E. Trans. on Information Theory, IT-1, No. 2, pp. 15-31, Sept. 1955.
J. R. Pierce, Physical sources of noise, Proc. I.R.E., 44, 601-608 (1956).
W. R. Bennett, Methods of solving noise problems, Proc. I.R.E., 44, 609-637
(1956).
23. S. O. Rice, Mathematical analysis of random noise, Bell System Tech. J., 23,
282-332 (1944); 24, 46-156 (1945).
24. F. K. Richtmyer and E. H. Kennard, Introduction to Modern Physics, p. 189, McGraw-Hill, New York, 1942.
25. B. M. Oliver, J. R. Pierce, and C. E. Shannon, The philosophy of PCM, Proc.
I.R.E., 36, 1324-1331 (1948).
26. P. Mertz, Influence of echoes on television transmission, J. Soc. Motion Picture & Television Engrs., 60, 572-596 (1953).
H. A. Wheeler, The interpretation of amplitude and phase distortion in terms of paired echoes, Proc. I.R.E., 27, 359-385 (1939).
27. W. R. Bennett, Time division multiplex systems, Bell System Tech. J., 20, 199-221 (1941).
28. H. Nyquist and S. Brand, Measurement of phase distortion, Bell System Tech. J.,
9, 522-549 (1930).
29. H. Nyquist and K. W. Pfleger, Effect of the quadrature component in single
sideband transmission, Bell System Tech. J., 19, 63-73 (1940).
30. E. D. Sunde, Theoretical fundamentals of pulse transmission, Bell System Tech. J., 33, 721-788, 987-1010 (1954).
31. P. M. Woodward, Theory of radar information, Trans. I.R.E., Professional Group on Information Theory, PGIT-1, pp. 108-113, Feb. 1953.
32. Guide to application and treatment of channels for power-line carrier, Trans. Am. Inst. Elec. Engrs., Pt. III-A, 73, 417-436 (1954).

E. FEEDBACK CONTROL

W. M. Gaines, Editor

19. Methodology of Feedback Control, by W. M. Gaines
20. Fundamentals of System Analysis, by S. J. Jennings and A. A. Winkeljohann
21. Stability, by W. E. Sollecito and S. G. Reque
22. Relation between Transient and Frequency Response, by C. E. Bradford and M. W. DeMerit
23. Feedback System Compensation, by P. G. Cushman
24. Noise, Random Inputs, and Extraneous Signals, by D. L. Lippitt
25. Nonlinear Systems, by W. M. Gaines
26. Sampled-Data Systems and Periodic Controllers, by J. E. Barnes, Jr.

Chapter 19

Methodology of Feedback Control

W. M. Gaines

1. Symbols for Feedback Control   19-01
2. General Feedback Control Definitions   19-04
3. Feedback Control System Design Considerations   19-12
4. Selection of Method of Synthesis for Feedback Controls   19-19
References   19-21

1. SYMBOLS FOR FEEDBACK CONTROL

Alphabetical List by Letter Symbols

Terminology given in Table 1 is for feedback control covered in the
following chapters. In the case of specific physical examples, the terminology of the particular field from which the example is taken will be used;
for example, in an electrical example, e may be used for voltage and i
for current. The chapter noted in parentheses after an entry of the table is the chapter where the symbol is first used; this reference may be useful to the reader for looking up
discussions of the various quantities.
The nomenclature used is patterned after the standard nomenclature
and symbols of the American Standards Association (Ref. 1). Capital
letters will be used to represent the Laplace transforms of the time functions; for example, A(s) is the Laplace transform of a(t). An asterisk (*)
indicates that the quantity is in sampled form; for example, e*(t) is the
sampled form of the signal e(t).

TABLE 1. LETTER SYMBOLS FOR FEEDBACK CONTROL

a            Arbitrary constant and/or coefficient for differential equation (Chap. 20)
A            Arbitrary constant for time response equation
a(t)         Impulse response of reference input terms (function of time)
b, B         Arbitrary constants for time response equation
B            Magnitude of deadband (Chap. 25)
b(t)         Primary feedback variable (function of time) (Chap. 20)
c(t)         Controlled variable (function of time) (Chap. 20)
c*(t)        Sampled form of c(t) (Chap. 26)
d, D         Arbitrary constants for time response equation
D            Magnitude of negative deficiency; a denominator term; used also as a subscript (Chap. 25)
D(s)         Polynomial in s, usually the denominator (Chap. 22)
e(t)         Actuating signal (function of time) (Chap. 20)
e*(t)        Sampled form of e(t) (Chap. 26)
f            Frequency, cycles per second (see definition of w) (Chap. 20)
f(t)         Arbitrary variable (function of time)
g(t)         Impulse response of forward element (function of time)
GD(|x|, w) or GD   Describing function (Chap. 25)
h(t)         Impulse response of feedback elements (function of time) (Chap. 20)
H            Magnitude of hysteresis (Chap. 25)
i            ith term in a series, used as subscript
i(t)         Ideal value of the ultimately controlled variable (function of time) (Chap. 20)
j            Complex number, √-1
k            kth term in a series, used as a subscript
K            Gain constant for system (Chap. 20)
K0, K1, K2, etc.   Dynamic error coefficients, subscript indicates associated derivative (Chap. 20)
Kp           Static position error coefficient (Chap. 20)
Kv           Static velocity error coefficient (Chap. 20)
Ka           Static acceleration error coefficient (Chap. 20)
ℒ            Denotes application of the Laplace transform integral
ℒ⁻¹          Inverse Laplace transform
m            Used as a subscript to denote mth term in series
m(t)         Manipulated variable (function of time)
M            Magnitude of C/R(jw), i.e., |C/R(jw)| (Chap. 21)
Mm           Maximum value of |C/R(jw)| (Chap. 21)
n            Used as a subscript to denote nth term
n(t)         Output of nonlinear element
N            Particular value of n; a numerator term; a subscript
N(s)         Polynomial in s, usually the numerator (Chap. 22)
p, P         Differential operator, p = d/dt, p² = d²/dt² (Chap. 20)
pn           nth pole (Chap. 22)
p(x)         Probability distribution of x (Chap. 24)
P            Number of poles in right half of s-plane (Chap. 21)
P(x = n)     Probability function (Chap. 24)
q(t)         Indirectly controlled variable (function of time) (Chap. 20)
r(t)         Reference input variable (function of time) (Chap. 20)
R            Number of counterclockwise rotations of a vector from -1 + j0 to the H(jw)G(jw) locus as w varies from 0 to j∞ to -j∞ to -0 (Chap. 21)
S            Magnitude or level of saturation (Chap. 25)
s            Laplace transform operator = σ + jw
z1, z2, etc. Roots of numerator of G(s), zeros; zn also used in this case (Chap. 22)
p1, p2, etc. Roots of denominator of G(s), poles; pn also used in this case (Chap. 22)
t            Time, seconds
tr           Rise time, seconds (Chap. 22)
td           Delay time, seconds (Chap. 22)
tp           Time to first peak or overshoot of transient, seconds (Chap. 22)
ts           Settling time, seconds (Chap. 22)
T            Time constant, seconds (Chap. 20)
u(t)         Disturbance variable (function of time); step function (Chap. 20)
v(t)         Desired value or command variable (function of time) (Chap. 20)
w(t)         Impulse response of given element
x(t), y(t)   Variables used when standard terminology for feedback systems is not applicable
ye(t)        System error (function of time) (Chap. 20)
yd(t)        System deviation (function of time) (Chap. 20)
z(t)         Indirectly controlled system impulse response (Chap. 20)
z            z transform operator (Chap. 26)
zn           nth zero (Chap. 22)
Z            Number of zeros in right half of s-plane (Chap. 21)
α            Phase angle of closed loop frequency response (Chap. 21)
γ            Phase margin (Chap. 21)
δ            Increment; Dirac function, impulse function (Chap. 20)
Δ            Incremental change in variable, usually used as Δx, Δy, etc.
≜            Denotes equality by definition
ζ            Relative damping, damping factor
θ            Phase angle of open loop frequency response (Chap. 21)
π            3.14159
Π(x)         Product sign, meaning x1 · x2 · x3 · x4 · ... · xn (Chap. 20)
σ            Standard deviation (probability) (Chap. 24); decrement factor (Chap. 20)
φ(τ)         Correlation function (Chap. 24)
ω (also written w)   Radian frequency, radians per second


TABLE 4. COMMON PERFORMANCE SPECIFICATIONS (Continued)

9. Minimum error criterion. Specified: transient or frequency response. Definition: the response of the system is adjusted to minimize a function of the total error that results from both signal and noise or extraneous signals. The criterion may take several forms, e.g., min. squared error, min. absolute error times time. Remarks: used to optimize system response to reject unwanted noise and pass the true signal. Used to specify performance index when just signal is considered. Within basic assumptions frequency response analysis is very useful. Used on systems which operate on random or noisy data, e.g., missile radar guidance and fire control. Analog computers can be used to apply criterion to nonlinear systems.

10. Phase margin. Specified: frequency response. Definition: defined as 180° + phase shift at unity gain of the open loop frequency response. Remarks: used as a rule of thumb in frequency response analysis to indicate stability and performance. Easy to use and to obtain directly from frequency response diagram.

11. Gain margin. Specified: frequency response. Definition: gain margin is ratio of maximum stable gain to actual gain, i.e., gain at phase crossover. Remarks: same as 10. Indicates relative sensitivity of system to gain variations. Can be calculated by Routh's criterion. Not as good a criterion for performance as 10. Little used.

12. Mm peak. Specified: frequency response. Definition: ratio of maximum of closed loop frequency response to a low-frequency value. Remarks: used with Nyquist and frequency response analysis. Rules of thumb relate Mm and transient overshoot. Easy to calculate from frequency response diagram.

13. Bandwidth. Specified: frequency response. Definition: defined variously, (a) usually as the frequency where the closed loop response falls to 1/√2 (3 db down) of its low-frequency value, or (b) sometimes as the frequency at the significant peak Mm, or (c) the crossover of the open loop response. Remarks: used with frequency response analysis and is related to speed of response of system. Used also when a definite frequency bandpass is needed for fidelity. Mm, bandpass, and the phase shift at these values give a good indication of the closed loop response and are often used when a number of closed loops are operated in tandem as a system.

14. Static error coefficient. Specified: frequency response. Definition: defined as the final error resulting from a continuous input of position, or velocity, or acceleration, etc. The magnitude of the input and the maximum tolerable error must be specified. Remarks: used to set low-frequency gain of open loop frequency response. Useful where steady inputs are encountered.

15. Dynamic error coefficients (or steady-state error coefficients). Specified: frequency response and root locus. Definition: defined as the steady-state error resulting from the derivatives of the input function. The time function and/or its derivatives must be specified as well as the maximum tolerable error. Remarks: relates system gain and time constants to errors arising from higher derivatives of input. Used to estimate error resulting from varying input to a given system and conversely to determine closed loop pole-zero location to give desired error. Accurate where input varies at slow rate compared to bandpass. Becomes poorer as input varies more rapidly because of transient effects. Used in analysis of fire controls, machine controls, etc., where input varies in an expected manner.

16. Maximum system error. Specified: transient response. Definition: defined as the maximum tolerable system error, ye. The input function and operating conditions must be specified. Remarks: distinguished from steady-state error because maximum error under dynamic conditions is specified. Normally used to define performance with a varying input, e.g., automatic milling machine control. Not usually used with simple aperiodic inputs. Used in conjunction with minimum error criterion (9) to place an absolute bound on error.

17. Resolution. Specified: low level characteristics. Definition: defined as the maximum tolerable change in the input without a change occurring in the output. Input and operating conditions must be specified. Remarks: can appear in various forms, i.e., maximum position input change required to obtain output change, or minimum velocity at which a servomechanism will track with tolerable velocity error.

18. Duty cycle. Specified: power element rating. Definition: defined variously, depending upon application. Intent is to define the average power requirement. Remarks: objective of specification is to allow more efficient selection and/or design of the power element. Specification can take the form of an rms power requirement, or, where an average does not adequately describe the situation, a time distribution and level may be given. Used extensively when large power drives are involved.

19. Maximum operating conditions. Specified: power element rating. Definition: included to indicate the wide variety of maximum performance requirements sometimes specified, e.g., maximum velocity, maximum load torque. Remarks: many of these limits are implied by other performance requirements. Often necessary to define load requirements separately, e.g., load running torques or power (independent of accelerating torques).


c. Practical Aspects. The ultimate cost and manufacturability must
be considered during the synthesis. This, of course, implies ascertaining
the physical realizability of the controller and assuring that practical
tolerances are maintained. Reliability and ease of servicing must also
receive proper consideration. Environmental conditions and customer
requirements on component packaging must also be factored into the
mechanical design and may affect the performance.
5. Test and Evaluation of Equipment. In most cases unpredicted
and secondary effects will require final adjustment to be made after the
actual equipment is assembled. This is often the more economical way
to reach a final design when a wide range of adjustment can be included
in the design or preliminary models can be built and tested relatively fast.
This would be the case for many types of instrument servos. On the other
hand, it would be horrendous to attempt this approach with an elaborate,
expensive missile system which is expended at each test firing. In such
cases the extensive use of analysis and computer facilities to minimize the
testing is justified.
4. SELECTION OF METHOD OF SYNTHESIS FOR FEEDBACK CONTROLS

The major analytical methods available to aid in the synthesis of feedback control systems are summarized in Table 5. No general rules are
available for the selection of the proper method, and the designer should
be familiar with all methods in order to select the one best suited to his
problem. It is often desirable to carry root locus and frequency response
diagrams in parallel. The root locus supplies time domain information,
and the frequency response provides the simplest method of estimating
the method of compensation.
System Optimization

None of these techniques allows a completely systematic design approach. The major difficulty is in defining and specifying optimum performance and determining what performance index to use for evaluation (see Chap. 24). Although criteria have been proposed, the mathematical labor involved in the more sophisticated ones is prohibitive. Actually the accuracy and extent of the available data usually warrant the use of only the simpler criteria. These criteria do not encompass the entire problem and therefore must be used carefully. The material in the following chapters presents the available useful design criteria.

TABLE 5. SUMMARY OF MAJOR ANALYTICAL TECHNIQUES FOR FEEDBACK CONTROL SYSTEM ANALYSIS

1. Differential equations (described in Chap. 20). Classical solutions of differential equations are generally too involved for practical use in synthesis. Nondimensional performance charts help on second order systems. Significance of individual system component values difficult to ascertain.

2. Routh-Hurwitz criterion (Chap. 21). Used to determine the limiting stability conditions. Can be extended to include damping factor only with difficulty. Limited usefulness.

3. Root locus (Chaps. 21, 23). The best solution to the problem of directly synthesizing the time response. Particularly useful when the performance specifications are in terms of the time response. Construction of the diagrams can be time-consuming and the performance can be sensitive to small relative changes of locus in the low-frequency region.

4. Frequency response (Chap. 23). The most used approach presently available. The locus can be plotted in the form of a Nyquist diagram, log magnitude-angle diagram, or the log magnitude and phase diagram. The latter has the advantages of easy construction by templates and of easy introduction of compensating characteristics. Easy to include experimental data in frequency response analysis. The difficulty of relating transient and frequency response is a limitation.

5. Describing functions (Chap. 25). An extension of the frequency response techniques to nonlinear systems. Good performance criterion not available. Method can treat higher order systems.

6. Closed loop pole-zero location (Chaps. 22, 23). Requires determining realizable and practical components after the definition of the system response. Not in wide use as yet but possesses the good feature of working directly from the desired closed loop response.

The Use of Computers

This has supplanted much of the paper design study. This approach
allows rapid and complete (often visual) evaluation of the expected system
performance. At the present state of the art, however, it is not possible
to obtain a complete design from the computer without interpretation at
various steps by the design engineer. The ultimate use of the computers
will occur when a complete systematic design can be programmed; but
this cannot be done until mathematical expressions can be equated to the
decisions now based upon "engineering judgment."


Availability of computers has not eliminated the need for a thorough
knowledge of the standard feedback control techniques for analysis. Although, when an analog computer facility is available, the conventional
analytical techniques are used principally for preliminary, order-of-magnitude estimates and for verifying computer solutions, experience has shown
that a thorough knowledge of alternate techniques will enhance the usefulness of the computers.

REFERENCES
1. Letter Symbols for Feedback Control Systems, ASA Y10.13-1955, American Standards Association, New York, July 1955; sponsored by American Society of Mechanical Engineers.
2. I.R.E. Standards on Terminology for Feedback Control Systems, 1955, Proc. I.R.E., 44, No. 1 (1956).
3. Proposed Symbols and Terms for Feedback Control Systems, A.S.E.E. Subcommittee Rept., Elec. Eng., October 1951.

Chapter 20

Fundamentals of System Analysis

S. J. Jennings and A. A. Winkeljohann

1. Representation of Physical Systems   20-01
2. Classical Methods of Analysis   20-28
3. Block Diagrams   20-56
4. System Types   20-66
5. Error Coefficients   20-70
6. Analysis of A-C Servos: Carrier Systems   20-79
References   20-84

1. REPRESENTATION OF PHYSICAL SYSTEMS

Methods of System Analysis

In order to study the performance of a physical system, equations must
be written from the physics of the situation to describe the excursion of
all variables. To describe the operation of a physical system in mathematical form, its differential equations may be written which, in general,
will be nonlinear in character. In many cases it is possible, by restricting
the region for which results are valid, to write linear differential equations
with constant coefficients for the system. The solution of the linear differential equation then yields the complete steady-state and transient response of the system for a given input. The transient response indicates the system stability, while the steady-state response to a sinusoidal input is very useful in system synthesis.

TABLE 1. TYPICAL COMPONENT EQUATIONS (Ref. 10)

Translation systems:
  Mass: f1 - f2 = M d²x/dt², or dx/dt = (1/M)∫(f1 - f2) dt. The net force acting on a body is equal to its mass times its acceleration with respect to an arbitrary fixed reference.
  Spring: f = Kx; df/dt = K dx/dt. The force which must be applied to each end of a spring to deflect it a distance x is equal to the spring constant K times x.
  Dashpot (viscous damping): f = D(dx1/dt - dx2/dt). The force which must be applied to each end of a dashpot to produce a relative motion of its two ends is equal to the viscous damping coefficient D times the relative velocity.

Rotational systems:
  Inertia: q = J d²θ/dt². The net torque acting on a body is equal to its inertia times its angular acceleration with respect to an arbitrary fixed reference.
  Torsional spring: q = G(θ1 - θ2). The torque which must be applied to each end of a torsional spring to produce a relative angular deformation θ1 - θ2 of its two ends is equal to the rotational spring constant times the angular deformation.
  Rotational dashpot: q = B dθ/dt. The torque which must be applied to a rotational dashpot to cause it to rotate with an angular velocity is equal to the rotational viscous damping coefficient times the angular velocity.

Electrical systems:
  Inductance: v1 - v2 = L di/dt. The voltage drop caused by current flowing in an inductance is equal to the inductance times the rate of change of the net current flowing in the direction of the drop.
  Capacitance: v1 - v2 = (1/C)∫i dt. The voltage drop caused by current flowing through a capacitance is equal to the integral of the net current flowing through the capacitance divided by its capacitance.
  Resistance: v1 - v2 = Ri. The voltage drop caused by current flowing through a resistance is equal to the net current flowing through the resistance multiplied by the resistance.

English gravitational units: f = force, pounds; x = distance, feet; M = mass, slugs; K = spring constant, lb/ft; D = damping coefficient, lb/ft/sec; q = torque, lb-ft; t = time, seconds; J = inertia, slug-ft²; θ = angle, radians; G = torsional spring constant, lb-ft/rad; B = rotational damping coefficient, lb-ft/rad/sec.
Electrical units: t = time, seconds; v = voltage, volts; i = current, amperes; L = inductance, henrys; C = capacitance, farads; R = resistance, ohms.

The solution to differential equations by either the direct method or
by Laplace transformations is useful primarily in the analysis of a given
system with all parameters prescribed. This approach is less useful in
the design or synthesis of a control since the effect of the variations of
parameters on the exponential time function exponents is difficult to visualize. For more complex systems the problem of factoring the high order
polynomial characteristic equation becomes quite laborious.
For synthesis the root locus, frequency response, and closed loop pole
zero location methods are recommended. (Chaps. 21, 22, and 23.)
Even for analysis the classical time solution has been largely supplanted
by the wide usage and availability of analog computers. As a result the


classical techniques of solution are used primarily as checks on analog
computer results or as aids in visualizing the basic performance. The
charts included in Sect. 2 are useful in this case.
Although the solution of differential equations is no longer of paramount
importance, the correct description of the system or component dynamic
performance by differential equations is basic to all methods of analysis and
synthesis. It is most important that the control designer understand differential equations and their application to his field of endeavor.
A suggested approach for obtaining these physical equations is:
1. Understand the system well enough to draw a schematic diagram
showing the relationship of all variables, including all pertinent components
as well as the load.
2. Replace the schematic with equivalent circuits or analogies.
3. Rearrange this diagram into convenient noninteracting sections or
blocks.
4. Write the characteristic equation of each section from the functional
relationship.
5. Obtain the transfer function from these equations.
6. Simplify this block diagram and obtain the complete system characteristic equation by algebraic manipulation.
This sequence is an analysis approach; synthesis of a system reverses
this method after starting with known requirements to obtain a system
equation.
Physical Laws. To write the equations which mathematically describe
the system or component performance, it is necessary to understand the
basic operation of the device and the physical laws governing the various
processes involved. The wide field of application of feedback control
theory makes it prohibitive to list all the fundamental laws that might be
required. The following partial list of textbooks in the particular field
of interest for these physical laws and Table 1 are useful.
1. Physics: Erich Hausmann and E. P. Slack, Physics, Van Nostrand, Princeton,
N. J., 1948.
2. Electrical: W. L. Everitt, Communication Engineering, McGraw-Hill, New York,
1937.
3. Thermodynamics: P. J. Kiefer and M. C. Stuart, Principles of Engineering Thermodynamics, Wiley, New York, 1954.
4. Fluid Mechanics: R. C. Binder, Fluid Mechanics, Prentice-Hall, New York, 1949.
5. Kinematics: J. L. Synge and B. A. Griffith, Principles of Mechanics, McGraw-Hill,
New York, 1949.
6. Circuit Analysis: E. A. Guillemin, Mathematics of Circuit Analysis, Wiley, New
York, 1949.
7. Materials: Stephan Timoshenko, Strength of Materials, McGraw-Hill, New York,
1953.


8. Hydrodynamics: H. Lamb, Hydrodynamics, The University Press, Cambridge,
England, 1932.
9. Mechanics: F. B. Seely, Analytical Mechanics for Engineers, Wiley, New York, 1952.
EXAMPLES. The following examples illustrate the use of basic physical
laws and Table 1 in obtaining the equations describing the system performance. Whenever possible, simplifying initial conditions are chosen.

FIG. 1. Electric circuit.

1. An electric circuit such as Fig. 1 requires:
KIRCHHOFF'S LAW. The summation of voltage drops in a closed loop is equal to zero.

(1)  E cos wt = Ri(t) + L di(t)/dt + (1/C)∫0^t i(t) dt.

FIG. 2. Damped spring mass system.

2. A spring mass system such as Fig. 2 requires:
NEWTON'S LAW. The summation of forces acting on a body equals the change in momentum.

(2)  M d²x/dt² = -D dx/dt - Kx,  or  (Ms² + Ds + K)x = 0.
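The character of the motion is fixed by the roots of Ms² + Ds + K = 0: complex roots give an oscillatory decay, real roots an aperiodic one. A minimal sketch (in Python; the numerical values are illustrative):

    import cmath

    def characteristic_roots(M, D, K):
        # Roots of M s^2 + D s + K = 0 by the quadratic formula.
        disc = cmath.sqrt(D * D - 4.0 * M * K)
        return (-D + disc) / (2.0 * M), (-D - disc) / (2.0 * M)

    print(characteristic_roots(1.0, 1.0, 25.0))    # lightly damped, oscillatory
    print(characteristic_roots(1.0, 12.0, 25.0))   # overdamped, two real roots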

3. A combined electrical and rotational mechanical system, a d-c motor with fixed field excitation (ignoring armature inductance) driving a pure inertia load, is shown in Fig. 3.

FIG. 3. D-c motor with inertia load.

Summing voltage drops:

(3)  E(t) = Ri(t) + Ke N(t),

where Ke = motor voltage constant,
      N(t) = motor speed.

Summing torques:

(4)  J dN(t)/dt = Kt i(t),

where Kt = motor torque constant,
      J = motor inertia.
Eliminating i(t) from eqs. (3) and (4) results in the transfer function of output speed to input voltage:

(5)  N(t)/E(t) = (1/Ke)[1/(Tm s + 1)],

where s = d/dt,
      Tm = RJ/KeKt = time constant.
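Equation (5) is a first order lag, so a step of applied voltage produces a speed rising exponentially toward E/Ke with time constant Tm. A minimal sketch (in Python; the constants are illustrative):

    import math

    def motor_speed(t, E=10.0, Ke=0.5, Tm=0.25):
        # Step response implied by eq. (5): N(t) = (E/Ke)(1 - exp(-t/Tm)).
        return (E / Ke) * (1.0 - math.exp(-t / Tm))

    for t in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(t, round(motor_speed(t), 2))    # 63 per cent of final speed at t = Tm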
4. A common electromechanical system is a synchro with a pure inertia load and with viscous damping, as shown schematically in Fig. 4, where Kt is the torque gradient.

FIG. 4. Schematic of synchro system.

Summing torques:

(6)  J d²θ2/dt² + B dθ2/dt = Kt(θ1 - θ2).

Further examples are used throughout this section.
Circuit Simplification Techniques

Analogies are useful in setting up physical systems and interpreting their
boundary conditions since this approach compares known systems with
the unknown. Often thermal, mechanical, hydraulic, etc., systems are
converted to an electrical equivalent since electric circuit analysis methods
have been developed to a high degree. Examples of conversions of physical
systems to electrical equivalents are given in Ref. 2.
ANALOGIES. The following equations show the analogies among three systems:

(7)  M d²x/dt² + D dx/dt + Ks x = f(t)   (mechanical translatory system),
(8)  J d²θ/dt² + F dθ/dt + Kt θ = q(t)   (mechanical rotation system),
(9)  L d²q/dt² + R dq/dt + (1/C)q = e(t)   (electric circuit),

where M = mass, slugs,
      D = damping, lb/ft/sec,
      Ks = spring gradient, lb/ft,
      x = distance, ft,
      Kt = torque gradient, lb-ft/rad,
      θ = angular displacement, rad,
      J = inertia, slug-ft²,
      F = friction, lb/rad/sec,
      L = inductance, henrys,
      R = resistance, ohms,
      C = capacitance, farads,
      q(t) in eq. (8) = torque, lb-ft,
      q in eq. (9) = charge, coulombs.

Analogous elements are listed in Table 2.
TABLE 2. ANALOGOUS ELEMENTS (Ref. 2)

Electrical elements:
  Electrical resistor: ER = R d(q1 - q2)/dt; R = resistance, q = charge.
  Electrical capacitor: EC = (1/C)(q1 - q2); C = capacitance.
  Electrical inductor: EL = L d²q/dt²; L = inductance.

Mechanical elements (translational):
  Viscous damper: f = D d(x1 - x2)/dt; x = displacement, D = damping coefficient.
  Spring: f = K(x1 - x2); K = spring constant.
  Inertia: f = M d²x/dt²; M = inertia.

Mechanical elements (rotational):
  Torsional damper: T = B d(θ1 - θ2)/dt; B = damping coefficient.
  Shaft stiffness: T = K(θ1 - θ2); K = stiffness coefficient.
  Inertia: T = J d²θ/dt²; J = moment of inertia.

Hydraulic elements:
  Fluid resistance: P = Rh q = Rh dQ/dt; Rh = hydraulic resistance, q = rate of flow, Q = quantity of flow (Q2 = 0 in electrical analog).
  Fluid capacity: P = (1/Ch)(Q1 - Q2); Ch = hydraulic capacity, Q1 = quantity of inflow, Q2 = quantity of outflow, P = pressure.

FIG. 5. Wye-delta transformation.

Aids for Circuit Simplification. The following techniques are useful in reducing the system equations to simpler form:

WYE-DELTA TRANSFORMATION. The circuits of Fig. 5 are equivalent if the following relations are satisfied:

(10)  Z1 = ZbZc/(Za + Zb + Zc),
(11)  Z2 = ZaZc/(Za + Zb + Zc),
(12)  Z3 = ZaZb/(Za + Zb + Zc),

(13)  Za = (Z1Z2 + Z2Z3 + Z3Z1)/Z1,
(14)  Zb = (Z1Z2 + Z2Z3 + Z3Z1)/Z2,
(15)  Zc = (Z1Z2 + Z2Z3 + Z3Z1)/Z3.
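Equations (10) to (15) translate directly into a pair of conversion routines, and applying one after the other must return the original impedances, which serves as a check. A minimal sketch (in Python; it works equally for complex impedances):

    def delta_to_wye(Za, Zb, Zc):
        # Eqs. (10)-(12): wye impedances from the delta impedances.
        s = Za + Zb + Zc
        return Zb * Zc / s, Za * Zc / s, Za * Zb / s

    def wye_to_delta(Z1, Z2, Z3):
        # Eqs. (13)-(15): delta impedances from the wye impedances.
        p = Z1 * Z2 + Z2 * Z3 + Z3 * Z1
        return p / Z1, p / Z2, p / Z3

    Z1, Z2, Z3 = delta_to_wye(10.0, 20.0, 30.0)
    print(wye_to_delta(Z1, Z2, Z3))    # recovers (10.0, 20.0, 30.0)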

SUPERPOSITION. If a system is linear, the system response to several inputs will be the sum of the responses to each input separately (refer to Fig. 6).

FIG. 6. Superposition. (A linear system with characteristic equation g gives output g(x) + g(y) + g(z) + g(w) for the combined inputs x, y, z, w.)

THEVENIN'S THEOREM. The effect of any impedance element in a circuit
may be determined by replacing all the voltage sources by a single equivalent
voltage source and all other impedances by a signal impedance in series with
the impedance of interest. For Fig. 7, the equivalent voltage Eab is equal
to the open circuit voltage that is present across a-b with the circuit broken
at a-b.

FIG. 7. Thevenin's theorem: I = Eab/(Zab + Z).

NORTON'S THEOREM. The current in any impedance ZR, connected to two
terminals of a network, is the same as if ZR were connected to a constant-current
generator whose generated current (Isc) is equal to the current which flows
through the two terminals when these terminals are short-circuited, the constantcurrent generator being in shunt with an impedance equal to the impedance of

the network looking back from the terminals in question. This theorem is similar in many respects to Thevenin's theorem. It is illustrated by Fig. 8.

FIG. 8. Equivalent circuits using Norton's theorem.
Nodal and Mesh Analysis. A general approach to circuit analysis is illustrated by Figs. 9a and b and eqs. (16) through (21).

FIG. 9. (a) Nodal approach; (b) mesh or loop approach.

In the nodal analysis the summation of currents at a junction or node is equal to zero. This is useful in solving for an unknown voltage, given driving voltages and impedances. From Fig. 9a,

(16)  (e - e1)/Z1 - e1/Z2 - e1/Z3 = 0,

(17)  e1(1/Z1 + 1/Z2 + 1/Z3) = e/Z1,

(18)  e1 = eZ2Z3/(Z2Z3 + Z1Z2 + Z1Z3).

Use of the mesh analysis uses voltage summations about the closed loops. Usually the unknown current, such as i2 of Fig. 9b, is found in terms of the known voltages and impedances. From Fig. 9b,

(19)  e = (Z1 + Z2)i1 - Z2i2,
(20)  0 = -Z2i1 + (Z2 + Z3)i2;

to solve for i2 by using determinants:

(21)  i2 = Z2e/[(Z1 + Z2)(Z2 + Z3) - Z2²].
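Equations (19) to (21) can be confirmed by elementary computation, Cramer's rule giving both loop currents at once. A minimal sketch (in Python; the element values are illustrative):

    def solve_mesh(e, Z1, Z2, Z3):
        # Solve eqs. (19)-(20) for the loop currents by Cramer's rule;
        # i2 agrees with eq. (21).
        det = (Z1 + Z2) * (Z2 + Z3) - Z2 * Z2
        return e * (Z2 + Z3) / det, e * Z2 / det

    i1, i2 = solve_mesh(10.0, 5.0, 10.0, 20.0)
    print(i1, i2)    # i2 = Z2 e / [(Z1 + Z2)(Z2 + Z3) - Z2^2]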

Tables of Typical Transfer Functions. The transfer function of a system or element is the ratio of the transform of the output to the transform of its input under the conditions of zero initial energy storage. It is a complete description of the dynamic properties of a system and may be represented as a mathematical expression of the frequency response, or the time response to a specified input.
In Tables 3 to 5 are summarized typical transfer functions in Laplace
transform form for typical mechanical, electrical, and hydraulic control
elements. For a more complete tabulation of transfer functions of RC
networks see Chap. 23, Sect. 2.
Tables 6 to 8 consist of three sections of a morphological table of servo
components appearing in Ref. 11.
Further material on the subject of transfer functions may be found in
Refs. 3, 11, 12.

TABLE 3. SUMMARY OF TRANSFER FUNCTIONS FOR REPRESENTATIVE MECHANICAL ELEMENTS (Ref. 3a)

Rotation, spring mass damper:
  θl(s)/θm(s) = (1/n)/[(J/Ks)s² + (B/Ks)s + 1]
  θl = load angular position, radians; θm = motor position, radians; J = moment of inertia, pound-foot-seconds/second; n = gear ratio; Ks = shaft spring constant, pound-feet/radian; B = damping torque coefficient, pound-feet/radian/second.

Translation, spring mass damper:
  X(s)/Y(s) = 1/[(M/K)s² + (D/K)s + 1]

Translation, spring-dashpot (lag):
  X(s)/Y(s) = 1/[(D/Ks)s + 1]

Translation, spring-dashpot (lead):
  X(s)/Y(s) = (D/Ks)s/[(D/Ks)s + 1]

  x = mass displacement, feet; y = platform displacement, feet; M = mass, pound-seconds/second/foot; D = damping coefficient, pounds/foot/second; K, Ks = spring constant, pounds/foot.

TABLE 4. SUMMARY OF TRANSFER FUNCTIONS FOR REPRESENTATIVE ELECTRIC ELEMENTS (Ref. 3a)

D-c motor
  For speed control: N(s)/Va(s) = 1/[Ke(Tms + 1)].
  For position control: θ(s)/Va(s) = 1/[Kes(Tms + 1)].
  Nomenclature: N = velocity of motor, radians/second; θ = position of motor, radians; Va = applied voltage, volts; Ke = voltage constant of motor, volts/radian/second; Tm = motor time constant, seconds.

D-c generator and motor
  For position control: θ(s)/Ec(s) = (Kg/KeR)/[s(Tfs + 1)(Tms + 1)].
  Nomenclature: Kg = generator voltage constant, volts/field ampere; R = series resistance of motor and generator armature circuit, ohms; Tf = generator field time constant, seconds; Ec = voltage applied to generator control field, volts; Va = voltage across drive motor terminals, volts.

Galvanometer
  θ(s)/I(s) = K1/[Js²(T1s + 1)].
  Nomenclature: J = moment of inertia of galvanometer element, pound-foot-seconds²; K1 = torque coefficient, pound-feet/ampere; T1 = time constant of galvanometer coil circuit, seconds; θ = position of galvanometer element, radians; I = signal current, amperes.

Gyroscope
  Ω(s)/I(s) = K2/[Js(T2s + 1)].
  Nomenclature: J = moment of inertia of gyroscope, pound-foot-seconds²; Ω = angular velocity of gyroscope, radians/second; JΩ = angular momentum of gyroscope, pound-foot-seconds; K2 = torque coefficient, pound-feet/ampere; T2 = time constant of gyroscope precession coil circuit, seconds; I = signal current, amperes.

Stabilizing networks
  For rate signals (phase lead): Eo(s)/Ein(s) = Ts/(Ts + 1), where T = RC, time constant, seconds.
  For integral signals (phase lag): Eo(s)/Ein(s) = 1/(Ts + 1), where T = RC, time constant, seconds.
  For rate and integral (lead-lag): Eo(s)/Ein(s) = [T1T2s² + (T1 + T2)s + 1]/[T1T2s² + (T1 + T2 + T12)s + 1], with T12 >> T1 + T2, where T1 = R1C1 and T2 = R2C2 are time constants, seconds, and T12 = R1C2.
  Nomenclature: Ein = input voltage, volts; Eo = output voltage, volts.

TABLE 5. SUMMARY OF TRANSFER FUNCTIONS OF REPRESENTATIVE HYDRAULIC ELEMENTS (Ref. 3a)

Valve-piston
  Load reaction negligible: X(s)/Y(s) = C1/s.
  Spring load dominant: X(s)/Y(s) = C2.
  Nomenclature: x = piston displacement from neutral, feet; y = input displacement from neutral, feet; C1 = piston velocity per valve displacement, second⁻¹; C2 = piston travel per valve displacement.

Valve-piston linkage
  For phase lag: X(s)/Y(s) = (b/a)/(Tvs + 1), where a, b = linkage distances, feet, and Tv = valve effective time constant, seconds, fixed by a, b, and C1.
  For phase lead: X(s)/Y(s) = [d/(l + b)](T3s + 1)/(Tvs + 1), where l, b, d = linkage distances, feet; Tv = valve time constant, seconds; T3 = lead time constant, seconds; both time constants are fixed by the linkage distances and C2.

Hydraulic motor (rotating cylinder block, pistons, and valve plate driving a motor load)
  With compressibility: θ(s)/Y(s) = (Sp/dm)/{s[(VJ/Bdm²)s² + (LJ/dm²)s + 1]}.
  With negligible compressibility: θ(s)/Y(s) = (Sp/dm)/{s[(LJ/dm²)s + 1]}.
  Nomenclature: θ = motor position, radians; y = displacement of pump stroke from neutral, feet; Sp = flow, cubic feet/second, from pump per unit displacement, y, feet; dm = motor displacement, cubic feet; J = moment of inertia, pound-foot-seconds²; L = leakage coefficient, cubic feet/second/pound/square foot; V = total oil volume under compression, cubic feet; B = oil bulk modulus, pounds/square foot.

TABLE 6. ERROR DETECTORS (Ref. 11)

Each entry gives: (a) type and (b) main application; operation; possible modifications; operating features; accuracy limited by; features determining energy required to vary reference quantity measurement; frequently used with this device: (a) Table 7 amplifier, (b) Table 8 error corrector.

1. (a) D-c or a-c resistance bridge. (b) Position control. Operation: error voltage, x, appears when the positions of the moving arms of the potentiometers A and B are not matched. The power source, E, is applied across both potentiometers. A measures reference position as voltage and B regulated position as voltage, their difference being x. Modifications: potentiometer can be wound on a helix to get more than 360° of rotation. Operating features: A and B can be remote; continuous rotation not possible. Accuracy limited by: potentiometer winding. Energy features: contact arm and bushing friction. Used with: (a) 2, 3, 4, 5; (b) 1, 2, 3.

2. (a) D-c tachometer bridge. (b) Speed control. Operation: error voltage, x, appears when speeds of tachometers A and B vary. A measures reference speed as a voltage and B regulated speed as a voltage. The difference between these voltages is x. Modifications: A can be replaced by a battery as the reference. Operating features: A and B can be remote; top speed limited by commutator. Accuracy limited by: tachometer accuracy; commutator resistance. Energy features: brush and bearing friction. Used with: (a) 2, 3, 4; (b) 1, 2, 3.

3. (a) A-c magnetic bridge. (b) Position control, particularly for gyro pickups where very small forces prevail. Operation: error voltage, x, appears when relative positions of rotor A and stator B do not match. Rotor A measures reference position magnetically and stator B regulated position magnetically. Voltage E, across exciting coil, L, provides energy. When the rotor covers unequal areas of each exposed stator pole (unbalanced magnetic bridge), pickup coils M and N have unequal voltages induced. The voltage difference is x. Modifications: four poles instead of three can be used, with two having exciting windings and two pickup coils connected bucking. Operating features: limited rotation; air gap usually small. Accuracy limited by: machining tolerance, magnetic fringing, and voltage phase shift. Energy features: load taken from x; bearing friction. Used with: (a) 2, 4; (b) 1, 2, 3.

4. (a) A-c synchro system. (b) Position control where continuous rotation is desired. Operation: error voltage, x, appears whenever the relative positions of the rotors of synchro generator, A, and synchro control transformer, B, are not matched. The reference position is measured by A as a magnetic flux pattern which is transmitted to the synchro control transformer through the interconnected stator windings. If the rotor of B is not exactly 90° from the transmitted flux pattern, x is produced. Modifications: a dual system can be used whereby the unity synchro system sets the approximate position and the high-speed or vernier system sets the accurate position. Operating features: unlimited rotation; the synchro generator and control transformer can be remote. Accuracy limited by: machining tolerance, accuracy of winding distribution. Energy features: distributed or nondistributed winding of control transformer rotor; load taken from x; bearing and slip ring friction. Used with: (a) 2, 4; (b) 1, 2, 3.

5. (a) Frequency bridge. (b) Frequency control. Operation: error voltage, x, appears when reference and regulated frequencies differ. Tube channel A produces a filtered sawtooth wave that gives a d-c voltage inversely proportional to the reference frequency. Tube channel B produces a similar voltage as a measure of the regulated frequency. The difference of these d-c voltages is x. Modifications: may be used as a speed regulator if B is made an a-c tachometer. Operating features: A and B can be remote; tubes can be either gas or vacuum; a wide range of frequencies can be covered; vacuum tubes should be used for high frequencies. Accuracy limited by: temperature and aging effects on tube and circuit elements. Energy features: tube input impedance. Used with: (a) 4; (b) 1, 3.

6. (a) Millivolt bridge. (b) Temperature control. Operation: error voltage, x, appears whenever the regulated temperature differs from the reference temperature. The regulated temperature is measured as a voltage by the thermoelectric effect of two dissimilar metals, B. The reference temperature is represented as a voltage from the battery-potentiometer source A. The difference in these voltages is x. Modifications: an electronic voltage source or another thermocouple can be substituted for A. Operating features: A and B can be remote; a wide range of temperature can be covered. Accuracy limited by: ability to detect very low millivolt signals. Energy features: contact arm and bushing friction; if electronic voltage source A is used, tube input impedance. Used with: (a) 2, 4; (b) 1, 6.
TABLE 6. ERROR DETECTORS (Continued)

7. (a) Phototube bridge. (b) Position control by intercepting a light beam. Operation: error voltage, x, appears when the movable shutter is in other than the desired position. Light reaching phototube, B, measures shutter position. This light is measured as a voltage by the phototube current variation. A reference position of the shutter is represented by the battery-potentiometer voltage. The difference of these voltages is x. Modifications: an electronic voltage source or another light source and phototube can be substituted for A. Operating features: A and B can be remote; glass surfaces through which light travels must be kept clean. Accuracy limited by: continued accuracy of light source and phototube. Energy features: contact arm and bushing friction; if electronic voltage source A is used, tube input impedance. Used with: (a) 2, 4; (b) 1.

8. (a) Mechanical differential. (b) Position control and speed control. Operation: displacement x appears whenever the relative reference and regulated positions change. Reference position is measured as an angle by one side of the differential, A, and regulated position as an angle by the other side of the differential, B. The difference in the two positions rotates the middle member of the differential, giving displacement x. Modifications: spur-gear differential. Operating features: since A and B must be located together, synchro ties or their equivalent can be used to transmit remote positions to A and B; continuous rotation possible with speed limited by gears. Accuracy limited by: gearing backlash. Energy features: power taken from x; bearing friction; pitch of gears. Used with: (a) 1, 6, 7, 8; (b) 1, 2, 3, 4, 5.

9. (a) Beam balance. (b) Voltage control, speed control, and tension control. Operation: displacement x appears whenever the variable force is different from the reference force. The variable force, B, and the reference spring force, A, are measured as moments. The difference in these moments produces displacement x. Modifications: any variable force other than a spring can be used. Operating features: for remote operation B can be a transmitted force; x movement limited; by changing springs a wide force range can be covered. Accuracy limited by: magnitude of forces; screw pitch and friction. Energy features: load taken from x; bearing friction. Used with: (a) 1, 6, 7; (b) 1, 3, 5.

10. (a) Modified beam balance. (b) Speed control (flyball governors). Operation: displacement x appears when regulated speed, ω, differs from reference speed. The reference is represented by spring force, A, about fulcrum, O, the regulated speed by centrifugal force of mass, B, about O. The difference in moments of forces about O produces displacement x. Modifications: any variable force other than a spring can be used. Operating features: a wide speed range can be covered; x movement limited. Accuracy limited by: magnitude of forces; screw pitch and friction. Energy features: load taken from x; friction. Used with: (a) 1, 7, 8; (b) 1, 5.

11. (a) Bimetal. (b) Temperature control. Operation: displacement x appears whenever the surrounding temperature and the reference temperature are different. The reference temperature is represented by the position of the adjustable reference point, A. The surrounding temperature is measured by the position of the bimetal strip, B. The difference in these positions produces displacement x. Modifications: bimetal can be made snap acting at some standard temperature. Operating features: wide temperature range possible by selection of proper bimetal. Accuracy limited by: load taken from x; ability to measure accurately small x deflection; time lag and hysteresis of bimetal. Energy features: mounting of reference point. Used with: (a) 1, 6; (b) 1, 6.

12. (a) Float. (b) Liquid level control. Operation: displacement x appears when regulated and reference liquid levels differ. Point A is the reference. The liquid level is measured as a position by the float, B. The difference produces displacement x. Modifications: a float controlling a pulley system can be used rather than a lever. Operating features: with the proper mechanical arrangement a wide variation in liquid height can be controlled. Accuracy limited by: load taken from x; variable density of the liquid; friction. Energy features: mounting of reference point. Used with: (a) 2, 7; (b) 3, 4.

TABLE 6. ERROR DETECTORS (Continued)

13. (a) Bellows. (b) Pressure control and temperature control. Operation: displacement x appears when surrounding and reference pressures differ. Reference pressure is represented as position by adjustable point, A. Surrounding pressure is measured by the bellows as a position. The difference in these positions produces displacement x. Modifications: a spring can be added in addition to the bellows spring. Operating features: limited x travel. Accuracy limited by: load taken from x; hysteresis of bellows spring. Energy features: mounting of reference point. Used with: (a) 1, 4, 5, 6; (b) 1, 6.

14. (a) Piston. (b) Pressure control. Operation: displacement x appears when the regulated pressure output of the pump and the reference differ. Reference pressure is a force on the piston by spring, A. Regulated pressure is a force on the piston by the fluid. Their difference produces displacement x. Modifications: a standard pressure source can be substituted for the spring. Operating features: A and B can be remote; limited x travel. Accuracy limited by: friction; load taken from x. Energy features: piston forces involved; screw pitch and friction. Used with: (a) 7, 8; (b) 5.
A semilog plot of an overdamped transient (T1 > T2) will start flat at log x0 (x0 =
initial value) and then asymptotically approach a slope equal to the reciprocal
of the larger time constant (1/T1). Drawing the asymptote will give an
intersection log [x0T1/(T1 + T2)] on the response axis. See Ref. 6.


Frequency Response. The most generally useful method is to excite
the system or element under test with a sinusoidal signal. The frequency
response is obtained by making a comparison of the amplitude and phase
relations of the input and output over the frequency range of interest.
The phase and amplitude relations can be obtained in a number of ways,
e.g., from (a) direct oscillograph or recorder readings of the variables,
(b) Lissajous patterns on a long persistence oscilloscope, and (c) special
test equipment that gives a direct reading of the phase and amplitude
ratios.
An analytical expression approximating the transfer function can be
obtained by curve matching techniques. Sufficiently good results are often
obtained by a simple trial and error approximation of the frequency
response obtained by the use of the straight line asymptote defined in
Chap. 21. Straight lines with slopes which are multiples of 20 db per
decade are first drawn so as to approximate the experimental data. The
exact frequency response corresponding to the estimated straight line
response is then calculated by use of the graphs of Chap. 21. The agreement between the calculated and measured response is checked and the
process is repeated if necessary. With a little experience one or two
iterations are usually sufficient. The intersections of the straight lines
are the poles and zeros of the transfer function. More elaborate approximation methods are available if needed. See Refs. 6 and 8.
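The amplitude and phase relations themselves can be extracted from recorded input and output waveforms. A minimal Python sketch follows (a modern addition, not a handbook procedure); the first order test system, its time constant, and the sampling grid are all assumptions, standing in for actual oscillograph records.

import numpy as np

# Simulated test records; in practice x and y would be measured waveforms.
T, w = 0.1, 20.0                  # assumed time constant (s) and test frequency (rad/s)
dt = 0.001
t = np.arange(0, 5, dt)
x = np.sin(w * t)                                              # input record
y = np.sin(w * t - np.arctan(w * T)) / np.sqrt(1 + (w * T)**2) # output record

# Correlating each record with e^(-jwt) extracts its phasor at frequency w
ref = np.exp(-1j * w * t)
ratio = np.sum(y * ref) / np.sum(x * ref)
print(abs(ratio))                   # amplitude ratio, here 1/sqrt(1 + (wT)^2)
print(np.degrees(np.angle(ratio)))  # phase shift, here -arctan(wT) in degrees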
Correlation Technique. The autocorrelation function of white noise
is an impulse. Therefore the cross correlation of the input and output of
the system is simply the impulse response of the system when the input is
white noise. An experimental setup similar to Fig. 10 can therefore be
FIG. 10. Test configuration to obtain system response by correlation technique: a white noise generator drives the system under test; a cross correlator of the input xi(t) and output xo(t) yields the impulse response g(t).

used to evaluate the transfer function of one element or system. Because
the cross correlation filters all signals not correlated with the input white
noise, the technique has the potential advantage of allowing the normal
system operation to continue while the test is being conducted. See
Ref. 8. The practical difficulties of mechanizing a satisfactory cross
correlator have limited the usefulness of this method.
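Digitally, the cross correlation is easy to mechanize. The Python sketch below (an illustrative addition; the first order plant, its time constant, and the Euler integration are assumptions) shows the cross correlation of a white noise input with the output approximating the weighting function g(t).

import numpy as np

rng = np.random.default_rng(1)
dt, n = 0.01, 100_000
x = rng.standard_normal(n) / np.sqrt(dt)   # approximately white noise input

# Assumed system under test: first order lag, T = 0.2 s, Euler integration
T = 0.2
y = np.zeros(n)
for k in range(n - 1):
    y[k + 1] = y[k] + dt * (x[k] - y[k]) / T

# Cross correlation of input and output estimates g at successive lags;
# the Euler discretization delays the estimate by one sample.
m = np.arange(60)
g_est = np.array([np.mean(x[:n - j] * y[j:]) for j in m])
g_true = (1 / T) * np.exp(-m * dt / T)     # exact impulse response of the lag
print(np.round(g_est[1:6], 2))
print(np.round(g_true[0:5], 2))            # estimate tracks the true response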


2. CLASSICAL METHODS OF ANALYSIS

System Equations

In Methods of System Analysis in Sect. 1, the correct description of the
system or component dynamic performance by differential equations was
stated to be basic to all methods of analysis and synthesis.
General Linear Differential Equations. A common type of system
equation, the general linear integro-differential equation with constant
coefficients, may be written in terms of the variable x(t) and the driving
function y(t) (Ref. 13) as
(22)  a0(dⁿx(t)/dtⁿ) + a1(dⁿ⁻¹x(t)/dtⁿ⁻¹) + ... + an−1(dx(t)/dt) + anx(t)
        + an+1 ∫x(t) dt + ... + an+q ∫...∫ x(t) (dt)^q = y(t).
As a class, the homogeneous equation resulting from reducing the right-hand
side of the equation to zero has as its general solution a linear combination
of solutions of the exponential form e^(pn t), where pn may be real or
complex.
Characteristic Equation. The operator p = d/dt together with
1/p = ∫ dt may be substituted into the reduced homogeneous equation.
The resulting operational equation may be handled by the rules of algebra
as explained in Chap. 8, Sect. 1. Factoring out the operational part of
this equation yields the characteristic equation (Ref. 13):

(23)  a0p^(n+q) + a1p^(n+q−1) + ... + an−1p^(1+q) + anp^q + an+1p^(q−1) + ... + an+q = 0.

General Solution to Linear Differential Equations

The complementary solution to eq. (22) is

(24)  xt = An+q e^(pn+q t) + An+q−1 e^(pn+q−1 t) + ... + A1 e^(p1 t),

where the pn are the roots of the characteristic eq. (23). The complete solution
is (Ref. 13)

(25)  x(t) = xt + xs,

where xs is the particular solution to eq. (22). The particular solution is
obtained by substituting an assumed solution and solving for the coefficients.
(See Part A, General Mathematics.)


Absolute Stability Defined. (See also Chap. 21.) The stability of a
system may be broadly defined as that property which insures that it
will remain in operating equilibrium through normal conditions (Ref. 14).
A system is said to be on the verge of stability when it is hunting, that is,
subject to sustained oscillations; if the oscillations grow, the system is
unstable; if they decay, the system is stable. Nonoscillatory instability
is also possible, such as the exponential growth of a system variable in
response to a disturbance. Tables 14 and 15 give examples of stable and
unstable performance and the dependence of stability upon the nature of
the roots of the characteristic equation (or exponents of the complementary
solution). When system gain is increased to provide desired accuracy,
instability is frequently encountered. This is the situation which is
attacked with equalization (or stabilization) methods designed to provide
a margin of stability without compromising system accuracy. A margin
of stability from the hunting condition is nearly always desired. It is
implied that the system is linear or may be linearized in the neighborhood
of the operating point for the purpose of analyzing stability (for linearization
of nonlinear systems, see Linearization, Chap. 25, Sect. 2).
EXAMPLE. Second Order System (Motor Synchronizing on a Fixed Signal).
In Fig. 11 a motor drives a load to which it is coupled directly from an
initial rest position c0 to correspondence at position 0 (Ref. 13).

FIG. 11. Motor driving load from an initial position c0 to correspondence at position 0.
Combined inertia = J lb-ft-sec²; damping = D lb-ft/rad/sec; stiffness = K lb-ft/rad
(Ref. 3a).

The input to the motor produces a torque that is proportional to the
difference between the controlled load position and the reference input position.
Thus motor torque equals −Kc since the desired final position is zero.
Present in the load and the motor are mechanical friction and electrical
damping torques, both of which are proportional to the motor speed;
friction and damping torque equal D(dc/dt). Static friction forces are
negligible. There is also a torque due to the combined inertia J, and this
torque equals J(d²c/dt²).


The complete torque equation can now be written as

(26)  J(d²c/dt²) + D(dc/dt) + Kc = 0.

The steady-state displacement is zero, that is, the correspondence position,
so that the transient response is the entire motion. The characteristic
equation is

(27)  p² + (D/J)p + (K/J) = 0.

A further modification will be made in the interest of obtaining a simpler
form of the solution. Let

(28)  √(K/J) = ω0 = undamped natural frequency,

and

(29)  D/(2√(KJ)) = ζ = damping factor.

If these substitutions are made, the characteristic equation may now be
written (in nondimensional form) as

(30)  p² + 2ζω0p + ω0² = 0,

in which the two roots are

(31)  p1 = −[ζ − √(ζ² − 1)]ω0,

(32)  p2 = −[ζ + √(ζ² − 1)]ω0.

The effect of ζ upon the form of the transient solution of a second order
system is treated in the next section.
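The relations (28) to (32) are easy to verify numerically. The following is a brief Python sketch (a modern addition to the example; the J, D, K values are assumed for illustration only).

import numpy as np

# Motor-load example of Fig. 11; values below are illustrative assumptions.
J, D, K = 2.0, 6.0, 50.0

w0 = np.sqrt(K / J)               # eq. (28), undamped natural frequency
zeta = D / (2 * np.sqrt(K * J))   # eq. (29), damping factor

# Roots of the characteristic eq. (30): p^2 + 2*zeta*w0*p + w0^2 = 0
p1, p2 = np.roots([1.0, 2 * zeta * w0, w0**2])
print(w0, zeta, p1, p2)   # zeta < 1 here, so the roots are complex conjugates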
Use of Laplace Transform

The work involved in using the classical approach to the solution of
linear differential equations may be simplified to a routine process through
the use of the Laplace transform and its inverse, which uses the same
approach to obtain both transient and steady-state solutions (Refs. 15,
16). The Laplace transform has the advantage of handling initial conditions
and discontinuous inputs directly. For a complete presentation of
this method see Ref. 17.
The Laplace transform is defined as

L[f(t)] = F(s) = ∫_0^∞ f(t)e^(−st) dt.


The inverse Laplace transform is defined as

L⁻¹[F(s)] = f(t) = (1/2πj) ∫_(c−j∞)^(c+j∞) F(s)e^(ts) ds,  t ≥ 0.

In these definitions s is the complex operator σ + jω. The abscissa of
absolute convergence, denoted by σ0, is at σ0 > 0.
The Laplace transform and inverse Laplace transform are referred to
as a transform pair.
Laplace Transform Applied to Feedback Control System. A
closed loop linear control system may be represented in terms of the complex
operator s by the eqs. (33), (35), and (36) as follows:

(33)  C(s) = G(s)E(s),

where C(s) = transform of the controlled variable, c(t),
      E(s) = transform of the actuating error, e(t),
      G(s) = transform of the transfer function of the forward control
             elements and may be given the factored form:

(34)  G(s) = K(s − sa)(s − sb) ... / [sⁿ(s − s1)(s − s2) ...],

(35)  E(s) = R(s) − B(s),

where R(s) = transform of the reference input, r(t),
      B(s) = transform of the feedback, b(t).

(36)  B(s) = H(s)C(s),

where H(s) is the transform of the transfer function of the feedback elements
and may be similar in form to that given in eq. (34).
The block diagram for the above system of equations is given in Fig.
25b. The transform of the closed loop transfer function (see Fig. 25c for
the block diagram) for this control system is

(37)  C(s)/R(s) = G(s)/[1 + G(s)H(s)].

Expressing C(s) in terms of polynomials in s,

(38)  C(s) = N(s)/D(s) = (avs^v + av−1s^(v−1) + ... + a1s + a0)/(s^q + bq−1s^(q−1) + ... + b1s + b0),

where, because of the nature of the functions G(s), H(s), and R(s), in
general q ≥ v.
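Equation (37) is straightforward to mechanize with polynomial arithmetic. A minimal Python sketch follows (an addition for illustration; the example G and H are assumptions); it forms the closed loop numerator and denominator from the polynomial coefficients of G and H.

import numpy as np

# G(s) = numG/denG, H(s) = numH/denH, coefficients highest power first.
# Closed loop, eq. (37): C/R = numG*denH / (denG*denH + numG*numH)
def closed_loop(numG, denG, numH, denH):
    num = np.polymul(numG, denH)
    den = np.polyadd(np.polymul(denG, denH), np.polymul(numG, numH))
    return num, den

# Assumed example: G(s) = 10/(s(s + 2)) with unity feedback H(s) = 1
num, den = closed_loop([10.0], [1.0, 2.0, 0.0], [1.0], [1.0])
print(num, den)   # -> 10 and s^2 + 2s + 10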

TABLE 9. SOME USEFUL LAPLACE TRANSFORM PAIRS (Ref. 15)

No.  F(s) — f(t), 0 ≤ t

1. 1/s — 1 or u(t), unit step at t = 0.
2. (1/s)e^(−as) — u(t − a).
3. (1/s)(e^(−as) − e^(−bs)) — u(t − a) − u(t − b).
4. 1 — unit impulse, lim(a→0) (1/a)[u(t) − u(t − a)].
5. 1/s² — t, unit ramp at t = 0.
6. 1/sⁿ — t^(n−1)/(n − 1)!.
7. 1/(s + a) — e^(−at).
8. 1/(s + a)² — te^(−at).
9. 1/(s + a)ⁿ — t^(n−1)e^(−at)/(n − 1)!.
10. 1/[s(s + a)] — (1/a)(1 − e^(−at)).
11. 1/[s²(s + a)] — (e^(−at) + at − 1)/a².
12. 1/[s(s + a)²] — (1/a²)(1 − e^(−at) − ate^(−at)).
13. 1/[(s + a)(s + γ)] — (e^(−at) − e^(−γt))/(γ − a).
14. 1/[s(s + a)(s + γ)] — 1/aγ + (γe^(−at) − ae^(−γt))/[aγ(a − γ)].
15-24. Sinusoidal and damped sinusoidal pairs: 1/(s² + β²) — (1/β) sin βt; s/(s² + β²) — cos βt; 1/[(s + a)² + β²] — (1/β)e^(−at) sin βt; (s + a)/[(s + a)² + β²] — e^(−at) cos βt; and related forms with numerator a1s + a0 introducing the phase angle ψ = tan⁻¹[(a1a − a0)/β].
25. 1/(s² − β²) — (1/β) sinh βt.
26. s/(s² − β²) — cosh βt.
27. sF(s) − f(0+) — df(t)/dt.
28. s²F(s) − sf(0+) − (df/dt)(0+) — d²f(t)/dt².
29. F(s)/s + f^(−1)(0+)/s — ∫f(t) dt.
30. F(s)/s² + f^(−1)(0+)/s² + f^(−2)(0+)/s — ∫[∫f(t) dt] dt.
31. aF(s) — af(t).
32. F1(s) ± F2(s) — f1(t) ± f2(t).
33. aF(as) — f(t/a).
34. F(s + a) — e^(−at)f(t).
35. F(s − a) — e^(at)f(t).
36. e^(∓as)F(s) — f(t ∓ a), where f(t − a) = 0 for t < a.
TABLE 11. TIME RESPONSES OF SECOND ORDER SYSTEMS (chart of the responses plotted in Figs. 13 to 15)

Specific forms of C(s)/R(s), with responses to unit impulse (L = 1), unit step (L[u(t)] = 1/s), and unit ramp (L[t] = 1/s²):

Oscillatory, ζ < 1: C(s)/R(s) = ω0²/[(s + a)² + β²], where a = ζω0 and a² + β² = ω0².
  Unit impulse: (ω0²/β)e^(−at) sin βt.
  Unit step: 1 + (ω0/β)e^(−at) sin[βt − tan⁻¹(β/−a)].
  Unit ramp: t − 2a/ω0² + (1/β)e^(−at) sin[βt − 2 tan⁻¹(β/−a)].

Critically damped, ζ = 1: C(s)/R(s) = ω0²/(s + a)², where a = ω0.
  Unit impulse: a²te^(−at).
  Unit step: 1 − (1 + at)e^(−at).
  Unit ramp: t − 2/a + (t + 2/a)e^(−at).

Overdamped, ζ > 1: C(s)/R(s) = ω0²/[(s + a)(s + γ)], where a + γ = 2ζω0 and aγ = ω0².
  Unit impulse: [aγ/(γ − a)](e^(−at) − e^(−γt)).
  Unit step: 1 + (γe^(−at) − ae^(−γt))/(a − γ).
  Unit ramp: t − (γ + a)/aγ + (γ²e^(−at) − a²e^(−γt))/[aγ(γ − a)].

Note. Laplace transform, C(s), of each time response in this table is simply the product of the transform of the input, R(s), by the system transform C(s)/R(s) in the table.


The general case of expansion into partial fractions with higher order
poles yields the solution given by eqs. (43) to (45) (see transform pair
No. 0.21 of Ref. 15).
Order of System Responses as Seen from Partial Fraction Expansions.
Any complex linear system can be represented as a combination of
first and second order systems. This may be seen from the expansion of a
system response function such as that of eq. (38) into partial fractions

(46)  C(s) = N(s)/D(s) = C1/(s − s1) + C2/(s − s2) + ... + Ck/(s − sk)
             + (a1s + a0)/(s² + 2ζω0s + ω0²) + ... ,

where two conjugate complex first order poles have been combined into a
single term.
The response of the complex system can, therefore, be considered as
the sum of the responses of first and second order systems. The responses
of first and second order systems are thus of considerable importance and
are given for various inputs in the following.
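The expansion (46) is routinely computed by machine. A short Python sketch follows (an illustrative addition; the example transfer function is an assumption); scipy's residue routine returns the residues Ck and poles sk of a rational function.

from scipy import signal

# Expand C(s) = (s + 3)/(s^3 + 3s^2 + 2s) = (s + 3)/[s(s + 1)(s + 2)]
# into the partial fractions of eq. (46).
r, p, k = signal.residue([1, 3], [1, 3, 2, 0])
print(r)   # residues C_k: 1.5, -2.0, 0.5
print(p)   # poles s_k:    0, -1, -2
# The time response is the sum of first order modes: c(t) = sum r_k e^(p_k t)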
First Order System Responses. A first order system is characterized
by a single energy storage. An example of a first order system is the
simple hydraulic servo of Table 5 for phase lag. The system equation is
thus (using T for Tv)

(47)  T(dx(t)/dt) + x(t) = (b/a)y(t) = f(t).

The transform of the equation may be written

(48)  X(s) = [1/(Ts + 1)]F(s),

which is of the form (see Ref. 16)

(Response function) = (System function)(Excitation function).

FIG. 12a. Response of first order system to step showing time constant relationships: the tangent at t = 0 intersects the final-value asymptote at t = T, and the response completes 63.2% of the change at t = T.

FIG. 12b. Step and impulse response of first order system T(dx/dt) + x = f(t). Step: x/A = 1 − e^(−t/T), A = magnitude of step. Impulse: xT/A = e^(−t/T), A = ∫(impulse function) dt.


The characteristic equation is

(49)  Ts + 1 = 0,

and the transient solution is

(50)  xt = Ce^(−t/T).

The performance can be characterized by the quantity T, called the
time constant of the system (see Ref. 13). Physically, the time constant is
the time to complete 1 − e⁻¹ = 63.2% of the change after either a step
or impulse input. Also it is the time given by the intersection of the
tangent to the transient at t = 0 with the asymptote to the final value
when a step or impulse is applied at t = 0 (see Fig. 12a).

FIG. 12c. Ramp response of first order system: T(dx/dt) + x = f(t) = At; x/(AT) = t/T − (1 − e^(−t/T)).
Table 10 lists three types of input f(t), the corresponding excitation
function F(s), the response function X(s) for the system function
1/(Ts + 1), and the inverse transform x(t) of the response function, X(s)
and x(t) forming a Laplace transform pair. In Fig. 12b are plotted the step
and impulse responses of a first order system obtained from the solutions
appearing in Table 10. In Fig. 12c is plotted the ramp response of a first
order system from the solution appearing in the same table.
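The three first order responses can be generated from their closed forms. A brief Python sketch (an illustrative addition; the time constant and time grid are assumptions):

import numpy as np

T = 0.5                         # assumed time constant, seconds
t = np.linspace(0, 5 * T, 6)

step    = 1 - np.exp(-t / T)             # x/A for a step of magnitude A (Fig. 12b)
impulse = np.exp(-t / T)                 # xT/A for an impulse of area A (Fig. 12b)
ramp    = t / T - (1 - np.exp(-t / T))   # x/(AT) for a ramp At (Fig. 12c)

print(step[1])   # at t = T the step response has completed 0.632 of the change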
Second Order System Responses. The solutions for unit impulse,
step, and ramp inputs to the second order system of eq. (26), generalized
by setting the right-hand side equal to Kr(t), namely

(51)  J(d²c/dt²) + D(dc/dt) + Kc = Kr(t),

are illustrated respectively in Figs. 13, 14, and 15a, b, c.
FIG. 13. Response of second order system C(s) = [ω0²/(s² + 2ζω0s + ω0²)]R(s) to unit impulse in r(t) for various values of ζ.

FIG. 14. Response of second order system C(s) = [ω0²/(s² + 2ζω0s + ω0²)]R(s) to unit step in r(t) for various values of ζ.

FIG. 15a, b, c. Response of second order system C(s) = [ω0²/(s² + 2ζω0s + ω0²)]R(s) to unit ramp r(t) = t for various values of ζ.

The transform of this equation may be written in a nondimensional form
(as for the first order system of eq. 47),

(Response function) = (System function)(Excitation function)

(see Ref. 16), which in this case is

(52)  C(s) = [ω0²/(s² + 2ζω0s + ω0²)][R(s)].

The time response may then be obtained by looking up the inverse
transform pair in a table of Laplace transforms (see Ref. 15). The form
of the response will be oscillatory, critically damped, or overdamped as
the damping factor ζ < 1, = 1, or > 1. Table 11 is a chart of the time
responses illustrated in Figs. 13 to 15 (see Refs. 6, 13, 18, and 19).
Tables 12 and 13 and Figs. 16 to 19, from Ref. 20, are for the determination
of equation coefficients and system parameters for second order systems.
Table 14 illustrates time responses. Table 15 treats stability.
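The dependence of the step response on ζ is easy to reproduce numerically. The Python sketch below (a modern addition; the damping values and time grid are assumptions) uses the standard closed form for ζ < 1 and checks the peak overshoot against e^(−πζ/√(1 − ζ²)).

import numpy as np

# Unit step response of C/R = w0^2/(s^2 + 2*zeta*w0*s + w0^2) for zeta < 1:
#   c(t) = 1 - e^(-zeta*w0*t) sin(wd*t + phi)/sqrt(1 - zeta^2),
#   wd = w0*sqrt(1 - zeta^2), phi = arccos(zeta)
def step_response(t, w0, zeta):
    wd = w0 * np.sqrt(1 - zeta**2)
    phi = np.arccos(zeta)
    return 1 - np.exp(-zeta * w0 * t) * np.sin(wd * t + phi) / np.sqrt(1 - zeta**2)

t = np.linspace(0, 10, 2001)      # w0 = 1 assumed
for zeta in (0.2, 0.4, 0.8):
    c = step_response(t, 1.0, zeta)
    print(zeta, c.max() - 1, np.exp(-np.pi * zeta / np.sqrt(1 - zeta**2)))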
FIG. 16. Determination of equation coefficients for second order systems from response curves (Ref. 20): the ratio h/H (H = steady-state change, h = transient overshoot from steady state) plotted against damping ratio ζ for the step responses of 1/[(s/ω0)² + (2ζ/ω0)s + 1] and its integral.

TABLE 12. RELATIONSHIP AMONG SYSTEM PARAMETERS AND EQUATION COEFFICIENTS FOR THE SECOND ORDER SYSTEM a2(d²x/dt²) + a1(dx/dt) + a0x = y(t) (Ref. 20)

Damping ratio: ζ = a1/(2√(a2a0)) = (T1 + T2)/(2√(T1T2)).
Undamped angular natural frequency: ω0 = √(a0/a2) = ω/√(1 − ζ²) = 2πf0 = 1/(ζTc) = 1/√(T1T2).
Undamped natural frequency: f0 = (1/2π)√(a0/a2) = ω0/2π = 1/T0.
Undamped natural period: T0 = 2π√(a2/a0) = 1/f0.
Angular natural frequency: ω = √[a0/a2 − (a1/2a2)²] = ω0√(1 − ζ²) = 2πf.
Natural frequency: f = (1/2π)√[a0/a2 − (a1/2a2)²] = f0√(1 − ζ²) = 1/T.
Natural period: T = 2π/ω = T0/√(1 − ζ²).
Critical time constant: Tc = 2a2/a1 = 1/(ζω0).
Large time constant (ζ > 1): T1 = 1/[a1/2a2 − √((a1/2a2)² − a0/a2)] = 1/{ω0[ζ − √(ζ² − 1)]}.
Small time constant (ζ > 1): T2 = 1/[a1/2a2 + √((a1/2a2)² − a0/a2)] = 1/{ω0[ζ + √(ζ² − 1)]}.
Time parameter ratio (ζ > 1): v = T1/T2 = [ζ + √(ζ² − 1)]/[ζ − √(ζ² − 1)].

TABLE 13. DETERMINATION OF EQUATION COEFFICIENTS FOR SECOND ORDER SYSTEMS FROM RESPONSE CURVES (Ref. 20)

Each entry gives: type of response; equation parameters used; equation coefficients in terms of a0 and the parameters; method used to find the parameters.

Oscillatory (0 < ζ < 0.5). Parameters: ζ, T. Coefficients: a2 = a0T²(1 − ζ²)/4π², a1 = a0Tζ√(1 − ζ²)/π. Method: measure the period T and the successive peak deviations x0, x1, x2, x3, ... (x0 can be any peak); form the ratios x1/x0, x2/x1, x2/x0, x3/x1, ...; find ζ from Fig. 17.

Near critically aperiodic (0.5 < ζ < 2.0). Parameters: ζ, ω0. Coefficients: a1 = 2a0ζ/ω0, a2 = a0/ω0². Method: measure the times t1, t2, t3 at which the step response reaches specified fractions of its final value (0.19, 0.264, 0.406, 0.599, 0.736, 0.801 in the curves shown); form the ratios t2/t1, t3/t1, (t3 − t2)/(t2 − t1); find ζ, ω0t1, ω0t2, ω0t3 from Fig. 18; compute ζ = ζav and ω0 = ω0,av.

Critically aperiodic (ζ = 1.0). Parameter: Tc. Coefficients: a1 = 2a0Tc, a2 = a0Tc². Method: measure Tc on the step response; for ζ = 1.0, Tc = t1 = t2 − t1 = t3 − t2, where t1, t2, t3 are the times to the 26.4% and 73.6% points (Fig. 18).

Nonoscillatory (ζ > 1.0). Parameters: T1, v. Coefficients: a1 = a0T1(v + 1)/v, a2 = a0T1²/v. Method: plot the response curve on semilog paper; extrapolate the straight line portion of the plot to t = 0; measure ln x1 and ln x2 at t1 and t2 respectively and compute T1 = (t2 − t1)/(ln x1 − ln x2); measure x0 and x(∞) and compute v from v = x(∞)/[x(∞) − x0]. For a rising response, invert the curve to agree with the above plot, then plot on semilog paper.

TABLE 14. TIME RESPONSES OF SOME COMMON TRANSIENT MODES (Ref. 20)

No.  F(s) — f(t)
1. 1/s — 1 (step function, position).
2. 1/s² — t (step function, velocity).
3. 1/s³ — t²/2 (step function, acceleration).
4. 1/(Tcs + 1) — (1/Tc)e^(−t/Tc) (first order lag, converging).
5. 1/[s(Tcs + 1)] — 1 − e^(−t/Tc) (first order lag, converging).
6. 1/(−Tcs + 1) — −(1/Tc)e^(t/Tc) (first order lag, diverging).
7. 1/[s(−Tcs + 1)] — 1 − e^(t/Tc) (first order lag, diverging).
8-10. Second order oscillatory modes: undamped (sustained oscillation), damped (converging oscillation), and negatively damped (diverging oscillation); these are the oscillatory examples referenced by Table 15.

FIG. 19. Time constant ratio T1/T2 as a function of damping ratio for overdamped second order system (Ref. 20); for ζ > 5, v ≈ 4ζ².

TABLE 15. STABILITY AS A FUNCTION OF THE NATURE OF THE ROOTS OF THE CHARACTERISTIC EQUATION (Ref. 20)

Each entry gives: type of stability of system; nature of roots of characteristic equation (or exponents of complementary solution); examples of performance given in Table 14 (nonoscillatory; oscillatory).

Stable — All roots have negative real parts — 4; 9.
Stable — A single zero root; all other roots, if any, have negative real parts — 1, 5.
Verge of stability; undamped oscillatory time response — Conjugate imaginary roots, all different, in addition to roots for stable systems above, if any — 8.
Unstable — Roots with positive real parts, in addition to other types of roots, if any — 6, 7; 10.
Unstable — Repeated zero or conjugate imaginary roots, in addition to other types of roots, if any — 2, 3.
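Table 15's criterion can be applied mechanically: compute the roots of the characteristic polynomial and inspect their real parts. A minimal Python sketch (an illustrative addition; the three example polynomials are assumptions):

import numpy as np

# Characteristic polynomials, coefficients highest power first. Per Table 15
# the system is stable when every root has a negative real part.
examples = {
    "stable":   [1.0, 3.0, 2.0],    # roots -1, -2
    "verge":    [1.0, 0.0, 4.0],    # roots +/-2j, sustained oscillation
    "unstable": [1.0, -1.0, 4.0],   # positive real parts, growing oscillation
}
for name, coeffs in examples.items():
    roots = np.roots(coeffs)
    print(name, roots, all(r.real < 0 for r in roots))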

Application of Convolution Integral

A convenient method of calculating the time response of a system to
any arbitrary input makes use of the convolution integral (see Ref. 21),
which may be written as

(53)  c(t) = ∫_(−∞)^t f(τ)g(t − τ) dτ,

where c(t) is the time response, f(t) is the input, and g(t) is the weighting
function or characteristic time response to a unit impulse (see Weighting
Function in Chap. 9). To evaluate this equation the arbitrary input is
approximated by a series of impulses as shown in Fig. 20.

FIG. 20. Approximation of a function, f(t), by a series of impulses.


If the impulsive response, g(t), is known, the sum of these responses to the impulses
approximating the input signal constitutes the total time response, as
illustrated by Figs. 1.47 and 1.48 of Ref. 8. Of course, a theoretical
impulse function has zero width in time; in the practical case, if its width
is much smaller than the response time of the system being considered,
the results obtained will be valid. The quantity f(τ) is the average height
of the rectangular approximation of an impulse; Δτ is the width; and τ
is the time to the center of the rectangle as illustrated in Fig. 20.
The value of the time response at time t1 may be expressed as

(54)  c(t1) = Σ f(τ) Δτ g(t1 − τ),  τ = τ1, τ2, ..., t1.

This indicates that c(t1) is the sum of responses to impulse inputs, all
evaluated at t1.
This same method may be used with the transfer function of the system,
since it is simply the Laplace transform of the weighting function.
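Equation (54) is exactly a discrete convolution, and it can be evaluated directly. The Python sketch below is an illustrative addition (the first order weighting function, time constant, and input are assumptions).

import numpy as np

dt = 0.01
t = np.arange(0, 2, dt)

# Assumed weighting function: first order lag, g(t) = (1/T) e^(-t/T)
T = 0.2
g = (1 / T) * np.exp(-t / T)

f = np.sin(2 * np.pi * t)        # arbitrary input of eq. (53)

# Eq. (54): c(t1) = sum over tau of f(tau) * dtau * g(t1 - tau)
c = np.convolve(f, g)[:len(t)] * dt
print(c[:5])                     # time response samples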
Steady-State Solution of System Equations

Although the complete solution of a linear differential equation for a
system subjected to some driving function contains both transient and
steady-state portions, the steady-state part can be obtained independently
of the transient.
Sinusoidal Driving Functions. The general form of the steady-state
response of linear systems to sinusoidal excitation is sinusoidal and of the
same frequency as the driving function. An example of the steady-state
response cs(t) of the second order system of eq. (51) to a sinusoidal input
is given in Fig. 21.
When steady-state excitation with sinusoidal driving forces is considered,
the Laplace transform is intimately related to the impedance concept.
For the Laplace transform it will be found that s may be replaced by jω
to obtain the steady-state response to a sinusoidal driving function (see
Ref. 22). In Table 16 are given typical terms of an integro-differential
equation showing use of the operator jω to obtain the electrical and
"motional" impedances of analogous electrical and mechanical forms.
The justification for this substitution of jω for s is given in Ref. 22.
Application of this technique to the differential eq. (52) yields

(55)  C(jω) = [ω0²/((jω)² + 2ζω0(jω) + ω0²)]R(jω).
Complex Plane Plot. The steady-state response of a system as a function of frequency is very useful in servomechanism and regulator design (Ref. 23).

FIG. 21. Example of steady-state response, cs(t), to unit amplitude sine wave input, r(t), for second order system C(s) = [ω0²/(s² + 2ζω0s + ω0²)]R(s), for ζ = 1 and ω = 0.5ω0.

TABLE 16. SUMMARY OF EQUATION TERMS AND COMPLEX QUANTITIES (Ref. 3a)

Electrical system:
  L di(t)/dt — transform LsI(s) — complex form jωLI(jω) — complex impedance jωL = jXL.
  Ri(t) — RI(s) — RI(jω) — impedance R.
  (1/C)∫i(t) dt — (1/C)I(s)/s — I(jω)/jωC — impedance 1/jωC = −j/ωC = −jXC.

Mechanical system:
  M dv(t)/dt — MsV(s) — jωMV(jω) — impedance jωM.
  Dv(t) — DV(s) — DV(jω) — impedance D.
  K∫v(t) dt — KV(s)/s — (K/jω)V(jω) — impedance K/jω = −jK/ω.


Use is made of complex plane diagrams in which the magnitude
and the angle of the output to input ratio are shown by a single line on
the complex plane as in Fig. 22. The complex output-input ratio C/R
is obtained by substituting jω for s in the transform eq. (52).

FIG. 22. Complex plane plot of C/R = ω0²/[(jω)² + 2ζω0(jω) + ω0²] for control system of Fig. 11.

Logarithmic Plots. Instead of plotting vector loci of the transfer
function as in Fig. 22, the contours can be plotted to a logarithmic scale
(see Refs. 24, 25, and 26). To exploit certain manipulative advantages,
the attenuation and phase angle graphs are made separately. The attenuation
is plotted in decibels, or 20 log10 |atten.|, versus log10 ω; the phase
angle is also plotted versus log10 ω. In Fig. 23 the complex transfer function
of eq. (55) has its attenuation and phase angle plotted against
log10 (ω/ω0), giving a nondimensional chart for the frequency response of
second order systems over a range of values of damping factor ζ.
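The data behind such a chart are generated directly from eq. (55). A brief Python sketch (a modern addition; ω0, ζ, and the frequency grid are assumptions):

import numpy as np

# Attenuation (db) and phase of C/R = w0^2/((jw)^2 + 2*zeta*w0*(jw) + w0^2)
w0, zeta = 1.0, 0.2
w = np.logspace(-1, 1, 9)       # frequency ratios from 0.1 to 10 times w0

s = 1j * w
CR = w0**2 / (s**2 + 2 * zeta * w0 * s + w0**2)
db = 20 * np.log10(np.abs(CR))
phase = np.degrees(np.angle(CR))
for row in zip(w, db, phase):
    print(row)   # resonance peak near w = w0 for small zeta; phase -> -180 deg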
3. BLOCK DIAGRAMS

Definition of Terms

A block diagram is a simplified method of presenting the interconnections
of significant variables. It displays the functional relationships rather
than the physical and thus gives a clear insight into the problem. The
physical system and its interrelationships determine the block diagram
arrangement; each block is a logical step in the flow or signal process. Block
diagrams are built up by algebraic combinations of individual blocks where
each block is a transfer function. An example is shown by Fig. 24, where
the transfer function of the controlled system is G3 = C/M5. Most
block diagrams show only the desired inputs and outputs; however, in
many physical systems there are loading and regulating effects. These
effects must be considered and can be handled as separate input effects.
The recommended nomenclature (Ref. 29) for symbols in the block
diagram is illustrated by Fig. 24 where:

V = desired value,
R = reference input,
E = actuating error,
M = manipulated variable,
U = disturbance function,
C = controlled variable,
Q = indirectly controlled variable,
B = feedback.

The symbols used, such as C, may be in Laplace, operational, or sinusoidal
form and can be indicated as C(s), C(p), or C(jω). Lower-case letters are
used to indicate time functions (r, v, e, m, u, c, b). Generally the parentheses
(s), (p), or (jω) are dropped unless a particular form of representation is
required.
The transfer functions are labeled as follows: A for reference input, G
for forward elements, i.e., from error to output, N for disturbance input,
Z for indirectly controlled system, and H for feedback. Numerical subscripts
are used to identify individual elements. References 29 to 31 are
the standards of the American Institute of Electrical Engineers and the
Institute of Radio Engineers. The important point is that a consistent
system be used.
Construction and Signal Flow

As illustrated by Fig. 24 the arrows connecting the blocks indicate the
unidirectional signal flow. The circular junction point with appropriate
plus or minus signs is used to indicate summing or differencing points
respectively, i.e.,

(Diagrams: a circled junction with a plus sign on each incoming arrow gives the sum x + y; a minus sign on one arrow gives the difference x − y.)

FIG. 23. Magnitude (decibels) and phase shift of C/R versus frequency ratio ω/ω0 for various values of ζ (ζ = 0.05 to 1.0). Note. For ζ > 1 plots are simply those for two unequal time lags (Ref. 3a).
FIG. 24. Block diagram of representative closed loop system: reference input elements A; control elements G1 and G2; controlled system G3; disturbance input element N; indirectly controlled system Z; feedback elements H1 and H2.


The I.R.E. standard graphical symbols may also be used (Refs. 30 and
31).

Mixing point: x3 = f(x1, x2).
Summing point: x3 = x1 − x2.
Multiplication point: x3 = x1x2.

Algebra of Block Diagrams

A complex block diagram can be rearranged or reduced by combining
blocks algebraically. When all the loops are concentric, the indicated
manipulations can be carried out directly by successively applying the
relation C/R = G/(1 + GH) to the innermost loops. When the inner
loops are not concentric or are intertwining, the block diagrams can
usually be reduced to concentric loops by the following rules and by
reference to Table 17.

1. Data takeoff channels can be moved forward (in the direction of
arrows) or backward in the system at will except that the takeoff point
cannot pass a summing point. Whenever a data takeoff branch is moved
forward past a function G, the function 1/G must be added in series with
the branch. Whenever a data takeoff branch is moved backward past a
function G, the function G must be added in series with the branch.

2. A channel feeding into a summing point can be moved forward and
backward in the system at will except that it cannot pass a data takeoff
point. As this feed channel is moved forward in the system past a function
G, the function G must be added in series with the channel. As it is
moved backward past a function G, the function 1/G must be inserted in
the channel.

3. In some cases, it will be found necessary to move a takeoff point
past a summing point or a summing point past a data takeoff point in order
to reduce the system block diagram to simple concentric loops or parallel
paths, which can be handled by methods (1) and (2). This can be done
by removing a troublesome feedback point or data takeoff by closing an

TABLE 17. THEOREMS FOR THE TRANSFORMATION AND REDUCTION OF BLOCK DIAGRAM NETWORKS (Ref. 17)

1. Interchange of elements.
2. Interchange of summing points.
3. Rearrangement of summing points.
4. Interchange of takeoff points.
5. Moving a summing point ahead of an element.
6. Moving a summing point beyond an element.
7. Moving a takeoff point ahead of an element.
8. Moving a takeoff point beyond an element.
9. Moving a takeoff point ahead of a summing point.
10. Moving a takeoff point beyond a summing point.
11. Combining cascade elements: K1G1 followed by K2G2 is equivalent to (K1G1)(K2G2).
12. Removing an element from a forward loop.
13. Inserting an element in a forward loop.
14. Eliminating a forward loop.
15. Removing an element from a feedback loop.
16. Inserting an element in a feedback loop.
17. Eliminating a feedback loop: K1G1 with feedback H reduces to K1G1/(1 + K1G1H).
18, 19. Special forms of 17, giving K1G1/(1 + K1G1) and K1G1/(1 − K1G1).
20. Inserting a feedback loop to replace an element.
21. Different form of 20.

(In the original table each theorem is accompanied by its original and equivalent network diagrams.)


internal loop, thus replacing a loop by a closed loop transfer function,
which has no takeoff or feedback points.
Examples to Illustrate Transformation Rules.
EXAMPLE 1. The forward elements G1, G2, and G3 in Fig. 25a may be
combined by multiplication as shown by eq. (56) and Fig. 25b:

(56)  C/E = (M1/E)(M2/M1)(C/M2) = G1G2G3 = G.

FIG. 25. (a) Simple closed loop system; (b) combinations of forward transfer functions;
(c) system in simplest form.

In practice, loading of one element by another must be considered.
Figure 25b is further reduced to Fig. 25c by use of eq. (57):

(57)  C/R = G/(1 + GH).

Direct or unity feedback is also common and represents a particular case
of eq. (57), where H = 1.
EXAMPLE 2. Reduction of a complex diagram is shown in Fig. 26.
Note in the first reduction, Fig. 26b, that the block diagram is altered to
include an additional G4 element for mathematical simplicity, although
the signal flow and algebra are identical. Also note in Fig. 26d that a
positive feedback is accomplished by using eq. (58), where the H carries
a negative sign:

(58)  C/R = G/(1 − GH).


FIG. 26. Reduction of a complex block diagram (Ref. 3a): (a) original system; (b) first reduction; (c) second reduction (one inner loop replaced by G3/(1 + G4H2)); (d) third reduction; (e) final reduction.
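Successive application of eqs. (57) and (58) is easy to carry out symbolically. The Python sketch below is an illustrative addition (the two-loop structure and the element names G1, G2, G3, H1, H2 are assumptions, not the exact diagram of Fig. 26); it closes an inner loop first and then the outer loop.

import sympy as sp

G1, G2, G3, H1, H2 = sp.symbols('G1 G2 G3 H1 H2')

# Apply C/R = G/(1 + G*H) from the innermost loop outward, eq. (57)
inner = G2 * G3 / (1 + G2 * G3 * H2)     # inner loop closed first
CR = G1 * inner / (1 + G1 * inner * H1)  # then the outer loop
print(sp.simplify(CR))
# reduces to G1*G2*G3 / (1 + G2*G3*H2 + G1*G2*G3*H1)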


EXAMPLE 3. For systems with multiple inputs and/or disturbances, the
superposition theorem is used. The example of Fig. 27 is used to show the
response C as a function of two inputs.

FIG. 27. (a) System with multiple inputs; (b) with R2 = 0; (c) with R1 = 0.

Let R2 = 0 as in Fig. 27b:

(59)  C/R1 = G1G2/(1 + G1G2H).

Let R1 = 0 as in Fig. 27c:

(60)  C/R2 = G2/(1 + G1G2H).

Combining inputs:

(61)  C = (G1G2R1 + G2R2)/(1 + G1G2H).
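The combined result (61) can be checked symbolically from the loop equations. A minimal Python sketch (an addition for illustration; the injection point of R2 between G1 and G2 is an assumption matching eqs. (59) and (60)):

import sympy as sp

G1, G2, H, R1, R2, C = sp.symbols('G1 G2 H R1 R2 C')

# Loop equations: E = R1 - H*C, C = G2*(G1*E + R2). Solve for C.
sol = sp.solve(sp.Eq(C, G2 * (G1 * (R1 - H * C) + R2)), C)[0]
expected = (G1 * G2 * R1 + G2 * R2) / (1 + G1 * G2 * H)
print(sp.simplify(sol - expected))   # -> 0, confirming eq. (61)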
4. SYSTEM TYPES

Definition of System Types

The idea of the functional similarity of seemingly different transfer
functions is strengthened by classification into types. Three common types
are ones in which the following conditions are obtained after the transient has
subsided:
Type 0. A constant value of the controlled variable requires a constant
actuating error signal.
Type 1. A constant rate of change of the controlled variable requires a
constant actuating error signal.
Type 2. A constant acceleration of the controlled variable requires a
constant actuating error signal.
These characteristics may be identified in terms of the transfer function.
For a simple closed loop system with direct feedback the error signal is

(62)  E(s) = R(s) − C(s),

where R, the reference input signal, is compared with C, the output signal.
The forward transfer function

(63)  G(s) = C(s)/E(s)

is of the general form

(64)  G(s) = K(1 + a1s + a2s² + ...)/[sⁿ(1 + b1s + b2s² + b3s³ + ...)].

The value of the integer n in eq. (64) is equal numerically to the type of the
system.
Complex plane plots may be obtained by replacing s by jω in eq. (64).
The nature of the plots as ω → 0 will be representative of the type of
servomechanism studied, as illustrated in the following section subdivision,
Typical Complex Plane Plots.
Typical Complex Plane Plots

A type 0 servomechanism representative plot is given in Fig. 28 (Ref. 34).
At ω = 0, the transfer function G(jω) is on the positive real axis and has a
finite value Kp. Generally as ω → ∞, G(jω) traverses the fourth and
then the third quadrants and approaches the origin.

FIG. 28. Representative complex plane plot for type 0 servomechanism system (Ref. 34).
A type 1 servomechanism representative plot is given in Fig. 29 (Ref. 34).
For this plot as ω → 0, the polar plot of G(jω) approaches minus infinity
on the imaginary axis. Generally as ω increases toward +∞, G(jω) enters
the third and then the second quadrant.

FIG. 29. Representative complex plane plot for type 1 servomechanism system (Ref. 3).

The type 1 servomechanism
when used for a position control system may also be called a "zero
displacement-error system," meaning that the output has the desired value
of displacement, in contrast to the type 0 servomechanism, where an
error proportional to the desired amount of displacement is necessitated.
A type 2 servomechanism representative plot is presented in Fig. 30 (Ref.
34). For this plot as ω → 0, the plot of G(jω) approaches minus infinity
on the real axis. The plot may be closed from ω = 0+ to ω = 0− by a
circle of infinite radius traversed in a counterclockwise direction, as
indicated by the dotted line.

FIG. 30. Representative complex plane plot for type 2 servomechanism system (Ref. 3).

The type 2 servomechanism has a "zero
velocity-error" characteristic since it is able to maintain a constant output
speed with no actuating error. It is also capable, like the type 1 servomechanism,
of maintaining a constant output position without actuating
error.
Typical Application

Examples of type 0 servomechanisms are speed regulators for d-c motors
and jet engines and other forms of regulators controlling voltage, current,
or temperature, where proportional controllers are employed.
Examples of type 1 servomechanisms are position control systems with
such integral controllers as d-c motors, hydraulic motors, and hydraulic
valve-piston linkages. Other examples of type 1 servomechanisms are
speed control systems such as for a jet engine with proportional and integral control.
Examples of type 2 servomechanisms are position control systems in which
a pilot motor is employed to drive a control element, whose position
controls the speed of the main drive motor that supplies power to the
load being positioned, and torque motors with series compensation.
Block diagrams of each of these servomechanism types are given in
Ref. 34. The following paragraph is in substance from Servomechanism
Analysis by G. J. Thaler and R. G. Brown (Ref. 35).
TABLE 18. CHARACTERISTICS AND APPLICATIONS OF TYPES 0, 1, AND 2 SERVOMECHANISMS (Ref. 35)

Type 0
  Locus characteristic: Closed.
  Error characteristic: Position error at all times.
  Application: Static positioning systems where high accuracy is not important. Some regulator systems.

Type 1
  Locus characteristic: Open. The low-frequency end of the locus goes to infinity along the negative imaginary axis.
  Error characteristic: No static error. Lag error when operated at constant velocity.
  Application: High-accuracy static and dynamic positioning systems.

Type 2
  Locus characteristic: Open. The low-frequency end of the locus approaches infinity along the negative real axis.
  Error characteristic: No static error. No position error at constant velocity. Constant error in acceleration.
  Application: High-accuracy dynamic positioning systems. Control acceleration errors.


In general, the complexity of equipment, cost, and difficulty in design
increase greatly with the more advanced types of systems. Type 1 servomechanisms are therefore more common than any of the others. Occasionally, accuracy requirements will justify the type 2 system, and in
other cases where high accuracy is not essential a type 0 system is more
economical. Table 18 summarizes characteristics and applications of
types 0, 1, and 2 servomechanisms.
5. ERROR COEFFICIENTS

One of the important figures of merit of a system is its accuracy under
various conditions. By accuracy is meant the ability of the system to
minimize the error between the actual output and the desired output.
The usual types of accuracy specified for a control system are its static
accuracy and its dynamic accuracy. Static accuracy is the accuracy for
the output or one of its specified derivatives in a steady-state condition.
Dynamic accuracy is the accuracy existing during transient conditions of
the output and of its derivatives.
Static Error Coefficients

A static error coefficient may be defined as the ratio of the steady-state
constant value of the output or of one of its constant derivatives to a
constant applied error. The static error coefficients are then:
Position error coefficient,

Kp = (Output, c)/(Applied error, e) = lim (s → 0) C(s)/E(s)

for constant output, c.

Velocity error coefficient,

Kv = (Velocity of output, ċ)/(Applied error, e) = lim (s → 0) sC(s)/E(s)

for constant velocity of output, ċ = dc/dt.

Acceleration error coefficient,

Ka = (Acceleration of output, c̈)/(Applied error, e) = lim (s → 0) s^2 C(s)/E(s)

for constant acceleration of output, c̈ = d^2c/dt^2.


Kp, Kv, and Ka are respectively the gain constants of type 0, 1, and 2 control
systems.
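These gain constants are simply the limits Kp = lim G(s), Kv = lim sG(s), Ka = lim s^2 G(s) as s → 0 for the forward transfer function G(s) = C(s)/E(s). The sketch below (an illustrative addition with an assumed type 1 transfer function, not from the handbook) evaluates all three.

# Static error coefficients as limits of the forward transfer function.
import sympy as sp

s = sp.symbols('s')
G = 20/(s*(1 + s/10))     # hypothetical type 1 forward transfer function

print(sp.limit(G, s, 0, '+'))         # Kp = oo
print(sp.limit(s*G, s, 0, '+'))       # Kv = 20
print(sp.limit(s**2*G, s, 0, '+'))    # Ka = 0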
For a sinusoidal applied error e, the error coefficients for types 1 and 2
control systems may be defined in terms of the maximum velocity and acceleration of the output c as follows:

Velocity error coefficient (type 1):

(65) K'v = lim (ω → 0) |ċmax / e_ċmax|, where e_ċmax = e at time of max ċ.

Acceleration error coefficient (type 2):

(66) K'a = lim (ω → 0) |c̈max / e_c̈max|, where e_c̈max = e at time of max c̈.

In the limit K'v and K'a are identical in value to the values of Kv and
Ka obtained for constant ċ and c̈.
Table 19 presents a comparison of errors for various types of controlled
motion in which c, ċ, or c̈ is constant or c is oscillating sinusoidally at a
much lower frequency than that corresponding to the shortest time constant of the control system.
TABLE 19. COMPARISON OF STEADY-STATE ERRORS (Ref. 34)

Type of control system:                               0          1           2
Limit G(s), s → 0:                                    Kp         Kv/s        Ka/s^2
Error for constant output, c:                         c/Kp       0           0
Error for constant velocity of output, ċ:             ∞          ċ/Kv        0
Error for constant acceleration of output, c̈:         ∞          ∞           c̈/Ka
Maximum e for sinusoidally varying c = Cmax sin ωt:   Cmax/Kp    ωCmax/Kv    ω^2 Cmax/Ka
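Any entry of Table 19 can be cross-checked with the final value theorem, e_ss = lim (s → 0) sE(s). The sketch below (an illustrative addition with an idealized type 1 loop, not from the handbook) reproduces the lag error ċ/Kv for a constant-velocity input.

# Final-value-theorem check of Table 19: ramp input into a type 1 system.
import sympy as sp

s, Kv, cdot = sp.symbols('s K_v cdot', positive=True)
G = Kv/s                   # limit form of a type 1 open loop as s -> 0
R = cdot/s**2              # constant-velocity input of rate cdot

e_ss = sp.limit(s*R/(1 + G), s, 0, '+')
print(e_ss)                # cdot/K_v, the lag-error entry of Table 19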

Dynamic Error Coefficients

A form of dynamic error coefficient is defined as the ratio of the input
or one of its specified derivatives to the component of the error which may
be assigned to it during a dynamic condition. That is, the error may be
expanded in a series in terms of the input and the derivatives of the input.
The dynamic error coefficients are then the reciprocals of the coefficients
of the various derivatives since they indicate proportionality between
dynamic error components and input derivatives.


Writing the transform of a system with unity feedback gives

(67) E(s)/R(s) = 1/[1 + G(s)].

E(s) may be expanded from the ratio of two functions of s,

(68) E(s) = R(s)/[1 + G(s)],

into R(s) operated upon by the Maclaurin series expansion of 1/[1 + G(s)],
or by simply dividing the numerator by the denominator,

(69) E(s) = [1/(1 + Kp)]R(s) + (1/K1)sR(s) + (1/K2)s^2 R(s) + ...,

which is valid near s = 0 (the steady state). Let K0 = 1 + Kp.
K0, K1, K2, ... are commonly called dynamic error coefficients (Refs.
36 and 37) of the system. Note. Strictly speaking, K0, K1, K2, etc., are
not "dynamic" error coefficients since the transient terms were lost upon
the series expansion of eq. (68). More correctly, they might be termed
steady-state error coefficients. The term dynamic is common usage.
K0, K1, K2 may contain not only the values of the static error coefficients Kp, Kv, and Ka, respectively, but also expressions involving the
static error coefficients and the time constants of the system. Therefore,
high gain alone is not sufficient for accurate dynamic performance, for
low system time constants are also important for this purpose.
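The division indicated under eq. (69) can be automated with a series expansion. The sketch below (an illustrative addition, not from the handbook) recovers 1/K1 and 1/K2 for the position servo of eq. (70) treated in the following example.

# Dynamic error coefficients from the Maclaurin expansion of E/R = 1/(1 + G).
import sympy as sp

s, Kv, T1 = sp.symbols('s K_v T_1', positive=True)
G = Kv/(s*(1 + T1*s))                        # eq. (70)

E_over_R = sp.expand(sp.series(1/(1 + G), s, 0, 3).removeO())
print(sp.simplify(E_over_R.coeff(s, 1)))     # 1/K1 = 1/Kv
print(sp.simplify(E_over_R.coeff(s, 2)))     # 1/K2 = T1/Kv - 1/Kv**2, cf. eq. (72)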
EXAMPLE. Dynamic error coefficients. For the position servo of Fig. 11:

(70) G(s) = Kv/[s(1 + T1s)],

where Kv = K/D, T1 = J/D.
From eqs. (68) and (70) for unity feedback

(71) E(s) = R(s)/[1 + G(s)] = (s + T1s^2)R(s)/(Kv + s + T1s^2),

(72) E(s) = (1/Kv)sR(s) + [T1/Kv - 1/Kv^2]s^2 R(s) + ....

The dynamic error coefficients in this case are Kv and Kv^2/(KvT1 - 1),
showing that here the system time constant produces an acceleration
component of error proportional to the time constant.

TABLE 20. SERVO ERROR COEFFICIENTS (Ref. 39)

e(t) = (1/K0)r(t) + (1/K1)(d/dt)r(t) + (1/K2)(d^2/dt^2)r(t) + (1/K3)(d^3/dt^3)r(t) + ...
(K0 = 1 + Kp and Kp = ∞ for all servos in table)

Locus 6-12. Transfer function G(s) = ω1ω2/[s(s + ω2)]:
  1/K1 = 1/ω1
  1/K2 = (ω1 - ω2)/(ω1^2 ω2)
  1/K3 = (ω2 - 2ω1)/(ω1^3 ω2)

Locus 6-12-18. Transfer function G(s) = ω1ω2ω3/[s(s + ω2)(s + ω3)]:
  1/K1 = 1/ω1
  1/K2 = (ω1ω2 + ω1ω3 - ω2ω3)/(ω1^2 ω2ω3)
  1/K3 = (ω1^2 - 2ω1ω2 - 2ω1ω3 + ω2ω3)/(ω1^3 ω2ω3)

Locus 6-12-6-12. Transfer function G(s) = ω1ω2(s + ω3)/[s(s + ω2)(s + ω4)]:
  1/K1 = ω4/(ω1ω3)
  1/K2 = [ω1ω3(ω2 + ω4) - ω2ω4(ω1 + ω4)]/(ω1^2 ω2ω3^2)
  1/K3 = [ω1ω3(ω1ω3 - ω1ω2 - ω1ω4 - 2ω2ω4 - 2ω4^2) + ω2ω4(ω1 + ω4)^2]/(ω1^3 ω2ω3^3)

Locus 6-12-18-12. Transfer function G(s) = ω1ω2(ω3/ω4)(s + ω4)/[s(s + ω2)(s + ω3)]:
  1/K1 = 1/ω1
  1/K2 = [ω1ω4(ω2 + ω3) - ω2ω3(ω1 + ω4)]/(ω1^2 ω2ω3ω4)
  1/K3 = [ω1ω4(ω1ω4 - ω1ω2 - ω1ω3 - 2ω2ω4 - 2ω3ω4) + ω2ω3(ω1 + ω4)^2]/(ω1^3 ω2ω3ω4^2)

Locus 12-6-12. Transfer function G(s) = ω1(s + ω2)/[s^2(s + ω3)]:
  1/K1 = 0
  1/K2 = ω3/(ω1ω2)
  1/K3 = (ω2 - ω3)/(ω1ω2^2)
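The table entries follow from the same Maclaurin expansion of E/R = 1/(1 + G). As an illustrative cross-check (not part of the handbook), the sketch below reproduces the 6-12 row.

# Reproduce the 6-12 row of Table 20 by expanding E/R = 1/(1 + G).
import sympy as sp

s, w1, w2 = sp.symbols('s w1 w2', positive=True)
G = w1*w2/(s*(s + w2))

E_over_R = sp.expand(sp.series(1/(1 + G), s, 0, 4).removeO())
for k in (1, 2, 3):
    print(k, sp.simplify(E_over_R.coeff(s, k)))
# 1/K1 = 1/w1; 1/K2 = (w1 - w2)/(w1**2*w2); 1/K3 = (w2 - 2*w1)/(w1**3*w2)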


Error Calculation for a Given Input. The error may be calculated
for a given input when r(t), r'(t), r''(t), etc., are known or calculated. First
the transforms R(s), sR(s), s^2R(s), etc., are evaluated from the input and
its derivatives by the formula for real differentiation:

(73) s^n F(s) = ℒ[f^(n)(t)] + Σ (k = 1 to n) s^(n-k) f^(k-1)(0+).

As an illustration, for an observer tracking a target that passes by on a
straight course at speed A and minimum range L, the input angle and its
first two derivatives are

r(t) = tan^-1 (At/L),

(d/dt) r(t) = (A/L)/[1 + (At/L)^2],

(d^2/dt^2) r(t) = -2(A^3/L^3)t/[1 + (At/L)^2]^2.

The values of the error coefficients K0, K1, K2 (see eqs. 69 and 72) are
1/K0 = 0, 1/K1 = 1/Kv, and 1/K2 = T1/Kv - 1/Kv^2, and the dynamic
error, e(t) (see eq. 74), is

e(t) = [1/Kv][(A/L)/(1 + (At/L)^2)] - [T1/Kv - 1/Kv^2][2(A^3/L^3)t/[1 + (At/L)^2]^2] + ....
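With numbers attached, the first two terms of this error series are easy to evaluate. The sketch below is an illustrative addition (not from the handbook); the speed A, range L, and servo constants Kv and T1 are hypothetical values.

# Evaluate the first two dynamic-error terms for r(t) = arctan(A*t/L).
import numpy as np

A, L = 300.0, 1000.0      # hypothetical target speed and minimum range
Kv, T1 = 100.0, 0.05      # hypothetical servo constants of eq. (70)

t = np.linspace(-10.0, 10.0, 5)
r1 = (A/L)/(1.0 + (A*t/L)**2)                  # dr/dt
r2 = -2.0*(A/L)**3*t/(1.0 + (A*t/L)**2)**2     # d^2r/dt^2 (negative sign built in)
e = r1/Kv + (T1/Kv - 1.0/Kv**2)*r2             # first two terms of e(t)
print(np.round(e, 6))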

An upper bound for the correction term Rn+1 may be found by replacing
r^(n+1)(τ) by |r^(n+1)(τ)|max and performing the integration of eq. (75) (a valid
procedure for many functions g_(n+1)(t - τ)).
When r, r', r'', ..., r^(n) suffer at most step discontinuities, the response may
be expressed by the first n + 1 terms of eq. (74) plus Rn+1 of eq. (75) plus
the expression:

Σ (i = 0 to n) Σ (k = 1 to M_i) Δ_ik g_(i+1)(t - t_ik),

where M_i is the number of step discontinuities of r^(i)(t), Δ_ik is the magnitude
of the discontinuities, and t_ik is the time of the kth discontinuity. The
contribution of impulses can be added in separately.
The response at any time t may be written in three parts:
1. A finite number of terms of the equivalent of the familiar dynamic
error expansion, eq. (74),


2. A corresponding finite set of transient terms which accounts for
possible discontinuities in the arbitrary forcing function and its derivatives.
3. A convolution integral which clearly places in evidence the exact
inaccuracy in the response involved in using a finite number of coefficients (equivalent to the familiar error coefficients).
The above expression results from a closed expansion of the convolution
integral to which the response at any time t may be equated. The usefulness of this expansion of the response into three parts lies in the fact that
in many problems Rn+l contributes a small portion of the total response.
Consequently Rn+l may either be neglected or crudely approximated
without introducing appreciable inaccuracy in the total response. In such
cases the process of convolution to obtain the response is replaced by
differentiation and summation.
If r^(n+1)(τ) in eq. (75) is replaced by its maximum absolute value, then
for many functions g_(n+1)(t - τ) it may be shown that integration yields
an upper bound,

|Rn+1|max = |g_(n+2)(0)| |r^(n+1)(τ)|max = |c_(n+1)| |r^(n+1)(τ)|max,

where c_(n+1) is the coefficient of r^(n+1)(t) in the expansion of the response and
of s^(n+1) in the Maclaurin series for the system function such as E(s)/R(s).
EXAMPLE. Dynamic Error Expressed by Expansion of the Convolution
Integral. Illustrate the expansion of the dynamic error for a case in which
there exists a known solution. Let E(s)/R(s) of eq. (67) be B/(s + b),
the response characteristics g_(i+1)(t) being

g_(i+1)(t) = B(-1/b)^(i+1) e^(-bt) u(t),

and the Maclaurin series expansion being

Σ (i = 0 to ∞) (B/b)(-s/b)^i

(see eq. 69). Then the dynamic error may be expressed by n + 1 terms
of the error expansion

Σ (i = 0 to n) (B/b)(-1/b)^i r^(i)(t)

(see eq. 74) plus the transient terms

Σ (i = 0 to n) Σ (k = 1 to M_i) Δ_ik B(-1/b)^(i+1) e^(-b(t - t_ik))


plus the remainder

∫ (from -∞ to t) [r^(n+1)(τ)][B(-1/b)^(n+1) e^(-b(t-τ)) u(t - τ)] dτ

(see eq. 75), which may be bounded by

|r^(n+1)(τ)|max |c_(n+1)|,

in which is recognized the (n + 1)th coefficient of the Maclaurin series for
E(s)/R(s).
Let R(s) be A/(s + a). Then the input and its derivatives r^(i)(t) are
given by (-a)^i A e^(-at) u(t), and for each value of i, r^(i)(t) has one discontinuity
(M_i = 1) at time t_i1 = 0 of magnitude Δ_i1 given by (-a)^i A. Also
|r^(n+1)(τ)|max is given by a^(n+1) A.

The dynamic error then may be written as

Σ (i = 0 to n) A(B/b)(a/b)^i (e^(-at) - e^(-bt)),

with the remainder Rn+1 bounded by A(B/b)(a/b)^(n+1). This series for
the error converges rapidly and is useful for a << b, corresponding to
an input which is slow compared with the system response characteristics.
As n → ∞, the dynamic error expansion → AB[(e^(-at) - e^(-bt))/(b - a)],
which checks with the inverse Laplace transform of AB/[(s + a)(s + b)].
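The convergence claim is easy to exhibit numerically. The sketch below (an illustrative addition, not from the handbook) compares partial sums of the error expansion against the exact inverse transform for an assumed a << b.

# Partial sums of the error expansion vs. the exact inverse transform.
import numpy as np

A, a, B, b = 1.0, 1.0, 2.0, 10.0     # a << b: rapid convergence expected
t = np.linspace(0.0, 3.0, 7)

exact = A*B*(np.exp(-a*t) - np.exp(-b*t))/(b - a)
for n in (0, 1, 3):
    partial = sum(A*(B/b)*(a/b)**i*(np.exp(-a*t) - np.exp(-b*t))
                  for i in range(n + 1))
    # Worst-case gap shrinks roughly like A*(B/b)*(a/b)**(n+1).
    print(n, np.max(np.abs(partial - exact)))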
Another example of the expansion of the convolution integral to yield a
time response in three parts is given in Ref. 41.
Relative Usefulness of Error Coefficients. Both static and dynamic
error coefficients are treated in Ref. 34. Dynamic error coefficients are
also treated in Refs. 37 and 36, which includes a treatment of the approximate relations, for small overshoot, between the time delay and rise
time of the step function response and the dynamic error coefficients.
Dynamic error coefficients K0, K1, K2, ... have been defined by eq.
(69) and have the same values for all error constants up to and including
the one which is nonzero and finite in the classical definition of static
error coefficients, Kp, Kv, Ka, ... (as given in Static Error Coefficients).
This definition of dynamic error coefficients by eq. (69) has the advantage
of giving additional information about a system, because the value of
any constant is not forced to be zero whenever the preceding constant is
nonzero and finite as is the case with static error coefficients Kp, Kv, Ka,
.... (These coefficients K0, K1, K2, ... are the same in the steady state
as the dynamic error coefficients defined for any time t at the beginning of
this section subdivision.)


Relationship between Dynamic Error Coefficients and Roots of
System Equations. This section is, in substance, from Automatic Feedback Control System Synthesis by J. G. Truxal (Ref. 42).
As a result of the relation between C/R and E/R, there results the
Maclaurin expansion

(76) C(s)/R(s) = 1 - 1/K0 - (1/K1)s - (1/K2)s^2 - ....

The relation between the dynamic error coefficients K0, K1, K2 and the
poles and zeros of the closed loop expression C/R is readily determined
if C/R is written in factored form,

(77) C(s)/R(s) = K(s + z1)(s + z2)···(s + zm)/[(s + p1)(s + p2)···(s + pn)],

where the zeros lie at -zi, the poles at -pi.
The solutions for the dynamic error coefficients are:
The solutions for the dynamic error coefficients are:

(78) Kp = K Π zi / (Π pi - K Π zi), (K0 = 1 + Kp),

where Π indicates the product of all factors from i = 1 to and including
i = m in the numerator and i = n in the denominator. For cases where
Kp → ∞,

(79) 1/K1 = Σ (i = 1 to n) 1/pi - Σ (i = 1 to m) 1/zi,

(80) 1/K2 = (1/2)[Σ (i = 1 to m) 1/zi^2 - Σ (i = 1 to n) 1/pi^2 - (1/K1)^2].
Equations (79) and (80) are of basic importance in servo synthesis for
they represent the correlation between the dynamic error coefficients and
the system response characteristics, specifically the time delay and rise
time of the response of the system to a step function. In addition the
two equations indicate the manner in which lead and integral equalization
permit control over Kl and K2 without affecting relative stability. Generalized (dynamic) error coefficients are treated in greater detail in Ref. 36,
and their relation to closed loop roots is treated at length in Ref. 42.
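Equation (79) can be exercised directly on a factored C/R. The sketch below is an illustrative addition (not from the handbook), with hypothetical pole and zero locations chosen so that Kp is infinite.

# Check eq. (79) on an assumed closed loop C/R with Kp infinite.
import sympy as sp

s = sp.symbols('s')
zeros, poles = [4], [1, 2, 10]                  # hypothetical z_i, p_i values
K = sp.Rational(1*2*10, 4)                      # forces C/R(0) = 1, i.e. Kp = oo
CR = K*sp.Mul(*[s + z for z in zeros])/sp.Mul(*[s + p for p in poles])

one_over_K1 = -sp.expand(sp.series(CR, s, 0, 2).removeO()).coeff(s, 1)
rhs = sum(sp.Rational(1, p) for p in poles) - sum(sp.Rational(1, z) for z in zeros)
print(one_over_K1, rhs)                         # both 27/20, as eq. (79) states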


Guillemin's Method. Equations (79) and (80) are of basic importance
in feedback control system synthesis by Guillemin's method, which is
described in Chap. 23.
In the first step of this method the closed loop transfer function is determined from the specifications for frequency response and transient
response. In Chap. 23 use is made of techniques for obtaining with the
aid of these two equations the zeros and poles for compensation required
to obtain the desired closed loop transfer function.
6. ANALYSIS OF A-C SERVOS: CARRIER SYSTEMS

In a d-c servo the signals are directly proportional to the instantaneous
amplitude whereas in an a-c system the signals are modulated carrier
waves, and the information is carried in the modulation. For instance in
an amplitude modulated system the envelope of the modulated wave
contains the signal information. Owing to the convenience and simplicity
of using a synchro or chopper circuit as a modulator and a two-phase
motor as a demodulator, many carrier systems use suppressed-carrier,
amplitude modulation. Frequency and phase modulation are not yet in
common use although they offer theoretical advantage for null detection.
The value of an a-c system lies in the possible use of sensitive, accurate,
low-force level sensors, inexpensive and relatively easily produced a-c amplifiers, and power elements with low maintenance requirements.
Basic Types of Elements. Three types of elements are encountered in
carrier systems:
Type 1. Elements in which both input and output are modulated
carriers.
Type 2. Modulators which have inputs at signal frequencies and
outputs which are modulated carriers.
Type 3. Demodulators which have modulated carriers as inputs and
have signal frequencies as outputs.
Figure 31 shows several electronic circuits that can be used as modulators
and demodulators. In addition the mechanically tuned or electromechanical choppers are used extensively. The a-c servo motor also serves
as a demodulator and the a-c tachometer acts as a modulator.
Two-phase servomotors and a-c tachometers are analyzed in Refs.
45-47 and 49-51. An accurate mathematical representation of these
elements is complex and simplifying assumptions and analogies are commonly used.
Simplifying Assumptions. The majority of work done with a-c feedback systems has been on suppressed-carrier systems. For this type of
system the following assumptions are normally made:


FIG. 31. Modulators and demodulators. (Circuits shown: vacuum tube diode, diodes, and transistor, each marked with its d-c signal, a-c reference, and a-c signal connections for modulator or demodulator service.)

FIG. 31. Modulators and demodulators (continued). (Circuits shown: vacuum tube triodes, magnetic amplifier, and transistor, each with input and output marked for modulator or demodulator service.)


1. Modulators generate perfect suppressed carrier signals; i.e., harmonics besides the sideband frequencies, self-generated noise, and serious
phase shifts in the modulator are neglected.
2. The motor acts as a perfect demodulator with a prescribed static and
dynamic relationship between the driving voltage envelope and the output shaft position.
3. The carrier frequency and magnitude remain constant.
Suppressed Carrier System. A simple suppressed carrier open loop
system is shown in Fig. 32. If the input to the preamplifier is the product
of the carrier and the reference input, then

(M cos aωct)(V cos ωct) = MV(cos aωct)(cos ωct).

Expanding this yields

(81) (MV/2)[cos (1 + a)ωct + cos (1 - a)ωct].

As indicated by eq. (81) this type of modulator has an output containing
only the two sideband frequencies, the sum and difference of the modulating
and carrier frequencies, (1 - a)ωc and (1 + a)ωc, and no carrier frequency,
ωc. This gives rise to the name suppressed carrier.

FIG. 32. Open loop carrier system (Ref. 3b).
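The sideband structure of eq. (81) can be displayed with a discrete Fourier transform. The sketch below is an illustrative addition (not from the handbook); the carrier and signal frequencies are assumed values.

# Product modulation yields only the two sidebands; the carrier is absent.
import numpy as np

fc, a = 400.0, 0.05            # hypothetical 400-cps carrier, a*fc = 20-cps signal
fs, T = 8192.0, 1.0            # sample rate and record length
t = np.arange(0.0, T, 1.0/fs)

x = np.cos(2*np.pi*a*fc*t)*np.cos(2*np.pi*fc*t)   # M = V = 1
spec = np.abs(np.fft.rfft(x))/len(x)
freqs = np.fft.rfftfreq(len(x), 1.0/fs)
print(freqs[spec > 0.1])       # [380. 420.]: sidebands only, no 400-cps line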
A typical suppressed-carrier feedback system using tachometer stabilization is shown in Fig. 33. The signal equations are indicated at the significant points in the system.

FIG. 33. Typical a-c servo with tachometer feedback (Ref. 3b). (The signal equations, such as M cos aωct at the input and K cos (aωct + φ) at the output, are indicated at the significant points in the system.)


System Analysis and Design. For the purposes of system analysis
the a-c components can usually be treated in the same manner as analogous
d-c components. For instance as shown in Fig. 34, the speed-torque
curves of a d-c and an a-c machine are analogous, and the analysis of the
a-c machine can proceed in the same manner as the analysis of the d-c
machine. See Table 5, Sect. 1. Note, however, that the a-c machine

FIG. 34. D-c analogy to a-c two-phase motor. (Torque-speed characteristics of a d-c shunt motor, with curves of applied voltage referred to rated voltage, compared with the a-c two-phase motor linear approximation to the d-c motor.)

characteristics are nonlinear and that to derive a linear transfer function,
the analysis must be carried out on a linearized, incremental change basis.
The linearization methods of Sect. 2, Chap. 25, can be used.
There are torques at other frequencies besides the signal frequency
produced in the motor. These torques are at frequencies (2 + a)ωc and
(2 - a)ωc and normally because of the high frequency and low amplitude
produce little mechanical motion. However, the associated currents can
produce important heating effects. Similarly because the motor tends to
discriminate against quadrature control signals, the currents produce little
torque but the heating and/or saturating effects of quadrature currents
can be important.
The a-c and d-c analogy can be extended to a-c tachometers. An a-c
tachometer with a control signal of a frequency aωc and an amplitude proportional to aωc affects system performance in a manner similar to a d-c
tachometer.
Alternating-current stabilizing networks and a-c system stability are
treated in Chap. 23, Sect. 3.
Noise, quadrature voltage or carrier phase shift, variations in carrier
frequency, and pickup present major problems in a-c system design, and
their consideration dictates as much as the stability analysis the form and
characteristics selected. As a result it is desirable to define the environment and operating requirements before investigating the system stability.
ACKNOWLEDGMENTS

The cooperation of the following is gratefully acknowledged in granting permission to reproduce material in this chapter:
American Institute of Physics. From: Journal of Applied Physics (part of Sect.
5).
General Electric Company. From: Servomechanisms and Regulating System Design by H. Chestnut and R. W. Mayer (John Wiley & Sons, New York) (Tables
3, 4, 5, 16, 19; Figures 11, 23, 26, 28, 29, 30, 32, 33).
McGraw-Hill Book Company. From: Automatic Feedback Control by W. R.
Ahrendt and J. F. Taplin (Table 21); Servomechanism Practice by W. R. Ahrendt
(Tables 1, 20); Servomechanism Analysis by G. J. Thaler and R. G. Brown (Tables
2, 18).
Westinghouse Engineer (Tables 6, 7, 8).
John Wiley & Sons. From: Transients in Linear Systems by M. F. Gardner and
J. L. Barnes (Tables 9, 15).
American Institute of Electrical Engineers. From: Electrical Engineering (Table
17).
Bureau of Aeronautics, United States Navy. From report prepared by Northrop
Aircraft Co. (Tables 13 to 15, Figures 16 to 19).

REFERENCES
1. E. A. Guillemin, Synthesis of RC Networks, J. Math. Phys., 28, 22-42 (1949).
2. G. J. Thaler and R. G. Brown, Servomechanism Analysis, Chap. 1, McGraw-Hill,
New York, 1953.
3. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design:
(a) Vol. I, 1951, (b) Vol. II, 1955, Wiley, New York.
4. J. R. Ketchum and R. T. Craig, Simulation of Linearized Dynamics of Gas-Turbine
Engines, Natl. Advisory Comm. Aeronaut., Tech. Notes 2826, November 1952.
5. L. M. Toss, How to reckon basic process dynamics, Control Eng., 3, 50-55 (1956).
6. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. II, Chap. 1, Wiley, New York, 1955.
7. J. B. Reswick, Determine System Dynamics-without Upset, Control Eng., 2,
50-57 (1955).
8. J. G. Truxal, Automatic Feedback Control System Synthesis, McGraw-Hill, New
York, 1955.


9. W. M. Gaines, Frequency response methods in design of turbojet engine controls,
Second Feedback Controls System Conference, Am. Inst. Elec. Engrs., April 1954.
10. W. R. Ahrendt, Servomechanism Practice, McGraw-Hill, New York, 1954.
11. S. W. Herwald, Forms and principles of servomechanisms, Westinghouse Eng., 6,
149-155 (1946).
12. W. R. Evans, Control System Dynamics, McGraw-Hill, New York, 1954.
13. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. 1, Chap. 3, Wiley, New York, 1951.
14. S. B. Crary, Power System Stability, Vol. 1, p. 1, Wiley, New York, 1945.
15. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. 1, Appendix A,
Wiley, New York, 1942.
16. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. 1, Chap. 4, Wiley, New York, 1951.
17. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. 1, Chaps. 3-6,
Wiley, New York, 1942.
18. G. J. Thaler, Elements of Servomechanism Theory, Chap. 3, McGraw-Hill, New
York, 1955.
19. G. J. Thaler and R. G. Brown, Servomechanism Analysis, Chap. 4, McGraw-Hill,
New York, 1953.
20. Methods of Analysis and Synthesis of Piloted Aircraft Flight Control Systems,
BuAer Rept. AE 61-41, March 1952, Appendix, Sect. A.
21. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. 1, Chap. 8;
Wiley, New York, 1942.
22. E. Weber, Linear Transient Analysis, Vol. 1, Chap. 2, Wiley, New York, 1954.
23. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. 1, Chaps. 9 and 10, Wiley, New York, 1951.
24. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. 1, Chaps. 12 and 13, Wiley, New York, 1951.
25. G. S. Brown and D. P. Campbell, Principles of Servomechanisms, Chap. 8, Wiley,
New York, 1948.
26. H. M. James, N. B. Nichols, and R. S. Phillips, Theory of Servomechanisms,
Chap. 4, McGraw-Hill, New York, 1947.
27. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. I, Chaps. 2
and 7, Wiley, New York, 1942.
28. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. I, Chap. 7, Wiley, New York, 1951.
29. A.I.E.E. Standards Subcommittee on Terminology and Nomenclature of the
Feedback Control Committee, Am. Inst. Elec. Engrs., January 1950.
See also Letter Symbols for Feedback Control Systems, ASA Y10.13-1955, American
Standards Association, New York, July 1955.
30. IRE 26.S2 Standards on Terminology for Feedback Control Systems, 1955, Proc.
I.R.E., 44, 107-109 (1956).
31. IRE 26.S1 Standards on Graphical and Letter Symbols for Feedback Control
Systems, 1955, Proc. I.R.E., 43, 1608-1609 (1955).
32. T. D. Graybeal, Transformation of Block Diagram Network, Elec. Eng., 70, 985-990 (1951).
33. T. M. Stout, A Block Diagram Approach to Network Analysis, Trans. Am. Inst.
Elec. Engrs., Application and Industry, 71, 255-260 (1952).
34. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. 1, Chap. 8, Wiley, New York, 1951.


35. G. J. Thaler and R. G. Brown, Servomechanism Analysis, Chap. 7, McGraw-Hill,
New York, 1953.
36. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 1, McGraw-Hill,
New York, 1955.
37. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. II, Chap. 2, Wiley, New York, 1955.
38. P. E. Smith, Jr., Design Regulating Systems by Error Coe.fficients, Control Eng., 2,
69-74 (1955).
39. W. R. Ahrendt, Servomechanism Practice, Chap. 14, McGraw-Hill, New York,
1954.
40. W. R. Ahrendt and J. F. Taplin, Automatic Feedback Control, Chap. 7, McGraw-Hill, New York, 1951.
41. E. Arthurs and L. H. Martin, Closed expansion of the convolution integral (A
generalization of servomechanism error coefficients), J. Appl. Phys., 26, 58 (1955).
42. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 5, McGraw-Hill,
New York, 1955.
43. R. A. Bruns and R. M. Saunders, Analysis of Feedback Control Systems, McGrawHill, New York, 1955.
44. M. Panzer, Envelope transfer function analysis in a-c servosystems, Trans. Am.
Inst. Elec. Engrs., 75, 274-279 (1956).
45. S. S. L. Chang, Transient analysis of a-c servomechanisms, Trans. Am. Inst. Elec.
Engrs., 74, 30-37 (1955).
46. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. II, Chap. 6, Wiley, New York, 1955.
47. R. J. W. Koopman, Operating characteristics of two-phase servo motors, Trans.
Am. Inst. Elec. Engrs., 68, Pt. I, 319-329 (1949).
48. A. Hopkin, Transient response of small two-phase induction motors, Trans. Am.
Inst. Elec. Engrs., 70, Pt. I, 881-886 (1951).
49. L. O. Brown, Transfer function for a two-phase induction servo motor, Trans.
Am. Inst. Elec. Engrs., 70, Pt. 2, 1890-1893 (1951).
50. R. H. Frazier, Analysis of the drag-cup a-c tachometer, Trans. Am. Inst. Elec.
Engrs., 70, Pt. 2, 1894-1906 (1951).
51. S. A. Davis, Using a two-phase motor as a tachometer, Control Eng., 2, 75-76
(1955).
52. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 6, McGraw-Hill, New York, 1955.
53. G. M. Attura, Effects of carrier shifts on derivative networks for AC servomechanisms, Trans. Am. Inst. Elec. Engrs., 70, Pt. 1, 612-618 (1951).
54. C. S. Draper, W. McKay, and S. Lees, Instrument Engineering, Vol. II, McGraw-Hill, New York, 1953.

E   FEEDBACK CONTROL

Chapter 21

Stability

W. E. Sollecito and S. G. Reque

1. Introduction                                     21-01
2. Classical Solution Approach                      21-02
3. Routh's Criterion                                21-05
4. Nyquist Stability Criterion                      21-09
5. Bode Attenuation Diagram Approach                21-29
6. Root Locus Method                                21-46
7. Miscellaneous Stability Criteria                 21-71
8. Closed Loop Response from Open Loop Response     21-72
References                                          21-81

1. INTRODUCTION

Definition of Stability. A stable system is one wherein all transients
decay to zero in the steady state. An unstable system is here loosely
defined as one in which the response variable increases without bound
with a bounded signal input.
Reason for Stability Analysis. The primary objective of a control
system design is to devise a system such that a controlled variable is
related to a command signal in a desired manner within permissible tolerances. If power elements with reliable, unchanging characteristics were
available, the problems of control system design would be much simplified.
Since, in the main, the characteristics of power elements change with
time, temperature, load, pressure, etc., a feedback element is employed to

remove the deleterious effects of change in element characteristics. To
improve performance of the system, a natural solution is to increase the
gain or amplification in the system. The combination of a closed loop and
high gain leads to problems of instability.
Purpose of Stability Analysis. To be a satisfactory control system,
the system deviation resulting from any normally encountered deviation
stimulus must reduce with increasing time to a small value within acceptable tolerance. It is the purpose of stability studies to indicate a system's
dynamic behavior, and if this behavior is improper or inadequate, the
studies should point the way toward proper system revision to improve
performance.
Methods of Stability Analysis. The methods of studying stability
presented in this chapter are restricted to linear systems. A linear system
is one in which the output due to simultaneous inputs is the same as the
sum of the several outputs due to the inputs acting alone. In other
words, a linear system is one which may be described by ordinary linear
differential equations wherein the theorem of superposition holds true.
For nonlinear systems, see Chap. 25.

FIG. 1. General negative feedback system.
Consider the general feedback system shown in Fig. 1; s is the Laplace
transform complex variable. The transfer characteristics are given by

(1) C(s)/R(s) = G(s)/[1 + G(s)H(s)].

The stability of a system is uniquely defined by those values of s which
make

(2) 1 + G(s)H(s) = 0.

All the methods of stability analysis, therefore, confine themselves to investigation of eq. (2), in one fashion or another. The techniques can be
classified in two general categories: those which obtain the explicit values of
the roots of eq. (2) and those which obtain information about the bounded
regions wherein all the roots lie. In the first category belong the classical
approach and the root locus method. In the second category belong Routh's
criteria, Nyquist's criteria, Bode's method, and many others. The relative
merits of each will be discussed as each method is examined in detail.
2. CLASSICAL SOLUTION APPROACH

As shown in Chap. 20, it is possible to relate the controlled variable to
the command (reference) variable by a differential equation. For the sta-


bility studies involved here, assume this is a linear differential equation of
the form
(3) a0 d^n x(t)/dt^n + a1 d^(n-1) x(t)/dt^(n-1) + ... + an x(t)
        = b0 d^m y(t)/dt^m + b1 d^(m-1) y(t)/dt^(m-1) + ... + bm y(t).

Since this equation is linear, the solution may be broken into the sum of
two solutions, the particular solution and the complementary or homogeneous
solution.
The particular solution, x_ss, also called the forced response or the steady-state solution, is of the form

(4) x_ss = f(y).

In other words, the forced response is of the same character as the reference. For example, if y is sinusoidal, x_ss is also sinusoidal.
Characteristic Equation. When the operator p is substituted for d/dt
and y is set equal to zero, eq. (3) becomes

(5) a0 p^n x(t) + a1 p^(n-1) x(t) + a2 p^(n-2) x(t) + ... + a_(n-1) p x(t) + an x(t) = 0

or

(6) [a0 p^n + a1 p^(n-1) + a2 p^(n-2) + ... + a_(n-1) p + an] x(t) = 0.

This is the characteristic equation (see Chap. 20) of the system. The
complementary solution, x_t, also called the homogeneous solution or transient
solution, is of the form

(7) x_t = A1 e^(p1 t) + A2 e^(p2 t) + A3 e^(p3 t) + ... + An e^(pn t).

The exponents p1, p2, p3, ..., pn are the roots of eq. (6). The coefficients A1, A2, A3, ..., An depend upon the initial conditions of the system and the forcing function, y.
Note that when multiple roots occur, say p1 = p2, the transient solution
is of the form

(8) x_t = (A1 + A2 t) e^(p1 t) + A3 e^(p3 t) + ... + An e^(pn t).

The total solution is the sum of the two parts

(9) x = x_ss + x_t.

Relation of Stability to Characteristic Equation. Instability has
been defined as the output becoming large without bound for bounded
input. Since the steady-state solution is of the same character as the
forcing function, only the transient solution can provide terms which in-


crease without bound for bounded input. This occurs when any of the
roots p1, p2, p3, ..., pn have positive real parts because the corresponding
exponential terms in eq. (7) or (8) tend to infinity as t becomes infinite.
Because A1, A2, A3, ..., An are finite values depending on initial conditions and x_ss is bounded for a bounded input, system stability is dependent
only upon the nature of the characteristic equation of the system! In other
words, system stability is uniquely determined by the behavior of the exponential terms in the transient response given by eq. (7) or (8).
a. If all the roots have negative real parts, all the exponential terms

decay to zero as time increases. This is a stable system.
b. If any of the roots have positive real parts, the corresponding exponential terms increase without limit. This is an unstable system.
c. If any of the roots are purely imaginary, the corresponding terms
oscillate at constant amplitude. This condition is the dividing point between a stable and an unstable system. It is here also considered unstable.
d. If it so happens that multiple roots occur, i.e., p1 = p2, which are
purely imaginary, the output increases without bound. Again this is an
unstable condition.
The fundamental problem in ascertaining system stability is therefore one
of determining the nature of the roots of the characteristic equation of a given
system. The straightforward method of determining stability of a system

consists of the following steps:
a. Write the differential equation of the system relating input and output variables.
b. Substitute P for d/dt and equate the input signal to zero. This is
the characteristic equation of the system in operational form.
c. Obtain the roots of the characteristic equation with assigned values
for all constants.
d. Examine the roots. If all roots have negative real parts, the system
is stable. If any of the roots have zero or positive real parts, the system is
unstable.
Figure 2 shows the regions of root location for stable and unstable
systems.
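Steps c and d lend themselves to a quick numerical check. The sketch below (an illustrative addition with hypothetical coefficients, not from the handbook) factors a characteristic equation and inspects the real parts of the roots.

# Classical approach, steps c and d: factor and examine the roots.
import numpy as np

coeffs = [1.0, 6.0, 11.0, 6.0]           # s^3 + 6s^2 + 11s + 6 = (s+1)(s+2)(s+3)
roots = np.roots(coeffs)
print(roots)                              # all real parts negative
print("stable" if np.all(roots.real < 0) else "unstable")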
Note. If the Laplace transform method of analysis had been used, the
conclusions would have been identical except that the complex variable
s would replace the operator p. Equation (3) would yield:

(10) [a0 s^n + a1 s^(n-1) + ... + an]X(s) = [b0 s^m + b1 s^(m-1) + ... + bm]Y(s).

As an input-output ratio similar to eq. (1) this is

(11) C(s)/R(s) = X(s)/Y(s) = [b0 s^m + b1 s^(m-1) + ... + bm]/[a0 s^n + a1 s^(n-1) + ... + an].


Solution of the equation resulting from setting the denominator of eq.
(11) to zero is exactly the same as solution of eq. (2). [1 + G(s)H(s)]
is a fraction of polynomials in s where the numerator polynomial is the
characteristic function of the system. As shown in Fig. 2, stability is
uniquely defined by those values of s which satisfy eq. (2). Because of
more universal acceptance, the complex variable s will be used in place of
the operator p in all the following methods of stability analysis.

FIG. 2. Pictorial representation of stability definition.

Relative Merits of the Classical Solution Approach. This method
of determining stability has the advantage of being theoretically exact,
but suffers from two major disadvantages:
a. A great amount of labor is required to factor equations of degree
higher than 3.
b. To factor any higher order equations, the coefficients must be numerical values. The loss of system parameters in literal form obscures the
ways to improve system performance should redesign become necessary.
3. ROUTH'S CRITERION (Refs. 1, 2, 3)

In 1877 E. J. Routh developed an algebraic method for determining
whether a polynomial has roots with positive real parts. This method
does not reveal the exact values of the roots but shows the bounded regions
wherein they are located. Reference to Fig. 2 shows that this is all that
is necessary to determine whether a system is stable or not. If all roots
lie in the left half s-plane, the system is stable.
Application of the Routh Criterion.

Step 1. Write the characteristic equation in the form

(12) [a0 s^n + a1 s^(n-1) + a2 s^(n-2) + ... + a_(n-1) s + an]X(s) = 0.

Remove all the zero roots, i.e., the roots that occur at s = 0. If the
zero roots do occur, they can easily be recognized because s or some mul-


tiple of s will be common to all terms in eq. (12). For example, if an = 0
in eq. (12), s would be common to all terms and could be placed outside
the brackets.
Step 2. Examine eq. (12) to see that all the coefficients of s are nonzero and of the same sign. If this is not true, an unstable system is immediately indicated.
Step 3. Arrange the coefficients in an array of the form:
Index
n:        a0    a2    a4    a6
n - 1:    a1    a3    a5    a7
n - 2:    b1    b2    b3
n - 3:    c1    c2    c3
n - 4:    d1    d2
n - 5:    e1    e2
n - 6:    f1
n - 7:    g1

The index number indicates the highest order of s in a row.
The first two rows consist of all the terms in the given equation and the
rest are calculated in the following fashion:

b1 = (a1 a2 - a0 a3)/a1,    b2 = (a1 a4 - a0 a5)/a1,    b3 = (a1 a6 - a0 a7)/a1,    etc.

c1 = (b1 a3 - a1 b2)/b1,    c2 = (b1 a5 - a1 b3)/b1,    etc.

d1 = (c1 b2 - b1 c2)/c1,    d2 = (c1 b3 - b1 c3)/c1,    etc.

Notice that two terms in the first column are used in each calculation.
As the term to be calculated shifts to the right, the additional two terms
in the formula shift to the right also. The formulas for calculation of
terms in a row use only those terms in the two rows immediately above.
The process is continued for (n + 1) rows where n is the order of the characteristic equation.
Step 4. After the array has been completed, stability can be investigated
by inspection of terms in the first column. The number of changes in sign
of the terms in the first column is the number of roots with positive real parts.
This constitutes Routh's criterion.
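The tabulation is mechanical enough to script. The sketch below is an illustrative addition (not from the handbook); it builds the first column of the array for the plain case only, ignoring the special cases treated later in this section, and counts the sign changes for the fourth order example that follows.

# Routh's criterion: first column of the array and its sign changes.
import numpy as np

def routh_first_column(coeffs):
    """First column for a0*s^n + ... + an (no zero-row special cases)."""
    c = np.asarray(coeffs, dtype=float)
    width = (len(c) + 1)//2
    row0 = np.zeros(width); row0[:len(c[0::2])] = c[0::2]
    row1 = np.zeros(width); row1[:len(c[1::2])] = c[1::2]
    col = [row0[0], row1[0]]
    for _ in range(len(c) - 2):
        new = np.zeros(width)
        for k in range(width - 1):   # cross-multiply the two rows above
            new[k] = (row1[0]*row0[k+1] - row0[0]*row1[k+1])/row1[0]
        row0, row1 = row1, new
        col.append(new[0])
    return col

col = routh_first_column([8, 2, 3, 1, 5])       # the example below
signs = np.sign(col)
print(col)                                       # [8, 2, -1, 11, 5]
print(int(np.sum(signs[:-1] != signs[1:])))      # 2 sign changes -> 2 RHP roots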


EXAMPLE. Given the fourth order equation

8s^4 + 2s^3 + 3s^2 + s + 5 = 0.

The array becomes:

Index
4:    +8     3    5
3:    +2     1
2:    -1     5        [b1 = (2·3 - 8·1)/2 = -1,  b2 = (2·5 - 8·0)/2 = 5]
1:    +11             [c1 = (-1·1 - 2·5)/(-1) = 11]
0:    +5              [d1 = (11·5 - (-1)·0)/11 = 5]
There are two changes of sign in column one (between indexes 3 and 2,
and 2 and 1), therefore the equation must have two roots with positive
real parts. Since a fourth order equation has four roots, the remaining two
roots must lie in the left half s-plane.
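As an illustrative cross-check (not in the handbook), factoring the same quartic numerically confirms the count.

# Numeric check of the Routh example: count right-half-plane roots directly.
import numpy as np

roots = np.roots([8, 2, 3, 1, 5])
print(roots)
print(sum(1 for r in roots if r.real > 0))   # 2, as Routh's criterion predicts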
Note. A generalization can be made from this example. The last term
(+5) came down through the array without change. Since all the coefficients in the equation are positive, the first two terms in column one are
positive. Only terms of index 2 and 1 in column one can be negative.
Thus, a maximum of two sign changes can occur. Therefore, one can
conclude that if all terms in a fourth order equation are nonzero and of the
same sign, at least two roots must lie in the left half s-plane. This conclusion is of no great import in itself but it merely points the way to
intelligent use of this method of analysis.
Special Cases in Applying the Routh Criterion. Because the Routh
criterion can be used to advantage in other commonly used stability studies,
it is worth while to pursue the criterion in greater detail here.
a. Row multiplication. Any row may be multiplied by a positive constant without affecting the criterion. This may be used to decrease the
arithmetic labor involved.
b. When the first term in a row is zero and other terms in the same row are
not zero. To continue the process, replace the first column zero by an arbitrarily small constant, ε, and continue the calculations. Examine
the complete array in the usual fashion. If necessary, ε may be assigned
any arbitrarily small value. This number may be positive or negative
but is customarily assumed positive.


c. When all terms in a row are zero. This special case arises when roots
lying radially opposite and equidistant from the origin occur as shown in
Fig. 3. A pair of conjugate pure imaginary roots is of this category. When
a row of zeros occurs, take the preceding row of coefficients and form a
subsidiary function. This subsidiary function is the polynomial in s
having as coefficients the terms of a row; the exponent of the highest power of s is the index of the
row and successive powers of s decrease by two.

FIG. 3. Roots radially opposite and equidistant from origin.

EXAMPLE. The subsidiary function of the row with index 3 of the preceding example is

f(s) = 2s^3 + s,

whereas the subsidiary function of the row with index 2 is

f(s) = -s^2 + 5.

Upon formation of the subsidiary function of the row preceding the
row of zeros, differentiate it with respect to s and replace the row of zeros
by the corresponding coefficients of the differentiated function. Proceed
in the usual manner. The index numbers remain unaltered. Upon completion of the array, the number of changes in sign indicates the number
of roots in the right half s-plane. The remaining roots are either in the
left half s-plane or on the axis of imaginaries. One of several procedures
can be utilized to determine the number of each (Ref. 4). A straightforward approach is as follows.
In the original equation replace s by - s. This substitution rotates all
the roots of the equation through 180 degrees. Those roots of the original
equation in the left half s-plane are now in the right half s-plane. Application of Routh's criterion to this new equation determines the number of
these roots. Thus, the number of roots of the original equation in the
left half s-plane has been ascertained. The total number of roots is equal
to the order of the original equation. Therefore the number of roots on
the axis is equal to the total minus the sum of those in the right and left
half s-planes.
Relative Merits of Routh's Criterion. This criterion serves as a
quick check on absolute system stability. It can also be used to advantage
in the more powerful Nyquist criterion. It nicely avoids the necessity
for factoring an equation to determine the nature of its roots. This method
does not provide a clear indication of system performance and does not
clearly show the ways to improve a design should improvement be required.


4. NYQUIST STABILITY CRITERION

This powerful criterion is based on the fact that the frequency response
of the open loop transfer function indicates the stability characteristics of
the closed loop system. In Fig. 1 the open loop transfer function is represented by G(s)H(s).
Restrictions on the General Nyquist Criterion.

a. G(s)H(s) must be the ratio of the transforms of linear differential
equations.
b. G(s)H(s) must be single valued and an analytic function (Ref. 5)
for all values of s having zero or positive real parts except at possible discrete
points (Ref. 4).
Basic Definitions. In general G(s)H(s) is a fraction of rational polynomials in s.

(13) G(s) = N1(s)/D1(s) = K1(s + s1)(s + s3)···/[(s + s2)(s + s4)(s + s6)···],

(14) H(s) = N2(s)/D2(s) = K2(s + s'1)(s + s'3)···/[(s + s'2)(s + s'4)(s + s'6)···].

The all important eq. (2) can be written as

(15) 1 + G(s)H(s) = 1 + N1(s)N2(s)/[D1(s)D2(s)],

(16) 1 + G(s)H(s) = [D1(s)D2(s) + N1(s)N2(s)]/[D1(s)D2(s)].

Characteristic Function. [D1(s)D2(s) + N1(s)N2(s)] represents the characteristic function of the closed loop system of Fig. 1. The characteristic
equation is merely the characteristic function set equal to zero.
Zeros. The factors (s + s1), (s + s3), ···, represented by N1(s) are
called zeros of G(s). This terminology arises because when s takes on the
value of a root of N1(s), i.e., -s1, -s3, ···, N1(s) equals zero and G(s) does
likewise per eq. (13).
Poles. The factors (s + s2), (s + s4), ···, represented by D1(s) are
called poles of G(s). When s takes on the value of a root of D1(s), i.e.,
-s2, -s4, ···, D1(s) equals zero and G(s) goes to infinity per eq. (13).
This rise to infinity is called a pole.
Note. Per eq. (16), poles of G(s)H(s) are also poles of [1 + G(s)H(s)],
whereas zeros of [1 + G(s)H(s)] are unknown and their nature to be
determined by the stability criterion. Zeros of [1 + G(s)H(s)] are poles
of C(s)/R(s).


Nyquist Criterion. General Procedure.
a. Plot G(s)H(s) for s traversing the boundary of the entire right half
s-plane in a clockwise direction. (See following note.)
b. Draw a vector, V, from (-1 + j0) [the minus one point in the
G(s)H(s)-plane] to G(s)H(s) and observe the angular rotation of this
vector for the above values of s.
c. Let R be the net number of revolutions of this vector. R is positive
for counterclockwise revolutions and negative for clockwise revolutions.
d. Determine the number of poles of G(s)H(s) in the right half s-plane,
i.e., poles with positive real parts. Call this integer number P. If necessary,
Routh's criterion may be used to determine this.
e. The number of zeros of [1 + G(s)H(s)], Z, is determined from the
equation
(17) Z = P - R.

f. The system is stable if and only if Z = 0, i.e., if the number of
counterclockwise revolutions of G(s)H(s) about the -1 point is equal to
the number of poles of G(s)H(s) in the right half s-plane.
Note. If G(s)H(s) has any poles on the jω-axis (i.e., pure imaginary
roots), when s is taking on values up the jω-axis, it must bypass these
points. It is customary to make s traverse a small semicircle to the right
of these points as shown in Fig. 4.

FIG. 4. Traversal of s for the Nyquist plot where G(s)H(s) has poles at ±jω1 and 0.

If G(s)H(s) ever does have poles on
the jω-axis whose values are unknown, it is almost as much effort to determine these as it is to find the zeros of [1 + G(s)H(s)] directly. In this
case, Dzung's criteria (Refs. 19, 20) may be a better approach for stability
analysis. Fortunately this condition arises infrequently.
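The counting rule Z = P - R can also be exercised numerically. The sketch below is an illustrative addition (not from the handbook): it sweeps the jω-axis for an assumed G(s)H(s) having no jω-axis poles, accumulates the winding of the vector from the -1 point, and applies eq. (17).

# Count encirclements of -1 from a frequency sweep and apply Z = P - R.
import numpy as np

def winding_about_minus_one(K, w_max=1e3, n=100001):
    """Net CCW revolutions of the vector from -1 to G(jw)H(jw)."""
    w = np.linspace(-w_max, w_max, n)
    GH = K/((1j*w + 1)*(1j*w + 2)*(1j*w + 3))   # assumed open loop, no jw poles
    v = GH + 1.0                                 # vector from the -1 point
    dphi = np.diff(np.unwrap(np.angle(v)))
    return np.sum(dphi)/(2*np.pi)

P = 0                                            # open loop RHP poles
for K in (10.0, 100.0):
    R = round(winding_about_minus_one(K))
    print(K, "Z =", P - R)                       # K=10: Z=0; K=100: Z=2 (unstable)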
The Physical Meaning of Making s Traverse the jω-Axis. In short,
it is obtaining the steady-state frequency response of the open loop transfer


function G(s)H(s). Consider the case shown in Fig. 5. A sin ωt is the
input and in the steady-state condition, B sin (ωt + θ) is the output. To
be theoretically exact, the steady-state condition is the condition that exists
after an infinite time has elapsed. This allows all the transients to die
out to absolute zero for a stable system. For practical consideration,
steady state occurs after the transients have settled down to arbitrarily
small values.

FIG. 5. Frequency response measurements.

Comparison of the ratio of the output sinusoid to the input
sinusoid reveals that a gain change, B/A, and a phase shift, θ, have occurred. This gain change and phase shift are due to G(s)H(s) and can be
considered as the magnitude and direction of a vector. This is the vector
notation of the steady-state behavior of G(s)H(s).
Sinusoidal Input Variable. The Laplace transform of the input
variable is

(18) ℒ[sin ωt] = ω/(s^2 + ω^2) = ω/[(s + jω)(s - jω)].

The graphical representation of the Laplace transform of this sinusoid
is a pair of points on the imaginary axis a distance of ±ω from the origin.
As the frequency of the sinusoid varies, ω varies with it. See Fig. 6.
Consider the case where

(19) G(s)H(s) = K/[(s + s2)(s + s4)].

The poles of G(s)H(s) are plotted at -s2 and -s4 in the s-plane in
Fig. 6. In this same figure are plotted the poles of the input sinusoid
whose frequencies are successively ω0, ω1, ω2, ω3, ω4, ..., ωn. The corresponding vectors representing the gain change and phase shift G(jω)H(jω)
for each frequency are plotted in Fig. 7. One method to obtain G(s)H(s)
for any particular s is to substitute the particular value of s in eq. (19).
An equivalent, but more illuminating, procedure is to consider each factor
in eq. (19) as a separate vector. In Fig. 6 are shown the two factor vectors
for s = jω2. As s assumes values up the jω-axis, the vectors from the roots
increase in magnitude and phase. Since these vectors appear in the
denominator of eq. (19), as s traverses up the jω-axis, the magnitude of
G(s)H(s) decreases whereas its phase becomes increasingly negative. For
this particular transform given by eq. (19), for positive ω, G(jω)H(jω) lies
in the third and fourth quadrants. The entire curve expands or contracts
with respective increase or decrease in the gain constant K.


FIG. 6. s-Plane plot of input sinusoid and vectors of G(s)H(s) factors.

FIG. 7. G(s)H(s)-Plane plot of frequency response of G(s)H(s).

Conformal Mapping. Mathematically, G(s)H(s) is a function which
transforms a point in the s-plane to a point in the G(s)H(s)-plane. This
mapping of points or curves in one plane to points or curves in another
plane is called conformal mapping. The line along the jω-axis in the
s-plane maps into the curve in the G(s)H(s)-plane shown in Fig. 7 by use
of the transform G(s)H(s). An important point to remember is that any
curve in the s-plane produces a corresponding curve in the G(s)H(s)-plane. The curve in the s-plane which lies on the jω-axis corresponds to
the input function being a variable frequency sinusoid. The shape of
the corresponding curve in the G(s)H(s)-plane depends on the particular
fraction of polynomials represented by G(s)H(s).
Figure 8 shows other lines along which s might vary. Figure 9 shows
the corresponding curves of G(s)H(s) for G(s)H(s) given by eq. (19).
Points on line (1) correspond to input functions of the form

(20) e^(-σ1 t) sin ωt,

whose Laplace transform is

(21) ℒ[e^(-σ1 t) sin ωt] = ω/[(s + σ1 + jω)(s + σ1 - jω)].


Points on line (2) correspond to input functions whose Laplace transform
is

(22) ω/[(s - σ1 + jω)(s - σ1 - jω)].

The conformal mapping procedure obtains definite corresponding curves
for G(s)H(s) as shown in Fig. 9.

FIG. 8. Particular paths of s in the s-plane.

FIG. 9. Plot of G(s)H(s) for paths of s in Fig. 8.

Principles of Nyquist Criterion. By use of conformal mapping principles it can be shown (Ref. 6) that if s is made to traverse the boundaries
of a given area, observation of the behavior of the vector from the -1
point to G(s)H(s) in the G(s)H(s)-plane indicates how many zeros of
[1 + G(s)H(s)] lie in the area whose boundaries were traversed by s.
Refer to Fig. 10, where s is made to traverse the boundary of area A,
and the corresponding path of G(s)H(s) is as shown in Fig. 11. Observation of the net rotation of the vector V about the -1 point gives a clear
indication of the roots of [1 + G(s)H(s)] in area A. For every pole of
G(s)H(s) located in area A, V will experience one net counterclockwise
rotation about the -1 point. For every zero of [1 + G(s)H(s)] in area
A, V will experience one net clockwise rotation about the -1 point. Therefore if the number of poles, P, of G(s)H(s) in area A is known, the number
of zeros of [1 + G(s)H(s)] in area A can be found by subtracting from P
the number of net revolutions of V about the -1 point. If area A is


made to encompass the entire right half of the s-plane, existence of zeros
of [1 + G(s)H(s)] in this area can be determined from the above procedure
and stability of the closed loop system can be ascertained!
FIG. 10. Arbitrary path of s in the s-plane.

FIG. 11. Corresponding path of G(s)H(s).

The left portion of the boundary in Fig. 12 corresponds to making the
input to G(s)H(s) a sinusoid. The traversal out at infinity is only of
mathematical importance because infinite values are difficult to handle
in physical equipment. For practical purposes, that finite region relatively
close to the origin is of major importance as will be more clearly
demonstrated in the Bode approach to stability analysis.

FIG. 12. Path of s enclosing entire right half s-plane.

Summary. Stability is uniquely defined by those values of s which
make

1 + G(s)H(s) = 0.

To ascertain existence of zeros of [1 + G(s)H(s)] in the right half s-plane, Nyquist's criterion requires s to traverse the boundary of the
entire right half s-plane. The portion of the boundary of major importance, the jω-axis, corresponds to a sinusoidal input function. Therefore,
the frequency response of the open loop transfer function G(s)H(s) gives
clear indication of stability of the closed loop system. This is most fortunate because constant amplitude variable frequency generators are
much easier to build than exponentially varying variable frequency
generators. Experimental procedures are thereby more easily implemented.

Application of Nyquist Stability Criterion.

EXAMPLE 1. Given

G(s)H(s) = K(s + s1)/[s(s + s2)(s + s4)(s + s6)].

In Fig. 13 consider s in the region from b to c. As ω becomes increasingly
large,

lim (s → j∞) [G(s)H(s)] = K/s^3 = 0 ∠-270°.

In this region G(s)H(s) approaches zero asymptotically to the -270-degree direction, i.e., along the +j axis. As s traverses the boundary c-d-e
out at infinity, the G(s)H(s) vector rotates 540 degrees in the counterclockwise direction, but since the magnitude is zero, this rotation is unobservable. The region e-f is the conjugate of c to b. In the region f
to g there is a continuous curve which wiggles a bit because of the pole
and zero locations as shown in Fig. 13. At point g the s traverse takes a 90-degree turn to the right. In conformal mapping, angles are preserved in
the small, therefore the G(s)H(s) plot also takes a 90-degree turn to the
right. In the region g-h-a, G(s)H(s) behaves like K/s:

lim (s → 0) [G(s)H(s)] = K/s.

FIG. 13. s-Plane plot of G(s)H(s) = K(s + s1)/[s(s + s2)(s + s4)(s + s6)].

FIG. 14. Nyquist plot of G(s)H(s).


In other words, the movement of s is very close to the pole at the origin,
so the vectors of the poles and zeros relatively far away do not experience
great change. The vector of the pole at the origin experiences a 180-degree
change in the counterclockwise sense. Since this vector is in the
denominator of G(s)H(s), G(s)H(s) experiences a 180-degree change in
the clockwise sense.
The region a to b is the conjugate of g to f. The G(s)H(s) plot is usually
plotted solid for s = +jω and dotted for the rest of the boundary. From
Fig. 13 it is apparent that G(s)H(s) has no poles in the right half s-plane:
P = 0. Notice that the zero of G(s)H(s) is not considered at all. If the
gain constant is such that the -1 point is at point A in Fig. 14, R = 0. Therefore

    Z = P − R = 0 − 0 = 0,

and the closed loop system is stable.
If the gain constant is raised such that the -1 point is at B, there are
two clockwise encirclements of the -1 point,

    Z = P − R = 0 − (−2) = +2,

and the closed loop system is unstable and has two poles in the right half
s-plane.
If the gain were adjusted such that the -1 point were at D, i.e., the
G(s)H(s) curve passes right through the -1 point, R is indeterminate.
This condition produces a constant amplitude sinusoidal oscillation in the
closed loop system. A change in the gain constant is like changing the
calibration on the coordinate axes.
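The encirclement count R can also be formed numerically by accumulating the angle of the vector drawn from the -1 point to G(jω)H(jω) as ω sweeps the jω-axis. The sketch below does this in Python for a transfer function of the Example 1 form; the numerical pole and zero locations (s₁ = 1, s₂ = 2, s₄ = 4, s₆ = 6) and the gain K = 10 are hypothetical values chosen for illustration, not taken from the handbook.

    import cmath

    def GH(s, K):
        # Example 1 form with hypothetical numbers:
        # G(s)H(s) = K(s + 1)/[s(s + 2)(s + 4)(s + 6)]
        return K * (s + 1) / (s * (s + 2) * (s + 4) * (s + 6))

    def revolutions_about_minus_one(K, w_max=1e3, n=200_000):
        """Net counterclockwise revolutions R of GH(jw) about the -1 point
        as w runs from -w_max to +w_max; GH -> 0 on the arc at infinity,
        so the jw-axis sweep carries all of the winding."""
        total, prev = 0.0, None
        for k in range(n + 1):
            w = -w_max + 2.0 * w_max * k / n
            if w == 0.0:
                w = 1e-9            # indent around the pole at the origin
            ang = cmath.phase(GH(1j * w, K) + 1.0)  # vector from -1 to locus
            if prev is not None:
                d = ang - prev
                if d > cmath.pi:    # unwrap the principal value
                    d -= 2.0 * cmath.pi
                elif d < -cmath.pi:
                    d += 2.0 * cmath.pi
                total += d
            prev = ang
        return total / (2.0 * cmath.pi)

    P = 0                           # open loop poles in the right half plane
    R = round(revolutions_about_minus_one(K=10.0))
    Z = P - R                       # closed loop poles in the right half plane
    print("R =", R, " Z =", Z, "->", "stable" if Z == 0 else "unstable")

For these hypothetical numbers the -1 point lies at a location like A, so the sweep returns R = 0 and Z = 0; raising K until the locus crosses the negative real axis beyond -1 makes the same count return R = -2 and Z = 2.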
EXAMPLE 2. Given

    G(s)H(s) = K(s − 10)/(s² + 100).

In the region a to b in Fig. 15,

    G(jω)H(jω) = K(−10 + jω)/(100 − ω²)
               = [K√(ω² + 100)/(100 − ω²)] ∠ tan⁻¹[ω/(−10)].

For ω = 0, G(s)H(s) = (K/10) ∠−180°.
As ω increases, the magnitude of G(s)H(s) increases and the phase angle
becomes more negative.
As ω approaches 10, G(s)H(s) approaches infinity along the +135-degree
line. In the region b-c-d,

    lim (s→j10) [G(s)H(s)] = K(j10 − 10)/[(j20)(s − j10)].

Therefore, since s takes a 90-degree right turn and proceeds 180 degrees
counterclockwise, G(s)H(s) takes a right turn and proceeds 180 degrees
clockwise. In the region d-e, G(s)H(s) is well behaved and proceeds to
zero as s approaches j∞. As s → j∞,

    lim (s→j∞) [G(s)H(s)] = K/s = 0 ∠−90°.

Along e-f-g, G(s)H(s) remains at zero. The rest of the curve is the
conjugate image of e to a. For this system P = 0. From Fig. 16, R = 0
for the -1 point at A. This system is stable.
For the -1 point at B, R = −1 and Z = 1. This system is unstable.
FIG. 15. s-Plane plot.

FIG. 16. Nyquist plot of G(s)H(s) = K(s − 10)/(s² + 100).

Practical Considerations in Plotting Diagrams. If one should ever
find that the number of counterclockwise encirclements of the -1 point
is greater than P, he may correctly infer that he has made a mistake in
calculating either P or R!
The procedure in drawing the Nyquist diagrams is first to draw in the
approximate shape of the G(s)H(s) curve for the prescribed traversal of
s. The labor involved is by no means negligible. To avoid unnecessary
labor, the reader is advised to learn first how to use the following Bode
diagrams and apply them to obtain the exact Nyquist plot when necessary.
The Bode approach is presented after the Nyquist criterion for ease in
presenting the requisite theory. Subsequent usage should by no means
be affected by order of theoretical presentation.


Strictly speaking, the small semicircles about poles of G(s)H(s) on the
jw-axis and also the traversal of s out at infinity do not correspond to
constant amplitude sinusoidal input. The polar plot used in the Nyquist
criterion is therefore not strictly a frequency response plot. For purposes
of simple definition, these exceptions are overlooked.
Abbreviated Nyquist Stability Criterion. When the open loop transfer function is stable by itself,

    P = 0,

and the criterion for stability reduces to

    R = 0.

STATEMENT 1. For a stable open loop transfer function, the closed loop
system will be stable if there are no encirclements of the -1 point in
the G(s)H(s)-plane for s = jω.
The criterion may be further reduced to observing the behavior of
G(s)H(s) for positive ω in the region where the magnitude of G(s)H(s) is
near unity. The additional restriction is that G(s)H(s) becomes a constant
less than 1 (or zero) as s becomes increasingly large. This restriction
means that in eq. (15) the order of N₁(s)N₂(s) is less than or equal to the
order of D₁(s)D₂(s). Where the respective orders are equal, the product of
gain constants, K₁K₂, from eqs. (13) and (14) must be less than 1.
For the cases that fall within the above-mentioned restrictions (and
there are many), the criterion can be restated.
STATEMENT 2. In the region of frequencies where G(jω)H(jω) is near the
unit circle, the system is stable if the -1 point is not encircled.
STATEMENT 3. If the further restriction is imposed that G(jω)H(jω) is
well behaved in the region of the unit circle, then stability is indicated by the
phase angle of G(jω)H(jω) for positive values of ω when it crosses the unit
circle. For phase angles less negative than -180 degrees at unit circle crossover
the system is stable. For phase angles more negative than -180 degrees
at unit circle crossover, the system is unstable. In Fig. 17, G₁(s)H₁(s)
represents a stable closed loop system whereas G₂(s)H₂(s) represents an
unstable system. A well-behaved G(s)H(s) is loosely defined as one that
does not wander too much in the region of the unit circle. A not too well-behaved open loop transfer function is shown in Fig. 18. For systems of
this type, the general Nyquist criterion should be used. Adequate information about system stability is contained in Fig. 18, but more than
the first unit circle crossover must be inspected.
Phase Margin. For those systems that do fall within the abbreviated
criterion, additional definitions have evolved.
The phase of G(jω)H(jω), measured with respect to the positive real axis
and defined as positive in the counterclockwise sense, is given as θ. The
phase margin is the phase of G(jω)H(jω) at unit circle crossover and is
measured with respect to the direction of the -1 point:

(23)    γ = 180° + θ.

In Fig. 17, G₁(s)H₁(s) has a positive phase margin whereas G₂(s)H₂(s)
has a negative phase margin. Phase margin at unit circle crossover evidences
system stability, with plus and minus values indicating stable and unstable
systems respectively. Zero phase margin at unit circle crossover means that
G(jω)H(jω) passes through the -1 point and therefore that the closed
loop system will sustain a constant amplitude oscillation.
FIG. 17. Unit circle in the G(s)H(s)-plane.

FIG. 18. Not too well-behaved G(s)H(s).

EXAMPLE. In Fig. 18 the phase margin at the first unit circle crossover
is positive. The -1 point is encircled, so the system is unstable. This
example illustrates the case where inspection of only the first unit circle
crossover could lead to erroneous conclusions.
Gain Margin. A second point of particular significance is the gain or
magnitude of G(jω)H(jω) where it crosses the negative real axis. This is
σ₁ in Fig. 19. The reciprocal of this value is the gain margin of the system.
The gain constant of G(s)H(s) could be raised by a value 1/σ₁ before
instability arose.
The -1 point can be considered a vector of unit magnitude and direction
of -180 degrees. Note that the phase margin is defined with relation
to the magnitude of the -1 point whereas the gain margin is defined with
relation to the direction of the -1 point.
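Both margins fall out of a frequency sweep of the open loop response: the phase margin is read where |G(jω)H(jω)| = 1 and the gain margin where the phase reaches -180 degrees. The Python sketch below does this by bisection for a hypothetical open loop function G(s)H(s) = 20/[s(s + 2)(s + 5)]; the transfer function and its coefficients are illustrative assumptions, not an example from the handbook.

    import math

    # hypothetical open loop transfer function, P = 0:
    #   G(s)H(s) = 20 / [s(s + 2)(s + 5)]
    def mag(w):
        return 20.0 / (w * math.hypot(w, 2) * math.hypot(w, 5))

    def phase_deg(w):   # exact unwrapped phase of the factored form
        return -90.0 - math.degrees(math.atan(w / 2) + math.atan(w / 5))

    def bisect(f, lo, hi):
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    wg = bisect(lambda w: mag(w) - 1.0, 1e-3, 1e3)          # |GH| = 1
    gamma = 180.0 + phase_deg(wg)                           # eq. (23)
    wp = bisect(lambda w: phase_deg(w) + 180.0, 1e-3, 1e3)  # angle = -180 deg
    gm_db = -20.0 * math.log10(mag(wp))                     # gain margin, db

    print(f"phase margin {gamma:.1f} deg at w = {wg:.2f}")
    print(f"gain margin {gm_db:.1f} db at w = {wp:.2f}")

For these numbers the sweep gives a phase margin of about 36 degrees at ω ≈ 1.5 and a gain margin of about 11 db at ω ≈ 3.16 rad/sec.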

FIG. 19. Determination of phase margin, gain margin (γ = 180° + θ).

Conditional Stability, Unconditional Stability. A conditionally
stable system is one where instability can come about by either an increase
or decrease in system gain. An unconditionally stable system is one where
instability can come about only for an increase in system gain. Figures 20
and 21 illustrate these cases.
FIG. 20. Conditionally stable system.

FIG. 21. Unconditionally stable system: G(s)H(s) = K/[s(s + s₂)(s + s₄)].

Inverse Polar Plots. The preceding Nyquist diagrams are polar plots
of G(s)H(s). These diagrams led to ascertainment of the nature of the
zeros of eq. (2),

    1 + G(s)H(s) = 0.

If in this equation both sides are divided by G(s)H(s),

(24)    1/[G(s)H(s)] + 1 = 0.

Let G′(s)H′(s) represent the inverse of G(s)H(s):

(25)    G′(s)H′(s) + 1 = 0.

The above mathematical manipulations cannot alter the factors of eq.
(2). The zeros of eq. (24) or (25) are exactly the same as the zeros of eq. (2).
Investigation of system stability via the inverse polar plot leads to conclusions identical to those arrived at by use of the direct polar plot of
G(s)H(s). In certain design applications use of the inverse loop transfer
function may more clearly demonstrate effects of design changes.
Polar Plots of Some Common Open Loop Transfer Functions.

The following plots represent some commonly encountered system functions. Once the reader recognizes how these were generated, he should be
ready to handle any newly encountered situation. In the G(s)H(s)-plane
plots are the letters A, B, C. These represent possible locations of the -1
point dependent upon the value of the gain constant K. Stability is
indicated for various locations of the -1 point. See Figs. 22-36. See also
Figs. 13-16.
FIG. 22. Polar plot of G(s)H(s) = K/(s + s₂), P = 0. Always stable.

FIG. 23. Polar plot of G(s)H(s) = K(s + s₁)/(s + s₂), P = 0. Always stable.

FIG. 39. Minimum phase response curve regions.

The Nyquist Stability Criterion Rephrased in Terms of the Bode
Diagrams. For well-behaved, minimum phase G(s)H(s), the closed loop
system is stable if at the frequency where the log magnitude of G(jω)H(jω) is
equal to zero, its phase angle is less negative than -180 degrees. Or, the system is
stable if at the frequency where the phase angle of G(jω)H(jω) is -180°,
the log magnitude is less than zero. If the condition arises where the phase
angle is equal to -180 degrees and the log magnitude is zero at a frequency
ω₀, the closed loop will sustain a constant amplitude oscillation at the
frequency ω₀. This condition corresponds to G(jω)H(jω) passing through
the -1 point in the Nyquist plot.
If the phase angle of G(jω)H(jω) is defined in terms of phase margin as
given by eq. (23) and Fig. 19, the stability criterion is commonly expressed
as follows. At gain crossover (the point where the magnitude curve crosses
the log M = 0 line) a positive phase margin indicates a stable system whereas
a negative phase margin indicates an unstable system.

By use of Bode diagrams it is possible to deduce whether a system is
or is not stable in situations more complicated than that wherein G(s)H(s)
is a well-behaved, minimum phase network. When complicated situations
arise, final conclusions should be checked by use of the general Nyquist
criterion or the Routh criterion.
Mechanics of Drawing Bode Diagrams. When given a transform
G(s)H(s), the most straightforward procedure in drawing Bode diagrams
is to pick values of jω, substitute into G(jω)H(jω), and grind out the
complex algebra. Fortunately this laborious procedure is not required
frequently because G(s)H(s) is usually known in factored form. There
are four basic building blocks used in drawing Bode diagrams:
1. K±¹, a pure gain constant.
2. s±¹, a pure differentiation or pure integration.
3. (s + ω₀)±¹, a simple lead or simple lag.
4. (s² + 2ζω₀s + ω₀²)±¹, a quadratic lead or quadratic lag.
In reverting these basic factors to logarithmic plots it would be entirely
possible to use logarithms to the base e and to use the common multiplier
of 1. Since the decibel concept was in vogue and orders of 10 are easier
to handle than orders of e, logarithms to the base 10 were used and the
multiplying factor was taken as 20. A decibel is defined by

(30)    Decibels = db = 10 log₁₀ (Po/Pi) = 20 log₁₀ (Vo/Vi).

Transfer functions in general are more similar to voltage ratios, Vo/Vi,
than to power ratios, Po/Pi; therefore the multiplying factor of 20 is
commonly used. Some writers (Ref. 10) would rather use the multiplying
factor of 10 and units of decilogs, but this seems to be of small consequence
in stability analysis.
The First Building Block: The Pure Gain Constant.

(31)    20 log K±¹ = ±20 log K.

The logarithm of a pure gain constant is independent of frequency and
therefore plots as a horizontal line in the magnitude and phase curves.
K has zero phase angle if it is positive and -180 degrees if it is negative.
[Magnitude plot of the pure gain constant: horizontal lines at 20 log K for K > 1 and for K < 1.]

FIG. 45. Phase curves of [(s²/ω₀²) + 2ζ(s/ω₀) + 1]⁻¹ for several values of ζ, plotted against ω/ω₀. To obtain values for [(s²/ω₀²) + 2ζ(s/ω₀) + 1], merely reverse the sign of the ordinate.

break frequency. From eq. (40),

(41)    M(ω) = ±20 log √{[1 − (ω/ω₀)²]² + (2ζω/ω₀)²}.

For ω/ω₀ much less than 1, all ω/ω₀ terms are negligible and

(42)    M(ω)|ω/ω₀≪1 ≈ ±20 log 1 = 0 db,

a horizontal asymptote of slope 0 db per decade. For ω/ω₀ much greater
than 1, the dominant term is

(43)    M(ω)|ω/ω₀≫1 ≈ ±20 log √[(ω/ω₀)²]² = ±40 log (ω/ω₀).

This is merely twice the slope of a simple break for the similar assumption. Therefore, the asymptote of a complex break is ±40 db per decade
for large ω/ω₀.
When a quadratic factor is encountered, a first approximation is made
by considering ζ = 1, which means that a simple break of multiplicity
2 occurs at ω₀. For more accurate work, the data in Figs. 44 and 45 must
be used. In this case, ζ is calculated from the given quadratic factor and
the requisite magnitude and phase information obtained from the corresponding ζ curves.
The scale shown in Fig. 43 can be used to obtain phase of quadratic
factors by proper use of the additional scales on the right and left sides.
Since a graph must be kept for magnitude information, it seems logical
to use a graph for phase information also.
It is possible to plot graphs of magnitude correction terms for simple
and complex factors to an expanded scale to improve accuracy. Since
the corresponding phase correction curves offer small expanded scale
possibilities, this is usually of little value.
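A short numerical check of eqs. (41)-(43) is easy to carry out. The sketch below, with an assumed break frequency ω₀ = 10 and damping ratio ζ = 0.3 (hypothetical values), evaluates the exact quadratic-lag magnitude against its asymptotes; note the resonant correction of −20 log 2ζ at the break frequency.

    import math

    def quad_lag_db(w, w0, zeta):
        """Exact log magnitude, eq. (41), of [(s/w0)^2 + 2*zeta*(s/w0) + 1]^-1."""
        r = w / w0
        return -20.0 * math.log10(math.hypot(1.0 - r * r, 2.0 * zeta * r))

    def asymptote_db(w, w0):
        """0 db below the break, -40 db/decade above it, eqs. (42) and (43)."""
        return 0.0 if w <= w0 else -40.0 * math.log10(w / w0)

    w0, zeta = 10.0, 0.3      # hypothetical break frequency and damping ratio
    for w in (1, 5, 10, 20, 100):
        print(f"w={w:5.1f}  exact {quad_lag_db(w, w0, zeta):7.2f} db"
              f"  asymptote {asymptote_db(w, w0):7.2f} db")
    # At the break frequency the exact curve differs from the asymptote by
    # -20*log10(2*zeta); for zeta = 0.3 that is a +4.4 db resonant rise.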
The Transform of a Pure Time Delay e^(−sT). This is of particular
interest in many cases.

(44)    e^(−sT)|s=jω = e^(−jωT) = 1 ∠−ωT,

so the log magnitude is 20 log 1 = 0 db and the phase is −ωT radians.

Equation (44) shows that the magnitude is independent of frequency
and the phase is linearly related to frequency. The magnitude and phase
curves are shown in Fig. 46. This function falls within the limitations
of the Nyquist criterion, and stability can be investigated in the usual
fashion.

FIG. 46. Bode diagram of e^(−sT).

Application of Bode Diagrams.
EXAMPLE 1. Draw the Bode diagram of

    G(s)H(s) = 316 / [s(s + 10)].

First put the simple lag factor in nondimensional form:

    G(s)H(s) = 31.6 / {s[(s/10) + 1]}.

Separate individual factors:

    G(s)H(s) = 31.6 · (1/s) · 1/[(s/10) + 1].
The asymptotic approximate and the exact Bode diagrams of these individual factors are shown in Fig. 47. The composite G(s)H(s), in heavy
solid lines, is merely the summation of all the separate magnitude and
phase curves as indicated by eq. (29). At gain crossover ω = 16, θ = −147°
for the exact curve whereas ω = 18 and θ = −150° for the approximate
curve. In most cases the approximate answers are sufficiently accurate
because in practice the transfer functions represent average values and
will not correspond exactly with delivered equipment. Also, as equipment wears in normal use, the transfer characteristics change. For these


reasons the designer must usually provide a margin of safety and some
adjustments which will permit small changes when required for improved
system performance.
The system shown in Fig. 47 is stable for all values of loop gain because
the phase angle approaches -180° asymptotically. The Nyquist plot of
a similar transfer function is shown in Fig. 25. It is well to keep both
representations in mind.
FIG. 47. Bode diagram of G(s)H(s) = 31.6 · (1/s) · [1/((s/10) + 1)], showing the asymptotic and exact curves of the individual factors and the composite, with the approximate and exact gain crossovers marked.
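The gain crossover values quoted above can be verified directly from the factored form; the short sketch below (Python, using only the exact magnitude and phase expressions of the two factors) locates the crossover numerically.

    import math

    # Exact frequency response of Example 1, G(s)H(s) = 31.6/[s((s/10) + 1)]
    def mag(w):
        return 31.6 / (w * math.hypot(1.0, w / 10.0))

    def phase_deg(w):
        return -90.0 - math.degrees(math.atan(w / 10.0))

    # march up in frequency to the gain crossover |GH| = 1
    w = 0.1
    while mag(w) > 1.0:
        w *= 1.001
    print(f"gain crossover at w = {w:.1f}, theta = {phase_deg(w):.0f} deg")
    # -> about w = 16.5 and theta = -149 deg; the values read graphically
    #    from Fig. 47 (w = 16, theta = -147 deg) agree to plotting accuracy.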

EXAMPLE 2. Given the following loop transfer function, determine K
such that gain crossover occurs at θ = −135° at ω greater than 100 rad/sec:

    G(s)H(s) = K(s + 80) / [(s² + 6s + 100)(s + 400)].

Again, by nondimensionalizing and separating factors,

    G(s)H(s) = [K·80/(100·400)] · [(s/80) + 1] / {[(s/10)² + (0.6s/10) + 1] · [(s/400) + 1]}.

The frequency response curves are shown in Fig. 48. The composite
curves can be drawn without resort to drawing all the individual curves
as done in Fig. 47. Neglect the constant term and consider first the


asymptote approximations to the separate factors. There is a quadratic
lag break at 10, a simple lead break at 80 and a simple lag break at 400.
The approximate curve is flat out to 10, breaks down to -40 db per decade
at 10, breaks to - 20 db per decade at 80 because of the simple lead, and
then breaks back to - 40 db per decade at 400 and continues on at this
slope. The servomechanism scale shown in Fig. 43 is very useful in drawing
these asymptote lines. The exact curve is drawn in by obtaining correction

FIG. 48. Bode diagram of G(s)H(s) = (K/500) · [(s/80) + 1] / {[(s/10)² + (0.6s/10) + 1][(s/400) + 1]}.

terms for the quadratic lag for ζ = 0.3 from Fig. 44 and for the simple
lead and simple lag from Fig. 42. The exact phase curve is drawn in by
use of the servomechanism scale shown in Fig. 43 for the simple breaks
and the phase curve for the quadratic lag by use of the phase curve for
ζ = 0.3 in Fig. 45. The arrowhead on the servomechanism scale is placed
at the frequency where the phase is desired, and the phase contribution
by the simple breaks is read at the break frequencies. The lead and lag
terms contribute positive and negative phase angles respectively.
To set the gain constant such that gain crossover occurs at θ = −135°
at ω greater than 100, the entire magnitude curve is shifted up until
this occurs. Instead of shifting the magnitude curve up, it is simpler to
shift the zero db line down. This corresponds to recalibration of the db
axis. The required amount of 0-db line shift corresponds to K/500. To

meet the requirements of the example, K/500 = 42 db. K therefore must
equal 63,000.
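The same answer can be reached without graphical construction by evaluating the normalized factors directly. The sketch below scans ω above 100 rad/sec for the point nearest θ = −135° and then sizes K so that the magnitude there is unity; it is an illustrative numerical check, not the handbook's procedure.

    import math

    # Normalized Example 2 response (gain constant factored out as K/500):
    #   GH = (K/500)[(s/80) + 1] / {[(s/10)^2 + 0.6(s/10) + 1][(s/400) + 1]}
    def mag_n(w):
        num = math.hypot(1.0, w / 80.0)
        quad = math.hypot(1.0 - (w / 10.0) ** 2, 0.6 * (w / 10.0))
        lag = math.hypot(1.0, w / 400.0)
        return num / (quad * lag)

    def phase_deg(w):
        r = w / 10.0
        return (math.degrees(math.atan(w / 80.0))              # lead at 80
                - math.degrees(math.atan2(0.6 * r, 1.0 - r * r))  # quadratic lag
                - math.degrees(math.atan(w / 400.0)))          # lag at 400

    # scan w > 100 rad/sec for the point nearest theta = -135 deg
    best_w = min((100 + 0.5 * k for k in range(1000)),
                 key=lambda w: abs(phase_deg(w) + 135.0))
    K = 500.0 / mag_n(best_w)   # shift the 0-db line so crossover lands here
    print(f"w = {best_w:.0f}, theta = {phase_deg(best_w):.1f} deg, K = {K:.0f}")
    # -> w near 167, theta about -136 deg (the exact phase never quite
    #    reaches -135), K about 65,000; the handbook's graphical answer of
    #    63,000 agrees within half a decibel.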
Use of Bode Diagrams in Drawing Nyquist Plots. When system
design is attempted by use of the Nyquist diagrams, it soon becomes apparent that the labor involved in drawing the diagrams is excessive. This
comes about because design changes come in terms of multiplying factors
which are laborious to incorporate because multiplication is a relatively
complex process. The logarithm concept of the Bode diagrams reduces
multiplication to the simple process of addition. It is advantageous first
to plot a Bode diagram and transfer the values from this diagram to the
polar plot when information is desired in such form. The major stumbling
block to this procedure is the conversion of decibels to gain numbers. It
is possible to plot the Bode magnitude diagrams on log-log paper, as shown
in Fig. 49, and thereby circumvent the use of decibels. Gain factors are
clearly brought to view.

FIG. 49. Reproduction of magnitude of Fig. 47 on log-log paper.

This approach suffers from two major disadvantages. First, the plot
of phase is still best accomplished on semilog paper; therefore separate
scales would be required for the magnitude and phase curves. Use of
decibels allows a single semilog sheet to be used for both plots. Second,
in adding the magnitudes of two factors, a pair of dividers or some such
device would become necessary. Shift of the zero line is not as simple as
it is with the decibel scale.
A more useful approach is to use a transparent scale as shown in Fig. 50.
Three possible scale factors on the decibel scale are available. The scale
is placed vertically on the Bode magnitude diagram and values of gain
are read directly.

FIG. 50. Gain-decibel conversion scale.
Another approach is that of a graph as shown in Fig. 51. Values of
decibels are read off the magnitude curve and the graph is used to convert
decibels to gain numbers.
FIG. 51. Gain-decibel conversion chart.
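In place of a scale or chart, the conversion is one line of arithmetic, gain = 10^(db/20); the fragment below ties it back to Example 2.

    import math

    def db_to_gain(db):          # inverse of db = 20*log10(gain)
        return 10.0 ** (db / 20.0)

    def gain_to_db(g):
        return 20.0 * math.log10(g)

    # the 42-db shift in Example 2: 10**(42/20) = 125.9, and
    # 125.9 * 500 = 63,000, the gain constant found there.
    print(db_to_gain(42.0) * 500)   # -> 62946.4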

Relative Merits of the Bode Diagram Approach. This approach
is by far the most generally used method in system design. Design modifications can be analyzed with a minimum of labor involved in drawing
the diagrams. The approximate curves allow the designer to investigate
a host of designs in a short time. The most promising approximate designs
are then investigated more exactly. The host of curves provides an indication of how system performance will change in the presence of some nonlinearities in the system.
This approach is based on the limitations of the abbreviated Nyquist
criterion, and when complex situations arise, it is best to revert to the

Nyquist criterion or the Routh criterion for exact stability evaluation.
The Bode diagrams can be used to draw the Nyquist diagrams.

6. ROOT LOCUS METHOD

This method (Refs. 11-13), developed by Evans, provides a means for
obtaining the roots of the characteristic equation of a closed loop system,
the values of which clearly indicate system stability. Essentially, the
method assumes that a chosen complex number is a root of the characteristic equation and tests to see if it can be. If this test is favorable, one
constant is changed to a value such that the complex number is a root of
the equation. This constant is the loop gain of the closed loop system.
The complex numbers which represent possible roots of the characteristic
equation, when plotted in the s-plane and identified with the necessary
corresponding loop gain, form curves which are the loci of the characteristic
equation roots. These roots represent poles of the closed loop response
which clearly indicate system stability and transient performance.
This plot in the s-plane provides a rapid evaluation of the effects of
varying the gain in a system. It provides a graphical representation of
the predominant features of a closed loop system, i.e., its poles, and when
system behavior is inadequate, provides a clear indication of proper
compensation.
For a system whose loop transfer function is given by

    G(s)H(s) = K / [s(T₂s + 1)(T₄s + 1)],

the root locus plot is shown in Fig. 52.

FIG. 52. Root locus for G(s)H(s) = K/[s(T₂s + 1)(T₄s + 1)].

This plot shows that as the loop gain, K, is increased, the closed loop
poles move in the direction of the arrowheads. For all values of gain less
than Kc, the closed loop poles lie in the left half s-plane, and the system
is stable. For all values of gain greater than Kc, the system is unstable.
Kc is a critical gain factor. Also shown by the plot is the fact that, as the
gain varies from K₁ to Kc, the closed loop poles have a decreasing damping
factor. Therefore, one can expect the transient response to be more
oscillatory and to have a longer settling time as the gain is increased in
this region.
Theory of Construction. Consider the general negative feedback
system shown in Fig. 1.
Assume that the feed forward and feedback transfer functions are
composed of ratios of rational polynomials in s, i.e.,

(45)    G(s) = (a_m1 s^m1 + a_m1−1 s^m1−1 + ··· + a₁s + a₀) / (b_n1 s^n1 + b_n1−1 s^n1−1 + ··· + b₁s + b₀) = N₁(s)/D₁(s),

(46)    H(s) = (d_m2 s^m2 + d_m2−1 s^m2−1 + ··· + d₁s + d₀) / (e_n2 s^n2 + e_n2−1 s^n2−1 + ··· + e₁s + e₀) = N₂(s)/D₂(s).

The closed loop response is

(47)    C(s)/R(s) = G(s)/[1 + G(s)H(s)] = N₁(s)D₂(s) / [D₁(s)D₂(s) + N₁(s)N₂(s)].

The root locus method obtains the roots of the fractional equation
[1 + G(s)H(s) = 0], which are the roots of the characteristic function
[D₁(s)D₂(s) + N₁(s)N₂(s)]. It is of interest to note that the closed loop
response has numerator factors (zeros) which are identical to the zeros of
the feed forward transfer function and the poles of the feedback transfer
function, N₁(s) and D₂(s) respectively.
To find the closed loop poles,

    1 + G(s)H(s) = 0.

Therefore

(48)    G(s)H(s) = −1 = 1 ∠±Nπ,    N = 1, 3, 5, 7, ....

For this identity to exist, the angle of G(s)H(s) must lie along the negative
real axis of the G(s)H(s)-plane. This constitutes the angle condition:

(49)    ∠G(s)H(s) = ±Nπ,    N = 1, 3, 5, 7, ....

Also, the magnitude of G(s)H(s) must be unity. This constitutes the
magnitude condition:

(50)    |G(s)H(s)| = 1.

21-48

FEEDBACK CONTROL

In general,

(51)    G(s)H(s) = K(s + s₁)(s + s₃)(s + s₅) ··· (s + s_2m−1) / [(s + s₂)(s + s₄)(s + s₆) ··· (s + s_2n)],

where

(52)    K s₁ s₃ s₅ ··· s_2m−1 / (s₂ s₄ s₆ ··· s_2n)

represents the loop gain. Each factor in eq. (51) represents a vector in the
s-plane as shown in Fig. 53.

FIG. 53. Vector representation of typical polynomial factors.

The angle condition, eq. (49), requires that

(53)    [∠(s + s₁) + ∠(s + s₃) + ··· + ∠(s + s_2m−1)] − [∠(s + s₂) + ··· + ∠(s + s_2n)] = ±Nπ

or

(54)    Σ(k=1 to m) ∠(s + s_2k−1) − Σ(i=1 to n) ∠(s + s_2i) = ±Nπ.

This states that for an exploratory point, s, to lie on the root locus, the
summation of the angles of the zeros minus the summation of the angles
of the poles of the open loop response must be an odd multiple of π.
Procedure. The procedure is to plot the poles and zeros of G(s)H(s) in
the s-plane, choose an exploratory point, s, lay off the factor vectors (note
that the factor vector arrowheads always lie on the exploratory point s
in Fig. 53), and sum the angles with the proper sense; if they add up to ±Nπ,
the point is on the root locus. If not, move the exploratory point over and
repeat. This constitutes assuming that a chosen complex number is a
root of the characteristic equation and testing to see if it can be, i.e., if it
satisfies the angle condition.


When a point is located that does satisfy the angle condition, the
vector magnitudes are measured and values are substituted in the magnitude condition, eq. (50), to calibrate the constant K. (Note. K determines
loop gain, but it is not defined as such per eq. (52).)

(55)    |K| = |s + s₂| |s + s₄| ··· |s + s_2n| / [|s + s₁| |s + s₃| ··· |s + s_2m−1|].

Repetition of the above steps should ascertain the complete locus.
When the locus is completed, the actual K of the given system is determined from G(s)H(s). By reference to the root locus plot, the closed
loop poles are then obtained by inspection.
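The angle and magnitude conditions translate directly into a few lines of code. The sketch below tests an exploratory point against eq. (54) and calibrates K from eq. (55) for a hypothetical pole-zero set, G(s)H(s) = K(s + 1)/[s(s + 2)(s + 4)]; the numbers are chosen only for illustration.

    import cmath, math

    # hypothetical pole-zero set: G(s)H(s) = K(s + 1)/[s(s + 2)(s + 4)]
    zeros = [-1.0]
    poles = [0.0, -2.0, -4.0]

    def on_locus(s, tol_deg=0.5):
        """Angle condition, eq. (54): zero angles minus pole angles must
        be an odd multiple of 180 degrees."""
        ang = sum(cmath.phase(s - z) for z in zeros) \
            - sum(cmath.phase(s - p) for p in poles)
        return abs(math.degrees(ang) % 360.0 - 180.0) < tol_deg

    def gain_at(s):
        """Magnitude condition, eq. (55): |K| from the vector lengths."""
        return math.prod(abs(s - p) for p in poles) / \
               math.prod(abs(s - z) for z in zeros)

    s = complex(-0.9, 0.0)   # exploratory point between the poles at 0 and -2
    print(on_locus(s), gain_at(s))

For the real-axis point s = −0.9 the test returns True with K ≈ 30.7, in line with Theorem 3 below: one pole (at the origin) lies to the right of the point.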
General Theorems for Construction. At first glance, it appears as
though a root locus may lie anywhere in the complex plane, and to discover it may be a hit-or-miss proposition. Fortunately, the locus must
take on certain definite patterns governed by the number and location of
the open loop poles and zeros. The following general theorems aid in
ascertaining the approximate root locus.
THEOREM 1. The number of branches of the locus is equal to the number of
closed loop poles. A branch is a separate portion of the root locus which
has all values of loop gain on it. For a given loop gain only one pole may
exist on one branch of the complete root locus plot. The number of
branches is therefore equal to the degree of the characteristic equation in
s because this determines the total number of poles. Reference to Fig. 54
shows that for a given loop gain, K₁, there are four closed loop poles, one
on each of the four branches labeled (1), (2), (3), (4). K is the loop gain
because all factors are in nondimensional form.
THEOREM 2. The locus starts at open loop poles or infinity (K = 0) and
ends at open loop zeros or infinity (K = ∞). Inspection of the magnitude
eq. (55) shows that at open loop poles K is zero because one of the numerator magnitudes becomes zero. K is infinite at open loop zeros because a zero magnitude term appears in the denominator. For the locus
to start at infinity it is imperative that G(s)H(s) have more zeros than
poles, i.e., its numerator would be of higher degree than its denominator.
Equation (55) shows that for this case and for s approaching infinity, K
approaches zero. For the locus to end at infinity, it is imperative that
G(s)H(s) have more poles than zeros. In this case eq. (55) shows that for
s approaching infinity K approaches infinity also. Figure 54 shows the
case where the loop transfer function has four poles and one zero. The
branches start at the poles for a loop gain of zero. As the loop gain increases to infinity, branch (2) goes along the real axis from the pole to the
zero while the other three branches tend toward infinity.
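Theorem 2 is easy to watch numerically in a case where the characteristic equation stays quadratic. The sketch below uses the hypothetical loop function G(s)H(s) = K(s + 3)/(s² + 2s + 5) and solves the characteristic equation in closed form for increasing K.

    import cmath

    # characteristic equation for G(s)H(s) = K(s + 3)/(s^2 + 2s + 5):
    #   s^2 + (2 + K)s + (5 + 3K) = 0
    def roots(K):
        b, c = 2.0 + K, 5.0 + 3.0 * K
        d = cmath.sqrt(b * b - 4.0 * c)
        return (-b + d) / 2.0, (-b - d) / 2.0

    for K in (0.0, 1.0, 10.0, 1000.0):
        r1, r2 = roots(K)
        print(f"K = {K:6.0f}: roots {r1:.3f}, {r2:.3f}")
    # K = 0 reproduces the open loop poles -1 +/- 2j; as K grows, one
    # branch terminates on the open loop zero at -3 and the other runs
    # to infinity along the 180-degree asymptote.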


THEOREM 3. For the locus to exist on the real axis, the sum of poles and
zeros to the right of the exploratory point must be odd. This is so because
conjugate complex roots together contribute zero angle when the exploratory point is on the real axis. Only poles and zeros on the real axis
to the right of the exploratory point contribute angle (180° each);
therefore, the above conclusion. By again referring to Fig. 54 it is seen that
where the locus exists on the real axis there are either five or three poles
and zeros to the right.

FIG. 54. Root locus plot for G(s)H(s) = K(T₁s + 1) / {(T₂s + 1)(T₄s + 1)[(s²/ω₀²) + 2ζ(s/ω₀) + 1]}.

THEOREM 4. The locus is symmetrical with respect to the real axis. The
characteristic equation is a rational polynomial in s with real coefficients.
Therefore, the roots, when complex, must occur as conjugate pairs. In
Fig. 54, branch (3) is the image of branch (4) about the real axis.
THEOREM 5. The locus leaves an open loop pole or approaches an open
loop zero in the direction given by ±Nπ minus the sum of angles of vectors
from remaining poles and zeros to the pole or zero in question. Consider the
exploratory point s to be very close to an open loop pole. As s circles the
pole, the angle change due to the vector from the pole in question to s
changes greatly. The other vectors change direction only minutely.
Therefore, since the angle contribution from all other poles and zeros is
nearly fixed, the angle contribution of the pole vector in question must
contribute the amount necessary to satisfy the angle condition. Therefore, the direction of locus departure from an open loop pole is ascertained.
A similar situation arises near an open loop zero. Reference to Fig. 55
shows that the direction of departure of the locus from the upper complex
pole is 87° [180° − 90° + 54° − 33° − 24° = 87°].

FIG. 55. Construction theorems applied to G(s)H(s) = K[(s/6) + 1] / {[(s/9) + 1][(s/12) + 1][s/(3 + j4) + 1][s/(3 − j4) + 1]}. Asymptote lines at ±60° and 180°; direction of departure from the upper complex pole 87°.
THEOREM 6. The direction of the locus asymptote lines is given by

    ±Nπ / (n − m),    n = number of poles,  m = number of zeros.

When the exploratory point is extremely far from the cluster of open
loop poles and zeros, they all contribute essentially the same amount of
angle. Since these must add up to ±180 degrees or some odd multiple,
the foregoing conclusion exists. In Fig. 55 it is seen that ±60° and 180°
are the directions of the asymptote lines. Part of the 180-degree line
also happens to be a branch of the locus.
THEOREM 7. The asymptote lines cross the real axis at

    [Σ(i=1 to n) σᵢ − Σ(k=1 to m) σ_k] / (n − m),

where σᵢ = real part of the ith pole and σ_k = real part of the kth zero.
This corresponds to the centroid of the pole-zero cluster. In Fig. 55
the asymptote lines cross the axis at -7.
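Theorems 6 and 7 reduce to two lines of arithmetic; the sketch below applies them to a hypothetical set of four poles and one zero (not the Fig. 55 system).

    # Theorems 6 and 7 applied to a hypothetical pole-zero set
    poles = [0.0, -2.0, -4.0, -6.0]
    zeros = [-1.0]
    n, m = len(poles), len(zeros)

    # Theorem 6: asymptote directions +/- N*180/(n - m) deg, N odd
    directions = [(2 * k + 1) * 180.0 / (n - m) for k in range(n - m)]
    # Theorem 7: asymptotes cross the real axis at the pole-zero centroid
    centroid = (sum(poles) - sum(zeros)) / (n - m)

    print("asymptote directions (deg):", directions)   # [60.0, 180.0, 300.0]
    print("real-axis crossing:", centroid)             # (-12 + 1)/3 = -11/3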
Practical Considerations. If the loop transfer function has many
poles and zeros, some of which are located relatively far from the main
cluster and from the jw-axis, a first order approximation can be made to
the exact locus by omitting the distant poles and zeros. This procedure
requires good engineering judgment. The advantage lies in quicker
ascertainment of the important part of the locus which can be drawn to
an expanded scale.
The procedure to be used in drawing a root locus is to plot the poles
and zeros of the open loop response. From the preceding generalizations,
sketch in the loci. Graphically determine the exact loci. With the open
loop gain constant pick off the closed loop poles. The closed loop response
is then made up of the poles obtained from the plot and the zeros and
multiplying constant from inspection of G(s) and H(s) per eq. (47). Multiple loops are handled by first reducing the minor loops to transfer functions
in factored form. It is of interest to remember here that the root locus
method is a graphical procedure of factoring the characteristic equation
of a system. The minor loop transfer functions are then included as blocks
in the major loop and the major loop root locus is then plotted.
Donahue's Analytical Procedure to Calculate the Root Loci. A
relatively simple analytical means of plotting a root locus has been developed (Ref. 34). This method determines a point on the locus by shifting the jω-axis a given distance, σ₁, and then calculating the frequency,
ω₁, at which the locus crosses the jω-axis in the s-plane. The requisite
loop gain is then calculated from ω₁. Successive points on the locus are
obtained by successive shifting of the jω-axis. Tables 1 and 2 have been
derived by Donahue to aid in the calculations (Ref. 34).


Referring to the general single loop negative feedback system of Fig. 1,
let

(56)    G(s)H(s) = K(R_N + I_N) / (R_D + I_D),

where R_N is the real part of the numerator (even powers of s) and I_N is
the imaginary part of the numerator (odd powers of s). The denominator
follows similar notation.
For a point to lie on the root locus, from eq. (48),

(57)    K(R_N + I_N) / (R_D + I_D) = −1.

Therefore

(58)    KR_N + KI_N = −R_D − I_D.

Equating real part to real part and imaginary to imaginary,

(59)    K = −R_D/R_N

and

(60)    K = −I_D/I_N.

Equations (59) and (60) provide means of solution for frequency and
gain at the root locus crossing of the jω-axis.
EXAMPLE. Let

(61)    G(s) = K(s + h₀)/(s² + r₁s + r₀),

(62)    H(s) = 1,

where, from the preceding,

    R_N = h₀,  I_N = s,  R_D = s² + r₀,  I_D = r₁s.

Substitution in eqs. (59) and (60) gives

(63)    K = −(s² + r₀)/h₀,

(64)    K = −r₁.


TABLE 1. AID FOR ROOT LOCUS CALCULATION BY DONAHUE PROCEDURE

Each row gives, for the loop transfer function shown, the coefficients a and b of eq. (68) produced by the axis shift s → s + σ; all a's and b's not listed for a row are zero.

(1,2) K(s + h₀)/(s² + r₁s + r₀):
    a₀ = r₀ + r₁σ + σ²;  a₁ = r₁ + 2σ;  b₀ = h₀ + σ.
(2,2) K(s² + h₁s + h₀)/(s² + r₁s + r₀):
    a₀, a₁ as in (1,2);  b₀ = h₀ + h₁σ + σ²;  b₁ = h₁ + 2σ.
(0,3) K/(s³ + r₂s² + r₁s + r₀):
    a₀ = r₀ + r₁σ + r₂σ² + σ³;  a₁ = r₁ + 2r₂σ + 3σ²;  a₂ = r₂ + 3σ.
(1,3) K(s + h₀)/(s³ + r₂s² + r₁s + r₀):
    a's as in (0,3);  b₀ = h₀ + σ.
(2,3) K(s² + h₁s + h₀)/(s³ + r₂s² + r₁s + r₀):
    a's as in (0,3);  b₀ = h₀ + h₁σ + σ²;  b₁ = h₁ + 2σ.
(3,3) K(s³ + h₂s² + h₁s + h₀)/(s³ + r₂s² + r₁s + r₀):
    a's as in (0,3);  b₀ = h₀ + h₁σ + h₂σ² + σ³;  b₁ = h₁ + 2h₂σ + 3σ²;  b₂ = h₂ + 3σ.
(0,4) K/(s⁴ + r₃s³ + r₂s² + r₁s + r₀):
    a₀ = r₀ + r₁σ + r₂σ² + r₃σ³ + σ⁴;  a₁ = r₁ + 2r₂σ + 3r₃σ² + 4σ³;  a₂ = r₂ + 3r₃σ + 6σ²;  a₃ = r₃ + 4σ.
(1,4) K(s + h₀)/(s⁴ + r₃s³ + r₂s² + r₁s + r₀):
    a's as in (0,4);  b₀ = h₀ + σ.
(2,4) K(s² + h₁s + h₀)/(s⁴ + r₃s³ + r₂s² + r₁s + r₀):
    a's as in (0,4);  b₀ = h₀ + h₁σ + σ²;  b₁ = h₁ + 2σ.
(0,5) K/(s⁵ + r₄s⁴ + r₃s³ + r₂s² + r₁s + r₀):
    a₀ = r₀ + r₁σ + r₂σ² + r₃σ³ + r₄σ⁴ + σ⁵;  a₁ = r₁ + 2r₂σ + 3r₃σ² + 4r₄σ³ + 5σ⁴;  a₂ = r₂ + 3r₃σ + 6r₄σ² + 10σ³;  a₃ = r₃ + 4r₄σ + 10σ²;  a₄ = r₄ + 5σ.
(1,5) K(s + h₀)/(s⁵ + r₄s⁴ + r₃s³ + r₂s² + r₁s + r₀):
    a's as in (0,5);  b₀ = h₀ + σ.
(0,6) K/(s⁶ + r₅s⁵ + r₄s⁴ + r₃s³ + r₂s² + r₁s + r₀):
    a₀ = r₀ + r₁σ + r₂σ² + r₃σ³ + r₄σ⁴ + r₅σ⁵ + σ⁶;  a₁ = r₁ + 2r₂σ + 3r₃σ² + 4r₄σ³ + 5r₅σ⁴ + 6σ⁵;  a₂ = r₂ + 3r₃σ + 6r₄σ² + 10r₅σ³ + 15σ⁴;  a₃ = r₃ + 4r₄σ + 10r₅σ² + 20σ³;  a₄ = r₄ + 5r₅σ + 15σ²;  a₅ = r₅ + 6σ.
(m,n) K[Σᵢ hᵢsⁱ]/[Σᵢ rᵢsⁱ]:
    a₀ = Σᵢ rᵢσⁱ;  a₁ = Σᵢ i rᵢσ^(i−1);  a₂ = Σᵢ C(i,2) rᵢσ^(i−2);
    b₀ = Σᵢ hᵢσⁱ;  b₁ = Σᵢ i hᵢσ^(i−1);  b₂ = Σᵢ C(i,2) hᵢσ^(i−2).


In eqs. (63) and (64) let s = jω:

(65)    K = (ω² − r₀)/h₀,

(66)    ω² = r₀ − r₁h₀.

Equations (65) and (66) define the gain and frequency at which the root
locus crosses the imaginary axis.
To calculate other points on the locus, shift the jω-axis by replacing
s with (s + σ):

(67)    G(s + σ)H(s + σ) = K(s + σ + h₀) / [(s + σ)² + r₁(s + σ) + r₀],

which reduces to

(68)    G(s + σ)H(s + σ) = K(s + b₀) / (s² + a₁s + a₀),

where

    a₀ = r₀ + r₁σ + σ²,  a₁ = r₁ + 2σ,  b₀ = h₀ + σ.

By analogy to eqs. (62), (65), and (66),

(69)    −K = (1/b₀)(a₀ − ω²),

(70)    ω² = a₀ − b₀a₁.

Tables for Donahue Procedure. This example gives rise to the first
row of Tables 1 and 2. For σ = 0, eqs. (69) and (70) reduce to eqs. (65)
and (66). Table 1 has the parameters a₀, a₁, a₂, etc., and b₀, b₁, b₂, etc.,
determined in terms of the original numerator and denominator power
series coefficients for each of several types of loop transfer functions.
Table 2 gives ω² and −K in terms of these a and b parameters.
The procedure therefore consists of reducing G(s)H(s) to a fraction of
two power series, identifying this with the proper row in Table 1, substituting in values of σ, which lead to calculation of the parameters a
and b and subsequent solution of ω² and −K per Table 2. σ₁, ω₁, and K₁
provide a point on the root locus. The occasion may arise that for a given
σ there may exist no real ω or positive K. This merely signifies that no
locus exists in this portion of the s-plane.
It will be noted that the last line in Table 1 has the general equation
which can be used to evaluate the a's and b's for additional transfer functions. But since the corresponding general equations for K and ω² are
missing, the above serves more as a check on new derivations than as
a means of avoiding work.
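As a concrete illustration, the sketch below carries out the first-row procedure for a hypothetical transfer function G(s)H(s) = K(s + 3)/(s² + 2s + 5): for each trial σ it forms a₀, a₁, b₀ per Table 1 and solves eqs. (69) and (70) for the crossing frequency and gain.

    import math

    # Donahue procedure, row (1,2): G(s)H(s) = K(s + h0)/(s^2 + r1 s + r0)
    # with hypothetical coefficients.
    h0, r1, r0 = 3.0, 2.0, 5.0

    def locus_point(sigma):
        """Shift the jw-axis by sigma (Table 1, row (1,2)), then solve
        eqs. (69) and (70) for the crossing frequency and gain."""
        a0 = r0 + r1 * sigma + sigma ** 2
        a1 = r1 + 2.0 * sigma
        b0 = h0 + sigma
        w_sq = a0 - b0 * a1            # eq. (70)
        if w_sq < 0.0:                 # no real crossing: no locus here
            return None
        K = -(a0 - w_sq) / b0          # eq. (69)
        if K < 0.0:                    # negative gain: not on the K > 0 locus
            return None
        return sigma, math.sqrt(w_sq), K

    for sigma in (-0.5, -1.0, -1.5, -2.0):
        pt = locus_point(sigma)
        if pt:
            print("root at s = %.2f + j%.3f  for K = %.3f" % pt)

At σ = −1 the procedure returns K = 0 and ω = 2, i.e., the open loop poles −1 ± j2, and at σ = −1.5 it gives K = 1 with roots −1.5 ± j2.398, which indeed satisfy s² + 3s + 8 = 0.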

TABLE 2. AID FOR ROOT LOCUS CALCULATION BY DONAHUE PROCEDURE

For each form of Table 1, the jω-axis crossing of the shifted locus satisfies:

(1,2) ω² = a₀ − b₀a₁;  −K = (1/b₀)[a₀ − ω²].
(2,2) ω² = [b₀a₁ − b₁a₀]/[a₁ − b₁];  −K = [a₀ − ω²]/[b₀ − ω²].
(0,3) ω² = a₁;  −K = a₀ − a₂ω².
(1,3) ω² = [b₀a₁ − a₀]/[b₀ − a₂];  −K = (1/b₀)[a₀ − a₂ω²].
(2,3) ω² = ½[(b₀ + a₁ − b₁a₂) ± √((b₀ + a₁ − b₁a₂)² − 4(b₀a₁ − b₁a₀))];  −K = [a₀ − a₂ω²]/[b₀ − ω²].
(3,3) ω² = [(b₂a₁ + b₀ − a₀ − b₁a₂) ± √((b₂a₁ + b₀ − a₀ − b₁a₂)² − 4(b₂ − a₂)(b₀a₁ − b₁a₀))] / [2(b₂ − a₂)];  −K = [a₀ − a₂ω²]/[b₀ − b₂ω²].
(0,4) ω² = a₁/a₃;  −K = ω⁴ − a₂ω² + a₀.
(1,4) ω² = ½[(a₂ − b₀a₃) ± √((b₀a₃ − a₂)² + 4(b₀a₁ − a₀))];  −K = (1/b₀)[ω⁴ − a₂ω² + a₀].
(2,4) ω² = [(b₀a₃ + a₁ − b₁a₂) ± √((b₀a₃ + a₁ − b₁a₂)² − 4(a₃ − b₁)(b₀a₁ − b₁a₀))] / [2(a₃ − b₁)];  −K = [ω⁴ − a₂ω² + a₀]/[b₀ − ω²].
(0,5) ω² = ½[a₃ ± √(a₃² − 4a₁)];  −K = a₄ω⁴ − a₂ω² + a₀.
(1,5) ω² = [(b₀a₃ − a₂) ± √((b₀a₃ − a₂)² − 4(b₀ − a₄)(b₀a₁ − a₀))] / [2(b₀ − a₄)];  −K = (1/b₀)[a₄ω⁴ − a₂ω² + a₀].
(0,6) ω² = [a₃ ± √(a₃² − 4a₁a₅)] / (2a₅);  −K = −ω⁶ + a₄ω⁴ − a₂ω² + a₀.


Construction Aids

From the discussion in the preceding subsections, it may be inferred
that locating points that satisfy the angle condition is a time-consuming
procedure. To aid in this respect, a simple device can be constructed which
mechanically sums the vector angles.
Mechanical Angle Summer. (See Fig. 56.) This device is made of
clear plastic. The arm rotates on the disk with a slight drag. To use, place
the pin point at an exploratory point s with the arm pointing horizontally to
the left and the zero degree arrowhead aligned under the arm centerline.

FIG. 56. Mechanical angle summer.

To sum angles of pole vectors, hold the disk in place and rotate the arm centerline to a pole root. Release the disk and return the arm to neutral. Friction causes
the disk to rotate with the arm. To sum angles of zero vectors, reverse the order
of disk rotation. Rotate the arm centerline to a zero root (disk free to
rotate). Hold the disk and return the arm to neutral. When all roots have been
successively accounted for and the arm has been returned to the neutral
position, the 180-degree arrowhead should lie under the arm centerline for
a point to be on the root locus.
The Spirule shown in Fig. 57 (Ref. 13) performs the above operation
plus the additional feature of calibrating the locus. A logarithmic spiral
curve on the arm permits the logarithm of a length to be obtained as an
angle, so that the addition of such angles corresponds to adding logarithms.
The root locus is calibrated rather simply with this addition.

FIG. 57. Spirule. (Developed by W. Evans. Available from the Spirule Company, Whittier, Calif.)
Conductive Paper Disk. (See Figs. 58 and 59.) To minimize further
the labor involved and therefore enhance its use, machines have been
devised which perform the necessary operations automatically. One
such machine uses the fundamental idea that the electric potential developed on a conducting paper disk could represent angles or logarithms
of lengths. This principle has been tried with success (Ref. 35).

FIG. 58. Angle measurement on a conductive paper disk.

FIG. 59. Magnitude measurement on a conductive paper disk (circles of equipotential).
A Mechanical Plotting Machine. This machine, described in Ref.
14, is a simple mechanical instrument which sums angles of vectors by
using the principle that torque developed by a rotational spring is proportional to the angle of rotation of the spring. The machine is simple
in construction, portable, and it requires no auxiliary power.
A Compact Analog Machine. Described in Ref. 15 and called the
"Complex Plane Analyzer," this machine can, among other functions, be
used to obtain a root locus plot. The principle involved is that of reducing
vector multiplication to two independent summations of phase and log
magnitude. To this end, a logarithmic potentiometer is used to measure

21-60

FEEDBACK CONTROL

magnitude and a linear one measures phase. Capacitors are individually
charged with voltages representing these quantities. Summation of
capacitor voltages produces the required overall products and quotients.
The machine is simple, rugged, can also be used to plot phase loci, and is
available commercially.
Some Common Root Loci

The following plots (Figs. 60-91) are presented to aid in checking some
of the preceding theorems, to present some general loci, and to show in
general how redistribution or variation in number of poles and zeros
affects the plot.
FIG. 60. Root locus plot for G(s)H(s) = K(s + s₁)/[(s + s₂)(s + s₄)].
FIG. 61. Root locus plot for the same form with the zero differently placed.
FIGS. 62 and 63. Root locus plots for G(s)H(s) = K/[(s + α + jβ)(s + α − jβ)] (two variants).
FIG. 64. Root locus plot for G(s)H(s) = K(s + s₁)/[(s + s₂)(s + s₃)].
FIG. 65. Root locus plot for G(s)H(s) = K/[(s + s₁)(s + s₂)(s + s₃)].
FIGS. 66, 67, and 68. Root locus plots for G(s)H(s) = K/[(s + s₁)(s + α + jβ)(s + α − jβ)], with different relative pole placements.
FIG. 69. Root locus plot for G(s)H(s) = K(s + s₁)/[(s + s₂)(s + s₄)(s + s₆)].
FIGS. 70 and 71. Root locus plots.
FIG. 72. Root locus plot for G(s)H(s) = K/[(s + s₂)(s + α + jβ)(s + α − jβ)].
FIG. 73. Root locus plot.
FIGS. 74 and 75. Root locus plots for G(s)H(s) = K(s + s₁)/[(s + s₂)(s + s₄)(s + s₆)] (zero placement variants of Fig. 69).
FIGS. 76, 77, 78, and 79. Root locus plots for G(s)H(s) = K(s + s₁)(s + s₃)/[s(s + s₂)(s + s₄)], four relative placements of the poles and zeros.
FIG. 80. Root locus plot for G(s)H(s) = K(s + s₁)(s + s₃)/(s + s₂)³.
FIG. 81. Root locus plot for G(s)H(s) = K(s + s₁)(s + s₃)/[s(s + s₂)(s + s₄)].
FIG. 82. Root locus plot for G(s)H(s) = K/[(s + s₂)(s + s₄)(s + s₆)(s + s₈)].
FIG. 83. Root locus plot for G(s)H(s) = K/[s(s + s₂)(s + α + jβ)(s + α − jβ)].
FIG. 84. Root locus plot for G(s)H(s) = K/[(s + s₂)(s + s₄)(s + α + jβ)(s + α − jβ)].
FIG. 85. Root locus plot for G(s)H(s) = K/[s(s + s₂)(s + α + jβ)(s + α − jβ)].
FIG. 86. Root locus plot for G(s)H(s) = K/[(s + α₁ + jβ₁)(s + α₁ − jβ₁)(s + α₂ + jβ₂)(s + α₂ − jβ₂)].
FIGS. 87 and 88. Root locus plots for G(s)H(s) = K/[s(s + s₂)(s + α + jβ)(s + α − jβ)] (variants of Fig. 85).
FIG. 89. Root locus plot for G(s)H(s) = K/[s²(s + α + jβ)(s + α − jβ)].
FIG. 90. Root locus plot for G(s)H(s) = K e^(−Ts).
FIG. 91. Root locus plot for G(s)H(s) = K e^(−Ts)/(s + s₂).


Interpretation of Results

The root locus plot provides a pictorial representation of the roots of
the characteristic equation of the closed loop response. The location of
these roots determines the modes of the transient response. Figure 92
shows contours of constant characteristics of these modes.

FIG. 92. Contours of constant characteristics of transient response modes (lines of constant ω_D; circles of constant ω₀).

The jω-axis defines the limit of absolute stability. For the system to
be absolutely stable, all roots must lie in the left half s-plane. Circles
concentric with the origin correspond to loci of roots with constant undamped natural frequency. Therefore, for a system prescribed to have a
maximum natural frequency mode, all roots must lie within the corresponding prescribed circle. Lines of constant imaginary part correspond
to lines of constant damped natural frequency. For prescribed maximum
damped natural frequency, all roots must lie within the area bounded
by the corresponding prescribed lines of constant imaginary part. Lines
of constant real part correspond to lines of constant response time or constant exponential decay factor (−ζω₀). Again for prescribed maximum
individual response time, all roots must lie to the left of the corresponding


line of constant negative real part. Radial lines passing through the
origin correspond to lines of constant damping ratio (ζ). For prescribed
minimum damping ratio, all roots must lie within the area bounded by the
corresponding minimum damping ratio lines encompassing the negative
real axis. Note that lines of zero damping factor, infinite response time,
and absolute stability are the same.
Combined restraints may be imposed on the modes of the transient
response by reducing the area of the root location to that area common
to the individual areas. For example, with specified maximum response
time, maximum damped natural frequency, and minimum damping ratio,
the roots would have to lie within the cross-hatched area of Fig. 93.

FIG. 93. Location of roots for combined restraints.
Multiloop Systems Analysis

For multiloop systems, the procedure is to reduce the individual inner
loops to transfer functions in factored form by use of minor loop root loci.
The major root locus is then drawn up as a single loop. A particular advantage of the root locus method of analysis is that, when changes are made
in the minor loops, the effect on the overall loop is shown directly.
For example, consider the closed loop voltage regulating system shown
in Fig. 94. When a load is imposed upon the system, the gains change
because of nonlinear behavior. The major loop gain decreases whereas
the minor loop gain increases.

FIG. 94. Multiloop voltage regulating system.

The minor loop root locus is shown in Fig. 95.
The major loop root locus is shown in Fig. 96. This figure reveals that the
net effect on the overall system of imposition of full load is that the dominant pole pair (those complex roots closest to the origin) shifts to a lower
frequency with a slightly higher damping ratio whereas the subdominant
pole pair (those complex roots furthest from the origin) shifts to a higher
frequency with a lower damping ratio. The conclusion here is that imposition or removal of load does not severely affect the system stability or
performance.

FIG. 95. Minor loop root locus plot for G(s)H(s) = K₂s/[(T₆s + 1)(T₈s + 1)(T₁₂s + 1)(T₁₄s + 1)], with K₂ the no load and K₂′ the full load minor loop gain.

FIG. 96. Major loop root locus plot, with K₅ the no load and K₅′ the full load major loop gain.

In multiloop systems, desired performance of the overall loop can sometimes be achieved by use of unstable minor loops. In these instances, it
must be remembered that if a failure can occur such that the remaining
system releases large amounts of uncontrolled energy, the design should
be critically reviewed. In practice, systems are usually designed with
stable inner loops.
System Design

By nature, synthesis is more complicated than analysis. A few general
observations can be made with regard to reshaping the root locus to obtain
the required root locations. Inspection of the plots shown in Figs. 60 to 91
shows that poles tend to repel the locus whereas zeros tend to attract
it. Also, as the difference between the number of poles and zeros increases,
the locus tends to shift toward the right half s-plane. System synthesis
through use of the root locus technique amounts to proper placement of
the closed loop poles and zeros. The process is by no means simple, but
by use of some of the previously mentioned machinery, a large amount of
the labor is circumvented. For a detail of design in terms of root loci, see
Chap. 23 and Ref. 16.
Relative Merits of Root Locus Method
This method is theoretically exact and places in evidence the salient
features of a closed loop system. Drawing the locus may involve a slight
amount of work, but excessive labor is circumvented by use of mechanical
aids.
Major advantages of this method are:
a. The behavior pattern of the entire closed loop can be shown in one
simple diagram.
b. Modes of the transient response are placed directly in evidence.
c. Effects of variations in system parameters are placed directly in
evidence.
This is a relatively new method and is gaining widespread use.
7. MISCELLANEOUS STABILITY CRITERIA

There are many, many methods to perform stability analysis of linear
systems. The following is a brief account of some methods not discussed
in previous sections. For the interested reader, the references can be consulted for theory and details of operation.
Hurwitz Criterion (Refs. 17, 18). This criterion is similar in nature to
the Routh criterion and involves use of determinants. It is in general
more laborious than the Routh criterion and offers information only with
regard to whether or not all roots of the characteristic equation lie in the

left half s-plane. This method has been used to advantage in deriving
other stability criteria such as stability boundary diagrams.
Dzung's Criterion (Refs. 19, 20). This stability criterion is very
similar to the Nyquist criterion except that it avoids the necessity of determining the location of poles of G(s)H(s) on the jw-axis. It offers particular advantage when G(s)H(s) is not known in factored form and Routh's
criterion indicates poles of G(s)H(s) on the jw-axis.
Wall's Criterion (Ref. 21). This stability criterion is similar to the
Routh criterion and in many cases the computations are somewhat simpler.
Stability Boundary Theory (Refs. 22, 23, 24). This method is nice
in that some simple arithmetic calculations are made by using the coefficients of the characteristic function, the results are plotted on given
charts, and stability is ascertained by inspection. The main disadvantage
lies in the large number of charts required for higher order systems.
Stability Plus Assurance of Margin of Stability (Refs. 25, 26, 27).
By the substitution s′ = (s + a) and/or s′ = s e^(jθ) in the characteristic
equation, which corresponds to shift and/or rotation of the axes in the
s-plane, and subsequent analysis of the resulting equation, stability plus
assurance of a margin of stability can be ascertained. The substitution
may result in an equation with complex coefficients. Analysis may be
carried out by use of any of the following.
Nyquist and Dzung Criteria. These criteria are general in nature and
are applicable.
Analog of Hurwitz (Refs. 28, 29), Wall (Ref. 4), Routh Criteria
(Ref. 4). These criteria are similar to the criteria as described previously.
Leonhard's Criterion (Ref. 30). This stability criterion is similar to
the Nyquist criterion.
Analog Computer Approach (Ref. 33). By simulating the equations
which describe the physical equipment's behavior, it is possible to study
system stability and performance characteristics. An entire part in Vol.
2 is devoted to analog computers.
8. CLOSED LOOP RESPONSE FROM OPEN LOOP RESPONSE

As shown by eq. (1), the closed loop response of the general negative
feedback system is a function of the forward and feedback transfer functions. Block diagram manipulation of the diagram in Fig. 1 leads to
that shown in Fig. 97. That portion of the system shown in the dashed
rectangle is a unity feedback system whose closed loop response is given
by
(71)    G(s)H(s)/[1 + G(s)H(s)].


For any value of G(s)H(s), the closed loop response can be considered a
vector given by

(72)    G(s)H(s)/[1 + G(s)H(s)] = M ∠a.

M is the magnitude of the vector and a is its direction in radians.

FIG. 97. General negative feedback system.

Contours of Constant M and a. It can be shown (Ref. 31) that for
unity feedback systems, certain curves in the complex plane correspond
to loci of constant M and constant a.
For the direct polar plot of G(s)H(s), the M loci are circles with

radius = |M/(M² − 1)|    and center at    −M²/(M² − 1)

on the axis of reals. The curves are shown in Fig. 98. The a loci correspond to circles passing
through the origin and the −1 point, with centers at

−1/2 + j/(2 tan a).
These a curves are shown in Fig. 99. These curves of constant M
and a are useful for many purposes. An important use is that of obtaining
the closed loop response from a plot of the open loop response, G(s)H(s).
G(s)H(s) is superimposed on curves of constant M and constant a, and
the closed loop magnitude and phase angles are obtained by inspection of
the respective circles at points of intersection with G(s)H(s).
For the inverse polar plot of G(s)H(s) it can be shown (Ref. 31) that
the contours of constant M are circles with center at the -1 point and
radius equal to 1/M. See Fig. 100. The contours of constant a are
straight lines passing through the -1 point with slope equal to a.
The M and a contours in the inverse G(s)H(s)-plane plot are somewhat
easier to use because of ease of construction.
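The same mapping can be checked numerically: for a unity feedback loop, each open loop point G(jω)H(jω) maps to the closed loop point GH/(1 + GH), whose magnitude and angle are exactly the M and a read from the circles. A minimal sketch follows (illustration only; the open loop function gh below is assumed, not taken from the text).

import cmath, math

def gh(s):
    # Assumed open loop function, for illustration only.
    return 10.0 / (s * (0.5 * s + 1.0))

for w in (1.0, 2.0, 4.0, 8.0):
    g = gh(complex(0.0, w))
    cl = g / (1.0 + g)                      # closed loop vector, eq. (72)
    M_db = 20.0 * math.log10(abs(cl))       # M in decibels
    a_deg = math.degrees(cmath.phase(cl))   # a in degrees
    print(f"w = {w:4.1f}   M = {M_db:6.2f} db   a = {a_deg:7.2f} deg")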


FIG. 98. Circles of constant M in the G(s)H(s)-plane.


FIG. 99. Circles of constant a in the G(s)H(s)-plane.


It is of interest to note that the M and a contours are perfectly general
curves for unity feedback systems. In other words, G(s)H(s) is not restricted to those values of s on the jw-axis.

FIG. 100. Contours of constant M and a in the inverse G(s)H(s)-plane.

Nichols Charts

The information contained in the M and a circles, when plotted in
terms of decibels and phase angle as shown in Fig. 101, is commonly
referred to as a Nichols chart because of the fundamental work first done
by N. B. Nichols (Ref. 32). The curves shown in the figure have a mirror
image about the −180° ordinate. The total curves correspond to the
principal value of the logarithm given by eq. (28). Since the logarithm
of a complex number is multivalued, eq. (27), the curves repeat as shown
in Fig. 102.
Stability Analysis on the Nichols Chart. Note that the −1 point
in the G(s)H(s)-plane corresponds to the 0-db, −180° point in Fig. 101.
For well-behaved, minimum phase G(s)H(s), the system is stable if
G(jω)H(jω) crosses the 0-db line to the right of the −1 point on the Nichols
chart. Figure 103 shows a stable system with a phase margin of +48°
at gain crossover and a gain margin of 6.8 db at phase crossover.
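The margins read from the chart can also be estimated by locating the gain and phase crossover frequencies directly. The sketch below is a modern illustration under an assumed open loop function GH(s) = 4/[s(0.5s + 1)(0.1s + 1)], not a function from the text; the phase is written out factor by factor so that no angle unwrapping is needed.

import math

def mag(w):                      # |GH(jw)| for the assumed function
    return 4.0 / (w * math.hypot(0.5 * w, 1.0) * math.hypot(0.1 * w, 1.0))

def phase_deg(w):                # angle of GH(jw), written out by hand
    return -90.0 - math.degrees(math.atan(0.5 * w)) \
                 - math.degrees(math.atan(0.1 * w))

def bisect(f, lo, hi, n=60):     # f must change sign on [lo, hi]
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if f(lo) * f(mid) <= 0.0 else (mid, hi)
    return 0.5 * (lo + hi)

wg = bisect(lambda w: mag(w) - 1.0, 0.1, 100.0)          # gain crossover
wp = bisect(lambda w: phase_deg(w) + 180.0, 0.1, 100.0)  # phase crossover
print(f"phase margin {180.0 + phase_deg(wg):5.1f} deg at w = {wg:4.2f}")
print(f"gain margin  {-20.0 * math.log10(mag(wp)):5.1f} db at w = {wp:4.2f}")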
Exact Closed Loop Response. The procedure to obtain closed loop
response from open loop response is as follows:
a. Manipulation of the general negative feedback system to the form
shown in Fig. 97.


FIG. 101. Nichols chart.

FIG. 102. Multiple Nichols charts.

FIG. 103. G(s)H(s) plotted on Nichols chart.

b. Plot G(s)H(s) directly or inversely with the corresponding M and
a loci. G(s)H(s) plotted on the Nichols chart is shown in Fig. 103.
c. Obtain the closed loop M and a at points of intersection of G(s)H(s)
with the M and a loci.
d. Modify this response by 1/H(s) to obtain the overall closed loop
response.
Approximate Closed Loop Response. The approximate closed loop
response can be obtained by plotting the Bode diagram for G(s) and
[H(s)]⁻¹ on the same sheet. The closed loop response is approximately
equal to the lower of the two curves at any given frequency (see Fig. 104).

40
20

VI

Q)
.0

-----......

...::::.:

~ ~~

Q)

0

o

r-. --.-.- ........ ~ --la(s)1

\ 1(S)1 A

'0

C

.........

...........

I;'

R(s)

-20

,I~~,)II
--12: "":::::~I-............ ---~
~

1\
~

\C(S) \
R(s)

- -,

i"- ........

-90

lG -120
~

bO

~ -150

----

I---...

/

Phase angle of

~~~

(approx.)

....... N

-;--,,1"I"--..

-.........

-180
0.1

0.2

....... i'-

0.4

0.7

1

4

2

7

10

-- -

20

r--

40

70 100

w~

FIG. 104.

Approximate closed loop response.

(73)    C(s)/R(s) = G(s)/[1 + G(s)H(s)] = 1/{[1/G(s)] + H(s)}.

When 1/G(s) is much smaller than H(s), that is, when G(s)H(s) is greater than 1,

1/G(s) + H(s) ≈ H(s)    and    C(s)/R(s) ≈ 1/H(s).

When H(s) is much smaller than 1/G(s), that is, when G(s)H(s) is less than 1,

1/G(s) + H(s) ≈ 1/G(s)    and    C(s)/R(s) ≈ G(s).

The approach is to approximate the closed loop response by the lowest
portions of G(s) and [H(s)]⁻¹. Assume all the breaks are simple and of
multiplicity one or more. The phase diagram is drawn assuming the
simple breaks are of minimum phase nature.
The approximation is worst in the region of ω where G(s) = [H(s)]⁻¹.
If necessary, the exact closed loop response can be obtained in this region
by using the preceding exact method and the Nichols charts.
EXAMPLE.

G(s) = 10/[s(0.5s + 1)],    H(s) = 0.33s + 1.

In Fig. 104 are plotted G(s) and [H(s)]⁻¹. The approximate closed loop
response is shown as the heavy line and is approximated by the equation

(74)    C(s)/R(s) ≈ 10(0.33s + 1)/[s(2s + 1)(0.09s + 1)].

The phase angle curve is that corresponding to eq. (74).
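The quality of the straight-line rule for this example can be checked numerically: the exact closed loop magnitude |G/(1 + GH)| is compared with the lower of |G| and |1/H| at a few frequencies. This is a minimal sketch in modern terms, not part of the original text; it uses only the G and H given in the example.

import math

def db(x):
    return 20.0 * math.log10(x)

for w in (0.1, 0.4, 1.0, 3.0, 7.0, 20.0, 70.0):
    s = complex(0.0, w)
    G = 10.0 / (s * (0.5 * s + 1.0))
    H = 0.33 * s + 1.0
    exact = abs(G / (1.0 + G * H))        # exact closed loop C/R, eq. (73)
    approx = min(abs(G), abs(1.0 / H))    # lower of |G| and |1/H|
    print(f"w = {w:5.1f}  exact {db(exact):7.2f} db  approx {db(approx):7.2f} db")

As the text notes, the agreement is worst near the frequency where |G| = |1/H| and very good elsewhere.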
Note. There are many ways to investigate the stability of linear closed
loop systems. If used properly, they should all yield the same result.

REFERENCES
1. E. J. Routh, Stability of a dynamical system with two independent motions, Proc.
London Math. Soc., ser. 1, 5, 97-99 (1874).
2. E. J. Routh, A Treatise on the Stability of a Given State of Motion, Cambridge University Press, Cambridge, England, 1877.
3. E. J. Routh, Advanced Part of a Treatise on Advanced Rigid Dynamics, 6th edition,
pp. 210-231, Cambridge University Press, Cambridge, England, 1930.
4. T. J. Higgins, Epitomization of the Basic Concepts Underlying the Theory of
"The Stability" of Servomechanisms, Advanced Servomechanisms and Automatic Control
Theory, Class Notes EE 216, University of Wisconsin, Ronald, New York, 1955.
5. E. A. Guillemin, The Mathematics of Circuit Analysis, Technology Press and Wiley,
New York, 1950.
6. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. 1, Wiley, New York, 1951.
7. H. W. Bode, Network Analysis and Feedback Amplifier Design, Van Nostrand,
Princeton, N. J., 1945.


8. H. W. Bode, Amplifiers, Patent 2,123,178 (1938).
9. N. Balabanian and W. R. LePage, What is a minimum phase network? Trans. Am.
Inst. Elec. Engrs., Pt. 1, No. 22, January 1956.
10. G. A. Biernson, Estimating transient responses from open-loop frequency response, Trans. Am. Inst. Elec. Engrs., 74, 388-402, Pt. 2, January 1956.
11. W. R. Evans, Graphical analysis of control systems, Trans. Am. Inst. Elec. Engrs.,
67, 547-551 (1948).
12. W. R. Evans, Control system synthesis by root locus method, Trans. Am. Inst.
Elec. Engrs., 69, Pt. 1, 67-69 (1950).
13. W. R. Evans, Control Systems Dynamics, McGraw-Hill, New York, 1954.
14. A. H. Harris, A Simple Instrument for Summing Angles in the Root Locus Method
of Solving Ordinary Equations and Stability Problems, University of California, UCRL-2269, Berkeley, July 10, 1953.
15. The Complex Plane Analyzer, The Technology Instrument Corporation, CPA
type 250-A, Acton, Mass.
16. J. G. Truxal, Automatic Feedback Control System Synthesis, McGraw-Hill, New
York, 1955.
17. A. Hurwitz, Ueber die Bedingungen, unter welchen eine Gleichung nur Wurzeln
mit negativen reellen Theilen besitzt, Math. Ann., 46, 273-284 (1895).
18. L. Orlando, Sul problema di Hurwitz, Rendiconti Accademia Lincei, ser. 5, 19,
801-805, Rome, 1910.
19. L. S. Dzung, The Stability Criterion, in Automatic and Manual Control, Butterworths, London, 1952 (Proceedings of the 1951 Cranfield Conference, pp. 13-23).
20. L. S. Dzung, Das Stabilitatskriterium nach Nyquist, Regelungstechnik, 1, 143-145
(1953).
21. H. S. Wall, Polynomials whose zeros have negative real parts, Am. Math. Monthly,
52, 308-322 (1945).
22. E. Sponder, On the representation of the stability region on oscillation problems
with the aid of Hurwitz determinants, NACA Technical Memorandum 1348, August
1952. A translation of E. Sponder, Zur Darstellung des Stabilitatsgebietes bei Schwingungsaufgaben mit Hilfe der Hurwitz-Determinanten, Schweiz. Arch., 16, 93-96 (1950).
23. J. F. Koenig, On the zeros of polynomials and the degree of stability of linear
systems, J. Appl. Phys., 24, 476-482 (1953).
24. T. J. Higgins and J. G. Levinthal, Stability limits for third-order servomechanisms,
Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 459-467 (1952).
25. J. F. Koenig, A relative damping criterion for linear systems, Trans. Am. Inst.
Elec. Engrs., 72, Pt. 2, 291-295 (1953).
26. A. Vazsonyi, A generalization of Nyquist's stability criteria, J. Appl. Phys., 20,
863-867 (1949).
27. A. Leonhard, Relative Damping as Criterion for Stability and as an Aid in Finding the Roots of a Hurwitz Polynomial, in Automatic and Manual Control, Butterworths,
London, 1952 (Proceedings of the 1951 Cranfield Conference, pp. 25-43).
28. E. Frank, On the zeros of polynomials with complex coefficients, Bull. Am. Math.
Soc., 52, 144-157 (1946).
29. H. Bilharz, Bemerkung zu einem Satze von Hurwitz, Z. angew. Math. u. Mech.,
24, 77-82 (1944).
30. A. Leonhard, Ueber Selbsterregung elektrischer Maschinen, Arch. Elektrotech., 40,
343-346 (1952).
31. G. S. Brown and D. P. Campbell, Principles of Servomechanisms, Wiley, New
York, 1948.


32. H. M. James, N. B. Nichols, and R. S. Phillips, Theory of Servomechanisms,
McGraw-Hill, New York, 1947.
33. C. L. Johnson, Analog Computer Techniques, McGraw-Hill, New York, 1956.
34. Robert Donahue, unpublished, M.I.T. Flight Control Laboratory.
35. General Electric Company, unpublished, Schenectady, New York.

E    FEEDBACK CONTROL

Chapter 22

Relation between Transient and Frequency Response

C. E. Bradford and M. W. DeMerit

1. Introduction  22-01
2. Response Characteristics Defined  22-02
3. Relation between Transient Response and Location of Roots of Characteristic Equation  22-03
4. Relation between Closed Loop and Open Loop Roots  22-15
5. Design Charts Relating Open Loop Frequency Response and Transient Response  22-18
6. Approximate Relations-Rules of Thumb  22-43
7. Numerical and Graphical Techniques of Relating Transient and Frequency Response  22-43
References  22-61

1. INTRODUCTION

The frequency response technique of analyzing servo systems is used to
facilitate both the analysis and synthesis operations (Chaps. 20 and 21).
Often it is desirable to transform the results of the frequency response
analysis into transient response form in order to interpret them more
readily. Conversely, it is often desirable to transform the transient
response performance requirements into frequency response form for
synthesis purposes. These operations can be performed exactly by rigorous
mathematical techniques; however, the operations are time consuming

and tedious, so it is often profitable to use less accurate but more easily applied
techniques. The purpose of this section is to present some of the more
useful techniques for relating the transient response to the frequency
response and the inverse relations between frequency and transient
response.
2. RESPONSE CHARACTERISTICS DEFINED
Transient Response. System response is often specified and interpreted in terms of the characteristics of the transient response to a step
input.

FIG. 1. (a) Representative system response to unit step input. (b) Representative
system frequency response (closed loop).

The parameters which are most often used to describe the transient
response are:
C/R|p, the peak value of the transient including any overshoot;
tp, the time to the first peak if the response is underdamped and thus
has an overshoot;
ts, the settling time, measured from the initiation of the step input to
the time at which the system output no longer deviates from its final
value by more than a certain percentage, often 5 or 2 per cent;
N, the number of oscillations it takes the system to "settle," that is, to reach ts.
Other parameters sometimes used to describe the transient response are:
td, the delay time, measured from the initiation of the step input to the
time at which the response has reached half the final value;
tr, the rise time, which is the difference between the time at which the
response has reached 10 per cent of the final value and the time at which
90 per cent of the final value is reached. Rise time is also sometimes defined
as the time from 5 to 95 per cent, and also as the reciprocal of the slope
at the instant the response is 50 per cent of the final value.
Figure 1a illustrates the definitions of these transient response parameters.
Frequency Response. System response is also often described in
terms of certain frequency response characteristics. Chief of these are:
Mm, the maximum amplitude ratio of the closed loop frequency response, which is sometimes designated C/R|m;
ωm, the frequency at which Mm occurs;
ωb, the bandpass frequency, which is generally defined as the frequency
at which the closed loop response is down 3 db from the nominal steady-state gain value. Figure 1b illustrates the definitions of these terms.
3. RELATION BETWEEN TRANSIENT RESPONSE AND LOCATION OF ROOTS
OF CHARACTERISTIC EQUATION

Mathematical Relation. The open loop frequency response may be
represented by the open loop response function in terms of its poles and
zeros, roots of the denominator and numerator respectively, of the forward and feedback portions of the control system,
(1)    G(s)H(s) = N1(s)N2(s)/[D1(s)D2(s)]
              = K(s + z11)(s + z12) ⋯ (s + z21)(s + z22) ⋯ / {s^m (s + p11)(s + p12) ⋯ (s + p21)(s + p22) ⋯}.

The closed loop frequency response function can be obtained as follows:

(2)    C(s)/R(s) = G(s)/[1 + G(s)H(s)].


The closed loop poles can be found by factoring [D1(s)D2(s) + N1(s)N2(s)].
Substituting the proper Laplace transform for R(s), C(s) may be represented by a sum of terms such as

(3)    C(s) = A1/s + A2/(s + a2) + A3/(s + a3) + ⋯,

where z1n are the roots of N1, z2n are the roots of N2, p1n are the roots of
D1, p2n are the roots of D2, and where the constants A1, A2, A3, etc., are
found by partial fraction expansion.
The time response function is then found by performing the inverse
Laplace transformation to get

(4)    c(t) = A1 + A2 exp (−a2t) + A3 exp (−a3t) + ⋯.

In general, this function must be plotted to determine such parameters
as peak overshoot and settling time which are often of prime importance.
The straight mathematical approach is impractical for any but simple
systems because of the amount of tedious work involved and the fact that
it does not lend itself to system synthesis.
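For a specific low order case, however, the straight mathematical approach is easily mechanized. The sketch below is a modern illustration, not part of the original text; the transform C(s) = 10/[s(s + 2)(s + 5)] and its poles are assumed for the example. It computes the residues of eq. (3) and sums eq. (4).

import math

K = 10.0                      # C(s) = K/[s(s + 2)(s + 5)] for a unit step
poles = [0.0, -2.0, -5.0]     # assumed poles of C(s), all distinct

def residue(p):
    """A_k = (s - p)C(s) evaluated at s = p, as in eq. (3)."""
    prod = 1.0
    for q in poles:
        if q != p:
            prod *= (p - q)
    return K / prod

A = {p: residue(p) for p in poles}
for t in (0.0, 0.5, 1.0, 2.0, 4.0):
    c = sum(A[p] * math.exp(p * t) for p in poles)   # eq. (4)
    print(f"t = {t:4.1f}   c(t) = {c:6.3f}")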
Approximate Approach

The time response can be estimated quite accurately by noting the
location of certain predominant closed loop poles and zeros in the complex
frequency plane (s-plane). The closed loop pole-zero configuration may
consist of one or more pairs of complex poles and several real axis poles
and zeros, and perhaps complex zeros. Ordinarily, one pair of complex
poles will be of primary importance because of its frequency or damping ratio. For example, if a system contains two pairs of complex poles
which have natural frequencies that differ by as much as 10 to 1, the
designer may ordinarily consider either pair as dominant and perform an
analysis in two parts, considering first one pair and then the other.
For many cases it is reasonably accurate to neglect all but one complex pole
pair and to consider the transient response to be made up of the dominant
pair of complex poles and various groupings of real axis poles and zeros.
This assumption is made in the following discussion.
Only underdamped systems are to be considered here. Overdamped
systems may generally be analyzed quite easily by the normal mathematical techniques.
Dominant Pair of Complex Poles. To determine the effect of the
s-plane pole-zero configuration in the system transient response, it is
convenient to first consider the relation between a single pair of complex
poles on the s-plane and the characterizing parameters of the time response.


The additional effect of the real axis poles and zeros will be considered
later.
The expression for the closed loop frequency response function containing one pair of complex roots is

(5)    C(s)/R(s) = G(s)/[1 + G(s)] = ωo²/(s² + 2ζωo s + ωo²),

where ωo = natural frequency,
ζ = damping ratio,
σo = ζωo = damping exponent,
ωd = ωo√(1 − ζ²) = natural damped frequency, or oscillation frequency.

The parameters ωo, ζ, σo, and ωd are shown on the s-plane in Fig. 2.

FIG. 2. One pair of complex roots and significant related parameters; ζ = cos ψ1.

Figure 3 illustrates that for constant values of natural frequency, ωo, the
complex roots or poles of eq. (5) generate circles on the s-plane as the
damping ratio ζ is varied. Radial lines from the origin are generated by
holding ζ constant and varying ωo.
Figure 4 illustrates that holding σo, the exponential damping factor,
constant forms lines parallel to the imaginary (jω) axis on the s-plane.
Similarly, maintaining constant values for the damped frequency, ωd, forms
lines parallel to the real (σ) axis.


The expression for the time response to a unit step input is

(6)    c(t) = 1 + (ωo/ωd) exp (−σo t) sin (ωd t − ψ1),

where ψ1 = arctan [ωd/(−σo)] = arctan [√(1 − ζ²)/(−ζ)].
Figure 3 also illustrates that constant values of ψ1 correspond to constant
values of ζ.

FIG. 3. Constant ωo is a circle; constant ζ is a radial line.

FIG. 4. Illustrating lines of constant ωd and σo.

From eq. (6) the characterizing parameters of the transient
response can be derived. The equations for the more important ones are:

(7)    tp = π/ωd = π/(ωo√(1 − ζ²)), the time required to reach the first peak.

(8)    ts = 3/σo = 3/(ζωo), the time required to settle to within 5% of the final value (= 4/(ζωo) for 2%).

(9)    C/R|p = 1 + exp (−πζ/√(1 − ζ²)) = 1 + exp (−πσo/ωd), the peak value of the ratio of output to input.

(10)    N = ts/(2π/ωd) = 3√(1 − ζ²)/(2πζ), the number of oscillations to settle to within 5% of the final value.

Equations (7) through (10) relate the position of the dominant pair of
complex poles on the s-plane to certain transient response parameters.
FIG. 5. C/R|p, N, ωo tp, ωo ts versus ζ for a system composed of two complex poles.
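Equations (7) to (10) can also be evaluated directly instead of read from Fig. 5, as in this sketch (a modern illustration, not part of the original text; ωo = 1 is assumed so that ωo tp and ωo ts are produced directly).

import math

wo = 1.0                                    # natural frequency (rad/sec)
for z in (0.2, 0.4, 0.6, 0.8):              # damping ratio zeta
    wd = wo * math.sqrt(1.0 - z * z)        # damped frequency
    tp = math.pi / wd                       # eq. (7), time to first peak
    ts = 3.0 / (z * wo)                     # eq. (8), 5% settling time
    peak = 1.0 + math.exp(-math.pi * z / math.sqrt(1.0 - z * z))  # eq. (9)
    N = 3.0 * math.sqrt(1.0 - z * z) / (2.0 * math.pi * z)        # eq. (10)
    print(f"zeta {z:.1f}: wo*tp {wo*tp:5.2f}  wo*ts {wo*ts:5.2f}  "
          f"C/R|p {peak:5.3f}  N {N:4.2f}")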


The relationships between these parameters and the damping ratio, ζ,
are plotted in Fig. 5.
Similarly, certain closed loop frequency response parameters can be
related to the position of the poles.
(11)    Mm = 1/(2ζ√(1 − ζ²)),  0 < ζ < 0.707;  Mm = 1,  0.707 ≤ ζ ≤ 1.  The maximum frequency response ratio of output to input (also designated C/R|m).
FIG. 6. Mm, Mc, ωb/ωo versus ζ for a system composed of two complex poles.


(12)    Mc = 1/(2ζ), the response ratio at the frequency corresponding to the natural frequency, or corner frequency.

(13)    ωb = ωo √(1 − 2ζ² + √(2 − 4ζ² + 4ζ⁴)), the bandpass frequency, at which the response ratio is 0.707.

Figure 6 shows graphically the relationship between these parameters and
damping ratio.
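Equations (11) to (13) are equally direct; the following sketch (a modern illustration, not part of the original text) reproduces the quantities plotted in Fig. 6.

import math

for z in (0.2, 0.4, 0.6):                         # damping ratio zeta
    Mm = 1.0 / (2.0 * z * math.sqrt(1.0 - z * z)) if z < 0.707 else 1.0  # eq. (11)
    Mc = 1.0 / (2.0 * z)                          # eq. (12)
    wb_over_wo = math.sqrt(1.0 - 2.0 * z * z
                           + math.sqrt(2.0 - 4.0 * z * z + 4.0 * z ** 4))  # eq. (13)
    print(f"zeta {z:.1f}: Mm {Mm:5.2f}  Mc {Mc:5.2f}  wb/wo {wb_over_wo:5.2f}")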
Effect of Real Axis Poles and Zeros. The mathematical expression
for a system whose dynamic characteristics can be described by one pair
of complex roots, one real pole, and one real zero is

(14)    C(s)/R(s) = p ωo²(s + z)/{z(s + p)[(s + σo)² + ωd²]}.

The expression for the transient response to a unit step input may be
written as

(15)    c(t) = 1 − [ωo²(z − p)/(z(ppd)²)] exp (−pt)
             + (zpd/z)(p/ppd)(ωo/ωd) exp (−σo t) sin (ωd t − ψ1 + ψ3 − ψ4),

where ppd = distance from p to pd,
zpd = distance from z to pd.

A graphical representation of this system is contained in Fig. 7.

FIG. 7. Illustrating a system with one pair of complex poles, one real pole, and one real
zero.


The term [ωo²(z − p)/(z(ppd)²)] exp (−pt) in eq. (15) is neglected in
determining the following expressions for the characteristic parameters
of the transient response. This approximation is valid when p is much
larger than σo (at least 3 times as large). The expressions for C/R|p, tp,
and ts are:

(16)    C/R|p = 1 + (zpd/z)(p/ppd) exp [−σo(π − ψ3 + ψ4)/ωd],

(17)    tp = (π − ψ3 + ψ4)/ωd,

(18)    ts = 3/(ζωo).

The settling time, ts, remains the same as before since the exponential
pole term in eq. (15) has been neglected. Thus, N also remains unchanged.
For multiple poles and zeros the expressions for C/R|p and tp in eqs. (16)
and (17) become:
(19)    C/R|p = 1 + (Πq pq/pqpd)(Πq zqpd/zq) exp [−σo(π − Σψ3 + Σψ4)/ωd],

(20)    tp = (π − Σψ3 + Σψ4)/ωd,

where Σψ3 = sum of all angles between the real axis zeros and the dominant pole,
Σψ4 = sum of all angles between the real axis poles and the dominant pole,
Πq pq/pqpd = product of the ratios of the poles to the distances from the
poles to the complex pole at point pd,
Πq zqpd/zq = product of the ratios of the distances from the zeros to point pd,
to the zeros.

As noted before, eqs. (19) and (20) are approximate, based on the assumption that p is large compared to σo, which is realistic for many practical
systems. Conclusions reached from a study of eqs. (19) and (20) are:
1. The time to peak, tp, is inversely proportional to ωd, the damped
natural frequency.
2. The addition of a pole increases tp and decreases C/R|p, the magnitude of the peak.
3. The addition of a zero decreases tp and increases C/R|p.
4. If a pole and zero are close together (dipole), their net effect on tp
and C/R|p is negligible.


5. Poles and zeros far out on the real axis have little effect on tp and
C/R|p.
If the assumption of the real poles and zeros being large compared with
the damping factor (p ≫ σo) is not valid, the values of C/R|p and tp can
still be estimated, though not by the simple use of eqs. (19) and (20).
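The products and angle sums in eqs. (19) and (20) are conveniently formed with complex arithmetic, as in this sketch (a modern illustration, not part of the original text; the dominant pole and the real axis pole and zero locations are assumed).

import cmath, math

sigma0, wd = 1.0, 2.0                 # dominant pole pd = -sigma0 + j*wd
pd = complex(-sigma0, wd)
real_zeros = [4.0]                    # assumed zero at s = -4
real_poles = [6.0]                    # assumed pole at s = -6 (p >> sigma0)

factor, sum_psi3, sum_psi4 = 1.0, 0.0, 0.0
for z in real_zeros:                  # zero vectors: distance/zero, angle psi3
    v = pd - (-z)
    factor *= abs(v) / z
    sum_psi3 += cmath.phase(v)
for p in real_poles:                  # pole vectors: pole/distance, angle psi4
    v = pd - (-p)
    factor *= p / abs(v)
    sum_psi4 += cmath.phase(v)

tp = (math.pi - sum_psi3 + sum_psi4) / wd                           # eq. (20)
peak = 1.0 + factor * math.exp(-sigma0 * (math.pi - sum_psi3 + sum_psi4) / wd)  # eq. (19)
print(f"tp = {tp:5.3f} sec,  C/R|p = {peak:5.3f}")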

FIG. 8. Illustrating the effect of a significant real pole on C/R|p approximations.
(The correction for the A exp (−pt) term is applied to the approximate curve.)

The magnitude of the coefficients of the exponential terms such as the one
in eq. (15) can be calculated, as outlined in the next section, and then the
effect of these terms on C/R|p and tp can be estimated. By referring to
Fig. 8 as an example, it is apparent that the value of the simple exponential
term at time tp must be subtracted from the approximate curve to give
a more exact value of C/R|p. In other cases this correction might have
to be added. Of course this process becomes more difficult as the number
of significant poles and zeros increases.

FIG. 9. Response with a dominant real pole and a pair of complex poles.

The addition of poles and zeros to the closed loop response function of a
control system may result in a response function whose dominant characteristic is that of the real pole rather than the complex pair. For such a
case the system response to a unit step input might be as illustrated in
Fig. 9.
Coefficients of Transient Response Terms

Frequently, it is desired, as soon as the closed loop poles are found by
graphical or other means, to determine the exact expression for the time
response. This may be especially true when it is obvious that two pairs
of complex poles are significant, i.e., they are both located about the same
distance from the origin.
The coefficients of the terms in the equation describing the transient
response of the system may be calculated by a formula developed in Laplace
transform theory. This formula may be interpreted in terms of the pole-zero configuration of the root locus plot for the system.
To illustrate this, a system described by the following equation is assumed.

(21)    C(s)/R(s) = K(s + z)/{(s + p1)[(s + σo)² + ωd²]}.

If the appropriate Laplace transform for R(s) is substituted into eq. (21),
the expression for C(s) can be written. Assuming a unit step input in
this case (R(s) = 1/s), then

(22)    C(s) = K(s + z)/{s(s + p1)[(s + σo)² + ωd²]}.

In terms of partial fractions eq. (22) can be written as (see Chap. 20)

(23)    C(s) = K[A0/s + A1/(s + p1) + A2/(s + σo + jωd) + A3/(s + σo − jωd)].
The transient response equation for this system is

(24)    c(t) = K{A0 + A1 exp (−p1t) + A2 exp [−(σo + jωd)t] + A3 exp [−(σo − jωd)t]}.

Determining the Coefficients. The formulas for the coefficients A0,
A1, A2, and A3 are:

(25)    A0 = z/[p1(σo² + ωd²)],

(26)    A1 = (z − p1)/{−p1[(σo − p1)² + ωd²]}.

(27)    A2 = [s + (σo + jωd)]C(s)|s=−(σo+jωd)
           = [z − (σo + jωd)]/{2jωd(σo + jωd)[p1 − (σo + jωd)]},

(28)    A3 = [s + (σo − jωd)]C(s)|s=−(σo−jωd)
           = [z − (σo − jωd)]/{−2jωd(σo − jωd)[p1 − (σo − jωd)]}.
Note that these coefficients are the ratios of vectors in the root locus
plot. For example, consider Fig. 10, which illustrates the pole-zero configuration of the system under consideration.

FIG. 10. Vectors for determining the A1 coefficient.

With this plot the value of
any coefficient can be determined by drawing vectors from all other poles
and zeros to the pole or root which corresponds to the coefficient. The
coefficient is then the ratio of the vectors from the zeros to those from the
poles. Figure 10 shows the vectors for the calculation of A1. From this,
(29)    A1 = (−p1 + z)/{−p1[(−p1 + σo) + jωd][(−p1 + σo) − jωd]}
           = (z − p1)/{−p1[(σo − p1)² + ωd²]},

which agrees with eq. (26).


Similarly, the other coefficients may be determined from the root locus
plot. Figure 11 shows the same system with the vectors oriented for determination of the A3 coefficient for one of the complex roots. From this,

(30)    A3 = λ3/(λ1λ2λ4) = (|λ3| ∠ψ3)/[(|λ1| ∠ψ1)(|λ2| ∠ψ2)(|λ4| ∠ψ4)]
           = (|A3|/j) ∠(ψ3 − ψ1 − ψ4),

since ψ2 = +90°, that is, λ2 = j|λ2|.

FIG. 11. Vectors for determining the A3 coefficient.

In like manner the A2 coefficient can be determined as

(31)    A2 = (|λ3| ∠−ψ3)/[(|λ1| ∠−ψ1)(|λ2| ∠−ψ2)(|λ4| ∠−ψ4)]
           = −(|A3|/j) ∠−(ψ3 − ψ1 − ψ4).

The λ1, λ2, λ3, and λ4 vectors can be expressed in terms of the pole-zero
locations in the root locus plot. When this is done, eqs. (25) to (28) are
the results. These coefficients can be evaluated conveniently by use of
the Spirule, although a ruler and protractor will suffice (Ref. 1).
The time response of a system such as the one being considered here is
usually expressed in equation form with a sine or cosine term instead of
the complex exponents. This is illustrated in the following equations.


(32)    c(t) = A0 + A1 exp (−p1t) + A2 exp [−(σo + jωd)t] + A3 exp [−(σo − jωd)t].

From eqs. (30) and (31) the coefficients A2 and A3 may be expressed as

(33)    A3 = (|A3|/j) ∠A3,

(34)    A2 = −(|A3|/j) ∠−A3.

Combining these equations with eq. (32) yields

(35)    c(t) = A0 + A1 exp (−p1t)
             + exp (−σo t)[(|A3|/j) exp (jωd t + j∠A3) − (|A3|/j) exp (−jωd t − j∠A3)]
             = A0 + A1 exp (−p1t) + 2|A3| exp (−σo t) sin (ωd t + ∠A3),

where

|A3| = |λ3|/(|λ1||λ2||λ4|),    ∠A3 = ψ3 − ψ1 − ψ4,

and these vector lengths and angles are shown in Fig. 11.
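The vector evaluation of the coefficients is also easy to mechanize. The sketch below is a modern illustration with assumed numerical values (K = 1, z = 3, p1 = 5, σo = 1, ωd = 2), not part of the original text; it computes A0 and A1 from eqs. (25) and (26), the complex residue from eq. (28), and sums the response in a cosine form equivalent to eq. (35).

import cmath, math

# Assumed values for C(s)/R(s) = K(s + z)/{(s + p1)[(s + s0)^2 + wd^2]}.
K, z, p1 = 1.0, 3.0, 5.0
s0, wd = 1.0, 2.0
pd = complex(-s0, wd)                        # upper complex pole, -s0 + j*wd

A0 = z / (p1 * (s0 ** 2 + wd ** 2))                  # eq. (25)
A1 = (z - p1) / (-p1 * ((s0 - p1) ** 2 + wd ** 2))   # eq. (26)
A3 = (z + pd) / (2j * wd * pd * (p1 + pd))           # eq. (28), residue at pd

# The complex pair contributes 2|A3| exp(-s0 t) cos(wd t + arg A3), which
# is the sine form of eq. (35) with the angle shifted by 90 degrees.
for t in (0.0, 0.5, 1.0, 2.0):
    c = K * (A0 + A1 * math.exp(-p1 * t)
             + 2.0 * abs(A3) * math.exp(-s0 * t)
             * math.cos(wd * t + cmath.phase(A3)))
    print(f"t = {t:3.1f}   c(t) = {c:7.4f}")

At t = 0 the three terms cancel, c(0) = 0, which is a convenient check on the residue arithmetic.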
4. RELATION BETWEEN CLOSED LOOP AND OPEN LOOP ROOTS

Mathematical Relationship. The closed loop frequency response
function may be readily written in terms of the open loop function as

(36)    C(s)/R(s) = G(s)/[1 + G(s)H(s)],

where G(s) = open loop transfer function of forward element,
H(s) = feedback element.
If these are written as

(37)    G(s) = N1(s)/D1(s),    H(s) = N2(s)/D2(s),

then

(38)    C(s)/R(s) = [N1(s)/D1(s)] / [1 + N1(s)N2(s)/(D1(s)D2(s))].
The closed loop poles are thus the roots of the denominator of eq. (38).
This is generally a high order polynomial, and factoring it is a tedious
task. For this reason methods of estimating closed loop
roots from open loop roots are useful.


Graphical Method of Determining Roots. A convenient way to
obtain the closed loop poles {roots of [N1(s)N2(s) + D1(s)D2(s)]} is to
use the root locus technique of graphically plotting loci of the closed loop
poles as functions of the open loop system gain (see Chap. 21). These
loci may be plotted from the open loop pole-zero configurations on the
complex frequency plane as shown in Chap. 21. The plotting is simplified
considerably by use of the Spirule plotting tool.
The selection of the open loop gain to give the proper system response
may be determined either by use of frequency response or root locus
synthesis techniques. In either case the configuration may be examined
for its transient response characteristics.
A method of performing the reverse operation, working from closed
loop to open loop roots, is presented in Chap. 23.
An Iterative Process of Determining Closed Loop Roots. The
closed loop poles {roots of [N1(s)N2(s) + D1(s)D2(s)]} can be found
more easily by mathematical techniques if known roots can be factored
out to leave a simpler polynomial. A technique is available (Ref. 2)
which allows closed loop poles to be found after the open loop response
characteristics of the system, including the open loop gain, have been determined.
The rules for determining closed loop poles by this method are the following:
1. An open loop zero located at a frequency lower than the crossover
frequency, ωc, is approximately equal to a closed loop pole.
2. An open loop pole located at a frequency higher than the crossover
frequency, ωc, is approximately equal to a closed loop pole.
These rules are used to make the first approximations for the closed
loop poles. These approximate values may be refined by iteration using
the following expressions:
For open loop zeros much smaller than ωc,

(39)    (pi + z1)ⁿ ≈ −[D1(s)D2(s)(s + z1)ⁿ/(N1(s)N2(s))] evaluated at s = p(i−1),

where pi = closed loop pole,
z1 = open loop zero much smaller than ωc,
p(i−1) = value of pi found by the previous iteration (equals −z1 for the first
iteration),
n = order of the open loop zero.

For open loop poles much larger than ωc,

(40)    (pi + P1)ⁿ ≈ −[N1(s)N2(s)(s + P1)ⁿ/(D1(s)D2(s))] evaluated at s = p(i−1),

where pi = closed loop pole,
P1 = open loop pole much larger than ωc,
p(i−1) = value of pi found by the previous iteration (equals −P1 for the first
iteration),
n = order of the open loop pole.

If n is greater than unity, n values for pi will result with each iteration.
Further refinement should continue only on those values of pi remaining
far from ωc. If the value of pi approaches ωc, the accuracy of the technique
is poor.
After these closed loop poles are found with sufficient accuracy by iteration, they may be factored from the closed loop polynomial characteristic
equation for the system. The resulting lower order polynomial may then
be more easily factored.
In general, if the open loop poles and zeros are larger or smaller than
ωc by a ratio of 3 to 1 or greater, two iterations will result in sufficient
accuracy for finding the first closed loop poles.
The coefficients of these transient response terms may be calculated as
indicated in the previous section.
EXAMPLE. Determining Roots. Assume the open loop transfer function

(41)    G(s)H(s) = 400(s + 1)/[s(s + 2)(s + 10)²].

As previously stated, open loop zeros less than ωc are approximate closed
loop poles. The crossover frequency, ωc, is 4 rad per second, as may be
easily determined from a graphical plot of eq. (41). Therefore,

(42)    p(i−1) = −1.0.

To refine this approximation,

(43)    (pi + 1) ≈ −[s(s + 2)(s + 10)²/400] at s = −1
             ≈ −[(−1)(−1 + 2)(−1 + 10)²]/400 ≈ 0.20,

(44)    pi ≈ −1 + 0.20 = −0.80.

This may be repeated,

(45)    (pi + 1) ≈ −[(−0.8)(−0.8 + 2)(−0.8 + 10)²]/400 ≈ 0.20,

(46)    pi ≈ −0.80.


Similarly, open loop poles larger than ωc are approximate closed loop
poles, so

(47)    pi = −10, −10,

(48)    (pi + 10)² ≈ −[400(s + 1)/(s(s + 2))] at s = −10 ≈ +40,

(49)    pi = −3.7, −16.3.

Since ωc is 4, only the larger root can be expected to be useful. With it,
repeating the process gives

(50)    (pi + 10)² = −400(−16.3 + 1)/[−16.3(−16.3 + 2)] = 26.3,

(51)    pi = −4.9, −15.1.

Again,

(52)    (pi + 10)² = −400(−15.1 + 1)/[−15.1(−15.1 + 2)],

(53)    pi = −15.35.

The two closed loop poles thus determined are s = −0.80, −15.35, and
hence they may be factored out of the expression

(54)    N1(s)N2(s) + D1(s)D2(s) = 400(s + 1) + s(s + 2)(s + 10)²
                                = s⁴ + 22s³ + 140s² + 600s + 400

to give the closed loop poles near ωc as

(55)    s² + 5.85s + 33.3 = (s + 2.93 − j4.97)(s + 2.93 + j4.97).

Thus, the closed loop poles are

(56)    s = −0.80, −15.35, (−2.93 + j4.97), (−2.93 − j4.97).

In this example if H(s) is other than unity these roots are not the roots of
C(s)/R(s), but are the roots of H(s)[C(s)/R(s)].
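The iteration of eqs. (39) and (40) for this example can be mechanized directly, as in the following sketch (a modern illustration, not part of the original text); it reproduces the poles near −0.80 and −15.3 found above.

import math

def N(s):  return 400.0 * (s + 1.0)                 # N1(s)N2(s) of eq. (41)
def D(s):  return s * (s + 2.0) * (s + 10.0) ** 2   # D1(s)D2(s) of eq. (41)

# Zero at -1, below wc = 4: eq. (39) with n = 1.
p = -1.0
for _ in range(3):
    p = -1.0 - D(p) / 400.0             # (pi + 1) = -D(s)/400 at s = p(i-1)
print(f"pole near the zero: {p:7.3f}")  # converges to about -0.80

# Double pole at -10, above wc: eq. (40) with n = 2; the (s + 10)^2
# factor cancels, leaving (pi + 10)^2 = -400(s + 1)/[s(s + 2)].
p = -10.0
for _ in range(4):
    sq = -N(p) / (p * (p + 2.0))
    p = -10.0 - math.sqrt(sq)           # keep the root far from wc
print(f"pole near -10:      {p:7.3f}")  # converges to about -15.3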
5. DESIGN CHARTS RELATING OPEN LOOP FREQUENCY RESPONSE AND
TRANSIENT RESPONSE

An approximate method of relating steady-state frequency response
characteristics and transient response characteristics has been described
(Ref. 3). It makes use of a series of charts which indicate the type of open
loop attenuation curves required to produce desired closed-loop responses.


If a servo system falls within the group considered in the charts, this
method enables the designer to take a set of specifications setting forth
steady-state frequency and/or transient response requirements and quickly
estimate the necessary open loop characteristics. The charts also permit
the designer to estimate the effect of changing various system parameters
to give him a better understanding of the system.
Description of Charts. The symbols used on the charts are defined
below and illustrated in Fig. 12.

C/R|m, the maximum ratio of the closed loop frequency response (Mm);
C/R|p, the peak value of the ratio of controlled variable to input for a step
function input;
ωm/ωc, the ratio of the frequency ωm at which C/R|m occurs to the frequency
ωc at which the straight-line approximation of the open loop response is 0 db;
ωt/ωc, the ratio of ωt, the lowest frequency of oscillation for a step input, to
the frequency ωc at which the straight-line approximation of the
open loop response is 0 db;
ωctp, the frequency ωc at which the straight-line approximation of the
open loop response is 0 db, times the response time tp measured
from the start of the step function until C/R|p occurs;
ωcts, the frequency ωc at which the straight-line approximation of the
open loop response is 0 db, times the settling time ts measured from the
start of the step function until the output ceases to differ from
the input by more than 5 per cent.
Indicated in Figs. 12a, b, and c are these various characteristics in terms
of the familiar curves of the open loop transfer function, the frequency
response, and the transient response to a step input. The charts, Figs.
13 to 30, were prepared for a system with an initial open loop attenuation,
Fig. 12a, of 20 db per decade. However, the shape of the curve near 0
db is of greatest importance, so the curves may also be used for systems
with initial attenuation slopes of 0 to 40 db per decade.
Limitations. Of necessity the charts may be used for the analysis and
synthesis of a somewhat restricted class of servomechanisms. Their use
is restricted to:
(a) Linear systems, or those which may be considered linear for a restricted range of operation.


FIG. 12. Sketches showing nomenclature used in the design charts of Figs. 13 to 30:
(a) open loop transfer function G; (b) steady-state frequency response; (c) transient
response following a unit step function input.

(b) Single loop, unity feedback systems containing only series elements;
of course a multiloop system may be considered if the inner loops are
reduced to equivalent series elements.
(c) Systems whose open loop characteristics fall into the category of
servomechanisms described by Fig. 12. However, systems which ostensibly are not of this type may often be approximated by some which
are, especially if the required approximations occur at an appreciable
distance from the crossover frequency, We.
(d) A step function as the form of the input signal producing the transient response.

TRANSIENT AND FREQUENCY RESPONSE

I

2.4

I

~

/

1.8

J

C,)

1.6

1.4

/

v

/

/

J

,y

1.2

I

I

1I

I

lL

I

J 1
L ;
!
I)
II
/ i
)
II / /
/1/ /1/ IL

/

LV
'/
III = 40

I

l..-

II

~L

'r1

I

Ii

1- t-- I-~~

!
-L
1-

_-f/
.'

:;::.

0.4
0.2
0

w3

I

11
]I

........

- - ._--

--- -...

III = 80

III = 60

0.01

-

= 1

-/

.

'-I-

III = 40

WI

'-

,!

1

1

1

i
/

I

I..,I,<'L
/

- ~'

--1
1/

v'
L
~

----

Wetp - - wets - - - _

WI to W2 = 40 db/decade _
W3 to 00 = 40 db/decade

,

-...:.....- ~

III = 30

0.1

/

Wt

we

---- /A II~
We

i/

~

i
J
-i- ~ "1-r-- h~~
."
-1/
j

I .....

---

We I

I

;

1/

0

:J

%Ip

v

J

II

-:::.-... 1.0

Q)

Rm

0.1

!

0
......

0.6

ci ------

~ WI to W2 = 40 db/decade
/.
w3 to 00 = 40 db/decade

~~
III = 30

J

0.8

i

I

III = 20

1.2

3

I

I

0.01

1.4

!

I

III = 60

1.0

~
>.
0
c:

I

V./

III = 80

I

I

J/

/ /

i

!

I L
/ L
/ /

1/

1

I

I

J

J1

f

I

2.0

I

J

I
II

2.2

22-21

W3
-=1
We

t-

III = 20

1

We

FIG. 13. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

22-22

FEEDBACK CONTROL

WI to W2 = 40 db/d~cade I
wa to 00 = 40 db/decadewa
- =2

2.4
2.2

~

I

,

I
I

We

£1R -----

J,

m'

1.8

I

i

i
! 1/
/ /
/

1.6
I

1.4

pY"I

1I

1I

V

1/

/

f.'l = 80

}

1

I

If

I
J
// !
/L '1--,-

IJ

V

I

~
tL
~~l = 30 .,d

~
I~~

III = 60

1.0

1

I}

~'

I.?

1

I
I

I

III ="'40

II

.. k'

0.01

~""f.'l=20

0.1
WI to ~2 ~ 40 db/d~cade .
w3 to 00 = 40 db/decade -

1.4

~ =2

We

1.2

"-

"'-

\

.Q

e 0.6

/

~
c:

Q)

g. 0.4

Q)

&t
0.2

o

./'
1

I

'I
\. ...

Ii

III = 80

~ ~ L1
JL 1-~
f'. 1-'1 ,V/1
/
~I i
1'""['.

1

\/ /

I

'

\. .

//

/

-

/

J

V

)

V-"",,-

r-

III = 60

--

-

I

V
V
11~

1-,...,.
0.8

_

~Ip --- -

j

;

Q

1.2

I
II

;

,/

2.0

-

I

I

VI
/'~'I

"

I

I

,

j. '-j

"-

wets - - - -

I

L .

"'-J

If

//
W---

III = 40 f.'l = 30,

wetp - - -

I

1

.../

'J~
I

,/

i

/

~
We
1 Wm _____
We

--"'-1'"

III = 20

II
0.01

0.1

FIG. 14. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig., 12 for definition of nomenclature (Ref. 3).

TRANSIENT AND FREQUENCY RESPONSE

22-23

2.4
2.2
2.0

~

WI to W2 - 40 db/decade
w3 to CIO = 40 db/decade
w3

1.8

l>

~Im ------ ~
~Ip

I
I

1.6

I

/

,V

1.4
I

I

/

I

//

I

I

I

.// /

/V

1.2
1.0

We = 4

I

JI~

~~V
III = 4~ ~

./

III = 80

1~=30

III =.60

" //
I

V

I

V

~~

0.01

I

./

/v

/'

~1=20
0.1

1.4

-

WI to W2 = 40 db/decade
w3 to CIO = 40 db/decade -

1.2

W3
- = 4
We

s

....... 1.0
3
0.8

-"

Ij

--

/

,1

\.

\

I

'~
I

I

I

/

0.2

./

I'

/

III = 80

o

I

11

r-t/

I--

\/1 ,/
/['1 1/

-:'1-. .... /

.... ~t'--

-

l,'p'

I~~

.....

/ .......... /

'J.~'

\

I

~: N
./'

,1
il

J
l7

/1\1\/ 1/I/J

/ '1--:// I "

Wt

We
Wm

We

-- --

---- -

wetp - - wets - - - -

b<~J 'J-..
/,.- t<;'r- I~ ..........
I

J,

III = 40 III = 3o IIlI = 20

III = 60

1
0.01

1

j

I

..........

;

II

J

-

II
~

I
0.1

We

FIG. 15. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

22-24

FEEDBACK CONTROL

2.4
2.2

Wl to w2 = 40 db/decade
w3 to 00 = 40 db/decade

~

2.0

1.8

il

Q

1.6
I

1.4

I

I

1- J.LI = 80

I

//

f

9V'

I

I~

J.Ll = 40

V

~V

I

//

1~=30. i-"~1=2~"'"
0.1

i-""'J.Ll = 60

1.0

0.01

/'

I

1.4

WI to W2 = 40 db/decade
w3 to 00 = 40 db / decade

1.2

--~ ----We
Wt
We

W3
=8
We

-

o
.....

wetp---

~ 1.0

wets - - -

3

\

.Q

g.

0.4

\,'l

,

£

I "

""

0.2

o

-J.LI = 80

I

II

1

["

/1

1~'1

..... ,

I

I

11

l

0.01

=

,!

t

7" "

I

I . . . . , ~/

"r-. ,,"

i

L
II II

It.

/'\ Ii

1/

.....

II

-" \

'''\

0.8

~
cQ)

-

---

I

/

V

I~

1.2

p

I

I

~ 0.6

-

~Im -----

~

=8

60

1

~

J
11

~
\

,~

..

'/~

in "

/r'~

,

~

~

1_',

j(i,

~,/1'~!
~b-L

IL~

~ ~~~~

':iL

1"

// ~

th-)<-

",I'

J.Ll = 41 J.Ll = 30 I

-'

/
lit
1
l
L
1 .......
J.Ll = 20

I

0.1

FIG. 16. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

TRANSIENT AND FREQUENCY RESPONSE

22-25

2.4
2.2

Wl to
W3

to

2.0

~

40 db/decade

W2 -

00 = 40 db / decade
W3
w.; = 00

~Im

1.8

-

----

-

~Ip ---

c..>

1.6
L'

1.4

I

~~
I .........ILl

I

ILl

= 401"

\.//

II/V"

1.2
1.0

/

= 80

I-""'"ILl

/1 ILl

/

~

~~~

/'

= 60

-:;::::'"

0.01

.......

= 30
~

~IL1=20

=r- 1:::::"10-"
0.1

~

We

1.4
1.2
0
.....

~

<>

~

\

0.4
0.2
0

1\

"'- /1
/'1',/
I
ILl = 80
I

-----

v'

I

if,

"'-

'-...., 1/
/'r',
I'

/

~l

/ ~I~r---\

:\

'/..,

I

y.~

\~,

I

I

1

/ f--..J

i 1', I,/-Ii r--.~ // h lL
/
lb,
:
" 1 1".1 /

""J

/

L

= 60 {

0.01

/

!'

"\

. . .l /

If

"-

-

I

\

Iff.

"-

--

wets _ _ _

V~"

\

0.6

QJ

~

We

1-

s:::

J:

~
Wetp

0.8

>u

Wt
We

1.0

3

:8

II

= 40 db/decade
00 = 40 db/decade
w3
-We =00

WI to W2
w3 to

~

V

/

r--[",,-_/l'
II

l-

/-II

~'

= 4DJ i/-II = 30
~

~

II

I
/
/-II

1

= 2~

0.1

FIG. 17. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

22-26

FEEDBACK CONTROL

2.4

W~ to ~2 ~
t-

W3

to

CX)

= 40 db / decade
W3

I-

-=1

We

2.2

t-

l-

2.0

-

t-

I

I
I

/

II

i

Q

/

1.6

/

1.4

/

/

/ I

I

~ 1.8

I

J

/

/

!

/

J

/ 11

~

1/

I

I

/

IV

J..Ll = 60

j

,/

"

V/
V/
V

,
If

!J

If

_L

II,

/

/'
/1

/

/

Y

JJ.l = 40 ~
JJ.l = 30
I I I

1.2

,

i

fi

V"

II

I

I

/

I' /

J..Ll = 80

,

I

;
i

---QIRp
~Im

II

,,
,I

II

I
I

I
I

60 db/dec1ad1e

/ V

/

/

/

I I""""
JJ.l = 20

1.0
0.01

1.4

l-

I
WI

I

to W2

w3 to

CX)

I-

1.2

II-

0

~ 1.0
3

We

0.8
0
:;::.

I-

-

:! = 1

II

I

I

'~

r-r---jf-,-I---i--+-+-++-HI,-t---I,--+--+-+-+-+-++-H

;
!

We

Wm

I

I I I II

= 60 db/decade
= 40 db/decade

Wt

.-I

I-

I

0.1

----

,.

!

1/

j

!! V

!

wetp _ _
wets - - -

E 0.6
~

c::
Q)

g.

Q)

0.4

~

-r--

1---+--+-+-+-++++7.JJ.-l= 80 ~

...::r:.::- t-,
-= 60 '-t-- JJ.l = 40 F--=
~ 30!--==JJ."-1....L_.--=2-;;-0P---ir--t--HI-H
I I

0.2

I

0
0.01

0.1

1

FIG. 18. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

TRANSIENT AND FREQUENCY RESPONSE

to W'2 =' 60'db/d~C~d~
W3 to co = 40 db/decade
W3
f- =2
We

2.4 ~
2.2
2.0

~

IIf-

I

I

I

-:

I

Wl

I ---*Ip - C

Ii

,

I

,
I

I

/

/ I

//

1.4

I

r

I

V

1/

J

I

)

II

/V

I

I
I

I

/
,,

/1

1/

1/ I
7/I 7 7" /
V I
J
IIV
//

f/

~l =,40;::~

/v

tY

}/

LIb"

III = SO/Ill = 6 0 /

~11

III = 30[..-

A
'---/Ill
= 20
_/

1.0
0.01

11
I'

,

I

V
,

1

,
,

I

;

/ 1/
/ II

J

//

1.2

I

I

II

1.8
1.6

I

:

I

I

c..>

-:

I
I

r

I

m

I

I

I

,!

22-27

WI

0.1

we

1.4
1.2

s

--

Wm
We

----

t- Wetp

----

-:;:; 1.0 tOJ

3

0.8 I.2

t-

ro

~ 0.6
u
c:

60'db/decad~

I
i

w; to W'2 ='
w3 to CO = 40 db/decade
W3
t- = 2
We
tWt
We
ff-

wets

L

1,1

-I

I

Q)

/

0.4

a.'i:

o

I

! ! II
II II II!
_II J AI

I)
i /
Ii
: If
)/
7f.). 1-1-- 1--r-..~111 V; !
l1'f-j1 ~J /
}
I..i\'-t--,.
't-~
Ai / f /
\ 1 \ 1/ ,
\// \~" "-r--. J~ -..i.,/ I /
II

0.2

1

!
!

i

:J

g

I

,

III = SO

0.01

"

/

---- I

/1
/1

I

"

III = 60
-,

I

/1/

-I-

t"..

t-t-.r-.

~

/Tt"-.::::

Il~ ~ ~Ol/

III

I

Z _/

--

=3~

t-:::j-.

Il}

=~O

0.1

FIG. 19. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

FEEDBACK CONTROL

22-28

I

I

2.4

f-

2.2

ff-'

2.0 -

~

I

I

I I II

f- W\ 10 W2 = 60 db/decade
w3 10 ex> = 40 db / decade
W3
- =4
We

£r

I
I

l-

I

p

/

II

r

/ 1/

1.6

J

J

/ /

1.4

/~

/

I

/ 1/

/IL

//

.I~

~

~

J1.1 = 80_ J1.1 = 60

I II

I
I
I
IJ/
I
/ III II I
/ /
I i
/'/ / V ;' l/

I

!

/

I

40 """
30 - ,-r:::1=t

J1.1~ =

-

~
WI;

I-~

3

I-

f--

We
Wetp
wets

0.8
o

~

~

c:

Q)

I

1/
i

--

0.4

0.2

o

V

~1=20

---

--

---

II

II

\

ll.

112

If!

0.01

'-- I

J1.1

:;= 60

WI
We

JI I
/,1

t-~

J1.1 = 40

~ r-..

I

II
II I

J

/

nf

1\ 1f/'-l/1 /I~
11K 1/ //-f.- V//
d 1/ '!-J l'if'
KL
~ r-r-; r-----.J j.z (j

/ I V
pz:::. j .")Zr-J --,...
I
I-L ri 80 I
II

1

l

V- II hI\.

/j

J1.1 =

II /

II
1,1

0/\,

\,

!J I
1/ I

II

I

\/

1\

II

I

II
Ii/

0.6

:J

/'

0.1

WI to
= 60 db/decade
w310 ex> = 40 db/decade
W3
-We =4

o

~ 1.0

V

w~

~

1.2

//

1/11
~/

0.01

1.4

I

i

J
J

/

I

1.8

C)

1.0

Ii

I

£[
-R

1.2

I.

I

-~-

R m

I
I

tr.:.z r-...I':::'

I
• ....;' I

1---......

;1-

I

0.1

r-...

J1.1 = 30 J1.1 = 20

1

1

FIG. 20. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

TRANSIENT AND FREQUENCY RESPONSE

2.4

II-

2.2

II-

2.0

I-

, , '.

'

,,

I

"

to w2 = 60 db/decade
wato co = 40 db/decade
wa
- =8

WI

,

I

:

We

----

~Im

I

C,)

l

1.6

/

I

/
/

///

1.2

-/

IL

V

V

1;'
/

/

V

l

V

V

~
= 30

I

/

/
/

/ / /

II

/~/

~"

0.01

I

I

.....
J.l.1 - 80~ = 60-tl= 40
I II
I'
1.11

1.0

I

I

/

/ L

1.4

I

/

/

I

ct: 1.8

,

I!

If

~IRp

22-29

/
I

/

l/ /1 L /
V 1j,1' // /

LV

;::" . /

~

.... 1--'

J.l.1

= 20

0.1

WI
We

1.4

WI' to
I-

W3

to

'-

I-

o
.....

Wt
We
Wm
We

~ 1.0 '- CJJetp
3

0.8

CO

:i'-/

= 40 db/decade

-w3
=8
We

I-

1.2

w~ = '60 db)de~a~e'

wets

--

.~

----

----

\
.'-...,..-

o

~>. 0.6

/~

u

c:

,/ I"~
tf

g 0.4

J.

I

.lL

L
II

-. /1J//...\
.~

/

,/

II

0.2
J.l.l

l
-l

/

L

L

/

J

I

L 1

/
/-,1

/1

/1\/~1

/1 /,~

/~~?f' "it

jK",

-

Lt

1/

II

/A I//L 1',- '
;~
7'--.{j
/1- N
r'f,
1""'"'7-~ y---- '.j.
...t, /~ ...~If/ ':'-~ tf-"' .....
/
~ = 40
= 331'
= 60
1M = 20
= 80/
// II
/
Ill . . .

CI)

o

\ /
XI

1\

'I

0.01

J.l.1

I

/

~

J.l.1

0.1

FIG. 21. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

FEEDBACK CONTROL

22-30

I

2.4
2.2

~

I

I

I

II

'I

60 db/decade

W3 to 00 = 40 db/decade
W3
I= 00
We
II-

2.0

I

I- WI to W2 =

f-

!

~Im --QI
R p -

/

J

I

1.8

I

/

J

I

I

1/ I /

/

v' 1/

1.4

1.0

//v ./ V
~
Le- V
J.l.1 = 80-_ 60-'--.:: 40

~

J.l.l -;-

~1

-

J.l.l =

0.01
I

1.4 _

II-

I

I

I

I

30

I

I

I

/

;

,7

,/ /

/

/

,;V / V
/

~v

~"'" "I~

7
//

/~

~1= 20

1

0.1

I II

:~ ~~ %2 ~ ~~ ~~~~:~:~:
W3

I-

1.2 -:;:;- 1.0

V

II

I

!.I

/

//

1.2

3

I

/
I

1.6

u

I

f

Q

~

I

I

-We = 00

/

Wt

We

/11

Wm

We

I

1/ /

J

II

0.01

J

II

1\

0.1

1

FIG. 22. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

22-31

TRANSIENT AND FREQUENCY RESPONSE

2.4
Wl 10 W2 = 40 db/decade

walo co = 60 db/decade
wa

2.2

-=1
We

/

2.0

~

1.8

/

/

II'

/

/

/

Co)

II
I

/

/

IJ

/

~IR p

/

V

/

U?

/

/'
J.l.l = 80

J.l.l = 60

J.l.l = 40

1.4

J.l.l = 20

J.l.l = 30

1.2
1.0

0.01

0.1

1.4

Wl 10 W2 = 40 db/decade

walo co = 60 db/decade
W3

1.2

.....
-...
..... 1.0
3

We

--

wetp - - -

0.8

,-

~
>.
u

0.6

5-

0.4

I'---

-

f---

c(l)
(l)

=1

We

<>

~

-

We

0

'-r

It

J.l.l = 80

J.l.l = 60

J.l.l = 40

J.l.l = 30

J.l.l = 20

0.2

a

0.01

0.1

FIG. 23. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

FEEDBACK CONTROL

22-32

I

1

:

I

!

,I

2.4
2.2

If

I

-

~ 1.8

1/

I

1.6

1.4

II

I
I

/

/

"/ /

= 80

III

I
/

;'

v

1/
III =

= 60

/

1/

i

40 VIII

I

II

I

I

/

/

/

I

II
/

= 30

V

WI to W2
w3 to co

= 40 db/decade
= 60 db / decade
W3

-

0.01

I

II

1.2

8

~..,

1.0

3

0.8
.Q

~ 0.6
~
c

I
I-

--

IIr-

i

I

/

i

i

/

i

_1-::-V --- ./
/

1-

-'";-k--

7

i

;

!
II

If

!
!

Wt
We

/

Wm

....... ~~
~~
...... Wetp
;-~~
:::~--f
-I-

/

!

i

!

wets

--

-

---

-

--

-

---

/1
1-

I

WI to W2
W3 to co

= 40 db/decade
60 db / decade

=

-

W3

-

-We =2

IV

g.

0.4

---

J:

0.2

o

III

= 80

III

0.01

ITT

!

f
/

i

.'

I

-

0.1
Ii

1.4

=2

We

I

1.0

-

" /1l1=20

I

1.2

I

/

V

VI
"

,

%Ip ---

/
/

I

I

~Im ---

I

I

/

I

/

/
III

,
I

" //
v

I

I

C,)

/

I

I

,

I

I

,/

I

,

f

!

I

2.0

,
,,
,

:

1/

1

I
f

= 60

-

-:t--tIII

= 40

III

= 30

III

= 20

0.1

FIG. 24. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

22-33

TRANSIENT AND FREQUENCY RESPONSE

WI to W2
W3 to 00

2.4
I

,

I

:

I

2.2

!

2.0

~

1.8

I

ij

I J
I J

1.4
1.2

I~

1/

~t/

/

t

,II

1.0

I

/1/
~

=

I~I'
40 ....

I

....

0.01

-

,,

II

I

/

V

vb
,

m

/7 / II
/J
//

/[J

~ J1.1= 60
/'
J1.1
I

J1.1 = 80

I IJ

-

%Ip ---

I

1/

/ J

I

1.6

R

/
I

II

=4

£1 ----

I

I

1/

I

1/

CJ

We

1

I

I

-

I
I

,

!

I

= 40 db/decade
= 60 db/decade W3

11'J1.1

V

= 30/J1.1 =

20

I~/'
1

0.1

1.4
WI
W3

1.2

-....

9

'

..........

3'"

0.8

QJ

/

\.

E 0.6
:>.
u
cQJ

)11

~

--.....

~

6-

~

1\

1.0

"-

Ii

I

II

/

\l--'

/

~

f,..........

"\

"""

a

J1.1 = 80

'\

I

]
)'

1,1

III',

I"-f.--V

/

1/

/'

Ai

/1

Ii /
I'

t--tJ1.1 = 60

J1.1 = 40

J1.1

= 30 I

~-

J1.1

-

We

7'

.1/

I

Wt

Wm

-We ----

I

/

-

- - - --

/

Ii -- /

I! /

,I

!

/

r-t-

0.2

j
/

//

W3

-We =4

!
II
j/
'V J /
/-,V II 1/ 1\·. kf:; ~l,L../).....

/

0.4

I---r--,

to W2 = 40 db/decade
to 00 = 60 db/decade

Wetp

---

wets

---_

.J
1-_

= 20

I
0.01

0.1

1

FIG. 25. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

FEEDBACK CONTROL

22-34

WI to W2 = 40 db/decade
W3 to co = 60 db/decade

2.4

I

W3
- =8
We

2.2

~Im
~Ip

2.0

~
C,,)

I
I

!

1.6

I

!
I

I

I

II

I

/

II

/

1
il /

/ /

~ILI = 80

"

I
I /

'/

~~l

fil=60

I............

~l=

= 40

/'"

30 01=20

~

!-H1

0.01

II

//

/v

/V

//

1.0

0.1
I

I

w3
- =8
We

1.2

S

I

3

"-

0.8
~

......

LL.

0

....--,

I/~

ILl

\/1
/

r.. ...
== 80

/ "~

'/

r;""

/'

I

ILl = 60

0.01

I

1\.rJI- ~

1'/

r'}ll

..

/ ~

......

')'. . . . . 1
......... .......
r.,"
"

~y..
/

.I

,'/

l

, ,..-,1'\.
rl

-.......

/

/

I

/7

\

I)
,/

cQ)

0.2

1,/

\

~/V

0.6
0.4

Wt
We

I

~ 1.0

~

I

I

WI to w2 == 40 db/decade
w3 to co = 60 db/decade

1.4

C'

------- -

1.2

:::l

-

I

1.8

1.4

~
>.
u

I

,II

ILl = 40

1/

I

I

-

-

---

II Wm ----We

.4

//

Wetp
wets

~(V / .j.J

--.--- -

, 1'7' /, f"
)<1'

r--

//

I

J A'-JI . . . . .
II;(
/

ILl = 30

0.1

ILl = 20

1

FIG. 26. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

TRANSIENT AND FREQUENCY RESPONSE

-r

I

2.4 r-

W3

~

2.2

f-

r-

2.0

~

f-

I
I

to W2 = 60 db/decade
to 00 = 60 db/decade

WI

£1Rm

We

il

:

=2

----

I

I

C

RIp - -

I

/

C)

1.6

V
iJ.l

/

/

/ /"
I

V

/

= 80

iJ.l

:

,,:

40
iJ.l

/

r

1/ I
// V /

~

l/
/1.1 -

If

1 II,'
II /rl

I/~

= 60

l

/1

/

/

1.4

I
I

/

I

/ 1/
II

/

1.8

II

II

/ !

I
I

-:

11

/
I

22-35

J;,v //

V

l/ /

= 30

~I

1.2

/

/

/

=20

1.0
0.01

0.1

Wt

1

We

1.4 r-

WI
W3

to

W2

to

00

W3

t-

1.2 tf-

~

~

<>

1.0 t-

3

.

-

We
Wt
We
Wm
We

t- Wetp

0.8

= 60db/d~C;d:= 60 dbJdecade

t- wets

= 2

i

/

"--

'r- ..... 1-

~

-

It:="":

0

e

I

;

i

I

,I

----

----

Ii

,/

--- 1/

II I

/

/

/
I
~.-;;
~

!"
!

fC:.":-e:...... - ~r-;....

-

~/

0.6

../

/'

~

t::

(I)

::J
0'

0.4

(I)

~

It
0.2

iJ.l

= 80

iJ.l

F

= 60

/.1.1 = 40

/.1.1 = 30

-- -

/.1.1 = 20

a
0.01

~

0.1

1

We

FIG. 27. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).'

FEEDBACK CONTROL

22·36

2.4

WI

I- Wa

lI-

2.0

-

l-

~ =4

~Im

~Ip

iI

-I

V
J.l.1 = 80

I

I

I

I

I

"

= 60 db/decade

WI to W2
t- wa to GO

= 60 db/decade

I-

wa

II-

t-

We

We =4
--

Wm
We

----

~

I- We tp

0.8

./

wets

IL

V

~

IL

I/lI'

/

!Iv

/'

I

/

I
I!

V

I
V

IL

/V

1L

/'

..... ~I' "I'
J.l.1 = 4~1 = 30 ~
~/'J.l.1 = 20

L'1

J.l.l ;" 60

0.01
I

3

/

/V

II

1.0

~

Y

l'
1/

I

I

/ J

//

1.2

-:;:;-.., 1.0

"V

/

I

i

I

f

! L
! /1

I

I

I"

I
I

II

I

:

I

I 1/

1.4

1.2

/

I

1.6

.

II

;
I

l-

/

I

C,,)

14

L

II

R:: 1.8

:

I{

I
I

I
I

I

I;

I
I

----

I
I

I
I

:

Wa

I-

2.2

I
I

to w2 = 60 db/d~cade
to GO = 60 db/decade

0.1

---+---+-_+__t_+-+-++-t----jl~_r__tl_+__t___r__t_t_t__H
;
: ~

J

t----.,!11L ~
!"i.. I JJ

v

JIJ(

_

-.- _ r-===-=-.::::i'::..: .-.; -

----

.Q

~ 0.6
>.
u

c
<1.1
::s

0-

J:

0.4

,/

-

0.2
J.l.l = 80

J.l.l = 60

J.l.1 = 40

M

= 30

J.l.1

I'-

I

= 20

FIG. 28. Charts giving comparison of steady-state frequency response and transient
. response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

TRANSIENT AND FREQUENCY RESPONSE

[Chart: C/R|m, C/R|p, ωm/ωc, ωc tp, and ωc ts plotted against ω1/ωc for μ1 = 20 to 80 db; ω1 to ω2 = 60 db/decade, ω3 to ∞ = 60 db/decade, ω3/ωc = 8.]

FIG. 29. Charts giving comparison of steady-state frequency response and transient
response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).


[Chart: C/R|m, C/R|p, ωm/ωc, ωc tp, and ωc ts plotted against ω2/ωc for μ1 = 20 to 80 db; ω1 to ω2 = 40 db/decade, ω3 to ∞ = 60 db/decade, ω3/ωc = 4.]

FIG. 30. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). Note that ω2/ωc is the abscissa for these charts.


Uses of Charts. Figures 13 through 30 are comparisons of steady-state frequency response characteristics and transient response following a step function of input, presented as a function of ω1/ωc (Ref. 3). The information presented in Figs. 13 to 30 is useful for analysis, that is, determining the response of systems already designed, or for synthesis, that is, determining what sort of system will be required to do a specified job. Typical examples of each are presented in the following.
EXAMPLE 1. Analysis. Determine, approximately, the value of C/R|m and the frequency at which it occurs, and the magnitude of the peak overshoot to a step function input and the time when it occurs, for a system having the open loop transfer function,
(57)    G/E = 10 / [s(1 + 0.1s)],

which is drawn in Fig. 31.

[Figure: FIG. 31. Gain plot for example problem: log magnitude of G/E = 10/[s(1 + 0.1s)], decibels versus ω, radians/second.]

From this,

(58)    ω3 = ωc = 10,    μ1 = 20 db.

The values for ω1 and μ1 were arbitrarily selected; of course, choosing a value for one fixes the value of the other. Since there is no segment between ω1 and ω2 with either 40 or 60 db per decade slope, the chart with


the lower attenuation rate 40 will be used. By entering the chart in Fig.
13 with the following parameters:
(59)    μ1 = 20,
        ω1/ωc = 0.1,
        ω3/ωc = 1.0,
        ω1 to ω2 = 40 db/decade,
        ω3 to ∞ = 40 db/decade,

the desired information may be obtained:

(60)    C/R|m ≈ 1.16,
        ωm/ωc ≈ 0.7,    ωm ≈ (0.7)(10) = 7.0,
        C/R|p ≈ 1.18,
        ωc tp ≈ 3.6,    tp ≈ 0.36 sec.
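The open loop of eq. (57) closes to an exact second order system, C/R = 100/(s² + 10s + 100), so the chart readings of eq. (60) can be checked against the standard second order relations. A minimal sketch of that check in Python (standard library only; the formulas are the usual second order ones, not taken from the charts):

    import math

    # Closed loop of G/E = 10/[s(1 + 0.1s)]:  C/R = 100/(s^2 + 10s + 100),
    # i.e., natural frequency wn = 10 rad/sec and damping ratio zeta = 0.5.
    wn, zeta = 10.0, 0.5

    Mm = 1.0 / (2.0 * zeta * math.sqrt(1.0 - zeta**2))   # peak of |C/R(jw)|
    wm = wn * math.sqrt(1.0 - 2.0 * zeta**2)             # frequency of that peak
    wd = wn * math.sqrt(1.0 - zeta**2)                   # damped natural frequency
    tp = math.pi / wd                                    # time to first peak
    Cp = 1.0 + math.exp(-zeta * math.pi / math.sqrt(1.0 - zeta**2))  # step peak

    print(f"C/R|m = {Mm:.3f} (chart: 1.16), wm = {wm:.2f} (chart: 7.0)")
    print(f"C/R|p = {Cp:.3f} (chart: 1.18), tp = {tp:.3f} sec (chart: 0.36)")

The computed values, 1.155, 7.07 rad/sec, 1.163, and 0.363 sec, agree with the chart readings to about the accuracy with which the charts can be read.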

EXAMPLE 2. Synthesis. The requirements of a position control system
are assumed to be set forth in the following set of specifications:

1. C/R|m = 1.3 or less.
2. ωm = 2 cycles per second or more.
3. Velocity error coefficient (Kv) is 200 sec⁻¹.
4. The attenuation rate of the open loop control will be 60 db per decade for frequencies greater than 120 rad per second.

The problem is to determine the open loop transfer function of a suitable
control for this application.
The frequency, ωc, must be considerably greater than ωm, so as a first assumption assume that ωc = 30 rad per second, and that ω3 = 120 rad per second. For the specifications given there are many solutions to the problem. Figure 28, for which ω3/ωc = 4, shows that for μ1 = 40 db and ω1/ωc = 2/30 = 0.067,

(61)    C/R|m = 1.25,
        ωm/ωc = 0.6,
        ωm = (0.6)(30) = 18 rad/sec.

Thus, an open loop transfer function satisfying the specifications is

(62)    C(s)/E(s) = 200(1 + 0.2s)² / [s(1 + 0.5s)²(1 + 0.00833s)²].

Synthesis by means of the charts is basically a trial-and-error process: assuming the solution and checking it.
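A rough numerical check of the synthesized design of eq. (62) can be written directly from the definitions; in this sketch (Python; the frequency sweep and its limits are illustrative choices, not from the text):

    def G(s):
        # Open loop transfer function of eq. (62).
        return 200 * (1 + 0.2*s)**2 / (s * (1 + 0.5*s)**2 * (1 + 0.00833*s)**2)

    # Kv is the limit of s*G(s) as s -> 0; a very small s approximates it.
    print("Kv ~", abs(1e-9 * G(1e-9)), "(spec: 200 1/sec)")

    # Sweep the closed loop magnitude |G/(1 + G)| to locate C/R|m and wm.
    best_w, best_M, w = 0.0, 0.0, 0.1
    while w < 1000.0:
        M = abs(G(1j * w) / (1 + G(1j * w)))
        if M > best_M:
            best_w, best_M = w, M
        w *= 1.01
    print("C/R|m ~", round(best_M, 2), "at wm ~", round(best_w, 1),
          "rad/sec (spec: 1.3 or less, at 12.6 rad/sec or more)")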

TABLE 1. RULE-OF-THUMB APPROXIMATIONS

Time to peak. tp ≈ π/ωc, where tp = time from step input to peak value of the response transient, seconds; ωc = open loop crossover frequency, radians/second. In Chestnut and Mayer's charts it is evident that for this general class of servomechanisms, those with a dominant complex pair of closed loop poles, the open loop crossover frequency, ωc, times the time to peak, tp, is about 3, or π. In other words, the time to peak is about half the period corresponding to the open loop crossover frequency.

Peak overshoot. C/R|p ≈ 0.85Mm, where C/R|p = peak value of the transient response to a step input; Mm = maximum value of the closed loop frequency response. The peak value of the transient response, C/R|p, to a unit step input is generally less than the maximum steady-state value, Mm, of the closed loop frequency response. The maximum value of C/R|p generally approaches 2.0 while the maximum value of Mm approaches infinity. For many applications "good" servos are those with values of Mm between 1.3 and 1.5. For these servos Mm is generally 10 to 20% greater than C/R|p.

Damping ratio. ζ = 1/(2Mc), where ζ = damping ratio; Mc = value of the closed loop frequency response at the corner frequency. The damping ratio may be approximated from the value of the closed loop frequency response of the system at the corner frequency, ωc (the frequency at which the lines asymptotic to the log magnitude curve intersect). This is exact for a second order system. Of course this relationship may also be used to estimate Mc, knowing the damping ratio. In addition, Mc is approximately equal to Mm for systems with low damping ratios.

Settling time. ts(5%) ≈ 3√(1 − ζ²)/(ζωd); ts(2%) ≈ 5√(1 − ζ²)/(ζωd); ts(5%) ≈ 3Teq for an overdamped system. Here ts = time for the response to a step input to settle to within some per cent of the final value, seconds; Teq = time for the response to reach 63% of the final value; ωd = damped natural frequency, radians/second; ζ = damping ratio. The settling time, ts, is generally defined as the time for the system to settle to within 5 or sometimes 2% of the final value. In either case it is quite difficult to predict ts for an underdamped system because it is subject to fluctuations of about one-half the period of oscillation for only small changes in system parameters. However, approximations (see eq. 18) can be made. The last approximation is for an overdamped system.

Equivalent time constant. Teq ≈ 1/ωc, where Teq = time for the response to a step input to reach 63% of the final value, seconds; ωc = gain crossover frequency, radians/second. This relationship is exact for a simple single time constant system, but is also quite good for the general case (Ref. 4).

Oscillation frequency. ωt ≈ ωm ≈ 0.75ωc, where ωt = oscillation frequency of the transient response, radians/second; ωm = frequency at which Mm occurs, radians/second; ωc = open loop gain crossover frequency, radians/second. The frequency of oscillation of the transient response, ωt, is generally about equal to the frequency, ωm, at which the frequency response peak, Mm, occurs. Both ωm and ωt are usually less than ωc, the open loop crossover frequency. For the "good" servos with Mm = 1.3 to 1.5 an approximate relationship is as indicated. In this approximation ωt is used to mean essentially the same thing as ωd, the damped natural frequency, previously defined for a system with a dominant complex pair of poles. The use of ωt places no restriction on the system characteristics; however, generally, there is no significant difference between ωt and ωd.

Rise time. tr ωt ≈ tr ωm ≈ 1.3, where tr = rise time (10 to 90%); ωt and ωm are defined above. The system's rise time, tr, which is here considered to be the time for the response to a step input to go from 10 to 90% of its final value, may be approximated as indicated for systems with an Mm value of about 1.3 to 1.5.

Phase margin at crossover frequency. γc ≈ 40°, where γc = open loop phase margin at the crossover frequency. A phase margin of 40° at the unity gain (crossover) frequency generally corresponds to an Mm ratio of approximately 1.5. Since this value of Mm is the maximum ordinarily considered feasible, the phase margin should be 40° or greater.

6. APPROXIMATE RELATIONS-RULES OF THUMB

There are several approximations or rules of thumb which can be quite
useful when time or facilities are not available for a more exact analysis.
They may also be used as rough checks on the results of a more extensive
analysis. The more common of these rules of thumb are presented in
Table 1. They must be used with caution because, being approximations,
they cannot apply with equal validity to all servo systems; and the approximations for transient response are applicable only for step inputs.
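The entries of Table 1 reduce to one-line formulas. A sketch of them as quick estimating functions (Python; the function names and the grouping are ours, not the handbook's):

    import math

    # Rule-of-thumb estimates from Table 1; frequencies in radians/second.
    def time_to_peak(wc):              # tp ~ pi/wc
        return math.pi / wc

    def peak_overshoot(Mm):            # C/R|p ~ 0.85*Mm
        return 0.85 * Mm

    def damping_ratio(Mc):             # zeta ~ 1/(2*Mc)
        return 1.0 / (2.0 * Mc)

    def settling_time_5pct(zeta, wd):  # ts(5%) ~ 3*sqrt(1 - zeta^2)/(zeta*wd)
        return 3.0 * math.sqrt(1.0 - zeta**2) / (zeta * wd)

    def equivalent_time_constant(wc):  # Teq ~ 1/wc
        return 1.0 / wc

    def oscillation_frequency(wc):     # wt ~ wm ~ 0.75*wc
        return 0.75 * wc

    def rise_time(wm):                 # tr*wm ~ 1.3
        return 1.3 / wm

    # Example: the system of eq. (57), with wc = 10 rad/sec.
    print(time_to_peak(10.0), oscillation_frequency(10.0), rise_time(7.5))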
7. NUMERICAL AND GRAPHICAL TECHNIQUES OF RELATING TRANSIENT
AND FREQUENCY RESPONSE

The numerical techniques presented involve only routine calculations
and provide a point by point determination of the related response without
the need of obtaining the closed loop poles or other intermediate quantities.
The methods presented require the following assumptions:
(a) The system is linear.
(b) The system frequency response approaches zero as the frequency
approaches infinity.
(c) The system's transient response begins with the system initially at
rest.
(d) The system is stable.
These requirements are satisfied by most servo systems. Even a nonlinear system may generally be considered linear over a restricted operating
range.


Determining Transient Response from Frequency Response. A
relatively simple method for obtaining the time response to an impulse
function input, knowing the frequency response, was developed by Floyd
(Ref. 5). He derives the exact inverse transformation and then presents a
method for numerically performing the necessary integration. The exact
transformation is

(63)    c(t) = (2/π) ∫0^∞ {Re [G(jω)] cos ωt} dω,

where G(jω) is the closed loop frequency response of the system considered. Floyd's procedure for evaluating this integral is to plot the real part of the closed loop frequency response, and then approximate the curve by a series of straight-line segments. This approximation is then treated as a summation of trapezoids. Equation (63) is applied to each trapezoid and the resulting time functions are added to obtain c(t).

[Figure: FIG. 32. Geometry of a trapezoid for approximating the real part of the response function, defining the center frequency ω1 and the half-width Δ1.]

Each particular trapezoid is defined as indicated in Fig. 32. Performing the integration indicated in eq. (63), the value of the integral contributed by a single trapezoid is

(64)    c1(t) = (2A1/π) (sin ω1t / ω1t)(sin Δ1t / Δ1t),

where A1 = H1ω1, the area of the trapezoid (H1 being its height), and ω1 and Δ1 are defined by the figure. The value of c(t) is then the summation for all the trapezoids:
(65)    c(t) = Σn=1 (2An/π) (sin ωnt / ωnt)(sin Δnt / Δnt),

the areas An being taken with the signs indicated in Fig. 34.


EXAMPLE. Assume the closed loop frequency response, G(jω), of the system to be expressed mathematically as

(66)    G(jω) = 18.72 / {[(jω + 1)² + 1][(jω + 0.6)² + 9]}.

From this the real part of G(jω) is calculated and plotted as shown in Fig. 33. The values used for ω, Δ, and A in the series of eq. (65) are:

    ω1 = (1.2 + 0.5)/2 = 0.85      Δ1 = (1.2 − 0.5)/2 = 0.35      A1 = 1 × 0.85
    ω2 = (2.0 + 1.2)/2 = 1.6       Δ2 = (2.0 − 1.2)/2 = 0.4       A2 = 0.66 × 1.6
    ω3 = (3.5 + 2.6)/2 = 3.05      Δ3 = (3.5 − 2.6)/2 = 0.45      A3 = 0.66 × 3.05
    ω4 = (7.2 + 3.6)/2 = 5.4       Δ4 = (7.2 − 3.6)/2 = 1.8       A4 = 0.07 × 5.4
    ω5 = (3.6 + 3.5)/2 = 3.55      Δ5 = (3.6 − 3.5)/2 = 0.05      A5 = 0.07 × 3.55

[Figure: FIG. 33. Real part of response function and approximation: Re G(jω) plotted against frequency ω from 0 to 10, showing the exact curve and the straight-line approximation.]


[Figure: FIG. 34. Illustrating the trapezoids resulting from the straight-line approximation shown in Fig. 33; the area between the curve and the ω axis is (1) + (2) − (3) + (4) − (5), the trapezoids being numbered as in the figure.]

Figure 34 illustrates the trapezoidal approximations used for Fig. 33 and the foregoing calculations. The evaluation of eq. (65) then becomes

(67)    c(t) = (2/π) [ 0.85 (sin 0.85t / 0.85t)(sin 0.35t / 0.35t)
                + 1.07 (sin 1.6t / 1.6t)(sin 0.4t / 0.4t)
                − 2.01 (sin 3.05t / 3.05t)(sin 0.45t / 0.45t)
                + 0.38 (sin 5.4t / 5.4t)(sin 1.8t / 1.8t)
                − 0.25 (sin 3.55t / 3.55t)(sin 0.05t / 0.05t) ].

The sin x/x tables (see Table 3) may be used to facilitate the evaluation
of this equation at various values of t.
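In place of the table, eq. (65) can also be evaluated directly; the following sketch (Python, standard library only; the trapezoid list is the one worked out above, with the signs of Fig. 34) computes c(t) at a few instants:

    import math

    def sinc(x):
        # sin(x)/x, with the removable singularity at x = 0 filled in.
        return 1.0 if x == 0.0 else math.sin(x) / x

    # (A_n, w_n, delta_n) for the five trapezoids of eq. (67).
    TRAPEZOIDS = [(0.85, 0.85, 0.35), (1.07, 1.6, 0.4), (-2.01, 3.05, 0.45),
                  (0.38, 5.4, 1.8), (-0.25, 3.55, 0.05)]

    def c(t):
        # Eq. (65): c(t) = sum over n of (2 A_n/pi) sinc(w_n t) sinc(delta_n t).
        return sum(2.0 * A / math.pi * sinc(w * t) * sinc(d * t)
                   for A, w, d in TRAPEZOIDS)

    for t in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"c({t}) = {c(t):+.3f}")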


The exact solution, obtained by the inverse Laplace transformation, gives this result:

(68)    c(t) = 2.28 exp (−t) sin (t + 5.6°) − 0.761 exp (−0.6t) sin (3t + 17°).

For comparison both eqs. (67) and (68) are plotted in Fig. 35. This is the system time response to a unit impulse function. If the response to a step function is desired instead, the graphical integration of the curve for the impulse response provides it (Ref. 5).

[Figure: FIG. 35. Transient response for illustrative problem; eqs. (67) and (68) plotted as response versus time, seconds.]
Determining Frequency Response from Transient Response

Quite often the frequency response characteristics of a component or
system need to be known but it is difficult to introduce a sinusoidal signal
or to measure magnitude and phase shift of the output. In many cases it
is much simpler to introduce an impulse or step input; and since time and
frequency responses are uniquely related, it is possible to obtain the
frequency response from the transient response.
There are several approximate methods which have been developed for
accomplishing this. Floyd's trapezoidal approximation method may be
used but it yields only the real part of G(jw). To obtain the total vector
magnitude and phase shift a set of curves such as those presented by
Bode (Ref. 6) must be used. Other methods have been developed by


Bedford and Fredendall, by Teasdale, Brooks and German, and by Samulon
(Refs. 7, 8, and 9).
Samulon's Method. While the approaches vary somewhat, the results are the same, with the exception that Samulon's final equation has a "correction" term which makes it more accurate than the others. His procedure is presented here. Its basis is:
SHANNON'S SAMPLING THEOREM. If a function c(t) contains no frequencies higher than fco cycles per second, it is completely determined by giving its ordinates at a series of points spaced 1/(2fco) seconds apart.
Nearly any transient response curve will have some limiting value for its frequency spectrum, either due to the properties of the system or the test equipment itself. Example. The bandpass of the oscillograph might be the limiting item.
Shannon has also pointed out that such a function, with limited frequency components, can be exactly synthesized by a sum of sin x/x functions in a manner indicated in Fig. 36.

[Figure: FIG. 36. Use of sin x/x functions to approximate a transient response; the component sin x/x pulses and their sum plotted against t, seconds.]

The equation resulting from this approach is
(69)    G(jω) = [ (π/2)(ω/ωco) / sin ((π/2)(ω/ωco)) ] exp (jωτ/2) Σn Bn exp (−jωnτ),

where Bn = the increment in the time response curve for a step function input,
      ω = the frequency of interest, radians/second,
      ωco = the cutoff frequency for the system, radians/second,
      τ = the sampling interval, equal to π/ωco.


Equation (69) would be exact if the system response contained no frequency components greater than ωco. This will never be absolutely true in a practical system, but good results may be obtained nevertheless. In choosing the nominal cutoff frequency, ωco, the attempt should be made to estimate the frequency at which the steady-state frequency response is attenuated by at least 20 db. A good estimate of ωco is ten times ωc, the crossover frequency, as approximated in Table 1.
The calculated response will be in error at frequencies lower than the ωco selected if the true response contains higher frequencies. It is therefore desirable to have a frequency characteristic which attenuates rapidly above the ωco selected for calculation. If the system or instrumentation does not provide this attenuation, a filter may be added. Samulon states, "the amount of error, which will be largest near the nominal cutoff frequency, ωco, will be in general smaller than the amplitude response at ωco, provided that the response does not rise again above its value at ωco for frequencies greater than ωco." The calculated frequency response will indicate how valid the assumption of the cutoff frequency was. Note that with use of a lower ωco fewer points must be calculated.
"5

1.4

I

a.

.: 1.2
a.

.... 10,...

1.0

I
/

::l

.s

Q)

VI

0.8

c

o

a.
~ 0.6
Q)

-- --

I

c

III

2\,)

~

I

t: 0.4

'in

.=I

"4

V

2VI

~

I

Sampling
points

0.2

I

0

o

2.0

1.0

3.0

4.0

5.0

Time, seconds

FIG. 37.

Transient response for illustrative problem.

EXAMPLE. Assume a system with a time response to a step input as shown in Fig. 37. By assuming a system cutoff frequency ωco of 15.7 rad per second,

(70)    fco = 15.7/2π = 2.5 cps.

By Shannon's theorem the sampling interval should be

(71)    τ = 1/(2fco) = 0.2 sec.


By reading the ordinates from the curve at the sampling points, Table 2 is constructed. For computational convenience the frequency response at ω = π/1.6 will be computed.
TABLE 2. FREQUENCY RESPONSE CALCULATED BY SAMULON'S METHOD FOR ω = π/1.6

    nτ      c(nτ)     Bn       Re Bn exp(−jωnτ)    Im Bn exp(−jωnτ)
    0.2     0.57     +0.57        +0.527              −0.218
    0.4     0.93     +0.36        +0.254              −0.254
    0.6     1.13     +0.20        +0.076              −0.185
    0.8     1.22     +0.09         0                  −0.090
    1.0     1.23     +0.01        −0.004              −0.009
    1.2     1.20     −0.03        +0.021              +0.021
    1.4     1.15     −0.05        +0.046              +0.019
    1.6     1.09     −0.06        +0.060               0
    1.8     1.04     −0.05        +0.046              −0.019
    2.0     1.00     −0.04        +0.028              −0.028
    2.2     0.97     −0.03        +0.011              −0.028
    2.4     0.95     −0.02         0                  −0.020
    2.6     0.94     −0.01        −0.004              −0.009
    2.8     0.94      0            0                   0
    3.0     0.95     +0.01        +0.009              +0.004
    3.2     0.96     +0.01        +0.010               0
    3.4     0.97     +0.01        +0.009              −0.004
    3.6     0.98     +0.01        +0.007              −0.007
    3.8     0.99     +0.01        +0.004              −0.009
    4.0     1.00     +0.01         0                  −0.010
    4.2     1.00      0            0                   0

(72)    Σ Bn exp (−jωnτ) = (1.108 − 0.008) + j(0.044 − 0.890)
                         = 1.10 − j0.846 = 1.385 ∠ −37.6°.

The vectors which are shown resolved and added numerically in Table 2 may be added graphically by simply plotting them end to end.
The correction terms in eq. (69) will now be computed.
Magnitude correction:

(73)    [(π/2)(π/1.6)(1/5π)] / sin [(π/2)(π/1.6)(1/5π)] = (π/16) / sin (π/16) = 1.007.

Phase correction:

(74)    exp (jωτ/2) = exp [j(π/1.6)(0.1)] = 1 ∠ 11.2°.


The final result is:

(75)    G(jω) = [ (π/2)(ω/ωco) / sin ((π/2)(ω/ωco)) ] exp (jωτ/2) Σn Bn exp (−jωnτ)
             = (1.007)(1.0 ∠ 11.2°)(1.385 ∠ −37.6°)
             = 1.395 ∠ −26.4°.
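Equation (69) transcribes directly into code. The sketch below (Python, standard library only; variable names are ours) reproduces the example from the sampled ordinates of Table 2; the first increment is assigned to t = τ, as in the table:

    import cmath, math

    # Sampled step response ordinates c(n*tau) from Table 2.
    c = [0.57, 0.93, 1.13, 1.22, 1.23, 1.20, 1.15, 1.09, 1.04, 1.00, 0.97,
         0.95, 0.94, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00, 1.00]
    tau = 0.2                 # sampling interval, seconds (eq. 71)
    w_co = math.pi / tau      # cutoff frequency, 15.7 rad/sec

    def samulon_G(w):
        # Increments B_n of the response, then the vector sum of eq. (69)
        # times the magnitude and phase correction terms.
        B = [c[0]] + [c[n] - c[n - 1] for n in range(1, len(c))]
        total = sum(Bn * cmath.exp(-1j * w * (n + 1) * tau)
                    for n, Bn in enumerate(B))
        x = (math.pi / 2.0) * (w / w_co)
        return (x / math.sin(x)) * cmath.exp(1j * w * tau / 2.0) * total

    G = samulon_G(math.pi / 1.6)
    print(abs(G), math.degrees(cmath.phase(G)))   # ~1.395 and ~-26.4 deg

For comparison, evaluating the exact response of eq. (78) at ω = π/1.6 gives about 1.384 ∠ −26.4°, so the error of the computed point is roughly 1% in magnitude.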

The function chosen as an example is:

(76)    c(t) = 1 − exp (−2t) + exp (−t) sin 1.5t,

(77)    C(s) = (3.5s² + 7s + 6.5) / {s(s + 2)[(s + 1)² + 2.25]},

(78)    G(jω) = [(1 − 0.538ω²) + j(1.077ω)] / [(1 − 0.616ω²) + j(1.115ω − 0.154ω³)].

The computed point is shown plotted on the exact G(jω) curve in Fig. 38.

[Figure: FIG. 38. Exact response curves and calculated points for example problem: magnitude of G(jω) in decibels and phase shift in degrees versus frequency ω, radians/second, with the computed magnitude and computed phase angle points marked.]

The sin x/x values given in Table 3 may be used to aid in computing the
magnitude correction. Generally, these correction terms will be negligible;
however, if accuracy is important they should be checked. Samulon
(Ref. 9) presents a series of tables and nomographs which are useful if
extensive work of this kind is to be done.
The previous example illustrates the fact that the number of calculations
required makes the whole problem rather tedious. If a great amount of
such work is to be done, a special purpose analog computer may be used
(Ref. 10).

TABLE 3. A FOUR-PLACE TABLE OF sin x/x (Ref. 11)

Entries are 10⁴ sin x/x. Rows step x by tenths; the column heading gives the hundredths digit of x (entries at every 0.01 up to x = 20, and at every 0.02 from x = 20 to x = 39.9). The opening rows are:

    x      0     1     2     3     4     5     6     7     8     9
   0.0 +10000 10000  9999  9999  9997  9996  9994  9992  9989  9987
   0.1   9983  9980  9976  9972  9967  9963  9957  9952  9946  9940
   0.2   9933  9927  9919  9912  9904  9896  9889  9879  9870  9860
   0.3   9851  9840  9830  9820  9808  9797  9785  9774  9761  9748
   0.4   9735  9722  9709  9695  9680  9666  9651  9636  9620  9605

The remaining rows, through x = 39.9, follow the same layout; the tabulated values oscillate in sign with slowly decreasing amplitude (for example, the entry at x = 10.9 is −913, at x = 20.0 is +456, and at x = 39.9 is +203).

ACKNOWLEDGMENT

Figures 13 to 30 are reproduced with permission from H. Chestnut and R. W.
Mayer, Servomechanisms and Regulating System Design, Vol. I, Wiley, New York,
1951.
The example in the section on Determining Transient Response from Frequency
Response is reprinted with permission from G. S. Brown and D. P. Campbell,
Principles of Servomechanisms, Wiley, New York, 1948.

REFERENCES
1. W. R. Evans, Control System Dynamics, McGraw-Hill, New York, 1954.
2. G. A. Biernson, Quick methods for evaluating the closed-loop poles of feedback
control systems, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 53-70 (1953).
3. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. I, Wiley, New York, 1951.
4. G. A. Biernson, Estimating transient response from open-loop frequency response, Trans. Am. Inst. Elec. Engrs., 74, 388-403 (1956).
5. G. S. Brown and D. P. Campbell, Principles of Servomechanisms, Wiley, New York,
1948.
6. H. W. Bode, Network Analysis and Feedback Amplifier Design, Van Nostrand,
Princeton, N. J., 1945.
7. A. V. Bedford and G. L. Fredendall, Analysis, synthesis and evaluation of the
transient response of television apparatus, Proc. I.R.E., 30, 440-458 (1942).
8. A. R. Teasdale, Jr., F. E. Brooks, Jr., and J. P. German, System Frequency Response Derived from Transient Response, Am. Inst. Elec. Engrs. District Paper, New
York, October 1950.
A. R. Teasdale, Jr., Get frequency response from transient data by adding
vectors, Control Eng., 2, 56-59 (1955).
9. H. A. Samulon, Spectrum analysis of transient response curves, Proc. I.R.E., 39,
175-186 (1951).
10. J. B. Reynolds, Jr., Get frequency response from transient data by machine
computing, Control Eng., 2, 60-63 (1955).
11. J. Sherman, Z. Krist., 85, 404 (1933).

E. FEEDBACK CONTROL

Chapter 23

Feedback System Compensation

P. G. Cushman

1. Design Criteria and Techniques    23-01
2. Compensating Components: D-C Systems    23-18
3. Compensating Networks: A-C Systems    23-48
4. Open-Closed Loop Control    23-54
   References    23-56

1. DESIGN CRITERIA AND TECHNIQUES

The first step in the design of a feedback control system is the selection of a suitable power element with sufficient torque, or force, speed, and power rating to drive the load. Once the selection of a power element with known characteristics has been made, the signal devices, amplifiers, and stabilizing components have to be chosen with characteristics that make the entire feedback control system meet the system requirements of accuracy, speed of response, and stability. This chapter is devoted to the synthesis of the required characteristics of these compensating components and the presentation of characteristics of practical control system components. Section 1 derives feedback control system characteristics from system specifications.

Synthesis of Log Magnitude Diagram from System Requirements

Low-Frequency Portion: Static Error Coefficients. Error coefficients are one of the most common means of specifying control system


performance. These coefficients are figures of merit: the higher the coefficient, the smaller the control system error in achieving a required output.
The static error coefficients are defined as the ratio of the constant output required (position, velocity, or acceleration) to the control system error required to achieve that output. The types of control system and the static error coefficient associated with each type are summarized in Chap. 20. (See also Ref. 1, Chap. 8.)
The static error coefficients influence the log magnitude diagram in an easily visualized way and lead to a method of control system classification.

[Figure: FIG. 1. Sample log magnitude diagrams showing influence of static error coefficients: a type 0 system (0 db/decade initial slope at height 20 log Kp), a type 1 system (initial slope crossing 0 db at ω = Kv), and a type 2 system.]

For example, a control system with a transfer function that approaches Kp, a constant, at low frequencies (the open loop transfer function G(s) approaches Kp as s approaches 0) will have a log magnitude diagram which has zero slope at low frequencies. Such a system is called a type 0 system (zero slope at low frequencies) and can follow a steady input, r0, with an error of r0/(1 + Kp). If Kp is large, the error will be small. However, if a velocity signal, r = v0t, is applied, the error will continue to increase with time. For a system to follow such a signal with small error, a type 1 system is required which has a transfer function at low frequency of


Kv/jω (the limit of sG(s) as s → 0 equals Kv) and an initial slope on the log magnitude diagram of −20 db per decade. A type 1 system would follow the constant velocity input with an error of only v0/Kv. Similarly, a type 2 system (Ka/(jω)² transfer function, giving a slope of 2(−20) = −40 db per decade at low frequencies) is required to follow a constant acceleration input with moderate error.
The type of system determines the shape of the log magnitude diagram at low frequencies, and the gain magnitude of this portion of the diagram is determined by the static error coefficients. The intersection of the extensions of the initial log magnitude diagram slope with the ω = 1 line is at the value 20 log Kp, 20 log Kv, or 20 log Ka, as the case may be. The intersection of the extensions of the initial slope with the 0-db axis also has significance, as shown in Fig. 1.
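As a small numerical illustration (ours; the loop and its numbers are hypothetical), the coefficient and the intercepts can be read directly from an open loop transfer function:

    import math

    Kv, T = 50.0, 0.1   # hypothetical type 1 loop: G(s) = Kv/(s(1 + T s))

    def G(s):
        return Kv / (s * (1 + T * s))

    # Kv = limit of s*G(s) as s -> 0; a very small s approximates the limit.
    Kv_est = abs(1e-8 * G(1e-8))

    # The extension of the initial -20 db/decade slope reads 20 log Kv at
    # the w = 1 line and crosses the 0-db axis at w = Kv.
    print(f"Kv ~ {Kv_est:.1f}; value at w = 1: {20*math.log10(Kv_est):.1f} db")
    print(f"error following r = v0*t: v0/Kv = {1.0/Kv_est:.3f} v0")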
Low-Frequency Portion: Dynamic Error Coefficients. In addition to the steady-state characteristics, expressible in terms of the static error coefficients, it is often desirable to specify control system errors during a transient by means of dynamic error coefficients, defined in Chap. 20. That is,

    e = (1/K0)r + (1/K1)ṙ + (1/K2)r̈ + · · ·,

where ṙ, r̈, etc., are the successive derivatives of the input time function r, and K0, K1, K2, etc., are the dynamic error coefficients.
is valid during time intervals in a transient which are far displaced in
time from a discontinuity in the input function, r, and its derivatives.
The above equation converges quickly to useful values for slowly changing
input functions for which the higher order derivatives are small relative to
the lower order terms. The coefficients can be evaluated by straightforward
Laplace transform techniques, as given in Chap. 20. That is,
    1/Kn = (1/n!) lim (s→0) dⁿ/dsⁿ [E(s)/R(s)].

Some of the error coefficients, evaluated in this way, will be found
identical to the static coefficients of the previous paragraph. However,
additional coefficients will also be determined. The composition of these
generalized error coefficients can be seen from a general control system
transfer function (see Ref. 4).
    E(s)/R(s) = (n0 + n1 s + n2 s² + n3 s³ + · · ·) / (1 + d1 s + d2 s² + d3 s³ + · · ·).


The dynamic error coefficients for this system are:

    1/K0 = n0,
    1/K1 = n1 − (1/K0) d1,
    1/K2 = n2 − (1/K1) d1 − (1/K0) d2,
    · · ·
    1/Kk = nk − Σ (j = 0 to k − 1) (1/Kj) d(k−j).
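The recursion is mechanical. A sketch of it in code (Python; the function name and the example fraction are ours, not the handbook's):

    def dynamic_error_coeffs(n, d, k_max):
        # E(s)/R(s) = (n[0] + n[1] s + ...)/(1 + d[0] s + d[1] s^2 + ...).
        # Returns [1/K0, 1/K1, ..., 1/Kk_max] by the recursion above.
        inv_K = []
        for k in range(k_max + 1):
            nk = n[k] if k < len(n) else 0.0
            tail = sum(inv_K[j] * d[k - j - 1]
                       for j in range(k) if k - j - 1 < len(d))
            inv_K.append(nk - tail)
        return inv_K

    # Hypothetical error ratio E/R = (s + 0.1 s^2)/(1 + s + 0.1 s^2):
    print(dynamic_error_coeffs([0.0, 1.0, 0.1], [1.0, 0.1], 3))
    # -> [0.0, 1.0, -0.9, 0.8]; 1/K0 = 0 marks a type 1 system, and Kv = 1.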

The dynamic error coefficients in general are composed of the gain term
in combination with various sums and products of the system time constants. These coefficients are readily calculable and are valuable for
analysis purposes for a system of known transfer function. However,
they are not very useful in synthesizing the log magnitude diagram from
system requirements, because each of the coefficients is composed of a
number of parameters of the system characteristics. For this synthesis
work, a more direct procedure is outlined in the next paragraph.
Low-Frequency Portion: Transient Curve Fitting Procedure. A curve fitting procedure (Ref. 2) by which certain system error requirements are transformed directly to log magnitude values which the log magnitude diagram must exceed is useful. In this method, the expected transient input signals are matched by sinusoids. The principle is that if a control system can follow with small error sine wave inputs with amplitude, velocity, and acceleration components as great as those of the transient input, then it can follow the transient with small error. The worst transients that the control system will be expected to follow are presumably known, either in graphical or analytical form, together with the allowable errors during these transients. These transient time functions are plotted and fitted as closely as possible in various places with sine waves as indicated in Fig. 2. The amplitudes of these sine waves are A1, A2, A3, etc., with frequencies ω1, ω2, ω3. Then A1/E, A2/E, A3/E, etc., are the required gain magnitudes of the log magnitude diagram at ω1, ω2, ω3 if E is the allowable control system error. These points are shown in Fig. 3. The required log magnitude diagram must be above these points.
It is important to fit the input transient at several places, such as peaks and maximum slope points, so that broad coverage of requirements is established by several points on the log magnitude diagram. Sometimes it is advantageous to take the derivative of the input transient and fit it

with sine waves of amplitude V1, V2, etc., at frequencies ω1, ω2. These fits will establish points on the log magnitude diagram of magnitude V1/ω1E, V2/ω2E, etc. The procedure can be extended to higher derivatives also.
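Reduced to arithmetic, each fit contributes one required-gain point; a sketch (Python; the fitted amplitudes and frequencies are illustrative):

    import math

    E = 0.01   # allowable error, in the same units as the input
    # (amplitude A, frequency w) pairs fitted to the worst expected transient:
    fits = [(1.0, 0.5), (0.4, 2.0), (0.1, 8.0)]

    for A, w in fits:
        print(f"at w = {w:4.1f} rad/sec the diagram must exceed "
              f"{20.0 * math.log10(A / E):5.1f} db")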

[Figure: FIG. 2. Construction illustrating the curve fitting procedure: the transient input plotted against t, with fitted sine waves.]

[Figure: FIG. 3. Log magnitude points obtained from Fig. 2: 20 log (A1/E), 20 log (A2/E), and 20 log (A3/E) plotted at ω1, ω2, ω3; the required diagram must be above the points obtained by curve fitting.]

Mid-Frequency Portion of the Log Magnitude Diagram. Once the gain of the entire control system has been set so as to meet the requirements as outlined in the preceding paragraphs, it is usually desirable to reduce the system gain effective at the higher frequencies in order to reduce the susceptibility of the system to noise and other extraneous signals. However, this reduction in gain has to be achieved in such a manner that the system has the required stability. As explained in Chap. 21, stability may be assured by requiring the log magnitude diagram to have a slope of −20 db per decade in the vicinity of the crossover frequency. To obtain adequate stability, this −20 db per decade slope should extend for a frequency range of a decade or more. The use of the log magnitude-angle chart (Nichols chart) provides a measure of stability in terms of the maximum M of the closed loop frequency response. Such charts are given


in Chap. 21. To indicate approximate magnitudes, Fig. 4 shows the
maximum M that could possibly be obtained for a particular minimum
value of phase margin.
Often it is convenient to express the degree of stability, or damping, of a system by means of a damping factor. Strictly, a damping factor can be applied only to a system that can be described by a second order linear differential equation with constant coefficients, but it is frequently applied to higher order systems.

[Figure: FIG. 4. Maximum peak of frequency response, M, versus minimum phase margin, degrees.]

When the response is determined largely by two complex roots, which is fairly common, the closed loop response is characterized by a zero slope region of approximately unity gain at low frequencies followed by a resonant peak in the vicinity of crossover of the open loop. At frequencies above the resonant peak, the slope changes to −40 db per decade and then usually to even greater negative slopes. Thus for the frequency region from zero to somewhat above the resonant peak, many systems have much the same frequency characteristic as a second order system. For such a system the height of the resonant peak, when expressed as a numeric ratio, Mm, determines the damping factor, ζ, by the equation:

    Mm = 1 / (2ζ√(1 − ζ²)),    valid for 0 < ζ < 0.707.


Frequently it is convenient to measure the magnitude of the frequency response, Mc, at the corner frequency. For such a measurement, the damping factor is

    ζ = 1/(2Mc).
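Both relations invert readily; in sketch form (Python):

    import math

    def Mm_from_zeta(zeta):
        # Mm = 1/(2 zeta sqrt(1 - zeta^2)), valid for 0 < zeta < 0.707.
        return 1.0 / (2.0 * zeta * math.sqrt(1.0 - zeta**2))

    def zeta_from_Mc(Mc):
        # zeta = 1/(2 Mc), from the corner frequency measurement.
        return 1.0 / (2.0 * Mc)

    print(Mm_from_zeta(0.5))    # 1.155
    print(zeta_from_Mc(1.155))  # 0.433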

The damping of oscillations in a physical system is a function of the
damping factor. When a system is excited by a unit step function, the
magnitude of the first and successive overshoots is determined by the
damping factor as shown in Fig. 5. See also Chap. 20.
[Figure: FIG. 5. Magnitude of the first and successive overshoots versus damping factor.]

[Figure: FIG. 17. Internal rate feedback compensation: (a) log magnitude diagrams of the rate feedback component (tachometer), the closed internal loop B2/M, and the correction using the Nichols chart; (b) the open outside loop C/E1, the inverse of the internal feedback, and 20 log K1, with the block diagram of the loop (gain K1, power element, internal tachometer feedback).]

[Figure: FIG. 18. Internal rate plus lead network feedback: (a) log magnitude diagrams of the power element, the open internal loop with rate plus lead network feedback, and the closed internal loop B2/M; (b) the open outside loop C/E1 and the inverse of the internal feedback, with the block diagram of the loop (gain K1, power element, internal rate plus lead feedback).]

and phase diagram for an internal rate feedback element, the closed internal
loop and the log magnitude diagram of the outside open loop. The gain
of the internal loop must be greater than unity at frequencies up to and
somewhat beyond the desired crossover frequency of the outside loop and
the gain, Kl, may be set to give the proper crossover.
Much of Fig. 17 can be constructed by using approximate straight line
diagrams, but portions of the diagram in the frequency region near crossover of the internal loop should be corrected by using accurate magnitude
and phase values from the log magnitude-angle diagram (Nichols chart,
see Chap. 21).
Rate and Lead Network Feedback. In some systems it is necessary to obtain higher gain at the low frequencies. This can be obtained in a system using internal feedback by adding a lead network to the rate feedback. Figure 18 shows the log magnitude and phase diagram of such an internal feedback element along with the system diagram leading to the open loop diagram of C/E1(s).
For the higher order lead networks, the inner loop may be unstable by itself. In such a situation it is necessary to determine the number of positive real poles in the closed inner loop in order to apply a stability criterion to the outer loop. An example of such a system is shown in Figs. 19 and 20. The closed inner loop contains two positive real poles as indicated by the two encirclements of the −1 point by the Nyquist sketch of the open inner loop transfer function. For the outer loop, and therefore the whole system, to be stable, the Nyquist plot must encircle the −1 point twice counterclockwise. It does this as indicated in Fig. 20.

[Figure: FIG. 19. Nyquist diagram of unstable inner loop as indicated by two clockwise rotations.]

[Figure: FIG. 20. Nyquist diagram of outside loop. Stability is indicated by two counterclockwise rotations about −1.]

Lead Network Feedback. A lead network is a rate measuring device that is somewhat inferior in performance to the components mentioned in the paragraph above, but because of simplicity and low cost it is often used in place of these more expensive components. Such a network is equivalent to a tachometer, or other rate device, at low frequencies, but it does not have rate characteristics above a frequency which is equal to 1/T, where T is the time constant of the network.

[Figure: FIG. 21. Comparison of tachometer and lead network characteristics (log magnitude versus frequency).]

Figure 21 shows a comparison of
tachometer and network characteristics. The tachometer also has a high-frequency droop in its log magnitude diagram, but this is usually well above any frequencies of interest in the feedback control system. The lead network can also have a rate characteristic out to high frequencies by reducing the network time constant, but this lowers the gain of the circuit at the frequencies of interest. Such a lead network feedback is particularly useful in systems in which a d-c voltage is one of the intermediate outputs. An example is the voltage rate feedback around the amplidyne in a voltage regulator.
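The comparison of Fig. 21 is easy to reproduce numerically; a sketch (Python; the gain and time constant values are illustrative):

    import math

    Kt, T = 0.1, 0.1   # rate gain chosen so the two devices agree at low
                       # frequencies; T is the lead network time constant

    def tachometer(w):
        # Ideal rate device: magnitude proportional to w at all frequencies.
        return Kt * w

    def lead_network(w):
        # |Kt jw/(1 + T jw)|: rate-like only below about w = 1/T.
        return Kt * w / math.hypot(1.0, T * w)

    for w in (0.1, 1.0, 10.0, 100.0):
        print(f"w = {w:6.1f}: tachometer {20*math.log10(tachometer(w)):7.1f} db,"
              f" lead network {20*math.log10(lead_network(w)):7.1f} db")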
Multiloop SysteDls. In the preceding discussion, internal feedback
loops have been closed to form a portion of the open outside loop of a
feedback control system. In the same way this second, or outside, loop
can be closed by using the log magnitude-angle diagram, and becomes a
portion of a third feedback control loop. This procedure can be extended
to any number of concentric feedback loops, such as may be present in
a complex feedback control system. However, the block diagram of complex control systems often is not in the form of concentric loops. Chapter
20 shows how intertwined block diagrams can usually be transformed into
concentric loops by making use of superposition rules.
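The loop-closing arithmetic described above is easy to mechanize. The following sketch is a modern numerical illustration, not from the original text; the power element, the tachometer constant, and the gain K1 are assumed values chosen only to show the procedure of closing the inner rate-feedback loop and then forming the open outside loop.

import numpy as np

w = np.logspace(-2, 3, 2000)         # rad/sec
s = 1j * w

G_power = 10.0 / (s * (0.5*s + 1) * (0.05*s + 1))   # assumed power element
H_rate  = 0.2 * s                                   # assumed rate (tachometer) feedback

G_inner_closed = G_power / (1.0 + G_power * H_rate) # inner loop closed
K1 = 2.0                                            # assumed outer series gain
G_outer_open = K1 * G_inner_closed                  # open outside loop C/E

db = 20 * np.log10(np.abs(G_outer_open))
phase_deg = np.degrees(np.unwrap(np.angle(G_outer_open)))
i = np.argmax(db < 0.0)              # first frequency past 0-db crossover
print("gain crossover near %.2f rad/sec, phase margin %.1f deg"
      % (w[i], 180 + phase_deg[i]))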
Alternate Methods of Representation

The preceding paragraphs have shown how the log magnitude diagram
of a power element can be modified by the addition of series or feedback
components to obtain the log magnitude diagrams synthesized in Sec. 1.
There are several other forms in which these same data may be presented
and handled to obtain the same desired results. Some of the more commonly used forms are the Nyquist diagram, the inverse complex plane
diagram, and the root locus plot.
Nyquist Diagram and Inverse Complex Plane Diagram. The use
of these diagrams is described in detail in Chap. 9 of Ref. 1. Since these
diagrams contain exactly the same information as the log magnitude diagram, essentially the same principles as described above may be used.
The steps may be summarized:
1. Select the starting axis (type of system) and gain factor from the
static error coefficient requirements.
2. From stability and transient response requirements, determine the
maximum allowable M and draw in this M circle.
3. By using the gains established in step 1 and the chosen power element,
draw a Nyquist diagram.
4. Add frequency sensitive networks (or proper internal feedback loops)
as needed to reshape the diagram to avoid the required M contour. See

Fig. 22. (This is a trial and error process which will become more efficient
with the user's experience.)
FIG. 22. Nyquist diagram showing synthesis procedure. (Labels in the figure: Nyquist diagram of power element using required gain; system Nyquist diagram; starting axis and low-frequency gain established by requirements.)

The Nyquist diagram is used in this discussion because of its historical
position, although it is somewhat easier to use the inverse complex plane
plot in this type of presentation.
Root Locus Plots. (See Ref. 6.) This is essentially a complex plane
graphical representation of the pole-zero configuration synthesis presented
in Sect. 1. In this plot, the locations of the closed loop poles are traced on
the complex plane as the open loop gain is varied. Use of this diagram
may be broadly outlined in steps analogous to those of the preceding paragraph:


1. Select the closed loop poles from system specifications of performance
and stability. These may be located on the complex plane.
2. Start with the poles and zeros of the power element and draw the
root locus plot.
3. Add open loop pole and zero combinations to modify the root locus
plot to pass through the required closed loop poles. Reference 6 indicates
optimum selections of added pole and zero configurations to achieve the
desired changes in locus shape.
Figure 23 indicates the above steps.

FIG. 23. Root locus plots showing synthesis procedure. (The figure shows the root locus of the power element, the root locus of the compensated system, and the closed loop poles at the required gain. Legend: power element poles; x, added compensating poles; o, added compensating zeros; required closed loop dominant poles.)
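The steps above can likewise be sketched numerically, since the closed loop poles for a gain K are the roots of den(s) + K num(s) = 0. In the brief sketch below (assumed pole-zero values for illustration, not those of Fig. 23), a compensating zero at s = -1 and a pole at s = -10 are added to a power element with poles at s = 0 and s = -2.

import numpy as np

# Root locus sketch: closed loop poles are the roots of den(s) + K*num(s) = 0
# as the open loop gain K is varied.
num = np.poly([-1.0])                 # added compensating zero at s = -1
den = np.poly([0.0, -2.0, -10.0])     # power element poles 0, -2, added pole -10

num_p = np.concatenate([np.zeros(len(den) - len(num)), num])
for K in (0.5, 2.0, 8.0, 32.0):
    poles = np.sort_complex(np.roots(den + K * num_p))
    print("K = %5.1f  closed loop poles:" % K, np.round(poles, 3))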

Comparison of Alternate Methods of Representation with the Log Magnitude Diagram. The log magnitude and phase diagrams con-

tain the same information as the Nyquist or inverse complex plane diagram.
Because the log magnitude diagram is much easier to construct than the
other two diagrams, and required modifications to meet specifications are


more easily visualized and constructed on the log magnitude diagram,
there is normally no reason for using the Nyquist or inverse diagram in
control system design work. In very complex systems, containing several
positive real roots, it may be desirable to make rough order of magnitude
Nyquist sketches to check rotations about the -1 point in order to establish stability, but the actual numerical work should be done using log magnitude diagrams.
The log magnitude diagram is also easier to construct than the root
locus plot and would normally be used for problems concerned with stability, bandwidth, static and low-frequency errors. However, the root locus
plot has more specific information regarding actual transient response
characteristics and damping factor, and would be used in problems in
which this type of information is of primary importance.
Design Aids
Charts of Electric Networks. Tables 3 through 7 show many of the
electric networks useful in compensating d-c systems. All these networks
are of the resistor and capacitor type since any practical type of frequency
characteristic can be obtained with these components. Inductance is
also a useful circuit component, but large time constants cannot be obtained in sizes competitive with resistance-capacitance components.

TABLE 3. STABILIZING NETWORKS: LEAD NETWORKS WITH 20 DB/DECADE SLOPE (Ref. 2)

TABLE 4. STABILIZING NETWORKS: LEAD NETWORKS WITH 40 DB/DECADE SLOPE (Ref. 2)

TABLE 5. STABILIZING NETWORKS: LAG NETWORKS WITH 20 DB/DECADE SLOPE (Ref. 2)

TABLE 6. STABILIZING NETWORKS: LAG NETWORKS WITH 40 DB/DECADE SLOPE (Ref. 2)

TABLE 7. STABILIZING NETWORKS (Ref. 2)

(Tables 3 through 7 give resistance-capacitance network schematics with their attenuation characteristics, transfer functions, and time constants; entry (a) of Table 7 is the bridged-T network discussed later in this chapter. The tabulated circuits and formulas are not legibly reproducible here.)
TABLE 8. MECHANICAL COMPONENTS, LEAD

(Mechanical lead networks of springs and dampers, with their log magnitude characteristics and transfer functions.)
TABLE 9. MECHANICAL COMPONENTS, LAG

(Mechanical lag networks, with their log magnitude characteristics and transfer functions.)
TABLE 10. MECHANICAL COMPONENTS, LAG-LEAD

(Mechanical lag-lead network, with its log magnitude characteristic and transfer function.)
TABLE 11. MECHANICAL-HYDRAULIC COMPONENTS

(Valve-and-piston networks with their log magnitude characteristics and transfer functions; K = velocity of piston per unit valve displacement.)
TABLE 12. PNEUMATIC COMPENSATING COMPONENTS: APPROXIMATE RELATIONSHIPS FOR HIGH LOOP GAIN CONTROLLERS, ε << 1

(Pneumatic lead and lag controller relationships, e.g., for the lead component (Pm - Po)/(Pc - Pr) = (A1/A2)(1 + T1s)/(1 + kT1s), where k = change in P1 for a unit change in Pm when m is completely closed, and ε = a system constant related to the loop gain.)
FIG. 26. Characteristics of the bridged-T network.

As indicated in Table 7(a) the transfer function of the bridged-T network is

eo/ei = (T1T2s² + T2s + 1) / (T1T2s² + {[1 + (N/D)]T1 + T2}s + 1),

where

T1 = [DN/(D + N)]HRC,
T2 = A(D + N)RC.

The circuit is adjusted until T1T2 = 1/ωc², where ωc is the carrier frequency. The minimum value of the transfer function occurs at ω = ωc, where the value is

1/g = 1/{1 + [1 + (N/D)](T1/T2)}.

The above equation can be rewritten

eo/ei = [(s²/ωc²) + √(T2/T1)(s/ωc) + 1] / [(s²/ωc²) + √(T2/T1) g(s/ωc) + 1].

Characteristics of this network for several values of T2/T1 are shown in Fig. 26. It is seen that this can approximate the ideal characteristic of Fig. 25. The factor T2/T1 largely influences the rate of change of angle near ωc, whereas g determines the magnitude of phase change that can be obtained.
Characteristics of the parallel-T network can be found in Ref. 5.
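A brief numerical sketch of the bridged-T equations above follows; it is a modern check, not part of the original text. The ratio N/D is assumed equal to 1 and a 400-cps carrier is assumed, both purely for illustration. The sketch shows the notch depth 1/g at ω = ωc and the way T2/T1 sets the sharpness of the notch.

import numpy as np

wc = 2 * np.pi * 400.0                          # assumed 400-cps carrier
for ratio in (0.1, 0.01):                       # T2/T1
    T1 = 1.0 / (wc * np.sqrt(ratio))            # enforces T1*T2 = 1/wc**2
    T2 = ratio * T1
    g = 1.0 + (1.0 + 1.0) * (T1 / T2)           # g = 1 + [1 + (N/D)](T1/T2), N/D = 1
    s = 1j * wc * np.array([0.8, 1.0, 1.25])
    eo_ei = ((s/wc)**2 + np.sqrt(T2/T1)*(s/wc) + 1) / \
            ((s/wc)**2 + np.sqrt(T2/T1)*g*(s/wc) + 1)
    print("T2/T1 = %5.2f  notch depth 1/g = %.4f  |eo/ei| at 0.8, 1.0, 1.25 wc:"
          % (ratio, 1.0/g), np.round(np.abs(eo_ei), 4))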
Sensitivity to Carrier Frequency Shift. The most serious weakness
of a-c feedback control systems using series networks is that the system
performance is impaired by normal shifts in the frequency of the carrier.
For example, in an aircraft power system, the 400-cps power may be frequency regulated to only 5 or 10 per cent. In many a-c feedback control
systems using series stabilizing networks, this amount of frequency shift
will render the system completely useless.
Figure 27 shows the effects of a carrier frequency shift on the operation
of an a-c stabilizing network. These are:
1. The gain of the control system at low frequencies is increased. This
may result in saturation in subsequent elements in the control system.
2. The phase of the carrier is shifted. This means that any phase sensitive devices such as discriminators or a-c motors will not operate at best
efficiency.

FIG. 27. Effect of carrier frequency shift on the operation of an a-c series stabilizing network.

3. The phase lead at the control system frequencies (aωc) is decreased.
Thus the network does not perform the function for which it was intended
and instability may result.
It is noticed that "fast" control systems, which are designed to have a high crossover frequency (large values of a), are less susceptible to carrier
frequency shifts than systems which have a large effective lead network
time constant.
Tachometer Stabilization. Alternating-current tachometers are excited by the carrier frequency along with other components in an a-c
control system and generate an amplitude-modulated signal proportional
to velocity. This operation is not hindered by reasonable changes of the
carrier frequency so that this method of stabilization is not subject to the
limitations of a-c stabilizing networks. The analysis or synthesis of a-c
systems using tachometers can proceed just as in the case of the d-c system
described in Sect. 2 under Rate Feedback.
Other Techniques. The previous sections have discussed means of
stabilizing all a-c feedback control systems by using rate-producing components. Often it is necessary to add gain at low frequencies and reduce
this gain below the desired crossover frequency by means of an integrating
or reset component. This is the type of system (using d-c signals) described
in Sect. 2, Phase Lag Compensation.


Theoretically, an a-c carrier lag network can be constructed by using a
bridged-T circuit in the feedback channel of a feedback amplifier. However, since the time constant of a lag network has to be considerably larger
than used in a lead network, the effect of carrier frequency shifts usually
makes this method impractical.
Commonly lag networks are obtained by rectifying the carrier to direct
current and then using d-c networks. This procedure then is the same as
that of Sect. 2 and the system is no longer an all a-c system.
Another method to obtain an effective lag network is to use a small
"reset" servo in parallel with the signal channel, as shown in Fig. 28.
FIG. 28. Reset servo channel in parallel with signal channel. (The figure gives the transfer function of the reset servo, a gain K with tachometer feedback; the formula is not legibly reproducible here.)

The servo channel has high gain at low frequencies but, because of the
tachometer feedback, this gain falls below that of the regular signal channel
below the crossover frequency.
4. OPEN-CLOSED LOOP CONTROL

Open-closed loop control, sometimes called schedule and trim, is not so
much a different kind of control as it is a different way of visualizing or
synthesizing a control system. The principle is that an open loop control
system, although not accurate enough for the complete control, responds
predictably and stably to an input signal, and it can be used as an approximate, or first order correction, control system. Then the required
accuracy can be obtained as a correction or "trim" to the open loop and
is accomplished by the use of a relatively slow but high gain feedback
loop. This is illustrated in the block diagram of Fig. 29, which is modified
for analysis purposes in Fig. 30. The only closed loop to consider is the easily stabilized, low crossover frequency loop; the required high-frequency response is obtained by the open-ended forcing function. In


the actual system this forcing action is attained by the parallel signal
channel to the power element.
An allied situation exists when there is difficulty in measuring the control
system output accurately and immediately. The trouble may be in an
FIG. 29. Block diagram of open-closed loop control. (The schedule channel feeds the power element directly; the trim loop adds the correction.)

inherent delay in the measuring device, such as the time lag of a thermocouple measuring temperature, or it may be caused by the need for a
smoothing and averaging process to attain accuracy from noisy data. In
such cases, the delayed but accurate measurement can be used in a trim
feedback control loop and then an internal, fast response, feedback loop
is formed by using an alternate measurement. This alternate quantity is

FIG. 30. Modification of Fig. 29 for analysis.

related to the desired output in a known way, such as a pressure change
which accompanies a change in temperature, but the accuracy of such a relationship is not high enough to use the alternate quantity as the ultimate
measurement of the desired output.
Open-closed loop control is covered in detail in Ref. 9.
ACKNOWLEDGMENT

Tables 3 to 7 are reproduced with permission from H. Chestnut and R. W. Mayer,
Servomechanisms and Regulating System Design, Vol. II, Wiley, New York, 1955.

REFERENCES

1. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. I, Wiley, New York, 1951.
2. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. II, Wiley, New York, 1955.
3. J. G. Truxal, Automatic Feedback Control System Synthesis, McGraw-Hill, New
York, 1955.
4. Paul E. Smith, Jr., Design regulating systems by error coefficients, Control Eng., 2,
69-75 (1955).
5. Leonard Stanton, Theory and application of parallel-T resistance-capacitance frequency-selective networks, Proc. I.R.E., 34, 447-456 (1946).
6. W. R. Evans, Control System Dynamics, McGraw-Hill, New York, 1954.
7. J. E. Gibson, 14 ways to generate control functions mechanically, Control Eng., 2,
65-69 (1955).
8. D. M. Considine, Editor, Process Instruments and Controls Handbook, McGraw-Hill, New York, 1957.
9. John R. Moore, Combination open-cycle, closed-cycle control systems, Proc. I.R.E.,
39, 1421-1432 (1951).

E

FEEDBACK CONTROL

Chapter 24

Noise, Random Inputs, and Extraneous Signals
D. L. Lippitt

1. Introduction 24-01
2. Mathematical Description of Noise 24-02
3. Measurement of Noise 24-06
4. System Response to Noise 24-11
5. System Design in the Presence of Noise 24-15
References 24-19

1. INTRODUCTION

Linear systems can be designed to obtain a desired response to commands
and disturbances which may be exactly defined either by an equation or
by a graphical plot (Chaps. 19 through 23). In many cases inputs can
be described adequately only in a statistical manner. Examples are the
jitter observed in automatic radar tracking systems and gust disturbances
to an aircraft. This chapter covers methods for:
(a) Measuring and describing statistical inputs.
(b) Computing the system response to such inputs.
(c) Specifying optimum designs.


2. MATHEMATICAL DESCRIPTION OF NOISE

Random processes are described in Chap. 12, Sect. 16, and Chap. 13,
Sect. 2. It is sufficient to note here that a random process has a complete
set of probability distribution functions. If these distributions are independent of time, the process is stationary and its characteristics can be
defined by time averages (Ref. 1).
Autocorrelation. The most useful description of a random process for
control system analysis is the autocorrelation function φ defined by eq. (1) for a stationary function of time x(t):

(1)  φxx(τ) = lim_{T→∞} (1/2T) ∫_{-T}^{+T} x(t) x(t + τ) dt.

Figure 1 graphically illustrates eq. (1). For nonstationary processes the
autocorrelation may be described by an ensemble average that is a function

FIG. 1. Illustration of the computation of the autocorrelation function.

of time as well as τ. This definition is given in eq. (2):

(2)  φxx(t, τ) = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} x1x2 p(t, x1; t + τ, x2) dx1 dx2,

where p(t, x1; t + τ, x2) is the joint probability density of x1 at time t and x2 at time (t + τ). Important properties of the autocorrelation function for
stationary series are:
(3)  φxx(τ) ≤ φxx(0),

(4)  φxx(τ) = φxx(−τ),

(5)  φxx(∞) = (x̄)².

In eq. (5), the bar indicates a time average.
An interesting example is the autocorrelation of the function sin ωt:

(6)  x(t) = A sin ωt,

φxx(τ) = lim_{T→∞} (1/2T) ∫_{-T}^{+T} A² sin ωt sin ω(t + τ) dt = (A²/2) cos ωτ.

Although sin ωt is not strictly stationary, this example illustrates the effect of a pronounced periodicity in noise data. If it exists, it will show up in the autocorrelation as a cosine function.
Cross-Correlation. In some cases a control system will have two inputs, x and y, which are not completely independent. The relationship is expressed by the cross-correlation function defined by eq. (7) for stationary series:

(7)  φxy(τ) = lim_{T→∞} (1/2T) ∫_{-T}^{+T} x(t) y(t + τ) dt.

For nonstationary series φxy must be expressed as an ensemble average as given by eq. (8):

(8)  φxy(t, τ) = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} xy p(x, t; y, t + τ) dx dy.

Important properties of the cross-correlation function for stationary series are:
(9)  [φxy(τ)]max < x̄² or ȳ² (whichever is larger),

(10)  φxy(τ) = φyx(−τ).

FIG. 2. (a) and (b) Examples of autocorrelation and spectral density pairs: Φ(ω) = P for −ω0 < ω < ω0 and zero elsewhere pairs with φ(τ) = 2ω0P[sin ω0τ/ω0τ].

The autocorrelation of the sum of two correlated functions is given by eq. (11):

(11)  φ(x+y)(τ) = φxx(τ) + φyy(τ) + φxy(τ) + φyx(τ).

If x(t) and y(t) are independent:
(a) The cross-correlations become constants equal to the products of their means, or x̄ȳ.
(b) The autocorrelation of the sum becomes the sum of the individual autocorrelations plus twice the product of the means, or φxx(τ) + φyy(τ) + 2x̄ȳ.

Spectral Density. An alternate description of a stationary random
process is the spectral density, Φ(ω). It is a measure of the distribution of energy in the frequency spectrum. For a voltage wave the units would be volts² per radian per second.
The following discussion is not rigorous but will show the physical significance of Φ(ω). Assume that several samples of noise of duration T

FIG. 2. (c) and (d) Examples of autocorrelation and spectral density pairs: in (c), a resonant spectral density with denominator [α² + (β + ω)²][α² + (β − ω)²] pairs with φ(τ) = e^{−α|τ|} cos (βτ + ψ); in (d), Φ(ω) = P for all ω pairs with φ(τ) = 2πPδ(τ).

seconds have been expanded in a Fourier series of the form shown in eq. (12):

(12)  x(t) = Σ_{n=0}^{∞} cn cos (ωnt − ψn),

where

ωn = 2nπ/T,
an = (2/T) ∫_0^T x(t) cos ωnt dt,
bn = (2/T) ∫_0^T x(t) sin ωnt dt,
cn² = an² + bn²;  ψn = tan⁻¹ (bn/an).

Assume that T is very long compared to the longest periodicity present in the function. If the c's and ψ's for a given value of n are considered over a large number of samples, it will be found that the ψ's are uniformly distributed between +π and −π and that the cn²'s have an average value c̄n². A knowledge of the c̄n²'s for a given process is sufficient to predict the

output of a linear control system with a transfer function G(jω) between the noise input and the system output. The mean square of the output is

(13)  c̄² = Σ_{n=0}^{∞} (c̄n²/2) |G(jωn)|².

The experimental determination of the c̄n²'s would be relatively inefficient. However, the c̄n²'s are related to the spectral density by eq. (14) for large T:

(14)  c̄n² = (8π/T) Φ(ωn) = 4 Δω Φ(ωn),

where ωn = nΔω = 2πn/T. Hence if the average value of the input is zero, c0 is zero and eq. (13) becomes eq. (15):

(15)  c̄² = 2 ∫_0^∞ Φ(ω) |G(ω)|² dω.

The spectral density is related to the autocorrelation function by the Fourier cosine transform as shown in eqs. (16) and (17):

(16)  Φ(ω) = (1/2π) ∫_{-∞}^{+∞} φ(τ) cos ωτ dτ,

(17)  φ(τ) = ∫_{-∞}^{+∞} Φ(ω) cos ωτ dω.

The cross-spectral density Φxy(ω) bears the same transform relationship to the cross-correlation function φxy(τ) as the spectral density does to the autocorrelation function.
Figure 2 shows several pairs of spectral density functions and autocorrelation functions.
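The transform pair of eqs. (16) and (17) can be verified numerically. The sketch below is a modern check, not part of the original text; it applies eq. (16) to the known pair φ(τ) = e^{−α|τ|}, whose spectral density in this convention is Φ(ω) = α/[π(α² + ω²)].

import numpy as np

a = 2.0
tau = np.linspace(-40.0, 40.0, 400001)     # long enough that phi has decayed
phi = np.exp(-a * np.abs(tau))

for w in (0.0, 1.0, 5.0):
    PHI = np.trapz(phi * np.cos(w * tau), tau) / (2 * np.pi)   # eq. (16)
    print(w, round(PHI, 6), "analytic:", round(a / (np.pi * (a*a + w*w)), 6))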
3. MEASUREMENT OF NOISE

The greatest problem involved in the analysis of the response of a control
system to a random input is obtaining the required characteristics of the
input. If the input is stationary, long samples are necessary and numerous
calculations must be made. In most cases the use of high-speed digital
computers is necessary if the job is to be completed in reasonable time. If
the input is nonstationary, the magnitude is multiplied many times since
the results must be calculated separately for each value of time.
Calculation of φ(τ) for Stationary Inputs. The most straightforward method of analysis if the input is stationary is to compute the autocorrelation function defined by eq. (1). The approximate form for calculation is given by eq. (18):

(18)  φ(mΔτ) = [1/(N − m + 1)] Σ_{n=0}^{N−m} xn xn+m,

where Δτ is the time interval at which values of the function are read and xn is the value of the function nΔτ seconds from the beginning of the sample.

Sampling Rate. The value of Δτ is set by Shannon's sampling theorem:

(19)  Δτ = 1/(2fs).

The value of fs is the highest frequency present in the data. In general, this will not be known. In most control systems there are considerations other than noise which set an upper bound on the system band pass, so that a filter may be inserted in the device which records the sample to eliminate frequencies not of interest. This is desirable since by increasing Δτ the number of calculations is reduced.
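As a concrete illustration of eqs. (18) and (19), the sketch below (a modern example with synthetic data; the record length, tone, and noise level are assumed values) samples a one-cps tone in noise at the interval Δτ = 1/2fs and forms the estimate φ(mΔτ).

import numpy as np

fs = 10.0                          # highest frequency of interest, cps
dtau = 1.0 / (2.0 * fs)            # sampling interval, eq. (19)
rng = np.random.default_rng(0)

t = np.arange(0.0, 200.0, dtau)    # 200-second record
x = np.sin(2*np.pi*1.0*t) + 0.5*rng.standard_normal(t.size)  # tone plus noise
N = x.size - 1

def phi(m):
    # eq. (18): average of the products x_n * x_(n+m), n = 0 .. N - m
    return np.dot(x[:N - m + 1], x[m:N + 1]) / (N - m + 1)

for m in (0, 10, 20, 40):          # lags of 0, 0.5, 1.0, 2.0 seconds
    print("phi(%2d dtau) = %7.4f" % (m, phi(m)))

The one-cps periodicity shows up as the expected cosine in the estimates, as discussed under eq. (6).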
Required Range of τ. The maximum value of τ is determined by the use to which φ(τ) is to be put. If it is desired to compute the variance of a system output, reference to eq. (29) will show that mmax Δτ should equal the longest anticipated settling time of the system output to an impulse applied at the noise input.
An insight into the effect of using a finite value, mmax, can be obtained by performing the integration of eq. (16) over a finite range, or by multiplying the true autocorrelation function by a function u(τ) which equals unity for −T < τ < +T and zero elsewhere. Then the approximate spectrum is given by eq. (20):

(20)  Φapprox(ω) = (1/2π) ∫_{-∞}^{+∞} u(τ) φ(τ) cos ωτ dτ.

Since the true spectrum is the Fourier transform of φ(τ), and (T/π)(sin ωT/ωT) is the transform of u(τ), the approximate spectrum is given by eq. (21):

(21)  Φapprox(ω) = ∫_{-∞}^{+∞} Φ(ω − a) (T/π)(sin aT/aT) da.

For example, if the time function were a pure sine wave of frequency ω0 of unity power, the true spectrum would be given by eq. (22):

(22)  Φ(ω) = ½[δ(ω0) + δ(−ω0)],

where δ(ω) = impulse at ω.

Then if the autocorrelation were computed only for (−T < τ < +T), the approximate spectrum computed from the result would be given by eq. (23):

(23)  Φapprox(ω) = (T/2π)[sin (ω − ω0)T/((ω − ω0)T) + sin (ω + ω0)T/((ω + ω0)T)].

This spectrum exhibits side lobes with negative values of Φ(ω) which are physically impossible.
Required Sample Length. The required sample length for computing φ(τ) depends on the use to which the result will be put and to a certain extent upon the frequencies contained in the input function. A useful rule of thumb is that N in eq. (18) should be at least 10 times the maximum value of m. As a check, the autocorrelation can be computed for two samples of equal length. If the results are nearly equal, the samples are probably long enough. If not, the average of the two should be compared with results from a sample twice as long, and so forth until agreement is reached.
Once an approximate knowledge of the frequency components of the noise has been obtained, a better estimate can be made of the required sample length. Reference to eq. (28) shows that computing the system output from the autocorrelation function of the input is equivalent to computing the output directly by convolution and averaging the square. Hence, if a sample of the system output T seconds long is sufficient to give an accurate measure of the output mean square, a sample T + Ts seconds long, where Ts is the system settling time, is sufficiently long to compute the autocorrelation of the input. If φ0(τ) is the estimated output correlation, the ratio of the standard deviation of the computed output mean squares, σ[c̄0²], for samples T seconds long to the true mean square is given by eq. (24):

(24)  σ[c̄0²]/φ0(0) = [(2/(T²φ0²(0))) ∫_0^T (T − τ) φ0²(τ) dτ]^{1/2}.

Figure 4 shows a plot of this ratio, where φ0(τ) is e^{−α|τ|}, as a function of αT. References 2, 3, 4 give a more detailed consideration to the problem.
FIG. 4. Standard deviation of errors in computing the output mean square error from a finite length sample.

Nonstationary Inputs. Any process observed in nature is nonstationary in the strict sense of the term. However, in many cases the input characteristics will vary so slowly that samples long enough to compute φ(τ) can be considered stationary. In this case the techniques previously
discussed are applicable.
A slightly more difficult problem exists when the input changes too rapidly
to obtain a sufficiently long sample, but it still does not change appreciably
during the settling time of the control system. In this case, several recordings of the input must be obtained over the range of characteristics of
interest. Then short samples can be drawn from common points and the
autocorrelation averaged. The amount of computation required is vastly
increased.
One type of slowly changing nonstationary function can be treated in a simpler manner. If the frequency components retain the same amplitudes relative to each other but the absolute magnitude increases or decreases with time, a single recording can be used. The input is divided into several short samples. The autocorrelation of each sample is normalized, so that the function is divided by φ(0), and the normalized functions are averaged. The autocorrelation for any specific time is then the
averaged normalized autocorrelation function times the mean square value
of the input corresponding to that time. Note that this technique is
helpful only if it is known that the spectrums have the same form. Otherwise, more data would have to be taken to establish the point.
If the input characteristics vary appreciably during a settling time of the system, the problem becomes immensely complicated. To compute φ(τ, t1) it is necessary to average the products from many recordings of the
input. At least one hundred products would be necessary to obtain 10 per
cent accuracy when the output of a control system is calculated.
Correlation Computers. Special computers for the computation of
correlation functions can be built where many correlations must be done
and high-speed digital computers are not available. The basic principle
is illustrated in Fig. 5. The noise is recorded on a medium such as magnetic

FIG. 5. Correlation computer.

tape and played back through two reading heads spaced a distance d
apart. The second reading head receives the same signal as the first except

that it is delayed by a time equal to d divided by the tape speed. The two outputs are then multiplied and the result integrated. The output of the integrator is then given by eq. (25):

(25)  I = ∫ f(t) f(t − τ) dt.

I may then be divided by the time interval over which the run is made to
get the autocorrelation. Other possible recording media are photographic
film and ink recordings tracked by hand (Ref. 5). Still other methods use
pulse sampling and storage.
Computers of this type depend on the availability of the noise in the
right form. For instance, radar tracking data are usually taken with a
moving picture camera so that the data must be read frame by frame before
they are useful. Also, several records may have to be combined to arrive
at the noise. In such cases, it would be simpler to use a general purpose
digital computer.
The accuracy of such systems is limited by the recording mechanism,
multiplier, and integrator. To obtain reasonable accuracy these units
become bulky and expensive.
4. SYSTEM RESPONSE TO NOISE

Time Domain Methods. The response of a linear dynamic system to an input x(t) is given by the integral in eq. (26):

(26)  c(t) = ∫_0^∞ g(τ) x(t − τ) dτ.

Substituting eq. (26) in eq. (1) gives the autocorrelation function of the output if the input is stationary:

(27)  φcc(τ) = lim_{T→∞} (1/2T) ∫_{-T}^{+T} ∫_0^∞ g(s) ∫_0^∞ g(r) x(t + τ − r) x(t − s) dt ds dr.

The mean square value of the output is obtained by setting τ = 0 in eq. (27):

(28)  c̄²(t) = ∫_0^∞ g(s) ∫_0^∞ g(r) φxx(s − r) ds dr.

By an appropriate change of variables the alternate form of eq. (28) is given by eqs. (29) and (30):

(29)  c̄²(t) = ∫_{-∞}^{+∞} φxx(τ) φgg(τ) dτ,

(30)  φgg(τ) = ∫_{-∞}^{+∞} g(t) g(t + τ) dt.

Strictly speaking the transition from eq. (28) to eq. (29) is possible
only if the input x(t) is stationary. However, if the characteristics of
x(t) vary only slightly during one settling time of the control system and
if φ(τ) has been computed from a sample which is long compared to one
settling time, eq. (28) is approximately true for nonstationary inputs.
For the more general case of a linear time-varying system with a nonstationary input, the mean square output at time t is given by eq. (31):

(31)  c̃²(t) = ∫_0^∞ g(t, s) ∫_0^∞ g(t, r) φ(t − r, r − s) ds dr.

The autocorrelation function is defined in this case by eq. (2). The function g(t, s) is defined as the effect on the system output at time t of an impulse applied at time (t − s). The wavy line over c²(t) in the equation indicates an ensemble average rather than a time average.
Frequency Domain Methods. For cases where the spectral density is known, the mean square of the output is given by eq. (32):

(32)  c̄²(t) = 2 ∫_0^∞ Φ(ω) |G(jω)|² dω.

FIG. 6. Control system with two inputs: (a) actual circuit; (b) equivalent circuit.

A more general case is shown in Fig. 6 where there are two inputs to a control system and the inputs may be correlated. In terms of the equivalent circuit, the mean square output is given by eq. (33):

(33)  c̄²(t) = 2 ∫_0^∞ [|Gx(jω)|² Φxx(ω) + |Gy(jω)|² Φyy(ω) + Gx*(jω) Gy(jω) Φxy(ω) + Gx(jω) Gy*(jω) Φyx(ω)] dω.

The starred transfer functions are complex conjugates.
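A numerical sketch of eq. (32) follows (a modern check, not from the original text; the first order system and the input spectrum are assumed values, chosen so the answer is known in closed form).

import numpy as np

T = 0.5                                        # assumed system time constant, sec
G = lambda w: 1.0 / (1.0 + 1j * w * T)         # G(jw) of a first order system
a = 2.0
PHI = lambda w: a / (np.pi * (a*a + w*w))      # assumed input spectral density, unit power

w = np.linspace(0.0, 500.0, 200001)
c2 = 2.0 * np.trapz(PHI(w) * np.abs(G(w))**2, w)   # eq. (32)
print(round(c2, 4))                            # closed form gives 1/(1 + a*T) = 0.5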
Computer Methods. A modification of eq. (28) leads to an analog computer method for computing noise output. If the input noise has a constant spectral density K, the autocorrelation function becomes a delta function at τ = 0 with strength 2πK. Then the system output is given by eq. (34):

(34)  c̄²(t) = 2πK ∫_0^∞ g²(τ) dτ.

Generating the impulse response by analog techniques, squaring it, and
integrating the result give the mean square output. If the input noise

FIG. 7. Computer simulation to compute the mean square output for a correlated noise input.

does not have a constant spectral density, the output can be computed from the system shown in Fig. 7. The filter transfer function is specified by eq. (35):

(35)  |F(jω)|² = Φ(ω)/K.
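For a first order example the integral of eq. (34) can be carried out directly. The sketch below is a modern numerical check, not from the original text; the values of K and the time constant are assumed. It squares and integrates the impulse response and reproduces the closed form value πK/T.

import numpy as np

K = 0.01                                # assumed flat input spectral density
T = 0.5                                 # assumed system time constant, sec
t = np.linspace(0.0, 20.0, 200001)
g = (1.0 / T) * np.exp(-t / T)          # impulse response of 1/(Ts + 1)
c2 = 2 * np.pi * K * np.trapz(g**2, t)  # eq. (34)
print(round(c2, 5))                     # analytic: pi*K/T = 0.06283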

The same technique can be used for linear time-varying systems, and for nonstationary random inputs if the inputs are equivalent to stationary noise passing through a linear time-varying filter. For this case eq. (34) becomes eq. (36):

(36)  c̃²(t) = 2πK ∫_0^∞ g²(t, τ) dτ,

where g(t, τ) has the definition given following eq. (31). The function g(t, τ) with τ as the variable can be generated from the adjoint of the control system and a shaping filter (Ref. 6). The following is quoted from Ref. 7. "The adjoint is found from the analog of the original by:
1. Turning each element in the loop around and reversing the direction of signal flow.
2. Letting the variation of time-varying element start from some time t1 and run backward relative to the action in the original system.
3. Interchange the input and output of the system. The new input is δ(t − t1)."

The output is then get, T) as a function of T. This output can then be
squared and integrated to give a machine solution of eq. (36). Figure 8
shows an example of a system (a) and its adjoint (b).
FIG. 8. Analog of a system (a) and its adjoint (b).

Noise Generators. Many nonlinear control systems and systems involving a human operator will not yield readily to analytical techniques.
In these cases simulation using a random noise generator is required.
Several such generators (Refs. 8, 9, 10) are available. In general, the
output is a flat spectrum with various amplitude distributions possible.
Where the spectrum of the true input noise is known, a shaping filter, as
specified in eq. (35), can be used to modify the output of the noise generator.
Although the use of a noise generator and a simulated system provides
a simple solution to many complex problems, it is also a time-consuming


one. The methods of the previous section of this chapter provide an
indication of the sample lengths required where non-time-varying systems
are tested with stationary inputs. For time-varying systems, the answers
will be of interest at one or more times during the run. The number of
runs required is determined by standard statistical methods.
5. SYSTEM DESIGN IN THE PRESENCE OF NOISE

Previous sections of this chapter have shown methods for describing a
random input or disturbance and methods for computing the response of
a system to these inputs. It remains then to establish procedures which
can be used to apply these methods to the design of control systems.
Unfortunately each problem is a little or greatly different from any general
case so that the designer must examine his problem and determine what
methods are adequate.
Mean Square Error Criteria. Practically all the work covered in this
section aims at minimizing the mean square error of the system. In some
cases restraints are placed upon the solution in an attempt to conform
more nearly to the practical situation. The limitations of this approach
are listed below.
1. The mean square error may not be the proper criterion. For instance,
in a gun fire control system the object is to maximize the probability of
destroying the target.
2. The data concerning the system inputs will seldom be exact enough
to warrant an extended analysis or to justify the system complexity required to realize the desired response.
3. The optimum design may be very sensitive to practical limitations of
the system, such as gain variations.
As a result, it is suggested that formal methods of optimum design are
good guides for a design but that more useful results are obtained by
starting with a conventional design and varying the parameters to minimize the noise error as computed by formulas in Sect. 3 of this chapter
or by analog computers.
Optimum Design for Stationary Random Inputs. For the case where the signal and the noise enter the system at the same point and are both random and stationary and their cross-correlation is zero, the linear filter giving the least mean square error is given by eq. (37) (Ref. 11):

(37)  Gopt(jω) = [Φs(ω)/(Φs(ω) + Φn(ω))] Gd(jω),

where Φs(ω) = signal spectral density,
Φn(ω) = noise spectral density,
Gd(jω) = desired transfer function if noise was not present.
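The sketch below evaluates eq. (37) numerically; it is a modern illustration with an assumed signal spectrum, an assumed flat noise spectrum, and Gd(jω) = 1 (the desired output is the signal itself). The optimum filter passes the low frequencies, where the signal dominates, and attenuates where the noise dominates.

import numpy as np

a = 1.0
PHI_s = lambda w: 1.0 / (a*a + w*w)        # assumed signal spectral density
PHI_n = lambda w: 0.01 + 0.0*w             # assumed flat noise spectral density

w = np.array([0.1, 1.0, 10.0, 100.0])
G_opt = PHI_s(w) / (PHI_s(w) + PHI_n(w))   # eq. (37) with Gd(jw) = 1
print(np.round(G_opt, 3))                  # near 1 at low w, near 0 at high w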

The mean square error is given by eq. (38):

(38)  σ²ε min = 2 ∫_0^∞ [Φs(ω) Φn(ω)/(Φs(ω) + Φn(ω))] |Gd(jω)|² dω.

(49)  φ(t, t − τ) = ∫_0^∞ φ(t − τ, t − r) g(t − r, t) dr,

where g(t − r, t) is the response of the control system at time t to an impulse applied at time (t − r) and the autocorrelation functions are defined by eq. (2). Equation (49) will usually require a numerical solution.
The response get - r, t) is the optimum only at time t. A different
time will in general require a different response so that the system must
be time-varying. The values of get - r, t) are most easily obtained by
using adjoint techniques with an analog computer (Ref. 7). A cut-and-try variation of parameters can be used to approximate the calculated
optimum response.
System Optimization Under Constraints. The response called for by the minimum squared error criterion may make unrealistic demands on components by requiring extended ranges of operation to prevent saturation or by requiring highly rated components to prevent overheating. Hence, the optimization should more practically be carried out under the restraints of rms power dissipated in the output or signal level at various points in the system (Refs. 16 and 17).
EXAMPLE. Suppose that the input consists solely of a signal with spectral density Φs(ω).

TABLE 1. COMMON NONLINEAR PHENOMENA

1. Jump resonance.
Condition for occurrence: … > 1 must be true for phenomena to occur in single loop system with simple saturation (|G| = open loop gain, γ = phase margin). (See Ref. 3.)
Characteristics and/or effects: When excited by a sinusoidal driving signal, the resonance is normal for small amplitude signals. Theoretically, for larger amplitude signals the resonance bends, tracing a locus of resonant peaks from small inputs to large inputs (sketch A). In practice the three-valued function cannot be measured, but the response will appear to jump (sketch B). The jump will occur at different values of frequency depending upon whether the frequency is increasing or decreasing. The phase characteristic exhibits a corresponding jump. (Sketches A and B are not reproduced.)
General remarks: For jump resonance to occur the system must be second order or higher. To have significant bending the damping must be 0.1 or less in a second order system. The phenomena can occur in systems with saturation or increasing gain characteristics. The bending is to the right for increasing gain characteristics and to the left for saturation. The normalized second order equation of the type d²x/dt² + dx/dt + f(x) = F cos ωt (Duffing's equation) has been solved for various forms of the function f(x), and each case has exhibited the jump resonance when the viscous damping was small. (See Ref. 4.) The existence of the jump resonance can be confirmed by the use of describing functions. (See Refs. 3 and 5.)

2. Limit cycle or bounded oscillations.
Condition for occurrence: Occurs in unstable or conditionally stable nonlinear systems. Describing function analysis shows that the conditions |G| = 1 and γ = 0 must be met for the phenomena to occur in simple systems. For a stable system with an unstable limit cycle it is necessary to excite the system beyond the level of the limit cycle to obtain self-sustained oscillations.
Characteristics and/or effects: An unstable linear system can exhibit oscillations that grow without bound. A nonlinear system that is unstable can oscillate at fixed amplitudes. Such oscillations are referred to as limit cycle oscillations. Limit cycles can be either stable or unstable depending upon whether the oscillation converges or diverges from the conditions represented. Depending upon the system characteristics, the limit cycle oscillation can vary from nearly simple harmonic oscillation to a highly nonlinear, relaxation type oscillation. Self-excited oscillations arising in a stable system with an unstable limit cycle are referred to as soft oscillations. Self-sustained oscillations which occur after the system has been excited to a given level (unstable limit cycle) are referred to as hard oscillations.
General remarks: Limit cycle oscillations can arise from a wide variety of system conditions. Conditionally stable systems with saturation will contain both a stable and an unstable limit cycle, and an unstable system with saturation will have one stable limit cycle. System imperfections that appear at low signal levels (backlash, friction, etc.) can, under the proper conditions, cause limit cycle oscillations. Existence of this type of limit cycle makes it necessary to define instability in terms of the acceptable magnitude of an oscillation, since a low level nonlinear oscillation may or may not be detrimental to performance of the system. Because soft and hard types of oscillations can exist, the designer must specify the input range completely in evaluation or synthesis of a nonlinear system. Limit cycles can be most correctly explained by use of the phase plane; however, the magnitude and fundamental frequency of the limit cycle can be estimated to a first order of magnitude by means of describing functions.

3. Subharmonic generation.
Condition for occurrence: Appears in nonlinear systems excited sinusoidally. No general rules are available defining the necessary conditions for occurrence. The phenomena have been observed in lightly damped systems with nonlinear restoring force and in systems with nonlinear energy delays.
Characteristics and/or effects: When the output contains subharmonics of the input exciting frequency, the phenomenon is referred to as subharmonic generation.
General remarks: Systems with elements having hysteresis, i.e., backlash, magnetic hysteresis, friction, have been known to exhibit this type of performance when excited with a sinusoidal input. The transition from harmonic to subharmonic operation can be quite sudden, but once the subharmonic is established, it is often quite stable. (See Ref. 8 and its bibliography.)

4. Intermodulation effect on gain.
Condition for occurrence: Occurs in amplitude-sensitive nonlinear systems excited by two or more frequencies. The frequencies can be separate inputs or one input with a complex waveform. The amplitude of the complex wave must be sufficient to enter the nonlinear region.
Characteristics and/or effects: Because of the amplitude-sensitive nonlinearity the frequencies will be intermodulated. This causes the original frequency components to have different amplitudes and phase shift than obtained from the nonlinear system with only one frequency present. This can be interpreted as a different phase shift and/or attenuation through an element. The effect is also apparent when noise is present with the signal.
General remarks: In a simple saturating system the effect can be explained quite easily. Two frequencies are considered. After the saturation the amplitude of both will be reduced beyond that expected if only one frequency had been present. If we are considering the effective gain with respect to one of the two frequencies, the gain will have been reduced. This gain reduction in the open loop can be interpreted as reducing the gain crossover and therefore increasing the phase shift of the closed loop for the frequency being considered. (See Ref. 7.) By considering one of the frequencies as an extraneous signal, the effect of noise on the performance of a saturating system can be envisioned. The effect is particularly significant if the amplitude or phase shift of the closed loop is important to system performance.


A great deal of work of this type remains to be done in nonlinear
system analysis.
The major problem for the systems engineer lies in synthesis of a control.
In synthesis one needs in addition to a full appreciation of the characteristics and a complete understanding of the nature of the task to be
performed: (a) methods of rapidly, approximately estimating the effect of
different types of compensation in order to allow selection of a potentially
good approach and (b) having selected a general approach, a logical design
procedure which converges on the "best" design. Although no such
generally satisfactory method exists, the methods of linearization, describing functions, and phase plane analysis are powerful analytical tools for
attacking nonlinear problems. (See Sects. 3, 4, and 5.) By and large
these methods are not exact but often suffice for preliminary design calculations. The majority of these methods attempt to linearize the problem
sufficiently to allow the use of the well-known techniques used in the study of
linear systems. Because of the difficulty in providing generalized design
criteria or design charts for any but the simplest nonlinear system, the
synthesis of a system using nonlinear elements is primarily a cut-and-try
process tempered with common sense.
Unusual Phenomena Peculiar to Nonlinear Systems. Many unusual phenomena occur in nonlinear systems. In a linear system the
response to a given input defines the response to be expected from any
input. This is not true for a nonlinear system. Cases arise where performance may completely deteriorate between a step response and a sinusoidal response. Table 1 summarizes some of the more common types of
nonlinear phenomena which have been catalogued. The list demonstrates
that the designer must be aware of the peculiar characteristics exhibited
by nonlinear systems and completely specify the operating conditions in
order to proceed with an intelligent, efficient design. The types of nonlinear phenomena described in Table 1 are those most commonly encountered in nonlinear feedback control systems. Other types of nonlinear
phenomena have been catalogued; frequency entrainment, asynchronous
excitation and asynchronous quenching, parametric excitation, etc. See
Ref. 6 and its bibliography for more details on nonlinear phenomena.
3. METHODS OF ANALYSIS: LINEARIZATION

Frequency response analysis can be used only when the system is described by linear constant coefficient equations. Certain nonlinear systems can be linearized by use of the perturbation theory. The method
assumes that for very small deviations about the operating point the
system is linear. The perturbation method determines the coefficients of
the new linear equation describing the performance of the system. Once


reduced to a linear form, the usual frequency response techniques can be
applied.
Method of Evaluating Linearized Coefficients. A nonlinear function of one or more variables can be linearized if the function is analytic.
(Refs. 6 and 9.) Expand the function in a Taylor series about the operating point and neglect all the second order and higher derivatives. One
thereby considers only the incremental change about a nominal value.
If the function is f(x1, x2, x3, …, xn) = f(xi), then the Taylor expansion about the point ai (x1 = a1, x2 = a2, x3 = a3, etc.) is

y = f(ai) + Σ_{i=1}^{n} (∂f/∂xi)|_{xi=ai, xk=ak (k≠i)} (xi − ai) + higher order terms,

or

Δy = y − f(ai),

and

(1)  Δy = Σ_{i=1}^{n} (∂f/∂xi)|_{xi=ai, xk=ak (k≠i)} Δxi

for small values of xi − ai. Equation (1) is a linear relation between Δy and xi − ai, or Δxi, the incremental changes.
The accuracy of the approximation can be estimated by evaluating the next terms in the Taylor series, e.g., the second order terms are

(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²f/∂xi∂xj)|_{xi=ai, xk=ak (k≠i, j)} (xi − ai)(xj − aj).

Alternate Method of Evaluation of Linearized Coefficients. The same linearization can be accomplished by substituting a1 + Δx1, a2 + Δx2, etc., for x1, x2, etc., in the original function, and neglecting all second order and higher terms, i.e., terms containing products of the type Δx1Δx2. Here a1 and a2 are the values at the operating point, and Δx1 and Δx2 are small deviations.

EXAMPLE. Consider the nonlinear function w = f(x, y) = xy. Substituting x = x0 + Δx, y = y0 + Δy, w = w0 + Δw,

w0 + Δw = x0y0 + Δx y0 + x0 Δy + Δx Δy.

If Δx and Δy are small, the term Δx Δy is small and can usually be neglected:

w0 + Δw ≈ x0y0 + Δx y0 + Δy x0.

Since w0 = x0y0, the remaining two terms must be deviations; therefore,

(2)  Δw = y0 Δx + x0 Δy.

Equation (2) is a linear expression for Δw. For small variations about the operating point x0, y0, eq. (2) describes the performance. The coefficient y0 is the gain between Δx and Δw, and x0 is the gain between Δy and Δw.
The same relationship would have been obtained by using the Taylor series expansion.
Table of Useful Algebraic Approximations for Linearization. Table 2 is useful when making the above substitution into nonrational equations. The terms in Table 2 were determined by considering the series expansion of the closed form. The last column of the table can be used to estimate the accuracy of the approximation.
To use the table, it is necessary to work the expression into a nondimensional form.

TABLE 2. USEFUL ALGEBRAIC APPROXIMATIONS, m << 1

Algebraic Expression    Approximation    Next Term in Series
1. 1/(1 + m)            1 − m            +m²
2. (1 + m)^n            1 + mn           +n(n − 1)m²/2
3. e^m                  1 + m            +m²/2
4. log_e (1 + m)        m                −m²/2
5. sin m                m                −m³/6
6. cos m                1                −m²/2
7. (1 + m1)(1 + m2)     1 + m1 + m2      +m1m2


EXAMPLE. Consider the flow through a variable orifice. Flow, q, is given by

    q = Cd A √P,

where A = orifice area, a variable,
      P = pressure drop, a variable,
      Cd = flow coefficient, a constant.

By substituting the incremental change form,

    q0 + Δq = Cd (A0 + ΔA) √(P0 + ΔP).

Dividing the quantity under the radical sign by P0 yields

    q0 + Δq = Cd (A0 + ΔA) √P0 √(1 + ΔP/P0).

Using approximation 2 in Table 2 for n = 1/2 yields the expression

    q0 + Δq ≈ Cd (A0 + ΔA) √P0 (1 + ΔP/2P0).

By expanding and neglecting higher order and constant terms this reduces to

    Δq = (Cd A0/(2√P0)) ΔP + (Cd √P0) ΔA,

where the terms in parentheses are the equivalent gains between ΔP and ΔA and the flow deviation, Δq.
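The two equivalent gains can be evaluated directly at the operating point. A short sketch, in which the values of Cd, A0, and P0 are purely hypothetical:

```python
import math

# Hypothetical operating point for the orifice example.
Cd, A0, P0 = 0.6, 2.0e-4, 1.0e5   # flow coefficient, area, pressure drop

gain_dP = Cd * A0 / (2.0 * math.sqrt(P0))   # equivalent gain between dP and dq
gain_dA = Cd * math.sqrt(P0)                # equivalent gain between dA and dq

dq = gain_dP * 1.0e3 + gain_dA * 1.0e-5     # small deviations dP and dA
print(gain_dP, gain_dA, dq)
```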
Graphically Linearizing System Characteristic. The analytical expression for the function need not be known. If the graphical relationship between the variables is known, a linear expression can be obtained by considering incremental departures from the nominal values at the particular operating condition being considered.

FIG. 2. Determination of the linearized coefficient from a plot of the function. (The tangent at w0, x0 has slope (∂w/∂x)|x0.)

For Fig. 2, then,

    w0 + Δw = f(x0) + (slope of function at x0, w0) Δx
            = w0 + (∂w/∂x)|x0 Δx,

and

    Δw = (∂w/∂x)|x0 Δx.

The method can be extended to functions of more than one variable by obtaining the slopes from the appropriate curves. Note that in dealing with functions of more than one variable the slope must be taken so as to be independent of all variables but the one being considered. In Fig. 3 a function of two variables is cross-plotted to obtain the independent slopes from separate plots.

FIG. 3. Determination of the linearized coefficients from cross-plots of a function of two variables: (a) xw plot and (b) yw plot.

From these the linear relation for small deviations at the values of the variables x2, y2 is

    Δw = (∂w/∂x)|x2,y2 Δx + (∂w/∂y)|x2,y2 Δy.

For functions of two variables that are reasonably regular it is usually possible to pick the values for calculating the slopes from a single plot of the function and avoid the labor of cross-plotting the functions (see Ref. 9).
Use of Linearized Coefficients. The characteristics of the function or system are approximated by the linearized coefficients for small variations from the operating point. Therefore, for small disturbances, only the deviation terms need to be considered in determining the stability. The "constant" terms defining the operating point will remain the same to a first order approximation. Under these conditions the block diagram for the multiplication w = xy is shown in Fig. 4.

FIG. 4. Equivalent block diagram for the linearized small deviation expression for the function w = xy: Δx and Δy enter through gains y0 and x0 and are summed to give Δw.

For small disturbances, the approximation can be substituted into the system diagram and the usual methods of linear analysis and compensation used. Note that one is now dealing only with the deviations and not the total variable. For further applications of this method see Sect. 7.
Lhnitations. The above approximations are limited to small deviations from the operating point. The errors get progressively worse as the
signal level is increased, and considerable care must be exercised when
dealing with large excursions. The validity of the approximation can be
checked by evaluating the next terms in the Taylor series or the last
column of Table 2.
The perturbation theory is valid only if the derivative of the function exists.
The method would be of doubtful value in dealing with a relay characteristic.
The method is sometimes limited when the operating point is at 0. One
or more of the variables will then have a steady-state value of 0, and
terms involving that variable and the deviation can be lost. This can
lead to an indeterminate or inaccurate solution. For instance, in the example of linearizing w = xy, if either x0 or y0 were 0, the variation of Δw with the corresponding value of Δx or Δy is zero. See eq. (2). Although the error of the approximation is still Δx Δy, this term becomes significant with respect to the other terms as x0 and y0 become small.
Consider the example of the flow through an orifice. The linearized deviation expression is repeated here:

    Δq = (Cd A0/(2√P0)) ΔP + (Cd √P0) ΔA.
As P0 → 0, this expression becomes indeterminate and the analysis
based on the approximation under these conditions loses significance. A
linear expression no longer adequately describes these situations.


4. METHODS OF ANALYSIS: DESCRIBING FUNCTION

General
Definition of Describing Function. When an element is excited by
a sinusoidal signal, the describing function is the ratio of the fundamental
of the output signal to the sinusoidal input signal. A describing function
of an element may be a complex quantity, characterizing both amplitude
and phase relations between the input and output. The describing function may be a function of both signal amplitude and frequency.
Use of the Describing Function. Within the validity of the basic
assumptions, the describing function, representing the nonlinear element,
can be substituted directly into the system equations for the nonlinear
characteristics. The use of the describing function quasi-linearizes the
frequency response equations. Since the describing function will be a
function of amplitude, the system frequency response will be a function
of both frequency and amplitude. The quasi-linearization of the system
equations in the frequency domain allows the use of the Nyquist criterion
to determine stability.
Although the dependency of the frequency response upon both frequency
and amplitude complicates the calculations slightly, practical methods are
available which require little effort beyond that normally required in the
plotting of a Nyquist diagram.
Usefulness of Describing Function. The describing function provides
an approach which allows a solution in the frequency domain. The ability
to manipulate the system equation in the frequency domain is valuable
because: (a) frequency response techniques developed for linear systems
are available for synthesis, (b) synthesis and analysis can be handled with
relative ease, and (c) the technique is not limited to systems with few energy
storage elements.
Basic Assumptions of Describing Function Method. If the input
to the nonlinear element is sinusoidal, then it is assumed that:

1. The output is periodic and of the same fundamental period as the
input signal.
2. Only the fundamental of the output wave need be considered in a
frequency response analysis.
3. The nonlinear element is not time-varying.
4. Only one nonlinear element is considered to exist in the system.
Assumption 1 implies that no subharmonics are generated.
If the element used in a system is driven by a sinusoidal signal, the output
of the device by assumption 2 is considered to be sufficiently filtered by


the system characteristics so that the signal fed back into the input of
the nonlinear element is essentially sinusoidal. The degree to which the
input to the nonlinear element must be "essentially sinusoidal" is determined by how critical the nonlinearity is to the wave shape of the
driving signal.
While the describing function cannot be obtained for a system in which
the coefficients are time-varying because the output would not reach a
steady-state periodic solution, the describing function can be obtained
for an element with characteristics which are dependent on frequency.
In such a case the describing function will be a function of both amplitude
and frequency of the signal.
If a system contains two nonlinearities of major importance, it is still
possible to get a describing function for the system. Often the easiest
way to obtain the describing function in this case is to lump the characteristics of the two nonlinearities and obtain an over-all describing function.
In general, it is not practical to consider each nonlinearity separately.
Theory of Describing Functions. Consider the system of Fig. 5. It is convenient in describing function analysis to have the nonlinear and linear elements separated as in the figure. In the following it will be assumed that this has been done.

FIG. 5. Block diagram of simplified nonlinear system: the nonlinear element, gD(m), in tandem with the linear elements, g(s); m is the input to the nonlinear element, n its output, and c the system output.
The output, n, of the nonlinear element is related to the input, m, by

    n = [gD(m)] m.

If the input is sinusoidal, then, by the assumptions made, the error M(jω) must be sinusoidal and of the fundamental frequency, and

    N(jω) = GD[M(jω)] M(jω).

The output, N(jω), can be represented by a Fourier series whose terms are the fundamental and the higher harmonics; N1(jω) denotes the first harmonic (fundamental) of the output N(jω).


By definition, the describing function is

    GD1(|M|, ω) = N1(jω)/M(jω) = [|N1(jω)|/|M(jω)|] ∠ angle [N1(jω)/M(jω)],

where GD1(|M|, ω) is the describing function as a function of amplitude, |M|, and frequency, ω.
Usually for convenience the argument (|M|, ω) and the subscript 1 are dropped; this leaves GD as the symbol for describing functions.
Within the validity of the assumption that the harmonics are sufficiently filtered by the linear system elements so that the feedback signal contains essentially the fundamental, the harmonics of the output of the nonlinear element can be neglected, and the describing function GD can be used as a series element in the frequency response analysis.
GD can be determined by conventional Fourier series analysis. (See
Chap. 14 and Ref. 10.)
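Such a Fourier analysis is easy to mechanize numerically: drive the nonlinearity with one cycle of a sinusoid and extract the fundamental. A minimal sketch, in which the saturation characteristic and the function name are illustrative assumptions rather than material from this chapter:

```python
import numpy as np

def describing_function(nonlinearity, M, n_points=2048):
    """Fundamental-component describing function GD(|M|) of a static
    nonlinearity, from one cycle of sinusoidal excitation."""
    wt = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    m = M * np.sin(wt)
    n = nonlinearity(m)
    # Fourier coefficients of the fundamental of the output wave.
    a1 = np.mean(n * np.cos(wt)) * 2.0
    b1 = np.mean(n * np.sin(wt)) * 2.0
    return (b1 + 1j * a1) / M   # complex gain relative to M sin(wt)

# Saturation at level S = 1 (unit slope below the level).
sat = lambda m: np.clip(m, -1.0, 1.0)
for M in (1.0, 2.0, 5.0):
    print(M, describing_function(sat, M))
```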
Stability Criterion. For the system of Fig. 5 the frequency response is approximated by

    (C/R)(jω) = GD G(jω) / [1 + GD G(jω)],

where GD = the describing function,
      G(jω) = the frequency sensitive portion of the system.

For a minimum phase shift system (see Chap. 23) the system will be critically stable when the denominator is zero, or

    1 + GD G(jω) = 0,

or

(3)    G(jω) = −1/GD,

or

(4)    −GD = 1/G(jω).

When eq. (3) or (4) is satisfied, the system will have a sustained oscillation
of the amplitude and frequency which satisfies the equation (Ref. 11).
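Equations (3) and (4) reduce the search for a sustained oscillation to locating the intersection of two loci. For an ideal relay, whose describing function 4D/(π|M|) is real and positive, the intersection with eq. (3) must occur where the phase of G(jω) is −180°. A sketch under assumed constants (the plant K, T1, T2, and the relay level D are all hypothetical):

```python
import numpy as np

# Hypothetical linear elements: G(s) = K / [s (T1 s + 1)(T2 s + 1)],
# driven by an ideal relay of output level D, whose describing
# function is GD = 4 D / (pi |M|)  (real, amplitude dependent).
K, T1, T2, D = 10.0, 0.5, 0.1, 1.0
G = lambda w: K / ((1j * w) * (1 + 1j * w * T1) * (1 + 1j * w * T2))

# Sustained oscillation: G(jw) = -1/GD. Since -1/GD sweeps the negative
# real axis, search for the frequency where the phase of G is -180 deg.
w = np.logspace(-1, 3, 200000)
phase = np.unwrap(np.angle(G(w)))
i = np.argmin(np.abs(phase + np.pi))
w_osc = w[i]
M_osc = 4.0 * D * np.abs(G(w_osc)) / np.pi   # from |G| = pi |M| / (4 D)
print(f"predicted oscillation: w = {w_osc:.3f} rad/sec, amplitude = {M_osc:.3f}")
```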
Limitations and Accuracy of Describing Function Method. There are two major disadvantages of the describing function analysis:
1. There is no convenient method of determining the accuracy. A method proposed by Johnson (Ref. 12) becomes laborious for any but the simplest systems.
2. For linear systems, frequency response analysis allows prediction of the transient response; a describing function analysis allows at best a qualitative interpretation. Describing function designs can therefore predict stability and frequencies of oscillation, but are limited to crude rules of thumb in prescribing


a given stable response. It must be pointed out that such approximations
are often all that is justified by the accuracy of other system data available.
In a wide variety of applications, use of describing functions has allowed
prediction of the frequency and amplitude of oscillations within 20 per
cent. It is difficult to generalize, but usually the method will be most
accurate when GD and G(jω) are varying rapidly in the region of intersection (Ref. 12). Erroneous results may be obtained in some cases when
the intersection is approximately tangential (Ref. 13).
Because the assumption of sinusoidal input to the nonlinear element
is not exact, the method works best when GD is not sensitive to wave
shape of the driving function.
NOTE.
The describing function method of analysis is the only practical
analytical technique for treating nonlinear systems which are higher than second order.
Describing Function: Methods of Presentation
Inverted Nyquist Diagram. The equation for sustained oscillation,

    −GD = 1/G(jω),

indicates that an intersection of the −GD and 1/G(jω) loci is a point having a given frequency and amplitude at which sustained oscillation can occur.
FIG. 6. Inverted Nyquist diagram for a stable nonlinear system showing the stable and unstable regions for the describing function locus.

In a normal inverted Nyquist plot the −1 point should be on the left as the 1/G(jω) locus is traversed in the direction of increasing frequency for a minimum phase shift system. In this case, there is no longer a fixed


−1 point, but the system is stable if the −GD locus is entirely on the left of the 1/G(jω) locus.
The stability criterion is then as follows: If a locus of all possible values of GD is plotted, then the system will be stable if the −GD locus does not intersect the 1/G(jω) locus and the −GD locus lies completely on the left-hand side of the 1/G(jω) locus when the 1/G(jω) locus is traversed in the direction of increasing frequency. (Valid for minimum phase functions.)

A stable system is shown in Fig. 6.
NOTE. By plotting in this manner the frequency sensitive, G(jω), and amplitude sensitive, GD, portions of the system have been separated and can be considered independently.
Nyquist Diagram. Equation (3) leads to a Nyquist diagram as indicated in Fig. 7.

FIG. 7. Nyquist diagram for a stable nonlinear system showing the stable and unstable regions for the −1/GD describing function locus.

The stability criterion is as follows: If a locus of all values of −1/GD is plotted, then the system will be stable if the −1/GD locus does not intersect the G(jω) locus and the −1/GD locus lies entirely on the left-hand side of the G(jω) locus when the G(jω) locus is traversed in the direction of increasing frequency.
A stable system is shown in Fig. 7.
Log-Angle Plane Representation. It is sometimes more convenient to work with magnitude and phase angle semi-independently. This can be done in the case of describing functions by use of the log magnitude-angle plot. These are the familiar coordinates used on the Nichols charts. (See Chap. 21, Sect. 7.)


For the case of a nonlinear system, the critical point is

(5)    20 log10 |G(jω)| = 20 log10 |1/GD|,
       ∠G(jω) = −180° − ∠GD.

If the conditions of eqs. (5) are met, the system is unstable. A typical plot for a stable servo system is given in Fig. 8. As long as the two loci do
not intersect, the system will be stable.
FIG. 8. Log magnitude-angle diagram for a stable nonlinear system showing the stable and unstable regions for the describing function locus.

Typical Loci for Nonlinear Systems. Table 3 shows a number of
different types of loci which can be expected. In each of these diagrams,
the frequency and amplitude loci have been plotted. The arrows indicate
increasing frequency and amplitude of signal input to the nonlinear
element.
System B has an intersection at point 1. This indicates that the system will be unstable at amplitudes less than those at point 1 and will be
stable for larger amplitudes. This system, therefore, will be unstable and
oscillate at the amplitude and frequency of point 1. If a small disturbance
is introduced into the system B described in Table 3, the system will
appear unstable and the amplitude of oscillation will increase until point
1 is reached. If the amplitude of oscillation becomes larger than this,
the system appears stable and any oscillation would tend to die down to
that corresponding to point 1. The amplitude and frequency corresponding to point 1 are the amplitude and frequency of the sustained oscillation.
Point 1 of system B is called a convergent point because disturbances at
either side tend to converge at these conditions. This is contrasted with
point 2 of system C, Table 3, which is a divergent point since disturbances
which are not large enough to give this value of GD will decay and disturb-


TABLE 3. TYPICAL LOCI FOR AMPLITUDE SENSITIVE NONLINEAR SYSTEMS

Stability criteria: inverted Nyquist diagram, −GD = 1/G(jω); Nyquist diagram, −1/GD = G(jω); log-angle diagram, 20 log10 |G(jω)| = 20 log10 |1/GD| and ∠G(jω) = −180° − ∠GD.

A. Stable system.
B. System with a convergent point.
C. System with a convergent point 1 and a divergent point 2; Case I, stable for small signals.
D. System with a convergent point 1 and a divergent point 2; Case II, unstable for small signals and very large signals.

(For each system type the table sketches the corresponding loci on the inverted Nyquist, Nyquist, and log-angle diagrams.)


ances which are larger will result in oscillations which tend to increase in
amplitude.
In system D, point 1 is convergent and 2 is divergent.
Comparison of the Methods of Presentation. All the methods are equally valid. The designer may thus choose the one with which he is most familiar and/or the one which best fits the design problem. In the
inverted Nyquist diagram many of the simpler describing functions are
bounded, whereas in the Nyquist diagram the describing functions will
be infinite for some conditions. The other factors that influence the selection of one form of the Nyquist diagram are still applicable. (See Chap.
21.)
The log-angle method of depicting the stability of the system has advantages in synthesizing a system containing nonlinearities. It is somewhat
quicker to plot since G(jω) can be obtained directly from a Bode diagram.
This method of display also lends itself to use with Nichols charts and
templates. (See Sect. 7.)
Frequency Variant Describing Functions. (See Ref. 12.) Describing functions which vary with frequency as well as signal amplitude
will appear graphically like the typical plot of Fig. 9. The describing
function becomes a surface in three dimensions (magnitude, phase, frequency), and if this surface is pierced by the frequency locus (also plotted

in three dimensions), the system will be unstable at the frequency and amplitude of the intersection. In other words, to have a significant intersection, it is necessary to have the intersection of the GD(jω, |M|) and G(jω) loci occur at the same frequency. This is indicated in Fig. 9 at ω2.

FIG. 9. Log magnitude-angle diagram for a nonlinear system with a frequency and amplitude sensitive nonlinearity. Intersections (1) and (3) are not significant because at these intersections G(jω) and GD(ω) do not have the same frequency.
Tables of Useful Describing Functions
Amplitude Sensitive Nonlinearities. Table 4 gives some of the more common describing functions for simple amplitude sensitive nonlinearities, with corresponding graphs in Figs. 10 to 16.

TABLE 4. USEFUL AMPLITUDE SENSITIVE DESCRIBING FUNCTIONS
(Input m = |M| sin ωt. The original table also sketches each nonlinear characteristic and the corresponding output wave shape, n.)

Saturation (saturation level S, |M| ≥ S):
    GD = (2/π) [sin⁻¹ (S/|M|) + (S/|M|) cos sin⁻¹ (S/|M|)].    Graph: Fig. 10.

Deadband (deadband width B, slope K outside the band, |M| ≥ B/2):
    GD = K (2/π) [π/2 − sin⁻¹ (B/2|M|) − (B/2|M|) cos sin⁻¹ (B/2|M|)].    Graph: Fig. 11.

Hysteresis (hysteresis H, slope K):
    GD = K √(a² + b²) ∠ tan⁻¹ (b/a), where a and b are the normalized in-phase and quadrature fundamental coefficients, functions of H/|M|.    Graph: Fig. 12.

Negative deficiency (types 1 and 2). Graphs: Figs. 13a and 13.

Relay (output level D, deadband B, hysteresis H): the magnitude is proportional to D/|M|, with a lagging phase angle determined by the switching angles cos⁻¹ [(B + H)/2|M|] and cos⁻¹ [(B − H)/2|M|].    Graph: Fig. 14.

Granularity (step width B, step height V):
    GD = (2V/π|M|) Σ (a = 1 to Ai) √[4 − (2a − 1)² B²/|M|²], where Ai = largest integer value of (|M|/B + 1/2).    Graph: Fig. 15.

Delay time (delay td):
    GD = e^(−jω td).
    (There is no harmonic distortion in delay time. This expression is exact.)

Variable gain (n = m^K):
    GD = (2/√π) |M|^(K−1) Γ[(K + 2)/2] / Γ[(K + 3)/2].    Graph: Fig. 16.
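The closed-form entries of Table 4 for saturation and deadband are easily evaluated for comparison with Figs. 10 and 11. A brief sketch (the function names are illustrative):

```python
import numpy as np

def gd_saturation(M, S):
    """Table 4, saturation: GD = (2/pi)[asin(S/M) + (S/M) cos asin(S/M)], M >= S."""
    r = S / M
    return (2.0 / np.pi) * (np.arcsin(r) + r * np.sqrt(1.0 - r * r))

def gd_deadband(M, B, K=1.0):
    """Table 4, deadband of width B and slope K outside the band, |M| >= B/2."""
    r = B / (2.0 * M)
    return K * (2.0 / np.pi) * (np.pi / 2.0 - np.arcsin(r) - r * np.sqrt(1.0 - r * r))

print(gd_saturation(2.0, 1.0))   # compare with Fig. 10 at S/|M| = 0.5
print(gd_deadband(2.0, 1.0))     # compare with Fig. 11 at B/(2|M|) = 0.25
```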


FIG. 10. Describing function for saturation: normalized describing function versus the ratio S/|M|.

FIG. 11. Describing function for deadband: normalized describing function versus the ratio B/(2|M|).

FIG. 14. Describing function for a relay contactor with hysteresis: (a) magnitude and (b) phase shift, plotted against the ratio B/(2|M|) for several values of H/B.


FIG. 15. Describing function for granularity, plotted against the ratio B/(2|M|). Dashed lines are used when granularity has a finite number of steps.


FIG. 17. Describing function of a single servomotor subject to velocity limiting, plotted against normalized frequency ωT for several values of M′/N_L; M′ = normalized magnitude of input, radians per second; N_L = saturated speed, radians per second (maximum speed); T = unsaturated motor time constant, seconds (Ref. 16).

FIG. 18. Describing function of motor subject to acceleration limiting: magnitude and phase shift versus normalized frequency ωT (Ref. 16).

FIG. 19. Describing function for Coulomb friction: magnitude and phase shift versus the fundamental dimensionless output motion. See Table 5 for definition of terms.


Simplifying Complex Nonlinearities
Separation of Amplitude and Frequency Sensitive Elements. It
is obvious that the nonlinearities of Table 5 can be simplified by a number
of approximations to the point where the simpler amplitude sensitive
describing functions can be used. Note that many of the complex nonlinearities consist of a combination of a simple nonlinearity and frequency
sensitive element(s). Typical of such a case is the system with backlash
shown diagrammatically in Fig. 20a.

FIG. 20. Block diagram for a simple system with backlash, GD, and G(s) equal to controller elements: (a) the conventional diagram arrangement; (b) the diagram rearranged so that the amplitude and frequency sensitive portions of the system are separated.

If it is recognized that the input to the nonlinear portion of the system
will be essentially sinusoidal, then the describing function for the deadband (Table 4, Fig. 11) can be used and substituted directly into the block
diagram as a gain GD. To analyze the system conveniently it is necessary
to rearrange the block diagram so that the nonlinearity is separated from
the frequency sensitive elements and can be treated by the usual techniques
of graphically presenting amplitude sensitive describing functions.


Let

    GA1 = 1 / {Ke s [(JM R/(KT Ke)) s + 1]}.
Then rearrange the block diagram to appear as in Fig. 20b. In this illustration the amplitude and frequency variant portions of the system have been separated, and the usual type of describing function analysis can be used to determine whether the system will be stable and, if not, what the amplitude and frequency of oscillation will be.
Method of Equivalent Coefficients. In complex systems, it becomes
difficult and sometimes useless to attempt to separate the amplitude
sensitive and frequency sensitive elements in the system so that the method
of analysis described earlier in this section under Theory of Describing
Functions can be used. For instance in the above example the variables
in which one is really interested, C and R, are buried in the block diagram.
It is thus difficult to determine: (1) whether the overall system will have
satisfactory performance, and (2) how to modify the compensation to
improve performance.
The technique of using an equivalent coefficient is: (a) to recognize that
many nonlinearities appear as a gain change in the system, and (b) to
combine this gain change with existing gains in the system to form an
equivalent gain or coefficient. Once the range of gain to be experienced is known, the system can be designed to be as insensitive to such a variation as is desirable.
A way to avoid the difficulty in the previous example lies in this approach.
It was pointed out that the describing function of the dead space can be
considered directly in the analysis. This describing function is in series
with the spring constant and can be combined to yield an equivalent spring.
Thus, the system will seem to have a very soft spring at low angular displacements and an increasing spring constant (approaching the actual
spring constant) as the angular displacement increases. This equivalent
coefficient can then be considered as a constant in the remaining analysis.
In this case the major effect is the reduction in the load resonant frequency
and it becomes necessary to make the system less sensitive to load resonant
frequency to avoid difficulties.


It is necessary to consider a number of different spring constants to
make sure that no unstable points exist, but this is usually not too difficult
a task although it can be somewhat time consuming. (See Ref. 18.)
EXAMPLE. As shown in item 3 of Table 5, the equation for backlash from armature voltage to output shaft rate is

    sθL/Va = (KT K′L/R) / {JM JL s³ + [JM DL + JL Ke KT/R] s² + [(JM + JL) K′L + DL Ke KT/R] s + K′L [DL + Ke KT/R]},

where K′L = the equivalent spring constant = KL GD. The value of GD is obtained from Fig. 11 with the argument B/2(θM − θL) rather than B/2M.
The complete system transfer function including the above equation can then be analyzed for several values of K′L. The actual value of GD has to be considered only if the magnitude of the input to the backlash is wanted.
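The recommended sweep over several values of K′L is readily mechanized by examining the roots of the cubic denominator above for a range of GD. A sketch with hypothetical numerical constants (every value below is assumed for illustration only):

```python
import numpy as np

# Hypothetical constants for the motor-load-backlash example.
JM, JL, DL = 0.01, 0.05, 0.2          # inertias and load damping
KL, Ke, KT, R = 50.0, 1.0, 1.0, 2.0   # spring, motor constants, resistance

# Sweep the equivalent spring K'L = KL * GD over the expected range of GD
# and inspect the roots of the characteristic (denominator) polynomial.
for gd in (1.0, 0.6, 0.3, 0.1):
    KLp = KL * gd
    poly = [JM * JL,
            JM * DL + JL * KT * Ke / R,
            (JM + JL) * KLp + DL * Ke * KT / R,
            KLp * (DL + Ke * KT / R)]
    roots = np.roots(poly)
    print(f"GD = {gd:4}: roots = {np.round(roots, 2)}")
```

The run shows how the load resonant pair moves toward lower frequency as GD (hence the equivalent spring) drops, which is the effect discussed above.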
5. METHODS OF ANALYSIS: PHASE PLANE, GRAPHICAL SOLUTION OF
SYSTEM EQUATIONS

General

This is essentially a heuristic presentation of the phase plane method.
As a consequence attention will be directed at the areas of application and
only the most rudimentary explanation of the techniques of constructing
the diagrams will be provided. At its present state of development this
technique has only limited utility in system synthesis; however, phase
plane techniques have received some use in the conception and display of
schemes for nonlinear compensation. (See Refs. 22 and 23.)
The reader is referred to Refs. 20, 21, and 27 for details beyond those
provided here.
Definitions. The phase plane has the coordinates of velocity (usually
the ordinate) and position (usually the abscissa) of the system. The solutions of the differential equations are plotted on this coordinate system.
The locus of a solution to the differential equation is called a phase
trajectory or simply trajectory. A series of solutions or trajectories is referred
to as a phase portrait.
Limitations. Analysis by the phase plane method is limited to:
1. Second order (single degree of freedom) systems.
2. Autonomous systems (time does not appear as a parameter in any
of the coefficients of the system).
3. Systems with impulse, step, or ramp inputs or driving functions.


The limitation on the order of the system is severe, but it is possible to approximate a limited number of practical systems by a second order equation for purposes of preliminary analysis. Methods have been proposed for extending the technique to higher order systems but have not received wide use. (See Refs. 24 and 25.)
Basic Equations. The basic equations that can be treated by phase
analysis are of the form:
(6)    d²x/dt² + f1(x, ẋ) dx/dt + f2(x, ẋ) x = 0,    ẋ = dx/dt.

By substituting dx/dt = y, eq. (6) can be reduced to a set of first order equations:

(7)    dy/dt = N(x, y),
       dx/dt = D(x, y),

and eliminating time by division yields

(8)    dy/dx = N(x, y)/D(x, y).
Significant Characteristics of Phase Portrait

Table 6 describes a few of the significant characteristics of phase trajectories. Identifying these characteristics will be useful to the engineer
in interpretation of the phase trajectories.
Areas of Use of Phase Plane Method
Analysis. The graphical techniques of plotting the phase trajectories
make the phase plane method particularly useful for systems with second
order nonlinear equations of motion. Although the availability of analog
computers has greatly reduced the need for such hand methods, there still
remains a need for generalizing analysis. The phase plane analysis often
can provide this generalization.
Presentation of Data. The phase plane has found some use in presentation of analog or actual equipment results. In such cases, the system
does not need to be limited to second order. Of course the interpretation
becomes more difficult the higher the order of the system. Such plotting
techniques have been made even more meaningful when the display is on
a cathode ray oscillograph by intensity modulation. Timing pulses can
be indicated by brightening or dimming the trajectory.

TABLE 6. SIGNIFICANT CHARACTERISTICS OF PHASE PORTRAIT

1. Nodal point. The trajectories converge on or radiate from the node in such a manner that the direction of the trajectory approaches definite limits as the nodal point is approached. The node is stable if the paths converge on the node and unstable if the paths radiate from the node. This is a singular point, i.e., a point where eqs. (7) are equal to zero. Corresponding conditions in a linear second order system: stable node, negative real roots; unstable node, positive real roots.

2. Focal point. The trajectories converge on or radiate from the focus on spiral paths. As for the node, if the paths converge, the focus is referred to as stable; if the paths spiral outward, the focus is referred to as unstable. This is a singular point. Corresponding conditions: stable focus, complex roots with negative real parts; unstable focus, complex roots with positive real parts.

3. A center. Closed trajectories about a point. This is a singular point. Corresponding condition: zero damping.

4. Saddle point. Trajectories converge toward the saddle point and then diverge, except for the special case when the initial conditions are such as to fall on the trajectory that goes into the point (the converging separatrix). This is a singular point. Corresponding condition: both roots real, one positive and one negative.

5. Limit cycles. A limit cycle describes the oscillation in a nonlinear system. A stable limit cycle is a closed path to which adjacent trajectories converge (e.g., a stable limit cycle about an unstable focal point). When the trajectories diverge from a closed path in the phase plane, the path is called an unstable limit cycle. No corresponding condition in a linear system.

6. Hard oscillations. It is necessary to excite the system beyond a finite bound in order to obtain self-sustained oscillations. The boundary will be an unstable limit cycle. No corresponding condition in a linear system.

7. Soft oscillations. Self-sustained oscillations can be started with an infinitely small excitation. Soft oscillations start from unstable nodes or focal points. No corresponding condition in a linear system.


Once the analyst has set up the equations for the phase plane, an analog
computer can be used to plot the trajectories. In this manner, one can
maintain the generality of the phase plane analysis and avoid the ennui
of extensive hand calculation.
System Synthesis. Because the graphical presentation often makes
interpretation of results easier, the phase plane method has been looked on
with favor by many. A number of authors describing work on "optimum
controls" have made extensive use of the phase plane in presenting their
results. (See Sect. 7 for details.) However, the limitations on the order
of the system equations hamper work on any but the simplest systems.
Ku and a number of others have extended the phase plane to phase
space. (See Refs. 24 and 25.) This is essentially a multidimensional
plot allowing solution of higher order equations. Phase space methods have
received only very limited use to date.
Analytical Methods of Constructing Phase Plane
Direct Method of Solution. If the equation of motion of the system, eq. (6), can be integrated to obtain time solutions for x and ẋ, then ẋ as a function of x can be obtained by eliminating time from the individual solutions, and the relationship between ẋ and x may be plotted directly.
Indirect Method of Solution. If the equations of motion cannot be integrated to obtain time solutions for x and ẋ, a new differential equation in terms of dx and dẋ may be formed and solved to give ẋ as a function of x directly. The equations in this case reduce to the form of eq. (8).
EXAMPLE. Simple Relay Servo. Equations (13), Sect. 6, describing the operation of a relay servo for a step input are repeated here for convenience:

    d²x/dτ² + dx/dτ = −1,    x > 0;
    d²x/dτ² + dx/dτ = 1,     x < 0.

Substitution of y = dx/dτ, then dividing by y = dx/dτ, and recognizing that (dy/dτ)/(dx/dτ) = dy/dx yield:

(9)    dy/dx = −(1/y) − 1,    x > 0;
       dy/dx = (1/y) − 1,     x < 0.


The variables can be separated in these equations and integrated:

(10)    x = −∫ y/(1 + y) dy,    x > 0;
        x = ∫ y/(1 − y) dy,     x < 0.

These integrals can be found in any good table of integrals, and the
function can be plotted on the phase plane for different constants of integration. When plotted, the trajectories of eq. (10) would appear similar
to the sketch in Fig. 21.
FIG. 21. Phase trajectories (y = dx/dτ versus x) for a simple relay servo. Each trajectory is for a particular constant of integration of eq. (10).

Obtaining Time from a Phase Plane Plot. It is possible to obtain
time, t, from a phase plane plot even though the original characteristic
equation of motion cannot be solved for x and ẋ as functions of time. To
do this use the relationship:

(11)    τ = ∫ dτ = ∫ dx/(dx/dτ) = ∫ (1/ẋ) dx.

Equation (11) shows that if the phase portrait is replotted with 1/ẋ as the ordinate and x as the abscissa, the area under the resultant curve re-


presents time. This method makes it possible to obtain plots of x and ẋ versus time. Graphical methods are also available for obtaining time from
the phase portrait. (See Ref. 26.)
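Equation (11) is also convenient numerically: given sampled points (x, ẋ) along a trajectory, the elapsed time is the running trapezoidal integral of 1/ẋ with respect to x. A minimal sketch, using a made-up trajectory ẋ = 1 − x whose exact time solution t = −ln(1 − x) is known for checking:

```python
import numpy as np

# Recover time along a phase trajectory by eq. (11): t = integral of (1/xdot) dx.
# Trajectory samples (x, xdot) would normally come from the phase portrait;
# here they are fabricated from xdot = 1 - x on 0 <= x <= 0.9 for illustration.
x = np.linspace(0.0, 0.9, 500)
xdot = 1.0 - x
t = np.concatenate(([0.0],
                    np.cumsum(np.diff(x) * 0.5 * (1/xdot[1:] + 1/xdot[:-1]))))
print(t[-1], -np.log(1.0 - x[-1]))   # trapezoidal estimate vs. exact -ln(1 - x)
```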
Graphical Methods of Constructing Phase Plane

When the original characteristic equation of motion is nonlinear, the
integration of the equation obtained by the above method is difficult or
impossible. Graphical methods exist for solving the equation for a direct plot of ẋ versus x. One of the most useful methods is the method of isoclines, described in Refs. 20 and 23. Other graphical methods are also available, e.g., Liénard's arc-segment procedure.
Method of Isoclines. Equation (8) can be written in the form

(12)    dy/dx = N(x, y)/D(x, y).

Equation (12) is the slope of the phase trajectory. By setting eq. (12) equal to a constant, the equation for the locus of a constant slope can be obtained. One can then strike off lines of the proper slope along the locus. After constructing sufficient loci of constant slope, the phase trajectories can be sketched.
EXAMPLE. Simple Relay Servo. Equations (9) define the loci of constant slope for the relay servo given in the previous example:

(9a)    dy/dx = −(1/y) − 1,    x > 0;

(9b)    dy/dx = (1/y) − 1,     x < 0.

Equation (9a) set equal to a constant provides the loci in the right half-plane. Equation (9b) describes the left half-plane.
For a +45° slope eq. (9a) becomes

    1 = −(1/y) − 1,    y = −1/2.

Several values are tabulated in Table 7. The isoclines are constructed in Fig. 22.

TABLE 7. VALUES OF ISOCLINES FOR SIMPLE RELAY SYSTEM

                     Value of y
Slope      Left Half-Plane    Right Half-Plane
0°              1                  −1
+30°            0.634              −0.634
−30°            2.36               −2.36
+45°            0.5                −0.5
−45°            ±∞                 ±∞
+60°            0.366              −0.366
−60°            −1.37              +1.37
+75°            0.211              −0.211
−75°            −0.366             +0.366
+90°            0                  0
−90°            0                  0
FIG. 22. Construction of a phase trajectory by means of isoclines for a simple relay servo.
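Instead of sketching trajectories through the isoclines by hand, the same phase portrait can be traced by stepping the normalized equations (13) forward in time, which also avoids the 1/y singularity of eqs. (9) at the x axis. A minimal sketch (the step size and initial conditions are arbitrary choices):

```python
import numpy as np

# Phase trajectory of the simple relay servo of eqs. (9)/(13): for x > 0
# the motion obeys xddot + xdot = -1, and for x < 0, xddot + xdot = +1.
def step(x, y, dt=1e-3):
    force = -1.0 if x > 0 else 1.0       # relay output depends on the sign of x
    return x + y * dt, y + (force - y) * dt   # simple Euler step

x, y = 1.0, 0.0                          # initial displacement, zero velocity
traj = [(x, y)]
for _ in range(20000):
    x, y = step(x, y)
    traj.append((x, y))
print(traj[::5000])                      # a few points along the spiral
```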


6. OTHER METHODS OF ANALYSIS

Differential Equations: Analytical Solutions

An often useful method of analytically obtaining the transient performance of a simple but useful class of nonlinear systems is by piecewise
linearization. Although the type of nonlinearity that can be treated is
restricted, higher order systems can be handled. (See Refs. 31, 32, and
33.) Other analytical methods of obtaining transient solutions are described in Sect. 5 and in Refs. 27, 29, and 30 and their bibliographies.
Piecewise Linear Systems. Many nonlinear control systems are
linear in well-defined areas of operation. At the boundaries of these linear
areas are discontinuities which make the system, when considered as a
whole, nonlinear. For such systems the linear differential equations can
be solved between boundaries and the boundary conditions matched to
obtain a complete solution.
Since it is generally desirable to obtain a solution under steady-state
conditions (or at least as steady-state conditions are approached), the
process using differential equations becomes quite laborious if there are a
number of reversal points. This is true even for a simple second order
system, and the process becomes more unwieldy for higher order systems
where more than two initial conditions are required at each reversal point.
Normalized Performance Charts. Kahn avoided some of the labor
in the differential equation approach by using a semigraphical approach that
recognizes the fact that the initial conditions at each boundary point are
a function of the velocity at the previous boundary and the time between
boundaries. (See Ref. 32.) This method loses its value for systems of an
order higher than second. Under these conditions more than two dimensions are needed to represent the curves. For a higher order system,
it is necessary to have more than one initial condition for each boundary;
for example, on a third order system, it would be necessary to have initial
conditions representing both velocity and acceleration. The higher derivatives would fall in the third dimension.
Summary of Steps in Piecewise Linear Analysis.
1. Prepare the complete system equations.

2. Break the complete equations into a set of linear equations representing the system operation between discontinuities.
3. Determine the boundary conditions at the discontinuities.
4. Nondimensionalize the equations as much as practical.
5. Rearrange the equation or change the dependent variable to make the
dependent variable independent of the input function at the discontinuities.


6. Obtain the solutions to the equations. These may often be best presented graphically.

FIG. 23. Block diagram of simple relay servo: a relay driving a motor with transfer function (1/Kv)/[s(Tm s + 1)] from m to c.

EXAMPLE. Piecewise Linear Relay Servo. A typical block diagram for a relay servo is shown in Fig. 23. For the motor of the system of the figure,

    m = Kv Tm (d²c/dt²) + Kv (dc/dt).

From the characteristics for a simple relay,

    m = −V for e < 0,
    m = V for e > 0.

Therefore,

    Kv Tm (d²c/dt²) + Kv (dc/dt) = −V, for e < 0;
    Kv Tm (d²c/dt²) + Kv (dc/dt) = V, for e > 0;
    e = r − c.
A typical transient to a step input for the servo of Fig. 23 appears in
Fig. 24. The driving voltage in the motor is reversed at each of the zero

FIG. 24. Typical response of simple relay servo to a step command (displacement versus time).


error points 1, 2, 3, 4, 5, etc. Substitution of c = r − e and normalizing the variables yield for a step input:

(13)    d²x/dτ² + dx/dτ = −1,    for x > 0;
        d²x/dτ² + dx/dτ = 1,     for x < 0.

The solutions to these equations are obtained by taking the Laplace transform of eqs. (13) and determining the inverse transform. The transforms of eqs. (13) after the switching at τn are

(14a)    s²X(s) + sX(s) = [−1/s + s x(τn) + ẋ(τn) + x(τn)] e^(−τn s),    x > 0;

(14b)    s²X(s) + sX(s) = [1/s + s x(τn) + ẋ(τn) + x(τn)] e^(−τn s),     x < 0;

where x(τn) = value of x at τn,
      X(s) = ℒ[x(τ)],
      ẋ(τn) = derivative of x with respect to τ at τn,
      τn = normalized time of the nth switching point.
Notice that with the exception of the first closure of the relay (in the region 0 to 1 of Fig. 24), x(τn) = 0 at the switching point and only ẋ(τn) affects the characteristics of the transient. The velocity just before the relay switches must equal the velocity just after the relay has switched. This is the boundary condition relating the two eqs. (13). To obtain a solution to eqs. (13), it is then necessary to apply the initial conditions at each reversal point.
For eq. (14a) the inverse transform yields, when x(τn) = 0,

(15)    x = −exp [−(τ − τn)] − (τ − τn) + 1 + ẋ(τn)[1 − exp (−(τ − τn))],    τn+1 > τ > τn,

where τn+1 = normalized time at the (n + 1)st switching point.
Equation (15) is dependent only on the time from the last reversal and the initial velocity. The time to reach the next reversal can be calculated by setting eq. (15) equal to zero and solving for the time difference τn+1 − τn. This equation is

(16)    ẋ(τn) = [exp (−τn+1 + τn) + τn+1 − τn − 1] / [1 − exp (−τn+1 + τn)].


The velocity at the next reversal can be obtained from the derivative of eq. (15) at τn+1:

(17)    ẋ(τn+1) = exp (−τn+1 + τn) − 1 + ẋ(τn) exp (−τn+1 + τn).

Equation (14b) can be solved similarly. Since the system is symmetrical, the equations corresponding to eqs. (16) and (17) will be respectively identical except for sign.
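The repetitive use of eqs. (16) and (17) that Kahn's charts replace is easily mechanized: solve the transcendental eq. (16) for the interval to the next reversal, then advance the velocity by eq. (17). A sketch, with the starting velocity chosen arbitrarily:

```python
import numpy as np

def next_reversal(v, tol=1e-10):
    """Solve eq. (16) for the interval u = tau_{n+1} - tau_n, given the
    initial velocity magnitude v = xdot(tau_n), by bisection."""
    f = lambda u: (np.exp(-u) + u - 1.0) / (1.0 - np.exp(-u)) - v
    lo, hi = 1e-9, 50.0        # f(lo) < 0 and f(hi) > 0 for moderate v
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

v, tau = 2.0, 0.0              # velocity entering the first reversal at tau = 0
for n in range(5):
    u = next_reversal(v)
    v_next = np.exp(-u) - 1.0 + v * np.exp(-u)   # eq. (17)
    tau += u
    print(f"reversal {n+1}: tau = {tau:.3f}, velocity = {v_next:.3f}")
    v = abs(v_next)            # by symmetry the sign alternates each reversal
```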
Kahn avoided the labor of obtaining repetitive solutions to eq. (15) by plotting eqs. (16) and (17) as shown in Fig. 25. The transient x(τ)

FIG. 25. Charts of eqs. (16) and (17), plotted against the initial velocity, ẋ(τn).

FIG. 26. The maximum modulus Mm and the frequency of the maximum modulus ωm of a simple relay servo as a function of the argument of the describing function for a relay. The operating conditions and the servo frequency response are given in Fig. 27. The uncompensated characteristics result from setting the system gain sufficiently high to meet sensitivity requirements, i.e., setting B. Maintaining the system gain and adding a 3 to 1 lead gives the compensated characteristics. (Note that in practice there would have to be a 3 to 1 increase in amplification to compensate for the 3 to 1 attenuation that a passive lead network would have.)

and Mm of a relay servo before and after compensation. Figure 27 shows how the G(jω) locus has been changed to improve system response by meeting a criterion of a maximum Mm ≤ 2. The boundary of such an improvement can be quickly obtained by overlaying a Nichols chart with a graph of the GD locus and observing the path of the G(jω) locus on the Nichols chart as the origin is moved along the GD locus. (See Fig. 28.) If a maximum Mm criterion is being used, the boundary of this Mm for all values


FIG. 27. Log magnitude-angle diagram of a simple relay servo; operating conditions V/B = 6 db, H/B = 0.5. Curve (a) is the uncompensated system; curve (b) is the system compensated with a 3 to 1 lead. The cross-hatched region is the necessary modification to the uncompensated system to meet an Mm criterion of Mm < 2. For the purposes of illustration a normalized response has been used for the servo motor and only the ratio V/B has been defined.

of GD can be quickly sketched on the G(jω) graph. If sufficient work of this type is performed, templates for several values of Mm can be built. The necessary compensation networks can be determined either by trial and error or by more elaborate methods discussed in Chap. 23.
Compensation for Relay Servomechanisms. Describing function, phase plane, and piecewise linear analyses have all been used extensively to determine the necessary compensation. (See Refs. 11, 18, 24, 31, 32, and 44.) The describing function method is the most useful for higher order systems. The techniques used in the preceding example can be easily extended to higher order systems. The describing function analysis and experiment normally check within engineering accuracy. (See Refs. 11 and 41.) A maximum Mm ≤ 2 criterion is generally typical in the design

FIG. 28. Log magnitude-angle diagram showing a method of estimating the necessary compensation by overlaying with a Nichols chart. The origin of the Nichols chart is first placed at B/2M = 0.67 and the M contour is sketched on the log magnitude-angle diagram. This is noted as curve (a), where Mm = 2 has been used for illustrative purposes. The complete area of necessary compensation can be found by moving the origin of the Nichols chart along the describing function locus. Another location is shown at B/2M = 0.6, and the M contour is curve (b). The coordinates of the Nichols chart are shown as dotted center lines.

of relay-positioning systems. However, experience with the conditions of the particular application may dictate a different value for maximum Mm or a different criterion. Because of the phase lag of a relay with hysteresis, lead compensation is quite useful. Two forms of such compensation are tandem lead networks and rate feedback. The latter can be obtained either from a tachometer or from motor back electromotive force. Nonlinear compensation can also be used to achieve better performance for a particular type of input. See the paragraphs on Optimum Switching Functions later in this section for details. Such compensating networks must be used with care if more than one type of input is to be encountered, because the performance will vary with the form and magnitude of the input function. (See Refs. 42, 43, and 45.)
Compensation for Saturation. For a large number of systems, it is
necessary to follow a relatively smooth input within a very small error.
In order to provide economical components, the linear operating range
of these components is usually very little beyond that necessary to follow

TABLE 8. TYPICAL TYPES OF SATURATION EFFECTS AND METHODS OF COMPENSATION

Case I, preamplifier saturation.
Effect on system performance: For an unconditionally stable system with saturation, the relative response will be slower for large step inputs. As the step input level is increased for a conditionally stable system, the system will begin to exhibit less stable characteristics until a critical value is reached above which self-sustained oscillations will occur. For moderate saturation, the overshoot may actually be less than for the linear system although the settling time will be increased. Saturation acts like a gain reduction in the system, and for saturation to cause system instability, instability must be predicted on a linear basis for reduced gain. Oscillation frequency and amplitude can be estimated within 20 to 30% by describing functions. The presence of noise with the input signal causes an effective increase of saturation beyond that predicted for the input signal by itself. This effect causes an increase in the closed loop phase shift with respect to the input signal. See Ref. 7 and Table 1.
Possible methods of compensation: (a) Eliminate or reduce the integral compensation for large signal inputs; e.g., typical magnitude and phase angle curves are shown at the left. It is obvious that if the gain is reduced sufficiently, the region of negative phase margin will cause instability. However, if a nonlinear compensating network is used which for large errors eliminates or reduces the integral compensation, satisfactory performance can be obtained. The dotted line shows the frequency response after such a change. There are a number of methods by which such changes in the compensation can be obtained. (See Refs. 18, 38, and Table 10.) The circuit constants are normally set experimentally. (b) Modify the basic system operation for large signal levels. This is essentially an extension of (a); however, the basic mode of operation is also changed. Examples are dual-mode servos, wherein the mode of controlling the power element is changed with signal level, and two-speed servos, wherein the feedback signal gain is lowered. (Actually, in practice, the takeoff is from a different speed shaft, which gives rise to the appellation two speed.) (See Refs. 39 and 40.) Often the signal used to switch the feedback signal is used to modify the compensation networks.

Case II, power amplifier saturation (G1 = integral compensation, G2 = power element; same open loop characteristics as Case I).
Effect on system performance: Same effect as Case I, but a reduction in the overshoot with moderate saturation has not been observed. The degree of saturation (ratio of saturation level to the input to the element if the system were linear) necessary to start self-sustained oscillations in a system will vary with the location of the nonlinearity in the system. A difference as great as 10 to 1 between the saturation from a step of r needed in the preamplifier and power amplifier has been noted. (See Ref. 18.) In all the cases considered in Ref. 18, it was necessary to have sufficient saturation so that the gain was reduced to the point where there was negative phase margin at gain crossover; however, with preamplifier saturation it was necessary to exceed this level of saturation considerably to cause self-sustained oscillations.
Possible methods of compensation: (a) See Case I (a) and (b). Power amplifier saturation is similar to torque saturation, for which the dual-mode servo techniques have been developed. (See Refs. 23, 40, and Table 10.) (b) Use of tachometer feedback around the saturation is effective. (See Case III.) (c) If in place of G1 the integral compensation can be accomplished by a filter (lead) network around the saturating element, the system can be so designed that, as the system saturates, the compensation automatically becomes less and the system will not become unstable with saturation. (See Ref. 2.)

Case III, power amplifier saturation with tachometer feedback (G1 = integral compensation, G2 = power element, G3 = tachometer feedback).
Effect on system performance: For a conditionally stable system with saturation, the gain is reduced in the tachometer loop, lowering the crossover frequency, ωt, which lowers the phase margin of the position loop. There are two possibilities: (1) The phase margin at the normal position loop crossover frequency, ωc, will be lowered until the system becomes unstable at the normal crossover frequency, ωc. (2) The tachometer loop becomes ineffective before (1) occurs, and the position loop gain will be lowered, forcing the crossover frequency down into the region where the phase margin goes negative on account of the integral compensation. The effect depends upon the constants of the particular system being considered. For (1) the oscillation frequency will be approximately the crossover frequency. For (2) the frequency of oscillation will be closer to the integral compensation time constants.
Possible methods of compensation: (a) Instability is not normally a serious consideration. (b) The problem can generally be avoided by achieving the compensation by a filter in the tachometer feedback rather than in tandem elements in G1.

Case IV, saturation in feedback (same open loop characteristics as Case III).
Effect on system performance: Effect similar to Case I.
Possible methods of compensation: (a) Eliminate saturation if possible. (b) See Cases I(a) and (b) and II(c) above.


the input within the maximum allowable error. When such systems are
synchronized on a new operating condition or subjected to violent disturbances, they will inherently be highly saturated. This leads to reduced
performance. In addition, because such systems often use integral compensation to achieve high values of low-frequency gain, this can also lead
to serious overshoots and the attendant longer settling times, or even instability. Normally the requirement on allowable error is not as stringent
during the synchronizing period, and a reduction in performance can be
tolerated. The major concern is, therefore, that the system should settle
rapidly and stably from large signals. The effects of saturation and the
methods of compensation for such systems are summarized in Table 8.
COlllpensation for Backlash. The effects of backlash and load resonance are the major limits on the performance that can be achieved with
a power servomechanism. The great quantity of published material attests
to the serious consideration that has been given to the problem. (See
Refs. 13, 14, 15, 18, and 46 to 52.) However, thoroughly satisfactory
methods for circumventing the effects of backlash are not available.
The basic effects (see also Table 5) are illustrated by the system of Fig.
29. For large input signals, the backlash is quickly taken up and has
very little effect upon performance. For low-level signals approaching the
magnitude of the backlash, the backlash tends noticeably to disconnect
the load from the motor during signal reversals. Heuristically it can be seen that this will cause the load to lag farther behind the motor than with a linear system. Conversely, because the motor is disconnected from the load, it will accelerate faster than normal in the backlash zone, and the motor position will therefore tend to lead the normal response. When considered in terms of frequency response, if the primary feedback is from the load, the effect of backlash increases the lagging phase shift and decreases the loop gain; if the primary feedback is from the motor, the effect of backlash on system performance is much less severe and it actually introduces a leading phase shift into the loop. For low signal levels, if the load damping is viscous, the linearized equations of Fig. 29 can be used. For low signal levels, if there is appreciable Coulomb friction present, its effects will predominate, and the use of hysteresis to represent the backlash is more correct than the equations of Fig. 29. Therefore, it is necessary to evaluate carefully the type and extent of the damping present.
If the damping is viscous, the frequency of oscillation caused by backlash
will generally be at or higher than the normal linear gain crossover frequency. If the damping is of the Coulomb type, the frequency of oscillation will be lower than the gain crossover frequency. The amplitude in
either case will be small (one to several times the backlash angle, normally).
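These tendencies can be checked against the linearized model of Fig. 29 (the equations are given with the figure below). The following is a minimal sketch, with wholly illustrative constants (the values of ω_m, ω_L², δ, and J_L/J_M are assumptions, not data from the text); it watches the roots of the characteristic polynomial as the dead-band describing function magnitude α falls from 1 (backlash taken up) toward 0 (load disconnected):

import numpy as np

# Roots of the linearized characteristic polynomial of Fig. 29 as the
# dead-band describing function magnitude alpha varies.
# All numeric constants here are illustrative assumptions.
w_m   = 5.0    # motor constant Ke*KT/(J_M*R), 1/sec (assumed)
w_L2  = 100.0  # load resonance K_L/J_L, (rad/sec)^2 (assumed)
delta = 1.0    # load damping D_L/(2*J_L), 1/sec (assumed)
JLJM  = 2.0    # inertia ratio J_L/J_M (assumed)

for alpha in (1.0, 0.5, 0.1):
    poly = [1.0,
            2*delta + w_m,
            (1 + JLJM)*alpha*w_L2 + 2*delta*w_m,
            alpha*w_L2*(2*delta*JLJM + w_m)]
    print(f"alpha = {alpha:4.2f}  roots = {np.round(np.roots(poly), 3)}")

As α decreases, the oscillatory root pair moves to lower frequency and the loop loses gain, consistent with the behavior described above.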
Methods of analysis. Phase plane, piecewise linear, and describing function methods of analysis have been used successfully. The describing


(Fig. 29 diagram: motor coupled to the load through backlash, with the reflected torque indicated; the primary feedback may be taken from the motor or from the load.)

Basic linearized equations:

  sC_M/V_a = (1/Ke) ω_m (s² + 2δs + ω_L²) / Δ(s),

where the load response sC_L/V_a has the same denominator and

  Δ(s) = s³ + (2δ + ω_m)s² + [(1 + J_L/J_M) αω_L² + 2δω_m]s + αω_L²[2δ(J_L/J_M) + ω_m],

with

  α = |G_D|, magnitude of the describing function for dead band;
  ω_m = Ke·KT/(J_M·R), motor time constant without load;
  ω_L² = K_L/J_L, load mechanical resonant frequency;
  δ = D_L/(2J_L), load viscous damping.

Other constants are defined in Table 5.

FIG. 29. Typical shunt d-c machine and load with backlash representation. The basic linearized transfer functions are given in a nondimensionalized form.

function methods are the most generally useful for a paper study. However, the complexity of the problem warrants the use of an analog computer for thorough investigations.
As just noted and in Table 5, the representation used for backlash will
vary depending upon the constants involved. If the hysteresis representation is chosen, the usual describing function methods of analysis can be
used. If the more complex representation of Fig. 29 is chosen, the method
of equivalent coefficients, Sect. 4, is recommended.
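Describing function magnitudes such as the α = |G_D| of Fig. 29 can be computed numerically from the definition: the fundamental component of the response to a sinusoid, divided by the input amplitude. A minimal sketch for a dead-band element follows; the helper names and the sample grid are assumptions, not from the text.

import math

def dead_band(x, b):
    # Zero output inside +/-b, unity slope outside (dead-band element).
    return x - math.copysign(b, x) if abs(x) > b else 0.0

def describing_function(A, b, N=10000):
    # alpha = (1/(pi*A)) * integral over one cycle of f(A sin) * sin,
    # evaluated here by a simple N-point sum.
    s = sum(dead_band(A * math.sin(2*math.pi*k/N), b) * math.sin(2*math.pi*k/N)
            for k in range(N))
    return 2 * s / (N * A)

for ratio in (0.1, 0.5, 0.9):
    print(f"b/A = {ratio}:  alpha = {describing_function(1.0, ratio):.3f}")

As the backlash width b approaches the signal amplitude A, α falls toward zero, which is the loop-gain loss discussed above.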
General Design Considerations. In the design of a power servomechanism, the following basic effects should be given consideration:
1. Use of tandem integral compensation increases the magnitude of
sustained oscillations. The effect can be reduced by the use of dead space.

TABLE 9. USEFUL CORRECTIVE TECHNIQUES FOR BACKLASH

(The typical diagrams of the original table are omitted; for each type of corrective measure, the techniques and the general remarks are given in sequence.)

1. Mechanical design.
Techniques for backlash compensation: (a) Improved precision. The backlash can be reduced by improving the grade of gears used and the tolerance on the center distances, by having adjustable center distances, or by numerous other special design and/or assembly procedures.
(b) Spring loading. There are various methods for mechanically spring loading the gear trains to take up the backlash. In one method used in lightly loaded gear trains, spring tension rotates the gears in opposite directions until the backlash is taken up, and torque is transmitted through the springs to the shaft. This can be extended to the point where the entire gearing is completely divided and one half spring loaded against the other half, which takes up the backlash throughout the entire gear train.
(c) Split drive. By using two driving motors biased in opposite directions and separate gear trains for each motor, the motors will drive against each other to the extent of the bias and take up any backlash. A hydraulic drive of this type was shown in the original diagrams.
(d) One-speed motor. Eliminate the gearing by driving the load directly from the motor shaft. (See Ref. 52.)
General remarks: There are many other methods besides the two shown for mechanizing the concepts of (b) and (c). (See Ref. 49.) In general, these methods are costly and increase the friction and wear in the gear train. Electric and hydraulic models of one-speed motors have been built and tested; the low-speed high-torque requirement makes the electric unit bulky and heavy. General mechanical design considerations are outlined in the text.

2. Divided reset.
Techniques for backlash compensation: (a) Solid motion feedback. Consider an idealized backlash between a motor inertia J_m and a load inertia J_L. There must be a point with a displacement somewhere between the displacements of J_m and J_L that responds only to the externally applied forces. The displacement of this point (center of mass) is called the solid motion. All supplementary motions of J_m and J_L relative to the solid motion must then be due to mutual forces that occur upon collision or separation, and the momenta of the supplementary motions must be equal and opposite. Instrumenting the solid motion would give a signal which did not contain the backlash effects. This point cannot be physically instrumented, but by adding signals from the load and motor in the proper proportion the supplementary motions can be cancelled and only the solid motion will remain. From the principles of conservation of momentum,

  ẋ_s = [J_M/(J_M + J_L)] ẋ_m + [J_L/(J_L + J_M)] ẋ_L,

where ẋ_s is the rate of change of the solid motion. (See Refs. 14, 15, and 49.) A method of instrumenting the technique was shown in the original diagrams.
(b) Artificial damping. Reducing the load-resonant peak and increasing the apparent load-resonant frequency ameliorates the backlash effects. This can be done by the proper feedback of the relative load and motor motions, including position, velocity, and acceleration differences. The circuit constants can be determined by frequency response techniques by assuming the system to be linear as in Fig. 29. Experimental adjustment will be necessary.
General remarks: When the feedback (or reset) is from (or divided between) the motor and the load it is called divided reset. The concept as discussed is highly simplified, and in practice the configuration and constants will have to be adjusted to suit the particular case. When J_m ≫ J_L, the motor and the center of mass follow closely, and rate feedback from the motor alone is effective in damping the system. Approximate schemes for obtaining the rate feedback from the motor back emf, etc., are often adequate. Solid motion position feedback can be obtained in the same manner; however, this position signal can differ from the actual load position by as much as B·J_m/(J_m + J_L), and often the additional error cannot be tolerated. Feedbacks of the type in (b) are very effective. Generally, position feedback alone is sensitive to system parameter variations; rate and position feedback in combination are quite insensitive; acceleration feedback is very sensitive. Position difference feedback increases the resonant frequency, and rate difference feedback increases the damping.

3. Network compensation.
Techniques for backlash compensation: (a) Dead zone compensation. Backlash oscillations are of low amplitude. Use of a dead zone in the error channel opens the loop to low amplitude signals and will stop certain types of backlash oscillation. (See Ref. 48.) The dead zone will be of the same order of magnitude as the backlash.
(b) Frequency-sensitive networks. Compensating networks, gain changes, parallel-tandem networks, etc., can be used in combination with dead band or separately to get the desired gain-phase change at low signal levels. (See Ref. 28.) Lead networks are particularly effective in compensating for the lagging phase shift of backlash.
General remarks: This method has been used for drives with a low load inertia to motor inertia ratio (referred to the same speed) and with sufficient friction to keep the load from much coasting. Under these conditions it is effective and the error is small. More complex schemes of sensing the proper time to modify the gain characteristics are possible but usually are not justified.


(See Table 9, item 3.) The effects of integral type compensation achieved by tachometer feedback and a lead filter have not been as thoroughly documented. However, the same general tendency is apparent.
2. Increasing the load mechanical resonant frequency, (K_L/J_L)^½, reduces the magnitude of the sustained oscillations.
3. It would be desirable to have mechanical load damping ratios greater than 0.1. These would probably be undesirable on larger drives because of the large power loss involved. However, there are methods of increasing the damping electrically (see Table 9).
4. Primary feedback from the motor gives more stable operation than from the load.
Table 9 summarizes various methods of compensating for backlash. None of the schemes is perfect in the practical case, but all provide a certain relief from the problem. The final choice usually includes considerations of weight, size, and cost, as well as performance.

FIG. 30. Typical nonlinear compensating circuits. As shown, the circuits vary with the

input variable but circuits (a) and (c) can be adapted to vary with an independent
variable: (a) nonlinear gain circuit with characteristics that vary with input voltage to
increase gain for low-level signals; (b) resistance characteristics of the voltage sensitive
resistor Rv; (c) nonlinear time constant circuit that reduces the time constant for large
input signals; (d) nonlinear time constant circuit that eliminates the time constant and
reduces the low-frequency gain for large input signals.


Nonlinearities to Improve System Response

Nonlinearities used for improving system performance in general involve methods for (a) reducing the response time and/or (b) minimizing
overshoots by more fully utilizing the performance available in the power
element(s). Many of these methods accelerate the system rapidly for
large errors and increase the relative damping for small errors so that
operation is smooth (very stable). Table 10 summarizes several typical
methods of nonlinear compensation and refers to typical circuits shown in
Figs. 30 and 31.

FIG. 31. Typical nonlinear feedback compensating circuits: (a) nonlinear rate feedback
to minimize overshoot from large signals; (b) nonlinear stabilizing circuit for switching

feedback compensation for large errors or feedback rates. To obtain the proper characteristics it may be necessary to use an isolation amplifier at (x).

Because of the difficulty of specifying the required characteristics
mathematically and the impracticality of instrumenting the ideal characteristics for all but the simplest systems, nonlinear compensation is obtained by empirical means in practice.

TABLE 10. TYPICAL NONLINEAR METHODS OF COMPENSATION

(The block diagrams and response sketches of the original table are omitted; for each type, the description of the technique and the general remarks are given in sequence.)

1. Lewis servo.
Description of technique: Gm is the transfer function of the d-c output motor, G1 is the transfer function of a conventional tachometer, and the dotted block represents the transfer function of a second tachometer in which the term x denotes the product of K1 and sc. The field of this second tachometer is excited by the amplifier error signal so that the output is proportional to the error magnitude times the output speed. This output is subtracted from the output of the first tachometer and results in a value of damping which is low for large errors and which increases as the error decreases (Ref. 54).
General remarks: It is possible to choose values such that this system will give a very fast initial response to a step input with no overshoot. However, if a step input in one direction is followed by an unequal one in the opposite direction before the error caused by the initial step is corrected, the system can become unstable (Ref. 45). This tendency toward instability can be corrected by limiting the magnitude of the term |e|ċ to some experimentally determined maximum value.

2. Tandem compensation (G = linear elements).
Description of technique: Generally, the relative damping is decreased and the frequency response increased. For instance, in the response sketch the solid curve represents the normal response for small signals, and the dotted curve the response for large signals.
General remarks: The increased bandwidth can be accomplished by adjusting either the controller gain and/or time constants. Both methods have been used with success (Ref. 36). Operating on the time constants of the stabilizing network is particularly desirable. This allows the reduction of energy storage elements in the system which can give undesirable lags in synchronizing. Typical circuits to give gain and time constant change with signal level are shown in Fig. 30. Circuit constants are determined experimentally. Since the performance of the system is dependent upon the characteristics of the inputs, one must completely define the input.

3. Feedback compensation.
Description of technique: This is the same basic approach as (2), but the feedback allows gain and time constant changes to be made as functions of error and the derivatives of the error. This can be used to alleviate the problem of reaching zero error with high derivatives existing. The needed functions are nonlinear but can often be approximated adequately by linear circuit components and diodes.
General remarks: Figure 31 gives two typical circuits for accomplishing nonlinear feedback compensation. Circuit (a) is a modification for a standard tachometer stabilized position servo. The form of the feedback function depends upon the servo characteristics (see Optimum Switching Techniques). Circuit (b) is a more elaborate feedback circuit where error and feedback rate are combined to switch from normal feedback to a feedback which provides more rapid response. Note that for high feedback rates and low errors the switch will open (the diodes stop conduction) and normal stabilization will come into play during synchronization. Under extreme conditions the switch may actually reverse polarity to allow rapid deceleration.

4. Optimum switching techniques or minimum response time systems.
Description of technique: Optimum switching is the controlled switching of power to the motor to reduce the error and its derivatives to zero in the minimum possible time, recognizing only the limitations on the performance of the motor. For example, the optimum response to a step input of a second order system with torque limiting is to accelerate at the maximum rate about halfway and then switch and decelerate at the maximum rate for the remaining distance. By proper selection of the switching point the system will arrive at zero error with zero error rate, and if the torque is removed, the system will remain at rest with no further corrective action. See the example in the text. Table 11 gives the optimum switching functions for several second order systems. The number of switching points needed to respond in the minimum time to a step input is (n - 1), where n is the order of the system. (See Ref. 24.) Excessive switching at low signal levels can be avoided by having a small deadband at the null. Smoother operation for small signals can be obtained by changing the mode of operation and having a small linear band at null. This has been called dual mode operation. (See Ref. 23.)
General remarks: It is difficult to mechanize the optimum switching function for systems higher than the second order. However, the optimum performance can be approached closely without going to the complexity of (n - 1) switching points. This approximation can be made analytically by deriving a nonlinear function (of one variable) that gives a response that approaches the optimum response. This technique is explained in Ref. 57. The approximation can be arrived at empirically by using the basic second order system switching functions and modifying them by experience and experiment to provide satisfactory performance for higher order systems. The optimum switching technique is not limited to relay servos; the "switching" can be the saturation of some element in the system. In any case the optimum response is obtained only for the designed input; i.e., systems designed for step inputs show poorer response for velocity inputs. Because the optimum response is the minimum time in which a power element can make a correction, it provides a good basis for rating system performance. The ratio of response time to the optimum response time is a useful index of system performance. (See Ref. 55.)


EXAMPLE. Optimum Switching Techniques to Obtain Minimum Time Response. (Refer to Fig. 32.) It is assumed that the amplifier gain is

(Fig. 32 diagram: very high gain amplifier driving a motor with output c and a nonlinear rate feedback f(ċ).)

FIG. 32. Nonlinear control system with a very high gain amplifier and torque saturation ±Tm.

sufficiently high so that the motor operates with full voltage on it for all
but very small errors. The system equations are:

(21)  ±Tm = J(d²c/dt²) + D(dc/dt),   m = e - f(ċ),   e = r - c,

where ċ = dc/dt, and the saturated torque is +Tm for m > 0 and -Tm for m < 0.

For a step input of r:

(22)  ∓Tm = J(d²e/dt²) + D(de/dt),   m = e - f(ė),   ė = de/dt.

Equation (22) can be solved independent of time to yield a series of trajectories in the phase plane, Fig. 33. The coordinates of this plane are error, e, and error rate, ė.

There is only one trajectory which passes through the origin, and it will provide the optimum system response if the torque to the motor is reversed when this trajectory is reached.

From Fig. 33, it is seen that proper choice of the function m = e - f(ė) will provide the intelligence to perform the necessary switching function. A nonlinear tachometer feedback will then provide the necessary switching information.

(Fig. 33 sketch: phase plane with axes error e and error rate ė; f(ė) = optimum switching line; optimum switching point for input E1; maximum acceleration at +Tm; maximum deceleration at -Tm.)

FIG. 33. Phase portrait of the performance of the control system of Fig. 32, showing the optimum switching line where the torque must be reversed to bring the system to rest with no overshoot. When the quantity e - f(ė) goes to zero, the torque should be reversed.

Optimum Switching Functions. The form of the system characteristic equation will dictate what the optimum function should be. Several typical cases as derived in Ref. 56 are given in Table 11.
TABLE 11. TYPICAL OPTIMUM SWITCHING FUNCTIONS FOR SECOND ORDER SYSTEMS WITH LIMITED TORQUE ±Tm

Undamped. Torque equation (see Fig. 32): ±Tm = J d²e/dt². Optimum switching function in the fourth quadrant (ė = de/dt):

  e = (J/2Tm) ė².

Viscous damped. Torque equation: ±Tm = J d²e/dt² + D de/dt. Optimum switching function in the fourth quadrant:

  e = -(J/D) ė - (Tm·J/D²) log [1 - (D/Tm) ė].

Coulomb damped.^a Torque equation: ±Tm = J d²e/dt² + Tf(ċ). Optimum switching function in the fourth quadrant:

  e = (J/2Tm) ė²/(1 + Tf/Tm).

^a Tf(ċ) is positive for ċ > 0 and negative for ċ < 0 and is a constant in either case.

REFERENCES

1. L. A. MacColl, Fundamental Theory of Servomechanisms, Van Nostrand, Princeton,
N. J., 1945.
2. J. G. Truxal, Control System Synthesis, Chap. 10, McGraw-Hill, New York, 1955.
3. J. C. Lozier, A steady state approach to the theory of saturable servo systems,
I.R.E. Trans. on Automatic Control, May 1956.
4. K. Klotter, Steady state vibrations in systems having arbitrary restoring and arbitrary damping forces, Proc. Symposium on Nonlinear Circuit Analysis, Vol. II, Polytechnic Institute of Brooklyn, New York, 1953.
5. E. Levinson, Some saturating phenomena in servo mechanisms with emphasis on
the tachometer stabilized system, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, (1953).
6. N. Minorsky, Introduction to Nonlinear Mechanics, Edwards, Ann Arbor, Mich.,
1947.
7. R. G. Wilson and I. H. Van Horn, The Effect of Noise on Rate-Limited Systems,
Rept. No. GER2328, Goodyear Aircraft Corp., Feb. 22, 1952.
8. C. A. Ludeke, The generation and extinction of subharmonics, Proc. Symposium
on Nonlinear Circuit Analysis, Vol. II, Polytechnic Institute of Brooklyn, New York,
1953.
9. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. II, Chap. 10, Wiley, New York, 1955.
10. H. D. Greif, Describing function method of servomechanism analysis applied to
most commonly encountered nonlinearities, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2,
243-248 (1953).
11. R. J. Kochenburger, Frequency response method for analyzing and synthesizing contactor servomechanisms, Trans. Am. Inst. Elec. Engrs., 69, Pt. 1, 270-284 (1950).
12. E. C. Johnson, Sinusoidal analysis of feedback control systems containing nonlinear elements, Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 169-181 (1952).
13. N. B. Nichols, Backlash in a velocity lag servomechanism, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 462-466 (1953).
14. A. Tustin, The effects of backlash and of speed-dependent friction on the stability of closed-cycle control systems, J. Inst. Elec. Engrs. (London), 94, Pt. 2A, 143-151
(1947).
15. K. N. Satyendra, Describing functions representing the effects of inertia, backlash, and Coulomb friction on the stability of an automatic control system, Trans. Am.
Inst. Elec. Engrs., 75, Pt. 2, 243-248 (1956).
16. R. J. Kochenburger, Limiting in feedback control systems, Trans. Am. Inst. Elec.
Engrs., 72, Pt. 2, 180-192 (discussion), 192-194 (1953).
17. V. B. Haas, Coulomb friction in feedback control systems, Trans. Am. Inst. Elec.
Engrs., 72, Pt. 2, 119-123 (discussion), 123-126 (1953).
18. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. II, Chap. 8, Wiley, New York, 1955.
19. K. Klotter, How to obtain describing functions for nonlinear feedback systems,
Am. Soc. Mech. Engrs., IRD Paper No. 56-IRD-5, August 1956.
20. J. J. Stoker, Nonlinear Vibrations in Mechanical and Electrical Systems, Interscience Publishers, New York, 1950.
21. T. M. Stout, Basic methods for nonlinear control system analysis, Am. Soc.
Mech. Engrs. Paper No. 56-IRD-9, July 1956.
22. A. M. Hopkins, A phase-plane approach to compensation of saturating servomechanisms, Trans. Am. Inst. Elec. Engrs., 70, Pt. 1, 631-639 (1951).

23. D. McDonald, Nonlinear techniques for improving servo performance, Proc.
Natl. Electronics Conference, Vol. VI, pp. 400-421, National Electronics Conference, Inc.,
Menasha, Wis., 1950.
24. I. Bogner and L. F. Kazda, An investigation of the switching criteria for higher
order contactor servomechanisms, Trans. Am. Inst. Elec. Engrs., 73, Pt. 2, 118-126
(discussion), 126-127 (1954).
25. Y. H. Ku, A method for solving third and higher order nonlinear differential
equations, J. Franklin Inst., 256, 229-244 (1953).
26. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 11, McGraw-Hill, New York, 1955.
27. T. J. Higgins, A resume of the development and literature of nonlinear control
system theory, Am. Soc. Mech. Engrs. Paper No. 56-IRD-4, July 1956.
28. C. H. Shen, H. A. Miller, and N. B. Nichols, Nonlinear integral compensation of a
velocity-lag servomechanism with backlash, Am. Soc. Mech. Engrs. IRD Paper No.
56-IRD-3, August 1956.
29. T. M. Stout, A step-by-step method for transient analysis of feedback systems
with one nonlinear element, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2, 378-389 (discussion),
389-390 (1956).
30. J. G. Truxal, Numerical analysis for network design, Approximation Papers,
Trans. I.R.E., PGCT-CT-1, 4-64, September 1954.
31. H. L. Hazen, Theory of servomechanisms, J. Franklin Inst., 218, 279-331 (1934).
32. D. A. Kahn, Analysis of relay servomechanisms, Trans. Am. Inst. Elec. Engrs.,
68, Pt. 2, 1079-1088 (1949).
33. J. W. Schwartz, Piecewise linear servomechanisms, Trans. Am. Inst. Elec. Engrs.,
72, Pt. 2, 401-405 (1953).
34. M. J. Kirby, Stability of servomechanisms with linearly varying elements, Trans.
Am. Inst. Elec. Engrs., 69, Pt. 2, 1662-1667 (1950).
35. M. J. Kirby and R. M. Guilianelli, Stability of varying-element servomechanisms
with polynomial coefficients, Trans. Am. Inst. Elec. Engrs., 70, Pt. 2, 1447-1451 (1951).
36. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design,
Vol. II, Chap. 9, Wiley, New York, 1955.
38. E. S. Sherrard, Stabilization of a servomechanism subject to large amplitude
oscillation, Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 312-324 (1952).
39. J. C. West, A system utilizing coarse and fine position measuring elements in
remote-position-control servo mechanisms, Proc. I.R.E., 99, Pt. 2, 135-143 (1952).
40. D. McDonald, Multiple mode operations of servomechanisms, Rev. Sci. Instr.,
23, 22-30 (1952).
41. S. K. Chao, Design of a contactor servo using describing function theory, Trans.
Am. Inst. Elec. Engrs., 75, Pt. 2, 223-231 (1956).
42. H. G. Doll and T. M. Stout, Design and analog computer analysis of an optimum
third-order nonlinear servomechanism, Am. Soc. Mech. Engrs. Paper No. 56-IRD-10,
July 1956.
43. J. C. West and P. N. Nikiforuk, The frequency response of a servomechanism
designed for optimum transient response, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2,
234-239 (1956).
44. J. E. Hart, An analytical method for the design of relay servomechanisms, Trans.
Am. Inst. Elec. Engrs., 74, Pt. 2, 83-89 (discussion), 89-90 (1955).
45. R. R. Caldwell and V. C. Rideout, A differential-analyzer study of certain nonlinearly damped servomechanisms, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 165-169
(discussion), 169-170 (1953).


46. A. A. Clark and H. J. Pixley, Effects of non-linearities in multi-loop lead angle
prediction systems, Am. Soc. Mech. Engrs. Paper No. 56-IRD-18, July 1956.
47. H. T. Marcy, M. Yachter, and J. Zauderer, Instrument inaccuracies in feedback
control systems with particular reference to backlash, Trans. Am. Inst. Elec. Engrs., 68,
Pt. 1, 778-788 (1949).
48. F. J. Ellert, Feedback in contouring control systems, Am. Inst. Elec. Engrs.
Second Feedback Control Conference, April 1954.
49. D. C. McDonald, Backlash compensation improves servo system operation,
Instruments and Automation, 28 [10], 1728-1731 (1955).
50. C. H. Thomas, Stability characteristics of closed-loop systems with dead band, in Frequency Response, pp. 288-305, R. Oldenburger, Editor, Macmillan, New York, 1956.
51. R. L. Hovious, Jitter in instrument servos, Trans. Am. Inst. Elec. Engrs., 73, Pt. 2,
393-398 (1954).
52. F. M. Bailey, Performance of drive members in feedback control systems, I.R.E.
Trans. on Automatic Control, PGAC-I, May 1956.
53. J. H. Liversidge, Backlash and resilience within the closed loop of automatic
control systems, in Automatic and Manual Control, A. Tustin, Editor, Butterworths,
London, 1952.
54. J. B. Lewis, The use of non-linear feedback to improve the transient response of
servomechanisms, Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 449-453 (discussion), 453
(1952).
55. R. S. Neiswander and R. H. MacNeal, Optimization of non-linear control systems
by means of non-linear feedbacks, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 260-270,
(discussion) 270-272 (1953).
56. T. M. Stout, Effects of friction in an optimum relay servomechanism, Trans. Am.
Inst. Elec. Engrs., 72, Pt. 2, 329-335, (discussion) 335-336 (1953).
57. R. E. Kuba and L. F. Kazda, A phase space method for the synthesis of nonlinear
servomechanisms, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2,282-289 (discussion), 289-290
(1956).

E   FEEDBACK CONTROL

Chapter 26

Sampled-Data Systems and Periodic Controllers

John E. Barnes, Jr.

1. Description and Definition of Sampled-Data System    26-01
2. Methods of Transient Analysis                         26-06
3. Sampled-Data System Stability                         26-15
4. Sampled-Data System Synthesis                         26-20
References                                               26-32

1. DESCRIPTION AND DEFINITION OF SAMPLED-DATA SYSTEM

Definition of Sampled-Data System. Systems which operate on data obtained at discrete intervals of time are called sampled-data systems. The information obtained at a particular instant is called the sample. Normally the intervals are equally spaced in time and the amplitude of the sample is proportional to the amplitude of the signal.

Characteristics of Sampled-Data Systems

Basic Elements. Figure 1 shows the basic elements of a sampled-data
system: the sampler and the continuous elements. They may appear in
various configurations, and there may be more than one sampler in the
system. The output from the sampler is a train of pulses which is denoted
by a starred symbol; that is, the output of a sampler whose input is e(t) is
written e*(t).

Linearity. If the continuous elements are linear, the sampled-data
system is linear and the superposition theorem is valid. A sampled-data
system has regular time discontinuities, but the techniques of analysis
by the use of solutions of the linear constant-coefficient differential equations of the system are directly applicable.

FIG. 1. Sampled-data system and sampler input and output signals: (a) simple sampled-data system; (b) continuous error function; and (c) sampled error function.

The Sampler. The sampler acts as a pulse modulator of the input and
generates a train of pulses. This action introduces high frequencies into
the system which may be attenuated by a linear filter. The information
contained in the input signal may be recovered with reasonable fidelity if the
sampling frequency is at least twice the highest frequency component of the
input signal. Figure 2 shows the effect of sampling frequency upon the
frequency spectrum of the output of the sampler.
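The twice-highest-frequency condition can be seen directly on sample values. A small sketch (the frequencies are chosen only for illustration): a 1 rad/sec and a 5 rad/sec sinusoid produce identical sample trains at ω_s = 4 rad/sec, since 5 = 1 + ω_s and the two spectra of Fig. 2 overlap after sampling.

import math

ws = 4.0                     # sampling frequency, rad/sec (illustrative)
T = 2 * math.pi / ws         # sampling period
for n in range(4):
    t = n * T
    print(f"n={n}: sin(1*t) = {math.sin(1*t):+.4f}   sin(5*t) = {math.sin(5*t):+.4f}")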
Use of Samplers. Sampled-data systems may be used for several
reasons:
1. To use a digital computer as part of the controller. The input data
must be in sampled form.
2. To use simpler, low-powered control elements.
3. To realize the beneficial effect which sometimes accrues when sampled-data systems are used for the process control of plants having inherent dead time.



FIG. 2. Sampler transfer characteristics in the frequency domain: (a) amplitude spectrum of sampler input; (b) amplitude spectrum of sampler output, sampling frequency greater than twice the maximum signal frequency; (c) amplitude spectrum of sampler output, sampling frequency less than twice the maximum signal frequency. ω_s = sampling frequency (Ref. 5). Reprinted by permission from J. G. Truxal, Automatic Feedback Control Systems Synthesis, Copyright 1955 by McGraw-Hill Book Co.

4. To use pulsed-data information. The input information may be
available in discrete samples as in a guided missile control system or as in
certain track-while-scan radar systems. Sampled-data systems may be used
to advantage where digital sensors are already available.
Description of Typical Sampled-Data Systems
Digital Computer in the Controller. Figure 3 shows a typical digital
system. The sampling and coding unit converts continuous data into
pulsed data. The digital computer performs a series of operations on the
pulsed data and presents the results in pulsed form to the holding and
decoding unit which reconverts the results into (approximately) continuous


signals for use by the continuous control equipment. The feedback may
transmit the data in either pulsed or continuous form.

(Fig. 3 block diagram: sampling and coding unit, digital computer, holding and decoding unit, and conventional control equipment in cascade, producing the output c(t), with a feedback path to the input.)

FIG. 3. Typical sampled-data control system. Conventional control equipment is continuous.

In practical operating systems, a typical method of converting from a
continuous variable available as a shaft rotation to a binary code number
which represents its magnitude and polarity is to use an encoding device
such as a circular binary pattern shown in Fig. 4. The circular tracks may


FIG. 4. Circular binary pattern for analog-digital conversion. The lines across the
pattern show that accurate angular position of the photocells or brush contacts is
necessary to avoid errors in conversion.

be scanned radially with a photoelectric cell or brush pickoffs. The output
will be the binary pulse code which represents a particular position of the
circular binary pattern; the pattern shown can resolve a circle into 2⁶ = 64


parts. Although the encoder shown is for angular rotation, devices have
been manufactured for conversion of pressures and flows to digital form.
Techniques for converting analog voltages to digital form are also available.
(See Vol. 2, Chap. 20.)
Periodic Process Controller. A typical sampled-data regulator for process control is shown in Fig. 5.

FIG. 5. Typical sampled-data process regulator.

The typical stepwise process controller monitors the controlled variable
periodically (every nT) and makes a control adjustment at each sensing
instant. In process regulation, the usual (and perhaps the most useful)
form of control actuator is a servo motor, which serves as a low-pass filter
and also serves to reset the error detector. The following description of a
periodic controller is taken from Oldenbourg and Sartorius (Ref. 1).

(Fig. 6 diagram labels: motor armature lead; to process control valve.)

FIG. 6. Schematic form of a periodic controller (chopper bar relay) (Ref. 1).


From a constructional standpoint, the periodic controller operates about
as follows. Through a sensing device, such as a meter pointer, the control
variable is observed at equal time intervals. Then, by auxiliary power,
additional members of the control loop are suitably actuated according to
the sensed position of the pointer.
EXAMPLE. The Chopper Bar Controller. See Fig. 6. As long as the
meter pointer stands between the two contact springs the circuit remains
broken, even during the sensing instants. It is closed only when the pointer
leaves its mid-position. The duration of closure increases with deviation
of the pointer. If the contact closure is used to actuate a reversible
constant-speed motor, the control action is called astatic (never quiet)
because, with constant actuating error, the control motor moves intermittently across its entire range at an average speed (roughly) proportional to the pointer deviation. Although periodic controllers may have static
correspondence between deviation and motor motion, astatic action will
be assumed here because of its greater practical significance (Ref. 1).
2. METHODS OF TRANSIENT ANALYSIS

Basic Mathematical Relationships

Analysis of Sampler. The output of the sampler (see Fig. 7) is the input modulated by the sampler into a train of pulses:

(1)  e*(t) = e(t) Σ_{n=0}^{∞} u0(t - nT) = Σ_{n=0}^{∞} e(nT) u0(t - nT),

where u0(t - nT) = the impulse or Dirac delta function occurring at t = nT, in which

  u0(t - nT) = lim_{a→0} [u(t - nT) - u(t - nT - a)]/a;

T = the sampling period; n = an integer; e(nT) = the value of the input at the sampling instant.

The Laplace transform of eq. (1) may be written

(2)  E*(s) = Σ_{n=0}^{∞} e(nT) e^{-nTs},

or eq. (1) may be written in the frequency response form

(3)  E*(s) = (1/T) Σ_{n=-∞}^{∞} E(s + jn2π/T).

FIG. 7. Showing the basic mathematical relationships of a sampler: input e(t), sampled output e*(t), and a continuous transfer member g(t).

NOTE. Equation (3) may be derived by performing the complex convolution of the input e(t) and the train of unit impulses generated by the sampler, namely

(4)  E*(s) = E(s) ⊛ ℒ[Σ_{n=0}^{∞} u0(t - nT)],

where ⊛ is the symbol denoting complex convolution. Notice that

  ℒ[Σ_{n=0}^{∞} u0(t - nT)] = 1 + e^{-sT} + e^{-2sT} + ⋯ = 1/[1 - e^{-sT}]

in closed form. Because 1/[1 - e^{-sT}] has only simple poles at s = jn2π/T, the complex convolution reduces to eq. (3).

Smoothing the Sampled Data. Normally, the high-frequency components generated by the sampler are removed before the signal reaches the output. Often in sampled-data servo systems, a large portion of the smoothing is accomplished by the components (motors, etc.) between the sampler and the output. Sometimes more smoothing is necessary. One particularly simple low-pass filter is the holding circuit or boxcar generator. In this circuit, the value of a sampling pulse is held until the next pulse arrives, whereupon the circuit assumes the value of the new pulse. The transfer function of such a network is that of a rectangular pulse of unity height and of T seconds duration, namely

(5)  GH(s) = (1/s)[1 - e^{-sT}].
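A minimal sketch of the boxcar generator of eq. (5) follows; the sample train, the sampling period, and the helper names hold() and gh_mag() are assumptions for illustration.

import cmath, math

def hold(samples, T, t):
    # Zero-order hold: the output at time t is the most recent sample.
    n = min(int(t // T), len(samples) - 1)
    return samples[n]

def gh_mag(w, T):
    # |GH(jw)| = |(1 - e^{-jwT})/(jw)|, from eq. (5); the limit at w = 0 is T.
    return abs((1 - cmath.exp(-1j * w * T)) / (1j * w)) if w else T

T = 1.0                                  # illustrative sampling period
samples = [0.0, 0.8, 1.0, 0.9]           # illustrative sample values
print([hold(samples, T, t) for t in (0.5, 1.5, 2.5, 3.5)])   # the staircase
print(gh_mag(math.pi / T, T))            # attenuation at half the sampling frequency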

Response of a Continuous Filter to Sampled Data. The response of a continuous transfer member g(t) of Fig. 7 is

(6)  c(t) = Σ_{n=0}^{∞} e(nT) g(t - nT).

Equation (6) has the Laplace transform

(7)  C(s) = E*(s)G(s).

Equation (6) is a summation of the filter impulse responses which are excited by each sample and is valid only for the case of zero initial conditions. When this condition is not met, a second term must be added to eq. (6) to include the decay of the nonzero initial conditions. Since the system is linear this is not important when considering the stability of the system, but it must be included if a time response is being calculated.

The response of the filter only at the sampling instants is

(8)  c*(t) = Σ_{q=0}^{∞} [Σ_{n=0}^{q} e(nT) g((q - n)T)] u0(t - qT),

which has the following Laplace transform:

(9)  C*(s) = E*(s)G*(s),

where G*(s) = ℒ[g*(t)] = ℒ[Σ_{n=0}^{∞} g(nT) u0(t - nT)].
Sampled-Data System Transfer Function. From eq. (9), the sampled-data transfer function or pulse transfer function can be defined as

(10)  G*(s) = C*(s)/E*(s).

An equivalent form in terms of the z-transform symbolism is indicated in eq. (11). (The z-transform is defined and illustrated in a later paragraph.)

(11)  G(z) = C(z)/E(z).

Laplace Transform Analysis

It is possible to use the equations of the previous section to obtain the complete time response, c(t). However, the Laplace transforms are not rational, and it requires considerable labor to obtain the complete response. If the response is calculated only at the sampling instants, the transforms can often be written in closed form and the labor of calculation and manipulation is greatly reduced. The z-transform method is usually used to compute the response at the sensing instants. (See Analysis by z-Transform Method, Sect. 2.)


Analysis by Difference Equations

The analysis of sampled-data control systems leads to characteristic
equations, which are difference equations.
Formulation of the Difference Equations. Difference equations are discussed in Chap. 4. A simple example will illustrate the analysis of a control process by difference equations.
EXAMPLE. (See Fig. 8.) The following simplifying assumptions are made: (a) The displacement of the control means, m, is linear and unlimited.

(Fig. 8 diagrams: supply and influx through a control valve; plant 1/(Tc·s + 1); controlled variable c(t); controller acting on the samples e(nT); disturbance me.)

FIG. 8. (a) Simple process control. (b) Elements of process control equivalent to (a). Note. e(t) = r0 - c(t); this quantity is dimensionless.

(b) The plant has a simple time constant, Tc. (c) The controller is lag-free, i.e., the sensing and positioning times are negligible.

The disturbance is assumed at the most unfavorable instant (just after sensing). The control means, m, changes instantly at the sensing instant and remains at its new value throughout the sensing cycle.

The analysis is quite simple. Inside a sensing cycle, the behavior of the plant is continuous and may be described by the linear differential equation

  (Tc/T) de(τ)/dτ + e(τ) = r0 - mi + me = r0 - m,

where Tc = the plant time constant, seconds;
  T = the sampling cycle, seconds;
  e(t) = the controlled variable error, dimensionless: e(t) = r0 - c(t), where c(t) is the controlled variable and r0 is the set point, all parameters nondimensional and normalized;
  τ = t/T, dimensionless time;
  m = the net value of the control means, dimensionless;
  mi = the manipulated control means, dimensionless;
  me = the disturbance of the control means, dimensionless.
By using the Laplace transformation it can be shown that the solution of this equation at the nth sensing instant is

(12)  e_n = D e_{n-1} - (1 - D) m_{n-1},

where e_n = value of the controlled variable error at the nth sensing instant;
  e_{n-1} = value at the (n - 1)th sensing instant;
  D = exp(-T/Tc), the decrement characteristic of the plant and of the sensing cycle;
  m_{n-1} = value of the control means at the (n - 1)th sensing instant.
Now consider the behavior of the variable m at the sensing instants. The relationship is assumed to be linear:

(13)  m_n - m_{n-1} = -K(c_n - r0) = K e_n,

where K is the strength of the controller and is called the specific step. The minus sign in eq. (13) provides the negative feedback needed for regulation of the variables.
Equations (12) and (13) are the simultaneous difference equations of the control action. They lead to the difference equation of the system, namely,

(14)  e_{n+2} - [1 + D - K(1 - D)] e_{n+1} + D e_n = 0.

Solution of Linear Difference Equations. The linear homogeneous

The linear homogeneous

difference equation may be written:
Aoen+q

(15)

+ A1en+q_l + ... + Aq_1en+l + Aqen =

If the roots are not equal, eq. (15) has the solution:
q

(16)

en =

L: aiZin,
i=l

where
(17)

Zi

is a root of the auxiliary equation,

O.


The coefficients of eq. (17) are identical with those of eq. (15). Equation
(17) is often called the characteristic equation of the system.
With q distinct roots the general solution is

(18)  e_n = a1 z1^n + a2 z2^n + ⋯ + aq zq^n.

If k roots are equal the solution is

(19)  e_n = [a1 + a2 n + ⋯ + ak n^{k-1}] z1^n + a_{k+1} z2^n + ⋯ + aq z_{q-k}^n.

Thus, the values en at any sensing instant may be computed. The q
summing constants ai are determined by the first q of the e's at the sensing
instants. The characteristic equation with vanishing roots has special
significance as will be discussed later.
Use and Limitations of Difference Equation Method. If the problem is dominated by the sampler, that is, if there is a rather simple control
loop whose servomotor is actuated by a periodically applied measurement
of the error, this type of analysis is simplest. It is also well to remember
that the difference equation method and the z-transform method are
synonymous. For higher order systems, the difference equation approach
becomes laborious, so that the more methodical z-transform method becomes advantageous.
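As an illustration of the difference equation method, the sketch below iterates eqs. (12) and (13) for assumed values of the decrement D and the specific step K (these numbers are illustrative, not from the text), and checks the roots of the characteristic eq. (14), anticipating the stability criterion of Sect. 3:

import numpy as np

D, K = 0.5, 1.2          # D = exp(-T/Tc); K = specific step (both assumed)
e, m = 1.0, 0.0          # error after a unit disturbance; control at rest
history = []
for n in range(10):
    e = D * e - (1 - D) * m      # eq. (12)
    m = m + K * e                # eq. (13): step proportional to the error
    history.append(round(e, 4))
print("error sequence:", history)

# Characteristic eq. (14): e_{n+2} - [1 + D - K(1-D)] e_{n+1} + D e_n = 0
roots = np.roots([1.0, -(1 + D - K * (1 - D)), D])
print("roots:", roots, " stable:", all(abs(r) < 1 for r in roots))

For these values the roots have magnitude about 0.71, so the error sequence decays, as the printed history confirms.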
Analysis by z-Transform Method
Usefulness. The z-transform is the shorthand rational way to write
the Laplace transform of the linear difference equation. It has the same
relationship to linear difference equations as the Laplace transformation
bears to linear differential equations. The advantages of the z-transform
are: (a) it reduces the nonrational Laplace transform of a sampled-data
system to a rational transform which facilitates writing transfer functions;
(b) it allows definition of the closed loop system response of a sampled-data system (no advantage over difference equations).
Limitations. Tables of the more complex z-transforms are not readily
available and the polynomials must be expanded into partial fractions.
A fundamental limitation of z-transforms is that the time solutions are defined only at the sensing instants.
Hidden Oscillations. Because the time solutions are calculated only
at sensing instants, it is possible that the sampling frequency may be
lower than the characteristic frequency of the plant being controlled, and
oscillations may occur which are not apparent from the z-transforms. If
such a condition is suspected, the z-transform can be modified to give the output between the sampling instants, and the existence of such oscillations can be checked. (See Refs. 4 and 5.)


Basic Relationship. The z-transform is based on the transformation

(20)  z = e^{sT},

where s is the Laplace operator and T is the sampling period. The Laplace transform of the sampled signal will contain s in the irrational form e^{-nsT}; substitution of z produces a rational transform in z. The z-transform is defined as

(21)  C(z) = Σ_{n=0}^{∞} c(nT) z^{-n}.

Table of Useful z-Transforms. See Table 1.

TABLE 1. LAPLACE AND z-TRANSFORMS (Refs. 2, 5)

Row a. Laplace transform: 1. Time function: u0(t). z-transform: 1. (Impulse function at t = 0.)
Row b. Laplace transform: e^{-nTs}. Time function: u0(t - nT). z-transform: 1/z^n. (Impulse function at t = nT.)
Row c. Laplace transform: 1/(1 - e^{-Ts}). Time function: i(t). z-transform: z/(z - 1). (Train of impulses at sampling instants.)
Row d. Laplace transform: 1/s. Time function: u(t). z-transform: z/(z - 1). (Step function.)
Row e. Laplace transform: 1/s². Time function: t. z-transform: Tz/(z - 1)². (Ramp function.)
Row f. Laplace transform: 1/s³. Time function: t²/2. z-transform: (T²/2) z(z + 1)/(z - 1)³. (Quadratic or acceleration function.)
Row g. Laplace transform: 1/(s + a). Time function: e^{-at}. z-transform: z/(z - e^{-aT}). (Exponential function.)
Row h. Laplace transform: a/(s² + a²). Time function: sin at. z-transform: z sin aT/(z² - 2z cos aT + 1). (Sinusoidal function.)
Row i. Laplace transform: 1/[s - (1/T) ln a]. Time function: a^{t/T}. z-transform: z/(z - a). (Constant raised to power t.)
Row j. Laplace transform: b/{[s - (1/T) ln a]² + b²}. Time function: a^{t/T} sin bt. z-transform: za sin bT/(z² - 2az cos bT + a²). (Sine wave multiplied by a^{t/T}.)
Row k. Laplace transform: [s - (1/T) ln a]/{[s - (1/T) ln a]² + b²}. Time function: a^{t/T} cos bt. z-transform: z(z - a cos bT)/(z² - 2az cos bT + a²). (Cosine wave multiplied by a^{t/T}.)
Row l. Laplace transform: F(s + a). Time function: e^{-at} f(t). z-transform: F(e^{aT} z). (Effect of multiplication by e^{-at}.)


Methods of Inverting the z-Transformation.

Real Inversion Integral. The real inversion integral for the z-transform is

(22)  c(nT) = (1/2πj) ∮ C(z) z^{n-1} dz,

where the contour of integration is a circle of sufficiently large radius to enclose all poles of the integrand.

Partial Fraction Expansion. The z-transform is factored into components so that each term in the expansion can be obtained from Table 1. The usual methods for partial fraction expansion are applicable. (See Chap. 20.)

Power Series Expansion. From eq. (21),

(23)  C(z) = Σ_{n=0}^{∞} c(nT) z^{-n} = c(0)/z⁰ + c(T)/z¹ + c(2T)/z² + ⋯,

which thus expands the z-transform of a variable in an inverse power series in z. The coefficient c(nT) of z^{-n} is the value of the variable at the nth sensing instant, and the coefficients can be used directly to plot the time function at the sampling instants.
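The power series expansion can be mechanized as polynomial long division in powers of 1/z. A minimal sketch (the helper name long_division is hypothetical), using Table 1, row g, with e^{-aT} = 0.5 as a check, so the coefficients should come out as 0.5^n:

def long_division(num, den, n_terms):
    # Divide num(z)/den(z) in descending powers of z, returning the
    # coefficients of z^0, z^-1, z^-2, ..., i.e., c(0), c(T), c(2T), ...
    num = num + [0.0] * (len(den) - len(num))   # pad to the same degree
    out, rem = [], num[:]
    for _ in range(n_terms):
        q = rem[0] / den[0]
        out.append(q)
        rem = [r - q * d for r, d in zip(rem, den)] + [0.0]
        rem = rem[1:]                            # shift: one power of 1/z
    return out

# C(z) = z/(z - 0.5): num = [1, 0], den = [1, -0.5] in descending powers.
print(long_division([1.0, 0.0], [1.0, -0.5], 6))
# -> [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]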
z-Transform Block Diagram Algebra. The z-transform describes the transfer function of two variables at the sensing instants only. Figure 9a illustrates the transform R(z). Figure 9b shows the transform

(24)  Cb(z) = G1(z)R(z),

and Fig. 9c

(25)  Cc(z) = G1(z)G2(z)R(z).

(Fig. 9 diagrams: (a) a sampler alone; (b) a sampler followed by G1; (c) G1 and G2 separated by a second synchronous sampler, output cc(t); (d) G1 and G2 in cascade with no intervening sampler, output cd(t).)

FIG. 9. Basic z-transform relationships.


In words, if each transfer member is separated from the others by synchronous samplers, the z-transforms cascade, i.e., they can be multiplied. But notice that if the transfer members are not separated by a chopper, the z-transform cannot be obtained by multiplying together the z-transforms of the component members. In continuous systems where coupling exists between transfer members a similar difficulty is encountered. For example, Fig. 9d has the transform

  Cd(z) = R(z) G1G2(z).

Consider

  G1(s) = 1/(s + 1),  G1(z) = z/[z - exp(-T)];  G2(s) = 1/(s + 2),  G2(z) = z/[z - exp(-2T)].

Then

  Cc(z)/R(z) = z²/{[z - exp(-T)][z - exp(-2T)]},

whereas

  Cd(z)/R(z) = G1G2(z) = z[exp(-T) - exp(-2T)]/{[z - exp(-T)][z - exp(-2T)]}.

NOTE. The difference is that in the first case, G2 is driven by a train of impulses, whereas in the second, it is driven by the linear response of G1 to its own input pulses. A helpful concept is that the z-transform of a chain of transfer members must be derived from chopper to chopper in the circuit. Table 2 shows some control loops, their Laplace transforms, and their z-transforms. The output c may be assumed to be sampled by an imaginary chopper (synchronized with the real one), resulting in c(nT), although this imaginary chopper must be disregarded in traversing the complete control loops.
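The chopper-to-chopper rule can be confirmed in the time domain. For the example above (T taken as 1.57 sec purely for illustration), this sketch compares the output samples with a sampler between G1 and G2 (the discrete convolution of the sampled impulse responses) against samples of the continuous convolution g1*g2(t) = e^{-t} - e^{-2t}:

import math

T, N = 1.57, 5
g1 = [math.exp(-n * T) for n in range(N)]       # e^{-t} at sampling instants
g2 = [math.exp(-2 * n * T) for n in range(N)]   # e^{-2t} at sampling instants

with_sampler = [sum(g1[k] * g2[n - k] for k in range(n + 1)) for n in range(N)]
without = [math.exp(-n * T) - math.exp(-2 * n * T) for n in range(N)]

for n in range(N):
    print(f"n={n}:  chopper between: {with_sampler[n]:.4f}   none: {without[n]:.4f}")

The two sequences differ from the very first sample (1.0 versus 0.0 at n = 0), which is the impulse-train driving effect described in the note above.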

TABLE 2. OUTPUT TRANSFORMS FOR BASIC SAMPLED-DATA SYSTEMS^a

(The system block diagrams are omitted; r is the input, c the output, G and H the forward and feedback transfer members, and all samplers are synchronous.)

System 1. Laplace transform of output: C(s) = R*(s). z-transform of output: C(z) = R(z).
System 2. C(s) = GR*(s). C(z) = GR(z).
System 3. C(s) = G(s)R*(s). C(z) = G(z)R(z).
System 4. C(s) = G(s)R*(s)/[1 + HG*(s)]. C(z) = G(z)R(z)/[1 + HG(z)].
System 5. C(s) = G*(s)R*(s)/[1 + H*(s)G*(s)]. C(z) = G(z)R(z)/[1 + H(z)G(z)].
System 6. C(s) = G(s){R(s) - H(s)RG*(s)/[1 + HG*(s)]}. C(z) = RG(z)/[1 + HG(z)].
System 7. C(s) = G2(s)RG1*(s)/[1 + HG1G2*(s)]. C(z) = G2(z)RG1(z)/[1 + HG1G2(z)].

^a This table is reprinted from an article by Ragazzini and Zadeh (Ref. 2) with the permission of the authors.

3. SAMPLED-DATA SYSTEM STABILITY

Stability Criteria of Difference Equations and z-Transforms

The solution of the characteristic difference equation with nonequal roots is

(26)  c_n = c(nT) = Σ_{i=1}^{m} a_i z_i^n.

Now if c(nT) is to remain finite even for large n, then

(27)  |z_i| < 1,  i = 1, 2, ⋯, m.

In words, the inequality (27) states that, for stability of the sampled-data


system, the roots (or zeros) of its characteristic difference equation must
lie inside the unit circle with center at the origin. The unit circle in the
z-plane is the periodic limit of stability corresponding to the Routh-Hurwitz
stability criteria.
The Routh-Hurwitz Stability Criteria. This is used in linear control theory and may be applied to sampled-data systems by using the conformal transformation

(28)  w = (z + 1)/(z - 1).

This transformation changes the unit circle in the z-plane into the left half of the w-plane, as shown in Fig. 10. If the Hurwitz conditions are applied
FIG. 10. The linear transformation w = (z + 1)/(z - 1), used for deriving stability conditions from the difference equation of control. (The interior of the unit circle in the z-plane maps into the left half of the w-plane.)

to the characteristic equation (subjected to the transformation of eq. 28), the conditions can be found which cause the roots of the transformed equation to lie in the left half of the w-plane. Hence, the roots of the characteristic equation must lie within the unit circle in the z-plane. As an example consider the second order characteristic equation

(29)  A0 z² + A1 z + A2 = 0.

If the transformation (eq. 28) is used, the transformed equation is

(30)  B0 w² + B1 w + B2 = 0,

where B0 = A0 + A1 + A2, B1 = 2(A0 - A2), B2 = A0 - A1 + A2.

In the simple quadratic case, the Hurwitz criterion requires for stability only that all these coefficients have the same sign. Therefore, the periodic


stability limit of the second order eq. (29) is defined by the dual condition (if B0 > 0):

(31)  B1 > 0, or A0 - A2 > 0;  and  B2 > 0, or A0 - A1 + A2 > 0.

The above procedure can be extended to higher order systems.
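A sketch of the second order test of eqs. (29)-(31), with illustrative A coefficients (taken, for continuity, from eq. (14) with D = 0.5 and K = 1.2), cross-checked by computing the roots directly:

import cmath

A0, A1, A2 = 1.0, -0.9, 0.5           # illustrative coefficients

B0 = A0 + A1 + A2                      # conditions (31); the factor 2 in B1
B1 = A0 - A2                           # does not affect the sign test
B2 = A0 - A1 + A2
print("Hurwitz conditions satisfied:", B0 > 0 and B1 > 0 and B2 > 0)

disc = cmath.sqrt(A1 * A1 - 4 * A0 * A2)
roots = [(-A1 + disc) / (2 * A0), (-A1 - disc) / (2 * A0)]
print("all roots inside unit circle:", all(abs(z) < 1 for z in roots))

Both tests agree: here the roots have magnitude about 0.71, inside the unit circle.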
Use of Frequency Response Methods to Determine Stability
Nyquist Diagram. Exact Graphical Procedure. The Nyquist diagram
can be drawn by considering G(z) rather than G*(s). The complex plane
plot is made by allowing z to vary along a unit circle in the z-domain. The
gain and phase at any frequency are found by locating the point on the
unit circle (z-domain) corresponding to this angular frequency. The interpretation of the Nyquist diagram follows conventional lines.
EXAMPLE. Simple sampled-data system (T = sampling period) (this example is after a similar example by Truxal, Ref. 5):

  G(s) = K/[s(s + 1)] = K[1/s - 1/(s + 1)],

  G*(s) = [z/(z - 1) - z/(z - e^{-T})] K = Kz(1 - e^{-T})/[(z - 1)(z - e^{-T})].

For ω_s = 4 rad/sec,

  T = 2π/ω_s = 6.28/4 = 1.57;  e^{-1.57} = 0.208;

  G*(s) = 0.792Kz/[(z - 1)(z - 0.208)].

The unit circle in the z-plane would appear as in Fig. 11. The vectors are shown for 1 rad/sec. At ω = 1 rad/sec, z = 1∠90°, and

  G*(j1) = 0.792K(1∠90°)/[(1.414∠135°)(1.0216∠101.8°)] = 0.86(K/T) ∠-146.8°;

likewise

  G*(j2) = 0.52(K/T) ∠180°.
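These two Nyquist points follow from a few lines of arithmetic on the unit circle; this sketch reproduces them, carrying K as 1 (phases near 180° may print as -180° owing to the usual principal-value convention):

import cmath, math

T, K = 1.57, 1.0
def G_star(w):
    # Evaluate G(z) = 0.792*K*z/((z - 1)*(z - 0.208)) at z = exp(jwT).
    z = cmath.exp(1j * w * T)
    return 0.792 * K * z / ((z - 1) * (z - 0.208))

for w in (1.0, 2.0):
    g = G_star(w)
    print(f"w = {w}: {abs(g) * T / K:.2f} K/T at {math.degrees(cmath.phase(g)):.1f} deg")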


FIG. 11. Pole-zero configuration for G(z), with vectors shown for calculation of the Nyquist diagram at ω = 1 rad/sec, for G(s) = K/[s(s + 1)] and ω_s = 4 rad/sec.

Continuing the above for other frequencies, a Nyquist plot, G*(jω), similar to that shown in Fig. 12, could be produced.
Comparison with Amplitude Modulation. Graphical Approximate Nyquist Plot. Linvill applied the Nyquist diagram to sampled systems based on an approximation for the starred open loop transfer function:

  G*(s) = (1/T) Σ_{n=-∞}^{∞} G(s + jnω_s).

If G(s) is a good low-pass filter, G*(s) will contain only two or three significant terms. For example, G*(j1) is the vector addition

  G*(j1) = (1/T)[G(j1) + G(j1 - jω_s) + G(j1 + jω_s) + ⋯].

If the sampling frequency is 4 rad/sec,

  G*(j1) = (1/T)[G(j1) + G(-j3) + G(j5) + G(-j7) + ⋯].

FIG. 12. Nyquist diagram for G*(s) with G(s) = K/[s(s + 1)], K/T = 1, and ω_s = 4 rad/sec (constructed by using the two-term approximation) (Ref. 5). Radial scale numbers are in terms of T/K; circle spacing is 0.125T/K.

All terms except G(j1) and G( -j3) would be small if G(s) is an effective
low-pass filter.
The graphical construction on the G*(jw) plane is shown in Fig. 12.
NOTE. The value of G*(jω_s/2) is purely real. At frequencies above ω_s/2 the Nyquist diagram continues into the upper half-plane until it reaches infinity at the sampling frequency. The only part of the diagram of interest in stability considerations is the section corresponding to frequencies lying between zero and ω_s/2.
The example illustrates that sampling, by itself, increases the phase
lag for a given gain. From the Nyquist diagram the maximum gain for a


stable system is read directly. If only the two terms are used in the series expansion of G*(jω), the allowable K/T is 2.5; if all are used, the K/T is 1.94. The gain in the first case is 3.93; consideration of the rest of the terms reduces the allowable gain to 3.05, since in this example T = 2π/4 = 1.57.
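The convergence of the truncated series toward the exact value can be checked numerically. A sketch for this example (ordering of terms and the printed truncations are choices made here for illustration):

import cmath

T, ws, K = 1.57, 4.0, 1.0
G = lambda s: K / (s * (s + 1))

w = 1.0
# Terms of Linvill's series in order of importance: n = 0, -1, +1, -2, ...
ns = [0, -1, 1, -2, 2, -3, 3, -4, 4]
total = 0
for i, n in enumerate(ns, start=1):
    total += G(1j * (w + n * ws)) / T
    if i in (2, 3, len(ns)):
        print(f"{i} terms: {abs(total):.3f} at {cmath.phase(total):+.3f} rad")

z = cmath.exp(1j * w * T)
exact = 0.792 * K * z / ((z - 1) * (z - 0.208))
print(f"exact:   {abs(exact):.3f} at {cmath.phase(exact):+.3f} rad")

The two-term sum (G(j1) and G(-j3)) is already close; the remaining terms account for the difference between the 2.5 and 1.94 allowable values of K/T quoted above.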
4. SAMPLED-DATA SYSTEM SYNTHESIS

Design Procedure Using z-Transforms

This section is based upon material from Ref. 4.
The typical sampled-data system of Fig. 13 will be used for illustration.
The error unit embodies both the analog-digital transducer, which peri-

FIG. 13. The basic digital servo system.

odically expresses the angular position of the output shaft as a number in
binary code form and the digital subtractor which takes the difference of
this number and the incoming one. The characteristics of the servo motor-amplifier combination differ for different applications. They are assumed
to be known and invariant, so that the problem is to synthesize a suitable
controller. The following z-transforms will be used:
Gc(z) = z-transform of the controller,
Gm(z) = z-transform of the motor-amplifier combination,
Go(z) = z-transform of the open control loop,
G(z) = the closed loop z-transform.
(32)  G(z) = (z-transform of the forward path)/[1 + Go(z)].

Performance Criteria. It is useful to assess the performance in terms
of the responses to specific driving functions such as a step function, a
steady velocity or acceleration, a sinusoidal input of various frequencies or


a random noise. Any or all such tests may be applied and, since improvement in one respect is often accompanied by deterioration in another, it
will be necessary to compromise. Such overriding factors as the demand
for zero velocity lag must take precedence. The servo amplifier may overload if it is fed a series of discontinuous pulses representing samples. Its
input must be reasonably smooth, and the correction due to one error
number will not be complete before the next is begun.
An equivalent system is shown in Fig. 14. The system delay, λT, is now shown with the motor amplifier. The controller has two parts, the first of which is characterized by its z-transform, G1(z); it modifies the sequence of correction samples supplied to it. The modified sequence is
(Fig. 14 diagram: controller consisting of G1(z) followed by G2(s), then the delay λT and the motor-amplifier Gm(s).)

FIG. 14. A system equivalent to that of Fig. 13.

smoothed by the second part, characterized by its transfer function G2 (s),
which provides a continuous signal for driving the servo amplifier. This
subdivision is unlikely to correspond to any physical separation of the
components. The composite expression G1(z) * G2(s) may be called the
"operational instruction" of the controller. The * symbol is used in this
case to separate the sampled and continuous portions of the operational
instruction and indicates that the information input to the continuous
elements is in sampled form.
Knowing Gm(s) and the performance requirements, it should be possible to specify G2(s). For example, the motor amplifier may have the simple transfer function 1/[s(1 + Tm·s)], where the time constant, Tm, is probably smaller than the sampling interval. To avoid sudden changes in velocity, G2(s) need only be 1/s.
The next step is the determination of a suitable Go(z), taking into account
all the overriding factors. The fact that Go(z) may be expressed as the
ratio of two polynomials N(z)/D(z) is also used.
Physical Realizability. The order of N must be at least one less than
that of D.
Poles at z = 1. To have zero static error, the function Go(z) must possess at least one simple pole at z = 1. A second order pole at z = 1 would provide zero velocity lag and a third order pole, a zero acceleration lag characteristic.
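A pole of the open loop transfer function at z = 1 is the sampled counterpart of an integrator. A minimal simulation (hypothetical loop with Go(z) = K/(z - 1), a discrete accumulator; the gain value is assumed) shows the step error being driven to zero:

```python
# Unity-feedback loop around Go(z) = K/(z - 1), a discrete accumulator.
K, c = 0.5, 0.0
for k in range(40):
    e = 1.0 - c        # error sample: unit step input minus output
    c = c + K * e      # accumulator state: c[k+1] = c[k] + K*e[k]
print(round(e, 6))     # -> 0.0: zero static error
```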
Cancellation of Poles and Zeros. The characteristic equation of the system is D(z) + N(z, Δ) = 0. The parameter Δ indicates the need for checking the values of the system variables between sensing instants. The system will be unstable if any root lies on or outside the unit circle |z| = 1. One may be tempted to arrange, by adjustment of parameters, for the cancellation of a zero by a pole so as to eliminate the root which would otherwise lie outside the unit circle. It is better to increase the sampling frequency. This point cannot be emphasized too strongly, particularly because it is tempting to deal with the special case of no system delay, but this can result in instability that would then be revealed only when the behavior between sampling instants is investigated. See Ref. 5.
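A direct numerical root check makes the danger easy to detect. The sketch below (example polynomials assumed, not taken from the text) tests whether every root of a characteristic polynomial lies strictly inside the unit circle:

```python
import numpy as np

def is_stable(coeffs):
    # True if every root of the characteristic polynomial
    # (coefficients in descending powers of z) has magnitude < 1.
    return bool(np.all(np.abs(np.roots(coeffs)) < 1.0))

print(is_stable([1.0, -0.9, 0.2]))   # roots 0.5 and 0.4 -> True
print(is_stable([1.0, -2.5, 1.0]))   # roots 2.0 and 0.5 -> False
```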
Design Constants. The suggested method of synthesizing a system is to match the characteristic equation with one known to give satisfactory performance. Lawden et al. (Ref. 6) have used equations of the form (z - a)^n = 0, although when n is a small number, it may be desirable to depart from this form. Oldenbourg and Sartorius (Ref. 1) show that minimum control area (see Condition for Minimal Control Area, later in this section) results from the case of vanishing roots, namely z^n = 0. Examples relevant to continuous systems may well be suitable for sampling systems. The procedure is to arrange for N(z) and D(z) to include between them a number of constants which are adjustable in the design stage. This number should be equal to the order of the characteristic equation. It is always possible to do this, because two additional constants are picked up each time the order of the characteristic equation is increased by one. The characteristic equation z^n = 0 leads to minimum control area, but if there is noise present, very little smoothing is provided; as a result, the servo amplifier may be transiently overloaded or driven into saturation. The characteristic equation (z - 0.4)^n = 0 has been used by some authors to provide smooth and satisfactory performance in the presence of noise. Analysis of a representative second order sampled-data system by Jury (Ref. 7) leads to the results of Fig. 15, which shows the constant overshoot loci in the z-plane. It can be shown that these loci can be used for higher order systems. Note that a system which has no overshoot has its characteristic roots on the positive real axis. The values of the roots must be less than unity.
The simplest expression which meets all the above requirements is the expression chosen for Go(z). Dividing it by G2Gm(z) gives G1(z), the first part of the operational instruction.
FIG. 15. Constant overshoot loci in the z-plane for a unit step input; Mp = ratio of transient peak to steady state (Ref. 7).

The Operational Instruction. It remains to decide how standard components may be assembled into a system having the required operational instruction, G1(z) * G2(s). The s-part must describe the properties of the digital-analog converter included in the error unit. This converter may perform the function of a clamp, which has the operational instruction (1 - e^-sT)s^-1. Other functions of s may be obtained by the usual synthesis procedures and may lead to further terms in the z-part. This usually
leaves an expression in z which is required for the rest of the operational
instruction. Generally, such an expression is of the form
(33)    (A0 + A1 z^-1 + A2 z^-2 + ... + Ar z^-r)/(1 + B1 z^-1 + B2 z^-2 + ... + Br z^-r).

This function may be synthesized in many ways; one way of constructing
its physical counterpart is with the aid of r delay elements (each equal to T).
The output is obtained by the summation of delayed components proportional to the coefficients in the numerator. The correct denominator is
obtained by negative feedback of the delayed components proportional to
the coefficients in the denominator.
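In later terminology this is the direct-form realization of a rational pulse transfer function. A minimal sketch (coefficient values and the test input are assumed for illustration):

```python
def rational_z(a, b, x):
    # Realization of eq. (33) with r = len(b) delay elements:
    #   y[k] = a[0]x[k] + ... + a[r]x[k-r] - b[0]y[k-1] - ... - b[r-1]y[k-r].
    # Feedforward taps build the numerator; negative feedback of the
    # delayed outputs builds the denominator 1 + B1 z^-1 + ... + Br z^-r.
    r = len(b)
    xs, ys = [0.0] * (r + 1), [0.0] * r      # delay-line contents
    out = []
    for xk in x:
        xs = [xk] + xs[:-1]
        yk = (sum(ai * xi for ai, xi in zip(a, xs))
              - sum(bi * yi for bi, yi in zip(b, ys)))
        ys = [yk] + ys[:-1]
        out.append(yk)
    return out

# Pulse response of (1 + 0.5 z^-1)/(1 - 0.3 z^-1):
print(rational_z([1.0, 0.5], [-0.3], [1, 0, 0, 0]))   # [1.0, 0.8, 0.24, 0.072]
```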
EXAMPLE. Synthesis of a Simple Analog System. Figure 14 shows the
simple system to be synthesized, and the following assumptions are made
regarding it:
(a) The servo motor and amplifier are constructed so that the rate of
rotation of the motor is proportional to the voltage applied to the amplifier.
The transfer function is Gm(s) = K/s.


(b) The transducer samples and introduces a delay, T, the effect of which is to multiply the z-transform by z^-1.
(c) There must be zero static error; hence Go(z) must include the factor (z - 1) in its denominator.
(d) There shall be no sudden changes in output velocity. The law of motion shall be quadratic between sampling instants. Hence G2(s) = s^-2.
Therefore, G2(s)Gm(s) = Ks^-3 and G2Gm(z) = KT^2 z(z + 1)/[2(z - 1)^3], the z-transform being found directly from Table 1.
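Such a table entry can be spot-checked by expanding the transform in powers of z^-1 and comparing with the samples K(kT)^2/2; a brief sketch (sympy assumed available):

```python
import sympy as sp

w, T, K = sp.symbols('w T K', positive=True)   # w stands for z**-1
# Table entry: the z-transform of K*s**-3 is K*T**2*z*(z + 1)/[2*(z - 1)**3].
# Substituting z = 1/w and expanding in powers of w should reproduce the
# samples K*(k*T)**2/2 of the time function K*t**2/2.
F = K * T**2 * (1/w) * (1/w + 1) / (2 * (1/w - 1)**3)
print(sp.series(sp.cancel(F), w, 0, 5))
# -> K*T**2*(w/2 + 2*w**2 + 9*w**3/2 + 8*w**4) + O(w**5),
#    i.e. K*(k*T)**2/2 at k = 1, 2, 3, 4.
```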
The z-Transform of the Loop. This is

(34)    Go(z) = KT^2 G1(z)(z + 1)/[2(z - 1)^3].

It contains the required factor (z - 1) in the denominator. It must also contain adjustable design constants such that the characteristic equation can be forced into one known to be suitable, such as (z - a)^n = 0. The simplest expression for G1(z) which adds two further constants without increasing the order is

(35)    G1(z) = (z - 1)^2/(z^2 + B1 z + B2).
The operational instruction for the controller is therefore

(36)    [(z - 1)^2/(z^2 + B1 z + B2)] * s^-2.

The z-transform of the operational instruction is obtained by replacing s^-2 with its z-transform, Tz(z - 1)^-2, which reduces eq. (36) to Tz(z^2 + B1 z + B2)^-1. The characteristic equation is

(37)    z^3 + (B1 - 1)z^2 + (B2 - B1 + ½KT^2)z + (½KT^2 - B2) = 0.
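Matching eq. (37) to z^3 = 0 term by term determines the design constants; a one-line check (sympy, with KT2 standing for KT^2):

```python
import sympy as sp

# Set each coefficient of eq. (37) to zero and solve for the constants.
B1, B2, KT2 = sp.symbols('B1 B2 KT2')
eqs = [B1 - 1,                       # coefficient of z^2
       B2 - B1 + KT2 / 2,            # coefficient of z
       KT2 / 2 - B2]                 # constant term
print(sp.solve(eqs, [B1, B2, KT2]))  # -> {B1: 1, B2: 1/2, KT2: 1}
```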

The simplest third order equation this can be identified with is z^3 = 0. Choice of the characteristic equation z^3 = 0 is known to produce the most rapid recovery from a transient disturbance. (See Condition for Minimal Control Area, later in this section.) If KT^2 = B1 = 1 and B2 = ½, Go(z) can be written

(38)    Go(z) = (z + 1)(2z^3 - z - 1)^-1.

The closed loop z-transform in response to a pulse is

(39)    G(z) = Go(z)/[1 + Go(z)] = (z + 1)/2z^3,

or G(z) = ½z^-2 + ½z^-3.
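The finite settling implied by eq. (39) is easily verified by forming the closed loop denominator directly (numpy; coefficients in descending powers of z):

```python
import numpy as np

# With Go(z) = (z + 1)/(2z^3 - z - 1), the closed loop denominator is
# (2z^3 - z - 1) + (z + 1) = 2z^3, so G(z) = (z + 1)/(2z^3): a unit pulse
# is reproduced half after two and half after three sampling periods.
num = np.array([1.0, 1.0])                 # z + 1
den = np.array([2.0, 0.0, -1.0, -1.0])     # 2z^3 - z - 1
closed = den.copy()
closed[-num.size:] += num                  # add numerator to low-order terms
print(closed)                              # -> [2. 0. 0. 0.]
```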


Design Procedure Using Frequency Response. The design procedures of linear techniques discussed in previous chapters are applicable. Nyquist and Bode diagrams may be used after the transfer function has been obtained. In working with the Nyquist diagram, it can be seen that lead compensation will increase the bandwidth of the system. If the sampling frequency is not increased, no additional high-frequency information will be passed. This illustrates a difficulty which one may encounter in the synthesis of sampled-data systems.
Design Procedure Using Root Locus Techniques. As in frequency response techniques, the root locus can be used as an aid for synthesizing sampled-data systems. However, it can be shown that the desired root locus is the positive real axis in the z-plane. As previously mentioned, the characteristic equation z^n = 0 is known to lead to the fastest recovery of the system from a disturbance. If noise is present, (z - a)^n = 0 is the desired characteristic, where a is a number between zero and 1. As Jury has shown (see Fig. 15), the loci of constant overshoot also illustrate that the positive z-axis is the desired place to locate the roots of the characteristic equation. The circumference of the circle in the z-plane having unity radius is the periodic limit of stability.
In summary, the sampling frequency controls the bandpass and thus the speed with which the system can transmit information. The roots of the characteristic equation should be placed as near the origin as permissible. Placement at the origin is known to produce the liveliest system; if noise is present, the roots must be moved along the positive real axis in the z-plane toward z = 1. It should be noted that in many practical cases the above simple criterion for performance will have to be modified for one or more practical reasons. In such cases the approach suggested is to use the above rules for the first approximation and then to introduce the other considerations.
Performance Charts for Typical Sampled-Data Systems
Performance Index: Control Area. To evaluate the results of computations and to choose the most favorable conditions of operation, the concept of control effectiveness, measured by the smallness of the control area, is very useful. For continuous controllers, the control area is defined as

(40)    F = ∫ e dt (t = 0 to ∞),

where e is the deviation of the controlled variable.
For sampled-data controllers the calculation is not so simple, except in one case, when the control process has the initial value of zero at the first


sensing instant. In this case (which leads to the largest control area) the control area is

(41)    F/T = e0 + e1 + e2 + ...,

where T = sampling period and e0, e1, e2, ... are the deviation samples at the sensing instants. It can be seen that the control area is the error-time integral.
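As a small numerical illustration of eq. (41) (all values assumed):

```python
# Control area F = T*(e0 + e1 + ...), the rectangle sum that
# approximates the error-time integral.
T = 0.5                          # sampling period, sec
e = [1.0, 0.6, 0.3, 0.1, 0.0]    # deviation samples at the sensing instants
F = T * sum(e)
print(F)                         # -> 1.0
```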
Condition for Minimal Control Area. The characteristic equation with vanishing roots, namely z^n = 0, has the least control area. Such a system can be shown to recover most quickly from a disturbance. However, if there is noise present, or if such a characteristic equation is physically unrealizable, the characteristic equation (z - a)^n = 0 is used. It is used in the presence of noise to provide smoothing of the impulses, and it is used in the second case so that normal system components may be employed.
Second Order System with Dead Time and with No Compensation. (See Fig. 16.) The characteristic equation for this system is

(42)    z^2 + [K(1 - D/L) - (1 + D)]z + D - KD(1 - 1/L) = 0,

where L = exp(-TL/Tc) and D = exp(-T/Tc).
If T < TL < 2T, the equation becomes

(43)    z^3 - (1 + D)z^2 + [D + K(1 - D^2/L)]z + KD(D/L - 1) = 0.

Dead time does not increase the order of the characteristic equation as long as TL < T. When T < TL < 2T, the order of the equation is increased from 2 to 3. Further increase of the dead time (or shortening of the sensing time) leads to successively higher order characteristic equations, and it can be shown that the equation becomes transcendental for the case of the continuous controller. It can be shown that the controlled variable never oscillates if all the roots lie on the positive axis of the z-plane between 0 and +1. This fact is used to force eq. (42) to have a double positive root less than unity. The control factors leading to this aperiodic limit are shown in Fig. 16. It is not possible to cause the roots to vanish (i.e., z^n = 0) unless compensation is added. Adding compensation introduces two arbitrary constants into the characteristic equation. The two extra constants can be used to design the system characteristic for vanishing roots.
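The aperiodic limit can be located numerically: raise K until the discriminant of eq. (42) first vanishes, giving the double root. A sketch (operating point assumed; eq. (42) requires TL < T):

```python
import numpy as np

T_over_Tc, TL_over_Tc = 1.0, 0.5
L = np.exp(-TL_over_Tc)
D = np.exp(-T_over_Tc)

def discriminant(K):
    # Discriminant of eq. (42); zero at a double root.
    b = K * (1 - D / L) - (1 + D)
    c = D - K * D * (1 - 1 / L)
    return b * b - 4 * c

# Smallest K at which the discriminant crosses zero (one point of Fig. 16).
Ks = np.linspace(1e-3, 5.0, 500001)
K_lim = Ks[np.argmax(discriminant(Ks) <= 0)]
z_double = -(K_lim * (1 - D / L) - (1 + D)) / 2
print(round(K_lim, 3), round(z_double, 3))   # -> 0.2 0.645, root in (0, 1)
```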

FIG. 16. Excursion-dependent periodic control on a plant with first order time constant and dead time; aperiodic limit (Ref. 1).


FIG. 17. Excursion-dependent periodic control with rate action, on a plant with first order time constant and dead time; parameters for optimal response (Ref. 1).

Second Order System with Dead Time and Rate Stabilization. (See Fig. 17.) This system leads to the equation

(44)    A0 z^2 + A1 z + A2 = 0,

where A0 = 1,
A1 = K(1 - D/L) + (T1/Tc)(D/L) - (1 + D),
A2 = D[1 + K(1/L - 1) - (T1/Tc)(1/L)],
L = exp(-TL/Tc),
D = exp(-T/Tc),
M = step disturbance in m,
F = control area.
The control area will assume an absolute minimum if all the roots vanish.


This limit arises when A1/A0 = 0 and A2/A0 = 0. These conditions are called optimal because they lead to least control area. They are valid only in the range 0 < TL/Tc < T/Tc. The control area increases steadily with T/Tc, so that if one is free to choose T/Tc, the most favorable operating conditions are obtained when T = TL. The action following a step disturbance inside the control loop is shown in Fig. 18. The parameters for optimal response are shown in Fig. 17.
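Since A1 and A2 are linear in the gain K and the rate setting T1/Tc, the vanishing-root settings can be computed directly; a sketch (numpy; operating point assumed, with T = TL as recommended above):

```python
import numpy as np

T_over_Tc = TL_over_Tc = 0.5
L = np.exp(-TL_over_Tc)
D = np.exp(-T_over_Tc)

# A1 = 0 and A2 = 0 from eq. (44), written as a linear system in
# K and r = T1/Tc (the second row is A2 divided through by D).
A = np.array([[1 - D / L,  D / L],
              [1 / L - 1, -1 / L]])
rhs = np.array([1 + D, -1.0])
K, r = np.linalg.solve(A, rhs)
print(round(K, 3), round(r, 3))   # -> 2.541 1.607: both roots vanish
```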

FIG. 18. Example of a difference equation of second order with vanishing characteristic values (Ref. 1).

Third Order System with Dead Time and with Delayed Rate Compensation. The system of Fig. 19 leads to the following characteristic equation:

(45)    A0 z^3 + A1 z^2 + A2 z + A3 = 0,

where A0 = 1,
A1 = K(1 - D/L + pQ) - 1 - D - Q,
A2 = K[(D/L)(1 + Q) - pQ(1 + D) - D - Q] + D + DQ + Q,
A3 = DQ[K(1 + p - 1/L) - 1],
Q = exp(-T/TR), and L, D, M, F are defined above (see eq. 44).

Here again optimal response is possible, with a finite control process similar to that shown in Fig. 18. The parameters for optimal response are shown in Fig. 19.
FIG. 19. Excursion-dependent periodic control with retractile followup, on a plant with first order time constant and dead time; parameters for optimal response (Ref. 1).

Comparison of Continuous and Sampled-Data Controllers. If the process has no dead time, sampled-data control is decidedly less favorable than the corresponding continuous control. Figure 20 shows the relationships which are present when there is dead time in the plant. If
the control areas of sampled-data and continuous controllers without stabilization are compared at the aperiodic limit, the two upper solid lines of Fig.
20 are obtained. For small values of the dead time, these two curves can
hardly be distinguished from one another. Decidedly different relations
are present, however, if the controller includes a stabilizing device, since


then a control response which terminates in a finite time can be had
with a sampled-data controller. A comparison of control areas, Fig. 20,
shows that sampled-data control gives appreciably better results. To be
sure, the combination of parameters which causes vanishing roots of the
characteristic equation is not possible for arbitrarily small dead times
FIG. 20. Comparison of control areas for continuous controllers (C0, no stabilizer; CR, with retractile follow-up; C, with rate action) and periodic controllers (P0, no stabilizer; PR, with retractile follow-up; P, with rate action).