Grabbe_Ramo_Wooldridge_Handbook_of_Automation_Computation_and_Control_Vol_1_1958 Grabbe Ramo Wooldridge Handbook Of Automation Computation And Control Vol 1 1958
Page Count: 1037
HANDBOOK OF AUTOMATION, COMPUTATION, AND CONTROL
Volume 1
CONTROL FUNDAMENTALS

Prepared by a Staff of Specialists

Edited by
EUGENE M. GRABBE
SIMON RAMO
DEAN E. WOOLDRIDGE
The Ramo-Wooldridge Corporation, Los Angeles, California

NEW YORK · JOHN WILEY & SONS, INC.
London · Chapman & Hall, Limited

Copyright © 1958, by John Wiley & Sons, Inc. All Rights Reserved. This book or any part thereof must not be reproduced in any form without the written permission of the publisher. Library of Congress Catalog Card Number: 58-10800. Printed in the United States of America.

CONTRIBUTORS

E. L. ARNOFF, Case Institute of Technology, Cleveland, Ohio (Chapter 15)
J. E. BARNES, Jr., General Electric Company, Schenectady, New York (Chapter 26)
C. E. BRADFORD, General Electric Company, Pittsfield, Massachusetts (Chapter 22)
J. M. CAMERON, National Bureau of Standards, Washington, D. C. (Chapter 14)
R. F. CLIPPINGER, Datamatic, A Division of Minneapolis-Honeywell Regulator Company, Newton Highlands, Massachusetts (Co-Editor, Chapter 14)
A. B. CLARKE, University of Michigan, Ann Arbor, Michigan (Chapter 13)
A. H. COPELAND, SR., University of Michigan, Ann Arbor, Michigan (Chapters 11 and 12)
P. G. CUSHMAN, General Electric Company, Pittsfield, Massachusetts (Chapter 23)
M. W. DE MERIT, General Electric Company, Schenectady, New York (Chapter 22)
J. B. DIAZ, University of Maryland, College Park, Maryland (Chapter 14)
B. DIMSDALE, Service Bureau Corporation, Los Angeles, California (Chapter 14)
P. ELIAS, Massachusetts Institute of Technology, Cambridge, Massachusetts (Chapter 16)
B. FRIEDMAN, University of California, Berkeley, California (Chapter 14)
W. M. GAINES, General Electric Company, Tempe, Arizona (Editor, Part E; Chapters 19 and 25)
G. E. HAY, University of Michigan, Ann Arbor, Michigan (Chapters 4 and 5)
E. ISAACSON, New York University, New York City, New York (Chapter 14)
S. J. JENNINGS, General Electric Company, Evendale, Ohio (Chapter 20)
W. KAPLAN, University of Michigan, Ann Arbor, Michigan (Co-Editor, Part A; Chapters 5, 7, 8, 9, and 10)
J. H. LEVIN, Datamatic, A Division of Minneapolis-Honeywell Regulator Company, Newton Highlands, Massachusetts (Co-Editor, Chapter 14)
D. L. LIPPITT, General Electric Company, Schenectady, New York (Chapter 24)
R. C. LYNDON, University of Michigan, Ann Arbor, Michigan (Chapters 2 and 3)
M. MANNOS, Datamatic, A Division of Minneapolis-Honeywell Regulator Company, Newton Highlands, Massachusetts (Chapter 14)
P. MERTZ, Bell Telephone Laboratories, New York City, New York (Chapters 17 and 18)
S. G. REQUE, General Electric Company, Tempe, Arizona (Chapter 21)
R. RICHTMEYER, New York University, New York City, New York (Chapter 14)
E. H. ROTHE, University of Michigan, Ann Arbor, Michigan (Chapter 6)
W. E. SOLLECITO, General Electric Company, Schenectady, New York (Chapter 21)
R. M. THRALL, University of Michigan, Ann Arbor, Michigan (Co-Editor, Part A; Chapter 1)
A. A. WINKELJOHANN, General Electric Company, Evendale, Ohio (Chapter 20)

FOREWORD

The proliferation of knowledge now makes it most difficult for scientists or engineers to keep ahead of change even in their own fields, let alone in contiguous fields. One of the fields where recent change has been most noticeable, and in fact exponential, has been automatic control. This three-volume Handbook will aid individuals in almost every branch of technology who must constantly refresh their memories or refurbish their knowledge about many aspects of their work.

Automation, computation, and control, as we know them, have been evolving for centuries, but within the last generation their impact has been felt in nearly every segment of human endeavor. Feedback principles were exploited by Leonardo da Vinci and applied by James Watt.
Some of the early theoretical work of importance was contributed by Lord Kelvin, who also, together with Charles Babbage, pointed the way to the development of today's giant computational aids. Since about the turn of the present century, the works of men like Minorsky, Nyquist, Wiener, Bush, Hazen, and von Neumann gave quantum jumps to computation and control. But it was during and immediately following World War II that quantum jumps occurred in abundance. This was the period when theories of control, new concepts of computation, new areas of application, and a host of new devices appeared with great rapidity. Technologists now find these fields charged with challenge, but at the same time hard to encompass.

From the activities of World War II such terms as servomechanism, feedback control, digital and analog computer, transducer, and system engineering reached maturity. More recently the word automation has become deeply entrenched as meaning something about the field on which no two people agree.

Philosophically minded technologists do not accept automation merely as a third Industrial Revolution. They see it, as they stand about where the editors of this Handbook stood when they projected this work, as a manifestation of one of the greatest Intellectual Revolutions in Thinking that has occurred for a long time. They see in automation the natural consequence of man's urge to exploit modern science on a wide front to perform useful tasks in, for example, manufacturing, transportation, business, physical science, social science, medicine, the military, and government. They see that it has brought great change to our conventional way of thinking about the human use of human beings, to quote Norbert Wiener, and in turn about how our engineers will be trained to solve tomorrow's engineering problems.
They even see that it has precipitated some deep thinking on the part of our industrial and union leadership about the organization of workers in order not to hold captive bodies of workmen for jobs that automation, computation, and control have swept or will soon sweep away. Perhaps the important new face on today's technological scene is the degree to which the broad field needs codification and unification in order that technologists can optimize their role to exploit it for the general good.

One of the early instances of organized academic instruction in the field was at The Massachusetts Institute of Technology in the Electrical Engineering Department in September 1939, as a course entitled Theory and Application of Servomechanisms. I can well recollect discussions around 1940 with the late Dr. Donald P. Campbell and Dr. Harold L. Hazen, which led temporarily to renaming the course Dynamic Analysis of Automatic Control Systems because so few students knew what "servomechanisms" were. But when the GI's returned from war everybody knew, and everyone wanted instruction. Since that time engineering colleges throughout the land have elected to offer organized instruction in a multitude of topics ranging from the most abstract mathematical fundamentals to the most specific applications of hardware. Textbooks are available on every subject along this broad spectrum. But still the practicing control or computer technologist experiences great difficulty keeping abreast of what he needs to know.

As organized instruction appeared in educational institutions, and as industrial activity increased, professional societies organized groups in the areas of control and computation to meet the needs of their members to tell one another about technical advances. Within the past five years several trade journals have undertaken to report regularly on developments in theory, components, and systems.
The net effect of all this is that the technologist is overwhelmed with fragmentary, sometimes contradictory, redundant information that comes at him at random and in many languages. The problem of assessing and codifying even a portion of this avalanche of knowledge is beyond the capabilities of even the most able technologist.

The editors of the Handbook have rightly concluded that what each technologist needs for his long term professional growth is to have a body of knowledge that is negotiable at par in any one of a number of related fields for many years to come. It would be ideal, of course, if a college education could give a prospective technologist this kind of knowledge. It is in the hope of doing this that engineering curricula are becoming more broadly based in science and engineering science. But it is unlikely that even this kind of college training will be adequate to cope with the consequences of the rapid proliferation of technology as is manifest in the area of automation, computation, and control. Hence, handbooks are an essential component of the technical literature when they provide the unity and continuity that are requisite.

I can think of no better way to describe this Handbook than to say that the editors, in both their organization of material and selection of substance, have given technologists a unified work of lasting value. It truly represents today's optimum package of that body of knowledge that will be negotiable at par by technologists for many years to come in a wide range of disciplines.

GORDON S. BROWN
Massachusetts Institute of Technology

PREFACE

Accelerated advances in technology have brought a steady stream of automatic machines to our factories, offices, and homes. The earliest automation forms were concerned with doing work, followed by the controlling function, and recently the big surge in automation has been directed toward data handling functions.
New devices ranging from digital computers to satellites have resulted from military and other government research and development programs. Such activity will continue to have an important impact on automation progress.

One of the pressures for the development of automation has been the growing complexity and speed of business and industrial operations. But automation in turn accelerates the tempo of whatever it touches, so that we can expect future systems to be even larger, faster, and more complex.

While a segment of engineering will continue to mastermind, by rule of thumb procedures, the design and construction of automatic equipment and systems, a growing percentage of engineering effort will be devoted to activities that may be classified as problem solving. The activities of the problem solver involve analysis of previous behavior of systems and equipment, simulation of present situations, and predictions about the future. In the past, problem solving has largely been practiced by engineers and scientists, using slide rules and hand calculators, but with the advent of large-scale data processing systems, the range of applications has been broadened considerably to include economic, government, and social activities. Air traffic control, traffic simulation, library searching, and language translation are typical of the problems that have been attacked.

This Handbook is directed toward the problem solvers: the engineers, scientists, technicians, managers, and others from all walks of life who are concerned with applying technology to the mushrooming developments in automatic equipment and systems. It is our purpose to gather together in one place the available theory and information on general mathematics, feedback control, computers, data processing, and systems design. The emphasis has been on practical methods of applying theory, new techniques and components, and the ever broadening role of the electronic computer.
Each chapter starts with definitions and descriptions aimed at providing perspective and moves on to more complicated theory, analysis, and applications. In general, the Handbook assumes some engineering training and will serve as an information source and refresher for practicing engineers. For management, it will provide a frame of reference and background material for understanding modern techniques of importance to business and industry. To others engaged in various ramifications of automation systems, the Handbook will provide a source of definitions and descriptive material about new areas of technology.

It would be difficult for any one individual or small group of individuals to prepare a handbook of this type. A large number of contributors, each with a field of specialty, is required to provide the engineer with the desired coverage. With such a broad field, it is difficult to treat all material in a homogeneous manner. Topics in new fields are given in more detail than the older, established ones since there is a need for more background information on these new subjects.

The organization of the material is in three volumes as shown on the inside cover of the Handbook. Volume 1 is on Control Fundamentals, Volume 2 is concerned with Computers and Data Processing, and Volume 3 with Systems and Components.

In keeping with the purpose of this Handbook, Volume 1 has a strong treatment of general mathematics which includes chapters on subjects not ordinarily found in engineering handbooks. These include sets and relations, Boolean algebra, probability, and statistics. Additional chapters are devoted to numerical analysis, operations research, and information theory. Finally, the present status of feedback control theory is summarized in eight chapters. Components have been placed with systems in Volume 3 rather than with control theory in Volume 1, although any discussion of feedback control must, of necessity, be concerned with components.
The importance of computing in research, development, production, real time process control, and business applications has steadily increased. Hence, Volume 2 is devoted entirely to the design and use of analog and digital computers and data processors. In addition to covering the status of knowledge today in these fields, there are chapters on unusual computer systems, magnetic core and transistor circuits, and an advanced treatment of programming.

Volume 3 emphasizes systems engineering. A part of the volume covers techniques used in important industrial applications by examining typical systems. The treatment of components is largely concerned with how to select components among the various alternates, their mathematical description and their integration into systems. There is also a treatment of the design of components of considerable importance today. These include magnetic amplifiers, semiconductors, and gyroscopes.

We consider this Handbook a pioneering effort in a field that is steadily pushing back frontiers. It is our hope that these volumes will not only provide basic information on new fields, but will also inspire work and further research and development in the fields of automatic control.

The editors are pleased to acknowledge the advice and assistance of Professor Gordon S. Brown and Professor Jerome S. Wiesner of the Massachusetts Institute of Technology, and Dr. Brockway McMillan of the Bell Telephone Laboratories, in organizing the subject matter. To the contributors goes the major credit for providing clear, thorough treatments of their subjects. The editors are deeply indebted to the large number of specialists in the control field who have aided and encouraged this undertaking by reviewing manuscripts and making valuable suggestions. Many members of the technical staff and secretarial staff of The Ramo-Wooldridge Corporation have been especially helpful in speeding the progress of the Handbook.

EUGENE M. GRABBE
SIMON RAMO
DEAN E.
WOOLDRIDGE

August 1958

CONTENTS

A. GENERAL MATHEMATICS

Chapter 1. Sets and Relations
1. Sets
2. Relations
3. Functions
4. Binary Relations on a Set
5. Equivalence Relations
6. Operations
7. Order Relations
8. Sets of Points
References

Chapter 2. Algebraic Equations
1. Polynomials
2. Real Roots
3. Complex Roots
References

Chapter 3. Matrix Theory
1. Vector Spaces
2. Linear Transformations
3. Coordinates
4. Echelon Form
5. Rank, Inverses
6. Determinants, Adjoint
7. Equivalence
8. Similarity
9. Orthogonal and Symmetric Matrices
10. Systems of Linear Inequalities
References

Chapter 4. Finite Difference Equations
1. Definitions
2. Linear Difference Equations
3. Homogeneous Linear Equations with Constant Coefficients
4. Nonhomogeneous Linear Equations with Constant Coefficients
5. Linear Equations with Variable Coefficients
References

Chapter 5. Differential Equations
1. Basic Concepts
2. Equations of First Order and First Degree
3. Linear Differential Equations
4. Equations of First Order but not of First Degree
5. Special Methods for Equations of Higher than First Order
6. Solutions in Form of Power Series
7. Simultaneous Linear Differential Equations
8. Numerical Methods
9. Graphical Methods: Phase Plane Analysis
10. Partial Differential Equations
References

Chapter 6. Integral Equations
1. Definitions and Main Problems
2. Relation to Boundary Value Problems
3. General Theorems
4. Theorems on Eigenvalues
5. The Expansion Theorem and Some of Its Consequences
6. Variational Interpretation of Eigenvalue Problem
7. Approximation Methods
References

Chapter 7. Complex Variables
1. Functions of a Complex Variable
2. Analytic Functions. Harmonic Functions
3. Integral Theorems
4. Power Series. Laurent Series
5. Zeros. Singularities. Residues. Argument Principle
6. Analytic Continuation
7. Riemann Surfaces
8. Elliptic Functions
9. Functions Defined by Linear Differential Equations
10. Other Transcendental Functions
References

Chapter 8. Operational Mathematics
1. Heaviside Operators
2. Application to Differential Equations
3. Superposition Principle. Response to Unit Function and Delta Function
4. Appraisal of the Heaviside Calculus
5. Operational Calculus Based on Integral Transforms
6. Fourier Series. Finite Fourier Transform
7. Fourier Integral. Fourier Transforms
8. Laplace Transforms
9. Other Transforms
References

Chapter 9. Laplace Transforms
1. Fundamental Properties
2. Transforms of Derivatives and Integrals
3. Translation. Transform of Unit Function, Step Functions, Impulse Function (Delta Function)
4. Convolution
5. Inversion
6. Application to Differential Equations
7. Response to Impulse Functions
8. Equations Containing Integrals
9. Weighting Function
10. Difference-Differential Equations
11. Asymptotic Behavior of Transforms
References

Chapter 10. Conformal Mapping
1. Definition of Conformal Mapping. General Properties
2. Linear Fractional Transformations
3. Mapping by Elementary Functions
4. Schwarz-Christoffel Mappings
5. Application of Conformal Mapping to Boundary Value Problems
References

Chapter 11. Boolean Algebra
1. Table of Notations
2. Definitions of Boolean Algebra
3. Boolean Algebra and Logic
4. Canonical Form of Boolean Functions
5. Stone Representation
6. Sheffer Stroke Operation
References

Chapter 12. Probability
1. Fundamental Concepts and Related Probabilities
2. Random Variables and Distribution Functions
3. Expected Value
4. Variance
5. Central Limit Theorem
6. Random Processes
References

Chapter 13. Statistics
1. Nature of Statistics
2. Probability Background
3. Important Probability Distributions
4. Sampling
5. Bivariate Distributions
6. Tests for Goodness of Fit
7. Sequential Analysis
8. Monte Carlo Method
9. Statistical Tables
References

B. NUMERICAL ANALYSIS

Chapter 14. Numerical Analysis
1. Interpolation, Curve Fitting, Differentiation, and Integration
2. Matrix Inversion and Simultaneous Linear Equations
3. Eigenvalues and Eigenvectors
4. Digital Techniques in Statistical Analysis of Experiments
5. Ordinary Differential Equations
6. Partial Differential Equations
References

C. OPERATIONS RESEARCH

Chapter 15. Operations Research
1. Operations Research and Mathematical Models
2. Solution of the Model
3. Inventory Models
4. Allocation Models
5. Waiting Time Models
6. Replacement Models
7. Competitive Problems
8. Data for Model Testing
9. Controlling the Solution
10. Implementation
References

D. INFORMATION THEORY AND TRANSMISSION

Chapter 16. Information Theory
1. Introduction
2. General Definitions
3. Simple Discrete Sources
4. More Complicated Discrete Sources
5. Discrete Noiseless Channels
6. Discrete Noisy Channels I. Distribution of Information
7. Discrete Noisy Channels II. Channel Capacity and Interpretations
8. The Continuous Case
References

Chapter 17. Smoothing and Filtering
1. Definitions: Smoothing and Prediction. Symbols
2. Definitions: Correlation
3. Relationship between Correlation and Signal Structure
4. Design of Optimum Filter
5. Extensions of Procedure
6. Network Synthesis
References

Chapter 18. Data Transmission
1. Introduction and Symbols
2. Formation and Use of the Electrical Signal
3. Transmission Impairment
References

E. FEEDBACK CONTROL

Chapter 19. Methodology of Feedback Control
1. Symbols for Feedback Control
2. General Feedback Control System Definitions
3. Feedback Control System Design Considerations
4. Selection of Method of Synthesis for Feedback Controls
References

Chapter 20. Fundamentals of System Analysis
1. Representation of Physical Systems
2. Classical Methods of Analysis
3. Block Diagrams
4. System Types
5. Error Coefficients
6. Analysis of A-C Servos: Carrier Systems
References

Chapter 21. Stability
1. Introduction
2. Classical Solution Approach
3. Routh's Criterion
4. Nyquist Stability Criterion
5. Bode Attenuation Diagram Approach
6. Root Locus Method
7. Miscellaneous Stability Criteria
8. Closed Loop Response from Open Loop Response
References

Chapter 22. Relation between Transient and Frequency Response
1. Introduction
2. Response Characteristics Defined
3. Relation between Transient Response and Location of Roots of Characteristic Equation
4. Relation between Closed Loop and Open Loop Roots
5. Design Charts Relating Open Loop Frequency Response and Transient Response
6. Approximate Relations: Rules of Thumb
7. Numerical and Graphical Techniques of Relating Transient and Frequency Response
References

Chapter 23. Feedback System Compensation
1. Design Criteria and Techniques
2. Compensating Components: D-C Systems
3. Compensating Networks: A-C Systems
4. Open-Closed Loop Control
References

Chapter 24. Noise, Random Inputs, and Extraneous Signals
1. Introduction
2. Mathematical Description of Noise
3. Measurement of Noise
4. System Response to Noise
5. System Design in the Presence of Noise
References

Chapter 25. Nonlinear Systems
1. Definitions
2. General Nonlinear System Problem
3. Methods of Analysis: Linearization
4. Methods of Analysis: Describing Function
5. Methods of Analysis: Phase Plane, Graphical Solution of System Equations
6. Other Methods of Analysis
7. Nonlinear System Compensation
References

Chapter 26. Sampled-Data Systems and Periodic Controllers
1. Description and Definition of Sampled-Data System
2. Methods of Transient Analysis
3. Sampled-Data System Stability
4. Sampled-Data System Synthesis
References

INDEX

A. GENERAL MATHEMATICS
R. M. Thrall and W. Kaplan, Editors

1. Sets and Relations, by R. M. Thrall
2. Algebraic Equations, by R. C. Lyndon
3. Matrix Theory, by R. C. Lyndon
4. Finite Difference Equations, by G. E. Hay
5. Differential Equations, by G. E. Hay and W. Kaplan
6. Integral Equations, by E. H. Rothe
7. Complex Variables, by W. Kaplan
8. Operational Mathematics, by W. Kaplan
9. Laplace Transforms, by W. Kaplan
10. Conformal Mapping, by W. Kaplan
11. Boolean Algebra, by A. H. Copeland, Sr.
12. Probability, by A. H. Copeland, Sr.
13. Statistics, by A. B. Clarke

A. GENERAL MATHEMATICS

Chapter 1. Sets and Relations
R. M. Thrall

1. Sets
2. Relations
3. Functions
4. Binary Relations on a Set
5. Equivalence Relations
6. Operations
7. Order Relations
8. Sets of Points
References

1. SETS

A set is a collection of objects of any sort. The words class, family, ensemble, aggregate are synonyms for the term set. Each object in a set is called an element (member) of the set. If $S$ denotes the set and $b$ an element of $S$, one writes: $b \in S$; this is read: "$b$ belongs to $S$."
If $b$ does not belong to $S$, one writes: $b \notin S$. Sets will generally be designated by capital letters, elements by lower case letters.

IMPORTANT EXAMPLES OF SETS.
$Z$, the set of positive integers $z$; $Z$ consists of the numbers 1, 2, 3, ...;
$J$, the set of all integers $j$ (including 0 and the negative integers);
$Q$, the set of rational numbers $q$ (fractions $a/b$, where $a$ is an integer and $b$ is a positive integer);
$R$, the set of all real numbers $r$ (numbers which are expressible as unending decimals);
$C$, the set of all complex numbers $c$ (numbers of form $x + y\sqrt{-1}$, where $x$ and $y$ are real).

In geometry one employs sets of points; for example, all points on a specified line or all points inside a circle. Geometric diagrams can be helpful in reasoning about sets which may have no reference to geometry (Fig. 1).

FIG. 1. Set and subset. $T$ is a subset of $S$, $T \subset S$.

A set can be designated by listing its elements between braces. Thus $S = \{1, -3, 7\}$ is the set whose elements are the numbers 1, -3, and 7. For infinite sets one still uses braces, but instead of writing all the elements one gives a rule for set membership. For example, $Z = \{z \mid z \text{ is a positive integer}\}$ is an abbreviation for "$Z$ is the set of all $z$ for which $z$ is a positive integer." This set is also sometimes designated (less precisely) by $Z = \{1, 2, 3, \ldots, n, \ldots\}$.

Two sets are said to be equal if they have exactly the same members. For example, if $A = \{1, 3, 5, 7\}$ and $B = \{3, 1, 5, 1, 7\}$, then $A = B$. Neither the order in which the elements are written down nor the number of times that an element is repeated within the braces is significant. If $A$ is not equal to $B$, one writes: $A \neq B$.

Subsets. The set $T$ is said to be a subset of the set $S$ (Fig. 1) if every member of $T$ is also a member of $S$; in symbols, $T \subset S$ or $S \supset T$. If both $T \subset S$ and $S \subset T$, $S$ and $T$ are equal.
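The conventions for listing, equality, and inclusion given so far can be tried out on finite sets. The following is a minimal Python sketch, not part of the Handbook; the helper names `is_subset` and `sets_equal` are illustrative assumptions, and the rule-based description of $Z$ is necessarily cut down to a finite range.

```python
# Listing elements between braces: order and repetition are not significant.
A = {1, 3, 5, 7}
B = set([3, 1, 5, 1, 7])  # the repeated 1 is absorbed

def is_subset(T, S):
    """T is a subset of S if every member of T is also a member of S."""
    return all(t in S for t in T)

def sets_equal(S, T):
    """S and T are equal when each is a subset of the other."""
    return is_subset(S, T) and is_subset(T, S)

# A rule for membership, restricted here to integers from -10 to 10
# (the full set of positive integers is infinite and cannot be listed).
Z_finite = {z for z in range(-10, 11) if z > 0}

print(sets_equal(A, B))       # True
print(is_subset({1, 3}, A))   # True
print(is_subset({2}, A))      # False
```

Python's built-in `==` and `<=` on sets give the same answers; the helpers are spelled out only to mirror the definitions.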
If $S \supset T$ and $S \neq T$, i.e., if $S$ contains every element of $T$ and at least one element not in $T$, one says that $S$ contains $T$ properly or that $T$ is a proper subset of $S$; in symbols, $S > T$ or $T < S$.

[...]; the transpose of the relation "is the wife of" is the relation "is the husband of."

Range and Domain. Let $R$ be a relation between elements $a$ of $A$ and $b$ of $B$. For each $a$ in $A$ one denotes by $R(a)$ the set of all $b$ in $B$ for which $aRb$; $R(a)$ is called the image of $a$ under $R$. For each subset $S$ of $A$ one denotes by $R(S)$ the set of all $b$ in $B$ for which $aRb$ for at least one $a$ in $S$; $R(S)$ is called the image of $S$ under $R$. The image of $A$ under $R$, the set $R(A)$, is called the range of $R$. The counterimage of an element $b$ under $R$ is the set of all elements $a$ for which $aRb$; this is the same as the set $R^T(b)$. The counterimage of a subset $U$ of $B$ is the set of all $a$ in $A$ for which $aRb$ for at least one $b$ in $U$; this is the same as the set $R^T(U)$. The set $R^T(B)$ is called the domain of $R$. A relation is sometimes called a correspondence between its domain and its range.

Product of Relations. If $R$ is a relation between elements $a$ of $A$ and $b$ of $B$ and $S$ is a relation between elements $b$ of $B$ and elements $c$ of $C$, the product relation (or composition) $RS$ is defined as a relation between elements $a$ of $A$ and $c$ of $C$, as follows: $aRSc$ whenever for some $b$ in $B$ one has $aRb$ and $bSc$.

EXAMPLE. The product can be illustrated by a communications network. Let $A$, $B$, $C$ be three sets of stations, let $aRb$ mean that $a$ can communicate with $b$ and let $bSc$ mean that $b$ can communicate with $c$. Then $aRSc$ means that there is a two-stage communication link from $a$ through some intermediate station $b$ to $c$.

For products of three relations one has the associative law $A(BC) = (AB)C$.

3. FUNCTIONS

A relation $F$ between elements $a$ of $A$ and $b$ of $B$ is said to be a function if, for every $a$ in $A$, $F(a)$ is either empty or contains just one element. If in addition $F$ has domain $A$, one says that $F$ is a function on $A$ into $B$.
In this case one identifies $F(a)$ with its unique element $b$ and writes: $b = F(a)$ or $a \xrightarrow{F} b$. The terms mapping and transformation are synonyms for function. A function $F$ on $A$ into $B$ can be defined directly as a correspondence which assigns to each element $a$ in $A$ a unique image $b = F(a)$ in $B$. The set $B$ is called a codomain of $F$. The range $F(A)$ is a subset of $B$. If $F(A) = B$, one says that $F$ is a function on $A$ onto $B$. If both $F$ and $F^T$ are functions, one says that $F$ is a one-to-one function. The transpose $F^T$ is then termed the inverse of $F$ and is denoted by $F^{-1}$. A one-to-one function on $A$ onto $B$ is called a one-to-one correspondence between $A$ and $B$. In this case one has $(FF^T)a = F^T(F(a)) = a$ for all $a$ in $A$; $FF^T$ is the identity function $E_A$ on $A$ onto $A$: $E_A(a) = a$; similarly, $F^TF = E_B$.

In classical analysis, a function $F$ is often denoted by $F(x)$. The symbol $F(x)$ then has two meanings: the value of the function for a particular $x$, and the function as a whole. Similarly, $F(x, y)$ denotes a function of two variables and also the value of the function for given $x$, $y$.

4. BINARY RELATIONS ON A SET

A relation $R$ on a set $A$ is said to be
Identical if $R = E_A$, i.e., if $aRb$ is equivalent to $a = b$;
Reflexive if $R \supset E_A$, i.e., if $aRa$ for all $a$ in $A$;
Irreflexive if $R \cap E_A = \emptyset$, i.e., if $aRa$ for no $a$ in $A$;
Transitive if $R^2 \subset R$, i.e., if $aRb$ and $bRc$ imply $aRc$;
Symmetric if $R = R^T$, i.e., if $aRb$ implies $bRa$;
Antisymmetric if $R \cap R^T \subset E_A$, i.e., if $aRb$ and $bRa$ imply $a = b$;
Asymmetric if $R \cap R^T = \emptyset$, i.e., if for no $a$, $b$ is $aRb$ and $bRa$;
Acyclic if $R^n \cap E_A = \emptyset$ for all $n$, i.e., if $a_1Ra_2, a_2Ra_3, \ldots, a_{n-1}Ra_n$ imply $a_1 \neq a_n$;
Complete if $R \cup R^T = A \times A$, i.e., if for each pair $(a, b)$ either $aRb$ or $bRa$;
Trichotomous if $R \cup R^T \cup E_A = A \times A$ and $R \cap R^T = \emptyset$, i.e., if for each pair $(a, b)$ exactly one of the relations $aRb$, $bRa$, $a = b$ holds.

Note that a relation is asymmetric if and only if it is antisymmetric and irreflexive.

EXAMPLES.
The parallel relation on lines in a plane is symmetric, irreflexive but not transitive (unless a line is defined to be parallel to itself). The relation $<$ on the real numbers is asymmetric, transitive, trichotomous, and acyclic; whereas the relation $\leq$ is antisymmetric, reflexive, transitive, and complete. The relation "is at least as good as" is reflexive, transitive, but not antisymmetric if there are two objects judged equally good.

5. EQUIVALENCE RELATIONS

By a partition of a set $A$ is meant a subdivision of $A$ into subsets, no two of which have an element in common. By an equivalence relation $R$ in $A$ is meant a relation $R$ in $A$ which is reflexive, symmetric, and transitive.

Each partition of $A$ determines an equivalence relation in $A$; $aRb$ holds when $a$ and $b$ are in the same subset of the partition. Conversely, each equivalence relation determines a partition of $A$; the subsets of the partition are the sets $R(a)$, i.e., the sets of form $\{b \mid bRa\}$. The sets $R(a)$ are called equivalence classes. Equivalence is the basis of classification; the equivalence classes contain elements which, although not identical, can be regarded as alike or interchangeable for some purpose.

Example. The sorting of nuts and bolts is based on the equivalence relation "has the same size and shape as."

A property shared by all elements of each equivalence class is called an invariant. More formally, let $R$ be an equivalence relation on a set $A$. A function $F$ on $A$ is said to be an invariant relative to $R$ if $aRb$ implies $F(a) = F(b)$. For example, if $R$ is the relation of congruence on a set $A$ of triangles, then the function $F(a)$ = area of triangle $a$ is an invariant. A set of invariants $F, G, \ldots$ relative to a relation $R$ is said to be complete if $F(a) = F(b)$, $G(a) = G(b)$, ... together imply $aRb$. The language "a necessary and sufficient condition for $K$ is that $P_1, P_2, \ldots$ all hold" frequently states that $P_1, P_2, \ldots$
are a complete set of invariants for an equivalence relation associated with $K$. Many of the theorems of elementary geometry fall in this class.

One sometimes is interested in choosing from each equivalence class a representative from which one or more invariants can be easily calculated. Such representatives are said to be in normal form or standard form. More technically, let $R$ be an equivalence relation on a set $A$. A function which assigns to each equivalence class $R(a)$ one of its members is called a canonical form relative to $R$. Thus in matrix theory (Chap. 3) one has canonical forms for row equivalence, equivalence, congruence, orthogonal congruence, and similarity. It is customary to select a representative which displays a complete set of invariants.

6. OPERATIONS

A function $F$ assigning to each ordered pair $(a, b)$, with $a$ in $A$ and $b$ in $B$, an element $c$ of set $C$ is called a (binary) operation on $A \times B$. If $F(a, b) = c$, one also writes $aFb = c$. If $A = B$, $F$ is called an operation on $A$. If also $C = A$, $F$ is called an interior operation; otherwise $F$ is called an exterior operation.

EXAMPLES. Addition and multiplication of numbers are interior operations. The scalar product of two vectors is an exterior operation.

Let $F$ be an interior operation on $A$ and let $R$ be an equivalence relation on $A$. One says that $F$ has the substitution property relative to $R$ if $aRa'$ and $bRb'$ imply $(aFb)R(a'Fb')$.

EXAMPLE. Let $A$ be the set of all integers; let $aRb$ mean that $a$ and $b$ have the same parity (both even or both odd). Then $aRa'$ and $bRb'$ imply that $(a + b)R(a' + b')$. Thus addition has the substitution property relative to $R$.

An exterior operation is said to have the substitution property relative to $R$ if $aRa'$ and $bRb'$ imply $(aFb) = (a'Fb')$.

7. ORDER RELATIONS

A relation $\leq$ on a set $A$ is said to be a partial order if (i) $a \leq a$ (reflexivity), (ii) $a \leq b$ and $b \leq c$ imply $a \leq c$ (transitivity), (iii) $a \leq b$ and $b \leq a$ imply $a = b$ (antisymmetry).
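For a relation on a finite set, conditions (i) to (iii) can be verified by direct enumeration. The following Python sketch is illustrative only: the helper names (`is_reflexive`, `is_partial_order`, and so on) and the two sample relations are assumptions introduced here, not part of the text. A relation is stored as a set of ordered pairs $(a, b)$ meaning $aRb$.

```python
from itertools import product

def is_reflexive(A, R):
    # (i): aRa for every a in A
    return all((a, a) in R for a in A)

def is_transitive(A, R):
    # (ii): aRb and bRc imply aRc
    return all((a, c) in R
               for (a, b) in R for (b2, c) in R if b == b2)

def is_antisymmetric(A, R):
    # (iii): aRb and bRa imply a = b
    return all(a == b for (a, b) in R if (b, a) in R)

def is_partial_order(A, R):
    return (is_reflexive(A, R) and is_transitive(A, R)
            and is_antisymmetric(A, R))

# Hypothetical sample data: the integers 1..12.
A = list(range(1, 13))
# "a divides b" is a partial order on A.
divides = {(a, b) for a, b in product(A, A) if b % a == 0}
# "a and b have the same parity" is an equivalence relation,
# hence reflexive and transitive but not antisymmetric.
same_parity = {(a, b) for a, b in product(A, A) if (a - b) % 2 == 0}

print(is_partial_order(A, divides))      # divisibility
print(is_partial_order(A, same_parity))  # same parity
```

Divisibility passes all three tests, whereas the parity relation fails antisymmetry (for instance 2 and 4 are related both ways without being equal).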
If $a \leq b$ and $a \neq b$, one writes: $a < b$. The relation $<$ is then asymmetric: (iv) for no $a$, $b$ is $a < b$ and $b < a$; it is also transitive. If $a \leq b$ and $b < c$, one writes: $b \geq a$ and $c > b$ (transposition).

An element $a$ of $A$ is said to be an upper bound for the subset $B$ of $A$ if $b \leq a$ for all $b$ in $B$; if also $a \leq c$ for every upper bound $c$ of $B$, one says that $a$ is a least upper bound (l.u.b.) for $B$. An upper bound for $B$ which belongs to $B$ is called a maximal element of $B$. If in these definitions one replaces $\leq$ by $\geq$, the resulting concepts are called lower bound, greatest lower bound (g.l.b.) and minimal element, respectively. The least upper bound (greatest lower bound) of a set, if it exists, is unique.

The partial order is said to be a linear order or chain order, if it is complete: (v) for every $a$, $b$ either $a \leq b$ or $b \leq a$.

EXAMPLES. The relation $a \leq b$ between real numbers is a linear order; there is no maximal or minimal element. The complex numbers $x + yi$ can be partially ordered by the definition: $a + bi \leq c + di$ if $a < c$ or if $a = c$ and $b = d$. Numbers with the same real part are not compared.

A partially ordered set $A$ is said to be a lattice if each subset containing two elements has a least upper bound and a greatest lower bound.

EXAMPLE. Let $A$ be the class of all subsets of a given set $B$ and let $S \leq T$ if $S$ is a subset of $T$, i.e., if $S \subset T$. Then $A$ is a lattice and l.u.b. $\{S, T\} = S \cup T$, g.l.b. $\{S, T\} = S \cap T$.

One extends this notation to lattices generally and uses $a \cup b$ ($a$ cup $b$) for l.u.b. $\{a, b\}$, $a \cap b$ ($a$ cap $b$) for g.l.b. $\{a, b\}$. If every subset of $A$ has a g.l.b. and a l.u.b., then $A$ is called a complete lattice. The example given of a class of all subsets of a given set is a complete lattice.
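The subset example can be checked by computing least upper bounds and greatest lower bounds straight from their definitions. The Python sketch below is a hedged illustration: the functions `upper_bounds`, `lub`, `glb` and the three-element base set are hypothetical choices made here, and the search is by brute force over a finite ordered set.

```python
from itertools import combinations

def upper_bounds(A, leq, B):
    # all a in A with b <= a for every b in B
    return [a for a in A if all(leq(b, a) for b in B)]

def lub(A, leq, B):
    # an upper bound that lies below every other upper bound, or None
    ub = upper_bounds(A, leq, B)
    least = [a for a in ub if all(leq(a, c) for c in ub)]
    return least[0] if least else None

def glb(A, leq, B):
    # greatest lower bound = least upper bound for the transposed order
    return lub(A, lambda x, y: leq(y, x), B)

# Hypothetical example: all subsets of {1, 2, 3}, ordered by inclusion.
base = [1, 2, 3]
power_set = [frozenset(c)
             for r in range(len(base) + 1)
             for c in combinations(base, r)]

def subset(S, T):
    return S <= T  # frozenset inclusion

S, T = frozenset({1, 2}), frozenset({2, 3})
print(sorted(lub(power_set, subset, [S, T])))  # elements of S cup T
print(sorted(glb(power_set, subset, [S, T])))  # elements of S cap T
```

Because `lub` returns `None` when no least upper bound exists, the same helpers can be used to test whether an arbitrary finite partially ordered set is a lattice.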
In a lattice the operations $\cup$ and $\cap$ have the following properties. For all $a$, $b$, $c$ in $A$:
$a \cup b = b \cup a$ and $a \cap b = b \cap a$ (commutative laws);
$(a \cup b) \cup c = a \cup (b \cup c)$ and $(a \cap b) \cap c = a \cap (b \cap c)$ (associative laws);
$a \cup a = a$, $a \cap a = a$ (idempotent laws);
$a \cap (a \cup b) = a$, $a \cup (a \cap b) = a$ (absorptive laws).

A lattice is said to be distributive if for all $a$, $b$, $c$ in $A$
$a \cup (b \cap c) = (a \cup b) \cap (a \cup c)$
or, equivalently, if
$a \cap (b \cup c) = (a \cap b) \cup (a \cap c)$
for all $a$, $b$, $c$. If $A$ has a minimal element and a maximal element, one ordinarily denotes them by 0, 1 respectively. Two elements $a$ and $b$ are said to be complements of each other if $a \cap b = 0$ and $a \cup b = 1$. A lattice is said to be complemented if each of its elements has a complement. In a distributive lattice, no element can have more than one complement. A Boolean algebra is a complemented, distributive lattice. (See Chap. 11.)

EXAMPLE. The partially ordered set formed of the class of all subsets of a given set $B$ forms a Boolean algebra. The minimal and maximal elements are the empty set $\emptyset$ and the set $B$; the complement is the same as that defined in Sect. 1.

REMARK. A relation $\lesssim$ which satisfies only conditions (i) and (ii) is called a preorder or quasi-order. An example is the relation "is at least as good as" between automobiles.

8. SETS OF POINTS

Sets of real numbers can be interpreted as sets of points on the real line, or number axis. For fixed $a$, $b$, $a < b$,
$\{x \mid a < x < b\}$ is an open interval,
$\{x \mid a \leq x \leq b\}$ is a closed interval,
$\{x \mid a \leq x < b\}$ or $\{x \mid a < x \leq b\}$ is a half-open interval.
For fixed $a$, $\epsilon$, $\epsilon > 0$, the set $\{x \mid a - \epsilon < x < a + \epsilon\}$ is the $\epsilon$-neighborhood of $a$.

An arbitrary set of real numbers is open if each element of the set has an $\epsilon$-neighborhood contained in the set. A set is closed if its complement is open. A number $a$ is a limit point of a set $A$ if every $\epsilon$-neighborhood of $a$ contains at least one element of $A$ differing from $a$.
Sets of ordered number pairs $(x, y)$ can be interpreted as sets of points in the $xy$-plane. For fixed $(a, b)$ and $\epsilon > 0$ the set $\{(x, y) \mid (x - a)^2 + (y - b)^2 < \epsilon^2\}$ is the $\epsilon$-neighborhood of $(a, b)$. A set of points in the $xy$-plane is open if each point $(a, b)$ in the set has an $\epsilon$-neighborhood contained in the set. A set is closed if its complement is open. A point $(a, b)$ is a limit point of a set $A$ if every $\epsilon$-neighborhood of $(a, b)$ contains at least one element of $A$ differing from $(a, b)$. An open set is called an open region or domain if each two points of the set can be joined by a broken line within the set. A point $(a, b)$ is a boundary point of set $A$ if every $\epsilon$-neighborhood of $(a, b)$ contains at least one point of $A$ and at least one point not in $A$. The boundary of $A$ is the set of all boundary points of $A$. A closed region is a set formed of the union of an open region and its boundary. A point $(a, b)$ of a set $A$ is called an isolated point of $A$ if some $\epsilon$-neighborhood of $(a, b)$ contains no element of $A$ other than $(a, b)$.

REFERENCES

The references for this chapter fall into several levels. The most elementary discussions of foundations and set theory are found in Refs. 1 and 4. References 5 and 7 are basic graduate level texts in the foundation of mathematics; Ref. 6 is at the same level for general set theory. Reference 3 (Chap. 11) gives a simple introduction to lattice theory and Boolean algebra. Reference 2 is a treatise on all phases of lattice theory, including preorder, partial order, and Boolean algebra.

1. C. B. Allendoerfer and C. O. Oakley, Principles of Mathematics, McGraw-Hill, New York, 1955.
2. Garrett Birkhoff, Lattice Theory, American Mathematical Society, New York, 1948.
3. Garrett Birkhoff and Saunders MacLane, A Survey of Modern Algebra, Macmillan, New York, 1941.
4. J. G. Kemeny, J. L. Snell, and G. L. Thompson, Introduction to Finite Mathematics, Prentice-Hall, Englewood Cliffs, New Jersey, 1957.
5. R. B. Kershner and L.
R. Wilcox, The Anatomy of Mathematics, Ronald, New York, 1950.
6. Erich Kamke, Theory of Sets, Dover, New York, 1950.
7. R. L. Wilder, Introduction to the Foundations of Mathematics, Wiley, New York, 1952.

A GENERAL MATHEMATICS

Chapter 2 Algebraic Equations

R. C. Lyndon

1. Polynomials
2. Real Roots
3. Complex Roots
References

1. POLYNOMIALS

A polynomial may be defined as a function $f = f(x)$ defined by an equation
$f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$,
where the coefficients $a_0, a_1, \ldots, a_n$ are constants (real or complex) and $x$ is variable (real or complex). The leading coefficient $a_n$ will be assumed $\neq 0$. The degree of $f$ is $n$. A polynomial $a_2x^2 + a_1x + a_0$ of degree 2 is quadratic; a polynomial $a_1x + a_0$ of degree 1 is linear; we accept the constant polynomials: $f(x) = a_0$, although the zero polynomial $f(x) \equiv 0$ must be tacitly excluded from certain contexts.

An algebraic equation of degree $n$ is an equation of form: polynomial of degree $n$ in $x$ = 0; that is, of form
$f(x) = a_nx^n + \cdots + a_0 = 0$.
A root of such an equation is a value of $x$ which satisfies it; a root of the equation is called a root of $f(x)$ or a zero of $f(x)$. Thus $r$ is a root of $f(x)$ if and only if $f(r) = 0$.

The fundamental theorem of algebra asserts that an algebraic equation of degree $n$ ($n = 1, 2, \ldots$) has at least one root (real or complex) (Refs. 1, 2). From this it follows that an algebraic equation of degree $n$ has exactly $n$ roots (some of which may be repeated, see below).

The operations of addition, subtraction, multiplication, and division of polynomials will be assumed to be familiar. Synthetic division is an abbreviation of division by a linear polynomial, $x - c$. As an illustration, the division of $3x^2 - 7x + 11$ by $x - 2$ is carried out in long form and by synthetic division.
\[
\begin{array}{r}
3x - 1 \\
x - 2 \;\big)\; \overline{3x^2 - 7x + 11} \\
3x^2 - 6x \phantom{{}+ 11} \\
-x + 11 \\
-x + \phantom{1}2 \\
9
\end{array}
\qquad\qquad
\begin{array}{r|rrr}
2 & 3 & -7 & 11 \\
  & 0 & 6 & -2 \\ \hline
  & 3 & -1 & 9
\end{array}
\]

Either method yields the quotient $3x - 1$ and remainder 9, so that $3x^2 - 7x + 11 = (3x - 1)(x - 2) + 9$. In the synthetic process, on the first line one replaces $x - 2$ by $+2$, $3x^2 - 7x + 11$ by the numbers 3, $-7$, 11. A zero is placed below the 3 and added to yield 3; the result is multiplied by 2 to yield 6; the 6 is added to $-7$ to yield $-1$; the $-1$ is multiplied by 2 to yield $-2$; the $-2$ is added to 11 to yield 9. On the third line the coefficients of the quotient, $3x - 1$, and the remainder, 9, appear in order.

REMAINDER THEOREM. If a polynomial $f(x)$ is divided by $x - c$, then the remainder is $f(c)$.

FACTOR THEOREM. $c$ is a root of $f(x)$ if and only if $x - c$ is a factor of $f(x)$ (Ref. 2).

APPLICATION. If one root, $c$, of $f(x)$ has been found, the remaining roots of $f(x)$ will be roots of the quotient polynomial $f(x) \div (x - c)$, which is of degree $n - 1$. Repetition of this reasoning leads to a representation of $f(x)$ as a constant times a product of linear factors $(x - c_1), (x - c_2), \ldots$. Since $f(x)$ is of degree $n$ there must be exactly $n$ such factors:
$f(x) = a_n(x - c_1)(x - c_2) \cdots (x - c_n)$.
Thus $f(x)$ has $n$ roots $c_1, c_2, \ldots, c_n$, some of which may be equal. If $c_1$ is repeated $m$ times, so that $(x - c_1)^m$ is a factor of $f(x)$ (and $(x - c_1)^{m+1}$ is not a factor), then $c_1$ is a root of multiplicity $m$.

Repeated Roots. If $c$ is a repeated root of $f(x)$ (a root of multiplicity 2 or more), then $c$ will also be a root of $f'(x)$, the derivative of $f(x)$,
$f'(x) = na_nx^{n-1} + (n - 1)a_{n-1}x^{n-2} + \cdots + a_1$.
To find the repeated roots, one can proceed as follows. Let $f_0(x) = f(x)$, $f_1(x) = f'(x)$, and by division obtain
$f_0(x) = q_1(x)f_1(x) + f_2(x)$,
where $f_2(x)$ is of degree lower than that of $f_1(x)$. Continue, taking
$f_{t-1}(x) = q_t(x)f_t(x) + f_{t+1}(x)$,
until $f_{t+1}(x) = 0$. Then the repeated roots of $f(x)$ are the roots of $f_t(x)$. If $f_t(x)$ is a (non-zero) constant, $f(x)$ has no repeated roots.
Otherwise all repeated roots of $f(x)$ can be found as the roots of $f_t(x)$, which has degree lower than that of $f$ (Ref. 2).

2. REAL ROOTS

In this section $f(x)$ denotes a polynomial with real coefficients. If $f(x)$ is of odd degree, $f(x)$ has at least one real root, whereas $x^2 + 1$, for example, has no real roots. Two problems will be considered: (1) establishing existence of real roots, perhaps within prescribed intervals; (2) computing to a satisfactory accuracy the value of a root that has been approximately located.

Graphical Methods. One plots the graph of $y = f(x)$. The roots of odd multiplicity are the values of $x$ at which the curve crosses the $x$-axis, while at roots of even multiplicity the curve is tangent to the $x$-axis. If $f(x_1)$ and $f(x_2)$ have opposite signs, there is a root between $x_1$ and $x_2$. In practice, one could use synthetic division to compute the values of $f(x)$ for a number of values of $x$ within some interval $a \leq x \leq b$. The values $a$ and $b$ can be chosen so that all real roots lie between $a$ and $b$; in particular, all real roots lie in the interval
$-\left(\dfrac{M}{|a_n|} + 1\right) \leq x \leq \dfrac{M}{|a_n|} + 1$,
where $M$ is the largest of the numbers $|a_0|, |a_1|, \ldots, |a_{n-1}|$. Narrower bounds can often be found by inspection. If in computation of $f(b)$ by synthetic division, the third row consists of non-negative numbers, then no real root exceeds $b$. An alternative criterion is Newton's rule: if the values $f(b), f'(b), \ldots, f^{(n)}(b)$ of the successive derivatives are all non-negative, then no root exceeds $b$. These last two rules can be applied to the equation obtained by replacing $x$ by $-x$, in order to obtain a lower bound $a$. The following rule is sometimes useful: if $g(x) = x^nf(1/x)$ and if $g$ has all of its real roots between $-b$ and $+b$, then $f$ has no real roots between $-(1/b)$ and $(1/b)$.

Derivative. The value $f'(c)$ of the derivative at $c$ gives the rate of increase (decrease, if $f'(c) < 0$) of $f(x)$ at $x = c$.
At an extremum (relative maximum or minimum) of $f(x)$, $f'(x)$ is zero; there can be at most $n - 1$ such values of $x$ (critical points of $f(x)$).

ROLLE'S THEOREM. Between each two real roots of $f(x)$ there is at least one critical point.

Descartes and Sturm Tests. Zero is a root of $f(x)$ only if $a_0 = 0$. By division by $x$ or some power of $x$, all zero roots can be removed. Information about the number of positive roots is given by:

DESCARTES'S RULE. The signs of the coefficients $a_n, a_{n-1}, \ldots, a_0$ in order, omitting possible zeros, form a string of $+$'s and $-$'s. The number $v$ of alternations in sign is defined as the number of consecutive pairs $+-$ or $-+$. The number $p$ of positive roots is no greater than $v$ and $v - p$ is even. (Negative roots of $f(x)$ are the positive roots of $f(-x)$.)

EXAMPLES. $x^2 + x + 1 = 0$, $v = 0$, no positive roots; $x^2 - 2x + 3 = 0$, $v = 2$, 0 or 2 positive roots; $x^2 + 2x - 3 = 0$, $v = 1$, 1 positive root.

A more precise criterion is given by:

STURM'S THEOREM. Write $f_0(x) = f(x)$, $f_1(x) = f'(x)$ and, stepwise, $f_{t-1}(x) = q_t(x)f_t(x) - f_{t+1}(x)$, where $f_{t+1}(x)$ is of lower degree than $f_t(x)$. Continue until some $f_{m+1}(x) = 0$. Now suppose $a < b$, $f(a) \neq 0$, $f(b) \neq 0$. For any real $c$, let $v(c)$ be the number of alternations in sign in the sequence of values $f_0(c), f_1(c), \ldots, f_m(c)$ (zeros omitted). Then $v(a) - v(b)$ is the exact number of distinct real roots between $a$ and $b$ (Ref. 2).

Newton's Method. If $x_1$ is an approximate value of a root of $f(x)$ then one sets
$x_{k+1} = x_k - \dfrac{f(x_k)}{f'(x_k)}$ $(k = 1, 2, \ldots)$.
The sequence of numbers thus defined converges to a real root of $f(x)$, provided $f(x_1)f''(x_1) > 0$ and it is known that $x_1$ lies in an interval containing a root of $f(x)$ but none of $f'(x)$ or of $f''(x)$ (Ref. 2).

3. COMPLEX ROOTS

Let $f(z) = a_nz^n + \cdots + a_0$ be a polynomial in the complex variable $z$, $z = x + iy$, $i = \sqrt{-1}$. The coefficients are allowed to be real or complex. If they are real, complex roots of $f(z)$ come in conjugate pairs, $x \pm yi$, so that the total number of nonreal complex roots is even.
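The real-root tools of Sect. 2 (synthetic division, Descartes's rule, Sturm's theorem, Newton's method) lend themselves to a compact numerical illustration. The Python sketch below is an assumption-laden example rather than a prescribed implementation: all function names are hypothetical, polynomials are stored as coefficient lists $[a_n, \ldots, a_0]$ with $a_n \neq 0$ and degree at least 1, and exact rational arithmetic is used for the Sturm sequence.

```python
from fractions import Fraction

def synthetic_division(coeffs, c):
    """Divide the polynomial by x - c; return (quotient, remainder).
    By the remainder theorem the remainder equals f(c)."""
    row, acc = [], 0
    for a in coeffs:
        acc = acc * c + a
        row.append(acc)
    return row[:-1], row[-1]

def sign_alternations(values):
    """Number of sign changes in a sequence, zeros omitted."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def derivative(coeffs):
    n = len(coeffs) - 1
    return [a * (n - i) for i, a in enumerate(coeffs[:-1])]

def remainder(num, den):
    """Remainder of polynomial long division num / den (exact)."""
    num = [Fraction(a) for a in num]
    while len(num) >= len(den):
        factor = num[0] / den[0]
        num = [a - factor * b for a, b in zip(num, den)] + num[len(den):]
        num.pop(0)  # leading coefficient is now zero
    while num and num[0] == 0:
        num.pop(0)
    return num  # empty list means zero remainder

def sturm_sequence(coeffs):
    """f_0 = f, f_1 = f', f_{t+1} = -(remainder of f_{t-1} / f_t)."""
    seq = [list(coeffs), derivative(coeffs)]
    while True:
        r = remainder(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-a for a in r])

def count_real_roots(coeffs, a, b):
    """v(a) - v(b): distinct real roots between a and b."""
    seq = sturm_sequence(coeffs)
    def v(c):
        return sign_alternations([synthetic_division(p, c)[1] for p in seq])
    return v(a) - v(b)

def newton(coeffs, x, steps=20):
    """Iterate x <- x - f(x)/f'(x) a fixed number of times."""
    d = derivative(coeffs)
    for _ in range(steps):
        fx = synthetic_division(coeffs, x)[1]
        x = x - fx / synthetic_division(d, x)[1]
    return x

print(synthetic_division([3, -7, 11], 2))   # the worked division by x - 2
print(sign_alternations([1, 2, -3]))        # Descartes count for x^2 + 2x - 3
print(count_real_roots([1, 2, -3], -4, 2))  # Sturm count on (-4, 2)
print(newton([1, 2, -3], 2.0))              # Newton iteration from x_1 = 2
```

The fixed iteration count in `newton` is a simplification; a practical routine would stop on a tolerance and guard against a vanishing derivative.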
If $f(z)$ is of degree 2, 3, or 4, explicit algebraic formulas for all roots are available (Ref. 1). It is proved in Galois theory that similar formulas for equations of higher degree do not exist (Ref. 1).

Equations for Real and Imaginary Parts. Replacement of $z$ by $x + iy$ in the equation $f(z) = 0$ and equating real and imaginary parts separately to zero leads to two simultaneous equations in the real variables $x$, $y$. These can be solved by elimination.

EXAMPLE. $z^3 - z + 1 = 0$. Replacement of $z$ by $x + iy$ leads to the equations $x^3 - 3xy^2 - x + 1 = 0$, $3x^2y - y^3 - y = 0$. To find nonreal roots, one assumes $y \neq 0$ and is led to the equations $8x^3 - 2x - 1 = 0$, $y^2 = 3x^2 - 1$. The first has one real root $x \approx 0.66$, whence $y \approx \pm 0.56$. Hence $0.66 \pm 0.56i$ are (approximately) the nonreal roots of the equation.

Application of Argument Principle. The argument principle, when applied to the polynomial $f(z)$, states that the total change in the argument (polar angle) of the complex number $w = f(z)$, as $z$ traces out a simple closed path (circuit) $C$, equals $2\pi$ times the number of zeros of $f(z)$ inside $C$ (provided $f(z) \neq 0$ on $C$). (See Chap. 7, Sect. 5.) The path $C$ can be chosen as a circle, semicircle, square, or other convenient shape, and the variation of the argument of $w$ can be evaluated graphically. One can pass to the limit from a semicircle in order to find the number of roots in a half-plane. This is the basis of the Nyquist criterion (Chap. 21).

In general, no root can lie outside the circle with center at $z = 0$ and radius $1 + (M/|a_n|)$, where $M$ is the largest of $|a_0|, |a_1|, \ldots, |a_{n-1}|$ (Ref. 2).

Hurwitz-Routh Criterion. This is a rule for determining whether all roots of $f(z)$ lie in the left half-plane (i.e., have negative real parts).
For a given sequence $c_0, c_1, \ldots, c_n, \ldots$, one denotes by $\Delta_k$ the determinant
\[
\Delta_k = \begin{vmatrix}
c_1 & c_0 & 0 & 0 & \cdots & 0 \\
c_3 & c_2 & c_1 & c_0 & \cdots & 0 \\
\vdots & & & & & \vdots \\
c_{2k-1} & c_{2k-2} & \cdots & & \cdots & c_k
\end{vmatrix}
\]
so that
\[
\Delta_1 = c_1, \qquad
\Delta_2 = \begin{vmatrix} c_1 & c_0 \\ c_3 & c_2 \end{vmatrix}, \qquad
\Delta_3 = \begin{vmatrix} c_1 & c_0 & 0 \\ c_3 & c_2 & c_1 \\ c_5 & c_4 & c_3 \end{vmatrix}.
\]
For a given polynomial $f(z) = c_0z^n + c_1z^{n-1} + \cdots + c_n$ with real coefficients and $c_0 > 0$, one forms $\Delta_1, \ldots, \Delta_n$, with $c_k$ replaced by 0 for $k > n$. All roots of $f(z)$ lie in the left half-plane if and only if $\Delta_1 > 0, \Delta_2 > 0, \ldots, \Delta_n > 0$ (Ref. 3). Since $\Delta_n = c_n\Delta_{n-1}$, the last condition may be replaced by $c_n > 0$.

Graeffe's Method. Graeffe's method is efficient for finding a complex root, or successively all roots, of a polynomial $f(z)$. For simplicity, suppose that $f(z)$ has no repeated roots, as can always be arranged by the methods indicated above (Sect. 1). One must further suppose that $f(z)$ has a single root $r_0$ of maximum absolute value; if this fails for $f(z)$ it will hold for the new polynomial $g(z) = f(z + c)$ for all but certain special values of $c$. It is necessary to have some rough idea of the argument of the root $r_0$; for example, if $r_0$ is real, to know whether it is positive or negative.

Starting with the polynomial $f(z) = f_1(z) = z^n + a_1z^{n-1} + \cdots$, one forms $f_1(-z)$. The product $f_1(z)f_1(-z)$ contains only even powers of $z$, hence is of the form $(-1)^nf_1(z)f_1(-z) = f_2(z^2)$, the factor $(-1)^n$ serving to keep $f_2$ monic. Similarly, $f_3(z)$ is formed from $f_2(z)$: $f_3(z^2) = (-1)^nf_2(z)f_2(-z)$, and the process is continued to form a sequence of polynomials $f_k(z) = z^n + a_kz^{n-1} + \cdots$. (As justification note that $f_k$ has roots which are the $2^{k-1}$th powers of the roots of $f$; that $-a_k$ is the sum of the roots of $f_k$, and hence that the ratio of $-a_k$ to $r_0^{2^{k-1}}$ approaches 1 as $k \to \infty$.) One chooses a value $z_k$ of the $2^{k-1}$th root of $-a_k$; the choice of $z_k$ is made to agree as closely as possible in argument with the initial estimate for the argument of $r_0$. The successive values $z_1, z_2, \ldots$ can be expected to approach $r_0$ rapidly.

After the root of largest absolute value has been found, one could divide out the corresponding factor and proceed to find the root of next largest absolute value.
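The root-squaring step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the elaborated procedure used in practice: the names `poly_mul`, `graeffe_step`, and `largest_root_estimates` are hypothetical, the polynomial is monic with coefficients listed from the highest power down, and the dominant root is assumed real and positive so that an ordinary real root of $-a_k$ can be taken. The sign factor $(-1)^n$ is applied so that each $f_k$ stays monic.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def graeffe_step(f):
    """Return g with g(z^2) = (-1)^n f(z) f(-z); roots of g are squares
    of the roots of f."""
    n = len(f) - 1
    # coefficients of f(-z): the term of degree n - i changes sign
    # when n - i is odd
    f_neg = [a * (-1) ** (n - i) for i, a in enumerate(f)]
    prod = poly_mul(f, f_neg)        # only even powers survive
    sign = (-1) ** n
    return [sign * c for c in prod[::2]]

def largest_root_estimates(f, steps):
    """z_k = (-a_k)^(1 / 2^(k-1)) for k = 1..steps, where f_1 = f.
    Assumes a real, positive dominant root."""
    estimates, fk = [], list(f)
    for k in range(1, steps + 1):
        estimates.append((-fk[1]) ** (1.0 / 2 ** (k - 1)))
        fk = graeffe_step(fk)
    return estimates

# Hypothetical example: z^2 - 3z + 2 = (z - 1)(z - 2), dominant root 2.
print(graeffe_step([1, -3, 2]))           # coefficients of (w - 1)(w - 4)
print(largest_root_estimates([1, -3, 2], 6))
```

With integer coefficients the arithmetic is exact, but the coefficients grow very quickly; this growth is one reason the refinements mentioned in the text are preferred for serious computation.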
In practice, it is generally more efficient to use an elaboration of Graeffe's method (Ref. 7).

REFERENCES

1. Garrett Birkhoff and Saunders MacLane, A Survey of Modern Algebra (Revised edition), Macmillan, New York, 1953.
2. L. E. Dickson, First Course in the Theory of Equations, Wiley, New York, 1922.
3. E. A. Guillemin, The Mathematics of Circuit Analysis, Wiley, New York, 1949.
4. C. C. MacDuffee, Theory of Equations, Wiley, New York, 1954.
5. J. V. Uspensky, Theory of Equations, McGraw-Hill, New York, 1948.
6. L. Weisner, Introduction to the Theory of Equations, Macmillan, New York, 1938.
7. F. A. Willers, Practical Analysis, Dover, New York, 1948.

A GENERAL MATHEMATICS

Chapter 3 Matrix Theory

R. C. Lyndon

1. Vector Spaces
2. Linear Transformations
3. Coordinates
4. Echelon Form
5. Rank, Inverses
6. Determinants, Adjoint
7. Equivalence
8. Similarity
9. Orthogonal and Symmetric Matrices
10. Systems of Linear Inequalities
References

1. VECTOR SPACES

Let $F$ denote the rational number system, or the real number system, or the complex number system; in the following, elements of $F$ are termed scalars and are denoted by small Roman letters $a, b, c, \ldots$. A vector space $V$ over $F$ is defined (Ref. 9) as a set of elements called vectors, denoted by small Greek letters $\alpha, \beta, \gamma, \ldots$, for which the operations of addition: $\alpha + \beta$ and multiplication by scalars: $a\alpha$ are defined and satisfy the following rules:
(i) For each pair $\alpha$, $\beta$ in $V$, $\alpha + \beta$ is an element of $V$ and
$\alpha + \beta = \beta + \alpha$, $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$;
(ii) For each $\alpha$ in $V$ and each $a$ in $F$, $a\alpha$ is an element of $V$ and, for arbitrary $b$ in $F$ and $\beta$ in $V$,
$a(\alpha + \beta) = a\alpha + a\beta$, $(a + b)\alpha = a\alpha + b\alpha$, $a(b\alpha) = (ab)\alpha$, $1\alpha = \alpha$;
(iii) For given $\alpha$, $\beta$ in $V$, there is a unique vector $\gamma$ in $V$ such that $\alpha + \gamma = \beta$.
In particular, there is a unique vector denoted by 0 such that $\alpha + 0 = \alpha$ for all $\alpha$ in $V$.
When $F$ is the real number system, $V$ is called a real vector space; when $F$ is the complex number system, $V$ is a complex vector space. The system $F$ can be chosen more generally as a field (Ref. 9).

The vectors of mechanics in 3-dimensional space form a real vector space $V$. In terms of a coordinate system, the elements of $V$ are ordered triples $(x, y, z)$ of real numbers; addition and multiplication by real scalars are defined as follows:
$(x_1, y_1, z_1) + (x_2, y_2, z_2) = (x_1 + x_2, y_1 + y_2, z_1 + z_2)$,
$a(x, y, z) = (ax, ay, az)$.

A vector $\alpha$ is said to be a linear combination of vectors $\alpha_1, \ldots, \alpha_n$ if
$\alpha = a_1\alpha_1 + \cdots + a_n\alpha_n$
for appropriate choice of $a_1, \ldots, a_n$. An ordered set $\{\alpha_1, \ldots, \alpha_n\}$ is said to be independent if no member of the set is a linear combination of the others or, equivalently, if
$a_1\alpha_1 + \cdots + a_n\alpha_n = 0$
implies $a_1 = 0, \ldots, a_n = 0$. If the ordered set $S = \{\alpha_1, \ldots, \alpha_n\}$ is independent and $\alpha$ is a linear combination of its elements (is linearly dependent on $S$), then the scalars $a_1, \ldots, a_n$ can be chosen in only one way so that $\alpha = \sum_i a_i\alpha_i$.

If there is a finite set $S = \{\alpha_1, \ldots, \alpha_n\}$ such that every $\alpha$ in $V$ is linearly dependent on $S$, then $V$ is said to be of finite dimension. For the remainder of this chapter, only vector spaces of finite dimension will be considered; this is, however, not the only case of importance. If $S = \{\alpha_1, \ldots, \alpha_n\}$ is independent and every $\alpha$ of $V$ is a linear combination of these vectors, then $S$ is said to constitute a basis for $V$. Every finite dimensional vector space has at least one basis; all bases have the same number, $n$, of elements; $n$ is the dimension of $V$.

A subset $W$ of $V$ is said to be a subspace of $V$ if, with the operations as defined in $V$, $W$ is itself a vector space. A subset $W$ will be a subspace of $V$ if, whenever $\alpha$, $\beta$ are in $W$, $\alpha + \beta$ is in $W$, and $a\alpha$ is in $W$ for every $a$ in $F$. In particular, $\{0\}$ is a subspace, as is $V$ itself. The intersection (Chap. 1, Sect.
1) $W \cap U$ of two subspaces of $V$ is a subspace of $V$; it is the largest subspace contained in both $W$ and $U$. The union $W \cup U$ is not usually a subspace; the smallest subspace containing $W$ and $U$ is rather their (linear) sum $W + U$, consisting of all vectors $\alpha + \beta$, $\alpha$ in $W$, and $\beta$ in $U$. If $W \cap U = \{0\}$, then $W + U$ is called a direct sum, and is often denoted by $W \oplus U$ or $W \dotplus U$; in this case every vector in $W + U$ is expressible uniquely as $\alpha + \beta$, $\alpha$ in $W$, $\beta$ in $U$.

For any set of vectors $\{\alpha_1, \ldots, \alpha_n\}$, the set of all their linear combinations constitutes a subspace, and the subspace is said to be spanned by them.

Every independent set is a subset of a basis. From this it follows that, for each subspace $W$, there exists $U$ (in general, many) such that $V$ is the direct sum of $W$ and $U$.

2. LINEAR TRANSFORMATIONS

Let $f$ be a transformation (function, mapping) (Chap. 1, Sect. 3) of vector space $V$ into a second space $V'$; $f$ is said to be linear if for all $\alpha$, $\beta$, $a$, $b$,
$f(a\alpha + b\beta) = af(\alpha) + bf(\beta)$.
The image of $V$ under $f$, denoted by $f(V)$, is the set of all vectors $f(\alpha)$ for $\alpha$ in $V$; $f(V)$ is a subspace of $V'$. If $f(V) = V'$, $f$ is said to map $V$ onto $V'$. The null space of $f$, denoted by $N(f)$, is the set of all vectors $\alpha$ in $V$ such that $f(\alpha) = 0$; $N(f)$ is a subspace of $V$. If $N(f)$ contains only the element 0, $f$ is said to be nonsingular; this is equivalent to the condition that $f$ be one-to-one (Chap. 1, Sect. 3); a nonsingular transformation is termed an isomorphism of $V$ onto $f(V)$. The rank of $f$ is defined as the dimension of $f(V)$; this equals the dimension of $V$ minus that of $N(f)$. The mapping $f$ is nonsingular if and only if its rank is maximal, that is, equals the dimension of $V$. If $W$ is chosen so that $V$ is the direct sum of $N(f)$ and $W$, and $W$ has dimension greater than 0, then the restriction of $f$ to $W$ is a nonsingular mapping of $W$ onto $f(V)$, that is, an isomorphism of $W$ onto $f(V)$. If $f$ is an isomorphism of $V$ onto $V'$, then the inverse transformation $f^T = f^{-1}$ is a linear transformation of $V'$ onto $V$.
The set of all linear transformations of $V$ into $V'$ becomes itself a vector space over $F$, if addition and multiplication by scalars are defined by the rules:
$f + g$ is the transformation such that $(f + g)\alpha = f(\alpha) + g(\alpha)$ for all $\alpha$ in $V$;
$af$ is the transformation such that $(af)\alpha = a[f(\alpha)]$ for all $\alpha$ in $V$.

If $f$ maps $V$ into $V'$ and $g$ maps $V'$ into $V''$, following $f$ by $g$ defines the composite transformation $fg$ of $V$ into $V''$; explicitly, $fg(\alpha) = g[f(\alpha)]$. If $f$, $g$ are linear, so also is $fg$ (Refs. 2, 8, 9).

3. COORDINATES

Let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for the vector space $V$, so that every vector $\alpha$ in $V$ can be written uniquely in the form $\sum a_i\alpha_i$. The $a_i$ are the coordinates of $\alpha$ relative to the chosen basis; the $a_i$ are also termed components, but this word is sometimes used for the terms $a_i\alpha_i$. The choice of a definite basis is often necessary for computation. With a fixed basis understood, one can replace each vector $\alpha$ by the corresponding $n$-tuple $(a_1, \ldots, a_n)$; then
$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)$,
$c(a_1, \ldots, a_n) = (ca_1, \ldots, ca_n)$.
A basis that is natural at one stage of a problem may not be the most advantageous at a later stage, so that one must be prepared to change bases.

If a basis $\alpha_1, \ldots, \alpha_n$ is chosen for $V$ and a basis $\alpha'_1, \ldots, \alpha'_m$ for $V'$, then each linear transformation of $V$ into $V'$ can be assigned coordinates as follows. The transformation $f$ is fully determined by the images $f(\alpha_i)$ of the basis elements for $V$. If $f(\alpha_i) = \sum_j a_{ij}\alpha'_j$, then $f$ may be characterized by the $n \cdot m$ scalars $a_{ij}$, where $i = 1, \ldots, n$, $j = 1, \ldots, m$. These numbers are usually thought of as arranged in a rectangular array, or matrix
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix} = (a_{ij}).
\]
One terms $A$ the matrix representing the transformation $f$ relative to the given bases in $V$ and $V'$. If $g$ is a second transformation from $V$ into $V'$, with matrix $B = (b_{ij})$, it is clear that the transformation $f + g$ will have the matrix $(a_{ij} + b_{ij})$.
Accordingly, one defines the sum of two $n$ by $m$ matrices as follows:
$(a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij})$.
Similarly, the product $cA$, which represents $cf$, is defined as the matrix $(ca_{ij})$.

Now let $f$ be a linear transformation of $V$ into $V'$, $g$ a linear transformation of $V'$ into $V''$, where $V$, $V'$ have bases as before, and $V''$ has a basis $\alpha''_1, \ldots, \alpha''_p$. Relative to these bases, $f$ is represented by an $n$ by $m$ matrix $A = (a_{ij})$, $g$ by an $m$ by $p$ matrix $B = (b_{jk})$, $fg$ by an $n$ by $p$ matrix $C = (c_{ik})$. Since
$(fg)(\alpha_i) = \sum_k \sum_j a_{ij}b_{jk}\alpha''_k$,
one finds
$c_{ik} = \sum_{j=1}^{m} a_{ij}b_{jk}$;
correspondingly, one defines the product of two matrices $A$ and $B$ (where the number of columns of $A$ equals the number of rows of $B$) to be the matrix $C = AB$, where the elements $c_{ik}$ of $C$ are given by the above "row-by-column" rule. Multiplication of matrices is not commutative, but is associative and distributive:
$A(BC) = (AB)C$, $A(B + C) = AB + AC$, $(A + B)C = AC + BC$.

If $\alpha = \sum a_i\alpha_i$ is a vector with coordinate representation $(a_1, \ldots, a_n)$, one can regard the $n$-tuple as a 1 by $n$ matrix. The product $\alpha A$ can then be evaluated as that of a 1 by $n$ matrix and an $n$ by $m$ matrix. The result is the 1 by $m$ matrix
$\alpha A = \left(\sum_{i=1}^{n} a_ia_{i1}, \ldots, \sum_{i=1}^{n} a_ia_{im}\right)$
which represents $f(\alpha)$:
$f(\alpha) = f\left(\sum_i a_i\alpha_i\right) = \sum_i a_if(\alpha_i) = \sum_i \sum_j a_ia_{ij}\alpha'_j$.
This shows that, when bases are chosen in $V$ and $V'$, each matrix $A$ is the matrix of a linear transformation (Refs. 2, 9).

4. ECHELON FORM

The matrix $A$ associated with a linear transformation $f$ from $V$ to $V'$ can be given an especially simple form by suitable choice of basis for $V$, for $V'$, or for both. We consider the effect of a change of basis for $V$. Every change of basis for $V$ can be effected by a sequence of elementary transformations of the following types: (1) replacement of $\alpha_i$ by a scalar multiple $c\alpha_i$, $c \neq 0$; (2) renumbering, interchanging $\alpha_i$ and $\alpha_j$; (3) adding to $\alpha_i$ some multiple of $\alpha_j$, $j \neq i$, so that $\alpha_i$ is replaced by $\alpha_i + c\alpha_j$ (and $\alpha_j$ is left unchanged).
The effect of each transformation is to carry out the analogous operation on the rows of the matrix $A = (a_{ij})$. Thus (1) multiplies each element of the $i$th row by $c$, (2) interchanges $i$th and $j$th rows, (3) replaces the $i$th row by $(a_{i1} + ca_{j1}, \ldots, a_{im} + ca_{jm})$.

A matrix is said to be in (strict) echelon form (Ref. 9) if:
(i) The leading element (first nonzero element) in each nonzero row appears farther to the right than that of any preceding row;
(ii) The leading elements are all 1;
(iii) Only zeros appear in the same column with a leading element;
(iv) All zero rows (if any) appear at the bottom.
By a zero row (or column) is meant one consisting wholly of zeros.

EXAMPLE. The following matrix is in echelon form.
\[
A = \begin{bmatrix}
0 & 1 & 3 & 0 & 0 & 5 \\
0 & 0 & 0 & 1 & 0 & 7 \\
0 & 0 & 0 & 0 & 1 & 2 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Each matrix can be reduced to echelon form by elementary transformations on its rows, as follows:

Step 1. If the first column is a zero column, leave it untouched and proceed to the matrix formed by the remaining columns. If the first column is not a zero column, permute rows so that $a_{11} \neq 0$. Dividing this row by $a_{11}$ gives a new matrix with $a_{11} = 1$. Subtracting suitable multiples of this row from the other rows makes all $a_{i1} = 0$ for $i \neq 1$. The matrix now has a first column which is a zero column, or else it has all zeros except for a 1 in the top position. Leave the first row and column untouched and proceed to the matrix formed by the elements not in the first row or column.

Step 2. Repeat this process as long as possible. The resulting matrix will satisfy (i), (ii), and (iv).

Step 3. To obtain (iii), subtract suitable multiples of each row from earlier rows to convert the elements in these rows above the leading element of the given row into zeros.

The result may be stated as follows: Every matrix is row-equivalent to an echelon matrix and (it can be shown) to a unique echelon matrix.

Application to Systems of Equations (Ref. 9).
A system of $m$ linear equations in $n$ unknowns
$\sum_{j=1}^{n} a_{ij}x_j = c_i$ $(i = 1, \ldots, m)$
can be replaced by a single matrix equation
$AX = C$,
where $A = (a_{ij})$ and $X$, $C$ are column vectors:
\[
X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad
C = \begin{bmatrix} c_1 \\ \vdots \\ c_m \end{bmatrix}.
\]
Let $B$ be the augmented matrix of the system, obtained by adjoining $-C$ as $(n + 1)$st column to $A$. The usual manipulations of equations employed to successively eliminate (so far as possible) the unknowns $x_1, x_2, \ldots, x_n$ correspond to elementary transformations on the matrix $B$. If the result were the echelon matrix of the above example, one would have obtained the equivalent system:
$x_2 + 3x_3 + 5 = 0$,
$x_4 + 7 = 0$,
$x_5 + 2 = 0$.
Since $x_1$, $x_3$ do not appear in leading terms, they can be assigned arbitrary values; the general solution can be obtained immediately from the given equations:
$x_1$ arbitrary, $x_3$ arbitrary, $x_2 = -3x_3 - 5$, $x_4 = -7$, $x_5 = -2$.
If a row $(0\ 0\ \cdots\ 0\ 1)$ had appeared, there would be an equation $1 = 0$, as a consequence of the original system, which would therefore be inconsistent and have no solution.

5. RANK, INVERSES

The rank of a linear transformation $f$ of $V$ into $V'$ was defined (Sect. 2) as the dimension of the image space $f(V)$. If $f$ has matrix $A$, then $f(V)$ is the row-space of $A$; that is, the subspace of $V'$ spanned by the vectors consisting of the rows of $A$. The rank of $A$ is defined as the dimension of the row-space of $A$; hence the rank of $A$ equals the rank of $f$. It can be shown that the rank of $A$ also equals the dimension of the column-space of $A$. The rank is unaltered by elementary transformations and can be determined by inspection for an echelon matrix, where it is simply the number of nonzero rows.

Let $f$ be a one-to-one linear transformation of $V$ onto $V'$, so that $f$ has a linear inverse $f^{-1}$ (Sect. 2). The spaces $V$, $V'$ must have the same dimension $m$ and $f$, $f^{-1}$ are represented by nonsingular square matrices $A$, $B$ such that $AB = BA = I$, where
\[
I = I_m = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\]
is the $m$ by $m$ identity matrix.
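The reduction to echelon form by elementary row transformations, and the reading-off of the rank as the number of nonzero rows, can be sketched as follows. This Python fragment is an illustrative assumption: the names `echelon` and `rank` are hypothetical, exact rational arithmetic is used to avoid rounding, and the three steps of the reduction are merged into a single column-by-column sweep that clears entries above and below each leading 1.

```python
from fractions import Fraction

def echelon(rows):
    """Reduce a matrix (list of rows) to strict echelon form."""
    M = [[Fraction(x) for x in r] for r in rows]
    n_rows, n_cols = len(M), len(M[0])
    lead_row = 0
    for col in range(n_cols):
        # find a row at or below lead_row with a nonzero entry in col
        pivot = next((r for r in range(lead_row, n_rows)
                      if M[r][col] != 0), None)
        if pivot is None:
            continue                       # zero column: move on
        # type (2): interchange rows
        M[lead_row], M[pivot] = M[pivot], M[lead_row]
        # type (1): scale so that the leading element is 1
        p = M[lead_row][col]
        M[lead_row] = [x / p for x in M[lead_row]]
        # type (3): clear the rest of the column
        for r in range(n_rows):
            if r != lead_row and M[r][col] != 0:
                c = M[r][col]
                M[r] = [x - c * y for x, y in zip(M[r], M[lead_row])]
        lead_row += 1
        if lead_row == n_rows:
            break
    return M

def rank(rows):
    """Number of nonzero rows of the echelon form."""
    return sum(1 for r in echelon(rows) if any(x != 0 for x in r))

# Hypothetical sample matrix whose second row is twice the first.
A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
print(echelon(A))
print(rank(A))
```

Applying `echelon` to the augmented matrix of a linear system yields the equivalent reduced system directly; a final row of the form $(0, \ldots, 0, 1)$ signals inconsistency.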
If $A$ is an arbitrary square nonsingular matrix, there exists a unique inverse $A^{-1}$ such that $AA^{-1} = I$ (which implies $A^{-1}A = I$). Hence $B$ must be $A^{-1}$. The echelon matrix for a square nonsingular matrix $A$ is $I$; the inverse $A^{-1}$ may be obtained by applying to $I$ the same sequence of elementary transformations that carry $A$ into its echelon form $I$. The inverse has properties such as $(cA)^{-1} = c^{-1}A^{-1}$.

6. DETERMINANTS, ADJOINT

By a permutation $p$ of the set of integers $1, 2, \ldots, m$ is meant a function $p: k \to k' = p(k)$ which is a one-to-one transformation of this set onto itself (Ref. 2). Each such permutation is classified as even or odd according as the polynomials in $m$ variables

$$\prod_{i<j} (x_i - x_j), \qquad \prod_{i<j} (x_{p(i)} - x_{p(j)})$$

are the same or negatives of each other.

EXAMPLE. If $m = 3$, and $p(1) = 3$, $p(2) = 1$, $p(3) = 2$, then $p$ is even, since

$$(x_3 - x_1)(x_3 - x_2)(x_1 - x_2) = (x_1 - x_2)(x_1 - x_3)(x_2 - x_3).$$

One denotes by $\operatorname{sgn} p$ the value 1 if $p$ is even, the value $-1$ if $p$ is odd. The determinant (Refs. 1, 9) $\det A$ of a square $m$ by $m$ matrix $A = (a_{ij})$ is defined to be the scalar

$$\det A = \sum_{p} \operatorname{sgn} p \cdot a_{1p(1)} \cdot a_{2p(2)} \cdots a_{mp(m)},$$

where the sum is over all permutations $p$ of $1, 2, \ldots, m$.

If $A$ is singular, $\det A = 0$. For nonsingular $A$, $\det A \neq 0$ and $\det A^{-1}$ is $(-1)^h$ times the product of the scalars $c$ appearing in the elementary transformations of type (1) (Sect. 4) used in reducing $A$ to the echelon form $I$, where $h$ is the number of transformations of type (2).

Let $A_{ij}$ denote the submatrix of $A$ obtained by deleting the $i$th row and $j$th column. Then for any fixed $i$,

$$\det A = \sum_{j=1}^{m} (-1)^{i+j} a_{ij} \cdot \det A_{ij};$$

there is an analogous result for expansion according to a fixed column $j$. One calls $\det A_{ij}$ the minor of $a_{ij}$, and the expansions of $\det A$ are called expansions by minors.

EXAMPLE.

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).$$

The adjoint ($\operatorname{adj} A$) of a square matrix $A$ is the matrix $B = (b_{ij})$, where $b_{ij} = (-1)^{i+j} \det A_{ji}$ (note the reversal of indices). One has the rule

$$\operatorname{adj} A \cdot A = (\det A) \cdot I$$

and, if $\det A \neq 0$,

$$A^{-1} = (\det A)^{-1} \cdot \operatorname{adj} A, \qquad \operatorname{adj} A = (\det A) \cdot A^{-1}.$$
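The permutation definition of the determinant and the rule $\operatorname{adj} A \cdot A = (\det A) \cdot I$ can be checked directly on small matrices. The following Python sketch is illustrative only: the names `sign`, `det`, `minor`, `adjoint`, and `matmul` are hypothetical, indices run from 0 rather than 1, and the sum over all $m!$ permutations is practical only for small $m$. The sign is computed by counting inversions, which agrees with the comparison of the two products used in the definition.

```python
from itertools import permutations


def sign(p):
    """sgn p: +1 for an even permutation, -1 for an odd one (inversion count)."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant as the signed sum over all permutations of the column indices."""
    m = len(a)
    total = 0
    for p in permutations(range(m)):
        term = sign(p)
        for i in range(m):
            term *= a[i][p[i]]
        total += term
    return total


def minor(a, i, j):
    """Submatrix A_ij: delete row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def adjoint(a):
    """adj A = (b_ij) with b_ij = (-1)^(i+j) det A_ji (note the reversed indices)."""
    m = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(m)] for i in range(m)]


def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
d = det(A)                       # 2*(12 - 2) - 0 + 1*(1 - 3) = 18
product = matmul(adjoint(A), A)  # equals d times the identity matrix
```

The permutation of the example above, $p(1) = 3$, $p(2) = 1$, $p(3) = 2$, is written `(2, 0, 1)` with zero-based indices, and `sign((2, 0, 1))` returns 1.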
CRAMER'S RULE. If $\det A \neq 0$, the system

$$\sum_{j=1}^{m} a_{ij}x_j = c_i \qquad (i = 1, \ldots, m)$$

has a unique solution

$$x_i = \frac{\det A^{(i)}}{\det A} \qquad (i = 1, \ldots, m),$$

where $A^{(i)}$ is the matrix obtained from $A$ by replacing the $i$th column by $c_1, \ldots, c_m$ (Refs. 2, 9).

7. EQUIVALENCE

Let $f$ be a linear transformation of $V$ into $V'$. It has been seen (Sect. 4) that the matrix $A$ for $f$ can be put in echelon form by a suitable change of basis in $V$. If $V'$ is not the same space as $V$, one can further simplify $A$ by independently changing the basis for $V'$. This effects elementary transformations on the columns of $A$; by successive subtractions of multiples of earlier columns from later ones, followed possibly by a renumbering of the basis, $A$ can be reduced to the form

$$J_r = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix},$$

where $I_r$ is the $r$ by $r$ identity matrix (Sect. 5) and the 0's stand for rows and columns consisting wholly of zeros; $J_r$ is a rectangular $n$ by $m$ matrix, just as was the given matrix $A$. For the matrix in echelon form in the example of Sect. 4 the matrix $J_r$ would be

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

The effect of a change of basis in $V$ is to replace $A$ by $PA$, where $P$ is a nonsingular $n$ by $n$ matrix; the effect of a change of basis in $V'$ is to replace $A$ by $AQ$, where $Q$ is a nonsingular $m$ by $m$ matrix. The matrix $B$ is said to be equivalent to matrix $A$ if $B = PAQ$ for some nonsingular $P$ and $Q$. This is a proper equivalence relation (Chap. 1, Sect. 5). The reasoning given above then gives the conclusion: Every $A$ is equivalent to a unique matrix of the form $J_r$. In other words, the matrices $J_r$ (for various $r$, $m$, and $n$) are a set of canonical forms under equivalence (Ref. 9).

8. SIMILARITY

One now considers the possible matrices $A$ representing a linear transformation $f$ of the vector space $V$ into itself. The field $F$ of scalars will be assumed to be the complex number system. Since $V' = V$, one can no longer change bases in $V$ and $V'$ independently. Indeed, let $\alpha'_i = \sum_j p_{ij}\alpha_j$ be equations defining a new basis $\alpha'_1, \ldots, \alpha'_n$ in $V$.
Then $P = (p_{ij})$ is a nonsingular matrix with inverse $P^{-1} = (q_{ij})$, and $\alpha_k = \sum_h q_{kh}\alpha'_h$. Let $f$ have the matrix $A = (a_{jk})$ relative to the basis $\alpha_1, \ldots, \alpha_n$, so that $f(\alpha_j) = \sum_k a_{jk}\alpha_k$. Then

$$f(\alpha'_i) = \sum_h \sum_j \sum_k p_{ij}a_{jk}q_{kh}\alpha'_h,$$

and $f$ has the matrix $PAP^{-1}$ relative to the basis $\alpha'_1, \ldots, \alpha'_n$.

The square matrix $B$ is said to be similar to square matrix $A$ if $B = PAP^{-1}$ for some nonsingular matrix $P$. Hence change of basis in $V$ replaces the matrix of $f$ by a similar matrix. Similarity is an equivalence relation (Chap. 1, Sect. 5) in the class of square matrices (Ref. 9).

If $A$ can be reduced to a similar matrix of sufficiently simple form, most of the important properties of $A$ can be read off. The ideal situation is that in which $A$ is similar to a diagonal matrix; that is, a matrix $(a_{ij})$ in which $a_{ij} = 0$ for $i \neq j$. Unfortunately, not every $A$ is similar to a diagonal matrix, and the various canonical forms are approximations to the diagonal form.

If $A$ is similar to

$$B = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix},$$

then in terms of the new basis $\alpha_1, \ldots, \alpha_n$ associated with $B$ one has

$$f(\alpha_1) = \alpha_1 B = \lambda_1\alpha_1, \quad \ldots, \quad f(\alpha_n) = \alpha_n B = \lambda_n\alpha_n.$$

In general, if a vector $\alpha \neq 0$ is such that $\alpha A = \lambda\alpha$ for some scalar $\lambda$, then $\lambda$ is called an eigenvalue (characteristic value, latent root) of $A$, and $\alpha$ is called an eigenvector belonging to $\lambda$. The characteristic polynomial for $A$ is the polynomial $\varphi(x) = \det(xI - A)$; this is a polynomial

$$\varphi(x) = c_0 + c_1 x + \cdots + c_n x^n$$

of degree $n$, and its $n$ roots (real or complex) are the eigenvalues of $A$. In particular,

$$(-1)^n c_0 = \det A = \lambda_1 \cdot \lambda_2 \cdots \lambda_n, \qquad -c_{n-1} = a_{11} + \cdots + a_{nn} = \lambda_1 + \cdots + \lambda_n = \text{trace of } A,$$

and $c_n = 1$. The HAMILTON-CAYLEY THEOREM (Ref. 9) states that $A$ satisfies its characteristic equation:

$$\varphi(A) = c_0 I + c_1 A + \cdots + c_n A^n = 0.$$

If the roots of $\varphi(x)$ are distinct, then $A$ is similar to $B = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. In fact, let $\varphi(x)$ be a factor of the $k$th power of a polynomial $\psi(x)$, whose roots are the distinct numbers $\lambda_1, \ldots$
, $\lambda_p$; if $\psi(A) = 0$, then $A$ is similar to a diagonal matrix; if $\psi(A) \neq 0$, then $A$ is not similar to a diagonal matrix.

In the general case of repeated roots, the matrix $A$ is similar to a matrix $B$ in Jordan normal form; that is, a matrix (in partitioned form, see Ref. 9, Sect. 2.8)

$$B = \begin{pmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_s \end{pmatrix},$$

where the $B_i$ are square matrices of form

$$B_i = \begin{pmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_i & 1 \\ 0 & 0 & \cdots & 0 & \lambda_i \end{pmatrix}$$

and $\lambda_1, \ldots, \lambda_s$ are not necessarily distinct. In the matrix $B$ each characteristic root $\lambda$ appears on the diagonal a number of times equal to its multiplicity.

An alternative rational canonical form for matrix $A$ has the form $B = \operatorname{diag}(B_1, \ldots, B_p)$, where $B_i$ has form

$$B_i = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ c_{i1} & c_{i2} & c_{i3} & \cdots & c_{in_i} \end{pmatrix}.$$

If $A$ has rational (real) entries, the $B_i$ can be chosen so that the $c_{ij}$ are rational (real).

If $A$ is a real matrix, the eigenvalues $\lambda$ need not be real but, since $\varphi(x)$ has real coefficients, they will occur in conjugate complex pairs. In this connection it is useful to note that the matrices

$$\begin{pmatrix} re^{i\theta} & 0 \\ 0 & re^{-i\theta} \end{pmatrix}, \qquad \begin{pmatrix} r\cos\theta & -r\sin\theta \\ r\sin\theta & r\cos\theta \end{pmatrix}$$

are similar.

When $A$ is of small degree or is otherwise especially simple, its eigenvalues and eigenvectors can be found by explicit calculation from the definitions given above; often they can be found from the physical interpretation of the problem. Determination of eigenvalues is a problem in solving an algebraic equation (Chap. 2), but other methods are available (Ref. 4). If $\lambda$ is an eigenvalue having absolute value greater than that of all other eigenvalues and $\alpha$ is any reasonable approximation to an eigenvector belonging to $\lambda$ ($\alpha$ must not lie in the subspace spanned by the eigenvectors of the remaining eigenvalues), then the sequence $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_n, \ldots$, where $\alpha_{i+1} = \alpha_i A/c_i$ and $c_i$ is the first nonzero coefficient of $\alpha_i$, will converge to an eigenvector for $\lambda$.

9. ORTHOGONAL AND SYMMETRIC MATRICES

Let $V$ be a real vector space, with basis $\alpha_1, \ldots, \alpha_n$, so that each vector has coordinates $(a_1, \ldots, a_n)$. The inner product (Ref.
9) $(\alpha, \beta)$ of the vectors $\alpha = (a_1, \ldots, a_n)$, $\beta = (b_1, \ldots, b_n)$ is defined as the scalar

$$(\alpha, \beta) = a_1 b_1 + \cdots + a_n b_n.$$

The norm of $\alpha$ is the scalar $|\alpha| = (\alpha, \alpha)^{1/2}$. The angle $\theta$ between $\alpha$, $\beta$ is defined by the equation

$$(\alpha, \beta) = |\alpha||\beta|\cos\theta.$$

These definitions are relative to the given basis but are unaffected if a new orthonormal basis is introduced; that is, a basis $\alpha'_1, \ldots, \alpha'_n$ such that $(\alpha'_i, \alpha'_j) = \delta_{ij} = 1$ or $0$ according as $i = j$ or $i \neq j$. If

$$\alpha'_i = \sum_j a_{ij}\alpha_j,$$

then the matrix $A = (a_{ij})$ has as its inverse the transposed matrix $A^T = (b_{ij})$, where $b_{ij} = a_{ji}$; that is, $AA^T = I$. A matrix with this property is called orthogonal. Since $\det A = \det A^T$, and $\det A \cdot \det A^T = \det I = 1$, one concludes that $\det A = \pm 1$. When $\det A = 1$, $A$ is called proper orthogonal and is a product of rotations; if $\det A = -1$, $A$ is a product of rotations and one reflection, so that orientation is reversed. The eigenvalues of an orthogonal matrix all have absolute value equal to 1.

A real matrix $A = (a_{ij})$ is termed symmetric if $A = A^T$; if, further, the quadratic form $\sum_{i,j} a_{ij}x_i x_j$ is $> 0$ except when $x_1 = \cdots = x_n = 0$, then $A$ and the quadratic form are said to be positive definite. The $x$'s can be interpreted as coordinates of a vector $\alpha$ with respect to a given basis; then $\sum_{i,j} a_{ij}x_i x_j = (\alpha A, \alpha)$. If a new basis is chosen (not necessarily orthonormal), the form is replaced by a new quadratic form. When $A$ is positive definite, the new basis can be chosen so that $(\alpha A, \alpha)$ has the form $\sum_i x_i^2$; this is equivalent to the statement that $A$ can be written as $PP^T$, where $P$ is nonsingular. If $A$ is symmetric, but not necessarily positive definite, the new basis can be chosen so that $(\alpha A, \alpha)$ has the form

$$y_1^2 + \cdots + y_r^2 - y_{r+1}^2 - \cdots - y_s^2,$$

where the numbers $r$, $s$ are uniquely determined by $A$. This is equivalent to the statement that there exists a nonsingular matrix $P$ such that $PAP^T = B$, where $B = (b_{ij})$, $b_{ij} = 0$ for $i \neq j$, $b_{ij} = 0$ or $\pm 1$ for $i = j$. (One terms $B$ congruent to $A$.)
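The statement that a positive definite matrix $A$ can be written as $PP^T$ with $P$ nonsingular can be illustrated numerically. The sketch below uses the Cholesky factorization, which is not described in the text but is one standard way of producing such a $P$ (lower triangular with positive diagonal, hence nonsingular). The function name `cholesky_factor` is hypothetical, and the input is assumed to be a real symmetric matrix given as a list of rows.

```python
import math


def cholesky_factor(a):
    """Return lower-triangular P with A = P P^T, or raise if A is not positive definite."""
    n = len(a)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Part of entry (i, j) of P P^T already accounted for by earlier columns.
            s = sum(p[i][k] * p[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                p[i][j] = math.sqrt(d)
            else:
                p[i][j] = (a[i][j] - s) / p[j][j]
    return p


A = [[4.0, 2.0], [2.0, 3.0]]
P = cholesky_factor(A)
# Rebuild P P^T to compare with A.
PPt = [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in P] for row_i in P]
```

For a symmetric matrix that is not positive definite, such as one with rows $(1, 2)$ and $(2, 1)$, the factorization breaks down at a non-positive diagonal term and the sketch raises `ValueError`.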
The eigenvalues of a symmetric matrix $A$ are all real, and $A$ is similar to a real diagonal matrix $C$; indeed $C = PAP^{-1}$, where $P$ may be chosen to be orthogonal (Ref. 9).

An analogous theory holds for complex vector spaces. The inner product is defined as

$$(\alpha, \beta) = a_1\bar{b}_1 + \cdots + a_n\bar{b}_n,$$

so that $(\alpha, \alpha) = \sum_i |a_i|^2 > 0$ for $\alpha \neq 0$; the norm of $\alpha$ is defined to be $(\alpha, \alpha)^{1/2}$. Orthogonal matrices are replaced by unitary matrices, defined by the condition $A\bar{A}^T = I$, where the bar denotes replacement of each entry by its conjugate. Symmetric matrices are replaced by Hermitean matrices, defined by the condition $A = \bar{A}^T$.

10. SYSTEMS OF LINEAR INEQUALITIES

Let $V$ be a real vector space with fixed basis $\{\alpha_1, \ldots, \alpha_n\}$ as in Sect. 9, so that each vector $\alpha$ has coordinates $(a_1, \ldots, a_n)$. If the vector $\beta$ has coordinates $(b_1, \ldots, b_n)$ then one writes

$\beta > \alpha$ or $\alpha < \beta$ if $a_i < b_i$ $(i = 1, \ldots, n)$,
$\beta \geqq \alpha$ or $\alpha \leqq \beta$ if $a_i \leq b_i$ $(i = 1, \ldots, n)$,
$\beta \geq \alpha$ or $\alpha \leq \beta$ if $\alpha \leqq \beta$ but $\alpha \neq \beta$.

The relation $\leqq$ is a partial order; the relations $<$ and $\leq$ are antisymmetric and transitive but not reflexive (Chap. 1, Sects. 4 and 7). A vector $\alpha$ is said to be non-negative if $\alpha \geqq 0$, positive if $\alpha \geq 0$, strictly positive if $\alpha > 0$. The set $Q$ of all non-negative vectors is called the positive orthant in $V$. A positive vector $\alpha$ such that $a_1 + \cdots + a_n = 1$ is called a probability vector.

For fixed $\alpha$ and real number $k$, the set of all vectors $\xi = (x_1, \ldots, x_n)$ such that

$$(\alpha, \xi) + k = a_1 x_1 + \cdots + a_n x_n + k \geq 0$$

is a closed set (Chap. 1, Sect. 8) called a half-space $\mathcal{H}$. For example, the solutions of $2x_1 + 3x_2 - 6 \geq 0$ form the half-space (half-plane) in two-dimensional space, as shaded in Fig. 1.
Similarly, the solutions of $(\alpha, \xi) + k > 0$ constitute an open half-space $\mathcal{H}^\circ$. The solutions of $(\alpha, \xi) + k = 0$ constitute a hyperplane $\Pi$ which is the boundary of both $\mathcal{H}$ and $\mathcal{H}^\circ$.

FIG. 1. Half-space in two dimensions.

By a system of linear inequalities is meant a set of relations

$$(\alpha_\kappa, \xi) + k_\kappa \; R_\kappa \; 0,$$

where the index $\kappa$ ranges over a given set (possibly infinite), and for each $\kappa$, $R_\kappa$ is one of the relations $>$, $\geq$, $=$. For example,

$$x_1 + 5x_2 > 0, \qquad x_1 + x_3 = 0$$

is a system of linear inequalities. By a solution of the system of inequalities is meant a vector $\xi = (x_1, \ldots, x_n)$ which satisfies all the inequalities.

With each inequality is associated a half-space (or hyperplane) $\mathcal{H}_\kappa$. The set of all solutions of the system is the intersection of all $\mathcal{H}_\kappa$.

Convexity. The vector $\alpha$ is said to be a convex combination of vectors $\alpha_1, \ldots, \alpha_m$ if

$$\alpha = p_1\alpha_1 + \cdots + p_m\alpha_m, \qquad p_1 + \cdots + p_m = 1, \qquad p_i \geq 0 \quad (i = 1, \ldots, m).$$

A nonempty set $K$ in $V$ is said to be convex if it contains all convex combinations of its vectors. If $K$ is interpreted as a point set in $n$-dimensional space, $K$ is convex if and only if, for each pair of points $\alpha_1$, $\alpha_2$ in $K$, the line segment joining $\alpha_1$ to $\alpha_2$ lies in $K$. (See Fig. 2.)

FIG. 2. Convex set.

A half-space $\mathcal{H}$ is said to be a support for a convex set $K$ in $V$ if $K$ is a subset of $\mathcal{H}$. If, moreover, $\Pi$ contains $n - 1$ independent vectors of $K$, $\mathcal{H}$ is called an extreme support for $K$.

Let $T$ be a subset of $V$. The set of all convex combinations of vectors in $T$ is a convex set, called the convex closure of $T$. A convex set $K$ is said to be finitely generated if it is the convex closure of a finite subset of $K$.

A set $T$ is said to be bounded if, for some constant $M$, $|a_1| + \cdots + |a_n| \leq M$ for all $\alpha$ in $T$.

DOUBLE DESCRIPTION THEOREM. If $K$ is a finitely generated convex set in $V$, then $K$ is the intersection of a finite number of supports; moreover, if $K$ spans $V$, then $K$ is the intersection of its extreme supports.
Conversely, if the intersection of a finite collection of half-spaces is nonempty and bounded, then it is finitely generated.

This theorem, when formulated in algebraic terms, is known as Farkas' Lemma:

FARKAS' LEMMA (strong nonhomogeneous form). Let $V$ be $n$-dimensional real vector space, let $V'$ be $m$-dimensional real vector space; fixed bases are assumed chosen in each. Let $A$ be an $n$ by $m$ matrix, let $\delta$ be a vector in $V'$, let $k$ be a scalar, and suppose that there is at least one vector $\varphi$ in $V$ such that $\varphi A \geqq \delta$. Then a vector $\alpha$ in $V$ will satisfy the condition $(\alpha, \varphi) \geq k$ for all $\varphi$ for which $\varphi A \geqq \delta$ if and only if there exists a vector $\eta \geqq 0$ in $V'$ such that $\alpha = \eta A^T$ and $(\eta, \delta) \geq k$.

FARKAS' LEMMA (weaker homogeneous form, $k = 0$, $\delta = 0$). Let $A$ be an $n$ by $m$ matrix. Then a vector $\alpha$ in $V$ will satisfy the condition $(\alpha, \varphi) \geq 0$ for all $\varphi$ for which $\varphi A \geqq 0$ if and only if there exists a vector $\eta \geqq 0$ in $V'$ such that $\alpha = \eta A^T$.

Farkas' Lemma can be used as a foundation for the Minimax Theorem in game theory and for the Duality Theorem in linear programming. These theorems can also be deduced from the following one:

THEOREM. Let $A = (a_{ij})$ and $B$ be $n$ by $m$ matrices with $a_{ij} > 0$ for all $i$, $j$. Then there exist probability vectors $\xi$ in $V$ and $\eta$ in $V'$ and a unique scalar $k$ such that

$$(kA - B)\eta \geqq 0 \quad \text{and} \quad \xi(kA - B) \leqq 0;$$

in the first inequality $\eta$ is regarded as an $m$ by 1 matrix.

For all of Sect. 10, see Refs. 5, 6, 9.

REFERENCES

1. A. C. Aitken, Determinants and Matrices, Interscience, New York, 1954.
2. Garrett Birkhoff and Saunders MacLane, A Survey of Modern Algebra (Revised edition), Macmillan, New York, 1953.
3. M. Bocher, Introduction to Higher Algebra, Macmillan, New York, 1930.
4. R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge University Press, Cambridge, England, 1938.
5. T. C. Koopmans (Editor), Activity Analysis of Production and Allocation (Cowles Commission Monograph No. 13), Wiley, New York, 1951.
6. H. W.
Kuhn and A. W. Tucker (Editors), Contributions to the Theory of Games, Vols. I, II, III (Annals of Mathematics Studies Nos. 24, 28, 38), Princeton University Press, Princeton, N. J., 1950, 1953, 1956.
7. C. C. MacDuffee, The Theory of Matrices, Chelsea, New York, 1946.
8. C. C. MacDuffee, Vectors and Matrices, Mathematical Association of America, Buffalo, N. Y., 1943.
9. R. M. Thrall and L. Tornheim, Vector Spaces and Matrices, Wiley, New York, 1957.
10. J. H. M. Wedderburn, Lectures on Matrices, American Mathematical Society, New York, 1934.

A GENERAL MATHEMATICS

Chapter 4

Finite Difference Equations

G. E. Hay

1. Definitions
2. Linear Difference Equations
3. Homogeneous Linear Equations with Constant Coefficients
4. Nonhomogeneous Linear Equations with Constant Coefficients
5. Linear Equations with Variable Coefficients
References

1. DEFINITIONS

By a difference equation is meant an equation relating the values of an unspecified function $f$ at $x, x + h, x + 2h, \ldots, x + nh$, where $h$ is fixed. For example,

$$f(x + 3) - f(x + 2) - xf(x + 1) - 2f(x) = x^2 \tag{1}$$

is a difference equation, in which $h = 1$, $n = 3$.

The variable $x$ will generally be assumed to vary over the discrete set of real values $x_0 + ph$ $(p = 0, \pm 1, \pm 2, \ldots)$, where $x_0$ is a constant. By proper choice of origin $x_0$ and scale one can make $x_0 = 0$, $h = 1$, so that $x$ varies over the integers $0, \pm 1, \pm 2, \ldots$. In the subsequent discussion, this simplification will be assumed made, so that $x$ varies over the integers and the difference equation thus relates the values of $f$ at $x, x + 1, \ldots, x + n$. For the case when $x$ varies continuously, see the Remarks at the end of Sect. 3.

The values of $f$ are assumed to be real, although much of the theory extends to the case in which $f$ has complex values.

A general difference equation is constructed from a function $\psi(x, y_0, y_1, \ldots, y_n)$ of the integer variable $x$ and the $n + 1$ real variables $y_0, \ldots, y_n$.
The difference equation is the equation

$$\psi(x, f(x), f(x + 1), \ldots, f(x + n)) = 0. \tag{2}$$

By a solution of the difference equation is meant a function $f$ which satisfies it identically. When the equation takes the form

$$f(x + n) = \phi(x, f(x), \ldots, f(x + n - 1)) \tag{3}$$

and $\phi(x, y_0, \ldots, y_{n-1})$ is defined for all values of $x, y_0, \ldots, y_{n-1}$, eq. (3) is simply a recursion formula. If $f(0), f(1), \ldots, f(n - 1)$ are given arbitrary values, then eq. (3) determines successively $f(n), f(n + 1), \ldots$; thus there is a unique solution for $x \geq 0$ with the given initial values $f(0), f(1), \ldots, f(n - 1)$.

The first difference of $f(x)$ is $\Delta f = f(x + 1) - f(x)$; the second difference is $\Delta^2 f = \Delta(\Delta f) = f(x + 2) - 2f(x + 1) + f(x)$; the $k$th difference is $\Delta^k f$. A difference eq. (2) can be written in terms of $f$ and its differences. For example, eq. (1) is equivalent to the equation

$$\Delta^3 f + 2\Delta^2 f + (1 - x)\Delta f - (x + 2)f = x^2. \tag{1'}$$

Conversely, an equation relating $f, \Delta f, \ldots, \Delta^n f$ can be written in form (2). Thus, eq. (2) is the general form for difference equations, and this form will be used throughout this section, in preference to an equation relating the differences of $f$.

The order of the difference eq. (2) is defined as the distance between the most widely separated $x$-values at which the values of $f$ are related. If $\psi$ definitely depends on $f(x)$ and $f(x + n)$, then the order is $n$. However, the order may be less than $n$. For example, the equation

$$f(x + 4) - 2f(x + 3) - f(x + 1) = 0 \tag{4}$$

has order 3, since the most widely separated values are $x + 1$ and $x + 4$. The substitution $g(x) = f(x + 1)$ reduces this to an equation relating $g(x)$, $g(x + 2)$, $g(x + 3)$.

OPERATOR NOTATION. If $y$ is a function of $x$, one writes

$$E^k y = y(x + k) \qquad (k = 0, 1, 2, \ldots). \tag{5}$$

Thus $E^0 y = y(x)$, $E^1 y = Ey = y(x + 1)$. The difference eq. (2) can thus be written

$$\psi(x, y, Ey, \ldots, E^n y) = 0. \tag{2'}$$

2. LINEAR DIFFERENCE EQUATIONS

By a linear difference equation is meant an equation of form (6) $a_n f(x + n) + a_{n-1} f(x + n - 1) + \cdots$
$+ a_1 f(x + 1) + a_0 f(x) = v(x)$, where $a_0, \ldots, a_n, v(x)$ are given functions of the integer variable $x$. In terms of the operator $E$ of Sect. 1, the equation can be written:

$$a_n E^n y + a_{n-1}E^{n-1}y + \cdots + a_1 Ey + a_0 y = v(x), \tag{6'}$$

where $y = f(x)$. It can be written more concisely as follows:

$$\psi(E)y = v(x), \tag{7}$$

where $\psi(E)$ is a linear difference operator:

$$\psi(E) = a_n E^n + \cdots + a_1 E + a_0. \tag{8}$$

If $v(x) \equiv 0$, eq. (6) is termed homogeneous; otherwise it is nonhomogeneous. In case $a_0 \neq 0$, $a_n \neq 0$, the equation is of order $n$. In case $a_0 \equiv a_1 \equiv \cdots \equiv a_{m-1} \equiv 0$, but $a_m a_n \neq 0$, then it is of order $n - m$; the substitution $g(x) = f(x + m)$ then reduces eq. (6) to a linear equation for $g$ of form (6), with nonvanishing first and last coefficients.

Linear Independence. Let $y_1(x), \ldots, y_p(x)$ be functions of $x$ defined for $a < x < b$. The functions are said to be linearly independent if a relation

$$b_1 y_1(x) + \cdots + b_p y_p(x) \equiv 0$$

with constant $b_1, \ldots, b_p$, can hold only if $b_1 = b_2 = \cdots = b_p = 0$. Otherwise, the functions are said to be linearly dependent.

General Solution. Let the difference eq. (6) be given, with $v(x) \equiv 0$ and $a_0(x)a_n(x) \neq 0$ for $a < x < b$; all coefficients are assumed defined for $a < x < b$. Then the equation has order $n$, there are $n$ linearly independent solutions $y_1(x), \ldots, y_n(x)$ for $a < x < b$ and

$$y = c_1 y_1(x) + \cdots + c_n y_n(x), \qquad a < x < b, \tag{9}$$

where $c_1, \ldots, c_n$ are arbitrary constants, is the general solution; that is, all solutions are given by eq. (9).

If $v(x) \not\equiv 0$, but the other hypotheses hold, then the general solution has form

$$y = c_1 y_1(x) + \cdots + c_n y_n(x) + V(x), \tag{10}$$

where $V(x)$ is a solution of the nonhomogeneous equation and $c_1 y_1(x) + \cdots + c_n y_n(x)$ is the general solution of the related homogeneous equation, that is, the homogeneous equation obtained by replacing $v(x)$ by 0.

EXAMPLE.
The functions $y_1 \equiv 1$, $y_2 \equiv x$ are linearly independent solutions of the equation $(E^2 - 2E + 1)y = 0$, so that $y = c_1 + c_2 x$ is the general solution; $y = 2^{x-1}$ is a solution of the equation $(E^2 - 2E + 1)y = 2^{x-1}$, so that $y = c_1 + c_2 x + 2^{x-1}$ is the general solution.

3. HOMOGENEOUS LINEAR EQUATIONS WITH CONSTANT COEFFICIENTS

The equations considered have form

$$\psi(E)y = 0, \tag{11}$$

where

$$\psi(E) = a_n E^n + \cdots + a_1 E + a_0, \tag{12}$$

the coefficients $a_n, \ldots, a_0$ are constants, and $a_0 a_n \neq 0$. Associated with eq. (12) is the characteristic polynomial

$$\psi(\lambda) = a_n\lambda^n + \cdots + a_1\lambda + a_0$$

in the complex variable $\lambda$. The equation

$$\psi(\lambda) = 0 \tag{13}$$

is an algebraic equation of degree $n$, called the characteristic equation or auxiliary equation associated with eq. (11). The characteristic equation has $n$ roots $\lambda_1, \ldots, \lambda_n$ called characteristic roots. (See Chap. 2.) These may be real or complex; since the coefficients are assumed real, the complex roots come in conjugate pairs.

From the set of characteristic roots one obtains a set of $n$ solutions of the difference eq. (11) by the following rules:

I. To each simple real root $\lambda$ one assigns the function $\lambda^x$;
II. To each real root $\lambda$ of multiplicity $k$ one assigns the $k$ functions $\lambda^x, x\lambda^x, \ldots, x^{k-1}\lambda^x$;
III. To each pair of simple complex roots $\alpha \pm \beta i = \rho(\cos\varphi \pm i\sin\varphi)$ one assigns the functions $\rho^x\cos\varphi x$, $\rho^x\sin\varphi x$;
IV. To each pair of complex roots $\alpha \pm \beta i = \rho(\cos\varphi \pm i\sin\varphi)$ of multiplicity $k$ one assigns the $2k$ functions $\rho^x\cos\varphi x, x\rho^x\cos\varphi x, \ldots, x^{k-1}\rho^x\cos\varphi x$, $\rho^x\sin\varphi x, x\rho^x\sin\varphi x, \ldots, x^{k-1}\rho^x\sin\varphi x$.

In all one obtains $n$ functions $y_1(x), \ldots, y_n(x)$ which are linearly independent solutions of eq. (11) for all $x$, so that

$$y = c_1 y_1(x) + \cdots + c_n y_n(x)$$

is the general solution.

EXAMPLE. $(E^4 - 8E^3 + 25E^2 - 36E + 20)y = 0$. The characteristic equation is

$$\lambda^4 - 8\lambda^3 + 25\lambda^2 - 36\lambda + 20 = 0.$$

The roots are $2, 2, 2 \pm i$.
Hence the general solution is

$$y = 2^x(c_1 + c_2 x) + 5^{x/2}(c_3\cos\varphi x + c_4\sin\varphi x),$$

where $\varphi = \arctan\tfrac{1}{2}$.

REMARKS. The variable $x$ has heretofore assumed only integral values. If $x$ is allowed to take on all real values, then the difference equation becomes a functional equation. The methods of this section are still applicable and provide the general solution of eq. (11) subject only to the following two modifications: (a) the arbitrary constants $c_1, c_2, \ldots$ may be replaced by arbitrary periodic functions of $x$, of period 1; (b) if $\lambda$ is a negative characteristic root of multiplicity $k$, the corresponding solutions become

$$(-\lambda)^x\cos\pi x, \quad x(-\lambda)^x\cos\pi x, \quad \ldots, \quad x^{k-1}(-\lambda)^x\cos\pi x.$$

4. NONHOMOGENEOUS LINEAR EQUATIONS WITH CONSTANT COEFFICIENTS

The equation considered is

$$\psi(E)y = v(x), \tag{14}$$

where $\psi(E)$ satisfies the same conditions as in Sect. 3. By the rule stated at the end of Sect. 2, the general solution of eq. (14) has the form

$$y = c_1 y_1(x) + \cdots + c_n y_n(x) + V(x), \tag{15}$$

where $V(x)$ is a particular solution and the other terms are the general solution of the related homogeneous equation $\psi(E)y = 0$.

The procedures for finding the particular solution $V(x)$ can be described concisely by means of an operational calculus which parallels that used for differential equations (Chap. 8). The operators $\psi(E)$ with constant coefficients can be added, subtracted, multiplied, and multiplied by constants just as polynomials. The operators can be converted into operators $\chi(\Delta)$ by the relation

$$\Delta = E - 1. \tag{16}$$

For example, $E^2 - 2E + 1 = (E - 1)^2 = \Delta^2$. The powers $\Delta, \Delta^2, \ldots, \Delta^k$ are the first, second, $\ldots$, $k$th difference, as defined in Sect. 1. If $y = V(x)$ is a solution of eq. (14), one writes

$$V(x) = \frac{1}{\psi(E)}v(x) = [\psi(E)]^{-1}v(x). \tag{17}$$

Thus the inverse operator $[\psi(E)]^{-1}$, when applied to $v(x)$, yields one solution of the eq. (14). The rules for finding particular solutions can now be summarized in a table, which evaluates $[\psi(E)]^{-1}v$ for various choices of $\psi$ and $v$. This is carried out in Table 1. The last column gives one choice $V(x)$ of $[\psi(E)]^{-1}v$; the general solution is given by eq. (15).

TABLE 1. RULES FOR PARTICULAR SOLUTIONS

Each rule lists, in order: the operator $\psi(E)$; the right-hand side $v(x)$; the particular solution $[\psi(E)]^{-1}v$.

1. $\psi(E)$; $c_1 v_1(x) + c_2 v_2(x)$, with $c_1$, $c_2$ constants; $c_1[\psi(E)]^{-1}v_1 + c_2[\psi(E)]^{-1}v_2$.
2. $\psi_1(E)\psi_2(E)$; $v(x)$; $\dfrac{1}{\psi_1(E)}\left(\dfrac{1}{\psi_2(E)}v\right)$.
3. $\Delta = E - 1$; $v(x)$, $x = 0, \pm 1, \ldots$; $\Delta^{-1}v = \sum_{k=0}^{x-1} v(k)$.
4. $\Delta^2 = E^2 - 2E + 1$; $v(x)$, $x = 0, \pm 1, \ldots$; $\Delta^{-2}v = \sum_{k=0}^{x-1}\sum_{s=0}^{k-1} v(s)$.
5. $\Delta$; $\dbinom{x}{n}$; $\Delta^{-1}\dbinom{x}{n} = \dbinom{x}{n+1}$.
6. $\psi(E)$; $a^x u(x)$; $a^x\dfrac{1}{\psi(aE)}u(x)$.
7. $\psi(E)$; $a^x$, $\psi(a) \neq 0$; $\dfrac{a^x}{\psi(a)}$.
8. $(E - a)^k\phi(E)$; $a^x$, $\phi(a) \neq 0$; $\dfrac{a^{x-k}\,x(x-1)\cdots(x-k+1)}{\phi(a)\,k!}$.
9. $\psi(E)$; $a^x u(x)$, $\psi(a) \neq 0$, $u$ a polynomial of degree $s$; $a^x\left[p(a) + \dfrac{a}{1!}p'(a)\Delta + \cdots + \dfrac{a^s}{s!}p^{(s)}(a)\Delta^s\right]u(x)$, where $p(\lambda) = 1/\psi(\lambda)$.
10. $(E - a)^k\phi(E)$; $a^x u(x)$, $\phi(a) \neq 0$, $u(x)$ a polynomial of degree $s$; $a^{x-k}\left[q(a)\Delta^{-k} + \dfrac{a}{1!}q'(a)\Delta^{1-k} + \cdots + \dfrac{a^s}{s!}q^{(s)}(a)\Delta^{s-k}\right]u(x)$, where $q(\lambda) = 1/\phi(\lambda)$.
11. $E - a$; $v(x)$; $a^{x-1}\Delta^{-1}(a^{-x}v)$.
12. $(E - a)^k$; $v(x)$; $a^{x-k}\Delta^{-k}(a^{-x}v)$.
13. $(E - \alpha)^2 + \beta^2$, with $\alpha + i\beta = \rho(\cos\varphi + i\sin\varphi)$; $v(x)$; $\dfrac{\rho^{x-1}}{\beta}\left[\sin(\varphi x - \varphi)\cdot\Delta^{-1}(\rho^{-x}\cos\varphi x\; v) - \cos(\varphi x - \varphi)\cdot\Delta^{-1}(\rho^{-x}\sin\varphi x\; v)\right]$.

The binomial coefficient $\binom{x}{n}$ of Rule 5, Table 1, is defined for $n = 1, 2, \ldots$. When $n = 0$, it is defined to equal 1, and Rule 5 remains valid. Corresponding to this inverse rule is the direct rule:

$$\Delta^k\binom{x}{n} = \binom{x}{n-k} \quad (k \leq n), \qquad \Delta^k\binom{x}{n} = 0 \quad (k > n). \tag{18}$$

A general power of $x$ can be expanded in terms of these coefficients:

$$x^n = \sum_{i=1}^{n} T_{ni}\, i!\binom{x}{i} \qquad (n = 1, 2, \ldots), \tag{19}$$

where the $T_{ni}$ are Stirling numbers of the second kind. They are tabulated on page 170 of Ref. 2. If the polynomial $u(x)$ is expanded in terms of the coefficients by eq. (19) and Rule 5 or eq. (18) is applied, then Rules 9, 10 are easier to use.

A general expression $1/\psi(E)$ can be regarded as a rational function of $E$ and expanded in partial fractions, just as if $E$ were a numerical variable. Rules 11, 12, 13 then permit evaluation of the terms. For example,

$$\frac{1}{E^2 - 3E + 2}v(x) = \frac{1}{(E - 1)(E - 2)}v(x) = \left(\frac{1}{E - 2} - \frac{1}{E - 1}\right)v(x) = 2^{x-1}\Delta^{-1}(2^{-x}v) - \Delta^{-1}v.$$

Rule 12 is needed for multiple roots.
Rule 13 is needed for complex roots; it can be generalized to take care of repeated complex roots (Ref. 2).

5. LINEAR EQUATIONS WITH VARIABLE COEFFICIENTS

The general solution of the first order linear equation

$$[E - p(x)]y = v(x), \tag{20}$$

where $p(x) \neq 0$ for $x \geq a$, is

$$y = q(x)\left[c + \sum_{s=a}^{x-1}\frac{v(s)}{q(s + 1)}\right], \tag{21}$$

$$q(x) = p(a)p(a + 1)\cdots p(x - 1), \quad x > a; \qquad q(x) = 1, \quad x = a. \tag{22}$$

Laplace's Method. For equations

$$a_n(x)E^n y + \cdots + a_1(x)Ey + a_0(x)y = 0 \tag{23}$$

with polynomial coefficients, one seeks a solution

$$y(x) = \int_a^b t^{x-1}v(t)\,dt, \tag{24}$$

where $a$, $b$, and $v(t)$ are to be determined. Let $(x)_n$ denote $n!\binom{x}{n}$, so that $(x)_n = x(x - 1)\cdots(x - n + 1)$ for $n = 1, 2, \ldots$, and let $(x)_0 = 1$. It follows from eq. (19) that an arbitrary polynomial can be expressed as a linear combination of the polynomials $(x)_n$. Hence the coefficients $a_k(x)$ can be considered as linear combinations of the $(x)_n$. Now by integration by parts one obtains from eq. (24) the relation

$$(x + m - 1)_m E^n y = \left[\sum_{s=1}^{m}(-1)^{s+1}(x + m - 1)_{m-s}\,t^{x-1+s}D^{s-1}\{t^n v(t)\}\right]_a^b + (-1)^m\int_a^b t^{x+m-1}D^m\{t^n v(t)\}\,dt, \tag{25}$$

where $D^s = d^s/dt^s$. Hence the difference eq. (23) takes the form

$$[F(x, v, t)]_a^b + \int_a^b t^{x-1}G(v, t)\,dt = 0. \tag{26}$$

The function $v(t)$ is chosen so that $G(v, t) \equiv 0$. In fact, the equation $G(v, t) = 0$ is usually a homogeneous linear differential equation for $v(t)$. The constants $a$ and $b$ are then chosen so that $F(x, v, t)$ vanishes when $t = a$ and $t = b$, so that eq. (26) is satisfied. Once $a$, $b$ and $v(t)$ have been determined in this way, eq. (24) then yields $y(x)$. For further details see Ref. 2, Sect. 174.

REFERENCES

1. T. Fort, Finite Differences, Oxford University Press, Oxford, England, 1948.
2. C. Jordan, Calculus of Finite Differences, 2nd edition, Chelsea, New York, 1950.
3. N. E. Nörlund, Vorlesungen über Differenzenrechnung, Springer, Berlin, 1924.

A GENERAL MATHEMATICS

Chapter 5

Differential Equations

G. E. Hay and W. Kaplan

1. Basic Concepts
2. Equations of First Order and First Degree
3. Linear Differential Equations
4. Equations of First Order but not of First Degree
5. Special Methods for Equations of Higher than First Order
6. Solutions in Form of Power Series
7. Simultaneous Linear Differential Equations
8. Numerical Methods
9. Graphical Methods-Phase Plane Analysis
10. Partial Differential Equations
References

1. BASIC CONCEPTS

An ordinary differential equation is an equation of form

$$\psi(x, y, y', \ldots, y^{(n)}) = 0, \tag{1}$$

expressing a relationship between an unspecified function $y$ of $x$ and its derivatives $y' = dy/dx, \ldots, y^{(n)} = d^n y/dx^n$. An example is the following:

$$y' - xy = 0.$$

The order of the equation is $n$, which is the order of the highest derivative appearing. A solution is a function $y$ of $x$, $a < x < b$, which satisfies the equation identically. For many equations one can obtain a function

$$y = f(x, c_1, \ldots, c_n), \tag{2}$$

expressing $y$ in terms of $x$ and $n$ independent arbitrary constants $c_1, \ldots, c_n$ such that, for each choice of the constants, eq. (2) is a solution of eq. (1), and every solution of eq. (1) is included in eq. (2). When these conditions are satisfied, eq. (2) is called the general solution of eq. (1). A particular solution is the general solution with all of the $n$ arbitrary constants given particular values.

If eq. (1) is an algebraic equation in $y^{(n)}$ of degree $k$, then the differential eq. (1) is said to have degree $k$. For example, the equation

$$y'''^2 + y''y''' + y^4 = e^x \tag{3}$$

has order 3 and degree 2. When the degree is 1, the equation has the form

$$p(x, y, \ldots, y^{(n-1)})\,y^{(n)} + q(x, y, \ldots, y^{(n-1)}) = 0 \tag{4}$$

or, where $p \neq 0$, the equivalent form

$$y^{(n)} = F(x, y, \ldots, y^{(n-1)}), \qquad F = -q/p. \tag{5}$$

The EXISTENCE THEOREM asserts that, if in eq. (5) $F$ is continuous in an open region $R$ of the space of the variables $x, y, \ldots, y^{(n-1)}$, and $(x_0, y_0, \ldots, y_0^{(n-1)})$ is a point of $R$, then there exists a solution $y(x)$ of eq.
(5), $|x - x_0| < h$, such that

$$y = y_0, \quad y' = y'_0, \quad \ldots, \quad y^{(n-1)} = y_0^{(n-1)} \quad \text{for } x = x_0. \tag{6}$$

Thus there exists a solution satisfying initial conditions (6). If $F$ has continuous partial derivatives with respect to $y, y', \ldots, y^{(n-1)}$ in $R$, then the solution is unique.

2. EQUATIONS OF FIRST ORDER AND FIRST DEGREE

An equation of first order and first degree can be written in either of the equivalent forms

$$y' = F(x, y), \tag{7}$$

$$M(x, y)\,dx + N(x, y)\,dy = 0. \tag{8}$$

For equations of special form, explicit rules can be given for finding the general solution. Some of the most important types are listed here.

Equations with Variables Separable. If in eq. (8) $M$ depends only on $x$, $N$ only on $y$, then eq. (8) is said to have the variables separable. The equation may then be written with the $x$'s separated from the $y$'s, and the general solution may be obtained by integration.

EXAMPLE. $y' = 3x^2 y$. An equivalent separated form is $3x^2\,dx - y^{-1}\,dy = 0$. Hence

$$\int 3x^2\,dx - \int y^{-1}\,dy = c.$$

Integrating and solving for $y$ one finds $y = c_1 e^{x^3}$ as the general solution, where $c_1 = e^{-c}$.

Homogeneous Equations. A function $F(x, y)$ is said to be homogeneous of degree $n$ if $F(\lambda x, \lambda y) \equiv \lambda^n F(x, y)$. The differential eq. (7) is said to be homogeneous if $F(x, y)$ is homogeneous of degree 0. To solve such a differential equation write $y = vx$ and express the differential equation in terms of $v$ and $x$. The resulting differential equation has variables separable and can be solved as above. In general, $y' = F(x, y)$ becomes

$$xv' + v = F(x, vx) = x^0 F(1, v) = G(v), \qquad \frac{dx}{x} + \frac{dv}{v - G(v)} = 0.$$

Exact Equations. The differential eq. (8) is exact if for some function $u(x, y)$

$$\frac{\partial u}{\partial x} = M(x, y), \qquad \frac{\partial u}{\partial y} = N(x, y), \tag{9}$$

so that $du = M\,dx + N\,dy$. The equation is exact if and only if $\partial M/\partial y \equiv \partial N/\partial x$. The general solution is given (implicitly) by $u(x, y) = c$.

EXAMPLE. $(3x^2 - 2xy)\,dx + (2y - x^2)\,dy = 0$. Here $\partial M/\partial y = -2x = \partial N/\partial x$, so that the equation is exact.
Then

$\frac{\partial u}{\partial x} = 3x^2 - 2xy, \qquad \frac{\partial u}{\partial y} = 2y - x^2.$

From the first equation, $u = x^3 - x^2 y + g(y)$, where $g(y)$ is an arbitrary function of $y$. Substitution in the second equation yields the relation $-x^2 + g'(y) = 2y - x^2$, so that $g(y) = y^2 + c$. Hence the general solution is $x^3 - x^2 y + y^2 = c$.

Integrating Factors. If the eq. (8) is not exact, it may be possible to make it exact by multiplying by a function $\phi(x, y)$, called an integrating factor.

EXAMPLE. The equation $(3xy + 2y^2)\,dx + (x^2 + 2xy)\,dy = 0$ is not exact, but after multiplication by $x$ becomes the exact equation $(3x^2 y + 2xy^2)\,dx + (x^3 + 2x^2 y)\,dy = 0$. The general solution is $x^3 y + x^2 y^2 = c$. The integrating factor is $x$.

Linear Equations. A differential equation is linear if it is of the first degree in the dependent variable and its derivatives. If such an equation is also of the first order, it may be written in the form

(10)  $y' + p(x)y = q(x).$

Here $u = e^{\int p\,dx}$ is an integrating factor and the general solution is

(11)  $y = u^{-1}\left(\int q(x)\,u\,dx + c\right), \qquad u = e^{\int p\,dx}.$

EXAMPLE. $y' + x^{-1}y = 4x^2$. Here $u = x$ and eq. (11) gives $y = x^{-1}\left(\int 4x^3\,dx + c\right) = x^3 + c\,x^{-1}$.

3. LINEAR DIFFERENTIAL EQUATIONS

The linear differential equation of order $n$ can be written in the form

(12)  $a_0 D^n y + a_1 D^{n-1} y + \cdots + a_{n-1} Dy + a_n y = Q(x),$

where the coefficients $a_0, \ldots, a_n$ may depend on $x$, and $D^k y \equiv d^k y/dx^k$. When the $a_j$ are constant, eq. (12) is said to have constant coefficients. When $Q(x) \equiv 0$, the equation is said to be homogeneous. The homogeneous equation obtained from eq. (12) by replacing $Q(x)$ by 0 is called the related homogeneous equation. It will generally be assumed that $a_0 \neq 0$ throughout the interval of $x$ considered.

The general solution of eq. (12) is given by

(13)  $y = c_1 y_1(x) + \cdots + c_n y_n(x) + y^*(x),$

where $y^*(x)$ is one particular solution and $y_1(x), \ldots, y_n(x)$ are particular solutions of the related homogeneous equation which are linearly independent; that is, a relation $b_1 y_1(x) + b_2 y_2(x) + \cdots$
$+\, b_n y_n(x) \equiv 0$, with constant $b_1, \ldots, b_n$ can hold only if $b_1 = 0, \ldots, b_n = 0$. When $Q(x) \equiv 0$, one can choose $y^*(x)$ to be 0.

Homogeneous Linear Equations with Constant Coefficients. The equation has the form

(14)  $a_0 D^n y + a_1 D^{n-1} y + \cdots + a_n y = 0, \qquad a_0 \neq 0,$

where $a_0, \ldots, a_n$ are constants. Particular solutions are obtained by setting $y = e^{rx}$. Substitution in eq. (14) leads to the equation for $r$:

(15)  $a_0 r^n + a_1 r^{n-1} + \cdots + a_n = 0.$

This is called the auxiliary equation or characteristic equation. In general it has $n$ roots, real or complex, some of which may be coincident (Chap. 2). From these roots one obtains $n$ linearly independent solutions of the differential eq. (14) by the following rules:

I. To each real root $r$ of multiplicity $k$ one assigns the functions $e^{rx}, xe^{rx}, \ldots, x^{k-1}e^{rx}$.

II. To each pair of conjugate complex roots $\alpha \pm \beta i$ of multiplicity $k$ one assigns the $2k$ functions $e^{\alpha x}\cos\beta x$, $e^{\alpha x}\sin\beta x$, $xe^{\alpha x}\cos\beta x$, $xe^{\alpha x}\sin\beta x$, ..., $x^{k-1}e^{\alpha x}\cos\beta x$, $x^{k-1}e^{\alpha x}\sin\beta x$.

The $n$ functions $y_1(x), \ldots, y_n(x)$ thus obtained are linearly independent and $y = c_1 y_1(x) + \cdots + c_n y_n(x)$ is the general solution of eq. (14).

EXAMPLE 1. $D^2 y - 3Dy + 2y = 0$. The auxiliary equation is $r^2 - 3r + 2 = 0$, the roots are 1, 2; the general solution is $y = c_1 e^x + c_2 e^{2x}$.

EXAMPLE 2. $D^6 y - 9D^4 y + 24D^2 y - 16y = 0$. The auxiliary equation is $r^6 - 9r^4 + 24r^2 - 16 = 0$, the roots are $\pm 1, \pm 2, \pm 2$; the general solution is $y = c_1 e^x + c_2 e^{-x} + e^{2x}(c_3 + c_4 x) + e^{-2x}(c_5 + c_6 x)$.

EXAMPLE 3. $D^4 y + 4D^3 y + 12D^2 y + 16Dy + 16y = 0$. The auxiliary equation is $r^4 + 4r^3 + 12r^2 + 16r + 16 = 0$, the roots are $-1 \pm i\sqrt{3}$, $-1 \pm i\sqrt{3}$. The general solution is $y = e^{-x}[(c_1 + c_2 x)\cos\sqrt{3}\,x + (c_3 + c_4 x)\sin\sqrt{3}\,x]$.

Nonhomogeneous Linear Equations with Constant Coefficients. The equation considered is

(16)  $a_0 D^n y + a_1 D^{n-1} y + \cdots + a_n y = Q(x), \qquad a_0 \neq 0,$

where $a_0, \ldots, a_n$ are constants and $Q(x)$ is, for example, continuous for $a < x < b$. The general solution of the related homogeneous equation is found as in the preceding paragraphs; it is called the complementary function.
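The complementary function obtained from rules I and II can be checked numerically. The following is a minimal sketch, assuming a second order equation with constant coefficients; the names `auxiliary_roots` and `residual`, the finite-difference step, and the sample constants are illustrative choices and do not come from the text. It solves the auxiliary equation of Example 1 and evaluates $D^2 y - 3Dy + 2y$ on $y = c_1 e^x + c_2 e^{2x}$.

```python
import cmath
import math

def auxiliary_roots(a0, a1, a2):
    """Roots of the auxiliary equation a0*r**2 + a1*r + a2 = 0 (eq. (15) with n = 2)."""
    disc = cmath.sqrt(a1 * a1 - 4 * a0 * a2)
    return (-a1 - disc) / (2 * a0), (-a1 + disc) / (2 * a0)

def residual(y, x, a0, a1, a2, h=1e-4):
    """Approximate a0*y'' + a1*y' + a2*y at x by central differences.

    The step h is an assumption made for this sketch.
    """
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return a0 * d2 + a1 * d1 + a2 * y(x)

# Example 1: D^2 y - 3 D y + 2 y = 0.
r1, r2 = auxiliary_roots(1.0, -3.0, 2.0)

# Both roots are real and simple, so rule I assigns e^{r1 x} and e^{r2 x}.
c1, c2 = 3.0, -2.0  # arbitrary constants chosen only for the check

def y(x):
    return c1 * math.exp(r1.real * x) + c2 * math.exp(r2.real * x)

# The residual should vanish up to finite-difference and rounding error.
check = residual(y, 0.5, 1.0, -3.0, 2.0)
```

For repeated or complex roots the same residual check applies once the basis functions of rules I and II are substituted for the two exponentials.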
Here are presented methods for finding a particular solution $y^*(x)$ of eq. (16). As indicated in eq. (13), addition of the complementary function and $y^*(x)$ gives the required general solution of eq. (16).

Method of Undetermined Coefficients. If $Q(x)$ is of form

(17)  $e^{\alpha x}[p(x)\cos\beta x + q(x)\sin\beta x],$

where $p(x)$ and $q(x)$ are polynomials of degree at most $h$, then there is a particular solution

(18)  $y^* = x^k e^{\alpha x}[\phi(x)\cos\beta x + \psi(x)\sin\beta x],$

where $\phi(x)$ and $\psi(x)$ are polynomials of degree at most $h$ and $\alpha \pm \beta i$ is a root of multiplicity $k$ (possibly 0) of the auxiliary equation. If $\beta = 0$, $Q$ is of form $e^{\alpha x}p(x)$; also $p$ and $q$ may reduce to constants ($h = 0$). The coefficients of the polynomials $\phi$, $\psi$ can be considered as undetermined coefficients; substitution of eq. (18) in eq. (16) leads to relations between these coefficients from which all can be determined.

As an example consider the equation

$(D^2 + 1)y = 3\cos x.$

Here $\alpha = 0$, $\beta = 1$, $h = 0$. Since $\pm i$ are roots of the auxiliary equation, $k = 1$ and

$y^* = x(A\cos x + B\sin x).$

Substitution in the differential equation leads to the relation

$2(-A\sin x + B\cos x) \equiv 3\cos x.$

Hence $B = \tfrac{3}{2}$, $A = 0$; $y^* = \tfrac{3}{2}x\sin x$ and the general solution is

$y = \tfrac{3}{2}x\sin x + c_1\cos x + c_2\sin x.$

Superposition Principle. If in eq. (16) $Q(x)$ is a linear combination of functions $Q_1(x), \ldots, Q_N(x)$ and $y_1^*(x), \ldots, y_N^*(x)$ are particular solutions of the respective equations obtained by replacing $Q(x)$ by $Q_1(x), \ldots, Q_N(x)$, then the corresponding linear combination of $y_1^*(x), \ldots, y_N^*(x)$ is a solution of eq. (16); that is, if

$Q(x) = b_1 Q_1(x) + \cdots + b_N Q_N(x),$

then

$y^*(x) = b_1 y_1^*(x) + \cdots + b_N y_N^*(x)$

is a particular solution of eq. (16). For example, particular solutions of

$(D^2 + 1)y = 3\cos x, \qquad (D^2 + 1)y = 5e^{2x}$

are found by undetermined coefficients to be $\tfrac{3}{2}x\sin x$, $e^{2x}$ respectively. Hence a particular solution of

$(D^2 + 1)y = 12\cos x + 10e^{2x}$

is given by $6x\sin x + 2e^{2x}$.

Variation of Parameters.
Let the complementary function be

$y = c_1 y_1(x) + \cdots + c_n y_n(x).$

Then a particular solution is

(19)  $y^*(x) = v_1(x)y_1(x) + \cdots + v_n(x)y_n(x),$

where

$v_1(x) = \int w_1(x)\,dx, \quad \ldots, \quad v_n(x) = \int w_n(x)\,dx$

and $w_1(x), \ldots, w_n(x)$ are defined by the linear equations

(20)  $\begin{aligned} y_1(x)w_1(x) + \cdots + y_n(x)w_n(x) &= 0, \\ y_1'(x)w_1(x) + \cdots + y_n'(x)w_n(x) &= 0, \\ &\;\;\vdots \\ y_1^{(n-1)}(x)w_1(x) + \cdots + y_n^{(n-1)}(x)w_n(x) &= Q(x)/a_0. \end{aligned}$

The determinant of coefficients of eqs. (20) is the Wronskian determinant

(21)  $W = \begin{vmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x) \end{vmatrix}.$

Under the assumptions made, $W$ cannot equal 0 for any $x$ of the interval considered, so that eqs. (20) have a unique solution (Chap. 2). This method is applicable if $a_0, \ldots, a_n$ are functions of $x$, provided $a_0(x) \neq 0$.

Operational Methods. The operational methods based on the Heaviside calculus provide another powerful tool for obtaining solutions of nonhomogeneous linear equations with constant coefficients (see Chap. 8). Closely related are the methods based on the Laplace transform (Chap. 9).

4. EQUATIONS OF FIRST ORDER BUT NOT OF FIRST DEGREE

The equations considered have form

(22)  $\psi(x, y, p) = 0,$

where $p = dy/dx$. Equation (22) can be solved for $p$, except where $\psi_p = 0$. The locus defined by the two equations

(23)  $\psi(x, y, p) = 0, \qquad \psi_p(x, y, p) = 0$

is called the singular locus. It may contain curves $y = f(x)$ which are solutions of eq. (22); such solutions are called singular solutions. The solutions of eq. (22) (with the possible exception of the singular solutions) can often be obtained by one of the following special methods.

Factorization. If eq. (22) can be factored in the form

(24)  $[p - F_1(x, y)][p - F_2(x, y)] \cdots [p - F_k(x, y)] = 0,$

then its solutions are obtained by combining all solutions of the first degree equations

(25)  $p = F_1(x, y), \quad \ldots, \quad p = F_k(x, y) \qquad (p = dy/dx).$

For example, the equation

$p^2 - (2x + y)p + 2xy = 0$

can be factored into the equations $p = 2x$, $p = y$; the solutions are $y = x^2 + c_1$, $y = c_2 e^x$. If $\psi(x, y, p)$ is of second degree in $p$, the expressions for $p$ and the equivalent factorization (24) can be obtained by the quadratic formula.
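The factorization example can be reproduced with the quadratic formula. The following is a minimal sketch for the equation $p^2 - (2x + y)p + 2xy = 0$ only; the function names `slopes` and `psi` and the sample values of $x$, $c_1$, $c_2$ are hypothetical choices made for illustration.

```python
import math

def psi(x, y, p):
    """Left side of the example equation p^2 - (2x + y) p + 2xy = 0."""
    return p * p - (2 * x + y) * p + 2 * x * y

def slopes(x, y):
    """Both roots p of the example equation, by the quadratic formula."""
    b = -(2 * x + y)
    c = 2 * x * y
    disc = b * b - 4 * c          # equals (2x - y)^2, hence never negative
    root = math.sqrt(disc)
    return sorted([(-b - root) / 2, (-b + root) / 2])

# At the point (1, 3) the two factors give p = 2x = 2 and p = y = 3.
p_values = slopes(1.0, 3.0)

# Each solution family satisfies the equation with its own slope.
x, c1, c2 = 0.7, 1.3, 0.4
on_parabola = psi(x, x * x + c1, 2 * x)                      # y = x^2 + c1, p = 2x
on_exponential = psi(x, c2 * math.exp(x), c2 * math.exp(x))  # y = c2 e^x,  p = y
```

The discriminant is the perfect square $(2x - y)^2$, which is why the equation factors into two first degree equations at every point.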
Solving for y or x. If eq. (22) is of first degree in $y$, one can solve for $y$ to obtain an equation

(26)  $y = F(x, p).$

Differentiation of this equation with respect to $x$ yields a relation of form

(27)  $\frac{dp}{dx} = G(x, p),$

that is, a first order equation relating $p$ and $x$. If the general solution of eq. (27) is given by

(28)  $\phi(x, p) = c,$

then the equations

(29)  $y = F(x, p), \qquad \phi(x, p) = c$

together define solutions of eq. (22); $p$ may be eliminated between the equations or treated as a parameter. As an example, consider the Clairaut equation:

(30)  $y = xp + F(p).$

The method described leads to the "general solution"

(31)  $y = cx + F(c).$

There is, in general, a singular solution defined by the equations

(32)  $x + F'(p) = 0, \qquad y = xp + F(p).$

If the eq. (22) is solvable for $x$, one can differentiate with respect to $y$, replacing $dx/dy$ by $1/p$; one obtains the solutions in the form

(33)  $\phi(y, p) = c, \qquad x = F(y, p).$

5. SPECIAL METHODS FOR EQUATIONS OF HIGHER THAN FIRST ORDER

Equations with Dependent Variable Missing. Let the given equation be

(34)  $F(x, y', y'', \ldots, y^{(n)}) = 0,$

so that $y$ does not appear. Set $p = dy/dx$. Then

$y'' = \frac{dp}{dx}, \quad \ldots, \quad y^{(n)} = \frac{d^{n-1}p}{dx^{n-1}},$

and so eq. (34) becomes

(34')  $F\left(x, p, \frac{dp}{dx}, \ldots, \frac{d^{n-1}p}{dx^{n-1}}\right) = 0,$

an equation of order $n - 1$ for $p$ in terms of $x$. If its solutions are known, then the solutions of eq. (34) are obtained from the relation $y = \int p\,dx$.

EXAMPLE. Consider the equation

$x^3 y'' - x^2 y' = 3 + x^2.$

The substitution $p = y'$ leads to the first order linear equation

$x^3\frac{dp}{dx} - x^2 p = 3 + x^2.$

Its general solution is found (Sect. 2) to be

$p = -\frac{1}{x^2} - 1 + c_1 x.$

Hence integration yields $y$:

$y = \frac{1}{x} - x + \tfrac{1}{2}c_1 x^2 + c_2.$

Equations with Independent Variable Missing. Let the given equation be the $n$th order equation

(35)  $F(y, y', y'', \ldots, y^{(n)}) = 0,$

so that $x$ does not appear. Set $p = y'$. Then

$\frac{d^2 y}{dx^2} = \frac{dp}{dx} = \frac{dp}{dy}\frac{dy}{dx} = p\frac{dp}{dy}, \qquad \frac{d^3 y}{dx^3} = p^2\frac{d^2 p}{dy^2} + p\left(\frac{dp}{dy}\right)^2, \quad \ldots.$

Thus eq. (35) becomes an equation of order $n - 1$.
If its solutions are known, in the form $p = \phi(y, c_1, \ldots, c_{n-1})$, then

$\frac{dy}{dx} = \phi, \qquad \frac{dy}{\phi(y, c_1, \ldots)} = dx.$

Thus integration yields an implicit form of the solutions of the given equation.

Linear Equations with One Known Solution. Let a linear equation be given:

(36)  $a_0(x)y^{(n)}(x) + \cdots + a_n(x)y = Q(x).$

Let $y_1(x)$ be a solution of the related homogeneous equation. Then the substitutions

(37)  $y = y_1(x)v, \qquad w = v'$

lead to an equation of order $n - 1$ for $w$. If $w$ has been found, integration and multiplication by $y_1(x)$ yields $y$.

6. SOLUTIONS IN FORM OF POWER SERIES

Formation of Taylor Series. Let an equation of order $n$ be given:

(38)  $y^{(n)} = F(x, y, y', \ldots, y^{(n-1)})$

and let $F$ be expressible in an absolutely convergent power series in powers of $x, y, y', \ldots$ for $|x| < a$, $|y|$ [...] 0, $\beta > 0$; in Fig. 3 the roots are $\lambda_1, \lambda_2$ with $\lambda_1 < 0 < \lambda_2$; in Fig. 4 the roots are $\pm\beta i$, $\beta > 0$; in Fig. 5 the roots are $\lambda_1, \lambda_2$ with $\lambda_1 < \lambda_2 < 0$. The solutions of the system (61) will have the same appearance near (0, 0) as the solution of eqs. (67), except in borderline cases; of the four cases illustrated, only that of Fig. 4 is of borderline type. For a full discussion, see Ref. 2.

FIG. 4. Solution near center type singular point.

FIG. 5. Solution near node type singular point.

FIG. 6. Limit cycle.

Limit Cycles. Of much importance for applications are the solutions represented by closed curves in the $xy$-plane. These are termed limit cycles. For the parametric eqs. (61) such a solution is represented by equations $x = p(t)$, $y = q(t)$, where $p$ and $q$ have a common period $T$. A typical solution family containing a limit cycle $C$ is illustrated in Fig. 6. The cycle $C$ is stable in this case; that is, all solutions starting near $C$ approach $C$ as time $t$ increases. In many cases simple properties of the isoclines allow one to conclude existence of limit cycles in particular regions.
A theorem of Bendixson states that a region in which $F_x + G_y > 0$ can contain no limit cycle of eqs. (61) (Refs. 2, 3).

Phase Plane. For the motion of a particle of mass $m$ on a line, classical mechanics gives an equation of the form

(69)  $m\frac{d^2 x}{dt^2} = F\left(t, x, \frac{dx}{dt}\right).$

When $F$ is independent of $t$, the substitution $v = dx/dt$ leads to an equation

(70)  $mv\frac{dv}{dx} = F(x, v)$

which can be analyzed as above. The pair $(x, v)$ represents a phase of the mechanical system and the $xv$-plane is termed the phase plane. Second order equations arising in other contexts can be treated similarly and the term phase is used for the pair $(x, v)$ or $(x, y)$ regardless of the physical significance of the variables. An especially simple graphical discussion can be given for the conservative case of eq. (69):

(71)  $m\frac{d^2 x}{dt^2} = F(x).$

See Ref. 2.

10. PARTIAL DIFFERENTIAL EQUATIONS

This section presents a brief discussion of partial differential equations of second order. Some further information is given in Chap. 6. (See Refs. 4, 10.)

Classification. Consider an equation

(72)  $A\frac{\partial^2 u}{\partial x^2} + 2B\frac{\partial^2 u}{\partial x\,\partial y} + C\frac{\partial^2 u}{\partial y^2} + D\frac{\partial u}{\partial x} + E\frac{\partial u}{\partial y} + Fu + G = 0$

where $u$ is an unknown function of $x$ and $y$ and the coefficients $A, \ldots, G$ are given functions of $x$ and $y$ (perhaps constants). The eq. (72) is termed elliptic if $B^2 - AC < 0$, parabolic if $B^2 - AC = 0$, hyperbolic if $B^2 - AC > 0$. The three types are illustrated by the

Laplace equation:  $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0,$

heat equation:  $\frac{\partial u}{\partial t} - k^2\frac{\partial^2 u}{\partial x^2} = 0,$

wave equation:  $\frac{\partial^2 u}{\partial t^2} - k^2\frac{\partial^2 u}{\partial x^2} = 0.$

Attention will be restricted to the three special types.

Dirichlet Problem. One seeks a solution $u(x, y)$ of the Laplace equation in an open region $D$, with given boundary values on the boundary of $D$. This problem can be treated by conformal mapping (Chap. 10, Sect. 5).

Heat Equation. A typical problem is the following.
One seeks a solution $u(x, t)$ of the heat equation $u_t - k^2 u_{xx} = 0$ for $t > 0$, $0 < x < 1$, with given initial values $\phi(x) = u(x, 0)$ and boundary values $u(0, t) = 0$, $u(1, t) = 0$. To obtain a solution one can employ the method of separation of variables. One seeks solutions of the differential equation and boundary conditions of form

(73)  $u = f(x)g(t).$

From the differential equation one finds that one must have

$\frac{g'(t)}{g(t)} - k^2\frac{f''(x)}{f(x)} = 0.$

Hence $g'/g$ must be a constant $\lambda$, and $f''/f$ must equal $\lambda/k^2$:

(74)  $k^2 f''(x) - \lambda f(x) = 0, \qquad g'(t) - \lambda g(t) = 0.$

From the boundary conditions at $x = 0$ and $x = 1$ one finds that

(75)  $f(0) = f(1) = 0.$

From eqs. (74) and (75) one concludes that $f(x)$ and $\lambda$ must have the form

(76)  $f(x) = b\sin n\pi x, \qquad \lambda = -k^2 n^2\pi^2, \qquad n = 1, 2, \ldots.$

From eqs. (74) $g(t)$ has form $\text{const}\cdot e^{\lambda t}$. Hence particular solutions of form (73) have been found:

(77)  $u = b_n \sin n\pi x\; e^{-k^2 n^2\pi^2 t}, \qquad n = 1, 2, \ldots.$

Each linear combination of the functions (77) is also a solution of both the heat equation and the boundary conditions at $x = 0$ and $x = 1$. Accordingly, each convergent series

(78)  $u = \sum_{n=1}^{\infty} b_n \sin n\pi x\; e^{-k^2 n^2\pi^2 t}$

also represents a solution. By proper choice of the constants $b_n$ the initial values can be satisfied. One requires that

(79)  $\phi(x) = \sum_{n=1}^{\infty} b_n \sin n\pi x.$

Thus the $b_n$ are determined from the expansion of $\phi(x)$ in its Fourier sine series (Chap. 8, Sect. 8). With the $b_n$ so chosen, eq. (78) represents the desired solution of the given problem.

Wave Equation. One seeks a solution $u(x, t)$ of the wave equation $u_{tt} - k^2 u_{xx} = 0$ for $0 < x < \pi$, $t > 0$ with given initial values $u(x, 0) = \phi(x)$ and initial velocities $u_t(x, 0) = \psi(x)$ and given boundary values $u(0, t) = u(\pi, t) = 0$. This is the problem of the vibrating string.
The method of separation of variables can be used as above and one obtains the solution in the form of a series

(80)  $u = \sum_{n=1}^{\infty} \sin nx\,[\alpha_n \sin knt + \beta_n \cos knt],$

where $\alpha_n$ and $\beta_n$ are determined from the expansions:

(81)  $\phi(x) = \sum_{n=1}^{\infty} \beta_n \sin nx, \qquad \psi(x) = \sum_{n=1}^{\infty} nk\alpha_n \sin nx.$

Relaxation Methods. One can obtain an approximation to the solution of a partial differential equation by replacing it by a corresponding difference equation. The method has been especially successful for the Dirichlet problem, which is discussed here. The differential equation $u_{xx} + u_{yy} = 0$ is replaced by the equation

(82)  $u(x + h, y) + u(x, y + h) + u(x - h, y) + u(x, y - h) - 4u(x, y) = 0.$

If the given region is the square $0 \le x \le 1$, $0 \le y \le 1$, one chooses $h = 1/n$ for some positive integer $n$ and requires eq. (82) to hold at the lattice points $(k_1 h, k_2 h)$, $0 < k_1 < n$, $0 < k_2 < n$. The values of $u$ on the boundary ($x = 0$ or 1, $y = 0$ or 1) are given, and eq. (82) becomes a system of simultaneous linear equations for the unknowns $u(k_1 h, k_2 h)$. These can be solved by the relaxation method. One chooses an initial set of values for the unknowns, then obtains a next approximation by replacing $u(x, y)$ by

(83)  $\tfrac{1}{4}[u(x + h, y) + u(x, y + h) + u(x - h, y) + u(x, y - h)]$

at each lattice point. Repetition of the process generates a sequence $u_n(x, y)$ which can be shown to converge to the solution of eq. (82). As $h \to 0$, the solution of eq. (82) can be shown to converge to the desired solution of the Dirichlet problem (Ref. 10).

REFERENCES

1. R. P. Agnew, Differential Equations, McGraw-Hill, New York, 1942.
2. A. Andronow and C. E. Chaikin, Theory of Oscillations, Princeton University Press, Princeton, N. J., 1949.
3. E. A. Coddington and N. Levinson, Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1955.
4. R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. I, Interscience, New York, 1953.
5. E. L.
Ince, Ordinary Differential Equations, Longmans, Green, London, 1927.
6. E. Kamke, Differentialgleichungen, Lösungsmethoden und Lösungen, Vol. 1, 2nd edition, Akademische Verlagsgesellschaft, Leipzig, 1943.
7. E. Kamke, Differentialgleichungen reeller Funktionen, Akademische Verlagsgesellschaft, Leipzig, 1933.
8. E. D. Rainville, Elementary Differential Equations, Macmillan, New York, 1952.
9. E. D. Rainville, Intermediate Differential Equations, Wiley, New York, 1943.
10. R. V. Southwell, Relaxation Methods in Engineering Sciences, Oxford University Press, Oxford, England, 1946.
11. E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, 4th edition, Cambridge University Press, Cambridge, England, 1940.

A  GENERAL MATHEMATICS

Chapter 6

Integral Equations

E. H. Rothe

1. Definitions and Main Problems
2. Relation to Boundary Value Problems
3. General Theorems
4. Theorems on Eigenvalues
5. The Expansion Theorem and Some of Its Consequences
6. Variational Interpretation of the Eigenvalue Problem
7. Approximation Methods
References

1. DEFINITIONS AND MAIN PROBLEMS

A linear integral equation of first kind is an equation of form

(1)  $\int_a^b K(s, t)x(t)\,dt = f(s);$

$f(s)$ and $K(s, t)$ are considered to be given, and a function $x(t)$ satisfying eq. (1) is called a solution of the integral equation.

Fredholm Integral Equation. This is the linear integral equation of second kind and has the form

(2)  $x(s) - \lambda\int_a^b K(s, t)x(t)\,dt = f(s).$

Here $K(s, t)$ and $f(s)$ are given real functions, and $\lambda$ is a given real constant; a solution of the integral equation is a function $x(s)$ satisfying eq. (2) for $a \le s \le b$.

Volterra Integral Equation. If in eq. (2) the upper limit $b$ is replaced by the variable $s$, the resulting equation

(3)  $x(s) - \lambda\int_a^s K(s, t)x(t)\,dt = f(s)$

is called a Volterra integral equation.
Equation (3) can be considered as a special case of eq. (2); namely, the case for which $K(s, t) = 0$ for $t \ge s$.

The preceding definitions relate to integral equations for functions of one real variable. There are analogous definitions for functions of two or more real variables. It is also of importance to allow $x$, $K$, $f$ to take on complex values and to allow $\lambda$ to be complex. For simplicity the results will be formulated for functions of one variable; essentially no change is required to extend the results to functions of several variables. Only functions with real values will be considered here. The discussion will furthermore be restricted to the integral equation of second kind; for the equation of first kind, see Ref. 10, Chap. 2.

REMARK. The equations defining Laplace and Fourier transforms can be regarded as integral equations of first kind. Solving the equations is equivalent to finding the inverse transforms. See Chaps. 8, 9.

The function $K(s, t)$ in eq. (2) is called the kernel of the integral equation. The eq. (2) is said to be homogeneous if $f(s) = 0$; otherwise it is nonhomogeneous. The homogeneous equation

(4)  $x(s) - \lambda\int_a^b K(s, t)x(t)\,dt = 0$

obtained from eq. (2) by replacing $f(s)$ by 0 is called the homogeneous equation associated with eq. (2). A number $\lambda$ such that eq. (4) has a solution $x = \phi(s)$ not identically 0 is called a characteristic value or eigenvalue of eq. (4) or of the kernel $K(s, t)$; the solution $\phi(s)$ is called an eigenfunction associated with $\lambda$. For each eigenvalue $\lambda$ there may be several associated eigenfunctions. From the definition it follows that 0 cannot be an eigenvalue.

The eigenvalue problem associated with eq. (4) is the determination of whether, for a given kernel, eigenvalues exist, what they are, and what the corresponding eigenfunctions are.

The expansion problem associated with eq.
(4) is the determination of the possibility of expanding every function $g(s)$ of a given class in an infinite series:

$g(s) = \sum_{\alpha=1}^{\infty} c_\alpha \phi_\alpha(s),$

where the $\phi_\alpha(s)$ are eigenfunctions.

The solvability problem associated with eq. (2) is the determination of whether eq. (2) has a solution $x(s)$ and whether the solution is unique.

2. RELATION TO BOUNDARY VALUE PROBLEMS

The problems described in Sect. 1 arise naturally in the analysis of boundary value problems associated with partial differential equations. A TYPICAL EXAMPLE is presented. The equation is the wave equation

(5)  $\frac{\partial^2 u}{\partial t^2} - \nabla^2 u = 0, \qquad u = u(x, y, z, t),$

where $\nabla^2$ is the Laplacian operator:

(6)  $\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}.$

A finite domain $D$, with smooth boundary $B$, is given in $x, y, z$ space; a function $g(x, y, z)$ is given in $D$. One seeks a function $u$ satisfying eq. (5) for $t \ge 0$ and for $(x, y, z)$ in $D$ and satisfying the boundary conditions

(7)  $u(x, y, z, t) = 0$,  $(x, y, z)$ on $B$, $t \ge 0$;

(8)  $u(x, y, z, 0) = g(x, y, z)$,  $(x, y, z)$ in $D$;

(9)  $u_t(x, y, z, 0) = 0$,  $(x, y, z)$ in $D$.

The classical attack on this problem is to "separate" the time and space variables; that is, to set

(10)  $u(x, y, z, t) = S(x, y, z)T(t).$

Then one is led to the boundary value problems:

(11)  $\nabla^2 S + \lambda S = 0$,  $S = 0$ for $(x, y, z)$ on $B$;

(12)  $T''(t) + \lambda T(t) = 0$,  $T'(0) = 0$.

If $S$ and $T$ satisfy eqs. (11), (12) for some constant $\lambda$, then $u = ST$ satisfies eqs. (5) and both (7) and (9), but not necessarily eq. (8).

Integral Equation. The problem (11) can be replaced by an integral equation by the following reasoning. It is shown in the theory of partial differential equations (Ref.
5) that there exists a uniquely determined function $K(s, \sigma)$ of the two points $s$: $(x, y, z)$, $\sigma$: $(\xi, \eta, \zeta)$ in $D$, the so-called Green's function, with the following properties: $K$ has continuous second partial derivatives as long as $s \neq \sigma$; $K(s, \sigma) = 0$ for $s$ on $B$, $\sigma$ in $D$; if $\phi(\sigma)$ has continuous first partial derivatives in $D$ and $D_1$ is an arbitrary subdomain of $D$ (with smooth boundary), then

(13)  $\nabla^2 \iiint_{D_1} K(x, y, z, \xi, \eta, \zeta)\,\phi(\xi, \eta, \zeta)\,d\xi\,d\eta\,d\zeta = -\phi(x, y, z)$

for $(x, y, z)$ in $D_1$. Identifying $\phi$ with $\lambda S$ and $D_1$ with $D$, one sees that the boundary value problem (11) is equivalent to the homogeneous integral equation in these variables:

(14)  $S(x, y, z) - \lambda\iiint_D K(x, y, z, \xi, \eta, \zeta)\,S(\xi, \eta, \zeta)\,d\xi\,d\eta\,d\zeta = 0.$

Solution. Let eq. (14) have a sequence of eigenfunctions $S_\alpha(x, y, z)$ associated with the positive eigenvalues $\lambda_\alpha$ ($\alpha = 1, 2, \ldots$). Then $S_\alpha$ satisfies eq. (11) with $\lambda = \lambda_\alpha$; for $\lambda = \lambda_\alpha$ eq. (12) has the solution $T_\alpha = \cos\sqrt{\lambda_\alpha}\,t$, so that $u = S_\alpha T_\alpha$ satisfies eqs. (5), (7), (9). To satisfy eq. (8), one notes that each series

(15)  $u = \sum_{\alpha=1}^{\infty} c_\alpha S_\alpha(x, y, z)T_\alpha(t) = \sum_{\alpha=1}^{\infty} c_\alpha S_\alpha \cos\sqrt{\lambda_\alpha}\,t$

also satisfies eqs. (5), (7), (9), if the $c$'s are constants and the series satisfies appropriate convergence conditions. The condition (8) now becomes

(16)  $g(x, y, z) = \sum_{\alpha=1}^{\infty} c_\alpha S_\alpha(x, y, z);$

thus one is led to the expansion problem. If $g$ can be expanded as in eq. (16), then (15) defines a solution of the given problem.

Suppose that the 0 on the right-hand side of eq. (5) is replaced by $F_0(x, y, z, t)$; this corresponds to an external force. If $F_0(x, y, z, t) = F(x, y, z)T(t)$, where $T(t)$ satisfies eq. (12) for some $\lambda = \lambda_0$, then the substitution of eq. (10) leads to the nonhomogeneous integral equation

(17)  $S(x, y, z) - \lambda_0\iiint_D K(x, y, z, \xi, \eta, \zeta)\,S(\xi, \eta, \zeta)\,d\xi\,d\eta\,d\zeta = f(x, y, z),$

where

(18)  $f(x, y, z) = \iiint_D K(x, y, z, \xi, \eta, \zeta)\,F(\xi, \eta, \zeta)\,d\xi\,d\eta\,d\zeta.$

3.
GENERAL THEOREMS

In what follows $K(s, t)$ will be assumed continuous for $a \le s \le b$, $a \le t \le b$. Such a continuity condition is not always satisfied, e.g., for the Green's function of Sect. 2. See Ref. 5 (pp. 543 ff.) and Ref. 6 (pp. 355 ff.) for reduction of the discontinuous case to the continuous case.

Definitions. The following definitions relate to functions of $t$ defined and continuous for $a \le t \le b$. If $x$, $y$ are two such functions, their scalar product is

(19)  $(x, y) = \int_a^b x(t)y(t)\,dt.$

The norm $\|x\|$ of $x = x(t)$ is defined as $(x, x)^{1/2}$. Functions $x_1, \ldots, x_n$ are linearly independent if

(20)  $c_1 x_1(t) + \cdots + c_n x_n(t) \equiv 0$

with constant $c_1, \ldots, c_n$, implies $c_1 = 0, \ldots, c_n = 0$; if the functions are not linearly independent, they are termed linearly dependent. An infinite system of functions

(21)  $\phi_1(t), \phi_2(t), \ldots, \phi_\alpha(t), \ldots$

is called linearly independent if $\phi_1, \ldots, \phi_k$ are linearly independent for every $k$. Two functions $x$, $y$ are said to be orthogonal if $(x, y) = 0$. The system (21) is orthogonal if $(\phi_\alpha, \phi_\beta) = 0$ for $\alpha \neq \beta$. A system of orthogonal functions none of which is identically zero is necessarily linearly independent. The system (21) is called orthonormal if it is orthogonal and normalized, that is, $\|\phi_\alpha\| = 1$ for all $\alpha$. If the system (21) is linearly independent, it can be orthogonalized and normalized; that is, an orthonormal system $\{\psi_\alpha\}$ can be found such that, for every $n$, $\psi_n$ is a linear combination of $\phi_1, \ldots, \phi_n$ and $\phi_n$ is a linear combination of $\psi_1, \ldots, \psi_n$. For details, see Ref. 4, p. 50.

THE SCHWARZ INEQUALITY states that for every $x$, $y$, $|(x, y)| \le \|x\| \cdot \|y\|$, with equality if and only if $x$, $y$ are linearly dependent.

THE BESSEL INEQUALITY states that, if $\{\phi_\alpha\}$ is an orthonormal system, then for every $x$

(22)  $\sum_{\alpha=1}^{\infty} |(x, \phi_\alpha)|^2 \le \|x\|^2.$

Now consider three related integral equations:

(23)  $\phi(s) - \lambda\int_a^b K(s, t)\phi(t)\,dt = f(s),$

(24)  $\phi(s) - \lambda\int_a^b K(s, t)\phi(t)\,dt = 0,$

(25)  $\psi(s) - \lambda\int_a^b K(t, s)\psi(t)\,dt = 0.$

Equation (24) is the homogeneous equation related to (23); eq.
(25) is called the adjoint or transposed equation of (24).

THEOREM 1. If $\lambda$ is an eigenvalue of eq. (24), then $\lambda$ is an eigenvalue of eq. (25). There are at most a finite number $k$ of linearly independent eigenfunctions of eq. (24) associated with eigenvalue $\lambda$; this maximal number $k$ is the same for eq. (25). The number $k$ is called the multiplicity of the eigenvalue $\lambda$.

THEOREM 2. Equation (23) has a solution if and only if $f$ is orthogonal to all solutions of the adjoint eq. (25).

Conclusions Based on Theorems 1 and 2. Let $\lambda$ not be an eigenvalue of $K(s, t)$. Then $\lambda$ is also not an eigenvalue of $K(t, s)$; that is, $\psi \equiv 0$ is the only solution of eq. (25). Hence $(f, \psi) = 0$ for all solutions $\psi$ of eq. (25), and eq. (23) has a solution for arbitrary $f$. For each $f$, the solution is unique; for the difference $\phi$ of two solutions is a solution of eq. (24), hence $\phi \equiv 0$.

Let $\lambda$ be an eigenvalue of $K(s, t)$. Then eq. (25) is satisfied for at least one $\psi$ not identically zero and eq. (23) has no solution for some $f$, in particular for $f = \psi$. (In the problem of Sect. 2 this case arises if the frequency $\lambda_0$ of the time factor $T$ of $F_0(x, y, z, t)$ is an eigenvalue of the homogeneous eq. (14); this is the case of resonance.)

One is thus led to the following alternative of Fredholm: either (i) the nonhomogeneous eq. (23) has a solution for arbitrary $f$ or (ii) the homogeneous eq. (24) has at least one (not identically vanishing) solution. Case (i) can also be characterized by the statement: eq. (23) has at most one solution for each $f$; for the uniqueness implies existence of a solution.

4. THEOREMS ON EIGENVALUES

The kernel $K(s, t)$ is said to be symmetric if $K(s, t) \equiv K(t, s)$. This case occurs in many applications; for example in the problem of Sect. 2.

THEOREM 3. A symmetric kernel has at least one and at most a countable infinity of eigenvalues. Eigenfunctions corresponding to distinct eigenvalues are orthogonal.
The eigenvalues can be numbered to form a sequence $\{\lambda_\alpha\}$, in which each eigenvalue is repeated as many times as its multiplicity, and such that $|\lambda_1| \le |\lambda_2| \le \cdots$; if there are infinitely many eigenvalues, then $|\lambda_\alpha| \to \infty$ as $\alpha \to \infty$. An eigenfunction $\phi_\alpha$ can be assigned to each $\lambda_\alpha$ in such a fashion that the sequence $\{\phi_\alpha\}$ is orthonormal and every eigenfunction $\phi$ is a linear combination of a finite number of the $\phi_\alpha$'s. The sequence $\{\phi_\alpha\}$ is called a full system of eigenfunctions of the kernel.

REMARK. While restricting $K(s, t)$ to be real, one can consider complex eigenvalues $\lambda$ and eigenfunctions $x(t) = x_1(t) + ix_2(t)$. Some kernels have only complex eigenvalues; some kernels have no eigenvalues at all. A symmetric kernel has only real eigenvalues.

5. THE EXPANSION THEOREM AND SOME OF ITS CONSEQUENCES

THEOREM 4. Let $\{\phi_\alpha\}$ be a full system of eigenfunctions for the symmetric kernel $K(s, t)$. Then in order that a function $g(s)$ can be expanded in a uniformly convergent series:

(26)  $g(s) = \sum_\alpha c_\alpha \phi_\alpha(s),$

where

(27)  $c_\alpha = (g, \phi_\alpha),$

it is sufficient that $g(s)$ can be written in the form

(28)  $g(s) = \int_a^b K(s, t)G(t)\,dt,$

where $G(t)$ is continuous.

In many applications the form (28) for the function $g(s)$ to be expanded arises in a natural way. For example, the function (18) is of this form. The coefficients (27) can be written in a different form which is often useful. From eq. (28) and from the facts that $\phi_\alpha$ satisfies eq. (24) with $\lambda = \lambda_\alpha$ and that $K$ is symmetric, one deduces the expression

(29)  $c_\alpha = \frac{(G, \phi_\alpha)}{\lambda_\alpha}.$

As a first application of the expansion theorem, let $\lambda$ be a number which is not an eigenvalue, and seek to expand the solution $x = \phi(s)$ of eq. (23) in terms of the eigenfunctions. To do this, note that by eq. (23) $x - f$ is of form (28) with $G = \lambda x$. By Theorem 4 and eq. (29) one deduces the expansion

(30)  $x(s) - f(s) = \lambda\sum_\alpha \frac{(x, \phi_\alpha)}{\lambda_\alpha}\phi_\alpha(s).$

If this relation is multiplied by $\phi_\beta(s)$ and integrated from $a$ to $b$, one obtains a linear equation for $(x, \phi_\beta)$. Solving this equation and substituting the result in eq. (30) gives the desired formula

(31)  $x(s) = f(s) + \lambda\sum_\alpha \frac{(f, \phi_\alpha)}{\lambda_\alpha - \lambda}\phi_\alpha(s).$

(The series is meaningless if $\lambda$ is one of the $\lambda_\alpha$, unless $(f, \phi_\alpha) = 0$; this is in agreement with Theorem 2 of Sect. 3.)

A second application concerns the "quadratic form"

(32)  $I\{x, x\} = \int_a^b\!\!\int_a^b K(s, t)x(s)x(t)\,dt\,ds,$

whose importance will become clear in the next section. If one applies the expansion theorem to the integral of $K(s, t)x(t)$, one obtains the formula:

(33)  $I\{x, x\} = \sum_\alpha \frac{k_\alpha^2}{\lambda_\alpha}, \qquad k_\alpha = (x, \phi_\alpha).$

The transition from eq. (32) to (33) is the analogue of choosing coordinates which represent a conic section in its "principal axis" form.

6. VARIATIONAL INTERPRETATION OF THE EIGENVALUE PROBLEM

In this section the hypotheses and notations are the same as those of Sect. 5. It is convenient to denote the positive $\lambda_\alpha$'s by

(34)  $0 < p_1 \le p_2 \le p_3 \le \cdots$

and the negative ones by

(35)  $0 > n_1 \ge n_2 \ge n_3 \ge \cdots.$

There may be no $p$'s or no $n$'s; as remarked in Sect. 1, 0 is not an eigenvalue. Equation (33) now becomes

(36)  $I\{x, x\} = \sum_i \frac{k_i^2}{p_i} + \sum_j \frac{l_j^2}{n_j},$

where $k_j = (x, \psi_j)$, $l_j = (x, \chi_j)$ and $\psi_j$ is the eigenfunction associated with $p_j$, $\chi_j$ the eigenfunction associated with $n_j$. From eq. (36) and Bessel's inequality (22) one now concludes:

THEOREM 5. If there are positive eigenvalues of the symmetric kernel $K(s, t)$, then

(37)  $I\{x, x\} \le \frac{\|x\|^2}{p_1},$

where $p_1$ is the smallest positive eigenvalue. The maximum of $I\{x, x\}$ for $x$ within the class of $x$ having norm 1 is attained when $x = \psi_1$ and equals $1/p_1$. If there is a positive eigenvalue $p_n$, then the maximum of $I\{x, x\}$ within the class of $x$ for which

(38)  $\|x\| = 1, \qquad (x, \psi_j) = 0 \quad (j = 1, \ldots, n - 1)$

is attained when $x = \psi_n$ and equals $1/p_n$.
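The first statement of Theorem 5 can be illustrated numerically. The following is a minimal sketch, not a method given in the text: it assumes the symmetric kernel $K(s, t) = \min(s, t)$ on $0 \le s, t \le 1$ (chosen because its eigenvalues are known to be $(n - \tfrac{1}{2})^2\pi^2$, so that $p_1 = \pi^2/4$), replaces the integrals by midpoint sums, and uses power iteration to find the function of norm 1 that maximizes $I\{x, x\}$. The function name `quadratic_form_max` and all parameters are hypothetical. Power iteration finds the eigenvalue of largest magnitude, which corresponds to $1/p_1$ only when, as here, that eigenvalue is positive.

```python
import math

def quadratic_form_max(kernel, a, b, n=100, iterations=60):
    """Estimate the maximum of I{x, x} over functions of norm 1.

    Integrals are replaced by midpoint sums on n cells (an assumption of
    this sketch); the maximizer is found by power iteration.
    """
    h = (b - a) / n
    t = [a + (i + 0.5) * h for i in range(n)]
    K = [[kernel(s, u) for u in t] for s in t]

    def apply_kernel(x):
        # Discrete version of the integral of K(s, t) x(t) dt.
        return [h * sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]

    x = [1.0] * n
    for _ in range(iterations):
        y = apply_kernel(x)
        norm = math.sqrt(h * sum(v * v for v in y))   # discrete norm of y
        x = [v / norm for v in y]

    Kx = apply_kernel(x)
    form = h * sum(x[i] * Kx[i] for i in range(n))    # discrete I{x, x}
    return form, t, x

form, t, x = quadratic_form_max(min, 0.0, 1.0)
p1_estimate = 1.0 / form          # to be compared with pi^2 / 4
```

Imposing the orthogonality conditions (38) on the iterates would, in the same way, give estimates of the later eigenvalues $p_2, p_3, \ldots$.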
If $K(s, t)$ is replaced by $-K(s, t)$, one obtains a characterization of the negative eigenvalues and corresponding eigenfunctions.

The characterization of eigenvalues in Theorem 5 is recursive; that is, in order to characterize $p_n$ and $\psi_n$ one has to know $\psi_1, \ldots, \psi_{n-1}$. A direct characterization is obtainable as follows. Let $M\{y_1, \ldots, y_{n-1}\}$ denote the least upper bound of $I\{x, x\}$ among all $x$ of norm 1 such that

(39) $(x, y_j) = 0 \quad (j = 1, \ldots, n - 1).$

It can be shown that, among all choices of $y_1 = y_1(t), \ldots, y_{n-1} = y_{n-1}(t)$, $M$ has its smallest value, namely $1/p_n$, when $y_1 = \psi_1, \ldots, y_{n-1} = \psi_{n-1}$. See Ref. 4, p. 132.

Rayleigh-Ritz Quotient. This is the quotient

(40) $Q\{x\} = I\{x, x\} \div \int_a^b \left[\int_a^b K(s, t)\,x(t)\,dt\right]^2 ds.$

Assume that there are at most a finite number of negative eigenvalues and assume all the eigenvalues are numbered so that $\lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots$. From eq. (33) one finds

(41) $I\{x, x\} = \sum_\alpha \lambda_\alpha \left(\dfrac{k_\alpha}{\lambda_\alpha}\right)^2 \ge \lambda_1 \sum_\alpha \left(\dfrac{k_\alpha}{\lambda_\alpha}\right)^2.$

From the expansion theorem of Sect. 5, with $G(t) = x(t)$, one deduces that

(42) $\int_a^b \left[\int_a^b K(s, t)\,x(t)\,dt\right]^2 ds = \sum_\alpha \left(\dfrac{k_\alpha}{\lambda_\alpha}\right)^2.$

From eqs. (40), (41), (42) one thus obtains the inequality

(43) $Q\{x\} \ge \lambda_1.$

Furthermore one can show that $Q\{x\}$ takes on its minimum $\lambda_1$ when $x = \varphi_1$. Thus the smallest eigenvalue and associated eigenfunction are obtainable by minimizing $Q\{x\}$. This is the basis of a very effective computational procedure.

The quotient $Q\{x\}$ can be written in another way, more familiar in the theory of differential equations. One sets

(44) $u(s) = \int_a^b K(s, t)\,x(t)\,dt,$

so that

(45) $Q\{x\} = \int_a^b u(t)\,x(t)\,dt \div \int_a^b u^2\,dt.$

The analogous definition, and integration by parts, for the problem of Sect. 2 leads to the expression

(46) $Q\{x\} = \dfrac{\iiint_D (u_x^2 + u_y^2 + u_z^2)\,dx\,dy\,dz}{\iiint_D u^2\,dx\,dy\,dz},$

where $u$ is the solution of the problem

(47) $u = 0$ on $B$.

7. APPROXIMATION METHODS

The first four methods to be described are devices for replacing the integral equation by a system of linear algebraic equations.

Approximation of Integrals.
Let a subdivision of the interval $a \le t \le b$ be given: $a = t_1 < t_2 < \cdots < t_n < t_{n+1} = b$, and let $\delta = \max\,(t_{j+1} - t_j)$. Then for continuous $h(t)$, the difference

$\int_a^b h(t)\,dt - \sum_{j=1}^{n} h(t_j)\,(t_{j+1} - t_j)$

can be made as small as desired, in absolute value, by making $\delta$ sufficiently small. Hence one can take the sum as approximation to the integral. If this is done for the Fredholm eq. (2), one obtains the approximating equation

(48) $x(s) - \lambda \sum_{j=1}^{n} K(s, t_j)\,x(t_j)\,(t_{j+1} - t_j) = f(s).$

If one now writes

(49) $x_i = x(t_i), \qquad b_i = f(t_i), \qquad a_{ij} = K(t_i, t_j)\,(t_{j+1} - t_j)$

for $i, j = 1, \ldots, n$, then at $s = t_i$ eq. (48) becomes

(50) $x_i - \lambda \sum_{j=1}^{n} a_{ij} x_j = b_i \quad (i = 1, \ldots, n).$

This is a system of linear equations for $x_1, \ldots, x_n$. A solution can be regarded as giving the values of the desired $x(s)$ at $t_1, \ldots, t_n$; one can interpolate linearly between these points to obtain an approximation to $x(s)$. The first proof by Fredholm of the main theorems of Sect. 3 was based on eq. (50) and subsequent passage to the limit ($n \to \infty$, $\delta \to 0$).

For numerical purposes the procedure may be improved by using better approximations for the integral such as those given by the trapezoidal rule, Simpson's rule or Gauss's quadrature (Ref. 9, Chap. 7). Each of these methods replaces the integral by a sum $\sum_j h(t_j) A_j$ with properly chosen abscissas $t_j$ and "weights" $A_j$. For more details and also the question of convergence, see Ref. 1 (pp. 105 ff.), Ref. 3 (pp. 437 ff.), Ref. 9 (p. 455).

Method of Degenerate Kernels. A kernel $A(s, t)$ is called degenerate if it can be written as a finite sum of products of a function of $s$ by a function of $t$; that is, if it is of the form

(51) $A(s, t) = \sum_{j=1}^{n} A_j(s)\,B_j(t).$

Every continuous kernel $K(s, t)$ can be approximated by a continuous degenerate kernel $A(s, t)$; that is, for every $\epsilon > 0$ there exists a continuous degenerate $A(s, t)$ such that $|K(s, t) - A(s, t)| < \epsilon$ for $a \le s \le b$, $a \le t \le b$. One therefore obtains an approximate solution of eq. (2) by replacing $K$ by $A$.
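The reduction of the Fredholm equation to the linear system (50) under "Approximation of Integrals" can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses trapezoidal weights in place of the simple sum (one of the improvements the text mentions), a small hand-written Gaussian elimination, and the hypothetical test problem $K(s, t) = st$, $f(s) = s$, $\lambda = 1$ on $0 \le s \le 1$, for which the exact solution is $x(s) = \tfrac{3}{2}s$. The names `solve_linear` and `fredholm_quadrature` are not from the source.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fredholm_quadrature(kernel, f, lam, a, b, n=40):
    """Approximate x(s) - lam * int K(s, t) x(t) dt = f(s) at grid nodes."""
    h = (b - a) / n
    t = [a + j * h for j in range(n + 1)]
    w = [h] * (n + 1)
    w[0] = w[-1] = h / 2                           # trapezoidal weights
    # Analogue of system (50): x_i - lam * sum_j a_ij x_j = b_i,
    # with a_ij = K(t_i, t_j) * w_j and b_i = f(t_i).
    A = [[(1.0 if i == j else 0.0) - lam * kernel(t[i], t[j]) * w[j]
          for j in range(n + 1)] for i in range(n + 1)]
    rhs = [f(ti) for ti in t]
    return t, solve_linear(A, rhs)

# Assumed test problem: K(s, t) = s t, f(s) = s, lam = 1; exact x(s) = 1.5 s.
t, x = fredholm_quadrature(lambda s, u: s * u, lambda s: s, 1.0, 0.0, 1.0)
max_error = max(abs(xi - 1.5 * ti) for ti, xi in zip(t, x))
```

With 41 nodes the computed values differ from $\tfrac{3}{2}s$ by a few parts in ten thousand; refining the subdivision reduces the error further, in line with the passage to the limit described above.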
For the question of convergence one is referred to Ref. 4 (pp. 118 ff.), Ref. 1 (Abschnitt IV), and Ref. 3 (p. 464).

If $K$ is replaced by $A$, the Fredholm eq. (2) is replaced by the equation

(52) $x(s) - \lambda \sum_{j=1}^{n} A_j(s) \int_a^b B_j(t)\,x(t)\,dt = f(s),$

whose solution is found by solving a system of linear equations. To see this, one multiplies eq. (52) by $B_i(s)$ and integrates with respect to $s$ from $a$ to $b$. With the notations

$\int_a^b A_j(t)\,B_i(t)\,dt = a_{ij}, \qquad \int_a^b f(t)\,B_i(t)\,dt = b_i, \qquad \int_a^b B_i(t)\,x(t)\,dt = x_i,$

one obtains the system

(53) $x_i - \lambda \sum_{j=1}^{n} a_{ij} x_j = b_i \quad (i = 1, \ldots, n).$

It can be verified that if $x_1, \ldots, x_n$ is a solution of this linear system, then

(54) $x(s) = f(s) + \lambda \sum_{j=1}^{n} x_j A_j(s)$

is a solution of eq. (52) and, conversely, every solution of eq. (52) is obtained in this way.

The Ritz-Galerkin Method. This is a method for finding approximations to the eigenvalues and eigenfunctions of the homogeneous eq. (3) with symmetric kernel. Let $\{v_\alpha\}$ be an orthonormal system. Such a system is called complete (in the class of continuous functions on the interval $a \le s \le b$) if for every continuous function $x(s)$ the sums

(55) $\sum_{i=1}^{n} x_i v_i(s), \qquad x_i = (x, v_i),$

converge "in the mean" to $x(s)$; that is, if

$\lim_{n \to \infty} \int_a^b \left[x(s) - \sum_{i=1}^{n} x_i v_i(s)\right]^2 ds = 0.$

EXAMPLE. The functions

$\dfrac{1}{\sqrt{2\pi}}, \quad \dfrac{\cos ns}{\sqrt{\pi}}, \quad \dfrac{\sin ns}{\sqrt{\pi}} \quad (n = 1, 2, \ldots)$

form a complete orthonormal system for $-\pi \le s \le \pi$; see Chap. 8, Sect. 8.

Now let $\varphi_1(s)$ be a normalized eigenfunction of eq. (3) corresponding to the smallest positive eigenvalue $\lambda_1$. (If there are no positive eigenvalues, one follows a similar procedure starting with the negative eigenvalue of smallest absolute value.) One now seeks an approximation $\varphi$ to $\varphi_1$ of form

(56) $\varphi = \sum_{i=1}^{n} c_i v_i(s),$

where $\{v_\alpha\}$ is a complete orthonormal system. In order to determine the $c_i$, note (Theorem 5, Sect. 6) that $1/\lambda_1$ is the maximum of $I\{x, x\}$ when $\|x\| = 1$, and that this maximum is reached for $x = \varphi_1$. Restricting attention now to functions of form (56), one finds

(57) $I\{\varphi, \varphi\} = \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} b_{\alpha\beta}\,c_\alpha c_\beta,$

with

(58) $b_{\alpha\beta} = \int_a^b \int_a^b K(s, t)\,v_\alpha(s)\,v_\beta(t)\,ds\,dt.$

The condition $\|\varphi\| = 1$ becomes

(59) $\sum_{i=1}^{n} c_i^2 = 1.$

Maximizing the quadratic form (57) with side condition (59) can be analyzed by the method of Lagrange multipliers (Ref. 4). One obtains the equations

(60) $c_i - \lambda \sum_{j=1}^{n} b_{ij} c_j = 0 \quad (i = 1, \ldots, n),$

which, together with eq. (59), determine the $c_i$ and $\lambda$. In particular, $\lambda$ is a root of the algebraic equation obtained by setting the determinant of eq. (60) equal to zero. If $\lambda_1^*$ is the smallest positive root of this equation, then $\lambda_1^*$ is an approximation to $\lambda_1$ and $\lambda_1^* \ge \lambda_1$; for $\lambda = \lambda_1^*$ eqs. (59) and (60) determine $c_1, \ldots, c_n$ and, by eq. (56), a desired approximation $\varphi$ of the eigenfunction $\varphi_1$.

Method of Enskog. The method will be discussed for the Fredholm eq. (2) with symmetric kernel, with $\lambda$ not an eigenvalue. (For less restrictive assumptions, see Ref. 7, p. 109.) It is based on a complete linearly independent system $v_1, v_2, \ldots$ with the additional property that the functions

(61) $y_n(s) = v_n(s) - \lambda \int_a^b K(s, t)\,v_n(t)\,dt$

are orthonormal and complete. Such a system can be constructed as follows: Let $w_1, w_2, \ldots$ be a complete linearly independent system (e.g., the system of sines and cosines given above). One then defines

(62) $z_n(s) = w_n(s) - \lambda \int_a^b K(s, t)\,w_n(t)\,dt.$

It can be proved that the $z_n$ are likewise linearly independent and complete. From the $z_n$ one constructs an equivalent orthonormal system $y_1, y_2, \ldots$ (Sect. 3), so that relations

(63) $y_n(s) = \sum_{m=1}^{n} c_{nm} z_m(s)$

hold, with constant $c_{nm}$. One now defines:

(64) $v_n(s) = \sum_{m=1}^{n} c_{nm} w_m(s).$

It follows from eq. (62) that eq. (61) holds. Moreover, the system $\{v_n\}$ is a complete linearly independent system.

Having a system $\{v_n\}$ of the properties indicated, one can find an approximate solution $x(s)$ of the Fredholm eq. (2) of form

$x(s) = \sum_{i=1}^{n} c_i y_i(s),$

with coefficients $c_i$ found as follows. Multiply eq.
(2) by $v_i(s)$ and integrate with respect to $s$ from $a$ to $b$, to obtain the relations:

$(f, v_i) = \int_a^b x(s)\,v_i(s)\,ds - \lambda \int_a^b \int_a^b K(s, t)\,x(t)\,v_i(s)\,dt\,ds = \int_a^b x(s) \left[v_i(s) - \lambda \int_a^b K(t, s)\,v_i(t)\,dt\right] ds.$

Because of the symmetry of the kernel, the expression in brackets is $y_i(s)$. Hence

$(f, v_i) = \int_a^b x(s)\,y_i(s)\,ds = c_i.$

Iteration is the basis of the following methods:

Successive Approximations. The Fredholm equation can be written in the form

(65) $x(s) = f(s) + \lambda \int_a^b K(s, t)\,x(t)\,dt.$

This form suggests defining successive approximations $x^{(i)}(s)$ to the solution $x(s)$ as follows:

(66) $x^{(0)}(s) = f(s), \qquad x^{(i+1)}(s) = f(s) + \lambda \int_a^b K(s, t)\,x^{(i)}(t)\,dt,$

where $i = 0, 1, 2, \ldots$. One can prove by induction that

(67) $x^{(n)}(s) = f(s) + \sum_{i=1}^{n} \lambda^i \int_a^b K^{(i)}(s, t)\,f(t)\,dt,$

where the so-called iterated kernels are defined by the relations

(68) $K^{(1)}(s, t) = K(s, t), \qquad K^{(i+1)}(s, t) = \int_a^b K(s, u)\,K^{(i)}(u, t)\,du,$

for $i = 1, 2, \ldots$. It can be proved that

(69) $x(s) = \lim_{n \to \infty} x^{(n)}(s) = f(s) + \sum_{i=1}^{\infty} \lambda^i \int_a^b K^{(i)}(s, t)\,f(t)\,dt$

exists if $|\lambda|$ is less than $[(b - a)\,\max |K(s, t)|]^{-1}$; the series, known as Neumann's series, converges uniformly for $a \le s \le b$. The function $x(s)$ defined by eq. (69) is the solution of (65) for $\lambda$ restricted as stated. For the Volterra eq. (3) the Neumann series converges for all $\lambda$ and the solution is valid for all $\lambda$.

The Schwarz Constants. Write $I\{x, x, K\}$ for the quadratic form $I\{x, x\}$ defined by eq. (32) to express more clearly the dependence on $K$. The Schwarz constants are then defined as follows:

(70) $a_0 = (x, x), \qquad a_i = I\{x, x, K^{(i)}\} \quad (i = 1, 2, \ldots),$

where the $K^{(i)}$ are defined by eq. (68). These constants (which obviously depend on the choice of the function $x$) are important for the theory as well as for estimating eigenvalues.
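The successive approximations (66) of the preceding subsection lend themselves to a short numerical sketch. The version below is an illustration under assumptions, not the text's procedure verbatim: integrals are replaced by trapezoidal sums, and the hypothetical test problem $K(s, t) = st$, $f(s) = s$, $\lambda = \tfrac12$ on $0 \le s \le 1$ is used, for which $|\lambda| < [(b - a)\max|K|]^{-1} = 1$ and the exact solution is $x(s) = 1.2\,s$. The function name `successive_approximations` is hypothetical.

```python
def successive_approximations(kernel, f, lam, a, b, n=40, steps=30):
    """Iterate x^(i+1) = f + lam * int K(s, t) x^(i)(t) dt on a grid."""
    h = (b - a) / n
    t = [a + j * h for j in range(n + 1)]
    w = [h] * (n + 1)
    w[0] = w[-1] = h / 2                  # trapezoidal weights
    fx = [f(s) for s in t]
    x = fx[:]                             # x^(0) = f
    for _ in range(steps):
        # The right-hand side uses the previous iterate x throughout;
        # the new list replaces x only after it is fully built.
        x = [fi + lam * sum(kernel(s, tj) * xj * wj
                            for tj, xj, wj in zip(t, x, w))
             for s, fi in zip(t, fx)]
    return t, x

# Assumed test problem: K(s, t) = s t, f(s) = s, lam = 0.5; exact x(s) = 1.2 s.
t, x = successive_approximations(lambda s, u: s * u, lambda s: s, 0.5, 0.0, 1.0)
max_error = max(abs(xi - 1.2 * ti) for ti, xi in zip(t, x))
```

For this kernel each step reduces the iteration error by a factor of about 6, so thirty steps leave only the quadrature error; a value of $\lambda$ near or beyond the stated bound would converge slowly or not at all.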
Note the following facts, supposing always that $K(s, t)$ is symmetric: If $P$ is an arbitrary real number, subject only to the restriction that

(71)

and

(72)

then the interval with end points $P$, $Q$ contains at least one eigenvalue provided that at least one of the following assumptions is satisfied: (a) $i$ is even, (b) $K$ is a positive definite kernel, that is, $I\{x, x\} > 0$ unless $x \equiv 0$. (For a proof, see Ref. 1, p. 30.) The quotients $Q$ are termed Temple quotients. Setting $P = 0$ in eq. (72) leads to consideration of the quotients $Q_i = a_{i-1}/a_i$. It can be shown that the sequence $|Q_{2i-1}|$ is monotone nonincreasing and converges to $|\lambda_1|$, where $\lambda_1$ is the eigenvalue of smallest absolute value. (For further applications of the Schwarz constants, see Refs. 1, 3, 9.)

Method of Steepest Descent. The basis of this method is the fact that $x$ is a solution of the Fredholm eq. (2) with symmetric kernel if and only if $x$ minimizes the expression

(73) $F\{x\} = \tfrac{1}{2}\left[(x, x) - \lambda \int_a^b \int_a^b K(s, t)\,x(s)\,x(t)\,ds\,dt\right] - (f, x).$

Let $x_0$ be a first approximation to $x$. One seeks a better approximation $x_1 = x_0 + h$, and tries to choose $h$ so that in going from $x_0$ to $x_1$ the value of $F$ descends as rapidly as possible. With the notation

(74) $L[x] = x - \lambda \int_a^b K(s, t)\,x(t)\,dt,$

one finds that

(75) $F\{x_0 + h\} = F\{x_0\} + (L[x_0] - f, h) + \tfrac{1}{2}(L[h], h).$

Now if $F$ were a function of a finite number of real variables, the analogue of the second term on the right side of eq. (75) would be the scalar product of grad $F$ with $h$. One therefore defines here

(76) $g[x] = L[x] - f$

as the gradient of $F$. This suggests that, as in the case of a function of a finite number of real variables, the direction of steepest descent is given by the negative gradient; this can be proved to be true. One therefore sets $h = -\alpha\,g[x_0]$, where $\alpha$ is a real constant to be determined. Replace $h$ by $-\alpha\,g[x_0]$ in eq. (75); then $F\{x_0 + h\}$ becomes a function of the real variable $\alpha$.
Now determine $\alpha$ by minimizing this function by the ordinary methods of calculus. The result for the desired next approximation $x_1 = x_0 + h$ is

(77) $x_1 = x_0 - \dfrac{\|g[x_0]\|^2}{(L[g[x_0]],\, g[x_0])}\; g[x_0].$

If one repeats the procedure starting with $x_1$ instead of $x_0$, one obtains a new approximation $x_2$; continuing thus, one obtains a sequence $x_1, x_2, \ldots, x_n, \ldots$. If the kernel $K$ is symmetric and positive definite and $|\lambda|$ is less than $|\lambda_\alpha|$ for every eigenvalue $\lambda_\alpha$, then $x_n$ converges in the mean to the solution $x$ of the Fredholm eq. (2). For proofs and details, see Ref. 8 (pp. 103 and 136). The method can also be applied to finding eigenvalues (Ref. 8, p. 142).

REFERENCES

1. H. Bückner, Die Praktische Behandlung von Integralgleichungen, Ergebnisse der Angewandten Mathematik, Vol. 1, Springer Verlag, Berlin, Göttingen, Heidelberg, 1952.
2. L. Collatz, Eigenwertprobleme und ihre numerische Behandlung, Chelsea Publishing Company, New York, 1948.
3. L. Collatz, Numerische Behandlung von Differentialgleichungen, Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen, Vol. LX, 2nd edition, Springer Verlag, Berlin, Göttingen, Heidelberg, 1955.
4. R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. 1, Interscience Publishers, New York-London, 1953.
5. Ph. Frank and R. v. Mises, Die Differential- und Integralgleichungen der Mechanik und Physik, 2nd edition, Vieweg, Braunschweig, 1930 (republished Rosenberg, New York, 1943).
6. E. Goursat, Cours d'Analyse Mathématique, Vol. 3, 3rd edition, Gauthier-Villars, Paris, 1923.
7. G. Hamel, Integralgleichungen, Einfuehrung in Lehre und Gebrauch, Springer, Berlin, 1937 (Edwards Brothers, Ann Arbor, Mich., 1946).
8. L. V. Kantorovich, Functional Analysis and Applied Mathematics, National Bureau of Standards Report 1509, 1952 [translated from Uspekhi Matemat. Nauk, 3 (6), 89-185 (1948)].
9. Z. Kopal, Numerical Analysis, Wiley, New York, 1955.
10. W.
Schmeidler, Integralgleichungen mit Anwendungen in Physik und Technik, I. Lineare Integralgleichungen, Akademische Verlagsgesellschaft Geest u. Portig, Leipzig, 1950.

A GENERAL MATHEMATICS

Chapter 7

Complex Variables

W. Kaplan

1. Functions of a Complex Variable
2. Analytic Functions. Harmonic Functions
3. Integral Theorems
4. Power Series. Laurent Series
5. Zeros. Singularities. Residues. Argument Principle
6. Analytic Continuation
7. Riemann Surfaces
8. Elliptic Functions
9. Functions Defined by Linear Differential Equations
10. Other Transcendental Functions
References

1. FUNCTIONS OF A COMPLEX VARIABLE

Complex Numbers. Throughout Chap. 7, $z = x + iy$ and $w = u + iv$ denote complex numbers; $i$ is the imaginary unit, $i^2 = -1$; $x$, $y$, $u$, $v$ are arbitrary real numbers; $x$ is the real part of $z$, $y$ the imaginary part of $z$:

(1) $x = \mathrm{Re}\,(x + iy), \qquad y = \mathrm{Im}\,(x + iy).$

The complex numbers $z$ can be represented geometrically by the points $(x, y)$ of an $xy$-plane (or $z$-plane), as in Fig. 1. The polar coordinates $(r, \theta)$ of $z$ are termed respectively the modulus (or absolute value) of $z$ and argument (or amplitude) of $z$:

(2) $r = |z| = \mathrm{mod}\ z; \qquad \theta = \arg z = \mathrm{amp}\ z; \qquad z = r(\cos\theta + i \sin\theta).$

The conjugate of $z = x + iy$ is:

(3) $\bar{z} = x - iy.$

Algebraic properties of complex numbers are discussed in Chap. 2. In general, complex numbers are combined as are real numbers, with the relation $i^2 = -1$ used to simplify the results. Addition is the same as vector addition (Fig. 1). Multiplication of $z_1$ by $z_2$ yields a number $z_1 \cdot z_2$ whose modulus is $|z_1| \cdot |z_2|$ and whose argument is $\arg z_1 + \arg z_2$.

FIG. 1. The complex z-plane.

Useful Rules.

(4) $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2, \qquad z + \bar{z} = 2\,\mathrm{Re}\,(z), \qquad z - \bar{z} = 2i\,\mathrm{Im}\,(z),$
$|z_1 + z_2| \le |z_1| + |z_2|,$
$z^n = [r(\cos\theta + i \sin\theta)]^n = r^n(\cos n\theta + i \sin n\theta), \qquad n = \pm 1, \pm 2, \ldots.$

Complex Functions.
By a function of the complex variable $z$ will be meant an assignment of a value $w$ to each $z$ of a certain set $D$ in the $z$-plane (see Chap. 1, Sects. 1 and 3); one then writes:

(5) $w = f(z).$

(Some formulas will assign several values of $w$ to each $z$ in $D$. One then speaks of a "multiple-valued function.") The set $D$ is generally an open region (e.g., interior of a circle); see Chap. 1, Sect. 8. From the equation $u + iv = f(x + iy)$ one deduces two equations of the form

(6) $u = u(x, y), \qquad v = v(x, y) \qquad ((x, y) \text{ in } D),$

and conversely a pair (6) of functions of two variables determines a complex function (5) of $z$.

Limits and continuity for complex functions are defined as for real functions. The phrase "$z$ approaches $z_0$" is interpreted to mean: $|z - z_0| \to 0$, or that the distance from $z$ to $z_0$ becomes arbitrarily small. The basic theorems on sums, products, quotients hold without change from the real case. Continuity of $w = f(z)$ is equivalent to continuity of both $u(x, y)$ and $v(x, y)$ in (6).

Each complex function $w = f(z)$ can be interpreted as a mapping (Chap. 10) of the set $D$ into a set $E$ in the $w$-plane. If $f(z)$ is continuous, then as $z$ traces a curve in the $z$-plane, $w$ traces a curve in the $w$-plane.

Derivatives of complex functions are defined as for real functions:

(7) $\dfrac{d}{dz} f(z) = f'(z) = \lim_{\Delta z \to 0} \dfrac{f(z + \Delta z) - f(z)}{\Delta z},$

and the formal rules of differentiation carry over. Higher derivatives $f''(z), \ldots$ are defined similarly.

Definite integrals of complex functions are defined as line integrals:

(8) $\int_C f(z)\,dz = \int_C (u + iv)(dx + i\,dy) = \int_C (u\,dx - v\,dy) + i \int_C (v\,dx + u\,dy).$

Here $C$ is a continuous path of finite length from $z_1$ to $z_2$. Again the formal rules carry over.

Examples of Complex Functions are the following:

(9)
polynomials: $w = a_0 z^n + \cdots + a_{n-1} z + a_n$,
rational functions: $w = \dfrac{a_0 z^n + \cdots + a_n}{b_0 z^m + \cdots + b_m}$,
exponential function: $w = e^z = e^x(\cos y + i \sin y) = \exp z$,
logarithm: $w = \log z = \log |z| + i \arg z \quad (z \ne 0)$,
power function: $w = z^a = \exp\,(a \log z)$,
trigonometric functions: $\sin z = \dfrac{e^{iz} - e^{-iz}}{2i}, \quad \cos z = \dfrac{e^{iz} + e^{-iz}}{2}$,
hyperbolic functions: $\sinh z = \dfrac{e^z - e^{-z}}{2}, \quad \cosh z = \dfrac{e^z + e^{-z}}{2}$,
inverse trigonometric functions: $\sin^{-1} z = \dfrac{1}{i} \log\,(iz \pm \sqrt{1 - z^2}), \quad \cos^{-1} z = \dfrac{1}{i} \log\,(z \pm i\sqrt{1 - z^2})$.

The logarithm is a multiple-valued function and can be made single-valued (so that continuity can be discussed) by properly restricting $z$ and the choice of $\arg z$. The principal value is:

(10) $\log z = \log |z| + i\theta \qquad (r > 0,\ -\pi < \theta \le \pi),$

a function continuous except for $\theta = \pi$. If $a$ is a rational number, $z^a$ has a finite number of values. For example, $z^{1/2} = \sqrt{z}$ has two values:

(11) $z^{1/2} = e^{\frac{1}{2} \log z} = e^{\frac{1}{2}(\log r + i \arg z)} = \sqrt{r}\; e^{\frac{1}{2} i \arg z} = \pm \sqrt{r}\; e^{i\theta/2},$

if $\theta$ is one choice of $\arg z$.

The familiar identities satisfied by the exponential function, logarithm, and trigonometric functions carry over:

(12) $e^{z_1 + z_2} = e^{z_1} e^{z_2}, \quad \log z^n = n \log z, \quad \sin\,(z_1 + z_2) = \sin z_1 \cos z_2 + \cos z_1 \sin z_2, \quad \sin^2 z + \cos^2 z = 1, \ \ldots.$

In the case of the logarithm, the identities are true only for proper choice of value of each logarithm concerned. The rules for differentiation also carry over:

(13) $\dfrac{d}{dz} \sin z = \cos z.$

2. ANALYTIC FUNCTIONS. HARMONIC FUNCTIONS

The function $w = f(z)$ is said to be analytic (regular, holomorphic) in an open region $D$ if it has a derivative $f'(z)$ in $D$. The function $f(z)$ is analytic in $D$ if and only if $u = \mathrm{Re}\,(f(z))$ and $v = \mathrm{Im}\,(f(z))$ have continuous first partial derivatives in $D$ and the Cauchy-Riemann equations hold in $D$:

(14) $\dfrac{\partial u}{\partial x} = \dfrac{\partial v}{\partial y}, \qquad \dfrac{\partial u}{\partial y} = -\dfrac{\partial v}{\partial x}.$

Furthermore, if $f(z)$ is analytic,

(15) $f'(z) = \dfrac{\partial u}{\partial x} + i \dfrac{\partial v}{\partial x} = \dfrac{\partial v}{\partial y} + i \dfrac{\partial v}{\partial x} = \cdots.$

If $f(z)$ is analytic in $D$, the derivatives of all orders of $f$, $u$, $v$ exist and are continuous in $D$. From eq. (14) one deduces that

(16) $\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = 0, \qquad \dfrac{\partial^2 v}{\partial x^2} + \dfrac{\partial^2 v}{\partial y^2} = 0;$

that is, $u$ and $v$ are harmonic functions.
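The Cauchy-Riemann equations (14) can be tested numerically at a point. The sketch below is an illustration only: it estimates the partial derivatives by central differences with an assumed step `h`, and compares $e^z$, which is analytic, with $\bar{z}$, which is not. The function name `cauchy_riemann_residuals` and the sample point are hypothetical.

```python
import cmath

def cauchy_riemann_residuals(f, z, h=1e-5):
    """Return (u_x - v_y, u_y + v_x), estimated by central differences.

    Both entries vanish (up to rounding) where eqs. (14) hold.
    """
    fx = (f(z + h) - f(z - h)) / (2 * h)            # d/dx of u + iv
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # d/dy of u + iv
    u_x, v_x = fx.real, fx.imag
    u_y, v_y = fy.real, fy.imag
    return u_x - v_y, u_y + v_x

z0 = 0.3 + 0.7j                                     # arbitrary sample point
r_exp = cauchy_riemann_residuals(cmath.exp, z0)
r_conj = cauchy_riemann_residuals(lambda z: z.conjugate(), z0)
```

For $e^z$ both residuals are zero to within the differencing error, while for $\bar{z} = x - iy$ the first residual is $u_x - v_y = 1 - (-1) = 2$, so $\bar{z}$ is nowhere analytic.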
Relations (14) are described by the statement: "$u$ and $v$ form a pair of conjugate harmonic functions." One says "$v$ is conjugate to $u$," but should note that $u$ is conjugate to $-v$.

In polar coordinates (14) and (15) become

(17) $\dfrac{\partial u}{\partial r} = \dfrac{1}{r} \dfrac{\partial v}{\partial \theta}, \qquad \dfrac{1}{r} \dfrac{\partial u}{\partial \theta} = -\dfrac{\partial v}{\partial r},$

(18) $f'(z) = e^{-i\theta} \left(\dfrac{\partial u}{\partial r} + i \dfrac{\partial v}{\partial r}\right).$

All the functions (9) are analytic, provided the logarithms are restricted so as to be continuous and division by zero is excluded. A function analytic for all $z$ is called an entire function or an integral function; examples are polynomials and $e^z$.

A function cannot be analytic merely at a single point or merely along a curve. The definition requires always that analyticity holds in an open region. The phrases "analytic at $z_0$" or "analytic along curve $C$" are understood to mean "analytic in an open region containing $z_0$" or "analytic in an open region containing curve $C$."

If $f$ is analytic and not constant in an open region $D$, then the values $w = f(z)$ form an open region in the $w$-plane.

3. INTEGRAL THEOREMS

The open region $D$ is termed simply connected if every simple closed path $C$ in $D$ (Fig. 2) has its interior in $D$. If $D$ is not simply connected it is multiply connected; for example, the region between two concentric circles is multiply connected; it is doubly connected, because its boundary is formed of two pieces or "components."

FIG. 2. Simply connected region.

All paths in the following line integrals are assumed to be "rectifiable," i.e., to have finite length.

CAUCHY INTEGRAL THEOREM. If $f(z)$ is analytic in a simply connected open region $D$, then

$\oint_C f(z)\,dz = 0$

on every simple closed path $C$ in $D$ or, equivalently, $\int f(z)\,dz$ is independent of path in $D$.

MORERA'S THEOREM (converse of Cauchy theorem). If $f(z)$ is continuous in the open region $D$ and

$\oint_C f(z)\,dz = 0$

on every simple closed path $C$ in $D$, then $f(z)$ is analytic in $D$.

An indefinite integral of $f(z)$ is a function $F(z)$ whose derivative is $f(z)$.
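The Cauchy integral theorem stated above can be illustrated by approximating a line integral around a circle. This is a minimal sketch under assumptions: the path is the unit circle, the integral is replaced by an equally spaced sum in the angle, and the function name `circle_integral` and the point count are hypothetical. It contrasts $e^z$, analytic in a simply connected region containing the circle, with $1/z$, which fails to be analytic at the center.

```python
import cmath
import math

def circle_integral(f, center=0j, radius=1.0, n=2000):
    """Approximate the counterclockwise integral of f around a circle."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = center + radius * cmath.exp(1j * theta)
        # dz = i r e^{i theta} d(theta), with d(theta) = 2 pi / n
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / n)
        total += f(z) * dz
    return total

analytic = circle_integral(cmath.exp)          # analytic inside and on C
punctured = circle_integral(lambda z: 1 / z)   # not analytic at z = 0
```

The first value is zero to rounding error, as the theorem requires; the second is $2\pi i$, which shows that the hypothesis on the region cannot be dropped.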
If $f(z)$ is continuous in $D$ and has an indefinite integral $F(z)$, then

(19) $\int_{z_1}^{z_2} f(z)\,dz = F(z_2) - F(z_1);$

in particular, the integral is independent of path, so that $f(z)$ must be analytic; since $F'(z) = f(z)$, $F(z)$ must also be analytic. If $f(z)$ is a given analytic function in $D$, then existence of an indefinite integral of $f(z)$ can be proved, provided $D$ is simply connected. In particular,

(20) $F(z) = \int_{z_0}^{z} f(z)\,dz \qquad (z_0 \text{ in } D)$

has meaning, since the integral is independent of path, and $F'(z) = f(z)$, so that $F$ is an indefinite integral.

CAUCHY INTEGRAL FORMULAS. Let $f(z)$ be analytic in $D$. Let $C$ be a simple closed path in $D$ and having its interior in $D$. Let $z_0$ be interior to $C$ (Fig. 2). Then

(21) $f(z_0) = \dfrac{1}{2\pi i} \oint_C \dfrac{f(z)}{z - z_0}\,dz, \qquad f'(z_0) = \dfrac{1}{2\pi i} \oint_C \dfrac{f(z)}{(z - z_0)^2}\,dz, \ \ldots, \qquad f^{(n)}(z_0) = \dfrac{n!}{2\pi i} \oint_C \dfrac{f(z)}{(z - z_0)^{n+1}}\,dz.$

At the heart of this theorem is the special case $f(z) \equiv 1$:

$\oint_C \dfrac{dz}{z - z_0} = 2\pi i, \qquad \oint_C \dfrac{dz}{(z - z_0)^n} = 0, \quad n = 2, 3, \ldots.$

Cauchy's theorem and integral formulas can be extended to multiply connected domains. Let $D$ be a domain bounded by curves $C_1, C_2, \ldots, C_k$ as in Fig. 3.

FIG. 3. Multiply connected region.

Let $f(z)$ be analytic in a somewhat larger region, including all of $D$ and its boundary. Then

(22) $\oint_{C_1} f(z)\,dz + \oint_{C_2} f(z)\,dz + \cdots + \oint_{C_k} f(z)\,dz = 0;$

that is, the integral of $f(z)$ around the complete boundary $B$ of $D$ is zero, provided one integrates on the boundary in the direction which keeps the region $D$ "on the left":

(23) $\oint_B f(z)\,dz = 0.$

Under the same conditions, if $z_0$ is in $D$,

(24) $f(z_0) = \dfrac{1}{2\pi i} \oint_B \dfrac{f(z)}{z - z_0}\,dz, \qquad f^{(n)}(z_0) = \dfrac{n!}{2\pi i} \oint_B \dfrac{f(z)}{(z - z_0)^{n+1}}\,dz.$

CAUCHY INEQUALITIES. Under the hypotheses stated above for eqs. (21), let $|f(z)| \le M$ on $C$ and let $C$ be a circle with center $z_0$ and radius $R$. Then

(25) $|f^{(n)}(z_0)| \le \dfrac{M\,n!}{R^n} \qquad (n = 0, 1, 2, \ldots).$

LIOUVILLE THEOREM. If $f(z)$ is analytic for all finite $z$ and $|f(z)| \le M$, where $M$ is a constant, for all $z$, then $f(z)$ is identically constant.

MAXIMUM PRINCIPLE. Let $f(z)$ be analytic in the open region $D$.
If $|f(z)|$ has a weak relative maximum at a point $z_0$ of $D$ (that is, if $|f(z)| \le |f(z_0)|$ for $z$ sufficiently close to $z_0$), then $f(z)$ is identically constant.

For proofs of these theorems see Refs. 2, 3, 8.

4. POWER SERIES. LAURENT SERIES

Infinite series whose terms are complex numbers are defined as for real numbers and, in general, the theory of convergence is the same. In particular, a series $\sum_{n=1}^{\infty} b_n$ of complex numbers is termed absolutely convergent if the series of real numbers $\sum |b_n|$ converges. Absolute convergence implies convergence.

Power Series. A power series in $z$ has the form

(26) $\sum_{n=0}^{\infty} c_n (z - z_0)^n,$

where $z_0$ is fixed. Each such series has a radius of convergence $\rho$, $0 \le \rho \le +\infty$. If $\rho = 0$, the series converges only for $z = z_0$. Otherwise, the series converges (in fact, absolutely) for $|z - z_0| < \rho$, i.e., inside the circle of convergence (whose radius $\rho$ may be infinite). Outside this circle, for $|z - z_0| > \rho$, the series diverges. On the circle $|z - z_0| = \rho$, the series may converge at some points and diverge at others. The radius can be evaluated by the formulas:

(27) $\rho = \lim_{n \to \infty} \left|\dfrac{c_n}{c_{n+1}}\right|, \qquad \rho = \lim_{n \to \infty} \dfrac{1}{\sqrt[n]{|c_n|}},$

provided the limit exists, and in any case by the formula

(28) $\rho = \liminf_{n \to \infty} \dfrac{1}{\sqrt[n]{|c_n|}},$

where $\liminf$ denotes the lower limit.

Let the power series (26) have radius of convergence $\rho > 0$, so that its sum is a well-defined function $f(z)$ inside the circle of convergence. One can then prove that the series converges uniformly to $f(z)$ in each circle $|z - z_0| \le \rho' < \rho$, so that $f(z)$ is continuous. Furthermore, the differentiated series $\sum n c_n (z - z_0)^{n-1}$ converges uniformly in each circle $|z - z_0| \le \rho' < \rho$. From this it follows that the differentiated series converges to $f'(z)$ and that $f'(z)$ is continuous. Hence $f(z)$ is itself analytic for $|z - z_0| < \rho$. Every power series defines an analytic function inside its circle of convergence.
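The two formulas (27) for the radius of convergence can be compared on a concrete coefficient sequence. The sketch below is illustrative only: it assumes the coefficients $c_n = 1/2^{n+1}$ (the Taylor coefficients of $1/(2 - z)$ about $z_0 = 0$, so that $\rho = 2$), and evaluates each expression at a finite index instead of taking the limit. The names `ratio_estimate`, `root_estimate`, and `coeff` are hypothetical.

```python
def ratio_estimate(c, n):
    """|c_n / c_(n+1)|, the first expression in (27) at a finite n."""
    return abs(c(n) / c(n + 1))

def root_estimate(c, n):
    """1 / |c_n|^(1/n), the second expression in (27) at a finite n."""
    return 1.0 / abs(c(n)) ** (1.0 / n)

def coeff(n):
    """Assumed example: c_n = 1 / 2^(n+1), for which rho = 2."""
    return 1.0 / 2 ** (n + 1)

rho_ratio = ratio_estimate(coeff, 50)
rho_root = root_estimate(coeff, 200)
```

For this sequence the ratio equals 2 at every index, while the root expression equals $2^{1 + 1/n}$ and approaches 2 only slowly; this difference in speed is typical, although the root form (28) is the one that applies in every case.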
In general, all derivatives of $f(z)$ can be evaluated by repeated differentiation of the series. One hence concludes that

(29) $c_n = \dfrac{f^{(n)}(z_0)}{n!};$

that is, the power series is the Taylor series of $f(z)$. From this it follows that equality of the sum of two power series:

$\sum_{n=0}^{\infty} c_n (z - z_0)^n = \sum_{n=0}^{\infty} C_n (z - z_0)^n, \qquad |z - z_0| < \rho,$

implies equality of corresponding coefficients: $c_n = C_n$ $(n = 0, 1, 2, \ldots)$.

Now let $f(z)$ be given as an analytic function in an open region $D$ of arbitrary shape and let $z_0$ be a point of $D$. With $z_0$ as center one can then construct a circle of maximum radius $r_0$ having its interior in $D$ (Fig. 4).

FIG. 4. Taylor series expansion.

Within this circle $f(z)$ can be represented by a power series, its Taylor series about $z_0$:

(30) $f(z) = \sum_{n=0}^{\infty} \dfrac{f^{(n)}(z_0)}{n!} (z - z_0)^n, \qquad |z - z_0| < r_0;$

the series may have a radius of convergence $\rho$ larger than $r_0$. From this theorem one deduces the following expansions:

(31)
$e^z = \sum_{n=0}^{\infty} \dfrac{z^n}{n!}$, all $z$;
$\sin z = z - \dfrac{z^3}{3!} + \cdots$, all $z$;
$\cos z = 1 - \dfrac{z^2}{2!} + \cdots$, all $z$;
$\log\,(1 + z) = z - \dfrac{z^2}{2} + \dfrac{z^3}{3} - \cdots$, $|z| < 1$;
$\dfrac{1}{1 - z} = 1 + z + z^2 + \cdots$, $|z| < 1$;
$(1 + z)^k = 1 + kz + \dfrac{k(k - 1)}{2!} z^2 + \cdots$, $|z| < 1$.

Laurent Series. A series of form

$\sum_{n=1}^{\infty} \dfrac{b_n}{(z - z_0)^n}$

is reducible by the substitution $z' = 1/(z - z_0)$ to the form of an ordinary power series and accordingly converges for $|z'| < \rho$, i.e., for $|z - z_0| > \rho_1 = 1/\rho$. If now a series $\sum_{n=0}^{\infty} a_n (z - z_0)^n$ converges for $|z - z_0| < \rho_2$ and $\rho_1 < \rho_2$, then the sum

$\sum_{n=1}^{\infty} \dfrac{b_n}{(z - z_0)^n} + \sum_{n=0}^{\infty} a_n (z - z_0)^n$

has meaning for $\rho_1 < |z - z_0| < \rho_2$, that is, in a certain annular region $D$ (Fig. 5). Here $\rho_1$ may be 0 and $\rho_2$ may be $+\infty$. Let the sum be $f(z)$, so that $f(z)$ is analytic in $D$. If one writes $b_n = a_{-n}$ $(n = 1, 2, \ldots)$, then one has

(32) $f(z) = \sum_{n=-\infty}^{-1} a_n (z - z_0)^n + \sum_{n=0}^{\infty} a_n (z - z_0)^n = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n.$

This series is termed the Laurent expansion of $f(z)$.
The coefficients can be shown to be uniquely determined as follows:

(33) $a_n = \dfrac{1}{2\pi i} \oint_C \dfrac{f(z)}{(z - z_0)^{n+1}}\,dz, \qquad n = 0, \pm 1, \pm 2, \ldots,$

where $C$ is any path about the ring, as in Fig. 5.

FIG. 5. Laurent expansion.

If $f(z)$ is an arbitrary function analytic in a ring domain $D$, then one can compute the coefficients $a_n$ by eq. (33) and form the Laurent series, which will then converge to $f(z)$ in $D$. In practice there are easier ways of obtaining the coefficients. One way is to write $f(z)$ as the sum of two functions $f_2(z)$, $f_1(z)$, the first analytic for $|z - z_0| < \rho_2$, the second analytic for $|z - z_0| > \rho_1$ and approaching 0 as $|z| \to \infty$. Under the substitution $\zeta = 1/(z - z_0)$, $f_1(z)$ becomes a function of $\zeta$ analytic for $|\zeta| < 1/\rho_1$, so that $f_1(z) = \sum_{n=1}^{\infty} b_n \zeta^n$ or

(34) $f_1(z) = \sum_{n=1}^{\infty} \dfrac{b_n}{(z - z_0)^n}, \qquad |z - z_0| > \rho_1.$

For $f_2(z)$ one has a Taylor series about $z_0$. Addition of the two series provides the desired Laurent series. For example, if $f(z) = 1/[(z - 1)(z - 2)]$ and $D$ is the ring $1 < |z| < 2$, then one can choose $f_1(z) = -1/(z - 1)$, $f_2(z) = 1/(z - 2)$, so that $f_1(z) = -\sum_{n=1}^{\infty} z^{-n}$ for $|z| > 1$ and $f_2(z) = -\sum_{n=0}^{\infty} z^n / 2^{n+1}$ for $|z| < 2$.

5. ZEROS. SINGULARITIES. RESIDUES. ARGUMENT PRINCIPLE

Zeros. Let $f(z)$ be analytic in domain $D$ and let $f(z_0) = 0$. Then $z_0$ is called a zero of $f(z)$. If $f(z)$ is not identically zero, then each zero has a definite order (or multiplicity) $n$, a positive integer, and $f(z) = (z - z_0)^n g(z)$ where $g$ is analytic in $D$ and $g(z_0) \ne 0$. The order $n$ is the smallest value of $k$ such that $f^{(k)}(z_0) \ne 0$. If $f(z)$ is not identically zero, then each zero of $f(z)$ is isolated; that is, for each zero $z_0$ one can choose a circular region $|z - z_0| < a$ containing no other zero.

Singularities. Let $f(z)$ be not identically zero and have a zero of order $n$ at $z_0$. Then $h(z) = 1/f(z)$ is analytic in some circular region $|z - z_0| < a$ except at the center $z_0$. By definition, $h(z)$ has a pole of order $n$ at $z_0$. One can write: $h(z) = (z - z_0)^{-n} p(z)$, where $p(z)$ is analytic for $|z - z_0| < a$ and $p(z_0) \ne 0$.
Since $f(z_0) = 0$, $|h(z)| \to \infty$ as $z \to z_0$. One conventionally assigns the value $\infty$ to $h(z)$ at $z_0$.

In general, let $f(z)$ be analytic in a punctured disk $0 < |z - z_0| < a$, but not at $z_0$. Then $f(z)$ is said to have an isolated singularity at $z_0$. One can form a Laurent expansion of $f(z)$ in the ring domain $\rho_1 = 0 < |z - z_0| < a = \rho_2$. Three cases can then arise.

I. No negative powers in the Laurent series. Then
$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n,$$
so that $f(z)$ can be treated as a function analytic for $|z - z_0| < a$ without exception. The singularity is termed removable. The new value of $f(z)$ at $z_0$ is $a_0 = \lim_{z \to z_0} f(z)$.

II. A finite number of negative powers in the Laurent series. Here, for proper choice of $N$,
$$f(z) = \frac{a_{-N}}{(z - z_0)^N} + \cdots + \frac{a_{-1}}{z - z_0} + a_0 + a_1 (z - z_0) + \cdots = \frac{g(z)}{(z - z_0)^N}. \tag{35}$$
Hence $f(z)$ has a pole of order $N$ at $z_0$.

III. Infinitely many negative powers in the Laurent series. In this case $f(z)$ is said to have an essential singularity at $z_0$.

By a theorem of Riemann, the three cases can be distinguished as follows:

I. $|f(z)|$ is bounded for $0 < |z - z_0| < b$ for some $b$.
II. $|f(z)| \to \infty$ as $z \to z_0$.
III. Neither $|f(z)|$ nor $1/|f(z)|$ is bounded in each punctured disk $0 < |z - z_0| < b$.

In Case III, by a theorem of Weierstrass and Casorati, $f(z)$ comes arbitrarily close to every complex number in every neighborhood of $z_0$.

If $f(z)$ is analytic for $|z| > R$, then $f(z)$ is considered to have an isolated singularity at $z = \infty$. A Laurent expansion is available, with $\rho_1 = R$ and $\rho_2 = \infty$. The classification is similar to the above, with "negative" replaced by "positive." Also the type of singularity of $f(z)$ at $\infty$ is the same as that of $f(1/z)$ at $z = 0$.

A function analytic for all finite $z$ except for poles is termed a meromorphic function.

Residues. The residue of $f(z)$ at an isolated singularity $z_0$ is defined as
$$\operatorname{Res}[f(z), z_0] = \frac{1}{2\pi i} \oint_C f(z)\,dz, \tag{36}$$
where $C$ is a circle $|z - z_0| = c$, enclosing no singularity other than $z_0$, and the integration is in the counterclockwise direction. The residue of $f(z)$ at $z = \infty$, denoted by $\operatorname{Res}[f(z), \infty]$, is defined by the same integral, where $C$ is a circle $|z| = c$ outside of which $f(z)$ has no singularity other than $\infty$ and where the integration is in the clockwise direction. If $z_0$ is finite,
$$\operatorname{Res}[f(z), z_0] = a_{-1}, \tag{37}$$
where $a_{-1}$ is the coefficient of $(z - z_0)^{-1}$ in the Laurent expansion about $z_0$. If $z_0$ is $\infty$,
$$\operatorname{Res}[f(z), \infty] = -a_{-1}, \tag{38}$$
where $a_{-1}$ is the coefficient of $z^{-1}$ in the Laurent expansion of $f(z)$ for $|z| > R$.

The CAUCHY RESIDUE THEOREM asserts that, if $f(z)$ is analytic in an open region containing the path $C$, then
$$\oint_C f(z)\,dz = 2\pi i \cdot (\text{sum of residues of } f(z) \text{ inside } C), \tag{39}$$
the integration being counterclockwise, provided $f(z)$ is analytic inside $C$ except for a finite number of isolated singularities. Similarly,
$$\oint_C f(z)\,dz = 2\pi i \cdot (\text{sum of residues of } f(z) \text{ outside } C, \text{ including } \operatorname{Res}[f(z), \infty]), \tag{40}$$
the integration being clockwise, provided $f(z)$ is analytic outside $C$ except for a finite number of isolated singularities. Hence, if $f(z)$ is analytic for all $z$, except for a finite number of singularities, the sum of all residues of $f(z)$, including that at $\infty$, is 0.

Calculation of residues may be simplified by the following rules:

1. At a pole $z_0$ of first order,
$$\operatorname{Res}[f(z), z_0] = \lim_{z \to z_0} (z - z_0) f(z). \tag{41}$$

2. At a pole $z_0$ of order $N$ ($N = 2, 3, \ldots$),
$$\operatorname{Res}[f(z), z_0] = \lim_{z \to z_0} \frac{g^{(N-1)}(z)}{(N - 1)!}, \tag{42}$$
where $g(z) = (z - z_0)^N f(z)$.

3. Let
$$f(z) = \frac{A(z)}{B(z)}, \tag{43}$$
where $A(z)$ and $B(z)$ are analytic at $z_0$. If $A(z_0) \neq 0$ and $B(z)$ has a zero of first order at $z_0$, then
$$\operatorname{Res}[f(z), z_0] = \frac{A(z_0)}{B'(z_0)}. \tag{44}$$
If $A(z_0) \neq 0$ and $B(z)$ has a zero of second order at $z_0$, then
$$\operatorname{Res}[f(z), z_0] = \frac{6A'(z_0)B''(z_0) - 2A(z_0)B'''(z_0)}{3[B''(z_0)]^2}. \tag{45}$$
If $A(z_0) \neq 0$ and $B(z)$ has a zero of third order at $z_0$, then
$$\operatorname{Res}[f(z), z_0] = \frac{120A''B'''^2 - 60A'B'''B^{\mathrm{iv}} - 12AB'''B^{\mathrm{v}} + 15A(B^{\mathrm{iv}})^2}{40B'''^3}, \tag{46}$$
where all quantities are evaluated at $z_0$. If $A(z)$ has a first order zero at $z_0$ and $B(z)$ a second order zero, then
$$\operatorname{Res}[f(z), z_0] = \frac{2A'(z_0)}{B''(z_0)}. \tag{47}$$

ARGUMENT PRINCIPLE. Let $f(z)$ be analytic in an open region $D$ containing the simple closed path $C$; let $f(z)$ have at most a finite number of singularities inside $C$, all of which are poles, and let $f(z) \neq 0$ on $C$. Then
$$\frac{1}{2\pi i} \oint_C \frac{f'(z)}{f(z)}\,dz = \text{number of zeros of } f \text{ inside } C - \text{number of poles of } f \text{ inside } C, \tag{48}$$
where zeros and poles are counted according to multiplicity.

The left-hand side of eq. (48) is termed the logarithmic residue of $f(z)$ on $C$. It can be written as
$$\frac{1}{2\pi i} \oint_C d \log f.$$
As $z$ traces $C$, $w = f(z)$ traces a path $C_w$ in the $w$-plane. The integral $\oint d \log f(z)$ equals $i$ times the total change in the argument of $w$ as the path $C_w$ is traced. Hence it equals $2\pi i$ times the "winding number" of $C_w$ about $w = 0$, i.e., the number of times that $C_w$ effectively winds about $w = 0$ in the positive direction.

THE FUNDAMENTAL THEOREM OF ALGEBRA (see Chap. 2). From the argument principle one deduces that every polynomial in $z$ of degree $N$ has precisely $N$ zeros in the complex plane.

ROUCHÉ'S THEOREM may also be deduced: if both $f_1(z)$ and $f_2(z)$ are analytic in a simply connected open region containing the simple closed path $C$ and $|f_1(z) - f_2(z)| < |f_2(z)|$ on $C$, then $f_1(z)$ and $f_2(z)$ have the same number of zeros inside $C$.

Evaluation of Definite Integrals by Residues. A great variety of definite integrals can be evaluated with the aid of residues. For example, if $R(u, v)$ is a rational function of $u$ and $v$, then
$$\int_0^{2\pi} R(\sin\theta, \cos\theta)\,d\theta = \oint_{|z| = 1} R\!\left(\frac{z^2 - 1}{2iz}, \frac{z^2 + 1}{2z}\right) \frac{dz}{iz}, \tag{49}$$
and the integral on the right can be computed by residues.
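As an informal check of eq. (49), the following sketch evaluates $\int_0^{2\pi} d\theta/(2 + \cos\theta)$ in two ways: from the residue at the single pole inside $|z| = 1$, using rule (44), and by a direct numerical sum. The particular integrand, the function names, and the use of Python are illustrative assumptions and are not taken from the handbook.

```python
import math

def by_residues():
    """Integral of 1/(2 + cos(theta)) over [0, 2*pi] via eq. (49).

    With z = exp(i*theta): cos(theta) = (z**2 + 1)/(2*z), d(theta) = dz/(i*z),
    so the integrand becomes A(z)/B(z) with A = 2/i and B = z**2 + 4*z + 1.
    """
    z0 = -2.0 + math.sqrt(3.0)          # the only zero of B inside |z| = 1
    a_at_z0 = 2.0 / 1j                  # A(z0); A is constant here
    b_prime_at_z0 = 2.0 * z0 + 4.0      # B'(z0)
    residue = a_at_z0 / b_prime_at_z0   # rule (44) for a first-order pole
    return 2j * math.pi * residue       # residue theorem, eq. (39)

def by_quadrature(n=2000):
    """Equal-weight sum over one full period (hypothetical helper for comparison)."""
    h = 2.0 * math.pi / n
    return h * sum(1.0 / (2.0 + math.cos(k * h)) for k in range(n))

if __name__ == "__main__":
    print(by_residues().real, by_quadrature())
```

Both routes should agree with the closed form $2\pi/\sqrt{3}$; the other zero of $B$, at $-2 - \sqrt{3}$, lies outside the unit circle and contributes nothing.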
Also, in general,
$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i\,\{\text{sum of residues of } f(z) \text{ in the half-plane } y > 0\}, \tag{50}$$
provided $f(z)$ is analytic for $y \geq 0$ except for a finite number of points in $y > 0$, [...] $g(z)$ is rational, and $g(z)$ has a zero at $\infty$. For further applications one is referred to Chap. VI of Ref. 12.

6. ANALYTIC CONTINUATION

Let $f_1(z)$ be analytic in the open region $D_1$, $f_2(z)$ in $D_2$. If $D_2$ and $D_1$ have a common part and $f_1(z) = f_2(z)$ in that common part, then $f_2(z)$ is said to be a direct analytic continuation of $f_1(z)$ from $D_1$ to $D_2$. Given $f_1(z)$, $D_1$, $D_2$, the function $f_2(z)$ may or may not exist; however, if it does exist, there can be only one such function (uniqueness of analytic continuation).

Let $D_1, D_2, \ldots, D_n$ be regions such that each has a common part with the next and let $f_j(z)$ be analytic in $D_j$ ($j = 1, \ldots, n$). If $f_j(z) = f_{j+1}(z)$ ($j = 1, \ldots, n - 1$) in the common part of $D_j$, $D_{j+1}$, then one says that $f_1(z)$ has been continued analytically from $D_1$ to $D_n$ via $D_2, \ldots, D_{n-1}$ and calls $f_n(z)$ an (indirect) analytic continuation of $f_1(z)$. Given $f_1(z)$ and the regions $D_1, \ldots, D_n$, there is at most one analytic continuation of $f_1(z)$ to $D_n$ via $D_2, \ldots, D_{n-1}$. There may exist other continuations of $f_1(z)$ to $D_n$ via other chains of regions.

Given a function $f(z)$ analytic in region $D$, one can form all possible continuations of $f(z)$ to other regions. The totality of such continuations is said to form an analytic function in the broad sense (Weierstrassian analytic function). In this sense $\log z$, $\sqrt{z}$, $\sin^{-1} z$ can each be considered as one analytic function. The importance of the concept is illustrated by the fact that every identity satisfied by $f(z)$ will be satisfied by all its analytic continuations. The term "identity" includes linear differential equations with polynomial coefficients.

Example of Analytic Continuation. The functions
$$f_1(z) = \sum_{n=0}^{\infty} \frac{z^n}{2^{n+1}}, \quad |z| < 2; \qquad f_2(z) = \sum_{n=0}^{\infty} \frac{(z + 1)^n}{3^{n+1}}, \quad |z + 1| < 3,$$
are analytic continuations of each other.
Indeed, both are power series expansions of $f(z) = 1/(2 - z)$ and have the same sum for $|z| < 2$. Also $f_2(z)$ can be regarded as the Taylor series of $f_1(z)$ about $z = -1$. This series happens to converge outside of $|z| < 2$ and hence provides an analytic continuation.

Analytic Continuation from Reals. Let $f_1(z)$ be defined only for $y = 0$, $a < x < b$, i.e., only when $z$ is real and between $a$ and $b$. Let $f_2(z)$ be analytic in an open region $D$ which includes the interval of definition of $f_1(z)$. If $f_2(z) = f_1(z)$ on this interval, then $f_2(z)$ is said to be an analytic continuation of $f_1(z)$ from reals. Again continuation, if possible, is unique.

Examples. $e^z$ as a continuation of $e^x$, $\sin z$ as a continuation of $\sin x$, $\log z$ as a continuation of $\log x$.

7. RIEMANN SURFACES

The function $w = z^{1/2}$ can be considered as an analytic function in the broad sense; that is, it is formed of several functions which are analytic continuations of each other. The resulting totality has the defect that it is two-valued: for each $z$, there are two possible values (except for $z = 0$). To remedy this defect one regards $w = z^{1/2}$ as a function defined not in the $z$-plane, but on a Riemann surface over the $z$-plane.

In this case, the Riemann surface can be constructed as follows. One takes two copies of the $z$-plane, calling them Sheet I and Sheet II. Each sheet is considered as cut open along a branch line, the positive real axis. Sheet II is placed directly over Sheet I, with axes in the same position, and then the two sheets are attached by joining the upper edge of the cut line of each sheet to the lower edge of the cut line of the other, as suggested in Fig. 6. Unfortunately this cannot be carried out in space.

FIG. 6. Riemann surface of $z^{1/2}$.

FIG. 7. Branch line for $w = z^{1/2}$.

For each point in the $z$-plane, one has then two points in the Riemann surface, one in each sheet.
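The two-valuedness that the two sheets are meant to resolve can be observed numerically. The sketch below follows $w = z^{1/2}$ continuously around the unit circle, always keeping whichever of the two roots is closer to the previous value. The function name, the step count, and the use of Python are assumptions made for illustration; the library routine `cmath.sqrt` places its own cut on the negative real axis, which does not matter here because either root may be selected at each step.

```python
import cmath
import math

def follow_sqrt(turns, steps_per_turn=720):
    """Continue w = z**(1/2) along |z| = 1, starting from z = 1 with w = 1.

    At each step the root closest to the previous value is kept, so the
    value of w varies continuously as the path is followed around z = 0.
    """
    w = 1.0 + 0.0j
    for k in range(1, turns * steps_per_turn + 1):
        z = cmath.exp(2j * math.pi * k / steps_per_turn)
        root = cmath.sqrt(z)            # one of the two values; the other is -root
        w = root if abs(root - w) <= abs(root + w) else -root
    return w

if __name__ == "__main__":
    print(follow_sqrt(1))   # close to -1: one circuit ends on the other sheet
    print(follow_sqrt(2))   # close to +1: two circuits close the path
```

One circuit of the origin carries the value $1$ into $-1$, and a second circuit restores it, which is the behavior the two-sheeted surface is built to record.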
As one traces a path about $z = 0$ in the $z$-plane, one can describe a corresponding path in the Riemann surface by assigning a sheet to each position; no change of sheet can be made except when crossing the branch line, and a change of sheet must be made at such a crossing (Fig. 7). A closed path in the $z$-plane will not in general lead to a closed path on the Riemann surface. A path which closes up after two encirclements of the origin will be closed on the Riemann surface. The origin itself appears as a point common to the two sheets and is termed a branch point.

On the Riemann surface just constructed one can now define $\sqrt{z}$ as a single-valued function as follows:
$$\sqrt{z} = \sqrt{r}\,e^{i\theta/2}, \quad 0 < \theta < 2\pi, \text{ on Sheet I}; \qquad \sqrt{z} = \sqrt{r}\,e^{i\theta/2}, \quad 2\pi < \theta < 4\pi, \text{ on Sheet II}.$$
Above the branch line continuity determines the proper value to be assigned.

The procedure described can be generalized to
$$w = \sqrt[n]{z}, \qquad w = \sqrt{(z - 1)(z - 2)(z - 3)}$$
and to all algebraic functions. In general $n$ sheets will be required and several branch lines and branch points. The surface for $w = \sqrt{(z - 1)(z - 2)(z - 3)}$ is suggested in Fig. 8. The procedure can be extended to nonalgebraic functions, but in general infinitely many sheets are required. An important case is $\log z$, for which sheets $0, \pm\mathrm{I}, \pm\mathrm{II}, \ldots$ are needed, as in Fig. 9. In this case $z = 0$ is a logarithmic branch point and is not regarded as a point of the Riemann surface.

FIG. 8. Riemann surface of $w = [(z - 1)(z - 2)(z - 3)]^{1/2}$.

FIG. 9. Riemann surface of $\log z$.

8. ELLIPTIC FUNCTIONS

Let $f(z)$ be a meromorphic function (analytic except for poles); $f(z)$ is said to have period $\omega$, $\omega \neq 0$, if $f(z + \omega) = f(z)$ for all $z$; $f(z)$ is called an elliptic or doubly periodic function if $f$ is not constant and has periods $\omega_1$, $\omega_2$, and if $\omega_1/\omega_2$ is not real. It then follows that $n_1\omega_1 + n_2\omega_2$ are also periods, for every choice of the integers $n_1$, $n_2$.
For proper choice of $\omega_1$, $\omega_2$ these are all the periods of $f$ and it will always be assumed that $\omega_1$, $\omega_2$ are so chosen. The numbers $\Omega = n_1\omega_1 + n_2\omega_2$ form the vertices of a paving of the plane by parallelograms, any one of which can be chosen as a period parallelogram of $f(z)$; it is convenient to exclude the points on a pair of adjacent sides from each period parallelogram. It can be proved that $f(z)$ has a finite number $N$ of poles (counted according to multiplicity) in a period parallelogram; $N$ is the order of $f(z)$ as an elliptic function; $N$ is always at least 2. In general, $f(z) - a$ has $N$ zeros in the parallelogram.

Jacobian Elliptic Functions. Examples of elliptic functions are provided by the functions $\operatorname{sn} z$, $\operatorname{cn} z$, $\operatorname{dn} z$ of Jacobi. These can be defined as follows. For fixed $k$, $0$ [...]

(51) $F(w) = \int_0^w$ [...]

[...] $> 0$, but $\Gamma$ can be continued analytically and becomes a meromorphic function with poles of order 1 at $0, -1, -2, \ldots$. Identities satisfied by $\Gamma(z)$ are the following:
$$\Gamma(z + 1) = z\,\Gamma(z); \tag{84}$$
$$\Gamma(n + 1) = n! \quad (n = 1, 2, 3, \ldots); \tag{85}$$
$$\Gamma(z)\,\Gamma(-z) = -\frac{\pi}{z \sin \pi z}; \tag{86}$$
$$\Gamma(z) = \lim_{n \to \infty} \frac{n!\,n^z}{z(z + 1) \cdots (z + n)}; \tag{87}$$
$$\frac{1}{\Gamma(z)} = z e^{\gamma z} \prod_{n=1}^{\infty} \left[\left(1 + \frac{z}{n}\right) e^{-z/n}\right], \tag{88}$$
where
$$\gamma = \lim_{m \to \infty} \left(\sum_{n=1}^{m} \frac{1}{n} - \log m\right) = 0.5772\,1566\,49\ldots \tag{89}$$
is the Euler-Mascheroni constant.

The Beta Function.
$$B(z, w) = \int_0^1 t^{z-1} (1 - t)^{w-1}\,dt, \quad \operatorname{Re} z > 0, \ \operatorname{Re} w > 0. \tag{90}$$
This is expressible in terms of the $\Gamma$-function:
$$B(z, w) = \frac{\Gamma(z)\,\Gamma(w)}{\Gamma(z + w)}. \tag{91}$$

The Incomplete Gamma Function.
$$\gamma(a, z) = \int_0^z e^{-t} t^{a-1}\,dt, \quad \operatorname{Re} a > 0. \tag{92}$$
This is expressible in terms of the Whittaker function of the preceding section:

(93) [...]

The Error Functions.
$$\operatorname{Erf}(z) = \int_0^z e^{-t^2}\,dt; \tag{94}$$
$$\operatorname{Erfc}(z) = \int_z^{\infty} e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2} - \operatorname{Erf}(z). \tag{95}$$
These functions are also expressible in terms of the Whittaker function:

(96) [...]

The Logarithmic Integral Function.
$$\operatorname{li}(z) = \int_0^z \frac{dt}{\log t} = -(-\log z)^{-1/2}\,z^{1/2}\,W_{-1/2,\,0}(-\log z). \tag{97}$$

The Exponential Integral Function.
$$\operatorname{Ei}(z) = -\int_{-z}^{\infty} \frac{e^{-t}}{t}\,dt = \operatorname{li}(e^z). \tag{98}$$

The Sine and Cosine Integral Functions.
$$\operatorname{si} z = \int_{\infty}^{z} \frac{\sin t}{t}\,dt = \frac{1}{2i}\,[\operatorname{Ei}(iz) - \operatorname{Ei}(-iz)], \tag{99}$$
$$\operatorname{Si} z = \int_0^z \frac{\sin t}{t}\,dt = \frac{\pi}{2} + \operatorname{si} z; \tag{100}$$
$$\operatorname{Ci} z = -\int_z^{\infty} \frac{\cos t}{t}\,dt = \tfrac{1}{2}\,[\operatorname{Ei}(iz) + \operatorname{Ei}(-iz)]. \tag{101}$$
In eq. (97) $z$ is first taken as real and positive, but analytic continuation then gives meaning to the function, as a multiple-valued function, for all $z \neq 0$. Similarly, in eq. (98), $z$ is first to be real and negative. The functions $\operatorname{si} z$ and $\operatorname{Si} z$ are entire functions; $\operatorname{Ci}(z) - \log z$ is also entire.

The Riemann Zeta Function.
$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \quad \operatorname{Re} z > 1. \tag{102}$$
This function can be continued analytically and becomes a function single-valued and analytic for all $z$ except $z = 1$, where $\zeta(z)$ has a pole of first order. One has the integral representation:
$$\zeta(z) = \frac{1}{\Gamma(z)} \int_0^{\infty} \frac{t^{z-1}}{e^t - 1}\,dt, \quad \operatorname{Re} z > 1. \tag{103}$$

REFERENCES

1. Higher Transcendental Functions, Vols. 1, 2, 3, prepared by the staff of the Bateman manuscript project, McGraw-Hill, New York, 1954.
2. L. V. Ahlfors, Complex Analysis, McGraw-Hill, New York, 1953.
3. R. V. Churchill, Introduction to Complex Variables and Applications, McGraw-Hill, New York, 1948.
4. H. Hancock, Lectures on the Theory of Elliptic Functions, Vol. 1, Wiley, New York, 1910.
5. H. Hancock, Elliptic Integrals, Wiley, New York, 1917.
6. A. Hurwitz and R. Courant, Funktionentheorie, 3rd edition, Springer, Berlin, 1929.
7. E. Jahnke and F. Emde, Tables of Functions, 3rd edition, Teubner, Berlin, 1938.
8. W. Kaplan, A First Course in Functions of a Complex Variable, Addison-Wesley, Cambridge, Mass., 1953.
9. K. Knopp, Theory of Functions, Vols. 1, 2, translated by F. Bagemihl, Dover, New York, 1945.
10. F. Oberhettinger and W. Magnus, Anwendung der Elliptischen Funktionen in Physik und Technik, Springer, Berlin, 1949.
11. E. C. Titchmarsh, The Theory of Functions, 2nd edition, Oxford University Press, Oxford, England, 1939.
12. E. T. Whittaker and G. M. Watson, A Course of Modern Analysis, 4th edition, Cambridge University Press, Cambridge, England, 1940.

A GENERAL MATHEMATICS

Chapter 8

Operational Mathematics

W. Kaplan

1. Heaviside Operators
2. Application to Differential Equations
3. Superposition Principle. Response to Unit Function and Delta Function
4. Appraisal of the Heaviside Calculus
5. Operational Calculus Based on Integral Transforms
6. Fourier Series. Finite Fourier Transform
7. Fourier Integral. Fourier Transforms
8. Laplace Transforms
9. Other Transforms
References

[...] $(1/\phi_1) + (1/\phi_2) = (1/\phi_2) + (1/\phi_1)$, as is multiplication, provided the coefficients are constant.

The ratio of two polynomial operators is defined by the equation:
$$\frac{\phi_1(D)}{\phi_2(D)}\,f = \phi_1(D) \left[\frac{1}{\phi_2(D)}\,f\right]. \tag{10}$$
Here the order chosen is essential: it is not true that
$$\phi_1(D) \left[\frac{1}{\phi_2(D)}\,f\right] = \frac{1}{\phi_2(D)}\,[\phi_1(D) f], \tag{11}$$
even if the coefficients are constant. All operators defined thus far are linear.

Integral Representation of Inverse Operators with Constant Coefficients. One has the formulas:
$$\frac{1}{D}\,f(t) = \int_0^t f(u)\,du, \tag{12}$$
$$\frac{1}{D - a}\,f(t) = e^{at} \int_0^t e^{-au} f(u)\,du, \tag{13}$$
$$\frac{1}{(D - a)^k}\,f(t) = e^{at} \int_0^t e^{-au}\,\frac{(t - u)^{k-1}}{(k - 1)!}\,f(u)\,du, \tag{14}$$
where $a$ is constant and $k = 1, 2, \ldots$. Now $\phi(D)$ can be factored as in algebra:
$$\phi(D) = a_0 (D - r_1)(D - r_2) \cdots (D - r_n), \tag{15}$$
where $r_1, \ldots, r_n$ are the roots of the characteristic equation
$$\phi(r) = 0. \tag{16}$$
Correspondingly,
$$\frac{1}{\phi(D)}\,f = \frac{1}{a_0 (D - r_1)} \left[\frac{1}{D - r_2} \left[\cdots \frac{1}{D - r_n}\,f\right]\right]. \tag{17}$$
Thus if $n = 2$,
$$\frac{1}{\phi(D)}\,f = \frac{1}{a_0 (D - r_1)} \left[\frac{1}{D - r_2}\,f\right] = \frac{e^{r_1 t}}{a_0} \int_0^t e^{(-r_1 + r_2) u} \int_0^u e^{-r_2 v} f(v)\,dv\,du.$$
In general, computation of $[1/\phi(D)]f$ is reduced to a repeated integration.

If $\phi(r)$ has complex roots, quadratic factors appear in eq. (17); for these one has the rule:
$$\frac{1}{(D - a)^2 + b^2}\,f = \frac{e^{at}}{b} \int_0^t e^{-au} \sin b(t - u)\,f(u)\,du. \tag{18}$$
If $1/\phi(D)$ is expanded in partial fractions as in algebra, then the corresponding operator identity is valid; for example,
$$\frac{1}{D^2 - 1}\,f = \frac{1}{2} \left[\frac{1}{D - 1} - \frac{1}{D + 1}\right] f = \tfrac{1}{2} e^{t} \int_0^t e^{-u} f(u)\,du - \tfrac{1}{2} e^{-t} \int_0^t e^{u} f(u)\,du.$$

HEAVISIDE EXPANSION THEOREM.
More generally, a ratio $\phi_1(D)/\phi_2(D)$ can be replaced by its partial fraction expansion. If in particular the degree of $\phi_2$ exceeds that of $\phi_1$, and $\phi_2(r)$ has simple roots $r_1, r_2, \ldots, r_n$, then by Chap. 7, Sect. 5,
$$\frac{\phi_1(r)}{\phi_2(r)} = \sum_{k=1}^{n} \frac{\phi_1(r_k)}{\phi_2'(r_k)}\,\frac{1}{r - r_k}, \qquad \frac{\phi_1(D)}{\phi_2(D)} = \sum_{k=1}^{n} \frac{\phi_1(r_k)}{\phi_2'(r_k)}\,\frac{1}{D - r_k}. \tag{19}$$
This is in essence the Heaviside expansion theorem.

Power Series Operators. The formal relation
$$\frac{1}{D - a} = -\frac{1}{a} - \frac{D}{a^2} - \frac{D^2}{a^3} - \cdots$$
does not agree with the definition of $1/(D - a)$. However, if the operator is applied to a polynomial in $t$, one obtains a particular solution of the corresponding differential equation (with modified initial conditions). For example,
$$\left(-\frac{1}{a} - \frac{D}{a^2} - \frac{D^2}{a^3} - \cdots\right) t^2 = -\frac{t^2}{a} - \frac{2t}{a^2} - \frac{2}{a^3}$$
is a solution of
$$\frac{dx}{dt} - ax = t^2$$
for which $x(0) = -2a^{-3}$.

One can also expand in inverse powers of $D$:
$$\frac{1}{D - a} = \frac{1}{D} + \frac{a}{D^2} + \frac{a^2}{D^3} + \cdots. \tag{20}$$
In this case the rule can be proved to be correct.

The power series $\sum_{n=0}^{\infty} h^n D^n / n!$ can be interpreted as the operator $e^{hD}$. One then finds, under appropriate conditions,
$$e^{hD} f(t) = f(t + h). \tag{21}$$

2. APPLICATION TO DIFFERENTIAL EQUATIONS

The general solution of a linear differential equation,
$$\phi(D)x = f(t), \tag{22}$$
is formed of the complementary function $x_c(t)$, which is the general solution of the homogeneous equation $\phi(D)x = 0$, and of a particular solution $x_p(t)$ of the given equation:
$$x = x_c(t) + x_p(t) \tag{23}$$
[cf. Chap. 5, Sect. 3]. The Heaviside operators provide simple ways of finding $x_p(t)$, namely as the function
$$x_p(t) = \frac{1}{\phi(D)}\,f(t); \tag{24}$$
this is the solution with zero initial conditions. If $1/\phi(D)$ is expanded in partial fractions, one can then apply the integral formulas (12), (13), (14), (18).

The procedure can be extended to simultaneous equations. For example,
$$Dx + (D - 1)y = F(t), \qquad (D + 1)x + 2Dy = G(t)$$
can be solved formally:
$$x = \frac{2D}{D^2 + 1}\,F(t) - \frac{D - 1}{D^2 + 1}\,G(t), \qquad y = \frac{D}{D^2 + 1}\,G(t) - \frac{D + 1}{D^2 + 1}\,F(t),$$
and it can be verified that these provide the solution for which $x = 0$, $y = 0$ when $t = 0$.

3. SUPERPOSITION PRINCIPLE. RESPONSE TO UNIT FUNCTION AND DELTA FUNCTION

The Heaviside unit function $u(t)$ is defined to equal 0 for $t < 0$ and to equal 1 for $t \geq 0$. The solution of the differential equation $\phi(D)x = u(t)$ with zero initial values, i.e., the function $(1/\phi(D))u(t) = A(t)$, is known as the indicial admittance or step response.

The superposition principle states that the response of a linear system to a linear combination $c_1 f_1(t) + c_2 f_2(t)$ equals the corresponding linear combination $c_1 x_1(t) + c_2 x_2(t)$ of the responses $x_1(t)$ to $f_1(t)$, $x_2(t)$ to $f_2(t)$. In the typical case $x(t)$ and $f(t)$ are related by a differential equation $\phi(D)x = f(t)$ and the superposition principle is equivalent to the statement that $1/\phi(D)$ is a linear operator.

One can apply the superposition principle to show that (when $\phi(D)$ has constant coefficients) the response to a general $f(t)$ is deducible from the indicial admittance, i.e., the response to $u(t)$. Indeed, the response to $u(t - h)$, for $h \geq 0$, is $A(t - h)$; one can approximate $f(t)$ by a linear combination $\sum_k c_k u(t - t_k)$, where $c_k = f(t_{k+1}) - f(t_k)$. A passage to the limit gives the Duhamel theorem
$$x(t) = \int_0^t f(s) A'(t - s)\,ds. \tag{25}$$
[It is assumed that $f(t)$ is 0 for $t < 0$ and the solution $x(t)$ has 0 initial values.]

If $f(t)$ is constant, equal to $1/\epsilon$ for $0 \leq t \leq \epsilon$, and then equal to 0 for $t > \epsilon$, the response is $[A(t) - A(t - \epsilon)]/\epsilon$. The limiting case of such an $f(t)$, as $\epsilon \to 0$, is an "ideal function," the delta function $\delta(t)$, also termed the unit impulse function. The response to $\delta(t)$ is interpreted as $A'(t) = h(t)$. Accordingly,
$$x(t) = \int_0^t f(s) h(t - s)\,ds. \tag{26}$$

For some linear systems the response to $u(t)$ appears as $[\phi(D)/\psi(D)]u$, where $\phi$ and $\psi$ are polynomials. If $\psi$ has simple roots $b_\alpha$ ($\alpha = 1, \ldots, k$), then by eqs. (19)
$$\frac{\phi(D)}{\psi(D)}\,u = \sum_{\alpha=1}^{k} \frac{\phi(b_\alpha)}{\psi'(b_\alpha)}\,\frac{1}{D - b_\alpha}\,u = \sum_{\alpha=1}^{k} \frac{\phi(b_\alpha)}{\psi'(b_\alpha)}\,\frac{e^{b_\alpha t} - 1}{b_\alpha}\,u(t)$$
and hence

(27) [...]

4. APPRAISAL OF THE HEAVISIDE CALCULUS

The operational methods described in the preceding section provide a valuable tool for solution of linear differential equations. The method has two principal drawbacks: it is very awkward to obtain solutions with specified initial values other than 0; further development of the method leads to symbolic expressions whose meaning has to be studied afresh in each case. Great ingenuity has been employed to remedy these defects but a satisfactory general theory within the Heaviside framework has not been found.

On the other hand, it has been discovered that all the goals of the Heaviside calculus can be achieved without reference to differential operators or their inverses and, indeed, without any symbolic calculus. The means to this end is the Laplace transform (see Chap. 9); the closely related Fourier transforms can also serve the purpose. By means of these the questions about initial conditions are easily disposed of, and justification of formal rules becomes simple.

The transformations referred to do not merely serve as a substitute for the Heaviside calculus. Deeper study shows that they lie at the very basis of that calculus and must inevitably enter in a full justification of the operational rules.

5. OPERATIONAL CALCULUS BASED ON INTEGRAL TRANSFORMS

One considers equations of form
$$F(y) = \int_a^b f(t) K(t, y)\,dt. \tag{28}$$
Such an equation assigns a function $F(y)$ to each function $f(t)$, whenever the integral has meaning. One calls $F$ the integral transform of $f$ with respect to the particular transformation (28) and writes:
$$F = T[f]. \tag{29}$$
The relation between $f$ and $F$ is much like that between independent and dependent variables; here the variables are functions. Because of the form of eq. (28), the transformation $T$ is linear:
$$T[c_1 f_1 + c_2 f_2] = c_1 T[f_1] + c_2 T[f_2]. \tag{30}$$
The transformation (28) is said to have a (single-valued) inverse if, for each $F$ of a certain class, there is precisely one $f$ such that $T[f] = F$.
One writes:
$$f = T^{-1}[F] \tag{31}$$
and calls $f$ the inverse transform of $F$. Because $T$ is linear, $T^{-1}$ must also be linear.

Convolution. If to each pair of functions $f_1$, $f_2$ one can associate (in a unique manner) a third function $f_3$ such that
$$T[f_3] = T[f_1] \cdot T[f_2], \tag{32}$$
then one calls $f_3$ the convolution of $f_1$, $f_2$ and writes:
$$f_3 = f_1 * f_2. \tag{33}$$
The convolution must then obey simple laws:
$$f_1 * f_2 = f_2 * f_1; \qquad f_1 * (f_2 + f_3) = f_1 * f_2 + f_1 * f_3;$$
$$f_1 * (c f_2) = (c f_1) * f_2 = c\,(f_1 * f_2); \qquad f_1 * (f_2 * f_3) = (f_1 * f_2) * f_3. \tag{34}$$

Solution of Differential Equations. Suppose the transformation $T$ has the property that, for a certain polynomial differential operator $\phi(D)$ and for $f(t)$ in a certain class of functions, one has an identity:
$$T[\phi(D) f] = H(y)\,T[f] = H(y) F(y), \tag{35}$$
where $H(y)$ is a function of $y$ associated with the operator $\phi$. Then to solve a differential equation
$$\phi(D)x = g(t) \tag{36}$$
for $x = f(t)$ in the class referred to, one forms the transformed equation $T[\phi(D)x] = T[g]$ or equivalently, by eqs. (35),
$$H(y) F(y) = G(y). \tag{37}$$
Accordingly,
$$F(y) = \frac{G(y)}{H(y)}, \qquad f(t) = T^{-1}\!\left[\frac{G(y)}{H(y)}\right]. \tag{38}$$
One can try to find the inverse transform of $G(y)/H(y)$ with the aid of tables of functions and their transforms. One can also seek
$$T^{-1}\!\left[\frac{1}{H(y)}\right] = w(t). \tag{39}$$
Then eq. (38) gives
$$T[f(t)] = T[w(t)] \cdot T[g(t)] = T[w * g], \tag{40}$$
so that
$$f(t) = w(t) * g(t). \tag{41}$$

The crucial question is choice of the transformation $T$ so that eqs. (35) hold. For differential equations with constant coefficients it is sufficient to choose $T$ so that
$$T[D f(t)] = H(y) F(y). \tag{42}$$
For then
$$T[(a_0 D^n + \cdots + a_{n-1} D + a_n) f] = (a_0 H^n + \cdots + a_{n-1} H + a_n) F(y). \tag{43}$$

Fourier Integral. Now associated with the operator $D$ are certain functions $f$ such that $Df$ is a constant times $f$; these are precisely the functions $ke^{at}$. It is known that an "arbitrary" function $f$ is expressible as a "sum" of functions of this form.
For example, under appropriate conditions,
$$f(t) = \int_{-\infty}^{\infty} F(\omega) e^{i\omega t}\,d\omega; \tag{44}$$
this is the representation of $f$ as a Fourier integral. One finds that
$$F(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt, \tag{45}$$
so that $F(\omega)$ can be considered as $T[f]$, a linear integral transform of $f$; except for a constant multiplier, this is the Fourier transform of $f$. The fact that $De^{i\omega t} = i\omega e^{i\omega t}$ is reflected in the formula
$$Df = f'(t) = \int_{-\infty}^{\infty} i\omega F(\omega) e^{i\omega t}\,d\omega, \tag{46}$$
which follows from eq. (44). Hence
$$T[Df] = i\omega F(\omega). \tag{47}$$
Thus the transformation $T$ defined by eq. (45) has the property desired.

The functions $f$ representable as Fourier integrals must be small for large positive or negative $t$ (see Sect. 7). For functions not satisfying such a condition other representations can be used. If $f$ is defined only for $t \geq 0$ and does not grow too rapidly as $t \to \infty$, one can use the Laplace transform. If $f$ is defined for all $t$ and has period $2\pi$, then $f$ can be represented by a Fourier series; associated with this series is the finite Fourier transform.

If $\phi(D)$ does not have constant coefficients, the transformation $T$ must be related specially to the particular operator $\phi$. Associated with $\phi$ are the "characteristic functions" $f$ for which $\phi(D)f$ is a constant times $f$. Representation of an arbitrary function as a series or integral of such characteristic functions leads to a corresponding integral transformation.

6. FOURIER SERIES. FINITE FOURIER TRANSFORM

Let $f(t)$ be defined for all real $t$. One says that $f(t)$ has period $T \neq 0$ if $f(t + T) = f(t)$ for all $t$. A function $f(t)$ given only for $a < t < b$ can always be defined outside this interval so as to have period $T = b - a$ (periodic extension of $f(t)$).

Let $f(t)$ have period $T$ and let $\omega = 2\pi/T$. The Fourier series of $f(t)$ is defined as the series:
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos n\omega t + b_n \sin n\omega t), \tag{48}$$
where
$$a_n = \frac{2}{T} \int_0^T f(t) \cos n\omega t\,dt, \qquad b_n = \frac{2}{T} \int_0^T f(t) \sin n\omega t\,dt. \tag{49}$$
Because of the periodicity of $f(t)$, the interval of integration in eqs. (49) can be replaced by any other interval of length $T$, e.g., from $-\tfrac{1}{2}T$ to $\tfrac{1}{2}T$.

FIG. 1. Piecewise continuous function of period $T$.

It is assumed that the integrals in eqs. (49) have meaning. For this it is sufficient that $f(t)$ be piecewise continuous, i.e., continuous except for jump discontinuities (Fig. 1).

Convergence. The Fourier series of $f(t)$ converges to $f(t)$ under very general conditions: for example, wherever $f(t)$ is continuous and has a derivative to the left and to the right. At a jump discontinuity $t_0$ the series converges to $\tfrac{1}{2}[f(t_0+) + f(t_0-)]$, where
$$f(t_0+) = \lim_{t \to t_0+} f(t), \qquad f(t_0-) = \lim_{t \to t_0-} f(t), \tag{50}$$
provided $[f(t_0 + h) - f(t_0+)]/h$ and $[f(t_0-) - f(t_0 - h)]/h$ have limits as $h \to 0+$. For example, if $f(t) = t$ for $-1 < t < 1$ and $f(t)$ has period 2, then the corresponding Fourier series converges to 0 at $t = 1$, $t = -1$, $t = 3$, $t = -3, \ldots$. It is common practice to redefine $f(t)$ as $\tfrac{1}{2}[f(t_0+) + f(t_0-)]$ at each jump discontinuity.

If $f(t)$ is merely continuous, there is no general theorem on convergence. However, one has a "convergence in the mean," that is, if $s_n(t)$ denotes the sum of the first $n$ terms of the series (48), then the "mean square error"
$$\frac{1}{T} \int_0^T [f(t) - s_n(t)]^2\,dt$$
tends to 0 as $n \to \infty$. This result holds considerably more generally, e.g., if $f$ is merely piecewise continuous.

If $f(t)$ has a continuous derivative over an interval $t_0 \leq t \leq t_1$, then the Fourier series of $f(t)$ converges uniformly to $f(t)$ over this interval; i.e.,
$$\max_{t_0 \leq t \leq t_1} |f(t) - s_n(t)| \to 0 \quad \text{as } n \to \infty. \tag{51}$$
In general, if a series of form (48), i.e., a "trigonometric series," converges uniformly to $f(t)$ for $0 \leq t \leq T$, then the series must be the Fourier series of $f(t)$. A function is determined uniquely by its "Fourier coefficients" $a_0, a_1, \ldots, b_1, \ldots$; that is, if $f(t)$ and $g(t)$ have the same Fourier coefficients, then $f(t) = g(t)$ except perhaps at points of discontinuity.

Fourier Cosine and Sine Series.
If $f(t)$ is even [$f(t) = f(-t)$], then all $b_n$ are 0 and $f(t)$ is represented by a Fourier cosine series; that is,
$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos n\omega t, \qquad a_n = \frac{4}{T} \int_0^{T/2} f(t) \cos n\omega t\,dt, \tag{52}$$
provided the convergence conditions are satisfied. If $f(t)$ is given merely between 0 and $\tfrac{1}{2}T$, eqs. (52) are still valid; for $f(t)$ can be extended to all $t$ to be even and have period $T$. Similar remarks apply to representation of an odd function [$f(t) = -f(-t)$] by a Fourier sine series:
$$f(t) = \sum_{n=1}^{\infty} b_n \sin n\omega t, \qquad b_n = \frac{4}{T} \int_0^{T/2} f(t) \sin n\omega t\,dt. \tag{53}$$

The identities:
$$\cos\alpha = \frac{1}{2}\,(e^{i\alpha} + e^{-i\alpha}), \qquad \sin\alpha = \frac{1}{2i}\,(e^{i\alpha} - e^{-i\alpha}), \tag{54}$$
lead to a rewriting of the formulas (49) in complex form. Under conditions for convergence,
$$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{in\omega t}, \qquad c_n = \frac{1}{T} \int_0^T f(t) e^{-in\omega t}\,dt, \quad n = 0, \pm 1, \ldots. \tag{55}$$

Finite Fourier Transform. One can interpret the doubly infinite sequence of numbers
$$\int_0^T f(t) e^{-in\omega t}\,dt, \quad n = 0, \pm 1, \pm 2, \ldots,$$
as a function of $n$, $\phi(n)$, defined only when $n$ is an integer. The equation
$$\phi(n) = \int_0^T f(t) e^{-in\omega t}\,dt \tag{56}$$
can then be regarded as a special case of the linear integral transformation (28); the variable $y$ is replaced by $n$ and is restricted to integer values. The notations:
$$\phi(n) = \Phi[f(t)] \quad \text{or} \quad \phi = \Phi[f] \tag{57}$$
will be used to denote the functional transformation $\Phi$, the finite Fourier transformation, which assigns the function $\phi(n)$ to the function $f(t)$. $\Phi$ is then defined at least for all $f(t)$ which are piecewise continuous for $0 \leq t \leq T$. As in Sect. 5, $\Phi$ is linear:
$$\Phi[c_1 f_1 + c_2 f_2] = c_1 \Phi[f_1] + c_2 \Phi[f_2]. \tag{58}$$

Inverse Transform. If $\Phi[f] = \phi$, then one writes: $f = \Phi^{-1}[\phi]$. The inverse transformation is then uniquely defined by the theorem stated above concerning functions having the same Fourier coefficients. It is a less simple matter to describe those functions $\phi(n)$ for which $\Phi^{-1}[\phi]$ exists. One class of such functions $\phi(n)$ consists of those for which the series $\sum T^{-1}\phi(n) e^{in\omega t}$ converges uniformly for $0 \leq t \leq T$. The sum of the series is then a function $f(t)$ which serves as $\Phi^{-1}[\phi]$;
$$\Phi^{-1}[\phi] = \frac{1}{T} \sum_{n=-\infty}^{\infty} \phi(n) e^{in\omega t}. \tag{59}$$

Convolution. Given $f_1(t)$, $f_2(t)$ having period $T$, their convolution is defined as:
$$f_3(t) = \int_0^T f_1(s) f_2(t - s)\,ds; \tag{60}$$
one writes: $f_3(t) = f_1(t) * f_2(t)$. One can then prove the characteristic property:
$$\Phi[f_1 * f_2] = \Phi[f_1] \cdot \Phi[f_2]. \tag{61}$$

Transformation of Derivatives. If $f(t)$ has a continuous derivative for $0 \leq t \leq T$, then an integration by parts proves that
$$\Phi[f'(t)] = f(T) - f(0) + in\omega\,\Phi[f(t)]. \tag{62}$$
This rule can be made the basis for application of the finite Fourier transform to boundary value problems. Interest will be concentrated here on the periodic case: $f(T) = f(0)$, for which the rule becomes
$$\Phi[f'(t)] = in\omega\,\Phi[f(t)]. \tag{63}$$
Similarly, if $f$ is periodic and has continuous derivatives through the $k$th order,
$$\Phi[f^{(k)}(t)] = (in\omega)^k\,\Phi[f(t)]; \tag{64}$$
this relation remains true if $f^{(k-1)}(t)$ is continuous and $f^{(k)}$ is continuous except at a finite number of points at which left and right hand $k$th derivatives exist. From eq. (64) it follows that, for every polynomial operator $\psi(D) = a_0 D^n + \cdots + a_{n-1} D + a_n$ with constant coefficients
$$\Phi\{\psi(D)[f(t)]\} = \psi(in\omega)\,\Phi[f(t)]. \tag{65}$$

Steady-State Solutions of Differential Equations. Let $f(t)$ be piecewise continuous and have period $T$. Let $a_0, \ldots, a_n$ be constants and let $\psi(D) = a_0 D^n + \cdots + a_{n-1} D + a_n$. It can then be shown that in general the differential equation
$$\psi(D)x = f(t) \tag{66}$$
has a solution $x = X(t)$ having period $T$; $X(t)$ has continuous derivatives through the $(n - 1)$st order and an $n$th derivative which is continuous wherever $f(t)$ is continuous. If $\psi(p)$ has no root of the form $in\omega$ for any integer $n$, there is precisely one such periodic solution; it will be assumed in the following that $\psi(in\omega) \neq 0$ for every $n$. Applying the finite Fourier transformation to eq. (66), one finds by eq. (65)
$$\Phi[X] = \frac{\Phi[f]}{\psi(in\omega)} = \frac{\phi(n)}{\psi(in\omega)}, \tag{67}$$
where $\phi = \Phi[f]$, so that
$$X(t) = \frac{1}{T} \sum_{n=-\infty}^{\infty} \frac{\phi(n)}{\psi(in\omega)}\,e^{in\omega t}. \tag{68}$$
One can attempt to reduce this to a simpler form by developing a table of finite Fourier transforms and inverse transforms. One can also apply the convolution formula to eq.
(67): (69) X = g *J = i T J(s)g(t - s) ds, where g = cp-I[I/!f;(inw)]. To find g, decompose 1/!f;(inw) into partial fractions and apply linearity. The problem is reduced tio finding inverses of (inw - a)-k (k = 1,2, ... ). One finds: cp-I [ 1 • ~nw ] - a = k a eat , (70) where ka = (1 - eaT )-I. In particular, if !f;(p) has simple roots PI, Pm, so that (71) one finds that m (72) = X(t) L AjH~(t, Pj) j=1 where HI(t, p) = ePt[QI(t, p) (73) QI(t, p) = i + kpePTQJ(T, p)], t J(s)e- P8 ds, 0 ~t~ T. The operators QJ and HI can be tabulated for various functions J(t) of interest, so that the corresponding periodic solutions can be found easily. For tables and illustrations of applications see Ref. 11. OPERATIONAL MATHEMATICS 7. FOURIER INTEGRAL. 8-15 FOURIER TRANSFORMS Fourier Integral. By allowing the period 'P to become infinite, one is led to the following integral analogue of the Fourier series expansion: f(t) = 1.oo[a(w) cos wt + b(w) sin wt] dw, (74) 1 a(w) = 7r' f-: f 1 00 b(w) = - J(t) cos wt dt, 7r' -00 f 00 J(t) sin wt dt. -00 The "coefficients" aCw), b(w) exist if J(t) is, for example, piecewise continuous, and IJCt) Idt exists. The representation of J(t) as a Fourier inte- gral is then valid under the same conditions as for Fourier series, e.g., wherever f' (t) exists. Also, under the conditions described in Sect. 8, the integral equals Y2[J(t o+) + J(t o -)] at each jump discontinuity of J. One can write eqs. (74) in complex form: f(t) = (75) Joo A(w)e iw , 1 dw, A(w) =- 27r' -00 f 00 . J(t)e- twt dt; -00 the first integral must, however, be treated as a principal value, i.e., as f ... b lim as b -7 00. -b Under conditions analogous to those for Fourier series one is led to representation of a function JCt) in the interval 0 ~ t < 00 by a Fourier cosine integral !a ooa(w) cos wi dw. It is customary to define the Fourier cosine transform of J(t) as /2 00 F,(w) = \/; 1. 
f(t) cos wt dt (7G) so that the Fourier cosine integral representation of J reads /2 00 f(t) = \ / ; 1. F,(w) cos wt dw; (77) thus J is also the Fourier cosine transform of F c' Similar formulas hold for the Fourier sine transform: (2 (78) 00 F,(w) = \/; 1. f(t) sin wt dt, J(t) /2 00 \/; 1. F.(w) sin wt dw. GENERAL MATHEMATICS 8-16 Similarly one defines the (exponential) Fourier transform ofj as F(w) = (79) 1 fifI'J j(t)e-iwt dt, V271" -rYJ so that by eqs. (75) 1 rYJ f !(w)e v271" -rYJ jet) = _ /- (80) iwt dw. Properties of the Fourier Transform. For simplicity, the numerical factor is dropped and the Fourier transform is defined as CPrYJ[f] = jrYJj(t)e- iwt dt; -rYJ (81) then cI>rYJ[f] is a linear operator. If j has a continuous derivative 1'(t) and jet), 1'(t) satisfy the conditions stated above, then (82) A convolution is defined as follows: j (83) * g = frYJj(S)g(t - s) ds = h(t) -rYJ and one has the characteristic property: (84) it is assumed here that j, g satisfy the conditions given above. An inverse operator is defined by the condition: rYJ -l[F] = j, if CPrYJ[f] = F. The function j can be shown to be uniquely defined by its transform F. The applications of the Fourier transform to differential equations parallel those for the finite Fourier transform, as described in Sect. 8 above; eq. (65) is replaced by (85) Application of the transform to the equation 1/;(D)x = jet) yields a solution in the form of a Fourier integral: (86) 1 frYJ F(w). . X(t) = - - e~wt dw 271" -rYJ 1/;(iw) , or as a convolution: (87) X(t) = j * g, OPERATIONAL MATHEMATICS 8-17 ° If J(t) = for t < 0, the same solution is obtainable by Laplace transforms; see Sect. 10 and Chap. 9 below. References to tables of Fourier transforms are given at the end of this chapter (Refs. 1,5,6). 8. LAPLACE TRANSFORMS The Laplace transform of J(t) , t ~ 0, is defined as . (88) 00 F(s) = L[f] = 1"0 J(t)e- st dt. It is convenient to allow s to be complex: s = u reads: + iw. 
Equation (88) then reads:

$$F(\sigma + i\omega) = \int_0^\infty f(t)\,e^{-\sigma t}\,e^{-i\omega t}\,dt; \tag{89}$$

hence for each fixed $\sigma$ the Laplace transform of $f$ is the same as the Fourier transform of $f(t)e^{-\sigma t}$, where $f$ is considered to be 0 for $t < 0$:

$$L[f] = \Phi_\infty[f(t)\,e^{-\sigma t}]. \tag{90}$$

Accordingly, the Laplace transform is well defined if $\sigma$ is chosen so that

$$\int_0^\infty |f(t)|\,e^{-\sigma t}\,dt \tag{91}$$

exists, and for such $\sigma$ one can invert:

$$f(t)\,e^{-\sigma t} = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\sigma + i\omega)\,e^{i\omega t}\,d\omega, \qquad f(t) = L^{-1}[F(s)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(s)\,e^{st}\,d\omega, \quad t > 0; \tag{92}$$

in the last integral $s = \sigma + i\omega$, $\sigma$ has any value such that (91) exists, and the integral itself is a principal value. The integral can be interpreted as an integral in the complex $s$-plane along the line $\sigma = \text{const.}$, $\omega$ going from $-\infty$ to $+\infty$ (Fig. 2). Since $ds = i\,d\omega$ on the path,

$$f(t) = \frac{1}{2\pi i}\int_C F(s)\,e^{st}\,ds, \tag{93}$$

$C$ being the line $\sigma = \text{const.}$ The conditions for equality of left and right sides of (93) are the same as for Fourier integrals. At $t = 0$, $f(t)$ will in general have a jump, because of the convention that $f(t)$ be 0 for $t < 0$, and accordingly the right-hand side gives $\tfrac{1}{2}f(0+)$.

FIG. 2. Path of integration for inverse of Laplace transform.

The validity of eqs. (88) and (92) depends on choosing $\sigma$ so that (91) exists. It can be shown that for each $f(t)$ there is a value $\sigma_0$, $-\infty \le \sigma_0 \le +\infty$, called the abscissa of absolute convergence, such that the integral (91) exists for $\sigma > \sigma_0$. If $\sigma_0 = -\infty$, all values of $\sigma$ are allowable; if $\sigma_0 = +\infty$, no values are allowed.

Further properties of the Laplace transform and its applications are discussed in Chap. 9.

9. OTHER TRANSFORMS

The two-sided Laplace transform is defined as

$$L_1[f] = F(s) = \int_{-\infty}^{\infty} f(t)\,e^{-st}\,dt. \tag{94}$$

Hence it differs from the (one-sided) Laplace transform only in the lower limit of integration; thus

$$L_1[f] = \Phi_\infty[f(t)\,e^{-\sigma t}], \tag{95}$$

with no requirement that $f(t)$ be 0 for $t < 0$. The two-sided transform is thus a generalization of the one-sided transform.

The Laplace-Stieltjes transform of $g(t)$ is defined as

$$G(s) = \int_0^\infty e^{-st}\,dg(t). \tag{96}$$
The integral on the right is an improper Stieltjes integral; it has meaning if $g(t)$ is expressible as the difference of two monotone functions and if the limit as $b \to +\infty$ of the integral from 0 to $b$ exists. If $g'(t) = f(t)$ exists, then $G(s)$ is the Laplace transform of $f(t)$. If $g(t)$ is a step function with jumps at $t_1, t_2, \ldots$, the integral reduces to a series $\sum c_j e^{-t_j s}$. For further information one is referred to the book of Widder (Ref. 10).

Other integral transforms have been defined and studied. These have found their main applications in the boundary value problems associated with partial differential equations; they could conceivably be applied to ordinary linear differential equations with variable coefficients, on the basis of the analysis of Sect. 5.

The Legendre transform is an example which assigns to each $f(t)$, $-1 \le t \le 1$, the function

$$T[f] = \varphi(n) = \int_{-1}^{1} f(t)\,P_n(t)\,dt, \qquad n = 0, 1, 2, \ldots, \tag{97}$$

where $P_n(t)$ is the $n$th Legendre polynomial. The transformation has the property

$$T[R\{f\}] = -n(n+1)\,\varphi(n), \qquad R\{f\} = \frac{d}{dt}\left[(1 - t^2)\,\frac{d}{dt}f(t)\right]. \tag{98}$$

Hence the transform can be applied to differential equations of form

$$(a_0 R^m + a_1 R^{m-1} + \cdots + a_{m-1} R + a_m)\,x = f(t), \qquad -1 \le t \le 1, \tag{99}$$

where $a_0, \ldots, a_m$ are constants. For details on the Legendre transform see Ref. 12.

The Mellin transform, Bessel transforms, Hilbert transform, and others are defined and their properties are listed in the volumes of the Bateman project (Ref. 1).

REFERENCES

1. Tables of Integral Transforms, Vols. 1, 2, prepared by the staff of the Bateman manuscript project, McGraw-Hill, New York, 1954.
2. R. V. Churchill, Modern Operational Mathematics in Engineering, McGraw-Hill, New York, 1944.
3. G. Doetsch, Theorie und Anwendung der Laplace Transformation, Springer, Berlin, 1937.
4. G. Doetsch, Handbuch der Laplace Transformation, Vol. I, Birkhäuser, Basel, 1950.
5. G. Doetsch, H. Kniess, and D. Voelker, Tabellen zur Laplace Transformation, Springer, Berlin, 1947.
6. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. I, Chapman and Hall, London, 1942.
7. T. von Karman and M. A. Biot, Mathematical Methods in Engineering, McGraw-Hill, New York, 1940.
8. D. F. Lawden, Mathematics of Engineering Systems, Wiley, New York, 1954.
9. B. van der Pol and H. Bremmer, Operational Mathematics Based on the Two-sided Laplace Integral, Cambridge University Press, Cambridge, England, 1950.
10. D. V. Widder, The Laplace Transform, Princeton University Press, Princeton, N. J., 1941.
11. W. Kaplan, Operational Methods for Linear Systems, Addison-Wesley, Cambridge, Mass., 1958.
12. R. V. Churchill, The Operational Calculus of Legendre Transforms, J. Math. Phys., 33, 165-178 (1954).

A GENERAL MATHEMATICS

Chapter 9

Laplace Transforms

W. Kaplan

1. Fundamental Properties
2. Transforms of Derivatives and Integrals
3. Translation. Transform of Unit Function, Step Functions, Impulse Function (Delta Function)
4. Convolution
5. Inversion
6. Application to Differential Equations
7. Response to Impulse Functions
8. Equations Containing Integrals
9. Weighting Function
10. Difference-Differential Equations
11. Asymptotic Behavior of Transforms
References

1. FUNDAMENTAL PROPERTIES

Of the various operational methods described in Chap. 8 those based on the Laplace transform have proved to be the most fruitful.

Basic Definitions and Properties. Let $f(t)$ be a function of the real variable $t$, defined for $t \ge 0$. The Laplace transform of $f(t)$ is a function $F(s)$ of the complex variable $s = \sigma + i\omega$:

$$L[f] = F(s) = \int_0^\infty f(t)\,e^{-st}\,dt. \tag{1}$$

It is convenient to allow $f(t)$ itself to have complex values: $f(t) = f_1(t) + i f_2(t)$, though for most applications $f$ will be real. It will be assumed that $f(t)$ is piecewise continuous (Chap. 8), although the theory can be extended to more general cases.
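Definition (1) can be checked numerically for real $s$. The following is a minimal sketch, not part of the handbook: the helper name `laplace_numeric`, the truncation point `t_max`, and the step count `n` are all assumptions chosen for illustration. The integral is truncated at `t_max`, which is only reasonable when the integrand has decayed to a negligible size there.

```python
import math

def laplace_numeric(f, s, t_max=40.0, n=20000):
    """Approximate F(s) = integral_0^inf f(t) exp(-s t) dt for real s.

    Hypothetical helper: composite trapezoidal rule on [0, t_max];
    assumes the tail of the integrand beyond t_max is negligible.
    """
    h = t_max / n
    # Endpoint terms of the trapezoidal rule carry weight 1/2.
    total = 0.5 * (f(0.0) + f(t_max) * math.exp(-s * t_max))
    for k in range(1, n):
        t = k * h
        total += f(t) * math.exp(-s * t)
    return total * h

# f(t) = exp(a t) has transform 1/(s - a) for s > a.
a, s = -0.5, 1.0
approx_exp = laplace_numeric(lambda t: math.exp(a * t), s)
exact_exp = 1.0 / (s - a)

# f(t) = sin t has transform 1/(s^2 + 1) for s > 0.
approx_sin = laplace_numeric(math.sin, 2.0)
exact_sin = 1.0 / (2.0 ** 2 + 1.0)
```

Comparing `approx_exp` with `exact_exp` and `approx_sin` with `exact_sin` gives a quick sanity check of the closed forms; the agreement degrades if `s` is taken close to the abscissa of convergence, where the integrand decays slowly.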
It can be shown that there is a number $\sigma_0$, $-\infty \le \sigma_0 \le +\infty$, such that

$$\int_0^\infty |f(t)|\,e^{-\sigma t}\,dt \tag{2}$$

exists for $\sigma > \sigma_0$ and does not exist for $\sigma < \sigma_0$. If $\sigma_0 = -\infty$, the integral exists for all $\sigma$; if $\sigma_0 = +\infty$, it exists for no $\sigma$; $\sigma_0$ is called the abscissa of absolute convergence of $L[f]$. If $\sigma > \sigma_0$, then the Laplace transform of $f$ does exist. Accordingly, there is a certain half-plane in the complex $s$-plane for which $L[f] = F(s)$ is defined (Fig. 1). Furthermore, $F(s)$ is an analytic function of $s$ in this half-plane (Chap. 7, Sect. 2).

FIG. 1. Domain of definition of $F(s) = L[f]$.

REMARK. For existence of $F(s)$, it is sufficient that the integral in (1) have meaning. It can be shown that there is a number $\sigma_1$, the abscissa of (conditional) convergence, such that this integral exists for $\sigma > \sigma_1$, and $\sigma_1 \le \sigma_0$. For most applications $\sigma_1 = \sigma_0$ and for most operations on $F(s)$ it is simpler to restrict $\sigma$ to be greater than $\sigma_0$.

EXAMPLES OF LAPLACE TRANSFORMS. These are given in Table 1. For extensive tables one is referred to Refs. 1, 5, 6, Chap. 8.

Existence. For practical purposes the condition that the Laplace transform exist for some $\sigma$ is that the function $f(t)$ should not grow too rapidly as $t \to +\infty$. For example, $e^{t^2}$, $e^{e^t}$ do not have Laplace transforms. In general, a function of exponential type, i.e., a function for which $|f(t)| < e^{kt}$ for some $k$ and for $t$ sufficiently large, has a Laplace transform $F(s)$.

Linearity. The Laplace transform is a linear operator. More precisely, if $L[f_1(t)] = F_1(s)$ exists for $\sigma > \sigma_1$ and $L[f_2(t)] = F_2(s)$ exists for $\sigma > \sigma_2$, then for every pair of constants $c_1$, $c_2$, $L[c_1 f_1 + c_2 f_2]$ exists for $\sigma > \max(\sigma_1, \sigma_2)$ and

$$L[c_1 f_1 + c_2 f_2] = c_1 F_1(s) + c_2 F_2(s). \tag{3}$$

2. TRANSFORMS OF DERIVATIVES AND INTEGRALS

Rules.

$$L[f'(t)] = sL[f(t)] - f(0), \tag{4}$$

$$L[f''(t)] = s^2 L[f(t)] - f'(0) - s f(0), \tag{5}$$

$$L[f^{(n)}(t)] = s^n L[f] - [\,f^{(n-1)}(0) + s f^{(n-2)}(0) + \cdots + s^{n-1} f(0)\,], \tag{6}$$

$$L\!\left[\int_0^t f(t)\,dt\right] = \frac{1}{s}\,L[f]. \tag{7}$$

The first rule is basic here, the others being consequences of it; it is valid if (for some $\sigma$) $f(t)$ and $f'(t)$ have Laplace transforms and $f(t)$, $f'(t)$ are continuous for $t \ge 0$. More generally, eq. (4) is valid if only $f(t)$ is continuous and $f'(t)$ is continuous except for jump discontinuities. Similarly, eq. (6) is valid if the Laplace transforms exist and all derivatives concerned are continuous except perhaps the $n$th, which is allowed to have jump discontinuities. Rule (7) is valid if $f$ is piecewise continuous and the transforms exist.

EXAMPLE. If $f = \sin t$, then $f' = \cos t$, $f(0) = 0$, so that $L[\cos t] = sL[\sin t] = s/(s^2 + 1)$.

Of great importance is the special case of eq. (6):

$$L[f^{(n)}(t)] = s^n L[f], \quad \text{if } f(0) = f'(0) = \cdots = f^{(n-1)}(0) = 0. \tag{8}$$

Hence, if one restricts to functions with 0 initial values, differentiation with respect to $t$ corresponds to multiplication by $s$.

TABLE 1. LAPLACE TRANSFORMS, $F(s) = L[f] = \int_0^\infty f(t)\,e^{-st}\,dt$

No. | $f(t)$ | $F(s)$ | Range of $\sigma$
1 | $1$ | $1/s$ | $\sigma > 0$
2 | $e^{at}$ | $1/(s - a)$ | $\sigma > \operatorname{Re}(a)$
3 | $t^n$ $(n > -1)$ | $\Gamma(n+1)/s^{n+1}$, or $n!/s^{n+1}$ for $n = 0, 1, 2, \ldots$ | $\sigma > 0$
4 | $t^n e^{at}$ $(n > -1)$ | $\Gamma(n+1)/(s-a)^{n+1}$, or $n!/(s-a)^{n+1}$ for $n = 0, 1, 2, \ldots$ | $\sigma > \operatorname{Re}(a)$
5 | $\cos at$ | $s/(s^2 + a^2)$ | $\sigma > \lvert\operatorname{Im} a\rvert$
6 | $\sin at$ | $a/(s^2 + a^2)$ | $\sigma > \lvert\operatorname{Im} a\rvert$
7 | $\cosh at$ | $s/(s^2 - a^2)$ | $\sigma > \lvert\operatorname{Re} a\rvert$
8 | $\sinh at$ | $a/(s^2 - a^2)$ | $\sigma > \lvert\operatorname{Re} a\rvert$
9 | $t^n \cos at$ $(n > -1)$ | $\dfrac{\Gamma(n+1)\,[(s+ai)^{n+1} + (s-ai)^{n+1}]}{2\,(s^2 + a^2)^{n+1}}$ | $\sigma > \lvert\operatorname{Im} a\rvert$
10 | $t^n \sin at$ $(n > -1)$ | $\dfrac{\Gamma(n+1)\,[(s+ai)^{n+1} - (s-ai)^{n+1}]}{2i\,(s^2 + a^2)^{n+1}}$ | $\sigma > \lvert\operatorname{Im} a\rvert$
11 | $\cos^2 t$ | $\dfrac{1}{2}\left(\dfrac{1}{s} + \dfrac{s}{s^2 + 4}\right)$ | $\sigma > 0$
12 | $\sin^2 t$ | $\dfrac{1}{2}\left(\dfrac{1}{s} - \dfrac{s}{s^2 + 4}\right)$ | $\sigma > 0$
13 | $\sin at \sin bt$ | $\dfrac{2abs}{[s^2 + (a+b)^2]\,[s^2 + (a-b)^2]}$ | $\sigma > \max(\alpha, \beta)$, $\alpha = \lvert\operatorname{Im}(a+b)\rvert$, $\beta = \lvert\operatorname{Im}(a-b)\rvert$
14 | $e^{at}\sin(bt + c)$ | $\dfrac{(s-a)\sin c + b\cos c}{(s-a)^2 + b^2}$ | $\sigma > \max(\alpha, \beta)$, $\alpha = \operatorname{Re}(a + bi)$, $\beta = \operatorname{Re}(a - bi)$
15 | $1$ for $2n \le t < 2n+1$; $0$ for $2n+1 \le t < 2n+2$; $n = 0, 1, 2, \ldots$ (square wave) | $\dfrac{1}{s\,(1 + e^{-s})}$ | $\sigma > 0$
16 | $1$ for $a \le t \le b < \infty$; $0$ for $0 \le t < a$ and $t > b$ | $\dfrac{e^{-as} - e^{-bs}}{s}$ | all $\sigma$
17 | $0$ for $0 \le t < b$; $1$ for $t \ge b$ | $\dfrac{e^{-bs}}{s}$ | $\sigma > 0$
18 | $t$ for $0 \le t \le 1$; $1$ for $t \ge 1$ | $\dfrac{1 - e^{-s}}{s^2}$ | $\sigma > 0$
19 | $t$ for $0 \le t \le 1$; $2 - t$ for $1 \le t \le 2$; $0$ for $t \ge 2$ | $\dfrac{(1 - e^{-s})^2}{s^2}$ | all $\sigma$
20 | $a - \dfrac{a}{b}\,\lvert t - (2n+1)b\rvert$ for $2nb \le t \le (2n+2)b$; $n = 0, 1, \ldots$; $b > 0$, $a$ real (triangular wave) | $\dfrac{a\,(1 - e^{-bs})}{b\,s^2\,(1 + e^{-bs})}$ | $\sigma > 0$
21 | $a(t - nb)$ for $nb \le t < (n+1)b$; $n = 0, 1, \ldots$ (sawtooth wave) | $\dfrac{a\,(1 + bs - e^{bs})}{s^2\,(1 - e^{bs})}$ | $\sigma > 0$

3. TRANSLATION. TRANSFORM OF UNIT FUNCTION, STEP FUNCTIONS, IMPULSE FUNCTION (DELTA FUNCTION)

Translation. In Laplace transform theory it is convenient to consider each function $f(t)$ to be defined as 0 for $t < 0$. Hence for $c \ge 0$, $f(t - c)$ is 0 for $t < c$ and coincides with a translated $f(t)$ for $t > c$ (Fig. 2). One finds

$$L[f(t - c)] = \int_c^\infty f(t - c)\,e^{-st}\,dt = e^{-cs}L[f]. \tag{9}$$

FIG. 2. Translated function.

Unit Function. Now let $u(t) = 0$ for $t \le 0$, $u(t) = 1$ for $t > 0$; $u(t)$ is called the unit function (of Heaviside). By entry 1 of Table 1,

$$L[u(t)] = \frac{1}{s}. \tag{10}$$

Hence for $c \ge 0$

$$L[u(t - c)] = \frac{e^{-cs}}{s}, \tag{11}$$

where $u(t - c)$ is the translated unit function with jump at $t = c$ (Fig. 3); cf. entry 17 of Table 1.

FIG. 3. Translated unit function.

FIG. 4. Square pulse.

A square pulse of height $h$ (Fig. 4) can be represented as a combination of two unit functions:

$$f(t) = h\,[u(t - a) - u(t - b)], \qquad 0 \le a < b; \tag{12}$$

hence its transform is

$$L[f] = \frac{h}{s}\,(e^{-as} - e^{-bs}). \tag{13}$$

FIG. 5. Step function.

A general step function (Fig. 5) can be regarded as a superposition of such square pulses:

$$f = h_1[u(t) - u(t - a_1)] + h_2[u(t - a_1) - u(t - a_2)] + \cdots; \tag{14}$$

hence (if the pulses do not grow too rapidly, so that $L[f]$ exists)

$$L[f] = \frac{1}{s}\,[\,h_1(1 - e^{-a_1 s}) + h_2(e^{-a_1 s} - e^{-a_2 s}) + \cdots\,]. \tag{15}$$

Impulse Function.
The unit impulse function at $t = 0$ is defined as the limit as $\epsilon \to 0$ of a square pulse from $t = 0$ to $t = \epsilon$ and having unit area, i.e., the limit as $\epsilon \to 0+$ of

$$\frac{1}{\epsilon}\,[u(t) - u(t - \epsilon)]. \tag{16}$$

The limit does not exist in the ordinary sense; it can be considered as defining an "ideal" function, the delta function $\delta(t)$. One can consider $\delta(t)$ to be 0 except near $t = 0$, where $\delta(t)$ is large and positive and has an integral equal to 1. Now

$$L\!\left[\frac{u(t) - u(t - \epsilon)}{\epsilon}\right] = \frac{1 - e^{-\epsilon s}}{\epsilon s} \to 1 \quad \text{as } \epsilon \to 0,$$

and accordingly one defines:

$$L[\delta(t)] = 1. \tag{17}$$

The unit impulse function at $t = c$ is defined as $\delta(t - c)$ and one finds

$$L[\delta(t - c)] = e^{-cs}. \tag{18}$$

It should be noted that $L[u(t)] = L[\delta(t)]/s$, so that by eq. (7) $u(t)$ can be thought of as an integral of $\delta(t)$: $u(t) = \int_0^t \delta(t)\,dt$. This in turn suggests interpretation of $\delta(t)$ as $u'(t)$.

4. CONVOLUTION

Let $f(t)$ and $g(t)$ be piecewise continuous for $t \ge 0$. Then the (Laplace) convolution of $f$ and $g$ is defined as

$$f * g = \int_0^t f(u)\,g(t - u)\,du = h(t). \tag{19}$$

It can be verified that $h(t)$ is continuous for $t \ge 0$, also that

$$h(t) = \int_0^t g(u)\,f(t - u)\,du = g * f. \tag{20}$$

If now, for some $\sigma$, $\int_0^\infty |f(t)|\,e^{-\sigma t}\,dt$ and $\int_0^\infty |g(t)|\,e^{-\sigma t}\,dt$ exist, then $\int_0^\infty |h(t)|\,e^{-\sigma t}\,dt$ exists, so that $L[f]$, $L[g]$, $L[h]$ exist and

$$L[h] = L[f * g] = L[f]\,L[g]. \tag{21}$$

Properties of the Convolution. These are:

$$f * (g + h) = f * g + f * h; \tag{22}$$

$$f * (cg) = (cf) * g = c\,(f * g), \qquad c = \text{const.}; \tag{23}$$

$$f * (g * h) = (f * g) * h. \tag{24}$$

Special Convolutions. The following are useful:

$$e^{at} * e^{at} * \cdots * e^{at} = \frac{t^{n-1}e^{at}}{(n-1)!} \quad (n \text{ factors});$$

$$e^{at} * e^{bt} = \frac{e^{at} - e^{bt}}{a - b} \quad (a \ne b).$$

5. INVERSION

If $L[f] = F(s)$, one writes $f = L^{-1}[F]$, thereby defining the inverse Laplace transform. The inverse is uniquely determined; more precisely, if $L[f] = L[g]$ and $f$, $g$ are piecewise continuous, then $f = g$, except perhaps at points of discontinuity.

If $f = L^{-1}[F]$, then as for Fourier series and integrals (Chap. 8, Sects. 8, 9),

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(s)\,e^{st}\,d\omega = \lim_{b\to\infty}\frac{1}{2\pi}\int_{-b}^{b} F(s)\,e^{st}\,d\omega, \qquad s = \sigma + i\omega, \tag{29}$$

at every $t$ for which $f$ has left- and right-handed derivatives; in the integrals $\sigma$ is chosen greater than the abscissa of absolute convergence of $L[f]$. Under the conditions described in Chap. 8, Sect. 8, the integral represents $\tfrac{1}{2}[f(t_0+) + f(t_0-)]$ at each jump discontinuity $t_0$. In general $f(t)$ is defined to be 0 for $t < 0$, which will force a discontinuity at $t = 0$ unless $f(t) \to 0$ as $t \to 0+$; the integral thus gives $\tfrac{1}{2}f(0+)$ at $t = 0$.

Conditions for Existence. Given $F(s)$ as a function of the complex variable $s$, one can ask whether $L^{-1}[F]$ exists, i.e., whether $F$ is the Laplace transform of some $f(t)$. For this to hold, $F(s)$ must be analytic in some half-plane $\sigma > \sigma_0$, but this alone is not sufficient. If $F(s)$ is analytic at $s = \infty$ and has a zero there (Chap. 7, Sect. 5), so that

$$F(s) = \sum_{n=1}^{\infty} \frac{a_n}{s^n}, \qquad |s| > R, \tag{30}$$

then $F(s)$ is a Laplace transform:

$$L^{-1}[F(s)] = f(t) = \sum_{n=0}^{\infty} a_{n+1}\,\frac{t^n}{n!}; \tag{31}$$

$f(t)$ is of exponential type and is an entire function of $t$ (Chap. 7, Sect. 4). Furthermore,

$$f(t) = \frac{1}{2\pi i}\oint_C e^{st}F(s)\,ds, \tag{32}$$

where $C$ is a circle: $|s| = R_0 > R$. If in addition $F(s)$ is analytic for all finite $s$ except at $s_1, \ldots, s_n$, then $f(t)$ equals the sum of the residues of $F(s)e^{st}$ at $s_1, \ldots, s_n$ (Chap. 7, Sect. 5).

More general conditions that $F(s)$ be a transform can be given. If, for example, $F(s)$ is analytic for $\sigma > \sigma_0 \ge 0$ and is representable in the form

$$F(s) = \frac{c}{s} + \frac{\mu(s)}{s^{1+\delta}} \qquad (\delta > 0), \tag{33}$$

where $|\mu(s)|$ is bounded for $\sigma \ge \sigma_1 > \sigma_0$, then $F(s)$ is the Laplace transform of $f(t)$, where $f(t)$ is given by eqs. (29), with $\sigma = \sigma_1$.

If $F(s)$ is a proper rational function of $s$: $F(s) = P(s)/Q(s)$, then eq. (32) is applicable and the integral can be computed by residues. If in particular $Q(s)$ has only simple roots $s_1, \ldots, s_n$, then by Chap. 7, Sect. 5, $e^{st}P/Q$ has residue $\exp(s_k t)\,P(s_k)/Q'(s_k)$ at $s_k$, so that

$$L^{-1}\!\left[\frac{P(s)}{Q(s)}\right] = \sum_{k=1}^{n} \frac{e^{s_k t}\,P(s_k)}{Q'(s_k)}. \tag{34}$$

This corresponds to the Heaviside expansion formula (Chap. 8, Sect. 1).

Particular inverse transforms can be read off Table 1 (Sect. 1) or the accompanying Table 2. Others can be deduced from these by linearity and the various rules such as (4)-(7), (9), and with the aid of convolutions. Extensive lists are given in Refs. 1, 5, 6 of Chap. 8.

Rules for Finding Laplace Transforms and Their Inverses. If $f(t)$ has period $T$, then

$$L[f] = \frac{1}{1 - e^{-sT}}\int_0^T e^{-st}f(t)\,dt. \tag{35}$$

For general $f(t)$ with transform $F(s)$,

$$L[f(at)] = \frac{1}{a}\,F\!\left(\frac{s}{a}\right), \qquad a > 0; \tag{36}$$

$$L[e^{-at}f] = F(s + a); \tag{37}$$

$$L[t^n f] = (-1)^n F^{(n)}(s), \qquad n = 1, 2, \ldots; \tag{38}$$

$$L[t^{-n} f] = \underbrace{\int_s^\infty \cdots \int_s^\infty}_{n\ \text{times}} F(s)\,ds \cdots ds \qquad (n = 1, 2, \ldots). \tag{39}$$

6. APPLICATION TO DIFFERENTIAL EQUATIONS

Characteristic Function. Let $a_0, \ldots, a_n$ be constants, with $a_0 \ne 0$. The function $V(s) = a_0 s^n + \cdots + a_n$ will be termed the characteristic function associated with the differential equation

$$a_0\,\frac{d^n x}{dt^n} + \cdots + a_{n-1}\,\frac{dx}{dt} + a_n x = f(t). \tag{40}$$

Transfer Function. The function

$$Y(s) = \frac{1}{V(s)} = \frac{1}{a_0 s^n + \cdots + a_n} \tag{41}$$

will be termed the transfer function.

Solutions. Let $f(t)$ be piecewise continuous for $t \ge 0$ and have an absolutely convergent Laplace transform for $\sigma > \sigma_0$. A solution $x(t)$ of eq. (40) satisfying given initial conditions

$$x(0) = \alpha_0, \quad x'(0) = \alpha_1, \quad \ldots, \quad x^{(n-1)}(0) = \alpha_{n-1} \tag{42}$$

is obtained as follows. One forms the Laplace transform of both sides of eq. (40), applies the rule (6), and obtains the transformed equation

$$V(s)X(s) - Q(s) = F(s), \tag{43}$$

where $X = L[x]$, $F = L[f]$ and

$$Q(s) = a_0\alpha_0 s^{n-1} + (a_0\alpha_1 + a_1\alpha_0)s^{n-2} + \cdots + (a_0\alpha_{n-1} + \cdots + a_{n-1}\alpha_0). \tag{44}$$

Accordingly,

$$X(s) = \frac{Q(s)}{V(s)} + \frac{F(s)}{V(s)} = Y(s)Q(s) + Y(s)F(s), \tag{45}$$

$$x(t) = L^{-1}[Y(s)Q(s) + Y(s)F(s)] = L^{-1}[Y(s)Q(s)] + L^{-1}[Y(s)F(s)]. \tag{46}$$

Since $Y(s)Q(s)$ is a proper rational function, its inverse can be found by residues as in Sect. 5. The inverse of $Y(s)F(s)$ can be found in a variety of ways.
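The residue formula (34) for a proper rational function $P(s)/Q(s)$ whose denominator has only simple roots can be sketched in a few lines. This is an illustrative sketch only: the function names `polyval`, `polyder`, and `heaviside_inverse` are hypothetical, polynomials are assumed to be given as coefficient lists in descending powers, and the roots of $Q$ are assumed to be supplied by the caller (no root finding is attempted).

```python
import cmath
import math

def polyval(coeffs, s):
    """Evaluate a polynomial (coefficients in descending powers) by Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * s + c
    return result

def polyder(coeffs):
    """Coefficients of the derivative of a polynomial (descending powers)."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def heaviside_inverse(P, Q, roots, t):
    """Sum over simple roots s_k of exp(s_k t) P(s_k) / Q'(s_k), as in eq. (34).

    Assumes deg P < deg Q and that `roots` lists all (distinct) roots of Q.
    """
    dQ = polyder(Q)
    return sum(cmath.exp(sk * t) * polyval(P, sk) / polyval(dQ, sk) for sk in roots)

# Example: F(s) = (s + 3) / (s^2 + 3s + 2), with simple roots -1 and -2.
P = [1, 3]
Q = [1, 3, 2]
t = 0.7
value = heaviside_inverse(P, Q, [-1, -2], t).real
expected = 2 * math.exp(-t) - math.exp(-2 * t)  # partial fractions by hand
```

For complex-conjugate root pairs the individual terms are complex but their sum is real, which is why the real part is taken at the end.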
In particular, $Y(s)$ has an inverse transform $y(t)$ and

$$L^{-1}[Y(s)F(s)] = y(t) * f(t) = \int_0^t y(u)\,f(t - u)\,du. \tag{47}$$

Thus both terms in eqs. (46) are well defined and it can be shown that $x(t)$ is the solution sought; $x(t)$ has continuous derivatives through the $(n-1)$st order and an $n$th derivative which is continuous except where $f(t)$ is discontinuous. The formula (47) defines $y * f$ if $f(t)$ is piecewise continuous for $t \ge 0$, even though $f(t)$ may grow very rapidly as $t \to +\infty$.

If $V(s)$ has only simple roots $s_1, \ldots, s_n$, so that

$$Y(s) = \sum_{j=1}^{n} \frac{A_j}{s - s_j}, \tag{48}$$

then

$$y(t) = \sum_{j=1}^{n} A_j e^{s_j t}. \tag{49}$$

If $V$ has multiple roots, each multiple root $s_j$ gives rise to terms of form $A(s - s_j)^{-k}$ in $Y(s)$. The corresponding term in $y(t)$ is $A\,t^{k-1}e^{s_j t}/(k-1)!$ and in $L^{-1}[YF]$ is

$$\frac{A}{(k-1)!}\int_0^t u^{k-1}e^{s_j u}\,f(t - u)\,du. \tag{50}$$

TABLE 2. INVERSE LAPLACE TRANSFORMS

No. | $F(s)$ | $L^{-1}[F(s)] = f(t)$
1 | $\dfrac{c}{as + b}$ | $\dfrac{c}{a}\,e^{-(b/a)t}$
2 | $\dfrac{ps + q}{(s+\alpha)(s+\beta)}$, $\alpha \ne \beta$ | $\dfrac{(q - p\alpha)e^{-\alpha t} - (q - p\beta)e^{-\beta t}}{\beta - \alpha}$
3 | $\dfrac{ps + q}{(s+\alpha)^2}$ | $e^{-\alpha t}\,[\,p + (q - \alpha p)t\,]$
4 | $\dfrac{ps + q}{as^2 + bs + c}$, $b^2 - 4ac > 0$ | $-\dfrac{1}{\mu}\,[\,(q - p\alpha)e^{-\alpha t} - (q - p\beta)e^{-\beta t}\,]$, where $\alpha = \dfrac{b + \mu}{2a}$, $\beta = \dfrac{b - \mu}{2a}$, $\mu = \sqrt{b^2 - 4ac}$
5 | $\dfrac{ps + q}{as^2 + bs + c}$, $b^2 - 4ac < 0$ | $e^{-(b/2a)t}\left[\dfrac{p}{a}\cos\dfrac{\mu}{2a}t + \dfrac{2aq - pb}{a\mu}\sin\dfrac{\mu}{2a}t\right]$, where $\mu = \sqrt{4ac - b^2}$
6 | $\dfrac{ps^2 + qs + r}{(s+\alpha)(s+\beta)(s+\gamma)}$, $\alpha$, $\beta$, $\gamma$ distinct | $-\dfrac{1}{ABC}\,[\,A(p\alpha^2 - q\alpha + r)e^{-\alpha t} + B(p\beta^2 - q\beta + r)e^{-\beta t} + C(p\gamma^2 - q\gamma + r)e^{-\gamma t}\,]$, where $A = \beta - \gamma$, $B = \gamma - \alpha$, $C = \alpha - \beta$
7 | $\dfrac{ps^2 + qs + r}{(s+\alpha)^2(s+\beta)}$, $\alpha \ne \beta$ | $\dfrac{p\beta^2 - q\beta + r}{(\beta - \alpha)^2}\,e^{-\beta t} + \left[\dfrac{p\alpha^2 - q\alpha + r}{\beta - \alpha}\,t + \dfrac{p\alpha^2 - 2\alpha\beta p + q\beta - r}{(\beta - \alpha)^2}\right]e^{-\alpha t}$
8 | $\dfrac{ps^2 + qs + r}{(s+\alpha)^3}$ | $p\,e^{-\alpha t} + (q - 2p\alpha)\,t\,e^{-\alpha t} + (p\alpha^2 - q\alpha + r)\,\dfrac{t^2}{2}\,e^{-\alpha t}$
9 | $\dfrac{ps^2 + qs + r}{(s+\alpha)(as^2 + bs + c)}$, $a\alpha^2 - b\alpha + c \ne 0$ | $\dfrac{M}{N}\,e^{-\alpha t} + \dfrac{1}{N}\,L^{-1}\!\left[\dfrac{Bs + C}{as^2 + bs + c}\right]$, where $M = p\alpha^2 - q\alpha + r$, $N = a\alpha^2 - b\alpha + c$, $B = (aq - bp)\alpha + pc - ar$, $C = (ar - pc)\alpha + qc - br$
10 | $\dfrac{ps^3 + qs^2 + rs + u}{(as^2 + bs + c)(As^2 + Bs + C)}$, $as^2 + bs + c$ and $As^2 + Bs + C$ having no common roots | $L^{-1}\!\left[\dfrac{p_0 s + q_0}{as^2 + bs + c}\right] + L^{-1}\!\left[\dfrac{p_1 s + q_1}{As^2 + Bs + C}\right]$ (see the note below)
11 | $\dfrac{ps + q}{(as^2 + bs + c)^2}$, $b^2 - 4ac < 0$ | $\dfrac{e^{-\alpha t}}{2a^2\beta^3}\,[\,p\beta^2 t\sin\beta t + (q - \alpha p)(\sin\beta t - \beta t\cos\beta t)\,]$, where $\alpha = \dfrac{b}{2a}$, $\beta = \dfrac{\sqrt{4ac - b^2}}{2a}$

Note to entry 10. To find $p_0$, $q_0$, $p_1$, $q_1$, compute:

$$\lambda_0 = a(ar - cp) + b(bp - aq), \qquad \mu_0 = a(au - cq) + bcp, \qquad \sigma_0 = a(bu - cr) + c^2 p,$$

$$\beta = aB - bA, \qquad \gamma = aC - cA, \qquad \delta_0 = a\gamma^2 - b\beta\gamma + c\beta^2,$$

$$\lambda_1 = A(Ar - Cp) + B(Bp - Aq), \qquad \mu_1 = A(Au - Cq) + BCp, \qquad \sigma_1 = A(Bu - Cr) + C^2 p,$$

$$\delta_1 = A\gamma^2 - B\beta\gamma + C\beta^2.$$

Then

$$p_0 = \frac{\lambda_0\gamma - \mu_0\beta}{\delta_0}, \qquad q_0 = \frac{\mu_0\gamma - \sigma_0\beta}{\delta_0}, \qquad p_1 = \frac{-\lambda_1\gamma + \mu_1\beta}{\delta_1}, \qquad q_1 = \frac{-\mu_1\gamma + \sigma_1\beta}{\delta_1}.$$

Particular Solutions. If all the initial constants $\alpha_0, \ldots, \alpha_{n-1}$ are 0, then $Q(s) = 0$ and $x = L^{-1}[YF]$ is the solution sought. This particular solution can be found by eqs. (47), which requires knowledge of $y(t)$ and hence of the roots of $V(s)$. This can cause difficulty. An alternative is to employ eqs. (29):

$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} Y(s)F(s)\,e^{st}\,d\omega, \qquad \sigma = \text{const.} > \sigma_0. \tag{51}$$

It may be possible to simplify this by residues or series expansions.

If $f(t)$ is of form $e^{bt}p(t)$, where $p(t)$ is a polynomial of degree $m$ in $t$, a particular solution can be found explicitly without finding the roots of $V(s)$. If $V(b) \ne 0$, the particular solution is

$$x(t) = e^{bt}\left[\,Y(b)\,p(t) + \frac{Y'(b)}{1!}\,p'(t) + \frac{Y''(b)}{2!}\,p''(t) + \cdots + \frac{Y^{(m)}(b)}{m!}\,p^{(m)}(t)\right]. \tag{52}$$

If $V(b) = 0$, then $V(s) = (s - b)^k W(s)$, $W(b) \ne 0$. Let $Z(s) = 1/W(s)$ and let $p_1(t)$ be the polynomial obtained by integrating

$$Z(b)\,p(t) + \frac{Z'(b)}{1!}\,p'(t) + \cdots + \frac{Z^{(m)}(b)}{m!}\,p^{(m)}(t) \tag{53}$$

$k$ times from 0 to $t$. Then $x = e^{bt}p_1(t)$ is a particular solution of eq. (40). In both cases it can be verified that $L[x] = Y(s)L[e^{bt}p] + Y(s)R(s)$, where $R$ is a polynomial of degree less than that of $V$ (in fact less than that of $W(s)$ in the second case).

Simultaneous Equations.
Similar methods are employed for simultaneous linear differential equations with constant coefficients in unknowns $x_1, x_2, \ldots$. One applies the Laplace transformation to the equations, thereby obtaining equations for $X_1(s), X_2(s), \ldots$; in forming these new equations, certain initial conditions for $x_1, x_2, \ldots$ are assumed. The equations are simultaneous algebraic equations for $X_1(s), X_2(s), \ldots$ and can be solved by elimination or determinants. When $X_j(s)$ is known, $x_j(t)$ can be found by forming the inverse transforms.

EXAMPLE.

$$\frac{d^2 x}{dt^2} + 2\,\frac{dy}{dt} + y = 13e^{2t}, \qquad \frac{dx}{dt} - 2x + \frac{d^2 y}{dt^2} + 3\,\frac{dy}{dt} + 5y = 15e^{2t};$$

when $t = 0$, $x = 1$, $y = 0$, $dx/dt = 0$, $dy/dt = 1$. The transformed equations are

$$s^2 X(s) + (2s + 1)\,Y(s) = s + \frac{13}{s - 2}, \qquad (s - 2)\,X(s) + (s^2 + 3s + 5)\,Y(s) = 2 + \frac{15}{s - 2}.$$

Hence

$$X = \frac{s^4 + s^3 + 8s^2 + 5s + 54}{(s - 2)(s + 1)(s + 2)(s^2 + 1)}, \qquad Y = \frac{s^3 + 15s^2 - 17s + 26}{(s - 2)(s + 1)(s + 2)(s^2 + 1)},$$

$$X = \frac{2}{s - 2} - \frac{19}{2(s + 1)} + \frac{21}{5(s + 2)} + \frac{43s - 51}{10(s^2 + 1)}, \qquad Y = \frac{1}{s - 2} - \frac{19}{2(s + 1)} + \frac{28}{5(s + 2)} + \frac{29s + 7}{10(s^2 + 1)},$$

$$x = 2e^{2t} - \frac{19}{2}e^{-t} + \frac{21}{5}e^{-2t} + \frac{43\cos t - 51\sin t}{10}, \qquad y = e^{2t} - \frac{19}{2}e^{-t} + \frac{28}{5}e^{-2t} + \frac{29\cos t + 7\sin t}{10}.$$

7. RESPONSE TO IMPULSE FUNCTIONS

For many applications it is important to consider the response of a linear system to the impulse function $\delta(t)$ or to other ideal functions such as $\delta'(t)$, $\delta''(t)$, ....

EXAMPLE 1. Consider the equation

$$\frac{dx}{dt} + x = \delta(t), \qquad x(0) = \alpha_0.$$

If one applies the Laplace transform mechanically to both sides and employs the rule $L[\delta(t)] = 1$ (Sect. 3), one finds:

$$X(s) = \frac{1 + \alpha_0}{s + 1}, \qquad x(t) = (1 + \alpha_0)\,e^{-t}.$$

Hence $x(t)$ has a discontinuity at $t = 0$ (Fig. 6); $x$ jumps from the assigned initial value $\alpha_0$ to the value $1 + \alpha_0$.

EXAMPLE 2. Similarly,

$$\frac{d^2 x}{dt^2} + \frac{dx}{dt} = \delta(t)$$

is found to have the solution (Fig. 7) $x = 1 + \alpha_0 + \alpha_1 - (1 + \alpha_1)\,e^{-t}$, with $\alpha_0 = x(0)$, $\alpha_1 = x'(0)$. Here there is no discontinuity of $x(t)$ at $t = 0$, but $x'(t)$ has a discontinuity, jumping from the assigned initial slope of $\alpha_1$ to the slope $1 + \alpha_1$.

FIG. 6. Response of first order system to $\delta$-function.

FIG. 7. Response of second order system to $\delta$-function.

It should be noted that the second example can be written as follows:

$$\frac{dx}{dt} = y, \qquad \frac{dy}{dt} + y = \delta(t);$$

thus its solution is an integral of the solution of the first example. Each such integration reduces the type of discontinuity. In general, $V(D)x = \delta(t)$, $V(D) = a_0 D^n + \cdots$, $a_0 \ne 0$, has a solution which has a jump in the $(n-1)$st derivative at $t = 0$, but no jumps in the derivatives of lower order. For the equation $V(D)x = \delta(t - c)$, $c > 0$, a similar conclusion holds, with the discontinuity occurring at $t = c$.

One can interpret $\delta(t - c)$ as $\frac{d}{dt}u(t - c)$, if one forms the transforms by the rule $L[f'] = sL[f]$, ignoring the discontinuity which would make the rule inapplicable. For then

$$L\!\left[\frac{d}{dt}\,u(t - c)\right] = sL[u(t - c)] = e^{-cs},$$

in accordance with eq. (18). This suggests the general procedure.

General Procedure. For the differential equation

$$V(D)x = \frac{df}{dt},$$

in which $f$ has a jump discontinuity at $t = c$ but has otherwise continuous derivatives, one should take transforms ignoring the discontinuity: $V(s)L[x] = sL[f] - f(0)$. Under similar conditions on $f$, a similar procedure can be used for higher derivatives, and for the general equation $V(D)x = W(D)f$. If the order of $V(D)$ is less than that of $W(D)$, $x$ will itself be an ideal function; otherwise $x$ will merely show some discontinuity at $t = c$. Similar remarks apply when there are several jump discontinuities.

Let $f$ have continuous derivatives of all orders except for $t = c$, at which the derivatives have limiting values to the left and to the right. Then $f$ can be written as $f_1(t) + k_1 u(t - c)$, where $f_1(t)$ is continuous at $t = c$; correspondingly, $f'(t) = f_1'(t) + k_1\delta(t - c)$, where $f_1'(t)$ is discontinuous at $t = c$.
Thus

$$f'(t) = f_2(t) + k_2 u(t - c) + k_1\delta(t - c),$$

$$f''(t) = f_2'(t) + k_2\delta(t - c) + k_1\delta'(t - c) = f_3(t) + k_3 u(t - c) + k_2\delta(t - c) + k_1\delta'(t - c), \quad \ldots$$

Computation of $L[f']$, $L[f'']$, ..., as described above, is then equivalent to that obtained by writing

$$L[f'] = L[f_2] + k_2 L[u(t - c)] + k_1 L[\delta(t - c)],$$

$$L[f''] = L[f_3] + k_3 L[u(t - c)] + k_2 L[\delta(t - c)] + k_1 L[\delta'(t - c)],$$

if one agrees that

$$L[\delta^{(m)}(t - c)] = s^m e^{-cs} \qquad (m = 1, 2, \ldots). \tag{54}$$

The justification for the rules adopted lies in the fact that they give a reasonable limiting form for the response $x(t)$, and they meet the needs of the physical situations to which they are applied.

8. EQUATIONS CONTAINING INTEGRALS

The method of Sect. 6 is applicable to "integro-differential equations" such as the following:

$$a_0\,\frac{dx}{dt} + a_1 x + a_2\int_0^t x\,dt = f(t). \tag{55}$$

One need only apply the Laplace transformation to both sides and employ rule (7):

$$a_0[\,sX(s) - x(0)\,] + a_1 X(s) + \frac{a_2}{s}\,X(s) = F(s), \tag{56}$$

from which one can solve as before for $X(s)$. One can also differentiate eq. (55) to obtain an equation of second order:

$$a_0\,\frac{d^2 x}{dt^2} + a_1\,\frac{dx}{dt} + a_2 x = f'(t); \tag{57}$$

from eq. (55), $a_0 x'(0) + a_1 x(0) = f(0)$, so that one initial condition for eq. (57) is fixed. If $f(t)$ has discontinuities, $f'(t)$ has to be treated as an ideal function (Sect. 7); in such a case, it is simpler to use eq. (56).

It should be remarked that eq. (55) is equivalent to the system

$$\frac{dy}{dt} = x, \qquad a_0\,\frac{dx}{dt} + a_1 x + a_2 y = f(t),$$

with the initial conditions: $x(0) = \alpha_0$, $y(0) = 0$. By similar devices integrals can be eliminated formally in most cases.

9. WEIGHTING FUNCTION

It has been seen that, for proper initial conditions, various problems lead to relations of form

$$X(s) = Y(s)F(s), \tag{58}$$

where $F(s)$ is the Laplace transform of a driving function or "input" $f(t)$ and $X(s)$ is the Laplace transform of the "output" $x(t)$. In such cases $Y(s)$ is termed the transfer function; i.e., in general, the transfer function is the ratio of the Laplace transforms of output and input. If $y(t)$ is the inverse Laplace transform of $Y(s)$, then as in Sect. 6

$$x(t) = y(t) * f(t) = \int_0^t f(t - u)\,y(u)\,du. \tag{59}$$

Accordingly, $x(t)$ is a weighted average of $f(t)$ over the interval from 0 to $t$, the value at $t - u$ receiving weight $y(u)$. Since $f(t) = 0$ for $t < 0$, one can also write

$$x(t) = \int_0^\infty f(t - u)\,y(u)\,du, \tag{60}$$

so that the average is over the entire "past" of $f(t)$.

Graphical Computation. One can then compute $x(t)$ at each $t$ graphically as suggested in Fig. 8. Here $y(u)$ is graphed against $u$, with the positive $u$-axis to the left and the origin above the point $t$ on the $t$-axis. The value of $f$ at $t - u$ is multiplied by the value of $y$ above $t - u$ and the result is integrated to yield $x(t)$ at the $t$ chosen. As the graph of $y(u)$ is moved parallel to the $t$-axis, the average at successive times $t$ can be found.

FIG. 8. Response as a weighted average.

Weighting Function. The function $y(t) = L^{-1}[Y(s)]$ is termed the "weighting function." In view of the discussion given, this term would be justified only if $\int_0^\infty y(t)\,dt = 1$. But

$$Y(s) = \int_0^\infty y(t)\,e^{-st}\,dt, \qquad Y(0) = \int_0^\infty y(t)\,dt, \tag{61}$$

provided $Y(s)$ is defined for $s = 0$. Hence if $Y(0) = 1$, the "total weight" is 1, as desired. If $Y(0)$ is finite and nonzero but different from 1, one can redefine the input as a constant times $f(t)$ and achieve the same result.

Response to Unit Impulse. If $\epsilon$ is very small and $f(t)$ is a square pulse of height $1/\epsilon$ from $t = 0$ to $t = \epsilon$, then eqs. (59) show that approximately

$$x(t) = \frac{1}{\epsilon}\,y(t)\cdot\epsilon = y(t);$$

as $\epsilon \to 0$, this can be shown to be the limiting relation. Thus the weighting function is the response to the unit impulse function $\delta(t)$. This also follows from eq. (58), since if $f(t) = \delta(t)$, $L[f] = F(s) = 1$.

One can also remark that, if $f(t)$ is the unit function $u(t)$, then $L[f] = F(s) = 1/s$, so that by eq. (58)

$$X(s) = \frac{Y(s)}{s}, \qquad Y(s) = sX(s), \qquad y(t) = \frac{dx}{dt};$$

for, by eqs. (59), $x(0) = 0$. Thus the weighting function can be interpreted as the derivative of the response to the unit function.
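The two interpretations of the weighting function given above (convolution kernel in eq. (59), and derivative of the unit-function response) can be illustrated numerically. The sketch below assumes, purely for illustration, the first order system $dx/dt + x = f(t)$ with zero initial value, for which $Y(s) = 1/(s+1)$ and $y(t) = e^{-t}$; the helper name `convolve_response` and the step count are hypothetical. For convenience the code takes the unit function to equal 1 at $t = 0$ as well, which does not affect the integral.

```python
import math

def convolve_response(y, f, t, n=2000):
    """Approximate x(t) = integral_0^t f(t - u) y(u) du (eq. (59)).

    Hypothetical helper using the composite trapezoidal rule with n panels.
    """
    if t <= 0.0:
        return 0.0
    h = t / n
    total = 0.5 * (f(t) * y(0.0) + f(0.0) * y(t))
    for k in range(1, n):
        u = k * h
        total += f(t - u) * y(u)
    return total * h

def weight(t):
    # Weighting function of the assumed system dx/dt + x = f(t).
    return math.exp(-t)

def unit(t):
    # Unit function, taken as 1 at t = 0 too (an assumption for the quadrature).
    return 1.0 if t >= 0.0 else 0.0

t0 = 1.0
step_response = convolve_response(weight, unit, t0)
exact_step = 1.0 - math.exp(-t0)  # known solution of dx/dt + x = 1, x(0) = 0

# Central difference of the step response should reproduce y(t0).
d = 1e-3
derivative = (convolve_response(weight, unit, t0 + d)
              - convolve_response(weight, unit, t0 - d)) / (2 * d)
```

Here `step_response` should agree with `exact_step`, and `derivative` should be close to `weight(t0)`, in line with the statement that $y(t)$ is the derivative of the response to the unit function.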
If one denotes by $A(t)$ the response to the unit function, so that $L[A] = Y(s)/s$, then for an arbitrary driving function $f(t)$,

$$X(s) = s\left(\frac{Y(s)}{s}\right)F(s), \qquad x = \frac{d}{dt}\int_0^t f(t - u)\,A(u)\,du. \tag{62}$$

Equations (59) and (62) are equivalent to the eqs. (25), (26) of Chap. 8, Sect. 5.

10. DIFFERENCE-DIFFERENTIAL EQUATIONS

Because of the transformation rule $L[f(t - c)] = e^{-cs}L[f]$ (Sect. 3), Laplace transforms can be applied to solve linear difference-differential equations, i.e., equations of form

$$\sum_{k=0}^{n}\sum_{m=0}^{M} a_{mk}\,f^{(k)}(t - mT) = g(t); \tag{63}$$

it will be assumed that the coefficients $a_{mk}$ are constants and that a solution $f(t)$ is to be found which is equal to 0 for $t \le 0$ and satisfies eq. (63) for $t > 0$. Under these conditions

$$L[f^{(k)}(t - mT)] = s^k e^{-mTs}\,F(s), \tag{64}$$

and the transformed equation corresponding to eq. (63) is

$$\left(\sum_{k=0}^{n}\sum_{m=0}^{M} a_{mk}\,s^k e^{-mTs}\right)F(s) = G(s). \tag{65}$$

This can be solved for $F(s)$ and the solution sought is $L^{-1}[F(s)]$. Validity of this process requires in particular that for some $\sigma_0$ the term in parentheses in eq. (65) have no zeros in the complex $s$-plane for $\sigma > \sigma_0$. For discussion of the questions involved here see Ref. 1.

Instead of requiring that $f(t)$ be $\equiv 0$ for $t < 0$ one can impose the condition that $f(t)$ coincide with a given function $f_0(t)$ in an "initial interval" $-MT \le t \le 0$. This case can be reduced to the previous one by first extending the definition of $f_0(t)$ to the range $t > 0$, while preserving continuity, and introducing a new unknown function $f_1(t) = f(t) - f_0(t)$.

11. ASYMPTOTIC BEHAVIOR OF TRANSFORMS

In general the behavior of $f(t)$ at $t = 0$ is related to that of $F(s) = L[f]$ as $s \to \infty$ along the real axis, while the behavior of $f(t)$ at $t = +\infty$ is related to that of $F(s)$ as $s \to 0$ (or $s \to \sigma_0$) along the real axis. A full discussion is given in the book of Doetsch (Ref. 3, Chap. 8), pp. 186-277.

If

$$F(s) = \frac{a_1}{s} + \frac{G(s)}{s^2}, \tag{66}$$

where $|G(s)| < M$ for $\sigma > \sigma_0$, then

$$\lim_{t\to 0+} f(t) = \lim_{s\to\infty} sF(s) \qquad (s \text{ real}). \tag{67}$$

If $f(t)$ and $f'(t)$ have convergent Laplace transforms for $\sigma > 0$ and $f(t)$ has a limit as $t \to +\infty$, then

$$\lim_{t\to\infty} f(t) = \lim_{s\to 0} sF(s) \qquad (s \text{ real}). \tag{68}$$

REFERENCES

1. R. Bellman and J. M. Danskin, The Stability Theory of Differential Difference Equations, Proceedings of the Symposium on Nonlinear Circuit Analysis, Vol. II, pp. 107-128, Polytechnic Institute of Brooklyn, New York, 1953.

See also the list following Chap. 8.

A GENERAL MATHEMATICS

Chapter 10

Conformal Mapping

W. Kaplan

1. Definition of Conformal Mapping. General Properties
2. Linear Fractional Transformations
3. Mapping by Elementary Functions
4. Schwarz-Christoffel Mappings
5. Application of Conformal Mapping to Boundary Value Problems
References

1. DEFINITION OF CONFORMAL MAPPING. GENERAL PROPERTIES

Definitions. Let $u = f(x, y)$, $v = g(x, y)$ be two real functions of the real variables $x$, $y$, both defined in an open region $D$ of the $xy$-plane. As $(x, y)$ varies in $D$ (Fig. 1), the corresponding point $(u, v)$ varies in a set $D_1$ and one says that the equations

$$u = f(x, y), \qquad v = g(x, y) \tag{1}$$

define a transformation or mapping $T$ of $D$ onto $D_1$ (Chap. 1, Sect. 3). If for each $(u, v)$ in $D_1$ there is precisely one $(x, y)$ in $D$ such that $u = f(x, y)$, $v = g(x, y)$, then the transformation $T$ is said to be one-to-one, and $T$ has an inverse $T^{-1}$, defined by equations

$$x = \varphi(u, v), \qquad y = \psi(u, v), \tag{2}$$

obtained by solving eqs. (1) for $x$ and $y$ in terms of $u$ and $v$.

Now let $T$, defined by eqs. (1), be a mapping of $D$ onto $D_1$. In addition, let $f(x, y)$ and $g(x, y)$ have continuous first partial derivatives in $D$. The mapping $T$ is said to be conformal if, for each pair of curves $C_1$, $C_2$ meeting at a point $(x_0, y_0)$ of $D$, the corresponding curves $C_1^*$, $C_2^*$ meeting at $(u_0, v_0)$ form an angle $\alpha$ at $(u_0, v_0)$ equal to that formed by $C_1$, $C_2$ at $(x_0, y_0)$.
It is assumed that $C_1$, $C_2$ are directed curves and have well-defined tangent vectors at $(x_0, y_0)$, so that $C_1^*$, $C_2^*$ also have tangent vectors at $(u_0, v_0)$. The angle $\alpha$ is then measured between the tangent vectors. It is customarily a signed angle and measured, e.g., from $C_1$ to $C_2$ and, correspondingly, from $C_1^*$ to $C_2^*$. Conformality then means that the corresponding angles are equal and have the same sense, as in Fig. 1. To emphasize this, one can write more explicitly that $T$ is to be conformal and sense-preserving.

FIG. 1. Conformal mapping: (a) z-plane, (b) w-plane.

For most applications $T$ is assumed to be one-to-one. Conformality of $T$ then implies conformality of $T^{-1}$.

THEOREM 1. Let (1) define a mapping $T$ of $D$ onto $D_1$. Let $f(x, y)$ and $g(x, y)$ have continuous first partial derivatives in $D$. Then $T$ is conformal and sense-preserving if and only if the Cauchy-Riemann equations

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \tag{3}$$

hold in $D$ and the Jacobian $\partial(u, v)/\partial(x, y) \ne 0$ in $D$.

By virtue of this theorem, the theory of conformal mapping is related to the theory of analytic functions of a complex variable (Chap. 7). One can use complex notation:

$$z = x + iy, \qquad w = u + iv, \qquad i = \sqrt{-1}, \tag{4}$$

and the transformation $T$ is then simply a complex function $w = F(z)$ defined in $D$. The mapping $w = F(z)$ is conformal precisely when $F$ is analytic in $D$ and $F'(z) \ne 0$ in $D$. (See Ref. 2.)

REMARK. If $F'(z)$ is 0 at a point $z_0$, then $z_0$ is termed a critical point of $F(z)$. A function $w = F(z)$ cannot define a conformal mapping of any open region $D$ containing a critical point $z_0$. The behavior of $F(z)$ near a critical point is typified by the behavior of $z^n$ near $z = 0$, for $n = 2, 3, \ldots$; except for $w = 0$, each $w$ has $n$ inverse values $w^{1/n}$. Curves meeting at angle $\alpha$ at $z = 0$ are transformed onto curves meeting at angle $n\alpha$ at $w = 0$.
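The angle behavior described in the Remark can be checked numerically. The following is a minimal sketch, not part of the original text: it approximates tangent directions by short secants (the step `h` and the helper name `image_angle` are illustrative assumptions) and compares the angle between two image curves under $w = z^2$ at a regular point and at the critical point $z = 0$.

```python
import cmath

def image_angle(F, z0, dir1, dir2, h=1e-6):
    """Signed angle from the image of direction dir1 to the image of dir2 at z0.

    Tangents of the image curves are approximated by secants of length h,
    which is an assumption of this sketch rather than an exact derivative.
    """
    w0 = F(z0)
    t1 = F(z0 + h * dir1) - w0
    t2 = F(z0 + h * dir2) - w0
    return cmath.phase(t2 / t1)

def F(z):
    # w = z**2 has F'(z) = 2z, so z = 0 is its only critical point.
    return z ** 2

d1 = cmath.exp(0.3j)   # direction of the first curve
d2 = cmath.exp(1.0j)   # direction of the second curve, 0.7 rad further on

angle_regular = image_angle(F, 1 + 1j, d1, d2)   # F'(1 + i) != 0
angle_critical = image_angle(F, 0, d1, d2)       # F'(0) == 0

print(angle_regular, angle_critical)
```

At the regular point the angle of 0.7 radian is preserved up to the secant error, while at the critical point it is doubled, in line with the factor $n = 2$.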
The absence of critical points does not guarantee that $F(z)$ describes a one-to-one mapping; all that can be said is that, if $F'(z_0) \ne 0$, then $w = F(z)$ does define a one-to-one conformal mapping of some sufficiently small region containing $z_0$.

Geometrical Meaning of Conformality. Let $w = F(z)$ define a one-to-one conformal mapping of $D$ on $D_1$. Then each geometrical figure in $D$ will correspond to one in $D_1$ which is similar in a certain sense; if the first figure is bounded by smooth arcs, the second will be bounded by similar arcs, and corresponding pairs of arcs form the same angle (Fig. 2). The lines $x = \text{const.}$, $y = \text{const.}$ in $D$ form two families of curves meeting at right angles; hence these correspond to curves in $D_1$ formed of one family and of its family of orthogonal trajectories (Fig. 3). Similarly the curves $u = \text{const.}$ form orthogonal trajectories of the curves $v = \text{const.}$

FIG. 2. Behavior of mapping in the interior and on the boundary.

On the boundary of $D$ conformality may break down. In general there is some sort of continuous correspondence between boundary points of $D$ and those of $D_1$. If $D$ and $D_1$ are each bounded by several simple closed curves, and $F$ is one-to-one, then the mapping $F$ and its inverse can indeed be extended continuously to the boundaries. Commonly there are points at which conformality is violated in that two boundary arcs of $D$ meeting at angle $\alpha$ correspond to boundary arcs of $D_1$ meeting at angle $\beta \ne \alpha$; in particular this can mean a folding together of the boundary, as suggested in Fig. 2.

As in Chap. 7, Sect. 5, one can adjoin the number $\infty$ to the complex plane to form the extended plane. The mapping $w = F(z)$ is said to be conformal in a region containing $z = \infty$ if $F(1/z)$ is conformal in a region containing $z = 0$.
Similarly, one can discuss conformality in a neighborhood of a point $z_0$ at which $F(z_0) = \infty$, so that $F(z)$ has a pole, in terms of the conformality of $1/F(z)$ near $z_0$.

FIG. 3. Level curves of x and y: (a) z-plane, (b) w-plane.

Conformal Equivalence. Two regions $D$, $D_1$ are said to be conformally equivalent if there is a one-to-one conformal mapping $w = F(z)$ of $D$ on $D_1$ (so that the inverse function maps $D_1$ conformally on $D$). Conformally equivalent regions must have the same connectivity; i.e., if $D$ is simply connected, then so is $D_1$; if $D$ is doubly connected, so is $D_1$. However, having the same connectivity does not guarantee conformal equivalence.

If $D$ is simply connected, then $D$ is conformally equivalent to one and only one of the following three: (a) the interior of a circle; (b) the finite plane; (c) the extended plane. In particular, one has the following theorem.

THEOREM 2 (RIEMANN MAPPING THEOREM). Let $D$ be a simply connected region of the finite $z$-plane, not the whole finite plane. Let $z_0$ be a point of $D$, and let $\alpha$ be a given real number. Then there exists a one-to-one conformal mapping $w = F(z)$ of $D$ onto the circle $|w| < 1$ such that $F(z_0) = 0$ and $\arg F'(z_0) = \alpha$. Furthermore, $F(z)$ is uniquely determined.

From this theorem it follows that the one-to-one conformal transformations of $D$ onto $|w| < 1$ depend on three real parameters: $x_0 = \operatorname{Re}(z_0)$, $y_0 = \operatorname{Im}(z_0)$, and $\alpha$. These parameters can be chosen in other ways. For example, three boundary points of $D$ can be made to correspond to three points on $|w| = 1$ (in the same "cyclic order").

2. LINEAR FRACTIONAL TRANSFORMATIONS

Each function

$$w = \frac{az + b}{cz + d}, \qquad \begin{vmatrix} a & b \\ c & d \end{vmatrix} \ne 0, \tag{5}$$

where $a$, $b$, $c$, $d$ are complex constants, defines a linear fractional transformation. Each such transformation is a one-to-one conformal mapping of the extended $z$-plane onto the extended $w$-plane. Special cases of eq. (5) are the following:

Translations. The general form is

$$w = z + b. \tag{6}$$
Each point $z$ is displaced through the vector $b$.

Rotation Stretchings. The general form is

$$w = Ae^{i\alpha}z \qquad (A > 0,\ \alpha \text{ real}). \tag{7}$$

The value of $w$ is obtained by rotating $z$ about the origin through angle $\alpha$ and then increasing or decreasing the distance from the origin in the ratio $A$ to 1.

Linear Integral Transformations.

$$w = az + b. \tag{8}$$

Each transformation (8) is equivalent to a rotation stretching followed by a translation.

Reciprocal Transformation.

$$w = \frac{1}{z}. \tag{9}$$

Here $|w| = 1/|z|$ and $\arg w = -\arg z$. Hence $w$ is obtained from $z$ by "inversion" in the circle $|z| = 1$ followed by reflection in the $x$-axis (Fig. 4).

FIG. 4. The transformation w = 1/z.

Important Conformal Mappings. The general linear fractional transformation (5) can be composed of a succession of transformations of the special types:

$$w = \frac{a}{c} + \frac{bc - ad}{c}\,\zeta, \qquad \zeta = \frac{1}{Z}, \qquad Z = cz + d. \tag{10}$$

If one includes straight lines as "circles through $\infty$," then each transformation (5) maps each circle onto a circle. By considering special regions bounded by circles and lines one obtains a variety of important conformal mappings, as illustrated in Table 1. The first three entries in the table depend on 3 real parameters and provide all conformal mappings of $D$ on $D_1$ in each case.

TABLE 1. IMPORTANT CONFORMAL MAPPINGS

$F(z)$ | $D$ | $D_1$
$e^{i\alpha}\dfrac{z - z_0}{1 - \bar{z}_0 z}$, $\alpha$ real, $\lvert z_0\rvert < 1$ | $\lvert z\rvert < 1$ | $\lvert w\rvert < 1$
$\dfrac{az + b}{cz + d}$, $a, b, c, d$ real, $ad - bc > 0$ | $\operatorname{Im}(z) > 0$ | $\operatorname{Im}(w) > 0$
$e^{i\alpha}\dfrac{z - z_0}{z - \bar{z}_0}$, $\alpha$ real, $\operatorname{Im}(z_0) > 0$ | $\operatorname{Im}(z) > 0$ | $\lvert w\rvert < 1$
$\dfrac{1}{z}$ | region between the circles $\lvert z - a\rvert = a$, $\lvert z - b\rvert = b$, $0 < a < b$ | $\dfrac{1}{2b} < \operatorname{Re}(w) < \dfrac{1}{2a}$

3. MAPPING BY ELEMENTARY FUNCTIONS

The Function $w = z^2$. Let $D$ be the half-plane $\operatorname{Im}(z) > 0$. The corresponding region $D_1$ consists of the $w$-plane minus the ray: $u \ge 0$, $v = 0$.
The points $(x, 0)$ on the boundary of $D$ correspond to the points $(u, 0)$ on the boundary of $D_1$, both $(x, 0)$ and $(-x, 0)$ corresponding to $(u, 0)$, with $u = x^2$. It should be noted that $F'(z) = 2z$ is 0 at $z = 0$, so that this point is critical; conformality fails here, and in fact the edges of $D$, forming a 180° angle at $z = 0$, are transformed onto overlapping edges of $D_1$ which form a 360° angle.

For $w = z^2$ one can also choose $D$ as a sector $\alpha < \arg z < \beta$, provided $\beta - \alpha < \pi$; the region $D_1$ is the sector $2\alpha < \arg w < 2\beta$. A third choice of $D$ is a hyperbolic region: $xy > 1$, $x > 0$; $D_1$ is then a half-plane, $v > 2$. A fourth choice of $D$ is a strip: $a < x < b$, where $a > 0$; $D_1$ is then a region bounded by two parabolas: $4a^2u + v^2 = 4a^4$, $4b^2u + v^2 = 4b^4$.

The Function $w = z^n$. Analogous choices of regions can be made for $w = z^n$ ($n = 2, 3, 4, \ldots$). The sector $D$: $\alpha < \arg z < \beta$, with $\beta - \alpha < 2\pi/n$, corresponds to the sector $D_1$: $n\alpha < \arg w < n\beta$. If $n$ is allowed to be fractional or irrational, $w = z^n$ becomes a multiple-valued analytic function (Chap. 7, Sects. 6 and 7) and one must select analytic branches. For such a branch the mapping of sectors is similar to that when $n$ is an integer.

The General Polynomial $w = a_0z^n + \cdots + a_{n-1}z + a_n$. Suitable regions can be obtained by means of the level curves of $u = \operatorname{Re}(w)$ and $v = \operatorname{Im}(w)$. In particular the level curves of $u$ and $v$ which pass through the critical points of $F(z)$ divide the $z$-plane into open regions, each of which is mapped in one-to-one fashion on a region of the $w$-plane. This is illustrated in Fig. 5 for $w = z^3 - 3z + 3$. The critical points are at $z = \pm 1$, at which $v = 0$. The level curve $v = 0$ divides the $z$-plane into six regions, in each of which $w = F(z)$ describes a one-to-one conformal mapping of the region onto a half-plane.

FIG. 5. Mapping by $w = z^3 - 3z + 3$.

Adjacent regions, such as I and IV, can be
merged along their common boundary to yield a region mapped by $w = F(z)$ on the $w$-plane minus a single line.

The Exponential Function $w = e^z$. This maps each infinite strip $a < y < b$ conformally onto a sector $a < \arg w < b$, provided $b - a \le 2\pi$; in particular each rectangle: $c < x < d$, $a < y < b$ in the strip corresponds to the part of the sector lying between the circles $|w| = e^c$ and $|w| = e^d$. Similarly, the inverse of the exponential function, $w = \log z$, maps a sector on an infinite strip. When $b - a = \pi/2$, the sector is a quadrant; when $b - a = \pi$, the sector is a half-plane.

The Trigonometric Function $w = \sin z$. This maps the infinite strip $-\pi/2 < x < \pi/2$ on the finite $w$-plane minus the portion $|\operatorname{Re}(w)| \ge 1$ of the real axis.

The Rational Function $w = z + (1/z) = (z^2 + 1)/z$. This maps the exterior of the circle $|z| = 1$ on the $w$-plane minus a slit from $-2$ to $+2$. The same function maps the upper half-plane $\operatorname{Im}(z) > 0$ on the $w$-plane minus the portion $|\operatorname{Re}(w)| \ge 2$ of the real axis.

Let the real constants $h_1, \ldots, h_{n+1}$, $x_1, \ldots, x_n$ satisfy the conditions

$$x_1 < x_2 < \cdots < x_n, \qquad \ldots \quad \text{for some } m,\ 1 \le m \le n + 1. \tag{12}$$

Then

$$f(z) = h_1\log(z - x_1) - h_{n+1}\log(z - x_n) + \sum_{k=2}^{n} h_k \log\frac{z - x_k}{z - x_{k-1}} \tag{13}$$

maps the half-plane $\operatorname{Im}(z) > 0$ one-to-one conformally on a region $D_1$ consisting of a strip between two lines $v = \text{const.}$ minus several rays of form $v = \text{const.}$ If the strip $D_1$ has width $\le 2\pi$, the function $F(z) = \exp[f(z)]$ maps the upper half-plane conformally and one-to-one on a sector minus certain rays and segments on which $\arg w = \text{const.}$ (See Chap. 7, Ref. 7, pp. 605-606.)

4. SCHWARZ-CHRISTOFFEL MAPPINGS

These are defined by the equation

$$w = f(z) = A\int_{x_0}^{z} \frac{dz}{(z - x_1)^{k_1}\cdots(z - x_n)^{k_n}} + B, \tag{14}$$

where $A$, $B$ are complex constants, $x_0, x_1, \ldots, x_n$, $k_1, \ldots, k_n$ are real constants, and $-1 \le k_j \le 1$. The function $f(z)$ is analytic for $\operatorname{Im}(z) > 0$, with $(z - x_j)^{k_j}$ interpreted as the principal value: $\exp[k_j \log(z - x_j)]$.
Every one-to-one conformal mapping of the half-plane $D$ onto the interior of a polygon can be represented in the form (14); this applies more generally to every one-to-one conformal mapping of the half-plane onto a simply connected region whose boundary consists of a finite number of lines, line segments, and rays.

Polygon. When the function maps $D$ onto a polygon, the points $x_1, \ldots, x_n$ (and possibly $\infty$) on the $x$-axis correspond to vertices of the polygon, and the corresponding exterior angles are $k_1\pi, \ldots, k_n\pi$. If there is an $(n + 1)$st vertex, corresponding to $z = \infty$, then necessarily $k_1 + \cdots + k_n \ne 2$; in general, $1 < k_1 + \cdots + k_n < 3$.

Convex Polygon. When the function (14) maps $D$ onto a convex polygon, all exterior angles are between 0 and $\pi$ and the sum of the exterior angles is $2\pi$; accordingly,

$$0 < k_j < 1 \quad \text{and} \quad k_1 + \cdots + k_n \le 2. \tag{15}$$

When $k_1 + \cdots + k_n < 2$, there is an $(n + 1)$st vertex corresponding to $z = \infty$. In general, for every choice of the numbers $k_1, \ldots, k_n$ such that (15) holds, eq. (14) describes a one-to-one conformal mapping of the half-plane $\operatorname{Im}(z) > 0$ onto the interior of a convex polygon.

Rectangle. For the special case

$$w = f(z) = \int_0^z \frac{dz}{\sqrt{(1 - z^2)(1 - k^2z^2)}}, \qquad 0 < k < 1, \tag{16}$$

the mapping is onto a rectangle with vertices $\pm K$, $\pm K + iK'$, where

$$K = \int_0^1 \frac{dx}{\sqrt{(1 - x^2)(1 - k^2x^2)}}, \qquad K' = \int_1^{1/k} \frac{dx}{\sqrt{(x^2 - 1)(1 - k^2x^2)}}. \tag{17}$$

In this case $f(z)$ is an elliptic integral of the first kind (Chap. 7, Sect. 8), and its inverse is the elliptic function $z = \operatorname{sn} w$.

A great variety of conformal mappings have been studied and classified. See Ref. 1 for an extensive survey.

5. APPLICATION OF CONFORMAL MAPPING TO BOUNDARY VALUE PROBLEMS

The applications depend primarily on the following formal rule. If $V(x, y)$ is given in a region $D$ and $w = f(z)$ is a one-to-one conformal mapping of $D$ on a region $D_1$, then

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = |f'(z)|^2\left(\frac{\partial^2 V}{\partial u^2} + \frac{\partial^2 V}{\partial v^2}\right). \tag{18}$$

In particular, $V$ is harmonic in terms of $x$ and $y$:

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = 0, \tag{19}$$

if and only if $V$ is harmonic when expressed in terms of $u$ and $v$.
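The change-of-variable rule can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it uses the map $w = z^2$, a five-point finite-difference Laplacian with step `h`, and two sample functions chosen here for convenience, $V = e^u\cos v$ (harmonic in $u$, $v$) and $V = u^2 + v^2$ (Laplacian 4 in $u$, $v$). It checks the standard identity that the Laplacian in $x$, $y$ equals $|f'(z)|^2$ times the Laplacian in $u$, $v$.

```python
import math

def laplacian(U, x, y, h=1e-3):
    """Five-point finite-difference approximation of U_xx + U_yy at (x, y)."""
    return (U(x + h, y) + U(x - h, y) + U(x, y + h) + U(x, y - h)
            - 4.0 * U(x, y)) / h ** 2

def conformal_map(x, y):
    # Sample conformal map w = z**2 (conformal wherever z != 0).
    w = complex(x, y) ** 2
    return w.real, w.imag

def V_harmonic(u, v):
    return math.exp(u) * math.cos(v)      # harmonic in (u, v)

def V_other(u, v):
    return u * u + v * v                  # Laplacian in (u, v) equals 4

def pull_back(V):
    """Express V in terms of (x, y) through the map w = f(z)."""
    return lambda x, y: V(*conformal_map(x, y))

x0, y0 = 0.7, 0.4
lap_harmonic = laplacian(pull_back(V_harmonic), x0, y0)
lap_other = laplacian(pull_back(V_other), x0, y0)

# |f'(z)|**2 * 4 with f'(z) = 2z
expected_other = abs(2 * complex(x0, y0)) ** 2 * 4.0

print(lap_harmonic, lap_other, expected_other)
```

The first value is zero up to discretization error, so harmonicity survives the change of variable; the second agrees with $|f'(z)|^2$ times the Laplacian in the $w$-plane.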
The boundary value problems considered require determination of $U$ in $D$ when $U$ is required to satisfy some conditions on the boundary of $D$ and to satisfy an equation

$$\frac{\partial^2 U}{\partial x^2} + \frac{\partial^2 U}{\partial y^2} = h(x, y), \tag{20}$$

for given $h(x, y)$, in $D$. It follows from eq. (18) that a conformal mapping $w = f(z)$ amounts to a change of variable reducing the problem to one of similar form in the region $D_1$. It is in general simpler to solve the problem for a special region such as a circle or a half-plane. Hence one tries to find a conformal mapping of $D$ onto such a special region $D_1$. Once the problem has been solved for $U$ in $D_1$, $U$ can be expressed in terms of $(x, y)$ in $D$ and the problem has been solved for $D$.

For most cases $D$ has a boundary $B$ consisting of a finite number of smooth closed curves $C_1, \ldots, C_n$, the case $n = 1$ being most common. The most important boundary value problems are then the following.

I. Dirichlet Problem. The values of $U$ on $B$ are given; $U$ is required to be harmonic in $D$ and to approach these values as limits as $z$ approaches the boundary.

II. Neumann Problem. Again $U$ is harmonic in $D$, but on $B$ the values of $\partial U/\partial n$ are given, where $n$ is an exterior normal vector on $B$.

Both problems can be generalized by requiring that $U$ satisfy a Poisson eq. (20) in $D$. In general this case can be reduced to the previous one by introducing a new variable $W$ (eq. (21)). Furthermore, the Neumann problem can be reduced to the Dirichlet problem by consideration of the harmonic function $V(x, y)$ conjugate to $U$ (Chap. 7, Sect. 2).

To solve the Dirichlet problem for a simply connected region $D$, one seeks a one-to-one conformal mapping of $D$ on the circular region $|w| < 1$. This reduces the problem to a Dirichlet problem for the circular region. If $p(u, v)$ are the new boundary values, its solution is given by

$$U = \frac{1}{2\pi}\int_0^{2\pi} p(\cos\varphi, \sin\varphi)\,\frac{1 - r^2}{1 + r^2 - 2r\cos(\varphi - \theta)}\,d\varphi, \tag{22}$$

where $r$, $\theta$ are polar coordinates in the $uv$-plane.
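The Poisson integral for the unit circle, with kernel $(1 - r^2)/(1 + r^2 - 2r\cos(\varphi - \theta))$, lends itself to a direct numerical check. The following is a minimal sketch under stated assumptions: the function name `poisson_disk`, the equally spaced quadrature with `n` nodes, and the sample boundary values $p(u, v) = u^2 - v^2$ are choices made here for illustration. Since $u^2 - v^2$ is itself harmonic, the integral should reproduce $r^2\cos 2\theta$ inside the circle.

```python
import math

def poisson_disk(p, r, theta, n=2000):
    """Approximate the Poisson integral for |w| < 1 at the point (r, theta).

    p(u, v) gives the boundary values on |w| = 1; the integral over phi is
    replaced by an equally spaced sum with n nodes (an assumption of this
    sketch; any quadrature rule could be used instead).
    """
    total = 0.0
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        kernel = (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(phi - theta))
        total += p(math.cos(phi), math.sin(phi)) * kernel
    # (1 / 2 pi) * (2 pi / n) * sum  ==  sum / n
    return total / n

r, theta = 0.5, 0.8
value = poisson_disk(lambda u, v: u * u - v * v, r, theta)
exact = r * r * math.cos(2.0 * theta)     # harmonic extension of u**2 - v**2
constant = poisson_disk(lambda u, v: 1.0, r, theta)

print(value, exact, constant)
```

Constant boundary values return the same constant, which is a quick sanity check on the normalization factor $1/2\pi$.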
If $D$ is multiply connected, it is also possible to map $D$ conformally on a standard type of domain, for which solution of the Dirichlet problem is known. For details, see Ref. 2.

REFERENCES

1. H. Kober, Dictionary of Conformal Representations, Dover, New York, 1952.
2. Z. Nehari, Conformal Mapping, McGraw-Hill, New York, 1952.

See also the list at the end of Chap. 7.

A. GENERAL MATHEMATICS

Chapter 11

Boolean Algebra

A. H. Copeland, Sr.

1. Table of Notations
2. Definitions of Boolean Algebra
3. Boolean Algebra and Logic
4. Canonical Form of Boolean Functions
5. Stone Representation
6. Sheffer Stroke Operation
References

1. TABLE OF NOTATIONS

Table 1 lists notations in current use. There are some inconsistencies between different systems, and care is needed to ensure proper understanding. The list is not exhaustive and there are other notations even for the crucial relations; for example, "a and b" is sometimes denoted by "ab." The grouping under mathematics, engineering, and logic is somewhat arbitrary.

TABLE 1. TABLE OF SYMBOLS, BOOLEAN ALGEBRA

Operation name, mathematics (set theory) | Operation name, engineering | Operation name, logic | Symbols in use
Union | "or" | "or" | $\cup$, $+$, $\vee$
Intersection | "and" | "and" | $\cap$, $\wedge$, none (juxtaposition)
Symmetric difference | Exclusive "or" | "or" | $\oplus$
Complement | Complement | Negation | $'$, $\sim$
Order | Order | Material implication | $\le$, $\subset$, $\supset$, $\Rightarrow$
Sheffer stroke | Sheffer stroke | Sheffer stroke | $\mid$
Existential quantifier | Existential quantifier | Existential quantifier | $\exists$, $\bigcup$, $\Sigma$
Universal quantifier | Universal quantifier | Universal quantifier | $\forall$, $\bigcap$, $\Pi$

2. DEFINITIONS OF BOOLEAN ALGEBRA

First Definition. A study of the rules governing the operations on sets (Chap. 1) leads to a type of algebraic system, in which the basic operations are $\cup$ and $\cap$, frequently called "or" and "and," corresponding to union and intersection of sets. In addition, the system can be partially ordered (the relation of set inclusion) and each object of the system has a complement.

A Boolean algebra $B$ is a set of elements $x, y, z, \ldots$ with two binary operations $\cup$ and $\cap$, an order relation $\le$, and an operation $'$ of forming the complement such that:

(1) $x \cup x = x$, $\quad x \cap x = x$,
(2) $x \cup y = y \cup x$, $\quad x \cap y = y \cap x$,
(3) $x \cap (y \cap z) = (x \cap y) \cap z$, $\quad x \cup (y \cup z) = (x \cup y) \cup z$,
(4) $x \cap (y \cup z) = (x \cap y) \cup (x \cap z)$, $\quad x \cup (y \cap z) = (x \cup y) \cap (x \cup z)$,
(5) $x \le x$,
(6) $x \le y$ and $y \le z$ imply $x \le z$,
(7) $x \le y$ and $y \le x$ imply $x = y$,
(8) $B$ contains two elements 0 and 1 such that $0 \le x \le 1$ for all $x$ in $B$,
(9) $0 \cap x = 0$, $\quad 1 \cap x = x$,
(10) $0 \cup x = x$, $\quad 1 \cup x = 1$,
(11) $x \cap x' = 0$, $\quad x \cup x' = 1$,
(12) $(x \cap y)' = x' \cup y'$, $\quad (x \cup y)' = x' \cap y'$,
(13) $(x')' = x$.

The properties (1) to (13) can be regarded as a set of postulates, from which all other properties are to be deduced. Some of the postulates are consequences of others, so that the list could be considerably reduced (Refs. 1, 3, 5). The definition given here is easily verified to be equivalent to that given in Chap. 1, Sect. 7, in terms of lattices (Ref. 3).

Second Definition. An alternative definition is based upon the set operation of symmetric difference, also known as "exclusive or." The symmetric difference of two sets $X$, $Y$, denoted by $X \oplus Y$, is the set of all elements in $X$, or in $Y$, but not in both. In symbols,

$$X \oplus Y = \{s \mid s \in X \cup Y \text{ and } s \notin X \cap Y\}. \tag{14}$$

This is pictured in Fig. 1. From the definition, a number of properties can be verified. For example, $X \oplus X = \emptyset$ (here the empty set plays the role of the 0 of a Boolean algebra), and $(X \oplus Y) \oplus Z = X \oplus (Y \oplus Z)$. The proof of the second rule is suggested in Fig. 2.

FIG. 1. Symmetric difference.

FIG. 2. Three-term symmetric difference.
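The set-theoretic properties of the symmetric difference can be confirmed by exhaustive search over a small universe. The sketch below is illustrative only: the four-element universe and the helper names `subsets`, `sym_diff`, and `complement` are assumptions made here. It implements definition (14) literally and checks $X \oplus X = \emptyset$, associativity, and agreement with the expression $(X \cap Y') \cup (X' \cap Y)$.

```python
from itertools import combinations

UNIVERSE = frozenset(range(4))   # small sample universe for the check

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def sym_diff(x, y):
    # Definition (14): in the union of x and y but not in their intersection.
    return frozenset(s for s in x | y if s not in x & y)

def complement(x):
    return UNIVERSE - x

ALL = subsets(UNIVERSE)

self_inverse = all(sym_diff(x, x) == frozenset() for x in ALL)
associative = all(
    sym_diff(sym_diff(x, y), z) == sym_diff(x, sym_diff(y, z))
    for x in ALL for y in ALL for z in ALL
)
matches_complement_form = all(
    sym_diff(x, y) == (x & complement(y)) | (complement(x) & y)
    for x in ALL for y in ALL
)

print(self_inverse, associative, matches_complement_form)
```

A brute-force check over 16 subsets is of course not a proof for arbitrary sets, but it catches transcription errors in the identities quickly.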
In an arbitrary Boolean algebra one can define $x \oplus y$ in terms of the other operations:

$$x \oplus y = (x \cap y') \cup (x' \cap y). \tag{15}$$

From (1), ..., (13) and (15) a number of rules can then be deduced by algebraic means alone. It is possible to consider $\oplus$ and $\cap$ as the basic operations and express $\cup$, $'$, and $\le$ in terms of these two:

(16) $x \cup y = (x \oplus y) \oplus (x \cap y)$,
(17) $x' = 1 \oplus x$,
(18) $x \le y$ if $x \cap y = x$.

Pursuing this point of view further, one is led to a second definition of a Boolean algebra.

Alternative Definition. A Boolean algebra $B$ is a set of elements $x, y, z, \ldots$ with two binary operations $\oplus$, $\cap$ satisfying the laws:

(19) $x \oplus y = y \oplus x$, $\quad x \cap y = y \cap x$,
(20) $(x \oplus y) \oplus z = x \oplus (y \oplus z)$, $\quad (x \cap y) \cap z = x \cap (y \cap z)$,
(21) $x \cap (y \oplus z) = (x \cap y) \oplus (x \cap z)$,
(22) $x \cap x = x$,
(23) $B$ contains two elements 0 and 1 such that for all $x$ in $B$, $x \oplus 0 = x$, $x \oplus x = 0$, and $x \cap 1 = x$.

If the rules (19) to (23) are regarded as postulates and the relations (16), (17), (18) as definitions of $\cup$, $'$, and $\le$, then one can prove all the laws (1) to (13). Conversely, from (1) to (13) and the definition (15), one can prove (19) to (23). Hence the two definitions of a Boolean algebra are equivalent.

Relation to Set Theory. Although Boolean algebras arise naturally in set theory, that is not the only source of such systems. They arise in logic and in other mathematical contexts. It is natural to ask whether every Boolean algebra can be interpreted as an algebra of all subsets of a given set. This is not true as stated, but there is a close relationship between each Boolean algebra and an algebra of sets (Sect. 5).

EXAMPLE 1. A very simple but nevertheless useful Boolean algebra is one in which $B$ contains only 0 and 1. The properties are given in Tables 2 and 3.

TABLE 2. $x \cup y$

x \ y | 0 | 1
0 | 0 | 1
1 | 1 | 1

TABLE 3. $x \cap y$

x \ y | 0 | 1
0 | 0 | 0
1 | 0 | 1

This Boolean algebra is used in switching circuits: $x = 1$ means that a certain switch is closed and $x = 0$ means that the switch is open.
Two switches in parallel correspond to $x \cup y$; two switches in series correspond to $x \cap y$.

EXAMPLE 2. A somewhat more general Boolean algebra is used in the design of electronic digital computers. This can be described as follows. The elements of $B$ are all ordered $n$-tuples $x = (x_1, x_2, \ldots, x_n)$, where each $x_k$ is 0 or 1; if $y = (y_1, \ldots, y_n)$, then $x \cup y$, $x \cap y$ are defined as follows:

$$x \cup y = (x_1 \cup y_1, x_2 \cup y_2, \ldots, x_n \cup y_n), \qquad x \cap y = (x_1 \cap y_1, x_2 \cap y_2, \ldots, x_n \cap y_n),$$

where $x_k \cup y_k$, $x_k \cap y_k$ are evaluated as in Tables 2 and 3. The 0 and 1 of $B$ are defined as follows:

$$0 = (0, 0, \ldots, 0), \qquad 1 = (1, 1, \ldots, 1).$$

Electronic devices can be constructed to perform the Boolean operations on the $n$-tuples $x$. The operations of ordinary arithmetic can be defined in terms of the Boolean operations together with the operation of shifting the decimal point.

3. BOOLEAN ALGEBRA AND LOGIC

Algebra of Sentences. Let $x, y, \ldots$ stand for declaratory sentences. For example, $x$ might stand for "greed is evil" and $y$ for "lead is heavy." From two sentences $x$, $y$ one can form the new sentence "$x$ and $y$"; this is denoted by $x \cap y$. In the example given, $x \cap y$ is the sentence "greed is evil and lead is heavy." From $x$ and $y$ one can also form the sentence "$x$ or $y$"; this is understood to mean: $x$ or $y$, but not both; the new sentence is denoted by $x \oplus y$. One can also form the statement "$x$ and/or $y$," meaning: $x$ or $y$ or both; this is denoted by $x \cup y$. Finally, one can form the negation of a sentence $x$: "lead is heavy" when negated becomes "lead is not heavy." The negation of $x$ is denoted by $x'$.

One can now verify that in the normal logical procedures for manipulating sentences, the operations $\cup$, $\cap$, $\oplus$, $'$ obey all the rules of a Boolean algebra. Two sentences are regarded as equal if they are logically equivalent. In this sense, all false sentences can be considered equal and identified with the 0 of the Boolean algebra; a universal truth ("tautology") can serve as the 1.
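The two-element algebra, read either as switch states or as false and true, gives a convenient place to test the rules. The following is a small sketch, not taken from the original: it treats $\oplus$ and $\cap$ as the basic operations (Python's `^` and `&` on the integers 0 and 1), defines union, complement, and order from them as in relations (16) to (18), and verifies laws (19) to (23) exhaustively. The function names and the 4-bit sample words are assumptions chosen for illustration.

```python
from itertools import product

def xor(x, y):
    return x ^ y                          # symmetric difference

def meet(x, y):
    return x & y                          # intersection ("and")

def join(x, y):
    return xor(xor(x, y), meet(x, y))     # relation (16)

def comp(x):
    return xor(1, x)                      # relation (17)

def leq(x, y):
    return meet(x, y) == x                # relation (18)

B = (0, 1)

ring_laws_hold = all(
    xor(x, y) == xor(y, x) and meet(x, y) == meet(y, x)               # (19)
    and xor(xor(x, y), z) == xor(x, xor(y, z))                        # (20)
    and meet(meet(x, y), z) == meet(x, meet(y, z))                    # (20)
    and meet(x, xor(y, z)) == xor(meet(x, y), meet(x, z))             # (21)
    and meet(x, x) == x                                               # (22)
    and xor(x, 0) == x and xor(x, x) == 0 and meet(x, 1) == x         # (23)
    for x, y, z in product(B, repeat=3)
)

join_table = {(x, y): join(x, y) for x, y in product(B, repeat=2)}
de_morgan = all(comp(join(x, y)) == meet(comp(x), comp(y))
                for x, y in product(B, repeat=2))

# n-tuples of 0's and 1's stored as machine words: the operations act
# componentwise (here on 4-bit sample words).
word_x, word_y = 0b1100, 0b1010
word_union, word_intersection = word_x | word_y, word_x & word_y

print(ring_laws_hold, join_table, de_morgan)
```

The derived `join_table` reproduces the parallel-switch table for $x \cup y$, and the bitwise operators show the same rules applied to each component of an $n$-tuple at once.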
In logic a table showing the Boolean algebra relationship of variables is called a truth table.

The order relation $x \le y$ can be interpreted to mean: "$x$ implies $y$." For example, if $x$ is the sentence "$t$ is an even integer" and $y$ is the sentence "$2t$ is an even integer," then $x \le y$, but $y \le x$ is false, so that $x < y$. The implication defined here is essentially strict implication (see below) (Refs. 4, 5).

Propositional Functions. The sentence "$t$ is an even integer" contains a variable, $t$. Accordingly, the sentence can be regarded as a function of $t$. For each value of $t$, the function becomes a definite sentence or proposition. Hence the function is termed a propositional function. It can be denoted by $f$, with $f(t)$ denoting the value for each $t$. For example, $f(4)$ is the true sentence "4 is an even integer," while $f(3)$ is the false sentence "3 is an even integer." The $t$'s for which $f(t)$ is true form a set. Similarly, the sentence "$t$ is a human being" is a propositional function which in turn determines a set; namely, the set of all human beings.

If $f, g, \ldots$ are propositional functions, then one can form new propositional functions $f \cup g$, $f \cap g$, $f \oplus g$, $f'$, ... as above. If $X_f, X_g, \ldots$ are the sets corresponding to these propositional functions, then the operations on the functions correspond precisely to the operations $\cup$, $\cap$, $\oplus$, $'$ on sets. For example, $f \cap g$ is true when $f$ and $g$ are true; therefore an object belongs to $X_{f \cap g}$ when it belongs to $X_f$ and to $X_g$, that is, to $X_f \cap X_g$. Thus the calculus of propositional functions can be interpreted as a Boolean algebra of sets. The zero element represents a propositional function which is false for all values of the variable; the set 1 corresponds to a function which is true for all values.

Conversely, each set $X$ gives rise to a propositional function: $t$ is an element of $X$. This function is true precisely when $t$ belongs to $X$.
A Boolean algebra of sets thus leads to a Boolean algebra of propositional functions.

Because of the parallel between propositional functions and sets, one can employ geometric set diagrams, as in Figs. 1 and 2, to reason about propositional functions. In logic they are called "Venn diagrams."

Quantifiers. The operation of forming the intersection of many sets has an analogue for sentences or propositional functions. As for sets (Chap. 1, Sect. 1), $\bigcap_{t=1}^{n} x_t$ denotes $x_1 \cap x_2 \cap \cdots \cap x_n$. When the $x$'s are sentences, this is the new sentence: "every one of the $x$'s" or "for every $t$, $x_t$." The range of $t$ may be over an infinite set. When the range is understood, one writes simply $\bigcap_t x_t$. Similarly, if $f(t)$ is a propositional function and $t$ ranges over all values for which $f(t)$ has meaning, then $\bigcap_t f(t)$ is read: "for every $t$, $f(t)$." Alternative notations for $\bigcap_t f(t)$ are $\forall_t f(t)$ and $\Pi_t f(t)$. One terms $\bigcap_t$ a quantifier.

There is an analogous interpretation of $\bigcup_t x_t$ and $\bigcup_t f(t)$; the first is read "for some $t$, $x_t$" and the second "there exists a $t$ such that $f(t)$." An alternative notation for $\bigcup_t$ is $\exists_t$; $\bigcup_t$ is also called a quantifier.

Implication. The statement "$x$ implies $y$" is capable of various interpretations, of which three will be discussed here: material implication, conditional implication, and strict implication. Throughout, $x, y, \ldots$ denote sentences forming a Boolean algebra $B$.

Material implication. From the sentences $x$, $y$ one forms the new sentence "$x$ implies $y$" as the sentence $x' \cup y$. This is called material implication. One often writes $x \supset y$ or $x \Rightarrow y$ for this implication:

$$x \supset y = x' \cup y = x \Rightarrow y. \tag{24}$$

If $x$ and $y$ are propositional functions $x(t)$, $y(t)$, then they can be represented as sets $X$, $Y$. The sentence is then a propositional function which is true for all $t$ if $X' \cup Y = 1$; that is, if $X \subset Y$. The notation $x \supset y$ is therefore unfortunate.
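Material implication and its set interpretation can be tabulated directly. The sketch below is an illustration only: the helper name `implies`, the finite sample domain of integers, and the two sample propositional functions ("$t$ is an even integer" and "$2t$ is an even integer") are assumptions chosen here. It builds the truth table of $x' \cup y$ and confirms that the implication holds for every $t$ exactly when the set of the first function is contained in the set of the second.

```python
def implies(x, y):
    # Material implication: "x implies y" is the sentence (not x) or y.
    return (not x) or y

truth_table = [(x, y, implies(x, y))
               for x in (False, True) for y in (False, True)]

# Propositional functions on a finite sample domain (an assumption of this
# sketch; the text allows arbitrary ranges for t).
domain = range(-10, 11)
X = {t for t in domain if t % 2 == 0}          # "t is an even integer"
Y = {t for t in domain if (2 * t) % 2 == 0}    # "2t is an even integer"

forward_always_true = all(implies(t in X, t in Y) for t in domain)
converse_always_true = all(implies(t in Y, t in X) for t in domain)

print(truth_table)
print(forward_always_true, X <= Y)     # both report containment of X in Y
print(converse_always_true, Y <= X)    # both report failure of the converse
```

The only row of the truth table that is false is the one with a true antecedent and a false consequent.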
Material implication is the basis for most mathematical arguments, but it is criticized as permitting such statements as "if Iceland is an island, then fish can swim" to be judged true.

Conditional implication. For each pair of sentences $x$, $y$ a new sentence $y/x$ is formed and is read "if $x$ then $y$" or "$y$ if $x$." It will be assumed that $x \ne 0$. This is called conditional implication. The significance of the new sentence is indicated by certain postulates:

(25) $x/x = 1$,
(26) $y/x = 0$ implies $y \cap x = 0$,
(27) $(y \cap z)/x = (y/x) \cap (z/x)$,
(28) $z/(x \cap y) = (z/x)/(y/x)$,
(29) $(1 \oplus y)/x = 1 \oplus (y/x)$,
(30) for every $x$, $y$ there is a $z$ such that $z/x = y$, if $x \ne 0$.

Conditional implication is designed to fit the needs of the theory of probability. When $x$ is false, it may happen that $y/x$ is true or that $y/x$ is neither true nor false. One can verify that postulates (25) to (28) are satisfied by material implication, but that (29), (30) are not. However, (29) is a reasonable demand to make on an implication, and it is valuable in the theory of probability. Postulate (30) requires that $B$ contain sufficiently many sentences so that one can always solve the equation $z/x = y$ for the sentence $z$. A Boolean algebra which has an operation $y/x$ satisfying postulates (25) to (30) is called an implicative Boolean algebra. It can be shown that an implicative Boolean algebra cannot be atomic (Sect. 4) but that one can always construct an implicative Boolean algebra containing any given Boolean algebra.

Strict implication is defined as follows. The strict implication "$x$ implies $y$" holds if and only if the material implication is a tautology (i.e., $x \supset y = 1$), and this is true if and only if $y/x = 1$. When $x$ and $y$ are interpreted as sets, the equation $x \supset y = 1$ can be interpreted as stating that $x$ is contained in $y$. This relation has the following alternative notations:

$$x \le y, \quad y \ge x, \quad x \subseteq y, \quad y \supseteq x, \quad x \subset y, \quad y \supset x.$$
The last two notations are unfortunate since they almost reverse the interpretation of the implication symbol.

4. CANONICAL FORM OF BOOLEAN FUNCTIONS

Let a Boolean algebra $B$ be given, with operations $\cup$, $\cap$, and $'$ as in the first definition of Sect. 2. By a Boolean function or Boolean polynomial in $n$ variables $x_1, \ldots, x_n$ is meant an expression constructed from the $n$ variable elements $x_1, \ldots, x_n$ by the three operations $\cup$, $\cap$, $'$. For example, $(x \cup y) \cap (x' \cup z')$ is a Boolean polynomial in three variables. It would appear at first that such expressions can be made arbitrarily long and hence that, for fixed $n$, there are infinitely many polynomials. However, by the rules of the algebra, each polynomial can be simplified, and there are precisely $2^{2^n}$ polynomials for each $n$. For example, there are four polynomials in one variable $x$: $x$, $x'$, $0 = x \cap x'$, $1 = x \cup x'$.

If two Boolean polynomials in $x_1, \ldots, x_n$ are given, one may wish to determine whether they are the same; that is, whether one can be reduced to the other by applying the algebraic rules. In order to decide this, one reduces both polynomials to a canonical form, as described below. If both have the same canonical form, they are the same; otherwise, they are unequal polynomials.

Definitions of Canonical Form and Minimal Polynomials. By a minimal polynomial in $x_1, \ldots, x_n$ is meant an intersection of $n$ letters in which the $i$th letter is either $x_i$ or $x_i'$.

EXAMPLES. There are four minimal polynomials in $x$, $y$:

$$x \cap y, \quad x' \cap y, \quad x \cap y', \quad x' \cap y'.$$

Similarly, there are eight minimal polynomials in $x$, $y$, $z$:

$$x \cap y \cap z, \quad x \cap y \cap z', \quad x \cap y' \cap z, \quad x \cap y' \cap z', \quad x' \cap y \cap z, \quad x' \cap y \cap z', \quad x' \cap y' \cap z, \quad x' \cap y' \cap z'.$$

There are $2^n$ such minimal polynomials in $x_1, \ldots, x_n$.

By a polynomial in canonical form is meant a polynomial which is either 0 or else is a union of distinct minimal polynomials. (The order of the terms can be specified, but this is of no importance since $\cup$ is commutative.)
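A canonical form can also be found mechanically by evaluating the polynomial on every assignment of 0's and 1's and keeping one minimal polynomial per assignment that gives 1. This truth-table method is a swapped-in shortcut, not the algebraic reduction described in this section; it relies on the standard fact that two Boolean polynomials are equal exactly when they agree on the two-element algebra. The function names and the tuple representation of minimal polynomials are assumptions of this sketch, which is applied to the polynomial $[x \cap (y \cup z)] \cup [(x \cup y) \cap (y' \cup z)']$ treated in the worked example of this section.

```python
from itertools import product

def canonical_form(f, names):
    """Return the minimal polynomials whose union is the canonical form of f.

    f takes 0/1 arguments and returns 0 or 1. Each minimal polynomial is a
    tuple of letters such as ("x", "y'", "z"); an empty list stands for the
    polynomial 0.
    """
    terms = []
    for values in product((1, 0), repeat=len(names)):
        if f(*values):
            terms.append(tuple(n if v else n + "'"
                               for n, v in zip(names, values)))
    return terms

def example(x, y, z):
    # [x n (y u z)] u [(x u y) n (y' u z)'], with 1 - t playing the role of t'
    return (x & (y | z)) | ((x | y) & (1 - ((1 - y) | z)))

terms = canonical_form(example, ("x", "y", "z"))
for term in terms:
    print(" n ".join(term))
```

The four minimal polynomials printed are the same four that the algebraic reduction of this polynomial arrives at, apart from the order of the terms.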
The polynomials

$$(x \cap y) \cup (x' \cap y'), \qquad (x \cap y) \cup (x \cap y') \cup (x' \cap y')$$

are in canonical form. Every polynomial can be written in a unique canonical form, so that equality of two polynomials holds if and only if they have the same canonical form (Ref. 3).

Reduction to Canonical Form. A given polynomial can be reduced to canonical form by the following steps:

(i) Moving all primes inside parentheses by (12);
(ii) Moving all caps ($\cap$'s) to the inside of parentheses by the first rule (4);
(iii) Simplification of terms by rules (1), (2), (9), (10), (11), (13), so that one finally obtains a union of terms, each of which is a minimal polynomial in some of the $x$'s;
(iv) Adjoining missing $x$'s to the minimal polynomials by inserting $x \cup x' = 1$ for each such $x$;
(v) Applying steps (ii) and (iii) again.

EXAMPLE.

$$\begin{aligned}
&[x \cap (y \cup z)] \cup [(x \cup y) \cap (y' \cup z)'] \\
&\quad = [(x \cap y) \cup (x \cap z)] \cup [(x \cup y) \cap (y \cap z')] \\
&\quad = (x \cap y) \cup (x \cap z) \cup (x \cap y \cap z') \cup (y \cap y \cap z') \\
&\quad = [(x \cap y) \cap (z' \cup z)] \cup [(x \cap z) \cap (y' \cup y)] \cup (x \cap y \cap z') \cup [(y \cap z') \cap (x' \cup x)] \\
&\quad = (x \cap y \cap z') \cup (x \cap y \cap z) \cup (x \cap y' \cap z) \cup (x \cap y \cap z) \cup (x \cap y \cap z') \cup (x' \cap y \cap z') \cup (x \cap y \cap z') \\
&\quad = (x \cap y \cap z') \cup (x \cap y \cap z) \cup (x \cap y' \cap z) \cup (x' \cap y \cap z').
\end{aligned}$$

5. STONE REPRESENTATION

Let a Boolean algebra $B$ be given. Then it is possible to find a set $S$ and to define a one-to-one correspondence between the elements $x, y, \ldots$ of $B$ and certain subsets $X, Y, \ldots$ of $S$ in such a fashion that if $x$ corresponds to $X$ and $y$ to $Y$, then $x \cup y$ corresponds to $X \cup Y$, $x \cap y$ to $X \cap Y$, $x'$ to $X'$, 0 to the empty subset $\emptyset$, and 1 to $S$ itself. Thus every Boolean algebra can be represented as (is isomorphic to) the Boolean algebra of certain subsets of a set $S$. This is the Stone representation.

If $B$ has only a finite number $m$ of elements, then $B$ can always be represented as the Boolean algebra of all subsets of a given set $S$. Furthermore, $m$ must be of the form $2^n$, where $n$ is the number of elements in $S$.
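The finite case can be made concrete with a small power set. The following is a minimal sketch under stated assumptions: the three-element set `S` and the helper `is_atom` are illustrative choices, and an atom is taken here to be a nonzero element $a$ for which $x \cap a$ is always $a$ or 0. It builds the algebra of all subsets of $S$, counts its elements, and picks out its atoms.

```python
from itertools import combinations

S = ("p", "q", "r")                      # sample underlying set with n = 3
EMPTY = frozenset()

# The Boolean algebra B of all subsets of S.
B = [frozenset(c)
     for k in range(len(S) + 1)
     for c in combinations(S, k)]

def is_atom(a):
    # Nonzero element whose intersection with any x is either a itself or 0.
    return a != EMPTY and all((x & a) in (a, EMPTY) for x in B)

atoms = [a for a in B if is_atom(a)]

# Every element is the union of the atoms it contains.
rebuilt = all(
    x == frozenset().union(*[a for a in atoms if a <= x])
    for x in B
)

print(len(B), [sorted(a) for a in atoms], rebuilt)
```

The count is $2^3 = 8$, and the atoms are exactly the one-point subsets, which is the pattern the representation by all subsets of $S$ leads one to expect.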
If $B_1$ and $B_2$ are Boolean algebras both having m elements, where m is finite, then $B_1$ and $B_2$ are isomorphic.

STONE REPRESENTATION THEOREM. An infinite Boolean algebra B can be represented as the Boolean algebra of all subsets of a set S if and only if B is atomic, complete, and distributive.

These properties are defined as follows: An element a of B is called an atom if the intersection $x \cap a$ of a with an arbitrary element x of B is either a or 0. If, for each x other than 0 in B, there is an atom a such that $x \cap a = a$, then B is said to be atomic. In the representation of B as a class of sets, the atoms correspond to sets each containing one point. A Boolean algebra B is said to be complete if every subset A of B has a least upper bound (Chap. 1, Sect. 7). The least upper bound is then unique; it can be denoted by $\bigcup(A)$ or $\bigcup_{a \in A} a$ and is also called the union of A. A Boolean algebra B is said to be distributive if, whenever $\bigcup(A)$ exists,
$$(31) \qquad \beta \cap \bigcup(A) = \bigcup_{a \in A} (\beta \cap a)$$
for every $\beta$ in B.

6. SHEFFER STROKE OPERATION

In a Boolean algebra B let
$$(32) \qquad x \mid y = x' \cup y'.$$
If x and y are sentences, $x \mid y$ is the sentence "either not x or not y." One can then prove that
$$
\begin{aligned}
(33) \qquad & x \mid x = x' = 1 \oplus x, \\
(34) \qquad & (x \mid y) \mid (x \mid y) = x \cap y, \\
(35) \qquad & (x \mid x) \mid (y \mid y) = x \cup y, \\
(36) \qquad & x \mid (y \mid y) = x' \cup y, \\
(37) \qquad & x \mid (x \mid x) = 1.
\end{aligned}
$$
Accordingly, all the operations of the Boolean algebra can be expressed in terms of the Sheffer stroke operation. This proves to be of value in the design of electronic digital computing machines, which compute in the scale of two (see Ref. 6).

REFERENCES

1. Digital Computers and Data Processing, J. W. Carr and N. R. Scott, Editors, University of Michigan, Ann Arbor, 1955, especially Article III.4.1.
2. High-Speed Computing Devices, Engineering Research Associates, McGraw-Hill, New York, 1950.
3. G. Birkhoff, Lattice Theory, American Mathematical Society, New York, 1940.
4. I. M. Copi, Symbolic Logic, Macmillan, New York, 1951.
5. P. C.
Rosenbloom, The Elements of Mathematical Logic, Dover, New York, 1950.
6. M. Phister, Jr., Logical Design of Digital Computers, Wiley, New York, 1958.

A GENERAL MATHEMATICS

Chapter 12

Probability

A. H. Copeland, Sr.

1. Fundamental Concepts and Related Probabilities
2. Random Variables and Distribution Functions
3. Expected Value
4. Variance
5. Central Limit Theorem
6. Random Processes
References

1. FUNDAMENTAL CONCEPTS AND RELATED PROBABILITIES

Postulates. The probability that an event will occur is a real number between 0 and 1. If x denotes the sentence "the event will occur," then $\Pr(x)$ denotes the probability that the event will occur. Thus $\Pr(x)$ is the probability associated with the sentence x. Consider a Boolean algebra B of sentences (see Chap. 11) in which 0 is interpreted as the sentence associated with an impossible event and 1 is interpreted as the sentence associated with a certain event. This treatment will (a) show how some probabilities can be computed from others; (b) study the relations between probabilities of sentences connected by the words and, or, not, if (denoted respectively by $\cap$, $\cup$, $'$, $/$).

Assume that the following postulates hold:
(1) $\Pr(x)$ is a non-negative real number if x is in B.
(2) If $x_1, x_2, \ldots$ are in B and $x_i \cap x_j = 0$ when $i \neq j$, where $i, j = 1, 2, \ldots$, then $\bigcup_{k=1}^{\infty} x_k$ is in B and
$$\Pr\left(\bigcup_{k=1}^{\infty} x_k\right) = \sum_{k=1}^{\infty} \Pr(x_k).$$
(3) $\Pr(1) = 1$.
(4) $\Pr(x \cap y) = \Pr(x)\Pr(y/x)$.

If $x_i \cap x_j = 0$, i.e., if $x_i$, $x_j$ cannot both occur, then $x_i$, $x_j$ are said to be mutually exclusive and the events associated with them are also said to be mutually exclusive. Thus postulate (2) states that the probability that at least one of a set of mutually exclusive events will occur is the sum of their probabilities. The following theorems are consequences of the above postulates.

THEOREM 1. $0 \leq \Pr(x) \leq 1$.
THEOREM 2. $\Pr(0) = 0$.
THEOREM 3. $\Pr\left(\bigcup_{k=1}^{n} x_k\right) = \sum_{k=1}^{n} \Pr(x_k)$ if $x_i \cap x_j = 0$ whenever $i \neq j$.
THEOREM 4.
$\Pr(x \cup y) = \Pr(x) + \Pr(y) - \Pr(x \cap y)$.
THEOREM 5. $\Pr(x') = 1 - \Pr(x)$.
THEOREM 6. $\Pr(x \cap y') = \Pr(x) - \Pr(x \cap y)$.
THEOREM 7. If $x_1, x_2, \ldots, x_n$ are mutually exclusive (i.e., $x_i \cap x_j = 0$ when $i \neq j$) and exhaustive (i.e., $x_1 \cup x_2 \cup \cdots \cup x_n = 1$) and equally likely (i.e., all $\Pr(x_k)$ are equal) then $\Pr(x_k) = 1/n$ for $k = 1, 2, \ldots, n$.

EXAMPLE 1. As an illustration of Theorem 7 consider a coin which is about to be tossed, and let $x_1$ be the sentence "the coin will turn up heads" and $x_2$ be the sentence "the coin will turn up tails." If the coin is not loaded, one says that it is honest and assumes that the hypotheses of Theorem 7 hold. Then $\Pr(x_1) = \Pr(x_2) = \tfrac{1}{2}$.

EXAMPLE 2. Next consider an honest die which is about to be thrown and let $x_k$ be the sentence "the face numbered k will turn up" where $k = 1, 2, \ldots, 6$. Again one assumes that the hypotheses of Theorem 7 hold and concludes that $\Pr(x_k) = \tfrac{1}{6}$ for $k = 1, 2, \ldots, 6$. The probability that the die will turn up an odd number is given by Theorem 3. Thus
$$\Pr(x_1 \cup x_3 \cup x_5) = \Pr(x_1) + \Pr(x_3) + \Pr(x_5) = \tfrac{1}{2}.$$

EXAMPLE 3. Next let $x = x_1 \cup x_3 \cup x_5$, $y = x_1 \cup x_2 \cup x_3$ and note that $\Pr(x)$ is the probability that the die will turn up an odd number and $\Pr(y)$ is the probability that it will turn up a number less than 4. It will be instructive for the reader to check that
$$x \cap y = x_1 \cup x_3, \qquad x \cup y = x_1 \cup x_2 \cup x_3 \cup x_5$$
and also to check Theorems 4, 5, and 6 for this x and y. To compute the conditional probability $\Pr(y/x)$, i.e., the probability that the die will turn up a number less than 4 if it turns up an odd number, use postulate (4). Thus
$$\Pr(x \cap y) = \tfrac{1}{3} = \Pr(x)\Pr(y/x) = \tfrac{1}{2}\Pr(y/x),$$
and hence $\Pr(y/x) = \tfrac{2}{3}$.

EXAMPLE 4. Next consider three boxes and let $x_k$ denote the sentence "the kth box will be selected" where $k = 1, 2, 3$.
If one of the boxes is selected at random, this is interpreted to mean that the hypotheses of Theorem 7 hold and hence that
$$\Pr(x_1) = \Pr(x_2) = \Pr(x_3) = \tfrac{1}{3}.$$
Suppose further that the first box contains two silver coins, the second contains one silver coin and one gold coin, the third contains two gold coins, and that a coin is drawn at random from the box which has been selected. Let y denote the sentence "a gold coin will be drawn from the box which has been selected." Then
$$\Pr(y/x_1) = 0, \qquad \Pr(y/x_2) = \tfrac{1}{2}, \qquad \Pr(y/x_3) = 1.$$
Now suppose that this experiment has been performed and that the coin has been examined and found to be gold. On the basis of this information what is the probability that the coin came from the third box containing the two gold coins? One interprets the answer to this question as the conditional probability $\Pr(x_3/y)$, i.e., the probability that the third box was selected if the coin was observed to be gold. It will be instructive for the reader to verify that $\Pr(x_3/y) = \tfrac{2}{3}$ with the aid of the following theorem, which is called Bayes's theorem and which is a consequence of the above postulates.

THEOREM 8. BAYES'S THEOREM. If $x_1, x_2, \ldots, x_n$ are mutually exclusive, exhaustive and distinct from 0, then for any y one has
$$\Pr(x_i/y) = \frac{\Pr(x_i)\Pr(y/x_i)}{\sum_{k=1}^{n} \Pr(x_k)\Pr(y/x_k)}$$
if the denominator is not 0.

Independence. The sentences $x_1, x_2, \ldots, x_n$ are said to be independent if
$$\Pr(x_1 \cap x_2 \cap \cdots \cap x_n) = \Pr(x_1)\Pr(x_2)\cdots\Pr(x_n)$$
and if a similar equation holds for every subset of $x_1, x_2, \ldots, x_n$. Thus when $n = 3$ one has
$$
\begin{aligned}
\Pr(x_1 \cap x_2 \cap x_3) &= \Pr(x_1)\Pr(x_2)\Pr(x_3), \\
\Pr(x_1 \cap x_2) &= \Pr(x_1)\Pr(x_2), \\
\Pr(x_2 \cap x_3) &= \Pr(x_2)\Pr(x_3), \\
\Pr(x_1 \cap x_3) &= \Pr(x_1)\Pr(x_3).
\end{aligned}
$$
If $x_1$, $x_2$ are independent and $\Pr(x_1) \neq 0$, $\Pr(x_2) \neq 0$ then
$$\Pr(x_1 \cap x_2) = \Pr(x_1)\Pr(x_2) = \Pr(x_1)\Pr(x_2/x_1) = \Pr(x_2)\Pr(x_1/x_2),$$
and hence $\Pr(x_2/x_1) = \Pr(x_2)$ and $\Pr(x_1/x_2) = \Pr(x_1)$.

2. RANDOM VARIABLES AND DISTRIBUTION FUNCTIONS

Consider a physical experiment which is designed to result in a real number. This number is subject to certain random fluctuations since in all physical experiments one expects experimental errors to be present.
The result of the experiment is interpreted as a random variable X. For a mathematical definition of a random variable, see below. Let $x_\lambda$ (for any real number $\lambda$) denote the sentence "the experiment will produce a number less than $\lambda$," i.e., the sentence "X is less than $\lambda$." Then the probability $\Pr(x_\lambda)$ is a function F of the real variable $\lambda$ called the distribution function of X. Thus $\Pr(x_\lambda) = F(\lambda)$. If $\lambda_1 < \lambda_2$ then
$$\Pr(x_{\lambda_2} \cap x'_{\lambda_1}) = \Pr(x_{\lambda_2}) - \Pr(x_{\lambda_1} \cap x_{\lambda_2}) = F(\lambda_2) - F(\lambda_1)$$
is the probability that X is greater than or equal to $\lambda_1$, but less than $\lambda_2$. Thus when F is known, one can find the probability that X lies in a given interval.

In Chap. 11 it was noted that the elements of a Boolean algebra can be interpreted as sets of points of some space. Thus one interprets $x_{\lambda_2} \cap x'_{\lambda_1}$ as a set and $\Pr(x_{\lambda_2} \cap x'_{\lambda_1})$ as the probability of obtaining a point of this set, that is, the probability that the experiment will select a point $\xi$ of this set. Imagine that the number which the experiment produces is determined by the point $\xi$ selected and hence that X is a function of $\xi$. Then $x_\lambda$ is the set of all points $\xi$ for which $X(\xi) < \lambda$. The only restrictions placed on the function X are that it is real valued and that each of the sets $x_\lambda$ shall belong to B. Such a function is said to be measurable with respect to B. The measure of a set $x_\lambda$ is defined as the probability $\Pr(x_\lambda)$. A random variable X is a function which is measurable with respect to B.

Let X be a random variable, let $x_{\lambda+}$ be the set of points $\xi$ for which $X(\xi) \leq \lambda$, and denote $\Pr(x_{\lambda+})$ by $F(\lambda+)$. Then it can be proved (using postulate 2) that $x_{\lambda+}$ is in B for all real $\lambda$. Moreover $F(\lambda+)$ is the limit of $F(\mu)$ as $\mu$ approaches $\lambda$ through values greater than $\lambda$. If $\mu$ approaches $\lambda$ through values less than $\lambda$ then the limit of $F(\mu)$ is $F(\lambda)$. Furthermore, F is a nondecreasing function for which
$$\lim_{\lambda \to -\infty} F(\lambda) = F(-\infty) = 0, \qquad \lim_{\lambda \to +\infty} F(\lambda) = F(+\infty) = 1.$$
The above properties characterize the distribution function of a random variable.

EXAMPLE 1. As an illustration of a random variable let x be any element of B and let
$$\psi_x(\xi) = \begin{cases} 1 & \text{if } \xi \text{ is in the set } x, \\ 0 & \text{if } \xi \text{ is not in the set } x. \end{cases}$$
Then $\psi_x$ is called the characteristic function of the set x and is interpreted as the random variable which takes on the value 1 when x succeeds and the value 0 when x fails. The distribution function F of the random variable $\psi_x$ is the following (see Fig. 1):
$$F(\lambda) = \begin{cases} 0 & \text{if } \lambda \leq 0, \\ \Pr(x') & \text{if } 0 < \lambda \leq 1, \\ 1 & \text{if } 1 < \lambda. \end{cases}$$

FIG. 1. Distribution function for Example 1.

EXAMPLE 2. Consider a die and let $x_k$ denote the sentence "the face numbered k will turn up." Let
$$X = \psi_{x_1} + 2\psi_{x_2} + 3\psi_{x_3} + 4\psi_{x_4} + 5\psi_{x_5} + 6\psi_{x_6}.$$
If the face numbered k does turn up then this will assign the value 1 to $\psi_{x_k}$ and the value 0 to the remaining characteristic functions and hence X will take on the value k. Thus X is the random variable which takes on the value which the die turns up (see Fig. 2).

FIG. 2. Distribution function for Example 2, random tossing of a die.

It can be proved that sums, products, and differences of random variables are again random variables. Furthermore any real number is a random variable.

EXAMPLE 3. The number $\sqrt{2}$ is the random variable whose distribution function F, with $F(\lambda)$ the probability of a value less than $\lambda$ as above, is given by (see Fig. 3):
$$F(\lambda) = \begin{cases} 0 & \text{if } \lambda \leq \sqrt{2}, \\ 1 & \text{if } \sqrt{2} < \lambda. \end{cases}$$

FIG. 3. Distribution function for Example 3.

3. EXPECTED VALUE

If X is a random variable associated with some experiment and if the experiment is repeated a large number of times, then one should expect the average of the numbers obtained to be very close to some fixed number E(X) which is called the expected value of X. In order to make this idea more precise the following definition is introduced.
The random variables $X_1, X_2, \ldots, X_n$ are said to be independent provided $x_{1,\lambda_1}, x_{2,\lambda_2}, \ldots, x_{n,\lambda_n}$ are independent for all $\lambda_1, \lambda_2, \ldots, \lambda_n$, where $x_{k,\lambda_k}$ is the set of points $\xi$ for which $X_k(\xi) < \lambda_k$.

As an illustration of independent random variables, consider a pair of honest dice. Let $X_1$ denote the random variable which takes on the value resulting from the throw of the first die and $X_2$ denote the random variable corresponding to the second die. It is reasonable to assume that $X_1$ and $X_2$ are independent. Thus we assume that the occurrence of a number less than $\lambda_1 = 3$ on the first die and the occurrence of a number less than $\lambda_2 = 5$ on the second die are independent events; similarly, for other choices of $\lambda_1$ and $\lambda_2$. Next let $X_3 = X_1 + X_2$. Then $X_1$ and $X_3$ are dependent random variables.

Weak Law of Large Numbers. Now let X be an arbitrary random variable and let $X_1, X_2, \ldots, X_n$ be independent random variables all having the same distribution function as X. Let $x_{\epsilon,n}$ be the set of points $\xi$ for which
$$\left| \frac{X_1(\xi) + X_2(\xi) + \cdots + X_n(\xi)}{n} - E(X) \right| < \epsilon.$$
Then $x_{\epsilon,n}$ is interpreted as the sentence "the average of $X_1, X_2, \ldots, X_n$ will differ from E(X) by less than $\epsilon$." One might expect that
$$\lim_{n \to \infty} \Pr(x_{\epsilon,n}) = 1 \quad \text{for every } \epsilon > 0$$
and that there is only one choice of E(X) for which this limiting probability is 1. This, as a matter of fact, is the case and this result is called the weak law of large numbers. Roughly, the weak law of large numbers states that if an experiment is repeated a large number of times then it is very likely that the average of the results will differ only slightly from the expected value. The expected value E(X) exists for a large class of random variables but not for all random variables.

Properties of Expected Value.

THEOREM 9. $E(\lambda X + \mu Y) = \lambda E(X) + \mu E(Y)$ if $\lambda$, $\mu$ are real numbers and X, Y are random variables for which E(X), E(Y) exist.
THEOREM 10. If E(X), E(Y) exist and $X(\xi) \leq Y(\xi)$ for all $\xi$ then $E(X) \leq E(Y)$.
THEOREM 11. If $\psi_x$ is the characteristic function of the set x then $E(\psi_x) = \Pr(x)$.
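The weak law of large numbers stated above can be illustrated without simulation. The Python sketch below is only an illustration under stated assumptions: X is the value shown by an honest die, so that E(X) = 7/2, and the probability that the average of n independent throws differs from 7/2 by less than $\epsilon$ is computed exactly by repeated convolution. The function names and the choice $\epsilon = 1/2$ are hypothetical.

```python
from fractions import Fraction

def sum_distribution(n):
    """Exact distribution of the sum of n independent throws of an honest die."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, Fraction(0)) + prob / 6
        dist = new
    return dist

def prob_average_close(n, eps):
    """Pr(|(X_1 + ... + X_n)/n - E(X)| < eps), with E(X) = 7/2 for an honest die."""
    mean = Fraction(7, 2)
    return sum(p for total, p in sum_distribution(n).items()
               if abs(Fraction(total, n) - mean) < eps)

eps = Fraction(1, 2)
for n in (1, 10, 50):
    # the printed probabilities should increase toward 1 as n grows
    print(n, float(prob_average_close(n, eps)))
```

Exact rational arithmetic is used so that the comparison with $\epsilon$ is not affected by rounding; for much larger n a floating-point or normal approximation would be the more practical choice.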
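With the aid of Theorems 9 and 11 one can compute expected value for certain random variables called simple random variables. A random variable X is simple if it has the form
$$X = \sum_{k=1}^{n} \lambda_k \psi_{x_k}$$
where each $\lambda_k$ is a real number and each $\psi_{x_k}$ is the characteristic function of the set $x_k$.

THEOREM 12. $E\left(\sum_{k=1}^{n} \lambda_k \psi_{x_k}\right) = \sum_{k=1}^{n} \lambda_k \Pr(x_k)$.

Theorem 10 is used to approximate expected value for a much larger class of random variables called bounded random variables. The real numbers $\lambda$, $\mu$ are called bounds for a random variable X if $\lambda \leq X(\xi) \leq \mu$ for all $\xi$. When the bounds exist X is said to be bounded.

THEOREM 13. If $\lambda$, $\mu$ are bounds for X and if $\lambda = \lambda_0 < \lambda_1 < \cdots < \lambda_n = \mu$ then E(X) exists and
$$\sum_{k=1}^{n} \lambda_{k-1}\bigl(F(\lambda_k) - F(\lambda_{k-1})\bigr) \leq E(X) \leq \sum_{k=1}^{n} \lambda_k\bigl(F(\lambda_k) - F(\lambda_{k-1})\bigr)$$
where F is the distribution function of X. If each $\lambda_k - \lambda_{k-1} < \epsilon$, then the extreme members of the inequalities differ by at most $\epsilon$.

Theorem 13 is readily established as follows. Let $\phi_k$ be the characteristic function of the set $x_{\lambda_k} \cap x'_{\lambda_{k-1}}$, i.e., of the set of points $\xi$ for which $\lambda_{k-1} \leq X(\xi) < \lambda_k$; then
$$\sum_{k=1}^{n} \phi_k = 1, \qquad \sum_{k=1}^{n} X\phi_k = X,$$
and
$$\lambda_{k-1}\phi_k \leq X\phi_k \leq \lambda_k\phi_k, \qquad E(\phi_k) = F(\lambda_k) - F(\lambda_{k-1}),$$
and hence the inequalities follow from Theorems 9 to 11. The difference between the extreme members of the inequalities is:
$$\sum_{k=1}^{n} (\lambda_k - \lambda_{k-1})\bigl(F(\lambda_k) - F(\lambda_{k-1})\bigr) < \sum_{k=1}^{n} \epsilon\bigl(F(\lambda_k) - F(\lambda_{k-1})\bigr) = \epsilon\bigl(F(\mu) - F(\lambda)\bigr) = \epsilon.$$

If F has a continuous derivative f (i.e., $dF(\lambda)/d\lambda = f(\lambda)$) then
$$F(\lambda_k) - F(\lambda_{k-1}) = f(\mu_k)(\lambda_k - \lambda_{k-1})$$
where $\lambda_{k-1} < \mu_k < \lambda_k$ and
$$\lim_{\epsilon \to 0} \sum_{k=1}^{n} \lambda_k\bigl(F(\lambda_k) - F(\lambda_{k-1})\bigr) = \lim_{\epsilon \to 0} \sum_{k=1}^{n} \lambda_k f(\mu_k)(\lambda_k - \lambda_{k-1}) = \int_{\lambda}^{\mu} u f(u)\,du.$$
Thus

THEOREM 14. If the distribution function F of a random variable X has a continuous derivative f and $\lambda$, $\mu$ are bounds of X then E(X) exists and
$$E(X) = \int_{\lambda}^{\mu} u f(u)\,du = \int_{\lambda}^{\mu} u\,dF(u).$$

THEOREM 15. If the distribution function F of a random variable X has a derivative f then E(X) exists and
$$E(X) = \int_{-\infty}^{\infty} u f(u)\,du = \int_{-\infty}^{\infty} u\,dF(u)$$
whenever the integral exists.

Stieltjes and Lebesgue Integrals. The two cases which arise most frequently in practice are the simple random variables and the random variables whose distribution functions have continuous derivatives.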
In the first case the expected value is computed by means of Theorem 12 and in the second case by Theorem 15. The integral on the right of the equation of Theorem 15 can be assigned a meaning even when f does not exist. In the case of a bounded variable this integral is defined to be the limit of the approximations given in Theorem 13. A meaning can also be assigned in certain unbounded cases. This integral is called a Stieltjes integral. Another integral expression for E(X) is
$$E(X) = \int X\,d\Pr.$$
This is called a Lebesgue integral and it is also defined in terms of the approximations of Theorem 13. The terms expectation and mean are often used as synonyms for expected value.

Probability Density and Joint Distribution. The derivative f of F is called the probability density. When the density is given the distribution function can be computed by the formula
$$F(\lambda) = \int_{-\infty}^{\lambda} f(u)\,du.$$
See Figs. 4 and 5, Sect. 5.

The joint distribution of two random variables $X_1$, $X_2$ is a function F such that $F(\lambda_1, \lambda_2)$ is the probability that $X_1 < \lambda_1$ and $X_2 < \lambda_2$. If the joint distribution has a density f then
$$F(\lambda_1, \lambda_2) = \int_{-\infty}^{\lambda_1} \int_{-\infty}^{\lambda_2} f(u_1, u_2)\,du_2\,du_1.$$
If $F_1$, $F_2$ are the distribution functions of $X_1$, $X_2$, and $f_1$, $f_2$ are the corresponding densities then
$$
\begin{aligned}
F_1(\lambda_1) &= \int_{-\infty}^{\lambda_1} \int_{-\infty}^{\infty} f(u_1, u_2)\,du_2\,du_1 = \int_{-\infty}^{\lambda_1} f_1(u_1)\,du_1, \\
F_2(\lambda_2) &= \int_{-\infty}^{\lambda_2} \int_{-\infty}^{\infty} f(u_1, u_2)\,du_1\,du_2 = \int_{-\infty}^{\lambda_2} f_2(u_2)\,du_2, \\
f_1(u_1) &= \int_{-\infty}^{\infty} f(u_1, u_2)\,du_2, \qquad f_2(u_2) = \int_{-\infty}^{\infty} f(u_1, u_2)\,du_1.
\end{aligned}
$$
The expected value of the product $X_1 X_2$ is
$$E(X_1 X_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u_1 u_2 f(u_1, u_2)\,du_1\,du_2.$$
If $X_1$, $X_2$ are independent then
$$F(\lambda_1, \lambda_2) = F_1(\lambda_1)F_2(\lambda_2)$$
and
$$f(u_1, u_2) = f_1(u_1)f_2(u_2).$$
Furthermore:

THEOREM 16. If $X_1$, $X_2$ are independent then $E(X_1 X_2) = E(X_1)E(X_2)$.

Two random variables $X_1$, $X_2$ for which $E(X_1 X_2) = E(X_1)E(X_2)$ are said to be uncorrelated. Thus Theorem 16 states that independent random variables are uncorrelated. This result holds even when there is no joint probability density. The converse is not true. That is, random variables may be uncorrelated, but not independent.
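Claims of this kind can be checked by exact enumeration when the random variables are simple. The Python sketch below is a minimal illustration, not part of the original text: it uses the honest die with faces -3, -2, -1, 1, 2, 3 taken from the illustration that follows, with $X_2 = X_1^2$, and the helper name `expect` and the threshold 2 in the independence check are hypothetical choices.

```python
from fractions import Fraction

faces = [-3, -2, -1, 1, 2, 3]      # honest die: each face has probability 1/6
p = Fraction(1, len(faces))

def expect(g):
    """Expected value of g(X1), computed as a finite sum over the faces."""
    return sum(p * g(v) for v in faces)

e_x1 = expect(lambda v: v)
e_x2 = expect(lambda v: v * v)          # X2 = X1 squared
e_x1x2 = expect(lambda v: v * v * v)    # X1 * X2 = X1 cubed

uncorrelated = (e_x1x2 == e_x1 * e_x2)

# Independence would require Pr(X1 < 2 and X2 < 2) = Pr(X1 < 2) Pr(X2 < 2).
pr_joint = sum(p for v in faces if v < 2 and v * v < 2)
pr_x1 = sum(p for v in faces if v < 2)
pr_x2 = sum(p for v in faces if v * v < 2)
independent_here = (pr_joint == pr_x1 * pr_x2)

print(e_x1, e_x2, e_x1x2, uncorrelated, independent_here)
```

A single pair of thresholds for which the product rule fails is enough to show dependence, which is why only one pair is tested.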
As an illustration of a pair of random variables which are dependent but uncorrelated, consider an honest die whose faces are numbered respectively $-3, -2, -1, 1, 2, 3$. Let $X_1$ denote the random variable which takes on the value resulting from the throw of this die and let $X_2 = X_1^2$. Then
$$E(X_1) = 0, \qquad E(X_2) = \tfrac{14}{3}, \qquad E(X_1 X_2) = E(X_1^3) = 0 = E(X_1)E(X_2).$$
Hence $X_1$ and $X_2$ are uncorrelated but they are clearly dependent.

4. VARIANCE

THEOREM 17. If f is a function of a real variable $\lambda$ with at most a finite number of discontinuities and if X is a random variable with distribution function F, then f(X) is a random variable and
$$E(f(X)) = \int_{-\infty}^{\infty} f(u)\,dF(u)$$
whenever the integral exists.

A special case of this formula is the following: If $E(X) = \mu$ and
$$E((X - \mu)^2) = \int_{-\infty}^{\infty} (u - \mu)^2\,dF(u) = E(X^2) - E^2(X) = \sigma^2(X),$$
then $\sigma^2(X)$ is called the variance of X and the positive square root of the variance, $\sigma(X)$, is called the standard deviation of X.

The Properties of Variance.

THEOREM 18. $\sigma^2(X + \lambda) = \sigma^2(X)$, $\sigma^2(\lambda X) = \lambda^2\sigma^2(X)$.
THEOREM 19. If $X_1, X_2, \ldots, X_n$ are independent random variables, then
$$
\begin{aligned}
\sigma^2(X_1 + X_2 + \cdots + X_n) &= \sigma^2(X_1) + \sigma^2(X_2) + \cdots + \sigma^2(X_n), \\
\sigma^2\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) &= \frac{\sigma^2(X_1) + \sigma^2(X_2) + \cdots + \sigma^2(X_n)}{n^2}.
\end{aligned}
$$

If $x_\epsilon$ is the set of all points $\xi$ such that $|X(\xi) - \mu| < \epsilon$ where $\mu = E(X)$ then $x'_\epsilon$ is the set of all $\xi$ such that $|X(\xi) - \mu| \geq \epsilon$. Moreover the inequality
$$\epsilon^2\psi_{x'_\epsilon}(\xi) \leq (X(\xi) - \mu)^2$$
can readily be verified when $\xi$ is in $x_\epsilon$ and when $\xi$ is in $x'_\epsilon$ and hence this inequality holds for all $\xi$. Thus by Theorems 9 to 11 it follows that
$$E(\epsilon^2\psi_{x'_\epsilon}) = \epsilon^2\Pr(x'_\epsilon) \leq E((X - \mu)^2) = \sigma^2(X),$$
and therefore
$$\Pr(x'_\epsilon) \leq \frac{\sigma^2(X)}{\epsilon^2}, \qquad \text{i.e.,} \qquad \Pr(x_\epsilon) \geq 1 - \frac{\sigma^2(X)}{\epsilon^2}.$$
This inequality is called Tchebysheff's inequality. By combining Tchebysheff's inequality with Theorem 19, one obtains:

THEOREM 20. If $X_1, X_2, \ldots$
, $X_n$ are independent random variables with common mean $\mu$ and common variance $\sigma^2$ and if $x_{\epsilon,n}$ is the set of points $\xi$ for which
$$\left| \frac{X_1(\xi) + X_2(\xi) + \cdots + X_n(\xi)}{n} - \mu \right| < \epsilon$$
then
$$\Pr(x_{\epsilon,n}) \geq 1 - \frac{\sigma^2}{n\epsilon^2}$$
and
$$\lim_{n \to \infty} \Pr(x_{\epsilon,n}) = 1 \quad \text{for every } \epsilon > 0.$$

The Strong Law of Large Numbers. The first part of Theorem 20 gives a crude approximation for the probability that the average will differ from the common mean by less than $\epsilon$. Recall that the second part of this theorem is the weak law of large numbers. The reasoning by which one arrived at this result is of course circular but this circularity can be avoided. The strong law of large numbers is the following:

THEOREM 21. If $X_1, X_2, \ldots, X_n$ are independent random variables with common expected value $\mu$ and common variance $\sigma^2$ and if x is the set of points $\xi$ for which
$$\lim_{n \to \infty} \frac{X_1(\xi) + X_2(\xi) + \cdots + X_n(\xi)}{n} = \mu,$$
then $\Pr(x) = 1$.

Even though $\Pr(x) = 1$ it is not in general true that $x = 1$. If an element x of B is such that $\Pr(x) = 1$, then x is said to be almost certain. The strong law of large numbers states that it is almost certain that the limit of the average is the common expected value. The following example will help one understand the distinction between certain and almost certain. Let X be a random variable with distribution function F defined as follows:
$$F(\lambda) = \begin{cases} 0 & \text{if } \lambda \leq 0, \\ \lambda & \text{if } 0 \leq \lambda \leq 1, \\ 1 & \text{if } 1 \leq \lambda. \end{cases}$$
Then it is almost certain, but not entirely certain, that X will take on a value distinct from $\tfrac{1}{2}$.

5. CENTRAL LIMIT THEOREM

Distribution of Sums and Averages of Independent Random Variables. Consider
$$e^{iXt} = \cos Xt + i \sin Xt$$
where $i^2 = -1$ and t is a parameter. This exponential converts the real random variable X into a complex valued random variable. The expected value of the latter random variable is defined in a natural way to be
$$E(e^{iXt}) = E(\cos Xt) + iE(\sin Xt) = \phi_X(t).$$
The advantage of the exponential is that it converts a sum into a product and hence enables one to make use of the condition of independence.
Thus if X, Y are independent then it can be shown that $e^{iXt}$, $e^{iYt}$ are independent and hence by Theorem 16
$$\phi_{X+Y}(t) = E(e^{iXt}e^{iYt}) = \phi_X(t)\phi_Y(t).$$
The advantage of the factor i is that it produces a bounded random variable and insures the existence of the expected value for all real values of t. The advantage of the parameter t is that it produces a function in terms of which one can compute the distribution function. Thus $\phi_X$ is a function of the parameter t called the characteristic function of the random variable X. Unfortunately the phrase "characteristic function" has two distinct meanings in the theory of probability, namely, characteristic function of a set of points and characteristic function of a random variable.

Computation of the Characteristic Function of a Simple Random Variable. Let
$$X = \sum_{k=1}^{n} \lambda_k\psi_{x_k},$$
where $x_1, x_2, \ldots, x_n$ are mutually exclusive and exhaustive. Then
$$e^{iX(\xi)t} = \sum_{k=1}^{n} e^{i\lambda_k t}\psi_{x_k}(\xi)$$
for all $\xi$, since if $\xi$ lies in $x_k$, then $\psi_{x_k}(\xi) = 1$ and the remaining characteristic functions have the value 0 and hence both sides of the equation become $e^{i\lambda_k t}$. From this it follows that
$$E(e^{iXt}) = \sum_{k=1}^{n} \Pr(x_k)e^{i\lambda_k t} = \phi_X(t).$$
As a special case consider the simple random variable $\psi_x$. One can write
$$\psi_x = 0 \cdot \psi_{x'} + 1 \cdot \psi_x$$
and hence
$$\phi_{\psi_x}(t) = q + pe^{it}$$
where $p = \Pr(x)$, $q = \Pr(x')$. Next compute the characteristic function for the sum $\psi_{x_1} + \psi_{x_2} + \cdots + \psi_{x_n}$ where $x_1, x_2, \ldots, x_n$ are independent and $\Pr(x_k) = p$, $\Pr(x'_k) = q$ for each k. Then
$$\phi_{\psi_{x_1} + \psi_{x_2} + \cdots + \psi_{x_n}}(t) = \prod_{k=1}^{n} \phi_{\psi_{x_k}}(t) = (q + pe^{it})^n.$$

If X has the distribution function F then the characteristic function is
$$\phi_X(t) = E(e^{iXt}) = \int_{-\infty}^{\infty} e^{i\lambda t}\,dF(\lambda).$$
This transforms the function F into the function $\phi_X$ (essentially the Laplace-Fourier transform). The inverse transform is
$$\tfrac{1}{2}\bigl(F(\lambda) + F(\lambda+)\bigr) = \frac{1}{2\pi i} \lim_{h \to \infty} \int_{-h}^{h} \frac{e^{it} - \phi_X(t)e^{-i\lambda t}}{t}\,dt.$$
To see why this is the case note that
$$\frac{1}{2\pi i} \lim_{h \to \infty} \int_{-h}^{h} \frac{e^{it} - e^{i\mu t}e^{-i\lambda t}}{t}\,dt = \begin{cases} 1 & \text{if } \mu < \lambda, \\ \tfrac{1}{2} & \text{if } \mu = \lambda, \\ 0 & \text{if } \lambda < \mu. \end{cases}$$
This formula is verified by converting the integral into integrals of the form
$$\int_{-\infty}^{\infty} \frac{\sin mt}{t}\,dt$$
by means of the relation $e^{imt} = \cos mt + i \sin mt$. Now compute the inverse transform for a simple random variable
$$X = \sum_{k=1}^{n} \lambda_k\psi_{x_k}$$
where $x_1, x_2, \ldots, x_n$ are mutually exclusive and exhaustive. Since
$$\sum_{k=1}^{n} \Pr(x_k) = 1,$$
then
$$\frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{e^{it} - \phi_X(t)e^{-i\lambda t}}{t}\,dt = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\sum_{k=1}^{n} \Pr(x_k)e^{it} - \sum_{k=1}^{n} \Pr(x_k)e^{i\lambda_k t}e^{-i\lambda t}}{t}\,dt = \sum_{\lambda_k < \lambda} \Pr(x_k)$$
if $\lambda \neq$ any $\lambda_k$. The final sum is the probability that X will take on a value less than $\lambda$ and hence this sum is equal to $F(\lambda) = F(\lambda+) = \tfrac{1}{2}(F(\lambda) + F(\lambda+))$. If $\lambda$ equals some $\lambda_k$, then the corresponding term $\Pr(x_k)/2$ must be added and again the result is $\tfrac{1}{2}(F(\lambda) + F(\lambda+))$. The proof for an arbitrary random variable X consists in approximating X by a simple random variable. In the general case the integral from $-\infty$ to $\infty$ may not exist, and one has to resort to integrating from $-h$ to $h$ and then passing to the limit.

Binomial and Poisson Distributions. If $F_n$ is the distribution function of
$$\psi_{x_1} + \psi_{x_2} + \cdots + \psi_{x_n}$$
where $x_1, x_2, \ldots, x_n$ are independent and $\Pr(x_k) = p$, $\Pr(x'_k) = q$ for each k then $F_n(\lambda)$ is the probability that fewer than $\lambda$ of the events $x_1, \ldots, x_n$ will succeed. If $\lambda \neq$ any k then
$$F_n(\lambda) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{e^{it} - \phi_X(t)e^{-i\lambda t}}{t}\,dt = \sum_{k < \lambda} \binom{n}{k} p^k q^{n-k}$$
where $\binom{n}{k}$ is the number of combinations of n things taken k at a time. If $\lambda =$ some k, then the corresponding term $\binom{n}{k}p^k q^{n-k}/2$ must be added. This is called the binomial distribution. To obtain an approximation to this distribution for small p and large n set $p = \mu/n$ and let n become infinite. The limiting distribution F is given by
$$F(\lambda) = \sum_{k < \lambda} \frac{\mu^k}{k!}e^{-\mu}.$$
compute its derivative. Thus
$$\frac{d}{d\lambda}\Phi(\lambda) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-t^2/2}e^{-i\lambda t}\,dt.$$
Since $t^2 + 2i\lambda t = (t + i\lambda)^2 + \lambda^2$, set $u = t + i\lambda$, $du = dt$ and obtain
$$\frac{d}{d\lambda}\Phi(\lambda) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-(t + i\lambda)^2/2}e^{-\lambda^2/2}\,dt = \frac{e^{-\lambda^2/2}}{2\pi} \int_{-\infty}^{+\infty} e^{-u^2/2}\,du = \frac{Ae^{-\lambda^2/2}}{2\pi}$$
where
$$A = \int_{-\infty}^{\infty} e^{-u^2/2}\,du.$$
Hence

FIG. 5. Probability density for the normal distribution function.

6. RANDOM PROCESSES

A continuous random process is a function X which assigns to every real number t a random variable $X_t$. If t ranges only over the integers then the process X is said to be discrete and if t ranges only over the positive integers then the process X is simply a sequence of random variables. Consider complex valued random variables, i.e., random variables of the form $X = X_1 + iX_2$ where $X_1$, $X_2$ are real. The complex conjugate of X is $X_1 - iX_2$ and is denoted by $\bar{X}$. The inner product of two random variables X, Y is denoted by (X, Y) and defined by the equation
$$(X, Y) = E(X\bar{Y}).$$
The covariance function R of a process X is defined by the equation
$$R(t, \tau) = (X_{t+\tau}, X_t).$$
If R depends only on $\tau$ and not on t, then the process is said to be stationary in the wide sense. A physical example of such a process is the phenomenon of noise. In the mathematical model (i.e., the process X) the variable t is interpreted as time. The process can be envisioned as being composed of simple harmonic oscillations in which the amplitudes associated with the various frequencies are selected in accordance with a certain random procedure. A simple harmonic oscillation of frequency $\lambda$ is represented by $e^{2\pi i\lambda t}$ and the (complex) amplitude associated with the frequencies between $\lambda$ and $\lambda + d\lambda$ is denoted by $dY_\lambda$, and hence the contribution of such frequencies to the process is
$$e^{2\pi i\lambda t}\,dY_\lambda.$$
Here Y is a process which assigns to each real number $\lambda$ a random variable $Y_\lambda$. The process X is obtained by adding the contributions associated with the various frequencies. Hence
$$X_t = \int_{-\infty}^{\infty} e^{2\pi i\lambda t}\,dY_\lambda.$$
Thus the spectrum of the process X is described by the process Y. The expected value of the square of the amplitude associated with the frequencies between $\lambda$ and $\lambda + d\lambda$ is denoted by $dF(\lambda)$ and is defined by
$$dF(\lambda) = (dY_\lambda, dY_\lambda) \geq 0.$$
Thus F is a monotone nondecreasing function. A property of the process Y, called the property of orthogonal increments, is the following:
$$(dY_\lambda, dY_\mu) = 0$$
if the intervals $d\lambda$, $d\mu$ have no common points. Hence
$$
\begin{aligned}
R(\tau) = (X_{t+\tau}, X_t) &= \left( \int_{-\infty}^{\infty} e^{2\pi i\lambda(t+\tau)}\,dY_\lambda, \int_{-\infty}^{\infty} e^{2\pi i\mu t}\,dY_\mu \right) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{2\pi i\lambda(t+\tau)}e^{-2\pi i\mu t}\,(dY_\lambda, dY_\mu) \\
&= \int_{-\infty}^{\infty} e^{2\pi i\lambda(t+\tau)}e^{-2\pi i\lambda t}\,dF(\lambda) = \int_{-\infty}^{\infty} e^{2\pi i\lambda\tau}\,dF(\lambda).
\end{aligned}
$$
As a special case of this formula
$$R(0) = \int_{-\infty}^{\infty} dF(\lambda).$$

The following example of a one-dimensional Brownian motion will aid in visualizing a random process. A tiny mirror is suspended by a fiber. Particles of air bombard the mirror and cause it to turn through an angle. A beam of light is reflected by the mirror and the position of the reflection enables the observer to measure the angle X(t) through which the mirror has turned at time t. Since X(t) is produced by the average effect of a number of bombardments, one might expect X(t) to have a normal distribution. That is, the probability that $X(t) < \lambda$ is
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-x^2/2\sigma^2}\,dx,$$
where $\sigma^2$ is the variance and is assumed to be independent of t. From this formula one can readily show that $E(X(t)) = 0$. The zero angle is the angle in which the fiber is untwisted. If t and $t + \tau$ are two times at which the mirror is observed, the joint probability that $X(t) < \lambda_1$ and $X(t + \tau) < \lambda_2$ is
$$\frac{1}{2\pi\sigma^2\sqrt{1 - r^2(\tau)}} \int_{-\infty}^{\lambda_1} \int_{-\infty}^{\lambda_2} e^{-[x^2 - 2r(\tau)xy + y^2]/2\sigma^2[1 - r^2(\tau)]}\,dx\,dy.$$
This is called the bivariate normal distribution. From this formula one can show that the covariance is
$$(X(t), X(t + \tau)) = \sigma^2 r(\tau),$$
and hence that the process is stationary in the wide sense. If it is known that $X(t) = a$ then the probability that $X(t + \tau) < \lambda$ is
$$\frac{1}{\sigma\sqrt{2\pi(1 - r^2(\tau))}} \int_{-\infty}^{\lambda} e^{-[x - ar(\tau)]^2/2\sigma^2[1 - r^2(\tau)]}\,dx.$$
Any information concerning the motion previous to time t is irrelevant to this probability. A process having this property is said to be Markovian. The assumption that the above process is Markovian implies that $r(\tau) = e^{-k\tau}$, where $k > 0$.

REFERENCES

1. H. Cramér, The Elements of Probability Theory, Wiley, New York, 1950.
2. J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
3. W. Feller, Probability Theory and Its Applications, Wiley, New York, 1950.
4. A. N. Kolmogoroff, Foundations of the Theory of Probability, Chelsea, New York, 1950.
5. P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, Paris, 1937.
6. J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937.
7. Ming Chen Wang and G. E. Uhlenbeck, On the theory of Brownian motion. II. Revs. Mod. Phys., 17, 323-342 (1945).

A GENERAL MATHEMATICS

Chapter 13

Statistics

A. B. Clarke

1. Nature of Statistics
2. Probability Background
3. Important Probability Distributions
4. Sampling
5. Bivariate Distributions
6. Tests for Goodness of Fit
7. Sequential Analysis
8. Monte Carlo Method
9. Statistical Tables
References

1. NATURE OF STATISTICS

The basic assumption underlying the application of the mathematical theory of probability and statistics to physical situations is the following: If a physical "experiment" is repeated under "identical" conditions and "without bias," the observed relative frequency of success of any physical "event" approaches as a limit the probability assigned to this event by some underlying probability distribution. Probability theory is the study of probability distributions as mathematical entities.
Statistics is the analysis of probability distributions on the basis of a number of experimental observations; the distribution is in general not fully known to start with, and one seeks properties of the distribution on the basis of the observations. Since an infinite number of experiments would usually be required to determine a distribution with precision, it is only rarely possible to answer a statistical question with 100 per cent surety. Accordingly the answer to each statistical question should consist of two parts: (a) the best possible answer to the question and (b) the amount of confidence that can be placed in the correctness of this answer. The omission of (b) greatly diminishes the value of the conclusion.

2. PROBABILITY BACKGROUND

The basic probability theory required for statistics is reviewed in Chap. 12. For the sake of convenience the principal definitions are recalled here. (See Refs. 2, 6.)

Sample Space. The sample space S is the collection of all possible outcomes of a physical experiment; the individual outcomes are sample points. By an event is meant a certain type of outcome; in other words, a certain set A of sample points. A class $\alpha$ of events is assumed specified. To each event A of class $\alpha$ is assigned a probability, $\Pr(A)$, which is a real number between 0 and 1. One has $\Pr(\emptyset) = 0$, $\Pr(S) = 1$, and $\Pr(A \cup B) = \Pr(A) + \Pr(B)$, provided A, B have no points in common (are mutually exclusive events). A sample space S is discrete if its points form a finite or infinite sequence $\xi_1, \xi_2, \ldots$. For discrete spaces a probability is usually defined for each point, and then for each subset A as the sum of the probabilities of the points in A.

Random Variables. A random variable is a function $X = X(\xi)$ which assigns to each sample point $\xi$ a real number x in such a fashion that, for each a, the set A for which $x \leq a$ has a probability; thus $\Pr(X \leq a)$ is well defined.
With each random variable $X$ is associated a distribution $F(x)$; $F(a) = \Pr(X \le a)$. $F(x)$ is nondecreasing, $F(-\infty) = 0$, $F(+\infty) = 1$. If $X_1, \ldots, X_n$ are random variables associated with the same experiment, then their joint distribution is $F(x_1, \ldots, x_n)$, where $F(a_1, \ldots, a_n)$ is the probability assigned to the set where $X_1 \le a_1, \ldots, X_n \le a_n$. The random variable $X$ has a density $f$ if

(1)  $F(x) = \int_{-\infty}^{x} f(t)\,dt;$

the random variables $X_1, \ldots, X_n$ have a joint density $f$ if

(2)  $F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f(t_1, \ldots, t_n)\,dt_n \cdots dt_1.$

When the range or collection of values of $X$ forms a discrete sequence $x_1, x_2, \ldots$, then

(3)  $F(a) = \sum_{x_i \le a} \Pr(X = x_i) = \sum_{x_i \le a} f(x_i),$

where $\Pr(X = x_i) = f(x_i)$ is the probability assigned to the set of sample points for which $X = x_i$. This can be generalized to joint distributions.

Random variables $X_1, \ldots, X_n$ are mutually independent if

(4)  $F(x_1, \ldots, x_n) = F_1(x_1) F_2(x_2) \cdots F_n(x_n),$

where $F$ is the joint distribution and $F_i(x_i)$ is the distribution of $X_i$.

Throughout the following it will be assumed that either the range of each random variable is discrete or else each distribution has a density (continuous case).

The expectation or mean of a random variable $X$ is

(5)  $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ (continuous case),  $E(X) = \sum_i x_i f(x_i)$ (discrete case).

If $\varphi$ is a continuous function of $x$, then

(6)  $E[\varphi(X)] = \int_{-\infty}^{\infty} \varphi(x) f(x)\,dx$ (continuous case),  $E[\varphi(X)] = \sum_i \varphi(x_i) f(x_i)$ (discrete case).

Moments. The moments of $X$ about the origin are the numbers

(7)  $\mu'_k = E(X^k), \quad k = 1, 2, \ldots.$

The moments of $X$ about the mean are defined by

(8)  $\mu_k = E[(X - \mu)^k], \quad k = 2, 3, \ldots,$

where $\mu = E(X) = \mu'_1$. The quantity $\sigma^2 = \mu_2$ is the variance of $X$, while $\sigma = \sqrt{\mu_2}$ is the standard deviation of $X$. By expanding the quantity $(x - \mu)^k$ by the binomial formula and applying eq. (8), one obtains an expression for the $\mu_k$ in terms of the $\mu'_k$ and $\mu$. In particular,

(9)  $\sigma^2 = \mu'_2 - \mu^2.$

The mean $\mu$ is a measure of the location of the "center" of the distribution, while the variance $\sigma^2$ is a measure of the "spread" of the distribution.
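As a small illustration of eqs. (5), (7), and (9) in the discrete case, the following sketch computes the mean and variance of a fair die. The function name `moments` and the die example are illustrative assumptions, not part of the handbook; exact fractions are used so that the relation $\sigma^2 = \mu'_2 - \mu^2$ can be checked without rounding.

```python
from fractions import Fraction


def moments(values, probs):
    """Return (mean, variance) of a discrete random variable.

    values: the possible values x_i
    probs:  the probabilities f(x_i), assumed to sum to 1
    """
    mu1 = sum(x * p for x, p in zip(values, probs))      # mu'_1 = E(X), eq. (5)
    mu2 = sum(x * x * p for x, p in zip(values, probs))  # mu'_2 = E(X^2), eq. (7)
    return mu1, mu2 - mu1 ** 2                           # variance by eq. (9)


# Hypothetical example: a fair die with values 1..6, each of probability 1/6.
die_values = range(1, 7)
die_probs = [Fraction(1, 6)] * 6
mean, variance = moments(die_values, die_probs)
print(mean, variance)  # 7/2 35/12
```

The same function applies to any finite discrete distribution; for a continuous density the sums would be replaced by integrals.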
Other possible measures of central tendency are:

Median: a point $x_0$ such that $\Pr(X \ge x_0) = \Pr(X \le x_0)$,
Mode: a point $x_0$ where $f(x)$ is a maximum,
Midrange: $\tfrac{1}{2}(a + b)$, if $a \le x \le b$ is the smallest interval containing all $x$ for which $f(x) > 0$.

Other measures of the spread of the distribution are:

Mean deviation from the mean $= E(|X - \mu|)$,
Probable error: a number $a$ such that $\Pr(|X - \mu| \le a) = \tfrac{1}{2}$.

For comparison and tabulating purposes it is useful to describe a random variable in a manner independent of origin and scale. These requirements are met by the standardized variable $X^* = (X - \mu)/\sigma$, which has mean 0, has standard deviation 1, is dimensionless, and is invariant under any linear change of variable: $X' = aX + b$.

3. IMPORTANT PROBABILITY DISTRIBUTIONS

Binomial or Bernoulli Distribution. If $X$ represents the number of "successes" in $n$ independent trials of an experiment, with probability $p$ of "success" each time, then $X$ takes on the values $0, 1, 2, \ldots, n$ with probabilities

(10)  $f(x) = \binom{n}{x} p^x q^{n-x}, \quad q = 1 - p.$

Here the sample space $S$ has $2^n$ points $\xi$, each representing one particular succession of successes and failures. The random variable $X$ assigns to each $\xi$ the number of successes in $\xi$. The mean and standard deviation are found to be

(11)  $\mu = np, \quad \sigma = \sqrt{npq}.$

Poisson Distribution. A discrete random variable $X$ with values $0, 1, 2, \ldots$ is said to have a Poisson distribution if the corresponding function $f(x)$ has the form

(12)  $f(x) = \dfrac{a^x e^{-a}}{x!} \quad (x = 0, 1, 2, \ldots),$

where $a$ is a positive constant. One finds

(13)  $\mu = a, \quad \sigma = \sqrt{a}.$

For large $n$ and small $p$ the binomial distribution (10) is well approximated by the distribution (12), with $a = np$.

If a number of events occur independently in space or time and if $X$ represents the number of these events occurring in any given space or time interval, then the Poisson distribution is a good model for the distribution of $X$.
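The quality of the Poisson approximation to the binomial distribution, with $a = np$, can be checked numerically. The sketch below compares the two probability functions of eqs. (10) and (12) point by point; the values $n = 1000$, $p = 0.002$ and the helper names are illustrative assumptions only.

```python
import math


def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials, eq. (10)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, a):
    """Poisson probability of the value x with parameter a, eq. (12)."""
    return a ** x * math.exp(-a) / math.factorial(x)


# Hypothetical case: many trials, small success probability.
n, p = 1000, 0.002
a = n * p  # Poisson parameter matched to the binomial mean

# Largest pointwise difference between the two distributions for x = 0..20.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, a)) for x in range(21))
print(max_gap)
```

For these assumed values the largest discrepancy is well below one part in a thousand; increasing $p$ while holding $n$ fixed makes the agreement visibly worse.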
Examples of such counts are the number of red corpuscles on a microscope slide, the rate of emission of electrons or $\alpha$-particles, and the number of incoming calls to a telephone exchange.

Normal Distribution. Let $X$ be a continuous random variable with density

(14)  $\phi(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}.$

Then $X$ is said to have a normal distribution; its mean and standard deviation are $\mu$ and $\sigma$. One terms $\phi(x)$ the normal density function of mean $\mu$ and standard deviation $\sigma$; the corresponding distribution

(15)  $\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt$

is the normal distribution function. The function $\Phi$ is tabulated for $\mu = 0$ and $\sigma = 1$, and any other case is reduced to this by replacing $X$ by its standardized variable $X^*$ (Sect. 2). See Table 2, Sect. 9.

For large values of $n$, the binomial distribution may be approximated by the normal distribution having $\mu = np$, $\sigma = \sqrt{npq}$. More precisely, if $X$ has a binomial distribution, then as $n \to \infty$

(16)  $\Pr\!\left(\dfrac{X - np}{\sqrt{npq}} \le t\right) = \Pr(X^* \le t) \to \Phi(t).$

The $\chi^2$-Distribution. Let $X$ be a continuous random variable with values in the range $0 \le x < \infty$. Then $X$ is said to have a $\chi^2$-distribution with $n$ degrees of freedom if $X$ has density

(17)  $f(x) = \dfrac{x^{(n/2) - 1} e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}, \quad x \ge 0.$

One finds

(18)  $\mu = n, \quad \sigma = \sqrt{2n} \quad (n = 1, 2, \ldots).$

This type of distribution is of great importance in the theory of sampling of normal populations (Sect. 4). See Table 3, Sect. 9.

Student t-Distribution. Let $X$ be a continuous random variable with density

(19)  $s_n(x) = \dfrac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma(n/2)} \left(1 + \dfrac{x^2}{n}\right)^{-(n+1)/2}.$

Then $X$ is said to have a Student t-distribution with $n$ degrees of freedom ($n = 1, 2, \ldots$). One finds

(20)  $\mu = 0, \quad \sigma = \sqrt{\dfrac{n}{n - 2}} \quad (n > 2).$

As $n \to \infty$, $s_n(x)$ approaches the normal density function of mean 0 and standard deviation 1. The t-distribution is of value in sampling theory (Sect. 4). See Table 4, Sect. 9.

4. SAMPLING

In a great variety of practical problems a precise answer is obtainable only by making a very large number of measurements. For the sake of economy, one makes a smaller number of measurements and estimates the true answer from these.
The theory of such methods of estimation is called sampling.

Examples. The average height of 1,000,000 soldiers can be estimated by averaging the heights of a selected 1000 soldiers. The outcome of a presidential election can be estimated by polling a small number of voters.

The successive measurements in an experiment yield a random sequence $x_1, \ldots, x_n$ called a sample.

EXAMPLE. The measurements of the height of 1000 soldiers yield 1000 numbers. One can regard each soldier as a sample point $\xi$, and the aggregate of all 1,000,000 soldiers as the sample space $S$. If the heights follow some definite pattern, then there will be a definite probability that the height $x_1$ be less than a fixed value. Hence, there is a distribution function $F_1(x_1)$ associated with $x_1$, and $x_1$ can be regarded as the value of a random variable $X_1$. Similar statements apply to the measurements $x_2, \ldots, x_n$. If the measurements are independent (i.e., each one is made without considering the others), all measurements have the same distribution $F(x)$ and $X_1, \ldots, X_n$ are random variables with joint distribution

(21)  $F(x_1) F(x_2) \cdots F(x_n).$

The assumptions of the example considered will be assumed to hold generally. A sample space is assumed given, with associated probabilities. A measurement $x$ is a value of a random variable $X$; the probability that $X \le a$ is $F(a)$, where $F$ is the distribution of $X$. Successive measurements yield random variables $X_1, \ldots, X_n$. It will be assumed that these are independent, so that eq. (21) gives the joint distribution.

Sample Moments. The sample mean or average is the number

(22)  $\bar{x} = \dfrac{X_1 + \cdots + X_n}{n}.$

The sample moments about the origin and about the mean are defined respectively as

(23)  $m'_k = \dfrac{1}{n} \sum_{i=1}^{n} X_i^k, \qquad m_k = \dfrac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^k,$

so that $\bar{x} = m'_1$. The number $s^2 = m_2$ is the sample variance. One has the formula

(24)  $s^2 = m'_2 - \bar{x}^2.$

One can regard $\bar{x}$ and $s^2$ as estimates for the mean $\mu$ and variance $\sigma^2$ of $X$; $\bar{x}$ and $s^2$, and indeed all the moments, are random variables, being functions of $X_1, \ldots, X_n$. From the fact that all $X_i$ have a common distribution $F(x)$, one can deduce properties of the distribution of the various moments. For example,

(25)  $E(\bar{x}) = E\!\left(\dfrac{1}{n} \sum X_i\right) = \dfrac{1}{n} \sum E(X_i) = \dfrac{1}{n}\, n\mu = \mu.$

Similarly,

(26)  $E(s^2) = \dfrac{n - 1}{n}\, \sigma^2.$

Unbiased Estimate. A sample estimate is termed unbiased if its expectation is equal to the parameter being estimated. Equation (25) shows that $\bar{x}$ is an unbiased estimate of $\mu$; eq. (26) shows that $s^2$ is not an unbiased estimate of $\sigma^2$, although $[n/(n - 1)]s^2$ is such an unbiased estimate. Unbiasedness is a useful property of an estimate, but it is not as important as some other properties. The bias in $s^2$ need be considered only if $n$ is sufficiently small (less than 20, for example), so that $(n - 1)/n$ is appreciably different from 1.

Computational Procedures

Data Classification. The computation of sample moments for large samples is simplified by the classification of the data. In this procedure the sample range, the interval from the smallest to the largest sample value, is divided into approximately fifteen class intervals of equal width (the class width). The number of measurements whose $x_i$ value lies in each class interval, the frequency of the class interval, is then recorded, as well as the midpoint of each interval, the class mark. In the subsequent computation one then replaces each sample value $x_i$ by the class mark of the corresponding class interval; usually a negligible error is introduced by this replacement.

Example. In measuring height of a population to the nearest 0.1 in. one can choose class intervals 1 in. in width; to avoid ambiguity the end points of the class intervals should be 60.05 in., 61.05 in., ..., for instance, rather than 60 in., 61 in., ....

Computation.
If there are $h$ class intervals with frequencies $f_j$ and class marks $x_j$ ($j = 1, \ldots, h$), then the moments are computed as follows:

(27)  $\bar{x} = \dfrac{1}{n} \sum_{j=1}^{h} f_j x_j,$

(28)  $m'_2 = \dfrac{1}{n} \sum_{j=1}^{h} f_j x_j^2,$

(29)  $s^2 = \dfrac{1}{n} \sum_{j=1}^{h} f_j (x_j - \bar{x})^2 = m'_2 - \bar{x}^2.$

The computation can be further simplified by coding the data; that is, by introducing new measurements $y_j$ by a linear change of variables:

(30)  $x_j = a y_j + b \quad (a \ne 0),$

where the coefficients $a$, $b$ are chosen to simplify the $y_j$ data. The new mean and variance $\bar{y}$ and $s_y^2$ are related to the old, $\bar{x}$ and $s_x^2$, by the equations

(31)  $\bar{x} = a\bar{y} + b, \qquad s_x^2 = a^2 s_y^2.$

If $a$ is chosen to be the class width and $b$ is taken to be one of the class marks (usually chosen near the middle of the range), then the $y_j$ are integers, positive or negative, so that the computation is considerably simplified. After $\bar{y}$ and $s_y^2$ are computed, $\bar{x}$ and $s_x^2$ are found from eq. (31). The procedure is illustrated in tabular form in Table 1.

TABLE 1. COMPUTATION OF SAMPLE MEAN AND VARIANCE

Class interval         Class mark                   Frequency   Coded mark              $f_j y_j$   $f_j y_j^2$
($a_j - a_{j-1} = a$)  $x_j$                        $f_j$       $y_j = (x_j - b)/a$
$a_0$ to $a_1$         $x_1 = a_0 + \tfrac12 a$     2           $-5$                    $-10$       50
$a_1$ to $a_2$         $x_2 = a_1 + \tfrac12 a$     6           $-4$                    $-24$       96
...                    ...                          ...         $-3, -2, -1$            ...         ...
$a_5$ to $a_6$         $b = a_5 + \tfrac12 a$       ...         0                       0           0
...                    ...                          ...         $1, 2, \ldots$          ...         ...
$a_{h-1}$ to $a_h$     ...                          ...         ...                     ...         ...
Totals                                              $n$                                 $n\bar{y}$  $n\,m'_{2,y}$

$s_y^2 = m'_{2,y} - \bar{y}^2, \qquad \bar{x} = a\bar{y} + b, \qquad s_x^2 = a^2 s_y^2.$

Distribution of Sample Moments

If some information is known concerning the distribution $F(x)$ of the random variable $X$ being measured, then one can draw conclusions as to the distributions of the sample moments. These conclusions in turn permit one to make statements as to the accuracy of the sample moments as estimates of the true moments.

For example, suppose that the variable $X$ is distributed uniformly over an interval of length 1; that is, $F'(x) = f(x) = 1$ for $c \le x \le c + 1$, and $f(x) = 0$ otherwise. If $c$ is unknown, each sample will give information as to its value.
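The coded computation of eqs. (27) to (31) can be sketched in a few lines. The function name, the class marks, and the frequencies below are hypothetical illustrations (they are not the entries of Table 1); the sketch assumes that $a$ is the class width and $b$ one of the class marks, so that the coded marks are integers.

```python
def coded_mean_variance(class_marks, freqs, a, b):
    """Sample mean and variance of classified data via coded marks.

    a is the class width and b one of the class marks, so that the coded
    marks y_j = (x_j - b) / a are integers.
    """
    n = sum(freqs)
    y = [round((x - b) / a) for x in class_marks]           # coded marks y_j
    y_bar = sum(f * yj for f, yj in zip(freqs, y)) / n       # coded mean
    m2_y = sum(f * yj * yj for f, yj in zip(freqs, y)) / n   # coded second moment
    s2_y = m2_y - y_bar ** 2                                 # coded variance, eq. (29)
    return a * y_bar + b, a * a * s2_y                       # decode by eq. (31)


# Hypothetical classified data: five class marks, class width 2.
marks = [61, 63, 65, 67, 69]
freqs = [2, 5, 10, 6, 2]
x_bar, s2_x = coded_mean_variance(marks, freqs, a=2, b=65)
print(x_bar, s2_x)  # approximately 65.08 and 4.3136
```

With these assumed data the coded marks are $-2, -1, 0, 1, 2$, so the sums involve only small integers, which is the point of the coding step.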
For the uniform distribution on $c \le x \le c + 1$, a single measurement $X$ allows one to conclude that $X - 1 \le c \le X$; the mean $c + \tfrac12$ would be estimated as $X$, and one knows that, with probability 1, the mean lies between $X - \tfrac12$ and $X + \tfrac12$.

One now proceeds to list properties of the distribution of sample moments when various assumptions are made concerning the distribution $F(x)$. These results are applied below to estimation of accuracy of the estimates.

Distribution of $\bar{x}$ When $\sigma$ Is Known. If $X$ is normally distributed, then $\bar{x}$ is also normally distributed, with mean $\mu$ and variance $\sigma^2/n$ (Sect. 3). Equivalently, one can state that

(32)  $x' = \dfrac{\bar{x} - \mu}{\sigma}\,\sqrt{n}$

has a normal distribution of mean 0 and variance 1. The conclusion is approximately true even if $X$ does not have a normal distribution, provided $n$ is large. (See Chap. 12.)

Distribution of $\bar{x}$ When $\sigma$ Is Unknown. Let $s = \sqrt{s^2}$, the sample standard deviation, and let

(33)  $t = \dfrac{\bar{x} - \mu}{s}\,\sqrt{n - 1},$

so that $t$ can be considered as a random variable. If $X$ is normally distributed, then $t$ has a Student t-distribution with $n - 1$ degrees of freedom. Again the conclusion is approximately true even if $X$ does not have a normal distribution, provided $n$ is large. Furthermore, the t-distribution approaches the normal distribution of mean 0 and variance 1 as $n \to \infty$.

Distribution of $s$ When $\sigma$ Is Unknown. Let

(34)  $u = \dfrac{n s^2}{\sigma^2}.$

If $X$ is normally distributed, then $u$ has a $\chi^2$-distribution with $n - 1$ degrees of freedom. Again the conclusion is approximately true for large $n$, regardless of the form of $F(x)$.

Confidence Intervals and Hypothesis Testing

The results described are now applied to obtain estimates for the accuracy of $\bar{x}$ and $s^2$ as estimates of $\mu$ and $\sigma^2$. The accuracy will be described in the terminology of confidence intervals. The statement "the interval $(a, b)$ is a 95 per cent confidence interval for $\mu$" means that $\Pr(A \le \mu \le B)$ is 0.95, where $A$, $B$ are random variables with observed values $a$, $b$. One can also say "either $a \le \mu \le b$ or an event of probability only 0.05 has occurred in the sampling."

Confidence Intervals for $\mu$ When $\sigma$ Is Known. The 95 per cent interval is obtained from the fact that $(\bar{x} - \mu)\sqrt{n}/\sigma$ has a normal distribution of mean 0 and variance 1. By means of tables (Sect. 9) one determines the number $t_{0.95}$ on the normal density curve such that 95 per cent of the area lies between $-t_{0.95}$ and $t_{0.95}$; that is,

(35)  $\Phi(t_{0.95}) - \Phi(-t_{0.95}) = 0.95.$
$5/n$ and $k > 5$.

Frequently in such a problem the hypothetical distribution is not completely specified, but contains some adjustable parameters. For example, one might wish to test whether a sample comes from a normal population, in which case the mean and variance of the population must first be estimated from the sample. It can be shown that the $\chi^2$-test usually remains valid, provided one further degree of freedom is subtracted for each parameter estimated. More precisely, in order for the test to be valid, the parameters must be estimated by the method of maximum likelihood. See Refs. 4, 5.

7. SEQUENTIAL ANALYSIS

The usual method of collecting data consists of the determination of a fixed number of observations and their subsequent statistical analysis. Frequently a considerable reduction in the number of observations required can be made by making the observations in sequence and reanalyzing the data after each observation. Such a process is known as a sequential analysis and is particularly useful for such problems as production testing.

EXAMPLE. Consider a population whose density function $f(x; \theta)$ depends on some parameter $\theta$ (mean, variance, etc.) whose value is not known; let us suppose that $\theta$ can take only one of two given values $\theta_0$, $\theta_1$. The problem is to decide which value is the correct one. In such a decision problem, errors can be made in two ways: by deciding that $\theta_1$ is correct when $\theta_0$ is actually the true value of $\theta$, or by deciding that $\theta_0$ is correct when $\theta_1$ is actually the true value. Denote the probabilities to be assigned to these two types of errors by $\alpha$ and $\beta$ respectively. The values of $\alpha$ and $\beta$ can be preassigned by an experimenter, and clearly both should be small if one wants to have great confidence in one's decision; however, the smaller $\alpha$ and $\beta$ are taken to be, the more observations will be required to come to a decision.
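Let $x_1, x_2, \ldots$ be the sequence of observed values, and let $f(x; \theta_j)$ denote the density function of the population when $\theta_j$ is the true value of $\theta$, $j = 0, 1$. Define the quantities

(51)  $p_{jn} = \prod_{i=1}^{n} f(x_i; \theta_j) \quad (j = 0, 1).$

Each $p_{jn}$ can be found from the preceding one after each observation by multiplying by the corresponding $f(x_n; \theta_j)$. The decision rule is then the following. If

(52)  $\dfrac{\beta}{1 - \alpha} < \dfrac{p_{1n}}{p_{0n}} < \dfrac{1 - \beta}{\alpha},$

another observation is taken; otherwise sampling stops, with $\theta_0$ accepted when the ratio is at or below the lower limit and $\theta_1$ accepted when it is at or above the upper limit.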
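A sequential test of this kind can be sketched as follows. The sketch assumes Wald's standard stopping limits $\beta/(1-\alpha)$ and $(1-\beta)/\alpha$ for the ratio $p_{1n}/p_{0n}$ (see Ref. 7), and works with logarithms so that the running product of eq. (51) becomes a running sum. The function names, the normal population of unit variance, and the observation values are hypothetical choices made only for illustration.

```python
import math


def sequential_test(observations, density, theta0, theta1, alpha, beta):
    """Sequential probability ratio test between theta0 and theta1.

    Returns (decision, number of observations used), where decision is
    "theta0", "theta1", or "undecided" if the data run out first.
    """
    lower = math.log(beta / (1 - alpha))   # accept theta0 at or below this
    upper = math.log((1 - beta) / alpha)   # accept theta1 at or above this
    log_ratio = 0.0                        # log(p_1n / p_0n), updated per observation
    for n, x in enumerate(observations, start=1):
        log_ratio += math.log(density(x, theta1)) - math.log(density(x, theta0))
        if log_ratio <= lower:
            return "theta0", n
        if log_ratio >= upper:
            return "theta1", n
    return "undecided", len(observations)


def normal_density(x, mean):
    """Normal density with the given mean and unit standard deviation."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


# Hypothetical data: ten readings of 1.2, testing mean 0 against mean 1.
print(sequential_test([1.2] * 10, normal_density, 0.0, 1.0, 0.05, 0.05))
# ('theta1', 5)
```

In this assumed case each observation adds $x - 0.5 = 0.7$ to the log ratio, and the upper limit $\log 19 \approx 2.94$ is first reached at the fifth observation, so the remaining five readings are never needed.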
(u) = f ---= u x2 e -"2 dx FOR 0.00 ~ U ~ (Ref. 10) 2.99. V211' -co .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 .0 .1 .2 .3 .4 .5000 .5398 .5793 .6179 .6554 .5040 .5438 .5832 .6217 .6591 .5080 .5478 .5871 .6255 .6628 .5120 .5517 .5910 .6293 .6664 .5160 .5557 .5948 .6331 .6700 .5199 .5596 .5987 .6368 .6736 .5239 .5636 .6026 .6406 .6772 .5279 .5675 .6064 .6443 .6808 .5319 .5714 .6103 .6480 .6844 .5359 .5753 .6141 .6517 .6879 .5 .6 .7 .8 .9 .6915 .7257 .7580 .7881 .8159 .6950 .7291 .7611 .7910 .8186 .6985 .7324 .7642 .7939 .8212 .7019 .7357 .7673 .7967 .8238 .7054 .7389 .7703 .7995 .8264 7088 .7422 .7734 .8023 .8289 .7123 .7454 .7764 .8051 .8315 .7157 .7486 .7794 .8078 .8340 .7190 .7517 .7823 .8106 .8365 .7224 .7549 .7852 .8133 .8389 1.0 1.1 1.2 1.3 1.4 .8413 .8643 .8849 .90320 .91924 .8438 .8665 .8869 .90490 .92073 .8461 .8686 .8888 .90658 .92220 .8485 .8708 .8907 .90824 .92364 .8508 .8729 .8925 .90988 .92507 .8531 .8749 .8944 .91149 .92647 .8554 .8770 .8962 .91309 .92785 .8577 .8790 .8980 .91466 .92922 .8599 .8810 .8997 .91621 .93056 .8621 .8830 .90147 .91774 .93189 1.5 1.6 1.7 1.8 1.9 .93319 .94520 .95543 .96407 .97128 .93448 .94630 .95637 .96485 .97193 .93574 .94738 .95728 .96562 .97257 .93699 .94845 .95818 .96638 .97320 .93822 .94950 .95907 .96712 .97381 .93943 .95053 .95994 .96784 .97441 .940(]2 .95154 .96080 .95855 .97500 .94179 .95254 .96164 .96926 .97558 .94295 .95352 .96246 .96995 .97615 .94408 .95449 .96327 .97062 .97670 2.0 2.1 2.2 2.3 2.4 .97725 .98214 .98610 .98928 .9 2 1802 .97778 .98257 .98645 .98956 .9 22024 .97831 .98300 .98679 .98983 .9 22240 .97882 .98341 .98713 .9 20097 .9 22451 .97932 .98382 .98745 .9 20358 .9 22656 .97982 .98422 .98778 .9 2 0613 .9 2 2857 .98030 .98461 .98809 .9 2 0863 .9 2 3053 .98077 .98500 .98840 .9 2 1106 .9 23244 .98124 .98537 .98870 .9 2 1344 .9 23431 .98169 .98574 .98899 .9 21576 .9 23613 2.5 2.6 2.7 2.8 2.9 .9 23790 .9 25339 .9 26533 .9 2 7445 .9 28134 .9 2 3963 .9 25473 .9 2 6636 .9 2 7523 .9 28193 .9 2 4132 .9 25604 .9 2 
6736 .9 2 7599 .9 28250 .9 2 4297 .9 25731 .9 26833 .9 27673 .9 28305 .9 2 4457 .9 25855 .9 26928 .9 2 7744 .9 28359 .9 2 4614 .9 25975 .9 2 7020 .9 2 7814 .9 28411 .9 2 4766 .9 2 6093 .9 2 7110 .9 2 7882 .9 2 8462 .9 2 4915 .9 2 6207 .9 2 7197 .9 2 7948 .9 2 8511 .9 25060 .9 2 6319 .9 2 7282 .9 28012 .9 2 8559 .9 25201 .9 26427 .9 2 7365 .9 2 8074 .9 2 8605 U Example: <1>(2.57) = .9 2 4915 = .994915. TABLE P Degrees of freedcm P = 0.99 = 3. THE X 2 DISTRIBUTION the probability ofax2 deviation greater than the tabulated value 0.98 4 0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.02 0.01 0.00393 0.103 0.352 0.711 1:145 1.635 2.167 2.733 3.325 3.940 0.0158 0.211 0.584 1.064 1.610 2.204 2.833 3.490 4.168 4.865 0.0642 0.446 1.005 1.649 2.343 3.070 3.822 4.594 5.380 6.179 0.148 0.713 1.424 2.195 3.000 3.828 4.671 5.527 0.393 7.261 0.455 1.386 2.366 3.357 4.351 5.348 6.346 7.344 8.343 9.342 1.074 2.408 3.665 4.878 6.064 7.231 8.383 9.524 10.656 11.181 1.642 3.219 4.642 5.989 1.289 8.558 9.803 11.030 12.242 13.442 2.706 4.605 6.251 7.779 9.236 10.645 12.017 13.362 14.684 15.981 3.841 5.991 7.815 9.488 11.070 12.592 14.067 15.507 16.919 18.307 5.412 7.824 9.837 11.668 13.388 15.033 16.622 18.168 19.079 21.161 6.635 9.210 11.341 13.271 15.086 16.812 18.475 20.090 21.666 23.209 0.000157 0.0201 0.115 0.297 0.554 0.872 1.239 1.646 2.088 2.558 0.000628 0.0404 0.185 0.429 0.752 1.134 1.564 2.032 2.532 3.059 12 13 14 15 16 17 18 19 20 3.053 3.571 4.107 4.660 5.229 5.812 0.408 7.015 7.633 8.260 3.609 4.178 4.765 5.368 5.985 6.614 7.255 7.906 8.567 9.237 4.575 5.226 5.892 6.571 7.261 7.962 8.672 9.390 10.117 10.851 5.578 6.304 7.042 7.790 8.547 9.312 10.085 10.865 11.651 12.443 6.989 7.807 8.634 9.467 10.307 11.152 12.002 12.857 13.716 14.578 8.148 9.034 9.926 10.821 11. 
721 12.624 13.531 14.440 15.352 16.266 10.341 11.340 12.340 13.339 14.339 15.338 16.338 17.338 18.338 19.337 12.899 14.011 15.119 16.222 17.322 18.418 19.511 20.601 21.689 22.775 14.631 15.812 16.985 18.151 19.311 20.465 21.615 22.760 23.900 25.038 11.215 18.549 19.812 21.064 22.307 23.542 24.769 25.989 27.204 28.412 19.675 21.026 22.362 23.685 24.996 26.296 27.587 28.869 30.144 31.410 22.618 24.054 25.472 26.873 28.259 29.633 30.995 32.346 33.687 35.020 24.725 26.217 27.688 29.141 30.578 32.000 33.409 34.805 36.191 37.566 21 22 23 24 25 26 27 28 29 30 8.891 9.542 10.196 10.&56 11.524 12.198 12.879 13.565 14.256 14.953 9.915 10.600 11.293 11.992 12.697 13.409 14.125 14.847 15.574 16.306 11.591 12.338 13.091 13.848 14.611 15.379 16.151 16.928 17.708 18.493 13.240 14.041 14.848 15.659 16.473 17.292 18.114 18.939 19.768 20.599 15.445 16.314 17.187 18.062 18.940 19.820 20.703 21.588 22.475 23.364 17.182 18.101 19.021 19.943 20.867 21. 792 22.719 23.647 24.577 25.508 20.337 21.337 22.331 23.337 24.331 25.336 26.336 27.336 28.336 29.336 23.858 24.939 26.018 27.096 28.172 29.246 30.319 31.391 32.461 33.530 26.171 27.301 28.429 29.553 30.675 31.795 32.912 34.021 35.139 36.250 29.615 30.813 32.007 33.196 32.671 33.924 35.172 36.415 37.052 38.885 40.113 41.337 42.557 43.773 36.343 31.659 38.968 40.270 41.566 42.856 44.140 45.419 46.693 47.962 38.932 40.289 41.638 42.980 44.314 45.642 46.963 48.273 49.588 50.892 1 2 3 4 5 6 7 8 9 10 11 ~4.382 35.563 36.741 31.916 39.087 40.256 (J) ~ ~ Vi -4 n (J) -- For degrees of freedom greater than 30, the expression v'2xi - V2n' - 1 may be used as a normal deviate with unit variance, where n' is the number of degrees of freedom. Reproduced from Statistical Methods Jor Research Workers, 6th ed., with the permission ot the author. R. A. Fisher. and his publisher. Oliver and Boyd, Edinburgh. ...... Cf ...... -.0 GENERAL MATHEMATICS 13-20 TABLE Degrees of freedom 11 4. 
STUDENT'S t DISTRIBUTION*

Degrees of
freedom     Probability of a deviation greater than t
n           .005     .01      .025     .05      .1       .15
1           63.657   31.821   12.706   6.314    3.078    1.963
2           9.925    6.965    4.303    2.920    1.886    1.386
3           5.841    4.541    3.182    2.353    1.638    1.250
4           4.604    3.747    2.776    2.132    1.533    1.190
5           4.032    3.365    2.571    2.015    1.476    1.156
6           3.707    3.143    2.447    1.943    1.440    1.134
7           3.499    2.998    2.365    1.895    1.415    1.119
8           3.355    2.896    2.306    1.860    1.397    1.108
9           3.250    2.821    2.262    1.833    1.383    1.100
10          3.169    2.764    2.228    1.812    1.372    1.093
11          3.106    2.718    2.201    1.796    1.363    1.088
12          3.055    2.681    2.179    1.782    1.356    1.083
13          3.012    2.650    2.160    1.771    1.350    1.079
14          2.977    2.624    2.145    1.761    1.345    1.076
15          2.947    2.602    2.131    1.753    1.341    1.074
16          2.921    2.583    2.120    1.746    1.337    1.071
17          2.898    2.567    2.110    1.740    1.333    1.069
18          2.878    2.552    2.101    1.734    1.330    1.067
19          2.861    2.539    2.093    1.729    1.328    1.066
20          2.845    2.528    2.086    1.725    1.325    1.064
21          2.831    2.518    2.080    1.721    1.323    1.063
22          2.819    2.508    2.074    1.717    1.321    1.061
23          2.807    2.500    2.069    1.714    1.319    1.060
24          2.797    2.492    2.064    1.711    1.318    1.059
25          2.787    2.485    2.060    1.708    1.316    1.058
26          2.779    2.479    2.056    1.706    1.315    1.058
27          2.771    2.473    2.052    1.703    1.314    1.057
28          2.763    2.467    2.048    1.701    1.313    1.056
29          2.756    2.462    2.045    1.699    1.311    1.055
30          2.750    2.457    2.042    1.697    1.310    1.055
∞           2.576    2.326    1.960    1.645    1.282    1.036

The probability of a deviation numerically greater than t is twice the probability given at the head of the table.

* This table is reproduced from Statistical Methods for Research Workers, with the generous permission of the author, Professor R. A. Fisher, and the publishers, Messrs. Oliver and Boyd.

REFERENCES

1. I. W. Burr, Engineering Statistics and Quality Control, McGraw-Hill, New York, 1953.
2. H. Cramér, The Elements of Probability Theory, Wiley, New York, 1954.
3. W. J. Dixon and F. J. Massey, Introduction to Statistical Analysis, McGraw-Hill, New York, 1951.
4. P. G. Hoel, Introduction to Mathematical Statistics, Wiley, New York, 1947.
5. A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill, New York, 1950.
6. J. Neyman, First Course in Probability and Statistics, Henry Holt, New York, 1950.
7. A. Wald, Sequential Analysis, Wiley, New York, 1947.
8. G. U. Yule and M. G. Kendall, Introduction to the Theory of Statistics, Griffin and Co., London, 1937.
9. Symposium on Monte Carlo Methods, H. A. Meyer, Editor, Wiley, New York, 1956.
10. A. Hald, Statistical Tables and Formulas, Wiley, New York, 1952.

B. NUMERICAL ANALYSIS

Richard F. Clippinger and Joseph H. Levin, Editors

14. Numerical Analysis, by Bernard Dimsdale, Murray Mannos, J. M. Cameron, R. F. Clippinger, J. B. Diaz, Bernard Friedman, Eugene Isaacson, and Robert Richtmyer

Chapter 14

Numerical Analysis

Richard F. Clippinger and Joseph H. Levin, Editors

1. Interpolation, Curve Fitting, Differentiation, and Integration, by Bernard Dimsdale
2. Matrix Inversion and Simultaneous Linear Equations, by Murray Mannos
3. Eigenvalues and Eigenvectors, by Murray Mannos
4. Digital Techniques in Statistical Analysis of Experiments, by Joseph M. Cameron
5. Ordinary Differential Equations, by Richard F. Clippinger
6. Partial Differential Equations, by J. B. Diaz, Richard F. Clippinger, Bernard Friedman, Eugene Isaacson, and Robert Richtmyer
References

1. INTERPOLATION, CURVE FITTING, DIFFERENTIATION, AND INTEGRATION

Bernard Dimsdale

Definitions. Suppose $f(x)$ is a function about which the following is known: at each of $n + 1$ points $x_0, x_1, \ldots, x_n$, called the basic set of points, the numerical value of $f$ or of one of its derivatives is known. It is to be noted that $x$ may represent one or more independent variables. Suppose $g(x; a_0, a_1, \ldots, a_n)$ is given analytically and the $a$'s are determined so that $g$ has the same numerical property as $f$ at each point of the basic set. Then $g$ is called an interpolating function for $f$, and $R = f - g$ is called the remainder.
In the event that $g$ is linear with respect to the $a$'s, that is

$g(x; a) = a_0 g_0(x) + a_1 g_1(x) + \cdots + a_n g_n(x),$

the interpolating function is called linear, and the functions $g_0, g_1, \ldots, g_n$ are called basic interpolating functions. In the further event that $x$ is a single variable and $g_i(x) = x^i$, the function $g$ is called an interpolating polynomial.

If a function $g(x; a_0, \ldots, a_m)$ is given analytically for $m \le n$, any requirement whatsoever on $f - g$ over the basic set establishes $g$ as a curve-fitting function. If that requirement is that

$\sum_{i=0}^{n} [f(x_i) - g(x_i; a)]^2\, w(x_i)$

be minimal, then $g$ is a least square fit to $f$, relative to the weight function $w$, which is presumed to be positive. Again $g$ may be nonlinear, linear, or polynomial.

Interpolation

General Solution of Interpolating Problem. For nonlinear $g$ the definitions imply that the $a$'s can be determined by solving $n + 1$ simultaneous nonlinear algebraic equations. For linear $g$ the equations for $a$ are linear and the problem is solved when an $(n + 1)$st order matrix is inverted, which of course presupposes that it is not singular. No element of this matrix depends on the values of $f$ or its derivatives, so that the inverted matrix can be used for all those functions $f$ for which the conditions of interpolation, the basic interpolating functions, and the basic set of points are the same.

Interpolating Polynomials for Arbitrary Basic Point Sets. If the derivatives of $f$ are not involved in the interpolation, then

$g(x) = \sum_{i=0}^{n} f(x_i)\, \dfrac{h_i(x)}{h_i(x_i)}, \qquad R(x) = \dfrac{f^{(n+1)}(\xi)\, h(x)}{(n + 1)!},$

where $h(x)$ is the product of all $x - x_\nu$, $\nu = 0, 1, \ldots, n$; $h_i(x)$ is the same except that the factor $x - x_i$ is deleted; $f^{(n+1)}(\xi)$ is the $(n + 1)$st derivative of $f(x)$; and $\xi$ is an unknown function of $x$, but is some number between the least and the greatest of the basic set of points. This is Lagrange's formula, and has been put in practicable computing form by Aitken.
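Lagrange's formula can be evaluated directly from its definition: each basic point contributes its function value multiplied by the ratio $h_i(x)/h_i(x_i)$. The following sketch does exactly that; the function name and the cubic test data are illustrative assumptions, and the direct form shown here is not Aitken's tabular scheme.

```python
def lagrange_interpolate(xs, fs, x):
    """Evaluate the interpolating polynomial through (xs[i], fs[i]) at x."""
    total = 0.0
    for i, (xi, fi) in enumerate(zip(xs, fs)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                # Build h_i(x) / h_i(x_i): the factor for point i itself is omitted.
                weight *= (x - xj) / (xi - xj)
        total += fi * weight
    return total


# Hypothetical data: f(x) = x**3 sampled at four basic points.
xs = [0.0, 1.0, 2.0, 3.0]
fs = [x ** 3 for x in xs]
print(lagrange_interpolate(xs, fs, 1.5))  # approximately 3.375
```

Because a cubic is reproduced exactly by four basic points, the remainder vanishes here apart from rounding, in agreement with the factor $f^{(n+1)}(\xi)$ in $R(x)$.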
In Aitken's scheme one forms the table

x_0   f_{0,0}
x_1   f_{1,0}   f_{1,1}
x_2   f_{2,0}   f_{2,1}   f_{2,2}
x_3   f_{3,0}   f_{3,1}   f_{3,2}   f_{3,3}

where

$f_{i,0} = f(x_i), \qquad f_{k,j+1} = f_{j,j} + \dfrac{(x_j - x)(f_{j,j} - f_{k,j})}{x_k - x_j}, \quad k > j \ge 0.$

If sufficient information about some derivative, say the $p$th, is available to show that R = ! (X) = O. Thus four points would have been sufficient.

In the event that derivatives are also given, Neville's procedure applies (see Ref. 1).

Interpolating Polynomials for Uniformly Spaced Points. In the event that the basic set of points has the property that $x_{p+1} - x_p = h$, where $h$ does not change with $p$, the procedure to be followed, if derivatives do not enter, involves a difference table as follows:

x_p       f_p                 Δ²f_{p-1}
                    Δf_p                  Δ³f_{p-1}
x_{p+1}   f_{p+1}             Δ²f_p
                    Δf_{p+1}              Δ³f_p
x_{p+2}   f_{p+2}             Δ²f_{p+1}
                    Δf_{p+2}              Δ³f_{p+1}
x_{p+3}   f_{p+3}             Δ²f_{p+2}

where $\Delta^k f_q = \Delta^{k-1} f_{q+1} - \Delta^{k-1} f_q$ and $f_j = f(x_j)$; that is, any element with a $\Delta$ is the difference of its two adjacent left neighbors, and is obtained by subtracting the upper one from the lower one, and the subscripts on $f$ are constant along a line running diagonally downward to the right. Let

$u = \dfrac{x - x_0}{h}, \qquad (u)_r = \dfrac{u(u - 1) \cdots (u - r + 1)}{r!};$

then

$g(x) = f_0 + (u)_1\, \Delta f_0 + (u)_2\, \Delta^2 f_0 + \cdots + (u)_n\, \Delta^n f_0$

and

$R(x) = f^{(n+1)}(\xi)\, h^{n+1}\, (u)_{n+1}.$
a then $|f^{(n)}(x)| \le 1.36\, n!$. If it is required that the remainder term shall not exceed $10^{-10}$, then for the above $n$'s the $h$'s are 0.0008, 0.013, 0.03, 0.06, and the number of evaluations of integrand per unit $b - a$ is 1250, 77, 33, 16 respectively.

Gauss's Formula. For any $n$ let

$x_i = a + (b - a)\xi_i, \quad i = 0, 1, \ldots, [n/2],$
$x_i = b - (b - a)\xi_{n-i}, \quad i = [n/2] + 1, \ldots, n.$

Then, for $n = 2N$,

$\int_a^b f(x)\,dx = (b - a)\Bigl[A_N f_N + \sum_{i=0}^{N-1} A_i (f_i + f_{2N-i})\Bigr];$

for $n = 2N + 1$,

$\int_a^b f(x)\,dx = (b - a) \sum_{i=0}^{N} A_i (f_i + f_{2N+1-i}),$

where the $A$'s and the $\xi$'s are given in Table 4.

TABLE 4. VALUES OF $A_i$ AND $\xi_i$ IN GAUSS'S FORMULA

n = 1   ξ_0 = 0.21132 48654   A_0 = 0.5
n = 2   ξ_0 = 0.11270 16654   A_0 = 5/18
        ξ_1 = 0.5             A_1 = 4/9
n = 3   ξ_0 = 0.06943 18442   A_0 = 0.17392 74226
        ξ_1 = 0.33000 94782   A_1 = 0.32607 25774
n = 4   ξ_0 = 0.04691 00770   A_0 = 0.11846 34425
        ξ_1 = 0.23076 53449   A_1 = 0.23931 43352
        ξ_2 = 0.5             A_2 = 0.28444 44444
n = 5   ξ_0 = 0.03376 52429   A_0 = 0.08566 22462
        ξ_1 = 0.16939 53068   A_1 = 0.18038 07865
        ξ_2 = 0.38069 04070   A_2 = 0.23395 69672
n = 6   ξ_0 = 0.02544 60438   A_0 = 0.06474 24831
        ξ_1 = 0.12923 44072   A_1 = 0.13985 26957
        ξ_2 = 0.29707 74243   A_2 = 0.19091 50253
        ξ_3 = 0.5             A_3 = 0.20897 95918

The remainder term is of order $h^{2n+1}$. For further development, see Hobson (Ref. 6).

Other Integration Methods. Tchebysheff has developed a method in which the numerical integral has the form $k(f_0 + f_1 + \cdots + f_n)$, which is useful if $f$ represents data subject to uniform errors, since no error is weighted more than another. For multiple integration the methods given here may be applied repeatedly. If the number of repeated integrations is quite large, the Monte Carlo method is useful. For integrals over an infinite range and for infinite integrands, transformations of the variable of integration can frequently be found which remove the difficulty.

2. MATRIX INVERSION AND SIMULTANEOUS LINEAR EQUATIONS

Murray Mannos

General Remarks.
The development of large scale electronic digital computers has made it numerically possible to invert many large matrices and to solve large systems of linear equations heretofore considered impractical because of their size. Problems being attacked by matrix inversion include:

(a) The numerical solution of a differential equation, a partial differential equation, or an integral equation satisfying boundary conditions is often achieved by resolving the problem into a large approximating set of algebraic equations.
(b) A nonlinear problem is frequently replaced by a sequence of linear systems yielding successively improved approximations to the original problem.
(c) Large systems of linear equations, at least in part, are serving as preliminary models for economic and business type problems. The object in linear programming (see Chap. 15), for example, is to maximize (minimize) a linear objective function such as profit (cost) subject to the restraints imposed by a system of linear equations (or inequalities).

If the inverse of the matrix of coefficients of a linear system of equations is already known, the solution to the system is obtained by merely multiplying the inverse by the column vector whose components consist of the constants on the right-hand side of the equalities. In the revised simplex technique (Ref. 7) designed for solving linear programming problems it is the inverse of certain basic column vectors that is calculated at each iteration or stage of the algorithm.

Practical ways of solving systems of linear equations are divided into two categories: the direct and indirect methods.

(a) The direct method yields an exact solution in a finite number of steps provided no roundoff errors are permitted.
(b) The indirect method usually involves an infinite number of iterations to get an exact solution.
In practice one accepts the fact that one cannot get a precise answer but must be satisfied with a result sufficiently close to the exact result. At this point in the indirect method the calculation is broken off. To be really sure that the answer is sufficiently close, either some estimate of roundoff errors must be made or the closeness must be determined, perhaps by some physical considerations. Severe roundoff errors may easily render the results useless.

The discussion will be confined to matrices whose elements are real and to linear systems whose coefficients are real. Many of the methods and results described apply equally well to complex elements and coefficients simply by making appropriate word changes. Furthermore, any matrix of order n with complex coefficients may be represented by a real matrix of order 2n.

No "best method" for either inverting matrices or solving linear systems of equations can be recommended. For any given technique, a matrix or a linear system of equations can always be constructed on which it works poorly but which some other technique handles better. In some cases it is a combination of methods, perhaps a direct followed by an indirect method, that works well for a system of linear equations.

Ill-conditioned matrices, of which the favorite seems to be the Hilbert matrix, impose an extremely stringent test upon the accuracy of any given matrix inversion technique. A measure of the ill-conditioning of a matrix may be taken to be the smallness of its determinant relative to the size of its individual elements. This will suffice here, although more sophisticated measures could be used to interpret the notion of ill-conditioned matrices. The Hilbert matrix is denoted by $H = (h_{ij})$ where $h_{ij} = 1/(i + j + 1)$ $(i, j = 1, 2, \ldots, n)$.
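The determinant measure of ill-conditioning mentioned above can be illustrated with a short sketch. It builds the matrix with $h_{ij} = 1/(i + j + 1)$ as the formula reads in the text (the more common convention is $1/(i + j - 1)$; either choice shows the same behavior) and compares the exact determinant with the smallest element. The function names `hilbert_like`, `determinant`, and `conditioning_report` are illustrative only and do not come from the handbook.

```python
from fractions import Fraction


def hilbert_like(n):
    """Matrix with h_ij = 1/(i + j + 1), i, j = 1..n, as the text states it."""
    return [[Fraction(1, i + j + 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def determinant(matrix):
    """Exact determinant by elimination with row exchanges (rational arithmetic)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != col:
            a[col], a[pivot_row] = a[pivot_row], a[col]
            det = -det  # a row exchange flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def conditioning_report(n):
    """Return (determinant, smallest element) for the order-n matrix."""
    h = hilbert_like(n)
    smallest = min(min(row) for row in h)
    return determinant(h), smallest


if __name__ == "__main__":
    for n in range(2, 7):
        det, smallest = conditioning_report(n)
        print(n, float(det), float(smallest))
```

Even at modest order the determinant is many orders of magnitude smaller than any single element, which is the sense in which such matrices test an inversion routine severely.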
Having obtained by a given technique a not entirely satisfactory approximation for the inverse of a matrix or for a solution to a system of linear equations, one may consider using techniques for improving the inverse of the matrix or the solution to the linear system of equations, as the case may be.

To facilitate the evaluation of procedures for matrix inversion or solution of linear systems for use on digital computers, a summary table of approximate storage requirements and number of operations is presented at the end of the section.

Matrix Inversion

Each nonsingular square matrix A of order n has an inverse $A^{-1}$ such that

$$AA^{-1} = A^{-1}A = I. \tag{1}$$

If for $A = (a_{ij})$ the elements $a_{ij}$ $(i, j = 1, \ldots, n)$ are real, then the elements $b_{ij}$ of $A^{-1} = (b_{ij})$ $(i, j = 1, \ldots, n)$ are also real. If the $a_{ij}$ of the matrix A are specified, the problem is to find the numbers $b_{ij}$ of $A^{-1}$. For certain types of matrices this is relatively simple.

(a) If $D = (d_{ij})$ is a diagonal matrix, that is, $d_{ij} = 0$, $i \ne j$, and $d_{ii} \ne 0$ $(i = 1, 2, \ldots, n)$, then the elements of its inverse $D^{-1} = (b_{ij})$ are $b_{ij} = 0$, $i \ne j$, and $b_{ii} = 1/d_{ii}$ $(i = 1, 2, \ldots, n)$.

(b) If $T = (a_{ij})$ is a nonsingular lower triangular matrix, that is, $a_{ij} = 0$, $i < j$, and $a_{ii} \ne 0$ $(i = 1, 2, \ldots, n)$, the elements of its inverse $T^{-1} = B = (b_{ij})$ can be obtained essentially by solving a series of linear equations in one unknown. Multiplying each of the columns of B by the first row of T yields

$$a_{11}b_{11} = 1; \qquad a_{11}b_{1j} = 0 \quad (j = 2, \ldots, n).$$

This yields $b_{11} = 1/a_{11}$ and $b_{1j} = 0$ $(j = 2, \ldots, n)$. Similarly, multiplying B by the second row of T gives

$$a_{21}b_{12} + a_{22}b_{22} = 1; \qquad a_{21}b_{1j} + a_{22}b_{2j} = 0 \quad (j = 1, \ldots, n;\ j \ne 2).$$

Substituting the known $b_{1j}$ $(j = 1, \ldots, n)$ into the latter equations yields new values $b_{2j}$ $(j = 1, \ldots, n)$ from the resulting n linear equations, each in one of these unknowns. By continuing in this way, multiplication of each of the columns of B by the nth row of T gives

$$a_{n1}b_{1n} + a_{n2}b_{2n} + \cdots + a_{nn}b_{nn} = 1; \qquad a_{n1}b_{1j} + a_{n2}b_{2j} + \cdots + a_{nn}b_{nj} = 0 \quad (j = 1, \ldots, n - 1).$$

Substituting the known $b_{ij}$ $(i = 1, \ldots, n - 1;\ j = 1, \ldots, n)$ yields the values $b_{nj}$ $(j = 1, \ldots, n)$ of the last row of B.

(c) An old standard method for inverting matrices is given by $A^{-1} = (1/\det A)\,\operatorname{adj} A$, where adj A is the transpose of the matrix of cofactors of the elements $a_{ij}$ of the given matrix A. This method is not to be recommended as practical for n greater than 3 or 4.

(d) If one has already computed the characteristic polynomial or, better still, the minimum polynomial of a matrix,

$$m(x) = x^m + a_1 x^{m-1} + \cdots + a_{m-1}x + a_m; \qquad a_m \ne 0,$$

then

$$A^{-1} = (-1/a_m)(A^{m-1} + a_1 A^{m-2} + \cdots + a_{m-1}I),$$

since A satisfies its minimum equation. In general it may be as much trouble calculating the characteristic or minimum polynomial as it is to invert the matrix itself.

(e) Let $A_i$ denote the ith row of the nonsingular matrix A and $I_i$ the ith row of the identity matrix. Then

$$A_i = \sum_{j=1}^{n} a_{ij} I_j \quad (i = 1, \ldots, n).$$

If one has solved for the $I_j$'s in terms of the $A_k$'s, then

$$I_j = \sum_{k=1}^{n} b_{jk} A_k,$$

and the matrix of coefficients of the latter equation is the desired inverse, i.e., $A^{-1} = (b_{ij})$. In general, this method is more cumbersome than a number of the methods described below.

Jordan-Gauss Method. Write the matrix A with the identity matrix beside it as shown:

$$\left[\begin{array}{cccc|cccc} a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1 \end{array}\right] \tag{2}$$

A series of elementary row operations will be applied to A and these will also be applied in the same order to I. When A has been reduced to I by a series of elementary row transformations, then I will in turn be transformed into $A^{-1}$ by the same transformations, and the process will be finished.

If A is nonsingular, then for some $i = 1, \ldots, n$ it follows that $a_{i1} \ne 0$. One can, by an exchange of rows, guarantee that the element in the first row of the first column is different from zero.
In case the matrix $(a_{ij})$ has been altered by an exchange of rows one now denotes the left-hand matrix of (2) by $(b_{ij})$. Then, adding to the ith row $-b_{i1}/b_{11}$ times the first row $(i = 2, \ldots, n)$, the new left-hand matrix of (2) takes the form

$$\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ 0 & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & c_{n2} & \cdots & c_{nn} \end{bmatrix} \tag{3}$$

The minor of order $n - 1$ in the lower right-hand corner of the matrix (3) has rank $n - 1$, so that at least one of the elements $c_{j2} \ne 0$ $(j = 2, \ldots, n)$. Applying the same argument as before to this minor, all elements below the diagonal element of column 2 of the left-hand matrix in (2) may be reduced to zero. Similarly the element in the first row, second column may be reduced to zero. The first column remains unchanged while the second one has been altered to the desired form. By continuing in this way the left-hand side of (2) may be reduced to the diagonal form

$$\begin{bmatrix} b_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & z_{nn} \end{bmatrix}$$

with each diagonal element being different from 0. By dividing the first row of (2) by $b_{11}$, the second row by $d_{22}$, etc., the left-hand matrix of (2) is finally reduced to the identity and the right-hand matrix is now $A^{-1}$.

The diagonal elements $b_{11}, d_{22}, \ldots$ of the first, second, ... columns which are used to reduce the remaining elements of their respective columns to zero are referred to as pivots. Care should be exercised whenever possible not to select a pivot which is too small or too large; otherwise, loss of significance, among other difficulties, may arise.

Numerous variations of the use of elementary row operations for inverting matrices exist in the literature (Ref. 8).

Partition Method. Let the nonsingular $n \times n$ matrix A be partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where $A_{11}$ is an $m \times m$ minor $(m < n)$ which is likewise nonsingular. Then the inverse $A^{-1}$ of A is given by the matrix

$$A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$

where

$$B_{11} = A_{11}^{-1} + X\Delta^{-1}Y, \qquad B_{12} = -X\Delta^{-1}, \qquad B_{21} = -\Delta^{-1}Y, \qquad B_{22} = \Delta^{-1},$$

and

$$X = A_{11}^{-1}A_{12}, \qquad Y = A_{21}A_{11}^{-1}, \qquad \Delta = A_{22} - A_{21}X.$$

Inverting a matrix of order n has been reduced to inverting a matrix of order m, and another of order $n - m$.
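The Jordan-Gauss reduction and the partition formulas can be sketched together as follows. This is a minimal illustration in exact rational arithmetic, not the handbook's own program. Two choices are assumptions of the sketch: the pivot is taken as the entry of largest magnitude in its column (a common convention, whereas the text only warns against pivots that are too small or too large), and each pivot row is normalized at once rather than at the end. The block formulas use $X = A_{11}^{-1}A_{12}$, $Y = A_{21}A_{11}^{-1}$, and $\Delta = A_{22} - A_{21}X$. All function names are hypothetical.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b, sign=1):
    """Elementwise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def negate(a):
    return [[-x for x in row] for row in a]


def jordan_gauss_inverse(matrix):
    """Reduce [A | I] to [I | A^-1] by elementary row operations."""
    n = len(matrix)
    eye = identity(n)
    aug = [[Fraction(v) for v in row] + eye[i] for i, row in enumerate(matrix)]
    for col in range(n):
        # Row exchange: bring the largest-magnitude entry into the pivot position.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot_row][col] == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Clear the rest of the column, above and below the pivot.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partition_inverse(matrix, m):
    """Invert A from its blocks, with A11 the leading m x m block (assumed nonsingular)."""
    a11 = [row[:m] for row in matrix[:m]]
    a12 = [row[m:] for row in matrix[:m]]
    a21 = [row[:m] for row in matrix[m:]]
    a22 = [row[m:] for row in matrix[m:]]
    a11_inv = jordan_gauss_inverse(a11)
    x = matmul(a11_inv, a12)                       # X = A11^-1 A12
    y = matmul(a21, a11_inv)                       # Y = A21 A11^-1
    delta_inv = jordan_gauss_inverse(add(a22, matmul(a21, x), sign=-1))
    b11 = add(a11_inv, matmul(matmul(x, delta_inv), y))
    b12 = negate(matmul(x, delta_inv))
    b21 = negate(matmul(delta_inv, y))
    top = [r1 + r2 for r1, r2 in zip(b11, b12)]
    bottom = [r1 + r2 for r1, r2 in zip(b21, delta_inv)]
    return top + bottom
```

For a small test matrix such as `[[2, 1, 0], [1, 3, 1], [0, 1, 4]]`, both routines return the same inverse for m = 1 and m = 2; the partition route trades one order-n inversion for two smaller inversions and several matrix products.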
However, one has to pay the price of performing a number of matrix multiplications afterwards.

Morris Escalator Method. By starting with the inverse of the $2 \times 2$ principal minor $M_{22}$ in the upper left-hand corner of the nonsingular matrix A one may, by the partition method, obtain the inverse of the $3 \times 3$ principal minor $M_{33}$ in the upper left-hand corner of A. Then $M_{33}^{-1}$ is used to compute the inverse $M_{44}^{-1}$ of the $4 \times 4$ principal minor in the first four rows and columns. Step by step, one dimension at a time, the partition procedure is carried out until $A^{-1}$ is obtained. The process is uninterrupted until the inverse of one of the $M_{ii}$ fails to exist, a fact which is established by noting that the corresponding $\Delta_i = 0$. This situation is remedied by interchanging the ith row with an appropriate row, say the jth, of the remaining $n - i$ rows of A, computing the inverse of the new $i \times i$ principal minor in the left-hand corner, and then continuing as before. In order to obtain $A^{-1}$ one must interchange the ith and jth columns of the resulting inverse so obtained. If several of the inverses of principal minors encountered fail to exist, a similar procedure applies in each instance.

Gram-Schmidt Orthogonalization Method. Premultiplication of the nonsingular matrix A by an appropriate matrix P transforms A into an orthogonal matrix O, i.e.,

$$PA = O. \tag{4}$$

Since the inverse of an orthogonal matrix is its own transpose, it follows from eq. (4) that $A^{-1} = A'P'P$, where $P = DN$. Here N is a unit lower triangular matrix, formed as a product of elementary row-operation matrices, which carries out the orthogonalization of the rows of A,

$$Q_i = A_i - \sum_{j=1}^{i-1} c_{ij} Q_j, \qquad c_{ij} = \frac{A_i Q_j'}{Q_j Q_j'}, \qquad j = 1, 2, \ldots, i - 1,$$

and D is the diagonal matrix

$$D = \begin{bmatrix} 1/|Q_1| & 0 & \cdots & 0 \\ 0 & 1/|Q_2| & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/|Q_n| \end{bmatrix},$$

where $A_i$ denotes the ith row of A, $Q_j$ denotes the jth row of $Q = NA$, and $|Q_i|$ denotes the length of the ith row of Q considered as a row vector.

Inversion of Modified Matrices. If the inverse of a matrix A is known, the inverse of a matrix differing from A in only an element, a row, or a column can be found as a result.
If the matrix differs from A by several elements, rows, or columns, its inverse may be realized by repeated application of this method. The method is based on the matrix identity

$$(A + xy')^{-1} = A^{-1} - \frac{(A^{-1}x)(y'A^{-1})}{1 + y'A^{-1}x}, \tag{5}$$

where x and y are arbitrary column vectors. The matrix $xy'$ can be made to consist of all zeros except the element in the ith row and jth column, where it is to contain a fixed value c. This is easily achieved by taking $x = ce_i$ and $y = e_j$, where $e_i$ is the unit column vector containing a 1 in the ith position and 0 elsewhere. By taking $y = e_i$ the matrix $xy'$ has x for its ith column and all other columns consist of zeros. Hence, if the vector x stands for the vector difference of the ith column of the matrix whose inverse is desired and the ith column of A, the required inverse is obtained from eq. (5). A similar argument applies if the matrices differ only in one row.

Improving a Computed Inverse (Hotelling and Bodewig, see Refs. 9 and 10). Suppose that the matrix $C_0$ is considered a sufficiently good approximation to the inverse of the matrix A so that $B = I - AC_0$ has very small elements. If necessary, for some specific purpose, the computed inverse can be improved by forming the sequence

$$C_k = C_{k-1}(I + B^{2^{k-1}}), \qquad k = 1, 2, \ldots.$$

Actually, the sequence converges to $A^{-1}$ and so $A^{-1}$ is expressible in the following form of an infinite product:

$$A^{-1} = C_0 \prod_{k=1}^{\infty} (I + B^{2^{k-1}}). \tag{6}$$

Very frequently the improvement found by computing $C_0(I + B)$ or perhaps $C_0(I + B)(I + B^2)$ is sufficiently satisfactory. Although there are a number of variations other than eq. (6) for expressing $A^{-1}$, the present scheme has some merit when using an electronic digital computing machine, since it is only necessary to keep successively squared powers of B, adding each to the identity matrix I, and premultiplying by the last computed approximation to $A^{-1}$.

Systems of Linear Equations:
Direct Methods

Direct methods arrive at an exact solution in a finite sequence of arithmetical operations.

Elimination. Given a set of $m \le n$ linear equations in n unknowns

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \tag{7}$$

or more briefly in matrix notation $Ax = b$, the augmented matrix $(A \mid b)$ is operated on by a sequence of elementary row operations which reduce the matrix of coefficients A to echelon form (see Chap. 3). If a row of the reduced form of $(A \mid b)$ is of the form $(0, 0, \ldots, 0, c)$, where $c \ne 0$, the system (7) is inconsistent; otherwise, it is consistent. Arbitrary values are assigned to those x's which do not correspond to a leading coefficient of 1 in some line, while the remaining x's may be solved for in terms of these parameters one at a time, each as a linear equation in one unknown whose coefficient is 1.

Note. In the remainder of this section only the case with $m = n$ and the matrix A nonsingular will be considered.

Use of Cramer's Rule. Let $A(k)$ denote the matrix constructed from A in (7) by replacing column k by the column b of right-hand coefficients. Then the unique solution to (7) is given by Cramer's rule in the following form as a ratio of determinants:

$$x_k = \frac{\det A(k)}{\det A} \quad (k = 1, \ldots, n).$$

For $n > 3$ or 4 this method is not to be recommended as efficient.

Known Inverse. If the inverse $A^{-1}$ of A has already been calculated by any of the previously described or perhaps other methods, the solution in matrix form is given by $x = A^{-1}b$. However, if $A^{-1}$ must be computed for the sole purpose of getting x, the method is not always efficient for large values of n.

Conjugate Gradient Method. Most of the iterative schemes involve an infinite number of iterations and so are classified as indirect methods. However, an outstanding iterative scheme called the conjugate gradient method involves but a finite number of iterations and so is classified as a direct method.
Because of the way in which the algorithm for this scheme is built up, it seems more appropriate to discuss it after the gradient method, an indirect method. The elegant finite algorithm for the conjugate gradient method seems to have been independently discovered by Stiefel, Hestenes, and Lanczos (Ref. 11). For a linear system $Ax = b$, $\det A \ne 0$, of n equations the algorithm starts with an initial guess $x_0$, builds up successive approximations $x_1, \ldots, x_n$, and finally terminates after at most n of these steps or iterations. The corresponding residual vectors

$$r_i = b - Ax_i \quad (i = 0, 1, \ldots, n)$$

so formed are each orthogonal to the preceding ones. If $r_i \ne 0$ $(i = 0, 1, \ldots, n - 1)$, then $r_n$ orthogonal to each $r_i$ means $r_n$ must be the null vector 0, since n + 1 linearly independent vectors of dimension n cannot exist.

Systems of Linear Equations: Indirect Methods

By and large this discussion includes most iterative methods, since it takes an infinite number of steps to carry through the whole process. An iteration for solving a system of linear equations is a set of rules for operating on an approximate solution $(x_1^{(k)}, \ldots, x_n^{(k)})$ to obtain an improved or more precise solution $(x_1^{(k+1)}, \ldots, x_n^{(k+1)})$. The sequence of approximate solutions so defined must converge to the actual solution of the given system of equations. In some cases it is a pronounced advantage to start out with a rather good initial approximation $(x_1^{(0)}, \ldots, x_n^{(0)})$, whereas in others this is not necessarily true. It is frequently advantageous to improve the solution obtained by a direct method by a few iterations, since the direct solution usually is afflicted with roundoff errors.

Seidel Method. One starts off with a guess $(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)})$ as the initial solution to the linear system (7).
Substituting in the first equation of (7) the values $x_2^{(0)}$ for $x_2$, $x_3^{(0)}$ for $x_3$, ..., and finally $x_n^{(0)}$ for $x_n$, and then solving for $x_1$ yields a new value $x_1^{(1)}$ as the first component of the next approximate solution. Next, by substituting in the second equation of (7) the newly gained value $x_1^{(1)}$ for $x_1$ and $x_3^{(0)}$ for $x_3$, ..., $x_n^{(0)}$ for $x_n$, and then solving for $x_2$, one obtains a new value $x_2^{(1)}$ as the second component of the next approximation. Continuing in this fashion and finally substituting in the nth or last equation of (7) the values $x_1^{(1)}$ for $x_1$, $x_2^{(1)}$ for $x_2$, ..., $x_{n-1}^{(1)}$ for $x_{n-1}$ and solving for $x_n$ yields the final component $x_n^{(1)}$ of the new iterate $(x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)})$. The approximation $(x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)})$ is then used in the next iteration to obtain the improved approximation $(x_1^{(2)}, x_2^{(2)}, \ldots, x_n^{(2)})$. One continues in this way. The process is very well adapted to machine usage.

Convergence is assured when either the matrix of coefficients A is positive definite or when the diagonal element of the ith row dominates the rest of the row for each i, that is, when

$$|a_{ii}| > \sum_{j \ne i} |a_{ij}| \quad (i = 1, 2, \ldots, n).$$

Convergence is also guaranteed for additional types of matrices, and there are a number of variations of this procedure. In particular, the "back and forth" Seidel method due to Aitken and Rosser was especially designed to handle those cases in which convergence of the regular Seidel method was erratic.

Relaxation Method. First write the system (7) in the form

$$\begin{aligned} b_1 - a_{11}x_1 - a_{12}x_2 - \cdots - a_{1n}x_n &= 0 \\ b_2 - a_{21}x_1 - a_{22}x_2 - \cdots - a_{2n}x_n &= 0 \\ &\;\;\vdots \\ b_n - a_{n1}x_1 - a_{n2}x_2 - \cdots - a_{nn}x_n &= 0 \end{aligned} \tag{8}$$

and assume that none of the diagonal elements $a_{ii}$ $(i = 1, \ldots, n)$ is equal to zero. Then take $x^{(0)} = (x_1^{(0)}, x_2^{(0)}, \ldots, x_i^{(0)}, \ldots, x_n^{(0)})$ as an initial guess to the solution. If it should accidentally happen that $x^{(0)}$ satisfies (8), one is finished.
If not, define the residual vector by $r^{(0)} = (r_1^{(0)}, r_2^{(0)}, \ldots, r_n^{(0)})$, where $r_i^{(0)}$ $(i = 1, 2, \ldots, n)$ is the value or residual obtained by substituting $x^{(0)}$ in the left-hand side of the ith equation of (8). Suppose that $r_i^{(0)}$ is a component of largest magnitude in $r^{(0)}$. The object then is to reduce the residual $r_i^{(0)}$ to 0 by altering the value of the ith component $x_i^{(0)}$ of $x^{(0)}$ while keeping the remaining components of $x^{(0)}$ fixed. The next trial solution $x^{(1)}$ is constructed as follows:

$$x_k^{(1)} = x_k^{(0)} \quad (k = 1, \ldots, n;\ k \ne i), \qquad x_i^{(1)} = x_i^{(0)} + \frac{r_i^{(0)}}{a_{ii}}.$$

This effects a new set of residuals $r^{(1)}$ with ith residual equal to 0. Select the residual of maximum magnitude and similarly apply the same scheme as above to obtain $x^{(2)}$. This process is repeated again and again so as ultimately to reduce all residuals to as close to 0 as possible. It is sometimes possible to speed up convergence by picking residuals not necessarily of maximum magnitude. In fact, by varying several of the variables at one time it may be possible to speed up convergence considerably. However, it would be very difficult to write a code including many such variations and tricks.

Note. In the following sections it is often convenient to introduce a measure or metric different from the usual one in order to cut down on the amount of computation required.

Approximations. Let A be a symmetric positive definite matrix; then the length of a vector x with respect to the metric A is defined as $|x|_A = (x'Ax)^{1/2}$, and any two vectors x and y are conjugate or A-orthogonal if $x'Ay = 0$. These are extensions of the usual definitions of length and orthogonality. The latter may be obtained from the new definitions by taking $A = I$.

With respect to the usual metric, $|Ax - b|^2 = 0$ if and only if $Ax - b = 0$. This means that solving $Ax = b$ is equivalent to finding an x such that $|Ax - b|^2$ is minimized, since it is known that 0 is its minimum value.
Likewise with respect to the generalized metric B, $|Ax - b|_B^2 = 0$ if and only if $Ax - b = 0$, since B must be positive definite. Now let

$$f(x) = |Ax - b|_B^2 \tag{9}$$

and consider the family of hyperellipsoids

$$f(x) = k, \tag{10}$$

where k may take on any constant value. Then the solution of the system $Ax = b$ is the common center of the family of ellipsoids eq. (10). The game then is to construct a set of approximations $x^{(0)}, x^{(1)}, \ldots$ which get us to or close to this center. The more rapidly this happens the less computation is involved.

Gradient Method. Start with a guess $x^{(0)}$ as an initial approximation to the solution of $Ax = b$. The ellipsoid of the family (10) obtained by setting $k = f(x^{(0)})$ passes through the point $x^{(0)}$ in n-dimensional space. Then proceed in the direction of the gradient of $-f(x)$ at $x^{(0)}$, that is, along the inner normal to the ellipsoid $f(x) = f(x^{(0)})$. It is known that $f(x)$ decreases most rapidly along the latter direction, and so it is natural to proceed in this direction until one arrives at the minimum of $f(x)$ along this inner normal. This happens at that point $x^{(1)}$ where the inner normal becomes a tangent to one of the family of ellipsoids in eq. (10). Similarly, proceed along the inner normal of the ellipsoid $f(x) = f(x^{(1)})$ until the minimum of $f(x)$ in this direction is reached. Continue in this way and work in closer and closer to the common center of the family of ellipsoids in (10).

The algebraic procedure for solving $Ax = b$ with respect to the metric B according to the geometric scheme described above is as follows:

(a) Compute $C = A'BA$, $c = A'Bb$.
(b) Make an initial guess $x^{(0)}$.
(c) Use the following algorithm to obtain the approximation $x^{(i+1)}$ from that of $x^{(i)}$.
(i) Calculate the vector $z^{(i)}$ in the direction of the gradient of $f(x)$ at $x^{(i)}$; i.e., $z^{(i)} = Cx^{(i)} - c$.
(ii) Calculate $a_i = \dfrac{(z^{(i)})'z^{(i)}}{(z^{(i)})'Cz^{(i)}}$.
(iii) Obtain $x^{(i+1)} = x^{(i)} - a_i z^{(i)}$, where the coefficient $a_i$ determines the minimum value of $f(x)$ in eq.
(9) along the inner normal to $f(x) = f(x^{(i)})$ at $x^{(i)}$.

If A is a symmetric positive definite matrix, it is most convenient to choose the metric $B = A^{-1}$, for then A replaces C and b replaces c throughout the above algorithm, with a resulting simplification. A considerable advantage of the gradient method is that there need not be an accumulation of roundoff error, since the vector $z^{(i)}$ along the gradient can be recalculated for each iteration.

The function $f(x)$ in eq. (9) may be regarded as a measure of the closeness of an approximation x to the true solution $A^{-1}b$. For the gradient method it is true that $f(x^{(i+1)}) < f(x^{(i)})$ for each i and that $f(x^{(i)})$ approaches 0 in the limit; that is, $x^{(i)}$ converges steadily toward the true solution $A^{-1}b$. However, it is still true that the convergence may be slow or, in other words, it may take many iterations to get close to the center of the ellipsoids. A number of variations of the gradient method have been devised to try to speed up the convergence.

Conjugate Gradient Method. First consider the case where A is symmetric and positive definite. The object in the conjugate gradient method, as in the gradient method, is to get to the common center of the family of ellipsoids eq. (10). However, the route taken in the conjugate gradient method is different from that of the gradient method and is so modified as to get to the center of the family eq. (10) in but a finite number of steps, namely, at most n iterations.

The procedure in three dimensions will be described. The discussion in higher dimensions follows along similar lines.

As before, make an initial guess $x^{(0)}$ and proceed from $x^{(0)}$ along the negative gradient of $f(x)$ or, what is the same, along the inner normal of the three-dimensional ellipsoid $f(x) = f(x^{(0)})$. Take as the next approximation the point $x^{(1)}$, which is the midpoint of the resulting chord of the ellipsoid $f(x) = f(x^{(0)})$.
Consider the diametral plane through $x^{(1)}$ containing the locus of midpoints of the chords of $f(x) = f(x^{(0)})$ which are parallel to the direction of the inner normal. The diametral plane so formed cuts out a two-dimensional elliptic cross section from the ellipsoid $f(x) = f(x^{(0)})$. The common center of the ellipsoids (10) of interest lies in this two-dimensional elliptic cross section, and the method is designed so that all subsequent approximating points shall remain trapped in this cross section. The diametral plane of $f(x) = f(x^{(0)})$ is likewise a diametral plane of the interior ellipsoid $f(x) = f(x^{(1)})$ of the family (10) and cuts it in a two-dimensional elliptic cross section lying within the previous one cut from $f(x) = f(x^{(0)})$.

Next proceed from $x^{(1)}$ along the gradient of $f(x)$ within the last elliptic cross section formed and take for $x^{(2)}$ the midpoint of the chord so formed in the ellipse. In other words, instead of proceeding from $x^{(1)}$ along the inner normal of the ellipsoid $f(x) = f(x^{(1)})$ as in the gradient method, proceed along the inner normal of its cross section made by the diametral plane through $x^{(1)}$. Again the locus of centers of chords parallel to the chord through $x^{(1)}$ and $x^{(2)}$ forms a diameter of the elliptic cross section of $f(x) = f(x^{(1)})$, which contains not only $x^{(2)}$ but also the center of the family (10). Next proceed from $x^{(2)}$ along this diameter, choosing its center as the new and final approximation $x^{(3)}$. Barring roundoff error, $x^{(3)}$ yields the exact solution to a linear system of three equations in three unknowns.

If either of the chords mentioned above, passing through $x^{(0)}$ and $x^{(1)}$ respectively, happens to pass also through the center of the family (10), the process will end in only one or two iterations, respectively, instead of three. This will be indicated by the residual $r^{(1)} = 0$ or $r^{(2)} = 0$, respectively.
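For comparison with the conjugate gradient route just described, the plain gradient method of steps (a) to (c) can be sketched as follows. The sketch assumes A is symmetric positive definite and takes the metric $B = A^{-1}$, so that $C = A$ and $c = b$; the step length is $a_i = z'z / z'Cz$, the value that minimizes $f$ along the direction $z$. The function name, the iteration cap, and the stopping tolerance are assumptions of the sketch, not part of the handbook's description.

```python
def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gradient_method(a, b, x0, max_iter=500, tol=1e-12):
    """Gradient (steepest descent) iteration for symmetric positive definite A.

    With the metric B = A^-1, the vector z is simply the residual Ax - b.
    Returns the final approximation and the number of steps taken.
    """
    x = list(x0)
    steps = 0
    for _ in range(max_iter):
        z = [axi - bi for axi, bi in zip(matvec(a, x), b)]   # z = Ax - b
        zz = dot(z, z)
        if zz <= tol * tol:          # residual small enough: stop
            break
        step = zz / dot(z, matvec(a, z))                     # a_i = z'z / z'Az
        x = [xi - step * zi for xi, zi in zip(x, z)]         # x <- x - a_i z
        steps += 1
    return x, steps


if __name__ == "__main__":
    solution, steps = gradient_method([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
    print(solution, steps)
```

Because z is recomputed from the current x on every pass, roundoff from earlier steps is not carried forward, which is the advantage noted in the text; the price is that the number of steps is not bounded by n.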
Algorithms for a symmetric positive definite matrix of order n and for the general n-dimensional case, respectively, will be given below, where $p^{(i)}$ denotes a vector in the direction from $x^{(i)}$ to $x^{(i+1)}$.

(a) Pick $x^{(0)}$; then let $p^{(0)} = r^{(0)} = b - Ax^{(0)}$,
(b) $a_i = \dfrac{|r^{(i)}|^2}{(p^{(i)})'Ap^{(i)}}$,
(c) $x^{(i+1)} = x^{(i)} + a_i p^{(i)}$,
(d) $r^{(i+1)} = r^{(i)} - a_i Ap^{(i)}$,
(e) $b_i = \dfrac{|r^{(i+1)}|^2}{|r^{(i)}|^2}$,
(f) $p^{(i+1)} = r^{(i+1)} + b_i p^{(i)}$, (11)

where the coefficient $a_i$ is selected to make $x^{(i+1)}$ the appropriate distance from $x^{(i)}$, and $b_i$ to keep $p^{(i+1)}$ in the appropriate direction as described above.

The algorithm eqs. (11) may be applied to a matrix which is symmetric and positive semidefinite as well as to a symmetric positive definite matrix. In the case where A is a general matrix, the system $Ax = b$ is replaced by the equivalent system

$$A'Ax = A'b, \tag{12}$$

where $A'A$ is symmetric and positive semidefinite. The algorithm (11) could thus be applied to eq. (12), but in order to avoid the roundoff errors due to computing $A'A$, it is better to use the following algorithm, which leads to theoretically equivalent results.

(a) Pick $x^{(0)}$; then let $r^{(0)} = b - Ax^{(0)}$, $p^{(0)} = A'r^{(0)}$,
(b) $a_i = \dfrac{|A'r^{(i)}|^2}{|Ap^{(i)}|^2}$,
(c) $x^{(i+1)} = x^{(i)} + a_i p^{(i)}$,
(d) $r^{(i+1)} = r^{(i)} - a_i Ap^{(i)}$,
(e) $b_i = \dfrac{|A'r^{(i+1)}|^2}{|A'r^{(i)}|^2}$,
(f) $p^{(i+1)} = A'r^{(i+1)} + b_i p^{(i)}$.

The conjugate gradient method has numerous advantages in addition to those already mentioned. One may start all over again with the last approximation obtained as the initial approximation in order to nullify the effects of accumulated roundoff errors. Also, each successive approximation is better than its predecessor. It is very important to note that the given matrix is unchanged during the procedure, so that the original data are used again and again. This permits use of special properties of the given matrix such as its particular form or sparseness. A number of variations of this technique have been devised.
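Both conjugate gradient algorithms can be sketched directly from steps (a) to (f). The sketch below works in exact rational arithmetic so that the termination in at most n steps is visible without roundoff; in practice floating-point numbers would be used and the loop would stop on a small residual instead. The function names are illustrative and the early exit on a zero residual is an assumption of the sketch.

```python
from fractions import Fraction


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def transpose(a):
    return [list(col) for col in zip(*a)]


def conjugate_gradient(a, b, x0):
    """Algorithm (11): A symmetric positive definite, at most n iterations."""
    x = [Fraction(v) for v in x0]
    r = [Fraction(bi) - axi for bi, axi in zip(b, matvec(a, x))]   # (a)
    p = r[:]
    for _ in range(len(b)):
        rr = dot(r, r)
        if rr == 0:                      # residual vanished: x is exact
            break
        ap = matvec(a, p)
        alpha = rr / dot(p, ap)                                    # (b)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]              # (c)
        r = [ri - alpha * api for ri, api in zip(r, ap)]           # (d)
        beta = dot(r, r) / rr                                      # (e)
        p = [ri + beta * pi for ri, pi in zip(r, p)]               # (f)
    return x


def conjugate_gradient_general(a, b, x0):
    """General nonsingular A: the second algorithm, which never forms A'A."""
    at = transpose(a)
    x = [Fraction(v) for v in x0]
    r = [Fraction(bi) - axi for bi, axi in zip(b, matvec(a, x))]   # (a)
    s = matvec(at, r)                    # s = A'r
    p = s[:]
    for _ in range(len(b)):
        ss = dot(s, s)
        if ss == 0:                      # A'r = 0 implies r = 0 for nonsingular A
            break
        ap = matvec(a, p)
        alpha = ss / dot(ap, ap)                                   # (b)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]              # (c)
        r = [ri - alpha * api for ri, api in zip(r, ap)]           # (d)
        s = matvec(at, r)
        beta = dot(s, s) / ss                                      # (e)
        p = [si + beta * pi for si, pi in zip(s, p)]               # (f)
    return x
```

As a usage example, `conjugate_gradient([[4, 1, 0], [1, 3, 1], [0, 1, 2]], [6, 10, 8], [0, 0, 0])` reaches the solution (1, 2, 3) within three iterations. Note that the matrix A is only ever used in matrix-vector products, which is why sparseness or special form can be exploited.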
A great many of the most important works in the field are to be found in the extensive bibliographies of works by Forsythe and Householder (Refs. 8, 9, 12, and 13).

Computer Storage Requirements and Number of Operations. Storage requirements for a given problem will vary in general with the machine, with the programmer, and with the layout of the program. Hence, in Table 5 the number of storage locations required for the program of a given technique of matrix inversion or solution of a linear system shall simply be denoted by the symbol w.

A multiplication or a division will be identified simply as a multiplication. Likewise an addition or a subtraction will be identified as an addition. Since a multiplication requires from about 2 to 10 times as much time as an addition on most computers, greater weight should accordingly be apportioned to the number of multiplications. If the number of multiplications required for a given technique turns out to be, for example, $2n^3 + 3n + 1$, then $3n + 1$ is negligible compared with $2n^3$ when n is sufficiently large. One says the number of multiplications required in this case is of the order $2n^3$, and this is simply indicated by $2n^3$.

In the case of the indirect procedures such as the Seidel, relaxation, and gradient methods, the number of iterations necessary for a satisfactory solution varies from problem to problem. In fact, the number of iterations required depends upon the original system of equations, the choice of the initial solution, and the accuracy stipulated beforehand. In these cases storage requirements and the number of operations are given for one iteration. For the conjugate gradient method these will be given totally for all n iterations.

TABLE 5. COMPUTER STORAGE REQUIREMENTS AND NUMBER OF OPERATIONS FOR MATRIX INVERSION AND LINEAR SYSTEMS OF EQUATIONS

n = the order of matrix involved
w = the number of storage locations required for the computer program of a given technique.
Method Storage Requirements Multiplications Additions Matrix Inversion +w +w ~·n2 + w n 2 + 2n + w n3 2n 2 n2 Jordan-Gauss Morris escalator Gram-Schmidt Modified matrix -~n3 ~ln3 (a) one element n 2 (b) one row or column 2n 2 (c) whole matrix 2n3 n 3 3 in J-I_n 3 (a) n 2 (b) 2n 2 (c) 2n 3 Linear Systems of Equations Elimination Seidel (one iteration) Relaxation (one iteration) Gradient (one iteration) Conjugate gradient (one iteration) +n +w n +n +w n +n +w n 2 + 5n + 1 + w Symmetric positive definite 2n + 6n + 2 + w General case 4n + 5n + 2 + w n 3/3 n 3/3 2 n2 n2 2 n2 n2 2n2 2n2 2 n2 n2 2 3n2 3n2 n2

3. EIGENVALUES AND EIGENVECTORS

Murray Mannos

General Remarks. The characteristic equation of a matrix together with the corresponding eigenvalues (characteristic values) and eigenvectors (characteristic vectors) plays a fundamental role in the theory of mechanical or electrical vibrations. Examples: the flutter vibrations of an airplane wing, the elastic vibrations of a skyscraper or bridge, the buckling of an elastic structure, the transient oscillations of an electric network, and mechanical wave vibrations of molecules and atoms.

Similar remarks concerning direct and indirect methods, roundoff errors, etc., apply to the finding of the eigenvalues and eigenvectors as to the inverting of a matrix and the solution of a linear system of equations (see Refs. 8, 9, and 12-14).

In practice, it usually happens that all the eigenvalues of a matrix are distinct. This gives rise to a matrix A which can be diagonalized by a similarity transformation. Under a similarity transformation the eigenvalues of A remain invariant. A symmetric matrix can be diagonalized by an orthogonal transformation and similarly a Hermitian matrix can be diagonalized by a unitary transformation. Hence, these types of matrices are frequently singled out for special treatment by somewhat less general methods than apply to the most general type of matrix.
Matrices which cannot be diagonalized by means of a similarity transformation or whose eigenvalues are multiple or very closely spaced cause the procedures to become more complex. Results concerning the bounds of eigenvalues are sometimes useful in helping to isolate them. In numerous cases it suffices to find either the dominant or the least eigenvalue.

The elements of the matrix A will usually be complex elements but they may be confined to be real numbers in some instances. The matrix A itself will always be of finite order.

Approximations for digital computer storage requirements and number of operations for finding the eigenvalues and eigenvectors of a matrix cannot be given as readily as in the cases of matrix inversion and the solution of systems of linear equations. This is because the solution of an eigenvalue problem often consists of a number of major segments, such as an iteration, the reduction of a matrix to a direct sum of triple diagonal matrices whose sizes depend on the original matrix, the solution of complex equations, the evaluation of transcendental functions at specific places, or the consideration of a sequence of Sturm functions. In the case of the triple diagonal method, consideration of the computer aspects has been broken down in terms of the more important segments. Similarly, computer information for one step of the reduction process for finding eigenvalues of a symmetric matrix by the Jacobi method is also given in Table 6 at the end of this section.

Characteristic Polynomial. The characteristic polynomial $f(\lambda)$ of a matrix A of order n over the complex number system may be defined as

$$\det(\lambda I - A) = \det \begin{bmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{bmatrix} = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = f(\lambda). \tag{13}$$

The matrix $\lambda I - A$ has elements which are polynomials in $\lambda$ with complex coefficients. The characteristic polynomial may be found by the following methods:

1.
The theory of determinants for such matrices is developed along the same lines as for matrices whose elements are real or complex numbers. Hence $\det(\lambda I - A)$ can be expanded along any row or column to obtain the characteristic polynomial. This method is not to be recommended for $n > 3$.

2. The coefficients $c_1, c_2, \ldots, c_n$ of the characteristic polynomial in eq. (13) may be obtained from subdeterminants of the matrix A itself: $c_1 = -(a_{11} + a_{22} + \cdots + a_{nn})$ is the negative of the sum of the diagonal elements of A, or simply the negative of the trace of A; $c_2$ is the sum of the determinants of the 2 × 2 principal minors of A (i.e., the totality of minors having two of their elements on the diagonal of A); $c_3$ is the negative of the sum of the determinants of the 3 × 3 principal minors of A; ...; $c_n = (-1)^n \det A$. Likewise, this method is not to be recommended for $n > 3$.

3. A finite iterative scheme based on repeated premultiplication by the matrix A also yields the coefficients $c_1, c_2, \ldots, c_n$ of eq. (13). This is the so-called Souriau-Frame algorithm:

$$A_1 = A, \quad c_1 = -\operatorname{trace} A_1, \quad A_k = A(A_{k-1} + c_{k-1} I), \quad c_k = -\frac{1}{k}\operatorname{trace} A_k \quad (k = 2, 3, \ldots, n).$$

4. Another way of finding the characteristic polynomial of a matrix A is to build it up one degree at a time by finding the characteristic polynomials of the upper left-hand minors of A in increasing size. Let $M_i$ denote the upper left-hand minor of A of order i, $I_i$ the unit matrix of order i, and $f_i(\lambda)$ the characteristic polynomial of $M_i$. Since $(\lambda I_i - M_i)\,\operatorname{adj}(\lambda I_i - M_i) = f_i(\lambda) I_i$, it follows from a consideration of the last column that

$$(\lambda I_i - M_i)\begin{bmatrix} b_{1i}(\lambda) \\ \vdots \\ b_{i-1,i}(\lambda) \\ f_{i-1}(\lambda) \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ f_i(\lambda) \end{bmatrix}, \tag{14}$$

where the column with elements $b_{1i}(\lambda), \ldots, b_{i-1,i}(\lambda), f_{i-1}(\lambda)$ is the ith or last column of $\operatorname{adj}(\lambda I_i - M_i)$. From the first $i - 1$ rows of the expressions in eq. (14) the coefficients of the polynomials $b_{ki}(\lambda)$ ($k = 1, 2, \ldots, i-1$) are determined by comparing the various powers of $\lambda$. The leading coefficient of each of the $b_{ki}(\lambda)$ ($k = 1, 2, \ldots, i-1$) is determined by comparing coefficients of $\lambda^{i-1}$.
Then by using these known coefficients and by comparing coefficients of $\lambda^{i-2}$, the second coefficients of each of the polynomials $b_{ki}(\lambda)$ ($k = 1, 2, \ldots, i-1$) are obtained. By continuing in this way the $b_{ki}$ ($k = 1, 2, \ldots, i-1$) are completely determined. If the known $b_{ki}$ ($k = 1, 2, \ldots, i-1$) are now substituted in the equation formed by setting the ith or last rows of eq. (14) equal, the polynomial $f_i(\lambda)$ is determined. One first forms $f_1(\lambda) = \lambda - a_{11}$ and uses the above technique to find $f_2(\lambda)$ from $f_1(\lambda)$, etc., until finally $f(\lambda) = f_n(\lambda)$ is obtained from $f_{n-1}(\lambda)$.

5. The method of finite iterations may be used to obtain a polynomial equation from which some of the eigenvalues of a matrix A may be obtained. Let $x \neq 0$ be an arbitrary vector and form $Ax$. If $x$ and $Ax$ are linearly independent, form $A^2x$. If $x$, $Ax$, and $A^2x$ are linearly independent, form $A^3x$, etc. Continue in this way until one ultimately comes to a sequence $x, Ax, A^2x, \ldots, A^kx$ which is linearly dependent. This must happen for some $k \leq n$, since at most n vectors are linearly independent. That is,

$$A^k x + c_1 A^{k-1} x + \cdots + c_{k-1} A x + c_k x = 0. \tag{15}$$

Form the corresponding polynomial

$$p_k(\lambda) = \lambda^k + c_1 \lambda^{k-1} + \cdots + c_{k-1}\lambda + c_k. \tag{16}$$

The polynomial $p_k(\lambda)$ of eq. (16) is a factor of the minimum polynomial $m(\lambda)$ of A, which will be defined explicitly in the subsection on eigenvalues and eigenvectors. If $k = m$, where m is the degree of $m(\lambda)$, then $p_k(\lambda)$ coincides with the minimum polynomial $m(\lambda)$. Finally, if $k = n$, then $p_k(\lambda)$, the minimum polynomial $m(\lambda)$, and the characteristic polynomial $f(\lambda)$ all coincide. The coefficients $c_1, c_2, \ldots, c_k$ are obtained from eq. (15) by forming a set of linear equations resulting from a comparison of components.

6. The necessity of testing for linear dependence and of solving a system of linear equations are disadvantages of the method of finite iterations. However, the polynomial of eq. (16) may be obtained while avoiding these disadvantages by the so-called method of minimized iterations due to Lanczos (Ref. 47).
Lanczos employs a finite algorithm involving the sequence of polynomials

$$P_0(\lambda) = 1, \quad P_1(\lambda) = (\lambda - a_0)P_0(\lambda), \quad P_2(\lambda) = (\lambda - a_1)P_1(\lambda) - b_0 P_0(\lambda), \quad \ldots, \quad P_i(\lambda) = (\lambda - a_{i-1})P_{i-1}(\lambda) - b_{i-2}P_{i-2}(\lambda),$$

and the vectors given by the equations

$$x_{i-1} = P_{i-1}(A)\,x_0, \qquad y_{i-1} = P_{i-1}(A')\,y_0, \tag{17}$$

where

$$a_{i-1} = \frac{y'_{i-1} A x_{i-1}}{y'_{i-1} x_{i-1}}, \qquad b_{i-2} = \frac{y'_{i-1} x_{i-1}}{y'_{i-2} x_{i-2}}, \tag{18}$$

and $x_0 \neq 0$, $y_0 \neq 0$ are not orthogonal but otherwise arbitrary vectors. The algorithm proceeds to calculate the vectors $x_{i-1}$ and $y_{i-1}$ until one of them becomes zero and the process terminates. From $x_0$ and $y_0$ one gets $a_0$ from the left-hand equation of (18) by setting $i = 1$. This determines the polynomial $P_1(\lambda)$, and in turn one gets the vectors $x_1$ and $y_1$ from eq. (17) by setting $i = 2$. From $x_1$ and $y_1$ one gets the coefficients $a_1$ and $b_0$ from (18) by setting $i = 2$. This in turn determines $P_2(\lambda)$, from which one determines the vectors $x_2$ and $y_2$ by the use of eq. (17) with $i = 3$. Continuing in this manner ultimately shows that either $x_k = 0$ or $y_k = 0$ for some k. When this occurs the polynomial $P_k(\lambda)$, whose coefficients are now determined, is singled out. As before, the polynomial $P_k(\lambda)$ is a factor of the minimum polynomial $m(\lambda)$; it coincides with $m(\lambda)$ if $k = m$, and with the characteristic polynomial $f(\lambda)$ if $k = n$.

Determination of Eigenvalues and Eigenvectors. The equation $f(\lambda) = 0$ is called the characteristic equation of the matrix A, and the n roots of this equation are called the eigenvalues of the matrix A. From eq. (13) it follows that if $\lambda$ is an eigenvalue of A, then $\det(\lambda I - A) = f(\lambda) = 0$, so that the system of linear equations $Ax = \lambda x$ has a nontrivial solution $x \neq 0$; any such solution $x \neq 0$ is called an eigenvector of the matrix A. Once the coefficients of the characteristic polynomial have been determined, the characteristic equation can be solved by Graeffe's, Bernoulli's, or any other known method for solving a polynomial equation to obtain the eigenvalues of A. If n is fairly high, a large amount of precision must be carried in the calculations or roundoff error may easily invalidate the results.
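The Souriau-Frame scheme listed as method 3 for the characteristic polynomial lends itself to a very short program. The following is a minimal sketch, not a production routine: the function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumption made here so that the roundoff issue mentioned above does not obscure the recursion $A_k = A(A_{k-1} + c_{k-1}I)$, $c_k = -\frac{1}{k}\operatorname{trace} A_k$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(X):
    """Sum of the diagonal elements."""
    return sum(X[i][i] for i in range(len(X)))


def char_poly_coeffs(A):
    """Return [c1, ..., cn] with det(lambda*I - A) = lambda^n + c1*lambda^(n-1) + ... + cn."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    Ak = [row[:] for row in A]            # A_1 = A
    coeffs = [-trace(Ak)]                 # c_1 = -trace A_1
    for k in range(2, n + 1):
        # A_k = A (A_{k-1} + c_{k-1} I)
        shifted = [[Ak[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
                   for i in range(n)]
        Ak = mat_mul(A, shifted)
        coeffs.append(-trace(Ak) / k)     # c_k = -(1/k) trace A_k
    return coeffs


# Diagonal example: (lambda - 2)(lambda - 3)(lambda - 5)
#                 = lambda^3 - 10 lambda^2 + 31 lambda - 30
example = char_poly_coeffs([[2, 0, 0], [0, 3, 0], [0, 0, 5]])
```

For a matrix of order n the loop performs n - 1 matrix products, so the operation count grows like $n^4$; this is why the scheme is attractive mainly for small or moderate n.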
Apart from multiplicity, it is possible to find the eigenvalues of A by considering a polynomial equation of lower degree than n. In this connection the minimum polynomial of the matrix A will be defined below.

By the well-known Cayley-Hamilton theorem it follows that $f(A) = 0$. The matrix A may, however, satisfy polynomial equations of lower degree than $n = \deg f(\lambda)$. One denotes by $m(\lambda)$ that polynomial of lowest degree with leading coefficient 1 such that $m(A) = 0$. This polynomial is unique. Furthermore, the minimum polynomial $m(\lambda)$ divides the characteristic polynomial $f(\lambda)$, and each of the eigenvalues of A is a root of $m(\lambda) = 0$. The multiplicity of such a root $\lambda$ of $m(\lambda) = 0$ is less than or equal to the multiplicity of $\lambda$ as a root of $f(\lambda) = 0$. Hence, if a certain procedure leads to the construction of the minimum polynomial of A, it may be sufficient to obtain the necessary information concerning the eigenvalues of A from $m(\lambda)$, which may be of considerably lower degree than the characteristic polynomial of A and so easier to work with. If one denotes by $g(\lambda)$ the greatest common divisor of the polynomial elements of $\operatorname{adj}(\lambda I - A)$, it may be shown that

$$m(\lambda) = \frac{f(\lambda)}{g(\lambda)}.$$

Direct Methods

Apart from roundoff errors, the procedures described under the heading of direct methods terminate in a finite number of steps with exact results.

The Escalator Method. If the eigenvalues of a symmetric matrix $A_i$ of order i are known and distinct and the eigenvectors are also known, the symmetric matrix $A_{i+1}$ obtained by bordering $A_i$ with an additional row and column also has eigenvalues and eigenvectors which can be found in terms of the eigenvalues and eigenvectors of $A_i$. Furthermore, the eigenvalues of $A_{i+1}$ are distinct and interlace with those of $A_i$. Let $\lambda_k$ ($k = 1, 2, \ldots, i$) denote the eigenvalues of $A_i$, let $u_k$ denote the corresponding eigenvectors of $A_i$, normalized to unit length, and let $a_{i+1}$ denote the column formed by the first i elements of the bordering column. Then the eigenvalues of $A_{i+1}$ are obtained by solving the equation

$$\mu - a_{i+1,i+1} - \sum_{k=1}^{i} \frac{(u_k' a_{i+1})^2}{\mu - \lambda_k} = 0$$

for the $i + 1$ values of $\mu$ which satisfy this equation.
The eigenvector $v_k$ ($k = 1, 2, \ldots, i+1$) of $A_{i+1}$ corresponding to $\mu_k$ ($k = 1, 2, \ldots, i+1$) may be given by

$$v_k = \bigl(U(\mu_k I - \Lambda)^{-1}U'a_{i+1},\; 1\bigr), \qquad k = 1, 2, \ldots, i+1,$$

where $U = (u_1, u_2, \ldots, u_i)$, $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_i)$, and $U(\mu_k I - \Lambda)^{-1}U'a_{i+1}$ gives the first i components of $v_k$. Starting with the matrix $(a_{11})$, which has an eigenvalue $a_{11}$ and an eigenvector 1, yields the eigenvalues and eigenvectors of the matrix

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$

and continuing step by step finally yields the eigenvalues and eigenvectors of the matrix A itself. It should be observed that it is necessary to calculate the eigenvalues and eigenvectors of each of the submatrices $A_i$ ($i = 2, 3, \ldots, n-1$) as well as of the matrix A itself.

Triple Diagonal Method. Let A be a real symmetric matrix. The method consists of first reducing the matrix A to a triple diagonal form by means of a specially formed orthogonal transformation to be described below. Then the eigenvalues of the resulting matrix S, which are the same as those of A, are obtained with the aid of a Sturm sequence of functions consisting of the determinants of the first principal minors, or upper left-hand corner minors, of the matrix $\lambda I - S$. The eigenvectors of S associated with an eigenvalue $\lambda$ are then obtained directly from the solution of the homogeneous equations $(\lambda I - S)x = 0$, which have an exceedingly simple form. From the eigenvectors of S, one then constructs the eigenvectors of A itself.

1. In the triple diagonal form of a matrix each element not on the main diagonal, the diagonal just above it, or the diagonal just below it is 0. To obtain this form one attempts by appropriate orthogonal transformations to reduce to 0 all elements of the first row beyond the second column, and likewise all elements of the first column beyond the second row. If all these elements are already 0, no manipulation is required. If not, one next looks at the element $a_{12}$.
If $a_{12} = 0$ and $a_{1j}$ is the first nonzero element of the first row following $a_{12}$, interchange the second and jth columns and do likewise with the second and jth rows. Thus the new element in the first row, second column is nonzero. If $a_{12} \neq 0$ to begin with, look at the element $a_{13}$. If $a_{13} = 0$, make an exchange similar to the one above so as to bring a nonzero element into its position. If $a_{13} \neq 0$, postmultiply A by the orthogonal matrix $R_{23}$ and premultiply A by $R_{23}^{-1} = R'_{23}$, where

$$R_{23} = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & c & -s & 0 & \cdots & 0 \\ 0 & s & c & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \tag{19}$$

and

$$c = \left[1 + \left(\frac{a_{13}}{a_{12}}\right)^2\right]^{-1/2}, \qquad s = \left(\frac{a_{13}}{a_{12}}\right)c.$$

This amounts to a rotation in the $x_2x_3$-plane, and c, s are the cosine and sine, respectively, of the appropriate angle for making the element $a'_{13}$ of the matrix $R_{23}^{-1}AR_{23} = (a'_{ij})$ equal to 0, and hence also making $a'_{31} = 0$. Also $a'_{12}$ has larger magnitude than $a_{12}$, and so $a'_{12} \neq 0$ also. Furthermore, $a'_{1j} = a_{1j}$ and $a'_{j1} = a_{j1}$ ($j = 4, \ldots, n$). If $a'_{14} \neq 0$, one may interchange the third and fourth columns and the third and fourth rows of $(a'_{ij})$ and apply the same type of transformation as before, giving rise to an additional 0 in the first row and column of the newly formed matrix. If $a'_{14} = 0$, one looks at $a'_{15}$, etc. By continuing in this fashion one forms a new matrix whose first row and first column, except possibly for the first two elements in each case, consist of zeros.

2. The same scheme can next be applied to the resulting submatrix of order $n - 1$ in the lower right-hand corner. Here instead of $R_{23}$ one uses $R_{34}$, where

$$R_{34} = \begin{bmatrix} I_2 & 0 \\ 0 & U_{34} \end{bmatrix}, \tag{20}$$

to reduce all elements in the first row and column of the $(n-1)$-st order submatrix to zero except possibly for the first two elements. No elements in the first row or column of the nth order matrix are affected by this. Continue in this way and, if necessary, finally use

$$R_{n-1,n} = \begin{bmatrix} I_{n-2} & 0 \\ 0 & U_{n-1,n} \end{bmatrix}$$

to effect the final reduction to the following triple diagonal form.
$$S = \begin{bmatrix} a_1 & b_1 & 0 & 0 & \cdots & 0 & 0 \\ b_1 & a_2 & b_2 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & a_3 & b_3 & \cdots & 0 & 0 \\ 0 & 0 & b_3 & a_4 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & 0 & \cdots & b_{n-1} & a_n \end{bmatrix}$$

It follows that

$$S = T'AT, \tag{21}$$

where T consists of a finite product of orthogonal matrices of the type of eqs. (19), (20), etc., and also of the type obtained from interchanging two columns of the identity matrix.

3. If any $b_i = 0$, the eigenvalues and eigenvectors of S can be obtained by a consideration of the eigenvalues and eigenvectors of each of the two individual submatrices thus formed: one above and to the left of the vanishing $b_i$, and the other to the right and below, both of lower order than S. Further such subdivisions or simplifications are possible, as several additional b's may vanish. It will suffice then to treat the case $b_i \neq 0$ ($i = 1, 2, \ldots, n-1$).

4. Let

$$P_0(\lambda) = 1, \qquad P_1(\lambda) = \lambda - a_1, \qquad P_i(\lambda) = (\lambda - a_i)P_{i-1}(\lambda) - b_{i-1}^2 P_{i-2}(\lambda) \quad (i = 2, 3, \ldots, n). \tag{22}$$

By expanding the determinant of the first principal minor of $\lambda I - S$, whose order is i, in terms of the ith row and ith column, one obtains the last line of eq. (22). If $b_i \neq 0$ ($i = 1, 2, \ldots, n-1$), the polynomials $P_n(\lambda) = \det(\lambda I - S)$, $P_{n-1}(\lambda)$, ..., $P_1(\lambda)$, $P_0(\lambda) = 1$ form a Sturm sequence. This means that the eigenvalues of S are distinct and may be isolated. Suppose that $c < d$ are two real numbers which are not roots of $P_n(\lambda)$. Then the number of variations in sign of $P_n(c), P_{n-1}(c), \ldots, P_1(c), 1$ minus the number of variations in sign of $P_n(d), P_{n-1}(d), \ldots, P_1(d), 1$ yields the exact number of eigenvalues of A between c and d.

5. Once an eigenvalue $\lambda$ is determined, the homogeneous equations $(\lambda I - S)x = 0$ can be solved to obtain the associated eigenvector x. The equations when written out have the form:

$$x_2 = \frac{1}{b_1}(\lambda - a_1)x_1, \qquad x_3 = \frac{1}{b_2}\bigl[(\lambda - a_2)x_2 - b_1x_1\bigr], \qquad x_i = \frac{1}{b_{i-1}}\bigl[(\lambda - a_{i-1})x_{i-1} - b_{i-2}x_{i-2}\bigr] \quad (i = 4, \ldots, n). \tag{23}$$

It follows from eqs. (23) that if $x_1$ is taken as an arbitrary nonzero real number, $x_2, x_3, \ldots, x_n$ can be obtained in turn. The remaining nth equation of $(\lambda I - S)x = 0$, namely $(\lambda - a_n)x_n - b_{n-1}x_{n-1} = 0$, may be used as a check.
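Steps 4 and 5 of the triple diagonal method can be illustrated by a short program. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the matrix S is passed as its diagonal `a` and off-diagonal `b` (all $b_i \neq 0$, order at least 2), and a vanishing interior member of the sequence is simply skipped when signs are counted, which is the usual convention for Sturm sequences rather than something stated in the text.

```python
def sturm_sequence(a, b, lam):
    """Values P_0(lam), ..., P_n(lam) of eq. (22) for diagonal a and off-diagonal b."""
    p = [1.0, lam - a[0]]
    for i in range(1, len(a)):
        p.append((lam - a[i]) * p[-1] - b[i - 1] ** 2 * p[-2])
    return p


def sign_variations(values):
    """Number of sign changes in the sequence, ignoring exact zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for u, v in zip(signs, signs[1:]) if u != v)


def count_eigenvalues(a, b, c, d):
    """Number of eigenvalues of S strictly between c and d (c < d, neither a root of P_n)."""
    return (sign_variations(sturm_sequence(a, b, c))
            - sign_variations(sturm_sequence(a, b, d)))


def eigenvector(a, b, lam, x1=1.0):
    """Components x_1, ..., x_n from the recurrence of eqs. (23), starting from x_1."""
    x = [x1, (lam - a[0]) * x1 / b[0]]
    for i in range(2, len(a)):
        x.append(((lam - a[i - 1]) * x[i - 1] - b[i - 2] * x[i - 2]) / b[i - 1])
    return x


# S with diagonal (2, 2, 2) and off-diagonal (1, 1);
# its eigenvalues are 2 - sqrt(2), 2, and 2 + sqrt(2).
a, b = [2.0, 2.0, 2.0], [1.0, 1.0]
total = count_eigenvalues(a, b, 0.0, 4.0)      # all three eigenvalues
middle = count_eigenvalues(a, b, 1.5, 2.5)     # only the eigenvalue 2
x = eigenvector(a, b, 2.0)                     # eigenvector for lambda = 2
```

Repeated bisection of an interval with `count_eigenvalues` isolates each eigenvalue to any desired accuracy, after which `eigenvector` supplies the associated vector of S.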
When this has been done for each of the $\lambda$'s and one has all the eigenvectors of S, one must turn attention to finding the eigenvectors of A.

6. By virtue of eq. (21), $Sx = \lambda x$ implies $(T'AT)x = \lambda x$, or $A(Tx) = \lambda(Tx)$. Hence $Tx$, where x is an eigenvector of S associated with $\lambda$, is the eigenvector of A associated with the eigenvalue $\lambda$.

Adjoint $\lambda I - A$ and Eigenvectors. Here one assumes that the eigenvalues $\lambda_i$ ($i = 1, 2, \ldots, n$), not necessarily distinct, have already been found. The matrix $\operatorname{adj}(\lambda I - A)$, its derivative, or perhaps one of its higher derivatives, when evaluated at $\lambda = \lambda_i$, presents fertile territory for finding the eigenvectors associated with $\lambda_i$. The matrix $\operatorname{adj}(\lambda I - A)$ is a matrix whose elements are polynomials in $\lambda$, but it may also be viewed as a polynomial in $\lambda$ with matrix coefficients. If one writes

$$F(\lambda) = \operatorname{adj}(\lambda I - A) = F_0\lambda^{n-1} + F_1\lambda^{n-2} + \cdots + F_{n-1},$$

then from

$$F(\lambda)(\lambda I - A) = f(\lambda)I = I\lambda^n + c_1I\lambda^{n-1} + \cdots + c_nI$$

one may determine the matrix coefficients $F_0, F_1, \ldots, F_{n-1}$ by expanding and comparing coefficients of $\lambda$. These are

$$F_0 = I, \quad F_1 = F_0A + c_1I, \quad F_2 = F_1A + c_2I, \quad \ldots, \quad F_{n-1} = F_{n-2}A + c_{n-1}I.$$

If $\lambda_i$ is a simple root of $f(\lambda) = 0$, then $F(\lambda_i)$ is of rank 1, and a nonzero column of $F(\lambda_i)$ is an eigenvector of A associated with $\lambda_i$. There exists in this case only one linearly independent eigenvector of A associated with $\lambda_i$. If $\lambda_i$ is a root of $f(\lambda) = 0$ of multiplicity 2, there can exist two linearly independent eigenvectors of A associated with $\lambda_i$. But this need not be the case, as there may exist only one linearly independent eigenvector associated with $\lambda_i$. In the latter case $F(\lambda_i)$ again is of rank 1, and any nonzero column of $F(\lambda_i)$ is an eigenvector associated with $\lambda_i$. On the other hand, if there exist two linearly independent eigenvectors associated with $\lambda_i$, $F(\lambda_i)$ turns out to be the zero matrix. But $F'(\lambda_i)$, the derivative of $F(\lambda)$ at $\lambda_i$, is of rank 2, and any two linearly independent columns of $F'(\lambda_i)$ are such eigenvectors associated with $\lambda_i$.
Likewise, if $\lambda_i$ is a triple root of $f(\lambda) = 0$, there can be three linearly independent eigenvectors associated with $\lambda_i$, but here again this need not be the case. There may be only two linearly independent eigenvectors or even only one. Again, if only one linearly independent eigenvector is associated with $\lambda_i$, any nonzero column of $F(\lambda_i)$, which has rank 1, is the desired eigenvector associated with $\lambda_i$. If there are two linearly independent eigenvectors associated with $\lambda_i$, $F(\lambda_i) = 0$ and $F'(\lambda_i)$ is of rank 2, and any two linearly independent columns yield the desired eigenvectors. Lastly, if there are three linearly independent eigenvectors, $F(\lambda_i) = F'(\lambda_i) = 0$. But $F''(\lambda_i)$ is of rank 3, and any three linearly independent columns of $F''(\lambda_i)$ are the desired eigenvectors associated with $\lambda_i$. This procedure can be extended all the way to a root of $f(\lambda) = 0$ having multiplicity n.

Indirect Methods

Here the number of arithmetic operations necessary to arrive at exact answers is infinite. The procedures are iterative, and the eigenvalues and eigenvectors of a matrix A are found without explicitly calculating the characteristic polynomial of A.

Iterative Procedures for Hermitian Matrices. It is easier to handle the case of a Hermitian matrix since it has real eigenvalues; eigenvectors associated with distinct eigenvalues are mutually orthogonal; and, because it can be diagonalized, the multiplicity of each eigenvalue $\lambda$ equals the number of linearly independent eigenvectors associated with $\lambda$. Assume, for the time being, that the eigenvalues of a given Hermitian matrix A are distinct. Also, all the eigenvalues of $A + pI$, which is also Hermitian, can be made positive by picking p sufficiently large, so that there is no restriction in assuming that the matrix A has a single dominant eigenvalue, i.e., an eigenvalue whose absolute value is greater than that of any other eigenvalue of A.
One first concentrates attention upon a method of finding the dominant eigenvalue and its associated eigenvector. Several methods are then available for finding the remaining eigenvalues of A. The procedure starts with an initial vector $x_0$ and by repeated premultiplication by A builds up the sequence of vectors

$$x_p = Ax_{p-1} = A^px_0 \qquad (p = 1, 2, \ldots). \tag{24}$$

In the nonexceptional case, for p sufficiently large, the direction of the vector $x_p$ will approach the direction of the eigenvector $u_1$ associated with the dominant eigenvalue $\lambda_1$. In the exceptional cases, $x_p$ will approach either some $u_i$ associated with the eigenvalue $\lambda_i$ ($i = 2, \ldots, n$) or else 0. The latter rarely happens, but at any rate an $x_0$ can easily be picked so that the former case will apply. Again, if p is sufficiently large, the ratio of the ith component ($i = 1, 2, \ldots, n$) of $x_{p+1}$ to that of $x_p$ can be made arbitrarily close to the dominant eigenvalue. The closeness with which these ratios agree may be regarded as a measure of the accuracy of the approximation to $\lambda_1$. An error made during the course of the computation of $x_p$ will not lead to an erroneous result, since subsequent multiplication by A will pull the computation back into line.

One may alternatively calculate $\lambda_1$ by means of a ratio of numbers. Let $a_p$ ($p = 1, 2, \ldots$) be the scalars formed from the vectors $x_p$; then

$$\lambda_1 = \lim_{p \to \infty} \frac{a_{p+1}}{a_p}.$$

If next one desires to find the minimum eigenvalue $\lambda_n$ of A together with its associated eigenvector, one may consider the matrix $cI - A$, where $c > \lambda_1$. The matrix $cI - A$ is Hermitian, and the same techniques may be applied to it to find its maximum eigenvalue and its associated eigenvector. To get the minimum eigenvalue $\lambda_n$ of A, one simply changes the sign of the maximum eigenvalue of $cI - A$ and adds c. The eigenvector associated with the maximum eigenvalue of $cI - A$ is also the eigenvector associated with the minimum eigenvalue $\lambda_n$ of A.
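The iteration of eq. (24) and the shift $cI - A$ for the minimum eigenvalue can be sketched as follows for a real symmetric matrix. This is an illustrative sketch only: the function names are hypothetical, the eigenvalue is estimated from the ratio of the components of largest magnitude, and the rescaling of the iterate at each step is an addition made here to avoid overflow (it does not change the direction of $x_p$).

```python
def mat_vec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dominant_eigenpair(A, x0, steps=200):
    """Approximate the dominant eigenvalue and eigenvector by repeated premultiplication."""
    x = list(x0)
    lam = 0.0
    for _ in range(steps):
        y = mat_vec(A, x)                                 # x_{p+1} = A x_p, eq. (24)
        i = max(range(len(x)), key=lambda k: abs(x[k]))   # largest component of x_p
        lam = y[i] / x[i]                                 # ratio of corresponding components
        scale = max(abs(v) for v in y)                    # rescaling: an assumption, not in the text
        x = [v / scale for v in y]
    return lam, x


def least_eigenvalue(A, c, x0, steps=200):
    """Minimum eigenvalue of A from the maximum eigenvalue of cI - A, with c above lambda_1."""
    n = len(A)
    B = [[(c if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    mu, x = dominant_eigenpair(B, x0, steps)
    return c - mu, x                                      # change sign and add c


A = [[2.0, 1.0], [1.0, 2.0]]                 # eigenvalues 3 and 1
lam_max, u_max = dominant_eigenpair(A, [1.0, 0.0])
lam_min, u_min = least_eigenvalue(A, 4.0, [1.0, 0.0])
```

The error in the eigenvalue estimate shrinks at each step roughly by the ratio of the second largest to the largest eigenvalue in absolute value, so closely spaced dominant eigenvalues make the process slow.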
After $\lambda_1$ and $u_1$ have been calculated, the determination of the remaining $n - 1$ eigenvalues and their associated eigenvectors of the nth order matrix A may be done in terms of a matrix whose order is $n - 1$ instead of n. If the normalized form of $u_1$ is denoted by $u_1^*$, i.e., $u_1^* = u_1/|u_1|$, from the vector $u_1^*$ a unitary matrix U may be constructed so that

$$\bar{U}'AU = \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix},$$

where $A_1$ is a Hermitian matrix of order $n - 1$ whose eigenvalues $\lambda_2, \lambda_3, \ldots, \lambda_n$ are the $n - 1$ remaining unknown eigenvalues of A. The dominant eigenvalue $\lambda_2$ of $A_1$ and its associated eigenvector $v_2$ can be found as previously. The eigenvector $u_2^*$ associated with the eigenvalue $\lambda_2$ of A is the vector $U\begin{pmatrix} 0 \\ v_2 \end{pmatrix}$.

The following construction of the unitary matrix U is due to Feller and Forsythe. One writes $u_1^*$ as follows:

$$u_1^* = \begin{pmatrix} a \\ z \end{pmatrix},$$

where a is a complex number and z is an $(n-1)$-dimensional vector with complex components. Then

$$U = \begin{bmatrix} a & -\bar{z}' \\ z & I_{n-1} - kz\bar{z}' \end{bmatrix},$$

where $k = (1 - \bar{a})/(1 - a\bar{a})$.

Next one replaces $A_1$ by the matrix

$$\begin{bmatrix} \lambda_2 & 0 \\ 0 & A_2 \end{bmatrix},$$

where $A_2$ is a Hermitian matrix of order $n - 2$ having eigenvalues $\lambda_3, \ldots, \lambda_n$, and then one repeats the previous step. This is continued until all eigenvalues and eigenvectors of A are obtained.

Another way to find the eigenvalues $\lambda_2, \ldots, \lambda_n$ and their associated eigenvectors, once $\lambda_1$ and $u_1^*$ are known, is to form the new Hermitian matrix

$$A_1 = A - \lambda_1u_1^*\bar{u}_1^{*\prime} \tag{25}$$

of order n also. The eigenvalues of $A_1$ are $0, \lambda_2, \ldots, \lambda_n$. The known eigenvector $u_1^*$ is associated with the eigenvalue 0 of $A_1$, while the unknown eigenvector $u_i^*$ is associated with the eigenvalue $\lambda_i$ ($i = 2, \ldots, n$) of $A_1$ as well as of A. Thus the dominant eigenvalue $\lambda_2$ of $A_1$ and its associated eigenvector $u_2^*$ can be found as before by forming powers of $A_1$ instead of powers of A as in eq. (24). Next one forms the Hermitian matrix

$$A_2 = A_1 - \lambda_2u_2^*\bar{u}_2^{*\prime}$$

of order n, which has eigenvalues $0, 0, \lambda_3, \ldots, \lambda_n$, and the unknown $u_i^*$ is associated with the $\lambda_i$ ($i = 3, \ldots, n$) of $A_2$ as well as of A. Thus one obtains $\lambda_3$ and its associated eigenvector $u_3^*$.
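The second route, eq. (25), is easy to try numerically in the real symmetric case, where the conjugate transpose reduces to the ordinary transpose. The sketch below is illustrative only: the function names are hypothetical, the power iteration normalizes each iterate to unit Euclidean length (an assumption made so that the returned vector can be used directly as $u_1^*$), and the starting vector is assumed to have a component along each eigenvector sought.

```python
import math


def mat_vec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dominant_eigenpair(A, x0, steps=300):
    """Power iteration returning the dominant eigenvalue and a unit-length eigenvector."""
    x = list(x0)
    lam = 0.0
    for _ in range(steps):
        y = mat_vec(A, x)
        i = max(range(len(x)), key=lambda k: abs(x[k]))
        lam = y[i] / x[i]                          # ratio of corresponding components
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]                  # keep the iterate at unit length
    return lam, x


def deflate(A, lam, u):
    """Form A - lam * u u' as in eq. (25), for a real unit eigenvector u."""
    n = len(A)
    return [[A[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]


# Real symmetric example with eigenvalues 2 + sqrt(2), 2, and 2 - sqrt(2).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x0 = [1.0, 0.0, 0.0]
lam1, u1 = dominant_eigenpair(A, x0)
A1 = deflate(A, lam1, u1)                          # eigenvalues 0, lam2, lam3
lam2, u2 = dominant_eigenpair(A1, x0)
A2 = deflate(A1, lam2, u2)                         # eigenvalues 0, 0, lam3
lam3, u3 = dominant_eigenpair(A2, x0)
```

Because each deflation uses a computed rather than an exact eigenvector, errors accumulate from one stage to the next; the later eigenvalues are generally somewhat less accurate than the first.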
Again one continues in this fashion until all eigenvalues of A and their associated eigenvectors are found.

Multiple Roots. So far the possibility of multiple roots has not been considered. If, as before, one starts with $x_0$ and builds up sequence (24), one obtains as before an eigenvector associated with $\lambda_1$. A distinct starting vector $y_0$ may be selected to build up a new sequence which will again lead to the eigenvalue $\lambda_1$. But it may happen that $y_0$ leads to an eigenvector which is linearly independent of the one to which $x_0$ leads. In this case $\lambda_1$ is a multiple eigenvalue. If $y_0$, as in the case of distinct eigenvalues, leads only to an eigenvector which is linearly dependent on, or simply a multiple of, the eigenvector to which $x_0$ leads, then $\lambda_1$ is a simple eigenvalue. If $x_0$ and $y_0$ lead to linearly independent eigenvectors, and a third arbitrary vector $z_0$ leads to an eigenvector which is linearly dependent upon the first two eigenvectors, $\lambda_1$ is an eigenvalue of multiplicity 2; whereas, if $z_0$ leads to an eigenvector linearly independent of the first two calculated eigenvectors, $\lambda_1$ is at least of multiplicity 3. One can continue this process for eigenvalues of higher multiplicity also.

Let $\lambda_1$ be a root of multiplicity 2. Then in the two-dimensional vector space generated by the two linearly independent eigenvectors obtained one may select $u_1^*$, $u_2^*$, which are orthogonal and of unit length, and which are eigenvectors associated with $\lambda_1$. By starting with $u_1^*$ and $u_2^*$ one may similarly, as before, build up a unitary matrix U such that

$$\bar{U}'AU = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & A_2 \end{bmatrix},$$

where $A_2$ is a Hermitian matrix of order $n - 2$ containing the remaining eigenvalues $\lambda_3, \ldots, \lambda_n$ of A. One proceeds similarly in the case of further multiple eigenvalues, as outlined before. A number of additional variations for obtaining the eigenvalues and eigenvectors of A are possible.

Iterative Process for General Type Matrices.
If the matrix A can be diagonalized, the method of successively premultiplying by A applies, with some small appropriate modifications, to this case as well as to the case of the Hermitian matrix. No longer are the eigenvalues of A necessarily real. There may be several distinct dominant eigenvalues. The eigenvectors of A can no longer be assumed mutually orthogonal. In order to get around this situation, one introduces the concept of row eigenvectors as well as column eigenvectors. Associated with each eigenvalue $\lambda_i$ of A is the row eigenvector $u^{(i)}$, where $u^{(i)}A = \lambda_iu^{(i)}$, and the column eigenvector $u_i$, where $Au_i = \lambda_iu_i$ ($i = 1, 2, \ldots, n$). In this case

$$u^{(i)}u_j = \begin{cases} 0 & \text{where } i \neq j, \\ 1 & \text{where } i = j. \end{cases}$$

Here $u^{(i)}$ and $u_i$ ($i = 1, 2, \ldots, n$) need not be unit vectors but only $u^{(i)}u_i = 1$ ($i = 1, 2, \ldots, n$).

One again starts with an arbitrary initial vector $x_0$ and forms the sequence $x_p$ of eq. (24). A unique dominant eigenvalue and its associated eigenvector are found exactly as before. Whereas in the case of a Hermitian matrix one forms the matrix $A_1$ as in eq. (25) in order to study the remaining eigenvalues and their associated eigenvectors, one now forms the matrix

$$A_1 = A - \lambda_1u_1u^{(1)}.$$

Finding the eigenvalues in the case where several eigenvalues are dominant is more complicated, as these are not computed as a simple ratio but rather as described below. Suppose that $x_{1p}, x_{2p}, \ldots, x_{kp}$ are k linearly independent vectors obtained from the sequence (24). For p sufficiently large, these are arbitrarily close to the actual eigenvectors. It is desired to find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ associated with these. Take z as an arbitrary vector and form $a_{ip} = z'x_{ip}$ ($i = 1, 2, \ldots, k$); then

$$\det\begin{bmatrix} 1 & a_{1p} & a_{2p} & \cdots & a_{kp} \\ \lambda & a_{1,p+1} & a_{2,p+1} & \cdots & a_{k,p+1} \\ \lambda^2 & a_{1,p+2} & a_{2,p+2} & \cdots & a_{k,p+2} \\ \vdots & \vdots & \vdots & & \vdots \\ \lambda^k & a_{1,p+k} & a_{2,p+k} & \cdots & a_{k,p+k} \end{bmatrix} = 0$$

has k roots which are close approximations to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$.

The great majority of matrices appearing in applications have distinct eigenvalues and so can be diagonalized.
Therefore, the method of iterating by premultiplication by a given matrix is applicable. In the rare case in which A has a root of multiplicity r whose associated eigenvectors number less than the full complement of r, it is not possible to diagonalize A. Nevertheless, even in this case, it is still possible to use this iteration scheme to find the dominant eigenvalue and the associated eigenvector of a matrix A having but a single dominant eigenvalue. One must, however, consider the linear dependence of a finite number of successive $x_p$'s in sequence (24) for p sufficiently large to obtain the dominant eigenvalue $\lambda_1$. The associated eigenvector may be obtained as a linear combination of a finite number of the $x_p$'s whose coefficients contain powers of $\lambda_1$.

Jacobi Method. The technique applies to Hermitian matrices and so to real symmetric matrices too. The method hinges on the fact that a 2 × 2 Hermitian matrix

$$H = \begin{bmatrix} a_{11} & ae^{i\psi} \\ ae^{-i\psi} & a_{22} \end{bmatrix}, \qquad a > 0,$$

can be reduced to diagonal form D by a unitary transformation $U^{-1}HU = D$, where

$$U = \begin{bmatrix} e^{i\psi/2}\cos\theta & -e^{i\psi/2}\sin\theta \\ e^{-i\psi/2}\sin\theta & e^{-i\psi/2}\cos\theta \end{bmatrix} \tag{26}$$

and $\theta$ is an angle in the first quadrant which satisfies $\tan 2\theta = 2a/(a_{11} - a_{22})$. If $\psi = 0$, i.e., H is real symmetric, the matrix (26) reduces to the familiar form

$$U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$

which corresponds to a rotation in the plane.

If the nth order Hermitian matrix $A = (a_{ij})$ is written in the form

$$A = \begin{bmatrix} H & A_{12} \\ \bar{A}_{12}' & A_{22} \end{bmatrix},$$

then the unitary matrix

$$U_1 = \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix}, \tag{27}$$

where U is the matrix (26), transforms A into a matrix $B = (b_{ij})$, where $b_{12} = b_{21} = 0$; furthermore, the sum of the squares of the diagonal elements of B exceeds the corresponding sum of A by the positive quantity $2a^2$. If it is desired to transform A into a matrix B such that $b_{ij} = b_{ji} = 0$, the elements of the matrix U in eq. (26) must be positioned in the ith and jth rows as well as in the ith and jth columns of $U_1$ in eq. (27).
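For a real symmetric matrix ($\psi = 0$) the rotation of eqs. (26) and (27) can be applied repeatedly to the off-diagonal element of greatest modulus. The following is a minimal sketch under stated assumptions: the function name, tolerance, and rotation limit are hypothetical, the matrix is of order at least 2, and the angle is taken from `math.atan2` so that $\tan 2\theta = 2a_{ij}/(a_{ii} - a_{jj})$ is satisfied for either sign of $a_{ij}$ and also when $a_{ii} = a_{jj}$ (the text restricts $\theta$ to the first quadrant with $a > 0$).

```python
import math


def jacobi_eigenvalues(A, tol=1e-12, max_rotations=1000):
    """Eigenvalues of a real symmetric matrix by successive plane rotations."""
    n = len(A)
    A = [row[:] for row in A]                       # work on a copy
    for _ in range(max_rotations):
        # Locate the off-diagonal element of greatest modulus.
        i, j = max(((r, s) for r in range(n) for s in range(r + 1, n)),
                   key=lambda rs: abs(A[rs[0]][rs[1]]))
        if abs(A[i][j]) < tol:
            break
        theta = 0.5 * math.atan2(2.0 * A[i][j], A[i][i] - A[j][j])
        c, s = math.cos(theta), math.sin(theta)
        # Postmultiply by U: columns i and j are combined.
        for k in range(n):
            aki, akj = A[k][i], A[k][j]
            A[k][i] = c * aki + s * akj
            A[k][j] = -s * aki + c * akj
        # Premultiply by U': rows i and j are combined.
        for k in range(n):
            aik, ajk = A[i][k], A[j][k]
            A[i][k] = c * aik + s * ajk
            A[j][k] = -s * aik + c * ajk
    return sorted(A[k][k] for k in range(n))


# Example with eigenvalues 2 - sqrt(2), 2, and 2 + sqrt(2).
eigs = jacobi_eigenvalues([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])
```

Accumulating the product of the rotation matrices alongside the loop would give the eigenvectors as well; that bookkeeping is omitted here to keep the sketch short.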
One might hope that after applying the product of a finite number of the above unitary transformations one might reduce the matrix A to diagonal form, in which case the sum of the squares of the diagonal elements will have the maximum possible value. Unfortunately, this is not true, as some of the elements which have previously been reduced to zero will not remain so while additional elements are likewise being reduced to zero. The procedure is to reduce to zero a pair of off-diagonal elements of greatest modulus. It is the infinite product of all these transformations which will reduce A to the diagonal form $\Lambda$ whose diagonal contains the eigenvalues of A. The infinite product of unitary matrices of the type (27) converges to a matrix whose columns are the eigenvectors of the matrix A.

Eigenvalues of Special Matrices

The types of eigenvalues to which certain important classes of matrices give rise are worth noting.

Matrix A                                           Every Eigenvalue $\lambda_i$ of A
(a) Real and symmetric                             (a) Real
(b) Real, symmetric, and positive definite         (b) Real and positive
(c) Real, symmetric, and positive semidefinite     (c) Real and non-negative
(d) Orthogonal                                     (d) $|\lambda_i| = 1$ for every i

In (a), (b), and (c), if a real symmetric matrix is replaced by a Hermitian matrix, the conclusions still remain valid. In (d), if A is unitary, the conclusion drawn there still holds.

Some additional properties concerning dominant roots of important classes are listed below:

(i) If A is real and symmetric, the maximum eigenvalue $\lambda_{\max}$ is given by

$$\lambda_{\max} = \max_{x \neq 0} \frac{x'Ax}{x'x},$$

and the minimum eigenvalue $\lambda_{\min}$ is given by

$$\lambda_{\min} = \min_{x \neq 0} \frac{x'Ax}{x'x}.$$

(ii) If A is a real positive matrix (i.e., A has positive elements), $\lambda_{\max}$ is a real number.

Bounds on Eigenvalues

It is often a helpful guide to establish bounds for the eigenvalues of a matrix at the outset, as this may influence the procedure.
It is extremely advantageous when this leads to the isolation of some of the eigenvalues of a given matrix. Some of the criteria for determining bounds are easily applied. A number of such results will be stated, and in some cases additional information will be given concerning the associated eigenvectors. First, the case of matrices with complex elements will be treated, and subsequently this will be specialized to matrices with positive and also matrices with non-negative elements. However, when results on bounds apply to a large class of matrices, the bounds cannot be expected to be as sharp as those applying to a smaller, more specialized class of matrices. The following cases are of interest.

1. Let A be an arbitrary matrix of order n with complex elements. Then $|\lambda| \leq nM$, where $\lambda$ is any eigenvalue of A, and M is the maximum of the moduli of the elements $a_{ij}$ ($i, j = 1, \ldots, n$) of A. This result is due to Hirsch in 1902.

2. Let

$$R_i = \sum_{j=1}^{n} |a_{ij}| \quad \text{and} \quad T_j = \sum_{i=1}^{n} |a_{ij}|.$$

Also let $R = \max R_i$ ($i = 1, \ldots, n$) and $T = \max T_j$ ($j = 1, \ldots, n$). Then $|\lambda| \leq \min(R, T)$.

A number of variations on these two bounds (1) and (2) exist. Some of these variations are a bit sharper, but these two are simple to apply and will be sufficient for the purposes at hand.

3. Let $P_i$ denote the sum of the moduli of the off-diagonal elements of the ith row of the matrix A and $Q_j$ the sum of the moduli of the off-diagonal elements of the jth column of A. That is,

$$P_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}| \quad \text{and} \quad Q_j = \sum_{\substack{i=1 \\ i \neq j}}^{n} |a_{ij}|.$$

Then a result due to Levy and Hadamard states that each eigenvalue of A lies in at least one of the circles

$$|z - a_{ii}| \leq P_i \qquad (i = 1, \ldots, n) \tag{28}$$

and in at least one of the circles

$$|z - a_{jj}| \leq Q_j \qquad (j = 1, \ldots, n).$$

In other words, if one takes the diagonal element $a_{ii}$ and draws a circle with $a_{ii}$ as the center and $P_i$ ($i = 1, \ldots, n$) as radius, all the eigenvalues of A will be trapped in these n circles. A similar remark applies to the n circles with the $Q_j$ ($j = 1, \ldots, n$) as radii.
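Bounds 1 to 3 require only sums of moduli and are easily programmed. The sketch below is illustrative only; the function names are hypothetical, and the code accepts real or complex elements since it relies solely on `abs`.

```python
import math


def row_circles(A):
    """Centers a_ii and radii P_i of the circles (28), one per row."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]


def column_circles(A):
    """Centers a_jj and radii Q_j of the corresponding column circles."""
    n = len(A)
    return [(A[j][j], sum(abs(A[i][j]) for i in range(n) if i != j))
            for j in range(n)]


def in_some_circle(z, circles):
    """True if the point z lies in at least one of the given circles."""
    return any(abs(z - center) <= radius for center, radius in circles)


def modulus_bounds(A):
    """Return (n*M, min(R, T)), the bounds of results 1 and 2 on |lambda|."""
    n = len(A)
    M = max(abs(v) for row in A for v in row)
    R = max(sum(abs(v) for v in row) for row in A)
    T = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    return n * M, min(R, T)


# Example with known eigenvalues 2 - sqrt(2), 2, and 2 + sqrt(2).
A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
circles = row_circles(A)
hirsch, row_col = modulus_bounds(A)
eigenvalues = [2 - math.sqrt(2), 2, 2 + math.sqrt(2)]
```

In this example the bound $\min(R, T) = 4$ is noticeably sharper than $nM = 6$, and both exceed the dominant eigenvalue $2 + \sqrt{2}$.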
It is to be noted that an eigenvalue of A may be in several of the n circles.

4. An interesting offshoot of this result is the following: If one of the n circles is isolated from the remaining $n - 1$ circles, that is, has no point in common with the remaining $n - 1$ circles, exactly one eigenvalue of A will be found in the isolated circle. More generally, Gersgorin showed that when m circles intersect in a connected region isolated from the remaining $n - m$ circles, the connected region thus formed contains exactly m eigenvalues of A.

5. The following results concerning the number of associated eigenvectors are noteworthy. If an eigenvalue $\lambda$ of the matrix A lies in only one of the n circles (28), $\lambda$ has only one linearly independent eigenvector associated with it. This result is due to Taussky (Ref. 15). Stein has shown that if an eigenvalue $\lambda$ has associated with it $m \leq n$ linearly independent eigenvectors, $\lambda$ lies in at least m of the circles (28).

6. Before passing to the case of positive and non-negative matrices, it is worth noting a result of Frobenius which gives a connection between the eigenvalues of a matrix with complex elements and a dominating matrix with non-negative elements. Let $B = (b_{ij})$ be a matrix with complex elements and $A = (a_{ij})$ be a matrix with non-negative elements such that $|b_{ij}| \leq a_{ij}$ ($i, j = 1, \ldots, n$). Then the characteristic circle of A contains the characteristic circle of B. (The characteristic circle of a matrix is the smallest circle about the origin containing the eigenvalues of the given matrix.)

7. Turning attention next to real matrices with non-negative elements, one can draw some additional and sharper conclusions. If A is a matrix whose elements $a_{ij} \geq 0$ (i, j = 1, ...
, n) then (a) A has a real eigenvalue Ad ~ 0 which is dominant (there may be other dominant eigenvalues), (b) Ad has an associated eigenvector x ~ 0, i.e., all components of x are non-negative, and (c) Ad does not decrease when an element of A increases. The above results are due to Herstein and Debreu, who paralleled for the case of non-negative matrices the results of Frobenius given below. 8. These results grow sharper when one further restricts the matrix A to be indecomposable. A non-negative matrix A is called indecomposable if A cannot be transformed to a matrix of the form by the same permutations of rows and columns where An and A22 are square submatrices of A. If A is a non-negative indecomposable matrix, then (a) A has a real simple eigenvalue Ad > 0 which is dominant; (b) Ad has an associated eigenvector x > 0, i.e., all the components of x are strictly positive; and (c) Ad increases when an element of A increases. These important results were first demonstrated by Frobenius nearly a half century ago. 9. If the matrix A is still further restricted so that all its elements are positive, that is, aij > 0 (i, j = 1, ... , n), then the statement (a) above can be strengthened to include the fact that Ad is the only dominant eigenvalue of A. 10. Again, suppose A is a positive matrix and let n Ri = :E aij (i = 1, ... , n) , R = max {Rl , R2 , ... ,n, R } j=l Frobenius first noted that r ~ Ad ~ R. Also Ad = r = R if and only if all Ri are equal; otherwise, the inequality r < Ad < R holds. Suppose that not all the Ri (i = 1, ... , n) are equal and let o = max {Ri/Rj}, Ri 0, 0 < 1, and Frobenius' result as follows (J' < 1. Ledermann improved the bounds on and Ostrowski further sharpened the bounds with the inequalities r + < (~ - 1) ; ; Ad ;;; R - «I - q). In fact, the right-hand side of Ostrowski's inequalities applies to matrices with complex elements when in the definitions of Rand K one uses the modulus of the elements. 
More recently, Brauer announced further improvement of the above bounds, stating that the best possible bounds have been attained. That is, in order to get sharper bounds one would have to restrict further the class of positive matrices.

11. Some specialized examples of non-negative matrices are the stochastic matrices and the oscillation matrices. The eigenvalues of the former play an important role in the theory of stochastic processes, while matrices of the latter type are applicable in the theory of small oscillations of mechanical systems.

The matrix $A = (a_{ij})$ is called stochastic if $a_{ij} \ge 0$ $(i, j = 1, \ldots, n)$ and if
\[ \sum_{j=1}^{n} a_{ij} = 1 \quad (i = 1, \ldots, n). \]
If $a_{ij} > 0$ $(i, j = 1, \ldots, n)$, the matrix $A$ is called a positive stochastic matrix. All the eigenvalues of a stochastic matrix lie within or on the boundary of the unit circle. Also $\lambda = 1$ is a dominant eigenvalue of any stochastic matrix. Previous results on non-negative and positive matrices may be directly applied to stochastic matrices.

The matrix $A$ of order $n$ is said to be completely non-negative (completely positive) if all minors of all orders from 1 to $n$ of $A$ are non-negative (positive). If $A$ is completely non-negative and there exists a positive integer $k$ such that $A^k$ is completely positive, then $A$ is said to be an oscillation matrix. A completely non-negative matrix $A$ is an oscillation matrix if and only if $\det A \ne 0$, $a_{i,i+1} > 0$, and $a_{i+1,i} > 0$ $(i = 1, \ldots, n - 1)$. The eigenvalues of an oscillation matrix have the interesting property that they are all strictly positive and simple.

For an extensive bibliography on the bounds of eigenvalues, see Ref. 15.

TABLE 6. COMPUTER STORAGE REQUIREMENTS AND NUMBER OF OPERATIONS FOR FINDING EIGENVALUES AND EIGENVECTORS

$n$ = order of matrix involved; $w$, $w'$, $w''$ = program storage requirements.

Storage Requirements Method Multiplications Additions Triple Diagonal Triple diagonal form S Eigenvalues of A (a) Eigenvectors of S Eigenvectors of A Total for eigenvalues and eigenvectors (b) + n/2 + w 2n - 1 + w' 2n + w'' tn + tn - 1 + w''' n 2/2 2 2 !n3 in3 3n 2 2n3 2n2 n3 J~n3 fn 3 4n 2n Jacobi (Symmetric matrix) Eigenvalues (one step of reduction) n2 + 4 +w

(a) In finding the eigenvalues of $A$, the number of operations depends on the stipulated requirements for accuracy. If $n$ is sufficiently large, the number of operations required to find the eigenvalues, once the matrix is in triple diagonal form, is negligible compared with the number of operations required to reduce the original matrix to triple diagonal form.

(b) $w'''$ is the sum of $w$, $w'$, $w''$, and the number of cell locations used in finding the eigenvectors of $A$.

4. DIGITAL TECHNIQUES IN STATISTICAL ANALYSIS OF EXPERIMENTS

Joseph M. Cameron

Introduction. In scientific experiments a variable is measured under several different conditions with a view to assessing the effect of these conditions on the variable under study. There may be factors present in the measurement process which, if not balanced out or their effect reduced by randomization or replication, may invalidate the estimates of the effects the experiment seeks to measure. The branch of statistics called the design of experiments is concerned with the construction of experimental arrangements that permit the balancing out of such extraneous factors and at the same time minimize (for a given number of observations) the uncertainties in the estimates of the effects under study.

In most applications the analysis required is the usual least squares analysis for estimating the parameters postulated to represent the data.
In a designed experiment the normal equations that arise in the estimation of the parameters take on a particularly simple form, and the calculations have been systematized and given the name analysis of variance.

Example. Consider a set of measurements $x_1, x_2, \ldots, x_n$ all postulated to be estimates of a single quantity. The least squares estimate for that quantity is, of course, the average $\bar{x} = \sum x_i / n$. One can also compute from the data a measure of the dispersion of the results about this average. Perhaps the most common such measure is the standard deviation, $\sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$. In the analysis of variance one deals not with the standard deviation but rather with its square, which is a quadratic form in the deviations divided by the number of independent deviations, called the number of degrees of freedom.

The analysis of variance in its general form is a technique for (a) computing estimates of the parameters involved in the problem and (b) computing the value of quadratic forms, called sums of squares, assignable to certain groupings of the parameters, each sum of squares carrying with it a certain number of degrees of freedom (the rank of the quadratic form). Thus in the case of $k$ averages each based on $n$ measurements, the parameters to be estimated are the grand average and the $(k - 1)$ independent deviations of the individual averages about this grand average. Three sums of squares are to be calculated: one for the grand average (with one degree of freedom), one for the deviation of the individual averages about the grand average [with $(k - 1)$ degrees of freedom], and one for the deviations of the observations about their own group averages [with $k(n - 1)$ degrees of freedom].

Several examples of the analysis of variance are presented to illustrate the different techniques of computation that are available. The advantage of one over another probably depends on the nature of the computing device used.
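The three sums of squares described for $k$ averages of $n$ measurements each can be sketched as follows. This is a minimal illustration in plain Python; the function name and the small data set are hypothetical and are not taken from the handbook.

```python
def one_way_sums_of_squares(groups):
    """Sums of squares for k groups of n measurements each (equal group sizes)."""
    k = len(groups)
    n = len(groups[0])
    total_count = k * n
    grand_mean = sum(sum(g) for g in groups) / total_count
    group_means = [sum(g) / n for g in groups]

    # Grand average: 1 degree of freedom.
    ss_grand = total_count * grand_mean ** 2
    # Individual averages about the grand average: k - 1 degrees of freedom.
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    # Observations about their own group averages: k(n - 1) degrees of freedom.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    # Total sum of squares of the raw observations, used as a check.
    ss_total = sum(x * x for g in groups for x in g)
    return ss_grand, ss_between, ss_within, ss_total

# Hypothetical data: k = 3 groups, n = 3 measurements each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
ss_grand, ss_between, ss_within, ss_total = one_way_sums_of_squares(groups)
```

The three components add up to the total sum of squares of the observations, which gives a convenient arithmetic check on the computation.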
The availability of modern high-speed digital computers makes it feasible to analyze experimental data involving a much greater number of factors, each factor occurring at more levels, than would otherwise be the case. The types of calculations described above and in the succeeding pages, because of their systematic nature, lend themselves particularly well to treatment on automatic digital computers.

Analysis of Factorial Designs Using Hartley Method. An experiment in which the effects of several factors on a variable are studied by making measurements at all possible combinations of the several states or levels for each of the factors is called a factorial experiment.

Example. Four temperatures of heat treating can be combined with three time periods to give rise to twelve conditioning treatments for some alloy. This would be a factorial design with two factors, one at four levels and the other at three levels.

The most general method for the analysis of factorials was developed by Hartley (Ref. 22). His method depends on three operators which he has labeled $\Sigma$, $D$, and $(\ )^2$, defined as follows:

$\Sigma_t$: Sum over all levels $t = 1, 2, \ldots, T$ for each combination of the other subscripts.

$D_t$: Difference between $T$ times the original values and the total in the set $\Sigma_t$ to which the original value contributed.

$(\ )^2$: Sum of squares of the items indicated in the parentheses.

Procedure. The use of this technique will be illustrated for a two-factor factorial having one factor at $k$ levels and the other at $n$ levels. Denote by $x_{ij}$ the observation at the $i$th level of the first factor and the $j$th level of the second factor. Let $\Sigma_i$ denote the set of sums $\sum_i x_{ij} = x_{.j}$, there being $n$ such sums. In Table 7 the plan of the calculations is shown. Table 8 shows the analysis of variance table derived from the results of Table 7.

TABLE 7. PLAN OF CALCULATIONS USING HARTLEY TECHNIQUE, TWO-FACTOR FACTORIAL EXPERIMENT

\[
\begin{array}{l|cccc}
\text{Level of factor } B & A_1 & A_2 & \cdots & A_k \\ \hline
B_1 & x_{11} & x_{21} & \cdots & x_{k1} \\
B_2 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
B_n & x_{1n} & x_{2n} & \cdots & x_{kn} \\ \hline
\Sigma_j & x_{1.} & x_{2.} & \cdots & x_{k.} \\ \hline
D_j & nx_{11} - x_{1.} & nx_{21} - x_{2.} & \cdots & nx_{k1} - x_{k.} \\
 & nx_{12} - x_{1.} & nx_{22} - x_{2.} & \cdots & nx_{k2} - x_{k.} \\
 & \vdots & \vdots & & \vdots \\
 & nx_{1n} - x_{1.} & nx_{2n} - x_{2.} & \cdots & nx_{kn} - x_{k.}
\end{array}
\]

\[
\begin{array}{ll}
\Sigma_i \Sigma_j: & x_{..} \\
D_i \Sigma_j: & kx_{1.} - x_{..}, \ \ldots, \ kx_{k.} - x_{..} \\
\Sigma_i D_j: & nx_{.1} - x_{..}, \ nx_{.2} - x_{..}, \ \ldots, \ nx_{.n} - x_{..} \\
D_i D_j: & knx_{11} - kx_{1.} - nx_{.1} + x_{..}, \ \ldots, \ knx_{k1} - kx_{k.} - nx_{.1} + x_{..}, \\
 & \quad \vdots \\
 & knx_{1n} - kx_{1.} - nx_{.n} + x_{..}, \ \ldots, \ knx_{kn} - kx_{k.} - nx_{.n} + x_{..}
\end{array}
\]

The estimates of the parameters are obtained by dividing the entries in the sets $\Sigma_i \Sigma_j$, $\Sigma_i D_j$, $D_i \Sigma_j$, and $D_i D_j$ by $nk$, giving in that order the grand average, differences among levels of factor $B$, differences among levels of factor $A$, and the differences due to lack of constancy of the different levels of factor $A$ as the level of factor $B$ is changed.

TABLE 8. ANALYSIS OF VARIANCE FOR TWO-FACTOR FACTORIAL EXPERIMENT

\[
\begin{array}{lclcl}
 & \text{No. of Items} & \text{Sum of Squares} & \text{Degrees of Freedom} & \text{Sum of Squares Is Associated with:} \\ \hline
(\Sigma_i \Sigma_j)^2 & 1 & (\Sigma_i \Sigma_j)^2 / nk & 1 & \text{Grand average} \\
(\Sigma_i D_j)^2 & n & (\Sigma_i D_j)^2 / n(nk) & n - 1 & \text{Effect of different levels of factor } B \\
(D_i \Sigma_j)^2 & k & (D_i \Sigma_j)^2 / k(nk) & k - 1 & \text{Effect of different levels of factor } A \\
(D_i D_j)^2 & nk & (D_i D_j)^2 / (nk)^2 & (n - 1)(k - 1) & \text{Interaction: lack of constancy between levels of } A \text{ as level of } B \text{ is varied} \\
\sum \sum x_{ij}^2 & nk & & & \text{Total (for check)}
\end{array}
\]

This technique can be extended to cover the case of three or more factors by using the basic operations $\Sigma$, $D$, and $(\ )^2$ and is adaptable to other designs as well (see Ref. 22).

An alternate procedure, necessary when the experiment is run in blocks containing only a fraction of the total number of observations or when a fractional replication design is used, is based on the technique described in Ref. 23 and is discussed below. Still another procedure is given in Ref. 16, based on the computation of individual degrees of freedom with orthogonal polynomials tabled in Refs. 20 and 21.

Balanced Incomplete Blocks.
When there are more objects or treatments than can be compared under the same conditions, i.e., on a given batch of material, in a given time period, or under some other factor which limits the uniformity of conditions to a few tests, it is necessary to schedule the measurements so that all comparisons of interest may be estimated from the data. The class of designs constructed for such a case is called incomplete block designs, the block being the group of tests within which the environmental or other factor is assumed not to change. The analysis of these block designs will be illustrated for the case of the balanced incomplete block design (see Refs. 16-23).

Observations have index $x_{bktr}$, referring to $B$ blocks with $K$ units per block and $T$ treatments with $R$ repetitions of each. The data are entered so that the observations from the first block come first, followed by those from the second block, etc.

Step I. Compute the total sum of squares of the original values, $\sum x_{bk}^2$.

Step II. Consider only indices $b$ and $k$.

\[
\begin{array}{lcll}
\text{Operation} & \text{Number of Items} & \text{Result} & (\ )^2 \text{ Applied to Result Gives } BK \text{ Times} \\ \hline
\Sigma_k & B & x_{b.} & \\
D_k & BK & Kx_{bk} - x_{b.} & \\
\Sigma_b \Sigma_k & 1 & x_{..} & \text{Correction factor} \\
D_b \Sigma_k & B & Bx_{b.} - x_{..} & \text{Unadjusted blocks sum of squares}
\end{array}
\]

Step III. Now consider only indices $t$ and $r$. The values of $D_k$ are now rearranged into $T$ groups with $R$ values each so that the $R$ values corresponding to the first treatment come in a group, followed by a similar grouping for each of the remaining treatments. Call these values $d_{tr}$ and denote operations after rearrangement with an asterisk. Then $\Sigma_r {}^{*}D_k$ results in
\[ d_{t.} = Kx_{t.} - B_t, \]
and
\[ (\Sigma_r {}^{*}D_k)^2 = \frac{K(K - 1)TR}{T - 1} \times \text{sum of squares for treatments (adjusted)}, \]
where $B_t$ = sum of block totals for blocks containing treatment $t$.
Analysis of Variance

\[
\begin{array}{lll}
 & \text{Sum of Squares} & \text{Degrees of Freedom} \\ \hline
\text{Total} & \sum x_{bk}^2 - x_{..}^2 / BK & BK - 1 \\
\text{Blocks (unadjusted)} & (D_b \Sigma_k)^2 / BK & B - 1 \\
\text{Treatments (adjusted)} & \dfrac{(T - 1)(\Sigma_r {}^{*}D_k)^2}{K(K - 1)TR} & T - 1 \\
\text{Error} & \text{By subtraction} & BK - B - T + 1
\end{array}
\]

Analysis of Factorials by Using Relations among the Indices Associated with the Treatments. To illustrate the method, assume there are three factors $A$, $B$, and $C$ having $n + 1$, $n + 1$, and $n + 1$ levels respectively. Each observation is tagged with an index $x_1 x_2 x_3$, where $x_1 = 0, 1, \ldots, n$, $x_2 = 0, 1, \ldots, n$, and $x_3 = 0, 1, \ldots, n$, and where the number of levels $n + 1$ is a prime. For the main effect of $A$ form the $(n + 1)$ sums of values whose indices satisfy
\[
\begin{aligned}
x_1 &= 0 \bmod (n + 1) \\
x_1 &= 1 \bmod (n + 1) \\
&\ \ \vdots \\
x_1 &= n \bmod (n + 1).
\end{aligned}
\]
Denote these sums by $A_1, A_2, \ldots, A_{n+1}$. The sum of squares for the main effect of $A$ is given by
\[ \frac{\sum A^2}{(n + 1)^2} - \frac{(\sum A)^2}{(n + 1)^3} \quad (\text{degrees of freedom} = n). \]
Similar computations give the sum of squares for the main effects of $B$ and $C$.

For the two-factor interactions the sums of values whose indices satisfy the equations below are computed:
\[
\begin{aligned}
x_1 + x_2 &= 0 \bmod (n + 1), \ \ldots, \ x_1 + x_2 = n \bmod (n + 1) \\
&\ \ \vdots \\
x_1 + nx_2 &= 0 \bmod (n + 1), \ \ldots, \ x_1 + nx_2 = n \bmod (n + 1).
\end{aligned}
\]
From the $(n + 1)$ sums corresponding to $x_1 + \alpha x_2 = 0, 1, \ldots, n \bmod (n + 1)$ is computed the sum of squares associated with the $n$ degrees of freedom for $AB^{\alpha}$, and the $AB$ interaction is given by the total of such sums of squares over all values of $\alpha$.

For the three-factor interaction one computes the $(n + 1)n^2$ sums of values for which the indices satisfy
\[ x_1 + \alpha x_2 + \beta x_3 = 0, 1, \ldots, n \bmod (n + 1), \]
where $\alpha = 1, 2, \ldots, n$, and $\beta = 1, 2, \ldots, n$. Each group of $(n + 1)$ sums gives the sum of squares associated with the $n$ degrees of freedom for the effect $AB^{\alpha}C^{\beta}$. For each group one computes:
\[ \frac{\sum (\text{Sums})^2}{\text{Number of items in each sum}} - \frac{(\text{Grand total})^2}{\text{Total number of items}}. \]
The extension to higher order interactions is straightforward.
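The index-relation computation can be sketched compactly. The example below is a minimal illustration in plain Python for three factors at $p = n + 1 = 3$ levels; the function name `component_sum_of_squares` and the artificial data (response equal to the level of factor $A$ only) are assumptions made for illustration and are not from the handbook.

```python
from itertools import product

def component_sum_of_squares(data, coeffs, p):
    """Sum of squares for the effect defined by sum(c_i * x_i) mod p.

    data maps an index tuple (x1, x2, x3) to an observation;
    coeffs = (1, alpha, beta) selects the component, e.g. (1, 0, 0) for A,
    (1, 2, 0) for AB^2, (1, 1, 1) for ABC.
    """
    sums = [0.0] * p
    counts = [0] * p
    for index, value in data.items():
        residue = sum(c * x for c, x in zip(coeffs, index)) % p
        sums[residue] += value
        counts[residue] += 1
    grand_total = sum(data.values())
    # sum(Sums^2 / items per sum) - (grand total)^2 / (total number of items)
    return (sum(s * s / m for s, m in zip(sums, counts))
            - grand_total ** 2 / len(data))

p = 3  # number of levels, n + 1, a prime
# Artificial data in which only factor A has an effect.
data = {idx: float(idx[0]) for idx in product(range(p), repeat=3)}

ss_A = component_sum_of_squares(data, (1, 0, 0), p)
ss_AB = component_sum_of_squares(data, (1, 1, 0), p)
ss_AB2 = component_sum_of_squares(data, (1, 2, 0), p)
ss_ABC = component_sum_of_squares(data, (1, 1, 1), p)
```

With these artificial data the whole corrected sum of squares should be attributed to the main effect of $A$, and the interaction components should vanish, which makes the sketch easy to verify by hand.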
This technique is ideally adapted to analysis of variance of factorials where block confounding occurs or to the analysis of fractional replication of factorials.

Example. A $3^4$ design in blocks of 9 with $ABD$, $ACD^2$, $AB^2C^2$, and $BC^2D^2$ confounded with blocks is computed in the manner described to get the usual analysis, except for the combination of the sums of squares for the three-factor interactions which involve confounding with blocks. For example, the $ABD$ interaction is given by the sum of squares associated with $AB^2D$, $ABD^2$, and $AB^2D^2$, each of which has two degrees of freedom. The sum of squares associated with $ABD$ is assigned to blocks.

For fractional factorials (with or without block confounding) the analysis is carried out as if it were a complete design with fewer factors by suppressing one or more of the indices. The individual components, $A, \ldots, B, \ldots, AB, AB^2, \ldots, ABC, ABC^2, \ldots$, are computed, and an identification is made according to the identity relationships (and block confounding, if any). (For further details see Ref. 23.)

Analysis of Variance for $2^n$ Factorials. An example for a $2^2$ experiment will illustrate this procedure. Enter observations in the order designated.

\[
\begin{array}{llll}
\text{Observed Values} & \text{First Sums and Differences, } D_1 & \text{Second Sums and Differences, } D_2 & D_2^2 / 2^n \text{ Will Give} \\ \hline
(1) = x_{00} & (1) + a & (1) + a + b + ab & \text{Correction for mean} \\
a = x_{01} & b + ab & a - (1) + ab - b & A \\
b = x_{10} & a - (1) & b + ab - (1) - a & B \\
ab = x_{11} & ab - b & ab - b - a + (1) & AB
\end{array}
\]

In general: (a) Form a column of sums of the $2^{n-1}$ pairs followed by the $2^{n-1}$ differences between the first and second element of a pair. (b) Repeat this operation on the column so formed until the $n$th such column is formed. (c) Then square the entries in the $n$th column and divide by $2^n$ to get the analysis of variance table in the order (after the correction for the mean) $A$, $B$, $AB$, $C$, $AC$, $BC$, $ABC$, ....
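Steps (a) to (c) translate directly into a short routine. The following is a minimal sketch in plain Python; the function name `sums_and_differences` and the four observations are hypothetical, and each difference is taken as second element minus first, as in the $2^2$ table above.

```python
def sums_and_differences(observations):
    """Apply n passes of pairwise sums followed by pairwise differences.

    observations must have length 2**n and be entered in the order
    (1), a, b, ab, c, ac, bc, abc, ...
    Returns the nth column and the sums of squares (entry**2 / 2**n).
    """
    size = len(observations)
    n = size.bit_length() - 1
    if size != 2 ** n:
        raise ValueError("number of observations must be a power of two")
    column = list(observations)
    for _ in range(n):
        pairs = [(column[i], column[i + 1]) for i in range(0, size, 2)]
        # Sums of the pairs first, then second-minus-first differences.
        column = [u + v for u, v in pairs] + [v - u for u, v in pairs]
    sums_of_squares = [c * c / size for c in column]
    return column, sums_of_squares

# Hypothetical 2^2 experiment: (1) = 3, a = 5, b = 4, ab = 10.
column, sums_of_squares = sums_and_differences([3, 5, 4, 10])
# column holds the total and the contrasts for A, B, AB in that order.
```

The sums of squares obtained this way add up to the sum of the squared observations, which serves as a check on the arithmetic.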
The observations are entered so that their subscripts form an increasing sequence when regarded as binary numbers; e.g., for $n = 3$ the observations are in the order
\[ x_{000} \quad x_{001} \quad x_{010} \quad x_{011} \quad x_{100} \quad x_{101} \quad x_{110} \quad x_{111}. \]

Analysis of Fractional Replication of $2^n$ Factorials. Arrange the $(1/2^k)2^n = 2^s$ observations in the proper order for a $2^s$ factorial (suppressing the other indices) and carry out the analysis as above. Identify the results of the analysis by using the identity relationships and the block confounding in the manner shown in the following example.

EXAMPLE. 1/4 replication of $2^6$ in blocks of 8. Fundamental identity: $I = ABEF = ACDF = BCDE$. Block confounding: $CD$.

\[
\begin{array}{lccl}
\text{Treatment} & \text{Block} & \text{Index}^{a} & \text{Identification} \\ \hline
(1) & 1 & 0000\ 00 & \text{Mean} \\
af & 1 & 0001\ 01 & A = A \\
be & 1 & 0010\ 10 & B = B \\
abef & 1 & 0011\ 11 & AB = AB + EF \\
cef & 2 & 0100\ 11 & C = C \\
ace & 2 & 0101\ 10 & AC = AC + DF \\
bcf & 2 & 0110\ 01 & BC = BC + DE \\
abc & 2 & 0111\ 00 & ABC = \text{error} \\
def & 2 & 1000\ 11 & D = D \\
ade & 2 & 1001\ 10 & AD = AD + CF \\
bdf & 2 & 1010\ 01 & BD = BD + CE \\
abd & 2 & 1011\ 00 & ABD = \text{error} \\
cd & 1 & 1100\ 00 & CD = CD + AF + BE + \text{blocks} \\
acdf & 1 & 1101\ 01 & ACD = F \\
bcde & 1 & 1110\ 10 & BCD = E \\
abcdef & 1 & 1111\ 11 & ABCD = AE + BF
\end{array}
\]

$^{a}$ Only the first four indices are used.

5. ORDINARY DIFFERENTIAL EQUATIONS

Richard F. Clippinger

Definitions and Introduction. An ordinary differential equation of $n$th order is a relation between an independent variable $x$, a dependent variable $y_1$, and derivatives of $y_1$ up to order $n$ $(d^n y_1 / dx^n = y_1^{(n)})$:
\[ F(x, y_1(x), y_1'(x), \ldots, y_1^{(n)}(x)) = 0. \]
By the introduction of new variables, it is possible to obtain a system of $n$ equations of first order:
\[
\begin{aligned}
G_1(x, y_1(x), y_2(x), \ldots, y_n(x), y_1'(x), \ldots, y_n'(x)) &= 0 \\
G_2(x, y_1(x), y_2(x), \ldots, y_n(x), y_1'(x), \ldots, y_n'(x)) &= 0 \\
&\ \ \vdots \\
G_n(x, y_1(x), y_2(x), \ldots, y_n(x), y_1'(x), \ldots, y_n'(x)) &= 0
\end{aligned}
\]
which theoretically can usually be solved in the form:
\[
\begin{aligned}
y_1' &= f_1(x, y_1(x), \ldots, y_n(x)), \\
&\ \ \vdots \\
y_n' &= f_n(x, y_1(x), \ldots, y_n(x)).
\end{aligned}
\]
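As a concrete illustration of the reduction to first order form, consider the second order equation $y_1'' + y_1 = 0$, an example chosen here and not taken from the handbook. Introducing $y_2 = y_1'$ gives $y_1' = y_2$, $y_2' = -y_1$. The sketch below, in plain Python with hypothetical names, encodes the right-hand sides and checks them against the known solution $y_1 = \cos x$ by a central difference.

```python
import math

def f(x, y):
    """Right-hand sides f_1, f_2 for y1'' + y1 = 0 written as a first order system.

    y = [y1, y2] with y2 = y1'.
    """
    y1, y2 = y
    return [y2, -y1]

def exact(x):
    """Known solution with y1(0) = 1, y1'(0) = 0: y1 = cos x, y2 = -sin x."""
    return [math.cos(x), -math.sin(x)]

# Compare a central-difference derivative of the exact solution with f.
x = 0.7
step = 1e-6
numerical_derivative = [(hi - lo) / (2 * step)
                        for hi, lo in zip(exact(x + step), exact(x - step))]
residual = max(abs(d - r)
               for d, r in zip(numerical_derivative, f(x, exact(x))))
```

The residual is limited only by the finite-difference step and roundoff, which confirms that the pair of first order equations represents the original second order equation.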
With vector notation, this system takes the form:
\[ y'(x) = f(x, y(x)), \tag{29} \]
where $y$ is a vector whose components are $y_i(x)$, $i = 1, 2, \ldots, n$, and $f$ is a vector whose components are $f_j(x, y_1(x), \ldots, y_n(x))$, $j = 1, 2, \ldots, n$. Vector notation will be used throughout this section covering systems of equations which can be put in this form. The reader who is not familiar with vectors can take the case where $y(x)$ is a single function of $x$ and use this section as a guide to the solution of one first order equation.

At the end of this section is a summary table of some useful numerical methods for solving differential equations on a digital computer (see Table 11). Some important characteristics of each of these methods are listed. The prospective user may employ this table as a quick guide in selecting the most suitable method for the problem at hand.

Requirements for Solution. A solution of eq. (29) is a vector $y(x)$ which satisfies eq. (29). It necessarily possesses a first derivative. The differential equations used by engineers nearly always possess solutions which have continuous derivatives of many or all orders or indeed are analytic (i.e., the Taylor series converges) except at isolated points. They are said to be piecewise continuous and have piecewise continuous derivatives. The isolated discontinuities are of practical importance since the engineer's derivatives are such quantities as current, voltage, velocity, and acceleration, which he must limit to avoid damage to his equipment. Methods of solving differential equations that are awkward at discontinuities are of restricted value to him.

Numerical Solution. The Taylor series for $y(x)$ in the neighborhood of some point $x_0$:
\[ y(x) = y(x_0) + y'(x_0)(x - x_0) + \cdots + y^{(m)}(x_0)(x - x_0)^m / m! + \cdots, \]
enables one to approximate $y$ by an $m$th degree polynomial in $x - x_0$. Most numerical methods of solving differential equations depend directly or indirectly on this fact.
Consider a set of points $x_{i+j} = x_j + ih$, $i = 0, \pm 1, \pm 2, \ldots$. These points are equally spaced along the $x$-axis and the distance between neighboring points is $h$, called the grid size. Write the Taylor series of $y$, $hy'$, $h^2 y''$, etc., at each of these points:
\[ y_{i+j} = y(x_{i+j}) = y_j + ihy_j' + \cdots + i^m h^m y_j^{(m)} / m! + R_{m+1}, \tag{30a} \]
\[ hy_{i+j}' = hy_j' + ih^2 y_j'' + \cdots + i^{m-1} h^m y_j^{(m)} / (m - 1)! + R_{m+1}, \tag{30b} \]
\[ h^2 y_{i+j}'' = h^2 y_j'' + \cdots + i^{m-2} h^m y_j^{(m)} / (m - 2)! + R_{m+1}, \tag{30c} \]
where $R_{m+1}$ is a generic notation for a remainder which contains $h^{m+1}$ as a factor. Equations (30) can be used in an endless variety of ways to obtain procedures for the numerical solution of eq. (29).

Solutions for Known $y_i$ and $y_i'$. Suppose $y_i$ and $y_i'$ are known at several past points (i.e., $i = 0, -1, -2, \ldots, -I$) and $y_{j+1}$ is desired. Solve eqs. (30a) and (30b) at $i = -1, \ldots, -I$ for $2I$ of the quantities $h^2 y_j'' / 2!$, $h^3 y_j^{(3)} / 3!$, $\ldots$, $h^{2I+1} y_j^{(2I+1)} / (2I + 1)!$, and substitute into eq. (30a) for $y_{j+1}$ to obtain a formula accurate to terms of degree $2I + 2$ in $h$. Thus we have Table 9, in which each row lists the coefficients of the quantities at the head of the columns in the expression for $y_{j+1}$.

TABLE 9. EXTRAPOLATION FORMULAS

\[
\begin{array}{cc|rrrrrrrr|l}
\text{Formula} & I & y_j & hy_j' & y_{j-1} & hy_{j-1}' & y_{j-2} & hy_{j-2}' & y_{j-3} & hy_{j-3}' & \text{Error} \\ \hline
1 & 0 & 1 & 1 & & & & & & & y^{(2)} h^2 / 2 \ \text{(Euler method)} \\
2 & 1 & -4 & 4 & 5 & 2 & & & & & y^{(4)} h^4 / 6 \\
3 & 2 & -18 & 9 & 9 & 18 & 10 & 3 & & & h^6 y^{(6)} / 20 \\
4 & 3 & -\tfrac{128}{3} & 16 & -36 & 72 & 64 & 48 & \tfrac{47}{3} & 4 & h^8 y^{(8)} / 70
\end{array}
\]

First Order Method. Formula 1 of Table 9 is the simplest and best known of all solution methods and is due to Euler. The value of $y_{j+1}$ is
\[ y_{j+1} = y_j + hy_j'; \tag{31} \]
then the value of $y_{j+1}'$ is obtained from eq. (29). It can be shown that the approximate solution obtained in this fashion converges to the exact solution as the grid size approaches zero, the error at a given point being proportional to $h$. This is called a first order method. The principal attraction of this method is its simplicity. Its principal disadvantage, whether for hand or electronic digital computation, is that it requires a small grid size to obtain a given accuracy.

Studying the Stability of the Method.
The most illuminating test of any method of solving differential equations (ordinary or partial) is to perturb the solution and study the local properties of the perturbed solution. To illustrate, consider Euler's method for solving a single eq. (29). Suppose that a small error $\epsilon$ is made at $x_0$ and that $z_j$ is the Euler solution of eq. (29) with this error at $x_0$. Let $\eta_j = z_j - y_j$. Then, since $y_j$ and $z_j$ each satisfy eq. (31), one finds, with the mean value theorem, that
\[ \eta_{j+1} = \eta_j \left( 1 + h \frac{\partial f}{\partial y}(x_j, y_j + \theta \eta_j) \right), \quad 0 < \theta < 1. \]
Consider now a small enough neighborhood of $x_0$ so that second order effects may be neglected, i.e., that $\partial f / \partial y$ may be taken to be a constant $d$. Then
\[ \eta_{j+1} = \eta_j (1 + hd) = \eta_0 (1 + hd)^{j+1} = \epsilon \left[ (1 + hd)^{1/hd} \right]^{hd(j+1)}. \]
If attention is focused on a fixed point, $x = x_0 + (j + 1)h$, and if $h$ is allowed to approach zero, $\eta_{j+1}$ approaches $\epsilon \exp [(x - x_0)d]$. The error at $x$, due to the error $\epsilon$ at $x_0$, thus remains finite as the grid size goes to zero, and the method is said to be locally stable. The error grows with $x$ if $\partial f / \partial y$ is positive; otherwise it decreases.

The same method shows that the other extrapolation formulas 2, 3, and 4 of Table 9 cannot by themselves be used to solve differential equations because they are locally unstable, i.e., the error at $x$ due to a given error at $x_0$ becomes infinite as the grid size goes to zero.

Solutions for Known $y_{j+1}$. If $y_{j+1}$ is obtained in some fashion, $y_{j+1}'$ can be found from eq. (29). Using eq. (30b) for $hy_{j+1}'$ in addition to the equations used to obtain Table 9 results in Table 10, in which each row lists the coefficients of the quantities at the head of the columns in the expression for $y_{j+1}$.

TABLE 10. EXTRAPOLATION FORMULAS

\[
\begin{array}{cc|rrrrrrr|l}
\text{Formula} & I & hy_{j+1}' & y_j & hy_j' & y_{j-1} & hy_{j-1}' & y_{j-2} & hy_{j-2}' & \text{Error} \\ \hline
5 & 0 & \tfrac{1}{2} & 1 & \tfrac{1}{2} & & & & & -h^3 y^{(3)} / 12 \ \text{(Trapezoidal formula)} \\
6 & 1 & \tfrac{1}{3} & 0 & \tfrac{4}{3} & 1 & \tfrac{1}{3} & & & -h^5 y^{(5)} / 90 \ \text{(Simpson's rule)} \\
7 & 2 & \tfrac{3}{11} & -\tfrac{27}{11} & \tfrac{27}{11} & \tfrac{27}{11} & \tfrac{27}{11} & 1 & \tfrac{3}{11} & -3h^7 y^{(7)} / 1540
\end{array}
\]

Heun's second order method has its basis in formula 5, Table 10, the trapezoidal formula.
It is a considerable improvement on Euler's method since a much larger grid size may be used. It is just as stable as Euler's method, requires no past history, and calls for substitution in eq. (29) only once per point. One uses Euler's formula for a first value of $y_{j+1}$, eq. (29) to find $y_{j+1}'$, and then Heun's formula for a better value of $y_{j+1}$. It is not necessary to recompute $y_{j+1}'$. The process may be iterated if desired.

The procedure which Milne (Ref. 24) recommends most highly for solving ordinary differential equations uses
\[ y_{j+1} = y_{j-3} + 4hy_{j-1}' + \frac{8h}{3}(y_j' - 2y_{j-1}' + y_{j-2}') + \frac{14}{45} h^5 y^{(5)} \]
to extrapolate and formula 6, Table 10, which is Simpson's rule,
\[ y_{j+1} = y_{j-1} + \frac{h}{3}(y_{j-1}' + 4y_j' + y_{j+1}') - h^5 y^{(5)} / 90 \]
to recalculate.

Solution by Formulas 2 and 6. A procedure which requires less past history, and therefore is better for starting and at discontinuities, uses formula 2, Table 9, to find a third order approximation to $y_{j+1}$ and Simpson's rule to recalculate. Either procedure calculates derivatives only once per point. Formula 7, Table 10, is unstable and therefore useful for extrapolation but not for recalculation.

Method of Adams and Bashforth (Ref. 25). This approach is best expressed in terms of differences:
\[
\begin{aligned}
\nabla y_j' &= y_j' - y_{j-1}', \\
\nabla^2 y_j' &= \nabla (y_j' - y_{j-1}') = y_j' - 2y_{j-1}' + y_{j-2}', \\
\nabla^n y_j' &= \nabla^{n-1} y_j' - \nabla^{n-1} y_{j-1}',
\end{aligned}
\]
\[ y_{j+1} = y_j + h \left( y_j' + \nabla y_j' / 2 + 5 \nabla^2 y_j' / 12 + 3 \nabla^3 y_j' / 8 + 251 \nabla^4 y_j' / 720 + \cdots \right). \]
For solutions whose derivatives of some order are everywhere continuous, this method has the advantage of yielding an arbitrarily high order of approximation with only one evaluation of derivatives per point. For automatic computer use, it has several disadvantages which lead to its rare use. A special starting process is required; it is awkward to change grid size; and at each isolated discontinuity the special starting process must be used again.

The Runge-Kutta Method.
Like Euler's and Heun's methods, this method avoids these difficulties (Refs. 26 and 27). It has several forms. One of the best known, which has a truncation error proportional to $h^5$, is:
\[
\begin{aligned}
y_{j+1} &= y_j + (k_1 + 2k_2 + 2k_3 + k_4)/6, \\
k_1 &= hf(x_j, y_j), \\
k_2 &= hf(x_j + h/2, y_j + k_1/2), \\
k_3 &= hf(x_j + h/2, y_j + k_2/2), \\
k_4 &= hf(x_j + h, y_j + k_3).
\end{aligned}
\]
The Runge-Kutta method was recently adapted to automatic computers by Gill in a form which concentrates on saving memory and reducing roundoff error (Ref. 28). All forms of the Runge-Kutta method have the disadvantage that the derivatives must be evaluated several times, four in these two cases.

Fourth Order Method. This method has been used extensively on automatic computers since 1946 and has been carefully studied by Dimsdale and Clippinger (Ref. 29). It consists in extrapolating for $y_{j+2}$ by the third order formula using one past point (see Table 11):
\[ y_{j+2} = y_{j-2} + 4(y_{j-2} - y_j) + 4h(2y_j' + y_{j-2}') + \tfrac{8}{3} h^4 y_j^{(4)}. \tag{32a} \]
The derivative $y_{j+2}'$ is then found and also $y_{j+1}$ by
\[ y_{j+1} = (y_j + y_{j+2})/2 + (h/4)(y_j' - y_{j+2}') + h^4 y_j^{(4)} / 24. \tag{32b} \]
The derivative $y_{j+1}'$ is then found, and $y_{j+2}$ is redetermined by Simpson's rule:
\[ y_{j+2} = y_j + \frac{h}{3}(y_j' + 4y_{j+1}' + y_{j+2}') - h^5 y_j^{(5)} / 90. \tag{32c} \]
Isolated discontinuities are made to fall at odd-numbered grid points by adjusting $h$. To start, or at points where the grid size is altered, eqs. (32c) and (32b) are iterated, and eq. (32a) is not used. Thus, like Runge-Kutta's or Gill's methods, this method requires no past history and is well suited to starting, discontinuities, and change of grid size. By the addition of a single point from past history, it achieves the efficiency of Adams's, Milne's, and other methods requiring only one evaluation of derivatives per point.

Higher Derivatives. Sometimes eq. (29) can be easily differentiated. In this case a fourth order, stable procedure requiring no past history is obtained by eliminating $h^3 y^{(3)}$ and $h^4 y^{(4)}$ from eqs. (30a), (30b), and (30c) at $i = 0, 1$:
\[ y_{j+1} = y_j + h(y_j' + y_{j+1}')/2 + h^2 (y_j'' - y_{j+1}'')/12 + h^5 y^{(5)} / 720. \]
By adding a single point from past history, one obtains the predictor,
\[ y_{j+1} = 32y_j - 31y_{j-1} - 2h(8y_j' + 7y_{j-1}') + h^2 (8y_j'' - 4y_{j-1}'')/2 + h^6 y^{(6)} / 90, \]
and the seventh order corrector,
\[ y_{j+1} = -y_{j-1} + 2y_j + 3h(y_{j+1}' - y_{j-1}')/8 + h^2 (8y_j'' - y_{j-1}'' - y_{j+1}'')/24 + h^8 y_j^{(8)} / 60480, \]
which can be used except at the start, at discontinuities, and at grid change points.

Method of Brock and Murray. A method which takes advantage of the fact that differential equations are locally linear with constant coefficients, and therefore have solutions which are locally linear combinations of exponentials, has been developed by Brock and Murray (Ref. 48).

Extrapolation to Zero Grid Size. If a method has a local error proportional to $h^{n+1}$, it has an error at a given $x$ proportional to $h^n$, since the number of local errors made going from $x_0$ to $x$ is $(x - x_0)/h$. Calling $E$ the error at $x$, $y$ the computed answer, and $\bar{y}$ the exact answer, one has
\[ E = y - \bar{y} = ah^n + bh^{n+1} + \eta, \tag{33} \]
where the remainder, $\eta$, goes to zero as $h^{n+2}$. If eq. (29) is solved numerically at two grid sizes, $h_1$ and $h_2$, one may write eq. (33) at both grids and solve for $\bar{y}$:
\[ \bar{y} = y_1 + (y_1 - y_2) \frac{r^n}{1 - r^n} + bh_2^{n+1} \frac{r^n (1 - r)}{1 - r^n} - \frac{\eta_1 - r^n \eta_2}{1 - r^n}, \tag{34} \]
where $r = h_1 / h_2$. Richardson (Ref. 30), who invented this procedure, called it "extrapolation to zero grid size." Looking at the next to last term, one sees that it would be more apt to call it "increasing the order of accuracy from $n$ to $n + 1$." Equation (34) is useful in many ways. For example:

(a) One can solve eq. (29) at two grid sizes and use eq. (34) to get a better answer at common points.

(b) One can solve eq. (29) at one grid size and occasionally take a step at two grid sizes, using the second term to estimate the error and adjust the grid size. (With this procedure it is important to use methods which depend on little past history.)
(c) One can take every step at two grids, use eq. (34) to improve the accuracy before proceeding, and also use the second term to adjust the grid size. Boundary Value Problems or Distributed Conditions. It may happen that not all components of yare specified at one value of x. Instead, some of the components of Y may be given in terms of the others at two or more points. A pproach A. Perhaps the most obvious approach to this problem is to: 1. Assume initIal conditions at Xo. 2. Solve the problem. 3. Assume other initial conditions. 4. Resolve the problem. 5. Interpolate between the initial conditions for initial conditions which will satisfy one of the other given conditions at some other point. (This is based on the theorem that the solutions of differential equations are, under suitable conditions, continuous functions of their values at particular points.) . 6. Reiterate this process until all conditions are satisfied. If there are many conditions to be satisfied by varying the same number of components of Y at Xo as parameters, the interpolation process becomes quite complicated. If convergence is also slow, it may be necessary to solve the differential equation thousands of times, treating the different equations and distributed conditions as simultaneous equations for all the variables at all the points. NUMERICAL ANALYSIS 14-62 Approach B is to consider all the distributed conditions and the approximating equations simultaneously. For instance, consider the second order system: . Y' = f(x, y, z), (35) Z' = g(x, y, z), with the distributed conditions: (36) yea) = A, ky(b) + lz(b) + my'(b) + z'(b) = O. One might use Heun's approximating difference equations: (37) Yj+l - Yj = (f(xj, Yj, Zj) Zj+l - Zj = (g(xj, Yj, Zj) + f(Xj+l' Yj+l, Zj+l)) (h/2), + g(Xj+b Yj+l, Zj+l))(h/2). Replacing a, b by Xo, Xn one would write the side conditions eq. (36) in the form Y(Xo) = A, (38) m(Yn - Yn-l) + (zn - Zn-l) = (h/2)[mfn-l + gn-l - kYn - lZn], where the second eq. 
(36) is replaced by one equivalent to it to third order and $f_n$ is written for $f(x_n, y_n, z_n)$. Equations (37) and (38) are $2(n + 1)$ simultaneous equations for the $2(n + 1)$ unknowns $y_j$, $z_j$, $j = 0, 1, 2, \ldots, n$. They are not linear; however, $h$ appears as a factor of the right members and the left members are linear. It is therefore natural and quite practical to define an iterative process, writing $y_j^i$ and $z_j^i$ for the $i$th approximation to $y_j$ and $z_j$:

(39) $y_{j+1}^{i} - y_j^{i} = (h/2)\,(f_j^{i-1} + f_{j+1}^{i-1}),$
$z_{j+1}^{i} - z_j^{i} = (h/2)\,(g_j^{i-1} + g_{j+1}^{i-1}),$
$m(y_n^{i} - y_{n-1}^{i}) + z_n^{i} - z_{n-1}^{i} = (h/2)\,(m f_{n-1}^{i-1} + g_{n-1}^{i-1} - k y_n^{i-1} - l z_n^{i-1}).$

Approach C, useful if $f$ and $g$ are readily differentiable, is to perturb eqs. (35) by introducing $\eta = y - \bar{y}$, $\zeta = z - \bar{z}$, where $\bar{y}$, $\bar{z}$ is some approximate solution:

(40) $\eta' = \dfrac{\partial f}{\partial y}\,\eta + \dfrac{\partial f}{\partial z}\,\zeta, \quad \zeta' = \dfrac{\partial g}{\partial y}\,\eta + \dfrac{\partial g}{\partial z}\,\zeta.$

It would be possible to use eqs. (39) to find $\bar{y}$ and $\bar{z}$ and then, by evaluating the derivatives $\partial f/\partial y$, etc., at $\bar{y}$, $\bar{z}$, solve eq. (40) as linear equations for $\eta$, $\zeta$ subject to initial conditions $\eta(x_0) = \zeta(x_0) = 0$.

Computer Storage Requirements and Number of Operations. The columns of Table 11 provide a guide to the use of the methods listed. Similar remarks apply to this table as in the concluding paragraph of Sect. 2 relative to content and notation of Table 5 in that section.

The presence of past history requirements is an important consideration for digital computers because this generally means programming special starting programs for use at boundary points, at points of discontinuity of the solution, or at points where the grid size changes. For this reason formulas requiring no past history are in general easiest to program.

It is important in evaluating a digital computer procedure to be able to estimate the number of operations, multiplication times, or other index of the computing time.
However, in practical problems in differential equations, this time is almost completely dominated by the time to compute the derivative $f(x, y)$ in eq. (29), and this, of course, cannot be determined except in the context of a specific problem. The next best guide to the volume of computations is the number of times the derivative must be computed per integration step, and this is listed in the last column of Table 11.

TABLE 11. COMPUTER REQUIREMENTS IN SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS

Method Extrapolation Formulas (Tables 9, 10) Formula 1 (Euler) Formula 5 (Heun) Adams-Bashforth Runge Kutta Gill Fourth Order Method Order of Error h2 h3 Arbitrary h5 hU hO Predictor-corrector formulas h5 Milne Dimsdale-Clippinger h5 (using 3 iterations) Dimsdale-Clippinger h6 (using extrapolator) 5th order predictor-7th h8 order corrector Order of hn+1, Extrapolation to zero grid at least, if size error of formula used is of order hn Past History Required Computer Storage a None None k points, where k is arbitrary None None None 2n +w 3n w n(k 1) w 3 points 5n +w None + + + 4n + w 3n +w On +w On +w 1 point On +w 1 point On +w Derivative Evaluated, Times/Step 4 4 2 (1st deriv. once and 2nd deriv. once) 3 2 3n

a n is dimension of vector y, and w is undetermined amount of working storage and program storage.

6. PARTIAL DIFFERENTIAL EQUATIONS

J. B. Diaz, R. F. Clippinger, Bernard Friedman, Eugene Isaacson, Robert Richtmyer

Introduction. A variety of physical problems, when analyzed from a mathematical point of view, lead to the consideration of boundary value problems for differential equations. In many cases, the physical quantity of interest is found to be represented by a function which satisfies a differential equation in a certain domain of the independent variables.
Besides the differential equation (which may be ordinary or partial, depending upon whether the independent variables are one or more than one, respectively) the "unknown" function is required to satisfy certain other conditions, which will be referred to collectively as boundary conditions. Generally speaking, these additional boundary conditions select, from the totality of the solutions of the differential equation in question, the solutions which correspond to the actual physical situation under study.

Example. The determination of the steady-state temperature in a plane circular plate of unit radius, whose periphery is maintained at a given temperature, amounts to the determination of a real-valued function $u(x, y)$ satisfying the partial differential equation

$\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$ for $0 \le x^2 + y^2 < 1$,

and the boundary condition

$u(x, y) = f(x, y)$ for $x^2 + y^2 = 1$,

where $f$ is a prescribed function. ($f$ is essentially the preassigned temperature distribution on the periphery.)

An equation involving a function of two or more variables and its partial derivatives is called a partial differential equation. The order of a partial differential equation is the order of the highest order derivative which actually appears in it. A partial differential equation is linear if it is of the first degree when considered as a polynomial in the unknown function and its partial derivatives (otherwise the equation is called nonlinear).

Example. The equation $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$ is a linear second order equation, while the equation $(\partial u/\partial x)^2 + u = 0$ is a nonlinear first order equation.

This section will consider linear and second order partial differential equations starting with some mathematical background and leading to a discussion of numerical methods suitable for digital computer use.
The section will conclude with a summary table giving some significant attributes of the methods listed from the point of view of digital computer solution (see Table 12).

First Order Partial Differential Equations

Consider

(41) $F(x, y, z, p, q) = 0, \quad p = \partial z/\partial x, \quad q = \partial z/\partial y,$

a partial differential equation of first order for $z$ as a function of $x$, $y$. The general solution of this problem depends on an arbitrary function. Lagrange showed that the general solution could be deduced from a "complete" solution, i.e., a two-parameter family of particular solutions. Lagrange and Charpit also showed that such a complete solution could be deduced from the solution of the system of ordinary equations for $x$, $y$, $z$, $p$, $q$ in terms of a parameter $t$:

(42) $\dfrac{dx}{dt} = x' = P = \dfrac{\partial F(x, y, z, p, q)}{\partial p},$
$y' = Q = \dfrac{\partial F}{\partial q},$
$z' = Pp + Qq,$
$p' = -\left(\dfrac{\partial F}{\partial x} + p\,\dfrac{\partial F}{\partial z}\right),$
$q' = -\left(\dfrac{\partial F}{\partial y} + q\,\dfrac{\partial F}{\partial z}\right).$

Cauchy showed that any particular solution of eqs. (41) is composed of curves he called characteristics obtained by integrating eqs. (42). Let

(43) $x_0 = f(s), \quad y_0 = g(s), \quad z_0 = h(s)$

be the parametric equations of a curve through which a particular solution of eqs. (41) is to be found. Then $p_0(s)$ and $q_0(s)$ must satisfy the differential equation,

(44) $F(f, g, h, p_0(s), q_0(s)) = 0,$

and the condition

(45) $f'\,p_0(s) + g'\,q_0(s) - h' = 0.$

The solution of eqs. (42) subject to initial conditions (43), (44), and (45) can be represented by $x = x(s, t)$, $y = y(s, t)$, $z = z(s, t)$, $p = p(s, t)$, $q = q(s, t)$. Thus the problem of finding the solution of eqs. (41) passing through curve (43) is reduced to the solution of ordinary equations, which can be done by the methods of Sect. 5.

To illustrate, consider the linear equation,

$x + y + z + p + q = 0,$

subject to the conditions $y = z = 0$, when

(46) $0 \le x \le 1.$

When $y$ is zero and $x$ is outside the range (46), $z$ is not defined. Cauchy's method yields the solution

$x = s + t,$
$y = t,$
$z = -2t + (s - 2)(e^{-t} - 1),$
$p = e^{-t} - 1,$
$q = -1 + (1 - s)\,e^{-t},$
$0 \le s \le 1.$
Eliminating $s$ and $t$ yields

$z = -2y + (x - y - 2)(-1 + e^{-y}).$

Cauchy's method shows that, in general, if $z$ is given along some arc of curve $C$ terminated at points $A$ and $B$, then $z$ is determined in a strip bounded by a characteristic through $A$ and a characteristic through $B$. Call this strip the region of determinacy; in our example, it is the strip between $y = x$ and $y = x - 1$. Any method other than Cauchy's must therefore determine these characteristics one way or another to determine the region of determinacy.

Practically, it may be difficult to obtain $\partial F/\partial x$, $\partial F/\partial y$, and $\partial F/\partial z$. In that case, it is not possible to obtain $p$ and $q$ by using the last two equations (42). As an alternative, let the characteristic curves $s$ = constant and the curves $t$ = constant be used as a curvilinear coordinate network. The transformation from Cartesian coordinates $x$, $y$ to coordinates $s$, $t$ is governed by the relations

(47) $t_x = y_s/\Delta, \quad t_y = -x_s/\Delta, \quad s_x = -y_t/\Delta, \quad s_y = x_t/\Delta,$

where $t_x$ is $\partial t/\partial x$, etc., and $\Delta = x_t y_s - x_s y_t$ is the Jacobian of the transformation. By definition and eqs. (47),

(48) $p = \partial z/\partial x = (z_t y_s - z_s y_t)/\Delta, \quad q = \partial z/\partial y = (-z_t x_s + z_s x_t)/\Delta.$

Start along $t = 0$ and use Euler's method and the first three of eqs. (42) to get $x$, $y$, $z$ at each point on $t = h$. Use numerical differentiation to obtain $x_s$, $y_s$, and $z_s$ at each point on $t = h$. Use eqs. (48) to get $p$, $q$ on the same curve. If more accuracy is desired, Heun's method may now be used to obtain better values on $t = h$. The same process may now be repeated for $t = -h, 2h, -2h$, etc.

Second Order Partial Differential Equations

The general linear partial differential equation of the second order in the two independent real variables $x$ and $y$ is

(49) $a\,\dfrac{\partial^2 u}{\partial x^2} + 2b\,\dfrac{\partial^2 u}{\partial x\,\partial y} + c\,\dfrac{\partial^2 u}{\partial y^2} + d\,\dfrac{\partial u}{\partial x} + e\,\dfrac{\partial u}{\partial y} + fu = g,$

where the letters $a, b, \ldots, g$ denote real-valued functions of $x$ and $y$. The equation is called homogeneous if the "nonhomogeneous term" $g$ is identically zero.
The linear homogeneous equation has the property that the "superposition principle of solutions" holds, i.e., that if $u$ and $v$ are solutions, any linear combination $Au + Bv$, with constant coefficients $A$ and $B$, is also a solution.

Classification. The partial differential eq. (49) can be reduced to certain typical, or canonical, forms by means of a suitable change of variables:

(50) $\xi = \xi(x, y), \quad \eta = \eta(x, y).$

Consider first the equation with constant coefficients

(51) $A\,\dfrac{\partial^2 u}{\partial x^2} + 2B\,\dfrac{\partial^2 u}{\partial x\,\partial y} + C\,\dfrac{\partial^2 u}{\partial y^2} = 0,$

and make the change of variables

(52) $\xi = \alpha x + \beta y, \quad \eta = \gamma x + \delta y,$

with $\alpha$, $\beta$, $\gamma$, $\delta$ real constants. In the new variables $\xi$, $\eta$, one has

(53) $(A\alpha^2 + 2B\alpha\beta + C\beta^2)\,\dfrac{\partial^2 u}{\partial \xi^2} + (A\gamma^2 + 2B\gamma\delta + C\delta^2)\,\dfrac{\partial^2 u}{\partial \eta^2} + 2(A\alpha\gamma + B[\alpha\delta + \beta\gamma] + C\beta\delta)\,\dfrac{\partial^2 u}{\partial \xi\,\partial \eta} = 0.$

Since eq. (51) is assumed to be of second order, not all three real constants $A$, $B$, $C$ are zero, i.e., $A^2 + B^2 + C^2 > 0$. It will now be supposed further that $A \ne 0$. There is no loss of generality, since if $A = 0$ and $C = 0$, too, then $B \ne 0$, and the equation is already in "canonical" form (see eq. 54 below); whereas if $A = 0$ and $C \ne 0$ one has merely to interchange the roles of $x$ and $y$. The classification into three types is as follows:

$B^2 - AC > 0$, hyperbolic type,
$B^2 - AC < 0$, elliptic type,
$B^2 - AC = 0$, parabolic type.

(The reason for the designations elliptic, hyperbolic, and parabolic is obvious from analytic geometry, the reduction of a quadratic bilinear form $Ax^2 + 2Bxy + Cy^2$ to a sum of squares.)

HYPERBOLIC CASE. When $B^2 - AC > 0$, by choosing

$\alpha = \dfrac{-B + \sqrt{B^2 - AC}}{A}, \quad \beta = \delta = 1, \quad \gamma = \dfrac{-B - \sqrt{B^2 - AC}}{A}$

in eqs. (52) and (53), and by dividing (53) by a nonzero constant, one obtains the canonical form

(54) $\dfrac{\partial^2 u}{\partial \xi\,\partial \eta} = 0,$

whereas by choosing

$\alpha = \dfrac{-C}{\sqrt{B^2 - AC}}, \quad \beta = \dfrac{B}{\sqrt{B^2 - AC}}, \quad \delta = 1, \quad \gamma = 0,$

one obtains similarly the canonical form

$\dfrac{\partial^2 u}{\partial \xi^2} - \dfrac{\partial^2 u}{\partial \eta^2} = 0.$

ELLIPTIC CASE. When $B^2 - AC < 0$, by choosing

$\alpha = \dfrac{-C}{\sqrt{AC - B^2}}, \quad \beta = \dfrac{B}{\sqrt{AC - B^2}}, \quad \delta = 1, \quad \gamma = 0,$

one obtains the canonical form

$\dfrac{\partial^2 u}{\partial \xi^2} + \dfrac{\partial^2 u}{\partial \eta^2} = 0.$

PARABOLIC CASE.
When $B^2 - AC = 0$, by choosing $\beta = 1$, $\alpha \ne -B/A$ and $\delta = 1$, $\gamma = -B/A$, one obtains the canonical form

$\dfrac{\partial^2 u}{\partial \xi^2} = 0.$

In the general case of an eq. (49) with variable coefficients, it is said to be of elliptic, hyperbolic, or parabolic type at a given point $(x_0, y_0)$ according to whether $b^2(x_0, y_0) - a(x_0, y_0)\,c(x_0, y_0)$ is $< 0$, $> 0$, or $= 0$, respectively. If the coefficients $a, \ldots, g$ are sufficiently smooth in a neighborhood of $(x_0, y_0)$, and eq. (49) is elliptic at each point of the neighborhood, there is a sufficiently small subneighborhood of $(x_0, y_0)$ in which one can introduce new variables by means of eq. (50) (not necessarily a linear change of variables as in the case eq. (51) of constant coefficients) so that eq. (49) becomes, in this subneighborhood,

$\dfrac{\partial^2 u}{\partial \xi^2} + \dfrac{\partial^2 u}{\partial \eta^2} + \left(\text{linear terms in } \dfrac{\partial u}{\partial \xi},\ \dfrac{\partial u}{\partial \eta},\ \text{and } u\right) = 0.$

A similar statement applies in the hyperbolic and parabolic cases.

Of course, eq. (49) with variable coefficients may be of different type at different points of a domain, i.e., it may be of "mixed" type, as occurs in the linearized equation for the potential function of a two-dimensional compressible flow.

Example. The equation $y\,\partial^2 u/\partial y^2 + \partial^2 u/\partial x^2 = 0$ is elliptic for $y > 0$, parabolic for $y = 0$, and hyperbolic for $y < 0$.

Representative equations commonly studied are:

(a) Elliptic: $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$ (Laplace)
(b) Hyperbolic: $\partial^2 u/\partial x^2 - \partial^2 u/\partial y^2 = 0$ (Vibrating string)
(c) Parabolic: $\partial^2 u/\partial x^2 - \partial u/\partial y = 0$ (Heat)

The variable $t$ is usually written instead of the variable $y$ in the last two equations. For a more detailed discussion of the canonical forms of eq. (49), as well as for the classification into canonical forms of higher order equations and systems of equations, see Refs. 36-38.

Difference Equations.
In numerical investigations it is often necessary to replace the partial differential equation occurring in a given boundary value problem by a suitable equation involving differences rather than derivatives of the unknown function (see Chap. 4). The basic principle usually employed is none other than the fact that any partial derivative is the limit of a certain difference quotient. For a function of one variable, $f(x)$, the difference quotients in the plus $x$ and minus $x$ directions, $f_x$ and $f_{\bar{x}}$, are defined by

$f_x(x) = \dfrac{f(x + h) - f(x)}{h}$ and $f_{\bar{x}}(x) = \dfrac{f(x) - f(x - h)}{h},$

where $h > 0$. The second differences of $f(x)$ are defined as the differences of the first differences. There are three second differences, $f_{xx}$, $f_{x\bar{x}}\ (= f_{\bar{x}x})$, and $f_{\bar{x}\bar{x}}$. The second difference $f_{x\bar{x}}$ is the most "symmetric" of the three:

$f_{x\bar{x}}(x) = \dfrac{f(x + h) + f(x - h) - 2f(x)}{h^2}.$

The corresponding differences for functions of several independent variables are defined as above, upon holding fixed all the variables but one at a time. For example, for a function of two variables $u(x, y)$:

$u_x(x, y) = \dfrac{u(x + h, y) - u(x, y)}{h}$ and $u_{x\bar{x}}(x, y) = \dfrac{u(x + h, y) - 2u(x, y) + u(x - h, y)}{h^2},$ etc.

Laplace equation. By taking the difference equation $u_{x\bar{x}} + u_{y\bar{y}} = 0$ as a "difference approximation" to Laplace's differential equation $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$, one obtains the difference equation

(55) $\dfrac{u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)}{4} = u(x, y).$

Vibrating string. By taking the difference equation $u_{x\bar{x}} - u_{y\bar{y}} = 0$ as a difference approximation to the vibrating string equation $\partial^2 u/\partial x^2 - \partial^2 u/\partial y^2 = 0$, one obtains the difference equation

$u(x + h, y) + u(x - h, y) - u(x, y + h) - u(x, y - h) = 0.$

In the case of the heat equation $\partial^2 u/\partial x^2 - \partial u/\partial y = 0$, one has the alternative difference equations $u_{x\bar{x}} - u_y = 0$ and $u_{x\bar{x}} - u_{\bar{y}} = 0$.

Exactly the same procedure is applicable to first and to higher order partial differential equations, as well as to systems of equations.
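The difference quotients defined above, and the averaging form (55) of the Laplace difference equation, can be checked numerically. The following is a minimal Python sketch, not part of the handbook: the function names (`forward_diff`, `backward_diff`, `central_second_diff`, `five_point_residual`) and the test function $u = x^2 - y^2$ are illustrative assumptions chosen only to exercise the formulas.

```python
import math


def forward_diff(f, x, h):
    """Difference quotient in the plus-x direction: (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def backward_diff(f, x, h):
    """Difference quotient in the minus-x direction: (f(x) - f(x-h)) / h."""
    return (f(x) - f(x - h)) / h


def central_second_diff(f, x, h):
    """Symmetric second difference: (f(x+h) + f(x-h) - 2 f(x)) / h^2."""
    return (f(x + h) + f(x - h) - 2.0 * f(x)) / (h * h)


def five_point_residual(u, x, y, h):
    """Average of the four neighbours minus u(x, y), i.e. eq. (55) as a residual."""
    average = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)) / 4.0
    return average - u(x, y)


def harmonic(x, y):
    """Illustrative test function (an assumption): x^2 - y^2 satisfies Laplace's equation."""
    return x * x - y * y


if __name__ == "__main__":
    h = 1e-3
    print("second difference of sin at 1:", central_second_diff(math.sin, 1.0, h))
    print("exact second derivative      :", -math.sin(1.0))
    print("residual of eq. (55)         :", five_point_residual(harmonic, 0.3, 0.7, 0.1))
```

For a quadratic harmonic function such as this one the residual of eq. (55) vanishes up to roundoff for any $h$; for a general harmonic function it is only small, of order $h^4$.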
An alternative approach to the numerical treatment of first order partial differential equations can be based on the fact demonstrated in the previous subsection that the solution of a first order partial differential equation and the solution of the characteristic system of ordinary differential eqs. (42) corresponding to the given first order partial differential equation are equivalent tasks.

Note. The following three subsections represent results obtained at the Institute of Mathematical Sciences, New York University, under the sponsorship of the United States Atomic Energy Commission Contract AT(30-1)1480. Reproduction in whole or in part permitted for any purpose of the United States Government.

Elliptic Partial Differential Equations

Consider a partial differential equation of second order for a function $u$ of $n$ variables $x_1, x_2, \ldots, x_n$. One writes the equation as follows:

(56) $Lu \equiv \sum_{i,j} a_{ij}\,\dfrac{\partial^2 u}{\partial x_i\,\partial x_j} + \sum_i b_i\,\dfrac{\partial u}{\partial x_i} + cu = f.$

The coefficients $a_{ij}$, $b_i$, $c$ are assumed to be constant. This equation is called elliptic in a region $R$ if the quadratic form

(57) $\sum_{i,j} a_{ij}\,\xi_i\,\xi_j$

is non-negative definite for all values of the $\xi_i$ such that $(\xi_1, \xi_2, \ldots, \xi_n)$ is a point in $R$. A typical example of an elliptic differential equation is Poisson's equation, that is,

$\sum_{i=1}^{n} \dfrac{\partial^2 u}{\partial x_i^2} = f.$

Note that the condition (57) for ellipticity depends only on the highest order derivative terms of eq. (56).

Dirichlet and Neumann Problems. A typical problem involving elliptic differential equations is one that requires the solution of a boundary value problem. For example, a typical problem would be to solve $Lu = f$ in a region $R$ given that $u$ is a prescribed function $u_0$ on the boundary $B$ of the region $R$. Such a problem is called a Dirichlet problem for eq. (56). If, instead of the values of $u$, the values of $\partial u/\partial \nu$, the normal derivative of $u$, are prescribed on $B$, the problem is called a Neumann problem. A more general problem is that in which eq. (56) has to be solved, given that the values of $h_1 u + h_2\,\partial u/\partial \nu$ are prescribed on $B$.
Here $h_1$ and $h_2$ are known functions. If $h_2 = 0$, it is a Dirichlet problem; if $h_1 = 0$, a Neumann problem; if neither is identically zero, it is a mixed problem.

Choice of Method. The standard procedure for solving a partial differential equation numerically is to place a rectangular mesh on $R$, to replace the differential equation at each mesh point by a finite difference approximation, and thus obtain a set of linear equations. The main difficulty in this procedure occurs in the process of solving the set of linear equations. Inverting the matrix of this set of linear equations is usually not convenient because the matrix is generally ill conditioned. A "marching" process such as that used for solving hyperbolic equations, by which the values in the lines of the mesh are determined in succession from the values on the preceding lines, is not feasible because the values on any line depend on the values of two preceding lines and the boundary data are not sufficient to determine the values on two successive lines. Because of these considerations, the method most frequently used for solving the linear equations is an iteration or relaxation method.

Iteration Procedure. In order to discuss iteration methods, some notation is needed. Suppose one wishes to solve a system of equations in $p$ unknowns. Let $x$ denote a $p$-dimensional vector whose components are the $p$ unknowns $x_1, x_2, \ldots, x_p$; let the $p \times p$ matrix, $K$, of the coefficients of the unknowns be nonsingular, and let $b$ be a vector whose components are the $p$ nonhomogeneous terms in the set of equations. Then write the system of equations as follows:

(58) $Kx = b.$

To solve this system by iteration, put $K = N - P$, where $N$ and $P$ are any matrices whose difference is $K$, and write (58) as

$Nx = Px + b.$

Make an estimate of the value of $x$ and call it $x^{(0)}$. A better estimate is possibly given by the vector $x^{(1)}$ such that

$N x^{(1)} = P x^{(0)} + b.$

This process can be continued indefinitely. Define the vector $x^{(n+1)}$ $(n = 0, 1, 2, \ldots)$ as the solution of

(59) $N x^{(n+1)} = P x^{(n)} + b, \quad n = 0, 1, 2, \ldots,$

and hope that the sequence of vectors $x^{(n)}$ converges in the limit to the desired vector $x$.

The iteration method defined here is completely general in that the splitting of the matrix $K$ into two matrices $N$ and $P$ was arbitrary. Each distinct split gives a different iteration procedure. There are, however, two restrictions on the ways of splitting $K$.

(a) To find $x^{(n+1)}$ from eq. (59) more easily than to find $x$ from (58), $N$ must be a matrix with an easily found inverse. For example, $N$ might be a diagonal matrix or a lower triangular matrix.

(b) For the iteration scheme to converge, it is required that all the eigenvalues of the matrix $N^{-1}P$ be in absolute value less than 1. It can be shown that this is a necessary and sufficient condition for the sequence $x^{(n)}$ to converge to $x$, no matter what the original guess $x^{(0)}$ is.

Richardson and Liebmann Iteration Methods. The ideas of the preceding section will be illustrated by applying them to the solution of Poisson's equation

(60) $\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = f(x, y)$

inside the unit square where $0 \le x \le 1$, $0 \le y \le 1$. Assume that the values of $u(x, y)$ are given on the boundary of the square.

Put a square mesh of width $1/p$ over the unit square and let $u(i/p, j/p) = u_{i,j}$, $i, j = 0, 1, \ldots, p$. By the use of the well-known finite difference approximation for the Laplacian (see eq. 55), eq. (60) becomes

(61) $u_{i-1,j} + u_{i,j-1} - 4u_{i,j} + u_{i+1,j} + u_{i,j+1} = \dfrac{1}{p^2}\,f_{ij}, \quad i, j = 1, 2, \ldots, p - 1.$

Since the values of $u_{0,j}$, $u_{p,j}$ $(j = 0, 1, \ldots, p)$ and of $u_{i,0}$, $u_{i,p}$ $(i = 0, 1, \ldots, p)$ are given, (61) is a system of $(p - 1)^2$ equations for the $(p - 1)^2$ unknowns $u_{i,j}$ $(i, j = 1, 2, \ldots, p - 1)$.

In Richardson's method for solving eq. (61), the following iteration scheme is used:

(62) $4u_{i,j}^{(n+1)} = u_{i-1,j}^{(n)} + u_{i,j-1}^{(n)} + u_{i+1,j}^{(n)} + u_{i,j+1}^{(n)} - \dfrac{1}{p^2}\,f_{ij}, \quad n = 0, 1, 2, \ldots.$

The values of $u_{i,j}^{(0)}$ are of course the initial guess to the solution of eq. (60). If this method is compared with that in eq.
(59), it is apparent that the split is such that $N$ is a diagonal matrix.

One disadvantage of Richardson's method when an electronic computer is used is that all the previous values of $u_{i,j}^{(n)}$ must be stored until all the new values of $u_{i,j}^{(n+1)}$ are found. This disadvantage is avoided in Liebmann's method, where the new value of $u_{i,j}^{(n+1)}$ is calculated by using as many new values as are available. Thus, if the values of $u_{i,j}$ are calculated in order along each row from left to right and the rows in order from bottom to top, the following iteration scheme would be used:

(63) $4u_{i,j}^{(n+1)} = u_{i-1,j}^{(n+1)} + u_{i,j-1}^{(n+1)} + u_{i+1,j}^{(n)} + u_{i,j+1}^{(n)} - \dfrac{1}{p^2}\,f_{ij}.$

It can be proved that the method defined by eq. (63) would converge twice as fast as that defined by eq. (62).

The rate of convergence can be still further improved by using an extrapolation parameter $\alpha$, thus obtaining what is called Liebmann's extrapolated method. The iteration scheme is now this:

$4u_{i,j}^{(n+1)} = 4(1 - \alpha)\,u_{i,j}^{(n)} + \alpha\left[u_{i-1,j}^{(n+1)} + u_{i,j-1}^{(n+1)} + u_{i+1,j}^{(n)} + u_{i,j+1}^{(n)} - \dfrac{1}{p^2}\,f_{ij}\right].$

The value of $\alpha$ for which convergence is fastest is found by solving the equation

$\alpha^2 t_m^2 - 4\alpha + 4 = 0,$

where $t_m$ is the largest eigenvalue of the Richardson scheme eq. (62). For the case considered, $t_m = \cos(\pi/p)$. For a rectangular mesh with $p$ divisions in one direction and $q$ divisions in the other, $t_m = \tfrac{1}{2}[\cos(\pi/p) + \cos(\pi/q)]$. In general the use of the extrapolated Liebmann method with the best value of $\alpha$ will be much faster than the unextrapolated Liebmann method.

Line Iteration Schemes. Another iteration method which is useful in many cases is given by the following scheme:

(64) $4u_{i,j}^{(n+1)} - u_{i,j-1}^{(n+1)} - u_{i,j+1}^{(n+1)} = u_{i-1,j}^{(n+1)} + u_{i+1,j}^{(n)} - \dfrac{1}{p^2}\,f_{ij}.$

In this scheme, instead of solving for the values of $u$ at a point $ij$, solve for all values of $u$ on the $i$th column in terms of the values of $u$ on the $(i - 1)$th and $(i + 1)$th column. That is why eq.
(64) has been written with the left-hand side containing all the $u$-values on the $i$th column. Since at each step the value of the right-hand side is known for all values of $j$, the three-term relation defined by eq. (64) is solved for the values of $u$ on the $i$th column. (The method of solving the three-term relation is explained in the subsection on Hyperbolic Partial Differential Equations.)

Instead of solving for the values of $u$ on a column, one may solve for the values of $u$ on a row. In that case use the following scheme:

(65) $4u_{i,j}^{(n+1)} - u_{i+1,j}^{(n+1)} - u_{i-1,j}^{(n+1)} = u_{i,j-1}^{(n+1)} + u_{i,j+1}^{(n)} - \dfrac{1}{p^2}\,f_{ij}.$

Again this three-term recurrence scheme is solved for the values of $u$ on the $j$th row starting with $j = 1$.

The Method of Peaceman and Rachford (Ref. 35). This seems to be one of the quickest iterative methods for solving an elliptic differential equation. It is essentially a line iteration scheme which uses columns and rows alternately. The explicit description of the method is contained in the following formulas:

$u_{i-1,j}^{(2n+1)} - (2 + \rho_n)\,u_{i,j}^{(2n+1)} + u_{i+1,j}^{(2n+1)} = -u_{i,j-1}^{(2n)} + (2 - \rho_n)\,u_{i,j}^{(2n)} - u_{i,j+1}^{(2n)} + \dfrac{1}{p^2}\,f_{ij};$

$u_{i,j-1}^{(2n+2)} - (2 + \rho_n)\,u_{i,j}^{(2n+2)} + u_{i,j+1}^{(2n+2)} = -u_{i-1,j}^{(2n+1)} + (2 - \rho_n)\,u_{i,j}^{(2n+1)} - u_{i+1,j}^{(2n+1)} + \dfrac{1}{p^2}\,f_{ij}.$

Here $\rho_n$ is an extrapolation parameter which is to be determined so that the method will converge as quickly as possible. In the present case Peaceman and Rachford suggest putting $\rho_n = \rho_k$ if $n \equiv k \pmod{p}$, where

$\rho_k = 4 \sin^2 \dfrac{(2k + 1)\pi}{4p}.$

Variational Principle. An important characteristic of elliptic differential equations is that they can be obtained as the Euler equations of problems in the calculus of variations. Physically, this implies that the problem possesses an energy integral whose minimum value is given by the solution of the elliptic partial differential equation.
For example, in Dirichlet's problem the integral

(66) $\iint_R (\nabla u)^2\,dx\,dy$

must be a minimum in the domain of all functions $u$ satisfying the preassigned boundary conditions. In problems with mixed boundary conditions the integral (66) must be modified (for details see Ref. 37). As another example, in elasticity problems involving plates, the integral

(67) $\iint (\Delta u)^2\,dx\,dy$

must be a minimum in the domain of all functions $u$ satisfying the preassigned boundary conditions.

For numerical purposes the energy integral can be approximated by a sum involving the values of the unknown function $u$ at a set of points inside the region $R$. Then choose values of $u$ so that the sum will be a minimum. For example, (66) would be approximated by

(68) $\sum_i \sum_j \left[(u_{i,j} - u_{i-1,j})^2 + (u_{i,j} - u_{i,j-1})^2\right],$

and (67) by

(69) $\sum_i \sum_j \left[u_{i,j} - \tfrac{1}{4}(u_{i,j+1} + u_{i,j-1} + u_{i+1,j} + u_{i-1,j})\right]^2.$

As these illustrations show, the sum is a quadratic form in the values $u_{i,j}$. By differentiation with respect to $u_{i,j}$ a set of linear equations is obtained whose solution will make the sum a minimum. Simple algebra shows that when this method is applied to eqs. (68) and (69) the standard difference equations for Laplace's equation or the biharmonic equation are obtained.

Any iteration method which at each step reduces the value of the sum must automatically converge to a minimum value. Use of this fact easily shows that the various schemes proposed above do converge to a solution. The variational principle is also useful in determining how the boundary conditions should be taken into account.

Hyperbolic Partial Differential Equations

The equation of the vibrating string will be used to illustrate some finite difference methods for solving problems involving a second order hyperbolic partial differential equation.
If the end points of the string are held fixed at $x = 0$ and $x = 1$, the deflection of the string $u(x, t)$ is determined from the initial deflection $u(x, 0) = f(x)$, and the initial velocity $u_t(x, 0) = g(x)$. The conditions describing the motion are:

P.D.E. $\dfrac{1}{c^2}\,u_{tt} = u_{xx}$, for $0 < x < 1$, $t > 0$, [...]

[...] $\lambda > 1/c$, the solution $U$ will not converge to $u$ for all initial displacements. This may be verified by noting that the solution of the initial value problem for the infinite string is given by

(72) $u(x, t) = \tfrac{1}{2}\,[f(x + ct) + f(x - ct)] + \dfrac{1}{2c} \displaystyle\int_{x-ct}^{x+ct} g(\xi)\,d\xi.$

Formula (72) shows that the solution $u(x, t)$ depends solely on the initial data in the interval $(x - ct, x + ct)$. The solution $U(x, t)$ depends solely on the initial data in the interval $[x - (t/\lambda), x + (t/\lambda)]$. Hence if $\lambda > 1/c$, it is possible to vary $f$ and $g$ in the intervals $[x - ct, x - (t/\lambda)]$ and $[x + (t/\lambda), x + ct]$ in such a way that the solution $u(x, t)$ is changed, but yet $U(x, t)$ is unaffected. Hence if $\lambda > 1/c$, the solution $U(x, t)$ cannot converge as $h, k \to 0$ since it would have to converge to different values. Hence it is necessary for convergence that $\lambda \le 1/c$, i.e., the "domain of dependence" for the solution of the finite difference equation should contain the domain of dependence of the solution of the differential equation.

In fact, if $\lambda \le 1/c$, $U$ does converge to $u$ as $h, k \to 0$. The proof of convergence may be made to rest upon the Fourier series representations of the solutions of eqs. (71) and (70), namely,

$U(x, t) = \sum_{n=1}^{\infty} (A_n \cos \mu_n t + B_n \sin \mu_n t)\,\sin nx,$

where $\mu_n$ is determined from the condition $\sin(\mu_n k/2) = \lambda c\,\sin(nh/2)$; and

$u(x, t) = \sum_{n=1}^{\infty} (a_n \cos nct + b_n \sin nct)\,\sin nx.$

Roundoff and Truncation Errors. The calculation of $U$ is in practice effected by rounding to a finite number of decimal places.
The equations which determine $U$ are

$U_{i,0} = f_i + R_{i,0},$
$U_{i,1} = \dfrac{c^2\lambda^2}{2}\,f_{i+1} + (1 - c^2\lambda^2)\,f_i + \dfrac{c^2\lambda^2}{2}\,f_{i-1} + k g_i + R_{i,1},$
$U_{i,j+1} = c^2\lambda^2\,U_{i+1,j} + (2 - 2c^2\lambda^2)\,U_{i,j} + c^2\lambda^2\,U_{i-1,j} - U_{i,j-1} + R_{i,j+1},$
$U_{0,j} = U_{N,j} = 0,$

where $R_{p,q}$ is the roundoff error. The truncation error $T_{p,q}$ is defined by substituting $u$ into eq. (71) as follows:

$u_{i,0} = f_i + T_{i,0},$
$u_{i,1} = \dfrac{c^2\lambda^2}{2}\,f_{i+1} + (1 - c^2\lambda^2)\,f_i + \dfrac{c^2\lambda^2}{2}\,f_{i-1} + k g_i + T_{i,1},$

It is easily verified that $T_{i,0} = 0$, $T_{i,1} = O(k^3)$, $T_{i,j} = O(k^4)$, where $O(k^n)$ represents a quantity which is bounded in absolute value for all sufficiently small $k$ by $Mk^n$ with some constant $M$. It is reasonable to require that the roundoff error be of the same order of magnitude as the truncation error or smaller, in order that the number of digits carried in the calculation be appropriate for the interval size. With this restriction, the total error $e_{i,j}$ is $O(T^2 k^2)$ for any finite time, $T$, where $e_{i,j} = U_{i,j} - u_{i,j}$ and $0 \le j \le T/k$.

Implicit Schemes. The restriction $k/h = \lambda \le 1/c$ may be relaxed by using an implicit scheme. That is, it is possible to take larger time steps at the expense of more involved calculations as follows:

P.D.E. $\dfrac{U_{i,j+1} - 2U_{i,j} + U_{i,j-1}}{k^2} = \dfrac{c^2}{h^2}\,\big[\alpha^2 (U_{i+1,j+1} - 2U_{i,j+1} + U_{i-1,j+1}) + (1 - 2\alpha^2)(U_{i+1,j} - 2U_{i,j} + U_{i-1,j}) + \alpha^2 (U_{i+1,j-1} - 2U_{i,j-1} + U_{i-1,j-1})\big].$

The equations when solved for the unknowns at the $(j + 1)$-st time step have the form

(73) $(\alpha c \lambda)^2\,U_{i+1,j+1} - [1 + 2(\alpha c \lambda)^2]\,U_{i,j+1} + (\alpha c \lambda)^2\,U_{i-1,j+1} = W$

for $i = 1, 2, \ldots, N - 1$, where $W$ involves information on the two preceding lines ($j$ and $j - 1$). The labor involved in solving eq. (73) is minimal since the $(N - 1) \times (N - 1)$ matrix of coefficients is in triple diagonal form. At the same time the condition on $\lambda$ which insures convergence of $U$ to $u$ as $h, k \to 0$ is

$\lambda^2 c^2 \le \dfrac{1}{1 - 4\alpha^2},$

and no restriction for $\tfrac{1}{4} \le \alpha^2$.

Solution of Triple Diagonal Systems. The equations

$b_1 x_1 + c_1 x_2 = y_1,$
$a_2 x_1 + b_2 x_2 + c_2 x_3 = y_2,$
$a_3 x_2 + b_3 x_3 + c_3 x_4 = y_3,$
$\cdots$
$a_{N-1} x_{N-2} + b_{N-1} x_{N-1} + c_{N-1} x_N = y_{N-1},$
$a_N x_{N-1} + b_N x_N = y_N$

may be solved by eliminating the unknowns in succession from the equations. By starting at the top the system can be put in the form

$x_1 + C_1 x_2 = Y_1,$
$x_2 + C_2 x_3 = Y_2,$
$\cdots$
$x_{N-1} + C_{N-1} x_N = Y_{N-1},$
$x_N = Y_N.$

The numbers $C_K$ and $Y_K$ may be recursively computed from the formulas

(74) $C_K = \dfrac{c_K}{b_K - a_K C_{K-1}}, \quad Y_K = \dfrac{y_K - a_K Y_{K-1}}{b_K - a_K C_{K-1}},$

for $K = 2, 3, \ldots, N$, starting from $C_1 = c_1/b_1$, $Y_1 = y_1/b_1$. It is now easy to solve for the $x_K$ beginning with $x_N$ as follows:

(75) $x_N = Y_N, \quad x_K = Y_K - C_K x_{K+1},$

for $K = N - 1, N - 2, \ldots, 1$.

The von Neumann Criterion for Convergence. A quick method for heuristically testing the convergence of a finite difference method has been attributed to von Neumann. In the case of linear differential equations with variable coefficients, the method consists in replacing the coefficients by constants and then finding all solutions of the difference equation of the form

$U(x, t) = e^{\gamma t} e^{i\beta x},$

with $\beta$ real. If $|e^{\gamma t}| \le 1$ for $t \ge 0$, for all $\beta$ and for all admissible values of the coefficients, the finite difference method is said to be stable, otherwise not. The von Neumann "test" for convergence is the same as for stability. In practice, this test for convergence is as simple as any a priori calculation could be; in addition, it has been shown to be a sufficient condition for convergence for a large number of cases.

Parabolic Partial Differential Equations

Finite difference methods for parabolic equations are similar to those for hyperbolic equations. The present discussion will be restricted to equations of the first order in time and second order in one or more variables. Illustrative methods will be given for:

(a) The linear heat flow equation in one dimension.
(b) A quasilinear equation in one space variable and time.
(c) A linear parabolic equation in two space variables and time.
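Returning briefly to the triple diagonal systems treated above, the elimination of eqs. (74) and back substitution of eq. (75) can be sketched in a few lines of Python. This is a minimal illustration rather than the handbook's own program: the function name `solve_tridiagonal`, the zero-based list layout, and the sample system are assumptions made for the example, and no pivoting is done, so the denominators $b_K - a_K C_{K-1}$ are assumed to stay away from zero (as they do for diagonally dominant systems).

```python
def solve_tridiagonal(a, b, c, y):
    """Solve a triple diagonal system by eqs. (74) and (75).

    a, b, c, y are lists of length N holding the sub-diagonal, diagonal,
    super-diagonal and right-hand side; a[0] and c[N-1] are not used.
    Indices are zero-based here, so list index K corresponds to K+1 in the text.
    """
    n = len(b)
    big_c = [0.0] * n  # the numbers C_K of eq. (74)
    big_y = [0.0] * n  # the numbers Y_K of eq. (74)

    # Starting values: C_1 = c_1 / b_1 and Y_1 = y_1 / b_1.
    big_c[0] = c[0] / b[0] if n > 1 else 0.0
    big_y[0] = y[0] / b[0]

    # Forward elimination, eq. (74).
    for k in range(1, n):
        denom = b[k] - a[k] * big_c[k - 1]
        big_c[k] = c[k] / denom if k < n - 1 else 0.0
        big_y[k] = (y[k] - a[k] * big_y[k - 1]) / denom

    # Back substitution, eq. (75): x_N = Y_N, x_K = Y_K - C_K x_{K+1}.
    x = [0.0] * n
    x[n - 1] = big_y[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = big_y[k] - big_c[k] * x[k + 1]
    return x


if __name__ == "__main__":
    # Illustrative 3 x 3 system whose solution is x = (1, 2, 3).
    a = [0.0, 1.0, 1.0]
    b = [2.0, 2.0, 2.0]
    c = [1.0, 1.0, 0.0]
    y = [4.0, 8.0, 8.0]
    print(solve_tridiagonal(a, b, c, y))
```

The work grows only linearly with $N$, which is why the implicit equations (73) add little labor per time step.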
For diffusion or heat flow in one dimension there is an initial value problem consisting of a partial differential equation (P.D.E.), an initial condition (I.C.), and boundary conditions (B.C.) for a function $u(x, t)$. In the simplest case, these are:

(76)
P.D.E. $\dfrac{1}{\sigma} u_t = u_{xx}$, for $0 < x < 1$, $t > 0$,
I.C. $u(x, 0) = f(x)$,
B.C. $u(0, t) = u(1, t) = 0$, for $t > 0$.

… as $h, k \to 0$, the solution of the difference eqs. (77) diverges for all but special cases in which the initial function $f(x)$ has a terminating Fourier series. One says that the equations are unstable under these circumstances and that (78) is a condition for stability. For general discussions of convergence and stability, see Refs. 39, 40, and 43.

The convergence as $k \to 0$ is slower, at least in a formal sense, than it is for the hyperbolic problem, and its rate depends upon the smoothness of the initial function $f(x)$. If condition (78) is satisfied and $f(x)$ is analytic for $0 \le x \le 1$, the error $e_{i,j}$ of the approximation (77) is $O(Tk)$ for $t$ in a finite interval $0 \le t \le T$. One can of course also write $e_{i,j} = O(Th^2)$ because of the relation (78). The method is more accurate in the special case in which $h$ and $k$ are so chosen that

$$\frac{\sigma k}{h^2} = \frac{1}{6}. \tag{79}$$

It is easy to verify by Taylor's series expansions that in this case there is a cancellation of the first order error terms coming from the two members of the first eq. (77). In consequence, if $f(x)$ is analytic, $e_{i,j} = O(Tk^2) = O(Th^4)$.

A condition of the form (78) is perhaps not unexpected from the point of view of the domain of dependence of the differential equation, which is not confined to a small interval as it was for the vibrating string problem. That is, $u(x, t)$, for any $t > 0$, depends on all the initial data, i.e., on the values of $f(x)$ for the entire interval $0 \le x \le 1$. For any finite values of $h$ and $k$ the difference equations of course possess a restricted domain of dependence, but as the mesh is refined this domain opens out (because eq.
78 requires that $k$ vary as $h^2$) so as to include all past values of the function.

Implicit Equations. Implicit difference equations can be constructed for the heat flow problem in many ways. For example, one can replace the first eq. (77) by the equation

$$\frac{U_{i,j+1} - U_{i,j}}{\sigma k} = \frac{1}{h^2}\left[a(U_{i+1,j+1} - 2U_{i,j+1} + U_{i-1,j+1}) + (1 - a)(U_{i+1,j} - 2U_{i,j} + U_{i-1,j})\right], \tag{80}$$

where $a$ is a constant. The resulting method reduces to the foregoing explicit method for $a = 0$, to the so-called Crank-Nicolson method (Ref. 41) for $a = \tfrac{1}{2}$, and to the method of Laasonen (Ref. 42) for $a = 1$. The condition for convergence of the solutions of eq. (80) as $h, k \to 0$ is

(81a) $\dfrac{\sigma k}{h^2} \le \dfrac{1}{2 - 4a}$ if $0 \le a < \tfrac{1}{2}$,
(81b) No restriction if $\tfrac{1}{2} \le a \le 1$.

… by the difference equation (83) … $C > 0$, $AC - B^2 > 0$ can be approximated by an analog of the general implicit eq. (80). Because of the number of variables it is convenient to introduce a slightly different notational convention for this problem by calling the increments $\Delta t$, $\Delta x$, $\Delta y$ and by relating the time variable $t$ to a superscript $n$ as follows. Let

$$U_{j,l}^{\,n} = U(j\,\Delta x,\; l\,\Delta y,\; n\,\Delta t),$$

…

$$\ldots = f(x_1, x_2, \ldots, x_n) + \lambda_1 g_1(x_1, x_2, \ldots, x_n) + \cdots + \lambda_m g_m(x_1, x_2, \ldots, x_n),$$

where $\lambda_1, \lambda_2, \ldots, \lambda_m$ are undetermined multipliers called Lagrangian multipliers.

Then, in order to determine the extremal values of $u = f(x_1, x_2, \ldots, x_n)$, all that is necessary is to obtain the solution of the system of eqs. (10) for the unknowns $x_1, x_2, \ldots, x_n, \lambda_1, \lambda_2, \ldots, \lambda_m$. (See Ref. 5.)

EXAMPLE. Find the point in the plane $x + 2y + 3z = 14$ nearest the origin. The problem may be converted to that of finding values of $x$, $y$, and $z$ which minimize the square of the distance from the origin, $u = D^2 = x^2 + y^2 + z^2$, subject to $g = x + 2y + 3z - 14 = 0$.

Form: $\phi = x^2 + y^2 + z^2 - \lambda(x + 2y + 3z - 14)$.

Take partial derivatives of $\phi$ with respect to $x$, $y$, $z$, and $\lambda$:

$$\frac{\partial\phi}{\partial x} = 2x - \lambda, \qquad \frac{\partial\phi}{\partial y} = 2y - 2\lambda, \qquad \frac{\partial\phi}{\partial z} = 2z - 3\lambda, \qquad \frac{\partial\phi}{\partial\lambda} = -(x + 2y + 3z - 14).$$
Setting these four partial derivatives equal to zero and solving the system of four simultaneous equations then yields $x = 1$, $y = 2$, $z = 3$, and $\lambda = 2$; that is, the point in the plane $x + 2y + 3z = 14$ nearest the origin is $(1, 2, 3)$.

Other examples of the use of Lagrangian multipliers can be found in Ref. 6 and in texts on advanced calculus.

Modified Lagrangian Multiplier Method (Ref. 1, Chap. 10). Many practical extremal problems have the added restriction that all variables must be non-negative; for example, it makes no sense to produce $-n$ units of a given product. Furthermore, the restrictions may be given in the form of inequalities instead of equalities. Since the Lagrangian multiplier method does not guarantee the non-negativity of the solution variables, a modification must be made.

EXAMPLE. Consider an economic lot size inventory problem (of the type described above) involving two products, with a restriction on the total available warehouse space. If $W_1$ and $W_2$ are the respective unit storage requirements, and an average inventory level is assumed equal to one-half the lot size $q$, the total space requirement can be written as

$$\tfrac{1}{2}W_1q_1 + \tfrac{1}{2}W_2q_2 \le S. \tag{12}$$

If $W_1 = 5$ cu ft, $W_2 = 35$ cu ft, and $S = 14{,}000$ cu ft, eq. (12) becomes

$$\tfrac{1}{2}(5q_1) + \tfrac{1}{2}(35q_2) \le 14{,}000 \tag{13}$$

or, equivalently,

$$5q_1 + 35q_2 \le 28{,}000. \tag{14}$$

The problem (for two products) can then be stated as:

Problem. Determine non-negative values of $q_1$ and $q_2$ which minimize

$$TEC = \left(\tfrac{1}{2}C_{11}Tq_1 + \frac{C_{s1}R_1}{q_1}\right) + \left(\tfrac{1}{2}C_{12}Tq_2 + \frac{C_{s2}R_2}{q_2}\right)$$

subject to the restriction of eq. (14).

Solution. Define an undetermined multiplier $\lambda$ such that

(15) $\lambda < 0$ when $S - \tfrac{1}{2}\sum W_iq_i = 0$; $\lambda = 0$ when $S - \tfrac{1}{2}\sum W_iq_i > 0$.

Form

$$\phi = TEC + \lambda\left(S - \tfrac{1}{2}\sum W_iq_i\right); \tag{16}$$

that is,

$$\phi = \left(\tfrac{1}{2}C_{11}Tq_1 + \frac{C_{s1}R_1}{q_1}\right) + \left(\tfrac{1}{2}C_{12}Tq_2 + \frac{C_{s2}R_2}{q_2}\right) + \lambda\left(S - \tfrac{1}{2}W_1q_1 - \tfrac{1}{2}W_2q_2\right). \tag{17}$$

Since $\lambda(S - \tfrac{1}{2}\sum W_iq_i)$ is always identically zero by eq. (15), $\phi = TEC$. Taking partial derivatives of $\phi$ with respect to $q_1$ and $q_2$ yields

$$\frac{\partial\phi}{\partial q_1} = \tfrac{1}{2}C_{11}T - \frac{C_{s1}R_1}{q_1^2} - \tfrac{1}{2}\lambda W_1 \tag{18a}$$

and

$$\frac{\partial\phi}{\partial q_2} = \tfrac{1}{2}C_{12}T - \frac{C_{s2}R_2}{q_2^2} - \tfrac{1}{2}\lambda W_2. \tag{18b}$$

Setting eqs.
(18) equal to zero yields

$$q_1 = \sqrt{\frac{2C_{s1}R_1}{C_{11}T - \lambda W_1}} \tag{19a}$$

and

$$q_2 = \sqrt{\frac{2C_{s2}R_2}{C_{12}T - \lambda W_2}}. \tag{19b}$$

For each product, the quantities $R_i$, $C_{si}$, $C_{1i}$, $W_i$, and $T$ are known, but $\lambda$ is still unknown. However, for any arbitrarily assigned value of $\lambda$, $q_i$ and, hence, $\tfrac{1}{2}\sum W_iq_i$ can be calculated. If $\tfrac{1}{2}\sum W_iq_i$ exceeds $S$ (see eqs. 12 and 13), the lot sizes are too large. In this case, decrease $\lambda$ repeatedly and recompute until $\tfrac{1}{2}\sum W_iq_i = S$ has been obtained. If $\tfrac{1}{2}\sum W_iq_i < S$ for all negative $\lambda$, set $\lambda = 0$ in eq. (19). The resulting $q_i$'s will allow the smallest possible total costs for the company with existing warehouse space $S$.

TABLE 1. STORAGE SET BY VARIOUS λ VALUES (a)

    λ          q1*    q2*    ½(5q1 + 35q2)
     0.0000    816    756    15,270
    -0.0012    813    721    14,650
    -0.0024    810    690    14,100
    -0.0036    806    663    13,618
    -0.0060    800    617    12,790
    -0.0084    794    580    12,135
    -0.0120    784    535    11,323

(a) Assumes: T = 12 months and

    Product    R_i     C_si    C_1i
    X1         2400    $100    0.060
    X2         4800    $ 25    0.035

Values of $\tfrac{1}{2}(5q_1 + 35q_2)$ are calculated in Table 1 in order to determine the correct value of $\lambda$. As indicated in Table 1, $\lambda$ should be approximately equal to $-0.0024$ so that $q_1^* = 810$ and $q_2^* = 690$.

Note. For this example, without any restriction on storage space (minimizing $TEC$, rather than $\phi$), $q_1^* = 816$ and $q_2^* = 756$.

Another approach to a modified Lagrangian multiplier technique can be found in Ref. 7. Such modified Lagrangian multiplier techniques, when applicable, are most cumbersome and impractical for a large number of variables. (See Ref. 1, Chap. 10.) Where the objective function and the restrictions are linear (and for some other special cases), the techniques of linear programming are applicable (see Sect. 4).

Other Analytic Methods of Solution. There are many other analytic methods of solution much more sophisticated than those presented here. A number of the models arising in specific problems require the development of special methods for their solution.
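The trial-and-error adjustment of the multiplier in the warehouse example above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `PRODUCTS`, `lot_sizes`, `storage_used`, and `find_multiplier` are hypothetical, the lot size formula is the rearranged form of eqs. (18) set to zero, and bisection is substituted for the tabular stepping of Table 1, so it refines the multiplier beyond the tabulated entries.

```python
from math import sqrt

# Data of the example: (R_i, C_si, C_1i, W_i) for products X1 and X2.
PRODUCTS = [(2400.0, 100.0, 0.060, 5.0), (4800.0, 25.0, 0.035, 35.0)]
T = 12.0          # months
SPACE = 14000.0   # available warehouse space S, cu ft


def lot_sizes(lam):
    """Lot sizes q_i obtained by setting eqs. (18) to zero for a given multiplier."""
    return [sqrt(2.0 * cs * r / (c1 * T - lam * w)) for r, cs, c1, w in PRODUCTS]


def storage_used(lam):
    """Average space requirement, one-half the sum of W_i * q_i (eq. 12)."""
    qs = lot_sizes(lam)
    return 0.5 * sum(p[3] * q for p, q in zip(PRODUCTS, qs))


def find_multiplier(space, lower=-1.0, tol=1e-9):
    """Return 0 if the unrestricted lot sizes fit; otherwise bisect for the
    negative multiplier at which the storage used equals the space available."""
    if storage_used(0.0) <= space:
        return 0.0
    lo, hi = lower, 0.0  # storage_used(lo) < space < storage_used(hi) is assumed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if storage_used(mid) > space:
            hi = mid  # still too much storage: make the multiplier more negative
        else:
            lo = mid
    return 0.5 * (lo + hi)


lam = find_multiplier(SPACE)
q1, q2 = lot_sizes(lam)
```

Calling `lot_sizes(0.0)` and `lot_sizes(-0.0024)` gives, after rounding, the first and third rows of Table 1.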
For these more sophisticated and special methods, see journals such as Operations Research, Management Science, Econometrica, Naval Research Logistics Quarterly, and the publications of the RAND Corporation (Santa Monica, Calif.). See also Refs. 7, 8, and 9.

Numerical Solutions

Numerical techniques of deriving a solution from a model consist of substituting numbers for the symbols in the model and finding that set of substituted numbers which yields the maximum effectiveness. Some numerical procedures are trial-and-error procedures into which one seeks to build some rationale for the selection of subsequent trials. Others are so-called iterative procedures in which one converges to an optimum solution through successively better steps.

Newton's Method. An example of a quite useful trial-and-error procedure is Newton's method for solving equations, which is a procedure for determining, within any desired degree of accuracy, the roots of an algebraic equation. The method is based on the fact that, for a short distance, the tangent to a smooth curve is a good approximation to the curve. Newton's method may be formulated as follows.

Let $f(X) = 0$ be the equation under consideration. A root of this equation is the abscissa of a point at which the curve $Y = f(X)$ crosses the $X$-axis. Start with a trial solution, say $X_0$ (see Fig. 1). This value $X_0$ determines a point $P$ on the curve whose coordinates are $(X_0, Y_0)$. The tangent to the curve at $P$ is then drawn and will intersect the $X$-axis at $(X_1, 0)$. If the curve and the tangent are nearly coincident over the range $(X_0, X_1)$, the value $X_1$ will be the first approximate root of the equation.

FIG. 1. Figure for Newton's method (Ref. 1).

Furthermore, using the fact that the slope of the tangent at $P$ is given by $f'(X_0)$, namely the derivative of $f(X)$ evaluated at $X = X_0$, yields

$$X_1 = X_0 - \frac{f(X_0)}{f'(X_0)}. \tag{20}$$

The procedure may be repeated as many times as necessary where, in general,

$$X_{n+1} = X_n - \frac{f(X_n)}{f'(X_n)}. \tag{21}$$

Whether and how fast the process will converge depends on the function $f(X)$ and the initial value $X_0$. Conditions favorable to convergence are evidently that $f(X_0)$ be small and $f'(X_0)$ be large.

To illustrate Newton's method, consider

$$f(X) = X^3 - 3X^2 + 4X - 2.$$

Although there are many devices which can be used to locate integers between or at which roots will lie, arbitrarily take $X_0 = 2$ as the trial solution. For the particular $f(X)$ chosen, $X = 1$ is obviously a solution, and that is what we wish to approximate by Newton's method. The deviation from the value $X = 1$ will, of course, measure the degree of accuracy of this approximation. Now

$$f'(X) = 3X^2 - 6X + 4,$$

so that

$$f(2) = 8 - 12 + 8 - 2 = 2, \qquad f'(2) = 12 - 12 + 4 = 4.$$

Hence, using eq. (21) yields

$$X_1 = 2 - \tfrac{2}{4} = 1.5.$$

By continuing in this manner,

$$f(1.5) = \left(\tfrac{3}{2}\right)^3 - 3\left(\tfrac{3}{2}\right)^2 + 4\left(\tfrac{3}{2}\right) - 2 = \tfrac{5}{8}$$

and

$$f'(1.5) = 3\left(\tfrac{3}{2}\right)^2 - 6\left(\tfrac{3}{2}\right) + 4 = \tfrac{7}{4},$$

so that

$$X_2 = 1.5 - \tfrac{5}{14} = 1.143.$$

Continuing once more gives $f(1.143) = 0.146$, $f'(1.143) = 1.061$, so that

$$X_3 = 1.143 - \frac{0.146}{1.061} = 1.005.$$

One could continue in this manner, measuring at each stage of the iterative procedure the value of $f(X_i)$ to indicate how quickly one is converging to a solution [obviously, at a point of solution $X^*$, $f(X^*) = 0$], and, hence, obtain this solution within any prescribed degree of accuracy.

Excellent examples of converging iterative procedures are to be found in the several techniques of linear programming. These are discussed in Sect. 4.

The Monte Carlo Technique

In many mathematical models, it is necessary to evaluate certain terms in the model before a solution can be derived. Especially where probability concepts are involved, it may not be possible or practical to evaluate a given function (within a model) by mathematical analysis. Such expressions, however, can be evaluated by the Monte Carlo technique.
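Returning for a moment to Newton's method, the iteration of eq. (21) for the cubic used in the illustration can be sketched in Python as follows. The function name `newton` and its arguments are assumptions made for this example; the sketch performs a fixed number of steps and does not guard against a zero derivative.

```python
def newton(f, fprime, x0, steps):
    """Apply X_{n+1} = X_n - f(X_n)/f'(X_n) and return all iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs


def f(x):
    # The cubic of the illustration; X = 1 is a root.
    return x**3 - 3 * x**2 + 4 * x - 2


def fprime(x):
    return 3 * x**2 - 6 * x + 4


# Trial solution X0 = 2; the first two iterates are 1.5 and about 1.143.
iterates = newton(f, fprime, 2.0, 3)
residuals = [abs(f(x)) for x in iterates]  # |f(X_i)| measures the progress
```

Inspecting `residuals` at each stage corresponds to measuring the value of $f(X_i)$ to judge how quickly the procedure is converging.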
Specifically, the Monte Carlo technique is a procedure by which one can obtain approximate evaluations of mathematical expressions which are built up of one or more probability distribution functions.

The Monte Carlo technique consists of simulating an experiment to determine some probabilistic property of a population of objects or events by the use of random sampling applied to the components of the objects or events. This statement can best be clarified by means of examples.

The Random Walk Problem. The discovery of the Monte Carlo technique is said to be due to a legendary mathematician observing the perambulation of a saturated drunk. The mathematician wondered how many steps the drunkard would have to take, on the average, to reach a specified distance from his starting point, if it were assumed that, at each step, there was an equal probability of the drunkard stepping off in any direction.

EXAMPLE. To illustrate how the Monte Carlo technique can be applied to this problem of the "random walk," an estimate can be obtained of the probable distance traveled after five steps of equal size. (It is further assumed, for simplicity of presentation, that these steps are at 45°, 135°, 225°, or 315°.) To do this, refer to Table 2, which is a portion of a two-digit random number table.

TABLE 2. RANDOM NUMBERS (Ref.
1) 09 73 25 33 54 20 48 05 42 26 89 53 OJ flO 25 29 80 79 99 70 76 64 19 09 80 53 89 64 37 15 01 47 50 67 73 35 42 93 07 61 86 96 03 15 47 34 24 23 38 64 67 80 20 31 03 35 52 90 13 23 48 40 25 11 66 76 37 60 65 53 80 20 15 88 98 95 63 95 67 95 90 61 33 67 11 90 04 47 43 68 17 02 64 97 77 39 00 35 04 12 29 82 08 43 17 27 29 03 62 17 49 16 36 76 68 06 06 26 57 79 57 01 9.7 33 64 47 08 76 21 57 17 05 02 35 53 34 45 02 05 03 07 57 05 32 52 27 18 16 54 96 68 24 56 70 47 50 06 92 48 78 36 35 68 90 35 69 30 66 55 80 73 34 57 35 83 61 26 48 75 42 70 14 18 48 82 65 86 73 28 60 81 79 05 46 93 33 90 38 82 52 98 74 52 87 03 85 39 47 09 44 11 23 18 82 35 19 40 62 49 27 92 30 38 12 38 91 97 85 56 84 52 80 45 68 59 01 50 29 34 46 77 54 96 02 73 67 31 34 00 48 14 39 06 86 87 90 80 28 50 51 56 82 89 75 76 86 77 80 84 49 07 22 10 94 05 58 32 50 72 56 82 48 83 13 74 67 00 78 01 36 76 66 79 51 69 91 82 60 89 28 60 29 18 90 93 97 40 47 36 78 09 52 54 47 56 34 42 06 64 13 33 01 10 93 68 50 52 68 29 23 50 77 71 60 47 07 56 17 91 83 39 78 78 01 41 48 12 35 91 8~ 11 43 09 62 32 76 56 98 68 05 74 35 17 03 05 17 17 77 66 14 46 72 40 25 22 85 70 27 22 56 09 80 72 91 85 50 15 14 48 14 58 45 43 36 46 04 31 23 93 42 77 82 60 68 75 69 23 02 72 67 74 74 10 03 88 73 21 45 76 96 03 11 52 62 29 95 57 16 11 77 71 82 42 39 88 86 53 37 90 22 40 14 96 94 54 21 38 28 40 38 81 55 60 05 21 65 37 26 64 45 49 33 10 55 . 
60 91 69 48 07 64 45 45 19 37 93 23 98 49 42 29 68 26 85 11 16 47 94 15 10 50 92 03 74 00 53 76 68 79 20 44 86 58 54 40 84 46 70 32 12 40 16 29 97 86 21 28 73 92 07 95 35 41 65 46 25 54 35 75 97 63 94 53 57 96 43 75 14 60 64 65 08 03 04 48 17 99 33 08 94 70 23 40 81 39 82 37 42 22 28 07 08 05 22 70 20 92 08 20 72 73 00 23 64 58 17 19 47 55 48 52 69 44 72 11 37 04 52 85 62 83 46 66 73 13 17 26 95 67 97 73 45 27 89 34 20 74 07 75 40 88 77 99 43 87 98 74 53 87 21 37 51 59 54 16 68 92 36 62 86 93 43 78 24 84 59 37 38 44 87 14 29 48 31 67 16 65 82 91 02 26 39 39 19 07 25 45 61 04 11 22 95 01 25 20 96 93 18 92 59 63 42 33 92 25 05 58 21 92 70 52 26 15 74 14 28 05 94 59 66 25 49 54 96 80 05 35 99 31 80 88 24 76 53 83 52 94 54 07 91 36 75 64 26 45 01 24 05 89 42 39 63 18 80 72 09 38 81 93 68 22 24 59 54 42 86 45 96 33 83 77 86 11 35 60 28 25 96 13 94 14 10 38 54 97 40 25 96 62 00 77 61 54 77 13 93 96 69 97 02 91 27 28 45 12 08 93 23 00 48 36 35 91 24 92 47 65 23 90 78 70 33 28 10 56 61 71 72 33 52 74 24 95 93 01 29 17 23 56 15 86 90 46 54 51 43 02 14 14 49 19 97 06 30 38 94 87 20 01 19 36 37 11 75 47 16 92 74 87 60 81 52 52 53 72 08 41 04 79 46 51 05 15 40 43 34 56 95 41 66 88 70 66 92 79 88 70 00 15 45 15 07 00 85 43 53 86 18 66 59 01 74 31 74 39 6743 04 79 54 03 71 24 68 00 54 57 23 06 33 56 85 97 84 20 05 39 11 96 82 01 41 89 28 66 45 18 63 52 85 11 08.62 48 26 45 24 02 84 04 44 99 90 88 96 39 09 47 34 07 35 44 13 18 18,51 62 32 41 94 15 09 49 89 43 54 85 81 88 69 54 19 94 37 54 87 30 95'10 04 06 96 38 27 07 74 20 15 12 33 87 25 01 62 52 98 94 62 46 11 OPERATIONS RESEARCH 15-20 Use the following symbolism: 1. The lamppost is represented by the origin of the X- and Y-axis. See Fig. 2. y x FIG. 2. Plotting of points (x n , xv) (Ref. 1). 2. The first digit of the two-digit random number selected from the table represents one unit of X, positive if even or zero, negative if odd. 3. 
The second digit of the same two-digit random number selected represents one unit of $Y$, positive if even or zero, and negative if odd.
4. $(x_n, y_n)$ represents the position of the drunkard at the end of the $n$th phase.
5. $d_n$ equals the distance of the drunkard from the lamppost at the end of the $n$th phase; that is, $d_n^2 = x_n^2 + y_n^2$.

To start at random, select the two-digit number, say in column 10 and row 6 of Table 2, and, by reading down, obtain the following five numbers: 36, 35, 68, 90, and 35. These numbers may then be arranged and the drunkard's moves obtained as shown in Table 3. The points $(x_n, y_n)$ may also be plotted as in Fig. 2.

TABLE 3

    Phase n    First Digit    Second Digit    Point Location (x_n, y_n)
    1          3              6               (-1, 1)
    2          3              5               (-2, 0)
    3          6              8               (-1, 1)
    4          9              0               (-2, 2)
    5          3              5               (-3, 1)

In this example, then, one estimate is that the drunkard will be 3.16 units from the lamppost at the end of the fifth phase. This is obtained as follows:

$$d_5^2 = x_5^2 + y_5^2 = 9 + 1, \qquad d_5 = \sqrt{10} = 3.16.$$

This procedure must be repeated for different random numbers in the table so that a group of estimates of the desired distance is obtained. The estimates in this group can then be averaged to yield an average estimated distance from the lamppost. In general, the estimates will improve as the number of such samples is increased. The error of the estimate will be inversely proportional to the square root of the number of samples.

More generally, from many such simulated trials, the probability of the drunkard's being a specified distance from the lamppost for any number $n$ of irregular zigzag phases is estimated.
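The digit rules of the symbolism above and the repeated sampling just described can be sketched in Python as follows. The function names are assumptions made for this example, and Python's `random` module stands in for the printed random number table.

```python
import random
from math import hypot


def step_from_number(number):
    """Map a two-digit random number to one phase of the walk.

    Each digit moves one unit along its axis: positive if the digit is
    even or zero, negative if odd (first digit for X, second for Y).
    """
    first, second = divmod(number, 10)
    dx = 1 if first % 2 == 0 else -1
    dy = 1 if second % 2 == 0 else -1
    return dx, dy


def walk(numbers):
    """Return the positions (x_n, y_n) after each phase, starting at the lamppost."""
    x = y = 0
    path = []
    for number in numbers:
        dx, dy = step_from_number(number)
        x += dx
        y += dy
        path.append((x, y))
    return path


def mean_distance(phases, trials, rng):
    """Average the final distance d_n over many simulated walks."""
    total = 0.0
    for _ in range(trials):
        x, y = walk([rng.randrange(100) for _ in range(phases)])[-1]
        total += hypot(x, y)
    return total / trials


# The five numbers read from Table 2 reproduce Table 3.
table3 = walk([36, 35, 68, 90, 35])
d5 = hypot(*table3[-1])  # sqrt(10), about 3.16

# A pooled estimate from many walks; the seed is arbitrary.
estimate = mean_distance(5, 20000, random.Random(1))
```

Increasing `trials` narrows the scatter of `estimate` from one seed to another, in line with the square-root behavior noted above.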
As a point of interest and as a basis for the reader comparing his own Monte Carlo solutions, it might be pointed out that, for this example, an analytic solution is obtainable and is given by

$$d_n = a\sqrt{n},$$

i.e., the most probable distance of the drunkard from the lamppost, after a large number of irregular phases of his walk, is equal to the average length $a$ of each straight track he walks, times the square root of the number $n$ of phases of his walk.

For an illustration of the use of the Monte Carlo technique for the solution of problems involving normal distributions, see Sect. 6. The use of the Monte Carlo technique for any probability distribution function can be found in Ref. 1, Chap. 7. Reference 1 discusses only the normal distribution, but the treatment is general and applicable to any probability distribution function.

For a discussion of the nature of tables of random numbers and a bibliography of tables and works on this subject, see Ref. 10. Examples of other uses of the Monte Carlo technique can be found in Ref. 1, Chaps. 7, 14, and 17. See also Refs. 5, 11-13.

3. INVENTORY MODELS

Problem Statement. Inventory problems are concerned with minimizing the sum of costs such as those due to (a) carrying inventory, (b) setup, (c) shortage, (d) obsolescence, and (e) change of work force level. Inventory problems require the determination of (a) how many (or much) to order (i.e., produce or purchase) and/or (b) when to order.

This section will introduce the kind of analysis that yields symbolic models of inventory processes. The mathematical models and solutions presented here pertain to specific inventory situations and progress from the most elementary to somewhat complex ones. For a complete definition and classification of the characteristics of inventory problems, see Ref. 1, Chap. 7.

Decisions. The general class of inventory problems to be considered involves decisions concerning inventory levels.
These decisions can be classified as follows: (1) The time at which orders for goods are to be placed is fixed. The quantity to be ordered must be determined. (2) Both the order quantity and order time must be determined.

Cost. The costs associated with inventory are of three types: (1) setup cost, the fixed cost per lot of obtaining goods (purchasing or manufacturing); (2) inventory holding cost, including cost of money spent in obtaining the part, storage, obsolescence, handling, taxes, and insurance; (3) shortage cost, cost resulting from a delay in supplying the goods or an inability to fill the order at the time of request.

Variables. The three major classes of variables in an inventory problem are: (1) cost variables; (2) demand variables, i.e., relative to customer demand for goods; (3) order variables, i.e., relative to obtaining the necessary goods.

Elementary Inventory Models (see Ref. 1, Chap. 8)

Symbols. The following symbols are used throughout the discussion of the elementary inventory models.

    $q$            input, or quantity ordered
    $q_i$          input which occurs at the beginning of the $i$th time interval
    $q^*$          optimum order quantity
    $r$            requirements per time interval
    $r_i$          requirements for the $i$th time interval
    $S$            inventory level
    $S_i$          inventory level at beginning of $i$th interval
    $\bar{S}_i$    inventory level at end of $i$th interval.
                   Note. $\bar{S}_i = S_i - r_i$, and $S_i = \bar{S}_{i-1} + q_i$
    $S^*$          optimum inventory level at the beginning of a time interval
    $t$            an interval of time
    $t_s$          interval between placing orders, in units of time
    $t_s^*$        optimum interval between placing orders
    $T$            period for which a policy is being established
    $R$            total requirement for period $T$
    $C_1$          holding cost per unit of goods for a unit of time
    $C_2$          shortage cost per unit of goods for a specified period
    $C_s$          setup cost per production run
    $TEC$          total expected relevant cost
    $TEC^*$        minimum (optimum) total expected relevant cost
    $P(r)$         probability of requiring $r$ units, where $r$ is a discrete variable
    $f(r)$         probability density function of $r$, where $r$ is a continuous variable
    $P(r \le S)$   probability of requiring $S$ units or less, where $r$ is a discrete variable
    $F(r)$         cumulative probability function of $r$, where $r$ is a continuous variable
    $F(S)$         $= \int_0^S f(r)\,dr$, probability of requiring $S$ or less units, where $r$ is a continuous variable.

Model I. (See Fig. 3.) Given: (a) Demand is fixed and known. (b) Withdrawals from stock are continuous and at a constant rate. (c) No shortages are permitted. The variable costs are: $C_1$ and $C_s$ (see Symbols above).

FIG. 3. Inventory situation for Model I (Ref. 1).

Problem. To determine: (1) how often to make a production run; (2) how many units should be made per run.

Cost Equation.

$$TEC = \tfrac{1}{2}C_1Tq + \frac{C_sR}{q}. \tag{22}$$

Solution.

$$q^* = \sqrt{\frac{2RC_s}{TC_1}}, \tag{23}$$

$$t_s^* = \sqrt{\frac{2TC_s}{RC_1}}, \tag{24}$$

$$TEC^* = \sqrt{2RTC_1C_s}. \tag{25}$$

Note that Model I is a special case of Model II, wherein $C_2 = \infty$. Accordingly, by letting $C_2 \to \infty$ in eqs. (27-30), one readily obtains eqs. (23-25).

Model II. (See Fig. 4.) Given: (a) Demand is known and fixed. (b) Shortages are permitted.

FIG. 4. Inventory situation for Model II (Ref. 1).

Cost Equation.

$$TEC(q, S) = \frac{S^2C_1T}{2q} + \frac{(q - S)^2C_2T}{2q} + \frac{C_sR}{q}. \tag{26}$$

Solution (eqs. 27-30). Minimizing eq. (26) with respect to $q$ and $S$ gives

$$q^* = \sqrt{\frac{2RC_s}{TC_1}}\sqrt{\frac{C_1 + C_2}{C_2}}, \qquad S^* = \sqrt{\frac{2RC_s}{TC_1}}\sqrt{\frac{C_2}{C_1 + C_2}},$$

$$t_s^* = \sqrt{\frac{2TC_s}{RC_1}}\sqrt{\frac{C_1 + C_2}{C_2}}, \qquad TEC^* = \sqrt{2RTC_1C_s}\sqrt{\frac{C_2}{C_1 + C_2}}.$$

Model III. Given: (a) Estimated variable demands and inputs, (b) discrete units, (c) shortages permitted (finite cost of shortage), (d) discontinuous distribution over time of withdrawals and input at a discontinuous rate, (e) known and constant reorder cycle time.
In this model and in Model VI, the cost of carrying an inventory of parts until they are used is not taken into consideration. Rather, in this elementary inventory situation, the cost of having excess parts that are never used is balanced against the cost of being short of parts when needed.

Problem. To determine how many units of a given part should be ordered at the time of the initial purchase order. Here, one is balancing the cost of having excess parts that are never used against the cost of being short of parts when needed. No consideration is given to the cost of carrying the inventory of parts until they are used.

Cost Equation.

$$TEC(S) = C_1\sum_{r=0}^{S} P(r)(S - r) + C_2\sum_{r=S+1}^{\infty} P(r)(r - S). \tag{31}$$

Solution. The optimum value, $S^*$, is given by that value of $S$ which satisfies the inequalities:

$$P(r \le S - 1) < \frac{C_2}{C_1 + C_2} < P(r \le S). \tag{32}$$

For further discussion, derivation of this solution, and an example of its use, see Ref. 1, Chap. 8.

Model IV. Given: (a) Estimated variable demand and inputs; (b) continuous (rather than discrete) units; (c) shortages permitted, i.e., finite cost of shortage; (d) continuous distribution over time of withdrawals and input at a continuous rate; (e) known and constant reorder cycle time; (f) negative orders, i.e., returns, not considered.

Problem. To determine the initial order quantity, where one balances the holding cost against the shortage cost.

Cost Equation.

$$TEC(S) = C_1\int_0^S (S - r)f(r)\,dr + C_2\int_S^\infty (r - S)f(r)\,dr. \tag{33}$$

Solution. (See Ref. 1, Chap. 8.) The total expected cost is minimum for that value $S$ which satisfies the condition

$$F(S) \equiv \int_0^S f(r)\,dr = \frac{C_2}{C_1 + C_2}. \tag{34}$$

Model V. Given: Conditions of Model IV plus a significant reorder lead time, i.e., one must take into account the lapse of time between the placing of an order and the receipt of the goods.

Problem. To determine how much (many) should be ordered for the $k$th day hence (where the reorder lead time is $k$ days).
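Before the lead time of Model V is taken up, condition (32) of the discrete Model III can be illustrated by a short Python sketch. The function names and the demand distribution used below are hypothetical and chosen only for illustration; the sketch returns the smallest $S$ whose cumulative probability reaches $C_2/(C_1 + C_2)$, so a cumulative probability exactly equal to the ratio is treated as satisfying the condition.

```python
def expected_cost(probabilities, s, c1, c2):
    """Total expected cost of eq. (31) for stock level s.

    probabilities[r] is P(r) for r = 0, 1, 2, ...
    """
    excess = sum(p * (s - r) for r, p in enumerate(probabilities) if r <= s)
    shortage = sum(p * (r - s) for r, p in enumerate(probabilities) if r > s)
    return c1 * excess + c2 * shortage


def optimum_stock_level(probabilities, c1, c2):
    """Smallest S with P(r <= S) >= C2 / (C1 + C2), following condition (32)."""
    ratio = c2 / (c1 + c2)
    cumulative = 0.0
    for s, p in enumerate(probabilities):
        cumulative += p
        if cumulative >= ratio:
            return s
    raise ValueError("probabilities do not reach the critical ratio")


# Hypothetical demand distribution for r = 0, ..., 4 and illustrative costs.
demand = [0.10, 0.20, 0.30, 0.25, 0.15]
best = optimum_stock_level(demand, c1=1.0, c2=4.0)
costs = [expected_cost(demand, s, 1.0, 4.0) for s in range(len(demand))]
```

Comparing `best` with the position of the smallest entry of `costs` is a simple check that the inequality and the cost equation agree for a given distribution.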
Cost Equation. Let

    $k$ = number of days in the order lead time,
    $S_0$ = the stock level at the end of the period preceding the placing of the order,
    $q_1, q_2, \ldots, q_{k-1}$ = quantities already ordered and due to arrive on the 1st, 2nd, ..., $(k - 1)$st days hence,
    $q_k$ = quantity to be ordered for delivery $k$ days hence,
    $R' = \sum_{i=1}^{k} r_i$, the total requirement over the order lead time,
    $S'$ = total of amount available in stock at end of previous period and amounts ordered over the present $k$-day period; i.e.,

$$S' = S_0 + \sum_{i=1}^{k-1} q_i + q_k.$$

The problem is to determine the value of $q_k$ which will minimize the total expected cost over the lead time period, i.e., $k$ days. However, since orders in the amounts $q_1, q_2, \ldots, q_{k-1}$ have already been placed, the total expected cost for the first $k - 1$ days has already been determined and is no longer subject to control. Hence, equivalently, the problem is one of determining the value of $q_k$ which will minimize the total expected cost for the $k$th day only.

Solution. The stock at the end of the $k$th period can be expressed as

$$S_k = S_0 + \sum_{i=1}^{k-1} q_i + q_k - \sum_{i=1}^{k} r_i. \tag{35}$$

Then, since

$$S' = S_0 + \sum_{i=1}^{k-1} q_i + q_k \tag{36}$$

and

$$R' = \sum_{i=1}^{k} r_i, \tag{37}$$

the total expected cost for the $k$th day will be given by

$$TEC(S') = C_1\int_0^{S'} (S' - R')f(R')\,dR' + C_2\int_{S'}^{\infty} (R' - S')f(R')\,dR'. \tag{38}$$

Equation (38) is equivalent to eq. (33); therefore the optimum value of $S'$ is given by (see eq. 34)

$$F(S') \equiv \int_0^{S'} f(R')\,dR' = \frac{C_2}{C_1 + C_2}. \tag{39}$$

Once having determined the optimum value of $S'$, namely $S'^*$, $q_k^*$ can be determined from eq. (36), i.e.,

$$q_k^* = S'^* - \left(S_0 + \sum_{i=1}^{k-1} q_i\right). \tag{40}$$

See Ref. 1, Chap. 8, for further discussion and an example of the use of this solution.

Model VI. (See Fig. 5.) Given: Conditions of Model III except that withdrawals from stock are continuous and at a constant rate.

FIG. 5. Illustration for Model VI (Ref. 1).

Problem.
To determine how many parts should be ordered at the time of the initial purchase order.

Cost Equation. (a) For $r \le S$. For a given value of $r$, the average number of units in stock over the order cycle period is given by

$$\tfrac{1}{2}[S + (S - r)] = S - \frac{r}{2}. \tag{41}$$

The expected cost, for a specific value of $r$ ($r \le S$), will then be

$$P(r)\,C_1\left(S - \frac{r}{2}\right). \tag{42}$$

Therefore, the total expected cost for all $r \le S$ will be

$$C_1\sum_{r=0}^{S} P(r)\left(S - \tfrac{1}{2}r\right). \tag{43}$$

Cost Equation. (b) For $r > S$. Here, as seen from Fig. 5, there will be no shortages $t_1/(t_1 + t_2)$ part of the time, while shortages will occur $t_2/(t_1 + t_2)$ part of the time. Now

$$\frac{t_1}{t_1 + t_2} = \frac{S}{r} \quad \text{and} \quad \frac{t_2}{t_1 + t_2} = \frac{r - S}{r}. \tag{44}$$

Furthermore, the average amount stocked is $\tfrac{1}{2}S$, and the average amount short is $\tfrac{1}{2}(r - S)$. Therefore, the holding cost for each value of $r$ over the period $S/r$ is given by

$$C_1\left(\frac{S}{2}\right)\left(\frac{S}{r}\right) = C_1\frac{S^2}{2r}, \tag{45}$$

while the shortage cost for each $r$, over the period $(r - S)/r$, is given by

$$C_2\left(\frac{r - S}{2}\right)\left(\frac{r - S}{r}\right) = C_2\frac{(r - S)^2}{2r}. \tag{46}$$

Therefore, the total expected cost will be given by

$$TEC(S) = C_1\sum_{r=0}^{S} P(r)\left(S - \tfrac{1}{2}r\right) + C_1\sum_{r=S+1}^{\infty} P(r)\frac{S^2}{2r} + C_2\sum_{r=S+1}^{\infty} P(r)\frac{(r - S)^2}{2r}. \tag{47}$$

Solution. The optimum value of $S$ is that which satisfies the condition

$$P[r \le (S - 1)] + \left(S - \tfrac{1}{2}\right)\sum_{r=S}^{\infty}\frac{P(r)}{r} < \frac{C_2}{C_1 + C_2} < P(r \le S) + \left(S + \tfrac{1}{2}\right)\sum_{r=S+1}^{\infty}\frac{P(r)}{r}. \tag{48}$$

See Ref. 1, Chap. 8, for further discussion, an example of the use of this solution, and a case study employing this model.

Inventory Models with Price Breaks

In this section, decision rules are given for the optimum lot size (or optimum purchase quantity) as derived for a class of inventory problems in which the unit manufacturing (or purchase) cost is variable, that is, subject to quantity discounts or price breaks. Specifically, this section will generalize on Model I (see Elementary Inventory Models), which describes a system in which demand is fixed and known, withdrawals from stock are continuous and at a constant rate, and no shortages are permitted. (See Fig. 3.)

Symbols.
The following symbols are used:

    $k_i$      cost per unit of manufacturing or purchasing for range $i$
    $P$        monthly holding cost expressed as a decimal fraction of the value of the unit
    $C_s$      setup cost per production run or, when for purchased parts, the setup cost associated with the procurement of the purchased items
    $TEK$      total expected cost
    $TEK^*$    minimum (optimum) total expected cost

As before,

    $T$        the period of time for which the decision rules are being determined
    $R$        total requirement during period $T$
    $t_s$      interval between placing orders
    $q$        input, or quantity ordered
    $q^*$      optimum order quantity, i.e., economic lot size or economic purchase quantity

Finally, let the price break situation be described by the following:

    Range    Quantity               Unit Purchase Price
    R_1      1 ≤ q_1 < b_1          k_1
    R_2      b_1 ≤ q_2 < b_2        k_2
    ...      ...                    ...
    R_n      b_{n-1} ≤ q_n          k_n

where $b_j$ ($j = 1, 2, \ldots, n - 1$) are those quantities which determine the price breaks.

Problem. The problem can be stated as one of determining: (1) how often should parts be purchased; (2) how many units should be purchased at any one time.

Basic Cost Equation. The basic cost equation for the period $T$ for any one value of the unit purchase cost $k_1$ is given by

$$TEK = \frac{C_sR}{q} + k_1R + \tfrac{1}{2}C_sTP + \tfrac{1}{2}k_1TPq, \tag{49}$$

while the basic solution is given by

$$q^* = \sqrt{\frac{2C_sR}{k_1TP}} \tag{50}$$

and

$$TEK^* = k_1R + \tfrac{1}{2}C_sTP + \sqrt{2C_sRk_1TP}. \tag{51}$$

Solution. Decision Rules. (See Ref. 1, Chap. 9.)

One Price Break.
1. Compute $q_2^*$ from eq. (50), by using $k_2$. If $q_2^* \ge b_1$, then the optimum purchase quantity is $q_2^*$, that is, $q^* = q_2^*$.
2. If $q_2^* < b_1$, compute $TEK^*(k_1)$ from eq. (51) [or, equivalently, $TEK(q_1^*)$ from eq. (49)] and compare this with $TEK(b_1)$ as given by eq. (49).
   If $TEK(q_1^*) < TEK(b_1)$, then $q^* = q_1^*$.
   If $TEK(q_1^*) > TEK(b_1)$, then $q^* = b_1$.

Two Price Breaks.
1. Compute $q_3^*$. If $q_3^* \ge b_2$, then $q^* = q_3^*$.
2. If $q_3^* < b_2$, compute $q_2^*$. If $q_3^* < b_2$ and $b_1 \le q_2^* < b_2$, proceed as in the case of one price break, i.e., compare $TEK^*(k_2)$ with $TEK(b_2)$ to determine the optimum purchase quantity.
3.
If $q_3^* < b_2$ and $q_2^* < b_1$, compute $TEK^*(k_1)$ and compare it with $TEK(b_1)$ and $TEK(b_2)$ to determine the optimum purchase quantity.

$(n - 1)$ Price Breaks.
1. Compute $q_n^*$. If $q_n^* \ge b_{n-1}$, then $q^* = q_n^*$.
2. If $q_n^* < b_{n-1}$, compute $q_{n-1}^*$. If $q_{n-1}^* \ge b_{n-2}$, i.e., $b_{n-2} \le q_{n-1}^* < b_{n-1}$, proceed as for one price break, i.e., compare $TEK^*(k_{n-1})$ with $TEK(b_{n-1})$ to determine $q^*$.
3. If $q_{n-1}^* < b_{n-2}$, compute $q_{n-2}^*$. If $q_{n-2}^* \ge b_{n-3}$, proceed as for two price breaks, i.e., compare $TEK^*(k_{n-2})$ with $TEK(b_{n-2})$ and $TEK(b_{n-1})$ to determine $q^*$.
4. If $q_{n-2}^* < b_{n-3}$, compute $q_{n-3}^*$. If $q_{n-3}^* \ge b_{n-4}$, compare $TEK^*(k_{n-3})$ with $TEK(b_{n-3})$, $TEK(b_{n-2})$, and $TEK(b_{n-1})$.
5. Continue in this manner until $q_{n-j}^* \ge b_{n-j-1}$ ($0 \le j \le n - 1$), and then compare $TEK^*(k_{n-j})$ with $TEK(b_{n-j})$, $TEK(b_{n-j+1})$, ..., $TEK(b_{n-1})$ to determine the economic purchase quantity $q^*$. Note. Define $b_0 = 1$ for this step.

Inventory Models with Restrictions

In some inventory situations it is necessary to consider restrictions on production facilities, storage space, time, or money. When such restrictions are introduced in situations involving more than one product, it is necessary to allocate the limited available resources among the products. Models have been developed which enable one to determine how much of each item to produce (or purchase) under the specified restrictions. Such models are developed and solved in Ref. 1, Chap. 10. A brief description of the approach to the solution of such models is given in Sect. 2, Modified Lagrangian Multipliers. See also Refs. 14-16.

Other Inventory Models

Arrow, Harris, and Marschak (Ref. 17), Eisenhart (Ref. 18), Tompkins (Ref. 19), and others have treated the problem of determining the optimum buffer stock needed to protect against shortages, where demand is uncertain. Whitin (Ref. 20) has investigated the interaction between buffer stocks and lot sizes. Dvoretzky, Kiefer, and Wolfowitz (Refs.
21 and 22) have shown the conditions under which optimum inventory levels can be found.

Multistorage Points. Berman and Clark (Ref. 23) have developed and solved specific models for systems in which a central warehouse supplies a number of field warehouses which, in turn, supply distributors.

Dynamic Models. The dynamic problem of inventory is one in which consideration must be given to the effect of a decision in the current period on subsequent periods. A servomechanism approach to the dynamic inventory problem which utilizes feedback rules to adjust production to sales has been developed and applied at Carnegie Institute of Technology (Refs. 8 and 24) for situations of uncertain demand. This procedure applies Norbert Wiener's autocorrelation methods (see Chap. 17). A related method has been developed by Vassian (Ref. 26).

A number of persons have developed approaches with linear programming techniques. Such linear programming models are designed primarily for situations with important seasonal fluctuations in demands. Charnes, Cooper, and Farr (Ref. 27) have treated this case while further assuming that demand is known. See also Dannerstedt (Ref. 28).

Bellman (Refs. 29-32) has developed "dynamic programming" which makes it possible to approach these problems through the calculus of variations. See also Bellman, Glicksberg, and Gross (Ref. 25). Holt, Modigliani, and Simon (Ref. 8) have developed "quadratic programming" and applied it to setting overall production levels for cases in which the cost functions are quadratic.

For excellent summaries of the great amount of pertinent research and application in the inventory area, see Whitin (Refs. 33 and 34) and Simon and Holt (Ref. 35). See also Ref. 1, Chaps. 8-10.

4. ALLOCATION MODELS

Types of Problems.
Allocation models are used to solve a class of problems which arise when (a) a number of activities are to be performed and there are alternative ways of doing them, and (b) resources or facilities are not available for performing each activity in the most effective way. The problem is to combine activities and resources in such a way as to maximize overall effectiveness. These problems are divisible into two types:

1. An amount of work to be done is specified. Certain resources are available; i.e., a fixed capacity and/or material for doing the job is available and, hence, constitutes a restriction or limitation. The problem is to use these limited facilities and/or materials to accomplish the required work in the most economical manner.
2. The facilities and/or materials which are to be used are considered to be fixed. The problem is to determine what work, if performed, will yield the maximum return on use of the facilities and/or materials.

Linear Programming. Generally speaking, linear programming techniques can be used to solve a special class of allocation problems for which the following conditions are satisfied:

1. There must exist an objective, such as profit, cost, or quantities, which is to be optimized and which can be expressed as, or represented by, a linear function.
2. There must be restrictions on the amount or extent of attainment of the objective and these restrictions must be expressible as, or representable by, a system of linear equalities or inequalities.

The general linear programming problem may be expressed mathematically as follows:

PROBLEM-STATEMENT I. Find the values of $X_1, X_2, X_3, \ldots, X_n$ which maximize (minimize)

(52)  $Z = C_1 X_1 + C_2 X_2 + \cdots + C_n X_n$

subject to the conditions that

(53)  $X_j \ge 0, \quad j = 1, 2, \ldots, n$

and

(54)  $\sum_{j=1}^{n} a_{ij} X_j = b_i, \quad i = 1, 2, \ldots, m,$

where $a_{ij}$, $b_i$, and $C_j$ are given constants ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$).

PROBLEM-STATEMENT II. Given the column vectors from eq. (54),

(55)  $P_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix}, \quad j = 1, 2, \ldots, n; \qquad P_0 = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix},$

the problem can also be stated as follows: Determine non-negative values of $X_1, X_2, \ldots, X_n$ which maximize (minimize) the linear functional

(52a)  $Z = X_1 C_1 + X_2 C_2 + \cdots + X_n C_n \equiv \sum_{j=1}^{n} X_j C_j$

subject to

$\sum_{j=1}^{n} X_j P_j = P_0.$

Solution of Linear Programming Problems

Among the several techniques which can be used to solve linear programming problems, the most important ones are the simplex technique and the transportation technique. There is also a special linear programming problem, called the assignment problem, for which special techniques greatly reduce the tremendous amount of computation that would otherwise follow from the use of either the transportation or simplex techniques. The assignment problem is discussed in Ref. 1, Chap. 12.

Solution of Linear Programming Problems by the Simplex Technique

The solution of linear programming problems by the simplex technique may best be illustrated by the solution of a specific problem. The problem, simplified for purposes of illustration, may be stated as follows.

PROBLEM. A manufacturer wishes to maximize the profits associated with producing two products, R and S. Products R and S are manufactured by a two-stage process in which all initial operations are performed in machine center I and all final operations may be performed in either machine center IIA or machine center IIB. Machine centers IIA and IIB are different from each other in the sense that, in general, for any given product they yield different unit rates and different unit profits. In addition, a certain amount of overtime has been made available in machine center IIA for the manufacture of products R and S. Since the use of overtime results in changes (decreases) in unit profits (but not in unit rates), let us denote separately, by machine center IIAA, any overtime use of machine center IIA.
The unit times required to manufacture products R and S, the hours available in each machine center, and the unit profits are given in Table 4. In this table, $R_1$, $R_2$, and $R_3$ denote the three possible combinations for producing R, and similarly, $S_1$, $S_2$, and $S_3$ are defined for product S.

TABLE 4. UNIT TIMES REQUIRED TO MANUFACTURE PRODUCTS R AND S

                                     Product R               Product S          Hours
Operation   Machine Center       R1     R2     R3        S1     S2     S3      Available
1           I                    0.01   0.01   0.01      0.03   0.03   0.03    850
2           IIA                  0.02                    0.05                  700
2           IIAA                        0.02                    0.05           100
2           IIB                                0.03                    0.08    900
Profit per part (in dollars)     0.40   0.28   0.32      0.72   0.64   0.60

The problem is to determine how much of each product should be made through the use of each possible combination of machine centers so as to maximize the total profits while respecting the prescribed limitations on the capacities of the machine centers. The assumption here is that one can sell all that one can produce. This is a simplification which may be removed very easily by imposing additional restrictions in the form of maximum permissible quantities of each product. (See Ref. 36.)

Simplex Solution. The simplex technique is a procedure which, through a series of repetitive arithmetic operations, progressively approaches, and ultimately reaches, an optimum solution. The procedure may be summarized briefly as follows:

1. The problem is first set up in mathematical form in which all relevant initial relationships and restrictions are stated.
2. The problem is then set up in tabular form.
3. An initial (feasible) solution is determined.
4. Alternative changes to this solution are evaluated.
5. A new solution is determined by introducing the "most favorable" alternative change.
6. Steps 4 and 5 are repeated to derive successively better solutions.
7. When, at any stage, step 4 evaluates no alternative choice favorably, the procedure is complete and gives an optimal solution.
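The seven summarized steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the handbook's own computation: the function name `simplex_max` and the data names are hypothetical, the data are read off Table 4, the restrictions are assumed to be of the "hours used cannot exceed hours available" kind so that unused hours can serve as the starting solution, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def simplex_max(c, A, b):
    """Maximize sum(c[j]*x[j]) subject to A x <= b, x >= 0 (b >= 0 assumed)."""
    m, n = len(A), len(c)
    # Step 2: tabular form. Each row: structural columns, slack columns, right-hand side.
    T = [[F(v) for v in A[i]] + [F(int(i == k)) for k in range(m)] + [F(b[i])]
         for i in range(m)]
    # Bottom row of evaluations; its last entry is the current total profit.
    z = [-F(v) for v in c] + [F(0)] * (m + 1)
    basis = [n + i for i in range(m)]      # Step 3: start with all capacity unused
    history = [z[-1]]
    while True:
        k = min(range(n + m), key=lambda j: z[j])   # Step 4: evaluate alternatives
        if z[k] >= 0:
            break                                   # Step 7: nothing favorable remains
        rows = [i for i in range(m) if T[i][k] > 0]
        if not rows:
            raise ValueError("profit can be increased without limit")
        r = min(rows, key=lambda i: T[i][-1] / T[i][k])   # limiting restriction
        piv = T[r][k]
        T[r] = [v / piv for v in T[r]]                    # Step 5: introduce the change
        for i in range(m):
            if i != r and T[i][k] != 0:
                f = T[i][k]
                T[i] = [a - f * p for a, p in zip(T[i], T[r])]
        f = z[k]
        z = [a - f * p for a, p in zip(z, T[r])]
        basis[r] = k
        history.append(z[-1])                             # Step 6: repeat
    x = [F(0)] * (n + m)
    for i, j in enumerate(basis):
        x[j] = T[i][-1]
    return x[:n], z[-1], history

# Data from Table 4 (columns R1, R2, R3, S1, S2, S3; rows I, IIA, IIAA, IIB).
PROFIT = ["0.40", "0.28", "0.32", "0.72", "0.64", "0.60"]
HOURS_PER_PART = [
    ["0.01", "0.01", "0.01", "0.03", "0.03", "0.03"],
    ["0.02", 0, 0, "0.05", 0, 0],
    [0, "0.02", 0, 0, "0.05", 0],
    [0, 0, "0.03", 0, 0, "0.08"],
]
HOURS_AVAILABLE = [850, 700, 100, 900]

program, profit, history = simplex_max(PROFIT, HOURS_PER_PART, HOURS_AVAILABLE)
print([int(v) for v in program], float(profit))
```

Under these assumptions the sketch ends with 35,000, 5,000, and 30,000 parts of $R_1$, $R_2$, and $R_3$, none of product S, and a total profit of $25,000; `history` holds the profit after each change of solution.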
More explicitly, the simplex technique is carried out as indicated in the following steps:

Step 1. Rephrase the problem in mathematical form. Let $X_1, X_2, X_3, X_4, X_5, X_6$ denote the amounts to be made of products $R_1, R_2, R_3, S_1, S_2, S_3$, respectively. Then the total profit $Z$ will be given by

(56)  $Z = 0.40X_1 + 0.28X_2 + 0.32X_3 + 0.72X_4 + 0.64X_5 + 0.60X_6.$

Furthermore, the restrictions to the problem will be given by

(57)
$0.01X_1 + 0.01X_2 + 0.01X_3 + 0.03X_4 + 0.03X_5 + 0.03X_6 \le 850$
$0.02X_1 + 0.05X_4 \le 700$
$0.02X_2 + 0.05X_5 \le 100$
$0.03X_3 + 0.08X_6 \le 900.$

Therefore, the problem may now be restated as follows: Determine the values of $X_j \ge 0$ (where $j = 1, 2, \ldots, 6$) which maximize eq. (56) subject to the restrictions of eqs. (57). The restrictions $X_j \ge 0$, $j = 1, 2, \ldots, 6$, arise from the fact that, since the manufacturing process is irreversible, one must preclude the appearance of negative values for these variables.

Step 2. Reduce the system of inequations (i.e., the restrictions) to an equivalent system of equations by introducing new non-negative variables $X_7, X_8, X_9, X_{10}$. These new variables are variously called "disposal activities," "pseudo variables," or "slack variables." In this problem, it can be seen that positive values of these slack variables represent underutilization of capacity in machine centers I, IIA, IIAA, and IIB respectively. The introduction of these slack variables results in the system of equations:

(58)
$0.01X_1 + 0.01X_2 + 0.01X_3 + 0.03X_4 + 0.03X_5 + 0.03X_6 + X_7 = 850$
$0.02X_1 + 0.05X_4 + X_8 = 700$
$0.02X_2 + 0.05X_5 + X_9 = 100$
$0.03X_3 + 0.08X_6 + X_{10} = 900.$

Step 3. Complete the transformation of the given set of eqs. (56) and (58) into the standard form used in the simplex technique by making the following set of transformations. Rearrange eqs. (58) so that corresponding $X_j$'s appear in the same column.
Then let the symbol $P_j$ denote the column of coefficients of $X_j$ ($j = 1, 2, \ldots, 10$), and $P_0$ denote the right-hand column of numbers in the system of eqs. (58). Assuming a zero profit or cost associated with each slack variable $X_7, X_8, X_9, X_{10}$, the linear programming example may now be restated as follows: Determine the values of a set of non-negative $X_j$ (where $j = 1, 2, \ldots, 10$) which maximize the linear form (functional)

(56a)  $Z = 0.40X_1 + 0.28X_2 + 0.32X_3 + 0.72X_4 + 0.64X_5 + 0.60X_6 + 0 \cdot X_7 + 0 \cdot X_8 + 0 \cdot X_9 + 0 \cdot X_{10}$

subject to the restrictions

(58a)  $\sum_{j=1}^{10} X_j P_j = P_0.$

Step 4. Exhibit the column vectors $P_j$ in a systematic, i.e., tabular, form. This is done in Table 5 by means of eqs. (58), all blank spaces in the table representing zeros. It should be noted that eqs. (58) can be generated simply by multiplying each coefficient in any $P_j$ column by the corresponding $X_j$ and then reading across the rows. (The bold vertical line shows where to place the equal signs.)

TABLE 5. COLUMN VECTORS FOR SIMPLEX SOLUTION

P1     P2     P3     P4     P5     P6     P7    P8    P9    P10  |  P0
0.01   0.01   0.01   0.03   0.03   0.03   1                      |  850
0.02                 0.05                       1                |  700
       0.02                 0.05                      1          |  100
              0.03                 0.08                     1    |  900

The square submatrix formed by $\{P_7, P_8, P_9, P_{10}\}$, which consists of elements that are equal to 1 on the main diagonal and that are everywhere else equal to zero, is of special importance. This matrix is called the unit or identity matrix. The set of vectors which form the identity matrix are, in turn, said to be a unit basis of the particular space of interest, which is, in this problem, a four-dimensional space. The basis vectors are linearly independent vectors in terms of which every point in the $n$-dimensional (here, $n = 4$) space may be uniquely expressed and in terms of which a solution (or solutions) will be stated.

Step 5.
The columns of Table 5 are now rearranged as shown in Table 6a. Then, a column labeled "Basis" is inserted to the left of the $P_0$ column and, in this column, the basis vectors are listed. For this example, the slack vectors form the unit basis. In some problems for which some of the restrictions are stated either in terms of equalities or in terms of inequalities which impose minimum limits, so-called artificial vectors will have to be introduced in order to form a unit basis (see Ref. 36). It should be noted also that structural vectors may be such that they may be included in the unit basis. Next, a row of $C_j$'s is added, where the $C_j$'s are defined as the coefficients of the corresponding $X_j$'s in the expression for $Z$ given in eq. (56). Then, a column of $C_i$'s is added, these corresponding to the $C_j$'s, but having the subscript $i$ to denote the row, rather than the subscript $j$, which is used to denote the column. The expression for $Z$ can now be written as

(59)  $Z = \sum_{j=1}^{10} C_j X_j.$

Step 6. Next, add a row of numbers labeled $Z_j$, where $j$ denotes the appropriate column. Letting $X_{ij}$ denote the element in the $i$th row and $j$th column of the table, the $Z_j$'s (including $Z_0$) are defined by

(60)  $Z_j = \sum_i C_i X_{ij}.$

TABLE 6. SIMPLEX METHOD (Ref. 1)
(An asterisk marks the most negative $Z_j - C_j$; brackets mark the element on which the next change of basis is made.)

C_j                                                  0.40     0.28     0.32     0.72     0.64     0.60
C_i    Basis      P0       P7    P8     P9     P10   P1       P2       P3       P4       P5       P6

(a) First Feasible Solution
0      P7         850      1     0      0      0     0.01     0.01     0.01     0.03     0.03     0.03
0      P8         700      0     1      0      0     0.02     0        0        [0.05]   0        0
0      P9         100      0     0      1      0     0        0.02     0        0        0.05     0
0      P10        900      0     0      0      1     0        0        0.03     0        0        0.08
       Z_j        0        0     0      0      0     0        0        0        0        0        0
       Z_j - C_j  0        0     0      0      0     -0.40    -0.28    -0.32    -0.72*   -0.64    -0.60

(b) Second Feasible Solution
0      P7         430      1     -0.6   0      0     -0.002   0.01     0.01     0        0.03     0.03
0.72   P4         14,000   0     20     0      0     0.4      0        0        1        0        0
0      P9         100      0     0      1      0     0        0.02     0        0        [0.05]   0
0      P10        900      0     0      0      1     0        0        0.03     0        0        0.08
       Z_j        10,080   0     14.4   0      0     0.288    0        0        0.72     0        0
       Z_j - C_j  10,080   0     14.4   0      0     -0.112   -0.28    -0.32    0        -0.64*   -0.60

(c) Third Feasible Solution
0      P7         370      1     -0.6   -0.6   0     -0.002   -0.002   0.01     0        0        0.03
0.72   P4         14,000   0     20     0      0     0.4      0        0        1        0        0
0.64   P5         2,000    0     0      20     0     0        0.4      0        0        1        0
0      P10        900      0     0      0      1     0        0        0.03     0        0        [0.08]
       Z_j - C_j  11,360   0     14.4   12.8   0     -0.112   -0.024   -0.32    0        0        -0.60*

(d) Fourth Feasible Solution
0      P7         32.5     1     -0.6   -0.6   -3/8  -0.002   -0.002   -1/800   0        0        0
0.72   P4         14,000   0     20     0      0     [0.4]    0        0        1        0        0
0.64   P5         2,000    0     0      20     0     0        0.4      0        0        1        0
0.60   P6         11,250   0     0      0      12.5  0        0        3/8      0        0        1
       Z_j - C_j  18,110   0     14.4   12.8   7.5   -0.112*  -0.024   -0.095   0        0        0

(e) Fifth Feasible Solution
0      P7         102.5    1     -1/2   -0.6   -3/8  0        -0.002   -1/800   0.005    0        0
0.40   P1         35,000   0     50     0      0     1        0        0        2.5      0        0
0.64   P5         2,000    0     0      20     0     0        0.4      0        0        1        0
0.60   P6         11,250   0     0      0      12.5  0        0        [3/8]    0        0        1
       Z_j - C_j  22,030   0     20     12.8   7.5   0        -0.024   -0.095*  0.28     0        0

(f) Sixth Feasible Solution
0      P7         140      1     -1/2   -0.6   -1/3  0        -0.002   0        0.005    0        1/300
0.40   P1         35,000   0     50     0      0     1        0        0        2.5      0        0
0.64   P5         2,000    0     0      20     0     0        [0.4]    0        0        1        0
0.32   P3         30,000   0     0      0      100/3 0        0        1        0        0        8/3
       Z_j - C_j  24,880   0     20     12.8   32/3  0        -0.024*  0        0.28     0        0.2533

(g) Maximum Feasible Solution
0      P7         150      1     -1/2   -1/2   -1/3  0        0        0        0.005    0.005    1/300
0.40   P1         35,000   0     50     0      0     1        0        0        2.5      0        0
0.28   P2         5,000    0     0      50     0     0        1        0        0        2.5      0
0.32   P3         30,000   0     0      0      100/3 0        0        1        0        0        8/3
       Z_j - C_j  25,000   0     20     14     32/3  0        0        0        0.28     0.06     0.2533

Step 7. A row labeled $Z_j - C_j$ is entered into the table and, for any column, say $j_0$, consists of the corresponding $C_{j_0}$ subtracted from the value of $Z_{j_0}$ which was entered in the previous row.

Steps 1 through 7 complete the first phase of the simplex technique calculations and result in what is known as a feasible solution to the problem, namely a solution which satisfies all the restrictions but which does not necessarily yield the optimum result.
This feasible solution is given by the column vector $P_0$ (Table 6a) in terms of the basis vectors $P_7, P_8, P_9, P_{10}$, namely,

(61)  $X_7 = 850; \quad X_8 = 700; \quad X_9 = 100; \quad X_{10} = 900.$

That is, the initial feasible program consists of "Do not use any of the time available in any of the machine centers; i.e., do nothing," thus resulting in a net profit of $Z = 0$.

Optimum Solution Criteria. Having obtained a feasible solution, one can proceed to the optimum solution by considering the following mutually exclusive and collectively exhaustive possibilities:

M1. Maximum $Z = \infty$ (i.e., maximum $Z$ is infinitely large) and has been obtained by means of the present program.
M2. Maximum $Z$ is finite and has been obtained by means of the present program.
M3. An optimum program has not yet been achieved and a higher value of $Z$ may be possible.

The simplex technique is such that possibilities M1 or M2 must be reached in a finite number of steps. Furthermore, if one remembers that $X_{ij}$ denotes the element in the $i$th row and $j$th column of the table, the technique is such that, for a given tableau (i.e., table or matrix):

C1. If there exist any $Z_j - C_j < 0$, either M1 or M3 holds: (a) if all $X_{ij} \le 0$ in that column (for which $Z_j - C_j < 0$), then M1 is true; (b) if some $X_{ij} > 0$, further calculations are required, i.e., M3 holds.
C2. If all $Z_j - C_j \ge 0$, a maximal $Z$ has been obtained (M2).

Iterative Procedure to an Optimum Solution. In the example (Table 6a), $Z_1 - C_1 < 0$ (as are $Z_2 - C_2$ through $Z_6 - C_6$) and, furthermore, some of the coefficients under $P_1$ are greater than zero. Hence, by condition C1b, further calculations are required (i.e., condition M3 holds). To discover new solutions, it is possible to proceed in a purely systematic fashion by the simplex technique.
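As a hedged illustration of eq. (60) and of criteria C1 and C2, the following Python sketch computes the $Z_j - C_j$ row for the first tableau and classifies it as M1, M2, or M3. The function names `zj_minus_cj` and `classify` and the data layout (one list per column $P_1, \ldots, P_{10}$, rows in the order $P_7, P_8, P_9, P_{10}$) are assumptions made for this example only.

```python
from fractions import Fraction as F

def zj_minus_cj(columns, basis_costs, costs):
    """Z_j - C_j for each column, with Z_j = sum_i C_i * X_ij as in eq. (60)."""
    return [sum(ci * xij for ci, xij in zip(basis_costs, col)) - cj
            for col, cj in zip(columns, costs)]

def classify(columns, deltas):
    """Apply criteria C1 and C2 to one tableau; returns 'M1', 'M2' or 'M3'."""
    if all(d >= 0 for d in deltas):
        return "M2"                      # C2: a finite maximum has been reached
    for col, d in zip(columns, deltas):
        if d < 0 and all(x <= 0 for x in col):
            return "M1"                  # C1a: Z can be made infinitely large
    return "M3"                          # C1b: further calculations are required

def col(*vals):
    return [F(v) for v in vals]

# Columns P1..P10 of the first tableau (rows P7, P8, P9, P10).
COLUMNS = [
    col("0.01", "0.02", 0, 0), col("0.01", 0, "0.02", 0), col("0.01", 0, 0, "0.03"),
    col("0.03", "0.05", 0, 0), col("0.03", 0, "0.05", 0), col("0.03", 0, 0, "0.08"),
    col(1, 0, 0, 0), col(0, 1, 0, 0), col(0, 0, 1, 0), col(0, 0, 0, 1),
]
COSTS = col("0.40", "0.28", "0.32", "0.72", "0.64", "0.60", 0, 0, 0, 0)
BASIS_COSTS = col(0, 0, 0, 0)            # the slack vectors carry zero profit

deltas = zj_minus_cj(COLUMNS, BASIS_COSTS, COSTS)
print([float(d) for d in deltas[:6]], classify(COLUMNS, deltas))
```

For the first tableau every $Z_j$ is zero, so the row is simply the negated profits and the classification is M3.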
Furthermore, any new solution so obtained will never decrease the value of the objective functional (although an increase need not occur), and, as stated earlier, the optimal solution, if one exists, must be reached in a finite number of steps. Hence, the simplex technique is a converging iterative procedure.

Step 8. Of all the $Z_j - C_j < 0$, choose the most negative. (In the particular example, this is $Z_4 - C_4 = -0.72$ and is so indicated by an asterisk in Table 6a.) This determines a particular $P_j$ (namely $P_4$) which will be introduced into the column labeled "Basis" in Table 6b.

Step 9. Determine the vector which this $P_j$ will replace by dividing all the positive $X_{ij}$ appearing in the $P_j$ column into the corresponding $X_{i0}$ which appears in the same row under $P_0$. (Since all the components of $P_0$ must be non-negative, all these ratios must, in turn, be non-negative.) The smallest of these ratios then determines the vector to be replaced. In the present example, $P_4$ is to replace one of the vectors $P_7$, $P_8$, $P_9$, or $P_{10}$. Under $P_4$, there are two positive $X_{ij}$, namely $X_{7,4} = 0.03$ and $X_{8,4} = 0.05$. The division of these $X_{ij}$'s into the corresponding $X_{i0}$'s which appear under $P_0$ gives a minimum of 14,000 (i.e., 700/0.05). Thus, $P_8$ is the vector to be replaced by $P_4$, so that a new basis is formed consisting of the vectors $P_7$, $P_4$, $P_9$, and $P_{10}$ (see Table 6b).

Step 10. Let subscript $k$ denote "coming in," subscript $r$ denote "going out," $X'_{ij}$ denote the elements of the new matrix, and

(62)  $\varphi = \min_i \dfrac{X_{i0}}{X_{ik}}$

[i.e., $\varphi$ is the minimum of all ratios $(X_{i0}/X_{ik})$ for $X_{ik} > 0$]. The elements of the new matrix ($X'_{ij}$) are calculated as follows. The elements, $X'_{kj}$, of the row corresponding to the vector just entered into the unit basis are calculated by

(63)  $X'_{kj} = \dfrac{X_{rj}}{X_{rk}}.$

The other elements ($X'_{ij}$) of the new matrix are calculated by

(64)  $X'_{ij} = X_{ij} - \left(\dfrac{X_{rj}}{X_{rk}}\right) X_{ik},$

where eq.
(64) also applies to the $X_{i0}$'s appearing under $P_0$ and to the $Z_j - C_j$ in the entire bottom row (but not to the $Z_j$'s in the second to the last row). The new value of the profit function will be given by

(65)  $(Z_0 - C_0)' = (Z_0 - C_0) - \varphi (Z_k - C_k)$

or, since $C_0 = 0$, the profit function will be given by

(66)  $Z_0' = Z_0 - \varphi (Z_k - C_k).$

For example, starting with Table 6a and proceeding to Table 6b, the most negative $Z_j - C_j$ is $Z_4 - C_4 = -0.72$. Therefore $k = 4$. Hence, from eq. (62),

$\varphi = \min_i \dfrac{X_{i0}}{X_{i4}}$ for all $X_{i4} > 0$,

i.e.,

$\varphi = \min\left(\dfrac{850}{0.03} = 28{,}333; \; \dfrac{700}{0.05} = 14{,}000\right) = 14{,}000.$

Therefore, $P_4$ will replace $P_8$; or, in our notation, $k = 4$, $r = 8$. The elements in the $P_4$ row of Table 6b are then computed by eq. (63):

$X'_{4j} = \dfrac{X_{8j}}{X_{84}} = \dfrac{X_{8j}}{0.05}.$

Therefore,

$X'_{40} = \dfrac{X_{80}}{0.05} = \dfrac{700}{0.05} = 14{,}000, \qquad X'_{41} = \dfrac{X_{81}}{0.05} = \dfrac{0.02}{0.05} = 0.4,$ etc.

For the elements of the other rows, where $k = 4$, $r = 8$ are substituted into eq. (64),

$X'_{ij} = X_{ij} - \left(\dfrac{X_{8j}}{X_{84}}\right) X_{i4} = X_{ij} - \left(\dfrac{X_{8j}}{0.05}\right) X_{i4}.$

Therefore,

$X'_{70} = X_{70} - \left(\dfrac{X_{80}}{0.05}\right) X_{74} = 850 - \left(\dfrac{700}{0.05}\right)(0.03) = 850 - (14{,}000)(0.03) = 850 - 420 = 430$

and

$(Z_1 - C_1)' = (Z_1 - C_1) - \left(\dfrac{X_{81}}{0.05}\right)(Z_4 - C_4) = (-0.40) - \left(\dfrac{0.02}{0.05}\right)(-0.72) = (-0.4) - (0.4)(-0.72) = -0.4 + 0.288 = -0.112,$ etc.

Finally, the new value of the profit functional will be given (see Table 6b) by

$(Z_0 - C_0)' = (Z_0 - C_0) - \varphi (Z_4 - C_4) = 0 - 14{,}000(-0.72) = +10{,}080.$

The results are shown in Table 6b.

Step 11. The process is then repeated until such time as either condition M1 or condition M2 holds. For the present example, the solution is obtained after six iterations, i.e., six tableaux or matrices after the first (see Tables 6a-g). The final tableau, Table 6g, yields the optimum solution. (If any other optimum solutions existed, they would be indicated by $Z_j - C_j = 0$ for $j$'s other than those appearing in the basis. Here, $Z_j - C_j = 0$ for $j = 1, 2, 3$, and 7 only. Hence no other optimum solutions exist.)
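The single change of basis worked out above can be checked with a short Python sketch of eqs. (62) to (65). The function name `pivot` and the layout are assumptions of this example: each row is stored as $[X_{i0}, X_{i1}, \ldots, X_{i,10}]$ so that list index $j$ matches the column $P_j$, rows are listed in the order $P_7, P_8, P_9, P_{10}$, and exact fractions replace decimal arithmetic.

```python
from fractions import Fraction as F

def pivot(rows, bottom, k):
    """One change of basis. rows[i][j] holds X_ij (index 0 is the P0 column);
    bottom[j] holds Z_j - C_j (index 0 is the current profit)."""
    ratios = {i: row[0] / row[k] for i, row in enumerate(rows) if row[k] > 0}
    r = min(ratios, key=ratios.get)              # eq. (62): smallest ratio
    phi = ratios[r]
    new_r = [x / rows[r][k] for x in rows[r]]    # eq. (63): row of the entering vector
    new_rows = [new_r if i == r else
                [x - xr * row[k] for x, xr in zip(row, new_r)]   # eq. (64)
                for i, row in enumerate(rows)]
    # Eq. (64) applied to the bottom row; index 0 reproduces eq. (65).
    new_bottom = [z - xr * bottom[k] for z, xr in zip(bottom, new_r)]
    return r, phi, new_rows, new_bottom

def fr(*vals):
    return [F(v) for v in vals]

# Table 6a, columns ordered P0, P1, ..., P10.
ROWS = [
    fr(850, "0.01", "0.01", "0.01", "0.03", "0.03", "0.03", 1, 0, 0, 0),  # P7
    fr(700, "0.02", 0, 0, "0.05", 0, 0, 0, 1, 0, 0),                      # P8
    fr(100, 0, "0.02", 0, 0, "0.05", 0, 0, 0, 1, 0),                      # P9
    fr(900, 0, 0, "0.03", 0, 0, "0.08", 0, 0, 0, 1),                      # P10
]
BOTTOM = fr(0, "-0.40", "-0.28", "-0.32", "-0.72", "-0.64", "-0.60", 0, 0, 0, 0)

r, phi, rows_b, bottom_b = pivot(ROWS, BOTTOM, k=4)   # introduce P4
print(r, phi, rows_b[0][0], float(bottom_b[1]), bottom_b[0])
```

Here `r` is the list position of the row that leaves (position 1, the $P_8$ row), and the remaining printed values correspond to $\varphi = 14{,}000$, $X'_{70} = 430$, $(Z_1 - C_1)' = -0.112$, and the new profit 10,080.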
This optimum solution is also stated, both in terms of the number of parts and hours required, in Tables 7 and 8.

TABLE 7. OPTIMUM PROGRAM (NUMBER OF PARTS)

Product R                                 Product S
R1 (Centers I-IIA)     35,000 parts       S1    0 parts
R2 (I-IIAA)             5,000 parts       S2    0 parts
R3 (I-IIB)             30,000 parts       S3    0 parts
Total                  R = 70,000         Total S = 0
Total profit           $25,000 + 0 = $25,000

TABLE 8. OPTIMUM PROGRAM (HOURS)

                                 Product R            Product S        Hours   Hours    Surplus
Operation   Machine Center    R1     R2     R3     S1    S2    S3     Used    Avail.   Hours
1           I                 350    50     300    0     0     0      700     850      150
2           IIA               700                  0                  700     700      0
2           IIAA                     100                 0            100     100      0
2           IIB                             900                0      900     900      0

Thus, one readily sees that the optimum (most profitable) program under the prescribed conditions consists of manufacturing 70,000 units of product R to the complete exclusion of product S. Furthermore, by eq. (56) and also by $(Z_0 - C_0)$ in the optimum tableau, the total profits will be

$Z = 0.40(35{,}000) + 0.28(5{,}000) + 0.32(30{,}000) + 0.72(0) + 0.64(0) + 0.60(0) = \$25{,}000.$

Alternate Step 8. One should note at this point that the improvement from one tableau to the next is given by $-\varphi(Z_k - C_k)$, see eq. (65). Furthermore, in practice, one need not select the most negative number $(Z_j - C_j)$ but, rather, that negative number which yields the greatest improvement. Thus, in the example,

$-\varphi(Z_1 - C_1) = -\left(\dfrac{700}{0.02}\right)(-0.40) = 14{,}000, \quad k = 1$
$-\varphi(Z_2 - C_2) = -\left(\dfrac{100}{0.02}\right)(-0.28) = 1{,}400, \quad k = 2$
$-\varphi(Z_3 - C_3) = -\left(\dfrac{900}{0.03}\right)(-0.32) = 9{,}600, \quad k = 3$
$-\varphi(Z_4 - C_4) = -\left(\dfrac{700}{0.05}\right)(-0.72) = 10{,}080, \quad k = 4$
$-\varphi(Z_5 - C_5) = -\left(\dfrac{100}{0.05}\right)(-0.64) = 1{,}280, \quad k = 5$
$-\varphi(Z_6 - C_6) = -\left(\dfrac{900}{0.08}\right)(-0.60) = 6{,}750, \quad k = 6.$

Therefore, instead of introducing $P_4$ into the basis, a greater gain is achieved at this step through the introduction of $P_1$. In this particular example, following alternate Step 8 enables one to reach the optimum solution, Table 6g, in three fewer iterations.

Further Restrictions in Linear Programming Problems.
Having established the solution to a given linear programming problem, one may wish to consider (or evaluate) further restrictions on the variables. Thus, by referring to the preceding example, these restrictions may be in the form of: (1) minimum requirements for product S, (2) changes in the amount of time available in the machine centers, (3) changes in the prices of the various products, (4) changes in the unit production rates, e.g., due to the "introduction" of new equipment. The simplex technique is such that, in general, new optimum solutions can easily be constructed in terms of such added restrictions by making use of the optimum solution to the original problem. For a full discussion of this point, see Ref. 1, Chap. 11.

Solution of Minimization Problems by the Simplex Technique. To solve minimization problems by the simplex technique one may, in Step 8, select either (1) the most positive $Z_j - C_j$, or alternately (2) the most negative $C_j - Z_j$, and then proceed as before to the solution of the problem.

The Transportation Problem

A linear programming problem for which a special technique has been developed is the so-called transportation problem, which may be stated as follows.

PROBLEM. Determine $X_{ij} \ge 0$ which minimize

(67)  $Z = \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} X_{ij},$

such that

(68)  $\sum_{j=1}^{n} X_{ij} = A_i \quad (i = 1, 2, \ldots, m)$

and

(69)  $\sum_{i=1}^{m} X_{ij} = B_j \quad (j = 1, 2, \ldots, n).$

The transportation problem is obviously a special case of the general linear programming problem; hence, it can be solved by the simplex technique. However, a special solution technique, far simpler than the simplex technique, has been developed for solving transportation problems and, quite appropriately, it is called the transportation technique (Ref. 39). The procedure in the transportation technique is outlined as follows:

1. The problem is set up in tabular form.
   a. All requirements are explicitly stated.
   b. All permissible slack in the system is explicitly stated.
   c.
All appropriate costs and/or revenues are determined.
   d. An objective function is determined.
   e. The computational framework is established.
2. An initial solution is determined.
   a. The initial solution must be technically feasible, i.e., it must meet all restrictions.
3. Alternative choices are evaluated.
   a. Changes in the solution are made one at a time.
   b. The evaluation is of the complete effect of each unit change.
4. The "most favorable" alternative is selected.
5. The number of units to be included in this change is determined.
   a. Owing to the linear nature of the model, each unit contributes the same cost or profit difference.
   b. The limit on the number of units involved in the particular change is technical feasibility (non-negativity requirements).
6. A new solution is determined.
   a. The elements to change and the number of units to include have been previously determined.
7. Steps 3 through 6 are repeated. The process is a converging iterative one.
8. When Step 3 evaluates no alternative favorably, the procedure is complete and one has an optimal solution.

EXAMPLE. This example, taken from Ref. 37a, deals with the problem of moving empty freight cars from three "excess" origins to five "deficiency" destinations in such a manner that, subject to the given restrictions, the total cost of the required movement will be a minimum. The specific conditions of the problem and the unit (per freight car) shipping costs are given in Tables 9 and 10.

TABLE 9. PHYSICAL PROGRAM REQUIREMENTS

                        Destinations
Origins         D1     D2     D3     D4     D5     Surpluses
S1              X11    X12    X13    X14    X15    9
S2              X21    X22    X23    X24    X25    4
S3              X31    X32    X33    X34    X35    8
Deficiencies    3      5      4      6      3      21

Table 9 states that origins $S_1$, $S_2$, and $S_3$ have surpluses of 9, 4, and 8 empty freight cars, respectively, while destinations $D_1$, $D_2$, $D_3$, $D_4$, and $D_5$ are in need of 3, 5, 4, 6, and 3 cars, respectively.
For simplicity, it has been assumed that the problem is self-contained, i.e., that the number of excess cars is equal to the number of deficiencies. Any transportation problem can be made self-contained through the introduction of dummy origins or destinations. Table 10 lists the unit costs $C_{ij}$ of sending an empty freight car from the $i$th origin to the $j$th destination.

TABLE 10. UNIT SHIPPING COSTS ($C_{ij}$)

                   Destinations
Origins     D1     D2     D3     D4     D5
S1          -10    -20    -5     -9     -10
S2          -2     -10    -8     -30    -6
S3          -1     -20    -7     -10    -4

The solution to this problem by the transportation technique is obtained as indicated in the following steps.

Step 1. Set up the tables listing the physical program requirements (Table 9) and unit shipping costs (Table 10).

Step 2. Obtaining a First Feasible Solution. Write down an initial (feasible) solution, namely one which satisfies the movement requirements. (If a feasible solution also minimizes the total cost, it is then called an optimum feasible or, in this case, a minimal feasible solution.) This can easily be done by applying a technique which has been developed by Dantzig, Ref. 38, and which Charnes and Cooper, Ref. 39, refer to as "the northwest corner rule." The northwest corner rule may be stated as follows:

1. Start in the upper left-hand corner of Table 9 (requirements) and compare the amount available at $S_1$ with the amount required at $D_1$.
   (a) If $D_1 < S_1$, i.e., if the amount needed at $D_1$ is less than the number of units available at $S_1$, set $X_{11}$ equal to $D_1$ and proceed to cell $X_{12}$, i.e., proceed horizontally.
   (b) If $D_1 = S_1$, set $X_{11}$ equal to $D_1$ and proceed to cell $X_{22}$, i.e., proceed diagonally.
   (c) If $D_1 > S_1$, set $X_{11}$ equal to $S_1$ and proceed to $X_{21}$, i.e., proceed vertically.
2. Continue in this manner, step by step, away from the upper left corner until, finally, a value is reached in the lower right corner.
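The northwest corner rule stated above can be sketched in Python as follows. The function name `northwest_corner`, the zero-based `(row, column)` cell keys, and the use of positive cost magnitudes (Table 10 lists costs with a minus sign) are assumptions of this example, not part of the original text.

```python
def northwest_corner(surpluses, deficiencies):
    """Return {(i, j): units} for a first feasible program (zero-based indices)."""
    supply, demand = list(surpluses), list(deficiencies)
    i = j = 0
    alloc = {}
    while i < len(supply) and j < len(demand):
        q = min(supply[i], demand[j])     # ship as much as both sides allow
        alloc[(i, j)] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0 and demand[j] == 0:
            i, j = i + 1, j + 1           # rule 1(b): proceed diagonally
        elif demand[j] == 0:
            j += 1                        # rule 1(a): proceed horizontally
        else:
            i += 1                        # rule 1(c): proceed vertically
    return alloc

SURPLUSES = [9, 4, 8]                     # Table 9, origins S1..S3
DEFICIENCIES = [3, 5, 4, 6, 3]            # Table 9, destinations D1..D5
COST = [[10, 20, 5, 9, 10],               # Table 10, written as positive magnitudes
        [2, 10, 8, 30, 6],
        [1, 20, 7, 10, 4]]

first = northwest_corner(SURPLUSES, DEFICIENCIES)
total_cost = sum(COST[i][j] * units for (i, j), units in first.items())
print(first, total_cost)
```

With the data of Tables 9 and 10 the rule fills cells $X_{11}$, $X_{12}$, $X_{13}$, $X_{23}$, $X_{24}$, $X_{34}$, and $X_{35}$ with 3, 5, 1, 3, 1, 5, and 3 cars, for a total cost of 251.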
Thus, in the present example (see Table 11), proceed as follows:

(a) Set $X_{11}$ equal to 3, namely, the smaller of the amount available at $S_1$ (9) and that needed at $D_1$ (3).
(b) Proceed to $X_{12}$ (rule 1a). Compare the number of units still available at $S_1$ (namely 6) with the amount required at $D_2$ (5) and, accordingly, let $X_{12} = 5$.
(c) Proceed to $X_{13}$ (rule 1a), where there is but one unit left at $S_1$ while four units are required at $D_3$. Thus set $X_{13} = 1$.
(d) Then proceed to $X_{23}$ (rule 1c). Here $X_{23} = 3$.
(e) Continue and set $X_{24} = 1$, $X_{34} = 5$, and, finally, in the southeast corner, set $X_{35} = 3$.

TABLE 11. FIRST FEASIBLE SOLUTION

                             Destinations
Origins               D1     D2     D3     D4     D5     Total Surpluses
S1                    ③      ⑤      ①                    9
S2                                  ③      ①             4
S3                                         ⑤      ③      8
Total deficiencies    3      5      4      6      3      21

The feasible solution obtained by this northwest corner rule is shown in Table 11 by the circled values of the $X_{ij}$. That this set of values is a feasible solution is easily verified by checking the respective row and column requirements.

The corresponding total cost of this solution is obtained by multiplying each circled $X_{ij}$ in Table 11 by its corresponding $C_{ij}$ in Table 10 and summing the products. For any cell in which no circled number appears, the corresponding $X_{ij}$ is equal to zero. That is, the total cost is given by:

(70)  Total cost $= \sum_{j=1}^{5} \sum_{i=1}^{3} C_{ij} X_{ij} = \sum_{i=1}^{3} \sum_{j=1}^{5} C_{ij} X_{ij}.$

The total cost associated with the first feasible solution is computed as follows:

T.C. $= X_{11}C_{11} + X_{12}C_{12} + X_{13}C_{13} + X_{23}C_{23} + X_{24}C_{24} + X_{34}C_{34} + X_{35}C_{35}$
T.C. $= (3)(-10) + (5)(-20) + (1)(-5) + (3)(-8) + (1)(-30) + (5)(-10) + (3)(-4) = -\$251$ (minus sign means "cost" rather than "profit").

Step 3. Evaluation of Alternative Possibilities. Evaluate alternative possibilities, i.e., evaluate the opportunity costs associated with not using the cells which do not contain circled numbers.
Such an evaluation is illustrated by means of the program given in Table 11 and is exhibited in Table 12 (noncircled numbers only). This evaluation is obtained as follows. (For an alternative method of evaluation, see Ref. 1.)

TABLE 12. FIRST FEASIBLE SOLUTION (WITH EVALUATIONS): C = 251

                   Destinations
Origins     D1     D2     D3     D4       D5     Total
S1          ③      ⑤      ①      [-18]    -11    9
S2          -11    -13    ③      ①        -18    4
S3          8      17     19     ⑤        ③      8
Total       3      5      4      6        3      21

1. For any cell in which no circled number appears, describe a path in this manner. Locate the nearest circled-number cell in the same row which is such that another circled value lies in the same column. Thus, in Table 12, if one starts with cell $S_3D_1$ (row 3, column 1), the value ⑤ at $S_3D_4$ (row 3, column 4) satisfies this requirement; i.e., it is the closest circled-number cell in the third row which has another circled value, ① at $S_2D_4$, in the same column (column 4). The circled number ③ in position $S_3D_5$ fails to meet this requirement.
2. Make the horizontal and then the vertical moves so indicated. In the example, move from $S_3D_1$ to $S_3D_4$ and then to $S_2D_4$ (see Table 12).
3. Having made the prescribed horizontal and vertical moves, repeat the procedure outlined in Steps 1 and 2. For the example, this now gives cells $S_2D_3$ and $S_1D_3$ respectively; accordingly, one moves from ① at $S_2D_4$ to ① at $S_1D_3$ by way of ③ at $S_2D_3$.
4. Continue in this manner, moving from one circled number to another by, first, a horizontal move and then a vertical move until, by only a horizontal move, that column is reached in which the cell being evaluated is located. (The fewest steps possible should be used in this circumambulatory procedure.) Thus, to continue the example, this step is from ① at $S_1D_3$ to ③ at $S_1D_1$.
5. Finally, move to the cell being evaluated (here, $S_3D_1$). This completes the path necessary to evaluate the given cell. (Note. For the purposes of evaluation, the path ends, rather than starts, with the cell being evaluated.)
6.
Form the sum, with alternate plus and minus signs, of the unit costs associated with the cells being traversed. (These unit costs are given in Table 10.) This is the (noncircled) evaluation to be entered into the appropriate cell in Table 12. Thus, for the example, one has for the evaluation of cell S3D1 : Path (Table 12) Unit cost (Table 10) Evaluation (S3Dl) S3 D 4 -10 +(-10) -5 -(-5) SlDl -10 +(-10) SaDl -1 -(-1) = +8 Accordingly, one enters +8 in cell S3D1 of Table 12. 7. Repeat the procedure outlined until all cells not containing circled numbers are evaluated. Step 4. I terative Procedure toward an Optimum Solution. If the noncircled numbers (the evaluations) are all non-negative, an optimum has been achieved. If one or more noncircled numbers are negative, further improvement with respect to the objective function is possible (e.g., the negative numbers in SID4, S2D2, etc., in Table 12). (At this stage, it should be quite apparent that one must be careful to circle the values of Xii obtained in a feasible solution in order to distinguish them from the "evaluation" numbers which are also in the same table.) Improvement is obtained by an iterative procedure in which one proceeds as follows: (a) Of the one or more negative values which &ppear, select the most negative one, say - N. If there are more than one such values, anyone of these may be selected arbitrarily. (b) Retrace the path used to obtain this most negative value. (c) Select those circled values which were preceded by a plus sign in the alternation between plus and minus and, of these, choose the one with the smallest value written in its circle, say m. (d) One is now ready to form a new table, wherein one replaces the most negative value, - N, by this smallest value, m. (e) Circle the number m and then enter all the other circles (except the 15-52 OPERATIONS RESEARCH one which contained the value m in the previous program) in their previous cells, but without any numbers inside. 
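The northwest corner rule and the path evaluation of Step 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the unit costs are those of the worked example (entered as negative numbers, following the sign convention of the text), and the path for cell S3D1 is supplied by hand rather than searched for.

```python
# Unit costs of the worked example, entered as negative numbers
# (the text's convention: a minus sign marks a cost).
COSTS = [
    [-10, -20, -5, -9, -10],   # S1 to D1..D5
    [-2, -10, -8, -30, -6],    # S2 to D1..D5
    [-1, -20, -7, -10, -4],    # S3 to D1..D5
]
SURPLUS = [9, 4, 8]            # available at S1, S2, S3
DEFICIT = [3, 5, 4, 6, 3]      # required at D1..D5


def northwest_corner(surplus, deficit):
    """Build a first feasible allocation starting from the top-left cell."""
    s, d = list(surplus), list(deficit)
    x = [[0] * len(d) for _ in s]
    i = j = 0
    while i < len(s) and j < len(d):
        q = min(s[i], d[j])    # ship as much as row and column allow
        x[i][j] = q
        s[i] -= q
        d[j] -= q
        if s[i] == 0:
            i += 1             # row exhausted: move down
        else:
            j += 1             # column satisfied: move right
    return x


def total_cost(x, costs):
    """Sum of C_ij * X_ij over all cells."""
    return sum(c * q for crow, xrow in zip(costs, x) for c, q in zip(crow, xrow))


def evaluate_path(path, costs):
    """Alternating +, -, +, ... sum of unit costs along a path.

    The path is a list of (row, column) cells and ends with the cell
    being evaluated, as in the text.
    """
    return sum((-1) ** k * costs[i][j] for k, (i, j) in enumerate(path))


first = northwest_corner(SURPLUS, DEFICIT)
# Path for S3D1: S3D4, S2D4, S2D3, S1D3, S1D1, S3D1 (zero-based indices).
path_s3d1 = [(2, 3), (1, 3), (1, 2), (0, 2), (0, 0), (2, 0)]
print(first)
print(total_cost(first, COSTS))
print(evaluate_path(path_s3d1, COSTS))
```

Finding the path automatically is deliberately left out; the sketch only checks the arithmetic of the allocation, its cost, and one evaluation.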
The improvement in cost from one program to the next will then be equal to $mN$. Furthermore, as with the simplex technique, one need not select the most negative number. It is permissible, and sometimes advantageous, to select the first negative number which appears. Since the improvement from one program to the next is given by $mN$, a study of Table 12 shows that selections of S2D1, S2D2, S2D5, or S1D5 would have resulted in improvements of 33, 39, 18, and 11 respectively, as compared with the improvement of 18 resulting from the selection of S1D4. Another alternative is to examine all products, $mN$, and select that negative numbered cell which results in the greatest improvement, in this case, S2D2.

Thus in Table 12 the most negative number is -18 and appears in both cells S1D4 and S2D5 (i.e., $-N = -18$). For such ties, one may arbitrarily select either of the cells containing this most negative number. Here, cell S1D4 is chosen. Retracing the path used to obtain the -18 value in cell S1D4, one then obtains +S1D3, -S2D3, +S2D4, -S1D4. Of those preceded by a plus sign, namely S1D3 and S2D4, both have the circled value ① in their cells. Consequently, either one of these may be chosen as the circled value to be moved. In this case, cell S2D4 is arbitrarily chosen. The circled value ① is then entered into cell S1D4 (see Table 13a, i.e., that cell where [-18] appeared in Table 12). (Therefore, the improvement over the program given in Table 12 will be 1 × 18 = 18 cost units. That is, the next program (Table 13b) will cost 251 - 18 = 233 cost units.)

TABLE 13. ITERATIVE PROCEDURE TOWARD SOLUTION

(a) Value to Be Moved

Origins \ Destinations    D1     D2     D3     D4     D5     Total
S1                        ○      ○      ○      ①             9
S2                                      ○                    4
S3                                             ○      ○      8
Total                     3      5      4      6      3      21

(b) Second Feasible Solution: C = 233

Origins \ Destinations    D1     D2      D3     D4     D5     Total
S1                        ③      ⑤       ⓪      ①      7      9
S2                        -11    [-13]   ④      18     0      4
S3                        -10    -1      1      ⑤      ③      8
Total                     3      5       4      6      3      21
The other circles (without numbers) are then entered in the same positions as before (see Table 13a).

Step 5. A new feasible solution is obtained by filling in the circles according to the given surplus-deficiency (input-output) specifications. This solution is given by the circled values in Table 13b.

Step 6. The program is then evaluated, as before, and negative (noncircled) numbers still appear.

Step 7. The process is successively repeated (Tables 14, 15, and 16) until, finally, in Table 16 the evaluation of the corresponding program given therein results in all (noncircled) numbers being non-negative. An optimum feasible solution, or program, therefore, has been reached.

TABLE 14. THIRD FEASIBLE SOLUTION: C = 181

Origins \ Destinations    D1      D2     D3     D4     D5     Total
S1                        ③       ①      ④      ①      7      9
S2                        2       ④      13     31     13     4
S3                        [-10]   -1     1      ⑤      ③      8
Total                     3       5      4      6      3      21

TABLE 15. FOURTH FEASIBLE SOLUTION: C = 151

Origins \ Destinations    D1     D2     D3     D4     D5     Total
S1                        10     ①      ④      ④      7      9
S2                        12     ④      13     31     13     4
S3                        ③      [-1]   1      ②      ③      8
Total                     3      5      4      6      3      21

TABLE 16. OPTIMUM FEASIBLE SOLUTION: C = 150

Origins \ Destinations    D1     D2     D3     D4     D5     Total
S1                        10     1      ④      ⑤      7      9
S2                        11     ④      12     30     12     4
S3                        ③      ①      1      ①      ③      8
Total                     3      5      4      6      3      21

Alternate Optimum Programs. If any of the evaluation numbers in the optimum tableau are zero, alternate optimum tableaux exist. These alternate optimum solutions are obtained by essentially the same procedure as that which was just given. The only variation is that the zeros (if any) which appear in the optimum feasible solutions are now treated in exactly the same manner as were the negative values. Furthermore, given such alternate optimum programs, say $\{P_1\}, \{P_2\}, \ldots, \{P_n\}$, where $\{P_n\}$ refers to the set of $X_{ij}$ which form the nth optimum program, then

(71) $\{P\} = a_1\{P_1\} + a_2\{P_2\} + \cdots + a_n\{P_n\}$

is also an optimum program provided the $a_i$ are non-negative constants such that

(72) $\sum_{i=1}^{n} a_i = a_1 + a_2 + a_3 + \cdots + a_n = 1.$

For example, the cost minimization problem represented by Table 17 has two optimum programs, namely those given in Tables 18 and 19. Table 19 is obtained from Table 18 (and vice versa) by treating the zero in cell S3D5 of Table 18 (or cell S3D4 of Table 19) as the "most negative number" and proceeding as before.

TABLE 17. UNIT COST MATRIX
[Cell entries illegible in this copy. Origins S1, S2, S3 with totals 5, 7, 6; destinations D1 to D5 with totals 2, 2, 5, 4, 5; grand total 18.]

TABLE 18. OPTIMUM PROGRAM FOR TABLE 17
[Cell entries illegible in this copy; row and column totals as in Table 17.]

TABLE 19. ALTERNATE OPTIMUM PROGRAM FOR TABLE 17
[Cell entries illegible in this copy; row and column totals as in Table 17.]

An infinite number of derived optimum programs can now be obtained by forming what are called "convex linear combinations" of the two basic optimum programs. Thus, if we select two positive fractions whose sum is unity, e.g., 1/4 and 3/4, we can obtain a new optimum program by multiplying every element of the first program by 1/4 and every element of the second program by 3/4 and then adding corresponding cells. This yields the derived optimum program of Table 22 and is obtained as shown in Tables 20 and 21. Similarly, other optimum programs could be derived for other non-negative fractions whose sum is equal to 1.

Note. In general, derived optimum programs will involve fractional answers. These programs are for use only where nonintegral answers are realistic.

TABLE 20. 1/4 × TABLE 18
[Cell entries illegible in this copy.]

TABLE 21. 3/4 × TABLE 19
[Cell entries illegible in this copy.]

TABLE 22. DERIVED OPTIMUM PROGRAM: TABLE 20 + TABLE 21
[Cell entries illegible in this copy; row totals 5, 7, 6; column totals 2, 2, 5, 4, 5; grand total 18.]

Solution of Maximization Problems by the Transportation Technique. Although the exposition just given treats only a (linear) minimization problem, it should be obvious that the transportation technique is equally applicable to (linear) maximization problems. The only difference in solving maximization problems lies in the preparation of the "profit" matrix. Whereas in the minimization problem all costs are entered with a negative sign, here all profits (or whatever units are involved in the maximization problem) are entered without any modification of signs. Once the initial datum matrix is obtained, one proceeds to the solution exactly as previously outlined.

Variations on the Transportation Technique. Many variations on the transportation technique are available for the solution of transportation problems. One has already been cited with respect to the selection of the cell to be introduced into the new basis. A second variation, designed to decrease the number of iterations, involves a rearrangement of the cost matrix. By using the problem cited in Tables 9 and 10, this may be illustrated as follows.

1. Form a new matrix in which the first row and first column correspond to the cell yielding the least cost. In the example, this is S3D1. Enter the totals of 8 for S3 and 3 for D1 in the new matrix. Place the smallest of these two numbers in that cell, S3D1.

           D1     Total
S3         3      8
Total      3

2. This satisfies the requirement for D1, but still leaves 5 units available at S3. Hence, select the next least unit cost which involves S3. In the example, this is -4 in cell S3D5. Therefore, list D5 in the second column and enter the corresponding total (requirement) of 3.
Compare the requirement of 3 units at D5 with the remaining availability of 5 units at S3, and assign 3 units to cell S3D5.

           D1     D5     Total
S3         3      3      8
Total      3      3

3. Since 2 units are still available at S3, select the third least cost, namely -7 in S3D3. Enter D3 in the third column along with its total requirement of 4 units. Comparing the requirement of 4 units at D3 with the remaining availability of 2 (8 - 3 - 3) units at S3, assign 2 units to cell S3D3, thereby using all available units at S3 but leaving 2 units still to be assigned to D3.

           D1     D5     D3     Total
S3         3      3      2      8
Total      3      3      4

4. Compare the costs associated with D3 ($C_{13} = -5$ and $C_{23} = -8$) and select S1 as the entry for the second row and, with it, enter the availability at S1, namely 9. Compare this availability at S1 (i.e., 9) with the remaining requirement at D3 (i.e., 2 = 4 - 2), enter 2 units in cell S1D3, and thereby satisfy the requirement at D3.

           D1     D5     D3     Total
S3         3      3      2      8
S1                       2      9
Total      3      3      4

5. By proceeding in this fashion, the following matrix is obtained:

           D1     D5     D3     D4     D2     Total
S3         3      3      2                    8
S1                       2      6      1      9
S2                                     4      4
Total      3      3      4      6      5      21

The cost for this initial feasible solution is given by

3(-1) + 3(-4) + 2(-7) + 2(-5) + 6(-9) + 1(-20) + 4(-10),

i.e., neglecting the minus sign which indicates cost, T.C. = $153, as compared with the first feasible solution of $251 obtained by the northwest corner rule (and with the optimum solution of $150). Such a reshuffling of the cost matrix generally leads to a better (i.e., lower cost or higher profit) first feasible solution so that the optimum solution is usually reached after a smaller number of iterations.

The reader should note that this first feasible solution costing $153 could have been obtained without reshuffling the matrix.
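The rearrangement procedure of steps 1 to 5 can be sketched as follows. This is one possible reading of those steps, not a definitive statement of the method: the function name `chained_least_cost` is hypothetical, the problem is assumed to be balanced (total surplus equals total deficiency), and ties are broken by taking the first candidate. The rule followed is: start in the cell of least cost; while the current origin still has units, pick the least-cost unsatisfied destination in that row; when the origin is exhausted but the current destination is not yet satisfied, pick the least-cost origin remaining in that column.

```python
# Unit costs of the worked example, entered as negative numbers.
COSTS = [
    [-10, -20, -5, -9, -10],   # S1 to D1..D5
    [-2, -10, -8, -30, -6],    # S2 to D1..D5
    [-1, -20, -7, -10, -4],    # S3 to D1..D5
]
SURPLUS = [9, 4, 8]
DEFICIT = [3, 5, 4, 6, 3]


def chained_least_cost(costs, surplus, deficit):
    """First feasible allocation by following least costs along rows and columns.

    Assumes a balanced problem; 'least cost' means smallest magnitude,
    since costs are stored as negative numbers.
    """
    s, d = list(surplus), list(deficit)
    m, n = len(s), len(d)
    x = [[0] * n for _ in range(m)]

    def magnitude(cell):
        i, j = cell
        return -costs[i][j]

    # Step 1: start in the cell of least cost.
    i, j = min(((a, b) for a in range(m) for b in range(n)), key=magnitude)
    while True:
        q = min(s[i], d[j])
        x[i][j] += q
        s[i] -= q
        d[j] -= q
        if s[i] > 0:
            # Origin still has units: next least cost in the same row.
            j = min((b for b in range(n) if d[b] > 0), key=lambda b: magnitude((i, b)))
        elif d[j] > 0:
            # Destination still short: least cost in the same column.
            i = min((a for a in range(m) if s[a] > 0), key=lambda a: magnitude((a, j)))
        else:
            # Row and column both closed: restart from the cheapest open cell, if any.
            rest = [(a, b) for a in range(m) for b in range(n) if s[a] > 0 and d[b] > 0]
            if not rest:
                return x
            i, j = min(rest, key=magnitude)


start = chained_least_cost(COSTS, SURPLUS, DEFICIT)
cost = sum(c * q for crow, xrow in zip(COSTS, start) for c, q in zip(crow, xrow))
print(start)
print(cost)
```

On the example data the sketch reproduces the allocation of step 5. A rule that always takes the globally cheapest open cell would be a different technique and need not give the same starting program.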
One simply starts in the cell of lowest cost (here S3D1) and proceeds accordingly.

For further details of the transportation technique, including a discussion of so-called degenerate cases, see Ref. 39. The mathematical derivation of the transportation technique is given in Ref. 40, Chap. 23.

Alternate Method of Evaluating Cells in Transportation Technique. An alternate evaluation technique (or procedure) is presented by means of the problem represented by Tables 23 and 24, namely the unit cost table and the table listing the first feasible solution of the transportation problem given earlier (see Tables 10 and 11). The evaluation technique presented here is a variation of that originally designed by Dantzig in Koopmans (Ref. 40, Chap. XXI), and is part of the procedure described in Henderson and Schlaifer (Ref. 41). The discussion of determining the costs of deviating from the optimum solution is given in Ref. 41.

TABLE 23. UNIT SHIPPING COSTS

Origins \ Destinations    D1     D2     D3     D4     D5
S1                        -10    -20    -5     -9     -10
S2                        -2     -10    -8     -30    -6
S3                        -1     -20    -7     -10    -4

TABLE 24. FIRST FEASIBLE SOLUTION

Origins \ Destinations    D1     D2     D3     D4     D5     Total
S1                        3      5      1                    9
S2                                      3      1             4
S3                                             5      3      8
Total                     3      5      4      6      3      21

The first part of the evaluation procedure is to form a new table (Table 25) corresponding to Table 24, but listing the unit costs rather than the amounts to be shipped. These costs are given by the boldface numbers in Table 25. Add to Table 25 a column labeled "Row Values" and a row labeled "Column Values" and calculate these values as follows:

1. Assign an arbitrary value to some one row or some one column. For purposes of illustration, let us assign the value 0 to row $S_1$.
2. Next, for every cell in row $S_1$ which contains a circled number representing part of the feasible solution, assign a corresponding column value (which may be positive, negative, or zero) which is such that the sum of the column value and row value is equal to the unit cost rate.
More generally, if $r_i$ is the row value of the ith row, $c_j$ the column value of the jth column, and $C_{ij}$ the unit cost for the cell in the ith row and jth column which contains a circled number, then all row and column values are obtained by the equation

(73) $r_i + c_j = C_{ij}.$

Thus, by assuming $r_1 = 0$, it can be immediately determined from eq. (73) that $c_1 = -10$; $c_2 = -20$; $c_3 = -5$.
3. Next, since $c_3 = -5$ and $C_{23} = -8$, determine that $r_2 = -3$.
4. Since $r_2 = -3$ and $C_{24} = -30$, then $c_4 = -27$.
5. From $c_4 = -27$ and $C_{34} = -10$, then $r_3 = +17$ is obtained.
6. Finally, for $r_3 = +17$ and $C_{35} = -4$, $c_5 = -21$ is obtained.

TABLE 25. UNIT COSTS AND FICTITIOUS COSTS CORRESPONDING TO FIRST FEASIBLE SOLUTION
(boldface entries of the original are marked *)

Origins \ Destinations    D1      D2      D3      D4      D5      Row values
S1                        -10*    -20*    -5*     -27     -21     0
S2                        -13     -23     -8*     -30*    -24     -3
S3                        7       -3      12      -10*    -4*     17
Column values             -10     -20     -5      -27     -21

This procedure for assigning row and column values can be used for any solution-matrix which is nondegenerate, i.e., given a matrix of m rows and n columns, where the solution consists of exactly $m + n - 1$ nonzero elements. (Any solution consisting of less than $m + n - 1$ nonzero elements is said to be degenerate. Simple methods for dealing with degeneracy may be found in Charnes and Cooper (Ref. 39), Henderson and Schlaifer (Ref. 41), and Dantzig (Ref. 38).)

After all row and column values for Table 25 have been computed, the table can be completed by filling in the remaining cells according to eq. (73). This results in the lightface figures given in Table 25.

After Table 25 has been completed, the cell evaluations may be obtained as follows. Form a new table (Table 26) which consists of the unit cost rates of Table 23 subtracted from the number in the corresponding cell of Table 25. That is, in symbolic notation,

{Table 26} = {Table 25} - {Table 23}.

The cells corresponding to movements which are part of the solution will contain zeros.
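The row-value and column-value computation of eq. (73), and the subtraction that produces the cell evaluations, can be sketched as below. The names `row_col_values` and `cell_evaluations` are hypothetical, the basis is assumed to be nondegenerate, and the arbitrary starting value is fixed at $r_1 = 0$ as in the text.

```python
# Unit shipping costs (Table 23) and the cells of the first feasible solution (Table 24).
COSTS = [
    [-10, -20, -5, -9, -10],
    [-2, -10, -8, -30, -6],
    [-1, -20, -7, -10, -4],
]
BASIS = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4)]


def row_col_values(costs, basis):
    """Solve r_i + c_j = C_ij over the solution cells, starting from r_1 = 0."""
    m, n = len(costs), len(costs[0])
    r, c = [None] * m, [None] * n
    r[0] = 0
    pending = set(basis)
    while pending:
        progressed = False
        for i, j in list(pending):
            if r[i] is not None and c[j] is None:
                c[j] = costs[i][j] - r[i]
            elif c[j] is not None and r[i] is None:
                r[i] = costs[i][j] - c[j]
            elif r[i] is None and c[j] is None:
                continue           # not yet reachable; try again next pass
            pending.discard((i, j))
            progressed = True
        if not progressed:
            raise ValueError("basis does not determine all values (degenerate case)")
    return r, c


def cell_evaluations(costs, r, c):
    """Fictitious cost (r_i + c_j) minus unit cost, for every cell."""
    return [[r[i] + c[j] - costs[i][j] for j in range(len(c))] for i in range(len(r))]


rows, cols = row_col_values(COSTS, BASIS)
evals = cell_evaluations(COSTS, rows, cols)
print(rows, cols)
for line in evals:
    print(line)
```

The zeros in the output fall on the solution cells, and the remaining entries are the evaluations used to look for a better program.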
These zeros are given in boldface type in Table 26. The resulting numbers for the remaining cells are given in lightface type and are the cell evaluations to be used in determining a better program or solution. (Comparison with Table 12 will show this to be true.)

TABLE 26. CELL EVALUATIONS FOR THE FIRST FEASIBLE SOLUTION

Origins \ Destinations    D1     D2     D3     D4     D5
S1                        0      0      0      -18    -11
S2                        -11    -13    0      0      -18
S3                        8      17     19     0      0

When these cell evaluations have been determined, proceed as previously outlined in the section.

Geometric Interpretation of the Linear Programming Problem

A geometric interpretation of the linear programming problem may be given by means of the following specific two-dimensional example.

PROBLEM 1. To determine $X, Y \ge 0$ which maximize $Z = 2X + 5Y$ subject to

(74) $X \le 4, \quad Y \le 3, \quad X + 2Y \le 8.$

The system of linear inequalities which constitute the restrictions results in the convex set of points given by polygon OABCD of Fig. 6. That is, any point (X, Y) on or within the polygon satisfies the entire system of inequalities (74). Hence, there exist an infinite number of solutions to system (74). The linear programming problem then is to select, from this infinite number of points, the one or more points which will maximize the function $Z = 2X + 5Y$.

[FIG. 6. Region satisfying restrictions (74) for non-negative X and Y (Ref. 1).]

[FIG. 7. Family of parallel straight lines, Z = 2X + 5Y (Ref. 1).]

[FIG. 8. Figure for geometric solution of linear programming problem (Ref. 1). Lines shown: X + 2Y = 8; 2X + 5Y = 0, 8, 15, and 19.]

The function $Z = 2X + 5Y$ is a one-parameter family of parallel lines; i.e., the function represents a family of parallel straight lines (of slope $-2/5$) such that Z increases as the line gets farther removed from the origin; see Fig. 7. The problem may then be thought of as one of determining that line of the family, $2X + 5Y = Z$, which is farthest away from the origin but which still contains at least one point of the polygon OABCD.

Figure 8 shows how several members of the family $Z = 2X + 5Y$ are related to the polygon OABCD and, in particular, shows that the solution is given by the coordinates of point B. Point B is the intersection of $Y = 3$ and $X + 2Y = 8$. Hence, B is given by (2, 3) and, in turn, $Z_{max} = 2(2) + 5(3) = 19$.

Geometric Interpretation of the Simplex Method. In order to exhibit, geometrically, what happens when one solves the problem by means of the simplex technique, the simplex solution of the example of Fig. 8 is given in Tables 27a-c. We see from those tables that the solution progresses from the point ($X \equiv X_1 = 0$, $Y \equiv X_2 = 0$) to the point ($X \equiv X_1 = 0$, $Y \equiv X_2 = 3$) to the point ($X \equiv X_1 = 2$, $Y \equiv X_2 = 3$); i.e., referring to Fig. 8, from point O (origin) to point A to point B.

Mathematically, polygon OABCD (Fig. 6) constitutes a convex set of points; i.e., given any two points in the polygon, the line segment joining them is also in the polygon. An extreme point of a convex set is any point in the convex set which does not lie on a line segment joining some two other points of the set. Thus, the extreme points of polygon OABCD are points O, A, B, C, and D. The optimum solution to the linear programming problem will be at an extreme point and this optimum (extreme) point is reached by proceeding from one extreme point to another.
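The statement that the optimum lies at an extreme point can be checked numerically for Problem 1. The sketch below is an illustration only, not the simplex technique: it intersects every pair of boundary lines, keeps the intersections that satisfy all restrictions, and evaluates Z at each. The names `LINES`, `feasible`, and `extreme_points` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Boundary lines a*X + b*Y = c of the restrictions (74) and of X >= 0, Y >= 0.
LINES = [
    (1, 0, 4),   # X = 4
    (0, 1, 3),   # Y = 3
    (1, 2, 8),   # X + 2Y = 8
    (1, 0, 0),   # X = 0
    (0, 1, 0),   # Y = 0
]


def feasible(x, y):
    """True when (x, y) satisfies every restriction of Problem 1."""
    return x >= 0 and y >= 0 and x <= 4 and y <= 3 and x + 2 * y <= 8


def extreme_points():
    """Feasible pairwise intersections of the boundary lines."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(LINES, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                       # parallel lines never meet
        x = Fraction(c1 * b2 - c2 * b1, det)   # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if feasible(x, y):
            points.add((x, y))
    return sorted(points)


def z(x, y):
    """Objective function of Problem 1."""
    return 2 * x + 5 * y


vertices = extreme_points()
best = max(vertices, key=lambda p: z(*p))
for p in vertices:
    print(tuple(map(str, p)), z(*p))
print("best:", tuple(map(str, best)), z(*best))
```

Enumerating vertices is practical only for very small problems; it is used here because it mirrors the geometric argument directly.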
Note that, in the example discussed here, the solution proceeded from extreme point O (Table 27a) to extreme point A (Table 27b) and, finally, to extreme point B (Table 27c).

TABLE 27. SIMPLEX METHOD

(a) Feasible Solution Corresponding to X = 0, Y = 0 in Fig. 8

                $C_j$        0      0      0      2      5
$C_i$   Basis   $P_0$        $P_3$  $P_4$  $P_5$  $P_1$  $P_2$
0       $P_3$   4            1      0      0      1      0
0       $P_4$   3            0      1      0      0      1
0       $P_5$   8            0      0      1      1      2
        $Z_j$   0            0      0      0      0      0
  $Z_j - C_j$   0            0      0      0      -2     [-5]

(b) Feasible Solution Corresponding to X = 0, Y = 3 in Fig. 8

                $C_j$        0      0      0      2      5
$C_i$   Basis   $P_0$        $P_3$  $P_4$  $P_5$  $P_1$  $P_2$
0       $P_3$   4            1      0      0      1      0
5       $P_2$   3            0      1      0      0      1
0       $P_5$   2            0      -2     1      1      0
  $Z_j - C_j$   15           0      5      0      [-2]   0

(c) Maximum Feasible Solution Corresponding to X = 2, Y = 3 in Fig. 8

                $C_j$        0      0      0      2      5
$C_i$   Basis   $P_0$        $P_3$  $P_4$  $P_5$  $P_1$  $P_2$
0       $P_3$   2            1      2      -1     0      0
5       $P_2$   3            0      1      0      0      1
2       $P_1$   2            0      -2     1      1      0
  $Z_j - C_j$   19           0      1      2      0      0

More Than One Optimum Solution. If the example is now changed slightly to read:

PROBLEM 2. To determine $X, Y \ge 0$ which maximize $Z = X + 2Y$ subject to the restrictions

$X \le 4, \quad Y \le 3, \quad X + 2Y \le 8,$

then Fig. 9 shows that the solution is given by either extreme point B or extreme point C. This is because $X + 2Y = 8$ is both a boundary line of the polygon OABCD and also a member of the family of parallel lines $Z = X + 2Y$. Hence B = (2, 3) and C = (4, 2) both constitute solutions and yield the answer $Z_{max} = 8$.

[FIG. 9. Geometric solution of linear programming problem with more than one optimum solution (Ref. 1).]

Furthermore, any convex linear combination of B and C will also be a solution, namely, any point on the line segment BC.

The Dual Problem of Linear Programming

By a dual theorem of linear programming, one has a choice of two problems to solve instead of just one. This is because every linear programming problem has a dual problem such that one involves maximizing a linear function and the other involves minimizing a linear function.
Furthermore, if one solves a linear programming problem by the simplex technique, a tableau corresponding to an optimum solution automatically contains a solution to the dual problem. Thus, one is free to work with either the stated problem or its dual.

The dual problem of linear programming is illustrated by the example given earlier (eq. 74), namely:

PROBLEM. Determine $X, Y \ge 0$ which maximize $Z = 2X + 5Y$ subject to

(74) $X \le 4, \quad Y \le 3, \quad X + 2Y \le 8.$

This problem may be displayed in tabular form as is done in Table 28; that is, the restrictions may be read off by interpreting a light vertical line as + and the heavy vertical line as $\le$. Furthermore, the function to be maximized is given by the bottom row, namely $2X + 5Y$.

TABLE 28. TABULAR FORM OF PROBLEM
(light rule shown as :, heavy rule as |)

          X  :  Y  |
          1  :  0  |  4
          0  :  1  |  3
          1  :  2  |  8
Max       2  :  5  |

To obtain the dual problem, extend Table 28 as is done in Table 29. Then, by reading down each column as indicated, obtain the dual problem, namely:

TABLE 29. DUAL PROBLEM IN TABULAR FORM
(light rule shown as :, heavy rule as |)

          X  :  Y  |  Min
$W_1$     1  :  0  |  4
$W_2$     0  :  1  |  3
$W_3$     1  :  2  |  8
Max       2  :  5  |

DUAL PROBLEM (see Table 29). Minimize $g = 4W_1 + 3W_2 + 8W_3$ subject to

(75) $W_1 + W_3 \ge 2, \quad W_2 + 2W_3 \ge 5.$

The inequalities, $\ge$, are converted to equalities by the subtraction of non-negative slack variables. Then, since -1 cannot be entered into a basis, one may also add artificial variables to provide for the basis. Thus, $W_1 + W_3 \ge 2$ is first converted into $W_1 + W_3 - W_4 = 2$. Then the artificial variable $W_6$ may be added to provide $W_1 + W_3 - W_4 + W_6 = 2$. For a detailed discussion, see Charnes, Cooper, and Henderson, Ref. 36.

If one returns to the simplex solution of the maximization problem of Table 27c, one sees that the following results are given: $Z_{max} = 19$, and

(76)
$X_1 = 2, \quad Z_1 - C_1 = 0,$
$X_2 = 3, \quad Z_2 - C_2 = 0,$
$X_3 = 2, \quad Z_3 - C_3 = 0,$
$X_4 = 0, \quad Z_4 - C_4 = 1,$
$X_5 = 0, \quad Z_5 - C_5 = 2.$

Now, $X_3$, $X_4$, and $X_5$ correspond to slack variables.
Hence, if one starts with the first slack variable and renumbers the $Z_j - C_j$ in order, and denotes these reordered $Z_j - C_j$ by $Z'_j - C'_j$, one obtains

(77)
$Z'_1 - C'_1 = 0$ (corresponding to former $Z_3 - C_3$),
$Z'_2 - C'_2 = 1$ (corresponding to former $Z_4 - C_4$),
$Z'_3 - C'_3 = 2$ (corresponding to former $Z_5 - C_5$),
$Z'_4 - C'_4 = 0$ (corresponding to former $Z_1 - C_1$),
$Z'_5 - C'_5 = 0$ (corresponding to former $Z_2 - C_2$).

Setting $W_j = Z'_j - C'_j$ gives the solution to the dual minimization problem; that is, if the minimization problem were to be solved by the simplex technique, the following results would be obtained: $g_{min} = 19$, and

(78)
$W_1 = 0, \quad -(g_1 - b_1) = 2,$
$W_2 = 1, \quad -(g_2 - b_2) = 0,$
$W_3 = 2, \quad -(g_3 - b_3) = 0,$
$W_4 = 0, \quad -(g_4 - b_4) = 2,$
$W_5 = 0, \quad -(g_5 - b_5) = 3,$

where the $b_j$ are the corresponding coefficients of the $W_j$ in the minimization function.

Conversely, given the solution to the minimization problem (i.e., given eq. 78), the solution to the dual maximization problem can be determined by starting with the first slack variable $W_4$ and relabeling the $-(g_j - b_j)$ in order. Hence, solution eqs. (76) would result.

For dual problems, it can be shown that $Z_{max} = g_{min}$; in other words, that the two problems are equivalent (see Ref. 36 and Ref. 40, Chap. XIX). Hence, in solving a linear programming problem, one is free to work with either the stated problem or its dual. Since, as a rule of thumb, the number of iterations required to solve a linear programming problem is equal to one to one and a half times the number of rows (i.e., restrictions), one can, by an appropriate choice, facilitate the computation somewhat, especially in such cases where there exists a sizable difference in the number of rows for each of the two problems.
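The relation $Z_{max} = g_{min}$ for the example can be verified directly from the two solutions quoted in eqs. (76) and (78). The following sketch only checks feasibility and compares the two objective values; it does not solve either problem, and the variable names are chosen here for illustration.

```python
# Primal: maximize 2X + 5Y subject to A [X, Y] <= b, with X, Y >= 0.
# Dual:   minimize 4W1 + 3W2 + 8W3 subject to A^T [W1, W2, W3] >= c, with W >= 0.
A = [
    [1, 0],   # X        <= 4
    [0, 1],   # Y        <= 3
    [1, 2],   # X + 2Y   <= 8
]
b = [4, 3, 8]
c = [2, 5]

x = [2, 3]       # primal solution read from Table 27c
w = [0, 1, 2]    # dual solution read from the Z_j - C_j of the slack columns

# Left-hand sides of the primal and dual restrictions.
primal_lhs = [sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]
dual_lhs = [sum(A[i][j] * w[i] for i in range(3)) for j in range(2)]

primal_feasible = all(v >= 0 for v in x) and all(l <= r for l, r in zip(primal_lhs, b))
dual_feasible = all(v >= 0 for v in w) and all(l >= r for l, r in zip(dual_lhs, c))

z_value = sum(cj * xj for cj, xj in zip(c, x))    # primal objective
g_value = sum(bi * wi for bi, wi in zip(b, w))    # dual objective

primal_slack = [r - l for l, r in zip(primal_lhs, b)]    # X3, X4, X5
dual_surplus = [l - r for l, r in zip(dual_lhs, c)]      # W4, W5

print(primal_feasible, dual_feasible)
print(z_value, g_value)
print(primal_slack, dual_surplus)
```

The printed slack values agree with $X_3$, $X_4$, $X_5$ of eq. (76), and the surplus values with $W_4$, $W_5$ of eq. (78).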
A Short Cut in Solving Linear Programming Problems

One of the many advantages of both the transportation and simplex techniques is that judgment can be used to good advantage in facilitating the computations required in order to arrive at an optimum solution.

In the transportation problem involving m rows and n columns, the use of judgment (or a good guess) simply requires designating $m + n - 1$ cells which are expected to correspond to a solution. After these $m + n - 1$ cells have been selected, proceed as in the transportation technique, first filling in these cells with circled numbers and then "evaluating" the remaining cells to determine whether or not the solution is an optimum one.

Consider the problem of Fig. 6 and eq. (74). It will be shown that given a "good" guess, the corresponding simplex matrix can be constructed. Then one proceeds to the optimum solution, if the solution guessed is not already optimum. This demonstrates how one may utilize judgment in the general linear programming problem (using the simplex technique).

PROBLEM. To determine $X, Y \ge 0$ which maximize $Z = 2X + 5Y$ subject to

(79) $X \le 4, \quad Y \le 3, \quad X + 2Y \le 8.$

Converting this system of inequalities to equalities by means of slack variables $S_3$, $S_4$, and $S_5$ yields

(80)
$X + S_3 = 4,$
$Y + S_4 = 3,$
$X + 2Y + S_5 = 8.$

Now, suppose that one "guesses" or has reason to believe that the optimum solution is such that it will not involve X; i.e., that the final solution will consist of Y, $S_3$, and $S_5$. This means, accordingly, that $X = 0$ and $S_4 = 0$.

Hence, to obtain the "solution," i.e., the elements of the basis that would appear in the $P_0$ column of the simplex tableau, one needs only to set $X = 0$ and $S_4 = 0$ in eqs. (80), yielding

(81)
$S_3 = 4,$
$Y = 3,$
$2Y + S_5 = 8,$

so that

(82) $Y = 3, \quad S_3 = 4, \quad S_5 = 2.$

These values are then entered in the simplex tableau (see Table 30) under the column labeled $P_0$. Note that $P_2$ corresponds to Y.

TABLE 30. FEASIBLE SOLUTION, SHORT-CUT APPROACH

                  $C_j$        0      0      0      2      5
$C_i$   Basis     $P_0$        $P_3$  $P_4$  $P_5$  $P_1$  $P_2$
0       $P_3$     4            1      0      0      1      0
5       $P_2$(Y)  3            0      1      0      0      1
0       $P_5$     2            0      -2     1      1      0
        $Z_j$     15           0      5      0      0      5
  $Z_j - C_j$     15           0      5      0      -2     0

Next, construct the body of the simplex matrix. Since each value of $Z_j - C_j$ corresponds to the minimum cost of deviating from the optimum program by one unit of $X_j$, one can determine, for each j, the corresponding $Z_j - C_j$ and the $X_{ij}$ which appear in that column. For example, consider that one will deviate from the program of $Y = 3$, $S_3 = 4$, and $S_5 = 2$ by insisting that $X = 1$. One then needs to determine the changes in Y, $S_3$, and $S_5$ which result from the unit change in X. Therefore, solve

(83)
$1 + S_3 = 4,$
$Y = 3,$
$1 + 2Y + S_5 = 8,$

which result from eqs. (80) by letting $X = 1$ and $S_4 = 0$. Solving eqs. (83) yields

(84) $X = 1, \quad Y = 3, \quad S_3 = 3, \quad S_5 = 1.$

Comparing eqs. (82) with (84) then shows that the following changes in Y, $S_3$, and $S_5$ occur because of a unit change in X:

(85) $\Delta Y = 0, \quad \Delta S_3 = 1, \quad \Delta S_5 = 1.$

Therefore, in setting up a simplex tableau (see Table 30), these values would be inserted under the column labeled $P_1$ which corresponds to the variable X. Similarly, for $S_4$ solve

(86)
$S_3 = 4,$
$Y + 1 = 3,$
$2Y + S_5 = 8.$

This yields

(87) $Y = 2, \quad S_3 = 4, \quad S_5 = 4,$

so that

(88) $\Delta Y = 1, \quad \Delta S_3 = 0, \quad \Delta S_5 = -2.$

Insert these values in column $P_4$ of Table 30. Next, since $P_2$, $P_3$, and $P_5$ are in the basis, complete the corresponding columns (as is done in Table 30) by inserting 0's and 1's in the appropriate places.

Finally, compute the $Z_j - C_j$'s to determine whether the "solution" is optimum. This is done as at the outset of any simplex solution; i.e., first compute $Z_j$ by

(89) $Z_j = \sum_i C_i X_{ij}$

and then subtract the corresponding $C_j$. Since $P_2$, $P_3$, and $P_5$ are in the basis, $Z_2 - C_2$, $Z_3 - C_3$, and $Z_5 - C_5$ are all equal to zero. Additionally, applying eq. (89) yields

$Z_1 - C_1 = 1(0) + 0(5) + 1(0) - 2 = -2,$
$Z_4 - C_4 = 0(0) + 1(5) + (-2)(0) - 0 = 5.$
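The construction of the $P_0$, $P_1$, and $P_4$ columns from a guessed basis can be sketched as follows. The helper names (`basis_values`, `column`, `z_minus_c`) are hypothetical, and eqs. (80) are solved by direct substitution, which works only because this example is so small; a general problem would need a linear-equation solver.

```python
# Objective coefficients C_j for X, Y and the slack variables S3, S4, S5.
C = {"X": 2, "Y": 5, "S3": 0, "S4": 0, "S5": 0}
BASIS = ("S3", "Y", "S5")      # guessed basis, in the row order of Table 30


def basis_values(x=0, s4=0):
    """Solve eqs. (80) for S3, Y, S5 with the nonbasic X and S4 held fixed."""
    s3 = 4 - x                 # X + S3 = 4
    y = 3 - s4                 # Y + S4 = 3
    s5 = 8 - x - 2 * y         # X + 2Y + S5 = 8
    return {"S3": s3, "Y": y, "S5": s5}


def column(**unit_change):
    """Decrease in each basis variable caused by a unit increase of one nonbasic variable."""
    base = basis_values()
    moved = basis_values(**unit_change)
    return [base[v] - moved[v] for v in BASIS]


def z_minus_c(col, cj):
    """Eq. (89): Z_j = sum_i C_i X_ij, then subtract C_j."""
    return sum(C[v] * a for v, a in zip(BASIS, col)) - cj


p0 = [basis_values()[v] for v in BASIS]     # column P0
p1 = column(x=1)                            # column P1 (variable X)
p4 = column(s4=1)                           # column P4 (variable S4)
z0 = sum(C[v] * a for v, a in zip(BASIS, p0))

print(p0, z0)
print(p1, z_minus_c(p1, C["X"]))
print(p4, z_minus_c(p4, C["S4"]))
```

A negative value of $Z_j - C_j$ for the X column signals that the guessed program can still be improved by bringing X into the basis.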
Thus Table 30 is completed and, not having an optimum solution (owing to $Z_1 - C_1$ being negative), one can proceed to obtain the optimum solution as before.

The reader should note that Table 30 is identical with Table 27b and was generated without a tableau such as is given in Table 27a. The same technique can also be applied to larger size problems so that, with a good estimate of the variables which will make up the solution, a great amount of computation might be eliminated.

The Assignment Problem (See Ref. 1, Chap. 12)

The assignment problem is a special linear programming problem which may be stated mathematically as follows: Determine $X_{ij}$ which minimize

$T = \sum_{i,j} a_{ij}X_{ij}$

subject to

$X_{ij} = X_{ij}^2, \quad i, j = 1, 2, \ldots, n,$
$\sum_{i=1}^{n} X_{ij} = \sum_{j=1}^{n} X_{ij} = 1, \quad i = 1, \ldots, n; \; j = 1, \ldots, n.$

In other words, the assignment problem is such that:
(a) $X_{ij} = 1$, if the ith facility is assigned to the jth job; 0, otherwise.
(b) Each row and column of the solution matrix will have one element unity and all other elements zero.

For both the assignment problem and the transportation problem, so-called "methods of reduced matrices" exist which enable one to obtain the optimum solution with great ease.

5. WAITING TIME MODELS

Problem Statement. A waiting time problem arises when either units requiring service or the facilities which are available for providing service stand idle, i.e., wait. Problems involving waiting time fall into two different types, depending on their structure.

a. Waiting line problems involve arrivals which are randomly spaced and/or service time of random duration. This class of problems includes situations requiring either determination of the optimum number of service facilities or the optimum arrival rate (or times of arrival), or both. The solution of these "facility and scheduling" problems is obtained through what is called waiting line theory or (from the British) queuing theory. Queuing theory dates back to the work of A. K.
Erlang, who in 1908 published Use of Waiting-Line Theory in the Danish Telephone System. In Erlang's and subsequent work up to approximately 1945, applications were restricted in the main to the operation of telephone systems. Since then the theory has been extended and applied to a wide variety of phenomena. See Ref. 42 and Ref. 1, Chap. 14. Reference 42 also contains an excellent list of activities to which queuing theory has been applied, a description of the use of the Monte Carlo technique in solving queuing problems, and a comprehensive list of references.

b. Sequencing. The second type of waiting time problem is not concerned with either controlling the times of arrivals or the number of facilities, but rather is concerned with the order or sequence in which service is provided to available units by a series of service points. This is the so-called sequencing problem. See Ref. 1, Chap. 16.

For a discussion of related problems such as the (assembly) line-balancing problem and the traveling salesman (or routing) problem, see Ref. 1.

Problem Characteristics of Queuing Models

Every queuing or waiting line problem can be characterized by the following factors:
1. Input, the manner in which units arrive and become part of the waiting line.
2. Stations, the number of service units (or channels) operating on the units requiring service.
3. Service policy, limitations on the amount of service that can be rendered or is allowed.
4. Queue discipline, the order in which units are served, e.g., first come, first served; random selection for service; priority.
5. Output, the service provided and its duration.
To specify a queue completely, all five factors must be described.

Notation (see Ref. 42).

$\lambda$      mean arrival rate (number of arrivals per unit time)
$\mu$      mean service rate per channel
$c$      the number of service channels
$c_f$      mean number of free service channels
$n$      number of units (customers) in the system
$k$      number of phases in the Erlang service case
$\rho$      utilization factor for service facility: $\rho = \lambda / (c\mu)$
$P_n(t)$      the probability that there be, at time t, exactly n units in the system, both waiting and in service: $\sum_{n=0}^{\infty} P_n(t) = 1$
$p_n$      the steady-state (time-independent) probability that there be n units in the system, both waiting and in service: $\sum_{n=0}^{\infty} p_n = 1$
$c\rho$      traffic intensity in erlangs: $c\rho = \lambda/\mu = c - c_f = \sum_{n=0}^{c-1} n p_n + \sum_{n=c}^{\infty} c\,p_n$
$P(=0)$      the probability of no waiting
$P(>0)$      the probability of any waiting
$P(>\tau)$      the probability of waiting greater than time $\tau$
$L$      the average number of units in the system, both waiting and in service: $L = \sum_{n=0}^{\infty} n p_n$
$L_q$      the average number of units waiting in the queue: $L_q = \sum_{n=c}^{\infty} (n - c) p_n = L - c + c_f$
$W$      the average waiting time in the system
$A(t)$      cumulative distribution of times between arrivals with density function $a(t)$
$B(t)$      cumulative distribution of service or holding times with density function $b(t)$
$b_k(t)$      probability density for kth Erlang distribution

Input. Arrivals or inputs into a queuing system may occur at intervals of regular length. For such cases the cumulative distribution of time intervals between arrivals is given by the uniform distribution

$A(t) = 0$ for $t < t_0$; $\quad A(t) = 1$ for $t \ge t_0$.

If the input distribution is of Poisson type, the time intervals between arrivals are exponentially distributed. The cumulative distribution is then given by

$A(t) = 1 - e^{-\lambda t}.$

An intermediate type of input may be described by the Erlangian frequency distribution of times between arrivals

$b_k(t) = \frac{(\lambda k)^k}{\Gamma(k)}\, e^{-\lambda k t}\, t^{k-1}.$

This yields the exponential distribution when k = 1 and the uniform distribution when k becomes infinitely large. As Saaty points out (Ref.
42), the normal distribution also produces a good fit to arrival data in some practical problems.

Output (Service or Holding Times). Distributions of service or holding times are defined as for arrivals or inputs. In practice, Poisson inputs and exponential service times occur very frequently.

Assumptions Leading to a Poisson Input. (See Ref. 42.) One has a Poisson input when the following assumptions are satisfied:
1. The total number of arrivals during any given time interval is independent of the number of arrivals that have already occurred prior to the beginning of the interval.
2. For any interval $(t, t + dt)$, the probability that exactly one arrival will occur is $\lambda\,dt + O(dt^2)$, where $\lambda$ is a constant, while the probability that more than one arrival will occur is of the order of $dt^2$ and may be neglected.
For a further discussion of the Poisson input and properties of a Poisson process, see Refs. 42 and 43.

Assumptions Leading to an Exponential Holding Time Distribution. If a channel is occupied at time t, the probability that it will become free during the following time interval dt is given by $\mu\,dt$, where $\mu$ is a constant. (See Ref. 42.) It follows that the frequency function of the service times is $\mu e^{-\mu t}$, while the mean duration of service is $1/\mu$, since the expected value of t is

$$E(t) = \mu \int_0^{\infty} t\, e^{-\mu t}\, dt = \frac{1}{\mu}.$$

Queuing Models

To date, there have been essentially two different theoretical approaches to queuing, one through differential difference equations due to Erlang and the other through integral equations as studied by Lindley. The first approach may be illustrated by means of a single channel queuing system with both $\lambda$ and $\mu$ constant. A Poisson input, exponential holding time, first-come, first-served single channel queue is assumed.

Differential Difference Equations. If the operation starts with no items in the queue, then the following equations describe the given system. (See Ref. 1.)
$$P_0(t + dt) = P_0(t)(1 - \lambda\,dt) + P_1(t)\,\mu\,dt \qquad (n = 0),$$

$$P_n(t + dt) = P_n(t)[1 - (\lambda + \mu)\,dt] + P_{n-1}(t)\,\lambda\,dt + P_{n+1}(t)\,\mu\,dt \qquad (n \ge 1).$$

By transposing and passing to the limit with respect to dt, these equations become

$$\frac{dP_0(t)}{dt} = -\lambda P_0(t) + \mu P_1(t) \qquad (n = 0),$$

$$\frac{dP_n(t)}{dt} = -(\lambda + \mu) P_n(t) + \lambda P_{n-1}(t) + \mu P_{n+1}(t) \qquad (n \ge 1).$$

The time-independent steady-state solution is obtained either by solving these time-dependent transient equations and letting $t \to \infty$ in the solution, or by setting the derivatives with respect to time equal to zero, and solving the resulting steady-state equations. The latter approach yields, successively:

$$p_1 = \frac{\lambda}{\mu}\, p_0, \qquad p_2 = \left(\frac{\lambda}{\mu}\right)^2 p_0, \qquad \ldots, \qquad \sum_{n=0}^{\infty} p_n = 1.$$

By mathematical induction, these formulas then reduce to the single equation:

$$p_n = \rho^n (1 - \rho),$$

where $\rho = \lambda/\mu$, since c = 1. The expected number of units in the system is given by

$$L = \sum n p_n = (1 - \rho) \sum n \rho^n = \frac{\rho}{1 - \rho}.$$

The expected number of units in the line is given by

$$L_q = L - \rho = \frac{\rho^2}{1 - \rho}.$$

The expected waiting time is given by (see Ref. 42)

$$W = \int_0^{\infty} \tau\, dP(\tau), \qquad P(>\tau) = \rho\, e^{\mu(\rho - 1)\tau},$$

and

$$W = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\rho}{\mu(1 - \rho)}.$$

The expected number waiting, of those delayed, is

$$\frac{1}{1 - \rho}.$$

The expected waiting time of those delayed is

$$\frac{W}{P(>0)} = \frac{1}{\mu(1 - \rho)}.$$

See Refs. 42-44.

(b) Constant Holding Time Distribution (Refs. 42 and 45). The steady-state equations are:

$$p_0 = 1 - \rho,$$

$$p_1 = (1 - \rho)(e^{\rho} - 1),$$

$$p_n = (1 - \rho) \sum_{k=1}^{n} (-1)^{n-k} e^{k\rho} \left[ \frac{(k\rho)^{n-k}}{(n - k)!} + \frac{(k\rho)^{n-k-1}}{(n - k - 1)!} \right] \qquad (n \ge 2).$$

Here,

$$P(>\tau) = \rho \sum_{i=0}^{k} e^{\rho(\mu\tau - i)}\, \frac{[-\rho(\mu\tau - i)]^i}{i!},$$

where k is the largest integer less than or equal to $\mu\tau$.

$$W = \frac{\lambda}{2\mu^2 \left(1 - \dfrac{\lambda}{\mu}\right)} = \frac{\rho}{2\mu(1 - \rho)}.$$

Finally, the expected waiting time of those delayed is

$$\frac{1}{2\mu(1 - \rho)}.$$

(c) Poisson Input, Erlangian Holding Time Distribution. The probability density function for the kth Erlang holding time distribution is given by

$$b_k(t) = \frac{(\mu k)^k}{\Gamma(k)}\, e^{-\mu k t}\, t^{k-1}.$$

The steady-state equations are (Ref. 45)

$$\lambda p_0 = \mu p_1 \qquad (n = 0),$$

$$(\lambda + \mu) p_n = \mu p_{n+1} + \lambda p_{n-k} \qquad (n \ge 1).$$

Here

$$L = \frac{\rho(\rho + 2k - \rho k)}{2k(1 - \rho)}, \qquad W = \frac{\rho(k + 1)}{2\mu k(1 - \rho)}.$$

2. Priority Discipline: Arbitrary Holding Time, Nonpreemptive Service (Refs.
42 and 46).

(a) Finite Number of Priorities, N. Assume a system with Poisson input for the kth priority with arrival rate $\lambda_k$, arbitrary holding time with service rate $\mu_k$, and a priority queue discipline. Items of different types enter the system with assigned priorities for service. Whenever the system is free to service an item, it selects items of highest priority on a first-come, first-served basis. However, if an item of higher priority enters the system while one of lower priority is in service, the item in service is not preempted, i.e., it is not sent back to the waiting line. For this situation,

$$W_k = \frac{W_0}{(1 - U_{k-1})(1 - U_k)},$$

where

$$\rho_i = \frac{\lambda_i}{\mu_i}, \qquad U_k = \sum_{i=1}^{k} \rho_i < 1, \qquad \lambda = \sum_{i=1}^{N} \lambda_i,$$

$$W_0 = \frac{\lambda}{2} \int_0^{\infty} t^2\, dF(t), \qquad F(t) = \frac{1}{\lambda} \sum_{i=1}^{N} \lambda_i F_i(t),$$

and where $F_k(t)$ is the cumulative holding time distribution function for the kth priority. The expected length of the line is given by

$$L = \sum_{i=1}^{N} \lambda_i W_i.$$

(b) Two Priorities, Preemptive Service, Exponential Holding Time. (See Refs. 42 and 47.) Priority 1 and 2 calls arrive at a single channel with arrival rates $\lambda_1$ and $\lambda_2$, respectively. Both priorities have Poisson arrival distribution. Priority 1 calls in the queue enter the channel before all priority 2 calls in queue and replace any priority 2 calls in the channel on their arrival. The priority 2 call in the channel then reenters the queue. Priority 1 and 2 calls have exponential service time distribution with service rates $\mu_1$ and $\mu_2$, respectively. Let $\rho_1 = \lambda_1/\mu_1$ and $\rho_2 = \lambda_2/\mu_2$, where $\lambda_1/\mu_1 + \lambda_2/\mu_2 < 1$. Let $p_n^m$ be the probability that n priority 1 calls and m priority 2 calls are in the queue. The steady-state equations are:

$$\mu_1 p_{n+1}^{m} - (\mu_1 + \lambda_1 + \lambda_2) p_n^{m} + \lambda_1 p_{n-1}^{m} + \lambda_2 p_n^{m-1} = 0 \qquad (m, n > 0),$$

$$\mu_1 p_1^{m} + \mu_2 p_0^{m+1} - (\mu_2 + \lambda_1 + \lambda_2) p_0^{m} + \lambda_2 p_0^{m-1} = 0 \qquad (n = 0,\; m > 0),$$

$$\mu_1 p_{n+1}^{0} - (\mu_1 + \lambda_1 + \lambda_2) p_n^{0} + \lambda_1 p_{n-1}^{0} = 0 \qquad (m = 0,\; n > 0),$$

$$\mu_1 p_1^{0} + \mu_2 p_0^{1} - (\lambda_1 + \lambda_2) p_0^{0} = 0 \qquad (m = n = 0).$$
The expected number waiting, first priority, is given by

$$\frac{\rho_1}{1 - \rho_1}.$$

The expected number waiting, second priority, is given by

$$\frac{\rho_2 \left[ 1 - \rho_1 + (\mu_1/\mu_2)\rho_1 \right]}{(1 - \rho_1)(1 - \rho_1 - \rho_2)}.$$

(c) Continuous Number of Priorities (Ref. 42). For an excellent discussion of results for a single channel priority queuing system with application to machine breakdowns, see Ref. 48. The number of available machines is assumed to be infinite. Priorities are assigned according to the length of time needed to service a machine, with higher priorities being assigned to shorter jobs. Since the length of service time may correspond to any real number, a continuous number of priorities exists.

3. Random Selection for Service: Impatient Customers, Exponential Holding Time. (See Refs. 42 and 49.)

Assumptions. Poisson input, exponential holding time, random selection for service, items leave after a wait of time $T_0$.

Steady-State Equations.

$$-\lambda p_0 + (\mu + C_1) p_1 = 0 \qquad (n = 0),$$

$$\lambda p_{n-1} - (\lambda + \mu + C_n) p_n + (\mu + C_{n+1}) p_{n+1} = 0 \qquad (n > 0),$$

where $C_n$ is the average rate at which customers leave when there are n customers in the system and where $p_0$ is obtained from

$$\sum_{n=0}^{\infty} p_n = 1.$$

Solution.

$$p_n = \lambda^n p_0 \prod_{k=1}^{n} (\mu + C_k)^{-1} \qquad (n = 1, 2, \ldots),$$

$$C_n = \frac{\mu \exp(-\mu T_0/n)}{1 - \exp(-\mu T_0/n)}.$$

4. Limited Source: Exponential Holding Time. (See Refs. 42 and 43.)

Assumptions. Input from a source having only a finite number m of customers. Exponential service time, single channel (servicing of m machines).

Steady-State Equations.

$$m\lambda p_0 = \mu p_1 \qquad (n = 0),$$

$$[(m - n)\lambda + \mu] p_n = (m - n + 1)\lambda p_{n-1} + \mu p_{n+1} \qquad (1 \le n \le m - 1),$$

$$\mu p_m = \lambda p_{m-1} \qquad (n = m).$$

Solution.

$$p_0 = 1 - \sum_{n=1}^{m} p_n, \qquad L = m - \frac{\lambda + \mu}{\lambda}(1 - p_0).$$

5. Constant Input: Exponential Holding Time. (See Refs. 42 and 50.) For constant input at intervals of length $\theta$,

$$p_n = p_0 (1 - p_0)^n,$$

where $p_0$ is given by

$$1 - p_0 = \exp(-\mu p_0 \theta).$$

Furthermore,

$$P(>\tau) = (1 - p_0) \exp(-\mu p_0 \tau),$$

and

$$W = -\int_0^{\infty} \tau\, dP(>\tau) = \frac{1 - p_0}{\mu p_0}.$$

6.
Queue Length-Dependent Parameters and Time-Dependent Parameters. For an excellent resume of queuing results for queue length-dependent and time-dependent parameters, see Ref. 42. See also Refs. 51-53.

Queuing Theory Formulas. Two Channels in Series (See Ref. 54.)

Exponential Holding Times. (See Ref. 42.)

Assumptions. Poisson input with mean $\lambda$, two channels in series with exponential holding times having service rates $\mu_1$ and $\mu_2$, respectively. After finishing service at the first gate, the customer moves on to the second gate.

(a) Unlimited Input. The average distribution of customers throughout the system is given in the following table.

                                                   Channel 1              Channel 2              Total System
Average number of customers waiting for service    $x_1^2/(1 - x_1)$      $x_2^2/(1 - x_2)$      $x_1^2/(1 - x_1) + x_2^2/(1 - x_2)$
Average number of customers being served           $x_1$                  $x_2$                  $x_1 + x_2$
Average total number of customers                  $x_1/(1 - x_1)$        $x_2/(1 - x_2)$        $x_1/(1 - x_1) + x_2/(1 - x_2)$

The steady-state solution giving the probability that there are $n_1$ customers waiting at the first gate and $n_2$ at the second is given by

$$p(n_1, n_2) = x_1^{n_1} x_2^{n_2} (1 - x_1)(1 - x_2),$$

where $x_1 = \lambda/\mu_1 < 1$ and $x_2 = \lambda/\mu_2 < 1$. The probabilities of having n customers waiting at the first channel and at the second channel are, respectively,

$$P_1(n) = x_1^n (1 - x_1), \qquad P_2(n) = x_2^n (1 - x_2).$$

(b) Limited Input. For a resume of results for limited input, see Ref. 42.

Queuing Theory Formulas. Three Channels in Series

Results for the case of three channels in series can be found in Ref. 54.

Queuing Theory Formulas. Multiple Channels in Parallel, Poisson Input

An excellent resume of results for both a finite and infinite number of channels can be found in Ref. 42. For the case with a finite number of channels, this includes: (a) identical exponential holding times, (b) identical constant holding times, (c) (priority discipline) different Poisson inputs, a finite number of priorities with the same exponential holding time (nonpreemptive), and (d) (limited source) exponential holding time.
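The single-channel steady-state results given earlier in this section (Poisson input, exponential holding time) and the two-channels-in-series table can be checked numerically. The following is a minimal Python sketch, assuming steady state (arrival rate below the service rate at every channel); the function names `single_channel` and `two_channels_in_series` and the sample rates are illustrative assumptions and do not come from Ref. 42.

```python
def single_channel(lam, mu):
    """Steady-state measures for one channel with Poisson input and
    exponential holding time (requires lam < mu)."""
    if not 0 < lam < mu:
        raise ValueError("steady state requires 0 < lam < mu")
    rho = lam / mu                      # utilization factor (c = 1)
    return {
        "rho": rho,
        "L": rho / (1 - rho),           # expected number in the system
        "Lq": rho ** 2 / (1 - rho),     # expected number in the line
        "W": rho / (mu * (1 - rho)),    # expected waiting time
    }


def two_channels_in_series(lam, mu1, mu2):
    """Rows of the two-channels-in-series table (unlimited input)."""
    channels = [single_channel(lam, mu1), single_channel(lam, mu2)]
    waiting = [c["Lq"] for c in channels]    # x_i^2 / (1 - x_i)
    served = [c["rho"] for c in channels]    # x_i
    total = [c["L"] for c in channels]       # x_i / (1 - x_i)
    # Each row lists channel 1, channel 2, and the total system.
    return {
        "waiting": waiting + [sum(waiting)],
        "being_served": served + [sum(served)],
        "total": total + [sum(total)],
    }


if __name__ == "__main__":
    # Hypothetical rates: 2 arrivals per unit time, service rates 4 and 5.
    for row, values in two_channels_in_series(2.0, 4.0, 5.0).items():
        print(row, [round(v, 4) for v in values])
```

Treating each channel separately reflects the product form of the joint steady-state probability quoted in the text; the sketch does not cover the limited-input case.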
The resume for an infinite number of channels covers: (a) exponential holding time and (b) limited source, exponential holding time.

Sequencing Models

For a detailed discussion of sequencing models, see Ref. 1, Chap. 16. Only a few results are presented here.

1. Two Stations and n Jobs, No Passing. Consider the case of n jobs to be processed on two machines, A and B, with each job requiring the same sequence of operations and no passing allowed. The order (sequence) in which jobs are processed on machine A must be retained in processing these same jobs on machine B. It is assumed that material can be held between work stations so that, in the meantime, the preceding work station is left clear to start work on another job. It is further assumed (without loss of generality) that all jobs must first go to machine A and then machine B. Let

$A_i$ = time required by job i on machine A,
$B_i$ = time required by job i on machine B,
$T$ = total elapsed time for jobs 1, 2, ..., n,
$X_i$ = idle time on machine B from end of job i - 1 to start of job i.

The sequencing problem is to minimize T, the total elapsed time. The total elapsed time may be expressed as

$$T = \sum_{i=1}^{n} B_i + \sum_{i=1}^{n} X_i.$$

For any given set of items, $\sum_{i=1}^{n} B_i$ is constant; therefore, the problem of minimizing T is equivalent to that of minimizing

$$D_n(S) = \sum_{i=1}^{n} X_i,$$

where $D_n(S)$ is a function of the sequence S.

Procedure for Finding the Optimum Sequence. A procedure for finding the optimum sequence for two stations, n jobs, and no passing is due to Johnson (Ref. 55) and can be described by means of the example represented in Table 31.

TABLE 31. MACHINE TIMES (IN HOURS) FOR FIVE JOBS AND TWO MACHINES

i    $A_i$    $B_i$
1    3        6
2    7        2
3    4        7
4    5        3
5    7        4

Step 1. Examine the $A_i$'s and $B_i$'s and find the smallest value [min ($A_i$, $B_i$)]. In this illustrative case, this value is $B_2 = 2$.
Step 2. If the value determined falls in the $A_i$ column, schedule this job first on machine A.
If the value falls in the $B_i$ column (as it does in this case), schedule the job last on machine A. Hence, job 2 goes last on machine A.
Step 3. Cross off the job just assigned and continue by repeating the procedure given in steps 1 and 2. In case of a tie, choose any job among those tied.

In this illustrative case, once job 2 is assigned, the minimum value which remains is 3, and it occurs in $A_1$ and $B_4$. There is a choice, so arbitrarily select $A_1$. Then job 1 goes on machine A first. Now $B_4$ is the minimum remaining value. Hence, job 4 goes on machine A next to last. The minimum remaining value is 4, and it occurs in $A_3$ and $B_5$. Then job 3 can be put on machine A second and job 5 third from last. The resulting sequence is optimum and is 1, 3, 5, 4, 2.

2. Three Stations and n Jobs, No Passing. Let

$Y_i$ = the idle time on the third machine before it starts work on the ith job,
$C_i$ = working time of the third machine on the ith job.

The total elapsed time for three stations and n jobs (no passing) is given by

$$T = \sum_{i=1}^{n} C_i + \sum_{i=1}^{n} Y_i.$$

Since $\sum_{i=1}^{n} C_i$ is fixed, the problem is to minimize $\sum_{i=1}^{n} Y_i$. Johnson (Ref. 55) has found an optimum solution to this problem for the special case where either (1) $\min A_i \ge \max B_i$ or (2) $\min C_i \ge \max B_i$. The first of these conditions is satisfied by means of an exact equality in the illustrative data given in Table 32.

TABLE 32. MACHINE TIMES (IN HOURS) FOR FIVE JOBS AND THREE MACHINES

i    $A_i$    $B_i$    $C_i$
1    8        5        4
2    10       6        9
3    6        2        8
4    7        3        6
5    11       4        5

To obtain an optimal sequence a new table, such as Table 33, is formed.

TABLE 33. SUMS OF MACHINE TIMES (IN HOURS) FOR FIVE JOBS FOR FIRST AND INTERMEDIATE MACHINES AND FOR INTERMEDIATE AND LAST MACHINES

i    $A_i + B_i$    $B_i + C_i$
1    13             9
2    16             15
3    8              10
4    10             9
5    15             9

Then the procedure (described in the preceding) for obtaining an optimum sequence for two stations is applied to Table 33. In this case, the following sequences arise and are optimum for the originally cited three-station problem:

3, 2, 1, 4, 5        3, 2, 1, 5, 4        3, 2, 4, 5, 1
3, 2, 5, 1, 4        3, 2, 4, 1, 5        3, 2, 5, 4, 1

In situations where the conditions $\min A_i \ge \max B_i$ or $\min C_i \ge \max B_i$ do not hold, no general procedure is available as yet for obtaining an optimum sequence. It follows that no general solution is yet available for the more general problem of n jobs and m machines, each job following an identical route with no passing allowed. However, the following statement holds: For optimum sequences (the criterion being the total elapsed time), the total idle time of the last machine must be minimized.

3. Identical Routing, Passing Permitted. Although each of n jobs may have to pass through each of m stations according to a specific route, the process characteristics do not always require that the order in which n jobs pass through each of the stations be identical, i.e., passing is permitted. Bellman (Ref. 56) and Johnson (Ref. 55) have shown that, for two or three station processes, the optimum sequence always involves the same ordering of jobs over each station. This result does not necessarily hold where more than three stations are involved.

4. Different Routing. In many production operations, particularly in job shops, the various jobs which must be done require different routing through the work stations or centers. The problem of determining the optimum sequence for two jobs which have to be processed on m machines using two different routes has been treated by Akers and Friedman (Ref. 57) who, by means of Boolean algebra, have developed a technique for eliminating sequences which are technologically unfeasible. Their technique yields a subset of sequences, one or more of which is optimum.
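Johnson's two-station procedure (steps 1 to 3 earlier in this section) and its three-station extension lend themselves to a short program. The following Python sketch is illustrative only: the function names and the tie-breaking rule (prefer the A column, then the lower job number) are assumptions made here, since the text allows any tied job to be chosen.

```python
def johnson_sequence(first, last):
    """Order jobs by Johnson's procedure.

    first, last: dicts mapping job -> time in the first and second column
    (A_i and B_i for two stations; A_i + B_i and B_i + C_i for three).
    """
    remaining = set(first)
    front, back = [], []
    while remaining:
        # Step 1: smallest remaining time in either column.
        # Ties: first column before second, then lower job number (assumption).
        candidates = [(first[j], 0, j) for j in remaining]
        candidates += [(last[j], 1, j) for j in remaining]
        _, column, job = min(candidates)
        # Step 2: first column -> earliest open position; second -> latest.
        (front if column == 0 else back).append(job)
        # Step 3: cross off the job and repeat.
        remaining.remove(job)
    return front + back[::-1]


def elapsed_time(sequence, *machine_times):
    """Total elapsed time T for a no-passing sequence on two or more machines."""
    finish = [0] * len(machine_times)   # finish time of the latest job per machine
    for job in sequence:
        ready = 0                       # time the job leaves the previous machine
        for m, times in enumerate(machine_times):
            finish[m] = max(finish[m], ready) + times[job]
            ready = finish[m]
    return finish[-1]


# Table 31 (two machines).
A = {1: 3, 2: 7, 3: 4, 4: 5, 5: 7}
B = {1: 6, 2: 2, 3: 7, 4: 3, 5: 4}
seq2 = johnson_sequence(A, B)

# Table 32 (three machines), reduced to Table 33 by summing adjacent columns.
A3 = {1: 8, 2: 10, 3: 6, 4: 7, 5: 11}
B3 = {1: 5, 2: 6, 3: 2, 4: 3, 5: 4}
C3 = {1: 4, 2: 9, 3: 8, 4: 6, 5: 5}
assert min(A3.values()) >= max(B3.values())   # condition (1) holds
seq3 = johnson_sequence({j: A3[j] + B3[j] for j in A3},
                        {j: B3[j] + C3[j] for j in A3})

if __name__ == "__main__":
    print(seq2, elapsed_time(seq2, A, B))
    print(seq3, elapsed_time(seq3, A3, B3, C3))
```

With the tie-breaking rule assumed here, the two-machine data reproduce the sequence 1, 3, 5, 4, 2 found in the text, and the three-machine data give one of the six optimum sequences listed. A different tie-breaking rule would select another of those six.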
The Akers-Friedman technique can also be extended to apply to the case of n jobs and m stations. See Refs. 56 and 57.

6. REPLACEMENT MODELS

Problem Statement. The theory of replacement is concerned with the prediction of replacement costs and the determination of the most economic replacement policy. There are two basic types of replacement problems concerned with (a) items that deteriorate with age and/or use and (b) items with probabilistic life spans and with efficiencies that do not decline over their life spans.

For type (a) items that deteriorate or degenerate with age and use, the problem is to determine when to replace equipment so as to minimize the sum of costs due to loss of efficiency, on the one hand, and cost of new equipment, on the other hand. For items whose efficiency declines over their life spans (e.g., machine tools, vehicles), prediction of costs involves determining those factors which contribute to increased operating cost, forced idle time, increased scrap, increased repair, etc. The alternative to the increased cost of operating aging equipment is the cost of replacing old equipment with new. There is some age at which replacement of old equipment is more economical than continuation at the increased operating cost. At that age, the saving from the use of new equipment more than compensates for its initial cost.

For type (b) items that do not essentially deteriorate with age and use, but which have probabilistic life spans (e.g., light bulbs or radio tubes), the problem is to determine when and how to replace the items (i.e., individually or in groups) so as to minimize the sum of costs of (1) the items, (2) replacing items after failure, and (3) group replacements.
For a group of items with a probabilistic life span, the prediction of costs involves the estimation of the probability distribution of life spans and calculation from these of the predicted number of failures as a function of the age of the group of items. For several schemes for approximating the number of failures, see Refs. 43 and 58-62. For a complete discussion of both types of replacement problems, see Ref. 1, Part VII.

Replacement of Items That Deteriorate

The measure of efficiency used as a basis for determining optimum replacement decision rules is the discounted value of all future costs associated with any replacement policy. Discounted cost is the amount required at the time of the policy decision to build up a fund at compound interest large enough to pay the pertinent cost when due.

In general, the costs included in the replacement decisions cited here are all costs that depend upon the choice or age of the machine. See Ref. 1, Chap. 17, for a discussion of the relevant costs in replacement theory considerations.

Cost Equation. Consider a series of time periods 1, 2, 3, 4, ..., of equal length, and let the costs incurred in these periods be $C_1, C_2, C_3, C_4, \ldots$, respectively. It is assumed throughout that, relevant to items that deteriorate, these costs are monotonically increasing with time. Assume that each cost is paid at the beginning of the period in which it is incurred, that the initial cost of new equipment is A, and that the cost of investment is 100r% per period. The discounted value $K_n$ of all future costs associated with a policy of replacing equipment after each n periods is given by

$$K_n = \left[ A + C_1 + \frac{C_2}{1 + r} + \frac{C_3}{(1 + r)^2} + \cdots + \frac{C_n}{(1 + r)^{n-1}} \right] + \left[ \frac{A + C_1}{(1 + r)^n} + \frac{C_2}{(1 + r)^{n+1}} + \cdots + \frac{C_n}{(1 + r)^{2n-1}} \right] + \cdots. \qquad (90)$$
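Because every replacement cycle repeats the same costs n periods later, the bracketed groups in eq. (90) form a geometric series with ratio $1/(1 + r)^n$, so the infinite sum can be evaluated from a single cycle. The Python sketch below does this for hypothetical figures (the purchase price, the cost series, and the interest rate are assumptions chosen only for illustration) and picks the replacement interval with the smallest discounted cost.

```python
def discounted_cost(n, purchase, costs, r):
    """K_n of eq. (90): discounted value of all future costs when the
    equipment is replaced after every n periods.

    purchase: initial cost A; costs[i - 1]: cost C_i paid at the start of
    period i; r: interest rate per period.
    """
    x = 1.0 / (1.0 + r)                               # one-period discount factor
    one_cycle = purchase + sum(c * x ** i for i, c in enumerate(costs[:n]))
    return one_cycle / (1.0 - x ** n)                 # geometric sum over all cycles


def best_interval(purchase, costs, r):
    """Replacement interval n (1..len(costs)) that minimizes K_n."""
    return min(range(1, len(costs) + 1),
               key=lambda n: discounted_cost(n, purchase, costs, r))


if __name__ == "__main__":
    # Hypothetical data: A = 100, costs rising by 10 per period, r = 10%.
    A, C, r = 100.0, [10.0 * i for i in range(1, 11)], 0.10
    for n in range(1, 8):
        print(n, round(discounted_cost(n, A, C, r), 2))
    print("replace every", best_interval(A, C, r), "periods")
```

The search is a plain enumeration over n; it relies on the stated assumption that the period costs increase monotonically, so that the cost has a single minimum within the range examined.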
Equation (90) may also be written as

$$K_n = \frac{A + \sum_{i=1}^{n} \left[ C_i / (1 + r)^{i-1} \right]}{1 - [1/(1 + r)]^n} \qquad (91)$$

or, if

$$X = \frac{1}{1 + r}, \qquad (92)$$

then

$$K_n = \frac{A + \sum_{i=1}^{n} C_i X^{i-1}}{1 - X^n}. \qquad (93)$$

Now, if the best policy is replacement every n time periods, the two inequalities

$$K_{n+1} - K_n > 0 \quad \text{and} \quad K_{n-1} - K_n > 0 \qquad (94)$$

must hold. Furthermore, for the case where the $C_n$ are monotonic increasing, these conditions are sufficient as well as necessary ones for $K_n$ to be minimum.

From eq. (93), $K_{n-1} - K_n > 0$ is equivalent to (see Ref. 1)

$$C_n < (1 - X) K_{n-1}, \qquad (95)$$

and $K_{n+1} - K_n > 0$ is equivalent to

$$C_{n+1} > (1 - X) K_n. \qquad (96)$$

These inequalities, (95) and (96), may also be written as:

$$C_n < \frac{(A + C_1) + C_2 X + \cdots + C_{n-1} X^{n-2}}{1 + X + X^2 + \cdots + X^{n-2}} \qquad (97)$$

and

$$C_{n+1} > \frac{(A + C_1) + C_2 X + \cdots + C_n X^{n-1}}{1 + X + X^2 + \cdots + X^{n-1}}, \qquad (98)$$

where the right-hand terms are the weighted averages of all costs up to and including the (n - 1)st and the nth periods, respectively.

Decision Rules. As a result of these two inequalities, the following decision rules for minimizing costs may be stated:
1. Do not replace if the next period's cost is less than the weighted average of previous costs.
2. Replace if the next period's cost is greater than the weighted average of previous costs.
For further discussion and a geometric interpretation of these decision rules, and also an illustration of their use, see Ref. 1, Chap. 17.

Replacement of Items that Deteriorate by Different Equipment. Here, one considers the replacement of equipment by new or alternate pieces of equipment other than those currently in use. Let

$K'_n$ = minimum discounted value of all future costs of new equipment,
$D_1, D_2, \ldots, D_m$ = costs in each future period incurred with present equipment,
$X = 1/(1 + r)$, the discount factor,
$\pi_m$ = discounted value of all future costs if present equipment is discarded after m periods.

Cost Equation.

$$\pi_m = D_1 + D_2 X + \cdots + D_m X^{m-1} + K'_n X^m. \qquad (99)$$

Therefore

$$\pi_{m+1} - \pi_m = D_{m+1} X^m + K'_n (X^{m+1} - X^m), \qquad (100)$$

and

$$\pi_m - \pi_{m-1} = D_m X^{m-1} + K'_n (X^m - X^{m-1}). \qquad (101)$$
The condition $\pi_{m-1} - \pi_m > 0$ is equivalent to

$$D_m < (1 - X) K'_n, \qquad (102)$$

whereas the condition $\pi_{m+1} - \pi_m > 0$ is equivalent to

$$D_{m+1} > (1 - X) K'_n. \qquad (103)$$

Conditions (102) and (103) show that the minimum cost is achieved by continuing the use of the old equipment until the cost for the next period is greater than $(1 - X) K'_n$, where $(1 - X) K'_n$ is the weighted average of the costs of using the equipment for n periods between replacements.

Replacement of Items That Fail

The second class of replacement problems is concerned with items that do not deteriorate markedly with service but which ultimately fail after a period of use. The period between installation and failure is not constant for any particular type of equipment but will follow some frequency distribution.

This section is concerned only with items that fail with increasing probability as they age. Furthermore, it is assumed hereafter that all failures will be replaced. The problem, therefore, is to plan the replacement of items that have not failed. Replacing a used but still functioning item with a new item is justified only if the cost of replacement is higher after failure than before, and if installing the new item reduces the probability of failure.

The replacement policy will depend upon the probability of failure. It is therefore of considerable importance to estimate the probability distribution of failures. Statistical techniques used in such "life testing" are being developed rapidly and a growing literature on the subject is becoming available. See Refs. 63-65.

The costs of replacement before and after failure are the other important factors. In this section, the cost of the alternatives of replacement or retention is considered and two policies are developed that minimize expected costs as a function of the cost of replacement, cost of failure, and probability of failure.

Mortality Curves.
The initial information on the life characteristics of a light bulb, for example, may be shown in the form of a mortality curve. A group of N light bulbs is installed, and at the end of t equal time intervals the number of bulbs surviving equals some function of t, say $S(t)$. The proportion of the initial bulbs remaining is, then,

$$s(t) = \frac{S(t)}{N}. \qquad (104)$$

A typical mortality table is shown in Table 34 giving, at regular intervals of time, the number of survivors out of an original group of 100,000 bulbs. Specifically, the mortality curve would result from column 2 in Table 34, namely that given by $S(t)$, and is given in Fig. 10.

TABLE 34. LIFE CHARACTERISTICS OF A LIGHT BULB: ORIGINAL POPULATION OF 100,000 UNITS

(1)            (2)          (3)                  (4)              (5)
Time Units     Survivors    Reduction in         Probability      Conditional Probability
Elapsed t      S(t)         Survivors            of Failure       of Failure
                            S(t - 1) - S(t)      p(t)             v_{t,0}
0              100,000
1              100,000      0                    0                0
2              99,000       1,000                0.01             0.0100
3              98,000       1,000                0.01             0.0101
4              97,000       1,000                0.01             0.0102
5              96,000       1,000                0.01             0.0103
6              93,000       3,000                0.03             0.0312
7              87,000       6,000                0.06             0.0645
8              77,000       10,000               0.10             0.1149
9              63,000       14,000               0.14             0.1818
10             48,000       15,000               0.15             0.2381
11             32,000       16,000               0.16             0.3333
12             18,000       14,000               0.14             0.4375
13             10,000       8,000                0.08             0.4444
14             6,000        4,000                0.04             0.4000
15             3,000        3,000                0.03             0.5000
16             2,000        1,000                0.01             0.3333
17             1,000        1,000                0.01             0.5000
18             0            1,000                0.01             1.0000

Column (1), number of elapsed periods.
Column (2), survivors at end of period, based on figures supplied by a major light bulb manufacturer.
Column (3), rate of change of column (2).
Column (4), column (3) divided by 100,000.
Column (5), column (3) divided by value in column (2) for previous period.

Life Span. Perhaps a more familiar presentation of the life characteristics of a group of items is in the form of a probability distribution of life spans.
Such a probability distribution may be derived from the mortality table by taking

$$\frac{S(t - 1) - S(t)}{N} = p(t), \qquad (105)$$

the proportion of units failing in time period t. (See Table 34.) This probability function, $p(t)$, is plotted against t in Fig. 11.

FIG. 10. Number of survivors after t periods of time. (Data from Table 34.) [Plot of number of survivors, S(t), against time units elapsed, t.]

FIG. 11. Probability of failure in tth period of bulb installed at beginning of first period. (Data from Table 34.) [Plot of probability of failure against time units elapsed, t.]

Conditional Probability of Failure. Another descriptive notion of life characteristics is the conditional probability of failure or its complement, the probability that an item at time t will survive to time t + 1. This probability is given by

$$v_{t,0} = \frac{S(t - 1) - S(t)}{S(t - 1)} = 1 - \frac{S(t)}{S(t - 1)} \qquad (106)$$

and is the proportion of surviving units failing in the subsequent period. (See Table 34.) This conditional probability function is plotted against t in Fig. 12.

FIG. 12. Conditional probability of failure in tth period. (Data from Table 34.) [Plot of conditional probability of failure against time units elapsed, t.]

Replacement Process. It is assumed here that failures occur only at the end of a unit period of time. During the first t - 1 time intervals, all failures occurring during any given time interval are replaced at the beginning of the next time interval. At the end of the tth time interval, all units are replaced regardless of their ages. The problem is to determine that value of t which will minimize total cost.

Rate of Replacement.
The general expression for the number of units failing in time interval t is

$$f(t) = N \left\{ p(t) + \sum_{x=1}^{t-1} p(x)\, p(t - x) + \sum_{b=2}^{t-1} \left[ \sum_{x=1}^{b-1} p(x)\, p(b - x) \right] p(t - b) + \cdots \right\}, \qquad (107)$$

where N = total units in the installation, and $p(x)$ = probability of failure at age x.

Table 35 illustrates the use of eq. (107) to determine the total number of failures in each time period t, based upon the data of Table 34.

TABLE 35. TOTAL FAILURES (REPLACEMENTS) IN EACH PERIOD t (a)

(1)       (2)             (3)                  (1)       (2)             (3)
Period    Replacements    Replacements         Period    Replacements    Replacements
          Current         Cumulative                     Current         Cumulative
          f(t)            sum of f(t)                    f(t)            sum of f(t)
1         0               0                    21        12,047          162,167
2         1,000           1,000                22        11,706          173,873
3         1,000           2,000                23        10,820          184,693
4         1,010           3,010                24        9,697           194,390
5         1,020           4,030                25        8,700           203,090
6         3,030           7,060                26        8,288           211,378
7         6,040           13,100               27        8,413           219,791
8         10,090          23,190               28        8,862           228,653
9         14,201          37,391               29        9,523           238,176
10        15,392          52,783               30        10,100          248,276
11        16,665          69,448               31        10,413          258,689
12        15,000          84,448               32        10,507          269,196
13        9,480           93,928               33        10,348          279,544
14        6,175           100,103              34        9,999           289,543
15        6,160           106,263              35        9,636           299,179
16        5,521           111,784              36        9,079           308,258
17        7,309           119,093              37        9,220           317,478
18        9,317           128,410              38        9,271           326,749
19        10,181          138,591              39        9,447           336,196
20        11,529          150,120              40        9,669           345,865

Column (1), periods since original installation.
Column (2), calculated as described in text.
Column (3), cumulative sum of values in column (2).
(a) Data based on Table 34.

A second method for determining the number of failures in each period t, based upon the conditional probability of failure and using vector algebra, is given in Ref. 1, Chap. 17.

Cost of Replacement. A second fundamental requirement of a useful replacement policy is that the cost of replacement after failure be greater than the cost of replacement before failure. This difference in cost is the source of savings required to compensate for the expense of reducing the probability of failure by replacing surviving units.
Group replacing of units can cost less than replacement of failures by virtue of labor savings, volume discounts on materials, or for other reasons.

Cost Equation. Let K(t) = total cost from time of group installation until the end of t periods. Then, if the entire group is replaced at intervals of length t periods,

$$\frac{K(t)}{t} = \text{average cost per period of time}.$$

Let
C_1 = unit cost of replacement in a group,
C_2 = unit cost of individual replacement after failure,
f(X) = number of failures in the Xth period,
N = number of units in the group.

Then the total cost, K(t), will be given by

$$K(t) = NC_1 + C_2 \sum_{X=1}^{t-1} f(X). \tag{108}$$

Therefore, the cost per period is given by

$$\frac{K(t)}{t} = \frac{NC_1}{t} + \frac{C_2}{t} \sum_{X=1}^{t-1} f(X). \tag{109}$$

Minimization of Costs. Costs are minimized for a policy of group replacing after t periods if

$$\frac{K(t)}{t} < \frac{K(t-1)}{t-1} \tag{110}$$

and if

$$\frac{K(t)}{t} < \frac{K(t+1)}{t+1}. \tag{111}$$

By using eq. (109), conditions (110) and (111) may be rewritten respectively as

$$C_2\, f(t-1) < \frac{NC_1 + C_2 \sum_{X=1}^{t-2} f(X)}{t-1} \tag{112}$$

and

$$C_2\, f(t) > \frac{NC_1 + C_2 \sum_{X=1}^{t-1} f(X)}{t}. \tag{113}$$

Conditions (110) and (111) and, in turn, conditions (112) and (113), are necessary conditions for optimum group replacement. They are not sufficient, as illustrated by the function F(t) = t sin t, 0 ≤ t ≤ 4π, which satisfies these conditions for not one but two values of t, although the function has but one true (as opposed to relative) minimum point.

TABLE 36. AVERAGE COSTS FOR ALTERNATIVE GROUP REPLACEMENT POLICIES: C_1/C_2 = 0.25
(Data from Table 35)

Column headings:
(1) t
(2) f(t) (current)
(3) $\sum_{X=1}^{t} f(X)$ (cumulative)
(4) $(t-1)f(t-1) - \sum_{X=1}^{t-2} f(X)$
(5) $t\,f(t) - \sum_{X=1}^{t-1} f(X)$
(6) Total cost, $K(t) = NC_1 + C_2 \sum_{X=1}^{t-1} f(X)$
(7) Average cost per period, K(t)/t

  (1)      (2)        (3)        (4)        (5)         (6)           (7)
    1        0          0          0          0     25,000C_2     25,000C_2
    2    1,000      1,000          0      2,000     25,000C_2     12,500C_2
    3    1,000      2,000      2,000      2,000     26,000C_2      8,667C_2
    4    1,010      3,010      2,000      2,040     27,000C_2      6,750C_2
    5    1,020      4,030      2,040      2,090     28,010C_2      5,602C_2
    6    3,030      7,060      2,090     14,150     29,030C_2      4,838C_2
    7    6,040     13,100     14,150     35,220     32,060C_2      4,580C_2 (a)
    8   10,090     23,190     35,220     67,620     38,100C_2      4,762C_2
    9   14,201     37,391     67,620    104,619     48,190C_2      5,354C_2
   10   15,392     52,783    104,619    116,529     62,391C_2      6,239C_2

Column (1), number of periods between group replacements.
Column (2), number of replacements from Table 35.
Columns (3), (4), (5), calculated as indicated in column headings.
Column (6), calculated as indicated with C_1 = 0.25C_2.
Column (7), column (6) divided by column (1).
(a) Therefore t = 7.

Conditions (112) and (113) may be interpreted as follows:
1. One should not group replace at the end of the tth period if the cost of individual replacements at the end of the tth period is less than the average cost per period through the end of t periods.
2. One should group replace at the end of the tth period if the cost of individual replacements for the tth period is greater than the average cost per period through the end of t periods.

The use of these decision rules for the light bulb example (see Table 34) is illustrated in Table 36. For a full discussion of this replacement model and the solution of the light bulb example, see Ref. 1.
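The cost comparison of Table 36 can be reproduced with a short computation. The following Python sketch is illustrative only. It assumes N = 100,000 units (inferred from the table entry NC_1 = 25,000C_2 together with C_1 = 0.25C_2), expresses all costs in units of C_2, and uses the first ten values of f(t) from Table 35. The function names are hypothetical.

```python
# Failures f(1), ..., f(10) per period, taken from Table 35.
FAILURES = [0, 1000, 1000, 1010, 1020, 3030, 6040, 10090, 14201, 15392]


def total_cost(t, failures, n_units, c1, c2):
    """K(t) = N*C1 + C2 * sum of f(X) for X = 1, ..., t-1, as in eq. (108)."""
    return n_units * c1 + c2 * sum(failures[: t - 1])


def best_interval(failures, n_units, c1, c2):
    """Return (t, K(t)/t) for the interval with the lowest average cost, eq. (109)."""
    averages = {
        t: total_cost(t, failures, n_units, c1, c2) / t
        for t in range(1, len(failures) + 1)
    }
    t_best = min(averages, key=averages.get)
    return t_best, averages[t_best]


# Assumed inputs: N = 100,000 and C2 = 1, so that C1 = 0.25.
t_best, average = best_interval(FAILURES, n_units=100_000, c1=0.25, c2=1.0)
print(t_best, round(average))  # 7 4580
```

Scanning every candidate t, rather than stopping at the first t that satisfies conditions (112) and (113), avoids the difficulty that those conditions are necessary but not sufficient.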
Solution of Replacement Problems by Monte Carlo Technique

In the determination of optimum group replacement policies for items that fail, one needs to determine that value of t such that

$$\frac{K(t)}{t} = \frac{NC_1 + C_2\,\phi(t)}{t} \tag{114}$$

will be a minimum, where
N = number of units in the group,
C_1 = unit cost of replacement in a group,
C_2 = unit cost of individual replacement after failure,
φ(t) = number of failures in time t.

Failure Equations. If f(t) is the probability density function of failure, then the expected number of first generation failures in time t is given by

$$\phi_1(t) = N \int_0^t f(t)\,dt = NF(t), \tag{115}$$

where

$$F(t) = \int_0^t f(t)\,dt. \tag{116}$$

Similarly, the number of second generation failures is given by

$$\phi_2(t) = N \int_0^t F(\alpha)\,F(t-\alpha)\,d\alpha. \tag{117}$$

The number of third generation failures is given by

$$\phi_3(t) = N \int_{\beta=0}^{t} \int_{\alpha=0}^{\beta} F(\alpha)\,F(\beta-\alpha)\,F(t-\beta)\,d\alpha\,d\beta. \tag{118}$$

The total number of failures in time t is therefore

$$\phi(t) = NF(t) + N \int_0^t F(\alpha)\,F(t-\alpha)\,d\alpha + N \int_{\beta=0}^{t} \int_{\alpha=0}^{\beta} F(\alpha)\,F(\beta-\alpha)\,F(t-\beta)\,d\alpha\,d\beta + \cdots. \tag{119}$$

Unless simplifying assumptions are made relative to second and higher generation failures, it is almost impossible to obtain an analytic solution for φ(t) as given by eq. (119). However, by the use of the Monte Carlo technique one can solve for values of φ(t) without making any simplifying assumptions. That is, one can determine φ(t) for many values of t and then construct K(t) as a function of t in order to determine the optimum group replacement policy.

EXAMPLE. The Use of the Monte Carlo Technique in Solving Replacement Problems. Assume that one wishes to determine the optimum group replacement policy for a group of light bulbs whose life pattern follows a normal distribution, the mean and standard deviation of which are 30 and 10 days, respectively. (That is, μ = 30 days and σ = 10 days.) Furthermore, assume that C_1 = $0.50, C_2 = $1.00, N = 10, T = 360 days, where T is the total time period under consideration. For purposes of illustration only, further assume that if group replacement is used, it can be done only at the end of 10, 20, 30, or 40 days.
A chart can then be set up and, by use of a table of random normal numbers, the total expected number of failures can be determined for each value of t (t = 10, 20, 30, or 40 days). Tables of random normal numbers are based on a mean of 0 and a standard deviation of 1. Hence, any number selected from the table of random normal numbers must first be multiplied by 10 and then added to (if positive) or subtracted from (if negative) 30. Thus, if the first random normal number selected from the table is 0.464, the adjusted random normal number will be

(0.464)(10) + 30 = 34.64.

That is, in the simulation of the light bulb system, the first bulb will last 34.64 (or 35) days before failing. The next number from the table of random normal numbers is, say, 0.137, which is adjusted to

(0.137)(10) + 30 = 31.37.

Therefore, the replacement to the first bulb will last 31 days, that is, until 35 + 31 = 66 days after the start of the analysis. Therefore one can expect the first bulb to burn out after 35 days and its replacement to last through the balance of the 40-day period under discussion.

This procedure is carried out for all ten lighting fixtures in the installation, and the expected number of failures for each of the intervals 10, 20, 30, and 40 days is determined as in Table 37.

TABLE 37. FAILURE TABLE FOR N = 10

                                    Fixture                                                      Total
   t     1       2          3     4           5       6       7       8     9       10          Failures
  10     35      31         55    27          29      33      27      43    32      20              0
  20     35      31         55    27          29      33      27      43    32      20              0
  30     35      31         55    27,38       29,64   33      27,59   43    32      20,55           4
  40     35,66   31,36,62   55    27,38,75    29,64   33,47   27,59   43    32,36   20,55          10

The entire procedure was repeated nine more times and the ten samples (each of sample size N = 10) gave the results shown in Table 38.

TABLE 38. SUMMARY TABLE FOR TEN SAMPLES WITH N = 10

    t     Total Number of Failures     Average Number of Failures, φ(t)
   10                1                              0.1
   20               15                              1.5
   30               51                              5.1
   40               96                              9.6

From Table 38 and eq. (114), one can determine and compare the cost of group replacement for the 10-, 20-, 30-, and 40-day periods.
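The sampling procedure of this example can be sketched in Python, with the standard library's `random.gauss` standing in for the printed table of random normal numbers. This is a minimal illustration under stated assumptions, not the handbook's own computation: the function names are hypothetical, a failure is counted only if it occurs strictly before day t, lifetimes are clamped to a small positive value so that a rare negative draw cannot move the clock backwards, and the cost over the horizon is taken as (T/t)[NC_1 + C_2 φ(t)].

```python
import random


def failures_before(t, mu=30.0, sigma=10.0, rng=random):
    """Count failures at one fixture strictly before day t."""
    clock, count = 0.0, 0
    while True:
        # Lifetime = mu + sigma * z, where z is a standard normal deviate.
        # Assumption: clamp to 0.01 day so the clock always advances.
        clock += max(rng.gauss(mu, sigma), 0.01)
        if clock >= t:
            return count
        count += 1


def average_failures(t, n_units=10, n_samples=10, seed=0):
    """Estimate phi(t): mean number of failures per sample of n_units fixtures."""
    rng = random.Random(seed)
    total = sum(failures_before(t, rng=rng) for _ in range(n_units * n_samples))
    return total / n_samples


def cost_over_horizon(t, phi, n_units=10, c1=0.50, c2=1.00, horizon=360):
    """Cost over the horizon T when the whole group is replaced every t days."""
    return (horizon / t) * (n_units * c1 + c2 * phi)


if __name__ == "__main__":
    for t in (10, 20, 30, 40):
        phi = average_failures(t)
        print(t, phi, round(cost_over_horizon(t, phi), 2))
```

Because the draws are pseudo-random, the estimates of φ(t) depend on the seed and on the number of samples, and they need not coincide with Table 38. Increasing `n_samples` reduces the sampling error.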
The total expected costs of group replacement over time period T (360 days) are:

K_10 = (360/10)[(10)(0.50) + (0.1)(1.00)] = $183.60,
K_20 = (360/20)[(10)(0.50) + (1.5)(1.00)] = $117.00,
K_30 = (360/30)[(10)(0.50) + (5.1)(1.00)] = $121.20,
K_40 = (360/40)[(10)(0.50) + (9.6)(1.00)] = $131.40.

Thus, if one is to group replace at 10-, 20-, 30-, or 40-day intervals, one would do so every 20 days. (This assumes, of course, that, in practice, one has taken a sufficiently large sample of random normal numbers.)

If one did not group replace, one could expect to replace each bulb, on the average, every 30 days. Accordingly, the total expected cost over time period T of not group replacing, call it K_∞, is

K_∞ = 10(360/30)(1.00) = $120.

Therefore, under the assumptions of this illustration, one should group replace every 20 days (since K_∞ > K_20).

Other Models

Although the solutions presented apply only to the particular model described earlier, models of other characteristics may be approached in the same way. For example, a model could be concerned with group replacement in which new bulbs are used for group replacement only, and used bulbs replace failures in between group replacements. A different model is needed when surviving bulbs are replaced at a fixed age, rather than at fixed intervals of time.

The considerations of this chapter have been limited to demonstrating an approach to two basic replacement problems, one involving deterioration, and the other involving probabilistic life spans of equipment. For a discussion of the models which have been developed and solutions obtained for various sets of assumptions about the conditions of the problem, see Ref. 1, Chap. 17. For a useful review of equipment replacement rules from an industrial point of view, see Ref. 66.

7. COMPETITIVE PROBLEMS

Introduction

A competitive problem is one in which the efficiency of one's decision is affected by the decisions of one's competitors.
Such problems include, for example, competitive advertising for a relatively fixed market or bidding for a given set of contracts.

Game Problems. The most publicized competitive problem in O.R. is the "game" as developed by the late John von Neumann and discussed in his Zur Theorie der Gesellschaftsspiele (Ref. 67) in 1928 and, jointly with Oskar Morgenstern, in their Theory of Games and Economic Behavior (Ref. 68) in 1944.

For many decades, economists tended to take as the standard model for their science the situation of Robinson Crusoe, marooned on an uninhabited island and concerned with behaving in such a manner as to maximize the goods he could obtain from nature. It was generally felt that it would be possible to gain insight into the behavior of groups of individuals by starting with a detailed analysis of the behavior in this simplest possible case: the case of a single individual all alone and struggling against nature. This line of attack on economic problems, however, suffers from the defect that in going from a one-man society to even a two-man society, qualitatively different situations arise which could not have been foreseen from the one-man case. Von Neumann was led to believe that group economics could more profitably be viewed as analogous to parlor games of strategy.

Von Neumann's game is characterized by a fixed set of rules and a known number of competitors whose possible choices are also known. Furthermore, the payoff for each combination of choices is also assumed to be known. The solution to von Neumann's game is obtained by a principle of conservatism called the minimax principle, namely one which will maximize the minimum expected gain or minimize the maximum expected loss.

Very little has been accomplished by way of applying the von Neumann theory of games. Military applications have been referred to but have not been made public.
Several authors have explored the possibility of applying game theory to industrial problems, but they have not dealt with actual applications.

What then is the significance and value of game theory? This can best be answered by quoting Williams' (Ref. 69, p. 217) succinct appraisal:

    While there are specific applications today, despite the current limitations of the theory, perhaps its greatest contribution so far has been an intangible one; the general orientation given to people who are faced with over-complex problems. Even though these problems are not strictly solvable -- certainly at the moment and probably for the indefinite future -- it helps to have a framework in which to work on them. The concept of a strategy, the distinction among players, the role of chance events, the notion of matrix representations of the payoffs, the concepts of pure and mixed strategies, and so on, give valuable orientation to persons who must think about complicated conflict situations. *

* Reprinted by permission from The Compleat Strategyst by J. D. Williams, copyright 1954. McGraw-Hill Book Co.

Bidding Problems. A second type of competitive problem is one in which bidding takes place. Bidding problems differ from game problems in that: (a) the number of competitors is not usually known, (b) the number of choices is not known (since one can bid over a large range), and (c) the payoffs are not usually known but, rather, are subject to estimation (e.g., in bidding for mineral rights). Furthermore, in some bidding situations (e.g., those in which one bids a dollar amount plus a percentage of the royalties), one may not be able to determine readily whether or not a given bid would have won or lost.

Only a limited theory of bidding exists to date, although the concepts and techniques of statistical decision theory hold great promise in this area. A major research contribution has been made by Friedman (Ref. 70 and Ref. 1, Chap. 19).
The number of applications of bidding theory has been very limited; however, in at least one instance, the results obtained have been spectacularly successful. Bidding models will not be discussed here. See Refs. 1 and 70.

The Theory of Games

Definitions.
Game, a set of rules and conventions for playing.
Play, a particular possible realization of the rules.
Move, a point in a game at which one of the players selects an alternative from some set of alternatives.
Choice, that particular alternative selected.
Strategy, a player's predetermined method for making his choices during the play.

Classification of Games.
1. Players, the number of sets of opposing interests: (a) one-person, (b) two-person, (c) n-person (n > 2).
2. Payment: (a) zero-sum game, a game in which the sum of the payoffs, counting winnings as positive and losses as negative, to all players is zero; (b) nonzero-sum game, a game in which the sum of the payoffs to all players is not zero.
3. Number of moves: (a) finite, (b) infinite.
4. Number of choices: (a) finite, (b) infinite.
5. Amount of information regarding opponent's choices: (a) all, (b) part, (c) none.

One-Person Games. One-person zero-sum games are trivial games which say "do nothing" since there is no gain to be made by the one participant in the game. One-person nonzero-sum games are the ordinary maximization and minimization problems solvable by calculus and other optimization techniques. Thus, in order to study the characteristic properties of games of strategy, it is necessary to go to games which involve more than one player. The discussion here will center mainly on two-person zero-sum games.

Two-Person, Zero-Sum Games. Analysis of the very simplest of games shows that there are two general kinds, which may be illustrated by two kinds of coin matching.

Single Strategy Games.
Assume that one is matching dimes and quarters where, if both coins are the same, it is a standoff, but if the coins differ, the quarter takes the dime. In this game it is safest always to play a quarter, for then one can never lose, whereas one may lose by playing a dime. Such games in which each opponent will find it safest to stick to one strategy are called single strategy games.

Mixed Strategy Games. The second general type of game may be illustrated by the usual penny-matching situation in which each player chooses either heads or tails. If the coins match, the matcher wins; if they do not match, the matchee wins. In this case, if either player sticks to one strategy, he may consistently lose. The only safe way to play the game is to play heads or tails in a completely random manner, as, for instance, by flipping the penny in the air just before one plays it. Such games are called games of mixed strategy.

Payoff Matrix. Games can have any number of strategies. In principle, once each player has chosen one of the sets of strategies available to him, it is possible to calculate the probable outcome of the game. The net payoffs can then be arranged in a two-dimensional matrix, the payoff matrix. From the payoff matrix, one can then find whether the game is a single or a mixed strategy game and, if mixed, in what proportions to mix the playing. For further discussion of the definitions, classification of games, and examples of the construction of payoff matrices, see Ref. 71, Chap. 1.

Single Strategy, Two-Person, Zero-Sum Games

Minimax Principle. Consider the game whose payoff matrix with respect to player P_1 is

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$

If player P_1 chooses the number i (i.e., adopts the ith strategy, i = 1, 2, ..., m), he is certain to receive at least

$$\min_j a_{ij}, \qquad j = 1, 2, \ldots, n.$$

Since he can choose i as he pleases, he can, in particular, choose i so as to make $\min_j a_{ij}$ as large as possible. Thus player P_1 can choose i so as to receive at least

$$\max_i \min_j a_{ij}.$$

Similarly, player P_2 can choose j so as to make certain that he will receive at least

$$\max_j \min_i (-a_{ij}),$$

since for two-person, zero-sum games, the payoff matrix with respect to player P_2 will consist of elements (payments) which are the negative of the elements of matrix A. That is, player P_2 can choose j so as to make certain that he will receive at least

$$-\min_j \max_i a_{ij}$$

or, equivalently, that player P_1 will get at most

$$\min_j \max_i a_{ij}.$$

Saddle Point. In summary, P_1 can guarantee that he will receive at least $\max_i \min_j a_{ij}$, and P_2 can prevent P_1 from receiving more than $\min_j \max_i a_{ij}$. If

$$\max_i \min_j a_{ij} = \min_j \max_i a_{ij} = a_{i_0 j_0} = v,$$

P_1 will settle for v and P_2 will settle for -v. Games for which the equation above holds are called games with a saddle point. More specifically, (i_0, j_0) is called a saddle point and $a_{i_0 j_0}$ is called the value of the game for player P_1. Furthermore, the best strategy for player P_1 is i_0, and the best strategy for player P_2 is j_0. (See Ref. 71, Chap. 1.)

It should be noted that a saddle point of a matrix is a pair of integers (i_0, j_0) such that $a_{i_0 j_0}$ is at the same time the minimum element of its row and the maximum element of its column.

Every single strategy two-person, zero-sum game has a saddle point. This saddle point provides the solution of the game by designating the best strategies for each player and the value of the game.

Example. The game represented by

    4   3   7   6
    5   2   1   0
    0   1   3   4
    2   2   1   5

has a saddle point at (1, 2). Its solution consists of the strategies 1 and 2, respectively, and the value of the game is 3.

Stated in another manner, every game which contains a saddle point is a single strategy game (see Ref. 71). Games without saddle points, such as the game represented by

     1  -1
    -1   1

are mixed strategy games.

Mixed Strategy, Two-Person, Zero-Sum Games

Consider the (penny-matching) game whose payoff matrix is

                     P_2
                 Heads   Tails
    P_1  Heads     1      -1
         Tails    -1       1

Such a matrix has no saddle point and, hence, is not a single strategy game. Furthermore, one can readily see that it makes little difference to player P_1 whether he chooses strategy 1 (heads) or strategy 2 (tails), for, in either case, he will receive 1 or -1 according as P_2 makes the same or opposite choice. Player P_1 must play the game by making his selections by means of some chance device. The procedures for determining optimum mixed strategies are discussed below.

Dominance. If a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n) are vectors (or rows or columns of a matrix), and if a_i ≥ b_i (for i = 1, 2, ..., n), one says that a dominates b. If a_i > b_i (for i = 1, 2, ..., n), one says that a strictly dominates b.

Convex Linear Combination. Let

$$x^{(1)} = (x_1^{(1)}, \ldots, x_n^{(1)}), \quad x^{(2)} = (x_1^{(2)}, \ldots, x_n^{(2)}), \quad \ldots, \quad x^{(r)} = (x_1^{(r)}, \ldots, x_n^{(r)}), \quad x = (x_1, \ldots, x_n).$$

Let a = (a_1, ..., a_r) such that a_i ≥ 0 (i = 1, 2, ..., r) and a_1 + a_2 + ... + a_r = 1. Then x is a convex linear combination of x^{(1)}, ..., x^{(r)} with weights a_1, ..., a_r if

$$x_j = a_1 x_j^{(1)} + a_2 x_j^{(2)} + \cdots + a_r x_j^{(r)}, \qquad j = 1, 2, \ldots, n.$$

Thus, the point (0, 15) is a convex linear combination (with weights 1/6, 1/3, and 1/2) of the points (6, 12), (-9, 15), and (4, 16).

THEOREM. Let Γ be a rectangular game whose matrix is A; suppose that, for some i, the i-th row of A is dominated by some convex linear combination of the other rows of A; let A' be the matrix obtained from A by omitting the i-th row; and let Γ' be the rectangular game whose matrix is A'. Then the value of Γ' is the same as the value of Γ; every optimum strategy for P_2 in Γ' is also an optimum strategy for P_2 in Γ; and if w is any optimum strategy for P_1 in Γ' and x is the i-place extension of w, then x is an optimum strategy for P_1 in Γ. Moreover, if the i-th row of A is strictly dominated by the convex linear combination of the other rows of A, then every solution of Γ can be obtained in this way from a solution of Γ'. (See Ref. 71, Chap. 2.)

Note. A similar theorem applies to dominating columns. (See Ref. 71, Chap. 2.)

EXAMPLE. The following example of the application of this theorem is cited in Ref. 71 (p. 50):

    3   2   4   0
    3   4   2   4
    4   2   4   0
    0   4   0   8

Row 1 is dominated by row 3, yielding

    3   4   2   4
    4   2   4   0
    0   4   0   8

Column 1 dominates column 3, resulting in

    4   2   4
    2   4   0
    4   0   8

Column 1 dominates a convex linear combination of columns 2 and 3, namely:

    4 > (1/2)(2) + (1/2)(4),
    2 = (1/2)(4) + (1/2)(0),
    4 = (1/2)(0) + (1/2)(8).

Thus, the first column can be omitted, yielding

    2   4
    4   0
    0   8

Row 1 is now dominated by a convex linear combination of rows 2 and 3, since

    2 = (1/2)(4) + (1/2)(0),
    4 = (1/2)(0) + (1/2)(8).

Therefore, the matrix reduces to

    4   0
    0   8

As will be seen later, the solution to this latter matrix consists of the mixed strategy (2/3, 1/3) for each player and a game value of 8/3. Therefore, the value of the original game is 8/3, and the optimum strategy for the original game is (0, 0, 2/3, 1/3) for each player.

General Theorems for Rectangular Games (Refs. 1 and 71)

THEOREM 1. Every rectangular game has a specific value g. This value is unique. Furthermore, there exists for player P_1 a best strategy, i.e., there exist non-negative frequencies x_1, x_2, ..., x_m such that x_1 + x_2 + ... + x_m = 1 and such that if he plays plan I with frequency x_1, plan II with frequency x_2, ..., plan M with frequency x_m, then he can assure himself at least an expected gain of g, which is the value of the game. Similarly, for player P_2, there exists a best strategy Y = (y_1, y_2, ..., y_n), y_1 + y_2 + ... + y_n = 1, such that if P_2 plays plans I, II, ..., N with the above frequencies, respectively, he (P_2) can assure himself at most a loss of g.

THEOREM 2. The unknowns x_1, x_2, ..., x_m, y_1, y_2, ..., y_n, and g (for the solution of a game) can be determined from the following relations:

$$x_1 + x_2 + \cdots + x_m \equiv \sum_{i=1}^{m} x_i = 1, \qquad x_i \ge 0 \quad (i = 1, 2, \ldots, m);$$

$$y_1 + y_2 + \cdots + y_n \equiv \sum_{j=1}^{n} y_j = 1, \qquad y_j \ge 0 \quad (j = 1, 2, \ldots, n);$$

$$\sum_{i=1}^{m} x_i a_{ij} \ge g \quad (j = 1, 2, \ldots, n);$$

$$\sum_{j=1}^{n} a_{ij} y_j \le g \quad (i = 1, 2, \ldots, m).$$

THEOREM 3. Let X* = (x_1*, x_2*, ..., x_m*) and Y* = (y_1*, y_2*, ..., y_n*) be any optimal strategies for P_1 and P_2, respectively, for a game whose value is g. If, for any i,

$$\sum_{j=1}^{n} a_{ij} y_j^* < g,$$

then x_i* = 0. Similarly, if for any j,

$$\sum_{i=1}^{m} x_i^* a_{ij} > g,$$

then y_j* = 0.

Solutions of Rectangular Games

Two-by-Two Games. To solve two-by-two rectangular games, first look for a saddle point. If one exists, the game is a single strategy game and the solution is immediately given as discussed above. If no saddle point exists, the game is a mixed strategy game and is solvable by either of the following methods.

Algebraic Solution. Given:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Let x and 1 - x be the frequencies with which P_1 plays plans I and II, respectively. Then, if player P_2 plays plan I, P_1 can expect

    a(x) + c(1 - x) = c + (a - c)x.

On the other hand, if player P_2 plays plan II, P_1 can expect

    b(x) + d(1 - x) = d + (b - d)x.

The solution of any two-by-two game without a saddle point is given by the minimax principle, namely, by solving

    c + (a - c)x = d + (b - d)x.

EXAMPLE. Given:

    -3   7
     6   1

Then

    -3(x) + 6(1 - x) = 7(x) + 1(1 - x)

yields

    x = 1/3,  (1 - x) = 2/3.

Similarly, one determines that

    y = 2/5  and  (1 - y) = 3/5.

Finally, the value of the game is given by (since x = 1/3)

    g = -3(1/3) + 6(2/3) = 3.

For this and other algebraic procedures, see Ref. 1, Chap. 18.

Method of Oddments (Two-by-Two Game). The method of oddments for two-by-two games is given by Williams (Ref. 69).

EXAMPLE. The method may be stated by means of the game whose payoff matrix is

                  P_2
     Plan       I     II
    P_1   I    -3      7
          II    6      1

To determine the optimum frequencies for P_1, subtract the numbers in the second column from those in the first column. This gives:

    I    -10
    II     5

One of the two numbers will always be negative. Ignore the minus sign for the purpose of computing oddments. Then, the oddment for P_1 (I) is given by the difference in the other row,

    I
    II     5

whereas the oddment for P_1 (II) is given by

    I     10
    II

Therefore, the oddments for P_1 are 5 and 10, respectively, or, equivalently, the optimum frequencies are

    5/(5 + 10) = 1/3  and  10/(5 + 10) = 2/3.

Similarly, by subtracting rows, one can determine that the optimum frequencies for player P_2 are 2/5 and 3/5.

Two-by-n Games. To find the solution of a two-by-n game:
1. Look for a saddle point. If one exists, the game is a single strategy game and the solution is given by the saddle point.
2. If no saddle point exists, examine the payoff matrix for dominance and eliminate all dominated strategies (if any) for P_1 and all dominant strategies (if any) for P_2.
3. The matrix which remains will then contain a two-by-two submatrix with the property that its solution is also a solution to the two-by-n game. The pertinent two-by-two submatrix can be found in one of several ways, probably the easiest of which is the graphical method.

Graphical Solution of Two-by-n Games. Given the game whose payoff matrix is:

                              P_2
               1     2     3     4     5     6     7
    P_1   1   -6     1     3     5     0    -4    -1
          2    7     3    -2     4    -3     0     1

Plot the two payoffs for each strategy of P_2 on two parallel axes, one axis for each strategy of P_1, and join them by a straight line, as shown in Fig. 13. Then mark the line segments which bound the figure from below and mark the highest point on this boundary. The lines which intersect at this point identify the strategies that player P_2 should use.

FIG. 13. Graphical solution of two-by-n game.

In the given example, these are strategies 5 and 6.
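The graphical procedure amounts to maximizing, over 0 ≤ x ≤ 1, the lower boundary of n straight lines, where x is the frequency with which P_1 plays his first strategy. A minimal Python sketch follows. The function name `solve_two_by_n` is hypothetical, exact arithmetic is done with the standard `fractions` module, and the first-row entries used here for columns 3 and 4 are an assumed reading of the example matrix (both columns are dominated, so they do not affect the result).

```python
from fractions import Fraction
from itertools import combinations


def solve_two_by_n(row1, row2):
    """Return (value, x, active) for a two-by-n zero-sum game.

    x is the frequency with which P1 plays row 1; active lists the
    1-based columns of P2 whose lines pass through the highest point
    of the lower boundary.
    """
    n = len(row1)

    def payoff(j, x):
        # Expected payoff to P1 when P2 plays column j.
        return row1[j] * x + row2[j] * (1 - x)

    # The maximum of the lower boundary lies at an end point or at a
    # crossing of two lines, so only those points need to be examined.
    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(n), 2):
        slope_j = row1[j] - row2[j]
        slope_k = row1[k] - row2[k]
        if slope_j != slope_k:
            x = Fraction(row2[k] - row2[j], slope_j - slope_k)
            if 0 <= x <= 1:
                candidates.add(x)

    def lower_boundary(x):
        return min(payoff(j, x) for j in range(n))

    best_x = max(candidates, key=lower_boundary)
    value = lower_boundary(best_x)
    active = [j + 1 for j in range(n) if payoff(j, best_x) == value]
    return value, best_x, active


# Example matrix; row-1 entries for columns 3 and 4 are assumed values.
ROW1 = [-6, 1, 3, 5, 0, -4, -1]
ROW2 = [7, 3, -2, 4, -3, 0, 1]
print(solve_two_by_n(ROW1, ROW2))
```

For this matrix the sketch selects columns 5 and 6, in agreement with the graphical solution. Because the end points x = 0 and x = 1 are included among the candidates, a game with a saddle point is handled by the same routine.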
(Note that strategies 2, 4, and 7 dominate strategy 6 and could have been eliminated immediately. Similarly, strategy 3 dominates strategy 5 and could be eliminated.) Therefore, the appropriate two-by-two subgame is

                 P_2
               5     6
    P_1   1    0    -4
          2   -3     0

which, by the method of oddments, gives

    x* = (3/7, 4/7)  and  y* = (4/7, 3/7)

and a game value of g = -12/7. Hence, the solution to the original game is

    X* = (3/7, 4/7)  and  Y* = (0, 0, 0, 0, 4/7, 3/7, 0),

and, again, g = -12/7.

It should be noted that, for m-by-two games, one proceeds as above, marking, however, the line segments which bound the graph from above and then identifying the lowest point on this boundary.

Note. This is merely a graphical application of the minimax principle. See Refs. 1, 69, and 71.

Three-by-Three Games. To find the solution of three-by-three games:
1. Look for a saddle point.
2. If none exists, examine the payoff matrix for dominance and reduce it accordingly.
3. If a three-by-three matrix remains, solve by the method of oddments to see if a three-by-three solution exists.
4. If the oddments method fails, try the two-by-two subgames for a solution.

EXAMPLE. Method of Oddments (Three-by-Three Game). Consider the game whose payoff matrix is

                  P_2
               1     2     3
    P_1   1    6     0     6
          2    8    -2     0
          3    4     6     5

To determine the optimum frequencies for player P_1, subtract each column from the preceding column, yielding

    1     6    -6
    2    10    -2
    3    -2     1

The oddment for P_1 (1) is given by the rows other than row 1,

    10   -2
    -2    1

the numerical value of which is the difference between the diagonal products:

    10(1) - (-2)(-2) = 6.

Similarly, the oddment for P_1 (2) is given by

     6   -6
    -2    1

namely, 6(1) - (-6)(-2) = -6, and the oddment for P_1 (3) is given by

     6   -6
    10   -2

or (6)(-2) - (-6)(10) = 48. Therefore, the oddments for P_1 (signs being ignored) are 6:6:48, so that the optimum frequencies are

    X* = (1/10, 1/10, 8/10).
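The column-difference computation for P_1 can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the final check, that the resulting mixture earns the same expected payoff against every column, is an added safeguard rather than a proof of optimality.

```python
from fractions import Fraction


def row_player_oddments(a):
    """Oddments for P1 in a three-by-three game.

    Subtract each column from the preceding one; then, for each row,
    strike that row and take the absolute value of the two-by-two
    determinant formed by the remaining rows of differences.
    """
    diffs = [[row[0] - row[1], row[1] - row[2]] for row in a]
    oddments = []
    for i in range(3):
        (p, q), (r, s) = [diffs[k] for k in range(3) if k != i]
        oddments.append(abs(p * s - q * r))
    return oddments


def frequencies(oddments):
    """Convert oddments to frequencies that sum to 1."""
    total = sum(oddments)
    return [Fraction(o, total) for o in oddments]


def payoffs_against_columns(a, x):
    """Expected payoff to P1 of mixture x against each column of a."""
    return [sum(x[i] * a[i][j] for i in range(3)) for j in range(3)]


A = [[6, 0, 6], [8, -2, 0], [4, 6, 5]]
oddments = row_player_oddments(A)   # [6, 6, 48]
x_star = frequencies(oddments)      # 1/10, 1/10, 4/5
print(oddments, x_star, payoffs_against_columns(A, x_star))
```

Applying the same function to the transposed matrix gives the corresponding oddments for the column player. If the three payoffs returned by `payoffs_against_columns` are not all equal, or if the oddments sum to zero, the three-by-three oddments solution should not be trusted.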
Similarly, by subtracting rows, one determines the oddments for P_2, namely, 38:14:8, and the optimum frequencies

    Y* = (19/30, 7/30, 4/30).

Furthermore, the value of the game is given by

    g = [(1)(6) + (1)(8) + (8)(4)]/10 = 23/5.

Note. Every solution obtained by the method of oddments must be tested. It may well be that the three-by-three game does not have a three-by-three solution, but, rather, a two-by-two solution. See Ref. 69.

Three-by-n Games. To find the solution of three-by-n games:
1. Look for a saddle point.
2. If none exists, examine the payoff matrix for dominance and reduce it accordingly.
3. If a three-by-n matrix remains, the problem is then to find the solution by the earlier methods, since every three-by-n matrix has solutions which are either three-by-three or two-by-two (or a saddle point). Solve the two-by-n subgames by the graphical method. If no two-by-two solutions exist, the solution must then be a three-by-three solution which can be obtained by successively trying each three-by-three subgame. See Refs. 1, 69, and 71.

Four-by-Four Games. For games which do not have a saddle point (i.e., mixed strategy games) and which, after removing rows and/or columns due to dominance, reduce to a four-by-four game, there is a method of oddments for obtaining the desired solution. For this method, see Williams (Ref. 69).

Other Solutions of Rectangular Games

There are a variety of other methods for solving rectangular games, a few of which are cited here.

Matrix Solution of Games. Let
A = (a_ij) be the m × n matrix of a game,
B = (b_ij) be any square submatrix of A of order r > 1,
J_r = (1, 1, ..., 1), a 1 × r matrix,
C^T = transpose of C, where C is any matrix,
adj B = adjoint of B,
X = (x_1, x_2, ..., x_m), x_i ≥ 0, Σ x_i = 1,
Y = (y_1, y_2, ..., y_n), y_j ≥ 0, Σ y_j = 1,
X̄ = a 1 × r matrix obtained from X by deleting those elements corresponding to the rows deleted from A to obtain B,
Ȳ = a 1 × r matrix obtained from Y by deleting those elements corresponding to the columns deleted from A to obtain B.

Solution.
1. Choose a square submatrix B of A of order r (≥ 2) and calculate

$$\bar{X} = \frac{J_r\,\operatorname{adj} B}{J_r\,(\operatorname{adj} B)\,J_r^T} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r)$$

and

$$\bar{Y} = \frac{J_r\,(\operatorname{adj} B)^T}{J_r\,(\operatorname{adj} B)\,J_r^T} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_r).$$

2. If some x̄_i < 0 or some ȳ_j < 0, reject the chosen B and try another.
3. If x̄_i ≥ 0 and ȳ_j ≥ 0 for all i, j = 1, 2, ..., r, calculate

$$g = \frac{|B|}{J_r\,(\operatorname{adj} B)\,J_r^T}$$

and construct X and Y from X̄ and Ȳ by adding zeros in the appropriate places. Check whether

$$\sum_{i=1}^{m} x_i a_{ij} \ge g \quad \text{for all } j,$$

and whether

$$\sum_{j=1}^{n} y_j a_{ij} \le g \quad \text{for all } i.$$

If one of the relations does not hold, try another B. If all relations hold, then X, Y, and g are the required solutions. See Refs. 1 and 71.

Iterative Method for Solving a Game. There is an approximate method of solving rectangular games which enables one to find the value of such games to any desired degree of accuracy and also to approximate the optimal strategies. See Ref. 71, Chap. 4, and Ref. 1, Chap. 18.

Solution of Rectangular Games by Linear Programming. It can be shown that the problem of solving an arbitrary rectangular game can be regarded as a special linear programming problem and, conversely, that many linear programming problems can be reduced to problems in game theory. Thus, the techniques for solving linear programming problems (e.g., the simplex technique), especially through the use of high-speed electronic computers, can be applied to the solution of game theory problems. See Ref. 71, Chap. 14.

Zero-Sum, n-Person Games

The theory of n-person games, n > 2, is not in an altogether satisfactory state. For an excellent exposition on the elements of zero-sum n-person games, see Ref. 71 and the original text on the subject, namely that of von Neumann and Morgenstern (Ref. 68).
A very brief discussion is to be found in Ref. 1.

8. DATA FOR MODEL TESTING

Introduction. The type of evidence one uses to test a model depends very much on the kind of test one has in mind. In testing a model one asks, "What are the possible ways in which a model can fail to represent reality adequately and hence lose some of its potential usefulness?" Following are four ways in which one may question the adequacy of a model.

1. The model may assert a dependence of the effectiveness of the system (the dependent variable) on one or more (independent) variables which, as a matter of fact, do not affect the system's effectiveness. That is, the model may fail by including variables which are not pertinent.
2. The model may fail to include a variable which does have a significant effect on the system's effectiveness.
3. The model may inaccurately express the actual relationship which exists between the measure of effectiveness and one or more of the pertinent independent variables.
4. Finally, even if the model is an accurate picture of reality in the sense of conforming to the foregoing three conditions, it may still fail to yield good results if the parameters contained in it are not evaluated properly.

In testing the model, begin by testing it as a whole, i.e., by determining the accuracy of its prospective or retrospective predictions of the system's effectiveness. If this procedure shows that the model is not adequate, further testing will be required to find out which of the four types of deficiencies mentioned here is present.

The design of the process of collecting data consists of the following parts: (1) definition (including measurement), (2) sampling (including experimental designs), (3) data reduction, (4) use of the data in the test, (5) examination of the result, and (6) possible redesign of the evidence.

Scientific Definitions.
Scientific defining consists of specifying the best conceivable (not necessarily obtainable) conditions under which, and procedures by which, values of the variables can be obtained. Concern with ideal (or optimum) observational conditions and procedures is quite important if one wants to know how good the results one eventually obtains are. Further, and more important, the ideal conditions and procedures act as a standard by means of which one can evaluate the attainable observational conditions and operations, determine their shortcomings, and make any necessary adjustments in the resultant data. For a detailed discussion of scientific defining, see Ref. 72.

The two most common types of quantitative variables are the enumerative and the metric. The enumerative variable requires counting for its evaluation whereas the metric variable requires measurement.

Scientific Definitions of Enumerative Variables. Two types of errors can arise in the counting operation, overenumeration and underenumeration. Overenumeration results either from counting the same unit more than once or from counting units which should not be counted at all. Underenumeration, on the other hand, results from the failure to count a unit which should be counted. Furthermore, these errors can occur because of a failure to match elements with consecutive integers (e.g., overenumeration because of skipping numbers and underenumeration through duplication of numbers).

It is desirable to design the best conceivable counting procedure, even if the design cannot be carried out in practice. This involves specifying the standard environment in which, and the standard operations by which, the count can ideally be made, as well as providing an explicit definition of the elements to be counted. Once this standard is specified, it will be possible to use it to evaluate alternative practically realizable counting procedures and to select the best of these.
The standard also provides a basis for estimating the error that is likely to occur in the practical counting procedure which is eventually used.

Scientific Definitions of Properties (Metric Variables). The idealized design of a procedure for measuring properties depends primarily on the type of property involved. Scientific definitions of properties involve specifying the following characteristics of the idealized measuring procedure:

1. Identification of the thing, event, or class of things or events which should be observed.
2. Specification of the environment in which the observations should be made.
3. Specification of the changes in the environment which should be made, if any, during the observation period.
4. Specification of the operations to be performed and the instruments and measure to be used by the observer.
5. Specification of the readings (data) to be made.
6. Specification of the analysis of the data.

The formal description of the measure to be used states what logical and mathematical operations one wants to be able to perform on the data to be obtained in evaluating a variable. For a complete discussion of the theory of measurement see Refs. 73-79. The scientific definition (observational standard) states how, ideally, one would go about collecting pertinent data. The operational specification of the data collection process states how one actually intends to collect and adjust the data. Errors can arise in each of these three stages of planning relative to testing the model.

Sampling. In evaluating variables, one is either involved in measuring the property of a single unit or in counting the members of, or measuring the properties of, a class of units (a population). The definition of a property of a single unit specifies the conditions under which the observation should be made. If these conditions can be met and observations can be made without error, only one observation is required.
But if the conditions are not met, observations are subject to error which can only be estimated if two or more observations are made. How many observations to make, and where, are sampling questions.

Since the standard conditions specified in the definition can seldom be met in practice, one must choose one of two courses: (1) an experimental design must be chosen which, by techniques such as the analysis of variance and covariance, makes it possible to assess the magnitude of the deviations and ascribe them to specific environmental factors, or (2) observations must be made on a subset (of the population) which makes it possible to draw inferences that are valid for the whole population with the least possible bias. The subject of sampling is concerned with the selection of appropriate subsets.

In the main, sampling can be described as the selection of items from a population. The "population" of objects, events, environments, and stimuli to be sampled should be specified in the definition of the variable being evaluated. The population represents all the possible data of the relevant kind that can be collected.

Evaluation of Samples and Sample Estimates. The decision which must be made in designing a sampling procedure is concerned with the method of drawing the sample and the method of making estimates about the population from the sample. If a prescribed method is carried out correctly, there are two opposing considerations: (1) the probability that the estimate made on the basis of the sample will actually deviate from the true population value by an amount greater than some amount x; (2) the cost of taking the sample. In the main, the probability of deviations will decrease with an increase in the sample size, but the cost of taking the sample will increase with an increase in sample size.

Types of Sampling Designs. In unrestricted random sampling every possible sample has the same chance of being chosen.
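The two opposing considerations named under Evaluation of Samples and Sample Estimates can be made concrete in a simple special case. The sketch below is illustrative only and rests on assumptions not made in the text: the estimate is a sample mean of n independent observations, its sampling distribution is taken as normal with a known standard deviation `sigma`, and the cost of sampling is linear in n. All function and parameter names are hypothetical.

```python
import math


def deviation_probability(n, sigma, x):
    """P(|sample mean - population mean| > x) for n independent observations.

    Assumes a normal sampling distribution with known standard deviation sigma,
    so the standardized deviation is z = x * sqrt(n) / sigma and the
    two-sided tail probability is erfc(z / sqrt(2)).
    """
    z = x * math.sqrt(n) / sigma
    return math.erfc(z / math.sqrt(2))


def sampling_cost(n, fixed_cost, cost_per_item):
    """Assumed linear cost of taking a sample of n items."""
    return fixed_cost + cost_per_item * n


def smallest_sample(sigma, x, max_probability, n_max=1_000_000):
    """Smallest n whose deviation probability does not exceed max_probability."""
    for n in range(1, n_max + 1):
        if deviation_probability(n, sigma, x) <= max_probability:
            return n
    return None
```

Tabulating `deviation_probability` and `sampling_cost` side by side for increasing n shows the first falling while the second rises, which is the balance the sampling design has to strike.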
Restricted random sampling comprises methods in which the possible samples do not all have an equal probability of being drawn. But in each case where random sampling is used scientifically, the probability of selecting any sample is known.

All the various schemes for sampling are based on very simple, practical considerations. These are:

1. Items of the population may fall into recognizable groups (e.g., in terms of location or dollar amounts on an invoice). If this is the case, it is reasonable to think in terms of sampling from these groups, because in general one reduces the variance of the estimates and (more important) one can be selective in the amount of sampling that is done in each group. Invoices with large dollar amounts are more important than ones with small dollar amounts; hence a larger sample of the more important items should be taken.
2. Items of a population often fall into clusters (e.g., a shipment shown on an invoice; people in a house, block, or town; items in a warehouse). If one looks at some item in a cluster, one might just as well look at the rest of the items. Hence the cluster becomes the basis of sampling, not the original items. The use of clusters may increase the variance of the estimates but greatly decrease the costs of gathering the sample (the usual economic balancing problem).
3. One does not have to plan completely in advance. One can let the sample information that comes in dictate how the next steps are to be taken.

The following is a general classification of the principal types of sampling designs:

I. Fixed sampling design. The sampling design is fixed and not subject to change in terms of sample data.
   A. Unrestricted random sampling. A random sample is selected from the whole population by either
      1. Simple random sampling. Assigning a different number to each element in the population and using random numbers to select the sample, or
      2. Systematic random sampling.
         Where a population is ordered, selecting a starting place at random and then selecting subsequent elements at a fixed interval from the first and subsequent selections.
      Tables of random numbers can be found in Refs. 10 and 80-82. Details on the generation of such numbers can be found in Ref. 83.
   B. Restricted random sampling. The population is divided into subgroups (and possibly sub-subgroups, etc.) and either some of these are selected and/or random samples from some or all of these are selected.
      1. Multistage random sampling. Random samples are drawn from subgroups which have themselves been selected (a) with equal probability, or (b) with probability proportionate to the relative size of the subgroup, or some other criterion.
      2. Stratified random sampling. A random sample is drawn from every subgroup of the population. The size of the sample from the subgroups may be (a) independent of the size of the subgroups (i.e., samples of equal size), (b) proportionate to the relative size of the subgroup, or (c) proportionate to the relative size of the subgroup and the dispersion of the elements within it (optimum allocation).
      3. Cluster sampling. A random sample of subgroups is selected, all elements of which are included in the final sample.
      4. Stratified cluster sampling. A combination of B2 and B3, where more than two stages of sampling are involved.
II. Sequential sampling. A small random sample is selected and analyzed, on the basis of which a decision is made as to whether or not to continue sampling and if so, how. The samples may be either
   A. In groups, as in double or multiple sampling, or
   B. Single items taken one at a time.

For details on sequential sampling see Refs. 84-87.

The aspect of sampling called experimental design usually refers to a sampling plan based on the variables in the model which is to be tested.
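Four of the fixed sampling designs in the classification above (simple random, systematic, stratified with proportionate allocation, and cluster sampling) can be sketched with a pseudo-random number generator standing in for a table of random numbers. This is only an illustrative sketch: the function names and the dictionary layout of strata and clusters are assumptions made here, and the population is assumed to be held in a list.

```python
import random


def simple_random_sample(population, k, rng):
    """I.A.1: every subset of size k is equally likely."""
    return rng.sample(population, k)


def systematic_sample(population, interval, rng):
    """I.A.2: random starting place, then every `interval`-th element."""
    start = rng.randrange(interval)
    return population[start::interval]


def stratified_sample(strata, total, rng):
    """I.B.2(b): a random sample from every stratum, sized in proportion
    to the stratum's share of the population (at least one item each)."""
    size = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        k = max(1, round(total * len(items) / size))
        sample[name] = rng.sample(items, min(k, len(items)))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """I.B.3: choose whole clusters at random and keep all their elements."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]
```

Passing an explicit `random.Random(seed)` as `rng` makes a selection reproducible, which helps when the sample has to be audited later. Optimum allocation, case (c) of stratified sampling, would additionally weight each stratum by an estimate of its dispersion.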
Instead of keeping everything fixed except one variable, it is possible to design data collection systems at optimum locations of some of the variables of the model. This sampling method assumes that the variables of the model can be manipulated in reality, or at least in a realistic model. For information on various types of experimental designs, see Refs. 88-91. For comprehensive surveys of contemporary sampling theory see Refs. 72 and 92-95.

Reduction of Data. The observations made on the sampled items or in a sample of situations provide the raw data on the basis of which variables are assigned values and hence provide the basis for testing all or part of the model. In many cases the data require collation, editing, coding, punching, etc. Discussion of these phases of data processing can be found in Ref. 72, Chap. X.

In general, the ultimate form to which the data must be transformed to be useful in the testing process will be either an estimate of the value of a parameter or an inferential "statistic" which describes a relationship between two or more variables. For example, in testing a lot-size model the cost variables must first be evaluated in order to compute the total cost "predicted" by the model. Once these predictions are obtained they are compared with observed values in order to derive a "statistic" which can be used to determine whether or not the model predicts well.

For a discussion of data reduction and the problems of estimation and obtaining estimates of the variability of estimates, see Ref. 1, Chap. 20. For statistical tests of the significance or nonsignificance of a variable, see Refs. 96-105.
These tests may make it possible to determine such matters as: (a) whether a variable should or should not be included, (b) whether the form of an analytic function is linear or some other type, (c) whether the form of a probability function is normal or some other type, (d) whether the model has failed to include a variable that ought to have been included.

For further discussion of procedures for testing the adequacy of models and the solutions derived from them, see Ref. 74, Chap. 20.

9. CONTROLLING THE SOLUTION

Introduction. Many, if not most, O.R. projects deal with management decisions that are recurrent. Hence the solution must be used over and over again. But the systems which are dealt with in O.R. are seldom stable. Their structure is subject to change. Relationships between the variables, or system parameters, which define the system and the value of the parameters themselves are usually subject to change. In such situations the relationships and parameters used in the decision rule must be adjusted for changes in the system as they occur. Costs may change, the distribution of demand may change in some or all of its characteristics, and the relationships between variables may change over time. Hence the values of the relationships and parameters should be periodically reevaluated and the assumptions involved in the model (from which the decision rule is derived) should be reexamined periodically. That is, the solution must be controlled lest it lose some of its effectiveness because of changes in the system.

Complete methodologies have not yet been developed for optimizing control procedures. Enough is known, however, to design procedures which are more likely to lead to success than either leaving this phase of the project to chance or relying on others (management or operating personnel) to take care of them. For a complete discussion, see Ref. 1, Chap. 21.

Controlling the Solution. The effectiveness of a solution in an O.R.
problem may be reduced by changes in either values of the parameters of the system or the relationships between them, or both. A previously insignificant parameter may become significant, or, conversely, a previously significant parameter may become insignificant. Changes in the values and functional relations of the parameters which remain significant may also affect the effectiveness of the solution.

Not every change in a parameter or relationship is significant. In general terms, a change is significant if (1) adjustment of the solution for the change results in an improvement in effectiveness and (2) the cost of making the adjustment and carrying it out does not offset the improvement in effectiveness.

Design of a control system, then, consists of three steps: (1) listing the variables, parameters, and relationships that either are included in the solution or should be if their values were to change; (2) development of a procedure for detecting significant changes in each of the parameters and relationships listed; (3) specification of action to be taken or adjustments to be made in the solution when a significant change occurs. The last two steps are interrelated because determination of the significance of a change (step 2) depends on the cost of making the adjustment specified in step (3).

Control of Parameters. The first step in designing a control procedure involves listing all the variables and relationships which, if they were to change in value, might affect the effectiveness of the solution. The parameters which are listed should be classified into two types:

1. Variables whose values during the period covered by a decision can be known in advance, such as the number of models in a line, the number of work days in an accounting period, and the price for which an item is to be sold.
Control of such measures consists either (a) of establishing communication lines between those who know these values and those who use the decision rules or (b) of providing the latter with source material (such as a calendar in the case of the number of work days per accounting period).
2. Measures whose values cannot be known in advance, such as number of units sold, number of hours worked, and arrival rate of trucks. These values must be estimated in advance.

Essential to the control of any measure is the determination of whether its true value or one or more of the characteristics of its estimate have changed. This determination consists of testing the hypothesis that no change has occurred in the variable or the characteristics of its estimate (which are themselves variables).

Errors in Detecting Changes. Determination of whether or not such a change has occurred is subject to two types of error: (I) asserting that a change has occurred when it has not and (II) asserting that a change has not occurred when it has. An understanding of the two types of error is essential to comprehend what is involved in controlling a variable. See Ref. 1, Chap. 21.

Detection of and Adjustment for Significant Changes. Ideally, in the design of a control system for a variable, six interdependent decisions should be made if possible. (In some situations there may be no choice with regard to one or more of these decisions.) The decisions are as follows:

1. The frequency of (i.e., period between) control checks.
2. The number of observations per control check, if more than one is possible.
3. The way items should be selected for observation (i.e., the sampling design), if more than one observation is specified.
4. The statistical testing procedure to be used to determine whether or not a value has changed.
5. The specific decision rule based on the test.
6. The action to be taken, if the test indicates that a parameter's value has changed.

Costs.
Again, ideally, these decisions should be made in such a way as to minimize the sum of the following costs:

1. The cost of taking the observations.
2. The cost of performing the test.
3. The expected cost of a type I error (i.e., the cost of changing a value when it is not warranted).
4. The expected cost of type II errors (i.e., the cost of not changing a value when it is warranted).

Unfortunately, at the present time the six decisions listed cannot be made in such a way as to assure minimization of the sum of the four costs. The design of an optimizing procedure can be specified in general terms; i.e., a model can be constructed which expresses the total expected cost as an abstract (but not as a concrete) function of the six decisions. In addition, some of the expressions which would appear in the model cannot be evaluated. In most situations, for example, the expected costs associated with type I and type II errors cannot be determined. For further reading, see Refs. 106-109. Further details on methods of controlling parameters are given in Refs. 110-118.

Controlling Relationships. Every probability distribution asserts a relationship between the probability of an event and the values of other variables. In the case of distributions which appear in the model and solution, as, for example, the distribution of demand, the parameters which define the distribution must be controlled (e.g., the mean and variance) as well as the form of the distribution (e.g., normal or Poisson). Both aspects of the distribution should be subjected to control.

There are no "standard" procedures for controlling the form of a distribution. Such control may be obtained by periodically testing the "goodness of fit" in the manner given by standard statistics texts. The frequency with which such tests should be conducted depends on the rate at which data are generated. The visual plotting of data, as they become available, can frequently indicate when a check should be made.
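A periodic check on the form of a demand distribution might look like the following sketch. It uses the chi-square goodness-of-fit statistic, which is one of the tests found in standard statistics texts; the source does not prescribe a particular test, so the choice is an assumption made here, as are the function names and the convention of pooling the upper tail into the last cell.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def chi_square_poisson(observed, mean):
    """Chi-square statistic for observed demand counts against a Poisson form.

    observed[k] is the number of periods in which demand was k, except for
    the last cell, which pools all periods with demand >= len(observed) - 1.
    """
    n = sum(observed)
    last = len(observed) - 1
    probs = [poisson_pmf(k, mean) for k in range(last)]
    probs.append(1.0 - sum(probs))  # pooled upper tail
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The statistic is then compared with a tabulated chi-square critical value whose degrees of freedom equal the number of cells, less one, less the number of parameters estimated from the same data. A value above the critical level suggests that the Poisson form, or the mean used, no longer describes the demand.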
Examination of these charts can provide clues to changes in the parameters of the distribution as well as to the form.

The control of relationships which do not take the form of probability distributions also involves control over the form of the function which relates the variables and the values of the variables.

Every O.R. project has unique characteristics which create unique control problems but which also offer challenging opportunities for the development of unusual control procedures. There is a good deal of room for scientific creativity in this phase of the research. For a full discussion and illustration of the development of control procedures, see Ref. 1, Chap. 21.

10. IMPLEMENTATION

Concern of the O.R. Team. Once the solution has been derived and tested, it is ready to be put to work. Conversion of the solution into operation should be of direct concern to the research team for two reasons:

1. No matter how much care has been taken in deriving a decision rule and testing it, shortcomings may still appear when it is put into operation or ways of improving the solution may become apparent. If adjustment of the decision rule to take care of unforeseen operating problems is left in the hands of those who do not understand how it was derived, the adjustment may seriously reduce its effectiveness. Operating personnel may, for example, see no harm in making what appears to them to be a slight change, but such a change may be critical.
2. Carrying out the solution may not be as obvious a procedure in the context of complex operations as it initially appears to be to the researchers. The solution must be translated into a procedure that is workable if its potential is to be fully realized. The procedure must be as accurate a translation of the solution as is practically feasible and only the researchers can minimize the loss in the solution's effectiveness that is incurred in this translation.
The nature of the implementation problem depends on whether the solution pertains to a one-time or repetitive decision. In the case of one-time decisions the problem is simpler but by no means disappears.

Translation of the solution into the operating procedure involves answering three questions and proceeding accordingly. The three questions are: (1) Who should do what? (2) When? (3) What information and facilities are required to do it? On the basis of the answers to these questions the operating procedure can be designed and any necessary training and transition can be planned and executed.

Implementation of a solution involves people taking action. These people must be identified, and the required action must be specified. The details cannot be enumerated without a thorough knowledge of the operations and the division of responsibility in the organization under study. The analysis of the organization provides much of the needed information and the rest should be provided by management and operating people working with the research team. This and the other phases of implementation require continuous cooperation and communication among management, operators, and researchers.

Each person who is given responsibility for initiating action in carrying out the solution or using the decision rule should be instructed as to when they are to take action.

The tools required to do the job should be made available to those who need them, and these people should be trained in their use. The tools should not be too complex for the operating personnel to use. It may be necessary, for example, to convert even a simple equation into a nomograph or tables. In some cases the tools may require simplification even if such simplification results in a loss of some of the original solution's power.

The solution or decision rule is generally used by personnel whose mathematical sophistication is less than desired.
Consequently, if one wants to assure use of the recommended decision rules, one must frequently simplify them before handing them over to executives and operating personnel. In many cases this means that one must either translate elegant solutions into approximations that are easy to use or sidestep the elegance and move directly to a "quick-and-dirty" decision rule.

It should be realized that in one sense almost every solution in O.R. is an approximation and is "quick and dirty" to some degree. This follows from the fact that in constructing every model some simplifying assumptions are made. Reality is too difficult to represent in all its complexity. These simplifying assumptions reduce the generality of the model and solutions derived from it. But this is only a polite way of saying that quickness and dirtiness are involved. It is well for the operations researcher to realize that an approximate solution which is used may be a great deal better than a more exact solution which is not.

For further discussion of the problems of implementing the solution, see Ref. 1, Chap. 21.

REFERENCES

1. C. W. Churchman, R. L. Ackoff, and E. L. Arnoff, Introduction to Operations Research, Wiley, New York, 1957.
2. F. N. Trefethen, "A History of Operations Research," in Operations Research for Management, J. F. McCloskey and F. N. Trefethen (Editors), The Johns Hopkins Press, Baltimore, Md., 1954.
3. M. L. Hurni, Observations on Operations Research, J. Opns. Research Soc. Am., 2 [3], 234-248 (1954).
4. H. F. Smiddy and L. Naum, Evolution of a "Science of Managing" in America, Mgmt. Sci., 1 [1], 1-31 (1954).
5. J. H. Curtiss, "Sampling Methods Applied to Differential and Difference Equations," in Seminar on Scientific Computation, International Business Machines Corporation, New York, 87-109, Nov. 1949.
6. I. S. Sokolnikoff and E. S. Sokolnikoff, Higher Mathematics for Engineers and Physicists, McGraw-Hill, New York, 1941.
7. R. E.
Bellman, Dynamic Programming, Princeton University Press, Princeton, N. J., 1957.
8. C. C. Holt, F. Modigliani, and H. A. Simon, A linear decision rule for production and employment scheduling, Mgmt. Sci., 2 [1], 1-30 (1955).
9. C. C. Holt and H. A. Simon, Optimal decision rules for production and inventory control, Proceedings of the Conference on Operations Research in Production and Inventory Control, Case Institute of Technology, Cleveland, O., 1954.
10. The RAND Corporation, A Million Random Digits, The Free Press, Glencoe, Ill., 1955.
11. H. Kahn, Applications of Monte Carlo, Project RAND, RM-1237-AEC, RAND Corporation, Santa Monica, Calif., April 19, 1954.
12. G. W. King, The Monte Carlo method as a natural mode of expression in Operations Research, J. Opns. Research Soc. Am., 1 [2], 46-51 (1953).
13. U. S. Department of Commerce, National Bureau of Standards, Monte Carlo Method, Applied Mathematics Seminar 12, June 11, 1951.
14. J. B. Crockett and H. Chernoff, Gradient methods of maximization, Pacific J. Math., 5 (1955).
15. B. Klein, Direct use of extremal principles in solving certain optimizing problems involving inequalities, J. Opns. Research Soc. Am., 3 [2], 168-175 (1955).
16. H. W. Kuhn and A. W. Tucker, "Nonlinear Programming," in Second Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman (Editor), University of California Press, Berkeley, Calif., 481-492, 1951.
17. K. Arrow, T. Harris, and J. Marschak, Optimal inventory policy, Econometrica, 19 [3], 250-272 (1951).
18. C. Eisenhart, Some Inventory Problems, National Bureau of Standards, Techniques of Statistical Inference, A2-2C, Lecture 1, Jan. 6, 1948 (hectographed notes).
19. C. B. Tompkins, Lead time and optimal allowances-an extreme example, Conference on Mathematical Problems in Logistics, George Washington University, Appendix I to Quarterly Progress Rept. No. 1, Dec. 1949-Feb. 1950.
20. T. M.
Whitin, The Theory of Inventory Management, 2nd edition, Princeton University Press, Princeton, N. J., 1957.
21. A. Dvoretzky, J. Kiefer, and J. Wolfowitz, On the optimal character of the (s, S) policy in inventory theory, Econometrica, 21 [4], 586-596 (1953).
22. A. Dvoretzky, J. Kiefer, and J. Wolfowitz, The inventory problem, Econometrica, 20 [2], 187-222 (1952); [3], 450-466 (1952).
23. E. B. Berman and A. J. Clark, An optimal inventory policy for a military organization, RAND Rept. D-647, RAND Corporation, Santa Monica, Calif., March 30, 1955.
24. H. A. Simon, On the application of servomechanism theory in the study of production control, Econometrica, 20 [2], 247-268 (1952).
25. R. Bellman, I. Glicksberg, and O. Gross, On the optimal inventory equation, Mgmt. Sci., 2 [1], 83-104 (1955).
26. H. J. Vassian, Application of discrete variable servo theory to inventory control, J. Opns. Research Soc. Am., 3 [3], 272-282 (1955).
27. A. Charnes, W. W. Cooper, and D. Farr, Linear programming and profit preference scheduling for a manufacturing firm, J. Opns. Research Soc. Am., 1 [3], 114-129 (1953).
28. G. Dannerstedt, Production scheduling for an arbitrary number of periods given the sales forecast in the form of a probability distribution, J. Opns. Research Soc. Am., 3 [3], 300-318 (1955).
29. R. Bellman, Some applications of the theory of dynamic programming, J. Opns. Research Soc. Am., 2 [4], 275-288 (1954).
30. R. Bellman, Some problems in the theory of dynamic programming, Econometrica, 22 [1], 37-48 (1954).
31. R. Bellman, The theory of dynamic programming, Bull. Am. Math. Soc. [6], 503-516 (1954).
32. R. Bellman, I. Glicksberg, and O. Gross, The theory of dynamic programming as applied to a smoothing problem, J. Soc. Ind. Appl. Math., 2 [2], 82-88 (1954).
33. T. M. Whitin, Inventory control and price theory, Mgmt. Sci., 2, 61-68 (1955).
34. T. M. Whitin, Inventory control research: A survey, Mgmt. Sci., 1, 32-40 (1954).
35. H. A.
Simon and C. C. Holt, The control of inventory and production rates: A survey, J. Opns. Research Soc. Am., 2 [3], 289-301 (1954).
36. A. Charnes, W. W. Cooper, and A. Henderson, An Introduction to Linear Programming, Wiley, New York, 1953.
37. G. H. Symonds, Linear Programming: The Solution of Refinery Problems, Esso Standard Oil Co., New York, 1955.
37a. A. Orden, Survey of research on mathematical solutions of programming problems, Mgmt. Sci., 1, 170-172 (1955).
38. G. B. Dantzig, Ref. 40, Chaps. I, II, XX, XXI, and XXIII.
39. A. Charnes and W. W. Cooper, The stepping stone method of explaining linear programming calculations in transportation problems, Mgmt. Sci., 1 [1], 49-69 (1954).
40. T. C. Koopmans (Editor), Activity Analysis of Production and Allocation, Cowles Commission Monograph No. 13, Wiley, New York, 1951.
41. A. Henderson and R. Schlaifer, Mathematical programming, Harvard Business Rev., 32, 73-100 (May-June 1954).
42. T. L. Saaty, Resume of queuing theory, Opns. Research, 5, 161-200 (1957).
43. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New York, 1950.
44. E. Brockmeyer, H. L. Holstrom, and Arne Jensen, The life and works of A. K. Erlang, Trans. Danish Acad. Tech. Sci., 2, Copenhagen, 1948.
45. Raymond, Haller and Brown, Inc., Queuing Theory Applied to Military Communication Systems, State College, Pa., 1956.
46. A. Cobham, Priority assignment in waiting line problems, J. Opns. Research Soc. Am., 2, 70-76 (1954); also 3, 547 (1955).
47. J. Y. Barry, A priority queuing problem, Opns. Research, 4, 385-386 (1956).
48. E. Koenigsberg, Queuing with special service, Opns. Research, 4, 213-220 (1956).
49. D. Y. Barrer, A waiting line problem characterized by impatient customers and indifferent clerks, J. Opns. Research Soc. Am., 3, 360-361 (1955).
50. R. Kronig, On time losses in machinery undergoing interruptions, Physica, 10, 215-224 (1943).
51. A. B. Clarke, The time-dependent waiting line problem, Univ. Mich. Engr.
Research Inst., Rept. No. M720-1 R 39, 1953.
52. A. B. Clarke, A waiting line process of Markov type, Ann. Math. Stat., 27 [2], 452-459 (1956).
53. T. Homma, On a certain queuing process, Rept. Stat. Appl. Research, 4 [1] (1955).
54. R. R. P. Jackson, Queuing systems with phase type service, Opnal. Research Quart., 5, 109-120 (1954).
55. S. M. Johnson, Optimal two- and three-stage production schedules with setup times included, Nav. Research Log. Quart., 1 [1], 61-68 (1954).
56. R. Bellman, Mathematical aspects of scheduling theory, RAND Rept. P-651, RAND Corporation, Santa Monica, Calif., April 11, 1955.
57. S. B. Akers, Jr., and J. Friedman, A non-numerical approach to production scheduling problems, J. Opns. Research Soc. Am., 3, 429-442 (1955).
58. A. W. Brown, A note on the use of a Pearson type III function in renewal theory, Ann. Math. Stat., 11, 448-453 (1940).
59. N. R. Campbell, The replacement of perishable members of an operating system, J. Roy. Stat. Soc., B7, 110-130 (1941).
60. W. Feller, On the integral equation of renewal theory, Ann. Math. Stat., 13, 243-267 (1941).
61. A. J. Lotka, A contribution to the theory of self-renewing aggregates, with special reference to industrial replacement, Ann. Math. Stat., 10, 1-25 (1939).
62. A. J. Lotka, The Present Status of Renewal Theory, Waverly Press, Baltimore, Md., 1940.
63. B. Epstein and M. Sobel, Life Testing. I, J. Am. Stat. Assoc., 48, 486-502 (1953).
64. B. Epstein and M. Sobel, Some theorems relevant to life testing from an exponential distribution, Ann. Math. Stat., 25, 373-381 (1954).
65. L. Goodman, Methods of measuring useful life of equipment under operational conditions, J. Am. Stat. Assoc., 48, 503-530 (1953).
66. Tested Approaches to Capital Equipment Replacement, Special Rept. No. 1, American Management Association, New York, 1954.
67. J. von Neumann, Zur Theorie der Gesellschaftsspiele, Math. Ann., 100, 295-320 (1928).
68. J. von Neumann and O.
Morgenstern, Theory of Games and Economic Behavior, 3rd edition, Princeton University Press, Princeton, N. J., 1953.
69. J. D. Williams, The Compleat Strategyst, McGraw-Hill, New York, 1954.
70. L. Friedman, Competitive Bidding Strategies, Ph.D. Dissertation, Case Institute of Technology, Cleveland, O., 1957.
71. J. C. C. McKinsey, Introduction to the Theory of Games, McGraw-Hill, New York, 1952.
72. R. L. Ackoff, The Design of Social Research, University of Chicago Press, Chicago, Ill., 1953.
73. N. R. Campbell, An Account of the Principles of Measurements and Calculations, Longmans, Green and Co., New York, 1928.
74. C. W. Churchman, "A Materialist Theory of Measurement," in Philosophy for the Future, R. W. Sellars, V. J. McGill, and M. Farber (Editors), Macmillan, New York, 1949.
75. E. Nagel, Measurement, Erkenntnis, 2, 313-333 (1931).
76. E. Nagel, On the Logic of Measurement, Thesis, Columbia University, New York, 1930.
77. F. F. Stephan, "Mathematics, Measurement, and Psychophysics," in Handbook of Experimental Psychology, S. S. Stevens (Editor), Wiley, New York, 1951.
78. S. S. Stevens, On the problem of scales for the measurement of psychological magnitudes, J. Univ. Sci., 9, 94-99 (1939).
79. S. S. Stevens, On the theory of scales of measurement, Science, 103, 677-680 (1946).
80. H. B. Horton, Random Decimal Digits, Interstate Commerce Commission, Washington, D. C., 1949.
81. M. G. Kendall, "Tables of Random Sampling Numbers," in Tracts for Computers, No. 24, Cambridge University Press, Cambridge, England, 1940.
82. L. H. C. Tippett, "Tables of Random Sampling Numbers," in Tracts for Computers, No. 15, Cambridge University Press, Cambridge, England, 1927.
83. M. G. Kendall and B. B. Smith, Randomness and random sampling of numbers, J. Roy. Stat. Soc., 101, 147-166 (1938).
84. C. W. Churchman, Statistical Manual: Methods of Making Experimental Inferences, Pittman-Dunn Laboratory, Frankford Arsenal, Philadelphia, Pa., 1951.
85.
Statistical Research Group, Sequential Analysis of Statistical Data: Application, Columbia University Press, New York, 1946.
86. A. Wald, Foundations of a general theory of sequential decision functions, Econometrica, 15, 279-313 (1947).
87. A. Wald, Sequential Analysis, Wiley, New York, 1947.
88. W. C. Cochran and G. M. Cox, Experimental Designs, Wiley, New York, 1950.
89. W. T. Federer, Experimental Design, Macmillan, New York, 1955.
90. R. A. Fisher, The Design of Experiments, Oliver and Boyd, London, 1949.
91. H. B. Mann, Analysis and Design of Experiments, Dover, New York, 1949.
92. W. E. Deming, Some Theory of Sampling, Wiley, New York, 1950.
93. M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sampling Survey Methods and Theory, Wiley, New York, 1953.
94. F. F. Stephan, History of the uses of modern sampling, J. Am. Stat. Assoc., 43, 12-39 (1948).
95. F. Yates, Sampling Methods for Censuses and Surveys, Griffin, London, 1949.
96. W. J. Dixon and F. J. Massey, Jr., Introduction to Statistical Analysis, McGraw-Hill, New York, 1951.
97. R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, London, 1948.
98. A. Hald, Statistical Theory with Engineering Applications, Wiley, New York, 1952.
99. P. G. Hoel, Introduction to Mathematical Statistics, 2nd edition, Wiley, New York, 1954.
100. P. O. Johnson, Statistical Methods in Research, Prentice-Hall, New York, 1949.
101. F. J. Massey, Jr., The Kolmogorov-Smirnov test for goodness of fit, J. Am. Stat. Assoc., 46, 68-78 (1951).
102. E. B. Mode, The Elements of Statistics, Prentice-Hall, New York, 1941.
103. G. W. Snedecor, Statistical Methods, 4th edition, Iowa State College Press, Ames, Ia., 1946.
104. H. M. Walker, Elementary Statistical Methods, Holt, New York, 1943.
105. S. S. Wilks, Elementary Statistical Analysis, Princeton University Press, Princeton, N. J., 1949.
106. C. W. Churchman, Theory of Experimental Inference, Macmillan, New York, 1948.
107. J. Neyman and E. S.
Pearson, On the problem of the most efficient tests of statistical hypotheses, Phil. Trans., A231, 289-337 (1933).
108. A. Wald, "On the Principles of Statistical Inference," in Notre Dame Mathematical Lectures, No. 1, Notre Dame University, Notre Dame, Ind., 1942.
109. A. Wald, Statistical decision functions, Ann. Math. Stat., 20, 165-205 (1949).
110. A. J. Duncan, Quality Control and Industrial Statistics, Irwin, Chicago, Ill., 1952.
111. E. L. Grant, Statistical Quality Control, 2nd edition, McGraw-Hill, New York, 1952.
112. J. M. Juran (Editor), Quality Control Handbook, McGraw-Hill, New York, 1946.
113. C. W. Kennedy, Quality Control Methods, Prentice-Hall, New York, 1948.
114. S. B. Littauer, Social aspects of scientific method in industrial production, Phil. Sci., 21, 93-100 (1954).
115. S. B. Littauer, Technological stability in industrial operations, Trans. N. Y. Acad. Sci., ser. II, 13 [2], 66-72 (1950).
116. P. Peach, An Introduction to Industrial Statistics and Quality Control, 2nd edition, Edwards and Broughton, Raleigh, N. C., 1947.
117. J. G. Rutherford, Quality Control in Industry - Methods and Systems, Pitman, New York, 1948.
118. W. A. Shewhart, Statistical Methods from the Viewpoint of Quality Control, U. S. Department of Agriculture, Washington, D. C., 1939.

D. INFORMATION THEORY AND TRANSMISSION

16. Information Theory, by Peter Elias
17. Smoothing and Filtering, by Pierre Mertz
18. Data Transmission, by Pierre Mertz

Chapter 16

Information Theory

Peter Elias

1. Introduction
2. General Definitions
3. Simple Discrete Sources
4. More Complicated Discrete Sources
5. Discrete Noiseless Channels
6. Discrete Noisy Channels I. Distribution of Information
7. Discrete Noisy Channels II. Channel Capacity and Interpretations
8. The Continuous Case
References

1. INTRODUCTION

Basis of Information Theory.
As used here, information theory is a body of results based on a particular quantitative definition of amount of information. This definition has a firm claim to unique importance in connection with the engineering questions which arise in systems which transmit and store information. It has proved interesting and sometimes useful in other fields (Refs. 1-4). However, other definitions have also been proposed (Ref. 5) and one of them has a long and useful history in statistics (Ref. 6). Caution is therefore needed in applying this definition to a situation in which the theorems which are its main justification in transmission and storage problems do not apply.

Communication Theory. Information theory is a subdivision of a broader field, the statistical theory of communication, which includes all the probabilistic analysis of communications problems. This broad field includes, in addition to information theory, the analysis of random noise (Ref. 7), work on optimum linear filtering and prediction (Ref. 8; see also Chap. 17), statistical analysis of signal detection (Refs. 9, 10), and many other applications of probabilistic ideas which make no use of an information measure.

Note on Terminology. Some authors, particularly in England, use information theory in a very broad sense, to include theories of scientific method and of statistical inference along with communications problems (Ref. 11). They then use "communication theory," or "mathematical theory of communication," or "theory of selective information," to denote what is here defined as information theory.

Mathematical Character. Information theory is essentially a branch of mathematics. Although the language has a physical ring, the words information source, channel, coder, etc., are mathematical concepts physically inspired. The theory can be presented as a formal series of definitions, theorems, and comments.
However, its relevance to a given problem is then not very clear. Section 2 provides contextual definitions and qualitative results; the later sections are more formal.

2. GENERAL DEFINITIONS

A Communications System

Figure 1 shows the model of a communications system which is used in information theory. At the transmitter the source produces an output that is coded and fed into the channel. The channel output may be identical to the channel input, or it may be altered by noise or distortion. At the receiver the channel output is decoded and used. The model derives from communications, but it is applicable to the communications aspects of other problems.

FIG. 1. The model of a communications system which is used in information theory. (Diagram labels: message, coded signal, noise, received signal, decoded message.)

Examples. The storage of a digital computer is a channel, possibly noisy, with input and output separated by time. A control system may be a channel with electrical input and mechanical output.

The Information Source

For purposes of the theory, the source in a given analysis is the point at which information enters the part of the system under consideration. The source may not actually generate information, but may merely store or relay it. Examples: a stack of telegrams waiting to be transmitted; a reel of recorded magnetic tape. Whether the source under consideration is a true generator of information or merely a storage point is not of concern.

Controlled and Uncontrolled Sources. A controlled source is one which generates information at a rate controllable by the transmitter. Examples. The stack of telegrams being read by a telegraph operator is a controlled source; so is a speaker who may be slowed down by his audience when he speaks too rapidly for them to take notes. An uncontrolled source is one which produces information at a rate determined internally, which cannot be adjusted to the coding and transmission facilities available.

Segmentation.
The output of a source is a sequence of symbols. It is convenient to break this sequence into segments at a number of different levels. This process is called segmentation by the linguists.

Example. The output of a teletype system is a sequence of binary selections, "Mark" and "Space." If these are denoted by "0" and "1," then 0 and 1 are called the elements of the representation. The group of consecutive elements which represents a single letter, number, or mark is called a character or letter of the representation, and the set of all possible characters or letters is called the alphabet. The alphabet in a teletype system may be thirty-two characters in number, with each character a group of five elements.

Words and Messages. In many alphabets one of the characters is called the space, and given special significance. Sequences of characters occurring between successive spaces are called words. A sequence of words which is more or less independent of the preceding and succeeding source output is called a message. Example. A single telegram might be called a message. If successive source symbols are highly correlated the whole (possibly infinite) source output is the message.

Other Levels. The above list of levels is not exhaustive, nor are all these levels relevant to the description of a particular source. Examples. In written English the additional levels of syllable, phrase, clause, sentence, and paragraph are recognized, but the element level is not used. In the analysis of spoken language a set of elements, the distinctive features, has been introduced (Ref. 12). In a binary digital computer, element and character coincide as being the symbols 0 and 1 used to represent the binary digits. In a binary-coded-decimal machine the elements are the same, but the characters are groups of four, five, or six elements representing one decimal digit or one alphanumeric symbol.
The group of digits which fits into one storage register is called a word, and one line of coding, consisting of several related words, which is called an instruction in computer terminology, might be called a message.

Choice of Levels. The choice of levels of segmentation is not standardized outside linguistics, nor is there any agreement on terminology. When no particular type or level of segmentation is implied, the output of a source will be a sequence of symbols, selected from some finite alphabet. When two levels are needed at once, as in discussion of word-by-word translation of a sequence of letters, word will be used for the higher level and symbol or letter for the lower level. In mathematical discussion, a segment of a sequence will be called a message only if it is strictly statistically independent of preceding segments.

Representations and Codes

If a source makes a series of binary selections, its output may be represented, for example, by a sequence of A's and B's or by a sequence of 0's and 1's. These are two representations of the source output. If the first is taken as primary, then the second is called a coded version of the first.

Codebooks. To get from one representation to the other requires a dictionary, called an encoding codebook. This has two entries:

(1)   $A \to 0, \qquad B \to 1$

To get back from the second representation to the first requires a decoding codebook with the entries

(2)   $0 \to A, \qquad 1 \to B$

Codes. A code is a transformation, which is defined by an encoding codebook or an equivalent set of rules. If a coded message is to be decoded, the inverse transformation as defined by the decoding codebook is also required. In the example of eqs.
(1) and (2), the transformation is one-to-one on each symbol and defines its own inverse, so that only one codebook is required and it may be written with double-headed arrows:

(3)   $A \leftrightarrow 0, \qquad B \leftrightarrow 1$

In representations with large alphabets the two codebooks may still be useful even if only one is necessary. Example. It is convenient to find a telephone number in a standard directory by looking up a name, but the inverse operation is tedious, though unambiguous.

Transliteration. A code is called a transliteration if each input symbol is transformed directly into one output symbol, so that symbol-by-symbol coding is possible and the number of entries in the codebooks is equal to the size of the alphabet. The code of eq. (3) has this property, but not all one-to-one codes do.

Example. The representation of the binary source output as a sequence of A's and B's may be coded into a representation as a sequence of 0's and 1's by the codebook

(4)   $AA \to 0, \qquad AB \to 10, \qquad BA \to 110, \qquad BB \to 111$

To each input sequence corresponds one output sequence, which may be decoded into its original form by reversing the arrows in the codebook if the time origin of the coded version is known. The coded output is a one-to-one transformation on the input sequences, but it is not a transliteration of the symbols A and B. Choose a different level of segmentation (word, rather than character) and let the first representation consist of sequences of the four words AA, AB, BA, BB. Then if the four words 0, 10, 110, and 111 are taken as the dictionary for the second representation, the code becomes a transliteration.

Significant Codes. In assigning code numbers to objects (Example: the items in a catalog) a distinction is made between significant and nonsignificant codes. Each code number may be considered as a coded version of a description of the item in English. A significant code is one in which transliteration is possible at some level of segmentation below that of the entire code number.
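Both the word-level reading of eq. (4) and the definition of a significant code turn on transliteration at a chosen level of segmentation. The Python sketch below illustrates this for eq. (4) only. It is not part of the original text: the function names `encode` and `decode` are hypothetical, and the decoder assumes, as the text requires, that the time origin of the coded sequence is known.

```python
# Codebook of eq. (4): each two-letter word maps to one binary codeword.
ENCODE = {"AA": "0", "AB": "10", "BA": "110", "BB": "111"}
DECODE = {codeword: word for word, codeword in ENCODE.items()}


def encode(source):
    """Segment the source into two-letter words and transliterate each word."""
    if len(source) % 2:
        raise ValueError("source length must be even to form pairs")
    return "".join(ENCODE[source[i:i + 2]] for i in range(0, len(source), 2))


def decode(coded):
    """Read binary symbols from the known time origin until a codeword is complete."""
    words, pending = [], ""
    for symbol in coded:
        pending += symbol
        if pending in DECODE:          # a complete codeword has been received
            words.append(DECODE[pending])
            pending = ""
    if pending:
        raise ValueError("coded sequence ends in the middle of a codeword")
    return "".join(words)


message = "AAABBABB"
coded = encode(message)
print(coded)           # 010110111
print(decode(coded))   # AAABBABB
```

Treating each pair as one word turns the encoder into a plain table lookup, which is the sense in which the code becomes a transliteration at the word level.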
Example. The code number assigned to a garment in a catalog may consist of a sequence of groups of decimal digits, the first group denoting type of garment, the second size, the third color. Each group may be independently decoded into English words, so that transliteration is possible if each group is considered as a word in the coded version of the message. A code that cannot be decoded piece by piece is called nonsignificant. Example: a code assigning simple serial numbers to items in a catalog.

Coding and Decoding Delay. Note that coding and decoding delays arise when the coding is not a transliteration. Example. In the codebook of eq. (4), after the source has selected its first symbol, the coder must wait until the second symbol has also been selected before it can encode the pair. In decoding, after a 1 has been received, it is necessary to wait for one or two more input symbols before the appropriate output pair can be selected from the set AB, BA, BB.

Representation and Selection. The output of a coder is called a representation of its input if it is obtained from the input by a one-to-one transformation with at most a finite encoding delay. Note that this definition agrees with the colloquial meaning of representation for a significant code, in which each segment of the output represents a corresponding segment of the input. A nonsignificant code requires a different interpretation. The code number corresponding to a telegraphic greeting is not a modified version of the message, but an instruction as to where in the decoding codebook the message will be found. The coded version does not represent the message but selects it. This concept is basic for information theory (Ref. 13).

Coders. The coder in Fig. 1 matches the source to the channel. The first requirement on the coder is that it match alphabets.
It must transform sequences of symbols from the source alphabet into sequences of symbols in the alphabet which the channel will accept. This requirement does not specify the coder completely. The two codebooks of eqs. (3) and (4) both transform A's and B's into 0's and 1's, but they describe different coders.

Statistical Matching

For economy of transmission facilities the coder may be designed to minimize the number of channel symbols required, on the average, per source symbol. This requires knowledge of the statistics of the source.

Example. The codebook of eq. (4) is more complicated than the codebook of eq. (3) and introduces delay. However, if A's occur 99 per cent of the time and B's only 1 per cent, and successive symbols are independent, the pairs AA, AB, BA, and BB occur with probabilities 0.9801, 0.0099, 0.0099, and 0.0001 and take 1, 2, 3, and 3 channel symbols respectively, an average of 1.0299 channel symbols per pair. The output coded via eq. (4) will therefore require only about 0.515 channel symbol per source symbol, whereas the output coded via eq. (3) will require one channel symbol per source symbol, and so will take nearly twice as long to transmit. Economical coding will be discussed in Sect. 3.

Channels

A discrete channel, like a coder, accepts a sequence of symbols selected from its input alphabet and produces a related sequence of symbols selected from its output alphabet. The precise boundaries of the channel in a given system are a matter of choice. Example. A teletype system may be analyzed by using as a channel the medium of transmission of the electric pulses. A second analysis of the same system might treat the channel as running from input keyboard to output printer.

An essential difference between a channel and a coder is that the channel output may not be an accurate representation of its input: some information about the message may get lost in transit. This may occur in two ways.

a. Loss. A channel is lossy if it is possible to make finer distinctions at its input than are preserved in its output. Example.
Pulses of fixed duration and of any amplitude greater than 0 may be used successfully to trigger a circuit which provides an output pulse of fixed duration and amplitude, no output being produced if the input pulse is smaller than 0. A channel is made of this circuit by using as an input alphabet pulses of the ten amplitude levels -5, -4, ..., -1, +1, ..., +5. This channel accepts ten input symbols and produces two output symbols: it loses the additional amplitude information present in the input. Such a channel is shown schematically in Fig. 2a.

FIG. 2. Some examples of discrete channels. (a) Lossy channel. (b) Noisy channel, accurate reception. (c) Noisy channel.

b. Noise. A channel is noisy if a given input sequence may be received as any one of a number of possible output sequences, depending on random action of the channel. In a lossy channel the received sequence is determined by the transmitted sequence; in a noisy channel this is not true. Examples. Figure 2b shows a noisy channel in which the noise does not bother the receiver, who is still able to tell what has been transmitted, although the transmitter does not know exactly what has been received. Figure 2c shows a noisy channel in which the channel noise prevents either transmitter or receiver from knowing with certainty what happens at the other end of the channel.

Decoders. Any of the channels of Fig. 2 can transmit information at a definite rate with an arbitrarily small probability of error. This can be done simply for the first two channels by merely lumping together some of the input symbols and output symbols, respectively. It can also be done for the channel of Fig. 2c, by making use of the proper coder and decoder. The coder is still like the codebooks of eqs. (3) and (4), although the matching job performed is more sophisticated. But the decoder is different.
Since the channel performs a transformation on the input sequences which is many-to-many rather than one-to-one, the decoder must perform a many-to-one transformation. In the channel, each input sequence may produce many output sequences. The decoder must decode all of these (or at least all of them which occur with appreciable probability) into the same output sequence, if it is to avoid making errors. This will be discussed in Sect. 7.

3. SIMPLE DISCRETE SOURCES

Self-Information Measure

Let $x$ denote a particular event, and let $\mathrm{Prob}\{x\}$ be its probability. The amount of information associated with the occurrence of the event $x$ is defined to be

(5)   $I(x) = -\log \mathrm{Prob}\{x\}$,

where the choice of logarithmic base corresponds to the choice of a unit of information. The quantity $I(x)$ is sometimes called self-information (Ref. 14) to distinguish it from the mutual information relating two events, discussed in Sect. 6.

Units. Logarithms to the base 2 are chosen for eq. (5). The resulting unit is the bit, which is the amount of information associated with the occurrence of an event of a priori probability one-half. Other information units include the Hartley, which is the information given by an event of probability 1/10, and the nat, or natural unit, which is the information given by an event of probability $1/e$, where $e$ is the base of Napierian logarithms, $e = 2.71828\ldots$

Bits and Binits. In computer terminology bit is often used as a contraction of binary digit. This practice cannot be followed in information theory, since the occurrence of a binary symbol with a priori probability other than 1/2 does not provide a bit of information. The word binit will therefore be used as an abbreviation for binary digit (Ref. 15).

Properties. The information measure has the following two important properties.

1. Since $\mathrm{Prob}\{x\} \le 1$ for any event $x$,

(6)   $I(x) \ge 0$.

2. Let $x$ and $y$ be two statistically independent events, and let $x, y$ denote the event which is their joint occurrence.
Then

(7)   $I(x, y) = I(x) + I(y)$,

since the probability of the event $x, y$ is, by hypothesis of independence, the product of the probabilities of $x$ and $y$.

Distribution of Information

Message Source. Consider a set $M$ of $n$ different messages, $M = \{m_i\}$, $1 \le i \le n$, and a random process that generates sequences by selecting messages from this set. (The word message implies that successive selections are statistically independent. See Sect. 2.) Let $x_k$ be the $k$th message selected in time sequence, $-\infty < k < \infty$. Then $x_k$ is a random variable, taking values from the set $M = \{m_i\}$, with

(8)   $P_i = \mathrm{Prob}\{x_k = m_i\}$

as the probability of selecting the $i$th message as the $k$th choice. As eq. (8) implies, it is assumed that the process is stationary, so that $P_i$ is independent not only of the earlier selections but also of the time index $k$.

Bar Plot of Distribution of Information. The amount of information associated with the selection of message $m_i$ is then also independent of $k$ and of prior selections: it is given by

(9)   $I_i = I(m_i) = -\log P_i$.

The random variable $I(x_k)$ takes its values from the set $\{I_i\}$ with probabilities

(10)   $\mathrm{Prob}\{I(x_k) = I_i\} = P_i$.

Because of eq. (9), if all the probabilities $P_i$ are different from one another, then on a bar plot of the distribution of information, the bars which give the probabilities of the different possible information values all terminate on the single exponential given by

(11)   $P = 2^{-I}$.

Mean and Variance. The information distribution is completely determined by the probabilities $P_i$, via eq. (9). The most important parameters of the distribution are its mean value,

(12)   $\bar{I} = \sum_{i=1}^{n} P_i I_i = -\sum_{i=1}^{n} P_i \log P_i$,

and its variance,

(13)   $\sigma_I^2 = \sum_{i=1}^{n} P_i (I_i - \bar{I})^2$.

EXAMPLE 1. The source illustrated in Fig. 3 selects messages from the set $M = \{A, B, C\}$ with probabilities {0.755, 0.185, 0.060} and information values {0.405, 2.434, 4.059}. The information distribution has mean value $\bar{I} = 1.00$ bit/symbol, and standard deviation $\sigma_I = 1.10$ bits/symbol.

(Plot axes: information in bits, horizontal; probability, vertical.)
FIG. 3.
Bar plot of information distribution. The three message probabilities terminate on the exponential of eq. (11), shown dotted.

EXAMPLE 2. The source illustrated by Fig. 4 has an alphabet $M = \{0, 1\}$, with probabilities {1/2, 1/2}. This distribution also has a one-bit [...]

[...] operation is more complicated, and $\bar{I}$ no longer has the form of the entropy function of a probability distribution.

The entropy function $H(P_1, P_2)$ is illustrated in Fig. 5. Since $P_1 + P_2 = 1$, this is actually a function of a single variable only.

FIG. 5. The entropy function $H(p(0), p(1)) = -p(0) \log p(0) - p(1) \log p(1)$. (Horizontal axis: $p(0)$, from 0.0 to 1.0.)

Binary Coding

The rate of a source is a significant parameter because it determines the communications facilities required to transmit the source output after proper coding. The source in Fig. 4 generates information at a rate $R = 1$ bit/symbol, and each output symbol is just one binit. The curve of Fig. 5 shows that one bit is the maximum average amount of information that one binit can convey. In this case, the rate $R$ has the interpretation that the source output may be represented in binits so as to require $R = 1$ output binits per source symbol. This interpretation can be generalized to other sources.

FIRST BINARY CODING THEOREM (Controlled Sources). Given a discrete message source which generates information at an average rate $R$ bits per message and given any $\delta > 0$, it is possible to construct a representation of sequences of messages as sequences of binary symbols so that, on the average, less than $R + \delta$ output binary symbols are required per input symbol from the source. It is not possible to find a representation using fewer than $R$ output binary symbols per source symbol (Ref. 16).

A code which satisfies the requirements of this theorem does the job of statistical matching referred to in Sect. 2.

Shannon-Fano Coding.
The general strategy in constructing efficient binary codes is to divide the message set into two subsets of nearly equal probability and to use the first digit of the coded output sequence to indicate in which half the selected message lies. Each half is divided into two subsets again by the next digit, and the process terminates on subsets which contain only one message (Ref. 17). This procedure is not quite explicit, however. It will not be possible to make all dichotomies equiprobable unless all the message probabilities are powers of 1/2. If not, then there are many possible not-quite-perfect codes, and it is difficult to choose among them. The following procedure, called Huffman coding, is explicit, and it gives a "best possible" code (Ref. 18).

Huffman Coding.
1. List all possible messages in order of decreasing probability, and assign as the last digit in the coded output a 0 to the next-to-last message and a 1 to the last message. These two messages will agree in all the (as yet unknown) digits preceding the last one.
2. Merge the last two messages, adding their probabilities, and insert the sum in its proper position in the list of message probabilities. Now repeat step 1. Continue until all messages are merged.

EXAMPLE. The process is illustrated for the message set of Fig. 3 in Fig. 6, by a kind of graph which is called a tree for obvious reasons. The code for each message is read off starting at the left node and reading the 0's and 1's which label the branches along the (unique) path terminating in the selected message. For Fig. 6 this leads to the codebook

(15)   $A \to 0, \qquad B \to 10, \qquad C \to 11$.

FIG. 6. A coding tree. (Leaf probabilities: p(A) = 0.755, p(B) = 0.185, p(C) = 0.060; root 1.000.)

Codewords. The Prefix Property. The output sequences of a codebook are called codewords. The code of eq. (15) and of Fig. 6 illustrates a characteristic feature of Huffman codes called the prefix property: no codeword is a prefix of any other longer codeword.
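The two Huffman steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the author's own formulation: the function names `huffman_code` and `has_prefix_property` are hypothetical, a heap stands in for the re-sorted list of step 2, and ties between equal probabilities are broken arbitrarily.

```python
import heapq
from itertools import count
from math import log2


def huffman_code(probabilities):
    """Build {message: codeword} by repeatedly merging the two least probable entries."""
    tie = count()  # tie-breaker, so the heap never has to compare dictionaries
    heap = [(p, next(tie), {m: ""}) for m, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_last, _, last = heapq.heappop(heap)            # last in the list: digit 1
        p_next, _, next_to_last = heapq.heappop(heap)    # next-to-last: digit 0
        merged = {m: "0" + c for m, c in next_to_last.items()}
        merged.update({m: "1" + c for m, c in last.items()})
        heapq.heappush(heap, (p_last + p_next, next(tie), merged))
    return heap[0][2]


def has_prefix_property(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)


source = {"A": 0.755, "B": 0.185, "C": 0.060}    # message set of Fig. 3
code = huffman_code(source)
average_length = sum(p * len(code[m]) for m, p in source.items())
rate = -sum(p * log2(p) for p in source.values())

print(code)                                       # {'A': '0', 'B': '10', 'C': '11'}
print(has_prefix_property(code.values()))         # True
print(round(average_length, 3), round(rate, 3))   # 1.245 1.0
```

For the source of Fig. 3 the sketch reproduces the codebook of eq. (15). The average of 1.245 binits per message lies above the source rate of about one bit per message, as the First Binary Coding Theorem requires.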
The prefix property is a sufficient condition to guarantee that a sequence of codewords written down in order without spacing can be uniquely decoded into a sequence of source symbols. Decodability is required in order that the output be a representation of the input, and spaces between words are not permitted since, if they were used, the output alphabet would be ternary and not binary. The prefix property is not necessary, however. It is possible to construct codes, which can be decoded after some delay, that do not satisfy this condition.

EXAMPLE. The codebook

(16)   $A \to 0, \qquad B \to 01, \qquad C \to 11$

is decodable, but the codewords do not satisfy the prefix condition since 0 is a prefix of 01. There is no advantage to such codes in the binary coding case, and there seems to be none in general (Refs. 19, 20, 21).

The Szilard-Kraft Inequality. Let $W_i$ be the number of binits in the codeword for the $i$th message $m_i$. Thus for the codebook of eq. (15) one has

(17)   $W_1 = 1, \qquad W_2 = 2, \qquad W_3 = 2$.

The smaller the $W_i$ are, the fewer output binits are required per input symbol. However, if they are too small, the codewords cannot all be different. Thus if $W_i = 1$ for all $i$, the only distinct codewords are 0 and 1, which cannot distinguish three different messages. A condition on the lengths of codewords is given by:

THE SZILARD-KRAFT INEQUALITY. Given a set of $n$ messages, it is possible to assign a codeword of length $W_i$ to the $i$th message, and to satisfy the prefix condition, if and only if the $W_i$ satisfy the inequality

(18)   $\sum_{i=1}^{n} 2^{-W_i} \le 1$.

If the codeword lengths do not satisfy this condition, no decodable code can be constructed (Refs. 22, 23).

Coding Implications. Suppose all $P_i$ are powers of 1/2, so that all information values are integers. Then let $W_i = I_i = -\log P_i$. This gives

(19)   $\sum_{i=1}^{n} 2^{-W_i} = \sum_{i=1}^{n} 2^{-I_i} = \sum_{i=1}^{n} P_i = 1$,

which satisfies the constraint of eq. (18).
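The inequality of eq. (18) is easy to check numerically, and the "it is possible to assign" half can be made concrete. The sketch below is illustrative only. The function names `kraft_sum` and `prefix_code_from_lengths` are hypothetical, and the counting construction used to pick the codewords is a standard one assumed here, since the text does not say how the codewords are to be chosen. Lengths are assumed to be positive integers.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of the left side of eq. (18) for the given codeword lengths."""
    return sum(Fraction(1, 2 ** w) for w in lengths)


def prefix_code_from_lengths(lengths):
    """Assign binary codewords of the requested lengths, shortest first."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate eq. (18); no decodable code exists")
    codewords = [""] * len(lengths)
    value, previous = 0, 0
    for i in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        value <<= lengths[i] - previous      # extend the counter to the new length
        codewords[i] = format(value, "0{}b".format(lengths[i]))
        value += 1                           # next unused codeword of this length
        previous = lengths[i]
    return codewords


print(kraft_sum([1, 2, 2]))                  # 1
print(prefix_code_from_lengths([1, 2, 2]))   # ['0', '10', '11']
print(kraft_sum([1, 1, 1]))                  # 3/2
```

The lengths of eq. (17) give a sum of exactly 1 and lead back to the codewords of eq. (15). Three codewords of one binit each give 3/2, which exceeds the bound, in agreement with the remark that 0 and 1 cannot distinguish three messages.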
Thus a decodable code can be constructed in which each message has a codeword length in binits equal to its information content in bits. Then the average codeword length, which is the average number of output binits per input message, is

(20)   $\bar{W} = \sum_{i=1}^{n} P_i W_i = \sum_{i=1}^{n} P_i I_i = \bar{I} = R$.

It can be shown that no smaller value of $\bar{W}$ can be obtained from codeword lengths satisfying eq. (18). This proves the First Binary Coding Theorem for these special cases, with $\delta = 0$ in the theorem.

EXAMPLE. Consider the following set of five messages and their codes.

Message   Probability   Codeword   I_i   W_i
m1        1/2           0          1     1
m2        1/4           10         2     2
m3        1/8           110        3     3
m4        1/16          1110       4     4
m5        1/16          1111       4     4

General Case. In general the $I_i$ are not integers, since the $P_i$ are not powers of 1/2. However, a decodable code can always be constructed in which $W_i$ is the smallest integer which is greater than or equal to $I_i$. Then

(21)   $I_i \le W_i < I_i + 1$,
       $P_i I_i \le P_i W_i < P_i I_i + P_i$,
       $R = \bar{I} \le \bar{W} < \bar{I} + 1 = R + 1$,

so that the average number of output binits per message is never more than one in excess of the average number of bits per message. This means that if the number of bits per message is large, the percentage excess is small. One can always make the number of bits per message large by coding sequences of input messages, taking all possible sequences of length $L$ messages as a new message set, containing $n^L$ different messages.

EXAMPLE. For the codebook of eq. (15) and the source of Fig. 6, the codeword lengths $W_i$ are given in eq. (17). One can compute $\bar{W}$, the average number of binits per source symbol:

(22)   $\bar{W} = \sum_{i=1}^{n} P_i W_i = 1 \times 0.755 + 2 \times 0.185 + 2 \times 0.060 = 1.245$ binits/bit.

FIG. 7. A coding tree for message pairs.

Now form all $3^2 = 9$ possible pairs of messages selected by the source of Fig. 6. Using the Huffman coding procedure, as illustrated by the tree in Fig.
7, gives the following set of messages, probabilities and codes. Message AA AB AC BA BB BC CA CB CC Probability 0.5700 0.1397 0.1397 0.0453 0.0453 0.0342 0.0111 0.0111 0.0036 Code o 11 101 1001 10001 100000 1000011 10000100 10000101 --CC 0.0036 16-16 INFORMATION THEORY AND TRANSMISSION Evaluating eq. (22) in this case gives W = 2.0767. The average information per message is two bits: for since successive messages are statistically independent, the information in a sequence of two messages is the sum of the informations in each of the two, and the average of a sum is the sum of the averages. The efficiency of this coding in bits per binit is about 0.96. In terms of the inequality of eq. (21) this might be as low as % or as high as 1.00. The fact that the coding here has an efficiency well above its lower bound is typical of coding results. I t is due to the fact that the entropy curve in Fig. 5 has a very broad maximum, so that the message set must be quite skewed in probabilities before the efficiency of coding drops very low. General Case Continued. Define WL as the average number of binits required to code a block of L source symbols. Rewriting inequality (21) for the new message set gives LR ~ WL ~ LR + 1, (23) WL R+ 1 R'5::-'5::--· - L - L Now vh/ L is the average number of output binits per original input message, so that the last line of eq. (23) satisfies the requirements of the First Binary Coding Theorem for any L > 1/0. This proves the .theorem in the general case of a message source with arbitrary information distribution. It also justifies a definition which assigns the same rate to the two very different sources of Figs. 3 and 4, if these sources can be controlled. Controlled Source Coding. The source of Fig. 4 may be controlled to read out one binit per second. The source of Fig. 3 may be controlled so that its coded output produces one binit per second. 
This will require that the source be speeded up when it generates A's, and slowed down when it generates B's or C's, but on the average it will be generating very nearly a symbol per second. The average rate of the source, then, determines the communications facilities required to transmit its encoded output. The differences in the distributions of Figs. 3 and 4 affect only the amount of delay and the size of the codebook required for efficient coding into binits. This result holds for any controlled source, whose symbol rate may be varied in order to keep its information rate constant.

Uncontrolled Sources. Here the source generates messages at a constant rate. If there is any variance in its information distribution, the rate at which it generates information will then fluctuate. In an efficient code, $w_i$, the number of output binits in the codeword for the message $m_i$, is still close to the message self-information $I(m_i)$. The average number of binits per message, $\bar{w}_L / L$, is still near to the source rate $R$. But it is possible (though highly improbable) that all $L$ of the messages in a block will be those of maximum self-information. Therefore it is not possible to transmit all message sequences as they come along unless a channel is used which can transmit binits at a rate equal to $w_{\max}$ times the (fixed) rate at which the uncontrolled source generates messages. Here $w_{\max}$ is the largest of the $w_i$, i.e., the length of the longest codeword.

Minimax Coding. If all the message sequences generated by an uncontrolled source are to be transmitted unambiguously, the best code to use is not the Huffman code, which minimizes $\bar{w}$ but will have a large $w_{\max}$ if the information distribution has appreciable variance. Rather it is better to use a code that minimizes the value of $w_{\max}$, a minimax problem with a simple solution (Ref. 24). The codewords are taken of uniform length $w_m$, where $w_m$ is the integer such that

$$2^{w_m - 1} < n_m \le 2^{w_m}, \qquad \text{or} \qquad w_m - 1 < \log n_m \le w_m. \tag{24}$$

Here $n_m$ is the number of messages.
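The trade-off between average length and longest codeword can be made concrete with the nine message pairs of the earlier example. The sketch below is an illustration under stated assumptions, not the handbook's procedure: `huffman_lengths` and `uniform_length` are hypothetical helper names, the Huffman routine is the usual merge of the two least probable nodes, and the "printed" lengths are simply the lengths of the nine codewords tabulated above, taken in order of decreasing probability. The merge is not guaranteed to reproduce the printed codewords; it can only do as well or better on average length.

```python
import heapq
import itertools


def huffman_lengths(probs):
    """Codeword lengths from repeatedly merging the two least probable nodes."""
    counter = itertools.count()           # tie-breaker so lists are never compared
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:                   # every leaf under the merge gets one more binit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), a + b))
    return lengths


def uniform_length(n_messages):
    """Smallest integer w with 2**w >= n_messages (n_messages >= 2)."""
    return (n_messages - 1).bit_length()


single = {"A": 0.755, "B": 0.185, "C": 0.060}
pairs = {x + y: single[x] * single[y] for x in single for y in single}
probs = sorted(pairs.values(), reverse=True)

printed = [1, 2, 3, 4, 5, 6, 7, 8, 8]     # lengths of the tabulated codewords
huffman = huffman_lengths(probs)
w_m = uniform_length(len(probs))


def average(lengths):
    return sum(p * w for p, w in zip(probs, lengths))


# (average binits per pair, longest codeword) for each scheme
summary = {
    "printed table": (average(printed), max(printed)),
    "Huffman merge": (average(huffman), max(huffman)),
    "uniform": (float(w_m), w_m),
}
```

Inspecting `summary` shows the pattern the text describes: the variable-length codes have the smaller average, the uniform code the smaller maximum. For the nine pairs the uniform length is $w_m = 4$.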
Since there are $2^{w_m}$ codewords of length $w_m$, there are at least enough to label all the messages. Note that this coding procedure is quite independent of the message probability or information distribution and depends only on the number of messages in the set.

Efficient Coding: Uncontrolled Source. The average rate $R$ at which a source generates information still has significance when the source is uncontrolled. Not all sequences of $L$ messages can be coded into about $LR$ binits, but almost all of them can. More precisely we have:

SECOND BINARY CODING THEOREM (Uncontrolled Sources). Given a discrete message source which generates information at an average rate $R$ bits per message, and given any $\delta > 0$ and any $\epsilon > 0$, it is possible to construct a representation of sequences of messages as sequences of binary symbols so that, for each message sequence, less than $R + \delta$ output binary symbols are required per input symbol from the source, except for a set of message sequences whose total probability is less than $\epsilon$.

The procedure is to code the messages in blocks of length $L$, coding each block into a codeword of length $w_i \le I_i + 1$. The theorem follows because the self-information of a sequence of messages is the sum of the self-informations of the component messages (since statistical independence is assumed), and because the sum of a large number $L$ of identically distributed, statistically independent random variables is very likely to be very near, percentagewise, to $L$ times the mean of the distribution.

Sums of Random Variables. The Second Binary Coding Theorem follows from the weak law of large numbers. Stronger results derive from the Tchebysheff inequality and the central limit theorem. These three results applied to self-information are:

1. WEAK LAW OF LARGE NUMBERS. For any $\epsilon > 0$ and any $\delta > 0$, an integer $L_0$ can be found so large that the probability that a sequence of $L > L_0$ messages will have an amount of self-information greater than $L(\bar{I} + \delta)$ is $< \epsilon$.

2. TCHEBYSHEFF INEQUALITY. For any $\delta > 0$, the probability that a sequence of $L$ messages has an amount of self-information greater than $L(\bar{I} + \delta)$ is $< \epsilon = \sigma_I^2 / L\delta^2$.

3. CENTRAL LIMIT THEOREM. For any $\delta > 0$, the probability that a sequence of $L$ messages has an amount of self-information greater than $L[\bar{I} + (\delta/\sqrt{L})]$ is asymptotically given by the expression

$$\epsilon = \frac{1}{\sqrt{2\pi L}\,\sigma_I} \int_{x = \delta\sqrt{L}}^{\infty} e^{-x^2/2L\sigma_I^2}\,dx = \frac{1}{\sqrt{2\pi}\,\sigma_I} \int_{y = \delta}^{\infty} e^{-y^2/2\sigma_I^2}\,dy. \tag{25}$$

Here $\bar{I}$ is the mean and $\sigma_I^2$ is the variance of the self-information distribution of the messages.

Coding Interpretations. Each of these results translates into a result for efficient coding, since by eq. (21) it is possible to assign distinct binary codewords to all possible message sequences of length $L$ so that the difference between codeword length in binits and information in bits is less than unity for each sequence. Thus to every sequence in a set of sequences of total probability $\ge 1 - \epsilon$, we can certainly assign codes of length $< L(\bar{I} + \delta) + 1$. The remaining sequences, of total probability $\le \epsilon$, may all be assigned the same codeword, and will cause ambiguity or error a fraction $\epsilon$ of the time.

Storage and Delay. The central limit theorem shows that for fixed error probability $\epsilon$, the difference between $\bar{w}_L / L$, the binits per message, and $\bar{I}$, the bits per message, decreases with blocklength $L$ only like $1/\sqrt{L}$. This implies that it may be necessary to use much longer blocks to get efficient coding for an uncontrolled source than would be required for a controlled source, for which the difference between $\bar{w}_L / L$ and $\bar{I} = R$ decreases like $1/L$, as in eq. (23).

Effective Number of Messages. From the weak law of large numbers, there is a set of message sequences of total probability $> 1 - \epsilon$, each sequence of which has self-information within $\pm L\delta$ of $L\bar{I}$.
Each sequence in this set then has probability in the range

$$2^{-L(\bar{I} + \delta)} \le \text{Prob}\{\text{sequence}\} \le 2^{-L(\bar{I} - \delta)} \tag{26}$$

and the total number of sequences in the probable set, for large $L$, lies in the range

$$(1 - \epsilon)\,2^{L(\bar{I} - \delta)} \le \text{number of sequences} \le 2^{L(\bar{I} + \delta)} \tag{27}$$

no matter how small $\epsilon$ and $\delta$. Adding unity to $L$ ultimately multiplies the number of messages in the probable set by $2^{\bar{I}}$. This is what would happen if there were just

$$n_{\text{eff}} = 2^{\bar{I}} \tag{28}$$

different messages in the set, and $n_{\text{eff}}$ is therefore called the effective number of messages, or the effective alphabet size. Since $\bar{I} \le \log n$, we have

$$n_{\text{eff}} \le n. \tag{29}$$

That is, the fact that all messages are not equiprobable produces a growth in the probable sequence set as if a smaller equiprobable message set were being used.

4. MORE COMPLICATED DISCRETE SOURCES

Most natural sources are more complicated than those discussed in Sect. 3. A more general source is a random process which generates sequences of symbols like the letters of English text, in which each symbol is selected with a probability which depends on the values of the preceding symbols.

Joint Probabilities. Consider a set $S$ of $n$ different symbols, $S = \{s_i\}$, $1 \le i \le n$. Let $x_k$ be the symbol selected at (integer) time $k$, $-\infty < k < +\infty$. Then $x_k$ is a random variable, taking values from the set $S = \{s_i\}$. The random process is well defined if the joint probabilities

$$p_j(x_k, x_{k-1}, \ldots, x_{k-j+1}) \tag{30}$$

are known for all combinations of $x$ values and all values of $j$. It will be assumed that the process is stationary (and indeed ergodic), so that the probabilities are independent of the time index $k$.

Conditional Probabilities. Knowledge of the joint probabilities of eq. (30) is equivalent to knowledge of the conditional probabilities

$$q_0(s_i), \qquad q_1(s_i \mid x_{k-1}), \qquad \ldots, \qquad q_j(s_i \mid x_{k-1}, x_{k-2}, \ldots, x_{k-j}). \tag{31}$$

The two sets of probabilities of eqs. (30) and (31) are related by

$$q_0(s_i) = p_1(s_i),$$
$$q_j(s_i \mid x_{k-1}, x_{k-2}, \ldots, x_{k-j})\,p_j(x_{k-1}, x_{k-2}, \ldots, x_{k-j}) = p_{j+1}(s_i, x_{k-1}, x_{k-2}, \ldots, x_{k-j}). \tag{32}$$

Markov Sources

If for some integer $N$ and all integers $j > 0$

$$q_{N+j}(s_i \mid x_{k-1}, x_{k-2}, \ldots, x_{k-N-j}) = q_N(s_i \mid x_{k-1}, x_{k-2}, \ldots, x_{k-N}), \tag{33}$$

so that a knowledge of $N$ preceding symbols gives all the probabilistic information available about the next symbol value, then the process is a multiple Markov process of order $N$ (Ref. 52). When $N = 1$, the process is called a simple Markov process.

Self-Information. If the process is Markov of order $N$, then the self-information provided when the symbol $s_i$ occurs is the negative logarithm of its probability, but this is now a conditional probability depending on the values of the preceding $N$ symbols. The self-information is thus a random function of the $N$ random variables $x_{k-1}, x_{k-2}, \ldots, x_{k-N}$:

$$I_N(s_i \mid x_{k-1}, x_{k-2}, \ldots, x_{k-N}) = -\log q_N(s_i \mid x_{k-1}, x_{k-2}, \ldots, x_{k-N}). \tag{34}$$

Average Self-Information. A self-information of the symbol $s_i$ which is not a random function may be obtained by averaging eq. (34) over the conditional probability $r_N(x_{k-1}, x_{k-2}, \ldots, x_{k-N} \mid s_i)$ that when $s_i$ occurs at time $k$, the preceding $N$ symbols will have the values $x_{k-1}, x_{k-2}, \ldots, x_{k-N}$. By Bayes's theorem,

$$r_N(x_{k-1}, x_{k-2}, \ldots, x_{k-N} \mid s_i) = \frac{p_{N+1}(s_i, x_{k-1}, x_{k-2}, \ldots, x_{k-N})}{p_1(s_i)} \tag{35}$$

and

$$I_N(s_i) = \sum_{x_{k-1}, \ldots, x_{k-N}} r_N(x_{k-1}, \ldots, x_{k-N} \mid s_i)\,I_N(s_i \mid x_{k-1}, \ldots, x_{k-N}) \tag{36}$$

is defined as the average self-information of the symbol $s_i$ for an $N$th order Markov process.

Source Rate. The average rate $R$ at which a Markov source generates information is equal to the average $\bar{I}_N$ of the $I_N(s_i)$ over the probabilities of the symbols $p_1(s_i)$. This gives the average self-information per symbol of the process:

$$R = \bar{I}_N = \sum_{i=1}^{n} p_1(s_i)\,I_N(s_i). \tag{37}$$

Other Sources

If the source is not a Markov process of finite order, the self-information of a symbol may not be well defined, since it may then be a function of an infinite number of random variables. However, the quantity $I_N(s_i \mid x_{k-1}, x_{k-2}, \ldots, x_{k-N})$ given by eq. (34) is still defined for each $N$, and it is called the $N$th order conditional self-information of $s_i$; the quantity $I_N(s_i)$ defined by eq. (36) is called the $N$th order average self-information of the symbol $s_i$.
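The chain of definitions from the conditional probabilities to the source rate can be followed numerically for a simple Markov process ($N = 1$). The sketch below makes several assumptions that are not in the text: the two-symbol transition table `q` is invented for illustration, the first-order probabilities are found by repeated multiplication rather than by solving the stationarity equations directly, and every symbol is assumed to occur with nonzero probability.

```python
import math


def stationary(q, iterations=1000):
    """First-order probabilities p_1 of a chain with q[prev][nxt] = Prob{nxt | prev}."""
    n = len(q)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * q[i][j] for i in range(n)) for j in range(n)]
    return p


def first_order_information(q):
    """Return (p1, per-symbol averages I_1(s), I0_bar, I1_bar), all in bits."""
    n = len(q)
    p1 = stationary(q)
    # zeroth-order average: symbols treated as if statistically independent
    i0_bar = -sum(p * math.log2(p) for p in p1 if p > 0)
    per_symbol = []
    for s in range(n):
        total = 0.0
        for prev in range(n):
            joint = p1[prev] * q[prev][s]            # p_2(s, prev)
            if joint > 0:
                r = joint / p1[s]                    # Bayes's theorem, as in eq. (35)
                total += r * -math.log2(q[prev][s])  # average of eq. (34) over r
        per_symbol.append(total)
    # source rate: average of the per-symbol values over p_1, as in eq. (37)
    i1_bar = sum(p1[s] * per_symbol[s] for s in range(n))
    return p1, per_symbol, i0_bar, i1_bar


# Hypothetical transition probabilities, chosen only for illustration.
q = [[0.9, 0.1],
     [0.2, 0.8]]
p1, per_symbol, i0_bar, i1_bar = first_order_information(q)
```

For this table the first-order average `i1_bar` comes out smaller than the zeroth-order average `i0_bar`: knowing the preceding symbol lowers the average information carried by the next one.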
It can be shown that for any process, for all $s_i$ and any $N$,

$$I_N(s_i) \ge 0, \qquad I_N(s_i) \ge I_{N+1}(s_i); \tag{38}$$

the average information provided by the occurrence of $s_i$ when the $N$ preceding symbols are known is a monotone decreasing function of $N$. Further knowledge of the past, on the average, makes whatever symbol happens next more probable, and therefore less informative.

Upper Bounds on Source Rate. If an information source has a measured first-order probability distribution $p_1(s_i)$, the average self-information of each of its symbols is at most equal to $-\log p_1(s_i)$, the value it would have if successive symbols were statistically independent. If the source has a given conditional distribution of order $N$, the average self-information of each symbol is bounded above by the average self-information $I_N(s_i)$ of a Markov process of order $N$ with the same $N$th order conditional distribution. Again, statistical dependence beyond what is contained in the given distribution can only reduce the average self-information of each symbol.

Self-Information of a Symbol. From eq. (38) it follows that one can define a limit,

$$I(s_i) = \lim_{N \to \infty} I_N(s_i), \tag{39}$$

which converges for each $s_i$ to a non-negative number. The limit, $I(s_i)$, is the average self-information added by the occurrence of the symbol $s_i$ when all the preceding symbols are known. Note that no general lower bound better than 0 can be obtained. A process which looks random on the basis of $N$th order statistics may always be deterministic on the basis of statistics of order $N + 1$ and have an average rate of zero.

Source Rate. The average self-information $\bar{I}$ of the process is again equal to its rate $R$, and is given by either of the two expressions

$$R = \bar{I} = \sum_{i=1}^{n} p_1(s_i)\,I(s_i) = \lim_{N \to \infty} \bar{I}_N, \tag{40}$$

where $\bar{I}_N$, the average $N$th order self-information of the process, is given by eq. (37).

More General Sources.
The only type of source of greater generality than the multiple Markov process of finite order which has been studied in detail is called a finite-state source (Ref. 25).

Note. A finite-state source (a) includes the Markov processes of finite order but it is not included among them and (b) is still not general enough to generate all and only the grammatical sentences in English word by word (Refs. 26, 27).

Because of the complexity of natural sources like written language, indirect methods must be used to estimate their rate. A straightforward application of the definitions involves the measurement of probability distributions of order so high that in all the written English there would be too small a sample for an accurate estimate.

Coding and Delay

The first binary coding theorem still applies to more complicated sources. It carries over unaltered to any discrete ergodic source, whether it be multiple Markov, finite state, or still more general. This may be shown by two coding methods.

1. Block Coding. Segment the source output into blocks of length $L$, and code each block from a fixed codebook containing a binary sequence for each of the $n^L$ possible sequences of source symbols. Each source sequence may be coded as before into a number of binits at most one greater than its total information content in bits. The average self-information in a block of $L$ source symbols is equal to $\bar{I}_0$, the average self-information of the first symbol in the block when no past history is known, plus $\bar{I}_1$, the average self-information of the second symbol when the first is known, etc., the last term being $\bar{I}_{L-1}$, the average self-information of the $L$th symbol when all preceding $L - 1$ are known. Averaging over all sequences gives $\bar{w}_L$, the average number of binits per block of $L$ input symbols, as bounded by

$$\sum_{j=0}^{L-1} \bar{I}_j \le \bar{w}_L \le \sum_{j=0}^{L-1} \bar{I}_j + 1. \tag{41}$$

Dividing eq. (41) by $L$ gives the average number of binits per source symbol:

$$\frac{1}{L} \sum_{j=0}^{L-1} \bar{I}_j \le \frac{\bar{w}_L}{L} \le \frac{1}{L} \sum_{j=0}^{L-1} \bar{I}_j + \frac{1}{L}. \tag{42}$$

The summation in eq. (42) is the average of the first $L$ average self-informations of the process. Since $\bar{I}_j$ cannot increase with $j$, the average will in general be greater than the limit $\bar{I} = R$, but it will approach the limit as $L \to \infty$. Given any $\delta > 0$, it is always possible to find an $L$ so large that the first coding theorem is satisfied, but the required $L$ may be very large even if the process is Markov of small order.

EXAMPLE. A simple Markov process (of order one) has $\bar{I}_0 = 10$ bits per symbol, and $\bar{I}_1 = R = \tfrac{1}{10}$ bit per symbol. It will take $L = 100$ to code the output of this source at 50 per cent efficiency, i.e., at two output binits per input bit.

2. Conditional Coding. A more complicated but more efficient procedure is conditional coding. Here blocks of $L_2$ symbols are encoded by using one of $n^{L_1}$ codebooks. The codebook to be used is determined by the preceding $L_1$ symbols. Such a process has an encoding delay of $L = L_1 + L_2$ input symbols, and an average number of binits per source symbol given by

$$\frac{1}{L_2} \sum_{j=L_1}^{L-1} \bar{I}_j \le \frac{\bar{w}_{L_2}}{L_2} \le \frac{1}{L_2} \sum_{j=L_1}^{L-1} \bar{I}_j + \frac{1}{L_2}. \tag{43}$$

EXAMPLE. In the simple Markov example given for block coding, conditional coding will give better than 50 per cent efficiency for $L_1 = 1$, $L_2 = 10$, $L = 11$, which is much less delay and a much smaller codebook than is required by the $L = 100$ above for block coding.

Correlated Information Values. In extending the second binary coding theorem to more complicated uncontrolled sources, an additional kind of delay arises. The occurrence of a symbol with self-information value above the mean may make it more probable that the succeeding symbol will also have self-information above the mean. Then it will be necessary to encode much larger blocks of symbols in order to make it highly probable that the percentage deviation of the self-information of the block from its mean value will be small. Mathematically the problem becomes one of adding correlated rather than statistically independent random variables, and convergence to the mean may be much slower. The second binary coding theorem itself extends to a very broad class of sources, but the Tchebysheff inequality and the central limit theorem do not apply in the form given above.

5. DISCRETE NOISELESS CHANNELS

Type I Channels. The simplest noiseless channel is one in which each of the set $S = \{s_i\}$ of symbols which may be applied as an input to the channel is received unaltered at its output. Since there is a one-to-one correspondence between input and output symbols, they may be identified and called channel symbols. If all channel symbols are of equal duration, their number $n$ and their common duration $t$ completely specify the channel. Such a channel will be called a channel of type I.

Channel Capacity. The capacity of a noiseless channel is the maximum average rate at which information can be received over it. The capacity may be measured in bits per symbol, denoted by $C$, or in bits per second, denoted by $C_t$. If the common symbol duration is $t$ seconds, then

$$t\,C_t = C. \tag{44}$$

For a type I channel, inequality (38) and the following discussion show that statistical dependence between successive symbols cannot increase the rate of transmission. Therefore the capacity can be computed by assuming statistical independence and maximizing the average rate $R$ in bits per symbol,

$$R = -\sum_{i=1}^{n} p_1(s_i) \log p_1(s_i), \tag{45}$$

with respect to variations in the probability distribution $p_1(s_i)$. But this rate is just the entropy of the $p_1(s_i)$ distribution, which has the maximum value $\log n$, attained when all $p_1(s_i)$ are equal to $1/n$. Thus for a type I channel,

$$C = \log n \text{ bits per symbol}, \qquad C_t = (1/t) \log n \text{ bits per second}. \tag{46}$$

Redundancy.
If a source is connected to a type I channel and selects channel symbols with unequal probabilities or with statistical dependence, the rate $R = \bar{I}$ at which it generates information will be less than the capacity $C$ of the channel. The difference $C - R$ is defined as the (absolute) redundancy, in bits per symbol, of the source with respect to the channel. The ratio of absolute redundancy to channel capacity, a number between 0 and 1, is defined as the relative redundancy of the source with respect to the channel. In terms of the interpretation at the end of Sect. 3 of $R = \bar{I}$ as the logarithm of the effective alphabet size of the source, the redundancy is a measure of the reduction in logarithm of the size of the effective alphabet, due to nonoptimum utilization of the channel.

Type II Channels. A more complicated noiseless channel, which will be called type II, has a different duration $t_i$ for each channel symbol $s_i$. It is again true that capacity is attained by using symbols chosen with statistical independence and maximizing the average rate $R_t$ (now in bits per second) with respect to the symbol probabilities $p_1(s_i)$. But the rate is now given by

$$R_t = \frac{-\sum_{i=1}^{n} p_1(s_i) \log p_1(s_i)}{\sum_{i=1}^{n} p_1(s_i)\,t_i} \tag{47}$$

and the maximization leads to the condition that the instantaneous rates at which each symbol transmits information all be equal to the channel capacity $C_t$. Thus

$$\frac{-\log p_1(s_i)}{t_i} = C_t, \qquad \text{or} \qquad p_1(s_i) = 2^{-C_t t_i}. \tag{48}$$

The capacity $C_t$ is determined, for given durations $t_i$, by the normalization requirement that the probabilities of the symbols sum to one; this gives

$$\sum_{i=1}^{n} 2^{-C_t t_i} = 1 \tag{49}$$

and $C_t$ is the (unique) real root of this equation. Redundancy with respect to a channel of type II is defined as it was for a channel of type I, using $R_t$ and $C_t$ rather than $R$ and $C$. Notice that a source is redundant with respect to a type II channel unless the probabilities with which it chooses symbols are unequal, and are given by eq. (48).

Type III Channels. Shannon (Ref. 28) discusses a finite-state channel, which will be called type III.
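Returning briefly to the type II channel: the normalization condition, that the probabilities $2^{-C_t t_i}$ of eq. (48) sum to one, generally has no closed-form solution for $C_t$, but the left side decreases steadily as $C_t$ grows, so the root is easy to locate numerically. The sketch below uses bisection, which is an assumed choice of method rather than anything prescribed by the text; `type2_capacity` is a hypothetical name and the example durations are invented.

```python
def type2_capacity(durations, tol=1e-12):
    """Capacity in bits per second: the root C_t of sum_i 2**(-C_t * t_i) = 1."""
    if len(durations) < 2 or min(durations) <= 0:
        raise ValueError("need at least two symbols with positive durations")

    def excess(c):
        return sum(2.0 ** (-c * t) for t in durations) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:        # excess() decreases in c; widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Four symbols of equal duration 1 s: reduces to the type I result, log 4 = 2.
c_equal = type2_capacity([1.0, 1.0, 1.0, 1.0])

# Hypothetical unequal durations of 1 s and 2 s.
durations = [1.0, 2.0]
c_unequal = type2_capacity(durations)
optimal_probabilities = [2.0 ** (-c_unequal * t) for t in durations]
```

With equal durations the routine reproduces the type I capacity of eq. (46). With unequal durations the optimizing probabilities favor the shorter symbol, as eq. (48) requires.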
In a type III channel the symbols are of different durations, and the alphabet of symbols available at each instant depends on the preceding symbols which have been sent over the channel. An expression for the capacity of such a channel has been given (op. cit.). Type III channels have storage. They will not be discussed here, except to note that for each such channel there is a corresponding finite-state source, which has no redundancy with respect to the given channel. This optimizing source no longer selects successive symbols with statistical independence.

Noiseless Channel Coding Theorems. The capacity $C$ of a noiseless channel is a rate at which some particular optimizing source can transmit information over the channel. Since the source whose output may need transmission will not usually be an optimal source, this is not a justification for considering $C$ to be an important channel parameter. The justification is that given a channel of capacity $C$, any source of rate $R < C$, and no source of rate $R > C$, may be so encoded as to permit reliable transmission over the channel. Binary coding theorems one and two can be interpreted as showing how any source can be coded into a binary noiseless channel of type I. These theorems can be generalized.

NOISELESS CHANNEL CODING THEOREMS

I. Controlled Source. Given a discrete controlled source and a discrete noiseless channel which has capacity $C_t$ bits per second, it is possible to control the source to any average rate $R_t < C_t$ and to encode its output for unambiguous reception over the channel. This is not possible for any $R_t > C_t$.

II. Uncontrolled Source. Given: a discrete uncontrolled source of type I, II, or III with average rate $R_t$ bits per second; a discrete noiseless channel of capacity $C_t$ bits per second; and any $\delta > 0$.
If $R_t < C_t$, it is possible to encode sequences of source symbols for transmission over the channel so that the probability that such a sequence will be incorrectly decoded is $< \delta$. This is not possible if $R_t > C_t$.

6. DISCRETE NOISY CHANNELS. I. DISTRIBUTION OF INFORMATION

Mutual Information

Let $x$ and $y$ denote two related events, and let $x, y$ denote the event which is their joint occurrence. Let $\text{Prob}\{x\}$, $\text{Prob}\{y\}$, and $\text{Prob}\{x, y\}$ be the associated probabilities. The self-information given by the occurrence of $x$ is defined in eq. (5) as

$$I(x) = -\log \text{Prob}\{x\}. \tag{50}$$

If $y$ is now observed, and $x$ and $y$ are not statistically independent, the probability of $x$ a priori will be changed a posteriori to

$$\text{Prob}\{x \mid y\} = \frac{\text{Prob}\{x, y\}}{\text{Prob}\{y\}}. \tag{51}$$

This change in the probability of $x$ changes the amount of information required to select it to

$$I(x \mid y) = -\log \text{Prob}\{x \mid y\} = -\log \frac{\text{Prob}\{x, y\}}{\text{Prob}\{y\}}. \tag{52}$$

The difference between eqs. (50) and (52) measures how the amount of information required to select $x$ is changed by the knowledge of $y$. This difference is denoted by $I(x; y)$, the amount of mutual information between $x$ and $y$. Then

$$I(x; y) = -\log \text{Prob}\{x\} + \log \frac{\text{Prob}\{x, y\}}{\text{Prob}\{y\}}$$
$$= -\log \text{Prob}\{y\} + \log \frac{\text{Prob}\{x, y\}}{\text{Prob}\{x\}} \tag{53}$$
$$= \log \frac{\text{Prob}\{x, y\}}{\text{Prob}\{x\}\,\text{Prob}\{y\}}.$$

Mutual information is measured in the same units as self-information (see Sect. 3).

Properties.

1. $I(x; y)$ is symmetric:

$$I(x; y) = I(y; x). \tag{54}$$

This follows from the last line of eq. (53), and justifies the name "mutual information."

2. $I(x; y)$ vanishes if $x$ and $y$ are statistically independent. If not, there is a decomposition generalizing eq. (7):

$$I(x, y) = -\log \text{Prob}\{x, y\} = I(x) + I(y) - I(x; y), \tag{55}$$

showing that $I(x; y)$ plays the role of a correlation (Ref. 2). If $\text{Prob}\{x, y\}$ is greater than $\text{Prob}\{x\}\,\text{Prob}\{y\}$ then $I(x; y)$ is positive.

3. $I(x; y)$ may be positive or negative, but cannot be greater than the self-information of $x$ or $y$:

$$I(x; y) \le I(x), \qquad I(x; y) \le I(y). \tag{56}$$

This follows from eq. (53), since the conditional probabilities are at most unity, and have nonpositive logarithms (Refs. 14, 29).

Notation. Any $I$ function whose argument contains no semicolons is interpreted as the negative logarithm of the probability of its argument: thus $I(x \mid y) = -\log \text{Prob}\{x \mid y\}$. Any $I$ function whose argument contains a semicolon between two sets of variables is interpreted as in eq. (53), where $x$ and $y$ stand for the expressions to the left and right of the semicolon, and $x, y$ stands for their conjunction.

Distribution of Mutual Information

Mutual information measures how much information one symbol provides about another. It can be used in the discussion of Sect. 4 on sources which generate sequences of related symbols. Here it will be applied only to the discussion of noisy channels.

Discrete Noisy Channel. Consider a simple noisy channel, as illustrated in Fig. 2. There is a set $U = \{u_i\}$, $1 \le i \le n_u$, of symbols which may be transmitted, and a set $V = \{v_j\}$, $1 \le j \le n_v$, of symbols which may be received. It will be assumed throughout that the channel is without storage and that it and the source are stationary: the probability of transmitting the symbol $u_i$ and receiving the symbol $v_j$ is independent of time and of prior transmissions and receptions. Let $x_k$ and $y_k$ be the transmitted and received symbols at (integer) time $k$, $x_k \in U$ and $y_k \in V$. Then the pair $(x_k, y_k)$ is a stationary random variable, taking values from the set $U \times V = \{u_i, v_j\}$ of ordered pairs of transmitted and received symbols, with probabilities

$$p(u_i, v_j) = \text{Prob}\{x_k = u_i,\; y_k = v_j\}. \tag{57}$$

Denote the first order probabilities by

$$p(u_i) = \sum_{j=1}^{n_v} p(u_i, v_j), \qquad q(v_j) = \sum_{i=1}^{n_u} p(u_i, v_j) \tag{58}$$

and the conditionals by

$$p(u_i \mid v_j) = \frac{p(u_i, v_j)}{q(v_j)}, \qquad q(v_j \mid u_i) = \frac{p(u_i, v_j)}{p(u_i)}. \tag{59}$$

Mutual Information of a Noisy Channel. The amount of information given by $y_k$ about $x_k$ is also a stationary random variable, which takes values equal to the numbers $I(u_i; v_j)$ with probabilities $p(u_i, v_j)$.
The distribution of this random variable is completely determined by $p(u_i, v_j)$, through eqs. (53) and (58). Its most important parameter is its mean value, the average rate $R$ at which the received symbols give information about the transmitted symbols:

$$R = \sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j)\,I(u_i; v_j). \tag{60}$$

Although for particular $u_i$, $v_j$ the mutual information may be negative, the average $R$ of eq. (60) is never negative.

EXAMPLE 1. The Binary Erasure Channel. As illustrated in Fig. 8, this channel accepts two input symbols, 0 and 1, and produces three output symbols, 0, 1, and X. With probability $p$ its output reproduces its input; with probability $q$, its input is erased and an output X indicates the erasure; 0 and 1 are transmitted with equal probability.

[FIG. 8. Binary erasure channel and mutual information distribution. Inputs $u_1 = 0$ and $u_2 = 1$, each with probability 1/2; outputs $v_1 = 0$, $v_2 = X$, $v_3 = 1$; branch probabilities 0.9 (input reproduced) and 0.1 (erasure). Horizontal axis of the distribution: information, bits.]

When a 0 or a 1 is received, the mutual information is

$$I(0; 0) = \log \frac{\text{Prob}\{u_1, v_1\}}{\text{Prob}\{u_1\}\,\text{Prob}\{v_1\}} = \log \frac{\text{Prob}\{v_1 \mid u_1\}}{\text{Prob}\{v_1\}} = \log \frac{p}{p/2} = \log 2 = 1 \text{ bit} = I(1; 1). \tag{61}$$

When an X is received,

$$I(0; X) = \log \frac{\text{Prob}\{v_2 \mid u_1\}}{\text{Prob}\{v_2\}} = \log \frac{q}{q} = \log 1 = 0 = I(1; X). \tag{62}$$

This gives the information distribution shown in Fig. 8, with the average value

$$R = \sum_{i=1}^{2} \sum_{j=1}^{3} p(u_i, v_j)\,I(u_i; v_j)$$
$$= (p/2)\,I(0; 0) + (q/2)\,I(0; X) + (p/2)\,I(1; 1) + (q/2)\,I(1; X) \tag{63}$$
$$= 2(p/2) + 2(q/2) \times 0 = p.$$

EXAMPLE 2. The Binary Symmetric Channel. As illustrated in Fig. 9, this channel also accepts the two input symbols 0 and 1, but it only produces the same two output symbols. With probability $p$ its output reproduces its input; with probability $q = 1 - p$ its output is the incorrect symbol. 0 and 1 are transmitted with equal probability, and $q$ is $< \tfrac{1}{2}$.

[FIG. 9. Binary symmetric channel and information distribution. Inputs $u_1 = 0$ and $u_2 = 1$, each with probability 1/2; outputs $v_1 = 0$, $v_2 = 1$. Horizontal axis of the distribution: information, bits.]

When the correct symbol is received, the mutual information is

$$I(0; 0) = \log \frac{\text{Prob}\{v_1 \mid u_1\}}{\text{Prob}\{v_1\}} = \log \frac{p}{1/2} = \log 2p > 0, \qquad I(0; 0) = I(1; 1). \tag{64}$$

When an error is made,

$$I(0; 1) = \log \frac{\text{Prob}\{v_2 \mid u_1\}}{\text{Prob}\{v_2\}} = \log \frac{q}{1/2} = \log 2q < 0. \tag{65}$$

This gives the mutual information distribution illustrated for $q = \tfrac{1}{9}$ in Fig. 9, with the average rate

$$R = p \log 2p + q \log 2q = p + q + p \log p + q \log q = 1 - H(p, q), \tag{66}$$

where $H(p, q)$ is the entropy function illustrated in Fig. 5. For $q = \tfrac{1}{9}$, $H(p, q) = H(\tfrac{8}{9}, \tfrac{1}{9}) = 0.5032$, and $R = 0.4968$ bit per symbol.

Averages of Information Measures. In addition to the average rate $R$, other averages of information measures must be considered.

Notation. An average of an information function $I(u_i; v_j)$ over the joint distribution $p(u_i, v_j)$ is denoted by replacing the names "$u_i$" and "$v_j$" of the symbols by the names "$U$" and "$V$" of the sets from which the symbols are selected. A single capital denotes an average over a univariate distribution. Thus

$$I(U; V) = R = \sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j)\,I(u_i; v_j),$$
$$I(U) = H(\{p(u_i)\}) = \sum_{i=1}^{n_u} p(u_i)\,I(u_i), \tag{67}$$
$$I(U \mid V) = \sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j)\,I(u_i \mid v_j) = -\sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j) \log p(u_i \mid v_j).$$

Equivocation. The average rate at which information about the transmitted symbols is supplied to the channel is the average self-information of the transmitted symbols. This by eq. (56) is greater than the average rate at which such information is received. The difference is

$$I(U) - I(U; V) = -\sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j) \log p(u_i) - \sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j) \log \frac{p(u_i \mid v_j)}{p(u_i)}$$
$$= -\sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j) \log p(u_i \mid v_j) \tag{68}$$
$$= I(U \mid V) \ge 0.$$

This quantity is the conditional entropy of the set $U$ given $V$.
It measures the average amount of information about the transmitted symbol which the receiver still lacks after noisy reception, and thus the average rate at which it would be necessary to transmit additional information over an extra channel in order to make the receiver certain of each transmitted symbol. This quantity is also called the average equivocation of the received symbols. Equivocation is present in the channels of Figs. 2a and 2c, but not in Fig. 2b.

Irrelevance. The average rate at which the received symbols give information (subject matter unspecified) is the average self-information of the received symbols. From eq. (56), this is also greater than the average rate at which information is received about the transmitted symbols. The difference may be shown, as in eq. (68), to be

$$I(V) - I(U; V) = -\sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i, v_j) \log q(v_j \mid u_i) = I(V \mid U) \ge 0. \tag{69}$$

This quantity is the conditional entropy of $V$ given $U$. It measures the amount of received information not relevant to the transmitted information, but relevant only to the channel noise. The names "Spread," "Dispersion," and "Prevarication" have all been used for this quantity. "Irrelevance" seems more appropriate in the case, for example, of the channel of Fig. 2b, in which the receiver receives information which is irrelevant but not misleading. The channel of Fig. 2a has no irrelevance, but that of Fig. 2c does.

7. DISCRETE NOISY CHANNELS. II. CHANNEL CAPACITY AND INTERPRETATIONS

The Noisy Channel

Specification. Formally a noisy channel without storage is a graph, like those in Figs. 2, 8, and 9, in which each branch connects a transmitted symbol with a received symbol and has a number on it. The numbers are the conditional probabilities $q(v_j \mid u_i)$ which define statistically what the channel does to given input symbols. These numbers are fixed for a given channel, but the transmitter is free to decide how to use the input symbols.
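Given the branch probabilities $q(v_j \mid u_i)$ of such a graph and a choice of input probabilities, the averages introduced in Sect. 6 (rate, equivocation, and irrelevance) follow by direct summation. The sketch below is illustrative only: `information_measures` is a hypothetical name, and the two channels are the binary symmetric channel with $q = 1/9$ and a binary erasure channel with the branch values 0.9 and 0.1 read from Fig. 8.

```python
import math


def information_measures(p_u, q_v_given_u):
    """Average information measures, in bits, from p(u_i) and q(v_j | u_i)."""
    n_u, n_v = len(p_u), len(q_v_given_u[0])
    joint = [[p_u[i] * q_v_given_u[i][j] for j in range(n_v)] for i in range(n_u)]
    q_v = [sum(joint[i][j] for i in range(n_u)) for j in range(n_v)]

    def average(term):
        # average over the joint distribution, skipping pairs that never occur
        return sum(joint[i][j] * term(i, j)
                   for i in range(n_u) for j in range(n_v) if joint[i][j] > 0)

    return {
        "I(U)": average(lambda i, j: -math.log2(p_u[i])),
        "I(V)": average(lambda i, j: -math.log2(q_v[j])),
        "I(U;V)": average(lambda i, j: math.log2(joint[i][j] / (p_u[i] * q_v[j]))),
        "I(U|V)": average(lambda i, j: -math.log2(joint[i][j] / q_v[j])),  # equivocation
        "I(V|U)": average(lambda i, j: -math.log2(q_v_given_u[i][j])),     # irrelevance
    }


p, q = 8.0 / 9.0, 1.0 / 9.0
symmetric = information_measures([0.5, 0.5], [[p, q],
                                              [q, p]])
# Output order for the erasure channel: 0, X, 1.
erasure = information_measures([0.5, 0.5], [[0.9, 0.1, 0.0],
                                            [0.0, 0.1, 0.9]])
```

For both channels the results satisfy the two difference relations: the rate plus the equivocation equals the average self-information of the transmitted symbols, and the rate plus the irrelevance equals that of the received symbols.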
Only the channel with no storage will be considered.

Transmitter Strategy. The transmitter strategy formally is a random process, selected by the transmitter to generate sequences of transmitter symbols for transmission over the noisy channel. If an input message sequence is coded into a sequence of transmitter symbols, the random process which generates the messages and the operation of the coder may be combined to obtain the new random process which is the transmitter strategy.

In Sect. 4, eq. (38) et seq., it was pointed out that the self-information of a symbol can on the average only be reduced by statistical dependence on preceding symbols. The same is true of the mutual information provided by a received symbol about transmitted symbols for a channel with no storage. If the transmitter wants to maximize the average amount of mutual information received, he can do no better than to select successive transmitted symbols independently from some distribution $p(u_i)$. Then the problem of choosing a transmitter strategy reduces to the problem of choosing a first order distribution $p(u_i)$ for the transmitted symbols. Then $p(u_i)$ and the $q(v_j \mid u_i)$ together determine channel operation completely.

Capacity of a Noisy Channel. The channel capacity $C$ of a given noisy channel is defined as the maximum value of the transmission rate $R$ which can be obtained by varying the transmitter strategy. For a channel with no storage, this is the maximum $R$ which can be obtained by varying $p(u_i)$, with $q(v_j \mid u_i)$ fixed. Thus from eq. (69),

$$C = \max_{p(u_i)} R = \max_{p(u_i)} \sum_{i=1}^{n_u} \sum_{j=1}^{n_v} p(u_i)\, q(v_j \mid u_i) \log \frac{q(v_j \mid u_i)}{q(v_j)}, \tag{70}$$

where the values of $q(v_j \mid u_i)$ are held fixed as the $p(u_i)$ are varied, and the variation of $q(v_j)$ is determined from the relation

$$q(v_j) = \sum_{i=1}^{n_u} p(u_i)\, q(v_j \mid u_i). \tag{71}$$

The maximization is carried out by differentiating eq. (70) for $R$ with respect to each of the $p(u_i)$, subject to the constraint that

$$\sum_{i=1}^{n_u} p(u_i) = 1. \tag{72}$$
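For a channel with only two input symbols, the maximization in eq. (70) can be illustrated by a direct numerical scan over $p(u_1)$ in place of the differentiation described above. The Python sketch below is an added illustration under stated assumptions: the grid search, the helper names `rate` and `binary_input_capacity`, and the choice of an erasure-type channel with erasure probability 0.05 as a test case are not taken from the text.

```python
from math import log2


def rate(p_u, q_v_given_u):
    """Average mutual information R of eq. (70), in bits per symbol."""
    n_u, n_v = len(p_u), len(q_v_given_u[0])
    # Received-symbol distribution q(v_j), eq. (71).
    q_v = [sum(p_u[i] * q_v_given_u[i][j] for i in range(n_u)) for j in range(n_v)]
    total = 0.0
    for i in range(n_u):
        for j in range(n_v):
            p_ij = p_u[i] * q_v_given_u[i][j]
            if p_ij > 0:
                total += p_ij * log2(q_v_given_u[i][j] / q_v[j])
    return total


def binary_input_capacity(q_v_given_u, steps=1000):
    """Scan p(u_1) = k/steps; return (largest R found, the p(u_1) giving it)."""
    best_rate, best_p1 = -1.0, None
    for k in range(steps + 1):
        p1 = k / steps
        # Constraint (72) holds by construction: p(u_2) = 1 - p(u_1).
        r = rate([p1, 1.0 - p1], q_v_given_u)
        if r > best_rate:
            best_rate, best_p1 = r, p1
    return best_rate, best_p1


# Erasure-type channel: each input is received correctly with probability 0.95
# and replaced by an erasure symbol (middle column) with probability 0.05.
erasure_channel = [[0.95, 0.05, 0.0],
                   [0.0, 0.05, 0.95]]
capacity, p1 = binary_input_capacity(erasure_channel)
print(f"C = {capacity:.4f} bit per symbol at p(u_1) = {p1}")
```

Because the scan only visits non-negative probabilities, it sidesteps the negative-probability difficulty that can arise with the differentiation method, at the cost of being practical only for very small input alphabets.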
However, this maximization may lead to negative values for some of the $p(u_i)$. It is then necessary to eliminate one or more of the input symbols by setting its probability at zero, and to maximize again with a smaller input set, until a maximum $R$ is obtained with all $p(u_i)$ non-negative (Ref. 30).

Interpretations of Capacity

One interpretation of the capacity $C$ of a noisy channel is provided by its definition. In Example 1, Fig. 8, the binary erasure channel has a rate of transmission $R = p$ when 0's and 1's are transmitted with equal probabilities, as shown by eq. (63). This is the maximum rate attainable for this channel, and is therefore its capacity, $C = p$. One bit of information per symbol is supplied to the channel, and on the average $p$ bits of information about the transmitted symbol are received. But the transmission process is not reliable. Since information is being supplied to the channel at a rate greater than the channel capacity, not all of it can get through, and the channel determines in a random fashion which bits will be lost and which will be saved.

Feedback Interpretation. In the binary erasure channel, suppose that the transmitter can look over the receiver's shoulder and can see which of the transmitted symbols have been erased in the channel. This can be accomplished by having a noiseless feedback channel from receiver to transmitter. Every time a transmitted digit is erased the transmitter can repeat it, going on to the next digit as soon as the first unerased version of the preceding one has been received. The transmitter is now supplying the channel with information at an average rate of $p$ bits per transmitted symbol. The repeated digits do not give additional information about the message to be transmitted, but only about where erasures have occurred in transmission. The receiver receives information at the same average rate of $p$ bits per symbol, and receives each message symbol once.
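The repeat-until-received scheme just described can be simulated to see the average rate settle near $p$. The following Python sketch is an added illustration; the function name `feedback_rate`, the seed, and the number of message digits are arbitrary assumptions, and the erasure probability 0.05 is chosen only as an example value.

```python
import random


def feedback_rate(n_message_digits, q, seed=0):
    """Simulate repeat-until-received transmission over a binary erasure channel.

    Each transmitted digit is erased with probability q. Returns the number of
    message digits delivered per channel symbol transmitted.
    """
    rng = random.Random(seed)
    channel_symbols = 0
    for _ in range(n_message_digits):
        while True:
            channel_symbols += 1
            if rng.random() >= q:   # this copy was not erased
                break               # go on to the next message digit
    return n_message_digits / channel_symbols


q = 0.05
measured = feedback_rate(100_000, q)
print(f"measured rate {measured:.3f}, capacity p = {1 - q:.3f}")
```

Every message digit is delivered exactly once, so the only cost of the channel noise is the extra channel symbols spent on repetitions.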
In this feedback scheme, channel capacity has the interpretation it has in the case of a noiseless channel: it is the maximum average rate at which information can be transmitted reliably over the channel. In fact noiseless channel coding Theorem I: Controlled Source (Sect. 5) applies verbatim to this noisy channel.

Coding Interpretation. A noiseless feedback channel is not usually available. Fortunately it turns out that it is not needed. It is possible to obtain reliable transmission over a noisy channel, at any rate less than channel capacity, by proper encoding, without making use of any feedback information. This is the primary justification of "capacity" as a significant parameter for a noisy channel. Indeed it is perhaps the most important single justification for the definition of mutual information and the whole structure of information theory. The formal expression of this fact is the:

NOISY CHANNEL CODING THEOREM. Given: a discrete source of type I, II, or III (Sect. 5) with average rate $R_t$ bits per second; a discrete noisy channel without storage of capacity $C_t$ bits per second; and any $\delta > 0$. If $R_t < C_t$, it is possible to encode sequences of source symbols for transmission over the channel so that the probability that such a sequence will be incorrectly decoded is $< \delta$. This is not possible if $R_t > C_t$.

Relation to Noiseless Case. This theorem is essentially the second Noiseless Channel Coding Theorem, with a few minor modifications. Both of these theorems may be strengthened to give relations between the error probability $\delta$, the difference $C_t - R_t$ between rate and capacity, and the length $L$ of the sequence of source symbols which must be encoded. Results like the Tchebysheff Inequality and the Central Limit Theorem must be applied to both the source self-information distribution and the channel mutual information distribution. Some work has been done on these problems recently (Refs. 31-34).

Implications.
The Noisy Channel Coding Theorem shows that a lack of reliability in a channel does not impose a corresponding lack of reliability on the received and decoded messages. This alone is not surprising. For example in the binary erasure channel, transmitting at rate $R = 1/N$ bits per symbol by repeating each message binit $N$ times for transmission, the error probability per message binit is

$$q^N, \tag{73}$$

since the receiver can decode each message binit unless all $N$ repetitions are erased. The probability in eq. (73) can be made arbitrarily small, but only by letting $R \to 0$. The theorem also states, however, that error probability can be made arbitrarily small without decreasing rate $R$, so long as $R < C$, the channel capacity. This requires proper encoding of long sequences of source symbols.

Construction of Codes. There is as yet no analog to the Huffman code for noisy channels. Considerable work has been done in designing codes for the binary symmetric channel of Fig. 9 (Refs. 35-43). However, no simple, explicit coding procedure has yet been found for transmitting at rates arbitrarily close to channel capacity with arbitrarily small error probability. The only constructive procedure available transmits at rates less than capacity. However, if the rate is kept fixed, it is possible for the receiver to set the error probability as low as he desires, but this depends on how much delay he is willing to tolerate. This procedure has been discussed for the binary symmetric channel (Ref. 40). It will be illustrated here for the binary erasure channel of Fig. 8.

Error-Free Coding for the Binary Erasure Channel

In the binary erasure channel of Fig. 8 the error probability can be reduced by using some of the input binits as information symbols and some as check symbols. Such a coding procedure is illustrated in Fig. 10.

FIG. 10. Parity check coding for the binary erasure channel. (Rows shown: message; received noisy message, with erasures marked x; decoded message.)

Parity Check Codes. Assume that the probability $q$ of erasure per digit in the channel is 0.05. In Fig. 10, each group of four successive message digits has added to it by the coder a fifth digit for checking. The added digit is selected to be a 0 or a 1 so as to make the total number of 1's in the block of five coded digits even. A check digit of this type is called a parity check (Ref. 35). If the channel erases only one digit or none in each block of five, the receiver can correct the erasure by filling it in with a 0 or 1 so as to make the total number of 1's in the block even again.

The channel may erase two or more digits in a single block of five, as shown in the third block in Fig. 10. The receiver cannot correct that block. However, with $q = 0.05$, so that on the average only 5 per cent of the transmitted symbols are erased, the receiver will be able to correct more than three-quarters of the erasures. In fact, the average number of erasures per block of $N = 5$ remaining after correction is equal to $Nq$, the average number before correction, less $N p^{N-1} q$, the contribution to the average due to blocks containing a single erasure. This gives

$$Nq - N p^{N-1} q = Nq\,[1 - p^{N-1}] = Nq\,[1 - (1 - q)^{N-1}] < Nq\,[1 - (1 - q)^{N}] < Nq\,[1 - (1 - Nq)] = (Nq)^2. \tag{74}$$

Thus the number of erasures remaining is reduced from $Nq$ to less than $(Nq)^2$, that is, by a factor smaller than $Nq = 5(0.05) = 0.25$.

Behavior of First Stage. In the coding procedure illustrated in Fig. 10, input information is being supplied at an average rate of $4/5 = 0.80$ bit per input binit. The capacity of the channel is $C = p = 1 - q = 0.95$ bit per symbol. The resultant probability of remaining erasures is $< 1/4$ of the original probability of erasure in the channel. Without reducing the input information rate it is possible to reduce the probability of remaining erasures by iterating this kind of checking procedure.
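The single parity check procedure and the estimate of eq. (74) can be sketched in a few lines of Python. This is an added illustration, not the handbook's own construction; the function names are hypothetical, and an erased digit is represented by `None` as a convention chosen for the example.

```python
def add_parity(message_digits):
    """Append a check digit that makes the number of 1's in the block even."""
    return list(message_digits) + [sum(message_digits) % 2]


def fill_single_erasure(received):
    """Fill in a lone erased digit (None) so the block has even parity again.

    A block with no erasure, or with two or more, is returned unchanged.
    """
    erased = [k for k, digit in enumerate(received) if digit is None]
    repaired = list(received)
    if len(erased) == 1:
        repaired[erased[0]] = sum(d for d in received if d is not None) % 2
    return repaired


def remaining_erasures_per_block(n, q):
    """Left side of eq. (74): N q [1 - (1 - q)^(N - 1)]."""
    return n * q * (1 - (1 - q) ** (n - 1))


block = add_parity([1, 0, 1, 0])            # four message digits plus one check digit
repaired = fill_single_erasure([1, None, 1, 0, 0])
print(block, repaired)

n, q = 5, 0.05
remaining = remaining_erasures_per_block(n, q)
print(f"before: {n * q:.4f}  after: {remaining:.4f}  bound (Nq)^2: {(n * q) ** 2:.4f}")
```

For $N = 5$ and $q = 0.05$ the exact average lies below the bound $(Nq)^2$, so the bound in eq. (74) is somewhat conservative.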
The second step in such an iteration is illustrated by Fig. 11.

Iteration. In Fig. 11, the size of the basic block has been increased from 5 to $N_1 = 10$ digits, the first nine being message digits and the tenth being a parity check on the preceding nine. After nineteen such blocks of ten, the coder adds a twentieth check block. The first digit in the check block is a parity check on the first digits in each of the nineteen preceding blocks, i.e., digits 1, 11, 21, ..., 181 in time order. The second digit in the check block checks all the second digits in preceding blocks and so on; the last digit in the check block checks the nine preceding check digits, and it is in fact a parity check on the whole group of $(10)(20) = N_1 N_2 = 200$ digits. This is visualized most easily as in Fig. 11, in which the blocks are aligned below one another, and each digit in the check block checks the column above it. The last digit in the check block checks both the row to its left and the column above it.

The receiver decodes each row which has no more than a single erasure as soon as the check digit at the end of that row is received. If a row has more than one erasure, it can still be decoded properly when the check block arrives, if each column has only one erasure left after the first decoding step.
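The row-then-column procedure can be sketched as follows. This Python example is an added illustration under stated assumptions: the function names are hypothetical, erased digits are represented by `None`, and the random message and the positions of the erasures are arbitrary test data rather than the contents of Fig. 11.

```python
import random


def encode(message_rows):
    """Add a parity digit to every row, then a final row of column parities."""
    rows = [row + [sum(row) % 2] for row in message_rows]
    check_row = [sum(column) % 2 for column in zip(*rows)]
    return rows + [check_row]


def fill_single_erasures(lines):
    """Fill the erasure (None) in every line that has exactly one, using parity."""
    for line in lines:
        erased = [k for k, digit in enumerate(line) if digit is None]
        if len(erased) == 1:
            line[erased[0]] = sum(d for d in line if d is not None) % 2


def decode(received):
    """Correct by rows first, then by columns."""
    rows = [list(row) for row in received]
    fill_single_erasures(rows)
    columns = [list(column) for column in zip(*rows)]
    fill_single_erasures(columns)
    return [list(row) for row in zip(*columns)]


rng = random.Random(1)
message = [[rng.randint(0, 1) for _ in range(9)] for _ in range(19)]
sent = encode(message)                      # 20 rows of 10 digits each

received = [list(row) for row in sent]
received[3][2] = received[3][7] = None      # two erasures in one row
received[5][4] = None                       # a single erasure in another row
decoded = decode(received)
print(decoded == sent)
```

The row with two erasures cannot be repaired by its own check digit, but each of its erased digits is the only erasure in its column, so the check block recovers both.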
FIG. 11. Iterated checking for the binary erasure channel. (Panels shown: transmitted message; received noisy message; after correction by rows; after correction by columns.)

Erasure Probability. Since none of the digits appearing in a single column have ever been together in a check group before, they are statistically independent, and the distribution of erasures in each column is binomial again.
Define $q$ as the erasure probability in the channel, $q_1$ as the average erasure probability remaining after correction of rows, $q_2$ as the average erasure probability remaining after checking by columns, $p_1 = 1 - q_1$, $p_2 = 1 - q_2$. Then

$$N_1 q_1 = N_1 q - N_1 p^{N_1 - 1} q < (N_1 q)^2. \tag{75}$$

For $q = 0.05$, $N_1 = 10$, this gives $q_1 < \tfrac{1}{2} q$. For $N_2 = 20$, $q_2 < \tfrac{1}{2} q_1$.

Further Iteration. The next step, keeping $N_1 = 10$ and $N_2 = 20$, is to add a check layer of 200 check digits after 39 layers of 20 blocks of ten digits each have been transmitted. This third order check will again multiply the erasure probability by a factor $< 1/2$. This procedure can be repeated indefinitely, giving for the $k$th order check a remaining erasure probability $q_k$:

$$q_k < (N_k q_{k-1})\, q_{k-1} < \tfrac{1}{2} q_{k-1}, \qquad N_k = 2 N_{k-1} = 2^{k-1} N_1, \qquad \text{or} \qquad q_k < 2^{-k} q. \tag{76}$$

In the limit as $k \to \infty$, the remaining erasure probability becomes arbitrarily small.

The rate of transmission in bits per symbol is just the fraction of input symbols which are message digits and not check digits. This is

$$\begin{aligned}
R &= \left(1 - \tfrac{1}{10}\right)\left(1 - \tfrac{1}{20}\right)\left(1 - \tfrac{1}{40}\right) \cdots \\
&> 1 - \left(\tfrac{1}{10} + \tfrac{1}{20} + \tfrac{1}{40} + \cdots\right) \\
&= 1 - \tfrac{1}{10}\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) = 1 - \tfrac{2}{10} = 0.80.
\end{aligned} \tag{77}$$

Thus the rate is at least as great as the rate in the simple block checking scheme of Fig. 10, but the error probability is as low as the receiver cares to set it if the transmitter adds the check digits of all orders, and if the receiver is willing to wait long enough for a sufficiently high order check to come along before decoding.

Relation between Error Probability, Rate, and Delay. The iterative coding procedure just discussed is not optimum. However, it shares two characteristics with optimum systems.
1. The reliability attained increases, for fixed rate, as the permitted coding delay increases.
2. The reliability attained increases, for fixed delay, as the required transmission rate decreases.
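The behavior of eqs. (76) and (77) as the order of checking grows can be tabulated directly. The Python sketch below is an added illustration; the function name `iterated_scheme` is hypothetical, and the quantity it tracks for the erasure probability is the upper bound of eq. (76), not the exact remaining probability.

```python
def iterated_scheme(q, n1, orders):
    """Return (rate, bound on q_k) after `orders` levels of checking.

    Block sizes double at each order, N_k = 2 N_(k-1), as in eq. (76).
    """
    rate, q_bound, n_k = 1.0, q, n1
    for _ in range(orders):
        rate *= 1.0 - 1.0 / n_k        # one digit in N_k is a check digit, eq. (77)
        q_bound = n_k * q_bound ** 2   # q_k < (N_k q_(k-1)) q_(k-1), eq. (76)
        n_k *= 2
    return rate, q_bound


for k in (1, 2, 5, 10, 20):
    rate, q_bound = iterated_scheme(q=0.05, n1=10, orders=k)
    print(f"k = {k:2d}   rate = {rate:.4f}   q_k < {q_bound:.3e}")
```

With $q = 0.05$ and $N_1 = 10$ the bound halves at every order while the rate stays above 0.80, which is the trade between delay and reliability at fixed rate described in item 1 above.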
For any noisy channel there is a trading relationship between the probability $P_e$ of residual error, the permissible delay $N$, the transmission rate $R$, and the channel capacity $C$. Here $N$ is the number of symbols delay permitted between the transmission of a given symbol and the computation of its decoded version. The best terms of trade can be shown to give an approximately exponential decrease of error probability with delay:

$$-\log P_e \approx N\, x(C, R), \tag{78}$$

in the sense that

$$\lim_{N \to \infty} \left( \frac{-\log P_e}{N} \right) = x(C, R) \tag{79}$$

exists for $C > R$ as a positive number. The function $x(C, R)$ is called the exponent of the error probability. For $C - R$ small but positive, the exponent is approximately given by

$$x(C, R) \approx \frac{(C - R)^2}{2 \sigma_I^2}, \tag{80}$$

where $\sigma_I^2$ is the variance of the mutual information distribution for the given channel, and for the transmitter distribution $p(u_i)$ which attains capacity (Refs. 31, 32, 34, 41).

8. THE CONTINUOUS CASE

Continuous Sources

A waveform like that of Fig. 12 shows two kinds of continuity. It takes on a continuum of amplitude values, and its amplitude changes continuously with time.

FIG. 12. A continuous waveform. (Axes: signal amplitude vs. time.)

Quantization. The amplitude continuity may be removed by amplitude quantization, as in Fig. 13. This may be accomplished by a quantizer, which has an amplitude transfer characteristic of the staircase type as illustrated. The output is a waveform whose amplitude values are selected from a discrete set, but whose jumps from one value to another occur at arbitrary times. The difference between the input signal and the quantized output is the quantization noise (Ref. 44).

FIG. 13. An amplitude-quantized waveform. (Axes: signal amplitude vs. time, with the quantizing levels marked.)

Sampling. The time continuity may be removed by making periodic observations of the amplitude of the waveform, deriving from the continuous time function a sequence of sample values.
The period of the sampling is called the sampling interval. The resultant samples still have amplitudes selected from a continuous set.

SAMPLING THEOREM. If a waveform $x(t)$ is bandlimited to frequencies between 0 and $W$ cycles per second, then it is completely determined by its samples $x(kT)$ taken at a sampling interval $T = 1/(2W)$ seconds. The function $x(t)$ may be re-created from its sample values by the expansion

$$x(t) = \sum_{k=-\infty}^{\infty} x(kT)\, \frac{\sin 2\pi W (t - kT)}{2\pi W (t - kT)}. \tag{81}$$

FIG. 15. Binary pulse-code modulation. (Transmitted waveform vs. time.)

Self-Information of Continuous Signals. In a sampled sequence which is not quantized, each sample value is selected from an infinite set and may have infinite self-information. If the samples $x(kT)$ are statistically independent and have the probability density $p(x)$, then $-\log p(x)$ and its average value

$$H(X) = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx \tag{82}$$

are still definable in many cases, but they no longer represent information values. The quantity $H(X)$ is still called the entropy of the distribution with density $p(x)$, but is no longer the average self-information of the source per sample. The entropy function in the continuous case is not invariant under a change in the scale by which the amplitude $x$ is measured. The infinite self-information associated with a selection from a continuous set arises from the fact that the selection of a single real number between 0 and 1 is equivalent to the selection of an infinite sequence of binary digits, namely the binary expansion of the real number, and conversely.

Continuous Noisy Channels

Only stationary channels, without storage, and with bandwidth limited from 0 to $W$ cycles per second will be considered. Such a channel is defined by a conditional probability density function $q(y \mid x)$. For a given value $x$ of the transmitted sample, $q(y \mid x)$ gives the density of the distribution of possible received values $y$.

EXAMPLE. Additive Noise. Let $z$ be a noise voltage selected with probability density $r(z)$, and let the received signal $y$ be the sum of the transmitted signal and the noise, $y = x + z$. Then

$$q(y \mid x) = q(x + z \mid x) = r(z) = r(y - x). \tag{83}$$

Thus a continuous channel with bandwidth $W$ and additive noise is completely specified by the distribution of the noise which is added.

Mutual Information and Rate. If each transmitted sample value $x$ is selected from a probability density $p(x)$, and the channel is specified by the conditional density $q(y \mid x)$, the joint density $p(x, y) = p(x)\, q(y \mid x)$ defines both the channel and the transmitter strategy, in analogy to the discrete case. The probability density $q(y)$ of the received sample values is then given by

$$q(y) = \int_{-\infty}^{\infty} p(x, y)\, dx. \tag{84}$$

The random variable

$$I(x; y) = \log \frac{p(x, y)}{p(x)\, q(y)} \tag{85}$$

is again defined as the mutual information between $x$ and $y$, and its average value

$$R = I(X; Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log \frac{p(x, y)}{p(x)\, q(y)}\, dx\, dy \tag{86}$$

is the average rate of transmission of mutual information, in bits per sample. This measure retains its informational significance while self-information does not, because mutual information is invariant to a change in the scale on which both the transmitted sample $x$ and the received sample $y$ are measured.

EXAMPLE. Additive Noise. As before, let $y = x + z$, where $z$ is an added noise, statistically independent of $x$. Then by eqs. (83) and (86),

$$\begin{aligned}
R = I(X; Y) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log \frac{q(y \mid x)}{q(y)}\, dx\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log \frac{r(y - x)}{q(y)}\, dx\, dy \\
&= -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log q(y)\, dx\, dy + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log r(y - x)\, dx\, dy \\
&= H(Y) - H(Z).
\end{aligned} \tag{87}$$

Channel Capacity. The capacity $C$ of a noisy continuous channel is the maximum value of the rate $R$ which may be obtained in eq. (86) by varying the probability density $p(x)$, with which the transmitted sample values are chosen.
The variation is usually constrained so as to keep constant a given peak or mean square value of the time function $x(t)$, which is determined through eq. (81) by the sample values. In general, finding $C$ is a difficult variational problem.

EXAMPLE. Additive Noise. By eq. (87), the rate $R$ in a channel with independent additive noise is the difference between the entropy of the distribution of received signal values $y$ and the entropy of the distribution of the additive noise $z$. For a given channel, the noise distribution is fixed, so that maximizing the rate reduces to the maximization of $H(Y)$ by variation of $p(x)$: thus from eqs. (83) and (84),

$$\max_{p(x)} H(Y) = \max_{p(x)} \left\{ -\int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} p(x)\, r(y - x)\, dx \right] \log \left[ \int_{-\infty}^{\infty} p(x)\, r(y - x)\, dx \right] dy \right\}, \tag{88}$$

subject to constraints on peak or average transmitter power. This problem is difficult.

Entropy of Gaussian Distribution. If the sample values $x(kT)$ of a bandlimited function $x(t)$ are selected with statistical independence from a distribution with density $p(x)$, the mean square value of the time function $x(t)$ is equal to the mean square value of the samples (Ref. 44), which is given by

$$\overline{x^2} = \int_{-\infty}^{\infty} x^2 p(x)\, dx. \tag{89}$$

Thus $\overline{x^2}$ is the signal power $S$. For $\overline{x^2} = S$ fixed, the distribution with maximum entropy is Gaussian (Ref. 16), with

$$p(x) = \frac{1}{\sqrt{2\pi S}}\, e^{-x^2/2S} \tag{90}$$

and entropy, from eq. (82), given by

$$\begin{aligned}
H(X) &= -\frac{1}{\sqrt{2\pi S}} \int_{-\infty}^{\infty} e^{-x^2/2S} \left( \log \frac{1}{\sqrt{2\pi S}} - \frac{x^2}{2S} \log e \right) dx \\
&= \log \sqrt{2\pi S} + \tfrac{1}{2} \log e = \tfrac{1}{2} \log 2\pi e S.
\end{aligned} \tag{91}$$

Gaussian Additive Noise. If an added bandlimited noise $z$ is statistically independent of the signal $x$, and either $x$ or $z$ has zero mean value,

$$\overline{y^2} = \overline{x^2} + \overline{z^2} = S + N, \tag{92}$$

where $N$ is the mean noise power. Thus if the channel is given and the average transmitter power is constrained, the received power is determined by eq. (92). The entropy $H(Y)$ in eq. (87) will then be maximized if $q(y)$ is Gaussian with variance $S + N$.
However, the sum of two independent random variables cannot have a Gaussian distribution unless both random variables themselves have Gaussian distributions (Ref. 49). Thus only if the noise is Gaussian can the transmitter select a (Gaussian) $p(x)$ which will lead to a maximum $H(Y)$. The rate will then be

$$\begin{aligned}
R &= H(Y) - H(Z) \\
&= \tfrac{1}{2} \log 2\pi e (S + N) - \tfrac{1}{2} \log 2\pi e N \\
&= \tfrac{1}{2} \log (1 + S/N) \text{ bits per sample.}
\end{aligned} \tag{93}$$

Rewriting eq. (93) on a bits-per-second basis gives:

CAPACITY OF A CHANNEL WITH ADDITIVE GAUSSIAN NOISE. Given a channel bandlimited from 0 to $W$ cycles per second, with an average transmitter power $S$, perturbed by additive white Gaussian noise of total power $N$, its capacity is

$$C = W \log (1 + S/N) \text{ bits per second.} \tag{94}$$

The restriction to white noise (noise which has a uniform spectral density in the interval 0 to $W$ cycles) is required in order that successive samples of noise be statistically independent. If they are not, the capacity will be greater than that given by eq. (94).

Dependence of Capacity on Bandwidth. Holding $S$ fixed and increasing $W$, the noise power $N$ increases with $W$, since noise power in frequencies previously rejected now enters the channel. For thermal noise and shot noise, the noise power $N$ is directly proportional to $W$:

$$N = N_0 W \text{ watts,} \tag{95}$$

where $N_0$ is the noise power per cycle bandwidth. Substituting this in eq. (94) gives

$$C = W \log \left( 1 + \frac{S}{N_0 W} \right), \tag{96}$$

which is plotted in Fig. 16.

FIG. 16. Channel capacity and bandwidth. (Horizontal axis: $W/(S/N_0)$; vertical axis: $C$ in bits; the asymptote $\log_2 e$ bits = 1.44 bits = 1 nat is marked.)

Interpretations of Capacity

Some interpretation of the capacity of a continuous noisy channel is required, since the channel can accept input information at an infinite rate, but can only transmit information about its input to the receiver at a finite rate.

Discrete Input Interpretation. One interpretation is provided by the fact that the noisy channel coding theorem still applies to the continuous noisy channel.
The output of a discrete source may be encoded for transmission over a continuous noisy channel at any rate less than channel capacity, and the receiver can then decode the received signal with arbitrarily small error probability. In this case the transmitted signals may be continuous waveforms, but they are selected from a finite set, and therefore have finite self-information (Ref. 45).

Quantization Interpretation. If the receiver cannot distinguish between transmitted waveforms which are very near to one another, it is not necessary to transmit the precise waveform generated by a continuous source: some "near-by" waveform will do. The quantization process discussed earlier shows one procedure for selecting a near-by waveform. The transmitted waveforms of any finite duration are then a discrete set, and one may be selected by the receiver with small error probability despite the noisy channel. Other measures of distance, or fidelity of reproduction, have been introduced and studied (Refs. 16, 46).

Reduction of Ignorance Interpretation. A final interpretation also carries over from the discrete case. If successive samples are statistically independent, the receiver knows a priori that $x$ will be selected from $p(x)$. A posteriori the true value of $x$ is selected from $p(x \mid y)$, a narrower distribution with less entropy. The change in entropy,

$$H(X) - H(X \mid Y), \tag{97}$$

measures the average reduction in the receiver's ignorance of the value of the transmitted sample (Refs. 16, 29).

Generalizations. The analysis of the continuous case has been extended to cases of mixed type, i.e., to distributions which have discrete probabilities as well as densities (Refs. 16, 50, 51), and some discussion has been given of the nonbandlimited case (Ref. 46).

REFERENCES

1. Y. Bar-Hillel and R. Carnap, Semantic information, in Jackson, Editor, Communication Theory, Butterworths, London, 1953.
2. W. J. McGill, Multivariate information transmission, Trans.
I.R.E., PGIT-4, 93-111, Sept. 1954.
3. S. Kullback, An application of information theory to multivariate analysis, Ann. Math. Stat., 23, 88-102, March 1952.
4. B. Mandelbrot, Simple games of strategy occurring in communication through natural languages, Trans. I.R.E., PGIT-3, 124-137, March 1954.
5. M. P. Schutzenberger, On some measures of information used in statistics, in C. Cherry, Editor, Information Theory, Butterworths, London, 1956.
6. R. A. Fisher, Theory of statistical estimation, Proc. Cambridge Phil. Soc., 22, 700-725 (1925).
7. S. O. Rice, Mathematical analysis of random noise, Bell System Tech. J., 23, 282-332 (July 1944); 24, 46-156 (Jan. 1945). Reprinted in N. Wax, Editor, Noise and Stochastic Processes, Dover, 1954.
8. N. Wiener, The Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Technology Press and Wiley, New York, 1949.
9. D. Middleton and D. Van Meter, Detection and extraction of signals in noise from the point of view of statistical decision theory, J. Soc. Ind. Appl. Math., 3, 192-253 (Dec. 1955); 4, 86-119 (June 1956).
10. W. W. Peterson, T. G. Birdsall, and W. C. Fox, The theory of signal detectability, Trans. I.R.E., PGIT-4, 171-212, Sept. 1954.
11. C. Cherry, On Human Communication, Technology Press and Wiley, 1957, esp. p. 247, footnote.
12. R. Jacobson, G. Fant, and M. Halle, Preliminaries to speech analysis, M.I.T. Acoust. Lab. Rept. 13, 1952.
13. R. V. L. Hartley, Transmission of information, Bell System Tech. J., 7, 535-563 (July 1928).
14. R. M. Fano, Statistical Theory of Information, Technology Press, Cambridge, Mass., 1957.
15. M. J. E. Golay, Bits and binits, Proc. I.R.E., 42, 1452 (Sept. 1954).
16. C. E. Shannon, A mathematical theory of communication, Bell System Tech. J., 1948, as reprinted in C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, Univ. of Illinois Press, Urbana, 1949. See p. 28.
17. R. M. Fano, The transmission of information, M.I.T.
Research Lab. Electronics Tech. Rept. 65, March 1949.
18. D. A. Huffman, A method for the construction of minimal-redundancy codes, Proc. I.R.E., 40, 1098-1101 (Sept. 1952).
19. A. A. Sardinas and G. W. Patterson, A necessary and sufficient condition for unique decomposition of coded messages, I.R.E. Convention Record, Pt. 8, 104-109, March 1953.
20. B. Mandelbrot, On recurrent noise limiting coding, in Proceedings of the Symposium on Information Networks, Polytechnic Institute of Brooklyn, New York, 1955.
21. M. P. Schutzenberger, On an application of semi-group methods to some problems in coding, Trans. I.R.E., IT-2, 47-60, Sept. 1956.
22. L. K. Kraft, A Device for Quantizing, Grouping and Coding Amplitude-Modulated Pulses, S.M. Thesis, Elec. Eng. Dept., M.I.T., 1949.
23. B. McMillan, Two inequalities implied by unique decipherability, Trans. I.R.E., IT-2, 115-116, Dec. 1956.
24. B. Mandelbrot, Diagnostic et transduction en l'absence de bruit, Institut de Statistique de l'Université de Paris, Paris, 1955.
25. Shannon, op. cit. in (Ref. 16), p. 22.
26. N. Chomsky, Three models for the description of language, Trans. I.R.E., IT-2, 113-124, Sept. 1956.
27. N. Chomsky, Syntactic Structures, Mouton and Co., London, 1957.
28. Shannon, op. cit. in (Ref. 16), p. 26.
29. P. M. Woodward, Probability and Information Theory, with Applications to Radar, McGraw-Hill, New York, 1953.
30. S. Muroga, On the capacity of a discrete channel, J. Phys. Soc. Japan, 8, 484-494 (1953).
31. A. Feinstein, A new basic theorem in information theory, Trans. I.R.E., PGIT-4, 2-22, Sept. 1954.
32. C. E. Shannon, The rate of approach to ideal coding (abstract only), I.R.E. Convention Record, Pt. 4, 47, March 1955.
33. P. Elias, Coding for noisy channels, I.R.E. Convention Record, Pt. 4, 37-46, March 1955.
34. C. E. Shannon, Certain results in coding theory for noisy channels, Information and Control, 1, 6-25 (Sept. 1957).
35. R. W.
Hamming, Error detecting and error correcting codes, Bell System Tech. J., 29, 147-160 (1950).
36. M. Plotkin, Binary codes with specified minimum distance, Univ. Penna. Moore School Research Div. Rept. 51-20, 1951.
37. M. J. E. Golay, Binary coding, Trans. I.R.E., PGIT-4, 23-28, 1954.
38. E. N. Gilbert, A comparison of signalling alphabets, Bell System Tech. J., 31, 504-522 (1952).
39. I. S. Reed, A class of multiple error-correcting codes and the decoding scheme, Trans. I.R.E., PGIT-4, 38-49, Sept. 1954.
40. P. Elias, Error-free coding, Trans. I.R.E., PGIT-4, 29-37, Sept. 1954.
41. P. Elias, Coding for two noisy channels, in C. Cherry, Editor, Information Theory, Butterworths, London, 1956.
42. D. Slepian, A class of binary signalling alphabets, Bell System Tech. J., 35, 203-234 (Jan. 1956).
43. D. Slepian, A note on two binary signalling alphabets, Trans. I.R.E., IT-2, 84-86, June 1956.
44. W. R. Bennett, Spectra of quantized signals, Bell System Tech. J., 27, 446-472 (July 1948).
45. C. E. Shannon, Communication in the presence of noise, Proc. I.R.E., 37, 10-21 (Jan. 1949).
46. A. N. Kolmogorov, On the Shannon theory of information transmission in the case of continuous signals, Trans. I.R.E., IT-2, 102-108, Dec. 1956.
47. P. Elias, Predictive coding, Trans. I.R.E., IT-1, 16-33, March 1955.
48. B. M. Oliver, J. R. Pierce, and C. E. Shannon, The philosophy of PCM, Proc. I.R.E., 36, 1324-1331 (1948).
49. H. Cramer, Random variables and probability distributions, Cambridge Tracts in Math. No. 36, Cambridge, England, 1937.
50. S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Stat., 22, 79-86 (March 1951).
51. K. H. Powers, A unified theory of information, M.I.T. Research Lab. Electronics Tech. Rept. 311, Feb. 1956.
52. J. L. Doob, Stochastic Processes, Wiley, New York, 1953, esp. p. 89.

D INFORMATION THEORY AND TRANSMISSION

Chapter 17

Smoothing and Filtering

Pierre Mertz

1. Definitions: Smoothing and Prediction. Symbols
2. Definitions: Correlation
3. Relationship between Correlation and Signal Structure
4. Design of Optimum Filter
5. Extensions of Procedure
6. Network Synthesis
References

1. DEFINITIONS: SMOOTHING AND PREDICTION. SYMBOLS

Time Sequence of Data. A plot is shown in Fig. 1 of a small portion of a time sequence of data, $f(t)$. Such a time sequence may also be represented by an electrical signal, in which the variable is a voltage or current. The sequence of data may be taken only at successive discrete intervals of time, instead of continuously. This is illustrated by the discrete ordinates indicated in Fig. 1.

FIG. 1. Variation of a physical quantity with time.

Stationary Time Sequence of Data. The variation of a physical quantity with time constitutes a continuous time sequence of data. The values form a distribution. If this distribution does not show a long-range trend with time, the time sequence is said to be stationary. (See Chap. 13.)

A quasi-stationary time sequence is one whose distribution is statistically stationary (i.e., shows no trend) in the short range but not in the long range.

Errors or Noise. There is usually a random error in the determination of a given physical quantity, or in its representation by a given electrical signal. This is illustrated by the erratic solid line of Fig. 2. The dotted line is the same plot as Fig. 1. A random error may be considered as added to the actual physical quantity. If there is no long time trend in the distribution of the random error, it is also a stationary sequence. In electrical signals the added sequence is usually called "noise," and in industrial processes, "disturbances."

FIG. 2. Variation of a physical quantity, with superposed error, with time. (Axes: amplitude vs. time, with the instants $t$ and $t + T$ marked.)

Smoothing Problem. For data with random errors such as Fig.
2, an averaging process could be used to reduce the error. This assumes that the physical sequence is "smoother" than the data sequence. In an electrical signal representing the data such an averaging process can be carried out by a filter.

Optimum Filter. There is likely to be some filter design which has an optimum frequency-response characteristic. If the filter suppresses the rapid departures too much, it also suppresses some real variations in the physical quantity represented by the data. If, on the other hand, it does not suppress them sufficiently, it is not reducing the error as much as is feasible. For a classical theoretical analysis by Wiener see Ref. 1.

The optimum filter as designed by mathematical theory is not usually critical. In practice an elementary filter is generally devised which approximates it and gives almost equal performance.

Predicting Problem. It is occasionally desirable not only to smooth the data, but also to extrapolate or predict the data. For example in Fig. 2 it may be needed, at time $t$, to predict the most likely signal which will occur at time $t + T$. This is feasible because the variation of the physical quantity is restrained by physical laws, and it does not have complete or random liberty of action. Wiener's analysis indicates that prediction may also be effected with a filter. An optimum design is secured from nearly the same formulation as that used for the smoothing process.

FIG. 3. Amplitude and phase characteristics vs. radian frequency $\omega$ of an optimum filter.

EXAMPLE. Prediction Filter. The transfer amplitude response and transfer phase shift of a smoothing and predicting filter designed according to the Wiener theory are presented in Fig. 3. The ordinates are shown as functions of the radian frequency $\omega$. The filter is designed for a prediction time of 1 second, and for the signal and noise power spectrum illustrated in Fig. 4. More details regarding the problem and the design are given below.

FIG. 4. Signal and noise power spectra (power density vs. radian frequency $\omega$; the curve for noise alone is marked).

Prediction: Discrete Data. When the data are discrete rather than continuous, the solution is not simply embodied in the form of an electrical filter. The solution describes instead a mathematical process that accomplishes an analogous averaging and predicting effect.

Basis of Treatment. In the present exposition, the Wiener theory is followed (Ref. 1). Advantage is taken of the treatments of Levinson (Ref. 2) and of Bode and Shannon (Ref. 3) in simplifying the presentation. It is to be recognized that much other work has been done on the subject. At the close of the chapter some of this work, particularly recent development, is noted.
SYInbois A,B An A(w) a, b, c B(w) bi C C,D,E,F Ci e F(w) j, j(t) g, g(t) gi h(t) I, II i I(w) J,H j, h K(w) Ko(x) k(t) Li M,N m,n p Q(w) constants numerator of partial fraction amplitude part of transfer response, in nepers portions of an integral phase of transfer response, in radians coefficient of pi in polynomial parameter numerators of partial fractions, sometimes with subscript indices capacitance of jth element in filter base of N apierian logarithms signal correlation function signal amplitude noise amplitude coefficient of p in continued fraction total instantaneous wave amplitude integrals, forming part of m~re extensive formulas v=i current, function of frequency limits of summation indices summation indices transfer response function of frequency, of Wiener filter Bessel function of second kind, pure imaginary argument transfer response function of time, of Wiener filter inductance of jth element in filter limits of summation indices indices Heaviside operator = iw factor of (W) cI>n(w) cI>jf(W), cI>gg(w) cP cp(r) CPn(r) cpjf(r), cpgg(r) cpjg(r), cpg/(r) 'l'(w) 'l'*(W) 'l'n(W), 'l'n *(W) 1/;(r),1/;*(r) W 17-05 Fourier transform of Q(w) resistance variable of integration, for radian frequency prediction time time variable variable of integration, for radian frequency voltage, function of frequency voltage, function of time voltage amplitude numerators of partial fractions transfer response function of frequency admittance, function of frequency impedance, function of frequency driving point impedance, function of operator P transfer impedance, function of operator P transfer impedance, function of frequency zeros of polynomials, with imaginary parts > 0 complex conjugates of corresponding a's and (3's constant integration limit on time variable constant natural logarithm of'l'(w) indices time variable, for correlation or integration correlation spectrum, Fourier transform of cp(r) correlation spectrum of nth derivative of input function autocorrelation 
spectra phase shift correlation function correlation function of nth derivative of input function autocorrelation functions cross-correlation functions factor of $\Phi(\omega)$ for which singularities have imaginary parts > 0 complex conjugate of $\Psi(\omega)$ corresponding factors of $\Phi_n(\omega)$ Fourier transforms of $\Psi(\omega)$, $\Psi^*(\omega)$ radian frequency zero of polynomial, with imaginary part > 0 complex conjugate of $\omega_j$

2. DEFINITIONS: CORRELATION

Autocorrelation Function. The amplitude of a signal at any given time is not wholly independent of its value at other times. The correlation that exists may be expressed in terms of an autocorrelation function. In Fig. 5 the signal amplitude $f$ is measured at times $t$ and $t + \tau$. The autocorrelation function $\varphi(\tau)$ of a signal is the average product of the signal at time $t$ and the signal at time $t + \tau$, averaged over a period of time long enough to smooth out instantaneous fluctuations. With the overscribed bar to indicate averaging,

$$\varphi(\tau) = \overline{f(t)\,f(t+\tau)},$$

or

$$\varphi(\tau) = \lim_{\theta\to\infty}\frac{1}{2\theta}\int_{-\theta}^{\theta} f(t)\,f(t+\tau)\,dt, \tag{1}$$

where $\tau$ = finite time shift.

FIG. 5. Data taken for determination of autocorrelation coefficient.

The autocorrelation function is a measure of the extent to which the value of $f(t)$ at any given time can be used to predict $f(t)$ at a time interval $\tau$ later.

EXAMPLE. An autocorrelation function is illustrated in Fig. 6. This is the one assumed for the signal whose spectrum is illustrated in Fig. 4.

FIG. 6. Autocorrelation function (correlation vs. correlation time $\tau$).

The autocorrelation function in general is at a maximum at $\tau = 0$, and it usually drops off to zero or nearly zero for large values of $\tau$. It shows even symmetry about $\tau = 0$.
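The two properties just stated (maximum at zero shift, even symmetry) can be checked on sampled data by replacing the time average of eq. (1) with an average over a finite record. This is a minimal sketch under stated assumptions: the test signal is a recursively smoothed random sequence invented for the illustration, and `autocorrelation` is a hypothetical helper name.

```python
import random


def autocorrelation(f, lag):
    """Finite-record estimate of the average product f(m) * f(m + lag)."""
    n = len(f)
    products = [f[m] * f[m + lag] for m in range(n) if 0 <= m + lag < n]
    return sum(products) / len(products)


random.seed(1)  # fixed seed so the experiment is repeatable
# Assumed test signal: each sample keeps 0.9 of the previous one plus a
# fresh random term, so neighboring samples are strongly correlated.
f = [0.0]
for _ in range(4999):
    f.append(0.9 * f[-1] + random.gauss(0.0, 1.0))

phi = {lag: autocorrelation(f, lag) for lag in range(-20, 21)}
for lag in (0, 1, 5, 20):
    print(f"lag {lag:2d}: phi = {phi[lag]:.3f}, phi(-lag) = {phi[-lag]:.3f}")
```

The estimate for a negative shift uses the same pairs of samples as the corresponding positive shift, which is why the symmetry holds exactly here and not just on the average.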
The peak of the autocorrelation function is broad when $f(t)$ contains primarily low frequencies and narrow when $f(t)$ contains primarily high frequencies.

Power Density Spectrum. It is useful to deal with the Fourier transform of the autocorrelation function $\varphi(\tau)$, which may be called the autocorrelation spectrum, $\Phi(\omega)$. This is

$$\Phi(\omega) = \int_{-\infty}^{\infty}\varphi(\tau)\,e^{-i\omega\tau}\,d\tau, \tag{2}$$

and reciprocally

$$\varphi(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)\,e^{i\omega\tau}\,d\omega. \tag{3}$$

Wiener (Refs. 1 and 4) has identified the autocorrelation spectrum

0, = complex conjugates of $\alpha$, $\beta$. The $\alpha$'s are the zeros, and the $\beta$'s poles, of the function $|Y(\omega)|^2$. A plot of these, for the function plotted in Fig. 7, is given in Fig. 13. There are only poles, one being above, and the other below, the axis of real $\omega$'s. The response indicated by Fig. 9, with zero phase shift, is not represented by a rational function, and hence it cannot be described simply by zeros and poles in the complex frequency plane.

FIG. 13. Singularities of rational correlation function (imaginary part vs. real part of $\omega$; zeros marked o, poles marked x).

Location of Zeros and Poles for Minimum Phase Network. From Bode and Shannon (Ref. 3), the minimum phase network has the transfer response function

$$Y(\omega) = K\,\frac{(\omega-\alpha_1)(\omega-\alpha_2)\cdots}{(\omega-\beta_1)(\omega-\beta_2)\cdots}. \tag{13}$$

The zeros and poles are all in the upper half-plane of the complex frequency space. A plot of the zeros and poles which applies to the amplitude response of Fig. 9 and phase shift of Fig. 11 is shown in Fig. 14. In this specific case there are no zeros and only one pole, which is in the upper half-plane.

FIG. 14. Singularities of physically realizable shaping network (imaginary part vs. real part of $\omega$; zeros marked o, poles marked x).

4. DESIGN OF OPTIMUM FILTER

Criterion of Optimization. A criterion of performance is necessary to judge when one filter design is better than another. That used by Wiener is based on a comparison between the filter output and the actual signal, freed of noise, at the extrapolated time.
The difference, or error, is measured as a function of time, and its root-mean-square value determined. The optimum filter is taken as that for which the root-mean-square error (Chap. 13) is a minimum. This assumption is important to the development of the theory.

Wiener Solution, Smoothing and Prediction. The optimum filter proposed by Wiener has a frequency response which may be designated as $K(\omega)$. This has a Fourier transform $k(t)$. The reciprocal relations between the two are:

$$K(\omega) = \int_{-\infty}^{\infty} k(t)\,e^{-i\omega t}\,dt, \tag{14}$$

$$k(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} K(\omega)\,e^{i\omega t}\,d\omega. \tag{15}$$

In the simple Wiener solution the transfer response of the optimum filter has the value

$$K(\omega) = \frac{1}{2\pi\Psi(\omega)}\int_{0}^{\infty} e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{\Phi_{ff}(u)}{\Psi^*(u)}\,e^{iu(t+T)}\,du. \tag{16}$$

Here
(w) of eqs. (8) and (9). The factors are taken, as in eqs. (12) and (13), such that (w) assumed known, is given by Levinson (Ref. 2) as (19) (20) _1 X(w) - ~ 7r 00 0 w log rI>(s) 2 2 ds. w-s 'l'*(w) = rI>(w)/'l'(w). Identification with Minimum Phase Network. The connection of eq. (17) with eqs. (12) and (13) identifies 'l'(w) with the transfer response characteristic Yew) of the minimum phase shaping network which has the amplitude response characteristic I Yew) I = vi
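$$\Phi_{ff}(\omega) = \frac{1}{\omega^2+1}. \tag{27}$$

The Fourier transform is found in Campbell and Foster (Ref. 7). This is tabulated with the argument $p = i\omega$ instead of $\omega$, so that

$$\Phi_{ff}(p) = \frac{-1}{p^2-1}. \tag{28}$$

From pair 444 (Ref. 7)

$$\varphi_{ff}(\tau) = \tfrac{1}{2}\,e^{-|\tau|}. \tag{29}$$

This is represented in Fig. 6, ignoring the factor $\tfrac{1}{2}$.

For the noise alone:

$$\Phi_{gg}(\omega) = \frac{\varepsilon^2}{(\omega/C)^2+1} \to \varepsilon^2, \tag{30}$$

where $C \to \infty$.

$$\varphi_{gg}(\tau) = \tfrac{1}{2}\,\varepsilon^2 C\,e^{-|\tau|C}. \tag{31}$$

The totals are then

$$\Phi(\omega) = \frac{1}{\omega^2+1} + \frac{\varepsilon^2}{(\omega/C)^2+1} \to \frac{1}{\omega^2+1} + \varepsilon^2, \qquad \varphi(\tau) = \tfrac{1}{2}\left(e^{-|\tau|} + \varepsilon^2 C\,e^{-|\tau|C}\right). \tag{32}$$

For the conditions illustrated in Figs. 9 and 10, where it is assumed that the phase response of the shaping network is zero (Sect. 3),

$$\Psi_{ff}(\omega) = \frac{1}{\sqrt{\omega^2+1}}, \tag{33}$$

$$\Psi_{ff}(p) = \frac{1}{\sqrt{1-p^2}}. \tag{34}$$

From pair 558 (Ref. 7),

$$\psi_{ff}(\tau) = \frac{1}{\pi}\,K_0(|\tau|). \tag{35}$$

Here $K_0$ is the Bessel function of the second kind, and imaginary argument (the actual argument being $i|\tau|$). It is tabulated in Watson (Ref. 8). This is illustrated in Fig. 10.

For the shaping network assumed as physically realizable:

$$\Psi_{ff}(\omega) = \frac{-i}{\omega-i}, \tag{36}$$

$$\Psi_{ff}(p) = \frac{1}{p+1}. \tag{37}$$

From pair 438 (Ref. 7),

$$\psi_{ff}(t) = e^{-t}, \qquad t > 0. \tag{38}$$

This is illustrated in Fig. 12.

$$\Psi_{ff}^*(\omega) = \frac{i}{\omega+i}, \tag{39}$$

$$\Psi_{ff}^*(p) = \frac{-1}{p-1}. \tag{40}$$

From pair 439 (Ref. 7),

$$\psi_{ff}^*(t) = e^{t}, \qquad t < 0. \tag{41}$$

With noise included, the factors of $\Phi(\omega)$ are

$$\Psi(\omega) = \frac{\varepsilon\left[\omega - \left(i\sqrt{1+\varepsilon^2}/\varepsilon\right)\right]}{\omega-i}, \qquad \Psi^*(\omega) = \frac{\varepsilon\left[\omega + \left(i\sqrt{1+\varepsilon^2}/\varepsilon\right)\right]}{\omega+i}, \tag{42}$$

$$\frac{\Phi_{ff}(\omega)}{\Psi^*(\omega)} = \frac{1}{\varepsilon\left[\omega + \left(i\sqrt{1+\varepsilon^2}/\varepsilon\right)\right](\omega-i)} = \frac{D}{\omega-i} + \frac{E}{\omega + \left(i\sqrt{1+\varepsilon^2}/\varepsilon\right)},$$

where

$$D = \frac{-i}{\varepsilon+\sqrt{1+\varepsilon^2}}, \qquad E = \frac{i}{\varepsilon+\sqrt{1+\varepsilon^2}}.$$

$$K(\omega) = \frac{D\,e^{-T}/(\omega-i)}{\Psi(\omega)} = \frac{e^{-T}}{\left(\varepsilon+\sqrt{1+\varepsilon^2}\right)\left(\sqrt{1+\varepsilon^2}+\varepsilon i\omega\right)} = e^{-T}\cdot\frac{1}{\varepsilon+\sqrt{1+\varepsilon^2}}\cdot\frac{\exp\left[-i\tan^{-1}\left(\varepsilon\omega/\sqrt{1+\varepsilon^2}\right)\right]}{\sqrt{1+\varepsilon^2+\varepsilon^2\omega^2}}. \tag{43}$$

The form of eq. (43) indicates the nature of the variables in $K(\omega)$. The first factor depends only upon the prediction time $T$, and the second only upon the noise spectral density $\varepsilon^2$. The denominator of the third factor is a combined function of noise density and frequency. All three of these quantities are real and hence affect only the amplitude response of $K(\omega)$. The numerator of the third factor is complex. It has a modulus of unity and expresses the phase shift of $K(\omega)$. The amplitude response and the phase are the two quantities plotted in Fig. 3.

EXAMPLE 2. A second illustration, by Wiener, assumes no noise. The function required of the filter is prediction. Consider

$$\Phi(\omega) = \frac{1}{(\omega^2+1)^2}, \tag{44}$$

(45) 1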
O. + 1)2 SMOOTHING AND FILTERING 17-19 From eq. (18) (47) 1 K(w) = 'I'(w) r J 00 !/I(t + T)e- iwt dt 0 (48) This solution is also reached from the alternative formulation of eq. (26). The solution consists of a term independent of wand a term which comprises a differentiation of the input wave. This was also the case in the previous example, except that there the total band was limited by the denominator. In the present case the total band is infinite. In polar form the solution is (49) K(w) = e-T V1 + 2T + T2 + w2T2 exp {itan-l [wT/(l + T)]}. 5. EXTENSIONS OF PROCEDURE Filters with Lag. On occasion the urgency in time of reproduction of a signal mixed with noise is not so great as the need for greatest feasible reduction of the noise in the reproduction. In such cases the prediction time T is advantageously changed into a lag, that is, T becomes negative. The optimum filter formula of eqs. (16) and (26) still holds generally, but there are difficulties in carrying out the second integration, with the lower limit of zero. Wiener (Ref. 1) suggests an approximation thus: 0T etw (50) ~ [1+ (iwT/2V)]" • 1 - (iwT/2v) The single case is considered where the ratio of eq. (24) contains only one pole in the upper half-plane of w. This is assumed of the first order, at WI. A possible pole in the lower half-plane may be at W2*. Then iwT (51) (52) ipjf(w)e 'I'*(w) ~ ~ [1 + 1- ~ n=l (iWI T /2v)]" (iWI T /2v) F (w ~ WI)(W - W2*) .An [1 - (iWIT /2v)]n X + (w - WI) + (w Y - W2*) . 17-20 INFORMATION THEORY AND TRANSMISSION In eq. (52) the first summation term in the partial fraction expansion is approximated only to n = N ::; v. Then (53) K(w) ~ ( LN A n n=l [1 - (iwlT j2v)]n X-) /'It(w). +w - Wl In the example which was illustrated by eqs. 
(36) to (43), Wiener gives the result as (54) T2j2 K(w) = [1 1 + (Tj2)][(Tj2)VI + e2 ....; e] + iw + (VI + + eiw)[1 e2 (iwTj2)] 1 - (T/2) + [1 + (T/2)](e + VI + e2 )(VI + e2 + eiw) Where the noise level is high and the signal weak a simple formula is obtained for the optimum filter with lag. Let (55)
$\Phi_n(\omega)$ may be factored as

$$\Phi_n(\omega) = \Psi_n(\omega)\,\Psi_n^*(\omega). \tag{59}$$

Then the Wiener formula for the optimum filter is, for the prediction time $T$,

$$K(\omega) = 1 + i\omega T + \cdots + \frac{(i\omega T)^{n-1}}{(n-1)!} + \frac{\omega^n}{2\pi\Psi_n(\omega)}\int_0^{\infty} e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{\Psi_n(u)}{u^n}\,e^{iut}\left[e^{iuT} - 1 - (iuT) - \cdots - \frac{(iuT)^{n-1}}{(n-1)!}\right]du. \tag{60}$$

This is the form of the equation in which no noise is assumed in the input. Where there is noise the $\Psi_n(u)$ in the integrand would be replaced by the expression $\Phi_{ff}(\omega)/\Psi^*(\omega)$ in eq. (16).

EXAMPLE. Assume that the correlation spectrum of eq. (44) describes the correlation in the second derivative of the input function and that no noise is present. That is,

$$\Phi_2(\omega) = \frac{1}{(\omega^2+1)^2}. \tag{61}$$

Then the solution for the filter for the prediction time $T$, as given by eq. (60), is

$$K(\omega) = 1 + i\omega T + \frac{\omega^2}{2\pi\Psi_2(\omega)}\int_0^{\infty} e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{\Psi_2(u)}{u^2}\,e^{iut}\left(e^{iuT} - 1 - iuT\right)du.$$

As in eq. (46), the factor with singularities in the upper half-plane is $\Psi_2(\omega) = -1/(\omega-i)^2$. Let

$$I = \frac{-1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{iut}}{u^2(u-i)^2}\left(e^{iuT} - 1 - iuT\right)du = \frac{-1}{2\pi}\int_{-\infty}^{\infty}\left[\frac{e^{iu(t+T)}}{u^2(u-i)^2} - \frac{e^{iut}}{u^2(u-i)^2} - \frac{iT\,e^{iut}}{u(u-i)^2}\right]du.$$

The integration of the third term of the integrand is accomplished by the combination of pairs 210 and 442 (Ref. 7) and gives

$$c = -Tt\,e^{-t} - T\,e^{-t} + T, \qquad t > 0.$$

The integration of the first and second terms is accomplished by the further application of pair 210, and gives

$$a = -(t+T)\,e^{-(t+T)} - 2e^{-(t+T)} - (t+T) + 2, \qquad t + T > 0,$$

$$b = t\,e^{-t} + 2e^{-t} + t - 2, \qquad t > 0.$$

Thus

$$I = a + b + c = -t\,e^{-t}\left(e^{-T} + T - 1\right) - e^{-t}\left(T e^{-T} + 2e^{-T} + T - 2\right), \qquad t > 0.$$

Continuing to the second integration of eq. (60), and writing $A$ and $B$ for the coefficients of $t\,e^{-t}$ and $e^{-t}$ in $I$,

$$II = \int_0^{\infty} I\,e^{-i\omega t}\,dt = \int_0^{\infty}\left(A\,t\,e^{-t} + B\,e^{-t}\right)e^{-i\omega t}\,dt = \frac{-A}{(\omega-i)^2} - \frac{iB}{\omega-i},$$

$$K(\omega) = 1 + i\omega T + \frac{\omega^2}{\Psi_2(\omega)}\,II = 1 + i\omega T + \omega^2 A + \omega^2 B + i\omega^3 B,$$

$$K(\omega) = 1 + i\omega T + (i\omega)^2\left(T e^{-T} + 3e^{-T} + 2T - 3\right) + (i\omega)^3\left(T e^{-T} + 2e^{-T} + T - 2\right). \tag{62}$$

In this solution, there are three successive differentiations of the input wave to add to the constant term. As in the previous example, characterized by eq. (48), the absence of noise in the assumptions leads to an infinite band in the filter.
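A quick numerical check on eq. (62) is possible. For a short prediction time the coefficients of $(i\omega)^2$ and $(i\omega)^3$ should approach $T^2/2$ and $T^3/6$, the corresponding terms of the series for $e^{i\omega T}$, since series expansion of the exponentials in eq. (62) gives $T^2/2 - T^4/24 + \cdots$ and $T^3/6 - T^4/12 + \cdots$. The sketch below evaluates the coefficients; the function name is a hypothetical one chosen for the illustration.

```python
import math


def predictor_coefficients(T):
    """Coefficients of 1, (iw), (iw)^2 and (iw)^3 in eq. (62)."""
    e = math.exp(-T)
    c2 = T * e + 3 * e + 2 * T - 3
    c3 = T * e + 2 * e + T - 2
    return 1.0, T, c2, c3


for T in (0.01, 0.1, 1.0):
    c0, c1, c2, c3 = predictor_coefficients(T)
    # Ratios to the series terms T^2/2 and T^3/6 of exp(i*w*T).
    print(f"T = {T}: c2 / (T^2/2) = {c2 / (T**2 / 2):.4f}, "
          f"c3 / (T^3/6) = {c3 / (T**3 / 6):.4f}")
```

As $T$ grows the ratios fall away from unity, which reflects the difference between the optimum predictor and a plain series extrapolation.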
"Filters" for Discrete Data. The signal which has been assumed in the discussion up to this point has been continuous. However, it could be expressed as the amplitude of a succession of discrete pulses, such as the maximum daily temperatures at a given location. Where the data are discrete, in electrical form or not, they cannot be passed through a physical electrical filter to carry out the operations which SMOOTHING AND FILTERING 17-23 have been discussed. The operations can, however, be expressed in terms of mathematical processes, with only small changes from the previous description. For discrete data eq. (1) becomes 1 (63)' cp(n) = 1I~~~ 2M + 1 !If mEM f(m)f(m + n), and eq. (2) becomes 00 (64) n=-oo The factorization problem of eq. (17) is modified somewhat, because cI>(w) is more likely to be an empirically determined rather than an analytically expressed quantity. Wiener (Ref. 1) has indicated various means for handling this problem; one method is the following. Equation (17) may be written: (65) log (w) = L 00 ane- iwn + ao + L ane- iwn n=-oo With cI>, and therefore log cI>, real, the series shows symmetry between positive and negative n's. From the discussions, regarding eqs. (10), (11), (12), and (17) to (20), and recognition that the first term at the right of eq. (66) shows no response at positive times (n > 0), and the third term no response at negative times (n < 0), the terms of eq. (66) may be identified with those of eq. (65). That is, 00 (ao/2) + L ane-iwn 1 are identified with log 'IF(w), and -1 (ao/2) +L ane- iwn -00 with log 'IF*(w). This permits the computation of 'IF(w) and 'IF*(w) from the Fourier series of eq. (66). The Fourier series itself may be obtained empirically by numerical computation or by the use of a harmonic analyzer. Other empirical solutions of the 'factorization problem have been presented. 
Some of these solutions relate to the determination of the minimum phase of a network which shows an empirical transfer amplitude response characteristic (Refs. 6 and 9). Additional empirical solutions for the factorization have been presented, related to other problems (Ref. 10).

Alternatives to Wiener Criterion of Optimization. Lee (Ref. 11) has explored the possibilities of an alternative to the Wiener criterion. He minimizes the integral square cross-correlation error (integrated with respect to time). He expresses the conditions in terms of an autocorrelation of the autocorrelation function (as if the latter were itself a signal) and a cross-correlation of the autocorrelation function and the cross-correlation function. In these terms the specifications for the optimum filters are completely analogous to those of Wiener.

Zadeh and also Middleton and van Meter have outlined possibilities in the design of an optimum filter which uses the methods of decision theory. (See Ref. 12.) This utilizes all the information known to the designer about the signal and the noise, and optimizes the decision. This optimum minimizes the integrated risk of wrong decisions. The results obtained are chiefly of conceptual value, because the computational problems are formidable even in relatively simple situations. Other authors have also applied criteria of or akin to those used in decision theory (Ref. 13). The procedures are particularly effective conceptually where the signal interpretation occurs under extremely unfavorable noise conditions, such as for the signals from fringe areas of search radars. The paper of Middleton and van Meter (Ref. 12) contains a complete bibliography.

Zadeh and Ragazzini (Ref. 14) have extended the Wiener theory to the case where the data are described by a nonstationary time series. Particularly they assume its approximation by a polynomial of a given order in time, with unknown coefficients.
Such cases have a certain practical interest. The treatment uses an approach suggested in a report by Bode, Blackman, and Shannon, in 1948, to the Research and Development Board.

Chang (Ref. 15) has enlarged the Wiener criterion by considering also integrated squares of errors in the frequency domain. The integration further weights these errors according to arbitrary functions of the frequency. He develops two theorems, a minimization theorem and a separation theorem. The first is an extension of the Wiener theorem in terms of frequency, and the second represents an alternative procedure to the factorization methods of Wiener.

Still other authors (Ref. 16) have given consideration to extending the hypotheses under which the Wiener criterion can be used. In particular they have extended the treatment to include nonstationary noise and a system having time-varying linear parameters.

Nonlinear Prediction. The discussion so far has centered on a filter which performs linear operations on the signal input to it. That is, it multiplies the Fourier components of that input by a given numerical factor and shifts their phase by a given angle. Both of these vary with the frequency of the component, but they are independent of the amplitude of the component.

Some thought has been given by Bode and Shannon (Ref. 3) and others (Ref. 17) to the possibilities of nonlinear prediction, in which the operation on the signal would vary with the signal amplitude. At the expense of functional complications, this permits improvement in the accuracy of prediction under certain conditions. The problem has not been worked out to anything like the analytical detail devoted to linear prediction.

6. NETWORK SYNTHESIS

Introduction. Many of the problems discussed so far lead to a solution in the form of an electrical filter.
Specification of the transfer properties of this filter comes out as the end product of the solution in terms of frequency as $K(\omega)$, or of time as $k(t)$. In an actual case a further step is necessary, namely the network synthesis. The filter or network must be built, and it needs specification in terms of components. This is an art with an extensive background and innumerable ramifications. The scope of the present discussion is limited to electric networks, with some references for amplification.

In data signal processing, on occasion the mechanical properties of equipment may affect signal propagation. Some discussions of electromechanical elements have been given by Everitt and Anner, and by Graham (Ref. 18).

The stage of analysis considered at this point bridges some of the steps from a theoretical toward a schematic design of the equipment. Some of the solutions which have been advanced in the present chapter, particularly to problems which are largely of prediction and exclude noise, are essentially formulas for analog computation (see Vol. 2, Analog Computers).

Components of Electric Networks. The elements composing electric networks within the limited scope of this treatment consist of resistances, inductances, and capacitances. These elements are marked by various relations between voltage across them and current flow. The voltage may be set up as a function of time, as $v(t) = v_0\cos(\omega t + \phi)$, or as a function of radian frequency, as $V(\omega) = v_0 e^{i\phi}$. Here it is taken as a complex quantity. The current may be similarly expressed.

The relationship between the two, for a resistance, is $V(\omega) = R\,I(\omega)$, where $R$ is the value of the resistance (idealized as independent of frequency). For an inductance the relation is $V(\omega) = i\omega L\,I(\omega)$ and for a capacitance it is $V(\omega) = I(\omega)/(i\omega C)$. The quantity $Z(\omega) = V(\omega)/I(\omega)$ is called the impedance, and its reciprocal $Y(\omega) = 1/Z(\omega) = I(\omega)/V(\omega)$ is called the admittance.
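These relations translate directly into complex arithmetic. The following sketch is an illustration only: the function names, the element values, and the parallel combination of a resistance and a capacitance are assumptions introduced here, not part of the text.

```python
import cmath
import math


def z_resistance(R, w):
    """Impedance of a resistance R at radian frequency w: V = R * I."""
    return complex(R, 0.0)


def z_inductance(L, w):
    """Impedance of an inductance L: V = i*w*L * I."""
    return 1j * w * L


def z_capacitance(C, w):
    """Impedance of a capacitance C: V = I / (i*w*C)."""
    return 1.0 / (1j * w * C)


def parallel(z1, z2):
    """Impedance of two elements connected across the same terminals."""
    return z1 * z2 / (z1 + z2)


w = 2 * math.pi * 1000.0            # 1000 cycles per second
z_l = z_inductance(0.159, w)        # assumed 0.159 henry
z_c = z_capacitance(0.159e-6, w)    # assumed 0.159 microfarad
z_rc = parallel(z_resistance(1000.0, w), z_c)
y_rc = 1.0 / z_rc                   # admittance is the reciprocal

for name, z in (("L", z_l), ("C", z_c), ("R parallel C", z_rc)):
    print(f"{name}: |Z| = {abs(z):.1f} ohms, "
          f"phase = {math.degrees(cmath.phase(z)):.1f} degrees")
```

The inductance and capacitance chosen here both come out close to 1000 ohms in magnitude at this frequency, with phases of plus and minus 90 degrees.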
The impedance (or admittance) can be expressed for any aggregation of interconnected elements ending in two terminals. As such it may be called the "driving point impedance" (or admittance) of that network at the specified pair of terminals. In an aggregation it is also possible to measure the voltage at one pair of terminals, and the current at another pair. Here the voltage to current ratio is called the transfer impedance between the two pairs of terminals, and the current to voltage ratio the transfer admittance.

Specification in Terms of Transfer Response. The properties required in filters such as have been specified in Fig. 3, or in eqs. (16), (26), (43) and others, have been expressed as a transfer response. When a function of frequency, it has been called $K(\omega)$, and when of time, $k(t)$. According to the circumstances of the particular equipment considered, the response as a function of frequency may be set up as a transfer impedance or a transfer admittance. It may also be a transfer ratio merely of voltages, or of currents.

For simplicity, consideration here is limited to a case practical in vacuum tube circuitry, where the input to the filter is taken as a current, and the output from it a voltage. That is, $K(\omega)$ is identified with a transfer impedance, or $Z_T(\omega)$.

Cauer's Method of Synthesis. In Fig. 15 a vacuum tube gives output $I$ into the filter. The voltage $V$ across the terminating resistance $R$ drives a succeeding vacuum tube. The figure shows a generally practical case of the filter both starting and ending with a bridged capacitance. The filter itself therefore comprises an odd number of elements. If in any case it is desired to omit one of the end capacitances it may be assumed to approach zero.

For a filter of this type, the transfer impedance is a rational function of $\omega$ with $n$ poles and no zeros, and can be written as

$$Z_T(\omega) = \frac{Z_0}{(\omega-\omega_1)(\omega-\omega_2)\cdots(\omega-\omega_n)}. \tag{67}$$

Cauer (Ref.
19) has indicated a method for synthesizing the network from this function, which has been noted by Peless and Murakami (Ref. 20).

FIG. 15. Low-pass ladder filter.

For this, $Z_T$ is changed to a function of $p = i\omega$, thus

$$Z_T(p) = \frac{R}{b_n p^n + b_{n-1}p^{n-1} + \cdots + b_1 p + 1}. \tag{68}$$

The normalizing factor becomes $R$. Then the driving point impedance $Z_1(p)$ of Fig. 15 (excluding the terminating resistance $R$) is found by dividing the even part of the denominator by the odd part, and multiplying the whole by the normalizing factor $R$.

$$Z_1(p) = R\,\frac{b_{n-1}p^{n-1} + b_{n-3}p^{n-3} + \cdots + b_2 p^2 + 1}{b_n p^n + b_{n-2}p^{n-2} + \cdots + b_1 p}. \tag{69}$$

$Z_1$ is then expanded into a continued fraction, as

$$Z_1(p) = \cfrac{R}{g_1 p + \cfrac{1}{g_2 p + \cfrac{1}{g_3 p + \cdots}}}. \tag{70}$$

$Z_1$ is found from the elements of Fig. 15 as

$$Z_1(p) = \cfrac{1}{pC_1 + \cfrac{1}{pL_2 + \cfrac{1}{pC_3 + \cdots}}}. \tag{71}$$

Thus

$$C_1 = g_1/R, \quad L_2 = g_2 R, \quad C_3 = g_3/R, \quad \ldots, \quad C_n = g_n/R. \tag{72}$$

ILLUSTRATION. The solution of the illustration expressed by eq. (43) represents an especially simple case.

$$Z_T(p) = \frac{K}{\left(\varepsilon/\sqrt{1+\varepsilon^2}\right)p + 1}. \tag{73}$$

Here $K$ represents a constant factor or gain adjustment to be set when lining up the equipment.

$$Z_1(p) = \frac{R}{\left(\varepsilon/\sqrt{1+\varepsilon^2}\right)p}, \tag{74}$$

$$C_1 = \left(\varepsilon/\sqrt{1+\varepsilon^2}\right)/R. \tag{75}$$

Butterworth-Thomson Filters. A few general characteristics of the performance of filters like Fig. 15 may be noted. When a stepped wave signal, as indicated in Fig. 16a, is used as input, the output signal has the essential character of Fig. 16b. Where the input amplitude is current, and the output amplitude is voltage, this trace in Fig. 16b is called the indicial impedance of the filter. The trace is distinguished by a rise time, which measures the duration between crossings of 0.1 and 0.9 of the final amplitude. (Somewhat different ranges are occasionally used.) The trace is also distinguished by an overshoot. This is measured as a per cent of the final amplitude.
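Returning to Cauer's expansion of eqs. (68) to (72): the continued fraction can be produced mechanically by repeated polynomial division of the even and odd parts of the denominator. The sketch below is a minimal illustration. The function name `cauer_ladder` and the coefficient-list convention are assumptions made here, and the Butterworth denominators $p^3 + 2p^2 + 2p + 1$ and $p^2 + \sqrt{2}\,p + 1$ are supplied as standard test inputs rather than taken from the text. When the degree is even the even part is the one of higher degree, so the routine always divides the higher-degree part by the other.

```python
import math


def cauer_ladder(denominator):
    """Continued-fraction coefficients g_1, g_2, ... of eq. (70).

    `denominator` lists the coefficients [1, b_1, ..., b_n] of eq. (68)
    in ascending powers of p.
    """
    n = len(denominator) - 1
    even = [c if k % 2 == 0 else 0.0 for k, c in enumerate(denominator)]
    odd = [c if k % 2 == 1 else 0.0 for k, c in enumerate(denominator)]
    # Start with the part that contains the highest power of p.
    hi, lo = (odd, even) if n % 2 == 1 else (even, odd)
    g = []
    degree = n
    while degree >= 1:
        q = hi[degree] / lo[degree - 1]   # leading term of hi / lo is q * p
        g.append(q)
        remainder = hi[:]
        for k in range(degree):           # remainder = hi - q * p * lo
            remainder[k + 1] -= q * lo[k]
        hi, lo = lo, remainder            # invert and continue
        degree -= 1
    return g


g3 = cauer_ladder([1.0, 2.0, 2.0, 1.0])        # three-element Butterworth
g2 = cauer_ladder([1.0, math.sqrt(2.0), 1.0])  # two-element Butterworth
print("three elements:", [round(x, 3) for x in g3])
print("two elements:  ", [round(x, 3) for x in g2])
```

With $R$ and the cutoff frequency normalized to unity, these coefficients can be compared with the $m = 0$ entries of the element tables given later in the chapter; the element values themselves follow from eq. (72).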
In the design of such filters for general purposes (and in the absence of a specific formulation like the Wiener equations) it is usually desirable to conserve the shape of the input signal as much as feasible. This is obtained with a short rise time and low overshoot.

It is similarly desirable to conserve frequency space. In terms of a response characteristic, such as illustrated in Fig. 9, the region where the response is large is called the passband. The region where it is small is called the elimination band. An intermediate region is called the rolloff band. Conserving frequency space consists in limiting the total band, comprising passband and rolloff band together. For a given passband it means limiting the ratio of the rolloff bandwidth to the passband width. Such conservation is significant in that it represents a system cost, in money, bulk of apparatus, and other factors, to maintain transmission over a wider frequency band than necessary.

FIG. 16. Passage of step function signal through filter: (a) input, (b) output. (Amplitude vs. time; the overshoot is marked on the output trace.)

Within a given frequency space (passband plus rolloff band) rise time and overshoot conditions are mutually antagonistic. A plot of one versus the other, for some filter designs of the type of Fig. 15, is shown in Fig. 17. Here the rise time is normalized in a manner discussed below.

A series of filter designs was presented by Butterworth (Ref. 21) characterized by a minimum of curvature of the response characteristic in the passband and called maximally flat amplitude. This is plotted in Fig. 17 for $m = 0$ (Butterworth). Here $n$ has the same meaning as in Fig. 15. The design condition leads to generally short rise time, but fairly high overshoot.

A similar series of designs was presented by Thomson (Ref. 22) characterized by a minimum of curvature in the phase characteristic and called maximally flat envelope delay. It is plotted in Fig.
17 for $m = 1$ (Thomson). This design condition leads to generally higher rise times, but lower overshoots.

FIG. 17. Design curves, transitional Butterworth-Thomson filters (overshoot, per cent, vs. rise time, normalized seconds).

Transitional Butterworth-Thomson Filters. Peless and Murakami (Ref. 20) have prepared a series of designs intermediate between these two, by degrees indicated by the parameter $m$. Their rise time versus overshoot performance is plotted in Fig. 17.

In all these filter designs the ratio of rolloff bandwidth to the passband width tends to reduce as the number of elements is increased. This is why the better compromises between rise time and overshoot in Fig. 17 appear for the smaller numbers of elements. A rough measure of the utilization is given by the ratio of the frequency for which the drop in response is large (say 30 db, or an amplitude ratio of 1 to 31.6) to the frequency for which the drop is small (say 3 db, or an amplitude ratio of 1 to 1.41). In the figure this quantity is called the cutoff flatness; it is plotted in dotted lines. Each line indicates a locus of compromises, between rise time and overshoot, for a given fixed degree of frequency band conservation.

The normalized rise time has been taken, in the Butterworth filters, to indicate the rise time of a filter whose response has dropped 3 db at a radian frequency of 1 radian per second (or a cyclic frequency of $1/(2\pi)$ cycles per second). The rise times for the other filters are for designs whose amplitude response characteristics in the elimination band are asymptotic to those of the Butterworth filters. Where $m$ is positive, the 3-db points of the filters come at lower frequencies than for the Butterworth filter. The exact amounts are indicated by $\omega$ in Tables 1 to 4.
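For the Butterworth case the cutoff flatness defined above can be evaluated in closed form. The sketch below assumes the standard maximally flat amplitude response, proportional to $1/\sqrt{1+\omega^{2n}}$ with the 3-db point near $\omega = 1$; that formula and the function name are assumptions brought in for the illustration and are not quoted from the text.

```python
def butterworth_cutoff_flatness(n, large_drop_db=30.0, small_drop_db=3.0):
    """Ratio of the frequency of the large drop to that of the small drop.

    Assumes |Z_T|^2 proportional to 1 / (1 + w**(2*n)), so a drop of `db`
    decibels occurs at w = (10**(db/10) - 1) ** (1 / (2*n)).
    """
    def frequency_for_drop(db):
        return (10 ** (db / 10) - 1) ** (1 / (2 * n))

    return frequency_for_drop(large_drop_db) / frequency_for_drop(small_drop_db)


flatness = {n: butterworth_cutoff_flatness(n) for n in range(1, 6)}
for n, ratio in flatness.items():
    print(f"{n} elements: cutoff flatness = {ratio:.2f}")
```

The ratio shrinks steadily as elements are added, in line with the statement that the rolloff band narrows relative to the passband as the number of elements is increased.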
TABLE 1. TWO-ELEMENT FILTERS

    m       ω      C1ωR    L2ω/R   C3ωR
  -0.2    1.054    0       0.747   1.338
   0      1        0       0.707   1.414
   0.2    0.949    0       0.673   1.486
   0.4    0.902    0       0.643   1.554
   0.6    0.859    0       0.618   1.618
   0.8    0.820    0       0.596   1.677
   1.0    0.786    0       0.577   1.732
   1.2    0.756    0       0.561   1.782

TABLE 2. THREE-ELEMENT FILTERS

    m       ω      C1ωR    L2ω/R   C3ωR
  -0.2    1.064    0.525   1.393   1.368
   0      1        0.500   1.333   1.500
   0.2    0.933    0.478   1.287   1.624
   0.4    0.868    0.458   1.252   1.743
   0.6    0.810    0.441   1.224   1.854
   0.8    0.756    0.425   1.201   1.958
   1.0    0.712    0.411   1.184   2.055
   1.2    0.671    0.398   1.168   2.148

TABLE 3. FOUR-ELEMENT FILTERS

    m       ω      C1ωR    L2ω/R   C3ωR    L4ω/R   C5ωR
  -0.2    1.064    0       0.399   1.127   1.643   1.353
   0      1        0       0.383   1.082   1.577   1.531
   0.2    0.924    0       0.368   1.043   1.535   1.698
   0.4    0.845    0       0.354   1.008   1.508   1.856
   0.6    0.774    0       0.342   0.978   1.492   2.004
   0.8    0.712    0       0.330   0.951   1.484   2.143
   1.0    0.659    0       0.320   0.928   1.481   2.273
   1.2    0.617    0       0.311   0.907   1.482   2.395

TABLE 4. FIVE-ELEMENT FILTERS

    m       ω      C1ωR    L2ω/R   C3ωR    L4ω/R   C5ωR
  -0.2    1.064    0.321   0.928   1.435   1.770   1.323
   0      1        0.309   0.894   1.382   1.695   1.545
   0.2    0.916    0.298   0.864   1.338   1.656   1.751
   0.4    0.824    0.288   0.836   1.300   1.640   1.945
   0.6    0.740    0.279   0.811   1.269   1.638   2.125
   0.8    0.671    0.270   0.788   1.243   1.646   2.295
   1.0    0.617    0.262   0.767   1.221   1.659   2.453
   1.2    0.572    0.255   0.747   1.203   1.676   2.602

Design Data. The information given above permits making a general compromise choice of filter for any given situation. The specific compromise choice depends upon what ultimate use is made of the signal. Where the timing indication is important, the compromise would favor low rise time as against low overshoot. Where amplitude indication is important, the reverse holds. Where frequency conservation is important as compared with a more favorable rise time versus overshoot compromise, or where other specifications indicate need of sharpness in cutoff, enough elements to do this may be chosen. The element values, taken from the Peless-Murakami paper (Ref. 20), are listed in Tables 1 to 4.
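As a minimal sketch of how a table entry becomes a physical element value, the code below undoes the two normalizations used in Tables 1 to 4: to the Butterworth 3-db cutoff frequency and to the termination resistance R. The function names are illustrative, and the pairing of the three-element m = 0 entries with C1, L2, and C3 follows the column order of the table headings, which is an assumption about the layout.

```python
import math


def denormalize_capacitance(c_entry, cutoff_hz, r_ohms):
    """Capacitance in farads for a tabulated C*w*R entry.

    The entry is divided by 2*pi times the cyclic frequency of the
    Butterworth 3-db cutoff, and again by the termination resistance.
    """
    return c_entry / (2.0 * math.pi * cutoff_hz * r_ohms)


def denormalize_inductance(l_entry, cutoff_hz, r_ohms):
    """Inductance in henries for a tabulated L*w/R entry.

    The entry is multiplied by the termination resistance and divided
    by 2*pi times the cyclic frequency of the Butterworth 3-db cutoff.
    """
    return l_entry * r_ohms / (2.0 * math.pi * cutoff_hz)


if __name__ == "__main__":
    cutoff_hz, r_ohms = 1000.0, 1000.0
    # Three-element Butterworth (m = 0) entries: C1wR, L2w/R, C3wR.
    c1 = denormalize_capacitance(0.500, cutoff_hz, r_ohms)
    l2 = denormalize_inductance(1.333, cutoff_hz, r_ohms)
    c3 = denormalize_capacitance(1.500, cutoff_hz, r_ohms)
    print(f"C1 = {c1 * 1e6:.4f} microfarad")
    print(f"L2 = {l2:.4f} henry")
    print(f"C3 = {c3 * 1e6:.4f} microfarad")
```

With an entry of 1, a cutoff of 1000 cycles, and R of 1000 ohms, these functions give about 0.159 microfarad and 0.159 henry.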
The values are normalized as for the rise times, and are also normalized to a termination resistance R. Thus the entry of 1 for CωR means that C alone is 1 farad divided by 2π times the cyclic frequency of the Butterworth 3-db cutoff, and again divided by R. If that cutoff is 1000 cycles, and R is 1000 ohms,

(76)    $C = 1 \times (2\pi \times 10^{3})^{-1} \times 10^{-3}\ \text{farad} = 0.159\ \text{microfarad}.$

Similarly the entry 1 for Lω/R means that L alone is 1 henry times R divided by 2π times the cyclic frequency of the Butterworth 3-db cutoff. Again for a cutoff of 1000 cycles, and R of 1000 ohms,

(77)    $L = 1 \times (2\pi \times 10^{3})^{-1} \times 10^{3} = 0.159\ \text{henry}.$

The even element filters are all designed for C1 = 0.

Tchebysheff-Darlington Filters. More general discussion of the synthesis of networks has been presented by Darlington (Ref. 23) and by Grossman (Ref. 24). The procedures there employed lead to the use of mathematical contributions of Tchebysheff, and the filters have been called Tchebysheff-Darlington filters. Essentially, in the papers referred to, the design is specifically applied to filters in which tolerances are placed on a permissible variation in transmission over the passband, and on a required completeness of suppression in the elimination band.

REFERENCES

1. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, Wiley, New York, 1949. (Published in a Classified Report in 1942.) See also: A. Kolmogoroff, Interpolation und Extrapolation von stationären zufälligen Folgen, Bull. acad. sci. U.R.S.S., Ser. math., 5, 3-14 (1941). H. Jacot, Théorie de la prévision et du filtrage des séries aléatoires stationnaires selon Norbert Wiener, Ann. Telecommunications, 7, 241-249, 297-303, 325-335 (1952).
2. N. Levinson, A heuristic exposition of Wiener's mathematical theory of prediction and filtering, J. Math. and Phys., 26 (2), 110-119 (1947). (Reprinted in Ref. 1.)
3. H. W. Bode and C. E.
Shannon, A simplified derivation of linear least square smoothing and prediction theory, Proc. I.R.E., 38, 417-425 (1950).
4. N. Wiener, Generalized harmonic analysis, Acta Math., 55, 117-258 (1930).
5. H. Nyquist, Certain topics in telegraph transmission theory, Trans. Am. Inst. Elec. Engrs., 47, 617-644 (1928).
6. H. W. Bode, Network Analysis and Feedback Amplifier Design, Van Nostrand, Princeton, N. J., 1945.
7. G. A. Campbell and R. M. Foster, Fourier Integrals for Practical Applications, Collected Papers, American Telephone and Telegraph Company, New York, 1937; also Bell System Tech. J., 7, 639-707 (1928).
8. G. N. Watson, A Treatise on the Theory of Bessel Functions, Macmillan, New York, 1944.
9. D. E. Thomas, Tables of phase associated with a semi-infinite unit slope of attenuation, Bell System Tech. J., 26, 870-899 (1947).
10. E. O. Powell, An integral related to the radiation integrals, Phil. Mag., 34 (7), 600-607 (1943). A. Fletcher, Notes on tables of an integral, Phil. Mag., 35 (7), 16-17 (1944). F. W. Newman, The Higher Trigonometry, Superrationals of Second Order, Macmillan and Bowes, Cambridge, England, 1892. A. Fletcher, J. C. P. Miller, and L. Rosenhead, An Index of Mathematical Tables, Scientific Computing Service, Ltd., London, 1946.
11. Y. W. Lee, On Wiener filters and predictors, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 19-29, Polytechnic Institute of Brooklyn, New York.
12. L. A. Zadeh, General filters for separation of signal and noise, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 31-49, Polytechnic Institute of Brooklyn, New York. D. Middleton and D. van Meter, Detection and extraction of signals in noise from the point of view of statistical decision theory, Pts. I and II, J. Soc. Ind. and Appl. Math., 3, 192-253 (1955); 4, 86-119 (1956).
13. P. M. Woodward and I. L. Davies, Information theory and inverse probability in telecommunication, Proc. Inst.
Elec. Engrs. (London), 99, Pt. III, 37-44 (1952). I. L. Davies, On determining the presence of signals in noise, Proc. Inst. Elec. Engrs. (London), 99, Pt. III, 45-51 (1952). D. O. North, Analysis of the Factors Which Determine Signal to Noise Discrimination in Radar, Rept. PTR-6C, RCA Laboratories, June 1943. G. W. Preston, The design of optimum transducer characteristics using the method of statistical estimation, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 51-59, Polytechnic Institute of Brooklyn, New York. L. A. Zadeh and J. R. Ragazzini, Optimum filters for the detection of signals in noise, Proc. I.R.E., 40, 1223-1231 (1952). J. L. Lawson and G. E. Uhlenbeck, Threshold Signals, Mass. Inst. Technol. Radiation Laboratory Series, Vol. 24, McGraw-Hill, New York, 1950. T. G. Slattery, The detection of a sine wave in the presence of noise by the use of a non-linear filter, Proc. I.R.E., 40, 1232-1236 (1952).
14. L. A. Zadeh and J. R. Ragazzini, An extension of Wiener's theory of prediction, J. Appl. Phys., 21, 645-655 (1950).
15. S. S. L. Chang, Two network theorems for analytical determination of optimum-response physically realizable network characteristics, Proc. I.R.E., 43, 1128-1135 (1955).
16. R. C. Booton, An optimization theory for time-varying linear systems with non-stationary statistical inputs, Proc. I.R.E., 40, 977-981 (1952). R. C. Davis, On the theory of prediction of non-stationary stochastic processes, J. Appl. Phys., 23, 1047-1053 (1952). J. Bendat, A general theory of linear prediction and filtering, J. Soc. Ind. and Appl. Math., 4, 131-151 (1956).
17. A. G. Bose, A theory for the experimental determination of optimum non-linear systems, I.R.E. Convention Record, Pt. 4, pp. 21-30, March 1956. R. Drenick, A non-linear prediction theory, Trans. I.R.E., PGIT-4, 146-152 (Sept. 1954).
18. W. L. Everitt and G. E.
Anner, Communication Engineering, McGraw-Hill, New York, 1956. R. E. Graham, Linear servo theory, Bell System Tech. J., 25, 616-651 (1946).
19. W. Cauer, Ausgangsseitig leerlaufende Filter, ENT, 16, 161-163 (1939). E. A. Guillemin, A summary of modern methods of network synthesis, in Advances in Electronics, Vol. 3, pp. 261-303, Academic Press, New York, 1951.
20. Y. Peless and T. Murakami, Analysis and synthesis of transitional Butterworth-Thomson filters and band pass amplifiers, RCA Rev., 18, 60-94 (1957).
21. S. Butterworth, On the theory of filter-amplifiers, Exptl. Wireless and Wireless Eng., 7, 536-541 (1930). V. D. Landon, Cascade amplifiers with maximal flatness, RCA Rev., 5, 347-362 (1941).
22. W. E. Thomson, Networks with maximally-flat delay, Wireless Eng., 29, 255-263 (1952). J. Laplume, Amplificateurs moyenne fréquence à distortion de phase réduite, L'Onde Electrique, 31, 357-362 (1951).
23. S. Darlington, Synthesis of reactance fourpoles, J. Math. Phys., 18, 257-353 (1939).
24. A. J. Grossman, Synthesis of Tchebycheff parameter symmetrical filters, Proc. I.R.E., 45, 454-473 (1957).

D    INFORMATION THEORY AND TRANSMISSION

Chapter 18

Data Transmission

Pierre Mertz

1. Introduction and Symbols
2. Formation and Use of the Electrical Signal
3. Transmission Impairment
References

1. INTRODUCTION AND SYMBOLS

Basic Considerations. Data which are generated at a given point, either as a result of collecting original information or at the output of a computer as a result of the processing of other data, often have to be transmitted to some other point in order to be used for further data processing or remote control. Two basic parameters determine the extent of the undertaking which this transmission involves.

1. The order of magnitude of the distance between the points of origination and utilization.
2. The nature of the data that are to be transmitted.
This includes the information content of the data and the frequency band which is required to handle it in the transmission medium. For fundamental discussion on this point see Chap. 16. The treatment here analyzes principally the current practical art, in which the efficiency of utilization of the frequency band is much lower than ideal.

The adaptability of the data signals to transmission over available facilities is a practical factor of great importance. There are extensive systems of communication already set up, reaching over vast areas, which are in current commercial use.

Transmission Distances Involved. The engineering effort required to set up a system of data transmission varies considerably with the distance involved. Some concrete illustrative gradations are:

    A few inches or feet
    One to a few hundred feet
    One to a few miles
    One to several hundred miles
    Several hundred to several thousand miles
    International or intercontinental facilities

This discussion stresses particularly lengths in the middle regions, from a few miles to a thousand miles.

Nature of Transmission Facilities Available

These facilities need to meet certain requirements, discussed below. Of these the frequency band which they are capable of transmitting is paramount, and is of chief concern here.

Local Wiring. The band which this wiring will handle is indefinite, and it varies with the physical structure of the conductors and their inductive exposure to other electrical circuits. Bands have been handled from less than 100 cycles to television bands of a few megacycles.

Telephone Facilities. These include all of the plant which has been developed for telephonic purposes, and hence they comprise a wide variety of facilities. They are sometimes nominally characterized as capable of a 3-kc band. This full width band is not usable for data transmission,
partly because some of the facilities cut off below this and partly because the lower frequency region, below 1000 cycles, is not likely to be effectively employed in the data transmission (Ref. 1). See Sect. 2 for more quantitative details regarding a usable band. There are telegraph facilities of narrower band, but since these are usually multiplexed on telephone facilities, they are not considered separately.

Program Transmission. These circuits are commercially used for the interconnection of radio broadcast stations. They have frequency bandwidths, in round numbers, of 3, 5, 8, and 15 kc (Ref. 2). As in the case of the telephone facility, the full band cannot be expected to be utilizable for data transmission. Also the commercial demand for the 8- and 15-kc bands is very low, so that there is at present a substantial network of only the 3- and 5-kc bands.

Television Transmission. An extensive network exists at present interconnecting television broadcast stations and studios and facilities for theater television (Ref. 3). The bandwidth of these facilities generally runs to a little over 4 Mc. However, on older coaxial cable facilities the bandwidth is only 2.7 Mc. Some experimental facilities of broader frequency band than 4 Mc have been furnished for short period tests, but not on a commercial basis.

Other Wide Band Conductor Transmission. For economy, telephone facilities are frequently gathered together in more or less large groups. The combined signal for the entire group is then handled over a wire circuit much as a single signal (Ref. 4). Groups of 48-kc bandwidth, and super-groups usually of 5 groups, or 240 kc, are handled in this manner. Also, on other types of system, bands of 16 kc are found. The use of these types of bands would, of course, require the development of arrangements for extending them from the terminals, at offices of the common carriers where they are located, to other premises.
Carrier current facilities are also multiplexed by power companies on power lines. These are suitable for data transmission (Ref. 32).

Radio Facilities. Radio facilities naturally present certain elements of flexibility in their use compared with facilities provided over conductors. The limits to this flexibility are, however, set by allocation problems and by the propagation characteristics of the frequency region used (Ref. 5). The frequency bandwidths used run from those for individual channels to large aggregations of multiplexed channels which may include television channels (Ref. 6).

The utilizable bandwidth for the individual channels is not necessarily set by the adjacent allocations. It is often actually set by multipath echo effects. It tends to run from something under a telephone bandwidth (3 kc), up to the general order of magnitude of television channels (6 Mc).

Radio channels that form part of a large aggregation, particularly those leased from common carriers, tend to run at telephone or television bandwidth, and differ little from similar circuits over conductor facilities. Similarly, group and super-group bands of intermediate width are transmitted, but the use of these again requires the development of arrangements for extending them from the terminals.

Nature of the Data

Data consist fundamentally of two types of information (Ref. 7).

1. Choices among a group of possible conditions. A single datum, such as a room temperature, represents the single choice out of an established gamut. The total possible number of choices in that gamut depends both on the range of the gamut and on the precision of the indication within the gamut. For example, a range of room temperatures may be established between 50° and 90° F, and the indication may be given to individual degrees. Then the datum represents one choice out of a possible 40, and it may be this which is to be transmitted.

2.
The timing of one or a series of events (Ref. 31). One might, for example, send the equivalent of a clock ticking from one geographical place to another, to assure the simultaneity of astronomical observations made at these places.

In many cases in practice, both types of information may be needed. For example, in air traffic control, both the position of a given plane and the time at which it occupies that position are needed. The position is indicated at the same time that it is occurring. Such a datum is said to be sent in real time, as distinguished from sending it much later on as a component in some abstract calculation. Data sent in real time are characterized by becoming "stale," i.e., losing their value, if delayed too long in transmission.

Continuous Analog Data. In the room temperature example cited above, the temperature may be represented by the position of the end of a mercury thread, or by the angular position of a shaft (dial thermometer), or again by the value of a given voltage. This position or voltage is not the actual temperature, but may be identified as analogous to it. Such data, where some different quantity varies proportionately (or according to some other appropriate law of variation) to the quantity desired, are called analog data. The demarcations between choices are not emphasized in the datum quantity, but they are important in a statement of the indication. Where the analog relationship between the utilized data and the original quantity is not interrupted, the data are said to be continuous.

Discontinuous Analog Data. It may suffice to have the temperature information once every 10 minutes instead of continuously. An analog quantity may be set up in which the relationship to the original quantity is interrupted when not needed. The results are called "discrete" or discontinuous analog data. This may make it possible to interlace other data between the temperature readings.

Multiple Speed Analog Data.
In the case of a clock it has been found convenient to use the angular position of the shaft of the minute hand to identify one out of 60 choices, or one minute in the hour. For general use, however, a range of 12 hours is desirable. It is not convenient to use a pointer that can identify one out of 720 possible choices. The problem is solved by using two shafts, one geared to the other. The minute hand identifies one out of 60 choices. The hour hand identifies one out of 12 choices, each of which corresponds to one group of the 60 choices of the minute hand. The principle is sometimes extended by adding a third shaft and hand to read seconds. It is even further extended in conventional watthour and gas meters. All these are examples of "multiple speed" or "multiple shaft" analog data.

Digital Data. The above process can be carried to its logical conclusion, where each shaft distinguishes only one out of two choices. In this extreme case the demarcations between choices are emphasized, and the choices would more usually be indicated by two-position members rather than by shafts. The choice may be considered as identified by a sequence of binary indications or binary digits, and the data are called digital. Less extreme forms are sometimes used in which one out of three or more discrete choices is indicated by each digit.

Digital information may be transmitted over a group of wires, by assigning one digit to each wire. This is known as parallel transmission. Or the various digits may be assigned to successive ordered pulses (or spacing intervals) on a single wire. This is called serial transmission, and the series of digits may be ordered in either direction.

Examples. The digits indicating large values may come first (as in reading decimal digit Arabic numerals), or those indicating small values may come first (as in adding or multiplying operations with Arabic numbers).

Timing Data. This information can be indicated in a variety of ways.
More usually the desired time is indicated by the wave front of an appropriate transition, say in voltage.

Starting and Other Auxiliary Information. In the example just given above, where successive digits are transmitted serially, it is usually desirable to identify the start of the sequence by some auxiliary information. At other times the auxiliary information is in the nature of a pilot, reference, or calibration datum against which the magnitude of the utilized data is compared before actual use. Other auxiliary information is sometimes needed for error checking or possibly other purposes.

Error Standards. It is not generally expected that data transmission will be completely perfect. For one or another reason, errors are caused. Thus in engineering a given system there is some need to give thought to what kind of error performance will be acceptable.

In the case of analog data, although the boundaries between successive choices are not emphasized, the spacing between the choices is important. This spacing is obtained from the precision which is found useful in the data. It is expected that this precision will be maintained in the transmission of the data.

It is common to express the error expected or experienced in terms of its root-mean-square (rms) value (see Chap. 13). Occasionally a maximum error is noted, often say three times the rms value. In a Gaussian distribution, errors larger than this occur with a frequency of about 1 part in 370.

Timing errors are measured by a similar rms or maximum displacement, but in a timing variable rather than an amplitude parameter (Ref. 31).

In the case of digital data an elementary measure of the error is the frequency of occurrence of errors in the binary digits in the data at the receiver. On occasion, a more sophisticated measure is desirable which takes account of the distribution of the errors in time.
This is because in general, when one digit in a specific group of digits is in error, the usefulness of the entire group is vitiated. Thus errors in close succession, in such cases, do not cause as much ultimate impairment as when they are more scattered. Measures of impairment in such cases are not easily established without some detailed knowledge of the entire scheme for setting up and using the data that are transmitted.

Where special measures are incorporated into the signals for error checking, it is usually convenient to count the frequency of occurrence of both the detected and undetected errors. Of these, the first are apt to constitute only a minor impairment but the second are serious. Undetected errors are those not detected during the test, but obtained from some later comparison between the signals actually sent and those actually received.

Symbols

    a           relative echo amplitude
    D           envelope delay
    f           cyclic frequency
    I           wave amplitude
    k           normalizing constant, equal to mean square value of I
    M           matrix of resistances
    p(x)        probability density of variable x
    R           resistance, with subscripts for specific cases
    r           amplitude ratio
    r(ω)        amplitude ratio at radian frequency ω
    T           pulse duration time
    t           time variable
    V           voltage
    v           instantaneous signal voltage
    θ           pulse separation time (front edge to front edge)
    λ           wavelength of ripple along cyclic frequency scale (= 1/f)
    τ           echo delay
    φ, φ(ω)     phase shift at radian frequency ω
    ω           radian frequency (= 2πf)

2. FORMATION AND USE OF THE ELECTRICAL SIGNAL

Encoding and Decoding

The first step in the preparation of a signal for transmission consists in expressing the variable that is intended for transmission in some sort of code that can be used to form the electrical configuration.

Analog Data. There is not very much latitude for coding such data, aside from transferring from one type of physical quantity into another. Thus a temperature or a distance may be transformed into a shaft rotation or a voltage.
The principal modification that can be introduced is the insertion of some sort of nonlinear relation between the one quantity and the other.

Digital Codes. A simple code into which an analog quantity may be converted is the binary digital code (see Chap. 16). A diagram of the 8 choices for a 3 binary digit code is illustrated in Fig. 1. The dark areas indicate, say, voltage (or current) "on," and the white areas, voltage (or current) "off." They are termed respectively marking and spacing.

FIG. 1. Diagram of 3-digit binary code selections. (Choices numbered 1 to 8; shaded = mark, unshaded = space.)

A variation of this code sometimes used to simplify an encoding mechanism is the reflected binary or Gray code (Ref. 8). This is shown in Fig. 2. The simplification in mechanism comes essentially because the change from any given choice to the next adjacent choice involves the change of only one binary digit.

FIG. 2. Diagram of 3-digit reflected binary (Gray) code selections. (Choices numbered 1 to 8.)

Other variations of this simple code type have been devised. One such is a coding to include negative values of the encoded quantity (Ref. 9).

More complicated codes have been devised in which the present code designation is a function also of the past history of the value being encoded, or of more than one variable (Ref. 10). These complications are conceived principally to condense the information to be transmitted into the most compact code possible. They do involve an increase in cost of equipment, and a loss in time at both encoder and decoder that may be important where the transmission operates in real time (see Sect. 1, Nature of the Data). Some economic study has been made of such points (Ref. 11). See also Chap. 16.

Processes of Digital Encoding. Only some elementary principles can be mentioned here (Ref. 9). More details of these processes are given in Vol. 2, Chap. 20.

1.
A basic method of encoding consists in laying out the analog input along one dimension of a two-dimensional code matrix and reading the coded output along the other dimension. This can be utilized for any arbitrary code. As an illustration the diagrams of Figs. 1 and 2 may represent plates, with holes punched through the shaded squares, in a cathode ray tube (Ref. 12). The electron beam is deflected along the horizontal coordinate by the analog voltage input. A subsequent vertical deflection then gives the coded signal, in serial form, on an electrode beyond the plate. The beam goes through the punched holes, but is stopped where no holes exist.

2. A second basic method consists in encoding the analog quantity first into a unit-counting code. For each value of the analog quantity to be transmitted a counting mechanism counts and cumulates unit increments up to a value nearest to the input quantity. A unit-counting code is not efficient for transmission since the number of binary digits sent is large. It can be converted into a binary digital code by successive scale-of-two counting dividers (see Vol. 2, Chap. 20). If other codes are desired a further conversion can be made.

3. A third basic method uses the general principle that any decoder may be used for encoding by associating it with an appropriate inverse feedback path. An arbitrary code indication is set up, say the last previous transmission. This is decoded, and the result is compared with the input. The inverse path mechanism uses the error to step the code in the direction to reduce the error. The stepping mechanism continues until the error is less than the smallest choice interval.

Coding mechanisms in which the present output depends on more than the single present input exhibit a greater variety of types and will not be discussed here.

Processes of Digital Decoding and Smoothing. The decoder in general has two broad functions.

1.
Decoding proper converts the digital indication into an analog indication. In nearly all cases this appears as an individual analog indication when the code is received. In some cases this is the only function needed.

2. Holding or storing, and possibly smoothing, the analog indications is needed where individual indications are required at more frequent intervals than the code permits, or where continuous analog indications are required.

The decoding function proper may be classified by types of mechanism, as for the encoder. For the moment these are limited to the case where a single input leads to a single output.

(a) In a basic type of decoder the choices indicated by the respective binary digits lead to the single element of an arbitrarily prearranged matrix. This element translates to its prearranged analog output. An example of this is shown in Fig. 3 in terms of relays. The respective digits operate relays 1, 2, and 3. Any given choice leads to some resistance of the matrix M. These are chosen in advance to yield the desired analog voltage V at the output, for the given choice.

FIG. 3. Relay matrix for 3-digit code.

(b) A variation of this is applicable to codes where the successively ordered digits contribute proportioned weights to a cumulation of the analog total. This occurs in the binary digital code. An example is shown in Fig. 4. The successively ordered digits choose respective resistances R1, R2, and R3. These are proportioned to cumulate currents, in the progressive ratios of 4, 2, and 1, in the output resistance, which must be low compared to the R's to keep the contributions independent. The output voltage V gives the analog for the binary digital choice.

FIG. 4. Relay matrix for 3-digit binary code.

(c) If instead of current contributions, successive pulse counts are cumulated, the process leads to a translation from a binary digital to a unit-counting code.
This can then translate further to the final analog quantity. Such an arrangement is the inverse of the second basic encoding process.

(d) Finally an encoder may be inserted in an inverse feedback path for conversion into a decoder. An arbitrary analog output is set up, say the last value just previously decoded. This is encoded, and the code indication is compared with the present input code. The inverse feedback path uses the difference to change the analog value in the direction to reduce the difference. Several steppings of the code may be needed until identity is secured. This basic method is not easily applicable to arbitrary codes.

Holding the analog signal requires some form of temporary storage (Ref. 13). Where the error objective calls for a more accurate interpolation between the discrete values, still more equipment is needed. The process has been here called interpolation, but it is clear that after one discrete value has been obtained and before the next is available, the process really required is extrapolation or prediction.

The principles are described in Chap. 17 for the optimizing properties required in the above processes. More than just an electrical filter may be needed, because of the discrete character of the values. Where the data are such that the best correlation occurs between successive values of the analog variable, a mere holding, or zero order prediction, is optimum. Where, as is quite possible in practice, a good correlation holds between successive rates of change (or velocities), a first order predictor is better. This predicts from the velocity as derived from past data. Where a good correlation holds on the accelerations, a second order predictor would be called for.

This second function of the decoder may be used by itself in cases where the data were merely sampled as discrete analog data, and not digitally encoded.

Error Detection Codes.
It is possible to introduce deliberate redundance into the code used in the data transmission path. This establishes auxiliary relationships. At the receiver a test may be made for these auxiliary relationships. When they are found missing, the fact is an indication of error in the transmission.

A simple form of this redundance is the parity check (Ref. 14). For this the message is divided into successive groups of binary digits, and an extra digit is provided at the end of each group for the redundant information. The number of marks in the group is noted as being even or odd. If even, the added digit is made marking; if odd it is made spacing, to make the total always odd. Hence the reception of a total even number of marks indicates an error. Undetected errors can exist when two errors conspire to maintain the total odd. The system can also be arranged to make the correct total always even.

Another example is the 2 out of 5 code. Here 5 binary digits are always disposable for the signal, and 2 of these are always made marking. This gives 10 combinations, which is very handy for translation from and to a decimal digit code. When the receiver receives any other than 2 marks, it indicates an error. Two errors can combine here also to evade detection. The code may also be used with 3 marks out of the 5 binary digits.

The redundance may be increased to the point where the specific digit in error may itself be located in the signal, and therefore corrected. This is an error-correcting code. Some combinations of errors exist that can be detected by this, but not corrected. Even rarer combinations are possible which evade detection.

Modulation and Multiplexing Methods

Several steps must be considered in these processes.

Baseband Signal. The information which is to be transmitted from one point to the other eventually appears in the form of an electric amplitude (say a voltage or a current) before it is propagated over the transmission medium.
In a continuous analog system it appears, say, as a continuously varying d-c voltage. In a discrete analog system it appears as a succession of pulses of varying voltage amplitudes. In a binary digital system it appears as a succession of pulses each individually of either marking or spacing voltage amplitude.

The signal in this form is called a baseband signal. It has a frequency spectral distribution of power which goes down to and includes zero frequency (or d-c). Its amplitude distribution depends upon the shaping of the individual pulses or shape factor (where pulses are involved) and upon the sequence of amplitudes which codifies the information or discrimination factor (Ref. 15). Nyquist has shown (Ref. 15) that the complex amplitude at each frequency is equal to the product of these two factors, each of which is complex. In a code that gives sufficient randomness to the signal, and that permits positive and negative voltage values, with an average of zero, the long time average of the power distribution in the discrimination factor is flat over the frequency range. In such a system, therefore, the signal power distribution is equal to that for the shape factor, or for a single pulse (aside from a normalizing factor).

For an idealized pulse with rectangular sides, such as shown at A in Fig. 5, the frequency band is infinite, as illustrated by the full line. However, most of the power is located below the frequency 1/T, where T is the pulse duration. Practical pulses are in general rounded somewhat as shown at B of Fig. 5. For these, all but a negligible proportion of the power is located below frequency 1/T, as illustrated by the dotted line. The pulse form at C indicates that obtained from the full line spectrum, cut off to zero above frequency 1/T.

FIG. 5. Power spectra of various pulses. (Vertical axis: relative power.)

Nominal Effective Band.
Nyquist has further shown (Refs. 10 and 15) that the minimum frequency band required to transmit independent amplitudes for each pulse, where the pulse separation is θ, is 1/(2θ). This may be called a nominal effective band. In practice a somewhat larger band is generally used.

If the successive pulses are set up edge to edge (that is, θ made equal to T), then in the full line of Fig. 5, the nominal effective band reaches from 0 to 1/(2T). The band actually used in practice usually reaches to 1/T, or twice as far. The part of the band between 1/(2T) and 1/T transmits a portion of the signal that has low power, and may be designated as rolloff band. Occasionally a narrower rolloff band than that reaching to 1/T may be used. This results in oscillatory transients or "ringing" before and after each pulse of the signal.

Where short pulses are transmitted at infrequent intervals, θ ≫ T. In such cases a wider frequency band is used than necessary for the information, and 1/T ≫ 1/θ. Additional pulses from other channels of information can be interlaced in between, to use the frequency space more fully. It is found in Sect. 3, Tolerances, that this leads to the need for high fidelity in the transmission.

Amplitude Modulation. Except for direct wire transmission over short distances it is not usually practicable to transmit a baseband signal of the spectral distribution illustrated in Fig. 5. This is because it involves transmission all the way down to and including d-c.

FIG. 6. Spectra of baseband and carrier pulses. (Axes: relative power versus frequency.)

A simple procedure to avoid the need of d-c is to use the baseband signal to form the envelope of a carrier wave, as shown at B in Fig. 6. Here the spectrum becomes that of the carrier frequency and two symmetrical sidebands. Each of these has the same shape as the baseband. The baseband or envelope signal is recovered at the receiver usually by the use of a rectifier.
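The nominal effective band and rolloff band defined above reduce to simple arithmetic. The sketch below is an illustration only. The function names are hypothetical and the pulse duration of 1.54 milliseconds is an assumed example value, with pulses taken as set edge to edge so that the separation equals the duration.

```python
def nominal_effective_band(pulse_separation_s):
    """Minimum band, cycles per second, for independent pulse amplitudes: 1/(2*separation)."""
    return 1.0 / (2.0 * pulse_separation_s)


def practical_band(pulse_duration_s):
    """Band usually used in practice, reaching to 1/T."""
    return 1.0 / pulse_duration_s


def rolloff_band(pulse_duration_s):
    """Width of the band between 1/(2T) and 1/T for edge-to-edge pulses."""
    return practical_band(pulse_duration_s) - nominal_effective_band(pulse_duration_s)


if __name__ == "__main__":
    T = 1.54e-3  # assumed example pulse duration, seconds
    print(round(nominal_effective_band(T), 1))  # about 324.7 cycles per second
    print(round(practical_band(T), 1))          # about 649.4 cycles per second
    print(round(rolloff_band(T), 1))            # about 324.7 cycles per second
```

For edge-to-edge pulses the rolloff band is as wide as the nominal effective band itself, which is why the practical band is described as reaching twice as far.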
Certain precautions are needed when using a carrier signal in this way. Interferences are likely to develop between the lower sideband and the baseband, if the carrier is placed at such a low frequency that they overlap. Each of these interferences can, if needed, be reduced at the source. The more usual procedure, however, is to allocate the spectrum to avoid such an overlap.

Another point to be noted is that over certain types of telephone facilities, in which another carrier wave is used for the transmission, the data signal carrier may not be reproduced at its exact frequency. The received signal may be displaced up to two cycles per second from that transmitted (in some older facilities this may be some 20 or 30 cycles per second). Thus the system cannot be designed in such fashion that the reproduction of this exact frequency is critical. For example, the carrier frequency cannot be depended upon for use as a synchronizing frequency.

Vestigial Sideband Transmission. The information carried in one sideband is duplicated in the other. Thus only one of the two is necessary for transmission, and a saving in frequency space required in the transmission medium is achieved by suppressing the other sideband. However, cutting off a band sharply at the carrier is difficult in data transmission, where the power spectrum contains frequencies close to and including the carrier. To solve this Nyquist (Ref. 15) has indicated a sloping cutoff as shown in Fig. 7. This retains a "vestige" of the suppressed band, and is, therefore, called vestigial sideband transmission.

FIG. 7. Vestigial sideband spectrum.

The cutoff through the carrier region introduces interfering spurious components to the signal. These are called "quadrature" components because in their interfering effect they add in quadrature to the undistorted signal.
The interference usually causes an impairment in signal-to-noise ratio, which is discussed more fully in Sect. 3, Quadrature Component in Vestigial Sideband Transmission.

When all the precautions which have been discussed are allowed for, it is found that with double sideband modulation a signaling speed of about 650 signal elements (each of duration T = 1.54 milliseconds in Figs. 5 and 6) per second may be transmitted over a substantial proportion of telephone circuits (Ref. 16). With vestigial sideband transmission about 1600 signal elements per second have been transmitted over selected and suitably treated telephone circuits (Ref. 17). This is slightly more than double that with double sideband. The increase comes mostly from the use of the vestigial band, but in part from selection and treatment of circuits.

Frequency Modulation. Characteristics of the carrier wave other than its envelope amplitude may be varied in accordance with the baseband signal. A common example is the variation of its instantaneous frequency. This can have certain advantages, for example, when transmitting over a medium whose amplitude response at the receiver varies from instant to instant. A more detailed analysis also shows that transmission by frequency modulation is less subject to impairment from noise than by amplitude modulation (Ref. 18).

Other Methods. Still other characteristics of the carrier wave may be varied to indicate the signal (Ref. 19). Phase modulation may be used. Or the data signal (whether itself constituted of pulses or not) may be transmitted over a medium that uses a pulse code form of modulation (Ref. 20). In this case the instantaneous amplitudes of the data signal are reproduced by a secondary pulse code. However, the requirements for this mode of transmission have not as yet been worked out. The range of different possibilities is very great.

Frequency Division Multiplex. In Fig. 6 a second carrier with its sidebands may be placed at a higher frequency than B, and transmit an independent signal. More carriers can, of course, continue to be added, as suggested in Fig. 8. The limit is the frequency band available on the transmission system. This is known as multiplexing by frequency division.

FIG. 8. Carrier signal spectra, frequency discrimination. (Axes: relative power versus frequency.)

A consequence of this form of multiplexing lies in modulation products which are generated between pairs (and larger groups) of the simultaneously operating channels. These products can cause interference into the channels in which they fall. The modulation arises from nonlinearity in the transmission process, at a possible variety of points according to the details of the facility used. Engineering precautions are needed to keep the interference down to an acceptable level.

Time Division Multiplex. Where a signal uses a basic pulse which is repeated at much longer intervals than its own duration, other independent signals may use pulses which are inserted intermediate between these. A scheme of five such channels is indicated in Fig. 9. The limit is, of course, fixed by the relative durations of the individual pulses, the spacing interval between pulses of the same channel and the guard space which is required between pulses of one channel and of its nearest neighbors to prevent mutual interference.

FIG. 9. Pulse signal profiles, time discrimination. (Horizontal axis: time.)

The choice of whether frequency or time division multiplexing is preferable in a given case depends upon the nature of the transmission impairments to be expected and upon the relative costs. Both methods are in extensive use.

Other Forms of Multiplexing. In a generalized study of multiplexing (Ref. 21) it is found that n independent channels can be multiplexed on a given signal through the superposition of n mutually orthogonal functions. The arrangements discussed above represent two possible solutions, but there are many others. A single example is the use of independent amplitude modulation channels on the sine and cosine waves of a carrier. Although these other orthogonal function solutions offer possibilities in the art, they have not at the present time received as much design effort on actual embodiments as have the frequency division or the time division multiplexing.

Auxiliary Signals

It was noted in Sect. 1, Nature of the Data, that when the data are presented in certain ways it is desirable to include some starting or other auxiliary information to mark out specific blocks of the information or to give other reference conditions.

This auxiliary information needs to be distinguished in some way from the primary information. It comes regularly in the organization of the transmission, so that while the system is in normal operation the distinction need not be particularly conspicuous. However, for one cause or another the transmission may occasionally be interrupted. When this occurs, reestablishment is likely to be quicker, the more distinctive the auxiliary signals are.

Multiplexing the auxiliary signal with the principal signal may be done in a large variety of ways. A simple form is used in the standard teletypewriter. Here the distinction is secured by setting up a pattern of marking and spacing in the binary signal that is not duplicated in any portion of any character. As indicated in Fig. 10, this pattern consists of a marking stop signal that in duration is equal to or greater than 1.4 signal elements, followed by a spacing start signal of one signal element duration. At a and b are shown stop signal elements of minimum duration, at c one of longer than minimum duration.

FIG. 10. Stop and start pulses in teletypewriter signal.

There is, of course, some possibility that the excess over the minimum duration would bring the stop signal to 2 signal elements. This could then be duplicated in portions of characters, which could then be confused with the stop-start pattern. This does occur on occasion, and the return to normal operation takes several characters.

At the other extreme, distinctiveness in the auxiliary signal is obtained by the use of an extra, narrow band carrier channel for it. One example of this is illustrated in Fig. 11.

FIG. 11. Double sideband signal with auxiliary word start channel. (Horizontal axis: frequency.)

Another method is to use amplitude discrimination (Ref. 17). This is illustrated by the signal of Fig. 12.

FIG. 12. Use of amplitude discrimination for word start channel.

With this arrangement the amplitude range permitted for the signal is reduced in comparison with the power capacity of the transmission medium, since the maximum capacity is used for the auxiliary signal. Thus the effective signal-to-noise ratio is less than it could be if the principal signal utilized the full capacity.

There are numerous other methods of introducing the auxiliary signals.

3. TRANSMISSION IMPAIRMENT

Electric circuits do not reproduce signals with complete fidelity. A basic element of the engineering of a system consists, therefore, in establishing tolerances on the permissible impairment of the signal consistent with acceptable performance. The influence of limitation of the frequency bandwidth has already been discussed in Sect. 2.

Noise

No electric communication circuit is ever free from varying currents and voltages which are uncorrelated with the transmitted signal (except possibly in some statistical manner) and which tend to be confused with it at the receiver. These erratic waves have been perceived and studied in telephony. They have there been named noise because they end in audible noise in the receiver.
The term has, however, been generally extended in the art to cover the effect in other types of communication.

Since noise is unpredictable in detail, it has in general to be dealt with in a statistical manner. Extensive study has been made of its statistical and other properties (Ref. 22). The discussion here is confined to a simple exposition of what one can expect in signal transmission media.

Single Frequency Noise. The noise wave may consist of a sustained single frequency. On the time scale this is a simple harmonic variation in voltage or current. On the frequency scale it is a single line spectrum.

Single Impulse Noise. On the other hand, the wave may consist of a sharp impulse at a given time. On the frequency scale it consists of a density of components which is uniform in amplitude out to some frequency beyond which it drops and approaches zero. This frequency depends upon the duration of the impulse. For an infinitesimal duration the frequency is infinity. The phases of the components are closely correlated.

Cumulation to Gaussian Noise. It is clear that the single frequency and the single impulse represent opposite extreme types of noise. In practice one can encounter a cumulation of a number of different single frequencies, each of different amplitude and phase. One can also encounter a cumulation of different single impulses, each of different amplitude and timing. Each of these cumulations, as it becomes more extensive, and with sufficient randomness in its components, approaches "white" Gaussian noise (Ref. 23).

White Gaussian noise may be defined in simple terms as random noise which has a Gaussian distribution of amplitude in the time domain and uniform distribution of power in the frequency domain. (To keep the total power finite, the uniformity need extend only somewhat beyond the frequency range under consideration for the signal.)

The term "white" stems from its analogy to Rayleigh-Jeans radiation in optics (Ref. 24). This is somewhat of a misnomer, however, in that this radiation has a uniform distribution of power in the frequency spectrum. A white radiation used by colorimetrists has uniform power distribution in the wavelength spectrum. Thus Rayleigh-Jeans radiation, which has more power than this in the blues and less in the reds, is not white but really blue.

FIG. 13. Normalized Gaussian and Rayleigh noise distributions. (Horizontal axis: amplitude, ratio to rms; curves: Gaussian distribution, Rayleigh distribution.)

The Gaussian distribution in a normalized form is expressed by the equation

(1) $P\left(\frac{I}{\sqrt{k}}\right)\frac{dI}{\sqrt{k}} = \frac{1}{\sqrt{2\pi k}}\, e^{-I^{2}/2k}\, dI.$

This is the probability that the instantaneous amplitude lies between I/√k and (I + dI)/√k, where k is the normalizing constant, equal to the mean square value of I. A plot of this normalized distribution is illustrated in the full line of Fig. 13. The normalization consists in referring to the amplitude I in terms of its ratio to the rms value √k.

Noise encountered in practice is rarely apt to be exactly any one of the three types which have been described. These are, however, much used as idealizations for mathematical and engineering purposes.

Non-White and Non-Gaussian Noise. On occasions where the noise actually encountered is sufficiently different from the idealization to influence a conclusion it is necessary to deal with the non-white and non-Gaussian noise as such.

The deviation of noise from the idealized white Gaussian may be characterized in a number of different ways. It may be characterized by size, as large or small; toward single frequency idealization or toward impulse idealization; by variation in spectral distribution or in distribution of amplitudes as a function of time; or other characteristics.
One frequently used variant is noise obtained from Gaussian noise by envelope rectification. This follows a Rayleigh distribution of amplitudes (Ref. 23). The Rayleigh distribution in normalized form is expressed by the equation

(2) $P\left(\frac{I}{\sqrt{k}}\right)\frac{dI}{\sqrt{k}} = \frac{I}{k}\, e^{-I^{2}/2k}\, dI, \quad I > 0; \qquad = 0, \quad I < 0,$

where the symbols have the same meaning as in eq. (1). A plot of this distribution, in the normalized form, is illustrated in the dotted lines of Fig. 13.

A second frequently used variant is filtered white Gaussian noise. This modifies the power spectrum of the noise. A special case occurs when the filter has a passband which becomes narrow compared to the spectrum of the signal which is being disturbed. In this case the noise approaches single frequency noise.

A third variant is the impulsive noise encountered in communications circuits. This is cumulated from single impulses of the type which have been considered. However, the number cumulated is small enough and not sufficiently random in amplitude or time of occurrence, so that the distribution of amplitudes (including zero amplitude, which is important) is not Gaussian. This can occur for example in telephone circuits exposed to some forms of dial-switching equipment or to static. It is necessary for close engineering in such cases to determine the exact distribution of amplitudes, and sometimes of timing instants.

Influence of Noise on Error. The effect of noise, of course, is that it changes the received wave and tends to cause the signal for one set of data to be confused with that for another.

CASE 1. Analog Data. The error may vary continuously from zero to large amounts. One simple method of expressing the error is in terms of its rms value. There are other methods (see Chap. 17), but they usually lead to more complicated techniques of engineering. The optimum noise performance may be obtained in a given system when both signal and noise are filtered through the optimum filter (see Chap. 17).

FIG. 14. Effect of noise in causing errors in data pulse signals. (Axes: amplitude versus time.)

CASE 2. Digital Data. The received signal wave may vary over a range without any misinterpretation resulting (Ref. 25). When the departure exceeds this range, however, a marking signal may be misinterpreted as a space, or a space as a mark. A simple illustration of this is shown in the baseband signal of Fig. 14. At a is the assumed signal, consisting of a space, a mark, two spaces, two marks, and a space. The interpretation of marking or spacing is assumed to be made according to whether the wave amplitude falls above or below the critical value b, at sampling instants which are designated on the line b. The effect of a few noise pulses is indicated by dotted lines on a.

In the simple baseband system shown, the critical level b is at half the marking level, or 6 db below marking. Thus the critical signal-to-noise ratio, in terms of marking level to noise peaks, is 6 db. For a higher signal-to-noise peak ratio, no errors in transmission are caused by the noise. For a lower ratio, errors appear.

Because of the erratic nature of noise, this is not always a convenient specification. If the noise has, say, a Gaussian distribution, no matter what the critical level b is, there is some finite probability that it will be exceeded (with the appropriate polarity) and cause an error. The engineering of the system then consists in first setting an acceptable error performance (see Sect. 1, Error Standards). This sets the acceptable probability for the existence of noise pulses of a given polarity 6 db below marking level. In Fig. 13 one can tell, for Gaussian noise, how far above rms a level must be to occur with any given probability. This indicates how far below the critical level b the rms of the noise must be set.
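The procedure just described for Gaussian noise can be sketched numerically. This is a hedged illustration, not the handbook's own computation: it assumes the error probability is the one-sided Gaussian tail beyond the critical level, it takes the critical level at exactly half the marking level (about 6 db), and the function names are hypothetical.

```python
import math


def gaussian_tail(x):
    """Probability that Gaussian noise exceeds x times its rms value, one polarity only."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def level_for_probability(p, lo=0.0, hi=40.0):
    """Ratio to rms whose one-sided tail probability equals p (0 < p < 0.5), by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid) > p:
            lo = mid   # tail still too large: the level must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def required_snr_db(p):
    """Marking level to rms noise, db: critical-level ratio plus the half-level margin."""
    critical_to_rms_db = 20.0 * math.log10(level_for_probability(p))
    margin_db = 20.0 * math.log10(2.0)  # critical level at half marking, about 6 db
    return critical_to_rms_db + margin_db


if __name__ == "__main__":
    for p in (1e-3, 1e-5, 1e-7):
        print(p, round(required_snr_db(p), 1))
```

Under these assumptions each hundredfold reduction in error probability costs only a few db of signal-to-noise ratio, reflecting the steep fall of the Gaussian tail.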
If the ratio of the critical level to the rms noise is translated into db, then by adding 6 db one obtains the figure which must be specified for the signal (marking level) to noise (rms) ratio for the system, in order to meet the desired error performance.

FIG. 15. Error probabilities from Gaussian and Rayleigh noise distributions. (Horizontal axis: error probability, per cent; curves: Gaussian noise, Rayleigh noise.)

For convenience this ratio in decibels has been plotted as the solid line in Fig. 15. A dotted line is shown for the case of noise having a Rayleigh distribution.

If the noise is of the impulsive type it is usually impractical to specify it in terms of its rms value, as a ratio to the signal. In such cases it is apt to be expeditious to make measurements on the noise itself, to determine the amplitude of its peaks at the error frequency which has been set. Then the marking amplitude can be set 6 db above this.

The noise margin of 6 db which has been discussed here is a basic figure. If the signal has three levels to be recognized, as in Fig. 12, the figure has to be increased. It will also be found (see below) that other margins need to be added to allow for other impairing effects on the signal.

Echoes and Equalization

Echoes and Transfer Functions. In general, no transmission medium reproduces a sent signal wave shape exactly in all its characteristics. The departures from exact faithfulness can be considered from two points of view, sometimes one being more convenient and sometimes the other.

According to the first point of view the departures may be considered as a succession of more or less delayed echoes of the original wave.
Some of the echoes are of the same polarity as the original wave, and some of the opposite polarity.

According to the second point of view the signal and its transmitted reproduction may be analyzed into their Fourier transforms. Each Fourier component of the reproduction is obtainable from that of the original by multiplication by an amplitude response factor and displacement by a phase shift. The factor and phase shift vary from frequency to frequency over the spectrum of the signals.

Since the Fourier transform is unique, the complete description in a linear system of a given case according to the one viewpoint can also be matched exactly by a complete description according to the other viewpoint. Whichever one is used is then simply a matter of engineering convenience.

Practical experience indicates that the echo treatment leads quickly to equalizer designs. This is because it forms an immediate bridge to functions of frequency, and equalizer designs are simple in such terms. The design of filters and equalizers from transfer functions of time is usually far more cumbersome.

Equalization. In practical transmission systems these distortions are usually reduced by what is called equalization. A network is placed in tandem with the system which again multiplies all the Fourier components, each by an amplitude response factor, and displaces each by a phase shift. The response factor of the network is designed to vary in the inverse way from that of the system, so that the products of the two are as nearly as possible constant over the frequency spectrum. For this reason the network is called an equalizer and the process called equalization. The phase shifts are designed to add together to a total phase shift which is proportional to frequency.
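The correspondence between the echo viewpoint and the frequency-response viewpoint, and the inverse action of an equalizer, can be illustrated with a small numerical sketch. It assumes a path with a single echo of relative amplitude a and delay tau, ignores the absolute propagation delay, and uses an idealized inverse network; the function names and the numerical values are hypothetical choices for illustration.

```python
import cmath
import math


def echo_response(a, tau, omega):
    """Complex response of a path with one echo of relative amplitude a and delay tau."""
    return 1.0 + a * cmath.exp(-1j * omega * tau)


def equalizer_response(a, tau, omega):
    """Idealized inverse network: its product with the path response is unity."""
    return 1.0 / echo_response(a, tau, omega)


if __name__ == "__main__":
    a, tau = 0.05, 1.0e-3              # assumed echo amplitude and delay (seconds)
    omega = 2.0 * math.pi * 200.0      # assumed test frequency, radians per second
    h = echo_response(a, tau, omega)
    # For a small echo the amplitude factor ripples as 1 + a*cos(omega*tau)
    # and the phase lag ripples as a*sin(omega*tau).
    print(abs(h), 1.0 + a * math.cos(omega * tau))
    print(-cmath.phase(h), a * math.sin(omega * tau))
    # Path and equalizer in tandem give a flat response at this frequency.
    print(abs(h * equalizer_response(a, tau, omega)))
```

The ripple in amplitude and phase repeats at frequency intervals of 1/tau, so a long-delayed echo appears as a fine-structure ripple and a short-delayed echo as a slow bend across the band.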
Since the perceptive mechanism of the ear is not very responsive to phase shifts, the equalization of telephone circuits has generally concerned itself almost exclusively with an equalization of the amplitude response factor. This unconcern with the phase correction is occasionally of importance in the use of telephone facilities for data transmission. It sometimes requires the insertion of phase correcting networks to supplement the amplitude correction already existing for the telephone use.

It is not usually possible or economically feasible in practice to correct a system exactly. The considerations which are given below can apply equally well to a residual departure, left after such correction has been applied as is practical, or to an uncorrected facility.

Impairment of Noise Margin. It is clear that an echo partakes of one property in common with noise, i.e., it changes the received wave and tends to cause the signal for one set of data to be confused with that for another. Where a given deviation has previously been set as acceptable, the echo uses up some of this possibility for deviation and leaves less of it as an allowance for the noise. In this sense, therefore, it impairs the noise margin of the system. Where, as was noted above, a margin of 6 db was necessary for the marking signal level over individual noise peaks to just avoid potential error, a greater margin is needed in the presence of echo.

The amount of this excess margin which can be allotted to echo, in any given case, is a matter of engineering judgment. It depends upon the relative costs of reducing the echo and the alternative of reducing the noise. One may say that, in general, an increase of 1 db in margin is small, and that as severe a limitation as this on the echo is economical where the echo is fairly easy to reduce.

FIG. 16. Close-in and remote echoes in signal. (Horizontal axis: time.)
At the other extreme one may say that an increase of 10 db in margin is fairly large. It is likely to be economical only where it is quite difficult to reduce the echo, or where the noise expected is very low. In the engineering of data systems it is convenient to consider steps respectively of 1, 3, 6 and 10 db in the noise margin impairment.

The amplitude of echo that can cause a given impairment depends upon how much it is delayed with respect to the original signal. As an illustration the signal A in Fig. 16 may be followed by a comparatively long delayed echo at B. Here the impairment depends upon the echo amplitude, and it does not vary much with small changes in the echo delay.

The signal may also be followed by a closely spaced echo as at C. The sum of signal and echo is shown at D. Here the major effect of the echo is to change somewhat the wave shape of the signal, but mostly it changes signal amplitude. A substantial part of the effect of the echo consists merely in changing the effective loss of the transmission facility somewhat. This part of the effect can be compensated for by an adjustment of receiving gain. The impairment from a closely following echo of a given amplitude is less than from a long-delayed echo of the same amplitude. Also the impairment can be expected to vary rather rapidly with delay, for the short delays.

Relationship between Echoes and Equalization. This relationship, suggested above, may be examined quantitatively (Refs. 7 and 26).

Consider a single Fourier component of the signal voltage, of frequency ω/2π,

(3) $v = \cos \omega t.$

When this is transmitted over a system that generates an echo of relative amplitude a and relative delay τ, it becomes

(4) $v = \cos \omega t + a \cos \omega (t - \tau) = \cos \omega t + a \cos \omega t \cos \omega\tau + a \sin \omega t \sin \omega\tau,$

(5) $v = (1 + a \cos \omega\tau) \cos \omega t + a \sin \omega t \sin \omega\tau.$

If the overall transmission is designated as v = r(ω) cos [ωt -
FIG. 17. Phase shift characteristics: (a) remote echo, (b) close-in echo. (Horizontal axes: frequency; panel (b) marks the nominal effective band.)

listed in the third column as an echo tolerance. It is converted into decibels in the fourth column.

The tolerance may be placed instead on the ripple amplitude in the amplitude response characteristic. The numerical figure of the third column expresses this ripple excursion, in each direction, in nepers. It is converted, in the fifth column, into decibels.

The tolerance may also be placed on the phase shift ripple. For this purpose the quantity of the third column is assumed as measured in radians. For convenience it is converted into degrees in the sixth column.

So far, nothing has been said concerning the absolute propagation time of the system. When this is taken into consideration, it is found that the phase ripple really occurs about a diagonal straight line through the origin, as illustrated in Fig. 17a.

Where the echo delay is very short, and only a small portion of a ripple cycle appears in the utilized frequency band, the straight line about which the phase deviations are to be taken is not so easily identified. A more or less arbitrary, but practical construction is given in Fig. 17b. Here the straight line is drawn to intersect the actual phase at the frequency which marks the top of the nominal effective band. This is the reciprocal of twice the signal element duration. The phase departure is taken as the maximum double excursion from this straight line, as indicated by Δφ in the figure.

Note that in Fig. 17a the excursion Δφ measures double the ripple amplitude and consequently double the echo amplitude. In Fig. 17b the double ripple amplitude is not accessible within the scope of the plot, and is larger than Δφ. Figure 17a represents a remote echo, such as at B in Fig. 16, and Fig. 17b represents a close-up echo, such as at C in Fig. 16.
It is then clear that for a given excursion Δφ the echo amplitude in Fig. 17b (close-up echo) is larger than for Fig. 17a (remote echo). This variation of the actual echo amplitude for a given Δφ corresponds approximately to the variation in permissible echo amplitude for a given impairment suggested in Fig. 16. Thus (Ref. 7) the specification of the phase departure, for an allotted impairment, is roughly independent of whether the departure occurs as a single long bend such as in Fig. 17b, or as a fine structure ripple such as in Fig. 17a.

The tolerances which are listed in the last two columns of Table 1 were set for a binary digital transmission. Tolerances for a continuous analog transmission have in practice been found to be of the same order of magnitude. However, tolerances for a discrete analog system, with time division multiplexed channels interlaced, need to be much more severe (Ref. 27).

Envelope Delay Distortion. In practice it is often convenient to measure the phase shift characteristic of a transmission system in terms of its envelope delay. This represents the time of transmission of the envelope of a carrier, as the carrier frequency is varied through the spectrum. It is measured as (Ref. 28)

(12) $D = d\varphi / d\omega,$

where D = envelope delay, seconds,
φ = phase shift, radians,
ω = radian frequency, radians per second.

When this differentiation is applied to the simplified eq. (10) the result is

(13) $D = \frac{d}{d\omega}\, a \sin \omega\tau = a\tau \cos \omega\tau.$

The double excursions of the ripples in eq. (13) are

(14) $\Delta D = 2a\tau.$

It is found from eq. (13) that the application of a fixed tolerance on the envelope delay ripple irrespective of the wavelength of the ripple (or corresponding echo delay τ) leads to an exaggeratedly severe limitation on echo amplitude a for large values of τ. In other words the use of the envelope delay for the purpose of specifying limits on phase distortion for data transmission tends to be unduly severe on fine structure excursions in the characteristic. Thus when the envelope delay criterion is used, it is necessary to be aware of this and appropriately ignore the finer structure ripples in the characteristic.

In a general way it is found (Ref. 7) that a delay distortion of ±0.4 signal element duration gives a noise impairment, under unfavorable conditions, of about 3 db. If one takes distortions as roughly proportional to the permissible echoes the tolerance figures are given in Table 2.

TABLE 2. ENVELOPE DELAY TOLERANCES

Impairment, db    Tolerance, Signal Element
1                 ±0.15
3                 ±0.4
6                 ±0.7
10                ±0.9

Quadrature Component in Vestigial Sideband Transmission. An interfering component similar in certain respects to an echo is generated by the usual form of vestigial sideband transmission (Sect. 2, Amplitude Modulation). This is the quadrature component, so called because this interference adds in quadrature to the otherwise undistorted signal (Ref. 29). Although the precise wave shape of this interfering component is different from that of an echo, it does use up signal amplitude range in much the same manner and requires an increase in signal-to-noise margin.

It has been shown (Refs. 29, 30) that the amplitude of this interfering component varies according to how much frequency space is allowed to the vestigial sideband, and, to some degree, to the particular shape of the cutoff. The wider this frequency space, the smaller will be the amplitude of the quadrature component. In actual data transmission practice the vestigial bandwidth used, as measured from the carrier to the frequency at which the response drops to a very low value, tends to run from some one-half to one-fourth of the nominal effective band.

The amplitude of the quadrature component can also be changed by changing the depth of modulation of the signal.
The depth of modulation is reduced by sending a finite amplitude (instead of the more usual zero amplitude) of carrier during a spacing signal. This reduces the quadrature component, and to that extent it reduces the impairment which it causes in the signal-to-noise ratio. It does, of course, also reduce the amplitude range of the signal between marking and spacing, and to that extent also impairs the signal-to-noise ratio. As the spacing carrier rises, this impairment also rises, and the quadrature component impairment drops. At some value there is a minimum total impairment. A typical case has been worked out by Sunde (Ref. 30) and is illustrated in Fig. 18.

[FIG. 18. Noise impairment caused by quadrature component in vestigial sideband transmission. Horizontal axis: ratio of spacing to marking signal, 0.25 to 1.0; vertical axis: noise impairment, decibels.]

Level Changes

Transmission systems in general show some variation with time in overall net loss (or gain). This comes from a variety of causes, such as changes in temperature (and therefore resistance) of conductors, variation in battery supply, and aging or replacement of vacuum tubes.

Analog System. An amplitude modulated analog system is especially vulnerable to received level change. A system engineered to a possible ±5 per cent error does not represent a very high performance. Yet if all the error is assigned to level change, this is required to be within less than ±1/2 db. This is a severe requirement for anything but a comparatively short direct wire circuit.

Because of this, an analog system usually uses a pilot channel of some type to transmit a reference amplitude. Also, frequency modulation is often preferred to amplitude modulation. Even in this case, however, in some carrier facilities, as was mentioned above in Sect. 2, there is a change in frequency which is analogous to a level change.
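The ±5 per cent figure quoted above for the amplitude modulated analog system can be checked against the ±1/2 db limit with the usual 20 log10 amplitude-ratio relation. The following is a minimal sketch, not part of the original text; the function name `amplitude_error_to_db` is hypothetical.

```python
import math

def amplitude_error_to_db(fractional_error):
    """Return (gain_db, loss_db) for an amplitude change of +/- fractional_error.

    Assumes the level change is expressed as 20*log10 of the amplitude ratio.
    """
    gain_db = 20.0 * math.log10(1.0 + fractional_error)
    loss_db = 20.0 * math.log10(1.0 - fractional_error)
    return gain_db, loss_db

# A +/-5 per cent amplitude error, all of it assigned to level change.
gain_db, loss_db = amplitude_error_to_db(0.05)
print(f"+5 per cent -> {gain_db:+.2f} db, -5 per cent -> {loss_db:+.2f} db")
```

Both excursions come out smaller in magnitude than 1/2 db, which agrees with the requirement stated in the text.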
A pilot channel may thus be needed to send a reference frequency.

Digital System. In a binary digital system a level change cuts into the signal-to-noise margin somewhat in the same way as does an echo. If no change is assumed in the critical level distinguishing a mark from a space, then for the four grades of impairment considered before, the allowances are given in Table 3.

TABLE 3. LEVEL CHANGE TOLERANCES

Impairment, db    r = Amplitude Ratio    (r + 1)/2 = Amplitude Tolerance    Tolerance, db
      1                 0.89                        0.95                         0.5
      3                 0.71                        0.86                         1.4
      6                 0.50                        0.75                         2.5
     10                 0.32                        0.66                         3.6

These are still fairly severe requirements, and usually some compensating device is provided in the system. This can be an automatic adjustment of the critical level at a given fraction of marking level, or an automatic volume adjuster for the marking level, or possibly even both.

If a three-level signal is used, as in Fig. 12, the tolerances are correspondingly more severe.

REFERENCES

1. A. B. Clark, Telephone transmission over long cable circuits, Bell System Tech. J., 2, 67-94 (1923).
   J. T. O'Leary, E. C. Blessing, and J. W. Beyer, An improved 3-channel carrier telephone system, Bell System Tech. J., 18, 49-75 (1939).
   H. J. Fisher, M. L. Almquist, and R. H. Mills, A new single channel carrier telephone system, Bell System Tech. J., 17, 162-183 (1938).
   C. W. Green and E. I. Green, A carrier telephone system for toll cables, Bell System Tech. J., 17, 80-105 (1938).
   R. S. Caruthers, The Type N-1 carrier telephone system: Objectives and transmission features, Bell System Tech. J., 30, 1-32 (1951).
2. F. A. Cowan, R. G. McCurdy, and I. E. Lattimer, Engineering requirements for program transmission circuits, Bell System Tech. J., 20, 235-249 (1941).
   R. A. Leconte, D. B. Penick, C. W. Schramm, and A. J. Wier, A carrier system for 8000-cycle program transmission, Bell System Tech. J., 28, 165-180 (1949).
3. F. A. Cowan, Networks for theater television, J. Soc.
Motion Picture & Television Engrs., 62, 306-313 (1954).
   S. Doba and A. R. Kolding, A new local video transmission system, Bell System Tech. J., 34, 677-712 (1955).
   C. H. Elmendorf, R. D. Ehrbar, R. H. Klie, and A. J. Grossman, L-3 Coaxial system design, Bell System Tech. J., 32, 781-832 (1953).
4. R. E. Crane, J. T. Dixon, and G. H. Huber, Frequency division techniques for a coaxial cable network, Trans. Am. Inst. Elec. Engrs., 66, 1451-1459 (1947).
   K. E. Appert, R. S. Caruthers, and W. S. Chaskin, Application and transmission features of a new 12-channel open-wire carrier system, Trans. Am. Inst. Elec. Engrs., 73, Pt. I, 18-27 (1954).
5. Radio Spectrum Conservation, Report of the Joint Technical Advisory Committee, McGraw-Hill, New York, 1952.
6. A. A. Roetken, K. D. Smith, and R. W. Friis, The TD-2 microwave radio relay system, Bell System Tech. J., 30, 1041-1077 (Pt. II) (1951).
7. P. Mertz, Transmission line characteristics and effects on pulse transmission, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 85-114, Polytechnic Institute of Brooklyn, New York.
8. W. M. Goodall, Television by pulse code modulation, Bell System Tech. J., 30, 33-49 (1951).
9. B. Lippel, A systematic survey of codes and coders, I.R.E. Convention Record, Pt. 8, Information Theory, pp. 109-119, 1953.
10. A. E. Laemmel, Design of digital coding networks, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 309-320, Polytechnic Institute of Brooklyn, New York.
   A. Feinstein, A new basic theorem of information theory, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 2-22, Sept. 1954.
   P. Elias, Predictive coding, I.R.E. Trans. on Information Theory, IT-1, No. 1, pp. 16-33, March 1955.
   D. Slepian, A class of binary signaling alphabets, Bell System Tech. J., 35, 203-234 (1956).
   C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Ill., 1949.
   R. M.
Fano, The transmission of information, Mass. Inst. Technol., Research Lab. Electronics, Tech. Rept., No. 65, 1949.
11. B. M. Oliver, Efficient coding, Bell System Tech. J., 31, pp. 724-750 (1952).
   N. M. Blachman, Minimum-cost encoding of information, Trans. I.R.E., Professional Group on Information Theory, PGIT-3, pp. 139-149, March 1954.
12. R. W. Sears, Electron beam deflecting tube for pulse code modulation, Bell System Tech. J., 27, 44-57 (1948).
13. L. B. Wadel, Analysis of combined sampled and continuous-data systems on an electric analog computer, I.R.E. Convention Record, Pt. 4, pp. 3-7, 1955.
   G. Franklin, Linear filtering of sampled data, I.R.E. Convention Record, Pt. 4, pp. 119-128, 1955.
   S. P. Lloyd and B. McMillan, Linear least squares filtering and prediction of sampled signals, Proceedings of the Symposium on Network Theory, April 13-15, 1955, Vol. V, pp. 221-247, Polytechnic Institute of Brooklyn, New York.
   R. M. Stewart, Statistical design and evaluation of filters for the restoration of sampled data, Proc. I.R.E., 44, 253-257 (1956).
14. R. W. Hamming, Error detecting and error correcting codes, Bell System Tech. J., 29, 147-160 (1950).
   M. J. E. Golay, Binary coding, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 23-28, Sept. 1954.
   P. Elias, Error-free coding, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 29-37, Sept. 1954.
   I. S. Reed, A class of multiple-error-correcting codes and the decoding scheme, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 38-49, Sept. 1954.
   R. A. Silverman and M. Balser, Coding for constant-data-rate systems, Trans. I.R.E., Professional Group on Information Theory, PGIT-4, pp. 50-63, Sept. 1954.
15. H. Nyquist, Certain topics in telegraph transmission theory, Trans. Am. Inst. Elec. Engrs., 47, 617-644 (1928).
16. A. W. Horton and H. E.
Vaughan, Transmission of digital information over telephone circuits, Bell System Tech. J., 34, 511-528 (1955).
17. J. V. Harrington, P. Rosen, and D. A. Spaeth, Some results on the transmission of pulses over telephone lines, Proceedings of the Symposium on Information Networks, April 12-14, 1954, Vol. III, pp. 115-130, Polytechnic Institute of Brooklyn, New York.
18. D. Middleton, On the theoretical signal to noise ratios in FM receivers: A comparison with amplitude modulation, J. Appl. Phys., 20, 334-351 (1949).
19. H. S. Black, Modulation Theory, Van Nostrand, Princeton, N. J., 1953.
20. W. M. Goodall, Telephony by pulse code modulation, Bell System Tech. J., 26, 395-409 (1947).
21. N. Marchand, Analysis of multiplexing and signal detection by function theory, I.R.E. Convention Record, Pt. 8, pp. 48-53, March 1953.
22. P. L. Chessin, A bibliography on noise, I.R.E. Trans. on Information Theory, IT-1, No. 2, pp. 15-31, Sept. 1955.
   J. R. Pierce, Physical sources of noise, Proc. I.R.E., 44, 601-608 (1956).
   W. R. Bennett, Methods of solving noise problems, Proc. I.R.E., 44, 609-637 (1956).
23. S. O. Rice, Mathematical analysis of random noise, Bell System Tech. J., 23, 282-332 (1944); 24, 46-156 (1945).
24. F. K. Richtmyer and E. H. Kennard, Introduction to Modern Physics, p. 189, McGraw-Hill, New York, 1942.
25. B. M. Oliver, J. R. Pierce, and C. E. Shannon, The philosophy of PCM, Proc. I.R.E., 36, 1324-1331 (1948).
26. P. Mertz, Influence of echoes on television transmission, J. Soc. Motion Picture & Television Engrs., 60, 572-596 (1953).
   H. A. Wheeler, The interpretation of amplitude and phase distortion in terms of paired echoes, Proc. I.R.E., 27, 359-385 (1939).
27. W. R. Bennett, Time division multiplex systems, Bell System Tech. J., 20, 199-221 (1941).
28. H. Nyquist and S. Brand, Measurement of phase distortion, Bell System Tech. J., 9, 522-549 (1930).
29. H. Nyquist and K. W.
Pfleger, Effect of the quadrature component in single sideband transmission, Bell System Tech. J., 19, 63-73 (1940).
30. E. D. Sunde, Theoretical fundamentals of pulse transmission, Bell System Tech. J., 33, 721-788, 987-1010 (1954).
31. P. M. Woodward, Theory of radar information, Trans. I.R.E., Professional Group on Information Theory, PGIT-1, pp. 108-113, Feb. 1953.
32. Guide to application and treatment of channels for power-line carrier, Trans. Am. Inst. Elec. Engrs., Pt. III-A, 73, 417-436 (1954).

E. FEEDBACK CONTROL
W. M. Gaines, Editor

19. Methodology of Feedback Control, by W. M. Gaines
20. Fundamentals of System Analysis, by S. J. Jennings and A. A. Winkeljohann
21. Stability, by W. E. Sollecito and S. G. Reque
22. Relation between Transient and Frequency Response, by C. E. Bradford and M. W. DeMerit
23. Feedback System Compensation, by P. G. Cushman
24. Noise, Random Inputs, and Extraneous Signals, by D. L. Lippitt
25. Nonlinear Systems, by W. M. Gaines
26. Sampled-Data Systems and Periodic Controllers, by J. E. Barnes, Jr.

Chapter 19
Methodology of Feedback Control
W. M. Gaines

1. Symbols for Feedback Control
2. General Feedback Control Definitions
3. Feedback Control System Design Considerations
4. Selection of Method of Synthesis for Feedback Controls
References

1. SYMBOLS FOR FEEDBACK CONTROL

Alphabetical List by Letter Symbols

Terminology given in Table 1 is for feedback control covered in the following chapters. In the case of specific physical examples, the terminology of the particular field from which the example is taken will be used; for example, in an electrical example, e may be used for voltage and i for current. The last column of the table lists the chapter where the symbol is first used. This reference may be useful to the reader for looking up discussions of the various quantities.
The nomenclature used is patterned after the standard nomenclature and symbols of the American Standards Association (Ref. 1). Capital letters will be used to represent the Laplace transforms of the time functions; for example, A(s) is the Laplace transform of a(t). An asterisk (*) indicates that the quantity is in sampled form; for example, e*(t) is the sampled form of the signal e(t).

TABLE 1. LETTER SYMBOLS FOR FEEDBACK CONTROL Symbols a A a(t) b,B B b(t) c(t) c*(t) d,D D D(s) e(t) e*(t) f f(t) g(t) G_D(|x|, ω) or G_D h(t) Use or Term Arbitrary constant and/or coefficient for differential equation Arbitrary constant for time response equation Impulse response of reference input terms (function of time) Arbitrary constant for time response equation Magnitude of deadband Primary feedback variable (function of time) Controlled variable (function of time) Sampled form of c(t) Arbitrary constants for time response equation Magnitude of negative deficiency; a denominator term; used also as a subscript Polynomial in s usually the denominator Actuating signal (function of time) Sampled form of e(t) Frequency, cycles per second (see definition of ω) Arbitrary variable (function of time) Impulse response of forward element (function of time) Describing function ℒ ℒ⁻¹ m m(t) Impulse response of feedback elements (function of time) Magnitude of hysteresis ith term in a series, used as subscript Ideal value of the ultimately controlled variable (function of time) Complex number, √−1 kth term in a series, used as a subscript Gain constant for system Dynamic error coefficients, subscript indicates associated derivative Static position error coefficient Static velocity error coefficient Static acceleration error coefficient Denotes application of the Laplace transform integral Inverse Laplace transform Used as a subscript to denote mth term in series Manipulated variable (function of time) M Magnitude of Mm Maximum value of n n(t) Used as a subscript to denote nth term
Output of nonlinear element Particular value of n; a numerator term; a subscript Polynomial in s usually the numerator H i i(t) j k K Ko, K 1 , K 2 , etc. Kp Kv Ka N N(s) ~ (jw) , i.e., I~ (jw) I I~ (jw) I 19-02 First Used Chap. 20 Chap. Chap. Chap. Chap. 25 20 20 26 Chap. 25 Chap. 22 Chap. 20 Chap. 26 Chap. 20 Chap. 25 Chap. 20 Chap. 25 Chap. 20 Chap. 20 Chap. 20 Chap. 20 Chap. 20 Chap. 20 Chap. 21 Chap. 21 Chap. 22 TABLE Symbols p,p pn 2 p(x) p P(x q(t) ret) R tT td tp t8 T u(t) vet) wet) x(t) } yet) ye(t) Yd(t) z(t) z Zn Z a 'Y 8 A ~ f e ~} 7r II(x) n = n) 1. LETTER SYMBOLS FOR FEEDBACK CONTROL (Continued) First Use or Term Used Differential operator, p = d/dt, p2 = d2/dt 2 Chap. 20 nth pole Chap. 22 Probability distribution of x Chap. 24 N umber of poles in right half of s-plane Chap. 21 Probability function Chap. 24 Indirectly controlled variable (function of time) Chap. 20 Reference input variable (function of time) Chap. 20 Number of counterclockwise rotations of a vector Chap. 21 from -1 + jO to H(jw)g(jw) locus as w varies from 0 to jrxl to -jrxl to -0 Magnitude or level of saturation Chap. 25 Laplace transform operator = (J' + jw Roots of numerator of G(s), zeros; Zn also used in this Chap. 22 case Roots of denominator of G(s), poles; Pn also used in Chap. 22 this case Time, seconds Rise time, seconds Chap. 22 Delay time, seconds Chap. 22 Time to first peak or overshoot of transient, seconds Chap. 22 Settling time, seconds Chap. 22 Time constant, seconds Chap. 20 Disturbance variable (function of time); step function Chap. 20 Desired value or command variable (function of time) Chap. 20 Impulse response of given element Variables used when standard termmology for feedback systems is not applicable System error (function of time) Chap. 20 System deviation (function of time) Chap. 20 Indirectly controlled system impulse response Chap. 20 z transform operator Chap. 26 nth zero Chap. 22 Number of zeros in right half of s-plane Chap. 
21 Phase angle of closed loop frequency response Chap. 21 Phase margin Chap. 21 Increment; Dirac function, impulse function Chap. 20 Incremental change in variable, usually used Ax, Ay, etc. Denotes equality by definition RelatIve dampmg, damping tactor Phase of angle of open loop frequency response Chap. 21 Frequency, damping, used when (J' and ware not applicable 3.14159 Product sign meaning Xl' X2' X3' X4 • ••• • Xn Chap. 20 Standard deviation (probability); decrement factor 19·03 Chap. 24 Chap. 20 19-04 FEEDBACK CONTROL TABLE ~ n Symbols (x) r cp(r)
TABLE 4. COMMON PERFORMANCE SPECIFICATIONS (Continued)

9. Minimum error criterion
   Specified: Transient or frequency response.
   Definition: The response of the system is adjusted to minimize a function of the total error that results from both signal and noise or extraneous signals. The criterion may take several forms, e.g., minimum squared error, minimum absolute error times time.
   General remarks: Used to optimize system response to reject unwanted noise and pass the true signal. Used to specify performance index when just signal is considered. Within basic assumptions frequency response analysis is very useful. Used on systems which operate on random or noisy data, e.g., missile radar guidance and fire control. Analog computers can be used to apply criterion to nonlinear systems.

10. Phase margin
   Specified: Frequency response.
   Definition: Defined as 180° + phase shift at unity gain of the open loop frequency response.
   General remarks: Used as a rule of thumb in frequency response analysis to indicate stability and performance. Easy to use and to obtain directly from frequency response diagram.

11. Gain margin
   Specified: Frequency response.
   Definition: Gain margin is ratio of maximum stable gain to actual gain, i.e., gain at phase crossover.
   General remarks: Same as 10. Indicates relative sensitivity of system to gain variations. Can be calculated by Routh's criterion. Not as good a criterion for performance as 10. Little used.

12. Mm peak
   Specified: Frequency response.
   Definition: Ratio of maximum of closed loop frequency response to a low frequency value.
   General remarks: Used with Nyquist and frequency response analysis. Rules of thumb relate Mm and transient overshoot. Easy to calculate from frequency response diagram.

13. Band width
   Specified: Frequency response.
   Definition: Defined variously (a) usually as frequency where closed loop response falls to $1/\sqrt{2}$, or 3 db below, its low frequency value, or (b) sometimes as the frequency at the significant peak Mm, or (c) the crossover of the open loop response.
   General remarks: Used with frequency response analysis and is related to speed of response of system. Used also when definite frequency bandpass is needed for fidelity. Mm, bandpass, and the phase shift at these values give a good indication of the closed loop response and are often used when a number of closed loops are operated in tandem as a system.

14. Static error coefficient
   Specified: Frequency response.
   Definition: Defined as the final error resulting from a continuous input of position, or velocity, or acceleration, etc. The magnitude of the input and the maximum tolerable error must be specified.
   General remarks: Used to set low-frequency gain of open loop frequency response. Useful where steady inputs are encountered.

15. Dynamic error coefficients (or steady-state error coefficients)
   Specified: Frequency response and root locus.
   Definition: Defined as the steady-state error resulting from the derivatives of the input function. The time function and/or its derivatives must be specified as well as the maximum tolerable error.
   General remarks: Relates system gain and time constants to errors arising from higher derivatives of input. Used to estimate error resulting from varying input to given system and conversely to determine closed loop pole-zero location to give desired error. Accurate where input varies at slow rate compared to bandpass. Becomes poorer as input varies more rapidly because of transient effects. Used in analysis of fire controls, machine controls, etc., where input varies in an expected manner.

16. Maximum system error
   Specified: Transient response.
   Definition: Defined as the maximum tolerable system error, $y_e$. The input function and operating conditions must be specified.
   General remarks: Distinguished from steady-state error because maximum error under dynamic conditions is specified. Normally used to define performance with a varying input, e.g., automatic milling machine control. Not usually used with simple aperiodic inputs. Used in conjunction with minimum error criterion (9) to place absolute bound on error.

17. Resolution
   Specified: Low level characteristics.
   Definition: Defined as the maximum tolerable change in the input without a change occurring in the output. Input and operating conditions must be specified.
   General remarks: Can appear in various forms, i.e., maximum position input change required to obtain output change, or minimum velocity at which a servomechanism will track with tolerable velocity error.

18. Duty cycle
   Specified: Power element rating.
   Definition: Defined variously, depending upon application. Intent is to define the average power requirement.
   General remarks: Objective of specification is to allow more efficient selection and/or design of the power element. Specification can take the form of an rms power requirement or, where an average does not adequately describe the situation, a time distribution and level may be given. Used extensively when large power drives are involved.

19. Maximum operating conditions
   Specified: Power element rating.
   Definition: Included to indicate the wide variety of maximum performance requirements sometimes specified, e.g., maximum velocity, maximum load torque.
   General remarks: Many of these limits are implied by other performance requirements. Often necessary to define implicitly load requirements separately, e.g., load running torques or power (independent of accelerating torques).

c. Practical Aspects. The ultimate cost and manufacturability must be considered during the synthesis. This, of course, implies ascertaining the physical realizability of the controller and assuring that practical tolerances are maintained. Reliability and ease of servicing must also receive proper consideration. Environmental conditions and customer requirements on component packaging must also be factored into the mechanical design and may affect the performance.

5. Test and Evaluation of Equipment.
In most cases unpredicted and secondary effects will require final adjustment to be made after the actual equipment is assembled. This is often the more economical way to reach a final design when a wide range of adjustment can be included in the design or preliminary models can be built and tested relatively fast. This would be the case for many types of instrument servos. On the other hand, it would be horrendous to attempt this approach with an elaborate, expensive missile system which is expended at each test firing. In such cases the extensive use of analysis and computer facilities to minimize the testing is justified.

4. SELECTION OF METHOD OF SYNTHESIS FOR FEEDBACK CONTROLS

The major analytical methods available to aid in the synthesis of feedback control systems are summarized in Table 5. No general rules are available for the selection of the proper method, and the designer should be familiar with all methods in order to select the one best suited to his problem. It is often desirable to carry root locus and frequency response diagrams in parallel. The root locus supplies time domain information, and the frequency response provides the simplest method of estimating the method of compensation.

System Optimization

None of these techniques allows a completely systematic design approach. The major difficulty is in defining and specifying optimum performance and determining what performance index to use for evaluation. See Chap. 24. Although criteria have been proposed, the mathematical labor involved in the more sophisticated ones is prohibitive. Actually the accuracy and extent of the available data usually warrant the use of only the more simple criteria. These criteria do not encompass the entire problem and therefore must be used carefully. The material in the following chapters presents the available useful design criteria.

TABLE 5. SUMMARY OF MAJOR ANALYTICAL TECHNIQUES FOR FEEDBACK CONTROL SYSTEM ANALYSIS

1. Differential equations (described in Chap. 20)
   Usefulness: Classical solutions of differential equations are generally too involved for practical use in synthesis. Nondimensional performance charts help on second order systems. Significance of individual system component values difficult to ascertain.

2. Routh-Hurwitz criterion (described in Chap. 21)
   Usefulness: Used to determine the limiting stability conditions. Can be extended to include damping factor only with difficulty. Limited usefulness.

3. Root locus (described in Chaps. 21, 23)
   Usefulness: The best solution to the problem of directly synthesizing the time response. Particularly useful when the performance specifications are in terms of the time response. Construction of the diagrams can be time-consuming and the performance can be sensitive to small relative changes of locus in low-frequency region.

4. Frequency response (described in Chap. 23)
   Usefulness: The most used approach presently available. The locus can be plotted in the form of a Nyquist, log magnitude-angle diagram, or the log magnitude and phase diagram. The latter has the advantages of easy construction by templates and of easy introduction of compensating characteristics. Easy to include experimental data in frequency response analysis. The difficulty of relating transient and frequency response is a limitation.

5. Describing functions (described in Chap. 25)
   Usefulness: An extension of the frequency response techniques to nonlinear systems. Good performance criterion not available. Method can treat higher order systems.

6. Closed loop pole-zero location (described in Chaps. 22, 23)
   Usefulness: Requires determining realizable and practical components after the definition of the system response. Not in wide use as yet but possesses the good feature of working directly from the desired closed loop response.

The Use of Computers

The use of computers has supplanted much of the paper design study. This approach allows rapid and complete (often visual) evaluation of the expected system performance.
At the present state of the art, however, it is not possible to obtain a complete design from the computer without interpretation at various steps by the design engineer. The ultimate use of the computers will occur when a complete systematic design can be programmed; but this cannot be done until mathematical expressions can be equated to the decisions now based upon "engineering judgment."

Availability of computers has not eliminated the need for a thorough knowledge of the standard feedback control techniques for analysis. Although, when an analog computer facility is available, the conventional analytical techniques are used principally for preliminary, order-of-magnitude estimates and for verifying computer solutions, experience has shown that a thorough knowledge of alternate techniques will enhance the usefulness of the computers.

REFERENCES

1. Letter Symbols for Feedback Control Systems, ASA Y10.13-1955, American Standards Association, New York, July 1955, Sponsored by American Society of Mechanical Engineers.
2. I.R.E. Standards on Terminology for Feedback Control Systems, 1955, Proc. I.R.E., 44, No. 1 (1956).
3. Proposed Symbols and Terms for Feedback Control Systems, A.S.E.E. Subcommittee Rept., Elec. Eng., October 1951.

Chapter 20
Fundamentals of System Analysis
S. J. Jennings and A. A. Winkeljohann

1. Representation of Physical Systems
2. Classical Methods of Analysis
3. Block Diagrams
4. System Types
5. Error Coefficients
6. Analysis of A-C Servos: Carrier Systems
References

1. REPRESENTATION OF PHYSICAL SYSTEMS

Methods of System Analysis

In order to study the performance of a physical system, equations must be written from the physics of the situation to describe the excursion of all variables. To describe the operation of a physical system in mathematical form, its differential equations may be written which, in general, will be nonlinear in character.
In many cases it is possible, by restricting the region for which results are valid, to write linear differential equations with constant coefficients for the system. The solution of the linear differential equation then yields the complete steady-state and transient response of the system for a given input. The transient response indicates the system stability while the steady-state response to a sinusoidal input is very useful in system synthesis.

TABLE 1. TYPICAL COMPONENT EQUATIONS (Ref. 10)

Translation systems

Mass: $f_1 - f_2 = M\,\dfrac{d^2x}{dt^2}$; $\dfrac{dx}{dt} = \dfrac{1}{M}\int (f_1 - f_2)\,dt$
   The net force acting on a body is equal to its mass times its acceleration with respect to an arbitrary fixed reference.

Spring: $f = Kx$; $\dfrac{dx}{dt} = \dfrac{1}{K}\,\dfrac{df}{dt}$
   The force which must be applied to each end of a spring to deflect it a distance x is equal to the spring constant K times x.

Dashpot (viscous damping): $f = D\left(\dfrac{dx_1}{dt} - \dfrac{dx_2}{dt}\right)$
   The force which must be applied to each end of a dashpot to produce a relative motion of its two ends is equal to the viscous damping coefficient D times the relative velocity.

Rotational systems

Inertia: $q = J\,\dfrac{d^2\theta}{dt^2}$
   The net torque acting on a body is equal to its inertia times its angular acceleration with respect to an arbitrary fixed reference.

Torsional spring: $q = G(\theta_1 - \theta_2)$
   The torque which must be applied to each end of a torsional spring to produce a relative angular deformation $\theta_1 - \theta_2$ of its two ends is equal to the rotational spring constant times the angular deformation.

Rotational dashpot: $q = B\,\dfrac{d\theta}{dt}$
   The torque which must be applied to a rotational dashpot to cause it to rotate with an angular velocity is equal to the rotational viscous damping coefficient times the angular velocity.

Electrical systems

Inductance: $v_1 - v_2 = L\,\dfrac{di}{dt}$
   The voltage drop caused by current flowing in an inductance is equal to the inductance times the rate of change of the net current flowing in the direction of the drop.

Capacitance: $v_1 - v_2 = \dfrac{1}{C}\int i\,dt$
   The voltage drop caused by current flowing through a capacitance is equal to the integral of the net current flowing through the capacitance divided by its capacitance.

Resistance: $v_1 - v_2 = Ri$
   The voltage drop caused by current flowing through a resistance is equal to the net current flowing through the resistance multiplied by the resistance.

English Gravitational Units
   t   time, seconds
   x   distance, feet
   M   mass, slugs
   K   spring constant, lb/ft
   D   damping coefficient, lb/ft/sec
   q   torque, lb-ft
   J   inertia, slug-ft²
   θ   angle, radians
   G   torsional spring constant, lb-ft/rad
   B   rotational damping coefficient, lb-ft/rad/sec

Electrical Units
   t   time, seconds
   v   voltage, volts
   i   current, amperes
   L   inductance, henrys
   C   capacitance, farads
   R   resistance, ohms

The solution to differential equations by either the direct method or by Laplace transformations is useful primarily in the analysis of a given system with all parameters prescribed. This approach is less useful in the design or synthesis of a control since the effect of the variations of parameters on the exponential time function exponents is difficult to visualize. For more complex systems the problem of factoring the high order polynomial characteristic equation becomes quite laborious. For synthesis the root locus, frequency response, and closed loop pole-zero location methods are recommended (Chaps. 21, 22, and 23).

Even for analysis the classical time solution has been largely supplanted by the wide usage and availability of analog computers. As a result the classical techniques of solution are used primarily as checks on analog computer results or as aids in visualizing the basic performance.
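As an illustration of a classical solution serving as a check on a machine solution, the translation-system rows of Table 1 can be combined, for a mass tied to a fixed reference through a spring and a dashpot, into M d²x/dt² + D dx/dt + Kx = f. The sketch below integrates this equation numerically for a step force and compares the result with the closed-form underdamped response. It is an assumed example only: the parameter values and the function names `simulate_step` and `classical_step` are hypothetical, and a digital step-by-step integration stands in for the analog computer discussed in the text.

```python
import math

def simulate_step(M, D, K, F, t_end, dt=1e-4):
    """Integrate M*x'' + D*x' + K*x = F from rest (step force F applied at t = 0)."""
    x, v = 0.0, 0.0                      # displacement (ft) and velocity (ft/sec)
    for _ in range(int(round(t_end / dt))):
        a = (F - D * v - K * x) / M      # net force on the mass divided by M
        v += a * dt                      # semi-implicit Euler: velocity first,
        x += v * dt                      # then displacement with the new velocity
    return x

def classical_step(M, D, K, F, t):
    """Closed-form step response, valid only for the underdamped case."""
    wn = math.sqrt(K / M)                # undamped natural frequency, rad/sec
    zeta = D / (2.0 * math.sqrt(K * M))  # relative damping
    if not 0.0 <= zeta < 1.0:
        raise ValueError("closed form used here assumes an underdamped system")
    wd = wn * math.sqrt(1.0 - zeta ** 2) # damped natural frequency, rad/sec
    decay = math.exp(-zeta * wn * t)
    return (F / K) * (1.0 - decay * (math.cos(wd * t)
                                     + (zeta * wn / wd) * math.sin(wd * t)))

# Assumed values: M in slugs, D in lb/ft/sec, K in lb/ft, F in lb.
M, D, K, F = 1.0, 0.4, 4.0, 1.0
x_num = simulate_step(M, D, K, F, t_end=2.0)
x_cls = classical_step(M, D, K, F, t=2.0)
print(f"numerical: {x_num:.4f} ft, classical: {x_cls:.4f} ft")
```

With the step size shown, the two values should agree closely; a large disagreement would point to an error in either the equation set-up or the integration.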
The charts included in Sect. 2 are useful in this case.

Although the solution of differential equations is no longer of paramount importance, the correct description of the system or component dynamic performance by differential equations is basic to all methods of analysis and synthesis. It is most important that the control designer understand differential equations and their application to his field of endeavor. A suggested approach for obtaining these physical equations is:
1. Understand the system well enough to draw a schematic diagram showing the relationship of all variables, including all pertinent components as well as the load.
2. Replace the schematic with equivalent circuits or analogies.
3. Rearrange this diagram into convenient noninteracting sections or blocks.
4. Write the characteristic equation of each section from the functional relationship.
5. Obtain the transfer function from these equations.
6. Simplify this block diagram and obtain the complete system characteristic equation by algebraic manipulation.
This sequence is an analysis approach; synthesis of a system reverses this method, starting from known requirements to obtain a system equation.

Physical Laws. To write the equations which mathematically describe the system or component performance, it is necessary to understand the basic operation of the device and the physical laws governing the various processes involved. The wide field of application of feedback control theory makes it prohibitive to list all the fundamental laws that might be required. Table 1 and the following partial list of textbooks, arranged by field of interest, are useful sources for these physical laws.
1. Physics: Erich Hausmann and E. P. Slack, Physics, Van Nostrand, Princeton, N. J., 1948.
2. Electrical: W. L. Everitt, Communication Engineering, McGraw-Hill, New York, 1937.
3. Thermodynamics: P. J. Kiefer and M. C. Stuart, Principles of Engineering Thermodynamics, Wiley, New York, 1954.
4. Fluid Mechanics: R. C. Binder, Fluid Mechanics, Prentice-Hall, New York, 1949.
5. Kinematics: J. L. Synge and B. A. Griffith, Principles of Mechanics, McGraw-Hill, New York, 1949.
6. Circuit Analysis: E. A. Guillemin, Mathematics of Circuit Analysis, Wiley, New York, 1949.
7. Materials: Stephan Timoshenko, Strength of Materials, McGraw-Hill, New York, 1953.
8. Hydrodynamics: H. Lamb, Hydrodynamics, The University Press, Cambridge, England, 1932.
9. Mechanics: F. B. Seely, Analytical Mechanics for Engineers, Wiley, New York, 1952.

EXAMPLES. The following examples illustrate the use of basic physical laws and Table 1 in obtaining the equations describing the system performance. Whenever possible, simplifying initial conditions are chosen.

FIG. 1. Electric circuit.

1. An electric circuit such as Fig. 1 requires:
KIRCHHOFF'S LAW. The summation of voltage drops in a closed loop is equal to zero.

(1)  $E\cos\omega t = Ri(t) + L\,\dfrac{di(t)}{dt} + \dfrac{1}{C}\int_0^t i(t)\,dt.$

FIG. 2. Damped spring mass system.

2. A spring mass system such as Fig. 2 requires:
NEWTON'S LAW. The summation of forces acting on a body equals the rate of change of its momentum.

(2)  $M\,\dfrac{d^2x}{dt^2} = -D\,\dfrac{dx}{dt} - Kx$  or  $(Ms^2 + Ds + K)x = 0.$

3. A combined electrical and rotational mechanical system, a d-c motor with fixed field excitation (ignoring armature inductance) and a pure inertia load, is shown in Fig. 3.

FIG. 3. D-c motor with inertia load.

Summing voltage drops:

(3)  $E(t) = Ri(t) + K_eN(t),$

where $K_e$ = motor voltage constant, $N(t)$ = motor speed.
Summing torques:

(4)  $J\,\dfrac{dN(t)}{dt} = K_t\,i(t),$

where $K_t$ = motor torque constant, $J$ = motor inertia.
Eliminating $i(t)$ from eqs. (3) and (4) results in the transfer function of output speed to input voltage:

(5)  $\dfrac{N(t)}{E(t)} = \dfrac{1}{K_e}\,\dfrac{1}{T_ms + 1},$

where $s = d/dt$, $T_m = RJ/K_eK_t$ = time constant.

4. A common electromechanical system is a synchro with a pure inertia load and with viscous damping, as shown schematically in Fig. 4, where $K_t$ is the torque gradient.

FIG. 4. Schematic of synchro system.

Summing torques:

(6)  $K_t(\theta_1 - \theta_2) = J\,\dfrac{d^2\theta_2}{dt^2} + B\,\dfrac{d\theta_2}{dt}.$

Further examples are used throughout this section.

Circuit Simplification Techniques

Analogies are useful in setting up physical systems and interpreting their boundary conditions, since this approach compares known systems with the unknown. Often thermal, mechanical, hydraulic, and other systems are converted to an electrical equivalent, since electric circuit analysis methods have been developed to a high degree. Examples of conversions of physical systems to electrical equivalents are given in Ref. 2.

ANALOGIES. The following equations show the analogies among three systems:

(7)  Mechanical translatory system:  $M\,\dfrac{d^2x}{dt^2} + D\,\dfrac{dx}{dt} + K_sx = f(t)$

(8)  Mechanical rotation system:  $J\,\dfrac{d^2\theta}{dt^2} + F\,\dfrac{d\theta}{dt} + K_t\theta = T(t)$

(9)  Electric circuit:  $L\,\dfrac{d^2q}{dt^2} + R\,\dfrac{dq}{dt} + \dfrac{q}{C} = e(t)$

where $M$ = mass, slugs; $D$ = damping, lb/ft/sec; $K_s$ = spring gradient, lb/ft; $x$ = distance, ft; $J$ = inertia, slug-ft$^2$; $F$ = friction, lb-ft/rad/sec; $K_t$ = torque gradient, lb-ft/rad; $\theta$ = angular displacement, rad; $L$ = inductance, henrys; $R$ = resistance, ohms; $C$ = capacitance, farads; $q$ = charge, coulombs.
Analogous elements are listed in Table 2.

TABLE 2. ANALOGOUS ELEMENTS (Ref. 2)

Electrical elements:
Electrical resistor. $E_R = R\,\dfrac{d(q_1 - q_2)}{dt}$; $R$ = resistance, $q$ = charge.
Electrical capacitor. $E_C = \dfrac{1}{C}(q_1 - q_2)$; $C$ = capacitance.
Electrical inductor. $E_L = L\,\dfrac{d^2q}{dt^2}$; $L$ = inductance.

Mechanical elements (translational):
Viscous damper. $f = D\,\dfrac{d(x_1 - x_2)}{dt}$; $x$ = displacement, $D$ = damping coefficient.
Spring. $f = K(x_1 - x_2)$; $K$ = spring constant.
Inertia. $f = M\,\dfrac{d^2x}{dt^2}$; $M$ = mass.

Mechanical elements (rotational):
Torsional damper. $T = B\,\dfrac{d(\theta_1 - \theta_2)}{dt}$; $B$ = damping coefficient.
Shaft stiffness. $T = K(\theta_1 - \theta_2)$; $K$ = stiffness coefficient.
Inertia. $T = J\,\dfrac{d^2\theta}{dt^2}$; $J$ = moment of inertia.

Hydraulic elements:
Fluid resistance. $P = R_hq = R_h\,\dfrac{dQ}{dt}$; $R_h$ = hydraulic resistance, $q$ = rate of flow, $Q$ = quantity of flow ($Q_2 = 0$ in electrical analog).
Fluid capacity. $P = \dfrac{Q_1 - Q_2}{C_h}$; $C_h$ = hydraulic capacity, $Q_1$ = quantity of inflow, $Q_2$ = quantity of outflow, $P$ = pressure.

Aids for Circuit Simplification. The following techniques are useful in reducing the system equations to simpler form:

WYE-DELTA TRANSFORMATION. The circuits of Fig. 5 are equivalent if the following relations are satisfied:

(10)  $Z_1 = \dfrac{Z_bZ_c}{Z_a + Z_b + Z_c}$

(11)  $Z_2 = \dfrac{Z_aZ_c}{Z_a + Z_b + Z_c}$

(12)  $Z_3 = \dfrac{Z_aZ_b}{Z_a + Z_b + Z_c}$

(13)  $Z_a = \dfrac{Z_1Z_2 + Z_2Z_3 + Z_3Z_1}{Z_1}$

(14)  $Z_b = \dfrac{Z_1Z_2 + Z_2Z_3 + Z_3Z_1}{Z_2}$

(15)  $Z_c = \dfrac{Z_1Z_2 + Z_2Z_3 + Z_3Z_1}{Z_3}$

FIG. 5. Wye-delta transformation.

SUPERPOSITION. If a system is linear, the system response to several inputs will be the sum of the responses to each input separately. In Fig. 6, inputs $x$, $y$, $z$, and $w$ applied to a linear system with characteristic equation $g$ give the output $g(x) + g(y) + g(z) + g(w)$.

FIG. 6. Superposition.

THEVENIN'S THEOREM. The effect of any impedance element in a circuit may be determined by replacing all the voltage sources by a single equivalent voltage source and all other impedances by a single impedance in series with the impedance of interest. For Fig. 7, the equivalent voltage $E_{ab}$ is equal to the open circuit voltage that is present across a-b with the circuit broken at a-b, and the current in the impedance of interest $Z$ is

$I = \dfrac{E_{ab}}{Z_{ab} + Z}.$

FIG. 7. Thevenin's theorem.

NORTON'S THEOREM.
The current in any impedance $Z_R$, connected to two terminals of a network, is the same as if $Z_R$ were connected to a constant-current generator whose generated current ($I_{sc}$) is equal to the current which flows through the two terminals when these terminals are short-circuited, the constant-current generator being in shunt with an impedance equal to the impedance of the network looking back from the terminals in question. This theorem is similar in many respects to Thevenin's theorem. It is illustrated by Fig. 8.

FIG. 8. Equivalent circuits using Norton's theorem.

Nodal and Mesh Analysis. A general approach to circuit analysis is illustrated by Figs. 9a and 9b and eqs. (16) through (21).

FIG. 9. (a) Nodal approach; (b) mesh or loop approach.

In the nodal analysis the summation of currents at a junction or node is equal to zero. This is useful in solving for an unknown voltage, given driving voltages and impedances. From Fig. 9a,

(16)  $\dfrac{e - e_1}{Z_1} - \dfrac{e_1}{Z_2} - \dfrac{e_1}{Z_3} = 0,$

(17)  $e_1 = \dfrac{e/Z_1}{1/Z_1 + 1/Z_2 + 1/Z_3}$

(18)  $\phantom{e_1} = \dfrac{eZ_2Z_3}{Z_2Z_3 + Z_1Z_2 + Z_1Z_3}.$

Mesh analysis uses voltage summations about the closed loops. Usually the unknown current, such as $i_2$ of Fig. 9b, is found in terms of the known voltages and impedances. From Fig. 9b, $i_2$ is found by using determinants. The denominator of eq. (21) is the determinant of the mesh impedances,

(21)  $\begin{vmatrix} Z_1 + Z_2 & -Z_2 \\ -Z_2 & Z_2 + Z_3 \end{vmatrix},$

and the numerator is the same determinant with its second column replaced by the net driving voltages of the two meshes.

Tables of Typical Transfer Functions. The transfer function of a system or element is the ratio of the transform of the output to the transform of its input under the conditions of zero initial energy storage. It is a complete description of the dynamic properties of a system and may be represented as a mathematical expression of the frequency response, or the time response to a specified input.
Tables 3 to 5 summarize typical transfer functions in Laplace transform form for typical mechanical, electrical, and hydraulic control elements.
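The network relations of this subsection lend themselves to a quick numerical check. The following is a minimal sketch, not part of the original handbook: it converts a delta to a wye with eqs. (10) to (12), converts back with eqs. (13) to (15), and evaluates the nodal result of eq. (18), $e_1 = eZ_2Z_3/(Z_2Z_3 + Z_1Z_2 + Z_1Z_3)$. The function names and all element values are illustrative assumptions.

```python
# Numerical check of the wye-delta relations, eqs. (10)-(15), and of the
# nodal result, eq. (18). Function names and element values are assumptions
# chosen for illustration only.

def delta_to_wye(za, zb, zc):
    """Eqs. (10)-(12): delta impedances (Za, Zb, Zc) to wye (Z1, Z2, Z3)."""
    total = za + zb + zc
    return zb * zc / total, za * zc / total, za * zb / total


def wye_to_delta(z1, z2, z3):
    """Eqs. (13)-(15): wye impedances (Z1, Z2, Z3) to delta (Za, Zb, Zc)."""
    cross = z1 * z2 + z2 * z3 + z3 * z1
    return cross / z1, cross / z2, cross / z3


def node_voltage(e, z1, z2, z3):
    """Eq. (18): node voltage e1 obtained by solving eq. (16) for e1."""
    return e * z2 * z3 / (z2 * z3 + z1 * z2 + z1 * z3)


if __name__ == "__main__":
    # Complex impedances are allowed, so the same code covers a-c circuits.
    delta = (10 + 5j, 20 - 4j, 30 + 0j)
    wye = delta_to_wye(*delta)
    print("wye impedances:  ", wye)
    print("back to delta:   ", wye_to_delta(*wye))

    e, z1, z2, z3 = 10.0, 2.0, 4.0, 4.0
    e1 = node_voltage(e, z1, z2, z3)
    # Residual of eq. (16); it should vanish when e1 satisfies the node equation.
    residual = (e - e1) / z1 - e1 / z2 - e1 / z3
    print("e1 =", e1, " node-current residual =", residual)
```

Converting delta to wye and back should return the starting impedances, and the residual of eq. (16) should be zero to within rounding; either check failing would point to a transcription error in the formulas.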
For a more complete tabulation of transfer functions of RC networks see Chap. 23, Sect. 2. Tables 6 to 8 consist of three sections of a morphological table of servo components appearing in Ref. 11. Further material on the subject of transfer functions may be found in Refs. 3, 11, 12.

TABLE 3. SUMMARY OF TRANSFER FUNCTIONS FOR REPRESENTATIVE MECHANICAL ELEMENTS (Ref. 3a)

Rotation:
Spring mass damper. $\dfrac{\theta_l(s)}{\theta_m(s)} = \dfrac{1/n}{(J/K_s)s^2 + (B/K_s)s + 1}$
Nomenclature: $\theta_l$ = load angular position, radians; $\theta_m$ = motor position, radians; $J$ = moment of inertia, pound-foot-seconds$^2$; $n$ = gear ratio; $K_s$ = shaft spring constant, pound-feet/radian; $B$ = damping torque coefficient, pound-feet/radian/second.

Translation:
Spring mass damper. $\dfrac{X(s)}{Y(s)} = \dfrac{1}{(M/K)s^2 + (D/K)s + 1}$
Spring-dashpot (lag). $\dfrac{X(s)}{Y(s)} = \dfrac{1}{(D/K_s)s + 1}$
Spring-dashpot (lead). $\dfrac{X(s)}{Y(s)} = \dfrac{(D/K_s)s}{(D/K_s)s + 1}$
Nomenclature: $x$ = mass displacement, feet; $y$ = platform displacement, feet; $M$ = mass, pound-seconds$^2$/foot; $D$ = damping coefficient, pounds/foot/second; $K$, $K_s$ = spring constant, pounds/foot.

TABLE 4. SUMMARY OF TRANSFER FUNCTIONS FOR REPRESENTATIVE ELECTRIC ELEMENTS (Ref. 3a)

D-c motor (constant flux):
For speed control. $\dfrac{N(s)}{V_a(s)} = \dfrac{1}{K_e(T_ms + 1)}$
For position control. $\dfrac{\theta(s)}{V_a(s)} = \dfrac{1}{K_es(T_ms + 1)}$
Nomenclature: $N$ = velocity of motor, radians/second; $\theta$ = position of motor, radians; $V_a$ = applied voltage, volts; $K_e$ = voltage constant of motor, volts/radian/second; $T_m$ = motor time constant, seconds.

D-c generator and motor:
For position control. $\dfrac{\theta(s)}{E_c(s)} = \dfrac{K_g/K_eR}{s(T_fs + 1)(T_ms + 1)}$
Nomenclature: $K_g$ = generator voltage constant, volts/field ampere; $R$ = series resistance of motor and generator armature circuit, ohms; $T_f$ = generator field time constant, seconds; $E_c$ = voltage applied to generator control field, volts; $V_a$ = voltage across drive motor terminals, volts.

Galvanometer (constant flux):
$\dfrac{\theta(s)}{I(s)} = \dfrac{K_1}{Js^2(T_1s + 1)}$
Nomenclature: $J$ = moment of inertia of galvanometer element, pound-foot-seconds$^2$; $K_1$ = torque coefficient, pound-feet/ampere; $T_1$ = time constant of galvanometer coil circuit, seconds; $\theta$ = position of galvanometer element, radians; $I$ = signal current, amperes.

Gyroscope (precession coil, constant flux):
Output per unit signal current $I(s)$: $\dfrac{K_2}{J\Omega\,s(T_2s + 1)}$
Nomenclature: $J$ = moment of inertia of gyroscope, pound-foot-seconds$^2$; $\Omega$ = angular velocity of gyroscope, radians/second; $J\Omega$ = angular momentum of gyroscope, pound-foot-seconds; $K_2$ = torque coefficient, pound-feet/ampere; $T_2$ = time constant of gyroscope precession coil circuit, seconds; $I$ = signal current, amperes.

Stabilizing networks:
For rate signals (phase lead). $\dfrac{E_o(s)}{E_{in}(s)} = \dfrac{Ts}{Ts + 1}$, where $T = RC$, time constant, seconds.
For integral signals (phase lag). $\dfrac{E_o(s)}{E_{in}(s)} = \dfrac{1}{Ts + 1}$, where $T = RC$, time constant, seconds.
For rate and integral (lead-lag). $\dfrac{E_o(s)}{E_{in}(s)} = \dfrac{T_1T_2s^2 + (T_1 + T_2)s + 1}{T_1T_2s^2 + (T_1 + T_2 + T_{12})s + 1}$, $T_{12} \gg T_1 + T_2$, where $T_1 = R_1C_1$, $T_2 = R_2C_2$, $T_{12} = R_1C_2$ are time constants, seconds.
Nomenclature: $E_{in}$ = input voltage, volts; $E_o$ = output voltage, volts.

TABLE 5. SUMMARY OF TRANSFER FUNCTIONS OF REPRESENTATIVE HYDRAULIC ELEMENTS (Ref. 3a)

Valve-piston:
Load reaction negligible. $\dfrac{X(s)}{Y(s)} = \dfrac{C_1}{s}$
Spring load dominant. $\dfrac{X(s)}{Y(s)} = C_2$
Nomenclature: $x$ = piston displacement from neutral, feet; $y$ = input displacement from neutral, feet; $C_1$ = piston velocity per valve displacement, second$^{-1}$; $C_2$ = piston travel per valve displacement.

Valve-piston linkage:
For phase lag. $\dfrac{X(s)}{Y(s)} = \dfrac{b/a}{T_vs + 1}$, where $a$, $b$ = linkage distances, feet; $T_v$ = valve effective time constant, seconds (a function of $a$, $b$, and $C_1$).
For phase lead. $\dfrac{X(s)}{Y(s)} = \dfrac{d}{l + b}\left[\dfrac{T_3s + 1}{T_vs + 1}\right]$, where $l$, $b$, $d$ = linkage distances, feet; $T_v$ = valve time constant, seconds (a function of $l$, $b$, and $C_1$); $T_3$ = lead time constant, seconds; $T_3 > T_v$.

Hydraulic motor:
With compressibility. $\dfrac{\theta(s)}{Y(s)} = \dfrac{S_p/d_m}{s\left[\dfrac{VJ}{Bd_m^2}s^2 + \dfrac{LJ}{d_m^2}s + 1\right]}$
With negligible compressibility. $\dfrac{\theta(s)}{Y(s)} = \dfrac{S_p/d_m}{s\left[\dfrac{LJ}{d_m^2}s + 1\right]}$
Nomenclature: $\theta$ = motor position, radians; $y$ = displacement of pump stroke from neutral, feet; $S_p$ = flow from pump, cubic feet/second, per unit displacement $y$, feet; $d_m$ = motor displacement, cubic feet; $J$ = moment of inertia, pound-foot-seconds$^2$; $L$ = leakage coefficient, cubic feet/second/pound/square foot; $V$ = total oil volume under compression, cubic feet; $B$ = oil bulk modulus, pounds/square foot.

TABLE 6. ERROR DETECTORS (Ref. 11)

Each entry gives: No.; (a) type, (b) main application; operation; possible modifications; operating features; accuracy limited by; features determining energy required to vary reference quantity measurement; and the devices frequently used with it, (a) Table 7 amplifier, (b) Table 8 error corrector.

1. (a) D-c or a-c resistance bridge. (b) Position control.
Operation: Error voltage, x, appears when the positions of the moving arms of the potentiometers A and B are not matched. The power source, E, is applied across both potentiometers. A measures reference position as voltage and B regulated position as voltage, their difference being x.
Possible modifications: Potentiometer can be wound on a helix to get more than 360° of rotation.
Operating features: A and B can be remote. Continuous rotation not possible.
Accuracy limited by: Potentiometer winding.
Energy to vary reference: Contact arm and bushing friction.
Frequently used with: (a) 2, 3, 4, 5; (b) 1, 2, 3.

2. (a) D-c tachometer bridge. (b) Speed control.
Operation: Error voltage, x, appears when speeds of tachometers A and B vary. A measures reference speed as a voltage and B regulated speed as a voltage. The difference between these voltages is x.
Possible modifications: A can be replaced by a battery as the reference.
Operating features: A and B can be remote. Top speed limited by commutator.
Accuracy limited by: Tachometer accuracy. Commutator resistance.
Energy to vary reference: Brush and bearing friction.
Frequently used with: (a) 2, 3, 4; (b) 1, 2, 3.

3. (a) A-c magnetic bridge. (b) Position control, particularly for gyro pickups where very small forces prevail.
Operation: Error voltage, x, appears when relative positions of rotor A and stator B do not match. Rotor A measures reference position magnetically and stator B regulated position magnetically. Voltage E, across exciting coil, L, provides energy. When rotor covers unequal areas of each exposed stator pole (unbalanced magnetic bridge), pickup coils M and N have unequal voltages induced. Voltage difference is x.
Possible modifications: Four poles instead of three can be used, with two having exciting windings and two pickup coils connected bucking.
Operating features: Limited rotation. Air gap usually small.
Accuracy limited by: Machining tolerance, magnetic fringing, and voltage phase shift. Load taken from x.
Energy to vary reference: Bearing friction.
Frequently used with: (a) 2, 4; (b) 1, 2, 3.

4. (a) A-c synchro system. (b) Position control where continuous rotation is desired.
Operation: Error voltage, x, appears whenever the relative positions of the rotors of synchro generator, A, and synchro control transformer, B, are not matched. The reference position is measured by A as a magnetic flux pattern which is transmitted to the synchro control transformer through the interconnected stator windings. If the rotor of B is not exactly 90° from the transmitted flux pattern, x is produced.
Possible modifications: A dual system can be used whereby the unity synchro system sets the approximate position and the high-speed or vernier system sets the accurate position.
Operating features: Unlimited rotation. The synchro generator and control transformer can be remote.
Accuracy limited by: Machining tolerance, accuracy of winding distribution. Distributed or nondistributed winding of control transformer rotor. Load taken from x.
Energy to vary reference: Bearing and slip ring friction.
Frequently used with: (a) 2, 4; (b) 1, 2, 3.

5. (a) Frequency bridge. (b) Frequency control.
Operation: Error voltage, x, appears when reference and regulated frequencies differ. Tube channel A produces a filtered sawtooth wave that gives a d-c voltage inversely proportional to the reference frequency. Tube channel B produces a similar voltage as a measure of the regulated frequency. The difference of these d-c voltages is x.
Possible modifications: May be used as a speed regulator if B is made an a-c tachometer.
Operating features: A and B can be remote. Tubes can be either gas or vacuum. A wide range of frequencies can be covered. Vacuum tubes should be used for high frequencies.
Accuracy limited by: Temperature and aging effects on tube and circuit elements.
Energy to vary reference: Tube input impedance.
Frequently used with: (a) 4; (b) 1, 6.

6. (a) Millivolt bridge. (b) Temperature control.
Operation: Error voltage, x, appears whenever the regulated temperature differs from the reference temperature. The regulated temperature is measured as a voltage by the thermoelectric effect of two dissimilar metals, B. The reference temperature is represented as a voltage from the battery-potentiometer source A. The difference in these voltages is x.
Possible modifications: An electronic voltage source or another thermocouple can be substituted for A.
Operating features: A and B can be remote. A wide range of temperature can be covered.
Accuracy limited by: Ability to detect very low millivolt signals.
Energy to vary reference: Contact arm and bushing friction. If electronic voltage source A is used, tube input impedance.
Frequently used with: (a) 2, 4; (b) 1, 3.

7. (a) Phototube bridge. (b) Position control by intercepting a light beam.
Operation: Error voltage, x, appears when movable shutter is in other than desired position. Light reaching phototube, B, measures shutter position. This light is measured as a voltage by the phototube current variation. A reference position of the shutter is represented by the battery-potentiometer voltage. The difference of these voltages is x.
Possible modifications: An electronic voltage source or another light source and phototube can be substituted for A.
Operating features: A and B can be remote. Glass surfaces through which light travels must be kept clean.
Accuracy limited by: Continued accuracy of light source and phototube.
Energy to vary reference: Contact arm and bushing friction. If electronic voltage source A is used, tube input impedance.
Frequently used with: (a) 2, 4; (b) 1.

8. (a) Mechanical differential. (b) Position control and speed control.
Operation: Displacement x appears whenever the relative reference and regulated positions change. Reference position is measured as an angle by one side of the differential A and regulated position as an angle by the other side of the differential B. The difference in the two positions rotates the middle member of the differential, giving displacement x.
Possible modifications: Spur-gear differential.
Operating features: Since A and B must be located together, synchro ties or their equivalent can be used to transmit remote positions to A and B. Continuous rotation possible with speed limited by gears.
Accuracy limited by: Gearing backlash. Power taken from x.
Energy to vary reference: Bearing friction. Pitch of gears.
Frequently used with: (a) 1, 6, 7, 8; (b) 1, 2, 3, 4, 5.

9. (a) Beam balance. (b) Voltage control, speed control, and tension control.
Operation: Displacement x appears whenever the variable force is different from the reference force. The variable force, B, and the reference spring force, A, are measured as moments. The difference in these moments produces displacement x.
Possible modifications: Any variable force other than a spring can be used. For remote operation B can be a transmitted force.
Operating features: x movement limited. By changing springs a wide force range can be covered.
Accuracy limited by: Load taken from x. Bearing friction.
Energy to vary reference: Magnitude of forces. Screw pitch and friction.
Frequently used with: (a) 1, 6, 7; (b) 1, 3, 5.

10. (a) Modified beam balance. (b) Speed control (flyball governors).
Operation: Displacement x appears when regulated speed, ω, differs from reference speed. The reference speed is represented by spring force, A, about fulcrum, O, the regulated speed by centrifugal force of mass, B, about O. Difference in moments of forces about O produces displacement x.
Possible modifications: Any variable force other than a spring can be used.
Operating features: A wide speed range can be covered. x movement limited.
Accuracy limited by: Load taken from x. Friction.
Energy to vary reference: Magnitude of forces. Screw pitch and friction.
Frequently used with: (a) 1, 7, 8; (b) 1, 5.

11. (a) Bimetal. (b) Temperature control.
Operation: Displacement x appears whenever the surrounding temperature and the reference temperature are different. The reference temperature is represented by the position of the adjustable reference point, A. The surrounding temperature is measured by the position of the bimetal strip, B. The difference in these positions produces displacement x.
Possible modifications: Bimetal can be made snap acting at some standard temperature.
Operating features: Wide temperature range possible by selection of proper bimetal.
Accuracy limited by: Load taken from x. Ability to measure accurately small x deflection. Time lag and hysteresis of bimetal.
Energy to vary reference: Mounting of reference point.
Frequently used with: (a) 1, 6; (b) 1, 6.

12. (a) Float. (b) Liquid level control.
Operation: Displacement x appears when regulated and reference liquid levels differ. Point A is reference. The liquid level is measured as a position by the float B. The difference produces displacement x.
Possible modifications: A float controlling a pulley system can be used rather than a lever.
Operating features: With the proper mechanical arrangement a wide variation in liquid height can be controlled.
Accuracy limited by: Load taken from x. Variable density of the liquid. Friction.
Energy to vary reference: Mounting of reference point.
Frequently used with: (a) 2, 7; (b) 3, 4.

13. (a) Bellows. (b) Pressure control and temperature control.
Operation: Displacement x appears when surrounding and reference pressures differ. Reference pressure is represented as position by adjustable point, A. Surrounding pressure is measured by the bellows as a position. Difference in these positions produces displacement x.
Possible modifications: Spring can be added in addition to bellows spring.
Operating features: Limited x travel.
Accuracy limited by: Load taken from x. Hysteresis of bellows spring.
Energy to vary reference: Mounting of reference point.
Frequently used with: (a) 1, 4, 5, 6; (b) 1, 6.

14. (a) Piston. (b) Pressure control.
Operation: Displacement x appears when regulated pressure output of pump and reference differ. Reference pressure is a force on the piston by spring, A. Regulated pressure is a force on the piston by the fluid. Their difference produces displacement x.
Possible modifications: A standard pressure source can be substituted for the spring.
Operating features: A and B can be remote. Limited x travel.
Accuracy limited by: Friction. Load taken from x.
Energy to vary reference: Piston forces involved. Screw pitch and friction.
Frequently used with: (a) 7, 8; (b) 5.
$T_2$) will start flat at $\log x_0$ ($x_0$ = initial value) and then asymptotically approach the slope of the reciprocal of the larger time constant ($k/T_1$). Drawing the asymptote will give an intersection $\log\left[x_0\,\dfrac{T_1}{T_1 + T_2}\right]$ on the response axis. See Ref. 6.

Frequency Response. The most generally useful method is to excite the system or element under test with a sinusoidal signal. The frequency response is obtained by comparing the amplitude and phase relations of the input and output over the frequency range of interest. The phase and amplitude relations can be obtained in a number of ways, e.g., from (a) direct oscillograph or recorder readings of the variables, (b) Lissajous patterns on a long persistence oscilloscope, and (c) special test equipment that gives a direct reading of the phase and amplitude ratios.
An analytical expression approximating the transfer function can be obtained by curve matching techniques. Sufficiently good results are often obtained by a simple trial and error approximation of the measured frequency response, using the straight line asymptotes defined in Chap. 21. Straight lines with slopes which are multiples of 20 db per decade are first drawn so as to approximate the experimental data. The exact frequency response corresponding to the estimated straight line response is then calculated by use of the graphs of Chap. 21. The agreement between the calculated and measured response is checked and the process is repeated if necessary. With a little experience one or two iterations are usually sufficient. The frequencies at which the straight lines intersect give the poles and zeros of the transfer function. More elaborate approximation methods are available if needed. See Refs. 6 and 8.

Correlation Technique. The autocorrelation function of white noise is an impulse.
Therefore, when the input is white noise, the cross correlation of the input and output of the system is simply the impulse response of the system. An experimental setup similar to Fig. 10 can therefore be used to evaluate the transfer function of an element or system.

FIG. 10. Test configuration to obtain system response by correlation technique.

Because the cross correlation filters all signals not correlated with the input white noise, the technique has the potential advantage of allowing the normal system operation to continue while the test is being conducted. See Ref. 8. The practical difficulties of mechanizing a satisfactory cross correlator have limited the usefulness of this method.

2. CLASSICAL METHODS OF ANALYSIS

System Equations

In Methods of System Analysis in Sect. 1, the correct description of the system or component dynamic performance by differential equations was stated to be basic to all methods of analysis and synthesis.

General Linear Differential Equations. A common type of system equation, the general linear integro-differential equation with constant coefficients, may be written in terms of the response $x(t)$ and the driving function $y(t)$ (Ref. 13) as

(22)  $a_0\,\dfrac{d^nx(t)}{dt^n} + a_1\,\dfrac{d^{n-1}x(t)}{dt^{n-1}} + \cdots + a_{n-1}\,\dfrac{dx(t)}{dt} + a_nx(t) + a_{n+1}\int x(t)\,dt + \cdots + a_{n+q}\int^q x(t)\,dt^q = y(t).$

As a class, the homogeneous equation resulting from reducing the right-hand side of the equation to zero has as its general solution a linear combination of solutions of the exponential form $e^{p_nt}$, where $p_n$ may be real or complex.

Characteristic Equation. The operator $p = d/dt$ together with $1/p = \int dt$ may be substituted into the reduced homogeneous equation. The resulting operational equation may be handled by the rules of algebra as explained in Chap. 8, Sect. 1.
Factoring out the operational part of this equation yields the characteristic equation (Ref. 13):

(23)  $a_0p^{n+q} + a_1p^{n+q-1} + \cdots + a_{n-1}p^{1+q} + a_np^q + a_{n+1}p^{q-1} + \cdots + a_{n+q} = 0.$

General Solution to Linear Differential Equations

The complementary solution to eq. (22) is

(24)  $x_t = A_{n+q}e^{p_{n+q}t} + A_{n+q-1}e^{p_{n+q-1}t} + \cdots + A_1e^{p_1t},$

where $p_n$ are the roots of the characteristic eq. (23). The complete solution is (Ref. 13)

(25)  $x(t) = x_t + x_s,$

where $x_s$ is the particular solution to eq. (22). The particular solution is obtained by substituting an assumed solution and solving for the coefficients. (See Part A, General Mathematics.)

Absolute Stability Defined. (See also Chap. 21.) The stability of a system may be broadly defined as that property which insures that it will remain in operating equilibrium through normal conditions (Ref. 14). A system is said to be on the verge of stability when it is hunting, that is, subject to sustained oscillations; if the oscillations grow, the system is unstable; if they decay, the system is stable. Nonoscillatory instability is also possible, such as the exponential growth of a system variable in response to a disturbance. Tables 14 and 15 give examples of stable and unstable performance and the dependence of stability upon the nature of the roots of the characteristic equation (or exponents of the complementary solution).
When system gain is increased to provide desired accuracy, instability is frequently encountered. This is the situation which is attacked with equalization (or stabilization) methods designed to provide a margin of stability without compromising system accuracy. A margin of stability away from the hunting condition is nearly always desired. It is implied that the system is linear or may be linearized in the neighborhood of the operating point for the purpose of analyzing stability (for linearization of nonlinear systems, see Linearization, Chap. 25, Sect. 2).

EXAMPLE.
Second Order System (Motor Synchronizing on a Fixed Signal). In Fig. 11 a motor drives a load, to which it is coupled directly, from an initial rest position $c_0$ to correspondence at position 0 (Ref. 13).

FIG. 11. Motor driving load from an initial position $c_0$ to correspondence at position 0. Combined inertia = $J$ lb-ft-sec$^2$; damping = $D$ lb-ft/rad/sec; stiffness = $K$ lb-ft/rad (Ref. 3a).

The input to the motor produces a torque that is proportional to the difference between the controlled load position and the reference input position. Thus motor torque equals $-Kc$, since the desired final position is zero. Present in the load and the motor are mechanical friction and electrical damping torques, both of which are proportional to the motor speed; friction and damping torque equal $D(dc/dt)$. Static friction forces are negligible. There is also a torque due to the combined inertia $J$, and this torque equals $J(d^2c/dt^2)$.
The complete torque equation can now be written as

(26)  $J\,\dfrac{d^2c}{dt^2} + D\,\dfrac{dc}{dt} + Kc = 0.$

The steady-state displacement is zero, that is, the correspondence position, so that the transient response is the entire motion. By writing the characteristic equation as

(27)  $p^2 + \dfrac{D}{J}\,p + \dfrac{K}{J} = 0,$

a further modification will be made in the interest of obtaining a simpler form of the solution. Let

(28)  $\sqrt{\dfrac{K}{J}} = \omega_0$ = undamped natural frequency,

and

(29)  $\dfrac{D}{2\sqrt{KJ}} = \zeta$ = damping factor.

If these substitutions are made, the characteristic equation may now be written (in nondimensional form) as

(30)  $p^2 + 2\zeta\omega_0p + \omega_0^2 = 0,$

in which the two roots are

(31)  $p_1 = -\left[\zeta - \sqrt{\zeta^2 - 1}\,\right]\omega_0,$

(32)  $p_2 = -\left[\zeta + \sqrt{\zeta^2 - 1}\,\right]\omega_0.$

The effect of $\zeta$ upon the form of the transient solution of a second order system is treated in the next section.
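As a worked illustration of eqs. (27) through (32), the sketch below computes the undamped natural frequency, the damping factor, and the two characteristic roots from $J$, $D$, and $K$. It is not part of the original text: the function names and the numerical values are assumptions, and the classification by damping factor (below, equal to, or above unity) is the standard one rather than a quotation from this section.

```python
import cmath
import math


def second_order_roots(J, D, K):
    """Return (omega_0, zeta, p1, p2) for J p^2 + D p + K = 0."""
    omega_0 = math.sqrt(K / J)                # eq. (28)
    zeta = D / (2.0 * math.sqrt(K * J))       # eq. (29)
    radical = cmath.sqrt(zeta * zeta - 1.0)   # imaginary when zeta < 1
    p1 = -(zeta - radical) * omega_0          # eq. (31)
    p2 = -(zeta + radical) * omega_0          # eq. (32)
    return omega_0, zeta, p1, p2


def classify(zeta):
    """Standard classification of the transient by damping factor."""
    if zeta < 0.0:
        return "unstable (negative damping)"
    if zeta == 0.0:
        return "undamped (sustained oscillation)"
    if zeta < 1.0:
        return "underdamped (oscillatory)"
    if zeta == 1.0:
        return "critically damped"
    return "overdamped"


if __name__ == "__main__":
    # Illustrative values only: J in lb-ft-sec^2, D in lb-ft/rad/sec, K in lb-ft/rad.
    J, D, K = 2.0, 4.0, 50.0
    omega_0, zeta, p1, p2 = second_order_roots(J, D, K)
    print("omega_0 =", omega_0, "rad/sec, zeta =", zeta, "->", classify(zeta))
    print("p1 =", p1, " p2 =", p2)
    # Consistency with eq. (27): sum of roots = -D/J, product of roots = K/J.
    print("p1 + p2 =", p1 + p2, " p1 * p2 =", p1 * p2)
```

Using `cmath.sqrt` lets the same two expressions cover real roots (damping factor above unity) and complex conjugate roots (damping factor below unity) without separate branches.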
Use of Laplace Transform

The work involved in using the classical approach to the solution of linear differential equations may be reduced to a routine process through the use of the Laplace transform and its inverse, which uses the same approach to obtain both transient and steady-state solutions (Refs. 15, 16). The Laplace transform has the advantage of handling initial conditions and discontinuous inputs directly. For a complete presentation of this method see Ref. 17. The Laplace transform is defined as

$\mathcal{L}[f(t)] = F(s) \triangleq \int_0^{\infty} f(t)\,e^{-st}\,dt.$

The inverse Laplace transform is defined as

$\mathcal{L}^{-1}[F(s)] = f(t) = \dfrac{1}{2\pi j}\int_{c-j\infty}^{c+j\infty} F(s)\,e^{ts}\,ds, \qquad t \ge 0.$

In these definitions $s$ is the complex operator $\sigma + j\omega$. The abscissa of absolute convergence, denoted by $\sigma_0$, is at $\sigma_0 > 0$. The Laplace transform and inverse Laplace transform are referred to as a transform pair.

Laplace Transform Applied to Feedback Control System. A closed loop linear control system may be represented in terms of the complex operator $s$ by eqs. (33), (35), and (36) as follows:

(33) $C(s) = G(s)E(s),$

where $C(s)$ = transform of the controlled variable, $c(t)$,
$E(s)$ = transform of the actuating error, $e(t)$,
$G(s)$ = transform of the transfer function of the forward control elements, which may be given the factored form

(34) $G(s) = \dfrac{K(s - s_a)(s - s_b)\cdots}{s^n (s - s_1)(s - s_2)\cdots},$

(35) $E(s) = R(s) - B(s),$

where $R(s)$ = transform of the reference input, $r(t)$,
$B(s)$ = transform of the feedback, $b(t)$.

(36) $B(s) = H(s)C(s),$

where $H(s)$ is the transform of the transfer function of the feedback elements and may be similar in form to that given in eq. (34). The block diagram for the above system of equations is given in Fig. 25b. The transform of the closed loop transfer function (see Fig. 25c for the block diagram) for this control system is

(37) $\dfrac{C(s)}{R(s)} = \dfrac{G(s)}{1 + G(s)H(s)}.$

Expressing $C(s)$ in terms of polynomials in $s$,

(38) $C(s) = \dfrac{N(s)}{D(s)} = \dfrac{a_v s^v + a_{v-1} s^{v-1} + \cdots + a_1 s + a_0}{s^q + b_{q-1} s^{q-1} + \cdots + b_1 s + b_0},$

where, because of the nature of the functions $G(s)$, $H(s)$, and $R(s)$, in general $q \ge v$.

TABLE 9. SOME USEFUL LAPLACE TRANSFORM PAIRS (Ref. 15)
(Only the pairs that remain legible are listed; the original row numbers, and the rows involving the quadratic factor $(s+\alpha)^2 + \beta^2$, are not recoverable from this copy.)
$F(s) = 1$; $f(t)$ = unit impulse at $t = 0$, $\lim_{a \to 0} [u(t) - u(t-a)]/a$
$F(s) = 1/s$; $f(t) = 1$ or $u(t)$, unit step at $t = 0$
$F(s) = (1/s)e^{-as}$; $f(t) = u(t - a)$
$F(s) = (1/s)(e^{-as} - e^{-bs})$; $f(t) = u(t - a) - u(t - b)$
$F(s) = 1/s^2$; $f(t) = t$, unit ramp at $t = 0$
$F(s) = 1/s^n$; $f(t) = t^{n-1}/(n-1)!$
$F(s) = 1/(s + \alpha)$; $f(t) = e^{-\alpha t}$
$F(s) = 1/(s + \alpha)^2$; $f(t) = t e^{-\alpha t}$
$F(s) = 1/(s + \alpha)^n$; $f(t) = t^{n-1} e^{-\alpha t}/(n-1)!$
$F(s) = 1/[s(s + \alpha)]$; $f(t) = (1/\alpha)(1 - e^{-\alpha t})$
$F(s) = 1/[s^2(s + \alpha)]$; $f(t) = (e^{-\alpha t} + \alpha t - 1)/\alpha^2$
$F(s) = 1/[s(s + \alpha)^2]$; $f(t) = (1/\alpha^2)(1 - e^{-\alpha t} - \alpha t e^{-\alpha t})$
$F(s) = 1/[(s + \alpha)(s + \gamma)]$; $f(t) = (e^{-\alpha t} - e^{-\gamma t})/(\gamma - \alpha)$
$F(s) = 1/(s^2 - \beta^2)$; $f(t) = (1/\beta)\sinh \beta t$
$F(s) = s/(s^2 - \beta^2)$; $f(t) = \cosh \beta t$
$sF(s) - f(0+)$; $df(t)/dt$
$s^2F(s) - sf(0+) - (df/dt)(0+)$; $d^2f(t)/dt^2$
$F(s)/s + f^{(-1)}(0+)/s$; $\int f(t)\,dt$
$F(s)/s^2 + f^{(-1)}(0+)/s^2 + f^{(-2)}(0+)/s$; $\int[\int f(t)\,dt]\,dt$
$aF(s)$; $af(t)$
$F_1(s) \pm F_2(s)$; $f_1(t) \pm f_2(t)$
$aF(as)$; $f(t/a)$
$F(s + a)$; $e^{-at}f(t)$
$F(s - a)$; $e^{at}f(t)$
$e^{\mp as}F(s)$; $f(t \mp a)$, $a > 0$

TABLE 11 (time responses of the second order system $C(s)/R(s)$ to unit impulse, unit step, and unit ramp inputs)
Inputs: unit impulse, $\mathcal{L}\{\lim_{a \to 0}[u(t) - u(t-a)]/a\} = 1$; unit step, $\mathcal{L}[u(t)] = 1/s$; unit ramp, $\mathcal{L}[t] = 1/s^2$.
Specific form of $C(s)/R(s)$ for $\zeta < 1$: $\beta_0^2/[(s + \alpha)^2 + \beta^2]$, where $\alpha = \zeta\omega_0$ and $\alpha^2 + \beta^2 = \beta_0^2 = \omega_0^2$.
Unit impulse: $(\beta_0^2/\beta)\,e^{-\alpha t}\sin\beta t$.
Unit step: $1 + (\beta_0/\beta)\,e^{-\alpha t}\sin(\beta t - \psi)$, $\psi = \tan^{-1}[\beta/(-\alpha)]$.
Unit ramp: $t - 2\alpha/\beta_0^2 + (1/\beta)\,e^{-\alpha t}\sin(\beta t - 2\psi)$.
Specific form of $C(s)/R(s)$ for $\zeta = 1$: $\alpha^2/(s + \alpha)^2$, where $\alpha = \omega_0$.
Unit impulse: $\alpha^2 t e^{-\alpha t}$.
Unit step: $1 - (1 + \alpha t)e^{-\alpha t}$.
Unit ramp: $t - 2/\alpha + t e^{-\alpha t} + (2/\alpha)e^{-\alpha t}$.
Specific form of $C(s)/R(s)$ for $\zeta > 1$: $\alpha\gamma/[(s + \alpha)(s + \gamma)]$, where $\alpha + \gamma = 2\zeta\omega_0$ and $\alpha\gamma = \omega_0^2$.
Unit impulse: $[\alpha\gamma/(\gamma - \alpha)](e^{-\alpha t} - e^{-\gamma t})$.
Unit step: $1 + (\gamma e^{-\alpha t} - \alpha e^{-\gamma t})/(\alpha - \gamma)$.
Unit ramp: $t - (\alpha + \gamma)/(\alpha\gamma) + (\gamma^2 e^{-\alpha t} - \alpha^2 e^{-\gamma t})/[\alpha\gamma(\gamma - \alpha)]$.
Note. Laplace transform, $C(s)$, of each time response in this table is simply the product of the transform of the input, $R(s)$, by the system transform $C(s)/R(s)$ in the table.
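The note to Table 11 can be checked numerically. The sketch below evaluates the closed-form unit step response of the second order system for the three damping cases and compares it with a direct numerical integration of the differential equation. The function names, the Runge-Kutta integrator, and the step size are assumptions chosen for this illustration; they are not part of the handbook.

```python
import math


def step_response(t, zeta, w0):
    """Closed-form unit step response of w0^2 / (s^2 + 2*zeta*w0*s + w0^2)."""
    if math.isclose(zeta, 1.0):
        # Critically damped: alpha = w0
        return 1.0 - (1.0 + w0 * t) * math.exp(-w0 * t)
    if zeta < 1.0:
        # Oscillatory: alpha = zeta*w0, beta = w0*sqrt(1 - zeta^2)
        alpha = zeta * w0
        beta = w0 * math.sqrt(1.0 - zeta ** 2)
        return 1.0 - math.exp(-alpha * t) * (
            math.cos(beta * t) + (alpha / beta) * math.sin(beta * t)
        )
    # Overdamped: two real roots -alpha and -gamma
    root = math.sqrt(zeta ** 2 - 1.0)
    alpha = w0 * (zeta - root)
    gamma = w0 * (zeta + root)
    return 1.0 + (gamma * math.exp(-alpha * t)
                  - alpha * math.exp(-gamma * t)) / (alpha - gamma)


def step_response_rk4(t_end, zeta, w0, dt=1e-3):
    """Integrate c'' + 2*zeta*w0*c' + w0^2*c = w0^2 from rest (classical RK4)."""
    def deriv(c, v):
        return v, w0 ** 2 * (1.0 - c) - 2.0 * zeta * w0 * v

    c, v = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1c, k1v = deriv(c, v)
        k2c, k2v = deriv(c + 0.5 * dt * k1c, v + 0.5 * dt * k1v)
        k3c, k3v = deriv(c + 0.5 * dt * k2c, v + 0.5 * dt * k2v)
        k4c, k4v = deriv(c + dt * k3c, v + dt * k3v)
        c += dt * (k1c + 2.0 * k2c + 2.0 * k3c + k4c) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return c


if __name__ == "__main__":
    for zeta in (0.2, 1.0, 1.4):
        print(zeta, step_response(3.0, zeta, 2.0), step_response_rk4(3.0, zeta, 2.0))
```

The oscillatory branch is written with a cosine and a sine term rather than the single phase-shifted sine of the table; the two forms are algebraically equivalent.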
The general case of expansion into partial fractions with higher order poles yields the solution given by eqs. (43) to (45) (see transform pair No. 0.21 of Ref. 15).

Order of System Responses as Seen from Partial Fraction Expansions. Any complex linear system can be represented as a combination of first and second order systems. This may be seen from the expansion of a system response function such as that of eq. (38) into partial fractions

(46) $C(s) = \dfrac{N(s)}{D(s)} = \dfrac{C_1}{s - s_1} + \dfrac{C_2}{s - s_2} + \cdots + \dfrac{C_k}{s - s_k} + \dfrac{a_1 s + a_0}{s^2 + 2\zeta\omega_0 s + \omega_0^2} + \cdots,$

where two conjugate complex first order poles have been combined into a single term. The response of the complex system can, therefore, be considered as the sum of the responses of first and second order systems. The responses of first and second order systems are thus of considerable importance and are given for various inputs in the following.

First Order System Responses. A first order system is characterized by a single energy storage. An example of a first order system is the simple hydraulic servo of Table 5 for phase lag. The system equation is thus (using T for Tv)

(47) $T\,\dfrac{dx(t)}{dt} + x(t) = (b/a)\,y(t) = f(t).$

The transform of the equation may be written

(48) $X(s) = \left[\dfrac{1}{Ts + 1}\right]F(s),$

which is of the form (see Ref. 16)

(Response function) = (System function)(Excitation function).

The characteristic equation is

(49) $Ts + 1 = 0,$

and the transient solution is

(50) $x_t = A_1 e^{-t/T}.$

The performance can be characterized by the quantity $T$, called the time constant of the system (see Ref. 13). Physically, the time constant is the time to complete $1 - e^{-1} = 63.2\%$ of the change after either a step or impulse input. Also it is the time given by the intersection of the tangent to the transient at $t = 0$ with the asymptote to the final value when a step or impulse is applied at $t = 0$ (see Fig. 12a).

FIG. 12a. Response of first order system to step showing time constant relationships.

FIG. 12b. Step and impulse response of first order system: $T\,dx/dt + x = f(t)$. Step: $x/A = 1 - e^{-t/T}$, $A$ = magnitude of step. Impulse: $xT/A = e^{-t/T}$, $A = \int_{-\infty}^{\infty}(\text{impulse function})\,dt$.

FIG. 12c. Ramp response of first order system: $T\,dx/dt + x = f(t) = At$, $x/(AT) = t/T - (1 - e^{-t/T})$.

Table 10 lists three types of input $f(t)$, the corresponding excitation function $F(s)$, the response function $X(s)$ for the system function $1/(Ts + 1)$, and the inverse transform $x(t)$ of the response function, $X(s)$ and $x(t)$ forming a Laplace transform pair. In Fig. 12b are plotted the step and impulse response of a first order system obtained from the solutions appearing in Table 10. In Fig. 12c is plotted the ramp response of a first order system from the solution appearing in the same table.

Second Order System Responses. The solutions for unit impulse, step, and ramp inputs to the second order system of eq. (26), generalized by setting the right-hand side equal to $Kr(t)$, namely

(51) $J\,\dfrac{d^2c}{dt^2} + D\,\dfrac{dc}{dt} + Kc = Kr(t),$

are illustrated respectively in Figs. 13, 14, and 15a, b, c. The transform of this equation may be written in a nondimensional form (as for the first order system of eq. 47),

(Response function) = (System function)(Excitation function) (see Ref. 16),

which in this case is

(52) $C(s) = \left[\dfrac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2}\right]R(s).$

FIG. 13. Response of second order system, $C(s) = \dfrac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2}R(s)$, to unit impulse in $r(t)$ for various values of $\zeta$ (curves for $\zeta$ = 0.2 to 1.4, plotted against $\omega_0 t$).

FIG. 14. Response of second order system, $C(s) = \dfrac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2}R(s)$, to unit step in $r(t)$ for various values of $\zeta$.

FIG. 15a, b, c. Response of second order system, $C(s) = \dfrac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2}R(s)$, to unit ramp $r(t) = t$ for various values of $\zeta$.

The time response may then be obtained by looking up the inverse transform pair in a table of Laplace transforms (see Ref. 15). The form of the response will be oscillatory, critically damped, or overdamped as the damping factor $\zeta < 1$, $= 1$, or $> 1$. Table 11 is a chart of the time responses illustrated in Figs. 13 to 15 (see Refs. 6, 13, 18, and 19). Tables 12 and 13 and Figs. 16 to 19, from Ref. 20, are for the determination of equation coefficients and system parameters for second order systems. Table 14 illustrates time responses. Table 15 treats stability.

FIG. 16. Determination of equation coefficients for second order systems from response curves (Ref. 20). $H$ = steady state change; $h$ = transient overshoot from steady state.

TABLE 12. RELATIONSHIP AMONG SYSTEM PARAMETERS AND EQUATION COEFFICIENTS FOR SECOND ORDER SYSTEM (Ref.
20)
System equation: $a_2\,\dfrac{d^2x}{dt^2} + a_1\,\dfrac{dx}{dt} + a_0 x = y(t)$.
(Parameter; symbol; definition in terms of equation coefficients. The "Equivalent Expressions" column and the overcritical time constant row are not legible in this copy.)
Damping ratio; $\zeta$; $a_1/(2\sqrt{a_0 a_2})$
Undamped angular natural frequency; $\omega_0$; $\sqrt{a_0/a_2}$
Undamped natural frequency; $f_0$; $(1/2\pi)\sqrt{a_0/a_2}$
Undamped natural period; $T_0$; $2\pi\sqrt{a_2/a_0}$
Angular natural frequency; $\omega$; $\sqrt{a_0/a_2 - (a_1/2a_2)^2}$
Natural frequency; $f$; $(1/2\pi)\sqrt{a_0/a_2 - (a_1/2a_2)^2}$
Natural period; $T$; $2\pi/\sqrt{a_0/a_2 - (a_1/2a_2)^2}$
Critical time constant; $T_c$; $2a_2/a_1$
Large time constant ($\zeta > 1$); $T_1$; $1/[a_1/2a_2 - \sqrt{(a_1/2a_2)^2 - a_0/a_2}\,]$
Small time constant ($\zeta > 1$); $T_2$; $1/[a_1/2a_2 + \sqrt{(a_1/2a_2)^2 - a_0/a_2}\,]$
Time parameter ratio ($\zeta > 1$); $\nu$; $T_1/T_2$

TABLE 13. DETERMINATION OF EQUATION COEFFICIENTS FOR SECOND ORDER SYSTEMS FROM RESPONSE CURVES (Ref. 20)
(Columns: Type of Response; Transfer Function; Response Curve; Equation Parameters Used; Method Used to Find Equation Parameters; Equation Coefficients in Terms of $a_0$ and Equation Parameters. The transfer functions and response curve sketches are not reproduced.)
Oscillatory, $0 < \zeta < 0.5$. Parameters used: $\zeta$, $T$. Method: measure $T$ and the successive peaks $x_0, x_1, x_2, x_3, \ldots$ ($x_0$ can be any peak); form the peak ratios; find $\zeta$ from Fig. 17. Coefficients: $a_2 = a_0 T^2 (1 - \zeta^2)/(4\pi^2)$, $a_1 = a_0 T \zeta\sqrt{1 - \zeta^2}/\pi$.
Near critically aperiodic, $0.5 < \zeta < 2.0$. Parameters used: $\zeta$, $\omega_0$. Method: measure $t_1$, $t_2$, $t_3$; form the ratios $t_2/t_1$, $t_3/t_1$, $(t_3 - t_2)/(t_2 - t_1)$; find $\zeta$, $\omega_0 t_1$, $\omega_0 t_2$, $\omega_0 t_3$ from Fig. 18; compute the value of $\omega_0$; take $\zeta = \zeta_{av}$ and $\omega_0 = \omega_{0\,av}$. Coefficients: $a_1 = 2a_0\zeta/\omega_0$, $a_2 = a_0/\omega_0^2$.
Critically aperiodic, $\zeta = 1.0$. Parameter used: $T_c$. Method: measure $T_c$ on Fig. 18 for $\zeta = 1.0$. Coefficients: $a_1 = 2a_0 T_c$, $a_2 = a_0 T_c^2$.
Nonoscillatory, $\zeta > 1.0$. Parameters used: $T_1$, $\nu$. Method: plot response curve on semilog paper (inverting the curve first where necessary to agree with the plot shown). Extrapolate straight line portion of plot to $t = 0$. Measure $\ln x_1$ and $\ln x_2$ at $t_1$ and $t_2$ respectively. Measure $x_0$ and $x(\infty)$. Compute $T_1$ from $T_1 = (t_2 - t_1)/(\ln x_1 - \ln x_2)$. Compute $\nu$ from $x_0$ and $x(\infty)$. Coefficients: $a_1 = a_0 T_1(\nu + 1)/\nu$, $a_2 = a_0 T_1^2/\nu$.

TABLE 14. TIME RESPONSES OF SOME COMMON TRANSIENT MODES (Ref. 20)
(Columns: No.; Mode; $F(s)$; $f(t)$; Time Response sketch. Only the legible entries are listed.)
1. Step function position; $1/s$; $1$
2. Step function velocity; $1/s^2$; $t$
3. Step function acceleration; $1/s^3$; $\tfrac{1}{2}t^2$
4. First order lag converging; $1/(T_c s + 1)$; $(1/T_c)\,e^{-t/T_c}$
5. First order lag converging; $1/[s(T_c s + 1)]$; $1 - e^{-t/T_c}$
6, 7. First order lag diverging
8. Undamped second order

FIG. 19. Time constant ratio $T_1/T_2$ as a function of damping ratio for overdamped second order system (Ref. 20). For $\zeta > 5$, $\nu \approx 4\zeta^2$.

TABLE 15. STABILITY AS A FUNCTION OF THE NATURE OF THE ROOTS OF THE CHARACTERISTIC EQUATION (Ref. 20)
(Columns: Type of Stability of System; Nature of Roots of Characteristic Equation (or Exponents of Complementary Solution); No. of Example of Performance Given in Table 14, Nonoscillatory / Oscillatory.)
Stable; all roots have negative real parts; nonoscillatory 4; oscillatory 9.
Stable; a single zero root, all other roots, if any, have negative real parts; nonoscillatory 1, 5.
Verge of stability, undamped oscillatory time response; conjugate imaginary roots all different, in addition to roots for stable systems above, if any; oscillatory 8.
Unstable; roots with positive real parts, in addition to other types of roots, if any; nonoscillatory 6, 7; oscillatory 10.
Unstable; repeated zero or conjugate imaginary roots, in addition to other types of roots, if any; nonoscillatory 2, 3.

Application of Convolution Integral

A convenient method of calculating the time response of a system to any arbitrary input makes use of the convolution integral (see Ref. 21), which may be written as

(53) $c(t) = \int_{-\infty}^{t} f(\tau)\,g(t - \tau)\,d\tau,$

where $c(t)$ is the time response, $f(t)$ is the input, and $g(t)$ is the weighting function or characteristic time response to a unit impulse (see Weighting Function in Chap. 9). To evaluate this equation the arbitrary input is approximated by a series of impulses as shown in Fig. 20.
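The impulse approximation of the convolution integral, eq. (53), can be sketched numerically as follows. The sketch assumes that the input is zero before $t = 0$ and represents each impulse by a rectangle of width `d_tau` evaluated at its center; the function name, the rectangle width, and the first order test system with weighting function $g(t) = (1/T)e^{-t/T}$ are choices made for this illustration only.

```python
import math


def convolution_response(f, g, t1, d_tau):
    """Approximate c(t1) = integral of f(tau) * g(t1 - tau) d tau.

    f     : input as a function of time (assumed zero for tau < 0)
    g     : weighting function (response to a unit impulse)
    t1    : time at which the response is wanted
    d_tau : width of each rectangle standing in for an impulse
    """
    n = int(round(t1 / d_tau))
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * d_tau              # center of the k-th rectangle
        total += f(tau) * d_tau * g(t1 - tau)
    return total


if __name__ == "__main__":
    T = 1.0                                   # hypothetical time constant

    def unit_step(t):
        return 1.0

    def weighting(t):
        return math.exp(-t / T) / T           # first order impulse response

    approx = convolution_response(unit_step, weighting, 2.0, 0.01)
    exact = 1.0 - math.exp(-2.0 / T)          # first order step response
    print(approx, exact)
```

For the first order system the sum can be compared with the known step response $1 - e^{-t/T}$; the agreement improves as the rectangle width is made small compared with the time constant.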
FIG. 20. Approximation of a function, $f(t)$, by a series of impulses.

If the impulsive response, $g(t)$, is known, the sum of these responses to the impulses approximating the input signal constitutes the total time response, as illustrated by Figs. 1.47 and 1.48 of Ref. 8. Of course, a theoretical impulse function has zero width in time; in the practical case, if its width is much smaller than the response time of the system being considered, the results obtained will be valid. The quantity $f(\tau)$ is the average height of the rectangular approximation of an impulse; $\Delta\tau$ is the width; and $\tau$ is the time to the center of the rectangle as illustrated in Fig. 20. The value of the time response at time $t_1$ may be expressed as

(54) $c(t_1) = \sum_{\tau = \tau_1, \tau_2, \ldots, t_1} f(\tau)\,\Delta\tau\,g(t_1 - \tau).$

This indicates that $c(t_1)$ is the sum of responses to impulse inputs, all evaluated at $t_1$. This same method may be used with the transfer function of the system, since it is simply the Laplace transform of the weighting function.

Steady-State Solution of System Equations

Although the complete solution of a linear differential equation for a system subjected to some driving function contains both transient and steady-state portions, the steady-state part can be obtained independently of the transient.

Sinusoidal Driving Functions. The general form of the steady-state response of linear systems to sinusoidal excitation is sinusoidal and of the same frequency as the driving function. An example of the steady-state response $c_s(t)$ of the second order system of eq. (51) to a sinusoidal input is given in Fig. 21. When steady-state excitation with sinusoidal driving forces is considered, the Laplace transform is intimately related to the impedance concept. For the Laplace transform it will be found that $s$ may be replaced by $j\omega$ to obtain the steady-state response to a sinusoidal driving function (see Ref. 22).
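The substitution of $j\omega$ for $s$ can be illustrated for the second order system function of eq. (52). The sketch below is a minimal example with assumed function names; it evaluates the complex output-input ratio at one frequency and converts it to a magnitude in decibels and a phase angle in degrees.

```python
import cmath
import math


def second_order_freq_response(w, zeta, w0):
    """Evaluate w0^2 / (s^2 + 2*zeta*w0*s + w0^2) at s = j*w."""
    s = 1j * w
    return w0 ** 2 / (s ** 2 + 2.0 * zeta * w0 * s + w0 ** 2)


def gain_db_and_phase_deg(h):
    """Magnitude as 20*log10|h| and phase angle in degrees."""
    return 20.0 * math.log10(abs(h)), math.degrees(cmath.phase(h))


if __name__ == "__main__":
    # Conditions of Fig. 21: zeta = 1 and w = 0.5*w0 (w0 normalized to 1).
    h = second_order_freq_response(0.5, 1.0, 1.0)
    print(abs(h), gain_db_and_phase_deg(h))
```

For the conditions quoted for Fig. 21 ($\zeta = 1$, $\omega = 0.5\omega_0$) the ratio is $1/(0.75 + j)$, so the steady-state output of a unit amplitude sine input has an amplitude of 0.8 and lags the input by about 53.1 degrees.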
In Table 16 are given typical terms of an integro-differential equation showing use of the operator $j\omega$ to obtain the electrical and "motional" impedances of analogous electrical and mechanical forms. The justification for this substitution of $j\omega$ for $s$ is given in Ref. 22. Application of this technique to eq. (52) yields

(55) $C(j\omega) = \dfrac{\omega_0^2}{(j\omega)^2 + 2\zeta\omega_0(j\omega) + \omega_0^2}\,R(j\omega).$

FIG. 21. Example of steady-state response, $c_s(t)$, to unit amplitude sine wave input, $v(t)$, for second order system $C(s) = \dfrac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2}$ for $\zeta = 1$ and $\omega = 0.5\omega_0$.

TABLE 16. SUMMARY OF EQUATION TERMS AND COMPLEX QUANTITIES (Ref. 3a)
(Columns: Physical System; Derivative Form; Transform Form; Complex Form; Complex Impedance.)
Electrical; $L\,di(t)/dt$; $LsI(s)$; $j\omega L\,I(j\omega)$; $j\omega L = jX_L$
Electrical; $Ri(t)$; $RI(s)$; $RI(j\omega)$; $R$
Electrical; $(1/C)\int i(t)\,dt$; $I(s)/(Cs)$; $I(j\omega)/(j\omega C)$; $-j/(\omega C) = -jX_C$
Mechanical; $M\,dv(t)/dt$; $MsV(s)$; $j\omega M\,V(j\omega)$; $j\omega M$
Mechanical; $Dv(t)$; $DV(s)$; $DV(j\omega)$; $D$
Mechanical; $K\int v(t)\,dt$; $KV(s)/s$; $(K/j\omega)\,V(j\omega)$; $-jK/\omega$

Complex Plane Plot. The steady-state response of a system as a function of frequency is very useful in servomechanism and regulator design (Ref. 23). Use is made of complex plane diagrams in which the magnitude and the angle of the output to input ratio are shown by a single line on the complex plane as in Fig. 22. The complex output-input ratio $C/R$ is obtained by substituting $j\omega$ for $s$ in the transform eq. (52).

FIG. 22. Complex plane plot of $\dfrac{C}{R} = \dfrac{\omega_0^2}{(j\omega)^2 + 2\zeta\omega_0(j\omega) + \omega_0^2}$ for control system of Fig. 11.

Logarithmic Plots. Instead of plotting vector loci of the transfer function as in Fig. 22, the contours can be plotted to a logarithmic scale (see Refs. 24, 25, and 26). To exploit certain manipulative advantages, the attenuation and phase angle graphs are made separately. The attenuation is plotted in decibels, or $20\log_{10}|\text{atten.}|$, versus $\log_{10}\omega$; the phase angle is also plotted versus $\log_{10}\omega$. In Fig. 23 the complex transfer function of eq. (55) has its attenuation and phase angle plotted against $\log_{10}(\omega/\omega_0)$, giving a nondimensional chart for the frequency response of second order systems over a range of values of damping factor $\zeta$.

3. BLOCK DIAGRAMS

Definition of Terms

A block diagram is a simplified method of presenting the interconnections of significant variables. It displays the functional relationships rather than the physical and thus gives a clear insight into the problem. The physical system and interrelationship determine the block diagram arrangement; each block is a logical step in the flow or signal process. Block diagrams are built up by algebraic combinations of individual blocks, where each block is a transfer function. An example is shown by Fig. 24, where the transfer function of the controlled system is $G_3 = C/M_5$. Most block diagrams show only the desired inputs and outputs; however, in many physical systems there are loading and regulating effects. These effects must be considered and can be handled as separate input effects.

The recommended nomenclature (Ref. 29) for symbols in the block diagram is illustrated by Fig. 24, where:
V = desired value,
R = reference input,
E = actuating error,
M = manipulated variable,
U = disturbance function,
C = controlled variable,
Q = indirectly controlled variable,
B = feedback.

The symbols used, such as $C$, may be in Laplace, operational, or sinusoidal form and can be indicated as $C(s)$, $C(p)$, or $C(j\omega)$. Lower-case letters are used to indicate time functions ($r$, $v$, $e$, $m$, $u$, $c$, $b$). Generally the parentheses $(s)$, $(p)$, or $(j\omega)$ are dropped unless a particular form of representation is required. The transfer functions are labeled as follows: $A$ for reference input, $G$ for forward elements, i.e., from error to output, $N$ for disturbance input, $Z$ for indirectly controlled system, and $H$ for feedback.
Numerical subscripts are used to identify individual elements. References 29 to 31 are the standards of the American Institute of Electrical Engineers and the Institute of Radio Engineers. The important point is that a consistent system be used.

Construction and Signal Flow

As illustrated by Fig. 24 the arrows connecting the blocks indicate the unidirectional signal flow. The circular junction point with appropriate plus or minus signs is used to indicate summing or differencing points respectively, i.e., junctions forming $x + y$ or $x - y$ (symbol diagrams not reproduced).

FIG. 23. Magnitude and phase shift of $C/R$ versus frequency ratio $\omega/\omega_0$ for various values of $\zeta$. Note. For $\zeta > 1$ plots are simply those for two unequal time lags (Ref. 3a).

FIG. 24. Block diagram of representative closed loop system. Blocks shown: reference input elements $A$; control elements $G_1$ and $G_2$; disturbance input element $N$; controlled system $G_3$; indirectly controlled system $Z$; feedback elements $H_1$ and $H_2$.

The I.R.E. standard graphical symbols may also be used (Refs. 30 and 31).
Mixing point: $x_3 = f(x_1, x_2)$
Summing point: $x_3 = x_1 - x_2$
Multiplication point: $x_3 = x_1 x_2$
(symbol diagrams not reproduced)

Algebra of Block Diagrams

A complex block diagram can be rearranged or reduced by combining blocks algebraically. When all the loops are concentric, the indicated manipulations can be carried out directly by successively applying the relation $C/R = G/(1 + GH)$ to the innermost loops. When the inner loops are not concentric, or are even intertwining, the block diagrams can usually be reduced to concentric loops by the following rules and by reference to Table 17.

1. Data takeoff channels can be moved forward (in the direction of arrows) or backward in the system at will, except that the takeoff point cannot pass a summing point. Whenever a data takeoff branch is moved forward past a function $G$, the function $1/G$ must be added in series with the branch. Whenever a data takeoff branch is moved backward past a function $G$, the function $G$ must be added in series with the branch.

2. A channel feeding into a summing point can be moved forward and backward in the system at will, except that it cannot pass a data takeoff point. As this feed channel is moved forward in the system past a function $G$, the function $G$ must be added in series with the channel. As it is moved backward past a function $G$, the function $1/G$ must be inserted in the channel.

3. In some cases, it will be found necessary to move a takeoff point past a summing point or a summing point past a data takeoff point in order to reduce the system block diagram to simple concentric loops or parallel paths, which can be handled by methods (1) and (2). This can be done by removing a troublesome feedback point or data takeoff by closing an internal loop, thus replacing a loop by a closed loop transfer function, which has no takeoff or feedback points.

TABLE 17. THEOREMS FOR THE TRANSFORMATION AND REDUCTION OF BLOCK DIAGRAM NETWORKS (Ref. 17)
(Columns: Theorem; Original Network; Equivalent Network. The network diagrams are not reproduced.)
1. Interchange of elements
2. Interchange of summing points
3. Rearrangement of summing points
4. Interchange of takeoff points
5. Moving a summing point ahead of an element
6. Moving a summing point beyond an element
7. Moving a takeoff point ahead of an element
8. Moving a takeoff point beyond an element
9. Moving a takeoff point ahead of a summing point
10. Moving a takeoff point beyond a summing point
11. Combining cascade elements
12. Removing an element from a forward loop
13. Inserting an element in a forward loop
14. Eliminating a forward loop
15. Removing an element from a feedback loop
16. Inserting an element in a feedback loop
17. Eliminating a feedback loop
18. Special form of 17
19. Special form of 17
20. Inserting a feedback loop to replace an element
21. Different form of 20

Examples to Illustrate Transformation Rules. EXAMPLE 1. The forward elements $G_1$, $G_2$, and $G_3$ in Fig. 25a may be combined by multiplication as shown by eq. (56) and Fig. 25b:

(56) $\dfrac{C}{E} = \dfrac{M_1}{E}\cdot\dfrac{M_2}{M_1}\cdot\dfrac{C}{M_2} = G_1 G_2 G_3 = G.$

FIG. 25. (a) Simple closed loop system; (b) combinations of forward transfer functions; (c) system in simplest form.

In practice, loading of one element by another must be considered. Figure 25b is further reduced to Fig. 25c by use of eq. (57):

(57) $\dfrac{C}{R} = \dfrac{G}{1 + GH}.$

Direct or unity feedback is also common and represents a particular case of eq. (57), where $H = 1$.

EXAMPLE 2. Reduction of a complex diagram is shown in Fig. 26.
Note in the first reduction, Fig. 26b, that the block diagram is altered to include an additional $G_4$ element for mathematical simplicity, although the signal flow and algebra are identical. Also note in Fig. 26d that a positive feedback is accomplished by using eq. (58), except that the $H$ has a negative sign.

FIG. 26. (a) Original system; (b) first reduction; (c) second reduction; (d) third reduction; (e) final reduction (Ref. 3a).

EXAMPLE 3. For systems with multiple inputs and/or disturbances, the superposition theorem is used. The example of Fig. 27 is used to show the response $C$ as a function of two inputs.

FIG. 27. (a) System with multiple inputs; (b) with $R_2 = 0$; (c) with $R_1 = 0$.

Let $R_2 = 0$ as in Fig. 27b, which gives eq. (59). Let $R_1 = 0$ as in Fig. 27c:

(60) $\dfrac{C}{R_2} = \dfrac{G_2}{1 + G_1 G_2 H}.$

Combining inputs gives eq. (61).

4. SYSTEM TYPES

Definition of System Types

The idea of the functional similarity of seemingly different transfer functions is strengthened by classification into types. Three common types are ones in which the following conditions are obtained after the transient has subsided:

Type 0. A constant value of the controlled variable requires a constant actuating error signal.
Type 1. A constant rate of change of the controlled variable requires a constant actuating error signal.
Type 2. A constant acceleration of the controlled variable requires a constant actuating error signal.

These characteristics may be identified in terms of the transfer function. For a simple closed loop system with direct feedback the error signal is

(62) $E(s) = R(s) - C(s),$

where $R$, the reference input signal, is compared with $C$, the output signal. The forward transfer function

(63) $G(s) = \dfrac{C(s)}{E(s)}$

is of the general form

(64) $G(s) = \dfrac{K(1 + a_1 s + a_2 s^2 + \cdots)}{s^n(1 + b_1 s + b_2 s^2 + b_3 s^3 + \cdots)}.$

The value of the integer $n$ in eq. (64) is equal numerically to the type of the system. Complex plane plots may be obtained by replacing $s$ by $j\omega$ in eq. (64). The nature of the plots as $\omega \to 0$ will be representative of the type of servomechanism studied, as illustrated in the following subdivision, Typical Complex Plane Plots.

Typical Complex Plane Plots

A type 0 servomechanism representative plot is given in Fig. 28 (Ref. 34). At $\omega = 0$, the transfer function $G(j\omega)$ is on the positive real axis and has a finite value $K_p$. Generally as $\omega \to \infty$, $G(j\omega)$ traverses the fourth and then the third quadrants and approaches the origin.

FIG. 28. Representative complex plane plot for type 0 servomechanism system (Ref. 34).

A type 1 servomechanism representative plot is given in Fig. 29 (Ref. 34). For this plot as $\omega \to 0$, the polar plot of $G(j\omega)$ approaches minus infinity on the imaginary axis. Generally as $\omega$ increases toward $+\infty$, $G(j\omega)$ enters the third and then the second quadrant. The type 1 servomechanism when used for a position control system may also be called a "zero displacement-error system," meaning that the output has the desired value of displacement, in contrast to the type 0 servomechanism, where an error proportional to the desired amount of displacement is necessitated.

FIG. 29. Representative complex plane plot for type 1 servomechanism system (Ref. 3).

A type 2 servomechanism representative plot is presented in Fig. 30 (Ref. 34). For this plot as $\omega \to 0$, the plot of $G(j\omega)$ approaches minus infinity on the real axis. The plot may be closed from $\omega = 0+$ to $\omega = 0-$ by a circle of infinite radius traversed in a counterclockwise direction, as indicated by the dotted line.

FIG. 30. Representative complex plane plot for type 2 servomechanism system (Ref. 3).
The type 2 servomechanism has a "zero velocity-error" characteristic since it is able to maintain a constant output speed with no actuating error. It is also capable, like the type 1 servomechanism, of maintaining a constant output position without actuating error.

Typical Application

Examples of type 0 servomechanisms are speed regulators for d-c motors and jet engines and other forms of regulators controlling voltage, current, or temperature, where proportional controllers are employed.

Examples of type 1 servomechanisms are position control systems with such integral controllers as d-c motors, hydraulic motors, and hydraulic valve-piston linkages. Other examples of type 1 servomechanisms are speed control systems such as for a jet engine with proportional and integral control.

Examples of type 2 servomechanisms are position control systems in which a pilot motor is employed to drive a control element whose position controls the speed of the main drive motor that supplies power to the load being positioned, and torque motors with series compensation.

Block diagrams of each of these servomechanism types are given in Ref. 34.

The following paragraph is in substance from Servomechanism Analysis by G. J. Thaler and R. G. Brown (Ref. 35).

In general, the complexity of equipment, cost, and difficulty in design increase greatly with the more advanced type of systems. Type 1 servomechanisms are therefore more common than any of the others. Occasionally, accuracy requirements will justify the type 2 system, and in other cases where high accuracy is not essential a type 0 system is more economical. Table 18 summarizes characteristics and applications of types 0, 1, and 2 servomechanisms.

TABLE 18. CHARACTERISTICS AND APPLICATIONS OF TYPES 0, 1, AND 2 SERVOMECHANISMS (Ref. 35)

Type 0.
Locus characteristic: Closed.
Error characteristic: Position error at all times.
Application: Static positioning systems where high accuracy is not important. Some regulator systems.

Type 1.
Locus characteristic: Open. The low-frequency end of the locus goes to infinity along the negative imaginary axis.
Error characteristic: No static error. Lag error when operated at constant velocity.
Application: High-accuracy static and dynamic positioning systems.

Type 2.
Locus characteristic: Open. The low-frequency end of the locus approaches infinity along the negative real axis.
Error characteristic: No static error. No position error at constant velocity. Constant error in acceleration.
Application: High-accuracy dynamic positioning systems. Control acceleration errors.

5. ERROR COEFFICIENTS

One of the important figures of merit of a system is its accuracy under various conditions. By accuracy is meant the ability of the system to minimize the error between the actual output and the desired output. The usual types of accuracy specified for a control system are its static accuracy and its dynamic accuracy. Static accuracy is the accuracy for the output or one of its specified derivatives in a steady-state condition. Dynamic accuracy is the accuracy existing during transient conditions of the output and of its derivatives.

Static Error Coefficients

A static error coefficient may be defined as the ratio of the steady-state constant value of the output or of one of its constant derivatives to a constant applied error. The static error coefficients are then:

Position error coefficient,

$K_p = \dfrac{\text{Output, } c}{\text{Applied error, } e} = \lim_{s \to 0} \dfrac{C(s)}{E(s)}$

for constant output, $c$.

Velocity error coefficient,

$K_v = \dfrac{\text{Velocity of output, } \dot c}{\text{Applied error, } e} = \lim_{s \to 0} \dfrac{sC(s)}{E(s)}$

for constant velocity of output, $\dot c \triangleq \dfrac{dc}{dt}$.

Acceleration error coefficient,

$K_a = \dfrac{\text{Acceleration of output, } \ddot c}{\text{Applied error, } e} = \lim_{s \to 0} \dfrac{s^2 C(s)}{E(s)}$

for constant acceleration of output, $\ddot c \triangleq \dfrac{d^2 c}{dt^2}$.

$K_p$, $K_v$, and $K_a$ are respectively the gain constants of type 0, 1, and 2 control systems.
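The three limit definitions above can be evaluated directly from the polynomial coefficients of the forward transfer function. The sketch below is an illustration only: the function name `static_error_coefficients`, the ascending-power coefficient convention, and the example plant G(s) = 10/[s(1 + 0.5s)] are assumptions introduced here, and an infinite coefficient is reported as `math.inf` on the assumption that the low-frequency gain is positive.

```python
import math


def static_error_coefficients(num_ascending, den_ascending):
    """Return (Kp, Kv, Ka), the limits of G(s), s*G(s), s^2*G(s) as s -> 0.

    Coefficients are in ascending powers of s. Assumes num_ascending[0] != 0,
    a denominator that is not identically zero, and a positive gain.
    """
    # Number of poles at s = 0, which is the system type n.
    n = 0
    while den_ascending[n] == 0:
        n += 1
    gain = num_ascending[0] / den_ascending[n]

    coeffs = []
    for k in range(3):  # k = 0, 1, 2 corresponds to Kp, Kv, Ka
        if k < n:
            coeffs.append(math.inf)    # s^k G(s) still has a pole at s = 0
        elif k == n:
            coeffs.append(float(gain))  # finite, nonzero gain constant
        else:
            coeffs.append(0.0)         # extra factors of s drive the limit to zero
    return tuple(coeffs)


# Hypothetical type 1 plant: G(s) = 10 / [s (1 + 0.5 s)]
kp, kv, ka = static_error_coefficients([10], [0, 1, 0.5])

# Error required to sustain a constant output velocity of 2 units per second.
velocity_error = 2.0 / kv
```

For this assumed plant only $K_v$ is finite and nonzero, consistent with the statement that $K_v$ is the gain constant of a type 1 system.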
For a sinusoidal applied error $e$, the error coefficients for types 1 and 2 control systems may be defined in terms of the maximum velocity and acceleration of the output $c$ as follows:

Velocity error coefficient (type 1):

(65) $K'_v = \lim_{\omega \to 0} \dfrac{\dot c_{\max}}{e|_{\dot c_{\max}}}$,

where $e|_{\dot c_{\max}} \triangleq e$ at time of max $\dot c$.

Acceleration error coefficient (type 2):

(66) $K'_a = \lim_{\omega \to 0} \dfrac{\ddot c_{\max}}{e|_{\ddot c_{\max}}}$,

where $e|_{\ddot c_{\max}} \triangleq e$ at time of max $\ddot c$.

In the limit $K'_v$ and $K'_a$ are identical in value to the values of $K_v$ and $K_a$ obtained for constant $\dot c$ and $\ddot c$. Table 19 presents a comparison of errors for various types of controlled motion in which $c$, $\dot c$, or $\ddot c$ is constant or $c$ is oscillating sinusoidally at a much lower frequency than that corresponding to the shortest time constant of the control system.

TABLE 19. COMPARISON OF STEADY-STATE ERRORS (Ref. 34)

Limit of $G(s)$ as $s \to 0$: type 0, $K_p$; type 1, $K_v/s$; type 2, $K_a/s^2$.
Error for constant output, $c$: type 0, $c/K_p$; type 1, 0; type 2, 0.
Error for constant velocity of output, $\dot c$: type 0, $\infty$; type 1, $\dot c/K_v$; type 2, 0.
Error for constant acceleration of output, $\ddot c$: type 0, $\infty$; type 1, $\infty$; type 2, $\ddot c/K_a$.
Maximum $e$ for sinusoidally varying $c = C_{\max} \sin \omega t$: type 0, $C_{\max}/K_p$; type 1, $\omega C_{\max}/K_v$; type 2, $\omega^2 C_{\max}/K_a$.

Dynamic Error Coefficients

A form of dynamic error coefficient is defined as the ratio of the input or one of its specified derivatives to the component of the error which may be assigned to it during a dynamic condition. That is, the error may be expanded in a series in terms of the input and the derivatives of the input. The dynamic error coefficients are then the reciprocals of the coefficients of the various derivatives since they indicate proportionality between dynamic error components and input derivatives.
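The reciprocal-of-series-coefficient definition can be illustrated with exact arithmetic. For the direct-feedback loop of eqs. (62) and (63), $E = R - C$ and $C = GE$ give $E(s)/R(s) = 1/[1 + G(s)]$; writing $G = N/D$ turns this into $D/(D + N)$, whose power series about $s = 0$ follows from polynomial long division. The sketch below is an assumption-laden illustration, not the handbook's procedure: the function names, the ascending-power convention, and the sample type 1 plant $G(s) = K_v/[s(1 + Ts)]$ with $K_v = 10$, $T = 1/2$ are all chosen here for demonstration.

```python
from fractions import Fraction


def error_series(num_ascending, den_ascending, terms=3):
    """Maclaurin coefficients c_0, c_1, ... of E(s)/R(s) = 1 / (1 + G(s)).

    With G = N/D the ratio is D / (D + N); the series is found by long
    division in ascending powers of s (unity feedback is assumed).
    """
    size = max(len(num_ascending), len(den_ascending), terms)
    n = [Fraction(x) for x in num_ascending]
    d = [Fraction(x) for x in den_ascending]
    n += [Fraction(0)] * (size - len(n))
    d += [Fraction(0)] * (size - len(d))
    q = [a + b for a, b in zip(n, d)]  # D(s) + N(s)
    if q[0] == 0:
        raise ValueError("D(0) + N(0) must be nonzero for a series about s = 0")

    c = []
    for k in range(terms):
        # Match the coefficient of s^k in  D(s) = (D(s) + N(s)) * sum(c_i s^i).
        acc = d[k] - sum(q[j] * c[k - j] for j in range(1, k + 1))
        c.append(acc / q[0])
    return c


def dynamic_error_coefficients(series):
    """K_i = 1 / c_i; None stands for an infinite coefficient (c_i = 0)."""
    return [None if c == 0 else 1 / c for c in series]


# Hypothetical plant: G(s) = 10 / [s (1 + s/2)]
series = error_series([10], [0, 1, Fraction(1, 2)])
k0, k1, k2 = dynamic_error_coefficients(series)
```

With these assumed values the series coefficients are 0, 1/10, and 1/25, so $K_0$ is infinite, $K_1 = 10$ equals the velocity gain constant, and $K_2 = 25$ depends on the time constant as well as the gain.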
Writing the transform of a system with unity feedback gives

(67) $\dfrac{E(s)}{R(s)} = \dfrac{1}{1 + G(s)}$.

$E(s)$ may be expanded from the ratio of two functions of $s$,

(68) $E(s) = \dfrac{R(s)}{1 + G(s)}$,

into $R(s)$ operated upon by the Maclaurin series expansion of $1/[1 + G(s)]$, or by simply dividing the numerator by the denominator,

(69) $E(s) = \dfrac{1}{1 + K_p} R(s) + \dfrac{1}{K_1} sR(s) + \dfrac{1}{K_2} s^2 R(s) + \cdots$,

which is valid near $s = 0$ (the steady state). Let $K_0 = 1 + K_p$. $K_0$, $K_1$, $K_2$, ... are commonly called dynamic error coefficients (Refs. 36 and 37) of the system.

Note. Strictly speaking, $K_0$, $K_1$, $K_2$, etc., are not "dynamic" error coefficients since the transient terms were lost upon the series expansion of eq. (68). More correctly, they might be termed steady-state error coefficients. The term dynamic is common usage.

$K_0$, $K_1$, $K_2$ may contain not only the values of the static error coefficients $K_p$, $K_v$, and $K_a$, respectively, but also expressions involving the static error coefficients and the time constants of the system. Therefore, high gain alone is not sufficient for accurate dynamic performance, for low system time constants are also important for this purpose.

EXAMPLE. Dynamic error coefficients. For the position servo of Fig. 11:

(70) $G(s) = \dfrac{K_v}{s(1 + T_1 s)}$,

where $K_v = K/D$, $T_1 = J/D$. From eqs. (68) and (70) for unity feedback

(71) $E(s) = \dfrac{R(s)}{1 + G(s)} = \dfrac{s + T_1 s^2}{T_1 s^2 + s + K_v} R(s)$,

(72) $E(s) = \dfrac{1}{K_v} sR(s) + \left[ \dfrac{T_1}{K_v} - \dfrac{1}{K_v^2} \right] s^2 R(s) + \cdots$.

The dynamic error coefficients in this case are $K_v$ and $K_v^2/(K_v T_1 - 1)$, showing that here the system time constant produces an acceleration component of error proportional to the time constant.

TABLE 20. SERVO ERROR COEFFICIENTS (Ref. 39)

$e(t) = \dfrac{1}{K_0} r(t) + \dfrac{1}{K_1} \dfrac{d}{dt} r(t) + \dfrac{1}{K_2} \dfrac{d^2}{dt^2} r(t) + \dfrac{1}{K_3} \dfrac{d^3}{dt^3} r(t) + \cdots$

($K_0 = 1 + K_p$, and $K_p = \infty$ for all servos in the table.)

Columns: locus identification; transfer function $G(s)$; $1/K_1$; $1/K_2$; $1/K_3$. Rows: loci 6-12, 6-12-18, 6-12-6-12, 6-12-18-12, and 12-6-12.

Error Calculation for a Given Input. The error may be calculated for a given input when $r(t)$, $r'(t)$, $r''(t)$, etc., are known or calculated. First the transforms $R(s)$, $sR(s)$, $s^2 R(s)$, etc., are evaluated from the input and its derivatives by the formula for real differentiation:

(73) $s^n F(s) = \mathcal{L}[f^{(n)}(t)] + \sum_{j=1}^{n} s^{n-j} f^{(j-1)}(0+)$.

Observer input angle $r(t)$ and two derivatives:

$r(t) = \tan^{-1} \dfrac{At}{L}$, $\quad \dfrac{d}{dt}[r(t)] = \dfrac{A/L}{1 + (At/L)^2}$, $\quad \dfrac{d^2}{dt^2}[r(t)] = \dfrac{-2(A^3/L^3)t}{[1 + (At/L)^2]^2}$.

Values of error coefficients $K_0$, $K_1$, $K_2$: see eqs. (69) and (72).

Dynamic error, $e(t)$ (see eq. 74):

$e(t) = \left[ \dfrac{1}{K_v} \right] \left[ \dfrac{A/L}{1 + (At/L)^2} \right] - \left[ \dfrac{T_1}{K_v} - \dfrac{1}{K_v^2} \right] \left[ \dfrac{2(A^3/L^3)t}{[1 + (At/L)^2]^2} \right] + \cdots$

An upper bound for the correction term $R_{n+1}$ may be found by replacing $r^{(n+1)}(\tau)$ by $|r^{(n+1)}(\tau)|_{\max}$ and performing the integration of eq. (75) (a valid procedure for many functions $g_{n+1}(t - \tau)$).

When $r, r', r'', \ldots, r^{(n)}$ suffer at most step discontinuities, the response may be expressed by the first $n + 1$ terms of eq. (74) plus $R_{n+1}$ of eq. (75) plus the expression

$\sum_{i=0}^{n} \sum_{k=1}^{M_i} \Delta_{ik}\, g_{i+1}(t - t_{ik})$,

where $M_i$ is the number of step discontinuities of $r^{(i)}(t)$, $\Delta_{ik}$ is the magnitude of the discontinuities, and $t_{ik}$ is the time of the $k$th discontinuity.
The contribution of impulses can be added in separately.

The response at any time $t$ may be written in three parts:

1. A finite number of terms of the equivalent of the familiar dynamic error expansion, eq. (74).
2. A corresponding finite set of transient terms which accounts for possible discontinuities in the arbitrary forcing function and its derivatives.
3. A convolution integral which clearly places in evidence the exact inaccuracy in the response involved in using a finite number of coefficients (equivalent to the familiar error coefficients).

The above expression results from a closed expansion of the convolution integral to which the response at any time $t$ may be equated. The usefulness of this expansion of the response into three parts lies in the fact that in many problems $R_{n+1}$ contributes a small portion of the total response. Consequently $R_{n+1}$ may either be neglected or crudely approximated without introducing appreciable inaccuracy in the total response. In such cases the process of convolution to obtain the response is replaced by differentiation and summation.

If $r^{(n+1)}(\tau)$ in eq. (75) is replaced by its maximum absolute value, then for many functions $g_{n+1}(t - \tau)$ it may be shown that integration yields an upper bound,

$|R_{n+1}|_{\max} = |g_{n+2}(0)|\, |r^{(n+1)}(\tau)|_{\max} = |c_{n+1}|\, |r^{(n+1)}(\tau)|_{\max}$,

where $c_{n+1}$ is the coefficient of $r^{(n+1)}(t)$ in the expansion of the response and of $s^{n+1}$ in the Maclaurin series for the system function such as $E(s)/R(s)$.

EXAMPLE. Dynamic Error Expressed by Expansion of the Convolution Integral. Illustrate the expansion of the dynamic error for a case in which there exists a known solution. Let $E(s)/R(s)$ of eq. (67) be $B/(s + b)$, the response characteristics $g_{i+1}(t)$ being

$g_{i+1}(t) = B \left( \dfrac{-1}{b} \right)^{i+1} e^{-bt} u(t)$

and the Maclaurin series expansion being

$\sum_{i=0}^{\infty} \dfrac{B}{b} \left( \dfrac{-s}{b} \right)^{i}$ (see eq. 69).

Then the dynamic error may be expressed by $n + 1$ terms of the error expansion

$\sum_{i=0}^{n} \dfrac{B}{b} \left( \dfrac{-1}{b} \right)^{i} r^{(i)}(t)$ (see eq. 74)

plus the transient terms

$\sum_{i=0}^{n} \sum_{k=1}^{M_i} \Delta_{ik} B \left( \dfrac{-1}{b} \right)^{i+1} e^{-b(t - t_{ik})}$

plus the remainder

$\int_{-\infty}^{t} [r^{(n+1)}(\tau)] \left[ B \left( \dfrac{-1}{b} \right)^{n+1} e^{-b(t - \tau)} u(t - \tau) \right] d\tau$ (see eq. 75),

which may be bounded by

$|r^{(n+1)}(\tau)|_{\max} \left| \dfrac{B}{b} \left( \dfrac{1}{b} \right)^{n+1} \right|$,

in which is recognized the $(n + 1)$th coefficient of the Maclaurin series for $E(s)/R(s)$.

Let $R(s)$ be $A/(s + a)$. Then the input and its derivatives $r^{(i)}(t)$ are given by $(-a)^i A e^{-at} u(t)$, and for each value of $i$, $r^{(i)}(t)$ has one discontinuity ($M_i = 1$) at time $t_{i1} = 0$ of magnitude $\Delta_{i1}$ given by $(-a)^i A$. Also $|r^{(n+1)}(\tau)|_{\max}$ is given by $a^{n+1} A$.

The dynamic error then may be written as

$\sum_{i=0}^{n} A \left( \dfrac{B}{b} \right) \left( \dfrac{a}{b} \right)^{i} (e^{-at} - e^{-bt})$,

with the remainder $R_{n+1}$ bounded by $A(B/b)(a/b)^{n+1}$. This series for the error converges rapidly and is useful for $a \ll b$, corresponding to an input which is slow compared with the system response characteristics. As $n \to \infty$, the dynamic error expansion approaches $AB[(e^{-at} - e^{-bt})/(b - a)]$, which checks with the inverse Laplace transform of $AB/[(s + a)(s + b)]$.

Another example of the expansion of the convolution integral to yield a time response in three parts is given in Ref. 41.

Relative Usefulness of Error Coefficients. Both static and dynamic error coefficients are treated in Ref. 34. Dynamic error coefficients are also treated in Ref. 37 and in Ref. 36, which includes a treatment of the approximate relations, for small overshoot, between the time delay and rise time of the step function response and the dynamic error coefficients.

Dynamic error coefficients $K_0$, $K_1$, $K_2$, ... have been defined by eq. (69) and have the same values for all error constants up to and including the one which is nonzero and finite in the classical definition of static error coefficients, $K_p$, $K_v$, $K_a$, ... (as given in Static Error Coefficients). This definition of dynamic error coefficients by eq.
(69) has the advantage of giving additional information about a system, because the value of any constant is not forced to be zero whenever the preceding constant is nonzero and finite, as is the case with static error coefficients $K_p$, $K_v$, $K_a$, .... (These coefficients $K_0$, $K_1$, $K_2$, ... are the same in the steady state as the dynamic error coefficients defined for any time $t$ at the beginning of this section subdivision.)

Relationship between Dynamic Error Coefficients and Roots of System Equations. This section is, in substance, from Automatic Feedback Control System Synthesis by J. G. Truxal (Ref. 42).

As a result of the relation between $C/R$ and $E/R$, there results the Maclaurin expansion

(76) $\dfrac{C(s)}{R(s)} = 1 - \dfrac{1}{K_0} - \dfrac{1}{K_1} s - \dfrac{1}{K_2} s^2 - \cdots$.

The relation between the dynamic error coefficients $K_0$, $K_1$, $K_2$ and the poles and zeros of the closed loop expression $C/R$ is readily determined if $C/R$ is written in factored form,

(77) $\dfrac{C(s)}{R(s)} = \dfrac{K \prod_{i=1}^{m} (s + z_i)}{\prod_{i=1}^{n} (s + p_i)}$,

where the zeros lie at $-z_i$, the poles at $-p_i$. The solutions for the dynamic error coefficients are:

(78) $K_p = \dfrac{K \prod_{i=1}^{m} z_i}{\prod_{i=1}^{n} p_i - K \prod_{i=1}^{m} z_i}$,

where $\prod_{i=1}^{m}$ indicates the product of all factors from $i = 1$ to and including $i = m$, and for cases where $K_p \to \infty$ ($K_0 = 1 + K_p$),

(79) $\dfrac{1}{K_1} = \sum_{i=1}^{n} \dfrac{1}{p_i} - \sum_{i=1}^{m} \dfrac{1}{z_i}$,

(80) $-\dfrac{2}{K_2} = \dfrac{1}{K_1^2} + \sum_{i=1}^{n} \dfrac{1}{p_i^2} - \sum_{i=1}^{m} \dfrac{1}{z_i^2}$.

Equations (79) and (80) are of basic importance in servo synthesis for they represent the correlation between the dynamic error coefficients and the system response characteristics, specifically the time delay and rise time of the response of the system to a step function. In addition the two equations indicate the manner in which lead and integral equalization permit control over $K_1$ and $K_2$ without affecting relative stability.

Generalized (dynamic) error coefficients are treated in greater detail in Ref. 36, and their relation to closed loop roots is treated at length in Ref. 42.

Guillemin's Method.
Equations (79) and (80) are of basic importance in feedback control system synthesis by Guillemin's method, which is described in Chap. 23. In the first step of this method the closed loop transfer function is determined from the specifications for frequency response and transient response. In Chap. 23 use is made of techniques for obtaining, with the aid of these two equations, the zeros and poles for compensation required to obtain the desired closed loop transfer function.

6. ANALYSIS OF A-C SERVOS: CARRIER SYSTEMS

In a d-c servo the signals are directly proportional to the instantaneous amplitude, whereas in an a-c system the signals are modulated carrier waves, and the information is carried in the modulation. For instance, in an amplitude modulated system the envelope of the modulated wave contains the signal information. Owing to the convenience and simplicity of using a synchro or chopper circuit as a modulator and a two-phase motor as a demodulator, many carrier systems use suppressed-carrier amplitude modulation. Frequency and phase modulation are not yet in common use although they offer theoretical advantage for null detection.

The value of an a-c system lies in the possible use of sensitive, accurate, low-force level sensors; inexpensive and relatively easily produced a-c amplifiers; and power elements with low maintenance requirements.

Basic Types of Elements. Three types of elements are encountered in carrier systems:

Type 1. Elements in which both input and output are modulated carriers.
Type 2. Modulators which have inputs at signal frequencies and outputs which are modulated carriers.
Type 3. Demodulators which have modulated carriers as inputs and have signal frequencies as outputs.

Figure 31 shows several electronic circuits that can be used as modulators and demodulators. In addition, mechanically tuned or electromechanical choppers are used extensively. The a-c servo motor also serves as a demodulator and the a-c tachometer acts as a modulator.
Two-phase servomotors and a-c tachometers are analyzed in Refs. 45-47 and 49-51. An accurate mathematical representation of these elements is complex, and simplifying assumptions and analogies are commonly used.

FIG. 31. Modulators and demodulators (vacuum tube diode, diode, transistor, vacuum tube triode, and magnetic-amplifier circuits).

Simplifying Assumptions. The majority of work done with a-c feedback systems has been on suppressed-carrier systems. For this type of system the following assumptions are normally made:

1. Modulators generate perfect suppressed carrier signals; i.e., harmonics besides the sideband frequencies, self-generated noise, and serious phase shifts in the modulator are neglected.
2. The motor acts as a perfect demodulator with a prescribed static and dynamic relationship between the driving voltage envelope and the output shaft position.
3. The carrier frequency and magnitude remain constant.

Suppressed Carrier System. A simple suppressed carrier open loop system is shown in Fig. 32. If the input to the preamplifier is the product of the carrier and the reference input, then

$(M \cos a\omega_c t)(V \cos \omega_c t) = MV (\cos a\omega_c t)(\cos \omega_c t)$.

Expanding this yields

(81) $\dfrac{MV}{2} [\cos (1 + a)\omega_c t + \cos (1 - a)\omega_c t]$.

FIG. 32. Open loop carrier system (Ref. 3b).

As indicated by eq. (81) this type of modulator has an output containing only the two sideband frequencies, the sum and difference of the modulating and carrier frequencies, $(1 - a)\omega_c$ and $(1 + a)\omega_c$, and no carrier frequency, $\omega_c$. This gives rise to the name suppressed carrier.
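The absence of a carrier component in the product signal can be confirmed numerically. The following sketch is illustrative only: the amplitudes, the choice of 50 carrier cycles and 5 signal cycles per record (so that a = 0.1), the record length, and the helper `bin_amplitude` are assumptions made here. It evaluates single bins of a discrete Fourier transform over a record in which every component completes a whole number of cycles.

```python
import cmath
import math


def bin_amplitude(samples, k):
    """Amplitude of the component completing k cycles per record (one DFT bin)."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n


M, V = 2.0, 3.0      # signal and carrier amplitudes (illustrative values)
CARRIER_CYCLES = 50  # carrier cycles per record
SIGNAL_CYCLES = 5    # modulating-signal cycles per record, so a = 5/50 = 0.1
N = 1000             # samples per record

# Product modulator output: (M cos a*wc*t)(V cos wc*t), sampled over one record.
product = [
    M * math.cos(2 * math.pi * SIGNAL_CYCLES * i / N)
    * V * math.cos(2 * math.pi * CARRIER_CYCLES * i / N)
    for i in range(N)
]

lower_sideband = bin_amplitude(product, CARRIER_CYCLES - SIGNAL_CYCLES)
carrier_level = bin_amplitude(product, CARRIER_CYCLES)
upper_sideband = bin_amplitude(product, CARRIER_CYCLES + SIGNAL_CYCLES)
```

Under these assumptions each sideband carries amplitude MV/2, and the carrier bin is zero to within rounding error.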
A typical suppressed-carrier feedback system using tachometer stabilization is shown in Fig. 33. The signal equations are indicated at the significant points in the system.

FIG. 33. Typical a-c servo with tachometer feedback (Ref. 3b).

System Analysis and Design. For the purposes of system analysis the a-c components can usually be treated in the same manner as analogous d-c components. For instance, as shown in Fig. 34, the speed-torque curves of a d-c and an a-c machine are analogous, and the analysis of the a-c machine can proceed in the same manner as the analysis of the d-c machine. See Table 5, Sect. 1. Note, however, that the a-c machine characteristics are nonlinear and that, to derive a linear transfer function, the analysis must be carried out on a linearized, incremental change basis. The linearization methods of Sect. 2, Chap. 25, can be used.

FIG. 34. D-c analogy to a-c two-phase motor. Left: torque-speed characteristics of d-c shunt motor. Right: a-c two-phase motor linear approximation to d-c motor.

There are torques at other frequencies besides the signal frequency produced in the motor. These torques are at frequencies $(2 + a)\omega_c$ and $(2 - a)\omega_c$ and normally, because of the high frequency and low amplitude, produce little mechanical motion. However, the associated currents can produce important heating effects. Similarly, because the motor tends to discriminate against quadrature control signals, the currents produce little torque, but the heating and/or saturating effects of quadrature currents can be important.

The a-c and d-c analogy can be extended to a-c tachometers. An a-c tachometer with a control signal of a frequency $a\omega_c$ and an amplitude proportional to $a\omega_c$ affects system performance in a manner similar to a d-c tachometer.
Alternating-current stabilizing networks and a-c system stability are treated in Chap. 23, Sect. 3.

Noise, quadrature voltage or carrier phase shift, variations in carrier frequency, and pickup present major problems in a-c system design, and their consideration dictates, as much as the stability analysis, the form and characteristics selected. As a result it is desirable to define the environment and operating requirements before investigating the system stability.

ACKNOWLEDGMENTS

The cooperation of the following is gratefully acknowledged in granting permission to reproduce material in this chapter:

American Institute of Physics. From: Journal of Applied Physics (part of Sect. 5).
General Electric Company. From: Servomechanisms and Regulating System Design by H. Chestnut and R. W. Mayer (John Wiley & Sons, New York) (Tables 3, 4, 5, 16, 19; Figures 11, 23, 26, 28, 29, 30, 32, 33).
McGraw-Hill Book Company. From: Automatic Feedback Control by W. R. Ahrendt and J. F. Taplin (Table 21); Servomechanism Practice by W. R. Ahrendt (Tables 1, 20); Servomechanism Analysis by G. J. Thaler and R. G. Brown (Tables 2, 18).
Westinghouse Engineer (Tables 6, 7, 8).
John Wiley & Sons. From: Transients in Linear Systems by M. F. Gardner and J. L. Barnes (Tables 9, 15).
American Institute of Electrical Engineers. From: Electrical Engineering (Table 17).
Bureau of Aeronautics, United States Navy. From report prepared by Northrop Aircraft Co. (Tables 13 to 15, Figures 16 to 19).

REFERENCES

1. E. A. Guillemin, Synthesis of RC Networks, J. Math. Phys., 28, 22-42 (1949).
2. G. J. Thaler and R. G. Brown, Servomechanism Analysis, Chap. 1, McGraw-Hill, New York, 1953.
3. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design: (a) Vol. I, 1951, (b) Vol. II, 1955, Wiley, New York.
4. J. R. Ketchum and R. T. Craig, Simulation of Linearized Dynamics of Gas-Turbine Engines, Natl. Advisory Comm. Aeronaut., Tech. Notes 2826, November 1952.
5. L. M. Toss, How to reckon basic process dynamics, Control Eng., 3, 50-55 (1956).
6. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. II, Chap. 1, Wiley, New York, 1955.
7. J. B. Reswick, Determine System Dynamics - without Upset, Control Eng., 2, 50-57 (1955).
8. J. G. Truxal, Automatic Feedback Control System Synthesis, McGraw-Hill, New York, 1955.
9. W. M. Gaines, Frequency response methods in design of turbojet engine controls, Second Feedback Controls System Conference, Am. Inst. Elec. Engrs., April 1954.
10. W. R. Ahrendt, Servomechanism Practice, McGraw-Hill, New York, 1954.
11. S. W. Herwald, Forms and principles of servomechanisms, Westinghouse Eng., 6, 149-155 (1946).
12. W. R. Evans, Control System Dynamics, McGraw-Hill, New York, 1954.
13. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Chap. 3, Wiley, New York, 1951.
14. S. B. Crary, Power System Stability, Vol. I, p. 1, Wiley, New York, 1945.
15. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. I, Appendix A, Wiley, New York, 1942.
16. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Chap. 4, Wiley, New York, 1951.
17. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. I, Chaps. 3-6, Wiley, New York, 1942.
18. G. J. Thaler, Elements of Servomechanism Theory, Chap. 3, McGraw-Hill, New York, 1955.
19. G. J. Thaler and R. G. Brown, Servomechanism Analysis, Chap. 4, McGraw-Hill, New York, 1953.
20. Methods of Analysis and Synthesis of Piloted Aircraft Flight Control Systems, BuAer Rept. AE 61-41, March 1952, Appendix, Sect. A.
21. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. I, Chap. 8, Wiley, New York, 1942.
22. E. Weber, Linear Transient Analysis, Vol. I, Chap. 2, Wiley, New York, 1954.
23. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Chaps. 9 and 10, Wiley, New York, 1951.
24. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Chaps. 12 and 13, Wiley, New York, 1951.
25. G. S. Brown and D. P. Campbell, Principles of Servomechanisms, Chap. 8, Wiley, New York, 1948.
26. H. M. James, N. B. Nichols, and R. S. Phillips, Theory of Servomechanisms, Chap. 4, McGraw-Hill, New York, 1947.
27. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, Vol. I, Chaps. 2 and 7, Wiley, New York, 1942.
28. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Chap. 7, Wiley, New York, 1951.
29. A.I.E.E. Standards Subcommittee on Terminology and Nomenclature of the Feedback Control Committee, Am. Inst. Elec. Engrs., January 1950. See also Letter Symbols for Feedback Control Systems, ASA Y10.13-1955, American Standards Association, New York, July 1955.
30. IRE 26.S2 Standards on Terminology for Feedback Control Systems, 1955, Proc. I.R.E., 44, 107-109 (1956).
31. IRE 26.S1 Standards on Graphical and Letter Symbols for Feedback Control Systems, 1955, Proc. I.R.E., 43, 1608-1609 (1955).
32. T. D. Graybeal, Transformation of Block Diagram Network, Elec. Eng., 70, 985-990 (1951).
33. T. M. Stout, A Block Diagram Approach to Network Analysis, Trans. Am. Inst. Elec. Engrs., Application and Industry, 71, 255-260 (1952).
34. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Chap. 8, Wiley, New York, 1951.
35. G. J. Thaler and R. G. Brown, Servomechanism Analysis, Chap. 7, McGraw-Hill, New York, 1953.
36. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 1, McGraw-Hill, New York, 1955.
37. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. II, Chap. 2, Wiley, New York, 1955.
38. P. E. Smith, Jr., Design Regulating Systems by Error Coefficients, Control Eng., 2, 69-74 (1955).
39. W. R. Ahrendt, Servomechanism Practice, Chap. 14, McGraw-Hill, New York, 1954.
40. W. R. Ahrendt and J. F. Taplin, Automatic Feedback Control, Chap. 7, McGraw-Hill, New York, 1951.
41. E. Arthurs and L. H. Martin, Closed expansion of the convolution integral (A generalization of servomechanism error coefficients), J. Appl. Phys., 26, 58 (1955).
42. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 5, McGraw-Hill, New York, 1955.
43. R. A. Bruns and R. M. Saunders, Analysis of Feedback Control Systems, McGraw-Hill, New York, 1955.
44. M. Panzer, Envelope transfer function analysis in a-c servosystems, Trans. Am. Inst. Elec. Engrs., 75, 274-279 (1956).
45. S. S. L. Chang, Transient analysis of a-c servomechanisms, Trans. Am. Inst. Elec. Engrs., 74, 30-37 (1955).
46. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. II, Chap. 6, Wiley, New York, 1955.
47. R. J. W. Koopman, Operating characteristics of two-phase servo motors, Trans. Am. Inst. Elec. Engrs., 68, Pt. 1, 319-329 (1949).
48. A. Hopkin, Transient response of small two-phase induction motors, Trans. Am. Inst. Elec. Engrs., 70, Pt. 1, 881-886 (1951).
49. L. O. Brown, Transfer function for a two-phase induction servo motor, Trans. Am. Inst. Elec. Engrs., 70, Pt. 2, 1890-1893 (1951).
50. R. H. Frazier, Analysis of the drag-cup a-c tachometer, Trans. Am. Inst. Elec. Engrs., 70, Pt. 2, 1894-1906 (1951).
51. S. A. Davis, Using a two-phase motor as a tachometer, Control Eng., 2, 75-76 (1955).
52. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 6, McGraw-Hill, New York, 1955.
53. G. M. Attura, Effects of carrier shifts on derivative networks for AC servomechanisms, Trans. Am. Inst. Elec. Engrs., 70, Pt. 1, 612-618 (1951).
54. C. S. Draper, W. McKay, and S. Lees, Instrument Engineering, Vol. II, McGraw-Hill, New York, 1953.

E. FEEDBACK CONTROL

Chapter 21

Stability

W. E. Sollecito and S. G. Reque

1. Introduction
2. Classical Solution Approach
3. Routh's Criterion
4. Nyquist Stability Criterion
5. Bode Attenuation Diagram Approach
6. Root Locus Method
7. Miscellaneous Stability Criteria
8. Closed Loop Response from Open Loop Response
References

1. INTRODUCTION

Definition of Stability. A stable system is one wherein all transients decay to zero in the steady state. An unstable system is here loosely defined as one in which the response variable increases without bound with a bounded signal input.

Reason for Stability Analysis. The primary objective of a control system design is to devise a system such that a controlled variable is related to a command signal in a desired manner within permissible tolerances. If power elements with reliable, unchanging characteristics were available, the problems of control system design would be much simplified. Since, in the main, the characteristics of power elements change with time, temperature, load, pressure, etc., a feedback element is employed to remove the deleterious effects of change in element characteristics. To improve performance of the system, a natural solution is to increase the gain or amplification in the system. The combination of a closed loop and high gain leads to problems of instability.

Purpose of Stability Analysis. To be a satisfactory control system, the system deviation resulting from any normally encountered deviation stimulus must reduce with increasing time to a small value within acceptable tolerance. It is the purpose of stability studies to indicate a system's dynamic behavior, and if this behavior is improper or inadequate, the studies should point the way toward proper system revision to improve performance.

Methods of Stability Analysis. The methods of studying stability presented in this chapter are restricted to linear systems. A linear system is one in which the output due to simultaneous inputs is the same as the sum of the several outputs due to the inputs acting alone.
In other words, a linear system is one which may be described by ordinary linear differential equations. wherein the FIG. 1. General negative feedback system. theorem of superposition holds true. For nonlinear systems, see Chap. 25. Consider the general feedback system shown in Fig. 1; s is the Laplace transform complex variable. The transfer characteristics are given by ::: C(s) I:J. C(s) R(s) G(s) 1 + G(s)H(s) The stability of a system is uniquely defined by those values of s which make (2) 1 + G(s)H(s) = 0. All the methods of stability analysis, therefore, confine themselves to investigation of eq. (2), in one fashion or another. The techniques can be classified in two general categories: those which obtain the explicit values of the roots of eq. (2) and those which obtain information about the bounded regions wherein all the roots lie. In the first category belong the classical approach and the root locus method. In the second category belong Routh's criteria, Nyquist's criteria, Bode's method, and many others. The relative merits of each will be discussed as each method is examined in detail. 2. CLASSICAL SOLUTION APPROACH As shown in Chap. 20, it is possible to relate the controlled variable to the command (reference) variable by a differential equation. For the sta- 21-03 STABILITY bility studies involved here, assume this is a linear differential equation of the form dnx(t) dn-lx(t) (3) ao - - + al 1 + ... + anx(t) dtn dt ndmy(t) dm-ly(t) = bo - - + bl I + ... + bmy(t) . dtm dt mSince this equation is linear, the solution may be broken into the sum of two solutions, the particular solution and the complementary or homogeneous solution. The particular solution, Xss , also called the forced response or the steadystate solution, is of the form xss = f(x). (4) In other words, the forced response is of the same character as the reference. For example, if y is sinusoidal, Xss is also sinusoidal. Characteristic Equation. 
When the operator p is substituted for d/dt and y is set equal to zero, eq. (3) becomes

(5)  $a_0 p^n x(t) + a_1 p^{n-1} x(t) + a_2 p^{n-2} x(t) + \cdots + a_{n-1} p\, x(t) + a_n x(t) = 0$

or

(6)  $[a_0 p^n + a_1 p^{n-1} + a_2 p^{n-2} + \cdots + a_{n-1} p + a_n]\, x(t) = 0.$

This is the characteristic equation (see Chap. 20) of the system. The complementary solution, $x_t$, also called the homogeneous solution or transient solution, is of the form

(7)  $x_t = A_1 e^{p_1 t} + A_2 e^{p_2 t} + A_3 e^{p_3 t} + \cdots + A_n e^{p_n t}.$

The exponents $p_1, p_2, p_3, \ldots, p_n$ are the roots of eq. (6). The coefficients $A_1, A_2, A_3, \ldots, A_n$ depend upon the initial conditions of the system and the forcing function, y. Note that when multiple roots occur, say $p_1 = p_2$, the transient solution is of the form

(8)  $x_t = (A_1 + A_2 t)\, e^{p_1 t} + A_3 e^{p_3 t} + \cdots + A_n e^{p_n t}.$

The total solution is the sum of the two parts

(9)  $x = x_{ss} + x_t.$

Relation of Stability to Characteristic Equation. Instability has been defined as the output becoming large without bound for bounded input. Since the steady-state solution is of the same character as the forcing function, only the transient solution can provide terms which increase without bound for bounded input. This occurs when any of the roots $p_1, p_2, p_3, \ldots, p_n$ have positive real parts because the corresponding exponential terms in eq. (7) or (8) tend to infinity as t becomes infinite. Because $A_1, A_2, A_3, \ldots, A_n$ are finite values depending on initial conditions and $x_{ss}$ is bounded for a bounded input, system stability is dependent only upon the nature of the characteristic equation of the system! In other words, system stability is uniquely determined by the behavior of the exponential terms in the transient response given by eq. (7) or (8).

a. If all the roots have negative real parts, all the exponential terms decay to zero as time increases. This is a stable system.
b. If any of the roots have positive real parts, the corresponding exponential terms increase without limit. This is an unstable system.
c. If any of the roots are purely imaginary, the corresponding terms oscillate at constant amplitude.
This condition is the dividing point between a stable and an unstable system. It is here also considered unstable.
d. If it so happens that multiple roots occur, i.e., $p_1 = p_2$, which are purely imaginary, the output increases without bound. Again this is an unstable condition.

The fundamental problem in ascertaining system stability is therefore one of determining the nature of the roots of the characteristic equation of a given system. The straightforward method of determining stability of a system consists of the following steps:

a. Write the differential equation of the system relating input and output variables.
b. Substitute p for d/dt and equate the input signal to zero. This is the characteristic equation of the system in operational form.
c. Obtain the roots of the characteristic equation with assigned values for all constants.
d. Examine the roots. If all roots have negative real parts, the system is stable. If any of the roots have zero or positive real parts, the system is unstable. Figure 2 shows the regions of root location for stable and unstable systems.

Note. If the Laplace transform method of analysis had been used, the conclusions would have been identical except that the complex variable s would replace the operator p. Equation (3) would yield:

(10)  $[a_0 s^n + a_1 s^{n-1} + \cdots + a_n]\, X(s) = [b_0 s^m + b_1 s^{m-1} + \cdots + b_m]\, Y(s).$

As an input-output ratio similar to eq. (1) this is

(11)  $\dfrac{C(s)}{R(s)} = \dfrac{X(s)}{Y(s)} = \dfrac{b_0 s^m + b_1 s^{m-1} + \cdots + b_m}{a_0 s^n + a_1 s^{n-1} + \cdots + a_n}.$

Solution of the equation resulting from setting the denominator of eq. (11) to zero is exactly the same as solution of eq. (2). [1 + G(s)H(s)] is a fraction of polynomials in s where the numerator polynomial is the characteristic function of the system. As shown in Fig. 2, stability is uniquely defined by those values of s which satisfy eq. (2). Because of more universal acceptance, the complex variable s will be used in place of the operator p in all the following methods of stability analysis.
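As a small illustration of steps a through d, consider two second order equations. They are chosen here only because they factor by inspection and are not taken from the text; the constants $A_1$ and $A_2$ are left undetermined, as they depend on initial conditions.

```latex
% Illustrative equations only (assumed, not from the handbook).
\begin{align*}
\text{Stable case:}\quad
  & \frac{d^2x}{dt^2} + 3\frac{dx}{dt} + 2x = y
  \;\Longrightarrow\; p^2 + 3p + 2 = (p+1)(p+2) = 0, \\
  & p_1 = -1,\quad p_2 = -2,\qquad
    x_t = A_1 e^{-t} + A_2 e^{-2t} \to 0 \text{ as } t \to \infty. \\[6pt]
\text{Unstable case:}\quad
  & \frac{d^2x}{dt^2} - \frac{dx}{dt} - 2x = y
  \;\Longrightarrow\; p^2 - p - 2 = (p-2)(p+1) = 0, \\
  & p_1 = +2,\quad p_2 = -1,\qquad
    x_t = A_1 e^{2t} + A_2 e^{-t} \to \infty \text{ unless } A_1 = 0.
\end{align*}
```

In the second case a single root with a positive real part is enough to make the transient grow without bound, which is condition b above.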
FIG. 2. Pictorial representation of stability definition.

Relative Merits of the Classical Solution Approach. This method of determining stability has the advantage of being theoretically exact, but suffers from two major disadvantages:

a. A great amount of labor is required to factor equations of degree higher than 3.
b. To factor any higher order equations, the coefficients must be numerical values. The loss of system parameters in literal form obscures the ways to improve system performance should redesign become necessary.

3. ROUTH'S CRITERION (Refs. 1, 2, 3)

In 1877 E. J. Routh developed an algebraic method for determining whether a polynomial has roots with positive real parts. This method does not reveal the exact values of the roots but shows the bounded regions wherein they are located. Reference to Fig. 2 shows that this is all that is necessary to determine whether a system is stable or not. If all roots lie in the left half s-plane, the system is stable.

Application of the Routh Criterion.

Step 1. Write the characteristic equation in the form

(12)  $[a_0 s^n + a_1 s^{n-1} + a_2 s^{n-2} + \cdots + a_{n-1} s + a_n]\, X(s) = 0.$

Remove all the zero roots, i.e., the roots that occur at s = 0. If zero roots do occur, they can easily be recognized because s or some power of s will be common to all terms in eq. (12). For example, if $a_n = 0$ in eq. (12), s would be common to all terms and could be placed outside the brackets.

Step 2. Examine eq. (12) to see that all the coefficients of s are nonzero and of the same sign. If this is not true, an unstable system is immediately indicated.

Step 3. Arrange the coefficients in an array of the form:

$$
\begin{array}{l|cccc}
\text{Index} & & & & \\
n     & a_0 & a_2 & a_4 & a_6 \\
n - 1 & a_1 & a_3 & a_5 & a_7 \\
n - 2 & b_1 & b_2 & b_3 &     \\
n - 3 & c_1 & c_2 & c_3 &     \\
n - 4 & d_1 & d_2 &     &     \\
n - 5 & e_1 & e_2 &     &     \\
n - 6 & f_1 &     &     &     \\
n - 7 & g_1 &     &     &
\end{array}
$$

The index number indicates the highest order of s in a row. The first two rows consist of all the terms in the given equation and the rest are calculated in the following fashion.
$$
b_1 = \frac{a_1 a_2 - a_0 a_3}{a_1}, \qquad
b_2 = \frac{a_1 a_4 - a_0 a_5}{a_1}, \qquad \text{etc.}
$$
$$
c_1 = \frac{b_1 a_3 - a_1 b_2}{b_1}, \qquad
c_2 = \frac{b_1 a_5 - a_1 b_3}{b_1}, \qquad \text{etc.}
$$
$$
d_1 = \frac{c_1 b_2 - b_1 c_2}{c_1}, \qquad
d_2 = \frac{c_1 b_3 - b_1 c_3}{c_1}, \qquad \text{etc.}
$$

Notice that two terms in the first column are used in each calculation. As the term to be calculated shifts to the right, the additional two terms in the formula shift to the right also. The formulas for calculation of terms in a row use only those terms in the two rows immediately above. The process is continued for (n + 1) rows where n is the order of the characteristic equation.

Step 4. After the array has been completed, stability can be investigated by inspection of terms in the first column. The number of changes in sign of the terms in the first column is the number of roots with positive real parts. This constitutes Routh's criterion.

EXAMPLE. Given the fourth order equation

$$8s^4 + 2s^3 + 3s^2 + s + 5 = 0.$$

The array becomes:

$$
\begin{array}{r|rrr|l}
\text{Index} & & & & \\
4 & +8  & 3 & 5 & \\
3 & +2  & 1 &   & \\
2 & -1  & 5 &   & b_1 = \dfrac{2\cdot 3 - 8\cdot 1}{2} = -1, \quad b_2 = \dfrac{2\cdot 5 - 8\cdot 0}{2} = 5 \\
1 & +11 &   &   & c_1 = \dfrac{(-1)\cdot 1 - 2\cdot 5}{-1} = 11 \\
0 & +5  &   &   & d_1 = \dfrac{11\cdot 5 - (-1)\cdot 0}{11} = 5
\end{array}
$$

There are two changes of sign in column one (between indexes 3 and 2, and 2 and 1), therefore the equation must have two roots with positive real parts. Since a fourth order equation has four roots, the remaining two roots must lie in the left half s-plane.

Note. A generalization can be made from this example. The last term (+5) came down through the array without change. Since all the coefficients in the equation are positive, the first two terms in column one are positive. Only terms of index 2 and 1 in column one can be negative. Thus, a maximum of two sign changes can occur. Therefore, one can conclude that if all terms in a fourth order equation are nonzero and of the same sign, at least two roots must lie in the left half s-plane. This conclusion is of no great import in itself but it merely points the way to intelligent use of this method of analysis.

Special Cases in Applying the Routh Criterion.
Because the Routh criterion can be used to advantage in other commonly used stability studies, it is worth while to pursue the criterion in greater detail here.

a. Row multiplication. Any row may be multiplied by a positive constant without affecting the criterion. This may be used to decrease the arithmetic labor involved.

b. When the first term in a row is zero and other terms in the same row are not zero. To continue the process, replace the first column zero by an arbitrarily small constant, ε, and continue the calculations. Examine the complete array in the usual fashion. If necessary, ε may be assigned any arbitrarily small value. This number may be positive or negative but is customarily assumed positive.

c. When all terms in a row are zero. This special case arises when roots lying radially opposite and equidistant from the origin occur as shown in Fig. 3. A pair of conjugate pure imaginary roots is of this category. When a row of zeros occurs, take the preceding row of coefficients and form a subsidiary function. This subsidiary function is the polynomial in s having as coefficients the terms of a row; the exponent of the highest power of s is the index of the row and successive powers of s decrease by two.

FIG. 3. Roots radially opposite and equidistant from origin.

EXAMPLE. The subsidiary function of the row with index 3 of the preceding example is

$$f(s) = 2s^3 + s,$$

whereas the subsidiary function of the row with index 2 is

$$f(s) = -s^2 + 5.$$

Upon formation of the subsidiary function of the row preceding the row of zeros, differentiate it with respect to s and replace the row of zeros by the corresponding coefficients of the differentiated function. Proceed in the usual manner. The index numbers remain unaltered. Upon completion of the array, the number of changes in sign indicates the number of roots in the right half s-plane.
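The array construction of Step 3 and the sign count of Step 4 can be sketched in a few lines of Python. This is a minimal sketch rather than a complete implementation: the function names are illustrative, exact rational arithmetic is an arbitrary choice made here to avoid rounding, and special cases b and c (a zero in the first column) are deliberately not handled and simply raise an error.

```python
from fractions import Fraction


def routh_first_column(coeffs):
    """First column of the Routh array for a0*s^n + a1*s^(n-1) + ... + an.

    coeffs is [a0, a1, ..., an]. Assumes no zero ever appears in the first
    column; special cases b and c are not handled in this sketch.
    """
    n = len(coeffs) - 1
    c = [Fraction(x) for x in coeffs]
    width = n // 2 + 1
    # The first two rows hold alternate coefficients, padded with zeros.
    row_a = c[0::2] + [Fraction(0)] * (width - len(c[0::2]))
    row_b = c[1::2] + [Fraction(0)] * (width - len(c[1::2]))
    rows = [row_a, row_b]
    for _ in range(n - 1):          # n + 1 rows in total
        up2, up1 = rows[-2], rows[-1]
        if up1[0] == 0:
            raise ZeroDivisionError("zero in first column (special case b or c)")
        # Each new term uses the two first-column terms of the rows above.
        new = [(up1[0] * up2[j + 1] - up2[0] * up1[j + 1]) / up1[0]
               for j in range(width - 1)]
        new.append(Fraction(0))
        rows.append(new)
    return [row[0] for row in rows]


def sign_changes(column):
    """Number of sign changes down the column (zeros are skipped)."""
    signs = [value > 0 for value in column if value != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Fourth order example from the text: 8s^4 + 2s^3 + 3s^2 + s + 5 = 0
column = routh_first_column([8, 2, 3, 1, 5])
print([int(v) for v in column], sign_changes(column))
```

For the fourth order example the first column reproduces the hand calculation (8, 2, -1, 11, 5) with two sign changes, i.e., two roots with positive real parts.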
Roots not counted by these sign changes lie either in the left half s-plane or on the axis of imaginaries. One of several procedures can be utilized to determine the number of each (Ref. 4). A straightforward approach is as follows. In the original equation replace s by -s. This substitution rotates all the roots of the equation through 180 degrees. Those roots of the original equation in the left half s-plane are now in the right half s-plane. Application of Routh's criterion to this new equation determines the number of these roots. Thus, the number of roots of the original equation in the left half s-plane has been ascertained. The total number of roots is equal to the order of the original equation. Therefore the number of roots on the axis is equal to the total minus the sum of those in the right and left half s-planes.

Relative Merits of Routh's Criterion. This criterion serves as a quick check on absolute system stability. It can also be used to advantage in the more powerful Nyquist criterion. It nicely avoids the necessity for factoring an equation to determine the nature of its roots. This method does not provide a clear indication of system performance and does not clearly show the ways to improve a design should improvement be required.

4. NYQUIST STABILITY CRITERION

This powerful criterion is based on the fact that the frequency response of the open loop transfer function indicates the stability characteristics of the closed loop system. In Fig. 1 the open loop transfer function is represented by G(s)H(s).

Restrictions on the General Nyquist Criterion.

a. G(s)H(s) must be the ratio of the transforms of linear differential equations.
b. G(s)H(s) must be single valued and an analytic function (Ref. 5) for all values of s having zero or positive real parts except at possible discrete points (Ref. 4).

Basic Definitions. In general G(s)H(s) is a fraction of rational polynomials in s.

(13)  $G(s) = \dfrac{N_1(s)}{D_1(s)} = \dfrac{K_1 (s + s_1)(s + s_3) \cdots}{(s + s_2)(s + s_4)(s + s_6) \cdots}$

(14)  $H(s) = \dfrac{N_2(s)}{D_2(s)} = \dfrac{K_2 (s + s'_1)(s + s'_3) \cdots}{(s + s'_2)(s + s'_4)(s + s'_6) \cdots}$

The all important eq. (2) can be written as

(15)  $1 + G(s)H(s) = 1 + \dfrac{N_1(s) N_2(s)}{D_1(s) D_2(s)}$

(16)  $1 + G(s)H(s) = \dfrac{D_1(s) D_2(s) + N_1(s) N_2(s)}{D_1(s) D_2(s)}.$

Characteristic Function. $[D_1(s) D_2(s) + N_1(s) N_2(s)]$ represents the characteristic function of the closed loop system of Fig. 1. The characteristic equation is merely the characteristic function set equal to zero.

Zeros. The factors $(s + s_1), (s + s_3), \ldots$, represented by $N_1(s)$ are called zeros of G(s). This terminology arises because when s takes on the value of a root of $N_1(s)$, i.e., $-s_1, -s_3, \ldots$, $N_1(s)$ equals zero and G(s) does likewise per eq. (13).

Poles. The factors $(s + s_2), (s + s_4), \ldots$, represented by $D_1(s)$ are called poles of G(s). When s takes on the value of a root of $D_1(s)$, i.e., $-s_2, -s_4, \ldots$, $D_1(s)$ equals zero and G(s) goes to infinity per eq. (13). This rise to infinity is called a pole.

Note. Per eq. (16), poles of G(s)H(s) are also poles of [1 + G(s)H(s)], whereas zeros of [1 + G(s)H(s)] are unknown and their nature is to be determined by the stability criterion. Zeros of [1 + G(s)H(s)] are poles of C(s)/R(s).

Nyquist Criterion. General Procedure.

a. Plot G(s)H(s) for s traversing the boundary of the entire right half s-plane in a clockwise direction. (See following note.)
b. Draw a vector, V, from (-1 + j0) [the minus one point in the G(s)H(s)-plane] to G(s)H(s) and observe the angular rotation of this vector for the above values of s.
c. Let R be the net number of revolutions of this vector. R is positive for counterclockwise revolutions and negative for clockwise revolutions.
d. Determine the number of poles of G(s)H(s) in the right half s-plane, i.e., poles with positive real parts. Call this integer number P. If necessary, Routh's criterion may be used to determine this.
e.
The number of zeros of [1 + G(s)H(s)] in the right half s-plane, Z, is determined from the equation

(17)  $Z = P - R.$

f. The system is stable if and only if Z = 0, i.e., if the number of counterclockwise revolutions of G(s)H(s) about the -1 point is equal to the number of poles of G(s)H(s) in the right half s-plane.

Note. If G(s)H(s) has any poles on the jω-axis (i.e., pure imaginary roots), s must bypass these points as it takes on values up the jω-axis. It is customary to make s traverse a small semicircle to the right of these points as shown in Fig. 4. If G(s)H(s) ever does have poles on the jω-axis whose values are unknown, it is almost as much effort to determine these as it is to find the zeros of [1 + G(s)H(s)] directly. In this case, Dzung's criteria (Refs. 19, 20) may be a better approach for stability analysis. Fortunately this condition arises infrequently.

FIG. 4. Traversal of s for the Nyquist plot where G(s)H(s) has poles at ±jω1 and 0.

The Physical Meaning of Making s Traverse the jω-Axis. In short, it is obtaining the steady-state frequency response of the open loop transfer function G(s)H(s). Consider the case shown in Fig. 5. A sin ωt is the input and in the steady-state condition, B sin (ωt + θ) is the output. To be theoretically exact, the steady-state condition is the condition that exists after an infinite time has elapsed. This allows all the transients to die out to absolute zero for a stable system. For practical consideration, steady state occurs after the transients have settled down to arbitrarily small values.

FIG. 5. Frequency response measurements.

Comparison of the output sinusoid with the input sinusoid reveals that a gain change, B/A, and a phase shift, θ, have occurred. This gain change and phase shift are due to G(s)H(s) and can be considered as the magnitude and direction of a vector.
This is the vector notation of the steady-state behavior of G(s)H(s).

Sinusoidal Input Variable. The Laplace transform of the input variable is

(18)  $\mathcal{L}[\sin \omega t] = \dfrac{\omega}{s^2 + \omega^2} = \dfrac{\omega}{(s + j\omega)(s - j\omega)}.$

The graphical representation of the Laplace transform of this sinusoid is a pair of points on the imaginary axis a distance of ±ω from the origin. As the frequency of the sinusoid varies, ω varies with it. See Fig. 6. Consider the case where

(19)  $G(s)H(s) = \dfrac{K}{(s + s_2)(s + s_4)}.$

The poles of G(s)H(s) are plotted at $-s_2$ and $-s_4$ in the s-plane in Fig. 6. In this same figure are plotted the poles of the input sinusoid whose frequencies are successively $\omega_0, \omega_1, \omega_2, \omega_3, \omega_4, \ldots, \omega_n$. The corresponding vectors representing the gain change and phase shift G(jω)H(jω) for each frequency are plotted in Fig. 7.

One method to obtain G(s)H(s) for any particular s is to substitute the particular value of s in eq. (19). An equivalent, but more illuminating, procedure is to consider each factor in eq. (19) as a separate vector. In Fig. 6 are shown the two factor vectors for $s = j\omega_2$. As s assumes values up the jω-axis, the vectors from the roots increase in magnitude and phase. Since these vectors appear in the denominator of eq. (19), as s traverses up the jω-axis, the magnitude of G(s)H(s) decreases whereas its phase becomes increasingly negative. For this particular transform given by eq. (19), for positive ω, G(jω)H(jω) lies in the third and fourth quadrants. The entire curve expands or contracts with respective increase or decrease in the gain constant K.

FIG. 6. s-Plane plot of input sinusoid and vectors of G(s)H(s) factors.

FIG. 7. G(s)H(s)-Plane plot of frequency response of G(s)H(s).

Conformal Mapping. Mathematically, G(s)H(s) is a function which transforms a point in the s-plane to a point in the G(s)H(s)-plane.
This mapping of points or curves in one plane to points or curves in another plane is called conformal mapping. The line along the jω-axis in the s-plane maps into the curve in the G(s)H(s)-plane shown in Fig. 7 by use of the transform G(s)H(s). An important point to remember is that any curve in the s-plane produces a corresponding curve in the G(s)H(s)-plane. The curve in the s-plane which lies on the jω-axis corresponds to the input function being a variable frequency sinusoid. The shape of the corresponding curve in the G(s)H(s)-plane depends on the particular fraction of polynomials represented by G(s)H(s).

Figure 8 shows other lines along which s might vary. Figure 9 shows the corresponding curves of G(s)H(s) for G(s)H(s) given by eq. (19). Points on line (1) correspond to input functions of the form

(20)  $e^{-\sigma_1 t} \sin \omega t,$

whose Laplace transform is

(21)  $\mathcal{L}[e^{-\sigma_1 t} \sin \omega t] = \dfrac{\omega}{(s + \sigma_1 + j\omega)(s + \sigma_1 - j\omega)}.$

Points on line (2) correspond to input functions whose Laplace transform is (22)

The conformal mapping procedure obtains definite corresponding curves for G(s)H(s) as shown in Fig. 9.

FIG. 8. Particular paths of s in s-plane.

FIG. 9. Plot of G(s)H(s) for paths of s in Fig. 8.

Principles of Nyquist Criterion. By use of conformal mapping principles it can be shown (Ref. 6) that if s is made to traverse the boundaries of a given area, observation of the behavior of the vector from the -1 point to G(s)H(s) in the G(s)H(s)-plane indicates how many zeros of [1 + G(s)H(s)] lie in the area whose boundaries were traversed by s. Refer to Fig. 10, where s is made to traverse the boundary of area A, and the corresponding path of G(s)H(s) is as shown in Fig. 11.
Observation of the net rotation of the vector V about the -1 point gives a clear indication of the roots of [1 + G(s)H(s)] in area A. For every pole of G(s)H(s) located in area A, V will experience one net counterclockwise rotation about the -1 point. For every zero of [1 + G(s)H(s)] in area A, V will experience one net clockwise rotation about the -1 point. Therefore if the number of poles, P, of G(s)H(s) in area A is known, the number of zeros of [1 + G(s)H(s)] in area A can be found by subtracting from P the number of net counterclockwise revolutions of V about the -1 point. If area A is made to encompass the entire right half of the s-plane, existence of zeros of [1 + G(s)H(s)] in this area can be determined from the above procedure and stability of the closed loop system can be ascertained!

FIG. 10. Arbitrary path of s in the s-plane.

FIG. 11. Corresponding path of G(s)H(s).

The left portion of the boundary in Fig. 12 corresponds to making the input to G(s)H(s) a sinusoid. The traversal out at infinity is only of mathematical importance because infinite values are difficult to handle in physical equipment. For practical purposes, that finite region relatively close to the origin is of major importance as will be more clearly demonstrated in the Bode approach to stability analysis.

FIG. 12. Path of s enclosing entire right half s-plane.

Summary. Stability is uniquely defined by those values of s which make

$$1 + G(s)H(s) = 0.$$

To ascertain existence of zeros of [1 + G(s)H(s)] in the right half s-plane, Nyquist's criterion requires s to traverse the boundary of the entire right half s-plane. The portion of the boundary of major importance, the jω-axis, corresponds to a sinusoidal input function. Therefore, the frequency response of the open loop transfer function G(s)H(s) gives clear indication of stability of the closed loop system.
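The counting procedure summarized above (traverse the boundary of the right half s-plane, watch the vector from the -1 point to G(s)H(s), then apply Z = P - R) can be checked numerically. The following is a rough sketch under stated assumptions: the open loop function G(s)H(s) = K/((s + 1)(s + 2)(s + 3)) is a hypothetical example chosen here because it has no poles on the jω-axis, the infinite semicircle is approximated by a large finite radius, and the step count is an arbitrary choice that keeps each angle increment small.

```python
import cmath
import math


def loop_gain(s, K):
    """Hypothetical open loop function G(s)H(s) = K / ((s+1)(s+2)(s+3))."""
    return K / ((s + 1) * (s + 2) * (s + 3))


def net_ccw_revolutions(K, radius=1.0e3, n=20000):
    """Net counterclockwise revolutions R of the vector from -1 to G(s)H(s)."""
    # Up the jw-axis from -j*radius toward +j*radius ...
    path = [complex(0.0, -radius + 2.0 * radius * k / n) for k in range(n)]
    # ... then clockwise around a large semicircle back to -j*radius.
    path += [radius * cmath.exp(1j * (math.pi / 2 - math.pi * k / n))
             for k in range(n + 1)]
    total = 0.0
    prev = 1 + loop_gain(path[0], K)      # vector from -1 to G(s)H(s)
    for s in path[1:]:
        cur = 1 + loop_gain(s, K)
        total += cmath.phase(cur / prev)  # small signed angle step
        prev = cur
    return round(total / (2 * math.pi))


P = 0  # all three poles of this G(s)H(s) lie in the left half s-plane
for K in (10, 100):
    R = net_ccw_revolutions(K)
    Z = P - R
    print(K, R, Z, "stable" if Z == 0 else "unstable")
```

For this assumed function Routh's criterion applied to s^3 + 6s^2 + 11s + 6 + K gives the stability limit K = 60, so the two gains tried here fall on either side of it and serve as a cross-check on the encirclement count.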
The correspondence of the jω-axis to a sinusoidal input is most fortunate because constant amplitude variable frequency generators are much easier to build than exponentially varying variable frequency generators. Experimental procedures are thereby more easily implemented.

Application of Nyquist Stability Criterion.

EXAMPLE 1. Given

$$G(s)H(s) = \frac{K(s + s_1)}{s(s + s_2)(s + s_4)(s + s_6)}.$$

In Fig. 13 consider s in the region from b to c. As ω becomes increasingly large,

$$\lim_{s \to j\infty} [G(s)H(s)] = \frac{K}{s^3} = 0 \,\angle\, {-270^\circ}.$$

In this region G(s)H(s) approaches zero asymptotically to the -270-degree direction, i.e., along the positive imaginary axis. As s traverses the boundary c-d-e out at infinity, the G(s)H(s) vector rotates 540 degrees in the counterclockwise direction, but since the magnitude is zero, this rotation is unobservable. The region e-f is the conjugate of c to b. In the region f to g there is a continuous curve which wiggles a bit because of the pole zero locations as shown in Fig. 13. At point g the s traverse takes a 90-degree turn to the right. In conformal mapping, angles are preserved in the small, therefore the G(s)H(s) plot also takes a 90-degree turn to the right. In the region g-h-a, G(s)H(s) behaves like K/s.

$$\lim_{s \to 0} [G(s)H(s)] = \frac{K}{s}.$$

FIG. 13. s-Plane plot, G(s)H(s) = K(s + s1)/[s(s + s2)(s + s4)(s + s6)].

FIG. 14. Nyquist plot of G(s)H(s).

In other words, the movement of s is very close to the pole at the origin so the vectors of the poles and zeros relatively far away do not experience great change. The vector of the pole at the origin experiences a 180-degree change in the counterclockwise sense. Since this vector is in the denominator of G(s)H(s), G(s)H(s) experiences a 180-degree change in the clockwise sense. The region a to b is the conjugate of g to f. The G(s)H(s) plot is usually plotted solid for s = +jω and dotted for the rest of the boundary. From Fig.
13 it is apparent that G(s)H(s) has no poles in the right half s-plane, so P = 0. Notice that the zero of G(s)H(s) is not considered at all. If the gain constant is such that the -1 point is at point A in Fig. 14, R = 0. Therefore

$$Z = P - R = 0 - 0 = 0,$$

and the closed loop system is stable. If the gain constant is raised such that the -1 point is at B, there are two clockwise encirclements of the -1 point,

$$Z = P - R = 0 - (-2) = +2,$$

and the closed loop system is unstable and has two poles in the right half s-plane. If the gain were adjusted such that the -1 point were at D, i.e., the G(s)H(s) curve passes right through the -1 point, R is indeterminate. This condition produces a constant amplitude sinusoidal oscillation in the closed loop system. A change in the gain constant is like changing the calibration on the coordinate axes.

EXAMPLE 2. Given

$$G(s)H(s) = \frac{K(s - 10)}{s^2 + 100}.$$

In the region a to b in Fig. 15

$$G(j\omega)H(j\omega) = \frac{K(j\omega - 10)}{100 - \omega^2} = \frac{K\sqrt{\omega^2 + 100}}{100 - \omega^2} \,\angle\, \tan^{-1}\!\left(\frac{\omega}{-10}\right).$$

For ω = 0,

$$G(s)H(s) = \frac{K}{10} \,\angle\, {-180^\circ}.$$

As ω increases, the magnitude of G(s)H(s) increases and the phase angle becomes more negative. As ω approaches 10, G(s)H(s) approaches infinity along the +135-degree line. In the region b-c-d

$$\lim_{s \to j10} [G(s)H(s)] = \frac{K}{(s - j10)}.$$

Therefore, since s takes a 90-degree right turn and proceeds 180 degrees counterclockwise, G(s)H(s) takes a right turn and proceeds 180 degrees clockwise. In the region d-e, G(s)H(s) is well behaved and proceeds to zero as s approaches j∞. As s → ∞,

$$\lim_{s \to j\infty} [G(s)H(s)] = \frac{K}{s} = 0 \,\angle\, {-90^\circ}.$$

Along e-f-g, G(s)H(s) remains at zero. The rest of the curve is the conjugate image of e to a. For this system P = 0. From Fig. 16, R = 0 for the -1 point at A. This system is stable. For the -1 point at B, R = -1 and Z = 1. This system is unstable.

FIG. 15.
s-Plane plot, G(s)H(s) = K(s - 10)/(s^2 + 100).

FIG. 16. Nyquist plot of G(s)H(s).

Practical Considerations in Plotting Diagrams. If one should ever find that the number of counterclockwise encirclements of the -1 point is greater than P, he may correctly infer that he has made a mistake in calculating either P or R! The procedure in drawing the Nyquist diagrams is first to draw in the approximate shape of the G(s)H(s) curve for the prescribed traversal of s. The labor involved is by no means negligible. To avoid unnecessary labor, the reader is advised to learn first how to use the following Bode diagrams and apply them to obtain the exact Nyquist plot when necessary. The Bode approach is presented after the Nyquist criterion for ease in presenting the requisite theory. Subsequent usage should by no means be affected by order of theoretical presentation.

Strictly speaking, the small semicircles about poles of G(s)H(s) on the jω-axis and also the traversal of s out at infinity do not correspond to constant amplitude sinusoidal input. The polar plot used in the Nyquist criterion is therefore not strictly a frequency response plot. For purposes of simple definition, these exceptions are overlooked.

Abbreviated Nyquist Stability Criterion. When the open loop transfer function is stable by itself, P = 0, and the criterion for stability reduces to R = 0.

STATEMENT 1. For a stable open loop transfer function, the closed loop system will be stable if there are no encirclements of the -1 point in the G(s)H(s)-plane for s = jω.

The criterion may be further reduced to observing the behavior of G(s)H(s) for positive ω in the region where the magnitude of G(s)H(s) is near unity. The additional restriction is that G(s)H(s) becomes a constant less than 1 (or zero) as s becomes increasingly large. This restriction means that in eq.
(15) the order of $N_1(s) N_2(s)$ is less than or equal to the order of $D_1(s) D_2(s)$. Where the respective orders are equal, the product of gain constants, $K_1 K_2$, from eqs. (13) and (14) must be less than 1. For the cases that fall within the above-mentioned restrictions (and there are many), the criterion can be restated.

STATEMENT 2. In the region of frequencies where G(jω)H(jω) is near the unit circle, the system is stable if the -1 point is not encircled.

STATEMENT 3. If the further restriction is imposed that G(jω)H(jω) is well behaved in the region of the unit circle, then stability is indicated by the phase angle of G(jω)H(jω) for positive values of ω when it crosses the unit circle. For phase angles less negative than -180 degrees (i.e., a phase lag of less than 180 degrees) at unit circle crossover, the system is stable. For phase angles more negative than -180 degrees at unit circle crossover, the system is unstable. In Fig. 17, $G_1(s)H_1(s)$ represents a stable closed loop system whereas $G_2(s)H_2(s)$ represents an unstable system.

A well-behaved G(s)H(s) is loosely defined as one that does not wander too much in the region of the unit circle. A not too well-behaved open loop transfer function is shown in Fig. 18. For systems of this type, the general Nyquist criterion should be used. Adequate information about system stability is contained in Fig. 18, but more than the first unit circle crossover must be inspected.

Phase Margin. For those systems that do fall within the abbreviated criterion, additional definitions have evolved. The phase of G(jω)H(jω), measured with respect to the positive real axis and defined as positive in the counterclockwise sense, is given as θ. The phase margin is the phase of G(jω)H(jω) at unit circle crossover and is measured with respect to the direction of the -1 point:

(23)  $\gamma = 180^\circ + \theta.$

In Fig. 17, $G_1(s)H_1(s)$ has a positive phase margin whereas $G_2(s)H_2(s)$ has a negative phase margin.
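Equation (23) can be evaluated numerically once the unit circle crossover frequency is located. The sketch below assumes a hypothetical well-behaved open loop function, G(s)H(s) = K/(s(s + 1)(s + 2)), whose magnitude falls steadily with frequency so that a simple bisection finds the single crossover; the function names and search limits are illustrative choices. The phase is summed factor by factor, in the spirit of treating each factor as a separate vector, so that angles beyond -180 degrees are not wrapped.

```python
import cmath
import math


def magnitude(w, K):
    """|G(jw)H(jw)| for the hypothetical G(s)H(s) = K / (s (s+1)(s+2))."""
    s = complex(0.0, w)
    return K / (abs(s) * abs(s + 1) * abs(s + 2))


def phase_deg(w):
    """Phase theta of G(jw)H(jw) in degrees, summed factor by factor."""
    s = complex(0.0, w)
    lag = cmath.phase(s) + cmath.phase(s + 1) + cmath.phase(s + 2)
    return -math.degrees(lag)   # all factors are in the denominator


def phase_margin_deg(K):
    """gamma = 180 + theta at the unit circle crossover, eq. (23)."""
    lo, hi = 1.0e-6, 1.0e6      # magnitude is > 1 at lo and < 1 at hi
    for _ in range(200):
        mid = math.sqrt(lo * hi)            # bisect on a log frequency scale
        if magnitude(mid, K) > 1.0:
            lo = mid
        else:
            hi = mid
    return 180.0 + phase_deg(math.sqrt(lo * hi))


for K in (2, 6, 20):
    print(K, round(phase_margin_deg(K), 3))
```

For this assumed function the closed loop characteristic equation is s^3 + 3s^2 + 2s + K = 0, which Routh's criterion shows to be stable for K < 6; the computed phase margin should therefore be positive for K = 2, essentially zero at K = 6, and negative for K = 20.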
Phase margin at unit circle crossover evidences system stability with plus and minus values indicating stable and unstable systems respectively. Zero phase margin at unit circle crossover means that G(jω)H(jω) passes through the -1 point and therefore that the closed loop system will sustain a constant amplitude oscillation.

FIG. 17. Unit circle in the G(s)H(s)-plane.

FIG. 18. Not too well-behaved G(s)H(s).

EXAMPLE. In Fig. 18 the phase margin at the first unit circle crossover is positive. The -1 point is encircled so the system is unstable. This example illustrates the case where inspection of only the first unit circle crossover could lead to erroneous conclusions.

Gain Margin. A second point of particular significance is the gain or magnitude of G(jω)H(jω) where it crosses the negative real axis. This is σ1 in Fig. 19. The reciprocal of this value is the gain margin of the system. The gain constant of G(s)H(s) could be raised by a factor of 1/σ1 before instability arose.

The -1 point can be considered a vector of unit magnitude and direction of -180 degrees. Note that the phase margin is defined with relation to the magnitude of the -1 point whereas the gain margin is defined with relation to the direction of the -1 point.

FIG. 19. Determination of phase margin, gain margin (γ = 180° + θ).

Conditional Stability, Unconditional Stability. A conditionally stable system is one where instability can come about by either an increase or decrease in system gain. An unconditionally stable system is one where instability can come about only for an increase in system gain. Figures 20 and 21 illustrate these cases.

FIG. 20. Conditionally stable system.

FIG. 21.
Unconditionally stable system: G(s)H(s) = K/[s(s + s2)(s + s4)].

Inverse Polar Plots. The preceding Nyquist diagrams are polar plots of G(s)H(s). These diagrams led to ascertainment of the nature of the zeros of eq. (2),

$$1 + G(s)H(s) = 0.$$

If in this equation both sides are divided by G(s)H(s),

(24)  $\dfrac{1}{G(s)H(s)} + 1 = 0.$

Let G'(s)H'(s) represent the inverse of G(s)H(s):

(25)  $G'(s)H'(s) + 1 = 0.$

The above mathematical manipulations cannot alter the factors of eq. (2). The zeros of eq. (24) or (25) are exactly the same zeros of eq. (2). Investigation of system stability via the inverse polar plot leads to conclusions identical to those arrived at by use of the direct polar plot of G(s)H(s). In certain design applications use of the inverse loop transfer function may more clearly demonstrate effects of design changes.

Polar Plots of Some Common Open Loop Transfer Functions. The following plots represent some commonly encountered system functions. Once the reader recognizes how these were generated, he should be ready to handle any newly encountered situation. In the G(s)H(s)-plane plots are the letters A, B, C. These represent possible locations of the -1 point dependent upon the value of the gain constant K. Stability is indicated for various locations of the -1 point. See Figs. 22-36. See also Figs. 13-16.

FIG. 22. Polar plot of G(s)H(s) = K/(s + s2), P = 0. Always stable.

FIG. 23. Polar plot of G(s)H(s) = K(s + s1)/(s + s2), P = 0. Always stable.

FIG. 39. Minimum phase response curve regions.

The Nyquist Stability Criterion Rephrased in Terms of the Bode Diagrams.
For well-behaved, minimum phase $G(s)H(s)$, the closed loop system is stable if, at the frequency where the log magnitude of $G(j\omega)H(j\omega)$ is equal to zero, its phase lag is less than 180 degrees, i.e., its phase angle has not yet fallen to $-180^\circ$. Or, the system is stable if, at the frequency where the phase angle of $G(j\omega)H(j\omega)$ is $-180^\circ$, the log magnitude is less than zero. If the condition arises where the phase angle is equal to $-180$ degrees and the log magnitude is zero at a frequency $\omega_0$, the closed loop will sustain a constant amplitude oscillation at the frequency $\omega_0$. This condition corresponds to $G(j\omega)H(j\omega)$ passing through the $-1$ point in the Nyquist plot.

If the phase angle of $G(j\omega)H(j\omega)$ is defined in terms of phase margin as given by eq. (23) and Fig. 19, the stability criterion is commonly expressed as follows. At gain crossover (the point where the magnitude curve crosses the $\log M = 0$ line) a positive phase margin indicates a stable system whereas a negative phase margin indicates an unstable system.

By use of Bode diagrams it is possible to deduce whether a system is or is not stable in situations more complicated than that wherein $G(s)H(s)$ is a well-behaved, minimum phase network. When complicated situations arise, final conclusions should be checked by use of the general Nyquist criterion or the Routh criterion.

Mechanics of Drawing Bode Diagrams. When given a transform $G(s)H(s)$, the most straightforward procedure in drawing Bode diagrams is to pick values of $j\omega$, substitute into $G(j\omega)H(j\omega)$, and grind out the complex algebra. Fortunately this laborious procedure is not required frequently because $G(s)H(s)$ is usually known in factored form. There are four basic building blocks used in drawing Bode diagrams.

1. $K^{\pm 1}$, a pure gain constant.
2. $s^{\pm 1}$, a pure differentiation or pure integration.
3. $(s + \omega_0)^{\pm 1}$, a simple lead or simple lag.
4. $(s^2 + 2\zeta\omega_0 s + \omega_0^2)^{\pm 1}$, a quadratic lead or quadratic lag.
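The four building blocks can be sketched numerically as follows. This is only an illustrative sketch: the function names (`gain`, `s_power`, `simple`, `quadratic`, `bode_point`) and the sample factor values are assumptions made for the example and do not come from the text. It shows that the log magnitudes and phase angles of the separate factors simply add to give the composite curve.

```python
import cmath
import math

# Each builder returns a function of frequency w that evaluates one
# building block at s = jw. sign = +1 gives the lead form, -1 the lag form.

def gain(K, sign=+1):
    """K^(+-1): a pure gain constant."""
    return lambda w: complex(K) ** sign

def s_power(sign=+1):
    """s^(+-1): pure differentiation (+1) or pure integration (-1)."""
    return lambda w: (1j * w) ** sign

def simple(w0, sign=+1):
    """(s + w0)^(+-1): simple lead or simple lag."""
    return lambda w: (1j * w + w0) ** sign

def quadratic(w0, zeta, sign=+1):
    """(s^2 + 2*zeta*w0*s + w0^2)^(+-1): quadratic lead or lag."""
    return lambda w: ((1j * w) ** 2 + 2 * zeta * w0 * (1j * w) + w0 ** 2) ** sign

def db(z):
    """Log magnitude in decibels."""
    return 20.0 * math.log10(abs(z))

def deg(z):
    """Phase angle in degrees (principal value)."""
    return math.degrees(cmath.phase(z))

def bode_point(factors, w):
    """Sum the decibel and phase contributions of all factors at w."""
    values = [f(w) for f in factors]
    return sum(db(v) for v in values), sum(deg(v) for v in values)

# Hypothetical loop: 5 / [s (s + 2) (s^2 + 4 s + 16)], evaluated at w = 3.
factors = [gain(5.0), s_power(-1), simple(2.0, -1), quadratic(4.0, 0.5, -1)]
total_db, total_phase = bode_point(factors, 3.0)
print(f"composite: {total_db:.2f} db, {total_phase:.1f} degrees")
```

Summing the individual phases, rather than taking the phase of the product, keeps the composite phase curve continuous below $-180^\circ$, which is how the curves are added by hand.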
In reverting these basic factors to logarithmic plots it would be entirely possible to use logarithms to the base $e$ and to use the common multiplier of 1. Since the decibel concept was in vogue and orders of 10 are easier to handle than orders of $e$, logarithms to the base 10 were used and the multiplying factor was taken as 20. A decibel is equal to

(30) $\text{Decibels} = \text{db} = 10 \log_{10} \dfrac{P_o}{P_i} = 20 \log_{10} \dfrac{V_o}{V_i}.$

Transfer functions in general are more similar to voltage ratios, $V_o/V_i$, than to power ratios, $P_o/P_i$; therefore the multiplying factor of 20 is commonly used. Some writers (Ref. 10) would rather use the multiplying factor of 10 and units of decilogs, but this seems to be of small consequence in stability analysis.

The First Building Block: The Pure Gain Constant.

(31) $20 \log K^{\pm 1} = \pm 20 \log K.$

The logarithm of a pure gain constant is independent of frequency and therefore plots as a horizontal line in the magnitude and phase curves. $K$ has zero phase angle if it is positive and $-180$ degrees if it is negative.

(Figure: magnitude curve of $20 \log K$, $K > 1$.)

FIG. 45. Phase curves of $[(s^2/\omega_0^2) + 2\zeta(s/\omega_0) + 1]^{-1}$, plotted against $\omega/\omega_0$. To obtain values for $[(s^2/\omega_0^2) + 2\zeta(s/\omega_0) + 1]$ merely reverse the sign of the ordinate.

break frequency. From eq. (40)

(41) $M(\omega) = \pm 20 \log \sqrt{\left[1 - \left(\dfrac{\omega}{\omega_0}\right)^2\right]^2 + \left(2\zeta\dfrac{\omega}{\omega_0}\right)^2}.$

For $\omega/\omega_0$ much less than 1, all $\omega/\omega_0$ terms are negligible and

(42) $M(\omega)\big|_{\omega/\omega_0 \ll 1} \approx \pm 20 \log \sqrt{1} = 0 \text{ db},$

an asymptote of zero slope (0 db/decade). For $\omega/\omega_0$ much greater than 1, the dominant term is

(43) $M(\omega)\big|_{\omega/\omega_0 \gg 1} = \pm 20 \log \sqrt{\left(\dfrac{\omega}{\omega_0}\right)^4} = \pm 40 \log \dfrac{\omega}{\omega_0}.$

This is merely twice the slope of a simple break for the similar assumption. Therefore, the asymptote of a complex break is $\pm 40$ db per decade for large $\omega/\omega_0$.
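A short numerical sketch of eqs. (41) to (43) follows. The function names and the sample damping value $\zeta = 0.3$ are assumptions chosen for illustration. It compares the exact log magnitude of a quadratic lag with its two asymptotes and shows that the largest departure occurs near the break frequency.

```python
import math

def quadratic_db(u, zeta, sign=-1):
    """Exact log magnitude of a quadratic factor per eq. (41).

    u is the normalized frequency w/w0; sign = -1 for a quadratic lag,
    +1 for a quadratic lead.
    """
    return sign * 20.0 * math.log10(math.hypot(1.0 - u * u, 2.0 * zeta * u))

def quadratic_asymptote_db(u, sign=-1):
    """Asymptotes: 0 db below the break (eq. 42), +-40 log u above it (eq. 43)."""
    return 0.0 if u <= 1.0 else sign * 40.0 * math.log10(u)

for u in (0.01, 0.1, 1.0, 10.0, 100.0):
    exact = quadratic_db(u, zeta=0.3)
    approx = quadratic_asymptote_db(u)
    print(f"w/w0 = {u:6.2f}  exact = {exact:8.3f} db  asymptote = {approx:8.3f} db")
```

At $\omega/\omega_0 = 1$ eq. (41) reduces to $\pm 20 \log 2\zeta$, so for a lightly damped lag the exact curve rises above the asymptotes there; this is the correction that must be read from the exact curves.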
When a quadratic factor is encountered, a first approximation is made by considering $\zeta = 1$, which means that a simple break of multiplicity 2 occurs at $\omega_0$. For more accurate work, the data in Figs. 44 and 45 must be used. In this case, $\zeta$ is calculated from the given quadratic factor and the requisite magnitude and phase information obtained from the corresponding $\zeta$ curves. The scale shown in Fig. 43 can be used to obtain phase of quadratic factors by proper use of the additional scales on the right and left sides. Since a graph must be kept for magnitude information, it seems logical to use a graph for phase information also.

It is possible to plot graphs of magnitude correction terms for simple and complex factors to an expanded scale to improve accuracy. Since the corresponding phase correction curves offer small expanded scale possibilities, an expanded phase plot is usually of little value.

The Transform of a Pure Time Delay $e^{-sT}$. This is of particular interest in many cases.

(44) $20 \log e^{-sT}\big|_{s = j\omega} = 20 \log 1 - j\omega T = 0 - j\omega T.$

Equation (44) shows that the magnitude is independent of frequency and the phase is linearly related to frequency. The magnitude and phase curves are shown in Fig. 46. This function falls within the limitations of the Nyquist criterion, and stability can be investigated in the usual fashion.

FIG. 46. Bode diagram of $e^{-sT}$: magnitude (decibels) and phase (degrees) versus $\omega T$.

Application of Bode Diagrams. EXAMPLE 1. Draw the Bode diagram of

$G(s)H(s) = \dfrac{316}{s(s + 10)}.$

First put the simple lag factor in nondimensional form:

$G(s)H(s) = \dfrac{31.6}{s[(s/10) + 1]}.$

Separate individual factors:

$G(s)H(s) = 31.6 \cdot \dfrac{1}{s} \cdot \dfrac{1}{(s/10) + 1}.$

The asymptotic approximate and the exact Bode diagrams of these individual factors are shown in Fig. 47.
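The factor-by-factor summation for Example 1 can also be checked numerically. The sketch below is an assumption-laden illustration, not the graphical procedure of the text: the function names and the bisection search are choices made here. It adds the decibel and phase contributions of the three factors and locates gain crossover on both the exact and the asymptotic magnitude curves; values read from a drawn diagram can be expected to differ slightly from the computed ones.

```python
import math

def factor_db_phase(w):
    """(db, degrees) contributions of 31.6, 1/s and 1/[(s/10)+1] at frequency w."""
    gain = (20.0 * math.log10(31.6), 0.0)
    integrator = (-20.0 * math.log10(w), -90.0)
    lag = (-10.0 * math.log10(1.0 + (w / 10.0) ** 2),
           -math.degrees(math.atan(w / 10.0)))
    return [gain, integrator, lag]

def composite(w):
    """Exact composite curve: the sum of the separate factor curves."""
    parts = factor_db_phase(w)
    return sum(p[0] for p in parts), sum(p[1] for p in parts)

def asymptotic_db(w):
    """Straight-line approximation: the simple lag breaks at w = 10."""
    lag = 0.0 if w <= 10.0 else -20.0 * math.log10(w / 10.0)
    return 20.0 * math.log10(31.6) - 20.0 * math.log10(w) + lag

def crossover(db_curve, lo=1.0, hi=1000.0, iterations=60):
    """Bisect on a log frequency axis for the 0-db crossing of a falling curve."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if db_curve(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

w_exact = crossover(lambda w: composite(w)[0])
w_approx = crossover(asymptotic_db)
print(f"exact gain crossover:      w = {w_exact:.1f} rad/sec, "
      f"phase = {composite(w_exact)[1]:.1f} degrees")
print(f"asymptotic gain crossover: w = {w_approx:.1f} rad/sec")
```

The bisection assumes the magnitude curve falls monotonically over the search interval, which holds for this transfer function because every factor has zero or negative slope.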
The composite $G(s)H(s)$, in heavy solid lines, is merely the summation of all the separate magnitude and phase curves as indicated by eq. (29). At gain crossover $\omega = 16$, $\theta = -147^\circ$ for the exact curve whereas $\omega = 18$ and $\theta = -150^\circ$ for the approximate curve. In most cases the approximate answers are sufficiently accurate because in practice the transfer functions represent average values and will not correspond exactly with delivered equipment. Also, as equipment wears in normal use, the transfer characteristics change. For these reasons the designer must usually provide a margin of safety and some adjustments which will permit small changes when required for improved system performance.

The system shown in Fig. 47 is stable for all values of loop gain because the phase angle approaches $-180^\circ$ asymptotically. The Nyquist plot of a similar transfer function is shown in Fig. 25. It is well to keep both representations in mind.

FIG. 47. Bode diagram of $G(s)H(s) = 31.6 \cdot (1/s) \cdot \{1/[(s/10) + 1]\}$: magnitude (decibels) and phase (degrees) versus $\omega$, with the exact and approximate gain crossovers marked.

EXAMPLE 2. Given the following loop transfer function, determine $K$ such that gain crossover occurs at $\theta = -135^\circ$ at $\omega$ greater than 100 rad/sec.

$G(s)H(s) = \dfrac{K(s + 80)}{(s^2 + 6s + 100)(s + 400)}.$

Again, by nondimensionalizing and separating factors

$G(s)H(s) = \dfrac{K \cdot 80}{100 \cdot 400} \cdot \dfrac{1}{(s/10)^2 + (0.6s/10) + 1} \cdot \left(\dfrac{s}{80} + 1\right) \cdot \dfrac{1}{(s/400) + 1}.$

The frequency response curves are shown in Fig. 48. The composite curves can be drawn without resort to drawing all the individual curves as done in Fig. 47.
Neglect the constant term and consider first the asymptote approximations to the separate factors. There is a quadratic lag break at 10, a simple lead break at 80, and a simple lag break at 400. The approximate curve is flat out to 10, breaks down to $-40$ db per decade at 10, breaks to $-20$ db per decade at 80 because of the simple lead, and then breaks back to $-40$ db per decade at 400 and continues on at this slope. The servomechanism scale shown in Fig. 43 is very useful in drawing these asymptote lines. The exact curve is drawn in by obtaining correction terms for the quadratic lag for $\zeta = 0.3$ from Fig. 44 and for the simple lead and simple lag from Fig. 42. The exact phase curve is drawn in by use of the servomechanism scale shown in Fig. 43 for the simple breaks and the phase curve for the quadratic lag by use of the phase curve for $\zeta = 0.3$ in Fig. 45. The arrowhead on the servomechanism scale is placed at the frequency where the phase is desired, and the phase contribution by the simple breaks is read at the break frequencies. The lead and lag terms contribute positive and negative phase angles respectively.

FIG. 48. Bode diagram of $G(s)H(s) = \dfrac{K}{500} \cdot \dfrac{(s/80) + 1}{[(s/10)^2 + (0.6s/10) + 1][(s/400) + 1]}$: magnitude (decibels) and phase (degrees) versus $\omega$.

To set the gain constant such that gain crossover occurs at $\theta = -135^\circ$ at $\omega$ greater than 100, the entire magnitude curve is shifted up until this occurs. Instead of shifting the magnitude curve up, it is simpler to shift the zero db line down. This corresponds to recalibration of the db axis. The required amount of 0-db line shift corresponds to $K/500$. To meet the requirements of the example $K/500 = 42$ db. $K$ therefore must equal 63,000.
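The result of Example 2 can be checked by direct evaluation of $G(j\omega)H(j\omega)$. The following sketch takes $K = 63{,}000$ from the example and, as an assumption of this illustration only, finds gain crossover by bisection rather than graphically; the function names are hypothetical. Because the text reads its values from drawn curves, the computed crossover phase should be expected to land near, not exactly on, $-135^\circ$.

```python
import cmath
import math

K = 63_000.0  # gain constant obtained graphically in Example 2

def loop(w, gain=K):
    """G(jw)H(jw) for K(s + 80) / [(s^2 + 6s + 100)(s + 400)]."""
    s = 1j * w
    return gain * (s + 80.0) / ((s * s + 6.0 * s + 100.0) * (s + 400.0))

def gain_crossover(lo=100.0, hi=1000.0, iterations=60):
    """Bisect for |G(jw)H(jw)| = 1, assuming the magnitude falls on [lo, hi]."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if abs(loop(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

w_c = gain_crossover()
theta = math.degrees(cmath.phase(loop(w_c)))
shift_db = 20.0 * math.log10(K / 500.0)

print(f"0-db line shift K/500 = {shift_db:.1f} db")
print(f"gain crossover at w = {w_c:.0f} rad/sec, phase = {theta:.1f} degrees")
print(f"phase margin = {180.0 + theta:.1f} degrees")
```

The search interval starts at 100 rad/sec because the example asks for the crossover above that frequency.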
Use of Bode Diagrams in Drawing Nyquist Plots. When system design is attempted by use of the Nyquist diagrams, it soon becomes apparent that the labor involved in drawing the diagrams is excessive. This comes about because design changes come in terms of multiplying factors which are laborious to incorporate because multiplication is a relatively complex process. The logarithm concept of the Bode diagrams reduces multiplication to the simple process of addition. It is advantageous first to plot a Bode diagram and transfer the values from this diagram to the polar plot when information is desired in such form.

FIG. 49. Reproduction of magnitude of Fig. 47.

The major stumbling block to this procedure is the conversion of decibels to gain numbers. It is possible to plot the Bode magnitude diagrams on log-log paper, as shown in Fig. 49, and thereby circumvent the use of decibels. Gain factors are clearly brought to view. This approach suffers from two major disadvantages. First, the plot of phase is still best accomplished on semilog paper; therefore separate scales would be required for the magnitude and phase curves. Use of decibels allows a single semilog paper to be used for both plots. Second, in adding the magnitudes of two factors, a pair of dividers or some such device would become necessary. Shift of the zero line is not as simple as it is with the decibel scale.

A more useful approach is to use a scale as shown in Fig. 50. The scale is transparent. Three possible scale factors on the decibel scale are available. The scale is placed vertically on the Bode magnitude diagram and values of gain read directly.

FIG. 50. Gain-decibel conversion scale.

Another approach is that of a graph as shown in Fig. 51. Values of decibels are read off the magnitude curve and the graph is used to convert decibels to gain numbers.

FIG. 51. Gain-decibel conversion chart (decibels versus gain numbers).

Relative Merits of the Bode Diagram Approach. This approach is by far the most generally used method in system design. Design modifications can be analyzed with a minimum of labor involved in drawing the diagrams. The approximate curves allow the designer to investigate a host of designs in a short time. The most promising approximate designs are then investigated more exactly. The host of curves provides an indication of how system performance will change in the presence of some nonlinearities in the system.
This approach is based on the limitations of the abbreviated Nyquist criterion, and when complex situations arise, it is best to revert to the Nyquist criterion or Routh criterion for exact stability evaluation. Bode diagrams can be used to draw the Nyquist diagrams.

6. ROOT LOCUS METHOD

This method (Refs. 11-13), developed by Evans, provides a means for obtaining the roots of the characteristic equation of a closed loop system, the values of which clearly indicate system stability. Essentially, the method assumes that a chosen complex number is a root of the characteristic equation and tests to see if it can be. If this test is favorable, one constant is changed to a value such that the complex number is a root of the equation. This constant is the loop gain of the closed loop system. The complex numbers which represent possible roots of the characteristic equation, when plotted in the $s$-plane and identified with the necessary corresponding loop gain, form curves which are the loci of the characteristic equation roots. These roots represent poles of the closed loop response which clearly indicate system stability and transient performance.

This plot in the $s$-plane provides a rapid evaluation of the effects of varying the gain in a system. It provides a graphical representation of the predominant features of a closed loop system, i.e., its poles, and when system behavior is inadequate, provides a clear indication of proper compensation.

For a system whose loop transfer function is given by

$G(s)H(s) = \dfrac{K}{s(T_2 s + 1)(T_4 s + 1)}$

the root locus plot is shown in Fig. 52.

FIG. 52. Root locus for $G(s)H(s) = \dfrac{K}{s(T_2 s + 1)(T_4 s + 1)}$.

This plot shows that as the loop gain, $K$, is increased, the closed loop poles move in the direction of the arrowheads. For all values of gain less than $K_c$, the closed loop poles lie in the left half $s$-plane, and the system is stable.
For all values of gain greater than $K_c$, the system is unstable. $K_c$ is a critical gain factor. Also shown by the plot is the fact that, as the gain varies from $K_1$ to $K_c$, the closed loop poles have a decreasing damping factor. Therefore, one can expect the transient response to be more oscillatory and to have a longer settling time as the gain is increased in this region.

Theory of Construction. Consider the general negative feedback system shown in Fig. 1. Assume that the feed forward and feedback transfer functions are composed of fractions of rational polynomials in $s$, i.e.,

(45) $G(s) = \dfrac{a_{m_1}s^{m_1} + a_{m_1-1}s^{m_1-1} + \cdots + a_1 s + a_0}{b_{n_1}s^{n_1} + b_{n_1-1}s^{n_1-1} + \cdots + b_1 s + b_0} = \dfrac{N_1(s)}{D_1(s)},$

(46) $H(s) = \dfrac{d_{m_2}s^{m_2} + d_{m_2-1}s^{m_2-1} + \cdots + d_1 s + d_0}{e_{n_2}s^{n_2} + e_{n_2-1}s^{n_2-1} + \cdots + e_1 s + e_0} = \dfrac{N_2(s)}{D_2(s)}.$

The closed loop response is

(47) $\dfrac{C(s)}{R(s)} = \dfrac{G(s)}{1 + G(s)H(s)} = \dfrac{N_1(s)D_2(s)}{D_1(s)D_2(s) + N_1(s)N_2(s)}.$

The root locus method obtains the roots of the fractional equation $[1 + G(s)H(s) = 0]$, which are the roots of the characteristic function $[D_1(s)D_2(s) + N_1(s)N_2(s)]$. It is of interest to note that the closed loop response has numerator factors (zeros) which are identical to the zeros of the feed forward transfer function and poles of the feedback transfer function, $N_1(s)$ and $D_2(s)$ respectively.

To find the closed loop poles, $1 + G(s)H(s) = 0$. Therefore

(48) $G(s)H(s) = -1 = 1\angle{\pm N\pi}, \quad N = 1, 3, 5, 7, \ldots$

For this identity to exist, $G(s)H(s)$ must lie along the negative real axis of the $G(s)H(s)$-plane. This constitutes the angle condition:

(49) $\angle G(s)H(s) = \pm N\pi, \quad N = 1, 3, 5, 7, \ldots$

Also, the magnitude of $G(s)H(s)$ must be unity. This constitutes the magnitude condition:

(50) $|G(s)H(s)| = 1.$

In general,

(51) $G(s)H(s) = \dfrac{K(s + s_1)(s + s_3)(s + s_5) \cdots (s + s_{2m-1})}{(s + s_2)(s + s_4)(s + s_6) \cdots (s + s_{2n})},$

where

(52) $\dfrac{K(s_1)(s_3)(s_5) \cdots (s_{2m-1})}{(s_2)(s_4)(s_6) \cdots (s_{2n})}$

represents the loop gain. Each factor in eq. (51) represents a vector in the $s$-plane as shown in Fig. 53.

FIG. 53. Vector representation of typical polynomial factors.

The angle condition, eq. (49), requires that

(53) $[\angle(s + s_1) + \angle(s + s_3) + \cdots + \angle(s + s_{2m-1})] - [\angle(s + s_2) + \angle(s + s_4) + \cdots + \angle(s + s_{2n})] = \pm N\pi$

or

(54) $\sum_{k=1}^{m} \angle(s + s_{2k-1}) - \sum_{i=1}^{n} \angle(s + s_{2i}) = \pm N\pi.$

This states that for an exploratory point, $s$, to lie on the root locus, the summation of the angles of the zeros minus the summation of the angles of the poles of the open loop response must be an odd multiple of $\pi$.

Procedure. The procedure is to plot the poles and zeros of $G(s)H(s)$ in the $s$-plane, choose an exploratory point, $s$, lay off the factor vectors [note that the factor vector arrowheads always lie on the exploratory point $s$ in Fig. 53], and sum the angles with the proper sense; if they add up to $\pm N\pi$, the point is on the root locus. If not, move the exploratory point over and repeat. This constitutes assuming that a chosen complex number is a root of the characteristic equation and testing to see if it can be, i.e., if it satisfies the angle condition.

When a point is located that does satisfy the angle condition, then the vector magnitudes are measured and values are substituted in the magnitude condition, eq. (50), to calibrate the constant $K$. [Note. $K$ determines the loop gain but is not itself the loop gain, which is defined by eq. (52).]

(55) $|K| = \dfrac{|s + s_2|\,|s + s_4| \cdots |s + s_{2n}|}{|s + s_1|\,|s + s_3| \cdots |s + s_{2m-1}|}.$

Repetition of the above steps should ascertain the complete locus. When the locus is completed, the actual $K$ of the given system is determined from $G(s)H(s)$. By reference to the root locus plot, the closed loop poles are then obtained by inspection.

General Theorems for Construction. At first glance, it appears as though a root locus may lie anywhere in the complex plane and to discover it may be a hit-or-miss proposition. Fortunately, the locus must take on certain definite patterns governed by the number and location of the open loop poles and zeros.
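The exploratory-point test of eqs. (49), (54), and (55) can be sketched in a few lines. The function names and the sample loop $G(s)H(s) = K/[s(s + 2)]$ are assumptions chosen for illustration. Roots are passed as their $s$-plane locations, so the factor vector $(s + s_k)$ of the text appears in the code as `s - root` with `root` equal to $-s_k$.

```python
import cmath
import math

def angle_sum(s, zeros, poles):
    """Eq. (54): sum of zero-vector angles minus sum of pole-vector angles."""
    return (sum(cmath.phase(s - z) for z in zeros)
            - sum(cmath.phase(s - p) for p in poles))

def on_locus(s, zeros, poles, tol=1e-6):
    """Angle condition, eq. (49): the net angle is an odd multiple of pi."""
    excess = math.remainder(angle_sum(s, zeros, poles) - math.pi, 2.0 * math.pi)
    return abs(excess) < tol

def calibrate_K(s, zeros, poles):
    """Magnitude condition, eq. (55): pole-vector lengths over zero-vector lengths."""
    numerator = math.prod(abs(s - p) for p in poles)
    denominator = math.prod(abs(s - z) for z in zeros)
    return numerator / denominator

# Hypothetical loop K / [s (s + 2)]: poles at 0 and -2, no zeros.
zeros, poles = [], [0.0, -2.0]
for point in (complex(-1.0, 1.0), complex(-0.5, 1.0)):
    if on_locus(point, zeros, poles):
        print(f"{point} is on the locus, K = {calibrate_K(point, zeros, poles):.3f}")
    else:
        print(f"{point} is not on the locus")
```

For this loop the characteristic equation is $s^2 + 2s + K = 0$, so the point $-1 + j1$ is a closed loop pole when $K = 2$, which agrees with the calibration returned by the magnitude condition.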
The following general theorems aid in ascertaining the approximate root locus.

THEOREM 1. Number of branches of the locus is equal to the number of closed loop poles. A branch is a separate portion of the root locus which has all values of loop gain on it. For a given loop gain only one pole may exist on one branch of the complete root locus plot. The number of branches is therefore equal to the degree of the characteristic equation in $s$ because this determines the total number of poles. Reference to Fig. 54 shows that for a given loop gain, $K_1$, there are four closed loop poles, one on each of the four branches labeled (1), (2), (3), (4). $K$ is the loop gain because all factors are in nondimensional form.

THEOREM 2. The locus starts at open loop poles or infinity ($K = 0$) and ends at open loop zeros or infinity ($K = \infty$). Inspection of the magnitude eq. (55) shows that at open loop poles $K$ is zero because one of the numerator magnitudes becomes zero. $K$ is infinite at open loop zeros because a zero magnitude term appears in the denominator. For the locus to start at infinity it is imperative that $G(s)H(s)$ have more zeros than poles, i.e., its numerator would be of higher degree than its denominator. Equation (55) shows that for this case and for $s$ approaching infinity, $K$ approaches zero. For the locus to end at infinity, it is imperative that $G(s)H(s)$ have more poles than zeros. In this case eq. (55) shows that for $s$ approaching infinity $K$ approaches infinity also. Figure 54 shows the case where the loop transfer function has four poles and one zero. The branches start at the poles for a loop gain of zero. As the loop gain increases to infinity, branch (2) goes along the real axis from the pole to the zero while the other three branches tend toward infinity.

THEOREM 3. For the locus to exist on the real axis, the sum of poles and zeros to the right of the exploratory point must be odd.
This is so because conjugate complex roots together contribute zero angle when the exploratory point is on the real axis. Only poles and zeros on the real axis to the right of the exploratory point contribute angle (180° each); therefore, the above conclusion. By again referring to Fig. 54 it is seen that where the locus exists on the real axis there are either five or three poles and zeros to the right.

FIG. 54. Root locus plot for $G(s)H(s) = \dfrac{K(T_1 s + 1)}{(T_2 s + 1)(T_4 s + 1)[(s^2/\omega_0^2) + 2\zeta(s/\omega_0) + 1]}$.

THEOREM 4. The locus is symmetrical with respect to the real axis. The characteristic equation is a rational polynomial in $s$ with real coefficients. Therefore, the roots, when complex, must occur as conjugate pairs. In Fig. 54, branch (3) is the image of branch (4) about the real axis.

THEOREM 5. The locus leaves an open loop pole or approaches an open loop zero in the direction given by $\pm N\pi$ minus the sum of angles of vectors from remaining poles and zeros to the pole or zero in question. Consider the exploratory point $s$ to be very close to an open loop pole. As $s$ circles the pole, the angle of the vector from the pole in question to $s$ changes greatly. The other vectors change direction only minutely. Therefore, since the angle contribution from all other poles and zeros is nearly fixed, the pole vector in question must contribute the amount necessary to satisfy the angle condition. Therefore, the direction of locus departure from an open loop pole is ascertained. A similar situation arises near an open loop zero.

FIG. 55. Construction theorems, $G(s)H(s) = \dfrac{K\left(\dfrac{s}{6} + 1\right)}{\left(\dfrac{s}{9} + 1\right)\left(\dfrac{s}{12} + 1\right)\left(\dfrac{s}{3 + j4} + 1\right)\left(\dfrac{s}{3 - j4} + 1\right)}$. Annotations on the plot: probable locus; asymptote lines at $[180^\circ/3 = 60^\circ]$ and $[3 \cdot 180^\circ/3 = 180^\circ]$; direction of departure $[180^\circ - 90^\circ + 54^\circ - 33^\circ - 24^\circ = 87^\circ]$.
Reference to Fig. 55 shows that the direction of departure of the locus from the upper complex pole is 87°.

THEOREM 6. The direction of locus asymptote lines is given by

$\dfrac{\pm N\pi}{n - m},$

$n$ = number of poles, $m$ = number of zeros.

When the exploratory point is extremely far from the cluster of open loop poles and zeros, they all contribute essentially the same amount of angle. Since these must add up to $\pm 180$ degrees or some odd multiple, the foregoing conclusion follows. In Fig. 55 it is seen that $\pm 60^\circ$ and $180^\circ$ are the directions of the asymptote lines. Part of the 180-degree line also happens to be a branch of the locus.

THEOREM 7. Asymptote lines cross real axis at

$\dfrac{\sum_{i=1}^{n} \sigma_i - \sum_{k=1}^{m} \sigma_k}{n - m},$

$\sigma_i$ = real part of $i$th pole, $\sigma_k$ = real part of $k$th zero.

This corresponds to the centroid of the pole-zero cluster. In Fig. 55 the asymptote lines cross the axis at $-7$.

Practical Considerations. If the loop transfer function has many poles and zeros, some of which are located relatively far from the main cluster and from the $j\omega$-axis, a first order approximation can be made to the exact locus by omitting the distant poles and zeros. This procedure requires good engineering judgment. The advantage lies in quicker ascertainment of the important part of the locus which can be drawn to an expanded scale.

The procedure to be used in drawing a root locus is to plot the poles and zeros of the open loop response. From the preceding generalizations, sketch in the loci. Graphically determine the exact loci. With the open loop gain constant pick off the closed loop poles. The closed loop response is then made up of the poles obtained from the plot and the zeros and multiplying constant from inspection of $G(s)$ and $H(s)$ per eq. (47).

Multiple loops are handled by first reducing the minor loops to transfer functions in factored form.
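Theorems 5, 6, and 7 reduce to short calculations, sketched below. The pole-zero configuration used here is an assumption: one zero at $-6$ and poles at $-9$, $-12$, and $-3 \pm j4$, inferred from the angles annotated in Fig. 55 and from the stated centroid of $-7$ rather than quoted from the text. The function names are likewise hypothetical. Since the figure's 87° comes from graphically measured angles, a computed departure angle can be expected to differ from it by a degree or two.

```python
import cmath
import math

# Assumed open loop configuration (inferred from the Fig. 55 annotations).
zeros = [-6.0]
poles = [-9.0, -12.0, complex(-3.0, 4.0), complex(-3.0, -4.0)]

def asymptote_angles(n, m):
    """Theorem 6: N*180/(n - m) degrees for N = 1, 3, 5, ..., one per asymptote."""
    return [(2 * k + 1) * 180.0 / (n - m) for k in range(n - m)]

def asymptote_centroid(zeros, poles):
    """Theorem 7: (sum of pole real parts - sum of zero real parts) / (n - m)."""
    pole_sum = sum(complex(p).real for p in poles)
    zero_sum = sum(complex(z).real for z in zeros)
    return (pole_sum - zero_sum) / (len(poles) - len(zeros))

def departure_angle(pole, zeros, poles):
    """Theorem 5: 180 deg plus zero-vector angles minus other pole-vector angles."""
    others = [p for p in poles if p != pole]
    total = (math.pi
             + sum(cmath.phase(pole - z) for z in zeros)
             - sum(cmath.phase(pole - p) for p in others))
    return math.degrees(math.remainder(total, 2.0 * math.pi))

angles = asymptote_angles(len(poles), len(zeros))
centroid = asymptote_centroid(zeros, poles)
departure = departure_angle(complex(-3.0, 4.0), zeros, poles)

print("asymptote directions (degrees):", angles)  # 300 degrees is the -60 degree line
print(f"asymptotes cross the real axis at {centroid:.1f}")
print(f"departure from the upper complex pole: {departure:.1f} degrees")
```

The departure angle follows from the angle condition by treating the vector from the pole in question as the single unknown, which is the same bookkeeping as the bracketed sum shown on the figure.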
It is of interest to remember here that the root locus method is a graphical procedure of factoring the characteristic equation of a system. The minor loop transfer functions are then included as blocks in the major loop and the major loop root locus is then plotted.

Donahue's Analytical Procedure to Calculate the Root Loci. A relatively simple analytical means of plotting a root locus has been developed (Ref. 34). This method determines a point on the locus by shifting the $j\omega$-axis a given distance, $\sigma_1$, and then calculating the frequency, $\omega_1$, at which the locus crosses the $j\omega$-axis in the $s$-plane. The requisite loop gain is then calculated from $\omega_1$. Successive points on the locus are obtained by successive shifting of the $j\omega$-axis. Tables 1 and 2 have been derived by Donahue to aid in the calculations (Ref. 34).

Referring to the general single loop negative feedback system of Fig. 1, let

(56) $G(s)H(s) = \dfrac{K(R_N + I_N)}{R_D + I_D},$

where $R_N$ is the real part of the numerator (even powers of $s$) and $I_N$ is the imaginary part of the numerator (odd powers of $s$). The denominator follows similar notation. For a point to lie on the root locus, from eq. (48),

(57) $\dfrac{K(R_N + I_N)}{R_D + I_D} = -1.$

Therefore

(58) $K R_N + K I_N = -R_D - I_D.$

Equating real part to real part and imaginary to imaginary

(59) $K = -\dfrac{R_D}{R_N}$

and

(60) $K = -\dfrac{I_D}{I_N}.$

Equations (59) and (60) provide means of solution for frequency and gain at the root locus crossing of the $j\omega$-axis.

EXAMPLE. Let

(61) $G(s)H(s) = \dfrac{K(s + h_0)}{s^2 + r_1 s + r_0},$

where from the preceding

(62) $R_N = h_0, \quad I_N = s, \quad R_D = s^2 + r_0, \quad I_D = r_1 s.$

Substitution in eqs. (59) and (60) gives

(63) $K = -\dfrac{s^2 + r_0}{h_0},$

(64) $K = -r_1.$

TABLE 1. AID FOR ROOT LOCUS CALCULATION BY DONAHUE PROCEDURE

Parameters $a_0, \ldots, a_5$ and $b_0, b_1, b_2$ for each general form; parameters not listed for a row are zero.

$(m,n) = (1,2)$: $\dfrac{K(s + h_0)}{s^2 + r_1 s + r_0}$. $a_0 = r_0 + r_1\sigma + \sigma^2$; $a_1 = r_1 + 2\sigma$; $b_0 = h_0 + \sigma$.

$(2,2)$: $\dfrac{K(s^2 + h_1 s + h_0)}{s^2 + r_1 s + r_0}$. $a_0$, $a_1$ as for (1,2); $b_0 = h_0 + h_1\sigma + \sigma^2$; $b_1 = h_1 + 2\sigma$.

$(0,3)$: $\dfrac{K}{s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0 = r_0 + r_1\sigma + r_2\sigma^2 + \sigma^3$; $a_1 = r_1 + 2r_2\sigma + 3\sigma^2$; $a_2 = r_2 + 3\sigma$.

$(1,3)$: $\dfrac{K(s + h_0)}{s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0$, $a_1$, $a_2$ as for (0,3); $b_0 = h_0 + \sigma$.

$(2,3)$: $\dfrac{K(s^2 + h_1 s + h_0)}{s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0$, $a_1$, $a_2$ as for (0,3); $b_0 = h_0 + h_1\sigma + \sigma^2$; $b_1 = h_1 + 2\sigma$.

$(3,3)$: $\dfrac{K(s^3 + h_2 s^2 + h_1 s + h_0)}{s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0$, $a_1$, $a_2$ as for (0,3); $b_0 = h_0 + h_1\sigma + h_2\sigma^2 + \sigma^3$; $b_1 = h_1 + 2h_2\sigma + 3\sigma^2$; $b_2 = h_2 + 3\sigma$.

$(0,4)$: $\dfrac{K}{s^4 + r_3 s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0 = r_0 + r_1\sigma + r_2\sigma^2 + r_3\sigma^3 + \sigma^4$; $a_1 = r_1 + 2r_2\sigma + 3r_3\sigma^2 + 4\sigma^3$; $a_2 = r_2 + 3r_3\sigma + 6\sigma^2$; $a_3 = r_3 + 4\sigma$.

$(1,4)$: $\dfrac{K(s + h_0)}{s^4 + r_3 s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0, \ldots, a_3$ as for (0,4); $b_0 = h_0 + \sigma$.

$(2,4)$: $\dfrac{K(s^2 + h_1 s + h_0)}{s^4 + r_3 s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0, \ldots, a_3$ as for (0,4); $b_0 = h_0 + h_1\sigma + \sigma^2$; $b_1 = h_1 + 2\sigma$.

$(0,5)$: $\dfrac{K}{s^5 + r_4 s^4 + r_3 s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0 = r_0 + r_1\sigma + r_2\sigma^2 + r_3\sigma^3 + r_4\sigma^4 + \sigma^5$; $a_1 = r_1 + 2r_2\sigma + 3r_3\sigma^2 + 4r_4\sigma^3 + 5\sigma^4$; $a_2 = r_2 + 3r_3\sigma + 6r_4\sigma^2 + 10\sigma^3$; $a_3 = r_3 + 4r_4\sigma + 10\sigma^2$; $a_4 = r_4 + 5\sigma$.

$(1,5)$: $\dfrac{K(s + h_0)}{s^5 + r_4 s^4 + r_3 s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0, \ldots, a_4$ as for (0,5); $b_0 = h_0 + \sigma$.

$(0,6)$: $\dfrac{K}{s^6 + r_5 s^5 + r_4 s^4 + r_3 s^3 + r_2 s^2 + r_1 s + r_0}$. $a_0 = r_0 + r_1\sigma + r_2\sigma^2 + r_3\sigma^3 + r_4\sigma^4 + r_5\sigma^5 + \sigma^6$; $a_1 = r_1 + 2r_2\sigma + 3r_3\sigma^2 + 4r_4\sigma^3 + 5r_5\sigma^4 + 6\sigma^5$; $a_2 = r_2 + 3r_3\sigma + 6r_4\sigma^2 + 10r_5\sigma^3 + 15\sigma^4$; $a_3 = r_3 + 4r_4\sigma + 10r_5\sigma^2 + 20\sigma^3$; $a_4 = r_4 + 5r_5\sigma + 15\sigma^2$; $a_5 = r_5 + 6\sigma$.

$(m,n)$: $\dfrac{K\left[\sum_{i=0}^{m} h_i s^i\right]}{\left[\sum_{i=0}^{n} r_i s^i\right]}$. $a_0 = \sum_{i=0}^{n} r_i\sigma^i$; $a_1 = \sum_{i=1}^{n} i\,r_i\sigma^{i-1}$; $a_2 = \sum_{i=2}^{n} \binom{i}{2} r_i\sigma^{i-2}$; $b_0 = \sum_{i=0}^{m} h_i\sigma^i$; $b_1 = \sum_{i=1}^{m} i\,h_i\sigma^{i-1}$; $b_2 = \sum_{i=2}^{m} \binom{i}{2} h_i\sigma^{i-2}$.

In eqs. (63) and (64) let $s = j\omega$:

(65) $K = \dfrac{\omega^2 - r_0}{h_0},$

(66) $\omega^2 = r_0 - r_1 h_0.$

Equations (65) and (66) define the gain and frequency at which the root locus crosses the imaginary axis. To calculate other points on the locus, shift the $j\omega$-axis by replacing $s$ with $(s + \sigma)$:

(67) $G(s + \sigma)H(s + \sigma) = \dfrac{K(s + \sigma + h_0)}{(s + \sigma)^2 + r_1(s + \sigma) + r_0},$

which reduces to

(68) $G(s + \sigma)H(s + \sigma) = \dfrac{K(s + b_0)}{s^2 + a_1 s + a_0},$

where

$a_0 = r_0 + r_1\sigma + \sigma^2, \quad a_1 = r_1 + 2\sigma, \quad b_0 = h_0 + \sigma.$

By analogy to eqs. (62), (65), and (66)

(69) $-K = \dfrac{1}{b_0}(a_0 - \omega^2),$

(70) $\omega^2 = a_0 - a_1 b_0.$

Tables for Donahue Procedure. This example gives rise to the first row of Tables 1 and 2. For $\sigma = 0$ eqs. (69) and (70) reduce to eqs. (65) and (66). Table 1 has the parameters $a_0, a_1, a_2$, etc., and $b_0, b_1, b_2$, etc., determined in terms of the original numerator and denominator power series coefficients for each of several types of loop transfer functions. Table 2 gives $\omega^2$ and $-K$ in terms of these $a$ and $b$ parameters.

The procedure therefore consists of reducing $G(s)H(s)$ to a fraction of two power series, identifying this with the proper row in Table 1, substituting in values of $\sigma$, which lead to calculation of the parameters $a$ and $b$ and subsequent solution of $\omega^2$ and $-K$ per Table 2. $\sigma_1$, $\omega_1$, and $K_1$ provide a point on the root locus. The occasion may arise that for a given $\sigma$, there may exist no real $\omega$ or positive $K$. This merely signifies that no locus exists in this portion of the $s$-plane.

It will be noted that the last line in Table 1 has the general equation which can be used to evaluate the $a$'s and $b$'s for additional transfer functions. But since the corresponding general equations for the $K$ and $\omega^2$ are missing, the above serves more as a check on new derivations than as a means of avoiding work.

TABLE 2. AID FOR ROOT LOCUS CALCULATION BY DONAHUE PROCEDURE

$(m,n) = (1,2)$: $\omega^2 = a_0 - b_0 a_1$; $-K = \dfrac{1}{b_0}[a_0 - \omega^2]$.

$(2,2)$: $\omega^2 = \dfrac{b_0 a_1 - b_1 a_0}{a_1 - b_1}$; $-K = \dfrac{a_0 - \omega^2}{b_0 - \omega^2}$.

$(0,3)$: $\omega^2 = a_1$; $-K = a_0 - a_2\omega^2$.

$(1,3)$: $\omega^2 = \dfrac{b_0 a_1 - a_0}{b_0 - a_2}$; $-K = \dfrac{1}{b_0}[a_0 - a_2\omega^2]$.

$(2,3)$: $\omega^2 = \tfrac{1}{2}\left[(b_0 + a_1 - b_1 a_2) \pm \sqrt{(b_0 + a_1 - b_1 a_2)^2 - 4(b_0 a_1 - b_1 a_0)}\right]$; $-K = \dfrac{a_0 - a_2\omega^2}{b_0 - \omega^2}$.

$(3,3)$: $\omega^2 = \dfrac{1}{2(b_2 - a_2)}\left[(b_2 a_1 + b_0 - a_0 - b_1 a_2) \pm \sqrt{(b_2 a_1 + b_0 - a_0 - b_1 a_2)^2 - 4(b_2 - a_2)(b_0 a_1 - b_1 a_0)}\right]$; $-K = \dfrac{a_0 - a_2\omega^2}{b_0 - b_2\omega^2}$.

$(0,4)$: $\omega^2 = \dfrac{a_1}{a_3}$; $-K = \omega^4 - a_2\omega^2 + a_0$.

$(1,4)$: $\omega^2 = \tfrac{1}{2}\left[(a_2 - b_0 a_3) \pm \sqrt{(b_0 a_3 - a_2)^2 + 4(b_0 a_1 - a_0)}\right]$; $-K = \dfrac{1}{b_0}[\omega^4 - a_2\omega^2 + a_0]$.

$(2,4)$: $\omega^2 = \dfrac{1}{2(a_3 - b_1)}\left[(b_0 a_3 + a_1 - b_1 a_2) \pm \sqrt{(b_0 a_3 + a_1 - b_1 a_2)^2 - 4(a_3 - b_1)(b_0 a_1 - b_1 a_0)}\right]$; $-K = \dfrac{\omega^4 - a_2\omega^2 + a_0}{b_0 - \omega^2}$.

$(0,5)$: $\omega^2 = \tfrac{1}{2}\left[a_3 \pm \sqrt{a_3^2 - 4a_1}\right]$; $-K = a_4\omega^4 - a_2\omega^2 + a_0$.

$(1,5)$: $\omega^2 = \dfrac{1}{2(b_0 - a_4)}\left[(b_0 a_3 - a_2) \pm \sqrt{(b_0 a_3 - a_2)^2 - 4(b_0 - a_4)(b_0 a_1 - a_0)}\right]$; $-K = \dfrac{1}{b_0}[a_4\omega^4 - a_2\omega^2 + a_0]$.

$(0,6)$: $\omega^2 = \dfrac{1}{2a_5}\left[a_3 \pm \sqrt{a_3^2 - 4a_1 a_5}\right]$; $-K = -\omega^6 + a_4\omega^4 - a_2\omega^2 + a_0$.

Construction Aids

From the discussion in the preceding subsections, it may be inferred that locating points that satisfy the angle condition is a time-consuming procedure. To aid in this respect, a simple device can be constructed which mechanically sums the vector angles.

Mechanical Angle Summer. (See Fig. 56.) This device is made of clear plastic. The arm rotates on the disk with a slight drag. To use, place the pin point at an exploratory point $s$ with arm pointing horizontally to the left and the zero degree arrowhead aligned under the arm centerline.

FIG. 56. Mechanical angle summer (side view: arm, pin head, disk, pin point).

To sum angles of pole vectors, hold the disk in place and rotate arm centerline to a pole root. Release disk and return arm to neutral. Friction causes disk to rotate with arm. To sum angles of zero vectors, reverse the order of disk rotation. Rotate arm centerline to a zero root (disk free to rotate). Hold disk and return arm to neutral.
When all roots have been successively accounted for and the arm has been returned to the neutral position, the 180-degree arrowhead should lie under the arm centerline for a point to be on the root locus.

The Spirule shown in Fig. 57 (Ref. 13) performs the above operation plus the additional feature of calibrating the locus. A logarithmic spiral curve on the arm permits the logarithm of a length to be obtained as an angle, so that the addition of such angles corresponds to adding logarithms. The root locus is calibrated rather simply with this addition.

FIG. 57. Spirule. (Developed by W. Evans. Available from the Spirule Company, Whittier, Calif.)

Conductive Paper Disk. (See Figs. 58 and 59.) To minimize further the labor involved, and therefore make the root locus method more useful, machines have been devised which perform the necessary operations automatically. One such machine uses the fundamental idea that the electric potential developed on a conducting paper disk could represent angles or logarithms of lengths. This principle has been tried with success (Ref. 35).

FIG. 58. Angle measurement on a conductive paper disk.

FIG. 59. Magnitude measurement on a conductive paper disk.

A Mechanical Plotting Machine. This machine, described in Ref. 14, is a simple mechanical instrument which sums angles of vectors by using the principle that torque developed by a rotational spring is proportional to the angle of rotation of the spring. The machine is simple in construction, portable, and requires no auxiliary power.

A Compact Analog Machine. Described in Ref. 15 and called the "Complex Plane Analyzer," this machine can, among other functions, be used to obtain a root locus plot. The principle involved is that of reducing vector multiplication to two independent summations of phase and log magnitude.
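The two summations that these aids perform, one of vector angles and one of logarithms of vector lengths, can be sketched numerically. The following is a minimal illustration, not a description of any of the machines: the function names and the example loop function $K/[s(s+2)]$ are assumptions chosen only to show the bookkeeping. A trial point is on the locus when the summed phase is an odd multiple of 180 degrees, and the gain that calibrates the point follows from the summed log magnitude.

```python
import cmath
import math


def phase_and_log_magnitude(s, zeros, poles):
    """Return (phase in degrees, ln|G(s)H(s)/K|) from two independent sums.

    Zero vectors add their angle and log length; pole vectors subtract them.
    """
    phase = (sum(cmath.phase(s - z) for z in zeros)
             - sum(cmath.phase(s - p) for p in poles))
    log_mag = (sum(math.log(abs(s - z)) for z in zeros)
               - sum(math.log(abs(s - p)) for p in poles))
    return math.degrees(phase), log_mag


def on_locus(phase_deg, tol_deg=1e-6):
    """Angle condition: total phase is an odd multiple of 180 degrees."""
    return abs((phase_deg % 360.0) - 180.0) < tol_deg


# Hypothetical example: G(s)H(s) = K / [s (s + 2)], trial point s = -1 + j1.
poles = [0.0, -2.0]
zeros = []
trial_point = complex(-1.0, 1.0)

phase_deg, log_mag = phase_and_log_magnitude(trial_point, zeros, poles)
# Magnitude condition |G(s)H(s)| = 1 gives K = exp(-ln|G(s)H(s)/K|).
gain_K = math.exp(-log_mag)

print("phase (deg):", phase_deg)
print("on locus:", on_locus(phase_deg))
print("calibrated K:", gain_K)
```

For this assumed example the characteristic equation is $s^2 + 2s + K = 0$, so the point $-1 + j1$ should lie on the locus at $K = 2$, which is a convenient hand check on the two sums.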
To this end, a logarithmic potentiometer is used to measure magnitude and a linear one measures phase. Capacitors are individually charged with voltages representing these quantities. Summation of capacitor voltages produces the required overall products and quotients. The machine is simple and rugged, can also be used to plot phase loci, and is available commercially.

Some Common Root Loci

The following plots (Figs. 60-91) are presented to aid in checking some of the preceding theorems, to present some general loci, and to show in general how redistribution or variation in number of poles and zeros affects the plot.

FIG. 60. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)}{(s+s_2)(s+s_4)}$.

FIG. 61. Root locus plot for $G(s)H(s)$ with one real zero and denominator $(s+s_2)(s+s_4)$.

FIG. 62. Root locus plot for $G(s)H(s)$ with denominator $(s+\alpha+j\beta)(s+\alpha-j\beta)$.

FIG. 63. Root locus plot for $G(s)H(s)$ with denominator $(s+\alpha+j\beta)(s+\alpha-j\beta)$.

FIG. 64. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)}{(s+s_2)(s+s_3)}$.

FIG. 65. Root locus plot for $G(s)H(s) = \dfrac{K}{(s+s_1)(s+s_2)(s+s_3)}$.

FIG. 66. Root locus plot for $G(s)H(s) = \dfrac{K}{(s+s_1)(s+\alpha+j\beta)(s+\alpha-j\beta)}$.

FIG. 67. Root locus plot for $G(s)H(s) = \dfrac{K}{(s+s_1)(s+\alpha+j\beta)(s+\alpha-j\beta)}$.

FIG. 68. Root locus plot for $G(s)H(s)$ with denominator factors $(s+\alpha+j\beta)(s+\alpha-j\beta)(s+s_1)$.

FIG. 69. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)}{(s+s_2)(s+s_4)(s+s_6)}$.

FIG. 70. Root locus plot for $G(s)H(s)$.

FIG. 71. Root locus plot for $G(s)H(s)$.

FIG. 72. Root locus plot for $G(s)H(s)$ with denominator factors $(s+\alpha+j\beta)(s+\alpha-j\beta)(s+s_2)$.

FIG. 73. Root locus plot for $G(s)H(s)$.

FIG. 74.
Root locus plot for $G(s)H(s)$.

FIG. 75. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)}{(s+s_2)(s+s_4)(s+s_6)}$.

FIG. 76. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)(s+s_3)}{s(s+s_2)(s+s_4)}$.

FIG. 77. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)(s+s_3)}{s(s+s_2)(s+s_4)}$.

FIG. 78. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)(s+s_3)}{s(s+s_2)(s+s_4)}$.

FIG. 79. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)(s+s_3)}{s(s+s_2)(s+s_4)}$.

FIG. 80. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)(s+s_3)}{(s+s_2)^3}$.

FIG. 81. Root locus plot for $G(s)H(s) = \dfrac{K(s+s_1)(s+s_3)}{s(s+s_2)(s+s_4)}$.

FIG. 82. Root locus plot for $G(s)H(s) = \dfrac{K}{(s+s_2)(s+s_4)(s+s_6)(s+s_8)}$.

FIG. 83. Root locus plot for $G(s)H(s) = \dfrac{K}{s(s+s_2)(s+\alpha+j\beta)(s+\alpha-j\beta)}$.

FIG. 84. Root locus plot for $G(s)H(s) = \dfrac{K}{(s+s_2)(s+s_4)(s+\alpha+j\beta)(s+\alpha-j\beta)}$.

FIG. 85. Root locus plot for $G(s)H(s) = \dfrac{K}{s(s+s_2)(s+\alpha+j\beta)(s+\alpha-j\beta)}$.

FIG. 86. Root locus plot for $G(s)H(s) = \dfrac{K}{(s+\alpha_1+j\beta_1)(s+\alpha_1-j\beta_1)(s+\alpha_2+j\beta_2)(s+\alpha_2-j\beta_2)}$.

FIG. 87. Root locus plot for $G(s)H(s)$ with denominator $s(s+s_2)(s+\alpha+j\beta)(s+\alpha-j\beta)$.

FIG. 88. Root locus plot for $G(s)H(s)$ with denominator $s(s+s_2)(s+\alpha+j\beta)(s+\alpha-j\beta)$.

FIG. 89. Root locus plot for $G(s)H(s)$ with denominator $s^2(s+\alpha+j\beta)(s+\alpha-j\beta)$.

FIG. 90. Root locus plot for $G(s)H(s) = Ke^{-Ts}$.

FIG. 91. Root locus plot for $G(s)H(s) = \dfrac{Ke^{-Ts}}{(s+s_2)}$.

Interpretation of Results

The root locus plot provides a pictorial representation of the roots of the characteristic equation of the closed loop response. The location of these roots determines the modes of the transient response. Figure 92 shows contours of constant characteristics of these modes.

FIG. 92.
Contours of constant characteristics of transient response modes.

The $j\omega$-axis defines the limit of absolute stability. For the system to be absolutely stable, all roots must lie in the left half s-plane. Circles concentric with the origin correspond to loci of roots with constant undamped natural frequency. Therefore, for a system prescribed to have a maximum natural frequency mode, all roots must lie within the corresponding prescribed circle. Lines of constant imaginary part correspond to lines of constant damped natural frequency. For prescribed maximum damped natural frequency, all roots must lie within the area bounded by the corresponding prescribed lines of constant imaginary part. Lines of constant real part correspond to lines of constant response time or constant exponential decay factor ($-\zeta\omega_0$). Again, for prescribed maximum individual response time, all roots must lie to the left of the corresponding line of constant negative real part. Radial lines passing through the origin correspond to lines of constant damping ratio ($\zeta$). For prescribed minimum damping ratio, all roots must lie within the area bounded by the corresponding minimum damping ratio lines encompassing the negative real axis. Note that lines of zero damping factor, infinite response time, and absolute stability are the same.

Combined restraints may be imposed on the modes of the transient response by reducing the area of the root location to that area common to the individual areas. For example, with specified maximum response time, maximum damped natural frequency, and minimum damping ratio, the roots would have to lie within the cross-hatched area of Fig. 93.

FIG. 93. Location of roots for combined restraints.

Multiloop Systems Analysis

For multiloop systems, the procedure is to reduce the individual inner loops to transfer functions in factored form by use of minor loop root loci. The major root locus is then drawn up as a single loop.
A particular advantage of the root locus method of analysis is that, when changes are made in the minor loops, the effect on the overall loop is shown directly.

For example, consider the closed loop voltage regulating system shown in Fig. 94. When a load is imposed upon the system, the gains change because of nonlinear behavior. The major loop gain decreases whereas the minor loop gain increases. The minor loop root locus is shown in Fig. 95. The major loop root locus is shown in Fig. 96. This figure reveals that the net effect on the overall system of imposition of full load is that the dominant pole pair (those complex roots closest to the origin) shifts to a lower frequency with a slightly higher damping ratio, whereas the subdominant pole pair (those complex roots furthest from the origin) shifts to a higher frequency with a lower damping ratio. The conclusion here is that imposition or removal of load does not severely affect the system stability or performance.

FIG. 94. Multiloop voltage regulating system.

FIG. 95. Minor loop root locus plot, $G(s)H(s) = \dfrac{K_2 s}{(T_6 s + 1)(T_8 s + 1)(T_{12} s + 1)(T_{14} s + 1)}$. $K_2$: no load minor loop gain; $K_2'$: full load minor loop gain.

FIG. 96. Major loop root locus plot. $K_5$: no load major loop gain; $K_5'$: full load major loop gain.

In multiloop systems, desired performance of the overall loop can sometimes be achieved by use of unstable minor loops. In these instances, it must be remembered that if a failure can occur such that the remaining system releases large amounts of uncontrolled energy, the design should be critically reviewed. In practice, systems are usually designed with stable inner loops.

System Design

By nature, synthesis is more complicated than analysis. A few general observations can be made with regard to reshaping the root locus to obtain the required root locations.
Inspection of the plots shown in Figs. 60 to 91 shows that poles tend to repel the locus whereas zeros tend to attract it. Also, as the difference between the number of poles and zeros increases, the locus tends to shift toward the right half s-plane. System synthesis through use of the root locus technique amounts to proper placement of the closed loop poles and zeros. The process is by no means simple, but by use of some of the previously mentioned machinery, a large amount of the labor is circumvented. For details of design in terms of root loci, see Chap. 23 and Ref. 16.

Relative Merits of Root Locus Method

This method is theoretically exact and places in evidence the salient features of a closed loop system. Drawing the locus may involve a slight amount of work, but excessive labor is circumvented by use of mechanical aids. Major advantages of this method are:

a. The behavior pattern of the entire closed loop can be shown in one simple diagram.
b. Modes of the transient response are placed directly in evidence.
c. Effects of variations in system parameters are placed directly in evidence.

This is a relatively new method and is gaining widespread use.

7. MISCELLANEOUS STABILITY CRITERIA

There are many methods of performing stability analysis of linear systems. The following is a brief account of some methods not discussed in previous sections. The interested reader can consult the references for theory and details of operation.

Hurwitz Criterion (Refs. 17, 18). This criterion is similar in nature to the Routh criterion and involves use of determinants. It is in general more laborious than the Routh criterion and offers information only with regard to whether or not all roots of the characteristic equation lie in the left half s-plane. This method has been used to advantage in deriving other stability criteria such as stability boundary diagrams.

Dzung's Criterion (Refs. 19, 20).
This stability criterion is very similar to the Nyquist criterion except that it avoids the necessity of determining the location of poles of $G(s)H(s)$ on the $j\omega$-axis. It offers particular advantage when $G(s)H(s)$ is not known in factored form and Routh's criterion indicates poles of $G(s)H(s)$ on the $j\omega$-axis.

Wall's Criterion (Ref. 21). This stability criterion is similar to the Routh criterion and in many cases the computations are somewhat simpler.

Stability Boundary Theory (Refs. 22, 23, 24). This method is convenient in that some simple arithmetic calculations are made by using the coefficients of the characteristic function, the results are plotted on given charts, and stability is ascertained by inspection. The main disadvantage lies in the large number of charts required for higher order systems.

Stability Plus Assurance of Margin of Stability (Refs. 25, 26, 27). By the substitution $s' = (s + a)$ and/or $s' = se^{j\theta}$ in the characteristic equation, which corresponds to shift and/or rotation of the axes in the s-plane, and subsequent analysis of the resulting equation, stability plus assurance of a margin of stability can be ascertained. The substitution may result in an equation with complex coefficients. Analysis may be carried out by use of any of the following.

Nyquist and Dzung Criteria. These criteria are general in nature and are applicable.

Analog of Hurwitz (Refs. 28, 29), Wall (Ref. 4), Routh Criteria (Ref. 4). These criteria are similar to the criteria as described previously.

Leonhard's Criterion (Ref. 30). This stability criterion is similar to the Nyquist criterion.

Analog Computer Approach (Ref. 33). By simulating the equations which describe the physical equipment's behavior, it is possible to study system stability and performance characteristics. An entire part in Vol. 2 is devoted to analog computers.

8. CLOSED LOOP RESPONSE FROM OPEN LOOP RESPONSE

As shown by eq.
(1), the closed loop response of the general negative feedback system is a function of the forward and feedback transfer functions. Block diagram manipulation of the diagram in Fig. 1 leads to that shown in Fig. 97. That portion of the system shown in the dashed rectangle is a unity feedback system whose closed loop response is given by

(71)  $\dfrac{G(s)H(s)}{1 + G(s)H(s)}$.

For any value of $G(s)H(s)$, the closed loop response can be considered a vector given by

(72)  $\dfrac{G(s)H(s)}{1 + G(s)H(s)} = Me^{j\alpha}$,

where $M$ is the magnitude of the vector and $\alpha$ is its direction in radians.

FIG. 97. General negative feedback system.

Contours of Constant M and α. It can be shown (Ref. 31) that for unity feedback systems, certain curves in the complex plane correspond to loci of constant $M$ and constant $\alpha$. For the direct polar plot of $G(s)H(s)$, the $M$ loci are circles with

Radius $= \left|\dfrac{M}{M^2 - 1}\right|$ and center at $-\dfrac{M^2}{M^2 - 1}$

on the axis of reals. The curves are shown in Fig. 98. The $\alpha$ loci correspond to circles passing through the origin and the $-1$ point, with centers at

$-\dfrac{1}{2} + j\,\dfrac{1}{2\tan\alpha}$.

These $\alpha$ curves are shown in Fig. 99.

These curves of constant $M$ and $\alpha$ are useful for many purposes. An important use is that of obtaining the closed loop response from a plot of the open loop response, $G(s)H(s)$. $G(s)H(s)$ is superimposed on curves of constant $M$ and constant $\alpha$, and the closed loop magnitude and phase angles are obtained by inspection of the respective circles at points of intersection with $G(s)H(s)$.

For the inverse polar plot of $G(s)H(s)$ it can be shown (Ref. 31) that the contours of constant $M$ are circles with center at the $-1$ point and radius equal to $1/M$. See Fig. 100. The contours of constant $\alpha$ are straight lines passing through the $-1$ point, inclined to the real axis at an angle set by $\alpha$: since $1 + 1/[G(s)H(s)] = (1/M)e^{-j\alpha}$, the vector from the $-1$ point to a point on the inverse plot has angle $-\alpha$. The $M$ and $\alpha$ contours in the inverse $G(s)H(s)$-plane plot are somewhat easier to use because of ease of construction.

FIG. 98.
Circles of constant $M$ in the $G(s)H(s)$-plane.

FIG. 99. Circles of constant $\alpha$ in the $G(s)H(s)$-plane.

It is of interest to note that the $M$ and $\alpha$ contours are perfectly general curves for unity feedback systems. In other words, $G(s)H(s)$ is not restricted to those values of $s$ on the $j\omega$-axis.

FIG. 100. Contours of constant $M$ and $\alpha$ in the inverse $G(s)H(s)$-plane.

Nichols Charts

The $M$ and $\alpha$ circles, when plotted in terms of decibels and phase angle as shown in Fig. 101, are commonly referred to as Nichols charts because of the fundamental work first done by N. B. Nichols (Ref. 32). The curves shown in the figure have a mirror image about the $-180°$ ordinate. The total curves correspond to the principal value of the logarithm given by eq. (28). Since the logarithm of a complex number is multivalued (eq. (27)), the curves repeat as shown in Fig. 102.

FIG. 101. Nichols chart (gain in decibels versus phase angle in degrees).

Stability Analysis on the Nichols Chart. Note that the $-1$ point in the $G(s)H(s)$-plane corresponds to the 0-db, $-180°$ point in Fig. 101. For well-behaved, minimum phase $G(s)H(s)$, the system is stable if $G(j\omega)H(j\omega)$ crosses the 0-db line to the right of the $-1$ point on the Nichols chart. Figure 103 shows a stable system with a phase margin of $+48°$ at gain crossover and a gain margin of 6.8 db at phase crossover.

Exact Closed Loop Response. The procedure to obtain closed loop response from open loop response is as follows:

a. Manipulate the general negative feedback system to the form shown in Fig. 97.
b. Plot $G(s)H(s)$ directly or inversely with the corresponding $M$ and $\alpha$ loci. $G(s)H(s)$ on the Nichols chart is shown in Fig. 103.
c. Obtain the closed loop $M$ and $\alpha$ at points of intersection of $G(s)H(s)$ with the $M$ and $\alpha$ loci.
d. Modify this response by $1/H(s)$ to obtain the overall closed loop response.

FIG. 102. Multiple Nichols charts (phase in degrees, repeating every 360°).

FIG. 103. $G(s)H(s)$ plotted on Nichols chart.

Approximate Closed Loop Response. The approximate closed loop response can be obtained by plotting the Bode diagram for $G(s)$ and $[H(s)]^{-1}$ on the same sheet. The closed loop response is approximately equal to the lower of the two curves at any given frequency (see Fig. 104).

(73)  $\dfrac{C(s)}{R(s)} = \dfrac{G(s)}{1 + G(s)H(s)} = \dfrac{1}{[1/G(s)] + H(s)}$.

When $1/H(s)$ is much smaller than $G(s)$, that is, when $G(s)H(s)$ is much greater than 1,

$\dfrac{1}{G(s)} + H(s) \approx H(s)$  and  $\dfrac{C(s)}{R(s)} \approx \dfrac{1}{H(s)}$.

When $G(s)$ is much smaller than $1/H(s)$, that is, when $G(s)H(s)$ is much less than 1,

$\dfrac{1}{G(s)} + H(s) \approx \dfrac{1}{G(s)}$  and  $\dfrac{C(s)}{R(s)} \approx G(s)$.

FIG. 104. Approximate closed loop response.

The approach is to approximate the closed loop response by the lowest portions of $G(s)$ and $[H(s)]^{-1}$. Assume all the breaks are simple and of multiplicity one or more. The phase diagram is drawn assuming the simple breaks are of a minimum phase nature.
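A short numerical sketch can show how the "lower of the two curves" rule compares with the exact expression of eq. (73). The transfer functions below are assumptions made only for illustration: $G(s) = 10/[s(0.5s + 1)]$ and a hypothetical feedback element $H(s) = 0.2s^2/(0.33s + 1)$. The helper names are likewise hypothetical.

```python
import math


def G(s):
    """Assumed forward transfer function, 10 / [s (0.5 s + 1)]."""
    return 10.0 / (s * (0.5 * s + 1.0))


def H(s):
    """Hypothetical feedback transfer function, 0.2 s^2 / (0.33 s + 1)."""
    return 0.2 * s ** 2 / (0.33 * s + 1.0)


def exact_magnitude(s):
    """|C/R| from eq. (73): G / (1 + G H)."""
    return abs(G(s) / (1.0 + G(s) * H(s)))


def approx_magnitude(s):
    """Approximation: the lower of |G| and |1/H| at this frequency."""
    return min(abs(G(s)), abs(1.0 / H(s)))


def compare(omega):
    """Return (exact, approximate, error in db) at s = j*omega, omega > 0."""
    s = complex(0.0, omega)
    exact = exact_magnitude(s)
    approx = approx_magnitude(s)
    return exact, approx, 20.0 * math.log10(approx / exact)


for omega in (0.01, 0.1, 0.5, 2.0, 11.0, 100.0, 1000.0):
    exact, approx, err_db = compare(omega)
    print(f"w = {omega:8.2f}  exact = {exact:10.4g}  "
          f"approx = {approx:10.4g}  error = {err_db:6.2f} db")
```

With these assumed functions the error is small where one curve lies well below the other and grows to a few decibels near the frequencies where $|G|$ and $|1/H|$ cross.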
The approximation is worst in the region of $\omega$ where $G(s) = [H(s)]^{-1}$. If necessary, the exact closed loop response can be obtained in this region by using the preceding exact method and the Nichols charts.

EXAMPLE.

$G(s) = \dfrac{10}{s(0.5s + 1)}$,  $H(s) = \dfrac{\ldots}{(0.33s + 1)}$.

In Fig. 104 are plotted $G(s)$ and $[H(s)]^{-1}$. The approximate closed loop response is shown as the heavy line and is approximated by the equation

(74)  $\dfrac{C(s)}{R(s)} \approx \dfrac{10(0.33s + 1)}{s(2s + 1)(0.09s + 1)}$.

The phase angle curve is that corresponding to eq. (74).

Note. There are many ways to investigate stability of linear closed loop systems. If used properly, they should all obtain the same result.

REFERENCES

1. E. J. Routh, Stability of a dynamical system with two independent motions, Proc. London Math. Soc., ser. 1, 5, 97-99 (1874).
2. E. J. Routh, A Treatise on the Stability of a Given State of Motion, Cambridge University Press, Cambridge, England, 1877.
3. E. J. Routh, Advanced Part of a Treatise on Advanced Rigid Dynamics, 6th edition, pp. 210-231, Cambridge University Press, Cambridge, England, 1930.
4. T. J. Higgins, Epitomization of the Basic Concepts Underlying the Theory of "The Stability" of Servomechanisms, Advanced Servomechanisms and Automatic Control Theory, Class Notes EE 216, University of Wisconsin, Ronald, New York, 1955.
5. E. A. Guillemin, The Mathematics of Circuit Analysis, Technology Press and Wiley, New York, 1950.
6. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. 1, Wiley, New York, 1951.
7. H. W. Bode, Network Analysis and Feedback Amplifier Design, Van Nostrand, Princeton, N. J., 1945.
8. H. W. Bode, Amplifiers, Patent 2,123,178 (1938).
9. N. Balabanian and W. R. LePage, What is a minimum phase network? Trans. Am. Inst. Elec. Engrs., Pt. 1, No. 22, January 1956.
10. G. A. Biernson, Estimating transient responses from open-loop frequency response, Trans. Am. Inst. Elec. Engrs., 74, 388-402, Pt. 2, January 1956.
11. W. R.
Evans, Graphical analysis of control systems, Trans. Am. Inst. Elec. Engrs., 67, 547-551 (1948).
12. W. R. Evans, Control system synthesis by root locus method, Trans. Am. Inst. Elec. Engrs., 69, Pt. 1, 67-69 (1950).
13. W. R. Evans, Control Systems Dynamics, McGraw-Hill, New York, 1954.
14. A. H. Harris, A Simple Instrument for Summing Angles in the Root Locus Method of Solving Ordinary Equations and Stability Problems, University of California, UCRL-2269, Berkeley, July 10, 1953.
15. The Complex Plane Analyzer, The Technology Instrument Corporation, CPA type 250-A, Acton, Mass.
16. J. G. Truxal, Automatic Feedback Control System Synthesis, McGraw-Hill, New York, 1955.
17. A. Hurwitz, Ueber die Bedingungen, unter welchen eine Gleichung nur Wurzeln mit negativen reellen Theilen besitzt, Math. Ann., 46, 273-284 (1895).
18. L. Orlando, Sul problema di Hurwitz, Rendiconti Accademia Lincei, ser. 5, Vol. HI, pp. 801-805, Rome, 1910.
19. L. S. Dzung, The Stability Criterion, in Automatic and Manual Control, Butterworths, London, 1952 (Proceedings of the 1951 Cranfield Conference, pp. 13-23).
20. L. S. Dzung, Das Stabilitätskriterium nach Nyquist, Regelungstechnik, 1, 143-145 (1953).
21. H. S. Wall, Polynomials whose zeros have negative real parts, Am. Math. Monthly, 52, 308-322 (1945).
22. E. Sponder, On the representation of the stability region on oscillation problems with the aid of Hurwitz determinants, NACA Technical Memorandum 1348, August 1952. A translation of E. Sponder, Zur Darstellung des Stabilitätsgebietes bei Schwingungsaufgaben mit Hilfe der Hurwitz-Determinanten, Schweiz. Arch., 16, 93-96 (1950).
23. J. F. Koenig, On the zeros of polynomials and the degree of stability of linear systems, J. Appl. Phys., 24, 476-482 (1953).
24. T. J. Higgins and J. G. Levinthal, Stability limits for third-order servomechanisms, Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 459-467 (1952).
25. J. F. Koenig, A relative damping criterion for linear systems, Trans. Am. Inst.
Elec. Engrs., 72, Pt. 2, 291-295 (1953).
26. A. Vazsonyi, A generalization of Nyquist's stability criteria, J. Appl. Phys., 20, 863-867 (1949).
27. A. Leonhard, Relative Damping as Criterion for Stability and as an Aid in Finding the Roots of a Hurwitz Polynomial, in Automatic and Manual Control, Butterworths, London, 1952 (Proceedings of the 1951 Cranfield Conference, pp. 25-43).
28. E. Frank, On the zeros of polynomials with complex coefficients, Bull. Am. Math. Soc., 52, 144-157 (1946).
29. H. Bilharz, Bemerkung zu einem Satze von Hurwitz, Z. angew. Math. u. Mech., 24, 77-82 (1944).
30. A. Leonhard, Ueber Selbsterregung elektrischer Maschinen, Arch. Elektrotech., 40, 343-346 (1952).
31. G. S. Brown and D. P. Campbell, Principles of Servomechanisms, Wiley, New York, 1948.
32. H. M. James, N. B. Nichols, and R. S. Phillips, Theory of Servomechanisms, McGraw-Hill, New York, 1947.
33. C. L. Johnson, Analog Computer Techniques, McGraw-Hill, New York, 1956.
34. Robert Donahue, unpublished, M.I.T. Flight Control Laboratory.
35. General Electric Company, unpublished, Schenectady, New York.

E  FEEDBACK CONTROL

Chapter 22

Relation between Transient and Frequency Response

C. E. Bradford and M. W. DeMerit

1. Introduction
2. Response Characteristics Defined
3. Relation between Transient Response and Location of Roots of Characteristic Equation
4. Relation between Closed Loop and Open Loop Roots
5. Design Charts Relating Open Loop Frequency Response and Transient Response
6. Approximate Relations-Rules of Thumb
7. Numerical and Graphical Techniques of Relating Transient and Frequency Response
References

1. INTRODUCTION

The frequency response technique of analyzing servo systems is used to facilitate both the analysis and synthesis operations (Chaps. 20 and 21). Often it is desirable to transform the results of the frequency response analysis into transient response form in order to interpret them more readily.
Conversely, it is often desirable to transform the transient response performance requirements into frequency response form for synthesis purposes. These operations can be performed exactly by rigorous mathematical techniques; however, the operations are time consuming and tedious, so it is often profitable to use less accurate but more easily applied techniques.

The purpose of this section is to present some of the more useful techniques for relating the transient response to the frequency response and the inverse relations between frequency and transient response.

2. RESPONSE CHARACTERISTICS DEFINED

Transient Response. System response is often specified and interpreted in terms of the characteristics of the transient response to a step input.

FIG. 1. (a) Representative system response to unit step input. (b) Representative system frequency response (closed loop).

The parameters which are most often used to describe the transient response are:

$C/R|_p$, the peak value of the transient including any overshoot;
$t_p$, the time to the first peak if the response is underdamped and thus has an overshoot;
$t_s$, the settling time, measured from the initiation of the step input to the time at which the system output no longer deviates more than a certain percentage, often 5 or 2 per cent, from its final value;
$N$, the number of oscillations it takes the system to "settle" or reach $t_s$.
Other parameters sometimes used to describe the transient response are:

$t_d$, the delay time, measured from the initiation of the step input to the time at which the response has reached half the final value;
$t_r$, the rise time, which is the difference between the time at which the response has reached 10 per cent of the final value and the time at which 90 per cent of the final value is reached. Rise time is also sometimes defined as the time from 5 to 95 per cent, and also as the reciprocal of the slope at the instant the response is 50 per cent of the final value.

Figure 1a illustrates the definitions of these transient response parameters.

Frequency Response. System response is also often described in terms of certain frequency response characteristics. Chief of these are:

$M_m$, the maximum amplitude ratio of the closed-loop frequency response, which is sometimes designated $C/R|_m$;
$\omega_m$, the frequency at which $M_m$ occurs;
$\omega_b$, the bandpass frequency, which is generally defined as the frequency at which the closed loop response is down 3 db from the nominal steady-state gain value.

Figure 1b illustrates the definitions of these terms.

3. RELATION BETWEEN TRANSIENT RESPONSE AND LOCATION OF ROOTS OF CHARACTERISTIC EQUATION

Mathematical Relation. The open loop frequency response may be represented by the open loop response function in terms of its poles and zeros, roots of the denominator and numerator respectively, of the forward and feedback portions of the control system,

(1)  $G(s)H(s) = \dfrac{N_1(s)N_2(s)}{D_1(s)D_2(s)} = \dfrac{K(s + z_{11})(s + z_{12}) \cdots (s + z_{21})(s + z_{22}) \cdots}{s^m (s + p_{11})(s + p_{12}) \cdots (s + p_{21})(s + p_{22}) \cdots}$,

where $z_{1n}$ are the roots of $N_1$, $z_{2n}$ are the roots of $N_2$, $p_{1n}$ are the roots of $D_1$, and $p_{2n}$ are the roots of $D_2$.

The closed loop frequency response function can be obtained as follows:

(2)  $\dfrac{C(s)}{R(s)} = \dfrac{G(s)}{1 + G(s)H(s)} = \dfrac{N_1(s)D_2(s)}{D_1(s)D_2(s) + N_1(s)N_2(s)}$.

The closed loop poles can be found by factoring $[D_1(s)D_2(s) + N_1(s)N_2(s)]$. Substituting the proper Laplace transform for $R(s)$, $C(s)$ may be represented by a sum of terms such as

(3)  $C(s) = \dfrac{A_1}{s} + \dfrac{A_2}{s + a_2} + \dfrac{A_3}{s + a_3} + \cdots$,

where the constants $A_1, A_2, A_3$, etc., are found by partial fraction expansion. The time response function is then found by performing the inverse Laplace transformation to get

(4)  $c(t) = A_1 + A_2 \exp(-a_2 t) + A_3 \exp(-a_3 t) + \cdots$

In general, this function must be plotted to determine such parameters as peak overshoot and settling time, which are often of prime importance. The straight mathematical approach is impractical for any but simple systems because of the amount of tedious work involved and the fact that it does not lend itself to system synthesis.

Approximate Approach

The time response can be estimated quite accurately by noting the location of certain predominant closed loop poles and zeros in the complex frequency plane (s-plane). The closed loop pole-zero configuration may consist of one or more pairs of complex poles and several real axis poles and zeros, and perhaps complex zeros. Ordinarily, one pair of complex poles will be of primary importance because of its frequency or damping ratio. For example, if a system contains two pairs of complex poles which have natural frequencies that differ by as much as 10 to 1, the designer may ordinarily consider either pair as dominant and perform an analysis in two parts, considering first one pair and then the other. For many cases it is reasonably accurate to neglect all but one complex pole pair and to consider the transient response to be made up of the dominant pair of complex poles and various groupings of real axis poles and zeros. This assumption is made in the following discussion.

Only underdamped systems are to be considered here. Overdamped systems may generally be analyzed quite easily by the normal mathematical techniques.

Dominant Pair of Complex Poles.
To determine the effect of the s-plane pole-zero configuration on the system transient response, it is convenient to first consider the relation between a single pair of complex poles on the s-plane and the characterizing parameters of the time response. The additional effect of the real axis poles and zeros will be considered later.

The expression for the closed loop frequency response function containing one pair of complex roots is

(5)  $\dfrac{C(s)}{R(s)} = \dfrac{G(s)}{1 + G(s)} = \dfrac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2}$,

where $\omega_0$ = natural frequency,
$\zeta$ = damping ratio,
$\sigma_0 = \zeta\omega_0$ = damping exponent,
$\omega_d = \omega_0\sqrt{1 - \zeta^2}$ = natural damped frequency, or oscillation frequency.

The parameters $\omega_0$, $\zeta$, $\sigma_0$, and $\omega_d$ are shown on the s-plane in Fig. 2.

FIG. 2. One pair of complex roots and significant related parameters, $\zeta = \cos\psi_1$.

Figure 3 illustrates that for constant values of natural frequency, $\omega_0$, the complex roots or poles of eq. (5) generate circles on the s-plane as the damping ratio $\zeta$ is varied. Radial lines from the origin are generated by holding $\zeta$ constant and varying $\omega_0$. Figure 4 illustrates that holding $\sigma_0$, the exponential damping factor, constant forms lines parallel to the imaginary ($j\omega$) axis on the s-plane. Similarly, maintaining constant values for the damped frequency, $\omega_d$, forms lines parallel to the real ($\sigma$) axis.

FIG. 3. Constant $\omega_0$ is a circle; constant $\zeta$ is a radial line.

FIG. 4. Illustrating lines of constant $\omega_d$ and $\sigma_0$.

The expression for the time response to a unit step input is

(6)  $c(t) = 1 + (\omega_0/\omega_d)\exp(-\sigma_0 t)\sin(\omega_d t - \psi_1)$,

where $\psi_1 = \arctan[\omega_d/(-\sigma_0)] = \arctan[\sqrt{1 - \zeta^2}/(-\zeta)]$.

Figure 3 also illustrates that constant values of $\psi_1$ correspond to constant values of $\zeta$. From eq. (6) the characterizing parameters of the transient response can be derived. The equations for the more important ones are:
The equations for the more important ones are:

(7)  $t_p = \pi/\omega_d = \pi/(\omega_0\sqrt{1 - \zeta^2})$, time required to reach first peak.

(8)  $t_s = 3/\sigma_0 = 3/(\zeta\omega_0)$, time required to settle to within 5% of final value ($= 4/(\zeta\omega_0)$ for 2%).

(9)  $C/R|_p = 1 + \exp(-\pi\sigma_0/\omega_d) = 1 + \exp(-\pi\zeta/\sqrt{1 - \zeta^2})$, the peak value of the ratio of output to input.

(10)  $N = t_s/(2\pi/\omega_d) = 3\sqrt{1 - \zeta^2}/(2\pi\zeta)$, number of oscillations to settle to within 5% of final value.

Equations (7) through (10) relate the position of the dominant pair of complex poles on the s-plane to certain transient response parameters.

FIG. 5. $C/R|_p$, $N$, $\omega_0 t_p$, $\omega_0 t_s$ versus damping ratio $\zeta$ for system composed of two complex poles.

The relationships between these parameters and the damping ratio, $\zeta$, are plotted in Fig. 5. Similarly, certain closed loop frequency response parameters can be related to the position of the poles.

(11)  $M_m = 1/(2\zeta\sqrt{1 - \zeta^2})$ for $0 < \zeta < 0.707$; $M_m = 1$ for $0.707 < \zeta < 1$; maximum frequency response ratio of output to input (also denoted $C/R|_m$).

(12)  $M_c = 1/(2\zeta)$, response ratio at the frequency corresponding to the natural frequency, or corner frequency.

(13)  $\omega_b = \omega_0\sqrt{1 - 2\zeta^2 + \sqrt{2 - 4\zeta^2 + 4\zeta^4}}$, bandpass frequency, at which response ratio is 0.707.

FIG. 6. $M_m$, $M_c$, $\omega_b/\omega_0$ versus damping ratio $\zeta$ for system composed of two complex poles.

Figure 6 shows graphically the relationship between these parameters and damping ratio.
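As a numerical companion to eqs. (6) through (13), the following Python sketch evaluates the transient and frequency response parameters of a single pair of complex poles. The function names and dictionary keys are illustrative choices, not part of the original text, and the 5 per cent settling criterion of eq. (8) is assumed. The angle $\psi_1$ is taken in the second quadrant so that eq. (6) starts from zero with zero slope.

```python
import math

def second_order_parameters(zeta, w0):
    """Evaluate eqs. (7) to (13) for one pair of complex poles.

    zeta: damping ratio (0 < zeta < 1); w0: natural frequency, rad/sec.
    """
    if not 0.0 < zeta < 1.0:
        raise ValueError("formulas apply to underdamped systems, 0 < zeta < 1")
    root = math.sqrt(1.0 - zeta ** 2)
    sigma0 = zeta * w0                 # damping exponent
    wd = w0 * root                     # damped natural frequency
    return {
        "tp": math.pi / wd,                                    # eq. (7)
        "ts": 3.0 / sigma0,                                    # eq. (8), 5% band
        "peak": 1.0 + math.exp(-math.pi * zeta / root),        # eq. (9)
        "N": 3.0 * root / (2.0 * math.pi * zeta),              # eq. (10)
        # eq. (11): a resonance peak exists only below zeta = 0.707
        "Mm": 1.0 / (2.0 * zeta * root) if zeta < math.sqrt(0.5) else 1.0,
        "Mc": 1.0 / (2.0 * zeta),                              # eq. (12)
        "wb": w0 * math.sqrt(1.0 - 2.0 * zeta ** 2
                             + math.sqrt(2.0 - 4.0 * zeta ** 2
                                         + 4.0 * zeta ** 4)),  # eq. (13)
    }

def step_response(t, zeta, w0):
    """Unit step response c(t) of eq. (6)."""
    sigma0 = zeta * w0
    wd = w0 * math.sqrt(1.0 - zeta ** 2)
    psi1 = math.atan2(wd, -sigma0)     # angle in the second quadrant
    return 1.0 + (w0 / wd) * math.exp(-sigma0 * t) * math.sin(wd * t - psi1)

params = second_order_parameters(0.5, 1.0)
# Evaluating eq. (6) at t_p should reproduce the peak given by eq. (9).
peak_from_response = step_response(params["tp"], 0.5, 1.0)
```

For $\zeta = 0.5$ and $\omega_0 = 1$ rad/sec this gives a peak of about 1.16 at $\omega_0 t_p$ of about 3.63, and the value of eq. (6) at $t_p$ agrees with eq. (9).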
Effect of Real Axis Poles and Zeros. The mathematical expression for a system whose dynamic characteristics can be described by one pair of complex roots, one real pole, and one real zero is

(14)  $\dfrac{C(s)}{R(s)} = \dfrac{(\omega_0^2\,p/z)(s + z)}{(s + p)(s^2 + 2\zeta\omega_0 s + \omega_0^2)}$

The expression for the transient response to a unit step input may be written as

(15)  $c(t) = 1 - \dfrac{\omega_0^2 (z - p)}{z(\overline{PP_d})^2}\exp(-pt) + \left(\dfrac{\overline{ZP_d}}{z}\right)\left(\dfrac{p}{\overline{PP_d}}\right)\left(\dfrac{\omega_0}{\omega_d}\right)\exp(-\sigma_0 t)\sin(\omega_d t - \psi_1 + \psi_3 - \psi_4)$

where $\overline{PP_d}$ = distance from $P$ to $P_d$, $\overline{ZP_d}$ = distance from $Z$ to $P_d$. A graphical representation of this system is contained in Fig. 7.

FIG. 7. Illustrating a system with one pair of complex poles, one real pole, and one real zero.

The term $[\omega_0^2 (z - p)/z(\overline{PP_d})^2]\exp(-pt)$ in eq. (15) is neglected in determining the following expressions for the characteristic parameters of the transient response. This approximation is valid when $p$ is much larger than $\sigma_0$ (at least 3 times as large). The expressions for $C/R|_p$, $t_p$, and $t_s$ are:

(16)  $C/R|_p = 1 + \left(\dfrac{\overline{ZP_d}}{z}\right)\left(\dfrac{p}{\overline{PP_d}}\right)\exp\left[-\dfrac{\sigma_0(\pi - \psi_3 + \psi_4)}{\omega_d}\right]$,

(17)  $t_p = (\pi - \psi_3 + \psi_4)/\omega_d$,

(18)  $t_s = 3/(\zeta\omega_0)$.

The settling time, $t_s$, remains the same as before since the exponential pole term in eq. (15) has been neglected. Thus, $N$ also remains unchanged. For multiple poles and zeros the expressions for $C/R|_p$ and $t_p$ in eqs. (16) and (17) become:

(19)  $C/R|_p = 1 + \left(\prod_q \dfrac{p_q}{\overline{P_qP_d}}\right)\left(\prod_q \dfrac{\overline{Z_qP_d}}{z_q}\right)\exp\left[-\dfrac{\sigma_0(\pi - \Sigma\psi_3 + \Sigma\psi_4)}{\omega_d}\right]$,

(20)  $t_p = \dfrac{\pi - \Sigma\psi_3 + \Sigma\psi_4}{\omega_d}$,

where
$\Sigma\psi_3$ = sum of all angles between the real axis zeros and the dominant pole,
$\Sigma\psi_4$ = sum of all angles between the real axis poles and the dominant pole,
$\prod_q p_q/\overline{P_qP_d}$ = product of the ratios of the poles to the distances from the poles to the complex pole at point $P_d$,
$\prod_q \overline{Z_qP_d}/z_q$ = product of the ratios of the distances of the zeros to point $P_d$, to the zeros.

As noted before, eqs. (19) and (20) are approximate, based on the assumption that $p$ is large compared to $\sigma_0$, which is realistic for many practical systems. Conclusions reached from a study of eqs.
(19) and (20) are:

1. The time to peak, $t_p$, is inversely proportional to $\omega_d$, the damped natural frequency.
2. The addition of a pole increases $t_p$ and decreases $C/R|_p$, the magnitude of the peak.
3. The addition of a zero decreases $t_p$ and increases $C/R|_p$.
4. If a pole and zero are close together (dipole), their net effect on $t_p$ and $C/R|_p$ is negligible.
5. Poles and zeros far out on the real axis have little effect on $t_p$ and $C/R|_p$.

If the assumption of the real poles and zeros being large compared with the damping factor ($p \gg \sigma_0$) is not valid, the values of $C/R|_p$ and $t_p$ can still be estimated, though not by the simple use of eqs. (19) and (20).

FIG. 8. Illustrating the effect of a significant real pole on $C/R|_p$ approximations. (Response versus time, with the correction for the $A\exp(-pt)$ term marked.)

The magnitude of the coefficients of the exponential terms such as the one in eq. (15) can be calculated, as outlined in the next section, and then the effect of these terms on $C/R|_p$ and $t_p$ can be estimated. By referring to Fig. 8 as an example, it is apparent that the value of the simple exponential term at time $t_p$ must be subtracted from the approximate curve to give a more exact value of $C/R|_p$. In other cases this correction might have to be added. Of course this process becomes more difficult as the number of significant poles and zeros increases.

The addition of poles and zeros to the closed loop response function of a control system may result in a response function whose dominant characteristic is that of the real pole rather than the complex pair. For such a case the system response to a unit step input might be as illustrated in Fig. 9.

FIG. 9. Response with a dominant real pole and a pair of complex poles. (Response versus time.)
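The approximate expressions of eqs. (19) and (20) lend themselves to a short calculation. The Python sketch below is illustrative only: the function name and argument layout are assumptions, the real poles and zeros are entered as positive numbers $p_q$ and $z_q$ for singularities at $s = -p_q$ and $s = -z_q$, and the peak is taken as 1 plus the exponential term so that it reduces to eq. (9) when no real poles or zeros are present.

```python
import math

def peak_estimate(sigma0, wd, real_poles=(), real_zeros=()):
    """Approximate t_p and C/R|p from eqs. (19) and (20).

    The dominant complex pole P_d is at s = -sigma0 + j*wd.
    real_poles and real_zeros hold positive values p_q, z_q for
    poles at s = -p_q and zeros at s = -z_q.
    """
    angle_sum = math.pi
    ratio = 1.0
    for z in real_zeros:
        angle_sum -= math.atan2(wd, z - sigma0)    # psi_3: zero to P_d
        ratio *= math.hypot(z - sigma0, wd) / z    # (distance Z_q P_d) / z_q
    for p in real_poles:
        angle_sum += math.atan2(wd, p - sigma0)    # psi_4: pole to P_d
        ratio *= p / math.hypot(p - sigma0, wd)    # p_q / (distance P_q P_d)
    tp = angle_sum / wd                            # eq. (20)
    peak = 1.0 + ratio * math.exp(-sigma0 * tp)    # eq. (19)
    return tp, peak

# Dominant pair with zeta = 0.5, w0 = 1 rad/sec
sigma0, wd = 0.5, math.sqrt(0.75)
base = peak_estimate(sigma0, wd)
with_pole = peak_estimate(sigma0, wd, real_poles=[5.0])
with_zero = peak_estimate(sigma0, wd, real_zeros=[5.0])
dipole = peak_estimate(sigma0, wd, real_poles=[5.0], real_zeros=[5.0])
```

Comparing `with_pole`, `with_zero`, and `dipole` against `base` reproduces conclusions 2 to 4 above: the added pole lengthens $t_p$ and lowers the peak, the added zero does the opposite, and a coincident pole and zero cancel.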
Coefficients of Transient Response Terms

Frequently, it is desired, as soon as the closed loop poles are found by graphical or other means, to determine the exact expression for the time response. This may be especially true when it is obvious that two pairs of complex poles are significant, i.e., they are both located about the same distance from the origin.

The coefficients of the terms in the equation describing the transient response of the system may be calculated by a formula developed in Laplace transform theory. This formula may be interpreted in terms of the pole-zero configuration of the root locus plot for the system. To illustrate this, a system described by the following equation is assumed.

(21)  $\dfrac{C(s)}{R(s)} = \dfrac{K(s + z)}{(s + p_1)[(s + \sigma_0)^2 + \omega_d^2]}$

If the appropriate Laplace transform for $R(s)$ is substituted into eq. (21), the expression for $C(s)$ can be written. Assuming a unit step input in this case ($R(s) = 1/s$), then

(22)  $C(s) = \dfrac{K(s + z)}{s(s + p_1)[(s + \sigma_0)^2 + \omega_d^2]}$

In terms of partial fractions eq. (22) can be written as (see Chap. 20)

(23)  $C(s) = K\left[\dfrac{A_0}{s} + \dfrac{A_1}{s + p_1} + \dfrac{A_2}{s + (\sigma_0 + j\omega_d)} + \dfrac{A_3}{s + (\sigma_0 - j\omega_d)}\right]$.

The transient response equation for this system is

(24)  $c(t) = K\{A_0 + A_1\exp(-p_1 t) + A_2\exp[-(\sigma_0 + j\omega_d)t] + A_3\exp[-(\sigma_0 - j\omega_d)t]\}$.

Determining the Coefficients. The formulas for the coefficients $A_0$, $A_1$, $A_2$, and $A_3$ are:

(25)  $A_0 = \dfrac{z}{p_1(\sigma_0^2 + \omega_d^2)}$

(26)  $A_1 = \dfrac{z - p_1}{-p_1[(\sigma_0 - p_1)^2 + \omega_d^2]}$

(27)  $A_2 = \left[\{s + (\sigma_0 + j\omega_d)\}\dfrac{C(s)}{K}\right]_{s = -(\sigma_0 + j\omega_d)} = \dfrac{z - (\sigma_0 + j\omega_d)}{-(\sigma_0 + j\omega_d)[p_1 - (\sigma_0 + j\omega_d)][(\sigma_0 - j\omega_d) - (\sigma_0 + j\omega_d)]} = \dfrac{z - (\sigma_0 + j\omega_d)}{2j\omega_d(\sigma_0 + j\omega_d)(p_1 - \sigma_0 - j\omega_d)}$

(28)  $A_3 = \left[\{s + (\sigma_0 - j\omega_d)\}\dfrac{C(s)}{K}\right]_{s = -(\sigma_0 - j\omega_d)} = \dfrac{z - (\sigma_0 - j\omega_d)}{-(\sigma_0 - j\omega_d)[p_1 - (\sigma_0 - j\omega_d)][(\sigma_0 + j\omega_d) - (\sigma_0 - j\omega_d)]} = \dfrac{z - (\sigma_0 - j\omega_d)}{-2j\omega_d(\sigma_0 - j\omega_d)(p_1 - \sigma_0 + j\omega_d)}$

Note that these coefficients are the ratios of vectors in the root locus plot. For example, consider Fig. 10, which illustrates the pole-zero configuration of the system under consideration. With this plot the value of any coefficient can be determined by drawing vectors from all other poles and zeros to the pole or root which corresponds to the coefficient. The coefficient is then the ratio of vectors from the zeros to those from the poles. Figure 10 shows the vectors for the calculation of $A_1$. From this

(29)  $A_1 = \dfrac{-p_1 + z}{-p_1[(-p_1 + \sigma_0) + j\omega_d][(-p_1 + \sigma_0) - j\omega_d]} = \dfrac{z - p_1}{-p_1[(\sigma_0 - p_1)^2 + \omega_d^2]}$

which agrees with eq. (26).

FIG. 10. Vectors for determining $A_1$ coefficient.

Similarly, the other coefficients may be determined from the root locus plot. Figure 11 shows the same system with the vectors oriented for determination of the $A_3$ coefficient for one of the complex roots. From this

(30)  $A_3 = \dfrac{\mathbf{A}_3}{\mathbf{A}_1\mathbf{A}_2\mathbf{A}_4} = \dfrac{|\mathbf{A}_3|\angle\psi_3}{(|\mathbf{A}_1|\angle\psi_1)(|\mathbf{A}_2|\angle\psi_2)(|\mathbf{A}_4|\angle\psi_4)} = \dfrac{|A_3|}{j}\angle A_3$

since $\angle\psi_2 = +j$ or $+90°$. Here $\mathbf{A}_1$, $\mathbf{A}_2$, $\mathbf{A}_3$, and $\mathbf{A}_4$ are the vectors of Fig. 11, with lengths $|\mathbf{A}_k|$ and angles $\psi_k$.

FIG. 11. Vectors for determining $A_3$ coefficient.

In like manner the $A_2$ coefficient can be determined as

(31)  $A_2 = \dfrac{|\mathbf{A}_3|\angle{-\psi_3}}{(|\mathbf{A}_1|\angle{-\psi_1})(|\mathbf{A}_2|\angle{-\psi_2})(|\mathbf{A}_4|\angle{-\psi_4})} = -\dfrac{|A_3|}{j}\angle{-A_3}$

The $\mathbf{A}_1$, $\mathbf{A}_2$, $\mathbf{A}_3$, and $\mathbf{A}_4$ vectors can be expressed in terms of the pole-zero locations in the root locus plot. When this is done eqs. (25) to (28) are the results. These coefficients can be evaluated conveniently by use of the Spirule, although a ruler and protractor will suffice (Ref. 1).

The time response of a system such as the one being considered here is usually expressed in equation form with a sine or cosine term instead of the complex exponents. This is illustrated in the following equations.

(32)  $c(t) = A_0 + A_1\exp(-p_1 t) + A_2\exp[-(\sigma_0 + j\omega_d)t] + A_3\exp[-(\sigma_0 - j\omega_d)t]$

From eqs. (30) and (31) the coefficients $A_2$ and $A_3$ may be expressed as

(33)  $A_3 = \dfrac{|A_3|}{j}\angle A_3$,

(34)  $A_2 = -\dfrac{|A_3|}{j}\angle{-A_3}$.

Combining these equations with eq. (32) yields

(35)  $c(t) = A_0 + A_1\exp(-p_1 t) + \exp(-\sigma_0 t)\left[\dfrac{|A_3|}{j}\exp(j\omega_d t + j\angle A_3) - \dfrac{|A_3|}{j}\exp(-j\omega_d t - j\angle A_3)\right] = A_0 + A_1\exp(-p_1 t) + 2|A_3|\exp(-\sigma_0 t)\sin(\omega_d t + \angle A_3)$,

where

$|A_3| = \dfrac{|\mathbf{A}_3|}{|\mathbf{A}_1|\,|\mathbf{A}_2|\,|\mathbf{A}_4|}$,  $\angle A_3 = \psi_3 - \psi_1 - \psi_4$,

and these vector lengths and angles are shown in Fig. 11.

4. RELATION BETWEEN CLOSED LOOP AND OPEN LOOP ROOTS

Mathematical Relationship. The closed loop frequency response function may be readily written in terms of the open loop function as

(36)  $\dfrac{C(s)}{R(s)} = \dfrac{G(s)}{1 + G(s)H(s)}$

where $G(s)$ = open loop transfer function of forward element, $H(s)$ = feedback element. If these are written as

(37)  $G(s) = \dfrac{N_1(s)}{D_1(s)}$,  $H(s) = \dfrac{N_2(s)}{D_2(s)}$,

then

(38)  $\dfrac{C(s)}{R(s)} = \dfrac{N_1(s)/D_1(s)}{1 + \dfrac{N_1(s)N_2(s)}{D_1(s)D_2(s)}}$

The closed loop poles are thus the roots of the denominator of eq. (38), that is, of $N_1(s)N_2(s) + D_1(s)D_2(s)$. This is generally a high order polynomial, and factoring it is a tedious task. For this reason methods of estimating closed loop roots from open loop roots are useful.

Graphical Method of Determining Roots. A convenient way to obtain the closed loop poles {roots of $[N_1(s)N_2(s) + D_1(s)D_2(s)]$} is to use the root locus technique of graphically plotting loci of the closed loop poles as functions of the open loop system gain (see Chap. 21). These loci may be plotted from the open loop pole-zero configurations on the complex frequency plane as shown in Chap. 21. The plotting is simplified considerably by use of the Spirule plotting tool. The selection of the open loop gain to give the proper system response may be determined either by use of frequency response or root locus synthesis techniques. In either case the configuration may be examined for its transient response characteristics. A method of performing the reverse operation, working from closed loop to open loop roots, is presented in Chap. 23.

An Iterative Process of Determining Closed Loop Roots. The closed loop poles {roots of $[N_1(s)N_2(s) + D_1(s)D_2(s)]$} can be found more easily by mathematical techniques if known roots can be factored out to leave a simpler polynomial.
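The two operations just mentioned, forming $N_1(s)N_2(s) + D_1(s)D_2(s)$ and factoring out a known root, can be sketched in Python as follows. The helper names are illustrative, polynomials are assumed to be stored as coefficient lists with the highest power first, and the numbers are those of the open loop function used in the example later in this section, $G(s)H(s) = 400(s + 1)/[s(s + 2)(s + 10)^2]$.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add two polynomials, padding the shorter one with leading zeros."""
    n = max(len(a), len(b))
    a = [0.0] * (n - len(a)) + list(a)
    b = [0.0] * (n - len(b)) + list(b)
    return [x + y for x, y in zip(a, b)]

def deflate(poly, root):
    """Divide poly by (s - root) using synthetic division.

    Returns (quotient, remainder); a small remainder indicates that
    root is close to a true root of poly.
    """
    acc = []
    carry = 0.0
    for c in poly:
        carry = c + carry * root
        acc.append(carry)
    return acc[:-1], acc[-1]

# N1*N2 = 400(s + 1) and D1*D2 = s(s + 2)(s + 10)^2
num = [400.0, 400.0]
den = poly_mul(poly_mul([1.0, 0.0], [1.0, 2.0]),
               poly_mul([1.0, 10.0], [1.0, 10.0]))
char_poly = poly_add(num, den)      # s^4 + 22s^3 + 140s^2 + 600s + 400

# Factor out an approximate closed loop pole at s = -0.80
quotient, remainder = deflate(char_poly, -0.80)
```

The remainder returned by `deflate` is the value of the characteristic polynomial at the trial root, so it doubles as a check on how good the approximation is before the lower order quotient is factored further.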
A technique is available (Ref. 2) which allows closed loop poles to be found after the open loop response characteristics of the system, including the open loop gain, have been determined. The rules for determining closed loop poles by this method are the following:

1. An open loop zero located at a frequency lower than the crossover frequency, $\omega_c$, is approximately equal to a closed loop pole.
2. An open loop pole located at a frequency higher than the crossover frequency, $\omega_c$, is approximately equal to a closed loop pole.

These rules are used to make the first approximations for the closed loop poles. These approximate values may be refined by iteration using the following expressions:

For open loop zeros much smaller than $\omega_c$,

(39)  $(p_i - z_1)^n \approx -\left[\dfrac{(s - z_1)^n}{G(s)H(s)}\right]_{s = p_{i-1}}$

where
$p_i$ = closed loop pole,
$z_1$ = open loop zero much smaller than $\omega_c$,
$p_{i-1}$ = value of $p_i$ found by previous iteration (equals $z_1$ for first iteration),
$n$ = order of open loop zero.

For open loop poles much larger than $\omega_c$,

(40)  $(p_i - p_1)^n \approx -\left[(s - p_1)^n\,G(s)H(s)\right]_{s = p_{i-1}}$

where
$p_i$ = closed loop pole,
$p_1$ = open loop pole much larger than $\omega_c$,
$p_{i-1}$ = value of $p_i$ found by previous iteration (equals $p_1$ for first iteration),
$n$ = order of open loop pole.

If $n$ is greater than unity, $n$ values for $p_i$ will result with each iteration. Further refinement should continue only on those values of $p_i$ remaining far from $\omega_c$. If the value of $p_i$ approaches $\omega_c$ the accuracy of the technique is poor.

After these closed loop poles are found with sufficient accuracy by iteration, they may be factored from the closed loop polynomial characteristic equation for the system. The resulting lower order polynomial may then be more easily factored. In general, if the open loop poles and zeros are larger or smaller than $\omega_c$ by a ratio of 3 to 1 or greater, two iterations will result in sufficient accuracy for finding the first closed loop poles. The coefficients of these transient response terms may be calculated as indicated in the previous section.

EXAMPLE. Determining Roots.
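As a numerical companion to this example, the Python sketch below applies the two refinement rules to the open loop function of eq. (41), $G(s)H(s) = 400(s + 1)/[s(s + 2)(s + 10)^2]$. The function names and the number of passes are illustrative choices, not part of the original text; for the double pole at $s = -10$ only the root farther from $\omega_c$ is carried forward.

```python
import math

def refine_zero(p_prev):
    """Rule 1 for the zero at s = -1:
    (p_i + 1) ~= -s(s + 2)(s + 10)^2 / 400, evaluated at s = p_prev."""
    s = p_prev
    return -1.0 - s * (s + 2.0) * (s + 10.0) ** 2 / 400.0

def refine_double_pole(p_prev):
    """Rule 2 for the double pole at s = -10:
    (p_i + 10)^2 ~= -400(s + 1) / [s(s + 2)], evaluated at s = p_prev.
    Returns the root farther from the crossover frequency."""
    s = p_prev
    rhs = -400.0 * (s + 1.0) / (s * (s + 2.0))
    return -10.0 - math.sqrt(rhs)

# Start from the open loop zero and refine twice
p_low = -1.0
for _ in range(2):
    p_low = refine_zero(p_low)

# Start from the open loop double pole and refine three times
p_high = -10.0
for _ in range(3):
    p_high = refine_double_pole(p_high)
```

After these passes `p_low` is close to $-0.80$ and `p_high` is close to $-15.35$, the two real closed loop poles obtained by hand in the example.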
Assume the open loop transfer function

(41)  $G(s)H(s) = \dfrac{400(s + 1)}{s(s + 2)(s + 10)^2}$.

As previously stated, open loop zeros less than $\omega_c$ are approximate closed loop poles. The crossover frequency, $\omega_c$, is 4 rad per second, as may be easily determined from a graphical plot of eq. (41). Therefore,

(42)  $p_{i-1} = -1.0$.

To refine this approximation

(43)  $(p_i + 1) \approx \left.\dfrac{-s(s + 2)(s + 10)^2}{400}\right|_{s = -1} \approx -\dfrac{[-1(-1 + 2)(-1 + 10)^2]}{400} \approx 0.20$.

(44)  $p_i \approx -1 + 0.20 = -0.80$.

This may be repeated,

(45)  $(p_i + 1) \approx -\dfrac{[-0.8(-0.8 + 2)(-0.8 + 10)^2]}{400} \approx 0.20$.

(46)  $p_i \approx -0.80$.

Similarly, open loop poles larger than $\omega_c$ are approximate closed loop poles, so

(47)  $p_{i-1} = -10, -10$,

(48)  $(p_i + 10)^2 = \left.\dfrac{-400(s + 1)}{s(s + 2)}\right|_{s = -10} = +\dfrac{3600}{80} = +45$.

(49)  $p_i = -3.3, -16.7$.

Since $\omega_c$ is 4, only the larger root can be expected to be useful. With it, repeating the process gives

(50)  $(p_i + 10)^2 = \dfrac{-400(-16.7 + 1)}{-16.7(-16.7 + 2)} = 25.6$,

(51)  $p_i = -4.9, -15.1$.

Again,

(52)  $(p_i + 10)^2 = \dfrac{-400(-15.1 + 1)}{-15.1(-15.1 + 2)}$,

(53)  $p_i = -15.35$.

The two closed loop poles thus determined are $s = -0.80, -15.35$, and hence they may be factored out of the expression

(54)  $N_1(s)N_2(s) + D_1(s)D_2(s) = 400(s + 1) + s(s + 2)(s + 10)^2 = s^4 + 22s^3 + 140s^2 + 600s + 400$

to give the closed loop poles near $\omega_c$ as:

(55)  $s^2 + 5.85s + 33.3 = (s + 2.93 - j4.97)(s + 2.93 + j4.97)$.

Thus, the closed loop poles are:

(56)  $s = -0.80,\ -15.35,\ (-2.93 + j4.97),\ (-2.93 - j4.97)$.

In this example if $H(s)$ is other than unity these roots are not the roots of $C(s)/R(s)$, but are the roots of $H(s)[C(s)/R(s)]$.

5. DESIGN CHARTS RELATING OPEN LOOP FREQUENCY RESPONSE AND TRANSIENT RESPONSE

An approximate method of relating steady-state frequency response characteristics and transient response characteristics has been described (Ref. 3). It makes use of a series of charts which indicate the type of open loop attenuation curves required to produce desired closed loop responses.
If a servo system falls within the group considered in the charts, this method enables the designer to take a set of specifications setting forth steady-state frequency and/or transient response requirements and quickly estimate the necessary open loop characteristics. The charts also permit the designer to estimate the effect of changing various system parameters to give him a better understanding of the system.

Description of Charts. The symbols used on the charts are defined below and illustrated in Fig. 12.

$C/R|_m$ = maximum ratio of the closed loop frequency response ($M_m$),
$C/R|_p$ = peak value of the ratio of controlled variable to input for a step function input,
$\omega_m/\omega_c$ = ratio of the frequency $\omega_m$ at which $C/R|_m$ occurs to the frequency $\omega_c$ at which the straight-line approximation of the open loop response is 0 db,
$\omega_t/\omega_c$ = ratio of $\omega_t$, the lowest frequency of oscillation for a step input, to the frequency $\omega_c$ at which the straight-line approximation of the open loop response is 0 db,
$\omega_c t_p$ = the frequency, $\omega_c$, at which the straight-line approximation of the open loop response is 0 db times the response time $t_p$ measured from the start of the step function until $C/R|_p$ occurs,
$\omega_c t_s$ = the frequency, $\omega_c$, at which the straight-line approximation of the open loop response is 0 db times the settling time, $t_s$, from the start of the step function until the output continues to differ from the input by less than 5 per cent.

Indicated in Figs. 12a, b, and c are these various characteristics in terms of the familiar curves of the open loop transfer function, the frequency response, and the transient response to a step input.

The charts, Figs. 13 to 30, were prepared for a system with an initial open loop attenuation, Fig. 12a, of 20 db per decade. However, the shape of the curve near 0 db is of greatest importance, so the curves may also be used for systems with initial attenuation slopes of 0 to 40 db per decade.

Limitations.
Of necessity the charts may be used for the analysis and synthesis of a somewhat restricted class of servomechanisms. Their use is restricted to: (a) Linear systems, or those which may be considered linear for a restricted range of operation. FEEDBACK CONTROL 22-20 c Open-loop transfer function G WI W2 We Frequency, radians/second (a) Steady-state frequency response Wm, Frequency, radians/second (b) Transient response following a unit step function input ~- Q o~----~-------------------------------------------------Time, seconds (c) FIG. 12. Sketches showing nomenclature used in the design charts of Figs. 13 to 30. (b) Single loop, unity feedback systems containing only series elements; of course a multiloop system may be considered if the inner loops are reduced to equivalent series elements. (c) Systems whose open loop characteristics fall into the category of servomechanisms described by Fig. 12. However, systems which ostensibly are not of this type may often be approximated by some which are, especially if the required approximations occur at an appreciable distance from the crossover frequency, We. (d) A step function as the form of the input signal producing the transient response. TRANSIENT AND FREQUENCY RESPONSE I 2.4 I ~ / 1.8 J C,) 1.6 1.4 / v / / J ,y 1.2 I I 1I I lL I J 1 L ; ! I) II / i ) II / / /1/ /1/ IL / LV '/ III = 40 I l..- II ~L 'r1 I Ii 1- t-- I-~~ ! -L 1- _-f/ .' :;::. 0.4 0.2 0 w3 I 11 ]I ........ - - ._-- --- -... III = 80 III = 60 0.01 - = 1 -/ . '-I- III = 40 WI '- ,! 1 1 1 i / I I..,I,<'L / - ~' --1 1/ v' L ~ ---- Wetp - - wets - - - _ WI to W2 = 40 db/decade _ W3 to 00 = 40 db/decade , -...:.....- ~ III = 30 0.1 / Wt we ---- /A II~ We i/ ~ i J -i- ~ "1-r-- h~~ ." -1/ j I ..... --- We I I ; 1/ 0 :J %Ip v J II -:::.-... 1.0 Q) Rm 0.1 ! 0 ...... 0.6 ci ------ ~ WI to W2 = 40 db/decade /. w3 to 00 = 40 db/decade ~~ III = 30 J 0.8 i I III = 20 1.2 3 I I 0.01 1.4 ! I III = 60 1.0 ~ >. 0 c: I V./ III = 80 I I J/ / / i ! 
I L / L / / 1/ 1 I I J J1 f I 2.0 I J I II 2.2 22-21 W3 -=1 We t- III = 20 1 We FIG. 13. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). 22-22 FEEDBACK CONTROL WI to W2 = 40 db/d~cade I wa to 00 = 40 db/decadewa - =2 2.4 2.2 ~ I , I I We £1R ----- J, m' 1.8 I i i ! 1/ / / / 1.6 I 1.4 pY"I 1I 1I V 1/ / f.'l = 80 } 1 I If I J // ! /L '1--,- IJ V I ~ tL ~~l = 30 .,d ~ I~~ III = 60 1.0 1 I} ~' I.? 1 I I I III ="'40 II .. k' 0.01 ~""f.'l=20 0.1 WI to ~2 ~ 40 db/d~cade . w3 to 00 = 40 db/decade - 1.4 ~ =2 We 1.2 "- "'- \ .Q e 0.6 / ~ c: Q) g. 0.4 Q) &t 0.2 o ./' 1 I 'I \. ... Ii III = 80 ~ ~ L1 JL 1-~ f'. 1-'1 ,V/1 / ~I i 1'""['. 1 \/ / I ' \. . // / - / J V ) V-"",,- r- III = 60 -- - I V V 11~ 1-,...,. 0.8 _ ~Ip --- - j ; Q 1.2 I II ; ,/ 2.0 - I I VI /'~'I " I I , j. '-j "- wets - - - - I L . "'-J If // W--- III = 40 f.'l = 30, wetp - - - I 1 .../ 'J~ I ,/ i / ~ We 1 Wm _____ We --"'-1'" III = 20 II 0.01 0.1 FIG. 14. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig., 12 for definition of nomenclature (Ref. 3). TRANSIENT AND FREQUENCY RESPONSE 22-23 2.4 2.2 2.0 ~ WI to W2 - 40 db/decade w3 to CIO = 40 db/decade w3 1.8 l> ~Im ------ ~ ~Ip I I 1.6 I / ,V 1.4 I I / I // I I I .// / /V 1.2 1.0 We = 4 I JI~ ~~V III = 4~ ~ ./ III = 80 1~=30 III =.60 " // I V I V ~~ 0.01 I ./ /v /' ~1=20 0.1 1.4 - WI to W2 = 40 db/decade w3 to CIO = 40 db/decade - 1.2 W3 - = 4 We s ....... 1.0 3 0.8 -" Ij -- / ,1 \. \ I '~ I I I / 0.2 ./ I' / III = 80 o I 11 r-t/ I-- \/1 ,/ /['1 1/ -:'1-. .... / .... ~t'-- - l,'p' I~~ ..... / .......... / 'J.~' \ I ~: N ./' ,1 il J l7 /1\1\/ 1/I/J / '1--:// I " Wt We Wm We -- -- ---- - wetp - - wets - - - - b<~J 'J-.. /,.- t<;'r- I~ .......... I J, III = 40 III = 3o IIlI = 20 III = 60 1 0.01 1 j I .......... ; II J - II ~ I 0.1 We FIG. 15. 
Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). 22-24 FEEDBACK CONTROL 2.4 2.2 Wl to w2 = 40 db/decade w3 to 00 = 40 db/decade ~ 2.0 1.8 il Q 1.6 I 1.4 I I 1- J.LI = 80 I // f 9V' I I~ J.Ll = 40 V ~V I // 1~=30. i-"~1=2~"'" 0.1 i-""'J.Ll = 60 1.0 0.01 /' I 1.4 WI to W2 = 40 db/decade w3 to 00 = 40 db / decade 1.2 --~ ----We Wt We W3 =8 We - o ..... wetp--- ~ 1.0 wets - - - 3 \ .Q g. 0.4 \,'l , £ I " "" 0.2 o -J.LI = 80 I II 1 [" /1 1~'1 ..... , I I 11 l 0.01 = ,! t 7" " I I . . . . , ~/ "r-. ,," i L II II It. /'\ Ii 1/ ..... II -" \ '''\ 0.8 ~ cQ) - --- I / V I~ 1.2 p I I ~ 0.6 - ~Im ----- ~ =8 60 1 ~ J 11 ~ \ ,~ .. '/~ in " /r'~ , ~ ~ 1_', j(i, ~,/1'~! ~b-L IL~ ~ ~~~~ ':iL 1" // ~ th-)<- ",I' J.Ll = 41 J.Ll = 30 I -' / lit 1 l L 1 ....... J.Ll = 20 I 0.1 FIG. 16. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). TRANSIENT AND FREQUENCY RESPONSE 22-25 2.4 2.2 Wl to W3 to 2.0 ~ 40 db/decade W2 - 00 = 40 db / decade W3 w.; = 00 ~Im 1.8 - ---- - ~Ip --- c..> 1.6 L' 1.4 I ~~ I .........ILl I ILl = 401" \.// II/V" 1.2 1.0 / = 80 I-""'"ILl /1 ILl / ~ ~~~ /' = 60 -:;::::'" 0.01 ....... = 30 ~ ~IL1=20 =r- 1:::::"10-" 0.1 ~ We 1.4 1.2 0 ..... ~ <> ~ \ 0.4 0.2 0 1\ "'- /1 /'1',/ I ILl = 80 I ----- v' I if, "'- '-...., 1/ /'r', I' / ~l / ~I~r---\ :\ '/.., I y.~ \~, I I 1 / f--..J i 1', I,/-Ii r--.~ // h lL / lb, : " 1 1".1 / ""J / L = 60 { 0.01 / !' "\ . . .l / If "- - I \ Iff. "- -- wets _ _ _ V~" \ 0.6 QJ ~ We 1- s::: J: ~ Wetp 0.8 >u Wt We 1.0 3 :8 II = 40 db/decade 00 = 40 db/decade w3 -We =00 WI to W2 w3 to ~ V / r--[",,-_/l' II l- /-II ~' = 4DJ i/-II = 30 ~ ~ II I / /-II 1 = 2~ 0.1 FIG. 17. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). 
22-26 FEEDBACK CONTROL 2.4 W~ to ~2 ~ t- W3 to CX) = 40 db / decade W3 I- -=1 We 2.2 t- l- 2.0 - t- I I I / II i Q / 1.6 / 1.4 / / / I I ~ 1.8 I J / / ! / J / 11 ~ 1/ I I / IV J..Ll = 60 j ,/ " V/ V/ V , If !J If _L II, / /' /1 / / Y JJ.l = 40 ~ JJ.l = 30 I I I 1.2 , i fi V" II I I / I' / J..Ll = 80 , I ; i ---QIRp ~Im II ,, ,I II I I I I 60 db/dec1ad1e / V / / / I I"""" JJ.l = 20 1.0 0.01 1.4 l- I WI I to W2 w3 to CX) I- 1.2 II- 0 ~ 1.0 3 We 0.8 0 :;::. I- - :! = 1 II I I '~ r-r---jf-,-I---i--+-+-++-HI,-t---I,--+--+-+-+-+-++-H ; ! We Wm I I I I II = 60 db/decade = 40 db/decade Wt .-I I- I 0.1 ---- ,. ! 1/ j !! V ! wetp _ _ wets - - - E 0.6 ~ c:: Q) g. Q) 0.4 ~ -r-- 1---+--+-+-+-++++7.JJ.-l= 80 ~ ...::r:.::- t-, -= 60 '-t-- JJ.l = 40 F--= ~ 30!--==JJ."-1....L_.--=2-;;-0P---ir--t--HI-H I I 0.2 I 0 0.01 0.1 1 FIG. 18. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). TRANSIENT AND FREQUENCY RESPONSE to W'2 =' 60'db/d~C~d~ W3 to co = 40 db/decade W3 f- =2 We 2.4 ~ 2.2 2.0 ~ IIf- I I I -: I Wl I ---*Ip - C Ii , I , I I / / I // 1.4 I r I V 1/ J I ) II /V I I I I / ,, /1 1/ 1/ I 7/I 7 7" / V I J IIV // f/ ~l =,40;::~ /v tY }/ LIb" III = SO/Ill = 6 0 / ~11 III = 30[..- A '---/Ill = 20 _/ 1.0 0.01 11 I' , I V , 1 , , I ; / 1/ / II J // 1.2 I I II 1.8 1.6 I : I I c..> -: I I r I m I I I ,! 22-27 WI 0.1 we 1.4 1.2 s -- Wm We ---- t- Wetp ---- -:;:; 1.0 tOJ 3 0.8 I.2 t- ro ~ 0.6 u c: 60'db/decad~ I i w; to W'2 =' w3 to CO = 40 db/decade W3 t- = 2 We tWt We ff- wets L 1,1 -I I Q) / 0.4 a.'i: o I ! ! II II II II! _II J AI I) i / Ii : If )/ 7f.). 1-1-- 1--r-..~111 V; ! l1'f-j1 ~J / } I..i\'-t--,. 't-~ Ai / f / \ 1 \ 1/ , \// \~" "-r--. J~ -..i.,/ I / II 0.2 1 ! ! i :J g I , III = SO 0.01 " / ---- I /1 /1 I " III = 60 -, I /1/ -I- t".. t-t-.r-. ~ /Tt"-.:::: Il~ ~ ~Ol/ III I Z _/ -- =3~ t-:::j-. Il} =~O 0.1 FIG. 19. 
Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). FEEDBACK CONTROL 22-28 I I 2.4 f- 2.2 ff-' 2.0 - ~ I I I I II f- W\ 10 W2 = 60 db/decade w3 10 ex> = 40 db / decade W3 - =4 We £r I I l- I p / II r / 1/ 1.6 J J / / 1.4 /~ / I / 1/ /IL // .I~ ~ ~ J1.1 = 80_ J1.1 = 60 I II I I I IJ/ I / III II I / / I i /'/ / V ;' l/ I ! / I 40 """ 30 - ,-r:::1=t J1.1~ = - ~ WI; I-~ 3 I- f-- We Wetp wets 0.8 o ~ ~ c: Q) I 1/ i -- 0.4 0.2 o V ~1=20 --- -- --- II II \ ll. 112 If! 0.01 '-- I J1.1 :;= 60 WI We JI I /,1 t-~ J1.1 = 40 ~ r-.. I II II I J / nf 1\ 1f/'-l/1 /I~ 11K 1/ //-f.- V// d 1/ '!-J l'if' KL ~ r-r-; r-----.J j.z (j / I V pz:::. j .")Zr-J --,... I I-L ri 80 I II 1 l V- II hI\. /j J1.1 = II / II 1,1 0/\, \, !J I 1/ I II I \/ 1\ II I II Ii/ 0.6 :J /' 0.1 WI to = 60 db/decade w310 ex> = 40 db/decade W3 -We =4 o ~ 1.0 V w~ ~ 1.2 // 1/11 ~/ 0.01 1.4 I i J J / I 1.8 C) 1.0 Ii I £[ -R 1.2 I. I -~- R m I I tr.:.z r-...I':::' I • ....;' I 1---...... ;1- I 0.1 r-... J1.1 = 30 J1.1 = 20 1 1 FIG. 20. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). TRANSIENT AND FREQUENCY RESPONSE 2.4 II- 2.2 II- 2.0 I- , , '. ' ,, I " to w2 = 60 db/decade wato co = 40 db/decade wa - =8 WI , I : We ---- ~Im I C,) l 1.6 / I / / /// 1.2 -/ IL V V 1;' / / V l V V ~ = 30 I / / / / / / II /~/ ~" 0.01 I I ..... J.l.1 - 80~ = 60-tl= 40 I II I' 1.11 1.0 I I / / L 1.4 I / / I ct: 1.8 , I! If ~IRp 22-29 / I / l/ /1 L / V 1j,1' // / LV ;::" . / ~ .... 1--' J.l.1 = 20 0.1 WI We 1.4 WI' to I- W3 to '- I- o ..... Wt We Wm We ~ 1.0 '- CJJetp 3 0.8 CO :i'-/ = 40 db/decade -w3 =8 We I- 1.2 w~ = '60 db)de~a~e' wets -- .~ ---- ---- \ .'-...,..- o ~>. 0.6 /~ u c: ,/ I"~ tf g 0.4 J. I .lL L II -. 
/1J//...\ .~ / ,/ II 0.2 J.l.l l -l / L L / J I L 1 / /-,1 /1 /1\/~1 /1 /,~ /~~?f' "it jK", - Lt 1/ II /A I//L 1',- ' ;~ 7'--.{j /1- N r'f, 1""'"'7-~ y---- '.j. ...t, /~ ...~If/ ':'-~ tf-"' ..... / ~ = 40 = 331' = 60 1M = 20 = 80/ // II / Ill . . . CI) o \ / XI 1\ 'I 0.01 J.l.1 I / ~ J.l.1 0.1 FIG. 21. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). FEEDBACK CONTROL 22-30 I 2.4 2.2 ~ I I I II 'I 60 db/decade W3 to 00 = 40 db/decade W3 I= 00 We II- 2.0 I I- WI to W2 = f- ! ~Im --QI R p - / J I 1.8 I / J I I 1/ I / / v' 1/ 1.4 1.0 //v ./ V ~ Le- V J.l.1 = 80-_ 60-'--.:: 40 ~ J.l.l -;- ~1 - J.l.l = 0.01 I 1.4 _ II- I I I I 30 I I I / ; ,7 ,/ / / / ,;V / V / ~v ~"'" "I~ 7 // /~ ~1= 20 1 0.1 I II :~ ~~ %2 ~ ~~ ~~~~:~:~: W3 I- 1.2 -:;:;- 1.0 V II I !.I / // 1.2 3 I / I 1.6 u I f Q ~ I I -We = 00 / Wt We /11 Wm We I 1/ / J II 0.01 J II 1\ 0.1 1 FIG. 22. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). 22-31 TRANSIENT AND FREQUENCY RESPONSE 2.4 Wl 10 W2 = 40 db/decade walo co = 60 db/decade wa 2.2 -=1 We / 2.0 ~ 1.8 / / II' / / / Co) II I / / IJ / ~IR p / V / U? / /' J.l.l = 80 J.l.l = 60 J.l.l = 40 1.4 J.l.l = 20 J.l.l = 30 1.2 1.0 0.01 0.1 1.4 Wl 10 W2 = 40 db/decade walo co = 60 db/decade W3 1.2 ..... -... ..... 1.0 3 We -- wetp - - - 0.8 ,- ~ >. u 0.6 5- 0.4 I'--- - f--- c(l) (l) =1 We <> ~ - We 0 '-r It J.l.l = 80 J.l.l = 60 J.l.l = 40 J.l.l = 30 J.l.l = 20 0.2 a 0.01 0.1 FIG. 23. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). FEEDBACK CONTROL 22-32 I 1 : I ! 
,I 2.4 2.2 If I - ~ 1.8 1/ I 1.6 1.4 II I I / / "/ / = 80 III I / ;' v 1/ III = = 60 / 1/ i 40 VIII I II I I / / / I II / = 30 V WI to W2 w3 to co = 40 db/decade = 60 db / decade W3 - 0.01 I II 1.2 8 ~.., 1.0 3 0.8 .Q ~ 0.6 ~ c I I- -- IIr- i I / i i / i _1-::-V --- ./ / 1- -'";-k-- 7 i ; ! II If ! ! Wt We / Wm ....... ~~ ~~ ...... Wetp ;-~~ :::~--f -I- / ! i ! wets -- - --- - -- - --- /1 1- I WI to W2 W3 to co = 40 db/decade 60 db / decade = - W3 - -We =2 IV g. 0.4 --- J: 0.2 o III = 80 III 0.01 ITT ! f / i .' I - 0.1 Ii 1.4 =2 We I 1.0 - " /1l1=20 I 1.2 I / V VI " , %Ip --- / / I I ~Im --- I I / I / / III , I " // v I I C,) / I I , I I ,/ I , f ! I 2.0 , ,, , : 1/ 1 I f = 60 - -:t--tIII = 40 III = 30 III = 20 0.1 FIG. 24. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). 22-33 TRANSIENT AND FREQUENCY RESPONSE WI to W2 W3 to 00 2.4 I , I : I 2.2 ! 2.0 ~ 1.8 I ij I J I J 1.4 1.2 I~ 1/ ~t/ / t ,II 1.0 I /1/ ~ = I~I' 40 .... I .... 0.01 - ,, II I / V vb , m /7 / II /J // /[J ~ J1.1= 60 /' J1.1 I J1.1 = 80 I IJ - %Ip --- I 1/ / J I 1.6 R / I II =4 £1 ---- I I 1/ I 1/ CJ We 1 I I - I I , ! I = 40 db/decade = 60 db/decade W3 11'J1.1 V = 30/J1.1 = 20 I~/' 1 0.1 1.4 WI W3 1.2 -.... 9 ' .......... 3'" 0.8 QJ / \. E 0.6 :>. u cQJ )11 ~ --..... ~ 6- ~ 1\ 1.0 "- Ii I II / \l--' / ~ f,.......... "\ """ a J1.1 = 80 '\ I ] )' 1,1 III', I"-f.--V / 1/ /' Ai /1 Ii / I' t--tJ1.1 = 60 J1.1 = 40 J1.1 = 30 I ~- J1.1 - We 7' .1/ I Wt Wm -We ---- I / - - - - -- / Ii -- / I! / ,I ! / r-t- 0.2 j / // W3 -We =4 ! II j/ 'V J / /-,V II 1/ 1\·. kf:; ~l,L../)..... / 0.4 I---r--, to W2 = 40 db/decade to 00 = 60 db/decade Wetp --- wets ---_ .J 1-_ = 20 I 0.01 0.1 1 FIG. 25. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). 
FEEDBACK CONTROL 22-34 WI to W2 = 40 db/decade W3 to co = 60 db/decade 2.4 I W3 - =8 We 2.2 ~Im ~Ip 2.0 ~ C,,) I I ! 1.6 I ! I I I II I / II / 1 il / / / ~ILI = 80 " I I / '/ ~~l fil=60 I............ ~l= = 40 /'" 30 01=20 ~ !-H1 0.01 II // /v /V // 1.0 0.1 I I w3 - =8 We 1.2 S I 3 "- 0.8 ~ ...... LL. 0 ....--, I/~ ILl \/1 / r.. ... == 80 / "~ '/ r;"" /' I ILl = 60 0.01 I 1\.rJI- ~ 1'/ r'}ll .. / ~ ...... ')'. . . . . 1 ......... ....... r.," " ~y.. / .I ,'/ l , ,..-,1'\. rl -....... / / I /7 \ I) ,/ cQ) 0.2 1,/ \ ~/V 0.6 0.4 Wt We I ~ 1.0 ~ I I WI to w2 == 40 db/decade w3 to co = 60 db/decade 1.4 C' ------- - 1.2 :::l - I 1.8 1.4 ~ >. u I ,II ILl = 40 1/ I I - - --- II Wm ----We .4 // Wetp wets ~(V / .j.J --.--- - , 1'7' /, f" )<1' r-- // I J A'-JI . . . . . II;( / ILl = 30 0.1 ILl = 20 1 FIG. 26. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). TRANSIENT AND FREQUENCY RESPONSE -r I 2.4 r- W3 ~ 2.2 f- r- 2.0 ~ f- I I to W2 = 60 db/decade to 00 = 60 db/decade WI £1Rm We il : =2 ---- I I C RIp - - I / C) 1.6 V iJ.l / / / /" I V / = 80 iJ.l : ,,: 40 iJ.l / r 1/ I // V / ~ l/ /1.1 - If 1 II,' II /rl I/~ = 60 l /1 / / 1.4 I I / I / 1/ II / 1.8 II II / ! I I -: 11 / I 22-35 J;,v // V l/ / = 30 ~I 1.2 / / / =20 1.0 0.01 0.1 Wt 1 We 1.4 r- WI W3 to W2 to 00 W3 t- 1.2 tf- ~ ~ <> 1.0 t- 3 . - We Wt We Wm We t- Wetp 0.8 = 60db/d~C;d:= 60 dbJdecade t- wets = 2 i / "-- 'r- ..... 1- ~ - It:="": 0 e I ; i I ,I ---- ---- Ii ,/ --- 1/ II I / / / I ~.-;; ~ !" ! fC:.":-e:...... - ~r-;.... - ~/ 0.6 ../ /' ~ t:: (I) ::J 0' 0.4 (I) ~ It 0.2 iJ.l = 80 iJ.l F = 60 /.1.1 = 40 /.1.1 = 30 -- - /.1.1 = 20 a 0.01 ~ 0.1 1 We FIG. 27. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).' 
[Charts for Fig. 28 not reproduced. Upper chart: $C/R|_m$ and $C/R|_p$; lower chart: $\omega_t/\omega_c$, $\omega_m/\omega_c$, $\omega_c t_p$, and $\omega_c t_s$; curves for $\mu_1$ = 20, 30, 40, 60, and 80. Slope from $\omega_1$ to $\omega_2$ = 60 db/decade; slope from $\omega_3$ to $\infty$ = 60 db/decade; $\omega_3/\omega_c = 4$.]
FIG. 28. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

[Charts for Fig. 29 not reproduced. Same quantities and curves as Fig. 28. Slope from $\omega_1$ to $\omega_2$ = 60 db/decade; slope from $\omega_3$ to $\infty$ = 60 db/decade; $\omega_3/\omega_c = 8$.]
FIG. 29.
Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3).

[Charts for Fig. 30 not reproduced. Upper chart: $C/R|_m$ and $C/R|_p$; lower chart: $\omega_t/\omega_c$, $\omega_m/\omega_c$, $\omega_c t_p$, and $\omega_c t_s$; curves for $\mu_1$ = 20, 30, 40, 60, and 80. Slope from $\omega_1$ to $\omega_2$ = 40 db/decade; slope from $\omega_3$ to $\infty$ = 60 db/decade; $\omega_3/\omega_c = 4$.]
FIG. 30. Charts giving comparison of steady-state frequency response and transient response following a step. See Fig. 12 for definition of nomenclature (Ref. 3). Note that $\omega_2/\omega_c$ is the abscissa for these charts.

Uses of Charts. Figures 13 through 30 are comparisons of steady-state frequency response characteristics and transient response following a step function of input as a function of $\omega_1/\omega_c$ (Ref. 3). The information presented in Figs. 13 to 30 is useful for analysis, that is, determining the response of systems already designed, or for synthesis, that is, determining what sort of system will be required to do a specified job. Typical examples of each are presented in the following.

EXAMPLE 1. Analysis. Determine, approximately, the value of $C/R|_m$ and the frequency at which it occurs, and the magnitude of the peak overshoot to a step function input and the time when it occurs, for a system having the open loop transfer function

(57)  $C/E = \dfrac{10}{s(1 + 0.1s)}$,

which is drawn in Fig. 31.

[Gain plot not reproduced: gain in decibels versus $\omega$ in radians/second from 0.1 to 100, with $\mu_1$, $\omega_1$, $\omega_2$, and $\omega_3$ marked, and a block diagram of the loop with forward element $10/[s(1 + 0.1s)]$.]
FIG. 31. Gain plot for example problem.

From this,

(58)  $\omega_3 = \omega_c = 10$, $\mu_1 = 20$ db.

The values for $\omega_1$ and $\mu_1$ were arbitrarily selected; of course, choosing a value for one fixes the value of the other. Since there is no segment between $\omega_1$ and $\omega_2$ with either 40 or 60 db per decade slope, the chart with the lower attenuation rate, 40 db per decade, will be used. By entering the chart in Fig. 13 with the following parameters:

(59)  $\mu_1 = 20$, $\omega_1/\omega_c = 0.1$, $\omega_3/\omega_c = 1.0$, $\omega_1$ to $\omega_2$ = 40 db/decade, $\omega_3$ to $\infty$ = 40 db/decade,

the desired information may be obtained:

(60)  $C/R|_m \approx 1.16$,
      $\omega_m/\omega_c \approx 0.7$, $\omega_m \approx (0.7)(10) = 7.0$,
      $C/R|_p \approx 1.18$,
      $\omega_c t_p \approx 3.6$, $t_p \approx 0.36$ sec.

EXAMPLE 2. Synthesis. The requirements of a position control system are assumed to be set forth in the following set of specifications:

1. $C/R|_m$ = 1.3 or less.
2. $\omega_m$ = 2 cycles per second or more.
3. Velocity error coefficient ($K_v$) is 200 sec$^{-1}$.
4. The attenuation rate of the open loop control will be 60 db per decade for frequencies greater than 120 rad per second.

The problem is to determine the open loop transfer function of a suitable control for this application. The frequency $\omega_c$ must be considerably greater than $\omega_m$, so as a first assumption take $\omega_c$ = 30 rad per second and $\omega_3$ = 120 rad per second. For the specifications given there are many solutions to the problem. Figure 28, for which $\omega_3/\omega_c = 4$, shows that for $\mu_1$ = 40 db and $\omega_1/\omega_c = 2/30 = 0.067$,

(61)  $C/R|_m = 1.25$,
      $\omega_m/\omega_c = 0.6$,
      $\omega_m = (0.6)(30) = 18$ rad/sec.
Thus, an open loop transfer function satisfying the specifications is

(62)  $\dfrac{C(s)}{E(s)} = \dfrac{200(1 + 0.2s)^2}{s(1 + 0.5s)^2(1 + 0.00833s)^2}$.

Synthesis by means of the charts is basically a trial-and-error process: assuming the solution and checking it.

TABLE 1. RULE-OF-THUMB APPROXIMATIONS

Parameter: Time to peak
Approximation: $t_p \approx \pi/\omega_c$
  where $t_p$ = time from step input to peak value of response transient, seconds
        $\omega_c$ = open loop crossover frequency, radians/second
Remarks: In Chestnut and Mayer's charts it is evident that for this general class of servomechanisms, those with a dominant complex pair of closed loop poles, the open loop crossover frequency, $\omega_c$, times the time to peak, $t_p$, is about 3 or $\pi$. In other words, the time to peak is about half the period corresponding to the open loop crossover frequency.

Parameter: Peak overshoot
Approximation: $C/R|_p \approx 0.85 M_m$
  where $C/R|_p$ = peak value of transient response to a step input
        $M_m$ = maximum value of closed loop frequency response
Remarks: The peak value of the transient response, $C/R|_p$, to a unit step input is generally less than the maximum steady-state value, $M_m$, of the closed loop frequency response. The maximum value of $C/R|_p$ generally approaches 2.0 while the maximum value of $M_m$ approaches infinity. For many applications "good" servos are those with values of $M_m$ between 1.3 and 1.5. For these servos $M_m$ is generally 10 to 20% greater than $C/R|_p$.

Parameter: Damping ratio
Approximation: $\zeta = 1/(2M_c)$
  where $\zeta$ = damping ratio
        $M_c$ = value of closed loop frequency response at the corner frequency
Remarks: The damping ratio may be approximated from the value of the closed loop frequency response of the system at the corner frequency (the frequency at which the lines asymptotic to the log magnitude curve intersect). This is exact for a second order system. Of course this relationship may also be used to estimate $M_c$, knowing the damping ratio. In addition, $M_c$ is approximately equal to $M_m$ for systems with low damping ratios.

Parameter: Settling time
Approximation: $t_s\,(5\%) \approx 3\sqrt{1 - \zeta^2}/(\zeta\omega_d)$
               $t_s\,(2\%) \approx 5\sqrt{1 - \zeta^2}/(\zeta\omega_d)$
               $t_s\,(5\%) \approx 3T_{eq}$
  where $t_s$ = time for response to step input to settle to within some per cent of final value, seconds
        $T_{eq}$ = time for response to reach 63% of final value
        $\omega_d$ = damped natural frequency, radians/second
        $\zeta$ = damping ratio
Remarks: The settling time, $t_s$, is generally defined as the time for the system to settle to within 5 or sometimes 2% of the final value. In either case it is quite difficult to predict $t_s$ for an underdamped system because it is subject to fluctuations of about one-half the period of oscillation for only small changes in system parameters. However, approximations (see eq. 18) can be made. The last approximation is for an overdamped system.

Parameter: Equivalent time constant
Approximation: $T_{eq} \approx 1/\omega_c$
  where $T_{eq}$ = time for response to step input to reach 63% of final value, seconds
        $\omega_c$ = gain crossover frequency, radians/second
Remarks: This relationship is exact for a simple single time constant system, but also quite good for the general case (Ref. 4).

Parameter: Oscillation frequency
Approximation: $\omega_t \approx \omega_m \approx 0.75\omega_c$
  where $\omega_t$ = oscillation frequency of transient response, radians/second
        $\omega_m$ = frequency at which $M_m$ occurs, radians/second
        $\omega_c$ = open loop gain crossover frequency, radians/second
Remarks: The frequency of oscillation of the transient response, $\omega_t$, is generally about equal to the frequency, $\omega_m$, at which the frequency response peak, $M_m$, occurs. Both $\omega_m$ and $\omega_t$ are usually less than $\omega_c$, the open loop crossover frequency. For the "good" servos with $M_m$ = 1.3 to 1.5 an approximate relationship is as indicated. In this approximation $\omega_t$ is used to mean essentially the same thing as $\omega_d$, the damped natural frequency, previously defined for a system with a dominant complex pair of poles. The use of $\omega_t$ places no restriction on the system characteristics; however, generally, there is no significant difference between $\omega_t$ and $\omega_d$.

Parameter: Rise time
Approximation: $t_r\omega_t \approx t_r\omega_m \approx 1.3$
  where $t_r$ = rise time (10 to 90%)
        $\omega_t$ = (defined above)
        $\omega_m$ = (defined above)
Remarks: The system's rise time, $t_r$, which is here considered to be the time for the response to a step input to go from 10 to 90% of its final value, may be approximated as indicated for systems with an $M_m$ value of about 1.3 to 1.5.

Parameter: Phase margin at crossover frequency
Approximation: $\gamma_c \geq 40^\circ$
  where $\gamma_c$ = open loop phase margin at the crossover frequency
Remarks: A phase margin of 40° at the unity gain (crossover) frequency generally corresponds to an $M_m$ ratio of approximately 1.5. Since this value of $M_m$ is the maximum ordinarily considered feasible, the phase margin should be 40° or greater.

6. APPROXIMATE RELATIONS-RULES OF THUMB

There are several approximations or rules of thumb which can be quite useful when time or facilities are not available for a more exact analysis. They may also be used as rough checks on the results of a more extensive analysis. The more common of these rules of thumb are presented in Table 1. They must be used with caution because, being approximations, they cannot apply with equal validity to all servo systems; and the approximations for transient response are applicable only for step inputs.

7. NUMERICAL AND GRAPHICAL TECHNIQUES OF RELATING TRANSIENT AND FREQUENCY RESPONSE

The numerical techniques presented involve only routine calculations and provide a point by point determination of the related response without the need of obtaining the closed loop poles or other intermediate quantities. The methods presented require the following assumptions: (a) The system is linear. (b) The system frequency response approaches zero as the frequency approaches infinity.
(c) The system's transient response begins with the system initially at rest. (d) The system is stable. These requirements are satisfied by most servo systems. Even a nonlinear system may generally be considered linear over a restricted operating range.

Determining Transient Response from Frequency Response. A relatively simple method for obtaining the time response to an impulse function input, knowing the frequency response, was developed by Floyd (Ref. 5). He derives the exact inverse transformation and then presents a method for numerically performing the necessary integration. The exact transformation is

(63)  $c(t) = \dfrac{2}{\pi}\displaystyle\int_0^\infty \operatorname{Re}[G(j\omega)] \cos t\omega \; d\omega$,

where $G(j\omega)$ is the closed loop frequency response of the system considered. Floyd's procedure for evaluating this integral is to plot the real part of the closed loop frequency response, and then approximate the curve by a series of straight-line segments. This approximation is then treated as a summation of trapezoids. Equation (63) is applied to each trapezoid and the resulting time functions are added to obtain $c(t)$.

FIG. 32. Geometry of a trapezoid for approximating the real part of response function.

Each particular trapezoid is defined as indicated in Fig. 32. Performing the integration indicated in eq. (63), the value for the integral is

(64)  $\dfrac{2A_1}{\pi}\left(\dfrac{\sin \omega_1 t}{\omega_1 t}\right)\left(\dfrac{\sin \Delta_1 t}{\Delta_1 t}\right)$,

where $A_1$ is the area of the trapezoid (its height multiplied by $\omega_1$), and $\omega_1$ and $\Delta_1$ are defined by the figure. The value of $c(t)$ is then the summation for all the trapezoids:

(65)  $c(t) = \displaystyle\sum_{n=1}^{N} \dfrac{2A_n}{\pi}\left(\dfrac{\sin \omega_n t}{\omega_n t}\right)\left(\dfrac{\sin \Delta_n t}{\Delta_n t}\right)$.

EXAMPLE. Assume the closed loop frequency response, $G(j\omega)$, of the system to be expressed mathematically as

(66)  $G(j\omega) = \dfrac{18.72}{[(j\omega + 1)^2 + 1][(j\omega + 0.6)^2 + 9]}$.

From this the real part of $G(j\omega)$ is calculated and plotted as shown in Fig. 33. The values used for $\omega$, $\Delta$, and $A$ of the series in eq.
(65) are:

$\omega_1 = \dfrac{1.2 + 0.5}{2}$,  $\Delta_1 = \dfrac{1.2 - 0.5}{2}$,  $A_1 = 1 \times 0.85$;
$\omega_2 = \dfrac{2.0 + 1.2}{2}$,  $\Delta_2 = \dfrac{2.0 - 1.2}{2}$,  $A_2 = 0.66 \times 1.6$;
$\omega_3 = \dfrac{3.5 + 2.6}{2}$,  $\Delta_3 = \dfrac{3.5 - 2.6}{2}$,  $A_3 = 0.66 \times 3.05$;
$\omega_4 = \dfrac{7.2 + 3.6}{2}$,  $\Delta_4 = \dfrac{7.2 - 3.6}{2}$,  $A_4 = 0.07 \times 5.4$;
$\omega_5 = \dfrac{3.6 + 3.5}{2}$,  $\Delta_5 = \dfrac{3.6 - 3.5}{2}$,  $A_5 = 0.07 \times 3.55$.

[Plot not reproduced: real part of $G(j\omega)$ versus frequency $\omega$ from 0 to 10, showing the exact curve and its straight-line approximation, with trapezoids 1 through 5 marked.]
FIG. 33. Real part of response function and approximation.

[Plot not reproduced: the five trapezoids drawn separately against frequency in radians/second. The area between the curve and the $\omega$ axis is (1) + (2) - (3) + (4) - (5).]
FIG. 34. Illustrating the trapezoids resulting from the straight-line approximation shown in Fig. 33.

Figure 34 illustrates the trapezoidal approximations used for Fig. 33 and the foregoing calculations. The evaluation of eq. (65) then becomes

(67)  $c(t) = \dfrac{2}{\pi}\left[ 0.85\left(\dfrac{\sin 0.85t}{0.85t}\right)\left(\dfrac{\sin 0.35t}{0.35t}\right) + 1.07\left(\dfrac{\sin 1.6t}{1.6t}\right)\left(\dfrac{\sin 0.4t}{0.4t}\right) - 2.01\left(\dfrac{\sin 3.05t}{3.05t}\right)\left(\dfrac{\sin 0.45t}{0.45t}\right) + 0.38\left(\dfrac{\sin 5.4t}{5.4t}\right)\left(\dfrac{\sin 1.8t}{1.8t}\right) - 0.25\left(\dfrac{\sin 3.55t}{3.55t}\right)\left(\dfrac{\sin 0.05t}{0.05t}\right) \right]$.

The sin x/x tables (see Table 3) may be used to facilitate the evaluation of this equation at various values of $t$.

The exact solution, obtained by the inverse Laplace transformation, gives this result:

(68)  $c(t) = 2.28 \exp(-t) \sin(t + 5.6^\circ) - 0.761 \exp(-0.6t) \sin(3t + 17^\circ)$.

For comparison both eqs. (67) and (68) are plotted in Fig. 35. This is the system time response to a unit impulse function.
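The comparison of eqs. (67) and (68) can also be made numerically. The following is a minimal sketch, assuming NumPy is available; the function and variable names are illustrative and are not part of Floyd's presentation. It sums the trapezoid terms of eq. (65) with the signed areas used in eq. (67) and compares the result with eq. (68) over the first three seconds.

```python
import numpy as np

def floyd_impulse_response(t, trapezoids):
    """Sum the trapezoid terms of eq. (65) at the times t (seconds).

    trapezoids holds (signed area A_n, omega_n, Delta_n) for each trapezoid.
    """
    t = np.asarray(t, dtype=float)
    total = np.zeros_like(t)
    for area, omega, delta in trapezoids:
        # np.sinc(x) is sin(pi x)/(pi x), so the argument is divided by pi
        # to obtain sin(omega t)/(omega t); this also handles t = 0 cleanly.
        total += (2.0 * area / np.pi) \
            * np.sinc(omega * t / np.pi) * np.sinc(delta * t / np.pi)
    return total

def exact_impulse_response(t):
    """Closed-form impulse response of eq. (68)."""
    t = np.asarray(t, dtype=float)
    return (2.28 * np.exp(-t) * np.sin(t + np.radians(5.6))
            - 0.761 * np.exp(-0.6 * t) * np.sin(3.0 * t + np.radians(17.0)))

# Signed areas and frequencies exactly as they appear in eq. (67).
TRAPEZOIDS = [
    (0.85, 0.85, 0.35),
    (1.07, 1.60, 0.40),
    (-2.01, 3.05, 0.45),
    (0.38, 5.40, 1.80),
    (-0.25, 3.55, 0.05),
]

t = np.linspace(0.0, 3.0, 31)
approx = floyd_impulse_response(t, TRAPEZOIDS)
exact = exact_impulse_response(t)
max_error = float(np.max(np.abs(approx - exact)))
print(f"largest difference between eqs. (67) and (68): {max_error:.3f}")
```

Because the five trapezoids only approximate the real-part curve, the two responses should agree closely but not exactly, which is the same comparison Fig. 35 makes graphically.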
If the response to a step function is desired instead, the graphical integration of the curve for the impulse response provides it (Ref. 5).

[Plot not reproduced: response versus time in seconds from 0 to 3.0.]
FIG. 35. Transient response for illustrative problem.

Determining Frequency Response from Transient Response. Quite often the frequency response characteristics of a component or system need to be known but it is difficult to introduce a sinusoidal signal or to measure magnitude and phase shift of the output. In many cases it is much simpler to introduce an impulse or step input; and since time and frequency responses are uniquely related, it is possible to obtain the frequency response from the transient response. There are several approximate methods which have been developed for accomplishing this. Floyd's trapezoidal approximation method may be used but it yields only the real part of $G(j\omega)$. To obtain the total vector magnitude and phase shift a set of curves such as those presented by Bode (Ref. 6) must be used. Other methods have been developed by Bedford and Fredendall, by Teasdale, Brooks and German, and by Samulon (Refs. 7, 8, and 9).

Samulon's Method. While the approaches vary somewhat, the results are the same, with the exception that Samulon's final equation has a "correction" term which makes it more accurate than the others. His procedure is presented here. Its basis is:

SHANNON'S SAMPLING THEOREM. If a function $c(t)$ contains no frequencies higher than $f_{co}$ cycles per second it is completely determined by giving its ordinates at a series of points spaced $1/(2f_{co})$ seconds apart.
Nearly any transient response curve will have some limiting value for its frequency spectrum, due either to the properties of the system or to the test equipment itself; for example, the bandpass of the oscillograph might be the limiting item. Shannon has also pointed out that such a function, with limited frequency components, can be exactly synthesized by a sum of sin x/x functions in a manner indicated in Fig. 36.

[Plot not reproduced: response versus $t$ in seconds, built up from sin x/x functions.]
FIG. 36. Use of sin x/x function to approximate transient response.

The equation resulting from this approach is

(69)  $G(j\omega) = \dfrac{(\pi/2)(\omega/\omega_{co})}{\sin[(\pi/2)(\omega/\omega_{co})]} \exp\left(j\omega\dfrac{\tau}{2}\right) \displaystyle\sum_{n=0}^{\infty} B_n \exp(-j\omega n\tau)$,

where $B_n$ = the increment in the time response curve for a step function input,
      $\omega$ = the frequency of interest, radians/second,
      $\omega_{co}$ = the cutoff frequency for the system, radians/second,
      $\tau$ = the sampling interval, equal to $\pi/\omega_{co}$.

Equation (69) would be exact if the system response contained no frequency components greater than $\omega_{co}$. This will never be absolutely true in a practical system but good results may be obtained nevertheless. In choosing the nominal cutoff frequency, $\omega_{co}$, the attempt should be made to estimate the frequency at which the steady-state frequency response is attenuated by at least 20 db. A good estimate of $\omega_{co}$ is ten times $\omega_c$, the crossover frequency, as approximated in Table 1. The calculated response will be in error at frequencies lower than the $\omega_{co}$ selected if the true response contains higher frequencies. It is therefore desirable to have a frequency characteristic which attenuates rapidly above the $\omega_{co}$ selected for calculation. If the system or instrumentation does not provide this attenuation a filter may be added.
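Equation (69) lends itself to direct computation. The following is a minimal sketch, assuming NumPy is available; the function name and the first-order test system are illustrative assumptions and do not come from Samulon's paper. The increments $B_n$ are formed as differences of successive step-response samples, which presumes the system starts at rest.

```python
import numpy as np

def samulon_frequency_response(step_samples, tau, omega):
    """Estimate G(j omega) from step-response samples by eq. (69).

    step_samples: c(0), c(tau), c(2 tau), ... for a unit step input.
    tau: sampling interval in seconds (pi / omega_co).
    omega: frequency of interest in radians/second; it must be nonzero
           and should lie below the cutoff frequency omega_co.
    """
    c = np.asarray(step_samples, dtype=float)
    increments = np.diff(c)              # B_1, B_2, ... (B_0 is zero at rest)
    n = np.arange(1, len(c))
    series = np.sum(increments * np.exp(-1j * omega * n * tau))
    half_angle = omega * tau / 2.0       # same as (pi/2)(omega/omega_co)
    magnitude_correction = half_angle / np.sin(half_angle)
    phase_correction = np.exp(1j * half_angle)
    return magnitude_correction * phase_correction * series

# Hypothetical check case (not from the text): a single time constant
# system with step response 1 - exp(-t), whose exact response is 1/(1 + j omega).
tau = 0.1
t = np.arange(0, 201) * tau
step = 1.0 - np.exp(-t)
estimate = samulon_frequency_response(step, tau, omega=1.0)
exact = 1.0 / (1.0 + 1.0j)
print(abs(estimate), np.degrees(np.angle(estimate)))
```

The test system is not strictly band limited, so the estimate is close to, rather than identical with, the exact value; the residual difference is the kind of error the choice of $\omega_{co}$ is meant to keep small.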
Samulon states "the amount of error, which will be largest near the nominal cutoff frequency, $\omega_{co}$, will be in general smaller than the amplitude response at $\omega_{co}$, provided that the response does not rise again above its value at $\omega_{co}$ for frequencies greater than $\omega_{co}$." The calculated frequency response will indicate how valid the assumption of the cutoff frequency was. Note that with use of a lower $\omega_{co}$ fewer points must be calculated.

[Plot not reproduced: response to unit step input versus time in seconds from 0 to 5.0, with the sampling points marked.]
FIG. 37. Transient response for illustrative problem.

EXAMPLE. Assume a system with a time response to a step input as shown in Fig. 37. By assuming a system cutoff frequency $\omega_{co}$ of 15.7 rad per second,

(70)  $f_{co} = 15.7/2\pi = 2.5$ cps.

By Shannon's theorem the sampling interval should be

(71)  $\tau = 1/(2f_{co}) = 0.2$ sec.

By reading the ordinates from the curve at the sampling points Table 2 is constructed. For computational convenience the frequency response at $\omega = \pi/1.6$ will be computed.

TABLE 2. FREQUENCY RESPONSE CALCULATED BY SAMULON'S METHOD FOR $\omega = \pi/1.6$

                               $B_n \exp(-j\omega n\tau)$
$n\tau$    $c(n\tau)$    $B_n$       Real      Imaginary
0.2     0.57     0.57     +0.527    -0.218
0.4     0.93     0.36     +0.254    -0.254
0.6     1.13     0.20     +0.076    -0.185
0.8     1.22     0.09      0        -0.090
1.0     1.23     0.01     -0.004    -0.009
1.2     1.20    -0.03     +0.021    +0.021
1.4     1.15    -0.05     +0.046    +0.019
1.6     1.09    -0.06     +0.060     0
1.8     1.04    -0.05     +0.046    -0.019
2.0     1.00    -0.04     +0.028    -0.028
2.2     0.97    -0.03     +0.011    -0.028
2.4     0.95    -0.02      0        -0.020
2.6     0.94    -0.01     -0.004    -0.009
2.8     0.94     0         0         0
3.0     0.95     0.01     +0.009    +0.004
3.2     0.96     0.01     +0.010     0
3.4     0.97     0.01     +0.009    -0.004
3.6     0.98     0.01     +0.007    -0.007
3.8     0.99     0.01     +0.004    -0.009
4.0     1.00     0.01      0        -0.010
4.2     1.00     0         0         0
Sum of positive terms     +1.108    +0.044
Sum of negative terms     -0.008    -0.890
Total                     +1.10     -0.846

(72)  $\sum B_n \exp(-j\omega n\tau) = 1.10 - j0.846 = 1.385\,\angle{-37.6^\circ}$.

The vectors which are shown resolved and added numerically in Table 2 may be added graphically by simply plotting them end to end.
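The summation of eq. (72) can be checked with a few lines of code. This is a minimal sketch using only the Python standard library; the variable names are illustrative. The ordinates are those read from Fig. 37 at 0.2-second intervals, and each increment $B_n$ is taken as the difference between successive ordinates, starting from zero because the system is initially at rest.

```python
import cmath
import math

# Ordinates c(n tau) from Table 2, for n tau = 0.2, 0.4, ..., 4.2 seconds.
ordinates = [0.57, 0.93, 1.13, 1.22, 1.23, 1.20, 1.15, 1.09, 1.04, 1.00,
             0.97, 0.95, 0.94, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00,
             1.00]
tau = 0.2                  # sampling interval, seconds, eq. (71)
omega = math.pi / 1.6      # frequency of interest, radians/second

previous = 0.0             # c(0) = 0 for a system initially at rest
total = 0j
for n, value in enumerate(ordinates, start=1):
    b_n = value - previous                        # increment B_n
    total += b_n * cmath.exp(-1j * omega * n * tau)
    previous = value

magnitude = abs(total)
phase_deg = math.degrees(cmath.phase(total))
print(f"sum = {total.real:.3f} {total.imag:+.3f}j")
print(f"    = {magnitude:.3f} at {phase_deg:.1f} degrees")
```

Small differences from the tabulated total are to be expected, since the table rounds each term to three places before adding.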
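The correction terms in eq. (69) will now be computed.

Magnitude correction:

(73)  $\dfrac{(\pi/2)(\pi/1.6)(1/5\pi)}{\sin[(\pi/2)(\pi/1.6)(1/5\pi)]} = \dfrac{\pi/16}{\sin(\pi/16)} = 1.007$.

Phase correction:

(74)  $\exp(j\omega\tau/2) = \exp[j(\pi/1.6)(0.1)] = 1\,\angle{11.2^\circ}$.

The final result is:

(75)  $G(j\omega) = \dfrac{(\pi/2)(\omega/\omega_{co})}{\sin[(\pi/2)(\omega/\omega_{co})]} \exp(j\omega\tau/2) \sum B_n \exp(-j\omega n\tau) = (1.007)(1.0\,\angle{11.2^\circ})(1.385\,\angle{-37.6^\circ}) = 1.395\,\angle{-26.4^\circ}$.

The function chosen as an example is:

(76)  $c(t) = 1 - \exp(-2t) + \exp(-t) \sin 1.5t$,

(77)  $G(s) = \dfrac{C(s)}{R(s)} = \dfrac{3.5s^2 + 7s + 6.5}{(s + 2)[(s + 1)^2 + 2.25]}$,

(78)  $G(j\omega) = \dfrac{(1 - 0.538\omega^2) + j(1.077\omega)}{(1 - 0.616\omega^2) + j(1.115\omega - 0.154\omega^3)}$.

The computed point is shown plotted on the exact $G(j\omega)$ curve in Fig. 38.

[Plot not reproduced: magnitude in decibels and phase shift in degrees of $G(j\omega)$ versus frequency $\omega$ in radians/second from 0.1 to 100, with the computed magnitude and computed phase angle marked.]
FIG. 38. Exact response curves and calculated points for example problem.

The sin x/x values given in Table 3 may be used to aid in computing the magnitude correction. Generally, these correction terms will be negligible; however, if accuracy is important they should be checked. Samulon (Ref. 9) presents a series of tables and nomographs which are useful if extensive work of this kind is to be done.

The previous example illustrates the fact that the number of calculations required makes the whole problem rather tedious. If a great amount of such work is to be done, a special purpose analog computer may be used (Ref. 10).

TABLE 3. A FOUR-PLACE TABLE OF sin x/x (Ref. 11)
(Entries are sin x/x multiplied by 10,000; the column heading gives the second decimal place of x.)

  x       0      1      2      3      4      5      6      7      8      9
 0.0  10000  10000   9999   9999   9997   9996   9994   9992   9989   9987
 0.1   9983   9980   9976   9972   9967   9963   9957   9952   9946   9940
 0.2   9933   9927   9919   9912   9904   9896   9889   9879   9870   9860
 0.3   9851   9840   9830   9820   9808   9797   9785   9774   9761   9748
 0.4   9735   9722   9709   9695   9680   9666   9651   9636   9620   9605
 0.5   9589   9572   9555   9538   9521   9503   9486   9467   9449   9430
 0.6   9411   9391   9372   9351   9331   9311   9290   9269   9247   9225
 0.7   9203   9181   9158   9135   9112   9089   9065   9041   9016   8992
 0.8   8967   8942   8916   8891   8865   8839   8812   8785   8758   8731
 0.9   8704   8676   8648   8620   8591   8562   8533   8504   8474   8445
 1.0   8415   8384   8354   8323   8292   8261   8230   8198   8166   8134
 1.1   8102   8069   8037   8004   7970   7937   7903   7870   7836   7801
 1.2   7767   7732   7698   7663   7627   7592   7556   7520   7484   7448
 1.3   7412   7375   7339   7302   7265   7228   7190   7153   7115   7077
 1.4   7039   7001   6962   6924   6885   6846   6807   6768   6729   6690
 1.5   6650   6610   6570   6530   6490   6450   6410   6369   6328   6288
 1.6   6247   6206   6165   6124   6083   6042   6000   5959   5917   5875
 1.7   5833   5791   5749   5707   5665   5623   5580   5538   5495   5453
 1.8   5410   5368   5325   5282   5239   5196   5153   5110   5067   5024
 1.9   4981   4937   4894   4851   4807   4764   4720   4677   4634   4590
 2.0   4546   4503   4459   4416   4372   4329   4285   4241   4198   4153
 2.1   4111   4067   4023   3980   3936   3893   3849   3805   3762   3718
 2.2   3675   3632   3588   3545   3501   3458   3415   3372   3328   3285
 2.3   3242   3199   3156   3113   3070   3028   2984   2942   2899   2857
 2.4   2814   2772   2730   2687   2645   2603   2561   2519   2477   2436
 2.5   2394   2352   2311   2269   2228   2187   2146   2105   2064   2023
 2.6   1983   1942   1902   1861   1821   1781   1741   1702   1662   1622
 2.7   1583   1544   1504   1465   1427   1388   1349   1311   1273   1234
 2.8   1196   1159   1121   1083   1046   1009    972    935    898    861
 2.9    825    789    753    717    681    646    610    575    540    505
 3.0    470    436    402    368    334    300    266    233    200    167
 3.1    134    102     69     37      5    -27    -58    -90   -121   -152
 3.2   -182   -213   -243   -273   -303   -333   -362   -392   -421   -449
 3.3   -478   -506   -535   -562   -590   -618   -645   -672   -699   -725
 3.4   -752   -778   -804   -829   -855   -880   -905   -930   -954   -978

TABLE 3.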
11) 5 6 7 8 9 - - - - -- -- -- - - - - - - - 0.0 0.1 0.2 0.3 0.4 +10000 9983 9933 9851 9735 10000 9980 9927 9840 9722 9999 9976 9919 9830 9709 9999 9972 9912 9820 9695 9997 9967 9904 9808 9680 9996 9963 9896 9797 9666 9994 9957 9889 9785 9651 9992 9952 9879 9774 9636 9989 9946 9870 9761 9620 9987 9940 9860 9748 9605 0.5 0.6 0.7 0.8 0.9 +9589 9411 9203 8967 8704 9572 9391 9181 8942 8676 9555 9372 9158 8916 8648 9538 9351 9135 8891 8620 9521 9331 9112 8865 8591 9503 9311 9089 8839 8562 9486 9290 9065 8812 8533 9467 9269 9041 8785 8504 9449 9247 9016 8758 8474 9430 9225 8992 8731 8445 1.0 1.1 1.2 1.3 1.4 +8415 8102 7767 7412 7039 8384 8069 7732 7375 7001 8354 8037 7698 7339 6962 8323 8004 7663 7302 6924 8292 7970 7627 7265 6885 8261 7937 7592 7228 6846 8230 7903 7556 7190 6807 8198 7870 7520 7153 6768 8166 7836 7484 7115 6729 8134 7801 7448 7077 6690 1.5 1.6 1.7 1.8 1.9 +6650 6247 5833 5410 4981 6610 6206 5791 5368 4937 6570 6165 5749 5325 4894 6530 6124 5707 5282 4851 6490 6083 5665 5239 4807 6450 6042 5623 5196 4764 6410 6000 5580 5153 4720 6369 5959 5538 5110 4677 6328 5917 5495 5067 4634 6288 5875 5453 5024 4590 2.0 2.1 2.2 2.3 2.4 +4546 4111 3675 3242 2814 4503 4067 3632 3199 2772 4459 4023 3588 3156 2730 4416 3980 3545 3113 2687 4372 3936 3501 3070 2645 4329 3893 3458 3028 2603 4285 3849 3415 2984 2561 4241 3805 3372 2942 2519 4198 3762 3328 2899 2477 4153 3718 3285 2857 2436 2.5 2.6 2.7 2.8 2.9 +2394 1983 1583 1196 825 2352 1942 1544 1159 789 2311 1902 1504 1121 753 2269 1861 1465 1083 717 2228 1821 1427 1046 681 2187 1781 1388 1009 646 2146 1741 1349 972 610 2105 1702 1311 935 575 2064 1662 1273 898 540 2023 1622 1234 861 505 3.0 3.1 3.2 3.3 3.4 +470 +134 -182 478 752 436 +102 213 506 778 402 +69 243 535 804 368 +37 273 562 829 334 +5 303 590 855 300 -27 333 618 880 266 -58 362 645 905 233 -90 392 672 930 200 -121 421 699 954 167 -152 449 725 978 3 4 5 6 7 8 9 x 0 1 - - -- - - - - -- - - - - - - - 2 TRANSIENT AND FREQUENCY RESPONSE TABLE x 3. 
0 A FOUR-PLACE TABLE OF 1 I 2 3 4 22-53 sin x/x (Ref. 11) (Continued) 5 6 7 8 9 - -- -- -- -- ----- - 3.5 3.6 3.7 3.8 3.9 -1002 1229 1432 1610 1764 1026 1251 1451 1627 1777 1050 1272 1470 1643 1791 1073 1293 1488 1659 1805 1096 1313 1506 1675 1818 1119 1334 1524 1690 1831 1141 1354 1542 1705 1844 1164 1374 1559 1720 1856 1186 1393 1576 1735 1868 1208 1413 1593 1749 1880 4.0 4.1 4.2 4.3 4.4 -1892 1996 2075 2131 2163 1903 2005 2082 2135 2165 1915 2014 2088 2139 2166 1926 2022 2094 2143 2168 1936 2030 2100 2146 2169 1947 2039 2106 2150 2170 1957 2046 2111 2153 2171 1967 2054 2116 2156 2172 1977 2061 2121 2158 2172 1987 2068 2126 2161 2172 4.5 4.6 4.7 4.8 4.9 -2172 2160 2127 2075 2005 2172 2158 2123 2069 1997 2172 2155 2119 2063 1989 2171 2152 2114 2056 1981 2170 2150 2109 2049 1972 2169 2146 2104 2042 1963 2168 2143 2098 2035 1955 2166 2139 2093 2028 1946 2164 2136 2087 2020 1937 2162 2132 2081 2013 1927 5.0 5.1 5.2 5.3 5.4 -1918 1815 1699 1570 1431 1908 1804 1687 1557 1417 1899 1793 1674 1543 1402 1889 1782 1662 1530 1387 1879 1770 1649 1516 1373 1868 1759 1636 1502 1358 1858 1747 1623 1488 1343 1848 1735 1610 1474 1328 1837 1723 1597 1460 1313 1826 1711 1584 1445 1298 5.5 5.6 5.7 5.8 5.9 -1283 1127 966 800 634 1268 1111 950 784 617 1252 1095 933 768 600 1237 1079 917 751 583 1221 1063 900 734 567 1206 1047 884 718 550 1190 1031 867 701 533 1175 1015 851 684 516 1159 999 834 667 499 1143 982 818 650 482 6.0 6.1 6.2 6.3 6.4 -466 299 -134 +27 182 449 282 -118 43 197 432 265 -102 58 212 416 249 -85 74 227 399 232 -69 90 242 382 216 -53 105 257 365 200 -37 121 272 348 183 -21 136 287 332 167 -5 152 302 315 150 +11 167 316 6.5 6.6 6.7 6.8 6.9 +331 472 604 727 838 346 486 617 738 849 x 0 1 360 374 388 403 417 431 445 458 499 513 526 539 552 566 579 591 630 642 654 667 679 691 703 715 750 761 773 784 795 806 817 828 859 870 880 890 900 910 919 929 - - -- - - -- -- -- - - - - - 2 3 4 5 6 7 8 9 22-54 FEEDBACK CONTROL TABLE x 3. 0 A FOUR-PLACE TABLE OF 1 sin x/x (Ref. 
11) (Continued) 2 3 4 5 6 7 8 9 ---- -- -- -- -- ----- - 7.0 7.1 7.2 7.3 7.4 +939 1027 1102 1165 1214 948 1035 1109 1171 1219 957 1043 1116 1176 1223 966 1051 1123 1181 1227 975 1058 1129 1186 1231 984 1066 1135 1191 1234 993 1074 1142 1196 1238 1002 1081 1148 1201 1241 1010 1088 1153 1206 1244 1019 1095 1159 1210 1248 7.5 7.6 7.7 7.8 7.9 +1251 1274 1283 1280 1264 1254 1275 1279 1262 1256 1277 1284 1278 1259 1259 1278 1284 1277 1257 1261 1279 1284 1275 1255 1264 1280 1283 1274 1252 1266 1281 1283 1272 1249 1268 1282 1282 1270 1246 1270 1282 1282 1269 1243 1272 1283 1281 1267 1240 8.0 8.1 8.2 8.3 8.4 +1237 1197 1147 1087 1017 1233 1193 1142 1080 1010 1230 1188 1136 1074 1002 1226 1183 1130 1067 995 1222 1179 1124 1060 987 1218 1174 1118 1053 979 1214 1169 1112 1046 972 1210 1163 1106 1039 964 1206 1158 1100 1032 956 1202 1153 1093 1025 948 8.5 8.6 8.7 8.8 8.9 +939 854 762 665 563 931 845 752 655 552 923 836 743 645 542 915 827 733 635 532 906 818 724 625 521 898 809 714 614 511 889 800 704 604 500 880 790 694 594 490 872 781 684 584 479 863 771 675 573 469 9.0 9.1 9.2 9.3 9,4 +458 351 242 134 +26 447 340 231 123 +16 437 329 220 112 +5 426 318 210 101 -6 415 307 199 91 -16 404 296 188 80 -27 394 286 177 69 -37 383 275 166 58 -48 372 264 156 48 -58 361 253 145 37 -69 9.5 9.6 9.7 9.8 9.9 -79 182 280 374 462 89 192 290 383 471 100 202 299 392 479 110 212 309 401 487 120 222 318 410 496 131 231 328 419 504 141 241 337 428 512 151 251 346 436 520 161 261 356 445 528 172 271 365 454 536 10.0 10.1 10.2 10.3 10.4 -544 619 686 745 796 552 626 692 751 801 560 633 699 756 805 567 640 705 761 809 575 647 711 767 814 582 653 717 772 818 590 660 723 777 822 1 2 x 0 1~84 597 604 612 667 673 680 728 734 740 782 787 791 826 830 834 - - - - - - - - - - - - -- 3 4 5 6 7 8 9 TRANSIENT AND FREQUENCY RESPONSE TABLE x 3. A FOUR-PLACE TABLE OF 1 0 22-55 sin x/x (Ref. 
11) (Continued) 2 3 4 5 6 7 8 9 ---- -- -- -- -- ----- - 10.5 10.6 10.7 10.8 10.9 -838 871 894 908 913 842 873 896 909 913 845 876 898 910 913 849 879 899 911 913 852 881 901 911 913 855 883 902 912 912 859 886 904 ()12 912 862 888 905 913 911 865 890 906 913 911 868 892 907 913 910 11.0 11.1 11.2 11.3 11.4 -909 896 874 806 908 894 872 841 802 907 892 869 837 798 906 890 866 834 794 905 888 863 830 789 904 886 860 826 785 902 884 857 822 780 901 882 854 819 776 899 879 851 815 771 898 877 848 811 766 11.5 11.6 11.7 11.8 11.9 -761 709 651 588 519 756 704 645 581 512 751 698 639 574 505 746 693 633 568 498 741 687 626 561 491 736 681 620 554 484 731 675 614 547 476 726 669 607 540 469 720 663 601 533 462 715 657 594 526 454 12.0 12.1 12.2 12.3 12.4 -447 372 294 214 134 440 364 286 206 125 432 356 278 198 117 425 348 270 190 109 417 341 262 182 101 410 333 254 174 93 402 325 246 166 85 395 317 238 158 77 387 309 230 150 69 379 301 222 142 61 12.5 12.6 12.7 12.8 12.9 -53 +27 105 181 254 -45 35 113 188 261 -37 42 120 196 268 -29 50 128 203 275 -21 58 136 210 282 -13 66 143 218 289 -5 74 151 225 296 +3 82 158 232 303 +11 89 166 240 310 +19 97 173 247 316 13.0 13.1 13.2 13.3 13.4 +323 388 448 503 552 330 395 ·154 509 557 337 401 460 514 562 343 407 466 519 566 350 413 471 524 570 356 419 477 529 575 363 425 482 534 579 369 431 488 538 583 376 437 493 543 587 382 443 498 548 591 13.5 13.6 13.7 13.8 13.9 +595 632 661 684 699 599 635 664 686 700 x 8 14 L 0 1 614 607 611 618 603 622 625 628 641 644 638 647 650 653 656 659 669 666 671 673 676 678 680 682 689 691 692 688 694 695 697 698 703 702 703 704 705 706 706 707 - - -- -- -- -- -- -- - -- 2 3 4 5 6 7 8 9 FEEDBACK CONTROL 22-56 TABLE x 3. 0 A FOUR-PLACE TABLE OF 1 sin x/x (Ref. 
11) (Continued) 2 4 9 3 5 6 7 8 - - -- -- -- -- -- -- - -- - 14.0 14.1 14.2 14.3 14.4 +708 709 703 690 671 708 708 702 688 668 708 708 701 687 666 709 708 700 685 663 709 707 699 683 661 709 707 697 681 658 709 706 696 679 656 709 705 695 677 653 709 705 693 675 650 709 704 692 673 648 14.5 14.6 14.7 14.8 14.9 +645 613 575 533 485 642 609 571 528 480 639 606 567 524 475 636 602 563 519 470 633 599 559 514 465 630 595 555 509 460 626 591 550 505 455 623 587 546 500 449 620 583 542 495 444 616 579 537 490 439 15.0 15.1 15.2 15.3 15.4 +434 378 320 259 197 428 373 314 253 190 423 367 308 247 184 417 361 302 241 178 412 355 296 234 171 406 349 290 228 165 401 344 284 222 159 395 338 278 216 152 390 332 272 209 146 384 326 265 203 140 15.5 15.6 15.7 15.8 15.9 +133 69 +5 58 120 127 63 -1 64 126 120 56 -8 71 132 114 50 -14 77 138 108 43 -20 83 144 101 37 -27 89 150 95 31 -33 95 156 88 24 -39 102 162 82 18 -46 108 168 76 11 -52 114 174 16.0 16.1 16.2 16.3 16.4 -180 237 292 342 389 186 243 297 347 393 192 248 302 352 398 197 254 307 357 402 203 259 312 362 407 209 265 318 366 411 215 270 323 371 415 220 276 328 376 419 226 281 333 380 423 232 286 337 385 427 16.5 16.6 16.7 16.8 16.9 -431 469 501 528 550 435 472 504 531 552 439 476 507 533 553 443 479 510 535 555 447 482 513 538 557 451 486 515 540 558 454 489 518 542 560 458 492 521 544 561 462 495 523 546 563 465 498 526 548 564 17.0 17.1 17.2 17.3 17.4 -566 575 580 578 570 567 576 580 577 569 x 0 1 572 573 574 568 569 570 571 575 579 579 577 578 578 579 579 577 579 579 580 580 579 579 578 580 -573 574 572 576 576 575 577 571 563 562 561 567 566 565 559 568 - - -- -- - - -- -- - - - - - 4 2 3 5 6 9 7 8 22-57 TRANSIENT AND FREQUENCY RESPONSE TABLE x 3. 0 A FOUR-PLACE TABLE OF 1 2 3 4 sin x/x (Ref. 
11) (Continued) 5 6 7 8 9 - - -- -- -- - - -- -- - - - 17.5 17.6 17.7 17.8 17.9 -557 539 516 487 454 556 537 513 484 451 554 535 510 481 447 553 533 508 478 444 551 530 505 475 440 549 528 502 471 436 547 526 499 468 433 545 523 496 465 429 543 521 493 461 425 541 518 490 458 421 18.0 18.1 18.2 18.3 18.4 -417 376 332 285 236 413 372 328 281 231 409 368 323 276 226 405 364 319 271 221 401 359 314 266 216 397 355 309 261 211 393 350 304 256 206 389 346 300 251 201 385 341 295 246 195 381 337 290 241 190 18.5 18.6 18.7 18.8 18.9 -185 133 80 -26 +27 180 128 74 -21 32 175 122 69 -16 37 170 117 64 -10 42 164 112 58 -5 48 159 106 53 +0 53 154 101 48 +6 58 149 96 42 +11 63 143 90 37 +16 68 138 85 32 +21 74 19.0 19.1 19.2 19.3 19.4 +79 130 179 226 270 84 135 184 230 274 89 140 188 235 278 94 145 193 239 282 99 150 198 244 286 104 155 202 248 290 110 159 207 252 295 115 164 212 257 299 120 169 216 261 303 125 174 221 265 307 19.5 19.6 19.7 19.8 19.9 +311 348 382 411 436 314 351 385 414 438 x 0 1 333 337 341 344 330 326 322 318 372 362 369 375 378 365 358 355 403 394 400 405 408 391 397 388 429 431 424 427 434 419 422 416 447 449 451 453 443 445 455 440 ---- -- -- -- -- ----- 8 9 4 6 7 5 3 2 FEEDBACK CONTROL 22-58 TABLE 3. A FOUR-PLACE TABLE OF sin x/x (Ref. 
11) (Continued) x 0 2 4 6 8 x 0 20.0 20.1 20.2 20.6 20.4 +456 472 483 489 490 460 475 485 490 490 463 477 486 490 489 466 479 487 490 488 469 481 488 490 487 23.5 23.6 23.7 23.8 23.9 -425 423 418 408 395 425 423 416 406 392 425 422 415 403 388 424 421 413 401 385 424 419 411 398 381 20.5 20.6 20.7 20.8 20.9 +486 478 464 447 424 485 475 461 442 420 483 473 458 438 415 482 470 454 434 409 480 467 450 429 404 24.0 24.1 24.2 24.3 24.4 -377 356 332 304 274 373 352 327 299 268 369 347 321 293 261 365 342 316 287 255 361 337 310 280 248 21.0 21.1 21.2 21.3 21.4 +398 369 335 299 260 393 362 328 292 252 387 356 321 284 244 381 349 314 276 236 375 342 307 268 228 24.5 24.6 24.7 24.8 24.9 -241 206 170 132 93 235 199 162 124 85 228 192 155 116 77 221 185 147 108 69 214 177 139 100 61 21.5 21.6 21.7 21.8 21.9 +219 176 132 87 42 211 168 123 78 32 202 159 114 69 23 194 150 105 60 14 185 141 96 51 5 25.0 25.1 25.2 25.3 25.4 -53 -13 +27 66 104 45 -5 35 74 111 37 29 21 +3 +11 +19 42 50 58 81 96 89 119 126 134 22.0 22.1 22.2 22.3 22.4 -4 49 93 136 178 13 58 102 145 185 22 67 111 153 193 31 76 119 161 201 40 85 128 169 209 25.5 25.6 25.7 25.8 25.9 +141 176 209 240 268 148 183 215 246 273 155 189 222 251 278 162 196 228 257 284 169 203 234 263 288 22.5 22.6 22.7 22.8 22.9 -217 253 287 317 344 224 260 293 323 349 231 267 299 329 354 239 274 305 334 359 246 280 311 339 364 26.0 26.1 26.2 26.3 26.4 +293 315 334 350 361 298 320 338 352 363 303 323 341 355 365 307 327 344 357 367 311 331 347 359 368 23.0 23.1 23.2 23.3 23.4 -368 388 403 415 422 384 400 413 421 424 26.5 26.6 26.7 26.8 26.9 +370 374 375 371 365 x 0 8 x 0 372 376 380 391 394 397 406 408 410 416 418 419 423 423 424 -----4 2 6 2 4 6 8 -------- 371 372 373 373 374 375 375 375 374 374 373 372 370 369 368 366 363 361 359 357 -------2 4 6 8 TRANSIENT AND FREQUENCY RESPONSE TABLE 3. A 22-59 sin x/x (Ref. 
11) (Continued) FOUR-PLACE TABLE OF 2 4 2 4 6 8 x 0 0 x - - ---- -------- -'--------- 6 8 _._- 27.0 27.1 27.2 27.3 27.4 +354 340 323 303 280 352 337 319 200 275 349 334 316 294 270 346 331 312 290 265 343 327 307 285 260 30.5 30.6 30.7 30.8 30.9 -260 238 214 188 160 256 233 209 182 154 252 229 204 177 148 27.5 27.6 27.7 27.8 27.9 +254 226 196 164 131 240 220 190 158 124 243 214 184 151 117 238 208 177 145 111 232 202 171 138 104 31.0 31.1 31.2 31.3 31.4 -130 100 69 37 -5 124 94 62 31 +1 118 112 106 81 87 75 56 43 50 24 18 11 +8 +14 +20 28.0 28.1 28.2 28.3 28.4 83 90 +97 48 62 55 +26 +19 +12 23 16 -9 44 51 58 76 41 +5 30 65 69 34 -2 37 72 31.5 31.6 31.7 31.8 31.9 +27 58 88 118 146 33 64 94 124 151 39 70 100 129 157 45 76 106 135 162 52 82 112 140 167 28.5 28.6 28.7 28.8 28.9 -79 112 144 174 203 85 118 150 180 208 92 125 156 186 213 99 131 162 192 210 105 138 168 197 224 32.0 32.1 32.2 32.3 32.4 +172 197 219 230 257 177 202 224 243 260 182 206 228 247 263 187 211 232 250 266 192 215 236 254 269 20.0 29.1 29.2 29.3 29.4 -229 253 274 292 307 234 257 278 295 310 239 261 281 298 312 243 266 285 301 315 248 270 288 304 317 32.5 32.6 32.7 32.8 32.9 +272 284 293 300 303 275 286 205 300 303 277 288 296 301 303 280 290 297 302 303 282 292 299 302 303 29.5 29.6 29.7 29.8 20.9 -319 328 333 335 334 321 320 334 335 333 323 330 334 335 332 325 331 335 335 332 326 332 335 334 331 33.0 33.1 33.2 33.3 33.4 +303 300 294 286 274 303 299 293 284 272 302 298 291 281 269 302 297 290 270 266 301 296 288 277 263 30.0 30.1 30.2 30.3 30.4 -329 321 311 296 280 323 313 300 283 264 33.5 33.6 33.7 33.8 33.9 +260 243 224 203 180 257 240 220 109 175 x 0 8 x 0 328 327 325 320 317 315 308 305 302 293 290 287 276 272 268 -----4 2 6 247 224 198 171 142 243 219 193 165 136 254 250 247 236 232 228 216 212 208 194 190 185 171 166 161 -------2 4 6 8 22-60 FEEDBACK CONTROL TABLE 3. A x 0 34.0 34.1 34.2 34.3 34.4 +156 130 102 74 46 34.5 34.6 34.7 34.8 34.9 FOUR-PLACE TABLE OF sin x/x (Ref. 
11) (Continued) 2 4 6 - -- -- - 8 x 0 2 4 6 8 -------- 151 124 97 69 40 145 119 91 63 34 140 113 86 57 28 135 108 80 51 22 37.0 37.1 37.2 37.3 37.4 -174 152 129 104 79 170 147 124 99 74 165 143 119 94 68 161 138 114 89 63 156 133 109 84 58 +17 +11 -12 18 41 47 69 75 96 102 +5 24 52 80 107 -1 30 58 85 112 -7 35 63 91 117 37.5 37.6 37.7 37.8 37.9 -53 26 +0 27 53 47 21 6 32 58 42 16 11 37 63 37 10 16 42 68 32 5 21 47 73 35.0 35.1 35.2 35.3 35.4 -122 147 170 192 211 127 152 175 196 214 132 157 179 199 218 137 161 183 203 221 142 166 187 207 225 I 38.0 38.1 38.2 38.3 38.4 +78 102 126 148 168 83 107 130 152 172 88 112 135 156 176 93 117 139 160 179 98 121 143 164 183 35.5 35.6 35.7 35.8 35.9 -228 243 255 264 271 231 245 257 266 272 234 248 259 268 273 237 250 261 269 274 240 253 263 270 275 38.5 38.6 38.7 38.8 38.9 +186 203 218 230 240 190 206 220 232 241 193 209 223 234 243 197 212 225 236 244 200 215 228 238 246 36.0 36.1 36.2 36.3 36.4 -275 277 276 271 265 276 277 275 270 263 276 277 274 269 261 277 276 273 268 259 277 276 272 266 257 39.0 39.1 39.2 39.3 39.4 +247 252 254 254 252 248 253 255 254 251 249 253 255 254 250 250 254 255 253 249 251 254 255 252 248 36.5 36.6 36.7 36.8 36.9 -255 243 229 213 194 246 232 216 198 178 39.5 39.6 39.7 39.8 39.9 +246 239 229 217 203 x 0 x 0 251 248 238 235 223 220 206 202 186 182 -----2 4 6 253 241 226 209 190 -81 245 244 242 241 237 235 233 231 227 224 222 219 214 211 208 206 199 196 193 190 -------2 4 6 8 TRANSIENT AND FREQUENCY RESPONSE 22-61 ACKNOWLEDGMENT Figures 13 to 30 are reproduced with permission from H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Wiley, New York, 1951. The example in the section on Determining Transient Response from Frequency Response is reprinted with permission from G. S. Brown and D. P. Campbell, Principles of Servomechanisms, Wiley, New York, 1948. REFERENCES 1. ,V. R. Evans, Control System Dynamics, McGraw-Hill, New York, 1954. 2. G. A. 
Biernson, Quick methods for evaluating the closed-loop poles of feedback control systems, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 53-70 (1953).
3. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Wiley, New York, 1951.
4. G. A. Biernson, Estimating transient response from open-loop frequency response, Trans. Am. Inst. Elec. Engrs., 74, 388-403 (1956).
5. G. S. Brown and D. P. Campbell, Principles of Servomechanisms, Wiley, New York, 1948.
6. H. W. Bode, Network Analysis and Feedback Amplifier Design, Van Nostrand, Princeton, N. J., 1945.
7. A. V. Bedford and G. L. Fredendall, Analysis, synthesis and evaluation of the transient response of television apparatus, Proc. I.R.E., 30, 440-458 (1942).
8. A. R. Teasdale, Jr., F. E. Brooks, Jr., and J. P. German, System Frequency Response Derived from Transient Response, Am. Inst. Elec. Engrs. District Paper, New York, October 1950. A. R. Teasdale, Jr., Get frequency response from transient data by adding vectors, Control Eng., 2, 56-59 (1955).
9. H. A. Samulon, Spectrum analysis of transient response curves, Proc. I.R.E., 39, 175-186 (1951).
10. J. B. Reynolds, Jr., Get frequency response from transient data by machine computing, Control Eng., 2, 60-63 (1955).
11. J. Sherman, Z. Krist., 85, 404 (1933).

E. FEEDBACK CONTROL

Chapter 23

Feedback System Compensation

P. G. Cushman

1. Design Criteria and Techniques
2. Compensating Components: D-C Systems
3. Compensating Networks: A-C Systems
4. Open-Closed Loop Control
References

1. DESIGN CRITERIA AND TECHNIQUES

The first step in the design of a feedback control system is the selection of a suitable power element with sufficient torque, or force, speed, and power rating to drive the load.
Once the selection of a power element with known characteristics has been made, the signal devices, amplifiers, and stabilizing components have to be chosen with characteristics that make the entire feedback control system meet system requirements of accuracy, speed of response, and stability. This chapter is devoted to the synthesis of required characteristics of these compensating components and the presentation of characteristics of practical control system components. Section 1 derives feedback control system characteristics from system specifications.

Synthesis of Log Magnitude Diagram from System Requirements

Low-Frequency Portion: Static Error Coefficients. Error coefficients are one of the most common means of specifying control system performance. These coefficients are figures of merit: the higher the coefficient, the smaller the control system error in achieving a required output. The static error coefficients are defined as the ratio of the constant output required (position, velocity, or acceleration) to the control system error required to achieve that output. The types of control system and the static error coefficient associated with each type are summarized in Chap. 20. (See also Ref. 1, Chap. 8.) The static error coefficients influence the log magnitude diagram in an easily visualized way and lead to a method of control system classification.

[FIG. 1. Sample log magnitude diagrams showing influence of static error coefficients. Curves: type 0 system (0 db/decade initial slope, level 20 log Kp), type 1 system (ω = Kv), type 2 system.]

For example, a control system with a transfer function that approaches $K_p$, a constant, at low frequencies (open loop transfer function $G(s)$ approaches $K_p$ as $s$ approaches 0) will have a log magnitude diagram which has zero slope at low frequencies. Such a system is called a type 0 system (0 slope at low frequencies) and can follow a steady input, $r_0$, with an error of $r_0/(1 + K_p)$.
If $K_p$ is large, the error will be small. However, if a velocity signal, $r = v_0 t$, is applied, the error will continue to increase with time. For a system to follow such a signal with small error, a type 1 system is required, which has a transfer function at low frequency of $K_v/j\omega$ (that is, $\lim_{s \to 0} sG(s) = K_v$) and an initial slope on the log magnitude diagram of -20 db per decade. A type 1 system would follow the constant velocity input with an error of only $v_0/K_v$. Similarly, a type 2 system ($K_a/(j\omega)^2$ transfer function, giving a slope of 2(-20) = -40 db per decade at low frequencies) is required to follow a constant acceleration input with moderate error.

The type of system determines the shape of the log magnitude diagram at low frequencies, and the gain magnitude of this portion of the diagram is determined by the static error coefficients. The intersection of the extensions of the initial log magnitude diagram slope with the $\omega = 1$ line is at the value $20 \log K_p$, $20 \log K_v$, or $20 \log K_a$, as the case may be. The intersection of the extensions of the initial slope with the 0-db axis also has significance, as shown in Fig. 1.

Low-Frequency Portion: Dynamic Error Coefficients. In addition to the steady-state characteristics, expressible in terms of the static error coefficients, it is often desirable to specify control system errors during a transient by means of dynamic error coefficients, defined in Chap. 20. That is,

$$e = \frac{1}{K_0}\, r + \frac{1}{K_1}\, \dot{r} + \frac{1}{K_2}\, \ddot{r} + \frac{1}{K_3}\, \dddot{r} + \cdots,$$

where $\dot{r}$, $\ddot{r}$, $\dddot{r}$ are successive derivatives of the input time function $r$, and $K_0$, $K_1$, $K_2$, etc., are the dynamic error coefficients. The above relation is valid during time intervals in a transient which are far displaced in time from a discontinuity in the input function, $r$, and its derivatives. The above equation converges quickly to useful values for slowly changing input functions for which the higher order derivatives are small relative to the lower order terms.
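As a brief illustration of how the error series is used, the sketch below evaluates a truncated form of it for given derivative values. It is a minimal example and not part of the original text: the function name `series_error` and all numerical values are assumptions chosen for illustration. Reciprocal coefficients are passed so that an infinite coefficient (for example $1/K_0 = 0$ in a type 1 system) can be represented. The two cases reproduce the static results quoted earlier: $r_0/(1 + K_p)$ for a steady input to a type 0 system (where $K_0 = 1 + K_p$) and $v_0/K_v$ for a velocity input to a type 1 system (where $K_1 = K_v$).

```python
def series_error(inverse_k, derivatives):
    """Truncated error series e = sum over k of (1/K_k) * (k-th derivative of r).

    inverse_k   -- [1/K_0, 1/K_1, ...] (reciprocals of the error coefficients)
    derivatives -- [r, r', r'', ...] evaluated at the time of interest
    Terms beyond the shorter of the two lists are ignored.
    """
    return sum(c * d for c, d in zip(inverse_k, derivatives))


# Type 0 system, steady input r0 (illustrative values): K_0 = 1 + K_p.
K_p, r0 = 99.0, 1.0
e_step = series_error([1.0 / (1.0 + K_p)], [r0])

# Type 1 system, velocity input r = v0 * t (illustrative values):
# 1/K_0 = 0 and K_1 = K_v, so only the first-derivative term contributes.
K_v, v0, t = 10.0, 2.0, 3.0
e_ramp = series_error([0.0, 1.0 / K_v], [v0 * t, v0])

print(e_step)  # r0 / (1 + K_p)
print(e_ramp)  # v0 / K_v
```

Because the series is only valid away from discontinuities in the input, a calculation of this kind applies to the slowly varying part of a transient, not to the instant a step is applied.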
The coefficients can be evaluated by straightforward Laplace transform techniques, as given in Chap. 20. That is,

$$\frac{1}{K_n} = \frac{1}{n!} \lim_{s \to 0} \frac{d^n}{ds^n} \left[ \frac{E}{R}(s) \right].$$

Some of the error coefficients, evaluated in this way, will be found identical to the static coefficients of the previous paragraph. However, additional coefficients will also be determined. The composition of these generalized error coefficients can be seen from a general control system transfer function (see Ref. 4):

$$\frac{E(s)}{R(s)} = \frac{n_0 + n_1 s + n_2 s^2 + n_3 s^3 + \cdots}{1 + d_1 s + d_2 s^2 + d_3 s^3 + \cdots}.$$

The dynamic error coefficients for this system are:

$$\frac{1}{K_0} = n_0,$$

$$\frac{1}{K_1} = n_1 - \frac{1}{K_0}\, d_1,$$

$$\frac{1}{K_2} = n_2 - \frac{1}{K_1}\, d_1 - \frac{1}{K_0}\, d_2,$$

$$\frac{1}{K_k} = n_k - \sum_{j=0}^{k-1} \frac{1}{K_j}\, d_{(k-j)}.$$

The dynamic error coefficients in general are composed of the gain term in combination with various sums and products of the system time constants. These coefficients are readily calculable and are valuable for analysis purposes for a system of known transfer function. However, they are not very useful in synthesizing the log magnitude diagram from system requirements, because each of the coefficients is composed of a number of parameters of the system characteristics. For this synthesis work, a more direct procedure is outlined in the next paragraph.

Low-Frequency Portion: Transient Curve Fitting Procedure. A useful curve fitting procedure (Ref. 2) transforms certain system error requirements directly into log magnitude values which the log magnitude diagram must exceed. In this method, the expected transient input signals are matched by sinusoids. The principle is that if a control system can follow with small error sine wave inputs with amplitude, velocity, and acceleration components as great as those of the transient input, then it can follow the transient with small error.
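Returning briefly to the dynamic error coefficients: the recursion for $1/K_k$ given above lends itself to direct computation. The following is a minimal sketch under stated assumptions, not a procedure from the original text. The function name `dynamic_error_coefficients` and the example system (a type 1 loop $G(s) = K_v/[s(Ts + 1)]$ with the illustrative values $K_v = 10$, $T = 1/2$) are hypothetical. For that loop, $E/R = 1/(1 + G)$ normalizes to $n = [0, 1/K_v, T/K_v]$ and $d = [1, 1/K_v, T/K_v]$.

```python
from fractions import Fraction


def dynamic_error_coefficients(n, d, count):
    """Return [1/K_0, 1/K_1, ..., 1/K_(count-1)] for E(s)/R(s) = N(s)/D(s).

    n -- numerator coefficients [n0, n1, n2, ...] in ascending powers of s
    d -- denominator coefficients [1, d1, d2, ...] in ascending powers of s
         (d[0] must equal 1, matching the normalized form in the text)
    """
    if d[0] != 1:
        raise ValueError("denominator must be normalized so that d0 = 1")
    inv_k = []
    for k in range(count):
        # 1/K_k = n_k - sum_{j=0}^{k-1} (1/K_j) * d_(k-j)
        acc = Fraction(n[k]) if k < len(n) else Fraction(0)
        for j in range(k):
            if k - j < len(d):
                acc -= inv_k[j] * Fraction(d[k - j])
        inv_k.append(acc)
    return inv_k


# Hypothetical type 1 example: K_v = 10, T = 1/2.
K_v, T = Fraction(10), Fraction(1, 2)
n = [Fraction(0), 1 / K_v, T / K_v]
d = [Fraction(1), 1 / K_v, T / K_v]
inv_k = dynamic_error_coefficients(n, d, 3)
print(inv_k)  # 1/K_0 = 0, 1/K_1 = 1/K_v, 1/K_2 = T/K_v - 1/K_v**2
```

Exact rational arithmetic is used here only so that the result can be compared term by term with the closed forms; floating-point coefficients would work equally well for numerical use.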
The worst transients that the control system will be expected to follow are presumably known, either in graphical or analytical form, together with the allowable errors during these transients. These transient time functions are plotted and fitted as closely as possible in various places with sine waves, as indicated in Fig. 2. The amplitudes of these sine waves are $A_1$, $A_2$, $A_3$, etc., with frequencies $\omega_1$, $\omega_2$, $\omega_3$. If $E$ is the allowable control system error, then $A_1/E$, $A_2/E$, $A_3/E$, etc., are the required gain magnitudes of the log magnitude diagram at $\omega_1$, $\omega_2$, $\omega_3$. These points are shown in Fig. 3. The required log magnitude diagram must be above these points. It is important to fit the input transient at several places, such as peaks and maximum slope points, so that broad coverage of requirements is established by several points on the log magnitude diagram. Sometimes it is advantageous to take the derivative of the input transient and fit it with sine waves of amplitude $V_1$, $V_2$, etc., at frequencies $\omega_1$, $\omega_2$. These fits will establish points on the log magnitude diagram of magnitude $V_1/(\omega_1 E)$, $V_2/(\omega_2 E)$, etc. The procedure can be extended to higher derivatives also.

[FIG. 2. Construction illustrating curve fitting procedure. Horizontal axis: t.]

[FIG. 3. Log magnitude points obtained from Fig. 2. Vertical axis: db (0 to 40); horizontal axis: ω, with points 20 log A1/E, 20 log A2/E, 20 log A3/E at ω1, ω2, ω3. Note on figure: diagram must be above points obtained by curve fitting.]

Mid-Frequency Portion of the Log Magnitude Diagram. Once the gain of the entire control system has been set so as to meet the requirements as outlined in the preceding paragraphs, it is usually desirable to reduce the system gain effective at the higher frequencies in order to reduce the susceptibility of the system to noise and other extraneous signals.
However, this reduction in gain has to be achieved in such a manner that the system has the required stability. As explained in Chap. 21, stability may be assured by requiring the log magnitude diagram to have a slope of -20 db per decade in the vicinity of the crossover frequency. To obtain adequate stability, this -20 db per decade slope should extend for a frequency range of a decade or more. The use of the log magnitude-angle chart (Nichols chart) provides a measure of stability in terms of the maximum $M$ of the closed loop frequency response. Such charts are given in Chap. 21. To indicate approximate magnitudes, Fig. 4 shows the maximum $M$ that could possibly be obtained for a particular minimum value of phase margin.

[FIG. 4. Maximum peak of frequency response versus minimum phase margin. Vertical axis: M (1 to 5); horizontal axis: phase margin, degrees (0 to 90).]

Often it is convenient to express the degree of stability, or damping, of a system by means of a damping factor. Strictly, a damping factor can be applied only to a system that can be described by a second order linear differential equation with constant coefficients, but it is frequently applied to higher order systems. When the response is determined largely by two complex roots, which is fairly common, the closed loop response is characterized by a zero slope region of approximately unity gain at low frequencies followed by a resonant peak in the vicinity of crossover of the open loop. At frequencies above the resonant peak, the slope changes to -40 db per decade and then usually to even greater negative slopes. Thus for the frequency region from zero to somewhat above the resonant peak, many systems have much the same frequency characteristic as a second order system. For such a system the height of the resonant peak, when expressed as a numeric ratio, $M_m$, determines the damping factor, $\zeta$, by the equation:

$$M_m = \frac{1}{2\zeta\sqrt{1 - \zeta^2}}, \qquad \text{valid: } 0 < \zeta < 0.707.$$

Frequently it is convenient to measure the magnitude of the frequency response, $M_c$, at the corner frequency. For such a measurement, the damping factor is

$$\zeta = \frac{1}{2M_c}.$$

The damping of oscillations in a physical system is a function of the damping factor. When a system is excited by a unit step function, the magnitude of the first and successive overshoots is determined by the damping factor as shown in Fig. 5. See also Chap. 20.

[FIG. 17. Internal rate feedback compensation. Vertical axis: decibels (-40 to 50); horizontal axis: ω (0.01 to 100). (a) Rate feedback component (tachometer); closed internal loop, B2/M; correction using Nichols chart. (b) Open outside loop, C/E1; inverse of internal feedback element; 20 log K1.]

[FIG. 18. Internal rate plus lead network feedback. Vertical axis: decibels (-40 to 50); horizontal axis: ω (0.01 to 100). (a) Power element; open internal loop; rate plus lead network feedback element; closed internal loop, B2/M. (b) Open outside loop, C/E1; inverse of internal feedback; 20 log K1; block diagram with E1, E2, M, K1, power element, feedback, and output C.]
... and phase diagram for an internal rate feedback element, the closed internal loop, and the log magnitude diagram of the outside open loop. The gain of the internal loop must be greater than unity at frequencies up to and somewhat beyond the desired crossover frequency of the outside loop, and the gain, $K_1$, may be set to give the proper crossover. Much of Fig. 17 can be constructed by using approximate straight line diagrams, but portions of the diagram in the frequency region near crossover of the internal loop should be corrected by using accurate magnitude and phase values from the log magnitude-angle diagram (Nichols chart, see Chap. 21).

Rate and Lead Network Feedback. In some systems, it is necessary to obtain higher gain at the low frequencies. This can be obtained in a system using internal feedback by adding a lead network to the rate feedback. Figure 18 shows the log magnitude and phase diagram of such an internal feedback element along with the system diagram leading to the open loop diagram of $C/E(s)$. For the higher order lead networks, the inner loop may be unstable by itself. In such a situation it is necessary to determine the number of poles with positive real parts in the closed inner loop in order to apply a stability criterion to the outer loop. An example of such a system is shown in Figs. 19 and 20.

[FIG. 19. Nyquist diagram of unstable inner loop as indicated by two clockwise rotations. Axes: Re, Imaginary.]

The closed inner loop contains two poles with positive real parts, as indicated by the two encirclements of the -1 point by the Nyquist sketch of the open inner loop transfer function. For the outer loop, and therefore the whole system, to be stable, the Nyquist plot must encircle the -1 point twice counterclockwise. It does this as indicated in Fig. 20.
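The encirclement count described above can be checked numerically. The sketch below is an illustration only, under stated assumptions: the open loop function $L(s) = K(s + 2)^2/(s - 1)^2$ is a hypothetical example (it is not the system of Figs. 19 and 20), chosen because it has two open loop poles with positive real parts, and the function name `ccw_encirclements_of_minus_one` is likewise invented here. The count is obtained by accumulating the phase change of $1 + L(j\omega)$ over a finite, logarithmically spaced frequency sweep, which assumes that $L$ has no poles on the imaginary axis and settles to a constant at high frequency.

```python
import cmath
import math


def ccw_encirclements_of_minus_one(L, w_min=1e-4, w_max=1e4, points_per_side=4000):
    """Net counterclockwise encirclements of -1 by L(jw), w swept from -w_max to +w_max.

    Assumes L(s) has no poles on the imaginary axis and that L(jw) approaches
    a constant for large |w|, so the finite sweep closes on itself.
    """
    ratio = (w_max / w_min) ** (1.0 / (points_per_side - 1))
    positive = [w_min * ratio ** i for i in range(points_per_side)]
    freqs = [-w for w in reversed(positive)] + positive

    total_phase = 0.0
    previous = 1 + L(complex(0.0, freqs[0]))
    for w in freqs[1:]:
        current = 1 + L(complex(0.0, w))
        # Small phase increment of the vector drawn from -1 to L(jw).
        total_phase += cmath.phase(current / previous)
        previous = current
    return round(total_phase / (2 * math.pi))


def example_loop(K):
    """Hypothetical open loop with a double pole at s = +1 (two unstable poles)."""
    return lambda s: K * (s + 2) ** 2 / (s - 1) ** 2


# With K = 2 the closed loop polynomial is 3s^2 + 6s + 9 (stable), so two
# counterclockwise encirclements are expected; with K = 0.2 it is
# 1.2s^2 - 1.2s + 1.8 (unstable), so no net encirclement is expected.
print(ccw_encirclements_of_minus_one(example_loop(2.0)))
print(ccw_encirclements_of_minus_one(example_loop(0.2)))
```

The count uses the Nyquist relation in the form (counterclockwise encirclements) = (open loop poles with positive real parts) - (closed loop poles with positive real parts), so stability of the closed loop requires the count to equal the number of unstable open loop poles.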
[FIG. 20. Nyquist diagram of outside loop. Stability is indicated by two counterclockwise rotations about -1. Axes: Re, Imaginary.]

Lead Network Feedback. A lead network is a rate measuring device that is somewhat inferior in performance to the components mentioned in the paragraph above, but because of simplicity and low cost is often used in place of these more expensive components. Such a network is equivalent to a tachometer, or other rate device, at low frequencies, but it does not have rate characteristics above a frequency equal to $1/T$, where $T$ is the time constant of the network. Figure 21 shows a comparison of tachometer and network characteristics. The tachometer also has a high-frequency droop in its log magnitude diagram, but this is usually well above any frequencies of interest in the feedback control system. The lead network can also have a rate characteristic out to high frequencies by reducing the network time constant, but this lowers the gain of the circuit at the frequencies of interest. Such a lead network feedback is particularly useful in systems in which a d-c voltage is one of the intermediate outputs. An example is the voltage rate feedback around the amplidyne in a voltage regulator.

[FIG. 21. Comparison of tachometer and lead network characteristics. Vertical axis: decibels (-40 to 50); horizontal axis: ω (0.01 to 100).]

Multiloop Systems. In the preceding discussion, internal feedback loops have been closed to form a portion of the open outside loop of a feedback control system. In the same way this second, or outside, loop can be closed by using the log magnitude-angle diagram, and becomes a portion of a third feedback control loop. This procedure can be extended to any number of concentric feedback loops, such as may be present in a complex feedback control system.
However, the block diagram of complex control systems often is not in the form of concentric loops. Chapter 20 shows how intertwined block diagrams can usually be transformed into concentric loops by making use of superposition rules.

Alternate Methods of Representation

The preceding paragraphs have shown how the log magnitude diagram of a power element can be modified by the addition of series or feedback components to obtain the log magnitude diagrams synthesized in Sec. 1. There are several other forms in which these same data may be presented and handled to obtain the same desired results. Some of the more commonly used forms are the Nyquist diagram, the inverse complex plane diagram, and the root locus plot.

Nyquist Diagram and Inverse Complex Plane Diagram. The use of these diagrams is described in detail in Chap. 9 of Ref. 1. Since these diagrams contain exactly the same information as the log magnitude diagram, essentially the same principles as described above may be used. The steps may be summarized:
1. Select the starting axis (type of system) and gain factor from the static error coefficient requirements.
2. From stability and transient response requirements, determine the maximum allowable M and draw in this M circle.
3. By using the gains established in step 1 and the chosen power element, draw a Nyquist diagram.
4. Add frequency sensitive networks (or proper internal feedback loops) as needed to reshape the diagram to avoid the required M contour. See Fig. 22. (This is a trial and error process which will become more efficient with the user's experience.)

[FIG. 22. Nyquist diagram showing synthesis procedure. Axes: Re, Imaginary. Labels: Nyquist diagram of power element using required gain; system Nyquist diagram; starting axis and low-frequency gain established by requirements.]
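Steps 2 and 4 above can be sketched numerically. The example below is a minimal illustration, not part of the original procedure. It relies on the standard result that, for a unity feedback loop, the locus of constant closed loop magnitude $M > 1$ in the $G$ plane is a circle centered at $-M^2/(M^2 - 1)$ on the real axis with radius $M/(M^2 - 1)$. The function names and the open loop function $G(s) = 1/[s(s + 1)]$ are assumptions chosen for illustration.

```python
def m_circle(M):
    """Center (on the real axis) and radius of the constant-M circle, for M > 1."""
    if M <= 1:
        raise ValueError("this sketch handles M > 1 only")
    center = -M ** 2 / (M ** 2 - 1)
    radius = M / (M ** 2 - 1)
    return center, radius


def enters_m_circle(G, M, freqs):
    """True if any sampled point G(jw) lies inside the constant-M circle."""
    center, radius = m_circle(M)
    return any(abs(G(complex(0.0, w)) - center) < radius for w in freqs)


def example_G(s):
    """Hypothetical open loop function; its closed loop has damping factor 0.5."""
    return 1 / (s * (s + 1))


# Sample frequencies from 0.01 to 20 (arbitrary illustrative grid).
freqs = [0.01 * i for i in range(1, 2001)]

# The closed loop peak of this example is about 1.15, so its Nyquist diagram
# crosses into the M = 1.1 circle but stays clear of the M = 1.3 circle.
print(enters_m_circle(example_G, 1.1, freqs))
print(enters_m_circle(example_G, 1.3, freqs))
```

In a design iteration, a compensated $G(s)$ would be substituted for `example_G` and the check repeated until the diagram avoids the required M contour.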
The Nyquist diagram is used in this discussion because of its historical position, although it is somewhat easier to use the inverse complex plane plot in this type of presentation.

Root Locus Plots. (See Ref. 6.) This is essentially a complex plane graphical representation of the pole-zero configuration synthesis presented in Sec. 1. In this plot, the locations of the closed loop poles are traced on the complex plane as the open loop gain is varied. Use of this diagram may be broadly outlined in steps analogous to those of the preceding paragraph:
1. Select the closed loop poles from system specifications of performance and stability. These may be located on the complex plane.
2. Start with the poles and zeros of the power element and draw the root locus plot.
3. Add open loop pole and zero combinations to modify the root locus plot to pass through the required closed loop poles. Reference 6 indicates optimum selections of added pole and zero configurations to achieve the desired changes in locus shape.
Figure 23 indicates the above steps.

[FIG. 23. Root locus plots showing synthesis procedure. Curves: root locus of power element; root locus of compensated system; closed loop poles at required gain. Legend: power element poles; added compensating poles; added compensating zeros; required closed loop dominant poles.]

Comparison of Alternate Methods of Representation with the Log Magnitude Diagram. The log magnitude and phase diagrams contain the same information as the Nyquist or inverse complex plane diagram. Because the log magnitude diagram is much easier to construct than the other two diagrams, and required modifications to meet specifications are more easily visualized and constructed on the log magnitude diagram, there is normally no reason for using the Nyquist or inverse diagram in control system design work.
In very complex systems, containing several roots with positive real parts, it may be desirable to make rough order of magnitude Nyquist sketches to check rotations about the -1 point in order to check stability, but the actual numerical work should be done using log magnitude diagrams. The log magnitude diagram is also easier to construct than the root locus plot and would normally be used for problems concerned with stability, bandwidth, and static and low-frequency errors. However, the root locus plot has more specific information regarding actual transient response characteristics and damping factor, and would be used in problems in which this type of information is of primary importance.

Design Aids

Charts of Electric Networks. Tables 3 through 7 show many of the electric networks useful in compensating d-c systems. All these networks are of the resistor and capacitor type, since any practical type of frequency characteristic can be obtained with these components. Inductance is also a useful circuit component, but large time constants cannot be obtained in sizes competitive with resistance-capacitance components.

TABLE 3. STABILIZING NETWORKS: LEAD NETWORKS WITH 20 DB/DECADE SLOPE (Ref. 2) • Attenuation Characteristio Network AC II Go- 0 G .. - 1 (a) BR~ d:P:j--~:- I. I fill fj G.-~ Go - 0 1+:8 (b) o db I I 7'j fj 1 G o - _- fill G.. - 1 I+~ o -----------db I fill Ta G G o - - _1_ 1 +D~N __ 1- .. I+~ o -------------- 1 G . - _- I+~ o ------------- I I ~ 1'; 01--,---:---db Goo 20 db/dec B G. - 23-30 'iT'+'N B(E + G + Nl + GN G.. - (B + N)(E + G) + BN TABLE 3. STABILIZING NETWORKS T. T. ABRC ,-. Transfer Function Til T ,• +1 T,. T ,• + 1 A(B + D)RC G (Til + I) '(T .. + I) ANRC G (Til + '(T .. + I) 1) ANRC G (Til + '(T .. + 1) 1) A(E + NlRC G (Til + o(T.. + 1) 1) A(E + NlRC G~ (T,l + 1) (T,.+ 1) A (Continued) [(E + G + Nl + G:] RC 23·31 ABRC B B+N T , [ B B +D ] + D +N T, [B+/:N] ~TI [B A + D + EE: N ] B +D+N [(E + G) +B B: T, N] RC TABLE 4.
STABILIZING NETWORKS: LEAD NETWORKS WITH 40 DB/DECADE SLOPE (Ref. 2)
[Columns: Network; Attenuation Characteristic; Transfer Function; T1; T2. Entries not recoverable from this copy.]

TABLE 5. STABILIZING NETWORKS: LAG NETWORKS WITH 20 DB/DECADE SLOPE (Ref. 2)
[Columns: Network; Attenuation Characteristic; Transfer Function; T1; T2. Entries not recoverable from this copy.]

TABLE 6. STABILIZING NETWORKS: LAG NETWORKS WITH 40 DB/DECADE SLOPE (Ref. 2)
[Columns: Network; Attenuation Characteristic; Transfer Function; T1; T2. Entries not recoverable from this copy.]

TABLE 7. STABILIZING NETWORKS (Ref. 2)
[Columns: Network; Attenuation Characteristic; Transfer Function; T1; T2. Entry (a) is the bridged-T network. The attenuation curves for entry (h) are for fairly large values of F only. Remaining entries not recoverable from this copy.]

TABLE 8. MECHANICAL COMPONENTS, LEAD
[Columns: Mechanical Lead Network; Log Magnitude Characteristic; Transfer Function; T1; T2. Entries not recoverable from this copy.]

TABLE 9. MECHANICAL COMPONENTS, LAG
[Columns: Mechanical Lag Network; Log Magnitude Characteristic; Transfer Function; T1; T2. Entries not recoverable from this copy.]

TABLE 10. MECHANICAL COMPONENTS, LAG-LEAD
[Columns: Mechanical Lag-Lead Network; Log Magnitude Characteristic; Transfer Function; T1; T2. Entries not recoverable from this copy.]

TABLE 11. MECHANICAL-HYDRAULIC COMPONENTS
[Columns: Mechanical-Hydraulic Network; Log Magnitude Characteristic; Transfer Function; T1; T2. Entries not recoverable from this copy.]
K = velocity of piston per unit valve displacement.

TABLE 12. PNEUMATIC COMPENSATING COMPONENTS. APPROXIMATE RELATIONSHIPS FOR HIGH LOOP GAIN CONTROLLERS, $\epsilon \ll 1$
[Pneumatic circuit diagrams and log magnitude plots are not recoverable from this copy. The relationships below are the entries that remain legible.]

Lead:
$$\frac{P_m - P_o}{P_c - P_r} = \frac{A_1}{A_2}\left[\frac{1 + T_1 s}{1 + kT_1 s}\right], \qquad \frac{P_m - P_o}{P_c - P_r} = \frac{A_1}{A_2}\left[\frac{1 + (A_3/A_2)T_1 s}{1 + T_1 s}\right]$$

Lag:
$$\frac{P_m - P_o}{P_c - P_r} = \frac{A_1}{A_2 k}\left[\frac{1 + 1/T_1 s}{1 + \epsilon/kT_1 s}\right]$$

Here k = change in $P_1$ for a unit change in $P_m$ when m is completely closed, and $\epsilon$ = a system constant related to the loop gain.

FIG. 26. Characteristics of the bridged-T network, plotted against $\omega/\omega_c$, for
$$\frac{e_o}{e_i} = \frac{(s^2/\omega_c^2) + \sqrt{T_2/T_1}\,(s/\omega_c) + 1}{(s^2/\omega_c^2) + g\sqrt{T_2/T_1}\,(s/\omega_c) + 1}, \qquad T_1 = \frac{DN}{D+N}\,HRC, \qquad T_2 = A(D+N)RC, \qquad g = \left(1 + \frac{N}{D}\right)\frac{T_1}{T_2} + 1.$$

As indicated in Table 7(a), the transfer function of the bridged-T network is

$$\frac{e_o}{e_i} = \frac{T_1T_2s^2 + T_2s + 1}{T_1T_2s^2 + \{[1 + (N/D)]T_1 + T_2\}s + 1},$$

where

$$T_1 = \left(\frac{DN}{D+N}\right)HRC, \qquad T_2 = A(D+N)RC.$$

The circuit is adjusted until $T_1T_2 = 1/\omega_c^2$, where $\omega_c$ is the carrier frequency. The minimum value of the transfer function occurs at $\omega = \omega_c$, where the value is

$$\frac{1}{g} = \frac{1}{1 + [1 + (N/D)](T_1/T_2)}.$$

The above equation can be rewritten

$$\frac{e_o}{e_i} = \frac{(s^2/\omega_c^2) + \sqrt{T_2/T_1}\,(s/\omega_c) + 1}{(s^2/\omega_c^2) + g\sqrt{T_2/T_1}\,(s/\omega_c) + 1}.$$

Characteristics of this network for several values of $T_2/T_1$ are shown in Fig. 26. It is seen that this can approximate the ideal characteristic of Fig. 25. The factor $T_2/T_1$ largely influences the rate of change of angle near $\omega_c$, whereas g determines the magnitude of phase change that can be obtained. Characteristics of the parallel-T network can be found in Ref. 5.
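As a rough numerical check of the bridged-T relationships above, the following sketch evaluates the Table 7(a) transfer function at the carrier frequency and compares it with the predicted minimum 1/g. The 400-cps carrier, the ratio T2/T1 = 0.04, the value N/D = 1, and the function names are illustrative assumptions, not values taken from the handbook.

```python
import math

def bridged_t_response(omega, t1, t2, n_over_d):
    """Complex e_o/e_i of the bridged-T network at angular frequency omega."""
    s = complex(0.0, omega)
    numerator = t1 * t2 * s * s + t2 * s + 1.0
    denominator = t1 * t2 * s * s + ((1.0 + n_over_d) * t1 + t2) * s + 1.0
    return numerator / denominator

def notch_depth(t1, t2, n_over_d):
    """Predicted minimum gain 1/g, with g = 1 + (1 + N/D)(T1/T2)."""
    g = 1.0 + (1.0 + n_over_d) * t1 / t2
    return 1.0 / g

# Assumed example values (hypothetical, for illustration only).
omega_c = 2.0 * math.pi * 400.0        # 400-cps carrier
ratio = 0.04                           # T2/T1
n_over_d = 1.0                         # N/D
t1 = 1.0 / (omega_c * math.sqrt(ratio))
t2 = math.sqrt(ratio) / omega_c        # chosen so that T1*T2 = 1/omega_c**2

gain_at_carrier = abs(bridged_t_response(omega_c, t1, t2, n_over_d))
gain_below = abs(bridged_t_response(0.9 * omega_c, t1, t2, n_over_d))
gain_above = abs(bridged_t_response(1.1 * omega_c, t1, t2, n_over_d))
print(gain_at_carrier, notch_depth(t1, t2, n_over_d), gain_below, gain_above)
```

With these assumed values g = 51, so the gain at the carrier should be about 0.02 and should rise on either side of the carrier frequency.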
Sensitivity to Carrier Frequency Shift. The most serious weakness of a-c feedback control systems using series networks is that the system performance is impaired by normal shifts in the frequency of the carrier. For example, in an aircraft power system, the 400-cps power may be frequency regulated to only 5 or 10 per cent. In many a-c feedback control systems using series stabilizing networks, this amount of frequency shift will render the system completely useless. Figure 27 shows the effects of a carrier frequency shift on the operation of an a-c stabilizing network. These are:

1. The gain of the control system at low frequencies is increased. This may result in saturation in subsequent elements in the control system.
2. The phase of the carrier is shifted. This means that any phase-sensitive devices such as discriminators or a-c motors will not operate at best efficiency.
3. The phase lead at the control system frequencies ($a\omega_c$) is decreased. Thus the network does not perform the function for which it was intended and instability may result.

FIG. 27. Effect of carrier frequency shift on the operation of an a-c series stabilizing network (log magnitude and phase curves about the nominal $\omega_c$).

It is noticed that "fast" control systems, which are designed to have a high crossover frequency (large values of a), are less susceptible to carrier frequency shifts than systems which have a large effective lead network time constant.

Tachometer Stabilization. Alternating-current tachometers are excited by the carrier frequency along with other components in an a-c control system and generate an amplitude-modulated signal proportional to velocity. This operation is not hindered by reasonable changes of the carrier frequency, so that this method of stabilization is not subject to the limitations of a-c stabilizing networks. The analysis or synthesis of a-c systems using tachometers can proceed just as in the case of the d-c system described in Sect.
2 under Rate Feedback.

Other Techniques. The previous sections have discussed means of stabilizing all-a-c feedback control systems by using rate-producing components. Often it is necessary to add gain at low frequencies and reduce this gain below the desired crossover frequency by means of an integrating or reset component. This is the type of system (using d-c signals) described in Sect. 2, Phase Lag Compensation.

Theoretically, an a-c carrier lag network can be constructed by using a bridged-T circuit in the feedback channel of a feedback amplifier. However, since the time constant of a lag network has to be considerably larger than that used in a lead network, the effect of carrier frequency shifts usually makes this method impractical.

Commonly, lag networks are obtained by rectifying the carrier to direct current and then using d-c networks. This procedure is then the same as that of Sect. 2, and the system is no longer an all-a-c system.

Another method of obtaining an effective lag network is to use a small "reset" servo in parallel with the signal channel, as shown in Fig. 28. The servo channel has high gain at low frequencies but, because of the tachometer feedback, this gain falls below that of the regular signal channel below the crossover frequency.

FIG. 28. Reset servo channel in parallel with signal channel. [Block diagram; the reset servo transfer function printed in the figure is not legible in this copy.]

4. OPEN-CLOSED LOOP CONTROL

Open-closed loop control, sometimes called schedule and trim, is not so much a different kind of control as it is a different way of visualizing or synthesizing a control system. The principle is that an open loop control system, although not accurate enough for the complete control, responds predictably and stably to an input signal, and it can be used as an approximate, or first order correction, control system.
Then the required accuracy can be obtained as a correction or "trim" to the open loop and is accomplished by the use of a relatively slow but high gain feedback loop. This is illustrated in the block diagram of Fig. 29, which is modified for analysis purposes in Fig. 30. The only closed loop to consider is the easily stabilized, low-crossover-frequency loop; the required high-frequency response is obtained by the open-ended forcing function. In the actual system this forcing action is attained by the parallel signal channel to the power element.

FIG. 29. Block diagram of open-closed loop control.

An allied situation exists when there is difficulty in measuring the control system output accurately and immediately. The trouble may be in an inherent delay in the measuring device, such as the time lag of a thermocouple measuring temperature, or it may be caused by the need for a smoothing and averaging process to attain accuracy from noisy data. In such cases, the delayed but accurate measurement can be used in a trim feedback control loop, and then an internal, fast response feedback loop is formed by using an alternate measurement. This alternate quantity is related to the desired output in a known way, such as a pressure change which accompanies a change in temperature, but the accuracy of such a relationship is not high enough to use the alternate quantity as the ultimate measurement of the desired output.

FIG. 30. Modification of Fig. 29 for analysis.

Open-closed loop control is covered in detail in Ref. 9.

ACKNOWLEDGMENT

Tables 3 to 7 are reproduced with permission from H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. II, Wiley, New York, 1955.

REFERENCES

1. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. I, Wiley, New York, 1951.
2. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol.
II, Wiley, New York, 1955.
3. J. G. Truxal, Automatic Feedback Control System Synthesis, McGraw-Hill, New York, 1955.
4. Paul E. Smith, Jr., Design regulating systems by error coefficients, Control Eng., 2, 69-75 (1955).
5. Leonard Stanton, Theory and application of parallel-T resistance-capacitance frequency-selective networks, Proc. I.R.E., 34, 447-456 (1946).
6. W. R. Evans, Control System Dynamics, McGraw-Hill, New York, 1954.
7. J. E. Gibson, 14 ways to generate control functions mechanically, Control Eng., 2, 65-69 (1955).
8. D. M. Considine, Editor, Process Instruments and Controls Handbook, McGraw-Hill, New York, 1957.
9. John R. Moore, Combination open-cycle, closed-cycle control systems, Proc. I.R.E., 39, 1421-1432 (1951).

E  FEEDBACK CONTROL

Chapter 24

Noise, Random Inputs, and Extraneous Signals

D. L. Lippitt

1. Introduction
2. Mathematical Description of Noise
3. Measurement of Noise
4. System Response to Noise
5. System Design in the Presence of Noise
References

1. INTRODUCTION

Linear systems can be designed to obtain a desired response to commands and disturbances which may be exactly defined either by an equation or by a graphical plot (Chaps. 19 through 23). In many cases inputs can be described adequately only in a statistical manner. Examples are the jitter observed in automatic radar tracking systems and gust disturbances to an aircraft. This chapter covers methods for:

(a) Measuring and describing statistical inputs.
(b) Computing the system response to such inputs.
(c) Specifying optimum designs.

2. MATHEMATICAL DESCRIPTION OF NOISE

Random processes are described in Chap. 12, Sect. 16, and Chap. 13, Sect. 2. It is sufficient to note here that a random process has a complete set of probability distribution functions. If these distributions are independent of time, the process is stationary and its characteristics can be defined by time averages (Ref. 1).

Autocorrelation.
The most useful description of a random process for control system analysis is the autocorrelation function $\phi$, defined by eq. (1) for a stationary function of time $x(t)$:

$$\phi_{xx}(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} x(t)\,x(t+\tau)\,dt. \tag{1}$$

Figure 1 graphically illustrates eq. (1).

FIG. 1. Illustration of the computation of the autocorrelation function.

For nonstationary processes the autocorrelation may be described by an ensemble average that is a function of time as well as $\tau$. This definition is given in eq. (2):

$$\phi_{xx}(t,\tau) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x_1x_2\,P(t, x_1, t+\tau, x_2)\,dx_1\,dx_2, \tag{2}$$

where $P(t, x_1, t+\tau, x_2)$ is the joint probability density of $x_1$ at time $t$ and $x_2$ at time $(t+\tau)$.

Important properties of the autocorrelation function for stationary series are:

$$\phi_{xx}(\tau) \le \phi_{xx}(0), \tag{3}$$
$$\phi_{xx}(\tau) = \phi_{xx}(-\tau), \tag{4}$$
$$\phi_{xx}(0) = \overline{x^2}. \tag{5}$$

In eq. (5), the bar indicates a time average.

An interesting example is the autocorrelation of the function $x(t) = A\sin\omega t$:

$$\phi_{xx}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} A^2\sin\omega t\,\sin\omega(t+\tau)\,dt = \frac{A^2}{2}\cos\omega\tau. \tag{6}$$

Although $\sin\omega t$ is not strictly stationary, this example illustrates the effect of a pronounced periodicity in noise data. If it exists, it will show up in the autocorrelation as a cosine function.

Cross-Correlation. In some cases a control system will have two inputs, x and y, which are not completely independent. The relationship is expressed by the cross-correlation function defined by eq. (7) for stationary series:

$$\phi_{xy}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} x(t)\,y(t+\tau)\,dt. \tag{7}$$

For nonstationary series $\phi_{xy}$ must be expressed as an ensemble average as given by eq. (8):

$$\phi_{xy}(t,\tau) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} xy\,P(x, t, y, t+\tau)\,dx\,dy. \tag{8}$$

Important properties of the cross-correlation function for stationary series are:

$$[\phi_{xy}(\tau)]_{\max} < \overline{x^2}\ \text{or}\ \overline{y^2}\ \text{(whichever is larger)}, \tag{9}$$
$$\phi_{xy}(\tau) = \phi_{yx}(-\tau). \tag{10}$$

[Figure 2, panel (a): $\Phi(\omega) = P$ for $-\omega_0 < \omega < \omega_0$ and $\Phi(\omega) = 0$ elsewhere; $\phi(\tau) = 2\omega_0P\,[\sin\omega_0\tau/\omega_0\tau]$. Panel (b) is not recoverable from this copy.]

FIG. 2.
(a) and (b) Examples of autocorrelation and spectral density pairs.

The autocorrelation of the sum of two correlated functions is given by eq. (11):

$$\phi_{(x+y)(x+y)}(\tau) = \phi_{xx}(\tau) + \phi_{yy}(\tau) + \phi_{xy}(\tau) + \phi_{yx}(\tau). \tag{11}$$

If $x(t)$ and $y(t)$ are independent:

(a) The cross-correlations become constants equal to the product of their means, $\bar{x}\bar{y}$.
(b) The autocorrelation of the sum becomes the sum of the individual autocorrelations plus twice the product of the means, or $\phi_{xx}(\tau) + \phi_{yy}(\tau) + 2\bar{x}\bar{y}$.

Spectral Density. An alternate description of a stationary random process is the spectral density, $\Phi(\omega)$. It is a measure of the distribution of energy in the frequency spectrum. For a voltage wave the units would be volts$^2$ per radian per second. The following discussion is not rigorous but will show the physical significance of $\Phi(\omega)$.

FIG. 2. (c) and (d) Examples of autocorrelation and spectral density pairs. [Panel (c): $\phi(\tau) = e^{-\alpha|\tau|}\cos(\beta\tau + \varphi)$; the corresponding spectral density expression is not legible in this copy. Panel (d): $\Phi(\omega) = P$, $\phi(\tau) = 2\pi P\,\delta(\tau)$.]

Assume that several samples of noise of duration T seconds have been expanded in a Fourier series of the form shown in eq. (12):

$$x(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} c_n\cos(\omega_n t - \psi_n), \tag{12}$$

where

$$\omega_n = \frac{2n\pi}{T}, \qquad a_n = \frac{2}{T}\int_0^T x(t)\cos\omega_n t\,dt, \qquad b_n = \frac{2}{T}\int_0^T x(t)\sin\omega_n t\,dt,$$
$$c_n^2 = a_n^2 + b_n^2; \qquad \psi_n = \tan^{-1}(b_n/a_n).$$

Assume that T is very long compared to the longest periodicity present in the function. If the c's and $\psi$'s for a given value of n are considered over a large number of samples, it will be found that the $\psi$'s are uniformly distributed between $+\pi$ and $-\pi$ and that the $c_n^2$'s have an average value $\overline{c_n^2}$. A knowledge of the $\overline{c_n^2}$'s for a given process is sufficient to predict the output of a linear control system with a transfer function $G(j\omega)$ between the noise input and the system output. The mean square of the output is

$$\sigma^2 = \frac{\overline{c_0^2}}{4}\,|G(0)|^2 + \sum_{n=1}^{\infty}\frac{\overline{c_n^2}}{2}\,|G(j\omega_n)|^2. \tag{13}$$

The experimental determination of the $\overline{c_n^2}$'s would be relatively inefficient.
However, the $\overline{c_n^2}$'s are related to the spectral density by eq. (14) for large T:

$$\overline{c_n^2} = 4\,\Phi(\omega_n)\,\Delta\omega, \tag{14}$$

where $\omega_n = n\,\Delta\omega = 2\pi n/T$. Hence if the average value of the input is zero, $c_0$ is zero and eq. (13) becomes eq. (15):

$$\sigma^2 = 2\int_0^{\infty}\Phi(\omega)\,|G(j\omega)|^2\,d\omega. \tag{15}$$

The spectral density is related to the autocorrelation function by the Fourier cosine transform as shown in eqs. (16) and (17):

$$\Phi(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\phi(\tau)\cos\omega\tau\,d\tau, \tag{16}$$
$$\phi(\tau) = \int_{-\infty}^{+\infty}\Phi(\omega)\cos\omega\tau\,d\omega. \tag{17}$$

The cross-spectral density $\Phi_{xy}(\omega)$ bears the same transform relationship to the cross-correlation function $\phi_{xy}(\tau)$ as the spectral density does to the autocorrelation function. Figure 2 shows several pairs of spectral density functions and autocorrelation functions.

3. MEASUREMENT OF NOISE

The greatest problem involved in the analysis of the response of a control system to a random input is obtaining the required characteristics of the input. If the input is stationary, long samples are necessary and numerous calculations must be made. In most cases the use of high-speed digital computers is necessary if the job is to be completed in reasonable time. If the input is nonstationary, the magnitude of the job is multiplied many times since the results must be calculated separately for each value of time.

Calculation of $\phi(\tau)$ for Stationary Inputs. The most straightforward method of analysis, if the input is stationary, is to compute the autocorrelation function defined by eq. (1). The approximate form for calculation is given by eq. (18):

$$\phi(m\,\Delta\tau) = \frac{1}{N - m + 1}\sum_{n=0}^{N-m} x_n\,x_{n+m}, \tag{18}$$

where $\Delta\tau$ is the time interval at which values of the function are read and $x_n$ is the value of the function $n\,\Delta\tau$ seconds from the beginning of the sample.

Sampling Rate. The value of $\Delta\tau$ is set by Shannon's sampling theorem:

$$\Delta\tau = \frac{1}{2f_s}. \tag{19}$$

The value of $f_s$ is the highest frequency present in the data. In general, this will not be known.
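The estimate of eq. (18) and the sampling interval of eq. (19) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the test signal is the sinusoid of eq. (6), and the amplitude, the 5-cps signal frequency, the assumed 50-cps band limit, the sample count, and the function names are all hypothetical choices made for the example.

```python
import math

def sampling_interval(highest_frequency):
    """Eq. (19): interval between readings for a given highest frequency (cps)."""
    return 1.0 / (2.0 * highest_frequency)

def autocorrelation(samples, max_lag):
    """Eq. (18): estimate phi(m * dtau) for m = 0 .. max_lag from x_0 .. x_N."""
    last = len(samples) - 1                 # this is N in eq. (18)
    estimates = []
    for m in range(max_lag + 1):
        total = 0.0
        for n in range(last - m + 1):       # n = 0 .. N - m
            total += samples[n] * samples[n + m]
        estimates.append(total / (last - m + 1))
    return estimates

# Assumed example: x(t) = A sin(wt) with A = 2 at 5 cps, band limit taken as 50 cps.
amplitude = 2.0
signal_frequency = 5.0
dtau = sampling_interval(50.0)
samples = [amplitude * math.sin(2.0 * math.pi * signal_frequency * n * dtau)
           for n in range(2001)]
phi = autocorrelation(samples, 200)

# Eq. (6) predicts (A**2 / 2) * cos(w * tau) for this signal.
for m in (0, 10, 20):
    predicted = 0.5 * amplitude ** 2 * math.cos(
        2.0 * math.pi * signal_frequency * m * dtau)
    print(m, phi[m], predicted)
```

For a finite sample the estimate differs from the cosine of eq. (6) by a small end effect that shrinks as the sample is lengthened.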
In most control systems there are considerations other than noise which set an upper bound on the system band pass, so that a filter may be inserted in the device which records the sample to eliminate frequencies not of interest. This is desirable since by increasing $\Delta\tau$ the number of calculations is reduced.

Required Range of $\tau$. The maximum value of $\tau$ is determined by the use to which $\phi(\tau)$ is to be put. If it is desired to compute the variance of a system output, reference to eq. (29) will show that $m_{\max}\,\Delta\tau$ should equal the longest anticipated settling time of the system output to an impulse applied at the noise input. An insight to the effect of using a finite value, $m_{\max}$, can be obtained by performing the integration of eq. (16) over a finite range or by multiplying the true autocorrelation function by a function $u(\tau)$ which equals unity for $-T < \tau < +T$ and zero elsewhere. Then the approximate spectrum is given by eq. (20):

$$\Phi_{\text{approx}}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} u(\tau)\,\phi(\tau)\cos\omega\tau\,d\tau. \tag{20}$$

Since the true spectrum is the Fourier transform of $\phi(\tau)$, and $(T/\pi)(\sin\omega T/\omega T)$ is the transform of $u(\tau)$, the approximate spectrum is given by eq. (21):

$$\Phi_{\text{approx}}(\omega) = \int_{-\infty}^{+\infty}\Phi(\omega - a)\,\frac{T}{\pi}\,\frac{\sin aT}{aT}\,da. \tag{21}$$

For example, if the time function were a pure sine wave of frequency $\omega_0$ of unity power, the true spectrum would be given by eq. (22):

$$\Phi(\omega) = \tfrac{1}{2}[\delta(\omega_0) + \delta(-\omega_0)], \tag{22}$$

where $\delta(\omega)$ = impulse at $\omega$. Then if the autocorrelation were computed only for $(-T < \tau < +T)$, the approximate spectrum computed from the result would be given by eq. (23):

$$\Phi_{\text{approx}}(\omega) = \frac{T}{2\pi}\left[\frac{\sin(\omega - \omega_0)T}{(\omega - \omega_0)T} + \frac{\sin(\omega + \omega_0)T}{(\omega + \omega_0)T}\right]. \tag{23}$$

The approximate spectrum of eq. (23) takes on negative values at some frequencies, which are physically impossible.

Required Sample Length. The required sample length for computing $\phi(\tau)$ depends on the use to which the result will be put and to a certain extent upon the frequencies contained in the input function. A useful rule of thumb is that N in eq. (18) should be at least 10 times the maximum value of m. As a check, the autocorrelation can be computed for two samples of equal length.
If then the results are nearly equal, the samples are probably long enough. If not, the average of the two should be compared with results from a sample twice as long and so forth until agreement is reached.

Once an approximate knowledge of the frequency components of the noise has been obtained, a better estimate can be made of the required sample length. Reference to eq. (28) shows that computing the system output from the autocorrelation function of the input is equivalent to computing the output directly by convolution and averaging the square. Hence, if a sample of the system output T seconds long is sufficient to give an accurate measure of the output mean square, a sample $T + T_s$ seconds long, where $T_s$ is the system settling time, is sufficiently long to compute the autocorrelation of the input. If $\phi_o(\tau)$ is the estimated output correlation, the ratio of the standard deviation of the computed output mean squares $\sigma[\sigma_o^2]$ for samples T seconds long to the true mean square is given by eq. (24):

$$\frac{\sigma[\sigma_o^2]}{\sigma_o^2} = \left[\frac{2}{T^2\phi_o^2(0)}\int_0^T (T - \tau)\,\phi_o^2(\tau)\,d\tau\right]^{1/2}. \tag{24}$$

Figure 4 shows a plot of this ratio, where $\phi_o(\tau)$ is $e^{-a|\tau|}$, as a function of $aT$. References 2, 3, and 4 give a more detailed consideration of the problem.

FIG. 4. Standard deviation of errors in computing the output mean square error from a finite length sample (ratio from 0.01 to 1.0 plotted against $aT$ from 10 to 1000).

Nonstationary Inputs. Any process observed in nature is nonstationary in the strict sense of the term. However, in many cases the input characteristics will vary so slowly that samples long enough to compute $\phi(\tau)$ can be considered stationary. In this case the techniques previously discussed are applicable.

A slightly more difficult problem exists when the input changes too rapidly to obtain a sufficiently long sample, but it still does not change appreciably during the settling time of the control system.
In this case, several recordings of the input must be obtained over the range of characteristics of interest. Then short samples can be drawn from common points and the autocorrelation averaged. The amount of computation required is vastly increased.

One type of slowly changing nonstationary function can be treated in a simpler manner. If the frequency components retain the same amplitudes relative to each other but their absolute magnitude increases or decreases with time, a single recording can be used. The input is divided into several short samples. The autocorrelation of each sample is normalized, that is, the function is divided by $\phi(0)$, and the normalized functions are averaged. The autocorrelation for any specific time is then the averaged normalized autocorrelation function times the mean square value of the input corresponding to that time. Note that this technique is helpful only if it is known that the spectra have the same form. Otherwise, more data would have to be taken to establish the point.

If the input characteristics vary appreciably during a settling time of the system, the problem becomes immensely complicated. To compute $\phi(\tau, t_1)$ it is necessary to average the products from many recordings of the input. At least one hundred products would be necessary to obtain 10 per cent accuracy when the output of a control system is calculated.

Correlation Computers. Special computers for the computation of correlation functions can be built where many correlations must be done and high-speed digital computers are not available. The basic principle is illustrated in Fig. 5.

FIG. 5. Correlation computer.

The noise is recorded on a medium such as magnetic tape and played back through two reading heads spaced a distance d apart. The second reading head receives the same signal as the first except that it is delayed by a time equal to d divided by the tape speed.
The two outputs are then multiplied and the result integrated. The output of the integrator is then given by eq. (25):

$$I = \int f(t)\,f(t - \tau)\,dt. \tag{25}$$

I may then be divided by the time interval over which the run is made to get the autocorrelation.

Other possible recording media are photographic film and ink recordings tracked by hand (Ref. 5). Still other methods use pulse sampling and storage. Computers of this type depend on the availability of the noise in the right form. For instance, radar tracking data are usually taken with a moving picture camera so that the data must be read frame by frame before they are useful. Also, several records may have to be combined to arrive at the noise. In such cases, it would be simpler to use a general purpose digital computer. The accuracy of such systems is limited by the recording mechanism, multiplier, and integrator. To obtain reasonable accuracy these units become bulky and expensive.

4. SYSTEM RESPONSE TO NOISE

Time Domain Methods. The response of a linear dynamic system to an input $x(t)$ is given by the integral in eq. (26):

$$c(t) = \int_0^{\infty} g(s)\,x(t - s)\,ds. \tag{26}$$

Substituting eq. (26) in eq. (1) gives the autocorrelation function of the output if the input is stationary:

$$\phi_{cc}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T}\int_0^{\infty} g(s)\int_0^{\infty} g(r)\,x(t + \tau - r)\,x(t - s)\,dr\,ds\,dt. \tag{27}$$

The mean square value of the output is obtained by setting $\tau = 0$ in eq. (27):

$$\overline{c^2(t)} = \int_0^{\infty} g(s)\int_0^{\infty} g(r)\,\phi_{xx}(s - r)\,dr\,ds. \tag{28}$$

By an appropriate change of variables the alternate form of eq. (28) is given by eqs. (29) and (30):

$$\overline{c^2(t)} = \int_{-\infty}^{+\infty}\phi_{xx}(\tau)\,\phi_{gg}(\tau)\,d\tau, \tag{29}$$
$$\phi_{gg}(\tau) = \int_{-\infty}^{+\infty} g(t)\,g(t + \tau)\,dt. \tag{30}$$

Strictly speaking the transition from eq. (28) to eq. (29) is possible only if the input $x(t)$ is stationary. However, if the characteristics of $x(t)$ vary only slightly during one settling time of the control system and if $\phi(\tau)$ has been computed from a sample which is long compared to one settling time, eq. (28) is approximately true for nonstationary inputs.
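A small numerical sketch may help make eqs. (29) and (30) concrete. It assumes, purely for illustration, an input with autocorrelation $e^{-a|\tau|}$ and a first-order lag with impulse response $g(t) = (1/T_c)e^{-t/T_c}$; for this pair the integrals can be done by hand and give a mean square output of $1/(1 + aT_c)$. The parameter values, the integration step, and the function names are assumptions of the example.

```python
import math

def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def mean_square_output(phi_xx, g, horizon, step):
    """Evaluate eq. (29), with phi_gg obtained numerically from eq. (30).

    Both correlation functions are even in tau (eq. 4), so the integral over
    all tau is taken as twice the integral over tau >= 0.
    """
    n = int(round(horizon / step))
    g_samples = [g(k * step) for k in range(2 * n + 1)]
    integrand = []
    for m in range(n + 1):
        products = [g_samples[k] * g_samples[k + m] for k in range(n + 1)]
        phi_gg = trapezoid(products, step)          # eq. (30) at tau = m * step
        integrand.append(phi_xx(m * step) * phi_gg)
    return 2.0 * trapezoid(integrand, step)         # eq. (29)

# Assumed example values (hypothetical).
a = 2.0          # input autocorrelation exp(-a * |tau|)
t_c = 0.5        # time constant of the first-order lag

numerical = mean_square_output(
    phi_xx=lambda tau: math.exp(-a * abs(tau)),
    g=lambda t: math.exp(-t / t_c) / t_c,
    horizon=10.0,
    step=0.01,
)
closed_form = 1.0 / (1.0 + a * t_c)
print(numerical, closed_form)
```

The horizon of 10 seconds is twenty time constants of the assumed lag, which is consistent with the earlier remark that the range of $\tau$ should cover the settling time of the system.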
For the more general case of a linear time-varying system with a nonstationary input, the mean square output at time t is given by eq. (31):

$$\widetilde{c^2(t)} = \int_0^{\infty} g(t, s)\int_0^{\infty} g(t, r)\,\phi_{xx}(t - r,\, r - s)\,dr\,ds. \tag{31}$$

The autocorrelation function is defined in this case by eq. (2). The function $g(t, s)$ is defined as the effect on the system output at time t of an impulse applied at time $(t - s)$. The wavy line over $c^2(t)$ in the equation indicates an ensemble average rather than a time average.

Frequency Domain Methods. For cases where the spectral density is known, the mean square of the output is given by eq. (32):

$$\overline{c^2(t)} = 2\int_0^{\infty}\Phi(\omega)\,|G(j\omega)|^2\,d\omega. \tag{32}$$

FIG. 6. Control system with two inputs: (a) actual circuit; (b) equivalent circuit.

A more general case is shown in Fig. 6, where there are two inputs to a control system and the inputs may be correlated. In terms of the equivalent circuit, the mean square output is given by eq. (33):

$$\overline{c^2(t)} = 2\int_0^{\infty}\left[\,|G_x(j\omega)|^2\Phi_{xx}(\omega) + |G_y(j\omega)|^2\Phi_{yy}(\omega) + G_x^*(j\omega)G_y(j\omega)\Phi_{xy}(\omega) + G_x(j\omega)G_y^*(j\omega)\Phi_{yx}(\omega)\right]d\omega. \tag{33}$$

The starred transfer functions are complex conjugates.

Computer Methods. A modification of eq. (28) leads to an analog computer method for computing noise output. If the input noise has a constant spectral density K, the autocorrelation function becomes a delta function at $\tau = 0$ with strength $2\pi K$. Then the system output is given by eq. (34):

$$\overline{c^2(t)} = 2\pi K\int_0^{\infty} g^2(\tau)\,d\tau. \tag{34}$$

Generating the impulse response by analog techniques, squaring it, and integrating the result give the mean square output.

FIG. 7. Computer simulation to compute the mean square output for a correlated noise input.

If the input noise does not have a constant spectral density, the output can be computed from the system shown in Fig. 7. The filter transfer function is specified by eq.
(35): [Eq. (35) is missing from this copy.]

The same technique can be used for linear time-varying systems, and for nonstationary random inputs if the inputs are equivalent to stationary noise passing through a linear time-varying filter. For this case eq. (34) becomes eq. (36):

$$\widetilde{c^2(t)} = 2\pi K\int_0^{\infty} g^2(t, \tau)\,d\tau, \tag{36}$$

where $g(t, \tau)$ has the definition given following eq. (31). The function $g(t, \tau)$ with $\tau$ as the variable can be generated from the adjoint of the control system and a shaping filter (Ref. 6). The following is quoted from Ref. 7.

"The adjoint is found from the analog of the original by:
1. Turning each element in the loop around and reversing the direction of signal flow.
2. Letting the variation of time-varying element start from some time $t_1$ and run backward relative to the action in the original system.
3. Interchange the input and output of the system. The new input is $\delta(t - t_1)$."

The output is then $g(t, \tau)$ as a function of $\tau$. This output can then be squared and integrated to give a machine solution of eq. (36). Figure 8 shows an example of a system (a) and its adjoint (b).

FIG. 8. Analog of a system (a) and its adjoint (b).

Noise Generators. Many nonlinear control systems and systems involving a human operator will not yield readily to analytical techniques. In these cases simulation using a random noise generator is required. Several such generators (Refs. 8, 9, 10) are available. In general, the output is a flat spectrum with various amplitude distributions possible. Where the spectrum of the true input noise is known, a shaping filter, as specified in eq. (35), can be used to modify the output of the noise generator.

Although the use of a noise generator and a simulated system provides a simple solution to many complex problems, it is also a time-consuming one.
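The trade-off between the impulse-response calculation and a noise-generator run can be illustrated with a small discrete-time sketch. It is only an analogue of the continuous-time procedure, under stated assumptions: the system is a hypothetical first-order recursion y[k+1] = alpha*y[k] + (1 - alpha)*w[k], the noise generator is Python's seeded pseudo-random Gaussian source with unit variance, and the discrete counterpart of eq. (34) is taken to be the sum of the squared impulse-response samples.

```python
import random

def impulse_response_energy(alpha, terms=2000):
    """Sum of squared impulse-response samples of the assumed recursion.

    A unit impulse at w[0] produces the response (1 - alpha) * alpha**k.
    """
    return sum(((1.0 - alpha) * alpha ** k) ** 2 for k in range(terms))

def simulated_mean_square(alpha, count, seed):
    """Drive the recursion with unit-variance white noise and average y squared."""
    rng = random.Random(seed)
    y = 0.0
    total = 0.0
    for _ in range(count):
        y = alpha * y + (1.0 - alpha) * rng.gauss(0.0, 1.0)
        total += y * y
    return total / count

alpha = 0.9                                   # assumed filter coefficient
predicted = impulse_response_energy(alpha)    # equals (1 - alpha) / (1 + alpha)
short_run = simulated_mean_square(alpha, 2000, seed=1)
long_run = simulated_mean_square(alpha, 200000, seed=1)
print(predicted, short_run, long_run)
```

The impulse-response sum gives the answer in one pass, whereas the simulated value only settles toward it as the run is lengthened, which is the time cost referred to above.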
The methods of the previous section of this chapter provide an indication of the sample lengths required where non-time-varying systems are tested with stationary inputs. For time-varying systems, the answers will be of interest at one or more times during the run. The number of runs required is determined by standard statistical methods.

5. SYSTEM DESIGN IN THE PRESENCE OF NOISE

Previous sections of this chapter have shown methods for describing a random input or disturbance and methods for computing the response of a system to these inputs. It remains then to establish procedures which can be used to apply these methods to the design of control systems. Unfortunately each problem differs, slightly or greatly, from any general case, so that the designer must examine his problem and determine what methods are adequate.

Mean Square Error Criteria. Practically all the work covered in this section aims at minimizing the mean square error of the system. In some cases restraints are placed upon the solution in an attempt to conform more nearly to the practical situation. The limitations of this approach are listed below.

1. The mean square error may not be the proper criterion. For instance, in a gun fire control system the object is to maximize the probability of destroying the target.
2. The data concerning the system inputs will seldom be exact enough to warrant an extended analysis or to justify the system complexity required to realize the desired response.
3. The optimum design may be very sensitive to practical limitations of the system, such as gain variations.

As a result, it is suggested that formal methods of optimum design are good guides for a design but that more useful results are obtained by starting with a conventional design and varying the parameters to minimize the noise error as computed by formulas in Sect. 3 of this chapter or by analog computers.

Optimum Design for Stationary Random Inputs.
For the case where the signal and the noise enter the system at the same point, are both random and stationary, and have zero cross-correlation, the linear filter giving the least mean square error is given by eq. (37) (Ref. 11):

(37) $G_{\text{opt}}(j\omega) = \dfrac{\phi_s(\omega)}{\phi_s(\omega) + \phi_n(\omega)}\,G_d(j\omega),$

where $\phi_s(\omega)$ = signal spectral density,
$\phi_n(\omega)$ = noise spectral density,
$G_d(j\omega)$ = desired transfer function if noise were not present.

The mean square error is given by eq. (38):

(38) $\sigma^2_{E\,\min} = \ldots$

…

$\ldots(t, t - \tau) = \displaystyle\int_{-\infty}^{+\infty} \ldots(t - \tau, t - r)\,g(t - r, t)\,dr,$

where $g(t - r, t)$ is the response of the control system at time $t$ to an impulse applied at time $(t - r)$ and the autocorrelation functions are defined by eq. (2). Equation (49) will usually require a numerical solution. The response $g(t - r, t)$ is the optimum only at time $t$. A different time will in general require a different response, so that the system must be time-varying. The values of $g(t - r, t)$ are most easily obtained by using adjoint techniques with an analog computer (Ref. 7). A cut-and-try variation of parameters can be used to approximate the calculated optimum response.

System Optimization Under Constraints. The response called for by the minimum squared error criterion may place unrealistic demands on components, by requiring extended ranges of operation to prevent saturation or by requiring highly rated components to prevent overheating. Hence, the optimization should more practically be carried out under the restraints of rms power dissipated in the output or signal level at various points in the system (Refs. 16 and 17).

EXAMPLE. Suppose that the input consists solely of a signal with spectral density …

… 1 must be true for phenomena to occur in single loop system with simple saturation ($|G|$ = open loop gain, $\gamma$ = phase margin). (See Ref. 3.)
TABLE 1. COMMON NONLINEAR PHENOMENA

1. Jump resonance

Characteristics and/or Effects: When excited by a sinusoidal driving signal, the resonance is normal for small amplitude signals. Theoretically, for larger amplitude signals the resonance bends as in A below. In practice the three-valued function cannot be measured, but the response will appear to jump as in B below. The jump will occur at different values of frequency depending upon whether the frequency is increasing or decreasing. The phase characteristic exhibits a corresponding jump.

[Sketches A and B: response amplitude versus frequency for small and large inputs, showing the locus of resonant peaks (A) and the observed jump (B).]

General Remarks: For jump resonance to occur the system must be second order or higher. To have significant bending the damping must be 0.1 or less in a second order system. The phenomenon can occur in systems with saturation or increasing gain characteristics. The bending is to the right for increasing gain characteristics and to the left for saturation. The normalized second order equation of the type $d^2x/dt^2 + dx/dt + f(x) = F \cos \omega t$ (Duffing's equation) has been solved for various forms of the function $f(x)$, and each case has exhibited the jump resonance when the viscous damping was small. (See Ref. 4.) The existence of the jump resonance can be confirmed by the use of describing functions. (See Refs. 3 and 5.)

2. Limit cycle or bounded oscillations

Condition for Occurrence: Occurs in unstable or conditionally stable nonlinear systems. Describing function analysis shows that conditions of $|G| = 1$ and $\gamma = 0$ must be met for phenomena to occur in simple systems. For a stable system with an unstable limit cycle it is necessary to excite the system beyond the level of the limit cycle to obtain self-sustained oscillations.

Characteristics and/or Effects: An unstable linear system can exhibit oscillations that grow without bound. A nonlinear system that is unstable can oscillate at fixed amplitudes. Such oscillations are referred to as limit cycle oscillations. Limit cycles can be either stable or unstable depending upon whether the oscillation converges or diverges from the conditions represented. Depending upon the system characteristics, the limit cycle oscillation can vary from nearly simple harmonic oscillation to a highly nonlinear, relaxation type oscillation. Self-excited oscillations arising in a stable system with unstable limit cycle are referred to as soft oscillations. Self-sustained oscillations which occur after the system has been excited to a given level (unstable limit cycle) are referred to as hard oscillations.

General Remarks: Limit cycle oscillations can arise from a wide variety of system conditions. Conditionally stable systems with saturation will contain both a stable and an unstable limit cycle, and an unstable system with saturation will have one stable limit cycle. System imperfections that appear at low signal levels (backlash, friction, etc.) can, under the proper conditions, cause limit cycle oscillations. Existence of this type of limit cycle makes it necessary to define instability in terms of the acceptable magnitude of an oscillation, since a low level nonlinear oscillation may or may not be detrimental to performance of the system. Because soft and hard types of oscillations can exist, the designer must specify the input range completely in evaluation or synthesis of a nonlinear system. Limit cycles can be most correctly explained by use of the phase plane; however, the magnitude and fundamental frequency of the limit cycle can be estimated to a first order of magnitude by means of describing functions.

3. Subharmonic generation

Condition for Occurrence: Appears in nonlinear systems excited sinusoidally. No general rules are available defining the necessary conditions for occurrence. The phenomena have been observed in lightly damped systems with nonlinear restoring force and in systems with nonlinear energy delays.

Characteristics and/or Effects: When the output contains subharmonics of the input exciting frequency, the phenomenon is referred to as subharmonic generation.

General Remarks: Systems with elements having hysteresis, i.e., backlash, magnetic hysteresis, friction, have been known to exhibit this type of performance when excited with a sinusoidal input. The transition from harmonic to subharmonic operation can be quite sudden, but once the subharmonic is established, it is often quite stable. (See Ref. 8 and its bibliography.)

4. Intermodulation effect on gain

Condition for Occurrence: Occurs in amplitude-sensitive nonlinear systems excited by two or more frequencies. The frequencies can be separate inputs or one input with a complex waveform. The amplitude of the complex wave must be sufficient to enter the nonlinear region.

Characteristics and/or Effects: Because of the amplitude-sensitive nonlinearity the frequencies will be intermodulated. This causes the original frequency components to have different amplitudes and phase shift than obtained from the nonlinear system with only one frequency present. This can be interpreted as a different phase shift and/or attenuation through an element. The effect is also apparent when noise is present with the signal.

General Remarks: In a simple saturating system the effect can be explained quite easily. Two frequencies are considered. After the saturation the amplitude of both will be reduced beyond that expected if only one frequency had been present. If we are considering the effective gain with respect to one of the two frequencies, the gain will have been reduced. This gain reduction in the open loop can be interpreted as reducing the gain crossover and therefore increasing the phase shift of the closed loop for the frequency being considered. (See Ref. 7.)
By considering one of the frequencies as an extraneous signal, the effect of noise on the performance of a saturating system can be envisioned. The effect is particularly significant if the amplitude or phase shift of the closed loop is important to system performance.

…ment. A great deal of work of this type remains to be done in nonlinear system analysis.

The major problem for the systems engineer lies in synthesis of a control. In synthesis one needs, in addition to a full appreciation of the characteristics and a complete understanding of the nature of the task to be performed: (a) methods of rapidly and approximately estimating the effect of different types of compensation, in order to allow selection of a potentially good approach, and (b) having selected a general approach, a logical design procedure which converges on the "best" design. Although no such generally satisfactory method exists, the methods of linearization, describing functions, and phase plane analysis are powerful analytical tools for attacking nonlinear problems. (See Sects. 3, 4, and 5.) By and large these methods are not exact but often suffice for preliminary design calculations. The majority of these methods attempt to linearize the problem sufficiently to allow the use of the well-known techniques used in the study of linear systems. Because of the difficulty in providing generalized design criteria or design charts for any but the simplest nonlinear system, the synthesis of a system using nonlinear elements is primarily a cut-and-try process tempered with common sense.

Unusual Phenomena Peculiar to Nonlinear Systems. Many unusual phenomena occur in nonlinear systems. In a linear system the response to a given input defines the response to be expected from any input. This is not true for a nonlinear system. Cases arise where performance may completely deteriorate between a step response and a sinusoidal response.
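The statement that one response of a nonlinear system does not define its other responses can be illustrated with a small simulation. The sketch below is an assumption-laden example, not taken from the handbook: it integrates a hypothetical first-order loop, dx/dt = sat(gain·(step − x)), by Euler's method and compares step responses after normalizing each by its step size. For a linear system the two normalized curves would coincide.

```python
def step_response(step, limit=1.0, gain=5.0, dt=0.001, n_steps=3000):
    """Euler simulation of dx/dt = sat(gain * (step - x)), with x(0) = 0.

    Returns the response normalized by the step size.
    """
    x, out = 0.0, []
    for _ in range(n_steps):
        drive = gain * (step - x)
        drive = max(-limit, min(limit, drive))   # saturating element
        x += drive * dt
        out.append(x / step)
    return out


small = step_response(0.1)   # drive never reaches the limit: linear behavior
large = step_response(2.0)   # drive is limited for most of the transient
k = 999                      # index of the sample at t = 1
print(f"normalized response at t = 1: small {small[k]:.3f}, large {large[k]:.3f}")
```

The small step has essentially settled by t = 1, while the large step is still rate-limited by the saturation, so the normalized curves differ markedly even though the system is the same.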
Table 1 summarizes some of the more common types of nonlinear phenomena which have been catalogued. The list demonstrates that the designer must be aware of the peculiar characteristics exhibited by nonlinear systems and must completely specify the operating conditions in order to proceed with an intelligent, efficient design. The types of nonlinear phenomena described in Table 1 are those most commonly encountered in nonlinear feedback control systems. Other types of nonlinear phenomena have been catalogued: frequency entrainment, asynchronous excitation and asynchronous quenching, parametric excitation, etc. See Ref. 6 and its bibliography for more details on nonlinear phenomena.

3. METHODS OF ANALYSIS: LINEARIZATION

Frequency response analysis can be used only when the system is described by linear constant coefficient equations. Certain nonlinear systems can be linearized by use of the perturbation theory. The method assumes that for very small deviations about the operating point the system is linear. The perturbation method determines the coefficients of the new linear equation describing the performance of the system. Once the system is reduced to a linear form, the usual frequency response techniques can be applied.

Method of Evaluating Linearized Coefficients. A nonlinear function of one or more variables can be linearized if the function is analytic (Refs. 6 and 9). Expand the function in a Taylor series about the operating point and neglect all the second order and higher derivatives. One thereby considers only the incremental change about a nominal value. If the function is $y = f(x_1, x_2, x_3, \ldots, x_n) = f(x_i)$, then the Taylor expansion about point $a_i$ ($x_1 = a_1$, $x_2 = a_2$, $x_3 = a_3$, etc.) is

$f(x_i) = f(a_i) + \displaystyle\sum_{i=1}^{n} \left.\frac{\partial f}{\partial x_i}\right|_{x_i = a_i,\; x_k = a_k,\; k \neq i} (x_i - a_i) + \cdots$

or

$f(x_i) - f(a_i) \approx \displaystyle\sum_{i=1}^{n} \left.\frac{\partial f}{\partial x_i}\right|_{x_i = a_i,\; x_k = a_k,\; k \neq i} (x_i - a_i)$

and

(1) $\Delta y = \displaystyle\sum_{i=1}^{n} \left.\frac{\partial f}{\partial x_i}\right|_{x_i = a_i,\; x_k = a_k,\; k \neq i} \Delta x_i$

for small values of $x_i - a_i$, where $\Delta y = f(x_i) - f(a_i)$ and $\Delta x_i = x_i - a_i$. Equation (1) is a linear relation between $\Delta y$ and $x_i - a_i$ or $\Delta x_i$, the incremental changes.
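The evaluation of the linearized coefficients in eq. (1) can also be carried out numerically. The following is a minimal sketch under stated assumptions: the partial derivatives are estimated by central differences rather than analytically, and the function `f`, the operating point `a`, and the helper name `linearize` are hypothetical choices made only for illustration.

```python
def linearize(f, a, h=1e-6):
    """Return f(a) and central-difference estimates of df/dx_i at point a."""
    f0 = f(*a)
    gains = []
    for i in range(len(a)):
        up = list(a)
        dn = list(a)
        up[i] += h
        dn[i] -= h
        gains.append((f(*up) - f(*dn)) / (2.0 * h))
    return f0, gains


def f(x1, x2):
    """Hypothetical nonlinear function used only as an example."""
    return x1 * x1 * x2


a = (2.0, 3.0)                    # operating point (assumption)
f0, gains = linearize(f, a)       # analytic partials here are 2*x1*x2 and x1**2

dx = (0.01, -0.02)                # small deviations from the operating point
dy_linear = sum(g * d for g, d in zip(gains, dx))    # eq. (1)
dy_exact = f(a[0] + dx[0], a[1] + dx[1]) - f0
print(gains, dy_linear, dy_exact)
```

Comparing `dy_linear` with `dy_exact` for progressively larger deviations gives a direct feel for how quickly the linear relation loses accuracy away from the operating point.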
The accuracy of the approximation can be estimated by evaluating the next terms in the Taylor series, e.g., the second order terms are

$\dfrac{1}{2}\displaystyle\sum_{i}\sum_{k} \left.\frac{\partial^2 f}{\partial x_i\,\partial x_k}\right|_{x_i = a_i,\; x_k = a_k} \Delta x_i\,\Delta x_k.$

Alternate Method of Evaluation of Linearized Coefficients. The same linearization can be accomplished by substituting $a_1 + \Delta x_1$, $a_2 + \Delta x_2$, etc., for $x_1$, $x_2$, etc., in the original function, and neglecting all second order and higher terms, i.e., terms containing products of the type $\Delta x_1\,\Delta x_2$. Here $a_1$ and $a_2$ are the values at the operating point and $\Delta x_1$ and $\Delta x_2$ are small deviations.

EXAMPLE. Consider the nonlinear function $f(x, y) = w = xy$. Substituting $x = x_0 + \Delta x$, $y = y_0 + \Delta y$, $w = w_0 + \Delta w$ gives

$w_0 + \Delta w = x_0 y_0 + \Delta x\,y_0 + x_0\,\Delta y + \Delta x\,\Delta y.$

If $\Delta x$ and $\Delta y$ are small, the term $\Delta x\,\Delta y$ is small and can usually be neglected:

$w_0 + \Delta w \approx x_0 y_0 + \Delta x\,y_0 + \Delta y\,x_0.$

Since $w_0 = x_0 y_0$, the remaining two terms must be deviations; therefore,

(2) $\Delta w = y_0\,\Delta x + x_0\,\Delta y.$

Equation (2) is a linear expression for $\Delta w$. For small variations about the operating point $x_0$, $y_0$, eq. (2) describes the performance. The coefficient $y_0$ is the gain between $\Delta x$ and $\Delta w$, and $x_0$ is the gain between $\Delta y$ and $\Delta w$. The same relationship would have been obtained by using the Taylor series expansion.

Table of Useful Algebraic Approximations for Linearization. Table 2 is useful when making the above substitution into nonrational equations. The terms in Table 2 were determined by considering the series expansion of the closed form. The last column of the table can be used to estimate the accuracy of the approximation. To use the table, it is necessary to work the expression into a nondimensional form.

TABLE 2. USEFUL ALGEBRAIC APPROXIMATIONS, $m \ll 1$

No. | Algebraic Expression | Approximation | Next Term in Series
1 | $1/(1 + m)$ | $1 - m$ | $+m^2$
2 | $(1 + m)^n$ | $1 + mn$ | $+n(n - 1)m^2/2$
3 | $e^m$ | $1 + m$ | $+m^2/2$
4 | $\log_e (1 + m)$ | $m$ | $-m^2/2$
5 | $\sin (m)$ | $m$ | $-m^3/6$
6 | $\cos (m)$ | $1$ | $-m^2/2$
7 | $(1 + m_1)(1 + m_2)$ | $1 + m_1 + m_2$ | $+m_1 m_2$

EXAMPLE. Consider the flow through a variable orifice.
Flow, $q$, is given by

$q = C_d A \sqrt{P},$

where $A$ = orifice area, a variable,
$P$ = pressure drop, a variable,
$C_d$ = flow coefficient, a constant.

By substituting the incremental change form,

$q_0 + \Delta q = C_d (A_0 + \Delta A)\sqrt{P_0 + \Delta P}.$

Dividing the quantity under the radical sign by $P_0$ yields

$q_0 + \Delta q = C_d (A_0 + \Delta A)\sqrt{P_0}\,\sqrt{1 + \dfrac{\Delta P}{P_0}}.$

Using approximation 2 in Table 2 for $n = \tfrac{1}{2}$ yields the expression

$q_0 + \Delta q \approx C_d (A_0 + \Delta A)\sqrt{P_0}\left(1 + \dfrac{1}{2}\,\dfrac{\Delta P}{P_0}\right).$

By expanding and neglecting higher order and constant terms (the constant terms cancel because $q_0 = C_d A_0 \sqrt{P_0}$), this reduces to

$\Delta q = \left(\dfrac{C_d A_0}{2\sqrt{P_0}}\right)\Delta P + \left(C_d \sqrt{P_0}\right)\Delta A,$

where the terms in parentheses are the equivalent gains between $\Delta P$ and $\Delta A$ and the flow, $\Delta q$.

Graphically Linearizing System Characteristic. The analytical expression for the function need not be known. If the graphical relationship between the variables is known, a linear expression can be obtained by considering incremental departures from the nominal values at the particular operating condition being considered.

FIG. 2. Determination of the linearized coefficient from a plot of the function. (Tangent at $w_0$, $x_0$; slope $= \left.\partial w/\partial x\right|_{x_0}$.)

For Fig. 2, then,

$w_0 + \Delta w = f(x_0) + (\text{slope of function at } x_0, w_0)\,\Delta x = w_0 + \left.\dfrac{\partial w}{\partial x}\right|_{x_0} \Delta x$

and

$\Delta w = \left.\dfrac{\partial w}{\partial x}\right|_{x_0} \Delta x.$

The method can be extended to functions of more than one variable by obtaining the slopes from the appropriate curves. Note that in dealing with functions of more than one variable the slope must be taken so as to be independent of all variables but the ones being considered.

FIG. 3. Determination of the linearized coefficients from cross-plots of a function of two variables: (a) xw plot and (b) yw plot.

In Fig. 3 a function of two variables is cross-plotted to obtain the independent slopes from separate plots.
From these the linear relation for small deviations at values of the variables $x_2$, $y_2$ is

$\Delta w = \left.\dfrac{\partial w}{\partial x}\right|_{x_2, y_2} \Delta x + \left.\dfrac{\partial w}{\partial y}\right|_{x_2, y_2} \Delta y.$

For functions of two variables that are reasonably regular it is usually possible to pick the values for calculating the slopes from a single plot of the function and avoid the labor of cross-plotting the functions (see Ref. 9).

Use of Linearized Coefficients. The characteristics of the function or system are approximated by the linearized coefficients for small variations from the operating point. Therefore, for small disturbances, only the deviation terms need to be considered in determining the stability. The "constant" terms defining the operating point will remain the same to a first order approximation. Under these conditions the block diagram for the multiplication $w = xy$ is shown in Fig. 4.

FIG. 4. Equivalent block diagram for the linearized small deviation expression for the function $w = xy$.

For small disturbances, the approximation can be substituted into the system diagram and the usual methods of linear analysis and compensation used. Note that one is now dealing only with the deviations and not the total variable. For further applications of this method see Sect. 7.

Limitations. The above approximations are limited to small deviations from the operating point. The errors get progressively worse as the signal level is increased, and considerable care must be exercised when dealing with large excursions. The validity of the approximation can be checked by evaluating the next terms in the Taylor series or the last column of Table 2.

The perturbation theory is valid only if the derivative of the function exists. The method would be of doubtful value in dealing with a relay characteristic.

The method is sometimes limited when the operating point is at 0. One or more of the variables will then have a steady-state value of 0, and terms involving that variable and the deviation can be lost.
This can lead to an indeterminate or inaccurate solution. For instance, in the example of linearizing $w = xy$, if either $x_0$ or $y_0$ were 0, the variation of $\Delta w$ with the corresponding value of $\Delta x$ or $\Delta y$ is zero. See eq. (2). Although the error of the approximation is still $\Delta x\,\Delta y$, this term becomes significant with respect to the other terms as $x_0$ and $y_0$ become small.

Consider the example of the flow through an orifice. The linearized deviation expression is repeated here:

$\Delta q = \left(\dfrac{C_d A_0}{2\sqrt{P_0}}\right)\Delta P + \left(C_d \sqrt{P_0}\right)\Delta A.$

As $P_0 \to 0$, this expression becomes indeterminate and the analysis based on the approximation under these conditions loses significance. A linear expression no longer adequately describes these situations.

4. METHODS OF ANALYSIS: DESCRIBING FUNCTION

General Definition of Describing Function. When an element is excited by a sinusoidal signal, the describing function is the ratio of the fundamental of the output signal to the sinusoidal input signal. A describing function of an element may be a complex quantity, characterizing both amplitude and phase relations between the input and output. The describing function may be a function of both signal amplitude and frequency.

Use of the Describing Function. Within the validity of the basic assumptions, the describing function, representing the nonlinear element, can be substituted directly into the system equations for the nonlinear characteristics. The use of the describing function quasi-linearizes the frequency response equations. Since the describing function will be a function of amplitude, the system frequency response will be a function of both frequency and amplitude. The quasi-linearization of the system equations in the frequency domain allows the use of the Nyquist criterion to determine stability.
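The general definition above (ratio of the output fundamental to the sinusoidal input) can be evaluated numerically for any static element. The following is a minimal sketch, assuming a midpoint-rule Fourier integral over one period; the names `describing_function` and `saturation` are introduced here for illustration, and the result is checked against the well-known closed form for a unit-slope saturation, which is the saturation entry of Table 4.

```python
import math


def describing_function(element, amplitude, n=2000):
    """Complex ratio of the output fundamental to the input amplitude*sin(theta)."""
    in_phase = quadrature = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        out = element(amplitude * math.sin(theta))
        in_phase += out * math.sin(theta)     # component along sin(theta)
        quadrature += out * math.cos(theta)   # component along cos(theta)
    in_phase *= 2.0 / n
    quadrature *= 2.0 / n
    return complex(in_phase, quadrature) / amplitude


def saturation(m, limit=1.0):
    """Unit-slope saturation with output limited to +/- limit."""
    return max(-limit, min(limit, m))


A = 2.0                                       # input amplitude (assumption)
gd = describing_function(saturation, A)
r = 1.0 / A
closed_form = (2.0 / math.pi) * (math.asin(r) + r * math.cos(math.asin(r)))
print(gd, closed_form)
```

For a single-valued element such as saturation the quadrature part is zero, so the describing function is real. Elements with hysteresis would give a nonzero imaginary part, corresponding to the phase shift mentioned in the definition.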
Although the dependency of the frequency response upon both frequency and amplitude complicates the calculations slightly, practical methods are available which require little effort beyond that normally required in the plotting of a Nyquist diagram.

Usefulness of Describing Function. The describing function provides an approach which allows a solution in the frequency domain. The ability to manipulate the system equation in the frequency domain is valuable because: (a) frequency response techniques developed for linear systems are available for synthesis, (b) synthesis and analysis can be handled with relative ease, and (c) the technique is not limited to systems with few energy storage elements.

Basic Assumptions of Describing Function Method. If the input to the nonlinear element is sinusoidal, then it is assumed that:

1. The output is periodic and of the same fundamental period as the input signal.
2. Only the fundamental of the output wave need be considered in a frequency response analysis.
3. The nonlinear element is not time-varying.
4. Only one nonlinear element is considered to exist in the system.

Assumption 1 implies that no subharmonics are generated. If the element used in a system is driven by a sinusoidal signal, the output of the device is, by assumption 2, considered to be sufficiently filtered by the system characteristics so that the signal fed back into the input of the nonlinear element is essentially sinusoidal. The degree to which the input to the nonlinear element must be "essentially sinusoidal" is determined by how sensitive the nonlinearity is to the wave shape of the driving signal.

While the describing function cannot be obtained for a system in which the coefficients are time-varying, because the output would not reach a steady-state periodic solution, the describing function can be obtained for an element with characteristics which are dependent on frequency.
In such a case the describing function will be a function of both amplitude and frequency of the signal.

If a system contains two nonlinearities of major importance, it is still possible to get a describing function for the system. Often the easiest way to obtain the describing function in this case is to lump the characteristics of the two nonlinearities and obtain an over-all describing function. In general, it is not practical to consider each nonlinearity separately.

Theory of Describing Functions. Consider the system of Fig. 5. It is convenient in describing function analysis to have the nonlinear and linear elements separated as in the figure. In the following it will be assumed that this has been done.

FIG. 5. Block diagram of simplified nonlinear system.

The output, $n$, of the nonlinear element is related to the input, $m$, by

$n = [g_D(m)]\,m.$

If the input is sinusoidal, then, by the assumptions made, the error $M(j\omega)$ must be sinusoidal and of the fundamental frequency, and

$N(j\omega) = G_D[M(j\omega)]\,M(j\omega).$

The output, $N(j\omega)$, can be represented by a Fourier series:

$N(j\omega) = N_1(j\omega) + N_2(j\omega) + N_3(j\omega) + \cdots,$

where $N_1(j\omega)$ is the first harmonic of the output $N(j\omega)$.

By definition, the describing function is

$G_{D1}(|M|, \omega) = \dfrac{N_1(j\omega)}{M(j\omega)} = \dfrac{|N_1(j\omega)|}{|M(j\omega)|}\;\angle\!\left(\dfrac{N_1(j\omega)}{M(j\omega)}\right),$

where $G_{D1}(|M|, \omega)$ is the describing function as a function of amplitude, $|M|$, and frequency, $\omega$. Usually for convenience $(|M|, \omega)$ and the subscript 1 are dropped; this leaves $G_D$ as the symbol for describing functions.

Within the validity of the assumption that the harmonics are sufficiently filtered by the linear system elements so that the feedback signal contains essentially the fundamental, the harmonics of the output of the nonlinear element can be neglected, and the describing function $G_D$ can be used as a series element in the frequency response analysis. $G_D$ can be determined by conventional Fourier series analysis. (See Chap.
14 and Ref. 10.)

Stability Criterion. For the system of Fig. 5 the frequency response is approximated by

$\dfrac{C}{R}(j\omega) = \dfrac{G_D\,G(j\omega)}{1 + G_D\,G(j\omega)},$

where $G_D$ = the describing function,
$G(j\omega)$ = the frequency sensitive portion of the system.

For a minimum phase shift system (see Chap. 23) the system will be critically stable when the denominator is zero, or

$1 + G_D\,G(j\omega) = 0,$

or

(3) $G(j\omega) = -1/G_D,$

or

(4) $-G_D = 1/G(j\omega).$

When eq. (3) or (4) is satisfied, the system will have a sustained oscillation of the amplitude and frequency which satisfies the equation (Ref. 11).

Limitations and Accuracy of Describing Function Method. There are two major disadvantages with the describing function analysis:

1. There is no convenient method to determine the accuracy. A method proposed by Johnson (Ref. 12) becomes laborious for more than the most simple systems.
2. Frequency response analysis of a linear system allows prediction of the transient response. A describing function analysis allows at best a qualitative interpretation. These designs can therefore predict stability and frequencies of oscillation, but are limited to crude rules of thumb in prescribing a given stable response. It must be pointed out that such approximations are often all that is justified by the accuracy of other system data available.

In a wide variety of applications, use of describing functions has allowed prediction of the frequency and amplitude of oscillations within 20 per cent. It is difficult to generalize, but usually the method will be most accurate when $G_D$ and $G(j\omega)$ are varying rapidly in the region of intersection (Ref. 12). Erroneous results may be obtained in some cases when the intersection is approximately tangential (Ref. 13). Because the assumption of sinusoidal input to the nonlinear element is not exact, the method works best when $G_D$ is not sensitive to wave shape of the driving function.

NOTE.
The describing function method of analysis is the only practical analytical technique for treating nonlinear systems which are higher than second order.

Describing Function: Methods of Presentation

Inverted Nyquist Diagram. The equation for sustained oscillation, $-G_D = 1/G(j\omega)$, indicates that an intersection of the $-G_D$ and $1/G(j\omega)$ loci is a point having a given frequency and amplitude at which sustained oscillation can occur.

FIG. 6. Inverted Nyquist diagram for a stable nonlinear system showing the stable and unstable regions for the describing function locus.

In a normal inverted Nyquist plot the $-1$ point should be on the left as the $1/G(j\omega)$ locus is traversed in the direction of increasing frequency for a minimum phase shift system. In this case, there is no longer a fixed $-1$ point, but the system is stable if the $-G_D$ locus is entirely on the left of the $1/G(j\omega)$ locus.

The stability criterion is then as follows: If a locus of all possible values of $-G_D$ is plotted, then the system will be stable if the $-G_D$ locus does not intersect the $1/G(j\omega)$ locus and the $-G_D$ locus lies completely on the left-hand side of the $1/G(j\omega)$ locus when the $1/G(j\omega)$ locus is traversed in the direction of increasing frequency. (Valid for minimum phase functions.) A stable system is shown in Fig. 6.

NOTE. By plotting in this manner the frequency sensitive, $G(j\omega)$, and amplitude sensitive, $G_D$, portions of the system have been separated and can be considered independently.

Nyquist Diagram. Equation (3) leads to a Nyquist diagram as indicated in Fig. 7.

FIG. 7. Nyquist diagram for a stable nonlinear system showing the stable and unstable regions for the describing function locus.
The stability criterion is as follows: If a locus of all values of $-1/G_D$ is plotted, then the system will be stable if the $-1/G_D$ locus does not intersect the $G(j\omega)$ locus and the $-1/G_D$ locus lies entirely on the left-hand side of the $G(j\omega)$ locus when the $G(j\omega)$ locus is traversed in the direction of increasing frequency. A stable system is shown in Fig. 7.

Log-Angle Plane Representation. It is sometimes more convenient to work with magnitude and phase angle semi-independently. This can be done in the case of describing functions by use of the log magnitude-angle plot. These are the familiar coordinates used on the Nichols charts. (See Chap. 21, Sect. 7.)

For the case of a nonlinear system, the critical point is

(5) $20 \log_{10} |G(j\omega)| = 20 \log_{10} |1/G_D|, \qquad \angle G(j\omega) = -180^\circ - \angle G_D.$

If the conditions of eqs. (5) are met, the system is unstable. A typical plot of a stable servo system is given in Fig. 8. As long as the two loci do not intersect, the system will be stable.

FIG. 8. Log magnitude-angle diagram for a stable nonlinear system showing the stable and unstable regions for the describing function locus. (Axes: decibels versus phase angle.)

Typical Loci for Nonlinear Systems. Table 3 shows a number of different types of loci which can be expected. In each of these diagrams, the frequency and amplitude loci have been plotted. The arrows indicate increasing frequency and amplitude of signal input to the nonlinear element.

System B has an intersection at point 1. This indicates that the system will be unstable at amplitudes less than those at point 1 and will be stable for larger amplitudes. This system, therefore, will be unstable and oscillate at the amplitude and frequency of point 1. If a small disturbance is introduced into the system B described in Table 3, the system will appear unstable and the amplitude of oscillation will increase until point 1 is reached.
If the amplitude of oscillation becomes larger than this, the system appears stable and any oscillation would tend to die down to that corresponding to point 1. The amplitude and frequency corresponding to point 1 are the amplitude and frequency of the sustained oscillation.

Point 1 of system B is called a convergent point because disturbances at either side tend to converge at these conditions. This is contrasted with point 2 of system C, Table 3, which is a divergent point, since disturbances which are not large enough to give this value of $G_D$ will decay and disturbances which are larger will result in oscillations which tend to increase in amplitude. In system D, point 1 is convergent and 2 is divergent.

TABLE 3. TYPICAL LOCI FOR AMPLITUDE SENSITIVE NONLINEAR SYSTEMS

Diagram Type | Stability Criterion
Inverted Nyquist diagram | $-G_D = 1/G(j\omega)$
Nyquist diagram | $-1/G_D = G(j\omega)$
Log-angle diagram | $20 \log_{10} |G(j\omega)| = 20 \log_{10} |1/G_D|$, $\angle G(j\omega) = -180^\circ - \angle G_D$

Cases sketched for each diagram type:
A. Stable system.
B. System with a convergent point.
C. System with a convergent point 1 and a divergent point 2; Case I, stable for small signals.
D. System with a convergent point 1 and a divergent point 2; Case II, unstable for small signals and very large signals.

Comparison of the Methods of Presentation. All the methods are equally valid. The designer may thus choose the one with which he is most familiar and/or the one which best fits the design problem.

In the inverted Nyquist diagram many of the simpler describing functions are bounded, whereas in the Nyquist diagram the describing functions will be infinite for some conditions. The other factors that influence the selection of one form of the Nyquist diagram are still applicable. (See Chap. 21.)

The log-angle method of depicting the stability of the system has advantages in synthesizing a system containing nonlinearities. It is somewhat quicker to plot since $G(j\omega)$ can be obtained directly from a Bode diagram.
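The intersection condition $G(j\omega) = -1/G_D$ that underlies all three presentations can also be solved numerically instead of graphically. The sketch below rests on several assumptions that are not from the handbook: a hypothetical linear part $G(j\omega) = K/[j\omega(j\omega + 1)(j\omega + 2)]$ with $K = 12$, a unit-slope saturation with limit $S = 1$ as the only nonlinearity, and a simple bisection helper named `bisect`. It finds the frequency at which $G(j\omega)$ crosses the negative real axis and then the amplitude at which the saturation describing function satisfies the condition.

```python
import math


def G(w, K):
    """Hypothetical linear part G(jw) = K / (jw (jw + 1)(jw + 2))."""
    s = complex(0.0, w)
    return K / (s * (s + 1.0) * (s + 2.0))


def saturation_df(A, S=1.0):
    """Describing function of a unit-slope saturation with limit S."""
    if A <= S:
        return 1.0
    r = S / A
    return (2.0 / math.pi) * (math.asin(r) + r * math.sqrt(1.0 - r * r))


def bisect(func, lo, hi, tol=1e-10):
    """Root of func on [lo, hi]; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


K = 12.0
# Frequency at which G(jw) is real and negative (imaginary part is zero).
w_osc = bisect(lambda w: G(w, K).imag, 0.5, 5.0)
# Amplitude at which G_D * G(jw) = -1 at that frequency.
A_osc = bisect(lambda A: saturation_df(A) * G(w_osc, K).real + 1.0, 1.0, 100.0)
print(f"predicted oscillation: w = {w_osc:.4f} rad/s, |M| = {A_osc:.3f}")
```

With these assumed values the loop is unstable for small signals, and the solution behaves like the convergent point 1 of system B: the describing function falls as the amplitude grows, so the oscillation settles at the amplitude found here. As the text cautions, such predictions are approximate.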
This method of display also lends itself to use with Nichols charts and templates. (See Sect. 7.)

Frequency Variant Describing Functions. (See Ref. 12.) Describing functions which vary with frequency as well as signal amplitude will appear graphically like the typical plot of Fig. 9. The describing function becomes a surface in three dimensions (magnitude, phase, frequency), and if this surface is pierced by the frequency locus (also plotted in three dimensions), the system will be unstable at the frequency and amplitude of the intersection. In other words, to have a significant intersection, it is necessary to have the intersection of the $G_D(j\omega, |M|)$ and $G(j\omega)$ loci occur at the same frequency. This is indicated in Fig. 9 at $\omega_2$.

FIG. 9. Log magnitude-angle diagram for a nonlinear system with a frequency and amplitude sensitive nonlinearity. Intersections (1) and (3) are not significant because at these intersections $G(j\omega)$ and $G_D(\omega)$ do not have the same frequency. [Plot: the $G(j\omega)$ locus and the describing function loci for $\omega_2$ and $\omega_3$, plotted against phase.]

Tables of Useful Describing Functions

Amplitude Sensitive Nonlinearities. Table 4 gives some of the more common describing functions for simple amplitude sensitive nonlinearities, with corresponding graphs in Figs. 10-16.

TABLE 4. USEFUL AMPLITUDE SENSITIVE DESCRIBING FUNCTIONS
Columns: type of nonlinearity; nonlinear characteristic; output wave shape, $n$ (input $= m = |M| \sin \omega t$); equations of describing function $G_D$; graph of describing function.

  Saturation.
    $G_D = \dfrac{2}{\pi}\left[\sin^{-1}\dfrac{S}{|M|} + \dfrac{S}{|M|}\cos\sin^{-1}\dfrac{S}{|M|}\right]$
    Graph: Fig. 10.
  Deadband (slope $= K$).
    $G_D = \dfrac{2K}{\pi}\left[\dfrac{\pi}{2} - \sin^{-1}\dfrac{B}{2|M|} - \dfrac{B}{2|M|}\cos\sin^{-1}\dfrac{B}{2|M|}\right]$
    Graph: Fig. 11.
  Hysteresis.
    $G_D = \sqrt{a^2 + b^2}\ \angle\tan^{-1}\dfrac{b}{a}$,
    $a = \dfrac{1}{\pi}\left[\dfrac{\pi}{2} + \sin^{-1}\left(1 - \dfrac{H}{|M|}\right) + \left(1 - \dfrac{H}{|M|}\right)\cos\sin^{-1}\left(1 - \dfrac{H}{|M|}\right)\right]$,
    $b = \dfrac{1}{\pi}\dfrac{H}{|M|}\left(\dfrac{H}{|M|} - 2\right)$
    Graph: Fig. 12.
  Negative deficiency (type 1). Graph: Fig. 13a.
  Negative deficiency (type 2). Graph: Fig. 13.
  Relay. Graph: Fig. 14.
  Granularity. Graph: Fig. 15.
  Delay time.
    $G_D = e^{-j\omega t_d}$
    (There is no harmonic distortion in delay time. This expression is exact.)
  Variable gain, $n = m^K$.
    $G_D = \dfrac{2}{\sqrt{\pi}}\,|M|^{K-1}\,\dfrac{\Gamma[(K+2)/2]}{\Gamma[(K+3)/2]}$
    Graph: Fig. 16.

FIG. 10. Describing function for saturation. [Plot: normalized describing function magnitude versus ratio $S/|M|$.]

FIG. 11. Describing function for deadband. [Plot: normalized describing function magnitude versus ratio $B/(2|M|)$.]

FIG. 14. Describing function for a relay contactor with hysteresis: (a) magnitude, (b) phase shift. [Plots: normalized describing function magnitude and phase shift in degrees versus ratio $B/(2|M|)$, for several values of $H/B$ including 0, 0.2, 0.4, and 0.6.]
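As a rough numerical check on the saturation and deadband entries of Table 4, the closed-form expressions can be compared with the fundamental component of the output wave computed directly. The sketch below is illustrative only: the function names are not from the handbook, unit slope is assumed for the saturating element, and the fundamental is estimated with a simple midpoint rule over one cycle.

```python
import math

def saturation_df(s_over_m):
    """Closed-form describing function of a unit-slope saturation.

    s_over_m is the ratio S/|M| (the abscissa of Fig. 10).
    """
    if s_over_m >= 1.0:
        return 1.0  # input never reaches the limit, so the element is linear
    r = s_over_m
    return (2.0 / math.pi) * (math.asin(r) + r * math.cos(math.asin(r)))

def deadband_df(b_over_2m, k=1.0):
    """Closed-form describing function of a deadband of total width B, slope K.

    b_over_2m is the ratio B/(2|M|) (the abscissa of Fig. 11).
    """
    if b_over_2m >= 1.0:
        return 0.0  # input stays inside the deadband, so there is no output
    r = b_over_2m
    return (2.0 * k / math.pi) * (
        math.pi / 2.0 - math.asin(r) - r * math.cos(math.asin(r))
    )

def fundamental_gain(element, amplitude, samples=20000):
    """In-phase fundamental of element(|M| sin(theta)) divided by |M|.

    Midpoint-rule estimate of (1 / (pi |M|)) * integral over one cycle of
    element(|M| sin(theta)) * sin(theta) d(theta).
    """
    total = 0.0
    for i in range(samples):
        theta = 2.0 * math.pi * (i + 0.5) / samples
        total += element(amplitude * math.sin(theta)) * math.sin(theta)
    return 2.0 * total / (samples * amplitude)

if __name__ == "__main__":
    limit = 0.5  # hypothetical saturation level S, with |M| = 1
    clipped = lambda m: max(-limit, min(limit, m))
    print("closed form:", saturation_df(limit))
    print("numerical  :", fundamental_gain(clipped, 1.0))
```

For a unit-slope deadband whose half-width equals the saturation level, the two closed forms sum to unity at every amplitude, which gives a quick consistency check between Figs. 10 and 11.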
FIG. 15. Describing function for granularity. Dashed lines are used when granularity has a finite number of steps. [Plot: normalized describing function magnitude versus $B/(2|M|)$.]

FIG. 16. Describing functions for variable gain elements, $n = m^K$. [Plot: describing function versus input magnitude $|M|$ for several values of $K$.]

Complex Nonlinearities. Table 5 gives the block diagrams for nonlinearities which are frequency sensitive or which cannot be conveniently separated from frequency sensitive elements. Included in this table are the describing functions for several of the nonlinear elements, and graphs are given in Figs. 17-19. The block diagrams can be mechanized on an analog computer and, in general, if extensive investigation is needed, a computer solution is recommended. See the instruction manuals of the particular computer to be used for details on computer circuits.

TABLE 5. DESCRIPTION OF TYPICAL COMPLEX NONLINEARITIES
Columns: type of nonlinearity; description or diagram; assumptions; general remarks; describing functions.

1. Motor velocity limiting
  Description: $K_a$ = acceleration constant; $K_e$ = viscous damping constant; $J$ = motor-load inertia; $T = J/(K_e K_a)$ = motor time constant; $M' = M/K_e$, normalized input; $N_L$ = limiting velocity. Linear response: $\dfrac{N}{M} = \dfrac{1/K_e}{Ts + 1}$.
  Assumptions: 1. Linear machine performance except for limiting. 2. Limiting of motor speed is abrupt, i.e., limiting is absolute and does not occur gradually.
  General remarks: Limiting velocity appears as increased damping; i.e., an effective increase in $K_e$ and therefore an apparent decrease in motor gain. In general the decreasing gain and phase shift of this type of limiting tend to make the system more stable but sluggish when limiting occurs. The block diagram can be extended to higher order systems. Note that in modifying the block diagram to suit a different situation when a variable is limited, the derivatives of that variable must go to zero in a physical system. The describing function for the simple one energy storage motor shown in the block diagram has been determined experimentally in Ref. 16 and is given in Fig. 17. The configuration for velocity limiting is applicable to hydraulic, d-c, and a-c servo motors and can be easily mechanized on an analog computer.
  Describing functions: Fig. 17.

2. Motor acceleration limiting
  Description: $K_a$, $K_e$, $J$, $T$, $M'$ same as above; $N_L$ = limiting acceleration; $TN_L$ = torque limit referred to normalized input $M'$. Linear response: $\dfrac{N}{M} = \dfrac{1/K_e}{Ts + 1}$.
  Assumptions: 1. Linear machine performance except for limiting. 2. Limiting of motor acceleration is abrupt.
  General remarks: Limiting acceleration appears as an increase in motor inertia to torque ratio; i.e., a decrease in $K_a/J$ and therefore an apparent increase in motor time constant. However, because the damping is not affected, the motor low-frequency gain ($1/K_e$) is unchanged. The resulting increase in phase shift with this type of limiting can lead to serious performance deterioration. Torque limiting slows the initial response and also the rate of correction for overshoots. The latter can result in large overshoots. The describing function of the simple one energy storage motor has been determined experimentally in Ref. 16 and is given in Fig. 18. The effect of acceleration limiting can be estimated by simply modifying $K_a$ to account for the limiting. The gain change can be obtained by using the describing function for saturation, Table 4.
  Describing functions: Fig. 18.

3. Backlash (d-c shunt motor driving load)
  Description: $J_M$, $J_L$ = motor and load inertia respectively, referred to same speed, with gearing inertia divided between motor and load depending upon location of backlash; $K_L$ = effective spring constant between motor and load; $D_L$ = effective load damping with respect to fixed reference; $B$ = backlash angle; $K_T$, $R$, $K_e$ = motor torque, impedance, and velocity constants. The block diagram can be modified to include (a) Coulomb friction by making $D_L$ a nonlinear function, (b) speed and acceleration limiting by modifying the motor blocks as shown in 1 and 2 above, (c) multiple backlash by properly dividing the inertia, springiness, etc.
  Assumptions: 1. No gearing bounce. 2. Only viscous damping. 3. Lumped parameter representation of gearing, motor, and load (see column 1).
  General remarks: The large number of variables in the complete problem precludes the derivation of simple describing functions. Describing functions have been derived for a simplified configuration assuming no viscous damping and either (a) a unidirected Coulomb friction force (see Ref. 14) or (b) $J_M \gg J_L$, so that the motor motion is not affected by reflected load torques (see Ref. 15). Relationships have also been determined experimentally (see Ref. 53). The difficulty of handling the complex describing functions, the restrictions of the basic assumptions, and the desirability of using multiple feedbacks in practical systems limit the usefulness of this approach. The approximations mentioned in the text or below in 4 and 5 usually suffice for preliminary studies, and a computer study is usually necessary for a more thorough investigation. See Sect. 7 for a more detailed discussion of backlash effects.

4. Simplified backlash (high load damping)
  Assumptions: 1. $J_M \gg J_L$, i.e., reflected load torques have a negligible effect. 2. $\sqrt{J_L K}/D_L \ll 1$, i.e., high load damping so that the load does not coast. 3. Instantaneous acceleration of the load up to speed when the gears engage.
  General remarks: This presentation is useful for many types of instrument servos where there is little coasting by the load. Use the hysteresis describing function, Table 4, for backlash where $H$ = total backlash angle.
  Describing functions: Table 4, Fig. 12.

5. Simplified Coulomb friction
  Description: a spring of constant $K$ driving a load subject to friction force $F$. $\dfrac{N}{M} = G_D = \dfrac{1}{1 + j\,\dfrac{4F}{\pi K |N_1|}}$, where $N_1$ = fundamental of output.
  Assumptions: 1. Zero load mass, $J_L = 0$. 2. Zero viscous damping. 3. Friction force equal and opposite to applied force up to a maximum of $F$. 4. No effect on driving motion of reflected load forces.
  General remarks: This describing function is useful when Coulomb friction is a major influence on performance (see Ref. 17). Magnitude and phase are the same as for a simple time constant where frequency $T\omega = 4F/(\pi K |N_1|)$. A more exact analysis can be made by using analog computers and introducing friction as a nonlinear damping term.
  Describing functions: Fig. 19.

FIG. 17. Describing function of a single servomotor subject to velocity limiting; $M'$ = normalized magnitude of input, radians per second; $N_L$ = saturated speed, radians per second (maximum speed); $T$ = unsaturated motor time constant, seconds (Ref. 16). [Plots: magnitude in decibels and phase shift in degrees versus normalized frequency $\omega T$, for $M'/N_L < 1$ (no limiting) and $M'/N_L$ = 2, 4, and 8.]

FIG. 18. Describing function of motor subject to acceleration limiting (Ref. 16). [Plots: magnitude in decibels and phase shift in degrees versus normalized frequency $\omega T$.]

FIG. 19. Describing function for Coulomb friction. See Table 5 for definition of terms. [Plot: magnitude and phase shift versus fundamental dimensionless output motion.]

Simplifying Complex Nonlinearities

Separation of Amplitude and Frequency Sensitive Elements. The nonlinearities of Table 5 can be simplified by a number of approximations to the point where the simpler amplitude sensitive describing functions can be used. Note that many of the complex nonlinearities consist of a combination of a simple nonlinearity and frequency sensitive element(s).
Typical of such a case is the system with backlash shown diagrammatically in Fig. 20a. If it is recognized that the input to the nonlinear portion of the system will be essentially sinusoidal, then the describing function for the deadband (Table 4, Fig. 11) can be used and substituted directly into the block diagram as a gain $G_D$. To analyze the system conveniently it is necessary to rearrange the block diagram so that the nonlinearity is separated from the frequency sensitive elements and can be treated by the usual techniques of graphically presenting amplitude sensitive describing functions.

FIG. 20. Block diagram for a simple system with backlash $G_D$ and $G(s)$ equal to controller elements: (a) the conventional diagram arrangement; (b) the diagram rearranged so that the amplitude and frequency sensitive portions of the system are separated.

Let

$$G_{A1} = \frac{1}{K_e s\left(\dfrac{J_m R}{K_T K_e}\,s + 1\right)}$$

Then rearrange the block diagram to appear as in Fig. 20b. In this illustration the amplitude and frequency variant portions of the system have been separated, and the usual type of describing function analysis can be used to determine whether the system will be stable and, if not, what the amplitude and frequency of oscillation will be.

Method of Equivalent Coefficients. In complex systems, it becomes difficult and sometimes useless to attempt to separate the amplitude sensitive and frequency sensitive elements in the system so that the method of analysis described earlier in this section under Theory of Describing Functions can be used. For instance, in the above example the variables in which one is really interested, $C$ and $R$, are buried in the block diagram. It is thus difficult to determine: (1) whether the overall system will have satisfactory performance, and (2) how to modify the compensation to improve performance.
The technique of using an equivalent coefficient is: (a) to recognize that many nonlinearities appear as a gain change in the system, and (b) to combine this gain change with existing gains in the system to form an equivalent gain or coefficient. Once the range of gain to be experienced is known, the system can be designed to be as insensitive to such a variation as is desirable. A way to avoid the difficulty in the previous example lies in this approach. It was pointed out that the describing function of the dead space can be considered directly in the analysis. This describing function is in series with the spring constant and can be combined with it to yield an equivalent spring. Thus, the system will seem to have a very soft spring at low angular displacements and an increasing spring constant (approaching the actual spring constant) as the angular displacement increases. This equivalent coefficient can then be considered as a constant in the remaining analysis. In this case the major effect is the reduction in the load resonant frequency, and it becomes necessary to make the system less sensitive to load resonant frequency to avoid difficulties. It is necessary to consider a number of different spring constants to make sure that no unstable points exist, but this is usually not too difficult a task although it can be somewhat time consuming. (See Ref. 18.)

EXAMPLE. As shown in item 3 of Table 5, the equation for backlash from armature voltage to output shaft rate is

$$\frac{s\theta_L}{V_a} = \frac{K'_L K_T / R}{J_M J_L s^3 + \left[J_M D_L + J_L \dfrac{K_e K_T}{R}\right] s^2 + \left[(J_M + J_L) K'_L + D_L \dfrac{K_e K_T}{R}\right] s + K'_L\left[D_L + \dfrac{K_e K_T}{R}\right]}$$

where $K'_L$ = the equivalent spring constant = $K_L G_D$. The value of $G_D$ is obtained from Fig. 11 with the argument $B/2(\theta_M - \theta_L)$ rather than $B/2M$. The complete system transfer function including the above equation can then be analyzed for several values of $K'_L$. The actual value of $G_D$ has to be considered only if the magnitude of the input to the backlash is wanted.

5. METHODS OF ANALYSIS: PHASE PLANE, GRAPHICAL SOLUTION OF SYSTEM EQUATIONS

General

This is essentially a heuristic presentation of the phase plane method. As a consequence, attention will be directed at the areas of application, and only the most rudimentary explanation of the techniques of constructing the diagrams will be provided. At its present state of development this technique has only limited utility in system synthesis; however, phase plane techniques have received some use in the conception and display of schemes for nonlinear compensation. (See Refs. 22 and 23.) The reader is referred to Refs. 20, 21, and 27 for details beyond those provided here.

Definitions. The phase plane has the coordinates of velocity (usually the ordinate) and position (usually the abscissa) of the system. The solutions of the differential equations are plotted on this coordinate system. The locus of a solution to the differential equation is called a phase trajectory or simply trajectory. A series of solutions or trajectories is referred to as a phase portrait.

Limitations. Analysis by the phase plane method is limited to:
1. Second order (single degree of freedom) systems.
2. Autonomous systems (time does not appear as a parameter in any of the coefficients of the system).
3. Systems with impulse, step, or ramp inputs or driving functions.

The limitation on the order of the system is severe, but it is possible to approximate a limited number of practical systems by a second order equation for purposes of preliminary analysis. Methods have been proposed for extending the technique to higher order systems but have not received wide use. (See Refs. 24 and 25.)

Basic Equations. The basic equations that can be treated by phase analysis are of the form:

$$\frac{d^2x}{dt^2} + f_1(x, \dot{x})\,\frac{dx}{dt} + f_2(x, \dot{x})\,x = 0, \qquad \dot{x} = \frac{dx}{dt}. \tag{6}$$

By substituting $dx/dt = y$, eq.
(6) can be reduced to a set of first order equations:

$$\frac{dy}{dt} = N(x, y), \qquad \frac{dx}{dt} = D(x, y), \tag{7}$$

and eliminating time by division yields

$$\frac{dy}{dx} = \frac{N(x, y)}{D(x, y)}. \tag{8}$$

Significant Characteristics of Phase Portrait

Table 6 describes a few of the significant characteristics of phase trajectories. Identifying these characteristics will be useful to the engineer in interpretation of the phase trajectories.

Areas of Use of Phase Plane Method

Analysis. The graphical techniques of plotting the phase trajectories make the phase plane method particularly useful for systems with second order nonlinear equations of motion. Although the availability of analog computers has greatly reduced the need for such hand methods, there still remains a need for generalizing analysis. The phase plane analysis often can provide this generalization.

Presentation of Data. The phase plane has found some use in presentation of analog or actual equipment results. In such cases, the system does not need to be limited to second order. Of course the interpretation becomes more difficult the higher the order of the system. Such plotting techniques have been made even more meaningful when the display is on a cathode ray oscillograph by intensity modulation. Timing pulses can be indicated by brightening or dimming the trajectory.

TABLE 6. SIGNIFICANT CHARACTERISTICS OF PHASE PORTRAIT
Columns: type; description; typical trajectories; corresponding conditions in linear second order system.

1. Nodal point
  Description: The trajectories converge or radiate from the node in such a manner that the direction of the trajectory approaches definite limits as the nodal point is approached. The node is stable if the paths converge on the node and unstable if the paths radiate from the node. This is a singular point, i.e., a point where eqs. (7) are equal to zero.
  Typical trajectories: stable node (sketch).
  Linear second order system: stable node, negative real roots; unstable node, positive real roots.

2. Focal point
  Description: The trajectories converge or radiate from the focus on spiral paths. As for the node, if the paths converge, the focus is referred to as stable; if the paths spiral outward, the focus is referred to as unstable. This is a singular point.
  Typical trajectories: stable focus (sketch).
  Linear second order system: stable focus, complex roots with negative real parts; unstable focus, complex roots with positive real parts.

3. A center
  Description: Closed trajectories about a point. This is a singular point.
  Linear second order system: zero damping.

4. Saddle point
  Description: Trajectories converge toward the saddle point and then diverge, except for the special case when the initial conditions are such as to fall on the trajectory that goes into the point (the converging separatrix). This is a singular point.
  Typical trajectories: sketch showing the separatrix.
  Linear second order system: both roots real, one positive and one negative.

5. Limit cycles
  Description: A limit cycle describes the oscillation in a nonlinear system. A stable limit cycle is a closed path to which adjacent trajectories converge. When the trajectories diverge from a closed path in the phase plane, the path is called an unstable limit cycle.
  Typical trajectories: stable limit cycle; unstable limit cycle (sketches).
  Linear second order system: none.

6. Hard oscillations
  Description: It is necessary to excite the system beyond a finite bound in order to obtain self-sustained oscillations. The boundary will be an unstable limit cycle.
  Linear second order system: none.

7. Soft oscillations
  Description: Self-sustained oscillations can be started with an infinitely small excitation. Soft oscillations start from unstable nodes or focal points.
  Typical trajectories: see figure for limit cycle; a stable limit cycle about an unstable focal point.
  Linear second order system: none.

Once the analyst has set up the equations for the phase plane, an analog computer can be used to plot the trajectories. In this manner, one can maintain the generality of the phase plane analysis and avoid the ennui of extensive hand calculation.

System Synthesis.
Because the graphical presentation often makes interpretation of results easier, the phase plane method has been looked on with favor by many. A number of authors describing work on "optimum controls" have made extensive use of the phase plane in presenting their results. (See Sect. 7 for details.) However, the limitations on the order of the system equations hamper work on any but the simplest systems. Ku and a number of others have extended the phase plane to phase space. (See Refs. 24 and 25.) This is essentially a multidimensional plot allowing solution of higher order equations. Phase space methods have received only very limited use to date.

Analytical Methods of Constructing Phase Plane

Direct Method of Solution. If the equation of motion of the system, eq. (6), can be integrated to obtain time solutions for $x$ and $\dot{x}$, then $\dot{x}$ as a function of $x$ can be obtained by eliminating time from the individual solutions, and the relationship between $\dot{x}$ and $x$ may be plotted directly.

Indirect Method of Solution. If the equations of motion cannot be integrated to obtain time solutions for $x$ and $\dot{x}$, a new differential equation in terms of $dx$ and $d\dot{x}$ may be formed and solved to give $\dot{x}$ as a function of $x$ directly. The equations in this case reduce to the form of eq. (8).

EXAMPLE. Simple Relay Servo. Equations (13), Sect. 6, describing the operation of a relay servo for a step input are repeated here for convenience.

$$\frac{d^2x}{d\tau^2} + \frac{dx}{d\tau} = -1, \quad x > 0; \qquad \frac{d^2x}{d\tau^2} + \frac{dx}{d\tau} = 1, \quad x < 0.$$

Substitution of $y = dx/d\tau$, then dividing by $y = dx/d\tau$, and recognizing that $(dy/d\tau)/(dx/d\tau) = dy/dx$ yield:

$$\frac{dy}{dx} = -\frac{1}{y} - 1, \quad x > 0; \qquad \frac{dy}{dx} = \frac{1}{y} - 1, \quad x < 0. \tag{9}$$

The variables can be separated in these equations and integrated:

$$x = -\int \frac{y}{1 + y}\,dy, \quad x > 0; \qquad x = \int \frac{y}{1 - y}\,dy, \quad x < 0. \tag{10}$$

These integrals can be found in any good table of integrals, and the function can be plotted on the phase plane for different constants of integration. When plotted, the trajectories of eq. (10) would appear similar to the sketch in Fig. 21.
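The right half-plane branch of eq. (10) integrates to $x + y - \ln(1 + y) = \text{constant}$, so each trajectory there is fixed by its constant of integration. The following sketch, which is an illustration rather than part of the handbook, steps the normalized relay servo equation numerically from an assumed starting point and confirms that this quantity stays constant until the trajectory reaches the switching line $x = 0$. The step size, starting point, and function names are arbitrary choices.

```python
import math

def rk4_step(x, y, drive, h):
    """One Runge-Kutta step of dx/dtau = y, dy/dtau = drive - y.

    drive is -1 in the right half-plane (x > 0) and +1 in the left (x < 0).
    """
    def deriv(x_, y_):
        return y_, drive - y_

    k1 = deriv(x, y)
    k2 = deriv(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = deriv(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = deriv(x + h * k3[0], y + h * k3[1])
    return (
        x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
        y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0,
    )

def first_arc(x0, y0, h=1e-3, max_steps=100000):
    """Trajectory points from (x0, y0), with x0 > 0, up to the first switching."""
    points = [(x0, y0)]
    x, y = x0, y0
    for _ in range(max_steps):
        x, y = rk4_step(x, y, -1.0, h)
        if x <= 0.0:  # relay reverses here; the left half-plane arc follows
            break
        points.append((x, y))
    return points

def right_half_constant(x, y):
    """Constant of integration of eq. (10) for x > 0: x + y - ln(1 + y)."""
    return x + y - math.log(1.0 + y)

if __name__ == "__main__":
    arc = first_arc(1.0, 0.0)
    print("constant at start:", right_half_constant(*arc[0]))
    print("constant at end  :", right_half_constant(*arc[-1]))
    print("velocity at the switching line:", arc[-1][1])
```

The left half-plane arcs can be generated the same way with the drive set to +1, for which the corresponding constant is $x + y + \ln(1 - y)$.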
FIG. 21. Phase trajectories for a simple relay servo. Each trajectory is for a particular constant of integration of eq. (10). [Sketch: trajectories in the plane of $y = dx/d\tau$ versus $x$.]

Obtaining Time from a Phase Plane Plot. It is possible to obtain time, $t$ (or, in normalized form, $\tau$), from a phase plane plot even though the original characteristic equation of motion cannot be solved for $x$ and $\dot{x}$ as functions of time. To do this use the relationship:

$$\tau = \int d\tau = \int \frac{dx}{dx/d\tau} = \int \frac{1}{\dot{x}}\,dx. \tag{11}$$

Equation (11) shows that if the phase portrait is replotted with $1/\dot{x}$ as the ordinate and $x$ the abscissa, the area under the resultant curve represents time. This method makes it possible to obtain plots of $x$ and $\dot{x}$ versus time. Graphical methods are also available for obtaining time from the phase portrait. (See Ref. 26.)

Graphical Methods of Constructing Phase Plane

When the original characteristic equation of motion is nonlinear, the integration of the equation obtained by the above method is difficult or impossible. Graphical methods exist for solving the equation for a direct plot of $\dot{x}$ versus $x$. One of the most useful methods is the method of isoclines, described in Refs. 20 and 23. Other graphical methods are also available, e.g., Lienard's and arc-segment procedures.

Method of Isoclines. Equation (8) can be written in the form

$$\frac{dy}{dx} = \frac{N(x, y)}{D(x, y)}. \tag{12}$$

Equation (12) is the slope of the phase trajectory. By setting eq. (12) equal to a constant, the equation for the locus of a constant slope can be obtained. One can then strike off lines of the proper slope along the locus. After constructing sufficient loci of constant slope, the phase trajectories can be sketched.

EXAMPLE. Simple Relay Servo. Equations (9) define the loci of constant slope for the relay servo given in the previous example.

$$\frac{dy}{dx} = -\frac{1}{y} - 1, \quad x > 0; \tag{9a}$$

$$\frac{dy}{dx} = \frac{1}{y} - 1, \quad x < 0. \tag{9b}$$

Equation (9a) set equal to a constant provides the loci in the right half-plane. Equation (9b) describes the left half-plane. For a +45° slope eq. (9a) becomes:

$$1 = -\frac{1}{y} - 1, \qquad y = -\tfrac{1}{2}.$$
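The same calculation can be repeated for any slope. Setting eq. (9b) equal to the slope $m = \tan\theta$ gives $y = 1/(1 + m)$ in the left half-plane, and eq. (9a) gives $y = -1/(1 + m)$ in the right half-plane. The short sketch below evaluates these levels; the function name and the handling of the two special cases are illustrative assumptions, not part of the handbook.

```python
import math

def isocline_y(slope_deg, half_plane):
    """Level y at which trajectories of the relay servo have the given slope.

    Left half-plane (x < 0), eq. (9b):  dy/dx =  1/y - 1  gives  y =  1/(1 + m)
    Right half-plane (x > 0), eq. (9a): dy/dx = -1/y - 1  gives  y = -1/(1 + m)
    where m = tan(slope angle).
    """
    if abs(slope_deg) == 90:
        return 0.0  # trajectories are vertical where they cross the x axis
    m = math.tan(math.radians(slope_deg))
    if abs(1.0 + m) < 1e-12:
        return math.inf  # a -45 degree slope is approached only as |y| grows without bound
    y_left = 1.0 / (1.0 + m)
    return y_left if half_plane == "left" else -y_left

if __name__ == "__main__":
    for angle in (0, 30, -30, 45, -45, 60, -60, 75, -75, 90):
        print(angle, isocline_y(angle, "left"), isocline_y(angle, "right"))
```

Because each isocline of this system is a horizontal line, a single value of $y$ per slope and half-plane is all that is needed to lay out the construction.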
Several values are tabulated in Table 7. The isoclines are constructed in Fig. 22.

TABLE 7. VALUES OF ISOCLINES FOR SIMPLE RELAY SYSTEM

              Value of y
  Slope    Left Half-Plane    Right Half-Plane
   0°           1                 -1
  +30°          0.634             -0.634
  -30°          2.36              -2.36
  +45°          0.5               -0.5
  -45°          ±∞                ±∞
  +60°          0.366             -0.366
  -60°         -1.37               1.37
  +75°          0.211             -0.211
  -75°         -0.366              0.366
  +90°          0                  0
  -90°          0                  0

FIG. 22. Construction of a phase trajectory by means of isoclines for a simple relay servo. [Plot: isoclines labeled by slope angle, with a trajectory sketched across them.]

6. OTHER METHODS OF ANALYSIS

Differential Equations: Analytical Solutions

An often useful method of analytically obtaining the transient performance of a simple but useful class of nonlinear systems is piecewise linearization. Although the type of nonlinearity that can be treated is restricted, higher order systems can be handled. (See Refs. 31, 32, and 33.) Other analytical methods of obtaining transient solutions are described in Sect. 5 and in Refs. 27, 29, and 30 and their bibliographies.

Piecewise Linear Systems. Many nonlinear control systems are linear in well-defined areas of operation. At the boundaries of these linear areas are discontinuities which make the system, when considered as a whole, nonlinear. For such systems the linear differential equations can be solved between boundaries and the boundary conditions matched to obtain a complete solution. Since it is generally desirable to obtain a solution under steady-state conditions (or at least as steady-state conditions are approached), the process using differential equations becomes quite laborious if there are a number of reversal points.
This is true even for a simple second order system, and the process becomes more unwieldy for higher order systems where more than two initial conditions are required at each reversal point.

Normalized Performance Charts. Kahn avoided some of the labor in the differential equation approach by using a semigraphical approach that recognizes the fact that the initial conditions at each boundary point are a function of the velocity at the previous boundary and the time between boundaries. (See Ref. 32.) This method loses its value for systems of an order higher than second. Under these conditions more than two dimensions are needed to represent the curves. For a higher order system, it is necessary to have more than one initial condition for each boundary; for example, on a third order system, it would be necessary to have initial conditions representing both velocity and acceleration. The higher derivatives would fall in the third dimension.

Summary of Steps in Piecewise Linear Analysis.
1. Prepare the complete system equations.
2. Break the complete equations into a set of linear equations representing the system operation between discontinuities.
3. Determine the boundary conditions at the discontinuities.
4. Nondimensionalize the equations as much as practical.
5. Rearrange the equation or change the dependent variable to make the dependent variable independent of the input function at the discontinuities.
6. Obtain the solutions to the equations. These may often be best presented graphically.

FIG. 23. Block diagram of simple relay servo. [Block diagram: the error $e = r - c$ drives a relay with output $m$, which drives the motor to produce the output $c$.]

EXAMPLE. Piecewise Linear Relay Servo. A typical block diagram for a relay servo is shown in Fig. 23. For the motor of the system of the figure,

$$m = K_v T_m \frac{d^2c}{dt^2} + K_v \frac{dc}{dt}.$$

From the characteristics for a simple relay, $m = -V$ for $e < 0$, and $m = V$ for $e > 0$.
Therefore,

$$K_v T_m \frac{d^2c}{dt^2} + K_v \frac{dc}{dt} = -V, \quad \text{for } e < 0;$$

$$K_v T_m \frac{d^2c}{dt^2} + K_v \frac{dc}{dt} = V, \quad \text{for } e > 0;$$

$$e = r - c.$$

A typical transient to a step input for the servo of Fig. 23 appears in Fig. 24. The driving voltage in the motor is reversed at each of the zero error points 1, 2, 3, 4, 5, etc.

FIG. 24. Typical response of simple relay servo to a step command. [Plot: displacement versus time.]

Substitution of $c = r - e$ and normalizing the variables by substituting $\tau = t/T_m$ and $x = K_v e/(V T_m)$ yield for a step input:

$$\frac{d^2x}{d\tau^2} + \frac{dx}{d\tau} = -1, \quad \text{for } x > 0; \qquad \frac{d^2x}{d\tau^2} + \frac{dx}{d\tau} = 1, \quad \text{for } x < 0. \tag{13}$$

The solutions to these equations are obtained by taking the Laplace transform of eqs. (13) and determining the inverse transform. The transforms of eqs. (13) after the switching at $\tau_n$ are

$$s^2 X(s) + sX(s) = \left[-\frac{1}{s} + s\,x(\tau_n) + \dot{x}(\tau_n) + x(\tau_n)\right] e^{-\tau_n s}, \quad x > 0; \tag{14a}$$

$$s^2 X(s) + sX(s) = \left[\frac{1}{s} + s\,x(\tau_n) + \dot{x}(\tau_n) + x(\tau_n)\right] e^{-\tau_n s}, \quad x < 0; \tag{14b}$$

where $x(\tau_n)$ = value of $x$ at $\tau_n$,
$X(s) = \mathcal{L}[x(\tau)]$,
$\dot{x}(\tau_n)$ = derivative of $x$ with respect to $\tau$ at $\tau_n$,
$\tau_n$ = normalized time of the $n$th switching point.

Notice that with the exception of the first closure of the relay (in the region 0-1 of Fig. 24), $x(\tau_n) = 0$ at the switching point and only $\dot{x}(\tau_n)$ affects the characteristics of the transient. The velocity just before the relay switches must equal the velocity just after the relay has switched. This is the boundary condition relating the two eqs. (13). To obtain a solution to eqs. (13), it is then necessary to apply the initial conditions at each reversal point. For eq. (14a) the inverse transform yields, when $x(\tau_n) = 0$,

$$x = -\exp(-\tau + \tau_n) - (\tau - \tau_n) + 1 + \dot{x}(\tau_n)\left[1 - \exp(-\tau + \tau_n)\right], \qquad \tau_{n+1} > \tau > \tau_n, \tag{15}$$

where $\tau_{n+1}$ = normalized time at the $n + 1$ switching point. Equation (15) is dependent only on the time from the last reversal and the initial velocity. The time to reach the next reversal can be calculated by setting eq. (15) equal to zero and solving for the time difference $\tau_{n+1} - \tau_n$.
Setting eq. (15) equal to zero gives

$$\dot{x}(T_n) = \frac{\exp(-T_{n+1} + T_n) + T_{n+1} - T_n - 1}{1 - \exp(-T_{n+1} + T_n)}. \tag{16}$$

The velocity at the next reversal can be obtained from the derivative of eq. (15) at $T_{n+1}$:

$$\dot{x}(T_{n+1}) = \exp(-T_{n+1} + T_n) - 1 + \dot{x}(T_n)\exp(-T_{n+1} + T_n). \tag{17}$$

Equation (14b) can be solved similarly. Since the system is symmetrical, the equations corresponding to eqs. (16) and (17) will be respectively identical except for sign. Kahn avoided the labor of obtaining repetitive solutions to eq. (15) by plotting eqs. (16) and (17) as shown in Fig. 25.

[FIG. 25. Plot of eqs. (16) and (17); axis label: initial velocity, $\dot{x}(T_n)$.]

FIG. 26. The maximum modulus $M_m$ and the frequency of the maximum modulus $\omega_m$ of a simple relay servo as a function of the argument of the describing function for a relay.

The operating conditions and the servo frequency response are given in Fig. 27. The uncompensated characteristics result from setting the system gain sufficiently high to meet sensitivity requirements, i.e., setting $B$. Maintaining the system gain and adding a 3 to 1 lead gives the compensated characteristics. (Note that in practice there would have to be a 3 to 1 increase in amplification to compensate for the 3 to 1 attenuation that a passive lead network would have.) Figure 26 shows $\omega_m$ and $M_m$ of a relay servo before and after compensation. Figure 27 shows how the $G(j\omega)$ locus has been changed to improve system response by meeting a criterion of a maximum $M_m \le 2$. The boundary of such an improvement can be quickly obtained by overlaying a Nichols chart with a graph of the $G_D$ locus and observing the path of the $G(j\omega)$ locus on the Nichols chart as the origin is moved along the $G_D$ locus. (See Fig. 28.) If a maximum $M_m$ criterion is being used, the boundary of this $M_m$ for all values of $G_D$ can be quickly sketched on the $G(j\omega)$ graph. If sufficient work of this type is performed, templates for several values of $M_m$ can be built. The necessary compensation networks can be determined either by trial and error or by more elaborate methods discussed in Chap. 23.

FIG. 27. Log magnitude-angle diagram of a simple relay servo (log magnitude in decibels versus phase angle; servo motor response $G(j\omega) = 1/[j\omega(1 + j\omega)]$); operating conditions $V/B$ = 6 db; $H/B$ = 0.5. Curve (a) is the uncompensated system. Curve (b) is the system compensated with a 3 to 1 lead. The cross-hatched region is the necessary modification to the uncompensated system to meet an $M_m$ criterion of $M_m < 2$. For the purposes of illustration a normalized response has been used for the servo motor and only the ratio $V/B$ has been defined.

FIG. 28. Log magnitude-angle diagram showing a method of estimating the necessary compensation by overlaying with a Nichols chart. The origin of the Nichols chart is first placed at $B/2M$ = 0.67 and the $M$ contour is sketched on the log magnitude-angle diagram. This is noted as curve (a), where $M_m = 2$ has been used for illustrative purposes. The complete area of necessary compensation can be found by moving the origin of the Nichols chart along the describing function locus. Another location is shown at $B/2M$ = 0.6, and the $M$ contour is curve (b). The coordinates of the Nichols chart are shown as dotted center lines.

Compensation for Relay Servomechanisms. Describing function, phase plane, and piecewise linear analyses have all been used extensively to determine the necessary compensation. (See Refs. 11, 18, 24, 31, 32, and 44.) The describing function method is the most useful for higher order systems. The techniques used in the preceding example can be easily extended to higher order systems. The describing function analysis and experiment normally check within engineering accuracy. (See Refs. 11 and 41.) A maximum $M_m \le 2$ criterion is generally typical in the design of relay-positioning systems. However, experience with the condition of the particular application may dictate a different value for maximum $M_m$ or a different criterion.

Because of the phase lag of a relay with hysteresis, lead compensation is quite useful. Two forms of such compensation are tandem lead networks and rate feedback. The latter can be obtained either from a tachometer or from motor back electromotive force. Nonlinear compensation can also be used to achieve better performance for a particular type of input. See the paragraphs on Optimum Switching Functions later in this section for details. Such compensating networks must be used with care if more than one type of input is to be encountered, because the performance will vary with the form and magnitude of the input function. (See Refs. 42, 43, and 45.)

Compensation for Saturation. For a large number of systems, it is necessary to follow a relatively smooth input within a very small error. In order to provide economical components, the linear operating range of these components is usually very little beyond that necessary to follow the input within the maximum allowable error. When such systems are synchronized on a new operating condition or subjected to violent disturbances, they will inherently be highly saturated. This leads to reduced performance. In addition, because such systems often use integral compensation to achieve high values of low-frequency gain, this can also lead to serious overshoots and the attendant longer settling times or even instability. Normally the requirement on allowable error is not as necessary during the synchronizing period, and a reduction in performance can be tolerated. The major concern is, therefore, that the system should settle rapidly and stably from large signals. The effects of saturation and the methods of compensation for such systems are summarized in Table 8.

TABLE 8. TYPICAL TYPES OF SATURATION EFFECTS AND METHODS OF COMPENSATION

Case I, preamplifier saturation

System configuration and characteristics: [Block diagram and open loop characteristic.]

Effect on system performance: For an unconditionally stable system with saturation, the relative response will be slower for large step inputs. As the step input level is increased for a conditionally stable system, the system will begin to exhibit less stable characteristics until a critical value is reached above which self-sustained oscillations will occur. For moderate saturation, the overshoot may actually be less than for the linear system although the settling time will be increased. Saturation acts like a gain reduction in the system and for saturation to cause system instability, instability must be predicted on a linear basis for reduced gain. Oscillation frequency and amplitude can be estimated within 20 to 30% by describing functions. The presence of noise with the input signal causes an effective increase of saturation beyond that predicted for the input signal by itself. This effect causes an increase in the closed loop phase shift with respect to the input signal. See Ref. 7 and Table 1.

Possible methods of compensation:
(a) Eliminate or reduce the integral compensation for large signal inputs; e.g., typical magnitude and phase angle curves are shown at the left. It is obvious that if the gain is reduced sufficiently, the region of negative phase margin will cause instability. However, if a nonlinear compensating network is used which for large errors eliminates or reduces the integral compensation, satisfactory performance can be obtained. The dotted line shows the frequency response after such a change. There are a number of methods by which such changes in the compensation can be obtained. (See Refs. 18, 38, and Table 10.) The circuit constants are normally set experimentally.
(b) Modify the basic system operation for large signal levels. This is essentially an extension of (a). However, the basic mode of operation is also changed. Examples are dual-mode servos wherein the mode of controlling the power element is changed with signal level, and two-speed servos wherein the feedback signal gain is lowered. (Actually, in practice, the takeoff is from a different speed shaft which gives rise to the appellation two speed.) (See Refs. 39 and 40.) Often the signal used to switch the feedback signal is used to modify the compensation networks.

Case II, power amplifier saturation

System configuration and characteristics: $G_1$ = integral compensation; $G_2$ = power element. Same open loop characteristics as Case I.

Effect on system performance: Same effect as Case I but a reduction in the overshoot with moderate saturation has not been observed. The degree of saturation (ratio of saturation level to input to element if the system were linear) necessary to start self-sustained oscillations in a system will vary with the location of the nonlinearity in the system. A difference as great as 10 to 1 between the saturation from a step of $r$ needed in the preamplifier and power amplifier has been noted. (See Ref. 18.) In all the cases considered in Ref. 18, it was necessary to have sufficient saturation so that the gain was reduced to the point where there was negative phase margin at gain crossover; however, with preamplifier saturation it was necessary to exceed this level of saturation considerably to cause self-sustained oscillations.

Possible methods of compensation:
(a) See Case I (a) and (b). Power amplifier saturation is similar to torque saturation for which the dual servo techniques have been developed. (See Refs. 23, 40, and Table 10.)
(b) Use of tachometer feedback around the saturation is effective. (See Case III.)
(c) If in place of $G_1$ the integral compensation can be accomplished by a filter (lead) network around the saturating element, the system can be so designed that, as the system saturates, the compensation automatically becomes less and the system will not become unstable with saturation. (See Ref. 2.)

Case III, power amplifier saturation

System configuration and characteristics: $G_1$ = integral compensation; $G_2$ = power element; $G_3$ = tachometer feedback.

Effect on system performance: For a conditionally stable system with saturation, the gain is reduced in the tachometer loop, lowering the crossover frequency, $\omega_t$, which lowers the phase margin of the position loop. There are two possibilities: (1) The phase margin at the normal position loop crossover frequency, $\omega_c$, will be lowered until the system becomes unstable at the normal crossover frequency, $\omega_c$. (2) The tachometer loop becomes ineffective before (1) occurs, and the position loop gain will be lowered, forcing the crossover frequency down into the region where the phase margin goes negative on account of the integral compensation. The effect depends upon the constants of the particular system being considered. For (1) the oscillation frequency will be approximately the crossover frequency. For (2) the frequency of oscillation will be closer to the integral compensation time constants.

Possible methods of compensation:
(a) Instability is not normally a serious consideration.
(b) The problem can generally be avoided by achieving the compensation by a filter in the tachometer feedback rather than in tandem elements in $G_1$.

Case IV, saturation in feedback

System configuration and characteristics: Same open loop characteristics as Case III.

Effect on system performance: Effect similar to Case I.

Possible methods of compensation:
(a) Eliminate saturation if possible.
(b) See Cases I(a) and (b) and II(c) above.

Compensation for Backlash. The effects of backlash and load resonance are the major limits on the performance that can be achieved with a power servomechanism. The great quantity of published material attests to the serious consideration that has been given to the problem. (See Refs. 13, 14, 15, 18, and 46 to 52.)
However, thoroughly satisfactory methods for circumventing the effects of backlash are not available. The basic effects (see also Table 5) are illustrated by the system of Fig. 29. For large input signals, the backlash is quickly taken up and has very little effect upon performance. For low-level signals approaching the magnitude of the backlash, the backlash tends noticeably to disconnect the load from the motor during signal reversals. Heuristically it can be seen that this will cause the load to lag farther behind the motor than with a linear system. Conversely, because the motor is disconnected from the load, it will accelerate faster than normal in the backlash zone and the motor position will therefore tend to lead the normal response. When considered in terms of frequency response, if the primary feedback is from the load, the effect of backlash increases the lagging phase shift and decreases the loop gain; if the primary feedback is from the motor, the effect of backlash on system performance is much less severe and it actually introduces a leading phase shift into the loop.

For low signal levels, if the load damping is viscous, the linearized equations of Fig. 29 can be used. For low signal levels, if there is appreciable Coulomb friction present, its effects will predominate, and the use of hysteresis to represent the backlash is more correct than the equations of Fig. 29. Therefore, it is necessary to evaluate carefully the type and extent of the damping present. If the damping is viscous, the frequency of oscillation caused by backlash will generally be at or higher than the normal linear gain crossover frequency. If the damping is of the Coulomb type, the frequency of oscillation will be lower than the gain crossover frequency. The amplitude in either case will be small (one to several times the backlash angle, normally).

Methods of analysis. Phase plane, piecewise linear, and describing function methods of analysis have been used successfully.
The describing function methods are the most generally useful for a paper study. However, the complexity of the problem warrants the use of an analog computer for thorough investigations. As just noted and in Table 5, the representation used for backlash will vary depending upon the constants involved. If the hysteresis representation is chosen, the usual describing function methods of analysis can be used. If the more complex representation of Fig. 29 is chosen, the method of equivalent coefficients, Sect. 4, is recommended.

[Block diagram: motor with back-emf constant $K_e$, reflected torque path, primary motor feedback, and primary load feedback.]

Basic linearized equations, with the common denominator

$$D(s) = s^3 + [2\delta + \omega_m]s^2 + \left[\left(1 + \frac{J_L}{J_M}\right) a\omega_L^2 + 2\delta\omega_m\right]s + a\omega_L^2\left[2\delta\frac{J_L}{J_M} + \omega_m\right]:$$

$$\frac{s\theta_M}{V_a} = \frac{(1/K_e)\,\omega_m\left[s^2 + 2\delta s + a\omega_L^2\right]}{D(s)},$$

where $a = |G_D|$, magnitude of describing function for dead band; $\omega_m = K_e K_T / (J_M R)$, motor time constant without load; $\omega_L^2 = K_L / J_L$, load mechanical resonant frequency; $\delta = D_L / 2J_L$, load viscous damping. Other constants are defined in Table 5.

FIG. 29. Typical shunt d-c machine and load with backlash representation. The basic linearized transfer functions are given in a nondimensionalized form.

General Design Considerations. In the design of a power servomechanism, the following basic effects should be given consideration:

1. Use of tandem integral compensation increases the magnitude of sustained oscillations. The effect can be reduced by the use of dead space. (See Table 9, item 3.) The effects of integral type compensation achieved by tachometer feedback and a lead filter have not been as thoroughly documented. However, the same general tendency is apparent.
2. Increasing the load mechanical resonant frequency, $(K_L/J_L)^{1/2}$, reduces the magnitude of the sustained oscillations.
3. It would be desirable to have mechanical load damping ratios greater than 0.1. These would probably be undesirable on larger drives because of the large power loss involved. However, there are methods of increasing the damping electrically (see Table 9).
4. Primary feedback from the motor gives more stable operation than from the load.

TABLE 9. USEFUL CORRECTIVE TECHNIQUES FOR BACKLASH

1. Mechanical design

Techniques for backlash compensation:
(a) Improved precision. The backlash can be reduced by improving the grade of gears used and the tolerance on the center distances, by having adjustable center distances, or by numerous other special design and/or assembly procedures.
(b) Spring loading. There are various methods for mechanically spring loading the gear trains to take up the backlash. One method used in lightly loaded gear trains is shown at the right. This can be extended to the point where the entire gearing is completely divided and one-half spring loaded against the other half. This takes up the backlash throughout the entire gear train.
(c) Split drive. By using two driving motors biased in opposite directions and separate gear trains for each motor, the motors will drive against each other to the extent of the bias and take up any backlash. A hydraulic drive of this type is shown at the right.
(d) One speed motor. Eliminate the gearing by driving the load directly from the motor shaft. (See Ref. 52.)

Typical diagrams: [Spring-loaded gear on driving shaft.] Spring tension rotates gears in opposite direction until backlash is taken up. Torque is transmitted through springs to shaft.

General remarks: There are many other methods besides the two shown for mechanizing the concepts of (b) and (c). (See Ref. 49.) In general, these methods are costly and increase the friction and wear in the gear train. Electric and hydraulic models of one-speed motors have been built and tested. The low-speed high-torque requirement makes the electric unit bulky and heavy. General mechanical design considerations are outlined in the text.

2. Divided reset

Techniques for backlash compensation:
(a) Solid motion feedback. Consider the idealized backlash at the right. There must be a point with a displacement somewhere between the displacements of $J_M$ and $J_L$ that responds only to the externally applied forces. The displacement of this point (center of mass) is called the solid motion. All supplementary motions of $J_M$ and $J_L$ relative to the solid motion must then be due to mutual forces that occur upon collision or separation, and the momentum of the supplementary motions must be equal and opposite. Instrumenting the solid motion would give a signal which did not contain the backlash effects. This point cannot be physically instrumented, but by adding signals from the load and motor in the proper proportion the supplementary motions can be cancelled and only the solid motion will remain. From the principles of conservation of momentum:

$$\dot{x}_s = \frac{J_M}{J_M + J_L}\,\dot{x}_m + \frac{J_L}{J_L + J_M}\,\dot{x}_L,$$

where $\dot{x}_s$ is the rate of change of the solid motion. (See Refs. 14, 15, and 49.) A method of instrumenting the technique is shown at the right.
(b) Artificial damping. Reducing the load-resonant peak and increasing the apparent load-resonant frequency ameliorates the backlash effects. This can be done by the proper feedback of the relative load and motor motions. This includes position, velocity, and acceleration differences. A configuration containing such a feedback is shown at the right. The circuit constants can be determined by frequency response techniques by assuming the system to be linear as in Fig. 29. Experimental adjustment will be necessary.

Typical diagrams: [Idealized backlash.] $J_M$ = motor inertia; $J_L$ = load inertia.

General remarks: When the feedback (or reset) is from (or divided between) the motor and the load it is called divided reset. The concept as discussed is highly simplified and in practice the configuration and constants will have to be adjusted to suit the particular case. When $J_M \gg J_L$, the motor and the center of mass follow closely and rate feedback from the motor alone is effective in damping the system. Approximate schemes for obtaining the rate feedback from the motor back emf, etc., are often adequate. Solid motion position feedback can be obtained in the same manner; however, this position signal can differ from the actual load position by as much as $BJ_M/(J_M + J_L)$, and often the additional error cannot be tolerated. Feedbacks of the type in (b) are very effective. Generally, position feedback alone is sensitive to system parameter variations; rate and position feedback in combination are quite insensitive; acceleration feedback is very sensitive. Position difference feedback increases the resonant frequency, and rate difference feedback increases the damping.

3. Network compensation

Techniques for backlash compensation:
(a) Dead zone compensation. Backlash oscillations are low amplitude. Use of a dead zone in the error channel opens the loop to low amplitude signals and will stop certain types of backlash oscillation. (See Ref. 48.) The dead zone will be the same order of magnitude as the backlash.
(b) Frequency-sensitive networks. Compensating networks, gain changes, parallel-tandem networks, etc., can be used in combination with deadband or separately to get the desired gain-phase change at low signal levels. (See Ref. 28.) Lead networks are particularly effective in compensating for the lagging phase shift of backlash.

General remarks: This method has been used for drives with a low load inertia to motor inertia ratio (referred to the same speed) and with sufficient friction to keep the load from much coasting. Under these conditions, it is effective and the error is small. More complex schemes of sensing the proper time to modify the gain characteristics are possible but usually are not justified.

Table 9 summarizes various methods of compensating for backlash. None of the schemes is perfect in the practical case, but all provide a certain relief from the problem.
The final choice usually includes considerations of weight, size, and cost, as well as performance.

FIG. 30. Typical nonlinear compensating circuits. As shown the circuits vary with the input variable but circuits (a) and (c) can be adapted to vary with an independent variable: (a) nonlinear gain circuit with characteristics that vary with input voltage to increase gain for low-level signals; (b) resistance characteristics of the voltage sensitive resistor $R_v$; (c) nonlinear time constant circuit that reduces the time constant for large input signals; (d) nonlinear time constant circuit that eliminates the time constant and reduces the low-frequency gain for large input signals.

Nonlinearities to Improve System Response

Nonlinearities used for improving system performance in general involve methods for (a) reducing the response time and/or (b) minimizing overshoots by more fully utilizing the performance available in the power element(s). Many of these methods accelerate the system rapidly for large errors and increase the relative damping for small errors so that operation is smooth (very stable). Table 10 summarizes several typical methods of nonlinear compensation and refers to typical circuits shown in Figs. 30 and 31.

FIG. 31. Typical nonlinear feedback compensating circuits: (a) nonlinear rate feedback to minimize overshoot from large signals; (b) nonlinear stabilizing circuit for switching feedback compensation for large errors or feedback rates. To obtain the proper characteristics it may be necessary to use an isolation amplifier at (x).

Because of the difficulty of specifying the required characteristics mathematically and the impracticality of instrumenting the ideal characteristics for all but the simplest systems, nonlinear compensation is obtained by empirical means in practice.
TABLE 10. TYPICAL NONLINEAR METHODS OF COMPENSATION

1. Lewis servo

Description of technique: $G_m$ is the transfer function of the d-c output motor, $G_1$ is the transfer function of a conventional tachometer, and the dotted block represents the transfer function of a second tachometer in which the term × denotes the product of $K_1$ and $sc$. The field of this second tachometer is excited by the amplifier error signal so that the output is proportional to the error magnitude times the output speed. This output is subtracted from the output of the first tachometer and results in a value of damping which is low for large errors and which increases as the error decreases (Ref. 54).

General remarks: It is possible to choose values such that this system will give a very fast initial response to a step input with no overshoot. However, if a step input in one direction is followed by an unequal one in the opposite direction before the error caused by the initial step is corrected, the system can become unstable (Ref. 45). This tendency toward instability can be corrected by limiting the magnitude of the term $|e|\,\dot{c}$ to some experimentally determined maximum value.

2. Tandem compensation

Block diagram: $G$ = linear elements.

Description of technique: Generally, the relative damping is decreased and the frequency response increased. For instance, the solid curve represents the normal response for small signals, and the dotted curve the response for large signals.

General remarks: The increased bandwidth can be accomplished by adjusting the controller gain and/or time constants. Both methods have been used with success (Ref. 36). Operating on the time constants of the stabilizing network is particularly desirable. This allows the reduction of energy storage elements in the system which can give undesirable lags in synchronizing. Typical circuits to give gain and time constant change with signal level are shown in Fig. 30. Circuit constants are determined experimentally. Since the performance of the system is dependent upon the characteristics of the inputs, one must completely define the input.

3. Feedback compensation

Description of technique: This is the same basic approach as (2), but the feedback allows gain and time constant changes to be made as functions of error and the derivatives of the error. This can be used to alleviate the problem of reaching zero error with high derivatives existing. The needed functions are nonlinear but can often be approximated adequately by linear circuit components and diodes.

General remarks: Figure 31 gives two typical circuits for accomplishing nonlinear feedback compensation. Circuit (a) is a modification for a standard tachometer stabilized position servo. The form of the feedback function depends upon the servo characteristics (see Optimum Switching Techniques). Circuit (b) is a more elaborate feedback circuit where error and feedback rate are combined to switch from normal feedback to a feedback which provides more rapid response. Note that for high feedback rates and low errors the switch will open (the diodes stop conduction) and normal stabilization will come into play during synchronization. Under extreme conditions the switch may actually reverse polarity to allow rapid deceleration.

4. Optimum switching techniques or minimum response time systems

Description of technique: Optimum switching is the controlled switching of power to the motor to reduce the error and its derivatives to zero in the minimum possible time, recognizing only the limitations on the performance of the motor. For example, the optimum response to a step input of a second order system with torque limiting is to accelerate at the maximum rate about halfway and then switch and decelerate at the maximum rate for the remaining distance. By proper selection of the switching point the system will arrive at zero error with zero error rate, and if the torque is removed, the system will remain at rest with no further corrective action. See the example in the text. Table 11 gives the optimum switching functions for several second order systems. The number of switching points needed to respond in the minimum time to a step input is $(n - 1)$, where $n$ is the order of the system. (See Ref. 24.) Excessive switching at low signal levels can be avoided by having a small deadband at the null. Smoother operation for small signals can be obtained by changing the mode of operation and having a small linear band at null. This has been called dual mode operation. (See Ref. 23.)

General remarks: It is difficult to mechanize the optimum switching function for systems higher than the second order. However, the optimum performance can be approached closely without going to the complexity of $(n - 1)$ switching points. This approximation can be made analytically by deriving a nonlinear function (of one variable) that gives a response that approaches the optimum response. This technique is explained in Ref. 57. The approximation can be arrived at empirically by using the basic second order system switching functions and modifying them by experience and experiment to provide satisfactory performance for higher order systems. The optimum switching technique is not limited to relay servos. The "switching" can be the saturation of some element in the system. In any case the optimum response is obtained only for the designed input; i.e., systems designed for step inputs show poorer response for velocity inputs. Because the optimum response is the minimum time in which a power element can make a correction, it provides a good basis for rating system performance. The ratio of response time to the optimum response time is a useful index of system performance. (See Ref. 55.)
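The halfway-switching rule described for optimum switching can be checked numerically. The sketch below assumes the simplest case, a pure inertia $J$ driven by a torque limited to $\pm T_m$ with no damping; that restriction, the function names `simulate_halfway_switch` and `optimum_time`, and all numerical values are assumptions made here for illustration. Full torque is applied until half the step has been covered, then full reverse torque, and the stopping time is compared with the closed-form value $2\sqrt{JE/T_m}$ for a step of size $E$.

```python
import math


def optimum_time(J, T_m, E):
    """Minimum time for a pure inertia J to cover a step E with |torque| <= T_m."""
    return 2.0 * math.sqrt(J * E / T_m)


def simulate_halfway_switch(J, T_m, E, dt=1e-4):
    """Accelerate at +T_m to the halfway point, then decelerate at -T_m.

    Returns (elapsed time, final error). The update is exact for constant
    acceleration within a step; only the switching instant is quantized to dt.
    """
    accel = T_m / J
    c = 0.0  # output position
    v = 0.0  # output rate
    t = 0.0
    # Phase 1: maximum acceleration until half the step is covered.
    while c < 0.5 * E:
        c += v * dt + 0.5 * accel * dt * dt
        v += accel * dt
        t += dt
    # Phase 2: maximum deceleration until the output comes to rest.
    while v > 0.0:
        c += v * dt - 0.5 * accel * dt * dt
        v -= accel * dt
        t += dt
    return t, E - c


if __name__ == "__main__":
    J, T_m, E = 2.0, 0.5, 1.0  # hypothetical values
    t_sim, err = simulate_halfway_switch(J, T_m, E)
    print(f"simulated time = {t_sim:.4f}, closed form = {optimum_time(J, T_m, E):.4f}")
    print(f"final error = {err:.2e}")
```

Dividing a measured response time by `optimum_time(...)` gives the performance index mentioned in the table, under the same no-damping assumption.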
Optimum Switching Techniques to Obtain Minimum Time Response

EXAMPLE. (Refer to Fig. 32.) It is assumed that the amplifier gain is sufficiently high so that the motor operates with full voltage on it for all but very small errors. The system equations are:

$$\pm T_m = J\ddot{c} + D\dot{c}, \qquad m = e - f(\dot{c}), \qquad e = r - c, \tag{21}$$

where $\dot{c} = dc/dt$, $\ddot{c} = d^2c/dt^2$, and the torque is $+T_m$ for $m > 0$, $-T_m$ for $m < 0$.

FIG. 32. Nonlinear control system with a very high gain amplifier and torque saturation $\pm T_m$. (Block diagram: motor with output $c$ and nonlinear rate feedback $f(\dot{c})$.)

For a step input of $r$:

$$\mp T_m = J\ddot{e} + D\dot{e}, \qquad m = e - f(\dot{e}). \tag{22}$$

Equation (22) can be solved independent of time to yield a series of trajectories in the phase plane, Fig. 33. The coordinates of this plane are error, $e$, and error rate, $\dot{e}$. There is only one trajectory which passes through the origin, and it will provide the optimum system response if the torque to the motor is reversed when this trajectory is reached. From Fig. 33, it is seen that proper choice of the function $m(e, \dot{e}) = e - f(\dot{e})$ will provide the intelligence to perform the necessary switching function. A nonlinear tachometer feedback will then provide the necessary switching information.

FIG. 33. Phase portrait of the performance of the control system of Fig. 32, showing the optimum switching line where the torque must be reversed to bring the system to rest with no overshoot. When the quantity $e - f(\dot{e})$ goes to zero, the torque should be reversed. (Axes: error rate, $\dot{e}$, versus error, $e$. Labels: $f(\dot{e})$ = optimum switching line; optimum switching point for input $E_1$; maximum acceleration at $+T_m$; maximum deceleration at $-T_m$.)

Optimum Switching Functions. The form of the system characteristic equation will dictate what the optimum function should be. Several typical cases as derived in Ref. 56 are given in Table 11.

TABLE 11. TYPICAL OPTIMUM SWITCHING FUNCTIONS FOR SECOND ORDER SYSTEMS WITH LIMITED TORQUE

System type: Undamped
Torque equation (see Fig. 32): $\pm T_m = J\,\dfrac{d^2 e}{dt^2}$
Optimum switching function in the fourth quadrant: $e = \dfrac{J}{2T_m}\,\dot{e}^2$, $\dot{e} = \dfrac{de}{dt}$

System type: Viscous damped
Torque equation (see Fig. 32): $\pm T_m = J\,\dfrac{d^2 e}{dt^2} + D\,\dfrac{de}{dt}$
Optimum switching function in the fourth quadrant: $e = -\dfrac{T_m J}{D^2}\log_e\!\left(1 - \dfrac{D}{T_m}\,\dot{e}\right) - \dfrac{J}{D}\,\dot{e}$, $\dot{e} = \dfrac{de}{dt}$

System type: Coulomb damped (a)
Torque equation (see Fig. 32): $\pm T_m = J\,\dfrac{d^2 e}{dt^2} + T_f$
Optimum switching function in the fourth quadrant: $e = \dfrac{J\,\dot{e}^2}{2T_m\left(1 + \dfrac{T_f}{T_m}\right)}$, $\dot{e} = \dfrac{de}{dt}$

(a) $T_f(\dot{c})$ is positive for $\dot{c} > 0$ and negative for $\dot{c} < 0$ and is a constant in either case.

REFERENCES

1. L. A. MacColl, Fundamental Theory of Servomechanisms, Van Nostrand, Princeton, N. J., 1945.
2. J. G. Truxal, Control System Synthesis, Chap. 10, McGraw-Hill, New York, 1955.
3. J. C. Lozier, A steady state approach to the theory of saturable servo systems, I.R.E. Trans. on Automatic Control, May 1956.
4. K. Klotter, Steady state vibrations in systems having arbitrary restoring and arbitrary damping forces, Proc. Symposium on Nonlinear Circuit Analysis, Vol. II, Polytechnic Institute of Brooklyn, New York, 1953.
5. E. Levinson, Some saturating phenomena in servo mechanisms with emphasis on the tachometer stabilized system, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, (1953).
6. N. Minorsky, Introduction to Nonlinear Mechanics, Edwards, Ann Arbor, Mich., 1947.
7. R. G. Wilson and I. H. Van Horn, The Effect of Noise on Rate-Limited Systems, Rept. No. GER2328, Goodyear Aircraft Corp., Feb. 22, 1952.
8. C. A. Ludeke, The generation and extinction of subharmonics, Proc. Symposium on Nonlinear Circuit Analysis, Vol. II, Polytechnic Institute of Brooklyn, New York, 1953.
9. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. II, Chap. 10, Wiley, New York, 1955.
10. H. D. Greif, Describing function method of servomechanism analysis applied to most commonly encountered nonlinearities, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 243-248 (1953).
11. R. J. Kochenburger, Frequency response method for analyzing and synthesizing contactor servomechanisms, Trans. Am. Inst. Elec. Engrs., 69, Pt. 1, 270-284 (1950).
12. E. C.
Johnson, Sinusoidal analysis of feedback control systems containing nonlinear elements, Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 169-181 (1952).
13. N. B. Nichols, Backlash in a velocity lag servomechanism, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 462-466 (1953).
14. A. Tustin, The effects of backlash and of speed-dependent friction on the stability of closed-cycle control systems, J. Inst. Elec. Engrs. (London), 94, Pt. 2A, 143-151 (1947).
15. K. N. Satyendra, Describing functions representing the effects of inertia, backlash, and Coulomb friction on the stability of an automatic control system, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2, 243-248 (1956).
16. R. J. Kochenburger, Limiting in feedback control systems, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 180-192 (discussion), 192-194 (1953).
17. V. B. Haas, Coulomb friction in feedback control systems, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 119-123 (discussion), 123-126 (1953).
18. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. II, Chap. 8, Wiley, New York, 1955.
19. K. Klotter, How to obtain describing functions for nonlinear feedback systems, Am. Soc. Mech. Engrs., IRD Paper No. 56-IRD-5, August 1956.
20. J. J. Stoker, Nonlinear Vibrations in Mechanical and Electrical Systems, Interscience Publishers, New York, 1950.
21. T. M. Stout, Basic methods for nonlinear control system analysis, Am. Soc. Mech. Engrs. Paper No. 56-IRD-9, July 1956.
22. A. M. Hopkins, A phase-plane approach to compensation of saturating servomechanisms, Trans. Am. Inst. Elec. Engrs., 70, Pt. 1, 631-639 (1951).
23. D. McDonald, Nonlinear techniques for improving servo performance, Proc. Natl. Electronics Conference, Vol. VI, pp. 400-421, National Electronics Conference, Inc., Menasha, Wis., 1950.
24. I. Bogner and L. F. Kazda, An investigation of the switching criteria for higher order contactor servomechanisms, Trans. Am. Inst. Elec. Engrs., 73, Pt.
2, 118-126 (discussion), 126-127 (1954). 25. Y. H. Ku, A method for solving third and higher order nonlinear differential equations, J. Franklin Inst., 256, 229-244 (1953). 26. J. G. Truxal, Automatic Feedback Control System Synthesis, Chap. 11, McGrawHill, N ew York, 1955. 27. T. J. Higgins, A resume of the development and literature of nonlinear control system theory, Am. Soc. Mech. Engrs. Paper No. 56-IRD-4, July 1956. 28. C. H. Shen, H. A. Miller, and N. B. Nichols, Nonlinear integral compensation of a velocity-lag servomechanism with backlash, Am. Soc. Mech. Engrs. IRD Paper No. 56-IRD-3, August 1956. 29. T. M. Stout, A step-by-step method for transient analysis of feedback systems with one nonlinear element, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2, 378-389 (discussion), 389-390 (1956). 30. J. G. Truxal, Numerical analysis for network design, Approximation Papers, Trans. I.R.E., PGCT-CT-1, 4-64, September 1954. 31. H. L. Hazen, Theory of servomechanisms, J. Franklin Inst., 218, 279-331 (1934). 32. D. A. Kahn, Analysis of relay servomechanisms, Trans. Am. Inst. Elec. Engrs., 68, Pt. 2, 1079-1088 (1949). 33. J. W. Schwartz, Piecewise linear servomechanisms, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 401-405 (1953). 34. M. J. KIrby, Stability of servomechanisms with linearly varying elements, 'Trans. Am. Inst. Elec. Engrs., 69, Pt. 2, 1662-1667 (1950). 35. M. J. Kirby and R. M. Guilianelli, Stability of varying-element servomechanisms with polynomial coefficients, Trans. Am. Inst. Elec. Engrs., 70, Pt. 2, 1447-1451 (1951). 36. H. Chestnut and R. W. Mayer, Servomechanisms and Regulating System Design, Vol. II, Chap. 9, Wiley, New York, 1955. 38. E. S. Sherrard, Stabilization of a servomechanism subject to large amplitude oscillation, Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 312-324 (1952). 39. J. C. West, A system utilizing course and fine position measuring elements in remote-position-control servo mechanisms, Proc. I.R.E., 99, Pt. 2, 135-143 (1952). 40. D. 
McDonald, Multiple mode operations of servomechanisms, Rev. Sci. Instr., 23, 22-30 (1952). 41. s. K. Chao, Design of a contactor servo using describing function theory, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2, 223-231 (1956). 42. H. G. Doll and T. M. Stout, Design and analog computer analysis of an optimum third-order nonlinear servomechanism, Am. Soc. Mech. Engrs. Paper No. 56-IRD-10, July 1956. 43. J. C. West and P. N. Nikiforak, The frequency response of a servomechanism designed for optimum transient response, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2, 234-239 (1956). 44. J. E. Hart, An analytical method for the design of relay servomechanisms, Trans. Am. Inst. Elec. Engrs., 74, Pt. 2, 83-89 (discussion), 89-90 (1955). 45. R. R. Caldwell and V. C. Rideout, A differential-analyzer study of certain nonlinearly damped servomechanisms, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 165-169 (discussion), 169-170 (1953). 25-68 f=EEDBACK CONTROL 46. A. A. Clark and H. J. Pixley, Effects of non-linearities in multi-loop lead angle prediction systems, Ani. Soc. Mech. Engrs. Paper No. 56-IRD-18, July 1956. 47. H. T. Marcy, M. Yachter, and J. Zauderer, Instrument inaccuracies in feedback control systems with particular reference to backlash, Trans. Am. Inst. Elec. Engrs., 68, Pt. 1, 778-788 (1949). . 48. F. J. Ellert, Feedback in contouring control systems, Am. Inst. Elec. Engrs. Second Feedback Control Conference, April 1954. 49. D. C. McDonald, Backlash compensation improves servo system operation, Instruments and Automation, 28 [10], 1728-1731 (1955). 50. C. H. Thomas, Stability Characteristics of Closed-Loop Systems with Dead Band Frequency Response, 288-305, R. Oldenburger, Editor, Macmillan, New York, 1956. 51. R. L. Hovious, Jitter in instrument servos, Trans. Am. Inst. Elec. Engrs., 73, Pt. 2, 393-398 (1954). 52. F. M.Bailey, Performance of drive members in feedback control systems, I.R.E. Trans. on Automatic Control, PGAC-I, May 1956. 53. J. H. 
Liversidge, Backlash and resilience within the closed loop of automatic control systems, in Automatic and Manual Control, A. Tustin, Editor, Butterworths, London, 1952.
54. J. B. Lewis, The use of non-linear feedback to improve the transient response of servomechanisms, Trans. Am. Inst. Elec. Engrs., 71, Pt. 2, 449-453 (discussion), 453 (1952).
55. R. S. Neiswander and R. H. MacNeal, Optimization of non-linear control systems by means of non-linear feedbacks, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 260-270 (discussion), 270-272 (1953).
56. T. M. Stout, Effects of friction in an optimum relay servomechanism, Trans. Am. Inst. Elec. Engrs., 72, Pt. 2, 329-335 (discussion), 335-336 (1953).
57. R. E. Kuba and L. F. Kazda, A phase space method for the synthesis of nonlinear servomechanisms, Trans. Am. Inst. Elec. Engrs., 75, Pt. 2, 282-289 (discussion), 289-290 (1956).

E FEEDBACK CONTROL

Chapter 26

Sampled-Data Systems and Periodic Controllers

John E. Barnes, Jr.

1. Description and Definition of Sampled-Data System
2. Methods of Transient Analysis
3. Sampled-Data System Stability
4. Sampled-Data System Synthesis
References

1. DESCRIPTION AND DEFINITION OF SAMPLED-DATA SYSTEM

Definition of Sampled-Data System. Systems which operate on data obtained at discrete intervals of time are called sampled-data systems. The information obtained at a particular instant is called the sample. Normally the intervals are equally spaced in time and the amplitude of the sample is proportional to the amplitude of the signal.

Characteristics of Sampled-Data Systems

Basic Elements. Figure 1 shows the basic elements of a sampled-data system: the sampler and the continuous elements. They may appear in various configurations, and there may be more than one sampler in the system. The output from the sampler is a train of pulses which is denoted by a starred symbol; that is, the output of a sampler whose input is $e(t)$ is written $e^*(t)$.
Linearity. If the continuous elements are linear, the sampled-data system is linear and the superposition theorem is valid. A sampled-data system has regular time discontinuities, but the techniques of analysis by the use of solutions of the linear constant-coefficient differential equations of the system are directly applicable.

FIG. 1. Sampled-data system and sampler input and output signals: (a) simple sampled-data system; (b) continuous error function; and (c) sampled error function.

The Sampler. The sampler acts as a pulse modulator of the input and generates a train of pulses. This action introduces high frequencies into the system which may be attenuated by a linear filter. The information contained in the input signal may be recovered with reasonable fidelity if the sampling frequency is at least twice the highest frequency component of the input signal. Figure 2 shows the effect of sampling frequency upon the frequency spectrum of the output of the sampler.

Use of Samplers. Sampled-data systems may be used for several reasons:
1. To use a digital computer as part of the controller. The input data must be in sampled form.
2. To use simpler, low-powered control elements.
3. To realize the beneficial effect which sometimes accrues when sampled-data systems are used for the process control of plants having inherent dead time.

FIG. 2. Sampler transfer characteristics in the frequency domain. (a) Amplitude spectrum of sampler input; (b) amplitude spectrum of sampler output, sampling frequency greater than twice the maximum
signal frequency; (c) amplitude spectrum of sampler output, sampling frequency less than twice the maximum signal frequency. $\omega_s$ = sampling frequency (Ref. 5). Reprinted by permission from J. G. Truxal, Automatic Feedback Control System Synthesis, Copyright 1955 by McGraw-Hill Book Co.

4. To use pulsed-data information. The input information may be available in discrete samples as in a guided missile control system or as in certain track-while-scan radar systems. Sampled-data systems may be used to advantage where digital sensors are already available.

Description of Typical Sampled-Data Systems

Digital Computer in the Controller. Figure 3 shows a typical digital system. The sampling and coding unit converts continuous data into pulsed data. The digital computer performs a series of operations on the pulsed data and presents the results in pulsed form to the holding and decoding unit, which reconverts the results into (approximately) continuous signals for use by the continuous control equipment. The feedback may transmit the data in either pulsed or continuous form.

FIG. 3. Typical sampled-data control system. Conventional control equipment is continuous.

In practical operating systems, a typical method of converting from a continuous variable available as a shaft rotation to a binary code number which represents its magnitude and polarity is to use an encoding device such as the circular binary pattern shown in Fig. 4. The circular tracks may be scanned radially with a photoelectric cell or brush pickoffs.

FIG. 4. Circular binary pattern for analog-digital conversion. The lines across the pattern show that accurate angular position of the photocells or brush contacts is necessary to avoid errors in conversion.
The output will be the binary pulse code which represents a particular position of the circular binary pattern; the pattern shown can resolve a circle into $2^6 = 64$ parts. Although the encoder shown is for angular rotation, devices have been manufactured for conversion of pressures and flows to digital form. Techniques for converting analog voltages to digital form are also available. (See Vol. 2, Chap. 20.)

Periodic Process Controller. A typical sampled-data regulator for process control is shown in Fig. 5.

FIG. 5. Typical sampled-data process regulator.

The typical stepwise process controller monitors the controlled variable periodically (at the instants $nT$) and makes a control adjustment at each sensing instant. In process regulation, the usual (and perhaps the most useful) form of control actuator is a servo motor, which serves as a low-pass filter and also serves to reset the error detector. The following description of a periodic controller is taken from Oldenbourg and Sartorius (Ref. 1).

FIG. 6. Schematic form of a periodic controller (chopper bar relay) (Ref. 1).

From a constructional standpoint, the periodic controller operates about as follows. Through a sensing device, such as a meter pointer, the control variable is observed at equal time intervals. Then, by auxiliary power, additional members of the control loop are suitably actuated according to the sensed position of the pointer.

EXAMPLE. The Chopper Bar Controller. See Fig. 6. As long as the meter pointer stands between the two contact springs the circuit remains broken, even during the sensing instants. It is closed only when the pointer leaves its mid-position. The duration of closure increases with deviation of the pointer.
If the contact closure is used to actuate a reversible constant-speed motor, the control action is called astatic (never quiet) because, with constant actuating error, the control motor moves intermittently across its entire range at an average speed (roughly) proportional to the pointer deviation. Although periodic controllers may have static correspondence between deviation and motor motion, astatic action will be assumed here because of its greater practical significance (Ref. 1).

2. METHODS OF TRANSIENT ANALYSIS

Basic Mathematical Relationships

Analysis of Sampler. The output of the sampler (see Fig. 7) is the input modulated by the sampler into a train of pulses:

(1) $e^*(t) = e(t)\sum_{n=0}^{\infty} u_0(t - nT) = \sum_{n=0}^{\infty} e(nT)\,u_0(t - nT),$

where
$u_0(t - nT)$ = the impulse or Dirac delta function occurring at $t = nT$, in which $u_0(t - nT) = \lim_{a \to 0} \dfrac{u(t - nT) - u(t - nT - a)}{a}$,
$T$ = the sampling period,
$n$ = an integer,
$e(nT)$ = the value of the input at the sampling instant.

The Laplace transform of eq. (1) may be written:

(2) $E^*(s) = \sum_{n=0}^{\infty} e(nT)\,e^{-nTs},$

or eq. (1) may be written in the frequency response form:

(3) $E^*(s) = \dfrac{1}{T}\sum_{n=-\infty}^{\infty} E(s + jn2\pi/T).$

FIG. 7. Showing basic mathematical relationships of a sampler.

NOTE. Equation (3) may be derived by performing the complex convolution of the input $e(t)$ and the train of unit impulses generated by the sampler, namely

(4) $E^*(s) = E(s) \circledast \mathcal{L}\left[\sum_{n=0}^{\infty} u_0(t - nT)\right],$

where $\circledast$ is the symbol denoting complex convolution. Notice that

$\mathcal{L}\left[\sum_{n=0}^{\infty} u_0(t - nT)\right] = 1 + e^{-sT} + e^{-2sT} + \cdots,$

or in closed form: $\dfrac{1}{1 - \exp(-sT)}$. Because $\dfrac{1}{1 - \exp(-sT)}$ has only simple poles at $s = jn2\pi/T$, the complex convolution reduces to eq. (3).

Smoothing the Sampled Data. Normally, the high-frequency components generated by the sampler are removed before the signal reaches the output.
Often in sampled-data servo systems, a large portion of the smoothing is accomplished by the components (motors, etc.) between the sampler and the output. Sometimes more smoothing is necessary. One particularly simple low-pass filter is the holding circuit or boxcar generator. In this circuit, the value of a sampling pulse is held until the next pulse arrives, whereupon the circuit assumes the value of the new pulse. The transfer function of such a network is that of a rectangular pulse of unity height and of $T$ seconds duration, namely

(5) $G_H(s) = \dfrac{1}{s}\left[1 - e^{-sT}\right].$

Response of a Continuous Filter to Sampled Data. The response of a continuous transfer member $g(t)$ of Fig. 7 is

(6) $c(t) = \sum_{n=0}^{\infty} e(nT)\,g(t - nT).$

Equation (6) has the Laplace transform

(7) $C(s) = E^*(s)\,G(s).$

Equation (6) is a summation of the filter impulse responses which are excited by each sample and is valid only for the case of zero initial conditions. When this condition is not met, a second term must be added to eq. (6) to include the decay of the nonzero initial conditions. Since the system is linear this is not important when considering the stability of the system, but it must be included if a time response is being calculated.

The response of the filter only at the sampling instants is:

(8) $c^*(t) = \sum_{q=0}^{\infty}\left[\sum_{n=0}^{q} e(nT)\,g\big((q - n)T\big)\right]u_0(t - qT),$

which has the following Laplace transform:

(9) $C^*(s) = E^*(s)\,G^*(s),$

where $G^*(s) = \mathcal{L}[g^*(t)] = \mathcal{L}\left[\sum_{n=0}^{\infty} g(nT)\,u_0(t - nT)\right]$.

Sampled-Data System Transfer Function. From eq. (9), the sampled-data transfer function or pulse transfer function can be defined as

(10) $G^*(s) = \dfrac{C^*(s)}{E^*(s)}.$

An equivalent form in terms of the z-transform symbolism is indicated in eq. (11). (The z-transform is defined and illustrated in a later paragraph.)

(11) $G(z) = \dfrac{C(z)}{E(z)}.$

Laplace Transform Analysis

It is possible to use the equations of the previous section to obtain the complete time response, $c(t)$.
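As a rough numerical illustration of this superposition, the sketch below reads eq. (6) as the sum of impulse responses $e(nT)\,g(t - nT)$ and evaluates it for a unit step sampled into a simple lag. The filter $G(s) = 1/(s + 1)$, the period $T = 0.5$ sec, and all function names are assumptions chosen for the illustration; they do not come from the text.

```python
import math

def sampled_response(samples, g, T, t):
    """Evaluate c(t) = sum over n of e(nT) * g(t - nT), zero initial conditions."""
    total = 0.0
    for n, e_n in enumerate(samples):
        lag = t - n * T
        if lag >= 0.0:  # a causal filter has not yet been excited by later samples
            total += e_n * g(lag)
    return total

def g_lag(t):
    """Impulse response of the assumed filter G(s) = 1/(s + 1)."""
    return math.exp(-t)

T = 0.5               # assumed sampling period, seconds
samples = [1.0] * 8   # unit step sampled at t = 0, T, 2T, ...

# Response at the sampling instants (just after each impulse) and midway between.
at_instants = [sampled_response(samples, g_lag, T, k * T) for k in range(8)]
between = [sampled_response(samples, g_lag, T, (k + 0.5) * T) for k in range(8)]

for k in range(8):
    print(k, round(at_instants[k], 4), round(between[k], 4))
```

At the sampling instants the sum is a finite geometric series, which is the kind of closed form that the sampled analysis exploits; between instants every term simply decays, which is why the full time response is more laborious to write down.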
However, the Laplace transforms are not rational and it requires considerable labor to obtain the complete response. If the response is calculated only at the sampling instants, the transforms can often be written in closed form and the labor of calculation and manipulation is greatly reduced. The z-transform method is usually used to compute the response at the sensing instants. (See z-Transform Analysis, Sect. 2.)

Analysis by Difference Equations

The analysis of sampled-data control systems leads to characteristic equations, which are difference equations.

Formulation of the Difference Equations. Difference equations are discussed in Chap. 4. A simple example will illustrate the analysis of a control process by difference equations.

EXAMPLE. (See Fig. 8.) The following simplifying assumptions are made:
(a) The displacement of the control means, $m$, is linear and unlimited.
(b) The plant has a simple time constant, $T_c$.
(c) The controller is lag-free, i.e., the sensing and positioning times are negligible.

FIG. 8. (a) Simple process control. (b) Elements of process control equivalent to (a). Note. $e(t) = r_0 - c(t)$; this quantity is dimensionless.

The disturbance is assumed at the most unfavorable instant (just after sensing). The control means, $m$, changes instantly at the sensing instant and remains at its new value throughout the sensing cycle. The analysis is quite simple.
Inside a sensing cycle, the behavior of the plant is continuous and may be described by the linear differential equation

$\dfrac{T_c}{T}\,\dfrac{de(t)}{d\tau} + e(t) = r_0 - m_i + m_e = r_0 - m,$

where
$T_c$ = the plant time constant, seconds;
$T$ = the sampling cycle, seconds;
$e(t)$ = the controlled variable error, dimensionless: $e(t) = r_0 - c(t)$, where $c(t)$ is the controlled variable and $r_0$ is the set point, all parameters nondimensional and normalized;
$\tau = t/T$, dimensionless time;
$m$ = the net value of control means, dimensionless;
$m_i$ = the manipulated control means, dimensionless;
$m_e$ = the disturbance of the control means, dimensionless.

By using the Laplace transformation it can be shown that the solution of this differential equation at the $n$th sensing instant is

(12) $e_n = D e_{n-1} - (1 - D)\,m_{n-1},$

where
$e_n$ = value of the controlled variable at the $n$th sensing instant,
$e_{n-1}$ = value at the $(n - 1)$th sensing instant,
$D = \exp(-T/T_c)$, the decrement characteristic of the plant and of the sensing cycle,
$m_{n-1}$ = value of the control means at the $(n - 1)$th sensing instant.

Now consider the behavior of the variable $m$ at the sensing instants. The relationship is assumed to be linear [eq. (13)], where $K$ is the strength of the controller and is called the specific step. The minus sign in eq. (13) provides the negative feedback needed for regulation of the variables. Equations (12) and (13) are the simultaneous difference equations of the control action. They lead to the difference equation of the system, namely,

(14) $e_{n+2} - [1 + D - K(1 - D)]\,e_{n+1} + D e_n = 0.$

Solution of Linear Difference Equations. The linear homogeneous difference equation may be written:

(15) $A_0 e_{n+q} + A_1 e_{n+q-1} + \cdots + A_{q-1} e_{n+1} + A_q e_n = 0.$

If the roots are not equal, eq. (15) has the solution:

(16) $e_n = \sum_{i=1}^{q} a_i z_i^{\,n},$

where $z_i$ is a root of the auxiliary equation,

(17) $A_0 z^q + A_1 z^{q-1} + \cdots + A_{q-1} z + A_q = 0.$

The coefficients of eq. (17) are identical with those of eq. (15).
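The link between the recursion and its root form can be checked numerically. The following is a minimal sketch, assuming distinct roots, that steps eq. (14) forward and compares the result with $a_1 z_1^n + a_2 z_2^n$, where $z_1$ and $z_2$ solve $z^2 - [1 + D - K(1 - D)]z + D = 0$. The numerical values ($T/T_c = 1$, $K = 1$, starting values $e_0 = 1$ and $e_1 = D$) and the function names are assumptions made only for the illustration.

```python
import cmath
import math

def coefficient_b(D, K):
    """Middle coefficient of eq. (14): b = 1 + D - K(1 - D)."""
    return 1.0 + D - K * (1.0 - D)

def iterate_eq14(D, K, e0, e1, steps):
    """Step e[n+2] = b * e[n+1] - D * e[n] forward from e0, e1."""
    b = coefficient_b(D, K)
    e = [e0, e1]
    for _ in range(steps - 2):
        e.append(b * e[-1] - D * e[-2])
    return e

def auxiliary_roots(D, K):
    """Roots of the auxiliary equation z^2 - b z + D = 0."""
    b = coefficient_b(D, K)
    disc = cmath.sqrt(b * b - 4.0 * D)
    return (b + disc) / 2.0, (b - disc) / 2.0

def closed_form(D, K, e0, e1, steps):
    """e[n] = a1 z1^n + a2 z2^n, valid only when z1 != z2."""
    z1, z2 = auxiliary_roots(D, K)
    # The two summing constants are fixed by the first two values e0 and e1.
    a2 = (e1 - e0 * z1) / (z2 - z1)
    a1 = e0 - a2
    return [(a1 * z1 ** n + a2 * z2 ** n).real for n in range(steps)]

D = math.exp(-1.0)   # assumed T/Tc = 1
K = 1.0              # assumed specific step
e0, e1 = 1.0, D      # assumed start: unit error, no control action in the first cycle

recursive = iterate_eq14(D, K, e0, e1, 12)
by_roots = closed_form(D, K, e0, e1, 12)
z1, z2 = auxiliary_roots(D, K)
print([round(x, 5) for x in recursive])
print(abs(z1), abs(z2))
```

With these assumed numbers the roots form a complex pair, so the error sequence is a decaying oscillation; the product of the roots equals $D$, whatever the value of $K$.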
Equation (17) is often called the characteristic equation of the system. With $q$ distinct roots the general solution is

(18) $e_n = a_1 z_1^{\,n} + a_2 z_2^{\,n} + \cdots + a_q z_q^{\,n}.$

If $k$ roots are equal the solution is

(19) $e_n = [a_1 + a_2 n + \cdots + a_k n^{k-1}]\,z_1^{\,n} + a_{k+1} z_2^{\,n} + \cdots + a_q z_{q-k+1}^{\,n}.$

Thus, the values $e_n$ at any sensing instant may be computed. The $q$ summing constants $a_i$ are determined by the first $q$ of the $e$'s at the sensing instants. The characteristic equation with vanishing roots has special significance as will be discussed later.

Use and Limitations of Difference Equation Method. If the problem is dominated by the sampler, that is, if there is a rather simple control loop whose servomotor is actuated by a periodically applied measurement of the error, this type of analysis is simplest. It is also well to remember that the difference equation method and the z-transform method are equivalent. For higher order systems, the difference equation approach becomes laborious, so that the more methodical z-transform method becomes advantageous.

Analysis by z-Transform Method

Usefulness. The z-transform is the shorthand rational way to write the Laplace transform of the linear difference equation. It has the same relationship to linear difference equations as the Laplace transformation bears to linear differential equations. The advantages of the z-transform are:
(a) it reduces the nonrational Laplace transform of a sampled-data system to a rational transform which facilitates writing transfer functions;
(b) it allows definition of the closed loop system response of a sampled-data system (no advantage over difference equations).

Limitations. Tables of the more complex z-transforms are not readily available and the polynomials must be expanded into partial fractions. A fundamental limitation of z-transforms is that the time solutions are defined only at the sensing instants.

Hidden Oscillations.
Because the time solutions are calculated only at sensing instants, it is possible that the sampling frequency may be lower than the characteristic frequency of the plant being controlled, and oscillations may occur which are not apparent from the z-transforms. If such a condition is suspected, the z-transform can be modified to give the output between the sampling instants and the existence of such oscillations can be checked. (See Refs. 4 and 5.)

Basic Relationship. The z-transform is based on the transformation

(20) $z = e^{sT},$

where $s$ is the Laplace operator and $T$ is the sampling period. The Laplace transform of the sampled signal will contain $s$ in the irrational form $e^{-nsT}$. Substitution of $z$ will produce a rational transform in $z$. The z-transform is defined as

(21) $C(z) = \sum_{n=0}^{\infty} c(nT)\,z^{-n}.$

Table of Useful z-Transforms. See Table 1.

TABLE 1. LAPLACE AND z-TRANSFORMS (Refs. 2, 5)

Row | Column 1: Laplace Transform | Column 2: Time Function | Column 3: z-Transform | Column 4: Description of Time Function
a | $1$ | $u_0(t)$ | $1$ | Impulse function at $t = 0$
b | $e^{-nTs}$ | $u_0(t - nT)$ | $\dfrac{1}{z^n}$ | Impulse function at $t = nT$
c | $\dfrac{1}{1 - e^{-Ts}}$ | $i(t)$ | $\dfrac{z}{z - 1}$ | Train of impulses at sampling instants
d | $\dfrac{1}{s}$ | $u(t)$ | $\dfrac{z}{z - 1}$ | Step function
e | $\dfrac{1}{s^2}$ | $t$ | $\dfrac{Tz}{(z - 1)^2}$ | Ramp function
f | $\dfrac{1}{s^3}$ | $\tfrac{1}{2}t^2$ | $\dfrac{\tfrac{1}{2}T^2 z(z + 1)}{(z - 1)^3}$ | Quadratic or acceleration function
g | $\dfrac{1}{s + a}$ | $e^{-at}$ | $\dfrac{z}{z - e^{-aT}}$ | Exponential function
h | $\dfrac{a}{s^2 + a^2}$ | $\sin at$ | $\dfrac{z \sin aT}{z^2 - 2z\cos aT + 1}$ | Sinusoidal function
i | $\dfrac{1}{s - (1/T)\ln a}$ | $a^{t/T}$ | $\dfrac{z}{z - a}$ | Constant raised to power $t$
j | $\dfrac{b}{[s - (1/T)\ln a]^2 + b^2}$ | $a^{t/T}\sin bt$ | $\dfrac{za\sin bT}{z^2 - 2az\cos bT + a^2}$ | Sine wave multiplied by $a^{t/T}$
k | $\dfrac{s - (1/T)\ln a}{[s - (1/T)\ln a]^2 + b^2}$ | $a^{t/T}\cos bt$ | $\dfrac{z(z - a\cos bT)}{z^2 - 2az\cos bT + a^2}$ | Cosine wave multiplied by $a^{t/T}$
l | $F(s + a)$ | $e^{-at}f(t)$ | $F(e^{+aT}z)$ | Effect of multiplication by $e^{-at}$

Methods of Inverting z-Transformation. Real Inversion Integral.
The real inversion integral for the z-transform is

(22) $c(nT) = \dfrac{1}{2\pi j}\oint_{\Gamma} C(z)\,z^{n-1}\,dz,$

where the line integration is made along a circle $\Gamma$ of sufficiently large radius to enclose all poles of the integrand.

Partial Fraction Expansion. The z-transform is factored into components so that each term in the expansion can be obtained from Table 1. The usual methods for partial fraction expansion are applicable. (See Chap. 20.)

Power Series Expansion. From eq. (21)

(23) $C(z) = \sum_{n=0}^{\infty} c(nT)\,z^{-n} = \dfrac{c(0)}{z^0} + \dfrac{c(T)}{z^1} + \dfrac{c(2T)}{z^2} + \cdots,$

which thus expands the z-transform of a variable in an inverse power series in $z$. The coefficient $c(nT)$ of $z^{-n}$ is the value of the variable at the $n$th sensing instant, and the coefficients can be used directly to plot the time function at the sampling instants.

z-Transform Block Diagram Algebra. The z-transform describes the transfer function of two variables at the sensing instants only. Figure 9a illustrates the transform $R(z)$. Figure 9b shows the transform

(24) $C_b(z) = R(z)\,G_1(z),$

and Fig. 9c

(25) $C_c(z) = R(z)\,G_1(z)\,G_2(z).$

FIG. 9. Basic z-transform relationships.

In words, if each transfer member is separated from others by synchronous samplers, the z-transforms cascade, i.e., they can be multiplied. But notice that if the transfer members are not separated by a chopper, the z-transform cannot be obtained by multiplying together the z-transforms of the component members. In continuous systems where coupling exists between transfer members a similar difficulty is encountered. For example, Fig. 9d has the transform:

$C_d(z) = R(z)\,G_1G_2(z).$

Consider

$G_1(s) = \dfrac{1}{s + 1}, \quad G_1(z) = \dfrac{z}{z - \exp(-T)},$

and

$G_2(s) = \dfrac{1}{s + 2}, \quad G_2(z) = \dfrac{z}{z - \exp(-2T)}.$

Then

$\dfrac{C_c(z)}{R(z)} = \dfrac{z^2}{[z - \exp(-T)][z - \exp(-2T)]},$

whereas

$\dfrac{C_d(z)}{R(z)} = \dfrac{z}{z - \exp(-T)} - \dfrac{z}{z - \exp(-2T)} = \dfrac{z[\exp(-T) - \exp(-2T)]}{[z - \exp(-T)][z - \exp(-2T)]}.$

NOTE.
The difference is that in the first case, $G_2$ is driven by a train of impulses, whereas in the second, it is driven by the linear response of $G_1$ to its own input pulses. A helpful concept is that the z-transform of a chain of transfer members must be derived from chopper to chopper in the circuit.

Table 2 shows some control loops, their Laplace transforms and their z-transforms. The output $c$ may be assumed to be sampled by an imaginary chopper (synchronized with the real one), resulting in $c(nT)$, although this imaginary chopper must be disregarded in traversing the complete control loops.

TABLE 2. OUTPUT TRANSFORMS FOR BASIC SAMPLED-DATA SYSTEMS (a)

System | Laplace Transform of Output $C(s)$ | z-Transform of Output $C(z)$
1 | $R^*(s)$ | $R(z)$
2 | $GR^*(s)$ | $GR(z)$
3 | $G(s)R^*(s)$ | $G(z)R(z)$
4 | $\dfrac{G(s)R^*(s)}{1 + HG^*(s)}$ | $\dfrac{G(z)R(z)}{1 + HG(z)}$
5 | $\dfrac{G^*(s)R^*(s)}{1 + H^*(s)G^*(s)}$ | $\dfrac{G(z)R(z)}{1 + H(z)G(z)}$
6 | $G(s)\left[R(s) - \dfrac{H(s)RG^*(s)}{1 + HG^*(s)}\right]$ | $\dfrac{RG(z)}{1 + HG(z)}$
7 | $\dfrac{G_2(s)RG_1^*(s)}{1 + HG_1G_2^*(s)}$ | $\dfrac{G_2(z)RG_1(z)}{1 + HG_1G_2(z)}$

(a) This table is reprinted from an article by Ragazzini and Zadeh (Ref. 2) with the permission of the authors.

3. SAMPLED-DATA SYSTEM STABILITY

Stability Criteria of Difference Equations and z-Transforms

The solution of the characteristic difference equation with nonequal roots is

(26) $c_n = c(nT) = \sum_{i=1}^{m} a_i z_i^{\,n}.$

Now if $c(nT)$ is to remain finite even for large $n$, then

(27) $|z_i| < 1.$

In words, the inequality (27) states that, for stability of the sampled-data system, the roots (or zeros) of its characteristic difference equation must lie inside the unit circle with center at the origin. The unit circle in the z-plane is the periodic limit of stability corresponding to the Routh-Hurwitz stability criteria.

The Routh-Hurwitz Stability Criteria.
This is used in linear control theory and may be applied to sampled-data systems by using the conformal transformation

(28) $s = \dfrac{z + 1}{z - 1}.$

This transformation changes the interior of the unit circle in the z-plane into the left half of the s-plane as shown in Fig. 10.

FIG. 10. The linear transformation $s = (z + 1)/(z - 1)$, used for deriving stability conditions from the difference equation of control.

If the Hurwitz conditions are applied to the characteristic equation (subjected to the transformation of eq. 28), the conditions can be found which cause the roots of the transformed equation to lie in the left half of the s-plane. Hence, the roots of the characteristic equation must lie within the unit circle in the z-plane. As an example consider the second order characteristic equation

(29) $A_0 z^2 + A_1 z + A_2 = 0.$

If the transformation (eq. 28) is used, the transformed equation is

(30) $B_0 s^2 + B_1 s + B_2 = 0,$

where
$B_0 = A_0 + A_1 + A_2,$
$B_1 = 2(A_0 - A_2),$
$B_2 = A_0 - A_1 + A_2.$

In the simple quadratic case, the Hurwitz criterion requires for stability only that all these coefficients have the same sign. Therefore, the periodic stability limit of the second order eq. (29) is defined by the dual condition (if $B_0 > 0$):

(31) $B_1 > 0$ or $A_0 - A_2 > 0$,
$B_2 > 0$ or $A_0 - A_1 + A_2 > 0.$

The above procedure can be extended to higher order systems.

Use of Frequency Response Methods to Determine Stability

Nyquist Diagram. Exact Graphical Procedure. The Nyquist diagram can be drawn by considering $G(z)$ rather than $G^*(s)$. The complex plane plot is made by allowing $z$ to vary along a unit circle in the z-domain. The gain and phase at any frequency are found by locating the point on the unit circle (z-domain) corresponding to this angular frequency. The interpretation of the Nyquist diagram follows conventional lines.

EXAMPLE. Simple sampled-data system ($T$ = sampling period) (this example is after a similar example by Truxal, Ref.
5):

$G(s) = \dfrac{K}{s(s + 1)} = K\left[\dfrac{1}{s} - \dfrac{1}{s + 1}\right],$

$G^*(s) = K\left[\dfrac{z}{z - 1} - \dfrac{z}{z - e^{-T}}\right] = \dfrac{Kz(1 - e^{-T})}{(z - 1)(z - e^{-T})};$

for $\omega_s = 4$ rad/sec,

$T = \dfrac{2\pi}{\omega_s} = \dfrac{6.28}{4} = 1.57; \quad e^{-1.57} = 0.208,$

$G^*(s) = \dfrac{0.792Kz}{(z - 1)(z - 0.208)}.$

The unit circle in the z-plane would appear as in Fig. 11. The vectors are shown for 1 rad/sec. At $\omega = 1$ rad/sec, $z = 1\angle 90°$,

$G^*(j1) = \dfrac{0.792K \cdot 1\angle 90°}{(1.414\angle 135°)(1.0216\angle 101.8°)} = 0.86\,\dfrac{K}{T}\angle{-146.8°};$

likewise

$G^*(j2) = 0.52\,\dfrac{K}{T}\angle 180°.$

FIG. 11. Pole-zero configuration for $G(z)$, with vectors shown for calculation of Nyquist diagram at $\omega = 1$ rad/sec and $\omega_s = 4$ rad/sec. $G(s) = \dfrac{K}{s(s + 1)}$.

Continuing the above for other frequencies, a Nyquist plot, $G^*(j\omega)$, similar to that shown in Fig. 12, could be produced.

Comparison with Amplitude Modulation. Graphical Approximate Nyquist Plot. Linvill applied the Nyquist diagram to sampled systems based on an approximation for the starred open loop transfer function:

$G^*(s) = \dfrac{1}{T}\sum_{n=-\infty}^{\infty} G(s + jn\omega_s).$

If $G(s)$ is a good low-pass filter, $G^*(s)$ will contain only two or three significant terms. For example $G^*(j1)$ is the vector addition

$G^*(j1) = (1/T)[G(j1) + G(j1 - j\omega_s) + G(j1 + j\omega_s) + \cdots].$

If the sampling frequency is 4 rad/sec,

$G^*(j1) = (1/T)[G(j1) + G(-j3) + G(j5) + G(-j7) + \cdots].$

FIG. 12. Nyquist diagram for $G^*(s)$ with $G(s) = K/s(s + 1)$, $K/T = 1$, and $\omega_s = 4$ rad/sec (constructed by using two-term approximation) (Ref. 5). Radial scale numbers are in terms of $T/K$; circle spacing is $0.125T/K$.

All terms except $G(j1)$ and $G(-j3)$ would be small if $G(s)$ is an effective low-pass filter. The graphical construction on the $G^*(j\omega)$ plane is shown in Fig. 12.

NOTE. The value of $G^*(j\omega_s/2)$ is purely real.
At frequencies above $\omega_s/2$ the Nyquist diagram continues into the upper half-plane until it reaches infinity at the sampling frequency. The only part of the diagram of interest in stability considerations is the section corresponding to frequencies lying between zero and $\omega_s/2$.

The example illustrates that sampling, by itself, increases the phase lag for a given gain. From the Nyquist diagram the maximum gain for a stable system is read directly. If only the two terms are used in the series expansion of $G^*(j\omega)$ the allowable $K/T$ is 2.5; if all are used, the $K/T$ is 1.94. The gain $K$ in the first case is 3.93; consideration of the rest of the terms reduces the allowable gain to 3.05, since in this example, $T = 2\pi/4 = 1.57$.

4. SAMPLED-DATA SYSTEM SYNTHESIS

Design Procedure Using z-Transforms

This section is based upon material from Ref. 4. The typical sampled-data system of Fig. 13 will be used for illustration.

FIG. 13. The basic digital servo system.

The error unit embodies both the analog-digital transducer, which periodically expresses the angular position of the output shaft as a number in binary code form, and the digital subtractor, which takes the difference of this number and the incoming one. The characteristics of the servo motor-amplifier combination differ for different applications. They are assumed to be known and invariant, so that the problem is to synthesize a suitable controller. The following z-transforms will be used:
$G_c(z)$ = z-transform of the controller,
$G_m(z)$ = z-transform of the motor-amplifier combination,
$G_o(z)$ = z-transform of the open control loop,
$G(z)$ = the closed loop z-transform.

(32) $G(z) = \dfrac{\text{z-transform of the forward path}}{1 + G_o(z)}.$

Performance Criteria.
It is useful to assess the performance in terms of the responses to specific driving functions such as a step function, a steady velocity or acceleration, a sinusoidal input of various frequencies, or a random noise. Any or all such tests may be applied and, since improvement in one respect is often accompanied by deterioration in another, it will be necessary to compromise. Such overriding factors as the demand for zero velocity lag must take precedence.

The servo amplifier may overload if it is fed a series of discontinuous pulses representing samples. Its input must be reasonably smooth, and the correction due to one error number will not be complete before the next is begun.

An equivalent system is shown in Fig. 14. The system delay, $\Delta T$, is now shown with the motor amplifier. The controller has two parts, the first of which is characterized by its z-transform, $G_1(z)$, and it modifies the sequence of correction samples supplied to it. The modified sequence is smoothed by the second part, characterized by its transfer function $G_2(s)$, which provides a continuous signal for driving the servo amplifier. This subdivision is unlikely to correspond to any physical separation of the components.

FIG. 14. A system equivalent to that of Fig. 13. (Blocks: controller $G_1(z)$ and $G_2(s)$; motor-amplifier $G_m(s)$ with delay $\Delta T$; combined transform $G_2G_m(z)$.)

The composite expression $G_1(z) * G_2(s)$ may be called the "operational instruction" of the controller. The * symbol is used in this case to separate the sampled and continuous portions of the operational instruction and indicates that the information input to the continuous elements is in sampled form.

Knowing $G_m(s)$ and the performance requirements, it should be possible to specify $G_2(s)$. For example, the motor amplifier may have the simple transfer function $1/[s(1 + T_m s)]$, where the time constant, $T_m$, is probably smaller than the sampling interval.
To avoid sudden changes in velocity, $G_2(s)$ need only be $1/s$.

The next step is the determination of a suitable $G_o(z)$, taking into account all the overriding factors. The fact that $G_o(z)$ may be expressed as the ratio of two polynomials, $N(z)/D(z)$, is also used.

Physical Realizability. The order of N must be at least one less than that of D.

Poles at z = 1. To have zero static error, the function $G_o(z)$ must possess at least one simple pole at z = 1. A second order pole at z = 1 would provide zero velocity lag, and a third order pole a zero acceleration lag characteristic.

Cancellation of Poles and Zeros. The characteristic equation of the system is $D(z) + N(z, \Delta) = 0$. The parameter $\Delta$ indicates the need for checking the values of the system variables between sensing instants. The system will be unstable if any root lies on or outside the unit circle $|z| = 1$. One may be tempted to arrange, by adjustment of parameters, for the cancellation of a zero by a pole so as to eliminate the root which would otherwise lie outside the unit circle. It is better to increase the sampling frequency. This point cannot be emphasized too strongly, particularly because it is tempting to deal with the special case of no system delay; the cancellation can then result in an instability that is revealed only when the behavior between sampling instants is investigated. See Ref. 5.

Design Constants. The suggested method of synthesizing a system is to match the characteristic equation with one known to give satisfactory performance. Lawden et al. (Ref. 6) have used equations of the form $(z - a)^n = 0$, although when n is a small number, it may be desirable to depart from this form. Oldenbourg and Sartorius (Ref. 1) show that minimum control area (see Condition for Minimal Control Area, later in this section) results from the case of vanishing roots, namely $z^n = 0$. Examples relevant to continuous systems may well be suitable for sampling systems.
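The effect of the root locations named above can be illustrated with a minimal sketch. It iterates the homogeneous difference equation whose characteristic polynomial is $(z - a)^n$ and compares three cases: vanishing roots (a = 0), roots inside the unit circle (a = 0.4), and roots outside it (a = 1.1). The helper names `char_poly` and `free_response`, the order n = 3, and the initial disturbance are assumptions chosen only for illustration; they are not taken from the references.

```python
from math import comb

def char_poly(a, n):
    """Coefficients of (z - a)^n, highest power first (binomial expansion)."""
    return [comb(n, k) * (-a) ** k for k in range(n + 1)]

def free_response(coeffs, initial, steps):
    """Iterate y[k+n] = -(c1*y[k+n-1] + ... + cn*y[k]) for a monic polynomial."""
    n = len(coeffs) - 1
    y = list(initial)
    for _ in range(steps):
        recent = y[-1:-n - 1:-1]  # y[k+n-1], ..., y[k]
        y.append(-sum(c * v for c, v in zip(coeffs[1:], recent)))
    return y

start = [1.0, 1.0, 1.0]  # arbitrary initial disturbance (assumption)

deadbeat = free_response(char_poly(0.0, 3), start, 10)   # z^3 = 0
smoothed = free_response(char_poly(0.4, 3), start, 30)   # (z - 0.4)^3 = 0
unstable = free_response(char_poly(1.1, 3), start, 30)   # roots outside |z| = 1

print("z^3 = 0        :", [round(v, 4) for v in deadbeat[:6]])
print("(z - 0.4)^3 = 0:", [round(v, 4) for v in smoothed[:6]])
print("(z - 1.1)^3 = 0:", [round(v, 4) for v in unstable[:6]])
```

With vanishing roots the disturbance is gone after n sampling instants; with a = 0.4 it decays gradually, and with a = 1.1 it grows, which is the unit-circle stability condition seen in the time domain.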
The procedure is to arrange for N(z) and D(z) to include between them a number of constants which are adjustable in the design stage. This number should be equal to the order of the characteristic equation. It is always possible to do this, because two additional constants are picked up each time the order of the characteristic equation is increased by one.

The characteristic equation $z^n = 0$ leads to minimum control area, but if there is noise present, very little smoothing is provided; as a result, the servo amplifier may be transiently overloaded or driven into saturation. The characteristic equation $(z - 0.4)^n = 0$ has been used by some authors to provide smooth and satisfactory performance in the presence of noise.

Analysis of a representative second order sampled-data system by Jury (Ref. 7) leads to the results of Fig. 15, which shows the constant overshoot loci in the z-plane. It can be shown that these loci can be used for higher order systems. Note that a system which has no overshoot has its characteristic roots on the positive real axis. The values of the roots must be less than unity.

FIG. 15. Constant overshoot loci in the z-plane for a unit step input; $M_p$ values = ratio of transient peak to steady state (Ref. 7).

The simplest expression which meets all the above requirements is taken as $G_o(z)$. Dividing it by $G_2G_m(z)$ gives $G_1(z)$, the first part of the operational instruction.

The Operational Instruction. It remains to decide how standard components may be assembled into a system having the required operational instruction, $G_1(z) * G_2(s)$. The s-part must describe the properties of the digital-analog converter included in the error unit. This converter may perform the function of a clamp, which has the operational instruction $(1 - e^{-sT})s^{-1}$. Other functions of s may be obtained by the usual synthesis procedures and may lead to further terms in the z-part.
This usually leaves an expression in z which is required for the rest of the operational instruction. Generally, such an expression is of the form

(33) $$\frac{A_0 + A_1 z^{-1} + A_2 z^{-2} + \cdots + A_r z^{-r}}{1 + B_1 z^{-1} + B_2 z^{-2} + \cdots + B_r z^{-r}}.$$

This function may be synthesized in many ways; one way of constructing its physical counterpart is with the aid of r delay elements (each equal to T). The output is obtained by the summation of delayed components proportional to the coefficients in the numerator. The correct denominator is obtained by negative feedback of the delayed components proportional to the coefficients in the denominator.

EXAMPLE. Synthesis of a Simple Analog System. Figure 14 shows the simple system to be synthesized, and the following assumptions are made regarding it:

(a) The servo motor and amplifier are constructed so that the rate of rotation of the motor is proportional to the voltage applied to the amplifier. The transfer function is $G_m(s) = K/s$.

(b) The transducer samples and introduces a delay, T, the effect of which is to multiply the z-transform by $z^{-1}$.

(c) There must be zero static error; hence $G_o(z)$ must include the factor $(z - 1)$ in its denominator.

(d) There shall be no sudden changes in output velocity. The law of motion shall be quadratic between sampling instants. Hence $G_2(s) = s^{-2}$.

Therefore, $G_2(s)G_m(s) = Ks^{-3}$ and $G_2G_m(z) = KT^2 z(z + 1)/[2(z - 1)^3]$, the z-transform being found directly from Table 1.

The z-Transform of the Loop. This is

(34) $$G_o(z) = \frac{KT^2\,G_1(z)\,(z + 1)}{2(z - 1)^3}.$$

It contains the required factor $(z - 1)$ in the denominator. It must also contain adjustable design constants such that the characteristic equation can be forced into one known to be suitable, such as $(z - a)^n = 0$.
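The table entry used above, $G_2G_m(z) = KT^2 z(z + 1)/[2(z - 1)^3]$, can be checked numerically. The sketch below expands the rational function in powers of $z^{-1}$ by the usual long-division recurrence and compares the coefficients with samples of $Kt^2/2$, the impulse response of $Ks^{-3}$, at $t = nT$. The function name `impulse_sequence` and the values K = 2, T = 0.5 are assumptions made only for this illustration.

```python
def impulse_sequence(num, den, count):
    """First `count` power-series coefficients of num(z^-1) / den(z^-1).

    num and den list coefficients of z^0, z^-1, z^-2, ... in that order.
    """
    h = []
    for n in range(count):
        acc = num[n] if n < len(num) else 0.0
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * h[n - k]
        h.append(acc / den[0])
    return h

K, T = 2.0, 0.5  # illustrative values only (assumption)

# K T^2 z (z + 1) / (2 (z - 1)^3), rewritten in powers of z^-1:
#   (K T^2 / 2) (z^-1 + z^-2) / (1 - 3 z^-1 + 3 z^-2 - z^-3)
c = K * T * T / 2.0
num = [0.0, c, c]
den = [1.0, -3.0, 3.0, -1.0]

from_transform = impulse_sequence(num, den, 8)
from_samples = [K * (n * T) ** 2 / 2.0 for n in range(8)]

for n, (a, b) in enumerate(zip(from_transform, from_samples)):
    print(n, a, b)
```

The two columns should agree sample by sample. Multiplying by the additional factor $z^{-1}$ from assumption (b) only shifts the sequence by one sampling interval.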
The simplest expression for $G_1(z)$ which adds two further constants without increasing the order is

(35) $$G_1(z) = \frac{(z - 1)^2}{z^2 + B_1 z + B_2}.$$

The operational instruction for the controller is therefore

(36) $$\frac{(z - 1)^2}{z^2 + B_1 z + B_2} * s^{-2}.$$

The z-transform of the operational instruction is obtained by replacing $s^{-2}$ with its z-transform, $Tz(z - 1)^{-2}$, which reduces eq. (36) to $Tz(z^2 + B_1 z + B_2)^{-1}$. The characteristic equation is

(37) $$z^3 + (B_1 - 1)z^2 + (B_2 - B_1 + \tfrac{1}{2}KT^2)z + (\tfrac{1}{2}KT^2 - B_2) = 0.$$

The simplest third order equation this can be identified with is $z^3 = 0$. Choice of the characteristic equation $z^3 = 0$ is known to produce the most rapid recovery from a transient disturbance. (See Condition for Minimal Control Area, later in this section.) If $KT^2 = B_1 = 1$ and $B_2 = 1/2$, $G_o(z)$ can be written

(38) $$G_o(z) = (z + 1)(2z^3 - z - 1)^{-1}.$$

The closed loop z-transform in response to a pulse is

(39) [equation not legible in the source]

Design Procedure Using Frequency Response. The design procedures of linear techniques discussed in previous chapters are applicable. Nyquist and Bode diagrams may be used after the transfer function has been obtained. In working with the Nyquist diagram, it can be seen that lead compensation will increase the bandwidth of the system. If the sampling frequency is not increased, no additional high-frequency information will be passed. This illustrates a difficulty which one may encounter in the synthesis of sampled-data systems.

Design Procedure Using Root Locus Techniques. As in frequency response techniques, the root locus could be used as an aid for synthesizing sampled-data systems. However, it can be shown that the desired root locus is the positive real axis in the z-plane. As previously mentioned, the characteristic equation $z^n = 0$ is known to lead to the fastest recovery of the system from a disturbance. If noise is present, $(z - a)^n = 0$ is the desired characteristic, where a is a number between zero and 1. As Jury has shown (see Fig.
15), the loci of constant overshoot also illustrate that the positive real axis of the z-plane is the desired place to locate the roots of the characteristic equation. The circumference of the circle in the z-plane having unity radius is the periodic limit of stability.

In summary, the sampling frequency controls the bandpass and thus the speed with which the system can transmit information. The roots of the characteristic equation should be placed as near the origin as permissible. Placement at the origin is known to produce the liveliest system; if noise is present, the roots must be moved along the positive real axis in the z-plane toward z = 1.

It should be noted that in many practical cases the above simple criterion for performance will have to be modified for one or more practical reasons. In such cases the approach suggested is to use the above rules for the first approximation and then to introduce the other considerations.

Performance Charts for Typical Sampled-Data Systems

Performance Index: Control Area. To evaluate the results of computations and to choose the most favorable conditions of operation, the concept of control effectiveness, measured by the smallness of the control area, is very useful. For continuous controllers, the control area is defined as

(40) [equation not legible in the source]

For sampled-data controllers the calculation is not so simple, except in one case, when the control process has the initial value of zero at the first sensing instant. In this case (which leads to the largest control area) the control area is

(41) [expression for $F/T$ not legible in the source], $T$ = sampling period.

It can be seen that the control area is the error-time integral.

Condition for Minimal Control Area. The characteristic equation with vanishing roots, namely $z^n = 0$, has the least control area. Such a system can be shown to recover most quickly from a disturbance. However, if there is noise present, or if such a characteristic equation is physically unrealizable, the characteristic equation $(z - a)^n = 0$ is used.
It is used in the presence of noise to provide smoothing of the impulses, and it is used in the second case so that normal system components may be employed.

Second Order System with Dead Time and with No Compensation. (See Fig. 16.) The characteristic equation for this system is

(42) $$z^2 + [K(1 - D/L) - (1 + D)]z + D - KD(1 - 1/L) = 0,$$

where $L = \exp(-T_L/T_c)$ and $D = \exp(-T/T_c)$. If $T < T_L < 2T$, the equation becomes

(43) $$z^3 - (1 + D)z^2 + [D + K(1 - D^2/L)]z + KD(D/L - 1) = 0.$$

Dead time does not increase the order of the characteristic equation as long as $T_L < T$. When $T < T_L < 2T$, the order of the equation is increased from 2 to 3. Further increase of the dead time (or shortening of the sensing time) leads to successively higher order characteristic equations, and it can be shown that the equation becomes transcendental for the case of the continuous controller.

It can be shown that the controlled variable never oscillates if all the roots lie on the positive axis of the z-plane between 0 and +1. This fact is used to force eq. (42) to have a double positive root less than unity. The control factors leading to this aperiodic limit are shown in Fig. 16. It is not possible to cause the roots to vanish (i.e., $z^n = 0$) unless compensation is added. Adding compensation introduces two arbitrary constants into the characteristic equation. The two extra constants can be used to design the system characteristic for vanishing roots.

FIG. 16. Excursion-dependent periodic control on a plant with first order time constant and dead time; aperiodic limit (Ref. 1). (Curves of K against $T_L/T_c$ for several values of $T/T_c$.)

FIG. 17.
Excursion-dependent periodic control with rate action, on a plant with first order time constant and dead time; parameters for optimal response (Ref. 1).

Second Order System with Dead Time and Rate Stabilization. (See Fig. 17.) This system leads to the equation

(44) $$A_0 z^2 + A_1 z + A_2 = 0,$$

where

$A_0 = 1$,
$A_1 = K(1 - D/L) + (T_1/T_c)(D/L) - (1 + D)$,
$A_2 = D[1 + K(1/L - 1) - (T_1/T_c)(1/L)]$,
$L = \exp(-T_L/T_c)$,
$D = \exp(-T/T_c)$,
$M$ = step disturbance in m,
$F$ = control area.

The control area will assume an absolute minimum if all the roots vanish. This limit arises when $A_1/A_0 = 0$ and $A_2/A_0 = 0$. These conditions are called optimal because they lead to least control area. They are valid only in the range $0 < T_L/T_c < T/T_c$. The control area increases steadily with $T/T_c$, so that if one is free to choose $T/T_c$, the most favorable operating conditions are obtained when $T = T_L$. The action following a step disturbance inside the control loop is shown in Fig. 18. The parameters for optimal response are shown in Fig. 17.

FIG. 18. Example of a difference equation of second order with vanishing characteristic values (Ref. 1).

Third Order System with Dead Time and with Delayed Rate Compensation. The system of Fig. 19 leads to the following characteristic equation:

(45) $$A_0 z^3 + A_1 z^2 + A_2 z + A_3 = 0,$$

where

$A_0 = 1$,
$A_1 = K(1 - D/L + pQ) - 1 - D - Q$,
$A_2 = K[(D/L)(1 + Q) - pQ(1 + D) - D - Q] + D + DQ + Q$,
$A_3 = DQ[K(1 + p - 1/L) - 1]$,
$Q = \exp(-T/T_R)$,

and L, D, M, F are defined above (see eq. 44).

Here again optimal response is possible with its finite control process similar to that shown in Fig. 18. The parameters for optimal response are shown in Fig. 19.

FIG. 19. Excursion-dependent periodic control with retractile followup, on a plant with first order time constant and dead time; parameters for optimal response (Ref. 1).

Comparison of Continuous and Sampled-Data Controllers. If the process has no dead time, sampled-data control is decidedly less favorable than the corresponding continuous control. Figure 20 shows the relationships which are present when there is dead time in the plant. If the control areas of sampled-data and continuous controllers without stabilization are compared at the aperiodic limit, the two upper solid lines of Fig. 20 are obtained. For small values of the dead time, these two curves can hardly be distinguished from one another. Decidedly different relations are present, however, if the controller includes a stabilizing device, since then a control response which terminates in a finite time can be had with a sampled-data controller. A comparison of control areas, Fig. 20, shows that sampled-data control gives appreciably better results. To be sure, the combination of parameters which causes vanishing roots of the characteristic equation is not possible for arbitrarily small dead times

[Fig. 20 curve legend. Continuous: $C_0$, no stabilizer; $C_R$, with retractile follow-up; $C$, with rate action. Periodic: $P_0$, no stabilizer; $P_R$, with retractile follow-up; $P$, with rate action.]