A Guide To Advanced Linear Algebra

© 2011 by The Mathematical Association of America (Incorporated)
Library of Congress Catalog Card Number 2011923993
Print Edition ISBN 978-0-88385-351-1
Electronic Edition ISBN 978-0-88385-967-4
Printed in the United States of America
The Dolciani Mathematical Expositions
NUMBER FORTY-FOUR
MAA Guides #6
A Guide
to
Advanced Linear Algebra
Steven H. Weintraub
Lehigh University
Published and Distributed by
The Mathematical Association of America
DOLCIANI MATHEMATICAL EXPOSITIONS
Committee on Books
Frank Farris, Chair
Dolciani Mathematical Expositions Editorial Board
Underwood Dudley, Editor
Jeremy S. Case
Rosalie A. Dance
Tevian Dray
Thomas M. Halverson
Patricia B. Humphrey
Michael J. McAsey
Michael J. Mossinghoff
Jonathan Rogness
Thomas Q. Sibley
The DOLCIANI MATHEMATICAL EXPOSITIONS series of the Mathematical Association of America was established through a generous gift to the Association from Mary P. Dolciani, Professor of Mathematics at Hunter College of the City University of New York. In making the gift, Professor Dolciani, herself an exceptionally talented and successful expositor of mathematics, had the purpose of furthering the ideal of excellence in mathematical exposition.

The Association, for its part, was delighted to accept the gracious gesture initiating the revolving fund for this series from one who has served the Association with distinction, both as a member of the Committee on Publications and as a member of the Board of Governors. It was with genuine pleasure that the Board chose to name the series in her honor.

The books in the series are selected for their lucid expository style and stimulating mathematical content. Typically, they contain an ample supply of exercises, many with accompanying solutions. They are intended to be sufficiently elementary for the undergraduate and even the mathematically inclined high-school student to understand and enjoy, but also to be interesting and sometimes challenging to the more advanced mathematician.
1. Mathematical Gems, Ross Honsberger
2. Mathematical Gems II, Ross Honsberger
3. Mathematical Morsels, Ross Honsberger
4. Mathematical Plums, Ross Honsberger (ed.)
5. Great Moments in Mathematics (Before 1650), Howard Eves
6. Maxima and Minima without Calculus, Ivan Niven
7. Great Moments in Mathematics (After 1650), Howard Eves
8. Map Coloring, Polyhedra, and the Four-Color Problem, David Barnette
9. Mathematical Gems III, Ross Honsberger
10. More Mathematical Morsels, Ross Honsberger
11. Old and New Unsolved Problems in Plane Geometry and Number Theory, Victor Klee and Stan Wagon
12. Problems for Mathematicians, Young and Old, Paul R. Halmos
13. Excursions in Calculus: An Interplay of the Continuous and the Discrete, Robert M. Young
14. The Wohascum County Problem Book, George T. Gilbert, Mark Krusemeyer, and Loren C. Larson
15. Lion Hunting and Other Mathematical Pursuits: A Collection of Mathematics, Verse, and Stories by Ralph P. Boas, Jr., edited by Gerald L. Alexanderson and Dale H. Mugler
16. Linear Algebra Problem Book, Paul R. Halmos
17. From Erdős to Kiev: Problems of Olympiad Caliber, Ross Honsberger
18. Which Way Did the Bicycle Go? . . . and Other Intriguing Mathematical Mysteries, Joseph D. E. Konhauser, Dan Velleman, and Stan Wagon
19. In Pólya's Footsteps: Miscellaneous Problems and Essays, Ross Honsberger
20. Diophantus and Diophantine Equations, I. G. Bashmakova (Updated by Joseph Silverman and translated by Abe Shenitzer)
21. Logic as Algebra, Paul Halmos and Steven Givant
22. Euler: The Master of Us All, William Dunham
23. The Beginnings and Evolution of Algebra, I. G. Bashmakova and G. S. Smirnova (Translated by Abe Shenitzer)
24. Mathematical Chestnuts from Around the World, Ross Honsberger
25. Counting on Frameworks: Mathematics to Aid the Design of Rigid Structures, Jack E. Graver
26. Mathematical Diamonds, Ross Honsberger
27. Proofs that Really Count: The Art of Combinatorial Proof, Arthur T. Benjamin and Jennifer J. Quinn
28. Mathematical Delights, Ross Honsberger
29. Conics, Keith Kendig
30. Hesiod's Anvil: falling and spinning through heaven and earth, Andrew J. Simoson
31. A Garden of Integrals, Frank E. Burk
32. A Guide to Complex Variables (MAA Guides #1), Steven G. Krantz
33. Sink or Float? Thought Problems in Math and Physics, Keith Kendig
34. Biscuits of Number Theory, Arthur T. Benjamin and Ezra Brown
35. Uncommon Mathematical Excursions: Polynomia and Related Realms, Dan Kalman
36. When Less is More: Visualizing Basic Inequalities, Claudi Alsina and Roger B. Nelsen
37. A Guide to Advanced Real Analysis (MAA Guides #2), Gerald B. Folland
38. A Guide to Real Variables (MAA Guides #3), Steven G. Krantz
39. Voltaire's Riddle: Micromégas and the measure of all things, Andrew J. Simoson
40. A Guide to Topology (MAA Guides #4), Steven G. Krantz
41. A Guide to Elementary Number Theory (MAA Guides #5), Underwood Dudley
42. Charming Proofs: A Journey into Elegant Mathematics, Claudi Alsina and Roger B. Nelsen
43. Mathematics and Sports, edited by Joseph A. Gallian
44. A Guide to Advanced Linear Algebra (MAA Guides #6), Steven H. Weintraub
MAA Service Center
P.O. Box 91112
Washington, DC 20090-1112
1-800-331-1MAA FAX: 1-301-206-9789
Preface
Linear algebra is a beautiful and mature field of mathematics, and mathematicians have developed highly effective methods for solving its problems. It is a subject well worth studying for its own sake.

More than that, linear algebra occupies a central place in modern mathematics. Students in algebra studying Galois theory, students in analysis studying function spaces, students in topology studying homology and cohomology, or for that matter students in just about any area of mathematics, studying just about anything, need to have a sound knowledge of linear algebra.

We have written a book that we hope will be broadly useful. The core of linear algebra is essential to every mathematician, and we not only treat this core, but add material that is essential to mathematicians in specific fields, even if not all of it is essential to everybody.

This is a book for advanced students. We presume you are already familiar with elementary linear algebra, and that you know how to multiply matrices, solve linear systems, etc. We do not treat elementary material here, though in places we return to elementary material from a more advanced standpoint to show you what it really means. However, we do not presume you are already a mature mathematician, and in places we explain what (we feel) is the "right" way to understand the material. The author feels that one of the main duties of a teacher is to provide a viewpoint on the subject, and we take pains to do that here.

One thing that you should learn about linear algebra now, if you have not already done so, is the following: Linear algebra is about vector spaces and linear transformations, not about matrices. This is very much the approach of this book, as you will see upon reading it.

We treat both the finite and infinite dimensional cases in this book, and point out the differences between them, but the bulk of our attention is devoted to the finite dimensional case. There are two reasons: First, the
strongest results are available here, and second, this is the case most widely used in mathematics. (Of course, matrices are available only in the finite dimensional case, but, even here, we almost always argue in terms of linear transformations rather than matrices.)

We regard linear algebra as part of algebra, and that guides our approach. But we have followed a middle ground. One of the principal goals of this book is to derive canonical forms for linear transformations on finite dimensional vector spaces, i.e., rational and Jordan canonical forms. The quickest and perhaps most enlightening approach is to derive them as corollaries of the basic structure theorems for modules over a principal ideal domain (PID). Doing so would require a good deal of background, which would limit the utility of this book. Thus our main line of approach does not use these, though we indicate this approach in an appendix. Instead we adopt a more direct argument.

We have written a book that we feel is a thorough, though intentionally not encyclopedic, treatment of linear algebra, one that contains material that is both important and deservedly "well known". In a few places we have succumbed to temptation and included material that is not quite so well known, but that in our opinion should be.

We hope that you will be enlightened not only by the specific material in the book but by its style of argument; we hope it will help you learn to "think like a mathematician". We also hope this book will serve as a valuable reference throughout your mathematical career.

Here is a rough outline of the text. We begin, in Chapter 1, by introducing the basic notions of linear algebra, vector spaces and linear transformations, and establish some of their most important properties. In Chapter 2 we introduce coordinates for vectors and matrices for linear transformations. In the first half of Chapter 3 we establish the basic properties of determinants, and in the last half of that chapter we give some of their applications. Chapters 4 and 5 are devoted to the analysis of the structure of a single linear transformation from a finite dimensional vector space to itself. In particular, in these chapters, we develop eigenvalues, eigenvectors, and generalized eigenvectors, and derive rational and Jordan canonical forms. In Chapter 6 we introduce additional structure on a vector space, that of a (bilinear, sesquilinear, or quadratic) form, and analyze these forms. In Chapter 7 we specialize the situation of Chapter 6 to that of a positive definite inner product on a real or complex vector space, and in particular derive the spectral theorem. In Chapter 8 we provide an introduction to Lie groups, which are central objects in mathematics and are a meeting place for
algebra, analysis, and topology. (For this chapter we require the additional background knowledge of the inverse function theorem.) In Appendix A we review basic properties of polynomials and polynomial rings that we use, and in Appendix B we rederive some of our results on canonical forms of a linear transformation from the structure theorems for modules over a PID.

We have provided complete proofs of just about all the results in this book, except that we have often omitted, without comment, proofs that are routine.

As we have remarked above, we have tried to write a book that will be widely applicable. This book is written in an algebraic spirit, so the student of algebra will find items of interest and particular applications, too numerous to mention here, throughout the book. The student of analysis will appreciate the fact that we not only consider finite dimensional vector spaces, but also infinite dimensional ones, and will also appreciate our material on inner product spaces and our particular examples of function spaces. The student of algebraic topology will appreciate our dimension-counting arguments and our careful attention to duality, and the student of differential topology will appreciate our material on orientations of vector spaces and our introduction to Lie groups.

No book can treat everything. With the exception of a short section on Hilbert matrices, we do not treat computational issues at all. They do not fit in with our theoretical approach. Students in numerical analysis, for example, will need to look elsewhere for this material.
To close this preface, we establish some notational conventions. We will denote both sets (usually but not always sets of vectors) and linear transformations by script letters $\mathcal{A}, \mathcal{B}, \ldots, \mathcal{Z}$. We will tend to use script letters near the front of the alphabet for sets and script letters near the end of the alphabet for linear transformations. $T$ will always denote a linear transformation and $\mathcal{I}$ will always denote the identity linear transformation. Some particular linear transformations will have particular notations, often in boldface. Capital letters will denote either vector spaces or matrices. We will tend to denote vector spaces by capital letters near the end of the alphabet, and $V$ will always denote a vector space. Also, $I$ will almost always denote the identity matrix. $E$ and $F$ will denote arbitrary fields and $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ will denote the fields of rational, real, and complex numbers respectively. $\mathbb{Z}$ will denote the ring of integers. We will use $A \subseteq B$ to mean that $A$ is a subset of $B$ and $A \subsetneq B$ to mean that $A$ is a proper subset of $B$. $A = (a_{ij})$ will mean that $A$ is the matrix whose entry in the $(i, j)$ position is $a_{ij}$. $A = [v_1 \mid v_2 \mid \cdots \mid v_n]$ will mean that $A$ is the matrix whose $i$th column
is $v_i$. We will denote the transpose of the matrix $A$ by ${}^tA$ (not by $A^t$). Finally, we will write $B = \{v_i\}$ as shorthand for $B = \{v_i\}_{i \in I}$, where $I$ is an indexing set, and $\sum c_i v_i$ will mean $\sum_{i \in I} c_i v_i$.

We follow a conventional numbering scheme with, for example, Remark 1.3.12 denoting the 12th numbered item in Section 1.3 of Chapter 1. We use $\square$ to denote the end of proofs. Theorems, etc., are set in italics, so the end of italics denotes the end of their statements. But definitions, etc., are set in ordinary type, so there is ordinarily nothing to denote the end of their statements. We use ◊ for that.

Steven H. Weintraub
Bethlehem, PA, USA
January 2010
Contents
Preface

1 Vector spaces and linear transformations
  1.1 Basic definitions and examples
  1.2 Basis and dimension
  1.3 Dimension counting and applications
  1.4 Subspaces and direct sum decompositions
  1.5 Affine subspaces and quotient spaces
  1.6 Dual spaces

2 Coordinates
  2.1 Coordinates for vectors
  2.2 Matrices for linear transformations
  2.3 Change of basis
  2.4 The matrix of the dual

3 Determinants
  3.1 The geometry of volumes
  3.2 Existence and uniqueness of determinants
  3.3 Further properties
  3.4 Integrality
  3.5 Orientation
  3.6 Hilbert matrices

4 The structure of a linear transformation I
  4.1 Eigenvalues, eigenvectors, and generalized eigenvectors
  4.2 Some structural results
  4.3 Diagonalizability
  4.4 An application to differential equations

5 The structure of a linear transformation II
  5.1 Annihilating, minimum, and characteristic polynomials
  5.2 Invariant subspaces and quotient spaces
  5.3 The relationship between the characteristic and minimum polynomials
  5.4 Invariant subspaces and invariant complements
  5.5 Rational canonical form
  5.6 Jordan canonical form
  5.7 An algorithm for Jordan canonical form and Jordan basis
  5.8 Field extensions
  5.9 More than one linear transformation

6 Bilinear, sesquilinear, and quadratic forms
  6.1 Basic definitions and results
  6.2 Characterization and classification theorems
  6.3 The adjoint of a linear transformation

7 Real and complex inner product spaces
  7.1 Basic definitions
  7.2 The Gram-Schmidt process
  7.3 Adjoints, normal linear transformations, and the spectral theorem
  7.4 Examples
  7.5 The singular value decomposition

8 Matrix groups as Lie groups
  8.1 Definition and first examples
  8.2 Isometry groups of forms

A Polynomials
  A.1 Basic properties
  A.2 Unique factorization
  A.3 Polynomials as expressions and polynomials as functions

B Modules over principal ideal domains
  B.1 Definitions and structure theorems
  B.2 Derivation of canonical forms

Bibliography
Index
About the Author
To the binary tree:

Judy
Rachel   Jodie
Ethan   Logan   Blake   Natalie
CHAPTER 1
Vector spaces and linear transformations
In this chapter we introduce the objects we will be studying and investigate
some of their basic properties.
1.1 Basic definitions and examples
Definition 1.1.1. A vector space Vover a field Fis a set Vwith a pair
of operations .u; v/ 7! uCvfor u; v 2Vand .c; u/ 7! cu for c2F,
v2Vsatisfying the following axioms:
(1) uCv2Vfor any u; v 2V.
(2) uCvDvCufor any u; v 2V.
(3) uC.v Cw/ D.u Cv/ Cwfor any u; v; w 2V.
(4) There is a 02Vsuch that 0CvDvC0Dvfor any v2V.
(5) For any v2Vthere is a v2Vsuch that vC.v/ D.v/CvD0.
(6) cv 2Vfor any c2F,v2V.
(7) c.u Cv/ Dcu Ccv for any c2F,u; v 2V.
(8) .c Cd /u Dcu Cdu for any c; d 2F,u2V.
(9) c.du/ D.cd /u for any c; d 2F,u2V.
(10) 1u Dufor any u2V.
Þ
1
Remark 1.1.2. The elements of $F$ are called scalars and the elements of $V$ are called vectors. The operation $(u, v) \mapsto u + v$ is called vector addition and the operation $(c, u) \mapsto cu$ is called scalar multiplication. ◊

Remark 1.1.3. Properties (1) through (5) of Definition 1.1.1 state that $V$ forms an abelian group under the operation of vector addition. ◊

Lemma 1.1.4. (1) $0 \in V$ is unique.
(2) $0v = 0$ for any $v \in V$.
(3) $(-1)v = -v$ for any $v \in V$.

Definition 1.1.5. Let $V$ be a vector space. $W$ is a subspace of $V$ if $W \subseteq V$ and $W$ is a vector space with the same operations of vector addition and scalar multiplication as $V$. ◊

The following result gives an easy way of testing whether a subset $W$ of $V$ is a subspace of $V$.

Lemma 1.1.6. Let $W \subseteq V$. Then $W$ is a subspace of $V$ if and only if it satisfies the equivalent sets of conditions (0), (1), and (2), or (0′), (1), and (2):

(0) $W$ is nonempty.
(0′) $0 \in W$.
(1) If $w_1, w_2 \in W$ then $w_1 + w_2 \in W$.
(2) If $w \in W$ and $c \in F$, then $cw \in W$.
Example 1.1.7. (1) The archetypal example of a vector space is $F^n$, for a positive integer $n$, the space of column vectors

$$F^n = \left\{ \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \;\middle|\; a_i \in F \right\}.$$

We also have the spaces "little $F^\infty$" and "big $F^\infty$", which we denote by $F^\infty$ and $F^{\infty\infty}$ respectively (this is nonstandard notation), defined by

$$F^\infty = \left\{ \begin{bmatrix} a_1 \\ a_2 \\ \vdots \end{bmatrix} \;\middle|\; a_i \in F, \text{ only finitely many nonzero} \right\},$$

$$F^{\infty\infty} = \left\{ \begin{bmatrix} a_1 \\ a_2 \\ \vdots \end{bmatrix} \;\middle|\; a_i \in F \right\}.$$
$F^\infty$ is a subspace of $F^{\infty\infty}$.

Let $e_i$ denote the vector in $F^n$, $F^\infty$, or $F^{\infty\infty}$ (which we are considering should be clear from the context) with a $1$ in position $i$ and $0$ everywhere else. A formal definition appears in Example 1.2.18(1).

(2) We have the vector spaces ${}_r F^n$, ${}_r F^\infty$, and ${}_r F^{\infty\infty}$ defined analogously to $F^n$, $F^\infty$, and $F^{\infty\infty}$ but using row vectors rather than column vectors.

(3) $M_{m,n}(F) = \{m\text{-by-}n \text{ matrices with entries in } F\}$. We abbreviate $M_{m,m}(F)$ by $M_m(F)$.

(4) $P(F) = \{\text{polynomials } p(x) \text{ with coefficients in } F\}$. For a nonnegative integer $n$, $P_n(F) = \{\text{polynomials } p(x) \text{ of degree at most } n \text{ with coefficients in } F\}$. Although the degree of the $0$ polynomial is undefined, we adopt the convention that $0 \in P_n(F)$ for every $n$. Observe that $P_n(F)$ is a subspace of $P(F)$, and that $P_m(F)$ is a subspace of $P_n(F)$ whenever $m \leq n$. (We also use the notation $F[x]$ for $P(F)$. We use $P(F)$ when we want to consider polynomials as elements of a vector space while we use $F[x]$ when we want to consider their properties as polynomials.)

(5) $F$ is itself an $F$-vector space. If $E$ is any field containing $F$ as a subfield (in which case we say $E$ is an extension field of $F$), $E$ is an $F$-vector space. For example, $\mathbb{C}$ is an $\mathbb{R}$-vector space.

(6) If $A$ is a set, $\{\text{functions } f: A \to F\}$ is a vector space. We denote it by $F^A$.

(7) $C^0(\mathbb{R})$, the space of continuous functions $f: \mathbb{R} \to \mathbb{R}$, is a vector space. For any $k > 0$, $C^k(\mathbb{R}) = \{\text{functions } f: \mathbb{R} \to \mathbb{R} \mid f, f', \ldots, f^{(k)} \text{ are all continuous}\}$ is a vector space. Also, $C^\infty(\mathbb{R}) = \{\text{functions } f: \mathbb{R} \to \mathbb{R} \mid f \text{ has continuous derivatives of all orders}\}$ is a vector space. ◊
Not only do we want to consider vector spaces, we want to consider the appropriate sort of functions between them, given by the following definition.

Definition 1.1.8. Let $V$ and $W$ be vector spaces. A function $T: V \to W$ is a linear transformation if for all $v, v_1, v_2 \in V$ and all $c \in F$:

(1) $T(cv) = cT(v)$.
(2) $T(v_1 + v_2) = T(v_1) + T(v_2)$. ◊

Lemma 1.1.9. Let $T: V \to W$ be a linear transformation. Then $T(0) = 0$.
Definition 1.1.10. Let $V$ be a vector space. The identity linear transformation $\mathcal{I}: V \to V$ is the linear transformation defined by $\mathcal{I}(v) = v$ for every $v \in V$. ◊

Here is one of the most important ways of constructing linear transformations.

Example 1.1.11. Let $A$ be an $m$-by-$n$ matrix with entries in $F$, $A \in M_{m,n}(F)$. Then $T_A: F^n \to F^m$ defined by
$$T_A(v) = Av$$
is a linear transformation. ◊

Lemma 1.1.12. (1) Let $A$ and $B$ be $m$-by-$n$ matrices. Then $A = B$ if and only if $T_A = T_B$.
(2) Every linear transformation $T: F^n \to F^m$ is $T_A$ for some unique $m$-by-$n$ matrix $A$.

Proof. (1) Clearly if $A = B$, then $T_A = T_B$. Conversely, suppose $T_A = T_B$. Then $T_A(v) = T_B(v)$ for every $v \in F^n$. In particular, if $v = e_i$, then $T_A(e_i) = T_B(e_i)$, i.e., $Ae_i = Be_i$. But $Ae_i$ is just the $i$th column of $A$, and $Be_i$ is just the $i$th column of $B$. Since this is true for every $i$, $A = B$.

(2) $T = T_A$ for
$$A = \left[ T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n) \right]. \qquad \square$$
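As a concrete illustration of part (2), here is a small computational sketch (our addition, using Python with NumPy; the particular map $T$ below is an arbitrary example, not from the text) that recovers the matrix $A$ of a linear map from its values on the standard basis vectors $e_i$:

```python
import numpy as np

# A linear map T : R^3 -> R^2, given only as a function (our example).
def T(v):
    x, y, z = v
    return np.array([x + 2*y, 3*z - y])

# Lemma 1.1.12(2): T = T_A, where the i-th column of A is T(e_i).
A = np.column_stack([T(e) for e in np.eye(3)])

v = np.array([1.0, -2.0, 5.0])
print(A)                            # [[ 1.  2.  0.], [ 0. -1.  3.]]
assert np.allclose(A @ v, T(v))     # T_A agrees with T on a sample vector
```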
Definition 1.1.13. The $n$-by-$n$ identity matrix $I$ is the matrix defined by the equation $T_I = \mathcal{I}$. ◊

It is easy to check that this gives the usual definition of the identity matrix.

We now use Lemma 1.1.12 to define matrix operations.

Definition 1.1.14. (1) Let $A$ be an $m$-by-$n$ matrix and $c$ be a scalar. Then $D = cA$ is the matrix defined by $T_D = cT_A$.
(2) Let $A$ and $B$ be $m$-by-$n$ matrices. Then $E = A + B$ is the matrix defined by $T_E = T_A + T_B$. ◊

It is easy to check that these give the usual definitions of the scalar multiple $cA$ and the matrix sum $A + B$.
Theorem 1.1.15. Let $U$, $V$, and $W$ be vector spaces. Let $T: U \to V$ and $S: V \to W$ be linear transformations. Then the composition $S \circ T: U \to W$, defined by $(S \circ T)(u) = S(T(u))$, is a linear transformation.

Proof.
$$(S \circ T)(cu) = S(T(cu)) = S(cT(u)) = cS(T(u)) = c(S \circ T)(u)$$
and
$$(S \circ T)(u_1 + u_2) = S(T(u_1 + u_2)) = S(T(u_1) + T(u_2)) = S(T(u_1)) + S(T(u_2)) = (S \circ T)(u_1) + (S \circ T)(u_2). \qquad \square$$

We now use Theorem 1.1.15 to define matrix multiplication.

Definition 1.1.16. Let $A$ be an $m$-by-$n$ matrix and $B$ be an $n$-by-$p$ matrix. Then $D = AB$ is the $m$-by-$p$ matrix defined by $T_D = T_A \circ T_B$. ◊

It is routine to check that this gives the usual definition of matrix multiplication.
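For readers who wish to check this numerically, the following sketch (our addition, in Python with NumPy; the random matrices are arbitrary test data) verifies on an example that $T_{AB} = T_A \circ T_B$, i.e., that $(AB)v = A(Bv)$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3))    # an m-by-n matrix
B = rng.integers(-3, 4, size=(3, 4))    # an n-by-p matrix
v = rng.integers(-3, 4, size=4)         # a vector in F^p

# Definition 1.1.16 says T_{AB} = T_A o T_B; concretely, (AB)v = A(Bv).
assert np.array_equal((A @ B) @ v, A @ (B @ v))
```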
Theorem 1.1.17. Matrix multiplication is associative, i.e., if $A$ is an $m$-by-$n$ matrix, $B$ is an $n$-by-$p$ matrix, and $C$ is a $p$-by-$q$ matrix, then $A(BC) = (AB)C$.

Proof. Let $D = A(BC)$ and $E = (AB)C$. Then $D$ is the unique matrix defined by $T_D = T_A \circ T_{BC} = T_A \circ (T_B \circ T_C)$, while $E$ is the unique matrix defined by $T_E = T_{AB} \circ T_C = (T_A \circ T_B) \circ T_C$. But composition of functions is associative, $T_A \circ (T_B \circ T_C) = (T_A \circ T_B) \circ T_C$, so $D = E$, i.e., $A(BC) = (AB)C$. $\square$

Lemma 1.1.18. Let $T: V \to W$ be a linear transformation. Then $T$ is invertible (as a linear transformation) if and only if $T$ is 1-1 and onto.

Proof. $T$ is invertible as a function if and only if $T$ is 1-1 and onto. It is then easy to check that in this case the function $T^{-1}: W \to V$ is a linear transformation. $\square$

Definition 1.1.19. An invertible linear transformation $T: V \to W$ is called an isomorphism. Two vector spaces $V$ and $W$ are isomorphic if there is an isomorphism $T: V \to W$. ◊
Remark 1.1.20. It is easy to check that being isomorphic is an equivalence relation among vector spaces. ◊

Although the historical development of calculus preceded the historical development of linear algebra, with hindsight we can see that calculus "works" because of the three parts of the following example.

Example 1.1.21. Let $V = C^\infty(\mathbb{R})$, the vector space of real valued infinitely differentiable functions on the real line $\mathbb{R}$.

(1) For a real number $a$, let $E_a: V \to \mathbb{R}$ be evaluation at $a$, i.e., $E_a(f(x)) = f(a)$. Then $E_a$ is a linear transformation. We also have the linear transformation $\widetilde{E}_a: V \to V$, where $\widetilde{E}_a(f(x))$ is the constant function whose value is $f(a)$.

(2) Let $D: V \to V$ be differentiation, i.e., $D(f(x)) = f'(x)$. Then $D$ is a linear transformation.

(3) For a real number $a$, let $I_a: V \to V$ be definite integration starting at $t = a$, i.e., $I_a(f)(x) = \int_a^x f(t)\,dt$. Then $I_a$ is a linear transformation. We also have the linear transformation $E_b \circ I_a$, with $(E_b \circ I_a)(f(x)) = \int_a^b f(x)\,dx$. ◊

Theorem 1.1.22. (1) $D \circ I_a = \mathcal{I}$.
(2) $I_a \circ D = \mathcal{I} - \widetilde{E}_a$.

Proof. This is the Fundamental Theorem of Calculus. $\square$
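The two identities of Theorem 1.1.22 can be checked symbolically for a sample function. The following sketch (our addition, in Python with SymPy; the test function $f$ is an arbitrary choice) verifies both parts:

```python
import sympy as sp

x, t, a = sp.symbols('x t a')
f = sp.exp(x) * sp.sin(x)        # an arbitrary smooth test function

# I_a(f)(x) = integral of f from a to x; D = differentiation.
Ia_f = sp.integrate(f.subs(x, t), (t, a, x))

# Part (1): (D o I_a)(f) = f.
assert sp.simplify(sp.diff(Ia_f, x) - f) == 0

# Part (2): (I_a o D)(f) = f - f(a), i.e., (I - E~_a)(f).
IaD_f = sp.integrate(sp.diff(f, x).subs(x, t), (t, a, x))
assert sp.simplify(IaD_f - (f - f.subs(x, a))) == 0
```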
Example 1.1.23. (1) Let $V = {}_r F^{\infty\infty}$. We define $L: V \to V$ (left shift) and $R: V \to V$ (right shift) by
$$L([a_1, a_2, a_3, \ldots]) = [a_2, a_3, a_4, \ldots],$$
$$R([a_1, a_2, a_3, \ldots]) = [0, a_1, a_2, \ldots].$$
Note that $L$ and $R$ restrict to linear transformations (which we denote by the same letters) from ${}_r F^\infty$ to ${}_r F^\infty$. (We could equally well consider up-shift and down-shift on $F^{\infty\infty}$ or $F^\infty$, but it is traditional to consider left-shift and right-shift.)

(2) Let $E$ be an extension field of $F$. Then for $\alpha \in E$, we have the linear transformation given by multiplication by $\alpha$, i.e., $T(\beta) = \alpha\beta$ for every $\beta \in E$.

(3) Let $A$ and $B$ be sets. We have the vector spaces $F^A = \{f: A \to F\}$ and $F^B = \{g: B \to F\}$. Let $\varphi: A \to B$ be a function. Then
$\varphi^*: F^B \to F^A$ is the linear transformation defined by $\varphi^*(g) = g \circ \varphi$, i.e., $\varphi^*(g): A \to F$ is the function defined by
$$\varphi^*(g)(a) = g(\varphi(a)) \quad \text{for } a \in A.$$
Note that $\varphi^*$ "goes the other way" than $\varphi$. That is, $\varphi$ is covariant, i.e., pushes points forward, while $\varphi^*$ is contravariant, i.e., pulls functions back. Also, the pull-back is given by composition. This is a situation that recurs throughout mathematics. ◊
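As a tiny illustrative sketch of this pullback (our addition; we take $A$ and $B$ to be finite sets so that $F$-valued functions can be represented as Python dictionaries):

```python
# The pullback phi^*(g) = g o phi from Example 1.1.23(3).
def pullback(phi, g):
    """Given phi: A -> B and g: B -> F, return phi^*(g): A -> F."""
    return {a: g[phi[a]] for a in phi}

phi = {'a1': 'b1', 'a2': 'b1', 'a3': 'b2'}   # phi: A -> B
g = {'b1': 2.0, 'b2': -1.0}                  # g in F^B
print(pullback(phi, g))                      # {'a1': 2.0, 'a2': 2.0, 'a3': -1.0}
```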
Here are two of the most important ways in which subspaces arise.

Definition 1.1.24. Let $T: V \to W$ be a linear transformation. Then the kernel of $T$ is
$$\operatorname{Ker}(T) = \{v \in V \mid T(v) = 0\}$$
and the image of $T$ is
$$\operatorname{Im}(T) = \{w \in W \mid w = T(v) \text{ for some } v \in V\}. \quad ◊$$

Lemma 1.1.25. In the situation of Definition 1.1.24, $\operatorname{Ker}(T)$ is a subspace of $V$ and $\operatorname{Im}(T)$ is a subspace of $W$.

Proof. It is easy to check that the conditions in Lemma 1.1.6 are satisfied. $\square$

Remark 1.1.26. If $T = T_A$, $\operatorname{Ker}(T)$ is often called the nullspace of $A$ and $\operatorname{Im}(T)$ is often called the column space of $A$. ◊
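For a concrete matrix one can compute bases for both subspaces directly. The following sketch (our addition, in Python with SymPy; the matrix is an arbitrary example) computes the nullspace and column space of a sample $A$:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6]])      # rank 1: the second row is twice the first

ker_basis = A.nullspace()       # basis of Ker(T_A), the nullspace of A
im_basis = A.columnspace()      # basis of Im(T_A), the column space of A
print(len(ker_basis), len(im_basis))    # 2 1
```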
We introduce one more vector space.

Definition 1.1.27. Let $V$ and $W$ be vector spaces. Then $\operatorname{Hom}_F(V, W)$, the space of $F$-homomorphisms from $V$ to $W$, is
$$\operatorname{Hom}_F(V, W) = \{\text{linear transformations } T: V \to W\}.$$
If $W = V$, we set $\operatorname{End}_F(V) = \operatorname{Hom}_F(V, V)$, the space of $F$-endomorphisms of $V$. ◊

Lemma 1.1.28. For any $F$-vector spaces $V$ and $W$, $\operatorname{Hom}_F(V, W)$ is a vector space.

Proof. It is routine to check that the conditions in Definition 1.1.1 are satisfied. $\square$
We also have the subset, which is definitely not a subspace, of $\operatorname{End}_F(V)$ consisting of invertible linear transformations.

Definition 1.1.29. (1) Let $V$ be a vector space. The general linear group $\operatorname{GL}(V)$ is
$$\operatorname{GL}(V) = \{\text{invertible linear transformations } T: V \to V\}.$$
(2) The general linear group $\operatorname{GL}_n(F)$ is
$$\operatorname{GL}_n(F) = \{\text{invertible } n\text{-by-}n \text{ matrices with entries in } F\}. \quad ◊$$

Theorem 1.1.30. Let $V = F^n$ and $W = F^m$. Then $\operatorname{Hom}_F(V, W)$ is isomorphic to $M_{m,n}(F)$. In particular, $\operatorname{End}_F(V)$ is isomorphic to $M_n(F)$. Also, $\operatorname{GL}(V)$ is isomorphic to $\operatorname{GL}_n(F)$.

Proof. By Lemma 1.1.12, any $T \in \operatorname{Hom}_F(V, W)$ is $T = T_A$ for a unique $A \in M_{m,n}(F)$. Then the linear transformation $T_A \mapsto A$ gives an isomorphism from $\operatorname{Hom}_F(V, W)$ to $M_{m,n}(F)$. This restricts to a group isomorphism from $\operatorname{GL}(V)$ to $\operatorname{GL}_n(F)$. $\square$

Remark 1.1.31. In the next section we define the dimension of a vector space, and in the next chapter we will see that Theorem 1.1.30 remains true when $V$ and $W$ are allowed to be any vector spaces of dimensions $n$ and $m$ respectively. ◊
1.2 Basis and dimension

In this section we develop the very important notion of a basis of a vector space. A basis $B$ of the vector space $V$ has two properties: $B$ is linearly independent and $B$ spans $V$. We begin by developing each of these two notions, which are important in their own right. We shall prove that any two bases of $V$ have the same number of elements, which enables us to define the dimension of $V$ as the number of elements in any basis of $V$.

Definition 1.2.1. Let $B = \{v_i\}$ be a subset of $V$. A vector $v \in V$ is a linear combination of the vectors in $B$ if there is a set of scalars $\{c_i\}$, only finitely many of which are nonzero, such that
$$v = \sum c_i v_i. \quad ◊$$
Remark 1.2.2. If we choose all $c_i = 0$ then we obtain
$$0 = \sum c_i v_i.$$
This is the trivial linear combination of the vectors in $B$. Any other linear combination is nontrivial. ◊

Remark 1.2.3. In case $B = \{\,\}$, the only linear combination we have is the empty linear combination, whose value we consider to be $0 \in V$ and which we consider to be a trivial linear combination. ◊

Definition 1.2.4. Let $B = \{v_i\}$ be a subset of $V$. Then $B$ is linearly independent if the only linear combination of elements of $B$ that is equal to $0$ is the trivial linear combination, i.e., if $0 = \sum c_i v_i$ implies $c_i = 0$ for every $i$. ◊

Definition 1.2.5. Let $B = \{v_i\}$ be a subset of $V$. Then $\operatorname{Span}(B)$ is the subspace of $V$ consisting of all linear combinations of elements of $B$,
$$\operatorname{Span}(B) = \left\{ \sum c_i v_i \mid c_i \in F \right\}.$$
If $\operatorname{Span}(B) = V$ then $B$ is a spanning set for $V$ (or equivalently, $B$ spans $V$). ◊

Remark 1.2.6. Strictly speaking, we should have defined $\operatorname{Span}(B)$ to be a subset of $V$, but it is easy to verify that it is a subspace. ◊

Lemma 1.2.7. Let $B$ be a subset of a vector space $V$. The following are equivalent:

(1) $B$ is linearly independent and spans $V$.
(2) $B$ is a maximal linearly independent subset of $V$.
(3) $B$ is a minimal spanning set for $V$.

Proof (Outline). Suppose $B$ is linearly independent and spans $V$. If $B \subsetneq B'$, choose $v \in B'$, $v \notin B$. Since $B$ spans $V$, $v$ is a linear combination of elements of $B$, and so $B'$ is not linearly independent. Hence $B$ is a maximal linearly independent subset of $V$. If $B' \subsetneq B$, choose $v \in B$, $v \notin B'$. Since $B$ is linearly independent, $v$ is not in the subspace spanned by $B'$, and hence $B$ is a minimal spanning set for $V$.

Suppose that $B$ is a maximal linearly independent subset of $V$. If $B$ does not span $V$, choose any vector $v \in V$ that is not in the subspace
spanned by $B$. Then $B' = B \cup \{v\}$ would be linearly independent, contradicting maximality.

Suppose that $B$ is a minimal spanning set for $V$. If $B$ is not linearly independent, choose $v \in B$ that is a linear combination of the other elements of $B$. Then $B' = B - \{v\}$ would span $V$, contradicting minimality. $\square$
Definition 1.2.8. A subset $B$ of $V$ satisfying the equivalent conditions of Lemma 1.2.7 is a basis of $V$. ◊

Theorem 1.2.9. Let $V$ be a vector space and let $A$ and $C$ be subsets of $V$ with $A \subseteq C$, $A$ linearly independent, and $C$ spanning $V$. Then there is a basis $B$ of $V$ with $A \subseteq B \subseteq C$.

Proof. This proof is an application of Zorn's Lemma. Let
$$Z = \{B' \mid A \subseteq B' \subseteq C,\; B' \text{ linearly independent}\},$$
partially ordered by inclusion. $Z$ is nonempty as $A \in Z$. Any chain (i.e., linearly ordered subset) of $Z$ has an upper bound, namely its union. Then, by Zorn's Lemma, $Z$ has a maximal element $B$. We claim that $B$ is a basis for $V$.

Certainly $B$ is linearly independent, so we need only show that it spans $V$. Suppose not. Then there would be some $v \in C$ not in the span of $B$ (since if every $v \in C$ were in the span of $B$, then $B$ would span $V$, because $C$ spans $V$), and $B^+ = B \cup \{v\}$ would then be a linearly independent subset of $C$ with $B \subsetneq B^+$, contradicting maximality. $\square$

Corollary 1.2.10. (1) Let $A$ be any linearly independent subset of $V$. Then there is a basis $B$ of $V$ with $A \subseteq B$.
(2) Let $C$ be any spanning set for $V$. Then there is a basis $B$ of $V$ with $B \subseteq C$.
(3) Every vector space $V$ has a basis $B$.

Proof. (1) Apply Theorem 1.2.9 with $C = V$.
(2) Apply Theorem 1.2.9 with $A = \{\,\}$.
(3) Apply Theorem 1.2.9 with $A = \{\,\}$ and $C = V$. $\square$

We now show that the dimension of a vector space is well-defined. We first prove the following familiar result from elementary linear algebra, one that is useful and important in its own right.

Lemma 1.2.11. A homogeneous system of $m$ equations in $n$ unknowns with $m < n$ has a nontrivial solution.
Proof (Outline). We proceed by induction on $m$. Let the unknowns be $x_1, \ldots, x_n$. If $m = 0$, set $x_1 = 1$, $x_2 = \cdots = x_n = 0$.

Suppose the theorem is true for $m$ and consider a system of $m + 1$ equations in $n > m + 1$ unknowns. If none of the equations involve $x_1$, the system has the solution $x_1 = 1$, $x_2 = \cdots = x_n = 0$. Otherwise, pick an equation involving $x_1$ (i.e., with the coefficient of $x_1$ nonzero) and subtract appropriate multiples of it from the other equations so that none of them involve $x_1$. Then the other equations in the transformed system are a system of $m$ equations in the $n - 1 > m$ variables $x_2, \ldots, x_n$. By induction it has a nontrivial solution for $x_2, \ldots, x_n$. Then solve the remaining equation for $x_1$. $\square$
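Computationally, finding such a nontrivial solution amounts to computing a nullspace vector. A brief sketch (our addition, in Python with SymPy; the coefficient matrix is an arbitrary example):

```python
import sympy as sp

# 2 homogeneous equations in 4 unknowns (m < n), so Lemma 1.2.11
# guarantees a nontrivial solution of Ax = 0.
A = sp.Matrix([[1, 2, -1, 3],
               [0, 1,  4, 1]])

x = A.nullspace()[0]            # nonempty precisely because m < n here
assert A * x == sp.zeros(2, 1) and x != sp.zeros(4, 1)
print(x.T)
```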
Lemma 1.2.12. Let $B = \{v_1, \ldots, v_m\}$ span $V$. Any subset $C$ of $V$ containing more than $m$ vectors is linearly dependent.

Proof. Let $C = \{w_1, \ldots, w_n\}$ with $n > m$. (If $C$ is infinite consider a finite subset containing $n > m$ elements.) For each $i = 1, \ldots, n$,
$$w_i = \sum_{j=1}^{m} a_{ji} v_j.$$
We show that
$$0 = \sum_{i=1}^{n} c_i w_i$$
has a nontrivial solution (i.e., a solution with not all $c_i = 0$). We have
$$0 = \sum_{i=1}^{n} c_i w_i = \sum_{i=1}^{n} c_i \left( \sum_{j=1}^{m} a_{ji} v_j \right) = \sum_{j=1}^{m} \left( \sum_{i=1}^{n} a_{ji} c_i \right) v_j,$$
and this will be true if
$$0 = \sum_{i=1}^{n} a_{ji} c_i \quad \text{for each } j = 1, \ldots, m.$$
This is a system of $m$ equations in the $n$ unknowns $c_1, \ldots, c_n$ and so has a nontrivial solution by Lemma 1.2.11. $\square$

In the following, we do not distinguish between cardinalities of infinite sets.
Theorem 1.2.13. Let $V$ be a vector space. Then any two bases of $V$ have the same number of elements.

Proof. Let $V$ have bases $B$ and $C$. If both $B$ and $C$ are infinite, we are done. Assume not. Let $B$ have $m$ elements and $C$ have $n$ elements. Since $B$ and $C$ are bases, both $B$ and $C$ span $V$ and both $B$ and $C$ are linearly independent. Applying Lemma 1.2.12 we see that $n \leq m$. Interchanging $B$ and $C$ we see that $m \leq n$. Hence $m = n$. $\square$

Given this theorem we may make the following very important definition.

Definition 1.2.14. Let $V$ be a vector space. The dimension of $V$, $\dim(V)$, is the number of vectors in any basis of $V$, $\dim(V) \in \{0, 1, 2, \ldots\} \cup \{\infty\}$. ◊

Remark 1.2.15. The vector space $V = \{0\}$ has basis $\{\,\}$ and hence dimension $0$. ◊

While we will be considering both finite-dimensional and infinite-dimensional vector spaces, we adopt the convention that when we write "Let $V$ be an $n$-dimensional vector space" or "Let $V$ be a vector space of dimension $n$" we always mean that $V$ is finite-dimensional, so that $n$ is a nonnegative integer.

Theorem 1.2.16. Let $V$ be a vector space of dimension $n$. Let $C$ be a subset of $V$ consisting of $m$ elements.

(1) If $m > n$ then $C$ is not linearly independent (and hence is not a basis of $V$).
(2) If $m < n$ then $C$ does not span $V$ (and hence is not a basis of $V$).
(3) If $m = n$ the following are equivalent:
  (a) $C$ is a basis of $V$.
  (b) $C$ spans $V$.
  (c) $C$ is linearly independent.

Proof. Let $B$ be a basis of $V$, consisting necessarily of $n$ elements.

(1) $B$ spans $V$ so, applying Lemma 1.2.12, if $C$ has $m > n$ elements then $C$ is not linearly independent.

(2) Suppose $C$ spans $V$. Then, applying Lemma 1.2.12, $B$ has $n > m$ elements so cannot be linearly independent, contradicting $B$ being a basis of $V$.
(3) By definition, (a) is equivalent to (b) and (c) together, so (a) implies (b) and (a) implies (c). Suppose (b) is true. By Corollary 1.2.10, $C$ has a subset $C'$ that is a basis of $V$. By Theorem 1.2.13, $C'$ has $n = m$ elements, so $C' = C$. Suppose (c) is true. By Corollary 1.2.10, $C$ has a superset $C'$ that is a basis of $V$. By Theorem 1.2.13, $C'$ has $n = m$ elements, so $C' = C$. $\square$

Remark 1.2.17. A good mathematical theory is one that reduces hard problems to easy problems. Linear algebra is such a theory, as it reduces many problems to counting. Theorem 1.2.16 is a typical example. Suppose we want to know whether a set $C$ is a basis of an $n$-dimensional vector space $V$. We count the number of elements of $C$, say $m$. If we get the "wrong" number, i.e., if $m \neq n$, then we know $C$ is not a basis of $V$. If we get the "right" number, i.e., if $m = n$, then $C$ may or may not be a basis of $V$. While there are normally two conditions to check, that $C$ is linearly independent and that $C$ spans $V$, it suffices to check either one of the conditions. If that one is satisfied, the other one is automatic. ◊
Example 1.2.18. (1) $F^n$ has basis $E_n$, the standard basis, given by $E_n = \{e_{1,n}, e_{2,n}, \ldots, e_{n,n}\}$ where $e_{i,n}$ is the vector in $F^n$ whose $i$th entry is $1$ and all of whose other entries are $0$.

$F^\infty$ has basis $E_\infty = \{e_{1,\infty}, e_{2,\infty}, \ldots\}$ defined analogously. We will often write $E$ for $E_n$ and $e_i$ for $e_{i,n}$ when $n$ is understood. Thus $F^n$ has dimension $n$ and $F^\infty$ is infinite-dimensional.

(2) $F^\infty$ is a proper subspace of $F^{\infty\infty}$. By Corollary 1.2.10, $F^{\infty\infty}$ has a basis, but it is impossible to write one down in a constructive way.

(3) The vector space of polynomials of degree at most $n$ with coefficients in $F$, $P_n(F) = \{a_0 + a_1 x + \cdots + a_n x^n \mid a_i \in F\}$, has basis $\{1, x, \ldots, x^n\}$ and dimension $n + 1$.

(4) The vector space of polynomials of arbitrary degree with coefficients in $F$, $P(F) = \{a_0 + a_1 x + a_2 x^2 + \cdots \mid a_i \in F\}$, has basis $\{1, x, x^2, \ldots\}$ and is infinite-dimensional.

(5) Let $p_i(x)$ be any polynomial of degree $i$. Then $\{p_0(x), p_1(x), \ldots, p_n(x)\}$ is a basis for $P_n(F)$, and $\{p_0(x), p_1(x), p_2(x), \ldots\}$ is a basis for $P(F)$.

(6) $M_{m,n}(F)$ has dimension $mn$, with basis given by the $mn$ distinct matrices each of which has a single entry of $1$ and all other entries $0$.

(7) If $V = \{f: A \to F\}$ for some finite set $A = \{a_1, \ldots, a_n\}$, then $V$ is $n$-dimensional with basis $\{b_1, \ldots, b_n\}$ where $b_i$ is the function defined by $b_i(a_j) = 1$ if $j = i$ and $0$ if $j \neq i$.
(8) Let $E$ be an extension of $F$ and let $\alpha \in E$ be algebraic, i.e., $\alpha$ is a root of a (necessarily unique) monic irreducible polynomial $f(x) \in F[x]$. Let $f(x)$ have degree $n$. Then $F(\alpha)$ defined by $F(\alpha) = \{p(\alpha) \mid p(x) \in F[x]\}$ is a subfield of $E$ with basis $\{1, \alpha, \ldots, \alpha^{n-1}\}$ and so is an extension of $F$ of degree $n$. ◊

Remark 1.2.19. If we consider cardinalities of infinite sets, we see that $F^\infty$ is countably infinite-dimensional. On the other hand, $F^{\infty\infty}$ is uncountably infinite-dimensional. If $F$ is a countable field, this is easy to see: $F^{\infty\infty}$ is uncountable. For $F$ uncountable, we need a more subtle argument. We will give it here, although it presupposes results from Chapter 4. For convenience we consider ${}_r F^{\infty\infty}$ instead, but clearly ${}_r F^{\infty\infty}$ and $F^{\infty\infty}$ are isomorphic.

Consider the left shift $L: {}_r F^{\infty\infty} \to {}_r F^{\infty\infty}$. Observe that for any $a \in F$, $L$ has eigenvalue $a$ with associated eigenvector $v_a = [1, a, a^2, a^3, \ldots]$. But eigenvectors associated to distinct eigenvalues are linearly independent. (See Lemma 4.2.5.) ◊
Corollary 1.2.20. Let $W$ be a subspace of $V$. Then $\dim(W) \leq \dim(V)$. If $\dim(V)$ is finite, then $\dim(W) = \dim(V)$ if and only if $W = V$.

Proof. Apply Theorem 1.2.16 with $C$ a basis of $W$. $\square$

We have the following useful characterization of a basis.

Lemma 1.2.21. Let $V$ be a vector space and let $B = \{v_i\}$ be a set of vectors in $V$. Then $B$ is a basis of $V$ if and only if every $v \in V$ can be written uniquely as $v = \sum c_i v_i$ for $c_i \in F$, all but finitely many zero.

Proof. Suppose $B$ is a basis of $V$. Then $B$ spans $V$, so any $v \in V$ can be written as $v = \sum c_i v_i$. We show this expression for $v$ is unique. Suppose we have $v = \sum c'_i v_i$. Then $0 = \sum (c'_i - c_i) v_i$. But $B$ is linearly independent, so $c'_i - c_i = 0$ and $c'_i = c_i$ for each $i$.

Conversely, suppose every $v \in V$ can be written as $v = \sum c_i v_i$ in a unique way. This clearly implies that $B$ spans $V$. To show $B$ is linearly independent, suppose $0 = \sum c_i v_i$. Certainly $0 = \sum 0 v_i$. By the uniqueness of the expression, $c_i = 0$ for each $i$. $\square$

This lemma will be the basis for our definition of coordinates in the next chapter. It also has immediate applications. First, an illustrative use, and then some general results.
Example 1.2.22. (1) Let $V = P_{n-1}(\mathbb{R})$. For any real number $a$,
$$B = \{1,\; x - a,\; (x - a)^2,\; \ldots,\; (x - a)^{n-1}\}$$
is a basis of $V$, so any polynomial $p(x) \in V$ can be written uniquely as a linear combination of elements of $B$,
$$p(x) = \sum_{i=0}^{n-1} c_i (x - a)^i.$$
Solving for the coefficients $c_i$ we obtain the familiar Taylor expansion
$$p(x) = \sum_{i=0}^{n-1} \frac{p^{(i)}(a)}{i!} (x - a)^i.$$

(2) Let $V = P_{n-1}(\mathbb{R})$. For any set of pairwise distinct real numbers $\{a_1, \ldots, a_n\}$,
$$B = \{(x - a_2)(x - a_3) \cdots (x - a_n),\; (x - a_1)(x - a_3) \cdots (x - a_n),\; \ldots,\; (x - a_1) \cdots (x - a_{n-1})\}$$
is a basis of $V$, so any polynomial $p(x) \in V$ can be written uniquely as a linear combination of elements of $B$,
$$p(x) = \sum_{i=1}^{n} c_i (x - a_1) \cdots (x - a_{i-1})(x - a_{i+1}) \cdots (x - a_n).$$
Solving for the coefficients $c_i$ we obtain the familiar Lagrange interpolation formula
$$p(x) = \sum_{i=1}^{n} \frac{p(a_i)}{(a_i - a_1) \cdots (a_i - a_{i-1})(a_i - a_{i+1}) \cdots (a_i - a_n)} \, (x - a_1) \cdots (x - a_{i-1})(x - a_{i+1}) \cdots (x - a_n). \quad ◊$$
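The Lagrange formula in part (2) translates directly into a short computation. The following sketch (our addition, in Python with SymPy; the sample data points are an invented example) builds the interpolating polynomial from the basis $B$ and checks it:

```python
import sympy as sp

x = sp.symbols('x')
pts = [(0, 1), (1, 3), (2, 2)]   # sample data (a_i, p(a_i)), n = 3

# Build p from the basis polynomials (x-a_1)...(x-a_{i-1})(x-a_{i+1})...(x-a_n).
p = 0
for i, (ai, bi) in enumerate(pts):
    others = [a for j, (a, _) in enumerate(pts) if j != i]
    num = sp.prod([x - a for a in others])
    den = sp.prod([ai - a for a in others])
    p += bi * num / den

p = sp.expand(p)
assert all(p.subs(x, a) == b for a, b in pts)   # p interpolates the data
print(p)
```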
So far in this section we have considered individual vector spaces. Now we consider pairs of vector spaces $V$ and $W$ and linear transformations between them.

Lemma 1.2.23. (1) A linear transformation $T: V \to W$ is specified by its values on any basis of $V$.
(2) If $\{v_i\}$ is a basis of $V$ and $\{w_i\}$ is an arbitrary set of vectors in $W$, then there is a unique linear transformation $T: V \to W$ with $T(v_i) = w_i$ for each $i$.
Proof. (1) Let $B = \{v_1, v_2, \ldots\}$ be a basis of $V$ and suppose that $T: V \to W$ and $T': V \to W$ are two linear transformations that agree on each $v_i$. Let $v \in V$ be arbitrary. We may write $v = \sum c_i v_i$, and then
$$T(v) = T\left(\sum c_i v_i\right) = \sum c_i T(v_i) = \sum c_i T'(v_i) = T'\left(\sum c_i v_i\right) = T'(v).$$

(2) Let $\{w_1, w_2, \ldots\}$ be an arbitrary set of vectors in $W$, and define $T$ as follows: for any $v \in V$, write $v = \sum c_i v_i$ and let
$$T(v) = \sum c_i w_i.$$
Since the expression for $v$ is unique, this gives a well-defined function $T: V \to W$ with $T(v_i) = w_i$ for each $i$. It is routine to check that $T$ is a linear transformation. Then $T$ is unique by part (1). $\square$

Lemma 1.2.24. Let $T: V \to W$ be a linear transformation and let $B = \{v_1, v_2, \ldots\}$ be a basis of $V$. Let $C = \{w_1, w_2, \ldots\} = \{T(v_1), T(v_2), \ldots\}$. Then $T$ is an isomorphism if and only if $C$ is a basis of $W$.

Proof. First suppose $T$ is an isomorphism.

To show $C$ spans $W$, let $w \in W$ be arbitrary. Since $T$ is an epimorphism, $w = T(v)$ for some $v$. As $B$ is a basis of $V$, it spans $V$, so we may write $v = \sum c_i v_i$ for some $\{c_i\}$. Then
$$w = T(v) = T\left(\sum c_i v_i\right) = \sum c_i T(v_i) = \sum c_i w_i.$$

To show $C$ is linearly independent, suppose $\sum c_i w_i = 0$. Then
$$0 = \sum c_i w_i = \sum c_i T(v_i) = T\left(\sum c_i v_i\right) = T(v), \quad \text{where } v = \sum c_i v_i.$$
Since $T$ is a monomorphism, we must have $v = 0$. Thus $0 = \sum c_i v_i$. As $B$ is a basis of $V$, it is linearly independent, so $c_i = 0$ for all $i$.

Conversely, suppose $C$ is a basis of $W$. By Lemma 1.2.23(2), we may define a linear transformation $S: W \to V$ by $S(w_i) = v_i$. Then $ST(v_i) = v_i$ for each $i$ so, by Lemma 1.2.23(1), $ST$ is the identity on $V$. Similarly $TS$ is the identity on $W$, so $S$ and $T$ are inverse isomorphisms. $\square$
1.3 Dimension counting and applications

We have mentioned in Remark 1.2.17 that linear algebra enables us to reduce many problems to counting. We gave examples of this in counting elements of sets of vectors in the last section. We begin this section by deriving a basic dimension-counting theorem for linear transformations, Theorem 1.3.1. The usefulness of this result cannot be overemphasized. We present one of its important applications in Corollary 1.3.2, and we give a typical example of its use in Example 1.3.10. It is used throughout linear algebra.

Here is the basic result about dimension counting.
Theorem 1.3.1. Let $V$ be a finite-dimensional vector space and let $T: V \to W$ be a linear transformation. Then
$$\dim \operatorname{Ker}(T) + \dim \operatorname{Im}(T) = \dim(V).$$

Proof. Let $k = \dim(\operatorname{Ker}(T))$ and $n = \dim(V)$. Let $\{v_1, \ldots, v_k\}$ be a basis of $\operatorname{Ker}(T)$. By Corollary 1.2.10, $\{v_1, \ldots, v_k\}$ extends to a basis $\{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ of $V$. We claim that $B = \{T(v_{k+1}), \ldots, T(v_n)\}$ is a basis of $\operatorname{Im}(T)$.

First let us see that $B$ spans $\operatorname{Im}(T)$. If $w \in \operatorname{Im}(T)$, then $w = T(v)$ for some $v \in V$. Let $v = \sum c_i v_i$. Then
$$T(v) = \sum c_i T(v_i) = \sum_{i=1}^{k} c_i T(v_i) + \sum_{i=k+1}^{n} c_i T(v_i) = \sum_{i=k+1}^{n} c_i T(v_i)$$
as $T(v_1) = \cdots = T(v_k) = 0$ since $v_1, \ldots, v_k \in \operatorname{Ker}(T)$.

Second, let us see that $B$ is linearly independent. Suppose that
$$\sum_{i=k+1}^{n} c_i T(v_i) = 0.$$
Then
$$T\left( \sum_{i=k+1}^{n} c_i v_i \right) = 0,$$
so
$$\sum_{i=k+1}^{n} c_i v_i \in \operatorname{Ker}(T),$$
and hence for some $c_1, \ldots, c_k$, we have
$$\sum_{i=k+1}^{n} c_i v_i = \sum_{i=1}^{k} c_i v_i.$$
Then
$$\sum_{i=1}^{k} (-c_i) v_i + \sum_{i=k+1}^{n} c_i v_i = 0,$$
so by the linear independence of $\{v_1, \ldots, v_n\}$, $c_i = 0$ for each $i$.

Thus $\dim(\operatorname{Im}(T)) = n - k$ and indeed $k + (n - k) = n$. $\square$
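Theorem 1.3.1 can be spot-checked for any concrete matrix, since $\dim \operatorname{Im}(T_A)$ is the rank of $A$ and $\dim \operatorname{Ker}(T_A)$ is the dimension of its nullspace. A sketch (our addition, in Python with SymPy; the matrix, whose third row is the sum of the first two, is an arbitrary example):

```python
import sympy as sp

A = sp.Matrix([[1, 0, 2, -1],
               [2, 1, 0,  3],
               [3, 1, 2,  2]])   # T_A : F^4 -> F^3; row 3 = row 1 + row 2

dim_ker = len(A.nullspace())     # dim Ker(T_A)
dim_im = A.rank()                # dim Im(T_A), the rank of A
assert dim_ker + dim_im == A.cols    # Theorem 1.3.1: here 2 + 2 = 4
```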
Corollary 1.3.2. Let $T: V \to W$ be a linear transformation between vector spaces of the same finite dimension $n$. The following are equivalent:

(1) $T$ is an isomorphism.
(2) $T$ is an epimorphism.
(3) $T$ is a monomorphism.

Proof. Clearly (1) implies (2) and (3).

Suppose (2) is true. Then $\operatorname{Im}(T) = W$, so by Theorem 1.3.1,
$$\dim \operatorname{Ker}(T) = \dim(V) - \dim \operatorname{Im}(T) = \dim(V) - \dim(W) = n - n = 0,$$
so $\operatorname{Ker}(T) = \{0\}$ and $T$ is a monomorphism, yielding (3) and hence (1).

Suppose (3) is true. Then $\operatorname{Ker}(T) = \{0\}$, so by Theorem 1.3.1,
$$\dim \operatorname{Im}(T) = \dim(V) - \dim \operatorname{Ker}(T) = n - 0 = n = \dim(W),$$
so $\operatorname{Im}(T) = W$ and $T$ is an epimorphism, yielding (2) and hence (1). $\square$
Corollary 1.3.3. Let $A$ be an $n$-by-$n$ matrix. The following are equivalent:

(1) $A$ is invertible.
(2) There is an $n$-by-$n$ matrix $B$ with $AB = I$.
(3) There is an $n$-by-$n$ matrix $B$ with $BA = I$.

In this situation, $B = A^{-1}$.

Proof. Apply Corollary 1.3.2 to the linear transformation $T_A$. If $A$ is invertible and $AB = I$, then $B = IB = (A^{-1}A)B = A^{-1}(AB) = A^{-1}I = A^{-1}$, and similarly if $BA = I$. $\square$
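The content of Corollary 1.3.3, that a one-sided inverse of a square matrix is automatically two-sided, is easy to witness numerically. A sketch (our addition, in Python with NumPy; the random matrix is an arbitrary example, invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))      # generically invertible

B = np.linalg.solve(A, np.eye(4))    # the matrix B with AB = I
assert np.allclose(A @ B, np.eye(4))
assert np.allclose(B @ A, np.eye(4)) # the one-sided inverse is two-sided
```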
Example 1.3.4. Corollary 1.3.2 is false in the infinite-dimensional case:

(1) Let $V = {}_r F^{\infty\infty}$ and consider left shift $L$ and right shift $R$. $L$ is an epimorphism but not a monomorphism, while $R$ is a monomorphism but not an epimorphism. We see that $L \circ R = \mathcal{I}$ (so $R$ is a right inverse for $L$ and $L$ is a left inverse for $R$) but $R \circ L \neq \mathcal{I}$ (and neither $L$ nor $R$ is invertible).

(2) Let $V = C^\infty(\mathbb{R})$. Then $D: V \to V$ and $I_a: V \to V$ are linear transformations that are not invertible, but $D \circ I_a$ is the identity. ◊
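Part (1) can be modeled directly, representing a sequence $(a_1, a_2, \ldots)$ as a function of its index. A sketch (our addition, in Python):

```python
# Model a sequence (a_1, a_2, ...) in rF^(oo,oo) as a function of its index.
def L(s):               # left shift: (a_1, a_2, ...) -> (a_2, a_3, ...)
    return lambda i: s(i + 1)

def R(s):               # right shift: (a_1, a_2, ...) -> (0, a_1, a_2, ...)
    return lambda i: 0 if i == 1 else s(i - 1)

s = lambda i: 10 * i    # the sequence 10, 20, 30, ...
print([L(R(s))(i) for i in range(1, 5)])   # [10, 20, 30, 40]: L o R = I on s
print([R(L(s))(i) for i in range(1, 5)])   # [0, 20, 30, 40]:  R o L loses a_1
```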
Remark 1.3.5. We are not in general considering cardinalities of infinite sets. But we remark that two vector spaces $V$ and $W$ are isomorphic if and only if they have bases of the same cardinality, as we see from Lemma 1.2.23 and Lemma 1.2.24. ◊
Corollary 1.3.6. Let $V$ be a vector space of dimension $m$ and let $W$ be a vector space of dimension $n$.

(1) If $m < n$ then no linear transformation $T: V \to W$ can be an epimorphism.
(2) If $m > n$ then no linear transformation $T: V \to W$ can be a monomorphism.
(3) $V$ and $W$ are isomorphic if and only if $m = n$. In particular, every $n$-dimensional vector space $V$ is isomorphic to $F^n$.

Proof. (1) In this case, $\dim(\operatorname{Im}(T)) \leq m < n$ so $T$ is not an epimorphism.

(2) In this case, $\dim(\operatorname{Ker}(T)) \geq m - n > 0$ so $T$ is not a monomorphism.

(3) Parts (1) and (2) show that if $m \neq n$, then $V$ and $W$ are not isomorphic. If $m = n$, choose a basis $\{v_1, \ldots, v_m\}$ of $V$ and a basis $\{w_1, \ldots, w_m\}$ of $W$. By Lemma 1.2.23, there is a unique linear transformation $T$ determined by $T(v_i) = w_i$ for each $i$, and by Lemma 1.2.24 $T$ is an isomorphism. $\square$
Corollary 1.3.7. Let $A$ be an $n$-by-$n$ matrix. The following are equivalent:

(1) $A$ is invertible.
(1′) The equation $Ax = b$ has a unique solution for every $b \in F^n$.
(2) The equation $Ax = b$ has a solution for every $b \in F^n$.
(3) The equation $Ax = 0$ has only the trivial solution $x = 0$.

Proof. This is simply a translation of Corollary 1.3.2 into matrix language.

We emphasize that this one-sentence proof is the "right" proof of the equivalence of these properties. For the reader who would like to see a more computational proof, we shall prove directly that (1) and (1′) are equivalent. Before doing so we also observe that their equivalence does not involve dimension counting. It is their equivalence with properties (2) and (3) that does. It is possible to prove this equivalence without using dimension counting, and this is often done in elementary texts, but that is most certainly the "wrong" proof as it is a manipulative proof that obscures the ideas.

(1) $\Rightarrow$ (1′): Suppose $A$ is invertible. Let $x_0 = A^{-1}b$. Then $Ax_0 = A(A^{-1}b) = b$, so $x_0$ is a solution of $Ax = b$. If $x_1$ is any other solution, then $Ax_1 = b$, so $A^{-1}(Ax_1) = A^{-1}b$, i.e., $x_1 = A^{-1}b = x_0$, so $x_0$ is the unique solution.

(1′) $\Rightarrow$ (1): Let $b_i$ be a solution of $Ax = e_i$ for $i = 1, \ldots, n$, which exists by hypothesis. Let $B = [b_1 \mid b_2 \mid \cdots \mid b_n]$. Then $AB = [e_1 \mid e_2 \mid \cdots \mid e_n] = I$. We show that $BA = I$ as well. (That comes from Corollary 1.3.3, but we are trying to prove it without using Theorem 1.3.1.) Let $f_i = Ae_i$, $i = 1, \ldots, n$. Then $Ax = f_i$ evidently has the solution $x_0 = e_i$. It also has the solution $x_1 = BAe_i$, as
$$A(BAe_i) = (AB)(Ae_i) = I(Ae_i) = Ae_i = f_i.$$
By hypothesis, $Ax = f_i$ has a unique solution, so $BAe_i = e_i$ for each $i$, giving $BA = [e_1 \mid e_2 \mid \cdots \mid e_n] = I$. $\square$
As another application of Theorem 1.3.1, we prove the following familiar theorem from elementary linear algebra.

Theorem 1.3.8. Let $A$ be an $m$-by-$n$ matrix. Then the row rank of $A$ and the column rank of $A$ are equal.

Proof. For a matrix $C$, the image of the linear transformation $T_C$ is simply the column space of $C$.
Let $B$ be a matrix in (reduced) row echelon form. The nonzero rows of $B$ are a basis for the row space of $B$. Each of these rows has a "leading" entry of $1$, and it is easy to check that the columns of $B$ containing those leading $1$s are a basis for the column space of $B$. Thus if $B$ is in (reduced) row echelon form, its row rank and column rank are equal.

Thus if $B$ has column rank $k$, then $\dim(\operatorname{Im}(T_B)) = k$ and hence by Theorem 1.3.1 $\dim(\operatorname{Ker}(T_B)) = n - k$.

Our original matrix $A$ is row-equivalent to a (unique) matrix $B$ in (reduced) row echelon form, so $A$ and $B$ may be obtained from each other by a sequence of row operations. Row operations do not change the row space of a matrix, so if $B$ has row rank $k$, then $A$ has row rank $k$ as well. Row operations change the column space of $A$, so we cannot use the column space directly. However, they do not change $\operatorname{Ker}(T_A)$. (That is why we usually do them, to solve $Ax = 0$.) Thus $\operatorname{Ker}(T_B) = \operatorname{Ker}(T_A)$ and so $\dim(\operatorname{Ker}(T_A)) = n - k$. Then by Theorem 1.3.1 again, $\dim(\operatorname{Im}(T_A)) = k$, i.e., $A$ has column rank $k$, the same as its row rank, and we are done. $\square$
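For any particular matrix the equality of row rank and column rank is easy to confirm. A sketch (our addition, in Python with SymPy; the matrix, whose third row is the sum of the first two, is an arbitrary example):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0, 3],
               [0, 1, 1, 1],
               [1, 3, 1, 4]])   # row 3 = row 1 + row 2

col_rank = len(A.columnspace())     # dimension of the column space of A
row_rank = len(A.T.columnspace())   # dimension of the row space of A
assert row_rank == col_rank          # Theorem 1.3.8: both equal 2 here
```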
Remark 1.3.9. This proof is a correct proof, but it is the "wrong" proof, as it shows the equality without showing why it is true. We will see the "right" proof in Theorem 2.4.7 below. That proof is considerably more complicated, so we have presented this easy proof here. ◊
Example 1.3.10. Let $V = P_{n-1}(\mathbb{R})$ for fixed $n$. Let $a_1, \ldots, a_k$ be distinct real numbers and let $e_1, \ldots, e_k$ be non-negative integers with $(e_1 + 1) + \cdots + (e_k + 1) = n$. Define $T: V \to \mathbb{R}^n$ by
$$T(f(x)) = \begin{bmatrix} f(a_1) \\ \vdots \\ f^{(e_1)}(a_1) \\ \vdots \\ f(a_k) \\ \vdots \\ f^{(e_k)}(a_k) \end{bmatrix}.$$
If $f(x) \in \operatorname{Ker}(T)$, then $f^{(j)}(a_i) = 0$ for $j = 0, \ldots, e_i$, so $f(x)$ is divisible by $(x - a_i)^{e_i + 1}$ for each $i$. Thus $f(x)$ is divisible by $(x - a_1)^{e_1 + 1} \cdots (x - a_k)^{e_k + 1}$, a polynomial of degree $n$. Since $f(x)$ has degree at most $n - 1$, we conclude $f(x)$ is the $0$ polynomial. Thus $\operatorname{Ker}(T) = \{0\}$. Since $\dim V = n$, we conclude from Corollary 1.3.2 that $T$ is an isomorphism. Thus for any
$n$ real numbers $b_1^0, \ldots, b_1^{e_1}, \ldots, b_k^0, \ldots, b_k^{e_k}$ there is a unique polynomial $f(x)$ of degree at most $n - 1$ with $f^{(j)}(a_i) = b_i^j$ for $j = 0, \ldots, e_i$ and for $i = 1, \ldots, k$. (This example generalizes Example 1.2.22(1), where $k = 1$, and Example 1.2.22(2), where $e_i = 0$ for each $i$.) ◊
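This unique "Hermite interpolation" polynomial can be found by solving the corresponding linear system for the coefficients. A sketch (our addition, in Python with SymPy; the prescribed values are an arbitrary example with $k = 2$, $e_1 = 1$, $e_2 = 0$, so $n = 3$):

```python
import sympy as sp

x = sp.symbols('x')
# Prescribe f(0) = 5, f'(0) = -1, f(1) = 2: here a_1 = 0 with e_1 = 1
# and a_2 = 1 with e_2 = 0, so n = (e_1 + 1) + (e_2 + 1) = 3, deg f <= 2.
data = [(0, [5, -1]), (1, [2])]     # (a_i, [b_i^0, ..., b_i^{e_i}])
n = sum(len(bs) for _, bs in data)

c = sp.symbols(f'c0:{n}')
f = sum(ci * x**i for i, ci in enumerate(c))

eqs = []
for a, bs in data:
    for j, b in enumerate(bs):
        eqs.append(sp.Eq(sp.diff(f, x, j).subs(x, a), b))

sol = sp.solve(eqs, c)              # unique solution, by Example 1.3.10
print(sp.expand(f.subs(sol)))       # -2*x**2 - x + 5
```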
Let us now see that the numerical relation in Theorem 1.3.1 is the only restriction on the kernel and image of a linear transformation.

Theorem 1.3.11. Let $V$ and $W$ be vector spaces with $\dim V = n$. Let $V_1$ be a $k$-dimensional subspace of $V$ and let $W_1$ be an $(n - k)$-dimensional subspace of $W$. Then there is a linear transformation $T: V \to W$ with $\operatorname{Ker}(T) = V_1$ and $\operatorname{Im}(T) = W_1$.

Proof. Let $B_1 = \{v_1, \ldots, v_k\}$ be a basis of $V_1$ and extend $B_1$ to $B = \{v_1, \ldots, v_n\}$, a basis of $V$. Let $C_1 = \{w_{k+1}, \ldots, w_n\}$ be a basis of $W_1$. Define $T: V \to W$ by $T(v_i) = 0$ for $i = 1, \ldots, k$ and $T(v_i) = w_i$ for $i = k+1, \ldots, n$. $\square$
Remark 1.3.12. In this section we have stressed the importance and utility of counting arguments. Here is a further application:

A philosopher, an engineer, a physicist, and a mathematician are sitting at a sidewalk cafe having coffee. On the opposite side of the street there is an empty building. They see two people go into the building. A while later they see three come out.

The philosopher concludes, "There must have been someone in the building to start with."

The engineer concludes, "We must have miscounted."

The physicist concludes, "There must be a rear entrance."

The mathematician concludes, "If another person goes in, the building will be empty." ◊
1.4 Subspaces and direct sum decompositions

We now generalize the notions of spanning sets, linearly independent sets, and bases. We introduce the notions of $V$ being a sum of subspaces $W_1, \ldots, W_k$, of the subspaces $W_1, \ldots, W_k$ being independent, and of $V$ being the direct sum of the subspaces $W_1, \ldots, W_k$. In the special case where each $W_i$ consists of the multiples of a single nonzero vector $v_i$, let $B = \{v_1, \ldots, v_k\}$. Then $V$ is the sum of $W_1, \ldots, W_k$ if and only if $B$ spans
$V$; the subspaces $W_1, \ldots, W_k$ are independent if and only if $B$ is linearly independent; and $V$ is the direct sum of $W_1, \ldots, W_k$ if and only if $B$ is a basis of $V$. Thus our work here generalizes part of our work in Section 1.2, but this generalization will be essential for future developments. In most cases we omit the proofs as they are very similar to the ones we have given.

Definition 1.4.1. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. Then $V$ is the sum $V = W_1 + \cdots + W_k$ if every $v \in V$ can be written as $v = w_1 + \cdots + w_k$ where $w_i \in W_i$. ◊

Definition 1.4.2. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. This set of subspaces is independent if $0 = w_1 + \cdots + w_k$ with $w_i \in W_i$ implies $w_i = 0$ for each $i$. ◊

Definition 1.4.3. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. Then $V$ is the direct sum $V = W_1 \oplus \cdots \oplus W_k$ if

(1) $V = W_1 + \cdots + W_k$, and
(2) $\{W_1, \ldots, W_k\}$ is independent. ◊
We have the following equivalent criterion.

Lemma 1.4.4. Let $\{W_1, \dots, W_k\}$ be a set of subspaces of $V$. This set of subspaces is independent if and only if $W_i \cap (W_1 + \cdots + W_{i-1} + W_{i+1} + \cdots + W_k) = \{0\}$ for each $i$.

If we only have two subspaces $\{W_1, W_2\}$ this condition simply states that $W_1 \cap W_2 = \{0\}$. If we have more than two subspaces, it is stronger than the condition $W_i \cap W_j = \{0\}$ for $i \neq j$, and it is the stronger condition we need for independence, not the weaker one.
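To see concretely that the pairwise condition is weaker, consider the standard example (ours, not from the text) of three distinct lines through the origin in $F^2$:
$$W_1 = \{c e_1\}, \qquad W_2 = \{c e_2\}, \qquad W_3 = \{c(e_1 + e_2)\}.$$
Here $W_i \cap W_j = \{0\}$ for all $i \neq j$, yet
$$e_1 + e_2 + \big({-(e_1 + e_2)}\big) = 0$$
is a nontrivial relation with one term from each subspace, so $\{W_1, W_2, W_3\}$ is not independent.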
Lemma 1.4.5. Let $V$ be a vector space and let $\{W_1, \dots, W_k\}$ be a set of subspaces of $V$. Then $V$ is the direct sum $V = W_1 \oplus \cdots \oplus W_k$ if and only if every $v \in V$ can be written as $v = w_1 + \cdots + w_k$ with $w_i \in W_i$, for each $i$, in a unique way.

Lemma 1.4.6. Let $V$ be a vector space and let $\{W_1, \dots, W_k\}$ be a set of subspaces of $V$. Let $\mathcal{B}_i$ be a basis of $W_i$, for each $i$, and let $\mathcal{B} = \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_k$. Then
(1) $\mathcal{B}$ spans $V$ if and only if $V = W_1 + \cdots + W_k$.
(2) $\mathcal{B}$ is linearly independent if and only if $\{W_1, \dots, W_k\}$ is independent.
(3) $\mathcal{B}$ is a basis for $V$ if and only if $V = W_1 \oplus \cdots \oplus W_k$.
Corollary 1.4.7. Let $V$ be a finite-dimensional vector space and let $\{W_1, \dots, W_k\}$ be a set of subspaces with $V = W_1 \oplus \cdots \oplus W_k$. Then $\dim(V) = \dim(W_1) + \cdots + \dim(W_k)$.

Corollary 1.4.8. Let $V$ be a vector space of dimension $n$ and let $\{W_1, \dots, W_k\}$ be a set of subspaces. Let $n_i = \dim(W_i)$.
(1) If $n_1 + \cdots + n_k > n$ then $\{W_1, \dots, W_k\}$ is not independent.
(2) If $n_1 + \cdots + n_k < n$ then $V \neq W_1 + \cdots + W_k$.
(3) If $n_1 + \cdots + n_k = n$ the following are equivalent:
  (a) $V = W_1 \oplus \cdots \oplus W_k$.
  (b) $V = W_1 + \cdots + W_k$.
  (c) $\{W_1, \dots, W_k\}$ is independent.
Definition 1.4.9. Let $V$ be a vector space and let $W_1$ be a subspace of $V$. Then $W_2$ is a complement of $W_1$ if $V = W_1 \oplus W_2$. $\Diamond$

Lemma 1.4.10. Let $V$ be a vector space and let $W_1$ be a subspace of $V$. Then $W_1$ has a complement $W_2$.

Proof. Let $\mathcal{B}_1$ be a basis of $W_1$. Then $\mathcal{B}_1$ is linearly independent, so by Corollary 1.2.10 there is a basis $\mathcal{B}$ of $V$ containing $\mathcal{B}_1$. Let $\mathcal{B}_2 = \mathcal{B} - \mathcal{B}_1$. Then $\mathcal{B}_2$ is a subset of $\mathcal{B}$, so is linearly independent. Let $W_2$ be the span of $\mathcal{B}_2$. Then $\mathcal{B}_2$ is a linearly independent spanning set for $W_2$, i.e., a basis for $W_2$, and so by Lemma 1.4.6 $V = W_1 \oplus W_2$; hence $W_2$ is a complement of $W_1$.

Remark 1.4.11. Except when $W_1 = \{0\}$ (where $W_2 = V$) or $W_1 = V$ (where $W_2 = \{0\}$), the subspace $W_2$ is never unique. We can always choose a different way of extending $\mathcal{B}_1$ to a basis of $V$, in order to obtain a different $W_2$. Thus $W_2$ is a, not the, complement of $W_1$. $\Diamond$
1.5 Affine subspaces and quotient spaces

For the reader familiar with these notions, we can summarize much of what we are about to do in this section in a paragraph: Let $W$ be a subspace of $V$. Then $W$ is a subgroup of $V$, regarded as an additive group. An affine subspace of $V$ parallel to $W$ is simply a coset of $W$ in $V$, and the quotient space $V/W$ is simply the group quotient $V/W$, which also has a vector space structure.

But we will not presume this familiarity, and instead proceed “from scratch”.
We begin with a generalization of the notion of a subspace of a vector
space.
Definition 1.5.1. Let $V$ be a vector space. A subset $X$ of $V$ is an affine subspace if for some element $x_0$ of $X$,
$$U = \{x' - x_0 \mid x' \in X\}$$
is a subspace of $V$. In this situation $X$ is parallel to $U$. $\Diamond$

The definition makes the element $x_0$ of $X$ look distinguished, but that is not the case.

Lemma 1.5.2. Let $X$ be an affine subspace of $V$ parallel to the subspace $U$. Then for any element $x$ of $X$,
$$U = \{x' - x \mid x' \in X\}.$$

Remark 1.5.3. An affine subspace $X$ of $V$ is a subspace of $V$ if and only if $0 \in X$. $\Diamond$

An alternative way of looking at affine subspaces is given by the following result.

Proposition 1.5.4. A subset $X$ of $V$ is an affine subspace of $V$ parallel to the subspace $U$ of $V$ if and only if for some, and hence for every, element $x$ of $X$,
$$X = x + U = \{x + u \mid u \in U\}.$$

There is a natural definition of the dimension of an affine subspace.

Definition 1.5.5. Let $X$ be an affine subspace of $V$ parallel to the subspace $U$. Then the dimension of $X$ is $\dim(X) = \dim(U)$. $\Diamond$

Proposition 1.5.6. Let $X$ be an affine subspace of $V$ parallel to the subspace $U$ of $V$. Let $x_0$ be an element of $X$ and let $\{u_1, u_2, \dots\}$ be a basis of $U$. Then any element $x$ of $X$ may be written uniquely as
$$x = x_0 + \sum c_i u_i$$
for some scalars $\{c_1, c_2, \dots\}$.
The most important way in which affine subspaces arise is as follows.
Theorem 1.5.7. Let $T \colon V \to W$ be a linear transformation and let $w_0 \in W$ be an arbitrary element of $W$. If $T^{-1}(w_0)$ is nonempty, then $T^{-1}(w_0)$ is an affine subspace of $V$ parallel to $\operatorname{Ker}(T)$.

Proof. Choose $v_0 \in V$ with $T(v_0) = w_0$. If $v \in T^{-1}(w_0)$ is arbitrary, then $v = v_0 + (v - v_0) = v_0 + u$ and $T(u) = T(v - v_0) = T(v) - T(v_0) = w_0 - w_0 = 0$, so $u \in \operatorname{Ker}(T)$. Conversely, if $u \in \operatorname{Ker}(T)$ and $v = v_0 + u$, then $T(v) = T(v_0 + u) = T(v_0) + T(u) = w_0 + 0 = w_0$. Thus we see that
$$T^{-1}(w_0) = v_0 + \operatorname{Ker}(T),$$
and the theorem then follows from Proposition 1.5.4.
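The familiar special case is the solution set of a linear system $Ax = b$: it is a particular solution plus the kernel. Here is a minimal numerical sketch (in Python with NumPy; the matrix and tolerances are ours, for illustration), using least squares for a particular solution and the SVD for a basis of the kernel.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])       # rank 1, so Ker(A) is 2-dimensional
b = np.array([6.0, 12.0])             # consistent: b = A @ (1, 1, 1)

x0, *_ = np.linalg.lstsq(A, b, rcond=None)   # one particular solution v_0

_, s, Vt = np.linalg.svd(A)
rank = int((s > 1e-12).sum())
null_basis = Vt[rank:]                 # rows span Ker(A)

# Every v_0 + (combination of kernel vectors) solves A v = b:
v = x0 + 2.0 * null_basis[0] - 1.0 * null_basis[1]
print(np.allclose(A @ v, b))           # True
```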
Remark 1.5.8. The condition in Definition 1.5.1 is stronger than the condition that $U = \{x_2 - x_1 \mid x_1, x_2 \in X\}$. (We must fix $x_1$ and let $x_2$ vary, or vice versa, but we cannot let both vary.) For example, if $V$ is any vector space and $X = V - \{0\}$, then $V = \{x_2 - x_1 \mid x_1, x_2 \in X\}$, but $X$ is never an affine subspace of $V$, except in the case that $V$ is a $1$-dimensional vector space over the field with $2$ elements. $\Diamond$
Let $V$ be a vector space and $W$ a subspace. We now define the important notion of the quotient vector space $V/W$, and investigate some of its properties.

Definition 1.5.9. Let $V$ be a vector space and let $W$ be a subspace of $V$. Let $\sim$ be the equivalence relation on $V$ given by $v_1 \sim v_2$ if $v_1 - v_2 \in W$. Denote the equivalence class of $v \in V$ under this relation by $[v]$. Then the quotient $V/W$ is the vector space
$$V/W = \{\text{equivalence classes } [v] \mid v \in V\}$$
with addition given by $[v_1] + [v_2] = [v_1 + v_2]$ and scalar multiplication given by $c[v] = [cv]$. $\Diamond$

Remark 1.5.10. We leave it to the reader to check that these operations give $V/W$ the structure of a vector space. $\Diamond$

Here is an alternative definition of $V/W$.

Lemma 1.5.11. The quotient space $V/W$ of Definition 1.5.9 is given by
$$V/W = \{\text{affine subspaces of } V \text{ parallel to } W\}.$$
Proof. As in Proposition 1.5.4, we can check that for $v_0 \in V$, the equivalence class $[v_0]$ of $v_0$ is given by
$$[v_0] = \{v \in V \mid v \sim v_0\} = \{v \in V \mid v - v_0 \in W\} = v_0 + W,$$
which is an affine subspace parallel to $W$, and every affine subspace arises in this way from a unique equivalence class.

There is a natural linear transformation from $V$ to $V/W$.

Definition 1.5.12. Let $W$ be a subspace of $V$. The canonical projection $\pi \colon V \to V/W$ is the linear transformation given by $\pi(v) = [v] = v + W$. $\Diamond$
We have the following important construction and results. They improve on the purely numerical information provided by Theorem 1.3.1.

Theorem 1.5.13. Let $T \colon V \to X$ be a linear transformation. Then $\overline{T} \colon V/\operatorname{Ker}(T) \to X$ given by $\overline{T}(v + \operatorname{Ker}(T)) = T(v)$ (i.e., by $\overline{T}(\pi(v)) = T(v)$) is a well-defined linear transformation, and $\overline{T}$ gives an isomorphism from $V/\operatorname{Ker}(T)$ to $\operatorname{Im}(T) \subseteq X$.

Proof. If $v_1 + \operatorname{Ker}(T) = v_2 + \operatorname{Ker}(T)$, then $v_1 = v_2 + w$ for some $w \in \operatorname{Ker}(T)$, so $T(v_1) = T(v_2 + w) = T(v_2) + T(w) = T(v_2) + 0 = T(v_2)$, and $\overline{T}$ is well-defined. It is then easy to check that it is a linear transformation, that it is 1-1, and that its image is $\operatorname{Im}(T)$, completing the proof.
Let us now see how to find a basis for a quotient vector space.

Theorem 1.5.14. Let $V$ be a vector space and $W_1$ a subspace. Let $\mathcal{B}_1 = \{w_1, w_2, \dots\}$ be a basis for $W_1$ and extend $\mathcal{B}_1$ to a basis $\mathcal{B}$ of $V$. Let $\mathcal{B}_2 = \mathcal{B} - \mathcal{B}_1 = \{z_1, z_2, \dots\}$. Let $W_2$ be the subspace of $V$ spanned by $\mathcal{B}_2$, so that $W_2$ is a complement of $W_1$ in $V$ with basis $\mathcal{B}_2$. Then the linear transformation $P \colon W_2 \to V/W_1$ defined by $P(z_i) = [z_i]$ is an isomorphism. In particular, $\overline{\mathcal{B}_2} = \{[z_1], [z_2], \dots\}$ is a basis for $V/W_1$.

Proof. It is easy to check that $P$ is a linear transformation. We show that $\{[z_1], [z_2], \dots\}$ is a basis for $V/W_1$. Then, since $P$ is a linear transformation taking a basis of one vector space to a basis of another, $P$ is an isomorphism.

First let us see that $\overline{\mathcal{B}_2}$ spans $V/W_1$. Consider an equivalence class $[v]$ in $V/W_1$. Since $\mathcal{B}$ is a basis of $V$, we may write $v = \sum c_i w_i + \sum d_j z_j$ for some $\{c_i\}$ and $\{d_j\}$. Then $v - \sum d_j z_j = \sum c_i w_i \in W_1$, so $v \sim \sum d_j z_j$ and hence $[v] = [\sum d_j z_j] = \sum d_j [z_j]$.

Next let us see that $\overline{\mathcal{B}_2}$ is linearly independent. Suppose $\sum d_j [z_j] = [\sum d_j z_j] = 0$. Then $\sum d_j z_j \in W_1$, so $\sum d_j z_j = \sum c_i w_i$ for some $\{c_i\}$. But then $\sum (-c_i) w_i + \sum d_j z_j = 0$, an equation in $V$. But $\{w_1, w_2, \dots, z_1, z_2, \dots\} = \mathcal{B}$ is a basis of $V$, and hence linearly independent, so ($c_1 = c_2 = \cdots = 0$ and) $d_1 = d_2 = \cdots = 0$.
Remark 1.5.15. We cannot emphasize strongly enough the difference between a complement $W_2$ of the subspace $W_1$ and the quotient $V/W_1$. The quotient $V/W_1$ is canonically associated to $W_1$, whereas a complement is not. As we observed, $W_1$ almost never has a unique complement. Theorem 1.5.14 shows that any of these complements is isomorphic to the quotient $V/W_1$. We are in a situation here where every quotient object $V/W_1$ is isomorphic to a subobject $W_2$. This is not always the case in algebra, though it is here, and this fact simplifies arguments, as long as we remember that what we have is an isomorphism between $W_2$ and $V/W_1$, not an identification of $W_2$ with $V/W_1$. Indeed, it would be a bad mistake to identify $V/W_1$ with a complement $W_2$ of $W_1$. $\Diamond$
Often when considering a subspace $W$ of a vector space $V$, what is important is not its dimension, but rather its codimension, which is defined as follows.

Definition 1.5.16. Let $W$ be a subspace of $V$. Then the codimension of $W$ in $V$ is
$$\operatorname{codim}_V W = \dim V/W. \quad \Diamond$$

Lemma 1.5.17. Let $W_1$ be a subspace of $V$. Let $W_2$ be any complement of $W_1$ in $V$. Then $\operatorname{codim}_V W_1 = \dim W_2$.

Proof. By Theorem 1.5.14, $V/W_1$ and $W_2$ are isomorphic.

Corollary 1.5.18. Let $V$ be a vector space of dimension $n$ and let $W$ be a subspace of $V$ of dimension $k$. Then $\dim V/W = \operatorname{codim}_V W = n - k$.

Proof. Immediate from Theorem 1.5.14 and Lemma 1.5.17.

Here is one important way in which quotient spaces arise.

Definition 1.5.19. Let $T \colon V \to W$ be a linear transformation. Then the cokernel of $T$ is the quotient space
$$\operatorname{Coker}(T) = W/\operatorname{Im}(T). \quad \Diamond$$
Corollary 1.5.20. Let $V$ be an $n$-dimensional vector space and let $T \colon V \to V$ be a linear transformation. Then $\dim(\operatorname{Ker}(T)) = \dim(\operatorname{Coker}(T))$.

Proof. By Theorem 1.3.1, Corollary 1.5.18, and Definition 1.5.19,
$$\dim \operatorname{Ker}(T) = \dim(V) - \dim \operatorname{Im}(T) = \dim V/\operatorname{Im}(T) = \dim \operatorname{Coker}(T).$$
We have shown that any linearly independent set in a vector space $V$ extends to a basis of $V$. We outline another proof of this, using quotient spaces. This proof is not any easier, but its basic idea is one we will be using later.

Theorem 1.5.21. Let $\mathcal{B}_1$ be any linearly independent subset of a vector space $V$. Then $\mathcal{B}_1$ extends to a basis $\mathcal{B}$ of $V$.

Proof. Let $W$ be the subspace of $V$ generated by $\mathcal{B}_1$, and let $\pi \colon V \to V/W$ be the canonical projection. Let $\mathcal{C} = \{x_1, x_2, \dots\}$ be a basis of $V/W$ and for each $i$ let $u_i \in V$ with $\pi(u_i) = x_i$. Let $\mathcal{B}_2 = \{u_1, u_2, \dots\}$. We leave it to the reader to check that $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ is a basis of $V$.

In a way, this result is complementary to Theorem 1.5.14, where we showed how to obtain a basis of $V/W$, starting from the right sort of basis of $V$. Here we showed how to obtain a basis of $V$, starting from a basis of $W$ and a basis of $V/W$.
Definition 1.5.22. Let $T \colon V \to V$ be a linear transformation. $T$ is Fredholm if $\operatorname{Ker}(T)$ and $\operatorname{Coker}(T)$ are both finite-dimensional, in which case the index of $T$ is $\dim(\operatorname{Ker}(T)) - \dim(\operatorname{Coker}(T))$. $\Diamond$

Example 1.5.23. (1) In case $V$ is finite-dimensional, every $T$ is Fredholm. Then by Corollary 1.5.20, $\dim(\operatorname{Ker}(T)) = \dim(\operatorname{Coker}(T))$, so $T$ has index $0$. Thus in the finite-dimensional case, the index is completely uninteresting.

(2) In the infinite-dimensional case, the index is an important invariant, and may take on any integer value. For example, if $V = F^\infty$, $L \colon V \to V$ is left shift and $R \colon V \to V$ is right shift, as in Example 1.1.23(1), then $L^n$ has index $n$ and $R^n$ has index $-n$.

(3) If $V = C^\infty(\mathbb{R})$, then $D \colon V \to V$ has kernel $\{f(x) \mid f(x) \text{ is a constant function}\}$, of dimension $1$, and is surjective, so $D$ has index $1$. Also, $I_a \colon V \to V$ is injective and has image $\{f(x) \mid f(a) = 0\}$, of codimension $1$, so $I_a$ has index $-1$. $\Diamond$
1.6 Dual spaces
We now consider the dual space of a vector space. The dual space is easy to
define, but we will have to be careful, as there is plenty of opportunity for
confusion.
Definition 1.6.1. Let $V$ be a vector space over a field $F$. The dual $V^*$ of $V$ is
$$V^* = \operatorname{Hom}_F(V, F) = \{\text{linear transformations } T \colon V \to F\}. \quad \Diamond$$
Lemma 1.6.2. (1) If $V$ is a vector space over $F$, then $V$ is isomorphic to a subspace of $V^*$.

(2) If $V$ is finite-dimensional, then $V$ is isomorphic to $V^*$. In particular, in this case $\dim V^* = \dim V$.

Proof. Choose a basis $\mathcal{B}$ of $V$, $\mathcal{B} = \{v_1, v_2, \dots\}$. Let $\mathcal{B}^*$ be the subset of $V^*$ given by $\mathcal{B}^* = \{w_1^*, w_2^*, \dots\}$, where $w_i^*$ is defined by $w_i^*(v_i) = 1$ and $w_i^*(v_j) = 0$ if $j \neq i$. (This defines $w_i^*$ by Lemma 1.2.23.) We claim that $\mathcal{B}^*$ is a linearly independent set. To see this, suppose $\sum c_j w_j^* = 0$. Then $(\sum c_j w_j^*)(v) = 0$ for every $v \in V$. Choosing $v = v_i$, we see that $c_i = 0$, for each $i$.

The linear transformation $S_\mathcal{B} \colon V \to V^*$ defined by $S_\mathcal{B}(v_i) = w_i^*$ takes the basis $\mathcal{B}$ of $V$ to the linearly independent set $\mathcal{B}^*$ of $V^*$, so is an injection (more precisely, an isomorphism from $V$ to the subspace of $V^*$ spanned by $\mathcal{B}^*$).

Suppose $V$ is finite-dimensional and let $w^*$ be an element of $V^*$. Let $w^*(v_i) = a_i$ for each $i$. Let $v = \sum a_i v_i$, a finite sum since $V$ is finite-dimensional. For each $i$, $S_\mathcal{B}(v)(v_i) = w^*(v_i)$. Since these two linear transformations agree on the basis $\mathcal{B}$ of $V$, by Lemma 1.2.23 they are equal, i.e., $S_\mathcal{B}(v) = w^*$, and $S_\mathcal{B}$ is a surjection.
Remark 1.6.3. It is important to note that there is no natural map from $V$ to $V^*$. The linear transformation $S_\mathcal{B}$ depends on the choice of basis $\mathcal{B}$. In particular, if $V$ is finite-dimensional then, although $V$ and $V^*$ are isomorphic as abstract vector spaces, there is no natural isomorphism between them, and it would be a mistake to identify them. $\Diamond$

Remark 1.6.4. If $V = F^n$ with $\mathcal{E}$ the standard basis $\{e_1, \dots, e_n\}$, then the proof of Lemma 1.6.2 gives the standard basis $\mathcal{E}^*$ of $V^*$, $\mathcal{E}^* = \{e_1^*, \dots, e_n^*\}$, defined by
$$e_i^* \left( \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \right) = a_i. \quad \Diamond$$
Remark 1.6.5. The basis $\mathcal{B}^*$ (and hence the map $S_\mathcal{B}$) depends on the entire basis $\mathcal{B}$. For example, let $V = F^2$ and choose the standard basis $\mathcal{E}$ of $V$,
$$\mathcal{E} = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} = \{e_1, e_2\}.$$
Then $\mathcal{E}^*$ is the basis $\{e_1^*, e_2^*\}$ of $V^*$, with
$$e_1^* \left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = x \quad \text{and} \quad e_2^* \left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = y.$$
If we choose the basis $\mathcal{B}$ of $V$ given by
$$\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\} = \{v_1, v_2\},$$
then $\mathcal{B}^* = \{w_1^*, w_2^*\}$ with
$$w_1^* \left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = x + y \quad \text{and} \quad w_2^* \left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = -y.$$
Thus, even though $v_1 = e_1$, $w_1^* \neq e_1^*$. $\Diamond$
Example 1.6.6. If $V$ is infinite-dimensional, then in general the linear transformation $S_\mathcal{B}$ is an injection but not a surjection. Let $V = F^\infty$ with basis $\mathcal{E} = \{e_1, e_2, \dots\}$ and consider the set $\mathcal{E}^* = \{e_1^*, e_2^*, \dots\}$. Any element $w^*$ of the subspace of $V^*$ spanned by $\mathcal{E}^*$ has the property that $w^*(e_i) \neq 0$ for only finitely many values of $i$. This is not the case for a general element of $V^*$. In fact, $V^*$ is isomorphic to $F^{\infty\infty}$, as follows: If
$$v = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \end{bmatrix} \in F^\infty \quad \text{and} \quad x = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \end{bmatrix} \in F^{\infty\infty},$$
then we have the pairing $x(v) = \sum a_i b_i$. (This makes sense for any $x$, as only finitely many entries of $v$ are nonzero.) Any element $w^*$ of $V^*$ arises in this way, as we may choose
$$x = \begin{bmatrix} w^*(e_1) \\ w^*(e_2) \\ \vdots \end{bmatrix}.$$
Thus in this case the image of $S_\mathcal{E}$ is $F^\infty \subset F^{\infty\infty}$. $\Diamond$
Remark 1.6.7. The preceding example leaves open the possibility that $V$ might be isomorphic to $V^*$ by some other isomorphism than $S_\mathcal{B}$. That is also not the case in general. We have seen in Remark 1.2.19 that $F^\infty$ is a vector space of countably infinite dimension and $F^{\infty\infty}$ is a vector space of uncountably infinite dimension. $\Diamond$
Remark 1.6.8. Just as a typical element of $V$ is denoted by $v$, a typical element of $V^*$ is often denoted by $v^*$. This notation carries the danger of giving the impression that there is a natural map from $V$ to $V^*$ given by $v \mapsto v^*$ (i.e., that the element $v^*$ of $V^*$ is the dual of the element $v$ of $V$), and we emphasize again that that is not the case. There is no such natural map, and it does not make sense to speak of the dual of an element of $V$. Thus we do not use this notation and instead use $w^*$ to denote an element of $V^*$. $\Diamond$
Example 1.6.9 (Compare Example 1.2.22). Let $V = \mathcal{P}_{n-1}(\mathbb{R})$ for any $n$.

(1) For any $a \in \mathbb{R}$, $V$ has basis $\mathcal{B} = \{p_0(x), p_1(x), \dots, p_{n-1}(x)\}$, where $p_0(x) = 1$ and $p_k(x) = (x-a)^k / k!$ for $k = 1, \dots, n-1$. The dual basis $\mathcal{B}^*$ is given by $\mathcal{B}^* = \{E_a, E_a \circ D, \dots, E_a \circ D^{n-1}\}$.

(2) For any distinct $a_1, \dots, a_n \in \mathbb{R}$, $V$ has basis $\mathcal{C} = \{q_1(x), \dots, q_n(x)\}$ with $q_k(x) = \prod_{j \neq k} (x - a_j)/(a_k - a_j)$. The dual basis $\mathcal{C}^*$ is given by $\mathcal{C}^* = \{E_{a_1}, \dots, E_{a_n}\}$.

(3) Fix an interval $[a, b]$ and let $T \colon V \to \mathbb{R}$ be the linear transformation
$$T(f(x)) = \int_a^b f(x)\, dx.$$
Then $T \in V^*$. Since $\mathcal{C}^*$ (as above) is a basis of $V^*$, we have $T = \sum_{i=1}^n c_i E_{a_i}$ for some constants $c_1, \dots, c_n$.

In other words, we have the exact quadrature formula, valid for every $f(x) \in V$,
$$\int_a^b f(x)\, dx = \sum_{i=1}^n c_i f(a_i).$$
For simplicity, let $[a, b] = [0, 1]$, and let us for example choose equally spaced points.

For $n = 1$ choose $a_1 = 1/2$. Then $c_1 = 1$, i.e.,
$$\int_0^1 f(x)\, dx = f(1/2) \quad \text{for } f \in \mathcal{P}_0(\mathbb{R}).$$

For $n = 2$, choose $a_1 = 0$ and $a_2 = 1$. Then $c_1 = c_2 = 1/2$, i.e.,
$$\int_0^1 f(x)\, dx = (1/2)f(0) + (1/2)f(1) \quad \text{for } f \in \mathcal{P}_1(\mathbb{R}).$$

For $n = 3$, choose $a_1 = 0$, $a_2 = 1/2$, $a_3 = 1$. Then $c_1 = 1/6$, $c_2 = 4/6$, $c_3 = 1/6$, i.e.,
$$\int_0^1 f(x)\, dx = (1/6)f(0) + (4/6)f(1/2) + (1/6)f(1) \quad \text{for } f \in \mathcal{P}_2(\mathbb{R}).$$

The next two expansions of this type are
$$\int_0^1 f(x)\, dx = (1/8)f(0) + (3/8)f(1/3) + (3/8)f(2/3) + (1/8)f(1) \quad \text{for } f \in \mathcal{P}_3(\mathbb{R}),$$
$$\int_0^1 f(x)\, dx = (7/90)f(0) + (32/90)f(1/4) + (12/90)f(1/2) + (32/90)f(3/4) + (7/90)f(1) \quad \text{for } f \in \mathcal{P}_4(\mathbb{R}).$$

These formulas are the basis for commonly used approximate quadrature formulas: the first three yield the midpoint rule, the trapezoidal rule, and Simpson's rule, respectively.
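For readers who want to reproduce these weights, here is a minimal sketch (in Python with NumPy; the function name is ours). Exactness on the monomial basis of $\mathcal{P}_{n-1}(\mathbb{R})$ gives the linear system $\sum_i c_i a_i^m = \int_0^1 x^m\,dx = 1/(m+1)$ for $m = 0, \dots, n-1$.

```python
import numpy as np

def quadrature_weights(nodes):
    nodes = np.asarray(nodes, dtype=float)
    n = len(nodes)
    A = np.vander(nodes, n, increasing=True).T   # row m is (a_1^m, ..., a_n^m)
    rhs = 1.0 / np.arange(1, n + 1)              # integral of x^m on [0,1]
    return np.linalg.solve(A, rhs)

print(quadrature_weights([0.0, 0.5, 1.0]))       # [1/6, 4/6, 1/6]: Simpson
print(quadrature_weights([0.0, 1/3, 2/3, 1.0]))  # [1/8, 3/8, 3/8, 1/8]
```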
(4) Fix an interval $[a, b]$ and for any polynomial $g(x)$ let
$$T_{g(x)}(f(x)) = \int_a^b f(x) g(x)\, dx.$$
Then $T_{g(x)} \in V^*$. Let $\mathcal{D} = \{T_1, T_x, \dots, T_{x^{n-1}}\}$. We claim that $\mathcal{D}$ is linearly independent. To see this, suppose that
$$T = a_0 T_1 + a_1 T_x + \cdots + a_{n-1} T_{x^{n-1}} = 0.$$
Then $T = T_{g(x)}$ with $g(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} \in V$. To say that $T = 0$ is to say that $T(f(x)) = 0$ for every $f(x) \in V$. But if we choose $f(x) = g(x)$, we find
$$T(f(x)) = T_{g(x)}(g(x)) = \int_a^b g(x)^2\, dx = 0,$$
which forces $g(x) = 0$, i.e., $a_0 = a_1 = \cdots = a_{n-1} = 0$, and $\mathcal{D}$ is linearly independent.

Since $\mathcal{D}$ is a linearly independent set of $n$ elements in $V^*$, a vector space of dimension $n$, it must be a basis of $V^*$, so every element of $V^*$ is $T_{g(x)}$ for a unique $g(x) \in V$. In particular this is true for $E_c$ for every $c \in [a, b]$. It is simply a matter of solving a linear system to find $g(x)$. For example, let $[a, b] = [0, 1]$ and let $c = 0$. We find
$$f(0) = \int_0^1 f(x) g(x)\, dx$$
for $g(x) = 1$ if $f(x) \in \mathcal{P}_0(\mathbb{R})$;
for $g(x) = 4 - 6x$ if $f(x) \in \mathcal{P}_1(\mathbb{R})$;
for $g(x) = 9 - 36x + 30x^2$ if $f(x) \in \mathcal{P}_2(\mathbb{R})$;
for $g(x) = 16 - 120x + 240x^2 - 140x^3$ if $f(x) \in \mathcal{P}_3(\mathbb{R})$;
for $g(x) = 25 - 300x + 1050x^2 - 1400x^3 + 630x^4$ if $f(x) \in \mathcal{P}_4(\mathbb{R})$.

Admittedly, we rarely if ever want to evaluate a function at a point by computing an integral instead, but this shows how it could be done.

We have presented (3) and (4) here so that the reader may see some interesting examples early, but they are best understood in the context of inner product spaces, which we consider in Chapter 7. $\Diamond$
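The linear system behind part (4) is easy to set up by hand or by machine; here is a short sketch (in Python with SymPy, for exact rational arithmetic; the function name is ours). Writing $g(x) = \sum_j a_j x^j$, the conditions $\int_0^1 x^m g(x)\,dx = 0^m$ give $\sum_j a_j/(m+j+1) = \delta_{m,0}$, a system whose coefficient matrix is the Hilbert matrix.

```python
import sympy as sp

def evaluation_kernel(n):
    """Return g with f(0) = integral_0^1 f g for all f in P_{n-1}(R)."""
    H = sp.Matrix(n, n, lambda m, j: sp.Rational(1, m + j + 1))  # Hilbert matrix
    rhs = sp.Matrix([1] + [0] * (n - 1))                         # E_0(x^m) = 0^m
    a = H.solve(rhs)
    x = sp.symbols('x')
    return sum(a[j] * x**j for j in range(n))

print(evaluation_kernel(3))   # 30*x**2 - 36*x + 9, matching the text
```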
To every subspace of $V$ we can naturally associate a subspace of $V^*$ (and vice-versa), as follows.

Definition 1.6.10. Let $U$ be a subspace of $V$. Then the annihilator $\operatorname{Ann}(U)$ is the subspace of $V^*$ defined by
$$\operatorname{Ann}(U) = \{w^* \in V^* \mid w^*(u) = 0 \text{ for every } u \in U\}. \quad \Diamond$$

Lemma 1.6.11. Let $U$ be a finite-dimensional subspace of $V$. Then $V^*/\operatorname{Ann}(U)$ is isomorphic to $U^*$. Consequently,
$$\operatorname{codim} \operatorname{Ann}(U) = \dim(U).$$
Proof. Set $X = \operatorname{Ann}(U)$ and let $\{x_1^*, x_2^*, \dots\}$ be a basis of $X$. Let $\{u_1, \dots, u_k\}$ be a basis for $U$. Let $U'$ be a complement of $U$, so $V = U \oplus U'$, and let $\{u_1', u_2', \dots\}$ be a basis of $U'$. Then $\{u_1, \dots, u_k, u_1', u_2', \dots\}$ is a basis of $V$. For $j = 1, \dots, k$ define $y_j^* \in V^*$ by
$$y_j^*(u_i) = 0 \text{ if } i \neq j, \qquad y_j^*(u_j) = 1, \qquad y_j^*(u_m') = 0 \text{ for every } m.$$
We claim $\{y_1^*, \dots, y_k^*, x_1^*, x_2^*, \dots\}$ is a basis of $V^*$. First we show it is linearly independent: Suppose $\sum c_j y_j^* + \sum d_m x_m^* = 0$. Evaluating this function at $u_i$ we see it has the value $c_i$, so $c_i = 0$ for $i = 1, \dots, k$. Then $d_m = 0$ for each $m$, as $\{x_1^*, x_2^*, \dots\}$ is linearly independent. Next we show it spans $V^*$: Let $w^* \in V^*$. For $j = 1, \dots, k$, let $c_j = w^*(u_j)$. Let $y^* = w^* - \sum c_j y_j^*$. Then $y^*(u_i) = 0$ for each $i$, so $y^* \in \operatorname{Ann}(U)$ and hence $y^* = \sum d_m x_m^*$ for some $d_1, d_2, \dots$. Then $w^* = \sum c_j y_j^* + \sum d_m x_m^*$.

Let $Y$ be the subspace of $V^*$ spanned by $\{y_1^*, \dots, y_k^*\}$. Then $V^* = X \oplus Y$, so $V^*/X$ is isomorphic to $Y$. But we have an isomorphism $S \colon U^* \to Y$ given by $S(u_i^*) = y_i^*$. (If we let $u_i^*$ be the restriction of $y_i^*$ to $U$, then $\{u_1^*, \dots, u_k^*\}$ is the dual basis to $\{u_1, \dots, u_k\}$.)
Remark 1.6.12. We often think of Lemma 1.6.11 as follows: Suppose we have $k$ linearly independent elements $u_1, \dots, u_k$ of $V$, so that they generate a subspace $U$ of $V$ of dimension $k$. Then the requirement that a linear transformation from $V$ to $F$ be zero at each of $u_1, \dots, u_k$ imposes $k$ linearly independent conditions on the space of all such linear transformations, so the subspace of linear transformations satisfying precisely these conditions, which is $\operatorname{Ann}(U)$, has codimension $k$. $\Diamond$
To go the other way, we have the following association.

Definition 1.6.13. Let $U^*$ be a subspace of $V^*$. Then the annihilator $\operatorname{Ann}(U^*)$ is the subspace of $V$ defined by
$$\operatorname{Ann}(U^*) = \{v \in V \mid w^*(v) = 0 \text{ for every } w^* \in U^*\}. \quad \Diamond$$

Remark 1.6.14. Observe that $\operatorname{Ann}(\{0\}) = V^*$ and $\operatorname{Ann}(V) = \{0\}$; similarly $\operatorname{Ann}(\{0\}) = V$ and $\operatorname{Ann}(V^*) = \{0\}$. $\Diamond$

If $V$ is finite-dimensional, our pairings are inverses of each other, as we now see.

Theorem 1.6.15. (1) For any subspace $U$ of $V$, $\operatorname{Ann}(\operatorname{Ann}(U)) = U$.

(2) Let $V$ be finite-dimensional. For any subspace $U^*$ of $V^*$,
$$\operatorname{Ann}(\operatorname{Ann}(U^*)) = U^*.$$
So far in this section we have considered vectors, i.e., objects. We now consider linear transformations, i.e., functions. We first saw pullbacks in Example 1.1.23(3), and now we see them again.

Definition 1.6.16. Let $T \colon V \to X$ be a linear transformation. Then the dual $T^*$ of $T$ is the linear transformation $T^* \colon X^* \to V^*$ given by $T^*(y^*) = y^* \circ T$, i.e., $T^*(y^*) \in V^*$ is the linear transformation on $V$ defined by
$$T^*(y^*)(v) = (y^* \circ T)(v) = y^*(T(v)), \quad \text{for } y^* \in X^*. \quad \Diamond$$
Remark 1.6.17. (1) It is easy to check that $T^*(y^*)$ is a linear transformation for any $y^* \in X^*$. But we are claiming more, that $y^* \mapsto T^*(y^*)$ is a linear transformation from $X^*$ to $V^*$. This follows from checking that $T^*(y_1^* + y_2^*) = T^*(y_1^*) + T^*(y_2^*)$ and $T^*(c y^*) = c\, T^*(y^*)$.

(2) The dual $T^*$ of $T$ is well-defined and does not depend on a choice of basis, as it was defined directly in terms of $T$. $\Diamond$
Now we derive some relations between various subspaces.

Lemma 1.6.18. Let $T \colon V \to X$ be a linear transformation. Then $\operatorname{Im}(T^*) = \operatorname{Ann}(\operatorname{Ker}(T))$.

Proof. Let $w^* \in V^*$ be in $\operatorname{Im}(T^*)$, so $w^* = T^*(y^*)$ for some $y^* \in X^*$. Then for any $u \in \operatorname{Ker}(T)$, $w^*(u) = (T^*(y^*))(u) = y^*(T(u)) = y^*(0) = 0$, so $w^*$ is in $\operatorname{Ann}(\operatorname{Ker}(T))$. Thus we see that $\operatorname{Im}(T^*) \subseteq \operatorname{Ann}(\operatorname{Ker}(T))$.

Let $w^* \in V^*$ be in $\operatorname{Ann}(\operatorname{Ker}(T))$, so $w^*(u) = 0$ for every $u \in \operatorname{Ker}(T)$. Let $V'$ be a complement of $\operatorname{Ker}(T)$, so $V = \operatorname{Ker}(T) \oplus V'$. Then we may write any $v \in V$ uniquely as $v = u + v'$ with $u \in \operatorname{Ker}(T)$, $v' \in V'$. Then $w^*(v) = w^*(u + v') = w^*(u) + w^*(v') = w^*(v')$. Also, $T(v) = T(v')$, so $T(V) = T(V')$. Let $X'$ be any complement of $T(V')$ in $X$, so that $X = T(V') \oplus X'$.

Since the restriction of $T$ to $V'$ is an isomorphism onto $T(V')$, we may write $x \in X$ uniquely as $x = T(v') + x'$ with $v' \in V'$ and $x' \in X'$. Define $y^* \in X^*$ by
$$y^*(x) = w^*(v') \quad \text{where } x = T(v') + x',\ v' \in V' \text{ and } x' \in X'.$$
(It is routine to check that $y^*$ is a linear transformation.) Then for $v \in V$, writing $v = u + v'$, with $u \in \operatorname{Ker}(T)$ and $v' \in V'$, we have
$$T^*(y^*)(v) = y^*(T(v)) = y^*(T(v')) = w^*(v') = w^*(v).$$
Thus $T^*(y^*) = w^*$ and we see that $\operatorname{Ann}(\operatorname{Ker}(T)) \subseteq \operatorname{Im}(T^*)$.
The following corollary gives a useful dimension count.

Corollary 1.6.19. Let $T \colon V \to X$ be a linear transformation.
(1) If $\operatorname{Ker}(T)$ is finite-dimensional, then
$$\operatorname{codim} \operatorname{Im}(T^*) = \dim \operatorname{Coker}(T^*) = \dim \operatorname{Ker}(T).$$
(2) If $\operatorname{Coker}(T)$ is finite-dimensional, then
$$\dim \operatorname{Ker}(T^*) = \dim \operatorname{Coker}(T) = \operatorname{codim} \operatorname{Im}(T).$$

Proof. (1) Let $U = \operatorname{Ker}(T)$. By Lemma 1.6.11,
$$\dim \operatorname{Ker}(T) = \operatorname{codim} \operatorname{Ann}(\operatorname{Ker}(T)),$$
and by Lemma 1.6.18,
$$\operatorname{Ann}(\operatorname{Ker}(T)) = \operatorname{Im}(T^*).$$
(2) is proved using similar ideas and we omit the proof.
Here is another useful dimension count.

Corollary 1.6.20. Let $T \colon V \to X$ be a linear transformation.
(1) If $\dim(V)$ is finite, then
$$\dim \operatorname{Im}(T^*) = \dim \operatorname{Im}(T).$$
(2) If $\dim(V) = \dim(X)$ is finite, then
$$\dim \operatorname{Ker}(T^*) = \dim \operatorname{Ker}(T).$$

Proof. (1) By Theorem 1.3.1 and Corollary 1.6.19,
$$\dim(V) - \dim \operatorname{Im}(T) = \dim \operatorname{Ker}(T) = \operatorname{codim} \operatorname{Im}(T^*) = \dim V^* - \dim \operatorname{Im}(T^*),$$
and by Lemma 1.6.2, $\dim(V^*) = \dim(V)$.

(2) By Theorem 1.3.1 and Lemma 1.6.2,
$$\dim \operatorname{Ker}(T^*) = \dim X^* - \dim \operatorname{Im}(T^*) = \dim(V) - \dim \operatorname{Im}(T) = \dim \operatorname{Ker}(T).$$
Remark 1.6.21. Again we caution the reader that although we have equality of dimensions, there is no natural identification of the subspaces in each part of Corollary 1.6.20. $\Diamond$
Lemma 1.6.22. Let $T \colon V \to X$ be a linear transformation.
(1) $T$ is injective if and only if $T^*$ is surjective.
(2) $T$ is surjective if and only if $T^*$ is injective.
(3) $T$ is an isomorphism if and only if $T^*$ is an isomorphism.

Proof. (1) Suppose that $T$ is injective. Let $w^* \in V^*$ be arbitrary. To show that $T^*$ is surjective we must show that there is a $y^* \in X^*$ with $T^*(y^*) = w^*$, i.e., $y^* \circ T = w^*$.

Let $\mathcal{B} = \{v_1, v_2, \dots\}$ be a basis of $V$ and set $x_i = T(v_i)$. $T$ is injective, so $\{x_1, x_2, \dots\}$ is a linearly independent set in $X$. Extend this set to a basis $\mathcal{C} = \{x_1, x_2, \dots, x_1', x_2', \dots\}$ of $X$ and define a linear transformation $U \colon X \to V$ by $U(x_i) = v_i$, $U(x_j') = 0$. Note $UT(v_i) = v_i$ for each $i$, so $UT$ is the identity map on $V$. Set $y^* = w^* \circ U$. Then $T^*(y^*) = y^* \circ T = (w^* \circ U) \circ T = w^* \circ (U \circ T) = w^*$.

Suppose that $T$ is not injective and choose $v \neq 0$ with $T(v) = 0$. Then for any $y^* \in X^*$, $T^*(y^*)(v) = (y^* \circ T)(v) = y^*(T(v)) = y^*(0) = 0$. But not every element $w^*$ of $V^*$ has $w^*(v) = 0$. To see this, let $v_1 = v$ and extend $v_1$ to a basis $\mathcal{B} = \{v_1, v_2, \dots\}$ of $V$. Then there is an element $w^*$ of $V^*$ defined by $w^*(v_1) = 1$, $w^*(v_i) = 0$ for $i \neq 1$. Thus $T^*$ is not surjective.

(2) Suppose that $T$ is surjective. Let $y^* \in X^*$. To show that $T^*$ is injective we must show that if $T^*(y^*) = 0$, then $y^* = 0$. Thus, suppose $T^*(y^*) = 0$, i.e., that $(T^*(y^*))(v) = 0$ for every $v \in V$. Then $0 = (T^*(y^*))(v) = (y^* \circ T)(v) = y^*(T(v))$ for every $v \in V$. Choose $x \in X$. Then, since $T$ is surjective, there is a $v \in V$ with $x = T(v)$, and so $y^*(x) = y^*(T(v)) = 0$. Thus $y^*(x) = 0$ for every $x \in X$, i.e., $y^* = 0$.

Suppose that $T$ is not surjective. Then $\operatorname{Im}(T)$ is a proper subspace of $X$. Let $\{x_1, x_2, \dots\}$ be a basis for $\operatorname{Im}(T)$ and extend this set to a basis $\mathcal{C} = \{x_1, x_2, \dots, x_1', x_2', \dots\}$ of $X$. Define $y^* \in X^*$ by $y^*(x_i) = 0$ for all $i$, $y^*(x_1') = 1$, and $y^*(x_j') = 0$ for $j \neq 1$. Then $y^* \neq 0$, but $y^*(x) = 0$ for every $x \in \operatorname{Im}(T)$. Then
$$T^*(y^*)(v) = (y^* \circ T)(v) = y^*(T(v)) = 0,$$
so $T^*(y^*) = 0$ and $T^*$ is not injective.

(3) This immediately follows from (1) and (2).
Next we see how the dual behaves under composition.
Lemma 1.6.23. Let $T \colon V \to W$ and $S \colon W \to X$ be linear transformations. Then $S \circ T \colon V \to X$ has dual $(S \circ T)^* \colon X^* \to V^*$ given by $(S \circ T)^* = T^* \circ S^*$.

Proof. Let $y^* \in X^*$ and let $v \in V$. Then
$$((S \circ T)^*(y^*))(v) = y^*((S \circ T)(v)) = y^*(S(T(v))) = (S^*(y^*))(T(v)) = ((T^* \circ S^*)(y^*))(v).$$
Since this is true for every $v$ and $y^*$, $(S \circ T)^* = T^* \circ S^*$.
We can now consider the dual $V^{**}$ of $V^*$, known as the double dual of $V$.

An element of $V^*$ is a linear transformation from $V$ to $F$, and so is a function from $V$ to $F$. An element of $V^{**}$ is a linear transformation from $V^*$ to $F$, and so is a function from $V^*$ to $F$. In other words, an element of $V^{**}$ is a function on functions. There is one natural way to get a function on functions: evaluation at a point. This is the linear transformation $E_v$ (“evaluation at $v$”) of the next definition.

Definition 1.6.24. Let $E_v \in V^{**}$ be the linear transformation $E_v \colon V^* \to F$ defined by $E_v(w^*) = w^*(v)$ for every $w^* \in V^*$. $\Diamond$

Remark 1.6.25. It is easy to check that $E_v$ is a linear transformation. Also, $E_v$ is naturally defined: it does not depend on a choice of basis. $\Diamond$
Lemma 1.6.26. The linear transformation $H \colon V \to V^{**}$ given by $H(v) = E_v$ is an injection. If $V$ is finite-dimensional, it is an isomorphism.

Proof. Let $v$ be an element of $V$ with $E_v = 0$. Now $E_v$ is an element of $V^{**}$, the dual of $V^*$, so $E_v = 0$ means that for every $w^* \in V^*$, $E_v(w^*) = 0$. But $E_v(w^*) = w^*(v)$. Thus $v \in V$ has the property that $w^*(v) = 0$ for every $w^* \in V^*$. We claim that $v = 0$. Suppose not. Let $v_1 = v$ and extend $\{v_1\}$ to a basis $\mathcal{B} = \{v_1, v_2, \dots\}$ of $V$. Consider the dual basis $\mathcal{B}^* = \{w_1^*, w_2^*, \dots\}$ of $V^*$. Then $w_1^*(v_1) = 1 \neq 0$.

If $V$ is finite-dimensional, then $H$ is an injection between vector spaces of the same dimension and hence is an isomorphism.
Remark 1.6.27. As is common practice, we will often write $v^{**} = H(v)$ in case $V$ is finite-dimensional. The map $v \mapsto v^{**}$ then provides a canonical identification of elements of $V$ with elements of $V^{**}$, as there is no choice, of basis or anything else, involved. $\Diamond$
Beginning with a vector space $V$ and a subspace $U$ of $V$, we obtained from Definition 1.6.10 the subspace $\operatorname{Ann}(U)$ of $V^*$. Similarly, beginning with the subspace $\operatorname{Ann}(U)$ of $V^*$ we could obtain the subspace $\operatorname{Ann}(\operatorname{Ann}(U))$ of $V^{**}$. This is not the construction of Definition 1.6.13, which would give us the subspace $\operatorname{Ann}(\operatorname{Ann}(U))$ of $V$, which we saw in Theorem 1.6.15 was just $U$. But these two constructions are closely related.

Corollary 1.6.28. Let $V$ be a finite-dimensional vector space and let $U$ be a subspace of $V$. Let $H$ be the linear transformation of Lemma 1.6.26. Then $H \colon U \to \operatorname{Ann}(\operatorname{Ann}(U)) \subseteq V^{**}$ is an isomorphism.

Since we have a natural way of identifying finite-dimensional vector spaces with their double duals, we should have a natural way of identifying linear transformations between finite-dimensional vector spaces with linear transformations between their double duals, and we do.

Definition 1.6.29. Let $V$ and $X$ be finite-dimensional vector spaces. If $T \colon V \to X$ is a linear transformation, its double dual is the linear transformation $T^{**} \colon V^{**} \to X^{**}$ given by $T^{**}(v^{**}) = (T(v))^{**}$. $\Diamond$
Lemma 1.6.30. Let $V$ and $X$ be finite-dimensional vector spaces. Then $T \mapsto T^{**}$ is an isomorphism from $\operatorname{Hom}_F(V, X) = \{\text{linear transformations } V \to X\}$ to $\operatorname{Hom}_F(V^{**}, X^{**}) = \{\text{linear transformations } V^{**} \to X^{**}\}$.

Proof. It is easy to check that $T \mapsto T^{**}$ is a linear transformation. Since $V$ and $V^{**}$ have the same dimension, as do $X$ and $X^{**}$, $\{\text{linear transformations } V \to X\}$ and $\{\text{linear transformations } V^{**} \to X^{**}\}$ are vector spaces of the same dimension. Thus in order to show that $T \mapsto T^{**}$ is an isomorphism, it suffices to show that $T \mapsto T^{**}$ is an injection. Suppose $T^{**} = 0$, i.e., $T^{**}(v^{**}) = 0$ for every $v^{**} \in V^{**}$. Let $v \in V$ be arbitrary. Then $0 = T^{**}(v^{**}) = (T(v))^{**} = H(T(v))$. But $H$ is an isomorphism by Lemma 1.6.26, so $T(v) = 0$. Since this is true for every $v \in V$, $T = 0$.
Remark 1.6.31. In the infinite-dimensional case it is in general not true that $V$ is isomorphic to $V^{**}$. For example, if $V = F^\infty$ we have seen in Example 1.6.6 that $V^*$ is isomorphic to $F^{\infty\infty}$. Also, $V^*$ is isomorphic to a subspace of $V^{**}$. We thus see that $V$ has countably infinite dimension and $V^{**}$ has uncountably infinite dimension, so they cannot be isomorphic. $\Diamond$
CHAPTER 2
Coordinates
In this chapter we investigate coordinates.
It is useful to keep in mind the metaphor:
Coordinates are a language for describing vectors and linear
transformations.
In human languages we have, for example:
$$[\star]_{\text{English}} = \text{star}, \quad [\star]_{\text{French}} = \text{étoile}, \quad [\star]_{\text{German}} = \text{Stern},$$
$$[{\to}]_{\text{English}} = \text{arrow}, \quad [{\to}]_{\text{French}} = \text{flèche}, \quad [{\to}]_{\text{German}} = \text{Pfeil}.$$
Coordinates share two similarities with human languages, but have one
important difference.
(1) Often it is easier to work with objects, and often it is easier to work
with words that describe them. Similarly, often it is easier and more
enlightening to work with vectors and linear transformations directly,
and often it is easier and more enlightening to work with their descrip-
tions in terms of coordinates, i.e., with coordinate vectors and matrices.
(2) There are many different human languages and it is useful to be able to
translate among them. Similarly, there are different coordinate systems
and it is not only useful but indeed essential to be able to translate
among them.
(3) A problem expressed in one human language is not solved by translating it into a second language. It is just expressed differently. Coordinate systems are different: for many problems in linear algebra there is a preferred coordinate system, and translating the problem into that language greatly simplifies it and helps to solve it. This is the idea behind eigenvalues, eigenvectors, and canonical forms for matrices. We save their investigation for a later chapter.
2.1 Coordinates for vectors
We begin by restating Lemma 1.2.21.
Lemma 2.1.1. Let $V$ be a vector space and let $\mathcal{B} = \{v_i\}$ be a set of vectors in $V$. Then $\mathcal{B}$ is a basis for $V$ if and only if every $v \in V$ can be written uniquely as $v = \sum c_i v_i$ for $c_i \in F$, all but finitely many zero.
With this lemma in hand we may make the following important definition.

Definition 2.1.2. Let $V$ be an $n$-dimensional vector space and let $\mathcal{B} = \{v_1, \dots, v_n\}$ be a basis for $V$. For $v \in V$ the coordinate vector of $v$ with respect to the basis $\mathcal{B}$, $[v]_\mathcal{B}$, is given as follows: If $v = \sum c_i v_i$, then
$$[v]_\mathcal{B} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \in F^n. \quad \Diamond$$
Theorem 2.1.3. Let $V$ be an $n$-dimensional vector space and let $\mathcal{B}$ be a basis of $V$. Then $T \colon V \to F^n$ given by $T(v) = [v]_\mathcal{B}$ is an isomorphism.

Proof. Let $\mathcal{B} = \{v_1, \dots, v_n\}$. Define $S \colon F^n \to V$ by
$$S \left( \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \right) = \sum c_i v_i.$$
It is easy to check that $S$ is a linear transformation, and then Lemma 2.1.1 shows that $S$ is an isomorphism. Furthermore, $T = S^{-1}$.
Example 2.1.4. (1) Let $V = F^n$ and let $\mathcal{B} = \mathcal{E}$ be the standard basis. If $v = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}$, then $v = \sum c_i e_i$ (where $\mathcal{E} = \{e_1, \dots, e_n\}$) and so $[v]_\mathcal{E} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}$. That is, a vector “looks like itself” in the standard basis.

(2) Let $V$ be arbitrary and let $\mathcal{B} = \{b_1, \dots, b_n\}$ be a basis for $V$. Then $[b_i]_\mathcal{B} = e_i$.
(3) Let $V = \mathbb{R}^2$, let $\mathcal{E} = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} = \{e_1, e_2\}$ and let $\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \end{bmatrix} \right\} = \{b_1, b_2\}$. Then $[b_1]_\mathcal{E} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $[b_2]_\mathcal{E} = \begin{bmatrix} 3 \\ 7 \end{bmatrix}$ (as $\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 7 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 7 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$).

On the other hand, $[e_1]_\mathcal{B} = \begin{bmatrix} 7 \\ -2 \end{bmatrix}$ and $[e_2]_\mathcal{B} = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$ (as $\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 7 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + (-2) \begin{bmatrix} 3 \\ 7 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix} = (-3) \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 1 \begin{bmatrix} 3 \\ 7 \end{bmatrix}$).

Let $v_1 = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$. Then $[v_1]_\mathcal{E} = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$. Also, $[v_1]_\mathcal{B} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ where $v_1 = x_1 b_1 + x_2 b_2$, i.e., $\begin{bmatrix} 17 \\ 39 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} 3 \\ 7 \end{bmatrix}$. Solving, we find $x_1 = 2$, $x_2 = 5$, so $[v_1]_\mathcal{B} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$. Similarly, let $v_2 = \begin{bmatrix} 27 \\ 62 \end{bmatrix}$. Then $[v_2]_\mathcal{E} = \begin{bmatrix} 27 \\ 62 \end{bmatrix}$. Also, $[v_2]_\mathcal{B} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ where $v_2 = y_1 b_1 + y_2 b_2$, i.e., $\begin{bmatrix} 27 \\ 62 \end{bmatrix} = y_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + y_2 \begin{bmatrix} 3 \\ 7 \end{bmatrix}$. Solving, we find $y_1 = 3$, $y_2 = 8$, so $[v_2]_\mathcal{B} = \begin{bmatrix} 3 \\ 8 \end{bmatrix}$.

(4) Let $V = \mathcal{P}_2(\mathbb{R})$, let $\mathcal{B}_0 = \{1, x, x^2\}$, and let $\mathcal{B}_1 = \{1, x-1, (x-1)^2\}$. Let $p(x) = 3 - 6x + 4x^2$. Then
$$[p(x)]_{\mathcal{B}_0} = \begin{bmatrix} 3 \\ -6 \\ 4 \end{bmatrix}.$$
Also $p(x) = 1 + 2(x-1) + 4(x-1)^2$, so
$$[p(x)]_{\mathcal{B}_1} = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}. \quad \Diamond$$
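Finding coordinates as in (3) is just solving a linear system whose coefficient matrix has the basis vectors as columns. A quick numerical check (in Python with NumPy; an illustration, not part of the text):

```python
import numpy as np

B = np.array([[1.0, 3.0],
              [2.0, 7.0]])                # columns are b_1, b_2

print(np.linalg.solve(B, [17.0, 39.0]))   # [2. 5.] = [v_1]_B
print(np.linalg.solve(B, [27.0, 62.0]))   # [3. 8.] = [v_2]_B
```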
2.2 Matrices for linear transformations

Let $V$ and $W$ be vector spaces of finite dimensions $n$ and $m$ respectively, with bases $\mathcal{B} = \{v_1, \dots, v_n\}$ and $\mathcal{C} = \{w_1, \dots, w_m\}$, and let $T \colon V \to W$ be a linear transformation. Then we have isomorphisms $S \colon V \to F^n$ given by $S(v) = [v]_\mathcal{B}$ and $U \colon W \to F^m$ given by $U(w) = [w]_\mathcal{C}$, and we may form the composition $U \circ T \circ S^{-1} \colon F^n \to F^m$. Since this is a linear transformation, it is given by multiplication by a unique matrix. We are thus led to the following definition.
Definition 2.2.1. Let $V$ be an $n$-dimensional vector space with basis $\mathcal{B} = \{v_1, \dots, v_n\}$ and let $W$ be an $m$-dimensional vector space with basis $\mathcal{C} = \{w_1, \dots, w_m\}$. Let $T \colon V \to W$ be a linear transformation. The matrix of the linear transformation $T$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$, denoted $[T]_{\mathcal{C} \leftarrow \mathcal{B}}$, is the unique matrix such that
$$[T]_{\mathcal{C} \leftarrow \mathcal{B}}\, [v]_\mathcal{B} = [T(v)]_\mathcal{C} \quad \text{for every } v \in V. \quad \Diamond$$

It is easy to write down $[T]_{\mathcal{C} \leftarrow \mathcal{B}}$ (at least in principle).

Lemma 2.2.2. In the situation of Definition 2.2.1, the matrix $[T]_{\mathcal{C} \leftarrow \mathcal{B}}$ is given by
$$[T]_{\mathcal{C} \leftarrow \mathcal{B}} = \big[\, [T(v_1)]_\mathcal{C} \mid [T(v_2)]_\mathcal{C} \mid \cdots \mid [T(v_n)]_\mathcal{C} \,\big],$$
i.e., $[T]_{\mathcal{C} \leftarrow \mathcal{B}}$ is the matrix whose $i$th column is $[T(v_i)]_\mathcal{C}$, for each $i$.

Proof. By Lemma 1.2.23, we need only verify the equation $[T]_{\mathcal{C} \leftarrow \mathcal{B}} [v]_\mathcal{B} = [T(v)]_\mathcal{C}$ for $v = v_i$, $i = 1, \dots, n$. But $[v_i]_\mathcal{B} = e_i$ and $[T]_{\mathcal{C} \leftarrow \mathcal{B}} e_i$ is the $i$th column of $[T]_{\mathcal{C} \leftarrow \mathcal{B}}$, i.e., $[T]_{\mathcal{C} \leftarrow \mathcal{B}} [v_i]_\mathcal{B} = [T]_{\mathcal{C} \leftarrow \mathcal{B}} e_i = [T(v_i)]_\mathcal{C}$, as required.
Theorem 2.2.3. Let $V$ be a vector space of dimension $n$ and let $W$ be a vector space of dimension $m$ over a field $F$. Choose bases $\mathcal{B}$ of $V$ and $\mathcal{C}$ of $W$. Then the linear transformation
$$S \colon \{\text{linear transformations } T \colon V \to W\} \to \{\text{$m$-by-$n$ matrices with entries in } F\}$$
given by $S(T) = [T]_{\mathcal{C} \leftarrow \mathcal{B}}$ is an isomorphism.

Corollary 2.2.4. In the situation of Theorem 2.2.3, $\{\text{linear transformations } T \colon V \to W\}$ is a vector space over $F$ of dimension $mn$.

Proof. $\{\text{$m$-by-$n$ matrices with entries in } F\}$ is a vector space of dimension $mn$, with basis the set of matrices $\{E_{ij}\}$, $1 \leq i \leq m$, $1 \leq j \leq n$, where $E_{ij}$ has an entry of $1$ in the $(i,j)$ position and all other entries $0$.
Lemma 2.2.5. Let $U$, $V$, and $W$ be finite-dimensional vector spaces with bases $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$ respectively. Let $T \colon U \to V$ and $S \colon V \to W$ be linear transformations. Then $S \circ T \colon U \to W$ is a linear transformation with
$$[S \circ T]_{\mathcal{D} \leftarrow \mathcal{B}} = [S]_{\mathcal{D} \leftarrow \mathcal{C}}\, [T]_{\mathcal{C} \leftarrow \mathcal{B}}.$$

Proof. For any $u \in U$,
$$\big([S]_{\mathcal{D} \leftarrow \mathcal{C}} [T]_{\mathcal{C} \leftarrow \mathcal{B}}\big) [u]_\mathcal{B} = [S]_{\mathcal{D} \leftarrow \mathcal{C}} \big([T]_{\mathcal{C} \leftarrow \mathcal{B}} [u]_\mathcal{B}\big) = [S]_{\mathcal{D} \leftarrow \mathcal{C}} [T(u)]_\mathcal{C} = [S(T(u))]_\mathcal{D} = [(S \circ T)(u)]_\mathcal{D}.$$
But also $[S \circ T]_{\mathcal{D} \leftarrow \mathcal{B}} [u]_\mathcal{B} = [(S \circ T)(u)]_\mathcal{D}$, so
$$[S \circ T]_{\mathcal{D} \leftarrow \mathcal{B}} = [S]_{\mathcal{D} \leftarrow \mathcal{C}}\, [T]_{\mathcal{C} \leftarrow \mathcal{B}}.$$
Example 2.2.6. Let $A$ be an $m$-by-$n$ matrix and let $T_A \colon F^n \to F^m$ be defined by $T_A(v) = Av$. Choose the standard bases $\mathcal{E}_n$ for $F^n$ and $\mathcal{E}_m$ for $F^m$. Write $A = [a_1 \mid a_2 \mid \cdots \mid a_n]$, i.e., $a_i$ is the $i$th column of $A$. Then $[T_A]_{\mathcal{E}_m \leftarrow \mathcal{E}_n}$ is the matrix whose $i$th column is
$$[T_A(e_i)]_{\mathcal{E}_m} = [A e_i]_{\mathcal{E}_m} = [a_i]_{\mathcal{E}_m} = a_i,$$
so we see that $[T_A]_{\mathcal{E}_m \leftarrow \mathcal{E}_n} = A$. That is, multiplication by a matrix “looks like itself” with respect to the standard bases. $\Diamond$
The following definition is the most important special case of Definition 2.2.1, and the case we will concentrate on.

Definition 2.2.7. Let $V$ be an $n$-dimensional vector space with basis $\mathcal{B} = \{v_1, \dots, v_n\}$ and let $T \colon V \to V$ be a linear transformation. The matrix of the linear transformation $T$ in the basis $\mathcal{B}$, denoted $[T]_\mathcal{B}$, is the unique matrix such that
$$[T]_\mathcal{B}\, [v]_\mathcal{B} = [T(v)]_\mathcal{B} \quad \text{for every } v \in V. \quad \Diamond$$

Remark 2.2.8. Comparing Definition 2.2.7 with Definition 2.2.1, we see that we have simplified our notation in this special case: We have replaced $[T]_{\mathcal{B} \leftarrow \mathcal{B}}$ by $[T]_\mathcal{B}$. With this simplification, the conclusion of Lemma 2.2.2 reads
$$[T]_\mathcal{B} = \big[\, [T(v_1)]_\mathcal{B} \mid [T(v_2)]_\mathcal{B} \mid \cdots \mid [T(v_n)]_\mathcal{B} \,\big]. \quad \Diamond$$
We also make the following observation.

Lemma 2.2.9. Let $V$ be a finite-dimensional vector space and let $\mathcal{B}$ be a basis of $V$.
(1) If $T = I$, the identity linear transformation, then $[T]_\mathcal{B} = I$, the identity matrix.
(2) $T \colon V \to V$ is an isomorphism if and only if $[T]_\mathcal{B}$ is an invertible matrix, in which case $[T^{-1}]_\mathcal{B} = ([T]_\mathcal{B})^{-1}$.
Example 2.2.10. Let $T \colon \mathbb{R}^2 \to \mathbb{R}^2$ be given by $T(v) = \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix} v$. Then $[T]_\mathcal{E} = \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix}$. Let $\mathcal{B}$ be the basis $\mathcal{B} = \{b_1, b_2\}$ with $b_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $b_2 = \begin{bmatrix} 3 \\ 7 \end{bmatrix}$. Then $[T]_\mathcal{B} = \big[\, [v_1]_\mathcal{B} \mid [v_2]_\mathcal{B} \,\big]$ where
$$v_1 = T(b_1) = \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$$
and
$$v_2 = T(b_2) = \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix} \begin{bmatrix} 3 \\ 7 \end{bmatrix} = \begin{bmatrix} 27 \\ 62 \end{bmatrix}.$$
We computed $[v_1]_\mathcal{B}$ and $[v_2]_\mathcal{B}$ in Example 2.1.4(3), where we obtained $[v_1]_\mathcal{B} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$ and $[v_2]_\mathcal{B} = \begin{bmatrix} 3 \\ 8 \end{bmatrix}$, so $[T]_\mathcal{B} = \begin{bmatrix} 2 & 3 \\ 5 & 8 \end{bmatrix}$. $\Diamond$

We shall see further examples of matrices of particularly interesting linear transformations in Example 2.3.18.
2.3 Change of basis
We now investigate how to change coordinates. In our metaphor of coordi-
nates providing a language, changing coordinates is like translating between
languages. We look at translation between languages first, in order to guide
us later.
Suppose we wish to translate from English to English, for example, or from German to German. We could do this by using an English to English dictionary, or a German to German dictionary, which would look in part like:

    English    English            German    German
    star       star               Stern     Stern
    arrow      arrow              Pfeil     Pfeil

The two columns are identical. Indeed, translating from any language to itself leaves every word unchanged, or to express it mathematically, it is the identity transformation.
Suppose we wish to translate from English to German or from German to English. We could use an English to German dictionary or a German to English dictionary, which would look in part like:

    English    German             German    English
    star       Stern              Stern     star
    arrow      Pfeil              Pfeil     arrow
The effect of translating from German to English is to reverse the ef-
fect of translating from English to German, and vice versa. Mathematically,
translating from German to English is the inverse of translating from En-
glish to German, and vice versa.
Suppose that we wish to translate from English to German but we do not have an English to German dictionary available. However, we do have an English to French dictionary, and a French to German dictionary available, and they look in part like:

    English    French             French    German
    star       étoile             étoile    Stern
    arrow      flèche             flèche    Pfeil
We could translate from English to German by first translating from
English to French, and then translating from French to German. Mathemat-
ically, translating from English to German is the composition of translating
from English to French followed by translating from French to German.
We now turn from linguistics to mathematics.
Let $V$ be an $n$-dimensional vector space with bases $\mathcal{B} = \{v_1, \dots, v_n\}$ and $\mathcal{C} = \{w_1, \dots, w_n\}$. Then we have isomorphisms $S \colon V \to F^n$ given by $S(v) = [v]_\mathcal{B}$, and $T \colon V \to F^n$ given by $T(v) = [v]_\mathcal{C}$. The composition $T \circ S^{-1} \colon F^n \to F^n$ is then an isomorphism, and $T \circ S^{-1}([v]_\mathcal{B}) = [v]_\mathcal{C}$. By Lemma 1.1.12, this isomorphism is given by multiplication by a unique (invertible) matrix. We make the following definition.

Definition 2.3.1. Let $V$ be an $n$-dimensional vector space with bases $\mathcal{B} = \{v_1, \dots, v_n\}$ and $\mathcal{C} = \{w_1, \dots, w_n\}$. The change of basis matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ is the unique matrix such that
$$P_{\mathcal{C} \leftarrow \mathcal{B}}\, [v]_\mathcal{B} = [v]_\mathcal{C}$$
for every $v \in V$. $\Diamond$
It is easy to write down, at least in principle, $P_{\mathcal{C} \leftarrow \mathcal{B}}$.

Lemma 2.3.2. In the situation of Definition 2.3.1, the matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ is given by
$$P_{\mathcal{C} \leftarrow \mathcal{B}} = \big[\, [v_1]_\mathcal{C} \mid [v_2]_\mathcal{C} \mid \cdots \mid [v_n]_\mathcal{C} \,\big],$$
i.e., $P_{\mathcal{C} \leftarrow \mathcal{B}}$ is the matrix whose $i$th column is $[v_i]_\mathcal{C}$.
Proof. By Lemma 1.2.23, we need only verify the equation $P_{\mathcal{C} \leftarrow \mathcal{B}} [v]_\mathcal{B} = [v]_\mathcal{C}$ for $v = v_i$, $i = 1, \dots, n$. But $[v_i]_\mathcal{B} = e_i$ and $P_{\mathcal{C} \leftarrow \mathcal{B}} e_i$ is the $i$th column of $P_{\mathcal{C} \leftarrow \mathcal{B}}$, i.e., $P_{\mathcal{C} \leftarrow \mathcal{B}} [v_i]_\mathcal{B} = P_{\mathcal{C} \leftarrow \mathcal{B}} e_i = [v_i]_\mathcal{C}$, as required.

Remark 2.3.3. If we think of $\mathcal{B}$ as the “old” basis, i.e., the one we are translating from, and $\mathcal{C}$ as the “new” basis, i.e., the one we are translating to, then this lemma says that in order to solve the translation problem for an arbitrary vector $v \in V$, we need only solve the translation problem for the old basis vectors, and write down their translations in successive columns to form a matrix. Then multiplication by that matrix does translation for every vector. $\Diamond$
We have a theorem that parallels our discussion of translation between human languages.

Theorem 2.3.4. Let $V$ be a finite-dimensional vector space.
(1) For any basis $\mathcal{B}$ of $V$, $P_{\mathcal{B} \leftarrow \mathcal{B}} = I$ is the identity matrix.
(2) For any two bases $\mathcal{B}$ and $\mathcal{C}$ of $V$, $P_{\mathcal{C} \leftarrow \mathcal{B}}$ is invertible and $(P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1} = P_{\mathcal{B} \leftarrow \mathcal{C}}$.
(3) For any three bases $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$ of $V$, $P_{\mathcal{D} \leftarrow \mathcal{B}} = P_{\mathcal{D} \leftarrow \mathcal{C}}\, P_{\mathcal{C} \leftarrow \mathcal{B}}$.

Proof. (1) For any $v \in V$,
$$[v]_\mathcal{B} = I [v]_\mathcal{B} = P_{\mathcal{B} \leftarrow \mathcal{B}} [v]_\mathcal{B},$$
so $P_{\mathcal{B} \leftarrow \mathcal{B}} = I$.

(2) For any $v \in V$,
$$(P_{\mathcal{B} \leftarrow \mathcal{C}} P_{\mathcal{C} \leftarrow \mathcal{B}}) [v]_\mathcal{B} = P_{\mathcal{B} \leftarrow \mathcal{C}} (P_{\mathcal{C} \leftarrow \mathcal{B}} [v]_\mathcal{B}) = P_{\mathcal{B} \leftarrow \mathcal{C}} [v]_\mathcal{C} = [v]_\mathcal{B},$$
so $P_{\mathcal{B} \leftarrow \mathcal{C}} P_{\mathcal{C} \leftarrow \mathcal{B}} = I$, and similarly $P_{\mathcal{C} \leftarrow \mathcal{B}} P_{\mathcal{B} \leftarrow \mathcal{C}} = I$, so $(P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1} = P_{\mathcal{B} \leftarrow \mathcal{C}}$.

(3) $P_{\mathcal{D} \leftarrow \mathcal{B}}$ is the unique matrix with $P_{\mathcal{D} \leftarrow \mathcal{B}} [v]_\mathcal{B} = [v]_\mathcal{D}$. But
$$(P_{\mathcal{D} \leftarrow \mathcal{C}} P_{\mathcal{C} \leftarrow \mathcal{B}}) [v]_\mathcal{B} = P_{\mathcal{D} \leftarrow \mathcal{C}} (P_{\mathcal{C} \leftarrow \mathcal{B}} [v]_\mathcal{B}) = P_{\mathcal{D} \leftarrow \mathcal{C}} [v]_\mathcal{C} = [v]_\mathcal{D},$$
so $P_{\mathcal{D} \leftarrow \mathcal{B}} = P_{\mathcal{D} \leftarrow \mathcal{C}} P_{\mathcal{C} \leftarrow \mathcal{B}}$.
Remark 2.3.5. There is no uniform notation for $P_{\mathcal{C} \leftarrow \mathcal{B}}$. We have chosen a notation that we feel is mnemonic: $P_{\mathcal{C} \leftarrow \mathcal{B}} [v]_\mathcal{B} = [v]_\mathcal{C}$, as the subscript “$\mathcal{B}$” of $[v]_\mathcal{B}$ is near the “$\mathcal{B}$” in the subscript “$\mathcal{C} \leftarrow \mathcal{B}$” of $P_{\mathcal{C} \leftarrow \mathcal{B}}$, and this subscript goes to “$\mathcal{C}$”, which is the subscript in the answer $[v]_\mathcal{C}$. Some other authors denote $P_{\mathcal{C} \leftarrow \mathcal{B}}$ by $P^\mathcal{B}_\mathcal{C}$ and some by $P^\mathcal{C}_\mathcal{B}$. The reader should pay careful attention to the author's notation, as interchanging the two bases takes the change of basis matrix to its inverse. $\Diamond$
Remark 2.3.6. (1) There is one case in which the change of basis matrix is easy to write down. Suppose $V = F^n$, $\mathcal{B} = \{v_1, \dots, v_n\}$ is a basis of $V$, and $\mathcal{E} = \{e_1, \dots, e_n\}$ is the standard basis of $V$. Then, by Example 2.1.4(1), $[v_i]_\mathcal{E} = v_i$, so
$$P_{\mathcal{E} \leftarrow \mathcal{B}} = [v_1 \mid v_2 \mid \cdots \mid v_n].$$
Thus, the change of basis matrix into the standard basis is easy to find.

(2) It is more often the case that we wish to find the change of basis matrix out of the standard basis, i.e., we wish to find $P_{\mathcal{B} \leftarrow \mathcal{E}}$. Then it requires work to find $[e_i]_\mathcal{B}$. Instead we may write down $P_{\mathcal{E} \leftarrow \mathcal{B}}$ as in (1) and then find $P_{\mathcal{B} \leftarrow \mathcal{E}}$ by $P_{\mathcal{B} \leftarrow \mathcal{E}} = (P_{\mathcal{E} \leftarrow \mathcal{B}})^{-1}$.

(3) Suppose we have two bases $\mathcal{B}$ and $\mathcal{C}$ of $F^n$, neither of which is the standard basis. We may find $P_{\mathcal{C} \leftarrow \mathcal{B}}$ directly, or else we may find $P_{\mathcal{C} \leftarrow \mathcal{B}}$ by $P_{\mathcal{C} \leftarrow \mathcal{B}} = P_{\mathcal{C} \leftarrow \mathcal{E}} P_{\mathcal{E} \leftarrow \mathcal{B}} = (P_{\mathcal{E} \leftarrow \mathcal{C}})^{-1} P_{\mathcal{E} \leftarrow \mathcal{B}}$. $\Diamond$
Lemma 2.3.7. Let $P$ be an $n$-by-$n$ matrix. Then $P$ is a change of basis matrix between two bases of $F^n$ if and only if $P$ is invertible.

Proof. Let $P = (p_{ij})$. Choose a basis $\mathcal{C} = \{w_1, \dots, w_n\}$ of $V$. Let $v_i = \sum_j p_{ji} w_j$. Then $\mathcal{B} = \{v_1, \dots, v_n\}$ is a basis of $V$ if and only if $P$ is invertible, in which case $P = P_{\mathcal{C} \leftarrow \mathcal{B}}$.

Remark 2.3.8. Comparing Lemma 2.2.2 and Lemma 2.3.2, we observe that $P_{\mathcal{C} \leftarrow \mathcal{B}} = [I]_{\mathcal{C} \leftarrow \mathcal{B}}$, where $I \colon F^n \to F^n$ is the identity linear transformation ($I(v) = v$ for every $v$ in $F^n$). $\Diamond$
Example 2.3.9. Let $V = \mathbb{R}^2$, $\mathcal{E} = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$, and $\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \end{bmatrix} \right\}$.

Let $v_1 = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$, so also $[v_1]_\mathcal{E} = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$. We computed directly in Example 2.1.4(3) that $[v_1]_\mathcal{B} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$. Let $v_2 = \begin{bmatrix} 27 \\ 62 \end{bmatrix}$, so also $[v_2]_\mathcal{E} = \begin{bmatrix} 27 \\ 62 \end{bmatrix}$. We computed directly in Example 2.1.4(3) that $[v_2]_\mathcal{B} = \begin{bmatrix} 3 \\ 8 \end{bmatrix}$.

We know from Remark 2.3.6(1) that $P_{\mathcal{E} \leftarrow \mathcal{B}} = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}$ and from Remark 2.3.6(2) that $P_{\mathcal{B} \leftarrow \mathcal{E}} = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}^{-1} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix}$. Then we can easily verify that
$$\begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 17 \\ 39 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 3 \\ 8 \end{bmatrix} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 27 \\ 62 \end{bmatrix}. \quad \Diamond$$
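A quick numerical check of this example and of Remark 2.3.6(2) (in Python with NumPy; an illustration, not part of the text):

```python
import numpy as np

PE_B = np.array([[1.0, 3.0],
                 [2.0, 7.0]])        # P_{E <- B}: columns are the B vectors
PB_E = np.linalg.inv(PE_B)           # P_{B <- E} = (P_{E <- B})^{-1}

print(PB_E @ [17.0, 39.0])           # [2. 5.] = [v_1]_B
print(PB_E @ [27.0, 62.0])           # [3. 8.] = [v_2]_B
```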
We shall see further particularly interesting examples of change of basis
matrices in Example 2.3.17.
Now we wish to investigate change of basis for linear transformations.
Again we will return to our metaphor of language, and see how linguistic
transformations work.
Let $T$ be the transformation that takes an object to several of the same objects, $T(\star) = \star\star\star\star$, $T({\to}) = {\to}{\to}{\to}\cdots{\to}$.
This is reflected in the linguistic transformation of taking the plural.
Suppose we wish to take the plural of German words, but we do not know
how. We consult our German to English and English to German dictionar-
ies:
    German    English            English    German
    Stern     star               star       Stern
    Sterne    stars              stars      Sterne
    Pfeil     arrow              arrow      Pfeil
    Pfeile    arrows             arrows     Pfeile
We thus see that to take the plural of the German word Stern, we may
translate Stern into the English word star, take the plural (i.e., apply our
linguistic transformation) of the English word star, and translate this word
into German to obtain Sterne, the plural of the German word Stern. Simi-
larly, the path Pfeil → arrow → arrows → Pfeile gives us the plural of the
German word Pfeil.
The mathematical analog of this conclusion is the following theorem.

Theorem 2.3.10. Let $V$ be an $n$-dimensional vector space and let $T \colon V \to V$ be a linear transformation. Let $\mathcal{B}$ and $\mathcal{C}$ be any two bases of $V$. Then
$$[T]_\mathcal{C} = P_{\mathcal{C} \leftarrow \mathcal{B}}\, [T]_\mathcal{B}\, P_{\mathcal{B} \leftarrow \mathcal{C}}.$$

Proof. For any vector $v \in V$,
$$(P_{\mathcal{C} \leftarrow \mathcal{B}} [T]_\mathcal{B} P_{\mathcal{B} \leftarrow \mathcal{C}}) [v]_\mathcal{C} = P_{\mathcal{C} \leftarrow \mathcal{B}} [T]_\mathcal{B} (P_{\mathcal{B} \leftarrow \mathcal{C}} [v]_\mathcal{C}) = P_{\mathcal{C} \leftarrow \mathcal{B}} ([T]_\mathcal{B} [v]_\mathcal{B}) = P_{\mathcal{C} \leftarrow \mathcal{B}} [T(v)]_\mathcal{B} = [T(v)]_\mathcal{C}.$$
But $[T]_\mathcal{C}$ is the unique matrix with
$$[T]_\mathcal{C} [v]_\mathcal{C} = [T(v)]_\mathcal{C}$$
for every $v \in V$, so we see that $[T]_\mathcal{C} = P_{\mathcal{C} \leftarrow \mathcal{B}} [T]_\mathcal{B} P_{\mathcal{B} \leftarrow \mathcal{C}}$.
Corollary 2.3.11. In the situation of Theorem 2.3.10,
$$[T]_\mathcal{C} = (P_{\mathcal{B} \leftarrow \mathcal{C}})^{-1} [T]_\mathcal{B} P_{\mathcal{B} \leftarrow \mathcal{C}} = P_{\mathcal{C} \leftarrow \mathcal{B}} [T]_\mathcal{B} (P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1}.$$

Proof. Immediate from Theorem 2.3.10 and Theorem 2.3.4(2).

We are thus led to the following very important definition. (A priori, this definition may seem very unlikely, but in light of our development it is almost forced on us.)

Definition 2.3.12. Two $n$-by-$n$ matrices $A$ and $B$ are similar if there is an invertible matrix $P$ with
$$A = P^{-1} B P. \quad \Diamond$$

Remark 2.3.13. It is easy to check that similarity is an equivalence relation. $\Diamond$
The importance of this definition comes from the following theorem.

Theorem 2.3.14. Let $A$ and $B$ be $n$-by-$n$ matrices. Then $A$ and $B$ are similar if and only if they are matrices of the same linear transformation $T \colon F^n \to F^n$ with respect to a pair of bases of $F^n$.

Proof. Immediate from Corollary 2.3.11.
There is an alternate point of view.

Theorem 2.3.15. Let $V$ be a finite-dimensional vector space and let $S \colon V \to V$ and $T \colon V \to V$ be linear transformations. Then $S$ and $T$ are conjugate (i.e., $T = R^{-1} S R$ for some invertible linear transformation $R \colon V \to V$) if and only if there are bases $\mathcal{B}$ and $\mathcal{C}$ of $V$ with
$$[S]_\mathcal{B} = [T]_\mathcal{C}.$$

Proof. If $[S]_\mathcal{B} = [T]_\mathcal{C}$, then by Corollary 2.3.11
$$[S]_\mathcal{B} = [T]_\mathcal{C} = P_{\mathcal{C} \leftarrow \mathcal{B}} [T]_\mathcal{B} (P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1},$$
so $[S]_\mathcal{B}$ and $[T]_\mathcal{B}$ are conjugate by the matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ and hence, since a linear transformation is determined by its matrix in any basis, $S$ and $T$ are conjugate. Conversely, if $T = R^{-1} S R$ then
$$[T]_\mathcal{E} = [R^{-1}]_\mathcal{E}\, [S]_\mathcal{E}\, [R]_\mathcal{E},$$
but $[R]_\mathcal{E}$, being an invertible matrix, is a change of basis matrix $P_{\mathcal{C} \leftarrow \mathcal{E}}$ for some basis $\mathcal{C}$. Then
$$[T]_\mathcal{E} = (P_{\mathcal{C} \leftarrow \mathcal{E}})^{-1} [S]_\mathcal{E} P_{\mathcal{C} \leftarrow \mathcal{E}},$$
so
$$P_{\mathcal{C} \leftarrow \mathcal{E}} [T]_\mathcal{E} (P_{\mathcal{C} \leftarrow \mathcal{E}})^{-1} = [S]_\mathcal{E},$$
i.e.,
$$[T]_\mathcal{C} = [S]_\mathcal{E}.$$
Example 2.3.16. Let $T \colon \mathbb{R}^2 \to \mathbb{R}^2$ be $T = T_A$, where $A = \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix}$. Let $\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \end{bmatrix} \right\}$, a basis of $\mathbb{R}^2$. Then $[T]_\mathcal{B} = P_{\mathcal{B} \leftarrow \mathcal{E}} [T]_\mathcal{E} P_{\mathcal{E} \leftarrow \mathcal{B}} = (P_{\mathcal{E} \leftarrow \mathcal{B}})^{-1} [T]_\mathcal{E} P_{\mathcal{E} \leftarrow \mathcal{B}}$. Since $[T]_\mathcal{E} = A$ we see that
$$[T]_\mathcal{B} = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}^{-1} \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 8 \end{bmatrix},$$
verifying the result of Example 2.2.10, where we computed $[T]_\mathcal{B}$ directly. $\Diamond$
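The same verification, done numerically (in Python with NumPy; an illustration, not part of the text):

```python
import numpy as np

A = np.array([[65.0, -24.0],
              [149.0, -55.0]])       # [T]_E
P = np.array([[1.0, 3.0],
              [2.0, 7.0]])           # P_{E <- B}

print(np.linalg.inv(P) @ A @ P)      # [[2. 3.], [5. 8.]] = [T]_B
```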
Example 2.3.17. Let $V = \mathcal{P}_n(\mathbb{R})$ and let $\mathcal{B}$ and $\mathcal{C}$ be the bases
$$\mathcal{B} = \{1, x, x^{(2)}, x^{(3)}, \dots, x^{(n)}\},$$
where $x^{(i)} = x(x-1)(x-2) \cdots (x-i+1)$, and
$$\mathcal{C} = \{1, x, x^2, \dots, x^n\}.$$
Let $P = (p_{ij}) = P_{\mathcal{C} \leftarrow \mathcal{B}}$ and $Q = (q_{ij}) = P_{\mathcal{B} \leftarrow \mathcal{C}} = P^{-1}$. The entries $p_{ij}$ are called Stirling numbers of the first kind and the entries $q_{ij}$ are called Stirling numbers of the second kind. Here we number the rows/columns of the respective matrices from $0$ to $n$, not from $1$ to $n+1$. For example, if $n = 5$ we have
$$P = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 2 & -6 & 24 \\
0 & 0 & 1 & -3 & 11 & -50 \\
0 & 0 & 0 & 1 & -6 & 35 \\
0 & 0 & 0 & 0 & 1 & -10 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\quad \text{and} \quad
Q = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 3 & 7 & 15 \\
0 & 0 & 0 & 1 & 6 & 25 \\
0 & 0 & 0 & 0 & 1 & 10 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$
(The numbers $p_{ij}$ and $q_{ij}$ are independent of $n$ as long as $i, j \leq n$.) $\Diamond$
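These matrices are easy to generate by machine; here is a short sketch (in Python with SymPy; the variable names are ours). Expanding each falling factorial $x^{(i)}$ in the monomial basis fills the columns of $P = P_{\mathcal{C} \leftarrow \mathcal{B}}$, and $Q = P^{-1}$.

```python
import sympy as sp

n = 5
x = sp.symbols('x')
falling = [sp.Integer(1)]
for i in range(1, n + 1):                       # x^(i) = x(x-1)...(x-i+1)
    falling.append(sp.expand(falling[-1] * (x - (i - 1))))

P = sp.Matrix(n + 1, n + 1, lambda r, c: falling[c].coeff(x, r))
Q = P.inv()                                     # Stirling numbers, second kind
print(P)
print(Q)
```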
Example 2.3.18. Let $V = \mathcal{P}_5(\mathbb{R})$ with bases $\mathcal{B} = \{1, x, \dots, x^{(5)}\}$ and $\mathcal{C} = \{1, x, \dots, x^5\}$ as in Example 2.3.17.

(1) Let $D \colon V \to V$ be differentiation, $D(p(x)) = p'(x)$. Then
$$[D]_\mathcal{B} = \begin{bmatrix}
0 & 1 & -1 & 2 & -6 & 24 \\
0 & 0 & 2 & -3 & 8 & -30 \\
0 & 0 & 0 & 3 & -6 & 20 \\
0 & 0 & 0 & 0 & 4 & -10 \\
0 & 0 & 0 & 0 & 0 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad \text{and} \quad
[D]_\mathcal{C} = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & 0 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},$$
so these two matrices are similar. Indeed,
$$[D]_\mathcal{B} = P^{-1} [D]_\mathcal{C} P = Q [D]_\mathcal{C} Q^{-1},$$
where $P$ and $Q$ are the matrices of Example 2.3.17.

(2) Let $\Delta \colon V \to V$ be the forward difference operator, $\Delta(p(x)) = p(x+1) - p(x)$. Then
$$[\Delta]_\mathcal{B} = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & 0 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad \text{and} \quad
[\Delta]_\mathcal{C} = \begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 2 & 3 & 4 & 5 \\
0 & 0 & 0 & 3 & 6 & 10 \\
0 & 0 & 0 & 0 & 4 & 10 \\
0 & 0 & 0 & 0 & 0 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},$$
so these two matrices are similar. Again,
$$[\Delta]_\mathcal{B} = P^{-1} [\Delta]_\mathcal{C} P = Q [\Delta]_\mathcal{C} Q^{-1},$$
where $P$ and $Q$ are the matrices of Example 2.3.17.

(3) Since $[D]_\mathcal{C} = [\Delta]_\mathcal{B}$, we see that $D \colon V \to V$ and $\Delta \colon V \to V$ are conjugate. $\Diamond$
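The conjugation in (1) can be checked by machine, in the same style as the sketch after Example 2.3.17 (Python with SymPy; an illustration under our naming, not part of the text):

```python
import sympy as sp

n = 5
x = sp.symbols('x')
falling = [sp.Integer(1)]
for i in range(1, n + 1):
    falling.append(sp.expand(falling[-1] * (x - (i - 1))))
P = sp.Matrix(n + 1, n + 1, lambda r, c: falling[c].coeff(x, r))   # P_{C <- B}
DC = sp.Matrix(n + 1, n + 1, lambda r, c: c if c == r + 1 else 0)  # [D]_C
print(P.inv() * DC * P)                                            # [D]_B
```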
2.4 The matrix of the dual

Let $T \colon V \to X$ be a linear transformation between finite-dimensional vector spaces. Once we choose bases $\mathcal{B}$ and $\mathcal{C}$ of $V$ and $X$ respectively, we can represent $T$ by a unique matrix $[T]_{\mathcal{C} \leftarrow \mathcal{B}}$. We also have the dual linear transformation $T^* \colon X^* \to V^*$ and the dual bases $\mathcal{C}^*$ and $\mathcal{B}^*$ of $X^*$ and $V^*$ respectively, and it is natural to consider the matrix $[T^*]_{\mathcal{B}^* \leftarrow \mathcal{C}^*}$.
Definition 2.4.1. Let $T \colon V \to X$ be a linear transformation between finite-dimensional vector spaces, and let $A$ be the matrix $A = [T]_{\mathcal{C} \leftarrow \mathcal{B}}$. The transpose of $A$ is the matrix ${}^t A$ given by ${}^t A = [T^*]_{\mathcal{B}^* \leftarrow \mathcal{C}^*}$. $\Diamond$

Let us first see that this gives the usual definition of the transpose of a matrix.

Lemma 2.4.2. Let $A = (a_{ij})$ be an $m$-by-$n$ matrix. Then $B = {}^t A = (b_{ij})$ is the $n$-by-$m$ matrix with entries $b_{ij} = a_{ji}$, $i = 1, \dots, n$, $j = 1, \dots, m$.
Proof. Let $\mathcal{B} = \{v_1, \dots, v_n\}$, $\mathcal{B}^* = \{w_1^*, \dots, w_n^*\}$, $\mathcal{C} = \{x_1, \dots, x_m\}$, and $\mathcal{C}^* = \{y_1^*, \dots, y_m^*\}$. Then, by definition,
$$T(v_j) = \sum_{k=1}^m a_{kj} x_k \quad \text{for } j = 1, \dots, n$$
and
$$T^*(y_i^*) = \sum_{k=1}^n b_{ki} w_k^* \quad \text{for } i = 1, \dots, m.$$
Now
$$y_i^*(T(v_j)) = a_{ij}, \quad \text{as } y_i^*(x_i) = 1,\ y_i^*(x_k) = 0 \text{ for } k \neq i,$$
and
$$(T^*(y_i^*))(v_j) = b_{ji}, \quad \text{as } w_j^*(v_j) = 1,\ w_k^*(v_j) = 0 \text{ for } k \neq j.$$
By the definition of $T^*$, for any $y^* \in X^*$ and any $v \in V$,
$$(T^*(y^*))(v) = y^*(T(v)),$$
so we see $b_{ji} = a_{ij}$, as claimed.
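A small numerical illustration of this lemma (in Python with NumPy; an illustration, not part of the text): writing a functional $y^*$ in dual coordinates as a row vector $y$, the pullback $T^*(y^*)$ acts on coordinates by $v \mapsto (yA)v$, so the matrix of $T^*$ in the dual bases is the transpose of $A$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # [T]_{C <- B}, T: F^3 -> F^2
y = np.array([7.0, -1.0])           # coordinates of y* in C*
v = np.array([1.0, 0.0, 2.0])       # coordinates of v in B

print(y @ (A @ v), (A.T @ y) @ v)   # equal: y*(T(v)) = (T*(y*))(v)
```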
Remark 2.4.3. Every matrix is the matrix of a linear transformation with respect to a pair of bases, so ${}^t A$ is defined for any matrix $A$. Our definition appears to depend on the choice of the bases $\mathcal{B}$ and $\mathcal{C}$, so to see that ${}^t A$ is well-defined we must show it is independent of the choice of bases. This follows from first principles, but it is easier to observe that Lemma 2.4.2 gives a formula for ${}^t A$ that is independent of the choice of bases. $\Diamond$

Remark 2.4.4. It is easy to see that ${}^t(A_1 + A_2) = {}^t A_1 + {}^t A_2$ and that ${}^t(cA) = c\, {}^t A$. $\Diamond$

Other properties of the transpose are a little more subtle.
Lemma 2.4.5. ${}^t(AB) = {}^t B\, {}^t A$.

Proof. Let $T \colon V \to X$ with $[T]_{\mathcal{C} \leftarrow \mathcal{B}} = B$ and let $S \colon X \to Z$ with $[S]_{\mathcal{D} \leftarrow \mathcal{C}} = A$. Then, as we have seen, $S \circ T \colon V \to Z$ with $[S \circ T]_{\mathcal{D} \leftarrow \mathcal{B}} = AB$. By Definition 2.4.1 and Lemma 1.6.23,
$${}^t(AB) = [(S \circ T)^*]_{\mathcal{B}^* \leftarrow \mathcal{D}^*} = [T^* \circ S^*]_{\mathcal{B}^* \leftarrow \mathcal{D}^*} = [T^*]_{\mathcal{B}^* \leftarrow \mathcal{C}^*}\, [S^*]_{\mathcal{C}^* \leftarrow \mathcal{D}^*} = {}^t B\, {}^t A.$$

Lemma 2.4.6. Let $A$ be an invertible matrix. Then ${}^t(A^{-1}) = ({}^t A)^{-1}$.

Proof. Clearly if $T \colon V \to V$ is the identity, then $T^* \colon V^* \to V^*$ is the identity ($w^*(T(v)) = w^*(v) = (T^*(w^*))(v)$ if $T$ and $T^*$ are both the respective identities). Choose a basis $\mathcal{B}$ of $V$ and let $R \colon V \to V$ be the linear transformation with $[R]_\mathcal{B} = A$. Then $[R^{-1}]_\mathcal{B} = A^{-1}$, and
$$I = [I]_{\mathcal{B}^*} = [I^*]_{\mathcal{B}^*} = [(R^{-1} \circ R)^*]_{\mathcal{B}^*} = {}^t[R]_\mathcal{B}\, {}^t[R^{-1}]_\mathcal{B} = {}^t A\, {}^t(A^{-1}),$$
and
$$I = [I]_{\mathcal{B}^*} = [I^*]_{\mathcal{B}^*} = [(R \circ R^{-1})^*]_{\mathcal{B}^*} = {}^t(A^{-1})\, {}^t A.$$
As an application of these ideas, we have a theorem from elementary linear algebra.

Theorem 2.4.7. Let $A$ be an $m$-by-$n$ matrix. Then the row rank of $A$ and the column rank of $A$ are equal.

Proof. Let $T = T_A \colon F^n \to F^m$ be given by $T(v) = Av$. Then $[T]_{\mathcal{E}_m \leftarrow \mathcal{E}_n} = A$, so the column rank of $A$, which is the dimension of the subspace of $F^m$ spanned by the columns of $A$, is the dimension of the subspace $\operatorname{Im}(T)$ of $F^m$.

Consider the dual $T^* \colon (F^m)^* \to (F^n)^*$. As we have seen, $[T^*]_{\mathcal{E}_n^* \leftarrow \mathcal{E}_m^*} = {}^t A$, so the column rank of ${}^t A$ is equal to the dimension of $\operatorname{Im}(T^*)$. By Corollary 1.6.20, $\dim \operatorname{Im}(T^*) = \dim \operatorname{Im}(T)$, and obviously the column space of ${}^t A$ is identical to the row space of $A$.
We have considered the dual. Now let us consider the double dual. In
Lemma 1.6.26 we defined the linear transformation Hfrom a vector space
to its double dual.
Lemma 2.4.8. Let $T \colon V \to X$ be a linear transformation between finite-dimensional $F$-vector spaces. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ be a basis of $V$ and $\mathcal{C} = \{x_1, \dots, x_m\}$ be a basis of $X$.

Let $\mathcal{B}^{**} = \{v_1^{**}, \dots, v_n^{**}\}$ and $\mathcal{C}^{**} = \{x_1^{**}, \dots, x_m^{**}\}$, bases of $V^{**}$ and $X^{**}$ respectively (where $v_i^{**} = H(v_i)$ and $x_j^{**} = H(x_j)$). Then
$$[T^{**}]_{\mathcal{C}^{**} \leftarrow \mathcal{B}^{**}} = [T]_{\mathcal{C} \leftarrow \mathcal{B}}.$$

Proof. An inspection of Definition 1.6.29 shows that $T^{**}$ is the composition $H \circ T \circ H^{-1}$, where the right-hand $H$ is $H \colon V \to V^{**}$ and the left-hand $H$ is $H \colon X \to X^{**}$. But $[H]_{\mathcal{B}^{**} \leftarrow \mathcal{B}} = I$ and $[H]_{\mathcal{C}^{**} \leftarrow \mathcal{C}} = I$, so
$$[T^{**}]_{\mathcal{C}^{**} \leftarrow \mathcal{B}^{**}} = [H]_{\mathcal{C}^{**} \leftarrow \mathcal{C}}\, [T]_{\mathcal{C} \leftarrow \mathcal{B}}\, [H^{-1}]_{\mathcal{B} \leftarrow \mathcal{B}^{**}} = I\, [T]_{\mathcal{C} \leftarrow \mathcal{B}}\, I^{-1} = [T]_{\mathcal{C} \leftarrow \mathcal{B}}.$$
The following corollary is obvious from direct computation, but we present another proof.

Corollary 2.4.9. Let $A$ be an $m$-by-$n$ matrix. Then ${}^t({}^t A) = A$.

Proof. Let $T \colon V \to X$ be a linear transformation with $[T]_{\mathcal{C} \leftarrow \mathcal{B}} = A$. Then by Lemma 2.4.8,
$$A = [T]_{\mathcal{C} \leftarrow \mathcal{B}} = [T^{**}]_{\mathcal{C}^{**} \leftarrow \mathcal{B}^{**}} = {}^t\big({}^t [T]_{\mathcal{C} \leftarrow \mathcal{B}}\big) = {}^t({}^t A),$$
as $T^{**}$ is the dual of the dual of $T$.
CHAPTER 3
Determinants
In this chapter we deal with the determinant of a square matrix. The de-
terminant has a simple geometric meaning, that of signed volume, and we
use that to develop it in Section 3.1. We then present a more traditional and
fuller development in Section 3.2. In Section 3.3 we derive important and
useful properties of the determinant. In Section 3.4 we consider integrality
questions, e.g., the question of the existence of integer (not just rational) solutions of the linear system $Ax = b$, a question best answered using determinants.
the meaning of the sign of the determinant in the case of real vector spaces.
In Section 3.6 we present an interesting family of examples, the Hilbert
matrices.
3.1 The geometry of volumes
The determinant of a matrix $A$ has a simple geometric meaning. It is the (signed) volume of the image of the unit cube under the linear transformation $T_A$.
We will begin by doing some elementary geometry to see what proper-
ties (signed) volume should have, and use that as the basis for the not-so-
simple algebraic definition.
Henceforth we drop the word “signed” and just refer to volume.
In considering properties that volume should have, suppose we are working in $\mathbb{R}^2$, where volume is area. Let $A$ be the matrix $A = [v_1 \mid v_2]$. The unit square in $\mathbb{R}^2$ is the parallelogram determined by the standard unit vectors $e_1$ and $e_2$. $T_A(e_1) = v_1$ and $T_A(e_2) = v_2$, so we are looking at the area of the parallelogram $P$ determined by $v_1$ and $v_2$, the two columns of $A$.
The area of a parallelogram should certainly have the following two properties:

(1) If we multiply one side of $P$ by a number $c$, e.g., if we replace $P$ by the parallelogram $P'$ determined by $v_1$ and $cv_2$, the area of $P'$ should be $c$ times the area of $P$.

(2) If we add a multiple of one side of $P$ to another, e.g., if we replace $P$ by the parallelogram $P'$ determined by $v_1$ and $v_2 + cv_1$, the area of $P'$ should be the same as the area of $P$. (To see this, note that the area of a parallelogram is base times height, and while this operation changes the shape of the parallelogram, it does not change its base or its height.)

Property (1) should in particular hold if $c = 0$, when one of the sides becomes the zero vector, in which case the parallelogram degenerates to a line (or to a point if both sides are the zero vector), and a line or a point has area 0.
We now consider an arbitrary field $F$, and consider $n$-by-$n$ matrices. We are still guided by properties (1) and (2), extending them to $n$-by-$n$ matrices using the idea that if only one or two columns are changed as in (1) or (2), and the other $n-1$ or $n-2$ columns are unchanged, then the volume should change as in (1) or (2). We are thus led to the following definition.

Definition 3.1.1. A volume function $\operatorname{Vol}: M_n(F) \to F$ is a function satisfying the properties:

(1) For any scalar $c$, and any $i$,
$$\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid cv_i \mid v_{i+1} \mid \cdots \mid v_n]\bigr) = c\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid v_i \mid v_{i+1} \mid \cdots \mid v_n]\bigr).$$

(2) For any scalar $c$, and any $j \neq i$,
$$\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid v_i + cv_j \mid v_{i+1} \mid \cdots \mid v_n]\bigr) = \operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid v_i \mid v_{i+1} \mid \cdots \mid v_n]\bigr).$$

Note we have not shown that Vol exists, but we will proceed on the assumption it does to derive properties that it must have, and we will use them to prove existence.

As we have defined it, Vol cannot be unique, as we can scale it by an arbitrary factor. Once we specify the scale we obtain a unique function that we will denote by $\operatorname{Vol}_1$, and we will let the determinant be $\operatorname{Vol}_1$. But it is convenient to work with arbitrary volume functions and normalize the result
at the end. $\operatorname{Vol}_1$ (or the determinant) will be Vol scaled so that the signed volume of the unit $n$-cube, with the columns arranged in the standard order, is $+1$. ◊
Lemma 3.1.2. (1) If some column of $A$ is zero, then $\operatorname{Vol}(A) = 0$.

(2) If the columns of $A$ are not linearly independent, then $\operatorname{Vol}(A) = 0$. In particular, if two columns of $A$ are equal, then $\operatorname{Vol}(A) = 0$.

(3) $\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_j \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = -\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_j \mid \cdots \mid v_n]\bigr)$.

(4) $\operatorname{Vol}\bigl([v_1 \mid \cdots \mid au + bw \mid \cdots \mid v_n]\bigr) = a\operatorname{Vol}\bigl([v_1 \mid \cdots \mid u \mid \cdots \mid v_n]\bigr) + b\operatorname{Vol}\bigl([v_1 \mid \cdots \mid w \mid \cdots \mid v_n]\bigr)$.

Proof. (1) Let $v_i = 0$. Then $v_i = 0 \cdot v_i$, so by property (1)
$$\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = 0 \cdot \operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = 0.$$

(2) Let $v_i = a_1v_1 + a_2v_2 + \cdots + a_{i-1}v_{i-1} + a_{i+1}v_{i+1} + \cdots + a_nv_n$. Let $v_i' = a_2v_2 + \cdots + a_{i-1}v_{i-1} + a_{i+1}v_{i+1} + \cdots + a_nv_n$, so that $v_i = a_1v_1 + v_i'$. Then, applying property (2),
$$\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = \operatorname{Vol}\bigl([v_1 \mid \cdots \mid a_1v_1 + v_i' \mid \cdots \mid v_n]\bigr) = \operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_i' \mid \cdots \mid v_n]\bigr).$$
Proceeding in the same way, applying property (2) repeatedly, we obtain
$$\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = \operatorname{Vol}\bigl([v_1 \mid \cdots \mid 0 \mid \cdots \mid v_n]\bigr) = 0.$$

(3)
$$\begin{aligned}
\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_j \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr)
&= \operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_j \mid \cdots \mid v_j + v_i \mid \cdots \mid v_n]\bigr) \\
&= \operatorname{Vol}\bigl([v_1 \mid \cdots \mid -v_i \mid \cdots \mid v_j + v_i \mid \cdots \mid v_n]\bigr) \\
&= \operatorname{Vol}\bigl([v_1 \mid \cdots \mid -v_i \mid \cdots \mid v_j \mid \cdots \mid v_n]\bigr) \\
&= -\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_j \mid \cdots \mid v_n]\bigr).
\end{aligned}$$
(4) First, suppose $\{v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n\}$ is not linearly independent. Then, by part (2), the equation in (4) becomes $0 = a \cdot 0 + b \cdot 0$, which is true.

Now for the heart of the proof. Suppose $\{v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n\}$ is linearly independent. By Corollary 1.2.10(1), we may extend this set to a basis $\{v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n, z\}$ of $F^n$. Then we may write
$$u = c_1v_1 + \cdots + c_{i-1}v_{i-1} + c_{i+1}v_{i+1} + \cdots + c_nv_n + c_0z,$$
$$w = d_1v_1 + \cdots + d_{i-1}v_{i-1} + d_{i+1}v_{i+1} + \cdots + d_nv_n + d_0z.$$
Let $v = au + bw$. Then
$$v = e_1v_1 + \cdots + e_{i-1}v_{i-1} + e_{i+1}v_{i+1} + \cdots + e_nv_n + e_0z$$
where $e_0 = ac_0 + bd_0$.

Applying property (2) repeatedly, and property (1), we see that
$$\begin{aligned}
\operatorname{Vol}\bigl([v_1 \mid \cdots \mid v \mid \cdots \mid v_n]\bigr) &= e_0\operatorname{Vol}\bigl([v_1 \mid \cdots \mid z \mid \cdots \mid v_n]\bigr),\\
\operatorname{Vol}\bigl([v_1 \mid \cdots \mid u \mid \cdots \mid v_n]\bigr) &= c_0\operatorname{Vol}\bigl([v_1 \mid \cdots \mid z \mid \cdots \mid v_n]\bigr),\\
\operatorname{Vol}\bigl([v_1 \mid \cdots \mid w \mid \cdots \mid v_n]\bigr) &= d_0\operatorname{Vol}\bigl([v_1 \mid \cdots \mid z \mid \cdots \mid v_n]\bigr),
\end{aligned}$$
yielding the theorem.
Remark 3.1.3. Setting $v_i = v_j = z$ ($z$ arbitrary) in Lemma 3.1.2(3) gives $2\operatorname{Vol}([v_1 \mid \cdots \mid z \mid \cdots \mid z \mid \cdots \mid v_n]) = 0$, and hence $\operatorname{Vol}([v_1 \mid \cdots \mid z \mid \cdots \mid z \mid \cdots \mid v_n]) = 0$ if $F$ does not have characteristic 2. This latter condition is stronger if $\operatorname{char}(F) = 2$, and it is this stronger condition, coming directly from the geometry, that we need. ◊
Theorem 3.1.4. A function $f: M_n(F) \to F$ is a volume function if and only if it satisfies:

(1) Multilinearity: If $A = [v_1 \mid \cdots \mid v_n]$ with $v_i = au + bw$ for some $i$, then
$$f\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = af\bigl([v_1 \mid \cdots \mid u \mid \cdots \mid v_n]\bigr) + bf\bigl([v_1 \mid \cdots \mid w \mid \cdots \mid v_n]\bigr).$$

(2) Alternation: If $A = [v_1 \mid \cdots \mid v_n]$ with $v_i = v_j$ for some $i \neq j$, then
$$f\bigl([v_1 \mid \cdots \mid v_n]\bigr) = 0.$$
Proof. We have seen that any volume function satisfies Lemma 3.1.2(2) and (4), which gives alternation and multilinearity. Conversely, it is easy to see that multilinearity and alternation give properties (1) and (2) in Definition 3.1.1.
Remark 3.1.5. The conditions of Theorem 3.1.4 are usually taken to be the definition of a volume function. ◊

Remark 3.1.6. In characteristic 2, the function $f\left(\begin{bmatrix} a & c \\ b & d \end{bmatrix}\right) = ac$ is multilinear and satisfies $f([v_2 \mid v_1]) = f([v_1 \mid v_2]) = -f([v_1 \mid v_2])$, but is not alternating. ◊
Theorem 3.1.7. Suppose there exists a nontrivial volume function $\operatorname{Vol}: M_n(F) \to F$. Then there is a unique volume function $\operatorname{Vol}_1$ satisfying $\operatorname{Vol}_1(I) = 1$. Furthermore, any volume function is $\operatorname{Vol}_a$ for some $a \in F$, where $\operatorname{Vol}_a$ is the function $\operatorname{Vol}_a(A) = a\operatorname{Vol}_1(A)$.

Proof. Let $A$ be a matrix with $\operatorname{Vol}(A) \neq 0$. Then, by Lemma 3.1.2(2), $A$ must be nonsingular. Then there is a sequence of elementary column operations taking $A$ to $I$. By Definition 3.1.1(1) and (2), and by Lemma 3.1.2(3), each of these operations has the effect of multiplying $\operatorname{Vol}(A)$ by a nonzero scalar, so $\operatorname{Vol}(I) \neq 0$.

Any scalar multiple of a volume function is a volume function, so we may obtain a volume function $\operatorname{Vol}_1$ by $\operatorname{Vol}_1(A) = (1/\operatorname{Vol}(I))\operatorname{Vol}(A)$, and clearly $\operatorname{Vol}_1(I) = 1$. Then set $\operatorname{Vol}_a(A) = a\operatorname{Vol}_1(A)$.

Now let $f$ be any volume function. Set $a = f(I)$. If $A$ is singular, then $f(A) = 0$. Suppose $A$ is nonsingular. Then there is a sequence of column operations taking $I$ to $A$, and each of these column operations has the effect of multiplying the value of any volume function by a nonzero constant independent of the choice of volume function. Thus, if we let $b$ be the product of these constants, we have
$$f(A) = bf(I) = ba = b\operatorname{Vol}_a(I) = \operatorname{Vol}_a(A),$$
so $f = \operatorname{Vol}_a$. In particular, if $f$ is any volume function with $f(I) = 1$, then $f = \operatorname{Vol}_1$, which shows that $\operatorname{Vol}_1$ is unique.

Note the proof of this theorem does not show that $\operatorname{Vol}_1$ exists, as a priori we could choose two different sequences of elementary column operations to get from $I$ to $A$ and obtain two different values for $\operatorname{Vol}_1(A)$. In fact $\operatorname{Vol}_1$ does exist, as we now see.
Theorem 3.1.8. There is a unique volume function $\operatorname{Vol}_1: M_n(F) \to F$ with $\operatorname{Vol}_1(I) = 1$.

Proof. We proceed by induction on $n$. For $n = 1$ we define $\det([a]) = a$. Suppose $\det$ is defined on $(n-1)$-by-$(n-1)$ matrices. We define $\det$ on $n$-by-$n$ matrices by
$$\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(M_{1j})$$
where $A = (a_{ij})$ and $M_{1j}$ is the $(n-1)$-by-$(n-1)$ matrix obtained by deleting row 1 and column $j$ of $A$. ($M_{1j}$ is known as the $(1,j)$-minor of $A$.)

We need to check that the properties of a volume function are satisfied. Instead of checking the properties in Definition 3.1.1 directly, we will check the equivalent properties in Theorem 3.1.4. We use the notation of that theorem.

We prove the properties of $\det$ by induction on $n$. We assume that $\det$ has the properties of a volume function given in Theorem 3.1.4 for $(n-1)$-by-$(n-1)$ matrices, and in particular that the conclusions of Lemma 3.1.2 hold for $\det$ on $(n-1)$-by-$(n-1)$ matrices.

We first prove multilinearity. In the notation of Theorem 3.1.4, let $v_i = au + bw$, and let $A = (a_{ij})$. Then $a_{1i} = au_1 + bw_1$, where $u_1$ and $w_1$ are the first entries of $u$ and $w$ respectively. Also, $M_{1i} = [\overline{v}_1 \mid \cdots \mid \overline{v}_{i-1} \mid \overline{v}_{i+1} \mid \cdots \mid \overline{v}_n]$ (see below for the notation $\overline{v}_k$). Inspecting the sum for $\det(A)$, and applying Lemma 3.1.2(4), we see that multilinearity holds.

We next prove alternation. Again follow the notation of Theorem 3.1.4 and let $v_i = v_j$ for some $i \neq j$. If $k \neq i$ and $k \neq j$, the minor $M_{1k}$ has two identical columns and so by Lemma 3.1.2(2), $\det(M_{1k}) = 0$. Then, inspecting the sum for $\det(A)$, we see that it reduces to
$$\det(A) = (-1)^{1+i} a_{1i}\det(M_{1i}) + (-1)^{1+j} a_{1j}\det(M_{1j})$$
with $a_{1i} = a_{1j}$. Let $i < j$. Then
$$M_{1i} = [\overline{v}_1 \mid \cdots \mid \overline{v}_{i-1} \mid \overline{v}_{i+1} \mid \cdots \mid \overline{v}_{j-1} \mid \overline{v}_j \mid \overline{v}_{j+1} \mid \cdots \mid \overline{v}_n]$$
and
$$M_{1j} = [\overline{v}_1 \mid \cdots \mid \overline{v}_{i-1} \mid \overline{v}_j \mid \overline{v}_{i+1} \mid \cdots \mid \overline{v}_{j-1} \mid \overline{v}_{j+1} \mid \cdots \mid \overline{v}_n],$$
where $\overline{v}_k$ is the vector obtained from $v_k$ by deleting its first entry, and $\overline{v}_i = \overline{v}_j$.

We may obtain $M_{1i}$ from $M_{1j}$ as follows: First interchange $\overline{v}_j$ with $\overline{v}_{i+1}$, then interchange $\overline{v}_j$ with $\overline{v}_{i+2}$, ..., and finally interchange $\overline{v}_j$ with $\overline{v}_{j-1}$. There is a total of $j - i - 1$ interchanges, and by Lemma 3.1.2(3) each interchange has the effect of multiplying $\det$ by $-1$, so we see that
$$\det(M_{1i}) = (-1)^{j-i-1}\det(M_{1j}).$$
Hence, letting $a = a_{1j}$ and $m = \det(M_{1j})$,
$$\det(A) = (-1)^{1+i}a(-1)^{j-i-1}m + (-1)^{1+j}am = (-1)^{j}am\bigl(1 + (-1)\bigr) = 0.$$

Finally, $\det([1]) = 1$ and by induction we have that $\det(I_n) = 1 \cdot \det(I_{n-1}) = 1$, where $I_n$ (respectively $I_{n-1}$) denotes the $n$-by-$n$ (respectively $(n-1)$-by-$(n-1)$) identity matrix.
Definition 3.1.9. The unique volume function $\operatorname{Vol}_1$ is the determinant function, denoted $\det(A)$. ◊
Corollary 3.1.10. Let $A$ be an $n$-by-$n$ matrix. Then $\det(A) \neq 0$ if and only if $A$ is nonsingular.

Proof. By Lemma 3.1.2(2), for any volume function $\operatorname{Vol}_a$, $\operatorname{Vol}_a(A) = 0$ if $A$ is singular. For any nontrivial volume function, i.e., for any function $\operatorname{Vol}_a$ with $a \neq 0$, we observed in the course of the proof of Theorem 3.1.7 that, for any nonsingular matrix $A$, $\operatorname{Vol}_a(A) = c\operatorname{Vol}_a(I) = ca$ for some $c \neq 0$.
Remark 3.1.11. Let us give a heuristic argument as to why Corollary 3.1.10 should be true, from a geometric viewpoint. Let $A = [v_1 \mid \cdots \mid v_n]$ be an $n$-by-$n$ matrix. Then $v_i = Ae_i = T_A(e_i)$, $i = 1, \dots, n$, where $I = [e_1 \mid \cdots \mid e_n]$. Thus the $n$-parallelogram $P$ spanned by the columns of $A$ is the image of the unit $n$-cube under the linear transformation $T_A$, and the determinant of $A$ is the signed volume of $P$.

If $\det(A) \neq 0$, i.e., if $P$ has nonzero volume, then the translates of $P$ “fill up” $F^n$, and so for any $w \in F^n$, there is a $v \in F^n$ with $T_A(v) = Av = w$. Thus in this case $T_A$ is onto $F^n$, and hence is an isomorphism by Corollary 1.3.2, so $A$ is invertible.

If $\det(A) = 0$, i.e., if $P$ has zero volume, then it is a degenerate $n$-parallelogram, and so is a nondegenerate $k$-parallelogram for some $k < n$,
and its translates only “fill up” a $k$-dimensional subspace of $F^n$. Thus in this case $T_A$ is not onto $F^n$, and hence $A$ is not invertible. ◊
Remark 3.1.12. Another well-known and important property of determinants, which we shall prove in Theorem 3.3.1, is that for any two $n$-by-$n$ matrices $A$ and $B$, $\det(AB) = \det(A)\det(B)$. Let us also give a heuristic argument as to why this should be true, again from a geometric viewpoint. But we need to change our viewpoint slightly, from a “static” one to a “dynamic” one. In the notation of Remark 3.1.11,
$$\det\bigl([v_1 \mid \cdots \mid v_n]\bigr) = \det(A) = \det(A) \cdot 1 = \det(A)\det(I) = \det(A)\det\bigl([e_1 \mid \cdots \mid e_n]\bigr).$$
We then think of the determinant of $A$ as the factor by which the linear transformation $T_A$ multiplies signed volume when it takes the unit $n$-cube to the $n$-parallelogram $P$. A linear transformation is homogeneous in that it multiplies each “bit” of signed volume by the same factor. That is, if instead of starting with $I$ we start with any $n$-parallelogram $J$ and take its image $Q$ under the linear transformation $T_A$, the signed volume of $Q$ will be $\det(A)$ times the signed volume of $J$.

To apply this we begin with the linear transformation $T_B$ and let $J$ be the $n$-parallelogram that is the image of $I$ under $T_B$.

In going from $I$ to $J$, i.e., in taking the image of $I$ under $T_B$, we multiply signed volume by $\det(B)$, and in going from $J$ to $Q$, i.e., in taking the image of $J$ under $T_A$, we multiply signed volume by $\det(A)$, so in going from $I$ to $Q$, i.e., in taking the image of $I$ under $T_A \circ T_B$, we multiply signed volume by $\det(A)\det(B)$. But $T_A \circ T_B = T_{AB}$, so $T_{AB}$ takes $I$ to $Q$, and so $T_{AB}$ multiplies signed volume by $\det(AB)$. Hence $\det(AB) = \det(A)\det(B)$. ◊
Remark 3.1.13. The fact that the determinant is the factor by which linear transformations multiply signed volume is the reason for the appearance of the Jacobian in the transformation formula for multiple integrals. ◊
We have carried our argument this far in order to show that we can obtain the existence of the determinant purely from the geometric viewpoint. In the next section we present an algebraic viewpoint, which only uses our work up through Theorem 3.1.4. We use this second viewpoint to derive the results of Section 3.3. But we note that the formula for the determinant we have obtained in Theorem 3.1.8 is a special case of the Laplace expansion of Theorem 3.3.6. (The geometric viewpoint is simpler, but the algebraic viewpoint is technically more useful, which is why we present both.)
3.2 Existence and uniqueness of determinants
We now present a more traditional approach to the determinant.
Lemma 3.2.1. Let $V_{n,m} = \{\text{multilinear functions } f: M_{n,m}(F) \to F\}$. Then $V_{n,m}$ is a vector space of dimension $n^m$ with basis $\{f_\varphi\}$, where $\varphi: \{1, \dots, m\} \to \{1, \dots, n\}$ is any function and, if $A = (a_{ij})$,
$$f_\varphi(A) = a_{\varphi(1),1}\,a_{\varphi(2),2} \cdots a_{\varphi(m),m}.$$
Proof. We proceed by induction on $m$. Let $m = 1$. Then, by multilinearity, $f \in V_{n,1}$ is given by
$$f\left(\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{bmatrix}\right) = f\left(a_{11}\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + a_{21}\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + a_{n1}\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}\right) = c_{11}a_{11} + \cdots + c_{n1}a_{n1}$$
where $c_{11} = f(e_1), \dots, c_{n1} = f(e_n)$, and the lemma holds.
Now for the inductive step. Assume the lemma holds for $m$ and consider $f \in V_{n,m+1}$. Let $A \in M_{n,m+1}$ and write $A'$ for the $n$-by-$m$ submatrix of $A$ consisting of the first $m$ columns of $A$. Then, by multilinearity,
$$f\left(\left[A' \,\middle|\, \begin{matrix} a_{1,m+1} \\ \vdots \\ a_{n,m+1} \end{matrix}\right]\right) = a_{1,m+1}f\bigl([A' \mid e_1]\bigr) + \cdots + a_{n,m+1}f\bigl([A' \mid e_n]\bigr).$$
But for each fixed $i$, $g(A') = f([A' \mid e_i])$ is a multilinear function on $n$-by-$m$ matrices, so by induction $g(A') = \sum_{\varphi'} c_{\varphi',i}\,f_{\varphi'}(A')$ where $\varphi': \{1, \dots, m\} \to \{1, \dots, n\}$, and so we see that
$$f(A) = \sum_{i=1}^{n} \sum_{\varphi'} c_{\varphi',i}\,a_{\varphi'(1),1} \cdots a_{\varphi'(m),m}\,a_{i,m+1} = \sum_{\varphi} c_{\varphi}\,a_{\varphi(1),1} \cdots a_{\varphi(m+1),m+1}$$
where $\varphi: \{1, \dots, m+1\} \to \{1, \dots, n\}$ is given by $\varphi(k) = \varphi'(k)$ for $1 \leq k \leq m$ and $\varphi(m+1) = i$, and the lemma holds.
We now specialize to the case $m = n$. In this case Vol, being a multilinear function, is a linear combination of basis elements. We have not used the condition of alternation yet. We do so now, in two stages.

We let $P_{\varphi_0}$ be the $n$-by-$n$ matrix defined by $P_{\varphi_0} = (p_{ij})$ where $p_{ij} = 1$ if $i = \varphi_0(j)$ and $p_{ij} = 0$ if $i \neq \varphi_0(j)$. $P_{\varphi_0}$ has exactly one nonzero entry in each column: an entry of 1 in row $\varphi_0(j)$ of column $j$. We then observe that if
$$f(A) = \sum_{\varphi} c_{\varphi}\,a_{\varphi(1),1} \cdots a_{\varphi(n),n},$$
then $f(P_{\varphi_0}) = c_{\varphi_0}$. For if $\varphi = \varphi_0$ then each factor $p_{\varphi(j),j}$ is 1, so the product is 1, but if $\varphi \neq \varphi_0$ then some factor $p_{\varphi(j),j}$ is 0, so the product is 0.
Lemma 3.2.2. Let $f \in V_{n,n}$ be alternating and write
$$f(A) = \sum_{\varphi} c_{\varphi}\,a_{\varphi(1),1} \cdots a_{\varphi(n),n}$$
where $\varphi: \{1, \dots, n\} \to \{1, \dots, n\}$. If $\varphi_0$ is not 1-to-1, then $c_{\varphi_0} = 0$.

Proof. Suppose $\varphi_0$ is not 1-to-1. As we have observed, $f(P_{\varphi_0}) = c_{\varphi_0}$. But in this case $P_{\varphi_0}$ is a matrix with two identical columns (columns $j_1$ and $j_2$ where $\varphi_0(j_1) = \varphi_0(j_2)$), so by the definition of alternation, $f(P_{\varphi_0}) = 0$.
We restrict our attention to 1-to-1 functions $\sigma: \{1, \dots, n\} \to \{1, \dots, n\}$. We denote the set of such functions by $S_n$, and elements of this set by $\sigma$. $S_n$ forms a group under composition of functions, as any $\sigma \in S_n$ is invertible. $S_n$ is known as the symmetric group, and $\sigma \in S_n$ is a permutation. (We think of $\sigma$ as giving a reordering of $\{1, \dots, n\}$ as $\{\sigma(1), \dots, \sigma(n)\}$.)

We now cite some algebraic facts without proof. A transposition is an element of $S_n$ that interchanges two elements of $\{1, \dots, n\}$ and leaves all the others fixed. (More formally, $\tau \in S_n$ is a transposition if for some $1 \leq i \neq j \leq n$, $\tau(i) = j$, $\tau(j) = i$, $\tau(k) = k$ for $k \neq i, j$.) Every element of $S_n$ can be written as a product (i.e., composition) of transpositions. If $\sigma$ is the product of $t$ transpositions, we define its sign by $\operatorname{sign}(\sigma) = (-1)^t$. Though $t$ is not well-defined, $\operatorname{sign}(\sigma)$ is well-defined, i.e., if $\sigma$ is written as a product of $t_1$ transpositions and as a product of $t_2$ transpositions, then $t_1 \equiv t_2 \pmod 2$.
Lemma 3.2.3. Let $f \in V_{n,n}$ be alternating and write
$$f(A) = \sum_{\sigma \in S_n} c_{\sigma}\,a_{\sigma(1),1} \cdots a_{\sigma(n),n}.$$
Then $f(P_{\sigma_0}) = \operatorname{sign}(\sigma_0)f(I)$.

Proof. The matrix $P_{\sigma_0}$ is obtained by starting with $I$ and performing $t$ interchanges of pairs of columns, where $\sigma_0$ is the product of $t$ transpositions, and the only term in the sum that contributes is when $\sigma = \sigma_0$, so the lemma follows from Lemma 3.1.2(3).
Theorem 3.2.4. Any multilinear, alternating function $\operatorname{Vol}: M_n(F) \to F$ is given by
$$\operatorname{Vol}(A) = \operatorname{Vol}_a(A) = a\left(\sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots a_{\sigma(n),n}\right)$$
for some $a \in F$, and every function defined in this way is multilinear and alternating.

Proof. We have essentially already shown the first part. Let $a = f(I)$. Then by Lemma 3.2.3, for every $\sigma \in S_n$, $c_{\sigma} = a\operatorname{sign}(\sigma)$.
It clearly suffices to verify the second part when $a = 1$. Suppose $A = [v_1 \mid \cdots \mid v_n]$ and $v_i = v_i' + v_i''$. Let
$$v_i = \begin{bmatrix} a_{1i} \\ \vdots \\ a_{ni} \end{bmatrix}, \quad v_i' = \begin{bmatrix} b_{1i} \\ \vdots \\ b_{ni} \end{bmatrix}, \quad\text{and}\quad v_i'' = \begin{bmatrix} c_{1i} \\ \vdots \\ c_{ni} \end{bmatrix},$$
so $a_{ki} = b_{ki} + c_{ki}$. Then
$$\begin{aligned}
\sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots a_{\sigma(i),i} \cdots a_{\sigma(n),n}
&= \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots \bigl(b_{\sigma(i),i} + c_{\sigma(i),i}\bigr) \cdots a_{\sigma(n),n} \\
&= \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots b_{\sigma(i),i} \cdots a_{\sigma(n),n} \\
&\quad + \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots c_{\sigma(i),i} \cdots a_{\sigma(n),n},
\end{aligned}$$
showing multilinearity. Suppose columns $i$ and $j$ of $A$ are equal, and let $\tau \in S_n$ be the transposition that interchanges $i$ and $j$. To every $\sigma \in S_n$ we can associate $\sigma' = \sigma\tau \in S_n$, and $\sigma$ is associated to $\sigma'$ as $\tau^2$ is the identity, and hence $\sigma = \sigma\tau^2 = \sigma'\tau$. Write this association as $\sigma \leftrightarrow \sigma'$. Then
$$\sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots a_{\sigma(i),i} \cdots a_{\sigma(j),j} \cdots a_{\sigma(n),n}
= \sum_{\sigma \leftrightarrow \sigma'} \Bigl(\operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots a_{\sigma(i),i} \cdots a_{\sigma(j),j} \cdots a_{\sigma(n),n} + \operatorname{sign}(\sigma')\,a_{\sigma'(1),1} \cdots a_{\sigma'(i),i} \cdots a_{\sigma'(j),j} \cdots a_{\sigma'(n),n}\Bigr).$$
But $\operatorname{sign}(\sigma) = -\operatorname{sign}(\sigma')$ and the two products of elements are equal because columns $i$ and $j$ of $A$ are identical, so the terms cancel in pairs and the sum is 0, showing alternation.
Definition 3.2.5. The function $\det: M_n(F) \to F$, given by
$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots a_{\sigma(n),n},$$
is the determinant function. ◊
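Definition 3.2.5 can also be evaluated literally, summing over all $n!$ permutations. A sketch (ours, not the book's; the sign is computed by counting inversions, which agrees with the transposition count modulo 2):

```python
from itertools import permutations

def sign(sigma):
    # (-1)^(number of inversions) equals the sign defined via transpositions
    n = len(sigma)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    # det(A) = sum over sigma in S_n of sign(sigma) * a_{sigma(1),1} ... a_{sigma(n),n}
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term *= A[sigma[j]][j]
        total += term
    return total

print(det_leibniz([[2, 0], [0, 3]]))  # 6
```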
3.3 Further properties
We now derive some important properties of the determinant.
Theorem 3.3.1. Let $A, B \in M_n(F)$. Then
$$\det(AB) = \det(A)\det(B).$$

Proof. Define a function $f: M_n(F) \to F$ by $f(B) = \det(AB)$. It is straightforward to check that $f$ is multilinear and alternating, so $f$ is a volume function: $f(B) = \operatorname{Vol}_a(B) = a\det(B)$ where $a = f(I) = \det(AI) = \det(A)$.
Corollary 3.3.2. (1) $\det(A) \neq 0$ if and only if $A$ is invertible.

(2) If $A$ is invertible, then $\det(A^{-1}) = 1/\det(A)$. Furthermore, for any matrix $B$, $\det(ABA^{-1}) = \det(B)$.

Proof. We have already seen in Lemma 3.1.2 that for any volume function $f$, $f(A) = 0$ if $A$ is not invertible. If $A$ is invertible we have $1 = \det(I) = \det(AA^{-1}) = \det(A)\det(A^{-1})$, from which the corollary follows.
Lemma 3.3.3. (1) Let $A$ be a diagonal matrix. Then $\det(A)$ is the product of its diagonal entries.

(2) More generally, let $A$ be an upper triangular, or a lower triangular, matrix. Then $\det(A)$ is the product of its diagonal entries.

Proof. (1) If $A$ is diagonal, then there is only one nonzero term in Definition 3.2.5, the term corresponding to the identity permutation ($\sigma(i) = i$ for every $i$), which has sign $+1$.

(2) If $\sigma$ is not the identity then there is a $j$ with $\sigma(j) < j$, and a $k$ with $\sigma(k) > k$, so for a triangular matrix there is again only the diagonal term.
Theorem 3.3.4. (1) Let $M$ be a block diagonal matrix,
$$M = \begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}.$$
Then $\det(M) = \det(A)\det(D)$.

(2) More generally, let $M$ be a block upper triangular or a block lower triangular matrix,
$$M = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix} \quad\text{or}\quad M = \begin{bmatrix} A & 0 \\ C & D \end{bmatrix}.$$
Then $\det(M) = \det(A)\det(D)$.

Proof. (1) Define a function $f$ by
$$f(D) = \det\begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}.$$
Then $f$ is multilinear and alternating, so $f(D) = f(I)\det(D)$. But $f(I) = \det\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix} = \det(A)$. (This last equality is easy to see, as any permutation that contributes a nonzero term to $\det\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix}$ must fix all but (possibly) the first $n$ entries.)

(2) Suppose $M$ is block upper triangular (the block lower triangular case is similar). If $A$ is singular then there is a vector $v \neq 0$ with $Av = 0$. Let $w$ be the vector whose first $n$ entries are those of $v$ and whose remaining entries are 0. Then $Mw = 0$. Thus $M$ is singular as well, and $0 = 0 \cdot \det(D)$.

Suppose that $A$ is nonsingular. Then
$$\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}\begin{bmatrix} I & A^{-1}B \\ 0 & I \end{bmatrix}.$$
The first matrix on the right-hand side has determinant $\det(A)\det(D)$, and the second matrix on the right-hand side has determinant 1, as it is upper triangular, and the theorem follows.
Lemma 3.3.5. Let ${}^tA$ be the matrix obtained from $A$ by interchanging the rows and columns of $A$. Then $\det({}^tA) = \det(A)$.

Proof. For any $\sigma \in S_n$, $\operatorname{sign}(\sigma^{-1}) = \operatorname{sign}(\sigma)$. Let $B = (b_{ij}) = {}^tA$. Then
$$\begin{aligned}
\det(A) &= \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{\sigma(1),1} \cdots a_{\sigma(n),n} \\
&= \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\,a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)} \\
&= \sum_{\sigma \in S_n} \operatorname{sign}(\sigma^{-1})\,a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)} \\
&= \sum_{\sigma^{-1} \in S_n} \operatorname{sign}(\sigma^{-1})\,b_{\sigma^{-1}(1),1} \cdots b_{\sigma^{-1}(n),n} \\
&= \det({}^tA).
\end{aligned}$$
Let $A_{ij}$ denote the $(i,j)$-minor of the matrix $A$, the submatrix obtained by deleting row $i$ and column $j$ of $A$.
Theorem 3.3.6 (Laplace expansion). Let $A$ be an $n$-by-$n$ matrix, $A = (a_{ij})$.

(1) For any $i$,
$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j}a_{ij}\det(A_{ij}).$$

(2) For any $j$,
$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j}a_{ij}\det(A_{ij}).$$

(3) For any $i$, and for any $k \neq i$,
$$0 = \sum_{j=1}^{n} (-1)^{i+j}a_{kj}\det(A_{ij}).$$

(4) For any $j$, and for any $k \neq j$,
$$0 = \sum_{i=1}^{n} (-1)^{i+j}a_{ik}\det(A_{ij}).$$
Proof. We prove (1) and (3) simultaneously, so we fix $k$ (which may or may not equal $i$).

The sum on the right-hand side is a sum of multilinear functions, so is itself multilinear. (This is also easy to see directly.)

We now show it is alternating. Let $A$ be a matrix with columns $p$ and $q$ equal, where $1 \leq p < q \leq n$. If $j \neq p, q$ then $A_{ij}$ is a matrix with two columns equal, so $\det(A_{ij}) = 0$. Thus the only two terms that contribute to the sum are
$$(-1)^{i+p}a_{kp}\det(A_{ip}) + (-1)^{i+q}a_{kq}\det(A_{iq}).$$
By hypothesis, $a_{kq} = a_{kp}$. Now
$$A_{ip} = [\overline{v}_1 \mid \cdots \mid \overline{v}_{p-1} \mid \overline{v}_{p+1} \mid \cdots \mid \overline{v}_{q-1} \mid \overline{v}_q \mid \overline{v}_{q+1} \mid \cdots \mid \overline{v}_n],$$
$$A_{iq} = [\overline{v}_1 \mid \cdots \mid \overline{v}_{p-1} \mid \overline{v}_p \mid \overline{v}_{p+1} \mid \cdots \mid \overline{v}_{q-1} \mid \overline{v}_{q+1} \mid \cdots \mid \overline{v}_n],$$
where $\overline{v}_m$ denotes column $m$ of the matrix obtained from $A$ by deleting row $i$ of $A$. By hypothesis, $\overline{v}_p = \overline{v}_q$, so these two matrices have the same columns but in a different order. We get from the first of these to the second by successively performing $q - p - 1$ column interchanges (first switching $\overline{v}_q$ and $\overline{v}_{q-1}$, then switching $\overline{v}_q$ and $\overline{v}_{q-2}$, ..., and finally switching $\overline{v}_q$ and $\overline{v}_{p+1}$), so $\det(A_{iq}) = (-1)^{q-p-1}\det(A_{ip})$. Thus we see that the contribution of these two terms to the sum is
$$(-1)^{i+p}a_{kp}\det(A_{ip}) + (-1)^{i+q}a_{kp}(-1)^{q-p-1}\det(A_{ip}),$$
and since $(-1)^{i+p}$ and $(-1)^{i+2q-p-1}$ always have opposite signs, they cancel.

By our uniqueness result, the right-hand side is a multiple $a\det(A)$ for some $a$. A computation shows that if $A = I$, the right-hand side gives 1 if $k = i$ and 0 if $k \neq i$, proving the theorem in these cases.

For cases (2) and (4), using the fact that $\det(B) = \det({}^tB)$ for any matrix $B$, we can take the transpose of these formulas and use cases (1) and (3).
Remark 3.3.7. Theorem 3.3.6(1) (respectively, (2)) is known as expansion by minors of the $i$th row (respectively, of the $j$th column). ◊
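All four identities of Theorem 3.3.6 are easy to check numerically. A sketch for (1) and (3), in Python with 0-indexing, so the sign $(-1)^{i+j}$ is unchanged (helper names ours):

```python
def minor(A, i, j):
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
n = len(A)
for i in range(n):
    for k in range(n):
        s = sum((-1) ** (i + j) * A[k][j] * det(minor(A, i, j)) for j in range(n))
        # Theorem 3.3.6: s == det(A) == -3 when k == i, and s == 0 when k != i
        print(i, k, s)
```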
Definition 3.3.8. The classical adjoint of $A$ is the matrix $\operatorname{Adj}(A)$ defined by $\operatorname{Adj}(A) = (b_{ij})$ where $b_{ij} = (-1)^{i+j}\det(A_{ji})$. ◊

Note carefully the subscript in the definition: it is $A_{ji}$, as written, not $A_{ij}$.
Corollary 3.3.9. (1) For any matrix $A$,
$$\operatorname{Adj}(A)\,A = A\operatorname{Adj}(A) = \det(A)\,I.$$

(2) If $A$ is invertible,
$$A^{-1} = \frac{1}{\det(A)}\operatorname{Adj}(A).$$

Proof. (1) can be verified by a computation that follows directly from Theorem 3.3.6. Then (2) follows immediately.
Remark 3.3.10. We have given the formula in Corollary 3.3.9(2) for its theoretical interest (and we shall see some applications of it later), but as a practical matter it should almost never be used to find the inverse of a matrix. ◊
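Still, the formula is easy to state in code. A sketch of the classical adjoint and the identity of Corollary 3.3.9(1) (names ours; note the transposed subscript $A_{ji}$, exactly as in Definition 3.3.8):

```python
from fractions import Fraction

def minor(A, i, j):
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adj(A):
    # Adj(A)_{ij} = (-1)^{i+j} det(A_{ji})
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

A = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(3)]]
B = adj(A)  # [[3, -1], [-1, 1]]
AB = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(det(A), AB)  # 2 and [[2, 0], [0, 2]], i.e., A Adj(A) = det(A) I
```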
Corollary 3.3.11 (Cramer's rule). Let $A$ be an invertible $n$-by-$n$ matrix and let $b$ be a vector in $F^n$. Let $x$ be the unique vector in $F^n$ with $Ax = b$. Write $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$. Then, for $1 \leq i \leq n$, $x_i = \det(A_i(b))/\det(A)$, where $A_i(b)$ is the matrix obtained from $A$ by replacing its $i$th column by $b$.

Proof. Let the columns of $A$ be $a_1, \dots, a_n$. By linearity, it suffices to prove the corollary for all elements of any basis $\mathcal{B}$ of $F^n$. We choose the basis $\mathcal{B} = \{a_1, \dots, a_n\}$.

Fix $i$ and consider $Ax = a_i$. Then $A_i(a_i) = A$, so the above formula gives $x_i = 1$. For $j \neq i$, $A_j(a_i)$ is a matrix with two identical columns, so the above formula gives $x_j = 0$. Thus $x = e_i$, the $i$th standard basis vector, and indeed $Ae_i = a_i$.
Remark 3.3.12. Again this formula is of theoretical interest but should almost never be used in practice. ◊
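For the record, here is Cramer's rule as a computation (a sketch, ours; exact arithmetic again):

```python
from fractions import Fraction

def det(A):
    # first-row expansion, as in Section 3.1
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j+1:] for r in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    # x_i = det(A_i(b)) / det(A), where A_i(b) replaces column i of A by b
    d = det(A)
    n = len(A)
    def A_i(i):
        return [row[:i] + [b[r]] + row[i+1:] for r, row in enumerate(A)]
    return [det(A_i(i)) / d for i in range(n)]

A = [[Fraction(2), Fraction(0)], [Fraction(0), Fraction(3)]]
b = [Fraction(4), Fraction(9)]
print(cramer(A, b))  # [2, 3]
```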
Here is a familiar result from elementary linear algebra.

Definition 3.3.13. If the matrix $A$ has a $k$-by-$k$ submatrix with nonzero determinant, but does not have a $(k+1)$-by-$(k+1)$ submatrix with nonzero determinant, then the determinantal rank of $A$ is $k$. ◊
Theorem 3.3.14. Let $A$ be a matrix. Then the row rank, column rank, and determinantal rank of $A$ are all equal.
Proof. We showed that the row rank and column rank of $A$ are equal in Theorem 2.4.7. We now show that the column rank of $A$ is equal to the determinantal rank of $A$.

Write $A = [v_1 \mid \cdots \mid v_n]$, where $A$ is $m$-by-$n$. Let $A$ have a $k$-by-$k$ submatrix $B$ with nonzero determinant. For simplicity, we assume that $B$ is the upper left-hand corner of $A$. Let $\pi: F^m \to F^k$ be defined by
$$\pi\left(\begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}\right) = \begin{bmatrix} a_1 \\ \vdots \\ a_k \end{bmatrix}.$$
Then $B = [\pi(v_1) \mid \cdots \mid \pi(v_k)]$. Since $\det(B) \neq 0$, $B$ is nonsingular, so $\{\pi(v_1), \dots, \pi(v_k)\}$ is linearly independent, and hence $\{v_1, \dots, v_k\}$ is linearly independent. But then this set spans a $k$-dimensional subspace of the column space of $A$, so $A$ has column rank at least $k$.

On the other hand, suppose $A$ has $k$ linearly independent columns. Again, for simplicity, suppose these are the leftmost $k$ columns of $A$. Now $\{v_1, \dots, v_k\}$ is linearly independent and $\{e_1, \dots, e_m\}$ spans $F^m$, so $\{v_1, \dots, v_k, e_1, \dots, e_m\}$ spans $F^m$ as well. Then, by Theorem 1.2.9, there is a basis $\mathcal{B}$ of $F^m$ with $\{v_1, \dots, v_k\} \subseteq \mathcal{B} \subseteq \{v_1, \dots, v_k, e_1, \dots, e_m\}$. Write $\mathcal{B} = \{v_1, \dots, v_k, v_{k+1}, \dots, v_m\}$ and note that, for each $i \geq k+1$, $v_i = e_j$ for some $j$. Form the matrix $B' = [v_1 \mid \cdots \mid v_k \mid v_{k+1} \mid \cdots \mid v_m]$ and note that $\det(B') \neq 0$. Expand by minors of columns $m, m-1, \dots, k+1$ to obtain $0 \neq \det(B') = \pm\det(B)$ where $B$ is a $k$-by-$k$ submatrix of $A$, so $A$ has determinantal rank at least $k$.
We have defined the determinant for matrices. We can also define the determinant for linear transformations $T: V \to V$, where $V$ is a finite-dimensional vector space.

Definition 3.3.15. Let $T: V \to V$ be a linear transformation with $V$ a finite-dimensional vector space. The determinant $\det(T)$ is defined to be $\det(T) = \det([T]_{\mathcal{B}})$ where $\mathcal{B}$ is any basis of $V$. ◊

To see that this is well-defined we have to know that it is independent of the choice of the basis $\mathcal{B}$. That follows immediately from Corollary 2.3.11 and Corollary 3.3.2(2).

We have defined the general linear groups $GL_n(F)$ and $GL(V)$ in Definition 1.1.29.
Lemma 3.3.16. $GL_n(F) = \{A \in M_n(F) \mid \det(A) \neq 0\}$. For $V$ finite dimensional,
$$GL(V) = \{T: V \to V \mid \det(T) \neq 0\}.$$

Proof. Immediate from Corollary 3.3.2.

We can now make a related definition.

Definition 3.3.17. The special linear group $SL_n(F)$ is the group
$$SL_n(F) = \{A \in GL_n(F) \mid \det(A) = 1\}.$$
For $V$ finite dimensional,
$$SL(V) = \{T \in GL(V) \mid \det(T) = 1\}. \quad◊$$

Theorem 3.3.18. (1) $SL_n(F)$ is a normal subgroup of $GL_n(F)$.

(2) For $V$ finite dimensional, $SL(V)$ is a normal subgroup of $GL(V)$.

Proof. $SL_n(F)$ is the kernel of the homomorphism $\det: GL_n(F) \to F^*$, and similarly for $SL(V)$. (By Theorem 3.3.1, $\det$ is a homomorphism.) Here $F^*$ denotes the multiplicative group of nonzero elements of $F$.
3.4 Integrality
While we almost exclusively work over a field, it is natural to ask the question of integrality, and we consider that here.

Let $R$ be an integral domain with quotient field $F$. An element $u$ of $R$ is a unit if there is an element $v$ of $R$ with $uv = vu = 1$. (The reader unfamiliar with quotient fields can simply take $R = \mathbb{Z}$ and $F = \mathbb{Q}$, and note that the units of $\mathbb{Z}$ are $\pm 1$.)

Theorem 3.4.1. Let $A$ be an $n$-by-$n$ matrix with entries in $R$ and suppose that it is invertible, considered as a matrix with entries in $F$. The following are equivalent:

(1) $A^{-1}$ has entries in $R$.

(2) $\det(A)$ is a unit in $R$.

(3) For every vector $b$ all of whose entries are in $R$, the unique solution of $Ax = b$ is a vector all of whose entries are in $R$.
Proof. First we show that (1) and (3) are equivalent, and then we show that (1) and (2) are equivalent.

Suppose (1) is true. Then the solution of $Ax = b$ is $x = A^{-1}b$, whose entries are in $R$. Conversely, suppose (3) is true. Let $Ax_i = e_i$, $i = 1, \dots, n$, where $\{e_i\}$ is the set of standard unit vectors in $F^n$. Form the matrix $B = [x_1 \mid x_2 \mid \cdots \mid x_n]$. Then $B$ is a matrix all of whose entries are in $R$, and $AB = I$, so $B = A^{-1}$ by Corollary 1.3.3.

Suppose (1) is true. Let $\det(A) = u$ and $\det(A^{-1}) = v$. Then $u$ and $v$ are elements of $R$ and $uv = \det(A)\det(A^{-1}) = \det(I) = 1$, so $u$ is a unit in $R$. Conversely, suppose (2) is true, so $\det(A) = u$ is a unit in $R$. Let $uv = 1$ with $v \in R$, so $v = 1/u$. Then Corollary 3.3.9(2) shows that all of the entries of $A^{-1}$ are in $R$.
Remark 3.4.2. Let $A$ be an $n$-by-$n$ matrix with entries in $R$ and suppose that $A$ is invertible, considered as a matrix with entries in $F$. Let $d = \det(A)$.

(1) If $b$ is a vector in $R^n$ all of whose entries are divisible by $d$, then $x = A^{-1}b$, the unique solution of $Ax = b$, has all its entries in $R$.

(2) This condition on the entries of $b$ is sufficient but not necessary. It is possible to have a vector $b$ whose entries are not all divisible by $d$ with the solution of $Ax = b$ having all its entries in $R$. For example, let $R = \mathbb{Z}$ and take $A = \begin{bmatrix} 1 & 1 \\ 1 & 3 \end{bmatrix}$, a matrix of determinant 2. Then $Ax = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ has solution $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. (By Theorem 3.4.1, if $d$ is not a unit, this is not possible for all $b$.) ◊
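A quick check of Theorem 3.4.1 and Remark 3.4.2 on this example (a sketch, ours; $R = \mathbb{Z}$):

```python
from fractions import Fraction

A = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(3)]]
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # det(A) = 2, not a unit in Z
Ainv = [[A[1][1] / d, -A[0][1] / d],
        [-A[1][0] / d, A[0][0] / d]]
print(Ainv)  # [[3/2, -1/2], [-1/2, 1/2]]: A^{-1} is not an integer matrix
b = [Fraction(1), Fraction(1)]
x = [sum(Ainv[i][j] * b[j] for j in range(2)) for i in range(2)]
print(x)  # [1, 0]: yet this particular Ax = b has an integer solution
```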
We can now generalize the definitions of $GL_n(F)$ and $SL_n(F)$.

Definition 3.4.3. The general linear group $GL_n(R)$ is defined by
$$GL_n(R) = \{A \in M_n(R) \mid A \text{ has an inverse in } M_n(R)\}. \quad◊$$

Corollary 3.4.4.
$$GL_n(R) = \{A \in M_n(R) \mid \det(A) \text{ is a unit in } R\}.$$

Definition 3.4.5. The special linear group $SL_n(R)$ is defined by
$$SL_n(R) = \{A \in GL_n(R) \mid \det(A) = 1\}. \quad◊$$

Lemma 3.4.6. $SL_n(R)$ is a normal subgroup of $GL_n(R)$.

Proof. $SL_n(R)$ is the kernel of the determinant homomorphism.
Remark 3.4.7. If $R = \mathbb{Z}$, the units in $R$ are $\{\pm 1\}$. Thus $SL_n(\mathbb{Z})$ is a subgroup of index 2 of $GL_n(\mathbb{Z})$. ◊

It follows from our previous work that for any nonzero vector $v \in F^n$ there is an invertible matrix $A$ with $Ae_1 = v$ (where $e_1$ is the first vector in the standard basis of $F^n$). One can ask the same question over the integers: Given a nonzero vector $v \in \mathbb{Z}^n$, is there a matrix $A$ with integer entries, invertible as an integer matrix, with $Ae_1 = v$? There is an obvious necessary condition, that the entries of $v$ be relatively prime. This condition turns out to be sufficient. We prove a slightly more precise result.
Theorem 3.4.8. Let $n \geq 2$ and let $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ be a nonzero vector with integral entries. Let $d = \gcd(a_1, \dots, a_n)$. Then there is a matrix $A \in SL_n(\mathbb{Z})$ with $A(de_1) = v$.

Proof. We proceed by induction on $n$. We begin with $n = 2$. If $d = \gcd(a_1, a_2)$, let $a_1' = a_1/d$ and $a_2' = a_2/d$. Then there are integers $p$ and $q$ with $a_1'p + a_2'q = 1$. Set
$$A = \begin{bmatrix} a_1' & -q \\ a_2' & p \end{bmatrix}.$$
Suppose the theorem is true for $n - 1$, and consider $v \in \mathbb{Z}^n$. It is easy to see that the theorem is true if $a_1 = \cdots = a_{n-1} = 0$, so suppose not. Let $d' = \gcd(a_1, \dots, a_{n-1})$. Then $d = \gcd(d', a_n)$. By the proof of the $n = 2$ case, there is an $n$-by-$n$ matrix $A_1$ with
$$A_1(de_1) = \begin{bmatrix} d' \\ 0 \\ \vdots \\ 0 \\ a_n \end{bmatrix}.$$
($A_1$ has suitable entries in its “corners” and an $(n-2)$-by-$(n-2)$ identity matrix in its “middle”.) By the inductive assumption, there is an $n$-by-$n$ matrix $A_2$ with
$$A_2\left(\begin{bmatrix} d' \\ 0 \\ \vdots \\ 0 \\ a_n \end{bmatrix}\right) = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}.$$
($A_2$ is a block diagonal matrix with a suitable $(n-1)$-by-$(n-1)$ matrix in its upper left-hand corner and an entry of 1 in its lower right-hand corner.) Set $A = A_2A_1$.
Corollary 3.4.9. Let $n \geq 2$ and let $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ be a nonzero vector with integer entries, and suppose that $\{a_1, \dots, a_n\}$ is relatively prime. Then there is a matrix $A \in SL_n(\mathbb{Z})$ whose first column is $v$.

Proof. $A$ is the matrix constructed in the proof of Theorem 3.4.8 (here $d = 1$, so $Ae_1 = v$).
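The $n = 2$ step of the construction is completely effective: the integers $p$ and $q$ come from the extended Euclidean algorithm. A sketch (function names ours):

```python
def ext_gcd(a, b):
    # Return (g, p, q) with a*p + b*q = g = gcd(a, b)
    if b == 0:
        return (a, 1, 0)
    g, p, q = ext_gcd(b, a % b)
    return (g, q, p - (a // b) * q)

def sl2_with_first_column(a1, a2):
    # Matrix in SL_2(Z) with first column (a1, a2), assuming gcd(a1, a2) = 1;
    # det [[a1, -q], [a2, p]] = a1*p + a2*q = 1, as in the proof of Theorem 3.4.8
    g, p, q = ext_gcd(a1, a2)
    assert g == 1
    return [[a1, -q], [a2, p]]

print(sl2_with_first_column(8, 5))  # [[8, 3], [5, 2]], determinant 1
```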
Let $\mathbb{Z}/N\mathbb{Z}$ denote the ring of integers mod $N$. We have the map $\mathbb{Z} \to \mathbb{Z}/N\mathbb{Z}$ given by $a \mapsto a \pmod N$. This induces a map on matrices as well.

Theorem 3.4.10. For every $n \geq 1$, the map $\varphi: SL_n(\mathbb{Z}) \to SL_n(\mathbb{Z}/N\mathbb{Z})$ given by the reduction of entries $\pmod N$ is an epimorphism.
Proof. We prove the theorem by induction on $n$. For $n = 1$ it is obvious.

Suppose $n > 1$. Let $\overline{M} \in SL_n(\mathbb{Z}/N\mathbb{Z})$ be arbitrary. Then there is certainly a matrix $M$ with integer entries with $\varphi(M) = \overline{M}$, and then $\det(M) \equiv 1 \pmod N$. But this is not good enough. We need $\det(M) = 1$.

Let $v_1 = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ be the first column of $M$. Then $\overline{M} \in SL_n(\mathbb{Z}/N\mathbb{Z})$ implies $\gcd(a_1, \dots, a_n, N) = 1$.

Let $d = \gcd(a_1, \dots, a_n)$. Then $d$ and $N$ are relatively prime. By Theorem 3.4.8 (take for $A$ the inverse of the matrix produced there for the vector $v_1$), there is a matrix $A \in SL_n(\mathbb{Z})$ with $AM$ a matrix of the form
$$AM = \begin{bmatrix} d & w_2 & \cdots & w_n \\ 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{bmatrix}.$$
If $d = 1$ we may set $M_1 = M$, $B = I$, and $P = AM = BAM_1$. Otherwise, let $L$ be the matrix with an entry of $N$ in the $(2,1)$ position and all other entries 0. Let $M_1 = M + A^{-1}L$. Then $AM_1$ has the same form but with first column $(d, N, 0, \dots, 0)^t$,
$$AM_1 = \begin{bmatrix} d & w_2 & \cdots & w_n \\ N & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{bmatrix},$$
and $M_1 \equiv M \pmod N$.
As in the proof of Theorem 3.4.8, we choose integers $p$ and $q$ with $dp + Nq = 1$. Let $E$ be the 2-by-2 matrix
$$E = \begin{bmatrix} p & q \\ -N & d \end{bmatrix}$$
and let $B$ be the $n$-by-$n$ block matrix
$$B = \begin{bmatrix} E & 0 \\ 0 & I \end{bmatrix}.$$
Then $P = BAM_1$ is of the form
$$P = \begin{bmatrix} 1 & u_2 & \cdots & u_n \\ 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{bmatrix}.$$
Write $P$ as a block matrix
$$P = \begin{bmatrix} 1 & X \\ 0 & U \end{bmatrix}.$$
Then $\det(P) \equiv \det(M) \equiv 1 \pmod N$, so $\det(U) \equiv 1 \pmod N$. $U$ is an $(n-1)$-by-$(n-1)$ matrix, so by the inductive hypothesis there is a matrix $V \in SL_{n-1}(\mathbb{Z})$ with $V \equiv U \pmod N$. Set
$$Q = \begin{bmatrix} 1 & X \\ 0 & V \end{bmatrix}.$$
Then $Q \in SL_n(\mathbb{Z})$ and
$$Q \equiv P = BAM_1 \equiv BAM \pmod N.$$
Thus
$$R = (BA)^{-1}Q \in SL_n(\mathbb{Z}) \quad\text{and}\quad R \equiv M \pmod N,$$
i.e., $\varphi(R) = \varphi(M) = \overline{M}$, as required.
3.5 Orientation
We now study orientations of real vector spaces, where we will see the geometric meaning of the sign of the determinant. Before we consider orientation per se it is illuminating to study the topology of the general linear group $GL_n(\mathbb{R})$, the group of invertible $n$-by-$n$ matrices with real entries.
Theorem 3.5.1. The general linear group $GL_n(\mathbb{R})$ has two components.

Proof. We have the determinant function $\det: M_n(\mathbb{R}) \to \mathbb{R}$. Since a matrix is invertible if and only if its determinant is nonzero,
$$GL_n(\mathbb{R}) = {\det}^{-1}(\mathbb{R} - \{0\}).$$
Now $\mathbb{R} - \{0\}$ has two components, so $GL_n(\mathbb{R})$ has at least two components, {matrices with positive determinant} and {matrices with negative determinant}. We will show that each of these two sets is path-connected. (Since $GL_n(\mathbb{R})$ is an open subset of Euclidean space, components and path components are the same.)

We know that every nonsingular matrix can be transformed to the identity matrix by left-multiplication by a sequence of elementary matrices, which have the effect of performing a sequence of elementary row operations. (We could equally well right-multiply and perform column operations with no change in the proof.) We will consider a variant on elementary row operations, namely operations of the following type:
(1) Left multiplication by the matrix $\widetilde{E}$ that is the identity except for an entry $a$ in the $(i,j)$ position, $i \neq j$, which has the effect of adding $a$ times row $j$ to row $i$. (This is a usual row operation.)
(2) Left multiplication by the matrix $\widetilde{E}$ that is the identity except for an entry $c > 0$ in the $(i,i)$ position, which has the effect of multiplying row $i$ by $c$. (This is a usual row operation, but here we restrict $c$ to be positive.)
(3) Left multiplication by the matrix $\widetilde{E}$ that is the identity except in rows and columns $i$ and $j$, where it is
$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$
(an entry of 1 in the $(i,j)$ position, $-1$ in the $(j,i)$ position, and 0 in the $(i,i)$ and $(j,j)$ positions), which has the effect of replacing row $i$ by row $j$ and row $j$ by the negative of row $i$. (This differs by a sign from a usual row operation, which replaces each of these two rows by the other.)

There is a path in $GL_n(\mathbb{R})$ connecting the identity to each of these elements $\widetilde{E}$.

In case (1), we have the path given by $\widetilde{E}(t)$ = the identity with entry $ta$ in the $(i,j)$ position, for $0 \leq t \leq 1$.
In case (2), we have the path given by $\widetilde{E}(t)$ = the identity with entry $\exp(t\ln(c))$ in the $(i,i)$ position, for $0 \leq t \leq 1$.
In case (3), we have the path given by $\widetilde{E}(t)$ = the identity except in rows and columns $i$ and $j$, where it is
$$\begin{bmatrix} \cos(\pi t/2) & \sin(\pi t/2) \\ -\sin(\pi t/2) & \cos(\pi t/2) \end{bmatrix},$$
for $0 \leq t \leq 1$.
Now let $A$ be an invertible matrix and suppose we have a sequence of elementary row operations that reduces $A$ to the identity, so that $E_k \cdots E_2E_1A = I$. Replacing each $E_i$ by the corresponding matrix $\widetilde{E}_i$, we see that $\widetilde{E}_k \cdots \widetilde{E}_1A = \widetilde{I}$ is a matrix differing from $I$ in at most the sign of its entries, i.e., $\widetilde{I}$ is a diagonal matrix with each diagonal entry equal to $\pm 1$.

As $t$ goes from 0 to 1, the product $\widetilde{E}_1(t)A$ gives a path from $A$ to $\widetilde{E}_1A$; as $t$ goes from 0 to 1, $\widetilde{E}_2(t)\widetilde{E}_1A$ gives a path from $\widetilde{E}_1A$ to $\widetilde{E}_2\widetilde{E}_1A$, and so forth. In the end we have a path from $A$ to $\widetilde{I}$, so $A$ and $\widetilde{I}$ are in the same path component of $GL_n(\mathbb{R})$. Note that $A$ and $\widetilde{I}$ have determinants with the same sign. Thus there are two possibilities:

(1) $A$ has a positive determinant. In this case $\widetilde{I}$ has an even number of $-1$ entries on the diagonal, which can be paired. Suppose there is a pair of $-1$ entries in positions $(i,i)$ and $(j,j)$. If $\widetilde{E}$ is the appropriate matrix of type (3), $\widetilde{E}^2\widetilde{I}$ will be a matrix of the same form as $\widetilde{I}$, but with both of these entries equal to $+1$ and the others unchanged. As above, we have a path from $\widetilde{I}$ to $\widetilde{E}^2\widetilde{I}$. Continue in this fashion to obtain a path from $\widetilde{I}$ to $I$, and hence a path from $A$ to $I$. Thus $A$ is in the same path component as $I$.

(2) $A$ has a negative determinant. In this case $\widetilde{I}$ has an odd number of $-1$ entries. Proceeding as in (1), we pair up all but one of the $-1$ entries to obtain a path from $\widetilde{I}$ to a diagonal matrix with a single $-1$ entry on the diagonal and all other diagonal entries equal to 1. If the $-1$ entry is in the $(1,1)$ position there is nothing more to do. If it is in the $(i,i)$ position for $i \neq 1$ (and hence the entry in the $(1,1)$ position is 1) we apply the square of an appropriate matrix $\widetilde{E}$ of type (3), as in case (1), to obtain the diagonal matrix with $-1$ as the first entry on the diagonal and all other entries equal to 1, and hence a path from $A$ to this matrix, which we shall denote by $I^-$. Thus in this case $A$ is in the same path component as $I^-$.
We now come to the notion of an orientation of a real vector space. We assume $V$ is finite dimensional and $\dim(V) > 0$.

Definition 3.5.2. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ and $\mathcal{C} = \{w_1, \dots, w_n\}$ be two bases of the $n$-dimensional real vector space $V$. Then $\mathcal{B}$ and $\mathcal{C}$ give the same orientation of $V$ if the change of basis matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ has positive determinant, while they give opposite orientations of $V$ if the change of basis matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ has negative determinant. ◊

Remark 3.5.3. It is easy to check that “giving the same orientation” is an equivalence relation on bases. It then follows that we can regard an orientation on a real vector space (of positive finite dimension) as an equivalence class of bases of $V$, and there are two such equivalence classes. ◊
In general, there is no preferred orientation on a real vector space, but
in one very important special case there is.
Definition 3.5.4. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ be a basis of $\mathbb{R}^n$. Then $\mathcal{B}$ gives the standard orientation of $\mathbb{R}^n$ if $\mathcal{B}$ gives the same orientation as the standard basis $\mathcal{E}$ of $\mathbb{R}^n$. Otherwise $\mathcal{B}$ gives the nonstandard orientation of $\mathbb{R}^n$. ◊

Remark 3.5.5. (1) $\mathcal{E}$ itself gives the standard orientation of $\mathbb{R}^n$, as $P_{\mathcal{E} \leftarrow \mathcal{E}} = I$ has determinant 1.

(2) The condition in Definition 3.5.4 can be phrased more simply. By Remark 2.3.6(1), $P_{\mathcal{E} \leftarrow \mathcal{B}}$ is the matrix $P_{\mathcal{E} \leftarrow \mathcal{B}} = [v_1 \mid v_2 \mid \cdots \mid v_n]$. So $\mathcal{B}$ gives the standard orientation of $\mathbb{R}^n$ if $\det(P_{\mathcal{E} \leftarrow \mathcal{B}}) > 0$ and the nonstandard orientation of $\mathbb{R}^n$ if $\det(P_{\mathcal{E} \leftarrow \mathcal{B}}) < 0$.

(3) In Definition 3.5.4, recalling that $P_{\mathcal{C} \leftarrow \mathcal{B}} = (P_{\mathcal{E} \leftarrow \mathcal{C}})^{-1}P_{\mathcal{E} \leftarrow \mathcal{B}}$, we see that $\mathcal{B}$ and $\mathcal{C}$ give the same orientation of $\mathbb{R}^n$ if the determinants of the matrices $[v_1 \mid v_2 \mid \cdots \mid v_n]$ and $[w_1 \mid w_2 \mid \cdots \mid w_n]$ have the same sign, and opposite orientations if they have opposite signs. ◊
Much of the significance of the orientation of a real vector space comes from topological considerations. We continue to let $V$ be a real vector space of finite dimension $n > 0$, and we choose a basis $\mathcal{B}_0$ of $V$. For any basis $\mathcal{C}$ of $V$ we have a map $f_0: \{\text{bases of } V\} \to GL_n(\mathbb{R})$ given by $f_0(\mathcal{C}) = P_{\mathcal{B}_0 \leftarrow \mathcal{C}}$. (If $\mathcal{C} = \{w_1, \dots, w_n\}$ then $f_0(\mathcal{C})$ is the matrix $[[w_1]_{\mathcal{B}_0} \mid \cdots \mid [w_n]_{\mathcal{B}_0}]$.) This map is 1-to-1 and onto. We then give $\{\text{bases of } V\}$ a topology by requiring that $f_0$ be a homeomorphism. That is, we define a subset $O$ of $\{\text{bases of } V\}$ to be open if and only if $f_0(O)$ is an open subset of $GL_n(\mathbb{R})$.

A priori, this topology depends on the choice of $\mathcal{B}_0$, but in fact it does not. For if we choose a different basis $\mathcal{B}_1$ and let $f_1(\mathcal{C}) = P_{\mathcal{B}_1 \leftarrow \mathcal{C}}$, then
$f_1(\mathcal{C}) = Pf_0(\mathcal{C})$ where $P$ is the constant matrix $P = P_{\mathcal{B}_1 \leftarrow \mathcal{B}_0}$, and multiplication by the constant matrix $P$ is a homeomorphism from $GL_n(\mathbb{R})$ to itself.

We then have:
Corollary 3.5.6. Let $V$ be an $n$-dimensional real vector space and let $\mathcal{B}$ and $\mathcal{C}$ be two bases of $V$. Then $\mathcal{B}$ and $\mathcal{C}$ give the same orientation of $V$ if and only if $\mathcal{B}$ can continuously be deformed to $\mathcal{C}$, i.e., if and only if there is a continuous function $p: [0,1] \to \{\text{bases of } V\}$ with $p(0) = \mathcal{B}$ and $p(1) = \mathcal{C}$.

Proof. The bases $\mathcal{B}$ and $\mathcal{C}$ of $V$ give the same orientation of $V$ if and only if $P_{\mathcal{C} \leftarrow \mathcal{B}}$ has positive determinant, and by Theorem 3.5.1 this is true if and only if there is a path in $GL_n(\mathbb{R})$ joining $I$ to $P_{\mathcal{C} \leftarrow \mathcal{B}}$.

To be more explicit, let $p: [0,1] \to GL_n(\mathbb{R})$ with $p(0) = I$ and $p(1) = P_{\mathcal{C} \leftarrow \mathcal{B}}$. For any $t$ between 0 and 1, let $\mathcal{B}_t$ be the basis defined by $P_{\mathcal{B}_t \leftarrow \mathcal{B}} = p(t)$. Then $\mathcal{B}_0 = \mathcal{B}$ and $\mathcal{B}_1 = \mathcal{C}$.
Note that there is no corresponding analog of orientation for complex vector spaces. This is a consequence of the following theorem.

Theorem 3.5.7. The general linear group $GL_n(\mathbb{C})$ is connected.
Proof. We show that it is path connected (which is equivalent, as $GL_n(\mathbb{C})$ is an open subset of Euclidean space). The proof is very much like the proof of Theorem 3.5.1, but easier. We show that there are paths joining the identity matrix to the usual elementary matrices.
(1) For $E$ the identity except for an entry $a$ in the $(i,j)$ position, $i \neq j$, we have the path $p(t)$ = the identity with entry $a_t = ta$ in the $(i,j)$ position.
(2) For $E$ the identity except for an entry $c = re^{i\theta}$ (with $r > 0$) in the $(i,i)$ position, we have the path $p(t)$ = the identity with entry $c_t = e^{t\ln(r)}e^{it\theta}$ in the $(i,i)$ position.
(3) For $E$ the identity except in rows and columns $i$ and $j$, where it is
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
(the usual interchange matrix),
we have the path $p(t)$ = the identity except in rows and columns $i$ and $j$, where it is
$$\begin{bmatrix} a_t & b_t \\ c_t & d_t \end{bmatrix} = \begin{bmatrix} \cos(\pi t/2)e^{i\pi t} & \sin(\pi t/2) \\ \sin(\pi t/2) & e^{i\pi t}\cos(\pi t/2) \end{bmatrix}$$
for $0 \leq t \leq 1$. (One checks that the determinant $e^{2i\pi t}\cos^2(\pi t/2) - \sin^2(\pi t/2)$ never vanishes.)
We may also consider the effect of nonsingular linear transformations on orientation.

Definition 3.5.8. Let $V$ be an $n$-dimensional real vector space and let $T: V \to V$ be a nonsingular linear transformation. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ be a basis of $V$. Then $\mathcal{C} = \{T(v_1), \dots, T(v_n)\}$ is also a basis of $V$. If $\mathcal{B}$ and $\mathcal{C}$ give the same orientation of $V$ then $T$ is orientation preserving, while if $\mathcal{B}$ and $\mathcal{C}$ give opposite orientations of $V$ then $T$ is orientation reversing. ◊

The fact that this is well-defined, i.e., independent of the choice of basis $\mathcal{B}$, follows from the following proposition, which proves a more precise result.

Proposition 3.5.9. Let $V$ be an $n$-dimensional real vector space and let $T: V \to V$ be a nonsingular linear transformation. Then $T$ is orientation preserving if $\det(T) > 0$, and $T$ is orientation reversing if $\det(T) < 0$.
Remark 3.5.10. Suppose we begin with a complex vector space $V$ of dimension $n$. We may then “forget” the fact that we have complex numbers acting as scalars and in this way regard $V$ as a real vector space $V_{\mathbb{R}}$ of dimension $2n$. In this situation $V_{\mathbb{R}}$ has a canonical orientation. Choosing any basis $\mathcal{B} = \{v_1, \dots, v_n\}$ of $V$, we obtain a basis $\mathcal{B}_{\mathbb{R}} = \{v_1, iv_1, \dots, v_n, iv_n\}$
of $V_{\mathbb{R}}$. It is easy to check that if $\mathcal{C}$ is any other basis of $V$, then $\mathcal{C}_{\mathbb{R}}$ gives the same orientation of $V_{\mathbb{R}}$ as $\mathcal{B}_{\mathbb{R}}$ does. Furthermore, suppose we have an arbitrary linear transformation $T: V \to V$. By “forgetting” the complex structure we similarly obtain a linear transformation $T_{\mathbb{R}}: V_{\mathbb{R}} \to V_{\mathbb{R}}$. In this situation $\det(T_{\mathbb{R}}) = \det(T)\overline{\det(T)} = |\det(T)|^2$. In particular, if $T$ is nonsingular, then $T_{\mathbb{R}}$ is not only nonsingular but also orientation preserving. ◊
3.6 Hilbert matrices
In this section we present, without proofs, a single family of examples, the Hilbert matrices. This family is both interesting and important. More information on it can be found in the article “Tricks or Treats with the Hilbert Matrix” by M. D. Choi, Amer. Math. Monthly 90 (1983), 301–312.

In this section we adopt the convention that the rows and columns of an $n$-by-$n$ matrix are numbered from 0 to $n-1$.

Definition 3.6.1. The $n$-by-$n$ Hilbert matrix is the matrix $H_n = (h_{ij})$ with $h_{ij} = 1/(i + j + 1)$. ◊
Theorem 3.6.2. (1) The determinant of $H_n$ is
$$\det(H_n) = \frac{\bigl(1!\,2!\cdots(n-1)!\bigr)^4}{1!\,2!\cdots(2n-1)!}.$$

(2) Let $G_n = (g_{ij}) = H_n^{-1}$. Then $G_n$ has entries
$$g_{ij} = (-1)^{i+j}(i+j+1)\binom{n+i}{n-1-j}\binom{n+j}{n-1-i}\binom{i+j}{i}\binom{i+j}{j}.$$

Remark 3.6.3. The entries of $H_n^{-1}$ are all integers, and it is known that $\det(H_n)$ is the reciprocal of an integer. ◊
Example 3.6.4. (1) $\det(H_2) = 1/12$ and
$$H_2^{-1} = \begin{bmatrix} 4 & -6 \\ -6 & 12 \end{bmatrix}.$$

(2) $\det(H_3) = 1/2160$ and
$$H_3^{-1} = \begin{bmatrix} 9 & -36 & 30 \\ -36 & 192 & -180 \\ 30 & -180 & 180 \end{bmatrix}.$$
(3) $\det(H_4) = 1/6048000$ and
$$H_4^{-1} = \begin{bmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{bmatrix}.$$

(4) $\det(H_5) = 1/266716800000$ and
$$H_5^{-1} = \begin{bmatrix} 25 & -300 & 1050 & -1400 & 630 \\ -300 & 4800 & -18900 & 26880 & -12600 \\ 1050 & -18900 & 79380 & -117600 & 56700 \\ -1400 & 26880 & -117600 & 179200 & -88200 \\ 630 & -12600 & 56700 & -88200 & 44100 \end{bmatrix}. \quad◊$$
While we do not otherwise deal with numerical linear algebra in this book, the Hilbert matrices present examples that are so pretty and striking that we cannot resist giving a pair.

These examples arise from the fact that, while $H_n$ is nonsingular, its determinant is very close to zero. (In technical terms, $H_n$ is “ill-conditioned”.) We can already see this when $n = 3$.
Example 3.6.5. (1) Consider the equation
$$H_3v = \begin{bmatrix} 11/6 \\ 13/12 \\ 47/60 \end{bmatrix} = \begin{bmatrix} 1.8333\ldots \\ 1.0833\ldots \\ 0.7833\ldots \end{bmatrix}.$$
It has solution
$$v = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
Let us round off the right-hand side to two significant digits and consider the equation
$$H_3v = \begin{bmatrix} 1.8 \\ 1.1 \\ 0.78 \end{bmatrix}.$$
It has solution
$$v = \begin{bmatrix} 0 \\ 6 \\ -3.6 \end{bmatrix}.$$
(2) Let us round off the entries of $H_3$ to two significant figures to obtain the matrix
$$\begin{bmatrix} 1 & 0.5 & 0.33 \\ 0.5 & 0.33 & 0.25 \\ 0.33 & 0.25 & 0.2 \end{bmatrix}.$$
It has inverse
$$\frac{1}{63}\begin{bmatrix} 3500 & -17500 & 16100 \\ -17500 & 91100 & -85000 \\ 16100 & -85000 & 80000 \end{bmatrix}.$$
Rounding the entries off to the nearest integer, this is
$$\begin{bmatrix} 56 & -278 & 256 \\ -278 & 1446 & -1349 \\ 256 & -1349 & 1270 \end{bmatrix}. \quad◊$$
CHAPTER 4
The structure of a linear transformation I
In this chapter we begin our analysis of the structure of a linear transformation $T: V \to V$, where $V$ is a finite-dimensional $F$-vector space.

We have arranged our exposition in order to bring some of the most important concepts to the fore first. Thus we begin with the notions of eigenvalues and eigenvectors, and we introduce the characteristic and minimum polynomials of a linear transformation early in this chapter as well. In this way we can get to some of the most important structural results, including results on diagonalizability and the Cayley-Hamilton theorem, as quickly as possible.

Recall our metaphor of coordinates as a language in which to speak about vectors and linear transformations. Consider a linear transformation $T: V \to V$, $V$ a finite-dimensional vector space. Once we choose a basis $\mathcal{B}$ of $V$, i.e., a language, we have the coordinate vector $[v]_{\mathcal{B}}$ of every vector $v$ in $V$, a vector in $F^n$, and the matrix $[T]_{\mathcal{B}}$ of the linear transformation $T$, an $n$-by-$n$ matrix (where $n$ is the dimension of $V$), with the property that $[T(v)]_{\mathcal{B}} = [T]_{\mathcal{B}}[v]_{\mathcal{B}}$. If we choose a different basis $\mathcal{C}$, i.e., a different language, we get different coordinate vectors $[v]_{\mathcal{C}}$ and a different matrix $[T]_{\mathcal{C}}$ of $T$, though again we have the identity $[T(v)]_{\mathcal{C}} = [T]_{\mathcal{C}}[v]_{\mathcal{C}}$. We have also seen change of basis matrices, which tell us how to translate between languages.

But here, mathematical language is different than human language. In human language, if we have a problem expressed in English, and we translate it into German, we haven't helped the situation. We have the same problem, expressed differently, but no easier to solve.
In linear algebra the situation is different. Given a linear transformation $T: V \to V$, $V$ a finite-dimensional vector space, there is a preferred basis $\mathcal{B}$ of $V$, i.e., a best language in which to study the problem, one that makes $[T]_{\mathcal{B}}$ as simple as possible and makes the structure of $T$ easiest to understand. This is the language of eigenvalues, eigenvectors, and generalized eigenvectors.

We first consider a simple example to motivate our discussion. Let $A$ be the matrix
$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$
and consider $T_A: \mathbb{R}^2 \to \mathbb{R}^2$ (where, as usual, $T_A(v) = Av$). Also, consider the standard basis $\mathcal{E}$, so $\left[\begin{smallmatrix} x \\ y \end{smallmatrix}\right]_{\mathcal{E}} = \left[\begin{smallmatrix} x \\ y \end{smallmatrix}\right]$ for every vector $\left[\begin{smallmatrix} x \\ y \end{smallmatrix}\right] \in \mathbb{R}^2$, and furthermore $[T_A]_{\mathcal{E}} = A$. $T_A$ looks simple, and indeed it is easy to understand. We observe that $T_A(e_1) = 2e_1$, where $e_1 = \left[\begin{smallmatrix} 1 \\ 0 \end{smallmatrix}\right]$ is the first standard basis vector in $\mathcal{E}$, and $T_A(e_2) = 3e_2$, where $e_2 = \left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]$ is the second standard basis vector in $\mathcal{E}$. Geometrically, $T_A$ takes the vector $e_1$ and stretches it by a factor of 2 in its direction, and takes the vector $e_2$ and stretches it by a factor of 3 in its direction.
On the other hand, let $B$ be the matrix
$$B = \begin{bmatrix} -4 & -14 \\ 3 & 9 \end{bmatrix}$$
and consider $T_B: \mathbb{R}^2 \to \mathbb{R}^2$. Now $T_B(e_1) = B\left[\begin{smallmatrix} 1 \\ 0 \end{smallmatrix}\right] = \left[\begin{smallmatrix} -4 \\ 3 \end{smallmatrix}\right]$, and $T_B(e_2) = B\left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right] = \left[\begin{smallmatrix} -14 \\ 9 \end{smallmatrix}\right]$, and $T_B$ looks like a mess. $T_B$ takes each of these vectors to some seemingly random vector in the plane, and there seems to be no rhyme or reason here. But this appearance is deceptive, and comes from the fact that we are studying $B$ by using the standard basis $\mathcal{E}$, i.e., in the $\mathcal{E}$ language, which is the wrong language for the problem.

Instead, let us choose the basis $\mathcal{B} = \{b_1, b_2\} = \left\{\left[\begin{smallmatrix} 7 \\ -3 \end{smallmatrix}\right], \left[\begin{smallmatrix} -2 \\ 1 \end{smallmatrix}\right]\right\}$. Then $T_B(b_1) = B\left[\begin{smallmatrix} 7 \\ -3 \end{smallmatrix}\right] = \left[\begin{smallmatrix} 14 \\ -6 \end{smallmatrix}\right] = 2\left[\begin{smallmatrix} 7 \\ -3 \end{smallmatrix}\right] = 2b_1$, and $T_B(b_2) = B\left[\begin{smallmatrix} -2 \\ 1 \end{smallmatrix}\right] = \left[\begin{smallmatrix} -6 \\ 3 \end{smallmatrix}\right] = 3\left[\begin{smallmatrix} -2 \\ 1 \end{smallmatrix}\right] = 3b_2$. Thus $T_B$ has exactly the same geometry as $T_A$: It takes the vector $b_1$ and stretches it by a factor of 2 in its direction, and it takes the vector $b_2$ and stretches it by a factor of 3 in its direction. So we should study $T_B$ by using the $\mathcal{B}$ basis, i.e., in the $\mathcal{B}$ language. This is
the right language for our problem, as it makes $T_B$ easiest to understand. Referring to Remark 2.2.8 we see that
$$[T_B]_{\mathcal{B}} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = [T_A]_{\mathcal{E}}.$$
This “right” language is the language of eigenvalues, eigenvectors, and generalized eigenvectors, and the language that lets us express the matrix of a linear transformation in “canonical form”.

But before we proceed further, let me make two more remarks.

On the one hand, even if $V$ is not finite dimensional, it is often the case that we still want to study eigenvalues and eigenvectors for a linear transformation $T$, as these are important structural features of $T$ and still give us a good way (sometimes the best way) of understanding $T$.

On the other hand, in studying a linear transformation $T$ on a finite-dimensional vector space, it is often a big mistake to pick a basis $\mathcal{B}$ and study $[T]_{\mathcal{B}}$. It may be unnatural to pick any basis at all. $T$ is what comes naturally and is usually what we want to study, even if in the end we can get important information about $T$ by looking at $[T]_{\mathcal{B}}$. Let me again emphasize this point: Linear algebra is about linear transformations, not matrices.
4.1 Eigenvalues, eigenvectors, and generalized eigenvectors

In this section we introduce some of the most important structural information associated to a linear transformation.

Definition 4.1.1. Let $T: V \to V$ be a linear transformation. Let $\lambda \in F$. If $\operatorname{Ker}(T - \lambda I) \neq \{0\}$, then $\lambda$ is an eigenvalue of $T$. In this case, any nonzero $v \in \operatorname{Ker}(T - \lambda I)$ is an eigenvector of $T$, and the subspace $\operatorname{Ker}(T - \lambda I)$ of $V$ is an eigenspace of $T$. In this situation, $\lambda$, $v$, and $\operatorname{Ker}(T - \lambda I)$ are associated. ◊

Remark 4.1.2. Let $v \in V$, $v \neq 0$. If $v \in \operatorname{Ker}(T - \lambda I)$, then $(T - \lambda I)(v) = 0$, i.e., $T(v) = \lambda v$, and conversely; this is the traditional definition of an eigenvector. ◊
We will give some examples of this very important concept shortly, but it is convenient to generalize it first.
Definition 4.1.3. Let $T: V \to V$ be a linear transformation and let $\lambda \in F$ be an eigenvalue of $T$. The generalized eigenspace of $T$ associated to $\lambda$ is the subspace of $V$ given by
$$\{v \mid (T - \lambda I)^k(v) = 0 \text{ for some positive integer } k\}.$$
If $v$ is a nonzero vector in this generalized eigenspace, then $v$ is a generalized eigenvector associated to the eigenvalue $\lambda$. For such a $v$, the smallest positive integer $k$ for which $(T - \lambda I)^k(v) = 0$ is the index of $v$. ◊

Remark 4.1.4. A generalized eigenvector of index 1 is just an eigenvector. ◊

For a linear transformation $T$ and an eigenvalue $\lambda$ of $T$, we let $E_\lambda$ denote the eigenspace $E_\lambda = \operatorname{Ker}(T - \lambda I)$. For a positive integer $k$, we let $E_\lambda^k$ be the subspace $E_\lambda^k = \operatorname{Ker}\bigl((T - \lambda I)^k\bigr)$. We let $E_\lambda^\infty$ denote the generalized eigenspace associated to the eigenvalue $\lambda$. We see that $E_\lambda^1 \subseteq E_\lambda^2 \subseteq \cdots$ and that the union of these subspaces is $E_\lambda^\infty$.
Example 4.1.5. (1) Let $V = {}_rF^\infty$ and let $L: V \to V$ be left shift. Then $L$ has the single eigenvalue $\lambda = 0$, and the eigenspace $E_0$ is 1-dimensional, $E_0 = \{(a_1, a_2, \ldots) \in V \mid a_i = 0 \text{ for } i > 1\}$. More generally, $E_0^k = \{(a_1, a_2, \ldots) \in V \mid a_i = 0 \text{ for } i > k\}$, so $\dim E_0^k = k$ for every $k$, and finally $V = E_0^\infty$. In contrast, $R: V \to V$ (right shift) does not have any eigenvalues.

(2) Let $V = {}_rF^{\infty\infty}$ and let $L: V \to V$ be left shift. Then for any $\lambda \in F$, $E_\lambda$ is 1-dimensional with basis $\{(1, \lambda, \lambda^2, \ldots)\}$. It is routine to check that $E_\lambda^k$ is $k$-dimensional for every $\lambda \in F$ and every positive integer $k$. In contrast, $R: V \to V$ does not have any eigenvalues.

(3) Let $F$ be a field of characteristic 0 and let $V = P(F)$, the space of all polynomials with coefficients in $F$. Let $D: V \to V$ be differentiation, $D(p(x)) = p'(x)$. Then $D$ has the single eigenvalue 0 and the corresponding eigenspace $E_0$ is 1-dimensional, consisting of the constant polynomials. More generally, $E_0^k$ is $k$-dimensional, consisting of all polynomials of degree at most $k - 1$.

(4) Let $V = P(F)$ be the space of all polynomials with coefficients in a field of characteristic 0 and let $T: V \to V$ be defined by $T(p(x)) = xp'(x)$. Then the eigenvalues of $T$ are the nonnegative integers, and for every nonnegative integer $m$ the eigenspace $E_m$ is 1-dimensional with basis $\{x^m\}$.

(5) Let $V$ be the space of holomorphic functions on $\mathbb{C}$, and let $D: V \to V$ be differentiation, $D(f(z)) = f'(z)$. For any complex number $\lambda$, $E_\lambda$
is 1-dimensional with basis $\{f(z) = e^{\lambda z}\}$. Also, $E_\lambda^k$ is $k$-dimensional with basis $\{e^{\lambda z}, ze^{\lambda z}, \ldots, z^{k-1}e^{\lambda z}\}$. ◊
Now we turn to some finite-dimensional examples. We adopt the standard language that the eigenvalues, eigenvectors, etc. of an $n$-by-$n$ matrix $A$ are the eigenvalues, eigenvectors, etc. of $T_A: F^n \to F^n$ (where $T_A(v) = Av$).

Example 4.1.6. (1) Let $\lambda_1, \dots, \lambda_n$ be distinct elements of $F$ and let $A$ be the $n$-by-$n$ diagonal matrix
$$A = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}.$$
For each $i = 1, \dots, n$, $\lambda_i$ is an eigenvalue of $A$ with 1-dimensional eigenspace $E_{\lambda_i}$ with basis $\{e_i\}$.

(2) Let $\lambda$ be an element of $F$ and let $A$ be the $n$-by-$n$ matrix
$$A = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}$$
with entries of $\lambda$ on the diagonal, 1 immediately above the diagonal, and 0 everywhere else. For each $k = 1, \dots, n$, $e_k$ is a generalized eigenvector of index $k$, and the generalized eigenspace $E_\lambda^k$ is $k$-dimensional with basis $\{e_1, \dots, e_k\}$. ◊
Now we introduce the characteristic polynomial.

Definition 4.1.7. Let $A$ be an $n$-by-$n$ matrix. The characteristic polynomial $c_A(x)$ of $A$ is the polynomial
$$c_A(x) = \det(xI - A). \quad◊$$

Remark 4.1.8. By properties of the determinant it is clear that $c_A(x)$ is a monic polynomial of degree $n$. ◊

Lemma 4.1.9. Let $A$ and $B$ be similar matrices. Then $c_A(x) = c_B(x)$.
Proof. If BDPAP 1;then cB.x/ Ddet.xI B/ Ddet.xI PAP 1/D
det.P.xI A/P 1/Ddet.xI A/ DcA.x/ by Corollary 3.3.2.
Definition 4.1.10. Let Vbe a finite-dimensional vector space and let
TWV!Vbe a linear transformation. Let Bbe any basis of Vand let
ADŒTB:The characteristic polynomial cT.x/ is the polynomial
cT.x/ DcA.x/ Ddet.xI A/:
Þ
Remark 4.1.11. By Corollary 2.3.11 and Lemma 4.1.9, cT.x/ is well-
defined (i.e., independent of the choice of basis Bof V). Þ
Theorem 4.1.12. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Then $\lambda$ is an eigenvalue of $T$ if and only if $\lambda$ is a root of the characteristic polynomial $c_T(x)$, i.e., if and only if $c_T(\lambda) = 0$.

Proof. Let $\mathcal{B}$ be a basis of $V$ and let $A = [T]_{\mathcal{B}}$. Then by definition $\lambda$ is an eigenvalue of $T$ if and only if there is a nonzero vector $v$ in $\operatorname{Ker}(T - \lambda I)$, i.e., if and only if $(A - \lambda I)u = 0$ for some nonzero vector $u$ in $F^n$ (where the connection is that $u = [v]_{\mathcal{B}}$). This is the case if and only if $A - \lambda I$ is singular, which is the case if and only if $\det(A - \lambda I) = 0$. But $\det(A - \lambda I) = (-1)^n \det(\lambda I - A)$, where $n = \dim(V)$, so this is the case if and only if $c_T(\lambda) = c_A(\lambda) = \det(\lambda I - A) = 0$.
Remark 4.1.13. We have defined $c_A(x) = \det(xI - A)$ and this is the correct definition, as we want $c_A(x)$ to be a monic polynomial. In actually finding eigenvectors or generalized eigenvectors, it is generally more convenient to work with $A - \lambda I$ rather than $\lambda I - A$. Indeed, when it comes to finding chains of generalized eigenvectors, it is almost essential to use $A - \lambda I$, as using $\lambda I - A$ would introduce spurious minus signs, which would have to be corrected for. ♦
For the remainder of this section we assume that $V$ is finite-dimensional.
Definition 4.1.14. Let $T: V \to V$ and let $\lambda$ be an eigenvalue of $T$. The algebraic multiplicity of $\lambda$, alg-mult$(\lambda)$, is the multiplicity of $\lambda$ as a root of the characteristic polynomial $c_T(x)$. The geometric multiplicity of $\lambda$, geom-mult$(\lambda)$, is the dimension of the associated eigenspace $E_\lambda = \operatorname{Ker}(T - \lambda I)$. ♦

We use multiplicity to mean algebraic multiplicity, as is standard.
Lemma 4.1.15. Let $T: V \to V$ and let $\lambda$ be an eigenvalue of $T$. Then
$$1 \le \text{geom-mult}(\lambda) \le \text{alg-mult}(\lambda).$$

Proof. By definition, if $\lambda$ is an eigenvalue of $T$ there exists a (nonzero) eigenvector, so $1 \le \dim(E_\lambda)$.

Suppose $\dim(E_\lambda) = d$ and let $\{v_1, \ldots, v_d\} = \mathcal{B}_1$ be a basis for $E_\lambda$. Extend $\mathcal{B}_1$ to a basis $\mathcal{B} = \{v_1, \ldots, v_n\}$ of $V$. Then
$$[T]_{\mathcal{B}} = \begin{bmatrix} \lambda I & B \\ 0 & D \end{bmatrix} = A,$$
a block matrix with the upper left-hand block $d$-by-$d$. Then
$$[xI - T]_{\mathcal{B}} = xI - A = \begin{bmatrix} xI - \lambda I & -B \\ 0 & xI - D \end{bmatrix} = \begin{bmatrix} (x - \lambda)I & -B \\ 0 & xI - D \end{bmatrix},$$
so
$$c_T(x) = \det(xI - A) = \det\big((x - \lambda)I\big)\det(xI - D) = (x - \lambda)^d \det(xI - D),$$
and hence $d \le \text{alg-mult}(\lambda)$.
Corollary 4.1.16. Let $T: V \to V$ and let $\lambda$ be an eigenvalue of $T$ with alg-mult$(\lambda) = 1$. Then geom-mult$(\lambda) = 1$.

It is important to observe that the existence of eigenvalues and eigenvectors depends on the field $F$, as we see from the next example.
Example 4.1.17. For any nonzero rational number $t$ let $A_t$ be the matrix
$$A_t = \begin{bmatrix} 0 & 1 \\ t & 0 \end{bmatrix}, \quad\text{so}\quad A_t^2 = \begin{bmatrix} t & 0 \\ 0 & t \end{bmatrix} = tI.$$
Let $\lambda$ be an eigenvalue of $A_t$ with associated eigenvector $v$. Then, on the one hand,
$$A_t^2(v) = A_t(A_t(v)) = A_t(\lambda v) = \lambda A_t(v) = \lambda^2 v,$$
but, on the other hand,
$$A_t^2(v) = tI(v) = tv,$$
so $\lambda^2 = t$.

(1) Suppose $t = 1$. Then $\lambda^2 = 1$, $\lambda = \pm 1$, and we have the eigenvalue $\lambda = 1$ with associated eigenvector $v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and the eigenvalue $\lambda = -1$ with associated eigenvector $v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.

(2) Suppose $t = 2$. If we regard $A$ as being defined over $\mathbb{Q}$, then there is no $\lambda \in \mathbb{Q}$ with $\lambda^2 = 2$, so $A$ has no eigenvalues. If we regard $A$ as being defined over $\mathbb{R}$, then $\lambda = \pm\sqrt{2}$, and $\lambda = \sqrt{2}$ is an eigenvalue with associated eigenvector $\begin{bmatrix} 1 \\ \sqrt{2} \end{bmatrix}$, and $\lambda = -\sqrt{2}$ is an eigenvalue with associated eigenvector $\begin{bmatrix} 1 \\ -\sqrt{2} \end{bmatrix}$.

(3) Suppose $t = -1$. If we regard $A$ as being defined over $\mathbb{R}$, then there is no $\lambda \in \mathbb{R}$ with $\lambda^2 = -1$, so $A$ has no eigenvalues. If we regard $A$ as being defined over $\mathbb{C}$, then $\lambda = \pm i$, and $\lambda = i$ is an eigenvalue with associated eigenvector $\begin{bmatrix} 1 \\ i \end{bmatrix}$, and $\lambda = -i$ is an eigenvalue with associated eigenvector $\begin{bmatrix} 1 \\ -i \end{bmatrix}$. ♦
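The field dependence is easy to see computationally as well. A small sketch (assuming Python with SymPy; the matrices are the $A_2$ and $A_{-1}$ of this example) shows that the eigenvalues SymPy reports are irrational and imaginary respectively, so neither matrix has an eigenvalue over $\mathbb{Q}$ (respectively $\mathbb{R}$):

```python
from sympy import Matrix

A2 = Matrix([[0, 1], [2, 0]])    # t = 2
Am1 = Matrix([[0, 1], [-1, 0]])  # t = -1

print(A2.eigenvals())   # {sqrt(2): 1, -sqrt(2): 1} -- no eigenvalues over Q
print(Am1.eigenvals())  # {I: 1, -I: 1} -- no eigenvalues over R

for val, mult, vecs in A2.eigenvects():
    print(val, vecs[0].T)  # each eigenvector is a multiple of (1, ±sqrt(2))
```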
Now we introduce the minimum polynomial.
Lemma 4.1.18. Let $A$ be an $n$-by-$n$ matrix. There is a nonzero polynomial $p(x)$ with $p(A) = 0$.

Proof. The set of matrices $\{I, A, \ldots, A^{n^2}\}$ is a set of $n^2 + 1$ elements of a vector space of dimension $n^2$, and so must be linearly dependent. Thus there exist scalars $c_0, \ldots, c_{n^2}$, not all zero, with $c_0 I + c_1 A + \cdots + c_{n^2} A^{n^2} = 0$. Then $p(A) = 0$ where $p(x)$ is the nonzero polynomial $p(x) = c_{n^2} x^{n^2} + \cdots + c_1 x + c_0$.

Theorem 4.1.19. Let $A$ be an $n$-by-$n$ matrix. There is a unique monic polynomial $m_A(x)$ of lowest degree with $m_A(A) = 0$. Furthermore, $m_A(x)$ divides every polynomial $p(x)$ with $p(A) = 0$.

Proof. By Lemma 4.1.18, there is some nonzero polynomial $p(x)$ with $p(A) = 0$.

If $p_1(x)$ and $p_2(x)$ are any polynomials with $p_1(A) = 0$ and $p_2(A) = 0$, and $q(x) = p_1(x) + p_2(x)$, then $q(A) = p_1(A) + p_2(A) = 0 + 0 = 0$. Also, if $p_1(x)$ is any polynomial with $p_1(A) = 0$, and $r(x)$ is any polynomial, and $q(x) = p_1(x)r(x)$, then $q(A) = p_1(A)r(A) = 0 \cdot r(A) =$
$0$. Thus, in the language of Definition A.1.5, the set of polynomials $\{p(x) \mid p(A) = 0\}$ is a nonzero ideal, and so by Lemma A.1.8 there is a unique monic polynomial $m_A(x)$ of lowest degree, as claimed.
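The proof of Lemma 4.1.18 is already an algorithm: look for the first power $A^k$ that is linearly dependent on $\{I, A, \ldots, A^{k-1}\}$. Here is a minimal sketch of that search (assuming Python with SymPy; the helper name `min_poly` and the example matrix are ours, not the book's):

```python
from sympy import Matrix, eye, symbols, Poly

def min_poly(A, x):
    """Return m_A(x): the monic polynomial of least degree with m_A(A) = 0."""
    n = A.rows
    powers = [eye(n)]            # I, A, A^2, ..., flattened into n^2-vectors
    for k in range(1, n * n + 2):
        powers.append(powers[-1] * A)
        M = Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
        ker = M.nullspace()
        if ker:                  # first linear dependence found
            c = ker[0] / ker[0][k]   # normalize so the polynomial is monic
            return Poly(sum(c[i] * x**i for i in range(k + 1)), x)

x = symbols('x')
A = Matrix([[2, 0, 0], [0, 2, 0], [0, 0, 3]])
print(min_poly(A, x))   # x**2 - 5*x + 6 = (x - 2)(x - 3), while c_A has degree 3
```

At the first $k$ where a dependence appears, the kernel is one-dimensional and its last coordinate is nonzero, so the normalization is safe; this is exactly the uniqueness argument in Theorem 4.1.19.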
Definition 4.1.20. The polynomial $m_A(x)$ of Theorem 4.1.19 is the minimum polynomial of $A$. ♦

Lemma 4.1.21. Let $A$ and $B$ be similar matrices. Then $m_A(x) = m_B(x)$.

Proof. If $B = PAP^{-1}$, and $p(x)$ is any polynomial with $p(A) = 0$, then $p(B) = P p(A) P^{-1} = P 0 P^{-1} = 0$, and vice-versa.

Definition 4.1.22. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Let $\mathcal{B}$ be any basis of $V$ and let $A = [T]_{\mathcal{B}}$. The minimum polynomial of $T$ is the polynomial $m_T(x)$ defined by $m_T(x) = m_A(x)$. ♦

Remark 4.1.23. By Corollary 2.3.11 and Lemma 4.1.21, $m_T(x)$ is well-defined (i.e., independent of the choice of basis $\mathcal{B}$ of $V$). Alternatively, we can see that $m_T(x)$ is well-defined since for any linear transformation $S: V \to V$, $S = 0$ (i.e., $S$ is the zero linear transformation) if and only if the matrix $[S]_{\mathcal{B}} = 0$ (i.e., $[S]_{\mathcal{B}}$ is the zero matrix) in any and every basis $\mathcal{B}$ of $V$. ♦
4.2 Some structural results
In this section we prove some basic but important structural results about a linear transformation, obtaining information about generalized eigenspaces, direct sum decompositions, and the relationship between the characteristic and minimum polynomials. As an application, we derive the famous Cayley-Hamilton theorem.
While we prove much stronger results later, the following result is so
easy that we will pause to obtain it here.
Definition 4.2.1. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. $T$ is triangularizable if there is a basis $\mathcal{B}$ of $V$ in which the matrix $[T]_{\mathcal{B}}$ is upper triangular. ♦

Theorem 4.2.2. Let $V$ be a finite-dimensional vector space over the field $F$ and let $T: V \to V$ be a linear transformation. Then $T$ is triangularizable if and only if its characteristic polynomial $c_T(x)$ is a product of linear factors. In particular, if $F$ is algebraically closed then every $T: V \to V$ is triangularizable.
Proof. If $[T]_{\mathcal{B}} = A$ is an upper triangular matrix with diagonal entries $d_1, \ldots, d_n$, then $c_T(x) = c_A(x) = \det(xI - A) = (x - d_1) \cdots (x - d_n)$ is a product of linear factors.

We prove the converse by induction on $n = \dim(V)$. Let $c_T(x) = (x - d_1) \cdots (x - d_n)$. Then $d_1$ is an eigenvalue of $T$; choose an eigenvector $v_1$ and let $V_1$ be the subspace of $V$ generated by $v_1$. Let $\overline{V} = V/V_1$. Then $T$ induces $\overline{T}: \overline{V} \to \overline{V}$ with $c_{\overline{T}}(x) = (x - d_2) \cdots (x - d_n)$. By induction, $\overline{V}$ has a basis $\overline{\mathcal{B}} = \{\overline{v}_2, \ldots, \overline{v}_n\}$ with $[\overline{T}]_{\overline{\mathcal{B}}} = D$ upper triangular. Let $v_i \in V$ with $\pi(v_i) = \overline{v}_i$ for $i = 2, \ldots, n$, and let $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$. Then
$$[T]_{\mathcal{B}} = \begin{bmatrix} d_1 & C \\ 0 & D \end{bmatrix}$$
for some $1$-by-$(n-1)$ matrix $C$. Regardless of what $C$ is, this matrix is upper triangular.
Lemma 4.2.3. (1) Let $v$ be an eigenvector of $T$ with associated eigenvalue $\lambda$ and let $p(x) \in F[x]$ be a polynomial. Then $p(T)(v) = p(\lambda)v$. Thus, if $p(\lambda) \neq 0$ then $p(T)(v) \neq 0$.

(2) More generally, let $v$ be a generalized eigenvector of $T$ of index $k$ with associated eigenvalue $\lambda$ and let $p(x) \in F[x]$ be a polynomial. Then $p(T)(v) = p(\lambda)v + v'$, where $v'$ is a generalized eigenvector of $T$ of index $k' < k$ with associated eigenvalue $\lambda$. Thus if $p(\lambda) \neq 0$ then $p(T)(v) \neq 0$.

Proof. We can rewrite any polynomial $p(x) \in F[x]$ in terms of $x - \lambda$:
$$p(x) = a_n(x - \lambda)^n + a_{n-1}(x - \lambda)^{n-1} + \cdots + a_1(x - \lambda) + a_0.$$
Setting $x = \lambda$ we see that $a_0 = p(\lambda)$.

(1) If $v$ is an eigenvector of $T$ with associated eigenvalue $\lambda$, then
$$p(T)(v) = \big[a_n(T - \lambda I)^n + \cdots + a_1(T - \lambda I) + p(\lambda)I\big](v) = p(\lambda)I(v) = p(\lambda)v,$$
as all terms but the last vanish.

(2) If $v$ is a generalized eigenvector of $T$ of index $k$ with associated eigenvalue $\lambda$, then
$$p(T)(v) = \big[a_n(T - \lambda I)^n + \cdots + a_1(T - \lambda I) + p(\lambda)I\big](v) = v' + p(\lambda)v,$$
where
$$v' = \big[a_n(T - \lambda I)^n + \cdots + a_1(T - \lambda I)\big](v) = \big[a_n(T - \lambda I)^{n-1} + \cdots + a_1 I\big]\big((T - \lambda I)(v)\big)$$
is a generalized eigenvector of $T$ of index at most $k - 1$ associated to $\lambda$.
Lemma 4.2.4. Let $T: V \to V$ be a linear transformation with $c_T(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_m)^{e_m}$, with $\lambda_1, \ldots, \lambda_m$ distinct. Let $W_i = E_{\lambda_i}^\infty$ be the generalized eigenspace of $T$ associated to the eigenvalue $\lambda_i$. Then $W_i$ is a subspace of $V$ of dimension $e_i$. Also, $W_i = E_{\lambda_i}^{e_i}$, i.e., any generalized eigenvector of $T$ associated to $\lambda_i$ has index at most $e_i$.

Proof. In the proof of Theorem 4.2.2, we may choose the eigenvalues in any order, so we choose $\lambda_i$ first, $e_i$ times. Then we find a basis $\mathcal{B}$ of $V$ with $[T]_{\mathcal{B}}$ an upper triangular matrix
$$[T]_{\mathcal{B}} = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix},$$
where $A$ is an upper triangular $e_i$-by-$e_i$ matrix all of whose diagonal entries are equal to $\lambda_i$ and $D$ is an $(n - e_i)$-by-$(n - e_i)$ matrix all of whose diagonal entries are equal to the other $\lambda_j$'s and thus are unequal to $\lambda_i$. Write $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_1'$ where $\mathcal{B}_1$ consists of the first $e_i$ vectors in $\mathcal{B}$, $\mathcal{B}_1 = \{v_1, \ldots, v_{e_i}\}$. We claim that $W_i$ is the subspace spanned by $\mathcal{B}_1$.

To see this, observe that
$$[T - \lambda_i I]_{\mathcal{B}} = \begin{bmatrix} A - \lambda_i I & B \\ 0 & D - \lambda_i I \end{bmatrix}, \quad\text{so}\quad \big[(T - \lambda_i I)^{e_i}\big]_{\mathcal{B}} = \begin{bmatrix} (A - \lambda_i I)^{e_i} & B' \\ 0 & (D - \lambda_i I)^{e_i} \end{bmatrix}$$
for some submatrix $B'$ (whose exact value is irrelevant). But $A - \lambda_i I$ is an $e_i$-by-$e_i$ upper triangular matrix with all of its diagonal entries $0$, and, as is easy to compute, $(A - \lambda_i I)^{e_i} = 0$. Also, $D - \lambda_i I$ is an $(n - e_i)$-by-$(n - e_i)$ upper triangular matrix with none of its diagonal entries $0$, and as is also easy to compute, $(D - \lambda_i I)^{e_i}$ is an upper triangular matrix with none of its diagonal entries equal to $0$. Both of these statements remain true for any $e \ge e_i$. Thus for any $e \ge e_i$,
$$\big[(T - \lambda_i I)^e\big]_{\mathcal{B}} = \begin{bmatrix} 0 & B' \\ 0 & D' \end{bmatrix}$$
with $D'$ an upper triangular matrix all of whose diagonal entries are nonzero. Then it is easy to see that for any $e \ge e_i$, $\operatorname{Ker}\big([(T - \lambda_i I)^e]_{\mathcal{B}}\big)$ is the subspace of $F^n$ generated by $\{e_1, \ldots, e_{e_i}\}$. Thus $W_i$ is the subspace of $V$ generated by $\{v_1, \ldots, v_{e_i}\} = \mathcal{B}_1$, and is a subspace of dimension $e_i$.
Lemma 4.2.5. In the situation of Lemma 4.2.4,
$$V = W_1 \oplus \cdots \oplus W_m.$$

Proof. Since $n = \deg c_T(x) = e_1 + \cdots + e_m$, by Corollary 1.4.8(3) we need only show that if $0 = w_1 + \cdots + w_m$ with $w_i \in W_i$ for each $i$, then $w_i = 0$ for each $i$.

Suppose we have an expression
$$0 = w_1 + \cdots + w_i + \cdots + w_m$$
with $w_i \neq 0$. Let $q_i(x) = c_T(x)/(x - \lambda_i)^{e_i}$, so $q_i(x)$ is divisible by $(x - \lambda_j)^{e_j}$ for every $j \neq i$, but $q_i(\lambda_i) \neq 0$. Then
$$0 = q_i(T)(0) = q_i(T)(w_1 + \cdots + w_i + \cdots + w_m) = q_i(T)(w_1) + \cdots + q_i(T)(w_i) + \cdots + q_i(T)(w_m) = 0 + \cdots + q_i(T)(w_i) + \cdots + 0 = q_i(T)(w_i),$$
contradicting Lemma 4.2.3.
Lemma 4.2.6. Let $T: V \to V$ be a linear transformation whose characteristic polynomial $c_T(x)$ is a product of linear factors. Then

(1) $m_T(x)$ and $c_T(x)$ have the same linear factors.

(2) $m_T(x)$ divides $c_T(x)$.

Proof. (1) Let $m_T(x)$ have a factor $x - \lambda$, and let $n(x) = m_T(x)/(x - \lambda)$. Then $n(T) \neq 0$, so there is a vector $v_0$ with $v = n(T)(v_0) \neq 0$. Then $(T - \lambda I)(v) = m_T(T)(v_0) = 0$, i.e., $v \in \operatorname{Ker}(T - \lambda I)$, so $v$ is an eigenvector of $T$ with associated eigenvalue $\lambda$. Thus $x - \lambda$ is a factor of $c_T(x)$.

Suppose $x - \lambda$ is a factor of $c_T(x)$ that is not a factor of $m_T(x)$, so that $m_T(\lambda) \neq 0$. Choose an eigenvector $v$ of $T$ with associated eigenvalue $\lambda$. Then on the one hand $m_T(T) = 0$ so $m_T(T)(v) = 0$, but on the other hand, by Lemma 4.2.3, $m_T(T)(v) = m_T(\lambda)v \neq 0$, a contradiction.

(2) Since $V = W_1 \oplus \cdots \oplus W_m$ where $W_i = E_{\lambda_i}^{e_i}$, we can write any $v \in V$ as $v = w_1 + \cdots + w_m$ with $w_i \in W_i$.
Then
$$c_T(T)(v) = c_T(T)(w_1 + \cdots + w_m) = c_T(T)(w_1) + \cdots + c_T(T)(w_m) = 0 + \cdots + 0 = 0,$$
as for each $i$, $c_T(x)$ is divisible by $(x - \lambda_i)^{e_i}$ and $(T - \lambda_i I)^{e_i}(w_i) = 0$ by the definition of $E_{\lambda_i}^{e_i}$. But $m_T(x)$ divides every polynomial $p(x)$ with $p(T) = 0$, so $m_T(x)$ divides $c_T(x)$.
This lemma has a famous corollary, originally proved by quite different
methods.
Corollary 4.2.7 (Cayley-Hamilton theorem). Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Then $c_T(T) = 0$.
Proof. In case $c_T(x)$ factors into a product of linear factors,
$$c_T(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_m)^{e_m},$$
we showed this in the proof of Lemma 4.2.6.

In general, pick any basis $\mathcal{B}$ of $V$ and let $A = [T]_{\mathcal{B}}$. Then $c_T(T) = 0$ if and only if $c_A(A) = 0$. (Note $c_T(x) = c_A(x)$.) Now $A$ is a matrix with entries in $F$, and we can consider the linear transformation $T_A: F^n \to F^n$. But we may also take any extension field $E$ of $F$ and consider $\widetilde{T}: E^n \to E^n$ defined by $\widetilde{T}(v) = Av$. (So $\widetilde{T} = T_A$, but we are being careful to use a different notation as $\widetilde{T}$ is defined on the new vector space $E^n$.) Now $c_{\widetilde{T}}(x) = c_A(x) = \det(xI - A) = c_T(x)$. In particular, we may take $E$ to be a field in which $c_A(x)$ splits into a product of linear factors. For example, we could take $E$ to be the algebraic closure of $F$, and then every polynomial $p(x) \in F[x]$ splits into a product of linear factors over $E$. Then by the first case of the corollary, $c_{\widetilde{T}}(\widetilde{T}) = 0$, i.e., $c_A(A) = 0$, i.e., $c_T(T) = 0$. (Expressed differently, $A$ is similar, as a matrix with entries in $E$, to a matrix $B$ for which $c_B(B) = 0$. If $A = PBP^{-1}$, then for any polynomial $f(x)$, $f(A) = P f(B) P^{-1}$. Also, since $A$ and $B$ are similar, $c_A(x) = c_B(x)$. Thus $c_A(A) = c_B(A) = P c_B(B) P^{-1} = P 0 P^{-1} = 0$.)
Remark 4.2.8. For the reader familiar with tensor products, we observe that the second case of the corollary can be simplified to: Consider $\widetilde{T} = T \otimes 1: V \otimes_F E \to V \otimes_F E$. Then $c_T(x) = c_{\widetilde{T}}(x)$ and $c_{\widetilde{T}}(\widetilde{T}) = 0$ by the lemma, so $c_T(T) = 0$. ♦
Remark 4.2.9. If $F$ is algebraically closed (e.g., $F = \mathbb{C}$, which is algebraically closed by the Fundamental Theorem of Algebra) then $c_T(x)$ automatically splits into a product of linear factors, we are in the first case of the Cayley-Hamilton theorem, and we are done. If not, although our proof is correct, it is the "wrong" proof. We should not have to pass to a larger field $E$ in order to investigate linear transformations over $F$. We shall present a "right" proof later, where we will see how to generalize both Lemma 4.2.5 and Lemma 4.2.6 (see Theorem 5.3.1 and Corollary 5.3.4). ♦
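The Cayley-Hamilton theorem is easy to test by direct computation. A small sketch (assuming Python with SymPy; the example matrix is arbitrary) substitutes $A$ into its own characteristic polynomial via Horner's rule:

```python
from sympy import Matrix, symbols, zeros

x = symbols('x')
A = Matrix([[1, 2, 0], [0, 1, 3], [4, 0, 1]])  # arbitrary example

c = A.charpoly(x)               # c_A(x) = det(xI - A)
coeffs = c.all_coeffs()         # [1, a_{n-1}, ..., a_0], highest degree first

# Evaluate c_A(A) by Horner's rule: ((I*A + a_2 I)*A + a_1 I)*A + a_0 I
result = zeros(3, 3)
for a in coeffs:
    result = result * A + a * Matrix.eye(3)
assert result == zeros(3, 3)    # Cayley-Hamilton: c_A(A) = 0
print("c_A(A) = 0 verified")
```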
4.3 Diagonalizability
Before we continue with our analysis of general linear transformations, we
consider a particular but very useful case.
Definition 4.3.1. (1) Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Then $T$ is diagonalizable if $V$ has a basis $\mathcal{B}$ with $[T]_{\mathcal{B}}$ a diagonal matrix.

(2) An $n$-by-$n$ matrix $A$ is diagonalizable if $T_A: F^n \to F^n$ is diagonalizable. ♦

Remark 4.3.2. In light of Theorem 2.3.14, we may phrase (2) more simply as: $A$ is diagonalizable if it is similar to a diagonal matrix. ♦
Lemma 4.3.3. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Then $T$ is diagonalizable if and only if $V$ has a basis $\mathcal{B}$ consisting of eigenvectors of $T$.

Proof. Let $\mathcal{B} = \{v_1, \ldots, v_n\}$ and let $D = [T]_{\mathcal{B}}$ be a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. For each $i$,
$$[T(v_i)]_{\mathcal{B}} = [T]_{\mathcal{B}}[v_i]_{\mathcal{B}} = D e_i = \lambda_i e_i = [\lambda_i v_i]_{\mathcal{B}},$$
so $T(v_i) = \lambda_i v_i$ and $v_i$ is an eigenvector.

Conversely, if $\mathcal{B} = \{v_1, \ldots, v_n\}$ is a basis of eigenvectors, so $T(v_i) = \lambda_i v_i$ for each $i$, then
$$[T]_{\mathcal{B}} = \big[[T(v_1)]_{\mathcal{B}} \,\big|\, [T(v_2)]_{\mathcal{B}} \,\big|\, \cdots\big] = \big[[\lambda_1 v_1]_{\mathcal{B}} \,\big|\, [\lambda_2 v_2]_{\mathcal{B}} \,\big|\, \cdots\big] = \big[\lambda_1 e_1 \,\big|\, \lambda_2 e_2 \,\big|\, \cdots\big] = D$$
is a diagonal matrix.
Theorem 4.3.4. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. If $c_T(x)$ does not split into a product of linear factors, then $T$ is not diagonalizable. If $c_T(x)$ does split into a product of linear factors (which is always the case if $F$ is algebraically closed) then the following are equivalent:

(1) $T$ is diagonalizable.

(2) $m_T(x)$ splits into a product of distinct linear factors.

(3) For every eigenvalue $\lambda$ of $T$, $E_\lambda = E_\lambda^\infty$ (i.e., every generalized eigenvector of $T$ is an eigenvector of $T$).

(4) For every eigenvalue $\lambda$ of $T$, geom-mult$(\lambda)$ = alg-mult$(\lambda)$.

(5) The sum of the geometric multiplicities of the eigenvalues is equal to the dimension of $V$.

(6) If $\lambda_1, \ldots, \lambda_m$ are the distinct eigenvalues of $T$, then
$$V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}.$$
Proof. We prove the contrapositive of the first claim: Suppose $T$ is diagonalizable and let $\mathcal{B}$ be a basis of $V$ with $D = [T]_{\mathcal{B}}$ a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. Then $c_T(x) = c_D(x) = \det(xI - D) = (x - \lambda_1) \cdots (x - \lambda_n)$.

Suppose $c_T(x) = (x - \lambda_1) \cdots (x - \lambda_n)$. The scalars $\lambda_1, \ldots, \lambda_n$ may not all be distinct, so we group them. Let the distinct eigenvalues be $\lambda_1, \ldots, \lambda_m$, so $c_T(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_m)^{e_m}$ for positive integers $e_1, \ldots, e_m$.

Let $n = \dim(V)$. Visibly, $e_i$ is the algebraic multiplicity of $\lambda_i$, and $e_1 + \cdots + e_m = n$. Let $f_i$ be the geometric multiplicity of $\lambda_i$. Then we know by Lemma 4.1.15 that $1 \le f_i \le e_i$, so $f_1 + \cdots + f_m = n$ if and only if $f_i = e_i$ for each $i$, so (4) and (5) are equivalent. We know by Lemma 4.2.4 that $e_i = \dim E_{\lambda_i}^\infty$, and by definition $f_i = \dim E_{\lambda_i}$, and $E_{\lambda_i} \subseteq E_{\lambda_i}^\infty$, so (3) and (4) are equivalent.

By Lemma 4.2.5, $V = E_{\lambda_1}^\infty \oplus \cdots \oplus E_{\lambda_m}^\infty$, so $V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}$ if and only if $E_{\lambda_i} = E_{\lambda_i}^\infty$ for each $i$, so (3) and (6) are equivalent.

If $V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}$, let $\mathcal{B}_i$ be a basis for $E_{\lambda_i}$ and let $\mathcal{B} = \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_m$. Let $T_i$ be the restriction of $T$ to $E_{\lambda_i}$. Then $\mathcal{B}$ is a basis for $V$ and
$$[T]_{\mathcal{B}} = \begin{bmatrix} A_1 & & \\ & \ddots & \\ & & A_m \end{bmatrix} = A,$$
a block diagonal matrix with $A_i = [T_i]_{\mathcal{B}_i}$. But in this case $A_i$ is the $e_i$-by-$e_i$ matrix $\lambda_i I$ (a scalar multiple of the identity matrix), so (6) implies (1).
If there is an eigenvalue $\lambda_i$ of $T$ for which $E_{\lambda_i} \subsetneq E_{\lambda_i}^\infty$, let $v_i \in E_{\lambda_i}^\infty$ be a generalized eigenvector of index $k > 1$, so $(T - \lambda_i I)^k(v_i) = 0$ but $(T - \lambda_i I)^{k-1}(v_i) \neq 0$. For any polynomial $p(x)$ with $p(\lambda_i) \neq 0$, $p(T)(v_i)$ is another generalized eigenvector of the same index $k$. This implies that any polynomial $f(x)$ with $f(T)(v_i) = 0$, and in particular $m_T(x)$, has a factor of $(x - \lambda_i)^k$. Thus not-(3) implies not-(2), or (2) implies (3).

Finally, let $T$ be diagonalizable, $[T]_{\mathcal{B}} = D$ in some basis $\mathcal{B}$, where $D$ is a diagonal matrix with distinct diagonal entries $\lambda_1, \ldots, \lambda_m$, with $\lambda_1$ repeated $e_1$ times, $\lambda_2$ repeated $e_2$ times, etc. We may reorder $\mathcal{B}$ so that
$$[T]_{\mathcal{B}} = \begin{bmatrix} A_1 & & \\ & \ddots & \\ & & A_m \end{bmatrix} = A$$
with $A_i$ the $e_i$-by-$e_i$ matrix $\lambda_i I$. Then $A_i - \lambda_i I$ is the zero matrix, and an easy computation shows $(A - \lambda_1 I) \cdots (A - \lambda_m I) = 0$, so $m_T(x)$ divides, and is easily seen to be equal to, $(x - \lambda_1) \cdots (x - \lambda_m)$, and (1) implies (2).
Corollary 4.3.5. Let $V$ be a finite-dimensional vector space and $T: V \to V$ a linear transformation. Suppose that $c_T(x) = (x - \lambda_1) \cdots (x - \lambda_n)$ is a product of distinct linear factors. Then $T$ is diagonalizable.

Proof. By Corollary 4.1.16, alg-mult$(\lambda_i) = 1$ implies geom-mult$(\lambda_i) = 1$ as well.
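These criteria are all mechanical to check. Here is a minimal sketch (assuming Python with SymPy; the two test matrices are ours) contrasting a diagonalizable matrix with a non-diagonalizable one whose minimum polynomial has a repeated linear factor:

```python
from sympy import Matrix

D_able = Matrix([[2, 0, 0], [0, 3, 0], [0, 0, 3]])  # m_T(x) = (x-2)(x-3): distinct factors
J_block = Matrix([[3, 1], [0, 3]])                  # m_T(x) = (x-3)^2: repeated factor

print(D_able.is_diagonalizable())   # True
print(J_block.is_diagonalizable())  # False

# For the Jordan block, geom-mult(3) = 1 < 2 = alg-mult(3):
val, alg_mult, vecs = J_block.eigenvects()[0]
print(val, alg_mult, len(vecs))     # 3, 2, 1
```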
4.4 An application to differential equations

Let us look at a familiar situation, the solution of linear differential equations, and see how the ideas of linear algebra clarify what is going on. Since we are interested in the linear-algebraic aspects of the situation rather than the analytical ones, we will not try to make minimal differentiability assumptions, but rather make the most convenient ones.
We let $V$ be the vector space of $C^\infty$ complex-valued functions on the real line $\mathbb{R}$. We let $L$ be an $n$th order linear differential operator $L = a_n(x)D^n + \cdots + a_1(x)D + a_0(x)$, where the $a_i(x)$ are functions in $V$ and $D$ denotes differentiation: $D(f(x)) = f'(x)$ and $D^k(f(x)) = f^{(k)}(x)$, the $k$th derivative. We further assume that $a_n(x) \neq 0$ for all $x \in \mathbb{R}$.
Theorem 4.4.1. Let $L$ be as above. Then $\operatorname{Ker}(L)$ is an $n$-dimensional subspace of $V$. For any $b(x) \in V$, $\{y \in V \mid L(y) = b(x)\}$ is an affine subspace of $V$ parallel to $\operatorname{Ker}(L)$.

Proof. As the kernel of a linear transformation, $\operatorname{Ker}(L)$ is a subspace of $V$. $\operatorname{Ker}(L) = \{y \in V \mid L(y) = 0\}$ is just the solution space of the linear differential equation $L(y) = a_n(x)y^{(n)} + \cdots + a_1(x)y' + a_0(x)y = 0$.

For $x_0 \in \mathbb{R}$ define a linear transformation $E: \operatorname{Ker}(L) \to \mathbb{C}^n$ by
$$E(y) = \begin{bmatrix} y(x_0) \\ y'(x_0) \\ \vdots \\ y^{(n-1)}(x_0) \end{bmatrix}.$$
The fundamental existence and uniqueness theorem for linear differential equations tells us that $E$ is onto (that's existence: there is a solution for any set of initial conditions) and that it is $1$-$1$ (that's uniqueness), so $E$ is an isomorphism and $\operatorname{Ker}(L)$ is $n$-dimensional. For any $b(x) \in V$ this theorem tells us that $L(y) = b(x)$ has a solution, so, by Theorem 1.5.7, the set of all solutions is an affine subspace parallel to $\operatorname{Ker}(L)$.
Now we wish to solve $L(y) = 0$ or $L(y) = b(x)$.

To solve $L(y) = 0$, we find a basis of $\operatorname{Ker}(L)$: Since we know $\operatorname{Ker}(L)$ is $n$-dimensional, we simply need to find $n$ linearly independent functions $\{y_1(x), \ldots, y_n(x)\}$ in $\operatorname{Ker}(L)$, and the general solution of $L(y) = 0$ will be $y = c_1 y_1(x) + \cdots + c_n y_n(x)$. Then, by Proposition 1.5.6, in order to solve the inhomogeneous equation $L(y) = b(x)$, we simply need to find a single solution, i.e., a single function $y_0(x)$ with $L(y_0(x)) = b(x)$, and then the general solution of $L(y) = b(x)$ will be $y = y_0(x) + c_1 y_1(x) + \cdots + c_n y_n(x)$.

We now turn to the constant coefficient case, where we can find explicit solutions. That is, we assume $a_n, \ldots, a_0$ are constants.
First let us see that a familiar property of differentiation is a consequence of a fact from linear algebra.
Theorem 4.4.2. Let $V$ be a (necessarily infinite-dimensional) vector space and let $T: V \to V$ be a linear transformation such that $T$ is onto and $\operatorname{Ker}(T)$ is $1$-dimensional. Then for any positive integer $k$, $\operatorname{Ker}(T^k)$ is $k$-dimensional and is the subspace $\{p(T)(v_k) \mid p(x) \text{ an arbitrary polynomial}\}$ for a single generalized eigenvector $v_k$ of index $k$ (necessarily associated to the eigenvalue $0$).
Proof. We proceed by induction on $k$. By hypothesis the theorem is true for $k = 1$. Suppose it is true for $k$ and consider $T^{k+1}$. By hypothesis, there is a vector $v_{k+1}$ with $T(v_{k+1}) = v_k$, and $v_{k+1}$ is then a generalized eigenvector of index $k + 1$. The subspace $\{p(T)(v_{k+1}) \mid p(x) \text{ a polynomial}\}$ is a subspace of $\operatorname{Ker}(T^{k+1})$ of dimension $k + 1$. We must show this subspace is all of $\operatorname{Ker}(T^{k+1})$. Let $w \in \operatorname{Ker}(T^{k+1})$, so $T^{k+1}(w) = T^k(T(w)) = 0$. By the inductive hypothesis, we can write $T(w) = p(T)(v_k)$ for some polynomial $p(x)$. If we let $w' = p(T)(v_{k+1})$, then
$$T(w') = T(p(T)(v_{k+1})) = p(T)(T(v_{k+1})) = p(T)(v_k) = T(w).$$
Hence $w - w' \in \operatorname{Ker}(T)$, so $w = w' + a v_1$ where $v_1 = T^{k-1}(v_k) = T^k(v_{k+1})$, i.e., $w = (p(T) + aT^k)(v_{k+1}) = q(T)(v_{k+1})$ where $q(x) = p(x) + a x^k$, and we are done.
Lemma 4.4.3. (1) $\operatorname{Ker}(D^k)$ has basis $\{1, x, \ldots, x^{k-1}\}$.

(2) More generally, for any $a$, $\operatorname{Ker}((D - a)^k)$ has basis $\{e^{ax}, xe^{ax}, \ldots, x^{k-1}e^{ax}\}$.

Proof. We can easily verify that
$$(D - a)^k(x^{k-1}e^{ax}) = 0 \quad\text{but}\quad (D - a)^{k-1}(x^{k-1}e^{ax}) \neq 0$$
(and it is trivial to verify that $D^k(x^{k-1}) = 0$ but $D^{k-1}(x^{k-1}) \neq 0$). Thus $\mathcal{B} = \{e^{ax}, xe^{ax}, \ldots, x^{k-1}e^{ax}\}$ is a set of generalized eigenvectors of indices $1, 2, \ldots, k$ associated to the eigenvalue $a$. Hence $\mathcal{B}$ is linearly independent. We know from Theorem 4.4.1 that $\operatorname{Ker}((D - a)^k)$ has dimension $k$, so $\mathcal{B}$ forms a basis.

Alternatively, we can use Theorem 4.4.2. We know $\operatorname{Ker}(D)$ consists precisely of the constant functions, so it is $1$-dimensional with basis $\{1\}$. Furthermore, $D$ is onto by the Fundamental Theorem of Calculus: If $F(x) = \int_{x_0}^x f(t)\,dt$, then $D(F(x)) = f(x)$.

For $D - a$ the situation is only a little more complicated. We can easily find that $\operatorname{Ker}(D - a) = \{ce^{ax}\}$, a $1$-dimensional space with basis $\{e^{ax}\}$. If we let
$$F(x) = e^{ax} \int_{x_0}^x e^{-at} f(t)\,dt,$$
the product rule and the Fundamental Theorem of Calculus show that $(D - a)(F(x)) = f(x)$.
With notation as in the proof of Theorem 4.4.2, if we let $v_1 = e^{ax}$ and solve for $v_2, v_3, \ldots$ recursively, we obtain a basis of $\operatorname{Ker}((D - a)^k)$,
$$\{e^{ax}, xe^{ax}, (x^2/2)e^{ax}, \ldots, (x^{k-1}/(k-1)!)e^{ax}\}$$
(or $\{1, x, x^2/2, \ldots, x^{k-1}/(k-1)!\}$ if $a = 0$), and since we can replace any basis element by a multiple of itself and still have a basis, we are done.
Theorem 4.4.4. Let $L$ be a constant coefficient differential operator with factorization
$$L = a_n(D - \lambda_1)^{e_1} \cdots (D - \lambda_m)^{e_m}$$
where $\lambda_1, \ldots, \lambda_m$ are distinct. Then
$$\{e^{\lambda_1 x}, \ldots, x^{e_1 - 1}e^{\lambda_1 x}, \ldots, e^{\lambda_m x}, \ldots, x^{e_m - 1}e^{\lambda_m x}\}$$
is a basis for $\operatorname{Ker}(L)$, so that the general solution of $L(y) = 0$ is
$$y = c_{1,1}e^{\lambda_1 x} + \cdots + c_{1,e_1}x^{e_1 - 1}e^{\lambda_1 x} + \cdots + c_{m,1}e^{\lambda_m x} + \cdots + c_{m,e_m}x^{e_m - 1}e^{\lambda_m x}.$$
If $b(x) \in V$ is arbitrary, let $y_0 = y_0(x)$ be an element of $V$ with $L(y_0(x)) = b(x)$. (Such an element $y_0(x)$ always exists.) Then the general solution of $L(y) = b(x)$ is
$$y = y_0 + c_{1,1}e^{\lambda_1 x} + \cdots + c_{1,e_1}x^{e_1 - 1}e^{\lambda_1 x} + \cdots + c_{m,1}e^{\lambda_m x} + \cdots + c_{m,e_m}x^{e_m - 1}e^{\lambda_m x}.$$
Proof. We know that the generalized eigenspaces corresponding to distinct eigenvalues are linearly independent (this follows directly from the proof of Lemma 4.2.5, which does not require $V$ to be finite dimensional), and within each generalized eigenspace a set of generalized eigenvectors with distinct indices is linearly independent as well, so this entire set of generalized eigenvectors is linearly independent. Since there are $n$ of them, they form a basis for $\operatorname{Ker}(L)$. The inhomogeneous case then follows immediately from Proposition 1.5.6.
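Theorem 4.4.4 translates directly into a procedure: factor the auxiliary (characteristic) polynomial of $L$ and read off the basis functions $x^j e^{\lambda x}$. Here is a minimal sketch (assuming Python with SymPy; the helper name `solution_basis` and the example operator $y'' - 2y' + y$ are ours):

```python
from sympy import symbols, roots, exp, Poly

x, lam = symbols('x lambda')

def solution_basis(char_poly):
    """Basis of Ker(L) for the constant-coefficient operator whose
    characteristic polynomial (in lam) is char_poly."""
    basis = []
    for root, mult in roots(char_poly, lam).items():
        basis += [x**j * exp(root * x) for j in range(mult)]
    return basis

# L = D^2 - 2D + 1 = (D - 1)^2, so the basis should be {e^x, x e^x}:
print(solution_basis(Poly(lam**2 - 2*lam + 1, lam)))   # [exp(x), x*exp(x)]
```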
Remark 4.4.5. Suppose $L$ has real coefficients and we want to solve $L(y) = 0$ in real functions. We proceed as above to obtain the general solution, and look for conditions on the $c$'s for the solution to be real. Since $a_n x^n + \cdots + a_0$ is a real polynomial, if the complex number $\lambda$ is a root of it, so is its conjugate $\bar{\lambda}$, and then to obtain a real solution of $L(y) = 0$
the coefficient of $e^{\bar{\lambda}x}$ must be the complex conjugate of the coefficient of $e^{\lambda x}$, etc. Thus in our expression for $y$ there is a pair of terms $ce^{\lambda x} + \bar{c}e^{\bar{\lambda}x}$. Writing $c = c_1 + ic_2$ and $\lambda = a + bi$,
$$ce^{\lambda x} + \bar{c}e^{\bar{\lambda}x} = (c_1 + ic_2)e^{ax}\big(\cos(bx) + i\sin(bx)\big) + (c_1 - ic_2)e^{ax}\big(\cos(bx) - i\sin(bx)\big) = d_1 e^{ax}\cos(bx) + d_2 e^{ax}\sin(bx)$$
for real numbers $d_1$ and $d_2$. That is, we can perform a change of basis and, instead of using the basis given in Theorem 4.4.4, replace each pair of basis elements $\{e^{\lambda x}, e^{\bar{\lambda}x}\}$ by the pair of basis elements $\{e^{ax}\cos(bx), e^{ax}\sin(bx)\}$, etc., and express our solution in terms of this new basis. ♦
CHAPTER 5
The structure of a linear transformation II
In this chapter we conclude our analysis of the structure of a linear transformation $T: V \to V$. We derive our deepest structural results, the rational canonical form of $T$ and, when $V$ is a vector space over an algebraically closed field $F$, the Jordan canonical form of $T$.
Recall our metaphor of coordinates as giving a language in which to describe linear transformations. A basis $\mathcal{B}$ of $V$ in which $[T]_{\mathcal{B}}$ is in canonical form is a "right" language to describe the linear transformation $T$. This is especially true for the Jordan canonical form, which is intimately related to eigenvalues, eigenvectors, and generalized eigenvectors.

The importance of the Jordan canonical form of $T$ cannot be overemphasized. Every structural fact about a linear transformation is encoded in its Jordan canonical form.
We not only show the existence of the Jordan canonical form, but also derive an algorithm for finding the Jordan canonical form of $T$ as well as finding a Jordan basis of $V$, assuming we can factor the characteristic polynomial $c_T(x)$. (Of course, there is no algorithm for factoring polynomials, as we know from Galois theory.)
We have arranged our exposition in what we think is the clearest way,
getting to the simplest (but still important) results as quickly as possible
in the preceding chapter, and saving the deepest results for this chapter.
However, this is not the logically most economical way. (That would have
been to prove the most general and deepest structure theorems first, and
to obtain the simpler results as corollaries.) This means that our approach
involves a certain amount of repetition. For example, although we defined
the characteristic and minimum polynomials of a linear transformation in
the last chapter, we will be redefining them here, when we consider them
more deeply. But we want to remark that this repetition is a deliberate choice
arising from the order in which we have decided to present the material.
While our ultimate goal in this chapter is the Jordan canonical form, our path to it goes through rational canonical form. There are several reasons for this: First, rational canonical form always exists, while in order to obtain the Jordan canonical form for an arbitrary linear transformation we must be working over an algebraically closed field. (There is a generalization of Jordan canonical form that exists over an arbitrary field, and we will briefly mention it though not treat it in depth.) Second, rational canonical form is important in itself, and, as we shall see, has a number of applications. Third, the natural way to prove the existence of the Jordan canonical form of $T$ is first to split $V$ up into the direct sum of the generalized eigenspaces of $T$ (this being the easy step), and then to analyze each generalized eigenspace (this being where the hard work comes in), and for a linear transformation with a single generalized eigenspace, rational and Jordan canonical forms are very closely related.
Here is how our argument proceeds. In Section 5.1 we introduce the minimum and characteristic polynomials of a linear transformation $T: V \to V$, and in particular we derive Theorem 5.1.11, which is both very useful and important in its own right. In Section 5.2 we consider $T$-invariant subspaces $W$ of $V$ and the map $\overline{T}$ induced by $T$ on the quotient space $V/W$. In Section 5.3 we prove Theorem 5.3.1, giving the relationship between the minimum and characteristic polynomials of $T$, and as a corollary derive the Cayley-Hamilton theorem. (It is often thought that this theorem is a consequence of Jordan canonical form, but, as you will see, it is actually prior to Jordan canonical form.) In Section 5.4 we return to invariant subspaces, and prove the key technical results Theorem 5.4.6 and Theorem 5.4.10, which tell us when $T$-invariant subspaces have $T$-invariant complements. Using this work, we quickly derive rational canonical form in Section 5.5, and then we use rational canonical form to quickly derive Jordan canonical form in Section 5.6. Because of the importance and utility of this result, in Section 5.7 we give a well-illustrated algorithm for finding the Jordan canonical form of $T$, and a Jordan basis of $V$, provided we can factor the characteristic polynomial of $T$. In the last two sections of this chapter, Section 5.8 and Section 5.9, we apply our results to derive additional structural information on linear transformations.
5.1 Annihilating, minimum, and characteristic polynomials

Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. In this section we introduce three sorts of polynomials associated to $T$: First, for any nonzero vector $v \in V$, we have its $T$-annihilator $m_{T,v}(x)$. Then we have the minimum polynomial of $T$, $m_T(x)$, and the characteristic polynomial of $T$, $c_T(x)$. All of these polynomials will play important roles in our development.
Theorem 5.1.1. Let $V$ be a vector space of dimension $n$ and let $v \in V$ be a vector, $v \neq 0$. Then there is a unique monic polynomial $m_{T,v}(x)$ of lowest degree with $m_{T,v}(T)(v) = 0$. This polynomial has degree at most $n$.

Proof. Consider the vectors $\{v, T(v), \ldots, T^n(v)\}$. This is a set of $n + 1$ vectors in an $n$-dimensional vector space and so is linearly dependent, i.e., there are $a_0, \ldots, a_n$ not all zero such that $a_0 v + a_1 T(v) + \cdots + a_n T^n(v) = 0$. Thus if $p(x) = a_n x^n + \cdots + a_0$, $p(x)$ is a nonzero polynomial with $p(T)(v) = 0$. Now $J = \{f(x) \in F[x] \mid f(T)(v) = 0\}$ is a nonzero ideal in $F[x]$ (if $f(T)(v) = 0$ and $g(T)(v) = 0$, then $(f + g)(T)(v) = 0$; if $f(T)(v) = 0$ then $(cf)(T)(v) = 0$ for any polynomial $c(x)$; and $p(x) \in J$, so $J$ is a nonzero ideal). Hence by Lemma A.1.8 there is a unique monic polynomial $m_{T,v}(x)$ of lowest degree in $J$.
Definition 5.1.2. The polynomial $m_{T,v}(x)$ is called the $T$-annihilator of the vector $v$. ♦

Example 5.1.3. Let $V$ have basis $\{v_1, \ldots, v_n\}$ and define $T$ by $T(v_1) = 0$ and $T(v_i) = v_{i-1}$ for $i > 1$. Then $m_{T,v_k}(x) = x^k$ for $k = 1, \ldots, n$. This shows that $m_{T,v}(x)$ can have any degree between $1$ and $n$. ♦
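Computing a $T$-annihilator is a dependence search on the vectors $v, T(v), T^2(v), \ldots$. A minimal sketch (assuming Python with SymPy; the helper name `annihilator` is ours), run on the matrix of the $T$ of Example 5.1.3 with $n = 3$:

```python
from sympy import Matrix, symbols, Poly

x = symbols('x')

def annihilator(A, v, x):
    """Return m_{T,v}(x): the monic polynomial of least degree with p(A)v = 0."""
    vecs = [v]
    while True:
        vecs.append(A * vecs[-1])
        M = Matrix.hstack(*vecs)
        ker = M.nullspace()
        if ker:                        # first dependence among v, Av, A^2 v, ...
            c = ker[0] / ker[0][-1]    # normalize: monic
            return Poly(sum(c[i] * x**i for i in range(len(c))), x)

# T of Example 5.1.3 with n = 3: T(v_1) = 0, T(v_i) = v_{i-1}
A = Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
for k in range(3):
    e = Matrix.eye(3)[:, k]
    print(annihilator(A, e, x))   # x, x**2, x**3, as the example predicts
```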
Example 5.1.4. Let $V = F^\infty$ and let $L: V \to V$ be left shift. Consider $v \in V$, $v \neq 0$. For some $k$, $v$ is of the form $(a_1, a_2, \ldots, a_k, 0, 0, \ldots)$ with $a_k \neq 0$, and then $m_{T,v}(x) = x^k$. If $R: V \to V$ is right shift, then for any vector $v \neq 0$, the set $\{v, R(v), R^2(v), \ldots\}$ is linearly independent and so there is no nonzero polynomial $p(x)$ with $p(T)(v) = 0$. ♦
Theorem 5.1.5. Let $V$ be a vector space of dimension $n$. Then there is a unique monic polynomial $m_T(x)$ of lowest degree with $m_T(T)(v) = 0$ for every $v \in V$. This polynomial has degree at most $n^2$.

Proof. Choose a basis $\mathcal{B} = \{v_1, \ldots, v_n\}$ of $V$. For each $v_k \in \mathcal{B}$ we have its $T$-annihilator $p_k(x) = m_{T,v_k}(x)$. Let $q(x)$ be the least common multiple
of $p_1(x), \ldots, p_n(x)$. Since $p_k(x)$ divides $q(x)$ for each $k$, $q(T)(v_k) = 0$. Hence $q(T)(v) = 0$ for every $v \in V$ by Lemma 1.2.23. If $r(x)$ is any polynomial with $r(x)$ not divisible by $p_k(x)$ for some $k$, then for that value of $k$ we have $r(T)(v_k) \neq 0$. Thus $m_T(x) = q(x)$ is the desired polynomial. $m_T(x)$ divides the product $p_1(x)p_2(x)\cdots p_n(x)$, which has degree at most $n^2$, so $m_T(x)$ has degree at most $n^2$.
Definition 5.1.6. The polynomial $m_T(x)$ is the minimum polynomial of $T$. ♦

Remark 5.1.7. As we will see in Corollary 5.1.12, $m_T(x)$ has degree at most $n$. ♦

Example 5.1.8. Let $V$ be $n$-dimensional with basis $\{v_1, \ldots, v_n\}$ and for any fixed value of $k$ between $1$ and $n$, define $T: V \to V$ by $T(v_1) = 0$, $T(v_i) = v_{i-1}$ for $2 \le i \le k$, and $T(v_i) = 0$ for $i > k$. Then $m_T(x) = x^k$. This shows that $m_T(x)$ can have any degree between $1$ and $n$ (compare Example 5.1.3). ♦
Example 5.1.9. Returning to Example 5.1.4, we see that if $T = R$, given any nonzero vector $v \in V$ there is no nonzero polynomial $f(x)$ with $f(T)(v) = 0$, so there is certainly no nonzero polynomial $f(x)$ with $f(T) = 0$. Thus $T$ does not have a minimum polynomial. If $T = L$, then $m_{T,v}(x)$ exists for any nonzero vector $v \in V$, i.e., for every nonzero vector $v \in V$ there is a polynomial $f_v(x)$ with $f_v(T)(v) = 0$. But there is no single polynomial $f(x)$ with $f(T)(v) = 0$ for every $v \in V$, so again $T$ does not have a minimum polynomial. (Such a polynomial would have to be divisible by $x^k$ for every positive integer $k$.) Let $T: V \to V$ be defined by $T(a_1, a_2, a_3, a_4, \ldots) = (-a_1, a_2, -a_3, a_4, \ldots)$. If $v_0 = (a_1, a_2, \ldots)$ with $a_i = 0$ whenever $i$ is odd, then $T(v_0) = v_0$, so $m_{T,v_0}(x) = x - 1$. If $v_1 = (a_1, a_2, \ldots)$ with $a_i = 0$ whenever $i$ is even, then $T(v_1) = -v_1$, so $m_{T,v_1}(x) = x + 1$. If $v$ is not of one of these two special forms, then $m_{T,v}(x) = x^2 - 1$. Thus $T$ has a minimum polynomial, namely $m_T(x) = x^2 - 1$. ♦
Lemma 5.1.10. Let $V$ be a vector space and let $T: V \to V$ be a linear transformation. Let $v_1, \ldots, v_k \in V$ with $T$-annihilators $p_i(x) = m_{T,v_i}(x)$ for $i = 1, \ldots, k$ and suppose that $p_1(x), \ldots, p_k(x)$ are pairwise relatively prime. Let $v = v_1 + \cdots + v_k$ have $T$-annihilator $p(x) = m_{T,v}(x)$. Then $p(x) = p_1(x) \cdots p_k(x)$.

Proof. We proceed by induction on $k$. The case $k = 1$ is trivial. We do the crucial case $k = 2$, and leave $k > 2$ to the reader.
Let $v = v_1 + v_2$ where $p_1(T)(v_1) = p_2(T)(v_2) = 0$ with $p_1(x)$ and $p_2(x)$ relatively prime. Then there are polynomials $q_1(x)$ and $q_2(x)$ with $p_1(x)q_1(x) + p_2(x)q_2(x) = 1$, so
$$v = I(v) = \big(p_1(T)q_1(T) + p_2(T)q_2(T)\big)(v_1 + v_2) = p_2(T)q_2(T)(v_1) + p_1(T)q_1(T)(v_2) = w_1 + w_2.$$
Now
$$p_1(T)(w_1) = p_1(T)p_2(T)q_2(T)(v_1) = p_2(T)q_2(T)p_1(T)(v_1) = 0,$$
so $w_1 \in \operatorname{Ker}(p_1(T))$ and similarly $w_2 \in \operatorname{Ker}(p_2(T))$.

Let $r(x)$ be any polynomial with $r(T)(v) = 0$.

Now $v = w_1 + w_2$, so $p_2(T)(v) = p_2(T)(w_1 + w_2) = p_2(T)(w_1)$, so $0 = r(T)(v)$ gives $0 = r(T)p_2(T)q_2(T)(w_1)$. Also, $p_1(T)(w_1) = 0$, so we certainly have $0 = r(T)p_1(T)q_1(T)(w_1)$. Hence
$$0 = r(T)\big(p_1(T)q_1(T) + p_2(T)q_2(T)\big)(w_1) = r(T)I(w_1) = r(T)(w_1)$$
(as $p_1(x)q_1(x) + p_2(x)q_2(x) = 1$), and similarly $0 = r(T)(w_2)$.

Now
$$r(T)(w_1) = r(T)\big(p_2(T)q_2(T)\big)(v_1).$$
But $p_1(x)$ is the $T$-annihilator of $v_1$, so by definition $p_1(x)$ divides $r(x)(p_2(x)q_2(x))$. From $1 = p_1(x)q_1(x) + p_2(x)q_2(x)$ we see that $p_1(x)$ and $p_2(x)q_2(x)$ are relatively prime, so by Lemma A.1.21, $p_1(x)$ divides $r(x)$. Similarly, considering $r(T)(w_2)$, we see that $p_2(x)$ divides $r(x)$. By hypothesis $p_1(x)$ and $p_2(x)$ are relatively prime, so by Corollary A.1.22, $p_1(x)p_2(x)$ divides $r(x)$.

On the other hand, clearly
$$\big(p_1(T)p_2(T)\big)(v) = \big(p_1(T)p_2(T)\big)(v_1 + v_2) = 0.$$
Thus $p_1(x)p_2(x)$ is the $T$-annihilator of $v$, as claimed.
Theorem 5.1.11. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Then there is a vector $v \in V$ such that the $T$-annihilator $m_{T,v}(x)$ of $v$ is equal to the minimum polynomial $m_T(x)$ of $T$.
Proof. Choose a basis $\mathcal{B} = \{v_1, \ldots, v_n\}$ of $V$. As we have seen in Theorem 5.1.5, the minimum polynomial $m_T(x)$ is the least common multiple of the $T$-annihilators $m_{T,v_1}(x), \ldots, m_{T,v_n}(x)$. Factor $m_T(x) = p_1(x)^{f_1} \cdots p_k(x)^{f_k}$ where $p_1(x), \ldots, p_k(x)$ are distinct irreducible polynomials, and hence $p_1(x)^{f_1}, \ldots, p_k(x)^{f_k}$ are pairwise relatively prime polynomials. For each $i$ between $1$ and $k$, $p_i(x)^{f_i}$ must appear as a factor of $m_{T,v_j}(x)$ for some $j$. Write $m_{T,v_j}(x) = p_i(x)^{f_i} q(x)$. Then the vector $u_i = q(T)(v_j)$ has $T$-annihilator $p_i(x)^{f_i}$. By Lemma 5.1.10, the vector $v = u_1 + \cdots + u_k$ has $T$-annihilator $p_1(x)^{f_1} \cdots p_k(x)^{f_k} = m_T(x)$.
Not only is Theorem 5.1.11 interesting in itself, but it plays a key role in future developments: We will often pick an element $v \in V$ with $m_{T,v}(x) = m_T(x)$, and proceed from there.

Here is an immediate application of this theorem.

Corollary 5.1.12. Let $T: V \to V$ where $V$ is a vector space of dimension $n$. Then $m_T(x)$ is a polynomial of degree at most $n$.

Proof. $m_T(x) = m_{T,v}(x)$ for some $v \in V$. But for any $v \in V$, $m_{T,v}(x)$ has degree at most $n$.
We now define a second very important polynomial associated to a linear transformation from a finite-dimensional vector space to itself.

We need a preliminary lemma.

Lemma 5.1.13. Let $A$ and $B$ be similar matrices. Then $\det(xI - A) = \det(xI - B)$ (as polynomials in $F[x]$).

Proof. If $B = PAP^{-1}$ then
$$xI - B = x(PIP^{-1}) - PAP^{-1} = P(xI)P^{-1} - PAP^{-1} = P(xI - A)P^{-1},$$
so
$$\det(xI - B) = \det(P(xI - A)P^{-1}) = \det(P)\det(xI - A)\det(P^{-1}) = \det(P)\det(xI - A)\det(P)^{-1} = \det(xI - A).$$
Definition 5.1.14. Let $A$ be a square matrix. The characteristic polynomial $c_A(x)$ of $A$ is the polynomial $c_A(x) = \det(xI - A)$. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. The characteristic polynomial $c_T(x)$ is the polynomial defined as follows. Let $\mathcal{B}$ be any basis of $V$ and let $A$ be the matrix $A = [T]_{\mathcal{B}}$. Then $c_T(x) = \det(xI - A)$. ♦
Remark 5.1.15. We see from Theorem 2.3.14 and Lemma 5.1.13 that $c_T(x)$ is well defined, i.e., independent of the choice of basis $\mathcal{B}$. ♦

We now introduce a special kind of matrix, whose importance we will see later.
Definition 5.1.16. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ be a monic polynomial in $F[x]$ of degree $n \ge 1$. Then the companion matrix $C(f(x))$ of $f(x)$ is the $n$-by-$n$ matrix
$$C(f(x)) = \begin{bmatrix} -a_{n-1} & 1 & 0 & \cdots & 0 \\ -a_{n-2} & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -a_1 & 0 & 0 & \cdots & 1 \\ -a_0 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$
(The $1$s are immediately above the diagonal.) ♦
Theorem 5.1.17. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$ be a monic polynomial and let $A = C(f(x))$ be its companion matrix. Let $V = F^n$ and let $T = T_A: V \to V$ be the linear transformation $T(v) = Av$. Let $v = e_n$ be the $n$th standard basis vector. Then the subspace $W$ of $V$ defined by $W = \{g(T)(v) \mid g(x) \in F[x]\}$ is $V$. Furthermore, $m_T(x) = m_{T,v}(x) = f(x)$.

Proof. We see that $T(e_n) = e_{n-1}$, $T^2(e_n) = T(e_{n-1}) = e_{n-2}$, and in general $T^k(e_n) = e_{n-k}$ for $k \le n - 1$. Thus the subspace $W$ of $V$ contains the subspace spanned by $\{T^{n-1}(v), \ldots, T(v), v\} = \{e_1, \ldots, e_{n-1}, e_n\}$, which is all of $V$. We also see that this set is linearly independent, and hence that there is no nonzero polynomial $p(x)$ of degree less than or equal to $n - 1$ with $p(T)(v) = 0$. From
$$T^n(v) = T(e_1) = -a_{n-1}e_1 - a_{n-2}e_2 - \cdots - a_1 e_{n-1} - a_0 e_n = -a_{n-1}T^{n-1}(v) - a_{n-2}T^{n-2}(v) - \cdots - a_1 T(v) - a_0 v$$
we see that
$$0 = T^n(v) + a_{n-1}T^{n-1}(v) + \cdots + a_1 T(v) + a_0 v,$$
i.e., $f(T)(v) = 0$. Hence $m_{T,v}(x) = f(x)$.

On the one hand, $m_{T,v}(x)$ divides $m_T(x)$. On the other hand, since every $w \in V$ is $w = g(T)(v)$ for some polynomial $g(x)$,
$$m_{T,v}(T)(w) = m_{T,v}(T)g(T)(v) = g(T)m_{T,v}(T)(v) = g(T)(0) = 0$$
for every $w \in V$, and so $m_T(x)$ divides $m_{T,v}(x)$. Thus
$$m_T(x) = m_{T,v}(x) = f(x).$$

Lemma 5.1.18. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$ be a monic polynomial of degree $n \ge 1$ and let $A = C(f(x))$ be its companion matrix. Then $c_A(x) = \det(xI - A) = f(x)$.
Proof. We proceed by induction. If $n = 1$ then $A = C(f(x)) = [-a_0]$, so $xI - A = [x + a_0]$ has determinant $x + a_0$.

Assume the theorem is true for $k = n - 1$ and consider $k = n$. We compute the determinant by expansion by minors of the last row:
$$\det \begin{bmatrix} x + a_{n-1} & -1 & 0 & \cdots & 0 \\ a_{n-2} & x & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \\ a_1 & 0 & \cdots & x & -1 \\ a_0 & 0 & \cdots & 0 & x \end{bmatrix}$$
$$= (-1)^{n+1} a_0 \det \begin{bmatrix} -1 & 0 & \cdots & 0 \\ x & -1 & \cdots & 0 \\ 0 & x & \ddots & \vdots \\ & & x & -1 \end{bmatrix} + x \det \begin{bmatrix} x + a_{n-1} & -1 & 0 & \cdots & 0 \\ a_{n-2} & x & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \\ a_2 & 0 & \cdots & & -1 \\ a_1 & 0 & \cdots & 0 & x \end{bmatrix}$$
$$= (-1)^{n+1} a_0 (-1)^{n-1} + x\big(x^{n-1} + a_{n-1}x^{n-2} + \cdots + a_2 x + a_1\big)$$
$$= x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = f(x).$$
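Theorem 5.1.17 and Lemma 5.1.18 are easy to check by machine for any particular $f$. A minimal sketch (assuming Python with SymPy; the helper name `companion` is ours, and the example polynomial is arbitrary) builds $C(f(x))$ in the book's orientation and verifies that its characteristic polynomial is $f(x)$:

```python
from sympy import Matrix, symbols, Poly

x = symbols('x')

def companion(f):
    """Companion matrix C(f) with -a_{n-1}, ..., -a_0 down the first column
    and 1s immediately above the diagonal, as in Definition 5.1.16."""
    coeffs = Poly(f, x).all_coeffs()        # [1, a_{n-1}, ..., a_0]
    n = len(coeffs) - 1
    C = Matrix(n, n, lambda i, j: 1 if j == i + 1 else 0)
    for i in range(n):
        C[i, 0] = -coeffs[i + 1]            # first column: -a_{n-1}, ..., -a_0
    return C

f = x**3 + 2*x**2 - x + 5                    # arbitrary monic example
A = companion(f)
assert (A.charpoly(x).as_expr() - f).expand() == 0   # Lemma 5.1.18: c_A(x) = f(x)
print(A)
```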
5.2 Invariant subspaces and quotient spaces

Let $V$ be a vector space and let $T: V \to V$ be a linear transformation. A $T$-invariant subspace of $V$ is a subspace $W$ of $V$ such that $T(W) \subseteq W$. In this section we will see how to obtain invariant subspaces and we will see that if $W$ is an invariant subspace of $V$, then we can obtain in a natural way the "induced" linear transformation $\overline{T}: V/W \to V/W$. (Recall that $V/W$ is the quotient of the vector space $V$ by the subspace $W$. We can form $V/W$ for any subspace $W$ of $V$, but in order for $\overline{T}$ to be defined we need $W$ to be an invariant subspace.)
Definition 5.2.1. Let $T: V \to V$ be a linear transformation. A subspace $W$ of $V$ is $T$-invariant if $T(W) \subseteq W$, i.e., if $T(v) \in W$ for every $v \in W$. ♦

Remark 5.2.2. If $W$ is a $T$-invariant subspace of $V$, then for any polynomial $p(x)$, $p(T)(W) \subseteq W$. ♦

Lemma 5.2.4 and Lemma 5.2.6 give two basic ways of obtaining $T$-invariant subspaces.

Definition 5.2.3. Let $T: V \to V$ be a linear transformation. Let $\mathcal{B} = \{v_1, \ldots, v_k\}$ be a set of vectors in $V$. The $T$-span of $\mathcal{B}$ is the subspace
$$W = \left\{ \sum_{i=1}^k p_i(T)(v_i) \;\middle|\; p_i(x) \in F[x] \right\}.$$
In this situation $\mathcal{B}$ is said to $T$-generate $W$. ♦

Lemma 5.2.4. In the situation of Definition 5.2.3, the $T$-span $W$ of $\mathcal{B}$ is a $T$-invariant subspace of $V$ and is the smallest $T$-invariant subspace of $V$ containing $\mathcal{B}$.
In case $\mathcal{B}$ consists of a single vector we have the following:

Lemma 5.2.5. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Let $w \in V$ and let $W$ be the subspace of $V$ $T$-generated by $w$. Then the dimension of $W$ is equal to the degree of the $T$-annihilator $m_{T,w}(x)$ of $w$.

Proof. It is easy to check that $m_{T,w}(x)$ has degree $k$ if and only if $\{w, T(w), \ldots, T^{k-1}(w)\}$ is a basis of $W$.

Lemma 5.2.6. Let $T: V \to V$ be a linear transformation and let $p(x) \in F[x]$ be any polynomial. Then
$$\operatorname{Ker}(p(T)) = \{v \in V \mid p(T)(v) = 0\}$$
is a $T$-invariant subspace of $V$.

Proof. If $v \in \operatorname{Ker}(p(T))$, then
$$p(T)(T(v)) = T(p(T)(v)) = T(0) = 0.$$
Now we turn to quotients and induced linear transformations.
Lemma 5.2.7. Let $T: V \to V$ be a linear transformation, and let $W \subseteq V$ be a $T$-invariant subspace. Then $\overline{T}: V/W \to V/W$ given by $\overline{T}(v + W) = T(v) + W$ is a well-defined linear transformation.

Proof. Recall from Lemma 1.5.11 that $V/W$ is the set of distinct affine subspaces of $V$ parallel to $W$, and from Proposition 1.5.4 that each such subspace is of the form $v + W$ for some element $v$ of $V$. We need to check that the above formula gives a well-defined value for $\overline{T}(v + W)$. Let $v$ and $v'$ be two elements of $V$ with $v + W = v' + W$. Then $v - v' = w \in W$, and then $T(v) - T(v') = T(v - v') = T(w) = w' \in W$, as we are assuming that $W$ is $T$-invariant. Hence
$$\overline{T}(v + W) = T(v) + W = T(v') + W = \overline{T}(v' + W).$$
It is easy to check that $\overline{T}$ is linear.

Definition 5.2.8. In the situation of Lemma 5.2.7, we call $\overline{T}: V/W \to V/W$ the quotient linear transformation. ♦

Remark 5.2.9. If $\pi: V \to V/W$ is the canonical projection (see Definition 1.5.12), then $\overline{T}$ is given by $\overline{T}(\pi(v)) = \pi(T(v))$. ♦
When $V$ is a finite-dimensional vector space, we can recast our discussion in terms of matrices.

Theorem 5.2.10. Let $V$ be a finite-dimensional vector space and let $W$ be a subspace of $V$. Let $\mathcal{B}_1 = \{v_1, \ldots, v_k\}$ be a basis of $W$ and extend $\mathcal{B}_1$ to $\mathcal{B} = \{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$, a basis of $V$. Let $\mathcal{B}_2 = \{v_{k+1}, \ldots, v_n\}$. Let $\pi: V \to V/W$ be the quotient map and let $\overline{\mathcal{B}}_2 = \{\pi(v_{k+1}), \ldots, \pi(v_n)\}$, a basis of $V/W$.

Let $T: V \to V$ be a linear transformation. Then $W$ is a $T$-invariant subspace if and only if $[T]_{\mathcal{B}}$ is a block upper triangular matrix of the form
$$[T]_{\mathcal{B}} = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix},$$
where $A$ is $k$-by-$k$.

In this case, let $\overline{T}: V/W \to V/W$ be the quotient linear transformation. Then
$$[\overline{T}]_{\overline{\mathcal{B}}_2} = D.$$
Lemma 5.2.11. In the situation of Lemma 5.2.7, let $V$ be finite dimensional, let $\overline{v} \in V/W$ be arbitrary, and let $v \in V$ be any element with $\pi(v) = \overline{v}$. Then $m_{\overline{T},\overline{v}}(x)$ divides $m_{T,v}(x)$.

Proof. We have $\overline{v} = v + W$. Then
$$m_{T,v}(\overline{T})(\overline{v}) = m_{T,v}(\overline{T})(v + W) = m_{T,v}(T)(v) + W = 0 + W = \overline{0},$$
where $\overline{0} = 0 + W$ is the $0$ vector in $V/W$.

Thus $m_{T,v}(x)$ is a polynomial with $m_{T,v}(\overline{T})(\overline{v}) = \overline{0}$. But $m_{\overline{T},\overline{v}}(x)$ divides any such polynomial.

Corollary 5.2.12. In the situation of Lemma 5.2.11, the minimum polynomial $m_{\overline{T}}(x)$ of $\overline{T}$ divides the minimum polynomial $m_T(x)$ of $T$.

Proof. It easily follows from Remark 5.2.9 that for any polynomial $p(x)$, $p(\overline{T})(\pi(v)) = \pi(p(T)(v))$. In particular, this is true for $p(x) = m_T(x)$. Any $\overline{v} \in V/W$ is $\overline{v} = \pi(v)$ for some $v \in V$, so
$$m_T(\overline{T})(\overline{v}) = \pi(m_T(T)(v)) = \pi(0) = \overline{0}.$$
Thus $m_T(\overline{T})(\overline{v}) = \overline{0}$ for every $\overline{v} \in V/W$, i.e., $m_T(\overline{T}) = 0$. But $m_{\overline{T}}(x)$ divides any such polynomial.
5.3 The relationship between the characteristic and minimum polynomials

In this section we derive the very important Theorem 5.3.1, which gives the relationship between the minimum polynomial $m_T(x)$ and the characteristic polynomial $c_T(x)$ of a linear transformation $T: V \to V$, where $V$ is a finite-dimensional vector space over a general field $F$. (We did this in the last chapter for $F$ algebraically closed.) The key result used in proving this theorem is Theorem 5.1.11. As an immediate consequence of Theorem 5.3.1 we have Corollary 5.3.4, the Cayley-Hamilton theorem: For any such $T$, $c_T(T) = 0$.
Theorem 5.3.1. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Let $m_T(x)$ be the minimum polynomial of $T$ and let $c_T(x)$ be the characteristic polynomial of $T$. Then
(1) $m_T(x)$ divides $c_T(x)$.

(2) Every irreducible factor of $c_T(x)$ is an irreducible factor of $m_T(x)$.
Proof. We proceed by induction on $n = \dim(V)$. Let $m_T(x)$ have degree $k \le n$. Let $v \in V$ be a vector with $m_{T,v}(x) = m_T(x)$. (Such a vector $v$ exists by Theorem 5.1.11.) Let $W_1$ be the $T$-span of $v$. If we let $v_k = v$ and $v_{k-i} = T^i(v)$ for $i \le k - 1$ then, as in the proof of Theorem 5.1.17, $\mathcal{B}_1 = \{v_1, \ldots, v_k\}$ is a basis for $W_1$ and $[T|_{W_1}]_{\mathcal{B}_1} = C(m_T(x))$, the companion matrix of $m_T(x)$.

If $k = n$ then $W_1 = V$, so $[T]_{\mathcal{B}_1} = C(m_T(x))$ has characteristic polynomial $m_T(x)$. Thus $c_T(x) = m_T(x)$ and we are done.

Suppose $k < n$. Then $W_1$ has a complement $V_2$, so $V = W_1 \oplus V_2$. Let $\mathcal{B}_2$ be a basis for $V_2$ and $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ a basis for $V$. Then $[T]_{\mathcal{B}}$ is a matrix of the form
$$[T]_{\mathcal{B}} = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}$$
where $A = C(m_T(x))$. (The $0$ block in the lower left is due to the fact that $W_1$ is $T$-invariant. If $V_2$ were $T$-invariant then we would have $B = 0$, but that is not necessarily the case.) We use the basis $\mathcal{B}$ to compute $c_T(x)$:
$$c_T(x) = \det\big(xI - [T]_{\mathcal{B}}\big) = \det \begin{bmatrix} xI - A & -B \\ 0 & xI - D \end{bmatrix} = \det(xI - A)\det(xI - D) = m_T(x)\det(xI - D),$$
so $m_T(x)$ divides $c_T(x)$.

Now we must show that $m_T(x)$ and $c_T(x)$ have the same irreducible factors. We proceed similarly by induction. If $m_T(x)$ has degree $n$ then $m_T(x) = c_T(x)$ and we are done. Otherwise we again have a direct sum decomposition $V = W_1 \oplus V_2$ and a basis $\mathcal{B}$ with
$$[T]_{\mathcal{B}} = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}.$$
In general we cannot consider the restriction $T|_{V_2}$, as $V_2$ may not be invariant. But we can (and will) consider $\overline{T}: V/W_1 \to V/W_1$. If we let $\overline{\mathcal{B}} = \pi(\mathcal{B}_2)$, then by Theorem 5.2.10,
$$[\overline{T}]_{\overline{\mathcal{B}}} = D.$$
By the inductive hypothesis, $m_{\overline{T}}(x)$ and $c_{\overline{T}}(x)$ have the same irreducible factors. Since $m_T(x)$ divides $c_T(x)$, every irreducible factor of $m_T(x)$ is certainly an irreducible factor of $c_T(x)$. We must show the other direction. Let $p(x)$ be an irreducible factor of $c_T(x)$. As in the first part of the proof,
$$c_T(x) = \det(xI - A)\det(xI - D) = m_T(x) c_{\overline{T}}(x).$$
Since $p(x)$ is irreducible, it divides one of the factors. If $p(x)$ divides the first factor $m_T(x)$, we are done. Suppose $p(x)$ divides the second factor. By the inductive hypothesis, $p(x)$ divides $m_{\overline{T}}(x)$. By Corollary 5.2.12, $m_{\overline{T}}(x)$ divides $m_T(x)$. Thus $p(x)$ divides $m_T(x)$, and we are done.
Corollary 5.3.2. In the situation of Theorem 5.3.1, let $m_T(x) = p_1(x)^{e_1} \cdots p_k(x)^{e_k}$ for distinct irreducible polynomials $p_1(x), \ldots, p_k(x)$ and positive integers $e_1, \ldots, e_k$. Then $c_T(x) = p_1(x)^{f_1} \cdots p_k(x)^{f_k}$ for integers $f_1, \ldots, f_k$ with $f_i \ge e_i$ for each $i$.

Proof. This is just a concrete restatement of Theorem 5.3.1.
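Corollary 5.3.2 is easy to watch in action. A minimal sketch (assuming Python with SymPy; the example matrix is ours) factors the characteristic polynomial and verifies by direct substitution that the minimum polynomial has the same irreducible factors with smaller exponents:

```python
from sympy import Matrix, symbols, factor, eye, zeros, diag

x = symbols('x')
# Block diagonal: a 2x2 Jordan block at 2, a 1x1 block at 2, a 1x1 block at 3.
A = diag(Matrix([[2, 1], [0, 2]]), 2, 3)

print(factor(A.charpoly(x).as_expr()))   # (x - 2)**3 * (x - 3): f_1 = 3, f_2 = 1

# Claim: m_A(x) = (x - 2)**2 * (x - 3), exponents e_1 = 2 <= 3, e_2 = 1 <= 1.
assert (A - 2*eye(4))**2 * (A - 3*eye(4)) == zeros(4, 4)   # m_A(A) = 0
# No smaller exponent works: (x - 2)(x - 3) does not annihilate A.
assert (A - 2*eye(4)) * (A - 3*eye(4)) != zeros(4, 4)
```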
The following special case is worth pointing out explicitly.
Corollary 5.3.3. Let $V$ be an $n$-dimensional vector space and let $T: V \to V$ be a linear transformation. Then $V$ is $T$-generated by a single element if and only if $m_T(x)$ is a polynomial of degree $n$, or, equivalently, if and only if $m_T(x) = c_T(x)$.

Proof. For $w \in V$, let $W$ be the subspace of $V$ $T$-generated by $w$. Then the dimension of $W$ is equal to the degree of $m_{T,w}(x)$, and $m_{T,w}(x)$ divides $m_T(x)$. Thus if $m_T(x)$ has degree less than $n$, $W$ has dimension less than $n$ and so $W \neq V$.

By Theorem 5.1.11, there is a vector $v_0 \in V$ with $m_{T,v_0}(x) = m_T(x)$. Thus if $m_T(x)$ has degree $n$, the subspace $V_0$ of $V$ $T$-generated by $v_0$ has dimension $n$ and so $V_0 = V$.

Since $m_T(x)$ and $c_T(x)$ are both monic polynomials, and $m_T(x)$ divides $c_T(x)$ by Theorem 5.3.1, $m_T(x) = c_T(x)$ if and only if they have the same degree. But $c_T(x)$ has degree $n$.
Theorem 5.3.1 has a famous corollary, originally proved by completely
different methods.
Corollary 5.3.4 (Cayley-Hamilton Theorem). (1) Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation with characteristic polynomial $c_T(x)$. Then $c_T(T) = 0$.

(2) Let $A$ be an $n$-by-$n$ matrix and let $c_A(x)$ be its characteristic polynomial $c_A(x) = \det(xI - A)$. Then $c_A(A) = 0$.

Proof. (1) $m_T(T) = 0$ and $m_T(x)$ divides $c_T(x)$, so $c_T(T) = 0$.

(2) This is a translation of (1) into matrix language. (Let $T = T_A$.)
Remark 5.3.5. The minimum polynomial $m_T(x)$ has appeared more prominently than the characteristic polynomial $c_T(x)$ so far. As we shall see, $m_T(x)$ plays a more important role in analyzing the structure of $T$ than $c_T(x)$ does. However, $c_T(x)$ has the very important advantage that it can be calculated without having to consider the structure of $T$. It is a determinant, and we have methods for calculating determinants. ♦
5.4 Invariant subspaces and invariant complements

We have stressed the difference between subspaces and quotient spaces. If $V$ is a vector space and $W$ is a subspace, then the quotient space $V/W$ is not a subspace of $V$. But $W$ always has a complement $W'$ (though except in trivial cases, $W'$ is not unique), $V = W \oplus W'$, and if $\pi: V \to V/W$ is the canonical projection, then the restriction $\pi|_{W'}$ gives an isomorphism from $W'$ to $V/W$. (On the one hand this can be very useful, but on the other hand it makes it easy to confuse the quotient space $V/W$ with the subspace $W'$.)

Once we consider $T$-invariant subspaces, the situation changes markedly. Given a vector space $V$, a linear transformation $T: V \to V$, and a $T$-invariant subspace $W$, then, as we have seen in Lemma 5.2.7, we obtain from $T$ in a natural way a linear transformation $\overline{T}$ on the quotient space $V/W$. However, it is not in general the case that $W$ has a $T$-invariant complement $W'$.

This section will be devoted to investigating the question of when a $T$-invariant subspace $W$ has a $T$-invariant complement $W'$. We will see two situations in which this is always the case: Theorem 5.4.6, whose proof is relatively simple, and Theorem 5.4.10, whose proof is more involved. Theorem 5.4.10 is the key result we will need in order to develop rational canonical form, and Theorem 5.4.6 is the key result we will need in order to further develop Jordan canonical form.
Definition 5.4.1. Let $T: V \to V$ be a linear transformation. Then $V = W_1 \oplus \cdots \oplus W_k$ is a $T$-invariant direct sum if $V = W_1 \oplus \cdots \oplus W_k$ is the direct sum of $W_1, \ldots, W_k$ and each $W_i$ is a $T$-invariant subspace. If $V = W_1 \oplus W_2$ is a $T$-invariant direct sum decomposition, then $W_2$ is a $T$-invariant complement of $W_1$. ♦

Example 5.4.2. (1) Let $V$ be a $2$-dimensional vector space with basis $\{v_1, v_2\}$ and let $T: V \to V$ be defined by $T(v_1) = 0$, $T(v_2) = v_2$. Then $W_1 = \operatorname{Ker}(T) = \{c_1 v_1 \mid c_1 \in F\}$ is a $T$-invariant subspace, and it has $T$-invariant complement $W_2 = \operatorname{Ker}(T - I) = \{c_2 v_2 \mid c_2 \in F\}$.

(2) Let $V$ be as in part (1) and let $T: V \to V$ be defined by $T(v_1) = 0$, $T(v_2) = v_1$. Then $W_1 = \operatorname{Ker}(T) = \{c_1 v_1 \mid c_1 \in F\}$ is again a $T$-invariant subspace, but it does not have a $T$-invariant complement. Suppose $W_2$ is any $T$-invariant subspace with $V = W_1 + W_2$. Then $W_2$ has a vector of the form $c_1 v_1 + c_2 v_2$ for some $c_2 \neq 0$. Then $T(c_1 v_1 + c_2 v_2) = c_2 v_1 \in W_2$, so $W_2$ contains the subspace spanned by $\{c_2 v_1, c_1 v_1 + c_2 v_2\}$, i.e., $W_2 = V$, and then $V$ is not the direct sum of $W_1$ and $W_2$. (Instead of $W_1 \cap W_2 = \{0\}$, as required for a direct sum, $W_1 \cap W_2 = W_1$.) ♦
We now consider a more elaborate situation and investigate invariant subspaces, complements, and induced linear transformations.

Example 5.4.3. Let $g(x)$ and $h(x)$ be two monic polynomials that are not relatively prime and let $f(x) = g(x)h(x)$. (For example, we could choose an irreducible polynomial $p(x)$ and let $g(x) = p(x)^i$ and $h(x) = p(x)^j$ for positive integers $i$ and $j$, in which case $f(x) = p(x)^k$ where $k = i + j$.)

Let $V$ be a vector space and $T: V \to V$ a linear transformation with $m_T(x) = c_T(x) = f(x)$.

Let $v_0 \in V$ be an element with $m_{T,v_0}(x) = f(x)$, so that $V$ is $T$-generated by the single element $v_0$. Let $W_1 = h(T)(V)$. We claim that $W_1$ does not have a $T$-invariant complement. We prove this by contradiction.

Suppose that $V = W_1 \oplus W_2$ where $W_2$ is also $T$-invariant. Denote the restrictions of $T$ to $W_1$ and $W_2$ by $T_1$ and $T_2$ respectively. First we claim that $m_{T_1}(x) = g(x)$.

If $w_1 \in W_1$, then $w_1 = h(T)(v_1)$ for some $v_1 \in V$. But $v_0$ $T$-generates $V$, so $v_1 = k(T)(v_0)$ for some polynomial $k(x)$, and then
$$g(T)(w_1) = g(T)h(T)(v_1) = g(T)h(T)k(T)(v_0) = k(T)g(T)h(T)(v_0) = k(T)f(T)(v_0) = k(T)(0) = 0.$$
Thus $g(T)(w_1) = 0$ for every $w_1 \in W_1$, so $m_{T_1}(x)$ divides $g(x)$. If we let $w_0 = h(T)(v_0)$ and set $k(x) = m_{T_1,w_0}(x)$, then $0 = k(T)(w_0) = k(T)h(T)(v_0)$, so $m_{T,v_0}(x) = g(x)h(x)$ divides $k(x)h(x)$. Thus $g(x)$ divides $k(x) = m_{T_1,w_0}(x)$, which divides $m_{T_1}(x)$.

Next we claim that $m_{T_2}(x)$ divides $h(x)$. Let $w_2 \in W_2$. Then $h(T)(w_2) \in W_1$ (as $h(T)(v) \in W_1$ for every $v \in V$). Since $W_2$ is $T$-invariant, $h(T)(w_2) \in W_2$, so $h(T)(w_2) \in W_1 \cap W_2$. But $W_1 \cap W_2 = \{0\}$ by the definition of a direct sum, so $h(T)(w_2) = 0$ for every $w_2 \in W_2$, and hence $m_{T_2}(x)$ divides $h(x)$. Set $h_1(x) = m_{T_2}(x)$.

If $V = W_1 \oplus W_2$, then $v_0 = w_1 + w_2$ for some $w_1 \in W_1$, $w_2 \in W_2$. Let $k(x)$ be the least common multiple of $g(x)$ and $h(x)$. Then $k(T)(v_0) = k(T)(w_1 + w_2) = k(T)(w_1) + k(T)(w_2) = 0 + 0$, as $m_{T_1}(x) = g(x)$ divides $k(x)$ and $m_{T_2}(x) = h_1(x)$ divides $h(x)$, which divides $k(x)$. Thus $k(x)$ is divisible by $f(x) = m_{T,v_0}(x)$. But we chose $g(x)$ and $h(x)$ to not be relatively prime, so their least common multiple $k(x)$ is a proper factor of their product $f(x)$, a contradiction. ♦
Example 5.4.4. Suppose that $g(x)$ and $h(x)$ are relatively prime, and let $f(x) = g(x)h(x)$. Let $V$ be a vector space and let $T: V \to V$ be a linear transformation with $m_T(x) = c_T(x) = f(x)$. Let $v_0 \in V$ with $m_{T,v_0}(x) = m_T(x)$, so that $V$ is $T$-generated by $v_0$. Let $W_1 = h(T)(V)$. We claim that $W_2 = g(T)(V)$ is a $T$-invariant complement of $W_1$.

First we check that $W_1 \cap W_2 = \{0\}$. An argument similar to that in the previous example shows that if $w \in W_1$, then $m_{T_1,w}(x)$ divides $g(x)$, and that if $w \in W_2$, then $m_{T_2,w}(x)$ divides $h(x)$. Hence if $w \in W_1 \cap W_2$, $m_{T,w}(x)$ divides both $g(x)$ and $h(x)$, and thus divides their gcd. These two polynomials were assumed to be relatively prime, so their gcd is $1$. Hence $1 \cdot w = 0$, i.e., $w = 0$.

Next we show that we can write any vector in $V$ as a sum of a vector in $W_1$ and a vector in $W_2$. Since $v_0$ $T$-generates $V$, it suffices to show that we can write $v_0$ in this way. Now $g(x)$ and $h(x)$ are relatively prime, so there are polynomials $r(x)$ and $s(x)$ with $g(x)r(x) + s(x)h(x) = 1$. Then
$$v_0 = 1 \cdot v_0 = \big(h(T)s(T) + g(T)r(T)\big)(v_0) = h(T)s(T)(v_0) + g(T)r(T)(v_0) = w_1 + w_2$$
where
$$w_1 = h(T)(s(T)(v_0)) \in h(T)(V) = W_1 \quad\text{and}\quad w_2 = g(T)(r(T)(v_0)) \in g(T)(V) = W_2. \qquad ♦$$
5.4. Invariant subspaces and invariant complements 125
Example 5.4.5. Let g.x/ and h.x/ be arbitrary polynomials and let
f .x/ Dg.x/h.x/. Let Vbe a vector space and TWV!Va lin-
ear transformation with mT.x/ DcT.x/ Df .x/. Let v02Vwith
mT;v0.x/ DmT.x/ so that Vis T-generated by v0.
Let W1Dh.T/.V /. Then we may form the quotient space V1D
V=W1, with the quotient linear transformation TWV1!V1, and 1W
V!V1. Clearly V1is T-generated by the single element v1D1.v0/.
(Since any v2Vcan be written as vDk.T/.v0/for some polyno-
mial k.x/, then vCW1Dk.T/.v0/CW1.) We claim that mT;v1.x/ D
cT;v1.x/ Dh.x/. We see that h.T/.v1/Dh.T/.v0/CW1D0CW1as
h.T/.v0/2W1. Hence mT;v1.x/ Dk.x/ divides h.x/. Now k.T/.v1/D
0CW1, i.e., k.T/.v0/2W1Dh.T/.V /, so k.T/.v0/Dh.T/.u1/for
some u12V. Then g.T/k.T/.v0/Dg.T/h.T/.v1/Df .T/.u1/D0
since mT.x/ Df .x/. Then f .x/ Dg.x/h.x/ divides g.x/k.x/, so h.x/
divides k.x/. Hence mT;v1.x/ Dk.x/ Dh.x/.
The same argument shows that if W2Dg.T/.V / and V2DV=W2with
TWV2!V2the induced linear transformation then V2is T-generated
by the single element v2D2.v0/with mT;v2.x/ Dg.x/.Þ
We now come to the two most important ways we can obtain T-invariant
complements (or direct sum decompositions). Here is the first.
Theorem 5.4.6. Let $V$ be a vector space and let $T: V \to V$ be a linear transformation. Let $T$ have minimum polynomial $m_T(x)$ and let $m_T(x)$ factor as a product of pairwise relatively prime polynomials, $m_T(x) = p_1(x) \cdots p_k(x)$. For $i = 1, \ldots, k$, let $W_i = \operatorname{Ker}(p_i(T))$. Then each $W_i$ is a $T$-invariant subspace and $V = W_1 \oplus \cdots \oplus W_k$.
Proof. For any $i$, let $w_i \in W_i$. Then
\[
p_i(T)(T(w_i)) = T(p_i(T)(w_i)) = T(0) = 0,
\]
so $T(w_i) \in W_i$ and $W_i$ is $T$-invariant.

For each $i$, let $q_i(x) = m_T(x)/p_i(x)$. Then $\{q_1(x), \ldots, q_k(x)\}$ is relatively prime, so there are polynomials $r_1(x), \ldots, r_k(x)$ with $q_1(x)r_1(x) + \cdots + q_k(x)r_k(x) = 1$.

Let $v \in V$. Then
\[
v = Iv = \bigl(q_1(T)r_1(T) + \cdots + q_k(T)r_k(T)\bigr)(v) = w_1 + \cdots + w_k
\]
with $w_i = q_i(T)r_i(T)(v)$. Furthermore,
\[
p_i(T)w_i = p_i(T)q_i(T)r_i(T)(v) = m_T(T)r_i(T)(v) = 0, \quad\text{as } m_T(T) = 0
\]
by the definition of the minimum polynomial $m_T(x)$, and so $w_i \in W_i$.

To complete the proof we show that if $0 = w_1 + \cdots + w_k$ with $w_i \in W_i$ for each $i$, then $w_1 = \cdots = w_k = 0$. Consider first $i = 1$. We have $0 = w_1 + \cdots + w_k$, so
\[
0 = q_1(T)(0) = q_1(T)(w_1 + \cdots + w_k) = q_1(T)(w_1) + 0 + \cdots + 0 = q_1(T)(w_1),
\]
as $p_i(x)$ divides $q_1(x)$ for every $i > 1$. Also $p_1(T)(w_1) = 0$ by definition. Now $p_1(x)$ and $q_1(x)$ are relatively prime, so there exist polynomials $f(x)$ and $g(x)$ with $f(x)p_1(x) + g(x)q_1(x) = 1$. Then
\[
w_1 = Iw_1 = \bigl(f(T)p_1(T) + g(T)q_1(T)\bigr)(w_1) = f(T)(0) + g(T)(0) = 0.
\]
Similarly, $w_i = 0$ for each $i$.
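Theorem 5.4.6 is easy to see in action numerically. The following sketch (our own illustration; the sample matrix and the helper `eval_poly` are assumptions, not the book's) computes the subspaces $W_i = \operatorname{Ker}(p_i(A))$ and checks that their dimensions add up to $\dim V$.

```python
import sympy as sp

x = sp.symbols('x')
# Sample matrix with m_A(x) = (x - 1)(x - 2)^2, a product of the
# pairwise relatively prime factors p1 = x - 1 and p2 = (x - 2)^2.
A = sp.Matrix([[1, 0, 0],
               [0, 2, 1],
               [0, 0, 2]])
factors = [x - 1, (x - 2)**2]

def eval_poly(p, M):
    """Evaluate the polynomial p(x) at the square matrix M (Horner)."""
    out = sp.zeros(M.rows, M.cols)
    for c in sp.Poly(p, x).all_coeffs():
        out = out * M + c * sp.eye(M.rows)
    return out

kernels = [eval_poly(p, A).nullspace() for p in factors]   # bases of W_i
dims = [len(basis) for basis in kernels]
print(dims)                    # [1, 2]
assert sum(dims) == A.rows     # V = W_1 (+) W_2
```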
As a consequence, we obtain a description of all the $T$-invariant subspaces of a linear transformation $T: V \to V$.

Theorem 5.4.7. Let $T: V \to V$ be a linear transformation and let $m_T(x) = p_1(x)^{e_1} \cdots p_k(x)^{e_k}$ be a factorization of the minimum polynomial of $T$ into powers of distinct irreducible polynomials. Let $W_i = \operatorname{Ker}(p_i(T)^{e_i})$, so that $V = W_1 \oplus \cdots \oplus W_k$, a $T$-invariant direct sum decomposition. For $i = 1, \ldots, k$, let $U_i$ be a $T$-invariant subspace of $W_i$ (perhaps $U_i = \{0\}$). Then $U = U_1 \oplus \cdots \oplus U_k$ is a $T$-invariant subspace of $V$, and every $T$-invariant subspace of $V$ arises in this way.

Proof. We have $V = W_1 \oplus \cdots \oplus W_k$ by Theorem 5.4.6. It is easy to check that any such $U$ is $T$-invariant. We show that these are all the $T$-invariant subspaces.

Let $U$ be any $T$-invariant subspace of $V$. Let $\pi_i: V \to W_i$ be the projection and let $U_i = \pi_i(U)$. We claim that $U = U_1 \oplus \cdots \oplus U_k$. To show this it suffices to show that $U_i \subseteq U$ for each $i$. Let $u_i \in U_i$. Then, by the definition of $U_i$, there is an element $u$ of $U$ of the form $u = u_1 + \cdots + u_i + \cdots + u_k$, for some elements $u_j \in U_j$, $j \neq i$. Let $q_i(x) = m_T(x)/p_i(x)^{e_i}$.
Since $p_i(x)^{e_i}$ and $q_i(x)$ are relatively prime, there are polynomials $r_i(x)$ and $s_i(x)$ with $r_i(x)p_i(x)^{e_i} + s_i(x)q_i(x) = 1$. We have $q_i(T)(u_j) = 0$ for $j \neq i$ and $p_i(T)^{e_i}(u_i) = 0$. Then
\begin{align*}
u_i = 1 \cdot u_i &= \bigl(r_i(T)p_i(T)^{e_i} + s_i(T)q_i(T)\bigr)(u_i) = s_i(T)q_i(T)(u_i)\\
&= 0 + \cdots + s_i(T)q_i(T)(u_i) + \cdots + 0\\
&= s_i(T)q_i(T)(u_1) + \cdots + s_i(T)q_i(T)(u_i) + \cdots + s_i(T)q_i(T)(u_k)\\
&= s_i(T)q_i(T)(u).
\end{align*}
Since $U$ is $T$-invariant, $s_i(T)q_i(T)(u) \in U$, i.e., $u_i \in U$, as claimed.
Now we come to the second way in which we can obtain T-invariant
complements. The proof here is complicated, so we separate it into two
stages.
Lemma 5.4.8. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Let $w_1 \in V$ be any vector with $m_{T,w_1}(x) = m_T(x)$ and let $W_1$ be the subspace of $V$ $T$-generated by $w_1$. Suppose that $W_1$ is a proper subspace of $V$ and that there is a vector $v_2 \in V$ such that $V$ is $T$-generated by $\{w_1, v_2\}$. Then there is a vector $w_2 \in V$ such that $V = W_1 \oplus W_2$, where $W_2$ is the subspace of $V$ $T$-generated by $w_2$.

Proof. Observe that if $V_2$ is the subspace of $V$ that is $T$-generated by $v_2$, then $V_2$ is a $T$-invariant subspace and, by hypothesis, every $v \in V$ can be written as $v = w_1' + v_2''$ for some $w_1' \in W_1$ and some $v_2'' \in V_2$. Thus $V = W_1 + V_2$. However, there is no reason to conclude that $W_1$ and $V_2$ are independent subspaces of $V$, and that may not be the case.

Our proof will consist of showing how to "modify" $v_2$ to obtain a vector $w_2$ such that we can still write every $v \in V$ as $v = w_1' + w_2'$ with $w_1' \in W_1$ and $w_2' \in W_2$, the subspace of $V$ $T$-generated by $w_2$, and with $W_1 \cap W_2 = \{0\}$. We consider the vector $v_2' = v_2 + w$ where $w$ is any element of $W_1$. Then we observe that $\{w_1, v_2'\}$ also $T$-generates $V$. Our proof will consist of showing that for the proper choice of $w$, $w_2 = v_2' = v_2 + w$ is an element of $V$ with $W_1 \cap W_2 = \{0\}$.

Let $V$ have dimension $n$ and let $m_T(x)$ be a polynomial of degree $k$. Set $j = n - k$. Then $W_1$ has basis
\[
\mathcal{B}_1 = \{u_1, \ldots, u_k\} = \{T^{k-1}(w_1), \ldots, T(w_1), w_1\}.
\]
By hypothesis, $V$ is spanned by
\[
\{w_1, T(w_1), \ldots\} \cup \{v_2', T(v_2'), \ldots\},
\]
so $V$ is also spanned by
\[
\{w_1, T(w_1), \ldots, T^{k-1}(w_1)\} \cup \{v_2', T(v_2'), \ldots\}.
\]
We claim that
\[
\{w_1, T(w_1), \ldots, T^{k-1}(w_1)\} \cup \{v_2', T(v_2'), \ldots, T^{j-1}(v_2')\}
\]
is a basis for $V$. We see this as follows: We begin with the linearly independent set $\{w_1, \ldots, T^{k-1}(w_1)\}$ and add $v_2', T(v_2'), \ldots$ as long as we can do so and still obtain a linearly independent set. The furthest we can go is through $T^{j-1}(v_2')$, as then we have $k + j = n$ vectors in an $n$-dimensional vector space. But we need to go that far, as once some $T^i(v_2')$ is a linear combination of $\mathcal{B}_1$ and $\{v_2', \ldots, T^{i-1}(v_2')\}$, this latter set, consisting of $k + i$ vectors, spans $V$, so $i \geq j$. (The argument for this uses the fact that $W_1$ is $T$-invariant.) We then let
\[
\mathcal{B}_2' = \{u_{k+1}', \ldots, u_n'\} = \{T^{j-1}(v_2'), \ldots, v_2'\} \quad\text{and}\quad \mathcal{B}' = \mathcal{B}_1 \cup \mathcal{B}_2'.
\]
Then $\mathcal{B}'$ is a basis of $V$.

Consider $T^j(u_n')$. It has a unique expression in terms of basis elements:
\[
T^j u_n' = \sum_{i=1}^{k} b_i u_i + \sum_{i=0}^{j-1} c_i u_{n-i}'.
\]
If we let $p(x) = x^j - c_{j-1}x^{j-1} - \cdots - c_0$, we have that
\[
u = p(T)v_2' = p(T)u_n' = \sum_{i=1}^{k} b_i u_i \in W_1.
\]

Case I (incredibly lucky): $u = 0$. Then $T^j(v_2') \in V_2'$, the subspace $T$-spanned by $v_2'$, which implies that $T^i(v_2') \in V_2'$ for every $i$, so $V_2'$ is $T$-invariant. Thus in this case we choose $w_2 = v_2'$, so $W_2 = V_2'$, $V = W_1 \oplus W_2$, and we are done.

Case II (what we expect): $u \neq 0$. We have to do some work.

The key observation is that the coefficients $b_k, b_{k-1}, \ldots, b_{k-j+1}$ are all $0$, and hence $u = \sum_{i=1}^{k-j} b_i u_i$. Here is where we crucially use the hypothesis that $m_{T,w_1}(x) = m_T(x)$. We argue by contradiction. Suppose $b_m \neq 0$ for some $m \geq k - j + 1$, and let $m$ be the largest such index. Then
\[
T^{m-1}(u) = b_m u_1, \quad T^{m-2}(u) = b_m u_2 + b_{m-1} u_1, \quad \ldots.
\]
Thus we see that
\[
\bigl\{T^{m-1}p(T)v_2',\ T^{m-2}p(T)v_2',\ \ldots,\ p(T)v_2',\ T^{j-1}v_2',\ T^{j-2}v_2',\ \ldots,\ v_2'\bigr\}
\]
is a linearly independent subset of $V_2'$, the subspace of $V$ $T$-generated by $v_2'$, and hence $V_2'$ has dimension at least $m + j \geq k + 1$. That implies that $m_{T,v_2'}(x)$ has degree at least $k + 1$. But $m_{T,v_2'}(x)$ divides $m_T(x) = m_{T,w_1}(x)$, which has degree $k$, and that is impossible.

We now set
\[
w = -\sum_{i=1}^{k-j} b_i u_{i+j}
\]
and $w_2 = v_2' + w$,
\[
\mathcal{B}_1 = \{u_1, \ldots, u_k\} = \{T^{k-1}(w_1), \ldots, w_1\} \quad\text{(as before)},
\]
\[
\mathcal{B}_2 = \{u_{k+1}, \ldots, u_n\} = \{T^{j-1}(w_2), \ldots, w_2\}, \quad\text{and}\quad \mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2.
\]
We then have
\[
T^j u_n = T^j(v_2' + w) = T^j v_2' + T^j w
= \sum_{i=1}^{k-j} b_i u_i + T^j\Bigl(-\sum_{i=1}^{k-j} b_i u_{i+j}\Bigr)
= \sum_{i=1}^{k-j} b_i u_i - \sum_{i=1}^{k-j} b_i u_i = 0,
\]
and we are back in Case I (through skill, rather than luck) and we are done.
Corollary 5.4.9. In the situation of Lemma 5.4.8, let $n = \dim V$ and let $k = \deg m_T(x)$. Then $n \leq 2k$. Suppose that $n = 2k$. If $V_2$ is the subspace of $V$ $T$-generated by $v_2$, then $V = W_1 \oplus V_2$.

Proof. From the proof of Lemma 5.4.8 we see that $j = n - k \leq k$. Also, if $n = 2k$, then $j = k$, so $b_k, b_{k-1}, \ldots, b_1$ are all zero. Then $u = 0$, and we are in Case I.
Theorem 5.4.10. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Let $w_1 \in V$ be any vector with $m_{T,w_1}(x) = m_T(x)$ and let $W_1$ be the subspace of $V$ $T$-generated by $w_1$. Then $W_1$ has a $T$-invariant complement $W_2$, i.e., there is a $T$-invariant subspace $W_2$ of $V$ with $V = W_1 \oplus W_2$.
Proof. If $W_1 = V$ then $W_2 = \{0\}$ and we are done.

Suppose not. Note that $W_2 = \{0\}$ is a $T$-invariant subspace of $V$ with $W_1 \cap W_2 = \{0\}$; then there exists a maximal $T$-invariant subspace $W_2$ of $V$ with $W_1 \cap W_2 = \{0\}$, either by using Zorn's Lemma, or more simply by taking such a subspace of maximal dimension. We claim that $W_1 \oplus W_2 = V$. We prove this by contradiction, so assume $W_1 \oplus W_2 \subsetneq V$.

Choose an element $v_2$ of $V$ with $v_2 \notin W_1 \oplus W_2$. Let $V_2$ be the subspace $T$-spanned by $v_2$ and let $U_2 = W_2 + V_2$. If $W_1 \cap U_2 = \{0\}$ then $U_2$ is a $T$-invariant subspace of $V$ with $W_1 \cap U_2 = \{0\}$ and with $U_2 \supsetneq W_2$, contradicting the maximality of $W_2$.

Otherwise, let $V' = W_1 + U_2$. Then $V'$ is a $T$-invariant subspace of $V$, so we may consider the restriction $T'$ of $T$ to $V'$, $T': V' \to V'$. Now $W_2$ is a $T'$-invariant subspace of $V'$, so we may consider the quotient linear transformation $\overline{T'}: V'/W_2 \to V'/W_2$. Set $X = V'/W_2$ and $S = \overline{T'}$. Let $\pi: V' \to X$ be the quotient map. Let $\overline{w_1} = \pi(w_1)$ and let $\overline{v_2} = \pi(v_2)$. Let $Y_1 = \pi(W_1) \subseteq X$ and let $Z_2 = \pi(U_2) \subseteq X$. We make several observations: First, $Y_1$ and $Z_2$ are $S$-invariant subspaces of $X$. Second, $Y_1$ is $S$-spanned by $\overline{w_1}$ and $Z_2$ is $S$-spanned by $\overline{v_2}$, so that $X$ is $S$-spanned by $\{\overline{w_1}, \overline{v_2}\}$. Third, since $W_1 \cap W_2 = \{0\}$, the restriction of $\pi$ to $W_1$, $\pi: W_1 \to Y_1$, is 1-1.

Certainly $m_{T'}(x)$ divides $m_T(x)$ (for if $p(T)(v) = 0$ for every $v \in V$, then $p(T)(v) = 0$ for every $v \in V'$), and we know that $m_S(x)$ divides $m_{T'}(x)$ by Corollary 5.2.12. By hypothesis $m_{T,w_1}(x) = m_T(x)$, and, since $\pi: W_1 \to Y_1$ is 1-1, $m_{S,\overline{w_1}}(x) = m_{T,w_1}(x)$. Since $w_1 \in V'$, $m_{T,w_1}(x)$ divides $m_{T'}(x)$. Finally, $m_{S,\overline{w_1}}(x)$ divides $m_S(x)$. Putting these together, we see that
\[
m_{S,\overline{w_1}}(x) = m_S(x) = m_{T'}(x) = m_T(x) = m_{T,w_1}(x).
\]
We now apply Lemma 5.4.8 with $T = S$, $V = X$, $w_1 = \overline{w_1}$, and $v_2 = \overline{v_2}$. We conclude that there is a vector, which we denote by $\overline{w_2}$, such that $X = Y_1 \oplus Y_2$, where $Y_2$ is the subspace of $X$ $S$-generated by $\overline{w_2}$. Let $w_2'$ be any element of $V'$ with $\pi(w_2') = \overline{w_2}$, and let $V_2'$ be the subspace of $V'$ $T'$-spanned by $w_2'$, or, equivalently, the subspace of $V$ $T$-spanned by $w_2'$. Then $\pi(V_2') = Y_2$.
To finish the proof, we observe that
\[
V'/W_2 = X = Y_1 + Z_2 = Y_1 \oplus Y_2,
\]
so, setting $U_2' = W_2 + V_2'$,
\[
V' = W_1 + V_2' + W_2 = W_1 + W_2 + V_2' = W_1 + U_2'.
\]
Also, $W_1 \cap U_2' = \{0\}$. For if $x \in W_1 \cap U_2'$, then $\pi(x) \in \pi(W_1) \cap \pi(U_2') = Y_1 \cap Y_2 = \{0\}$ (as $\pi(W_2) = \{0\}$). But if $x \in W_1 \cap U_2'$, then $x \in W_1$, and the restriction of $\pi$ to $W_1$ is 1-1, so $\pi(x) = 0$ implies $x = 0$.

Hence $V' = W_1 \oplus U_2'$ and $U_2' \supsetneq W_2$, contradicting the maximality of $W_2$.
We will only need Theorem 5.4.10 but we can generalize it.
Corollary 5.4.11. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Let $w_1, \ldots, w_k \in V$ and let $W_i$ be the subspace $T$-spanned by $w_i$, $i = 1, \ldots, k$. Suppose that $m_{T,w_i}(x) = m_T(x)$ for $i = 1, \ldots, k$, and that $\{W_1, \ldots, W_k\}$ is independent. Then $W_1 \oplus \cdots \oplus W_k$ has a $T$-invariant complement, i.e., there is a $T$-invariant subspace $W_0$ of $V$ with $V = W_1 \oplus \cdots \oplus W_k \oplus W_0$.

Proof. We proceed by induction on $k$. The $k = 1$ case is Theorem 5.4.10. For the induction step, consider $\overline{T}: \overline{V} \to \overline{V}$, where $\overline{V} = V/W_1$ with quotient map $\pi: V \to \overline{V}$. We outline the proof.

Let $W_{k+1}$ be a maximal $T$-invariant subspace of $V$ with
\[
(W_1 \oplus \cdots \oplus W_k) \cap W_{k+1} = \{0\}.
\]
We claim that $W_1 \oplus \cdots \oplus W_{k+1} = V$. Assume not. Let $\overline{W_i} = \pi(W_i)$ for $i = 2, \ldots, k$. By the inductive hypothesis, $\overline{W_2} \oplus \cdots \oplus \overline{W_k}$ has a $\overline{T}$-invariant complement $\overline{Y_{k+1}}$ containing $\pi(W_{k+1})$. (This requires a slight modification of the statement and proof of Theorem 5.4.10. We used our original formulation for the sake of simplicity.) Let $Y_{k+1}$ be a subspace of $V$ with $Y_{k+1} \supseteq W_{k+1}$ and $\pi(Y_{k+1}) = \overline{Y_{k+1}}$. Certainly $(W_2 \oplus \cdots \oplus W_k) \cap Y_{k+1} = \{0\}$. Choose any vector $y \in Y_{k+1}$, $y \notin W_{k+1}$. If the subspace $Y$ $T$-generated by $y$ is disjoint from $W_1$, set $x = y$ and $X = Y$. Otherwise, "modify" $y$ as in the proof of Lemma 5.4.8 to obtain $x$ with $X$, the subspace $T$-generated by $x$, disjoint from $W_1$. Set $W' = W_{k+1} \oplus X$. Then $W' \supsetneq W_{k+1}$ and $W'$ is disjoint from $W_1 \oplus \cdots \oplus W_k$, contradicting the maximality of $W_{k+1}$.
5.5 Rational canonical form

Let $V$ be a finite-dimensional vector space over an arbitrary field $F$ and let $T: V \to V$ be a linear transformation. In this section we prove that $T$ has a unique rational canonical form.

The basic idea of the proof is one we have seen already in a much simpler context. Recall the theorem that any linearly independent subset of a vector space extends to a basis of that vector space. We think of that as saying that any partial good set extends to a complete good set. We would like to do the same thing in the presence of a linear transformation $T$: define a partial $T$-good set and show that any partial $T$-good set extends to a complete $T$-good set. But we have to be careful to define a $T$-good set in the right way. We will see that the right way to define a partial $T$-good set is as the right kind of basis for the right kind of $T$-invariant subspace $W$. Then we will be able to extend this to the right kind of basis for all of $V$ by using Theorem 5.4.10.
Definition 5.5.1. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. An ordered set $\mathcal{C} = \{w_1, \ldots, w_k\}$ is a rational canonical $T$-generating set of $V$ if the following conditions are satisfied:

(1) $V = W_1 \oplus \cdots \oplus W_k$, where $W_i$ is the subspace of $V$ that is $T$-generated by $w_i$;

(2) $p_i(x)$ is divisible by $p_{i+1}(x)$ for $i = 1, \ldots, k-1$, where $p_i(x) = m_{T,w_i}(x)$ is the $T$-annihilator of $w_i$. ◊

When $T = I$, any basis of $V$ is a rational canonical $T$-generating set and vice-versa, with $p_i(x) = x - 1$ for every $i$. Of course, every $V$ has a basis. A basis for $V$ is never unique, but any two bases of $V$ have the same number of elements, namely the dimension of $V$.

Here is the appropriate generalization of these two facts. For the second fact, we have not only that any two rational canonical $T$-generating sets have the same number of elements, but also the same number of elements of each "type," where the type of an element is its $T$-annihilator.
Theorem 5.5.2. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Then $V$ has a rational canonical $T$-generating set $\mathcal{C} = \{w_1, \ldots, w_k\}$. If $\mathcal{C}' = \{w_1', \ldots, w_l'\}$ is any rational canonical $T$-generating set of $V$, then $k = l$ and $p_i'(x) = p_i(x)$ for $i = 1, \ldots, k$, where $p_i'(x) = m_{T,w_i'}(x)$ and $p_i(x) = m_{T,w_i}(x)$.
Proof. First we prove existence and then we prove uniqueness.

For existence we proceed by induction on $n = \dim(V)$. Choose an element $w_1$ of $V$ with $m_{T,w_1}(x) = m_T(x)$ and let $W_1$ be the subspace of $V$ $T$-generated by $w_1$. If $W_1 = V$ we are done.

Otherwise, let $W'$ be a $T$-invariant complement of $W_1$ in $V$, which exists by Theorem 5.4.10. Then $V = W_1 \oplus W'$. Let $T'$ be the restriction of $T$ to $W'$, $T': W' \to W'$. Then $m_{T'}(x)$ divides $m_T(x)$. (Since $m_T(T)(v) = 0$ for all $v \in V$, $m_T(T)(v) = 0$ for all $v$ in $W'$.) By induction, $W'$ has a rational canonical $T'$-generating set that we write as $\{w_2, \ldots, w_k\}$. Then $\{w_1, \ldots, w_k\}$ is a rational canonical $T$-generating set of $V$.

For uniqueness, suppose $V$ has rational canonical $T$-generating sets $\mathcal{C} = \{w_1, \ldots, w_k\}$ and $\mathcal{C}' = \{w_1', \ldots, w_l'\}$ with corresponding $T$-invariant direct sum decompositions $V = W_1 \oplus \cdots \oplus W_k$ and $V = W_1' \oplus \cdots \oplus W_l'$, and corresponding $T$-annihilators $p_i(x) = m_{T,w_i}(x)$ and $p_i'(x) = m_{T,w_i'}(x)$. Let these polynomials have degrees $d_i$ and $d_i'$ respectively, and let $V$ have dimension $n$. We proceed by induction on $k$.

Now $p_1(x) = m_T(x)$ and $p_1'(x) = m_T(x)$, so $p_1'(x) = p_1(x)$. If $k = 1$, then $V = W_1$, $\dim(V) = \dim(W_1)$, and $n = d_1$. But then $n = d_1' = \dim(W_1')$, so $V = W_1'$. Then $l = 1$, $p_1'(x) = p_1(x)$, and we are done.

Suppose for some $k \geq 1$ we have $p_i'(x) = p_i(x)$ for $i = 1, \ldots, k$. If $V = W_1 \oplus \cdots \oplus W_k$, then $n = d_1 + \cdots + d_k = d_1' + \cdots + d_k'$, so $V = W_1' \oplus \cdots \oplus W_k'$ as well, $l = k$, $p_i'(x) = p_i(x)$, and we are done; similarly if $V = W_1' \oplus \cdots \oplus W_l'$. Otherwise consider the vector space $p_{k+1}(T)(V)$, a $T$-invariant subspace of $V$. Since $V = W_1 \oplus \cdots \oplus W_k \oplus W_{k+1} \oplus \cdots$, we have that
\[
p_{k+1}(T)(V) = p_{k+1}(T)W_1 \oplus \cdots \oplus p_{k+1}(T)W_k \oplus p_{k+1}(T)W_{k+1} \oplus \cdots.
\]
Let us identify this subspace further. Since $p_{k+1}(x) = m_{T,w_{k+1}}(x)$, we have that $p_{k+1}(T)(w_{k+1}) = 0$, and hence $p_{k+1}(T)(W_{k+1}) = 0$. Since $p_{k+i}(x)$ divides $p_{k+1}(x)$ for $i \geq 1$, we also have that $p_{k+1}(T)(w_{k+i}) = 0$ and hence $p_{k+1}(T)(W_{k+i}) = 0$ for $i \geq 1$. Thus
\[
p_{k+1}(T)(V) = p_{k+1}(T)W_1 \oplus \cdots \oplus p_{k+1}(T)W_k.
\]
Now $p_{k+1}(x)$ divides $p_i(x)$ for $i \leq k$, so $p_{k+1}(T)(W_i)$ has dimension $d_i - d_{k+1}$, and hence $p_{k+1}(T)(V)$ is a vector space of dimension $d = (d_1 - d_{k+1}) + (d_2 - d_{k+1}) + \cdots + (d_k - d_{k+1})$. (Some or all of these differences of dimensions may be zero, which does not affect the argument.)
Apply the same argument to the decomposition $V = W_1' \oplus \cdots \oplus W_l'$ to obtain
\[
p_{k+1}(T)(V) = p_{k+1}(T)W_1' \oplus \cdots \oplus p_{k+1}(T)W_k' \oplus p_{k+1}(T)W_{k+1}' \oplus \cdots,
\]
which has the subspace $p_{k+1}(T)(W_1') \oplus \cdots \oplus p_{k+1}(T)(W_k')$ of dimension $d$ as well (since $p_i'(x) = p_i(x)$ for $i \leq k$). Thus this subspace must be the entire space, and in particular $p_{k+1}(T)(W_{k+1}') = 0$, or, equivalently, $p_{k+1}(T)(w_{k+1}') = 0$. But $w_{k+1}'$ has $T$-annihilator $p_{k+1}'(x)$, so $p_{k+1}'(x)$ divides $p_{k+1}(x)$. The same argument using $p_{k+1}'(T)(V)$ instead of $p_{k+1}(T)(V)$ shows that $p_{k+1}(x)$ divides $p_{k+1}'(x)$, so we see that $p_{k+1}'(x) = p_{k+1}(x)$. Proceeding in this way we obtain $p_i'(x) = p_i(x)$ for every $i$, and $l = k$, and we are done.
We translate this theorem into matrix language.
Definition 5.5.3. An $n$-by-$n$ matrix $M$ is in rational canonical form if $M$ is a block diagonal matrix
\[
M = \begin{bmatrix}
C(p_1(x)) & & & \\
& C(p_2(x)) & & \\
& & \ddots & \\
& & & C(p_k(x))
\end{bmatrix},
\]
where $C(p_i(x))$ denotes the companion matrix of $p_i(x)$, for some sequence of polynomials $p_1(x), p_2(x), \ldots, p_k(x)$ with $p_i(x)$ divisible by $p_{i+1}(x)$ for $i = 1, \ldots, k-1$. ◊
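The companion matrix admits several equivalent conventions. The sketch below (our own helpers `companion` and `rational_canonical`, not the book's notation) uses negated coefficients down the first column and $1$'s just above the diagonal, a convention chosen because it reproduces the $2$-by-$2$ rational canonical forms displayed in Remark 5.6.7 below.

```python
import sympy as sp

x = sp.symbols('x')

def companion(p):
    """Companion matrix C(p) of a monic polynomial p(x): negated
    coefficients down the first column, 1's on the superdiagonal."""
    coeffs = sp.Poly(p, x).all_coeffs()   # [1, a_{n-1}, ..., a_0]
    n = len(coeffs) - 1
    C = sp.zeros(n, n)
    for i in range(n):
        C[i, 0] = -coeffs[i + 1]
        if i + 1 < n:
            C[i, i + 1] = 1
    return C

def rational_canonical(polys):
    """Block diagonal matrix with blocks C(p_1), ..., C(p_k)."""
    return sp.diag(*[companion(p) for p in polys])

M = rational_canonical([x**2 - 3*x + 2])
assert M == sp.Matrix([[3, 1], [-2, 0]])
assert M.charpoly(x).as_expr() == x**2 - 3*x + 2
```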
Theorem 5.5.4 (Rational Canonical Form). (1) Let $V$ be a finite-dimensional vector space, and let $T: V \to V$ be a linear transformation. Then $V$ has a basis $\mathcal{B}$ such that $[T]_{\mathcal{B}} = M$ is in rational canonical form. Furthermore, $M$ is unique.

(2) Let $A$ be an $n$-by-$n$ matrix. Then $A$ is similar to a unique matrix $M$ in rational canonical form.

Proof. (1) Let $\mathcal{C} = \{w_1, \ldots, w_k\}$ be a rational canonical $T$-generating set for $V$, where $p_i(x) = m_{T,w_i}(x)$ has degree $d_i$. Then
\[
\mathcal{B} = \bigl\{T^{d_1-1}w_1, \ldots, w_1,\ T^{d_2-1}w_2, \ldots, w_2,\ \ldots,\ T^{d_k-1}w_k, \ldots, w_k\bigr\}
\]
is the desired basis.

(2) Apply part (1) to the linear transformation $T = T_A$.
Definition 5.5.5. If $T$ has rational canonical form with diagonal blocks $C(p_1(x)), C(p_2(x)), \ldots, C(p_k(x))$ with $p_i(x)$ divisible by $p_{i+1}(x)$ for $i = 1, \ldots, k-1$, then $p_1(x), \ldots, p_k(x)$ is the sequence of elementary divisors of $T$. ◊

Corollary 5.5.6. (1) $T$ is determined up to similarity by its sequence of elementary divisors $p_1(x), \ldots, p_k(x)$.

(2) The sequence of elementary divisors $p_1(x), \ldots, p_k(x)$ is determined recursively as follows: $p_1(x) = m_T(x)$. Let $w_1$ be any element of $V$ with $m_{T,w_1}(x) = m_T(x)$ and let $W_1$ be the subspace $T$-generated by $w_1$. Let $\overline{T}: V/W_1 \to V/W_1$. Then $p_2(x) = m_{\overline{T}}(x)$, etc.

Corollary 5.5.7. Let $T$ have elementary divisors $\{p_1(x), \ldots, p_k(x)\}$. Then

(1) $m_T(x) = p_1(x)$;

(2) $c_T(x) = p_1(x)p_2(x) \cdots p_k(x)$.

Proof. We already know (1). As for (2), the characteristic polynomial of a block diagonal matrix is the product of the characteristic polynomials of its blocks, and the companion matrix $C(p_i(x))$ has characteristic polynomial $p_i(x)$, so $c_T(x) = p_1(x)p_2(x) \cdots p_k(x)$.
Remark 5.5.8. In the next section we will develop Jordan canonical form, and in the following section we will develop an algorithm for finding the Jordan canonical form of a linear transformation $T: V \to V$, and for finding a Jordan basis of $V$, provided we can factor the characteristic polynomial of $T$.

There is an unconditional algorithm for finding a rational canonical $T$-generating set for a linear transformation $T: V \to V$, and hence the rational canonical form of $T$. Since it can be tedious to apply, and the result is not so important, we will merely sketch the argument.

First observe that for any nonzero vector $v \in V$, we can find its $T$-annihilator $m_{T,v}(x)$ as follows: Successively check whether the sets $\{v\}, \{v, T(v)\}, \{v, T(v), T^2(v)\}, \ldots$ are linearly independent. When we come to a linearly dependent set $\{v, T(v), \ldots, T^k(v)\}$, stop. From the linear dependence we obtain the $T$-annihilator $m_{T,v}(x)$ of $v$, a polynomial of degree $k$. (A short computational sketch of this step appears after this remark.)

Next observe that using Euclid's algorithm we may find the gcd and lcm of any finite set of polynomials (without having to factor them).

Given these observations we proceed as follows: Pick a basis $\{v_1, \ldots, v_n\}$ of $V$. Find the $T$-annihilators $m_{T,v_1}(x), \ldots, m_{T,v_n}(x)$. Knowing these, we can find the minimum polynomial $m_T(x)$ by using Theorem 5.1.5. Then
we can find a vector $w_1 \in V$ with $m_{T,w_1}(x) = m_T(x)$ by using Theorem 5.1.11.

Let $W_1$ be the subspace of $V$ $T$-generated by $w_1$. Choose any complement $V_2$ of $W_1$, so that $V = W_1 \oplus V_2$, and choose any basis $\{v_2, \ldots, v_m\}$ of $V_2$. Successively "modify" $v_2, \ldots, v_m$ to $u_2, \ldots, u_m$ as in the proof of Lemma 5.4.8. The subspace $U_2$ spanned by $\{u_2, \ldots, u_m\}$ is a $T$-invariant complement of $W_1$, $V = W_1 \oplus U_2$. Let $T'$ be the restriction of $T$ to $U_2$, so that $T': U_2 \to U_2$. Repeat the argument for $U_2$, etc.

In this way we obtain vectors $w_1, w_2, \ldots, w_k$, with $\mathcal{C} = \{w_1, \ldots, w_k\}$ being a rational canonical $T$-generating set for $V$, and from $\mathcal{C}$ we obtain a basis $\mathcal{B}$ of $V$ with $[T]_{\mathcal{B}}$ the block diagonal matrix whose diagonal blocks are the companion matrices $C(m_{T,w_1}(x)), \ldots, C(m_{T,w_k}(x))$, a matrix in rational canonical form. ◊
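Here is a minimal sketch of the first step of that remark, for a matrix $A$ acting on $F^n$: stack $v, Av, A^2v, \ldots$ until a dependence appears, then read off the annihilator. The function name and the sample data are our own assumptions for illustration.

```python
import sympy as sp

x = sp.symbols('x')

def annihilator(A, v):
    """T-annihilator m_{A,v}(x): the monic polynomial of least degree
    with m(A)v = 0, via the successive-independence check of Remark 5.5.8."""
    vectors = [v]
    while True:
        vectors.append(A * vectors[-1])          # v, Av, A^2 v, ...
        M = sp.Matrix.hstack(*vectors)
        if M.rank() < len(vectors):              # first linear dependence
            k = len(vectors) - 1
            # Solve A^k v = c_0 v + c_1 Av + ... + c_{k-1} A^{k-1} v
            M_indep = sp.Matrix.hstack(*vectors[:k])
            sol, params = M_indep.gauss_jordan_solve(vectors[k])
            return sp.expand(x**k - sum(sol[i] * x**i for i in range(k)))

A = sp.Matrix([[3, 1], [0, 3]])
print(annihilator(A, sp.Matrix([0, 1])))   # x**2 - 6*x + 9
```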
5.6 Jordan canonical form

Now let $F$ be an algebraically closed field, let $V$ be a finite-dimensional vector space over $F$, and let $T: V \to V$ be a linear transformation. In this section we show in Theorem 5.6.5 that $T$ has an essentially unique Jordan canonical form. If $F$ is not algebraically closed that may or may not be the case. In Theorem 5.6.6 we see the condition on $T$ that will guarantee that it does. At the end of this section we discuss, though without full proofs, a generalization of Jordan canonical form that always exists (Theorem 5.6.13).

The results in this section are easy to obtain given the hard work we have already done. We begin with some preliminary work, apply Theorem 5.4.6, use rational canonical form, and out pops Jordan canonical form with no further ado!

Lemma 5.6.1. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Suppose that $m_T(x) = c_T(x) = (x - a)^k$. Then $V$ is $T$-generated by a single element $w$, and $V$ has a basis $\mathcal{B} = \{v_1, \ldots, v_k\}$ where $v_k = w$ and $v_i = (T - aI)(v_{i+1})$ for $i = 1, \ldots, k-1$.

Proof. We know that there is an element $w$ of $V$ with $m_{T,w}(x) = m_T(x)$. Then $w$ $T$-generates a subspace $W_1$ of $V$ whose dimension is the degree $k$ of $m_T(x)$. By hypothesis $m_T(x) = c_T(x)$, so $c_T(x)$ also has degree $k$. But the degree of $c_T(x)$ is equal to the dimension of $V$, so $\dim(W_1) = \dim(V)$ and hence $W_1 = V$.
Set $v_k = w$ and for $1 \leq i < k$, set $v_i = (T - aI)^{k-i}(v_k)$. Then
\[
v_i = (T - aI)^{k-i}(v_k) = (T - aI)(T - aI)^{k-i-1}(v_k) = (T - aI)(v_{i+1}).
\]
It remains to show that $\mathcal{B} = \{v_1, \ldots, v_k\}$ is a basis. It suffices to show that this set is linearly independent. Suppose that $c_1 v_1 + \cdots + c_k v_k = 0$, i.e., $c_1 (T - aI)^{k-1}v_k + \cdots + c_k v_k = 0$. Then $p(T)(v_k) = 0$ where $p(x) = c_1(x - a)^{k-1} + c_2(x - a)^{k-2} + \cdots + c_k$. Now $p(x)$ is a polynomial of degree at most $k - 1$, and $m_{T,v_k}(x) = (x - a)^k$ is of degree $k$, so $p(x)$ is the zero polynomial. The coefficient of $x^{k-1}$ in $p(x)$ is $c_1$, so $c_1 = 0$; then the coefficient of $x^{k-2}$ in $p(x)$ is $c_2$, so $c_2 = 0$, etc. Thus $c_1 = c_2 = \cdots = c_k = 0$ and $\mathcal{B}$ is linearly independent.
Corollary 5.6.2. Let $T$ and $\mathcal{B}$ be as in Lemma 5.6.1. Then
\[
[T]_{\mathcal{B}} = \begin{bmatrix}
a & 1 & & & \\
& a & 1 & & \\
& & \ddots & \ddots & \\
& & & a & 1 \\
& & & & a
\end{bmatrix},
\]
a $k$-by-$k$ matrix with diagonal entries $a$, entries immediately above the diagonal $1$, and all other entries $0$.

Proof. $(T - aI)(v_1) = 0$, so $T(v_1) = av_1$; $(T - aI)(v_{i+1}) = v_i$, so $T(v_{i+1}) = v_i + av_{i+1}$, and the result follows from Remark 2.2.8.
Definition 5.6.3. A basis $\mathcal{B}$ of $V$ as in Corollary 5.6.2 is called a Jordan basis of $V$.

If $V = V_1 \oplus \cdots \oplus V_l$ and $V_i$ has a Jordan basis $\mathcal{B}_i$, then $\mathcal{B} = \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_l$ is called a Jordan basis of $V$. ◊

Definition 5.6.4. (1) A $k$-by-$k$ matrix
\[
\begin{bmatrix}
a & 1 & & & \\
& a & 1 & & \\
& & \ddots & \ddots & \\
& & & a & 1 \\
& & & & a
\end{bmatrix}
\]
as in Corollary 5.6.2 is called a $k$-by-$k$ Jordan block associated to the eigenvalue $a$.
(2) A matrix $J$ is said to be in Jordan canonical form if $J$ is a block diagonal matrix
\[
J = \begin{bmatrix}
J_1 & & & \\
& J_2 & & \\
& & \ddots & \\
& & & J_l
\end{bmatrix}
\]
with each $J_i$ a Jordan block. ◊
Theorem 5.6.5 (Jordan canonical form). (1) Let $F$ be an algebraically closed field and let $V$ be a finite-dimensional $F$-vector space. Let $T: V \to V$ be a linear transformation. Then $V$ has a basis $\mathcal{B}$ with $[T]_{\mathcal{B}} = J$ a matrix in Jordan canonical form. $J$ is unique up to the order of the blocks.

(2) Let $F$ be an algebraically closed field and let $A$ be an $n$-by-$n$ matrix with entries in $F$. Then $A$ is similar to a matrix $J$ in Jordan canonical form. $J$ is unique up to the order of the blocks.

Proof. Let $T$ have characteristic polynomial
\[
c_T(x) = (x - a_1)^{e_1} \cdots (x - a_m)^{e_m}.
\]
Then, by Theorem 5.4.6, we have a $T$-invariant direct sum decomposition $V = V_1 \oplus \cdots \oplus V_m$ where $V_i = \operatorname{Ker}((T - a_i I)^{e_i})$. Let $T_i$ be the restriction of $T$ to $V_i$. Then, by Theorem 5.5.2, $V_i$ has a rational canonical $T_i$-generating set $\mathcal{C}_i = \{w_1^i, \ldots, w_{k_i}^i\}$ and a corresponding direct sum decomposition $V_i = W_1^i \oplus \cdots \oplus W_{k_i}^i$. Each $W_j^i$ satisfies the hypothesis of Lemma 5.6.1, so $W_j^i$ has a Jordan basis $\mathcal{B}_j^i$. Then
\[
\mathcal{B} = \mathcal{B}_1^1 \cup \cdots \cup \mathcal{B}_{k_1}^1 \cup \cdots \cup \mathcal{B}_1^m \cup \cdots \cup \mathcal{B}_{k_m}^m
\]
is a Jordan basis of $V$. To see uniqueness, note that there is unique factorization for the characteristic polynomial, and then the uniqueness of each of the block sizes is an immediate consequence of the uniqueness of rational canonical form.

(2) Apply part (1) to the linear transformation $T = T_A$.
We stated Theorem 5.6.5 as we did for emphasis. We have a more general result.
Theorem 5.6.6 (Jordan canonical form). (1) Let $V$ be a finite-dimensional vector space over a field $F$ and let $T: V \to V$ be a linear transformation. Suppose that $c_T(x)$, the characteristic polynomial of $T$, factors into a product of linear factors, $c_T(x) = (x - a_1)^{e_1} \cdots (x - a_m)^{e_m}$. Then $V$ has a basis $\mathcal{B}$ with $[T]_{\mathcal{B}} = J$ a matrix in Jordan canonical form. $J$ is unique up to the order of the blocks.

(2) Let $A$ be an $n$-by-$n$ matrix with entries in a field $F$. Suppose that $c_A(x)$, the characteristic polynomial of $A$, factors into a product of linear factors, $c_A(x) = (x - a_1)^{e_1} \cdots (x - a_m)^{e_m}$. Then $A$ is similar to a matrix $J$ in Jordan canonical form. $J$ is unique up to the order of the blocks.
Proof. Identical to the proof of Theorem 5.6.5.
Remark 5.6.7. Let us look at a couple of small examples. Let $A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$. Then $A_1$ is already in Jordan canonical form, but its rational canonical form is $M_1 = \begin{bmatrix} 3 & 1 \\ -2 & 0 \end{bmatrix}$. Let $A_2 = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$. Then $A_2$ is already in Jordan canonical form, but its rational canonical form is $M_2 = \begin{bmatrix} 6 & 1 \\ -9 & 0 \end{bmatrix}$. In both of these cases (one diagonalizable, one not) we see that the rational canonical form is more complicated and less informative than the Jordan canonical form, and indeed in most applications it is the Jordan canonical form we are interested in. But, as we have seen, the path to Jordan canonical form goes through rational canonical form. ◊
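Remark 5.6.7 can be checked mechanically; a small SymPy verification (our own check, not part of the book) recovers each $A_i$ as the Jordan form of the stated $M_i$:

```python
import sympy as sp

A1 = sp.Matrix([[1, 0], [0, 2]]); M1 = sp.Matrix([[3, 1], [-2, 0]])
P1, J1 = M1.jordan_form()
print(J1)   # diagonal with entries 1 and 2: M1 is similar to A1

A2 = sp.Matrix([[3, 1], [0, 3]]); M2 = sp.Matrix([[6, 1], [-9, 0]])
P2, J2 = M2.jordan_form()
print(J2)   # the single 2-by-2 Jordan block at 3: M2 is similar to A2
```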
The question now naturally arises as to what we can say for a linear transformation $T: V \to V$ where $V$ is a vector space over $F$ and $c_T(x)$ may not factor into a product of linear factors over $F$. Note that this makes no difference in the rational canonical form. Although there is not a Jordan canonical form in this case, there is an appropriate generalization. Since it is not so useful, we will only state the results. The proofs are not so different, and we leave them for the reader.
Lemma 5.6.8. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Suppose that $m_T(x) = c_T(x) = p(x)^k$, where $p(x) = x^d + a_{d-1}x^{d-1} + \cdots + a_0$ is an irreducible polynomial of degree $d$. Then $V$ is $T$-generated by a single element $w$, and $V$ has a basis
\[
\mathcal{B} = \{v_1^1, \ldots, v_1^d,\ v_2^1, \ldots, v_2^d,\ \ldots,\ v_k^1, \ldots, v_k^d\}
\]
where $v_k^d = w$ and $T$ is given as follows: For any $j$, and for $i > 1$, $T(v_j^i) = v_j^{i-1}$. For $j = 1$ and $i = 1$,
\[
T(v_1^1) = -a_0 v_1^1 - a_1 v_1^2 - \cdots - a_{d-1} v_1^d.
\]
For $j > 1$ and $i = 1$,
\[
T(v_j^1) = -a_0 v_j^1 - a_1 v_j^2 - \cdots - a_{d-1} v_j^d + v_{j-1}^d.
\]

Remark 5.6.9. This is a direct generalization of Lemma 5.6.1: if $m_T(x) = c_T(x) = (x - a)^k$, then $d = 1$, so we are in the case $i = 1$. The companion matrix of $p(x) = x - a$ is the $1$-by-$1$ matrix $[-a_0] = [a]$, and then $T(v_1^1) = av_1^1$ and $T(v_j^1) = av_j^1 + v_{j-1}^1$ for $j > 1$. ◊
Corollary 5.6.10. In the situation of Lemma 5.6.8,
\[
[T]_{\mathcal{B}} = \begin{bmatrix}
C & N & & \\
& C & \ddots & \\
& & \ddots & N \\
& & & C
\end{bmatrix},
\]
where there are $k$ identical $d$-by-$d$ blocks $C = C(p(x))$ along the diagonal, and $k - 1$ identical $d$-by-$d$ blocks $N$ immediately above the diagonal, where $N$ is a matrix with an entry of $1$ in row $d$, column $1$ and all other entries $0$.

Remark 5.6.11. If $p(x) = (x - a)$ this is just a $k$-by-$k$ Jordan block. ◊
Definition 5.6.12. A matrix as in Corollary 5.6.10 is said to be a generalized Jordan block. A block diagonal matrix whose diagonal blocks are generalized Jordan blocks is said to be in generalized Jordan canonical form. ◊

Theorem 5.6.13 (Generalized Jordan canonical form). (1) Let $V$ be a finite-dimensional vector space over the field $F$, let $T: V \to V$ be a linear transformation, and let $c_T(x)$ factor as $c_T(x) = p_1(x)^{e_1} \cdots p_m(x)^{e_m}$ for irreducible polynomials $p_1(x), \ldots, p_m(x)$. Then $V$ has a basis $\mathcal{B}$ with $[T]_{\mathcal{B}}$ a matrix in generalized Jordan canonical form. $[T]_{\mathcal{B}}$ is unique up to the order of the generalized Jordan blocks.

(2) Let $A$ be an $n$-by-$n$ matrix with entries in $F$ and let $c_A(x)$ factor as $c_A(x) = p_1(x)^{e_1} \cdots p_m(x)^{e_m}$ for irreducible polynomials $p_1(x), \ldots, p_m(x)$. Then $A$ is similar to a matrix in generalized Jordan canonical form. This matrix is unique up to the order of the generalized Jordan blocks.
5.7 An algorithm for Jordan canonical form and Jordan basis

In this section we develop an algorithm to find the Jordan canonical form of a linear transformation, and a Jordan basis, assuming that we can factor the characteristic polynomial into a product of linear factors. (As is well known, there is no general method for doing this.)

We will proceed by first developing a pictorial encoding of the information we are trying to find. We call this picture the labelled eigenstructure picture, or $\ell$ESP, of the linear transformation.
Definition 5.7.1. Let $u_k$ be a generalized eigenvector of index $k$ corresponding to an eigenvalue $\lambda$ of a linear transformation $T: V \to V$. Set $u_{k-1} = (T - \lambda I)(u_k)$, $u_{k-2} = (T - \lambda I)(u_{k-1})$, \ldots, $u_1 = (T - \lambda I)(u_2)$. Then $\{u_1, \ldots, u_k\}$ is a chain of generalized eigenvectors. The vector $u_k$ is the top of the chain. ◊

Remark 5.7.2. If $\{u_1, \ldots, u_k\}$ is a chain as in Definition 5.7.1, then for each $1 \leq i \leq k$, $u_i$ is a generalized eigenvector of index $i$ associated to the eigenvalue $\lambda$ of $T$. ◊

Remark 5.7.3. A chain is entirely determined by the vector $u_k$ at the top. (We will use this observation later: To find a chain, it suffices to find the vector at the top of the chain.) ◊
We now represent a chain as in Definition 5.7.1 pictorially, as a vertical column of nodes in the column for the eigenvalue $\lambda$: the node at height $i$ is labelled $u_i$, for $i = k, k-1, \ldots, 2, 1$ reading from top to bottom.
If $\{u_1, \ldots, u_k\}$ forms a Jordan basis for a $k$-by-$k$ Jordan block for the eigenvalue $\lambda$ of $T$, the vectors in this basis form a chain. Conversely, from a chain we can construct a Jordan block, and a Jordan basis.
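Since a chain is determined by its top vector (Remark 5.7.3), building one is mechanical. A small sketch, with our own function name and sample data, assuming the given vector really is a generalized eigenvector of finite index:

```python
import sympy as sp

def chain_from_top(A, lam, u_top):
    """Build the chain u_1, ..., u_k below the generalized eigenvector
    u_top by repeatedly applying (A - lam*I); u_1 is a true eigenvector."""
    B = A - lam * sp.eye(A.rows)
    chain = [u_top]
    while B * chain[-1] != sp.zeros(A.rows, 1):
        chain.append(B * chain[-1])
    return list(reversed(chain))            # [u_1, ..., u_k]

# A 3-by-3 Jordan block at 6: e_3 is a generalized eigenvector of index 3.
A = sp.Matrix([[6, 1, 0], [0, 6, 1], [0, 0, 6]])
print(chain_from_top(A, 6, sp.Matrix([0, 0, 1])))   # [e_1, e_2, e_3]
```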
A general linear transformation will have more than one Jordan block. The $\ell$ESP of a linear transformation is the picture we obtain by putting its chains side by side.

The eigenstructure picture, or ESP, of a linear transformation is obtained from the $\ell$ESP by erasing the labels. We will usually think about this the other way: We will think of obtaining the $\ell$ESP from the ESP by putting
the labels in. From the Jordan canonical form of a linear transformation we can determine its ESP, and conversely. Although the ESP has less information than the $\ell$ESP, it is easier to determine.

The opposite extreme from the situation of a linear transformation whose Jordan canonical form has a single Jordan block is a diagonalizable linear transformation.

Suppose $T$ is diagonalizable with eigenvalues $\lambda_1, \ldots, \lambda_n$ (not necessarily distinct) and a basis $\{v_1, \ldots, v_n\}$ of associated eigenvectors. Then $T$ has $\ell$ESP consisting of $n$ chains of height $1$ side by side: for each $i = 1, \ldots, n$, a single node at height $1$ labelled $v_i$ in the column for the eigenvalue $\lambda_i$.
We have shown that the Jordan canonical form of a linear transformation is unique up to the order of the blocks, so we see that the ESP of a linear transformation is unique up to the order of the chains. As Jordan bases are not unique, neither is the $\ell$ESP.

The $\ell$ESP is easier to illustrate by example than to define formally. We have just given two general examples. For a concrete example we advise the reader to look at the beginning of Example 5.7.7.

We now present our algorithm for determining the Jordan canonical form of a linear transformation. Actually, the algorithm we present will be an algorithm for the ESP.

To find the ESP of $T$ what we need to find is the positions of the nodes at the top of chains. We envision starting at the top, i.e., the highest index, and working our way down. From this point of view, the nodes we encounter at the top of chains are "new" nodes, while nodes that are not at the top of chains come from nodes we have already seen, and we regard them as "old" nodes.

Let us now imagine ourselves in the middle of this process, say at height ($=$ index) $j$, and suppose we see part of the ESP of $T$ for the eigenvalue $\lambda$.
Each node in the ESP represents a vector in the generalized eigenspace $E_\lambda^\infty$, and together these vectors are a basis for $E_\lambda^\infty$. More precisely, the vectors corresponding to the nodes at height $j$ or less form a basis for $E_\lambda^j$, the subspace of $E_\lambda^\infty$ consisting of generalized eigenvectors of index at most $j$ (as well as the $0$ vector). Thus if we let $d_j(\lambda)$ be the number of nodes at height at most $j$, then
\[
d_j(\lambda) = \dim E_\lambda^j.
\]
As a first step toward finding the number of new nodes at index $j$, we want to find the number of all nodes at this index. If we let $d_j^{\mathrm{ex}}(\lambda)$ denote the number of nodes at height exactly $j$, then
\[
d_j^{\mathrm{ex}}(\lambda) = d_j(\lambda) - d_{j-1}(\lambda).
\]
(That is, the number of nodes at height exactly $j$ is the number of nodes at height at most $j$ minus the number of nodes at height at most $j - 1$.)

We want to find $d_j^{\mathrm{new}}(\lambda)$, the number of new nodes at height $j$. Every node at height $j$ is either new or old, so the number of new nodes at height $j$ is
\[
d_j^{\mathrm{new}}(\lambda) = d_j^{\mathrm{ex}}(\lambda) - d_{j+1}^{\mathrm{ex}}(\lambda),
\]
as every old node at height $j$ comes from a node at height $j + 1$, and there are exactly $d_{j+1}^{\mathrm{ex}}(\lambda)$ of those.

This gives our algorithm:

Algorithm 5.7.4. Let $\lambda$ be an eigenvalue of $T: V \to V$.

Step 1. For $j = 1, 2, \ldots$, compute
\[
d_j(\lambda) = \dim E_\lambda^j = \dim(\operatorname{Ker}(T - \lambda I)^j).
\]
Stop when $d_j(\lambda) = d_\infty(\lambda) = \dim E_\lambda^\infty$. Recall from Lemma 4.2.4 that $d_\infty(\lambda) = \text{alg-mult}(\lambda)$. Denote this value of $j$ by $j_{\max}(\lambda)$. (Note also that $j_{\max}(\lambda)$ is the smallest value of $j$ for which $d_{j+1}(\lambda) = d_j(\lambda)$.)

Step 2. For $j = 1, \ldots, j_{\max}(\lambda)$ compute $d_j^{\mathrm{ex}}(\lambda)$ by
\[
d_1^{\mathrm{ex}}(\lambda) = d_1(\lambda), \qquad
d_j^{\mathrm{ex}}(\lambda) = d_j(\lambda) - d_{j-1}(\lambda) \ \text{ for } j > 1.
\]
Step 3. For $j = 1, \ldots, j_{\max}(\lambda)$ compute $d_j^{\mathrm{new}}(\lambda)$ by
\[
d_j^{\mathrm{new}}(\lambda) = d_j^{\mathrm{ex}}(\lambda) - d_{j+1}^{\mathrm{ex}}(\lambda) \ \text{ for } j < j_{\max}(\lambda), \qquad
d_j^{\mathrm{new}}(\lambda) = d_j^{\mathrm{ex}}(\lambda) \ \text{ for } j = j_{\max}(\lambda).
\]
We now refine our argument to use it to find a Jordan basis for a linear transformation. The algorithm we present will be an algorithm for the $\ell$ESP, but since we already know how to find the ESP, it is now just a matter of finding the labels.

Again let us imagine ourselves in the middle of this process, at height $j$ for the eigenvalue $\lambda$. The vectors labelling the nodes at height at most $j$ form a basis for $E_\lambda^j$, and the vectors labelling the nodes at height at most $j - 1$ form a basis for $E_\lambda^{j-1}$. Thus the vectors labelling the nodes at height exactly $j$ are a basis for a subspace $F_\lambda^j$ of $E_\lambda^j$ that is complementary to $E_\lambda^{j-1}$. But $F_\lambda^j$ cannot be just any such subspace, as it must contain the old nodes at height $j$, which come from one level higher, i.e., from a subspace $F_\lambda^{j+1}$ of $E_\lambda^{j+1}$ that is complementary to $E_\lambda^j$. But that is the only condition on the complement $F_\lambda^j$, and since we are working our way down and are at level $j$, we may assume we have successfully chosen a complement $F_\lambda^{j+1}$ at level $j + 1$.

With a bit more notation we can describe our algorithm. Let us denote the space spanned by the old nodes at height $j$ by $A_\lambda^j$. (We use $A$ because it is the initial letter of alt, the German word for old. We cannot use $O$ for typographical reasons.) The nodes in $A_\lambda^j$ come from nodes at height $j + 1$, but we already know what these are: they are in $F_\lambda^{j+1}$. Thus we set $A_\lambda^j = (T - \lambda I)(F_\lambda^{j+1})$. Then $A_\lambda^j$ and $E_\lambda^{j-1}$ are both subspaces of $E_\lambda^j$, and in fact they are independent subspaces, as any nonzero vector in $A_\lambda^j$ has height $j$ and any nonzero vector in $E_\lambda^{j-1}$ has height at most $j - 1$. We then choose $N_\lambda^j$ to be any complement of $E_\lambda^{j-1} \oplus A_\lambda^j$ in $E_\lambda^j$. (For $j = 1$ the situation is a little simpler, as we simply choose $N_\lambda^j$ to be a complement of $A_\lambda^j$ in $E_\lambda^j$.)

This is a space of new (or, in German, neu) vectors at height $j$ and is precisely the space we are looking for. We choose a basis for $N_\lambda^j$ and label the new nodes at height $j$ with the elements of this basis. In practice, we usually find $N_\lambda^j$ as follows: We find a basis $\mathcal{B}_1$ of $E_\lambda^{j-1}$, a basis $\mathcal{B}_2$ of $A_\lambda^j$, and extend $\mathcal{B}_1 \cup \mathcal{B}_2$ to a basis $\mathcal{B}$ of $E_\lambda^j$. Then $\mathcal{B} - (\mathcal{B}_1 \cup \mathcal{B}_2)$ is a basis of $N_\lambda^j$. So actually we will find the basis of $N_\lambda^j$ directly, and that is the information we need. Finally, we have just obtained $E_\lambda^j = E_\lambda^{j-1} \oplus A_\lambda^j \oplus N_\lambda^j$,
so we set $F_\lambda^j = A_\lambda^j \oplus N_\lambda^j$, and we are finished at height $j$ and ready to drop down to height $j - 1$. (When we start at the top, for $j = j_{\max}(\lambda)$, the situation is easier. At the top there can be no old vectors, so for $j = j_{\max}(\lambda)$ we simply have $E_\lambda^j = E_\lambda^{j-1} \oplus N_\lambda^j$ and $F_\lambda^j = N_\lambda^j$.)

We summarize our algorithm as follows:

Algorithm 5.7.5. Let $\lambda$ be an eigenvalue of $T: V \to V$.

Step 1. For $j = 1, 2, \ldots, j_{\max}(\lambda)$ find the subspace $E_\lambda^j = \operatorname{Ker}((T - \lambda I)^j)$.

Step 2. For $j = j_{\max}(\lambda), \ldots, 2, 1$:

(a) If $j = j_{\max}(\lambda)$, let $N_\lambda^j$ be any complement of $E_\lambda^{j-1}$ in $E_\lambda^j$. If $j < j_{\max}(\lambda)$, let $A_\lambda^j = (T - \lambda I)(F_\lambda^{j+1})$; let $N_\lambda^j$ be any complement of $E_\lambda^{j-1} \oplus A_\lambda^j$ in $E_\lambda^j$ if $j > 1$, and let $N_\lambda^j$ be any complement of $A_\lambda^j$ in $E_\lambda^j$ if $j = 1$.

(b) Label the new nodes at height $j$ with a basis of $N_\lambda^j$.

(c) Let $F_\lambda^j = A_\lambda^j \oplus N_\lambda^j$.
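For completeness, here is a hedged sketch of Algorithm 5.7.5 itself; the function names and structure are ours, not the book's. It chooses each complement $N_\lambda^j$ by greedily extending a basis, exactly the "in practice" recipe above, and returns the chains, from each of which one reads off a Jordan block. The value of $j_{\max}(\lambda)$ is assumed to come from Algorithm 5.7.4.

```python
import sympy as sp

def jordan_basis_at(A, lam, jmax):
    """Top-down labelling of Algorithm 5.7.5 for the eigenvalue lam.
    Returns one chain [u_1, ..., u_j] per Jordan block at lam."""
    B = A - lam * sp.eye(A.rows)
    E = {j: (B**j).nullspace() for j in range(jmax + 1)}   # E[0] = []

    def extend(current, candidates):
        """Candidates that extend 'current' to a larger independent set."""
        new = []
        for c in candidates:
            trial = current + new + [c]
            if sp.Matrix.hstack(*trial).rank() == len(trial):
                new.append(c)
        return new

    tops, F_next = {}, []            # F_next plays the role of F_lam^{j+1}
    for j in range(jmax, 0, -1):
        A_j = [B * f for f in F_next]                # old nodes at height j
        N_j = extend(E[j - 1] + A_j, E[j])           # new nodes at height j
        tops[j] = N_j
        F_next = A_j + N_j                           # F_lam^j
    chains = []
    for j, vs in tops.items():
        for v in vs:                                 # v tops a chain of height j
            chain = [v]
            for _ in range(j - 1):
                chain.append(B * chain[-1])
            chains.append(list(reversed(chain)))     # [u_1, ..., u_j]
    return chains

A = sp.diag(sp.Matrix([[6, 1, 0], [0, 6, 1], [0, 0, 6]]), 6, 6)
for chain in jordan_basis_at(A, 6, 3):
    print([v.T for v in chain])
```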
There is one more point we need to clear up to make sure this algorithm works. We know from our results on Jordan canonical form that there is some Jordan basis for $A$, i.e., some labelling so that the $\ell$ESP is correct. We have made some choices, in choosing our complements $N_\lambda^j$ and in choosing our bases for $N_\lambda^j$. But we can see that these choices all yield the same ESP (and hence one we know is correct), for the dimensions of the various subspaces are all determined by the Jordan canonical form of $A$, or equivalently by its ESP, and different choices of bases or complements will yield spaces of the same dimension.

Remark 5.7.6. There are lots of choices here. Complements are almost never unique, and bases are never unique except for the vector space $\{0\}$. But no matter what choices we make, we get labels for the ESP and hence Jordan bases for $V$. (It is no surprise that a Jordan basis is not unique.) ◊

In finding the $\ell$ESP (or, equivalently, in finding a Jordan basis), it is essential that we work from the top down and not from the bottom up. If we try to work from the bottom up, we have to make arbitrary choices and we have no way of knowing if they are correct. Since they almost certainly won't be, something we would only find out at a later (perhaps much later) stage, we would have to go back and modify them, and this rapidly becomes an unwieldy mess.
We recall that if $A$ is a matrix and $\mathcal{B}$ is a Jordan basis for $V$, then $A = PJP^{-1}$, where $J$ is the Jordan canonical form of $A$ and $P$ is the matrix whose columns are the vectors in $\mathcal{B}$ (taken in the corresponding order).

Example 5.7.7. Here is an example for a matrix that is already in Jordan canonical form. We present it to illustrate all of the various subspaces we have introduced, before we move on to some highly nontrivial examples.
Let
\[
A = \begin{bmatrix}
6 & 1 & 0 & & & & & \\
0 & 6 & 1 & & & & & \\
0 & 0 & 6 & & & & & \\
& & & 6 & & & & \\
& & & & 6 & & & \\
& & & & & 7 & 1 & \\
& & & & & 0 & 7 & \\
& & & & & & & 7
\end{bmatrix},
\]
with characteristic polynomial $c_A(x) = (x - 6)^5 (x - 7)^3$.
We can see immediately that $A$ has $\ell$ESP consisting of four chains: for the eigenvalue $6$, a chain of height $3$ labelled (from top to bottom) $e_3, e_2, e_1$ and two chains of height $1$ labelled $e_4$ and $e_5$; and for the eigenvalue $7$, a chain of height $2$ labelled $e_7, e_6$ and a chain of height $1$ labelled $e_8$.
$E_6^1 = \operatorname{Ker}(A - 6I)$ has dimension $3$, with basis $\{e_1, e_4, e_5\}$;
$E_6^2 = \operatorname{Ker}(A - 6I)^2$ has dimension $4$, with basis $\{e_1, e_2, e_4, e_5\}$;
$E_6^3 = \operatorname{Ker}(A - 6I)^3$ has dimension $5$, with basis $\{e_1, e_2, e_3, e_4, e_5\}$;
$E_7^1 = \operatorname{Ker}(A - 7I)$ has dimension $2$, with basis $\{e_6, e_8\}$;
$E_7^2 = \operatorname{Ker}(A - 7I)^2$ has dimension $3$, with basis $\{e_6, e_7, e_8\}$.
Thus
\[
d_1(6) = 3, \quad d_2(6) = 4, \quad d_3(6) = 5,
\]
so
\[
d_1^{\mathrm{ex}}(6) = 3, \quad d_2^{\mathrm{ex}}(6) = 4 - 3 = 1, \quad d_3^{\mathrm{ex}}(6) = 5 - 4 = 1,
\]
and
\[
d_1^{\mathrm{new}}(6) = 3 - 1 = 2, \quad d_2^{\mathrm{new}}(6) = 1 - 1 = 0, \quad d_3^{\mathrm{new}}(6) = 1.
\]
Also
\[
d_1(7) = 2, \quad d_2(7) = 3,
\]
so
\[
d_1^{\mathrm{ex}}(7) = 2, \quad d_2^{\mathrm{ex}}(7) = 3 - 2 = 1,
\]
and
\[
d_1^{\mathrm{new}}(7) = 2 - 1 = 1, \quad d_2^{\mathrm{new}}(7) = 1,
\]
and we recover that $A$ has one $3$-by-$3$ block and two $1$-by-$1$ blocks for the eigenvalue $6$, and one $2$-by-$2$ block and one $1$-by-$1$ block for the eigenvalue $7$.
Furthermore, $E_6^2$ has a complement in $E_6^3$ of $N_6^3$ with basis $\{e_3\}$. Set $F_6^3 = N_6^3$, with basis $\{e_3\}$.

$A_6^2 = (A - 6I)(F_6^3)$ has basis $\{e_2\}$, and $E_6^1 \oplus A_6^2$ has complement in $E_6^2$ of $N_6^2 = \{0\}$ with empty basis. Set $F_6^2 = A_6^2 \oplus N_6^2$, with basis $\{e_2\}$.

$A_6^1 = (A - 6I)(F_6^2)$ has basis $\{e_1\}$, and $A_6^1$ has complement in $E_6^1$ of $N_6^1$ with basis $\{e_4, e_5\}$.

Also, $E_7^1$ has complement in $E_7^2$ of $N_7^2$ with basis $\{e_7\}$. Set $F_7^2 = N_7^2$, with basis $\{e_7\}$.

$A_7^1 = (A - 7I)(F_7^2)$ has basis $\{e_6\}$, and $A_7^1$ has complement in $E_7^1$ of $N_7^1$ with basis $\{e_8\}$.
Thus we recover that $e_3$ is at the top of a chain of height $3$ for the eigenvalue $6$, $e_4$ and $e_5$ are each at the top of a chain of height $1$ for the eigenvalue $6$, $e_7$ is at the top of a chain of height $2$ for the eigenvalue $7$, and $e_8$ is at the top of a chain of height $1$ for the eigenvalue $7$.

Finally, since $e_2 = (A - 6I)(e_3)$ and $e_1 = (A - 6I)(e_2)$, and $e_6 = (A - 7I)(e_7)$, we recover that $\{e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8\}$ is a Jordan basis. ◊
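As a sanity check (ours, not part of the book's text), SymPy confirms the dimension counts and the Jordan form for this $A$:

```python
import sympy as sp

def Jblock(a, k):
    """k-by-k Jordan block at the eigenvalue a."""
    return sp.Matrix(k, k, lambda i, j: a if i == j else (1 if j == i + 1 else 0))

A = sp.diag(Jblock(6, 3), Jblock(6, 1), Jblock(6, 1), Jblock(7, 2), Jblock(7, 1))

B6 = A - 6 * sp.eye(8)
print([8 - (B6**j).rank() for j in (1, 2, 3)])  # [3, 4, 5] = d_1(6), d_2(6), d_3(6)
B7 = A - 7 * sp.eye(8)
print([8 - (B7**j).rank() for j in (1, 2)])     # [2, 3] = d_1(7), d_2(7)

P, J = A.jordan_form()
print(A == P * J * P**-1)                       # True
```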
Example 5.7.8. We present a pair of (rather elaborate) examples to illustrate our algorithm.

(1) Let $A$ be the $8$-by-$8$ matrix
\[
A = \begin{bmatrix}
3 & 3 & 0 & 0 & 0 & 1 & 0 & 2 \\
3 & 4 & 1 & 1 & 1 & 0 & 1 & 1 \\
0 & 6 & 3 & 0 & 0 & 2 & 0 & 4 \\
2 & 4 & 0 & 1 & 1 & 0 & 2 & 5 \\
3 & 2 & 1 & 1 & 2 & 0 & 1 & 2 \\
1 & 1 & 0 & 1 & 1 & 3 & 1 & 1 \\
5 & 10 & 1 & 3 & 2 & 1 & 6 & 10 \\
3 & 2 & 1 & 1 & 1 & 0 & 1 & 1
\end{bmatrix}
\]
with characteristic polynomial $c_A(x) = (x - 3)^7 (x - 2)$.
The eigenvalue $\lambda = 2$ is easy to deal with. We know without any further computation that $d_1(2) = d_\infty(2) = 1$ and that $\operatorname{Ker}(A - 2I)$ is $1$-dimensional.

For the eigenvalue $\lambda = 3$, computation shows that $A - 3I$ has rank $5$, so $\operatorname{Ker}(A - 3I)$ has dimension $3$ and $d_1(3) = 3$. Further computation shows that $(A - 3I)^2$ has rank $2$, so $\operatorname{Ker}(A - 3I)^2$ has dimension $6$ and $d_2(3) = 6$. Finally, $(A - 3I)^3$ has rank $1$, so $\operatorname{Ker}(A - 3I)^3$ has dimension $7$ and $d_3(3) = d_\infty(3) = 7$.

At this point we can conclude that $A$ has minimum polynomial $m_A(x) = (x - 3)^3 (x - 2)$.

We can also determine the ESP of $A$. We have
\[
d_1^{\mathrm{ex}}(3) = d_1(3) = 3, \quad
d_2^{\mathrm{ex}}(3) = d_2(3) - d_1(3) = 6 - 3 = 3, \quad
d_3^{\mathrm{ex}}(3) = d_3(3) - d_2(3) = 7 - 6 = 1,
\]
and then
\[
d_3^{\mathrm{new}}(3) = d_3^{\mathrm{ex}}(3) = 1, \quad
d_2^{\mathrm{new}}(3) = d_2^{\mathrm{ex}}(3) - d_3^{\mathrm{ex}}(3) = 3 - 1 = 2, \quad
d_1^{\mathrm{new}}(3) = d_1^{\mathrm{ex}}(3) - d_2^{\mathrm{ex}}(3) = 3 - 3 = 0.
\]
Thus we see that for the eigenvalue $3$ we have one new node at level $3$, two new nodes at level $2$, and no new nodes at level $1$. Hence $A$ has $\ell$ESP consisting of a chain of height $3$ labelled (top to bottom) $u_3, u_2, u_1$ and two chains of height $2$ labelled $v_2, v_1$ and $w_2, w_1$ for the eigenvalue $3$, and a chain of height $1$ labelled $x_1$ for the eigenvalue $2$, with the labels yet to be determined. Thus $A$ has Jordan canonical form
\[
J = \begin{bmatrix}
3 & 1 & 0 & & & & & \\
0 & 3 & 1 & & & & & \\
0 & 0 & 3 & & & & & \\
& & & 3 & 1 & & & \\
& & & 0 & 3 & & & \\
& & & & & 3 & 1 & \\
& & & & & 0 & 3 & \\
& & & & & & & 2
\end{bmatrix}.
\]
Now we find a Jordan basis. Equivalently, we find the values of the labels. Once we have the labels $u_3$, $v_2$, $w_2$, and $x_1$ on the new nodes, the others are determined.

The vector $x_1$ is easy to find. It is any eigenvector corresponding to the eigenvalue $2$. Computation reveals that we may choose
\[
x_1 = (30, 12, 68, 18, 1, 4, 66, 1)^T.
\]
The situation for the eigenvalue $3$ is more interesting. We compute that $\operatorname{Ker}(A - 3I)^3$ has basis
\[
\bigl\{(1,0,0,0,0,0,0,0)^T,\ (0,1,0,0,0,0,0,1)^T,\ (0,0,1,0,0,0,0,0)^T,\ (0,0,0,1,0,0,0,0)^T,
\]
\[
(0,0,0,0,1,0,0,0)^T,\ (0,0,0,0,0,1,0,0)^T,\ (0,0,0,0,0,0,1,0)^T\bigr\},
\]
$\operatorname{Ker}(A - 3I)^2$ has basis
\[
\bigl\{(1,0,2,0,0,0,0,0)^T,\ (0,1,0,0,0,0,0,1)^T,\ (0,0,0,1,0,0,0,0)^T,
\]
\[
(0,0,0,0,1,0,0,0)^T,\ (0,0,0,0,0,1,0,0)^T,\ (0,0,0,0,0,0,1,0)^T\bigr\},
\]
and $\operatorname{Ker}(A - 3I)$ has basis
\[
\bigl\{(1,0,2,0,0,0,1,0)^T,\ (0,1,0,0,1,1,1,1)^T,\ (0,0,0,1,0,0,1,0)^T\bigr\}.
\]
For $u_3$ we may choose any vector $u_3 \in \operatorname{Ker}(A - 3I)^3$ with $u_3 \notin \operatorname{Ker}(A - 3I)^2$. Inspection reveals that we may choose
\[
u_3 = (1,0,0,0,0,0,0,0)^T.
\]
Then
\[
u_2 = (A - 3I)u_3 = (0,3,0,2,3,1,5,3)^T
\quad\text{and}\quad
u_1 = (A - 3I)u_2 = (2,0,4,0,0,0,2,0)^T.
\]
For $v_2$, $w_2$ we may choose any two vectors in $\operatorname{Ker}(A - 3I)^2$ such that the set of six vectors consisting of these two vectors, $u_2$, and the given three vectors in our basis of $\operatorname{Ker}(A - 3I)$ is linearly independent. Computation reveals that we may choose
\[
v_2 = (1,0,2,0,0,0,0,0)^T \quad\text{and}\quad w_2 = (0,1,0,0,0,0,0,1)^T.
\]
Then
\[
v_1 = (A - 3I)v_2 = (0,1,0,2,1,1,3,1)^T
\quad\text{and}\quad
w_1 = (A - 3I)w_2 = (1,0,2,1,0,0,0,0)^T.
\]
Then
\[
\{u_1, u_2, u_3, v_1, v_2, w_1, w_2, x_1\}
= \bigl\{(2,0,4,0,0,0,2,0)^T,\ (0,3,0,2,3,1,5,3)^T,\ (1,0,0,0,0,0,0,0)^T,\ (0,1,0,2,1,1,3,1)^T,
\]
\[
(1,0,2,0,0,0,0,0)^T,\ (1,0,2,1,0,0,0,0)^T,\ (0,1,0,0,0,0,0,1)^T,\ (30,12,68,18,1,4,66,1)^T\bigr\}
\]
is a Jordan basis. ◊
(2) Let $A$ be the $8$-by-$8$ matrix
\[
A = \begin{bmatrix}
3 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
3 & 4 & 1 & 1 & 1 & 1 & 3 & 3 \\
1 & 0 & 3 & 1 & 2 & 2 & 6 & 1 \\
6 & 0 & 0 & 2 & 0 & 0 & 0 & 6 \\
1 & 1 & 0 & 0 & 4 & 0 & 0 & 1 \\
3 & 1 & 2 & 0 & 4 & 0 & 12 & 3 \\
1 & 0 & 1 & 0 & 2 & 2 & 10 & 1 \\
4 & 1 & 0 & 1 & 0 & 0 & 0 & 8
\end{bmatrix}
\]
with characteristic polynomial $c_A(x) = (x - 4)^6 (x - 5)^2$.

For the eigenvalue $\lambda = 5$, we compute that $A - 5I$ has rank $7$, so $\operatorname{Ker}(A - 5I)$ has dimension $1$ and hence $d_1(5) = 1$, and also that $\operatorname{Ker}(A - 5I)^2$ has dimension $2$ and hence $d_2(5) = d_\infty(5) = 2$.

For the eigenvalue $\lambda = 4$, we compute that $A - 4I$ has rank $5$, so $\operatorname{Ker}(A - 4I)$ has dimension $3$ and hence $d_1(4) = 3$; that $(A - 4I)^2$ has rank $4$, so $\operatorname{Ker}(A - 4I)^2$ has dimension $4$ and hence $d_2(4) = 4$; that $(A - 4I)^3$ has rank $3$, so $\operatorname{Ker}(A - 4I)^3$ has dimension $5$ and hence $d_3(4) = 5$; and that $(A - 4I)^4$ has rank $2$, so $\operatorname{Ker}(A - 4I)^4$ has dimension $6$ and hence $d_4(4) = d_\infty(4) = 6$.

Thus we may conclude that $m_A(x) = (x - 4)^4 (x - 5)^2$.
Furthermore,
\[
d_1^{\mathrm{ex}}(4) = d_1(4) = 3, \quad
d_2^{\mathrm{ex}}(4) = d_2(4) - d_1(4) = 4 - 3 = 1,
\]
\[
d_3^{\mathrm{ex}}(4) = d_3(4) - d_2(4) = 5 - 4 = 1, \quad
d_4^{\mathrm{ex}}(4) = d_4(4) - d_3(4) = 6 - 5 = 1,
\]
and then
\[
d_4^{\mathrm{new}}(4) = d_4^{\mathrm{ex}}(4) = 1, \quad
d_3^{\mathrm{new}}(4) = d_3^{\mathrm{ex}}(4) - d_4^{\mathrm{ex}}(4) = 1 - 1 = 0,
\]
\[
d_2^{\mathrm{new}}(4) = d_2^{\mathrm{ex}}(4) - d_3^{\mathrm{ex}}(4) = 1 - 1 = 0, \quad
d_1^{\mathrm{new}}(4) = d_1^{\mathrm{ex}}(4) - d_2^{\mathrm{ex}}(4) = 3 - 1 = 2.
\]
Also
\[
d_1^{\mathrm{ex}}(5) = d_1(5) = 1, \quad
d_2^{\mathrm{ex}}(5) = d_2(5) - d_1(5) = 2 - 1 = 1,
\]
and then
\[
d_2^{\mathrm{new}}(5) = d_2^{\mathrm{ex}}(5) = 1, \quad
d_1^{\mathrm{new}}(5) = d_1^{\mathrm{ex}}(5) - d_2^{\mathrm{ex}}(5) = 1 - 1 = 0.
\]
Hence $A$ has $\ell$ESP consisting of a chain of height $4$ labelled (top to bottom) $u_4, u_3, u_2, u_1$ and two chains of height $1$ labelled $v_1$ and $w_1$ for the eigenvalue $4$, and a chain of height $2$ labelled $x_2, x_1$ for the eigenvalue $5$, with the labels yet to be determined. In any case $A$ has Jordan canonical form
\[
J = \begin{bmatrix}
4 & 1 & 0 & 0 & & & & \\
0 & 4 & 1 & 0 & & & & \\
0 & 0 & 4 & 1 & & & & \\
0 & 0 & 0 & 4 & & & & \\
& & & & 4 & & & \\
& & & & & 4 & & \\
& & & & & & 5 & 1 \\
& & & & & & 0 & 5
\end{bmatrix}.
\]
Now we find the labels. $\operatorname{Ker}(A - 4I)^4$ has basis
\[
\bigl\{(1,0,0,0,0,0,0,1)^T,\ (0,1,0,0,0,0,0,0)^T,\ (0,0,0,3,0,0,0,1)^T,
\]
\[
(0,0,6,0,0,0,1,0)^T,\ (0,0,0,0,3,0,1,0)^T,\ (0,0,0,0,0,3,1,0)^T\bigr\},
\]
$\operatorname{Ker}(A - 4I)^3$ has basis
\[
\bigl\{(1,0,0,0,0,0,0,1)^T,\ (0,1,0,0,0,0,0,0)^T,\ (0,0,6,0,0,0,1,0)^T,
\]
\[
(0,0,0,0,3,0,1,0)^T,\ (0,0,0,0,0,3,1,0)^T\bigr\},
\]
$\operatorname{Ker}(A - 4I)^2$ has basis
\[
\bigl\{(1,0,0,0,0,0,0,1)^T,\ (0,1,0,0,0,0,0,0)^T,\ (0,0,0,0,1,1,0,0)^T,\ (0,0,0,0,0,3,1,0)^T\bigr\},
\]
and $\operatorname{Ker}(A - 4I)$ has basis
\[
\bigl\{(1,0,0,0,0,0,0,1)^T,\ (0,0,0,0,1,1,0,0)^T,\ (0,0,0,0,0,3,1,0)^T\bigr\}.
\]
Also, $\operatorname{Ker}(A - 5I)^2$ has basis
\[
\bigl\{(0,1,0,2,0,0,0,1)^T,\ (0,0,1,0,0,2,1,0)^T\bigr\},
\]
and $\operatorname{Ker}(A - 5I)$ has basis
\[
\bigl\{(0,0,1,0,0,2,1,0)^T\bigr\}.
\]
We may choose for $u_4$ any vector in $\operatorname{Ker}(A - 4I)^4$ that is not in $\operatorname{Ker}(A - 4I)^3$. We choose
\[
u_4 = (0,0,0,3,0,0,0,1)^T, \quad\text{so}\quad u_3 = (A - 4I)u_4 = (1,0,2,0,1,3,1,1)^T,
\]
\[
u_2 = (A - 4I)u_3 = (0,1,0,0,0,0,0,0)^T, \quad\text{and}\quad u_1 = (A - 4I)u_2 = (1,0,0,0,1,1,0,1)^T.
\]
Then we may choose $v_1$ and $w_1$ to be any two vectors such that $u_1$, $v_1$, and $w_1$ form a basis for $\operatorname{Ker}(A - 4I)$. We choose
\[
v_1 = (1,0,0,0,0,0,0,1)^T \quad\text{and}\quad w_1 = (0,0,0,0,0,3,1,0)^T.
\]
We may choose $x_2$ to be any vector in $\operatorname{Ker}(A - 5I)^2$ that is not in $\operatorname{Ker}(A - 5I)$. We choose
\[
x_2 = (0,1,0,2,0,0,0,1)^T, \quad\text{so}\quad x_1 = (A - 5I)x_2 = (0,0,1,0,0,2,1,0)^T.
\]
Thus we obtain a Jordan basis
\[
\{u_1, u_2, u_3, u_4, v_1, w_1, x_1, x_2\}
= \bigl\{(1,0,0,0,1,1,0,1)^T,\ (0,1,0,0,0,0,0,0)^T,\ (1,0,2,0,1,3,1,1)^T,\ (0,0,0,3,0,0,0,1)^T,
\]
\[
(1,0,0,0,0,0,0,1)^T,\ (0,0,0,0,0,3,1,0)^T,\ (0,0,1,0,0,2,1,0)^T,\ (0,1,0,2,0,0,0,1)^T\bigr\}.
\]
5.8 Field extensions

Suppose we have an $n$-by-$n$ matrix $A$ with entries in $F$, and suppose we have an extension field $E$ of $F$, i.e., a field $E \supseteq F$. For example, we might have $E = \mathbb{C}$ and $F = \mathbb{R}$. If $A$ is similar over $F$ to another matrix $B$, i.e., $B = PAP^{-1}$ where $P$ has entries in $F$, then $A$ is similar to $B$ over $E$ by the same equation $B = PAP^{-1}$, since the entries of $P$, being in $F$, are certainly in $E$. (Furthermore, $P$ is invertible over $F$ if and only if it is invertible over $E$, as we see from the condition that $P$ is invertible if and only if $\det(P) \neq 0$.) But a priori the converse may not be true: $A$ might be similar to $B$ over $E$, i.e., there might be a matrix $Q$ with entries in $E$ with $B = QAQ^{-1}$, though there is no matrix $P$ with entries in $F$ with $B = PAP^{-1}$. In fact, this does not occur: $A$ and $B$ are similar over $F$ if and only if they are similar over some (and hence over any) extension field $E$ of $F$.
Lemma 5.8.1. Let $\{v_1, \ldots, v_k\}$ be vectors in $F^n$ and let $E$ be an extension of $F$. Then $\{v_1, \ldots, v_k\}$ is linearly independent over $F$ (i.e., the equation $c_1 v_1 + \cdots + c_k v_k = 0$ with each $c_i \in F$ has only the solution $c_1 = \cdots = c_k = 0$) if and only if it is linearly independent over $E$ (i.e., the equation $c_1 v_1 + \cdots + c_k v_k = 0$ with each $c_i \in E$ has only the solution $c_1 = \cdots = c_k = 0$).

Proof. Certainly if $\{v_1, \ldots, v_k\}$ is linearly independent over $E$, it is linearly independent over $F$.

Suppose now that $\{v_1, \ldots, v_k\}$ is linearly independent over $F$. Then $\{v_1, \ldots, v_k\}$ extends to a basis $\{v_1, \ldots, v_n\}$ of $F^n$. Let $\mathcal{E} = \{e_1, \ldots, e_n\}$ be the standard basis of $F^n$. It is the standard basis of $E^n$ as well. Since
$\{v_1, \ldots, v_n\}$ is a basis, the matrix $P = [\,[v_1]_{\mathcal{E}} \mid \cdots \mid [v_n]_{\mathcal{E}}\,]$ is nonsingular when viewed as a matrix over $F$. That means $\det(P) \neq 0$. If we view $P$ as a matrix over $E$, $P$ remains nonsingular, as $\det(P) \neq 0$ ($\det(P)$ is computed purely from the entries of $P$). Then $\{v_1, \ldots, v_n\}$ is a basis for $E^n$ over $E$, so $\{v_1, \ldots, v_k\}$ is linearly independent over $E$.

Lemma 5.8.2. Let $A$ be an $n$-by-$n$ matrix over $F$, and let $E$ be an extension of $F$.

(1) For any $v \in F^n$, $m_{A,v}(x) = \widetilde{m}_{A,v}(x)$, where $m_{A,v}(x)$ (respectively $\widetilde{m}_{A,v}(x)$) is the $A$-annihilator of $v$ regarded as an element of $F^n$ (respectively of $E^n$).

(2) $m_A(x) = \widetilde{m}_A(x)$, where $m_A(x)$ (respectively $\widetilde{m}_A(x)$) is the minimum polynomial of $A$ regarded as a matrix over $F$ (respectively over $E$).

(3) $c_A(x) = \widetilde{c}_A(x)$, where $c_A(x)$ (respectively $\widetilde{c}_A(x)$) is the characteristic polynomial of $A$ regarded as a matrix over $F$ (respectively over $E$).

Proof. (1) $\widetilde{m}_{A,v}(x)$ divides any polynomial $p(x)$ with coefficients in $E$ for which $p(A)(v) = 0$, and $m_{A,v}(x)$ is such a polynomial (as its coefficients lie in $F \subseteq E$). Thus $\widetilde{m}_{A,v}(x)$ divides $m_{A,v}(x)$.

Let $m_{A,v}(x)$ have degree $d$. Then $\{v, Av, \ldots, A^{d-1}v\}$ is linearly independent over $F$, and hence, by Lemma 5.8.1, over $E$ as well, so $\widetilde{m}_{A,v}(x)$ has degree at least $d$. But then $\widetilde{m}_{A,v}(x) = m_{A,v}(x)$.

(2) Again, $\widetilde{m}_A(x)$ divides $m_A(x)$. There is a vector $v$ in $F^n$ with $m_A(x) = m_{A,v}(x)$. By (1), $\widetilde{m}_{A,v}(x) = m_{A,v}(x)$. But $\widetilde{m}_{A,v}(x)$ divides $\widetilde{m}_A(x)$, so they are equal.

(3) $c_A(x) = \det(xI - A) = \widetilde{c}_A(x)$, as the determinant is computed purely from the entries of $A$.
Theorem 5.8.3. Let $A$ and $B$ be $n$-by-$n$ matrices over $F$ and let $E$ be an extension field of $F$. Then $A$ and $B$ are similar over $E$ if and only if they are similar over $F$.

Proof. If $A$ and $B$ are similar over $F$, they are certainly similar over $E$. Suppose $A$ and $B$ are not similar over $F$. Then $A$ has a sequence of elementary divisors $p_1(x), \ldots, p_k(x)$ and $B$ has a sequence of elementary divisors $q_1(x), \ldots, q_l(x)$, and these are not the same. Let us find the elementary divisors of $A$ over $E$. We follow the proof of rational canonical form, still working over $F$, and note that the sequence of elementary divisors we obtain over $F$ is still a sequence of elementary divisors over $E$. (If $\{w_1, \ldots, w_k\}$ is a rational canonical $T$-generating set over $F$, it is a rational canonical $T$-generating set over $E$; this follows from Lemma 5.8.2.) But the sequence of elementary divisors is unique. In other words, $p_1(x), \ldots, p_k(x)$ is the sequence of elementary divisors of $A$ over $E$, and similarly $q_1(x), \ldots, q_l(x)$ is the sequence of elementary divisors of $B$ over $E$. Since these are different, $A$ and $B$ are not similar over $E$.
We have stated the theorem in terms of matrices rather than linear transformations so as not to presume any extra background. But it is equivalent to the following one, stated in terms of tensor products.

Theorem 5.8.4. Let $V$ be a finite-dimensional $F$-vector space and let $S: V \to V$ and $T: V \to V$ be two linear transformations. Then $S$ and $T$ are conjugate if and only if for some, and hence for any, extension field $E$ of $F$, $S \otimes 1: V \otimes_F E \to V \otimes_F E$ and $T \otimes 1: V \otimes_F E \to V \otimes_F E$ are conjugate.
5.9 More than one linear transformation

Hitherto we have examined the structure of a single linear transformation. In the last section of this chapter, we derive three results that have a common theme: they deal with questions that arise when we consider more than one linear transformation.

To begin, let $T: V \to W$ and $S: W \to V$ be linear transformations, with $V$ and $W$ finite-dimensional vector spaces. We examine the relationship between $ST: V \to V$ and $TS: W \to W$.

If $V = W$ and at least one of $S$ and $T$ is invertible, then $ST$ and $TS$ are conjugate: $ST = T^{-1}(TS)T$ or $TS = S^{-1}(ST)S$. In general we have
Lemma 5.9.1. Let $T: V \to W$ and $S: W \to V$ be linear transformations between finite-dimensional vector spaces. Let $p(x) = a_tx^t + \cdots + a_0 \in F[x]$ be any polynomial with constant term $a_0 \ne 0$. Then
\[ \dim \operatorname{Ker} p(ST) = \dim \operatorname{Ker} p(TS). \]

Proof. Let $\{v_1, \dots, v_k\}$ be a basis for $\operatorname{Ker}(p(ST))$. We claim that $\{T(v_1), \dots, T(v_k)\}$ is linearly independent. To see this, suppose
\[ c_1T(v_1) + \cdots + c_kT(v_k) = 0. \]
Then $T(c_1v_1 + \cdots + c_kv_k) = 0$, so $ST(c_1v_1 + \cdots + c_kv_k) = 0$. Let $v = c_1v_1 + \cdots + c_kv_k$, so $ST(v) = 0$. But $v \in \operatorname{Ker}(p(ST))$, so
\[ 0 = (a_t(ST)^t + \cdots + a_1(ST) + a_0I)(v) = 0 + \cdots + 0 + a_0v = a_0v \]
and hence, since $a_0 \ne 0$, $v = 0$. Thus $c_1v_1 + \cdots + c_kv_k = 0$. But $\{v_1, \dots, v_k\}$ is linearly independent, so $c_i = 0$ for all $i$, and hence $\{T(v_1), \dots, T(v_k)\}$ is linearly independent.

Next we claim that $T(v_i) \in \operatorname{Ker}(p(TS))$ for each $i$. To see this, note that
\[ (TS)^sT = (TS)\cdots(TS)T = T(ST)\cdots(ST) = T(ST)^s \]
for any $s$. Then
\[ p(TS)(T(v_i)) = (a_t(TS)^t + \cdots + a_0I)(T(v_i)) = (T(a_t(ST)^t + \cdots + a_0I))(v_i) = T(p(ST)(v_i)) = T(0) = 0. \]
Hence $\{T(v_1), \dots, T(v_k)\}$ is a linearly independent subset of $\operatorname{Ker}(p(TS))$, so $\dim(\operatorname{Ker}(p(TS))) \ge \dim(\operatorname{Ker}(p(ST)))$. Interchanging $S$ and $T$ shows that the dimensions are equal.
Theorem 5.9.2. Let $T: V \to W$ and $S: W \to V$ be linear transformations between finite-dimensional vector spaces over an algebraically closed field $F$. Then $ST$ and $TS$ have the same nonzero eigenvalues, and for each common eigenvalue $\lambda \ne 0$, $ST$ and $TS$ have the same ESP at $\lambda$ and hence the same Jordan block structure at $\lambda$ (i.e., the same number of blocks of the same sizes).

Proof. Apply Lemma 5.9.1 to the polynomials $p_{t,\lambda}(x) = (x - \lambda)^t$ for $t = 1, 2, \dots$, noting that the sequence of integers $\{\dim(\operatorname{Ker}(p_{t,\lambda}(R))) \mid t = 1, 2, \dots\}$ determines the ESP of a linear transformation $R$ at $\lambda$, or, equivalently, its Jordan block structure at $\lambda$.
Corollary 5.9.3. Let $T: V \to V$ and $S: V \to V$ be linear transformations on a finite-dimensional vector space over an arbitrary field $F$. Then $ST$ and $TS$ have the same characteristic polynomial.

Proof. First suppose that $F$ is algebraically closed. If $\dim(V) = n$ and $ST$, and hence $TS$, has distinct nonzero eigenvalues $\lambda_1, \dots, \lambda_k$ of multiplicities $e_1, \dots, e_k$ respectively, then they each have characteristic polynomial $x^{e_0}(x - \lambda_1)^{e_1} \cdots (x - \lambda_k)^{e_k}$, where $e_0 = n - (e_1 + \cdots + e_k)$.

In the general case, choose an arbitrary basis for $V$ and represent $S$ and $T$ by matrices $A$ and $B$ with entries in $F$. Then regard $A$ and $B$ as having entries in $\overline F$, the algebraic closure of $F$, and apply the algebraically closed case.

Theorem 5.9.2 and Corollary 5.9.3 are the strongest results that hold in general. It is not necessarily the case that $ST$ and $TS$ are conjugate if $S$ and $T$ are both singular linear transformations.
Example 5.9.4. (1) Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. Then $AB = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $BA = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ are not similar, so $T_AT_B = T_{AB}$ and $T_BT_A = T_{BA}$ are not conjugate, though they both have characteristic polynomial $x^2$.

(2) Let $A = \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Then $AB = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$ and $BA = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ are not similar, so $T_AT_B = T_{AB}$ and $T_BT_A = T_{BA}$ are not conjugate, though they both have characteristic polynomial $x^2$. (In this case $T_A$ and $T_B$ are both diagonalizable.) ♦
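Part (1) of this example is easy to verify by machine. Here is a quick check in Python with numpy (a sketch; np.poly returns the coefficients of the characteristic polynomial $\det(xI - M)$, and similar matrices must have equal rank):

    import numpy as np

    A = np.array([[1.0, 0.0], [0.0, 0.0]])
    B = np.array([[0.0, 1.0], [0.0, 0.0]])
    AB, BA = A @ B, B @ A

    # Both products have characteristic polynomial x^2.
    print(np.poly(AB))   # [1. 0. 0.], i.e. x^2
    print(np.poly(BA))   # [1. 0. 0.], i.e. x^2

    # But they are not similar: similar matrices have equal rank.
    print(np.linalg.matrix_rank(AB), np.linalg.matrix_rank(BA))  # 1 0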
Let $T: V \to V$ be a linear transformation, let $p(x)$ be a polynomial, and set $S = p(T)$. Then $S$ and $T$ commute. We now investigate the question of under what circumstances any linear transformation that commutes with $T$ must be of this form.

Theorem 5.9.5. Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. The following are equivalent:

(1) $V$ is $T$-generated by a single element, or, equivalently, the rational canonical form of $T$ consists of a single block.

(2) Every linear transformation $S: V \to V$ that commutes with $T$ can be expressed as a polynomial in $T$.

Proof. Suppose (1) is true, and let $v_0$ be a $T$-generator of $V$. Then every element of $V$ can be expressed as $p(T)(v_0)$ for some polynomial $p(x)$. In particular, there is a polynomial $p_0(x)$ such that $S(v_0) = p_0(T)(v_0)$.

For any $v \in V$, let $v = p(T)(v_0)$. If $S$ commutes with $T$,
\[ S(v) = Sp(T)v_0 = p(T)Sv_0 = p(T)p_0(T)v_0 = p_0(T)p(T)v_0 = p_0(T)(v); \]
so $S = p_0(T)$. (We have used the fact that if $S$ commutes with $T$, it commutes with any polynomial in $T$. Also, any two polynomials in $T$ commute with each other.) Thus (2) is true.
Suppose (1) is false, so that $V$ has a rational canonical $T$-generating set $\{v_1, \dots, v_k\}$ with $k > 1$. Let $p_i(x)$ be the $T$-annihilator of $v_i$, so $p_1(x)$ is divisible by $p_i(x)$ for $i > 1$. Then we have a $T$-invariant direct sum decomposition $V = V_1 \oplus \cdots \oplus V_k$. Define $S: V \to V$ by $S(v) = 0$ if $v \in V_1$ and $S(v) = v$ if $v \in V_i$ for $i > 1$. It follows easily from the $T$-invariance of the direct sum decomposition that $S$ commutes with $T$.

We claim that $S$ is not a polynomial in $T$. Suppose $S = p(T)$ for some polynomial $p(x)$. Then $0 = S(v_1) = p(T)(v_1)$, so $p(x)$ is divisible by $p_1(x)$, the $T$-annihilator of $v_1$. But $p_1(x)$ is divisible by $p_i(x)$ for $i > 1$, so $p(x)$ is divisible by $p_i(x)$ for $i > 1$, and hence $S(v_2) = \cdots = S(v_k) = 0$. But by the definition of $S$ we have $S(v_i) = v_i \ne 0$ for $i > 1$, a contradiction, and (2) is false.
Remark 5.9.6. Equivalent conditions to condition (1) of Theorem 5.9.5 were given in Corollary 5.3.3. ♦

Finally, let $S$ and $T$ be diagonalizable linear transformations. We see when $S$ and $T$ are simultaneously diagonalizable.
Theorem 5.9.7. Let $V$ be a finite-dimensional vector space and let $S: V \to V$ and $T: V \to V$ be diagonalizable linear transformations. The following are equivalent:

(1) $S$ and $T$ are simultaneously diagonalizable, i.e., there is a basis $\mathcal B$ of $V$ with $[S]_{\mathcal B}$ and $[T]_{\mathcal B}$ both diagonal, or equivalently, there is a basis $\mathcal B$ of $V$ consisting of common eigenvectors of $S$ and $T$.

(2) $S$ and $T$ commute.

Proof. Suppose (1) is true. Let $\mathcal B = \{v_1, \dots, v_n\}$, where $S(v_i) = \lambda_iv_i$ and $T(v_i) = \mu_iv_i$ for some $\lambda_i, \mu_i \in F$. Then $S(T(v_i)) = S(\mu_iv_i) = \mu_i\lambda_iv_i = \lambda_i\mu_iv_i = T(\lambda_iv_i) = T(S(v_i))$ for each $i$, and since $\mathcal B$ is a basis, this implies $S(T(v)) = T(S(v))$ for every $v \in V$, i.e., that $S$ and $T$ commute.

Suppose (2) is true. Since $T$ is diagonalizable, $V = V_1 \oplus \cdots \oplus V_k$, where $V_i$ is the eigenspace of $T$ corresponding to the eigenvalue $\mu_i$ of $T$. For $v \in V_i$, $T(S(v)) = S(T(v)) = S(\mu_iv) = \mu_iS(v)$, so $S(v) \in V_i$ as well. Thus each subspace $V_i$ is $S$-invariant. Since $S$ is diagonalizable, so is its restriction $S_i: V_i \to V_i$. ($m_{S_i}(x)$ divides $m_S(x)$, which is a product of distinct linear factors, so $m_{S_i}(x)$ is a product of distinct linear factors as well.) Thus $V_i$ has a basis $\mathcal B_i$ consisting of eigenvectors of $S$. Since every nonzero vector in $V_i$ is an eigenvector of $T$, $\mathcal B_i$ consists of common eigenvectors of $S$ and $T$ as well. Set $\mathcal B = \mathcal B_1 \cup \cdots \cup \mathcal B_k$; then $\mathcal B$ is a basis of $V$ of the required form.
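The implication (1) implies (2) is easy to illustrate numerically. In the following Python sketch (using numpy; the construction is ours), we build commuting diagonalizable $S$ and $T$ by conjugating two diagonal matrices by the same invertible $Q$, so that the columns of $Q$ form a basis of common eigenvectors:

    import numpy as np

    rng = np.random.default_rng(0)
    Q = rng.standard_normal((4, 4))          # generic, hence invertible
    Qi = np.linalg.inv(Q)
    S = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Qi
    T = Q @ np.diag([5.0, 5.0, -1.0, 0.0]) @ Qi

    print(np.allclose(S @ T, T @ S))         # True: S and T commute

    # The columns of Q simultaneously diagonalize S and T.
    for M in (S, T):
        D = Qi @ M @ Q
        print(np.allclose(D, np.diag(np.diag(D))))  # True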
Remark 5.9.8. It is easy to see that if $S$ and $T$ are both triangularizable linear transformations and $S$ and $T$ commute, then they are simultaneously triangularizable, but it is even easier to see that the converse is false. For example, take $S = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$ and $T = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$. ♦
CHAPTER 6
Bilinear, sesquilinear, and quadratic forms
In this chapter we investigate bilinear, sesquilinear, and quadratic forms, or
“forms” for short. A form is an additional structure on a vector space. Forms
are interesting in their own right, and they have applications throughout
mathematics. Many important vector spaces naturally come equipped with
a form.
In the first section we introduce forms and derive their basic properties.
In the second section we see how to simplify forms on finite-dimensional
vector spaces and in some cases completely classify them. In the third sec-
tion we see how the presence of nonsingular form(s) enables us to define
the adjoint of a linear transformation.
6.1 Basic definitions and results
Definition 6.1.1. A conjugation on a field Fis a map cWF!Fwith
the properties (where we denote c.f / by f):
(1) fDffor every f2F,
(2) f1Cf2Df1Cf2for every f1; f22F,
(3) f1f2Df1f2for every f1; f22F.
The conjugation cis nontrivial if cis not the identity on F.
A conjugation on a vector space Vover Fis a map cWV!Vwith the
properties (where we denote c.v/ by v):
(1) vDvfor every v2V,
165
“book” — 2011/3/4 — 17:06 — page 166 — #180
i
i
i
i
i
i
i
i
166 Guide to Advanced Linear Algebra
(2) v1Cv2Dv1Cv2for every v1; v22V,
(3) f v Df v for every f2F,v2V.
Þ
Remark 6.1.2. The archetypical example of a conjugation on a field is complex conjugation on the field $\mathbb C$ of complex numbers. ♦

Definition 6.1.3. Let $F$ be a field with a nontrivial conjugation and let $V$ and $W$ be $F$-vector spaces. Then $T: V \to W$ is conjugate linear if

(1) $T(v_1 + v_2) = T(v_1) + T(v_2)$ for every $v_1, v_2 \in V$,

(2) $T(cv) = \bar cT(v)$ for every $c \in F$, $v \in V$. ♦
Now we come to the basic definition. The prefix “sesqui” means “one and a half.”

Definition 6.1.4. Let $V$ be an $F$-vector space. A bilinear form is a function $\varphi: V \times V \to F$, $\varphi(x, y) = \langle x, y\rangle$, that is linear in each entry, i.e., that satisfies

(1) $\langle c_1x_1 + c_2x_2, y\rangle = c_1\langle x_1, y\rangle + c_2\langle x_2, y\rangle$ for every $c_1, c_2 \in F$ and $x_1, x_2, y \in V$,

(2) $\langle x, c_1y_1 + c_2y_2\rangle = c_1\langle x, y_1\rangle + c_2\langle x, y_2\rangle$ for every $c_1, c_2 \in F$ and $x, y_1, y_2 \in V$.

A sesquilinear form is a function $\varphi: V \times V \to F$, $\varphi(x, y) = \langle x, y\rangle$, that is linear in the first entry and conjugate linear in the second, i.e., that satisfies (1) and (2′):

(2′) $\langle x, c_1y_1 + c_2y_2\rangle = \bar c_1\langle x, y_1\rangle + \bar c_2\langle x, y_2\rangle$ for every $c_1, c_2 \in F$ and $x, y_1, y_2 \in V$,

for a nontrivial conjugation $c \mapsto \bar c$ on $F$. ♦
Example 6.1.5. (1) Let $V = \mathbb R^n$. Then $\langle x, y\rangle = {}^txy$ is a bilinear form. If $V = \mathbb C^n$, then $\langle x, y\rangle = {}^tx\bar y$ is a sesquilinear form. In both cases this is the familiar “dot product.” Indeed, for any field $F$ we can define a bilinear form on $F^n$ by $\langle x, y\rangle = {}^txy$, and for any field $F$ with a nontrivial conjugation we can define a sesquilinear form on $F^n$ by $\langle x, y\rangle = {}^tx\bar y$.

(2) More generally, for an $n$-by-$n$ matrix $A$ with entries in $F$, $\langle x, y\rangle = {}^txAy$ is a bilinear form on $F^n$, and $\langle x, y\rangle = {}^txA\bar y$ is a sesquilinear form on $F^n$. We will see that all bilinear and sesquilinear forms on $F^n$ arise this way, and, by taking coordinates, that all bilinear and sesquilinear forms on finite-dimensional vector spaces over $F$ arise in this way.

(3) Let $V = {}_rF^\infty$ and let $x = (x_1, x_2, \dots)$, $y = (y_1, y_2, \dots)$. We define a bilinear form on $V$ by $\langle x, y\rangle = \sum x_iy_i$. If $F$ has a nontrivial conjugation, we define a sesquilinear form on $V$ by $\langle x, y\rangle = \sum x_i\bar y_i$.

(4) Let $V$ be the vector space of real-valued continuous functions on $[0, 1]$. Then $V$ has a bilinear form given by
\[ \langle f(x), g(x)\rangle = \int_0^1 f(x)g(x)\,dx. \]
If $V$ is the vector space of complex-valued continuous functions on $[0, 1]$, then $V$ has a sesquilinear form given by
\[ \langle f(x), g(x)\rangle = \int_0^1 f(x)\overline{g(x)}\,dx. \] ♦
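Part (2) of this example is easy to compute with. The following Python sketch (using numpy; the function names bilinear and sesquilinear are ours) evaluates a form from its matrix and spot-checks (conjugate) linearity on sample vectors:

    import numpy as np

    def bilinear(A, x, y):
        # phi(x, y) = (transpose x) A y
        return x @ A @ y

    def sesquilinear(A, x, y):
        # phi(x, y) = (transpose x) A (conjugate y)
        return x @ A @ np.conj(y)

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
    print(np.isclose(bilinear(A, 5 * x, y), 5 * bilinear(A, x, y)))  # True

    # Conjugate linearity in the second entry, over C:
    H = np.array([[1.0, 1j], [-1j, 2.0]])
    u, v = np.array([1.0 + 1j, 2.0]), np.array([1j, 1.0])
    print(np.isclose(sesquilinear(H, u, 1j * v),
                     np.conj(1j) * sesquilinear(H, u, v)))           # True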
Let us see the connection between forms and dual spaces.

Lemma 6.1.6. (1) Let $V$ be a vector space and let $\varphi(x, y) = \langle x, y\rangle$ be a bilinear form on $V$. Then $\alpha_\varphi: V \to V^*$ defined by $\alpha_\varphi(y)(x) = \langle x, y\rangle$ is a linear transformation.

(2) Let $V$ be a vector space and let $\varphi(x, y) = \langle x, y\rangle$ be a sesquilinear form on $V$. Then $\alpha_\varphi: V \to V^*$ defined by $\alpha_\varphi(y)(x) = \langle x, y\rangle$ is a conjugate linear transformation.

Remark 6.1.7. In the situation of Lemma 6.1.6, $\alpha_\varphi(y)$ is often written as $\langle \cdot, y\rangle$, so with this notation $\alpha_\varphi: y \mapsto \langle \cdot, y\rangle$. ♦

Definition 6.1.8. Let $V$ be a vector space and let $\varphi$ be a bilinear (respectively sesquilinear) form on $V$. Then $\varphi$ is nonsingular if the map $\alpha_\varphi: V \to V^*$ is an isomorphism (respectively conjugate isomorphism). ♦

Remark 6.1.9. In more concrete terms, $\varphi$ is nonsingular if and only if the following is true: Let $T: V \to F$ be any linear transformation. Then there is a unique vector $w \in V$ such that
\[ T(v) = \varphi(v, w) = \langle v, w\rangle \quad\text{for every } v \in V. \] ♦
In case $V$ is finite dimensional, we have an easy criterion to determine if a form $\varphi$ is nonsingular.

Lemma 6.1.10. Let $V$ be a finite-dimensional vector space and let $\varphi(x, y) = \langle x, y\rangle$ be a bilinear or sesquilinear form on $V$. Then $\varphi$ is nonsingular if and only if for every $y \in V$, $y \ne 0$, there is an $x \in V$ such that $\langle x, y\rangle = \varphi(x, y) \ne 0$.

Proof. Since $\dim V = \dim V^*$, $\alpha_\varphi$ is a (conjugate) isomorphism if and only if it is injective.

Suppose that $\alpha_\varphi$ is injective, i.e., if $y \ne 0$ then $\alpha_\varphi(y) \ne 0$. This means that there exists an $x \in V$ with $\alpha_\varphi(y)(x) = \varphi(x, y) \ne 0$.

Conversely, suppose that for every $y \in V$, $y \ne 0$, there exists an $x$ with $\alpha_\varphi(y)(x) = \varphi(x, y) \ne 0$. Then for every $y \in V$, $y \ne 0$, $\alpha_\varphi(y)$ is not the zero map. Hence $\operatorname{Ker}(\alpha_\varphi) = \{0\}$ and $\alpha_\varphi$ is injective.
Now we see how to use coordinates to associate a matrix to a bilinear or sesquilinear form on a finite-dimensional vector space. Note this is different from associating a matrix to a linear transformation.

Theorem 6.1.11. Let $\varphi(x, y) = \langle x, y\rangle$ be a bilinear (respectively sesquilinear) form on the finite-dimensional vector space $V$ and let $\mathcal B = \{v_1, \dots, v_n\}$ be a basis for $V$. Define a matrix $A = (a_{ij})$ by
\[ a_{ij} = \langle v_i, v_j\rangle, \quad i, j = 1, \dots, n. \]
Then for $x, y \in V$,
\[ \langle x, y\rangle = {}^t[x]_{\mathcal B}\,A\,[y]_{\mathcal B} \quad\bigl(\text{respectively } {}^t[x]_{\mathcal B}\,A\,\overline{[y]_{\mathcal B}}\bigr). \]

Proof. By construction, this is true when $x = v_i$ and $y = v_j$ (as then $[x] = e_i$ and $[y] = e_j$), and by (conjugate) linearity that implies it is true for any vectors $x$ and $y$ in $V$.

Definition 6.1.12. The matrix $A = (a_{ij})$ of Theorem 6.1.11 is the matrix of the form $\varphi$ with respect to the basis $\mathcal B$. We denote it by $[\varphi]_{\mathcal B}$. ♦

Theorem 6.1.13. The bilinear or sesquilinear form $\varphi$ on the finite-dimensional vector space $V$ is nonsingular if and only if the matrix $[\varphi]_{\mathcal B}$ in any basis $\mathcal B$ of $V$ is nonsingular.
Proof. We use the criterion of Lemma 6.1.10 for nonsingularity of a form.

Suppose $A = [\varphi]_{\mathcal B}$ is a nonsingular matrix. For $x \in V$, $x \ne 0$, let
\[ [x]_{\mathcal B} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \]
Then for some $i$, $c_i \ne 0$. Let $z = A^{-1}e_i \in F^n$ and let $y \in V$ with $[y]_{\mathcal B} = z$ (or $[y]_{\mathcal B} = \bar z$ in the sesquilinear case). Then $\varphi(x, y) = {}^t[x]_{\mathcal B}AA^{-1}e_i = c_i \ne 0$.

Suppose $A$ is singular. Let $z \in F^n$, $z \ne 0$, with $Az = 0$. Then if $y \in V$ with $[y]_{\mathcal B} = z$ (or $[y]_{\mathcal B} = \bar z$), we have $y \ne 0$ but $\varphi(x, y) = {}^t[x]_{\mathcal B}Az = {}^t[x]_{\mathcal B}\,0 = 0$ for every $x \in V$.
Now we see the effect of a change of basis on the matrix of a form.

Theorem 6.1.14. Let $V$ be a finite-dimensional vector space and let $\varphi$ be a bilinear (respectively sesquilinear) form on $V$. Let $\mathcal B$ and $\mathcal C$ be any two bases of $V$. Then
\[ [\varphi]_{\mathcal C} = {}^tP_{\mathcal B \leftarrow \mathcal C}\,[\varphi]_{\mathcal B}\,P_{\mathcal B \leftarrow \mathcal C} \quad\bigl(\text{respectively } {}^tP_{\mathcal B \leftarrow \mathcal C}\,[\varphi]_{\mathcal B}\,\overline{P_{\mathcal B \leftarrow \mathcal C}}\bigr). \]

Proof. We do the sesquilinear case; the bilinear case follows by omitting the conjugation.

By the definition of $[\varphi]_{\mathcal C}$,
\[ \varphi(x, y) = {}^t[x]_{\mathcal C}\,[\varphi]_{\mathcal C}\,\overline{[y]_{\mathcal C}} \]
and by the definition of $[\varphi]_{\mathcal B}$,
\[ \varphi(x, y) = {}^t[x]_{\mathcal B}\,[\varphi]_{\mathcal B}\,\overline{[y]_{\mathcal B}}. \]
But $[x]_{\mathcal B} = P_{\mathcal B \leftarrow \mathcal C}[x]_{\mathcal C}$ and $[y]_{\mathcal B} = P_{\mathcal B \leftarrow \mathcal C}[y]_{\mathcal C}$. Substitution gives
\[ {}^t[x]_{\mathcal C}\,[\varphi]_{\mathcal C}\,\overline{[y]_{\mathcal C}} = \varphi(x, y) = {}^t\bigl(P_{\mathcal B \leftarrow \mathcal C}[x]_{\mathcal C}\bigr)[\varphi]_{\mathcal B}\,\overline{P_{\mathcal B \leftarrow \mathcal C}[y]_{\mathcal C}} = {}^t[x]_{\mathcal C}\Bigl({}^tP_{\mathcal B \leftarrow \mathcal C}\,[\varphi]_{\mathcal B}\,\overline{P_{\mathcal B \leftarrow \mathcal C}}\Bigr)\overline{[y]_{\mathcal C}}. \]
Since this is true for every $x, y \in V$,
\[ [\varphi]_{\mathcal C} = {}^tP_{\mathcal B \leftarrow \mathcal C}\,[\varphi]_{\mathcal B}\,\overline{P_{\mathcal B \leftarrow \mathcal C}}. \]
This leads us to the following definition.

Definition 6.1.15. Two square matrices $A$ and $B$ with entries in $F$ are congruent if there is an invertible matrix $P$ with ${}^tPAP = B$, and are conjugate congruent if there is an invertible matrix $P$ with ${}^tPA\bar P = B$. ♦

It is easy to check that (conjugate) congruence is an equivalence relation. We then have:

Corollary 6.1.16. (1) Let $\varphi$ be a bilinear (respectively sesquilinear) form on the finite-dimensional vector space $V$. Let $\mathcal B$ and $\mathcal C$ be bases of $V$. Then $[\varphi]_{\mathcal B}$ and $[\varphi]_{\mathcal C}$ are congruent (respectively conjugate congruent).

(2) Let $A$ and $B$ be congruent (respectively conjugate congruent) $n$-by-$n$ matrices. Let $V$ be an $n$-dimensional vector space over $F$. Then there is a bilinear form (respectively sesquilinear form) $\varphi$ on $V$ and bases $\mathcal B$ and $\mathcal C$ of $V$ with $[\varphi]_{\mathcal B} = A$ and $[\varphi]_{\mathcal C} = B$.
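The change-of-basis formula of Theorem 6.1.14 can be spot-checked numerically. In the following Python sketch (numpy; the setup is ours), $P$ plays the role of $P_{\mathcal B \leftarrow \mathcal C}$, and we verify that ${}^tP[\varphi]_{\mathcal B}P$ evaluates $\varphi$ correctly on $\mathcal C$-coordinates:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 3
    phi_B = rng.standard_normal((n, n))   # matrix of phi in the basis B
    P = rng.standard_normal((n, n))       # change of basis matrix, B <- C

    # Coordinates transform by [x]_B = P [x]_C, so both evaluations agree:
    xC, yC = rng.standard_normal(n), rng.standard_normal(n)
    phi_C = P.T @ phi_B @ P
    print(np.isclose(xC @ phi_C @ yC,
                     (P @ xC) @ phi_B @ (P @ yC)))   # True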
6.2 Characterization and classification theorems

In this section we derive results about the characterization and classification of forms on finite-dimensional vector spaces.

Our discussion so far has been general, but almost all the forms encountered in mathematical practice fall into one of the following classes.

Definition 6.2.1. (1) A bilinear form $\varphi$ on $V$ is symmetric if $\varphi(x, y) = \varphi(y, x)$ for all $x, y \in V$.

(2) A bilinear form $\varphi$ on $V$ is skew-symmetric if $\varphi(x, y) = -\varphi(y, x)$ for all $x, y \in V$, and $\varphi(x, x) = 0$ for all $x \in V$ (this last condition follows automatically if $\operatorname{char}(F) \ne 2$).

(3) A sesquilinear form $\varphi$ on $V$ is Hermitian if $\varphi(x, y) = \overline{\varphi(y, x)}$ for all $x, y \in V$.

(4) A sesquilinear form $\varphi$ on $V$ is skew-Hermitian if $\operatorname{char}(F) \ne 2$ and $\varphi(x, y) = -\overline{\varphi(y, x)}$ for all $x, y \in V$. (If $\operatorname{char}(F) = 2$, skew-Hermitian is not defined.) ♦
Lemma 6.2.2. Let $V$ be a finite-dimensional vector space over $F$ and let $\varphi$ be a form on $V$. Choose a basis $\mathcal B$ of $V$ and let $A = [\varphi]_{\mathcal B}$. Then

(1) $\varphi$ is symmetric if and only if ${}^tA = A$.

(2) $\varphi$ is skew-symmetric if and only if ${}^tA = -A$ (and, if $\operatorname{char}(F) = 2$, the diagonal entries of $A$ are all $0$).

(3) $\varphi$ is Hermitian if and only if ${}^tA = \bar A$.

(4) $\varphi$ is skew-Hermitian if and only if ${}^tA = -\bar A$ (and $\operatorname{char}(F) \ne 2$).
Definition 6.2.3. Matrices satisfying the conclusion of Lemma 6.2.2 parts (1), (2), (3), or (4) are called symmetric, skew-symmetric, Hermitian, or skew-Hermitian respectively. ♦

For the remainder of this section we assume that the forms we consider are of one of these types: symmetric, Hermitian, skew-symmetric, or skew-Hermitian, and that the vector spaces they are defined on are finite dimensional. We will write $(V, \varphi)$ for the space $V$ equipped with the form $\varphi$.

The appropriate notion of equivalence of forms is isometry.

Definition 6.2.4. Let $V$ admit a form $\varphi$ and $W$ admit a form $\psi$. Then a linear transformation $T: V \to W$ is an isometry between $(V, \varphi)$ and $(W, \psi)$ if $T$ is an isomorphism and furthermore
\[ \psi(T(v_1), T(v_2)) = \varphi(v_1, v_2) \quad\text{for every } v_1, v_2 \in V. \]
If there exists an isometry between $(V, \varphi)$ and $(W, \psi)$, then $(V, \varphi)$ and $(W, \psi)$ are isometric. ♦

Lemma 6.2.5. In the situation of Definition 6.2.4, let $V$ have basis $\mathcal B$ and let $W$ have basis $\mathcal C$. Then $T$ is an isometry if and only if $M = [T]_{\mathcal C \leftarrow \mathcal B}$ is an invertible matrix with
\[ {}^tM[\psi]_{\mathcal C}M = [\varphi]_{\mathcal B} \text{ in the bilinear case, or } {}^tM[\psi]_{\mathcal C}\bar M = [\varphi]_{\mathcal B} \text{ in the sesquilinear case.} \]
Thus $V$ and $W$ are isometric if and only if $[\psi]_{\mathcal C}$ and $[\varphi]_{\mathcal B}$ are congruent, in the bilinear case, or conjugate congruent, in the sesquilinear case, in some (or any) pair of bases $\mathcal B$ of $V$ and $\mathcal C$ of $W$.

Definition 6.2.6. Let $\varphi$ be a bilinear or sesquilinear form on the vector space $V$. Then the isometry group of $\varphi$ is
\[ \operatorname{Isom}(\varphi) = \bigl\{T: V \to V \text{ an isomorphism} \mid T \text{ is an isometry from } (V, \varphi) \text{ to itself}\bigr\}. \] ♦

Corollary 6.2.7. In the situation of Definition 6.2.6, let $\mathcal B$ be any basis of $V$. Then $T \mapsto [T]_{\mathcal B}$ gives an isomorphism
\[ \operatorname{Isom}(\varphi) \to \bigl\{\text{invertible matrices } M \mid {}^tM[\varphi]_{\mathcal B}M = [\varphi]_{\mathcal B} \text{ or } {}^tM[\varphi]_{\mathcal B}\bar M = [\varphi]_{\mathcal B}\bigr\}. \]
Now we begin to simplify and classify forms.
Definition 6.2.8. Let $V$ admit the form $\varphi$. Then two vectors $v_1$ and $v_2$ in $V$ are orthogonal (with respect to $\varphi$) if
\[ \varphi(v_1, v_2) = \varphi(v_2, v_1) = 0. \]
Two subspaces $V_1$ and $V_2$ are orthogonal (with respect to $\varphi$) if
\[ \varphi(v_1, v_2) = \varphi(v_2, v_1) = 0 \quad\text{for all } v_1 \in V_1,\ v_2 \in V_2. \] ♦

We also have an appropriate notion of direct sum.

Definition 6.2.9. Let $V$ admit a form $\varphi$, and let $V_1$ and $V_2$ be subspaces of $V$. Then $V$ is the orthogonal direct sum of $V_1$ and $V_2$, $V = V_1 \perp V_2$, if $V = V_1 \oplus V_2$ (i.e., $V$ is the direct sum of $V_1$ and $V_2$) and $V_1$ and $V_2$ are orthogonal with respect to $\varphi$. This is equivalent to the condition: Let $v, v' \in V$ and write $v$ uniquely as $v = v_1 + v_2$ with $v_1 \in V_1$ and $v_2 \in V_2$, and similarly $v' = v'_1 + v'_2$ with $v'_1 \in V_1$ and $v'_2 \in V_2$. Let $\varphi_1$ be the restriction of $\varphi$ to $V_1 \times V_1$, and $\varphi_2$ be the restriction of $\varphi$ to $V_2 \times V_2$. Then
\[ \varphi(v, v') = \varphi_1(v_1, v'_1) + \varphi_2(v_2, v'_2). \]
In this situation we will also write $(V, \varphi) = (V_1, \varphi_1) \perp (V_2, \varphi_2)$. ♦
Remark 6.2.10. Translated into matrix language, the condition in Definition 6.2.9 is as follows: Let $\mathcal B_1$ be a basis for $V_1$ and $\mathcal B_2$ be a basis for $V_2$. Let $A_1 = [\varphi_1]_{\mathcal B_1}$ and $A_2 = [\varphi_2]_{\mathcal B_2}$. Let $\mathcal B = \mathcal B_1 \cup \mathcal B_2$ and $A = [\varphi]_{\mathcal B}$. Then
\[ A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} \]
(a block-diagonal matrix with blocks $A_1$ and $A_2$). ♦

First let us note that if $\varphi$ is not nonsingular, we may “split off” its singular part.

Definition 6.2.11. Let $\varphi$ be a form on $V$. The kernel of $\varphi$ is the subspace of $V$ given by
\[ \operatorname{Ker}(\varphi) = \bigl\{v \in V \mid \varphi(v, w) = \varphi(w, v) = 0 \text{ for all } w \in V\bigr\}. \] ♦

Remark 6.2.12. By Lemma 6.1.10, $\varphi$ is nonsingular if and only if $\operatorname{Ker}(\varphi) = 0$. ♦
Lemma 6.2.13. Let $\varphi$ be a form on $V$. Then $V$ is the orthogonal direct sum
\[ V = \operatorname{Ker}(\varphi) \perp V_1 \]
for some subspace $V_1$, with $\varphi_1 = \varphi|_{V_1}$ a nonsingular form on $V_1$, and $(V_1, \varphi_1)$ is well-defined up to isometry.

Proof. Let $V_1$ be any complement of $\operatorname{Ker}(\varphi)$, so that $V = \operatorname{Ker}(\varphi) \oplus V_1$, and let $\varphi_1 = \varphi|_{V_1}$. Certainly $V = \operatorname{Ker}(\varphi) \perp V_1$. To see that $\varphi_1$ is nonsingular, suppose that $v_1 \in V_1$ with $\varphi(v_1, w_1) = 0$ for every $w_1 \in V_1$. Then $\varphi(v_1, w) = 0$ for every $w \in V$, so $v_1 \in \operatorname{Ker}(\varphi)$, i.e., $v_1 \in \operatorname{Ker}(\varphi) \cap V_1 = \{0\}$, and $v_1 = 0$.

There was a choice of $V_1$, but we claim that all choices yield isometric forms. To see this, let $V'$ be the quotient space $V/\operatorname{Ker}(\varphi)$. There is a well-defined form $\varphi'$ on $V'$ defined as follows: Let $\pi: V \to V/\operatorname{Ker}(\varphi)$ be the canonical projection. Let $v', w' \in V'$, and choose $v, w \in V$ with $v' = \pi(v)$ and $w' = \pi(w)$. Then $\varphi'(v', w') = \varphi(v, w)$. It is then easy to check that $\pi|_{V_1}$ gives an isometry from $(V_1, \varphi_1)$ to $(V', \varphi')$.

In light of this lemma, we usually concentrate on nonsingular forms. But we also have the following well-defined invariant of forms in general.

Definition 6.2.14. Let $V$ be finite dimensional and let $V$ admit the form $\varphi$. Then the rank of $\varphi$ is the dimension of $V_1$, where $V_1$ is the subspace given in Lemma 6.2.13. ♦
Definition 6.2.15. Let $W$ be a subspace of $V$. Then its orthogonal subspace is the subspace
\[ W^\perp = \bigl\{v \in V \mid \varphi(w, v) = 0 \text{ for all } w \in W\bigr\}. \] ♦

Lemma 6.2.16. Let $V$ be a finite-dimensional vector space. Let $W$ be a subspace of $V$ and let $\psi = \varphi|_W$. If $\psi$ is nonsingular, then $V = W \perp W^\perp$. If $\varphi$ is nonsingular as well, then $\psi^\perp = \varphi|_{W^\perp}$ is nonsingular.

Proof. Clearly $W$ and $W^\perp$ are orthogonal, so to show that $V = W \perp W^\perp$ it suffices to show that $V = W \oplus W^\perp$.

Let $v_0 \in W \cap W^\perp$. Then $v_0 \in W^\perp$, so $\varphi(w, v_0) = 0$ for all $w \in W$. But $v_0 \in W$ as well, so $\psi(w, v_0) = \varphi(w, v_0)$, and then the nonsingularity of $\psi$ implies $v_0 = 0$.

Let $v_0 \in V$. Then $T(w) = \varphi(w, v_0)$ is a linear transformation $T: W \to F$, and we are assuming $\psi$ is nonsingular, so by Remark 6.1.9 there is a $w_0 \in W$ with $T(w) = \psi(w, w_0) = \varphi(w, w_0)$ for every $w \in W$. Then $\varphi(w, v_0 - w_0) = 0$ for every $w \in W$, so $v_0 - w_0 \in W^\perp$, and $v_0 = w_0 + (v_0 - w_0)$.

Suppose $\varphi$ is nonsingular and let $v_0 \in W^\perp$, $v_0 \ne 0$. Then there is a vector $v \in V$ with $\varphi(v, v_0) \ne 0$. Write $v = w_1 + w_2$ with $w_1 \in W$, $w_2 \in W^\perp$. Then
\[ 0 \ne \varphi(v, v_0) = \varphi(w_1 + w_2, v_0) = \varphi(w_1, v_0) + \varphi(w_2, v_0) = \varphi(w_2, v_0), \]
so $\varphi|_{W^\perp}$ is nonsingular.
Remark 6.2.17. The condition that $\varphi|_W$ be nonsingular is necessary. For example, if $\varphi$ is the form on $F^2$ defined by
\[ \varphi(v, w) = {}^tv\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}w \]
and $W$ is the subspace
\[ W = \left\{\begin{bmatrix} x \\ 0 \end{bmatrix}\right\}, \]
then $W = W^\perp$. ♦

Corollary 6.2.18. Let $V$ be a finite-dimensional vector space and let $W$ be a subspace of $V$ with $\varphi|_W$ and $\varphi|_{W^\perp}$ both nonsingular. Then $(W^\perp)^\perp = W$.

Proof. We have $V = W \perp W^\perp = W^\perp \perp (W^\perp)^\perp$. It is easy to check that $(W^\perp)^\perp \supseteq W$, so they are equal.
Our goal now is to “simplify”, and in favorable cases classify, forms on finite-dimensional vector spaces. Lemma 6.2.16 is an important tool that enables us to apply inductive arguments. Here is another important tool, and a result interesting in its own right.

Lemma 6.2.19. Let $V$ be a vector space over $F$, and let $V$ admit the nonsingular form $\varphi$. If $\operatorname{char}(F) \ne 2$, assume $\varphi$ is symmetric or Hermitian. If $\operatorname{char}(F) = 2$, assume $\varphi$ is Hermitian. Then there is a vector $v \in V$ with $\varphi(v, v) \ne 0$.

Proof. Pick a nonzero vector $v_1 \in V$. If $\varphi(v_1, v_1) \ne 0$, then set $v = v_1$. If $\varphi(v_1, v_1) = 0$ then, by the nonsingularity of $\varphi$, there is a vector $v_2$ with $b = \varphi(v_1, v_2) \ne 0$. If $\varphi(v_2, v_2) \ne 0$, set $v = v_2$. Otherwise, let $v_3 = av_1 + v_2$, where $a \in F$ is an arbitrary scalar. Then
\[ \varphi(v_3, v_3) = \varphi(av_1 + v_2, av_1 + v_2) = \varphi(av_1, av_1) + \varphi(av_1, v_2) + \varphi(v_2, av_1) + \varphi(v_2, v_2) = \varphi(av_1, v_2) + \varphi(v_2, av_1), \]
which equals $2ab$ if $\varphi$ is symmetric, and $ab + \overline{ab}$ if $\varphi$ is Hermitian.

In the symmetric case, choose $a \ne 0$ arbitrarily. In the Hermitian case, let $a$ be any element of $F$ with $ab \ne -\overline{ab}$. (If $\operatorname{char}(F) \ne 2$ we may choose $a = b^{-1}$. If $\operatorname{char}(F) = 2$ we may choose $a = b^{-1}c$, where $c \in F$ with $c \ne \bar c$.) Then set $v = v_3$ for this choice of $a$.
Remark 6.2.20. The conclusion of this lemma does not hold if $\operatorname{char}(F) = 2$. For example, let $F$ be a field of characteristic 2, let $V = F^2$, and let $\varphi$ be the form defined on $V$ by
\[ \varphi(v, w) = {}^tv\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}w. \]
Then it is easy to check that $\varphi(v, v) = 0$ for every $v \in V$. ♦

Thus we make the following definition.

Definition 6.2.21. Let $V$ be a vector space over a field $F$ of characteristic 2 and let $\varphi$ be a symmetric bilinear form on $V$. Then $\varphi$ is even if $\varphi(v, v) = 0$ for every $v \in V$, and odd otherwise. ♦

Lemma 6.2.22. Let $V$ be a vector space over a field $F$ of characteristic 2 and let $\varphi$ be a symmetric bilinear form on $V$. Then $\varphi$ is even if and only if for some (and hence for every) basis $\mathcal B = \{v_1, v_2, \dots\}$ of $V$, $\varphi(v_i, v_i) = 0$ for every $v_i \in \mathcal B$.

Proof. This follows immediately from the identity
\[ \varphi(v + w, v + w) = \varphi(v, v) + \varphi(v, w) + \varphi(w, v) + \varphi(w, w) = \varphi(v, v) + 2\varphi(v, w) + \varphi(w, w) = \varphi(v, v) + \varphi(w, w), \]
since $2 = 0$ in $F$.
Here is our first simplification.
Definition 6.2.23. Let $V$ be a finite-dimensional vector space and let $\varphi$ be a symmetric bilinear or a Hermitian form on $V$. Then $\varphi$ is diagonalizable if there are 1-dimensional subspaces $V_1, V_2, \dots, V_n$ of $V$ such that
\[ V = V_1 \perp V_2 \perp \cdots \perp V_n. \] ♦

Remark 6.2.24. Let us see where the name comes from. Choose a nonzero vector $v_i$ in $V_i$ for each $i$ (so $\{v_i\}$ is a basis for $V_i$) and let $a_i = \varphi(v_i, v_i)$. Let $\mathcal B$ be the basis of $V$ given by $\mathcal B = \{v_1, \dots, v_n\}$. Then
\[ [\varphi]_{\mathcal B} = \begin{bmatrix} a_1 & & & \\ & a_2 & & \\ & & \ddots & \\ & & & a_n \end{bmatrix} \]
is a diagonal matrix. Conversely, if $V$ has a basis $\mathcal B = \{v_1, \dots, v_n\}$ with $[\varphi]_{\mathcal B}$ diagonal, then $V = V_1 \perp \cdots \perp V_n$, where $V_i$ is the subspace spanned by $v_i$. ♦

Remark 6.2.25. We will let $[a]$ denote the bilinear or Hermitian form on $F$ (an $F$-vector space) with matrix $[a]$, i.e., the bilinear form given by $\varphi(x, y) = xay$, or the Hermitian form given by $\varphi(x, y) = xa\bar y$. In this notation a form $\varphi$ on $V$ is diagonalizable if and only if it is isometric to $[a_1] \perp \cdots \perp [a_n]$ for some $a_1, \dots, a_n \in F$. ♦
Theorem 6.2.26. Let $V$ be a finite-dimensional vector space over a field $F$ of characteristic $\ne 2$, and let $\varphi$ be a symmetric or Hermitian form on $V$. Then $\varphi$ is diagonalizable. If $\operatorname{char}(F) = 2$ and $\varphi$ is Hermitian, then $\varphi$ is diagonalizable.

Proof. We only prove the case $\operatorname{char}(F) \ne 2$.

By Lemma 6.2.13, it suffices to consider the case where $\varphi$ is nonsingular. We proceed by induction on the dimension of $V$.

If $V$ is 1-dimensional, there is nothing to prove. Suppose the theorem is true for all vector spaces of dimension less than $n$, and let $V$ have dimension $n$.

By Lemma 6.2.19, there is an element $v_1$ of $V$ with $\varphi(v_1, v_1) = a_1 \ne 0$. Let $V_1 = \operatorname{Span}(v_1)$. Then, by Lemma 6.2.16, $V = V_1 \perp V_1^\perp$ and $\varphi|_{V_1^\perp}$ is nonsingular. Then by induction $V_1^\perp = V_2 \perp \cdots \perp V_n$ for 1-dimensional subspaces $V_2, \dots, V_n$, so $V = V_1 \perp V_2 \perp \cdots \perp V_n$, as required.
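This inductive proof is effective and can be turned into an algorithm. Here is a sketch in Python with numpy for a real symmetric matrix (the function diagonalize_form is ours; it follows the proof, assumes $\operatorname{char}(F) \ne 2$, and does not attempt numerical pivoting):

    import numpy as np

    def diagonalize_form(A, tol=1e-12):
        # Returns (P, D) with P.T @ A @ P = D diagonal, by finding a vector
        # v with phi(v, v) != 0 (Lemma 6.2.19) and splitting off its span
        # (Lemma 6.2.16), as in the proof of Theorem 6.2.26.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        P = np.eye(n)
        for k in range(n):
            if abs(A[k, k]) < tol:
                # phi(v_k, v_k) = 0: try replacing v_k by v_k + v_j so that
                # the new diagonal entry A[k,k] + 2 A[k,j] + A[j,j] is nonzero.
                for j in range(k + 1, n):
                    if abs(A[k, j]) > tol and abs(A[k, k] + 2 * A[k, j] + A[j, j]) > tol:
                        E = np.eye(n); E[j, k] = 1.0
                        A = E.T @ A @ E; P = P @ E
                        break
            if abs(A[k, k]) < tol:
                continue              # phi is singular on what remains
            E = np.eye(n)             # make the later basis vectors
            E[k, k + 1:] = -A[k, k + 1:] / A[k, k]   # orthogonal to v_k
            A = E.T @ A @ E; P = P @ E
        return P, A

    A = np.array([[0.0, 1.0], [1.0, 0.0]])   # the hyperbolic plane
    P, D = diagonalize_form(A)
    print(np.round(D, 10))                    # diag(2, -0.5)
    print(np.allclose(P.T @ A @ P, D))        # True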
The theorem immediately gives us a classification of forms on complex vector spaces.

Corollary 6.2.27. Let $\varphi$ be a nonsingular symmetric bilinear form on $V$, where $V$ is an $n$-dimensional vector space over $\mathbb C$. Then $\varphi$ is isometric to $[1] \perp \cdots \perp [1]$. In particular, any two such forms are isometric.

Proof. By Theorem 6.2.26, $V = V_1 \perp \cdots \perp V_n$, where $V_i$ has basis $\{v_i\}$. Let $a_i = \varphi(v_i, v_i)$. If $b_i$ is a complex number with $b_i^2 = 1/a_i$ and $\mathcal B$ is the basis $\mathcal B = \{b_1v_1, \dots, b_nv_n\}$ of $V$, then
\[ [\varphi]_{\mathcal B} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}. \]
The classification of symmetric forms over $\mathbb R$, or Hermitian forms over $\mathbb C$, is more interesting. Whether we can solve $b_i^2 = 1/a_i$ over $\mathbb R$, or $b_i\bar b_i = 1/a_i$ over $\mathbb C$, comes down to the sign of $a_i$. (Recall that in the Hermitian case $a_i$ must be real.)

Before developing this classification, we introduce a notion interesting and important in itself.

Definition 6.2.28. Let $\varphi$ be a symmetric bilinear form on the real vector space $V$, or a Hermitian form on the complex vector space $V$. Then $\varphi$ is positive definite if $\varphi(v, v) > 0$ for every $v \in V$, $v \ne 0$, and $\varphi$ is negative definite if $\varphi(v, v) < 0$ for every $v \in V$, $v \ne 0$. It is indefinite if there are vectors $v_1, v_2 \in V$ with $\varphi(v_1, v_1) > 0$ and $\varphi(v_2, v_2) < 0$. ♦
Theorem 6.2.29 (Sylvester's law of inertia). Let $V$ be a finite-dimensional real vector space and let $\varphi$ be a nonsingular symmetric bilinear form on $V$, or let $V$ be a finite-dimensional complex vector space and let $\varphi$ be a nonsingular Hermitian form on $V$. Then $\varphi$ is isometric to $p[1] \perp q[-1]$ for well-defined integers $p$ and $q$ with $p + q = n = \dim(V)$.

Proof. As in the proof of Corollary 6.2.27, we have that $\varphi$ is isometric to $p[1] \perp q[-1]$ for some integers $p$ and $q$ with $p + q = n$. We must show that $p$ and $q$ are well-defined.

To do so, let $V_+$ be a subspace of $V$ of largest dimension with $\varphi|_{V_+}$ positive definite and let $V_-$ be a subspace of $V$ of largest dimension with $\varphi|_{V_-}$ negative definite. Let $p' = \dim(V_+)$ and $q' = \dim(V_-)$. Clearly $p'$ and $q'$ are well-defined. We shall show that $p = p'$ and $q = q'$. We argue by contradiction.

Let $\mathcal B$ be a basis of $V$ with $[\varphi]_{\mathcal B} = p[1] \perp q[-1]$. If $\mathcal B = \{v_1, \dots, v_n\}$, let $\mathcal B_+ = \{v_1, \dots, v_p\}$ and $\mathcal B_- = \{v_{p+1}, \dots, v_n\}$. If $W_+$ is the space spanned by $\mathcal B_+$, then $\varphi|_{W_+}$ is positive definite, so $p' \ge p$. If $W_-$ is the space spanned by $\mathcal B_-$, then $\varphi|_{W_-}$ is negative definite, so $q' \ge q$. Now $p + q = n$, so $p' + q' \ge n$. Suppose it is not the case that $p = p'$ and $q = q'$. Then $p' + q' > n$, i.e., $\dim(V_+) + \dim(V_-) > n$. Then $V_+ \cap V_-$ has dimension at least one, so contains a nonzero vector $v$. Then $\varphi(v, v) > 0$ as $v \in V_+$, but $\varphi(v, v) < 0$ as $v \in V_-$, which is impossible.
We make part of the proof explicit.

Corollary 6.2.30. Let $V$ and $\varphi$ be as in Theorem 6.2.29. Let $p'$ be the largest dimension of a subspace $V_+$ of $V$ with $\varphi|_{V_+}$ positive definite and let $q'$ be the largest dimension of a subspace $V_-$ of $V$ with $\varphi|_{V_-}$ negative definite. If $\varphi$ is isometric to $p[1] \perp q[-1]$, then $p = p'$ and $q = q'$. In particular, $\varphi$ is positive definite if and only if $\varphi$ is isometric to $n[1]$.

We can now define a very important invariant of these forms.

Definition 6.2.31. Let $V$, $\varphi$, $p$, and $q$ be as in Theorem 6.2.29. Then the signature of $\varphi$ is $p - q$. ♦

Corollary 6.2.32. A nonsingular symmetric bilinear form on a finite-dimensional vector space $V$ over $\mathbb R$, or a nonsingular Hermitian form on a finite-dimensional vector space $V$ over $\mathbb C$, is classified up to isometry by its rank and signature.
Remark 6.2.33. Here is one way in which these notions appear. Let $f: \mathbb R^n \to \mathbb R$ be a $C^2$ function and let $x_0$ be a critical point of $f$. Let $H$ be the Hessian matrix of $f$ at $x_0$. Then $f$ has a local minimum at $x_0$ if $H$ is positive definite and a local maximum at $x_0$ if $H$ is negative definite. If $H$ is indefinite, then $x_0$ is neither a local maximum nor a local minimum for $f$. ♦
We have the following useful criterion.

Theorem 6.2.34 (Hurwitz's criterion). Let $\varphi$ be a nonsingular symmetric bilinear form on the $n$-dimensional real vector space $V$, or a nonsingular Hermitian form on the $n$-dimensional complex vector space $V$. Let $\mathcal B = \{v_1, \dots, v_n\}$ be an arbitrary basis of $V$ and let $A = [\varphi]_{\mathcal B}$. Let $\delta_0(A) = 1$, and for $1 \le k \le n$ let $\delta_k(A) = \det(A_k)$, where $A_k$ is the $k$-by-$k$ submatrix in the upper left corner of $A$. Then

(1) $\varphi$ is positive definite if and only if $\delta_k(A) > 0$ for $k = 1, \dots, n$.

(2) $\varphi$ is negative definite if and only if $(-1)^k\delta_k(A) > 0$ for $k = 1, \dots, n$.

(3) If $\delta_k(A) \ne 0$ for $k = 1, \dots, n$, then the signature of $\varphi$ is $r - s$, where
\[ r = \#\bigl\{k \mid \delta_k(A) \text{ and } \delta_{k-1}(A) \text{ have the same sign}\bigr\}, \qquad s = \#\bigl\{k \mid \delta_k(A) \text{ and } \delta_{k-1}(A) \text{ have opposite signs}\bigr\}. \]
Proof. We prove (1). Then (2) follows immediately by considering the form $-\varphi$. We leave (3) to the reader; it can be proved using the ideas of the proof of (1).

We prove the theorem by induction on $n = \dim(V)$. If $n = 1$, the theorem is clear: $\varphi$ is positive definite if and only if $[\varphi]_{\mathcal B} = [a_1]$ with $a_1 > 0$. Suppose the theorem is true for all forms on vector spaces of dimension $n - 1$ and let $V$ have dimension $n$. Let $V_{n-1}$ be the subspace of $V$ spanned by $\mathcal B_{n-1} = \{v_1, \dots, v_{n-1}\}$, so that $A_{n-1} = [\varphi|_{V_{n-1}}]_{\mathcal B_{n-1}}$.

Suppose $\varphi$ is positive definite. Then $\varphi|_{V_{n-1}}$ is also positive definite (if $\varphi(v, v) > 0$ for all $v \ne 0$ in $V$, then $\varphi(v, v) > 0$ for all $v \ne 0$ in $V_{n-1}$). By the inductive hypothesis $\delta_1(A), \dots, \delta_{n-1}(A)$ are all positive. Also, since $\delta_{n-1}(A) \ne 0$, $\varphi|_{V_{n-1}}$ is nonsingular. Hence $V = V_{n-1} \perp V_{n-1}^\perp$, where $V_{n-1}^\perp$ is a 1-dimensional subspace generated by a vector $w_n$. Let $b_{nn} = \varphi(w_n, w_n)$, so $b_{nn} > 0$.

Let $\mathcal B'$ be the basis $\{v_1, \dots, v_{n-1}, w_n\}$. Then
\[ \det([\varphi]_{\mathcal B'}) = \delta_{n-1}(A)\,b_{nn} > 0. \]
By Theorem 6.1.14, if $P$ is the change of basis matrix $P_{\mathcal B' \leftarrow \mathcal B}$, then
\[ \det([\varphi]_{\mathcal B'}) = \det(P)^2\det(A) = \det(P)^2\delta_n(A) \text{ if $\varphi$ is symmetric,} \]
\[ \det([\varphi]_{\mathcal B'}) = \det(P)\overline{\det(P)}\det(A) = |\det(P)|^2\delta_n(A) \text{ if $\varphi$ is Hermitian,} \]
and in any case $\delta_n(A)$ has the same sign as $\det([\varphi]_{\mathcal B'})$, so $\delta_n(A) > 0$.

Conversely, suppose that $\delta_1(A), \dots, \delta_n(A)$ are all positive. By the inductive hypothesis $\varphi|_{V_{n-1}}$ is positive definite. Again let $V = V_{n-1} \perp V_{n-1}^\perp$ with $w_n$ as above. If $b_{nn} = \varphi(w_n, w_n) > 0$ then $\varphi$ is positive definite. The same argument shows that $\delta_{n-1}(A)\,b_{nn}$ has the same sign as $\delta_n(A)$. But $\delta_{n-1}(A)$ and $\delta_n(A)$ are both positive, so $b_{nn} > 0$.
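Hurwitz's criterion is easy to apply by machine. The following Python sketch (numpy; the function names are ours) computes the leading principal minors $\delta_k(A)$ and applies parts (1) and (3):

    import numpy as np

    def leading_minors(A):
        # delta_1(A), ..., delta_n(A): determinants of the upper-left blocks
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    def is_positive_definite(A):
        return all(d > 0 for d in leading_minors(A))

    def signature_from_minors(A):
        # Part (3), assuming all delta_k are nonzero.
        d = [1.0] + leading_minors(A)
        r = sum(1 for k in range(1, len(d)) if d[k] * d[k - 1] > 0)
        s = sum(1 for k in range(1, len(d)) if d[k] * d[k - 1] < 0)
        return r - s

    A = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
    print(leading_minors(A))          # approximately [2.0, 3.0, 4.0]
    print(is_positive_definite(A))    # True
    print(signature_from_minors(A))   # 3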
Here is a general formula for the signature of $\varphi$.

Theorem 6.2.35. Let $\varphi$ be a nonsingular symmetric bilinear form on the $n$-dimensional real vector space $V$ or a nonsingular Hermitian form on the $n$-dimensional complex vector space $V$. Let $\mathcal B$ be a basis for $V$ and let $A = [\varphi]_{\mathcal B}$. Then

(1) $A$ has $n$ real eigenvalues (counting multiplicity), and

(2) the signature of $\varphi$ is $r - s$, where $r$ is the number of positive eigenvalues and $s$ is the number of negative eigenvalues of $A$.
Proof. To prove this we need a result from the next chapter, Corollary 7.3.20, which states that every symmetric matrix is orthogonally diagonalizable and that every Hermitian matrix is unitarily diagonalizable. In other words, if $A$ is symmetric then there is an orthogonal matrix $P$, i.e., a matrix with ${}^tP = P^{-1}$, such that $D = PAP^{-1}$ is diagonal, and if $A$ is Hermitian there is a unitary matrix $P$, i.e., a matrix with ${}^t\bar P = P^{-1}$, such that $D = PAP^{-1}$ is diagonal (necessarily with real entries). In both cases the diagonal entries of $D$ are the eigenvalues of $A$, and $D = [\varphi]_{\mathcal C}$ for some basis $\mathcal C$.

Thus we see that $r - s$ is the number of positive entries on the diagonal of $D$ minus the number of negative entries on the diagonal of $D$.

Let $\mathcal C = \{v_1, \dots, v_n\}$. Reordering the elements of $\mathcal C$ if necessary, we may assume that the first $r$ diagonal entries of $D$ are positive and the remaining $s = n - r$ diagonal entries of $D$ are negative. Then $V = W_1 \perp W_2$, where $W_1$ is the subspace spanned by $\{v_1, \dots, v_r\}$ and $W_2$ is the subspace spanned by $\{v_{r+1}, \dots, v_n\}$. Then $\varphi|_{W_1}$ is positive definite and $\varphi|_{W_2}$ is negative definite, so the signature of $\varphi$ is equal to $\dim(W_1) - \dim(W_2) = r - s$.
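Theorem 6.2.35 gives a practical way to compute rank and signature. Here is a sketch in Python with numpy (the function name is ours; eigvalsh computes the real eigenvalues of a symmetric or Hermitian matrix):

    import numpy as np

    def rank_and_signature(A, tol=1e-10):
        # Rank and signature of the symmetric (or Hermitian) matrix A,
        # read off from its real eigenvalues as in Theorem 6.2.35.
        evals = np.linalg.eigvalsh(A)
        r = int(np.sum(evals > tol))      # positive eigenvalues
        s = int(np.sum(evals < -tol))     # negative eigenvalues
        return r + s, r - s

    A = np.array([[0.0, 1.0], [1.0, 0.0]])   # the hyperbolic plane
    print(rank_and_signature(A))              # (2, 0): isometric to [1] ⊥ [-1]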
Closely related to symmetric bilinear forms are quadratic forms.

Definition 6.2.36. Let $V$ be a vector space over $F$. A quadratic form on $V$ is a function $\Phi: V \to F$ satisfying

(1) $\Phi(av) = a^2\Phi(v)$ for any $a \in F$, $v \in V$,

(2) the function $\varphi: V \times V \to F$ defined by
\[ \varphi(x, y) = \Phi(x + y) - \Phi(x) - \Phi(y) \]
is a (necessarily symmetric) bilinear form on $V$.

We say that $\Phi$ and $\varphi$ are associated. ♦

Lemma 6.2.37. Let $V$ be a vector space over $F$ with $\operatorname{char}(F) \ne 2$. Then every quadratic form $\Phi$ is associated to a unique symmetric bilinear form, and conversely.

Proof. Clearly $\Phi$ determines $\varphi$. On the other hand, suppose that $\varphi$ is associated to $\Phi$. Then $4\Phi(x) = \Phi(2x) = \Phi(x + x) = 2\Phi(x) + \varphi(x, x)$, so
\[ \Phi(x) = \tfrac12\varphi(x, x), \]
and $\varphi$ determines $\Phi$ as well.
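This correspondence is easy to check numerically. In the following Python sketch (numpy; the setup is ours), $\Phi(x) = \tfrac12\,{}^tx A x$ for a symmetric $A$, and the associated bilinear form is recovered from $\Phi$:

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 4.0]])   # symmetric; char F != 2

    def Phi(x):                               # the quadratic form
        return 0.5 * (x @ A @ x)

    def phi(x, y):                            # its associated bilinear form
        return Phi(x + y) - Phi(x) - Phi(y)

    x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print(np.isclose(phi(x, y), x @ A @ y))       # True: phi has matrix A
    print(np.isclose(Phi(x), 0.5 * phi(x, x)))    # True: Lemma 6.2.37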
In characteristic 2 the situation is considerably more subtle, and we simply state the results without proof. For an integer $m$ let $e(m) = 2^{m-1}(2^m + 1)$ and $o(m) = 2^{m-1}(2^m - 1)$.

Theorem 6.2.38. (1) Let $\varphi$ be a symmetric bilinear form on a vector space $V$ of dimension $n$ over the field $F$ of 2 elements. Then $\varphi$ is associated to a quadratic form $\Phi$ if and only if $\varphi$ is even (in the sense of Definition 6.2.21). In this case there are $2^n$ quadratic forms associated to $\varphi$. Each such quadratic form $\Phi$ is called a quadratic refinement of $\varphi$.

(2) Let $\varphi$ be a nonsingular even symmetric bilinear form on a vector space $V$ of necessarily even dimension $n = 2m$ over $F$, and let $\Phi$ be a quadratic refinement of $\varphi$.

The Arf invariant of $\Phi$ is defined as follows: Let $|\cdot|$ denote the cardinality of a set. Then either
\[ |\Phi^{-1}(0)| = e(m) \text{ and } |\Phi^{-1}(1)| = o(m), \text{ in which case } \operatorname{Arf}(\Phi) = 0, \]
or
\[ |\Phi^{-1}(0)| = o(m) \text{ and } |\Phi^{-1}(1)| = e(m), \text{ in which case } \operatorname{Arf}(\Phi) = 1. \]
Then there are $e(m)$ quadratic refinements $\Phi$ of $\varphi$ with $\operatorname{Arf}(\Phi) = 0$ and $o(m)$ quadratic refinements $\Phi$ of $\varphi$ with $\operatorname{Arf}(\Phi) = 1$.

(3) Quadratic refinements of a nonsingular even symmetric bilinear form on a finite-dimensional vector space $V$ are classified up to isometry by their rank ($= \dim(V)$) and Arf invariant.

Proof. Omitted.
Example 6.2.39. We now give a classical application of our earlier results. Let
\[ V = F^n = \left\{\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\right\}, \]
$F$ a field of characteristic $\ne 2$, and suppose we have a function $Q: V \to F$ of the form
\[ Q\left(\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\right) = \frac12\sum_i a_{ii}x_i^2 + \sum_{i<j} a_{ij}x_ix_j. \]
Then $Q$ is a quadratic form associated to the symmetric bilinear form $q$, where $[q]_{\mathcal E}$ is the matrix $A = (a_{ij})$. Then $[q]_{\mathcal E}$ is diagonalizable, and that provides a diagonalization of $Q$ in the obvious sense. In other words, there is a nonsingular change of variable
\[ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \mapsto \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \quad\text{such that}\quad Q\left(\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}\right) = \sum_i b_{ii}y_i^2 \]
for some $b_{11}, b_{22}, \dots, b_{nn} \in F$. If $F = \mathbb R$ we may choose each $b_{ii} = \pm 1$.

Most interesting is the following: Let $F = \mathbb R$ and suppose that
\[ Q\left(\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\right) > 0 \quad\text{whenever}\quad \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \ne \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}. \]
Then $q$ is positive definite, and we call $Q$ positive definite in this case as well. We then see that for an appropriate change of variable
\[ Q\left(\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\right) = \sum_{i=1}^n y_i^2. \]
That is, over $\mathbb R$ every positive definite quadratic form can be expressed as a sum of squares. ♦
Let us now classify skew-symmetric bilinear forms.

Theorem 6.2.40. Let $V$ be a vector space of finite dimension $n$ over an arbitrary field $F$, and let $\varphi$ be a nonsingular skew-symmetric bilinear form on $V$. Then $n$ is even and $\varphi$ is isometric to $(n/2)\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, or, equivalently, to $\begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$, where $I$ is the $(n/2)$-by-$(n/2)$ identity matrix.

Proof. We proceed by induction on $n$. If $n = 1$ and $\varphi$ is skew-symmetric, then we must have $[\varphi]_{\mathcal B} = [0]$, which is singular, so that case cannot occur. Suppose the theorem is true for all vector spaces of dimension less than $n$ and let $V$ have dimension $n$.

Choose $v_1 \in V$, $v_1 \ne 0$. Then, since $\varphi$ is nonsingular, there exists $w \in V$ with $\varphi(v_1, w) = a \ne 0$, and $w$ is not a multiple of $v_1$ as $\varphi$ is skew-symmetric. Let $v_2 = (1/a)w$, let $\mathcal B_1 = \{v_1, v_2\}$, and let $V_1$ be the subspace of $V$ spanned by $\mathcal B_1$. Then $[\varphi|_{V_1}]_{\mathcal B_1} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. $V_1$ is a nonsingular subspace, so, by Lemma 6.2.16, $V = V_1 \perp V_1^\perp$. Now $\dim(V_1^\perp) = n - 2$, so we may assume by induction that $V_1^\perp$ has a basis $\mathcal B_2$ with $[\varphi|_{V_1^\perp}]_{\mathcal B_2} = ((n-2)/2)\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Let $\mathcal B = \mathcal B_1 \cup \mathcal B_2$. Then $[\varphi]_{\mathcal B} = (n/2)\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.

Finally, if $\mathcal B = \{v_1, \dots, v_n\}$, let $\mathcal B' = \{v_1, v_3, \dots, v_{n-1}, v_2, v_4, \dots, v_n\}$. Then $[\varphi]_{\mathcal B'} = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$.
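This proof, too, is effective. Here is a sketch of the resulting algorithm in Python with numpy (the function symplectic_basis is ours; it assumes the input is nonsingular and skew-symmetric, and does not pivot for numerical stability):

    import numpy as np

    def symplectic_basis(A, tol=1e-12):
        # Returns P with P.T @ A @ P block diagonal with n/2 copies of
        # [[0, 1], [-1, 0]], following the proof of Theorem 6.2.40.
        f = lambda x, y: x @ A @ y            # phi(x, y) = (t x) A y
        remaining = list(np.eye(A.shape[0]))
        new_basis = []
        while remaining:
            v1 = remaining.pop(0)
            j = next(k for k, w in enumerate(remaining) if abs(f(v1, w)) > tol)
            w = remaining.pop(j)
            v2 = w / f(v1, w)                 # now f(v1, v2) = 1, f(v2, v1) = -1
            # project the rest into the phi-orthogonal complement of span{v1, v2}
            remaining = [u - f(u, v2) * v1 + f(u, v1) * v2 for u in remaining]
            new_basis += [v1, v2]
        return np.column_stack(new_basis)

    J = np.array([[0.0, 2.0, 1.0, 0.0],
                  [-2.0, 0.0, 0.0, 3.0],
                  [-1.0, 0.0, 0.0, 1.0],
                  [0.0, -3.0, -1.0, 0.0]])
    P = symplectic_basis(J)
    print(np.round(P.T @ J @ P, 10))          # two blocks [[0,1],[-1,0]]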
Finally, we consider skew-Hermitian forms. In this case, by convention, the field $F$ of scalars has $\operatorname{char}(F) \ne 2$. We begin with a result about $F$ itself.

Lemma 6.2.41. Let $F$ be a field with $\operatorname{char}(F) \ne 2$ equipped with a nontrivial conjugation $c \mapsto \bar c$. Then:

(1) $F_0 = \{c \in F \mid \bar c = c\}$ is a subfield of $F$.

(2) There is a nonzero element $j \in F$ with $\bar j = -j$.

(3) Every element of $F$ can be written uniquely as $c = c_1 + jc_2$ with $c_1, c_2 \in F_0$ (so that $F$ is a 2-dimensional $F_0$-vector space with basis $\{1, j\}$). In particular, $\bar c = -c$ if and only if $c = jc_2$ for some $c_2 \in F_0$.

Proof. (1) is easy to check. (Note that $\bar 1 = \overline{1 \cdot 1} = \bar 1\,\bar 1$, so $\bar 1 = 1$.)

(2) Let $c$ be any element of $F$ with $\bar c \ne c$ and let $j = (c - \bar c)/2$.

(3) Observe that $c = c_1 + jc_2$ with $c_1 = (c + \bar c)/2$ and $c_2 = (c - \bar c)/2j$. It is easy to check that $c_1, c_2 \in F_0$.

Also, if $c = c_1 + jc_2$ with $c_1, c_2 \in F_0$, then $\bar c = c_1 - jc_2$ and, solving for $c_1$ and $c_2$, we obtain $c_1 = (c + \bar c)/2$ and $c_2 = (c - \bar c)/2j$, so the expression is unique.

Remark 6.2.42. If $F = \mathbb C$ and the conjugation is complex conjugation, then $F_0 = \mathbb R$ and we may choose $j = i$. ♦
Theorem 6.2.43. Let $V$ be a finite-dimensional vector space and let $\varphi$ be a nonsingular skew-Hermitian form on $V$. Then $\varphi$ is diagonalizable, i.e., $\varphi$ is isometric to $[a_1] \perp \cdots \perp [a_n]$ with $a_i \in F$, $a_i \ne 0$, $\bar a_i = -a_i$, or equivalently $a_i = jb_i$ with $b_i \in F_0$, $b_i \ne 0$, for each $i$.

Proof. First we claim there is a vector $v \in V$ with $\varphi(v, v) \ne 0$. Choose $v_1 \in V$, $v_1 \ne 0$, arbitrarily. If $\varphi(v_1, v_1) \ne 0$, choose $v = v_1$. Otherwise, since $\varphi$ is nonsingular there is a vector $v_2 \in V$ with $\varphi(v_1, v_2) = a \ne 0$. (Then $\varphi(v_2, v_1) = -\bar a$.) If $\varphi(v_2, v_2) \ne 0$, choose $v = v_2$. Otherwise, for any $c \in F$, let $v_3 = v_1 + cv_2$. We easily compute that $\varphi(v_3, v_3) = \bar ca - c\bar a = \bar ca - \overline{(\bar ca)}$. Thus if we let $v = v_1 + (j/\bar a)v_2$, then $\varphi(v, v) = -2j \ne 0$.

Now proceed as in the proof of Theorem 6.2.26.
Corollary 6.2.44. Let $V$ be a complex vector space of dimension $n$ and let $\varphi$ be a nonsingular skew-Hermitian form on $V$. Then $\varphi$ is isometric to $r[i] \perp s[-i]$ for well-defined integers $r$ and $s$ with $r + s = n$.

Proof. By Theorem 6.2.43, $V$ has a basis $\mathcal B = \{v_1, \dots, v_n\}$ with $[\varphi]_{\mathcal B}$ diagonal with entries $ib_1, \dots, ib_n$ for nonzero real numbers $b_1, \dots, b_n$. Letting $\mathcal B' = \{v'_1, \dots, v'_n\}$ with $v'_i = \sqrt{1/|b_i|}\,v_i$, we see that $[\varphi]_{\mathcal B'}$ is diagonal with all diagonal entries $\pm i$. It remains to show that the numbers $r$ of $+i$ entries and $s$ of $-i$ entries are well-defined.

The proof is almost identical to the proof of Theorem 6.2.29, the only difference being that instead of considering $\varphi(v, v)$ we consider $(1/i)\varphi(v, v)$.
6.3 The adjoint of a linear transformation

We now return to the general situation. We assume in this section that $(V, \varphi)$ and $(W, \psi)$ are nonsingular, where the forms $\varphi$ and $\psi$ are either both bilinear or both sesquilinear. Given a linear transformation $T: V \to W$, we define its adjoint $T^{\mathrm{adj}}: W \to V$. We then investigate properties of the adjoint.

Definition 6.3.1. Let $T: V \to W$ be a linear transformation. The adjoint of $T$ is the linear transformation $T^{\mathrm{adj}}: W \to V$ defined by
\[ \psi(T(x), y) = \varphi(x, T^{\mathrm{adj}}(y)) \quad\text{for all } x \in V,\ y \in W. \] ♦
This is a rather complicated definition, and the first thing we need to see is that it in fact makes sense.

Lemma 6.3.2. $T^{\mathrm{adj}}: W \to V$, as given in Definition 6.3.1, is a well-defined linear transformation.

Proof. We give two proofs, the first more concrete and the second more abstract.

The first proof proceeds in two steps. The first step is to observe that the formula $\varphi(x, z) = \psi(T(x), y)$, where $x \in V$ is arbitrary and $y \in W$ is any fixed element, defines a unique element $z$ of $V$, since $\varphi$ is nonsingular. Hence $T^{\mathrm{adj}}(y) = z$ is well-defined. The second step is to show that $T^{\mathrm{adj}}$ is a linear transformation. We compute, for $x \in V$ arbitrary,
\[ \varphi(x, T^{\mathrm{adj}}(y_1 + y_2)) = \psi(T(x), y_1 + y_2) = \psi(T(x), y_1) + \psi(T(x), y_2) = \varphi(x, T^{\mathrm{adj}}(y_1)) + \varphi(x, T^{\mathrm{adj}}(y_2)) \]
and
\[ \varphi(x, T^{\mathrm{adj}}(cy)) = \psi(T(x), cy) = \bar c\,\psi(T(x), y) = \bar c\,\varphi(x, T^{\mathrm{adj}}(y)) = \varphi(x, cT^{\mathrm{adj}}(y)) \]
(in the bilinear case the conjugations are simply omitted).

For the second proof, we first consider the bilinear case. The formula in Definition 6.3.1 is equivalent to
\[ \alpha_\varphi(T^{\mathrm{adj}}(y))(x) = \alpha_\psi(y)(T(x)) = T^*(\alpha_\psi(y))(x), \]
where $T^*: W^* \to V^*$ is the dual of $T$, which gives
\[ T^{\mathrm{adj}} = \alpha_\varphi^{-1} \circ T^* \circ \alpha_\psi. \]
In the sesquilinear case we have a bit more work to do, since $\alpha_\varphi$ and $\alpha_\psi$ are conjugate linear rather than linear. The formula in Definition 6.3.1 is equivalent to $\overline{\psi(T(x), y)} = \overline{\varphi(x, T^{\mathrm{adj}}(y))}$. Define $\bar\alpha_\varphi$ by $\bar\alpha_\varphi(y)(x) = \overline{\varphi(x, y)}$, and define $\bar\alpha_\psi$ similarly. Then $\bar\alpha_\varphi$ and $\bar\alpha_\psi$ are linear transformations and by the same logic we obtain
\[ T^{\mathrm{adj}} = \bar\alpha_\varphi^{-1} \circ T^* \circ \bar\alpha_\psi. \]
Remark 6.3.3. $T^{\mathrm{adj}}$ is often denoted by $T^*$, but we will not use that notation in this section, as we are also considering $T^*$, the dual of $T$, here. ♦

Suppose $V$ and $W$ are finite dimensional. Then, since $T^{\mathrm{adj}}: W \to V$ is a linear transformation, once we have chosen bases we may represent $T^{\mathrm{adj}}$ by a matrix.

Lemma 6.3.4. Let $\mathcal B$ and $\mathcal C$ be bases of $V$ and $W$ respectively and let $P = [\varphi]_{\mathcal B}$ and $Q = [\psi]_{\mathcal C}$. Then
\[ [T^{\mathrm{adj}}]_{\mathcal B \leftarrow \mathcal C} = P^{-1}\,{}^t[T]_{\mathcal C \leftarrow \mathcal B}\,Q \quad\text{if $\varphi$ and $\psi$ are bilinear,} \]
and
\[ [T^{\mathrm{adj}}]_{\mathcal B \leftarrow \mathcal C} = \overline{P^{-1}\,{}^t[T]_{\mathcal C \leftarrow \mathcal B}\,Q} \quad\text{if $\varphi$ and $\psi$ are sesquilinear.} \]
In particular, if $V = W$, $\varphi = \psi$, and $\mathcal B = \mathcal C$, with $P = [\varphi]_{\mathcal B}$, then
\[ [T^{\mathrm{adj}}]_{\mathcal B} = P^{-1}\,{}^t[T]_{\mathcal B}\,P \quad\text{if $\varphi$ is bilinear,} \]
and
\[ [T^{\mathrm{adj}}]_{\mathcal B} = \overline{P^{-1}\,{}^t[T]_{\mathcal B}\,P} \quad\text{if $\varphi$ is sesquilinear.} \]

Proof. Again we give two proofs, the first more concrete and the second more abstract.

For the first proof, let $[T]_{\mathcal C \leftarrow \mathcal B} = M$ and $[T^{\mathrm{adj}}]_{\mathcal B \leftarrow \mathcal C} = N$. In the bilinear case,
\[ \psi(T(x), y) = {}^t(M[x]_{\mathcal B})Q[y]_{\mathcal C} = {}^t[x]_{\mathcal B}\,{}^tMQ\,[y]_{\mathcal C} \]
and
\[ \varphi(x, T^{\mathrm{adj}}(y)) = {}^t[x]_{\mathcal B}P(N[y]_{\mathcal C}) = {}^t[x]_{\mathcal B}\,PN\,[y]_{\mathcal C}, \]
from which we obtain
\[ {}^tMQ = PN \quad\text{and hence}\quad N = P^{-1}\,{}^tMQ. \]
In the sesquilinear case the same computation, keeping track of the conjugations, gives ${}^tMQ = P\bar N$, and hence $N = \overline{P^{-1}\,{}^tMQ}$.

For the second proof, let $\mathcal B = \{v_1, v_2, \dots\}$ and let $\mathcal B^* = \{v_1^*, v_2^*, \dots\}$ be the dual basis, and similarly for $\mathcal C$. Then, keeping track of conjugations, we know from the second proof of Lemma 6.3.2 that
\[ [T^{\mathrm{adj}}]_{\mathcal B \leftarrow \mathcal C} = [\alpha_\varphi]_{\mathcal B^* \leftarrow \mathcal B}^{-1}\,[T^*]_{\mathcal B^* \leftarrow \mathcal C^*}\,[\alpha_\psi]_{\mathcal C^* \leftarrow \mathcal C}. \]
But $[\alpha_\varphi]_{\mathcal B^* \leftarrow \mathcal B} = P$ and $[\alpha_\psi]_{\mathcal C^* \leftarrow \mathcal C} = Q$ (with conjugations in the sesquilinear case), and from Definition 2.4.1 and Lemma 2.4.2 we see that $[T^*]_{\mathcal B^* \leftarrow \mathcal C^*} = {}^t[T]_{\mathcal C \leftarrow \mathcal B}$.
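The formula of Lemma 6.3.4 can be spot-checked numerically. Here is a Python sketch (numpy; the setup is ours) for the case $V = W$, $\varphi = \psi$, $\mathcal B = \mathcal C$ with $\varphi$ bilinear:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    P = rng.standard_normal((n, n))       # P = [phi]_B, generically nonsingular
    M = rng.standard_normal((n, n))       # M = [T]_B
    N = np.linalg.inv(P) @ M.T @ P        # N = [T^adj]_B = P^{-1} (t M) P

    # Check the defining identity phi(T(x), y) = phi(x, T^adj(y))
    # with phi(x, y) = (t x) P y, on random coordinate vectors:
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    print(np.isclose((M @ x) @ P @ y, x @ P @ (N @ y)))   # True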
In one very important case this simplifies.

Definition 6.3.5. Let $V$ be a vector space and let $\varphi$ be a form on $V$. A basis $\mathcal B = \{v_1, v_2, \dots\}$ of $V$ is orthonormal if $\varphi(v_i, v_j) = \varphi(v_j, v_i) = 1$ if $i = j$ and $0$ if $i \ne j$. ♦

Remark 6.3.6. We see from Corollary 6.2.30 that if $F = \mathbb R$ or $\mathbb C$ then $V$ has an orthonormal basis if and only if $\varphi$ is real symmetric or complex Hermitian, and positive definite in either case. ♦

Corollary 6.3.7. Let $V$ and $W$ be finite-dimensional vector spaces with orthonormal bases $\mathcal B$ and $\mathcal C$ respectively. Let $T: V \to W$ be a linear transformation. Then
\[ [T^{\mathrm{adj}}]_{\mathcal B \leftarrow \mathcal C} = {}^t[T]_{\mathcal C \leftarrow \mathcal B} \quad\text{if $\varphi$ and $\psi$ are bilinear} \]
and
\[ [T^{\mathrm{adj}}]_{\mathcal B \leftarrow \mathcal C} = {}^t\overline{[T]_{\mathcal C \leftarrow \mathcal B}} \quad\text{if $\varphi$ and $\psi$ are sesquilinear.} \]
In particular, if $T: V \to V$ then
\[ [T^{\mathrm{adj}}]_{\mathcal B} = {}^t[T]_{\mathcal B} \quad\text{if $\varphi$ is bilinear} \]
and
\[ [T^{\mathrm{adj}}]_{\mathcal B} = {}^t\overline{[T]_{\mathcal B}} \quad\text{if $\varphi$ is sesquilinear.} \]

Proof. In this case, both $P$ and $Q$ are identity matrices.
Remark 6.3.8. There is an important generalization of the definition of the adjoint. We have seen in the proof of Lemma 6.3.2 that $T^{\mathrm{adj}}$ is defined by $\alpha_\varphi \circ T^{\mathrm{adj}} = T^* \circ \alpha_\psi$. Suppose now that $\alpha_\varphi$ (equivalently, $\bar\alpha_\varphi$) is injective but not surjective, which may occur when $V$ is infinite dimensional. Then $T^{\mathrm{adj}}$ may not be defined. But if $T^{\mathrm{adj}}$ is defined, then it is well-defined, i.e., if there is a linear transformation $S: W \to V$ satisfying $\psi(T(x), y) = \varphi(x, S(y))$ for every $x \in V$, $y \in W$, then there is a unique such linear transformation $S$, and we set $T^{\mathrm{adj}} = S$. ♦

Remark 6.3.9. (1) It is obvious, but worth noting, that if $\alpha_\varphi$ is injective then the identity $I: V \to V$ has adjoint $I^{\mathrm{adj}} = I$, as $\varphi(I(x), y) = \varphi(x, y) = \varphi(x, I(y))$ for every $x, y \in V$.

(2) On the other hand, if $\alpha_\varphi$ is not injective there is no hope of defining an adjoint. For suppose $V_0 = \operatorname{Ker}(\alpha_\varphi) \ne \{0\}$. Let $P_0: W \to V$ be any nonzero linear transformation with $P_0(W) \subseteq V_0$. If $S: W \to V$ is a linear transformation with $\psi(T(x), y) = \varphi(x, S(y))$, then $S_0 = S + P_0 \ne S$ also satisfies $\psi(T(x), y) = \varphi(x, S_0(y))$ for $x \in V$, $y \in W$. ♦
We state some basic properties of adjoints.

Lemma 6.3.10. (1) Suppose $T_1: V \to W$ and $T_2: V \to W$ both have adjoints. Then $T_1 + T_2: V \to W$ has an adjoint and $(T_1 + T_2)^{\mathrm{adj}} = T_1^{\mathrm{adj}} + T_2^{\mathrm{adj}}$.

(2) Suppose $T: V \to W$ has an adjoint. Then $cT: V \to W$ has an adjoint and $(cT)^{\mathrm{adj}} = \bar cT^{\mathrm{adj}}$ (with $\bar c = c$ in the bilinear case).

(3) Suppose $S: V \to W$ and $T: W \to X$ both have adjoints. Then $T \circ S: V \to X$ has an adjoint and $(T \circ S)^{\mathrm{adj}} = S^{\mathrm{adj}} \circ T^{\mathrm{adj}}$.

(4) Suppose $T: V \to V$ has an adjoint. Then for any polynomial $p(x) \in F[x]$, $p(T)$ has an adjoint and $(p(T))^{\mathrm{adj}} = \bar p(T^{\mathrm{adj}})$, where $\bar p$ denotes $p$ with its coefficients conjugated (so $\bar p = p$ in the bilinear case).
Lemma 6.3.11. Suppose that $\varphi$ and $\psi$ are either both symmetric, both Hermitian, both skew-symmetric, or both skew-Hermitian. If $T: V \to W$ has an adjoint, then $T^{\mathrm{adj}}: W \to V$ has an adjoint and $(T^{\mathrm{adj}})^{\mathrm{adj}} = T$.

Proof. We prove the Hermitian case, which is typical. Let $S = T^{\mathrm{adj}}$. By definition, $\psi(T(x), y) = \varphi(x, S(y))$ for $x \in V$, $y \in W$. Now $S$ has an adjoint $R$ if and only if $\varphi(S(y), x) = \psi(y, R(x))$. But
\[ \varphi(S(y), x) = \overline{\varphi(x, S(y))} = \overline{\psi(T(x), y)} = \psi(y, T(x)), \]
so $R = T$, i.e., $(T^{\mathrm{adj}})^{\mathrm{adj}} = T$.

We will present a number of interesting examples of and related to adjoints in Section 7.3 and in Section 7.4.
CHAPTER 7
Real and complex inner product spaces

In this chapter we consider real and complex vector spaces equipped with an inner product. An inner product is a special case of a symmetric bilinear form, in the real case, or of a Hermitian form, in the complex case. But it is a very important special case, one in which much more can be said than in general.
7.1 Basic definitions

We begin by defining the objects we will be studying.

Definition 7.1.1. An inner product $\varphi(x, y) = \langle x, y\rangle$ on a real vector space $V$ is a symmetric bilinear form with the property that $\langle v, v\rangle > 0$ for every $v \in V$, $v \ne 0$.

An inner product $\varphi(x, y) = \langle x, y\rangle$ on a complex vector space $V$ is a Hermitian form with the property that $\langle v, v\rangle > 0$ for every $v \in V$, $v \ne 0$.

A real or complex vector space equipped with an inner product is an inner product space. ♦

Example 7.1.2. (1) The cases $F = \mathbb R$ and $\mathbb C$ of Example 6.1.5(1) give inner product spaces.

(2) Let $F = \mathbb R$ and let $A$ be a real symmetric matrix (i.e., ${}^tA = A$), or let $F = \mathbb C$ and let $A$ be a complex Hermitian matrix (i.e., ${}^tA = \bar A$), in Example 6.1.5(2). Then we obtain inner product spaces if and only if $A$ is positive definite.

(3) Let $F = \mathbb R$ or $\mathbb C$ in Example 6.1.5(3).

(4) Example 6.1.5(4). ♦
In this chapter we let $F$ be $\mathbb R$ or $\mathbb C$. We will frequently state and prove results only in the complex case, when the real case can be obtained by ignoring the conjugation.

Let us begin by relating inner products to the forms we considered in Chapter 6.

Lemma 7.1.3. Let $\varphi$ be an inner product on the finite-dimensional real or complex vector space $V$. Then $\varphi$ is nonsingular in the sense of Definition 6.1.8.

Proof. Since $\varphi(y, y) > 0$ for every $y \in V$, $y \ne 0$, we may apply Lemma 6.1.10, choosing $x = y$.
Remark 7.1.4. Inner products are particularly nice symmetric or Hermitian forms. One of the ways they are nice is that if $\varphi$ is such a form on a vector space $V$, then not only is $\varphi$ nonsingular but its restriction to any subspace $W$ of $V$ is nonsingular. Conversely, if $\varphi$ is a form on a real or complex vector space $V$ such that the restriction of $\varphi$ to any subspace $W$ of $V$ is nonsingular, then either $\varphi$ or $-\varphi$ must be an inner product. For if neither $\varphi$ nor $-\varphi$ is an inner product, there are two possibilities: (1) there is a vector $w_0 \ne 0$ with $\varphi(w_0, w_0) = 0$, or (2) there are vectors $w_1$ and $w_2$ with $\varphi(w_1, w_1) > 0$ and $\varphi(w_2, w_2) < 0$. In case (2), $f(t) = \varphi(tw_1 + (1 - t)w_2, tw_1 + (1 - t)w_2)$ is a continuous real-valued function with $f(0) < 0$ and $f(1) > 0$, so there is a value $t_0$ with $f(t_0) = 0$, i.e., $\varphi(w_0, w_0) = 0$ for $w_0 = t_0w_1 + (1 - t_0)w_2$, and $w_0 \ne 0$ since $\{w_1, w_2\}$ is linearly independent. In either case $\varphi$ is identically $0$ on $\operatorname{Span}(\{w_0\})$, so its restriction to that subspace is singular. ♦
We now turn our attention to norms of vectors.

Definition 7.1.5. Let $V$ be an inner product space. The norm $\|v\|$ of a vector $v \in V$ is
\[ \|v\| = \sqrt{\langle v, v\rangle}. \] ♦

Lemma 7.1.6. Let $V$ be an inner product space.

(1) $\|cv\| = |c|\,\|v\|$ for any $c \in F$ and any $v \in V$.

(2) $\|v\| \ge 0$ for all $v \in V$, and $\|v\| = 0$ if and only if $v = 0$.

(3) (Cauchy-Schwartz-Buniakowsky inequality) $|\langle v, w\rangle| \le \|v\|\,\|w\|$ for all $v, w \in V$, with equality if and only if $\{v, w\}$ is linearly dependent.

(4) (Triangle inequality) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$, with equality if and only if $w = 0$ or $v = pw$ for some nonnegative real number $p$.
Proof. (1) and (2) are immediate.

For (3), if $\{v, w\}$ is linearly dependent then $w = 0$, or $w \ne 0$ and $v = cw$ for some $c \in F$, and it is easy to check that in both cases we have equality. Assume that $\{v, w\}$ is linearly independent. Then for any $c \in F$, $x = v - cw \ne 0$, and then direct computation shows that
\[ 0 < \|x\|^2 = \langle x, x\rangle = \langle v, v\rangle - \bar c\langle v, w\rangle - c\overline{\langle v, w\rangle} + |c|^2\langle w, w\rangle. \]
Setting $c = \langle v, w\rangle/\langle w, w\rangle$ gives
\[ 0 < \langle v, v\rangle - |\langle v, w\rangle|^2/\langle w, w\rangle, \]
which gives the inequality.

For (4), we have that
\[ \|v + w\|^2 = \langle v + w, v + w\rangle = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle = \|v\|^2 + \langle v, w\rangle + \overline{\langle v, w\rangle} + \|w\|^2 \le \|v\|^2 + 2|\langle v, w\rangle| + \|w\|^2 \le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2, \]
which gives the triangle inequality. The second inequality in the proof is the Cauchy-Schwartz-Buniakowsky inequality. The first inequality in the proof holds because for a complex number $c$, $c + \bar c \le 2|c|$, with equality only if $c$ is a nonnegative real number.

To have $\|v + w\|^2 = (\|v\| + \|w\|)^2$, both inequalities in the proof must be equalities. The second one is an equality if and only if $w = 0$, in which case the first one is too, or if and only if $w \ne 0$ and $v = pw$ for some complex number $p$. Then $\langle v, w\rangle + \langle w, v\rangle = \langle pw, w\rangle + \langle w, pw\rangle = (p + \bar p)\|w\|^2$, and then the first inequality is an equality if and only if $p$ is a nonnegative real number.
If $V$ is an inner product space, we may recover the inner product from the norms of vectors.

Lemma 7.1.7 (Polarization identities). (1) Let $V$ be a real inner product space. Then for any $v, w \in V$,
\[ \langle v, w\rangle = \tfrac14\|v + w\|^2 - \tfrac14\|v - w\|^2. \]

(2) Let $V$ be a complex inner product space. Then for any $v, w \in V$,
\[ \langle v, w\rangle = \tfrac14\|v + w\|^2 + \tfrac i4\|v + iw\|^2 - \tfrac14\|v - w\|^2 - \tfrac i4\|v - iw\|^2. \]
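The polarization identities are easy to verify numerically. Here is a Python sketch (numpy; conventions as in Example 7.1.9, with $\langle v, w\rangle = {}^tv\bar w$):

    import numpy as np

    rng = np.random.default_rng(3)

    # Real case:
    v, w = rng.standard_normal(4), rng.standard_normal(4)
    n2 = lambda u: (u @ np.conj(u)).real          # ||u||^2
    print(np.isclose(v @ w, 0.25 * n2(v + w) - 0.25 * n2(v - w)))   # True

    # Complex case:
    u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    z = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    h = lambda a, b: a @ np.conj(b)               # <a, b> = (t a) (conj b)
    rhs = (0.25 * n2(u + z) + 0.25j * n2(u + 1j * z)
           - 0.25 * n2(u - z) - 0.25j * n2(u - 1j * z))
    print(np.isclose(h(u, z), rhs))               # True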
For convenience, we repeat here some earlier definitions.

Definition 7.1.8. Let $V$ be an inner product space. A vector $v \in V$ is a unit vector if $\|v\| = 1$. Two vectors $v$ and $w$ are orthogonal if $\langle v, w\rangle = 0$. A set $\mathcal B$ of vectors in $V$, $\mathcal B = \{v_1, v_2, \dots\}$, is orthogonal if the vectors in $\mathcal B$ are pairwise orthogonal, i.e., if $\langle v_i, v_j\rangle = 0$ whenever $i \ne j$. The set $\mathcal B$ is orthonormal if $\mathcal B$ is an orthogonal set of unit vectors, i.e., if $\langle v_i, v_i\rangle = 1$ for every $i$ and $\langle v_i, v_j\rangle = 0$ for every $i \ne j$. ♦

Example 7.1.9. Let $\langle\,,\rangle$ be the standard inner product on $F^n$, defined by $\langle v, w\rangle = {}^tv\bar w$. Then the standard basis $\mathcal E = \{e_1, \dots, e_n\}$ is orthonormal. ♦
Lemma 7.1.10. Let $\mathcal B = \{v_1, v_2, \dots\}$ be an orthogonal set of nonzero vectors in $V$. If $v \in V$ is a linear combination of the vectors in $\mathcal B$, $v = \sum_i c_iv_i$, then $c_j = \langle v, v_j\rangle/\|v_j\|^2$ for each $j$. In particular, if $\mathcal B$ is orthonormal then $c_j = \langle v, v_j\rangle$ for each $j$.

Proof. For any $j$,
\[ \langle v, v_j\rangle = \Bigl\langle \sum_i c_iv_i, v_j\Bigr\rangle = \sum_i c_i\langle v_i, v_j\rangle = c_j\langle v_j, v_j\rangle, \]
as $\langle v_i, v_j\rangle = 0$ for $i \ne j$.

Corollary 7.1.11. Let $\mathcal B = \{v_1, v_2, \dots\}$ be an orthogonal set of nonzero vectors in $V$. Then $\mathcal B$ is linearly independent.

Lemma 7.1.12. Let $\mathcal B = \{v_1, v_2, \dots\}$ be an orthogonal set of nonzero vectors in $V$. If $v \in V$ is a linear combination of the vectors in $\mathcal B$, $v = \sum_i c_iv_i$, then $\|v\|^2 = \sum_i |c_i|^2\|v_i\|^2$. In particular, if $\mathcal B$ is orthonormal then $\|v\|^2 = \sum_i |c_i|^2$.

Proof. We compute
\[ \|v\|^2 = \langle v, v\rangle = \Bigl\langle \sum_i c_iv_i, \sum_j c_jv_j\Bigr\rangle = \sum_{i,j} c_i\bar c_j\langle v_i, v_j\rangle = \sum_i |c_i|^2\langle v_i, v_i\rangle. \]
7.1. Basic definitions 193
Corollary 7.1.13 (Bessel’s inequality).Let BD fv1; v2; : : : ; vngbe a finite
orthogonal set of nonzero vectors in V. For any vector v2V,
n
X
iD1ˇˇ˝v; vi˛ˇˇ2=
vi
2 kvk2;
with equality if and only if vDPn
iD1hv; viivi:
In particular, if Bis orthonormal then
n
X
iD1ˇˇ˝v; vi˛ˇˇ2 kvk2
with equality if and only if vDPn
iD1hv; viivi:
Proof. Let wDPn
iD1.hv; vii=kvik2/viand let xDvw: Then hv; vii D
hw; viifor each i; so hx; vii D 0for each iand hence hx; wi D 0: Then
kvk2D hv; vi D hwCx; w Cxi D kwk2C kxk2 kwk2
D
n
X
iD1ˇˇ˝v; vi˛ˇˇ2=
vi
2;
with equality if and only if xD0:
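Bessel's inequality is likewise easy to check on examples. Here is a Python sketch (numpy; the orthonormal set and the vector are ours):

    import numpy as np

    # An orthonormal set {v1, v2} in R^3, as columns of V:
    V = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
    v = np.array([3.0, -4.0, 12.0])

    coeffs = V.T @ v                 # <v, v_i> for each i
    print(np.sum(coeffs ** 2))       # 25.0
    print(v @ v)                     # 169.0, so 25 <= 169 as Bessel predicts
    # Equality would hold exactly when v lies in the span of {v1, v2}.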
We have a more general notion of a norm.

Definition 7.1.14. Let $V$ be a vector space over $F$. A norm on $V$ is a function $\|\cdot\|: V \to \mathbb R$ satisfying:

(a) $\|v\| \ge 0$, and $\|v\| = 0$ if and only if $v = 0$,

(b) $\|cv\| = |c|\,\|v\|$ for $c \in F$ and $v \in V$,

(c) $\|v + w\| \le \|v\| + \|w\|$ for $v, w \in V$. ♦

Theorem 7.1.15. (1) Let $V$ be an inner product space. Then
\[ \|v\| = \sqrt{\langle v, v\rangle} \]
is a norm in the sense of Definition 7.1.14.

(2) Let $V$ be a vector space and let $\|\cdot\|$ be a norm on $V$. There is an inner product $\langle\,,\rangle$ on $V$ such that $\|v\| = \sqrt{\langle v, v\rangle}$ if and only if $\|\cdot\|$ satisfies the parallelogram law
\[ \|v + w\|^2 + \|v - w\|^2 = 2\bigl(\|v\|^2 + \|w\|^2\bigr) \quad\text{for all } v, w \in V. \]
Proof. (1) is immediate. For (2), given any norm we can define $\langle\,,\rangle$ by use of the polarization identities of Lemma 7.1.7, and it is easy to verify that this is an inner product if and only if $\|\cdot\|$ satisfies the parallelogram law. We omit the proof.

Example 7.1.16. If
\[ v = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \]
define $\|\cdot\|$ on $F^n$ by $\|v\| = |x_1| + \cdots + |x_n|$. Then $\|\cdot\|$ is a norm that does not come from an inner product. ♦
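The parallelogram law gives a quick test that a norm does not come from an inner product. Here is a Python sketch (numpy) applying it to the norm of Example 7.1.16:

    import numpy as np

    def l1(v):
        # the norm of Example 7.1.16: sum of absolute values of the entries
        return np.sum(np.abs(v))

    v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    lhs = l1(v + w) ** 2 + l1(v - w) ** 2      # 4 + 4 = 8
    rhs = 2 * (l1(v) ** 2 + l1(w) ** 2)        # 2 * (1 + 1) = 4
    print(lhs == rhs)   # False: the parallelogram law fails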
We now investigate some important topological properties.

Definition 7.1.17. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a vector space $V$ are equivalent if there are positive constants $a$ and $A$ such that
\[ a\|v\|_1 \le \|v\|_2 \le A\|v\|_1 \quad\text{for every } v \in V. \] ♦

Remark 7.1.18. It is easy to check that this gives an equivalence relation on norms. ♦

Lemma 7.1.19. (1) Let $\|\cdot\|$ be any norm on a vector space $V$. Then $d(v, w) = \|v - w\|$ is a metric on $V$.

(2) If $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent norms on $V$, then the metrics $d_1(v, w) = \|v - w\|_1$ and $d_2(v, w) = \|v - w\|_2$ give the same topology on $V$.

Proof. (1) A metric on a space $V$ is a function $d: V \times V \to \mathbb R$ satisfying:

(a) $d(v, w) \ge 0$, and $d(v, w) = 0$ if and only if $w = v$,

(b) $d(v, w) = d(w, v)$,

(c) $d(v, x) \le d(v, w) + d(w, x)$.

It is then immediate that $d(v, w) = \|v - w\|$ is a metric.

(2) The metric topology on a space $V$ with metric $d$ is the one with a basis of open sets $B_\varepsilon(v_0) = \{v \mid d(v, v_0) < \varepsilon\}$ for every $v_0 \in V$ and every $\varepsilon > 0$. Thus $\|\cdot\|_i$ gives the topology with basis of open sets $B^i_\varepsilon(v_0) = \{v \mid \|v - v_0\|_i < \varepsilon\}$ for $v_0 \in V$ and $\varepsilon > 0$, for $i = 1, 2$. By the definition of equivalence, $B^2_{a\varepsilon}(v_0) \subseteq B^1_\varepsilon(v_0)$ and $B^1_{\varepsilon/A}(v_0) \subseteq B^2_\varepsilon(v_0)$, so these two bases give the same topology.
Theorem 7.1.20. Let $V$ be a finite-dimensional $F$-vector space. Then $V$ has a norm, and any two norms on $V$ are equivalent.
Proof. First we consider $V = F^n$. Then $V$ has the standard norm
$$\|v\| = \sqrt{\langle v, v\rangle} = \sqrt{{}^t v\,\overline{v}}$$
coming from the standard inner product $\langle\cdot,\cdot\rangle$.
It suffices to show that any other norm $\|\cdot\|_2$ is equivalent to this one. By property (b) of a norm, it suffices to show that there are positive constants $a$ and $A$ with
$$a \le \|v\|_2 \le A \quad\text{for every } v \in V \text{ with } \|v\| = 1.$$
First suppose that $\|\cdot\|_2$ comes from an inner product $\langle\cdot,\cdot\rangle_2$. Then $\langle v, v\rangle_2 = {}^t v B\,\overline{v}$ for some matrix $B$, and so we see that $f(v) = \langle v, v\rangle_2$ is a quadratic function of the entries of $v$ (in the real case) or of the real and imaginary parts of the entries of $v$ (in the complex case). In particular $f(v)$ is a continuous function of the entries of $v$. Now $\{v \mid \|v\| = 1\}$ is a compact set, and so $f(v)$ has a minimum $a$ (necessarily positive) and a maximum $A$ there.
In the general case we must work a little harder. Let
$$m = \min\big(\|e_1\|_2, \ldots, \|e_n\|_2\big) \quad\text{and}\quad M = \max\big(\|e_1\|_2, \ldots, \|e_n\|_2\big),$$
where $\{e_1, \ldots, e_n\}$ is the standard basis of $F^n$.
Let $v = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ with $\|v\| = 1$. Then $|x_i| \le 1$ for each $i$, so, by the properties of a norm,
$$\|v\|_2 = \|x_1 e_1 + \cdots + x_n e_n\|_2 \le \|x_1 e_1\|_2 + \cdots + \|x_n e_n\|_2 = |x_1|\,\|e_1\|_2 + \cdots + |x_n|\,\|e_n\|_2 \le 1\cdot M + \cdots + 1\cdot M = nM.$$
We prove the other inequality by contradiction. Suppose there is no such positive constant $a$. Then we may find a sequence of vectors $v_1, v_2, \ldots$ with $\|v_i\| = 1$ and $\|v_i\|_2 < 1/i$ for each $i$.
Since $\{v \mid \|v\| = 1\}$ is compact, this sequence has a convergent subsequence $w_1, w_2, \ldots$ with $\|w_i\| = 1$ and $\|w_i\|_2 < 1/i$ for each $i$. Let $w_\infty = \lim_{i\to\infty} w_i$, and let $d = \|w_\infty\|_2$. (We cannot assert that $d = 0$, since we do not know that $\|\cdot\|_2$ is continuous.)
For any $\delta > 0$, let $w \in V$ be any vector with $\|w - w_\infty\| < \delta$. Then
$$d = \|w_\infty\|_2 \le \|w_\infty - w\|_2 + \|w\|_2 \le \delta nM + \|w\|_2.$$
Choose $\delta = d/(2nM)$. Then $\|w - w_\infty\| < \delta$ implies, by the above inequality, that
$$\|w\|_2 \ge d - \delta nM = d/2.$$
Choosing $i$ large enough we have $\|w_i - w_\infty\| < \delta$ and $\|w_i\|_2 < d/2$, a contradiction.
This completes the proof for $V = F^n$. For $V$ an arbitrary vector space of dimension $n$, choose any basis $B$ of $V$ and define $\|\cdot\|$ on $V$ by $\|v\| = \|[v]_B\|$, where the norm on the right is the standard norm on $F^n$.
Remark 7.1.21. It is possible to put an inner product (and hence a norm) on any vector space $V$, as follows: Choose a basis $B = \{v_1, v_2, \ldots\}$ of $V$, define $\langle\cdot,\cdot\rangle$ by $\langle v_i, v_j\rangle = 1$ if $i = j$ and $0$ if $i \neq j$, and extend $\langle\cdot,\cdot\rangle$ to $V$ by (conjugate) linearity. However, unless we can actually write down the basis $B$, this is not very useful. ◊
Example 7.1.22. If $V$ is any infinite-dimensional vector space then $V$ admits norms that are not equivalent. Here is an example. Let $V = rF^\infty$. Let $v = [x_1, x_2, \ldots]$ and $w = [y_1, y_2, \ldots]$. Define $\langle\cdot,\cdot\rangle$ on $V$ by $\langle v, w\rangle = \sum_{j=1}^\infty x_j\overline{y_j}$ and define $\langle\cdot,\cdot\rangle'$ on $V$ by $\langle v, w\rangle' = \sum_{j=1}^\infty x_j\overline{y_j}/2^j$. Then $\langle\cdot,\cdot\rangle$ and $\langle\cdot,\cdot\rangle'$ give norms $\|\cdot\|$ and $\|\cdot\|'$ that are not equivalent, and moreover the respective metrics $d$ and $d'$ on $V$ define different topologies, as the sequence of points $e_1, e_2, \ldots$ does not have a limit in the topology on $V$ given by $d$, but converges to $[0, 0, \ldots]$ in the topology given by $d'$. ◊
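A small numerical illustration of why no constant $a > 0$ can satisfy $a\|v\| \le \|v\|'$ for all $v$ (a sketch assuming NumPy): the basis vectors $e_n$ all have $\|e_n\| = 1$, while $\|e_n\|' = 2^{-n/2} \to 0$.

    import numpy as np

    weights = 2.0 ** np.arange(1, 26)          # 2^j for j = 1, ..., 25
    for n in (1, 5, 10, 20):
        e = np.zeros(25); e[n - 1] = 1.0
        norm = np.sqrt(np.sum(e * e))                   # ||e_n|| = 1
        norm_prime = np.sqrt(np.sum(e * e / weights))   # = 2^(-n/2)
        print(n, norm, norm_prime)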
7.2 The Gram-Schmidt process
Let $V$ be an inner product space. The Gram-Schmidt process is a method for transforming a basis for a finite-dimensional subspace of $V$ into an orthonormal basis for that subspace. In this section we introduce this process and investigate its consequences.
We fix $V$, the inner product $\langle\cdot,\cdot\rangle$, and the norm $\|\cdot\|$ coming from this inner product, throughout this section.
Theorem 7.2.1. Let $W$ be a finite-dimensional subspace of $V$, $\dim(W) = k$, and let $B = \{v_1, v_2, \ldots, v_k\}$ be a basis of $W$. Then there is an orthonormal basis $C = \{w_1, w_2, \ldots, w_k\}$ of $W$ such that $\operatorname{Span}(\{w_1, \ldots, w_i\}) = \operatorname{Span}(\{v_1, \ldots, v_i\})$ for each $i = 1, \ldots, k$. In particular, $W$ has an orthonormal basis.
Proof. By Lemma 7.1.3 and Theorem 6.2.29 we see immediately that $W$ has an orthonormal basis. Here is an independent construction.
Define vectors $x_i$ inductively:
$$x_1 = v_1, \qquad x_i = v_i - \sum_{j<i} \frac{\langle v_i, x_j\rangle}{\langle x_j, x_j\rangle}\,x_j \quad\text{for } i > 1.$$
Then set
$$w_i = x_i/\|x_i\| \quad\text{for each } i.$$
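The construction in this proof is directly implementable. Here is a minimal numerical sketch (not from the text; it assumes NumPy and the standard inner product of Example 7.1.9). It subtracts projections onto the previously built vectors and then normalizes — equivalent to the formula above, since each $x_j$ is replaced by the unit vector $w_j$ as we go.

    import numpy as np

    def gram_schmidt(vectors):
        # Orthonormalize linearly independent vectors, following the proof
        # of Theorem 7.2.1. np.vdot(w, x) computes <x, w> for the standard
        # inner product (linear in the first slot).
        orthonormal = []
        for v in vectors:
            x = np.array(v, dtype=complex)
            for w in orthonormal:
                x = x - np.vdot(w, x) * w      # remove component along w
            orthonormal.append(x / np.linalg.norm(x))
        return orthonormal

    w1, w2 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
    print(abs(np.vdot(w2, w1)))   # ~0: orthogonal
    print(np.linalg.norm(w1))     # ~1: unit vector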
Definition 7.2.2. The basis $C$ of $W$ obtained in the proof of Theorem 7.2.1 is said to be obtained from the basis $B$ of $W$ by applying the Gram-Schmidt process to $B$. ◊
Remark 7.2.3. The Gram-Schmidt process generalizes without change to the following situation: Let $W$ be a vector space of countably infinite dimension, and let $B = \{v_1, v_2, \ldots\}$ be a basis of $W$ whose elements are indexed by the positive (or nonnegative) integers. The proof of Theorem 7.2.1 applies to give an orthonormal basis $C$ of $W$. ◊
We recall two further definitions from Chapter 6.
Definition 7.2.4. Let $W$ be a subspace of $V$. Its orthogonal complement $W^\perp$ is the subspace of $V$ defined by
$$W^\perp = \{x \in V \mid \langle x, w\rangle = 0 \text{ for every } w \in W\}. \;◊$$
Definition 7.2.5. $V$ is the orthogonal direct sum $V = W_1 \perp W_2$ of subspaces $W_1$ and $W_2$ if (1) $V = W_1 \oplus W_2$ is the direct sum of the subspaces $W_1$ and $W_2$, and (2) $W_1$ and $W_2$ are orthogonal subspaces of $V$. Equivalently, if $v = w_1 + w_2$ with $w_1 \in W_1$ and $w_2 \in W_2$, then
$$\|v\|^2 = \|w_1\|^2 + \|w_2\|^2. \;◊$$
Theorem 7.2.6. Let $W$ be a finite-dimensional subspace of $V$. Then $V$ is the orthogonal direct sum $V = W \perp W^\perp$.
Proof. If $V$ is finite-dimensional then, by Lemma 7.1.3, $\varphi|_W$ is nonsingular (as is $\varphi$ itself), so, by Lemma 6.2.16, $V = W \perp W^\perp$.
Alternatively, let $\dim(V) = n$ and $\dim(W) = k$. Choose a basis $B_1 = \{v_1, \ldots, v_k\}$ of $W$ and extend it to a basis $B = \{v_1, \ldots, v_n\}$ of $V$. Apply the Gram-Schmidt process to $B$ to obtain a basis $C = \{w_1, \ldots, w_n\}$ of $V$. Then $C_1 = \{w_1, \ldots, w_k\}$ is a basis of $W$. It is easy to check that $C_2 = \{w_{k+1}, \ldots, w_n\}$ is a basis of $W^\perp$, from which it follows that $V = W \perp W^\perp$.
In general, choose an orthonormal basis $C = \{w_1, \ldots, w_k\}$ of $W$. For $v \in V$, let $x = \sum \langle v, w_i\rangle w_i$. Then $x \in W$ and $\langle x, w_i\rangle = \langle v, w_i\rangle$ for $i = 1, \ldots, k$, which implies $\langle x, w\rangle = \langle v, w\rangle$ for every $w \in W$. Thus $\langle v - x, w\rangle = 0$ for every $w \in W$, and so $v - x \in W^\perp$. Since $v = x + (v - x)$, we see that $V = W + W^\perp$. Now $\langle y, z\rangle = 0$ whenever $y \in W$ and $z \in W^\perp$. If $w \in W \cap W^\perp$, set $y = w$ and $z = w$ to conclude that $\langle w, w\rangle = 0$, which implies that $w = 0$. Thus $V = W \oplus W^\perp$. Finally, if $V = W \oplus W^\perp$ then $V = W \perp W^\perp$ by the definition of $W^\perp$.
Lemma 7.2.7. Let $W$ be a subspace of $V$ and suppose that $V = W \perp W^\perp$. Then $(W^\perp)^\perp = W$.
Proof. If $V$ is finite-dimensional, this is Corollary 6.2.18. The following argument works in general.
It is easy to check that $(W^\perp)^\perp \supseteq W$. Let $v \in (W^\perp)^\perp$. Since $v \in V$, we may write $v = x + y$ with $x \in W$ and $y \in W^\perp$. Then $0 = \langle v, y\rangle = \langle x + y, y\rangle = \langle x, y\rangle + \langle y, y\rangle = \langle y, y\rangle$, so $y = 0$, and hence $v = x$. Thus $(W^\perp)^\perp = W$.
Corollary 7.2.8. Let $W$ be a finite-dimensional subspace of $V$. Then $(W^\perp)^\perp = W$.
Proof. Immediate from Theorem 7.2.6 and Lemma 7.2.7.
Example 7.2.9. Let $V \subseteq rF^{\infty\infty}$ be the subspace consisting of all elements $[x_1, x_2, \ldots]$ with $\{x_i\}$ bounded (i.e., such that there is a constant $M$ with $|x_i| < M$ for each $i$). Give $V$ the inner product
$$\langle [x_1, x_2, \ldots], [y_1, y_2, \ldots]\rangle = \sum_{j=1}^\infty x_j\overline{y_j}/2^j.$$
Let $W = rF^\infty$ and note that $W$ is a subspace of $V$. If $y = [y_1, y_2, \ldots] \in W^\perp$ then, since $e_i \in W$ for each $i$, $0 = \langle e_i, y\rangle = \overline{y_i}/2^i$, so $y = [0, 0, \ldots]$. Thus $W^\perp = \{0\}$, and we see that $V \neq W \perp W^\perp$ and that $(W^\perp)^\perp \neq W$. ◊
Definition 7.2.10. Let $W$ be a subspace of $V$ and suppose that $V = W \perp W^\perp$. The orthogonal projection $\pi_W$ is the linear transformation defined by $\pi_W(v) = x$, where $v = x + y$ with $x \in W$ and $y \in W^\perp$. ◊
Lemma 7.2.11. Let $W$ be a finite-dimensional subspace of $V$ and let $C = \{w_1, \ldots, w_k\}$ be an orthonormal basis of $W$. Then
$$\pi_W(v) = \sum_{i=1}^k \langle v, w_i\rangle w_i \quad\text{for every } v \in V.$$
Proof. Immediate from the proof of Theorem 7.2.6.
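In coordinates this formula is one line of code. A minimal sketch (an illustration assuming NumPy and the standard inner product; np.vdot(w, v) computes $\langle v, w\rangle$):

    import numpy as np

    def project(v, orthonormal_basis):
        # pi_W(v) = sum_i <v, w_i> w_i   (Lemma 7.2.11)
        return sum(np.vdot(w, v) * w for w in orthonormal_basis)

    w1, w2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    v = np.array([3.0, 4.0, 5.0])
    x = project(v, [w1, w2])      # component in W
    y = v - x                     # component in W-perp
    print(x, np.vdot(y, x))       # [3. 4. 0.] and ~0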
Corollary 7.2.12. Let $W$ be a finite-dimensional subspace of $V$ and let $C = \{w_1, \ldots, w_k\}$ and $C' = \{w'_1, \ldots, w'_k\}$ be two orthonormal bases of $W$. Then
$$\sum_{i=1}^k \langle v, w_i\rangle w_i = \sum_{i=1}^k \langle v, w'_i\rangle w'_i \quad\text{for every } v \in V.$$
Proof. Both are equal to $\pi_W(v)$.
Lemma 7.2.13. Let $W$ be a subspace of $V$ such that $V = W \perp W^\perp$. Then
$$\pi_W^2 = \pi_W, \qquad \pi_{W^\perp} = I - \pi_W, \qquad \pi_{W^\perp}\pi_W = \pi_W\pi_{W^\perp} = 0.$$
Proof. This follows immediately from Definition 7.2.10.
Remark 7.2.14. Suppose that $V$ is finite-dimensional. Let $T = \pi_W$. By Lemma 7.2.13, $T^2 = T$, so $p(T) = 0$ where $p(x)$ is the polynomial $p(x) = x^2 - x = x(x-1)$. Then the minimum polynomial $m_T(x)$ divides $p(x)$. Thus $m_T(x) = x$, which occurs if and only if $W = \{0\}$; or $m_T(x) = x - 1$, which occurs if and only if $W = V$; or $m_T(x) = x(x-1)$. In this last case $W$ is the $1$-eigenspace of $\pi_W$ and $W^\perp$ is the $0$-eigenspace of $\pi_W$. In any case $\pi_W$ is diagonalizable (over $\mathbb{R}$ or over $\mathbb{C}$), as $m_T(x)$ is a product of distinct linear factors. ◊
Let us revisit the Gram-Schmidt process from the point of view of orthogonal projections. First we need another definition.
Definition 7.2.15. The normalization map $N \colon V - \{0\} \to \{v \in V \mid \|v\| = 1\}$ is the function $N(v) = v/\|v\|$. ◊
Corollary 7.2.16. Let $W$ be a finite-dimensional subspace of $V$ and let $B = \{v_1, \ldots, v_k\}$ be a basis of $W$. Let
$$W_0 = \{0\} \quad\text{and}\quad W_i = \operatorname{Span}(\{v_1, \ldots, v_i\}) \quad\text{for } 1 \le i < k.$$
Then the basis $C = \{w_1, \ldots, w_k\}$ of $W$ obtained from $B$ by the Gram-Schmidt procedure is given by
$$w_i = N\big(\pi_{W_{i-1}^\perp}(v_i)\big) \quad\text{for } i = 1, \ldots, k.$$
The Gram-Schmidt process has important algebraic and topological consequences.
Definition 7.2.17. Let $F = \mathbb{R}$ or $\mathbb{C}$. A $k$-frame in $F^n$ is a linearly independent $k$-tuple $\{v_1, \ldots, v_k\}$ of vectors in $F^n$. An orthonormal $k$-frame in $F^n$ is an orthonormal $k$-tuple $\{v_1, \ldots, v_k\}$ of vectors in $F^n$. Set
$$G_{n,k}(F) = \{k\text{-frames in } F^n\} \quad\text{and}\quad S_{n,k}(F) = \{\text{orthonormal } k\text{-frames in } F^n\}.$$
By identifying $\{v_1, \ldots, v_k\}$ with the $n$-by-$k$ matrix $[v_1 \mid \cdots \mid v_k]$ we identify $G_{n,k}(F)$ and $S_{n,k}(F)$ with subsets of $M_{n,k}(F)$. Let $F^{nk}$ have its usual topology. The natural identification of $M_{n,k}(F)$ with $F^{nk}$ gives a topology on $M_{n,k}(F)$ and hence on $G_{n,k}(F)$ and $S_{n,k}(F)$ as well. ◊
In order to formulate our result we need a preliminary definition.
Definition 7.2.18. Let $A^+_k = \{k$-by-$k$ diagonal matrices with positive real number entries$\}$. For $F = \mathbb{R}$ or $\mathbb{C}$, let $N_k(F) = \{k$-by-$k$ upper triangular matrices with entries in $F$ and with all diagonal entries equal to $1\}$. Topologize $A^+_k$ and $N_k(F)$ as subsets of $F^{k^2}$. ◊
Lemma 7.2.19. With these identifications, any matrix $P \in G_{n,k}(\mathbb{R})$ can be written uniquely as $P = QAN$ where $Q \in S_{n,k}(\mathbb{R})$, $A \in A^+_k$, and $N \in N_k(\mathbb{R})$, and any matrix $P \in G_{n,k}(\mathbb{C})$ can be written uniquely as $P = QAN$ where $Q \in S_{n,k}(\mathbb{C})$, $A \in A^+_k$, and $N \in N_k(\mathbb{C})$.
Proof. The proof is identical in both cases, so we let $F = \mathbb{R}$ or $\mathbb{C}$.
Let $P = [v_1 \mid \cdots \mid v_k]$. In the notation of the proof of Theorem 7.2.1, we see that for each $i = 1, \ldots, k$, $x_i$ is a linear combination of $v_i$ and $x_1, \ldots, x_{i-1}$, which implies that $v_i$ is a linear combination of $x_1, \ldots, x_i$. Also we see that in any such linear combination the $x_i$-coefficient of $v_i$ is $1$. Thus $P = Q_0 N$ where $Q_0 = [x_1 \mid \cdots \mid x_k]$ and $N \in N_k(F)$. But $x_i = \|x_i\|w_i$, so $Q_0 = QA$ where $Q = [w_1 \mid \cdots \mid w_k]$ and $A \in A^+_k$ is the diagonal matrix with entries $\|x_1\|, \ldots, \|x_k\|$. Hence $P$ can be written as $P = QAN$.
To show uniqueness, suppose $P = Q_1 A_1 N_1 = Q_2 A_2 N_2$. Let $M_1 = A_1 N_1$ and $M_2 = A_2 N_2$. Then $Q_1 M_1 = Q_2 M_2$, so $Q_1 = Q_2 M_2 M_1^{-1}$, where $M_2 M_1^{-1}$ is upper triangular with positive real entries on the diagonal. Let $Q_1 = [w_1 \mid w_2 \mid \cdots \mid w_k]$ and $Q_2 = [w'_1 \mid w'_2 \mid \cdots \mid w'_k]$. If $M_2 M_1^{-1}$ had a nonzero entry in the $(i, j)$ position with $i < j$, then, choosing the smallest such $j$, we would have $\langle w_i, w_j\rangle \neq 0$, which is impossible. Thus $M_2 M_1^{-1}$ is a diagonal matrix. Since $\langle w_i, w_i\rangle = 1$ for each $i$, the diagonal entries of $M_2 M_1^{-1}$ all have absolute value $1$, and since they are positive real numbers, they are all $1$. Thus $M_2 M_1^{-1} = I$. Then $M_2 = M_1$ and hence $Q_2 = Q_1$. Hence $M = M_1$ and $Q = Q_1$ are uniquely determined. For any matrices $A \in A^+_k$ and $N \in N_k(F)$, the diagonal entries of $AN$ are equal to the diagonal entries of $A$, so the diagonal entries of $A$ are equal to the diagonal entries of $M$. Thus $A$, being a diagonal matrix, is also uniquely determined. Then $N = A^{-1}M$ is uniquely determined as well.
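Readers who compute will recognize Lemma 7.2.19 as the uniqueness statement behind the QR decomposition: $R = AN$ splits an upper triangular matrix with positive diagonal into its diagonal and unit-triangular parts. A hedged numerical check (assuming NumPy; the sign fix is needed because np.linalg.qr does not promise a positive diagonal):

    import numpy as np

    rng = np.random.default_rng(0)
    P = rng.standard_normal((5, 3))          # a 3-frame in R^5, generically

    Q, R = np.linalg.qr(P)                   # thin QR: Q is 5x3, R is 3x3
    S = np.diag(np.sign(np.diag(R)))         # flip signs so diag(R) > 0
    Q, R = Q @ S, S @ R

    A = np.diag(np.diag(R))                  # the positive diagonal part
    N = np.linalg.inv(A) @ R                 # the unit upper triangular part

    assert np.allclose(Q @ A @ N, P)         # P = QAN
    assert np.allclose(Q.T @ Q, np.eye(3))   # columns of Q are orthonormal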
Theorem 7.2.20. With the above identifications, the multiplication maps
$$m \colon S_{n,k}(\mathbb{R}) \times A^+_k \times N_k(\mathbb{R}) \to G_{n,k}(\mathbb{R})$$
and
$$m \colon S_{n,k}(\mathbb{C}) \times A^+_k \times N_k(\mathbb{C}) \to G_{n,k}(\mathbb{C})$$
given by $P = m(Q, A, N) = QAN$ are homeomorphisms.
Proof. In either case, the map $m$ is obviously continuous, and Lemma 7.2.19 shows that it is one-to-one and onto. The proof of Theorem 7.2.1 shows that $m^{-1} \colon P \mapsto (Q, A, N)$ is also continuous, so $m$ is a homeomorphism.
Corollary 7.2.21. With the above identifications, $S_{n,k}(\mathbb{R})$ is a strong deformation retract of $G_{n,k}(\mathbb{R})$ and $S_{n,k}(\mathbb{C})$ is a strong deformation retract of $G_{n,k}(\mathbb{C})$.
Proof. Let $F = \mathbb{R}$ or $\mathbb{C}$. $S_{n,k}(F)$ is a subspace of $G_{n,k}(F)$ and, in the notation of Lemma 7.2.19, we have $Q = Q \cdot I \cdot I$, where the first $I$ is in $A^+_k$ and the second is in $N_k(F)$.
A subspace $X$ of a space $Y$ is a strong deformation retract of $Y$ if there is a continuous function $R \colon Y \times [0, 1] \to Y$ with
(a) $R(y, 0) = y$ for every $y \in Y$,
(b) $R(x, t) = x$ for every $x \in X$, $t \in [0, 1]$,
(c) $R(y, 1) \in X$ for every $y \in Y$.
(We think of $t$ as "time" and set $R_t(y) = R(y, t)$. Then $R_0$ is the identity on $Y$, $R_1 \colon Y \to X$, and $R_t(x) = x$ for every $x$ and $t$, so points in $X$ "never move".)
In our case, the map $R$ is defined as follows. If $P = QAN$ then
$$R(P, t) = QA^{1-t}\big(tI + (1-t)N\big).$$
7.3 Adjoints, normal linear transformations, and the spectral theorem
In this section we derive additional properties of adjoints in the case of inner product spaces. Then we introduce the notion of a normal linear transformation $T \colon V \to V$ and study its properties, culminating in the spectral theorem.
We fix $V$, the inner product $\varphi(x, y) = \langle x, y\rangle$, and the norm $\|x\| = \sqrt{\langle x, x\rangle}$, throughout.
Let $T \colon V \to W$ be a linear transformation between inner product spaces. In Definition 6.3.1 we defined its adjoint $T^{\mathrm{adj}}$. We here follow common mathematical practice and denote $T^{\mathrm{adj}}$ by $T^*$. (This notation is ambiguous because $T^*$ also denotes the dual of $T$, $T^* \colon W^* \to V^*$, but in this section we will always be considering the adjoint and never the dual.) Lemma 6.3.2 guaranteed the existence of $T^*$ only in case $V$ is finite-dimensional, but we observed in Remark 6.3.8 that if $T^*$ is defined, it is well-defined.
We first derive some relationships between $T$ and $T^*$.
Lemma 7.3.1. Let $V$ and $W$ be finite-dimensional inner product spaces and let $T \colon V \to W$ be a linear transformation. Then
(1) $\operatorname{Im}(T^*) = \operatorname{Ker}(T)^\perp$ and $\operatorname{Ker}(T^*) = \operatorname{Im}(T)^\perp$,
(2) $\dim(\operatorname{Ker}(T^*)) = \dim(\operatorname{Ker}(T))$,
(3) if $\dim(W) = \dim(V)$ then $\dim(\operatorname{Im}(T^*)) = \dim(\operatorname{Im}(T))$.
Proof. Let $U = \operatorname{Ker}(T)$. Let $\dim(V) = n$ and $\dim(U) = k$, so $\dim(U^\perp) = n - k$. Then, for any $u \in U$ and any $w \in W$,
$$\langle u, T^*(w)\rangle = \langle T(u), w\rangle = \langle 0, w\rangle = 0,$$
so $\operatorname{Im}(T^*) \subseteq U^\perp$. Hence $\dim(\operatorname{Ker}(T^*)) \ge k = \dim(\operatorname{Ker}(T))$. Replacing $T$ by $T^*$ we obtain $\dim(\operatorname{Ker}(T)) \ge \dim(\operatorname{Ker}(T^*))$. But $T^{**} = T$, so $\dim(\operatorname{Ker}(T^*)) = \dim(\operatorname{Ker}(T))$ and $\operatorname{Im}(T^*) = \operatorname{Ker}(T)^\perp$. The proof that $\operatorname{Ker}(T^*) = \operatorname{Im}(T)^\perp$ is similar. Then (3) follows from Theorem 1.3.1.
Corollary 7.3.2. Let $V$ be a finite-dimensional inner product space and let $T \colon V \to V$ be a linear transformation. Suppose that $T$ has a Jordan Canonical Form over $F$ (which is always the case if $F = \mathbb{C}$). Then $T^*$ has a Jordan Canonical Form over $F$. The Jordan Canonical Form of $T^*$ is obtained from the Jordan Canonical Form of $T$ by taking the conjugate of each diagonal entry if $F = \mathbb{C}$, and is the same as the Jordan Canonical Form of $T$ if $F = \mathbb{R}$.
Proof. By Lemma 6.3.10, $(T - \lambda I)^* = T^* - \overline{\lambda}I$. Apply Lemma 7.3.1 with $T$ replaced by $(T - \lambda I)^k$ to obtain that the spaces $E^k_\lambda$ of $T$ and $E^k_{\overline\lambda}$ of $T^*$ have the same dimension for every eigenvalue $\lambda$ of $T$ and every positive integer $k$. These dimensions determine the Jordan Canonical Forms.
Corollary 7.3.3. Let $V$ be a finite-dimensional inner product space and let $T \colon V \to V$ be a linear transformation. Then
(1) $m_{T^*}(x) = \overline{m_T}(x)$,
(2) $c_{T^*}(x) = \overline{c_T}(x)$.
Proof. (1) follows immediately from Lemma 6.3.10 and Lemma 7.3.1.
(2) follows immediately from Corollary 7.3.2 in case $F = \mathbb{C}$. In case $F = \mathbb{R}$, choose a basis of $V$, represent $T$ in that basis by a matrix, and then regard that matrix as a matrix over $\mathbb{C}$.
Now we come to the focus of our attention, normal linear transformations.
Definition 7.3.4. A linear transformation $T \colon V \to V$ is normal if
(1) $T$ has an adjoint $T^*$,
(2) $T$ commutes with $T^*$, i.e., $T \circ T^* = T^* \circ T$. ◊
Let us look at a couple of special cases.
Definition 7.3.5. A linear transformation $T \colon V \to V$ is self-adjoint if $T$ has an adjoint $T^*$ and $T^* = T$. ◊
We also recall the definition of an isometry, which we restate for convenience in the special case we are considering here, and establish some properties of isometries.
Definition 7.3.6. Let $V$ be an inner product space. An isometry $T \colon V \to V$ is an invertible linear transformation such that $\langle T(v), T(w)\rangle = \langle v, w\rangle$ for all $v, w \in V$. ◊
We observe that sometimes invertibility is automatic.
Lemma 7.3.7. Let $T \colon V \to V$ be a linear transformation. Then
$$\langle T(v), T(w)\rangle = \langle v, w\rangle$$
for all $v, w \in V$ if and only if $\|T(v)\| = \|v\|$ for all $v \in V$. If these equivalent conditions are satisfied, then $T$ is an injection. If furthermore $V$ is finite-dimensional, then $T$ is an isomorphism.
Proof. Since $\|T(v)\|^2 = \langle T(v), T(v)\rangle$, the first condition implies the second, and the second implies the first by the polarization identities.
Suppose these conditions are satisfied. Let $v \in V$, $v \neq 0$. Then $0 \neq \|v\| = \|T(v)\|$, so $T(v) \neq 0$ and $T$ is an injection. Any injection from a finite-dimensional vector space to itself is an isomorphism.
Example 7.3.8. Let $V = rF^\infty$ with the standard inner product
$$\langle [x_1, x_2, \ldots], [y_1, y_2, \ldots]\rangle = \sum x_i\overline{y_i}.$$
Then the right shift $R \colon V \to V$ satisfies $\langle R(v), R(w)\rangle = \langle v, w\rangle$ for every $v, w \in V$, and $R$ is an injection but not an isomorphism. ◊
Lemma 7.3.9. Let $T \colon V \to V$ be an isometry. Then $T$ has an adjoint $T^*$, and $T^* = T^{-1}$.
Proof. If there is a linear transformation $S \colon V \to V$ such that
$$\langle T(v), w\rangle = \langle v, S(w)\rangle \quad\text{for every } v, w \in V,$$
then $T^*$ is well-defined and $S = T^*$. Since $T$ is an isometry, we see that
$$\langle v, T^{-1}(w)\rangle = \langle T(v), TT^{-1}(w)\rangle = \langle T(v), w\rangle.$$
Corollary 7.3.10. (1) If $T$ is self-adjoint then $T$ is normal.
(2) If $T$ is an isometry then $T$ is normal.
We introduce some traditional language.
Definition 7.3.11. If $V$ is a real inner product space, an isometry of $V$ is orthogonal. If $V$ is a complex inner product space, an isometry of $V$ is unitary. ◊
Definition 7.3.12. A matrix $P$ is orthogonal if ${}^tP = P^{-1}$. A matrix $P$ is unitary if ${}^t\overline{P} = P^{-1}$. ◊
Corollary 7.3.13. Let $V$ be a finite-dimensional inner product space and let $T \colon V \to V$ be a linear transformation. Let $C$ be an orthonormal basis of $V$ and set $M = [T]_C$.
(1) If $V$ is a real vector space, then:
(a) if $T$ is self-adjoint, $M$ is symmetric;
(b) if $T$ is orthogonal, $M$ is orthogonal.
(2) If $V$ is a complex vector space, then:
(a) if $T$ is self-adjoint, $M$ is Hermitian;
(b) if $T$ is unitary, $M$ is unitary.
Proof. Immediate from Corollary 6.3.7.
Let us now look at some interesting examples on infinite-dimensional vector spaces.
Example 7.3.14. (1) Let $V = rF^\infty$. Let $R \colon V \to V$ be right shift, and $L \colon V \to V$ be left shift. Let $v = [x_1, x_2, \ldots]$ and $w = [y_1, y_2, \ldots]$. Then
$$\langle R(v), w\rangle = x_1\overline{y_2} + x_2\overline{y_3} + \cdots = \langle v, L(w)\rangle,$$
so $L = R^*$. Similarly,
$$\langle L(v), w\rangle = x_2\overline{y_1} + x_3\overline{y_2} + \cdots = \langle v, R(w)\rangle,$$
so $R = L^*$ (as we expect from Lemma 6.3.11). Note that $LR = I$ but $RL \neq I$, so $L$ and $R$ are not normal. Also note that $1 = \dim(\operatorname{Ker}(L)) \neq 0 = \dim(\operatorname{Ker}(R))$, giving a counterexample to the conclusion of Lemma 7.3.1 in the infinite-dimensional case.
(2) Let $V$ be the vector space of doubly infinite sequences of elements of $F$, only finitely many of which are nonzero:
$$V = \{[\ldots, x_{-2}, x_{-1}, x_0, x_1, x_2, \ldots] \mid x_i = 0 \text{ for all but finitely many } i\}.$$
$V$ has the inner product $\langle v, w\rangle = \sum x_i\overline{y_i}$ (in the obvious notation) and linear transformations $R$ (right shift) and $L$ (left shift) defined in the obvious way. Then $L$ and $R$ are both isometries, and are inverses of each other. Direct computation as in (1) shows that $L^* = R$ and $R^* = L$, as we expect from Lemma 7.3.9.
(3) Let $V = rF^\infty$ and let $T \colon V \to V$ be defined as follows:
$$T([x_1, x_2, x_3, \ldots]) = \Big[\sum_{i \ge 1} x_i,\; 0,\; 0,\; \ldots\Big].$$
We claim that $T$ does not have an adjoint. We prove this by contradiction. Suppose $T^*$ existed. Let $T^*(e_1) = [a_1, a_2, a_3, \ldots]$. Then for each $k = 1, 2, 3, \ldots$,
$$1 = \langle T(e_k), e_1\rangle = \langle e_k, T^*(e_1)\rangle = \overline{a_k},$$
which is impossible, as $T^*(e_1) \in V$ has only finitely many nonzero entries. ◊
We may construct normal linear transformations as follows.
Example 7.3.15. Let $\lambda_1, \ldots, \lambda_k$ be distinct scalars and let $W_1, \ldots, W_k$ be nonzero subspaces of $V$ with $V = W_1 \perp \cdots \perp W_k$. Define $T \colon V \to V$ as follows: Let $v \in V$ and write $v$ uniquely as $v = v_1 + \cdots + v_k$ with $v_i \in W_i$. Then
$$T(v) = \lambda_1 v_1 + \cdots + \lambda_k v_k.$$
(Thus $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of $T$ and $W_1, \ldots, W_k$ are the associated eigenspaces.) It is easy to check that
$$T^*(v) = \overline{\lambda_1}v_1 + \cdots + \overline{\lambda_k}v_k$$
(so $\overline{\lambda_1}, \ldots, \overline{\lambda_k}$ are the distinct eigenvalues of $T^*$ and $W_1, \ldots, W_k$ are the associated eigenspaces). Then
$$T^*T(v) = |\lambda_1|^2 v_1 + \cdots + |\lambda_k|^2 v_k = TT^*(v),$$
so $T$ is normal. Clearly $T$ is self-adjoint if and only if $\overline{\lambda_i} = \lambda_i$ for each $i$, i.e., if and only if each $\lambda_i$ is real. ◊
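A hedged numerical instance of this construction (an illustration assuming NumPy): with $W_1, W_2$ the coordinate axes of $\mathbb{C}^2$ and one non-real eigenvalue, the resulting $T$ is normal but not self-adjoint.

    import numpy as np

    w1, w2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    pi1, pi2 = np.outer(w1, w1), np.outer(w2, w2)   # orthogonal projections
    T = (2 + 1j) * pi1 + 3.0 * pi2                  # T = l1*pi1 + l2*pi2

    Tstar = T.conj().T
    print(np.allclose(T @ Tstar, Tstar @ T))        # True: T is normal
    print(np.allclose(T, Tstar))                    # False: not self-adjoint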
Our next goal is the spectral theorem, which shows that on a finite-dimensional complex vector space $V$ every normal linear transformation is of this form, and on a finite-dimensional real vector space every self-adjoint linear transformation is of this form.
We first derive a number of properties of normal linear transformations (on an arbitrary vector space $V$).
Lemma 7.3.16. Let $T \colon V \to V$ be a normal linear transformation. Then $T^*$ is normal. Furthermore:
(1) $p(T)$ is normal for any polynomial $p(x) \in \mathbb{C}[x]$; if $T$ is self-adjoint, $p(T)$ is self-adjoint for any polynomial $p(x) \in \mathbb{R}[x]$.
(2) $\|T(v)\| = \|T^*(v)\|$ for every $v \in V$. Consequently $\operatorname{Ker}(T) = \operatorname{Ker}(T^*)$.
(3) $\operatorname{Ker}(T) = \operatorname{Im}(T)^\perp$ and $\operatorname{Ker}(T^*) = \operatorname{Im}(T^*)^\perp$.
(4) If $T^2(v) = 0$ then $T(v) = 0$.
(5) The vector $v \in V$ is an eigenvector of $T$ with eigenvalue $\lambda$ if and only if $v$ is an eigenvector of $T^*$ with eigenvalue $\overline{\lambda}$.
(6) Eigenspaces of distinct eigenvalues of $T$ are orthogonal.
Proof. By Lemma 6.3.11, $T^*$ has adjoint $T^{**} = T$, and then $T^*T^{**} = T^*T = TT^* = T^{**}T^*$.
(1) follows from Lemma 6.3.10.
For (2), we compute
$$\|T(v)\|^2 = \langle T(v), T(v)\rangle = \langle v, T^*T(v)\rangle = \langle v, TT^*(v)\rangle = \langle T^*(v), T^*(v)\rangle = \|T^*(v)\|^2.$$
Also, we observe that $v \in \operatorname{Ker}(T) \iff T(v) = 0 \iff \|T(v)\| = 0$.
For (3), $u \in \operatorname{Ker}(T) \iff u \in \operatorname{Ker}(T^*)$ (by (2)) $\iff \langle T^*(u), v\rangle = 0$ for all $v$ $\iff \langle u, T(v)\rangle = 0$ for all $v$ $\iff u \in \operatorname{Im}(T)^\perp$, yielding the first half of (3); replacing $T$ by $T^*$, which is also normal, we obtain the second half of (3).
For (4), let $w = T(v)$. Then $w \in \operatorname{Im}(T)$. But $T(w) = T^2(v) = 0$, so $w \in \operatorname{Ker}(T)$. Thus $w \in \operatorname{Ker}(T) \cap \operatorname{Im}(T) = \{0\}$ by (3).
For (5), $v$ is an eigenvector of $T$ with eigenvalue $\lambda$ $\iff v \in \operatorname{Ker}(T - \lambda I)$ $\iff v \in \operatorname{Ker}((T - \lambda I)^*)$ (by (2)), and $\operatorname{Ker}((T - \lambda I)^*) = \operatorname{Ker}(T^* - \overline{\lambda}I)$ by Lemma 6.3.10(4).
For (6), let $v_1$ be an eigenvector of $T$ with eigenvalue $\lambda_1$ and let $v_2$ be an eigenvector of $T$ with eigenvalue $\lambda_2$, with $\lambda_2 \neq \lambda_1$. Set $S = T - \lambda_1 I$. Then $S(v_1) = 0$, so
$$0 = \langle S(v_1), v_2\rangle = \langle v_1, S^*(v_2)\rangle = \langle v_1, (T^* - \overline{\lambda_1}I)(v_2)\rangle = \langle v_1, (\overline{\lambda_2} - \overline{\lambda_1})v_2\rangle \;\text{(by (5))} = (\lambda_2 - \lambda_1)\langle v_1, v_2\rangle,$$
so $\langle v_1, v_2\rangle = 0$.
Corollary 7.3.17. Let $V$ be finite-dimensional and let $T \colon V \to V$ be a normal linear transformation. Then $\operatorname{Im}(T) = \operatorname{Im}(T^*)$.
Proof. By Corollary 7.2.8 and Lemma 7.3.16(2) and (3),
$$\operatorname{Im}(T) = \operatorname{Ker}(T)^\perp = \operatorname{Ker}(T^*)^\perp = \operatorname{Im}(T^*).$$
While Lemma 7.3.16 gives information about the eigenvectors of a normal linear transformation $T \colon V \to V$, when $V$ is infinite-dimensional $T$ may have no eigenvalues or eigenvectors.
Example 7.3.18. Let $R$ be right shift, or $L$ left shift, on the vector space $V$ of Example 7.3.14(2). It is easy to check that, since every element of $V$ can have only finitely many nonzero entries, neither $R$ nor $L$ has any eigenvalues or eigenvectors. ◊
By contrast, in the finite-dimensional case we may obtain strong information about the structure of $T$.
Lemma 7.3.19. Let $V$ be a finite-dimensional inner product space and let $T \colon V \to V$ be a normal linear transformation. Then the minimum polynomial $m_T(x)$ is a product of distinct irreducible factors. If $V$ is a complex vector space, or if $V$ is a real vector space and $T$ is self-adjoint, every irreducible factor of $m_T(x)$ is linear.
Proof. Let $p(x)$ be an irreducible factor of $m_T(x)$. We prove that $p^2(x)$ does not divide $m_T(x)$ by contradiction. Suppose $p^2(x)$ divides $m_T(x)$. Then there is a vector $v \in V$ with $p^2(T)(v) = 0$ but $p(T)(v) \neq 0$. Let $S = p(T)$. Then $S$ is normal and $S^2(v) = 0$ but $S(v) \neq 0$, contradicting Lemma 7.3.16(4).
If $V$ is a complex vector space there is nothing further to do, as every complex polynomial is a product of linear factors.
Suppose that $V$ is a real vector space. Then every real polynomial is a product of linear and irreducible quadratic factors, and we must show none of the latter occur. Again we argue by contradiction. Suppose $p(x) = x^2 + bx + c$ is an irreducible factor of $m_T(x)$, and let $v \in V$ be a nonzero vector with $p(T)(v) = 0$. We can write $p(x) = (x + b/2)^2 + d^2$, where $d$ is the real number $d = \sqrt{c - b^2/4}$. Set $S = T + (b/2)I$, so $(S^2 + d^2I)(v) = 0$, i.e., $S^2(v) = -d^2v$. Then, as $S$ is self-adjoint,
$$0 < \langle S(v), S(v)\rangle = \langle v, S^*S(v)\rangle = \langle v, S^2(v)\rangle = -d^2\langle v, v\rangle,$$
which is impossible.
Corollary 7.3.20 (Spectral theorem). (1) Let $V$ be a finite-dimensional complex inner product space and let $T \colon V \to V$ be a normal linear transformation. Then $V$ has an orthonormal basis of eigenvectors of $T$.
(2) Let $V$ be a finite-dimensional real inner product space and let $T \colon V \to V$ be a self-adjoint linear transformation. Then $V$ has an orthonormal basis of eigenvectors of $T$.
Proof. The proof in both cases is identical. By Lemma 7.3.19, $m_T(x)$ is a product of distinct linear factors. Let $\lambda_1, \ldots, \lambda_k$ be the roots of $m_T(x)$, i.e., by Lemma 4.2.6, the eigenvalues of $T$. Let $E_i$ be the associated eigenspace of $T$, for each $i$. By Theorem 4.3.4, $V = E_1 \oplus \cdots \oplus E_k$, and then by Lemma 7.3.16(6), $V = E_1 \perp \cdots \perp E_k$. By Theorem 7.2.1, each $E_i$ has an orthonormal basis $C_i$. Then $C = C_1 \cup \cdots \cup C_k$ is an orthonormal basis of eigenvectors of $T$.
We restate this result in matrix terms.
Corollary 7.3.21. (1) Let $A$ be a Hermitian matrix. Then there is a unitary matrix $P$ and a diagonal matrix $D$ with
$$A = PDP^{-1} = PD\,{}^t\overline{P}.$$
(2) Let $A$ be a real symmetric matrix. Then there is a real orthogonal matrix $P$ and a diagonal matrix $D$ with real entries with
$$A = PDP^{-1} = PD\,{}^tP.$$
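Numerically, part (2) is exactly what np.linalg.eigh computes; a minimal check (an illustration assuming NumPy):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])               # real symmetric
    eigenvalues, P = np.linalg.eigh(A)       # columns of P: orthonormal eigenvectors
    D = np.diag(eigenvalues)

    assert np.allclose(P @ D @ P.T, A)       # A = P D P^{-1} = P D tP
    assert np.allclose(P.T @ P, np.eye(2))   # P is orthogonal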
We have a third formulation of the spectral theorem, in terms of orthogonal projections.
Corollary 7.3.22. Under the hypotheses of the spectral theorem, there are distinct complex numbers $\lambda_1, \ldots, \lambda_k$, which are real in case $T$ is self-adjoint, and subspaces $W_1, \ldots, W_k$, such that
(1) $V = W_1 \perp \cdots \perp W_k$;
(2) if $T_i = \pi_{W_i}$ is the orthogonal projection of $V$ onto the subspace $W_i$, then $T_i^2 = T_i$, $T_iT_j = T_jT_i = 0$ for $i \neq j$, and $I = T_1 + \cdots + T_k$.
Furthermore,
$$T = \lambda_1 T_1 + \cdots + \lambda_k T_k.$$
Proof. Here $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $T$ and the subspaces $W_1, \ldots, W_k$ are the eigenspaces $E_1, \ldots, E_k$.
Corollary 7.3.23. In the situation of, and in the notation of, Corollary 7.3.22,
$$T^* = \overline{\lambda_1}T_1 + \cdots + \overline{\lambda_k}T_k.$$
Corollary 7.3.24. Let $V$ be a finite-dimensional inner product space and let $T \colon V \to V$ be a normal linear transformation. Suppose that $m_T(x)$ is a product of linear factors over $F$ (which is always the case if $F = \mathbb{C}$). Then $T$ is an isometry if and only if $|\lambda| = 1$ for every eigenvalue $\lambda \in F$ of $T$.
Let us compare arbitrary and normal linear transformations.
Theorem 7.3.25 (Schur's theorem). Let $V$ be a finite-dimensional inner product space and let $T \colon V \to V$ be an arbitrary linear transformation. Then $V$ has an orthonormal basis $C$ in which $[T]_C$ is upper triangular if and only if the minimum polynomial $m_T(x)$ is a product of linear factors (this being automatic if $F = \mathbb{C}$).
Proof. The "only if" direction is clear. We prove the "if" direction.
For any linear transformation $T$, if $W$ is a $T$-invariant subspace of $V$ then $W^\perp$ is a $T^*$-invariant subspace of $V$, because for any $x \in W$ and $y \in W^\perp$, $0 = \langle T(x), y\rangle = \langle x, T^*(y)\rangle$.
We prove the theorem by induction on $n = \dim(V)$. If $n = 1$ there is nothing to prove. Suppose the theorem is true for all inner product spaces of dimension $n - 1$ and let $V$ have dimension $n$.
Since $m_T(x)$ is a product of linear factors, so is $m_{T^*}(x)$, by Corollary 7.3.3. In particular $T^* \colon V \to V$ has an eigenvector $v_n$, and we may assume $\|v_n\| = 1$. Let $W$ be the subspace of $V$ spanned by $\{v_n\}$. Then $W^\perp$ is a subspace of $V$ of dimension $n - 1$ that is invariant under $T^{**} = T$. If $S$ is the restriction of $T$ to $W^\perp$, then $m_S(x)$ divides $m_T(x)$, so $m_S(x)$ is a product of linear factors. Applying the inductive hypothesis, we conclude that $W^\perp$ has an orthonormal basis $C_1 = \{v_1, \ldots, v_{n-1}\}$ with $[S]_{C_1}$ upper triangular. Set $C = \{v_1, \ldots, v_n\}$. Then $[T]_C$ is upper triangular.
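Schur's theorem is also available computationally; a sketch assuming SciPy (scipy.linalg.schur with output='complex' returns $A = ZTZ^*$ with $T$ upper triangular and $Z$ unitary). The rotation matrix below has no real eigenvalues, so the complex form is essential:

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])              # rotation by 90 degrees
    T, Z = schur(A, output='complex')

    assert np.allclose(Z @ T @ Z.conj().T, A)
    assert np.allclose(np.tril(T, -1), 0)    # T is upper triangular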
Theorem 7.3.26. Let $V$ be a finite-dimensional inner product space and let $T \colon V \to V$ be a linear transformation. Let $C$ be any orthonormal basis of $V$ with $[T]_C$ upper triangular. Then $T$ is normal if and only if $[T]_C$ is diagonal.
Proof. The "if" direction is clear. We prove the "only if" direction. Let $E = [T]_C$. By the spectral theorem, Corollary 7.3.21, $V$ has a basis $C_1$ with $D = [T]_{C_1}$ diagonal. Then $E = PDP^{-1}$, where $P = P_{C \leftarrow C_1}$ is the change of basis matrix. We know $P = Q^{-1}R$, where $Q = P_{E \leftarrow C}$ and $R = P_{E \leftarrow C_1}$. Since $C$ and $C_1$ are both orthonormal, $Q$ and $R$ are both isometries, and hence $P$ is an isometry: ${}^tP = P^{-1}$ in the real case and ${}^t\overline{P} = P^{-1}$ in the complex case. Thus ${}^tE = {}^t(PDP^{-1}) = {}^t(PD\,{}^tP) = P\,{}^tD\,{}^tP = PDP^{-1} = E$ in the real case, and similarly ${}^t\overline{E} = E$ in the complex case. Since $E$ is upper triangular, this forces $E$ to be diagonal.
7.4 Examples
In this section we present some interesting and important examples of inner product spaces and related phenomena. We look at orthogonal or orthonormal sets, linear transformations that do or do not have adjoints, and linear transformations that are or are not normal.
Our examples share a common set-up. We begin with an interval $I \subseteq \mathbb{R}$ and a "weight" function $w(x)$ on $I$. We further suppose that we have a vector space $V$ of functions on $I$ with the properties that
(a) $\int_I f(x)\overline{g(x)}\,w(x)\,dx$ is defined for all $f(x), g(x) \in V$,
(b) $\int_I f(x)\overline{f(x)}\,w(x)\,dx$ is a nonnegative real number for every $f(x) \in V$, and is zero only if $f(x) = 0$.
Then $V$ together with
$$\langle f(x), g(x)\rangle = \int_I f(x)\overline{g(x)}\,w(x)\,dx$$
is an inner product space.
Except in Examples 7.4.3 and 7.4.4, we restrict our attention to the real case. This is purely for convenience, and the results generalize to the complex case without change.
Example 7.4.1. (1) Let $V = P_\infty(\mathbb{R})$, the space of all real polynomials. Then
$$\varphi\big(f(x), g(x)\big) = \langle f(x), g(x)\rangle = \int_0^1 f(x)g(x)\,dx$$
gives $V$ the structure of an inner product space.
We claim that the map $\alpha_\varphi \colon V \to V^*$ is not surjective, where
$$\alpha_\varphi(g(x))(f(x)) = \varphi\big(f(x), g(x)\big).$$
For any $a \in [0, 1]$, we have the element $E_a$ of $V^*$ given by $E_a(f(x)) = f(a)$. We claim that for any finite set of points $\{a_1, \ldots, a_k\}$ in $[0, 1]$ and any constants $\{c_1, \ldots, c_k\}$, not all zero, $\sum c_iE_{a_i}$ is not in $\alpha_\varphi(V)$. We prove this by contradiction. Suppose $\sum c_iE_{a_i} = \alpha_\varphi(g(x))$ for some $g(x) \in V$. Then for any polynomial $f(x) \in V$,
$$\int_0^1 f(x)g(x)\,dx = \sum_{i=1}^k c_if(a_i).$$
Clearly $g(x) \neq 0$.
Choose
$$f(x) = \Big(\prod_{i=1}^k (x - a_i)^2\Big)g(x).$$
The left-hand side of this equation is positive while the right-hand side is zero, which is impossible.
(2) For any $n$, let $V = P_{n-1}(\mathbb{R})$, the space of all real polynomials of degree at most $n - 1$. Again
$$\varphi\big(f(x), g(x)\big) = \langle f(x), g(x)\rangle = \int_0^1 f(x)g(x)\,dx$$
gives $V$ the structure of an inner product space. Here $\dim(V) = n$, so $\dim(V^*) = n$ as well.
(a) Any $n$ linearly independent elements of $V^*$ form a basis of $V^*$. In particular $\{E_{a_1}, \ldots, E_{a_n}\}$ is a basis of $V^*$ for any set of distinct points $\{a_1, \ldots, a_n\}$ in $[0, 1]$. Then for any fixed $g(x) \in V$, $\alpha_\varphi(g(x)) \in V^*$, so $\alpha_\varphi(g(x))$ is a linear combination of $\{E_{a_1}, \ldots, E_{a_n}\}$. In other words, there are constants $c_1, \ldots, c_n$ such that
$$\int_0^1 f(x)g(x)\,dx = \sum_{i=1}^n c_if(a_i).$$
In particular, we may choose $g(x) = 1$, so there are constants $c_1, \ldots, c_n$ with
$$\int_0^1 f(x)\,dx = \sum_{i=1}^n c_if(a_i) \quad\text{for every } f(x) \in P_{n-1}(\mathbb{R}).$$
(b) Since $\alpha_\varphi$ is an injection and $V$ is finite-dimensional, it is a surjection. Thus any element of $V^*$ is $\alpha_\varphi(g(x))$ for a unique polynomial $g(x) \in P_{n-1}(\mathbb{R})$. In particular, this is true for $E_a$, for any $a \in [0, 1]$. Thus there is a polynomial $g(x) \in P_{n-1}(\mathbb{R})$ such that
$$f(a) = \int_0^1 f(x)g(x)\,dx \quad\text{for every } f(x) \in P_{n-1}(\mathbb{R}).$$
Concrete instances of both parts (a) and (b) of this example were given in Example 1.6.9(3) and (4). ◊
Example 7.4.2. We let $V = P_\infty(\mathbb{R})$ and we choose the standard basis
$$E = \{p_0(x), p_1(x), p_2(x), \ldots\} = \{1, x, x^2, \ldots\}$$
of $V$. We may apply the Gram-Schmidt process to obtain an orthonormal basis $C = \{q_0(x), q_1(x), q_2(x), \ldots\}$ of $V$. Actually, we will obtain an orthogonal basis $C$ of $V$, but we will normalize the basis elements by $\|q_i(x)\|^2 = h_i$, where $\{h_0, h_1, h_2, \ldots\}$ is not necessarily $\{1, 1, 1, \ldots\}$. This is partly for historical reasons, but mostly because the purposes for which these functions were originally derived made the given normalizations more useful.
(1) Let $I = [-1, 1]$ and $w(x) = 1$. Let $h_n = 2/(2n+1)$. The sequence of polynomials we obtain in this way are the Legendre polynomials $P_0(x), P_1(x), P_2(x), \ldots$. The first few of these are
$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \tfrac12(-1 + 3x^2), \quad P_3(x) = \tfrac12(-3x + 5x^3), \quad P_4(x) = \tfrac18(3 - 30x^2 + 35x^4),$$
and, expressing the elements of $E$ in terms of them,
$$1 = P_0(x), \quad x = P_1(x), \quad x^2 = \tfrac13\big(P_0(x) + 2P_2(x)\big), \quad x^3 = \tfrac15\big(3P_1(x) + 2P_3(x)\big), \quad x^4 = \tfrac{1}{35}\big(7P_0(x) + 20P_2(x) + 8P_4(x)\big).$$
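These relations are easy to verify numerically before moving on to (2); a sketch assuming NumPy, using Gauss-Legendre quadrature (exact for polynomial integrands of this degree) to check orthogonality and $h_2 = 2/5$:

    import numpy as np
    from numpy.polynomial import legendre

    x, w = legendre.leggauss(20)             # 20-point quadrature on [-1, 1]
    P2 = legendre.Legendre.basis(2)(x)       # P_2 evaluated at the nodes
    P3 = legendre.Legendre.basis(3)(x)

    print(np.sum(w * P2 * P3))               # ~0: orthogonal
    print(np.sum(w * P2 * P2), 2/(2*2 + 1))  # both ~0.4 = h_2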
(2) Let $I = [-1, 1]$ and $w(x) = 1/\sqrt{1 - x^2}$. Let $h_0 = \pi$ and $h_n = \pi/2$ for $n \ge 1$. The sequence of polynomials we obtain in this way are the Chebyshev polynomials of the first kind $T_0(x), T_1(x), T_2(x), \ldots$. The first few of these are given by
$$T_0(x) = 1, \quad T_1(x) = x, \quad T_2(x) = -1 + 2x^2, \quad T_3(x) = -3x + 4x^3, \quad T_4(x) = 1 - 8x^2 + 8x^4,$$
and, expressing the elements of $E$ in terms of them,
$$1 = T_0(x), \quad x = T_1(x), \quad x^2 = \tfrac12\big(T_0(x) + T_2(x)\big), \quad x^3 = \tfrac14\big(3T_1(x) + T_3(x)\big), \quad x^4 = \tfrac18\big(3T_0(x) + 4T_2(x) + T_4(x)\big).$$
(3) Let $I = [-1, 1]$ and $w(x) = \sqrt{1 - x^2}$. Let $h_n = \pi/2$ for all $n$. The sequence of polynomials we obtain in this way are the Chebyshev polynomials of the second kind $U_0(x), U_1(x), U_2(x), \ldots$. The first few of these are
$$U_0(x) = 1, \quad U_1(x) = 2x, \quad U_2(x) = -1 + 4x^2, \quad U_3(x) = -4x + 8x^3, \quad U_4(x) = 1 - 12x^2 + 16x^4,$$
and, expressing the elements of $E$ in terms of them,
$$1 = U_0(x), \quad x = \tfrac12U_1(x), \quad x^2 = \tfrac14\big(U_0(x) + U_2(x)\big), \quad x^3 = \tfrac18\big(2U_1(x) + U_3(x)\big), \quad x^4 = \tfrac{1}{16}\big(2U_0(x) + 3U_2(x) + U_4(x)\big).$$
(4) Let $I = \mathbb{R}$ and $w(x) = e^{-x^2}$. Let $h_n = \sqrt{\pi}\,2^n n!$. The sequence of polynomials we obtain in this way are the Hermite polynomials $H_0(x), H_1(x), H_2(x), \ldots$. The first few of these are
$$H_0(x) = 1, \quad H_1(x) = 2x, \quad H_2(x) = -2 + 4x^2, \quad H_3(x) = -12x + 8x^3, \quad H_4(x) = 12 - 48x^2 + 16x^4,$$
and, expressing the elements of $E$ in terms of them,
$$1 = H_0(x), \quad x = \tfrac12H_1(x), \quad x^2 = \tfrac14\big(2H_0(x) + H_2(x)\big), \quad x^3 = \tfrac18\big(6H_1(x) + H_3(x)\big), \quad x^4 = \tfrac{1}{16}\big(12H_0(x) + 12H_2(x) + H_4(x)\big). \;◊$$
Example 7.4.3. We consider an orthogonal (and hence linearly independent) set $C = \{q_0(x), q_1(x), q_2(x), \ldots\}$ of nonzero functions in $V$. Let $h_n = \|q_n\|^2$ for each $n$.
Let $f(x) \in V$ be arbitrary. For each $n = 0, 1, 2, \ldots$ let
$$c_n = \frac{1}{h_n}\langle f(x), q_n(x)\rangle,$$
the Fourier coefficients of $f(x)$ in terms of $C$, and form the sequence of functions $g_0(x), g_1(x), g_2(x), \ldots$ defined by
$$g_m(x) = \sum_{k=0}^m c_kq_k(x).$$
Then for any $m$,
$$\langle g_m(x), q_n(x)\rangle = \langle f(x), q_n(x)\rangle \quad\text{for all } n \le m,$$
and of course
$$\langle g_m(x), q_n(x)\rangle = 0 \quad\text{for all } n > m.$$
We think of $g_0(x), g_1(x), g_2(x), \ldots$ as a sequence of approximations to $f(x)$, and we hope that it converges in some sense to $f(x)$. Of course, the question of convergence is one of analysis and not linear algebra. ◊
We do, however, present the following extremely important special case.
Example 7.4.4. Let $V = L^2([-\pi, \pi])$. By definition, this is the space of complex-valued measurable functions $f(x)$ on $[-\pi, \pi]$ such that the Lebesgue integral
$$\int_{-\pi}^{\pi} |f(x)|^2\,dx$$
is finite.
Then, by the Cauchy-Schwarz-Buniakowsky inequality, $V$ is an inner product space with inner product
$$\langle f(x), g(x)\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$$
For each integer $n$, let $p_n(x) = e^{inx}$. Then $\{p_n(x)\}$ is an orthonormal set, as we see from the equalities
$$\|p_n(x)\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}\,\overline{e^{inx}}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} 1\,dx = 1$$
and, for $m \neq n$,
$$\langle p_m(x), p_n(x)\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{imx}\,\overline{e^{inx}}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(m-n)x}\,dx = \frac{1}{2\pi i(m-n)}\,e^{i(m-n)x}\Big|_{-\pi}^{\pi} = 0.$$
For any function $f(x) \in L^2([-\pi, \pi])$ we have its classical Fourier coefficients
$$\hat f(n) = \langle f(x), p_n(x)\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{p_n(x)}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$$
for any integer $n$, and the Fourier expansion
$$g(x) = \sum_{n=-\infty}^{\infty} \hat f(n)p_n(x).$$
It is a theorem from analysis that the right-hand side is well-defined, i.e., that if for a nonnegative integer $m$ we define
$$g_m(x) = \sum_{n=-m}^{m} \hat f(n)p_n(x),$$
then $g(x) = \lim_{m\to\infty} g_m(x)$ exists, and furthermore it is another theorem from analysis that, as functions in $L^2([-\pi, \pi])$,
$$f(x) = g(x).$$
This is equivalent to $\lim_{m\to\infty}\|f(x) - g_m(x)\| = 0$, and so we may regard $g_0(x), g_1(x), g_2(x), \ldots$ as a sequence of approximations that converges to $f(x)$ (in norm). ◊
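The coefficients $\hat f(n)$ are easy to approximate numerically; a sketch (assuming NumPy, with $f(x) = x$ as a hypothetical test function, whose classical coefficients are $\hat f(n) = (-1)^ni/n$ for $n \neq 0$):

    import numpy as np

    x = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
    f = x                                               # test function f(x) = x
    fhat = lambda n: np.mean(f * np.exp(-1j * n * x))   # Riemann sum for (1/2pi) * integral

    for n in (1, 2, 3):
        print(n, fhat(n), 1j * (-1)**n / n)             # numerical vs. exact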
Now we turn from orthogonal sets to adjoints and normality.
Example 7.4.5. (1) Let $V = C_0^\infty(\mathbb{R})$ be the space of real-valued infinitely differentiable functions on $\mathbb{R}$ with compact support (i.e., for every $f(x) \in C_0^\infty(\mathbb{R})$ there is a compact interval $I \subseteq \mathbb{R}$ with $f(x) = 0$ for $x \notin I$). Then $V$ is an inner product space with inner product given by
$$\langle f(x), g(x)\rangle = \int_{-\infty}^{\infty} f(x)g(x)\,dx.$$
Let $D \colon V \to V$ be defined by $D(f(x)) = f'(x)$. Then $D$ has an adjoint $D^* \colon V \to V$ given by $D^*(f(x)) = E(f(x)) = -f'(x)$; i.e., $D^* = -D$. To see this, we compute
$$\langle D(f(x)), g(x)\rangle - \langle f(x), E(g(x))\rangle = \int_{-\infty}^{\infty} f'(x)g(x)\,dx - \int_{-\infty}^{\infty} f(x)\big(-g'(x)\big)\,dx = \int_{-\infty}^{\infty} \big(f'(x)g(x) + f(x)g'(x)\big)\,dx = \int_{-\infty}^{\infty} \big(f(x)g(x)\big)'\,dx = f(x)g(x)\Big|_a^b = 0,$$
where the support of $f(x)g(x)$ is contained in the interval $[a, b]$.
Since $D^* = -D$, $D^*$ commutes with $D$, so $D$ is normal.
(2) Let $V = C^\infty(\mathbb{R})$ or $V = P_\infty(\mathbb{R})$, with inner product given by
$$\langle f(x), g(x)\rangle = \int_0^1 f(x)g(x)\,dx.$$
We claim that $D \colon V \to V$ defined by $D(f(x)) = f'(x)$ does not have an adjoint. We prove this by contradiction. Suppose $D$ has an adjoint $D^* = E$. Guided by (1), we write $E(f(x)) = -f'(x) + F(f(x))$. Then we compute
$$\langle D(f(x)), g(x)\rangle - \langle f(x), E(g(x))\rangle = \int_0^1 \big(f(x)g(x)\big)'\,dx - \int_0^1 f(x)F(g(x))\,dx = f(1)g(1) - f(0)g(0) - \int_0^1 f(x)F(g(x))\,dx,$$
true for every pair of functions $f(x), g(x) \in V$. Suppose there is some function $g_0(x)$ with $F(g_0(x)) \neq 0$. Setting $f(x) = x^2(x-1)^2F(g_0(x))$, we find a nonzero right-hand side, so $E$ is not an adjoint of $D$. Thus the only possibility is that $F(f(x)) = 0$ for every $f(x) \in V$, and hence that $E(f(x)) = -f'(x)$. Then $f(1)g(1) - f(0)g(0) = 0$ for every pair of functions $f(x), g(x) \in V$, which is false (e.g., for $f(x) = 1$ and $g(x) = x$).
(3) For any fixed $n$ let $V = P_{n-1}(\mathbb{R})$ with the same inner product. Then $V$ is finite-dimensional. Thus $D \colon V \to V$ has an adjoint $D^* \colon V \to V$. In case $n = 1$, $D = 0$ so $D^* = 0$, and $D$ is trivially normal. For $n > 1$, $D$ is not normal: Let $f(x) = x$. Then $D^2(f(x)) = 0$ but $D(f(x)) \neq 0$, so $D$ cannot be normal, by Lemma 7.3.16(4).
Let us compute $D^*$ for some small values of $n$. If we set $D^*(g(x)) = h(x)$, we are looking for functions satisfying
$$\int_0^1 f'(x)g(x)\,dx = \int_0^1 f(x)h(x)\,dx \quad\text{for every } f(x) \in V.$$
Since $D^*$ is a linear transformation, it suffices to give the values of $D^*$ on the elements of a basis of $V$. We choose the standard basis $E$.
On $P_0(\mathbb{R})$:
$$D^*(1) = 0.$$
On $P_1(\mathbb{R})$:
$$D^*(1) = -6 + 12x, \qquad D^*(x) = -3 + 6x.$$
On $P_2(\mathbb{R})$:
$$D^*(1) = -6 + 12x, \qquad D^*(x) = 2 - 24x + 30x^2, \qquad D^*(x^2) = 3 - 26x + 30x^2. \;◊$$
7.5 The singular value decomposition
In this section we augment our results on normal linear transformations to obtain geometric information about an arbitrary linear transformation $T \colon V \to W$ between finite-dimensional inner product spaces. We assume we are in this situation throughout.
Lemma 7.5.1. (1) $T^*T$ is self-adjoint.
(2) $\operatorname{Ker}(T^*T) = \operatorname{Ker}(T)$.
Proof. For (1), $(T^*T)^* = T^*T^{**} = T^*T$.
For (2), we have $\operatorname{Ker}(T^*T) \supseteq \operatorname{Ker}(T)$. On the other hand, let $v \in \operatorname{Ker}(T^*T)$. Then
$$0 = \langle v, 0\rangle = \langle v, T^*T(v)\rangle = \langle T(v), T(v)\rangle,$$
so $T(v) = 0$ and hence $\operatorname{Ker}(T^*T) \subseteq \operatorname{Ker}(T)$.
Definition 7.5.2. A linear transformation $S \colon V \to V$ is nonnegative (respectively, positive) if $S$ is self-adjoint and $\langle S(v), v\rangle \ge 0$ (respectively, $\langle S(v), v\rangle > 0$) for every $v \in V$, $v \neq 0$. ◊
Lemma 7.5.3. The following are equivalent:
(1) $S \colon V \to V$ is nonnegative (respectively, positive).
(2) $S \colon V \to V$ is self-adjoint and all the eigenvalues of $S$ are nonnegative (respectively, positive).
(3) $S = T^*T$ for some (respectively, some invertible) linear transformation $T \colon V \to V$.
Proof. (1) and (2) are equivalent by the spectral theorem, Corollary 7.3.20.
If $S$ is self-adjoint with distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, all $\ge 0$, then in the notation of Corollary 7.3.22 we have $S = \lambda_1T_1 + \cdots + \lambda_kT_k$. Choosing
$$T = R = \sqrt{\lambda_1}\,T_1 + \cdots + \sqrt{\lambda_k}\,T_k,$$
we have $T^* = R$ as well, and then $T^*T = R^2 = S$, so (2) implies (3).
Suppose (3) is true. We already know by Lemma 7.5.1(1) that $T^*T$ is self-adjoint. Let $\lambda$ be an eigenvalue of $T^*T$, and let $v$ be an associated eigenvector. Then
$$\lambda\langle v, v\rangle = \langle \lambda v, v\rangle = \langle T^*T(v), v\rangle = \langle T(v), T(v)\rangle,$$
so $\lambda \ge 0$. By Lemma 7.5.1(2), $T^*T$ is invertible if and only if $T$ is invertible, and we know that $T^*T$ is invertible if and only if all its eigenvalues are nonzero. Thus (3) implies (2).
Corollary 7.5.4. For any nonnegative linear transformation $S \colon V \to V$ there is a unique nonnegative linear transformation $R \colon V \to V$ with $R^2 = S$.
Proof. $R$ is constructed in the proof of Lemma 7.5.3. Uniqueness follows easily by considering eigenvalues and eigenspaces.
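The construction of $R$ is exactly a spectral computation; a minimal numerical sketch (an illustration assuming NumPy):

    import numpy as np

    S = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                   # positive: eigenvalues 1, 3
    eigenvalues, P = np.linalg.eigh(S)
    R = P @ np.diag(np.sqrt(eigenvalues)) @ P.T  # R = sum_i sqrt(l_i) pi_i

    assert np.allclose(R @ R, S)                 # R^2 = S
    assert np.allclose(R, R.T)                   # R is again self-adjoint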
Definition 7.5.5. Let $T \colon V \to W$ have rank $r$. Let $\lambda_1, \ldots, \lambda_r$ be the (not necessarily distinct) nonzero eigenvalues of $T^*T$ (all of which are necessarily positive), ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$. Then $\sigma_1 = \sqrt{\lambda_1}, \ldots, \sigma_r = \sqrt{\lambda_r}$ are the singular values of $T$. ◊
Theorem 7.5.6 (Singular value decomposition). Let $T \colon V \to W$ have rank $r$, and let $\sigma_1, \ldots, \sigma_r$ be the singular values of $T$. Then there are orthonormal bases $C = \{v_1, \ldots, v_n\}$ of $V$ and $D = \{w_1, \ldots, w_m\}$ of $W$ such that
$$T(v_i) = \sigma_iw_i \text{ for } i = 1, \ldots, r \quad\text{and}\quad T(v_i) = 0 \text{ for } i > r.$$
Proof. Since $T^*T$ is self-adjoint, we know that there is an orthonormal basis $C = \{v_1, \ldots, v_n\}$ of $V$ of eigenvectors of $T^*T$, and we order the basis so that the associated eigenvalues are $\lambda_1, \ldots, \lambda_r, 0, \ldots, 0$. For $i = 1, \ldots, r$, let
$$w_i = (1/\sigma_i)T(v_i).$$
We claim $C_1 = \{w_1, \ldots, w_r\}$ is an orthonormal set. We compute
$$\langle w_i, w_i\rangle = (1/\sigma_i^2)\langle T(v_i), T(v_i)\rangle = (1/\sigma_i^2)\lambda_i = 1,$$
and for $i \neq j$,
$$\langle w_i, w_j\rangle = (1/\sigma_i\sigma_j)\langle T(v_i), T(v_j)\rangle = (1/\sigma_i\sigma_j)\langle v_i, T^*T(v_j)\rangle = (1/\sigma_i\sigma_j)\langle v_i, \lambda_jv_j\rangle = (\lambda_j/\sigma_i\sigma_j)\langle v_i, v_j\rangle = 0.$$
Then extend $C_1$ to an orthonormal basis $D$ of $W$.
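Numerically this is np.linalg.svd; a hedged check (assuming NumPy) that the singular values are the square roots of the nonzero eigenvalues of $T^*T$ and that $T(v_i) = \sigma_iw_i$:

    import numpy as np

    rng = np.random.default_rng(1)
    T = rng.standard_normal((4, 3))                  # rank 3, generically

    W, sigma, Vt = np.linalg.svd(T, full_matrices=False)
    eigenvalues = np.linalg.eigvalsh(T.T @ T)[::-1]  # descending order

    assert np.allclose(sigma, np.sqrt(eigenvalues))  # sigma_i = sqrt(lambda_i)
    assert np.allclose(T @ Vt.T, W * sigma)          # T v_i = sigma_i w_i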
Remark 7.5.7. This theorem has a geometric interpretation. We choose new letters to have an unbiased description. Let $X$ be an inner product space and consider an orthonormal set $B = \{x_1, \ldots, x_n\}$ of vectors in $X$. Then for any positive real numbers $a_1, \ldots, a_k$,
$$\Big\{x = c_1x_1 + \cdots + c_kx_k \;\Big|\; \sum_{i=1}^k |c_i|^2/a_i^2 = 1\Big\}$$
defines an ellipsoid in $X$. If $k = \dim(X)$ and $a_i = 1$ for each $i$, this ellipsoid is the unit sphere in $X$.
The singular value decomposition says that if $T \colon V \to W$ is a linear transformation, then the image of the unit sphere of $V$ under $T$ is an ellipsoid in $W$, and furthermore it completely identifies that ellipsoid. ◊
We also observe the following.
Corollary 7.5.8. $T$ and $T^*$ have the same singular values.
Proof. This is a special case of Theorem 5.9.2.
Proceeding along these lines we now derive the polar decomposition of a linear transformation.
Theorem 7.5.9 (Polar decomposition). Let $T \colon V \to V$ be a linear transformation. Then there is a unique positive semidefinite linear transformation $R \colon V \to V$ and an isometry $Q \colon V \to V$ with $T = QR$. If $T$ is invertible, $Q$ is also unique.
Proof. Suppose $T = QR$. By definition, $Q^* = Q^{-1}$ and $R^* = R$. Then
$$T^*T = (QR)^*QR = R^*(Q^*Q)R = R^*IR = R^2.$$
Then, by Corollary 7.5.4, $R$ is unique.
Suppose that $T$ is invertible, and define $R$ as in Corollary 7.5.4, so $R^2 = T^*T$. Then $R$ is invertible, and then $T = QR$ for the unique linear transformation $Q = TR^{-1}$. It remains to show that $Q$ is an isometry. We compute, for any $v \in V$:
$$\langle Q(v), Q(v)\rangle = \langle TR^{-1}(v), TR^{-1}(v)\rangle = \langle v, (R^{-1})^*T^*TR^{-1}(v)\rangle = \langle v, R^{-1}T^*TR^{-1}(v)\rangle = \langle v, R^{-1}R^2R^{-1}(v)\rangle = \langle v, v\rangle.$$
Suppose that $T$ is not (necessarily) invertible. Choose a linear transformation $S \colon \operatorname{Im}(R) \to V$ with $RS = I \colon \operatorname{Im}(R) \to \operatorname{Im}(R)$.
By Lemma 7.5.1 we know that $\operatorname{Ker}(T^*T) = \operatorname{Ker}(T)$ and also that
$$\operatorname{Ker}(R) = \operatorname{Ker}(R^*R) = \operatorname{Ker}(R^2) = \operatorname{Ker}(T^*T).$$
Hence $Y = \operatorname{Im}(R)^\perp$ and $Z = \operatorname{Im}(T)^\perp$ are inner product spaces of the same dimension $\big(\dim(\operatorname{Ker}(T))\big)$ and hence are isometric. Choose an isometry $Q_0 \colon Y \to Z$. Define $Q$ as follows: Let $X = \operatorname{Im}(R)$, so $V = X \perp Y$. Then
$$Q(v) = TS(x) + Q_0(y) \quad\text{where } v = x + y,\; x \in X,\; y \in Y.$$
(In the invertible case, $S = R^{-1}$ and $Q_0 \colon \{0\} \to \{0\}$, so $Q$ is unique, $Q = TR^{-1}$. In general, it can be checked that $Q$ is independent of the choice of $S$, but it depends on the choice of $Q_0$, and is not unique.)
We claim that $QR = T$ and that $Q$ is an isometry.
To prove the first claim, we make a preliminary observation. For any $v \in V$, let $x = R(v)$. Then $R(S(x) - v) = RS(x) - R(v) = x - x = 0$, i.e., $S(x) - v \in \operatorname{Ker}(R)$. But $\operatorname{Ker}(R) = \operatorname{Ker}(T)$, so $S(x) - v \in \operatorname{Ker}(T)$, i.e., $T(S(x) - v) = 0$, so $T(S(x)) = T(v)$. Using this observation we compute that for any $v \in V$,
$$QR(v) = Q(x + 0) = TS(x) + Q_0(0) = T(v) + 0 = T(v).$$
To prove the second claim, we observe that for any $v \in V$,
$$\langle R(v), R(v)\rangle = \langle v, R^*R(v)\rangle = \langle v, R^2(v)\rangle = \langle v, T^*T(v)\rangle = \langle T(v), T(v)\rangle.$$
Then, using the fact that $\operatorname{Im}(Q_0) \subseteq Z = \operatorname{Im}(T)^\perp$, and writing $v = x + y$ as above, with $x = R(u)$ for some $u$ (so that $TS(x) = T(u)$ and $\langle T(u), T(u)\rangle = \langle R(u), R(u)\rangle = \langle x, x\rangle$),
$$\langle Q(v), Q(v)\rangle = \langle TS(x) + Q_0(y), TS(x) + Q_0(y)\rangle = \langle TS(x), TS(x)\rangle + \langle Q_0(y), Q_0(y)\rangle = \langle x, x\rangle + \langle y, y\rangle = \langle x + y, x + y\rangle = \langle v, v\rangle.$$
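One standard way to compute the polar decomposition is via the singular value decomposition: if $T = W\Sigma V^*$ then $Q = WV^*$ is an isometry and $R = V\Sigma V^*$ is nonnegative, with $T = QR$. A numerical sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(2)
    T = rng.standard_normal((3, 3))

    W, sigma, Vt = np.linalg.svd(T)
    Q = W @ Vt                              # isometry (orthogonal here)
    R = Vt.T @ np.diag(sigma) @ Vt          # nonnegative square root of T*T

    assert np.allclose(Q @ R, T)            # T = QR
    assert np.allclose(Q.T @ Q, np.eye(3))  # Q is an isometry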
CHAPTER 8
Matrix groups as Lie groups
Lie groups are central objects in mathematics. They lie at the intersection of algebra, analysis, and topology. In this chapter, we will show that many of the groups we have already encountered are in fact Lie groups.
This chapter presupposes a certain knowledge of differential topology, and so we will use definitions and theorems from differential topology without further comment. We will also be a bit sketchy in our arguments in places. Throughout this chapter, "smooth" means $C^\infty$. We use $c_{ij}$ to denote a matrix entry that may be real or complex, $x_{ij}$ to denote a real matrix entry and $z_{ij}$ to denote a complex matrix entry, and we write $z_{ij} = x_{ij} + iy_{ij}$ where $x_{ij}$ and $y_{ij}$ are real numbers. We let $F = \mathbb{R}$ or $\mathbb{C}$ and $d_F = \dim_{\mathbb{R}}F$, so that $d_{\mathbb{R}} = 1$ and $d_{\mathbb{C}} = 2$.
8.1 Definition and first examples
Definition 8.1.1. $G$ is a Lie group if
(1) $G$ is a group,
(2) $G$ is a smooth manifold,
(3) the multiplication map $m \colon G \times G \to G$ given by $m(g_1, g_2) = g_1g_2$ and the inversion map $i \colon G \to G$ given by $i(g) = g^{-1}$ are both smooth maps. ◊
Example 8.1.2. (1) The general linear group
$$GL_n(F) = \{\text{invertible } n\text{-by-}n \text{ matrices with entries in } F\}.$$
$GL_n(F)$ is a Lie group: It is an open subset of $F^{n^2}$, as
$$GL_n(F) = \det{}^{-1}\big(F - \{0\}\big),$$
so it is a smooth manifold of dimension $d_Fn^2$. It is noncompact for every $n \ge 1$, as $GL_1(F)$ contains matrices $[c]$ with $|c|$ arbitrarily large. $GL_n(\mathbb{R})$ has two components and $GL_n(\mathbb{C})$ is connected, as we showed in Theorem 3.5.1 and Theorem 3.5.7. The multiplication map is a smooth map as it is a polynomial in the entries of the matrices, and the inversion map is a smooth map as it is a rational function of the entries of the matrix with nonvanishing denominator, as we see from Corollary 3.3.9.
(2) The special linear group
$$SL_n(F) = \{n\text{-by-}n \text{ matrices of determinant } 1 \text{ with entries in } F\}.$$
$SL_n(F)$ is a Lie group: $SL_n(F) = \det^{-1}(\{1\})$. To show $SL_n(F)$ is a smooth manifold we must show that $1$ is a regular value of $\det$. Let $M = (c_{ij})$, $M \in SL_n(F)$. Expanding by minors of row $i$, we see that
$$1 = \det(M) = c_{i1}(-1)^{i+1}\det(M_{i1}) + c_{i2}(-1)^{i+2}\det(M_{i2}) + \cdots,$$
where $M_{ij}$ is the submatrix obtained by deleting row $i$ and column $j$ of $M$, so at least one of the terms in the sum is nonzero, say $c_{ij}(-1)^{i+j}\det(M_{ij})$. But then the derivative matrix $\det'$ of $\det$ with respect to the matrix entries, when evaluated at $M$, has the entry $(-1)^{i+j}\det(M_{ij}) \neq 0$, so this matrix has rank $d_F$ everywhere. Hence, by the inverse function theorem, $SL_n(F)$ is a smooth submanifold of $F^{n^2}$. Since $\{1\} \subseteq F$ has codimension $d_F$, $SL_n(F)$ has codimension $d_F$ in $F^{n^2}$, so it is a smooth manifold of dimension $d_F(n^2 - 1)$.
$SL_1(F) = \{[1]\}$ is a single point and hence is compact, but $SL_n(F)$ is noncompact for $n > 1$, as we see from the fact that $SL_2(F)$ contains matrices of the form $\begin{bmatrix} c & 0 \\ 0 & 1/c \end{bmatrix}$ with $|c|$ arbitrarily large. An easy modification of the proofs of Theorem 3.5.1 and Theorem 3.5.7 shows that $SL_n(F)$ is always connected. Locally, $SL_n(F)$ is parameterized by all but one matrix entry, and, by the implicit function theorem, that entry is locally a function of the other $n^2 - 1$ entries. We have observed that multiplication and inversion are smooth functions in the entries of a matrix, and hence multiplication and inversion are smooth functions of the parameters in a coordinate patch around each element of $SL_n(F)$, i.e., $m \colon SL_n(F) \times SL_n(F) \to SL_n(F)$ and $i \colon SL_n(F) \to SL_n(F)$ are smooth functions. ◊
8.2 Isometry groups of forms
Our next family of examples arises as isometry groups of nonsingular bilinear or sesquilinear forms. Before discussing these, we establish some notation:
$I_n$ is the $n$-by-$n$ identity matrix.
For $p + q = n$, $I_{p,q}$ is the $n$-by-$n$ matrix $\begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix}$.
For $n$ even, $n = 2m$, $J_n$ is the $n$-by-$n$ matrix $\begin{bmatrix} 0 & I_m \\ -I_m & 0 \end{bmatrix}$.
For a matrix $M = (c_{ij})$, we write $M = [m_1 \mid \cdots \mid m_n]$, so that $m_i$ is the $i$th column of $M$, $m_i = \begin{bmatrix} c_{1i} \\ c_{2i} \\ \vdots \\ c_{ni} \end{bmatrix}$.
Example 8.2.1. Let $\varphi$ be a nonsingular symmetric bilinear form on a vector space $V$ of dimension $n$ over $F$. We have two cases:
(1) $F = \mathbb{R}$. Here, by Theorem 6.2.29, $\varphi$ is isometric to $p[1] \perp q[-1]$ for uniquely determined integers $p$ and $q$ with $p + q = n$. The orthogonal group
$$O_{p,q}(\mathbb{R}) = \{M \in GL_n(\mathbb{R}) \mid {}^tMI_{p,q}M = I_{p,q}\}.$$
In particular if $p = n$ and $q = 0$ we have
$$O_n(\mathbb{R}) = O_{n,0}(\mathbb{R}) = \{M \in GL_n(\mathbb{R}) \mid {}^tM = M^{-1}\}.$$
(2) $F = \mathbb{C}$. In this case, by Corollary 6.2.27, $\varphi$ is isometric to $n[1]$. The orthogonal group
$$O_n(\mathbb{C}) = \{M \in GL_n(\mathbb{C}) \mid {}^tM = M^{-1}\}.$$
(The term "the orthogonal group" is often used to mean $O_n(\mathbb{R})$. Compare Definition 7.3.12.)
Let $G = O_{p,q}(\mathbb{R})$, $O_n(\mathbb{R})$, or $O_n(\mathbb{C})$. $G$ is a Lie group of dimension $d_Fn(n-1)/2$. $G$ has two components. Letting $SG = G \cap SL_n(F)$, we obtain the special orthogonal groups. For $G = O_n(\mathbb{R})$ or $O_n(\mathbb{C})$, $SG$ is the identity component of $G$, i.e., the component of $G$ containing the identity matrix. If $G = O_n(\mathbb{R})$ then $G$ is compact. $O_1(\mathbb{C}) = O_1(\mathbb{R}) = \{\pm[1]\}$. If $G = O_n(\mathbb{C})$ for $n > 1$, or $G = O_{p,q}(\mathbb{R})$ with $p \ge 1$ and $q \ge 1$, then $G$ is not compact.
We first consider the case $G = O_{p,q}(\mathbb{R})$, including $G = O_{n,0}(\mathbb{R}) = O_n(\mathbb{R})$. For vectors $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ and $w = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$, let
$$\langle v, w\rangle = \sum_{i=1}^p a_ib_i - \sum_{i=p+1}^n a_ib_i.$$
Let $M = [m_1 \mid \cdots \mid m_n]$. Then $M \in G$ if and only if
$$f_{ii}(M) = \langle m_i, m_i\rangle = 1 \text{ for } i = 1, \ldots, p, \qquad f_{ii}(M) = \langle m_i, m_i\rangle = -1 \text{ for } i = p+1, \ldots, n, \qquad f_{ij}(M) = \langle m_i, m_j\rangle = 0 \text{ for } 1 \le i < j \le n.$$
Thus if we let $F \colon M_n(\mathbb{R}) \to \mathbb{R}^N$, $N = n(n+1)/2$, by
$$F(M) = \big(f_{11}(M), f_{22}(M), \ldots, f_{nn}(M), f_{12}(M), f_{13}(M), \ldots, f_{1n}(M), \ldots, f_{n-1,n}(M)\big),$$
then
$$G = F^{-1}(t_0) \quad\text{where } t_0 = (1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0),$$
with $p$ entries $1$ and $q$ entries $-1$.
We claim that $M = I$ is a regular point of $F$. List the entries of $M$ in the order $x_{11}, x_{22}, \ldots, x_{nn}, x_{12}, \ldots, x_{1n}, \ldots, x_{n-1,n}, x_{21}, \ldots, x_{n1}, \ldots, x_{n,n-1}$. Computation shows that $F'(I)$, the matrix of the derivative of $F$ evaluated at $M = I$, which is an $N$-by-$n^2$ matrix, has its leftmost $N$-by-$N$ submatrix a diagonal matrix with diagonal entries $\pm 2$ or $\pm 1$. Thus $F'(I)$ has rank $N$, and $I$ is a regular point of $F$. Hence, by the inverse function theorem, there is an open neighborhood $B(I)$ of $I$ in $M_n(\mathbb{R})$ such that $F^{-1}(t_0) \cap B(I)$ is a smooth submanifold of $B(I)$ of codimension $N$, i.e., of dimension $n^2 - N = n(n-1)/2$. But for any fixed $M_0 \in GL_n(\mathbb{R})$, multiplication by $M_0$ is an invertible linear map, and hence a diffeomorphism, from $M_n(\mathbb{R})$ to itself. Thus we know that $M_0\big(F^{-1}(t_0) \cap B(I)\big)$ is a smooth submanifold of $M_0B(I)$, which is an open neighborhood of $M_0$ in $M_n(\mathbb{R})$. But, since $G$ is a group, for $M_0 \in G$ we have $M_0F^{-1}(t_0) = M_0G = G = F^{-1}(t_0)$. Hence we see that $G$ is a smooth manifold. Again we apply the implicit function theorem to see that the group operations on $G$ are smooth maps.
Finally, we observe that any $M = (c_{ij})$ in $O_n(\mathbb{R})$ has $|c_{ij}| \le 1$ for every $i, j$, so $O_n(\mathbb{R})$ is a closed and bounded, and hence compact, subspace of $\mathbb{R}^{n^2}$. On the other hand, the group $O_{1,1}(\mathbb{R})$ contains the matrices $\begin{bmatrix} \sqrt{x^2+1} & x \\ x & \sqrt{x^2+1} \end{bmatrix}$ for any $x \in \mathbb{R}$, so it is an unbounded subset of $\mathbb{R}^{n^2}$ and hence it is not compact, and similarly for $O_{p,q}(\mathbb{R})$ with $p \ge 1$ and $q \ge 1$.
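The unboundedness claim for $O_{1,1}(\mathbb{R})$ is easy to check numerically before turning to the complex case (a sketch assuming NumPy):

    import numpy as np

    I11 = np.diag([1.0, -1.0])
    for x in (1.0, 10.0, 1e6):
        c = np.sqrt(x*x + 1)
        M = np.array([[c, x],
                      [x, c]])                   # entries as large as we please
        assert np.allclose(M.T @ I11 @ M, I11)   # M lies in O_{1,1}(R)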
A very similar argument applies in case $G = O_n(\mathbb{C})$. We let
$$f_{ij}(M) = \operatorname{Re}\langle m_i, m_j\rangle \quad\text{and}\quad g_{ij}(M) = \operatorname{Im}\langle m_i, m_j\rangle,$$
where $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ denote real and imaginary parts respectively. We then let $F \colon M_n(\mathbb{C}) \to \mathbb{R}^{2N}$ by
$$F(M) = \big(f_{11}(M), g_{11}(M), f_{22}(M), g_{22}(M), \ldots\big),$$
and we identify $M_n(\mathbb{C})$ with $\mathbb{R}^{2n^2}$ by identifying the entry $z_{ij} = x_{ij} + iy_{ij}$ of $M$ with the pair $(x_{ij}, y_{ij})$ of real numbers. Then
$$G = F^{-1}(t_0) \quad\text{where } t_0 = (1, 0, 1, 0, \ldots, 1, 0, 0, \ldots, 0).$$
Again we show that $M = I$ is a regular point of $F$, and the rest of the argument is the same, showing that $G$ is a smooth manifold of dimension $2n^2 - 2N = n(n-1)$, and that the group operations are smooth. Also, $O_2(\mathbb{C})$ contains the matrices $\begin{bmatrix} i\sqrt{x^2-1} & x \\ x & -i\sqrt{x^2-1} \end{bmatrix}$ for any $x \in \mathbb{R}$, so it is not compact, and similarly for $O_n(\mathbb{C})$ for $n \ge 2$. ◊
Example 8.2.2. Let $\varphi$ be a nonsingular Hermitian form on a vector space $V$ of dimension $n$ over $\mathbb{C}$. Then, by Theorem 6.2.29, $\varphi$ is isometric to $p[1] \perp q[-1]$ for uniquely determined integers $p$ and $q$ with $p + q = n$. The unitary group
$$U_{p,q}(\mathbb{C}) = \{M \in GL_n(\mathbb{C}) \mid {}^t\overline{M}I_{p,q}M = I_{p,q}\}.$$
In particular if $p = n$ and $q = 0$ we have
$$U_n(\mathbb{C}) = \{M \in GL_n(\mathbb{C}) \mid {}^t\overline{M} = M^{-1}\}.$$
(The term "the unitary group" is often used to mean $U_n(\mathbb{C})$. Compare Definition 7.3.12.)
Let $G = U_n(\mathbb{C})$ or $U_{p,q}(\mathbb{C})$. $G$ is a Lie group of dimension $n^2$. $G$ is connected. If $G = U_n(\mathbb{C})$ then $G$ is compact. If $G = U_{p,q}(\mathbb{C})$ with $p \ge 1$ and $q \ge 1$, then $G$ is not compact. Letting $SG = G \cap SL_n(\mathbb{C})$, we obtain the special unitary groups, which are closed connected subgroups of $G$ of codimension $1$.
The argument here is very similar to the argument in the last example. For vectors $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ and $w = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ we let
$$\langle v, w\rangle = \sum_{i=1}^p a_i\overline{b_i} - \sum_{i=p+1}^n a_i\overline{b_i}.$$
Let $M = [m_1 \mid \cdots \mid m_n]$. Then $M \in G$ if and only if
$$\langle m_i, m_i\rangle = 1 \text{ for } i = 1, \ldots, p, \qquad \langle m_i, m_i\rangle = -1 \text{ for } i = p+1, \ldots, n, \qquad \langle m_i, m_j\rangle = 0 \text{ for } 1 \le i < j \le n.$$
Let $f_{ii}(M) = \langle m_i, m_i\rangle$, which is always real-valued. For $i \neq j$, let $f_{ij}(M) = \operatorname{Re}(\langle m_i, m_j\rangle)$ and $g_{ij}(M) = \operatorname{Im}(\langle m_i, m_j\rangle)$.
Set $N = n + 2\big(n(n-1)/2\big) = n^2$. Let $F \colon M_n(\mathbb{C}) \to \mathbb{R}^N$ by
$$F(M) = \big(f_{11}(M), \ldots, f_{nn}(M), f_{12}(M), g_{12}(M), \ldots\big).$$
Then
$$G = F^{-1}(t_0) \quad\text{where } t_0 = (1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0).$$
Identify $M_n(\mathbb{C})$ with $\mathbb{R}^{2n^2}$ as before. We again argue as before, showing that $I$ is a regular point of $F$ and then further that $G$ is a smooth manifold of dimension $2n^2 - n^2 = n^2$, and in fact a Lie group. Also, a similar argument shows that $U_n(\mathbb{C})$ is compact but that $U_{p,q}(\mathbb{C})$ is not compact for $p \ge 1$ and $q \ge 1$. ◊
Example 8.2.3. Let $\varphi$ be a nonsingular skew-symmetric form on a vector space $V$ of dimension $n$ over $F$. Then, by Theorem 6.2.40, $\varphi$ is isometric to $\begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$. The symplectic group
$$Sp(n, F) = \{M \in GL_n(F) \mid {}^tMJ_nM = J_n\}.$$
Let $G = Sp(n, \mathbb{R})$ or $Sp(n, \mathbb{C})$. $G$ is connected and noncompact. $G$ is a Lie group of dimension $d_F\big(n(n+1)/2\big)$. We also have the symplectic group
$$Sp(n) = Sp(n, \mathbb{C}) \cap U(n, \mathbb{C}).$$
$G = Sp(n)$ is a closed subgroup of both $Sp(n, \mathbb{C})$ and $U(n, \mathbb{C})$, and is a connected compact Lie group of dimension $n(n+1)/2$. (The term "the symplectic group" is often used to mean $Sp(n)$.)
We consider $G = Sp_n(F)$ for $F = \mathbb{R}$ or $\mathbb{C}$.
The argument is very similar. For $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ and $w = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$, let
$$\langle v, w\rangle = \sum_{i=1}^{n/2} \big(a_ib_{i+n/2} - a_{i+n/2}b_i\big).$$
If $M = [m_1 \mid \cdots \mid m_n]$ then $M \in G$ if and only if
$$\langle m_i, m_{i+n/2}\rangle = 1 \text{ for } i = 1, \ldots, n/2, \qquad \langle m_i, m_j\rangle = 0 \text{ for } 1 \le i < j \le n,\; j \neq i + n/2.$$
Let $f_{ij}(M) = \langle m_i, m_j\rangle$ for $i < j$. Set $N = n(n-1)/2$. Let $F \colon M_n(F) \to F^N$ by
$$F(M) = \big(f_{12}(M), \ldots, f_{n-1,n}(M)\big).$$
Then
$$G = F^{-1}(t_0) \quad\text{where } t_0 = (0, \ldots, 1, \ldots),$$
with the $1$s in the positions corresponding to the pairs $(i, i+n/2)$.
Again we show that $I$ is a regular point for $F$, and continue similarly, to obtain that $G$ is a Lie group of dimension $d_Fn^2 - d_FN = d_F\big(n(n+1)/2\big)$.
$Sp_2(F)$ contains the matrices $\begin{bmatrix} x & 0 \\ 0 & 1/x \end{bmatrix}$ for any $x \neq 0$ in $\mathbb{R}$, showing that $Sp_n(F)$ is not compact for any $n$.
Finally, $Sp(n) = Sp_n(\mathbb{C}) \cap U(n, \mathbb{C})$ is a closed subspace of the compact space $U(n, \mathbb{C})$, so is itself compact. We shall not prove that it is a Lie group nor compute its dimension, which is $(n^2 + n)/2$, here. ◊
Remark 8.2.4. A warning to the reader: Notation is not universally consistent, and some authors index the symplectic groups by $n/2$ instead of $n$. ◊
Finally, we have a structure theorem for $GL_n(\mathbb{R})$ and $GL_n(\mathbb{C})$. We defined $A^+_n$, $N_n(\mathbb{R})$, and $N_n(\mathbb{C})$ in Definition 7.2.18, and these are obviously Lie groups.
Theorem 8.2.5. The multiplication maps
$$m \colon O(n, \mathbb{R}) \times A^+_n \times N_n(\mathbb{R}) \to GL_n(\mathbb{R})$$
and
$$m \colon U(n, \mathbb{C}) \times A^+_n \times N_n(\mathbb{C}) \to GL_n(\mathbb{C})$$
given by $m(P, A, N) = PAN$ are diffeomorphisms.
Proof. The special case of Theorem 7.2.20 with $k = n$ gives that $m$ is a homeomorphism, and it is routine to check that $m$ and $m^{-1}$ are both differentiable.
Remark 8.2.6. We have adopted our approach here on two grounds: first, to use elementary arguments to the extent possible, and second, to illustrate and indeed emphasize the linear algebra aspects of Lie groups. But it is possible to derive the results of this chapter by using more theory and less computation. It was straightforward to prove that $GL_n(\mathbb{R})$ and $GL_n(\mathbb{C})$ are Lie groups. The fact that the other groups we considered are also Lie groups is a consequence of the theorem that any closed subgroup of a Lie group is a Lie group. But this theorem is a theorem of analysis and topology, not of linear algebra. ◊
CHAPTER A

Polynomials

In this appendix we gather and prove some important facts about polynomials. We fix a field $\mathbb{F}$ and we let $R = \mathbb{F}[x]$ be the ring of polynomials in the variable $x$ with coefficients in $\mathbb{F}$,
$$R = \{ a_n x^n + \cdots + a_1 x + a_0 \mid a_i \in \mathbb{F},\ n \ge 0 \}.$$
A.1 Basic properties

We define the degree of a nonzero polynomial to be the highest power of $x$ that appears in the polynomial. More precisely:

Definition A.1.1. Let $p(x) = a_n x^n + \cdots + a_0$ with $a_n \ne 0$. Then the degree $\deg p(x) = n$. ◊

Remark A.1.2. The degree of the $0$ polynomial is not defined. A polynomial of degree $0$ is a nonzero constant polynomial. ◊
The basic tool in dealing with polynomials is the division algorithm.

Theorem A.1.3. Let $f(x), g(x) \in R$ with $g(x) \ne 0$. Then there exist unique polynomials $q(x)$ (the quotient) and $r(x)$ (the remainder) such that $f(x) = g(x)q(x) + r(x)$, where $r(x) = 0$ or $\deg r(x) < \deg g(x)$.

Proof. We first prove existence.
If $f(x) = 0$ we are done: choose $q(x) = 0$ and $r(x) = 0$. Otherwise, let $f(x)$ have degree $m$ and $g(x)$ have degree $n$. We fix $n$ and proceed by complete induction on $m$. If $m < n$ we are again done: choose $q(x) = 0$ and $r(x) = f(x)$.
Otherwise, let $g(x) = a_n x^n + \cdots + a_0$ and $f(x) = b_m x^m + \cdots + b_0$. If $q_0(x) = (b_m/a_n)x^{m-n}$, then $f(x) - g(x)q_0(x)$ has the coefficient of
$x^m$ equal to zero. If $f(x) = g(x)q_0(x)$ then we are again done: choose $q(x) = q_0(x)$ and $r(x) = 0$. Otherwise, $f_1(x) = f(x) - g(x)q_0(x)$ is a nonzero polynomial of degree less than $m$. Thus by the inductive hypothesis there are polynomials $q_1(x)$ and $r_1(x)$ with $f_1(x) = g(x)q_1(x) + r_1(x)$ where $r_1(x) = 0$ or $\deg r_1(x) < \deg g(x)$. Then
$$f(x) = g(x)q_0(x) + f_1(x) = g(x)q_0(x) + g(x)q_1(x) + r_1(x) = g(x)q(x) + r(x),$$
where $q(x) = q_0(x) + q_1(x)$ and $r(x) = r_1(x)$ is as required, so by induction we are done.
To prove uniqueness, suppose $f(x) = g(x)q_1(x) + r_1(x)$ and $f(x) = g(x)q_2(x) + r_2(x)$ with $r_1(x)$ and $r_2(x)$ satisfying the conditions of the theorem. Then $g(x)(q_1(x) - q_2(x)) = r_2(x) - r_1(x)$. Comparing degrees shows $r_2(x) = r_1(x)$ and $q_2(x) = q_1(x)$.
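The existence argument is effectively the usual long-division procedure: repeatedly cancel the leading term. The following sketch (a hypothetical illustration, assuming Python, with polynomials represented as coefficient lists $[a_0, a_1, \ldots, a_n]$ over $\mathbb{Q}$) transcribes it directly.

    from fractions import Fraction

    def poly_divmod(f, g):
        """Return (q, r) with f = g*q + r and r = 0 or deg r < deg g."""
        f = [Fraction(c) for c in f]
        q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
        while len(f) >= len(g) and any(f):
            # q0(x) = (b_m / a_n) x^(m-n), as in the proof
            shift = len(f) - len(g)
            c = f[-1] / Fraction(g[-1])
            q[shift] += c
            # f <- f - g*q0, which kills the leading coefficient of f
            for i, gi in enumerate(g):
                f[i + shift] -= c * Fraction(gi)
            while f and f[-1] == 0:
                f.pop()
        return q, f  # r = f (an empty list stands for the 0 polynomial)

    # divide f(x) = x^3 + 2x + 1 by g(x) = x^2 + 1: quotient x, remainder x + 1
    q, r = poly_divmod([1, 2, 0, 1], [1, 0, 1])
    assert q == [0, 1] and r == [1, 1]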
Remark A.1.4. The algebraically well-informed reader will recognize the rest of this appendix as a special case of the theory of ideals in a Euclidean ring, but we will develop this theory from scratch for polynomial rings. ◊
Definition A.1.5. A nonempty subset $J$ of $R$ is an ideal of $R$ if it has the properties
(1) If $p_1(x) \in J$ and $p_2(x) \in J$, then $p_1(x) + p_2(x) \in J$.
(2) If $p_1(x) \in J$ and $q(x) \in R$, then $p_1(x)q(x) \in J$. ◊

Remark A.1.6. Note that $J = \{0\}$ is an ideal, the zero ideal. Any other ideal (i.e., any ideal containing a nonzero element) is a nonzero ideal. ◊
Example A.1.7. (1) Fix a polynomial $p_0(x)$ and let $J$ be the subset of $R$ consisting of all multiples of $p_0(x)$, $J = \{p_0(x)q(x) \mid q(x) \in R\}$. It is easy to check that $J$ is an ideal. An ideal of this form is called a principal ideal and $p_0(x)$ is called a generator of $J$, or is said to generate $J$.
(2) Let $\{p_1(x), p_2(x), \ldots\}$ be a (possibly infinite) set of polynomials in $R$ and let $J = \{\sum p_i(x)q_i(x) \mid \text{only finitely many } q_i(x) \ne 0\}$. It is easy to check that $J$ is an ideal, and $\{p_1(x), p_2(x), \ldots\}$ is called a generating set for $J$ (or is said to generate $J$). ◊

A nonzero polynomial $p(x) = a_n x^n + \cdots + a_0$ is called monic if the coefficient of the highest power of $x$ appearing in $p(x)$ is $1$, i.e., if $a_n = 1$.
Lemma A.1.8. Let $J$ be a nonzero ideal of $R$. Then $J$ contains a unique monic polynomial of lowest degree.

Proof. The set $\{\deg p(x) \mid p(x) \in J,\ p(x) \ne 0\}$ is a nonempty set of nonnegative integers, so, by the well-ordering principle, it has a smallest element $d$. Let $\widetilde{p}_0(x)$ be a polynomial in $J$ with $\deg \widetilde{p}_0(x) = d$. Thus $\widetilde{p}_0(x)$ is a polynomial in $J$ of lowest degree, which may or may not be monic. Write $\widetilde{p}_0(x) = \widetilde{a}_d x^d + \cdots + \widetilde{a}_0$. By the properties of an ideal, $p_0(x) = (1/\widetilde{a}_d)\widetilde{p}_0(x) = x^d + \cdots + (\widetilde{a}_0/\widetilde{a}_d) = x^d + \cdots + a_0$ is in $J$. This gives existence. To show uniqueness, suppose we have a different monic polynomial $p_1(x)$ of degree $d$ in $J$, $p_1(x) = x^d + \cdots + b_0$. Then by the properties of an ideal $\widetilde{q}(x) = p_0(x) - p_1(x)$ is a nonzero polynomial of degree $e < d$ in $J$, $\widetilde{q}(x) = \widetilde{c}_e x^e + \cdots + \widetilde{c}_0$. But then $q(x) = (1/\widetilde{c}_e)\widetilde{q}(x) = x^e + \cdots + (\widetilde{c}_0/\widetilde{c}_e)$ is a monic polynomial in $J$ of degree $e < d$, contradicting the minimality of $d$.
Theorem A.1.9. Let $J$ be any nonzero ideal of $R$. Then $J$ is a principal ideal. More precisely, $J$ is the principal ideal generated by $p_0(x)$, where $p_0(x)$ is the unique monic polynomial of lowest degree in $J$.

Proof. By Lemma A.1.8, there is such a polynomial $p_0(x)$. Let $J_0$ be the principal ideal generated by $p_0(x)$. We show that $J_0 = J$.
First we claim that $J_0 \subseteq J$. This is immediate. For, by definition, $J_0$ consists of polynomials of the form $p_0(x)q(x)$, and, by the properties of an ideal, every such polynomial is in $J$.
Next we claim that $J \subseteq J_0$. Choose any polynomial $g(x) \in J$. By Theorem A.1.3, we can write $g(x) = p_0(x)q(x) + r(x)$ where $r(x) = 0$ or $\deg r(x) < \deg p_0(x)$. If $r(x) = 0$ we are done, as then $g(x) = p_0(x)q(x) \in J_0$. Assume $r(x) \ne 0$. Then, by the properties of an ideal, $r(x) = g(x) - p_0(x)q(x) \in J$. ($p_0(x) \in J$ so $p_0(x)(-q(x)) \in J$; then also $g(x) \in J$ so $g(x) + p_0(x)(-q(x)) = r(x) \in J$.) Now $r(x)$ is a polynomial of some degree $e < d = \deg p_0(x)$, $r(x) = a_e x^e + \cdots + a_0$, so $(1/a_e)r(x) = x^e + \cdots + (a_0/a_e) \in J$. But this is a monic polynomial of degree $e < d$, contradicting the minimality of $d$. Hence $r(x) \ne 0$ is impossible, so $g(x) \in J_0$.
We now have an important application of this theorem.
Definition A.1.10. Let $\{p_1(x), p_2(x), \ldots\}$ be a (possibly infinite) set of nonzero polynomials in $R$. Then a monic polynomial $d(x) \in R$ is a greatest common divisor (gcd) of $\{p_1(x), p_2(x), \ldots\}$ if it has the following properties:
(1) $d(x)$ divides every $p_i(x)$.
(2) If $e(x)$ is any polynomial that divides every $p_i(x)$, then $e(x)$ divides $d(x)$. ◊
Theorem A.1.11. Let $\{p_1(x), p_2(x), \ldots\}$ be a (possibly infinite) set of nonzero polynomials in $R$. Then $\{p_1(x), p_2(x), \ldots\}$ has a unique gcd $d(x)$. More precisely, $d(x)$ is the generator of the principal ideal
$$J = \left\{ \sum p_i(x)q_i(x) \,\middle|\, q_i(x) \in R, \text{ only finitely many nonzero} \right\}.$$

Proof. By Theorem A.1.9, there is a unique monic generator $d(x)$ of this ideal. We must show it has the properties of a gcd.
Let $J_0$ be the principal ideal generated by $d(x)$, so that $J_0 = J$.
(1) Consider any polynomial $p_i(x)$. Then $p_i(x) \in J$, so $p_i(x) \in J_0$. That means that $p_i(x) = d(x)q(x)$ for some $q(x)$, so $d(x)$ divides $p_i(x)$.
(2) Since $d(x) \in J$, it can be written as $d(x) = \sum p_i(x)q_i(x)$ for some polynomials $\{q_i(x)\}$. Let $e(x)$ be any polynomial that divides every $p_i(x)$. Then it divides every product $p_i(x)q_i(x)$, and hence their sum $d(x)$.
Thus we have shown that $d(x)$ satisfies both properties of a gcd. It remains to show that it is unique. Suppose $d_1(x)$ is also a gcd. Since $d(x)$ is a gcd of $\{p_1(x), p_2(x), \ldots\}$, and $d_1(x)$ divides each of these polynomials, then $d_1(x)$ divides $d(x)$. Similarly, $d(x)$ divides $d_1(x)$. Thus $d(x)$ and $d_1(x)$ are a pair of monic polynomials each of which divides the other, so they are equal.
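For two polynomials the gcd can be computed by repeated division with remainder (the Euclidean algorithm), since any common divisor of $f(x)$ and $g(x)$ also divides the remainder of $f(x)$ by $g(x)$. A sketch (again hypothetical, assuming Python and the poly_divmod function above):

    def poly_gcd(f, g):
        # Monic gcd of two nonzero polynomials given as coefficient lists.
        while g:
            _, r = poly_divmod(f, g)
            f, g = g, r
        lead = f[-1]
        return [c / lead for c in f]  # normalize to be monic

    # gcd(x^2 - 1, (x - 1)^2) = x - 1
    assert poly_gcd([-1, 0, 1], [1, -2, 1]) == [-1, 1]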
We recall an important definition.
Definition A.1.12. A field $\mathbb{F}$ is algebraically closed if every nonconstant polynomial $f(x)$ in $\mathbb{F}[x]$ has a root in $\mathbb{F}$, i.e., if for every nonconstant polynomial $f(x)$ in $\mathbb{F}[x]$ there is an element $r$ of $\mathbb{F}$ with $f(r) = 0$. ◊
We have the following famous and important theorem, which we shall not prove.

Theorem A.1.13 (Fundamental Theorem of Algebra). The field $\mathbb{C}$ of complex numbers is algebraically closed.
Example A.1.14. Let $\mathbb{F}$ be an algebraically closed field and let $a \in \mathbb{F}$. Then $J = \{p(x) \in R \mid p(a) = 0\}$ is an ideal. It is generated by the polynomial $x - a$. ◊
Here is one of the most important applications of the gcd.
Corollary A.1.15. Let $\mathbb{F}$ be an algebraically closed field and let $\{p_1(x), \ldots, p_n(x)\}$ be a set of polynomials not having a common zero. Then there is a set of polynomials $\{q_1(x), \ldots, q_n(x)\}$ such that
$$p_1(x)q_1(x) + \cdots + p_n(x)q_n(x) = 1.$$
Proof. Since $p_1(x), \ldots, p_n(x)$ have no common zero, they have no nonconstant polynomial as a common divisor (a nonconstant common divisor would, since $\mathbb{F}$ is algebraically closed, have a root in $\mathbb{F}$, and that root would be a common zero). Hence their gcd is $1$. The corollary then follows from Theorem A.1.11.
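For example, $p_1(x) = x$ and $p_2(x) = x - 1$ have no common zero, and indeed
$$x \cdot 1 + (x - 1) \cdot (-1) = 1.$$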
Definition A.1.16. A set of polynomials $\{p_1(x), p_2(x), \ldots\}$ is relatively prime if it has gcd $1$. ◊

We often phrase this by saying the polynomials $p_1(x), p_2(x), \ldots$ are relatively prime.

Remark A.1.17. Observe that $\{p_1(x), p_2(x), \ldots\}$ is relatively prime if and only if the polynomials $p_i(x)$ have no nonconstant common factor. ◊
Closely related to the greatest common divisor (gcd) is the least common multiple (lcm).

Definition A.1.18. Let $\{p_1(x), p_2(x), \ldots\}$ be a set of polynomials. A monic polynomial $m(x)$ is a least common multiple (lcm) of $\{p_1(x), p_2(x), \ldots\}$ if it has the properties
(1) Every $p_i(x)$ divides $m(x)$.
(2) If $n(x)$ is any polynomial that is divisible by every $p_i(x)$, then $m(x)$ divides $n(x)$. ◊
Theorem A.1.19. Let $\{p_1(x), \ldots, p_k(x)\}$ be any finite set of nonzero polynomials. Then $\{p_1(x), \ldots, p_k(x)\}$ has a unique lcm $m(x)$.

Proof. Let $J = \{\text{polynomials } n(x) \mid n(x) \text{ is divisible by every } p_i(x)\}$. It is easy to check that $J$ is an ideal (verify the two properties of an ideal in Definition A.1.5). Also, $J$ is nonzero, as it contains the product $p_1(x) \cdots p_k(x)$.
By Theorem A.1.9, $J$ is generated by a monic polynomial $m(x)$. We claim $m(x)$ is the lcm of $\{p_1(x), \ldots, p_k(x)\}$. Certainly $m(x)$ is divisible by every $p_i(x)$, as $m(x)$ is in $J$. Also, $m(x)$ divides every $n(x)$ in $J$ because $J$, as the principal ideal generated by $m(x)$, consists precisely of the multiples of $m(x)$.

Remark A.1.20. By the proof of Theorem A.1.19, $m(x)$ is the unique monic polynomial of smallest degree in $J$. Thus the lcm of $\{p_1(x), \ldots, p_k(x)\}$ may alternately be described as the unique monic polynomial of lowest degree divisible by every $p_i(x)$. ◊
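For example, in $\mathbb{Q}[x]$ the polynomials $p_1(x) = x(x-1)$ and $p_2(x) = x(x+1)$ have gcd $x$ and lcm $x(x-1)(x+1)$. Note that the product $p_1(x)p_2(x) = x^2(x-1)(x+1)$ is a common multiple but not the least one; in general, for two monic polynomials the product of the gcd and the lcm equals $p_1(x)p_2(x)$.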
Lemma A.1.21. Suppose $p(x)$ divides the product $q(x)r(x)$ and that $p(x)$ and $q(x)$ are relatively prime. Then $p(x)$ divides $r(x)$.
Proof. Since $p(x)$ and $q(x)$ are relatively prime there are polynomials $f(x)$ and $g(x)$ with $p(x)f(x) + q(x)g(x) = 1$. Then
$$p(x)f(x)r(x) + q(x)g(x)r(x) = r(x).$$
Now $p(x)$ obviously divides the first term $p(x)f(x)r(x)$, and $p(x)$ also divides the second term as, by hypothesis, $p(x)$ divides $q(x)r(x)$, so $p(x)$ divides their sum $r(x)$.
Corollary A.1.22. Suppose $p(x)$ and $q(x)$ are relatively prime. If $p(x)$ divides $r(x)$ and $q(x)$ divides $r(x)$, then $p(x)q(x)$ divides $r(x)$.

Proof. Since $q(x)$ divides $r(x)$, we may write $r(x) = q(x)s(x)$ for some polynomial $s(x)$. Now $p(x)$ divides $r(x) = q(x)s(x)$ and $p(x)$ and $q(x)$ are relatively prime, so by Lemma A.1.21 we have that $p(x)$ divides $s(x)$, and hence we may write $s(x) = p(x)t(x)$ for some polynomial $t(x)$. Then $r(x) = q(x)s(x) = q(x)p(x)t(x)$ is obviously divisible by $p(x)q(x)$.

Corollary A.1.23. If $p(x)$ and $q(x)$ are relatively prime monic polynomials, then their lcm is the product $p(x)q(x)$.

Proof. If their lcm is $m(x)$, then on the one hand $m(x)$ divides $p(x)q(x)$, by the definition of the lcm. On the other hand, since both $p(x)$ and $q(x)$ divide $m(x)$, then $p(x)q(x)$ divides $m(x)$, by Corollary A.1.22. Thus $p(x)q(x)$ and $m(x)$ are monic polynomials that divide each other, so they are equal.
A.2 Unique factorization

The most important property that $R = \mathbb{F}[x]$ has is that it is a unique factorization domain. In order to prove this we need to do some preliminary work.

Definition A.2.1. (1) The units in $\mathbb{F}[x]$ are the nonzero constant polynomials.
(2) A nonzero nonunit polynomial $f(x)$ is irreducible if
$$f(x) = g(x)h(x) \quad\text{with } g(x), h(x) \in \mathbb{F}[x]$$
implies that one of $g(x)$ and $h(x)$ is a unit.
(3) A nonzero nonunit polynomial $f(x)$ in $\mathbb{F}[x]$ is prime if whenever $f(x)$ divides a product $g(x)h(x)$ of two polynomials in $\mathbb{F}[x]$, it divides (at least) one of the factors $g(x)$ or $h(x)$.
(4) Two nonzero polynomials $f(x)$ and $g(x)$ in $\mathbb{F}[x]$ are associates if $f(x) = ug(x)$ for some unit $u$. ◊

Lemma A.2.2. A polynomial $f(x)$ in $\mathbb{F}[x]$ is prime if and only if it is irreducible.

Proof. First suppose $f(x)$ is prime, and let $f(x) = g(x)h(x)$. Certainly both $g(x)$ and $h(x)$ divide $f(x)$. By the definition of prime, $f(x)$ divides $g(x)$ or $h(x)$. If $f(x)$ divides $g(x)$, then $f(x)$ and $g(x)$ divide each other, and so have the same degree. Thus $h(x)$ is constant, and so is a unit. By the same argument, if $f(x)$ divides $h(x)$, then $g(x)$ is constant, and so a unit.
Suppose $f(x)$ is irreducible, and let $f(x)$ divide $g(x)h(x)$. To show that $f(x)$ is prime, we need to show that $f(x)$ divides one of the factors.
By Theorem A.1.11, $f(x)$ and $g(x)$ have a gcd $d(x)$. By definition, $d(x)$ divides both $f(x)$ and $g(x)$, so in particular $d(x)$ divides $f(x)$, say $f(x) = d(x)e(x)$. But $f(x)$ is irreducible, so $d(x)$ or $e(x)$ is a unit. If $e(x) = u$ is a unit, then $f(x) = d(x)u$ so $d(x) = f(x)v$ where $uv = 1$. Then, since $d(x)$ divides $g(x)$, $f(x)$ also divides $g(x)$. On the other hand, if $d(x) = u$ is a unit, then $d(x) = 1$ as, by definition, a gcd is always a monic polynomial. In other words, by Definition A.1.16, $f(x)$ and $g(x)$ are relatively prime. Then, by Lemma A.1.21, $f(x)$ divides $h(x)$.
Theorem A.2.3 (Unique factorization). Let $f(x) \in \mathbb{F}[x]$ be a nonzero polynomial. Then
$$f(x) = u g_1(x) \cdots g_k(x)$$
for some unit $u$ and some set $\{g_1(x), \ldots, g_k(x)\}$ of irreducible polynomials. Furthermore, if also
$$f(x) = v h_1(x) \cdots h_l(x)$$
for some unit $v$ and some set $\{h_1(x), \ldots, h_l(x)\}$ of irreducible polynomials, then $l = k$ and, after possible reordering, $h_i(x)$ and $g_i(x)$ are associates for each $i = 1, \ldots, k$.

Proof. We prove this by complete induction on $n = \deg f(x)$. First we prove the existence of a factorization and then we prove its uniqueness.
For the proof of existence, we proceed by induction. If $n = 0$ then $f(x) = u$ is a unit and there is nothing further to prove. Suppose that we have existence for all polynomials of degree at most $n$ and let $f(x)$ have degree $n + 1$. If $f(x)$ is irreducible, then $f(x) = f(x)$ is a factorization
and there is nothing further to prove. Otherwise $f(x) = f_1(x)f_2(x)$ with $\deg f_1(x) \le n$ and $\deg f_2(x) \le n$. By the inductive hypothesis $f_1(x) = u_1 g_{1,1}(x) \cdots g_{1,s}(x)$ and $f_2(x) = u_2 g_{2,1}(x) \cdots g_{2,t}(x)$, so we have the factorization
$$f(x) = u_1 u_2 g_{1,1}(x) \cdots g_{1,s}(x) g_{2,1}(x) \cdots g_{2,t}(x),$$
and by induction we are done.
For the proof of uniqueness, we again proceed by induction. If $n = 0$ then $f(x) = u$ is a unit and again there is nothing to prove. ($f(x)$ cannot be divisible by any polynomial of positive degree.) Suppose that we have uniqueness for all polynomials of degree at most $n$ and let $f(x)$ have degree $n + 1$. Let $f(x) = u g_1(x) \cdots g_k(x) = v h_1(x) \cdots h_l(x)$. If $f(x)$ is irreducible, then by the definition of irreducibility these factorizations must be $f(x) = u g_1(x) = v h_1(x)$ and then $g_1(x)$ and $h_1(x)$ are associates of each other. If $f(x)$ is not irreducible, consider the factor $g_k(x)$. Now $g_k(x)$ divides $f(x)$, so it divides the product $v h_1(x) \cdots h_l(x) = (v h_1(x) \cdots h_{l-1}(x)) h_l(x)$. Since $g_k(x)$ is irreducible, by Lemma A.2.2 it is prime, so $g_k(x)$ must divide one of these two factors. If $g_k(x)$ divides $h_l(x)$, then, since $h_l(x)$ is irreducible, we have $h_l(x) = g_k(x)w$ for some unit $w$, in which case $g_k(x)$ and $h_l(x)$ are associates. If not, then $g_k(x)$ divides the other factor $v h_1(x) \cdots h_{l-1}(x) = (v h_1(x) \cdots h_{l-2}(x)) h_{l-1}(x)$ and we may repeat the argument. Eventually we find that $g_k(x)$ divides some $h_i(x)$, in which case $g_k(x)$ and $h_i(x)$ are associates. By reordering the factors, we may simply assume that $g_k(x)$ and $h_l(x)$ are associates, $h_l(x) = g_k(x)w$ for some unit $w$. Then $f(x) = u g_1(x) \cdots g_k(x) = v h_1(x) \cdots h_l(x) = (vw) h_1(x) \cdots h_{l-1}(x) g_k(x)$. Let $f_1(x) = f(x)/g_k(x)$. We see that
$$f_1(x) = u g_1(x) \cdots g_{k-1}(x) = (vw) h_1(x) \cdots h_{l-1}(x).$$
Now $\deg f_1(x) \le n$, so by the inductive hypothesis $k - 1 = l - 1$, i.e., $k = l$, and after reordering $g_i(x)$ and $h_i(x)$ are associates for $i = 1, \ldots, k - 1$. We have already shown this is true for $i = k$ as well, so by induction we are done.
There is an important special case of this theorem that is worth observing separately.
Corollary A.2.4. Let $\mathbb{F}$ be algebraically closed and let $f(x)$ be a nonzero polynomial in $\mathbb{F}[x]$. Then $f(x)$ can be written uniquely as
$$f(x) = u(x - r_1) \cdots (x - r_n)$$
with $u \ne 0$ and $r_1, \ldots, r_n$ elements of $\mathbb{F}$.

Proof. If $\mathbb{F}$ is algebraically closed, every irreducible polynomial is linear, of the form $g(x) = v(x - r)$, and then this result follows immediately from Theorem A.2.3. (This special case is easy to prove directly, by induction on the degree of $f(x)$. We leave the details to the reader.)
Remark A.2.5. By Theorem A.1.13, Corollary A.2.4 applies in particular when $\mathbb{F} = \mathbb{C}$. ◊
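For example, over $\mathbb{C}$,
$$x^4 - 1 = (x - 1)(x + 1)(x - i)(x + i),$$
a product of linear factors as Corollary A.2.4 guarantees, while over $\mathbb{R}$ the factorization into irreducibles is $(x - 1)(x + 1)(x^2 + 1)$, the factor $x^2 + 1$ being irreducible since it has no real root.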
A.3 Polynomials as expressions and polynomials as functions

Let $p(x) \in \mathbb{F}[x]$ be a polynomial. There are two ways to regard $p(x)$: as an expression $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, and as a function $p: \mathbb{F} \to \mathbb{F}$ by $c \mapsto p(c)$. We have at times, when dealing with the case $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$, conflated these two approaches. In this section we show there is no harm in doing so. We show that if $\mathbb{F}$ is an infinite field, then two polynomials are equal as expressions if and only if they are equal as functions.
Lemma A.3.1. Let $p(x) \in \mathbb{F}[x]$ be a polynomial and let $c \in \mathbb{F}$. Then $p(x) = (x - c)q(x) + p(c)$ for some polynomial $q(x)$.

Proof. By Theorem A.1.3, $p(x) = (x - c)q(x) + a$ for some $a \in \mathbb{F}$. Now substitute $x = c$ to obtain $a = p(c)$.
Lemma A.3.2. Let $p(x)$ be a nonzero polynomial of degree $n$. Then $p(x)$ has at most $n$ roots, counting multiplicities, in $\mathbb{F}$. In particular, $p(x)$ has at most $n$ distinct roots in $\mathbb{F}$.

Proof. We proceed by induction on $n$. The lemma is clearly true for $n = 0$. Suppose it is true for all polynomials of degree $n$. Let $p(x)$ be a nonzero polynomial of degree $n + 1$. If $p(x)$ does not have a root in $\mathbb{F}$, we are done. Otherwise let $r$ be a root of $p(x)$. By Lemma A.3.1, $p(x) = (x - r)q(x)$, where $q(x)$ has degree $n$. By the inductive hypothesis, $q(x)$ has at most $n$ roots in $\mathbb{F}$, so $p(x)$ has at most $n + 1$ roots in $\mathbb{F}$, and by induction we are done.
Corollary A.3.3. Let $p(x)$ be a polynomial of degree at most $n$. If $p(x)$ has more than $n$ roots, then $p(x) = 0$ (the $0$ polynomial).
Corollary A.3.4. (1) Let $f(x)$ and $g(x)$ be polynomials of degree at most $n$. If $f(c) = g(c)$ for more than $n$ values of $c$, then $f(x) = g(x)$.
(2) Let $\mathbb{F}$ be an infinite field. If $f(c) = g(c)$ for every $c \in \mathbb{F}$, then $f(x) = g(x)$.

Proof. Apply Corollary A.3.3 to the polynomial $p(x) = f(x) - g(x)$.
Remark A.3.5. Corollary A.3.4(2) is false if $\mathbb{F}$ is a finite field. For example, suppose that $\mathbb{F}$ has $n$ elements $c_1, \ldots, c_n$. Then $f(x) = (x - c_1)(x - c_2) \cdots (x - c_n)$ has $f(c) = 0$ for every $c \in \mathbb{F}$, but $f(x) \ne 0$. ◊
CHAPTER B

Modules over principal ideal domains

In this appendix, for the benefit of the more algebraically knowledgeable reader, we show how to derive canonical forms for linear transformations quickly and easily from the basic structure theorems for modules over a principal ideal domain (PID).
B.1 Definitions and structure theorems

We begin by recalling the definition of a module.

Definition B.1.1. Let $R$ be a commutative ring. An $R$-module is a set $M$ with a pair of operations satisfying the conditions of Definition 1.1.1 except that the scalars are assumed to be elements of the ring $R$. ◊
One of the most basic differences between vector spaces (where the scalars are elements of a field) and modules (where they are elements of a ring) is the possibility that modules may have torsion.

Definition B.1.2. Let $M$ be an $R$-module. An element $m \ne 0$ of $M$ is a torsion element if $rm = 0$ for some $r \in R$, $r \ne 0$. If $m$ is any element of $M$ its annihilator ideal $\operatorname{Ann}(m)$ is the ideal of $R$ given by
$$\operatorname{Ann}(m) = \{ r \in R \mid rm = 0 \}.$$
(Thus $\operatorname{Ann}(0) = R$ and $m \ne 0$ is a torsion element of $M$ if and only if $\operatorname{Ann}(m) \ne \{0\}$.)
If every nonzero element of $M$ is a torsion element then $M$ is a torsion $R$-module. ◊
Remark B.1.3. Here is a very special case: Let $M = R$ and regard $M$ as an $R$-module. Then we have the dual module $M^*$ defined analogously to Definition 1.6.1, and we can identify $M^*$ with $R$ as follows: Let $f \in M^*$, so $f: M \to R$. Then we let $f \mapsto f(1)$. (Otherwise said, any $f \in M^*$ is given by multiplication by some fixed element of $R$, $f(r) = r_0 r$, and then $f \mapsto r_0$.) For $s_0 \in R$ consider the principal ideal $J = s_0 R = \{s_0 r \mid r \in R\}$. Let $N = J$ and regard $N$ as a submodule of $M$. Then
$$\operatorname{Ann}(s_0) = \operatorname{Ann}(N)$$
where $\operatorname{Ann}(N)$ is the annihilator as defined in Definition 1.6.10. ◊
Here is the basic structure theorem. It appears in two forms.

Theorem B.1.4. Let $R$ be a principal ideal domain (PID). Let $M$ be a finitely generated torsion $R$-module. Then there is an isomorphism
$$M \cong M_1 \oplus \cdots \oplus M_k$$
where each $M_i$ is a nonzero $R$-module generated by a single element $w_i$, and $\operatorname{Ann}(w_1) \subseteq \cdots \subseteq \operatorname{Ann}(w_k)$. The integer $k$ and the set of ideals $\{\operatorname{Ann}(w_1), \ldots, \operatorname{Ann}(w_k)\}$ are well-defined.

Theorem B.1.5. Let $R$ be a principal ideal domain (PID). Let $M$ be a finitely generated torsion $R$-module. Then there is an isomorphism
$$M \cong N_1 \oplus \cdots \oplus N_l$$
where each $N_i$ is a nonzero $R$-module generated by a single element $x_i$, and $\operatorname{Ann}(x_i) = p_i^{e_i} R$ is the principal ideal of $R$ generated by the element $p_i^{e_i}$, where $p_i \in R$ is a prime and $e_i$ is a positive integer. The integer $l$ and the set of ideals $\{p_1^{e_1} R, \ldots, p_l^{e_l} R\}$ are well-defined.

Remark B.1.6. In the notation of Theorem B.1.4, if $\operatorname{Ann}(w_i)$ is the principal ideal generated by the element $r_i$ of $R$, the condition $\operatorname{Ann}(w_1) \subseteq \cdots \subseteq \operatorname{Ann}(w_k)$ is that $r_i$ is divisible by $r_{i+1}$ for each $i = 1, \ldots, k - 1$. ◊
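To illustrate the two decompositions in a familiar case, take the PID $R = \mathbb{Z}$ and the finitely generated torsion module $M = \mathbb{Z}/12 \oplus \mathbb{Z}/2$. In the form of Theorem B.1.4, $k = 2$ with $\operatorname{Ann}(w_1) = 12\mathbb{Z} \subseteq \operatorname{Ann}(w_2) = 2\mathbb{Z}$ (and indeed $12$ is divisible by $2$, as in Remark B.1.6); in the form of Theorem B.1.5,
$$M \cong \mathbb{Z}/4 \oplus \mathbb{Z}/3 \oplus \mathbb{Z}/2,$$
with annihilator ideals $2^2\mathbb{Z}$, $3\mathbb{Z}$, and $2\mathbb{Z}$, each generated by a prime power.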
B.2 Derivation of canonical forms

We now use Theorem B.1.4 to derive rational canonical form, and Theorem B.1.5 to derive Jordan canonical form.
We assume throughout that $V$ is a finite-dimensional $\mathbb{F}$-vector space and that $T: V \to V$ is a linear transformation.
We let $R$ be the polynomial ring $R = \mathbb{F}[x]$ and recall that $R$ is a PID. We regard $V$ as an $R$-module by defining
$$p(x)(v) = p(T)(v) \quad\text{for any } p(x) \in R \text{ and any } v \in V.$$

Lemma B.2.1. $V$ is a finitely generated torsion $R$-module.

Proof. $V$ is a finite-dimensional $\mathbb{F}$-vector space, so it has a finite basis $B = \{v_1, \ldots, v_n\}$. Then the finite set $B$ generates $V$ as an $\mathbb{F}$-vector space, so certainly generates $V$ as an $R$-module.
To prove that $v \ne 0$ is a torsion element, we need to show that $p(T)(v) = 0$ for some nonzero polynomial $p(x) \in R$. We proved this, for every $v \in V$, in the course of proving Theorem 5.1.1 (or, in matrix terms, Lemma 4.1.18).

To continue, observe that $\operatorname{Ann}(v)$, as defined in Definition B.1.2, is the principal ideal of $R$ generated by the monic polynomial $m_{T,v}(x)$ of Theorem 5.1.1, and we called this polynomial the $T$-annihilator of $v$ in Definition 5.1.2.
We also observe that a subspace $W$ of $V$ is an $R$-submodule of $V$ if and only if it is $T$-invariant.
Theorem B.2.2 (Rational canonical form). Let $V$ be a finite-dimensional vector space and let $T: V \to V$ be a linear transformation. Then $V$ has a basis $B$ such that $[T]_B = M$ is in rational canonical form. Furthermore, $M$ is unique.

Proof. We have simply restated (verbatim) Theorem 5.5.4(1). This is the matrix translation of Theorem 5.5.2 about the existence of rational canonical $T$-generating sets. Examining the definition of a rational canonical $T$-generating set in Definition 5.5.1, we see that the elements $\{w_i\}$ of that definition are exactly the elements $\{w_i\}$ of Theorem B.1.4, and the ideals $\operatorname{Ann}(w_i)$ are the principal ideals of $R$ generated by the polynomials $m_{T,w_i}(x)$.
Corollary B.2.3. In the notation of Theorem B.1.4, let $f_i(x) = m_{T,w_i}(x)$. Then
(1) The minimum polynomial $m_T(x) = f_1(x)$.
(2) The characteristic polynomial $c_T(x) = f_1(x) \cdots f_k(x)$.
(3) $m_T(x)$ divides $c_T(x)$.
(4) $m_T(x)$ and $c_T(x)$ have the same irreducible factors.
(5) (Cayley-Hamilton Theorem) $c_T(T) = 0$.

Proof. For parts (1) and (2), see Corollary 5.5.6. Parts (3) and (4) are then immediate. For (5), $m_T(T) = 0$ and $m_T(x)$ divides $c_T(x)$, so $c_T(T) = 0$.

Remark B.2.4. We have restated this result here for convenience, but the full strength of Theorem B.2.2 is not necessary to obtain parts (2), (4), and (5) of Corollary B.2.3; see Theorem 5.3.1 and Corollary 5.3.4. ◊
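For instance (a hypothetical illustration), if $T$ has invariant factors $f_1(x) = (x-1)^2(x-2)$ and $f_2(x) = x - 1$ (note $f_2(x)$ divides $f_1(x)$), then
$$m_T(x) = (x-1)^2(x-2), \qquad c_T(x) = f_1(x)f_2(x) = (x-1)^3(x-2),$$
so $m_T(x)$ divides $c_T(x)$ and the two polynomials share the irreducible factors $x - 1$ and $x - 2$, as in parts (3) and (4).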
Theorem B.2.5 (Jordan canonical form). Let $\mathbb{F}$ be an algebraically closed field and let $V$ be a finite-dimensional $\mathbb{F}$-vector space. Let $T: V \to V$ be a linear transformation. Then $V$ has a basis $B$ with $[T]_B = J$ a matrix in Jordan canonical form. $J$ is unique up to the order of the blocks.

Proof. We have simply restated (verbatim) Theorem 5.6.5(1). To prove this, apply Theorem B.1.5 to $V$ to obtain a decomposition $V = N_1 \oplus \cdots \oplus N_l$ as $R$-modules, or, equivalently, a $T$-invariant direct sum decomposition of $V$. Since $\mathbb{F}$ is algebraically closed, each prime in $R$ is a linear polynomial. Now apply Lemma 5.6.1 and Corollary 5.6.2 to each submodule $N_i$.

Remark B.2.6. This proof goes through verbatim to establish Theorem 5.6.6, the existence and essential uniqueness of Jordan canonical form, under the weaker hypothesis that the characteristic polynomial $c_T(x)$ factors into a product of linear factors. Also, replacing Lemma 5.6.1 by Lemma 5.6.8 and Corollary 5.6.2 by Corollary 5.6.10 gives Theorem 5.6.13, the existence and essential uniqueness of generalized Jordan canonical form. ◊
Bibliography
There are dozens, if not hundreds, of elementary linear algebra texts, and
we leave it to the reader to choose her or his favorite. Other than that, we
have:
[1] Kenneth M. Hoffman and Ray A. Kunze, Linear Algebra, second
edition, Prentice Hall, 1971.
[2] Paul R. Halmos, Finite Dimensional Vector Spaces, second edition,
Springer-Verlag, 1987.
[3] William A. Adkins and Steven H. Weintraub, Algebra: An Approach
via Module Theory, Springer-Verlag, 1999.
[4] Steven H. Weintraub, Jordan Canonical Form: Theory and Practice,
Morgan and Claypool, 2009.
[1] is an introductory text that is on a distinctly higher level than most,
and is highly recommended.
[2] is a text by a recognized master of mathematical exposition, and has
become a classic.
[3] is a book on a higher level than this one that proves the structure
theorems for modules over a PID and uses them to obtain canonical forms
for linear transformations (compare the approach in Appendix B).
[4] is a short book devoted entirely to Jordan canonical form. The proof
there is a bit more elementary, avoiding use of properties of polynomials.
While the algorithm for finding a Jordan basis and the Jordan canonical
form of a linear transformation is more or less canonical, our exposition of
it here follows the exposition in [4]. In particular, the eigenstructure picture
(ESP) of a linear transformation was first introduced there.
About the Author
Steven H. Weintraub is Professor of Mathematics at Lehigh University.
He was born in New York, received his undergraduate and graduate degrees
from Princeton University, and was on the permanent faculty at Louisiana
State University for many years before moving to Lehigh in 2001. He has
had visiting appointments at UCLA, Rutgers, Yale, Oxford, Göttingen,
Bayreuth, and Hannover, and has lectured at universities and conferences
around the world. He is the author of over 50 research papers, and this is his
ninth book.
Prof. Weintraub has served on the Executive Committee of the Eastern
Pennsylvania-Delaware section of the MAA, and has extensive service with
the AMS, including currently serving as the Associate Secretary for the
AMS Eastern Section.