A Guide to Advanced Linear Algebra
© 2011 by The Mathematical Association of America (Incorporated)
Library of Congress Catalog Card Number 2011923993
Print Edition ISBN 978-0-88385-351-1
Electronic Edition ISBN 978-0-88385-967-4
Printed in the United States of America
Current Printing (last digit): 10 9 8 7 6 5 4 3 2 1

The Dolciani Mathematical Expositions
NUMBER FORTY-FOUR
MAA Guides #6

A Guide to Advanced Linear Algebra
Steven H. Weintraub
Lehigh University

Published and Distributed by The Mathematical Association of America

DOLCIANI MATHEMATICAL EXPOSITIONS

Committee on Books: Frank Farris, Chair

Dolciani Mathematical Expositions Editorial Board: Underwood Dudley, Editor; Jeremy S. Case; Rosalie A. Dance; Tevian Dray; Thomas M. Halverson; Patricia B. Humphrey; Michael J. McAsey; Michael J. Mossinghoff; Jonathan Rogness; Thomas Q. Sibley

The DOLCIANI MATHEMATICAL EXPOSITIONS series of the Mathematical Association of America was established through a generous gift to the Association from Mary P. Dolciani, Professor of Mathematics at Hunter College of the City University of New York. In making the gift, Professor Dolciani, herself an exceptionally talented and successful expositor of mathematics, had the purpose of furthering the ideal of excellence in mathematical exposition. The Association, for its part, was delighted to accept the gracious gesture initiating the revolving fund for this series from one who has served the Association with distinction, both as a member of the Committee on Publications and as a member of the Board of Governors. It was with genuine pleasure that the Board chose to name the series in her honor.

The books in the series are selected for their lucid expository style and stimulating mathematical content. Typically, they contain an ample supply of exercises, many with accompanying solutions. They are intended to be sufficiently elementary for the undergraduate and even the mathematically inclined high-school student to understand and enjoy, but also to be interesting and sometimes challenging to the more advanced mathematician.

1. Mathematical Gems, Ross Honsberger
2. Mathematical Gems II, Ross Honsberger
3. Mathematical Morsels, Ross Honsberger
4. Mathematical Plums, Ross Honsberger (ed.)
5. Great Moments in Mathematics (Before 1650), Howard Eves
6. Maxima and Minima without Calculus, Ivan Niven
7. Great Moments in Mathematics (After 1650), Howard Eves
8. Map Coloring, Polyhedra, and the Four-Color Problem, David Barnette
9. Mathematical Gems III, Ross Honsberger
10. More Mathematical Morsels, Ross Honsberger
11. Old and New Unsolved Problems in Plane Geometry and Number Theory, Victor Klee and Stan Wagon
12. Problems for Mathematicians, Young and Old, Paul R. Halmos
13. Excursions in Calculus: An Interplay of the Continuous and the Discrete, Robert M. Young
14. The Wohascum County Problem Book, George T. Gilbert, Mark Krusemeyer, and Loren C. Larson
15. Lion Hunting and Other Mathematical Pursuits: A Collection of Mathematics, Verse, and Stories by Ralph P. Boas, Jr., edited by Gerald L. Alexanderson and Dale H. Mugler
16. Linear Algebra Problem Book, Paul R. Halmos
17. From Erdős to Kiev: Problems of Olympiad Caliber, Ross Honsberger
18. Which Way Did the Bicycle Go? ... and Other Intriguing Mathematical Mysteries, Joseph D. E. Konhauser, Dan Velleman, and Stan Wagon
19. In Pólya's Footsteps: Miscellaneous Problems and Essays, Ross Honsberger
20. Diophantus and Diophantine Equations, I. G. Bashmakova (Updated by Joseph Silverman and translated by Abe Shenitzer)
21. Logic as Algebra, Paul Halmos and Steven Givant
22. Euler: The Master of Us All, William Dunham
23. The Beginnings and Evolution of Algebra, I. G. Bashmakova and G. S. Smirnova (Translated by Abe Shenitzer)
24. Mathematical Chestnuts from Around the World, Ross Honsberger
25. Counting on Frameworks: Mathematics to Aid the Design of Rigid Structures, Jack E. Graver
26. Mathematical Diamonds, Ross Honsberger
27. Proofs that Really Count: The Art of Combinatorial Proof, Arthur T. Benjamin and Jennifer J. Quinn
28. Mathematical Delights, Ross Honsberger
29. Conics, Keith Kendig
30. Hesiod's Anvil: falling and spinning through heaven and earth, Andrew J. Simoson
31. A Garden of Integrals, Frank E. Burk
32. A Guide to Complex Variables (MAA Guides #1), Steven G. Krantz
33. Sink or Float? Thought Problems in Math and Physics, Keith Kendig
34. Biscuits of Number Theory, Arthur T. Benjamin and Ezra Brown
35. Uncommon Mathematical Excursions: Polynomia and Related Realms, Dan Kalman
36. When Less is More: Visualizing Basic Inequalities, Claudi Alsina and Roger B. Nelsen
37. A Guide to Advanced Real Analysis (MAA Guides #2), Gerald B. Folland
38. A Guide to Real Variables (MAA Guides #3), Steven G. Krantz
39. Voltaire's Riddle: Micromégas and the measure of all things, Andrew J. Simoson
40. A Guide to Topology (MAA Guides #4), Steven G. Krantz
41. A Guide to Elementary Number Theory (MAA Guides #5), Underwood Dudley
42. Charming Proofs: A Journey into Elegant Mathematics, Claudi Alsina and Roger B. Nelsen
43. Mathematics and Sports, edited by Joseph A. Gallian
44. A Guide to Advanced Linear Algebra (MAA Guides #6), Steven H. Weintraub

MAA Service Center
P.O. Box 91112
Washington, DC 20090-1112
1-800-331-1MAA
FAX: 1-301-206-9789

Preface

Linear algebra is a beautiful and mature field of mathematics, and mathematicians have developed highly effective methods for solving its problems. It is a subject well worth studying for its own sake.

More than that, linear algebra occupies a central place in modern mathematics. Students in algebra studying Galois theory, students in analysis studying function spaces, students in topology studying homology and cohomology, or for that matter students in just about any area of mathematics, studying just about anything, need to have a sound knowledge of linear algebra.

We have written a book that we hope will be broadly useful. The core of linear algebra is essential to every mathematician, and we not only treat this core, but add material that is essential to mathematicians in specific fields, even if not all of it is essential to everybody.

This is a book for advanced students. We presume you are already familiar with elementary linear algebra, and that you know how to multiply matrices, solve linear systems, etc. We do not treat elementary material here, though in places we return to elementary material from a more advanced standpoint to show you what it really means. However, we do not presume you are already a mature mathematician, and in places we explain what (we feel) is the "right" way to understand the material.
We feel that one of the main duties of a teacher is to provide a viewpoint on the subject, and we take pains to do that here. One thing that you should learn about linear algebra now, if you have not already done so, is the following:

Linear algebra is about vector spaces and linear transformations, not about matrices.

This is very much the approach of this book, as you will see upon reading it.

We treat both the finite and infinite dimensional cases in this book, and point out the differences between them, but the bulk of our attention is devoted to the finite dimensional case. There are two reasons: First, the strongest results are available here, and second, this is the case most widely used in mathematics. (Of course, matrices are available only in the finite dimensional case, but, even here, we almost always argue in terms of linear transformations rather than matrices.)

We regard linear algebra as part of algebra, and that guides our approach. But we have followed a middle ground. One of the principal goals of this book is to derive canonical forms for linear transformations on finite dimensional vector spaces, i.e., rational and Jordan canonical forms. The quickest and perhaps most enlightening approach is to derive them as corollaries of the basic structure theorems for modules over a principal ideal domain (PID). Doing so would require a good deal of background, which would limit the utility of this book. Thus our main line of approach does not use these, though we indicate this approach in an appendix. Instead we adopt a more direct argument.

We have written a book that we feel is a thorough, though intentionally not encyclopedic, treatment of linear algebra, one that contains material that is both important and deservedly "well known". In a few places we have succumbed to temptation and included material that is not quite so well known, but that in our opinion should be. We hope that you will be enlightened not only by the specific material in the book but by its style of argument; we hope it will help you learn to "think like a mathematician". We also hope this book will serve as a valuable reference throughout your mathematical career.

Here is a rough outline of the text. We begin, in Chapter 1, by introducing the basic notions of linear algebra, vector spaces and linear transformations, and establish some of their most important properties. In Chapter 2 we introduce coordinates for vectors and matrices for linear transformations. In the first half of Chapter 3 we establish the basic properties of determinants, and in the last half of that chapter we give some of their applications. Chapters 4 and 5 are devoted to the analysis of the structure of a single linear transformation from a finite dimensional vector space to itself. In particular, in these chapters, we develop eigenvalues, eigenvectors, and generalized eigenvectors, and derive rational and Jordan canonical forms. In Chapter 6 we introduce additional structure on a vector space, that of a (bilinear, sesquilinear, or quadratic) form, and analyze these forms. In Chapter 7 we specialize the situation of Chapter 6 to that of a positive definite inner product on a real or complex vector space, and in particular derive the spectral theorem.
In Chapter 8 we provide an introduction to Lie groups, which are central objects in mathematics and are a meeting place for algebra, analysis, and topology. (For this chapter we require the additional background knowledge of the inverse function theorem.) In Appendix A we review basic properties of polynomials and polynomial rings that we use, and in Appendix B we rederive some of our results on canonical forms of a linear transformation from the structure theorems for modules over a PID.

We have provided complete proofs of just about all the results in this book, except that we have often omitted proofs that are routine, without comment.

As we have remarked above, we have tried to write a book that will be widely applicable. This book is written in an algebraic spirit, so the student of algebra will find items of interest and particular applications, too numerous to mention here, throughout the book. The student of analysis will appreciate the fact that we not only consider finite dimensional vector spaces, but also infinite dimensional ones, and will also appreciate our material on inner product spaces and our particular examples of function spaces. The student of algebraic topology will appreciate our dimension-counting arguments and our careful attention to duality, and the student of differential topology will appreciate our material on orientations of vector spaces and our introduction to Lie groups.

No book can treat everything. With the exception of a short section on Hilbert matrices, we do not treat computational issues at all. They do not fit in with our theoretical approach. Students in numerical analysis, for example, will need to look elsewhere for this material.

To close this preface, we establish some notational conventions. We will denote both sets (usually but not always sets of vectors) and linear transformations by script letters $\mathcal{A}, \mathcal{B}, \ldots, \mathcal{Z}$. We will tend to use script letters near the front of the alphabet for sets and script letters near the end of the alphabet for linear transformations. $T$ will always denote a linear transformation and $I$ will always denote the identity linear transformation. Some particular linear transformations will have particular notations, often in boldface. Capital letters will denote either vector spaces or matrices. We will tend to denote vector spaces by capital letters near the end of the alphabet, and $V$ will always denote a vector space. Also, $I$ will almost always denote the identity matrix. $E$ and $F$ will denote arbitrary fields and $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ will denote the fields of rational, real, and complex numbers respectively. $\mathbb{Z}$ will denote the ring of integers. We will use $A \subseteq B$ to mean that $A$ is a subset of $B$ and $A \subsetneq B$ to mean that $A$ is a proper subset of $B$. $A = (a_{ij})$ will mean that $A$ is the matrix whose entry in the $(i, j)$ position is $a_{ij}$. $A = [v_1 \mid v_2 \mid \cdots \mid v_n]$ will mean that $A$ is the matrix whose $i$th column is $v_i$. We will denote the transpose of the matrix $A$ by ${}^t\!A$ (not by $A^t$). Finally, we will write $\mathcal{B} = \{v_i\}$ as shorthand for $\mathcal{B} = \{v_i\}_{i \in I}$ where $I$ is an indexing set, and $\sum c_i v_i$ will mean $\sum_{i \in I} c_i v_i$.

We follow a conventional numbering scheme with, for example, Remark 1.3.12 denoting the 12th numbered item in Section 1.3 of Chapter 1. We use $\square$ to denote the end of proofs. Theorems, etc., are set in italics, so the end of italics denotes the end of their statements.
But definitions, etc., are set in ordinary type, so there is ordinarily nothing to denote the end of their statements. We use ◊ for that.

Steven H. Weintraub
Bethlehem, PA, USA
January 2010

Contents

Preface

1 Vector spaces and linear transformations
1.1 Basic definitions and examples
1.2 Basis and dimension
1.3 Dimension counting and applications
1.4 Subspaces and direct sum decompositions
1.5 Affine subspaces and quotient spaces
1.6 Dual spaces

2 Coordinates
2.1 Coordinates for vectors
2.2 Matrices for linear transformations
2.3 Change of basis
2.4 The matrix of the dual

3 Determinants
3.1 The geometry of volumes
3.2 Existence and uniqueness of determinants
3.3 Further properties
3.4 Integrality
3.5 Orientation
3.6 Hilbert matrices

4 The structure of a linear transformation I
4.1 Eigenvalues, eigenvectors, and generalized eigenvectors
4.2 Some structural results
4.3 Diagonalizability
4.4 An application to differential equations

5 The structure of a linear transformation II
5.1 Annihilating, minimum, and characteristic polynomials
5.2 Invariant subspaces and quotient spaces
5.3 The relationship between the characteristic and minimum polynomials
5.4 Invariant subspaces and invariant complements
5.5 Rational canonical form
5.6 Jordan canonical form
5.7 An algorithm for Jordan canonical form and Jordan basis
5.8 Field extensions
5.9 More than one linear transformation

6 Bilinear, sesquilinear, and quadratic forms
6.1 Basic definitions and results
6.2 Characterization and classification theorems
6.3 The adjoint of a linear transformation

7 Real and complex inner product spaces
7.1 Basic definitions
7.2 The Gram-Schmidt process
7.3 Adjoints, normal linear transformations, and the spectral theorem
7.4 Examples
7.5 The singular value decomposition

8 Matrix groups as Lie groups
8.1 Definition and first examples
8.2 Isometry groups of forms

A Polynomials
A.1 Basic properties
A.2 Unique factorization
A.3 Polynomials as expressions and polynomials as functions

B Modules over principal ideal domains
B.1 Definitions and structure theorems
B.2 Derivation of canonical forms

Bibliography
Index
About the Author

To the binary tree:
Judy
Jodie, Ethan
Logan, Rachel, Blake, Natalie

CHAPTER 1: Vector spaces and linear transformations

In this chapter we introduce the objects we will be studying and investigate some of their basic properties.

1.1 Basic definitions and examples

Definition 1.1.1. A vector space $V$ over a field $F$ is a set $V$ with a pair of operations $(u, v) \mapsto u + v$ for $u, v \in V$ and $(c, u) \mapsto cu$ for $c \in F$, $u \in V$, satisfying the following axioms:
(1) $u + v \in V$ for any $u, v \in V$.
(2) $u + v = v + u$ for any $u, v \in V$.
(3) $u + (v + w) = (u + v) + w$ for any $u, v, w \in V$.
(4) There is a $0 \in V$ such that $0 + v = v + 0 = v$ for any $v \in V$.
(5) For any $v \in V$ there is a $-v \in V$ such that $v + (-v) = (-v) + v = 0$.
(6) $cv \in V$ for any $c \in F$, $v \in V$.
(7) $c(u + v) = cu + cv$ for any $c \in F$, $u, v \in V$.
(8) $(c + d)u = cu + du$ for any $c, d \in F$, $u \in V$.
(9) $c(du) = (cd)u$ for any $c, d \in F$, $u \in V$.
(10) $1u = u$ for any $u \in V$. ◊

Remark 1.1.2. The elements of $F$ are called scalars and the elements of $V$ are called vectors. The operation $(u, v) \mapsto u + v$ is called vector addition and the operation $(c, u) \mapsto cu$ is called scalar multiplication. ◊

Remark 1.1.3. Properties (1) through (5) of Definition 1.1.1 state that $V$ forms an abelian group under the operation of vector addition. ◊

Lemma 1.1.4. (1) $0 \in V$ is unique.
(2) $0v = 0$ for any $v \in V$.
(3) $(-1)v = -v$ for any $v \in V$.

Definition 1.1.5. Let $V$ be a vector space. $W$ is a subspace of $V$ if $W \subseteq V$ and $W$ is a vector space with the same operations of vector addition and scalar multiplication as $V$. ◊

The following result gives an easy way of testing whether a subset $W$ of $V$ is a subspace of $V$.
Lemma 1.1.6. Let $W \subseteq V$. Then $W$ is a subspace of $V$ if and only if it satisfies the equivalent sets of conditions (0), (1), and (2), or (0′), (1), and (2):
(0) $W$ is nonempty.
(0′) $0 \in W$.
(1) If $w_1, w_2 \in W$ then $w_1 + w_2 \in W$.
(2) If $w \in W$ and $c \in F$, then $cw \in W$.

Example 1.1.7. (1) The archetypal example of a vector space is $F^n$, for a positive integer $n$, the space of column vectors
$$F^n = \left\{ \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} : a_i \in F \right\}.$$
We also have the spaces "little $F^\infty$" and "big $F^\infty$", which we denote by $F^\infty$ and $F^{\infty\infty}$ respectively (this is nonstandard notation), that are defined by
$$F^\infty = \left\{ \begin{bmatrix} a_1 \\ a_2 \\ \vdots \end{bmatrix} : a_i \in F, \text{ only finitely many nonzero} \right\}, \qquad F^{\infty\infty} = \left\{ \begin{bmatrix} a_1 \\ a_2 \\ \vdots \end{bmatrix} : a_i \in F \right\}.$$
$F^\infty$ is a subspace of $F^{\infty\infty}$. Let $e_i$ denote the vector in $F^n$, $F^\infty$, or $F^{\infty\infty}$ (which we are considering should be clear from the context) with a 1 in position $i$ and 0 everywhere else. A formal definition appears in Example 1.2.18(1).
(2) We have the vector spaces ${}^r\!F^n$, ${}^r\!F^\infty$, and ${}^r\!F^{\infty\infty}$, defined analogously to $F^n$, $F^\infty$, and $F^{\infty\infty}$ but using row vectors rather than column vectors.
(3) $M_{m,n}(F) = \{\text{$m$-by-$n$ matrices with entries in } F\}$. We abbreviate $M_{m,m}(F)$ by $M_m(F)$.
(4) $P(F) = \{\text{polynomials } p(x) \text{ with coefficients in } F\}$. For a nonnegative integer $n$, $P_n(F) = \{\text{polynomials } p(x) \text{ of degree at most } n \text{ with coefficients in } F\}$. Although the degree of the 0 polynomial is undefined, we adopt the convention that $0 \in P_n(F)$ for every $n$. Observe that $P_n(F)$ is a subspace of $P(F)$, and that $P_m(F)$ is a subspace of $P_n(F)$ whenever $m \leq n$. (We also use the notation $F[x]$ for $P(F)$. We use $P(F)$ when we want to consider polynomials as elements of a vector space, while we use $F[x]$ when we want to consider their properties as polynomials.)
(5) $F$ is itself an $F$-vector space. If $E$ is any field containing $F$ as a subfield (in which case we say $E$ is an extension field of $F$), $E$ is an $F$-vector space. For example, $\mathbb{C}$ is an $\mathbb{R}$-vector space.
(6) If $A$ is a set, $\{\text{functions } f : A \to F\}$ is a vector space. We denote it by $F^A$.
(7) $C^0(\mathbb{R})$, the space of continuous functions $f : \mathbb{R} \to \mathbb{R}$, is a vector space. For any $k > 0$, $C^k(\mathbb{R}) = \{\text{functions } f : \mathbb{R} \to \mathbb{R} \mid f, f', \ldots, f^{(k)} \text{ are all continuous}\}$ is a vector space. Also, $C^\infty(\mathbb{R}) = \{\text{functions } f : \mathbb{R} \to \mathbb{R} \mid f \text{ has continuous derivatives of all orders}\}$ is a vector space. ◊

Not only do we want to consider vector spaces, we want to consider the appropriate sort of functions between them, given by the following definition.

Definition 1.1.8. Let $V$ and $W$ be vector spaces. A function $T : V \to W$ is a linear transformation if for all $v, v_1, v_2 \in V$ and all $c \in F$:
(1) $T(cv) = cT(v)$.
(2) $T(v_1 + v_2) = T(v_1) + T(v_2)$. ◊

Lemma 1.1.9. Let $T : V \to W$ be a linear transformation. Then $T(0) = 0$.

Definition 1.1.10. Let $V$ be a vector space. The identity linear transformation $I : V \to V$ is the linear transformation defined by $I(v) = v$ for every $v \in V$. ◊

Here is one of the most important ways of constructing linear transformations.

Example 1.1.11. Let $A$ be an $m$-by-$n$ matrix with entries in $F$, $A \in M_{m,n}(F)$. Then $T_A : F^n \to F^m$ defined by
$$T_A(v) = Av$$
is a linear transformation. ◊
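A quick numerical sketch of Example 1.1.11, assuming $F = \mathbb{R}$ and the NumPy library (the matrix and vectors are arbitrary illustrative choices): the map $T_A(v) = Av$ satisfies both conditions of Definition 1.1.8, and $T_A(e_i)$ recovers the $i$th column of $A$, the observation that drives the next lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 4)).astype(float)  # a 3-by-4 matrix over R
u, v = rng.standard_normal(4), rng.standard_normal(4)
c = 2.5

T = lambda x: A @ x                                  # T_A(v) = Av

assert np.allclose(T(c * v), c * T(v))               # T(cv) = c T(v)
assert np.allclose(T(u + v), T(u) + T(v))            # T(u + v) = T(u) + T(v)

for i in range(4):                                   # A e_i is column i of A
    e = np.zeros(4)
    e[i] = 1.0
    assert np.allclose(T(e), A[:, i])
```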
Lemma 1.1.12. (1) Let $A$ and $B$ be $m$-by-$n$ matrices. Then $A = B$ if and only if $T_A = T_B$.
(2) Every linear transformation $T : F^n \to F^m$ is $T_A$ for some unique $m$-by-$n$ matrix $A$.

Proof. (1) Clearly if $A = B$, then $T_A = T_B$. Conversely, suppose $T_A = T_B$. Then $T_A(v) = T_B(v)$ for every $v \in F^n$. In particular, if $v = e_i$, then $T_A(e_i) = T_B(e_i)$, i.e., $Ae_i = Be_i$. But $Ae_i$ is just the $i$th column of $A$, and $Be_i$ is just the $i$th column of $B$. Since this is true for every $i$, $A = B$.
(2) $T = T_A$ for $A = [T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n)]$.

Definition 1.1.13. The $n$-by-$n$ identity matrix $I$ is the matrix defined by the equation $T_I = I$, where the right-hand $I$ is the identity linear transformation. ◊

It is easy to check that this gives the usual definition of the identity matrix.

We now use Lemma 1.1.12 to define matrix operations.

Definition 1.1.14. (1) Let $A$ be an $m$-by-$n$ matrix and $c$ be a scalar. Then $D = cA$ is the matrix defined by $T_D = cT_A$.
(2) Let $A$ and $B$ be $m$-by-$n$ matrices. Then $E = A + B$ is the matrix defined by $T_E = T_A + T_B$. ◊

It is easy to check that these give the usual definitions of the scalar multiple $cA$ and the matrix sum $A + B$.

Theorem 1.1.15. Let $U$, $V$, and $W$ be vector spaces. Let $T : U \to V$ and $S : V \to W$ be linear transformations. Then the composition $S \circ T : U \to W$, defined by $(S \circ T)(u) = S(T(u))$, is a linear transformation.

Proof. $(S \circ T)(cu) = S(T(cu)) = S(cT(u)) = cS(T(u)) = c(S \circ T)(u)$ and
$$(S \circ T)(u_1 + u_2) = S(T(u_1 + u_2)) = S(T(u_1) + T(u_2)) = S(T(u_1)) + S(T(u_2)) = (S \circ T)(u_1) + (S \circ T)(u_2).$$

We now use Theorem 1.1.15 to define matrix multiplication.

Definition 1.1.16. Let $A$ be an $m$-by-$n$ matrix and $B$ be an $n$-by-$p$ matrix. Then $D = AB$ is the $m$-by-$p$ matrix defined by $T_D = T_A \circ T_B$. ◊

It is routine to check that this gives the usual definition of matrix multiplication.

Theorem 1.1.17. Matrix multiplication is associative, i.e., if $A$ is an $m$-by-$n$ matrix, $B$ is an $n$-by-$p$ matrix, and $C$ is a $p$-by-$q$ matrix, then $A(BC) = (AB)C$.

Proof. Let $D = A(BC)$ and $E = (AB)C$. Then $D$ is the unique matrix defined by $T_D = T_A \circ T_{BC} = T_A \circ (T_B \circ T_C)$, while $E$ is the unique matrix defined by $T_E = T_{AB} \circ T_C = (T_A \circ T_B) \circ T_C$. But composition of functions is associative, $T_A \circ (T_B \circ T_C) = (T_A \circ T_B) \circ T_C$, so $D = E$, i.e., $A(BC) = (AB)C$.

Lemma 1.1.18. Let $T : V \to W$ be a linear transformation. Then $T$ is invertible (as a linear transformation) if and only if $T$ is 1-1 and onto.

Proof. $T$ is invertible as a function if and only if $T$ is 1-1 and onto. It is then easy to check that in this case the function $T^{-1} : W \to V$ is a linear transformation.

Definition 1.1.19. An invertible linear transformation $T : V \to W$ is called an isomorphism. Two vector spaces $V$ and $W$ are isomorphic if there is an isomorphism $T : V \to W$. ◊

Remark 1.1.20. It is easy to check that being isomorphic is an equivalence relation among vector spaces. ◊

Although the historical development of calculus preceded the historical development of linear algebra, with hindsight we can see that calculus "works" because of the three parts of the following example.

Example 1.1.21. Let $V = C^\infty(\mathbb{R})$, the vector space of real-valued infinitely differentiable functions on the real line $\mathbb{R}$.
(1) For a real number $a$, let $E_a : V \to \mathbb{R}$ be evaluation at $a$, i.e., $E_a(f(x)) = f(a)$. Then $E_a$ is a linear transformation. We also have the linear transformation $\tilde{E}_a : V \to V$, where $\tilde{E}_a(f(x))$ is the constant function whose value is $f(a)$.
(2) Let $D : V \to V$ be differentiation, i.e., $D(f(x)) = f'(x)$. Then $D$ is a linear transformation.
(3) For a real number $a$, let $I_a : V \to V$ be definite integration starting at $t = a$, i.e., $I_a(f)(x) = \int_a^x f(t)\,dt$. Then $I_a$ is a linear transformation. We also have the linear transformation $E_b \circ I_a$, with $(E_b \circ I_a)(f(x)) = \int_a^b f(x)\,dx$. ◊
Theorem 1.1.22. (1) $D \circ I_a = I$.
(2) $I_a \circ D = I - \tilde{E}_a$.

Proof. This is the Fundamental Theorem of Calculus.

Example 1.1.23. (1) Let $V = {}^r\!F^{\infty\infty}$. We define $L : V \to V$ (left shift) and $R : V \to V$ (right shift) by
$$L([a_1, a_2, a_3, \ldots]) = [a_2, a_3, a_4, \ldots], \qquad R([a_1, a_2, a_3, \ldots]) = [0, a_1, a_2, \ldots].$$
Note that $L$ and $R$ restrict to linear transformations (which we denote by the same letters) from ${}^r\!F^\infty$ to ${}^r\!F^\infty$. (We could equally well consider up-shift and down-shift on $F^{\infty\infty}$ or $F^\infty$, but it is traditional to consider left-shift and right-shift.)
(2) Let $E$ be an extension field of $F$. Then for $\alpha \in E$, we have the linear transformation given by multiplication by $\alpha$, i.e., $T(\beta) = \alpha\beta$ for every $\beta \in E$.
(3) Let $A$ and $B$ be sets. We have the vector spaces $F^A = \{f : A \to F\}$ and $F^B = \{g : B \to F\}$. Let $\varphi : A \to B$ be a function. Then $\varphi^* : F^B \to F^A$ is the linear transformation defined by $\varphi^*(g) = g \circ \varphi$, i.e., $\varphi^*(g) : A \to F$ is the function defined by
$$\varphi^*(g)(a) = g(\varphi(a)) \quad \text{for } a \in A.$$
Note that $\varphi^*$ "goes the other way" than $\varphi$. That is, $\varphi$ is covariant, i.e., pushes points forward, while $\varphi^*$ is contravariant, i.e., pulls functions back. Also, the pull-back is given by composition. This is a situation that recurs throughout mathematics. ◊

Here are two of the most important ways in which subspaces arise.

Definition 1.1.24. Let $T : V \to W$ be a linear transformation. Then the kernel of $T$ is
$$\operatorname{Ker}(T) = \{v \in V \mid T(v) = 0\}$$
and the image of $T$ is
$$\operatorname{Im}(T) = \{w \in W \mid w = T(v) \text{ for some } v \in V\}.$$ ◊

Lemma 1.1.25. In the situation of Definition 1.1.24, $\operatorname{Ker}(T)$ is a subspace of $V$ and $\operatorname{Im}(T)$ is a subspace of $W$.

Proof. It is easy to check that the conditions in Lemma 1.1.6 are satisfied.

Remark 1.1.26. If $T = T_A$, $\operatorname{Ker}(T)$ is often called the nullspace of $A$ and $\operatorname{Im}(T)$ is often called the column space of $A$. ◊

We introduce one more vector space.

Definition 1.1.27. Let $V$ and $W$ be vector spaces. Then $\operatorname{Hom}_F(V, W)$, the space of $F$-homomorphisms from $V$ to $W$, is
$$\operatorname{Hom}_F(V, W) = \{\text{linear transformations } T : V \to W\}.$$
If $W = V$, we set $\operatorname{End}_F(V) = \operatorname{Hom}_F(V, V)$, the space of $F$-endomorphisms of $V$. ◊

Lemma 1.1.28. For any $F$-vector spaces $V$ and $W$, $\operatorname{Hom}_F(V, W)$ is a vector space.

Proof. It is routine to check that the conditions in Definition 1.1.1 are satisfied.

We also have the subset, which is definitely not a subspace, of $\operatorname{End}_F(V)$ consisting of invertible linear transformations.

Definition 1.1.29. (1) Let $V$ be a vector space. The general linear group $GL(V)$ is
$$GL(V) = \{\text{invertible linear transformations } T : V \to V\}.$$
(2) The general linear group $GL_n(F)$ is
$$GL_n(F) = \{\text{invertible $n$-by-$n$ matrices with entries in } F\}.$$ ◊

Theorem 1.1.30. Let $V = F^n$ and $W = F^m$. Then $\operatorname{Hom}_F(V, W)$ is isomorphic to $M_{m,n}(F)$. In particular, $\operatorname{End}_F(V)$ is isomorphic to $M_n(F)$. Also, $GL(V)$ is isomorphic to $GL_n(F)$.

Proof. By Lemma 1.1.12, any $T \in \operatorname{Hom}_F(V, W)$ is $T = T_A$ for a unique $A \in M_{m,n}(F)$. Then the linear transformation $T_A \mapsto A$ gives an isomorphism from $\operatorname{Hom}_F(V, W)$ to $M_{m,n}(F)$. This restricts to a group isomorphism from $GL(V)$ to $GL_n(F)$.

Remark 1.1.31. In the next section we define the dimension of a vector space, and in the next chapter we will see that Theorem 1.1.30 remains true when $V$ and $W$ are allowed to be any vector spaces of dimensions $n$ and $m$ respectively. ◊
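Restricting to the subspace of polynomials makes Theorem 1.1.22 easy to test concretely. The following Python sketch is an illustration only, assuming $F = \mathbb{R}$ and representing $p(x) = \sum c_k x^k$ by its coefficient list; the names D, I, and E are ad hoc stand-ins for $D$, $I_a$, and $E_a$.

```python
def D(p):                       # differentiation: sum c_k x^k -> sum k c_k x^(k-1)
    return [k * p[k] for k in range(1, len(p))]

def I(p, a):                    # I_a: the antiderivative F of p with F(a) = 0
    F = [0.0] + [p[k] / (k + 1) for k in range(len(p))]
    F[0] = -sum(F[k] * a**k for k in range(1, len(F)))  # force F(a) = 0
    return F

def E(p, a):                    # evaluation E_a(p) = p(a)
    return sum(c * a**k for k, c in enumerate(p))

p, a = [3.0, -1.0, 2.0], 1.5    # p(x) = 3 - x + 2x^2, base point a
q = D(I(p, a))                  # (D o I_a)(p) gives back p
assert len(q) == len(p) and all(abs(x - y) < 1e-12 for x, y in zip(q, p))
r = I(D(p), a)                  # (I_a o D)(p) gives p minus the constant p(a)
assert abs(E(r, 0.0) - (p[0] - E(p, a))) < 1e-12 and r[1:] == p[1:]
```

The two assertions mirror parts (1) and (2) of the theorem: integrating then differentiating recovers $p$, while differentiating then integrating loses exactly the constant $p(a)$.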
1.2 Basis and dimension

In this section we develop the very important notion of a basis of a vector space. A basis $\mathcal{B}$ of the vector space $V$ has two properties: $\mathcal{B}$ is linearly independent and $\mathcal{B}$ spans $V$. We begin by developing each of these two notions, which are important in their own right. We shall prove that any two bases of $V$ have the same number of elements, which enables us to define the dimension of $V$ as the number of elements in any basis of $V$.

Definition 1.2.1. Let $\mathcal{B} = \{v_i\}$ be a subset of $V$. A vector $v \in V$ is a linear combination of the vectors in $\mathcal{B}$ if there is a set of scalars $\{c_i\}$, only finitely many of which are nonzero, such that
$$v = \sum c_i v_i.$$ ◊

Remark 1.2.2. If we choose all $c_i = 0$ then we obtain
$$0 = \sum c_i v_i.$$
This is the trivial linear combination of the vectors in $\mathcal{B}$. Any other linear combination is nontrivial. ◊

Remark 1.2.3. In case $\mathcal{B} = \{\,\}$, the only linear combination we have is the empty linear combination, whose value we consider to be $0 \in V$ and which we consider to be a trivial linear combination. ◊

Definition 1.2.4. Let $\mathcal{B} = \{v_i\}$ be a subset of $V$. Then $\mathcal{B}$ is linearly independent if the only linear combination of elements of $\mathcal{B}$ that is equal to 0 is the trivial linear combination, i.e., if $0 = \sum c_i v_i$ implies $c_i = 0$ for every $i$. ◊

Definition 1.2.5. Let $\mathcal{B} = \{v_i\}$ be a subset of $V$. Then $\operatorname{Span}(\mathcal{B})$ is the subspace of $V$ consisting of all linear combinations of elements of $\mathcal{B}$,
$$\operatorname{Span}(\mathcal{B}) = \left\{ \sum c_i v_i \;:\; c_i \in F \right\}.$$
If $\operatorname{Span}(\mathcal{B}) = V$ then $\mathcal{B}$ is a spanning set for $V$ (or equivalently, $\mathcal{B}$ spans $V$). ◊

Remark 1.2.6. Strictly speaking, we should have defined $\operatorname{Span}(\mathcal{B})$ to be a subset of $V$, but it is easy to verify that it is a subspace. ◊

Lemma 1.2.7. Let $\mathcal{B}$ be a subset of a vector space $V$. The following are equivalent:
(1) $\mathcal{B}$ is linearly independent and spans $V$.
(2) $\mathcal{B}$ is a maximal linearly independent subset of $V$.
(3) $\mathcal{B}$ is a minimal spanning set for $V$.

Proof (Outline). Suppose $\mathcal{B}$ is linearly independent and spans $V$. If $\mathcal{B} \subsetneq \mathcal{B}'$, choose $v \in \mathcal{B}'$, $v \notin \mathcal{B}$. Since $\mathcal{B}$ spans $V$, $v$ is a linear combination of elements of $\mathcal{B}$, and so $\mathcal{B}'$ is not linearly independent. Hence $\mathcal{B}$ is a maximal linearly independent subset of $V$. If $\mathcal{B}' \subsetneq \mathcal{B}$, choose $v \in \mathcal{B}$, $v \notin \mathcal{B}'$. Since $\mathcal{B}$ is linearly independent, $v$ is not in the subspace spanned by $\mathcal{B}'$, and hence $\mathcal{B}$ is a minimal spanning set for $V$.

Suppose that $\mathcal{B}$ is a maximal linearly independent subset of $V$. If $\mathcal{B}$ does not span $V$, choose any vector $v \in V$ that is not in the subspace spanned by $\mathcal{B}$. Then $\mathcal{B}' = \mathcal{B} \cup \{v\}$ would be linearly independent, contradicting maximality.

Suppose that $\mathcal{B}$ is a minimal spanning set for $V$. If $\mathcal{B}$ is not linearly independent, choose $v \in \mathcal{B}$ that is a linear combination of the other elements of $\mathcal{B}$. Then $\mathcal{B}' = \mathcal{B} - \{v\}$ would span $V$, contradicting minimality.

Definition 1.2.8. A subset $\mathcal{B}$ of $V$ satisfying the equivalent conditions of Lemma 1.2.7 is a basis of $V$. ◊
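Over $F = \mathbb{R}$, Definitions 1.2.4 and 1.2.5 can be tested numerically for finite sets of column vectors; the sketch below assumes NumPy and is only an illustration, with arbitrarily chosen vectors.

```python
import numpy as np

def independent(vectors):
    """Finitely many vectors are independent iff the matrix having them as
    columns has rank equal to the number of columns."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

def in_span(v, vectors):
    """v lies in Span(vectors) iff appending v does not increase the rank."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(np.column_stack(vectors + [v])) == np.linalg.matrix_rank(A)

v1, v2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
assert independent([v1, v2])
assert in_span(np.array([2.0, 3.0, 5.0]), [v1, v2])      # equals 2 v1 + 3 v2
assert not in_span(np.array([0.0, 0.0, 1.0]), [v1, v2])
```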
Theorem 1.2.9. Let $V$ be a vector space and let $\mathcal{A}$ and $\mathcal{C}$ be subsets of $V$ with $\mathcal{A} \subseteq \mathcal{C}$, $\mathcal{A}$ linearly independent, and $\mathcal{C}$ spanning $V$. Then there is a basis $\mathcal{B}$ of $V$ with $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{C}$.

Proof. This proof is an application of Zorn's Lemma. Let
$$Z = \{\mathcal{B}' \mid \mathcal{A} \subseteq \mathcal{B}' \subseteq \mathcal{C},\ \mathcal{B}' \text{ linearly independent}\},$$
partially ordered by inclusion. $Z$ is nonempty, as $\mathcal{A} \in Z$. Any chain (i.e., linearly ordered subset) of $Z$ has an upper bound in $Z$, namely its union. Then, by Zorn's Lemma, $Z$ has a maximal element $\mathcal{B}$. We claim that $\mathcal{B}$ is a basis for $V$.

Certainly $\mathcal{B}$ is linearly independent, so we need only show that it spans $V$. Suppose not. Then there would be some $v \in \mathcal{C}$ not in the span of $\mathcal{B}$ (since if every $v \in \mathcal{C}$ were in the span of $\mathcal{B}$, then $\mathcal{B}$ would span $V$, because $\mathcal{C}$ spans $V$), and $\mathcal{B}^+ = \mathcal{B} \cup \{v\}$ would then be a linearly independent subset of $\mathcal{C}$ with $\mathcal{B} \subsetneq \mathcal{B}^+$, contradicting maximality.

Corollary 1.2.10. (1) Let $\mathcal{A}$ be any linearly independent subset of $V$. Then there is a basis $\mathcal{B}$ of $V$ with $\mathcal{A} \subseteq \mathcal{B}$.
(2) Let $\mathcal{C}$ be any spanning set for $V$. Then there is a basis $\mathcal{B}$ of $V$ with $\mathcal{B} \subseteq \mathcal{C}$.
(3) Every vector space $V$ has a basis $\mathcal{B}$.

Proof. (1) Apply Theorem 1.2.9 with $\mathcal{C} = V$. (2) Apply Theorem 1.2.9 with $\mathcal{A} = \{\,\}$. (3) Apply Theorem 1.2.9 with $\mathcal{A} = \{\,\}$ and $\mathcal{C} = V$.

We now show that the dimension of a vector space is well-defined. We first prove the following familiar result from elementary linear algebra, one that is useful and important in its own right.

Lemma 1.2.11. A homogeneous system of $m$ equations in $n$ unknowns with $m < n$ has a nontrivial solution.

Proof (Outline). We proceed by induction on $m$. Let the unknowns be $x_1, \ldots, x_n$. If $m = 0$, set $x_1 = 1$, $x_2 = \cdots = x_n = 0$. Suppose the theorem is true for $m$ and consider a system of $m + 1$ equations in $n > m + 1$ unknowns. If none of the equations involve $x_1$, the system has the solution $x_1 = 1$, $x_2 = \cdots = x_n = 0$. Otherwise, pick an equation involving $x_1$ (i.e., with the coefficient of $x_1$ nonzero) and subtract appropriate multiples of it from the other equations so that none of them involve $x_1$. Then the other equations in the transformed system are a system of $m$ equations in the $n - 1 > m$ variables $x_2, \ldots, x_n$. By induction it has a nontrivial solution for $x_2, \ldots, x_n$. Then solve the remaining equation for $x_1$.

Lemma 1.2.12. Let $\mathcal{B} = \{v_1, \ldots, v_m\}$ span $V$. Any subset $\mathcal{C}$ of $V$ containing more than $m$ vectors is linearly dependent.

Proof. Let $\mathcal{C} = \{w_1, \ldots, w_n\}$ with $n > m$. (If $\mathcal{C}$ is infinite, consider a finite subset containing $n > m$ elements.) For each $i = 1, \ldots, n$,
$$w_i = \sum_{j=1}^m a_{ji} v_j.$$
We show that
$$0 = \sum_{i=1}^n c_i w_i$$
has a nontrivial solution (i.e., a solution with not all $c_i = 0$). We have
$$0 = \sum_{i=1}^n c_i w_i = \sum_{i=1}^n c_i \left( \sum_{j=1}^m a_{ji} v_j \right) = \sum_{j=1}^m \left( \sum_{i=1}^n a_{ji} c_i \right) v_j,$$
and this will be true if
$$0 = \sum_{i=1}^n a_{ji} c_i \quad \text{for each } j = 1, \ldots, m.$$
This is a system of $m$ equations in the $n$ unknowns $c_1, \ldots, c_n$ and so has a nontrivial solution by Lemma 1.2.11.
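The proof of Lemma 1.2.11 is constructive, and computer algebra systems carry out exactly this kind of elimination. A small sketch, assuming SymPy, with an arbitrary illustrative system of 2 equations in 4 unknowns:

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, -1],
            [0, 1, 1,  3]])        # 2 homogeneous equations, 4 unknowns
basis = A.nullspace()              # nontrivial solutions of Ax = 0
assert len(basis) == 4 - A.rank()  # here 4 - 2 = 2 independent solutions
assert all(A * v == Matrix([0, 0]) for v in basis)
```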
In the following, we do not distinguish between cardinalities of infinite sets.

Theorem 1.2.13. Let $V$ be a vector space. Then any two bases of $V$ have the same number of elements.

Proof. Let $V$ have bases $\mathcal{B}$ and $\mathcal{C}$. If both $\mathcal{B}$ and $\mathcal{C}$ are infinite, we are done. Assume not. Let $\mathcal{B}$ have $m$ elements and $\mathcal{C}$ have $n$ elements. Since $\mathcal{B}$ and $\mathcal{C}$ are bases, both $\mathcal{B}$ and $\mathcal{C}$ span $V$ and both $\mathcal{B}$ and $\mathcal{C}$ are linearly independent. Applying Lemma 1.2.12, we see that $n \leq m$. Interchanging $\mathcal{B}$ and $\mathcal{C}$, we see that $m \leq n$. Hence $m = n$.

Given this theorem we may make the following very important definition.

Definition 1.2.14. Let $V$ be a vector space. The dimension of $V$, $\dim(V)$, is the number of vectors in any basis of $V$, $\dim(V) \in \{0, 1, 2, \ldots\} \cup \{\infty\}$. ◊

Remark 1.2.15. The vector space $V = \{0\}$ has basis $\{\,\}$ and hence dimension 0. ◊

While we will be considering both finite-dimensional and infinite-dimensional vector spaces, we adopt the convention that when we write "Let $V$ be an $n$-dimensional vector space" or "Let $V$ be a vector space of dimension $n$" we always mean that $V$ is finite-dimensional, so that $n$ is a nonnegative integer.

Theorem 1.2.16. Let $V$ be a vector space of dimension $n$. Let $\mathcal{C}$ be a subset of $V$ consisting of $m$ elements.
(1) If $m > n$ then $\mathcal{C}$ is not linearly independent (and hence is not a basis of $V$).
(2) If $m < n$ then $\mathcal{C}$ does not span $V$ (and hence is not a basis of $V$).
(3) If $m = n$ the following are equivalent:
(a) $\mathcal{C}$ is a basis of $V$.
(b) $\mathcal{C}$ spans $V$.
(c) $\mathcal{C}$ is linearly independent.

Proof. Let $\mathcal{B}$ be a basis of $V$, consisting necessarily of $n$ elements.
(1) $\mathcal{B}$ spans $V$ so, applying Lemma 1.2.12, if $\mathcal{C}$ has $m > n$ elements then $\mathcal{C}$ is not linearly independent.
(2) Suppose $\mathcal{C}$ spans $V$. Then, applying Lemma 1.2.12, $\mathcal{B}$ has $n > m$ elements so cannot be linearly independent, contradicting $\mathcal{B}$ being a basis of $V$.
(3) By definition, (a) is equivalent to (b) and (c) together, so (a) implies (b) and (a) implies (c).
Suppose (b) is true. By Corollary 1.2.10, $\mathcal{C}$ has a subset $\mathcal{C}'$ that is a basis of $V$. By Theorem 1.2.13, $\mathcal{C}'$ has $n = m$ elements, so $\mathcal{C}' = \mathcal{C}$.
Suppose (c) is true. By Corollary 1.2.10, $\mathcal{C}$ is contained in a superset $\mathcal{C}'$ that is a basis of $V$. By Theorem 1.2.13, $\mathcal{C}'$ has $n = m$ elements, so $\mathcal{C}' = \mathcal{C}$.

Remark 1.2.17. A good mathematical theory is one that reduces hard problems to easy problems. Linear algebra is such a theory, as it reduces many problems to counting. Theorem 1.2.16 is a typical example. Suppose we want to know whether a set $\mathcal{C}$ is a basis of an $n$-dimensional vector space $V$. We count the number of elements of $\mathcal{C}$, say $m$. If we get the "wrong" number, i.e., if $m \neq n$, then we know $\mathcal{C}$ is not a basis of $V$. If we get the "right" number, i.e., if $m = n$, then $\mathcal{C}$ may or may not be a basis of $V$. While there are normally two conditions to check, that $\mathcal{C}$ is linearly independent and that $\mathcal{C}$ spans $V$, it suffices to check either one of the conditions. If that one is satisfied, the other one is automatic. ◊

Example 1.2.18. (1) $F^n$ has basis $\mathcal{E}_n$, the standard basis, given by $\mathcal{E}_n = \{e_{1,n}, e_{2,n}, \ldots, e_{n,n}\}$, where $e_{i,n}$ is the vector in $F^n$ whose $i$th entry is 1 and all of whose other entries are 0. $F^\infty$ has basis $\mathcal{E}_\infty = \{e_{1,\infty}, e_{2,\infty}, \ldots\}$, defined analogously. We will often write $\mathcal{E}$ for $\mathcal{E}_n$ and $e_i$ for $e_{i,n}$ when $n$ is understood. Thus $F^n$ has dimension $n$ and $F^\infty$ is infinite-dimensional.
(2) $F^\infty$ is a proper subspace of $F^{\infty\infty}$. By Corollary 1.2.10, $F^{\infty\infty}$ has a basis, but it is impossible to write one down in a constructive way.
(3) The vector space of polynomials of degree at most $n$ with coefficients in $F$, $P_n(F) = \{a_0 + a_1 x + \cdots + a_n x^n \mid a_i \in F\}$, has basis $\{1, x, \ldots, x^n\}$ and dimension $n + 1$.
(4) The vector space of polynomials of arbitrary degree with coefficients in $F$, $P(F) = \{a_0 + a_1 x + a_2 x^2 + \cdots \mid a_i \in F\}$, has basis $\{1, x, x^2, \ldots\}$ and is infinite-dimensional.
(5) Let $p_i(x)$ be any polynomial of degree $i$. Then $\{p_0(x), p_1(x), \ldots, p_n(x)\}$ is a basis for $P_n(F)$, and $\{p_0(x), p_1(x), p_2(x), \ldots\}$ is a basis for $P(F)$.
(6) $M_{m,n}(F)$ has dimension $mn$, with basis given by the $mn$ distinct matrices each of which has a single entry of 1 and all other entries 0.
(7) If $V = \{f : A \to F\}$ for some finite set $A = \{a_1, \ldots, a_n\}$, then $V$ is $n$-dimensional with basis $\{b_1, \ldots, b_n\}$, where $b_i$ is the function defined by $b_i(a_j) = 1$ if $j = i$ and 0 if $j \neq i$.
(8) Let $E$ be an extension of $F$ and let $\alpha \in E$ be algebraic, i.e., $\alpha$ is a root of a (necessarily unique) monic irreducible polynomial $f(x) \in F[x]$. Let $f(x)$ have degree $n$. Then $F(\alpha)$, defined by $F(\alpha) = \{p(\alpha) \mid p(x) \in F[x]\}$, is a subfield of $E$ with basis $\{1, \alpha, \ldots, \alpha^{n-1}\}$ and so is an extension of $F$ of degree $n$. ◊

Remark 1.2.19. If we consider cardinalities of infinite sets, we see that $F^\infty$ is countably infinite-dimensional. On the other hand, $F^{\infty\infty}$ is uncountably infinite-dimensional. If $F$ is a countable field, this is easy to see: a vector space with a countable basis over a countable field is itself countable, while $F^{\infty\infty}$ is uncountable. For $F$ uncountable, we need a more subtle argument. We will give it here, although it presupposes results from Chapter 4. For convenience we consider ${}^r\!F^{\infty\infty}$ instead, but clearly ${}^r\!F^{\infty\infty}$ and $F^{\infty\infty}$ are isomorphic. Consider the left shift $L : {}^r\!F^{\infty\infty} \to {}^r\!F^{\infty\infty}$. Observe that for any $a \in F$, $L$ has eigenvalue $a$ with associated eigenvector $v_a = [1, a, a^2, a^3, \ldots]$. But eigenvectors associated to distinct eigenvalues are linearly independent. (See Lemma 4.2.5.) ◊

Corollary 1.2.20. Let $W$ be a subspace of $V$. Then $\dim(W) \leq \dim(V)$. If $\dim(V)$ is finite, then $\dim(W) = \dim(V)$ if and only if $W = V$.

Proof. Apply Theorem 1.2.16 with $\mathcal{C}$ a basis of $W$.

We have the following useful characterization of a basis.

Lemma 1.2.21. Let $V$ be a vector space and let $\mathcal{B} = \{v_i\}$ be a set of vectors in $V$. Then $\mathcal{B}$ is a basis of $V$ if and only if every $v \in V$ can be written uniquely as $v = \sum c_i v_i$ for $c_i \in F$, all but finitely many zero.

Proof. Suppose $\mathcal{B}$ is a basis of $V$. Then $\mathcal{B}$ spans $V$, so any $v \in V$ can be written as $v = \sum c_i v_i$. We show this expression for $v$ is unique. Suppose we also have $v = \sum c_i' v_i$. Then $0 = \sum (c_i' - c_i) v_i$. But $\mathcal{B}$ is linearly independent, so $c_i' - c_i = 0$ and $c_i' = c_i$ for each $i$.

Conversely, suppose every $v \in V$ can be written as $v = \sum c_i v_i$ in a unique way. This clearly implies that $\mathcal{B}$ spans $V$. To show $\mathcal{B}$ is linearly independent, suppose $0 = \sum c_i v_i$. Certainly $0 = \sum 0 v_i$. By the uniqueness of the expression, $c_i = 0$ for each $i$.

This lemma will be the basis for our definition of coordinates in the next chapter. It also has immediate applications. First, an illustrative use, and then some general results.

Example 1.2.22. (1) Let $V = P_{n-1}(\mathbb{R})$. For any real number $a$,
$$\mathcal{B} = \{1,\ x - a,\ (x - a)^2,\ \ldots,\ (x - a)^{n-1}\}$$
is a basis of $V$, so any polynomial $p(x) \in V$ can be written uniquely as a linear combination of elements of $\mathcal{B}$,
$$p(x) = \sum_{i=0}^{n-1} c_i (x - a)^i.$$
Solving for the coefficients $c_i$, we obtain the familiar Taylor expansion
$$p(x) = \sum_{i=0}^{n-1} \frac{p^{(i)}(a)}{i!} (x - a)^i.$$
(2) Let $V = P_{n-1}(\mathbb{R})$. For any set of pairwise distinct real numbers $\{a_1, \ldots, a_n\}$,
$$\mathcal{B} = \{(x - a_2)(x - a_3) \cdots (x - a_n),\ (x - a_1)(x - a_3) \cdots (x - a_n),\ \ldots,\ (x - a_1) \cdots (x - a_{n-1})\}$$
is a basis of $V$, so any polynomial $p(x) \in V$ can be written uniquely as a linear combination of elements of $\mathcal{B}$,
$$p(x) = \sum_{i=1}^n c_i (x - a_1) \cdots (x - a_{i-1})(x - a_{i+1}) \cdots (x - a_n).$$
Solving for the coefficients $c_i$, we obtain the familiar Lagrange interpolation formula
$$p(x) = \sum_{i=1}^n \frac{p(a_i)}{(a_i - a_1) \cdots (a_i - a_{i-1})(a_i - a_{i+1}) \cdots (a_i - a_n)} (x - a_1) \cdots (x - a_{i-1})(x - a_{i+1}) \cdots (x - a_n).$$ ◊
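A numerical check of the Lagrange formula in Example 1.2.22(2), assuming NumPy; the nodes and the polynomial are arbitrary illustrative choices.

```python
from numpy.polynomial import Polynomial

def lagrange_basis(nodes, i):
    """The polynomial that is 1 at nodes[i] and 0 at every other node."""
    p = Polynomial([1.0])
    for j, a in enumerate(nodes):
        if j != i:
            p *= Polynomial([-a, 1.0]) / (nodes[i] - a)   # (x - a)/(a_i - a)
    return p

nodes = [0.0, 1.0, 2.0]
f = Polynomial([1.0, -2.0, 3.0])                  # f(x) = 1 - 2x + 3x^2
g = sum(f(a) * lagrange_basis(nodes, i) for i, a in enumerate(nodes))
assert max(abs(c) for c in (f - g).coef) < 1e-12  # g reproduces f exactly
```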
So far in this section we have considered individual vector spaces. Now we consider pairs of vector spaces $V$ and $W$ and linear transformations between them.

Lemma 1.2.23. (1) A linear transformation $T : V \to W$ is specified by its values on any basis of $V$.
(2) If $\{v_i\}$ is a basis of $V$ and $\{w_i\}$ is an arbitrary set of vectors in $W$, then there is a unique linear transformation $T : V \to W$ with $T(v_i) = w_i$ for each $i$.

Proof. (1) Let $\mathcal{B} = \{v_1, v_2, \ldots\}$ be a basis of $V$ and suppose that $T : V \to W$ and $T' : V \to W$ are two linear transformations that agree on each $v_i$. Let $v \in V$ be arbitrary. We may write $v = \sum c_i v_i$, and then
$$T(v) = T\left(\sum c_i v_i\right) = \sum c_i T(v_i) = \sum c_i T'(v_i) = T'\left(\sum c_i v_i\right) = T'(v).$$
(2) Let $\{w_1, w_2, \ldots\}$ be an arbitrary set of vectors in $W$, and define $T$ as follows: for any $v \in V$, write $v = \sum c_i v_i$ and let
$$T(v) = \sum c_i w_i.$$
Since the expression for $v$ is unique, this gives a well-defined function $T : V \to W$ with $T(v_i) = w_i$ for each $i$. It is routine to check that $T$ is a linear transformation. Then $T$ is unique by part (1).

Lemma 1.2.24. Let $T : V \to W$ be a linear transformation and let $\mathcal{B} = \{v_1, v_2, \ldots\}$ be a basis of $V$. Let $\mathcal{C} = \{w_1, w_2, \ldots\} = \{T(v_1), T(v_2), \ldots\}$. Then $T$ is an isomorphism if and only if $\mathcal{C}$ is a basis of $W$.

Proof. First suppose $T$ is an isomorphism. To show $\mathcal{C}$ spans $W$, let $w \in W$ be arbitrary. Since $T$ is an epimorphism, $w = T(v)$ for some $v$. As $\mathcal{B}$ is a basis of $V$, it spans $V$, so we may write $v = \sum c_i v_i$ for some $\{c_i\}$. Then
$$w = T(v) = T\left(\sum c_i v_i\right) = \sum c_i T(v_i) = \sum c_i w_i.$$
To show $\mathcal{C}$ is linearly independent, suppose $\sum c_i w_i = 0$. Then
$$0 = \sum c_i w_i = \sum c_i T(v_i) = T\left(\sum c_i v_i\right) = T(v), \quad \text{where } v = \sum c_i v_i.$$
Since $T$ is a monomorphism, we must have $v = 0$. Thus $0 = \sum c_i v_i$. As $\mathcal{B}$ is a basis of $V$, it is linearly independent, so $c_i = 0$ for all $i$.

Conversely, suppose $\mathcal{C}$ is a basis of $W$. By Lemma 1.2.23(2), we may define a linear transformation $S : W \to V$ by $S(w_i) = v_i$. Then $ST(v_i) = v_i$ for each $i$ so, by Lemma 1.2.23(1), $ST$ is the identity on $V$. Similarly $TS$ is the identity on $W$, so $S$ and $T$ are inverse isomorphisms.

1.3 Dimension counting and applications

We have mentioned in Remark 1.2.17 that linear algebra enables us to reduce many problems to counting. We gave examples of this in counting elements of sets of vectors in the last section. We begin this section by deriving a basic dimension-counting theorem for linear transformations, Theorem 1.3.1. The usefulness of this result cannot be overemphasized. We present one of its important applications in Corollary 1.3.2, and we give a typical example of its use in Example 1.3.10. It is used throughout linear algebra.

Here is the basic result about dimension counting.

Theorem 1.3.1. Let $V$ be a finite-dimensional vector space and let $T : V \to W$ be a linear transformation. Then
$$\dim \operatorname{Ker}(T) + \dim \operatorname{Im}(T) = \dim(V).$$

Proof. Let $k = \dim(\operatorname{Ker}(T))$ and $n = \dim(V)$. Let $\{v_1, \ldots, v_k\}$ be a basis of $\operatorname{Ker}(T)$. By Corollary 1.2.10, $\{v_1, \ldots, v_k\}$ extends to a basis $\{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ of $V$. We claim that $\mathcal{B} = \{T(v_{k+1}), \ldots, T(v_n)\}$ is a basis of $\operatorname{Im}(T)$.

First let us see that $\mathcal{B}$ spans $\operatorname{Im}(T)$. If $w \in \operatorname{Im}(T)$, then $w = T(v)$ for some $v \in V$. Let $v = \sum c_i v_i$. Then
$$T(v) = \sum_{i=1}^n c_i T(v_i) = \sum_{i=1}^k c_i T(v_i) + \sum_{i=k+1}^n c_i T(v_i) = \sum_{i=k+1}^n c_i T(v_i),$$
as $T(v_1) = \cdots = T(v_k) = 0$ since $v_1, \ldots, v_k \in \operatorname{Ker}(T)$.
Second, let us see that $\mathcal{B}$ is linearly independent. Suppose that
$$\sum_{i=k+1}^n c_i T(v_i) = 0.$$
Then
$$T\left( \sum_{i=k+1}^n c_i v_i \right) = 0,$$
so
$$\sum_{i=k+1}^n c_i v_i \in \operatorname{Ker}(T),$$
and hence for some $c_1, \ldots, c_k$ we have
$$\sum_{i=k+1}^n c_i v_i = -\sum_{i=1}^k c_i v_i.$$
Then
$$\sum_{i=1}^k c_i v_i + \sum_{i=k+1}^n c_i v_i = 0,$$
so by the linear independence of $\{v_1, \ldots, v_n\}$, $c_i = 0$ for each $i$.

Thus $\dim(\operatorname{Im}(T)) = n - k$ and indeed $k + (n - k) = n$.

Corollary 1.3.2. Let $T : V \to W$ be a linear transformation between vector spaces of the same finite dimension $n$. The following are equivalent:
(1) $T$ is an isomorphism.
(2) $T$ is an epimorphism.
(3) $T$ is a monomorphism.

Proof. Clearly (1) implies (2) and (3).
Suppose (2) is true. Then, by Theorem 1.3.1,
$$\dim \operatorname{Ker}(T) = \dim(V) - \dim \operatorname{Im}(T) = \dim(W) - \dim \operatorname{Im}(T) = n - n = 0,$$
so $\operatorname{Ker}(T) = \{0\}$ and $T$ is a monomorphism, yielding (3) and hence (1).
Suppose (3) is true. Then, by Theorem 1.3.1,
$$\dim \operatorname{Im}(T) = \dim(V) - \dim \operatorname{Ker}(T) = n - 0 = n = \dim(W),$$
so $\operatorname{Im}(T) = W$ and $T$ is an epimorphism, yielding (2) and hence (1).

Corollary 1.3.3. Let $A$ be an $n$-by-$n$ matrix. The following are equivalent:
(1) $A$ is invertible.
(2) There is an $n$-by-$n$ matrix $B$ with $AB = I$.
(3) There is an $n$-by-$n$ matrix $B$ with $BA = I$.
In this situation, $B = A^{-1}$.

Proof. Apply Corollary 1.3.2 to the linear transformation $T_A$. If $A$ is invertible and $AB = I$, then $B = IB = (A^{-1}A)B = A^{-1}(AB) = A^{-1}I = A^{-1}$, and similarly if $BA = I$.
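Theorem 1.3.1 is easy to watch in action for $T = T_A$; the following sketch assumes SymPy, with an arbitrary 3-by-4 example matrix whose kernel and image each have dimension 2.

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [2, 4, 6, 8],
            [0, 1, 1, 0]])       # T_A : F^4 -> F^3
# dim Ker(T_A) + dim Im(T_A) = dim V = number of columns:
assert len(A.nullspace()) + len(A.columnspace()) == A.cols   # 2 + 2 = 4
```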
Example 1.3.4. Corollary 1.3.2 is false in the infinite-dimensional case:
(1) Let $V = {}^r\!F^{\infty\infty}$ and consider left shift $L$ and right shift $R$. $L$ is an epimorphism but not a monomorphism, while $R$ is a monomorphism but not an epimorphism. We see that $L \circ R = I$ (so $R$ is a right inverse for $L$ and $L$ is a left inverse for $R$) but $R \circ L \neq I$ (and neither $L$ nor $R$ is invertible).
(2) Let $V = C^\infty(\mathbb{R})$. Then $D : V \to V$ and $I_a : V \to V$ are linear transformations that are not invertible, but $D \circ I_a$ is the identity. ◊

Remark 1.3.5. We are not in general considering cardinalities of infinite sets. But we remark that two vector spaces $V$ and $W$ are isomorphic if and only if they have bases of the same cardinality, as we see from Lemma 1.2.23 and Lemma 1.2.24. ◊

Corollary 1.3.6. Let $V$ be a vector space of dimension $m$ and let $W$ be a vector space of dimension $n$.
(1) If $m < n$ then no linear transformation $T : V \to W$ can be an epimorphism.
(2) If $m > n$ then no linear transformation $T : V \to W$ can be a monomorphism.
(3) $V$ and $W$ are isomorphic if and only if $m = n$. In particular, every $n$-dimensional vector space $V$ is isomorphic to $F^n$.

Proof. (1) In this case $\dim(\operatorname{Im}(T)) \leq m < n$, so $T$ is not an epimorphism.
(2) In this case $\dim(\operatorname{Ker}(T)) \geq m - n > 0$, so $T$ is not a monomorphism.
(3) Parts (1) and (2) show that if $m \neq n$, then $V$ and $W$ are not isomorphic. If $m = n$, choose a basis $\{v_1, \ldots, v_m\}$ of $V$ and a basis $\{w_1, \ldots, w_m\}$ of $W$. By Lemma 1.2.23, there is a unique linear transformation $T$ determined by $T(v_i) = w_i$ for each $i$, and by Lemma 1.2.24, $T$ is an isomorphism.

Corollary 1.3.7. Let $A$ be an $n$-by-$n$ matrix. The following are equivalent:
(1) $A$ is invertible.
(1′) The equation $Ax = b$ has a unique solution for every $b \in F^n$.
(2) The equation $Ax = b$ has a solution for every $b \in F^n$.
(3) The equation $Ax = 0$ has only the trivial solution $x = 0$.

Proof. This is simply a translation of Corollary 1.3.2 into matrix language.

We emphasize that this one-sentence proof is the "right" proof of the equivalence of these properties. For the reader who would like to see a more computational proof, we shall prove directly that (1) and (1′) are equivalent. Before doing so we also observe that their equivalence does not involve dimension counting. It is their equivalence with properties (2) and (3) that does. It is possible to prove this equivalence without using dimension counting, and this is often done in elementary texts, but that is most certainly the "wrong" proof, as it is a manipulative proof that obscures the ideas.

(1) ⇒ (1′): Suppose $A$ is invertible. Let $x_0 = A^{-1}b$. Then $Ax_0 = A(A^{-1}b) = b$, so $x_0$ is a solution of $Ax = b$. If $x_1$ is any other solution, then $Ax_1 = b$, so $A^{-1}(Ax_1) = A^{-1}b$ and $x_1 = A^{-1}b = x_0$; thus $x_0$ is the unique solution.

(1′) ⇒ (1): Let $b_i$ be a solution of $Ax = e_i$ for $i = 1, \ldots, n$, which exists by hypothesis. Let $B = [b_1 \mid b_2 \mid \cdots \mid b_n]$. Then $AB = [e_1 \mid e_2 \mid \cdots \mid e_n] = I$. We show that $BA = I$ as well. (That comes from Corollary 1.3.3, but we are trying to prove it without using Theorem 1.3.1.) Let $f_i = Ae_i$, $i = 1, \ldots, n$. Then $Ax = f_i$ evidently has the solution $x_0 = e_i$. It also has the solution $x_1 = BAe_i$, as
$$A(BAe_i) = (AB)(Ae_i) = I(Ae_i) = Ae_i = f_i.$$
By hypothesis, $Ax = f_i$ has a unique solution, so $BAe_i = e_i$ for each $i$, giving $BA = [e_1 \mid e_2 \mid \cdots \mid e_n] = I$.

As another application of Theorem 1.3.1, we prove the following familiar theorem from elementary linear algebra.

Theorem 1.3.8. Let $A$ be an $m$-by-$n$ matrix. Then the row rank of $A$ and the column rank of $A$ are equal.

Proof. For a matrix $C$, the image of the linear transformation $T_C$ is simply the column space of $C$.

Let $B$ be a matrix in (reduced) row echelon form. The nonzero rows of $B$ are a basis for the row space of $B$. Each of these rows has a "leading" entry of 1, and it is easy to check that the columns of $B$ containing those leading 1's are a basis for the column space of $B$. Thus if $B$ is in (reduced) row echelon form, its row rank and column rank are equal. Thus if $B$ has column rank $k$, then $\dim(\operatorname{Im}(T_B)) = k$ and hence, by Theorem 1.3.1, $\dim(\operatorname{Ker}(T_B)) = n - k$.

Our original matrix $A$ is row-equivalent to a (unique) matrix $B$ in (reduced) row echelon form, so $A$ and $B$ may be obtained from each other by a sequence of row operations. Row operations do not change the row space of a matrix, so if $B$ has row rank $k$, then $A$ has row rank $k$ as well. Row operations change the column space of a matrix, so we cannot use the column space directly. However, they do not change $\operatorname{Ker}(T_A)$. (That is why we usually do them, to solve $Ax = 0$.) Thus $\operatorname{Ker}(T_B) = \operatorname{Ker}(T_A)$ and so $\dim(\operatorname{Ker}(T_A)) = n - k$. Then by Theorem 1.3.1 again, $\dim(\operatorname{Im}(T_A)) = k$, i.e., $A$ has column rank $k$, the same as its row rank, and we are done.

Remark 1.3.9. This proof is a correct proof, but is the "wrong" proof, as it shows the equality without showing why it is true. We will see the "right" proof in Theorem 2.4.7 below. That proof is considerably more complicated, so we have presented this easy proof. ◊
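The quantities in the proof of Theorem 1.3.8 can be inspected directly; a sketch assuming SymPy, with an arbitrary rank-2 example matrix:

```python
from sympy import Matrix

A = Matrix([[1, 2, 1],
            [2, 4, 0],
            [3, 6, 0]])
R, pivots = A.rref()               # reduced row echelon form, pivot columns
row_rank = sum(1 for i in range(R.rows) if any(R.row(i)))
col_rank = len(A.columnspace())
assert row_rank == col_rank == len(pivots) == 2
```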
Example 1.3.10. Let $V = P_{n-1}(\mathbb{R})$ for fixed $n$. Let $a_1, \ldots, a_k$ be distinct real numbers and let $e_1, \ldots, e_k$ be non-negative integers with $(e_1 + 1) + \cdots + (e_k + 1) = n$. Define $T : V \to \mathbb{R}^n$ by
$$T(f(x)) = \begin{bmatrix} f(a_1) \\ \vdots \\ f^{(e_1)}(a_1) \\ \vdots \\ f(a_k) \\ \vdots \\ f^{(e_k)}(a_k) \end{bmatrix}.$$
If $f(x) \in \operatorname{Ker}(T)$, then $f^{(j)}(a_i) = 0$ for $j = 0, \ldots, e_i$, so $f(x)$ is divisible by $(x - a_i)^{e_i + 1}$ for each $i$. Thus $f(x)$ is divisible by $(x - a_1)^{e_1 + 1} \cdots (x - a_k)^{e_k + 1}$, a polynomial of degree $n$. Since $f(x)$ has degree at most $n - 1$, we conclude that $f(x)$ is the 0 polynomial. Thus $\operatorname{Ker}(T) = \{0\}$. Since $\dim V = n$, we conclude from Corollary 1.3.2 that $T$ is an isomorphism. Thus for any $n$ real numbers $b_1^0, \ldots, b_1^{e_1}, \ldots, b_k^0, \ldots, b_k^{e_k}$ there is a unique polynomial $f(x)$ of degree at most $n - 1$ with $f^{(j)}(a_i) = b_i^j$ for $j = 0, \ldots, e_i$ and for $i = 1, \ldots, k$. (This example generalizes Example 1.2.22(1), where $k = 1$, and Example 1.2.22(2), where $e_i = 0$ for each $i$.) ◊

Let us now see that the numerical relation in Theorem 1.3.1 is the only restriction on the kernel and image of a linear transformation.

Theorem 1.3.11. Let $V$ and $W$ be vector spaces with $\dim V = n$. Let $V_1$ be a $k$-dimensional subspace of $V$ and let $W_1$ be an $(n - k)$-dimensional subspace of $W$. Then there is a linear transformation $T : V \to W$ with $\operatorname{Ker}(T) = V_1$ and $\operatorname{Im}(T) = W_1$.

Proof. Let $\mathcal{B}_1 = \{v_1, \ldots, v_k\}$ be a basis of $V_1$ and extend $\mathcal{B}_1$ to $\mathcal{B} = \{v_1, \ldots, v_n\}$, a basis of $V$. Let $\mathcal{C}_1 = \{w_{k+1}, \ldots, w_n\}$ be a basis of $W_1$. Define $T : V \to W$ by $T(v_i) = 0$ for $i = 1, \ldots, k$ and $T(v_i) = w_i$ for $i = k + 1, \ldots, n$.

Remark 1.3.12. In this section we have stressed the importance and utility of counting arguments. Here is a further application: A philosopher, an engineer, a physicist, and a mathematician are sitting at a sidewalk cafe having coffee. On the opposite side of the street there is an empty building. They see two people go into the building. A while later they see three come out. The philosopher concludes, "There must have been someone in the building to start with." The engineer concludes, "We must have miscounted." The physicist concludes, "There must be a rear entrance." The mathematician concludes, "If another person goes in, the building will be empty." ◊

1.4 Subspaces and direct sum decompositions

We now generalize the notions of spanning sets, linearly independent sets, and bases. We introduce the notions of $V$ being a sum of subspaces $W_1, \ldots, W_k$, of the subspaces $W_1, \ldots, W_k$ being independent, and of $V$ being the direct sum of the subspaces $W_1, \ldots, W_k$. In the special case where each $W_i$ consists of the multiples of a single nonzero vector $v_i$, let $\mathcal{B} = \{v_1, \ldots, v_k\}$. Then $V$ is the sum of $W_1, \ldots, W_k$ if and only if $\mathcal{B}$ spans $V$; the subspaces $W_1, \ldots, W_k$ are independent if and only if $\mathcal{B}$ is linearly independent; and $V$ is the direct sum of $W_1, \ldots, W_k$ if and only if $\mathcal{B}$ is a basis of $V$. Thus our work here generalizes part of our work in Section 1.2, but this generalization will be essential for future developments. In most cases we omit the proofs, as they are very similar to the ones we have given.

Definition 1.4.1. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. Then $V$ is the sum $V = W_1 + \cdots + W_k$ if every $v \in V$ can be written as $v = w_1 + \cdots + w_k$ where $w_i \in W_i$. ◊

Definition 1.4.2. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. This set of subspaces is independent if $0 = w_1 + \cdots + w_k$ with $w_i \in W_i$ implies $w_i = 0$ for each $i$. ◊

Definition 1.4.3. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. Then $V$ is the direct sum $V = W_1 \oplus \cdots \oplus W_k$ if
(1) $V = W_1 + \cdots + W_k$, and
(2) $\{W_1, \ldots, W_k\}$ is independent. ◊
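Anticipating Lemma 1.4.6 below, direct sum decompositions of $F^n$ reduce to a rank condition on the union of bases of the subspaces; a sketch assuming $F = \mathbb{R}$ and NumPy, with arbitrarily chosen subspaces:

```python
import numpy as np

W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])      # a basis of W1, as columns
W2 = np.array([[0.0],
               [0.0],
               [1.0]])           # a basis of W2, as columns
B = np.hstack([W1, W2])          # the union of the two bases
# R^3 = W1 (+) W2: the combined basis vectors are independent and span R^3.
assert np.linalg.matrix_rank(B) == B.shape[1] == 3
```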
Lemma 1.4.5. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. Then $V$ is the direct sum $V = W_1 \oplus \cdots \oplus W_k$ if and only if every $v \in V$ can be written as $v = w_1 + \cdots + w_k$, with $w_i \in W_i$ for each $i$, in a unique way.

Lemma 1.4.6. Let $V$ be a vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces of $V$. Let $\mathcal{B}_i$ be a basis of $W_i$, for each $i$, and let $\mathcal{B} = \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_k$. Then
(1) $\mathcal{B}$ spans $V$ if and only if $V = W_1 + \cdots + W_k$.
(2) $\mathcal{B}$ is linearly independent if and only if $\{W_1, \ldots, W_k\}$ is independent.
(3) $\mathcal{B}$ is a basis for $V$ if and only if $V = W_1 \oplus \cdots \oplus W_k$.

Corollary 1.4.7. Let $V$ be a finite-dimensional vector space and let $\{W_1, \ldots, W_k\}$ be a set of subspaces with $V = W_1 \oplus \cdots \oplus W_k$. Then $\dim(V) = \dim(W_1) + \cdots + \dim(W_k)$.

Corollary 1.4.8. Let $V$ be a vector space of dimension $n$ and let $\{W_1, \ldots, W_k\}$ be a set of subspaces. Let $n_i = \dim(W_i)$.
(1) If $n_1 + \cdots + n_k > n$, then $\{W_1, \ldots, W_k\}$ is not independent.
(2) If $n_1 + \cdots + n_k < n$, then $V \neq W_1 + \cdots + W_k$.
(3) If $n_1 + \cdots + n_k = n$, the following are equivalent:
(a) $V = W_1 \oplus \cdots \oplus W_k$.
(b) $V = W_1 + \cdots + W_k$.
(c) $\{W_1, \ldots, W_k\}$ is independent.

Definition 1.4.9. Let $V$ be a vector space and let $W_1$ be a subspace of $V$. Then $W_2$ is a complement of $W_1$ if $V = W_1 \oplus W_2$. ♦

Lemma 1.4.10. Let $V$ be a vector space and let $W_1$ be a subspace of $V$. Then $W_1$ has a complement $W_2$.

Proof. Let $\mathcal{B}_1$ be a basis of $W_1$. Then $\mathcal{B}_1$ is linearly independent, so by Corollary 1.2.10 there is a basis $\mathcal{B}$ of $V$ containing $\mathcal{B}_1$. Let $\mathcal{B}_2 = \mathcal{B} - \mathcal{B}_1$. Then $\mathcal{B}_2$ is a subset of the basis $\mathcal{B}$, so it is linearly independent. Let $W_2$ be the span of $\mathcal{B}_2$. Then $\mathcal{B}_2$ is a linearly independent spanning set for $W_2$, i.e., a basis for $W_2$, and so by Lemma 1.4.6, $V = W_1 \oplus W_2$; hence $W_2$ is a complement of $W_1$.

Remark 1.4.11. Except when $W_1 = \{0\}$ (where $W_2 = V$) or $W_1 = V$ (where $W_2 = \{0\}$), the subspace $W_2$ is never unique. We can always choose a different way of extending $\mathcal{B}_1$ to a basis of $V$ in order to obtain a different $W_2$. Thus $W_2$ is a, not the, complement of $W_1$. ♦

1.5 Affine subspaces and quotient spaces

For the reader familiar with these notions, we can summarize much of what we are about to do in this section in a paragraph: Let $W$ be a subspace of $V$. Then $W$ is a subgroup of $V$, regarded as an additive group. An affine subspace of $V$ parallel to $W$ is simply a coset of $W$ in $V$, and the quotient space $V/W$ is simply the group quotient $V/W$, which also has a vector space structure. But we will not presume this familiarity, and instead proceed "from scratch".

We begin with a generalization of the notion of a subspace of a vector space.

Definition 1.5.1. Let $V$ be a vector space. A subset $X$ of $V$ is an affine subspace if for some element $x_0$ of $X$,
$$U = \{x' - x_0 \mid x' \in X\}$$
is a subspace of $V$. In this situation $X$ is parallel to $U$. ♦

The definition makes the element $x_0$ of $X$ look distinguished, but that is not the case.

Lemma 1.5.2. Let $X$ be an affine subspace of $V$ parallel to the subspace $U$.
Then for any element x of X, ˚ U D x0 x j x0 2 X : Remark 1.5.3. An affine subspace X of V is a subspace of V if and only if 0 2 X. Þ An alternative way of looking at affine subspaces is given by the following result. Proposition 1.5.4. A subset X of V is an affine subspace of V parallel to the subspace U of V if and only if for some, and hence for every, element x of X, X D x C U D fx C u j u 2 U g: There is a natural definition of the dimension of an affine subspace. Definition 1.5.5. Let X be affine subspace of V parallel to the subspace U . Then the dimension of X is dim.X/ D dim.U /. Þ Proposition 1.5.6. Let X be an affine subspace of V parallel to the subspace U of V . Let x0 be an element of X and let fu1 ; u2 ; : : :g be a basis of U . Then any element x of X may be written uniquely as X x D x0 C ci ui for some scalars fc1 ; c2; : : :g. i i i i i i “book” — 2011/3/4 — 17:06 — page 26 — #40 i i 26 1. Vector spaces and linear transformations The most important way in which affine subspaces arise is as follows. Theorem 1.5.7. Let T W V ! W be a linear transformation and let w0 2 W be an arbitrary element of W . If T 1 .w0 / is nonempty, then T 1 .w0 / is an affine subspace of V parallel to Ker.T /. Proof. Choose v0 2 V with T .v0 / D w0 . If v 2 T 1 .w0 / is arbitrary, then v D v0 C .v v0 / D v0 C u and T .u/ D T .v v0 / D T .v/ T .v0 / D w0 w0 D 0, so u 2 Ker.T /. Conversely, if u 2 Ker.T / and v D v0 C u, then T .v/ D T .v0 C u/ D T .v0 / C T .u/ D w0 C 0 D w0 . Thus we see that T 1 .w0 / D v0 C Ker.T / and the theorem then follows from Proposition 1.5.4. Remark 1.5.8. The condition in Definition 1.5.1 is stronger than the condition that U D fx2 x1 j x1 ; x2 2 U g. (We must fix x1 and let x2 vary, or vice versa, but we cannot let both vary.) For example, if V is any vector space and X D V f0g, then V D fx2 x1 j x1 ; x2 2 Xg, but X is never an affine subspace of V , except in the case that V is a 1-dimensional vector space over the field with 2 elements. Þ Let V be a vector space and W a subspace. We now define the important notion of the quotient vector space V =W , and investigate some of its properties. Definition 1.5.9. Let V be a vector space and let W be a subspace of V . Let be the equivalence relation on V given by v1 v2 if v1 v2 2 W . Denote the equivalence class of v 2 V under this relation by Œv. Then the quotient V =W is the vector space ˚ V =W D equivalence classes Œv j v 2 V with addition given by Œv1 C Œv2 D Œv1 C v2 and scalar multiplication given by cŒv D Œcv. Þ Remark 1.5.10. We leave it to the reader to check that these operations give V =W the structure of a vector space. Þ Here is an alternative definition of V =W . Lemma 1.5.11. The quotient space V =W of Definition 1.5.9 is given by ˚ V =W D affine subspaces of V parallel to W : i i i i i i “book” — 2011/3/4 — 17:06 — page 27 — #41 i i 1.5. Affine subspaces and quotient spaces 27 Proof. As in Proposition 1.5.4, we can check that for v0 2 V , the equivalence class Œv0 of v0 is given by ˚ ˚ v0 D v 2 V j v v0 D v 2 V j v v0 2 W D v0 C W; which is an affine subspace parallel to W , and every affine subspace arises in this way from a unique equivalence class. There is a natural linear transformation from V to V =W . Definition 1.5.12. Let W be a subspace of V . The canonical projection W V ! V =W is the linear transformation given by .v/ D Œv D v C W . Þ We have the following important construction and results. They improve on the purely numerical information provided by Theorem 1.3.1. Theorem 1.5.13. Let T W V ! 
X be a linear transformation. Then T W V = Ker.T / ! X given by T .v C Ker.T // D T .v/ (i.e., by T ..v// D T .v/) is a well-defined linear transformation, and T gives an isomorphism from V = Ker.T / to Im.T / X. Proof. If v1 C Ker.T / D v2 C Ker.T /, then v1 D v2 C w for some w 2 Ker.T /, so T .v1 / D T .v2 C w/ D T .v2 / C T .w/ D T .v2 / C 0 D T .v2 /, and T is well-defined. It is then easy to check that it is a linear transformation, that it is 1-1, and that its image is Im.T /, completing the proof. Let us now see how to find a basis for a quotient vector space. Theorem 1.5.14. Let V be a vector space and W1 a subspace. Let B1 D fw1; w2 ; : : :g be a basis for W1 and extend B1 to a basis B of V . Let B2 D B B1 D fz1 ; z2 ; : : :g. Let W2 be the subspace of V spanned by B2 , so that W2 is a complement W1 in V with basis B2 . Then the linear transformation P W W2 ! V =W1 defined by P .zi / D Œzi is an isomorphism. In particular, B 2 D fŒz1; Œz2 ; : : :g is a basis for V =W1 . Proof. It is easy to check that P is a linear transformation. We show that fŒz1 ; Œz2; : : :g is a basis for V =W1 . Then, since P is a linear transformation taking a basis of one vector space to a basis of another, P is an isomorphism. First let us see that B 2 spans V =W1 . Consider an equivalence class Œv P P in V =W1 . Since B is a basis of V , we may write v D ci wi C dj zj i i i i i i “book” — 2011/3/4 — 17:06 — page 28 — #42 i i 28 1. Vector spaces and linear transformations P P P for some fci g and fdj g. Then v dj zj D ci wi 2 W1 , so v dj zj P P and hence Œv D Œ dj zj D dj Œzj . P Next let us see that B 2 is linearly independent. Suppose dj Œzj D P P P P Œ dj zj D 0. Then d z 2 W1 , so dj zj D ci wi for some fci g. P Pj j But then . ci /wi C dj zj D 0, an equation in V . But fw1 ; w2; : : : ; z1 ; z2; : : :g D B is a basis of V , and hence linearly independent, so (c1 D c2 D D 0 and) d1 D d2 D D 0. Remark 1.5.15. We cannot emphasize strongly enough the difference between a complement W2 of the subspace W1 and the quotient V =W1 . The quotient V =W1 is canonically associated to W1 , whereas a complement is not. As we observed, W1 almost never has a unique complement. Theorem 1.5.14 shows that any of these complements is isomorphic to the quotient V =W1 . We are in a situation here where every quotient object V =W1 is isomorphic to a subobject W2 . This is not always the case in algebra, though it is here, and this fact simplifies arguments, as long as we remember that what we have is an isomorphism between W2 and V =W1 , not an identification of W2 with V =W1 . Indeed, it would be a bad mistake to identify V =W1 with a complement W2 of W1 . Þ Often when considering a subspace W of a vector space V , what is important is not its dimension, but rather its codimension, which is defined as follows. Definition 1.5.16. Let W be a subspace of V . Then the codimension of W in V is codimV W D dim V =W: Þ Lemma 1.5.17. Let W1 be a subspace of V . Let W2 be any complement of W1 in V . Then codimV W1 D dim W2 . Proof. By Theorem 1.5.14, V =W1 and W2 are isomorphic. Corollary 1.5.18. Let V be a vector space of dimension n and let W be a subspace of V of dimension k. Then dim V =W D codimV W D n k. Proof. Immediate from Theorem 1.5.14 and Lemma 1.5.17. Here is one important way in which quotient spaces arise. Definition 1.5.19. Let T W V ! W be a linear transformation. Then the cokernel of T is the quotient space Coker.T / D W= Im.T /: Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 29 — #43 i i 1.5. 
Affine subspaces and quotient spaces 29 Corollary 1.5.20. Let V be an n-dimensional vector space and let T W V ! V be a linear transformation. Then dim.Ker.T // D dim.Coker.T //. Proof. By Theorem 1.3.1, Corollary 1.5.18, and Definition 1.5.19, dim Ker.T / D dim.V / dim Im.T / D dim V = Im.T / D dim Coker.T / : We have shown that any linearly independent set in a vector space V extends to a basis of V . We outline another proof of this, using quotient spaces. This proof is not any easier, but its basic idea is one we will be using later. Theorem 1.5.21. Let B1 be any linearly independent subset of a vector space V . Then B1 extends to a basis B of V . Proof. Let W be the subspace of V generated by B1 , and let W V ! V =W be the canonical projection. Let C D fx1 ; x2; : : :g be a basis of V =W and for each i let ui 2 V with .ui / D xi . Let B2 D fu1 ; u2 ; : : :g. We leave it to the reader to check that B D B1 [ B2 is a basis of V . In a way, this result is complementary to Theorem 1.5.14, where we showed how to obtain a basis of V =W , starting from the right sort of basis of V . Here we showed how to obtain a basis of V , starting from a basis of W and a basis of V =W . Definition 1.5.22. Let T W V ! V be a linear transformation. T is Fredholm if Ker.T / and Coker.T / are both finite-dimensional, in which case the index of T is dim.Ker.T // dim.Coker.T //. Þ Example 1.5.23. (1) In case V is finite-dimensional, every T is Fredholm. Then by Corollary 1.5.20, dim.Ker.T // D dim.Coker.T //, so T has index 0. Thus in the finite-dimensional case, the index is completely uninteresting. (2) In the infinite-dimensional case, the index is an important invariant, and may take on any integer value. For example, if V D r F 11 , L W V ! V is left shift and R W V ! V is right shift, as in Example 1.1.23(1), then Ln has index n and Rn has index n. (3) If V D C 1 .R/, then D W V ! V has kernel ff .x/ j f .x/ is a constant functiong, of dimension 1, and is surjective, so D has index 1. Also, Ia W V ! V is injective and has image ff .x/ j f .a/ D 0g, of codimension 1, so Ia has index 1. Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 30 — #44 i i 30 1.6 1. Vector spaces and linear transformations Dual spaces We now consider the dual space of a vector space. The dual space is easy to define, but we will have to be careful, as there is plenty of opportunity for confusion. Definition 1.6.1. Let V be a vector space over a field F . The dual V of V is V D HomF .V; F / D flinear transformations T W V ! F g: Þ Lemma 1.6.2. (1) If V is a vector space over F , then V is isomorphic to a subspace of V . (2) If V is finite-dimensional, then V is isomorphic to V . In particular, in this case dim V D dim V . Proof. Choose a basis B of V , B D fv1 ; v2; : : :g. Let B be the subset of V given by B D fw1 ; w2; : : :g where vi is defined by wi .vi / D 1 and wi .vj / D 0 if j ¤ i . (This defines wi by Lemma 1.2.23.) We claim that P B is a linearly independent set. To see this, suppose cj wj D 0. Then P . cj wj /.v/ D 0 for every v 2 V . Choosing v D vi , we see that ci D 0, for each i . The linear transformation SB W V ! V defined by SB .vi / D wi takes the basis B of V to the independent set B of V , so is an injection (more precisely, an isomorphism from V to the subspace of V spanned by B ). Suppose V is finite-dimensional and let w be an element of V . Let P w .vi / D ai for each i . Let v D ai vi , a finite sum since V is finitedimensional. For each i , SB .v/.vi / D w .vi /. 
Since these two linear transformations agree on the basis B of V , by Lemma 1.2.23 they are equal, i.e., SB .v/ D w , and SB is a surjection. Remark 1.6.3. It is important to note that there is no natural map from V to V . The linear transformation SB depends on the choice of basis B. In particular, if V is finite-dimensional then, although V and V are isomorphic as abstract vector spaces, there is no natural isomorphism between them, and it would be a mistake to identify them. Þ Remark 1.6.4. If V D F n with E the standard basis fe1 ; : : : ; en g, then the proof of Lemma 1.6.2 gives the standard basis E of V , E D fe1 ; : : : ; i i i i i i “book” — 2011/3/4 — 17:06 — page 31 — #45 i i 1.6. Dual spaces en g, defined by 31 02 31 a1 B6 : 7C ei @4 :: 5A D ai : Þ an Remark 1.6.5. The basis B (and hence the map SB ) depends on the entire basis B. For example, let V D F 2 and choose the standard basis E of V , ˚ 1 0 ED ; D e1 ; e2 : 0 1 Then E is the basis fe1 ; e2g of V , with x Dx and e1 y e2 x D y: y If we choose the basis B of V given by ˚ 1 1 BD ; D v1 ; v2 ; 0 1 then B D fw1; w2 g with x w1 DxCy y and Thus, even though v1 D e1 , w1 ¤ e1 . w2 x D y y: Þ Example 1.6.6. If V is infinite-dimensional, then in general the linear transformation SB is an injection but not a surjection. Let V D F 1 with basis E D fe1; e2 ; : : :g and consider the set E D fe1 ; e2 ; : : :g. Any element w of the subspace V spanned by E has the property that w .ei / ¤ 0 for only finitely many values of i . This is not the case for a general element of V . In fact, V is isomorphic to F 11 as follows: If 2 3 2 3 a1 b1 6b2 7 6a2 7 1 vD4 52F and x D 4 5 2 F 11 :: :: : : P then we have the pairing x .v/ D ai bi . (This makes sense for any x , as only finitely many entries of v are nonzero.) Any element w of V arises i i i i i i “book” — 2011/3/4 — 17:06 — page 32 — #46 i i 32 1. Vector spaces and linear transformations in this way as we may choose 2 3 w e1 6w e2 7 x D4 5: :: : Thus in this case the image of SB is F 1 F 11 . Þ Remark 1.6.7. The preceding example leaves open the possibility that V might be isomorphic to V by some other isomorphism than TB . That is also not the case in general. We have seen in Remark 1.2.19 that F 1 is a vector space of countably infinite dimension and F 11 is a vector space of uncountably infinite dimension. Þ Remark 1.6.8. Just as a typical element of V is denoted by v, a typical element of V is often denoted by v . This notation carries the danger of giving the impression that there is a natural map from V to V given by v 7! v (i.e., that the element v of V is the dual of the element v of V ), and we emphasize again that that is not the case. There is no such natural map and that is does not make sense to speak of the dual of an element of V . Thus we do not use this notation and instead use w to denote an element of V . Þ Example 1.6.9 (Compare Example 1.2.22). Let V D Pn 1 .R/ for any n. (1) For any a 2 R, V has basis B D fp0 .x/; p1 .x/; : : : ; pn 1 .x/g where p0 .x/ D 1 and pk .x/ D .x a/k = kŠ for k D 1; : : : ; n 1. The dual basis B is given by B D fEa ; Ea ı D; : : : ; Ea ı Dn 1 g. (2) For any distinct a1 ; : : : ; an 2 R, V has basis C D fq1 .x/; : : : ; qn .x/g with qk .x/ D …j ¤k .x aj /=.ak aj /. The dual basis C is given by C D fEa1 ; : : : ; Ean g. (3) Fix an interval Œa; b and let T W V ! R be the linear transformation Z b T f .x/ D f .x/ dx: a Then T 2 V . Since C (as above) is a basis of V , we have T D Pn i D1 ci Eai for some constants c1 ; : : : ; cn . 
In other words, we have the exact quadrature formula, valid for every $f(x) \in V$:
$$\int_a^b f(x)\,dx = \sum_{i=1}^n c_i f(a_i).$$
For simplicity, let $[a, b] = [0, 1]$, and let us, for example, choose equally spaced points. (Recall that $n$ is the number of points, so the formula is exact on $P_{n-1}(\mathbb{R})$.) For $n = 1$, choose $a_1 = 1/2$. Then $c_1 = 1$, i.e.,
$$\int_0^1 f(x)\,dx = f(1/2) \quad \text{for } f \in P_0(\mathbb{R}).$$
For $n = 2$, choose $a_1 = 0$ and $a_2 = 1$. Then $c_1 = c_2 = 1/2$, i.e.,
$$\int_0^1 f(x)\,dx = (1/2)f(0) + (1/2)f(1) \quad \text{for } f \in P_1(\mathbb{R}).$$
For $n = 3$, choose $a_1 = 0$, $a_2 = 1/2$, $a_3 = 1$. Then $c_1 = 1/6$, $c_2 = 4/6$, $c_3 = 1/6$, i.e.,
$$\int_0^1 f(x)\,dx = (1/6)f(0) + (4/6)f(1/2) + (1/6)f(1) \quad \text{for } f \in P_2(\mathbb{R}).$$
The next two expansions of this type are
$$\int_0^1 f(x)\,dx = (1/8)f(0) + (3/8)f(1/3) + (3/8)f(2/3) + (1/8)f(1) \quad \text{for } f \in P_3(\mathbb{R}),$$
$$\int_0^1 f(x)\,dx = (7/90)f(0) + (32/90)f(1/4) + (12/90)f(1/2) + (32/90)f(3/4) + (7/90)f(1) \quad \text{for } f \in P_4(\mathbb{R}).$$
These formulas are the basis for commonly used approximate quadrature formulas: the first three yield the midpoint rule, the trapezoidal rule, and Simpson's rule, respectively.

(4) Fix an interval $[a, b]$ and for any polynomial $g(x)$ let
$$T_{g(x)}(f(x)) = \int_a^b f(x)g(x)\,dx.$$
Then $T_{g(x)} \in V^*$. Let $\mathcal{D} = \{T_1, T_x, \ldots, T_{x^{n-1}}\}$. We claim that $\mathcal{D}$ is linearly independent. To see this, suppose that
$$T = a_0 T_1 + a_1 T_x + \cdots + a_{n-1} T_{x^{n-1}} = 0.$$
Then $T = T_{g(x)}$ with $g(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} \in V$. To say that $T = 0$ is to say that $T(f(x)) = 0$ for every $f(x) \in V$. But if we choose $f(x) = g(x)$, we find
$$T(f(x)) = T_{g(x)}(g(x)) = \int_a^b g(x)^2\,dx = 0,$$
which forces $g(x) = 0$, i.e., $a_0 = a_1 = \cdots = a_{n-1} = 0$, so $\mathcal{D}$ is linearly independent. Since $\mathcal{D}$ is a linearly independent set of $n$ elements in $V^*$, a vector space of dimension $n$, it must be a basis of $V^*$, so every element of $V^*$ is $T_{g(x)}$ for a unique $g(x) \in V$. In particular this is true for $E_c$ for every $c \in [a, b]$. It is simply a matter of solving a linear system to find $g(x)$. For example, let $[a, b] = [0, 1]$ and let $c = 0$. We find
$$f(0) = \int_0^1 f(x)g(x)\,dx$$
for
$g(x) = 1$ if $f(x) \in P_0(\mathbb{R})$,
$g(x) = 4 - 6x$ if $f(x) \in P_1(\mathbb{R})$,
$g(x) = 9 - 36x + 30x^2$ if $f(x) \in P_2(\mathbb{R})$,
$g(x) = 16 - 120x + 240x^2 - 140x^3$ if $f(x) \in P_3(\mathbb{R})$,
$g(x) = 25 - 300x + 1050x^2 - 1400x^3 + 630x^4$ if $f(x) \in P_4(\mathbb{R})$.
Admittedly, we rarely if ever want to evaluate a function at a point by computing an integral instead, but this shows how it could be done. We have presented (3) and (4) here so that the reader may see some interesting examples early, but they are best understood in the context of inner product spaces, which we consider in Chapter 7. ♦
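Each set of weights above is found by solving the linear system $\sum_i c_i a_i^m = \int_0^1 x^m\,dx = 1/(m+1)$ for $m = 0, \ldots, n-1$. A minimal computational sketch (our own illustration, not the text's; it assumes NumPy, and the function name is ours):

    import numpy as np

    def quadrature_weights(nodes):
        # Exactness on 1, x, ..., x^(n-1): sum_i c_i * a_i**m = 1/(m+1)
        n = len(nodes)
        A = np.vander(nodes, n, increasing=True).T   # A[m, i] = a_i**m
        b = np.array([1.0 / (m + 1) for m in range(n)])
        return np.linalg.solve(A, b)

    print(quadrature_weights([0.5]))                      # [1.]          midpoint
    print(quadrature_weights([0.0, 1.0]))                 # [0.5 0.5]     trapezoid
    print(quadrature_weights([0.0, 0.5, 1.0]))            # [1/6 4/6 1/6] Simpson
    print(90 * quadrature_weights([0, 0.25, 0.5, 0.75, 1]))  # [7 32 12 32 7]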
To every subspace of $V$ we can naturally associate a subspace of $V^*$ (and vice-versa), as follows.

Definition 1.6.10. Let $U$ be a subspace of $V$. Then the annihilator $\operatorname{Ann}^*(U)$ is the subspace of $V^*$ defined by
$$\operatorname{Ann}^*(U) = \{w^* \in V^* \mid w^*(u) = 0 \text{ for every } u \in U\}. \quad ♦$$

Lemma 1.6.11. Let $U$ be a finite-dimensional subspace of $V$. Then $V^*/\operatorname{Ann}^*(U)$ is isomorphic to $U^*$. Consequently,
$$\operatorname{codim} \operatorname{Ann}^*(U) = \dim(U).$$

Proof. Set $X = \operatorname{Ann}^*(U)$ and let $\{x_1^*, x_2^*, \ldots\}$ be a basis of $X$. Let $\{u_1, \ldots, u_k\}$ be a basis for $U$. Let $U'$ be a complement of $U$, so $V = U \oplus U'$, and let $\{u_1', u_2', \ldots\}$ be a basis of $U'$. Then $\{u_1, \ldots, u_k, u_1', u_2', \ldots\}$ is a basis of $V$. For $j = 1, \ldots, k$ define $y_j^* \in V^*$ by
$$y_j^*(u_i) = 0 \text{ if } i \neq j, \quad y_j^*(u_j) = 1, \quad y_j^*(u_m') = 0 \text{ for every } m.$$
We claim that $\{y_1^*, \ldots, y_k^*, x_1^*, x_2^*, \ldots\}$ is a basis of $V^*$. First we show it is linearly independent: suppose $\sum c_j y_j^* + \sum d_m x_m^* = 0$. Evaluating this function at $u_i$, we see it has the value $c_i$, so $c_i = 0$ for $i = 1, \ldots, k$. Then $d_m = 0$ for each $m$, as $\{x_1^*, x_2^*, \ldots\}$ is linearly independent. Next we show it spans $V^*$: let $w^* \in V^*$. For $j = 1, \ldots, k$, let $c_j = w^*(u_j)$. Let $y^* = w^* - \sum c_j y_j^*$. Then $y^*(u_i) = 0$ for each $i$, so $y^* \in \operatorname{Ann}^*(U)$ and hence $y^* = \sum d_m x_m^*$ for some scalars $d_m$. Then $w^* = \sum c_j y_j^* + \sum d_m x_m^*$.

Let $Y$ be the subspace of $V^*$ spanned by $\{y_1^*, \ldots, y_k^*\}$. Then $V^* = X \oplus Y$, so $V^*/X$ is isomorphic to $Y$. But we have an isomorphism $S \colon U^* \to Y$ given by $S(u_i^*) = y_i^*$. (If we let $u_i^*$ be the restriction of $y_i^*$ to $U$, then $\{u_1^*, \ldots, u_k^*\}$ is the dual basis to $\{u_1, \ldots, u_k\}$.)

Remark 1.6.12. We often think of Lemma 1.6.11 as follows: suppose we have $k$ linearly independent elements $u_1, \ldots, u_k$ of $V$, so that they generate a subspace $U$ of $V$ of dimension $k$. Then the requirement that a linear transformation from $V$ to $F$ be zero at each of $u_1, \ldots, u_k$ imposes $k$ linearly independent conditions on the space of all such linear transformations, so the subspace of linear transformations satisfying precisely these conditions, which is $\operatorname{Ann}^*(U)$, has codimension $k$. ♦

To go the other way, we have the following association.

Definition 1.6.13. Let $U^*$ be a subspace of $V^*$. Then the annihilator $\operatorname{Ann}(U^*)$ is the subspace of $V$ defined by
$$\operatorname{Ann}(U^*) = \{v \in V \mid w^*(v) = 0 \text{ for every } w^* \in U^*\}. \quad ♦$$

Remark 1.6.14. Observe that $\operatorname{Ann}^*(\{0\}) = V^*$ and $\operatorname{Ann}^*(V) = \{0\}$; similarly, $\operatorname{Ann}(\{0\}) = V$ and $\operatorname{Ann}(V^*) = \{0\}$. ♦

If $V$ is finite-dimensional, our pairings are inverses of each other, as we now see.

Theorem 1.6.15.
(1) For any subspace $U$ of $V$, $\operatorname{Ann}(\operatorname{Ann}^*(U)) = U$.
(2) Let $V$ be finite-dimensional. For any subspace $U^*$ of $V^*$, $\operatorname{Ann}^*(\operatorname{Ann}(U^*)) = U^*$.

So far in this section we have considered vectors, i.e., objects. We now consider linear transformations, i.e., functions. We first saw pullbacks in Example 1.1.23(3), and now we see them again.

Definition 1.6.16. Let $T \colon V \to X$ be a linear transformation. Then the dual $T^*$ of $T$ is the linear transformation $T^* \colon X^* \to V^*$ given by $T^*(y^*) = y^* \circ T$; i.e., $T^*(y^*) \in V^*$ is the linear transformation on $V$ defined by
$$(T^*(y^*))(v) = (y^* \circ T)(v) = y^*(T(v)), \quad \text{for } y^* \in X^*. \quad ♦$$

Remark 1.6.17. (1) It is easy to check that $T^*(y^*)$ is a linear transformation for any $y^* \in X^*$. But we are claiming more: that $y^* \mapsto T^*(y^*)$ is a linear transformation from $X^*$ to $V^*$. This follows from checking that $T^*(y_1^* + y_2^*) = T^*(y_1^*) + T^*(y_2^*)$ and $T^*(cy^*) = cT^*(y^*)$.
(2) The dual $T^*$ of $T$ is well-defined and does not depend on a choice of basis, as it was defined directly in terms of $T$. ♦

Now we derive some relations between various subspaces.

Lemma 1.6.18. Let $T \colon V \to X$ be a linear transformation. Then $\operatorname{Im}(T^*) = \operatorname{Ann}^*(\operatorname{Ker}(T))$.

Proof. Let $w^* \in V^*$ be in $\operatorname{Im}(T^*)$, so $w^* = T^*(y^*)$ for some $y^* \in X^*$. Then for any $u \in \operatorname{Ker}(T)$, $w^*(u) = (T^*(y^*))(u) = y^*(T(u)) = y^*(0) = 0$, so $w^*$ is in $\operatorname{Ann}^*(\operatorname{Ker}(T))$. Thus we see that $\operatorname{Im}(T^*) \subseteq \operatorname{Ann}^*(\operatorname{Ker}(T))$.

Let $w^* \in V^*$ be in $\operatorname{Ann}^*(\operatorname{Ker}(T))$, so $w^*(u) = 0$ for every $u \in \operatorname{Ker}(T)$. Let $V'$ be a complement of $\operatorname{Ker}(T)$, so $V = \operatorname{Ker}(T) \oplus V'$. Then we may write any $v \in V$ uniquely as $v = u + v'$ with $u \in \operatorname{Ker}(T)$, $v' \in V'$; then $w^*(v) = w^*(u + v') = w^*(u) + w^*(v') = w^*(v')$. Also, $T(v) = T(v')$, so $T(V) = T(V')$. Let $X'$ be any complement of $T(V')$ in $X$, so that $X = T(V') \oplus X'$.
Since the restriction of T to V 0 is an isomorphism, we may write x 2 X uniquely as x D T .v 0 / C x 0 with v 0 2 V 0 and x 0 2 X 0 . Define y 2 X by y .x/ D w .v 0 / where x D T .v 0 / C x 0 ; v 0 2 V 0 and x 0 2 X 0 : (It is routine to check that y is a linear transformation.) Then for v 2 V , writing v D u C v 0 , with u 2 Ker.T / and v 0 2 V 0 , we have T y .v/ D y T .v/ D y T .v 0 / D w .v 0 / D w .v/: Thus T .y / D w and we see that Ann .Ker.T // Im.T /. i i i i i i “book” — 2011/3/4 — 17:06 — page 37 — #51 i i 1.6. Dual spaces 37 The following corollary gives a useful dimension count. Corollary 1.6.19. Let T W V ! X be a linear transformation. (1) If Ker.T / is finite-dimensional, then D dim Coker T D dim Coker.T / D codim Im.T / : codim Im T (2) If Coker.T / is finite-dimensional, then dim Ker T D dim Ker.T / : Proof. (1) Let U D Ker.T /. By Lemma 1.6.11, dim Ker T By Lemma 1.6.18, D codim Ann Ker.T / : Ann Ker.T / D Im T : (2) is proved using similar ideas and we omit the proof. Here is another useful dimension count. Corollary 1.6.20. Let T W V ! X be a linear transformation. (1) If dim.V / is finite, then dim Im T D dim Im T D dim Ker.T / : (2) If dim.V / D dim.X/ is finite, then dim Ker T : Proof. (1) By Theorem 1.3.1 and Corollary 1.6.19, dim Im.T / D dim Ker.T / D codim Im T D dim V dim Im.T / ; dim.V / and by Lemma 1.6.2, dim.V / D dim.V /. (2) By Theorem 1.3.1 and Lemma 1.6.2, dim.Ker T / D dim X D dim.V / dim Im.T / dim Im.T / D dim Ker.T / : i i i i i i “book” — 2011/3/4 — 17:06 — page 38 — #52 i i 38 1. Vector spaces and linear transformations Remark 1.6.21. Again we caution the reader that although we have equality of dimensions, there is no natural identification of the subspaces in each part of Corollary 1.6.20. Þ Lemma 1.6.22. Let T W V ! X be a linear transformation. (1) T is injective if and only if T is surjective. (2) T is surjective if and only if T is injective. (3) T is an isomorphism if and only if T is an isomorphism. Proof. (1) Suppose that T is injective. Let w 2 V be arbitrary. To show that T is surjective we must show that there is a y 2 X with T .y / D w , i.e., y ı T D w . Let B D fv1 ; v2 ; : : :g be a basis of V and set xi D T .vi /. T is injective so fx1; x2 ; : : :g is a linearly independent set in X. Extend this set to a basis C D fx1 ; x2; : : : ; x10 ; x20 ; : : :g of X and define a linear transformation U W X ! V by U.xi / D vi , U.xj0 / D 0. Note UT .vi / D vi for each i so UT is the identity map on V . Set y D w ı U. Then T .y / D y ı T D .w ı U/ ı T D w ı .U ı T / D w . Suppose that T is not injective and choose v ¤ 0 with T .v/ D 0. Then for any y 2 X , T .y /.v/ D .y ı T /.v/ D y .T .v// D y .0/ D 0. But not every element w of V has w .v/ D 0. To see this, let v1 D v and extend v1 to a basis B D fv1 ; v2 ; : : :g of V . Then there is an element w of V defined by w .v1 / D 0, w .vi / D 0 for i ¤ 1. (2) Suppose that T is surjective. Let y 2 X . To show that T is injective we must show that if T .y / D 0, then y D 0. Thus, suppose T .y / D 0, i.e., that .T .y //.v/ D 0 for every v 2 V . Then 0 D .T .y //.v/ D .y ı T /.v/ D y .T .v// for every v 2 V . Choose x 2 X. Then, since T is surjective, there is a v 2 V with x D T .v/, and so y .x/ D y .T .v// D 0. Thus y .x/ D 0 for every x 2 X, i.e., y D 0. Suppose that T is not surjective. Then Im.T / is a proper subspace of X. Let fx1; x2 ; : : :g be a basis for Im.T / and extend this set to a basis C D fx1 ; x2; : : : ; x10 ; x20 ; : : :g of X. 
Define y 2 X by y .xi / D 0 for all i , y .x10 / D 1, and y .xj0 / D 0 for j ¤ 1. Then y ¤ 0, but y .x/ D 0 for every x 2 Im.T /. Then T y .v/ D y ı T .v/ D y T .v/ D 0 so T .y / D 0. (3) This immediately follows from (1) and (2). Next we see how the dual behaves under composition. i i i i i i “book” — 2011/3/4 — 17:06 — page 39 — #53 i i 1.6. Dual spaces 39 Lemma 1.6.23. Let T W V ! W and S W W ! X be linear transformations. Then S ı T W V ! X has dual .S ı T / W X ! V given by .S ı T / D T ı S . Proof. Let y 2 X and let x 2 X. Then .S ı T / y / .x/ D y .S ı T /.x/ D y S T .x/ D S y T .x/ D T S y .x/ D T ı S y .x/: Since this is true for every x and y , .S ı T / D T ı S . We can now consider the dual V of V , known as the double dual of V . An element of V is a linear transformation from V to F , and so is a function from V to F . An element of V is a linear transformation from V to F , and so is a function from V to F . In other words, an element of V is a function on functions. There is one natural way to get a function on functions: evaluation at a point. This is the linear transformation Ev (“Evaluation at v”) of the next definition. Definition 1.6.24. Let Ev 2 V be the linear transformation Ev W V ! F defined by Ev .w / D w .v/ for every w 2 V . Þ Remark 1.6.25. It is easy to check that Ev is a linear transformation. Also, Ev is naturally defined. It does not depend on a choice of basis. Þ Lemma 1.6.26. The linear transformation H W V ! V given by H .v/ D Ev is an injection. If V is finite-dimensional, it is an isomorphism. Proof. Let v be an element of V with Ev D 0. Now Ev is an element of V , the dual of V , so Ev D 0 means that for every w 2 V , Ev .w / D 0. But Ev .w / D w .v/. Thus v 2 V has the property that w .v/ D 0 for every w 2 V . We claim that v D 0. Suppose not. Let v1 D v and extend fv1 g to a basis B D fv1 ; v2 ; : : :g of V . Consider the dual basis B D fw1 ; w2; : : :g of V . Then w1.v1 / D 1 ¤ 0. If V is finite-dimensional, then Ev is an injection between vector spaces of the same dimension and hence is an isomorphism. Remark 1.6.27. As is common practice, we will often write v D H .v/ in case V is finite-dimensional. The map v 7! v then provides a canonical identification of elements of V with elements of V , as there is no choice, of basis or anything else, involved. Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 40 — #54 i i 40 1. Vector spaces and linear transformations Beginning with a vector space V and a subspace U of V , we obtained from Definition 1.6.10 the subspace Ann .U / of V . Similarly, beginning with the subspace Ann .U / of V we could obtain the subspace Ann .Ann .U // of V . This is not the construction of Definition 1.6.13, which would give us the subspace Ann.Ann .U //, which we saw in Theorem 1.6.15 was just U . But these two constructions are closely related. Corollary 1.6.28. Let V be a finite-dimensional vector space and let U be a subspace of V . Let H be the linear transformation of Lemma 1.6.26. Then H W U ! Ann .Ann .U // is an isomorphism. Since we have a natural way of identifying finite-dimensional vector spaces with their double duals, we should have a natural way of identifying linear transformations between finite-dimensional vector spaces with linear transformations between their double duals, and we do. Definition 1.6.29. Let V and X be finite-dimensional vector spaces. If T W V ! X is a linear transformation, its double dual is the linear transformation T W V ! X given by T .v / D .T .v// . Þ Lemma 1.6.30. 
Let V and X be finite-dimensional vector spaces. Then T 7! T is an isomorphism from HomF .V; X/ D flinear transformations: V ! Xg to HomF .V ; X / D flinear transformations:V ! X g. Proof. It is easy to check that T 7! T is a linear transformation. Since V and V have the same dimension, as do X and X , flinear transformations: V ! Xg and flinear transformations:V ! X g are vector spaces of the same dimension. Thus in order to show that T 7! T is an isomorphism, it suffices to show that T 7! T is an injection. Suppose T D 0, i.e., T .v / D 0 for every v 2 V . Let v 2 V be arbitrary. Then 0 D T .v / D .T .v// D H .T .v//. But H is an isomorphism by Lemma 1.6.26, so T .v/ D 0. Since this is true for every v 2 V , T D 0. Remark 1.6.31. In the infinite-dimensional case it is in general not true that V is isomorphic to V . For example, if V D F 1 we have seen in Example 1.6.6 that V is isomorphic to F 11 . Also, V is isomorphic to a subspace of V . We thus see that V has countably infinite dimension and V has uncountably infinite dimension, so they cannot be isomorphic. Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 41 — #55 i i CHAPTER 2 Coordinates In this chapter we investigate coordinates. It is useful to keep in mind the metaphor: Coordinates are a language for describing vectors and linear transformations. In human languages we have, for example: ŒEnglish D star, ŒFrench D étoile, ŒGerman D Stern, Œ!English D arrow, Œ!French D flèche, Œ!German D Pfeil. Coordinates share two similarities with human languages, but have one important difference. (1) Often it is easier to work with objects, and often it is easier to work with words that describe them. Similarly, often it is easier and more enlightening to work with vectors and linear transformations directly, and often it is easier and more enlightening to work with their descriptions in terms of coordinates, i.e., with coordinate vectors and matrices. (2) There are many different human languages and it is useful to be able to translate among them. Similarly, there are different coordinate systems and it is not only useful but indeed essential to be able to translate among them. (3) A problem expressed in one human language is not solved by translating it into a second langauge. It is just expressed it differently. Coordinate systems are different. For many problems in linear algebra there is a preferred coordinate system, and translating the problem into that 41 i i i i i i “book” — 2011/3/4 — 17:06 — page 42 — #56 i i 42 Guide to Advanced Linear Algebra language greatly simplifies it and helps to solve it. This is the idea behind eigenvalues, eigenvectors, and canonical forms for matrices. We save their investigation for a later chapter. 2.1 Coordinates for vectors We begin by restating Lemma 1.2.21. Lemma 2.1.1. Let V be a vector space and let B D fvi g be a set of vectors in V . Then B is a basis for V if and only if every v 2 V can be written P uniquely as v D ci vi for ci 2 F , all but finitely many zero. With this lemma in hand we may make the following important definition. Definition 2.1.2. Let V be an n-dimensional vector space and let B D fv1 ; : : : ; vn g be a basis for V . For v 2 V the coordinate vector of v with P respect to the basis B, ŒvB , is given as follows: If v D ci vi , then 2 3 c1 6 c2 7 6 7 ŒvB D 6 : 7 2 F n : Þ 4 :: 5 cn Theorem 2.1.3. Let V be an n-dimensional vector space and let B be a basis of V . Then T W V ! F n by T .v/ D ŒvB is an isomorphism. Proof. Let B D fv1; : : : ; vn g. Define S W F n ! 
V by 02 31 c1 B6 :: 7C X S @4 : 5A D ci vi : cn It is easy to check that S is a linear transformation, and then Lemma 2.1.1 shows that S is an isomorphism. Furthermore, T D S 1 . Example (1) Let V D F n and let B D E be the standard basis. c12.1.4. P If v D ::: , then v D ci ei (where E D fe1 ; : : : ; eng) and so ŒvE D cn c1 :: . That is, a vector “looks like itself” in the standard basis. : cn (2) Let V be arbitrary and let B D fb1; : : : ; bn g be a basis for V . Then Œbi B D ei . i i i i i i “book” — 2011/3/4 — 17:06 — page 43 — #57 i i 2.2. Matrices for linear transformations 43 nh i h io nh i h io 1 0 1 3 (3) Let V D R2 , let E D ; D fe1 ; e2g and let B D ; D 0 1 2 7 h i h i h i h i h i 1 3 1 1 0 fb1 ; b2g. Then Œb1 E D and Œb2 E D (as D1 C2 and 2 7 2 0 1 h i h i h i 3 1 0 D3 C 7 ). 7 0 1 h i h i h i h i 7 3 1 1 On the other hand, Œe1 B D and Œe2 B D (as D7 C 2 1 0 2 h i h i h i h i 3 0 1 3 . 2/ and D . 3/ C 1 ). 7 1 0 7 h i h i h i 17 17 x Let v1 D . Then Œv1 E D . Also, Œv1 B D 1 where v1 D 39 39 x2 h i h i h i 17 1 3 x1 b1 C x2 b2 , i.e., D x1 C x2 . Solving, we find x1 D 2, x2 D 39 2 7 h i h i h i 2 27 27 5, so Œv1 B D . Similarly, let v2 D . Then Œv2 E D . Also, 5 62 62 h i h i h i h i y 27 1 3 Œv2 B D 1 where v2 D y1 b1 C y2 b2 , i.e., D y1 C y2 . y2 62 2 7 h i 3 Solving, we find y1 D 3, y2 D 8, so Œv2 B D . 8 (4) Let V D P2 .R/, let B0 D f1; x; x 2g, and let B1 D f1; x .x 1/2 g. Let p.x/ D 3 6x C 4x 2 . Then 1; 2 3 3 4 p.x/ B D 65 : 0 4 Also p.x/ D 1 C 2.x 1/ C 4.x 1/2 , so 2 3 1 p.x/ B D 425: 1 4 Þ 2.2 Matrices for linear transformations Let V and W be vector spaces of finite dimensions n and m respectively with bases B D fv1 ; : : : ; vn g and C D fw1; : : : ; wm g and let T W V ! W is a linear transformation. Then we have isomorphisms S W V ! F n given by S.v/ D ŒvB and U W W ! F m given by U.w/ D ŒwC , and we may form the composition U ı T ı S 1 W F n ! F m . Since this is a linear transformation, it is given by multiplication by a unique matrix. We are thus led to the following definition. i i i i i i “book” — 2011/3/4 — 17:06 — page 44 — #58 i i 44 Guide to Advanced Linear Algebra Definition 2.2.1. Let V be an n-dimensional vector space with basis B D fv1 ; : : : ; vn g and let W be an m-dimensional vector space with basis C D fw1 ; : : : ; wmg. Let T W V ! W be a linear transformation. The matrix of the linear transformation T with respect to the bases B and C, denoted ŒT C B , is the unique matrix such that ŒT C B ŒvB D ŒT .v/C It is easy to write down ŒT C B (at least in principle). Lemma 2.2.2. In the situation of Definition 2.2.1, the matrix ŒT C given by ŒT C B D T v1 C j T v2 C j j T vn C ; i.e., ŒT C B Þ for every v 2 V: B is is the matrix whose i th column is ŒT .vi /C , for each i . Proof. By Lemma 1.2.23, we need only verify the equation ŒT C B Œv D ŒT .v/C for v D vi , i D 1 : : : ; n. But Œvi B D ei and ŒT C B ei is the i th column of ŒT C B , i.e., ŒT C B Œvi B D ŒT C B ei D ŒT .vi /C as required. Theorem 2.2.3. Let V be a vector space of dimension n and let W be a vector space of dimension m over a field F . Choose bases B of V and C of W . Then the linear transformation S W flinear transformations T W V ! W g ! fm-by-n matrices with entries in F g given by S.T / D ŒT C B is an isomorphism. Corollary 2.2.4. In the situation of Theorem 2.2.3, flinear transformations T W V ! W g is a vector space over F of dimension mn. Proof. 
fm-by-n matrices with entries in F g is a vector space of dimension mn, with basis the set of matrices fEij g, 1 i m, 1 j n, where Eij has an entry of 1 in the .i; j / position and all other entries 0. Lemma 2.2.5. Let U , V , and W be finite-dimensional vector spaces with bases B, C, and D respectively. Let T W U ! V and S W V ! W be linear transformations. Then S ı T W U ! W is a linear transformation with ŒS ı T D B D ŒSD C ŒT C B: i i i i i i “book” — 2011/3/4 — 17:06 — page 45 — #59 i i 2.2. Matrices for linear transformations Proof. For any u 2 W , ŒSD C ŒT C But also ŒS ı T D B B ŒuB 45 ŒT C B ŒuB D ŒSD C T .u/ C D S T .u/ D D S ı T .u/ D : ŒuB D ŒSD C D Œ.S ı T /.u/D so ŒS ı T D B D ŒSD C ŒT C B: Example 2.2.6. Let A be an m-by-n matrix and let TA W F n ! F m be defined by TA .v/ D Av. Choose the standard bases En for F n and Em for F m . Write A D Œa1 j a2 j j an , i.e., ai is the i th column of A. Then ŒTA Em En is the matrix whose i th column is TA ei Em D Aei Em D ai Em D ai ; so we see that ŒTAEm En D A. That is, multiplication by a matrix “looks like itself” with respect to the standard bases. Þ The following definition is the most important special case of Definition 2.2.1, and the case we will concentrate on. Definition 2.2.7. Let V be an n-dimensional vector space with basis B D fv1 ; : : : ; vn g and let T W V ! V be a linear transformation. The matrix of the linear transformation T in the basis B, denoted ŒT B , is the unique matrix such that ŒT B ŒvB D ŒT .v/B for every v 2 V: Þ Remark 2.2.8. Comparing Definition 2.2.7 with Definition 2.2.1, we see that we have simplified our notation in this special case: We have replaced ŒT B B by ŒT B . With this simplification, the conclusion of Lemma 2.2.2 reads ŒT B D T v1 B j T v2 B j j T vn B : Þ We also make the following observation. Lemma 2.2.9. Let V be a finite-dimensional vector space and let B be a basis of V . (1) If T D I, the identity linear transformation, then ŒT B D I , the identity matrix. (2) T W V ! V is an isomorphism if and only if ŒT B is an invertible matrix, in which case ŒT 1 B D .ŒT B / 1 . i i i i i i “book” — 2011/3/4 — 17:06 — page 46 — #60 i i 46 Guide to Advanced Linear Algebra h i 65 24 Example 2.2.10. Let T W R2 ! R2 be given by T .v/ D v. 149 55 h i h i 65 24 1 Then ŒT E D . Let B be the basis B D fb1 ; b2 g with b1 D 149 55 2 h i 3 and b2 D . Then ŒT B D ŒŒv1 B j Œv2 B where 7 v1 D T b1 D 65 149 24 55 1 17 D 2 39 and v2 D T b2 65 D 149 24 3 27 D : 55 7 62 We have computed Œv1 B and where we obtained Œv 2 B in Example 2.1.4(3) 2 3 2 3 Œv1 B D and Œv2 B D , so ŒT B D . Þ 5 8 5 8 We shall see further examples of matrices of particularly interesting linear transformations in Example 2.3.18. 2.3 Change of basis We now investigate how to change coordinates. In our metaphor of coordinates providing a language, changing coordinates is like translating between languages. We look at translation between languages first, in order to guide us later. Suppose we wish to translate from English to English, for example, or from German to German. We could do this by using an English to English dictionary, or a German to German dictionary, which would look in part like: English star arrow English star arrow German Stern Pfeil German Stern Pfeil The two columns are identical. Indeed, translating from any language to itself leaves every word unchanged, or to express it mathematically, it is the identity transformation. Suppose we wish to translate from English to German or from German to English. 
We could use an English to German dictionary or a German to English dictionary, which would look in part like: i i i i i i “book” — 2011/3/4 — 17:06 — page 47 — #61 i i 2.3. Change of basis English star arrow 47 German Stern Pfeil German Stern Pfeil English star arrow The effect of translating from German to English is to reverse the effect of translating from English to German, and vice versa. Mathematically, translating from German to English is the inverse of translating from English to German, and vice versa. Suppose that we wish to translate from English to German but we do not have an English to German dictionary available. However, we do have an English to French dictionary, and a French to German dictionary available, and they look in part like: English star arrow French étoile flèche French étoile flèche German Stern Pfeil We could translate from English to German by first translating from English to French, and then translating from French to German. Mathematically, translating from English to German is the composition of translating from English to French followed by translating from French to German. We now turn from linguistics to mathematics. Let V be an n-dimensional vector space with bases B D fv1; : : : ; vn g and C D fw1; : : : ; wn g. Then we have isomorphisms S W V ! F n given by S.v/ D ŒvB , and T W V ! F n given by T .v/ D ŒvC . The composition T ı S 1 W F n ! F n is then an isomorphism, and T ı S 1 .ŒvB / D ŒvC . By Lemma 1.1.12, it isomorphism is given by multiplication by a unique (invertible) matrix. We make the following definition. Definition 2.3.1. Let V be an n-dimensional vector space with bases B D fv1 ; : : : ; vn g and C D fw1 ; : : : ; wmg. The change of basis matrix PC B , is the unique matrix such that PC B ŒvB D ŒvC Þ for every v 2 V . It is easy to write down, at least in principle, PC B. Lemma 2.3.2. In the situation of Definition 2.3.1, the matrix PC given by PC B D v1 C j v2 C j j vn C ; i.e., PC B B is is the matrix whose i th column is Œvi C . i i i i i i “book” — 2011/3/4 — 17:06 — page 48 — #62 i i 48 Guide to Advanced Linear Algebra Proof. By Lemma 1.2.23, we need only verify the equation PC B ŒvB D ŒvC for v D vi , i D 1; : : : ; n. But Œvi B D ei and PC B ei is the i th column of PC B , i.e., PC B Œvi B D PC B ei D Œvi C as required. Remark 2.3.3. If we think of B as the “old” basis, i.e., the one we are translating from, and C as the “new” basis, i.e., the one we are translating to, then this lemma says that in order to solve the translation problem for an arbitrary vector v 2 V , we need only solve the translation problem for the old basis vectors, and write down their translations in successive columns to form a matrix. Then multiplication by that matrix does translation for every vector. Þ We have a theorem that parallels our discussion of translation between human languages. Theorem 2.3.4. Let V be a finite-dimensional vector space. (1) For any basis B of V , PB B D I is the identity matrix. (2) For any two bases B and C of V , PC B is invertible and .PC PB C . (3) For any three bases B, C, and D of V , PD B D PD C PC 1 B/ D B. Proof. (1) For any v 2 V , ŒvB D I ŒvB D PB B ŒvB ; so PB B D I . (2) For any v 2 V , .PB C PC B /ŒvB D PB C .PC B ŒvB / D PB C ŒvC D ŒvB ; so PB C PC B D I , and similarly PC B PB C D I so .PC PB C . (3) PD B is the matrix defined by PD B ŒvB D ŒvD . But .PD so PD C PC B B /ŒvB D PD C PC D PD C .PC B ŒvB / D PD C ŒvC B/ 1 D D ŒvD ; B. Remark 2.3.5. There is no uniform notation for PC B . 
We have chosen a notation that we feel is mnemonic: in $P_{C \leftarrow B}[v]_B = [v]_C$, the subscript $B$ of $[v]_B$ sits next to the $B$ in the subscript $C \leftarrow B$ of $P_{C \leftarrow B}$, and this subscript "goes to" $C$, which is the subscript of the answer $[v]_C$. Some other authors denote $P_{C \leftarrow B}$ by $P_{CB}$ and some by $P_{BC}$. The reader should pay careful attention to the author's notation, as interchanging the two bases takes the change of basis matrix to its inverse. ♦

Remark 2.3.6. (1) There is one case in which the change of basis matrix is easy to write down. Suppose $V = F^n$, $B = \{v_1, \ldots, v_n\}$ is a basis of $V$, and $E = \{e_1, \ldots, e_n\}$ is the standard basis of $V$. Then, by Example 2.1.4(1), $[v_i]_E = v_i$, so
$$P_{E \leftarrow B} = [v_1 \mid v_2 \mid \cdots \mid v_n].$$
Thus the change of basis matrix into the standard basis is easy to find.
(2) It is more often the case that we wish to find the change of basis matrix out of the standard basis, i.e., we wish to find $P_{B \leftarrow E}$. Then it requires work to find $[e_i]_B$. Instead, we may write down $P_{E \leftarrow B}$ as in (1) and then find $P_{B \leftarrow E}$ by $P_{B \leftarrow E} = (P_{E \leftarrow B})^{-1}$.
(3) Suppose we have two bases $B$ and $C$ of $F^n$, neither of which is the standard basis. We may find $P_{C \leftarrow B}$ directly, or else we may find it by $P_{C \leftarrow B} = P_{C \leftarrow E} P_{E \leftarrow B} = (P_{E \leftarrow C})^{-1} P_{E \leftarrow B}$. ♦

Lemma 2.3.7. Let $P$ be an $n$-by-$n$ matrix. Then $P$ is a change of basis matrix between two bases of $F^n$ if and only if $P$ is invertible.

Proof. Let $P = (p_{ij})$. Choose a basis $C = \{w_1, \ldots, w_n\}$ of $F^n$ and let $v_i = \sum_j p_{ji} w_j$, so that $[v_i]_C$ is the $i$th column of $P$. Then $B = \{v_1, \ldots, v_n\}$ is a basis of $F^n$ if and only if $P$ is invertible, in which case $P = P_{C \leftarrow B}$.

Remark 2.3.8. Comparing Lemma 2.2.2 and Lemma 2.3.2, we observe that $P_{C \leftarrow B} = [\mathcal{I}]_{C \leftarrow B}$, where $\mathcal{I} \colon F^n \to F^n$ is the identity linear transformation ($\mathcal{I}(v) = v$ for every $v \in F^n$). ♦

Example 2.3.9. Let $V = \mathbb{R}^2$, $E = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$, and $B = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \end{bmatrix} \right\}$. Let $v_1 = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$, so also $[v_1]_E = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$. We computed directly in Example 2.1.4(3) that $[v_1]_B = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$. Let $v_2 = \begin{bmatrix} 27 \\ 62 \end{bmatrix}$, so also $[v_2]_E = \begin{bmatrix} 27 \\ 62 \end{bmatrix}$. We computed directly in Example 2.1.4(3) that $[v_2]_B = \begin{bmatrix} 3 \\ 8 \end{bmatrix}$.

We know from Remark 2.3.6(1) that $P_{E \leftarrow B} = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}$ and from Remark 2.3.6(2) that $P_{B \leftarrow E} = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}^{-1} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix}$. Then we can easily verify that
$$\begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 17 \\ 39 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 27 \\ 62 \end{bmatrix} = \begin{bmatrix} 3 \\ 8 \end{bmatrix}. \quad ♦$$

We shall see further particularly interesting examples of change of basis matrices in Example 2.3.17.
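These computations are easy to check mechanically. The following few lines (our own sketch, assuming NumPy, and not part of the text) redo Example 2.3.9:

    import numpy as np

    P_EB = np.array([[1.0, 3.0],
                     [2.0, 7.0]])     # columns are b1, b2 written in the basis E
    P_BE = np.linalg.inv(P_EB)        # equals [[7, -3], [-2, 1]]

    print(P_BE @ np.array([17.0, 39.0]))   # [2. 5.] = [v1]_B
    print(P_BE @ np.array([27.0, 62.0]))   # [3. 8.] = [v2]_B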
Now we wish to investigate change of basis for linear transformations. Again we will return to our metaphor of language, and see how linguistic transformations work. Let $T$ be the transformation that takes an object to several of the same objects, $T(\star) = \star\star\star\star$, $T(\to) = \to\to\to\to$. This is reflected in the linguistic transformation of taking the plural. Suppose we wish to take the plural of German words, but we do not know how. We consult our German to English and English to German dictionaries:

    German    English            English    German
    Stern     star               star       Stern
    Sterne    stars              stars      Sterne
    Pfeil     arrow              arrow      Pfeil
    Pfeile    arrows             arrows     Pfeile

We thus see that to take the plural of the German word Stern, we may translate Stern into the English word star, take the plural (i.e., apply our linguistic transformation) to obtain stars, and translate this word into German to obtain Sterne, the plural of the German word Stern. Similarly, the path Pfeil → arrow → arrows → Pfeile gives us the plural of the German word Pfeil.

The mathematical analog of this conclusion is the following theorem.

Theorem 2.3.10. Let $V$ be an $n$-dimensional vector space and let $T \colon V \to V$ be a linear transformation. Let $B$ and $C$ be any two bases of $V$. Then
$$[T]_C = P_{C \leftarrow B} [T]_B P_{B \leftarrow C}.$$

Proof. For any vector $v \in V$,
$$(P_{C \leftarrow B} [T]_B P_{B \leftarrow C})[v]_C = P_{C \leftarrow B} [T]_B (P_{B \leftarrow C} [v]_C) = P_{C \leftarrow B} [T]_B [v]_B = P_{C \leftarrow B} [T(v)]_B = [T(v)]_C.$$
But $[T]_C$ is the unique matrix with $[T]_C [v]_C = [T(v)]_C$ for every $v \in V$, so we see that $[T]_C = P_{C \leftarrow B} [T]_B P_{B \leftarrow C}$.

Corollary 2.3.11. In the situation of Theorem 2.3.10,
$$[T]_C = (P_{B \leftarrow C})^{-1} [T]_B P_{B \leftarrow C} = P_{C \leftarrow B} [T]_B (P_{C \leftarrow B})^{-1}.$$

Proof. Immediate from Theorem 2.3.10 and Theorem 2.3.4(2).

We are thus led to the following very important definition. (A priori, this definition may seem very unlikely, but in light of our development it is almost forced on us.)

Definition 2.3.12. Two $n$-by-$n$ matrices $A$ and $B$ are similar if there is an invertible matrix $P$ with
$$A = P^{-1} B P. \quad ♦$$

Remark 2.3.13. It is easy to check that similarity is an equivalence relation. ♦

The importance of this definition comes from the following theorem.

Theorem 2.3.14. Let $A$ and $B$ be $n$-by-$n$ matrices. Then $A$ and $B$ are similar if and only if they are matrices of the same linear transformation $T \colon F^n \to F^n$ with respect to a pair of bases of $F^n$.

Proof. Immediate from Corollary 2.3.11.

There is an alternate point of view.

Theorem 2.3.15. Let $V$ be a finite-dimensional vector space and let $S \colon V \to V$ and $T \colon V \to V$ be linear transformations. Then $S$ and $T$ are conjugate (i.e., $T = R^{-1} S R$ for some invertible linear transformation $R \colon V \to V$) if and only if there are bases $B$ and $C$ of $V$ with
$$[S]_B = [T]_C.$$

Proof. If $[S]_B = [T]_C$, then by Corollary 2.3.11
$$[S]_B = [T]_C = P_{C \leftarrow B} [T]_B (P_{C \leftarrow B})^{-1},$$
so $[S]_B$ and $[T]_B$ are conjugate by the matrix $P_{C \leftarrow B}$; hence, since a linear transformation is determined by its matrix in any basis, $S$ and $T$ are conjugate.

Conversely, if $T = R^{-1} S R$, then $[T]_E = [R^{-1}]_E [S]_E [R]_E$. But $[R]_E$, being an invertible matrix, is a change of basis matrix $P_{C \leftarrow E}$ for some basis $C$. Then $[T]_E = (P_{C \leftarrow E})^{-1} [S]_E P_{C \leftarrow E}$, so $P_{C \leftarrow E} [T]_E (P_{C \leftarrow E})^{-1} = [S]_E$, i.e., $[T]_C = [S]_E$.

Example 2.3.16. Let $T \colon \mathbb{R}^2 \to \mathbb{R}^2$ be $T = T_A$, where $A = \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix}$. Let $B = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \end{bmatrix} \right\}$, a basis of $\mathbb{R}^2$. Then $[T]_B = P_{B \leftarrow E} [T]_E P_{E \leftarrow B} = (P_{E \leftarrow B})^{-1} [T]_E P_{E \leftarrow B}$. Since $[T]_E = A$, we see that
$$[T]_B = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}^{-1} \begin{bmatrix} 65 & -24 \\ 149 & -55 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 8 \end{bmatrix},$$
verifying the result of Example 2.2.10, where we computed $[T]_B$ directly. ♦

Example 2.3.17. Let $V = P_n(\mathbb{R})$ and let $B$ and $C$ be the bases
$$B = \{1, x, x^{(2)}, x^{(3)}, \ldots, x^{(n)}\}, \quad \text{where } x^{(i)} = x(x-1)(x-2)\cdots(x-i+1),$$
and
$$C = \{1, x, x^2, \ldots, x^n\}.$$
Let $P = (p_{ij}) = P_{C \leftarrow B}$ and $Q = (q_{ij}) = P_{B \leftarrow C} = P^{-1}$. The entries $p_{ij}$ are called Stirling numbers of the first kind and the entries $q_{ij}$ are called Stirling numbers of the second kind. Here we number the rows and columns of the respective matrices from $0$ to $n$, not from $1$ to $n+1$. For example, if $n = 5$ we have
$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 2 & -6 & 24 \\ 0 & 0 & 1 & -3 & 11 & -50 \\ 0 & 0 & 0 & 1 & -6 & 35 \\ 0 & 0 & 0 & 0 & 1 & -10 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad Q = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 3 & 7 & 15 \\ 0 & 0 & 0 & 1 & 6 & 25 \\ 0 & 0 & 0 & 0 & 1 & 10 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
(The numbers $p_{ij}$ and $q_{ij}$ are independent of $n$ as long as $i, j \leq n$.) ♦
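These matrices are easy to generate and check by machine. Here is a brief sketch (our own illustration, not part of the text; it assumes SymPy, whose ff is the falling factorial $x^{(j)}$):

    import sympy as sp

    x, n = sp.symbols('x'), 5
    falling = [sp.expand(sp.ff(x, j)) for j in range(n + 1)]   # the basis B

    # Column j of P holds the coordinates of x^(j) in the monomial basis C.
    P = sp.Matrix([[falling[j].coeff(x, i) for j in range(n + 1)]
                   for i in range(n + 1)])
    Q = P.inv()

    print(P)                         # signed Stirling numbers of the first kind
    print(Q)                         # Stirling numbers of the second kind
    print(P * Q == sp.eye(n + 1))    # True: the two changes of basis are inverse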
Example 2.3.18. Let $V = P_5(\mathbb{R})$ with bases $B = \{1, x, x^{(2)}, \ldots, x^{(5)}\}$ and $C = \{1, x, x^2, \ldots, x^5\}$ as in Example 2.3.17.

(1) Let $D \colon V \to V$ be differentiation, $D(p(x)) = p'(x)$. Then
$$[D]_B = \begin{bmatrix} 0 & 1 & -1 & 2 & -6 & 24 \\ 0 & 0 & 2 & -3 & 8 & -30 \\ 0 & 0 & 0 & 3 & -6 & 20 \\ 0 & 0 & 0 & 0 & 4 & -10 \\ 0 & 0 & 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad [D]_C = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},$$
so these two matrices are similar. Indeed, $[D]_B = P^{-1}[D]_C P = Q[D]_C Q^{-1}$, where $P$ and $Q$ are the matrices of Example 2.3.17.

(2) Let $\Delta \colon V \to V$ be the forward difference operator, $\Delta(p(x)) = p(x+1) - p(x)$. Then
$$[\Delta]_B = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad [\Delta]_C = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 2 & 3 & 4 & 5 \\ 0 & 0 & 0 & 3 & 6 & 10 \\ 0 & 0 & 0 & 0 & 4 & 10 \\ 0 & 0 & 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},$$
so these two matrices are similar. Again, $[\Delta]_B = P^{-1}[\Delta]_C P = Q[\Delta]_C Q^{-1}$, where $P$ and $Q$ are the matrices of Example 2.3.17. (Note that $\Delta x^{(k)} = k\,x^{(k-1)}$: the falling factorials play the same role for $\Delta$ that the monomials play for $D$.)

(3) Since $[D]_C = [\Delta]_B$, we see that $D \colon V \to V$ and $\Delta \colon V \to V$ are conjugate. ♦

2.4 The matrix of the dual

Let $T \colon V \to X$ be a linear transformation between finite-dimensional vector spaces. Once we choose bases $B$ and $C$ of $V$ and $X$ respectively, we can represent $T$ by a unique matrix $[T]_{C \leftarrow B}$. We also have the dual linear transformation $T^* \colon X^* \to V^*$ and the dual bases $C^*$ and $B^*$ of $X^*$ and $V^*$ respectively, and it is natural to consider the matrix $[T^*]_{B^* \leftarrow C^*}$.

Definition 2.4.1. Let $T \colon V \to X$ be a linear transformation between finite-dimensional vector spaces, and let $A$ be the matrix $A = [T]_{C \leftarrow B}$. The transpose of $A$ is the matrix ${}^tA$ given by ${}^tA = [T^*]_{B^* \leftarrow C^*}$. ♦

Let us first see that this gives the usual definition of the transpose of a matrix.

Lemma 2.4.2. Let $A = (a_{ij})$ be an $m$-by-$n$ matrix. Then $B = {}^tA = (b_{ij})$ is the $n$-by-$m$ matrix with entries $b_{ij} = a_{ji}$, $i = 1, \ldots, n$, $j = 1, \ldots, m$.

Proof. Let $B = \{v_1, \ldots, v_n\}$, $B^* = \{w_1^*, \ldots, w_n^*\}$, $C = \{x_1, \ldots, x_m\}$, and $C^* = \{y_1^*, \ldots, y_m^*\}$. Then, by definition,
$$T(v_j) = \sum_{k=1}^m a_{kj} x_k \quad \text{and} \quad T^*(y_i^*) = \sum_{k=1}^n b_{ki} w_k^*.$$
Now
$$y_i^*(T(v_j)) = a_{ij} \quad \text{for } j = 1, \ldots, n, \text{ as } y_i^*(x_i) = 1 \text{ and } y_i^*(x_k) = 0 \text{ for } k \neq i,$$
and
$$(T^*(y_i^*))(v_j) = b_{ji} \quad \text{for } i = 1, \ldots, m, \text{ as } w_j^*(v_j) = 1 \text{ and } w_k^*(v_j) = 0 \text{ for } k \neq j.$$
By the definition of $T^*$, for any $y^* \in X^*$ and any $v \in V$,
$$(T^*(y^*))(v) = y^*(T(v)),$$
so we see that $b_{ji} = a_{ij}$, as claimed.

Remark 2.4.3. Every matrix is the matrix of a linear transformation with respect to a pair of bases, so ${}^tA$ is defined for any matrix $A$. Our definition appears to depend on the choice of the bases $B$ and $C$, so to see that ${}^tA$ is well-defined we must show that it is independent of the choice of bases. This follows from first principles, but it is easier to observe that Lemma 2.4.2 gives a formula for ${}^tA$ that is independent of the choice of bases. ♦

Remark 2.4.4. It is easy to see that ${}^t(A_1 + A_2) = {}^tA_1 + {}^tA_2$ and that ${}^t(cA) = c\,{}^tA$. ♦

Other properties of the transpose are a little more subtle.

Lemma 2.4.5. ${}^t(AB) = {}^tB\,{}^tA$.

Proof. Let $T \colon V \to X$ with $[T]_{C \leftarrow B} = B$ and let $S \colon X \to Z$ with $[S]_{D \leftarrow C} = A$. Then, as we have seen, $S \circ T \colon V \to Z$ with $[S \circ T]_{D \leftarrow B} = AB$. By Definition 2.4.1 and Lemma 1.6.23,
$${}^t(AB) = [(S \circ T)^*]_{B^* \leftarrow D^*} = [T^* \circ S^*]_{B^* \leftarrow D^*} = [T^*]_{B^* \leftarrow C^*}[S^*]_{C^* \leftarrow D^*} = {}^tB\,{}^tA.$$

Lemma 2.4.6. Let $A$ be an invertible matrix. Then ${}^t(A^{-1}) = ({}^tA)^{-1}$.

Proof. Clearly if $T \colon V \to V$ is the identity, then $T^* \colon V^* \to V^*$ is the identity ($w^*(T(v)) = w^*(v) = (T^*(w^*))(v)$ if $T$ and $T^*$ are both the respective identities). Choose a basis $B$ of $V$ and let $R \colon V \to V$ be the linear transformation with $[R]_B = A$.
Then ŒR 1B D A 1 , and I D ŒIB D I B D R 1 ı R B D R B R 1 B D t A t A 1 ; and I D ŒIB D I B D R ı R 1 B D R 1 B R B D t A 1 t A: As an application of these ideas, we have a theorem from elementary linear algebra. Theorem 2.4.7. Let A be an m-by-n matrix. Then the row rank of A and the column rank of A are equal. Proof. Let T D TA W F n ! F m be given by T .v/ D Av. Then ŒT Em En D A, so the column rank of A, which is the dimension of the subspace of F m spanned by the columns of A, is the dimension of the subspace Im.T / of F m. D Consider the dual T W .F m / ! .F n / . As we have seen, ŒT En Em t t A, so the column rank of A is equal to the dimension of Im.T /. By Corollary 1.6.20, dim Im.T / D dim Im.T /, and obviously the column space of t A is identical to the row space of A. We have considered the dual. Now let us consider the double dual. In Lemma 1.6.26 we defined the linear transformation H from a vector space to its double dual. i i i i i i “book” — 2011/3/4 — 17:06 — page 56 — #70 i i 56 Guide to Advanced Linear Algebra Lemma 2.4.8. Let T W V ! X be a linear transformation between finitedimensional F -vector spaces. Let B D fv1 ; : : : ; vn g be a basis of V and C D fx1 ; : : : ; xmg be a basis of X. Let B D fv1; : : : ; vn g and C D fx1 ; : : : ; xm g, bases of V and X respectively (where vi D H .vi / and xj D H .xj /). Then T C B D ŒT C B: Proof. An inspection of Definition 1.6.29 shows that T is the composition H ı T ı H 1 where the right-hand H is H W V ! V and the left-hand H is H W W ! W . But ŒH B B D I and ŒH C C D I so T D ŒH C C ŒT C B ŒH 1 B B C B D I ŒT C BI 1 D ŒT C B: The following corollary is obvious from direct computation but we present another proof. Corollary 2.4.9. Let A be an m-by-n matrix. Then t . tA/ D A. Proof. Let T W V ! W be a linear transformation with ŒT C B D A. Then by Lemma 2.4.8, A D ŒT C B D ŒT C B D t tŒT C B D t tA ; as T is the dual of the dual of T . i i i i i i “book” — 2011/3/4 — 17:06 — page 57 — #71 i i CHAPTER 3 Determinants In this chapter we deal with the determinant of a square matrix. The determinant has a simple geometric meaning, that of signed volume, and we use that to develop it in Section 3.1. We then present a more traditional and fuller development in Section 3.2. In Section 3.3 we derive important and useful properties of the determinant. In Section 3.4 we consider integrality questions, e.g., the question of the existence of integer (not just rational) solutions of the linear system Ax D b, a question best answered using determinants. In Section 3.5 we consider orientations, and see how to explain the meaning of the sign of the determinant in the case of real vector spaces. In Section 3.6 we present an interesting family of examples, the Hilbert matrices. 3.1 The geometry of volumes The determinant of a matrix A has a simple geometric meaning. It is the (signed) volume of the image of the unit cube under the linear transformation TA . We will begin by doing some elementary geometry to see what properties (signed) volume should have, and use that as the basis for the not-sosimple algebraic definition. Henceforth we drop the word “signed” and just refer to volume. In considering properties that volume should have, suppose we are working in R2, where volume is area. Let A be the matrix A D Œv1 j v2 . The unit square in R2 is the parallelogram determined by the standard unit vectors e1 and e2 . 
CHAPTER 3. Determinants

In this chapter we deal with the determinant of a square matrix. The determinant has a simple geometric meaning, that of signed volume, and we use that to develop it in Section 3.1. We then present a more traditional and fuller development in Section 3.2. In Section 3.3 we derive important and useful properties of the determinant. In Section 3.4 we consider integrality questions, e.g., the question of the existence of integer (not just rational) solutions of the linear system $Ax = b$, a question best answered using determinants. In Section 3.5 we consider orientations, and see how to explain the meaning of the sign of the determinant in the case of real vector spaces. In Section 3.6 we present an interesting family of examples, the Hilbert matrices.

3.1 The geometry of volumes

The determinant of a matrix $A$ has a simple geometric meaning: it is the (signed) volume of the image of the unit cube under the linear transformation $T_A$. We will begin by doing some elementary geometry to see what properties (signed) volume should have, and use that as the basis for the not-so-simple algebraic definition. Henceforth we drop the word "signed" and just refer to volume.

In considering properties that volume should have, suppose we are working in $\mathbb R^2$, where volume is area. Let $A$ be the matrix $A = [v_1 \mid v_2]$. The unit square in $\mathbb R^2$ is the parallelogram determined by the standard unit vectors $e_1$ and $e_2$. Since $T_A(e_1) = v_1$ and $T_A(e_2) = v_2$, we are looking at the area of the parallelogram $P$ determined by $v_1$ and $v_2$, the two columns of $A$.

The area of a parallelogram should certainly have the following two properties:

(1) If we multiply one side of $P$ by a number $c$, e.g., if we replace $P$ by the parallelogram $P'$ determined by $v_1$ and $cv_2$, the area of $P'$ should be $c$ times the area of $P$.

(2) If we add a multiple of one side of $P$ to another, e.g., if we replace $P$ by the parallelogram $P'$ determined by $v_1$ and $v_2 + cv_1$, the area of $P'$ should be the same as the area of $P$. (To see this, note that the area of a parallelogram is base times height, and while this operation changes the shape of the parallelogram, it does not change its base or its height.)

Property (1) should in particular hold when $c = 0$, in which case one of the sides becomes the zero vector, the parallelogram degenerates to a line segment (or to a point if both sides are the zero vector), and a line segment or a point has area $0$.

We now consider an arbitrary field $F$, and consider $n$-by-$n$ matrices. We are still guided by properties (1) and (2), extending them to $n$-by-$n$ matrices using the idea that if only one or two columns are changed as in (1) or (2), and the other $n-1$ or $n-2$ columns are unchanged, then the volume should change as in (1) or (2). We are thus led to the following definition.

Definition 3.1.1. A volume function $\mathrm{Vol}\colon M_n(F) \to F$ is a function satisfying the properties:

(1) For any scalar $c$, and any $i$,
\[
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid cv_i \mid v_{i+1} \mid \cdots \mid v_n]\bigr) = c\,\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid v_i \mid v_{i+1} \mid \cdots \mid v_n]\bigr).
\]

(2) For any scalar $c$, and any $j \neq i$,
\[
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid v_i + cv_j \mid v_{i+1} \mid \cdots \mid v_n]\bigr) = \mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_{i-1} \mid v_i \mid v_{i+1} \mid \cdots \mid v_n]\bigr).
\]

Note that we have not shown that $\mathrm{Vol}$ exists, but we will proceed on the assumption that it does to derive properties that it must have, and we will use them to prove existence. As we have defined it, $\mathrm{Vol}$ cannot be unique, as we can scale it by an arbitrary factor. Once we specify the scale we obtain a unique function that we will denote by $\mathrm{Vol}_1$, and we will let the determinant be $\mathrm{Vol}_1$. But it is convenient to work with arbitrary volume functions and normalize the result at the end. $\mathrm{Vol}_1$ (or the determinant) will be $\mathrm{Vol}$ scaled so that the signed volume of the unit $n$-cube, with the columns arranged in the standard order, is $+1$. ♦
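The two defining properties are exactly the behavior of the familiar determinant under elementary column operations. As an illustration (a minimal sketch, assuming Python with NumPy and using its built-in determinant as the volume function), we can check properties (1) and (2) of Definition 3.1.1 on a concrete matrix:

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [4., 0., 1.]])
c = 5.0

# Property (1): scaling column i scales the volume by c.
A1 = A.copy()
A1[:, 1] *= c
assert np.isclose(np.linalg.det(A1), c * np.linalg.det(A))

# Property (2): adding c times column j to column i (j != i) preserves volume.
A2 = A.copy()
A2[:, 1] += c * A[:, 0]
assert np.isclose(np.linalg.det(A2), np.linalg.det(A))
```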
Lemma 3.1.2. (1) If some column of $A$ is zero, then $\mathrm{Vol}(A) = 0$.

(2) If the columns of $A$ are not linearly independent, then $\mathrm{Vol}(A) = 0$. In particular, if two columns of $A$ are equal, then $\mathrm{Vol}(A) = 0$.

(3) $\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_j \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = -\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_j \mid \cdots \mid v_n]\bigr)$.

(4) $\mathrm{Vol}\bigl([v_1 \mid \cdots \mid au + bw \mid \cdots \mid v_n]\bigr) = a\,\mathrm{Vol}\bigl([v_1 \mid \cdots \mid u \mid \cdots \mid v_n]\bigr) + b\,\mathrm{Vol}\bigl([v_1 \mid \cdots \mid w \mid \cdots \mid v_n]\bigr)$.

Proof. (1) Let $v_i = 0$. Then $v_i = 0v_i$, so by property (1),
\[
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = 0\,\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = 0.
\]

(2) Let $v_i = a_1v_1 + a_2v_2 + \cdots + a_{i-1}v_{i-1} + a_{i+1}v_{i+1} + \cdots + a_nv_n$. Let $v_i' = a_2v_2 + \cdots + a_{i-1}v_{i-1} + a_{i+1}v_{i+1} + \cdots + a_nv_n$, so that $v_i = a_1v_1 + v_i'$. Then, applying property (2),
\[
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = \mathrm{Vol}\bigl([v_1 \mid \cdots \mid a_1v_1 + v_i' \mid \cdots \mid v_n]\bigr) = \mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_i' \mid \cdots \mid v_n]\bigr).
\]
Proceeding in the same way, applying property (2) repeatedly, we obtain
\[
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = \mathrm{Vol}\bigl([v_1 \mid \cdots \mid 0 \mid \cdots \mid v_n]\bigr) = 0.
\]

(3) Applying property (2) three times and then property (1):
\[
\begin{aligned}
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_j \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr)
&= \mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_j \mid \cdots \mid v_j + v_i \mid \cdots \mid v_n]\bigr)\\
&= \mathrm{Vol}\bigl([v_1 \mid \cdots \mid -v_i \mid \cdots \mid v_j + v_i \mid \cdots \mid v_n]\bigr)\\
&= \mathrm{Vol}\bigl([v_1 \mid \cdots \mid -v_i \mid \cdots \mid v_j \mid \cdots \mid v_n]\bigr)\\
&= -\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_j \mid \cdots \mid v_n]\bigr).
\end{aligned}
\]

(4) First, suppose $\{v_1,\dots,v_{i-1},v_{i+1},\dots,v_n\}$ is not linearly independent. Then, by part (2), the equation in (4) becomes $0 = a \cdot 0 + b \cdot 0$, which is true.

Now for the heart of the proof. Suppose $\{v_1,\dots,v_{i-1},v_{i+1},\dots,v_n\}$ is linearly independent. By Corollary 1.2.10(1), we may extend this set to a basis $\{v_1,\dots,v_{i-1},v_{i+1},\dots,v_n,z\}$ of $F^n$. Then we may write
\[
\begin{aligned}
u &= c_1v_1 + \cdots + c_{i-1}v_{i-1} + c_{i+1}v_{i+1} + \cdots + c_nv_n + c'z,\\
w &= d_1v_1 + \cdots + d_{i-1}v_{i-1} + d_{i+1}v_{i+1} + \cdots + d_nv_n + d'z.
\end{aligned}
\]
Let $v = au + bw$. Then
\[
v = e_1v_1 + \cdots + e_{i-1}v_{i-1} + e_{i+1}v_{i+1} + \cdots + e_nv_n + e'z,
\]
where $e' = ac' + bd'$. Applying property (2) repeatedly, and property (1), we see that
\[
\begin{aligned}
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid v \mid \cdots \mid v_n]\bigr) &= e'\,\mathrm{Vol}\bigl([v_1 \mid \cdots \mid z \mid \cdots \mid v_n]\bigr),\\
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid u \mid \cdots \mid v_n]\bigr) &= c'\,\mathrm{Vol}\bigl([v_1 \mid \cdots \mid z \mid \cdots \mid v_n]\bigr),\\
\mathrm{Vol}\bigl([v_1 \mid \cdots \mid w \mid \cdots \mid v_n]\bigr) &= d'\,\mathrm{Vol}\bigl([v_1 \mid \cdots \mid z \mid \cdots \mid v_n]\bigr),
\end{aligned}
\]
yielding the lemma. □

Remark 3.1.3. Setting $v_i = v_j = z$ ($z$ arbitrary) in Lemma 3.1.2(3) gives $2\,\mathrm{Vol}([v_1 \mid \cdots \mid z \mid \cdots \mid z \mid \cdots \mid v_n]) = 0$, and hence $\mathrm{Vol}([v_1 \mid \cdots \mid z \mid \cdots \mid z \mid \cdots \mid v_n]) = 0$, provided $F$ does not have characteristic $2$. This latter vanishing condition is stronger if $\mathrm{char}(F) = 2$, and it is this stronger condition, coming directly from the geometry, that we need. ♦

Theorem 3.1.4. A function $f\colon M_n(F) \to F$ is a volume function if and only if it satisfies:

(1) Multilinearity: If $A = [v_1 \mid \cdots \mid v_n]$ with $v_i = au + bw$ for some $i$, then
\[
f\bigl([v_1 \mid \cdots \mid v_i \mid \cdots \mid v_n]\bigr) = a\,f\bigl([v_1 \mid \cdots \mid u \mid \cdots \mid v_n]\bigr) + b\,f\bigl([v_1 \mid \cdots \mid w \mid \cdots \mid v_n]\bigr).
\]

(2) Alternation: If $A = [v_1 \mid \cdots \mid v_n]$ with $v_i = v_j$ for some $i \neq j$, then
\[
f\bigl([v_1 \mid \cdots \mid v_n]\bigr) = 0.
\]

Proof. We have seen that any volume function satisfies Lemma 3.1.2(2) and (4), which give alternation and multilinearity. Conversely, it is easy to see that multilinearity and alternation give properties (1) and (2) in Definition 3.1.1. □

Remark 3.1.5. The conditions of Theorem 3.1.4 are usually taken to be the definition of a volume function. ♦

Remark 3.1.6. In characteristic $2$, the function $f\left(\begin{bmatrix} a & c\\ b & d \end{bmatrix}\right) = ac$ is multilinear and satisfies $f([v_2 \mid v_1]) = f([v_1 \mid v_2]) = -f([v_1 \mid v_2])$, but is not alternating. ♦
Theorem 3.1.7. Suppose there exists a nontrivial volume function $\mathrm{Vol}\colon M_n(F) \to F$. Then there is a unique volume function $\mathrm{Vol}_1$ satisfying $\mathrm{Vol}_1(I) = 1$. Furthermore, any volume function is $\mathrm{Vol}_a$ for some $a \in F$, where $\mathrm{Vol}_a$ is the function $\mathrm{Vol}_a(A) = a\,\mathrm{Vol}_1(A)$.

Proof. Let $A$ be a matrix with $\mathrm{Vol}(A) \neq 0$. Then, by Lemma 3.1.2(2), $A$ must be nonsingular, so there is a sequence of elementary column operations taking $A$ to $I$. By Definition 3.1.1(1) and (2), and by Lemma 3.1.2(3), each of these operations has the effect of multiplying $\mathrm{Vol}(A)$ by a nonzero scalar, so $\mathrm{Vol}(I) \neq 0$. Any scalar multiple of a volume function is a volume function, so we may obtain a volume function $\mathrm{Vol}_1$ by $\mathrm{Vol}_1(A) = (1/\mathrm{Vol}(I))\,\mathrm{Vol}(A)$, and clearly $\mathrm{Vol}_1(I) = 1$. Then set $\mathrm{Vol}_a(A) = a\,\mathrm{Vol}_1(A)$.

Now let $f$ be any volume function. Set $a = f(I)$. If $A$ is singular, then $f(A) = 0 = \mathrm{Vol}_a(A)$. Suppose $A$ is nonsingular. Then there is a sequence of column operations taking $I$ to $A$, and each of these column operations has the effect of multiplying the value of any volume function by a nonzero constant independent of the choice of volume function. Thus, if we let $b$ be the product of these constants, we have
\[
f(A) = b\,f(I) = ba = b\,\mathrm{Vol}_a(I) = \mathrm{Vol}_a(A),
\]
so $f = \mathrm{Vol}_a$. In particular, if $f$ is any volume function with $f(I) = 1$, then $f = \mathrm{Vol}_1$, which shows that $\mathrm{Vol}_1$ is unique. □

Note that the proof of this theorem does not show that $\mathrm{Vol}_1$ exists, as a priori we could choose two different sequences of elementary column operations to get from $I$ to $A$ and obtain two different values for $\mathrm{Vol}_1(A)$. In fact $\mathrm{Vol}_1$ does exist, as we now see.

Theorem 3.1.8. There is a unique volume function $\mathrm{Vol}_1\colon M_n(F) \to F$ with $\mathrm{Vol}_1(I) = 1$.

Proof. We proceed by induction on $n$. For $n = 1$ we define $\det([a]) = a$. Suppose $\det$ is defined on $(n-1)$-by-$(n-1)$ matrices. We define $\det$ on $n$-by-$n$ matrices by
\[
\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(M_{1j}),
\]
where $A = (a_{ij})$ and $M_{1j}$ is the $(n-1)$-by-$(n-1)$ matrix obtained by deleting row $1$ and column $j$ of $A$. ($M_{1j}$ is known as the $(1,j)$-minor of $A$.)

We need to check that the properties of a volume function are satisfied. Instead of checking the properties in Definition 3.1.1 directly, we will check the equivalent properties in Theorem 3.1.4, whose notation we use. We prove the properties of $\det$ by induction on $n$: we assume that $\det$ has the properties of a volume function given in Theorem 3.1.4 for $(n-1)$-by-$(n-1)$ matrices, and in particular that the conclusions of Lemma 3.1.2 hold for $\det$ on $(n-1)$-by-$(n-1)$ matrices.

We first prove multilinearity. In the notation of Theorem 3.1.4, let $v_i = au + bw$, and let $A = (a_{ij})$. Then $a_{1i} = au_1 + bw_1$, where $u_1$ and $w_1$ are the first entries of $u$ and $w$ respectively. Also, for $j \neq i$ the minor $M_{1j}$ contains the truncation of the column $v_i$, which splits accordingly. Inspecting the sum for $\det(A)$, and applying Lemma 3.1.2(4), we see that multilinearity holds.

We next prove alternation. Again following the notation of Theorem 3.1.4, let $v_i = v_j$ for some $i \neq j$. If $k \neq i$ and $k \neq j$, the minor $M_{1k}$ has two identical columns, and so by Lemma 3.1.2(2), $\det(M_{1k}) = 0$. Then, inspecting the sum for $\det(A)$, we see that it reduces to
\[
\det(A) = (-1)^{1+i} a_{1i} \det(M_{1i}) + (-1)^{1+j} a_{1j} \det(M_{1j}),
\]
with $a_{1i} = a_{1j}$. Let $i < j$. Then
\[
\begin{aligned}
M_{1i} &= [\bar v_1 \mid \cdots \mid \bar v_{i-1} \mid \bar v_{i+1} \mid \cdots \mid \bar v_{j-1} \mid \bar v_j \mid \bar v_{j+1} \mid \cdots \mid \bar v_n],\\
M_{1j} &= [\bar v_1 \mid \cdots \mid \bar v_{i-1} \mid \bar v_i \mid \bar v_{i+1} \mid \cdots \mid \bar v_{j-1} \mid \bar v_{j+1} \mid \cdots \mid \bar v_n],
\end{aligned}
\]
where $\bar v_k$ is the vector obtained from $v_k$ by deleting its first entry, and $\bar v_i = \bar v_j$. We may obtain $M_{1i}$ from $M_{1j}$ as follows: first interchange $\bar v_i$ with $\bar v_{i+1}$, then interchange $\bar v_i$ with $\bar v_{i+2}$, \dots, and finally interchange $\bar v_i$ with $\bar v_{j-1}$. There is a total of $j-i-1$ interchanges, and by Lemma 3.1.2(3) each interchange has the effect of multiplying $\det$ by $-1$, so we see that
\[
\det(M_{1i}) = (-1)^{j-i-1}\det(M_{1j}).
\]
Hence, letting $a = a_{1j}$ and $m = \det(M_{1j})$,
\[
\det(A) = (-1)^{1+i} a (-1)^{j-i-1} m + (-1)^{1+j} a m = (-1)^{j} a m \bigl(1 + (-1)\bigr) = 0.
\]

Finally, $\det([1]) = 1$, and by induction we have that $\det(I_n) = 1 \cdot \det(I_{n-1}) = 1$, where $I_n$ (respectively, $I_{n-1}$) denotes the $n$-by-$n$ (respectively, $(n-1)$-by-$(n-1)$) identity matrix. □

Definition 3.1.9. The unique volume function $\mathrm{Vol}_1$ is the determinant function, denoted $\det(A)$. ♦
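The recursion in the proof of Theorem 3.1.8 translates directly into code. The following sketch (assuming Python; the function name is our own, and this is for exposition rather than efficiency, since the recursion takes on the order of $n!$ steps) computes the determinant by expansion along the first row:

```python
def det(A):
    """Determinant via expansion along the first row (Theorem 3.1.8).

    A is a square matrix given as a list of lists.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # M_1j: delete row 1 and column j (0-indexed: row 0, column j).
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1)**j here equals (-1)**(1+j) in the 1-indexed notation above.
        total += (-1) ** j * A[0][j] * det(minor)
    return total

assert det([[1, 2], [3, 4]]) == -2
assert det([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 1
```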
Corollary 3.1.10. Let $A$ be an $n$-by-$n$ matrix. Then $\det(A) \neq 0$ if and only if $A$ is nonsingular.

Proof. By Lemma 3.1.2(2), for any volume function $\mathrm{Vol}_a$, $\mathrm{Vol}_a(A) = 0$ if $A$ is singular. For any nontrivial volume function, i.e., for any function $\mathrm{Vol}_a$ with $a \neq 0$, we observed in the course of the proof of Theorem 3.1.7 that, for any nonsingular matrix $A$, $\mathrm{Vol}_a(A) = c\,\mathrm{Vol}_a(I) = ca$ for some $c \neq 0$. □

Remark 3.1.11. Let us give a heuristic argument as to why Corollary 3.1.10 should be true, from a geometric viewpoint. Let $A = [v_1 \mid \cdots \mid v_n]$ be an $n$-by-$n$ matrix. Then $v_i = Ae_i = T_A(e_i)$, $i = 1,\dots,n$, where $I = [e_1 \mid \cdots \mid e_n]$. Thus the $n$-parallelogram $P$ spanned by the columns of $A$ is the image of the unit $n$-cube under the linear transformation $T_A$, and the determinant of $A$ is the signed volume of $P$.

If $\det(A) \neq 0$, i.e., if $P$ has nonzero volume, then the translates of $P$ "fill up" $F^n$, and so for any $w \in F^n$ there is a $v \in F^n$ with $T_A(v) = Av = w$. Thus in this case $T_A$ is onto $F^n$, and hence is an isomorphism by Corollary 1.3.2, so $A$ is invertible. If $\det(A) = 0$, i.e., if $P$ has zero volume, then it is a degenerate $n$-parallelogram, and so is a nondegenerate $k$-parallelogram for some $k < n$, and its translates only "fill up" a $k$-dimensional subspace of $F^n$. Thus in this case $T_A$ is not onto $F^n$, and hence $A$ is not invertible. ♦

Remark 3.1.12. Another well-known and important property of determinants, which we shall prove in Theorem 3.3.1, is that for any two $n$-by-$n$ matrices $A$ and $B$, $\det(AB) = \det(A)\det(B)$. Let us also give a heuristic argument as to why this should be true, again from a geometric viewpoint. But we need to change our viewpoint slightly, from a "static" one to a "dynamic" one. In the notation of Remark 3.1.11,
\[
\det\bigl([v_1 \mid \cdots \mid v_n]\bigr) = \det(A) = \det(A) \cdot 1 = \det(A)\det(I) = \det(A)\det\bigl([e_1 \mid \cdots \mid e_n]\bigr).
\]
We then think of the determinant of $A$ as the factor by which the linear transformation $T_A$ multiplies signed volume when it takes the unit $n$-cube to the $n$-parallelogram $P$. A linear transformation is homogeneous, in that it multiplies each "bit" of signed volume by the same factor. That is, if instead of starting with $I$ we start with any $n$-parallelogram $J$ and take its image $Q$ under the linear transformation $T_A$, the signed volume of $Q$ will be $\det(A)$ times the signed volume of $J$.

To apply this, we begin with the linear transformation $T_B$ and let $J$ be the $n$-parallelogram that is the image of $I$ under $T_B$. In going from $I$ to $J$, i.e., in taking the image of $I$ under $T_B$, we multiply signed volume by $\det(B)$, and in going from $J$ to $Q$, i.e., in taking the image of $J$ under $T_A$, we multiply signed volume by $\det(A)$, so in going from $I$ to $Q$, i.e., in taking the image of $I$ under $T_A \circ T_B$, we multiply signed volume by $\det(A)\det(B)$. But $T_A \circ T_B = T_{AB}$, so $T_{AB}$ takes $I$ to $Q$, and so $T_{AB}$ multiplies signed volume by $\det(AB)$. Hence $\det(AB) = \det(A)\det(B)$. ♦

Remark 3.1.13. The fact that the determinant is the factor by which linear transformations multiply signed volume is the reason for the appearance of the Jacobian in the transformation formula for multiple integrals. ♦

We have carried our argument this far in order to show that we can obtain the existence of the determinant purely from the geometric viewpoint. In the next section we present an algebraic viewpoint, which only uses our work up through Theorem 3.1.4. We use this second viewpoint to derive the results of Section 3.3. But we note that the formula for the determinant we have obtained in Theorem 3.1.8 is a special case of the Laplace expansion of Theorem 3.3.6. (The geometric viewpoint is simpler, but the algebraic viewpoint is technically more useful, which is why we present both.)
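As a quick numerical illustration of Remark 3.1.12 (a minimal sketch, assuming Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Remark 3.1.12 / Theorem 3.3.1: det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
```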
3.2 Existence and uniqueness of determinants

We now present a more traditional approach to the determinant.

Lemma 3.2.1. Let $V_{n,m} = \{\text{multilinear functions } f\colon M_{n,m}(F) \to F\}$. Then $V_{n,m}$ is a vector space of dimension $n^m$ with basis $\{f_\sigma\}$, where $\sigma\colon \{1,\dots,m\} \to \{1,\dots,n\}$ is any function and, if $A = (a_{ij})$,
\[
f_\sigma(A) = a_{\sigma(1),1}\,a_{\sigma(2),2}\cdots a_{\sigma(m),m}.
\]

Proof. We proceed by induction on $m$. Let $m = 1$. Then, by multilinearity (here, linearity in the single column), $f \in V_{n,1}$ is given by
\[
f\left(\begin{bmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{n1} \end{bmatrix}\right) = f\bigl(a_{11}e_1 + a_{21}e_2 + \cdots + a_{n1}e_n\bigr) = c_{11}a_{11} + \cdots + c_{n1}a_{n1},
\]
where $c_{11} = f(e_1),\dots,c_{n1} = f(e_n)$, and the lemma holds.

Now for the inductive step. Assume the lemma holds for $m$ and consider $f \in V_{n,m+1}$. Let $A \in M_{n,m+1}$ and write $A'$ for the $n$-by-$m$ submatrix of $A$ consisting of the first $m$ columns of $A$. Then, by multilinearity,
\[
f\left(\left[A' \,\middle|\, \begin{matrix} a_{1,m+1}\\ \vdots\\ a_{n,m+1} \end{matrix}\right]\right) = a_{1,m+1}f\bigl([A' \mid e_1]\bigr) + \cdots + a_{n,m+1}f\bigl([A' \mid e_n]\bigr).
\]
But for each $i$, $g_i(A') = f([A' \mid e_i])$ is a multilinear function on $n$-by-$m$ matrices, so by induction $g_i(A') = \sum_{\sigma'} c_{\sigma',i} f_{\sigma'}(A')$, where $\sigma'\colon \{1,\dots,m\} \to \{1,\dots,n\}$, and so we see that
\[
f(A) = \sum_{i=1}^n \sum_{\sigma'} c_{\sigma',i}\, a_{\sigma'(1),1}\cdots a_{\sigma'(m),m}\,a_{i,m+1} = \sum_\sigma c_\sigma\, a_{\sigma(1),1}\cdots a_{\sigma(m+1),m+1},
\]
where $\sigma\colon \{1,\dots,m+1\} \to \{1,\dots,n\}$ is given by $\sigma(k) = \sigma'(k)$ for $1 \le k \le m$ and $\sigma(m+1) = i$, and the lemma holds. □

We now specialize to the case $m = n$. In this case $\mathrm{Vol}$, being a multilinear function, is a linear combination of the basis elements $f_\sigma$. We have not used the condition of alternation yet. We do so now, in two stages.

Given $\sigma_0$, we let $P_{\sigma_0}$ be the $n$-by-$n$ matrix defined by $P_{\sigma_0} = (p_{ij})$, where $p_{ij} = 1$ if $i = \sigma_0(j)$ and $p_{ij} = 0$ if $i \neq \sigma_0(j)$. $P_{\sigma_0}$ has exactly one nonzero entry in each column: an entry of $1$ in row $\sigma_0(j)$ of column $j$. We then observe that if
\[
f(A) = \sum_\sigma c_\sigma\, a_{\sigma(1),1}\cdots a_{\sigma(n),n},
\]
then $f(P_{\sigma_0}) = c_{\sigma_0}$. For if $\sigma = \sigma_0$, then each factor $p_{\sigma(j),j}$ is $1$, so the product is $1$; but if $\sigma \neq \sigma_0$, then some factor $p_{\sigma(j),j}$ is $0$, so the product is $0$.

Lemma 3.2.2. Let $f \in V_{n,n}$ be alternating and write
\[
f(A) = \sum_\sigma c_\sigma\, a_{\sigma(1),1}\cdots a_{\sigma(n),n},
\]
where $\sigma\colon \{1,\dots,n\} \to \{1,\dots,n\}$. If $\sigma_0$ is not 1-to-1, then $c_{\sigma_0} = 0$.

Proof. Suppose $\sigma_0$ is not 1-to-1. As we have observed, $f(P_{\sigma_0}) = c_{\sigma_0}$. But in this case $P_{\sigma_0}$ is a matrix with two identical columns (columns $j_1$ and $j_2$, where $\sigma_0(j_1) = \sigma_0(j_2)$), so by the definition of alternation, $f(P_{\sigma_0}) = 0$. □

We restrict our attention to 1-to-1 functions $\sigma\colon \{1,\dots,n\} \to \{1,\dots,n\}$. We denote the set of such functions by $S_n$, and elements of this set by $\sigma$. $S_n$ forms a group under composition of functions, as any $\sigma \in S_n$ is invertible. $S_n$ is known as the symmetric group, and $\sigma \in S_n$ is a permutation. (We think of $\sigma$ as giving a reordering of $\{1,\dots,n\}$ as $\{\sigma(1),\dots,\sigma(n)\}$.)

We now cite some algebraic facts without proof. A transposition is an element of $S_n$ that interchanges two elements of $\{1,\dots,n\}$ and leaves all the others fixed. (More formally, $\tau \in S_n$ is a transposition if for some $1 \le i \neq j \le n$, $\tau(i) = j$, $\tau(j) = i$, and $\tau(k) = k$ for $k \neq i,j$.) Every element of $S_n$ can be written as a product (i.e., composition) of transpositions. If $\sigma$ is the product of $t$ transpositions, we define its sign by $\mathrm{sign}(\sigma) = (-1)^t$.
Though $t$ is not well-defined, $\mathrm{sign}(\sigma)$ is well-defined; i.e., if $\sigma$ is written as a product of $t_1$ transpositions and as a product of $t_2$ transpositions, then $t_1 \equiv t_2 \pmod 2$.

Lemma 3.2.3. Let $f \in V_{n,n}$ be alternating and write
\[
f(A) = \sum_{\sigma \in S_n} c_\sigma\, a_{\sigma(1),1}\cdots a_{\sigma(n),n}.
\]
Then $f(P_{\sigma_0}) = \mathrm{sign}(\sigma_0)\,f(I)$.

Proof. The matrix $P_{\sigma_0}$ is obtained by starting with $I$ and performing $t$ interchanges of pairs of columns, where $\sigma_0$ is the product of $t$ transpositions, and the only term in the sum that contributes is the one with $\sigma = \sigma_0$, so the lemma follows from Lemma 3.1.2(3). □

Theorem 3.2.4. Any multilinear, alternating function $\mathrm{Vol}\colon M_n(F) \to F$ is given by
\[
\mathrm{Vol}(A) = \mathrm{Vol}_a(A) = a\left(\sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots a_{\sigma(n),n}\right)
\]
for some $a \in F$, and every function defined in this way is multilinear and alternating.

Proof. We have essentially already shown the first part. Let $a = f(I)$. Then by Lemma 3.2.3, for every $\sigma \in S_n$, $c_\sigma = a\,\mathrm{sign}(\sigma)$.

It clearly suffices to verify the second part when $a = 1$. Suppose $A = [v_1 \mid \cdots \mid v_n]$ and $v_i = v_i' + v_i''$. Let
\[
v_i = \begin{bmatrix} a_{1i}\\ \vdots\\ a_{ni} \end{bmatrix},
\quad
v_i' = \begin{bmatrix} b_{1i}\\ \vdots\\ b_{ni} \end{bmatrix},
\quad\text{and}\quad
v_i'' = \begin{bmatrix} c_{1i}\\ \vdots\\ c_{ni} \end{bmatrix},
\]
so $a_{ki} = b_{ki} + c_{ki}$. Then
\[
\begin{aligned}
\sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots a_{\sigma(i),i}\cdots a_{\sigma(n),n}
&= \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots \bigl(b_{\sigma(i),i} + c_{\sigma(i),i}\bigr)\cdots a_{\sigma(n),n}\\
&= \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots b_{\sigma(i),i}\cdots a_{\sigma(n),n}\\
&\quad + \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots c_{\sigma(i),i}\cdots a_{\sigma(n),n},
\end{aligned}
\]
showing multilinearity.

Suppose columns $i$ and $j$ of $A$ are equal, and let $\tau \in S_n$ be the transposition that interchanges $i$ and $j$. To every $\sigma \in S_n$ we can associate $\sigma' = \sigma\tau \in S_n$, and $\sigma$ is associated to $\sigma'$, as $\tau^2$ is the identity and hence $\sigma = \sigma\tau^2 = \sigma'\tau$. Write this association as $\sigma \leftrightarrow \sigma'$. Then
\[
\begin{aligned}
&\sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots a_{\sigma(i),i}\cdots a_{\sigma(j),j}\cdots a_{\sigma(n),n}\\
&\quad = \sum_{\sigma \leftrightarrow \sigma'} \Bigl[\mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots a_{\sigma(i),i}\cdots a_{\sigma(j),j}\cdots a_{\sigma(n),n} + \mathrm{sign}(\sigma')\, a_{\sigma'(1),1}\cdots a_{\sigma'(i),i}\cdots a_{\sigma'(j),j}\cdots a_{\sigma'(n),n}\Bigr].
\end{aligned}
\]
But $\mathrm{sign}(\sigma) = -\mathrm{sign}(\sigma')$, and the two products of matrix entries are equal because columns $i$ and $j$ of $A$ are identical, so the terms cancel in pairs and the sum is $0$, showing alternation. □

Definition 3.2.5. The function $\det\colon M_n(F) \to F$ given by
\[
\det(A) = \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots a_{\sigma(n),n}
\]
is the determinant function. ♦
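Definition 3.2.5 can be implemented essentially verbatim. In the following sketch (assuming Python; the helper perm_sign computes the sign via the inversion count, whose parity agrees with the parity of the number of transpositions in any factorization), we simply sum over all of $S_n$:

```python
import math
from itertools import permutations

def perm_sign(sigma):
    """Sign of a permutation: (-1) raised to the number of inversions."""
    inversions = sum(1 for i in range(len(sigma))
                     for j in range(i + 1, len(sigma))
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Definition 3.2.5: sum of sign(sigma) * a_{sigma(1),1} ... a_{sigma(n),n}."""
    n = len(A)
    return sum(perm_sign(sigma) * math.prod(A[sigma[i]][i] for i in range(n))
               for sigma in permutations(range(n)))

assert det_leibniz([[1, 2], [3, 4]]) == -2
```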
3.3 Further properties

We now derive some important properties of the determinant.

Theorem 3.3.1. Let $A, B \in M_n(F)$. Then
\[
\det(AB) = \det(A)\det(B).
\]

Proof. Define a function $f\colon M_n(F) \to F$ by $f(B) = \det(AB)$. It is straightforward to check that $f$ is multilinear and alternating, so $f$ is a volume function: $f(B) = \mathrm{Vol}_a(B) = a\det(B)$, where $a = f(I) = \det(AI) = \det(A)$. □

Corollary 3.3.2. (1) $\det(A) \neq 0$ if and only if $A$ is invertible.

(2) If $A$ is invertible, then $\det(A^{-1}) = 1/\det(A)$. Furthermore, for any matrix $B$, $\det(ABA^{-1}) = \det(B)$.

Proof. We have already seen in Lemma 3.1.2 that for any volume function $f$, $f(A) = 0$ if $A$ is not invertible. If $A$ is invertible we have $1 = \det(I) = \det(AA^{-1}) = \det(A)\det(A^{-1})$, from which the corollary follows. □

Lemma 3.3.3. (1) Let $A$ be a diagonal matrix. Then $\det(A)$ is the product of its diagonal entries.

(2) More generally, let $A$ be an upper triangular, or a lower triangular, matrix. Then $\det(A)$ is the product of its diagonal entries.

Proof. (1) If $A$ is diagonal, then there is only one nonzero term in Definition 3.2.5, the term corresponding to the identity permutation ($\sigma(i) = i$ for every $i$), which has sign $+1$.

(2) If $\sigma$ is not the identity, then there is a $j$ with $\sigma(j) < j$, and a $k$ with $\sigma(k) > k$, so for a triangular matrix there is again only the diagonal term. □

Theorem 3.3.4. (1) Let $M$ be a block diagonal matrix,
\[
M = \begin{bmatrix} A & 0\\ 0 & D \end{bmatrix}.
\]
Then $\det(M) = \det(A)\det(D)$.

(2) More generally, let $M$ be a block upper triangular or a block lower triangular matrix,
\[
M = \begin{bmatrix} A & B\\ 0 & D \end{bmatrix}
\quad\text{or}\quad
M = \begin{bmatrix} A & 0\\ C & D \end{bmatrix}.
\]
Then $\det(M) = \det(A)\det(D)$.

Proof. (1) Define a function $f\colon M_n(F) \to F$ by
\[
f(D) = \det\left(\begin{bmatrix} A & 0\\ 0 & D \end{bmatrix}\right).
\]
Then $f$ is multilinear and alternating, so $f(D) = f(I)\det(D)$. But $f(I) = \det\left(\begin{bmatrix} A & 0\\ 0 & I \end{bmatrix}\right) = \det(A)$. (This last equality is easy to see, as any permutation that contributes a nonzero term to $\det\left(\begin{bmatrix} A & 0\\ 0 & I \end{bmatrix}\right)$ must fix all but (possibly) the first $n$ entries, where $A$ is $n$-by-$n$.)

(2) Suppose $M$ is block upper triangular (the block lower triangular case is similar). If $A$ is singular, then there is a vector $v \neq 0$ with $Av = 0$. Let $w$ be the vector whose first $n$ entries are those of $v$ and whose remaining entries are $0$. Then $Mw = 0$, so $M$ is singular as well, and $\det(M) = 0 = 0 \cdot \det(D)$. Suppose that $A$ is nonsingular. Then
\[
\begin{bmatrix} A & B\\ 0 & D \end{bmatrix} = \begin{bmatrix} A & 0\\ 0 & D \end{bmatrix}\begin{bmatrix} I & A^{-1}B\\ 0 & I \end{bmatrix}.
\]
The first matrix on the right-hand side has determinant $\det(A)\det(D)$, and the second matrix on the right-hand side has determinant $1$, as it is upper triangular with all diagonal entries equal to $1$, and the theorem follows. □

Lemma 3.3.5. Let ${}^tA$ be the matrix obtained from $A$ by interchanging the rows and columns of $A$. Then $\det({}^tA) = \det(A)$.

Proof. For any $\sigma \in S_n$, $\mathrm{sign}(\sigma^{-1}) = \mathrm{sign}(\sigma)$. Let $B = (b_{ij}) = {}^tA$, so $b_{ij} = a_{ji}$. Then
\[
\begin{aligned}
\det(A) &= \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1}\cdots a_{\sigma(n),n}
= \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{1,\sigma^{-1}(1)}\cdots a_{n,\sigma^{-1}(n)}\\
&= \sum_{\sigma^{-1} \in S_n} \mathrm{sign}(\sigma^{-1})\, b_{\sigma^{-1}(1),1}\cdots b_{\sigma^{-1}(n),n}
= \det({}^tA). \qquad\square
\end{aligned}
\]
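A quick numerical check of Theorem 3.3.4(2) (a minimal sketch, assuming Python with NumPy; the block sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 3))
D = rng.standard_normal((3, 3))
M = np.block([[A, B],
              [np.zeros((3, 2)), D]])   # block upper triangular

assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(D))
```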
Let $A_{ij}$ denote the $(i,j)$-minor of the matrix $A$, the submatrix obtained by deleting row $i$ and column $j$ of $A$.

Theorem 3.3.6 (Laplace expansion). Let $A$ be an $n$-by-$n$ matrix, $A = (a_{ij})$.

(1) For any $i$,
\[
\det(A) = \sum_{j=1}^n (-1)^{i+j} a_{ij} \det(A_{ij}).
\]

(2) For any $j$,
\[
\det(A) = \sum_{i=1}^n (-1)^{i+j} a_{ij} \det(A_{ij}).
\]

(3) For any $i$, and for any $k \neq i$,
\[
0 = \sum_{j=1}^n (-1)^{i+j} a_{kj} \det(A_{ij}).
\]

(4) For any $j$, and for any $k \neq j$,
\[
0 = \sum_{i=1}^n (-1)^{i+j} a_{ik} \det(A_{ij}).
\]

Proof. We prove (1) and (3) simultaneously, so we fix $k$ (which may or may not equal $i$). The sum on the right-hand side is a sum of multilinear functions, so is itself multilinear. (This is also easy to see directly.) We now show it is alternating. Let $A$ be a matrix with columns $p$ and $q$ equal, where $1 \le p < q \le n$. If $j \neq p, q$, then $A_{ij}$ is a matrix with two columns equal, so $\det(A_{ij}) = 0$. Thus the only two terms that contribute to the sum are
\[
(-1)^{i+p} a_{kp} \det(A_{ip}) + (-1)^{i+q} a_{kq} \det(A_{iq}).
\]
By hypothesis, $a_{kq} = a_{kp}$. Now
\[
\begin{aligned}
A_{ip} &= [v_1 \mid \cdots \mid v_{p-1} \mid v_{p+1} \mid \cdots \mid v_{q-1} \mid v_q \mid v_{q+1} \mid \cdots \mid v_n],\\
A_{iq} &= [v_1 \mid \cdots \mid v_{p-1} \mid v_p \mid v_{p+1} \mid \cdots \mid v_{q-1} \mid v_{q+1} \mid \cdots \mid v_n],
\end{aligned}
\]
where $v_m$ denotes column $m$ of the matrix obtained from $A$ by deleting row $i$ of $A$. By hypothesis, $v_p = v_q$, so these two matrices have the same columns, but in a different order. We get from the first of these to the second by successively performing $q-p-1$ column interchanges (first switching $v_q$ and $v_{q-1}$, then switching $v_q$ and $v_{q-2}$, \dots, and finally switching $v_q$ and $v_{p+1}$), so $\det(A_{iq}) = (-1)^{q-p-1}\det(A_{ip})$. Thus we see that the contribution of these two terms to the sum is
\[
(-1)^{i+p} a_{kp} \det(A_{ip}) + (-1)^{i+q} a_{kp} (-1)^{q-p-1} \det(A_{ip}),
\]
and since $(-1)^{i+p}$ and $(-1)^{i+2q-p-1}$ always have opposite signs, they cancel.

By our uniqueness result, the right-hand side is a multiple $a\det(A)$ for some $a$. A computation shows that if $A = I$, the right-hand side gives $1$ if $k = i$ and $0$ if $k \neq i$, proving the theorem in these cases. For cases (2) and (4), using the fact that $\det(B) = \det({}^tB)$ for any matrix $B$, we can take the transpose of these formulas and use cases (1) and (3). □

Remark 3.3.7. Theorem 3.3.6(1) (respectively, (2)) is known as expansion by minors of the $i$th row (respectively, of the $j$th column). ♦

Definition 3.3.8. The classical adjoint of $A$ is the matrix $\mathrm{Adj}(A)$ defined by $\mathrm{Adj}(A) = (b_{ij})$, where $b_{ij} = (-1)^{i+j}\det(A_{ji})$. ♦

Note carefully the subscript in the definition: it is $A_{ji}$, as written, not $A_{ij}$.

Corollary 3.3.9. (1) For any matrix $A$,
\[
\mathrm{Adj}(A)\,A = A\,\mathrm{Adj}(A) = \det(A)\,I.
\]

(2) If $A$ is invertible,
\[
A^{-1} = \frac{1}{\det(A)}\,\mathrm{Adj}(A).
\]

Proof. (1) can be verified by a computation that follows directly from Theorem 3.3.6. Then (2) follows immediately. □

Remark 3.3.10. We have given the formula in Corollary 3.3.9(2) for its theoretical interest (and we shall see some applications of it later), but as a practical matter it should almost never be used to find the inverse of a matrix. ♦

Corollary 3.3.11 (Cramer's rule). Let $A$ be an invertible $n$-by-$n$ matrix and let $b$ be a vector in $F^n$. Let $x$ be the unique vector in $F^n$ with $Ax = b$. Write $x = (x_1,\dots,x_n)^t$. Then, for $1 \le i \le n$,
\[
x_i = \det\bigl(A_i(b)\bigr)/\det(A),
\]
where $A_i(b)$ is the matrix obtained from $A$ by replacing its $i$th column by $b$.

Proof. Let the columns of $A$ be $a_1,\dots,a_n$. By linearity, it suffices to prove the corollary for all elements of any basis $\mathcal B$ of $F^n$. We choose the basis $\mathcal B = \{a_1,\dots,a_n\}$. Fix $i$ and consider $Ax = a_i$. Then $A_i(a_i) = A$, so the above formula gives $x_i = 1$. For $j \neq i$, $A_j(a_i)$ is a matrix with two identical columns, so the above formula gives $x_j = 0$. Thus $x = e_i$, the $i$th standard basis vector, and indeed $Ae_i = a_i$. □

Remark 3.3.12. Again, this formula is of theoretical interest but should almost never be used in practice. ♦
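Corollary 3.3.11 is easy to implement, although, as Remark 3.3.12 says, it should not be used in practice; the following sketch (assuming Python with NumPy) is for exposition only:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (Corollary 3.3.11)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                     # A_i(b): replace column i by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2., 1.], [1., 3.]])
b = np.array([3., 5.])
assert np.allclose(cramer(A, b), np.linalg.solve(A, b))
```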
Here is a familiar result from elementary linear algebra.

Definition 3.3.13. If the matrix $A$ has a $k$-by-$k$ submatrix with nonzero determinant, but does not have a $(k+1)$-by-$(k+1)$ submatrix with nonzero determinant, then the determinantal rank of $A$ is $k$. ♦

Theorem 3.3.14. Let $A$ be a matrix. Then the row rank, column rank, and determinantal rank of $A$ are all equal.

Proof. We showed that the row rank and column rank of $A$ are equal in Theorem 2.4.7. We now show that the column rank of $A$ is equal to the determinantal rank of $A$. Write $A = [v_1 \mid \cdots \mid v_n]$, where $A$ is $m$-by-$n$.

Let $A$ have a $k$-by-$k$ submatrix $B$ with nonzero determinant. For simplicity, we assume that $B$ is the upper left-hand corner of $A$. Let $\pi\colon F^m \to F^k$ be defined by
\[
\pi\left(\begin{bmatrix} a_1\\ \vdots\\ a_m \end{bmatrix}\right) = \begin{bmatrix} a_1\\ \vdots\\ a_k \end{bmatrix}.
\]
Then $B = [\pi(v_1) \mid \cdots \mid \pi(v_k)]$. Since $\det(B) \neq 0$, $B$ is nonsingular, so $\{\pi(v_1),\dots,\pi(v_k)\}$ is linearly independent, and hence $\{v_1,\dots,v_k\}$ is linearly independent. But then this set spans a $k$-dimensional subspace of the column space of $A$, so $A$ has column rank at least $k$.

On the other hand, suppose $A$ has $k$ linearly independent columns. Again, for simplicity, suppose these are the leftmost $k$ columns of $A$. Now $\{v_1,\dots,v_k\}$ is linearly independent and $\{e_1,\dots,e_m\}$ spans $F^m$, so $\{v_1,\dots,v_k,e_1,\dots,e_m\}$ spans $F^m$ as well. Then, by Theorem 1.2.9, there is a basis $\mathcal B$ of $F^m$ with $\{v_1,\dots,v_k\} \subseteq \mathcal B \subseteq \{v_1,\dots,v_k,e_1,\dots,e_m\}$. Write $\mathcal B = \{v_1,\dots,v_k,v_{k+1},\dots,v_m\}$ and note that, for each $i \ge k+1$, $v_i = e_j$ for some $j$. Form the matrix $B' = [v_1 \mid \cdots \mid v_k \mid v_{k+1} \mid \cdots \mid v_m]$ and note that $\det(B') \neq 0$. Expand by minors of columns $m, m-1, \dots, k+1$ to obtain $0 \neq \det(B') = \pm\det(B)$, where $B$ is a $k$-by-$k$ submatrix of $A$, so $A$ has determinantal rank at least $k$. □

We have defined the determinant for matrices. We can also define the determinant for linear transformations $T\colon V \to V$, where $V$ is a finite-dimensional vector space.

Definition 3.3.15. Let $T\colon V \to V$ be a linear transformation, with $V$ a finite-dimensional vector space. The determinant $\det(T)$ is defined to be $\det(T) = \det\bigl([T]_{\mathcal B}\bigr)$, where $\mathcal B$ is any basis of $V$. ♦

To see that this is well-defined we have to know that it is independent of the choice of the basis $\mathcal B$. That follows immediately from Corollary 2.3.11 and Corollary 3.3.2(2).

We have defined the general linear groups $GL_n(F)$ and $GL(V)$ in Definition 1.1.29.

Lemma 3.3.16. $GL_n(F) = \{A \in M_n(F) \mid \det(A) \neq 0\}$. For $V$ finite dimensional,
\[
GL(V) = \{T\colon V \to V \mid \det(T) \neq 0\}.
\]

Proof. Immediate from Corollary 3.3.2. □

We can now make a related definition.

Definition 3.3.17. The special linear group $SL_n(F)$ is the group
\[
SL_n(F) = \{A \in GL_n(F) \mid \det(A) = 1\}.
\]
For $V$ finite dimensional,
\[
SL(V) = \{T \in GL(V) \mid \det(T) = 1\}. \qquad♦
\]

Theorem 3.3.18. (1) $SL_n(F)$ is a normal subgroup of $GL_n(F)$.

(2) For $V$ finite dimensional, $SL(V)$ is a normal subgroup of $GL(V)$.

Proof. $SL_n(F)$ is the kernel of the homomorphism $\det\colon GL_n(F) \to F^*$, and similarly for $SL(V)$. (By Theorem 3.3.1, $\det$ is a homomorphism.) Here $F^*$ denotes the multiplicative group of nonzero elements of $F$. □

3.4 Integrality

While we almost exclusively work over a field, it is natural to ask the question of integrality, and we consider that here. Let $R$ be an integral domain with quotient field $F$. An element $u$ of $R$ is a unit if there is an element $v$ of $R$ with $uv = vu = 1$. (The reader unfamiliar with quotient fields can simply take $R = \mathbb Z$ and $F = \mathbb Q$, and note that the units of $\mathbb Z$ are $\pm 1$.)

Theorem 3.4.1. Let $A$ be an $n$-by-$n$ matrix with entries in $R$ and suppose that it is invertible, considered as a matrix with entries in $F$. The following are equivalent:

(1) $A^{-1}$ has entries in $R$.

(2) $\det(A)$ is a unit in $R$.

(3) For every vector $b$ all of whose entries are in $R$, the unique solution of $Ax = b$ is a vector all of whose entries are in $R$.

Proof. First we show that (1) and (3) are equivalent, and then we show that (1) and (2) are equivalent.

Suppose (1) is true. Then the solution of $Ax = b$ is $x = A^{-1}b$, whose entries are in $R$. Conversely, suppose (3) is true. Let $Ax_i = e_i$, $i = 1,\dots,n$, where $\{e_i\}$ is the set of standard unit vectors in $F^n$. Form the matrix $B = [x_1 \mid x_2 \mid \cdots \mid x_n]$. Then $B$ is a matrix all of whose entries are in $R$, and $AB = I$, so $B = A^{-1}$ by Corollary 1.3.3.

Suppose (1) is true. Let $\det(A) = u$ and $\det(A^{-1}) = v$. Then $u$ and $v$ are elements of $R$ and $uv = \det(A)\det(A^{-1}) = \det(I) = 1$, so $u$ is a unit in $R$. Conversely, suppose (2) is true, so $\det(A) = u$ is a unit in $R$. Let $uv = 1$ with $v \in R$, so $v = 1/u$. Then Corollary 3.3.9(2) shows that all of the entries of $A^{-1}$ are in $R$. □
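For $R = \mathbb Z$, Theorem 3.4.1 can be illustrated with the classical adjoint of Corollary 3.3.9 (a minimal sketch, assuming Python with NumPy; the rounding is there only to clear floating-point error in the minors):

```python
import numpy as np

def adjugate(A):
    """Classical adjoint (Definition 3.3.8): entry (i, j) is
    (-1)**(i+j) * det(A_ji), computed here for an integer matrix."""
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * round(np.linalg.det(minor))
    return adj

A = np.array([[2, 1],
              [5, 3]])                  # det(A) = 1, a unit in Z
assert round(np.linalg.det(A)) == 1
assert (A @ adjugate(A) == np.eye(2, dtype=int)).all()   # integer inverse
```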
Remark 3.4.2. Let $A$ be an $n$-by-$n$ matrix with entries in $R$ and suppose that $A$ is invertible, considered as a matrix with entries in $F$. Let $d = \det(A)$.

(1) If $b$ is a vector in $R^n$ all of whose entries are divisible by $d$, then $x = A^{-1}b$, the unique solution of $Ax = b$, has all its entries in $R$.

(2) This condition on the entries of $b$ is sufficient but not necessary. It is possible to have a vector $b$ whose entries are not all divisible by $d$ with the solution of $Ax = b$ having all its entries in $R$. For example, let $R = \mathbb Z$ and take
\[
A = \begin{bmatrix} 1 & 1\\ 1 & 3 \end{bmatrix},
\]
a matrix of determinant $2$. Then $Ax = \begin{bmatrix} 1\\ 1 \end{bmatrix}$ has solution $x = \begin{bmatrix} 1\\ 0 \end{bmatrix}$. (By Theorem 3.4.1, if $d$ is not a unit, this is not possible for all $b$.) ♦

We can now generalize the definitions of $GL_n(F)$ and $SL_n(F)$.

Definition 3.4.3. The general linear group $GL_n(R)$ is defined by
\[
GL_n(R) = \{A \in M_n(R) \mid A \text{ has an inverse in } M_n(R)\}. \qquad♦
\]

Corollary 3.4.4.
\[
GL_n(R) = \{A \in M_n(R) \mid \det(A) \text{ is a unit in } R\}.
\]

Definition 3.4.5. The special linear group $SL_n(R)$ is defined by
\[
SL_n(R) = \{A \in GL_n(R) \mid \det(A) = 1\}. \qquad♦
\]

Lemma 3.4.6. $SL_n(R)$ is a normal subgroup of $GL_n(R)$.

Proof. $SL_n(R)$ is the kernel of the determinant homomorphism. □

Remark 3.4.7. If $R = \mathbb Z$, the units in $R$ are $\{\pm 1\}$. Thus $SL_n(\mathbb Z)$ is a subgroup of index $2$ of $GL_n(\mathbb Z)$. ♦

It follows from our previous work that for any nonzero vector $v \in F^n$ there is an invertible matrix $A$ with $Ae_1 = v$ (where $e_1$ is the first vector in the standard basis of $F^n$). One can ask the same question over the integers: Given a nonzero vector $v \in \mathbb Z^n$, is there a matrix $A$ with integer entries, invertible as an integer matrix, with $Ae_1 = v$? There is an obvious necessary condition, that the entries of $v$ be relatively prime. This condition turns out to be sufficient. We prove a slightly more precise result.

Theorem 3.4.8. Let $n \ge 2$ and let $v = (a_1,\dots,a_n)^t$ be a nonzero vector with integral entries. Let $d = \gcd(a_1,\dots,a_n)$. Then there is a matrix $A \in SL_n(\mathbb Z)$ with $A(de_1) = v$.

Proof. We proceed by induction on $n$. We begin with $n = 2$. If $d = \gcd(a_1,a_2)$, let $a_1' = a_1/d$ and $a_2' = a_2/d$. Then there are integers $p$ and $q$ with $a_1'p + a_2'q = 1$. Set
\[
A = \begin{bmatrix} a_1' & -q\\ a_2' & p \end{bmatrix}.
\]

Suppose the theorem is true for $n-1$, and consider $v \in \mathbb Z^n$. It is easy to see that the theorem is true if $a_1 = \cdots = a_{n-1} = 0$, so suppose not. Let $d_0 = \gcd(a_1,\dots,a_{n-1})$. Then $d = \gcd(d_0,a_n)$. By the proof of the $n = 2$ case, there is an $n$-by-$n$ matrix $A_1$ with
\[
A_1(de_1) = (d_0, 0, \dots, 0, a_n)^t.
\]
($A_1$ has suitable entries in its "corners" and an $(n-2)$-by-$(n-2)$ identity matrix in its "middle".) By the inductive assumption, there is an $n$-by-$n$ matrix $A_2$ with
\[
A_2\bigl((d_0, 0, \dots, 0, a_n)^t\bigr) = (a_1, \dots, a_n)^t.
\]
($A_2$ is a block diagonal matrix with a suitable $(n-1)$-by-$(n-1)$ matrix in its upper left-hand corner and an entry of $1$ in its lower right-hand corner.) Set $A = A_2A_1$. □

Corollary 3.4.9. Let $n \ge 2$ and let $v = (a_1,\dots,a_n)^t$ be a nonzero vector with integer entries, and suppose that $\{a_1,\dots,a_n\}$ is relatively prime. Then there is a matrix $A \in SL_n(\mathbb Z)$ whose first column is $v$.

Proof. $A$ is the matrix constructed in the proof of Theorem 3.4.8 (here $d = 1$, so $Ae_1 = v$). □
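The $n = 2$ step of Theorem 3.4.8 is effective: the extended Euclidean algorithm produces the integers $p$ and $q$. A minimal sketch (assuming Python; the function names are our own):

```python
def extended_gcd(a, b):
    """Return (g, p, q) with a*p + b*q = g = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, p, q = extended_gcd(b, a % b)
    return (g, q, p - (a // b) * q)

def sl2_for(a1, a2):
    """Theorem 3.4.8 for n = 2: return d = gcd(a1, a2) and A in SL_2(Z)
    with A(d e_1) = (a1, a2)^t."""
    d, p, q = extended_gcd(a1, a2)       # a1*p + a2*q = d
    a1p, a2p = a1 // d, a2 // d          # so a1'*p + a2'*q = 1
    A = [[a1p, -q],
         [a2p, p]]                       # det(A) = a1'*p + a2'*q = 1
    return d, A

d, A = sl2_for(6, 10)                    # d = 2, A = [[3, 1], [5, 2]]
assert A[0][0] * A[1][1] - A[0][1] * A[1][0] == 1
assert (A[0][0] * d, A[1][0] * d) == (6, 10)
```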
Let $\mathbb Z/N\mathbb Z$ denote the ring of integers mod $N$. We have the map $\mathbb Z \to \mathbb Z/N\mathbb Z$ given by $a \mapsto a \pmod N$. This induces a map on matrices as well.

Theorem 3.4.10. For every $n \ge 1$, the map $\varphi\colon SL_n(\mathbb Z) \to SL_n(\mathbb Z/N\mathbb Z)$ given by the reduction of entries $\pmod N$ is an epimorphism.

Proof. We prove the theorem by induction on $n$. For $n = 1$ it is obvious. Suppose $n > 1$. Let $\overline M \in SL_n(\mathbb Z/N\mathbb Z)$ be arbitrary. Then there is certainly a matrix $M$ with integer entries with $\varphi(M) = \overline M$, and then $\det(M) \equiv 1 \pmod N$. But this is not good enough. We need $\det(M) = 1$.

Let $v_1 = (a_1,\dots,a_n)^t$ be the first column of $M$. Then $\overline M \in SL_n(\mathbb Z/N\mathbb Z)$ implies $\gcd(a_1,\dots,a_n,N) = 1$. Let $d = \gcd(a_1,\dots,a_n)$. Then $d$ and $N$ are relatively prime. By Theorem 3.4.8, there is a matrix $A \in SL_n(\mathbb Z)$ such that $AM$ has first column $(d, 0, \dots, 0)^t$; write its first row as $(d, w_2, \dots, w_n)$.

If $d = 1$ we may set $M_1 = M$, $B = I$, and $P = AM = BAM_1$. Otherwise, let $L$ be the matrix with an entry of $N$ in the $(2,1)$ position and all other entries $0$. Let $M_1 = M + A^{-1}L$. Then $AM_1 = AM + L$ has first column $(d, N, 0, \dots, 0)^t$ and first row $(d, w_2, \dots, w_n)$, and $M_1 \equiv M \pmod N$.

As in the proof of Theorem 3.4.8, we choose integers $p$ and $q$ with $dp + Nq = 1$. Let $E$ be the 2-by-2 matrix
\[
E = \begin{bmatrix} p & q\\ -N & d \end{bmatrix}
\]
and let $B$ be the $n$-by-$n$ block matrix
\[
B = \begin{bmatrix} E & 0\\ 0 & I \end{bmatrix}.
\]
Then $P = BAM_1$ has first column $(1, 0, \dots, 0)^t$, so we may write $P$ as a block matrix
\[
P = \begin{bmatrix} 1 & X\\ 0 & U \end{bmatrix}.
\]
Then $\det(P) \equiv \det(M) \equiv 1 \pmod N$, so $\det(U) \equiv 1 \pmod N$. $U$ is an $(n-1)$-by-$(n-1)$ matrix, so by the inductive hypothesis there is a matrix $V \in SL_{n-1}(\mathbb Z)$ with $V \equiv U \pmod N$. Set
\[
Q = \begin{bmatrix} 1 & X\\ 0 & V \end{bmatrix}.
\]
Then $Q \in SL_n(\mathbb Z)$ and
\[
Q \equiv P = BAM_1 \equiv BAM \pmod N.
\]
Thus $R = (BA)^{-1}Q \in SL_n(\mathbb Z)$ and $R \equiv M \pmod N$; i.e., $\varphi(R) = \varphi(M) = \overline M$, as required. □

3.5 Orientation

We now study orientations of real vector spaces, where we will see the geometric meaning of the sign of the determinant. Before we consider orientation per se, it is illuminating to study the topology of the general linear group $GL_n(\mathbb R)$, the group of invertible $n$-by-$n$ matrices with real entries.

Theorem 3.5.1. The general linear group $GL_n(\mathbb R)$ has two components.

Proof. We have the determinant function $\det\colon M_n(\mathbb R) \to \mathbb R$. Since a matrix is invertible if and only if its determinant is nonzero,
\[
GL_n(\mathbb R) = \det{}^{-1}\bigl(\mathbb R - \{0\}\bigr).
\]
Now $\mathbb R - \{0\}$ has two components, so $GL_n(\mathbb R)$ has at least two components, $\{\text{matrices with positive determinant}\}$ and $\{\text{matrices with negative determinant}\}$. We will show that each of these two sets is path-connected. (Since $GL_n(\mathbb R)$ is an open subset of Euclidean space, components and path components are the same.)

We know that every nonsingular matrix can be transformed to the identity matrix by left multiplication by a sequence of elementary matrices, which have the effect of performing a sequence of elementary row operations. (We could equally well right-multiply and perform column operations with no change in the proof.) We will consider a variant on elementary row operations, namely operations of the following types:

(1) Left multiplication by the matrix $\tilde E$ that agrees with the identity matrix except for an entry $a$ in the $(i,j)$ position, $i \neq j$, which has the effect of adding $a$ times row $j$ to row $i$. (This is a usual row operation.)

(2) Left multiplication by the matrix $\tilde E$ that agrees with the identity matrix except for an entry $c > 0$ in the $(i,i)$ position, which has the effect of multiplying row $i$ by $c$. (This is a usual row operation, but here we restrict $c$ to be positive.)
(3) Left multiplication by the matrix $\tilde E$ that agrees with the identity matrix except in rows and columns $i$ and $j$, where its entries are given by the 2-by-2 block
\[
\begin{bmatrix} 0 & 1\\ -1 & 0 \end{bmatrix}
\]
(an entry of $1$ in the $(i,j)$ position and $-1$ in the $(j,i)$ position), which has the effect of replacing row $i$ by row $j$ and row $j$ by the negative of row $i$. (This differs by a sign from a usual row operation, which replaces each of these two rows by the other.)

There is a path in $GL_n(\mathbb R)$ connecting the identity to each of these elements $\tilde E$. In case (1), we have the path given by $\tilde E(t)$, the matrix agreeing with the identity except for the entry $ta$ in the $(i,j)$ position, for $0 \le t \le 1$. In case (2), we have the path given by $\tilde E(t)$, the matrix agreeing with the identity except for the entry $\exp(t\ln(c))$ in the $(i,i)$ position, for $0 \le t \le 1$. In case (3), we have the path given by $\tilde E(t)$, the matrix agreeing with the identity except in rows and columns $i$ and $j$, where its entries are given by the block
\[
\begin{bmatrix} \cos(\pi t/2) & \sin(\pi t/2)\\ -\sin(\pi t/2) & \cos(\pi t/2) \end{bmatrix},
\]
for $0 \le t \le 1$.

Now let $A$ be an invertible matrix and suppose we have a sequence of elementary row operations that reduces $A$ to the identity, so that $E_k \cdots E_2E_1A = I$. Replacing each $E_i$ by the corresponding matrix $\tilde E_i$, we see that $\tilde E_k \cdots \tilde E_1 A = \tilde I$ is a matrix differing from $I$ in at most the sign of its entries; i.e., $\tilde I$ is a diagonal matrix with each diagonal entry equal to $\pm 1$. As $t$ goes from $0$ to $1$, the product $\tilde E_1(t)A$ gives a path from $A$ to $\tilde E_1A$; as $t$ goes from $0$ to $1$, $\tilde E_2(t)\tilde E_1A$ gives a path from $\tilde E_1A$ to $\tilde E_2\tilde E_1A$; and so forth. In the end we have a path from $A$ to $\tilde I$, so $A$ and $\tilde I$ are in the same path component of $GL_n(\mathbb R)$. Note that $A$ and $\tilde I$ have determinants with the same sign. Thus there are two possibilities:

(1) $A$ has a positive determinant. In this case $\tilde I$ has an even number of $-1$ entries on the diagonal, which can be paired. Suppose there is a pair of $-1$ entries, in positions $(i,i)$ and $(j,j)$. If $\tilde E$ is the appropriate matrix of type (3), $\tilde E^2\tilde I$ will be a matrix of the same form as $\tilde I$, but with both of these entries equal to $+1$ and the others unchanged. As above, we have a path from $\tilde I$ to $\tilde E^2\tilde I$. Continue in this fashion to obtain a path from $\tilde I$ to $I$, and hence a path from $A$ to $I$. Thus $A$ is in the same path component as $I$.

(2) $A$ has a negative determinant. In this case $\tilde I$ has an odd number of $-1$ entries. Proceeding as in (1), we pair up all but one of the $-1$ entries to obtain a path from $\tilde I$ to a diagonal matrix with a single $-1$ entry on the diagonal and all other diagonal entries equal to $1$. If the $-1$ entry is in the $(1,1)$ position, there is nothing more to do. If it is in the $(i,i)$ position for $i \neq 1$ (and hence the entry in the $(1,1)$ position is $1$), we apply the square of an appropriate matrix $\tilde E$ of type (3), which negates the entries in the $(1,1)$ and $(i,i)$ positions, to obtain the diagonal matrix with $-1$ as the first entry on the diagonal and all other diagonal entries equal to $1$, and hence a path from $A$ to this matrix, which we shall denote by $I_-$. Thus in this case $A$ is in the same path component as $I_-$. □

We now come to the notion of an orientation of a real vector space. We assume $V$ is finite dimensional and $\dim(V) > 0$.
Definition 3.5.2. Let $\mathcal B = \{v_1,\dots,v_n\}$ and $\mathcal C = \{w_1,\dots,w_n\}$ be two bases of the $n$-dimensional real vector space $V$. Then $\mathcal B$ and $\mathcal C$ give the same orientation of $V$ if the change of basis matrix $P_{\mathcal C \leftarrow \mathcal B}$ has positive determinant, while they give opposite orientations of $V$ if the change of basis matrix $P_{\mathcal C \leftarrow \mathcal B}$ has negative determinant. ♦

Remark 3.5.3. It is easy to check that "giving the same orientation" is an equivalence relation on bases. It then follows that we can regard an orientation on a real vector space (of positive finite dimension) as an equivalence class of bases of $V$, and there are two such equivalence classes. ♦

In general, there is no preferred orientation on a real vector space, but in one very important special case there is.

Definition 3.5.4. Let $\mathcal B = \{v_1,\dots,v_n\}$ be a basis of $\mathbb R^n$. Then $\mathcal B$ gives the standard orientation of $\mathbb R^n$ if $\mathcal B$ gives the same orientation as the standard basis $\mathcal E$ of $\mathbb R^n$. Otherwise $\mathcal B$ gives the nonstandard orientation of $\mathbb R^n$. ♦

Remark 3.5.5. (1) $\mathcal E$ itself gives the standard orientation of $\mathbb R^n$, as $P_{\mathcal E \leftarrow \mathcal E} = I$ has determinant $1$.

(2) The condition in Definition 3.5.4 can be phrased more simply. By Remark 2.3.6(1), $P_{\mathcal E \leftarrow \mathcal B}$ is the matrix $P_{\mathcal E \leftarrow \mathcal B} = [v_1 \mid v_2 \mid \cdots \mid v_n]$. So $\mathcal B$ gives the standard orientation of $\mathbb R^n$ if $\det(P_{\mathcal E \leftarrow \mathcal B}) > 0$ and the nonstandard orientation of $\mathbb R^n$ if $\det(P_{\mathcal E \leftarrow \mathcal B}) < 0$.

(3) In Definition 3.5.2, recalling that $P_{\mathcal C \leftarrow \mathcal B} = (P_{\mathcal E \leftarrow \mathcal C})^{-1}P_{\mathcal E \leftarrow \mathcal B}$, we see that $\mathcal B$ and $\mathcal C$ give the same orientation of $\mathbb R^n$ if the determinants of the matrices $[v_1 \mid v_2 \mid \cdots \mid v_n]$ and $[w_1 \mid w_2 \mid \cdots \mid w_n]$ have the same sign, and opposite orientations if they have opposite signs. ♦

Much of the significance of the orientation of a real vector space comes from topological considerations. We continue to let $V$ be a real vector space of finite dimension $n > 0$, and we choose a basis $\mathcal B_0$ of $V$. We then have a map $f_0\colon \{\text{bases of } V\} \to GL_n(\mathbb R)$ given by $f_0(\mathcal C) = P_{\mathcal B_0 \leftarrow \mathcal C}$. (If $\mathcal C = \{w_1,\dots,w_n\}$, then $f_0(\mathcal C)$ is the matrix $[[w_1]_{\mathcal B_0} \mid \cdots \mid [w_n]_{\mathcal B_0}]$.) This map is 1-1 and onto. We then give $\{\text{bases of } V\}$ a topology by requiring that $f_0$ be a homeomorphism. That is, we define a subset $O$ of $\{\text{bases of } V\}$ to be open if and only if $f_0(O)$ is an open subset of $GL_n(\mathbb R)$. A priori, this topology depends on the choice of $\mathcal B_0$, but in fact it does not. For if we choose a different basis $\mathcal B_1$ and let $f_1(\mathcal C) = P_{\mathcal B_1 \leftarrow \mathcal C}$, then $f_1(\mathcal C) = Pf_0(\mathcal C)$, where $P$ is the constant matrix $P = P_{\mathcal B_1 \leftarrow \mathcal B_0}$, and multiplication by the constant matrix $P$ is a homeomorphism from $GL_n(\mathbb R)$ to itself. We then have:

Corollary 3.5.6. Let $V$ be an $n$-dimensional real vector space and let $\mathcal B$ and $\mathcal C$ be two bases of $V$. Then $\mathcal B$ and $\mathcal C$ give the same orientation of $V$ if and only if $\mathcal B$ can be continuously deformed to $\mathcal C$, i.e., if and only if there is a continuous function $p\colon [0,1] \to \{\text{bases of } V\}$ with $p(0) = \mathcal B$ and $p(1) = \mathcal C$.

Proof. The bases $\mathcal B$ and $\mathcal C$ of $V$ give the same orientation of $V$ if and only if $P_{\mathcal C \leftarrow \mathcal B}$ has positive determinant, and by Theorem 3.5.1 this is true if and only if there is a path in $GL_n(\mathbb R)$ joining $I$ to $P_{\mathcal C \leftarrow \mathcal B}$. To be more explicit, let $\tilde p\colon [0,1] \to GL_n(\mathbb R)$ with $\tilde p(0) = I$ and $\tilde p(1) = P_{\mathcal C \leftarrow \mathcal B}$. For any $t$ between $0$ and $1$, let $\mathcal B_t$ be the basis defined by $P_{\mathcal B_t \leftarrow \mathcal B} = \tilde p(t)$. Then $\mathcal B_0 = \mathcal B$ and $\mathcal B_1 = \mathcal C$. □
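In $\mathbb R^n$, Remark 3.5.5(3) gives an immediate orientation test. A minimal sketch (assuming Python with NumPy, with bases stored as matrix columns):

```python
import numpy as np

def same_orientation(B, C):
    """Bases of R^n as matrix columns; Remark 3.5.5(3): same orientation
    exactly when det(B) and det(C) have the same sign."""
    return np.linalg.det(B) * np.linalg.det(C) > 0

E = np.eye(2)
B = np.array([[0., 1.],
              [1., 0.]])                 # standard vectors, swapped
R = np.array([[0., -1.],
              [1., 0.]])                 # e1, e2 rotated by 90 degrees

assert not same_orientation(E, B)        # a swap reverses orientation
assert same_orientation(E, R)            # a rotation preserves it
```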
There is no corresponding analog of orientation for complex vector spaces. This is a consequence of the following theorem.

Theorem 3.5.7. The general linear group $GL_n(\mathbb C)$ is connected.

Proof. We show that it is path connected (which is equivalent, as $GL_n(\mathbb C)$ is an open subset of Euclidean space). The proof is very much like the proof of Theorem 3.5.1, but easier. We show that there are paths joining the identity matrix to the usual elementary matrices.

(1) For $E$ agreeing with the identity matrix except for an entry $a$ in the $(i,j)$ position, $i \neq j$, we have the path $p(t)$ agreeing with the identity except for the entry $a_t = ta$ in that position, for $0 \le t \le 1$.

(2) For $E$ agreeing with the identity matrix except for an entry $c = re^{i\theta} \neq 0$ in the $(i,i)$ position, we have the path $p(t)$ agreeing with the identity except for the entry
\[
c_t = e^{t\ln(r)}e^{ti\theta}
\]
in that position, for $0 \le t \le 1$.

(3) For $E$ the usual interchange matrix, agreeing with the identity matrix except in rows and columns $i$ and $j$, where its entries are given by the 2-by-2 block $\begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix}$, we have the path $p(t)$ agreeing with the identity except in rows and columns $i$ and $j$, where its entries are given by the block
\[
\begin{bmatrix} a_t & b_t\\ c_t & d_t \end{bmatrix} = \begin{bmatrix} \cos(\pi t/2) & -e^{\pi i t}\sin(\pi t/2)\\ \sin(\pi t/2) & e^{\pi i t}\cos(\pi t/2) \end{bmatrix},
\]
for $0 \le t \le 1$. (Here $\det p(t) = e^{\pi i t} \neq 0$, so the path stays in $GL_n(\mathbb C)$, and $p(1) = E$.)

As in the proof of Theorem 3.5.1, writing an arbitrary element of $GL_n(\mathbb C)$ as a product of elementary matrices and concatenating these paths produces a path joining it to the identity, so $GL_n(\mathbb C)$ is connected. □

We may also consider the effect of nonsingular linear transformations on orientation.

Definition 3.5.8. Let $V$ be an $n$-dimensional real vector space and let $T\colon V \to V$ be a nonsingular linear transformation. Let $\mathcal B = \{v_1,\dots,v_n\}$ be a basis of $V$. Then $\mathcal C = \{T(v_1),\dots,T(v_n)\}$ is also a basis of $V$. If $\mathcal B$ and $\mathcal C$ give the same orientation of $V$, then $T$ is orientation preserving, while if $\mathcal B$ and $\mathcal C$ give opposite orientations of $V$, then $T$ is orientation reversing. ♦

The fact that this is well-defined, i.e., independent of the choice of basis $\mathcal B$, follows from the following proposition, which proves a more precise result.

Proposition 3.5.9. Let $V$ be an $n$-dimensional real vector space and let $T\colon V \to V$ be a nonsingular linear transformation. Then $T$ is orientation preserving if $\det(T) > 0$, and $T$ is orientation reversing if $\det(T) < 0$.

Remark 3.5.10. Suppose we begin with a complex vector space $V$ of dimension $n$. We may then "forget" the fact that we have complex numbers acting as scalars and in this way regard $V$ as a real vector space $V_{\mathbb R}$ of dimension $2n$. In this situation $V_{\mathbb R}$ has a canonical orientation. Choosing any basis $\mathcal B = \{v_1,\dots,v_n\}$ of $V$, we obtain a basis
\[
\mathcal B_{\mathbb R} = \{v_1, iv_1, \dots, v_n, iv_n\}
\]
of $V_{\mathbb R}$. It is easy to check that if $\mathcal C$ is any other basis of $V$, then $\mathcal C_{\mathbb R}$ gives the same orientation of $V_{\mathbb R}$ as $\mathcal B_{\mathbb R}$ does. Furthermore, suppose we have an arbitrary linear transformation $T\colon V \to V$. By "forgetting" the complex structure we similarly obtain a linear transformation $T_{\mathbb R}\colon V_{\mathbb R} \to V_{\mathbb R}$. In this situation $\det(T_{\mathbb R}) = \det(T)\,\overline{\det(T)} = |\det(T)|^2$. In particular, if $T$ is nonsingular, then $T_{\mathbb R}$ is not only nonsingular but also orientation preserving. ♦

3.6 Hilbert matrices

In this section we present, without proofs, a single family of examples, the Hilbert matrices. This family is both interesting and important. More information on it can be found in the article "Tricks or Treats with the Hilbert Matrix" by M. D. Choi, Amer. Math. Monthly 90 (1983), 301-312.

In this section we adopt the convention that the rows and columns of an $n$-by-$n$ matrix are numbered from $0$ to $n-1$.

Definition 3.6.1. The $n$-by-$n$ Hilbert matrix is the matrix $H_n = (h_{ij})$ with $h_{ij} = 1/(i+j+1)$. ♦

Theorem 3.6.2. (1) The determinant of $H_n$ is
\[
\det(H_n) = \frac{\bigl(1!\,2!\cdots(n-1)!\bigr)^4}{1!\,2!\cdots(2n-1)!}.
\]

(2) Let $G_n = (g_{ij}) = H_n^{-1}$. Then $G_n$ has entries
\[
g_{ij} = (-1)^{i+j}(i+j+1)\binom{n+i}{n-1-j}\binom{n+j}{n-1-i}\binom{i+j}{i}\binom{i+j}{j}.
\]

Remark 3.6.3. The entries of $H_n^{-1}$ are all integers, and it is known that $\det(H_n)$ is the reciprocal of an integer. ♦
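Theorem 3.6.2 is easy to explore with exact rational arithmetic. A minimal sketch (assuming Python; fractions and math.comb keep everything exact):

```python
from fractions import Fraction
from math import comb, factorial

def hilbert(n):
    """Definition 3.6.1, with rows and columns numbered 0 to n-1."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def hilbert_det(n):
    """Theorem 3.6.2(1)."""
    num = 1
    for k in range(1, n):
        num *= factorial(k)
    den = 1
    for k in range(1, 2 * n):
        den *= factorial(k)
    return Fraction(num ** 4, den)

def hilbert_inverse_entry(n, i, j):
    """Theorem 3.6.2(2); always an integer."""
    return ((-1) ** (i + j) * (i + j + 1) * comb(n + i, n - 1 - j)
            * comb(n + j, n - 1 - i) * comb(i + j, i) * comb(i + j, j))

print(hilbert_det(3))    # 1/2160
print([[hilbert_inverse_entry(3, i, j) for j in range(3)] for i in range(3)])
# [[9, -36, 30], [-36, 192, -180], [30, -180, 180]]
```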
Example 3.6.4. (1) $\det(H_2) = 1/12$ and
\[
H_2^{-1} = \begin{bmatrix} 4 & -6\\ -6 & 12 \end{bmatrix}.
\]

(2) $\det(H_3) = 1/2160$ and
\[
H_3^{-1} = \begin{bmatrix} 9 & -36 & 30\\ -36 & 192 & -180\\ 30 & -180 & 180 \end{bmatrix}.
\]

(3) $\det(H_4) = 1/6048000$ and
\[
H_4^{-1} = \begin{bmatrix} 16 & -120 & 240 & -140\\ -120 & 1200 & -2700 & 1680\\ 240 & -2700 & 6480 & -4200\\ -140 & 1680 & -4200 & 2800 \end{bmatrix}.
\]

(4) $\det(H_5) = 1/266716800000$ and
\[
H_5^{-1} = \begin{bmatrix} 25 & -300 & 1050 & -1400 & 630\\ -300 & 4800 & -18900 & 26880 & -12600\\ 1050 & -18900 & 79380 & -117600 & 56700\\ -1400 & 26880 & -117600 & 179200 & -88200\\ 630 & -12600 & 56700 & -88200 & 44100 \end{bmatrix}.
\]

While we do not otherwise deal with numerical linear algebra in this book, the Hilbert matrices present examples that are so pretty and striking that we cannot resist giving a pair. These examples arise from the fact that, while $H_n$ is nonsingular, its determinant is very close to zero. (In technical terms, $H_n$ is "ill-conditioned".) We can already see this when $n = 3$. ♦

Example 3.6.5. (1) Consider the equation
\[
H_3v = \begin{bmatrix} 11/6\\ 13/12\\ 47/60 \end{bmatrix} = \begin{bmatrix} 1.8333\ldots\\ 1.0833\ldots\\ 0.7833\ldots \end{bmatrix}.
\]
It has solution
\[
v = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}.
\]
Let us round off the right-hand side to two significant digits and consider the equation
\[
H_3v = \begin{bmatrix} 1.8\\ 1.1\\ 0.78 \end{bmatrix}.
\]
It has solution
\[
v = \begin{bmatrix} 0\\ 6\\ -3.6 \end{bmatrix}.
\]

(2) Let us round off the entries of $H_3$ to two significant figures to obtain the matrix
\[
\begin{bmatrix} 1 & 0.5 & 0.33\\ 0.5 & 0.33 & 0.25\\ 0.33 & 0.25 & 0.2 \end{bmatrix}.
\]
It has inverse
\[
\frac{1}{63}\begin{bmatrix} 3500 & -17500 & 16100\\ -17500 & 91100 & -85000\\ 16100 & -85000 & 80000 \end{bmatrix}.
\]
Rounding the entries off to the nearest integer, it is
\[
\begin{bmatrix} 56 & -278 & 256\\ -278 & 1446 & -1349\\ 256 & -1349 & 1270 \end{bmatrix}. \qquad♦
\]
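The computations of Example 3.6.5(1) can be reproduced as follows (a minimal sketch, assuming Python with NumPy):

```python
import numpy as np

H3 = np.array([[1,   1/2, 1/3],
               [1/2, 1/3, 1/4],
               [1/3, 1/4, 1/5]])

b = np.array([11/6, 13/12, 47/60])       # exact right-hand side
print(np.linalg.solve(H3, b))            # [1. 1. 1.]

b_rounded = np.array([1.8, 1.1, 0.78])   # two significant digits
print(np.linalg.solve(H3, b_rounded))    # approximately [0. 6. -3.6]
```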
CHAPTER 4. The structure of a linear transformation I

In this chapter we begin our analysis of the structure of a linear transformation $T\colon V \to V$, where $V$ is a finite-dimensional $F$-vector space. We have arranged our exposition in order to bring some of the most important concepts to the fore first. Thus we begin with the notions of eigenvalues and eigenvectors, and we introduce the characteristic and minimum polynomials of a linear transformation early in this chapter as well. In this way we can get to some of the most important structural results, including results on diagonalizability and the Cayley-Hamilton theorem, as quickly as possible.

Recall our metaphor of coordinates as a language in which to speak about vectors and linear transformations. Consider a linear transformation $T\colon V \to V$, $V$ a finite-dimensional vector space. Once we choose a basis $\mathcal B$ of $V$, i.e., a language, we have the coordinate vector $[v]_{\mathcal B}$ of every vector $v$ in $V$, a vector in $F^n$, and the matrix $[T]_{\mathcal B}$ of the linear transformation $T$, an $n$-by-$n$ matrix (where $n$ is the dimension of $V$), with the property that $[T(v)]_{\mathcal B} = [T]_{\mathcal B}[v]_{\mathcal B}$. If we choose a different basis $\mathcal C$, i.e., a different language, we get different coordinate vectors $[v]_{\mathcal C}$ and a different matrix $[T]_{\mathcal C}$ of $T$, though again we have the identity $[T(v)]_{\mathcal C} = [T]_{\mathcal C}[v]_{\mathcal C}$. We have also seen change of basis matrices, which tell us how to translate between languages.

But here, mathematical language is different than human language. In human language, if we have a problem expressed in English, and we translate it into German, we haven't helped the situation. We have the same problem, expressed differently, but no easier to solve. In linear algebra the situation is different. Given a linear transformation $T\colon V \to V$, $V$ a finite-dimensional vector space, there is a preferred basis $\mathcal B$ of $V$, i.e., a best language in which to study the problem, one that makes $[T]_{\mathcal B}$ as simple as possible and makes the structure of $T$ easiest to understand. This is the language of eigenvalues, eigenvectors, and generalized eigenvectors.

We first consider a simple example to motivate our discussion. Let $A$ be the matrix
\[
A = \begin{bmatrix} 2 & 0\\ 0 & 3 \end{bmatrix}
\]
and consider $T_A\colon \mathbb R^2 \to \mathbb R^2$ (where, as usual, $T_A(v) = Av$). Also, consider the standard basis $\mathcal E$, so $\left[\begin{smallmatrix} x\\ y \end{smallmatrix}\right]_{\mathcal E} = \left[\begin{smallmatrix} x\\ y \end{smallmatrix}\right]$ for every vector $\left[\begin{smallmatrix} x\\ y \end{smallmatrix}\right] \in \mathbb R^2$, and furthermore $[T_A]_{\mathcal E} = A$. $T_A$ looks simple, and indeed it is easy to understand. We observe that $T_A(e_1) = 2e_1$, where $e_1 = \left[\begin{smallmatrix} 1\\ 0 \end{smallmatrix}\right]$ is the first standard basis vector in $\mathcal E$, and $T_A(e_2) = 3e_2$, where $e_2 = \left[\begin{smallmatrix} 0\\ 1 \end{smallmatrix}\right]$ is the second standard basis vector in $\mathcal E$. Geometrically, $T_A$ takes the vector $e_1$ and stretches it by a factor of $2$ in its direction, and takes the vector $e_2$ and stretches it by a factor of $3$ in its direction.

On the other hand, let $B$ be the matrix
\[
B = \begin{bmatrix} -4 & -14\\ 3 & 9 \end{bmatrix}
\]
and consider $T_B\colon \mathbb R^2 \to \mathbb R^2$. Now $T_B(e_1) = B\left[\begin{smallmatrix} 1\\ 0 \end{smallmatrix}\right] = \left[\begin{smallmatrix} -4\\ 3 \end{smallmatrix}\right]$ and $T_B(e_2) = B\left[\begin{smallmatrix} 0\\ 1 \end{smallmatrix}\right] = \left[\begin{smallmatrix} -14\\ 9 \end{smallmatrix}\right]$, and $T_B$ looks like a mess. $T_B$ takes each of these vectors to some seemingly random vector in the plane, and there seems to be no rhyme or reason here. But this appearance is deceptive, and comes from the fact that we are studying $B$ by using the standard basis $\mathcal E$, i.e., in the $\mathcal E$ language, which is the wrong language for the problem.

Instead, let us choose the basis $\mathcal B = \{b_1, b_2\} = \left\{\left[\begin{smallmatrix} 7\\ -3 \end{smallmatrix}\right], \left[\begin{smallmatrix} -2\\ 1 \end{smallmatrix}\right]\right\}$. Then
\[
T_B(b_1) = B\begin{bmatrix} 7\\ -3 \end{bmatrix} = \begin{bmatrix} 14\\ -6 \end{bmatrix} = 2\begin{bmatrix} 7\\ -3 \end{bmatrix} = 2b_1,
\quad\text{and}\quad
T_B(b_2) = B\begin{bmatrix} -2\\ 1 \end{bmatrix} = \begin{bmatrix} -6\\ 3 \end{bmatrix} = 3\begin{bmatrix} -2\\ 1 \end{bmatrix} = 3b_2.
\]
Thus $T_B$ has exactly the same geometry as $T_A$: it takes the vector $b_1$ and stretches it by a factor of $2$ in its direction, and it takes the vector $b_2$ and stretches it by a factor of $3$ in its direction. So we should study $T_B$ by using the $\mathcal B$ basis, i.e., in the $\mathcal B$ language. This is the right language for our problem, as it makes $T_B$ easiest to understand. Referring to Remark 2.2.8, we see that
\[
[T_B]_{\mathcal B} = \begin{bmatrix} 2 & 0\\ 0 & 3 \end{bmatrix} = [T_A]_{\mathcal E}.
\]

This "right" language is the language of eigenvalues, eigenvectors, and generalized eigenvectors, and the language that lets us express the matrix of a linear transformation in "canonical form". But before we proceed further, let me make two more remarks.

On the one hand, even if $V$ is not finite dimensional, it is often the case that we still want to study eigenvalues and eigenvectors for a linear transformation $T$, as these are important structural features of $T$ and still give us a good way (sometimes the best way) of understanding $T$.

On the other hand, in studying a linear transformation $T$ on a finite-dimensional vector space, it is often a big mistake to pick a basis $\mathcal B$ and study $[T]_{\mathcal B}$. It may be unnatural to pick any basis at all. $T$ is what comes naturally and is usually what we want to study, even if in the end we can get important information about $T$ by looking at $[T]_{\mathcal B}$. Let me again emphasize this point: Linear algebra is about linear transformations, not matrices.

4.1 Eigenvalues, eigenvectors, and generalized eigenvectors

In this section we introduce some of the most important structural information associated to a linear transformation.

Definition 4.1.1. Let $T\colon V \to V$ be a linear transformation. Let $\lambda \in F$. If $\mathrm{Ker}(T - \lambda I) \neq \{0\}$, then $\lambda$ is an eigenvalue of $T$. In this case, any nonzero $v \in \mathrm{Ker}(T - \lambda I)$ is an eigenvector of $T$, and the subspace $\mathrm{Ker}(T - \lambda I)$ of $V$ is an eigenspace of $T$. In this situation, $\lambda$, $v$, and $\mathrm{Ker}(T - \lambda I)$ are associated. ♦
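The computation for $T_B$ above is easy to check directly (a minimal sketch, assuming Python with NumPy):

```python
import numpy as np

B = np.array([[-4., -14.],
              [ 3.,   9.]])
b1 = np.array([7., -3.])
b2 = np.array([-2., 1.])

assert np.allclose(B @ b1, 2 * b1)       # T_B stretches b1 by a factor of 2
assert np.allclose(B @ b2, 3 * b2)       # T_B stretches b2 by a factor of 3

w, _ = np.linalg.eig(B)
print(sorted(w))                         # [2.0, 3.0]
```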
Remark 4.1.2. Let $v \in V$, $v \neq 0$. If $v \in \mathrm{Ker}(T - \lambda I)$, then $(T - \lambda I)(v) = 0$, i.e., $T(v) = \lambda v$, and conversely; this is the traditional definition of an eigenvector. ♦

We will give some examples of this very important concept shortly, but it is convenient to generalize it first.

Definition 4.1.3. Let $T\colon V \to V$ be a linear transformation and let $\lambda \in F$ be an eigenvalue of $T$. The generalized eigenspace of $T$ associated to $\lambda$ is the subspace of $V$ given by
\[
\{v \mid (T - \lambda I)^k(v) = 0 \text{ for some positive integer } k\}.
\]
If $v$ is a nonzero vector in this generalized eigenspace, then $v$ is a generalized eigenvector associated to the eigenvalue $\lambda$. For such a $v$, the smallest positive integer $k$ for which $(T - \lambda I)^k(v) = 0$ is the index of $v$. ♦

Remark 4.1.4. A generalized eigenvector of index $1$ is just an eigenvector. ♦

For a linear transformation $T$ and an eigenvalue $\lambda$ of $T$, we let $E_\lambda$ denote the eigenspace $E_\lambda = \mathrm{Ker}(T - \lambda I)$. For a positive integer $k$, we let $E_\lambda^k$ be the subspace $E_\lambda^k = \mathrm{Ker}\bigl((T - \lambda I)^k\bigr)$. We let $E_\lambda^\infty$ denote the generalized eigenspace associated to the eigenvalue $\lambda$. We see that $E_\lambda = E_\lambda^1 \subseteq E_\lambda^2 \subseteq \cdots$ and that the union of these subspaces is $E_\lambda^\infty$.

Example 4.1.5. (1) Let $V$ be the space of sequences $(a_1, a_2, \dots)$ with entries in $F$, only finitely many of them nonzero, and let $L\colon V \to V$ be the left shift. Then $L$ has the single eigenvalue $\lambda = 0$ and the eigenspace $E_0$ is 1-dimensional,
\[
E_0 = \{(a_1, a_2, \dots) \in V \mid a_i = 0 \text{ for } i > 1\}.
\]
More generally, $E_0^k = \{(a_1, a_2, \dots) \in V \mid a_i = 0 \text{ for } i > k\}$, so $\dim E_0^k = k$ for every $k$, and finally $V = E_0^\infty$. In contrast, the right shift $R\colon V \to V$ does not have any eigenvalues.

(2) Let $V$ be the space of all sequences $(a_1, a_2, \dots)$ with entries in $F$ and let $L\colon V \to V$ be the left shift. Then for any $\lambda \in F$, $E_\lambda$ is 1-dimensional with basis $\{(1, \lambda, \lambda^2, \dots)\}$. It is routine to check that $E_\lambda^k$ is $k$-dimensional for every $\lambda \in F$ and every positive integer $k$. In contrast, $R\colon V \to V$ does not have any eigenvalues.

(3) Let $F$ be a field of characteristic $0$ and let $V = P(F)$, the space of all polynomials with coefficients in $F$. Let $D\colon V \to V$ be differentiation, $D(p(x)) = p'(x)$. Then $D$ has the single eigenvalue $0$ and the corresponding eigenspace $E_0$ is 1-dimensional, consisting of the constant polynomials. More generally, $E_0^k$ is $k$-dimensional, consisting of all polynomials of degree at most $k-1$.

(4) Let $V = P(F)$ be the space of all polynomials with coefficients in a field of characteristic $0$ and let $T\colon V \to V$ be defined by $T(p(x)) = xp'(x)$. Then the eigenvalues of $T$ are the nonnegative integers, and for every nonnegative integer $m$ the eigenspace $E_m$ is 1-dimensional with basis $\{x^m\}$.

(5) Let $V$ be the space of holomorphic functions on $\mathbb C$, and let $D\colon V \to V$ be differentiation, $D(f(z)) = f'(z)$. For any complex number $\lambda$, $E_\lambda$ is 1-dimensional with basis $f(z) = e^{\lambda z}$. Also, $E_\lambda^k$ is $k$-dimensional with basis $\{e^{\lambda z}, ze^{\lambda z}, \dots, z^{k-1}e^{\lambda z}\}$. ♦

Now we turn to some finite-dimensional examples. We adopt the standard language that the eigenvalues, eigenvectors, etc., of an $n$-by-$n$ matrix $A$ are the eigenvalues, eigenvectors, etc., of $T_A\colon F^n \to F^n$ (where $T_A(v) = Av$).
Example 4.1.6. (1) Let $\lambda_1, \ldots, \lambda_n$ be distinct elements of $F$ and let $A$ be the $n$-by-$n$ diagonal matrix
$$A = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}.$$
For each $i = 1, \ldots, n$, $\lambda_i$ is an eigenvalue of $A$ with $1$-dimensional eigenspace $E_{\lambda_i}$ with basis $\{e_i\}$.

(2) Let $\lambda$ be an element of $F$ and let $A$ be the $n$-by-$n$ matrix
$$A = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}$$
with entries of $\lambda$ on the diagonal, $1$ immediately above the diagonal, and $0$ everywhere else. For each $k = 1, \ldots, n$, $e_k$ is a generalized eigenvector of index $k$, and the generalized eigenspace $E_\lambda^k$ is $k$-dimensional with basis $\{e_1, \ldots, e_k\}$. ♦

Now we introduce the characteristic polynomial.

Definition 4.1.7. Let $A$ be an $n$-by-$n$ matrix. The characteristic polynomial $c_A(x)$ of $A$ is the polynomial $c_A(x) = \det(xI - A)$. ♦

Remark 4.1.8. By properties of the determinant it is clear that $c_A(x)$ is a monic polynomial of degree $n$. ♦

Lemma 4.1.9. Let $A$ and $B$ be similar matrices. Then $c_A(x) = c_B(x)$.

Proof. If $B = PAP^{-1}$, then
$$c_B(x) = \det(xI - B) = \det(xI - PAP^{-1}) = \det(P(xI - A)P^{-1}) = \det(xI - A) = c_A(x)$$
by Corollary 3.3.2.

Definition 4.1.10. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Let $\mathcal{B}$ be any basis of $V$ and let $A = [T]_{\mathcal{B}}$. The characteristic polynomial $c_T(x)$ is the polynomial $c_T(x) = c_A(x) = \det(xI - A)$. ♦

Remark 4.1.11. By Corollary 2.3.11 and Lemma 4.1.9, $c_T(x)$ is well-defined (i.e., independent of the choice of basis $\mathcal{B}$ of $V$). ♦

Theorem 4.1.12. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Then $\lambda$ is an eigenvalue of $T$ if and only if $\lambda$ is a root of the characteristic polynomial $c_T(x)$, i.e., if and only if $c_T(\lambda) = 0$.

Proof. Let $\mathcal{B}$ be a basis of $V$ and let $A = [T]_{\mathcal{B}}$. Then by definition $\lambda$ is an eigenvalue of $T$ if and only if there is a nonzero vector $v$ in $\operatorname{Ker}(T - \lambda I)$, i.e., if and only if $(A - \lambda I)u = 0$ for some nonzero vector $u$ in $F^n$ (where the connection is that $u = [v]_{\mathcal{B}}$). This is the case if and only if $A - \lambda I$ is singular, which is the case if and only if $\det(A - \lambda I) = 0$. But $\det(A - \lambda I) = (-1)^n \det(\lambda I - A)$, where $n = \dim(V)$, so this is the case if and only if $c_T(\lambda) = c_A(\lambda) = \det(\lambda I - A) = 0$.

Remark 4.1.13. We have defined $c_A(x) = \det(xI - A)$ and this is the correct definition, as we want $c_A(x)$ to be a monic polynomial. In actually finding eigenvectors or generalized eigenvectors, it is generally more convenient to work with $A - \lambda I$ rather than $\lambda I - A$. Indeed, when it comes to finding chains of generalized eigenvectors, it is almost essential to use $A - \lambda I$, as using $\lambda I - A$ would introduce spurious minus signs, which would have to be corrected for. ♦

For the remainder of this section we assume that $V$ is finite dimensional.

Definition 4.1.14. Let $T\colon V \to V$ and let $\lambda$ be an eigenvalue of $T$. The algebraic multiplicity of $\lambda$, $\operatorname{alg-mult}(\lambda)$, is the multiplicity of $\lambda$ as a root of the characteristic polynomial $c_T(x)$. The geometric multiplicity of $\lambda$, $\operatorname{geom-mult}(\lambda)$, is the dimension of the associated eigenspace $E_\lambda = \operatorname{Ker}(T - \lambda I)$. ♦

We use multiplicity to mean algebraic multiplicity, as is standard.
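To see the two multiplicities come apart, here is a small sympy sketch (the matrix is our own illustration, not an example from the text): the eigenvalue $2$ has algebraic multiplicity $2$ but geometric multiplicity $1$.

```python
from sympy import Matrix, symbols, factor

x = symbols('x')
A = Matrix([[2, 1], [0, 2]])

# Algebraic multiplicity: the exponent of (x - 2) in c_A(x) = det(xI - A).
print(factor(A.charpoly(x).as_expr()))           # (x - 2)**2

# Geometric multiplicity: dim Ker(A - 2I), the size of a nullspace basis.
print(len((A - 2 * Matrix.eye(2)).nullspace()))  # 1
```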
Lemma 4.1.15. Let $T\colon V \to V$ and let $\lambda$ be an eigenvalue of $T$. Then
$$1 \le \operatorname{geom-mult}(\lambda) \le \operatorname{alg-mult}(\lambda).$$

Proof. By definition, if $\lambda$ is an eigenvalue of $T$ there exists a (nonzero) eigenvector, so $1 \le \dim(E_\lambda)$. Suppose $\dim(E_\lambda) = d$ and let $\mathcal{B}_1 = \{v_1, \ldots, v_d\}$ be a basis for $E_\lambda$. Extend $\mathcal{B}_1$ to a basis $\mathcal{B} = \{v_1, \ldots, v_n\}$ of $V$. Then
$$[T]_{\mathcal{B}} = \begin{pmatrix} \lambda I & B \\ 0 & D \end{pmatrix} = A,$$
a block matrix with the upper left-hand block $d$-by-$d$. Then
$$[xI - T]_{\mathcal{B}} = xI - A = \begin{pmatrix} (x - \lambda) I & -B \\ 0 & xI - D \end{pmatrix},$$
so
$$c_T(x) = \det(xI - A) = \det((x - \lambda)I)\det(xI - D) = (x - \lambda)^d \det(xI - D),$$
and hence $d \le \operatorname{alg-mult}(\lambda)$.

Corollary 4.1.16. Let $T\colon V \to V$ and let $\lambda$ be an eigenvalue of $T$ with $\operatorname{alg-mult}(\lambda) = 1$. Then $\operatorname{geom-mult}(\lambda) = 1$.

It is important to observe that the existence of eigenvalues and eigenvectors depends on the field $F$, as we see from the next example.

Example 4.1.17. For any nonzero rational number $t$ let $A_t$ be the matrix
$$A_t = \begin{pmatrix} 0 & 1 \\ t & 0 \end{pmatrix}, \quad\text{so}\quad A_t^2 = \begin{pmatrix} t & 0 \\ 0 & t \end{pmatrix} = tI.$$
Let $\lambda$ be an eigenvalue of $A_t$ with associated eigenvector $v$. Then, on the one hand,
$$A_t^2(v) = A_t(A_t(v)) = A_t(\lambda v) = \lambda A_t(v) = \lambda^2 v,$$
but, on the other hand, $A_t^2(v) = tI(v) = tv$, so $\lambda^2 = t$.

(1) Suppose $t = 1$. Then $\lambda^2 = 1$, $\lambda = \pm 1$, and we have the eigenvalue $\lambda = 1$ with associated eigenvector $v = \begin{pmatrix}1\\1\end{pmatrix}$, and the eigenvalue $\lambda = -1$ with associated eigenvector $v = \begin{pmatrix}1\\-1\end{pmatrix}$.

(2) Suppose $t = 2$. If we regard $A$ as being defined over $\mathbb{Q}$, then there is no $\lambda \in \mathbb{Q}$ with $\lambda^2 = 2$, so $A$ has no eigenvalues. If we regard $A$ as being defined over $\mathbb{R}$, then $\lambda = \pm\sqrt{2}$, and $\lambda = \sqrt{2}$ is an eigenvalue with associated eigenvector $\begin{pmatrix}1\\\sqrt{2}\end{pmatrix}$, and $\lambda = -\sqrt{2}$ is an eigenvalue with associated eigenvector $\begin{pmatrix}1\\-\sqrt{2}\end{pmatrix}$.

(3) Suppose $t = -1$. If we regard $A$ as being defined over $\mathbb{R}$, then there is no $\lambda \in \mathbb{R}$ with $\lambda^2 = -1$, so $A$ has no eigenvalues. If we regard $A$ as being defined over $\mathbb{C}$, then $\lambda = \pm i$, and $\lambda = i$ is an eigenvalue with associated eigenvector $\begin{pmatrix}1\\i\end{pmatrix}$, and $\lambda = -i$ is an eigenvalue with associated eigenvector $\begin{pmatrix}1\\-i\end{pmatrix}$. ♦
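Example 4.1.17 is pleasant to experiment with. sympy computes eigenvalues in a splitting field, so it always reports $\lambda = \pm\sqrt{t}$; whether these count as eigenvalues depends on which field we allow ourselves, exactly as in the example. A sketch of ours:

```python
from sympy import Matrix, sqrt

def A(t):
    return Matrix([[0, 1], [t, 0]])

print(A(1).eigenvals())    # {1: 1, -1: 1}: rational, so eigenvalues over Q
print(A(2).eigenvals())    # {sqrt(2): 1, -sqrt(2): 1}: none over Q, two over R
print(A(-1).eigenvals())   # {I: 1, -I: 1}: none over R, two over C

# The eigenvector for lambda = sqrt(2) found in part (2):
v = Matrix([1, sqrt(2)])
assert A(2) * v == sqrt(2) * v
```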
Now we introduce the minimum polynomial.

Lemma 4.1.18. Let $A$ be an $n$-by-$n$ matrix. There is a nonzero polynomial $p(x)$ with $p(A) = 0$.

Proof. The set of matrices $\{I, A, \ldots, A^{n^2}\}$ is a set of $n^2 + 1$ elements of a vector space of dimension $n^2$, and so must be linearly dependent. Thus there exist scalars $c_0, \ldots, c_{n^2}$, not all zero, with $c_0 I + c_1 A + \cdots + c_{n^2} A^{n^2} = 0$. Then $p(A) = 0$ where $p(x)$ is the nonzero polynomial $p(x) = c_{n^2} x^{n^2} + \cdots + c_1 x + c_0$.

Theorem 4.1.19. Let $A$ be an $n$-by-$n$ matrix. There is a unique monic polynomial $m_A(x)$ of lowest degree with $m_A(A) = 0$. Furthermore, $m_A(x)$ divides every polynomial $p(x)$ with $p(A) = 0$.

Proof. By Lemma 4.1.18, there is some nonzero polynomial $p(x)$ with $p(A) = 0$. If $p_1(x)$ and $p_2(x)$ are any polynomials with $p_1(A) = 0$ and $p_2(A) = 0$, and $q(x) = p_1(x) + p_2(x)$, then $q(A) = p_1(A) + p_2(A) = 0 + 0 = 0$. Also, if $p_1(x)$ is any polynomial with $p_1(A) = 0$, and $r(x)$ is any polynomial, and $q(x) = p_1(x) r(x)$, then $q(A) = p_1(A) r(A) = 0\,r(A) = 0$. Thus, in the language of Definition A.1.5, the set of polynomials $\{p(x) \mid p(A) = 0\}$ is a nonzero ideal, and so by Lemma A.1.8 there is a unique polynomial $m_A(x)$ as claimed.

Definition 4.1.20. The polynomial $m_A(x)$ of Theorem 4.1.19 is the minimum polynomial of $A$. ♦

Lemma 4.1.21. Let $A$ and $B$ be similar matrices. Then $m_A(x) = m_B(x)$.

Proof. If $B = PAP^{-1}$ and $p(x)$ is any polynomial with $p(A) = 0$, then $p(B) = P p(A) P^{-1} = P 0 P^{-1} = 0$, and vice-versa.

Definition 4.1.22. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Let $\mathcal{B}$ be any basis of $V$ and let $A = [T]_{\mathcal{B}}$. The minimum polynomial of $T$ is the polynomial $m_T(x)$ defined by $m_T(x) = m_A(x)$. ♦

Remark 4.1.23. By Corollary 2.3.11 and Lemma 4.1.21, $m_T(x)$ is well-defined (i.e., independent of the choice of basis $\mathcal{B}$ of $V$). Alternatively, we can see that $m_T(x)$ is well-defined since for any linear transformation $S\colon V \to V$, $S = 0$ (i.e., $S$ is the zero linear transformation) if and only if the matrix $[S]_{\mathcal{B}} = 0$ (i.e., $[S]_{\mathcal{B}}$ is the zero matrix) in any and every basis $\mathcal{B}$ of $V$. ♦
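The proof of Lemma 4.1.18 is already an algorithm: flatten $I, A, A^2, \ldots$ into vectors and stop at the first linear dependence, which yields the minimum polynomial. Here is a sketch of that algorithm (our own; `min_poly` is a hypothetical helper name, and sympy is assumed):

```python
from sympy import Matrix, symbols, expand, factor

def min_poly(A, x):
    """Monic polynomial of least degree with p(A) = 0, found by seeking
    the first linear dependence among I, A, A^2, ... (cf. Lemma 4.1.18)."""
    n = A.shape[0]
    cols = [Matrix.eye(n).reshape(n * n, 1)]       # I, flattened to a column
    power = Matrix.eye(n)
    for k in range(1, n * n + 2):
        power = power * A
        b = power.reshape(n * n, 1)                # A^k, flattened
        M = Matrix.hstack(*cols)
        if M.rank() == Matrix.hstack(M, b).rank():  # A^k depends on lower powers
            c = M.gauss_jordan_solve(b)[0]          # A^k = sum c[i] * A^i
            return expand(x**k - sum(c[i] * x**i for i in range(k)))
        cols.append(b)

x = symbols('x')
A = Matrix([[2, 0, 0], [0, 2, 0], [0, 0, 3]])
print(factor(min_poly(A, x)))             # (x - 2)*(x - 3)
print(factor(A.charpoly(x).as_expr()))    # (x - 2)**2*(x - 3)
```

Note that here the minimum polynomial properly divides the characteristic polynomial; Section 5.3 makes the general relationship precise.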
4.2 Some structural results

In this section we prove some basic but important structural results about a linear transformation, obtaining information about generalized eigenspaces, direct sum decompositions, and the relationship between the characteristic and minimum polynomials. As an application, we derive the famous Cayley-Hamilton theorem.

While we prove much stronger results later, the following result is so easy that we will pause to obtain it here.

Definition 4.2.1. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. $T$ is triangularizable if there is a basis $\mathcal{B}$ of $V$ in which the matrix $[T]_{\mathcal{B}}$ is upper triangular. ♦

Theorem 4.2.2. Let $V$ be a finite-dimensional vector space over the field $F$ and let $T\colon V \to V$ be a linear transformation. Then $T$ is triangularizable if and only if its characteristic polynomial $c_T(x)$ is a product of linear factors. In particular, if $F$ is algebraically closed then every $T\colon V \to V$ is triangularizable.

Proof. If $[T]_{\mathcal{B}} = A$ is an upper triangular matrix with diagonal entries $d_1, \ldots, d_n$, then $c_T(x) = c_A(x) = \det(xI - A) = (x - d_1)\cdots(x - d_n)$ is a product of linear factors.

We prove the converse by induction on $n = \dim(V)$. Let $c_T(x) = (x - d_1)\cdots(x - d_n)$. Then $d_1$ is an eigenvalue of $T$; choose an eigenvector $v_1$ and let $V_1$ be the subspace of $V$ generated by $v_1$. Let $\overline{V} = V/V_1$. Then $T$ induces $\overline{T}\colon \overline{V} \to \overline{V}$ with $c_{\overline{T}}(x) = (x - d_2)\cdots(x - d_n)$. By induction, $\overline{V}$ has a basis $\overline{\mathcal{B}} = \{\overline{v}_2, \ldots, \overline{v}_n\}$ with $[\overline{T}]_{\overline{\mathcal{B}}} = D$ upper triangular. Let $v_i \in V$ with $\pi(v_i) = \overline{v}_i$ for $i = 2, \ldots, n$, and let $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$. Then
$$[T]_{\mathcal{B}} = \begin{pmatrix} d_1 & C \\ 0 & D \end{pmatrix}$$
for some $1$-by-$(n-1)$ matrix $C$. Regardless of what $C$ is, this matrix is upper triangular.

Lemma 4.2.3. (1) Let $v$ be an eigenvector of $T$ with associated eigenvalue $\lambda$ and let $p(x) \in F[x]$ be a polynomial. Then $p(T)(v) = p(\lambda)v$. Thus, if $p(\lambda) \neq 0$ then $p(T)(v) \neq 0$.

(2) More generally, let $v$ be a generalized eigenvector of $T$ of index $k$ with associated eigenvalue $\lambda$ and let $p(x) \in F[x]$ be a polynomial. Then $p(T)(v) = p(\lambda)v + v'$, where $v'$ is zero or a generalized eigenvector of $T$ of index $k' < k$ with associated eigenvalue $\lambda$. Thus if $p(\lambda) \neq 0$ then $p(T)(v) \neq 0$.

Proof. We can rewrite any polynomial $p(x) \in F[x]$ in terms of $x - \lambda$:
$$p(x) = a_n (x - \lambda)^n + a_{n-1}(x - \lambda)^{n-1} + \cdots + a_1 (x - \lambda) + a_0.$$
Setting $x = \lambda$ we see that $a_0 = p(\lambda)$.

(1) If $v$ is an eigenvector of $T$ with associated eigenvalue $\lambda$, then
$$p(T)(v) = \left(a_n(T - \lambda I)^n + \cdots + a_1 (T - \lambda I) + p(\lambda) I\right)(v) = p(\lambda)I(v) = p(\lambda)v,$$
as all terms but the last vanish.

(2) If $v$ is a generalized eigenvector of $T$ of index $k$ with associated eigenvalue $\lambda$, then
$$p(T)(v) = \left(a_n(T - \lambda I)^n + \cdots + a_1(T - \lambda I) + p(\lambda)I\right)(v) = v' + p(\lambda)v,$$
where
$$v' = \left(a_n (T - \lambda I)^n + \cdots + a_1(T - \lambda I)\right)(v) = \left(a_n(T - \lambda I)^{n-1} + \cdots + a_1\right)\left((T - \lambda I)(v)\right)$$
is zero or a generalized eigenvector of $T$ of index at most $k - 1$ associated to $\lambda$.

Lemma 4.2.4. Let $T\colon V \to V$ be a linear transformation with $c_T(x) = (x - \lambda_1)^{e_1}\cdots(x - \lambda_m)^{e_m}$, with $\lambda_1, \ldots, \lambda_m$ distinct. Let $W_i = E_{\lambda_i}^\infty$ be the generalized eigenspace of $T$ associated to the eigenvalue $\lambda_i$. Then $W_i$ is a subspace of $V$ of dimension $e_i$. Also, $W_i = E_{\lambda_i}^{e_i}$; i.e., any generalized eigenvector of $T$ associated to $\lambda_i$ has index at most $e_i$.

Proof. In the proof of Theorem 4.2.2, we may choose the eigenvalues in any order, so we choose $\lambda_i$ first, $e_i$ times. Then we find a basis $\mathcal{B}$ of $V$ with $[T]_{\mathcal{B}}$ an upper triangular matrix
$$[T]_{\mathcal{B}} = \begin{pmatrix} A & B \\ 0 & D \end{pmatrix},$$
where $A$ is an upper triangular $e_i$-by-$e_i$ matrix all of whose diagonal entries are equal to $\lambda_i$ and $D$ is an $(n - e_i)$-by-$(n - e_i)$ upper triangular matrix all of whose diagonal entries are equal to the other $\lambda_j$'s and thus are unequal to $\lambda_i$. Write $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_1'$, where $\mathcal{B}_1$ consists of the first $e_i$ vectors in $\mathcal{B}$, $\mathcal{B}_1 = \{v_1, \ldots, v_{e_i}\}$. We claim that $W_i$ is the subspace spanned by $\mathcal{B}_1$. To see this, observe that
$$[T - \lambda_i I]_{\mathcal{B}} = \begin{pmatrix} A - \lambda_i I & B \\ 0 & D - \lambda_i I \end{pmatrix}, \quad\text{so}\quad [(T - \lambda_i I)^{e}]_{\mathcal{B}} = \begin{pmatrix} (A - \lambda_i I)^{e} & B' \\ 0 & (D - \lambda_i I)^{e} \end{pmatrix}$$
for some submatrix $B'$ (whose exact value is irrelevant). But $A - \lambda_i I$ is an $e_i$-by-$e_i$ upper triangular matrix with all of its diagonal entries $0$, and, as is easy to compute, $(A - \lambda_i I)^{e_i} = 0$. Also, $D - \lambda_i I$ is an $(n - e_i)$-by-$(n - e_i)$ upper triangular matrix with none of its diagonal entries $0$, and, as is also easy to compute, $(D - \lambda_i I)^{e_i}$ is an upper triangular matrix with none of its diagonal entries equal to $0$. Both of these statements remain true for any $e \ge e_i$. Thus for any $e \ge e_i$,
$$[(T - \lambda_i I)^{e}]_{\mathcal{B}} = \begin{pmatrix} 0 & B' \\ 0 & D' \end{pmatrix}$$
with $D'$ an upper triangular matrix all of whose diagonal entries are nonzero. Then it is easy to see that for any $e \ge e_i$, $\operatorname{Ker}([(T - \lambda_i I)^{e}]_{\mathcal{B}})$ is the subspace of $F^n$ generated by $\{e_1, \ldots, e_{e_i}\}$. Thus $W_i$ is the subspace of $V$ generated by $\{v_1, \ldots, v_{e_i}\} = \mathcal{B}_1$, and is a subspace of dimension $e_i$.

Lemma 4.2.5. In the situation of Lemma 4.2.4, $V = W_1 \oplus \cdots \oplus W_m$.

Proof. Since $n = \deg c_T(x) = e_1 + \cdots + e_m$, by Corollary 1.4.8(3) we need only show that if $0 = w_1 + \cdots + w_m$ with $w_i \in W_i$ for each $i$, then $w_i = 0$ for each $i$. Suppose we have an expression $0 = w_1 + \cdots + w_i + \cdots + w_m$ with $w_i \neq 0$. Let $q_i(x) = c_T(x)/(x - \lambda_i)^{e_i}$, so $q_i(x)$ is divisible by $(x - \lambda_j)^{e_j}$ for every $j \neq i$, but $q_i(\lambda_i) \neq 0$. Then
$$0 = q_i(T)(0) = q_i(T)(w_1 + \cdots + w_i + \cdots + w_m) = 0 + \cdots + q_i(T)(w_i) + \cdots + 0 = q_i(T)(w_i),$$
contradicting Lemma 4.2.3.
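Lemmas 4.2.4 and 4.2.5 can be watched numerically. For the matrix below (our own choice, with sympy assumed), $c_A(x) = (x-2)^2(x-5)$; the generalized eigenspaces have dimensions $2$ and $1$, the algebraic multiplicities, and together they span all of $F^3$.

```python
from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 5]])

W2 = ((A - 2 * eye(3)) ** 2).nullspace()   # generalized eigenspace for 2
W5 = ((A - 5 * eye(3)) ** 1).nullspace()   # generalized eigenspace for 5
print(len(W2), len(W5))                    # 2 1

# The union of the two bases spans F^3, so V = W_1 (+) W_2:
print(Matrix.hstack(*(W2 + W5)).rank())    # 3
```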
Lemma 4.2.6. Let $T\colon V \to V$ be a linear transformation whose characteristic polynomial $c_T(x)$ is a product of linear factors. Then

(1) $m_T(x)$ and $c_T(x)$ have the same linear factors.

(2) $m_T(x)$ divides $c_T(x)$.

Proof. (1) Let $m_T(x)$ have a factor $x - \lambda$, and let $n(x) = m_T(x)/(x - \lambda)$. Then $n(T) \neq 0$, so there is a vector $v_0$ with $v = n(T)(v_0) \neq 0$. Then $(T - \lambda I)(v) = m_T(T)(v_0) = 0$, i.e., $v \in \operatorname{Ker}(T - \lambda I)$, so $v$ is an eigenvector of $T$ with associated eigenvalue $\lambda$. Thus $x - \lambda$ is a factor of $c_T(x)$.

Suppose $x - \lambda$ is a factor of $c_T(x)$ that is not a factor of $m_T(x)$, so that $m_T(\lambda) \neq 0$. Choose an eigenvector $v$ of $T$ with associated eigenvalue $\lambda$. Then on the one hand $m_T(T) = 0$, so $m_T(T)(v) = 0$, but on the other hand, by Lemma 4.2.3, $m_T(T)(v) = m_T(\lambda)v \neq 0$, a contradiction.

(2) Since $V = W_1 \oplus \cdots \oplus W_m$ where $W_i = E_{\lambda_i}^{e_i}$, we can write any $v \in V$ as $v = w_1 + \cdots + w_m$ with $w_i \in W_i$. Then
$$c_T(T)(v) = c_T(T)(w_1) + \cdots + c_T(T)(w_m) = 0 + \cdots + 0 = 0,$$
as for each $i$, $c_T(x)$ is divisible by $(x - \lambda_i)^{e_i}$ and $(T - \lambda_i I)^{e_i}(w_i) = 0$ by the definition of $E_{\lambda_i}^{e_i}$. Thus $c_T(T) = 0$. But $m_T(x)$ divides every polynomial $p(x)$ with $p(T) = 0$, so $m_T(x)$ divides $c_T(x)$.

This lemma has a famous corollary, originally proved by quite different methods.

Corollary 4.2.7 (Cayley-Hamilton theorem). Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Then $c_T(T) = 0$.

Proof. In case $c_T(x)$ factors into a product of linear factors, $c_T(x) = (x - \lambda_1)^{e_1}\cdots(x - \lambda_m)^{e_m}$, we showed this in the proof of Lemma 4.2.6.

In general, pick any basis $\mathcal{B}$ of $V$ and let $A = [T]_{\mathcal{B}}$. Then $c_T(T) = 0$ if and only if $c_A(A) = 0$. (Note $c_T(x) = c_A(x)$.) Now $A$ is a matrix with entries in $F$, and we can consider the linear transformation $T_A\colon F^n \to F^n$. But we may also take any extension field $\mathbb{E}$ of $F$ and consider $\widetilde{T}\colon \mathbb{E}^n \to \mathbb{E}^n$ defined by $\widetilde{T}(v) = Av$. (So $\widetilde{T} = T_A$, but we are being careful to use a different notation, as $\widetilde{T}$ is defined on the new vector space $\mathbb{E}^n$.) Now $c_{\widetilde{T}}(x) = c_A(x) = \det(xI - A) = c_T(x)$. In particular, we may take $\mathbb{E}$ to be a field in which $c_A(x)$ splits into a product of linear factors. For example, we could take $\mathbb{E}$ to be the algebraic closure of $F$, and then every polynomial $p(x) \in F[x]$ splits into a product of linear factors over $\mathbb{E}$. Then by the first case of the corollary, $c_{\widetilde{T}}(\widetilde{T}) = 0$, i.e., $c_A(A) = 0$, i.e., $c_T(T) = 0$.

(Expressed differently, $A$ is similar, as a matrix with entries in $\mathbb{E}$, to a matrix $B$ for which $c_B(B) = 0$. If $A = PBP^{-1}$, then for any polynomial $f(x)$, $f(A) = P f(B) P^{-1}$. Also, since $A$ and $B$ are similar, $c_A(x) = c_B(x)$. Thus $c_A(A) = c_B(A) = P c_B(B) P^{-1} = P 0 P^{-1} = 0$.)

Remark 4.2.8. For the reader familiar with tensor products, we observe that the second case of the corollary can be simplified to: Consider $\widetilde{T} = T \otimes 1\colon V \otimes_F \mathbb{E} \to V \otimes_F \mathbb{E}$. Then $c_T(x) = c_{\widetilde{T}}(x)$ and $c_{\widetilde{T}}(\widetilde{T}) = 0$ by the lemma, so $c_T(T) = 0$. ♦

Remark 4.2.9. If $F$ is algebraically closed (e.g., $F = \mathbb{C}$, which is algebraically closed by the Fundamental Theorem of Algebra) then $c_T(x)$ automatically splits into a product of linear factors, we are in the first case of the Cayley-Hamilton theorem, and we are done—fine. If not, although our proof is correct, it is the "wrong" proof. We should not have to pass to a larger field $\mathbb{E}$ in order to investigate linear transformations over $F$. We shall present a "right" proof later, where we will see how to generalize both Lemma 4.2.5 and Lemma 4.2.6 (see Theorem 5.3.1 and Corollary 5.3.4). ♦
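The Cayley-Hamilton theorem invites a direct test: substitute a matrix into its own characteristic polynomial and watch the zero matrix appear. A sketch of ours, evaluating $c_A(A)$ by Horner's rule on an arbitrary example matrix:

```python
from sympy import Matrix, symbols, zeros

x = symbols('x')
A = Matrix([[1, 2, 0], [3, -1, 4], [0, 1, 1]])

coeffs = A.charpoly(x).all_coeffs()   # c_A(x), leading coefficient first

# Horner's rule: result = ((1*A + c2)*A + c1)*A + c0 = c_A(A).
result = zeros(3, 3)
for c in coeffs:
    result = result * A + c * Matrix.eye(3)
print(result)                         # the 3x3 zero matrix
```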
4.3 Diagonalizability

Before we continue with our analysis of general linear transformations, we consider a particular but very useful case.

Definition 4.3.1. (1) Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Then $T$ is diagonalizable if $V$ has a basis $\mathcal{B}$ with $[T]_{\mathcal{B}}$ a diagonal matrix.

(2) An $n$-by-$n$ matrix $A$ is diagonalizable if $T_A\colon F^n \to F^n$ is diagonalizable. ♦

Remark 4.3.2. In light of Theorem 2.3.14, we may phrase (2) more simply as: $A$ is diagonalizable if it is similar to a diagonal matrix. ♦

Lemma 4.3.3. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Then $T$ is diagonalizable if and only if $V$ has a basis $\mathcal{B}$ consisting of eigenvectors of $T$.

Proof. Let $\mathcal{B} = \{v_1, \ldots, v_n\}$ and let $D = [T]_{\mathcal{B}}$ be a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. For each $i$,
$$[T(v_i)]_{\mathcal{B}} = [T]_{\mathcal{B}}[v_i]_{\mathcal{B}} = D e_i = \lambda_i e_i = [\lambda_i v_i]_{\mathcal{B}},$$
so $T(v_i) = \lambda_i v_i$ and $v_i$ is an eigenvector.

Conversely, if $\mathcal{B} = \{v_1, \ldots, v_n\}$ is a basis of eigenvectors, so $T(v_i) = \lambda_i v_i$ for each $i$, then
$$[T]_{\mathcal{B}} = \left[\,[T(v_1)]_{\mathcal{B}} \mid [T(v_2)]_{\mathcal{B}} \mid \cdots\,\right] = \left[\,[\lambda_1 v_1]_{\mathcal{B}} \mid [\lambda_2 v_2]_{\mathcal{B}} \mid \cdots\,\right] = \left[\,\lambda_1 e_1 \mid \lambda_2 e_2 \mid \cdots\,\right] = D$$
is a diagonal matrix.

Theorem 4.3.4. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. If $c_T(x)$ does not split into a product of linear factors, then $T$ is not diagonalizable. If $c_T(x)$ does split into a product of linear factors (which is always the case if $F$ is algebraically closed), then the following are equivalent:

(1) $T$ is diagonalizable.

(2) $m_T(x)$ splits into a product of distinct linear factors.

(3) For every eigenvalue $\lambda$ of $T$, $E_\lambda = E_\lambda^\infty$ (i.e., every generalized eigenvector of $T$ is an eigenvector of $T$).

(4) For every eigenvalue $\lambda$ of $T$, $\operatorname{geom-mult}(\lambda) = \operatorname{alg-mult}(\lambda)$.

(5) The sum of the geometric multiplicities of the eigenvalues is equal to the dimension of $V$.

(6) If $\lambda_1, \ldots, \lambda_m$ are the distinct eigenvalues of $T$, then $V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}$.

Proof. We prove the contrapositive of the first claim: Suppose $T$ is diagonalizable and let $\mathcal{B}$ be a basis of $V$ with $D = [T]_{\mathcal{B}}$ a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. Then $c_T(x) = c_D(x) = \det(xI - D) = (x - \lambda_1)\cdots(x - \lambda_n)$ splits into linear factors.

Suppose now $c_T(x) = (x - \lambda_1)\cdots(x - \lambda_n)$. The scalars $\lambda_1, \ldots, \lambda_n$ may not all be distinct, so we group them: let the distinct eigenvalues be $\lambda_1, \ldots, \lambda_m$, so $c_T(x) = (x - \lambda_1)^{e_1}\cdots(x - \lambda_m)^{e_m}$ for positive integers $e_1, \ldots, e_m$. Let $n = \dim(V)$. Visibly, $e_i$ is the algebraic multiplicity of $\lambda_i$, and $e_1 + \cdots + e_m = n$. Let $f_i$ be the geometric multiplicity of $\lambda_i$. We know by Lemma 4.1.15 that $1 \le f_i \le e_i$, so $f_1 + \cdots + f_m = n$ if and only if $f_i = e_i$ for each $i$; so (4) and (5) are equivalent.

We know by Lemma 4.2.4 that $e_i = \dim E_{\lambda_i}^\infty$, and by definition $f_i = \dim E_{\lambda_i}$, and $E_{\lambda_i} \subseteq E_{\lambda_i}^\infty$; so (3) and (4) are equivalent.

By Lemma 4.2.5, $V = E_{\lambda_1}^\infty \oplus \cdots \oplus E_{\lambda_m}^\infty$, so $V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}$ if and only if $E_{\lambda_i} = E_{\lambda_i}^\infty$ for each $i$; so (3) and (6) are equivalent.

If $V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}$, let $\mathcal{B}_i$ be a basis for $E_{\lambda_i}$ and let $\mathcal{B} = \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_m$. Let $T_i$ be the restriction of $T$ to $E_{\lambda_i}$. Then $\mathcal{B}$ is a basis for $V$ and
$$[T]_{\mathcal{B}} = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_m \end{pmatrix} = A,$$
a block diagonal matrix with $A_i = [T_i]_{\mathcal{B}_i}$. But in this case $A_i$ is the $e_i$-by-$e_i$ matrix $\lambda_i I$ (a scalar multiple of the identity matrix), so (6) implies (1).

If there is an eigenvalue $\lambda_i$ of $T$ for which $E_{\lambda_i} \subsetneq E_{\lambda_i}^\infty$, let $v_i \in E_{\lambda_i}^\infty$ be a generalized eigenvector of index $k > 1$, so $(T - \lambda_i I)^k(v_i) = 0$ but $(T - \lambda_i I)^{k-1}(v_i) \neq 0$. For any polynomial $p(x)$ with $p(\lambda_i) \neq 0$, $p(T)(v_i)$ is another generalized eigenvector of the same index $k$. This implies that any polynomial $f(x)$ with $f(T)(v_i) = 0$, and in particular $m_T(x)$, has a factor of $(x - \lambda_i)^k$. Thus not-(3) implies not-(2), i.e., (2) implies (3).

Finally, let $T$ be diagonalizable, $[T]_{\mathcal{B}} = D$ in some basis $\mathcal{B}$, where $D$ is a diagonal matrix whose diagonal entries are the distinct eigenvalues $\lambda_1, \ldots, \lambda_m$, with $\lambda_1$ repeated $e_1$ times, $\lambda_2$ repeated $e_2$ times, etc. We may reorder $\mathcal{B}$ so that
$$[T]_{\mathcal{B}} = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_m \end{pmatrix} = A$$
with $A_i$ the $e_i$-by-$e_i$ matrix $\lambda_i I$. Then $A_i - \lambda_i I$ is the zero matrix, and an easy computation shows $(A - \lambda_1 I)\cdots(A - \lambda_m I) = 0$, so $m_T(x)$ divides, and is easily seen to be equal to, $(x - \lambda_1)\cdots(x - \lambda_m)$; so (1) implies (2).
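Criteria (1) and (2) of Theorem 4.3.4 are both easy to check by machine. In the sketch below (our own matrices), a diagonal matrix passes and the matrix of Example 4.1.6(2) with $n = 2$ fails; for the latter, $m_T(x) = (x-2)^2$ is not squarefree, and squarefreeness can be tested with $\gcd(p, p')$.

```python
from sympy import Matrix, symbols, gcd, diff

x = symbols('x')
D1 = Matrix([[2, 0], [0, 3]])   # distinct eigenvalues
J = Matrix([[2, 1], [0, 2]])    # the 2x2 matrix of Example 4.1.6(2)

print(D1.is_diagonalizable())   # True
print(J.is_diagonalizable())    # False

# Criterion (2): m_T should split into *distinct* linear factors.
# Here m_J = (x - 2)^2, and p has a repeated factor iff gcd(p, p') != 1.
m_J = (x - 2)**2
print(gcd(m_J, diff(m_J, x)))   # x - 2, so m_J has a repeated factor
```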
Corollary 4.3.5. Let $V$ be a finite-dimensional vector space and $T\colon V \to V$ a linear transformation. Suppose that $c_T(x) = (x - \lambda_1)\cdots(x - \lambda_n)$ is a product of distinct linear factors. Then $T$ is diagonalizable.

Proof. By Corollary 4.1.16, $\operatorname{alg-mult}(\lambda_i) = 1$ implies $\operatorname{geom-mult}(\lambda_i) = 1$ as well. Thus every eigenvalue satisfies criterion (4) of Theorem 4.3.4, and $T$ is diagonalizable.

4.4 An application to differential equations

Let us look at a familiar situation, the solution of linear differential equations, and see how the ideas of linear algebra clarify what is going on. Since we are interested in the linear-algebraic aspects of the situation rather than the analytical ones, we will not try to make minimal differentiability assumptions, but rather make the most convenient ones.

We let $V$ be the vector space of $C^\infty$ complex-valued functions on the real line $\mathbb{R}$. We let $L$ be an $n$th order linear differential operator
$$L = a_n(x)D^n + \cdots + a_1(x)D + a_0(x),$$
where the $a_i(x)$ are functions in $V$ and $D$ denotes differentiation: $D(f(x)) = f'(x)$ and $D^k(f(x)) = f^{(k)}(x)$, the $k$th derivative. We further assume that $a_n(x) \neq 0$ for all $x \in \mathbb{R}$.

Theorem 4.4.1. Let $L$ be as above. Then $\operatorname{Ker}(L)$ is an $n$-dimensional subspace of $V$. For any $b(x) \in V$, $\{y \in V \mid L(y) = b(x)\}$ is an affine subspace of $V$ parallel to $\operatorname{Ker}(L)$.

Proof. As the kernel of a linear transformation, $\operatorname{Ker}(L)$ is a subspace of $V$. $\operatorname{Ker}(L) = \{y \in V \mid L(y) = 0\}$ is just the solution space of the linear differential equation
$$L(y) = a_n(x)y^{(n)} + \cdots + a_1(x)y' + a_0(x)y = 0.$$
For $x_0 \in \mathbb{R}$ define a linear transformation $E\colon \operatorname{Ker}(L) \to \mathbb{C}^n$ by
$$E(y) = \begin{pmatrix} y(x_0) \\ y'(x_0) \\ \vdots \\ y^{(n-1)}(x_0) \end{pmatrix}.$$
The fundamental existence and uniqueness theorem for linear differential equations tells us that $E$ is onto (that's existence—there is a solution for any set of initial conditions) and that it is $1$-$1$ (that's uniqueness), so $E$ is an isomorphism and $\operatorname{Ker}(L)$ is $n$-dimensional. For any $b(x) \in V$ this theorem tells us that $L(y) = b(x)$ has a solution, so, by Theorem 1.5.7, the set of all solutions is an affine subspace parallel to $\operatorname{Ker}(L)$.

Now we wish to solve $L(y) = 0$ or $L(y) = b(x)$. To solve $L(y) = 0$, we find a basis of $\operatorname{Ker}(L)$: since we know $\operatorname{Ker}(L)$ is $n$-dimensional, we simply need to find $n$ linearly independent functions $\{y_1(x), \ldots, y_n(x)\}$ in $\operatorname{Ker}(L)$, and the general solution of $L(y) = 0$ will be $y = c_1 y_1(x) + \cdots + c_n y_n(x)$. Then, by Proposition 1.5.6, in order to solve the inhomogeneous equation $L(y) = b(x)$, we simply need to find a single solution, i.e., a single function $y_0(x)$ with $L(y_0(x)) = b(x)$, and then the general solution of $L(y) = b(x)$ will be $y = y_0(x) + c_1 y_1(x) + \cdots + c_n y_n(x)$.

We now turn to the constant coefficient case, where we can find explicit solutions. That is, we assume $a_n, \ldots, a_0$ are constants. First let us see that a familiar property of differentiation is a consequence of a fact from linear algebra.
Theorem 4.4.2. Let $V$ be a (necessarily infinite-dimensional) vector space and let $T\colon V \to V$ be a linear transformation such that $T$ is onto and $\operatorname{Ker}(T)$ is $1$-dimensional. Then for any positive integer $k$, $\operatorname{Ker}(T^k)$ is $k$-dimensional and is the subspace $\{p(T)(v_k) \mid p(x) \text{ an arbitrary polynomial}\}$ for a single generalized eigenvector $v_k$ of index $k$ (necessarily associated to the eigenvalue $0$).

Proof. We proceed by induction on $k$. By hypothesis the theorem is true for $k = 1$. Suppose it is true for $k$ and consider $T^{k+1}$. Since $T$ is onto, there is a vector $v_{k+1}$ with $T(v_{k+1}) = v_k$, and $v_{k+1}$ is then a generalized eigenvector of index $k + 1$. The subspace $\{p(T)(v_{k+1}) \mid p(x) \text{ a polynomial}\}$ is a subspace of $\operatorname{Ker}(T^{k+1})$ of dimension $k + 1$. We must show this subspace is all of $\operatorname{Ker}(T^{k+1})$.

Let $w \in \operatorname{Ker}(T^{k+1})$, so $T^{k+1}(w) = T^k(T(w)) = 0$. By the inductive hypothesis, we can write $T(w) = p(T)(v_k)$ for some polynomial $p(x)$. If we let $w_0 = p(T)(v_{k+1})$, then
$$T(w_0) = T(p(T)(v_{k+1})) = p(T)(T(v_{k+1})) = p(T)(v_k) = T(w).$$
Hence $w - w_0 \in \operatorname{Ker}(T)$, so $w = w_0 + a v_1$ where $v_1 = T^{k-1}(v_k) = T^k(v_{k+1})$, i.e., $w = (p(T) + aT^k)(v_{k+1}) = q(T)(v_{k+1})$ where $q(x) = p(x) + a x^k$, and we are done.

Lemma 4.4.3. (1) $\operatorname{Ker}(D^k)$ has basis $\{1, x, \ldots, x^{k-1}\}$.

(2) More generally, for any $a$, $\operatorname{Ker}((D - a)^k)$ has basis $\{e^{ax}, x e^{ax}, \ldots, x^{k-1} e^{ax}\}$.

Proof. We can easily verify that $(D - a)^k(x^{k-1} e^{ax}) = 0$ but $(D - a)^{k-1}(x^{k-1} e^{ax}) \neq 0$ (and it is trivial to verify that $D^k(x^{k-1}) = 0$ but $D^{k-1}(x^{k-1}) \neq 0$). Thus $\mathcal{B} = \{e^{ax}, x e^{ax}, \ldots, x^{k-1} e^{ax}\}$ is a set of generalized eigenvectors of indices $1, 2, \ldots, k$ associated to the eigenvalue $a$. Hence $\mathcal{B}$ is linearly independent. We know from Theorem 4.4.1 that $\operatorname{Ker}((D - a)^k)$ has dimension $k$, so $\mathcal{B}$ forms a basis.

Alternatively, we can use Theorem 4.4.2. We know $\operatorname{Ker}(D)$ consists precisely of the constant functions, so it is $1$-dimensional with basis $\{1\}$. Furthermore, $D$ is onto by the Fundamental Theorem of Calculus: if $F(x) = \int_{x_0}^x f(t)\,dt$, then $D(F(x)) = f(x)$. For $D - a$ the situation is only a little more complicated. We can easily find that $\operatorname{Ker}(D - a) = \{c e^{ax}\}$, a $1$-dimensional space with basis $\{e^{ax}\}$. If we let
$$F(x) = e^{ax}\int_{x_0}^x e^{-at} f(t)\,dt,$$
the product rule and the Fundamental Theorem of Calculus show that $(D - a)(F(x)) = f(x)$. With notation as in the proof of Theorem 4.4.2, if we let $v_1 = e^{ax}$ and solve for $v_2, v_3, \ldots$ recursively, we obtain a basis of $\operatorname{Ker}((D - a)^k)$,
$$\{e^{ax}, x e^{ax}, (x^2/2)e^{ax}, \ldots, (x^{k-1}/(k-1)!)\,e^{ax}\}$$
(or $\{1, x, x^2/2, \ldots, x^{k-1}/(k-1)!\}$ if $a = 0$), and since we can replace any basis element by a multiple of itself and still have a basis, we are done.

Theorem 4.4.4. Let $L$ be a constant coefficient differential operator with factorization
$$L = a_n (D - \lambda_1)^{e_1}\cdots(D - \lambda_m)^{e_m},$$
where $\lambda_1, \ldots, \lambda_m$ are distinct. Then
$$\{e^{\lambda_1 x}, \ldots, x^{e_1 - 1}e^{\lambda_1 x}, \ldots, e^{\lambda_m x}, \ldots, x^{e_m - 1}e^{\lambda_m x}\}$$
is a basis for $\operatorname{Ker}(L)$, so that the general solution of $L(y) = 0$ is
$$y = c_{1,1}e^{\lambda_1 x} + \cdots + c_{1,e_1}x^{e_1 - 1}e^{\lambda_1 x} + \cdots + c_{m,1}e^{\lambda_m x} + \cdots + c_{m,e_m}x^{e_m - 1}e^{\lambda_m x}.$$
If $b(x) \in V$ is arbitrary, let $y_0 = y_0(x)$ be an element of $V$ with $L(y_0(x)) = b(x)$. (Such an element $y_0(x)$ always exists.) Then the general solution of $L(y) = b(x)$ is
$$y = y_0 + c_{1,1}e^{\lambda_1 x} + \cdots + c_{1,e_1}x^{e_1 - 1}e^{\lambda_1 x} + \cdots + c_{m,1}e^{\lambda_m x} + \cdots + c_{m,e_m}x^{e_m - 1}e^{\lambda_m x}.$$

Proof. We know that the generalized eigenspaces corresponding to distinct eigenvalues are linearly independent (this follows directly from the proof of Lemma 4.2.5, which does not require $V$ to be finite dimensional), and within each generalized eigenspace a set of generalized eigenvectors with distinct indices is linearly independent as well, so this entire set of generalized eigenvectors is linearly independent. Since there are $n$ of them, they form a basis for $\operatorname{Ker}(L)$. The inhomogeneous case then follows immediately from Proposition 1.5.6.
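The verification asked for in the proof of Lemma 4.4.3 can be delegated to a computer algebra system. The sketch below (ours, with sympy assumed) applies $D - a$ repeatedly to $x^{k-1}e^{ax}$ for $k = 4$:

```python
from sympy import symbols, exp, diff, simplify

x, a = symbols('x a')

def apply_D_minus_a(f, times):
    """Apply the operator (D - a) to f the given number of times."""
    for _ in range(times):
        f = diff(f, x) - a * f
    return f

k = 4
f = x**(k - 1) * exp(a * x)
print(simplify(apply_D_minus_a(f, k)))      # 0
print(simplify(apply_D_minus_a(f, k - 1)))  # 6*exp(a*x) = (k-1)! e^(ax), nonzero
```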
Remark 4.4.5. Suppose $L$ has real coefficients and we want to solve $L(y) = 0$ in real functions. We proceed as above to obtain the general solution, and look for conditions on the $c$'s for the solution to be real. Since $a_n x^n + \cdots + a_0$ is a real polynomial, if the complex number $\lambda$ is a root of it, so is its conjugate $\bar\lambda$, and then to obtain a real solution of $L(y) = 0$ the coefficient of $e^{\bar\lambda x}$ must be the complex conjugate of the coefficient of $e^{\lambda x}$, etc. Thus in our expression for $y$ there is a pair of terms $c e^{\lambda x} + \bar{c} e^{\bar\lambda x}$. Writing $c = c_1 + i c_2$ and $\lambda = a + bi$,
$$c e^{\lambda x} + \bar{c} e^{\bar\lambda x} = (c_1 + i c_2)e^{ax}(\cos(bx) + i\sin(bx)) + (c_1 - i c_2)e^{ax}(\cos(bx) - i\sin(bx)) = d_1 e^{ax}\cos(bx) + d_2 e^{ax}\sin(bx)$$
for real numbers $d_1$ and $d_2$ (namely $d_1 = 2c_1$ and $d_2 = -2c_2$). That is, we can perform a change of basis and, instead of using the basis given in Theorem 4.4.4, replace each pair of basis elements $\{e^{\lambda x}, e^{\bar\lambda x}\}$ by the pair of basis elements $\{e^{ax}\cos(bx), e^{ax}\sin(bx)\}$, etc., and express our solution in terms of this new basis. ♦
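The passage from the complex basis to the real one can also be verified symbolically. In the sketch below (ours), the pair $c e^{\lambda x} + \bar c e^{\bar\lambda x}$ is compared with $d_1 e^{ax}\cos(bx) + d_2 e^{ax}\sin(bx)$ for $d_1 = 2c_1$ and $d_2 = -2c_2$; rewriting the trigonometric side in exponentials makes the difference collapse to zero.

```python
from sympy import symbols, exp, cos, sin, I, expand

x, a, b, c1, c2 = symbols('x a b c1 c2', real=True)

lam = a + b * I           # a complex root lambda = a + bi
c = c1 + c2 * I           # coefficient of e^(lambda x)

pair = c * exp(lam * x) + c.conjugate() * exp(lam.conjugate() * x)
real_form = 2 * c1 * exp(a * x) * cos(b * x) - 2 * c2 * exp(a * x) * sin(b * x)

print(expand(pair - real_form.rewrite(exp)))   # 0
```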
CHAPTER 5
The structure of a linear transformation II

In this chapter we conclude our analysis of the structure of a linear transformation $T\colon V \to V$. We derive our deepest structural results, the rational canonical form of $T$ and, when $V$ is a vector space over an algebraically closed field $F$, the Jordan canonical form of $T$.

Recall our metaphor of coordinates as giving a language in which to describe linear transformations. A basis $\mathcal{B}$ of $V$ in which $[T]_{\mathcal{B}}$ is in canonical form is a "right" language to describe the linear transformation $T$. This is especially true for the Jordan canonical form, which is intimately related to eigenvalues, eigenvectors, and generalized eigenvectors. The importance of the Jordan canonical form of $T$ cannot be overemphasized: every structural fact about a linear transformation is encoded in its Jordan canonical form. We not only show the existence of the Jordan canonical form, but also derive an algorithm for finding the Jordan canonical form of $T$ as well as finding a Jordan basis of $V$, assuming we can factor the characteristic polynomial $c_T(x)$. (Of course, there is no general formula for the roots of a polynomial, as we know from Galois theory.)

We have arranged our exposition in what we think is the clearest way, getting to the simplest (but still important) results as quickly as possible in the preceding chapter, and saving the deepest results for this chapter. However, this is not the logically most economical way. (That would have been to prove the most general and deepest structure theorems first, and to obtain the simpler results as corollaries.) This means that our approach involves a certain amount of repetition. For example, although we defined the characteristic and minimum polynomials of a linear transformation in the last chapter, we will be redefining them here, when we consider them more deeply. But we want to remark that this repetition is a deliberate choice arising from the order in which we have decided to present the material.

While our ultimate goal in this chapter is the Jordan canonical form, our path to it goes through rational canonical form. There are several reasons for this: First, rational canonical form always exists, while in order to obtain the Jordan canonical form for an arbitrary linear transformation we must be working over an algebraically closed field. (There is a generalization of Jordan canonical form that exists over an arbitrary field, and we will briefly mention it though not treat it in depth.) Second, rational canonical form is important in itself, and, as we shall see, has a number of applications. Third, the natural way to prove the existence of the Jordan canonical form of $T$ is first to split $V$ up into the direct sum of the generalized eigenspaces of $T$ (this being the easy step), and then to analyze each generalized eigenspace (this being where the hard work comes in); and for a linear transformation with a single generalized eigenspace, rational and Jordan canonical forms are very closely related.

Here is how our argument proceeds. In Section 5.1 we introduce the minimum and characteristic polynomials of a linear transformation $T\colon V \to V$, and in particular we derive Theorem 5.1.11, which is both very useful and important in its own right. In Section 5.2 we consider $T$-invariant subspaces $W$ of $V$ and the map $\overline{T}$ induced by $T$ on the quotient space $V/W$. In Section 5.3 we prove Theorem 5.3.1, giving the relationship between the minimum and characteristic polynomials of $T$, and as a corollary derive the Cayley-Hamilton Theorem. (It is often thought that this theorem is a consequence of Jordan canonical form, but, as you will see, it is actually prior to Jordan canonical form.) In Section 5.4 we return to invariant subspaces, and prove the key technical results Theorem 5.4.6 and Theorem 5.4.10, which tell us when $T$-invariant subspaces have $T$-invariant complements. Using this work, we quickly derive rational canonical form in Section 5.5, and then we use rational canonical form to quickly derive Jordan canonical form in Section 5.6. Because of the importance and utility of this result, in Section 5.7 we give a well-illustrated algorithm for finding the Jordan canonical form of $T$, and a Jordan basis of $V$, providing we can factor the characteristic polynomial of $T$. In the last two sections of this chapter, Section 5.8 and Section 5.9, we apply our results to derive additional structural information on linear transformations.

5.1 Annihilating, minimum, and characteristic polynomials

Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. In this section we introduce three sorts of polynomials associated to $T$: First, for any nonzero vector $v \in V$, we have its $T$-annihilator $m_{T,v}(x)$. Then we have the minimum polynomial of $T$, $m_T(x)$, and the characteristic polynomial of $T$, $c_T(x)$. All of these polynomials will play important roles in our development.

Theorem 5.1.1. Let $V$ be a vector space of dimension $n$ and let $v \in V$ be a vector, $v \neq 0$. Then there is a unique monic polynomial $m_{T,v}(x)$ of lowest degree with $m_{T,v}(T)(v) = 0$. This polynomial has degree at most $n$.

Proof. Consider the vectors $\{v, T(v), \ldots, T^n(v)\}$. This is a set of $n + 1$ vectors in an $n$-dimensional vector space and so is linearly dependent, i.e., there are $a_0, \ldots, a_n$ not all zero such that $a_0 v + a_1 T(v) + \cdots + a_n T^n(v) = 0$. Thus if $p(x) = a_n x^n + \cdots + a_0$, $p(x)$ is a nonzero polynomial with $p(T)(v) = 0$. Now $J = \{f(x) \in F[x] \mid f(T)(v) = 0\}$ is a nonzero ideal in $F[x]$: if $f(T)(v) = 0$ and $g(T)(v) = 0$, then $(f + g)(T)(v) = 0$; if $f(T)(v) = 0$ and $c(x)$ is any polynomial, then $(cf)(T)(v) = 0$; and $p(x) \in J$, so $J$ is nonzero. Hence by Lemma A.1.8 there is a unique monic polynomial $m_{T,v}(x)$ of lowest degree in $J$.
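The proof of Theorem 5.1.1 is effectively an algorithm: collect $v, T(v), T^2(v), \ldots$ until the first linear dependence appears, and read off the coefficients. A sympy sketch of our own (`annihilator` is a hypothetical helper name):

```python
from sympy import Matrix, symbols, expand

def annihilator(A, v, x):
    """The T_A-annihilator of v: the monic polynomial of least degree
    with p(A)(v) = 0, found as in the proof of Theorem 5.1.1."""
    vecs = [v]
    while True:
        w = A * vecs[-1]                       # the next vector T^k(v)
        M = Matrix.hstack(*vecs)
        if M.rank() == Matrix.hstack(M, w).rank():
            k = len(vecs)                      # first dependence at degree k
            c = M.gauss_jordan_solve(w)[0]     # T^k(v) = sum c[i] * T^i(v)
            return expand(x**k - sum(c[i] * x**i for i in range(k)))
        vecs.append(w)

x = symbols('x')
# T(e1) = 0, T(e2) = e1, T(e3) = e2, as in Example 5.1.3 below.
A = Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
print(annihilator(A, Matrix([0, 0, 1]), x))   # x**3
print(annihilator(A, Matrix([1, 0, 0]), x))   # x
```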
Definition 5.1.2. The polynomial $m_{T,v}(x)$ is called the $T$-annihilator of the vector $v$. ♦

Example 5.1.3. Let $V$ have basis $\{v_1, \ldots, v_n\}$ and define $T$ by $T(v_1) = 0$ and $T(v_i) = v_{i-1}$ for $i > 1$. Then $m_{T,v_k}(x) = x^k$ for $k = 1, \ldots, n$. This shows that $m_{T,v}(x)$ can have any degree between $1$ and $n$. ♦

Example 5.1.4. Let $V = F^\infty$ and let $L\colon V \to V$ be left shift. Consider $v \in V$, $v \neq 0$. For some $k$, $v$ is of the form $(a_1, a_2, \ldots, a_k, 0, 0, \ldots)$ with $a_k \neq 0$, and then $m_{L,v}(x) = x^k$. If $R\colon V \to V$ is right shift, then for any vector $v \neq 0$, the set $\{v, R(v), R^2(v), \ldots\}$ is linearly independent and so there is no nonzero polynomial $p(x)$ with $p(R)(v) = 0$. ♦

Theorem 5.1.5. Let $V$ be a vector space of dimension $n$. Then there is a unique monic polynomial $m_T(x)$ of lowest degree with $m_T(T)(v) = 0$ for every $v \in V$. This polynomial has degree at most $n^2$.

Proof. Choose a basis $\mathcal{B} = \{v_1, \ldots, v_n\}$ of $V$. For each $v_k \in \mathcal{B}$ we have its $T$-annihilator $p_k(x) = m_{T,v_k}(x)$. Let $q(x)$ be the least common multiple of $p_1(x), \ldots, p_n(x)$. Since $p_k(x)$ divides $q(x)$ for each $k$, $q(T)(v_k) = 0$. Hence $q(T)(v) = 0$ for every $v \in V$ by Lemma 1.2.23. If $r(x)$ is any polynomial with $r(x)$ not divisible by $p_k(x)$ for some $k$, then for that value of $k$ we have $r(T)(v_k) \neq 0$. Thus $m_T(x) = q(x)$ is the desired polynomial. Since $m_T(x)$ divides the product $p_1(x)p_2(x)\cdots p_n(x)$, whose degree is at most $n^2$, $m_T(x)$ has degree at most $n^2$.

Definition 5.1.6. The polynomial $m_T(x)$ is the minimum polynomial of $T$. ♦

Remark 5.1.7. As we will see in Corollary 5.1.12, $m_T(x)$ in fact has degree at most $n$. ♦

Example 5.1.8. Let $V$ be $n$-dimensional with basis $\{v_1, \ldots, v_n\}$ and, for any fixed value of $k$ between $1$ and $n$, define $T\colon V \to V$ by $T(v_1) = 0$, $T(v_i) = v_{i-1}$ for $2 \le i \le k$, and $T(v_i) = 0$ for $i > k$. Then $m_T(x) = x^k$. This shows that $m_T(x)$ can have any degree between $1$ and $n$ (compare Example 5.1.3). ♦

Example 5.1.9. Returning to Example 5.1.4, we see that if $T = R$, given any nonzero vector $v \in V$ there is no nonzero polynomial $f(x)$ with $f(T)(v) = 0$, so there is certainly no nonzero polynomial $f(x)$ with $f(T) = 0$. Thus $T$ does not have a minimum polynomial.

If $T = L$, then $m_{T,v}(x)$ exists for every nonzero vector $v \in V$, i.e., for every nonzero vector $v \in V$ there is a polynomial $f_v(x)$ with $f_v(T)(v) = 0$. But there is no single polynomial $f(x)$ with $f(T)(v) = 0$ for every $v \in V$, so again $T$ does not have a minimum polynomial. (Such a polynomial would have to be divisible by $x^k$ for every positive integer $k$.)

Let $T\colon V \to V$ be defined by $T(a_1, a_2, a_3, a_4, \ldots) = (-a_1, a_2, -a_3, a_4, \ldots)$. If $v_0 = (a_1, a_2, \ldots)$ with $a_i = 0$ whenever $i$ is odd, then $T(v_0) = v_0$, so $m_{T,v_0}(x) = x - 1$. If $v_1 = (a_1, a_2, \ldots)$ with $a_i = 0$ whenever $i$ is even, then $T(v_1) = -v_1$, so $m_{T,v_1}(x) = x + 1$. If $v$ is not of one of these two special forms, then $m_{T,v}(x) = x^2 - 1$. Thus $T$ has a minimum polynomial, namely $m_T(x) = x^2 - 1$. ♦
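The proof of Theorem 5.1.5 is also a recipe: take the lcm of the $T$-annihilators of a basis. The sketch below reuses the hypothetical `annihilator` helper from the sketch after Theorem 5.1.1 (again ours, with sympy assumed):

```python
from sympy import Matrix, symbols, lcm, factor

x = symbols('x')
A = Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 2]])

basis = [Matrix([1, 0, 0]), Matrix([0, 1, 0]), Matrix([0, 0, 1])]
anns = [annihilator(A, v, x) for v in basis]
print([factor(p) for p in anns])   # [x - 2, (x - 2)**2, x - 2]
print(factor(lcm(anns)))           # (x - 2)**2, the minimum polynomial
```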
Lemma 5.1.10. Let $V$ be a vector space and let $T\colon V \to V$ be a linear transformation. Let $v_1, \ldots, v_k \in V$ have $T$-annihilators $p_i(x) = m_{T,v_i}(x)$ for $i = 1, \ldots, k$, and suppose that $p_1(x), \ldots, p_k(x)$ are pairwise relatively prime. Let $v = v_1 + \cdots + v_k$ have $T$-annihilator $p(x) = m_{T,v}(x)$. Then $p(x) = p_1(x)\cdots p_k(x)$.

Proof. We proceed by induction on $k$. The case $k = 1$ is trivial. We do the crucial case $k = 2$, and leave $k > 2$ to the reader.

Let $v = v_1 + v_2$ where $p_1(T)(v_1) = p_2(T)(v_2) = 0$ with $p_1(x)$ and $p_2(x)$ relatively prime. Then there are polynomials $q_1(x)$ and $q_2(x)$ with $p_1(x)q_1(x) + p_2(x)q_2(x) = 1$, so
$$v = Iv = \left(p_1(T)q_1(T) + p_2(T)q_2(T)\right)(v_1 + v_2) = p_2(T)q_2(T)(v_1) + p_1(T)q_1(T)(v_2) = w_1 + w_2.$$
Now
$$p_1(T)(w_1) = p_1(T)\left(p_2(T)q_2(T)(v_1)\right) = p_2(T)q_2(T)\left(p_1(T)(v_1)\right) = 0,$$
so $w_1 \in \operatorname{Ker}(p_1(T))$, and similarly $w_2 \in \operatorname{Ker}(p_2(T))$.

Let $r(x)$ be any polynomial with $r(T)(v) = 0$. Since $v = w_1 + w_2$ and $p_2(T)(w_2) = 0$, we have $p_2(T)q_2(T)(v) = p_2(T)q_2(T)(w_1)$, so $0 = r(T)(v)$ gives $0 = r(T)p_2(T)q_2(T)(w_1)$. Also, $p_1(T)(w_1) = 0$, so we certainly have $0 = r(T)p_1(T)q_1(T)(w_1)$. Hence
$$0 = r(T)\left(p_1(T)q_1(T) + p_2(T)q_2(T)\right)(w_1) = r(T)(I)(w_1) = r(T)(w_1)$$
(as $p_1(x)q_1(x) + p_2(x)q_2(x) = 1$), and similarly $0 = r(T)(w_2)$.

Now $r(T)(w_1) = r(T)\left(p_2(T)q_2(T)\right)(v_1) = 0$. But $p_1(x)$ is the $T$-annihilator of $v_1$, so by definition $p_1(x)$ divides $r(x)p_2(x)q_2(x)$. From $1 = p_1(x)q_1(x) + p_2(x)q_2(x)$ we see that $p_1(x)$ and $p_2(x)q_2(x)$ are relatively prime, so by Lemma A.1.21, $p_1(x)$ divides $r(x)$. Similarly, considering $r(T)(w_2)$, we see that $p_2(x)$ divides $r(x)$. By hypothesis $p_1(x)$ and $p_2(x)$ are relatively prime, so by Corollary A.1.22, $p_1(x)p_2(x)$ divides $r(x)$. On the other hand, clearly
$$\left(p_1(T)p_2(T)\right)(v) = \left(p_1(T)p_2(T)\right)(v_1 + v_2) = 0.$$
Thus $p_1(x)p_2(x)$ is the $T$-annihilator of $v$, as claimed.

Theorem 5.1.11. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Then there is a vector $v \in V$ such that the $T$-annihilator $m_{T,v}(x)$ of $v$ is equal to the minimum polynomial $m_T(x)$ of $T$.

Proof. Choose a basis $\mathcal{B} = \{v_1, \ldots, v_n\}$ of $V$. As we have seen in Theorem 5.1.5, the minimum polynomial $m_T(x)$ is the least common multiple of the $T$-annihilators $m_{T,v_1}(x), \ldots, m_{T,v_n}(x)$. Factor $m_T(x) = p_1(x)^{f_1}\cdots p_k(x)^{f_k}$ where $p_1(x), \ldots, p_k(x)$ are distinct irreducible polynomials, so that $p_1(x)^{f_1}, \ldots, p_k(x)^{f_k}$ are pairwise relatively prime polynomials. For each $i$ between $1$ and $k$, $p_i(x)^{f_i}$ must appear as a factor of $m_{T,v_j}(x)$ for some $j$. Write $m_{T,v_j}(x) = p_i(x)^{f_i} q(x)$. Then the vector $u_i = q(T)(v_j)$ has $T$-annihilator $p_i(x)^{f_i}$. By Lemma 5.1.10, the vector $v = u_1 + \cdots + u_k$ has $T$-annihilator $p_1(x)^{f_1}\cdots p_k(x)^{f_k} = m_T(x)$.

Not only is Theorem 5.1.11 interesting in itself, but it plays a key role in future developments: We will often pick an element $v \in V$ with $m_{T,v}(x) = m_T(x)$ and proceed from there. Here is an immediate application of this theorem.

Corollary 5.1.12. Let $T\colon V \to V$ where $V$ is a vector space of dimension $n$. Then $m_T(x)$ is a polynomial of degree at most $n$.

Proof. $m_T(x) = m_{T,v}(x)$ for some $v \in V$. But for any $v \in V$, $m_{T,v}(x)$ has degree at most $n$.

We now define a second very important polynomial associated to a linear transformation from a finite-dimensional vector space to itself. We need a preliminary lemma.

Lemma 5.1.13. Let $A$ and $B$ be similar matrices. Then $\det(xI - A) = \det(xI - B)$ (as polynomials in $F[x]$).

Proof. If $B = PAP^{-1}$ then
$$xI - B = x(PIP^{-1}) - PAP^{-1} = P(xI)P^{-1} - PAP^{-1} = P(xI - A)P^{-1},$$
so
$$\det(xI - B) = \det(P(xI - A)P^{-1}) = \det(P)\det(xI - A)\det(P)^{-1} = \det(xI - A).$$
Definition 5.1.14. Let $A$ be a square matrix. The characteristic polynomial $c_A(x)$ of $A$ is the polynomial $c_A(x) = \det(xI - A)$. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. The characteristic polynomial $c_T(x)$ is the polynomial defined as follows: Let $\mathcal{B}$ be any basis of $V$ and let $A$ be the matrix $A = [T]_{\mathcal{B}}$. Then $c_T(x) = \det(xI - A)$. ♦

Remark 5.1.15. We see from Theorem 2.3.14 and Lemma 5.1.13 that $c_T(x)$ is well defined, i.e., independent of the choice of basis $\mathcal{B}$. ♦

We now introduce a special kind of matrix, whose importance we will see later.

Definition 5.1.16. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ be a monic polynomial in $F[x]$ of degree $n \ge 1$. Then the companion matrix $C(f(x))$ of $f(x)$ is the $n$-by-$n$ matrix
$$C(f(x)) = \begin{pmatrix} -a_{n-1} & 1 & 0 & \cdots & 0 \\ -a_{n-2} & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -a_1 & 0 & 0 & \cdots & 1 \\ -a_0 & 0 & 0 & \cdots & 0 \end{pmatrix}$$
(the $1$'s are immediately above the diagonal). ♦

Theorem 5.1.17. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$ be a monic polynomial and let $A = C(f(x))$ be its companion matrix. Let $V = F^n$ and let $T = T_A\colon V \to V$ be the linear transformation $T(v) = Av$. Let $v = e_n$ be the $n$th standard basis vector. Then the subspace $W$ of $V$ defined by $W = \{g(T)(v) \mid g(x) \in F[x]\}$ is $V$. Furthermore, $m_T(x) = m_{T,v}(x) = f(x)$.

Proof. We see that $T(e_n) = e_{n-1}$, $T^2(e_n) = T(e_{n-1}) = e_{n-2}$, and in general $T^k(e_n) = e_{n-k}$ for $k \le n - 1$. Thus the subspace $W$ of $V$ contains the subspace spanned by $\{T^{n-1}(v), \ldots, T(v), v\} = \{e_1, \ldots, e_{n-1}, e_n\}$, which is all of $V$. We also see that this set is linearly independent, and hence that there is no nonzero polynomial $p(x)$ of degree less than or equal to $n - 1$ with $p(T)(v) = 0$. From
$$T^n(v) = T(e_1) = -a_{n-1}e_1 - a_{n-2}e_2 - \cdots - a_1 e_{n-1} - a_0 e_n = -a_{n-1}T^{n-1}(v) - a_{n-2}T^{n-2}(v) - \cdots - a_1 T(v) - a_0 v$$
we see that $0 = T^n(v) + a_{n-1}T^{n-1}(v) + \cdots + a_1 T(v) + a_0 v$, i.e., $f(T)(v) = 0$. Hence $m_{T,v}(x) = f(x)$. On the one hand, $m_{T,v}(x)$ divides $m_T(x)$. On the other hand, since every $w \in V$ is $w = g(T)(v)$ for some polynomial $g(x)$,
$$m_{T,v}(T)(w) = m_{T,v}(T)g(T)(v) = g(T)m_{T,v}(T)(v) = g(T)(0) = 0$$
for every $w \in V$, and so $m_T(x)$ divides $m_{T,v}(x)$. Thus $m_T(x) = m_{T,v}(x) = f(x)$.

Lemma 5.1.18. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$ be a monic polynomial of degree $n \ge 1$ and let $A = C(f(x))$ be its companion matrix. Then $c_A(x) = \det(xI - A) = f(x)$.

Proof. We proceed by induction. If $n = 1$ then $A = C(f(x)) = (-a_0)$, so $xI - A = (x + a_0)$ has determinant $x + a_0$. Assume the theorem is true for $n - 1$ and consider $n$. We compute the determinant by expansion by minors of the last row:
$$\det\begin{pmatrix} x + a_{n-1} & -1 & 0 & \cdots & 0 \\ a_{n-2} & x & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \\ a_1 & 0 & \cdots & x & -1 \\ a_0 & 0 & \cdots & 0 & x \end{pmatrix} = (-1)^{n+1}a_0\det\begin{pmatrix} -1 & 0 & \cdots & 0 \\ x & -1 & \cdots & 0 \\ & \ddots & \ddots & \\ 0 & \cdots & x & -1 \end{pmatrix} + x\det\begin{pmatrix} x + a_{n-1} & -1 & \cdots & 0 \\ a_{n-2} & x & \cdots & 0 \\ \vdots & & \ddots & \\ a_1 & 0 & \cdots & x \end{pmatrix}.$$
The first minor is an $(n-1)$-by-$(n-1)$ triangular matrix with determinant $(-1)^{n-1}$, so the first term is $(-1)^{n+1}a_0(-1)^{n-1} = a_0$. The second determinant is, by induction, $x^{n-1} + a_{n-1}x^{n-2} + \cdots + a_1$. Hence
$$\det(xI - A) = a_0 + x\left(x^{n-1} + a_{n-1}x^{n-2} + \cdots + a_1\right) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = f(x).$$
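Theorem 5.1.17 and Lemma 5.1.18 are easy to confirm experimentally. The sketch below (ours, with sympy assumed) builds $C(f(x))$ in the book's convention—negated coefficients down the first column, $1$'s just above the diagonal—and recovers $f(x)$ as the characteristic polynomial.

```python
from sympy import Matrix, symbols, zeros, expand

x = symbols('x')

def companion(a):
    """C(f) for monic f = x^n + a[0] x^(n-1) + ... + a[n-1] (book's convention)."""
    n = len(a)
    C = zeros(n, n)
    for i in range(n):
        C[i, 0] = -a[i]          # first column: -a_{n-1}, ..., -a_0
    for i in range(n - 1):
        C[i, i + 1] = 1          # 1's immediately above the diagonal
    return C

# f(x) = x^3 - 2x^2 + 3x - 5
C = companion([-2, 3, -5])
print(expand(C.charpoly(x).as_expr()))   # x**3 - 2*x**2 + 3*x - 5
```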
5.2 Invariant subspaces and quotient spaces

Let $V$ be a vector space and let $T\colon V \to V$ be a linear transformation. A $T$-invariant subspace of $V$ is a subspace $W$ of $V$ such that $T(W) \subseteq W$. In this section we will see how to obtain invariant subspaces, and we will see that if $W$ is an invariant subspace of $V$, then we can obtain in a natural way the "induced" linear transformation $\overline{T}\colon V/W \to V/W$. (Recall that $V/W$ is the quotient of the vector space $V$ by the subspace $W$. We can form $V/W$ for any subspace $W$ of $V$, but in order for $\overline{T}$ to be defined we need $W$ to be an invariant subspace.)

Definition 5.2.1. Let $T\colon V \to V$ be a linear transformation. A subspace $W$ of $V$ is $T$-invariant if $T(W) \subseteq W$, i.e., if $T(v) \in W$ for every $v \in W$. ♦

Remark 5.2.2. If $W$ is a $T$-invariant subspace of $V$, then for any polynomial $p(x)$, $p(T)(W) \subseteq W$. ♦

Lemma 5.2.4 and Lemma 5.2.6 give two basic ways of obtaining $T$-invariant subspaces.

Definition 5.2.3. Let $T\colon V \to V$ be a linear transformation. Let $B = \{v_1, \ldots, v_k\}$ be a set of vectors in $V$. The $T$-span of $B$ is the subspace
$$W = \left\{ \sum_{i=1}^{k} p_i(T)(v_i) \;\middle|\; p_i(x) \in F[x] \right\}.$$
In this situation $B$ is said to $T$-generate $W$. ♦

Lemma 5.2.4. In the situation of Definition 5.2.3, the $T$-span $W$ of $B$ is a $T$-invariant subspace of $V$ and is the smallest $T$-invariant subspace of $V$ containing $B$.

In case $B$ consists of a single vector we have the following:

Lemma 5.2.5. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Let $w \in V$ and let $W$ be the subspace of $V$ $T$-generated by $w$. Then the dimension of $W$ is equal to the degree of the $T$-annihilator $m_{T,w}(x)$ of $w$.

Proof. It is easy to check that $m_{T,w}(x)$ has degree $k$ if and only if $\{w, T(w), \ldots, T^{k-1}(w)\}$ is a basis of $W$.

Lemma 5.2.6. Let $T\colon V \to V$ be a linear transformation and let $p(x) \in F[x]$ be any polynomial. Then
$$\operatorname{Ker}(p(T)) = \{v \in V \mid p(T)(v) = 0\}$$
is a $T$-invariant subspace of $V$.

Proof. If $v \in \operatorname{Ker}(p(T))$, then $p(T)(T(v)) = T(p(T)(v)) = T(0) = 0$.

Now we turn to quotients and induced linear transformations.

Lemma 5.2.7. Let $T\colon V \to V$ be a linear transformation, and let $W \subseteq V$ be a $T$-invariant subspace. Then $\overline{T}\colon V/W \to V/W$ given by $\overline{T}(v + W) = T(v) + W$ is a well-defined linear transformation.

Proof. Recall from Lemma 1.5.11 that $V/W$ is the set of distinct affine subspaces of $V$ parallel to $W$, and from Proposition 1.5.4 that each such subspace is of the form $v + W$ for some element $v$ of $V$. We need to check that the above formula gives a well-defined value for $\overline{T}(v + W)$. Let $v$ and $v'$ be two elements of $V$ with $v + W = v' + W$. Then $v - v' = w \in W$, and then $T(v) - T(v') = T(v - v') = T(w) = w' \in W$, as we are assuming that $W$ is $T$-invariant. Hence
$$\overline{T}(v + W) = T(v) + W = T(v') + W = \overline{T}(v' + W).$$
It is easy to check that $\overline{T}$ is linear.

Definition 5.2.8. In the situation of Lemma 5.2.7, we call $\overline{T}\colon V/W \to V/W$ the quotient linear transformation. ♦

Remark 5.2.9. If $\pi\colon V \to V/W$ is the canonical projection (see Definition 1.5.12), then $\overline{T}$ is given by $\overline{T}(\pi(v)) = \pi(T(v))$. ♦

When $V$ is a finite-dimensional vector space, we can recast our discussion in terms of matrices.

Theorem 5.2.10. Let $V$ be a finite-dimensional vector space and let $W$ be a subspace of $V$. Let $\mathcal{B}_1 = \{v_1, \ldots, v_k\}$ be a basis of $W$ and extend $\mathcal{B}_1$ to $\mathcal{B} = \{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$, a basis of $V$. Let $\mathcal{B}_2 = \{v_{k+1}, \ldots, v_n\}$. Let $\pi\colon V \to V/W$ be the quotient map and let $\overline{\mathcal{B}}_2 = \{\pi(v_{k+1}), \ldots, \pi(v_n)\}$, a basis of $V/W$. Let $T\colon V \to V$ be a linear transformation. Then $W$ is a $T$-invariant subspace if and only if $[T]_{\mathcal{B}}$ is a block upper triangular matrix of the form
$$[T]_{\mathcal{B}} = \begin{pmatrix} A & B \\ 0 & D \end{pmatrix},$$
where $A$ is $k$-by-$k$. In this case, let $\overline{T}\colon V/W \to V/W$ be the quotient linear transformation. Then $[\overline{T}]_{\overline{\mathcal{B}}_2} = D$.
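Theorem 5.2.10 can be seen concretely: take a matrix with an invariant line, change to a basis whose first vector spans it, and the block of zeros appears, with $D$ the matrix of $\overline{T}$ on $V/W$. A sympy sketch with a matrix of our own:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [0, 4, 5],
            [0, 6, 7]])

# W = span{e1} is invariant, since A*e1 = e1. Extend {e1} to the basis
# B = {e1, e1+e2, e1+e3}; conjugating by the basis matrix P exhibits the
# block form of Theorem 5.2.10, with the 2x2 lower-right block D = [T-bar].
P = Matrix([[1, 1, 1],
            [0, 1, 0],
            [0, 0, 1]])
print(P.inv() * A * P)   # Matrix([[1, -7, -8], [0, 4, 5], [0, 6, 7]])
```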
Lemma 5.2.11. In the situation of Lemma 5.2.7, let $V$ be finite dimensional, let $\overline{v} \in V/W$ be arbitrary, and let $v \in V$ be any element with $\pi(v) = \overline{v}$. Then $m_{\overline{T},\overline{v}}(x)$ divides $m_{T,v}(x)$.

Proof. We have $\overline{v} = v + W$. Then
$$m_{T,v}(\overline{T})(\overline{v}) = m_{T,v}(\overline{T})(v + W) = m_{T,v}(T)(v) + W = 0 + W = \overline{0},$$
where $\overline{0} = 0 + W$ is the $0$ vector in $V/W$. Thus $m_{T,v}(x)$ is a polynomial $p(x)$ with $p(\overline{T})(\overline{v}) = \overline{0}$. But $m_{\overline{T},\overline{v}}(x)$ divides any such polynomial.

Corollary 5.2.12. In the situation of Lemma 5.2.11, the minimum polynomial $m_{\overline{T}}(x)$ of $\overline{T}$ divides the minimum polynomial $m_T(x)$ of $T$.

Proof. It easily follows from Remark 5.2.9 that for any polynomial $p(x)$, $p(\overline{T})(\pi(v)) = \pi(p(T)(v))$. In particular, this is true for $p(x) = m_T(x)$. Any $\overline{v} \in V/W$ is $\overline{v} = \pi(v)$ for some $v \in V$, so
$$m_T(\overline{T})(\overline{v}) = \pi(m_T(T)(v)) = \pi(0) = \overline{0}.$$
Thus $m_T(\overline{T})(\overline{v}) = \overline{0}$ for every $\overline{v} \in V/W$, i.e., $m_T(\overline{T}) = 0$. But $m_{\overline{T}}(x)$ divides any such polynomial.

5.3 The relationship between the characteristic and minimum polynomials

In this section we derive the very important Theorem 5.3.1, which gives the relationship between the minimum polynomial $m_T(x)$ and the characteristic polynomial $c_T(x)$ of a linear transformation $T\colon V \to V$, where $V$ is a finite-dimensional vector space over a general field $F$. (We did this in the last chapter for $F$ algebraically closed.) The key result used in proving this theorem is Theorem 5.1.11. As an immediate consequence of Theorem 5.3.1 we have Corollary 5.3.4, the Cayley-Hamilton theorem: For any such $T$, $c_T(T) = 0$.

Theorem 5.3.1. Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Let $m_T(x)$ be the minimum polynomial of $T$ and let $c_T(x)$ be the characteristic polynomial of $T$. Then

(1) $m_T(x)$ divides $c_T(x)$.

(2) Every irreducible factor of $c_T(x)$ is an irreducible factor of $m_T(x)$.

Proof. We proceed by induction on $n = \dim(V)$. Let $m_T(x)$ have degree $k \le n$. Let $v \in V$ be a vector with $m_{T,v}(x) = m_T(x)$. (Such a vector $v$ exists by Theorem 5.1.11.) Let $W_1$ be the $T$-span of $v$. If we let $v_k = v$ and $v_{k-i} = T^i(v)$ for $i \le k - 1$ then, as in the proof of Theorem 5.1.17, $\mathcal{B}_1 = \{v_1, \ldots, v_k\}$ is a basis for $W_1$ and $[T|_{W_1}]_{\mathcal{B}_1} = C(m_T(x))$, the companion matrix of $m_T(x)$.

If $k = n$ then $W_1 = V$, so $[T]_{\mathcal{B}_1} = C(m_T(x))$ has characteristic polynomial $m_T(x)$ by Lemma 5.1.18. Thus $c_T(x) = m_T(x)$ and we are done.

Suppose $k < n$. Then $W_1$ has a complement $V_2$, so $V = W_1 \oplus V_2$. Let $\mathcal{B}_2$ be a basis for $V_2$ and $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ a basis for $V$. Then $[T]_{\mathcal{B}}$ is a matrix of the form
$$[T]_{\mathcal{B}} = \begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$$
where $A = C(m_T(x))$. (The $0$ block in the lower left is due to the fact that $W_1$ is $T$-invariant. If $V_2$ were $T$-invariant then we would have $B = 0$, but that is not necessarily the case.) We use the basis $\mathcal{B}$ to compute $c_T(x)$:
$$c_T(x) = \det(xI - [T]_{\mathcal{B}}) = \det\begin{pmatrix} xI - A & -B \\ 0 & xI - D \end{pmatrix} = \det(xI - A)\det(xI - D) = m_T(x)\det(xI - D),$$
so $m_T(x)$ divides $c_T(x)$.

Now we must show that $m_T(x)$ and $c_T(x)$ have the same irreducible factors. We proceed similarly by induction. If $m_T(x)$ has degree $n$ then $m_T(x) = c_T(x)$ and we are done. Otherwise we again have a direct sum decomposition $V = W_1 \oplus V_2$ and a basis $\mathcal{B}$ with
$$[T]_{\mathcal{B}} = \begin{pmatrix} A & B \\ 0 & D \end{pmatrix}.$$
In general we cannot consider the restriction $T|_{V_2}$, as $V_2$ may not be invariant. But we can (and will) consider $\overline{T}\colon V/W_1 \to V/W_1$. If we let $\overline{\mathcal{B}}_2 = \pi(\mathcal{B}_2)$, then by Theorem 5.2.10, $[\overline{T}]_{\overline{\mathcal{B}}_2} = D$.
By the inductive hypothesis, $m_{\overline{T}}(x)$ and $c_{\overline{T}}(x)$ have the same irreducible factors. Since $m_T(x)$ divides $c_T(x)$, every irreducible factor of $m_T(x)$ is certainly an irreducible factor of $c_T(x)$. We must show the other direction. Let $p(x)$ be an irreducible factor of $c_T(x)$. As in the first part of the proof,
$$c_T(x) = \det(xI - A)\det(xI - D) = m_T(x)c_{\overline{T}}(x).$$
Since $p(x)$ is irreducible, it divides one of the factors. If $p(x)$ divides the first factor $m_T(x)$, we are done. Suppose $p(x)$ divides the second factor. By the inductive hypothesis, $p(x)$ divides $m_{\overline{T}}(x)$. By Corollary 5.2.12, $m_{\overline{T}}(x)$ divides $m_T(x)$. Thus $p(x)$ divides $m_T(x)$, and we are done.

Corollary 5.3.2. In the situation of Theorem 5.3.1, let $m_T(x) = p_1(x)^{e_1}\cdots p_k(x)^{e_k}$ for distinct irreducible polynomials $p_1(x), \ldots, p_k(x)$ and positive integers $e_1, \ldots, e_k$. Then $c_T(x) = p_1(x)^{f_1}\cdots p_k(x)^{f_k}$ for integers $f_1, \ldots, f_k$ with $f_i \ge e_i$ for each $i$.

Proof. This is just a concrete restatement of Theorem 5.3.1.

The following special case is worth pointing out explicitly.

Corollary 5.3.3. Let $V$ be an $n$-dimensional vector space and let $T\colon V \to V$ be a linear transformation. Then $V$ is $T$-generated by a single element if and only if $m_T(x)$ is a polynomial of degree $n$, or, equivalently, if and only if $m_T(x) = c_T(x)$.

Proof. For $w \in V$, let $W$ be the subspace of $V$ $T$-generated by $w$. Then the dimension of $W$ is equal to the degree of $m_{T,w}(x)$, and $m_{T,w}(x)$ divides $m_T(x)$. Thus if $m_T(x)$ has degree less than $n$, $W$ has dimension less than $n$ and so $W \subsetneq V$. By Theorem 5.1.11, there is a vector $v_0 \in V$ with $m_{T,v_0}(x) = m_T(x)$. Thus if $m_T(x)$ has degree $n$, the subspace $V_0$ of $V$ $T$-generated by $v_0$ has dimension $n$ and so $V_0 = V$. Since $m_T(x)$ and $c_T(x)$ are both monic polynomials, and $m_T(x)$ divides $c_T(x)$ by Theorem 5.3.1, we have $m_T(x) = c_T(x)$ if and only if they have the same degree. But $c_T(x)$ has degree $n$.

Theorem 5.3.1 has a famous corollary, originally proved by completely different methods.

Corollary 5.3.4 (Cayley-Hamilton Theorem). (1) Let $V$ be a finite-dimensional vector space and let $T\colon V \to V$ be a linear transformation with characteristic polynomial $c_T(x)$. Then $c_T(T) = 0$.

(2) Let $A$ be an $n$-by-$n$ matrix and let $c_A(x)$ be its characteristic polynomial $c_A(x) = \det(xI - A)$. Then $c_A(A) = 0$.

Proof. (1) $m_T(T) = 0$ and $m_T(x)$ divides $c_T(x)$, so $c_T(T) = 0$.

(2) This is a translation of (1) into matrix language. (Let $T = T_A$.)

Remark 5.3.5. The minimum polynomial $m_T(x)$ has appeared more prominently than the characteristic polynomial $c_T(x)$ so far. As we shall see, $m_T(x)$ plays a more important role in analyzing the structure of $T$ than $c_T(x)$ does. However, $c_T(x)$ has the very important advantage that it can be calculated without having to consider the structure of $T$: it is a determinant, and we have methods for calculating determinants. ♦
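Theorem 5.3.1 and Corollary 5.3.2 predict exactly how the two factorizations are related. The check below (our own matrix) reuses the hypothetical `min_poly` helper sketched in Section 4.1: the irreducible factors agree, and the exponents in $c_T$ are at least those in $m_T$.

```python
from sympy import Matrix, symbols, factor

x = symbols('x')
A = Matrix([[2, 1, 0, 0],
            [0, 2, 0, 0],
            [0, 0, 2, 0],
            [0, 0, 0, 3]])

print(factor(A.charpoly(x).as_expr()))  # (x - 2)**3*(x - 3)
print(factor(min_poly(A, x)))           # (x - 2)**2*(x - 3)
```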
5.4 Invariant subspaces and invariant complements

We have stressed the difference between subspaces and quotient spaces. If $V$ is a vector space and $W$ is a subspace, then the quotient space $V/W$ is not a subspace of $V$. But $W$ always has a complement $W'$ (though except in trivial cases $W'$ is not unique), $V = W \oplus W'$, and if $\pi\colon V \to V/W$ is the canonical projection, then the restriction $\pi|_{W'}$ gives an isomorphism from $W'$ to $V/W$. (On the one hand this can be very useful, but on the other hand it makes it easy to confuse the quotient space $V/W$ with the subspace $W'$.)

Once we consider $T$-invariant subspaces, the situation changes markedly. Given a vector space $V$, a linear transformation $T\colon V \to V$, and a $T$-invariant subspace $W$, then, as we have seen in Lemma 5.2.7, we obtain from $T$ in a natural way a linear transformation $\overline{T}$ on the quotient space $V/W$. However, it is not in general the case that $W$ has a $T$-invariant complement $W'$. This section will be devoted to investigating the question of when a $T$-invariant subspace $W$ has a $T$-invariant complement $W'$. We will see two situations in which this is always the case—Theorem 5.4.6, whose proof is relatively simple, and Theorem 5.4.10, whose proof is more involved. Theorem 5.4.10 is the key result we will need in order to develop rational canonical form, and Theorem 5.4.6 is the key result we will need in order to further develop Jordan canonical form.

Definition 5.4.1. Let $T\colon V \to V$ be a linear transformation. Then $V = W_1 \oplus \cdots \oplus W_k$ is a $T$-invariant direct sum if $V = W_1 \oplus \cdots \oplus W_k$ is the direct sum of $W_1, \ldots, W_k$ and each $W_i$ is a $T$-invariant subspace. If $V = W_1 \oplus W_2$ is a $T$-invariant direct sum decomposition, then $W_2$ is a $T$-invariant complement of $W_1$. ♦

Example 5.4.2. (1) Let $V$ be a $2$-dimensional vector space with basis $\{v_1, v_2\}$ and let $T\colon V \to V$ be defined by $T(v_1) = 0$, $T(v_2) = v_2$. Then $W_1 = \operatorname{Ker}(T) = \{c_1 v_1 \mid c_1 \in F\}$ is a $T$-invariant subspace, and it has $T$-invariant complement $W_2 = \operatorname{Ker}(T - I) = \{c_2 v_2 \mid c_2 \in F\}$.

(2) Let $V$ be as in part (1) and let $T\colon V \to V$ be defined by $T(v_1) = 0$, $T(v_2) = v_1$. Then $W_1 = \operatorname{Ker}(T) = \{c_1 v_1 \mid c_1 \in F\}$ is again a $T$-invariant subspace, but it does not have a $T$-invariant complement. For suppose $W_2$ is any $T$-invariant subspace with $V = W_1 + W_2$. Then $W_2$ has a vector of the form $c_1 v_1 + c_2 v_2$ for some $c_2 \neq 0$. Then $T(c_1 v_1 + c_2 v_2) = c_2 v_1 \in W_2$, so $W_2$ contains the subspace spanned by $\{c_2 v_1, c_1 v_1 + c_2 v_2\}$, i.e., $W_2 = V$, and then $V$ is not the direct sum of $W_1$ and $W_2$. (Instead of $W_1 \cap W_2 = \{0\}$, as required for a direct sum, $W_1 \cap W_2 = W_1$.) ♦

We now consider a more elaborate situation and investigate invariant subspaces, complements, and induced linear transformations.
But W1 \ W2 D f0g by the definition of a direct sum, so h.T /.w2 / D 0 for every w2 2 W2 , and hence mT2 .x/ divides h.x/. Set h1 .x/ D mT2 .x/. If V D W1 ˚ W2 , then v0 D w1 C w2 for some w1 2 W1 , w2 2 W2 . Let k.x/ be the least common multiple of g.x/ and h.x/. Then k.T /.v0 / D k.T /.w1 C w2 / D k.T /.w1 / C k.T /.w2 / D 0 C 0 as mT1 .x/ D g.x/ divides k.x/ and mT2 .x/ D h1 .x/ divides h.x/, which divides k.x/. Thus k.x/ is divisible by f .x/ D mT ;v0 .x/. But we chose g.x/ and h.x/ to not be relatively prime, so their least common multiple k.x/ is a proper factor of their product f .x/, a contradiction. Þ Example 5.4.4. Suppose that g.x/ and h.x/ are relatively prime, and let f .x/ D g.x/h.x/. Let V be a vector space and let T W V ! V a linear transformation with mT .x/ D cT .x/ D f .x/. Let v0 2 V with mT ;v0 .x/ D mT .x/, so that V is T -generated by v0 . Let W1 D h.T /.V /. We claim that W2 D g.T /.V / is a T -invariant complement of W1 . First we check that W1 \ W2 D f0g. An argument similar to that in the previous example shows that if w 2 W1 , then mT1 ;w .x/ divides g.x/, and that if w 2 W2 , then mT2 ;w .x/ divides h.x/. Hence if w 2 W1 \ W2 , mT ;w .x/ divides both g.x/ and h.x/, and thus divides their gcd. These two polynomials were assumed to be relatively prime, so their gcd is 1. Hence 1w D 0, i.e., w D 0. Next we show that we can write any vector in V as a sum of a vector in W1 and a vector in W2 . Since v0 T -generates V , it suffices to show that we can write v0 in this way. Now g.x/ and h.x/ are relatively prime, so there are polynomials r .x/ and s.x/ with g.x/r .x/ C s.x/h.x/ D 1. Then v0 D 1v0 D h.T /s.T / C g.T /r .T / v0 D h.T / s.T / v0 C g.T / r .T / v0 D w1 C w2 where and w1 D h.T /.s.T /.v0 // 2 h.T /.V / D W1 w2 D g.T /.r .T /.v0 // 2 g.T /.V / D W2 : Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 125 — #139 i i 5.4. Invariant subspaces and invariant complements 125 Example 5.4.5. Let g.x/ and h.x/ be arbitrary polynomials and let f .x/ D g.x/h.x/. Let V be a vector space and T W V ! V a linear transformation with mT .x/ D cT .x/ D f .x/. Let v0 2 V with mT ;v0 .x/ D mT .x/ so that V is T -generated by v0 . Let W1 D h.T /.V /. Then we may form the quotient space V 1 D V =W1 , with the quotient linear transformation T W V 1 ! V 1 , and 1 W V ! V 1 . Clearly V 1 is T -generated by the single element v 1 D 1 .v0 /. (Since any v 2 V can be written as v D k.T /.v0 / for some polynomial k.x/, then v C W1 D k.T /.v0 / C W1 .) We claim that mT ;v 1 .x/ D cT ;v 1 .x/ D h.x/. We see that h.T /.v 1 / D h.T /.v0 / C W1 D 0 C W1 as h.T /.v0 / 2 W1 . Hence mT ;v 1 .x/ D k.x/ divides h.x/. Now k.T /.v 1 / D 0 C W1 , i.e., k.T /.v0 / 2 W1 D h.T /.V /, so k.T /.v0 / D h.T /.u1 / for some u1 2 V . Then g.T /k.T /.v0 / D g.T /h.T /.v1 / D f .T /.u1 / D 0 since mT .x/ D f .x/. Then f .x/ D g.x/h.x/ divides g.x/k.x/, so h.x/ divides k.x/. Hence mT ;v 1 .x/ D k.x/ D h.x/. The same argument shows that if W2 D g.T /.V / and V 2 D V =W2 with T W V 2 ! V 2 the induced linear transformation then V 2 is T -generated by the single element v 2 D 2 .v0 / with mT ;v 2 .x/ D g.x/. Þ We now come to the two most important ways we can obtain T -invariant complements (or direct sum decompositions). Here is the first. Theorem 5.4.6. Let V be a vector space and let T W V ! V be a linear transformation. Let T have minimum polynomial mT .x/ and let mT .x/ factor as a product of pairwise relatively prime polynomials, mT .x/ D p1 .x/ pk .x/. For i D 1; : : : ; k, let Wi D Ker.pi .T //. 
Then each Wi is a T -invariant subspace and V D W1 ˚ ˚ Wk . Proof. For any i , let wi 2 Wi . Then pi .T /.T .wi // D T .pi .T /.wi // D T .0/ D 0 so T .wi / 2 Wi and Wi is T -invariant. For each i , let qi .x/ D mT .x/=pi .x/. Then fq1 .x/; : : : ; qk .x/g is relatively prime, so there are polynomials r1 .x/; : : : ; rk .x/ with q1 .x/r1 .x/ C C qk .x/rk .x/ D 1. Let v 2 V . Then v D Iv D q1 .T /r1 .T / C C qk .T /rk .T / .v/ D w1 C C wk i i i i i i “book” — 2011/3/4 — 17:06 — page 126 — #140 i i 126 Guide to Advanced Linear Algebra with wi D qi .T /ri .T /.v/. Furthermore, pi .T / wi D pi .T /qi .T /ri .T /.v/ D mT .T /ri .T /.v/ D 0 as mT .T / D 0 by the definition of the minimum polynomial mT .x/, and so wi 2 Wi . To complete the proof we show that if 0 D w1 C C wk with wi 2 Wi for each i , then w1 D D wk D 0. Suppose i D 1. Then 0 D w1 C C wk so 0 D q1 .T /.0/ D q1 .T /.w1 C C wk / D q1 .T /.w1 / C 0 C C 0 D q1 .T /.w1 / as pi .x/ divides q1 .x/ for every i > 1. Also p1 .T /.w1 / D 0 by definition. Now p1 .x/ and q1 .x/ are relatively prime, so there exist polynomials f .x/ and g.x/ with f .x/p1 .x/ C g.x/q1 .x/ D 1. Then w1 D Iw1 D .f .T /p1 .T / C g.T /q1 .T //.w1 / D f .T /.p1 .T /.w1 // C g.T /.q1 .T /.w1 // D f .T /.0/ C g.T /.0/ D 0 C 0 D 0: Similarly, wi D 0 for each i . As a consequence, we obtain the T -invariant subspaces of a linear transformation T W V ! V . Theorem 5.4.7. Let T W V ! V be a linear transformation and let mT .x/ D p1 .x/e1 pk .x/ek be a factorization of the minimum polynomial of T into powers of distinct irreducible polynomials. Let Wi D Ker.pi .T /ei /, so that V D W1 ˚ ˚Wk , a T -invariant direct sum decomposition. For i D 1; : : : ; k, let Ui be a T -invariant subspace of Wi (perhaps Ui D f0g). Then U D U1 ˚ ˚ Uk is a T -invariant subspace of V , and every T -invariant subspace of V arises in this way. Proof. We have V D W1 ˚ ˚ Wk , by Theorem 5.4.6. It is easy to check that any such U is T -invariant. We show that these are all the T -invariant subspaces. Let U be any T -invariant subspace of V . Let i W V ! Wi be the projection and let Ui D i .U /. We claim that U D U1 ˚ ˚ Uk . To show that it suffices to show that Ui U for each i . Let ui 2 Ui . Then, by the definition of Ui , there is an element u of U of the form u D u1 C C ui C C uk , for some elements uj 2 Uj , j ¤ i . Let qi .x/ D mT .x/=pi .x/ei . i i i i i i “book” — 2011/3/4 — 17:06 — page 127 — #141 i i 5.4. Invariant subspaces and invariant complements 127 Since pi .x/ei and qi .x/ are relatively prime, there are polynomials ri .x/ and si .x/ with ri .x/pi .x/ei C si .x/qi .x/ D 1. We have qi .T /.uj / D 0 for j ¤ i and pi .T /ei .ui / D 0. Then ui D 1ui D 1 ri .T /pi .T /ei ui D si .T /qi .T / ui D 0 C : : : C si .T /qi .T / ui C : : : C 0 D si .T /qi .T / u1 C : : : C si .T /qi .T /.ui / C : : : C sk .T /qk .T /.ui / D si .T /qi .T / u1 C : : : C ui C : : : C uk D si .T /qi .T /.u/: Since U is T -invariant, si .T /qi .T /.u/ 2 U , i.e., ui 2 U , as claimed. Now we come to the second way in which we can obtain T -invariant complements. The proof here is complicated, so we separate it into two stages. Lemma 5.4.8. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. Let w1 2 V be any vector with mT ;w1 .x/ D mT .x/ and let W1 be the subspace of V T -generated by w1. Suppose that W1 is a proper subspace of V and that there is a vector v2 2 V such that V is T -generated by fw1; v2 g. 
Then there is a vector w2 2 V such that V D W1 ˚ W2 , where W2 is the subspace of V T -generated by w2 . Proof. Observe that if V2 is the subspace of V that is T -generated by v2 , then V2 is a T -invariant subspace and, by hypothesis, every v 2 V can be written as v D w10 C v200 for some w10 2 W1 and some v200 2 V2 . Thus V D W1 C V2 . However, there is no reason to conclude that W1 and V2 are independent subspaces of V , and that may not be the case. Our proof will consist of showing how to “modify” v2 to obtain a vector w2 such that we can still write every v 2 V as v D w10 C w20 with w10 2 W1 and w20 2 W2 , the subspace of V T -generated by w2 , and with W1 \ W2 D f0g. We consider the vector v20 D v2 C w where w is any element of W1 . Then we observe that fw1; v20 g also T -generates V . Our proof will consist of showing that for the proper choice of w, w2 D v20 D v2 Cw is an element of V with W1 \ W2 D f0g. Let V have dimension n and let mT .x/ be a polynomial of degree k. Set j D n k. Then W1 has basis B1 D fu1 ; : : : ; uk g D fT k 1 .w1 /; : : : ; T .w1 /; w1g: By hypothesis, V is spanned by fw1; T .w1 /; : : :g [ fv20 ; T .v20 /; : : :g; i i i i i i “book” — 2011/3/4 — 17:06 — page 128 — #142 i i 128 Guide to Advanced Linear Algebra so V is also spanned by fw1 ; T .w1 /; : : : ; T k 1 .w1 /g [ fv20 ; T .v20 /; : : :g: We claim that fw1; T .w1 /; : : : ; T k 1 .w1 /g [ fv20 ; T .v20 /; : : : ; T j 1 .v20 /g is a basis for V . We see this as follows: We begin with the linearly independent set fw1 ; : : : ; T k 1 .w1 /g and add v20 ; T .v20 /; : : : as long as we can do so and still obtain a linearly independent set. The furthest we can go is through T j 1 .v20 /, as then we have k C j D n vectors in an n-dimensional vector space. But we need to go that far, as once some T i .v20 / is a linear combination of B1 and fv20 ; : : : ; T i 1 .v20 /g, this latter set, consisting of k C i vectors, spans V , so i j . (The argument for this uses the fact that W1 is T -invariant.) We then let B20 D fu0kC1 ; : : : ; u0n g D fT j 1 .v20 /; : : : ; v20 g and B 0 D B1 [ B20 : Then B 0 is a basis of V . Consider T j .u0n /. It has a unique expression in terms of basis elements: D k X If we let p.x/ D x j C cj 1x T j u0n i D1 bi ui C j 1 j 1 X i D0 ci u0n i : C C c0 , we have that k X u D p.T / v20 D p.T / u0n D bi ui 2 W1 : i D1 Case I (incredibly lucky): u D 0. Then T j .v20 / 2 V20 , the subspace T -spanned by v20 , which implies that T i .v20 / 2 V20 for every i , so V20 is T -invariant. Thus in this case we choose w2 D v20 , so W2 D V2 , T D W1 ˚ W2 , and we are done. Case II (what we expect): u ¤ 0. We have to do some work. The key observation is that the coefficients bk ; bk 1 ; : : : ; bk j C1 are all P 0, and hence u D ikD1j bi ui . Here is where we crucially use the hypothesis that mT ;w1 .x/ D mT .x/. We argue by contradiction. Suppose bm ¤ 0 for some m k j C 1, and let m be the largest such index. Then Tm 1 .u/ D bm u1 ; Tm 2 .u/ D bm u2 C bm 1 u1 ; :::: i i i i i i “book” — 2011/3/4 — 17:06 — page 129 — #143 i i 5.4. Invariant subspaces and invariant complements 129 Thus we see that ˚ m 1 T p.T / v20 ; T m 2 p.T / v20 ; : : : ; p.T / v20 ; T j 1 v20 ; T j 2 v20 ; : : : ; v20 is a linearly independent subset of V20 , the subspace of V T -generated by v20 , and hence V20 has dimension at least m C j k C 1. That implies that mT ;v20 .x/ has degree at least k C 1. But mT ;v20 .x/ divides mT .x/ D mT ;w1 .x/, which has degree k, and that is impossible. 
We now set k 1 X wD i D1 bi ui Cj and w2 D v20 C w, B1 D fu1 ; : : : ; uk g D fT k 1 B2 D fukC1 ; : : : ; un g D fT j .w1 /; : : : ; w1g 1 (as before); .w2 /; : : : ; w2g; and B D B1 [ B2 : We then have T j un D T j v20 C w D T j v20 C T j w ! k k Xj Xj j D bi ui C T bi ui Cj i D1 k j D X i D1 k j bi ui C X i D1 i D1 bi ui D 0 and we are back in Case I (through skill, rather than luck) and we are done. Corollary 5.4.9. In the situation of Lemma 5.4.8, let n D dim V and let k D deg mT .x/. Then n 2k. Suppose that n D 2k. If V2 is the subspace of V T -generated by v2 , then V D W1 ˚ V2 . Proof. From the proof of Lemma 5.4.8 we see that j D n k k. Also, if n D 2k, then j D k, so bk ; bk 1 ; : : : ; b1 are all zero. Then u D 0, and we are Case I. Theorem 5.4.10. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. Let w1 2 V be any vector with i i i i i i “book” — 2011/3/4 — 17:06 — page 130 — #144 i i 130 Guide to Advanced Linear Algebra mT ;w1 .x/ D mT .x/ and let W1 be the subspace of V T -generated by w1 . Then W1 has a T -invariant complement W2 , i.e., there is a T -invariant subspace W2 of V with V D W1 ˚ W2 . Proof. If W1 D V then W2 D f0g and we are done. Suppose not. W2 D f0g is a T -invariant subspace of V with W1 \ W2 D f0g. Then there exists a maximal T -invariant subspace W2 of V with W1 \ W2 D f0g, either by using Zorn’s Lemma, or more simply by taking such a subspace of maximal dimension. We claim that W1 ˚ W2 D V . We prove this by contradiction, so assume W1 ˚ W2 V . Choose an element v2 of V with v2 … W1 ˚ W2 . Let V2 be the subspace T -spanned by v2 and let U2 D W2 C V2 . If W1 \ U2 D f0g then U2 is a T -invariant subspace of V with W1 \ U2 D f0g and with U2 W2 , contradicting the maximality of W2 . Otherwise, let V 0 D W1 C U2 . Then V 0 is a T -invariant subspace of V so we may consider the restriction T 0 of T to V 0 , T 0 W V 0 ! V 0 . Now W2 is a T 0 -invariant subspace of V 0 , so we may consider the quotient linear transformation T 0 W V 0 =W2 ! V 0 =W2 . Set X D V 0 =W2 and S D T 0 . Let W V 0 ! X be the quotient map. Let w 1 D .w1 / and let v 2 D .v2 /. Let Y1 D .W1 / X and let Z2 D .U2 / X. We make several observations: First, Y1 and Z2 are S-invariant subspaces of X. Second, Y1 is T -spanned by w 1 and Z2 is T -spanned by v 2 , so that X is T -spanned by fw 1 ; v2 g. Third, since W1 \ W2 D f0g, the restriction of to W1 , W W1 ! Y1 , is 1-1. Certainly mT 0 .x/ divides mT .x/ (as if p.T /.v/ D 0 for every v 2 V , then p.T /.v/ D 0 for every v 2 V 0 ) and we know that mS .x/ divides mT 0 .x/ by Corollary 5.2.12. By hypothesis mT ;w1 .x/ D mT .x/, and, since W W1 ! Y1 is 1-1, mS ;w 1 .x/ D mT ;w1 .x/. Since w1 2 V 0 , mT ;w1 .x/ divides mT 0 .x/. Finally, mS ;w 1 .x/ divides mS .x/. Putting these together, we see that mS ;w 1 .x/ D mS .x/ D mT 0 .x/ D mT .x/ D mT ;w1 .x/: We now apply Lemma 5.4.8 with T D S, V D X, w1 D w 1 , and v2 D v 2 . We conclude that there is a vector, which we denote by w 2 , such that X D Y1 ˚ Y2 , where Y2 is the subspace of X generated by w 2 . Let w20 be any element of V 0 with .w20 / D w 2 , and let V20 be the subspace of V 0 T 0 -spanned by w20 , or, equivalently, the subspace of V T -spanned by w20 . Then .V20 / D Y2 . i i i i i i “book” — 2011/3/4 — 17:06 — page 131 — #145 i i 5.4. Invariant subspaces and invariant complements 131 To finish the proof, we observe that V 0 =W2 D X D Y1 C Z2 D Y1 ˚ Y2 ; so, setting U20 D W2 C V20 , V D W1 C V20 C W2 D W1 C W2 C V20 D W1 C U20 : Also, W1 \ U20 D f0g. 
For if x 2 W1 \ U20 , .x/ 2 .W1 / \ .U20 / D Y1 \ Y2 D f0g (as .w2 / D f0g). But if x 2 W1 \ U20 , then x 2 W1 , and the restriction of to W1 is 1-1, so .x/ D 0 implies x D 0. Hence V 0 D W1 ˚ U20 and U20 W2 , contradicting the maximality of W2 . We will only need Theorem 5.4.10 but we can generalize it. Corollary 5.4.11. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. Let w1 ; : : : ; wk 2 V and let Wi be the subspace T -spanned by wi , i D 1; : : : ; k. Suppose that mT ;wi .x/ D mT .x/ for i D 1; : : : ; k, and that fW1 ; : : : ; Wk g is independent. Then W1 ˚ ˚ Wk has a T -invariant complement, i.e., there is a T -invariant subspace W 0 of V with V D W1 ˚ ˚ Wk ˚ W 0 . Proof. We proceed by induction on k. The k D 1 case is Theorem 5.4.10. For the induction step, consider T W V ! V where V D V =W1 . We outline the proof. Let WkC1 be a maximal T -invariant subspace of V with .W1 ˚ ˚ Wk / \ WkC1 D f0g: We claim that W1 ˚ ˚ WkC1 D V . Assume not. Let W i D T .Wi / for i D 2; : : : ; k. By the inductive hypothesis, W 2 ˚ ˚ W k has a T invariant complement Y kC1 containing .WkC1 /. (This requires a slight modification of the statement and proof of Theorem 5.4.10. We used our original formulation for the sake of simplicity.) Let YkC1 be a subspace of V with YkC1 WkC1 and .YkC1 / D Y kC1 . Certainly .W2 ˚ ˚ Wk / \ YkC1 D f0g. Choose any vector y 2 YkC1 , y … WkC1 . If the subspace Y T -generated by y is disjoint from W1 , set x D y and X D Y . Otherwise, “modify” Y as in the proof of Lemma 5.4.8 to obtain x with X, the subspace T -generated by x, disjoint from W1 . Set W 0 D WkC1 ˚ X. Then W 0 WkC1 and W 0 is disjoint from W1 ˚ ˚ Wk , contradicting the maximality of WkC1 . i i i i i i “book” — 2011/3/4 — 17:06 — page 132 — #146 i i 132 5.5 Guide to Advanced Linear Algebra Rational canonical form Let V be a finite-dimensional vector space over an arbitrary field F and let T W V ! V be a linear transformation. In this section we prove that T has a unique rational canonical form. The basic idea of the proof is one we have seen already in a much simpler context. Recall the theorem that any linearly independent subset of a vector space extends to a basis of that vector space. We think of that as saying that any partial good set extends to a complete good set. We would like to do the same thing in the presence of a linear transformation T : Define a partial T -good set and show that any partial T -good set extends to a complete T -good set. But we have to be careful to define a T -good set in the right way. We will see that the right kind of way to define a partial T -good set is to define it as the right kind of basis for the right kind of T -invariant subspace W . Then we will be able extend this to the right kind of basis for all of V by using Theorem 5.4.10. Definition 5.5.1. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. An ordered set C D fw1; : : : ; wk g is a rational canonical T -generating set of V if the following conditions are satisfied: (1) V D W1 ˚ ˚ Wk where Wi is the subspace of V that is T -generated by wi (2) pi .x/ is divisible by pi C1 .x/ for i D 1; : : : ; k mT ;wi .x/ is the T -annihilator of wi . 1, where pi .x/ D Þ When T D I, any basis of V is a rational canonical T -generating set and vice-versa, with pi .x/ D x 1 for every i . Of course, every V has a basis. A basis for V is never unique, but any two bases of V have the same number of elements, namely the dimension of V . 
Here is the appropriate generalization of these two facts. For the second fact, we have not only that any two rational canonical T -generating sets have the same number of elements, but also the same number of elements of each “type”, where the type of an element is its T -annihilator. Theorem 5.5.2. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. Then V has a rational canonical T generating set C D fw1; : : : ; wk g. If C 0 D fw10 ; : : : ; wl0 g is any rational canonical T -generating set of V , then k D l and pi0 .x/ D pi .x/ for i D 1; : : : ; k, where pi0 .x/ D mT ;w 0 .x/ and pi .x/ D mT ;wi .x/. i i i i i i i “book” — 2011/3/4 — 17:06 — page 133 — #147 i i 5.5. Rational canonical form 133 Proof. First we prove existence and then we prove uniqueness. For existence we proceed by induction on n D dim.V /. Choose an element w1 of V with mT ;w1 .x/ D mT .x/ and let W1 be the subspace of V T -generated by w1 . If W1 D V we are done. Otherwise, let W 0 be a T -invariant complement of W in V , which exists by Theorem 5.4.10. Then V D W ˚ W 0 . Let T 0 be the restriction of T to W 0 , T 0 W W 0 ! W 0 . Then mT 0 .x/ divides mT .x/. (Since mT .T /.v/ D 0 for all v 2 V , mT .T /.v/ D 0 for all v in W 0 .) By induction, W 0 has a rational canonical T 0 -generating set that we write as fw2; : : : ; wk g. Then fw1; : : : ; wk g is a rational canonical T -generating set of V . For uniqueness, suppose V has rational canonical T -generating sets C D fw1 ; : : : ; wk g and C 0 D fw10 ; : : : ; wl0 g with corresponding T -invariant direct sum decompositions V D W1 ˚ ˚Wk and V D W10 ˚ ˚Wl0 and corresponding T -annihilators pi .x/ D mT ;wi .x/ and pi0 .x/ D mT ;wi0 .x/. Let these polynomials have degree di and di0 respectively, and let V have dimension n. We proceed by induction on k. Now p1 .x/ D mT .x/ and p10 .x/ D mT .x/, so p10 .x/ D p1 .x/. If k D 1, V D W1 , dim.V / D dim.W1 /, n D d1 . But then n D d10 D dim.W10 / so V D W10 . Then l D 1, p10 .x/ D p1 .x/, and we are done. Suppose for some k 1 we have pi0 .x/ D pi .x/ for i D 1; : : : ; k. If V D W1 ˚ ˚ Wk then n D d1 C C dk D d10 C C dk0 so V D W10 ˚ ˚Wk0 as well, l D k, pi0 .x/ D pi .x/ and we are done, and similarly if V D W10 ˚ ˚ Wl0 . Otherwise consider the vector space pkC1 .T /.V /, a T -invariant subspace of V . Since V D W1 ˚ ˚ Wk ˚ WkC1 ˚ we have that pkC1 .T /.V / D pkC1 .T / W1 ˚ ˚ pkC1 .T / Wk ˚ pkC1 .T / WkC1 ˚ : Let us identify this subspace further. Since pkC1 .x/ D mT ;wkC1 .x/, we have that pkC1 .T /.wkC1 / D 0, and hence pkC1 .T /.WkC1 / D 0. Since pkCi .x/ divides pkC1 .x/ for i 1, we also have that pkC1 .T /.wkCi / D 0 and hence pkC1 .T /.WkCi / D 0 for i 1. Thus pkC1 .T /.V / D pkC1 .T / W1 ˚ ˚ pkC1 .T / Wk : Now pkC1 .x/ divides pi .x/ for i < k, so pkC1 .T /.Wi / has dimension di dkC1 , and hence pkC1 .T /.V / is a vector space of dimension d D .d1 dkC1 / C .d2 dkC1 / C C .dk dkC1 /. (Some or all of these differences of dimensions may be zero, which does not affect the argument.) i i i i i i “book” — 2011/3/4 — 17:06 — page 134 — #148 i i 134 Guide to Advanced Linear Algebra Apply the same argument to the decomposition V D W10 ˚ ˚Wl0 to obtain pkC1 .T /.V / D pkC1 .T / W10 ˚ ˚ pkC1 .T / Wk0 0 ˚ pkC1 .T / WkC1 ˚ which has the subspace pkC1 .T /.W10 / ˚ ˚ pkC1 .T /.Wk0 / of dimension d as well (since pi0 .x/ D pi .x/ for i k). Thus this subspace must 0 be the entire space, and in particular pkC1 .T /.WkC1 / D 0, or, equiva0 0 0 lently, pkC1 .T /.WkC1 / D 0. 
But wkC1 has T -annihilator pkC1 .x/, so 0 0 pkC1 .x/ divides pkC1 .x/. The same argument using pkC1 .T /.V / instead 0 of pkC1 .T /.V / shows that pkC1 .x/ divides pkC1 .x/, so we see that 0 pkC1 .x/ D pk .x/. Proceeding in this way we obtain pi0 .x/ D pi .x/ for every i , and l D k, and we are done. We translate this theorem into matrix language. Definition 5.5.3. An n-by-n matrix M is in rational canonical form if M is a block diagonal matrix 2 3 C p1 .x/ 6 7 C p2 .x/ 6 7 M D6 7 :: 4 5 : C pk .x/ where C.pi .x// denotes the companion matrix of pi .x/, for some sequence of polynomials p1 .x/; p2 .x/; : : : ; pk .x/ with pi .x/ divisible by pi C1 .x/ for i D 1; : : : ; k 1. Þ Theorem 5.5.4 (Rational Canonical Form). (1) Let V be a finite-dimensional vector space, and let T W V ! V be a linear transformation. Then V has a basis B such that ŒT B D M is in rational canonical form. Furthermore, M is unique. (2) Let A be an n-by-n matrix. Then A is similar to a unique matrix M in rational canonical form. Proof. (1) Let C D fw1; : : : ; wk g be a rational canonical T -generating set for V , where pi .x/ D mT;wi .x/ has dimension di . Then ˚ B D T d1 1 w1 ; : : : ; w1; T d2 1 w2 ; : : : ; w2; : : : ; T dk 1 wk ; : : : ; wk is the desired basis. (2) Apply part (1) to the linear transformation T D TA . i i i i i i “book” — 2011/3/4 — 17:06 — page 135 — #149 i i 5.5. Rational canonical form 135 Definition 5.5.5. If T has rational canonical form with diagonal blocks C.p1 .x//; C.p2 .x//; : : : ; C.pk .x// with pi .x/ divisible by pi C1 .x/ for i D 1; : : : ; k 1, then p1 .x/; : : : ; pk .x/ is the sequence of elementary divisors of T . Þ Corollary 5.5.6. (1) T is determined up to similarity by its sequence of elementary divisors p1 .x/; : : : ; pk .x/ (2) The sequence of elementary divisors p1 .x/; : : : ; pk .x/ is determined recursively as follows: p1 .x/ D mT .x/. Let w1 be any element of V with mT ;w1 .x/ D mT .x/ and let W1 be the subspace T -generated by w1 . Let T W V =W1 ! V =W1 . Then p2 .x/ D mT .x/, etc. Corollary 5.5.7. Let T have elementary divisors fp1 .x/; : : : ; pk .x/g. Then (1) mT .x/ D p1 .x/ (2) cT .x/ D p1 .x/p2 .x/ pk .x/. Proof. We already know (1). As for (2), cT .x/ D det.C.p1 .x/// det.C.p2 .x/// D p1 .x/p2 .x/ pk .x/: Remark 5.5.8. In the next section we will develop Jordan canonical form, and in the following section we will develop an algorithm for finding the Jordan canonical form of a linear transformation T W V ! V , and for finding a Jordan basis of V , providing we can factor the characteristic polynomial of T . There is an unconditional algorithm for finding a rational canonical T generating set for a linear transformation T W V ! V , and hence the rational canonical form of T . Since it can be tedious to apply, and the result is not so important, we will merely sketch the argument. First observe that for any nonzero vector v 2 V , we can find its T annihilator mT ;x .x/ as follows: Successively check whether the sets fvg; fv; T .v/g; fv; T .v/; T 2 .v/g; : : : , are linearly independent. When we come to a linearly dependent set fv; T .v/; : : : ; T k .v/g, stop. From the linear dependence we obtain the T -annihilator mT .x/ of v, a polynomial of degree k. Next observe that using Euclid’s algorithm we may find the gcd and lcm of any finite set of polynomials (without having to factor them). Given these observations we proceed as follows: Pick a basis fv1 ; : : : ; vn g of V . Find the T -annihilators mT ;v1 .x/; : : : ; mT ;vn .x/. 
Knowing these, we can find the minimum polynomial mT .x/ by using Theorem 5.1.5. Then i i i i i i “book” — 2011/3/4 — 17:06 — page 136 — #150 i i 136 Guide to Advanced Linear Algebra we can find a vector w1 2 V with mT ;w1 .x/ D mT .x/ by using Theorem 5.1.11. Let W1 be the subspace of V T -generated by w1. Choose any complement V2 of V , so that V D W1 ˚ V2 , and choose any basis fv2 ; : : : ; vm g of V2 . Successively “modify” v2 ; : : : ; vm to u2 ; : : : ; um as in the proof of Lemma 5.4.8. The subspace U2 spanned by fu2 ; : : : ; um g is a T -invariant complement of W1 , V D W1 ˚ U2 . Let T 0 be the restriction of T to U2 , so that T 0 W U2 ! U2 . Repeat the argument for U2 , etc. In this way we obtain vectors w1; w2 ; : : : ; wk , with C D fw1; : : : ; wk g being a rational canonical T -generating set for V , and from C we obtain a basis B of V with ŒT B the block diagonal matrix whose diagonal blocks are the companion matrices C.mT ;w1 .x//; : : : ; C.mT ;wk .x//, a matrix in rational canonical form. Þ 5.6 Jordan canonical form Now let F be an algebraically closed field, let V be a finite-dimensional vector space over F , and let T W V ! V be a linear transformation. In this section we show in Theorem 5.6.5 that T has an essentially unique Jordan canonical form. If F is not algebraically closed that may or may not be the case. In Theorem 5.6.6 we see the condition on T that will guarantee that it does. At the end of this section we discuss, though without full proofs, a generalization of Jordan canonical form that always exists (Theorem 5.6.13). These results in this section are easy to obtain given the hard work we have already done. We begin with some preliminary work, apply Theorem 5.4.6, use rational canonical form, and out pops Jordan canonical form with no further ado! Lemma 5.6.1. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. Suppose that mT .x/ D cT .x/ D .x a/k . Then V is T -generated by a single element w1 and V has a basis B D fv1 ; : : : ; vk g where vk D w and vi D .T aI/.vi C1 / for i D 1; : : : ; k 1. Proof. We know that there is an element w of V with mT ;w .x/ D mT .x/. Then w T -generates a subspace W1 of V whose dimension is the degree k of mT .x/. By hypothesis mT .x/ D cT .x/, so cT .x/ also has degree k. But the degree cT .x/ is equal to the dimension of V , so dim.W1 / D dim.V / and hence W1 D V . i i i i i i “book” — 2011/3/4 — 17:06 — page 137 — #151 i i 5.6. Jordan canonical form 137 Set vk D w and for 1 i < k, set vi D .T aI/k i .vk /. Then k i k i 1 vi D .T aI/ .vk / D .T aI/.T aI/ .vk / D .T aI/.vi C1 /. It remains to show that B D fv1 ; : : : ; vk g is a basis. It suffices to show that this set is linearly independent. Suppose that c1 v1 C C ck vk D 0, i.e., c1 .T aI/k 1 vk C C ck vk D 0. Then p.T /.vk / D 0 where p.x/ D c1 .x a/k 1 C c2.x a/k 2 C C ck . Now p.x/ is a polynomial of degree at most k 1, and mT ;vk .x/ D .x a/k is of degree k, so p.x/ is the zero polynomial. The coefficient of x k 1 in p.x/ is c1 , so c1 D 0I then the coefficient of x k 2 in p.x/ is c2, so c2 D 0, etc. Thus c1 D c2 D D ck D 0 and B is linearly independent. Corollary 5.6.2. Let T and B be as in Lemma 5.6.1. Then ŒT B 2 3 a 1 6 a 1 7 6 7 6 : :: 7 D6 7; 6 7 4 15 a a k-by-k matrix with diagonal entries a, entries immediately above the diagonal 1, and all other entries 0. Proof. .T aI/.v1 / D 0 so T .v1 / D v1 I .T aI/.vi C1 / D vi so T .vi C1 / D vi C avi C1 , and the result follows from Remark 2.2.8. Definition 5.6.3. 
A basis B of V as in Corollary 5.6.2 is called a Jordan basis of V . If V D V1 ˚ ˚ Vl and Vi has a Jordan basis Bi , then B D B1 [ [ Bl is called a Jordan basis of V . Þ Definition 5.6.4. (1) A k-by-k matrix 2 3 a 1 6 a 1 7 6 7 6 7 :: 6 7 : 6 7 4 15 a as in Corollary 5.6.2 is called a k-by-k Jordan block associated to the eigenvalue a. i i i i i i “book” — 2011/3/4 — 17:06 — page 138 — #152 i i 138 Guide to Advanced Linear Algebra (2) A matrix J is said to be in Jordan canonical form if J is a block diagonal matrix 2 3 J1 6 J2 7 6 7 J D6 7 : :: 5 4 Jl Þ with each Ji a Jordan block. Theorem 5.6.5 (Jordan canonical form). (1) Let F be an algebraically closed field and let V be a finite-dimensional F -vector space. Let T W V ! V be a linear transformation. Then V has a basis B with ŒT B D J a matrix in Jordan canonical form. J is unique up to the order of the blocks. (2) Let F be an algebraically closed field and let A be an n-by-n matrix with entries in F . Then A is similar to a matrix J in Jordan canonical form. J is unique up to the order of the blocks. Proof. Let T have characteristic polynomial cT .x/ D .x a1 /e1 .x am /em : Then, by Theorem 5.4.6, we have a T -invariant direct sum decomposition V D V 1 ˚ ˚ V m where V i D Ker.T ai I/ei . Let Ti be the restriction of T to V i . Then, by Theorem 5.5.2, V i has a rational canonical T -basis C D fw1i ; : : : ; wki g and a corresponding direct sum decomposition V i D i W1i ˚ ˚ Wki . Then each Wji satisfies the hypothesis of Lemma 5.6.1, so i Wji has a Jordan basis Bji . Then B D B11 [ [ Bk11 [ [ B1m [ [ Bkmm is a Jordan basis of V . To see uniqueness, note that there is unique factorization for the characteristic polynomial, and then the uniqueness of each of the block sizes is an immediate consequence of the uniqueness of rational canonical form. (2) Apply part (1) to the linear transformation T D TA . We stated Theorem 5.6.5 as we did for emphasis. We have a more general result. Theorem 5.6.6 (Jordan canonical form). (1) Let V be a finite-dimensional vector space over a field F and let T W V ! V be a linear transformation. Suppose that cT .x/, the characteristic polynomial of T , factors into a i i i i i i “book” — 2011/3/4 — 17:06 — page 139 — #153 i i 5.6. Jordan canonical form 139 product of linear factors, cT .x/ D .x a1 /e1 .x am /em . Then V has a basis B with ŒvB D J a matrix in Jordan canonical form. J is unique up to the order of the blocks. (2) Let A be an n-by-n matrix with entries in a field F . Suppose that cA .x/, the characteristic polynomial of A, factors into a product of linear factors, cA .x/ D .x a1 /e1 .x am /em . Then A is similar to a matrix J in Jordan canonical form. J is unique up to the order of the blocks. Proof. Identical to the proof of Theorem 5.6.5. Remark 5.6.7. Let us look at a couple of small examples. Let A1 D 1 0 in Jordan canonical form, but its rational canon0 2 . Then A1 is already 3 1 ical form is M1 D 2 0 . Let A2 D 30 13 . Then A2 is already in Jordan canonical form, but its rational canonical form is M2 D 96 01 . In both of these two (one diagonalizable, one not) we see that the rational canonical form is more complicated and less informative than the Jordan canonical form, and indeed in most applications it is the Jordan canonical form we are interested in. But, as we have seen, the path to Jordan canonical form goes through rational canonical form. Þ The question now naturally arises as to what we can say for a linear transformation T W V ! 
V where V is a vector space over F and cT .x/ may not factor into a product of linear factors over F . Note that this makes no difference in the rational canonical form. Although there is not a Jordan canonical form in this case, there is an appropriate generalization. Since it is not so useful, we will only state the results. The proofs are not so different, and we leave them for the reader. Lemma 5.6.8. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. Suppose that mT .x/ D cT .x/ D p.x/k , where p.x/ D x d C ad 1 x d 1 C C a0 is an irreducible polynomial of degree d . Then V is T -generated by a single element w, and V has a basis B D fv11; : : : ; v1d ; v21 ; : : : ; v2d ; : : : ; vk1 ; : : : ; vkd g where vkd D w and T is given as follows: For any j , and for i > 1, T .vji / D vji 1 . For j D 1, and for i D 1, T .v11 / D a0 v11 a1 v12 ad 1 v1d . For j > 1, and for i D 1, T .vj1 / D a0 vj1 a1 vj2 ad 1 vjd C vjd 1 . Remark 5.6.9. This is a direct generalization of Lemma 5.6.1, as if mT .x/ D cT .x/ D .x a/k , then d D 1 so we are in the case i D 1. the companion matrix of p.x/ D x a is the 1-by-1 matrix Œa0 D Œ a, and then T .v11 / D av11 and T .vj1 / D avj1 C vj1 1 for j > 1. Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 140 — #154 i i 140 Guide to Advanced Linear Algebra Corollary 5.6.10. In the situation of Lemma 5.6.8, ŒT B 2 3 C N 6 C N 7 6 7 D6 7; :: 4 : N5 C where there are k identical d -by-d blocks C D C.cT .x// along the diagonal, and .k 1/ identical d -by-d blocks N immediately above the diagonal, where N is a matrix with an entry of 1 in row d , column 1 and all other entries 0. Remark 5.6.11. If p.x/ D .x Þ a/ this is just a k-by-k Jordan block. Definition 5.6.12. A matrix as in Corollary 5.6.10 is said to be a generalized Jordan block. A block diagonal matrix whose diagonal blocks are generalized Jordan blocks is said to be in generalized Jordan canonical form. Þ Theorem 5.6.13 (Generalized Jordan canonical form). (1) Let V be a finitedimensional vector space over the field F and let cT .x/ factor as cT .x/ D p1 .x/e1 pm .x/em for irreducible polynomials p1 .x/; : : : ; pm.x/. Then V has a basis B with ŒV B a matrix in generalized Jordan canonical form. ŒV B is unique up to the order of the generalized Jordan blocks. (2) Let A be an n-by-n matrix with entries in F and let cA .x/ factor as cA .x/ D p1 .x/e1 pm .x/em for irreducible polynomials p1 .x/; : : : ; pm .x/. Then A is similar to a matrix in generalized Jordan canonical form. This matrix is unique up to the order of the generalized Jordan blocks. 5.7 An algorithm for Jordan canonical form and Jordan basis In this section we develop an algorithm to find the Jordan canonical form of a linear transformation, and a Jordan basis, assuming that we can factor the characteristic polynomial into a product of linear factors. (As is well known, there is no general method for doing this.) We will proceed by first developing a pictorial encoding of the information we are trying to find. We call this picture the labelled eigenstructure picture or `ESP, of the linear transformation. i i i i i i “book” — 2011/3/4 — 17:06 — page 141 — #155 i i 5.7. Jordan canonical form and Jordan basis 141 Definition 5.7.1. Let uk be a generalized eigenvector of index k corresponding to an eigenvalue of a linear transformation T W V ! V . Set uk 1 D .T I/.uk /, uk 2 D .T I/.uk 1 /; : : : , u1 D .T I/.u2 /. Then fu1 ; : : : ; uk g is a chain of generalized eigenvectors. The vector uk is the top of the chain. 
Þ Remark 5.7.2. If fu1 ; : : : ; uk g is a chain as in Definition 5.7.1, then for each 1 i k, ui is a generalized eigenvector of index i associated to the eigenvalue of T . Þ Remark 5.7.3. A chain is entirely determined by the vector uk at the top. (We will use this observation later: To find a chain, it suffices to find the vector at the top of the chain.) Þ We now pictorially represent a chain as in Definition 5.7.1 as follows: k uk k–1 uk–1 . . . u2 2 u1 1 λ If fu1 ; : : : ; uk g forms a Jordan basis for a k-by-k Jordan block for the eigenvalue of T , the vectors in this basis form a chain. Conversely, from a chain we can construct a Jordan block, and a Jordan basis. A general linear transformation will have more than one Jordan block. The `ESP of a linear transformation is the picture we obtain by putting its chains side by side. The eigenstructure picture, or ESP, of a linear transformation, is obtained from the `ESP by erasing the labels. We will usually think about this the other way: We will think of obtaining the `ESP from the ESP by putting i i i i i i “book” — 2011/3/4 — 17:06 — page 142 — #156 i i 142 Guide to Advanced Linear Algebra the labels in. From the Jordan canonical form of a linear transformation we can determine its ESP, and conversely. Although the ESP has less information than the `ESP, it is easier to determine. The opposite extreme from the situation of a linear transformation whose Jordan canonical form has a single Jordan block is a diagonalizable linear transformation. Suppose T is diagonalizable with eigenvalues 1 ; : : : ; n (not necessarily distinct) and a basis fv1 ; : : : ; vn g of associated eigenvectors. Then T has `ESP v1 v2 v3 λ1 λ2 λ3 . . . 1 vn λn We have shown that the Jordan canonical form of a linear transformation is unique up to the order of the blocks, so we see that the ESP of a linear transformation is unique up to the order of the chains. As Jordan bases are not unique, neither is the `ESP. The `ESP is easier to illustrate by example than to define formally. We have just given two general examples. For a concrete example we advise the reader to look at the beginning of Example 5.7.7. We now present our algorithm for determining the Jordan canonical form of a linear transformation. Actually, the algorithm we present will be an algorithm for ESP. To find the ESP of T what we need to find is the positions of the nodes at the top of chains. We envision starting at the top, i.e., the highest index, and working our way down. From this point of view, the nodes we encounter at the top of chains are “new” nodes, while nodes that are not at the top of chains come from nodes we have already seen, and we regard them as “old” nodes. Let us now imagine ourselves in the middle of this process, say at height (D index) j , and suppose we see part of the ESP of T for the eigenvalue : j i i i i i i “book” — 2011/3/4 — 17:06 — page 143 — #157 i i 5.7. Jordan canonical form and Jordan basis 143 Each node in the ESP represents a vector in the generalized eigenspace E1 , and together these vectors are a basis for E1 . More precisely, the vectors corresponding to the nodes at height j or less form a basis for Ej , the subspace of E1 consisting of eigenvectors of index at most j (as well as the 0 vector). Thus if we let dj ./ be the number of nodes at height at most j , then j dj ./ D dim E : As a first step toward finding the number of new nodes at index j , we want to find the number of all nodes at this index. 
If we let djex ./ denote the number of nodes exactly at level j , then djex./ D dj ./ dj 1 ./: (That is, the number of nodes at height exactly j is the number of nodes at height at most j minus the number of nodes at height at most j 1.) We want to find djnew./, the number of new nodes at height j . Every node at height j is either new or old, so the number of new nodes at height j is djnew ./ D djex ./ djexC1 ./ as every old node at height j comes from a node at height j C 1, and there are exactly djexC1 ./ of those. This gives our algorithm: Algorithm 5.7.4. Let be an eigenvalue of T W V ! V . Step 1. For j D 1; 2; : : : , compute j dj ./ D dim E D dim.Ker.T I/j /: Stop when dj ./ D d1 ./ D dim E1 . Recall from Lemma 4.2.4 that d1 ./ D alg-mult./. Denote this value of j by jmax ./. (Note also that jmax ./ is the smallest value of j for which dj ./ D dj 1 ./.) Step 2. For j D 1; : : : ; jmax ./ compute djex ./ by d1ex ./ D d1 ./; djex ./ D dj ./ dj 1 ./ for j > 1: i i i i i i “book” — 2011/3/4 — 17:06 — page 144 — #158 i i 144 Guide to Advanced Linear Algebra Step 3. For j D 1; : : : ; jmax./ compute djnew./ by djnew./ D djex./ djnew./ D djex./ djexC1 ./ for j < jmax ./; for j D jmax ./: We now refine our argument to use it to find a Jordan basis for a linear transformation. The algorithm we present will be an algorithm for `ESP, but since we already know how to find the ESP, it is now just a matter of finding the labels. Again we us imagine ourselves in the middle of this process, at height j for the eigenvalue . The vectors labelling the nodes at height at most j form a basis for Ej and the vectors labelling the nodes at height at most j 1 j 1 form a basis for E . Thus the vectors labelling the nodes at height j j exactly j are a basis for a subspace F of E that is complementary to j 1 E . But cannot be any subspace, as it must contain the old nodes at j C1 height j , which come from one level higher, i.e., from a subspace F j C1 j of E that is complementary to E . But that is the only condition on the j complement F , and since we are working our way down and are at level j , j C1 at level we may assume we have successfully chosen a complement F j C 1. With a bit more notation we can describe our algorithm. Let us denote j the space spanned by the old nodes at height j by A . (We use A because it is the initial letter of alt, the German word for old. We cannot use O for typographical reasons.) The nodes in Aj come from nodes at height j C 1, but we already know what these are: they are in Fj C1 . Thus we set j j 1 j j C1 j I/.F /. Then A and E are both subspaces of E , A D .T j and in fact they are independent subspaces, as any nonzero vector in A has j 1 height j and any nonzero vector in E has height at most j 1. We then j j j 1 j choose N to be any complement of E ˚ A in E . (For j D 1 the j situation is a little simpler, as we simply choose N to be a complement of j j A in E .) This is a space of new (or, in German, neu) vectors at height j and is j precisely the space we are looking for. We choose a basis for N and label the new nodes at height j with the elements of this basis. In practice, we usually find Nj as follows: We find a basis B1 of Ej 1 , a basis B2 of j j A , and extend B1 [ B2 to a basis B of E . Then B .B1 [ B2 / is a j j basis of N . So actually we will find the basis of N directly, and that is the j j 1 j j information we need. Finally, we have just obtained E D E ˚A ˚N i i i i i i “book” — 2011/3/4 — 17:06 — page 145 — #159 i i 5.7. 
Jordan canonical form and Jordan basis 145 so we set Fj D Aj ˚ Nj and we are finished at height j and ready to drop down to height j 1. (When we start at the top, for j D jmax ./, the situation is easier. At the top there can be no old vectors, so for j D jmax we simply have Ej D Ej 1 ˚ Nj and Fj D Nj .) We summarize our algorithm as follows: Algorithm 5.7.5. Let be an eigenvalue of T W V ! V . Step 1. For j D 1; 2; : : : ; jmax ./ find the subspace Ej D Ker..T Step 2. For j D jmax ./; : : : ; 2; 1: j j 1 I/j /. j (a) If j D jmax ./, let N be any complement of E in E . If j < j j C1 j jmax ./, let A D .T I/.F /. Let N be any complement of j 1 j j j j E ˚ A in E if j > 1, and let N be any complement of A in j E if j D 1. j (b) Label the new nodes at height j with a basis of N . (c) Let Fj D Aj ˚ Nj . There is one more point we need to clear up to make sure this algorithm works. We know from our results on Jordan canonical form that there is some Jordan basis for A, i.e., some labelling so that the `ESP is correct. We have made some choices, in choosing our complements Nj , and in choosing our basis for Nj . But we can see that these choices all yield the same ESP (and hence one we know is correct.) For the dimensions of the various subspaces are all determined by the Jordan canonical form of A, or equivalently by its ESP, and different choices of bases or complements will yield spaces of the same dimension. Remark 5.7.6. There are lots of choices here. Complements are almost never unique, and bases are never unique except for the vector space f0g. But no matter what choice we make, we get labels for the ESP and hence Jordan bases for V . (It is no surprise that a Jordan basis is not unique.) Þ In finding the `ESP (or, equivalently, in finding a Jordan basis), it is essential that we work from the top down and not from the bottom up. If we try to work from the bottom up, we have to make arbitrary choices and we have no way of knowing if they are correct. Since they almost certainly won’t be, something we would only find out at a later (perhaps much later) stage, we would have to go back and modify them, and this rapidly becomes an unwieldy mess. i i i i i i “book” — 2011/3/4 — 17:06 — page 146 — #160 i i 146 Guide to Advanced Linear Algebra We recall that if A is a matrix and B is a Jordan basis for V , then A D PJP 1 where J is the Jordan canonical form of A and P is the matrix whose columns consist of the vectors in B (taken in the corresponding order). Example 5.7.7. Here is an example for a matrix that is already in Jordan canonical form. We present it to illustrate all of the various subspaces we have introduced, before we move on to some highly nontrivial examples. Let 2 3 6 10 60 6 1 7 6 7 60 0 6 7 6 7 6 7 6 6 7 AD6 7; 6 6 7 6 7 6 7 7 1 6 7 4 07 5 7 with characteristic polynomial cA.x/ D .x 6/5 .x We can see immediately that A has `ESP 3 e3 2 e2 e7 e1 1 6 E61 D Ker.A 6I / E62 D Ker.A 6I /2 E63 D Ker.A E71 D Ker.A E72 D Ker.A 7/3 . 6I /3 7I / 7I /2 e4 e5 6 6 e6 7 e8 7 ˚ has dimension 3, with basis e1 ; e4 ; e5 : ˚ has dimension 4, with basis e1 ; e2 ; e4; e5 : ˚ has dimension 5, with basis e1 ; e2 ; e3; e4 ; e5 : ˚ has dimension 2, with basis e6 ; e8 : ˚ has dimension 3, with basis e6 ; e7 ; e8 : i i i i i i “book” — 2011/3/4 — 17:06 — page 147 — #161 i i 5.7. 
Jordan canonical form and Jordan basis 147 Thus d1 .6/ D 3; d2 .6/ D 4; d3 .6/ D 5; so d1ex .6/ D 3; d2ex .6/ D 4 d3ex .6/ D 5 3 D 1; 4 D 1; and d1new .6/ D 3 1 D 2; d2new .6/ D 1 1 D 0; d3new .6/ D 1: Also d1 .7/ D 2; d2 .7/ D 3; so d1ex .7/ D 2; d2ex .7/ D 3 2 D 1; and d1new .7/ D 2 1 D 1; d2new.7/ D 1; and we recover that A has 1 3-by-3 block and 2 1-by-1 blocks for the eigenvalue 6, and 1 2-by-2 block and 1 1-by-1 block for the eigenvalue 7. Furthermore, ˚ E62 has a complement in E63 of N63 with basis e3 : Set F63 D N63 with basis fe3g. A26 D .A 6I /.F63 / has basis fe2 g, and E61 ˚ A26 has complement in 2 E6 of N62 D f0g with empty basis. Set ˚ F62 D A26 ˚ N62 with basis e2 : A16 D .A 6I /.F62 / has basis fe1g, and A16 has complement in E61 of N61 with basis fe4 ; e5g. Also ˚ E71 has complement in E72 of N72 with basis e7 : Set F72 D N72 with basis fe7g. A17 D .A 7I /.F72 / has basis fe6g, and A17 has complement in E71 of 1 N7 with basis fe8 g. Thus we recover that e3 is at the top of a chain of height 3 for the eigenvalue 6, e4 and e5 are each at the top of a chain of height 1 for the i i i i i i “book” — 2011/3/4 — 17:06 — page 148 — #162 i i 148 Guide to Advanced Linear Algebra eigenvalue 6, e7 is at the top of a chain of height 2 for the eigenvalue 7, and e8 is at the top of a chain of height 1 for the eigenvalue 7. Finally, since e2 D .A 6I /.e3 / and e1 D .A 6I /.e2 /, and e6 D .A 7I /.e7 /, we recover that fe1 ; e2; e3 ; e4; e5 ; e6; e7 ; e8g is a Jordan basis. Þ Example 5.7.8. We present a pair of (rather elaborate) examples to illustrate our algorithm. (1) Let A be the 8-by-8 matrix 2 3 3 30 0 0 1 0 2 6 3 41 1 1 0 1 17 6 7 6 0 63 0 0 2 0 47 6 7 6 7 57 6 2 40 1 1 0 2 AD6 7 6 3 21 1 2 0 1 27 6 7 6 1 10 1 1 3 1 17 6 7 4 5 10 1 3 2 1 6 105 3 21 1 1 0 1 1 with characteristic polynomial cA.x/ D .x 3/7 .x 2/. The eigenvalue D 2 is easy to deal with. We know without any further computation that d1 .2/ D d1 .2/ D 1 and that Ker.A 2I / is 1dimensional. For the eigenvalue D 3, computation shows that A 3I has rank 5, so Ker.A 3I / has dimension 3 and d1 .3/ D 3. Further computation shows that .A 3I /2 has rank 2, so Ker.A 3I /2 has dimension 6 and d2 .3/ D 6. Finally, .A 3I /3 has rank 1, so Ker.A 3I /3 has dimension 7 and d3 .3/ D d1 .3/ D 7. At this point we can conclude that A has minimum polynomial mA .x/ D .x 3/3 .x 2/. We can also determine the ESP of A. We have d1ex.3/ D d1 .3/ D 3 d2ex.3/ D d2 .3/ d3ex.3/ D d3 .3/ d1 .3/ D 6 3D3 d2 .3/ D 7 6D1 and then d3new.3/ D d3ex .3/ D 1 d2new.3/ D d2ex .3/ d1new.3/ D d1ex .3/ d3ex .3/ D 3 d2ex .3/ D 3 1D2 3 D 0: i i i i i i “book” — 2011/3/4 — 17:06 — page 149 — #163 i i 5.7. Jordan canonical form and Jordan basis 149 Thus we see that for the eigenvalue 3, we have one new node at level 3, two new nodes at level 2, and no new nodes at level 1. Hence A has `ESP 3 u3 2 u2 v2 w2 u1 v1 w1 1 3 3 3 x1 2 with the labels yet to be determined, and thus A has Jordan canonical form 2 31 0 60 3 1 6 60 0 3 6 6 3 1 6 J D6 6 0 3 6 6 31 6 4 03 3 7 7 7 7 7 7 7: 7 7 7 7 5 2 Now we find a Jordan basis. Equivalently, we find the values of the labels. Once we have the labels u3 , v2 , w2 , and x1 on the new nodes, the others are determined. The vector x1 is easy to find. It is any eigenvector corresponding to the eigenvalue 2. 
Computation reveals that we may choose 2 6 6 6 6 6 6 x1 D 6 6 6 6 6 4 3 30 127 7 687 7 7 187 7: 17 7 47 7 665 1 i i i i i i “book” — 2011/3/4 — 17:06 — page 150 — #164 i i 150 Guide to Advanced Linear Algebra The situation for the eigenvalue 3 is more interesting. We compute that 8̂2 3 2 3 2 3 2 3 2 3 2 3 2 39 1 0 0 0 0 0 0 > ˆ ˆ6 7 6 7 6 7 6 7 6 7 6 7 6 7> > ˆ > ˆ 0 1 0 0 0 0 ˆ 6 7 6 7 6 7 6 7 6 7 6 7 607> > ˆ > ˆ 6 7 6 7 6 7 6 7 6 7 6 7 6 7 > ˆ ˆ 607 607 617 607 607 607 607> > ˆ <6 7 6 7 6 7 6 7 6 7 6 7 6 7> = 607 607 607 617 607 607 607 3 Ker.A 3I / has basis 6 7;6 7;6 7;6 7;6 7;6 76 7 ; ˆ 607 607 607 607 617 607 607> ˆ ˆ 6 7 6 7 6 7 6 7 6 7 6 7 6 7> > ˆ ˆ 607 607 607 607 607 617 607> > ˆ > ˆ 6 7 6 7 6 7 6 7 6 7 6 7 6 7 > ˆ > ˆ ˆ405 405 405 405 405 405 415> > > :̂ ; 0 1 0 0 0 0 0 Ker.A 3I /2 has basis and Ker.A 8̂2 3 2 3 2 3 2 3 2 3 2 39 1 0 0 0 0 0 > ˆ > ˆ ˆ 607 617 607 607 607 607> > ˆ ˆ 6 7 6 7 6 7 6 7 6 7 6 7> > ˆ > ˆ 6 7 6 7 6 7 6 7 6 7 6 7 > ˆ ˆ 627 607 607 607 607 607> > ˆ <6 7 6 7 6 7 6 7 6 7 6 7> = 607 607 617 607 607 607 6 7;6 7;6 7;6 7;6 7;6 7 ; ˆ 607 607 607 617 607 607> ˆ ˆ 6 7 6 7 6 7 6 7 6 7 6 7> > ˆ ˆ607 607 607 607 617 607> > ˆ > ˆ 6 7 6 7 6 7 6 7 6 7 6 7 > ˆ > ˆ ˆ405 405 405 405 405 415> > > :̂ ; 0 1 0 0 0 0 3I / has basis 8̂2 3 2 3 2 39 0 > 0 1 > ˆ > ˆ > 7 6 7 6 ˆ 6 > ˆ 07 617 607 7 > ˆ 6 > ˆ 7 > 7 6 7 6 ˆ 6 > ˆ 0 0 2 7 > 7 6 7 6 ˆ 6 ˆ = <6 7 6 7 6 7> 607 607 617 6 7;6 7;6 7 : ˆ 607 617 607> ˆ > 7 6 7 6 7> ˆ ˆ6 > ˆ 607 617 607> > ˆ 7 7 6 7 6 ˆ 6 > ˆ4 5 4 5 4 5> ˆ 1 > 1 > ˆ 1 > ; :̂ 0 1 0 For u3 we may choose any vector u3 2 Ker.A Ker.A 3I /2 . Inspection reveals that we may choose 2 3 1 607 6 7 607 6 7 6 7 607 u3 D 6 7 : 607 6 7 607 6 7 405 0 3I /3 , u3 … i i i i i i “book” — 2011/3/4 — 17:06 — page 151 — #165 i i 5.7. Jordan canonical form and Jordan basis 151 Then 2 u2 D .A 6 6 6 6 6 6 3I /u3 D 6 6 6 6 6 4 3 0 37 7 07 7 7 27 7 37 7 17 7 55 3 2 and u1 D .A 6 6 6 6 6 6 3I /u2 D 6 6 6 6 6 4 3 2 07 7 47 7 7 07 7: 07 7 07 7 25 0 For v2 , w2 we may choose any two vectors in Ker.A 3I /2 such that the set of six vectors consisting of these two vectors, u2 , and the given three vectors in our basis of Ker.A 3I / are linearly independent. Computation reveals that we may choose 2 3 1 607 6 7 627 6 7 6 7 607 v2 D 6 7 607 6 7 607 6 7 405 0 and 2 3 0 617 6 7 607 6 7 6 7 607 w2 D 6 7 : 607 6 7 607 6 7 405 1 Then 2 v1 D .A 6 6 6 6 6 6 3I /v2 D 6 6 6 6 6 4 3 0 17 7 07 7 7 27 7 17 7 17 7 35 1 2 and w1 D .A 6 6 6 6 6 6 3I /w2 D 6 6 6 6 6 4 3 1 07 7 27 7 7 17 7: 07 7 07 7 05 0 i i i i i i “book” — 2011/3/4 — 17:06 — page 152 — #166 i i 152 Guide to Advanced Linear Algebra Then ˚ u1 ; u2 ; u3 ; v1; v2 ; w1; w2; x1 8̂2 3 2 3 2 3 2 2 0 1 ˆ ˆ ˆ 6 07 6 37 607 6 ˆ ˆ 6 7 6 7 6 7 6 ˆ ˆ 6 47 6 07 607 6 ˆ ˆ 6 7 6 7 6 7 6 ˆ <6 7 6 7 6 7 6 6 07 6 27 607 6 D 6 7;6 7;6 7;6 ˆ6 07 6 37 607 6 ˆ ˆ 6 7 6 7 6 7 6 ˆ ˆ 6 07 6 17 607 6 ˆ ˆ 6 7 6 7 6 7 6 ˆ ˆ ˆ4 25 4 55 405 4 :̂ 0 3 0 3 2 3 2 0 1 607 6 17 7 6 7 6 6 7 6 07 7 627 6 7 6 7 6 27 607 6 7;6 7;6 17 607 6 7 6 7 6 6 7 6 17 7 607 6 35 405 4 1 0 3 2 3 2 1 0 617 6 07 7 6 7 6 6 7 6 27 7 607 6 7 6 7 6 17 607 6 7;6 7;6 07 607 6 7 6 7 6 6 7 6 07 7 607 6 05 405 4 0 1 39 30 > > > > 127 > 7> > 7 > 687> > = 7> 187 7 17> > 7> > > 47 > 7> > 665> > > ; 1 Þ is a Jordan basis. (2) Let A be the 8-by-8 matrix 2 6 6 6 6 6 6 AD6 6 6 6 6 4 3 3 1 6 1 3 1 4 1 4 0 0 1 1 0 1 0 1 3 0 0 2 1 0 0 1 1 2 0 0 0 1 0 1 2 0 4 4 2 0 with characteristic polynomial cA.x/ D .x 0 0 1 3 2 6 0 0 0 0 0 12 2 10 0 0 3 1 37 7 17 7 7 67 7 17 7 37 7 15 8 4/6 .x 5/2 . 
For the eigenvalue D 5, we compute that A 5I has rank 7, so Ker.A 5I / has dimension 1 and hence d1.5/ D 1, and also that Ker.A 5I /2 has dimension 2 and hence d2 .5/ D d1 .5/ D 2. For the eigenvalue D 4, we compute that A 4I has rank 5, so Ker.A 4I / has dimension 3 and hence d1 .4/ D 3, that .A 4I /2 has rank 4, so Ker.A 4I /2 has dimension 4 and hence d2 .4/ D 4, that .A 4I /3 has rank 3, so Ker.A 4I /3 has dimension 5 and hence that d3 .4/ D 5 and that .A 4I /4 has rank 2, so Ker.A 4I /4 has dimension 6 and hence that d4 .4/ D d1 .4/ D 6. Thus we may conclude that mA .x/ D .x 4/4 .x 5/2 . i i i i i i “book” — 2011/3/4 — 17:06 — page 153 — #167 i i 5.7. Jordan canonical form and Jordan basis 153 Furthermore d1ex .4/ D d1 .4/ D 3 d2ex .4/ D d2 .4/ d1 .4/ D 4 3D1 d3ex .4/ D d3 .4/ d2 .4/ D 5 4D1 d4ex .4/ D d4 .4/ d3 .4/ D 6 5D1 and then d4new.4/ D d4ex D 1 d3new.4/ D d3ex .4/ d4ex .4/ D 1 1D0 d2new.4/ d3ex .4/ 1D0 D d2ex .4/ d1new.4/ D d1ex .4/ D1 d2ex .4/ D 3 1 D 2: Also d1ex .5/ D d1 .5/ D 1 d2ex .5/ D d2 .5/ d1 .5/ D 2 1D1 and then d2new .5/ D d2ex.5/ D 1 d1new .5/ D d1ex.5/ d2ex .5/ D 1 1 D 0: Hence A has `ESP as on the next page with the labels yet to be determined. In any case A has Jordan canonical form 2 4 60 6 60 6 6 60 6 6 6 6 6 4 1 4 0 0 0 1 4 0 0 0 1 4 3 7 7 7 7 7 7 7: 4 7 7 7 4 7 5 15 05 i i i i i i “book” — 2011/3/4 — 17:06 — page 154 — #168 i i 154 Guide to Advanced Linear Algebra 4 u4 3 u3 2 u2 x2 u1 1 v1 4 4 Now we find the labels. Ker.A Ker.A 8̂2 ˆ ˆ ˆ6 ˆ ˆ 6 ˆ ˆ6 ˆ ˆ 6 ˆ <6 6 6 ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ ˆ4 :̂ w1 4 5 4I /4 has basis 3 2 3 2 3 2 3 2 1 0 0 0 617 607 607 6 07 7 6 7 6 7 6 7 6 6 7 6 7 6 7 6 07 7 607 607 667 6 7 6 7 6 7 6 7 6 07 607 637 607 6 7;6 7;6 7;6 7;6 07 607 607 607 6 7 6 7 6 7 6 7 6 6 7 6 7 6 7 6 07 7 607 607 607 6 5 0 405 405 415 4 1 0 1 0 4I /3 has basis 8̂2 ˆ ˆ ˆ6 ˆ ˆ 6 ˆ ˆ6 ˆ ˆ 6 ˆ <6 6 6 ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ ˆ4 :̂ x1 3 2 3 2 3 2 1 0 0 617 607 6 07 7 6 7 6 7 6 6 7 6 7 6 07 7 607 667 6 7 6 7 6 7 6 07 607 607 6 7;6 7;6 7;6 07 607 607 6 7 6 7 6 7 6 07 607 607 6 7 6 7 6 7 6 05 405 415 4 1 0 0 3 2 39 0 0 > > 607> > 07 7 6 7> > > 7 6 7 > 07 607> > 7 6 7> = 07 607 7;6 7 ; 37 607> 7 6 7> > 6 7> > 07 7 637> > > 5 4 5 1 1 > > > ; 0 0 3 2 39 0 0 > > 607> > 07 > 7 6 7> > 7 6 7 > 07 607> > 7 6 7> = 07 607 7;6 7 ; 37 607> 7 6 7> > > 07 637> 7 6 7> > > 5 4 5 1 1 > > > ; 0 0 i i i i i i “book” — 2011/3/4 — 17:06 — page 155 — #169 i i 5.7. 
Jordan canonical form and Jordan basis Ker.A 4I /2 has basis 8̂2 ˆ ˆ ˆ 6 ˆ ˆ6 ˆ ˆ 6 ˆ ˆ 6 ˆ <6 6 6 ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ ˆ4 :̂ and Ker.A Also, A and Ker.A 3 2 3 2 3 2 39 1 0 0 0 > > 617 607 607> > 07 7 6 7 6 7 6 7> > > 7 6 7 6 7 6 7 > 07 607 607 607> > 7 6 7 6 7 6 7> = 07 607 607 607 7;6 7;6 7;6 7 ; 07 607 617 607> 7 6 7 6 7 6 7> > 6 7 6 7 6 7> > 07 7 607 617 637> > > 05 405 405 415> > > ; 1 0 0 0 4I / has basis 8̂2 ˆ ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ <6 6 6 ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ ˆ4 :̂ 5I 2 has basis 5I / has basis 155 3 2 3 2 39 1 0 0 > > > 7 6 7 6 > 07 607 607 7> > > 7 6 7 6 7 > 07 607 607> > 7 6 7 6 7> = 07 607 607 7;6 7;6 7 : 07 617 607> 7 6 7 6 7> > 6 7 6 7> > 07 7 617 637> > > 4 4 5 5 5 0 0 1 > > > ; 1 0 0 8̂2 3 2 39 0 > 0 > ˆ > ˆ > 6 7 ˆ 6 > ˆ 17 607 7 > ˆ 6 > ˆ 7 6 7 ˆ 6 > ˆ607 617> > ˆ ˆ = <6 7 6 7> 627 607 6 7;6 7 ; ˆ 607 607> ˆ > ˆ 6 7 6 7> ˆ > ˆ 607 627> ˆ > ˆ 6 7 6 7> > ˆ ˆ 5 4 5 4 1 > > ˆ 0 > ; :̂ 0 1 8̂2 39 0 > ˆ > ˆ ˆ607> > ˆ > ˆ 6 7 > ˆ ˆ617> > ˆ > ˆ 6 7 > ˆ <6 7> = 607 6 7 : ˆ 607> ˆ ˆ 6 7> > ˆ ˆ 627> > ˆ ˆ 6 7> > ˆ > ˆ 4 5 ˆ 1 > > > :̂ ; 0 i i i i i i “book” — 2011/3/4 — 17:06 — page 156 — #170 i i 156 Guide to Advanced Linear Algebra We may choose for u4 any vector in Ker.A 4I /4 that is not in 3 Ker.A 4I / . We choose 2 3 2 3 0 1 607 6 07 6 7 6 7 607 6 27 6 7 6 7 6 7 6 7 637 6 07 u4 D 6 7 ; so u3 D .A 4I /u4 D 6 7 ; 607 6 17 6 7 6 7 607 6 37 6 7 6 7 405 4 15 1 1 2 3 2 3 0 1 617 6 07 6 7 6 7 607 6 07 6 7 6 7 6 7 6 7 607 6 07 u2 D .A 4I /u3 D 6 7 ; u1 D .A 4I /u2 D 6 7 : 607 6 17 6 7 6 7 607 6 17 6 7 6 7 405 4 05 0 1 Then we may choose v1 and w1 to be any two vectors such that u1 , v1 , and w1 form a basis for Ker.A 4I /. We choose 2 3 2 3 0 1 607 6 07 6 7 6 7 607 6 07 6 7 6 7 6 7 6 7 607 6 07 v1 D 6 7 and w1 D 6 7 : 607 6 07 6 7 6 7 637 6 07 6 7 6 7 415 4 05 0 1 We may choose x2 to be any vector in Ker.A 5I /2 that is not in Ker.A 5I /. We choose 2 3 2 3 0 0 617 607 6 7 6 7 617 607 6 7 6 7 6 7 6 7 627 607 x2 D 6 7 so x1 D .A 5I /x2 D 6 7 : 607 607 6 7 6 7 607 627 6 7 6 7 405 415 1 0 i i i i i i “book” — 2011/3/4 — 17:06 — page 157 — #171 i i 5.8. Field extensions 157 Thus we obtain a Jordan basis ˚ u1 ; u2 ; u3 ; u4 ; v1; w1 ; x1; x2 8̂2 3 2 3 2 3 2 3 2 1 0 1 0 ˆ ˆ ˆ6 07 617 6 07 607 6 ˆ ˆ 6 7 6 7 6 7 6 7 6 ˆ ˆ6 07 607 6 27 607 6 ˆ ˆ 6 7 6 7 6 7 6 7 6 ˆ <6 7 6 7 6 7 6 7 6 6 07 607 6 07 637 6 D 6 7;6 7;6 7;6 7;6 ˆ 6 17 607 6 17 607 6 ˆ ˆ 6 7 6 7 6 7 6 7 6 ˆ ˆ 6 17 607 6 37 607 6 ˆ ˆ 6 7 6 7 6 7 6 7 6 ˆ ˆ ˆ4 05 405 4 15 405 4 :̂ 1 0 1 1 5.8 Field extensions 3 2 3 2 3 2 39 1 0 0 0 > > 607 607 617> > 07 7 6 7 6 7 6 7> > > 7 6 7 6 7 6 7 > 07 607 617 607> > 7 6 7 6 7 6 7> = 07 607 607 627 7;6 7;6 7;6 7 : 07 607 607 607> 7 6 7 6 7 6 7> > > 07 637 627 607> 7 6 7 6 7 6 7> > > 05 415 415 405> > > ; 1 0 0 1 Suppose we have an n-by-n matrix A with entries in F and suppose we have an extension field E of F . An extension field is a field E F . For example, we might have E D C and F D R. If A is similar over F to another matrix B, i.e., B D PAP 1 where P has entries in F , then A is similar to B over E by the same equation B D PAP 1 , since the entries of P , being in F , are certainly in E. (Furthermore, P is invertible over F if and only if it is invertible over E, as we see from the condition that P is invertible if and only if det.P / ¤ 0.) But a priori, the converse may not be true. A priori, A might be similar to B over E, i.e., there may be a matrix Q with entries in E with B D QAQ 1 , though there may be no matrix P with entries in F with B D PAP 1 . 
In fact, this does not occur: A and B are similar over F if and only if they are similar over some (and hence over any) extension field E of F . Lemma 5.8.1. Let fv1 ; : : : ; vk g be vectors in F n and let E be an extension of F . Then fv1 ; : : : ; vk g is linearly independent over F (i.e., the equation c1 v1 C C ck vk D 0 with each ci 2 F only has the solution c1 D D ck D 0) if and only if it is linearly independent over E (i.e., the equation c1v1 C C ck vk D 0 with each ci 2 E only has the solution c1 D D ck D 0). Proof. Certainly if fv1; : : : ; vk g is linearly independent over E, it is linearly independent over F . Suppose now that fv1 ; : : : ; vk g is linearly independent over F . Then fv1 ; : : : ; vk g extends to a basis fv1 ; : : : ; vn g of F n . Let E D fe1 ; : : : ; en g be the standard basis of F n . It is the standard basis of En as well. Since i i i i i i “book” — 2011/3/4 — 17:06 — page 158 — #172 i i 158 Guide to Advanced Linear Algebra fv1 ; : : : ; vn g is a basis, the matrix P D ŒŒv1 E j jŒvn E is nonsingular when viewed as a matrix over F . That means det.P / ¤ 0. If we view P as a matrix over E, P remains nonsingular as det.P / ¤ 0. (det.P / is computed purely from the entries of P .) Then fv1 ; : : : ; vn g is a basis for V over E, so fv1 ; : : : ; vk g is linearly independent over E. Lemma 5.8.2. Let A be an n-by-n matrix over F , and let E be an extension of F . (1) For any v 2 F n , mA;v .x/ D m eA;v .x/ where mA;v .x/ (respectively m eA;v .x/) is the A-annihilator of v regarded as an element of F n (respectively of En ). (2) mA .x/ D m eA .x/ where mA .x/ (respectively m eA .x/) is the minimum polynomial of A regarded as a matrix over F (respectively over E). (3) cA .x/ D e cA .x/ where cA .x/ (resp. e cA .x/) is the characteristic polynomial of A regarded as a matrix over F (resp. over E). Proof. (1) m eA;v .x/ divides any polynomial p.x/ with coefficients in E for which p.A/.v/ D 0 and mA;v .x/ is such a polynomial (as its coefficients lie in F E). Thus m eA;v .x/ divides mA;v .x/. Let mA;v .x/ have degree d . Then fv; Av; : : : ; Ad 1 vg is linearly independent over F , and hence, by Lemma 5.8.1, over E as well, so m eA;v .x/ has degree at least d . But then m eA;v .x/ D m eA;v .x/. (2) Again, m eA .x/ divides mA .x/. There is a vector v in F n with mA .x/ D mA;v .x/. By (1), m eA;v .x/ D mA;v .x/. But m eA;v .x/ divides m eA .x/, so they are equal. (3) cA .x/ D det.xI A/ D e cA .x/ as the determinant is computed purely from the entries of A. Theorem 5.8.3. Let A and B be n-by-n matrices over F and let E be an extension field of F . Then A and B are similar over E if and only if they are similar over F . Proof. If A and B are similar over F , they are certainly similar over E. Suppose A and B are not similar over F . Then A has a sequence of elementary divisors p1 .x/; : : : ; pk .x/ and B has a sequence of elementary divisors q1 .x/; : : : ; pl .x/ that are not the same. Let us find the elementary divisors of A over E. We follow the proof of rational canonical form, still working over F , and note that the sequence of elementary divisors we obtain over F is still a sequence of elementary divisors over E. (If fw1 ; : : : ; wk g is a i i i i i i “book” — 2011/3/4 — 17:06 — page 159 — #173 i i 5.9. More than one linear transformation 159 rational canonical T -generating set over F , it is a rational canonical T generating set over EI this follows from Lemma 5.8.2.) But the sequence of elementary divisors is unique. 
In other words, p1 .x/; : : : ; pk .x/ is the sequence of elementary divisors of A over E, and similarly q1 .x/; : : : ; ql .x/ is the sequence of elementary divisors of B over E. Since these are different, A and B are not similar over E. We have stated the theorem in terms of matrices rather than linear transformation so as not to presume any extra background. But it is equivalent to the following one, stated in terms of tensor products. Theorem 5.8.4. Let V be a finite-dimensional F -vector space and let S W V ! V and T W V ! V be two linear transformations. Then S and T are conjugate if and only if for some, and hence for any, extension field E of F , S ˝ 1 W V ˝F E ! V ˝F E and T ˝ 1 W V ˝F E ! V ˝F E are conjugate. 5.9 More than one linear transformation Hitherto we have examined the structure of a single linear transformation. In the last section of this chapter, we derive three results that have a common theme: They deal with questions that arise when we consider more than one linear transformation. To begin, let T W V ! W and S W W ! V be linear transformations, with V and W finite-dimensional vector spaces. We examine the relationship between ST W V ! V and T S W W ! W . If V D W and at least one of S and T are invertible, then ST and T S are conjugate: ST D T 1 .T S/T or T S D S 1 .ST /S. In general we have Lemma 5.9.1. Let T W V ! W and S W W ! V be linear transformations between finite-dimensional vector spaces. Let p.x/ D a t x t C C a0 2 F Œx be any polynomial with constant term a0 ¤ 0. Then dim Ker p.ST / D dim Ker p.T S/ : Proof. Let fv1 ; : : : ; vk g be a basis for Ker.p.ST //. We claim that fT .v1 /; : : : ; T .vk /g is linearly independent. To see this, suppose c1 T .v1 / C C ck T .vk / D 0: i i i i i i “book” — 2011/3/4 — 17:06 — page 160 — #174 i i 160 Guide to Advanced Linear Algebra Then T .c1 v1 C C ck vk / D 0, so ST .c1 v1 C C ck vk / D 0. Let v D c1 v1 C C ck vk , so ST .v/ D 0. But v 2 Ker.p.ST //, so 0 D .a t .ST /t C C a1 .ST / C a0 I /.v/ D 0 C C 0 C a0 v D a0 v and hence, since a0 ¤ 0, v D 0. Thus c1 v1 C C ck vk D 0. But fv1 ; : : : ; vk g is linearly independent, so ci D 0 for all i , and hence fT .v1 /; : : : ; T .vk /g is linearly independent. Next we claim that T .vi / 2 Ker.p.T S// for each i . To see this, note that .T S/s T D .T S/ .T S/T D T .ST / .ST / D T .ST /s for any s. Then p.T S/.T .vi // D .a t .T S/t C C a0 I/.T .vi // D .T .a t .ST /t C C a0 I//.vi / D T .p.ST /.vi // D T .0/ D 0: Hence fT .v1 /; : : : ; T .vk /g is a linearly independent subset of Ker.p.T S//, so dim.Ker.p.T S/// dim.Ker.p.ST ///. Interchanging S and T shows that the dimensions are equal. Theorem 5.9.2. Let T W V ! W and S W W ! V be linear transformations between finite-dimensional vector spaces over an algebraically closed field F . Then ST and T S have the same nonzero eigenvalues, and for each common eigenvalue ¤ 0 ST and T S have the same ESP at and hence the same Jordan block structure at (i.e., the same number of blocks of the same sizes). Proof. Apply Lemma 5.9.1 to the polynomials p t; .x/ D .x /t for t D 1; 2; : : : , noting that the sequence of integers fdim.Ker.p t; .R/// j t D 1; 2; : : :g determines the ESP of a linear transformation R at , or, equivalently, its Jordan block structure at . Corollary 5.9.3. Let T W V ! V and S W V ! V be linear transformations on a finite-dimensional vector space over an arbitrary field F . Then ST and T S have the same characteristic polynomial. Proof. First suppose that that F is algebraically closed. 
If dim.V / D n and ST , and hence T S, has distinct nonzero eigenvalues 1 ; : : : ; k of multiplicities e1 ; : : : ; ek respectively, then they each have characteristic polynomial x e0 .x 1 /e1 .x k /ek where e0 D n .e1 C C ek /. i i i i i i “book” — 2011/3/4 — 17:06 — page 161 — #175 i i 5.9. More than one linear transformation 161 In the general case, choose an arbitrary basis for V and represent S and T by matrices A and B with entries in F . Then regard A and B as having entries in F , the algebraic closure of F , and apply the algebraically closed case. Theorem 5.9.2 and Corollary 5.9.3 are the strongest results that hold in general. It is not necessarily the case that ST and T S are conjugate, if S and T are both singular linear transformations. Example 5.9.4. (1) Let A D 10 00 and B D 00 10 . Then AB D 00 10 0 0 and BA D 0 0 are not similar, so TA TB D TAB and TB TA D TBA are not conjugate, though they both have characteristic polynomial x 2. (2) Let A D 11 00 and B D 11 11 . Then AB D 11 11 and BA D 0 0 0 0 are not similar, so TA TB D TAB and TB TA D TBA are not conjugate, though they both have characteristic polynomial x 2. (In this case TA and TB are both diagonalizable.) Þ Let T W V ! V be a linear transformation, let p.x/ be a polynomial, and set S D p.T /. Then S and T commute. We now investigate the question of under what circumstances any linear transformation that commutes with T must be of this form. Theorem 5.9.5. Let V be a finite-dimensional vector space and let T W V ! V be a linear transformation. The following are equivalent: (1) V is T -generated by a single element, or, equivalently, the rational canonical form of T consists of a single block. (2) Every linear transformation S W V ! V that commutes with T can be expressed as a polynomial in T . Proof. Suppose (1) is true, and let v0 be a T -generator of V . Then every element of V can be expressed as p.T /.v0 / for some polynomial p.x/. In particular, there is a polynomial p0 .x/ such that S.v0 / D p0 .T /.v0 /. For any v 2 V , let v D p.T /.v0 /. If S commutes with T , D p.T / S v0 D p.T / p0 .T / v0 D p0 .T / p.T / v0 D p0 .T /.v/I S.v/ D S p.T / v0 so S D p0 .T /. (We have used the fact that if S commutes with T , it commutes with any polynomial in T . Also, any two polynomials in T commute with each other.) Thus (2) is true. i i i i i i “book” — 2011/3/4 — 17:06 — page 162 — #176 i i 162 Guide to Advanced Linear Algebra Suppose (1) is false, so that V has a rational canonical T -generating set fv1 ; : : : ; vk g with k > 1. Let pi .x/ be the T -annihilator of vi , so p1 .x/ is divisible by pi .x/ for i > 1. Then we have a T -invariant direct sum decomposition V D V1 ˚ ˚ Vk . Define S W V ! V by S.v/ D 0 if v 2 V1 and S.v/ D v if v 2 Vi for i > 1. It follows easily from the T -invariance of the direct sum decomposition that S commutes with T . We claim that S is not a polynomial in T . Suppose S D p.T / for some polynomial p.x/. Then 0 D s.v1 / D p.T /.v1 / so p.x/ is divisible by p1 .x/, the T -annihilator of v1 . But p1 .x/ is divisible by pi .x/ for i 1, so p.x/ is divisible by pi .x/ for i > 1, and hence S.v2 / D D S.vk / D 0. Thus S.v/ ¤ v if 0 ¤ v 2 Vi for i > 1, a contradiction, and (2) is false. Remark 5.9.6. Equivalent conditions to condition (1) of Theorem 5.9.5 were given in Corollary 5.3.3. Þ Finally, let S and T be diagonalizable linear transformations. We see when S and T are simultaneously diagonalizable. Theorem 5.9.7. Let V be a finite-dimensional vector space and let S W V ! V and T W V ! 
V be diagonalizable linear transformations. The following are equivalent: (1) S and T are simultaneously diagonalizable, i.e, there is a basis B of V with ŒSB and ŒT B both diagonal, or equivalently, there is a basis B of V consisting of common eigenvectors of S and T . (2) S and T commute. Proof. Suppose (1) is true. Let B D fv1 ; : : : ; vn g where S.vi / D i vi and T .vj / D i vi for some i , i 2 F . Then S.T .vi // D S.i vi / D i i vi D i i vi D T .i .vi // D T .S.vi // for each i , and since B is a basis, this implies S.T .v// D T .S.v// for every v 2 V , i.e., that S and T commute. Suppose (2) is true. Since T is diagonalizable, V D V1 ˚ ˚ Vk where Vi is the eigenspace of T corresponding to the eigenvalue i of T . For v 2 Vi , T .S.vi // D S.T .vi // D S.i vi / D i S.vi /, so S.vi / 2 Vi as well. Thus each subspace Vi is S-invariant. Since S is diagonalizable, so is its restriction Si W Vi ! Vi . (mSi .x/ divides mS .x/, which is a product of distinct linear factors, so mSi .x/ is a product of distinct linear factors as well.) Thus Vi has a basis Bi consisting of eigenvectors for S. Since every nonzero vector in Vi is an eigenvector of T , Bi consists of eigenvectors of T , as well. Set B D B1 [ [ Bk . i i i i i i “book” — 2011/3/4 — 17:06 — page 163 — #177 i i 5.9. More than one linear transformation 163 Remark 5.9.8. It is easy to see that if S and T are both triangularizable linear transformations and S and T commute, then they are simultaneously triangularizable, buth it isi even easierh to see i that the converse is false. For 1 1 1 0 example, take S D 0 2 and T D 0 2 . Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 164 — #178 i i i i i i i i “book” — 2011/3/4 — 17:06 — page 165 — #179 i i CHAPTER 6 Bilinear, sesquilinear, and quadratic forms In this chapter we investigate bilinear, sesquilinear, and quadratic forms, or “forms” for short. A form is an additional structure on a vector space. Forms are interesting in their own right, and they have applications throughout mathematics. Many important vector spaces naturally come equipped with a form. In the first section we introduce forms and derive their basic properties. In the second section we see how to simplify forms on finite-dimensional vector spaces and in some cases completely classify them. In the third section we see how the presence of nonsingular form(s) enables us to define the adjoint of a linear transformation. 6.1 Basic definitions and results Definition 6.1.1. A conjugation on a field F is a map c W F ! F with the properties (where we denote c.f / by f ): (1) f D f for every f 2 F , (2) f1 C f2 D f1 C f2 for every f1 ; f2 2 F , (3) f1 f2 D f1 f2 for every f1 ; f2 2 F . The conjugation c is nontrivial if c is not the identity on F . A conjugation on a vector space V over F is a map c W V ! V with the properties (where we denote c.v/ by v ): (1) v D v for every v 2 V , 165 i i i i i i “book” — 2011/3/4 — 17:06 — page 166 — #180 i i 166 Guide to Advanced Linear Algebra (2) v1 C v2 D v1 C v2 for every v1 ; v2 2 V , (3) f v D f v for every f 2 F , v 2 V . Þ Remark 6.1.2. The archetypical example of a conjugation on a field is complex conjugation on the field C of complex numbers. Þ Definition 6.1.3. Let F be a field with a nontrivial conjugation and let V and W be F -vector spaces. Then T W V ! W is conjugate linear if (1) T .v1 C v2 / D T .v1 / C T .v2 / for every v1 ; v2 2 V (2) T .cv/ D c T .v/ for every c 2 F , v 2 V . Þ Now we come to the basic definition. The prefix “sesqui” means “one and a half”. 
Definition 6.1.4. Let V be an F -vector space. A bilinear form is a function ' W V V ! F , '.x; y/ D hx; yi, that is linear in each entry, i.e., that satisfies (1) hc1 x1 C c2x2 ; yi D c1hx1 ; yi C c2 hx2 ; yi for every c1 ; c2 2 F , and x1 ; x2 ; y 2 V (2) hx; c1y1 C c2y2 i D c1 hx; y1i C c2 hx; y2i for every c1; c2 2 F , and x; y1 ; y2 2 V . A sesquilinear form is a function ' W V V ! F , '.x; y/ D hx; yi, that is linear in the first entry and conjugate linear in the second, i.e., that satisfies (1) and (2): (2) hx; c1y1 C c2 y2 i D c 1 hx; y1 i C c 2 hx; y2 i for every c1 ; c2 2 F , and x; y1 ; y2 2 V Þ for a nontrivial conjugation c 7! c on F . n t Example 6.1.5. (1) Let V D R . Then hx; yi D xy is a bilinear form. If V D C n , then hx; yi D txy is a sesquilinear form. In both cases this is the familiar “dot product.” Indeed for any field F we can define a bilinear form on F n by hx; yi D txy and for any field F with a nontrivial conjugation we can define a sesquilinear form on F n by hx; yi D txy. (2) More generally, for an n-by-n matrix A with entries in F , hx; yi D t xAy is a bilinear form on F n , and hx; yi D txAy is a sesquilinear form i i i i i i “book” — 2011/3/4 — 17:06 — page 167 — #181 i i 6.1. Basic definitions and results 167 on F n . We will see that all bilinear and sesquilinear forms on F n arise this way, and, by taking coordinates, that all bilinear and sesquilinear forms on finite-dimensional vector spaces over F arise in this way. (3) Let V D r F 1 and let x D .x1 ; x2; : : :/, y D .y1 ; y2 ; : : :/. We P define a bilinear form on V by hx; yi D xi yi . If F has a nontrivial P conjugation, we define a sesquilinear form on V by hx; yi D xi y i . (4) Let V be the vector space of real-valued continuous functions on Œ0; 1. Then V has a bilinear form given by ˝ ˛ f .x/; g.x/ D Z 1 f .x/g.x/ dx: 0 If V is the vector space of complex-valued continuous functions on Œ0; 1, then V has a sesquilinear form given by ˝ ˛ f .x/; g.x/ D Z 1 f .x/g.x/ dx: 0 Þ Let us see the connection between forms and dual spaces. Lemma 6.1.6. (1) Let V be a vector space and let '.x; y/ D hx; yi be a bilinear form on V . Then ˛' W V ! V defined by ˛' .y/.x/ D hx; yi is a linear transformation. (2) Let V be a vector space and let '.x; y/ D hx; yi be a sesquilinear form on V . Then ˛' W V ! V defined by ˛' .y/.x/ D hx; yi is a conjugate linear transformation. Remark 6.1.7. In the situation of Lemma 6.1.6, ˛' .y/ is often written as h; yi, so with this notation ˛' W y 7! h; yi. Þ Definition 6.1.8. Let V be a vector space and let ' be a bilinear (respectively sesquilinear) form on V . Then ' is nonsingular if the map ˛' W V ! V is an isomorphism (respectively conjugate isomorphism). Þ Remark 6.1.9. In more concrete terms, ' is nonsingular if and only if the following is true: Let T W V ! F be any linear transformation. Then there is a unique vector w 2 V such that T .v/ D '.v; w/ D hv; wi for every v 2 V: Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 168 — #182 i i 168 Guide to Advanced Linear Algebra In case V is finite dimensional, we have an easy criterion to determine if a form ' is nonsingular. Lemma 6.1.10. Let V be a finite-dimensional vector space and let '.x; y/ D hx; yi be a bilinear or sesquilinear form on V . Then ' is nonsingular if and only if for every y 2 V , y ¤ 0, there is an x 2 V such that hx; yi D '.x; y/ ¤ 0. Proof. Since dim V D dim V , ˛' is an (conjugate) isomorphism if and only if it is injective. Suppose that ˛' is injective, i.e., if y ¤ 0, then ˛' .y/ ¤ 0. 
This means that there exists an x 2 V with ˛' .y/.x/ D '.x; y/ ¤ 0. Conversely, suppose that for every y 2 V , y ¤ 0, there exists an x with ˛' .y/.x/ D '.x; y/ ¤ 0. Then for every y 2 V , y ¤ 0, ˛' .y/ is not the zero map. Hence Ker.˛' / D f0g and ˛' is injective. Now we see how to use coordinates to associate a matrix to a bilinear or sesquilinear form on a finite-dimensional vector space. Note this is different from associating a matrix to a linear transformation. Theorem 6.1.11. Let '.x; y/ D hx; yi be a bilinear (respectively sesquilinear) form on the finite-dimensional vector space V and let B D fv1; : : : ; vn g be a basis for V . Define a matrix A D .aij / by Then for x; y 2 V , ˝ ˛ aij D vi ; vj hx; yi Dt ŒxB AŒyB i; j D 1; : : : ; n: respectively t ŒxB AŒyB : Proof. By construction, this is true when x D vi and y D vj (as then Œx D ei and Œy D ej ) and by (conjugate) linearity that implies it is true for any vectors x and y in V . Definition 6.1.12. The matrix A D .aij / of Theorem 6.1.11 is the matrix of the form ' with respect to the basis B. We denote it by Œ'B . Þ Theorem 6.1.13. The bilinear or sesquilinear form ' on the finite dimensional vector space V is nonsingular if and only if matrix Œ'B in any basis B of V is nonsingular. i i i i i i “book” — 2011/3/4 — 17:06 — page 169 — #183 i i 6.1. Basic definitions and results 169 Proof. We use the criterion of Lemma 6.1.10 for nonsingularity of a form. Suppose A D Œ'B is a nonsingular matrix. For x 2 V , x ¤ 0, let 2 3 c1 6 :: 7 ŒxB D 4 : 5 : cn Then for some i , ci ¤ 0. Let z D A 1 ei 2 Fn and let y 2 V with ŒyB D z (or ŒyB D z). Then '.x; y/ D txAA 1 ei D ci ¤ 0. Suppose A is singular. Let z 2 F n , z ¤ 0, with Az D 0. Then if y 2 V with ŒyB D z (or ŒyB D z), then '.x; y/ D txAz D tx0 D 0 for every x 2 V. Now we see the effect of a change of basis on the matrix of a form. Theorem 6.1.14. Let V be a finite-dimensional vector space and let ' be a bilinear (respectively sesquilinear) form on V . Let B and C be any two bases of V . Then Œ'C D tPB C Œ'B PB C .respectively tPB C Œ'B P B C /: Proof. We do the sesquilinear case; the bilinear case follows by omitting the conjugation. By the definition of Œ'C , '.x; y/ D tŒxC Œ'C ŒyC and by the definition of Œ'B , '.x; y/ D tŒxB Œ'B ŒyB : But ŒxB D PB C ŒxC and ŒyB D P B C ŒyC . Substitution gives t ŒxC Œ'C ŒyC D '.x; y/ D tŒxB Œ'B ŒyB D t PB C ŒxC Œ'B P B C ŒyC D tŒxC tPB C Œ'B P B C Œy C : Since this is true for every x; y 2 V , Œ'C D tPB C Œ'B P B C: This leads us to the following definition. i i i i i i “book” — 2011/3/4 — 17:06 — page 170 — #184 i i 170 Guide to Advanced Linear Algebra Definition 6.1.15. Two square matrices A and B with entries in F are congruent if there is an invertible matrix P with tPAP D B, and are conjugate congruent if there is an invertible matrix P with tPAP D B. Þ It is easy to check that (conjugate) congruence is an equivalence relation. We then have: Corollary 6.1.16. (1) Let ' be a bilinear (respectively sesquilinear) form on the finite-dimensional vector space V . Let B and C be bases of V . Then Œ'B and Œ'C are congruent (respectively conjugate congruent). (2) Let A and B be congruent (respectively conjugate congruent) n-byn matrices. Let V be an n-dimensional vector space over F . Then there is a bilinear form (respectively sesquilinear form) ' on V and bases B and C of V with Œ'B D A and Œ'C D B. 
6.2 Characterization and classification theorems In this section we derive results about the characterization and classification of forms on finite-dimensional vector spaces. Our discussion so far has been general, but almost all the forms encountered in mathematical practice fall into one of the following classes. Definition 6.2.1. (1) A bilinear form ' on V is symmetric if '.x; y/ D '.y; x/ for all x; y 2 V . (2) A bilinear form ' on V is skew-symmetric if '.x; y/ D '.y; x/ for all x; y 2 V , and '.x; x/ D 0 for all x 2 V (this last condition follows automatically if char.F / ¤ 2). (3) A sesquilinear form ' on V is Hermitian if '.x; y/ D '.y; x/ for all x; y 2 V . (4) A sesquilinear form ' on V is skew-Hermitian if char.F / ¤ 2 and '.x; y/ D '.y; x/ for all x; y 2 V . (If char.F / D 2, skew-Hermitian is not defined.) Þ Lemma 6.2.2. Let V be a finite-dimensional vector space over F and let ' be a form on V . Choose a basis B of V and let A D Œ'B . Then (1) ' is symmetric if and only if tA D A. (2) ' is skew-symmetric if and only if tA D A (and, if char.F / D 2, the diagonal entries of A are all 0). (3) ' is Hermitian if and only if tA D A. (4) ' is skew-Hermitian if and only if tA D A (and char.F / ¤ 2). i i i i i i “book” — 2011/3/4 — 17:06 — page 171 — #185 i i 6.2. Characterization and classification theorems 171 Definition 6.2.3. Matrices satisfying the conclusion of Lemma 6.2.2 parts (1), (2), (3), or (4) are called symmetric, skew-symmetric, Hermitian, or skew-Hermitian respectively. Þ For the remainder of this section we assume that the forms we consider are one of these types: symmetric, Hermitian, skew-symmetric, or skewHermitian, and that the vector spaces they are defined on are finite dimensional. We will write .V; '/ for the space V equipped with the form '. The appropriate notion of equivalence of forms is isometry. Definition 6.2.4. Let V admit a form ' and W admit a form . Then a linear transformation T W V ! W is an isometry between .V; '/ and .W; / if T is an isomorphism and furthermore T v1 ; T v2 D ' v1 ; v2 for every v1 ; v2 2 V: If there exists an isometry between .V; '/ and .W; / then .V; '/ and .W; / are isometric. Þ Lemma 6.2.5. In the situation of Definition 6.2.4, let V have basis B and let W have basis C. Then T is an isometry if and only if M D ŒT C B is an invertible matrix with t M Œ C M D Œ'B in the bilinear case, or t M Œ C M D Œ'B in the sesquilinear case. Thus V and W are isometric if and only if Œ C and Œ'B are congruent, in the bilinear case, or conjugate congruent, in the sesquilinear case, in some (or any) pair of bases B of V and C of W . Definition 6.2.6. Let ' be a bilinear or sesquilinear form on the vector space V . Then the isometry group of ' is ˚ Isom.'/ D T W V ! V isomorphism j T is an isometry from .V; '/ to itself : Þ Corollary 6.2.7. In the situation of Definition 6.2.6, let B be any basis of V . Then T 7! ŒT B gives an isomorphism ˚ Isom.'/ ! invertible matrices M j t M Œ'B M D Œ'B or tM Œ'B M D Œ'B : Now we begin to simplify and classify forms. i i i i i i “book” — 2011/3/4 — 17:06 — page 172 — #186 i i 172 Guide to Advanced Linear Algebra Definition 6.2.8. Let V admit the form '. Then two vectors v1 and v2 in V are orthogonal (with respect to ') if ' v1 ; v2 D ' v2 ; v1 D 0: Two subspaces V1 and V2 are orthogonal (with respect to ') if ' v1 ; v2 D ' v2 ; v1 D 0 for all v1 2 V1 ; v2 2 V2 : Þ We also have an appropriate notion of direct sum. Definition 6.2.9. Let V admit a form ', and let V1 and V2 be subspaces of V . 
Then V is the orthogonal direct sum of V1 and V2 , V D V1 ? V2 , if V D V1 ˚ V2 (i.e., V is the direct sum of V1 and V2 ) and V1 and V2 are orthogonal with respect to '. This is equivalent to the condition: Let v; v 0 2 V and write v uniquely as v D v1 C v2 with v1 2 V1 and v2 2 V2 , and similarly v 0 D v10 C v20 with v10 2 V1 and v20 2 V2 . Let '1 be the restriction of ' to V1 V1 , and '2 be the restriction of ' to V2 V2 . Then '.v; v 0 / D '1 v1 ; v10 C '2 v2 ; v20 : In this situation we will also write .V; '/ D .V1 ; '1 / ? .V2 ; '2 /. Þ Remark 6.2.10. Translated into matrix language, the condition in Definition 6.2.9 is as follows: Let B1 be a basis for V1 and B2 be a basis for V2 . Let A1 D Œ'1 B1 and A2 D Œ'2 B2 . Let B D B1 [ B2 and A D Œ'B . Then A1 0 AD 0 A2 (a block-diagonal matrix with blocks A1 and A2 ). Þ First let us note that if ' is not nonsingular, we may “split off” its singular part. Definition 6.2.11. Let ' be a form on V . The kernel of ' is the subspace of V given by ˚ Ker.'/ D v 2 V j '.v; w/ D '.w; v/ D 0 for all w 2 V : Þ Remark 6.2.12. By Lemma 6.1.10, ' is nonsingular if and only if Ker.'/ D 0. Þ i i i i i i “book” — 2011/3/4 — 17:06 — page 173 — #187 i i 6.2. Characterization and classification theorems 173 Lemma 6.2.13. Let ' be a form on V . Then V is the orthogonal direct sum V D Ker.'/ ? V1 for some subspace V1 , with '1 D 'jV1 a nonsingular form on V1 , and .V1 ; '1 / is well-defined up to isometry. Proof. Let V1 be any complement of Ker.'/, so that V D Ker.'/ ˚ V1 , and let '1 D 'jV1 . Certainly V D Ker.'/ ? V1 . To see that '1 is nonsingular, suppose that v1 2 V1 with '.v1 ; w1/ D 0 for every w1 2 V1 . Then '.v1 ; w/ D 0 for every w 2 V , so v1 2 Ker.'/, i.e., v 2 Ker.'/ \ V1 D f0g. There was a choice of V1 , but we claim that all choices yield isometric forms. To see this, let V 0 be the quotient space V = Ker.'/. There is a welldefined form ' 0 on V 0 defined as follows: Let W V ! V = Ker.'/ be the canonical projection. Let v 0 ; w 0 2 V 0 , choose v; w 2 V with v 0 D .v/ and w 0 D .w/. Then ' 0 .v 0 ; w 0/ D '.v; w/. It is then easy to check that =V1 gives an isometry from .V1 ; '1 / to .V 0 ; ' 0 /. In light of this lemma, we usually concentrate on nonsingular forms. But we also have the following well-defined invariant of forms in general. Definition 6.2.14. Let V be finite dimensional and let V admit the form '. Then the rank of ' is the dimension of V1 , where V1 is the subspace given in Lemma 6.2.13. Þ Definition 6.2.15. Let W be a subspace of V . Then its orthogonal subspace is the subspace ˚ W ? D v 2 V j '.w; v/ D 0 for all w 2 W : Þ Lemma 6.2.16. Let V be a finite-dimensional vector space. Let W be a subspace of V and let D 'jW . If is nonsingular, then V D W ? W ? . If ' is nonsingular as well, then ? D 'jW ? is nonsingular. Proof. Clearly W and W ? are orthogonal, so to show that V D W ? W ? it suffices to show that V D W ˚ W ? . Let v0 2 W \ W ? . Then v0 2 W ?, so '.w; v0/ D 0 for all w 2 W . But v0 2 W as well, so .w; v0/ D '.w; v0/ and then the nonsingularity of implies v0 D 0. Let v0 2 V . Then T .w/ D '.w; v0/ is a linear transformation T W W ! F , and we are assuming is nonsingular so by Remark 6.1.9 there i i i i i i “book” — 2011/3/4 — 17:06 — page 174 — #188 i i 174 Guide to Advanced Linear Algebra is a w0 2 W with T .w/ D .w; w0/ D '.w; w0/ for every w 2 W . Then '.w; v0 w0 / D 0 for every w 2 W , so v0 w0 2 W ? , and v0 D w0 C .v0 w0 /. Suppose ' is nonsingular and let v0 2 W ? . Then there is a vector v 2 V with '.v; v0 / ¤ 0. 
Write v D w1 C w2 with w1 2 W , w2 2 W ? . Then 0 ¤ ' v; v0 D ' w1 C w2 ; v0 D ' w1 ; v1 C ' w2 ; v0 D ' w2 ; v0 ; so 'jW ? is nonsingular. Remark 6.2.17. The condition that 'jW be nonsingular is necessary. For example, if ' is the form on F 2 defined by '.v; w/ D tv 01 w 10 and W is the subspace x W D ; 0 then W D W ? . Þ Corollary 6.2.18. Let V be a finite-dimensional vector space and let W be a subspace of V with 'jW and 'jW ? both nonsingular. Then .W ? /? D W. Proof. We have V D W ? W ? D W ? ? .W ? /? . It is easy to check that .W ? /? W , so they are equal. Our goal now is to “simplify”, and in favorable cases classify, forms on finite-dimensional vector spaces. Lemma 6.2.16 is an important tool that enables to apply inductive arguments. Here is another important tool, and a result interesting in its own right. Lemma 6.2.19. Let V be a vector space over F , and let V admit the nonsingular form '. If char.F / ¤ 2, assume ' is symmetric or Hermitian. If char.F / D 2, assume ' is Hermitian. Then there is a vector v 2 V with '.v; v/ ¤ 0. Proof. Pick a nonzero vector v1 2 V . If '.v1 ; v1/ ¤ 0, then set v D v1 . If '.v1 ; v1/ D 0, then, by the nonsingularity of ', there is a vector v2 i i i i i i “book” — 2011/3/4 — 17:06 — page 175 — #189 i i 6.2. Characterization and classification theorems 175 with b D '.v1 ; v2/ ¤ 0. If '.v2 ; v2 / ¤ 0, set v D v2 . Otherwise, let v3 D av1 C v2 where a 2 F is an arbitrary scalar. Then ' v3 ; v3 D ' av1 C v2 ; av1 C v2 D ' av1 ; av1 C ' av1 ; v2 C ' v2 ; av1 C ' v2 ; v2 D ' av1 ; v2 C ' v2 ; av1 D 2ab if ' is symmetric D ab C ab if ' is Hermitian. In the symmetric case, choose a ¤ 0 arbitrarily. In the Hermitian case, let a be any element of F with ab ¤ ab. (If char.F / ¤ 2 we may choose a D b 1 . If char.F / D 2 we may choose a D b 1 c where c 2 F with c ¤ c.) Then set v D v3 for this choice of a. Remark 6.2.20. The conclusion of this lemma does not hold if char.F / D 2. For example, let F be a field of characteristic 2, let V D F 2 , and let ' be the form defined on V by 0 1 '.v; w/ D v w: 1 0 t Then it is easy to check that '.v; v/ D 0 for every v 2 V . Þ Thus we make the following definition. Definition 6.2.21. Let V be a vector space over a field F of characteristic 2 and let ' be a symmetric bilinear form on V . Then ' is even if '.v; v/ D 0 for every v 2 V , and odd otherwise. Þ Lemma 6.2.22. Let V be a vector space over a field F of characteristic 2 and let ' be a symmetric bilinear form on V . Then V is even if and only if for some (and hence for every) basis B D fv1 ; v2 ; : : :g of V , '.vi ; vi / D 0 for every vi 2 B. Proof. This follows immediately from the identity '.v C w; v C w/ D '.v; v/ C '.v; w/ C '.w; v/ C '.w; w/ D '.v; v/ C 2'.v; w/ C '.w; w/ D '.v; v/ C '.w; w/: Here is our first simplification. i i i i i i “book” — 2011/3/4 — 17:06 — page 176 — #190 i i 176 Guide to Advanced Linear Algebra Definition 6.2.23. Let V be a finite-dimensional vector space and let ' be a symmetric bilinear or a Hermitian form on V . Then ' is diagonalizable if there are 1-dimensional subspaces V1 ; V2 ; : : : ; Vn of V such that V D V1 ? V2 ? ? Vn : Þ Remark 6.2.24. Let us see where the name comes from. Choose a nonzero vector vi in Vi for each i (so fvi g is a basis for Vi ) and let ai D '.vi ; vi /. Let B be the basis of V given by B D fv1 ; : : : ; vn g. Then 2 3 a1 6 a2 07 6 7 Œ'B D 6 7 : : 4 5 : 0 an is a diagonal matrix. Conversely if V has a basis B D fv1 ; : : : ; vn g with Œ'B diagonal, then V D V1 ? ? 
Vn where Vi is the subspace spanned by vi . Þ Remark 6.2.25. We will let Œa denote the bilinear or Hermitian form on F (an F -vector space) with matrix Œa, i.e., the bilinear form given by '.x; y/ D xay, or the Hermitian form given by '.x; y/ D xay. In this notation a form ' on V is diagonalizable if and only if it is isometric to Œa1 ? ? Œan for some a1 ; : : : ; an 2 F . Þ Theorem 6.2.26. Let V be a finite-dimensional vector space over a field F of characteristic ¤ 2, and let ' be a symmetric or Hermitian form on V . Then ' is diagonalizable. If char.F / D 2 and ' is Hermitian, then ' is diagonalizable. Proof. We only prove the case char.F / ¤ 2. By Lemma 6.2.13, it suffices to consider the case where ' is nonsingular. We proceed by induction on the dimension of V . If V is 1-dimensional, there is nothing to prove. Suppose the theorem is true for all vector spaces of dimension less than n, and let V have dimension n. By Lemma 6.2.19, there is an element v1 of V with '.v1 ; v1 / D a1 ¤ 0. Let V1 D Span.v1 /. Then, by Lemma 6.2.16, V D V1 ? V1? and 'jV1? is nonsingular. Then by induction V1? D V2 ? ? Vn for 1-dimensional subspaces V2 ; : : : ; Vn , so V D V1 ? V2 ? ? Vn as required. The theorem immediately gives us a classification of forms on complex vector spaces. i i i i i i “book” — 2011/3/4 — 17:06 — page 177 — #191 i i 6.2. Characterization and classification theorems 177 Corollary 6.2.27. Let ' be a nonsingular symmetric bilinear form on V , where V is an n-dimensional vector space over C. Then ' is isometric to Œ1 ? ? Œ1. In particular, any two such forms are isometric. Proof. By Theorem 6.2.26, V D V1 ? ? Vn where Vi has basis fvi g. Let ai D '.vi ; vi /. If bi is a complex number with bi2 D 1=ai and B is the basis B D fb1 v1 ; : : : ; bn vn g of V , then Œ'B 2 1 6 :: D4 : 0 0 1 3 7 5: The classification of symmetric forms over R, or Hermitian forms over C, is more interesting. Whether we can solve bi2 D 1=ai over R, or bi bi D 1=ai over C, comes down to the sign of ai . (Recall that in the Hermitian case ai must be real.) Before developing this classification, we introduce a notion interesting and important in itself. Definition 6.2.28. Let ' be a symmetric bilinear form on the real vector space V , or a Hermitian form on the complex vector space V . Then ' is positive definite if '.v; v/ > 0 for every v 2 V , v ¤ 0, and ' is negative definite if '.v; v/ < 0 for every v 2 V , v ¤ 0. It is indefinite if there are vectors v1 ; v2 2 V with '.v1 ; v1 / > 0 and '.v2 ; v2 / < 0. Þ Theorem 6.2.29 (Sylvester’s law of inertia). Let V be a finite-dimensional real vector space and let ' be a nonsingular symmetric bilinear form on V , or let V be a finite-dimensional complex vector space and let ' be a nonsingular Hermitian form on V . Then ' is isometric to pŒ1 ? qŒ 1 for well-defined integers p and q with p C q D n D dim.V /. Proof. As in the proof of Corollary 6.2.27, we have that ' is isometric to pŒ1 ? qŒ 1 for some integers p and q with p C q D n. We must show that p and q are well-defined. To do so, let VC be a subspace of V of largest dimension with 'jVC positive definite and let V be a subspace of V of largest dimension with 'jV negative definite. Let p0 D dim.VC / and q0 D dim.V /. Clearly p0 and q0 are well-defined. We shall show that p D p0 and q D q0 . We argue by contradiction. Let B be a basis of V with Œ'B D pŒ1 ? qŒ 1. If B D fv1 ; : : : ; vn g, let BC D fv1 ; : : : ; vp g and B D fvpC1 ; : : : ; vn g. 
If WC is the space i i i i i i “book” — 2011/3/4 — 17:06 — page 178 — #192 i i 178 Guide to Advanced Linear Algebra spanned by BC , then 'jWC is positive definite, so p0 p. If W is the space spanned by B , then 'jW is negative definite, so q0 q. Now p C q D n, so p0 C q0 n. Suppose it is not the case that p D p0 and q D q0 . Then p0 Cq0 > n, i.e., dim.VC /Cdim.V / > n. Then VC \V has dimension at least one, so contains a nonzero vector v. Then '.v; v/ > 0 as v 2 VC , but '.v; v/ < 0 as v 2 V , which is impossible. We make part of the proof explicit. Corollary 6.2.30. Let V and ' be as in Theorem 6.2.29. Let p0 be the largest dimension of a subspace VC of V with 'jVC positive definite and let q0 be the largest dimension of a subspace V of V with 'jV negative definite. If ' is isometric to pŒ1 ? qŒ 1, then p D p0 and q D q0 . In particular, ' is positive definite if and only if ' is isometric to nŒ1. We can now define a very important invariant of these forms. Definition 6.2.31. Let V , ', p, and q be as in Theorem 6.2.29. Then the signature of ' is p q. Þ Corollary 6.2.32. A nonsingular symmetric bilinear form on a finite-dimensional vector space V over R, or a nonsingular Hermitian form on a finitedimensional vector space V over C, is classified up to isometry by its rank and signature. Remark 6.2.33. Here is one way in which these notions appear. Let f W Rn ! R be a C 2 function and let x0 be a critical point of f . Let H be the Hessian matrix of f at x0 . Then f has a local minimum at x0 if H is positive definite and a local maximum at x0 if H is negative definite. If H is indefinite, then x0 is neither a local maximum nor a local minimum for f . Þ We have the following useful criterion. Theorem 6.2.34 (Hurwitz’s criterion). Let ' be a nonsingular symmetric bilinear form on the n-dimensional complex vector space V . Let B D fv1 ; : : : ; vn g be an arbitrary basis of V and let A D Œ'B . Let ı0 .A/ D 1 and for 1 k n let ık .A/ D det.Ak / where Ak is the k-by-k submatrix in the upper left corner of A. Then (1) ' is positive definite if and only if ık .A/ > 0 for k D 1; : : : ; n. (2) ' is negative definite if and only if . 1/k ık .A/ > 0 for k D 1; : : : ; n. i i i i i i “book” — 2011/3/4 — 17:06 — page 179 — #193 i i 6.2. Characterization and classification theorems 179 (3) If ık .A/ ¤ 0 for k D 1; : : : ; n, then the signature of ' is r s, where ˚ r D # k j ık .A/ and ık 1 .A/ have the same sign ˚ s D # k j ık .A/ and ık 1 .A/ have opposite signs : Proof. We prove (1). Then (2) follows immediately by considering the form '. We leave (3) to the reader; it can be proved using the ideas of the proof of (1). We prove the theorem by induction on n D dim.V /. If n D 1, the theorem is clear: ' is positive definite if and only if Œ'B D Œa1 with a1 > 0. Suppose the theorem is true for all forms on vector spaces of dimension n 1 and let V have dimension n. Let Vn 1 be the subspace of V spanned by Bn 1 D fv1 ; : : : ; vn 1 g, so that An 1 D Œ'jVn 1 Bn 1 . Suppose ' is positive definite. Then 'jVn 1 is also positive definite (if '.v; v/ > 0 for all v ¤ 0 in V , then '.v; v/ > 0 for all v 2 Vn 1 ). By the inductive hypothesis ı1 .A/; : : : ; ın 1 .A/ are all positive. Also, since ın 1 .A/ ¤ 0, 'jVn 1 is nonsingular. Hence V D Vn 1 ? Vn? 1 , where Vn? 1 is a 1-dimensional subspace generated by a vector wn . Let bnn D '.wn ; wn /, so bnn > 0. Let B 0 be the basis fv1 ; : : : ; vn 1 ; wn g. 
Then det.Œ'B 0 / D ın 1 .A/bnn > 0: By Theorem 6.1.14, if P is the change of basis matrix PB 0 B , then det Œ'B 0 D det.P /2 det.A/ D det.P /2 ın .A/ if ' is symmetric ˇ ˇ2 D det.P /det.P / det.A/ D ˇ det.P /ˇ ın .A/ if ' is Hermitian and in any case ın .A/ has the same sign as det.Œ'B 0 /, so ın .A/ > 0. Suppose that ı1 .A/; : : : ; ın 1 .A/ are all positive. By the inductive hypothesis 'jVn 1 is positive definite. Again let V D Vn 1 ? Vn? 1 with wn as above. If bnn D '.wn ; wn/ > 0 then ' is positive definite. The same argument shows that ın 1 .A/bnn has the same sign as ın .A/. But ın 1 .A/ and ın .A/ are both positive, so bnn > 0. Here is a general formula for the signature of '. Theorem 6.2.35. Let ' be a nonsingular symmetric bilinear form on the n-dimensional real vector space V or a nonsingular Hermitian form on the n-dimensional complex vector space V . Let B be a basis for ' and let A D Œ'B . Then i i i i i i “book” — 2011/3/4 — 17:06 — page 180 — #194 i i 180 Guide to Advanced Linear Algebra (1) A has n real eigenvalues (counting multiplicity), and (2) the signature of ' is r s, where r is the number of positive eigenvalues and s is the number of negative eigenvalues of A. Proof. To prove this we need a result from the next chapter, Corollary 7.3.20, that states that every symmetric matrix is orthogonally diagonalizable and that every Hermitian matrix is unitarily diagonalizable. In other words, if A is symmetric then there is an orthogonal matrix P , i.e., a matrix with t P D P 1 , such that D D PAP 1 is diagonal, and if A is Hermitian there is a unitary matrix P , i.e., a matrix with tP D P 1 , such that D D PAP 1 is diagonal (necessarily with real entries). In both cases the diagonal entries of D are the eigenvalues of A and D D Œ'C for some basis C. Thus we see that r s is the number of positive entries on the diagonal of D minus the number of negative entries on the diagonal of D. Let C D fv1 ; : : : ; vn g. Reordering the elements of C if necessary, we may assume that the first r diagonal entries of D are positive and the remaining s D n r diagonal entries of D are negative. Then V D W1 ? W2 where W1 is the subspace spanned by fv1; : : : ; vr g and W2 is the subspace spanned by fvr C1 ; : : : ; vn g. Then 'jW1 is positive definite and 'jW2 is negative definite, so the signature of ' is equal to dim.W1 / dim.W2 / D r s. Closely related to symmetric bilinear forms are quadratic forms. Definition 6.2.36. Let V be a vector space over F . A quadratic form on V is a function ˆ W V ! F satisfying (1) ˆ.av/ D a2 ˆ.v/ for any a 2 F , v 2 V (2) the function ' W V V ! F defined by '.x; y/ D ˆ.x C y/ ˆ.x/ ˆ.y/ is a (necessarily symmetric) bilinear form on V . We say that ˆ and ' are associated. Þ Lemma 6.2.37. Let V be a vector space over F with char.F / ¤ 2. Then every quadratic form ˆ is associated to a unique symmetric bilinear form, and conversely. Proof. Clearly ˆ determines '. On the other hand, suppose that ' is associated to ˆ. Then 4ˆ.x/ D ˆ.2x/ D ˆ.x C x/ D 2ˆ.x/ C '.x; x/ i i i i i i “book” — 2011/3/4 — 17:06 — page 181 — #195 i i 6.2. Characterization and classification theorems 181 so ˆ.x/ D 1 '.x; x/ 2 and ' determines ˆ as well. In characteristic 2 the situation is considerably more subtle and we simply state the results without proof. For an integer m let e.m/ D 2m 1 .2m C 1/ and o.m/ D 2m 1 .2m 1/. Theorem 6.2.38. (1) Let ' be a symmetric bilinear form on a vector space V of dimension n over the field F of 2 elements. 
Then ' is associated to a quadratic form ˆ if and only if ' is even (in the sense of Definition 6.2.21). In this case there are 2n quadratic forms associated to '. Each such quadratic form ˆ is called a quadratic refinement of '. (2) Let ' be a nonsingular even symmetric bilinear form on a vector space V of necessarily even dimension n D 2m over F , and let ˆ be a quadratic refinement of '. The Arf invariant of ˆ is defined as follows: Let j j denote the cardinality of a set. Then either ˇ 1 ˇ ˇ ˇ ˇˆ .0/ˇ D e.m/ and ˇˆ 1 .1/ˇ D o.m/; in which case Arf.ˆ/ D 0; or ˇ ˇˆ 1 ˇ .0/ˇ D o.m/ ˇ and ˇˆ 1 ˇ .1/ˇ D e.m/; in which case Arf.ˆ/ D 1: Then there are e.m/ quadratic refinements ˆ of ' with Arf.ˆ/ D 0 and o.m/ quadratic refinements ˆ of ' with Arf.ˆ/ D 1. (3) Quadratic refinements of a nonsingular even symmetric bilinear form on a finite-dimensional vector space V are classified up to isometry by their rank .D dim.V // and Arf invariant. Proof. Omitted. Example 6.2.39. We now give a classical application of our earlier results. Let 8̂2 39 < x:1 > = 6 7 V D F n D 4 :: 5 ; :̂ x > ; n i i i i i i “book” — 2011/3/4 — 17:06 — page 182 — #196 i i 182 Guide to Advanced Linear Algebra F a field of characteristic ¤ 2, and suppose we have a function Q W V ! F of the form 02 3 1 x1 X B6 :: 7C 1 X ai i xi2 C aij xi xj : Q @4 : 5 A D 2 i i0 whenever 4 : 5 ¤ 4 : 5 : xn xn 0 Then q is positive definite, and we call Q positive definite in this case as well. We then see that for an appropriate change of variable 02 31 x1 n B6 : 7C X 2 yi : Q @4 :: 5A D i D1 xn That is, over R every positive definite quadratic form can be expressed as a sum of squares. Þ Let us now classify skew-symmetric bilinear forms. Theorem 6.2.40. Let V be a vector space of finite dimension n over an arbitrary field F , and let ' be a nonsingular skew-symmetric bilinear form 0 1 on V . Then n is even and ' is isometric to .n=2/ , or, equivalently, to 1 0 0I , where I is the .n=2/-by-.n=2/ identity matrix. I 0 Proof. We proceed by induction on n. If n D 1 and ' is skew-symmetric, then we must have Œ'B D Œ0, which is singular, so that case cannot occur. i i i i i i “book” — 2011/3/4 — 17:06 — page 183 — #197 i i 6.2. Characterization and classification theorems 183 Suppose the theorem is true for all vector spaces of dimension less than n and let V have dimension n. Choose v1 2 V , v1 ¤ 0. Then, since ' is nonsingular, there exists w 2 V with '.w; v1/ D a ¤ 0, and w is not a multiple of v1 as ' is skew-symmetric. Let v2 D .1=a/w, let B1 D fv1 ; v2 g, and let V1 be the subspace of V spanned by B1 . Then Œ'jV1 B1 D Œ 10 01 . V1 is a nonsingular subspace so, by Lemma 6.2.16, V D V1 ? V1? . Now dim.V1? / D n 2 so we may assume by induction that V1? has a basis B2 with Œ'jV1? B2 D ..n 2/=2/Œ 10 01 . Let B D B1 [ B2 . Then Œ'B D .n=2/Œ 10 01 . Finally, if B D fv1 ; : : : ; vn g, let B 0 D fv1 ; v3 ; : : : ; vn 1 ; v2 ; v4; : : : ; vn g. Then Œ'B 0 D Œ I0 I0 . Finally, we consider skew-Hermitian forms. In this case, by convention, the field F of scalars has char.F / ¤ 2. We begin with a result about F itself. Lemma 6.2.41. Let F be a field with char.F / ¤ 2 equipped with a nontrivial conjugation c 7! c. Then: (1) F0 D fc 2 F j c D cg is a subfield of F . (2) There is a nonzero element j 2 F with j D j. (3) Every element of F can be written uniquely as c D c1 C jc2 with c1 ; c2 2 F (so that F is a 2-dimensional F0 -vector space with basis f1; j g). In particular, c D c if and only if c D c2 j for some c2 2 F0 . Proof. (1) is easy to check. 
(Note that 1 D .1 1/ D 1 1 so 1 D 1.) (2) Let c be any element of F with c ¤ c and let j D .c c/=2. (3) Observe that c D c1 Cjc2 with c1 D .c Cc/=2 and c2 D .c c/=2j . It is easy to check that c1 ; c2 2 F0. Also, if c D c1 C c2 j with c1 ; c2 2 F0, then c D c1 jc2 and, solving for c1 and c2 , we obtain c1 D .c C c/=2 and c2 D .c c/=2j . Remark 6.2.42. If F D C and the conjugation is complex conjugation, F0 D R and we may choose j D i . Þ Theorem 6.2.43. Let V be a finite-dimensional vector space and let ' be a nonsingular skew-Hermitian form on V . Then ' is diagonalizable, i.e., ' is isometric to Œa1 ? : : : ? Œan with ai 2 F , ai ¤ 0, ai D ai , or equivalently ai D jbi with bi 2 F0, bi ¤ 0, for each i . Proof. First we claim there is a vector v 2 V with '.v; v/ ¤ 0. Choose v1 2 V , v1 ¤ 0, arbitrarily. If '.v1 ; v1/ ¤ 0, choose v D v1 . Otherwise, since ' is nonsingular there is a vector v2 2 V with '.v1 ; v2/ D a ¤ 0. i i i i i i “book” — 2011/3/4 — 17:06 — page 184 — #198 i i 184 Guide to Advanced Linear Algebra (Then '.v2 ; v1 / D a.) If '.v2 ; v2 / ¤ 0, choose v D v2 . Otherwise, for any c 2 F , let v3 D v1 C cv2 . We easily compute that '.v3 ; v3 / D ac a c D ac .ac/. Thus if we let v D v1 C .j=a/v2 , '.v; v/ ¤ 0. Now proceed as in the proof of Theorem 6.2.26. Corollary 6.2.44. Let V be a complex vector space of dimension n and let ' be a nonsingular skew-Hermitian form on V . Then ' is isometric to r Œi ? sŒ i for well-defined integers r and s with r C s D n. Proof. By Theorem 6.2.43, V has a basis B D fv1 ; : : : ; vn g with Œ'B diagonal with entries i b1 ; : : : ; i bn forp nonzero real numbers b1 ; : : : ; bn . Letting B 0 D fv10 ; : : : ; vn0 g with vi0 D . 1=jbi j/vi we see that Œ'B 0 is diagonal with all diagonal entries ˙i . It remains to show that the numbers r of Ci and s of i entries are well-defined. The proof is almost identical to the proof of Theorem 6.2.29, the only difference being that instead of considering '.v; v/ we consider .1= i /'.v; v/. 6.3 The adjoint of a linear transformation We now return to the general situation. We assume in this section that .V; '/ and .W; / are nonsingular, where the forms ' and are either both bilinear or both sesquilinear. Given a linear transformation T W V ! W , we define its adjoint T adj W W ! V . We then investigate properties of the adjoint. Definition 6.3.1. Let T W V ! W be a linear transformation. The adjoint of T is the linear transformation T adj W W ! V defined by T .x/; y D ' x; T adj.y/ for all x 2 V; y 2 W: Þ This is a rather complicated definition, and the first thing we need to see is that it in fact makes sense. Lemma 6.3.2. T adj W W ! V , as given in Definition 6.3.1, is a welldefined linear transformation. Proof. We give two proofs, the first more concrete and the second more abstract. The first proof proceeds in two steps. The first step is to observe that the formula '.x; z/ D .T .x/; y/, where x 2 V is arbitrary and y 2 W is i i i i i i “book” — 2011/3/4 — 17:06 — page 185 — #199 i i 6.3. The adjoint of a linear transformation 185 any fixed element, defines a unique element z of V , since ' is nonsingular. Hence T adj.y/ D z is well-defined. The second step is to show that T adj is a linear transformation. We compute, for x 2 V arbitrary, ' x; T adj.y1 C y2 / D T .x/; y1 C y2 D T .x/; y1 C T .x/; y2 D ' x; T adj .y1 / C ' x; T adj .y2 / and T .x/; cy D c T .x/; y D c ' x; T adj.y/ D ' x; cT adj .y/ : ' x; T adj .cy/ D For the second proof, we first consider the bilinear case. 
The formula in Definition 6.3.1 is equivalent to ˛' T adj .y/ .x/ D ˛ .y/ T .x/ D T ' .y/ .x/; where T W W ! V is the dual of T , which gives T adj D ˛' 1 ı T ı ˛ : In the sesquilinear case we have a bit more work to do, since ˛' and ˛ are conjugate linear rather than linear. The formula in Definition 6.3.1 is equivalent to .T .x/; y/ D '.x; T adj .y//. Define ˛' by ˛' .y/.x/ D '.x; y/, and define ˛ similarly. Then ˛' and ˛ are linear transformations and by the same logic we obtain T adj D ˛' 1 ı T ı ˛ : Remark 6.3.3. T adj is often denoted by T , but we will not use that notation in this section as we are also considering T , the dual of T , here. Þ Suppose V and W are finite dimensional. Then, since T adj W W ! V is a linear transformation, once we have chosen bases, we may represent T adj by a matrix. Lemma 6.3.4. Let B and C be bases of V and W respectively and let P D Œ'B and Q D Œ C . Then adj T D P 1 t ŒT C B Q if ' and are bilinear; B C and adj T B C DP 1t ŒT C B Q if ' and are sesquilinear: i i i i i i “book” — 2011/3/4 — 17:06 — page 186 — #200 i i 186 Guide to Advanced Linear Algebra In particular, if V D W , ' D and B D C, and P D Œ'B , then adj T D P 1 t ŒT B P if ' is bilinear; B and adj T DP B 1t ŒT B P if ' is sesquilinear: Proof. Again we give two proofs, the first more concrete and the second more abstract. For the first proof, let ŒT C B D M and ŒT adjC B D N . Then ˝ ˛ T .x/; y D T .x/; y D t M ŒxB QŒyC D t ŒxB t MQŒyC and ˝ ˛ ' x; T adj .y/ D x; T adj.y/ D t ŒxB P N ŒyC D t ŒxB P N ŒyC from which we obtain t MQ D P N and hence N DP 1t M Q: For the second proof, let B D fv1 ; v2 ; : : :g and set B D fv 1 ; v2 ; : : :g. Then, keeping track of conjugations, we know from the second proof of Lemma 6.3.2 that adj 1 T D ˛' B B T B C ˛ C C : B C But Œ˛' B B D P ; Œ˛ C C D Q, and from Definition 2.4.1 and Lemma 2.4.2 we see that ŒT B C D t ŒT C B D t ŒT C B . In one very important case this simplifies. Definition 6.3.5. Let V be a vector space and let ' be a form on V . A basis B D fv1 ; v2 ; : : :g of V is orthonormal if '.vi ; vj / D '.vj ; vi / D 1 if i D j and 0 if i ¤ j . Þ Remark 6.3.6. We see from Corollary 6.2.30 that if F D R or C then V has an orthonormal basis if and only if ' is real symmetric or complex Hermitian, and positive definite in either case. Þ Corollary 6.3.7. Let V and W be finite-dimensional vector spaces with orthonormal bases B and C respectively. Let T W V ! W be a linear transformation. Then ŒT adj B C D t ŒT C B if ' and are bilinear i i i i i i “book” — 2011/3/4 — 17:06 — page 187 — #201 i i 6.3. The adjoint of a linear transformation 187 and ŒT adjB C D t ŒT C B if ' and are sesquilinear: In particular, if T W V ! V then ŒT adjB D t ŒT B if ' is bilinear and ŒT adjB D t ŒT B if ' is sesquilinear: Proof. In this case, both P and Q are identity matrices. Remark 6.3.8. There is an important generalization of the definition of the adjoint. We have seen in the proof of Lemma 6.3.2 that T adj is defined by ˛' ıT adj D T ı˛ . Suppose now that ˛' , or equivalently ˛' , is injective but not surjective, which may occur when V is infinite dimensional. Then T adj may not be defined. But if T adj is defined, then it is well-defined, i.e., if there is a linear transformation S W W ! V satisfying '.T .x/; y/ D .x; S.y// for every x 2 V , y 2 W , then there is a unique such linear transformation S, and we set T adj D S. Þ Remark 6.3.9. (1) It is obvious, but worth noting, that if ˛' is injective the identity I W V ! 
V has adjoint I D I, as '.I.x/; y/ D '.x; y/ D '.x; I.y// for every x; y 2 V . (2) On the other hand, if ˛' is not injective there is no hope of defining an adjoint. For suppose V0 D Ker.˛' / ¤ f0g. Let P0 W W ! V be any linear transformation with P0 .W / V0 . If S W W ! V is a linear transformation with .T .x/; y/ D '.x; S.y//, then S 0 D S C P0 also satisfies .T .x/; y/ D '.x; S 0 .y// for x 2 V , y 2 W . Þ We state some basic properties of adjoints. Lemma 6.3.10. (1) Suppose T1 W V ! W and T2 W V ! W both have adjoints. Then T1 C T2 W V ! W has an adjoint and .T1 C T2 /adj D adj adj T1 C T2 . (2) Suppose T W V ! W has an adjoint. Then cT W V ! W has an adjoint and .cT /adj D c T adj. (3) Suppose S W V ! W and T W W ! X both have adjoints. Then T ı S W V ! X has an adjoint and .T ı S/adj D S adj ı T adj. (4) Suppose T W V ! V has an adjoint. Then for any polynomial p.x/ 2 F Œx, p.T / has an adjoint and .p.T //adj D p.T adj /. i i i i i i “book” — 2011/3/4 — 17:06 — page 188 — #202 i i 188 Guide to Advanced Linear Algebra Lemma 6.3.11. Suppose that ' and are either both symmetric, both Hermitian, both skew-symmetric, or both skew-Hermitian. If T W V ! W has an adjoint, then T adj W W ! V has an adjoint and .T adj /adj D T . Proof. We prove the Hermitian case, which is typical. Let S D T adj. By definition, .T .x/; y/ D '.x; S.y// for x 2 V , y 2 W . Now S has an adjoint R if and only if '.S.y/; x/ D .y; R.x//. But ' S.y/; x D ' x; S.y/ D T .x/; y D y; T .x/ so R D T , i.e., .T adj /adj D T . We will present a number of interesting examples of and related to adjoints in Section 7.3 and in Section 7.4. i i i i i i “book” — 2011/3/4 — 17:06 — page 189 — #203 i i CHAPTER 7 Real and complex inner product spaces In this chapter we consider real and complex vector spaces equipped with an inner product. An inner product is a special case of a symmetric bilinear form, in the real case, or of a Hermitian form, in the complex case. But it is a very important special case, one in which much more can be said than in general. 7.1 Basic definitions We begin by defining the objects we will be studying. Definition 7.1.1. An inner product '.x; y/ D hx; yi on a real vector space V is a symmetric bilinear form with the property that hv; vi > 0 for every v 2 V , v ¤ 0. An inner product '.x; y/ D hx; yi on a complex vector space V is a Hermitian form with the property that hv; vi > 0 for every v 2 V , v ¤ 0. A real or complex vector space equipped with an inner product is an inner product space. Þ Example 7.1.2. (1) The cases F D R and C of Example 6.1.5(1) give inner product spaces. (2) Let F D R and let A be a real symmetric matrix (i.e., tA D A), or let F D C and let A be a complex Hermitian matrix (i.e., tA D A) in Example 6.1.5(2). Then we obtain inner product spaces if and only if A is positive definite. (3) Let F D R or C in Example 6.1.5(3). (4) Example 6.1.5(4). Þ 189 i i i i i i “book” — 2011/3/4 — 17:06 — page 190 — #204 i i 190 Guide to Advanced Linear Algebra In this chapter we let F be R or C. We will frequently state and prove results only in the complex case when the real case can be obtained by ignoring the conjugation. Let us begin by relating inner products to the forms we considered in Chapter 6. Lemma 7.1.3. Let ' be an inner product on the finite-dimensional real or complex vector space V . Then ' is nonsingular in the sense of Definition 6.1.8. Proof. Since '.y; y/ > 0 for every y 2 V , y ¤ 0, we may apply Lemma 6.1.10, choosing x D y. Remark 7.1.4. 
Inner products are particularly nice symmetric or Hermitian forms. One of the ways they are nice is that if ' is such a form on a vector space V , then not only is ' nonsingular but its restriction to any subspace W of V is nonsingular. Conversely, if ' is a form on a real or complex vector space V such that the restriction of ' to any subspace W of V is nonsingular, then either ' or ' must be an inner product. For if neither ' nor ' is an inner product, there are two possibilities: (1) There is a vector w0 with '.w0 ; w0/ D 0, or (2) There are vectors w1 and w2 with '.w1 ; w1/ > 0 and '.w2 ; w2/ < 0. In this case f .t/ D '.tw1 C .1 t/w2 ; tw1 C .1 t/w2 / is a continuous real-valued function with f .0/ > 0 and f .1/ < 0, so there is a value t0 with f .t0 / D 0, i.e., '.w0 ; w0/ D 0 for w0 D t0 w1 C .1 t0 /w2 . Then ' is identically 0 on Span.fw0 g/. Þ We now turn our attention to norms of vectors. Definition 7.1.5. Let V be an inner product space. The norm kvk of a vector v 2 V is p kvk D hv; vi: Þ Lemma 7.1.6. Let V be an inner product space. (1) kcvk D jcjkvk for any c 2 F and any v 2 V . (2) kvk 0 for all v 2 V and kvk D 0 if and only if v D 0. (3) (Cauchy-Schwartz-Buniakowsky inequality) jhv; wij kvkkwk for all v; w 2 V , with equality if and only if fv; wg is linearly dependent. (4) (Triangle inequality) kv C wk kvk C kwk for all v; w 2 V , with equality if and only if w D 0 or v D pw for some nonnegative real number p. i i i i i i “book” — 2011/3/4 — 17:06 — page 191 — #205 i i 7.1. Basic definitions 191 Proof. (1) and (2) are immediate. For (3), if fv; wg is linearly dependent then w D 0 or w ¤ 0 and v D cw for some c 2 F , and it is easy to check that in both cases we have equality. Assume that fv; wg is linearly independent. Then for any c 2 F , x D v cw ¤ 0, and then direct computation shows that 0 < kxk2 D hx; xi D hv; vi C h cw; vi C hv; cwi C h cw; cwi D hv; vi chv; wi chv; wi C jcj2 hw; wi: Setting c D hv; wi=hw; wi gives 0 < hv; vi which gives the inequality. For (4), we have that vCw 2 ˇ ˇ ˇhv; wiˇ2 =hw; wi; D hv C w; v C wi D hv; vi C hv; wi C hw; vi C hw; wi D kvk2 C hv; wi C hv; wi C kwk2 ˇ ˇ kvk2 C 2ˇhv; wiˇ C kwk2 2 kvk2 C 2kvkkwk C kwk2 D kvk C kwk ; which gives the triangle inequality. The second inequality in the proof is the Cauchy-Schwartz-Buniakowsky inequality. The first inequality in the proof holds because for a complex number c, c C c 2jcj; with equality only if c is a nonnegative real number. To have kvCwk2 D .kvkCkwk/2 both inequalities in the proof must be equalities. The second one is an equality if and only if w D 0; in which case the first one is, too, or if and only if w ¤ 0 and v D pw for some complex number p: Then hv; wi C hw; vihpw; wi C hw; pwi D .p C p/kwk2 and then the first inequality is an equality if and only if p is a nonnegative real number. If V is an inner product space, we may recover the inner product from the norms of vectors. Lemma 7.1.7 (Polarization identities). (1) Let V be a real inner product space. Then for any v; w 2 V , hv; wi D .1=4/kv C wk2 .1=4/kv wk2: i i i i i i “book” — 2011/3/4 — 17:06 — page 192 — #206 i i 192 Guide to Advanced Linear Algebra (2) Let V be a complex inner product space. Then for any v; w 2 V , hv; wi D .1=4/kv C wk2 C .i=4/kv C iwk2 wk2 .1=4/kv .i=4/kv iwk2: For convenience, we repeat here some earlier definitions. Definition 7.1.8. Let V be an inner product space. A vector v 2 V is a unit vector if kvk D 1. Two vectors v and w are orthogonal if hv; wi D 0. 
For convenience, we repeat here some earlier definitions.

Definition 7.1.8. Let $V$ be an inner product space. A vector $v \in V$ is a unit vector if $\|v\| = 1$. Two vectors $v$ and $w$ are orthogonal if $\langle v, w\rangle = 0$. A set $B$ of vectors in $V$, $B = \{v_1, v_2, \ldots\}$, is orthogonal if the vectors in $B$ are pairwise orthogonal, i.e., if $\langle v_i, v_j\rangle = 0$ whenever $i \neq j$. The set $B$ is orthonormal if $B$ is an orthogonal set of unit vectors, i.e., if $\langle v_i, v_i\rangle = 1$ for every $i$ and $\langle v_i, v_j\rangle = 0$ for every $i \neq j$. ⋄

Example 7.1.9. Let $\langle\ ,\ \rangle$ be the standard inner product on $F^n$, defined by $\langle v, w\rangle = {}^tv\bar{w}$. Then the standard basis $\mathcal{E} = \{e_1, \ldots, e_n\}$ is orthonormal. ⋄

Lemma 7.1.10. Let $B = \{v_1, v_2, \ldots\}$ be an orthogonal set of nonzero vectors in $V$. If $v \in V$ is a linear combination of the vectors in $B$, $v = \sum_i c_iv_i$, then $c_j = \langle v, v_j\rangle/\|v_j\|^2$ for each $j$. In particular, if $B$ is orthonormal then $c_j = \langle v, v_j\rangle$ for each $j$.

Proof. For any $j$,
$$\langle v, v_j\rangle = \Big\langle \sum_i c_iv_i,\ v_j\Big\rangle = \sum_i c_i\langle v_i, v_j\rangle = c_j\langle v_j, v_j\rangle,$$
as $\langle v_i, v_j\rangle = 0$ for $i \neq j$. □

Corollary 7.1.11. Let $B = \{v_1, v_2, \ldots\}$ be an orthogonal set of nonzero vectors in $V$. Then $B$ is linearly independent.

Lemma 7.1.12. Let $B = \{v_1, v_2, \ldots\}$ be an orthogonal set of nonzero vectors in $V$. If $v \in V$ is a linear combination of the vectors in $B$, $v = \sum_i c_iv_i$, then $\|v\|^2 = \sum_i |c_i|^2\|v_i\|^2$. In particular, if $B$ is orthonormal then $\|v\|^2 = \sum_i |c_i|^2$.

Proof. We compute
$$\|v\|^2 = \langle v, v\rangle = \Big\langle \sum_i c_iv_i,\ \sum_j c_jv_j\Big\rangle = \sum_{i,j} c_i\bar{c}_j\langle v_i, v_j\rangle = \sum_i |c_i|^2\langle v_i, v_i\rangle. \ \Box$$

Corollary 7.1.13 (Bessel's inequality). Let $B = \{v_1, v_2, \ldots, v_n\}$ be a finite orthogonal set of nonzero vectors in $V$. For any vector $v \in V$,
$$\sum_{i=1}^n |\langle v, v_i\rangle|^2/\|v_i\|^2 \leq \|v\|^2,$$
with equality if and only if $v = \sum_{i=1}^n (\langle v, v_i\rangle/\|v_i\|^2)\,v_i$. In particular, if $B$ is orthonormal then
$$\sum_{i=1}^n |\langle v, v_i\rangle|^2 \leq \|v\|^2,$$
with equality if and only if $v = \sum_{i=1}^n \langle v, v_i\rangle v_i$.

Proof. Let $w = \sum_{i=1}^n (\langle v, v_i\rangle/\|v_i\|^2)\,v_i$ and let $x = v - w$. Then $\langle v, v_i\rangle = \langle w, v_i\rangle$ for each $i$, so $\langle x, v_i\rangle = 0$ for each $i$ and hence $\langle x, w\rangle = 0$. Then
$$\|v\|^2 = \langle v, v\rangle = \langle w + x, w + x\rangle = \|w\|^2 + \|x\|^2 \geq \|w\|^2 = \sum_{i=1}^n |\langle v, v_i\rangle|^2/\|v_i\|^2,$$
with equality if and only if $x = 0$. □

We have a more general notion of a norm.

Definition 7.1.14. Let $V$ be a vector space over $F$. A norm on $V$ is a function $\|\ \| \colon V \to \mathbb{R}$ satisfying:
(a) $\|v\| \geq 0$, and $\|v\| = 0$ if and only if $v = 0$;
(b) $\|cv\| = |c|\,\|v\|$ for $c \in F$ and $v \in V$;
(c) $\|v + w\| \leq \|v\| + \|w\|$ for $v, w \in V$. ⋄

Theorem 7.1.15. (1) Let $V$ be an inner product space. Then $\|v\| = \sqrt{\langle v, v\rangle}$ is a norm in the sense of Definition 7.1.14.
(2) Let $V$ be a vector space and let $\|\ \|$ be a norm on $V$. There is an inner product $\langle\ ,\ \rangle$ on $V$ such that $\|v\| = \sqrt{\langle v, v\rangle}$ if and only if $\|\ \|$ satisfies the parallelogram law
$$\|v + w\|^2 + \|v - w\|^2 = 2\big(\|v\|^2 + \|w\|^2\big) \quad\text{for all } v, w \in V.$$

Proof. (1) is immediate. For (2), given any norm we can define $\langle\ ,\ \rangle$ by use of the polarization identities of Lemma 7.1.7, and it is easy to verify that this is an inner product if and only if $\|\ \|$ satisfies the parallelogram law. We omit the proof. □

Example 7.1.16. If $v = {}^t(x_1, \ldots, x_n)$, define $\|\ \|$ on $F^n$ by $\|v\| = |x_1| + \cdots + |x_n|$. Then $\|\ \|$ is a norm that does not come from an inner product. ⋄

We now investigate some important topological properties.

Definition 7.1.17. Two norms $\|\ \|_1$ and $\|\ \|_2$ on a vector space $V$ are equivalent if there are positive constants $a$ and $A$ such that
$$a\|v\|_1 \leq \|v\|_2 \leq A\|v\|_1 \quad\text{for every } v \in V. \quad ⋄$$

Remark 7.1.18. It is easy to check that this gives an equivalence relation on norms. ⋄
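As a quick numerical companion to Example 7.1.16 and Theorem 7.1.15 (a sketch; the test vectors are our choice), one can watch the parallelogram law fail for the $1$-norm, confirming that it does not come from an inner product:

```python
import numpy as np

v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0])
one = lambda a: np.abs(a).sum()        # the norm of Example 7.1.16

lhs = one(v + w) ** 2 + one(v - w) ** 2  # 4 + 4 = 8
rhs = 2 * (one(v) ** 2 + one(w) ** 2)    # 2 * (1 + 1) = 4
print(lhs, rhs)                          # 8.0 4.0: the law fails
```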
Lemma 7.1.19. (1) Let $\|\ \|$ be any norm on a vector space $V$. Then $d(v, w) = \|v - w\|$ is a metric on $V$.
(2) If $\|\ \|_1$ and $\|\ \|_2$ are equivalent norms on $V$, then the metrics $d_1(v, w) = \|v - w\|_1$ and $d_2(v, w) = \|v - w\|_2$ give the same topology on $V$.

Proof. (1) A metric on a space $V$ is a function $d \colon V \times V \to \mathbb{R}$ satisfying:
(a) $d(v, w) \geq 0$, and $d(v, w) = 0$ if and only if $w = v$;
(b) $d(v, w) = d(w, v)$;
(c) $d(v, x) \leq d(v, w) + d(w, x)$.
It is then immediate that $d(v, w) = \|v - w\|$ is a metric.

(2) The metric topology on a space $V$ with metric $d$ is the one with a basis of open sets $B_\varepsilon(v_0) = \{v \mid d(v, v_0) < \varepsilon\}$ for every $v_0 \in V$ and every $\varepsilon > 0$. Thus $\|\ \|_i$ gives the topology with basis of open sets $B^i_\varepsilon(v_0) = \{v \mid \|v - v_0\|_i < \varepsilon\}$ for $v_0 \in V$ and $\varepsilon > 0$, for $i = 1, 2$. By the definition of equivalence, $B^2_{a\varepsilon}(v_0) \subseteq B^1_\varepsilon(v_0)$ and $B^1_{\varepsilon/A}(v_0) \subseteq B^2_\varepsilon(v_0)$, so these two bases give the same topology. □

Theorem 7.1.20. Let $V$ be a finite-dimensional $F$-vector space. Then $V$ has a norm, and any two norms on $V$ are equivalent.

Proof. First we consider $V = F^n$. Then $V$ has the standard norm $\|v\| = \sqrt{\langle v, v\rangle} = \sqrt{{}^tv\bar{v}}$ coming from the standard inner product $\langle\ ,\ \rangle$. It suffices to show that any other norm $\|\ \|_2$ is equivalent to this one. By property (b) of a norm, it suffices to show that there are positive constants $a$ and $A$ with $a \leq \|v\|_2 \leq A$ for every $v \in V$ with $\|v\| = 1$.

First suppose that $\|\ \|_2$ comes from an inner product $\langle\ ,\ \rangle_2$. Then $\langle v, v\rangle_2 = {}^tvB\bar{v}$ for some matrix $B$, and so we see that $f(v) = \langle v, v\rangle_2$ is a quadratic function of the entries of $v$ (in the real case) or of the real and imaginary parts of the entries of $v$ (in the complex case). In particular $f(v)$ is a continuous function of the entries of $v$. Now $\{v \mid \|v\| = 1\}$ is a compact set, and so $f(v)$ has a minimum $a$ (necessarily positive) and a maximum $A$ there.

In the general case we must work a little harder. Let $M = \max\big(\|e_1\|_2, \ldots, \|e_n\|_2\big)$, where $\{e_1, \ldots, e_n\}$ is the standard basis of $F^n$. Let $v = {}^t(x_1, \ldots, x_n)$ with $\|v\| = 1$. Then $|x_i| \leq 1$ for each $i$, so, by the properties of a norm,
$$\|v\|_2 = \|x_1e_1 + \cdots + x_ne_n\|_2 \leq \|x_1e_1\|_2 + \cdots + \|x_ne_n\|_2 = |x_1|\,\|e_1\|_2 + \cdots + |x_n|\,\|e_n\|_2 \leq 1\cdot M + \cdots + 1\cdot M = nM.$$

We prove the other inequality by contradiction. Suppose there is no such positive constant $a$. Then we may find a sequence of vectors $v_1, v_2, \ldots$ with $\|v_i\| = 1$ and $\|v_i\|_2 < 1/i$ for each $i$. Since $\{v \mid \|v\| = 1\}$ is compact, this sequence has a convergent subsequence $w_1, w_2, \ldots$ with $\|w_i\| = 1$ and $\|w_i\|_2 < 1/i$ for each $i$. Let $w_\infty = \lim_{i\to\infty} w_i$, and let $d = \|w_\infty\|_2$. (We cannot simply assert that $d = 0$ and obtain an immediate contradiction, since we do not know that $\|\ \|_2$ is continuous; note, however, that $d > 0$ since $w_\infty \neq 0$.)

For any $\delta > 0$, let $w \in V$ be any vector with $\|w - w_\infty\| < \delta$. Then, using the inequality $\|u\|_2 \leq nM\|u\|$ established above,
$$d = \|w_\infty\|_2 \leq \|w_\infty - w\|_2 + \|w\|_2 \leq \delta nM + \|w\|_2.$$
Choose $\delta = d/(2nM)$. Then $\|w - w_\infty\| < \delta$ implies, by the above inequality, that $\|w\|_2 \geq d - \delta nM = d/2$. Choosing $i$ large enough we have $\|w_i - w_\infty\| < \delta$ and $\|w_i\|_2 < d/2$, a contradiction.

This completes the proof for $V = F^n$. For $V$ an arbitrary vector space of dimension $n$, choose any basis $B$ of $V$ and define $\|\ \|$ on $V$ by $\|v\| = \|[v]_B\|$, where the norm on the right is the standard norm on $F^n$. □

Remark 7.1.21. It is possible to put an inner product (and hence a norm) on any vector space $V$, as follows: choose a basis $B = \{v_1, v_2, \ldots\}$ of $V$, define $\langle\ ,\ \rangle$ by $\langle v_i, v_j\rangle = 1$ if $i = j$ and $0$ if $i \neq j$, and extend $\langle\ ,\ \rangle$ to $V$ by (conjugate) linearity. However, unless we can actually write down the basis $B$, this is not very useful. ⋄
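Theorem 7.1.20 can be observed numerically. The following sketch (ours; the bound $\|v\| \leq \|v\|_1 \leq \sqrt{n}\,\|v\|$ is the standard comparison of the Euclidean norm and the norm of Example 7.1.16) samples random vectors and records the ratio of the two norms:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
ratios = []
for _ in range(10000):
    v = rng.standard_normal(n)
    ratios.append(np.abs(v).sum() / np.linalg.norm(v))  # ||v||_1 / ||v||

print(min(ratios), max(ratios), np.sqrt(n))  # stays within [1, sqrt(n)]
```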
Example 7.1.22. If $V$ is any infinite-dimensional vector space then $V$ admits norms that are not equivalent. Here is an example. Let $V = F^\infty$. Let $v = [x_1, x_2, \ldots]$ and $w = [y_1, y_2, \ldots]$. Define $\langle\ ,\ \rangle$ on $V$ by $\langle v, w\rangle = \sum_{j=1}^\infty x_j\bar{y}_j$ and define $\langle\ ,\ \rangle'$ on $V$ by $\langle v, w\rangle' = \sum_{j=1}^\infty x_j\bar{y}_j/2^j$. Then $\langle\ ,\ \rangle$ and $\langle\ ,\ \rangle'$ give norms $\|\ \|$ and $\|\ \|'$ that are not equivalent, and moreover the respective metrics $d$ and $d'$ on $V$ define different topologies, as the sequence of points $e_1, e_2, \ldots$ does not have a limit in the topology on $V$ given by $d$, but converges to $[0, 0, \ldots]$ in the topology given by $d'$. ⋄

7.2 The Gram-Schmidt process

Let $V$ be an inner product space. The Gram-Schmidt process is a method for transforming a basis for a finite-dimensional subspace of $V$ into an orthonormal basis for that subspace. In this section we introduce this process and investigate its consequences. We fix $V$, the inner product $\langle\ ,\ \rangle$, and the norm $\|\ \|$ coming from this inner product, throughout this section.

Theorem 7.2.1. Let $W$ be a finite-dimensional subspace of $V$, $\dim(W) = k$, and let $B = \{v_1, v_2, \ldots, v_k\}$ be a basis of $W$. Then there is an orthonormal basis $C = \{w_1, w_2, \ldots, w_k\}$ of $W$ such that $\operatorname{Span}(\{w_1, \ldots, w_i\}) = \operatorname{Span}(\{v_1, \ldots, v_i\})$ for each $i = 1, \ldots, k$. In particular, $W$ has an orthonormal basis.

Proof. By Lemma 7.1.3 and Theorem 6.2.29 we see immediately that $W$ has an orthonormal basis. Here is an independent construction. Define vectors $x_i$ inductively:
$$x_1 = v_1, \qquad x_i = v_i - \sum_{j<i} \frac{\langle v_i, x_j\rangle}{\langle x_j, x_j\rangle}\,x_j \quad\text{for } i > 1;$$
then $\{x_1, \ldots, x_k\}$ is an orthogonal set of nonzero vectors with $\operatorname{Span}(\{x_1, \ldots, x_i\}) = \operatorname{Span}(\{v_1, \ldots, v_i\})$ for each $i$, and we may take $w_i = x_i/\|x_i\|$.

We think of $\{g_0(x), g_1(x), g_2(x), \ldots\}$ as a sequence of approximations to $f(x)$, and we hope that it converges in some sense to $f(x)$. Of course, the question of convergence is one of analysis and not linear algebra. ⋄

We do, however, present the following extremely important special case.

Example 7.4.4. Let $V = L^2([-\pi, \pi])$. By definition, this is the space of complex-valued measurable functions $f(x)$ on $[-\pi, \pi]$ such that the Lebesgue integral
$$\int_{-\pi}^{\pi} |f(x)|^2\,dx$$
is finite. Then, by the Cauchy-Schwartz-Buniakowsky inequality, $V$ is an inner product space with inner product
$$\langle f(x), g(x)\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$$
For each integer $n$, let $p_n(x) = e^{inx}$. Then $\{p_n(x)\}$ is an orthonormal set, as we see from the equalities
$$\|p_n(x)\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}e^{-inx}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} 1\,dx = 1$$
and, for $m \neq n$,
$$\langle p_m(x), p_n(x)\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{imx}e^{-inx}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(m-n)x}\,dx = \frac{1}{2\pi i(m-n)}\,e^{i(m-n)x}\Big|_{-\pi}^{\pi} = 0.$$
For any function $f(x) \in L^2([-\pi, \pi])$ we have its classical Fourier coefficients
$$\hat{f}(n) = \langle f(x), p_n(x)\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{p_n(x)}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$$
for any integer $n$, and the Fourier expansion
$$g(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n)p_n(x).$$
It is a theorem from analysis that the right-hand side is well-defined, i.e., that if for a nonnegative integer $m$ we define
$$g_m(x) = \sum_{n=-m}^{m} \hat{f}(n)p_n(x),$$
then $g(x) = \lim_{m\to\infty} g_m(x)$ exists, and furthermore it is another theorem from analysis that, as functions in $L^2([-\pi, \pi])$, $f(x) = g(x)$. This is equivalent to $\lim_{m\to\infty}\|f(x) - g_m(x)\| = 0$, and so we may regard $g_0(x), g_1(x), g_2(x), \ldots$ as a sequence of approximations that converges to $f(x)$ (in norm). ⋄
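Example 7.4.4 invites numerical experiment. The following Python sketch (ours; the square wave is an arbitrary test function of our choosing, and the integrals are approximated by averaging over a uniform grid rather than computed exactly) computes Fourier coefficients and watches $\|f - g_m\|$ decrease:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
f = np.sign(x)                       # a test function in L^2([-pi, pi])

def fhat(n):
    # \hat{f}(n) = (1/2pi) * integral of f(x) e^{-inx} dx
    return (f * np.exp(-1j * n * x)).mean()

def g(m):
    # the partial sum g_m(x) of the Fourier expansion
    return sum(fhat(n) * np.exp(1j * n * x) for n in range(-m, m + 1))

for m in (1, 10, 100):
    err = np.sqrt((np.abs(f - g(m)) ** 2).mean())   # ||f - g_m||
    print(m, err)                    # decreases toward 0 as m grows
```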
Now we turn from orthogonal sets to adjoints and normality.

Example 7.4.5. (1) Let $V = C_0^\infty(\mathbb{R})$ be the space of real-valued infinitely differentiable functions on $\mathbb{R}$ with compact support (i.e., for every $f(x) \in C_0^\infty(\mathbb{R})$ there is a compact interval $I \subseteq \mathbb{R}$ with $f(x) = 0$ for $x \notin I$). Then $V$ is an inner product space with inner product given by
$$\langle f(x), g(x)\rangle = \int_{-\infty}^{\infty} f(x)g(x)\,dx.$$
Let $D \colon V \to V$ be defined by $D(f(x)) = f'(x)$. Then $D$ has an adjoint $D^* \colon V \to V$ given by $D^*(f(x)) = -f'(x)$, i.e., $D^* = -D$. To see this, let $E = -D$ and compute
$$\langle D(f(x)), g(x)\rangle - \langle f(x), E(g(x))\rangle = \int_{-\infty}^{\infty} f'(x)g(x)\,dx + \int_{-\infty}^{\infty} f(x)g'(x)\,dx = \int_{-\infty}^{\infty} \big(f(x)g(x)\big)'\,dx = f(x)g(x)\Big|_a^b = 0,$$
where the support of $f(x)g(x)$ is contained in the interval $[a, b]$. Since $D^* = -D$, $D^*$ commutes with $D$, so $D$ is normal.

(2) Let $V = C^\infty(\mathbb{R})$ or $V = P^\infty(\mathbb{R})$, with inner product given by
$$\langle f(x), g(x)\rangle = \int_0^1 f(x)g(x)\,dx.$$
We claim that $D \colon V \to V$ defined by $D(f(x)) = f'(x)$ does not have an adjoint. We prove this by contradiction. Suppose $D$ has an adjoint $D^* = E$. Guided by (1) we write $E(f(x)) = -f'(x) + F(f(x))$. Then we compute
$$\langle D(f(x)), g(x)\rangle - \langle f(x), E(g(x))\rangle = \int_0^1 f'(x)g(x)\,dx + \int_0^1 f(x)g'(x)\,dx - \int_0^1 f(x)F(g(x))\,dx = f(1)g(1) - f(0)g(0) - \int_0^1 f(x)F(g(x))\,dx,$$
which must be $0$ for every pair of functions $f(x), g(x) \in V$. Suppose there is some function $g_0(x)$ with $F(g_0(x)) \neq 0$. Setting $f(x) = x^2(x-1)^2F(g_0(x))$, the boundary terms vanish but $\int_0^1 f(x)F(g_0(x))\,dx > 0$, so we find a nonzero right-hand side and $E$ is not an adjoint of $D$. Thus the only possibility is that $F(f(x)) = 0$ for every $f(x) \in V$, and hence that $E(f(x)) = -f'(x)$. Then $f(1)g(1) - f(0)g(0) = 0$ for every pair of functions $f(x), g(x) \in V$, which is false (e.g., for $f(x) = 1$ and $g(x) = x$).

(3) For any fixed $n$ let $V = P_{n-1}(\mathbb{R})$ with the same inner product. Then $V$ is finite-dimensional. Thus $D \colon V \to V$ has an adjoint $D^* \colon V \to V$. In case $n = 1$, $D = 0$, so $D^* = 0$, and $D$ is trivially normal. For $n > 1$, $D$ is not normal: let $f(x) = x$. Then $D^2(f(x)) = 0$ but $D(f(x)) \neq 0$, so $D$ cannot be normal, by Lemma 7.3.16(4).

Let us compute $D^*$ for some small values of $n$. If we set $D^*(g(x)) = h(x)$, we are looking for functions satisfying
$$\int_0^1 f'(x)g(x)\,dx = \int_0^1 f(x)h(x)\,dx$$
for every $f(x) \in V$. Since $D^*$ is a linear transformation, it suffices to give the values of $D^*$ on the elements of a basis of $V$. We choose the standard basis $\mathcal{E}$.

On $P_0(\mathbb{R})$: $D^*(1) = 0$.

On $P_1(\mathbb{R})$:
$$D^*(1) = -6 + 12x, \qquad D^*(x) = -3 + 6x.$$

On $P_2(\mathbb{R})$:
$$D^*(1) = -6 + 12x, \qquad D^*(x) = 2 - 24x + 30x^2, \qquad D^*(x^2) = 3 - 26x + 30x^2. \quad ⋄$$
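The computations at the end of Example 7.4.5 can be reproduced mechanically. With respect to the monomial basis of $P_{n-1}(\mathbb{R})$, the Gram matrix of this inner product is the $n$-by-$n$ Hilbert matrix $G$ with $G_{ij} = 1/(i+j+1)$ (indices from $0$), and the adjoint condition $\langle Df, g\rangle = \langle f, D^*g\rangle$ becomes $A^{\mathsf{T}}G = GB$, i.e., $B = G^{-1}A^{\mathsf{T}}G$, where $A$ and $B$ are the matrices of $D$ and $D^*$. A sketch (our code and names):

```python
import numpy as np

def adjoint_of_D(n):
    """Matrix of D* on P_{n-1}(R) in the basis {1, x, ..., x^{n-1}},
    for the inner product <f, g> = integral_0^1 f(x) g(x) dx."""
    # Gram matrix of the monomial basis: the Hilbert matrix
    G = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    # Matrix of D: D(x^j) = j x^{j-1}
    A = np.zeros((n, n))
    for j in range(1, n):
        A[j - 1, j] = j
    # Solve G B = A^T G for B, the matrix of D*
    return np.linalg.solve(G, A.T @ G)

print(adjoint_of_D(2))  # columns: D*(1) = -6 + 12x, D*(x) = -3 + 6x
print(adjoint_of_D(3))  # D*(1) = -6 + 12x, D*(x) = 2 - 24x + 30x^2, ...
```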
7.5 The singular value decomposition

In this section we augment our results on normal linear transformations to obtain geometric information on an arbitrary linear transformation $T \colon V \to W$ between finite-dimensional inner product spaces. We assume we are in this situation throughout.

Lemma 7.5.1. (1) $T^*T$ is self-adjoint.
(2) $\operatorname{Ker}(T^*T) = \operatorname{Ker}(T)$.

Proof. For (1), $(T^*T)^* = T^*T^{**} = T^*T$. For (2), we have $\operatorname{Ker}(T^*T) \supseteq \operatorname{Ker}(T)$. On the other hand, let $v \in \operatorname{Ker}(T^*T)$. Then
$$0 = \langle v, 0\rangle = \langle v, T^*T(v)\rangle = \langle T(v), T(v)\rangle,$$
so $T(v) = 0$ and hence $\operatorname{Ker}(T^*T) \subseteq \operatorname{Ker}(T)$. □

Definition 7.5.2. A linear transformation $S \colon V \to V$ is nonnegative (respectively positive) if $S$ is self-adjoint and $\langle S(v), v\rangle \geq 0$ (respectively $\langle S(v), v\rangle > 0$) for every $v \in V$, $v \neq 0$. ⋄

Lemma 7.5.3. The following are equivalent:
(1) $S \colon V \to V$ is nonnegative (respectively positive).
(2) $S \colon V \to V$ is self-adjoint and all the eigenvalues of $S$ are nonnegative (respectively positive).
(3) $S = T^*T$ for some (respectively some invertible) linear transformation $T \colon V \to V$.

Proof. (1) and (2) are equivalent by the spectral theorem, Corollary 7.3.20. If $S$ is self-adjoint with distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, all $\geq 0$, then in the notation of Corollary 7.3.22 we have $S = \lambda_1T_1 + \cdots + \lambda_kT_k$. Choosing
$$T = R = \sqrt{\lambda_1}\,T_1 + \cdots + \sqrt{\lambda_k}\,T_k,$$
we have $T^* = R$ as well, and then $T^*T = R^2 = S$, so (2) implies (3).

Suppose (3) is true. We already know by Lemma 7.5.1(1) that $T^*T$ is self-adjoint. Let $\lambda$ be an eigenvalue of $T^*T$, and let $v$ be an associated eigenvector. Then
$$\lambda\langle v, v\rangle = \langle \lambda v, v\rangle = \langle T^*T(v), v\rangle = \langle T(v), T(v)\rangle \geq 0,$$
so $\lambda \geq 0$. By Lemma 7.5.1(2), $T^*T$ is invertible if and only if $T$ is invertible, and we know that $T$ is invertible if and only if all its eigenvalues are nonzero. Thus (3) implies (2). □

Corollary 7.5.4. For any nonnegative linear transformation $S \colon V \to V$ there is a unique nonnegative linear transformation $R \colon V \to V$ with $R^2 = S$.

Proof. $R$ is constructed in the proof of Lemma 7.5.3. Uniqueness follows easily by considering eigenvalues and eigenspaces. □

Definition 7.5.5. Let $T \colon V \to W$ have rank $r$. Let $\lambda_1, \ldots, \lambda_r$ be the (not necessarily distinct) nonzero eigenvalues of $T^*T$ (all of which are necessarily positive), ordered so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r$. Then $\sigma_1 = \sqrt{\lambda_1}, \ldots, \sigma_r = \sqrt{\lambda_r}$ are the singular values of $T$. ⋄

Theorem 7.5.6 (Singular value decomposition). Let $T \colon V \to W$ have rank $r$, and let $\sigma_1, \ldots, \sigma_r$ be the singular values of $T$. Then there are orthonormal bases $C = \{v_1, \ldots, v_n\}$ of $V$ and $D = \{w_1, \ldots, w_m\}$ of $W$ such that $T(v_i) = \sigma_iw_i$ for $i = 1, \ldots, r$ and $T(v_i) = 0$ for $i > r$.

Proof. Since $T^*T$ is self-adjoint, we know that there is an orthonormal basis $C = \{v_1, \ldots, v_n\}$ of $V$ of eigenvectors of $T^*T$, and we order the basis so that the associated eigenvalues are $\lambda_1, \ldots, \lambda_r, 0, \ldots, 0$. For $i = 1, \ldots, r$, let $w_i = (1/\sigma_i)T(v_i)$. We claim $C_1 = \{w_1, \ldots, w_r\}$ is an orthonormal set. We compute
$$\langle w_i, w_i\rangle = (1/\sigma_i^2)\langle T(v_i), T(v_i)\rangle = (1/\sigma_i^2)\lambda_i = 1,$$
and for $i \neq j$
$$\langle w_i, w_j\rangle = (1/\sigma_i\sigma_j)\langle T(v_i), T(v_j)\rangle = (1/\sigma_i\sigma_j)\langle v_i, T^*T(v_j)\rangle = (1/\sigma_i\sigma_j)\langle v_i, \lambda_jv_j\rangle = (\lambda_j/\sigma_i\sigma_j)\langle v_i, v_j\rangle = 0.$$
Then extend $C_1$ to an orthonormal basis $D$ of $W$. □

Remark 7.5.7. This theorem has a geometric interpretation. We choose new letters to have an unbiased description. Let $X$ be an inner product space and consider an orthonormal set $B = \{x_1, \ldots, x_n\}$ of vectors in $X$. Then for any positive real numbers $a_1, \ldots, a_k$,
$$\Big\{\, x = c_1x_1 + \cdots + c_kx_k \ \Big|\ \sum_{i=1}^k |c_i|^2/a_i^2 = 1 \,\Big\}$$
defines an ellipsoid in $X$. If $k = \dim(X)$ and $a_i = 1$ for each $i$, this ellipsoid is the unit sphere in $X$. The singular value decomposition says that if $T \colon V \to W$ is a linear transformation, then the image of the unit sphere of $V$ under $T$ is an ellipsoid in $W$, and furthermore it completely identifies that ellipsoid. ⋄

We also observe the following.

Corollary 7.5.8. $T$ and $T^*$ have the same singular values.

Proof. This is a special case of Theorem 5.9.2. □
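Numerically, Theorem 7.5.6 is the familiar matrix SVD. A sketch (our code; numpy's svd returns $T = U\,\mathrm{diag}(\sigma)\,V^{\mathsf{T}}$, so the rows of $V^{\mathsf{T}}$ give the basis $C$ and the columns of $U$ give the basis $D$):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 5))      # a rank-3 map from R^5 to R^3

U, sigma, Vt = np.linalg.svd(T)
# T v_i = sigma_i w_i for i = 1, ..., r:
for i, s in enumerate(sigma):
    assert np.allclose(T @ Vt[i], s * U[:, i])
# the sigma_i^2 are the nonzero eigenvalues of T* T, as in Definition 7.5.5:
eig = np.sort(np.linalg.eigvalsh(T.T @ T))[::-1]
assert np.allclose(eig[: len(sigma)], sigma ** 2)
```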
Proceeding along these lines we now derive the polar decomposition of a linear transformation.

Theorem 7.5.9 (Polar decomposition). Let $T \colon V \to V$ be a linear transformation. Then there is a unique positive semidefinite (i.e., nonnegative) linear transformation $R \colon V \to V$ and an isometry $Q \colon V \to V$ with $T = QR$. If $T$ is invertible, $Q$ is also unique.

Proof. Suppose $T = QR$. By definition, $Q^* = Q^{-1}$ and $R^* = R$. Then
$$T^*T = (QR)^*QR = R^*(Q^*Q)R = RIR = R^2.$$
Then, by Corollary 7.5.4, $R$ is unique: it is the nonnegative square root of $T^*T$.

Suppose that $T$ is invertible, and define $R$ as in Corollary 7.5.4. Then $R$ is invertible, and then $T = QR$ for the unique linear transformation $Q = TR^{-1}$. It remains to show that $Q$ is an isometry. We compute, for any $v \in V$ (using $(R^{-1})^* = R^{-1}$),
$$\langle Q(v), Q(v)\rangle = \langle TR^{-1}(v), TR^{-1}(v)\rangle = \langle v, R^{-1}T^*TR^{-1}(v)\rangle = \langle v, R^{-1}R^2R^{-1}(v)\rangle = \langle v, v\rangle.$$

Suppose that $T$ is not (necessarily) invertible. Choose a linear transformation $S \colon \operatorname{Im}(R) \to V$ with $RS = I \colon \operatorname{Im}(R) \to \operatorname{Im}(R)$. By Lemma 7.5.1 we know that $\operatorname{Ker}(T^*T) = \operatorname{Ker}(T)$ and also that $\operatorname{Ker}(R) = \operatorname{Ker}(R^*R) = \operatorname{Ker}(R^2) = \operatorname{Ker}(T^*T)$. Hence $Y = \operatorname{Im}(R)^\perp$ and $Z = \operatorname{Im}(T)^\perp$ are inner product spaces of the same dimension ($= \dim(\operatorname{Ker}(T))$) and hence are isometric. Choose an isometry $Q_0 \colon Y \to Z$. Define $Q$ as follows: let $X = \operatorname{Im}(R)$, so $V = X \perp Y$. Then
$$Q(v) = TS(x) + Q_0(y) \quad\text{where } v = x + y,\ x \in X,\ y \in Y.$$
(In the invertible case, $S = R^{-1}$ and $Q_0 \colon \{0\} \to \{0\}$, so $Q$ is unique, $Q = TR^{-1}$. In general, it can be checked that $Q$ is independent of the choice of $S$, but it depends on the choice of $Q_0$, and is not unique.)

We claim that $QR = T$ and that $Q$ is an isometry. To prove the first claim, we make a preliminary observation. For any $v \in V$, let $x = R(v)$. Then $R(S(x) - v) = RS(x) - R(v) = x - x = 0$, i.e., $S(x) - v \in \operatorname{Ker}(R)$. But $\operatorname{Ker}(R) = \operatorname{Ker}(T)$, so $S(x) - v \in \operatorname{Ker}(T)$, i.e., $T(S(x) - v) = 0$, so $T(S(x)) = T(v)$. Using this observation we compute that for any $v \in V$, with $x = R(v) \in X$,
$$QR(v) = Q(x + 0) = TS(x) + Q_0(0) = T(v) + 0 = T(v).$$

To prove the second claim, we observe that for any $v \in V$,
$$\langle R(v), R(v)\rangle = \langle v, R^*R(v)\rangle = \langle v, R^2(v)\rangle = \langle v, T^*T(v)\rangle = \langle T(v), T(v)\rangle.$$
Now write $v = x + y$ as above, and write $x = R(u)$ for some $u \in V$; then $TS(x) = T(u)$ by the observation, so $\langle TS(x), TS(x)\rangle = \langle T(u), T(u)\rangle = \langle R(u), R(u)\rangle = \langle x, x\rangle$. Then, using the fact that $\operatorname{Im}(Q_0) \subseteq Z = \operatorname{Im}(T)^\perp$, so that the cross terms vanish,
$$\langle Q(v), Q(v)\rangle = \langle TS(x) + Q_0(y),\ TS(x) + Q_0(y)\rangle = \langle TS(x), TS(x)\rangle + \langle Q_0(y), Q_0(y)\rangle = \langle x, x\rangle + \langle y, y\rangle = \langle x + y, x + y\rangle = \langle v, v\rangle. \ \Box$$
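The polar decomposition can be computed from the SVD: if $T = U\,\mathrm{diag}(\sigma)\,V^{\mathsf{T}}$, then $Q = UV^{\mathsf{T}}$ is an isometry and $R = V\,\mathrm{diag}(\sigma)\,V^{\mathsf{T}}$ is nonnegative with $QR = T$. A sketch (our construction, consistent with the proof above):

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 4))

U, sigma, Vt = np.linalg.svd(T)
Q = U @ Vt                            # an isometry
R = Vt.T @ np.diag(sigma) @ Vt        # the nonnegative square root of T* T

assert np.allclose(Q @ R, T)
assert np.allclose(Q.T @ Q, np.eye(4))           # Q* Q = I
assert np.all(np.linalg.eigvalsh(R) >= -1e-12)   # R is nonnegative
```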
CHAPTER 8. Matrix groups as Lie groups

Lie groups are central objects in mathematics. They lie at the intersection of algebra, analysis, and topology. In this chapter, we will show that many of the groups we have already encountered are in fact Lie groups. This chapter presupposes a certain knowledge of differential topology, and so we will use definitions and theorems from differential topology without further comment. We will also be a bit sketchy in our arguments in places. Throughout this chapter, "smooth" means $C^\infty$. We use $c_{ij}$ to denote a matrix entry that may be real or complex, $x_{ij}$ to denote a real matrix entry and $z_{ij}$ to denote a complex matrix entry, and we write $z_{ij} = x_{ij} + iy_{ij}$ where $x_{ij}$ and $y_{ij}$ are real numbers. We let $F = \mathbb{R}$ or $\mathbb{C}$ and $d_F = \dim_{\mathbb{R}} F$, so that $d_{\mathbb{R}} = 1$ and $d_{\mathbb{C}} = 2$.

8.1 Definition and first examples

Definition 8.1.1. $G$ is a Lie group if
(1) $G$ is a group;
(2) $G$ is a smooth manifold;
(3) the multiplication map $m \colon G \times G \to G$ given by $m(g_1, g_2) = g_1g_2$ and the inversion map $i \colon G \to G$ given by $i(g) = g^{-1}$ are both smooth maps. ⋄

Example 8.1.2. (1) The general linear group
$$GL_n(F) = \{\text{invertible } n\text{-by-}n \text{ matrices with entries in } F\}.$$
$GL_n(F)$ is a Lie group: it is an open subset of $F^{n^2}$, as $GL_n(F) = \det^{-1}(F - \{0\})$, so it is a smooth manifold of dimension $d_Fn^2$. It is noncompact for every $n \geq 1$, as already $GL_1(F)$ contains matrices $[c]$ with $|c|$ arbitrarily large. $GL_n(\mathbb{R})$ has two components and $GL_n(\mathbb{C})$ is connected, as we showed in Theorem 3.5.1 and Theorem 3.5.7. The multiplication map is a smooth map as it is a polynomial in the entries of the matrices, and the inversion map is a smooth map as it is a rational function of the entries of the matrix with nonvanishing denominator, as we see from Corollary 3.3.9.

(2) The special linear group
$$SL_n(F) = \{n\text{-by-}n \text{ matrices of determinant } 1 \text{ with entries in } F\}.$$
$SL_n(F)$ is a Lie group: $SL_n(F) = \det^{-1}(\{1\})$. To show $SL_n(F)$ is a smooth manifold we must show that $1$ is a regular value of $\det$. Let $M = (c_{ij})$, $M \in SL_n(F)$. Expanding by minors of row $i$, we see that
$$1 = \det(M) = (-1)^{i+1}c_{i1}\det(M_{i1}) + (-1)^{i+2}c_{i2}\det(M_{i2}) + \cdots,$$
where $M_{ij}$ is the submatrix obtained by deleting row $i$ and column $j$ of $M$, so at least one of the terms in the sum is nonzero, say $(-1)^{i+j}c_{ij}\det(M_{ij})$. But then the derivative matrix $\det'$ of $\det$ with respect to the matrix entries, when evaluated at $M$, has the entry $(-1)^{i+j}\det(M_{ij}) \neq 0$, so this matrix has rank $d_F$ everywhere on $SL_n(F)$. Hence, by the inverse function theorem, $SL_n(F)$ is a smooth submanifold of $F^{n^2}$. Since $\{1\} \subseteq F$ has codimension $d_F$, $SL_n(F)$ has codimension $d_F$ in $F^{n^2}$, so it is a smooth manifold of dimension $d_F(n^2 - 1)$.

$SL_1(F) = \{[1]\}$ is a single point and hence is compact, but $SL_n(F)$ is noncompact for $n > 1$, as we see from the fact that $SL_2(F)$ contains matrices of the form $\begin{pmatrix} c & 0 \\ 0 & 1/c \end{pmatrix}$ with $|c|$ arbitrarily large. An easy modification of the proofs of Theorem 3.5.1 and Theorem 3.5.7 shows that $SL_n(F)$ is always connected. Locally, $SL_n(F)$ is parameterized by all but one matrix entry, and, by the implicit function theorem, that entry is locally a function of the other $n^2 - 1$ entries. We have observed that multiplication and inversion are smooth functions of the entries of a matrix, and hence multiplication and inversion are smooth functions of the parameters in a coordinate patch around each element of $SL_n(F)$, i.e., $m \colon SL_n(F) \times SL_n(F) \to SL_n(F)$ and $i \colon SL_n(F) \to SL_n(F)$ are smooth functions. ⋄
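The regular-value computation in Example 8.1.2(2) can be checked by finite differences: the derivative of $\det$ in the direction of the matrix unit $E_{ij}$ is the cofactor $(-1)^{i+j}\det(M_{ij})$. A sketch (ours; the step size is an arbitrary choice, so the agreement is only approximate):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
d = np.linalg.det(M)
M[0] *= np.sign(d)                       # make the determinant positive
M /= np.linalg.det(M) ** (1 / 3)         # scale so that det(M) = 1

i, j, h = 0, 0, 1e-6
E = np.zeros((3, 3)); E[i, j] = 1.0
numeric = (np.linalg.det(M + h * E) - np.linalg.det(M)) / h
cofactor = np.linalg.det(np.delete(np.delete(M, i, 0), j, 1))  # (-1)^{0+0} det(M_00)
print(numeric, cofactor)                 # agree to roughly five digits
```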
8.2 Isometry groups of forms

Our next family of examples arises as isometry groups of nonsingular bilinear or sesquilinear forms. Before discussing these, we establish some notation: $I_n$ is the $n$-by-$n$ identity matrix. For $p + q = n$, $I_{p,q}$ is the $n$-by-$n$ matrix $\begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix}$. For $n$ even, $n = 2m$, $J_n$ is the $n$-by-$n$ matrix $\begin{pmatrix} 0 & I_m \\ -I_m & 0 \end{pmatrix}$. For a matrix $M = (c_{ij})$, we write $M = [m_1 \mid \cdots \mid m_n]$, so that $m_i$ is the $i$th column of $M$, $m_i = {}^t(c_{1i}, c_{2i}, \ldots, c_{ni})$.

Example 8.2.1. Let $\varphi$ be a nonsingular symmetric bilinear form on a vector space $V$ of dimension $n$ over $F$. We have two cases:

(1) $F = \mathbb{R}$. Here, by Theorem 6.2.29, $\varphi$ is isometric to $p[1] \perp q[-1]$ for uniquely determined integers $p$ and $q$ with $p + q = n$. The orthogonal group
$$O_{p,q}(\mathbb{R}) = \{M \in GL_n(\mathbb{R}) \mid {}^tMI_{p,q}M = I_{p,q}\}.$$
In particular if $p = n$ and $q = 0$ we have
$$O_n(\mathbb{R}) = O_{n,0}(\mathbb{R}) = \{M \in GL_n(\mathbb{R}) \mid {}^tM = M^{-1}\}.$$

(2) $F = \mathbb{C}$. In this case, by Corollary 6.2.27, $\varphi$ is isometric to $n[1]$. The orthogonal group
$$O_n(\mathbb{C}) = \{M \in GL_n(\mathbb{C}) \mid {}^tM = M^{-1}\}.$$

(The term "the orthogonal group" is often used to mean $O_n(\mathbb{R})$. Compare Definition 7.3.12.)

Let $G = O_{p,q}(\mathbb{R})$, $O_n(\mathbb{R})$, or $O_n(\mathbb{C})$. $G$ is a Lie group of dimension $d_Fn(n-1)/2$. $G$ has two components. Letting $SG = G \cap SL_n(F)$, we obtain the special orthogonal groups. For $G = O_n(\mathbb{R})$ or $O_n(\mathbb{C})$, $SG$ is the identity component of $G$, i.e., the component of $G$ containing the identity matrix. If $G = O_n(\mathbb{R})$ then $G$ is compact. $O_1(\mathbb{C}) = O_1(\mathbb{R}) = \{\pm[1]\}$. If $G = O_n(\mathbb{C})$ for $n > 1$, or $G = O_{p,q}(\mathbb{R})$ with $p \geq 1$ and $q \geq 1$, then $G$ is not compact.

We first consider the case $G = O_{p,q}(\mathbb{R})$, including $G = O_{n,0}(\mathbb{R}) = O_n(\mathbb{R})$. For vectors $v = {}^t(a_1, \ldots, a_n)$ and $w = {}^t(b_1, \ldots, b_n)$, let
$$\langle v, w\rangle = \sum_{i=1}^p a_ib_i - \sum_{i=p+1}^n a_ib_i.$$
Let $M = [m_1 \mid \cdots \mid m_n]$. Then $M \in G$ if and only if
$$f_{ii}(M) = \langle m_i, m_i\rangle = 1 \ \text{ for } i = 1, \ldots, p, \qquad f_{ii}(M) = \langle m_i, m_i\rangle = -1 \ \text{ for } i = p+1, \ldots, n,$$
$$f_{ij}(M) = \langle m_i, m_j\rangle = 0 \ \text{ for } 1 \leq i < j \leq n.$$
Thus if we let $F \colon M_n(\mathbb{R}) \to \mathbb{R}^N$, $N = n(n+1)/2$, by
$$F(M) = \big(f_{11}(M), f_{22}(M), \ldots, f_{nn}(M), f_{12}(M), f_{13}(M), \ldots, f_{1n}(M), \ldots, f_{n-1,n}(M)\big),$$
then $G = F^{-1}(t_0)$ where $t_0 = (1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0)$.

We claim that $M = I$ is a regular point of $F$. List the entries of $M$ in the order $x_{11}, x_{22}, \ldots, x_{nn}, x_{12}, \ldots, x_{1n}, \ldots, x_{n-1,n}, x_{21}, \ldots, x_{n1}, \ldots, x_{n,n-1}$. Computation shows that $F'(I)$, the matrix of the derivative of $F$ evaluated at $M = I$, which is an $N$-by-$n^2$ matrix, has its leftmost $N$-by-$N$ submatrix a diagonal matrix with diagonal entries $\pm 2$ or $\pm 1$. Thus $F'(I)$ has rank $N$, and $I$ is a regular point of $F$. Hence, by the inverse function theorem, there is an open neighborhood $B(I)$ of $I$ in $M_n(\mathbb{R})$ such that $F^{-1}(t_0) \cap B(I)$ is a smooth submanifold of $B(I)$ of codimension $N$, i.e., of dimension $n^2 - N = n(n-1)/2$. But for any fixed $M_0 \in GL_n(\mathbb{R})$, multiplication by $M_0$ is an invertible linear map, and hence a diffeomorphism, from $M_n(\mathbb{R})$ to itself. Thus we know that $M_0(F^{-1}(t_0) \cap B(I))$ is a smooth submanifold of $M_0B(I)$, which is an open neighborhood of $M_0$ in $M_n(\mathbb{R})$. But, for $M_0 \in G$, since $G$ is a group, $M_0F^{-1}(t_0) = M_0G = G = F^{-1}(t_0)$. Hence we see that $G$ is a smooth manifold. Again we apply the implicit function theorem to see that the group operations on $G$ are smooth maps.

Finally, we observe that any $M = (c_{ij})$ in $O_n(\mathbb{R})$ has $|c_{ij}| \leq 1$ for every $i$, $j$, so $O_n(\mathbb{R})$ is a closed and bounded, and hence compact, subspace of $\mathbb{R}^{n^2}$. On the other hand, the group $O_{1,1}(\mathbb{R})$ contains the matrices
$$\begin{pmatrix} \sqrt{x^2+1} & x \\ x & \sqrt{x^2+1} \end{pmatrix}$$
for any $x \in \mathbb{R}$, so it is an unbounded subset of $\mathbb{R}^{n^2}$ and hence it is not compact, and similarly for $O_{p,q}(\mathbb{R})$ with $p \geq 1$ and $q \geq 1$.

A very similar argument applies in case $G = O_n(\mathbb{C})$. We let $f_{ij}(M) = \operatorname{Re}\langle m_i, m_j\rangle$ and $g_{ij}(M) = \operatorname{Im}\langle m_i, m_j\rangle$, where $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ denote real and imaginary parts respectively. We then let $F \colon M_n(\mathbb{C}) \to \mathbb{R}^{2N}$ by $F(M) = (f_{11}(M), g_{11}(M), f_{22}(M), g_{22}(M), \ldots)$, and we identify $M_n(\mathbb{C})$ with $\mathbb{R}^{2n^2}$ by identifying the entry $z_{ij} = x_{ij} + iy_{ij}$ of $M$ with the pair $(x_{ij}, y_{ij})$ of real numbers. Then $G = F^{-1}(t_0)$ where $t_0 = (1, 0, 1, 0, \ldots, 1, 0, 0, \ldots, 0)$. Again we show that $M = I$ is a regular point of $F$, and the rest of the argument is the same, showing that $G$ is a smooth manifold of dimension $2n^2 - 2N = n(n-1)$, and that the group operations are smooth. Also, $O_2(\mathbb{C})$ contains the matrices
$$\begin{pmatrix} x & i\sqrt{x^2-1} \\ -i\sqrt{x^2-1} & x \end{pmatrix}$$
for any $x \in \mathbb{R}$, so it is not compact, and similarly for $O_n(\mathbb{C})$ for $n \geq 2$. ⋄
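The unbounded matrices exhibited for $O_{1,1}(\mathbb{R})$ are easy to test directly; a sketch (our code):

```python
import numpy as np

I11 = np.diag([1.0, -1.0])

def m(x):
    c = np.sqrt(x * x + 1)
    return np.array([[c, x], [x, c]])

# t(M) I_{1,1} M = I_{1,1} for arbitrarily large x, so O_{1,1}(R) is unbounded
for x in (0.0, 1.0, 10.0, 1000.0):
    M = m(x)
    assert np.allclose(M.T @ I11 @ M, I11)
```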
Example 8.2.2. Let $\varphi$ be a nonsingular Hermitian form on a vector space $V$ of dimension $n$ over $\mathbb{C}$. Then, by Theorem 6.2.29, $\varphi$ is isometric to $p[1] \perp q[-1]$ for uniquely determined integers $p$ and $q$ with $p + q = n$. The unitary group
$$U_{p,q}(\mathbb{C}) = \{M \in GL_n(\mathbb{C}) \mid {}^t\bar{M}I_{p,q}M = I_{p,q}\}.$$
In particular if $p = n$ and $q = 0$ we have
$$U_n(\mathbb{C}) = \{M \in GL_n(\mathbb{C}) \mid {}^t\bar{M} = M^{-1}\}.$$
(The term "the unitary group" is often used to mean $U_n(\mathbb{C})$. Compare Definition 7.3.12.)

Let $G = U_n(\mathbb{C})$ or $U_{p,q}(\mathbb{C})$. $G$ is a Lie group of dimension $n^2$. $G$ is connected. If $G = U_n(\mathbb{C})$ then $G$ is compact. If $G = U_{p,q}(\mathbb{C})$ with $p \geq 1$ and $q \geq 1$, then $G$ is not compact. Letting $SG = G \cap SL_n(\mathbb{C})$, we obtain the special unitary groups, which are closed connected subgroups of $G$ of codimension $1$.

The argument here is very similar to the argument in the last example. For vectors $v = {}^t(a_1, \ldots, a_n)$ and $w = {}^t(b_1, \ldots, b_n)$ we let
$$\langle v, w\rangle = \sum_{i=1}^p a_i\bar{b}_i - \sum_{i=p+1}^n a_i\bar{b}_i.$$
Let $M = [m_1 \mid \cdots \mid m_n]$. Then $M \in G$ if and only if
$$\langle m_i, m_i\rangle = 1 \ \text{ for } i = 1, \ldots, p, \qquad \langle m_i, m_i\rangle = -1 \ \text{ for } i = p+1, \ldots, n, \qquad \langle m_i, m_j\rangle = 0 \ \text{ for } 1 \leq i < j \leq n.$$
Let $f_{ii}(M) = \langle m_i, m_i\rangle$, which is always real valued. For $i \neq j$, let $f_{ij}(M) = \operatorname{Re}(\langle m_i, m_j\rangle)$ and $g_{ij}(M) = \operatorname{Im}(\langle m_i, m_j\rangle)$. Set $N = n + 2(n(n-1)/2) = n^2$. Let $F \colon M_n(\mathbb{C}) \to \mathbb{R}^N$ by
$$F(M) = \big(f_{11}(M), \ldots, f_{nn}(M), f_{12}(M), g_{12}(M), \ldots\big).$$
Then $G = F^{-1}(t_0)$ where $t_0 = (1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0)$. Identify $M_n(\mathbb{C})$ with $\mathbb{R}^{2n^2}$ as before. We again argue as before, showing that $I$ is a regular point of $F$ and then further that $G$ is a smooth manifold of dimension $2n^2 - n^2 = n^2$, and in fact a Lie group. Also, a similar argument shows that $U_n(\mathbb{C})$ is compact but that $U_{p,q}(\mathbb{C})$ is not compact for $p \geq 1$ and $q \geq 1$. ⋄

Example 8.2.3. Let $\varphi$ be a nonsingular skew-symmetric form on a vector space $V$ of dimension $n$ over $F$. Then, by Theorem 6.2.40, $\varphi$ is isometric to $\begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$ (so $n$ must be even). The symplectic group
$$Sp(n, F) = \{M \in GL_n(F) \mid {}^tMJ_nM = J_n\}.$$
Let $G = Sp(n, \mathbb{R})$ or $Sp(n, \mathbb{C})$. $G$ is connected and noncompact. $G$ is a Lie group of dimension $d_F(n(n+1)/2)$. We also have the symplectic group
$$Sp(n) = Sp(n, \mathbb{C}) \cap U(n, \mathbb{C}).$$
$G = Sp(n)$ is a closed subgroup of both $Sp(n, \mathbb{C})$ and $U(n, \mathbb{C})$, and is a connected compact Lie group of dimension $n(n+1)/2$. (The term "the symplectic group" is often used to mean $Sp(n)$.)

We consider $G = Sp_n(F)$ for $F = \mathbb{R}$ or $\mathbb{C}$. The argument is very similar. For $v = {}^t(a_1, \ldots, a_n)$ and $w = {}^t(b_1, \ldots, b_n)$, let
$$\langle v, w\rangle = \sum_{i=1}^{n/2} (a_ib_{i+n/2} - a_{i+n/2}b_i).$$
If $M = [m_1 \mid \cdots \mid m_n]$ then $M \in G$ if and only if
$$\langle m_i, m_{i+n/2}\rangle = 1 \ \text{ for } i = 1, \ldots, n/2, \qquad \langle m_i, m_j\rangle = 0 \ \text{ for } 1 \leq i < j \leq n,\ j \neq i + n/2.$$
Let $f_{ij}(M) = \langle m_i, m_j\rangle$ for $i < j$. Set $N = n(n-1)/2$. Let $F \colon M_n(F) \to F^N$ by $F(M) = (f_{12}(M), \ldots, f_{n-1,n}(M))$. Then $G = F^{-1}(t_0)$ where $t_0 = (0, \ldots, 1, \ldots)$ has $1$ in the positions indexed by the pairs $(i, i + n/2)$ and $0$ elsewhere. Again we show that $I$ is a regular point for $F$, and continue similarly, to obtain that $G$ is a Lie group of dimension $d_Fn^2 - d_FN = d_F(n(n+1)/2)$. $Sp_2(F)$ contains the matrices $\begin{pmatrix} x & 0 \\ 0 & 1/x \end{pmatrix}$ for any $x \neq 0$, showing that $Sp_n(F)$ is not compact for any $n$. Finally, $Sp(n) = Sp_n(\mathbb{C}) \cap U(n, \mathbb{C})$ is a closed subspace of the compact space $U(n, \mathbb{C})$, so is itself compact. We shall not prove that it is a Lie group nor compute its dimension, which is $(n^2 + n)/2$, here. ⋄

Remark 8.2.4. A warning to the reader: notation is not universally consistent, and some authors index the symplectic groups by $n/2$ instead of $n$. ⋄
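Membership in $Sp_2(F)$ is a one-line check; a sketch (ours) verifying that the matrices $\mathrm{diag}(x, 1/x)$ used above satisfy ${}^tMJ_2M = J_2$:

```python
import numpy as np

J2 = np.array([[0.0, 1.0], [-1.0, 0.0]])
for x in (0.5, 2.0, 100.0):
    M = np.diag([x, 1.0 / x])
    assert np.allclose(M.T @ J2 @ M, J2)   # M lies in Sp_2(R)
```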
Finally, we have a structure theorem for $GL_n(\mathbb{R})$ and $GL_n(\mathbb{C})$. We defined $A_n^+$, $N_n(\mathbb{R})$, and $N_n(\mathbb{C})$ in Definition 7.2.18, and these are obviously Lie groups.

Theorem 8.2.5. The multiplication maps
$$m \colon O(n, \mathbb{R}) \times A_n^+ \times N_n(\mathbb{R}) \to GL_n(\mathbb{R}) \quad\text{and}\quad m \colon U(n, \mathbb{C}) \times A_n^+ \times N_n(\mathbb{C}) \to GL_n(\mathbb{C})$$
given by $m(P, A, N) = PAN$ are diffeomorphisms.

Proof. The special case of Theorem 7.2.20 with $k = n$ gives that $m$ is a homeomorphism, and it is routine to check that $m$ and $m^{-1}$ are both differentiable. □

Remark 8.2.6. We have adopted our approach here on two grounds: first, to use elementary arguments to the extent possible, and second, to illustrate and indeed emphasize the linear algebra aspects of Lie groups. But it is possible to derive the results of this chapter by using more theory and less computation. It was straightforward to prove that $GL_n(\mathbb{R})$ and $GL_n(\mathbb{C})$ are Lie groups. The fact that the other groups we considered are also Lie groups is a consequence of the theorem that any closed subgroup of a Lie group is a Lie group. But this theorem is a theorem of analysis and topology, not of linear algebra. ⋄
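In matrix terms, Theorem 8.2.5 for $GL_n(\mathbb{R})$ is the QR decomposition normalized to have a positive diagonal. A sketch (ours; numpy's qr may return a triangular factor with negative diagonal entries, so we fix the signs):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))   # almost surely invertible

Q, R = np.linalg.qr(M)            # M = QR, R upper triangular
S = np.sign(np.diag(R))           # flip signs so the diagonal is positive
P = Q * S                         # P = QS is still orthogonal
R = S[:, None] * R                # SR has positive diagonal
A = np.diag(np.diag(R))           # the positive diagonal factor
N = np.linalg.solve(A, R)         # unit upper triangular

assert np.allclose(P @ A @ N, M)          # M = P A N
assert np.allclose(P.T @ P, np.eye(4))    # P is orthogonal
```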
APPENDIX A. Polynomials

In this appendix we gather and prove some important facts about polynomials. We fix a field $F$ and we let $R = F[x]$ be the ring of polynomials in the variable $x$ with coefficients in $F$,
$$R = \{a_nx^n + \cdots + a_1x + a_0 \mid a_i \in F,\ n \geq 0\}.$$

A.1 Basic properties

We define the degree of a nonzero polynomial to be the highest power of $x$ that appears in the polynomial. More precisely:

Definition A.1.1. Let $p(x) = a_nx^n + \cdots + a_0$ with $a_n \neq 0$. Then the degree $\deg p(x) = n$. ⋄

Remark A.1.2. The degree of the $0$ polynomial is not defined. A polynomial of degree $0$ is a nonzero constant polynomial. ⋄

The basic tool in dealing with polynomials is the division algorithm.

Theorem A.1.3. Let $f(x), g(x) \in R$ with $g(x) \neq 0$. Then there exist unique polynomials $q(x)$ (the quotient) and $r(x)$ (the remainder) such that $f(x) = g(x)q(x) + r(x)$, where $r(x) = 0$ or $\deg r(x) < \deg g(x)$.

Proof. We first prove existence. If $f(x) = 0$ we are done: choose $q(x) = 0$ and $r(x) = 0$. Otherwise, let $f(x)$ have degree $m$ and $g(x)$ have degree $n$. We fix $n$ and proceed by complete induction on $m$. If $m < n$ we are again done: choose $q(x) = 0$ and $r(x) = f(x)$. Otherwise, let $g(x) = a_nx^n + \cdots + a_0$ and $f(x) = b_mx^m + \cdots + b_0$. If $q_0(x) = (b_m/a_n)x^{m-n}$, then $f(x) - g(x)q_0(x)$ has the coefficient of $x^m$ equal to zero. If $f(x) = g(x)q_0(x)$ then we are again done: choose $q(x) = q_0(x)$ and $r(x) = 0$. Otherwise, $f_1(x) = f(x) - g(x)q_0(x)$ is a nonzero polynomial of degree less than $m$. Thus by the inductive hypothesis there are polynomials $q_1(x)$ and $r_1(x)$ with $f_1(x) = g(x)q_1(x) + r_1(x)$, where $r_1(x) = 0$ or $\deg r_1(x) < \deg g(x)$. Then
$$f(x) = g(x)q_0(x) + f_1(x) = g(x)q_0(x) + g(x)q_1(x) + r_1(x) = g(x)q(x) + r(x),$$
where $q(x) = q_0(x) + q_1(x)$ and $r(x) = r_1(x)$ is as required, so by induction we are done.

To prove uniqueness, suppose $f(x) = g(x)q_1(x) + r_1(x)$ and $f(x) = g(x)q_2(x) + r_2(x)$ with $r_1(x)$ and $r_2(x)$ satisfying the conditions of the theorem. Then $g(x)(q_1(x) - q_2(x)) = r_2(x) - r_1(x)$. Comparing degrees shows $r_2(x) = r_1(x)$ and $q_2(x) = q_1(x)$. □

Remark A.1.4. The algebraically well-informed reader will recognize the rest of this appendix as a special case of the theory of ideals in a Euclidean ring, but we will develop this theory from scratch for polynomial rings. ⋄

Definition A.1.5. A nonempty subset $J$ of $R$ is an ideal of $R$ if it has the properties:
(1) if $p_1(x) \in J$ and $p_2(x) \in J$, then $p_1(x) + p_2(x) \in J$;
(2) if $p_1(x) \in J$ and $q(x) \in R$, then $p_1(x)q(x) \in J$. ⋄

Remark A.1.6. Note that $J = \{0\}$ is an ideal, the zero ideal. Any other ideal (i.e., any ideal containing a nonzero element) is a nonzero ideal. ⋄

Example A.1.7. (1) Fix a polynomial $p_0(x)$ and let $J$ be the subset of $R$ consisting of all multiples of $p_0(x)$, $J = \{p_0(x)q(x) \mid q(x) \in R\}$. It is easy to check that $J$ is an ideal. An ideal of this form is called a principal ideal, and $p_0(x)$ is called a generator of $J$, or is said to generate $J$.
(2) Let $\{p_1(x), p_2(x), \ldots\}$ be a (possibly infinite) set of polynomials in $R$ and let $J = \{\sum p_i(x)q_i(x) \mid \text{only finitely many } q_i(x) \neq 0\}$. It is easy to check that $J$ is an ideal, and $\{p_1(x), p_2(x), \ldots\}$ is called a generating set for $J$ (or is said to generate $J$). ⋄

A nonzero polynomial $p(x) = a_nx^n + \cdots + a_0$ is called monic if the coefficient of the highest power of $x$ appearing in $p(x)$ is $1$, i.e., if $a_n = 1$.

Lemma A.1.8. Let $J$ be a nonzero ideal of $R$. Then $J$ contains a unique monic polynomial of lowest degree.

Proof. The set $\{\deg p(x) \mid p(x) \in J,\ p(x) \neq 0\}$ is a nonempty set of nonnegative integers, so, by the well-ordering principle, it has a smallest element $d$. Let $\tilde{p}_0(x)$ be a polynomial in $J$ with $\deg \tilde{p}_0(x) = d$. Thus $\tilde{p}_0(x)$ is a polynomial in $J$ of lowest degree, which may or may not be monic. Write $\tilde{p}_0(x) = \tilde{a}_dx^d + \cdots + \tilde{a}_0$. By the properties of an ideal,
$$p_0(x) = (1/\tilde{a}_d)\tilde{p}_0(x) = x^d + \cdots + (\tilde{a}_0/\tilde{a}_d)$$
is in $J$. This gives existence. To show uniqueness, suppose we have a different monic polynomial $p_1(x)$ of degree $d$ in $J$, $p_1(x) = x^d + \cdots + b_0$. Then by the properties of an ideal $\tilde{q}(x) = p_0(x) - p_1(x)$ is a nonzero polynomial of degree $e < d$ in $J$, $\tilde{q}(x) = \tilde{c}_ex^e + \cdots + \tilde{c}_0$. But then $q(x) = (1/\tilde{c}_e)\tilde{q}(x) = x^e + \cdots + (\tilde{c}_0/\tilde{c}_e)$ is a monic polynomial in $J$ of degree $e < d$, contradicting the minimality of $d$. □

Theorem A.1.9. Let $J$ be any nonzero ideal of $R$. Then $J$ is a principal ideal. More precisely, $J$ is the principal ideal generated by $p_0(x)$, where $p_0(x)$ is the unique monic polynomial of lowest degree in $J$.

Proof. By Lemma A.1.8, there is such a polynomial $p_0(x)$. Let $J_0$ be the principal ideal generated by $p_0(x)$. We show that $J_0 = J$.

First we claim that $J_0 \subseteq J$. This is immediate. For, by definition, $J_0$ consists of polynomials of the form $p_0(x)q(x)$, and, by the properties of an ideal, every such polynomial is in $J$.

Next we claim that $J \subseteq J_0$. Choose any polynomial $g(x) \in J$. By Theorem A.1.3, we can write $g(x) = p_0(x)q(x) + r(x)$ where $r(x) = 0$ or $\deg r(x) < \deg p_0(x)$. If $r(x) = 0$ we are done, as then $g(x) = p_0(x)q(x) \in J_0$. Assume $r(x) \neq 0$. Then, by the properties of an ideal, $r(x) = g(x) - p_0(x)q(x) \in J$ ($p_0(x) \in J$, so $p_0(x)(-q(x)) \in J$; also $g(x) \in J$, so $g(x) + p_0(x)(-q(x)) = r(x) \in J$). Now $r(x)$ is a polynomial of some degree $e < d$, $r(x) = a_ex^e + \cdots + a_0$, so $(1/a_e)r(x) = x^e + \cdots + (a_0/a_e) \in J$. But this is a monic polynomial of degree $e < d$, contradicting the minimality of $d$. □

We now have an important application of this theorem.

Definition A.1.10. Let $\{p_1(x), p_2(x), \ldots\}$ be a (possibly infinite) set of nonzero polynomials in $R$. Then a monic polynomial $d(x) \in R$ is a greatest common divisor (gcd) of $\{p_1(x), p_2(x), \ldots\}$ if it has the following properties:
(1) $d(x)$ divides every $p_i(x)$;
(2) if $e(x)$ is any polynomial that divides every $p_i(x)$, then $e(x)$ divides $d(x)$. ⋄
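As a concrete companion to Theorem A.1.3 and Definition A.1.10 (our code, with polynomials over $\mathbb{Q}$ represented as coefficient lists), the division algorithm can be transcribed directly, and repeated division, the Euclidean algorithm, then computes the gcd of two polynomials:

```python
from fractions import Fraction

# Polynomials over Q as coefficient lists [a_0, a_1, ..., a_n] with
# a_n != 0; the empty list [] represents the zero polynomial.

def divmod_poly(f, g):
    """Division algorithm (Theorem A.1.3): f = g*q + r with deg r < deg g."""
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = [Fraction(c) for c in f]
    while r and len(r) >= len(g):
        c = r[-1] / Fraction(g[-1])   # kill the leading coefficient of r
        d = len(r) - len(g)
        q[d] = c
        for i, gi in enumerate(g):
            r[i + d] -= c * Fraction(gi)
        while r and r[-1] == 0:
            r.pop()
    return q, r

def gcd_poly(f, g):
    """The (monic) gcd of f and g, by the Euclidean algorithm."""
    while g:
        _, rem = divmod_poly(f, g)
        f, g = g, rem
    return [Fraction(c) / Fraction(f[-1]) for c in f]  # normalize to monic

# x^3 + 1 = (x + 1)(x^2 - x + 1) + 0:
print(divmod_poly([1, 0, 0, 1], [1, 1]))
# gcd(x^2 - 1, (x - 1)^2) = x - 1:
print(gcd_poly([-1, 0, 1], [1, -2, 1]))
```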
Theorem A.1.11. Let $\{p_1(x), p_2(x), \ldots\}$ be a (possibly infinite) set of nonzero polynomials in $R$. Then $\{p_1(x), p_2(x), \ldots\}$ has a unique gcd $d(x)$. More precisely, $d(x)$ is the generator of the principal ideal
$$J = \Big\{\sum p_i(x)q_i(x) \ \Big|\ q_i(x) \in R,\ \text{only finitely many nonzero}\Big\}.$$

Proof. By Theorem A.1.9, there is a unique monic generator $d(x)$ of this ideal. We must show it has the properties of a gcd. Let $J_0$ be the principal ideal generated by $d(x)$, so that $J_0 = J$.

(1) Consider any polynomial $p_i(x)$. Then $p_i(x) \in J$, so $p_i(x) \in J_0$. That means that $p_i(x) = d(x)q(x)$ for some $q(x)$, so $d(x)$ divides $p_i(x)$.

(2) Since $d(x) \in J$, it can be written as $d(x) = \sum p_i(x)q_i(x)$ for some polynomials $\{q_i(x)\}$. Let $e(x)$ be any polynomial that divides every $p_i(x)$. Then it divides every product $p_i(x)q_i(x)$, and hence their sum $d(x)$.

Thus we have shown that $d(x)$ satisfies both properties of a gcd. It remains to show that it is unique. Suppose $d_1(x)$ is also a gcd. Since $d(x)$ is a gcd of $\{p_1(x), p_2(x), \ldots\}$, and $d_1(x)$ divides each of these polynomials, then $d_1(x)$ divides $d(x)$. Similarly, $d(x)$ divides $d_1(x)$. Thus $d(x)$ and $d_1(x)$ are a pair of monic polynomials each of which divides the other, so they are equal. □

We recall an important definition.

Definition A.1.12. A field $F$ is algebraically closed if every nonconstant polynomial $f(x)$ in $F[x]$ has a root in $F$, i.e., if for every nonconstant polynomial $f(x)$ in $F[x]$ there is an element $r$ of $F$ with $f(r) = 0$. ⋄

We have the following famous and important theorem, which we shall not prove.

Theorem A.1.13 (Fundamental Theorem of Algebra). The field $\mathbb{C}$ of complex numbers is algebraically closed.

Example A.1.14. Let $F$ be an algebraically closed field and let $a \in F$. Then $J = \{p(x) \in R \mid p(a) = 0\}$ is an ideal. It is generated by the polynomial $x - a$. ⋄

Here is one of the most important applications of the gcd.

Corollary A.1.15. Let $F$ be an algebraically closed field and let $\{p_1(x), \ldots, p_n(x)\}$ be a set of polynomials not having a common zero. Then there is a set of polynomials $\{q_1(x), \ldots, q_n(x)\}$ such that
$$p_1(x)q_1(x) + \cdots + p_n(x)q_n(x) = 1.$$

Proof. Since $\{p_1(x), \ldots, p_n(x)\}$ have no common zero, they have no nonconstant polynomial as a common divisor (as $F$ is algebraically closed, a nonconstant common divisor would have a root, which would be a common zero). Hence their gcd is $1$. The corollary then follows from Theorem A.1.11. □

Definition A.1.16. A set of polynomials $\{p_1(x), p_2(x), \ldots\}$ is relatively prime if it has gcd $1$. ⋄

We often phrase this by saying the polynomials $p_1(x), p_2(x), \ldots$ are relatively prime.

Remark A.1.17. Observe that $\{p_1(x), p_2(x), \ldots\}$ is relatively prime if and only if the polynomials $p_i(x)$ have no nonconstant common factor. ⋄

Closely related to the greatest common divisor (gcd) is the least common multiple (lcm).

Definition A.1.18. Let $\{p_1(x), p_2(x), \ldots\}$ be a set of polynomials. A monic polynomial $m(x)$ is a least common multiple (lcm) of $\{p_1(x), p_2(x), \ldots\}$ if it has the properties:
(1) every $p_i(x)$ divides $m(x)$;
(2) if $n(x)$ is any polynomial that is divisible by every $p_i(x)$, then $m(x)$ divides $n(x)$. ⋄
Theorem A.1.19. Let $\{p_1(x), \ldots, p_k(x)\}$ be any finite set of nonzero polynomials. Then $\{p_1(x), \ldots, p_k(x)\}$ has a unique lcm $m(x)$.

Proof. Let $J = \{\text{polynomials } n(x) \mid n(x) \text{ is divisible by every } p_i(x)\}$. It is easy to check that $J$ is an ideal (verify the two properties of an ideal in Definition A.1.5). Also, $J$ is nonzero, as it contains the product $p_1(x) \cdots p_k(x)$. By Theorem A.1.9, $J$ is generated by a monic polynomial $m(x)$. We claim $m(x)$ is the lcm of $\{p_1(x), \ldots, p_k(x)\}$. Certainly $m(x)$ is divisible by every $p_i(x)$, as $m(x)$ is in $J$. Also, $m(x)$ divides every $n(x)$ in $J$ because $J$, as the principal ideal generated by $m(x)$, consists precisely of the multiples of $m(x)$. □

Remark A.1.20. By the proof of Theorem A.1.19, $m(x)$ is the unique monic polynomial of smallest degree in $J$. Thus the lcm of $\{p_1(x), \ldots, p_k(x)\}$ may alternately be described as the unique monic polynomial of lowest degree divisible by every $p_i(x)$. ⋄

Lemma A.1.21. Suppose $p(x)$ divides the product $q(x)r(x)$ and that $p(x)$ and $q(x)$ are relatively prime. Then $p(x)$ divides $r(x)$.

Proof. Since $p(x)$ and $q(x)$ are relatively prime, there are polynomials $f(x)$ and $g(x)$ with $p(x)f(x) + q(x)g(x) = 1$. Then
$$p(x)f(x)r(x) + q(x)g(x)r(x) = r(x).$$
Now $p(x)$ obviously divides the first term $p(x)f(x)r(x)$, and $p(x)$ also divides the second term as, by hypothesis, $p(x)$ divides $q(x)r(x)$; so $p(x)$ divides their sum $r(x)$. □

Corollary A.1.22. Suppose $p(x)$ and $q(x)$ are relatively prime. If $p(x)$ divides $r(x)$ and $q(x)$ divides $r(x)$, then $p(x)q(x)$ divides $r(x)$.

Proof. Since $q(x)$ divides $r(x)$, we may write $r(x) = q(x)s(x)$ for some polynomial $s(x)$. Now $p(x)$ divides $r(x) = q(x)s(x)$ and $p(x)$ and $q(x)$ are relatively prime, so by Lemma A.1.21 we have that $p(x)$ divides $s(x)$, and hence we may write $s(x) = p(x)t(x)$ for some polynomial $t(x)$. Then $r(x) = q(x)s(x) = q(x)p(x)t(x)$ is obviously divisible by $p(x)q(x)$. □

Corollary A.1.23. If $p(x)$ and $q(x)$ are relatively prime monic polynomials, then their lcm is the product $p(x)q(x)$.

Proof. If their lcm is $m(x)$, then on the one hand $m(x)$ divides $p(x)q(x)$, by the definition of the lcm. On the other hand, since both $p(x)$ and $q(x)$ divide $m(x)$, $p(x)q(x)$ divides $m(x)$, by Corollary A.1.22. Thus $p(x)q(x)$ and $m(x)$ are monic polynomials that divide each other, so they are equal. □
A.2 Unique factorization

The most important property that $R = F[x]$ has is that it is a unique factorization domain. In order to prove this we need to do some preliminary work.

Definition A.2.1. (1) The units in $F[x]$ are the nonzero constant polynomials.
(2) A nonzero nonunit polynomial $f(x)$ is irreducible if $f(x) = g(x)h(x)$ with $g(x), h(x) \in F[x]$ implies that one of $g(x)$ and $h(x)$ is a unit.
(3) A nonzero nonunit polynomial $f(x)$ in $F[x]$ is prime if whenever $f(x)$ divides a product $g(x)h(x)$ of two polynomials in $F[x]$, it divides (at least) one of the factors $g(x)$ or $h(x)$.
(4) Two nonzero polynomials $f(x)$ and $g(x)$ in $F[x]$ are associates if $f(x) = ug(x)$ for some unit $u$. ⋄

Lemma A.2.2. A polynomial $f(x)$ in $F[x]$ is prime if and only if it is irreducible.

Proof. First suppose $f(x)$ is prime, and let $f(x) = g(x)h(x)$. Certainly both $g(x)$ and $h(x)$ divide $f(x)$. By the definition of prime, $f(x)$ divides $g(x)$ or $h(x)$. If $f(x)$ divides $g(x)$, then $f(x)$ and $g(x)$ divide each other, and so have the same degree. Thus $h(x)$ is constant, and so is a unit. By the same argument, if $f(x)$ divides $h(x)$, then $g(x)$ is constant, and so a unit.

Suppose $f(x)$ is irreducible, and let $f(x)$ divide $g(x)h(x)$. To show that $f(x)$ is prime, we need to show that $f(x)$ divides one of the factors. By Theorem A.1.11, $f(x)$ and $g(x)$ have a gcd $d(x)$. By definition, $d(x)$ divides both $f(x)$ and $g(x)$; in particular $d(x)$ divides $f(x)$, say $f(x) = d(x)e(x)$. But $f(x)$ is irreducible, so $d(x)$ or $e(x)$ is a unit. If $e(x) = u$ is a unit, then $f(x) = d(x)u$, so $d(x) = f(x)v$ where $uv = 1$. Then, since $d(x)$ divides $g(x)$, $f(x)$ also divides $g(x)$. On the other hand, if $d(x) = u$ is a unit, then $d(x) = 1$, as by definition a gcd is always a monic polynomial. In other words, by Definition A.1.16, $f(x)$ and $g(x)$ are relatively prime. Then, by Lemma A.1.21, $f(x)$ divides $h(x)$. □

Theorem A.2.3 (Unique factorization). Let $f(x) \in F[x]$ be a nonzero polynomial. Then
$$f(x) = ug_1(x) \cdots g_k(x)$$
for some unit $u$ and some set $\{g_1(x), \ldots, g_k(x)\}$ of irreducible polynomials. Furthermore, if also $f(x) = vh_1(x) \cdots h_l(x)$ for some unit $v$ and some set $\{h_1(x), \ldots, h_l(x)\}$ of irreducible polynomials, then $l = k$ and, after possible reordering, $h_i(x)$ and $g_i(x)$ are associates for each $i = 1, \ldots, k$.

Proof. We prove this by complete induction on $n = \deg f(x)$. First we prove the existence of a factorization and then we prove its uniqueness.

For the proof of existence, we proceed by induction. If $n = 0$ then $f(x) = u$ is a unit and there is nothing further to prove. Suppose that we have existence for all polynomials of degree at most $n$ and let $f(x)$ have degree $n + 1$. If $f(x)$ is irreducible, then $f(x) = f(x)$ is a factorization and there is nothing further to prove. Otherwise $f(x) = f_1(x)f_2(x)$ with $\deg f_1(x) \leq n$ and $\deg f_2(x) \leq n$. By the inductive hypothesis $f_1(x) = u_1g_{1,1}(x) \cdots g_{1,s}(x)$ and $f_2(x) = u_2g_{2,1}(x) \cdots g_{2,t}(x)$, so we have the factorization
$$f(x) = u_1u_2\,g_{1,1}(x) \cdots g_{1,s}(x)\,g_{2,1}(x) \cdots g_{2,t}(x),$$
and by induction we are done.

For the proof of uniqueness, we again proceed by induction. If $n = 0$ then $f(x) = u$ is a unit and again there is nothing to prove ($f(x)$ cannot be divisible by any polynomial of positive degree). Suppose that we have uniqueness for all polynomials of degree at most $n$ and let $f(x)$ have degree $n + 1$. Let $f(x) = ug_1(x) \cdots g_k(x) = vh_1(x) \cdots h_l(x)$. If $f(x)$ is irreducible, then by the definition of irreducibility these factorizations must be $f(x) = ug_1(x) = vh_1(x)$, and then $g_1(x)$ and $h_1(x)$ are associates of each other. If $f(x)$ is not irreducible, consider the factor $g_k(x)$. Now $g_k(x)$ divides $f(x)$, so it divides the product $vh_1(x) \cdots h_l(x) = (vh_1(x) \cdots h_{l-1}(x))h_l(x)$. Since $g_k(x)$ is irreducible, by Lemma A.2.2 it is prime, so $g_k(x)$ must divide one of these two factors. If $g_k(x)$ divides $h_l(x)$, then, since $h_l(x)$ is irreducible, we have $h_l(x) = g_k(x)w$ for some unit $w$, in which case $g_k(x)$ and $h_l(x)$ are associates. If not, then $g_k(x)$ divides the other factor $vh_1(x) \cdots h_{l-1}(x) = (vh_1(x) \cdots h_{l-2}(x))h_{l-1}(x)$ and we may repeat the argument. Eventually we find that $g_k(x)$ divides some $h_i(x)$, in which case $g_k(x)$ and $h_i(x)$ are associates. By reordering the factors, we may simply assume that $g_k(x)$ and $h_l(x)$ are associates, $h_l(x) = g_k(x)w$ for some unit $w$. Then $f(x) = ug_1(x) \cdots g_k(x) = vh_1(x) \cdots h_l(x) = (vw)h_1(x) \cdots h_{l-1}(x)g_k(x)$. Let $f_1(x) = f(x)/g_k(x)$. We see that
$$f_1(x) = ug_1(x) \cdots g_{k-1}(x) = (vw)h_1(x) \cdots h_{l-1}(x).$$
Now $\deg f_1(x) \leq n$, so by the inductive hypothesis $k - 1 = l - 1$, i.e., $k = l$, and after reordering $g_i(x)$ and $h_i(x)$ are associates for $i = 1, \ldots, k - 1$. We have already shown this is true for $i = k$ as well, so by induction we are done. □
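Unique factorization can be explored with a computer algebra system; a sketch using sympy (an illustration, not part of the text; factor_list and factor are, to our knowledge, the standard sympy calls for this, and the printed forms are approximate):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Poly(2*x**4 - 2, x, domain='QQ')

# Unit and irreducible factors with multiplicities, over Q:
print(f.factor_list())
# roughly: (2, [(x - 1, 1), (x + 1, 1), (x**2 + 1, 1)])

# Over the Gaussian rationals, x**2 + 1 splits further:
print(sp.factor(2*x**4 - 2, gaussian=True))
# 2*(x - 1)*(x + 1)*(x - I)*(x + I)
```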
There is an important special case of this theorem that is worth observing separately.

Corollary A.2.4. Let $F$ be algebraically closed and let $f(x)$ be a nonzero polynomial in $F[x]$. Then $f(x)$ can be written uniquely as
$$f(x) = u(x - r_1) \cdots (x - r_n)$$
with $u \neq 0$ and $r_1, \ldots, r_n$ elements of $F$.

Proof. If $F$ is algebraically closed, every irreducible polynomial is linear, of the form $g(x) = v(x - r)$, and then this result follows immediately from Theorem A.2.3. (This special case is easy to prove directly, by induction on the degree of $f(x)$. We leave the details to the reader.) □

Remark A.2.5. By Theorem A.1.13, Corollary A.2.4 applies in particular when $F = \mathbb{C}$. ⋄

A.3 Polynomials as expressions and polynomials as functions

Let $p(x) \in F[x]$ be a polynomial. There are two ways to regard $p(x)$: as an expression $p(x) = a_0 + a_1x + \cdots + a_nx^n$, and as a function $p \colon F \to F$ given by $c \mapsto p(c)$. We have at times, when dealing with the case $F = \mathbb{R}$ or $\mathbb{C}$, conflated these two approaches. In this section we show there is no harm in doing so. We show that if $F$ is an infinite field, then two polynomials are equal as expressions if and only if they are equal as functions.

Lemma A.3.1. Let $p(x) \in F[x]$ be a polynomial and let $c \in F$. Then $p(x) = (x - c)q(x) + p(c)$ for some polynomial $q(x)$.

Proof. By Theorem A.1.3, $p(x) = (x - c)q(x) + a$ for some $a \in F$. Now substitute $x = c$ to obtain $a = p(c)$. □

Lemma A.3.2. Let $p(x)$ be a nonzero polynomial of degree $n$. Then $p(x)$ has at most $n$ roots, counting multiplicities, in $F$. In particular, $p(x)$ has at most $n$ distinct roots in $F$.

Proof. We proceed by induction on $n$. The lemma is clearly true for $n = 0$. Suppose it is true for all polynomials of degree $n$, and let $p(x)$ be a nonzero polynomial of degree $n + 1$. If $p(x)$ does not have a root in $F$, we are done. Otherwise let $r$ be a root of $p(x)$. By Lemma A.3.1, $p(x) = (x - r)q(x)$, where $q(x)$ has degree $n$. By the inductive hypothesis, $q(x)$ has at most $n$ roots in $F$, so $p(x)$ has at most $n + 1$ roots in $F$, and by induction we are done. □

Corollary A.3.3. Let $p(x)$ be a polynomial of degree at most $n$. If $p(x)$ has more than $n$ roots, then $p(x) = 0$ (the $0$ polynomial).

Corollary A.3.4. (1) Let $f(x)$ and $g(x)$ be polynomials of degree at most $n$. If $f(c) = g(c)$ for more than $n$ values of $c$, then $f(x) = g(x)$.
(2) Let $F$ be an infinite field. If $f(c) = g(c)$ for every $c \in F$, then $f(x) = g(x)$.

Proof. Apply Corollary A.3.3 to the polynomial $p(x) = f(x) - g(x)$. □

Remark A.3.5. Corollary A.3.4(2) is false if $F$ is a finite field. For example, suppose that $F$ has $n$ elements $c_1, \ldots, c_n$. Then $f(x) = (x - c_1)(x - c_2) \cdots (x - c_n)$ has $f(c) = 0$ for every $c \in F$, but $f(x) \neq 0$. ⋄
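Remark A.3.5 is easy to see concretely; a sketch (ours) over the field $\mathbb{F}_2 = \{0, 1\}$, where the nonzero polynomial $x^2 + x = x(x - 1)$ vanishes at every point:

```python
p = 2
f = lambda c: (c * c + c) % p          # the function c -> c^2 + c in F_2
print([f(c) for c in range(p)])        # [0, 0]: f is the zero *function*
# yet x^2 + x has nonzero coefficients, so it is not the zero *polynomial*
```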
APPENDIX B. Modules over principal ideal domains

In this appendix, for the benefit of the more algebraically knowledgeable reader, we show how to derive canonical forms for linear transformations quickly and easily from the basic structure theorems for modules over a principal ideal domain (PID).

B.1 Definitions and structure theorems

We begin by recalling the definition of a module.

Definition B.1.1. Let $R$ be a commutative ring. An $R$-module is a set $M$ with a pair of operations satisfying the conditions of Definition 1.1.1, except that the scalars are assumed to be elements of the ring $R$. ⋄

One of the most basic differences between vector spaces (where the scalars are elements of a field) and modules (where they are elements of a ring) is the possibility that modules may have torsion.

Definition B.1.2. Let $M$ be an $R$-module. An element $m \neq 0$ of $M$ is a torsion element if $rm = 0$ for some $r \in R$, $r \neq 0$. If $m$ is any element of $M$, its annihilator ideal $\operatorname{Ann}(m)$ is the ideal of $R$ given by
$$\operatorname{Ann}(m) = \{r \in R \mid rm = 0\}.$$
(Thus $\operatorname{Ann}(0) = R$, and $m \neq 0$ is a torsion element of $M$ if and only if $\operatorname{Ann}(m) \neq \{0\}$.) If every nonzero element of $M$ is a torsion element, then $M$ is a torsion $R$-module. ⋄

Remark B.1.3. Here is a very special case. Let $M = R$ and regard $M$ as an $R$-module. Then we have the dual module $M^*$ defined analogously to Definition 1.6.1, and we can identify $M^*$ with $R$ as follows: let $f \in M^*$, so $f \colon M \to R$; then we let $f \mapsto f(1)$. (Otherwise said, any $f \in M^*$ is given by multiplication by some fixed element of $R$, $f(r) = r_0r$, and then $f \mapsto r_0$.) For $s_0 \in R$ consider the principal ideal $J = s_0R = \{s_0r \mid r \in R\}$. Let $N = J$ and regard $N$ as a submodule of $M$. Then, under this identification, $\operatorname{Ann}(s_0) = \operatorname{Ann}^*(N)$, where $\operatorname{Ann}^*(N)$ is the annihilator as defined in Definition 1.6.10. ⋄

Here is the basic structure theorem. It appears in two forms.

Theorem B.1.4. Let $R$ be a principal ideal domain (PID). Let $M$ be a finitely generated torsion $R$-module. Then there is an isomorphism
$$M \cong M_1 \oplus \cdots \oplus M_k,$$
where each $M_i$ is a nonzero $R$-module generated by a single element $w_i$, and $\operatorname{Ann}(w_1) \subseteq \cdots \subseteq \operatorname{Ann}(w_k)$. The integer $k$ and the set of ideals $\{\operatorname{Ann}(w_1), \ldots, \operatorname{Ann}(w_k)\}$ are well-defined.

Theorem B.1.5. Let $R$ be a principal ideal domain (PID). Let $M$ be a finitely generated torsion $R$-module. Then there is an isomorphism
$$M \cong N_1 \oplus \cdots \oplus N_l,$$
where each $N_i$ is a nonzero $R$-module generated by a single element $x_i$, and $\operatorname{Ann}(x_i) = p_i^{e_i}R$ is the principal ideal of $R$ generated by the element $p_i^{e_i}$, where $p_i \in R$ is a prime and $e_i$ is a positive integer. The integer $l$ and the set of ideals $\{p_1^{e_1}R, \ldots, p_l^{e_l}R\}$ are well-defined.

Remark B.1.6. In the notation of Theorem B.1.4, if $\operatorname{Ann}(w_i)$ is the principal ideal generated by the element $r_i$ of $R$, the condition $\operatorname{Ann}(w_1) \subseteq \cdots \subseteq \operatorname{Ann}(w_k)$ is that $r_i$ is divisible by $r_{i+1}$ for each $i = 1, \ldots, k - 1$. ⋄

B.2 Derivation of canonical forms

We now use Theorem B.1.4 to derive rational canonical form, and Theorem B.1.5 to derive Jordan canonical form. We assume throughout that $V$ is a finite-dimensional $F$-vector space and that $T \colon V \to V$ is a linear transformation.

We let $R$ be the polynomial ring $R = F[x]$ and recall that $R$ is a PID. We regard $V$ as an $R$-module by defining
$$p(x)\cdot v = p(T)(v) \quad\text{for any } p(x) \in R \text{ and any } v \in V.$$

Lemma B.2.1. $V$ is a finitely generated torsion $R$-module.

Proof. $V$ is a finite-dimensional $F$-vector space, so it has a finite basis $B = \{v_1, \ldots, v_n\}$. Then the finite set $B$ generates $V$ as an $F$-vector space, so certainly generates $V$ as an $R$-module. To prove that every $v \neq 0$ is a torsion element, we need to show that $p(T)(v) = 0$ for some nonzero polynomial $p(x) \in R$. We proved this, for every $v \in V$, in the course of proving Theorem 5.1.1 (or, in matrix terms, Lemma 4.1.18). □

To continue, observe that $\operatorname{Ann}(v)$, as defined in Definition B.1.2, is the principal ideal of $R$ generated by the monic polynomial $m_{T,v}(x)$ of Theorem 5.1.1; we called this polynomial the $T$-annihilator of $v$ in Definition 5.1.2. We also observe that a subspace $W$ of $V$ is an $R$-submodule of $V$ if and only if it is $T$-invariant.
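The $T$-annihilator $m_{T,v}(x)$ just mentioned can be computed by finding the first linear dependence among $v, Tv, T^2v, \ldots$. A sketch (our code; the example matrix is our choice):

```python
import sympy as sp

T = sp.Matrix([[0, 1], [-1, 0]])   # rotation by 90 degrees
v = sp.Matrix([1, 0])

vecs, w = [v], T * v
while True:
    M = sp.Matrix.hstack(*vecs, w)
    null = M.nullspace()           # first dependence among v, Tv, T^2 v, ...
    if null:
        break
    vecs.append(w)
    w = T * w

coeffs = null[0] / null[0][-1]     # normalize so the relation is monic
print(list(coeffs))                # [1, 0, 1]: m_{T,v}(x) = x^2 + 1
```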
Theorem B.2.2 (Rational canonical form). Let $V$ be a finite-dimensional vector space and let $T \colon V \to V$ be a linear transformation. Then $V$ has a basis $B$ such that $[T]_B = M$ is in rational canonical form. Furthermore, $M$ is unique.

Proof. We have simply restated (verbatim) Theorem 5.5.4(1). This is the matrix translation of Theorem 5.5.2 about the existence of rational canonical $T$-generating sets. Examining the definition of a rational canonical $T$-generating set in Definition 5.5.1, we see that the elements $\{w_i\}$ of that definition are exactly the elements $\{w_i\}$ of Theorem B.1.4, and the ideals $\operatorname{Ann}(w_i)$ are the principal ideals of $R$ generated by the polynomials $m_{T,w_i}(x)$. □

Corollary B.2.3. In the notation of Theorem B.1.4, let $f_i(x) = m_{T,w_i}(x)$. Then
(1) the minimum polynomial $m_T(x) = f_1(x)$;
(2) the characteristic polynomial $c_T(x) = f_1(x) \cdots f_k(x)$;
(3) $m_T(x)$ divides $c_T(x)$;
(4) $m_T(x)$ and $c_T(x)$ have the same irreducible factors;
(5) (Cayley-Hamilton Theorem) $c_T(T) = 0$.

Proof. For parts (1) and (2), see Corollary 5.5.6. Parts (3) and (4) are then immediate. For (5), $m_T(T) = 0$ and $m_T(x)$ divides $c_T(x)$, so $c_T(T) = 0$. □

Remark B.2.4. We have restated this result here for convenience, but the full strength of Theorem B.2.2 is not necessary to obtain parts (2), (4), and (5) of Corollary B.2.3; see Theorem 5.3.1 and Corollary 5.3.4. ⋄

Theorem B.2.5 (Jordan canonical form). Let $F$ be an algebraically closed field and let $V$ be a finite-dimensional $F$-vector space. Let $T \colon V \to V$ be a linear transformation. Then $V$ has a basis $B$ with $[T]_B = J$ a matrix in Jordan canonical form. $J$ is unique up to the order of the blocks.

Proof. We have simply restated (verbatim) Theorem 5.6.5(1). To prove this, apply Theorem B.1.5 to $V$ to obtain a decomposition $V = N_1 \oplus \cdots \oplus N_l$ as $R$-modules, or, equivalently, a $T$-invariant direct sum decomposition of $V$. Since $F$ is algebraically closed, each prime in $R$ is a linear polynomial. Now apply Lemma 5.6.1 and Corollary 5.6.2 to each submodule $N_i$. □

Remark B.2.6. This proof goes through verbatim to establish Theorem 5.6.6, the existence and essential uniqueness of Jordan canonical form, under the weaker hypothesis that the characteristic polynomial $c_T(x)$ factors into a product of linear factors. Also, replacing Lemma 5.6.1 by Lemma 5.6.8 and Corollary 5.6.2 by Corollary 5.6.10 gives Theorem 5.6.13, the existence and essential uniqueness of generalized Jordan canonical form. ⋄

Bibliography

There are dozens, if not hundreds, of elementary linear algebra texts, and we leave it to the reader to choose her or his favorite. Other than that, we have:

[1] Kenneth M. Hoffman and Ray A. Kunze, Linear Algebra, second edition, Prentice Hall, 1971.
[2] Paul R. Halmos, Finite Dimensional Vector Spaces, second edition, Springer-Verlag, 1987.
[3] William A. Adkins and Steven H. Weintraub, Algebra: An Approach via Module Theory, Springer-Verlag, 1999.
[4] Steven H. Weintraub, Jordan Canonical Form: Theory and Practice, Morgan and Claypool, 2009.

[1] is an introductory text that is on a distinctly higher level than most, and is highly recommended. [2] is a text by a recognized master of mathematical exposition, and has become a classic. [3] is a book on a higher level than this one, which proves the structure theorems for modules over a PID and uses them to obtain canonical forms for linear transformations (compare the approach in Appendix B). [4] is a short book devoted entirely to Jordan canonical form.
Bibliography

There are dozens, if not hundreds, of elementary linear algebra texts, and we leave it to the reader to choose her or his favorite. Other than that, we have:

[1] Kenneth M. Hoffman and Ray A. Kunze, Linear Algebra, second edition, Prentice Hall, 1971.
[2] Paul R. Halmos, Finite Dimensional Vector Spaces, second edition, Springer-Verlag, 1987.
[3] William A. Adkins and Steven H. Weintraub, Algebra: An Approach via Module Theory, Springer-Verlag, 1999.
[4] Steven H. Weintraub, Jordan Canonical Form: Theory and Practice, Morgan and Claypool, 2009.

[1] is an introductory text that is on a distinctly higher level than most, and is highly recommended. [2] is a text by a recognized master of mathematical exposition, and has become a classic. [3] is a book on a higher level than this one; it proves the structure theorems for modules over a PID and uses them to obtain canonical forms for linear transformations (compare the approach in Appendix B). [4] is a short book devoted entirely to Jordan canonical form. The proof there is a bit more elementary, avoiding use of properties of polynomials. While the algorithm for finding a Jordan basis and the Jordan canonical form of a linear transformation is more or less canonical, our exposition of it here follows the exposition in [4]. In particular, the eigenstructure picture (ESP) of a linear transformation was first introduced there.
About the Author

Steven H. Weintraub is Professor of Mathematics at Lehigh University. He was born in New York, received his undergraduate and graduate degrees from Princeton University, and was on the permanent faculty at Louisiana State University for many years before moving to Lehigh in 2001. He has had visiting appointments at UCLA, Rutgers, Yale, Oxford, Göttingen, Bayreuth, and Hannover, and has lectured at universities and conferences around the world. He is the author of over 50 research papers, and this is his ninth book. Prof. Weintraub has served on the Executive Committee of the Eastern Pennsylvania-Delaware section of the MAA, and has extensive service with the AMS, including currently serving as the Associate Secretary for the AMS Eastern Section.