Solutions Manual for Applied Multivariate Statistical Analysis, 6th edition (Johnson and Wichern)

Preface

This solution manual was prepared as an aid for instructors who will benefit by
having solutions available. In addition to providing detailed answers to most of the
problems in the book, this manual can help the instructor determine which of the
problems are most appropriate for the class.
The vast majority of the problems have been solved with the help of available
computer software (SAS, S-Plus, Minitab). A few of the problems have been solved with
hand calculators. The reader should keep in mind that round-off errors can occur,
particularly in those problems involving long chains of arithmetic calculations.

We would like to take this opportunity to acknowledge the contribution of many
students, whose homework formed the basis for many of the solutions. In particular, we
would like to thank Jorge Achcar, Sebastiao Amorim, W. K. Cheang, S. S. Cho, S. G.
Chow, Charles Fleming, Stu Janis, Richard Jones, Tim Kramer, Dennis Murphy, Rich
Raubertas, David Steinberg, T. J. Tien, Steve Verril, Paul Whitney and Mike Wincek.
Dianne Hall compiled most of the material needed to make this current solutions manual
consistent with the sixth edition of the book.
The solutions are numbered in the same manner as the exercises in the book.
Thus, for example, 9.6 refers to the 6th exercise of chapter 9.
We hope this manual is a useful aid for adopters of our Applied Multivariate
Statistical Analysis, 6th edition, text. The authors have taken a little more active role in
the preparation of the current solutions manual. However, it is inevitable that an error or
two has slipped through, so please bring remaining errors to our attention. Also,
comments and suggestions are always welcome.
Richard A. Johnson
Dean W. Wichern


Chapter 1
1.1

$\bar{x}_1 = 4.29$, $\bar{x}_2 = 6.29$, $s_{11} = 4.20$, $s_{22} = 3.56$, $s_{12} = 3.70$
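A minimal Python sketch of the 1.1 arithmetic, using divisor $n$ as the manual does. It assumes the seven pairs listed in Exercise 1.1 of the text are x1 = 3, 4, 2, 6, 8, 2, 5 and x2 = 5, 5.5, 4, 7, 10, 5, 7.5 (check them against your copy); the helper name `summary_stats` is ours.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def summary_stats(x1, x2):
    """Return (xbar1, xbar2, s11, s22, s12) computed with divisor n."""
    n = len(x1)
    m1, m2 = mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1) / n
    s22 = sum((b - m2) ** 2 for b in x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    return m1, m2, s11, s22, s12


# Data assumed to be the seven (x1, x2) pairs of Exercise 1.1.
x1 = [3, 4, 2, 6, 8, 2, 5]
x2 = [5, 5.5, 4, 7, 10, 5, 7.5]

for name, value in zip(("xbar1", "xbar2", "s11", "s22", "s12"), summary_stats(x1, x2)):
    print(f"{name} = {value:.2f}")
```

Replacing `/ n` with `/ (n - 1)` gives the unbiased sample variances used elsewhere in the text.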

1.2 a) [Figure: scatter plot of x2 versus x1 with marginal dot plots]

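b) $s_{12}$ is negative.

c) $\bar{x}_1 = 5.20$, $\bar{x}_2 = 12.48$, $\sqrt{s_{11}} = 3.09$, $\sqrt{s_{22}} = 5.27$, $s_{12} = -15.94$, $r_{12} = -.98$.
Large $x_1$ occurs with small $x_2$ and vice versa.

d)
$$\bar{\mathbf{x}} = \begin{pmatrix} 5.20 \\ 12.48 \end{pmatrix}, \qquad
\mathbf{S}_n = \begin{pmatrix} 3.09^2 & -15.94 \\ -15.94 & 5.27^2 \end{pmatrix}, \qquad
\mathbf{R} = \begin{pmatrix} 1 & -.98 \\ -.98 & 1 \end{pmatrix}$$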

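1.3
$$\bar{\mathbf{x}} = \begin{pmatrix} 6 \\ 8 \\ 2 \end{pmatrix}, \qquad
\mathbf{S}_n = \begin{pmatrix} 6 & 4 & -1.4 \\ 4 & 8 & 1.2 \\ -1.4 & 1.2 & 2 \end{pmatrix}, \qquad
\mathbf{R} = \begin{pmatrix} 1 & .577 & -.404 \\ .577 & 1 & .300 \\ -.404 & .300 & 1 \end{pmatrix}$$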

1.4 a) There is a positive correlation between $x_1$ and $x_2$. Since the sample size is
small, it is hard to be definitive about the nature of the marginal distributions.
However, the marginal distribution of $x_1$ appears to be skewed to the right.
The marginal distribution of $x_2$ seems reasonably symmetric.

[Figure: scatter plot of x2 versus x1 with marginal dot plots]

b) $\bar{x}_1 = 155.60$, $\bar{x}_2 = 14.70$, $\sqrt{s_{11}} = 82.03$, $\sqrt{s_{22}} = 4.85$, $s_{12} = 273.26$, $r_{12} = .69$.
Large profits ($x_2$) tend to be associated with large sales ($x_1$); small profits
with small sales.
1.5 a) There is a negative correlation between $x_2$ and $x_3$ and a negative correlation
between $x_1$ and $x_3$. The marginal distribution of $x_1$ appears to be skewed to
the right. The marginal distribution of $x_2$ seems reasonably symmetric.
The marginal distribution of $x_3$ also appears to be skewed to the right.

[Figure: scatter plot of x3 versus x2 with marginal dot plots]

[Figure: scatter plot of x3 versus x1 with marginal dot plots]

1.5 b)
$$\bar{\mathbf{x}} = \begin{pmatrix} 155.60 \\ 14.70 \\ 710.91 \end{pmatrix}, \qquad
\mathbf{S}_n = \begin{pmatrix} 82.03^2 & 273.26 & -32018.36 \\ 273.26 & 4.85^2 & -948.45 \\ -32018.36 & -948.45 & 461.90^2 \end{pmatrix}, \qquad
\mathbf{R} = \begin{pmatrix} 1 & .69 & -.85 \\ .69 & 1 & -.42 \\ -.85 & -.42 & 1 \end{pmatrix}$$

1.6 a) [Figure: histograms of x1 through x7]

b)
$$\bar{\mathbf{x}} = (7.500,\ 73.857,\ 4.548,\ 2.191,\ 10.048,\ 9.405,\ 3.095)'$$
$$\mathbf{S}_n = \begin{pmatrix}
2.440 & -2.714 & -.369 & -.452 & -.571 & -2.179 & .167 \\
 & 293.360 & 3.816 & -1.354 & 6.602 & 30.058 & .609 \\
 & & 1.486 & .658 & 2.260 & 2.755 & .138 \\
 & & & 1.154 & 1.062 & -.791 & .172 \\
 & & & & 11.093 & 3.052 & 1.019 \\
 & \text{(symmetric)} & & & & 30.241 & .580 \\
 & & & & & & .467
\end{pmatrix}$$

The pair $x_3, x_4$ exhibits a small to moderate positive correlation, and so does the
pair $x_3, x_5$. Most of the entries are small.

1.7 a) [Figure: scatter plot of x2 versus x1 (variable space)]
b) [Figure: the data plotted in item space]

1.8 Using (1-12), $d(P,Q) = \sqrt{(-1-1)^2 + (-1-0)^2} = \sqrt{5} = 2.236$.

Using (1-20),
$$d(P,Q) = \sqrt{\tfrac{1}{3}(-1-1)^2 + 2\left(\tfrac{1}{9}\right)(-1-1)(-1-0) + \tfrac{4}{27}(-1-0)^2} = \sqrt{52/27} = 1.388.$$
Using (1-20), the locus of points a constant squared distance 1 from $Q = (1, 0)$
is given by the expression $\tfrac{1}{3}(x_1-1)^2 + \tfrac{2}{9}(x_1-1)x_2 + \tfrac{4}{27}x_2^2 = 1$. To sketch the
locus of points defined by this equation, we first obtain the coordinates of
some points satisfying the equation:
(-1, 1.5), (0, -1.5), (0, 3), (1, -2.6), (1, 2.6), (2, -3), (2, 1.5), (3, -1.5)
The resulting ellipse is:

[Figure: ellipse through these points in the (x1, x2) plane]
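The two distances in 1.8, and the claim that each tabulated point lies on the ellipse, can be checked with a short sketch. It assumes the weights $a_{11} = 1/3$, $a_{12} = 1/9$, $a_{22} = 4/27$ in (1-20), the values consistent with $d(P,Q) = 1.388$; `statistical_distance` is a name we introduce for illustration.

```python
import math


def statistical_distance(p, q, a11, a12, a22):
    """Distance (1-20): sqrt(a11*d1^2 + 2*a12*d1*d2 + a22*d2^2) with d = p - q."""
    d1, d2 = p[0] - q[0], p[1] - q[1]
    return math.sqrt(a11 * d1 ** 2 + 2 * a12 * d1 * d2 + a22 * d2 ** 2)


A11, A12, A22 = 1 / 3, 1 / 9, 4 / 27   # assumed weights for Exercise 1.8
P, Q = (-1, -1), (1, 0)

print("Euclidean (1-12):", round(math.dist(P, Q), 3))
print("statistical (1-20):", round(statistical_distance(P, Q, A11, A12, A22), 3))

# Every tabulated point should sit at squared statistical distance (about) 1 from Q.
locus = [(-1, 1.5), (0, -1.5), (0, 3), (1, -2.6), (1, 2.6), (2, -3), (2, 1.5), (3, -1.5)]
for point in locus:
    print(point, round(statistical_distance(point, Q, A11, A12, A22) ** 2, 3))
```

The points at $x_1 = 1$ come out slightly above 1 because $x_2 = \pm 2.6$ is itself rounded.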

1.9 a) $s_{11} = 20.48$, $s_{12} = 9.09$, $s_{22} = 6.19$

[Figure: scatter plot of x2 versus x1]

1.10 a) This equation is of the form (1-19) with $a_{11} = 1$, $a_{12} = \tfrac{1}{2}$, and $a_{22} = 4$.
Therefore this is a distance for correlated variables if it is non-negative
for all values of $x_1, x_2$. But this follows easily if we write
$$x_1^2 + 4x_2^2 + x_1x_2 = \left(x_1 + \tfrac{1}{2}x_2\right)^2 + \tfrac{15}{4}x_2^2 \ge 0.$$

b) In order for this expression to be a distance, it has to be non-negative for
all values $x_1, x_2$. Since, for $(x_1, x_2) = (0, 1)$, we have $x_1^2 - 2x_2^2 = -2$, we
conclude that this is not a valid distance function.

1.11
$$d(P,Q) = \sqrt{4(x_1-y_1)^2 + 2(-1)(x_1-y_1)(x_2-y_2) + (x_2-y_2)^2}
= \sqrt{4(y_1-x_1)^2 + 2(-1)(y_1-x_1)(y_2-x_2) + (y_2-x_2)^2} = d(Q,P).$$
Next,
$$4(x_1-y_1)^2 - 2(x_1-y_1)(x_2-y_2) + (x_2-y_2)^2 = (x_1 - y_1 - x_2 + y_2)^2 + 3(x_1-y_1)^2 \ge 0,$$
so $d(P,Q) \ge 0$.
The second term in this last expression is zero only if $x_1 = y_1$, and
then the first is zero only if $x_2 = y_2$.

1.12 a) If $P = (-3, 4)$ then $d(Q,P) = \max(|-3|, |4|) = 4$.
b) The locus of points whose squared distance from $(0, 0)$ is 1 is the square with vertices $(\pm 1, \pm 1)$:

[Figure: square with sides on x1 = ±1 and x2 = ±1]

c) The generalization to $p$ dimensions is given by $d(Q,P) = \max(|x_1|, |x_2|, \ldots, |x_p|)$.

1.13 Place the facility at C-3.

1.14 a) [Figure: scatter plot of x4 versus x2]

Strong positive correlation. No obvious "unusual" observations.

b) Multiple-sclerosis group.
$$\bar{\mathbf{x}} = \begin{pmatrix} 42.07 \\ 179.64 \\ 12.31 \\ 236.62 \\ 13.16 \end{pmatrix}, \qquad
\mathbf{S}_n = \begin{pmatrix}
116.91 & 61.78 & -20.10 & 61.13 & -27.65 \\
 & 812.72 & -218.35 & 865.32 & 90.48 \\
 & & 305.94 & 221.93 & 286.60 \\
 & \text{(symmetric)} & & 1146.38 & 82.53 \\
 & & & & 337.80
\end{pmatrix}$$
$$\mathbf{R} = \begin{pmatrix}
1 & .200 & -.106 & .167 & -.139 \\
 & 1 & -.438 & .896 & .173 \\
 & & 1 & .375 & .892 \\
 & \text{(symmetric)} & & 1 & .133 \\
 & & & & 1
\end{pmatrix}$$

Non-multiple-sclerosis group.
$$\bar{\mathbf{x}} = \begin{pmatrix} 37.99 \\ 147.21 \\ 1.56 \\ 195.57 \\ 1.62 \end{pmatrix}, \qquad
\mathbf{S}_n = \begin{pmatrix}
273.61 & 95.08 & 5.28 & 101.67 & 3.20 \\
 & 110.13 & 1.84 & 103.28 & 2.15 \\
 & & 1.78 & 2.22 & .49 \\
 & \text{(symmetric)} & & 183.04 & 2.35 \\
 & & & & 2.32
\end{pmatrix}$$
$$\mathbf{R} = \begin{pmatrix}
1 & .548 & .239 & .454 & .127 \\
 & 1 & .132 & .727 & .134 \\
 & & 1 & .123 & .244 \\
 & \text{(symmetric)} & & 1 & .114 \\
 & & & & 1
\end{pmatrix}$$

1.15

a) [Figure: scatter plot of x3 versus activity x2]

b)
$$\bar{\mathbf{x}} = (3.54,\ 1.81,\ 2.14,\ 2.21,\ 2.58,\ 1.27)'$$
$\mathbf{S}_n$: [matrix not legible in this copy]
$$\mathbf{R} = \begin{pmatrix}
1 & .551 & .362 & .386 & .537 & .077 \\
.551 & 1 & .187 & .455 & .535 & -.035 \\
.362 & .187 & 1 & .346 & .496 & -.010 \\
.386 & .455 & .346 & 1 & .704 & .156 \\
.537 & .535 & .496 & .704 & 1 & .071 \\
.077 & -.035 & -.010 & .156 & .071 & 1
\end{pmatrix}$$

The largest correlation is between appetite and amount of food eaten.
Both activity and appetite have moderate positive correlations with
symptoms. Also, appetite and activity have a moderate positive
correlation.

1.16

There are significant positive correlations among all variables. The lowest correlation is
0.44020, between dominant humerus and ulna, and the highest correlation is 0.89365, between
dominant humerus and humerus.
$$\bar{\mathbf{x}} = (0.8438,\ 0.8183,\ 1.7927,\ 1.7348,\ 0.7044,\ 0.6938)'$$
$$\mathbf{S}_n = \begin{pmatrix}
0.0124815 & 0.0099633 & 0.0214560 & 0.0192822 & 0.0087559 & 0.0076395 \\
0.0099633 & 0.0109612 & 0.0177938 & 0.0202555 & 0.0081886 & 0.0085522 \\
0.0214560 & 0.0177938 & 0.0771429 & 0.0641052 & 0.0161635 & 0.0123332 \\
0.0192822 & 0.0202555 & 0.0641052 & 0.0667051 & 0.0170261 & 0.0161219 \\
0.0087559 & 0.0081886 & 0.0161635 & 0.0170261 & 0.0111057 & 0.0077483 \\
0.0076395 & 0.0085522 & 0.0123332 & 0.0161219 & 0.0077483 & 0.0101752
\end{pmatrix}$$
$$\mathbf{R} = \begin{pmatrix}
1.00000 & 0.85181 & 0.69146 & 0.66826 & 0.74369 & 0.67789 \\
0.85181 & 1.00000 & 0.61192 & 0.74909 & 0.74218 & 0.80980 \\
0.69146 & 0.61192 & 1.00000 & 0.89365 & 0.55222 & 0.44020 \\
0.66826 & 0.74909 & 0.89365 & 1.00000 & 0.62555 & 0.61882 \\
0.74369 & 0.74218 & 0.55222 & 0.62555 & 1.00000 & 0.72889 \\
0.67789 & 0.80980 & 0.44020 & 0.61882 & 0.72889 & 1.00000
\end{pmatrix}$$
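The correlation matrix in 1.16 is obtained from $\mathbf{S}_n$ by $\mathbf{R} = \mathbf{D}^{-1/2}\mathbf{S}_n\mathbf{D}^{-1/2}$, where $\mathbf{D}$ holds the diagonal of $\mathbf{S}_n$. A minimal sketch of that step (the function name is ours) applied to the $\mathbf{S}_n$ above; its output should agree with $\mathbf{R}$ up to rounding.

```python
import math


def correlation_from_covariance(S):
    """Return R = D^{-1/2} S D^{-1/2}, i.e. r_ik = s_ik / sqrt(s_ii * s_kk)."""
    p = len(S)
    sd = [math.sqrt(S[i][i]) for i in range(p)]
    return [[S[i][k] / (sd[i] * sd[k]) for k in range(p)] for i in range(p)]


# S_n for 1.16: dominant radius, radius, dominant humerus, humerus, dominant ulna, ulna.
S_n = [
    [0.0124815, 0.0099633, 0.0214560, 0.0192822, 0.0087559, 0.0076395],
    [0.0099633, 0.0109612, 0.0177938, 0.0202555, 0.0081886, 0.0085522],
    [0.0214560, 0.0177938, 0.0771429, 0.0641052, 0.0161635, 0.0123332],
    [0.0192822, 0.0202555, 0.0641052, 0.0667051, 0.0170261, 0.0161219],
    [0.0087559, 0.0081886, 0.0161635, 0.0170261, 0.0111057, 0.0077483],
    [0.0076395, 0.0085522, 0.0123332, 0.0161219, 0.0077483, 0.0101752],
]

R = correlation_from_covariance(S_n)
for row in R:
    print(" ".join(f"{r:.5f}" for r in row))
```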

1.17

There are large positive correlations among all variables. Particularly large
correlations occur between running events that are "similar," for example,
the 100m and 200m dashes, and the 1500m and 3000m runs.
$$\bar{\mathbf{x}} = (11.36,\ 23.12,\ 51.99,\ 2.02,\ 4.19,\ 9.08,\ 153.62)'$$
$$\mathbf{S}_n = \begin{pmatrix}
.152 & .338 & .875 & .027 & .082 & .230 & 4.254 \\
.338 & .847 & 2.152 & .065 & .199 & .544 & 10.193 \\
.875 & 2.152 & 6.621 & .178 & .500 & 1.400 & 28.368 \\
.027 & .065 & .178 & .007 & .021 & .060 & 1.197 \\
.082 & .199 & .500 & .021 & .073 & .212 & 3.474 \\
.230 & .544 & 1.400 & .060 & .212 & .652 & 10.508 \\
4.254 & 10.193 & 28.368 & 1.197 & 3.474 & 10.508 & 265.265
\end{pmatrix}$$
$$\mathbf{R} = \begin{pmatrix}
1.000 & .941 & .871 & .809 & .782 & .728 & .669 \\
.941 & 1.000 & .909 & .820 & .801 & .732 & .680 \\
.871 & .909 & 1.000 & .806 & .720 & .674 & .677 \\
.809 & .820 & .806 & 1.000 & .905 & .867 & .854 \\
.782 & .801 & .720 & .905 & 1.000 & .973 & .791 \\
.728 & .732 & .674 & .867 & .973 & 1.000 & .799 \\
.669 & .680 & .677 & .854 & .791 & .799 & 1.000
\end{pmatrix}$$

1.18

There are positive correlations among all variables. Notice that the correlations
decrease as the distances between pairs of running events increase (see the first
column of the correlation matrix $\mathbf{R}$). The correlation matrix for running events
measured in meters per second is very similar to the correlation matrix for the
running event times given in Exercise 1.17.
$$\bar{\mathbf{x}} = (8.81,\ 8.66,\ 7.71,\ 6.60,\ 5.99,\ 5.54,\ 4.62)'$$
$$\mathbf{S}_n = \begin{pmatrix}
.091 & .096 & .097 & .065 & .082 & .092 & .081 \\
.096 & .115 & .114 & .075 & .096 & .105 & .093 \\
.097 & .114 & .138 & .081 & .095 & .108 & .102 \\
.065 & .075 & .081 & .074 & .086 & .100 & .094 \\
.082 & .096 & .095 & .086 & .124 & .144 & .118 \\
.092 & .105 & .108 & .100 & .144 & .177 & .147 \\
.081 & .093 & .102 & .094 & .118 & .147 & .167
\end{pmatrix}$$
$$\mathbf{R} = \begin{pmatrix}
1.000 & .938 & .866 & .797 & .776 & .729 & .660 \\
.938 & 1.000 & .906 & .816 & .806 & .741 & .675 \\
.866 & .906 & 1.000 & .804 & .731 & .694 & .672 \\
.797 & .816 & .804 & 1.000 & .906 & .875 & .852 \\
.776 & .806 & .731 & .906 & 1.000 & .972 & .824 \\
.729 & .741 & .694 & .875 & .972 & 1.000 & .854 \\
.660 & .675 & .672 & .852 & .824 & .854 & 1.000
\end{pmatrix}$$

1.19

(a) [Figure: scatter plot matrix of dominant radius, radius, dominant humerus, humerus, dominant ulna and ulna]

(b) [Figure]

1.20

[Figure: (a) scatter plot of the bankruptcy data with axes x1 and x3; (b) the same plot with one group highlighted]

(a) The plot looks like a cigar shape, but bent. Some observations in the lower left-hand
part could be outliers. From the highlighted plot in (b) (actually the non-bankrupt group is
not highlighted), there is one outlier in the non-bankrupt group, which is apparently
located in the bankrupt group, besides the strung-out pattern to the right.
(b) The dotted line in the plot could serve as an orientation for the classification.

1.21

[Figure: (a) scatter plot with axes x1 and x3, outliers marked; (b) the same plot with the gasoline group highlighted]

(a) There are two outliers in the upper right and lower right corners of the plot.

(b) Only the points in the gasoline group are highlighted. The observation in the upper
right is the outlier. As indicated in the plot, there is an orientation to classify into two
groups.

1.22

Possible outliers are indicated.

[Figure: three-dimensional scatter plots of x1, x2 and x3 from several rotations, with outliers marked]

1.24

[Figure: Chernoff faces for the 22 utilities]

Cluster 1: 20, 10, 13, 4
Cluster 2: 3, 9, 14, 19, 18
Cluster 3: 22, 6, 1
Cluster 4: 16, 8, 11
Cluster 5: 21, 5
Cluster 6: 17, 12, 2
Cluster 7: 7, 15

We have clustered these faces in the same manner as those in
Example 1.12. Note, however, that other groupings are equally
plausible. For instance, utilities 9 and 18 might be switched from
Cluster 2 to Cluster 3, and so forth.

1.25

We illustrate one cluster of "stars." The remaining stars (not shown) can be grouped
in 3 or 4 additional clusters.

[Figure: star plots for utilities 10, 4, 20 and 13]

1.26 Bull data

(a) XBAR

Breed    SalePr     YrHgt    FtFrBody  PrctFFB  Frame   BkFat   SaleHt   SaleWt
4.3816   1742.4342  50.5224  995.9474  70.8816  6.3158  0.1967  54.1263  1555.2895

R (variables in the order Breed, SalePr, YrHgt, FtFrBody, PrctFFB, Frame, BkFat, SaleHt, SaleWt)
$$\mathbf{R} = \begin{pmatrix}
1.000 & -0.224 & 0.525 & 0.409 & 0.472 & 0.434 & -0.615 & 0.487 & 0.116 \\
-0.224 & 1.000 & 0.423 & 0.102 & -0.113 & 0.479 & 0.277 & 0.390 & 0.317 \\
0.525 & 0.423 & 1.000 & 0.624 & 0.523 & 0.940 & -0.344 & 0.860 & 0.368 \\
0.409 & 0.102 & 0.624 & 1.000 & 0.691 & 0.605 & -0.168 & 0.699 & 0.555 \\
0.472 & -0.113 & 0.523 & 0.691 & 1.000 & 0.482 & -0.488 & 0.521 & 0.198 \\
0.434 & 0.479 & 0.940 & 0.605 & 0.482 & 1.000 & -0.260 & 0.801 & 0.368 \\
-0.615 & 0.277 & -0.344 & -0.168 & -0.488 & -0.260 & 1.000 & -0.282 & 0.208 \\
0.487 & 0.390 & 0.860 & 0.699 & 0.521 & 0.801 & -0.282 & 1.000 & 0.566 \\
0.116 & 0.317 & 0.368 & 0.555 & 0.198 & 0.368 & 0.208 & 0.566 & 1.000
\end{pmatrix}$$

Sn (same variable order)
$$\mathbf{S}_n = \begin{pmatrix}
9.55 & -429.02 & 2.79 & 116.28 & 4.73 & 1.23 & -0.17 & 3.00 & 46.32 \\
 & 383026.64 & 450.47 & 5813.09 & -226.46 & 272.78 & 15.24 & 480.56 & 25308.44 \\
 & & 2.96 & 98.81 & 2.92 & 1.49 & -0.05 & 2.94 & 81.72 \\
 & & & 8481.26 & 206.75 & 51.27 & -1.38 & 128.23 & 6592.41 \\
 & & & & 10.55 & 1.44 & -0.14 & 3.37 & 82.82 \\
 & & & & & 0.85 & -0.02 & 1.47 & 43.74 \\
 & & \text{(symmetric)} & & & & 0.01 & -0.05 & 2.38 \\
 & & & & & & & 3.97 & 145.35 \\
 & & & & & & & & 16628.94
\end{pmatrix}$$

[Figure: scatter plot matrix of Breed, Frame, FtFrBody, BkFat and SaleHt]

1.27
(a) Correlation $r = .173$.

[Figure: scatter plot of size versus visitors, with Great Smoky labeled]

(b) Great Smoky is an unusual park. With this park removed, the correlation is $r = .391$.
This single point has a reasonably large effect on the correlation: adding it to the national
park data set reduces the positive correlation by more than half.

(c) The correlation coefficient is a dimensionless measure of association. The
correlation in (b) would not change if size were measured in square miles
instead of acres.
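Part (c) can be illustrated numerically: rescaling one variable by a positive constant leaves $r$ unchanged, because the constant cancels between $s_{12}$ and $\sqrt{s_{11}}$. The sketch below uses made-up numbers, not the national park data, and converts acres to square miles at 640 acres per square mile.

```python
import math


def pearson_r(x, y):
    """Sample correlation r = s_xy / sqrt(s_xx * s_yy); the divisor cancels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration values (size in acres, visitors in millions), NOT the exercise data.
size_acres = [1200.0, 3400.0, 250.0, 5200.0, 800.0]
visitors = [2.1, 4.0, 1.5, 3.2, 2.8]

ACRES_PER_SQUARE_MILE = 640.0
size_sq_miles = [a / ACRES_PER_SQUARE_MILE for a in size_acres]

r_acres = pearson_r(size_acres, visitors)
r_miles = pearson_r(size_sq_miles, visitors)
print(round(r_acres, 6), round(r_miles, 6))
```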

Chapter 2
2.1

a) [Figure: the vectors x and y plotted in three dimensions]

b) i) $L_{\mathbf{x}} = \sqrt{\mathbf{x}'\mathbf{x}} = \sqrt{35} = 5.916$

ii) $\cos(\theta) = \dfrac{\mathbf{x}'\mathbf{y}}{L_{\mathbf{x}}L_{\mathbf{y}}} = \dfrac{1}{5.916 \times 3.317} = \dfrac{1}{19.621} = .051$, so $\theta = \arccos(.051) = 87^\circ$.

iii) The projection of $\mathbf{y}$ on $\mathbf{x}$ is $\dfrac{\mathbf{y}'\mathbf{x}}{L_{\mathbf{x}}^2}\,\mathbf{x} = \dfrac{1}{35}\,\mathbf{x} = \left(\tfrac{1}{7}, \tfrac{1}{35}, \tfrac{3}{35}\right)'$.

c) [Figure: the mean-corrected vectors plotted in three dimensions]
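A quick numerical check of 2.1 b). The sketch assumes $\mathbf{x}' = [5, 1, 3]$ and $\mathbf{y}' = [-1, 3, 1]$, the vectors we take Exercise 2.1 to use; they reproduce $L_{\mathbf{x}} = 5.916$ and $\cos(\theta) = .051$ quoted in the solution.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def length(u):
    return math.sqrt(dot(u, u))


x = [5, 1, 3]    # assumed vector x from Exercise 2.1
y = [-1, 3, 1]   # assumed vector y from Exercise 2.1

L_x = length(x)
cos_theta = dot(x, y) / (length(x) * length(y))
theta_deg = math.degrees(math.acos(cos_theta))

# Projection of y on x: (x'y / x'x) x
scale = dot(x, y) / dot(x, x)
projection = [scale * a for a in x]

print(round(L_x, 3), round(cos_theta, 3), round(theta_deg, 2))
print([round(p, 4) for p in projection])
```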

2.2 a)-d) [matrix computations not legible in this copy]
e) No.

2.3 a) $(\mathbf{A}')' = \mathbf{A}$.
b) $(\mathbf{C}')^{-1} = (\mathbf{C}^{-1})'$.
c) $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$.
(The numerical checks for a)-c) are not legible in this copy.)
d) $\mathbf{AB}$ has $(i,j)$th entry
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj} = \sum_{\ell=1}^{k} a_{i\ell}b_{\ell j}.$$
Consequently, $(\mathbf{AB})'$ has $(i,j)$th entry
$$c_{ji} = \sum_{\ell=1}^{k} a_{j\ell}b_{\ell i}.$$
Next, $\mathbf{B}'$ has $i$th row $(b_{1i}, b_{2i}, \ldots, b_{ki})$ and $\mathbf{A}'$ has $j$th column $(a_{j1}, a_{j2}, \ldots, a_{jk})'$, so $\mathbf{B}'\mathbf{A}'$ has $(i,j)$th entry
$$b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{ki}a_{jk} = \sum_{\ell=1}^{k} a_{j\ell}b_{\ell i} = c_{ji}.$$
Since $i$ and $j$ were arbitrary choices, $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$.
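2.4 a) $\mathbf{I} = \mathbf{I}'$ and $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I} = \mathbf{A}^{-1}\mathbf{A}$. Thus
$\mathbf{I}' = \mathbf{I} = (\mathbf{A}\mathbf{A}^{-1})' = (\mathbf{A}^{-1})'\mathbf{A}'$ and $\mathbf{I} = (\mathbf{A}^{-1}\mathbf{A})' = \mathbf{A}'(\mathbf{A}^{-1})'$.
Consequently, $(\mathbf{A}^{-1})'$ is the inverse of $\mathbf{A}'$, or $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$.

b) $(\mathbf{B}^{-1}\mathbf{A}^{-1})\mathbf{AB} = \mathbf{B}^{-1}(\mathbf{A}^{-1}\mathbf{A})\mathbf{B} = \mathbf{B}^{-1}\mathbf{B} = \mathbf{I}$, so $\mathbf{AB}$ has inverse $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.
It was sufficient to check for a left inverse, but we may also verify
$\mathbf{AB}(\mathbf{B}^{-1}\mathbf{A}^{-1}) = \mathbf{A}(\mathbf{B}\mathbf{B}^{-1})\mathbf{A}^{-1} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$.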

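2.5
$$\mathbf{Q}\mathbf{Q}' = \begin{pmatrix} \tfrac{5}{13} & \tfrac{12}{13} \\ -\tfrac{12}{13} & \tfrac{5}{13} \end{pmatrix}
\begin{pmatrix} \tfrac{5}{13} & -\tfrac{12}{13} \\ \tfrac{12}{13} & \tfrac{5}{13} \end{pmatrix}
= \begin{pmatrix} \tfrac{169}{169} & 0 \\ 0 & \tfrac{169}{169} \end{pmatrix} = \mathbf{I},$$
so $\mathbf{Q}$ is orthogonal.

2.6 a) Since $\mathbf{A} = \mathbf{A}'$, $\mathbf{A}$ is symmetric.

b) Since the quadratic form
$$\mathbf{x}'\mathbf{A}\mathbf{x} = (x_1, x_2)\begin{pmatrix} 9 & -2 \\ -2 & 6 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 9x_1^2 - 4x_1x_2 + 6x_2^2 = (2x_1 - x_2)^2 + 5(x_1^2 + x_2^2) > 0 \quad \text{for } (x_1, x_2) \neq (0, 0),$$
we conclude that $\mathbf{A}$ is positive definite.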

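2.7 a) Eigenvalues: $\lambda_1 = 10$, $\lambda_2 = 5$.

Normalized eigenvectors: $\mathbf{e}_1 = (2/\sqrt{5}, -1/\sqrt{5})' = (.894, -.447)'$
$\mathbf{e}_2 = (1/\sqrt{5}, 2/\sqrt{5})' = (.447, .894)'$

b)
$$\mathbf{A} = \begin{pmatrix} 9 & -2 \\ -2 & 6 \end{pmatrix} = 10\begin{pmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{pmatrix}(2/\sqrt{5}, -1/\sqrt{5}) + 5\begin{pmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{pmatrix}(1/\sqrt{5}, 2/\sqrt{5})$$

c)
$$\mathbf{A}^{-1} = \frac{1}{9(6) - (-2)(-2)}\begin{pmatrix} 6 & 2 \\ 2 & 9 \end{pmatrix} = \begin{pmatrix} .12 & .04 \\ .04 & .18 \end{pmatrix}$$

d) Eigenvalues: $\lambda_1 = .2$, $\lambda_2 = .1$

Normalized eigenvectors: $\mathbf{e}_1 = (1/\sqrt{5}, 2/\sqrt{5})'$
$\mathbf{e}_2 = (2/\sqrt{5}, -1/\sqrt{5})'$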
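The eigen-analysis in 2.7 can be reproduced with the closed-form formulas for a symmetric 2 x 2 matrix; this is a hand-rolled sketch, not a general eigen-solver, and it assumes the off-diagonal entry is nonzero. The same eigenpairs rebuild $\mathbf{A}$ through the spectral decomposition and give $\mathbf{A}^{-1} = \sum_i \lambda_i^{-1}\mathbf{e}_i\mathbf{e}_i'$. Function names are ours.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenpairs of [[a, b], [b, c]] with b != 0, largest eigenvalue first.

    Eigenvectors have unit length and a positive first component
    (the sign of an eigenvector is arbitrary).
    """
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    pairs = []
    for lam in (mid + rad, mid - rad):
        v = (b, lam - a)                  # solves (a - lam) v1 + b v2 = 0
        norm = math.hypot(*v)
        v = (v[0] / norm, v[1] / norm)
        if v[0] < 0:
            v = (-v[0], -v[1])
        pairs.append((lam, v))
    return pairs


def spectral_sum(pairs, f=lambda lam: lam):
    """2 x 2 matrix sum of f(lambda_i) e_i e_i' over the eigenpairs."""
    M = [[0.0, 0.0], [0.0, 0.0]]
    for lam, e in pairs:
        for i in range(2):
            for k in range(2):
                M[i][k] += f(lam) * e[i] * e[k]
    return M


pairs = eig_sym_2x2(9, -2, 6)                     # A of Exercises 2.6 and 2.7
for lam, e in pairs:
    print(round(lam, 6), [round(t, 3) for t in e])
print(spectral_sum(pairs))                        # rebuilds A
print(spectral_sum(pairs, lambda lam: 1 / lam))   # A^{-1}
```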

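2.8 Eigenvalues: $\lambda_1 = 2$, $\lambda_2 = -3$

Normalized eigenvectors: $\mathbf{e}_1 = (2/\sqrt{5}, 1/\sqrt{5})'$
$\mathbf{e}_2 = (1/\sqrt{5}, -2/\sqrt{5})'$
$$\mathbf{A} = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix} = 2\begin{pmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{pmatrix}(2/\sqrt{5}, 1/\sqrt{5}) - 3\begin{pmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{pmatrix}(1/\sqrt{5}, -2/\sqrt{5})$$

2.9 a)
$$\mathbf{A}^{-1} = \frac{1}{1(-2) - 2(2)}\begin{pmatrix} -2 & -2 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 1/3 & 1/3 \\ 1/3 & -1/6 \end{pmatrix}$$

b) Eigenvalues: $\lambda_1 = 1/2$, $\lambda_2 = -1/3$

Normalized eigenvectors: $\mathbf{e}_1 = (2/\sqrt{5}, 1/\sqrt{5})'$
$\mathbf{e}_2 = (1/\sqrt{5}, -2/\sqrt{5})'$

c)
$$\mathbf{A}^{-1} = \begin{pmatrix} 1/3 & 1/3 \\ 1/3 & -1/6 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{pmatrix}(2/\sqrt{5}, 1/\sqrt{5}) - \frac{1}{3}\begin{pmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{pmatrix}(1/\sqrt{5}, -2/\sqrt{5})$$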

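2.10
$$\mathbf{B}^{-1} = \frac{1}{4(4.002001) - (4.001)^2}\begin{pmatrix} 4.002001 & -4.001 \\ -4.001 & 4 \end{pmatrix} = 333{,}333\begin{pmatrix} 4.002001 & -4.001 \\ -4.001 & 4 \end{pmatrix}$$
$$\mathbf{A}^{-1} = \frac{1}{4(4.002) - (4.001)^2}\begin{pmatrix} 4.002 & -4.001 \\ -4.001 & 4 \end{pmatrix} = -1{,}000{,}000\begin{pmatrix} 4.002 & -4.001 \\ -4.001 & 4 \end{pmatrix}$$
Thus $\mathbf{A}^{-1} \approx (-3)\mathbf{B}^{-1}$.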

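2.11 With $p = 1$, $|a_{11}| = a_{11}$, and with $p = 2$,
$$\begin{vmatrix} a_{11} & 0 \\ 0 & a_{22} \end{vmatrix} = a_{11}a_{22} - 0(0) = a_{11}a_{22}.$$
Proceeding by induction, we assume the result holds for any
$(p-1)\times(p-1)$ diagonal matrix $\mathbf{A}_{11}$. Then, writing
$$\underset{(p\times p)}{\mathbf{A}} = \begin{pmatrix} a_{11} & \mathbf{0}' \\ \mathbf{0} & \mathbf{A}_{11} \end{pmatrix},$$
we expand $|\mathbf{A}|$ according to Definition 2A.24 to find
$|\mathbf{A}| = a_{11}|\mathbf{A}_{11}| + 0 + \cdots + 0$. Since $|\mathbf{A}_{11}| = a_{22}a_{33}\cdots a_{pp}$
by the induction hypothesis, $|\mathbf{A}| = a_{11}(a_{22}a_{33}\cdots a_{pp}) = a_{11}a_{22}a_{33}\cdots a_{pp}$.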

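2.12 By (2-20), $\mathbf{A} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}'$ with $\mathbf{P}\mathbf{P}' = \mathbf{P}'\mathbf{P} = \mathbf{I}$. From Result 2A.11(e),
$|\mathbf{A}| = |\mathbf{P}|\,|\boldsymbol{\Lambda}|\,|\mathbf{P}'| = |\boldsymbol{\Lambda}|$, since $|\mathbf{P}|\,|\mathbf{P}'| = |\mathbf{P}\mathbf{P}'| = 1$. Since $\boldsymbol{\Lambda}$ is a diagonal matrix with
diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, we can apply Exercise 2.11 to
get $|\mathbf{A}| = |\boldsymbol{\Lambda}| = \prod_{i=1}^{p} \lambda_i$.

2.14 Let $\lambda$ be an eigenvalue of $\mathbf{A}$. Thus $0 = |\mathbf{A} - \lambda\mathbf{I}|$. If $\mathbf{Q}$ is
orthogonal, $\mathbf{Q}\mathbf{Q}' = \mathbf{I}$ and $|\mathbf{Q}|\,|\mathbf{Q}'| = 1$ by Exercise 2.13. Using
Result 2A.11(e), we can then write
$$0 = |\mathbf{Q}|\,|\mathbf{A} - \lambda\mathbf{I}|\,|\mathbf{Q}'| = |\mathbf{Q}\mathbf{A}\mathbf{Q}' - \lambda\mathbf{I}|,$$
and it follows that $\lambda$ is also an eigenvalue of $\mathbf{Q}\mathbf{A}\mathbf{Q}'$ if $\mathbf{Q}$ is
orthogonal.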

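2.16 $(\mathbf{A}'\mathbf{A})' = \mathbf{A}'(\mathbf{A}')' = \mathbf{A}'\mathbf{A}$, showing that $\mathbf{A}'\mathbf{A}$ is symmetric. Let
$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} = \mathbf{A}\mathbf{x}.$$
Then $0 \le y_1^2 + y_2^2 + \cdots + y_p^2 = \mathbf{y}'\mathbf{y} = \mathbf{x}'\mathbf{A}'\mathbf{A}\mathbf{x}$,
and $\mathbf{A}'\mathbf{A}$ is non-negative definite by definition.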
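2.18 Write $c^2 = \mathbf{x}'\mathbf{A}\mathbf{x}$ with $\mathbf{A} = \begin{pmatrix} 4 & -\sqrt{2} \\ -\sqrt{2} & 3 \end{pmatrix}$. The eigenvalue-normalized
eigenvector pairs for $\mathbf{A}$ are:
$\lambda_1 = 2$, $\mathbf{e}_1 = (.577, .816)'$
$\lambda_2 = 5$, $\mathbf{e}_2 = (.816, -.577)'$

For $c^2 = 1$, the half lengths of the major and minor axes of the
ellipse of constant distance are
$$\frac{c}{\sqrt{\lambda_1}} = \frac{1}{\sqrt{2}} = .707 \quad\text{and}\quad \frac{c}{\sqrt{\lambda_2}} = \frac{1}{\sqrt{5}} = .447,$$
respectively. These axes lie in the directions of the vectors $\mathbf{e}_1$
and $\mathbf{e}_2$, respectively.

For $c^2 = 4$, the half lengths of the major and minor axes are
$$\frac{c}{\sqrt{\lambda_1}} = \frac{2}{\sqrt{2}} = 1.414 \quad\text{and}\quad \frac{c}{\sqrt{\lambda_2}} = \frac{2}{\sqrt{5}} = .894.$$
As $c^2$ increases, the lengths of the major and minor axes increase.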
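The axis computation in 2.18 amounts to "half length $= c/\sqrt{\lambda_i}$ along $\mathbf{e}_i$." A minimal sketch of that rule for a positive definite 2 x 2 matrix with nonzero off-diagonal entry, using the closed-form eigenpairs (the function name `ellipse_axes` is ours):

```python
import math


def ellipse_axes(a11, a12, a22, c2):
    """Axes of the ellipse x'Ax = c2 for A = [[a11, a12], [a12, a22]].

    Returns (half_length, unit_direction) pairs, major axis first.
    Assumes a12 != 0 and A positive definite.
    """
    mid = (a11 + a22) / 2
    rad = math.sqrt(((a11 - a22) / 2) ** 2 + a12 ** 2)
    axes = []
    for lam in (mid - rad, mid + rad):     # smaller eigenvalue gives the longer axis
        norm = math.hypot(a12, lam - a11)
        e = (a12 / norm, (lam - a11) / norm)
        if e[0] < 0:                       # fix the arbitrary sign of the eigenvector
            e = (-e[0], -e[1])
        axes.append((math.sqrt(c2) / math.sqrt(lam), e))
    return axes


root2 = math.sqrt(2)
for c2 in (1, 4):
    for half_length, e in ellipse_axes(4, -root2, 3, c2):
        print(c2, round(half_length, 3), [round(t, 3) for t in e])
```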
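2.20 Using the matrix $\mathbf{A}$ in Exercise 2.3, we determine
$\lambda_1 = 1.382$, $\mathbf{e}_1 = (.8507, -.5257)'$
$\lambda_2 = 3.618$, $\mathbf{e}_2 = (.5257, .8507)'$
We know
$$\mathbf{A}^{1/2} = \sqrt{\lambda_1}\,\mathbf{e}_1\mathbf{e}_1' + \sqrt{\lambda_2}\,\mathbf{e}_2\mathbf{e}_2' = \begin{pmatrix} 1.376 & .325 \\ .325 & 1.701 \end{pmatrix}$$
$$\mathbf{A}^{-1/2} = \frac{1}{\sqrt{\lambda_1}}\,\mathbf{e}_1\mathbf{e}_1' + \frac{1}{\sqrt{\lambda_2}}\,\mathbf{e}_2\mathbf{e}_2' = \begin{pmatrix} .7608 & -.1453 \\ -.1453 & .6155 \end{pmatrix}$$
We check
$$\mathbf{A}^{1/2}\mathbf{A}^{-1/2} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \mathbf{A}^{-1/2}\mathbf{A}^{1/2}.$$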
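A short sketch of the square-root computation in 2.20, assuming the matrix from Exercise 2.3 is $\mathbf{A} = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ (its eigenvalues are 1.382 and 3.618, matching the values above). It forms $\sum_i \lambda_i^{\,t}\mathbf{e}_i\mathbf{e}_i'$ for $t = \pm 1/2$; the helper names are ours.

```python
import math


def sym_power_2x2(a11, a12, a22, power):
    """A^power = sum_i lambda_i^power e_i e_i' for symmetric A with a12 != 0, A positive definite."""
    mid = (a11 + a22) / 2
    rad = math.sqrt(((a11 - a22) / 2) ** 2 + a12 ** 2)
    M = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (mid - rad, mid + rad):
        norm = math.hypot(a12, lam - a11)
        e = (a12 / norm, (lam - a11) / norm)   # sign does not matter in e e'
        for i in range(2):
            for k in range(2):
                M[i][k] += lam ** power * e[i] * e[k]
    return M


def matmul(X, Y):
    return [[sum(X[i][j] * Y[j][k] for j in range(2)) for k in range(2)] for i in range(2)]


half = sym_power_2x2(2, 1, 3, 0.5)        # A^{1/2}
neg_half = sym_power_2x2(2, 1, 3, -0.5)   # A^{-1/2}
print([[round(t, 4) for t in row] for row in half])
print([[round(t, 4) for t in row] for row in neg_half])
print(matmul(half, neg_half))             # numerically the identity
```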

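2.21 (a)
$$\mathbf{A}'\mathbf{A} = \begin{pmatrix} 1 & 2 & 2 \\ 1 & -2 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{pmatrix} = \begin{pmatrix} 9 & 1 \\ 1 & 9 \end{pmatrix}$$
$0 = |\mathbf{A}'\mathbf{A} - \lambda\mathbf{I}| = (9 - \lambda)^2 - 1 = (10 - \lambda)(8 - \lambda)$, so $\lambda_1 = 10$ and $\lambda_2 = 8$.
Next,
$$\begin{pmatrix} 9 & 1 \\ 1 & 9 \end{pmatrix}\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = 10\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \text{ gives } \mathbf{e}_1 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}, \qquad
\begin{pmatrix} 9 & 1 \\ 1 & 9 \end{pmatrix}\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = 8\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \text{ gives } \mathbf{e}_2 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}.$$

(b)
$$\mathbf{A}\mathbf{A}' = \begin{pmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} 1 & 2 & 2 \\ 1 & -2 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 4 \\ 0 & 8 & 0 \\ 4 & 0 & 8 \end{pmatrix}$$
$$0 = |\mathbf{A}\mathbf{A}' - \lambda\mathbf{I}| = \begin{vmatrix} 2-\lambda & 0 & 4 \\ 0 & 8-\lambda & 0 \\ 4 & 0 & 8-\lambda \end{vmatrix} = (2 - \lambda)(8 - \lambda)^2 - 4^2(8 - \lambda) = (8 - \lambda)(\lambda - 10)\lambda,$$
so $\lambda_1 = 10$, $\lambda_2 = 8$, and $\lambda_3 = 0$.
$\mathbf{A}\mathbf{A}'\mathbf{e} = 10\,\mathbf{e}$ gives $4e_3 = 8e_1$ and $8e_2 = 10e_2$, so $\mathbf{e}_1 = \frac{1}{\sqrt{5}}(1, 0, 2)'$.
$\mathbf{A}\mathbf{A}'\mathbf{e} = 8\,\mathbf{e}$ gives $4e_3 = 6e_1$ and $4e_1 = 0$, so $\mathbf{e}_2 = (0, 1, 0)'$.
Also, $\mathbf{e}_3 = (-2/\sqrt{5}, 0, 1/\sqrt{5})'$.

(c)
$$\begin{pmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{pmatrix} = \sqrt{10}\begin{pmatrix} 1/\sqrt{5} \\ 0 \\ 2/\sqrt{5} \end{pmatrix}(1/\sqrt{2}, 1/\sqrt{2}) + \sqrt{8}\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}(1/\sqrt{2}, -1/\sqrt{2})$$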
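2.22 (a)
$$\mathbf{A}\mathbf{A}' = \begin{pmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{pmatrix}\begin{pmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{pmatrix} = \begin{pmatrix} 144 & -12 \\ -12 & 126 \end{pmatrix}$$
$0 = |\mathbf{A}\mathbf{A}' - \lambda\mathbf{I}| = (144 - \lambda)(126 - \lambda) - (12)^2 = (150 - \lambda)(120 - \lambda)$, so
$\lambda_1 = 150$ and $\lambda_2 = 120$. Next,
$$\begin{pmatrix} 144 & -12 \\ -12 & 126 \end{pmatrix}\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = 150\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \text{ gives } \mathbf{e}_1 = \begin{pmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{pmatrix},$$
and $\lambda_2 = 120$ gives $\mathbf{e}_2 = (1/\sqrt{5}, 2/\sqrt{5})'$.

(b)
$$\mathbf{A}'\mathbf{A} = \begin{pmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{pmatrix}\begin{pmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{pmatrix} = \begin{pmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{pmatrix}$$
$$0 = |\mathbf{A}'\mathbf{A} - \lambda\mathbf{I}| = \begin{vmatrix} 25-\lambda & 50 & 5 \\ 50 & 100-\lambda & 10 \\ 5 & 10 & 145-\lambda \end{vmatrix} = (150 - \lambda)(\lambda - 120)\lambda,$$
so $\lambda_1 = 150$, $\lambda_2 = 120$, and $\lambda_3 = 0$. Next,
$\mathbf{A}'\mathbf{A}\mathbf{e} = 150\,\mathbf{e}$ gives $-120e_1 + 60e_2 = 0$ and $-25e_1 + 5e_3 = 0$, or $\mathbf{e}_1 = \frac{1}{\sqrt{30}}(1, 2, 5)'$.
$\mathbf{A}'\mathbf{A}\mathbf{e} = 120\,\mathbf{e}$ gives $e_2 = 2e_1$ and $e_3 = -e_1$, or $\mathbf{e}_2 = \frac{1}{\sqrt{6}}(1, 2, -1)'$.
Also, $\mathbf{e}_3 = (2/\sqrt{5}, -1/\sqrt{5}, 0)'$.

(c)
$$\begin{pmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{pmatrix} = \sqrt{150}\begin{pmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{pmatrix}\left(\tfrac{1}{\sqrt{30}}, \tfrac{2}{\sqrt{30}}, \tfrac{5}{\sqrt{30}}\right) + \sqrt{120}\begin{pmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{pmatrix}\left(\tfrac{1}{\sqrt{6}}, \tfrac{2}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}\right)$$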
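A numerical check of the singular-value decomposition in 2.22 (c). The sketch takes $\mathbf{A} = \begin{pmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{pmatrix}$, singular values $\sqrt{150}$ and $\sqrt{120}$, left vectors $(2, -1)'/\sqrt{5}$ and $(1, 2)'/\sqrt{5}$, and right vectors $(1, 2, 5)'/\sqrt{30}$ and $(1, 2, -1)'/\sqrt{6}$. It rebuilds $\mathbf{A}$ and reports the largest residual in the eigen-equations for $\mathbf{A}\mathbf{A}'$ and $\mathbf{A}'\mathbf{A}$.

```python
import math

A = [[4, 8, 8], [3, 6, -9]]          # matrix of Exercise 2.22
r5, r6, r30 = math.sqrt(5), math.sqrt(6), math.sqrt(30)

# (singular value s_i, left vector u_i, right vector v_i)
terms = [
    (math.sqrt(150), [2 / r5, -1 / r5], [1 / r30, 2 / r30, 5 / r30]),
    (math.sqrt(120), [1 / r5, 2 / r5], [1 / r6, 2 / r6, -1 / r6]),
]


def matvec(M, x):
    return [sum(m * t for m, t in zip(row, x)) for row in M]


# A = sum_i s_i u_i v_i'
rebuilt = [[sum(s * u[i] * v[k] for s, u, v in terms) for k in range(3)] for i in range(2)]

# u_i should satisfy AA'u = s^2 u and v_i should satisfy A'Av = s^2 v.
AAt = [[sum(A[i][j] * A[k][j] for j in range(3)) for k in range(2)] for i in range(2)]
AtA = [[sum(A[j][i] * A[j][k] for j in range(2)) for k in range(3)] for i in range(3)]
max_residual = 0.0
for s, u, v in terms:
    lam = s * s
    for M, w in ((AAt, u), (AtA, v)):
        residual = max(abs(a - lam * b) for a, b in zip(matvec(M, w), w))
        max_residual = max(max_residual, residual)

print([[round(t, 10) for t in row] for row in rebuilt])
print("largest eigen-equation residual:", max_residual)
```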

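2.24 a)
$$\boldsymbol{\Sigma}^{-1} = \begin{pmatrix} 1/4 & 0 & 0 \\ 0 & 1/9 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
b) $\lambda_1 = 4$, $\mathbf{e}_1 = (1, 0, 0)'$; $\lambda_2 = 9$, $\mathbf{e}_2 = (0, 1, 0)'$; $\lambda_3 = 1$, $\mathbf{e}_3 = (0, 0, 1)'$.
c) For $\boldsymbol{\Sigma}^{-1}$: $\lambda_1 = 1/4$, $\mathbf{e}_1 = (1, 0, 0)'$; $\lambda_2 = 1/9$, $\mathbf{e}_2 = (0, 1, 0)'$; $\lambda_3 = 1$, $\mathbf{e}_3 = (0, 0, 1)'$.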

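2.25 a)
$$\mathbf{V}^{1/2} = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}, \qquad
\boldsymbol{\rho} = \begin{pmatrix} 1 & -1/5 & 4/15 \\ -1/5 & 1 & 1/6 \\ 4/15 & 1/6 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -.2 & .267 \\ -.2 & 1 & .167 \\ .267 & .167 & 1 \end{pmatrix}$$
b)
$$\mathbf{V}^{1/2}\boldsymbol{\rho}\,\mathbf{V}^{1/2} = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 1 & -1/5 & 4/15 \\ -1/5 & 1 & 1/6 \\ 4/15 & 1/6 & 1 \end{pmatrix}\begin{pmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 25 & -2 & 4 \\ -2 & 4 & 1 \\ 4 & 1 & 9 \end{pmatrix} = \boldsymbol{\Sigma}$$

2.26 a) $\rho_{13} = \sigma_{13}/(\sigma_{11}^{1/2}\sigma_{33}^{1/2}) = 4/(5 \cdot 3) = 4/15 = .267$

b) Write $X_1 = 1\cdot X_1 + 0\cdot X_2 + 0\cdot X_3 = \mathbf{c}_1'\mathbf{X}$ with $\mathbf{c}_1' = (1, 0, 0)$, and
$\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3 = \mathbf{c}_2'\mathbf{X}$ with $\mathbf{c}_2' = (0, \tfrac{1}{2}, \tfrac{1}{2})$.
Then $\mathrm{Var}(X_1) = \sigma_{11} = 25$. By (2-43),
$$\mathrm{Var}\left(\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \mathbf{c}_2'\boldsymbol{\Sigma}\mathbf{c}_2 = \tfrac{1}{4}\sigma_{22} + \tfrac{1}{2}\sigma_{23} + \tfrac{1}{4}\sigma_{33} = 1 + \tfrac{1}{2} + \tfrac{9}{4} = \tfrac{15}{4} = 3.75.$$
By (2-45) (see also the hint to Exercise 2.28),
$$\mathrm{Cov}\left(X_1, \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \mathbf{c}_1'\boldsymbol{\Sigma}\mathbf{c}_2 = \tfrac{1}{2}\sigma_{12} + \tfrac{1}{2}\sigma_{13} = -1 + 2 = 1,$$
so
$$\mathrm{Corr}\left(X_1, \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \frac{\mathrm{Cov}\left(X_1, \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right)}{\sqrt{\mathrm{Var}(X_1)}\,\sqrt{\mathrm{Var}\left(\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right)}} = \frac{1}{5\sqrt{15/4}} = .103.$$

2.27
a) $\mu_1 - 2\mu_2$, $\quad \sigma_{11} + 4\sigma_{22} - 4\sigma_{12}$
b) $-\mu_1 + 3\mu_2$, $\quad \sigma_{11} + 9\sigma_{22} - 6\sigma_{12}$
c) $\mu_1 + \mu_2 + \mu_3$, $\quad \sigma_{11} + \sigma_{22} + \sigma_{33} + 2\sigma_{12} + 2\sigma_{13} + 2\sigma_{23}$
d) $\mu_1 + 2\mu_2 - \mu_3$, $\quad \sigma_{11} + 4\sigma_{22} + \sigma_{33} + 4\sigma_{12} - 2\sigma_{13} - 4\sigma_{23}$
e) $3\mu_1 - 4\mu_2$, $\quad 9\sigma_{11} + 16\sigma_{22}$ since $\sigma_{12} = 0$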
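The variances and covariances in 2.26 and 2.27 are all instances of $\mathrm{Cov}(\mathbf{c}'\mathbf{X}, \mathbf{d}'\mathbf{X}) = \mathbf{c}'\boldsymbol{\Sigma}\mathbf{d}$. A small sketch that evaluates this bilinear form for the $\boldsymbol{\Sigma}$ of Exercise 2.25 and the coefficient vectors $\mathbf{c}_1, \mathbf{c}_2$ of 2.26 (the helper name `quad_form` is ours):

```python
import math


def quad_form(c, S, d):
    """Bilinear form c' S d for coefficient vectors c, d and a square matrix S."""
    return sum(c[i] * S[i][k] * d[k] for i in range(len(c)) for k in range(len(d)))


Sigma = [[25, -2, 4], [-2, 4, 1], [4, 1, 9]]   # covariance matrix of Exercise 2.25
c1 = [1, 0, 0]          # X1
c2 = [0, 0.5, 0.5]      # (X2 + X3) / 2

var1 = quad_form(c1, Sigma, c1)     # Var(c1'X) = c1' Sigma c1
var2 = quad_form(c2, Sigma, c2)     # Var(c2'X) = c2' Sigma c2
cov12 = quad_form(c1, Sigma, c2)    # Cov(c1'X, c2'X) = c1' Sigma c2
corr12 = cov12 / math.sqrt(var1 * var2)
print(var1, var2, cov12, round(corr12, 3))
```

The symbolic answers in 2.27 follow the same way, for example $\mathbf{c} = (1, -2)'$ gives $\sigma_{11} + 4\sigma_{22} - 4\sigma_{12}$.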

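2.31 (a) $E[\mathbf{X}^{(1)}] = \boldsymbol{\mu}^{(1)}$
(b) $\mathbf{A}\boldsymbol{\mu}^{(1)} = 1$
(c) $\mathrm{Cov}(\mathbf{X}^{(1)}) = \boldsymbol{\Sigma}_{11}$
(d) $\mathrm{Cov}(\mathbf{A}\mathbf{X}^{(1)}) = \mathbf{A}\boldsymbol{\Sigma}_{11}\mathbf{A}' = 4$
(e) $E[\mathbf{X}^{(2)}] = \boldsymbol{\mu}^{(2)}$
(f) $\mathbf{B}\boldsymbol{\mu}^{(2)}$
(g) $\mathrm{Cov}(\mathbf{X}^{(2)}) = \boldsymbol{\Sigma}_{22}$
(h) $\mathrm{Cov}(\mathbf{B}\mathbf{X}^{(2)}) = \mathbf{B}\boldsymbol{\Sigma}_{22}\mathbf{B}'$
(i) $\mathrm{Cov}(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = \boldsymbol{\Sigma}_{12}$
(j) $\mathrm{Cov}(\mathbf{A}\mathbf{X}^{(1)}, \mathbf{B}\mathbf{X}^{(2)}) = \mathbf{A}\boldsymbol{\Sigma}_{12}\mathbf{B}' = (0 \;\; 2)$
(The numerical vectors and matrices for the remaining parts are not legible in this copy.)

2.32 (a) $E[\mathbf{X}^{(1)}] = \boldsymbol{\mu}^{(1)}$
(b) $\mathbf{A}\boldsymbol{\mu}^{(1)}$
(c) $\mathrm{Cov}(\mathbf{X}^{(1)}) = \boldsymbol{\Sigma}_{11}$
(d) $\mathrm{Cov}(\mathbf{A}\mathbf{X}^{(1)}) = \mathbf{A}\boldsymbol{\Sigma}_{11}\mathbf{A}'$
(e) $E[\mathbf{X}^{(2)}] = \boldsymbol{\mu}^{(2)}$
(f) $\mathbf{B}\boldsymbol{\mu}^{(2)}$
(g) $\mathrm{Cov}(\mathbf{X}^{(2)}) = \boldsymbol{\Sigma}_{22}$
(h) $\mathrm{Cov}(\mathbf{B}\mathbf{X}^{(2)}) = \mathbf{B}\boldsymbol{\Sigma}_{22}\mathbf{B}'$
(i) $\mathrm{Cov}(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = \boldsymbol{\Sigma}_{12}$
(j) $\mathrm{Cov}(\mathbf{A}\mathbf{X}^{(1)}, \mathbf{B}\mathbf{X}^{(2)}) = \mathbf{A}\boldsymbol{\Sigma}_{12}\mathbf{B}'$
(The numerical results are not legible in this copy.)

2.33 (a) $E[\mathbf{X}^{(1)}] = \boldsymbol{\mu}^{(1)}$
(b) $\mathbf{A}\boldsymbol{\mu}^{(1)}$
(c) $\mathrm{Cov}(\mathbf{X}^{(1)}) = \boldsymbol{\Sigma}_{11}$
(d) $\mathrm{Cov}(\mathbf{A}\mathbf{X}^{(1)}) = \mathbf{A}\boldsymbol{\Sigma}_{11}\mathbf{A}'$
(e) $E[\mathbf{X}^{(2)}] = \boldsymbol{\mu}^{(2)}$
(f) $\mathbf{B}\boldsymbol{\mu}^{(2)}$
(g) $\mathrm{Cov}(\mathbf{X}^{(2)}) = \boldsymbol{\Sigma}_{22}$
(h) $\mathrm{Cov}(\mathbf{B}\mathbf{X}^{(2)}) = \mathbf{B}\boldsymbol{\Sigma}_{22}\mathbf{B}'$
(i) $\mathrm{Cov}(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = \boldsymbol{\Sigma}_{12}$
(j) $\mathrm{Cov}(\mathbf{A}\mathbf{X}^{(1)}, \mathbf{B}\mathbf{X}^{(2)}) = \mathbf{A}\boldsymbol{\Sigma}_{12}\mathbf{B}'$
(The numerical results are not legible in this copy.)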

2.34 $\mathbf{b}'\mathbf{b} = 4 + 1 + 16 + 0 = 21$, $\mathbf{d}'\mathbf{d} = 15$, and $\mathbf{b}'\mathbf{d} = -2 - 3 - 8 + 0 = -13$.
$(\mathbf{b}'\mathbf{d})^2 = 169 \le 21(15) = 315$

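2.35 $\mathbf{b}'\mathbf{d} = -4 + 3 = -1$
$$\mathbf{b}'\mathbf{B}\mathbf{b} = (-4, 3)\begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}\begin{pmatrix} -4 \\ 3 \end{pmatrix} = (-14, 23)\begin{pmatrix} -4 \\ 3 \end{pmatrix} = 125$$
$$\mathbf{d}'\mathbf{B}^{-1}\mathbf{d} = (1, 1)\begin{pmatrix} 5/6 & 2/6 \\ 2/6 & 2/6 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = (7/6, 4/6)\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 11/6$$
so $1 = (\mathbf{b}'\mathbf{d})^2 \le 125(11/6) = 229.17$.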
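Both inequalities in 2.34 and 2.35 can be checked with a few lines of Python. The sketch assumes the exercise vectors are $\mathbf{b}' = [2, -1, 4, 0]$, $\mathbf{d}' = [-1, 3, -2, 1]$ for 2.34 and $\mathbf{b}' = [-4, 3]$, $\mathbf{d}' = [1, 1]$, $\mathbf{B} = \begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}$ for 2.35; these reproduce the sums shown above.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    return [dot(row, x) for row in M]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Exercise 2.34 (assumed vectors): (b'd)^2 <= (b'b)(d'd)
b = [2, -1, 4, 0]
d = [-1, 3, -2, 1]
print(dot(b, d) ** 2, dot(b, b) * dot(d, d))

# Exercise 2.35 (assumed vectors): (b'd)^2 <= (b'Bb)(d'B^{-1}d), B positive definite
b2 = [-4, 3]
d2 = [1, 1]
B = [[2, -2], [-2, 5]]
lhs = dot(b2, d2) ** 2
rhs = dot(b2, matvec(B, b2)) * dot(d2, matvec(inverse_2x2(B), d2))
print(lhs, round(rhs, 2))
```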

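2.36 $4x_1^2 + 4x_2^2 + 6x_1x_2 = \mathbf{x}'\mathbf{A}\mathbf{x}$ where $\mathbf{A} = \begin{pmatrix} 4 & 3 \\ 3 & 4 \end{pmatrix}$.
$(4 - \lambda)^2 - 3^2 = 0$ gives $\lambda_1 = 7$, $\lambda_2 = 1$. Hence, over $\mathbf{x}'\mathbf{x} = 1$, the maximum is 7 and the minimum is 1.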

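2.37 From (2-51),
$$\max_{\mathbf{x}'\mathbf{x} = 1} \mathbf{x}'\mathbf{A}\mathbf{x} = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\mathbf{x}'\mathbf{A}\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_1,$$
where $\lambda_1$ is the largest eigenvalue of $\mathbf{A}$. For $\mathbf{A}$ given in
Exercise 2.6, we have from Exercise 2.7 $\lambda_1 = 10$ and
$\mathbf{e}_1 = (.894, -.447)'$. Therefore $\max_{\mathbf{x}'\mathbf{x}=1} \mathbf{x}'\mathbf{A}\mathbf{x} = 10$, and this
maximum is attained for $\mathbf{x} = \mathbf{e}_1$.

2.38 Using a computer, $\lambda_1 = 18$, $\lambda_2 = 9$, $\lambda_3 = 9$. Hence the maximum is 18 and the minimum is 9.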

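2.41 (a) $E(\mathbf{A}\mathbf{X}) = \mathbf{A}E(\mathbf{X}) = \mathbf{A}\boldsymbol{\mu}_X$
(b)
$$\mathrm{Cov}(\mathbf{A}\mathbf{X}) = \mathbf{A}\,\mathrm{Cov}(\mathbf{X})\mathbf{A}' = \mathbf{A}\boldsymbol{\Sigma}_X\mathbf{A}' = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 18 & 0 \\ 0 & 0 & 36 \end{pmatrix}$$
(c) All pairs of linear combinations have zero covariances.

2.42 (a) $E(\mathbf{A}\mathbf{X}) = \mathbf{A}E(\mathbf{X}) = \mathbf{A}\boldsymbol{\mu}_X$
(b)
$$\mathrm{Cov}(\mathbf{A}\mathbf{X}) = \mathbf{A}\,\mathrm{Cov}(\mathbf{X})\mathbf{A}' = \mathbf{A}\boldsymbol{\Sigma}_X\mathbf{A}' = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 24 \end{pmatrix}$$
(c) All pairs of linear combinations have zero covariances.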

Chapter 3
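3.1

a) $\bar{\mathbf{x}} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$

b) $\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = (4, 0, -4)'$
$\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = (-1, 1, 0)'$

c) $L_{\mathbf{d}_1} = \sqrt{32}$, $L_{\mathbf{d}_2} = \sqrt{2}$.
Let $\theta$ be the angle between $\mathbf{d}_1$ and $\mathbf{d}_2$; then $\cos(\theta) = -4/\sqrt{32 \times 2} = -.5$.
Therefore $n s_{11} = L^2_{\mathbf{d}_1}$ or $s_{11} = 32/3$; $n s_{22} = L^2_{\mathbf{d}_2}$ or $s_{22} = 2/3$;
$n s_{12} = \mathbf{d}_1'\mathbf{d}_2$ or $s_{12} = -4/3$. Also, $r_{12} = \cos(\theta) = -.5$. Consequently
$$\mathbf{S}_n = \begin{pmatrix} 32/3 & -4/3 \\ -4/3 & 2/3 \end{pmatrix} \quad\text{and}\quad \mathbf{R} = \begin{pmatrix} 1 & -.5 \\ -.5 & 1 \end{pmatrix}$$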
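The geometric route in 3.1 (deviation vectors, their lengths, and the angle between them) can be sketched with exact fractions. The data matrix is assumed to be $\mathbf{X} = \begin{pmatrix} 9 & 1 \\ 5 & 3 \\ 1 & 2 \end{pmatrix}$, which gives $\bar{\mathbf{x}} = (5, 2)'$ and the deviation vectors above.

```python
import math
from fractions import Fraction

# Assumed data matrix for Exercise 3.1 (n = 3 items, p = 2 variables).
X = [[9, 1], [5, 3], [1, 2]]
n = len(X)
columns = list(zip(*X))

xbar = [Fraction(sum(col), n) for col in columns]
# Deviation vectors d_i = y_i - xbar_i * 1
d = [[Fraction(v) - m for v in col] for col, m in zip(columns, xbar)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# n * s_ik = d_i'd_k, and r_12 = cos(theta) between d_1 and d_2.
S_n = [[dot(d[i], d[k]) / n for k in range(2)] for i in range(2)]
r12 = float(dot(d[0], d[1])) / math.sqrt(float(dot(d[0], d[0]) * dot(d[1], d[1])))

print("xbar =", [str(m) for m in xbar])
print("S_n =", [[str(s) for s in row] for row in S_n])
print("r12 = cos(theta) =", r12)
```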

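3.2

a) $\bar{\mathbf{x}} = \begin{pmatrix} 4 \\ 1 \end{pmatrix}$

b) $\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = (-1, 2, -1)'$
$\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = (3, -3, 0)'$

c) $L_{\mathbf{d}_1} = \sqrt{6}$, $L_{\mathbf{d}_2} = \sqrt{18}$.
Let $\theta$ be the angle between $\mathbf{d}_1$ and $\mathbf{d}_2$; then $\cos(\theta) = -9/\sqrt{6 \times 18} = -.866$.
Therefore $n s_{11} = L^2_{\mathbf{d}_1}$ or $s_{11} = 6/3 = 2$; $n s_{22} = L^2_{\mathbf{d}_2}$ or $s_{22} = 18/3 = 6$;
$n s_{12} = \mathbf{d}_1'\mathbf{d}_2$ or $s_{12} = -9/3 = -3$. Also, $r_{12} = \cos(\theta) = -.866$. Consequently
$$\mathbf{S}_n = \begin{pmatrix} 2 & -3 \\ -3 & 6 \end{pmatrix} \quad\text{and}\quad \mathbf{R} = \begin{pmatrix} 1 & -.866 \\ -.866 & 1 \end{pmatrix}$$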

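3.3 $\mathbf{y}_1' = (1, 4, 4)$ and $\bar{x}_1\mathbf{1}' = (3, 3, 3)$. Thus
$$\mathbf{y}_1 = \bar{x}_1\mathbf{1} + (\mathbf{y}_1 - \bar{x}_1\mathbf{1}): \qquad \begin{pmatrix} 1 \\ 4 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix} + \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}$$

3.5 a)
$$2\mathbf{S} = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}') = \begin{pmatrix} 4 & 0 & -4 \\ -1 & 1 & 0 \end{pmatrix}\begin{pmatrix} 4 & -1 \\ 0 & 1 \\ -4 & 0 \end{pmatrix} = \begin{pmatrix} 32 & -4 \\ -4 & 2 \end{pmatrix},$$
so $\mathbf{S} = \begin{pmatrix} 16 & -2 \\ -2 & 1 \end{pmatrix}$ and $|\mathbf{S}| = 12$.

b)
$$2\mathbf{S} = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}') = \begin{pmatrix} -1 & 2 & -1 \\ 3 & -3 & 0 \end{pmatrix}\begin{pmatrix} -1 & 3 \\ 2 & -3 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 6 & -9 \\ -9 & 18 \end{pmatrix},$$
so $\mathbf{S} = \begin{pmatrix} 3 & -9/2 \\ -9/2 & 9 \end{pmatrix}$ and $|\mathbf{S}| = 27/4$.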

- N 3 -1 2 N

3.6 a) X'- 1 x' = r -~ ~ -~ J. Thus d'i = (-3, 0, -3),

!2 = to, 1, -1) and ~/3 = (-3, 1,2) .
Since,Ši = .92 = 23' the matrx of deviations is not offull ra.

46

15J

-3

b)
2 S =

(X -..
1 X')'
X-I-xl) = ( ~ ~
"i ( øw
15

So

S = -3/2 1
-1/2
( 15/2
9 -3/2.

2

-1

-1

l4

1'5/2)
-1/2
7

. .

Isl = 0 (Verify). The 3 deviation vectors lie in a 2-dimensional
subspace. - The 3-dimensional volume ,enclosed by ~he deviation

vectors 1 s zero.

c) Total sample variance = 9 + 1 + 7 = 17.

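Below is a sketch of the rank and determinant check for 3.6. It uses exact fractions, so |S| = 0 comes out exactly instead of as a rounding residue. The deviation vectors are the columns of X - 1x̄' from part (a).

```python
from fractions import Fraction

# Columns of X - 1 xbar' (deviation vectors) from Exercise 3.6.
d = [[-3, 0, 3], [0, 1, -1], [-3, 1, 2]]
n = 3

# d1 + d2 = d3, so the deviation matrix is not of full rank.
assert [a + b for a, b in zip(d[0], d[1])] == d[2]

# S = (X - 1 xbar')'(X - 1 xbar') / (n - 1), kept exact with Fractions.
S = [[Fraction(sum(a * b for a, b in zip(di, dk)), n - 1) for dk in d] for di in d]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (e, f, g), (h, i, j) = M
    return a * (f * j - g * i) - b * (e * j - g * h) + c * (e * i - f * h)


generalized_variance = det3(S)
total_variance = sum(S[k][k] for k in range(3))
print(generalized_variance, total_variance)   # 0 17
```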
3.7
All ellipses are centered at x̄.
i) For S = ( 5  4 ),   S^-1 = (  5/9  -4/9 )
           ( 4  5 )           ( -4/9   5/9 )
   Eigenvalue-normalized eigenvector pairs for S^-1 are:
   λ1 = 1,    e1' = (.707, -.707)
   λ2 = 1/9,  e2' = (.707, .707)
   Half lengths of the axes of the ellipse (x - x̄)'S^-1(x - x̄) ≤ 1 are 1/√λ1 = 1 and 1/√λ2 = 3, respectively. The major axis of the ellipse lies in the direction of e2; the minor axis lies in the direction of e1.
ii) For S = (  5  -4 ),   S^-1 = ( 5/9  4/9 )
            ( -4   5 )           ( 4/9  5/9 )
   Eigenvalue-normalized eigenvector pairs for S^-1 are:
   λ1 = 1,    e1' = (.707, .707)
   λ2 = 1/9,  e2' = (.707, -.707)
   Half lengths of the axes of the ellipse (x - x̄)'S^-1(x - x̄) ≤ 1 are, again, 1/√λ1 = 1 and 1/√λ2 = 3. The major axis of the ellipse lies in the direction of e2; the minor axis lies in the direction of e1. Note that e2 here is e1 in part (i) above, and e1 here is e2 in part (i) above.
iii) For S = ( 3  0 ),   S^-1 = ( 1/3   0  )
             ( 0  3 )           (  0   1/3 )
   Eigenvalue-normalized eigenvector pairs for S^-1 are:
   λ1 = 1/3,  e1' = (1, 0)
   λ2 = 1/3,  e2' = (0, 1)
   Half lengths of the axes of the ellipse (x - x̄)'S^-1(x - x̄) ≤ 1 are equal and given by 1/√λ1 = 1/√λ2 = √3. The major and minor axes of the ellipse can be taken to lie in the directions of the coordinate axes. Here, the solid ellipse is, in fact, a solid sphere.
Notice that |S| = 9 in all three cases.

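For 3.7, the half lengths can be read directly from the eigenvalues of S, because the eigenvalues of S^-1 are their reciprocals. The helper names below are illustrative only. The sketch uses the closed-form eigenvalues of a symmetric 2 × 2 matrix in place of a general eigen-solver.

```python
import math


def sym2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mid + rad, mid - rad


def half_lengths(S):
    """Half lengths of the axes of (x - xbar)' S^-1 (x - xbar) <= 1.

    These are 1/sqrt(lambda) for the eigenvalues lambda of S^-1, i.e. the
    square roots of the eigenvalues of S.
    """
    (a, b), (_, c) = S
    return tuple(math.sqrt(lam) for lam in sym2_eigenvalues(a, b, c))


cases = {"i": [[5, 4], [4, 5]], "ii": [[5, -4], [-4, 5]], "iii": [[3, 0], [0, 3]]}
for name, S in cases.items():
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    print(name, half_lengths(S), det)
```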
3.8
a) Total sample variance in both cases is 3.
b) For S = ( 1  0  0 )
           ( 0  1  0 )
           ( 0  0  1 ),   |S| = 1.
   For S = (   1   -1/2  -1/2 )
           ( -1/2    1   -1/2 )
           ( -1/2  -1/2    1  ),   |S| = 0.

3.9 (a) We calculate x̄' = (16, 18, 34) and
   Xc = ( -4  -1  -5 )
        (  2   2   4 )
        ( -2  -2  -4 )
        (  4   0   4 )
        (  0   1   1 )
   and we notice that col1(Xc) + col2(Xc) = col3(Xc), so a = (1, 1, -1)' gives Xc a = 0.
(b) S = ( 10    3    13  )
        (  3   2.5   5.5 )
        ( 13   5.5  18.5 )
   so |S| = 10(2.5)(18.5) + 39(5.5) + 39(5.5) - (13)²(2.5) - 9(18.5) - (5.5)²(10) = 0.
   As above in (a),
   Sa = ( 10 + 3 - 13     )   ( 0 )
        ( 3 + 2.5 - 5.5   ) = ( 0 )
        ( 13 + 5.5 - 18.5 )   ( 0 )
(c) Check.

3.10 (a) We calculate x̄' = (5, 2, 3) and
   Xc = ( -2  -1  -3 )
        (  1   2   3 )
        ( -1   0  -1 )
        (  2  -2   0 )
        (  0   1   1 )
   and we notice that col1(Xc) + col2(Xc) = col3(Xc), so a = (1, 1, -1)' gives Xc a = 0.
(b) S = ( 2.5   0   2.5 )
        (  0   2.5  2.5 )
        ( 2.5  2.5   5  )
   so |S| = 5(2.5)² + 0 + 0 - (2.5)³ - (2.5)³ - 0 = 0.
   Using the same coefficient vector a as in part (a), Sa = 0.

(c) Setting Xa = 0,
   3a1 + a2 = 0
   7a1 + 3a3 = 0
   5a1 + 3a2 + 4a3 = 0
so a2 = -3a1 and a1 = -(3/7)a3. Substituting into the third equation gives
   5a1 + 3(-3a1) + 4a3 = -4a1 + 4a3 = (12/7)a3 + 4a3 = (40/7)a3 = 0,
so we must have a1 = a3 = 0, but then, by the first equation in the set, a2 = 0. The columns of the data matrix are linearly independent.

3.11
S = ( 14808  14213 )      D^1/2 = ( 121.6881      0     )
    ( 14213  15538 )              (     0     124.6515 )
Consequently
R = (   1    .9370 )      and      D^-1/2 = ( .0082    0   )
    ( .9370    1   )                        (   0    .0080 )
The relationships R = D^-1/2 S D^-1/2 and S = D^1/2 R D^1/2 can now be verified by direct matrix multiplication.

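The verification suggested in 3.11 can be sketched in plain Python as follows. D^1/2 is built from the square roots of the diagonal of S, and the two relationships are checked numerically, not symbolically.

```python
import math

# Sample covariance matrix from Exercise 3.11.
S = [[14808, 14213], [14213, 15538]]

d_half = [math.sqrt(S[i][i]) for i in range(2)]   # diagonal of D^1/2
d_inv_half = [1 / v for v in d_half]              # diagonal of D^-1/2

# R = D^-1/2 S D^-1/2: divide entry (i, k) by sqrt(s_ii) * sqrt(s_kk).
R = [[d_inv_half[i] * S[i][k] * d_inv_half[k] for k in range(2)] for i in range(2)]
# S = D^1/2 R D^1/2 should recover the original matrix.
S_back = [[d_half[i] * R[i][k] * d_half[k] for k in range(2)] for i in range(2)]
print([round(v, 4) for v in d_half], [round(v, 4) for v in d_inv_half], round(R[0][1], 4))
```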
3.14 a) From first principles we have
   b'x1 = (2 3)(9, 1)' = 21.
   Similarly b'x2 = 19 and b'x3 = 8, so
   sample mean = (21 + 19 + 8)/3 = 16
   sample variance = [(21 - 16)² + (19 - 16)² + (8 - 16)²]/2 = 49
   Also c'x1 = (-1 2)(9, 1)' = -7, c'x2 = 1 and c'x3 = 3, so
   sample mean = -1
   sample variance = 28
   Finally, sample covariance = [(21 - 16)(-7 + 1) + (19 - 16)(1 + 1) + (8 - 16)(3 + 1)]/2 = -28.

b) x̄ = (5, 2)' and S = ( 16  -2 )
                       ( -2   1 )
   Using (3-36),
   sample mean of b'X = b'x̄ = (2 3)(5, 2)' = 16
   sample mean of c'X = c'x̄ = (-1 2)(5, 2)' = -1
   sample variance of b'X = b'Sb = (2 3) ( 16 -2 ; -2 1 ) (2, 3)' = 49
   sample variance of c'X = c'Sc = (-1 2) ( 16 -2 ; -2 1 ) (-1, 2)' = 28
   sample covariance of b'X and c'X = b'Sc = (2 3) ( 16 -2 ; -2 1 ) (-1, 2)' = -28
   The results are the same as those in part (a).

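The short script below compares both routes in 3.14. The data rows are an assumption, rebuilt from x̄ = (5, 2)' and the deviation vectors of Exercise 3.1; they reproduce b'x1 = 21 and c'x1 = -7 above. S is the matrix used in (b).

```python
# Exercise 3.14: sample summaries of b'X and c'X computed two ways.
X = [[9, 1], [5, 3], [1, 2]]   # rows rebuilt from xbar and the Exercise 3.1 deviations (assumption)
xbar = [5, 2]
S = [[16, -2], [-2, 1]]        # sample covariance matrix (divisor n - 1)
b, c = [2, 3], [-1, 2]
n = len(X)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def quad(u, M, v):
    """u' M v."""
    return dot(u, [dot(row, v) for row in M])


# (a) From first principles: form the combined observations and summarize them.
bx = [dot(b, row) for row in X]
cx = [dot(c, row) for row in X]
mean_b, mean_c = sum(bx) / n, sum(cx) / n
var_b = sum((v - mean_b) ** 2 for v in bx) / (n - 1)
var_c = sum((v - mean_c) ** 2 for v in cx) / (n - 1)
cov_bc = sum((u - mean_b) * (v - mean_c) for u, v in zip(bx, cx)) / (n - 1)

# (b) Using (3-36): mean b'xbar, variance b'Sb, covariance b'Sc.
print(bx, cx)
print((mean_b, var_b, mean_c, var_c, cov_bc))
print((dot(b, xbar), quad(b, S, b), dot(c, xbar), quad(c, S, c), quad(b, S, c)))
```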
3.15
S = (  13   -2.5   1.5 )
    ( -2.5    1   -1.5 )
    (  1.5  -1.5    3  )
sample mean of b'X = 12
sample mean of c'X = -1
sample variance of b'X = 12
sample variance of c'X = 43
sample covariance of b'X and c'X = -3

3.16 Since
Σ_V = E(V - μ_V)(V - μ_V)'
    = E(VV' - Vμ_V' - μ_V V' + μ_V μ_V')
    = E(VV') - E(V)μ_V' - μ_V E(V') + μ_V μ_V'
    = E(VV') - μ_V μ_V' - μ_V μ_V' + μ_V μ_V'
    = E(VV') - μ_V μ_V',
we have E(VV') = Σ_V + μ_V μ_V'.

3.18 (a) Let y = X1 + X2 + X3 + X4 be the total energy consumption. Then
   ȳ = (1 1 1 1)x̄ = 1.873
   s_y² = (1 1 1 1) S (1 1 1 1)' = 3.913
(b) Let y = X1 - X2 be the excess of petroleum consumption over natural gas consumption. Then
   ȳ = (1 -1 0 0)x̄ = .258
   s_y² = (1 -1 0 0) S (1 -1 0 0)' = .154

Chapter 4
4.1 (a) We are given p = 2,
   μ = ( 1 ),   Σ = (   2     -.8√2 )
       ( 3 )        ( -.8√2     1   )
   so |Σ| = .72 and
   Σ^-1 = (1/.72) (   1    .8√2 )
                  ( .8√2     2  )
   f(x1, x2) = 1/(2π√.72) exp{ -(1/2) [ (1/.72)(x1 - 1)² + (2(.8)√2/.72)(x1 - 1)(x2 - 3) + (2/.72)(x2 - 3)² ] }
(b) (x - μ)'Σ^-1(x - μ) = (1/.72)(x1 - 1)² + (2(.8)√2/.72)(x1 - 1)(x2 - 3) + (2/.72)(x2 - 3)²

4.2 (a) We are given p = 2,
   μ = ( 0 ),   Σ = (   2    √2/2 )
       ( 2 )        ( √2/2     1  )
   so |Σ| = 3/2 and
   Σ^-1 = (  2/3   -√2/3 )
          ( -√2/3   4/3  )
   f(x1, x2) = 1/(2π√(3/2)) exp{ -(1/2) [ (2/3)x1² - (2√2/3)x1(x2 - 2) + (4/3)(x2 - 2)² ] }
(b) (x - μ)'Σ^-1(x - μ) = (2/3)x1² - (2√2/3)x1(x2 - 2) + (4/3)(x2 - 2)²
(c) c² = χ²_2(.5) = 1.39. The ellipse is centered at (0, 2)', with the major axis having half-length √λ1 c = √2.366 √1.39 = 1.81. The major axis lies in the direction e1 = (.888, .460)'. The minor axis lies in the direction e2 = (-.460, .888)' and has half-length √λ2 c = √.634 √1.39 = .94.

[Figure: constant density contour that contains 50% of the probability; horizontal axis x1 (from -3 to 3), vertical axis x2]

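The axis computation in 4.2(c) can be sketched as follows. It uses the closed-form eigenvalues of a symmetric 2 × 2 matrix. It also uses the fact that a chi-square variable with 2 d.f. has P(χ²_2 > x) = exp(-x/2), so χ²_2(.5) = -2 ln(.5) ≈ 1.386; the 1.39 above is this value rounded.

```python
import math

# Sigma for Exercise 4.2: sigma11 = 2, sigma22 = 1, sigma12 = sqrt(2)/2.
s11, s22, s12 = 2.0, 1.0, math.sqrt(2) / 2

# Eigenvalues of a symmetric 2x2 matrix (largest first).
mid, rad = (s11 + s22) / 2, math.hypot((s11 - s22) / 2, s12)
lam1, lam2 = mid + rad, mid - rad

# Unit eigenvector for lam1; the minor-axis direction is its 90-degree rotation.
e1 = (s12, lam1 - s11)
norm = math.hypot(*e1)
e1 = (e1[0] / norm, e1[1] / norm)
e2 = (-e1[1], e1[0])

# Chi-square with 2 d.f. has upper tail exp(-x/2), so chi2_2(.5) = -2 ln(.5).
c2 = -2 * math.log(0.5)
print(round(lam1, 3), round(lam2, 3), [round(v, 3) for v in e1],
      round(math.sqrt(lam1 * c2), 2), round(math.sqrt(lam2 * c2), 2))
```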
4.3 We apply Result 4.5, which relates zero covariance to statistical independence.
a) No, σ12 ≠ 0.
b) Yes, σ23 = 0.
c) Yes, σ13 = σ23 = 0.
d) Yes, by Result 4.3, (X1 + X2)/2 and X3 are jointly normal and their covariance is (1/2)σ13 + (1/2)σ23 = 0.
e) No. By Result 4.3 with A = (   0   1   0 )
                              ( -5/2  1  -1 ),
   form AΣA' to see that the covariance is 10 and not 0.

4.4 a) 3X1 - 2X2 + X3 is N(13, 9).
b) Require Cov(X2, X2 - a1X1 - a3X3) = 3 - a1 - 2a3 = 0. Thus any a' = [a1, a3] of the form a' = [3 - 2a3, a3] will meet the requirement. As an example, a' = [1, 1].

4.5 a) X1 | x2 is N((√2/2)(x2 - 2), 3/2)
b) X2 | x1, x3 is N(-2x1 - 5, 1)
c) X3 | x1, x2 is N((1/2)(x1 + x2 + 3), 1/2)

4.6 (a) X1 and X2 are independent, since they have a bivariate normal distribution with covariance σ12 = 0.
(b) X1 and X3 are dependent, since they have nonzero covariance σ13 = -1.
(c) X2 and X3 are independent, since they have a bivariate normal distribution with covariance σ23 = 0.
(d) (X1, X3) and X2 are independent, since they have a trivariate normal distribution where σ12 = 0 and σ32 = 0.
(e) X1 and X1 + 2X2 - 3X3 are dependent, since they have nonzero covariance
   σ11 + 2σ12 - 3σ13 = 4 + 2(0) - 3(-1) = 7.

4.7 (a) X1 | x3 is N(1 - (1/2)(x3 - 2), 3.5)
(b) X1 | x2, x3 is N(1 - (1/2)(x3 - 2), 3.5). Since X2 is independent of (X1, X3), conditioning further on X2 does not change the answer from part (a).

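The conditional moments in 4.7 follow from the usual formulas E(X1 | x3) = μ1 + (σ13/σ33)(x3 - μ3) and Var(X1 | x3) = σ11 - σ13²/σ33. In the sketch below, μ1 = 1, μ3 = 2, σ11 = 4 and σ13 = -1 come from Exercises 4.6 and 4.7. The value σ33 = 2 is an inference from the reported conditional variance 3.5 = 4 - 1/σ33.

```python
# Exercise 4.7: X1 given X3 = x3 for a jointly normal pair.
mu1, mu3 = 1.0, 2.0
s11, s13 = 4.0, -1.0     # from Exercise 4.6
s33 = 2.0                # inferred from the conditional variance 3.5 (assumption)


def conditional_x1_given_x3(x3):
    """Return (mean, variance) of X1 | X3 = x3."""
    slope = s13 / s33                    # -1/2
    return mu1 + slope * (x3 - mu3), s11 - s13 ** 2 / s33


print(conditional_x1_given_x3(2.0), conditional_x1_given_x3(4.0))
```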
4.16 (a) By Result 4.8, with c1 = c3 = 1/4, c2 = c4 = -1/4 and μj = μ for j = 1, ..., 4, we have ∑_{j=1}^{4} cj μj = 0 and (∑_{j=1}^{4} cj²)Σ = (1/4)Σ. Consequently, V1 is N(0, (1/4)Σ). Similarly, setting b1 = b2 = 1/4 and b3 = b4 = -1/4, we find that V2 is N(0, (1/4)Σ).
(b) Again by Result 4.8, we know that V1 and V2 are jointly multivariate normal with covariance
   (∑_{j=1}^{4} bj cj)Σ = [(1/4)(1/4) + (1/4)(-1/4) + (-1/4)(1/4) + (-1/4)(-1/4)]Σ = 0.
That is,
   ( V1 )  is distributed N_2p( 0, ( (1/4)Σ     0    ) )
   ( V2 )                          (    0    (1/4)Σ  )
so the joint density of the 2p variables is
   f(v1, v2) = 1/((2π)^p |(1/4)Σ|) exp( -(1/2) (v1', v2') ( (1/4)Σ  0 ; 0  (1/4)Σ )^-1 (v1', v2')' )
             = 1/((2π)^p |(1/4)Σ|) exp( -2 (v1'Σ^-1 v1 + v2'Σ^-1 v2) )

4.17 By Result 4.8, with c1 = c2 = c3 = c4 = c5 = 1/5 and μj = μ for j = 1, ..., 5, we find that V1 has mean ∑_{j=1}^{5} cj μj = μ and covariance matrix (∑_{j=1}^{5} cj²)Σ = (1/5)Σ.
Similarly, setting b1 = b3 = b5 = 1/5 and b2 = b4 = -1/5, we find that V2 has mean ∑_{j=1}^{5} bj μj = (1/5)μ and covariance matrix (∑_{j=1}^{5} bj²)Σ = (1/5)Σ.
Again by Result 4.8, we know that V1 and V2 have covariance
   (∑_{j=1}^{5} bj cj)Σ = [(1/5)(1/5) + (-1/5)(1/5) + (1/5)(1/5) + (-1/5)(1/5) + (1/5)(1/5)]Σ = (1/25)Σ

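The scalar factors used with Result 4.8 in 4.16 and 4.17 are simple sums. They are shown here with exact fractions; the function name is only illustrative.

```python
from fractions import Fraction


def result_4_8_factors(c, b):
    """Multipliers of Sigma in Result 4.8 for V1 = sum c_j X_j and V2 = sum b_j X_j:
    Cov(V1) = (sum c_j^2) Sigma, Cov(V2) = (sum b_j^2) Sigma, Cov(V1, V2) = (sum b_j c_j) Sigma."""
    return (sum(x * x for x in c), sum(x * x for x in b),
            sum(x * y for x, y in zip(b, c)))


q, f = Fraction(1, 4), Fraction(1, 5)
ex_4_16 = result_4_8_factors(c=[q, -q, q, -q], b=[q, q, -q, -q])
ex_4_17 = result_4_8_factors(c=[f] * 5, b=[f, -f, f, -f, f])
print(ex_4_16, ex_4_17)
```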
4.18 By Result 4.11, the maximum likelihood estimates of μ and Σ are x̄ = (4, 6)' and
   Σ̂ = (1/n) ∑_{j=1}^{n} (xj - x̄)(xj - x̄)'

4.19 b) From (4-23), X̄ is N6(μ, (1/20)Σ). Then X̄ - μ is N6(0, (1/20)Σ), and finally √20(X̄ - μ) is N6(0, Σ).
c) From (4-23), 19S has a Wishart distribution with 19 d.f.

4.20 B(19S)B' is a 2 × 2 matrix distributed as W19(· | BΣB') with 19 d.f., where
a) for B = ( 1  -1/2  -1/2    0     0    0 )
           ( 0    0     0   -1/2  -1/2   1 ),  BΣB' has
   (1,1) entry = σ11 + (1/4)σ22 + (1/4)σ33 - σ12 - σ13 + (1/2)σ23
   (1,2) entry = -(1/2)σ14 + (1/4)σ24 + (1/4)σ34 - (1/2)σ15 + (1/4)σ25 + (1/4)σ35 + σ16 - (1/2)σ26 - (1/2)σ36
   (2,2) entry = σ66 + (1/4)σ55 + (1/4)σ44 - σ46 - σ56 + (1/2)σ45
b) BΣB' = ( σ11  σ13 )
          ( σ31  σ33 )

4.21 (a) X̄ is distributed N4(μ, n^-1 Σ).
(b) X1 - μ is distributed N4(0, Σ), so (X1 - μ)'Σ^-1(X1 - μ) is distributed as chi-square with p degrees of freedom.
(c) Using part (a),
   (X̄ - μ)'(n^-1 Σ)^-1(X̄ - μ) = n(X̄ - μ)'Σ^-1(X̄ - μ)
   is distributed as chi-square with p degrees of freedom.
(d) Approximately distributed as chi-square with p degrees of freedom. Since the sample size is large, Σ can be replaced by S.

4.22 a) We see that n = 75 is a sufficiently large sample (compared with p), and we apply Result 4.13 to get that √n(X̄ - μ) is approximately Np(0, Σ) and that X̄ is approximately Np(μ, (1/n)Σ).
b) By (4-28) we conclude that n(X̄ - μ)'S^-1(X̄ - μ) is approximately χ²_p.

4.23 (a) The Q-Q plot shown below is not particularly straight, but the sample size n = 10 is small. It is difficult to determine from the plot whether the data are normally distributed.
[Figure: Q-Q plot for Dow Jones data; ordered observations x(j) versus q(j)]
(b) rQ = .95 and n = 10. Since rQ = .95 > .9351 (see Table 4.2), we cannot reject the hypothesis of normality at the 10% level.

4.24 (a) Q-Q plots for sales and profits are given below. The plots are not particularly straight, although the Q-Q plot for profits appears to be "straighter" than the plot for sales. It is difficult to assess normality from plots with such a small sample size (n = 10).
[Figure: Q-Q plot for sales and Q-Q plot for profits; ordered observations versus q(j)]
(b) The critical point for n = 10 when α = .10 is .9351. For sales, rQ = .940, and for profits, rQ = .968. Since the values for both of these correlations are greater than .9351, we cannot reject normality in either case.

4.25 The chi-square plot for the world's largest companies data is shown below. The plot is reasonably straight, and it would be difficult to reject multivariate normality given the small sample size of n = 10. Information leading to the construction of this plot is also displayed.
[Figure: chi-square plot, ordered squared distances versus chi-square quantiles]
   x̄ = ( 155.6 )      S = (  7476.5    303.6   -35576  )
       (  14.7 )          (   303.6     26.2   -1053.8 )
       ( 710.9 )          ( -35576   -1053.8   237054  )

   Ordered SqDist    Chi-square Quantiles
        .3142               .3518
       1.2894               .7978
       1.4073              1.2125
       1.6418              1.6416
       2.0195              2.1095
       3.0411              2.6430
       3.1891              3.2831
       4.3520              4.1083
       4.8365              5.3170
       4.9091              7.8147

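The chi-square quantiles in the 4.25 table are the 100(j - 1/2)/n percentiles of the χ²_3 distribution, with n = 10 and p = 3. This stdlib-only sketch obtains them by bisection on the closed-form χ²_3 distribution function F(x) = erf(√(x/2)) - √(2x/π) e^(-x/2). In practice a library quantile function (for example, SciPy's) would normally be used instead.

```python
import math


def chi2_3_cdf(x):
    """Distribution function of the chi-square distribution with 3 d.f."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_3_quantile(prob, lo=0.0, hi=100.0, iters=100):
    """Invert the (increasing) distribution function by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_3_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 10
quantiles = [chi2_3_quantile((j - 0.5) / n) for j in range(1, n + 1)]
print([round(q, 4) for q in quantiles])
```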
4.26 (a) x̄ = (  5.20 ),   S = (  10.6222  -17.7102 ),   S^-1 = ( 2.1898  1.2569 )
              ( 12.48 )       ( -17.7102   30.8544 )           ( 1.2569   .7539 )
   Thus d_j² = 1.8753, 2.0203, 2.9009, .7353, .3105, .0176, 3.7329, .8165, 1.3753, 4.2153.
(b) Since χ²_2(.5) = 1.39, 5 observations (50%) are within the 50% contour.
(c) The chi-square plot is shown below.
[Figure: chi-square plot, ordered d_j² versus chi-square quantiles]
(d) Given the results in parts (b) and (c) and the small number of observations (n = 10), it is difficult to reject bivariate normality.

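The sketch below recomputes the numbers behind 4.26(a) and (b). S^-1 is obtained with the 2 × 2 inverse formula, and the 50% contour uses χ²_2(.5) = -2 ln(.5) ≈ 1.386.

```python
import math

# Sample covariance matrix from Exercise 4.26(a).
S = [[10.6222, -17.7102], [-17.7102, 30.8544]]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det],
         [-S[1][0] / det, S[0][0] / det]]

# Squared generalized distances d_j^2 reported in part (a).
d2 = [1.8753, 2.0203, 2.9009, .7353, .3105, .0176, 3.7329, .8165, 1.3753, 4.2153]

# chi2_2(.5): for 2 d.f. the upper tail is exp(-x/2), so the median is -2 ln(.5).
c50 = -2 * math.log(0.5)
inside = sum(d < c50 for d in d2)
print([[round(v, 4) for v in row] for row in S_inv], round(c50, 3), inside, inside / len(d2))
```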
4.27 The Q-Q plot is shown below.
[Figure: Q-Q plot, ordered observations versus q(j)]
Since rQ = .970 < .973 (see Table 4.2 for n = 40 and α = .05), we would reject the hypothesis of normality at the 5% level.

4.29
(a).
x = (~~4~:~~:~)' s = (11.363531 3~:~~:~~~).
Generalized distances are as follows:
2.3771  0.8162  1.6283  0.4135  0.47?1  1.1849  1.3566
0.6228  5.6494  0.8988  4.7647  3.0089  6.1489  1.0360
2.2489  3.4438  0.1901  0.4607  1.8985  2.7783  8.4731
1.1472  0.6370  0.6592  0.1388  7.0857  0.7032  0.3159
2.7741  0.8856  0.4607  0.6592 10.6392  0.4135  1.0360
0.1388  0.1225  0.7874  0.1380  0.1225  1.4584  1.8014

(b) The number of observations whose generalized distances are less than χ²_2(.5) = 1.39 is 26, so the proportion is 26/42 = 0.6190.
(c) [Figure: chi-square plot for (X1, X2), ordered distances versus chi-square quantiles]
4.30 (a) λ̂ = 0.5, but λ = 1 (i.e., no transformation) is not ruled out by the data. For λ = 1, rQ = .981 > .9351, the critical point for testing normality with n = 10 and α = .10. We cannot reject the hypothesis of normality at the 10% level (and, consequently, not at the 5% level).
(b) λ̂ = 1 (i.e., no transformation). For λ = 1, rQ = .971 > .9351, the critical point for testing normality with n = 10 and α = .10. We cannot reject the hypothesis of normality at the 10% level (and, consequently, not at the 5% level).
(c) The likelihood function l(λ1, λ2) is fairly flat in the region of λ1 = 1, λ2 = 1, so these values are not ruled out by the data. These results are consistent with those in parts (a) and (b).

4.31
The non-multiple-sclerosis group:
X2

X3

X4

Xs

0.96133Xi3.S

0.95585(X3 + 0.005)°.4

0.91574X¡3.4

0.94446-

Xl

X2

X3

0.91137

0.97209

0.79523-

X4
0.978-69

Xs
0.84135-

Xi
0.94482X-o.s
1

rQ

(Xs + 0.'(05)°.32
Transformation
*: significant at 5 % level (the critical point = 0.9826 for n=69).

The multiple-sclerosis group:
rQ

-

-

-

(X5 + 0.005)°.21
(X3 + 0.005)°.26
Transformation
*: significant at 5 % level (the critical point = 0.9640 for n=29).

Transformations of X3 and X4 do not improve the approximation to normality very much because there are too many zeros.
4.32
Variable   rQ         Transformation
X1         0.98464*   (X1 + 0.005)^-0.59
X2         0.94526*   X2^0.49
X3         0.9970     -
X4         0.98098*   X4^0.25
X5         0.92779*   (X5 + 0.005)^0.51
X6         0.99057    -
*: significant at 5% level (the critical point = 0.9870 for n = 98).

4.33
Marginal normality:
Variable   rQ
X1         0.95986*
X2         0.95039*
X3         0.96341*
X4         0.98079
*: significant at 5% level (the critical point = 0.9652 for n = 30).
Bivariate normality: the χ² plots are given on the next page. Those for (X1, X2), (X1, X3) and (X3, X4) appear reasonably straight.

[Figure: chi-square plots for the pairs (X1, X2), (X1, X3), (X1, X4), (X2, X3), (X2, X4) and (X3, X4)]

4.34
Marginal normality:
Variable   rQ
X1         0.95162*
X2         0.97209
X3         0.98421
X4         0.99011
X5         0.98124
X6         0.99404
*: significant at 5% level (the critical point = 0.9591 for n = 25).
Bivariate normality: omitted.
4.35 Marginal normality:
        X1 (Density)   X2 (MachDir)   X3 (CrossDir)
rQ      .897*          .991           .924*
* significant at the 5% level; critical point = .974 for n = 41
From the chi-square plot (see below), it is obvious that observation #25 is a multivariate outlier. If this observation is removed, the chi-square plot is considerably more "straight line like," and it is difficult to reject a hypothesis of multivariate normality. Moreover, rQ increases to .979 for density, while it is virtually unchanged for machine direction (.992) and cross direction (.926).
[Figure: chi-square plot; chi-square plot without observation 25]

4.36 Marginal normality:
      100m   200m   400m   800m   1500m   3000m   Marathon
rQ    .983   .976*  .969*  .952*  .909*   .866*   .859*
* significant at the 5% level; critical point = .978 for n = 54
Notice how the values of rQ decrease with increasing distance. As the distance increases, the distribution of times becomes increasingly skewed to the right.
The chi-square plot is not consistent with multivariate normality. There are several multivariate outliers.

4.37 Marginal normality:
      100m   200m   400m   800m   1500m   3000m   Marathon
rQ    .989   .985   .984   .968*  .947*   .929*   .921*
* significant at the 5% level; critical point = .978 for n = 54
As measured by rQ, times measured in meters/second for the various distances are more nearly marginally normal than times measured in seconds or minutes (see Exercise 4.36). Notice that the values of rQ decrease with increasing distance. In this case, as the distance increases, the distribution of times becomes increasingly skewed to the left.
The chi-square plot is not consistent with multivariate normality. There are several multivariate outliers.

4.38 Marginal and multivariate normality of bull data
[Figure: Normality of bull data. A chi-square plot of the ordered distances (versus qchisq), and normal Q-Q plots (versus quantiles of the standard normal) for the individual variables, with correlations r = 0.9916 (normal), r = 0.9631 (not normal), r = 0.9847 (normal), r = 0.9376 (not normal), r = 0.9956 (normal) and r = 0.9934 (normal)]

XBAR

S

FtFrBody
100.1305
8594.3439
2 . 9600
209.5044
-0 .0534
-1. 3982
2.9831
129.9401
82.8108 6680. 3088
YrHgt

5-0.5224

995.9474
70.881-6

0.1967
54. 1263
1555.2895

1

2

3
4
5
6
7
8

2 . 9980
100 . 1305

Ordered
 j     dsq     qchisq
 1   1.3396   0.7470
 2   1.7751   1.1286
 3   1.7762   1.3793
 4   2.2021   1.5808
 5   2.3870   1.7551
 6   2.5512   1.9118
 7   2.5743   2.0560
 8   2.5906   2.1911
 9   2.7604   2.3189
10   3.0189   2.4411
11   3.0495   2.5587
12   3.2679   2.6725
13   3.2766   2.7832
14   3.3115   2.8912
15   3.3470   2.9971
16   3.3669   3.1011
17   3.3721   3.2036
18   3.4141   3.3048
19   3.5279   3.4049
20   3.5453   3.5041
21   3.6097   3.6027
22   3.6485   3.7007
23   3.6681   3.7983
24   3.7236   3.8957
25   3.7395   3.9929

3.4142 -0.0506

SaleHt
SaleWt
2.9831
82.8108
129.9401 6680 . 3088
3.4142
83 .9254
-0.0506
2.4130
4.0180
147.2896

83.9254 2.4130

147.2896 16850.6618

PrctFFB
BkFat
2 . 9600 -0.0534
209.5044 -1.3982
10.6917 -0.1430

-0.1430

Ordered
 j     dsq     qchisq
26   3.8618   4.0902
27   3.8667   4.1875
28   3.9078   4.2851
29   4.0413   4.3830
30   4.1213   4.4812
31   4.1445   4.5801
32   4.2244   4.6795
33   4.2522   4.7797
34   4.2828   4.8806
35   4.4599   4.9826
36   4.7603   5.0855
37   4.8587   5.1896
38   5.1129   5.2949
39   5.1876   5.4017
40   5.2891   5.5099
41   5.3004   5.6197
42   5.3518   5.7313
43   5.4024   5.8449
44   5.5938   5.9605
45   5.6060   6.0783
46   5.6333   6.1986
47   5.7754   6.3215
48   6.2524   6.4472
49   6.3264   6.5760
50   6.6491   6.7081

o .0080

Ordered
 j     dsq      qchisq
51    6.6693    6.8439
52    6.6748    6.9836
53    6.6751    7.1276
54    6.8168    7.2763
55    6.9863    7.4301
56    7.1405    7.5896
57    7.1763    7.7554
58    7.4577    7.9281
59    7.5816    8.1085
60    7.6287    8.2975
61    8.0873    8.4963
62    8.6430    8.7062
63    8.7748    8.9286
64    8.7940    9.1657
65    9.3973    9.4197
66    9.3989    9.6937
67    9.6524    9.9917
68   10.6254   10.3191
69   10.6958   10.6829
70   10.8037   11.0936
71   10.9273   11.5665
72   11.3006   12.1263
73   11.3215   12.8160
74   12.4744   13.7225
75   17.6149   15.0677
76   21.5751   17.8649

From Table 4.2, with α = 0.05 and n = 76, the critical point for the Q-Q plot correlation coefficient test for normality is 0.9839. We reject the hypothesis of multivariate normality at α = 0.05, because some marginals are not normal.

4.39 (a) Marginal normality:
      independence   support   benevolence   conformity   leadership
rQ    .991           .993      .997          .997         .984*
* significant at the 5% level; critical point = .990 for n = 130
(b) The chi-square plot is shown below. The plot is straight with the exception of observation #60. Certainly, if this observation is deleted, it would be hard to argue against multivariate normality.
[Figure: chi-square plot for indep, supp, benev, conform, leader; d²_(j) versus q((j - .5)/130)]
(c) Using the rQ statistic, normality is rejected at the 5% level for leadership. If leadership is transformed by taking the square root (i.e., λ = 0.5), rQ = .998 and we cannot reject normality at the 5% level.
4.40 (a) The scatterplot is shown below. Great Smoky park is an outlier.
[Figure: scatterplot of size versus visitors, with Great Smoky labeled]
(b) The power transformation λ = 0.5 (i.e., square root) makes the size observations more nearly normal. rQ = .904 before transformation and rQ = .975 after transformation. The 5% critical point with n = 15 for the hypothesis of normality is .9389. The Q-Q plot for the transformed observations is given below.
[Figure: Q-Q plot for the square root of size]
(c) The power transformation λ = 0 (i.e., logarithm) makes the visitor observations more nearly normal. rQ = .837 before transformation and rQ = .960 after transformation. The 5% critical point with n = 15 for the hypothesis of normality is .9389. The Q-Q plot for the transformed observations is given next.
(d) A chi-square plot for the transformed observations is shown below. Given the small sample size (n = 15), the plot is reasonably straight, and it would be hard to reject bivariate normality.
[Figure: chi-square plot for the transformed national park data versus chi-square quantiles]

4.41 (a) The scatterplot is shown below. There do not appear to be any outliers, with the possible exception of observation #21.
(b) The power transformation λ = 0 (i.e., logarithm) makes the duration observations more nearly normal. rQ = .958 before transformation and rQ = .989 after transformation. The 5% critical point with n = 25 for the hypothesis of normality is .9591. The Q-Q plot for the transformed observations is given below.
[Figure: Q-Q plot for the natural log of duration versus q(j)]
(c) The power transformation λ = -0.5 (i.e., reciprocal of square root) makes the man/machine time observations more nearly normal. rQ = .939 before transformation and rQ = .991 after transformation. The 5% critical point with n = 25 for the hypothesis of normality is .9591. The Q-Q plot for the transformed observations is given next.
[Figure: Q-Q plot for the reciprocal of the square root of man/machine time versus q(j)]
(d) A chi-square plot for the transformed observations is shown below. The plot is straight, and it would be difficult to reject bivariate normality.
[Figure: chi-square plot for the transformed snow removal data versus chi-square quantiles]

Chapter 5

5.1 a) x̄ = (  6 ),   S = (   8    -10/3 )
           ( 10 )        ( -10/3    2   )
   T² = 150/11 = 13.64
b) T² is 3F_2,2 (see (5-5)).
c) H0: μ' = (7, 11)
   α = .05, so F_2,2(.05) = 19.00.
   Since T² = 13.64 < 3F_2,2(.05) = 3(19) = 57, do not reject H0 at the α = .05 level.
5.3 a) T² = (n - 1) |∑_{j=1}^{n} (xj - μ0)(xj - μ0)'| / |∑_{j=1}^{n} (xj - x̄)(xj - x̄)'| - (n - 1) = 3(244/44) - 3 = 13.64
b) Λ = ( |∑_{j=1}^{n} (xj - x̄)(xj - x̄)'| / |∑_{j=1}^{n} (xj - μ0)(xj - μ0)'| )^{n/2} = (44/244)² = .0325
   Wilks' lambda = Λ^{2/n} = Λ^{1/2} = √.0325 = .1803
5.5 H0: μ' = (.55, .60); T² = 1.17
α = .05; F_2,40(.05) = 3.23
Since T² = 1.17 < (2(41)/40) F_2,40(.05) = 2.05(3.23) = 6.62, we do not reject H0 at the α = .05 level. The result is consistent with the 95% confidence ellipse for μ pictured in Figure 5.1, since μ0' = (.55, .60) is inside the ellipse.

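The sketch below reproduces the T² calculations in 5.1 and 5.3, and the F percentage points used in 5.1 and 5.5. It relies on the fact that an F variable with 2 numerator d.f. has P(F > f) = (1 + 2f/m)^(-m/2), which gives F_2,m(α) in closed form. Exact fractions keep T² = 150/11 exact.

```python
from fractions import Fraction


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def f2_upper(alpha, m):
    """Upper-alpha point of F with (2, m) d.f.: P(F > f) = (1 + 2f/m)^(-m/2)."""
    return (m / 2) * (alpha ** (-2 / m) - 1)


# Exercise 5.1 summary quantities (exact fractions).
n, p = 4, 2
xbar = [Fraction(6), Fraction(10)]
S = [[Fraction(8), Fraction(-10, 3)], [Fraction(-10, 3), Fraction(2)]]
mu0 = [Fraction(7), Fraction(11)]
d = [x - m for x, m in zip(xbar, mu0)]

# T^2 = n (xbar - mu0)' S^-1 (xbar - mu0), using the explicit 2x2 inverse.
D = det2(S)
S_inv = [[S[1][1] / D, -S[0][1] / D], [-S[1][0] / D, S[0][0] / D]]
T2 = n * sum(d[i] * S_inv[i][k] * d[k] for i in range(2) for k in range(2))
crit = (n - 1) * p / (n - p) * f2_upper(0.05, n - p)   # 3 F_{2,2}(.05)

# Exercise 5.3: determinant form of T^2 and the likelihood ratio.
A = [[(n - 1) * s for s in row] for row in S]                          # sum (x_j - xbar)(x_j - xbar)'
B = [[A[i][k] + n * d[i] * d[k] for k in range(2)] for i in range(2)]  # sum (x_j - mu0)(x_j - mu0)'
T2_det = (n - 1) * det2(B) / det2(A) - (n - 1)
wilks = det2(A) / det2(B)          # Lambda^(2/n)
lam = wilks ** (n // 2)            # Lambda = (|A| / |B|)^(n/2)

print(T2, float(T2), round(crit, 2), det2(A), det2(B), round(float(wilks), 4), round(float(lam), 4))
print(round(f2_upper(0.05, 40), 2))   # F_{2,40}(.05) used in Exercise 5.5
```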

5.8
a = S^-1(x̄ - μ0) = (  227.273  -181.818 ) ( .564 - .55 )   (  2.636 )
                   ( -181.818   212.121 ) ( .603 - .60 ) = ( -1.909 )
t² = n(a'(x̄ - μ0))² / (a'Sa)
   = 42[(2.636)(.014) + (-1.909)(.003)]² / [ (2.636, -1.909) ( .0144  .0117 ) (  2.636 ) ]
                                                              ( .0117  .0146 ) ( -1.909 )
   = 1.31 = T²

5.9 a) Large sample 95% T² simultaneous confidence intervals:
   Weight: (69.56, 121.48)         Girth: (83.49, 103.29)
   Body length: (152.17, 176.59)   Head length: (16.55, 19.41)
   Neck: (49.61, 61.77)            Head width: (29.04, 33.22)
b) 95% confidence region determined by all μ1, μ4 such that
   (95.52 - μ1, 93.39 - μ4) (  .002799  -.006927 ) ( 95.52 - μ1 )  ≤  12.59/61 = .2064
                            ( -.006927   .019248 ) ( 93.39 - μ4 )
   Beginning at the center x̄' = (95.52, 93.39), the axes of the 95% confidence ellipsoid are:
   major axis:  ±√3695.52 √(12.59/61) (.939, .343)'
   minor axis:  ±√45.92 √(12.59/61) (-.343, .939)'
   (See the confidence ellipsoid in part d.)
c) Bonferroni 95% simultaneous confidence intervals (m = 6):
   t60(.025/6) = 2.728 (alternative multiplier is z(.025/6) = 2.638)
   Weight: (75.56, 115.48)         Girth: (86.27, 100.51)
   Body length: (155.00, 173.76)   Head length: (16.88, 19.08)
   Neck: (51.01, 60.37)            Head width: (29.52, 32.74)
d) Because of the high positive correlation between weight (X1) and girth (X4), the 95% confidence ellipse is smaller, and more informative, than the 95% Bonferroni rectangle.

5.9 (Continued)
Large sample 95% confidence regions.
[Figure: large sample simultaneous T² confidence ellipse and Bonferroni rectangle for (μ1, μ4); horizontal axis x1]

e) The Bonferroni 95% simultaneous confidence interval for the difference between mean head width and mean head length (μ6 - μ5) follows (m = 7, to allow for the new statement and the statements about individual means):
   t60(.025/7) = 2.783 (alternative multiplier is z(.025/7) = 2.690)
   x̄6 - x̄5 ± t60(.025/7) √((s66 - 2s56 + s55)/n) = (31.13 - 17.98) ± 2.783 √((21.26 - 2(13.88) + 9.95)/61)
   or
   12.49 ≤ μ6 - μ5 ≤ 13.81

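The interval in 5.9(e) is a plain arithmetic check. The sketch below uses the t multiplier quoted in the text, together with the standard library's NormalDist for the large-sample alternative z(.025/7).

```python
import math
from statistics import NormalDist

n = 61
xbar6, xbar5 = 31.13, 17.98          # mean head width, mean head length
s66, s56, s55 = 21.26, 13.88, 9.95   # entries of S used for the difference
t_mult = 2.783                       # t_60(.025/7), taken from the text
z_mult = NormalDist().inv_cdf(1 - 0.025 / 7)   # large-sample alternative

# Var(xbar6 - xbar5) is estimated by (s66 - 2 s56 + s55) / n.
se = math.sqrt((s66 - 2 * s56 + s55) / n)
diff = xbar6 - xbar5
lower, upper = diff - t_mult * se, diff + t_mult * se
print(round(lower, 2), round(upper, 2), round(z_mult, 3))
```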
5.10 a) 95% T² simultaneous confidence intervals:
   Length2: (130.65, 155.93)   Length4: (160.33, 185.95)
   Length3: (127.00, 191.58)   Length5: (155.37, 198.91)
b) 95% T² simultaneous intervals for change in length (ΔLength):
   ΔLength2-3: (-21.24, 53.24)
   ΔLength3-4: (-22.70, 50.42)
   ΔLength4-5: (-20.69, 28.69)
c) 95% confidence region determined by all μ2-3, μ4-5 such that
   (16 - μ2-3, 4 - μ4-5) ( .011024  .009386 ) ( 16 - μ2-3 )  ≤  72.96/7 = 10.42
                         ( .009386  .025135 ) (  4 - μ4-5 )
   where μ2-3 is the mean increase in length from year 2 to 3, and μ4-5 is the mean increase in length from year 4 to 5.
   Beginning at the center x̄' = (16, 4), the axes of the 95% confidence ellipsoid are:
   major axis:  ±√157.8 √(72.96/7) (.895, -.447)'
   minor axis:  ±√33.53 √(72.96/7) (.447, .895)'
   (See the confidence ellipsoid in part e.)

d) Bonferroni 95% simultaneous confidence intervals (m = 7):
   Length2: (137.37, 149.21)    Length4: (167.14, 179.14)
   Length3: (144.18, 174.40)    Length5: (166.95, 187.33)
   ΔLength2-3: (-1.43, 33.43)   ΔLength3-4: (-3.25, 30.97)
   ΔLength4-5: (-7.55, 15.55)

5.10 (Continued)
e) The Bonferroni 95% confidence rectangle is much smaller and more informative than the 95% confidence ellipse.
95% confidence regions.
[Figure: simultaneous T² confidence ellipse and Bonferroni rectangle for (μ2-3, μ4-5); horizontal axis μ2-3]

5.11 a) x̄' = (5.1856, 16.0700);
   S = ( 176.0042  287.2412 ),   S^-1 = (  .0508  -.0276 )
       ( 287.2412  527.8493 )           ( -.0276   .0169 )
   Eigenvalues and eigenvectors of S:
   λ1 = 688.759,  e1' = (.49, .87)
   λ2 = 15.094,   e2' = (.87, -.49)
   (p(n - 1)/(n - p)) F_p,n-p(.10) = (16/7) F_2,7(.10) = (16/7)(3.26) = 7.45
   [Figure: 90% confidence region for μ; horizontal axis x1 (Cr), vertical axis x2 (Sr)]
b) 90% T² intervals for the full data set:
   Cr: (-6.88, 17.25)   Sr: (-4.83, 36.97)
   μ' = (.30, 10) is a plausible value for μ.
5.11 (Continued)
c) Q-Q plots for the marginal distributions of both variables:
[Figure: Q-Q plot for Cr versus normal scores]
Since r = 0.627, we reject the hypothesis of normality for this variable at α = 0.01.
[Figure: Q-Q plot for Sr versus normal scores]
Since r = 0.818, we reject the hypothesis of normality for this variable at α = 0.01.

d) With data point (40.53, 73.68) removed,
   x̄ = (.7675, 8.8688)';
   S = (  .3786   1.0303 ),   S^-1 = ( 2.7518  -.0406 )
       ( 1.0303  69.8598 )           ( -.0406   .0149 )
   (p(n - 1)/(n - p)) F_p,n-p(.10) = (2(7)/6) F_2,6(.10) = (14/6)(3.46) = 8.07
   90% T² intervals:  Cr: (.15, 1.39)   Sr: (.47, 17.27)

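The T² intervals in 5.11(b) and (d) can be recomputed as below, again using the closed form of F_2,m(α). The sample sizes n = 9 and n = 8 are read off the constants 16/7 and 14/6 above. Because the F point is not rounded to 3.46 here, the constant for (d) prints as 8.08 instead of 8.07; the interval endpoints agree to two decimals.

```python
import math


def f2_upper(alpha, m):
    """Upper-alpha point of F with (2, m) d.f.: P(F > f) = (1 + 2f/m)^(-m/2)."""
    return (m / 2) * (alpha ** (-2 / m) - 1)


def t2_intervals(xbar, S, n, alpha):
    """Simultaneous T^2 intervals xbar_i +/- sqrt(c2 * s_ii / n) for p = 2 variables."""
    p = len(xbar)
    c2 = p * (n - 1) / (n - p) * f2_upper(alpha, n - p)
    half = [math.sqrt(c2 * S[i][i] / n) for i in range(p)]
    return c2, [(x - h, x + h) for x, h in zip(xbar, half)]


def show(c2, intervals):
    return round(c2, 2), [tuple(round(v, 2) for v in iv) for iv in intervals]


# Parts (a)/(b): full data set, n = 9.
full = t2_intervals([5.1856, 16.0700], [[176.0042, 287.2412], [287.2412, 527.8493]], 9, 0.10)
# Part (d): data point (40.53, 73.68) removed, n = 8.
reduced = t2_intervals([.7675, 8.8688], [[.3786, 1.0303], [1.0303, 69.8598]], 8, 0.10)
print(show(*full))
print(show(*reduced))
```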
5.12 Initial estimates are
   μ̃ = ( 4 ),   Σ̃ = ( 0.5  0.0  0.5 )
       ( 6 )          ( 0.0  2.0  0.0 )
       ( 2 )          ( 0.5  0.0  1.5 )
The first revised estimates are
   μ̃ = ( 4.0833 ),   Σ̃ = ( 0.6042  0.1667  0.8125 )
       ( 6.0000 )          ( 0.1667  2.5000  0.0000 )
       ( 2.2500 )          ( 0.8125  0.0000  1.9375 )

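The first EM iteration in 5.12 can be sketched in plain Python. The data matrix below is an assumption, since it is not reproduced in this manual. It is taken to be X = (3, 6, 0; 4, 4, 3; ·, 8, 3; 5, ·, ·), with None marking missing values. With those values and the initial estimates above, one E step (predict the missing values and their conditional covariances) followed by one M step reproduces the first revised estimates.

```python
def solve(A, b):
    """Solve A w = b for a small dense system by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def em_step(X, mu, sigma):
    """One EM iteration for a normal sample; None marks a missing value."""
    n, p = len(X), len(mu)
    T1 = [0.0] * p                        # sum of completed observations
    T2 = [[0.0] * p for _ in range(p)]    # sum of x x' plus conditional covariances
    for row in X:
        obs = [k for k in range(p) if row[k] is not None]
        mis = [k for k in range(p) if row[k] is None]
        x = list(row)
        # w[m] = Sigma_oo^-1 Sigma_om: regression coefficients of X_m on the observed part.
        w = {m: solve([[sigma[a][b] for b in obs] for a in obs],
                      [sigma[a][m] for a in obs]) if obs else [] for m in mis}
        for m in mis:
            x[m] = mu[m] + sum(c * (row[o] - mu[o]) for c, o in zip(w[m], obs))
        C = [[0.0] * p for _ in range(p)]
        for m in mis:
            for k in mis:
                C[m][k] = sigma[m][k] - sum(c * sigma[o][k] for c, o in zip(w[m], obs))
        for a in range(p):
            T1[a] += x[a]
            for b in range(p):
                T2[a][b] += x[a] * x[b] + C[a][b]
    new_mu = [t / n for t in T1]
    new_sigma = [[T2[a][b] / n - new_mu[a] * new_mu[b] for b in range(p)] for a in range(p)]
    return new_mu, new_sigma


# Assumed data (None = missing) and the initial estimates from 5.12.
X = [[3, 6, 0], [4, 4, 3], [None, 8, 3], [5, None, None]]
mu0 = [4, 6, 2]
sigma0 = [[0.5, 0.0, 0.5], [0.0, 2.0, 0.0], [0.5, 0.0, 1.5]]

mu1, sigma1 = em_step(X, mu0, sigma0)
print([round(v, 4) for v in mu1])
for r in sigma1:
    print([round(v, 4) for v in r])
```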
5.13 The χ² distribution with 3 degrees of freedom.
5.14 Length of one-at-a-time t-interval / Length of Bonferroni interval = t_{n-1}(α/2)/t_{n-1}(α/2m).
             m = 2    m = 4    m = 10
   n = 15    0.8546   0.7489   0.6449
   n = 25    0.8632   0.7644   0.6678
   n = 50    0.8691   0.7749   0.6836
   n = 100   0.8718   0.7799   0.6911
   n = ∞     0.8745   0.7847   0.6983

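As n → ∞, the ratios in 5.14 approach z(α/2)/z(α/2m). The sketch below computes this limit with the standard library's NormalDist; the results agree with the n = ∞ row to within one unit in the fourth decimal.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf   # standard normal quantile function


def limiting_ratio(m, alpha=0.05):
    """Limit of t_{n-1}(alpha/2) / t_{n-1}(alpha/(2m)) as n grows: z(alpha/2) / z(alpha/(2m))."""
    return z(1 - alpha / 2) / z(1 - alpha / (2 * m))


print({m: round(limiting_ratio(m), 4) for m in (2, 4, 10)})
```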
5.15 (a) E(Xij) = (1)pi + (0)(1 - pi) = pi.
   Var(Xij) = (1 - pi)²pi + (0 - pi)²(1 - pi) = pi(1 - pi).
(b) Cov(Xij, Xkj) = E(Xij Xkj) - E(Xij)E(Xkj) = 0 - pi pk = -pi pk.

5.16 (a) Using p̂j ± √χ²_4(0.05) √(p̂j(1 - p̂j)/n), the 95% confidence intervals for p1, p2, p3, p4, p5 are (0.221, 0.370), (0.258, 0.412), (0.098, 0.217), (0.029, 0.112) and (0.084, 0.198), respectively.
(b) Using p̂1 - p̂2 ± √χ²_4(0.05) √((p̂1(1 - p̂1) + p̂2(1 - p̂2) - 2p̂1p̂2)/n), the 95% confidence interval for p1 - p2 is (-0.118, 0.0394). There is no significant difference between the two proportions.
5.17 p̂1 = 0.585, p̂2 = 0.310, p̂3 = 0.105. Using p̂j ± √χ²_2(0.05) √(p̂j(1 - p̂j)/n), the 95% confidence intervals for p1, p2, p3 are (0.488, 0.682), (0.219, 0.401) and (0.044, 0.166), respectively.

5.18 (a) Hotelling's T² = 223.31. The critical point for the statistic (α = 0.05) is 8.33. We reject H0: μ = (500, 50, 30)'. That is, the group of students represented by the scores is significantly different from average college students.
(b) The lengths of the three axes are 23.730, 2.473 and 1.183, and the directions of the corresponding axes are
   ( 0.999 )    ( -0.037 )    ( -0.010 )
   ( 0.038 ),   (  0.994 ),   (  0.103 )
   ( 0.006 )    ( -0.104 )    (  0.995 )
(c) The data look fairly normal.

[Figure: normal Q-Q plots (versus normal score) and pairwise scatter plots for X1, X2 and X3]

5.19 a) The summary statistics are:
   n = 30,   x̄ = ( 1860.50 )   and   S = ( 124055.17    361621.03 )
                 ( 8354.13 )             ( 361621.03   3486330.90 )
   where S has eigenvalues and eigenvectors
   λ1 = 3407292,   e1' = (.105740, .994394)
   λ2 = 82748,     e2' = (.994394, -.105740)
   Then, since p(n - 1)/(n(n - p)) F_p,n-p(α) = 2(29)/(30(28)) F_2,28(.05) = .2306, a 95% confidence region for μ is given by the set of μ that satisfy
   (1860.50 - μ1, 8354.13 - μ2) ( 124055.17    361621.03 )^-1 ( 1860.50 - μ1 )  ≤  .2306
                                ( 361621.03   3486330.90 )    ( 8354.13 - μ2 )
   The half lengths of the axes of this ellipse are √.2306 √λ1 = 886.4 and √.2306 √λ2 = 138.1. Therefore the ellipse has the form

[Figure: 95% confidence ellipse for $(\mu_1, \mu_2)$, centered at $\bar{\mathbf{x}} = (1860.50, 8354.13)'$, with $x_1$ on the horizontal axis]
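The constant .2306 and the two half-lengths can be reproduced in a few lines. This is a minimal sketch that assumes $p = 2$, for which the upper point of $F_{2,m}$ has the closed form $(m/2)(\alpha^{-2/m} - 1)$; the function names are illustrative, not from the text. The exact $F_{2,28}(.05)$ is about 3.340 rather than the tabled 3.34, so the last digit can differ slightly.

```python
import math

def f_crit_2(m: int, alpha: float) -> float:
    """Upper-alpha point of F(2, m). For numerator df 2 the tail is
    P(F > f) = (1 + 2f/m) ** (-m/2), which inverts in closed form."""
    return (m / 2) * (alpha ** (-2 / m) - 1)

def ellipse_half_lengths(eigenvalues, n, alpha=0.05):
    """Constant c = p(n-1)/(n(n-p)) * F_{p,n-p}(alpha) with p = 2, and the
    half-lengths sqrt(c * lambda_i) of the confidence ellipse for the mean."""
    p = 2
    c = p * (n - 1) / (n * (n - p)) * f_crit_2(n - p, alpha)
    return c, [math.sqrt(c * lam) for lam in eigenvalues]

# Eigenvalues of S from part (a), n = 30
c, (major, minor) = ellipse_half_lengths([3407292, 82748], n=30)
print(f"c = {c:.4f}, half-lengths = {major:.1f}, {minor:.1f}")
```

The same function gives the $\pm .208$ and $\pm .021$ axes of Exercise 5.31 when called with the eigenvalues .15153 and .00160 and $n = 25$.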

b) Since $\boldsymbol{\mu}_0 = (2000, 10000)'$ does not fall within the 95% confidence ellipse, we would reject the hypothesis $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ at the 5% level. Thus, the data analyzed are not consistent with these values.
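Whether $\boldsymbol{\mu}_0$ lies inside the ellipse can also be checked numerically by evaluating $(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'\mathbf{S}^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$ and comparing it with the constant .2306 from part (a). A minimal sketch using the summary statistics of part (a); the helper name is hypothetical.

```python
def quad_form_inv_2x2(d, S):
    """Return d' S^{-1} d for a 2-vector d and a symmetric 2x2 matrix S,
    using S^{-1} = [[s22, -s12], [-s12, s11]] / det(S)."""
    (s11, s12), (_, s22) = S
    det = s11 * s22 - s12 * s12
    d1, d2 = d
    return (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det

xbar = (1860.50, 8354.13)
S = ((124055.17, 361621.03), (361621.03, 3486330.90))
mu0 = (2000.0, 10000.0)

q = quad_form_inv_2x2((xbar[0] - mu0[0], xbar[1] - mu0[1]), S)
inside = q <= 0.2306          # boundary constant from part (a)
print(f"(xbar - mu0)' S^-1 (xbar - mu0) = {q:.3f}; inside ellipse: {inside}")
```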

c) The Q-Q plots for both stiffness and bending strength (see below) show that marginal normality is not seriously violated. Also, the correlation coefficients for the test of normality are .990 (stiffness) and .989 (bending strength), so we fail to reject even at the 10% significance level. Finally, the scatter diagram (see below) does not indicate departure from bivariate normality. So, the bivariate normal distribution is a plausible probability model for these data.

[Figure: Q-Q plot for bending strength ($x_2$), correlation .989]
[Figure: Q-Q plot for stiffness ($x_1$), correlation .990]
[Figure: scatter diagram of stiffness ($x_1$) versus bending strength ($x_2$)]
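The correlation coefficient used above is the Q-Q plot correlation $r_Q$ between the ordered observations and standard normal quantiles; normality is rejected when $r_Q$ falls below the tabled critical point for the sample size. A minimal sketch, assuming probability levels $(j - 1/2)/n$; the raw stiffness and bending strength data are not reproduced here, so the usage lines use made-up values.

```python
import math
from statistics import NormalDist, fmean

def qq_correlation(data):
    """Correlation r_Q between the ordered observations x_(j) and the
    standard normal quantiles at probability levels (j - 1/2)/n."""
    x = sorted(data)
    n = len(x)
    q = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    xm, qm = fmean(x), fmean(q)
    sxq = sum((a - xm) * (b - qm) for a, b in zip(x, q))
    sxx = sum((a - xm) ** 2 for a in x)
    sqq = sum((b - qm) ** 2 for b in q)
    return sxq / math.sqrt(sxx * sqq)

# Illustrative (made-up) data: perfectly normal-looking vs. one large outlier
normal_like = [NormalDist().inv_cdf((j - 0.5) / 30) for j in range(1, 31)]
print(round(qq_correlation(normal_like), 4))
print(round(qq_correlation([1, 1, 1, 1, 10]), 3))
```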

5.20 (a) Yes, they are plausible, since the hypothesized vector $\boldsymbol{\mu}_0$ (marked in the plot) is inside the 95% confidence region.

[Figure: 95% simultaneous confidence region for the mean vector]

(b)
                       LOWER      UPPER
Bonferroni C.I.:      189.822    197.423
                      274.782    284.774
Simultaneous C.I.:    189.422    197.823
                      274.255    285.299

Simultaneous confidence intervals are larger than Bonferroni's confidence intervals. Simultaneous confidence intervals will touch the simultaneous confidence region from outside.
(c) The Q-Q plots suggest non-normality of $(X_1, X_2)$. Could try transforming $X_1$.

[Figure: Q-Q plots for X1 and X2 against normal scores]

5.21
HOTELLING T SQUARE = 9.0218    P-VALUE 0.3616

      N    MEAN      STDEV     T2 INTERVAL       BONFERRONI
x1    25   0.84380   0.11402   (.742, .946)      (.778, .909)
x2    25   0.81832   0.10685   (.723, .914)      (.757, .880)
x3    25   1.79268   0.28347   (1.540, 2.046)    (1.629, 1.956)
x4    25   1.73484   0.26360   (1.499, 1.970)    (1.583, 1.887)
x5    25   0.70440   0.10756   (.608, .800)      (.642, .766)
x6    25   0.69384   0.10295   (.602, .786)      (.635, .753)

The $T^2$ intervals use the constant 4.465. The Bonferroni intervals use $t_{24}(.00417) = 2.88$.
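Both sets of intervals have the form $\bar{x}_i \pm (\text{multiplier})\, s_i/\sqrt{n}$; only the multiplier differs. A minimal sketch that reproduces the constant 4.465 as $\sqrt{p(n-1)F_{p,n-p}(.05)/(n-p)}$ and the $x_1$ row of the table. The value $F_{6,19}(.05) \approx 2.63$ is an assumed table value, and the function names are illustrative.

```python
import math

def t2_multiplier(p, n, f_crit):
    """sqrt(p(n-1)/(n-p) * F_{p,n-p}(alpha)); f_crit is read from an F table."""
    return math.sqrt(p * (n - 1) / (n - p) * f_crit)

def interval(mean, sd, n, multiplier):
    """mean -/+ multiplier * sd / sqrt(n)."""
    half = multiplier * sd / math.sqrt(n)
    return mean - half, mean + half

n, p = 25, 6
m_t2 = t2_multiplier(p, n, 2.63)   # F_{6,19}(.05) ~ 2.63 (tabled value, assumed)
m_bonf = 2.88                      # t_24(.00417), as quoted above
lo, hi = interval(0.84380, 0.11402, n, m_t2)
blo, bhi = interval(0.84380, 0.11402, n, m_bonf)
print(f"multiplier {m_t2:.3f}; x1 T2: ({lo:.3f}, {hi:.3f}); Bonferroni: ({blo:.3f}, {bhi:.3f})")
```

With $p = 4$, $n = 30$ and $F_{4,26}(.05) \approx 2.74$, `t2_multiplier` gives the 3.496 used in Exercise 5.23.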

5.22 (a) After eliminating outliers, the approximation to normality is improved.
[Figure: Q-Q plots for X1, X2 and X3 and pairwise scatter plots, for the full data set and with outliers removed]
Outliers removed:
                      LOWER    UPPER
Bonferroni C.I.:      9.63     12.87
                      5.24      9.67
                      8.82     12.34
Simultaneous C.I.:    9.25     13.24
                      4.72     10.19
                      8.41     12.76

Simultaneous confidence intervals are larger than Bonferroni's confidence intervals.
(b) Full data set:
                      Lower    Upper
Bonferroni C.I.:      9.79     15.33
                      5.78     10.55
                      8.65     12.44
Simultaneous C.I.:    9.16     15.96
                      5.23     11.09
                      8.21     12.87

5.23 a) The data appear to be multivariate normal, as shown by the "straightness" of the Q-Q plots and chi-square plot below.

[Figure: normal Q-Q plots for MaxBrth, BasHght, BasLngth and NasHght (Q-Q correlations shown on the panels: .994, .976, .995, .992), and chi-square plot of the ordered distances $d_j^2$ against $\chi^2_4((j - .5)/30)$]

5.23 (Continued)
b) Bonferroni 95% simultaneous confidence intervals ($m = p = 4$): $t_{29}(.05/8) = 2.663$
MaxBrth:    (128.87, 133.87)
BasHght:    (131.42, 135.78)
BasLngth:   (96.32, 102.02)
NasHght:    (49.17, 51.89)

95% $T^2$ simultaneous confidence intervals: $\sqrt{\frac{4(29)}{26}F_{4,26}(.05)} = 3.496$
MaxBrth:    (128.08, 134.66)
BasHght:    (130.73, 136.47)
BasLngth:   (95.43, 102.91)
NasHght:    (48.75, 52.31)

The Bonferroni intervals are slightly shorter than the $T^2$ intervals.

5.24 Individual X charts for the Madison, Wisconsin, Police Department data

Variable    x̄         s        LCL                     UCL
LegalOT     3557.8    606.5    1738.1                  5377.4
ExtraOT     1478.4    1182.8   -2070.0 (use LCL = 0)   5026.9
Holdover    2676.9    1207.7   -946.2 (use LCL = 0)    6300.0
COA         13563.6   1303.2   9654.0                  17473.2
MeetOT      800.0     474.0    -622.1 (use LCL = 0)    2222.1
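The individual X chart limits are $\bar{x} \pm 3s$, with a negative lower limit replaced by 0 because overtime hours cannot be negative. A minimal sketch using the rounded $\bar{x}$ and $s$ from the table; because of that rounding, recomputed limits can differ from the table in the last digit (for example 1738.3 rather than 1738.1). The function name is illustrative.

```python
def x_chart_limits(xbar, s, floor_at_zero=True):
    """Individual X chart limits xbar -/+ 3s; a negative LCL is replaced
    by 0, as in the table, since hours cannot be negative."""
    lcl, ucl = xbar - 3 * s, xbar + 3 * s
    if floor_at_zero and lcl < 0:
        lcl = 0.0
    return lcl, ucl

summary = {"LegalOT": (3557.8, 606.5), "ExtraOT": (1478.4, 1182.8),
           "Holdover": (2676.9, 1207.7), "COA": (13563.6, 1303.2),
           "MeetOT": (800.0, 474.0)}
for name, (xbar, s) in summary.items():
    lcl, ucl = x_chart_limits(xbar, s)
    print(f"{name:9s} LCL = {lcl:8.1f}  UCL = {ucl:8.1f}")
```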

[Figure: The XBAR chart for x3 = holdover hours, observation numbers 1 to 16]
[Figure: The XBAR chart for x4 = COA hours, observation numbers 1 to 16]

Both holdover and COA hours are stable and in control.

5.25 Quality ellipse and $T^2$ chart for the holdover and COA overtime hours. All points are in control. The quality control 95% ellipse is
$$1.37\times10^{-6}(x_3 - 2677)^2 + 1.18\times10^{-6}(x_4 - 13564)^2 + 1.80\times10^{-6}(x_3 - 2677)(x_4 - 13564) = 5.99.$$
[Figure: The quality control 95% ellipse for holdover hours and COA hours]
[Figure: The 95% $T^2$ chart for holdover hours and COA hours, UCL = 5.991]

5.26 $T^2$ chart using the data on $x_1$ = legal appearances overtime hours, $x_2$ = extraordinary event overtime hours, and $x_3$ = holdover overtime hours. All points are in control.
[Figure: The 99% $T^2$ chart based on $x_1$, $x_2$ and $x_3$]

5.27 The 95% prediction ellipse for $x_3$ = holdover hours and $x_4$ = COA hours is
$$1.37\times10^{-6}(x_3 - 2677)^2 + 1.18\times10^{-6}(x_4 - 13564)^2 + 1.80\times10^{-6}(x_3 - 2677)(x_4 - 13564) = 8.51.$$
[Figure: The 95% control ellipse for future holdover hours and COA hours]
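The control ellipse of 5.25 and the prediction ellipse of 5.27 share the same quadratic form and differ only in the right-hand constant. A minimal sketch, assuming $n = 16$ observations (the observation numbers in the 5.24 charts) and the usual constants $\chi^2_2(.05)$ for the control ellipse and $\frac{(n^2-1)p}{n(n-p)}F_{p,n-p}(.05)$ for a future observation. For two degrees of freedom both the chi-square and the $F_{2,m}$ upper points have closed forms, so no table is needed.

```python
import math

def f_crit_2(m, alpha):
    """Upper-alpha point of F(2, m) (closed form for numerator df 2)."""
    return (m / 2) * (alpha ** (-2 / m) - 1)

def control_ellipse_constant(alpha):
    """chi-square(2) upper-alpha point: P(X > x) = exp(-x/2) gives x = -2 ln(alpha)."""
    return -2 * math.log(alpha)

def prediction_ellipse_constant(n, alpha, p=2):
    """(n^2 - 1) p / (n (n - p)) * F_{p, n-p}(alpha); valid here only for p = 2."""
    return (n * n - 1) * p / (n * (n - p)) * f_crit_2(n - p, alpha)

print(round(control_ellipse_constant(0.05), 3), round(prediction_ellipse_constant(16, 0.05), 2))
```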

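5.28 (a)
$$\bar{\mathbf{x}} = \begin{bmatrix} -.506 \\ -.207 \\ -.062 \\ .698 \\ -.065 \\ -.032 \end{bmatrix}, \quad \mathbf{S} = \begin{bmatrix} .0626 & .0616 & .0474 & .0083 & .0197 & .0031 \\ .0616 & .0924 & .0268 & -.0008 & .0228 & .0155 \\ .0474 & .0268 & .1446 & .0078 & .0211 & -.0049 \\ .0083 & -.0008 & .0078 & .1086 & .0221 & .0066 \\ .0197 & .0228 & .0211 & .0221 & .3428 & .0146 \\ .0031 & .0155 & -.0049 & .0066 & .0146 & .0366 \end{bmatrix}$$
The $T^2$ chart follows.

(b) Multivariate observations 20, 33, 36, 39 and 40 exceed the upper control limit.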

The individual variables that contribute significantly to the out-of-control data points are indicated in the table below.

Point Greater Than UCL    Variable    P-Value
20                        X1          0.0000
                          X2          0.0001
                          X3          0.0000
                          X4          0.0105
                          X5          0.0210
                          X6          0.0032
33                        X4          0.0088
                          X6          0.0000
36                        X1          0.0000
                          X2          0.0000
                          X3          0.0000
                          X4          0.0088
39                        X2          0.0000
                          X4          0.0343
                          X5          0.0198
                          X6          0.0001
40                        X1          0.0054
                          X2          0.0000
                          X3          0.0114
                          X4          0.0013
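Each point on the $T^2$ chart is $(\mathbf{x}_j - \bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}_j - \bar{\mathbf{x}})$. The individual observations are not reproduced in this solution, so the sketch below only sets up $\bar{\mathbf{x}}$ and $\mathbf{S}$ from part (a) and a generic $T^2$ function. It solves $\mathbf{S}\mathbf{y} = \mathbf{d}$ by Gaussian elimination instead of forming $\mathbf{S}^{-1}$, which is a numerical-stability choice; the function names are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def t2_statistic(x, xbar, S):
    """T^2 = (x - xbar)' S^{-1} (x - xbar), computed via a linear solve."""
    d = [a - m for a, m in zip(x, xbar)]
    return sum(di * yi for di, yi in zip(d, solve(S, d)))

xbar = [-.506, -.207, -.062, .698, -.065, -.032]
S = [[.0626, .0616, .0474, .0083, .0197, .0031],
     [.0616, .0924, .0268, -.0008, .0228, .0155],
     [.0474, .0268, .1446, .0078, .0211, -.0049],
     [.0083, -.0008, .0078, .1086, .0221, .0066],
     [.0197, .0228, .0211, .0221, .3428, .0146],
     [.0031, .0155, -.0049, .0066, .0146, .0366]]
```

For an observation `x_j`, `t2_statistic(x_j, xbar, S)` is the value plotted and compared with the chart's upper control limit.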

5.29 $T^2 = 12.472$. Since $T^2 = 12.472 < \frac{29(6)}{24}F_{6,24}(.05) = 7.25(2.51) = 18.2$, we do not reject $H_0: \boldsymbol{\mu} = \mathbf{0}$ at the 5% level.

5.30 (a) Large sample 95% Bonferroni intervals for the indicated means follow.

Multiplier is $t_{49}(.05/2(6)) \approx z(.0042) = 2.635$

Petroleum: .766 ± 2.635(.925/√50) = .766 ± .345 → (.421, 1.111)
Natural Gas: .508 ± 2.635(.753/√50) = .508 ± .282 → (.226, .790)
Coal: .438 ± 2.635(.414/√50) = .438 ± .155 → (.283, .593)
Nuclear: .161 ± 2.635(.207/√50) = .161 ± .076 → (.085, .237)
Total: 1.873 ± 2.635(1.978/√50) = 1.873 ± .738 → (1.135, 2.611)
Petroleum - Natural Gas: .258 ± 2.635(.392/√50) = .258 ± .146 → (.112, .404)

(b) Large sample 95% simultaneous $T^2$ intervals for the indicated means follow.

Multiplier is $\sqrt{\chi^2_4(.05)} = \sqrt{9.49} = 3.081$

Petroleum: .766 ± 3.081(.925/√50) = .766 ± .404 → (.362, 1.170)
Natural Gas: .508 ± 3.081(.753/√50) = .508 ± .330 → (.178, .838)
Coal: .438 ± 3.081(.414/√50) = .438 ± .182 → (.256, .620)
Nuclear: .161 ± 3.081(.207/√50) = .161 ± .089 → (.072, .250)
Total: 1.873 ± 3.081(1.978/√50) = 1.873 ± .863 → (1.010, 2.736)
Petroleum - Natural Gas: .258 ± 3.081(.392/√50) = .258 ± .171 → (.087, .429)

Since the multiplier, 3.081, for the 95% simultaneous $T^2$ intervals is larger than the multiplier, 2.635, for the Bonferroni intervals, and everything else for a given interval is the same, the $T^2$ intervals will be wider than the Bonferroni intervals.
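Both multipliers can be computed without tables: the Bonferroni one from the normal quantile at $.05/12$, and the simultaneous one from $\chi^2_4(.05)$, whose tail has the closed form $e^{-x/2}(1 + x/2)$. A minimal sketch assuming $n = 50$ (from $t_{49}$) and four variables; the normal quantile at exactly $.05/12$ is about 2.638 rather than the tabled 2.635, so interval endpoints can move by .001.

```python
import math
from statistics import NormalDist

def chi2_4df_upper(alpha):
    """Upper-alpha point of chi-square with 4 df, found by bisection on the
    closed-form tail P(X > x) = exp(-x/2) * (1 + x/2)."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.exp(-mid / 2) * (1 + mid / 2) > alpha:
            lo = mid          # tail still too heavy: the quantile lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

n = 50
z_bonf = NormalDist().inv_cdf(1 - 0.05 / (2 * 6))   # m = 6 intervals
m_sim = math.sqrt(chi2_4df_upper(0.05))              # p = 4 variables

def large_sample_interval(mean, sd, mult):
    """mean -/+ mult * sd / sqrt(n) with n = 50."""
    half = mult * sd / math.sqrt(n)
    return mean - half, mean + half

print(f"z = {z_bonf:.3f}, sqrt(chi2_4) = {m_sim:.3f}")
print("Petroleum, Bonferroni:", large_sample_interval(.766, .925, z_bonf))
print("Petroleum, simultaneous:", large_sample_interval(.766, .925, m_sim))
```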

5.31 (a) The power transformation $\hat{\lambda} = 0$ (i.e., logarithm) makes the duration observations more nearly normal. The power transformation $\hat{\lambda} = -0.5$ (i.e., reciprocal of square root) makes the man/machine time observations more nearly normal. (See Exercise 4.41.) For the transformed observations, say $y_1 = \ln x_1$, $y_2 = 1/\sqrt{x_2}$, where $x_1$ is duration and $x_2$ is man/machine time,
$$\bar{\mathbf{y}} = \begin{bmatrix} 2.171 \\ .240 \end{bmatrix}, \quad \mathbf{S} = \begin{bmatrix} .1513 & -.0058 \\ -.0058 & .0018 \end{bmatrix}, \quad \mathbf{S}^{-1} = \begin{bmatrix} 7.524 & 23.905 \\ 23.905 & 624.527 \end{bmatrix}$$
The eigenvalues of $\mathbf{S}$ are $\hat{\lambda}_1 = .15153$, $\hat{\lambda}_2 = .00160$ with corresponding eigenvectors $\mathbf{e}_1' = (.99925, -.03866)$, $\mathbf{e}_2' = (.03866, .99925)$. Beginning at center $\bar{\mathbf{y}}$, the axes of the 95% confidence ellipsoid are

major axis: $\pm\sqrt{\hat{\lambda}_1}\sqrt{\frac{2(24)}{25(23)}F_{2,23}(.05)}\,\mathbf{e}_1 = \pm .208\,\mathbf{e}_1$

minor axis: $\pm\sqrt{\hat{\lambda}_2}\sqrt{\frac{2(24)}{25(23)}F_{2,23}(.05)}\,\mathbf{e}_2 = \pm .021\,\mathbf{e}_2$

The ratio of the lengths of the major and minor axes, .416/.042 = 9.9, indicates the confidence ellipse is elongated in the $\mathbf{e}_1$ direction.

(b) $t_{24}(.05/2(2)) = 2.391$, so the 95% confidence intervals for the two component means (of the transformed observations) are:
$\bar{y}_1 \pm t_{24}(.0125)\sqrt{s_{11}/n} = 2.171 \pm 2.391\sqrt{.1513/25} = 2.171 \pm .186 \rightarrow (1.985, 2.357)$
$\bar{y}_2 \pm t_{24}(.0125)\sqrt{s_{22}/n} = .240 \pm 2.391\sqrt{.0018/25} = .240 \pm .020 \rightarrow (.220, .260)$

Chapter 6
6.1
Eigenvalues and eigenvectors of $\mathbf{S}_d$ are:
$\lambda_1 = 449.778$, $\mathbf{e}_1' = (.333, .943)$
$\lambda_2 = 168.082$, $\mathbf{e}_2' = (.943, -.333)$
Ellipse centered at $\bar{\mathbf{d}}' = (-9.36, 13.27)$. Half length of major axis is 20.57 units. Half length of minor axis is 12.58 units. Major and minor axes lie in the $\mathbf{e}_1$ and $\mathbf{e}_2$ directions, respectively.

Yes, the test answers the question: Is $\boldsymbol{\delta} = \mathbf{0}$ inside the 95% confidence ellipse?

6.2 Using a critical value $t_{n-1}(\alpha/2p) = t_{10}(0.0125) = 2.6338$,
                      LOWER     UPPER
Bonferroni C.I.:     -20.57      1.85
                      -2.97     29.52
Simultaneous C.I.:   -22.45      3.73
                      -5.70     32.25

Simultaneous confidence intervals are larger than Bonferroni's confidence intervals.

6.3 The 95% Bonferroni intervals are
                      LOWER     UPPER
Bonferroni C.I.:     -21.92     -2.08
                      -3.31     20.56
Simultaneous C.I.:   -23.70     -0.30
                      -5.50     22.70

Since the hypothesized vector $\boldsymbol{\delta} = \mathbf{0}$ (denoted as * in the plot) is outside the joint confidence region, we reject $H_0: \boldsymbol{\delta} = \mathbf{0}$. Bonferroni C.I. are consistent with this result. After the elimination of the outlier, the difference between pairs became significant.

[Figure: Problem 6.3, 95% simultaneous confidence region for the delta vector (axes MU11-MU21 and MU12-MU22)]

6.4
(a) Hotelling's $T^2 = 10.215$. Since the critical point with $\alpha = 0.05$ is 9.459, we reject $H_0: \boldsymbol{\delta} = \mathbf{0}$.
(b)
                          Lower    Upper
Bonferroni C.I.:          -1.09    -0.04
                          -0.02     0.64
T² Simultaneous C.I.:     -1.18     0.07
                          -0.10     0.69

[Figure 1: 95% Confidence Ellipse and Simultaneous $T^2$ Intervals for the Mean Difference]

(c) The Q-Q plots for ln(DiffBOD) and ln(DiffSS) are shown below. Marginal normality cannot be rejected for either variable. The $\chi^2$ plot is not straight (with at least one apparent bivariate outlier) and, although the sample size ($n = 11$) is small, it is difficult to argue for bivariate normality.

[Figure: Q-Q plots for ln(DiffBOD) and ln(DiffSS); chi-square plot of the ordered distances]

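6.5 a) $H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$, where $\mathbf{C} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}$ and $\boldsymbol{\mu}' = (\mu_1, \mu_2, \mu_3)$.

$\mathbf{C}\bar{\mathbf{x}} = \begin{bmatrix} -11.2 \\ 6.9 \end{bmatrix}$, $\mathbf{CSC}' = \begin{bmatrix} 55.5 & -32.6 \\ -32.6 & 66.4 \end{bmatrix}$

$T^2 = n(\mathbf{C}\bar{\mathbf{x}})'(\mathbf{CSC}')^{-1}(\mathbf{C}\bar{\mathbf{x}}) = 90.4$; $n = 40$; $q = 3$

$\frac{(n-1)(q-1)}{n-q+1}F_{q-1,n-q+1}(.05) = \frac{39(2)}{38}(3.25) = 6.67$

Since $T^2 = 90.4 > 6.67$, reject $H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$.

b) 95% simultaneous confidence intervals:
$\mu_1 - \mu_2$: $(46.1 - 57.3) \pm \sqrt{6.67}\sqrt{55.5/40} = -11.2 \pm 3.0$
$\mu_2 - \mu_3$: $6.9 \pm 3.3$
$\mu_1 - \mu_3$: $-4.3 \pm 3.3$

The means are all different from one another.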
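The repeated-measures statistic only needs $\mathbf{C}\bar{\mathbf{x}}$ and $\mathbf{CSC}'$, and with $q - 1 = 2$ contrasts the critical value uses $F_{2,38}$, which has a closed form. A minimal sketch using the quantities in part (a); the helper names are illustrative. Because $\mathbf{C}\bar{\mathbf{x}}$ and $\mathbf{CSC}'$ are rounded, $T^2$ comes out near 90.5 rather than exactly 90.4.

```python
import math

def f_crit_2(m, alpha):
    """Upper-alpha point of F(2, m) (closed form for numerator df 2)."""
    return (m / 2) * (alpha ** (-2 / m) - 1)

def quad_form_inv_2x2(d, S):
    """d' S^{-1} d for a 2-vector d and a symmetric 2x2 matrix S."""
    (s11, s12), (_, s22) = S
    det = s11 * s22 - s12 * s12
    return (s22 * d[0] ** 2 - 2 * s12 * d[0] * d[1] + s11 * d[1] ** 2) / det

n, q = 40, 3
Cx = (-11.2, 6.9)
CSC = ((55.5, -32.6), (-32.6, 66.4))

T2 = n * quad_form_inv_2x2(Cx, CSC)
crit = (n - 1) * (q - 1) / (n - q + 1) * f_crit_2(n - q + 1, 0.05)
half_12 = math.sqrt(crit) * math.sqrt(CSC[0][0] / n)   # interval for mu1 - mu2
print(f"T2 = {T2:.1f}, critical value = {crit:.2f}, mu1 - mu2: -11.2 +/- {half_12:.1f}")
```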

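6.6 a) Treatment 2: sample mean vector $\begin{bmatrix} 2 \\ 4 \end{bmatrix}$, sample covariance matrix $\begin{bmatrix} 1 & -3/2 \\ -3/2 & 3 \end{bmatrix}$

Treatment 3: sample mean vector $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$, sample covariance matrix $\begin{bmatrix} 2 & -4/3 \\ -4/3 & 4/3 \end{bmatrix}$

$\mathbf{S}_{pooled} = \begin{bmatrix} 1.6 & -1.4 \\ -1.4 & 2.0 \end{bmatrix}$

b) $T^2 = (2-3,\ 4-2)\left[\left(\frac{1}{3}+\frac{1}{4}\right)\begin{bmatrix} 1.6 & -1.4 \\ -1.4 & 2.0 \end{bmatrix}\right]^{-1}\begin{bmatrix} 2-3 \\ 4-2 \end{bmatrix} = 3.88$

$\frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1}(.01) = \frac{(5)2}{4}(18) = 45$

Since $T^2 = 3.88 < 45$, do not reject $H_0: \boldsymbol{\mu}_2 - \boldsymbol{\mu}_3 = \mathbf{0}$ at the $\alpha = .01$ level.

c) 99% simultaneous confidence intervals:
$\mu_{21} - \mu_{31}$: $(2-3) \pm \sqrt{45}\sqrt{\left(\frac{1}{3}+\frac{1}{4}\right)1.6} = -1 \pm 6.5$
$\mu_{22} - \mu_{32}$: $2 \pm 7.2$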
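The whole two-sample test can be recomputed from the raw observations. This sketch assumes that treatments 2 and 3 here are the same as in Exercise 6.8, whose observation arrays give exactly the means and pooled covariance above; that pairing is an assumption, and the helper names are illustrative. With $p = 2$ the critical point $F_{2,4}(.01) = 18$ has a closed form.

```python
def mean_cov(rows):
    """Sample mean vector and covariance matrix (divisor n - 1) for bivariate rows."""
    n = len(rows)
    m = [sum(r[k] for r in rows) / n for k in range(2)]
    S = [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
          for j in range(2)] for i in range(2)]
    return m, S

def f_crit_2(m, alpha):
    """Upper-alpha point of F(2, m) (closed form for numerator df 2)."""
    return (m / 2) * (alpha ** (-2 / m) - 1)

t2 = [(3, 3), (1, 6), (2, 3)]           # treatment 2 (assumed from Exercise 6.8)
t3 = [(2, 3), (5, 1), (3, 1), (2, 3)]   # treatment 3 (assumed from Exercise 6.8)
m2, S2 = mean_cov(t2)
m3, S3 = mean_cov(t3)
n2, n3, p = len(t2), len(t3), 2

Sp = [[((n2 - 1) * S2[i][j] + (n3 - 1) * S3[i][j]) / (n2 + n3 - 2)
       for j in range(2)] for i in range(2)]
d = (m2[0] - m3[0], m2[1] - m3[1])
det = Sp[0][0] * Sp[1][1] - Sp[0][1] ** 2
quad = (Sp[1][1] * d[0] ** 2 - 2 * Sp[0][1] * d[0] * d[1] + Sp[0][0] * d[1] ** 2) / det
T2 = quad / (1 / n2 + 1 / n3)
crit = (n2 + n3 - 2) * p / (n2 + n3 - p - 1) * f_crit_2(n2 + n3 - p - 1, 0.01)
print(f"T2 = {T2:.3f}, critical value = {crit:.1f}")
```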

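6.7 $T^2 = (74.4,\ 201.6)\left[\left(\frac{1}{45}+\frac{1}{55}\right)\begin{bmatrix} 10963.7 & 21505.5 \\ 21505.5 & 63661.3 \end{bmatrix}\right]^{-1}\begin{bmatrix} 74.4 \\ 201.6 \end{bmatrix} = 16.1$

$\frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1}(.05) = 6.26$

Since $T^2 = 16.1 > 6.26$, reject $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$ at the $\alpha = .05$ level.

$\hat{\mathbf{a}} \propto \mathbf{S}_{pooled}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = \begin{bmatrix} .0017 \\ .0026 \end{bmatrix}$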
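The coefficient vector and $T^2$ are linked: $T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'\hat{\mathbf{a}}/(1/n_1 + 1/n_2)$ with $\hat{\mathbf{a}} = \mathbf{S}_{pooled}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$. A minimal sketch; it uses $s_{22} = 63661.3$ for the pooled matrix, the value that reproduces $T^2 = 16.1$.

```python
d = (74.4, 201.6)                                   # xbar1 - xbar2
Sp = ((10963.7, 21505.5), (21505.5, 63661.3))       # pooled covariance matrix
n1, n2 = 45, 55

(s11, s12), (_, s22) = Sp
det = s11 * s22 - s12 * s12
# a = Sp^{-1} d, using the explicit 2x2 inverse [[s22, -s12], [-s12, s11]] / det
a = ((s22 * d[0] - s12 * d[1]) / det, (-s12 * d[0] + s11 * d[1]) / det)
T2 = (a[0] * d[0] + a[1] * d[1]) / (1 / n1 + 1 / n2)
print(f"a = ({a[0]:.4f}, {a[1]:.4f}), T2 = {T2:.1f}")
```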

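6.8 a) For the first variable:

observation = mean + treatment effect + residual

$$\begin{pmatrix} 6 & 5 & 8 & 4 & 7 \\ 3 & 1 & 2 & & \\ 2 & 5 & 3 & 2 & \end{pmatrix} = \begin{pmatrix} 4 & 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & & \\ 4 & 4 & 4 & 4 & \end{pmatrix} + \begin{pmatrix} 2 & 2 & 2 & 2 & 2 \\ -2 & -2 & -2 & & \\ -1 & -1 & -1 & -1 & \end{pmatrix} + \begin{pmatrix} 0 & -1 & 2 & -2 & 1 \\ 1 & -1 & 0 & & \\ -1 & 2 & 0 & -1 & \end{pmatrix}$$

$SS_{obs} = 246$, $SS_{mean} = 192$, $SS_{tr} = 36$, $SS_{res} = 18$

For the second variable:

$$\begin{pmatrix} 7 & 9 & 6 & 9 & 9 \\ 3 & 6 & 3 & & \\ 3 & 1 & 1 & 3 & \end{pmatrix} = \begin{pmatrix} 5 & 5 & 5 & 5 & 5 \\ 5 & 5 & 5 & & \\ 5 & 5 & 5 & 5 & \end{pmatrix} + \begin{pmatrix} 3 & 3 & 3 & 3 & 3 \\ -1 & -1 & -1 & & \\ -3 & -3 & -3 & -3 & \end{pmatrix} + \begin{pmatrix} -1 & 1 & -2 & 1 & 1 \\ -1 & 2 & -1 & & \\ 1 & -1 & -1 & 1 & \end{pmatrix}$$

$SS_{obs} = 402$, $SS_{mean} = 300$, $SS_{tr} = 84$, $SS_{res} = 18$

Cross product contributions: $275 = 240 + 48 + (-13)$ (total = mean + treatment + residual)

b) MANOVA table:

Source of variation    SSP                                                   d.f.
Treatment              $\mathbf{B} = \begin{bmatrix} 36 & 48 \\ 48 & 84 \end{bmatrix}$       $3 - 1 = 2$
Residual               $\mathbf{W} = \begin{bmatrix} 18 & -13 \\ -13 & 18 \end{bmatrix}$     $5 + 3 + 4 - 3 = 9$
Total (corrected)      $\begin{bmatrix} 54 & 35 \\ 35 & 102 \end{bmatrix}$                   11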

c) $\Lambda^* = \frac{|\mathbf{W}|}{|\mathbf{B}+\mathbf{W}|} = \frac{155}{4283} = .0362$

Using Table 6.3 with $p = 2$ and $g = 3$,
$$\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right)\left(\frac{\sum n_\ell - g - 1}{g-1}\right) = 17.02.$$
Since $F_{4,16}(.01) = 4.77$, we conclude that treatment differences exist at the $\alpha = .01$ level.

Alternatively, using Bartlett's procedure,
$$-\left(n - 1 - \frac{p+g}{2}\right)\ln\Lambda^* = -\left(12 - 1 - \frac{5}{2}\right)\ln .0362 = 28.21.$$
Since $\chi^2_4(.01) = 13.28$, we again conclude that treatment differences exist at the $\alpha = .01$ level.

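6.9 For any matrix $\mathbf{C}$,
$$\bar{\mathbf{d}} = \frac{1}{n}\sum_{j=1}^{n}\mathbf{d}_j = \mathbf{C}\left(\frac{1}{n}\sum_{j=1}^{n}\mathbf{x}_j\right) = \mathbf{C}\bar{\mathbf{x}}$$
and $\mathbf{d}_j - \bar{\mathbf{d}} = \mathbf{C}(\mathbf{x}_j - \bar{\mathbf{x}})$, so
$$\mathbf{S}_d = \frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{d}_j - \bar{\mathbf{d}})(\mathbf{d}_j - \bar{\mathbf{d}})' = \mathbf{C}\left(\frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'\right)\mathbf{C}' = \mathbf{CSC}'$$

6.10 Since $(n_1 + \cdots + n_g)\bar{\mathbf{x}} = n_1\bar{\mathbf{x}}_1 + \cdots + n_g\bar{\mathbf{x}}_g$,
$$\sum_{\ell=1}^{g} n_\ell\,\bar{\mathbf{x}}(\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})' = \bar{\mathbf{x}}\left((\bar{\mathbf{x}}_1 - \bar{\mathbf{x}})n_1 + \cdots + (\bar{\mathbf{x}}_g - \bar{\mathbf{x}})n_g\right)' = \bar{\mathbf{x}}\left(n_1\bar{\mathbf{x}}_1 + \cdots + n_g\bar{\mathbf{x}}_g - \bar{\mathbf{x}}(n_1 + \cdots + n_g)\right)'$$
$$= \bar{\mathbf{x}}\left((n_1 + \cdots + n_g)\bar{\mathbf{x}} - \bar{\mathbf{x}}(n_1 + \cdots + n_g)\right)' = \mathbf{0}$$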

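6.11 $L(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}) = L(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})\,L(\boldsymbol{\mu}_2, \boldsymbol{\Sigma})$
$$= \frac{1}{(2\pi)^{(n_1+n_2)p/2}|\boldsymbol{\Sigma}|^{(n_1+n_2)/2}}\exp\left\{-\frac{1}{2}\left[\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\left((n_1-1)\mathbf{S}_1 + (n_2-1)\mathbf{S}_2\right)\right) + n_1(\bar{\mathbf{x}}_1 - \boldsymbol{\mu}_1)'\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}_1 - \boldsymbol{\mu}_1) + n_2(\bar{\mathbf{x}}_2 - \boldsymbol{\mu}_2)'\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}_2 - \boldsymbol{\mu}_2)\right]\right\}$$
using (4-16) and (4-17). The likelihood is maximized with respect to $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ at $\hat{\boldsymbol{\mu}}_1 = \bar{\mathbf{x}}_1$ and $\hat{\boldsymbol{\mu}}_2 = \bar{\mathbf{x}}_2$, respectively, and with respect to $\boldsymbol{\Sigma}$ at
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n_1+n_2}\left((n_1-1)\mathbf{S}_1 + (n_2-1)\mathbf{S}_2\right) = \frac{n_1+n_2-2}{n_1+n_2}\mathbf{S}_{pooled}$$
(for the maximization with respect to $\boldsymbol{\Sigma}$, see Result 4.10 with $b = \frac{n_1+n_2}{2}$ and $\mathbf{B} = (n_1-1)\mathbf{S}_1 + (n_2-1)\mathbf{S}_2$).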

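6.13 a) and b) For the first variable:

observation = mean + factor 1 effect + factor 2 effect + residual

$$\begin{pmatrix} 6 & 4 & 8 & 2 \\ 3 & -3 & 4 & -4 \\ -3 & -4 & 3 & -4 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} + \begin{pmatrix} 4 & 4 & 4 & 4 \\ -1 & -1 & -1 & -1 \\ -3 & -3 & -3 & -3 \end{pmatrix} + \begin{pmatrix} 1 & -2 & 4 & -3 \\ 1 & -2 & 4 & -3 \\ 1 & -2 & 4 & -3 \end{pmatrix} + \begin{pmatrix} 0 & 1 & -1 & 0 \\ 2 & -1 & 0 & -1 \\ -2 & 0 & 1 & 1 \end{pmatrix}$$

$SS_{tot} = 220$, $SS_{mean} = 12$, $SS_{fac\,1} = 104$, $SS_{fac\,2} = 90$, $SS_{res} = 14$

For the second variable:

$$\begin{pmatrix} 8 & 6 & 12 & 6 \\ 8 & 2 & 3 & 3 \\ 2 & -5 & -3 & -6 \end{pmatrix} = \begin{pmatrix} 3 & 3 & 3 & 3 \\ 3 & 3 & 3 & 3 \\ 3 & 3 & 3 & 3 \end{pmatrix} + \begin{pmatrix} 5 & 5 & 5 & 5 \\ 1 & 1 & 1 & 1 \\ -6 & -6 & -6 & -6 \end{pmatrix} + \begin{pmatrix} 3 & -2 & 1 & -2 \\ 3 & -2 & 1 & -2 \\ 3 & -2 & 1 & -2 \end{pmatrix} + \begin{pmatrix} -3 & 0 & 3 & 0 \\ 1 & 0 & -2 & 1 \\ 2 & 0 & -1 & -1 \end{pmatrix}$$

$SS_{tot} = 440$, $SS_{mean} = 108$, $SS_{fac\,1} = 248$, $SS_{fac\,2} = 54$, $SS_{res} = 30$

Sum of cross products:
$SCP_{tot} = SCP_{mean} + SCP_{fac\,1} + SCP_{fac\,2} + SCP_{res}$
$227 = 36 + 148 + 51 - 8$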
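c) MANOVA table:

Source of variation    SSP                                                    d.f.
Factor 1               $\begin{bmatrix} 104 & 148 \\ 148 & 248 \end{bmatrix}$     $g - 1 = 3 - 1 = 2$
Factor 2               $\begin{bmatrix} 90 & 51 \\ 51 & 54 \end{bmatrix}$         $b - 1 = 4 - 1 = 3$
Residual               $\begin{bmatrix} 14 & -8 \\ -8 & 30 \end{bmatrix}$         $(g-1)(b-1) = 6$
Total (corrected)      $\begin{bmatrix} 208 & 191 \\ 191 & 332 \end{bmatrix}$     $gb - 1 = 11$

d) We reject $H_0: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \boldsymbol{\tau}_3 = \mathbf{0}$ at the $\alpha = .05$ level since
$$-\left[(g-1)(b-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -\left(6 - \frac{3-2}{2}\right)\ln\frac{|\mathbf{SSP}_{res}|}{|\mathbf{SSP}_{fac\,1} + \mathbf{SSP}_{res}|} = -5.5\ln\left(\frac{356}{13204}\right) = 19.87 > \chi^2_4(.05) = 9.49$$
and conclude there are factor 1 effects.

We also reject $H_0: \boldsymbol{\beta}_1 = \boldsymbol{\beta}_2 = \boldsymbol{\beta}_3 = \boldsymbol{\beta}_4 = \mathbf{0}$ at the $\alpha = .05$ level since
$$-\left[(g-1)(b-1) - \frac{p+1-(b-1)}{2}\right]\ln\Lambda^* = -\left(6 - \frac{3-3}{2}\right)\ln\frac{|\mathbf{SSP}_{res}|}{|\mathbf{SSP}_{fac\,2} + \mathbf{SSP}_{res}|} = -6\ln\left(\frac{356}{6887}\right) = 17.77 > \chi^2_6(.05) = 12.59$$
and conclude there are factor 2 effects.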
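The decompositions in parts (a) and (b) follow mechanically from the observation table: the grand mean, the row (factor 1) and column (factor 2) mean deviations, and whatever is left over as residual. A minimal sketch for a two-way layout with one observation per cell; the observation arrays are the ones shown above, and the function name is illustrative.

```python
def two_way_decomposition(X):
    """Split a g x b table (one observation per cell) into
    grand mean + factor 1 (row) effects + factor 2 (column) effects + residuals,
    and return the corresponding sums of squares."""
    g, b = len(X), len(X[0])
    grand = sum(map(sum, X)) / (g * b)
    row = [sum(r) / b - grand for r in X]
    col = [sum(X[i][k] for i in range(g)) / g - grand for k in range(b)]
    res = [[X[i][k] - grand - row[i] - col[k] for k in range(b)] for i in range(g)]
    ss = {
        "tot": sum(v * v for r in X for v in r),
        "mean": g * b * grand ** 2,
        "fac1": b * sum(t * t for t in row),
        "fac2": g * sum(t * t for t in col),
        "res": sum(e * e for r in res for e in r),
    }
    return grand, row, col, res, ss

X1 = [[6, 4, 8, 2], [3, -3, 4, -4], [-3, -4, 3, -4]]    # first variable
X2 = [[8, 6, 12, 6], [8, 2, 3, 3], [2, -5, -3, -6]]     # second variable
print(two_way_decomposition(X1)[4])
print(two_way_decomposition(X2)[4])
```

Multiplying the residual arrays of the two variables element by element and summing gives $SCP_{res} = -8$; the other cross products follow the same way.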

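6.14 b) MANOVA table:

Source of variation    SSP                                                    d.f.
Factor 1               $\begin{bmatrix} 496 & 184 \\ 184 & 208 \end{bmatrix}$     2
Factor 2               $\begin{bmatrix} 36 & 24 \\ 24 & 36 \end{bmatrix}$         3
Interaction            $\begin{bmatrix} 32 & 0 \\ 0 & 44 \end{bmatrix}$           6
Residual               $\begin{bmatrix} 312 & -84 \\ -84 & 400 \end{bmatrix}$     12
Total (corrected)      $\begin{bmatrix} 876 & 124 \\ 124 & 688 \end{bmatrix}$     23

c) Since
$$-\left[gb(n-1) - \frac{p+1-(g-1)(b-1)}{2}\right]\ln\Lambda^* = -13.5\ln\frac{|\mathbf{SSP}_{res}|}{|\mathbf{SSP}_{int} + \mathbf{SSP}_{res}|} = -13.5\ln(.808) = 2.88 < \chi^2_{12}(.05) = 21.03,$$
we do not reject $H_0: \boldsymbol{\gamma}_{11} = \boldsymbol{\gamma}_{12} = \cdots = \boldsymbol{\gamma}_{34} = \mathbf{0}$ (no interaction effects) at the $\alpha = .05$ level.

Since
$$-\left[gb(n-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -11.5\ln\frac{|\mathbf{SSP}_{res}|}{|\mathbf{SSP}_{fac\,1} + \mathbf{SSP}_{res}|} = -11.5\ln(.2447) = 16.19 > \chi^2_4(.05) = 9.49,$$
we reject $H_0: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \boldsymbol{\tau}_3 = \mathbf{0}$ (no factor 1 effects) at the $\alpha = .05$ level.

Since
$$-\left[gb(n-1) - \frac{p+1-(b-1)}{2}\right]\ln\Lambda^* = -12\ln\frac{|\mathbf{SSP}_{res}|}{|\mathbf{SSP}_{fac\,2} + \mathbf{SSP}_{res}|} = -12\ln(.7949) = 2.75 < \chi^2_6(.05) = 12.59,$$
we do not reject $H_0: \boldsymbol{\beta}_1 = \boldsymbol{\beta}_2 = \boldsymbol{\beta}_3 = \boldsymbol{\beta}_4 = \mathbf{0}$ (no factor 2 effects) at the $\alpha = .05$ level.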
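All three tests in part (c) follow one pattern: Wilks' lambda $|\mathbf{SSP}_{res}|/|\mathbf{SSP}_{effect} + \mathbf{SSP}_{res}|$ and Bartlett's multiplier $gb(n-1) - (p + 1 - \nu_h)/2$, where $\nu_h$ is the effect's degrees of freedom. A minimal sketch with $g = 3$, $b = 4$, $n = 2$ and the SSP matrices from the MANOVA table; the helper names are illustrative.

```python
import math

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def add2(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

g, b, n, p = 3, 4, 2, 2
SSP_res = [[312, -84], [-84, 400]]
effects = {   # name: (SSP matrix, hypothesis degrees of freedom)
    "interaction": ([[32, 0], [0, 44]], (g - 1) * (b - 1)),
    "factor 1": ([[496, 184], [184, 208]], g - 1),
    "factor 2": ([[36, 24], [24, 36]], b - 1),
}

results = {}
for name, (ssp, df_h) in effects.items():
    lam = det2(SSP_res) / det2(add2(ssp, SSP_res))          # Wilks' lambda
    stat = -(g * b * (n - 1) - (p + 1 - df_h) / 2) * math.log(lam)
    results[name] = (lam, stat, p * df_h)                   # chi-square d.f. = p * df_h
    print(f"{name:12s} Lambda* = {lam:.4f}  statistic = {stat:.2f}  d.f. = {p * df_h}")
```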

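6.15 Example 6.11: $g = b = 2$, $n = 5$.

a) For $H_0: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \mathbf{0}$, $\Lambda^* = .3819$. Since
$$-\left[gb(n-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -14.5\ln(.3819) = 13.96 > \chi^2_3(.05) = 7.81,$$
we reject $H_0$ at the $\alpha = .05$ level. For $H_0: \boldsymbol{\beta}_1 = \boldsymbol{\beta}_2 = \mathbf{0}$, $\Lambda^* = .5230$ and $-14.5\ln(.5230) = 9.40$. Again we reject $H_0$ at the $\alpha = .05$ level. These results are consistent with the exact F tests.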

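6.16 $H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$; $H_1: \mathbf{C}\boldsymbol{\mu} \ne \mathbf{0}$, where
$$\mathbf{C} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$

Summary statistics:
$$\bar{\mathbf{x}} = \begin{bmatrix} 1906.1 \\ 1749.5 \\ 1509.1 \\ 1725.0 \end{bmatrix}, \quad \mathbf{S} = \begin{bmatrix} 105625 & 94759 & 87249 & 94268 \\ 94759 & 101761 & 76166 & 81193 \\ 87249 & 76166 & 91809 & 90333 \\ 94268 & 81193 & 90333 & 104329 \end{bmatrix}$$

$T^2 = n(\mathbf{C}\bar{\mathbf{x}})'(\mathbf{CSC}')^{-1}(\mathbf{C}\bar{\mathbf{x}}) = 254.7$

$\frac{(n-1)(q-1)}{n-q+1}F_{q-1,n-q+1}(\alpha) = \frac{(29)(3)}{27}F_{3,27}(.05) = 9.54$

Since $T^2 = 254.7 > 9.54$, we reject $H_0$ at the $\alpha = .05$ level.

The 95% simultaneous confidence interval for "dynamic" versus "static" means, $(\mu_1 + \mu_2) - (\mu_3 + \mu_4)$, is, with $\mathbf{c}' = (1, 1, -1, -1)$,
$$\mathbf{c}'\bar{\mathbf{x}} \pm \sqrt{\frac{(n-1)(q-1)}{n-q+1}F_{q-1,n-q+1}(\alpha)}\sqrt{\frac{\mathbf{c}'\mathbf{S}\mathbf{c}}{n}} = 421.5 \pm 174.5 \rightarrow (247.0, 596.0)$$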
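The "dynamic versus static" interval only needs $\mathbf{c}'\bar{\mathbf{x}}$ and $\mathbf{c}'\mathbf{S}\mathbf{c}$. A minimal sketch using the summary statistics and the critical value 9.54 quoted above, with $n = 30$ implied by $(n-1)(q-1)/(n-q+1) = 29(3)/27$; the variable names are illustrative.

```python
import math

xbar = [1906.1, 1749.5, 1509.1, 1725.0]
S = [[105625, 94759, 87249, 94268],
     [94759, 101761, 76166, 81193],
     [87249, 76166, 91809, 90333],
     [94268, 81193, 90333, 104329]]
c = [1, 1, -1, -1]            # (mu1 + mu2) - (mu3 + mu4)
n = 30
crit = 9.54                   # (n-1)(q-1)/(n-q+1) F_{3,27}(.05), quoted above

estimate = sum(ci * xi for ci, xi in zip(c, xbar))
var_c = sum(c[i] * S[i][j] * c[j] for i in range(4) for j in range(4))   # c' S c
half = math.sqrt(crit) * math.sqrt(var_c / n)
print(f"{estimate:.1f} +/- {half:.1f} -> ({estimate - half:.1f}, {estimate + half:.1f})")
```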

6.17 (a)
[Figure: mean scores plotted by format (Arabic, Words) and party (Different, Same)]

Contrasts:
Party main: $(\mu_2 + \mu_4) - (\mu_1 + \mu_3)$
Format main: $(\mu_3 + \mu_4) - (\mu_1 + \mu_2)$
Interaction: $(\mu_2 + \mu_3) - (\mu_1 + \mu_4)$

Contrast matrix:
$$\mathbf{C} = \begin{bmatrix} -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ -1 & 1 & 1 & -1 \end{bmatrix}$$

Since $T^2 = 135.9 > \frac{31(3)}{29}(2.93) = 9.40$, reject $H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$ (no treatment effects) at the 5% level.

(b) 95% simultaneous $T^2$ intervals for the contrasts:
Party main effect: $-206.4 \pm \sqrt{9.40}\sqrt{\frac{20{,}598.6}{32}} \rightarrow (-280.3, -125.1)$
Format main effect: $-307 \pm \sqrt{9.40}\sqrt{\frac{42{,}939.5}{32}} \rightarrow (-411.4, -186.9)$
Interaction effect: $22.4 \pm \sqrt{9.40}\sqrt{\frac{9{,}818.5}{32}} \rightarrow (-32.3, 75.0)$

No interaction effect. Party effect: "different" responses are slower than "same" responses. Format effect: "words" are slower than "Arabic".
(c) The M model of numerical cognition is a reasonable population model for the scores.
(d) The multivariate normal model is a reasonable model for the scores corresponding to the party contrast, the format contrast and the interaction contrast.

6.18
[Figures: chi-square plots of the ordered distances and normal Q-Q plots of the variables, for the female turtles and for the male turtles]

Mean vectors for females (X1BAR) and males (X2BAR):
X1BAR        X2BAR
4.9006593    4.7254436
4.6229089    4.4775738
3.9402858    3.7031858

SPOOLED
0.0187388  0.0140655  0.0165386
0.0140655  0.0113036  0.0127148
0.0165386  0.0127148  0.0158563

TSQ          CVTSQ       F            CVF          PVALUE
85.052001    8.833461    27.118029    2.8164658    4.355E-10

The linear combination most responsible for rejection of $H_0$ has coefficient vector:
COEFFVEC
-43.72677
-8.710687
67.546415

95% simultaneous CI for the difference in female and male means
LOWER        UPPER
0.0577676    0.2926638
0.0541167    0.2365537
0.1290622    0.3451377

Bonferroni CI
LOWER        UPPER
0.0768599    0.2735714
0.0689451    0.2217252
0.1466248    0.3275751
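The F and CVF columns are rescaled versions of TSQ and CVTSQ: under equal means and a common covariance, $\frac{n_1+n_2-p-1}{(n_1+n_2-2)p}T^2$ has an $F_{p,\,n_1+n_2-p-1}$ distribution. A minimal sketch, assuming 24 female and 24 male turtles, which is consistent with the printed values; the function name is illustrative.

```python
def t2_to_f(t2, n1, n2, p):
    """F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T^2, which has an
    F_{p, n1+n2-p-1} distribution under H0 (equal means, common covariance)."""
    return (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2

n1 = n2 = 24     # assumption: 24 female and 24 male turtles
p = 3
F = t2_to_f(85.052001, n1, n2, p)       # TSQ -> F
cvf = t2_to_f(8.833461, n1, n2, p)      # CVTSQ -> CVF
print(f"F = {F:.6f}, CVF = {cvf:.7f}")
```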

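6.19 a)
$$\bar{\mathbf{x}}_1 = \begin{bmatrix} 12.219 \\ 8.113 \\ 9.590 \end{bmatrix}, \quad \bar{\mathbf{x}}_2 = \begin{bmatrix} 10.106 \\ 10.763 \\ 18.168 \end{bmatrix}$$
$$\mathbf{S}_1 = \begin{bmatrix} 23.0134 & 12.3664 & 2.9066 \\ 12.3664 & 17.5441 & 4.7731 \\ 2.9066 & 4.7731 & 13.9633 \end{bmatrix}, \quad \mathbf{S}_2 = \begin{bmatrix} 4.3623 & .7599 & 2.3621 \\ .7599 & 25.8512 & 7.6857 \\ 2.3621 & 7.6857 & 46.6543 \end{bmatrix}$$
$$\mathbf{S}_{pooled} = \begin{bmatrix} 15.8112 & 7.8550 & 2.6959 \\ 7.8550 & 20.7458 & 5.8960 \\ 2.6959 & 5.8960 & 26.5750 \end{bmatrix}$$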

(('1
+ pool
l)S ed
)-1 =
n1, n2

.8745 -. 1525

. 5640 ,
L--i'0939 -.4084 -,0203)
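$H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$. Since
$$T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'\left[\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\mathbf{S}_{pooled}\right]^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = 50.92 > \frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1}(.01) = \frac{(57)(3)}{55}F_{3,55}(.01) = 13,$$
we reject $H_0$ at the $\alpha = .01$ level. There is a difference in the (mean) cost vectors between gasoline trucks and diesel trucks.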

wax x = -1.ae

,a.) '" _s-l
(- -)
c: pool.ed
-1 - (
-2 3.5,8 J .
-4.48

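c) 99% simultaneous confidence intervals are:
$\mu_{11} - \mu_{21}$: $2.113 \pm 3.790$
$\mu_{12} - \mu_{22}$: $-2.650 \pm 4.341$
$\mu_{13} - \mu_{23}$: $-8.578 \pm 4.913$

d) Assumption $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2$. Since $\mathbf{S}_1$ and $\mathbf{S}_2$ are quite different, it may not be reasonable to pool. However, using "large sample" theory ($n_1 = 36$, $n_2 = 23$), we have, by Result 6.4, that
$$\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right)'\left[\frac{1}{n_1}\mathbf{S}_1 + \frac{1}{n_2}\mathbf{S}_2\right]^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right)$$
is approximately $\chi^2_p$. Since
$$(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'\left[\frac{1}{n_1}\mathbf{S}_1 + \frac{1}{n_2}\mathbf{S}_2\right]^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = 43.15 > \chi^2_3(.01) = 11.34,$$
we reject $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$ at the $\alpha = .01$ level. This is consistent with the result in part (a).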

6.20 (a)
[Figure: scatter plot of tail length versus wing length, with observation 31 labeled]

(b) The output below shows that the analysis does not differ when we delete observation 31 or when we set it equal to 184. Both tests reject the null hypothesis of no mean difference. The most critical linear combination leading to the rejection of $H_0$ has coefficient vector $(-3.490238, 2.07955)'$, and the linear combination most responsible for the rejection of $H_0$ is the Tail difference.

(c) Results below.

Comparing Mean Vectors from Two Populations
[Obs. 31 Deleted]

T2            C
25.005014     5.9914645

Reject H0. There is mean difference.

95% simultaneous confidence intervals:
LABELCI          LICIMD       LSCIMD
Mean Diff. 1:    -11.76436    -1.161905
Mean Diff. 2:    -5.985685     8.3392202

RESULT
Coefficient Vector:
COEF
-3.490238    (Tail difference)
2.07955      (Wing difference)

Comparing Mean Vectors from Two Populations
[Obs. 31 = 184]

T2            C
25.662531     5.9914645

Reject H0. There is mean difference.

95% simultaneous confidence intervals:
LABELCI          LICIMD       LSCIMD
Mean Diff. 1:    -11.78669    -1.27998
Mean Diff. 2:    -6.003431     8.1812088

RESULT
Coefficient Vector:
COEF
-3.574268
2.1220203

[Figure: 95% Confidence Ellipse About the Mean Vector]

(d) Female birds are generally larger, since the confidence interval bounds for the difference in tails (Male - Female) are negative and the confidence interval for the difference in wings includes zero, indicating no significant difference.
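The critical value C = 5.9914645 printed in both outputs matches $\chi^2_2(.05)$. With two degrees of freedom the chi-square tail is $e^{-x/2}$, so the upper point is $-2\ln\alpha$ and the comparison needs no table. A minimal sketch using the two printed $T^2$ values.

```python
import math

alpha = 0.05
crit = -2 * math.log(alpha)    # chi-square(2) upper point, since P(X > x) = exp(-x/2)

runs = {"Obs. 31 deleted": 25.005014, "Obs. 31 = 184": 25.662531}
for label, t2 in runs.items():
    print(f"{label}: T2 = {t2:.3f}, C = {crit:.7f}, reject H0: {t2 > crit}")
```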

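6.21 (a) The (4,2) and (4,4) entries in $\mathbf{S}_1$ and $\mathbf{S}_2$ differ considerably. However, $n_1 = n_2$, so the large sample approximation amounts to pooling.

(b) $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$ and $H_1: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 \ne \mathbf{0}$

$T^2 = 15.830 > \frac{38(4)}{35}F_{4,35}(.05) = 11.47$, so we reject $H_0$ at the $\alpha = .05$ level.

(c) $\hat{\mathbf{a}} = \mathbf{S}_{pooled}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = \begin{bmatrix} -.241 \\ -3.74 \\ .16 \\ .01 \end{bmatrix}$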

(d) Looking at the coefficients $\hat{a}_i\sqrt{s_{ii,pooled}}$, which apply to the standardized variables, we see that $X_2$: long term interest rate has the largest coefficient and therefore might be useful in classifying a bond as "high" or "medium" quality.

(e) From (b), $T^2 = 15.830$. Have $p = 4$ and $\nu = \frac{4+16}{.53556} = 37.344$ so, at the 5% level, the critical value is
$$\frac{\nu p}{\nu - p + 1}F_{p,\nu-p+1}(.05) = \frac{37.344(4)}{37.344 - 4 + 1}F_{4,34.344}(.05) = \frac{149.376}{34.344}(2.647) = 11.513$$
Since $T^2 = 15.830 > 11.513$, reject $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$, the same conclusion reached in (b). Notice the critical value here is only slightly larger than the critical value in (b).
6.22 (a) The sample means for female and male are:

¡ 0.3136 J ( 0.3972 J

_
jS8' XM
_ 5.3296
XF'.1.1
= 2.3152
= 3.6876 .
38.1548 49.3404
The Hotelling's $T^2 = 96.487 > 11.00$, where 11.00 is the critical point corresponding to $\alpha = 0.05$. Therefore, we reject $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$. The coefficient vector of the linear combination most responsible for rejection is $(-95.600, 6.145, 5.737, -0.762)'$.

(b) The 95% simultaneous C.I. for female mean - male mean:
(-0.1697234, 0.0025234)
(-1.4650835, 1.1634835)
(-1.8760572, -0.8687428)
(-17.032834, -5.3383659)

(c) We cannot extend the obtained result to the population of persons in their mid-twenties. First, this was a self-selected sample of volunteers (friends) and is not even a random sample of graduate students. Further, graduate students are probably more sedentary than typical persons of their age.

6.23 $n_1 = n_2 = n_3 = 50$; $p = 2$, $g = 3$ (sepal width and petal width responses only).

.30~ 1 .18576

~l = (3.428 J; S =. (.143£4 -.00474 J

x =
-2

· 0418 J
.326 J ;
( 1U70

~3 =

2.026 J
(2.974

.,

S2 =

(.09860

.03920

.0471i.4 J

S =
3

(.1 0368

.07563

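MANOVA Table:

Source       d.f.    SSP
Treatment    2       $\mathbf{B} = \begin{bmatrix} 11.344 & -21.820 \\ -21.820 & 75.352 \end{bmatrix}$
Residual     147     $\mathbf{W} = \begin{bmatrix} 16.950 & 4.125 \\ 4.125 & 14.729 \end{bmatrix}$
Total        149     $\mathbf{B}+\mathbf{W} = \begin{bmatrix} 28.294 & -17.695 \\ -17.695 & 90.081 \end{bmatrix}$

$\Lambda^* = \frac{|\mathbf{W}|}{|\mathbf{B}+\mathbf{W}|} = \frac{232.64}{2235.64} = .104$

Since $\left(\frac{\sum n_\ell - p - 2}{p}\right)\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) = 153.3 > 2.37 = F_{4,292}(.05)$, we reject $H_0: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \boldsymbol{\tau}_3 = \mathbf{0}$ at the $\alpha = .05$ level.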

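6.24 Wilks' lambda: $\Lambda^* = .8301$. Since $g = 3$, $\frac{90 - 4 - 2}{4}\left(\frac{1-\sqrt{.8301}}{\sqrt{.8301}}\right) = 2.049$ is an F value with 8 and 168 degrees of freedom. Since p-value $= P(F > 2.049) = .044$, we would just reject the null hypothesis $H_0: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \boldsymbol{\tau}_3 = \mathbf{0}$ at the 5% level, implying there is a time period effect.

F statistics and p-values for ANOVAs:
             F      p-value
MaxBrth:     3.66   .030
BasHght:     0.47   .629
BasLgth:     3.84   .025
NasHght:     0.10   .901

Any differences over time periods are probably due to changes in maximum breadth of skull (MaxBrth) and basialveolar length of skull (BasLgth).

95% Bonferroni simultaneous intervals: $m = pg(g-1)/2 = 12$, $t_{87}(.05/24) = 2.94$

MaxBrth:
$\hat{\tau}_{11} - \hat{\tau}_{21}$: $-1 \pm 2.94\sqrt{\frac{1785.4}{87}\left(\frac{1}{30}+\frac{1}{30}\right)} \rightarrow -1 \pm 3.44$
$\hat{\tau}_{11} - \hat{\tau}_{31}$: $-3.1 \pm 3.44$
$\hat{\tau}_{21} - \hat{\tau}_{31}$: $-2.1 \pm 3.44$

BasHght:
$\hat{\tau}_{12} - \hat{\tau}_{22}$: $0.9 \pm 2.94\sqrt{\frac{1924.3}{87}\left(\frac{1}{30}+\frac{1}{30}\right)} \rightarrow 0.9 \pm 3.57$
$\hat{\tau}_{12} - \hat{\tau}_{32}$: $-0.2 \pm 3.57$
$\hat{\tau}_{22} - \hat{\tau}_{32}$: $-1.1 \pm 3.57$

BasLgth:
$\hat{\tau}_{13} - \hat{\tau}_{23}$: $0.10 \pm 2.94\sqrt{\frac{2153}{87}\left(\frac{1}{30}+\frac{1}{30}\right)} \rightarrow 0.10 \pm 3.78$
$\hat{\tau}_{13} - \hat{\tau}_{33}$: $3.14 \pm 3.78$
$\hat{\tau}_{23} - \hat{\tau}_{33}$: $3.03 \pm 3.78$

NasHght:
$\hat{\tau}_{14} - \hat{\tau}_{24}$: $0.30 \pm 2.94\sqrt{\frac{840.2}{87}\left(\frac{1}{30}+\frac{1}{30}\right)} \rightarrow 0.30 \pm 2.36$
$\hat{\tau}_{14} - \hat{\tau}_{34}$: $-0.03 \pm 2.36$
$\hat{\tau}_{24} - \hat{\tau}_{34}$: $-0.33 \pm 2.36$

All the simultaneous intervals include 0. Evidence for changes in skull size over time is marginal. If changes exist, then these changes might be in maximum breadth and basialveolar length of skull from time periods 1 to 3.

The usual MANOVA assumptions appear to be satisfied for these data.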
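Both calculations above are short enough to check directly: the exact F transformation of Wilks' lambda for $g = 3$ groups (Table 6.3), and the Bonferroni half-widths $t\sqrt{\frac{w_{ii}}{n-g}\left(\frac{1}{n_k}+\frac{1}{n_\ell}\right)}$. A minimal sketch using $n = 90$, $p = 4$, 30 skulls per period and the $w_{ii}$ values quoted above; the function name is illustrative.

```python
import math

lam, n, p, g = 0.8301, 90, 4, 3
# For g = 3 groups, Table 6.3 gives an exact F transformation of Wilks' lambda
F = (n - p - 2) / p * (1 - math.sqrt(lam)) / math.sqrt(lam)
df = (2 * p, 2 * (n - p - 2))

def bonferroni_half_width(t_crit, w_ii, n_total, g, n_k, n_l):
    """t * sqrt(w_ii / (n - g) * (1/n_k + 1/n_l)) for a difference tau_ki - tau_li."""
    return t_crit * math.sqrt(w_ii / (n_total - g) * (1 / n_k + 1 / n_l))

print(f"F = {F:.3f} on {df} d.f.")
for name, w in [("MaxBrth", 1785.4), ("BasHght", 1924.3), ("BasLgth", 2153), ("NasHght", 840.2)]:
    print(name, round(bonferroni_half_width(2.94, w, n, g, 30, 30), 2))
```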

6.25 Without transforming the data, $\Lambda^* = \frac{|\mathbf{W}|}{|\mathbf{B}+\mathbf{W}|} = .1159$ and $F = 18.98$. After transformation, $\Lambda^* = .1198$ and $F = 18.52$. Both F values exceed $F_{10,98}(.05) = 1.93$.

There is a clear need for transforming the data to make the hypothesis tenable.
6.26 To test for parallelism, consider $H_{01}: \mathbf{C}\boldsymbol{\mu}_1 = \mathbf{C}\boldsymbol{\mu}_2$ with $\mathbf{C}$ given by (6-61).

C(~l - ~2) = - .167 ;

.947
2.014

(CS
c1r1 =
poo 1 es

.616J
1 . 144

L.674

.036
(-- .413J

2.341

Since $T^2 = 9.58 > c^2 = 8.0$, we reject $H_0$ at the $\alpha = .05$ level. The excess electrical usage of the test group was much lower than that of the control group for the 11 A.M., 1 P.M. and 3 P.M. hours. The similar 9 A.M. usage for the two groups contradicts the parallelism hypothesis.
6.27 a) Plots of the husband and wife profiles look similar but seem disparate for the level of "companionate love" that you feel for your partner.

b) Parallelism hypothesis $H_0: \mathbf{C}\boldsymbol{\mu}_1 = \mathbf{C}\boldsymbol{\mu}_2$ with $\mathbf{C}$ given by (6-61).

$C(\bar{x}_1 - \bar{x}_2) = (-.17, -.33, .13)'$

.733
.870

CSpool~dCI
= (.685

.029 J

-.028
.095

For $\alpha = .05$, $c^2 = 8.7$ (see (6-62)). Since $T^2 = 19.58 > c^2 = 8.7$, we reject $H_0$ at the $\alpha = .05$ level.


6.28 $T^2 = 106.13 > 16.59$. We reject $H_0: \mu_1 - \mu_2 = 0$ at the 5% significance level. There is a significant difference in the two species.

Sample means for L. torrens and L. carteri:

L. torrens   L. carteri   Difference
  96.457       99.343       -2.886
  42.914       43.743       -0.829
  35.371       39.314       -3.943
  14.514       14.657       -0.143
  25.629       30.000       -4.371
   9.571        9.657       -0.086
   9.714        9.371        0.343

Pooled Sample Covariance Matrix:
36.008

2.426 2.649
1.053 0.934
6.437 0.692 1.615 0.211 0.671
3.039 2.407 0.274 0.229
13.767 0.565 0.637
1.213 0.914
0.990

14.595 6.078
16.639 2.764

3.675 9.573
2.992 6.101

The linear combination most responsible for rejection of
$H_0$: L. torrens mean - L. carteri mean = 0 is

(0.006,0.151, -0.854, 0.268, -~.383, -2.187, 2.971)'
95% simultaneous C.I. for L. torrens mean - L. carteri mean:

LOWER    UPPER
-8.73     2.96
-4.80     3.14
-6.41    -1.47
-1.84     1.55
-7.98    -0.76
-1.16     0.99
-0.63     1.31

The third and fifth components are most responsible for rejecting $H_0$. The chi-square plots look fairly straight.

Figure: Chi-square plots for L. torrens and L. carteri.

6.29 (a) Summary statistics:

XBAR = (0.02548, 0.05784, 0.01056)'

S =
0.00366259   0.00482862   0.00154159
0.00482862   0.01628931   0.00304801
0.00154159   0.00304801   0.00602526

Hotelling's $T^2 = 5.946$. The critical point is 9.979 and we fail to reject $H_0: \mu_1 - \mu_2 = 0$ at the 5% significance level.

(b), (c)
                       LOWER      UPPER
Bonferroni C.I.:     -0.0057     0.0566
                     -0.0079     0.1235
                     -0.0294     0.0505
Simultaneous C.I.:   -0.0128     0.0637
                     -0.0228     0.1385
                     -0.0385     0.0596

6.30 HOTELLING T SQUARE = 9.0218   P-VALUE = 0.3616

      N    MEAN       STDEV     T2 INTERVAL         BONFERRONI
x1    24    0.00012   0.04817   (-.0443, .0445)     (-.0283, .0285)
x2    24   -0.00325   0.02751   (-.0286, .0221)     (-.0195, .0130)
x3    24   -0.0072    0.1030    (-.1020, .0876)     (-.0679, .0535)
x4    24   -0.0123    0.0625    (-.0701, .0455)     (-.0493, .0247)
x5    24    0.01513   0.03074   (-.0130, .0436)     (-.0030, .0333)
x6    24    0.00017   0.04689   (-.0430, .0434)     (-.0275, .0278)

The Bonferroni intervals use t(.00417) = 2.89 and the T2 intervals use the constant 4.516.
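A minimal sketch of how the two sets of intervals in 6.30 are built from the printed means and standard deviations, assuming both have the form mean ± constant × STDEV/√24 with the constants 2.89 and 4.516 quoted above. The names `interval`, `BONF_CONST` and `T2_CONST` are hypothetical.

```python
import math

N = 24
BONF_CONST = 2.89   # t value quoted for the Bonferroni intervals
T2_CONST = 4.516    # constant quoted for the simultaneous T^2 intervals

MEANS = [0.00012, -0.00325, -0.0072, -0.0123, 0.01513, 0.00017]
STDEVS = [0.04817, 0.02751, 0.1030, 0.0625, 0.03074, 0.04689]


def interval(mean, sd, const, n=N):
    """mean +/- const * sd / sqrt(n)."""
    half = const * sd / math.sqrt(n)
    return mean - half, mean + half


if __name__ == "__main__":
    for k, (m, s) in enumerate(zip(MEANS, STDEVS), start=1):
        lo_t2, hi_t2 = interval(m, s, T2_CONST)
        lo_b, hi_b = interval(m, s, BONF_CONST)
        print(f"x{k}: T2 ({lo_t2:.4f}, {hi_t2:.4f})  Bonferroni ({lo_b:.4f}, {hi_b:.4f})")
```

Since 2.89 < 4.516, every Bonferroni interval is narrower than the corresponding $T^2$ interval, which is the usual trade-off when only a few linear combinations are of interest.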


6.31 (a) Two-factor MANOVA of peanuts data

E = Error SS&CP Matrix
         X1          X2          X3
X1    104.205      49.365      76.48
X2     49.365     352.105     121.995
X3     76.48      121.995      94.835

H = Type III SS&CP Matrix for FACTOR1 (Location)
         X1               X2             X3
X1     0.7008333333   -10.6575         7.1291666667
X2   -10.6575         162.0675      -108.4125
X3     7.1291666667  -108.4125        72.520833333

MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of no Overall FACTOR1 Effect
H = Type III SS&CP Matrix for FACTOR1   E = Error SS&CP Matrix

S=1   M=0.5   N=1

Statistic                  Value         F         Num DF   Den DF   Pr > F
Wilks' Lambda              0.10651620    11.1843   3        4        0.0205
Pillai's Trace             0.89348380    11.1843   3        4        0.0205
Hotelling-Lawley Trace     8.38824348    11.1843   3        4        0.0205
Roy's Greatest Root        8.38824348    11.1843   3        4        0.0205

H = Type III SS&CP Matrix for FACTOR2 (Variety)
         X1           X2           X3
X1    196.115      365.1825      42.6275
X2    365.1825    1089.015      414.655
X3     42.6275     414.655      284.10166667

MANOVA Test Criteria and F Approximations for
the Hypothesis of no Overall FACTOR2 Effect
H = Type III SS&CP Matrix for FACTOR2   E = Error SS&CP Matrix

S=2   M=0   N=1

Statistic                  Value          F         Num DF   Den DF   Pr > F
Wilks' Lambda              0.01244417     10.6191   6        8        0.0019
Pillai's Trace             1.70910921      9.7924   6       10        0.0011
Hotelling-Lawley Trace    21.37567504     10.6878   6        6        0.0055
Roy's Greatest Root       18.18761127     30.3127   3        5        0.0012

H = Type III SS&CP Matrix for FACTOR1*FACTOR2
         X1              X2          X3
X1    205.10166667    363.6675    107.78583333
X2    363.6675        780.695     254.22
X3    107.78583333    254.22       85.95166667


MANOVA Test Criteria and F Approximations for
the Hypothesis of no Overall FACTOR1*FACTOR2 Effect
H = Type III SS&CP Matrix for FACTOR1*FACTOR2   E = Error SS&CP Matrix

S=2   M=0   N=1

Statistic                  Value         F         Num DF   Den DF   Pr > F
Wilks' Lambda              0.07429984    3.5582    6        8        0.0508
Pillai's Trace             1.29086073    3.0339    6       10        0.0587
Hotelling-Lawley Trace     7.54429038    3.7721    6        6        0.0655
Roy's Greatest Root        6.82409388   11.3735    3        5        0.0113
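As a check on the printed Wilks' lambdas, $\Lambda^* = |E|/|E + H|$ can be evaluated directly from the SS&CP matrices listed above. The sketch below is plain Python with a hand-written 3 × 3 determinant; the helper names are illustrative and not part of the SAS output. Up to the rounding of the printed matrices, it should give about 0.1065 for location and about 0.0124 for variety.

```python
# Error and hypothesis SS&CP matrices copied from the 6.31 output.
E = [[104.205, 49.365, 76.48],
     [49.365, 352.105, 121.995],
     [76.48, 121.995, 94.835]]
H_LOCATION = [[0.7008333333, -10.6575, 7.1291666667],
              [-10.6575, 162.0675, -108.4125],
              [7.1291666667, -108.4125, 72.520833333]]
H_VARIETY = [[196.115, 365.1825, 42.6275],
             [365.1825, 1089.015, 414.655],
             [42.6275, 414.655, 284.10166667]]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_add(x, y):
    """Element-wise sum of two matrices stored as lists of rows."""
    return [[xi + yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]


def wilks_lambda(error, hyp):
    """Wilks' lambda = |E| / |E + H|."""
    return det3(error) / det3(mat_add(error, hyp))


if __name__ == "__main__":
    print(f"Location: {wilks_lambda(E, H_LOCATION):.5f}")
    print(f"Variety:  {wilks_lambda(E, H_VARIETY):.5f}")
```

For the location factor (s = 1), H has rank one, so $\Lambda^* = 1/(1 + \text{Hotelling-Lawley trace}) = 1/(1 + 8.388) = .1065$, which agrees with the table.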

(b) The residuals for X2 at location 2 for variety 5 seem large in absolute value, but Q-Q plots of residuals indicate that univariate normality cannot be rejected for all three variables.
CODE  FACTOR1  FACTOR2  PRED1    RES1    PRED2    RES2    PRED3   RES3
a     1        5        194.80    0.50   160.40   -7.30   52.55   -1.15
a     1        5        194.80   -0.50   160.40    7.30   52.55    1.15
b     2        5        185.05    4.65   130.30    9.20   49.95    5.55
b     2        5        185.05   -4.65   130.30   -9.20   49.95   -5.55
c     1        6        199.45    3.55   161.40   -4.60   47.80    2.00
c     1        6        199.45   -3.55   161.40    4.60   47.80   -2.00
d     2        6        200.15    2.55   163.95    2.15   57.25    3.15
d     2        6        200.15   -2.55   163.95   -2.15   57.25   -3.15
e     1        8        190.25    3.25   164.80   -0.30   58.20   -0.40
e     1        8        190.25   -3.25   164.80    0.30   58.20    0.40
f     2        8        200.75    0.75   170.30   -3.50   66.10   -1.10
f     2        8        200.75   -0.75   170.30    3.50   66.10    1.10

Figure 1: Q-Q Plot - Residual for Yield
Figure 2: Q-Q Plot - Residual for Sound Mature Kernels
Figure 3: Q-Q Plot - Residual for Seed Size

(c) Univariate two-factor ANOVAs follow. There is evidence of a variety effect and, for X1 = yield and X2 = sound mature kernels, a location*variety interaction.
Dependent Variable: yield

                             Sum of
Source             DF       Squares       Mean Square   F Value   Pr > F
Model               5     401.9175000     80.3835000      4.63    0.0446
Error               6     104.2050000     17.3675000
Corrected Total    11     506.1225000

R-Square   Coeff Var   Root MSE   yield Mean
0.794111   2.136324    4.167433   195.0750

Source             DF   Type III SS    Mean Square   F Value   Pr > F
location            1     0.7008333     0.7008333      0.04    0.8474
variety             2   196.1150000    98.0575000      5.65    0.0418
location*variety    2   205.1016667   102.5508333      5.90    0.0382

Dependent Variable: sdmatker

                             Sum of
Source             DF       Squares       Mean Square   F Value   Pr > F
Model               5    2031.777500     406.355500      6.92     0.0177
Error               6     352.105000      58.684167
Corrected Total    11    2383.882500

R-Square   Coeff Var   Root MSE   sdmatker Mean
0.852298   4.832398    7.660559   158.5250

Source             DF   Type III SS    Mean Square   F Value   Pr > F
location            1    162.067500    162.067500      2.76    0.1476
variety             2   1089.015000    544.507500      9.28    0.0146
location*variety    2    780.695000    390.347500      6.65    0.0300

The GLM Procedure
Dependent Variable: seedsize

                             Sum of
Source             DF       Squares       Mean Square   F Value   Pr > F
Model               5     442.5741667     88.5148333      5.60    0.0292
Error               6      94.8350000     15.8058333
Corrected Total    11     537.4091667

R-Square   Coeff Var   Root MSE   seedsize Mean
0.823533   7.188166    3.975655   55.30833

Source             DF   Type III SS    Mean Square   F Value   Pr > F
location            1    72.5208333    72.5208333      4.59    0.0759
variety             2   284.1016667   142.0508333      8.99    0.0157
location*variety    2    85.9516667    42.9758333      2.72    0.1443

(d) Bonferroni simultaneous comparisons of variety. Only varieties 5 and 8 differ, and they differ only on X3.

Bonferroni (Dunn) T tests for variable: X1
Alpha = 0.05   Confidence = 0.95   df = 8   MSE = 38.66333
Critical Value of T = 3.01576
Minimum Significant Difference = 13.26
Comparisons significant at the 0.05 level are indicated by '***'.

FACTOR2      Simultaneous Lower   Difference Between   Simultaneous Upper
Comparison   Confidence Limit     Means                Confidence Limit
6 - 8         -8.960                4.300               17.560
6 - 5         -3.385                9.875               23.135
8 - 6        -17.560               -4.300                8.960
8 - 5         -7.685                5.575               18.835
5 - 6        -23.135               -9.875                3.385
5 - 8        -18.835               -5.575                7.685

Bonferroni (Dunn) T tests for variable: X2
Alpha = 0.05   Confidence = 0.95   df = 8   MSE = 141.6
Critical Value of T = 3.01576
Minimum Significant Difference = 25.375
Comparisons significant at the 0.05 level are indicated by '***'.

FACTOR2      Simultaneous Lower   Difference Between   Simultaneous Upper
Comparison   Confidence Limit     Means                Confidence Limit
8 - 6        -20.500                4.875               30.250
8 - 5         -3.175               22.200               47.575
6 - 8        -30.250               -4.875               20.500
6 - 5         -8.050               17.325               42.700
5 - 8        -47.575              -22.200                3.175
5 - 6        -42.700              -17.325                8.050

Bonferroni (Dunn) T tests for variable: X3
Alpha = 0.05   Confidence = 0.95   df = 8   MSE = 22.59833
Critical Value of T = 3.01576
Minimum Significant Difference = 10.137
Comparisons significant at the 0.05 level are indicated by '***'.

FACTOR2      Simultaneous Lower   Difference Between   Simultaneous Upper
Comparison   Confidence Limit     Means                Confidence Limit
8 - 6         -0.512                9.625               19.762
8 - 5          0.763               10.900               21.037   ***
6 - 8        -19.762               -9.625                0.512
6 - 5         -8.862                1.275               11.412
5 - 8        -21.037              -10.900               -0.763   ***
5 - 6        -11.412               -1.275                8.862
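The minimum significant differences in (d) follow from $t\sqrt{MSE(1/n + 1/n)}$. This sketch assumes n = 4 observations per variety (2 locations × 2 replicates, as in the residual table) and reuses the printed critical value 3.01576. The variety means for X3 are averaged from the PRED3 column; `min_sig_diff` and `X3_MEANS` are hypothetical names.

```python
import math

T_CRIT = 3.01576      # critical value of t printed in the Bonferroni (Dunn) output
N_PER_VARIETY = 4     # 2 locations x 2 replicates per variety (from the residual table)


def min_sig_diff(mse, t_crit=T_CRIT, n=N_PER_VARIETY):
    """Bonferroni minimum significant difference t * sqrt(MSE * (1/n + 1/n))."""
    return t_crit * math.sqrt(mse * (1.0 / n + 1.0 / n))


# Variety means of X3 (seed size), averaged from the PRED3 column of the residual table.
X3_MEANS = {5: (52.55 + 49.95) / 2, 6: (47.80 + 57.25) / 2, 8: (58.20 + 66.10) / 2}

if __name__ == "__main__":
    msd = min_sig_diff(22.59833)
    for a, b in [(8, 6), (8, 5), (6, 5)]:
        diff = X3_MEANS[a] - X3_MEANS[b]
        flag = "***" if abs(diff) > msd else ""
        print(f"{a}-{b}: {diff:7.3f} {flag}")
```

Only the 8 versus 5 difference in seed size (10.900) exceeds its minimum significant difference (10.137), matching the '***' entries above.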

6.32 (a) MANOVA for Species: Wilks' lambda $\Lambda_1^* = .00823$,
F = 5.011; p-value = P(F > 5.011) = .173; $F_{4,2}(.05) = 19.25$.
Do not reject $H_0$: No Species effects.

MANOVA for Nutrient: Wilks' lambda $\Lambda_2^* = .31599$,
F = 1.082; p-value = P(F > 1.082) = .562; $F_{2,1}(.05) = 199.5$.
Do not reject $H_0$: No nutrient effects.

(b) Minitab output for the two-way ANOVAs:

560CM
Analysis of Variance for 560CM
Source     DF   SS       MS       F       P
Spec        2   47.476   23.738   10.06   0.090
Nutrient    1    8.260    8.260    3.50   0.202
Error       2    4.722    2.361
Total       5   60.458

720CM
Analysis of Variance for 720CM
Source     DF   SS        MS        F       P
Spec        2   262.239   131.119   28.82   0.034
Nutrient    1     4.489     4.489    0.99   0.425
Error       2     9.099     4.550
Total       5   275.827

The ANOVA results are mostly consistent with the MANOVA results. The exception is for 720CM, where there appear to be Species effects. A look at the data suggests the spectral reflectance of Japanese larch (JL) at 720 nanometers is somewhat larger than the reflectance of the other two species (SS and LP) regardless of nutrient level. This difference is not as apparent at 560 nanometers.

For MANOVA, the value of Wilks' lambda statistic does not indicate Species effects. However, Pillai's trace statistic, 1.6776 with F = 5.203 and p-value = .07, suggests there may be Species effects. (For Nutrient, Wilks' lambda and Pillai's trace statistic give the same F value.) For larger sample sizes, Wilks' lambda and Pillai's trace statistic would give essentially the same result for all factors.


6.33 (a) MANOVA for Species: Wilks' lambda $\Lambda_1^* = .06877$,
F = 36.571; p-value = P(F > 36.571) = .000; $F_{4,52}(.05) = 2.55$.
Reject $H_0$: No species effects.

MANOVA for Time: Wilks' lambda $\Lambda_2^* = .04917$,
F = 45.629; p-value = P(F > 45.629) = .000; $F_{4,52}(.05) = 2.55$.
Reject $H_0$: No time effects.

MANOVA for Species*Time: Wilks' lambda $\Lambda_{12}^* = .08707$,
F = 15.528; p-value = P(F > 15.528) = .000; $F_{8,52}(.05) = 2.12$.
Reject $H_0$: No interaction effects.
(b) There are a few outliers but, in general, the residuals are approximately normally distributed (see histograms below). Observations are likely to be positively correlated over time. Observations are not independent.
Figure: Histogram of the Residuals (response is 560nm); Histogram of the Residuals (response is 720nm).

(c) Interaction shows up for the 560nm wavelength but not for the 720nm wavelength. See the Minitab ANOVA output below.

Analysis of Variance for 560nm
Source         DF   SS        MS       F        P
Species         2    965.18   482.59   169.97   0.000
Time            2   1275.25   637.62   224.58   0.000
Species*Time    4    795.81   198.95    70.07   0.000
Error          27     76.66     2.84
Total          35   3112.90

Analysis of Variance for 720nm
Source         DF   SS        MS        F       P
Species         2   2026.85   1013.43   15.46   0.000
Time            2   5573.81   2786.90   42.52   0.000
Species*Time    4    193.55     48.39    0.74   0.574
Error          27   1769.64     65.54
Total          35   9563.85
(d) The data might be analyzed using the growth curve methodology discussed in Section 6.l. The data might also be analyzed assuming species are "nested" within date. In this case, an interesting question is: Is spectral reflectance the same for all species for each date?

6.34 Fitting a linear growth curve to calcium measurements on the dominant ulna

XBAR (group 1, group 2)
72.3800   69.2875
73.2933   70.6562
72.4733   71.1812
64.7867   64.5312

Grand mean
71.1939
71.8273
72.1848
65.2667

MLE of beta (group 1, group 2)
73.4707   70.5049
-1.9035   -0.9818

$(B'S_{pooled}^{-1}B)^{-1}$
93.1313   -5.2393
-5.2393    1.2948

S1
92.1189   86.1106   73.3623   74.5890
86.1106   89.0764   72.9555   71.7728
73.3623   72.9555   71.8907   63.5918
74.5890   71.7728   63.5918   75.4441

S2
98.1745    97.0134   89.4824   86.1111
97.0134   100.5960   88.1425   88.2095
89.4824    88.1425   86.3496   80.5506
86.1111    88.2095   80.5506   81.4156

Spooled
95.2511   91.7500   81.7003   80.5487
91.7500   95.0348   80.8108   80.2745
81.7003   80.8108   79.3694   72.3636
80.5487   80.2745   72.3636   78.5328

Estimated covariance matrix
 7.1816   -0.4040    0.0000    0.0000
-0.4040    0.0998    0.0000    0.0000
 0.0000    0.0000    6.7328   -0.3788
 0.0000    0.0000   -0.3788    0.0936

W = (N - g) * Spooled
2762.282   2660.749   2369.308   2335.912
2660.749   2756.009   2343.514   2327.961
2369.308   2343.514   2301.714   2098.544
2335.912   2327.961   2098.544   2277.452

W1
2803.839   2610.438   2271.920   2443.549
2610.438   2821.243   2464.120   2196.065
2271.920   2464.120   2531.625   1845.313
2443.549   2196.065   1845.313   2556.818

$\Lambda = |W| / |W_1| = 0.201$
Since, with $\alpha = 0.01$, $-\left(N - \frac{1}{2}(p - q + g)\right)\log(\Lambda) = 45.72 > \chi^2_{(4-1-1)2}(0.01) = 13.28$,
we reject the null hypothesis of a linear fit at $\alpha = 0.01$.

6.35 Fitting a quadratic growth curve to calcium measurements on the dominant ulna, treating all 31 subjects as a single group.

XBAR
70.7839
71.9323
71.8065
64.6548

MLE of beta
71.6039
3.8673
-1.9404

$(B'S^{-1}B)^{-1}$
92.2789   -5.9783    0.0799
-5.9783    9.3020   -2.9033
 0.0799   -2.9033    1.0760

S
94.5441   90.7962   80.0081   78.0676
90.7962   93.6616   78.9965   77.7725
80.0081   78.9965   77.1546   70.0366
78.0676   77.7725   70.0366   75.9319

W = (n - 1) * S
2836.322   2723.886   2400.243   2342.027
2723.886   2809.848   2369.894   2333.175
2400.243   2369.894   2314.638   2101.099
2342.027   2333.175   2101.099   2277.957

Estimated covariance matrix
 3.1894   -0.2066    0.0028
-0.2066    0.3215   -0.1003
 0.0028   -0.1003    0.0372

W2
2857.167   2764.522   2394.410   2369.674
2764.522   2889.063   2358.522   2387.070
2394.410   2358.522   2316.271   2093.362
2369.674   2387.070   2093.362   2314.?25

$\Lambda = |W| / |W_2| = 0.7653$
Since, with $\alpha = 0.01$, $-\left(n - \frac{1}{2}(p - q + 1)\right)\log(\Lambda) = 7.893 > \chi^2_{4-2-1}(0.01) = 6.635$, we reject the null hypothesis of a quadratic fit at $\alpha = 0.01$.
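The likelihood-ratio statistics in 6.34 and 6.35 share one formula, $-(N - \frac{1}{2}(p - q + g))\ln\Lambda$, referred to a chi-square distribution with $(p - q - 1)g$ degrees of freedom. A minimal sketch, taking the printed $\Lambda$ values as inputs (the function names are illustrative):

```python
import math


def growth_curve_statistic(n, p, q, g, wilks):
    """-(n - (p - q + g)/2) * ln(Lambda) for testing a degree-q growth curve."""
    return -(n - 0.5 * (p - q + g)) * math.log(wilks)


def growth_curve_df(p, q, g):
    """Chi-square degrees of freedom (p - q - 1) * g."""
    return (p - q - 1) * g


# 6.34: linear fit (q = 1), two groups, N = 31 subjects, p = 4 times.
linear = growth_curve_statistic(n=31, p=4, q=1, g=2, wilks=0.201)
# 6.35: quadratic fit (q = 2), all 31 subjects treated as one group.
quadratic = growth_curve_statistic(n=31, p=4, q=2, g=1, wilks=0.7653)

if __name__ == "__main__":
    print(f"linear:    {linear:.2f} on {growth_curve_df(4, 1, 2)} df")
    print(f"quadratic: {quadratic:.3f} on {growth_curve_df(4, 2, 1)} df")
```

Both statistics exceed their $\alpha = 0.01$ critical values (13.28 and 6.635), so both fits are rejected, as stated above.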

6.36 Here

$p = 2$, $n_1 = 45$, $n_2 = 55$, $\ln|S_1| = 19.90948$, $\ln|S_2| = 18.40324$, $\ln|S_{pooled}| = 19.27712$

so $u = \left(\frac{1}{44} + \frac{1}{54} - \frac{1}{44+54}\right)\left(\frac{2(4) + 3(2) - 1}{6(2+1)(2-1)}\right) = .02242$

and
$C = (1 - .02242)(98(19.27712) - 44(19.90948) - 54(18.40324)) = 18.93$

The chi-square degrees of freedom $v = \frac{1}{2}2(3)(1) = 3$ and $\chi^2_3(.05) = 7.81$. Since
$C = 18.93 > \chi^2_3(.05) = 7.81$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma$ at the 5% level.

6.37 Here
$p = 3$, $n_1 = 24$, $n_2 = 24$, $\ln|S_1| = 9.48091$, $\ln|S_2| = 6.67870$, $\ln|S_{pooled}| = 8.62718$

so $u = \left(\frac{1}{23} + \frac{1}{23} - \frac{1}{23+23}\right)\left(\frac{2(9) + 3(3) - 1}{6(3+1)(2-1)}\right) = .07065$

and
$C = (1 - .07065)(46(8.62718) - 23(9.48091) - 23(6.67870)) = 23.40$

The chi-square degrees of freedom $v = \frac{1}{2}3(4)(1) = 6$ and $\chi^2_6(.05) = 12.59$. Since
$C = 23.40 > \chi^2_6(.05) = 12.59$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma$ at the 5% level.

6.38 Working with the transformed data, $x_1$ = vanadium, $x_2 = \sqrt{\text{iron}}$, $x_3 = \sqrt{\text{beryllium}}$, $x_4 = 1/(\text{saturated hydrocarbons})$, $x_5$ = aromatic hydrocarbons, we have
$p = 5$, $n_1 = 7$, $n_2 = 11$, $n_3 = 38$, $\ln|S_1| = -17.81620$, $\ln|S_2| = -7.24900$, $\ln|S_3| = -7.09274$, $\ln|S_{pooled}| = -7.11438$

so $u = \left(\frac{1}{6} + \frac{1}{10} + \frac{1}{37} - \frac{1}{6+10+37}\right)\left(\frac{2(25) + 3(5) - 1}{6(5+1)(3-1)}\right) = .24429$

and
$C = (1 - .24429)(53(-7.11438) - 6(-17.81620) - 10(-7.24900) - 37(-7.09274)) = 48.94$

The chi-square degrees of freedom $v = \frac{1}{2}5(6)(2) = 30$ and $\chi^2_{30}(.05) = 43.77$. Since
$C = 48.94 > \chi^2_{30}(.05) = 43.77$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma_3 = \Sigma$ at the 5% level.
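The u, C and v arithmetic in 6.36 to 6.38 can be packaged as one function. This sketch assumes the inputs are the natural-log determinants quoted above and applies the same formulas; it is a checking aid only and does not compute the determinants from raw data. The name `box_m` is hypothetical.

```python
def box_m(p, ns, logdets, logdet_pooled):
    """Return (u, C, v) for the test of equal covariance matrices used in 6.36-6.38."""
    g = len(ns)
    dfs = [n - 1 for n in ns]            # n_l - 1 for each group
    total = sum(dfs)                     # sum of (n_l - 1)
    u = (sum(1.0 / d for d in dfs) - 1.0 / total) * (
        (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (g - 1)))
    m = total * logdet_pooled - sum(d * ld for d, ld in zip(dfs, logdets))
    c = (1.0 - u) * m                    # compare with chi-square on v df
    v = p * (p + 1) * (g - 1) // 2
    return u, c, v


if __name__ == "__main__":
    print(box_m(2, [45, 55], [19.90948, 18.40324], 19.27712))                       # 6.36
    print(box_m(3, [24, 24], [9.48091, 6.67870], 8.62718))                          # 6.37
    print(box_m(5, [7, 11, 38], [-17.81620, -7.24900, -7.09274], -7.11438))         # 6.38
```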

6.39 (a) Following Example 6.5, we have $(\bar{x}_F - \bar{x}_M)' = (119.55, 29.97)$,
$\left(\frac{1}{28}S_F + \frac{1}{28}S_M\right)^{-1} = \begin{bmatrix}.033186 & -.108533 \\ -.108533 & .423508\end{bmatrix}$ and $T^2 = 76.97$. Since
$T^2 = 76.97 > \chi^2_2(.05) = 5.99$, we reject $H_0: \mu_F - \mu_M = 0$ at the 5% level.

(b) With equal sample sizes, the large sample procedure is essentially the same as the procedure based on the pooled covariance matrix.

(c) Here p = 2, $z(.05/2(2)) = z(.0125) = 2.24$, and $\frac{1}{28}S_F + \frac{1}{28}S_M = \begin{bmatrix}186.148 & 47.705 \\ 47.705 & 14.587\end{bmatrix}$, so

$\mu_{F1} - \mu_{M1}$: $119.55 \pm 2.24\sqrt{186.148}$ or (88.99, 150.11)
$\mu_{F2} - \mu_{M2}$: $29.97 \pm 2.24\sqrt{14.587}$ or (21.41, 38.52)

Female anacondas are considerably longer and heavier than males.
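A sketch of the 6.39 computations in plain Python, assuming the 2 × 2 matrix $\frac{1}{28}S_F + \frac{1}{28}S_M$ quoted in (c). It uses the standard library's `statistics.NormalDist` for $z(.0125)$, which gives 2.2414 rather than the rounded 2.24, so the interval endpoints can differ from the printed ones in the second decimal. The helper names are illustrative.

```python
import math
from statistics import NormalDist

D = (119.55, 29.97)                          # xbar_F - xbar_M
S = ((186.148, 47.705), (47.705, 14.587))    # (1/28) S_F + (1/28) S_M


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def quad_form(x, m):
    """x' M x for a 2-vector x."""
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))


t2 = quad_form(D, inv2(S))                     # compare with chi-square_2(.05) = 5.99
z = NormalDist().inv_cdf(1 - 0.05 / (2 * 2))   # Bonferroni z(.0125)
intervals = [(D[k] - z * math.sqrt(S[k][k]), D[k] + z * math.sqrt(S[k][k])) for k in range(2)]

if __name__ == "__main__":
    print(f"T^2 = {t2:.2f}, z = {z:.4f}")
    for k, (lo, hi) in enumerate(intervals, start=1):
        print(f"mu_F{k} - mu_M{k}: ({lo:.2f}, {hi:.2f})")
```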


6.41 Three factors: (Problem) Severity, (Problem) Complexity and (Engineer) Experience, each at two levels. Two responses: Assessment time, Implementation time. MANOVA results for significant (at the 5% level) effects:

Effect                 Wilks' lambda    F        P-value
Severity               .06398            73.1    .000
Complexity             .01852           265.0    .000
Experience             .03694           130.4    .000
Severity*Complexity    .33521             9.9    .004

Individual ANOVAs for each of the two responses, Assessment time and Implementation time, show only the same three main effects and two-factor interaction as significant, with p-values for the appropriate F statistics less than .01 in all cases. We see that both assessment time and implementation time are affected by problem severity, problem complexity and engineer experience, as well as the interaction between severity and complexity. Because of the interaction effect, the main effects severity and complexity are not additive and do not have a clear interpretation. For this reason, we do not calculate simultaneous confidence intervals for the magnitudes of the mean differences in times across the two levels of each of these main effects. There is no interaction term associated with experience, however. Since there are only two levels of experience, we can calculate ordinary t intervals for the mean difference in assessment time and the mean difference in implementation time for gurus (G) and novices (N). Relevant summary statistics and calculations are given below.
Error sum of squares and crossproducts matrix =
2.222   1.217
1.217   2.667

Error degrees of freedom: 11

Assessment time: $\bar{x}_G = 3.68$, $\bar{x}_N = 5.39$
95% confidence interval for mean difference in experience:
$3.68 - 5.39 \pm 2.201\sqrt{\frac{2.222}{11}\cdot\frac{2}{8}} = -1.71 \pm .49 \rightarrow (-2.20, -1.22)$

Implementation time: $\bar{x}_G = 6.80$, $\bar{x}_N = 10.96$
95% confidence interval for mean difference in experience:
$6.80 - 10.96 \pm 2.201\sqrt{\frac{2.667}{11}\cdot\frac{2}{8}} = -4.16 \pm .54 \rightarrow (-4.70, -3.62)$

The decrease in mean assessment time for gurus relative to novices is estimated to be between 1.22 and 2.20 hours. Similarly, the decrease in mean implementation time for gurus relative to novices is estimated to be between 3.62 and 4.70 hours.
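The two t intervals can be reproduced as follows. This sketch takes $t_{11}(.025) = 2.201$ and 8 runs per experience level as given; `diff_interval` is a hypothetical helper name.

```python
import math

T_CRIT = 2.201       # t_11(.025), as used in 6.41
ERROR_DF = 11
N_PER_LEVEL = 8      # runs at each level of experience


def diff_interval(mean_g, mean_n, error_ss, t_crit=T_CRIT, df=ERROR_DF, n=N_PER_LEVEL):
    """(mean_G - mean_N) +/- t * sqrt((SSE / df) * (1/n + 1/n))."""
    diff = mean_g - mean_n
    half = t_crit * math.sqrt(error_ss / df * (1.0 / n + 1.0 / n))
    return diff - half, diff + half


assessment = diff_interval(3.68, 5.39, 2.222)
implementation = diff_interval(6.80, 10.96, 2.667)

if __name__ == "__main__":
    print("assessment:     (%.2f, %.2f)" % assessment)
    print("implementation: (%.2f, %.2f)" % implementation)
```

The error sum of squares for each response is the corresponding diagonal element of the error SSCP matrix, so each interval uses only its own response's error variance.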


Chapter 7
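7.1
$\hat{\beta} = (Z'Z)^{-1}Z'y$, with $(Z'Z)^{-1} = \frac{1}{120}\begin{bmatrix}120 & -10 \\ -10 & 1\end{bmatrix}$, gives $\hat{\beta} = \begin{bmatrix}-.667 \\ 1.267\end{bmatrix}$

$\hat{y} = Z\hat{\beta} = \frac{1}{15}\begin{bmatrix}180 \\ 85 \\ 123 \\ 351 \\ 199 \\ 142\end{bmatrix} = \begin{bmatrix}12.000 \\ 5.667 \\ 8.200 \\ 23.400 \\ 13.267 \\ 9.467\end{bmatrix}$

$\hat{\varepsilon} = y - \hat{y} = \begin{bmatrix}15 \\ 9 \\ 3 \\ 25 \\ 7 \\ 13\end{bmatrix} - \begin{bmatrix}12.000 \\ 5.667 \\ 8.200 \\ 23.400 \\ 13.267 \\ 9.467\end{bmatrix} = \begin{bmatrix}3.000 \\ 3.333 \\ -5.200 \\ 1.600 \\ -6.267 \\ 3.533\end{bmatrix}$

Residual sum of squares: $\hat{\varepsilon}'\hat{\varepsilon} = 101.467$

Fitted equation: $\hat{y} = -.667 + 1.267z_1$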
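The 7.1 fit can be checked by solving the 2 × 2 normal equations directly. The predictor values below are an assumption: they are the $z_1$ values implied by the fitted values listed above (for example, 9.467 corresponds to $z_1 = 8$), paired with the listed responses. `simple_least_squares` is an illustrative name.

```python
# Assumed data for 7.1: responses as listed, z1 values implied by the fitted values.
z = [10, 5, 7, 19, 11, 8]
y = [15, 9, 3, 25, 7, 13]


def simple_least_squares(z, y):
    """Solve the 2 x 2 normal equations Z'Z b = Z'y for y = b0 + b1 z."""
    n = len(z)
    sz = sum(z)
    szz = sum(v * v for v in z)
    sy = sum(y)
    szy = sum(a * b for a, b in zip(z, y))
    det = n * szz - sz * sz
    b0 = (szz * sy - sz * szy) / det
    b1 = (n * szy - sz * sy) / det
    return b0, b1


b0, b1 = simple_least_squares(z, y)
fitted = [b0 + b1 * v for v in z]
residuals = [obs - fit for obs, fit in zip(y, fitted)]
rss = sum(e * e for e in residuals)

if __name__ == "__main__":
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, RSS = {rss:.3f}")
```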

7.2 Standardized variables

z1*       z2*       y*
-.292    -1.088     .391
-1.166    -.726    -.391
-.817     -.726   -1.174
1.283      .363    1.695
-.117      .726    -.652
1.108     1.451     .130

Fitted equation: $\hat{y}^* = 1.33z_1^* - .79z_2^*$

Also, prior to standardizing the variables, $\bar{z}_1 = 11.667$, $\bar{z}_2 = 5.000$ and $\bar{y} = 12.000$; $\sqrt{s_{z_1z_1}} = 5.716$, $\sqrt{s_{z_2z_2}} = 2.757$ and $\sqrt{s_{yy}} = 7.667$.

The fitted equation for the original variables is

$\frac{\hat{y} - 12}{7.667} = 1.33\,\frac{z_1 - 11.667}{5.716} - .79\,\frac{z_2 - 5}{2.757}$

$\hat{y} = .43 + 1.78z_1 - 2.19z_2$

7.3 Follow the hint and note that $\hat{\varepsilon}^* = y^* - \hat{y}^* = V^{-1/2}y - V^{-1/2}Z\hat{\beta}_W$ and $(n - r - 1)s^2 = \hat{\varepsilon}^{*\prime}\hat{\varepsilon}^*$ is distributed as $\sigma^2\chi^2_{n-r-1}$.

7.4 (a) V = I, so $\hat{\beta}_W = (z'z)^{-1}z'y = \left(\sum_{j=1}^n z_jy_j\right)\Big/\left(\sum_{j=1}^n z_j^2\right)$.

(b) $V^{-1}$ is diagonal with jth diagonal element $1/z_j$, so
$\hat{\beta}_W = (z'V^{-1}z)^{-1}z'V^{-1}y = \left(\sum_{j=1}^n y_j\right)\Big/\left(\sum_{j=1}^n z_j\right)$

(c) $V^{-1}$ is diagonal with jth diagonal element $1/z_j^2$, so
$\hat{\beta}_W = (z'V^{-1}z)^{-1}z'V^{-1}y = \left(\sum_{j=1}^n (y_j/z_j)\right)\Big/n$
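The three estimators in 7.4 are special cases of one weighted least squares formula through the origin, with weight $1/V_{jj}$. A sketch; the function names and the demonstration data are hypothetical.

```python
def wls_through_origin(z, y, weights):
    """Weighted LS slope for y_j = beta * z_j + e_j, with weights_j = 1 / V_jj."""
    num = sum(w * zj * yj for w, zj, yj in zip(weights, z, y))
    den = sum(w * zj * zj for w, zj in zip(weights, z))
    return num / den


def beta_case_a(z, y):   # V = I
    return wls_through_origin(z, y, [1.0] * len(z))


def beta_case_b(z, y):   # V_jj = z_j
    return wls_through_origin(z, y, [1.0 / zj for zj in z])


def beta_case_c(z, y):   # V_jj = z_j ** 2
    return wls_through_origin(z, y, [1.0 / zj ** 2 for zj in z])


if __name__ == "__main__":
    z_demo, y_demo = [1.0, 2.0, 4.0], [3.0, 5.0, 9.0]   # hypothetical data
    print(beta_case_a(z_demo, y_demo), beta_case_b(z_demo, y_demo), beta_case_c(z_demo, y_demo))
```

In case (b) the weights cancel one power of $z_j$ in both sums, leaving $\sum y_j / \sum z_j$; in case (c) they cancel two, leaving the average of the ratios $y_j/z_j$.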

7.5 Solution follows from Hint.

7.6 (a) First note that
$\Lambda^- = \mathrm{diag}\left(\lambda_1^{-1}, \ldots, \lambda_{r_1+1}^{-1}, 0, \ldots, 0\right)$
is a generalized inverse of $\Lambda$ since
$\Lambda\Lambda^-\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{r_1+1}, 0, \ldots, 0) = \Lambda$.
Since
$Z'Z = \sum_{i=1}^{r+1}\lambda_i e_i e_i' = P\Lambda P'$ and $(Z'Z)^- = \sum_{i=1}^{r_1+1}\lambda_i^{-1}e_i e_i' = P\Lambda^- P'$
with $PP' = P'P = I$, we check that the defining relation holds:
$(Z'Z)(Z'Z)^-(Z'Z) = P\Lambda P'(P\Lambda^- P')P\Lambda P' = P\Lambda\Lambda^-\Lambda P' = P\Lambda P' = Z'Z$

(b) By the hint, if $Z\hat{\beta}$ is the projection, then $Z'(y - Z\hat{\beta}) = 0$ or $Z'Z\hat{\beta} = Z'y$. In (c), we show that $Z\hat{\beta}$ is the projection of y.

(c) Consider $q_i = \lambda_i^{-1/2}Ze_i$ for $i = 1, 2, \ldots, r_1 + 1$. Then
$Z(Z'Z)^-Z' = Z\left(\sum_{i=1}^{r_1+1}\lambda_i^{-1}e_ie_i'\right)Z' = \sum_{i=1}^{r_1+1}q_iq_i'$
The $q_i$ are $r_1 + 1$ mutually perpendicular unit length vectors that span the space of all linear combinations of the columns of Z. The projection of y is then (see Result 2A.2 and Definition 2A.12)
$\sum_{i=1}^{r_1+1}(q_i'y)q_i = \sum_{i=1}^{r_1+1}q_i(q_i'y) = \left(\sum_{i=1}^{r_1+1}q_iq_i'\right)y = Z(Z'Z)^-Z'y$

(d) See Hint.

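7.7 Write $Z = [Z_1 \mid Z_2]$ and $\beta = \begin{bmatrix}\beta_{(1)} \\ \beta_{(2)}\end{bmatrix}$, where $\beta_{(2)}$ contains the $r - q$ coefficients being tested. Recall from Result 7.4 that $\hat{\beta} = \begin{bmatrix}\hat{\beta}_{(1)} \\ \hat{\beta}_{(2)}\end{bmatrix} = (Z'Z)^{-1}Z'Y$ is distributed as $N_{r+1}(\beta, \sigma^2(Z'Z)^{-1})$ independently of $n\hat{\sigma}^2 = (n - r - 1)s^2$, which is distributed as $\sigma^2\chi^2_{n-r-1}$. From the Hint, $(\hat{\beta}_{(2)} - \beta_{(2)})'(C_{22})^{-1}(\hat{\beta}_{(2)} - \beta_{(2)})$ is distributed as $\sigma^2\chi^2_{r-q}$, and this is distributed independently of $s^2$. (The latter follows because the full random vector $\hat{\beta}$ is distributed independently of $s^2$.) The result follows from the definition of an F random variable as the ratio of two independent $\chi^2$ random variables divided by their degrees of freedom.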

7.8 (a) $H^2 = Z(Z'Z)^{-1}Z'Z(Z'Z)^{-1}Z' = Z(Z'Z)^{-1}Z' = H$.

(b) Since I - H is an idempotent matrix, it is positive semidefinite. Let a be an n × 1 unit vector with jth element 1. Then $0 \le a'(I - H)a = 1 - h_{jj}$. That is, $h_{jj} \le 1$. On the other hand, $(Z'Z)^{-1}$ is positive definite. Hence $h_{jj} = b_j'(Z'Z)^{-1}b_j > 0$, where $b_j'$ is the jth row of Z. Also
$\sum_{j=1}^n h_{jj} = \mathrm{tr}(Z(Z'Z)^{-1}Z') = \mathrm{tr}((Z'Z)^{-1}Z'Z) = \mathrm{tr}(I_{r+1}) = r + 1$.

(c) Using
$(Z'Z)^{-1} = \frac{1}{n\sum_{i=1}^n(z_i - \bar{z})^2}\begin{bmatrix}\sum_{i=1}^n z_i^2 & -\sum_{i=1}^n z_i \\ -\sum_{i=1}^n z_i & n\end{bmatrix}$
we obtain
$h_{jj} = (1, z_j)(Z'Z)^{-1}\begin{pmatrix}1 \\ z_j\end{pmatrix} = \frac{1}{n\sum_{i=1}^n(z_i - \bar{z})^2}\left(\sum_{i=1}^n z_i^2 - 2z_j\sum_{i=1}^n z_i + nz_j^2\right) = \frac{1}{n} + \frac{(z_j - \bar{z})^2}{\sum_{i=1}^n(z_i - \bar{z})^2}$
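The leverage formula in 7.8(c) can be checked numerically against the direct quadratic form $(1, z_j)(Z'Z)^{-1}(1, z_j)'$. A sketch with illustrative names; the 7.9 design points are used only as a convenient example.

```python
def leverages(z):
    """h_jj = 1/n + (z_j - zbar)^2 / sum (z_i - zbar)^2, from 7.8(c)."""
    n = len(z)
    zbar = sum(z) / n
    sxx = sum((zi - zbar) ** 2 for zi in z)
    return [1.0 / n + (zj - zbar) ** 2 / sxx for zj in z]


def leverages_direct(z):
    """h_jj = (1, z_j) (Z'Z)^{-1} (1, z_j)' with the 2 x 2 inverse written out."""
    n = len(z)
    s1 = sum(z)
    s2 = sum(zi * zi for zi in z)
    det = n * s2 - s1 * s1   # equals n * sum (z_i - zbar)^2
    return [(s2 - 2 * zj * s1 + n * zj * zj) / det for zj in z]


if __name__ == "__main__":
    h = leverages([-2, -1, 0, 1, 2])
    print(h, sum(h))   # the leverages sum to r + 1 = 2
```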

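7.9
$Z' = \begin{bmatrix}1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 1 & 2\end{bmatrix}$, $(Z'Z)^{-1} = \begin{bmatrix}1/5 & 0 \\ 0 & 1/10\end{bmatrix}$

$\hat{\beta}_{(1)} = (Z'Z)^{-1}Z'y_{(1)} = \begin{bmatrix}3 \\ -.9\end{bmatrix}$; $\hat{\beta}_{(2)} = (Z'Z)^{-1}Z'y_{(2)} = \begin{bmatrix}0 \\ 1.5\end{bmatrix}$

$\hat{\beta} = [\hat{\beta}_{(1)} \mid \hat{\beta}_{(2)}] = \begin{bmatrix}3 & 0 \\ -.9 & 1.5\end{bmatrix}$

Hence

$\hat{Y} = Z\hat{\beta} = \begin{bmatrix}4.8 & -3.0 \\ 3.9 & -1.5 \\ 3.0 & 0 \\ 2.1 & 1.5 \\ 1.2 & 3.0\end{bmatrix}$

$\hat{\varepsilon} = Y - \hat{Y} = \begin{bmatrix}5 & -3 \\ 3 & -1 \\ 4 & -1 \\ 2 & 2 \\ 1 & 3\end{bmatrix} - \begin{bmatrix}4.8 & -3.0 \\ 3.9 & -1.5 \\ 3.0 & 0 \\ 2.1 & 1.5 \\ 1.2 & 3.0\end{bmatrix} = \begin{bmatrix}.2 & 0 \\ -.9 & .5 \\ 1.0 & -1.0 \\ -.1 & .5 \\ -.2 & 0\end{bmatrix}$

$Y'Y = \hat{Y}'\hat{Y} + \hat{\varepsilon}'\hat{\varepsilon}$

$\begin{bmatrix}55 & -15 \\ -15 & 24\end{bmatrix} = \begin{bmatrix}53.1 & -13.5 \\ -13.5 & 22.5\end{bmatrix} + \begin{bmatrix}1.9 & -1.5 \\ -1.5 & 1.5\end{bmatrix}$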
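The decomposition $Y'Y = \hat{Y}'\hat{Y} + \hat{\varepsilon}'\hat{\varepsilon}$ in 7.9 can be verified with a few lines of plain Python. This sketch exploits the fact that $(Z'Z)^{-1}$ is diagonal here because the columns of Z are orthogonal; the helpers `cross` and `close` are illustrative.

```python
Z = [[1, -2], [1, -1], [1, 0], [1, 1], [1, 2]]
Y = [[5, -3], [3, -1], [4, -1], [2, 2], [1, 3]]


def cross(a, b):
    """Return A'B for matrices stored as lists of rows."""
    return [[sum(a[k][i] * b[k][j] for k in range(len(a))) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def close(m1, m2, tol=1e-9):
    """Element-wise comparison of two matrices."""
    return all(abs(x - y) < tol for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


ZZ_INV_DIAG = [1 / 5, 1 / 10]   # (Z'Z)^{-1} is diagonal for this design
ZtY = cross(Z, Y)
beta = [[ZZ_INV_DIAG[i] * ZtY[i][j] for j in range(2)] for i in range(2)]
Y_hat = [[sum(Z[r][i] * beta[i][j] for i in range(2)) for j in range(2)] for r in range(5)]
resid = [[Y[r][j] - Y_hat[r][j] for j in range(2)] for r in range(5)]

if __name__ == "__main__":
    print("Y'Y       =", cross(Y, Y))
    print("Yhat'Yhat =", cross(Y_hat, Y_hat))
    print("e'e       =", cross(resid, resid))
```

The cross-product terms $\hat{Y}'\hat{\varepsilon}$ vanish because the residuals are orthogonal to the columns of Z, which is why the two pieces add up exactly.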

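7.10 (a) Using Result 7.7, the 95% confidence interval for the mean response is given by
$(1, .5)\begin{bmatrix}3.0 \\ -.9\end{bmatrix} \pm 3.18\sqrt{(1, .5)\begin{bmatrix}.2 & 0 \\ 0 & .1\end{bmatrix}\begin{bmatrix}1 \\ .5\end{bmatrix}\left(\frac{1.9}{3}\right)}$
or (1.35, 3.75).

(b) Using Result 7.8, the 95% prediction interval for the actual Y is given by
$(1, .5)\begin{bmatrix}3.0 \\ -.9\end{bmatrix} \pm 3.18\sqrt{\left(1 + (1, .5)\begin{bmatrix}.2 & 0 \\ 0 & .1\end{bmatrix}\begin{bmatrix}1 \\ .5\end{bmatrix}\right)\left(\frac{1.9}{3}\right)}$
or (-.25, 5.35).

(c) A 95% prediction ellipse for the actual Y's is given by
$(Y_{01} - 2.55, Y_{02} - .75)\begin{bmatrix}7.5 & 7.5 \\ 7.5 & 9.5\end{bmatrix}\begin{pmatrix}Y_{01} - 2.55 \\ Y_{02} - .75\end{pmatrix} \le (1 + .225)\left(\frac{2(3)}{2}\right)F_{2,2}(.05) = (1.225)(3)(19) = 69.825$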
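A sketch of the 7.10 interval arithmetic, assuming $t_3(.025) = 3.18$ and $F_{2,2}(.05) = 19$ as used above, and $s^2 = 1.9/3$ from the first response's residuals. The variable names are illustrative.

```python
import math

T_CRIT = 3.18          # t_3(.025), as used in 7.10
F_CRIT = 19.0          # F_{2,2}(.05), as used in 7.10(c)
S2 = 1.9 / 3           # first response: e'e / (n - r - 1) with n = 5, r = 1

z0 = (1.0, 0.5)
beta1 = (3.0, -0.9)
ZZ_INV_DIAG = (1 / 5, 1 / 10)

fit = sum(b * v for b, v in zip(beta1, z0))                 # 2.55
q = sum(d * v * v for d, v in zip(ZZ_INV_DIAG, z0))         # z0'(Z'Z)^{-1} z0 = .225
ci = (fit - T_CRIT * math.sqrt(q * S2), fit + T_CRIT * math.sqrt(q * S2))
pi = (fit - T_CRIT * math.sqrt((1 + q) * S2), fit + T_CRIT * math.sqrt((1 + q) * S2))
ellipse_rhs = (1 + q) * (2 * 3 / 2) * F_CRIT                # m(n-r-1)/(n-r-m) = 3

if __name__ == "__main__":
    print("mean response CI: (%.2f, %.2f)" % ci)
    print("prediction interval: (%.2f, %.2f)" % pi)
    print("ellipse bound:", ellipse_rhs)
```

The prediction interval differs from the confidence interval only through the extra 1 inside the square root, which accounts for the variance of a new observation.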

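7.11 The proof follows the proof of Result 7.10 with $\Sigma$ replaced by A.

$(Y - ZB)'(Y - ZB) = \sum_{j=1}^n (y_j - B'z_j)(y_j - B'z_j)'$

and

$\sum_{j=1}^n d_j^2(B) = \mathrm{tr}\left[A^{-1}(Y - ZB)'(Y - ZB)\right]$

Next, since the cross-product terms vanish because $Z'\hat{\varepsilon} = 0$,

$(Y - ZB)'(Y - ZB) = (Y - Z\hat{\beta} + Z\hat{\beta} - ZB)'(Y - Z\hat{\beta} + Z\hat{\beta} - ZB) = \hat{\varepsilon}'\hat{\varepsilon} + (\hat{\beta} - B)'Z'Z(\hat{\beta} - B)$

so

$\sum_{j=1}^n d_j^2(B) = \mathrm{tr}(A^{-1}\hat{\varepsilon}'\hat{\varepsilon}) + \mathrm{tr}\left[A^{-1}(\hat{\beta} - B)'Z'Z(\hat{\beta} - B)\right]$

The first term does not depend on the choice of B. Using Result 2A.12(c),

$\mathrm{tr}\left[A^{-1}(\hat{\beta} - B)'Z'Z(\hat{\beta} - B)\right] = \mathrm{tr}\left[(\hat{\beta} - B)'Z'Z(\hat{\beta} - B)A^{-1}\right]$
$= \mathrm{tr}\left[Z'Z(\hat{\beta} - B)A^{-1}(\hat{\beta} - B)'\right]$
$= \mathrm{tr}\left[Z(\hat{\beta} - B)A^{-1}(\hat{\beta} - B)'Z'\right]$
$\ge c'A^{-1}c > 0$

where c' is any non-zero row of $Z(\hat{\beta} - B)$. Unless $B = \hat{\beta}$, $Z(\hat{\beta} - B)$ will have a non-zero row. Thus $\hat{\beta}$ is the best choice for any positive definite A.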

7.12 (a) Best linear predictor = $-4 + 2Z_1 - Z_2$

(b) Mean square error = $\sigma_{YY} - \sigma_{ZY}'\Sigma_{ZZ}^{-1}\sigma_{ZY} = 4$

(c) $\rho_{Y(Z)} = \sqrt{\frac{\sigma_{ZY}'\Sigma_{ZZ}^{-1}\sigma_{ZY}}{\sigma_{YY}}} = \frac{\sqrt{5}}{3} = .745$

(d) Following equation (7-56), we partition $\Sigma$ and determine the covariance of $(Y, Z_1)$ given $Z_2$. Therefore
$\rho_{YZ_1\cdot Z_2} = .707$

7.13 (a) By Result 7.13, $\hat{\beta} = S_{ZZ}^{-1}s_{ZY} = \begin{bmatrix}3.73 \\ 5.57\end{bmatrix}$

(b) Let $z_{(2)} = [z_2, z_3]'$. Then
$R_{z_1(z_2z_3)} = \sqrt{\frac{s_{z_1z_{(2)}}'S_{z_{(2)}z_{(2)}}^{-1}s_{z_{(2)}z_1}}{s_{z_1z_1}}} = \sqrt{\frac{3452.33}{5691.34}} = .78$

(c) Partition S as
$S = \begin{bmatrix}S_{z_{(1)}z_{(1)}} & s_{z_{(1)}z_3} \\ s_{z_3z_{(1)}}' & s_{z_3z_3}\end{bmatrix} = \begin{bmatrix}5691.34 & 600.51 & 217.25 \\ 600.51 & 126.05 & 23.37 \\ 217.25 & 23.37 & 23.11\end{bmatrix}$
with $z_{(1)} = [z_1, z_2]'$, so
$S_{z_{(1)}z_{(1)}} - s_{z_{(1)}z_3}s_{z_3z_3}^{-1}s_{z_3z_{(1)}}' = \begin{bmatrix}3649.04 & 380.82 \\ 380.82 & 102.42\end{bmatrix}$
Thus
$r_{z_1z_2\cdot z_3} = \frac{380.82}{\sqrt{3649.04}\sqrt{102.42}} = .62$

7.14

(a) The large positive correlation between a manager's experience and achieved rate of return on portfolio indicates an apparent advantage for managers with experience. The negative correlation between attitude toward risk and achieved rate of return indicates an apparent advantage for conservative managers.

(b) From (7-51),
$r_{yz_1\cdot z_2} = \frac{s_{yz_1\cdot z_2}}{\sqrt{s_{yy\cdot z_2}}\sqrt{s_{z_1z_1\cdot z_2}}} = \frac{r_{yz_1} - r_{yz_2}r_{z_1z_2}}{\sqrt{1 - r_{yz_2}^2}\sqrt{1 - r_{z_1z_2}^2}} = .31$

Removing "years of experience" from consideration, we now have a positive correlation between "attitude toward risk" and "achieved return". After adjusting for years of experience, there is an apparent advantage to managers who take risks.
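The partial correlations in 7.13(c) and 7.14(b) are two routes to the same quantity: one from the conditional covariance matrix, one from simple correlations. The sketch below (hypothetical function names) computes both from the 7.13 covariance matrix S as a consistency check; both routes should give about .62.

```python
import math

# Sample covariance matrix S from 7.13, ordered (z1, z2, z3).
S = [[5691.34, 600.51, 217.25],
     [600.51, 126.05, 23.37],
     [217.25, 23.37, 23.11]]


def partial_corr_from_cov(s):
    """r_{12.3} from the conditional covariance matrix (7.13(c) route)."""
    s11 = s[0][0] - s[0][2] ** 2 / s[2][2]
    s22 = s[1][1] - s[1][2] ** 2 / s[2][2]
    s12 = s[0][1] - s[0][2] * s[1][2] / s[2][2]
    return s12 / math.sqrt(s11 * s22)


def partial_corr(r_ab, r_ac, r_bc):
    """r_{ab.c} = (r_ab - r_ac r_bc) / sqrt((1 - r_ac^2)(1 - r_bc^2)) (7.14(b) route)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def corr(s, i, j):
    """Simple correlation from a covariance matrix."""
    return s[i][j] / math.sqrt(s[i][i] * s[j][j])


r_cov = partial_corr_from_cov(S)
r_corr = partial_corr(corr(S, 0, 1), corr(S, 0, 2), corr(S, 1, 2))

if __name__ == "__main__":
    print(f"{r_cov:.4f} {r_corr:.4f}")
```

For 7.14(b), call `partial_corr(r_yz1, r_yz2, r_z1z2)` with the three simple correlations from the exercise.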
7.15 (a) MINITAB computer output gives: $\hat{y} = 11{,}870 + 2634z_1 + 45.2z_2$;
residual sum of squares = 204,995,012 with 17 degrees of freedom.
Thus s = 3473. Now, for example, the estimated standard deviation of $\hat{\beta}_0$ is $s\sqrt{1.996152} = 4906$. Similar calculations give the estimated standard deviations of $\hat{\beta}_1$ and $\hat{\beta}_2$.

(b) An analysis of the residuals indicates there are no apparent model inadequacies.
(c)

The 95% predi~tion interval is ($51 ,228; $60,23~)

(d) Using the F statistic for $H_0: \beta_2 = 0$,

$F = \frac{(45.2)(.0067)^{-1}(45.2)}{12{,}058{,}533} = .025$

Since $F_{1,17}(.05) = 4.45$, we cannot reject $H_0: \beta_2 = 0$. It appears as if $z_2$ is not needed in the model provided $z_1$ is included in the model.
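The F statistic in 7.15(d) is a single division. The sketch below uses the printed quantities $\hat{\beta}_2 = 45.2$, the diagonal element .0067 of $(Z'Z)^{-1}$ and $s^2 = 12{,}058{,}533$, all taken as given.

```python
import math

B2_HAT = 45.2          # estimated coefficient of z2
C22 = 0.0067           # diagonal element of (Z'Z)^{-1} for beta_2
S2 = 12_058_533        # residual mean square s^2 (17 df)
F_CRIT = 4.45          # F_{1,17}(.05)

F = B2_HAT * (1 / C22) * B2_HAT / S2
s = math.sqrt(S2)

if __name__ == "__main__":
    print(f"F = {F:.3f}, s = {s:.0f}, reject H0: {F > F_CRIT}")
```

With one restriction, this F is the square of the usual t statistic for $\hat{\beta}_2$, so the conclusion matches a two-sided t test.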

7.16
Predictors   p = r + 1   C_p
z1           2            1.025
z2           2           12.24
z1, z2       3            3

7.17 (a) Minitab output for the regression of profits on sales and assets follows.

Profits = 0.01 + 0.0681 Sales + 0.00577 Assets

Predictor   Coef       SE Coef    T      P
Constant    0.013      7.641      0.00   0.999
Sales       0.06806    0.02785    2.44   0.045
Assets      0.005768   0.004946   1.17   0.282

S = 3.86282   R-Sq = 55.7%   R-Sq(adj) = 43.0%

Analysis of Variance
Source           DF   SS       MS      F      P
Regression        2   131.26   65.63   4.40   0.058
Residual Error    7   104.45   14.92
Total             9   235.71

(b) Given the small sample size, the residual plots below are consistent with the usual regression assumptions. The leverages do not indicate any unusual observations. All leverages are less than 3p/n = 3(3)/10 = .9.

Residual Plots for Profits: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data.

(c) With sales = 100 and assets = 500, a 95% prediction interval for profits is (-1.55, 20.95).

(d) The t-value for testing $H_0: \beta_2 = 0$ is t = 1.17 with a p-value of .282. We cannot reject $H_0$ at any reasonable significance level. The model should be refit after dropping assets as a predictor variable. That is, consider the simple linear regression model relating profits to sales.

7.18 (a) The calculations for the $C_p$ plot are given below. Note that p is the number of model parameters including the intercept.

p (predictors)       C_p
2 (sales)            2.4
2 (assets)           7.0
3 (sales, assets)    3.0

(b) The AIC values are shown below.

p (predictors)       AIC
2 (sales)            29.24
2 (assets)           33.63
3 (sales, assets)    29.46
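The $C_p$ and AIC values in 7.18 can be sketched with the definitions $C_p = SSE_p/s^2 - (n - 2p)$ and $AIC = n\ln(SSE_p/n) + 2p$. These definitions are an assumption about how the tabled values were computed, but with the 7.17 full-model SSE = 104.45, $s^2 = 14.92$ and n = 10 they reproduce $C_p = 3.0$ and AIC = 29.46 for the sales-and-assets model. The function names are illustrative.

```python
import math


def mallows_cp(sse_p, s2_full, n, p):
    """C_p = SSE_p / s^2 - (n - 2p), with s^2 from the full model."""
    return sse_p / s2_full - (n - 2 * p)


def aic(sse_p, n, p):
    """AIC = n ln(SSE_p / n) + 2p (one common form for regression)."""
    return n * math.log(sse_p / n) + 2 * p


# Full model of 7.17 (sales and assets): SSE = 104.45 on 7 df, s^2 = 14.92, n = 10, p = 3.
FULL = dict(sse=104.45, s2=14.92, n=10, p=3)

if __name__ == "__main__":
    print(mallows_cp(FULL["sse"], FULL["s2"], FULL["n"], FULL["p"]))
    print(aic(FULL["sse"], FULL["n"], FULL["p"]))
```

For the full model, $C_p$ always equals p (up to rounding of $s^2$), so the informative entries are those of the smaller models.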

7.19 (a) The "best" regression equation involving ln(y) and $z_1, z_2, \ldots, z_5$ is

$\widehat{\ln(y)} = 2.756 - .322z_2 + .114z_4$

with s = 1.058 and $R^2 = .60$. It may be possible to find a better model using first and second order predictor variable terms.

(b) A plot of the residuals versus the predicted values indicates no apparent problems. A Q-Q plot of the residuals is a bit wavy, but the sample size is not large. Perhaps a transformation other than the logarithmic transformation would produce a better model.

7.20 Eigenvalues of the correlation matrix of the predictor variables $z_1, z_2, \ldots, z_5$ are 1.4465, 1.1435, .8940, .8545, .6615. The corresponding eigenvectors give the coefficients of $z_1, z_2, \ldots, z_5$ in the principal components. For example, the first principal component, written in terms of standardized predictor variables, is

$\hat{x}_1 = .6064z_1^* + .3901z_2^* + .6357z_3^* - .2755z_4^* - .0045z_5^*$

A regression of ln(y) on the first principal component gives

$\widehat{\ln(y)} = 1.7371 - .0701\hat{x}_1$

with s = .701 and $R^2 = .015$. A regression of ln(y) on the fourth principal component produces the best of the one-principal-component predictor variable regressions. In this case $\widehat{\ln(y)} = 1.7371 + .3604\hat{x}_4$, with s = .618 and $R^2 = .235$.
7.21 This data set doesn't appear to yield a regression relationship which explains a large proportion of the variation in the responses.

(a) (i) One reader, starting with a full quadratic model in the predictors $z_1$ and $z_2$, suggested the fitted regression equation:

$\hat{y}_1 = -7.3808 + .5281z_2 - .0038z_2^2$

with s = 3.05 and $R^2 = .22$. (Can you do better than this?)

(ii) A plot of the residuals versus the fitted values suggests the response may not have constant variance. Also, a Q-Q plot of the residuals has the following general appearance:

Figure: Normal probability plot (residuals versus quantiles of standard normal).

Therefore the normality assumption may also be suspect. Perhaps a better regression can be obtained after the responses have been transformed or re-expressed in a different metric.

(iii) Using the results in (a)(i), a 95% prediction interval at $z_1 = 10$ (not needed) and $z_2 = 80$ is

10.84 :! 2.0217 or (5.32,16.37).


7.22 (a) The full regression model relating the dominant radius bone to the four predictor variables is shown below, along with the "best" model after eliminating nonsignificant predictors. A residual analysis for the best model indicates there is no reason to doubt the standard regression assumptions, although observations 19 and 23 have large standardized residuals.

(i) The regression equation is
DomRadius = 0.103 + 0.276 DomHumerus - 0.165 Humerus + 0.357 DomUlna + 0.407 Ulna

Predictor    Coef      SE Coef   T       P
Constant     0.1027    0.1064    0.97    0.346
DomHumerus   0.2756    0.1147    2.40    0.026
Humerus     -0.1652    0.1381   -1.20    0.246
DomUlna      0.3566    0.1985    1.80    0.088
Ulna         0.4068    0.2174    1.87    0.076

S = 0.0663502   R-Sq = 71.8%   R-Sq(adj) = 66.1%

The regression equation is
DomRadius = 0.164 + 0.162 DomHumerus + 0.552 DomUlna

Predictor    Coef      SE Coef   T      P
Constant     0.1637    0.1035    1.58   0.128
DomHumerus   0.16249   0.05940   2.74   0.012
DomUlna      0.5519    0.1566    3.53   0.002

S = 0.0687763   R-Sq = 66.7%   R-Sq(adj) = 63.6%

Analysis of Variance
Source           DF   SS        MS        F       P
Regression        2   0.20797   0.10399   21.98   0.000
Residual Error   22   0.10406   0.00473
Total            24   0.31204

(ii)

Residual910ts for DomRadius..Dom Hi:metus and 'Dm Ulna PtedictlS
Normal probabilty Plot

"

'O

the Reduàls

Reiduals Verus the Fitt Value
. 1l

90

..

-I
,~ 50

..

. ..

:D;

10
1

1l J. ~
'0.1

0;0

0;6

0.1

0;7 OJ! 0.9

1.0

Resual

Fi Value

Histogl"in of the Residuàls

ResidualsVelsus tleOrder ofthè Ðlt

8

f&

:!

i

.. 4

!

II .2

'0.1

~ò16 "8.08 .O¡O 0.08 0.16

ll..1

~Or

2 4 6 8 10 12 14 16 18 20 22 24

153

(b) The full regression model relating the radius bone to the four predictor varables
is 'Shown below. This fitted model along with the fitted model for the dominant

radius bone using four predictors shown in part (a) (i) and the error sum of
squares and cross products matrix constitute the multivanate multiple regression
modeL. It appears as if a multivariate regression model with only one or two
predictors wil represent the data well. Using Result 7.11, a multivarate
regression model with predictors dominant ulna and ulna may be reasonable.
The results for these predictors follow.
The regression equation is
Radius = 0.114 _ 0.0110 DomHumerus + 0.152 Humerus + 0.198 DomUlna + 0.462 Ulna

Coef

Predictor

Constant

o . 11423

DomHumrus
Humerus
DomUlna

Ulna

-0.01103
0.1520
0.1976
0.4625

SE Coef

T

1.27

0.08971
0.09676

-0.11

1.31
1.18
2.52

0.11'65

0.1674
0.1833

P

0.217
0.910
0.207
0.252
0.020

S = 0.0559501 R-Sq 77.2% R-Sq(adj) = 72.6%
Error sum of squares and cross products matrix:

The regression equation is
Radius = 0.178 + 0.322 DomUlna + 0.595 Ulna

The regression equation is

DomRadius 0.223 + 0.564 DomUlna + 0.321 Ulna

Predictor

Coef
0.2235

DomUlna

o . 5645

Constant
Ulna

0.3209

SE Coef
0.1120
0.2108
0.2202

T

2.00
2.68
1.46

Predictor

p

Constant

0.059
0.014
0.159

DomUlna

Ulna

SE Coef
0.08931
0.1680
0.1755

T

2.00
1.92
3.39

Analysis of Variance

Analysis of Variance

Source

DF
5S
2 0.184863

Res idual Error

22 0.127175
24 0.312038

Total

Coef
0.17846
0.3220
0.5953

MS

0.092431
0.005781

F

15.99

P VIF

0.058

o . 0'68 2 . 1

0.003 2.1

S = 0.0'606160 R~5q = 70.5% R-Sq(adj) = '67.8%

S = 0.07'60309 R-Sq = 59.2% R-sq(adj) = 55.5%

Regression

.050120 .050120J
.062608
(.088047

P

0.000

Source

DF
5S
2 0.193195

Residual Error

22 0.080835
24 0.274029

Regression

Total

Error sum of squares and cross products matnx:

MS F I

0.09'6597 26.29 O.OOC

0.003'674

.064903J

.064903 .080835
(.127175

154

7.23. (a) Regression analysis using the response Yi = SalePr.

Sumary of Backward Elimination Procedure for Dependent Variable X2

Variable Number Partial Model

Step
1

2
3

Removed In R**2 R**2

K9
7
0.0041
0.5826
X3
0.0043 0.5655
0.5782
X5 56 0.0127

6.3735
6.4341

Sum of Mean
DF Squares Square

Model

F
o . 66g7
o . 7073

C Total

R-square

425.05739
Parameter Estimates
Root MSE

Variable

DF

INTERCEP

1

XL

1
1

X4
X6
X7

1
1
1

18

o .4033
o .1538

2.0795

F Value

Prob) F

18.224

0.0001

5 16462859.832 3292571.9663
70 12647164.839 180673.78342
75 29110024.671

Error

Prob)F
0.4161

SalePr

Dependent Variable: X2
Analysis of Variance

Source

C(p)
7 .6697

o . 5655

Parameter
-Standard
Estimate
Error
-5605.823664 1929.3g86440
-77.633612 22.29880197
-2.332721
o . 75490590
389.364490
1749.420733

89.17300145
701.21819165

133. 177'529

46 .66673277

T for HO:

Parameter=O

Prob ) ITI

-2.905
-3.482
-3.090
4.366
2.495
2.854

o . 0049

o .0009
o . 0029
o . 0001

0.01'50
o .0057

The 95% prediction interv~l for SalePr for %0 is
z~ß:f t70(0.025) /(425.06)2(1 + z~(Z/Z)-lZO)'

SalePr:~reed .) FtFrBody J Frame ~ BkFat) SaleHt)
(a) Residual plot

oo
II
~

o
ll .......

0c:

oo
..o

'"
1ã

i:::
1i
Gl
a:

~b) Normal probability plot

00
ll
~

..

c:
~

'"

o

ii::
i:
'Uj

..

00
ll

o
o
'9·....
.-~.

u;

..

1000

2000

Predicted

/:...-..-..

Gl

a:
. ... ~ . .
0
o .._.................'W......:;.._.~..........................._..
.... . .
. . e.
00

3000

/' .-

.~.......~.;:~..
.
-2 -1

o

2

Quanties of Standard Normal

155

(b) Regression analysis using the r.esponse Yi = In(SalePr).

Sumary ofBa~kward Elimination Procedure f~rDependent

Variable LOGX2

Variable Number Partial Medel

Removed In R**2 R**2

Step

X3
76 0.0033
0.6368
X7
0.0057
0.6311
i9 5 0.0122 0.6189
X4 4 0.0081 0.6108

1

2
3
4

C(p)

F

Prob~F

7.6121
6.6655

0.6121

'0 .4368

1. 0594

6 . 9445

2 . 2902

o .3070
o . 1348

6.4537

1 .4890

o . 2265

Dependent Variable: LOGX2
Analysis of Variance
'Sum of

Mean

Source

DF

'Squares

Square

F Value

Prob~F

Mode 1

4
71
75

4.02968

1. 00742

27.854

0.0001

2 . 56794

0.03'617

Error
C Total

6.597-'2

0.19018

R~ot MSE

R-square

Parameter Estimates

0.6108

Parameter
Estimate

'Standard

Error

Parameter=O

Prob ~ ITI

0.91286786

5.736

0.0001

o . 00846029
o . 00827438

-5 .841
-3 .337

o . 000 1

1

5.235773
-0.049418
-'0.027613
0.183611
o .058996

4.599
3.060

o . 0001

1

0.03992448
0.01927655

Variable

DF

INTERCEP

1

XL

1

X5
X6
X8

1

T for HO:

0.0013
0.0031

The 95% prediction interval for In(SalePr) for Zo is
z~ß:f t7o\O.025) J~O.19.Q2)2(1 + z~(ziz)-izo).

The few outliers among these latter residuals are not so pronounced.
In(alePrFfl3r.ed S PrctPF8 j Frame i SaIeHt)

(b) Normal probabilty plot

(a) Residual plot

.

c:

C\

II

iü
::

c:

"0
.¡¡
'"

a:

y/

""

. ...

.- .'....

c:

..

.
.... .".
. .
. .:".. .

C\

II

c:

II

0

iü
::
:2

.- ."

j........

'"
~ . ...............................................................................
c:

.. .:. ..1.\ .: ~ . :

C\

c?
""

..

. ..

9

7.0 7.2 7.4 7.6 7.8 8.0

Predicted

a:

J

C\

9

....~~;.
......

~ ........

9 '.

-2 -1

o

2

Quantiles of Stadard Nonnal

156

7.24. (a) Regression analysis using the response Yi = SaleHt and the predictors Zi = YrHgt
and Z2 = FtFr Body.

SaleHt

Dependent Variable: X8
Analysis of Variance
Sum of

Mean

OF

Squares

Square

F Value

Model

2

235.74'533

117 .87267

131.165

Error

73
75

65.60204
301.34737

o . 89866

Source
C Total

R-square

0.94798

Root MSE

Parameter Estimates

o . 7823

Standard
Error

Parameter
Estimate

Prob)F
0.0001

Variable

OF

INTECEP

1
1

7.846281

3.36221288

X3

o . 802235

o . 08088562

X4

1

o . 005,773

0.00151072

T for HO:

Prob )

Parameter=O

2.334
9.918
3.821

ITI

'Û . 0224
o . 000 1

o .0003

The 95% prediction interval for SaleHt for z~ = (1,50.5,970) is
53.96:f t73(0.025) \/0.8987(1.0148). = (52.06,55.86).

SaleHt:r~rHgt) FtFrBody)

(b) Normal probabilty plot

(a) Residual plot

......,.

N

N

....
. e. :.......:

.'.' .

/~:.
./
.t'

.- ... ....

Ul

ii:i
'l

c

a:

...

ëii
Gl

Ul

. ..

ll .- . ,,'
.. .:. e.

....... ............wr........ v-....... 60.............._;.. ........

ii:i
'l

0

a:

..,
C)

54

Predicted

56

..pa

....

C)

52

././

ëii
Gl

58

,,".......:.
.
.2 -1

o

2

Quantiles of Standard Normal

157

(b) Regression analysis using the response 1í = SaleWt and the predictors Zi = YrHgt and
Z2 = FtFrBody.

SaleWt

Dependent Variable: X9
Analysis of Varian~e

Sum of Mean

Source

DF Squares Square

Model

2 390456.63614 195228.31807
73 873342.99544 11963.60268
75 1263799.6316

Error
C Total

109.37826
Parameter Estimates
Root MSE

Variable

DF

INTERCEP
X3
X4

1

Parameter
Estimate

R-square

1

Prob;)F

16.319

0.0001

o . 3090

Standard
Error

675.316794 387 . 93499836
9.33265244
2.719286
0.17430765
0.745610

1

F Value

T for HO:

Parameter=O
1.741

Prob ;) ITI

0.291
4.278

0.771'6

o .0859
o .0001

The 95% prediction interval for SaleWt for z~ = (1,50.5,970) is

1535.9:: t73(0.025)V1l963.6(1.0148) = (1316.3,1755.5).

SaleW~rHgt) FtFrBody)
(a) Residual plot (b) Normal probabilit plot

oo
C'

o
o
C'
o
o
N
II

ñi
='

'0
'¡¡

Gl
IX

o
o
-

... .'.
..

. .'

ooC\

. .. ..

.... .. .
.... .

~
'
.
....
o ...

o .. ._.._.;....._.....;~.._;.. .-_..................................

II

ñi
='

'0
.¡¡

Gl

IX

o

8
.
ci

Predicted

r
/"

/

¡..-

.or

o
oo
...

1500 1600 1700 1800

. .'
, .,'.'

,. .....

oo
-

-.
o ......rI.
-. .

ci

.- .....
... .....

.,

. ...,

......

.2 -1

o

2

Quantiles of Standard Normal

158

Multivariate regression analysis using the responses Yi = SaleHt and Y2 = SaleWt and
the predictors Zi = YrHgt and Z2 = FtFrBody.
Multivariate Test: HO: YrHgt = 0
Multivariate Statisti~s and Exact F Statistics

S=1 M=O N=35

Statistic

Value

F

Wilks i Lambda

o . 38524567

57.4469

Pillai's Trace
Hotelling-Lawley Trace

0.61475433

57 . 4469

Roy

i s Greatest Root

Multivariate Test:

HO:

Multivariate Statistics
S=l

1 .59574625

57.4469

1.59574625

57 .4469

Num DF
2
2
2
2

Den DF

Num DF
2
2
2
2

Den DF

Pr ~ F

72
72
72
72

0.0001
0.0001

Pr ~ F

72 0.0001
72 0.0001
72 a .cQOO1
72 0..0001

FtFrBody = a
and Exact F Statistics

N=35

M=O

Statistic

Value
0.75813396
0.24186604
0.31902811
0.31902811

Wilks i Lambda

Pillai's Trace
Hotelling-Lawley Trace
Roy's Greatest Root

F

11. 4850
11 .4850
11. 4850
11 .4850

a .0001
0..0001

The theory requires using Xa (YrHgt) to predict both SaleHt and'SaleWt, even though
this term could be dropped in the prediction equation for Sale\Vt. The '95% prediction
ellpse for both SaleHt and SaleVvt for z~ = (1,50.5,970) is
1.3498(1'i - 53.96)2 + O.000l(Yó2 - 1535.9)2 - O.0098(Yói - 53.96)(ìó2 - 1535.9)

2(73)

- i.OI4872F2,72(O.05) = 6.4282.

The 95% predicion ellpse
for both SaleHt andSaleWt

Chi-square plot of residuals

o
-

~
Ò 'l...:/'

oo
~

co

l:
II

~l! '"

N
o
~

ai

o

2

4

qchisq

oo
io

-

o

, o-

CO

6

B

10

51 S2 53 S4 5S 55 57
Y01

159

7.25. (a) R-egression analysis using the first response Yi. The backward elimination proce-

dure gives Yi = ßoi + ßl1Zi + ß2iZ2. AU variables left in the model are significant at
the D.05 leveL. (It is possible to drop the intercpt but We retain it.)
Dependent Variable: Y1
Analysis of Variance

TOT

Sum of Mean

Source
Model

DF Square s Square

F Value

Prob~F

2 5905583.8728 2952791.9364

22.962

o .0001

Error

14 1800356. 3625 128596.88303

C Total

16 7705940.2353

R-square

358 . 60408

Root MSE

o . 7664

Parameter Estimates

Parameter
Estimate

Standard

Parameter=O

Frob ~ ITI

1

56. 720053
507.073084

Eror

o . 328962

0.274
2.617
6.609

o . 7878
o . 0203

1

206.70336862
193.79082471
0.04977501

Variable

DF

INTERCEP

1

Zl
Z2

T for HO:

0..0001

The 95% prediction interval for YÍ = TOT for z~ = (1,1,1200) is

958.5:l ti4(O.025)y'128596.9(1.0941) = (154.0,1763.1).

TOT~eN ; AMT)
(a) Residual plot

0co0

(b) Normal probabilty plot

00
co

. .....' -'

. .' .'

. .....

00

0N0
In

ãi
::

Q

"C

,ñ

.............................................................
~

..

CD

ir

0
0
y

r' .'

N

In

ãi
::
"C
üi

0

,.'

,... .

.....~..

00
y

....
"

oo
~

500 1000

2000

Predicted

3000

.....

...... .
..... .

CD

ir

. .

....

.'

......

g
.
~
-2

-1

o

Quantiles of Standard Normal

2

160

(b) Regression analysis using the second response Y2' The backward elimination procedure
gives 11 = ß02 + ßi2Zi + ß22Z2. All variables left in the model are significant at the
0.05 leveL.

Dependent Variable: Y2 AMI
Analysis of Variance

of Mean

Sum

Source

DF Squares Square

F Value

Prob.)F

Model

2 5989720.5384 2994860.2692
14 1620657.344 115761.23886
16 7610377.8824

25.871

0.0001

Error
C Total

R-square

340.23703
Root MSE
Parameter Estimates

Variable

DF

INTERCEP

1

Zl

1

Z2

1

o . 7870

Parameter
Standard
Estimate
Error
-241.347910 196.11640164

T for HO:

Parameter=O

Prob .) ITI

-1.231
3.298
6.866

o . 2387

183.86521452

606 . 309666
o . 324255

o . 04722563

0.0053
o . 0001

The 95% prediction interval for 1'2 = AMI for z~ = (1,1,1200) is
754.1 :l t14(0.025) Jii5761.2(1.0941) = (-9.234,1517.4).

AMi=t(eN J AMT)
(b) Normal probabilty plot

(a) Residual plot

oo
CD

oo
CD
(/
¡¡
::

ooC\ .

.' .'

.......

..

en

.

¡¡

i::i

oo
C\

........
!t0P'"

o
oo

i:

o ......A.............................................................

G)

oo

ïii

c:

i.Gl
c:

C)

C)

oo

oo

...~...
....o¡ .

....;.
.'

....

............
.
..'

'9

'9

500 1000

2000
Predicted

.2

.1

o

Quantiles of Standard Normal

2

161

(c) Multivariate regresion analysis using Yi and 1"2.

Multivariate Test: HO: PR=O. DIAP=O. QRS=O
Multivariate Statistics and F Approximations

8=2 M=O N=4

Statistic
Wilks' Lambda

Value
0.44050214

Pillai 's Trace

o . 60385990

Hotelling-Lavley Trace
Roy's Greatest Root

1.16942861
1.07581808

F
1 .6890
1 .5859

Num DF

1.7541

6
3

Den DF

6

Û

3 .9447

Pr ~ F

20 .( . 1755
22 .( . 1983
18 0.1657
11

0.0391

Based on Wilks' Lambda, the three variables Zs, Z4 and Zs are not significant. The

95% prediction ellpse for both TOT and AMI for z~ = (1,1,1200) is
4.305 x 10-5(Yå1 - 958.5)2 + 4.782 x 10-5(l'2 - 754.1)2

s( ? 2(14) ( )

- 8.214 x 10- 101 - 958.5)(1'Ó2 -754.1) = i.0941-iF2,1s,\O.D5 - 8.968.

The 95% prediction ellpse
for both TOT and AMI

Chi-square plot of residuals
U)

gII

-

II

i: ..

lI

'C
'C

N
1:

!! CO

~

CI

o'E

N

0

00
0-

00
10

..

.. ...

..

. .

0

,

0

2

3

4

qchisq

5 1l 7

o

500 1000 1500 2000
Y01

162

7.26 (a) (i) The table below summarizes the results of the "best" individual regressions.
Each predictor variable is significant at the 5% leveL.

73.6%

s
1.5192

76.5%
75.4%

.3530
.3616

80.7%

.6595

75.7%

.3504

R2

Fitted model
Yi = -70.1 + .0593z2 + .0555z3 + 82.53z4
27.04z4
Y2 =-21.6-.9640z1 +

26.12z4

Y2 = -20.92+.01 17z3 +

44.59z4

Y3 =-43.8+.0288z2 +.0282z3 +

Y4 = -17.0+.0224z2 +.0120z3+ 15.77z4

(ii) Observations with large standardized residuals (outliers) include #51, #52
and #56. Observations with high leverage include #57, #58, #60 and #61.

Apar from the outliers, the residuals plots look good.
(ii) 95% prediction interval for Y 3 is: (1.077, 4.239)

(b) (i) Using all four predictor variables, the estimated coefficient matrix and
estimated error covariance matrix are

-74.232 -24.015 -45.763 -17.727
-.550
-1.486
- 1.185
- 3.120

B=
i

I=

.098

.009

.047

.029

.049

.008

.025

.011

85.076

28.755

45.798

16.220

2.244 .398 .914 .511
.398 .118 .193 .089
.914 .193 .419 .210
.511

.089

.210 .122

A multivariate regression model using only the three predictors Zi, Z3 and Z4

wil adequately represent the data.
(ii) The same outliers and leverage points indicated in (a) (ii) are present.
Otherwise the residual analysis

suggests the usual regression assumptions

are reasonable.

(ii) The simultaneous prediction interval for Y 3 wil be wider than the
individual interval in (a) (iii).

163

7.27 The table below summarizes the results of the "best" individual regressions.
Ea~h predictor variable is significant at the 5% leveL. (The levels of Severity are
coded: Low= 1, High=2; the levels of Complexity are coded: Simple= 1, Complex=2;
Exper are coded: Novice=l, Guru=2, Experienced=3.) There are no
the levels of
signifcant interaction terms in either modeL.

Fitted model
Assessment = -1.834 + 1.270Severity + 3.003Complexity
Implementation = -4.919 + 3.477 Severity + 5.827 Complexity

74.1%

s
.9853

71.9%

2.1364

RZ

For the multivariate regression with the two predictor variables Severity and
Complexity, the estimated coeffcient matrix and estimated error covariance matrix
are

B = 1.270 3.477.
3.003 -4.919J
5.827
(-1.834

î: = ( .9707 1.9162J
1.9162 4.5643

A residual analysis suggests there is no reason to doubt the standard regression
assumptions.

164

Chapter 8
8.1 Eigenvalues of * are À1 = 6. À2 = 1. The

principal campenents are
Yl = ..894X1 + .447X2

,.

Y2 = .~47X1 - .894XZ

Vareyi) = À1 '= 6. Therefore. proporti()n of tatal population variance
explained by ~l is 6/(15+1) = .86.

8.2

.6325
1
£ = (1
. .6325J
(a) Y, = ,.707L, + .7!J7L¿

Var(Yi) = À, = 1.6325

Proportien of total population

'2 = .7n7Z, ~ .707Z2

varianceexpl ained by Yl is

1.6325/(1+1. = ..816
(b) No. The two (standardized) variables contribute ~qually to the

principal components in 8.2(a). The two variables contribute
unequally to the principal components in 8.1 because 'Of their

unequal varian~es.
(c) Py L = .903;

1 1 .

PYl Zz

= .903;.

Py Z = .429

2,

8.3 Ei~envalues of tare 2.' 4. 4. E;genvect~rs assaciate with the ei~en-

values 4. 4 are not unique. One choi~~ is =i = iO 1 O)çnd
:~ =(0 0 1). With these assignments .the 'principal components are

y 1 = Xl' Y 2 = X2 and Y 3 = X3 .
8.4 figenvalues of * are selutions.of 1;-À11 = (a2-Àp-2t~2_ÀH.a2,p)2 = 0
Thus ~0'2-À)H.a2_À)2_2cr4p2J = 0 S'O À = 0'2 'Or À ='"2~lt~hl,). For

À1 = (12,~i = (l/ff,.o,-l/I2J. 'For À2 =O'2(l+i'Ii; ~ fj/Z.)1NEiZ). fer

À~'=a2'tl-pI2). ~~ = 0/". -1/12; 1/12

1.65

Propert ion of Total

Pri nci'fa 1

Component

Vari ance

1 1

02

1/3

. a2(l+pm

1 (1+1'12)

Y 1 = a Xl - 12 X3

1 1 1
1 1 1

Variance Explained

y 2 = "2 Xl + 12 X2 + 2" X3

0'2 (1-.p12)

y 3 = '2 Xl - /ž X2 + 2" X3

1 (l-pl2)

8.5 (a) Eigenva.l ues of 2 satisfy

IE-ul = (l-À)3 + 2.p3 - 3(1:'À)p1 = 0

or (l +29-À)(1-p-À)2 = O. Hence À1 = 1 + 2.p; À2 = À3 = 1 - ?
and results are consistent with (8-16) for p = 3.

1 1

(b) By direct multiplication

.i y.p - y~ -

.ø ( c 1) = (1 + (P-1)9 H c 1 )

thus varidying the fir~t eig~nvalu~-eigenv~ct~~ pair. further
~ :i= (l-p)~i' ; = 2.3....,p .

166

8.6 (a)

Yi = .999xl + .041x2 Sample variance of Yi = -l = 7488.8
variance
Y2 =-.041xl +.999x2 Sample

of Y2 =t =13.8

(b) Proportion of total sample variance explained by Yi is -l/(-l + t) = .9982

(c) Center of constant density ellpse is (155.60, 14.70). Half length of major axis is

102.4 in direction ofyi' Half length of perpndicular minor axis is 4.4 in
direction of Y2'

19 1 .' 2

(d) r)~ x = 1.000, ry" x = .687 The first component is almost completely deterined
by Xi = sales since its variance is approximately 285 times that of X2 = profits.
This is confirmed by the correlation coefficient ry'¡.xi = 1.000.
8.7 (a)
Yi = .707z1 +.707z2

Y2 = .707z1 -.707z2

Sample varance of Yi = -l = 1.6861
Sample

variance

of Y2 =t =.3139

(b) Proportion of total sample variance explained by Yi is -l /(-l + t) = .8431

(c) rýi.i¡ = .918, rÝ1'l2 = .918 The standardized "sales" and "profits" contribute equally

to the first sample principal component.

(d) The sales numbers are much larger than the profits numbers and consequently,
sales, with the larger variance, wil dominate the first principal component
obtained from the sample covariance matrix. Obtaining the principal components
from the sample correlation matrix (the covariance matrix of the standardized
variables) typically produces components where the importance of the varables,

as measured by correlation coefficients, is more nearly equal. It is usually best to
use the correlation matrix or equivalently, to put the all the variables on similar
numerical scales.

167

8.8 (a) rý¡,Zl == êik.J i == 1,2 k = 1,2,.. .,5

Correlations:

i'\k
1

2

1

.732
-.437

2
.831

-.280

4
.604
.694

3

.726
-.374

5

.564
.719

The correlations seem to reinforce the interpretations given in Example 8.5.

(b) Using (8-34) and (8-35) we have

k rk
1
2
3
4
5

.353
.435
.354
.326
.299

r = .353

T= 103.1;: .%;(.01)=21.67 so

r=2.485

would

reject Ho at

the 1%

leveL. This test assumes a large random sample and a
multivariate normal parent population.

lti8

8.9 (a) By (S-lt)

_ ..

2

£! E!!l
e

max LÜi~t) -'

~.r -

( 21) 2 \ n-n1 ) 2 I S 12

The same resul t appl ;ed to each variable independently 9; ves

n

max

e-i
n n n

L(l1. ~O' ..) =

1 11

(2ir)'2 (n-l)2
s~.
n 11

11.
..
i ~o11

~ 1 11 .
p

Under HO ~ max L(ii.rO) = .II L(\1.~a..)

11.+0 1=1

and the 1 i kel i hood ratio stat; stic becomes

. il:fo L(~.tO)
A = 'max Ui =

.-

:~.; -

; =1 11

n

PT
n 5..
. n

and (4-17) \.¡e get
(b) When t = 0'2 I . using (4-l6)

1

max L(\l ,O"~I) =
ii

1 -2aLttr(~n-l)$))
.e

!Y !æ
(2n) 2 (cr2.) 2

1"69

8.9~ Continue)
50

np e

( )np/2 -np/2

max l.(ll ,a2 I) =

-). .0'2

(lr)np/2(n_l )np/2(tr(s))np/2

e -np/2

=

n p
1 np/2

(21T)np/Z (.!) (1 tr (5) )np/2

and the result follows. Under HO there' are P lJ; 's and. '01Ì
v~riance so the dimension of the parameter space;s YO = p. + 1.

The unrestricted case has dimension p + p(p+l)lZ so the X2 has
p(p+l )/2 - 1 = (p+2)~p-l )/2 d. f.

8.10 (a) Covariances: JPMorgan, CitiBank, WellsFargo, RoyDutShell, ExxonMobii

JPMorgan

JPMorgan

CitiBank
WellsFargo
RoyDutShell
ExxonMobil

0.00043327
0.00027566
0.00015903
0.00006410
0.00008897

CitiBank

WellsFargo RoyDutShe1l

ExxonMobi1

0.00043872
0.00017999
0.00018144
0.00012325

o .00022398
0.00007341 0.00072251
0.00006055 0.00050828

0.00076568

Fargo, RoyDutShell, Exxon

Principal Component Analysis: JPMorgan, CitBank, Wells

Eigenana1ysis of the Covariance Matrix
103 cases used

Eigenvalue
Proportion
Cumulative

Variable

JPMOrgan

CitiBank
WellsFargo
RoyDu.tShel1

ExxonMi 1

0.0013677
0.529
0.529
PC1

0.223
0.307
0.155
0.639
0..651

0.00.07012

0.271
0.801

PC2

PC3

0.0002538
0.098
0.899
PC4

0.000142ti
0.055

0.954

PC5

-0.625 -0.326 0.663 -0.118
-0.570 0.250 -.0.414 0.589
-0.345 0.038 -0.497 -0.780
0.248
0.309 -0.149
O. .642
0.094
0.322 -0.64ti -0.216

0.0.001189
.0 .04ti

1.000

170

(b) From par (a),
~ = .00137 t = .00070 .t = .00025 14 = .00014 is = .00012,

(c) Using (8-33), Bonferroni 90% simultaneous confidence intervals for Âi Â. ~ are

íl: (.00106, .00195)

Â.: (.00054, .00100)

~: (.00019, .0036)

(d) Stock returns are probably best summarized in two dimensions with 80% of the
total variation accounted for by a "market" component and an "industry"
component.
8.11 (a)
3.397 - 1. 102

9.673

s=

4.306 - 2.078

.270

-1.513 10.953 12.030
55.626 - 28.937

-.440

89.067

9 :570

31.900

(Symetric)
(b)
~ = 108.27
A

ei

t =43.15

.t = 31.29

14 = 4.60

is = 2.35

ê3

ê4

ês

ê2
.

-0.037630
0.118931
-0.479670
0.858905
0.128991

0.554515
-0.062264 0.040076
-0.249442 -0.259861 -0.769147
-0.759246 0.431404 -0.-027909
o .068822
-0.315978 0.393975
0.308887
-0.767815
-0.507549

0.828018
0.514314
-0.081081
-0.049884
-0.202000

171

.91 =-.038xl +.119x2 -.480x3 +.8S9x4 +.129xs
5'2 =-.062xl -.249x2 -.759x3 -.316x4 -.508xS

(c) Correlations between variables and components:
Xl

X2

X3

X4

Xs

r.y¡,x;

-.212

.398

-.669

.947

.238

r.Y2,X¡

-.222

-.527

-.669

-.220

-.590

The proportion of total sample variance explained by the first two principal
Components is (108.27+43.15)/(108.27+43.15+31/29+4.60+2.35)=.80.

The first component appears to be a weighted difference between percent total
employment and percent employed by government. We might call this
component an employment contrast. The second component appears to be
influenced most by roughly equal contributions from percent with professional
degree (X2), percent employment (X3) and median home value (xs). We might
call this an achievement component. The change in scale for Xs did not appear
to have much

affect on the first sample principal component (see Example 8.3)

but did change the nature of the second component. Variable Xs now has much

more influence in the second principal component.

172

-2. 768

2 . 500

8.12

(300.

51G)

- .378

-.4'64

- .586

-2.23'5

.171

. 3.914

- 1 .395

6.779

30.779

..624

1 .522

.673

2.316

2.822

.142

1.182

1 .089

,:.811

.177

11 .364

3 . 1 33

1 .04'5

3Q.978

..593

S =

.479

(Symmetrk)

1.0

-.101

1.0

-. 1 94

- .27-0

- .110

- .254

. 15'6

.183

- .074

.11£

.319

.052

.502

.557

.411

.166

.297

- .1 34

.235

.167

.448

1.0

1.0

1.0

R =

1.0
( Symetri c )

.154

1.0

Using $:
~1 = 304.2£; ~2 = 28.28; ~3 = ll.4~; ~4 = 2.52; ~~ = 1.28;

~6 = .53; 5:7 = .21
The first sampl-e princi-pal component

,.
Y1 = -.Oinxi +.993x2 +.014x3 -.OO5x4 +.024xS +.112xii +;OO2x7

accounts f-or 87% of the total sampl-e variance. Tliefirst .c'Ompont is

essentially IIso1ar r-adiation". ~Nete t~ large sample varianc~ f"()r x2
in S).

173

Usingji:

,. A

~1 = 2 .34; ~2 '=: 1.39;

,.
,. A
1.3 = 1.20; '"4 = .73; À5 = :65;

À£ = .54; À 7 = .16

The first thre,e sample

principle components are

A

Yl = .~37Z1 -.~05z2 -.551z3 ,-.378z4 -.498zS -.324z6. -.319z7
,.

Yi = -.278z,- +.527z2 +.007z3 -.435z4 -..199z5 +.5S7zti .-.308z7.
,.
Y3 = .ó44z1 +;225z2 -.113z3 -.407z4 +.197z5 +.1~9z6 +.541z7
These components ~cceunt fer 70% of the total sample vari ance.

The first camponent contrasts "\'/ind" with the. remaining
variables. It might be some general measur.e of the pol1uti()n

level ~ The second component is largely cemposed of "solar

radiati,on".. and the pollutants "NO" and Iln3". It might represent
the effects of solar radiåtion since solar radiation is involved
in the production of NO and D3 fro!l the other pollutants. The
third 'c-omponent is -eampos-d largely of ii..tind" and certain pollu-

tants (e.g. "NO" and "He"). It might represent a wi~ transport
. effect. A "better" interpretation of the components \'iould depend

on more .extensive subject matter knowledge.

The data can be eff€ctive1y summarized in three or few~r

dimensions. The choice of S' or R makes a difference.

174

8.13

(a) Covariane Matrix
XL

X2

XL

X2

X3

4.6'54750889
0.93134537C

0.931345370
0.612821160
0.110933412

o .589699088

O. 1184"69052

o .087004959

l1C933412
0.571428861

o .

X4

" .58g699088
'0.276915309

X5

1 .074885"659

o .388886434

X6

0.15815'0852

-0.024851988

0.347989910
0.110131391

X4

X5

X6

0.276915309
0.118469052

1 .074885659
o .388886434

o .087004959

0.347989910

0.110409072

0.21740"5649

o . 217405"649

.0.862172372
-0.008817'694

X3

XL

X2
X3
X4
X5
X6

0.021814433

o .

15815.Q52

-0.024851988
o .

11'(131391

0.021814433
-0.008817694
0.861455923

Correlati~n Matrix
XL

X2

X3

14

X5

X-ô

XL

1 .0000

o . 5514

0.3616

o . 53"66

0.0790

X2
X3

0.5514

1.0000
0.1875

o .3863
o .4554
o .3464

- . 0342
-0. 157"

1.0000

o .5350
o .4958
o .704'6

0.7'Û46

1 . 0000

- .0102

0.0707

-.0102

1 . 000

)(4
X5
X6

o .3616
o .3863

o . 4554
o . 5350

o .536'
0.0790

- . 0342

o . 1875
1.'ÛOOO
o . 3464

0.4958
0.1570

0.0707

(b) We wil work with R since the sample variance of xl is approximately 40 times lai.ger
than that of x4.

Eigenvalues of the Correlation Matrix

PRIN1
PRIN2
PRIN3
PRIN4
PRINS
PRIN6

Proportion

'Cumula t i va

0.47738

0.77764 0.12733

0.477385
0.179408
0.129607

0.65031 0.2"6228

0.1-08386

0.89479

0.38803 0.14478

o . '064672
o .040543

0.9594"6

Eigenvalue Difference
2.86431 1.78786
1 .-07"645 0 .29881

o . 2432"6

o . 65'679
o .78640

1 .00000

175

Eigenvectors
PRIN2

PRINl

PRIN3

PRIN4

PRIN5

-.551149
-.061367
-.421060
0.665604

-.600851

PRING
O. 146492

o . 687297

o .076408

0.331839

0.211635
o .532689

- .116262

o .4458

- .026600

0.339330
o .498607

X4

o .429300
o .358773
o . 402854

-.291738
0.380135

XS

0.521276

XL

X2
X3

X6

- . 020959
- . 073090
o . 873960

o .055877

-.628157
- .124585
- .203339

o .200526

-.207413
-.103175

o .429880

0.178715

o . 053090

-.794127

(c) It is not possible to summarize the radiotherapy data with a single component. We
nee the

fit four components to summarize the data.

(d) Correlations between principal components and Xl - X6 are
PRINl

PRIN2

PRIN3

PRIN4

-0.02766
-0 .302ti8

X4

o . 78335

X5

o .88222
o . -09457

0.39440
-0.02175
-0.07646

0.43969
-0 .55393
-0.10986
-0.17931

-0.44446
-0.04949

X3

o .75289
o .72056
o .60720

o . 29923

o . 90675

o .37909

XL

X2

X6

-0.339"55
o .53'67"6

0.16171
0.14412

8.14 S is given in Example 5~Z_

~l = 200.5, ~2 = 4..5. . Å3 = 1.3
The first sample principal component explains a proporticn

AI J

200.5/(200.5 + 4.5 + 1.3) = .97 of the total sample variance.

Also,

=1 = (-.051. -.998. .029

,.

Hence Yl = -.051x1 -.998x2 +.029x3

176

The first principal cQmponent is essentially Xz = sQdium content.

"s"dium in S). A
(NQte the (r,elativ.ely) large sample vtlriance for

Q_Q plot of the Yl values is shown bel-ow. Theseàata appear to

be approximately normal with no suspect observations.

o.
,.

Yl (1)
*

-'15.
w
..

w
..

-30.

li
'¡w
"...

*
oW

'f'

-45.

;¡
'I'

** *
..
..

-60. '

** **

w
'..

*
...
.,.

-75.
1

-2.0

-1.0

1

I

0.0

1.0
,.

Q-Q plot for Yl.

2.0

i..i~

3.\)

q(i )

177

1088.40

8.15

831 .28

1128.41
S =

7'63.23

784.09

850.32

92'6.73

1336.15

904.53

(Symmetri c)

1395.1"5'

~ A A A

À1 = 3779.01; À2 = 4'68.25; À3 = 452.13; À4 = 24~.72

Consequent1y~ the first sample principal component aCt:ounts for a

proportion .3779.01/~948.l1 = .76 of the total sample variance.
A 1 so ,

""

:1 = (.45. . .49. .51, .53)
Co nsequent 1 y ~

,.
Y, = .45xi + .49x2 + .5lx3 + .53x4

The interpretation of the first component is the same as the

interpretation of the first component, obtained from R. in
Example

8.6. (Note the sample variances in S are nearly equal).,

178

8.16. Principal component analysis of Wisconsin fish data
.(') An are positively correlated.

(b) Principal component analysis using xl - x4

Eigenvalues -of R
2.153g 0.7875 0.6157 0.4429

Eig~nvectors of R
O. 7~32 0 . 4295 O. 1886 -0.7'071

0.6722 0.3871 -0.4652 ~.4702
0.5914 -0.7126 -0.2787 -0.3216

0.6983 -0.2016 0.4938 0.5318

pel pc2 pe3 pe4
St. Dev. 1.4676 0.8874 0.7846 0.66£5
Prop. of Vax. 0.£385 0.1969 0.1539 0.1107
Cumulative Prop. 0.5385 0.7354 0.8893 1.0000

The first principal component is essentially a total of all four. The second contrasts
the Bluegil and Crappie with the two bass.
(c) Principal component analysis using xl - x6
Eigenvalues of R
2.3549 1.0719 0.9842 0.6644 0.5004 0.4242

Eigenvectors of R

-0.6716 0.0114 0.5284 -0.'0471 0.3765 -0.7293
-0.6668 -0.0100 0.2302 -0.7249 -0.1863 0.5172
-0.5555 -'0.2927 -0.2911 0 .1810 ~O. 6284 -0.3'081

-0.7'013 -'0.0403 0.0355 0.6231 0.34'07 '0.5972
0.3621 -0.4203 0.0143 -0.2250 0.5074 0.0872

-'0.4111 0.0917 -0.8911 ~O.2530 0.4021 -0.1731

pe 1 pe2 pe3 pc4 peS pe6
St. Dev. 1.5346 1.0353 0.9921 0.81£1 0.7074 0.6513

Prop. of Var. (). 3925 0.1786 0.1640 0.1107 0.0834 0.0707
Cumulative Pr~p. '0.392'0.5711 0.7352 0.84£9 0.9293 1. 0000
The \Va.liey~ is eontrasted with aU the others in the first principal eompoont ,look at
theLOvariance pattern). The

second principal component is essentially the 'Walleye and

somewhat th,e largemouth bas. The thkd principal component is nearly a contrast
betV'æ..n Northern pike and BluegilL

179

8.17
COVARIANCE MATRIX

-----------------

xl

x2
x3
x4
xS

x6

..Q13001'6

.0103784
..Q223S.Q0

.0200857
.0912071

..0079578

.0114179
.0185352
.0210995
.0085298
.0089085

.0803572
.06677"62

.0168369
.0128470

.0694845
.0177355
.0167936

.0115684
.0080712

.01'05991

The eigenvalues are

o .lS4

0.018

0..008

o .003

0.0.02

0.001

and the first two principal components are

.218 , .204, .673, .633 , .181 , .159
.337 , .432 , -.500 , .024 , .430 , .514

-x
x
...

180

8.18 (a) & (b) Principal component analysis of the correlation matrix follows.

Correlations: 100m(s), 200m(s), 400m(s), 800m, 1500m, 3000m, Marathon
100m(s)
().941

200m(s)
400m(s)
800m
1500m
3000m
Mara thon

200m(s)

400m(s)

800m

1500m

3000m

0.909
0.820
0.801
0.732
0.680

0.806
0.720
0.674
0.677

0.905
0.867
0.854

0.973
0.791

0.799

0.871
0.809
0.782
0.728
0.669

Eigenanalysis of the Correlation Matrix

0.0143
0.6287 0.2793 0.1246 0.0910 0.0545
0.002
Eigenvalue 5.8076
0.008
0.013
0.018
0.040
0.090
1.000
proportion 0.830
0.998
0.990
0.977
cumulative 0.830 0.919 0.959

Variable
100m(s)
200m(s)
400m(s)
800m
1500m
3000m
Mara thon

PC1

0.378
0.383
0.368
0.395
0.389
0.376
0.355

PC2

-0.407
-0.414
-0.459

0.161
0.309
0.423
0.389

PC3

0.141
0.101
-0.237
-0.148
0.422
0.406
-0.741

PC4

-0.587
-0.194

PC5

0.167

-0.094
-0.327
0.819
-0.026
-0.352
-0.321 -0.247
0.645
0.295
0.067
0.080

Pe6

PC7

0.089
0.745 -0.266
0.127
-0.240
0.017 -0.195
0.731
0.189
-0.240 -0.572
0.082
0.048

-0.540

)71 = .378z1 + .383z2 + .368z3 + .395z4 + .389z5 + .376z6 + .355z7

)72 =-A07z1 -A14z2 -AS9z3 +.161z4 +.309z5 +A23z6 +.389z7
Zi

Zz

Z3

Z4

r.Yi,l;

.911

.923

.887

.952

r.Y2'Z¡

-.323

-.328

-.364

.128

Z6

'l7

.937

.906

.856

.245

.335

.308

Z5

Cumulative proportion of total sample varance explained by the first

two components is .919.
(c) All track events contribute about equally to the first component. This

component might be called a track index or track excellence component. The
second component contrasts the times for the shorter distanes (100m, 200m
400m) with

the times for the longer distances (800m, 1500m, 3000m, marathon)

and might be called a distance component.

(d) The "track excellence" rankings for the first 10 and very last countries follow.
These rankings appear to be consistent with intuitive notions of athletic
excellence.
1. USA 2. Germany 3. Russia 4. China 5. France 6. Great Britain

7. Czech Republic 8. Poland 9. Romania 10. Australia .... 54. Somoa

nn
8.19 Principal component analysis of the covariance matrix follows.
Covariances: 100m/s, 200m/s, 400m/s, 800m/s, 1500mls, 3000m/s, Marmls
3000m/s
1500ml s
800m/s
40 Oml s
200m/s
100ml s

0.0905383

lOOmIs

o .0956u63

200m/s
400m/s
800m/s

Marml s

0.0966724
0.0650640
0.0822198
0.0921422
0.0810999

Marml s

0.1667141

l500m/s
3000m/s

o . 114'6714

0.1377889

0.1138699
0.0749249

o . -0809409

0.0735228

0.10831'64

0.0997547
0.0943056

0.0954430

0.0%01139

0.1054364
0.0933103

0.1018807

0.08'64542

0.12384.Q5

0.1765843
0.1465604

0.1437148
0.1184578

Marml s

Eigenanalysis of the Covariance

Eigenval ue

Proportion

Cumulati ve

Variable
lOOmIs
20 Oml s

400m/s
800m/s
1500m/s
3000m/s
Marml s

0.73215
0.829
0.829
PC1

0.310
0.357
0.379
0.299
0.391
0.460
0.423

0.08607
0.097
0.926
PC2

Matrix
0.01498
0.017
0.981

0.03338
0.038
0.964
PC3

-0.376 0.098
-0.434 0.089
-0.519 -0.274
0.053 -0.053
0.435
0.211
0.427
0.396
0.445 -0.730

PC4

0.00885
0.010
0.991

0.00617
0.007
0.998

PC5

PC6

PC7

0.127

0.236
-0.199
0.081

-0 . 499

0.00207
0.002
1.000

-0.585 -0.046 -0.624 0.138
-0.323 -0.030 0.689 -0.311
0.132
0.667 -0.187 -0.124
0.894 -0.136 -0.2'65
0.128
0.055
0.184

-0.237

-0.357
-0.136

0.734

0.095

5'1 =.3 lOx¡ + .357 x2 + .379x3 + .299x4 + .391xs + .460x6 + .423X7

5'2 =-.376x¡ -.434x2 -.519x3 +.053x4 +.21 IXs +.396x6 +.445x7
Xl

I

X2

X3

X4

Xs

X6

X7

r.YllXi

.882

.902

.874

.944

.951

.937

.886

r.Yi,X¡

-.367

-.376

-.410

.057

.176

.276

.320

Cumulative proportion of total sample variance explained by the first two
components is .926.

The interpretation of the sample component is similar to the interpretation in
Exercise 8.18. All track events contribute about equally to the first component.
This component might be called a track index or track excellence component.

The second component contrasts times in mls for the shorter distances (100m, 200m
400m) with the times for the longer distances (800m, l500m, 3000m, marathon)

and might be called a distance component.
The "track excellence" rankings for the countries are very similar to the rankings
for the countries obtained in Exercise 8.18.

182

8.20 (a) & (b) Principal component analysis of the correlation matrix follows.

Eigenanalysis of the Correlation Matrix

Eigenvalue 6.7033
proportion 0.838
cumulative 0.838

0.'6384

Variable

PC2

PC1

0.332
0.346
0.339
0.353
0.366
0.370
0.366
10,000m
Marathon 0.354

100m
200m
400m
800m
1500m
5000m

0.080
0.918

0.529
0.470
0.345

-0.089
-0.154
-0.295
-0.334
-0.387

0.2275
0.028
0.946

0.2058
0.026
0.972

PC3

PC4

o .0976

0.012
0.984
PC5

0.300

0.0707
0.009
0.993
PC6

-0.362

o .04'69
0.OQ6

0.0097
0.001
1.000

PC7

PC8

0.999

0.348

-0.381
-0.217 -0.541 0.349 -0.440
o . 114
0.077
0.133
0.851
-0 .067
0.259
-0.783 -0.134 -0.227 -0.341 -0.147
0.530
0.652
-0
.
233
-0.244
0.072 -0.359 -0.328
0.055
0.183
0.087 -0.061 -0.273 -0.351
0.244
0.594
0.375
0.335 -0.018 -0.338

0.344
-0.004

-0.066

0.061
-0.003
-0.039
-0.040

0.706

-0.697

0.069

Yi = .332z1 + .346('2 + .339 ('3 + .353z4 + .366z5 + .370('6 + .366z7 + .354z8
Y2 =.529z1 +.470'2 +.345z3 -.089z4 -.154z5 -.295z6 -.334z7 -.387('8
Zs

('6

('7

('8

.878

l4
.914

.948

.958

.948

.917

.276

-.071

-.123

-.236

-.267

-.309

('1

Z2

('3

r.YI,Z¡

.860

.896

r.Y2'Z¡

.423

.376

Cumulative proportion of total sample variance explained by the first
two components is .918.
(c) All track events contribute aboutequally to the first component. This
component might be called a track index or track excellence component. The

second component contrasts the times for the shorter distances (100m, 200m
400m) with the times for the longer distances (800m, 1500m, 500m,
lu,OOOm, marathon) and might be called a distance component.
(d) The male "track excellence" rankings for the first 10 and very lasti:ountris

follow. These rankings appear to be consistent with intuitive notions of athletic
excellence.
1. USA 2. Great Britain 3. Kenya 4. France 5. Australia 6. Italy
7. Brazil 8. Germany 9. Portugal 10. Canada ....54. Cook Islands
The principal component analysis of

the women.

the men's track data is consistent with that for

183

component analysis of the covariance matrx follows.

8.21 Principal

Covariances: 1oom/s, 2oom/s, 400m/s, 8oom/s, 1500mls, 5000mls, 1o,oom/s,lfiJQ~~

10Om/s 200m/s 40Om/s 800m/s 1500m/s

0.0434979
0.0482772
0.0434632
0.0314951
0.0425034
0.0469252
0.0448325
0.0431256

100m/s
200m/s
400m/s
800m/s
lS00m/s
5000m/s
10,OOOm/s

Marathonm/S
5000../5

10,OOOm/s

Marathonø/S

0.0648452
0.0558678
0.0432334
0.0535265
0.0587731
0.0572512
0.0562945

0.0688217
0.0428221
0.0537207
0.0617664
0.0599354
0.0567342

0.0761i388

0.0745719
0.0736518

0.0942894
0.0909952 0.0979276

Covariance Matrix

Eigenanlysis of the

0.01391
0.024
0.947

Eigenvalue 0.49405 0.04622

proportion

0.0729140

10,OOOm/s Marathonm/s

5000../s
0.0959398
0.0937357
0.0905819

cumlative

0.0468840
0.0523058
0.0571560
0.0553945
0.0541911

0.079
0.923

0,844
0.844

0.01332
0.023
0.970

0.00752
0.013
0.983

0.00575
0.010
0.993

0.00322
0.006
0.998

Eigenvalue 0.00112

proportion

cuulative
Variable
10Om/s

200m/s
400m/s
BOOm/s

1500m/s
5000m/s
10,OOOm/s

Marathonm/s

0.002
1. 000
PCL

0.244
0.311
0.317
0.278
0.364
0.428
0.421
0.416

pc3

PC2

-0.432
-0.523
-0.469

0.173
0.235

-0.684

0.436
0.439

-0; 033

0.063
0.261
0.310
0.387

-0.111
-0.187
-0.128

pc4

pc5

PC6

-0.450 -0 .390
-0.318 0.341
0.420 0.046
0.332
0.543
0.317 -0.303
-0.016 -0.374
-0.100 -0.215
-0.339 0:584

0.119

-0.247

0.177

pC7

0.584

-0.535
0.039

pc8

-0.119

0.096

-0.008
-0.070
-0.044

-0.368 0.432
0.608 -0.327
-0.334 -0.006 0.'696
-0.352 -0. ,180 -0.6,93
0.074
0.215
0.391

j\ = .244xl + .311x2 +.317 X3 + .278x4 + .364xs+ .428x6 + .421x7 + .416xs

5'2 =-.432xl -.'S23x2 -.469x3 -.033x4 +.063xS +.261x6 +.3lOx7 +.387xs
Xl

X2

X3

X4

Xs

X6

X7

Xs

r.YI,X¡ ,

.822

.858

.849

.902

.948

.971

.964

.934

r.Y2'X¡

-,445

-.442

-.384

-.033

.050

.181

.217

.266

Cumulative proportion of total sample varance explained by the first two
components is .923.

The interpretation of the sample component is similar tt) the interpretatìon in

Exercise 8.20. All track events contribute about equally to the first component.
This component might be called a track index or track

excellence component. ,

The second component contrasts times in rns for the shorter distances (100, 200
400m, 800m) with the times for the longer distances (1500m, 'SooOm, 10,0Q,

marathon) and might be called a distance component.
The "track excellence" rankings for the countries are very similar to the rankings
for the countres obtained in Exercise 8.20.

184

8.22

Using S
Eigenvalues of the CovarianeeMatrix
Cumulative

Eigenvalue

Oi ffe renee

proportion

20579.6
4874.7
5.4

15704.9

2.8
0.4

0.808198
0.191437
0.000213
0.000130
0.000018

0.1

o . 000003

PRIN1

PRIN2
PRIN3
PRIN4
PRINS
PRIN6
PRIN7

~!!S!!.2

2.1

3.3
0.5
0.1

./

0.000000

0.0

Eigenveetors

X3
X4

X5
X6
X7

X8
X9

PRIN3

PR IN4

PRINS

PRIN7

PRIN2

PRIN6

PRIN1

0.005887
0.487047

o . 009680

0.286337

0.608787
_ .003227
_.425175
0.311194

o .535569
o . 000444
o . 008388

-.509727
-.000457
0.010389

0.024592
_.000253

~
o . 008526

0.003112
0.000069
0,009330

(Õ. e72697 .;

0.029196
0.004886
_ .000493
0.008577
_ .487193

- .034277

0.904389
0.133267
_ ,018864
0.284215
0.004847

0.593037

0.390573
0.011906
_.748598

-.005597

o . 002665

-.005278

O. 855204

0.043786
0.082331
_.000341

Plot of Y1.Y2. Symbol is value of X1.
(NOTE: 10 obs hidden. I
2500
8
Y1

8

1

8115

2000
5

8 1

5

1

8

81 8 8
18
5
885
5111 11 551
51
15 111 1 1 8

155

55 18
1

1

8

8

8 1

8

8

8
8

8

1

5

5

1500

-100

a

100
Y2

200

300

0.014293
_.037984
0.998778
0.013820
_ . 000256

yrhgt

ftfrbody
prctffb
fraiie
bkfBt

saleht
salewt

185
8.22 (C"Ontinued)

Using R
~igenvalues of the ~orrelation Matrix

Ugenvalue

Difference

Proportion

'Cuaiulative

2.78357

0.581171
0.191018
0.105912
0.060204

0.58867
0.77969

4.12070
1.33713
0.74138
0.42143
0.18581
0.14650
0.04706

PRINI
PRIN2
PRIN3
PRIN4
PRINS
PRIN6
PRIN7

o . 59575

0.31996
o .~3562

o . 88560

0.94580
0.97235
0.99328

o . 02644

0.03930
0.09945

0.020929
0.006722

1 .00000

Eigenvectors

X3

X4
X5
X6
X7
X8
X9

PRINS

PRIN6

PRIN7

0.065871

_.072234
_.177061
0.127800
_.434144
0.208017
0.799288
-.276561

0.774926
0.017768

PRINl

PRIN2

PRIN3

PRIN4

0.449931
0.412326
0.355562
0.433957
...186705
0.452854
0.269947

... 042790
0.129837

-.415709

- .038732

0.113356
0.247479
0.314787
0.242818
0.618117

_ . 176650

-.215769

-.109535

0.253312

_ . 582433

0.290547

0.450292
0.568273
_.452345

..315508
0.007728
0.714719
0.101315
0.600515

Plot

Syiibol

of Vl.Y2.

-.719343
0.579367
0.142995
0.160238

yrhgt

o . 042442

ftfrbody
pr(:tffb
freiie
bkfat

- . 236723
0.047036

sa leht
salewt

- .002397

- . 582337

is value of Xl.

(NOTE: 27 obs hidden.)

1200
8

8 8
8 88181

1000

8 118851 8151
Vl
1

800

8 88811111 1 8 15 1
1111155 55

1 1 5

5

600
i

i

800

900

1000

1200

1100

1300

V2

Plot of VL.02.

(NOTE:

Syiibol used

is

Plot of Yl.02.

(NOTE:

FOA S

36 obs hidden.)

38 obs hidden.)

1200

2500

VL

SymbOl used is

hi( ~

*****

1000
VL

2000

. .

800

.*. ...

.......
......

........

***......

.- .. ..
600

1500
1

-3

.2

-1

0
.02

2

3

.3

.2

i
i

-1

a
Q2

1

2

3

1~6

8.23 a) Using S
Eigenvalues of S

4478.87 152.47 32.32 8.12 1.52 0.54
Eigenvectorsof S (in colums)

-0.849339
-0.368552
-0.194132
-0.314€78
-0.043918
-0.064458

0.470832 -0.22€606 0.074260 -0.008692 -0.000202
-0.846078 -0.368132 0.012754 -0.110784 -0.019105
-Q.058127 0.303143 -0.928388 -0.012289 -0.070597
-0.216748 0.848576 0.355060 -0.082353 0.032666
-'0.060354 0.001815 -0.060162 0.440119 0.892805
-0.092026 0.033880 0.052267 0.887138 -0.443264

The first component might be identified as a "size" component. It is domiated
by Weight, Body lengt and Gir, those varables with the largest sample
varances. The first component explains 4478.87/4673.84 = .958 or 95.8% of
the
total sample varance. The second component essentially contrasts Weight with
the remaining body size varables, Body length, Neck, Gir, Head lengt,

component

and Head width, although the sample correlation between the second
and Neck is small (-.05). The first two components explain 99.1 % of

the total

sample varance.

These body measurement data can be effectively sumarze in one dienion.

b) Using R

R

1.0000
0.8752
0.9559
0.9437
0.9025
0.9045

0.8752

1.-0000

0.9013
0.9177
0.9461
0.9503

0.9559
0.9013
1.0000
0.9635
0.9270
0.9200

0.9437
0.9177
0.9635
1.0000
0.9271
0.9439

O. 9025

0.9461
0.9270
0.9271
1.0000
0.9544

0.9045
0.9503

O. 9200

0.9439
0.9544
1.0000

Eigenvalues of R
5.6447 0.1758 0.0565 0.0492 0.0473 0.0266

Eigenvectors of R (in colums)

-0.558334
0.532348
-0.409938 -0.389366
-0.411999 -0.222694
-0.4091 £2 0.318718
-0.41'0333 0.319513
-0. 403'672

-'0.4'04313

0.286817 0.261937 -0.598371 0.128024
-0.186741 0.719785 .0.0-04276 0.012490
0.035396 0.073950 -0.561034 -0.599053
-0.581252 -0.228969 0.231095 0.580499
0.695916 -0.291938 0.251473 0.313431
-0.243840 -0.519785 -0.458838 -0.435168

l87

8.23 (Continue)
Again, the first principal component is a "size" component. All varables
contribute equally to the
first component. This component explains
5.6447/6 = .941 or 94.1 % of
the total sample variance. The second principal
component contrasts Weight, Neck and Girth with Body length, Head lengt
and Head width. The first two components explain 97% of
the total sample
variance.

These data can be effectively sumarzed in one dimension.
c) The results are similar for both the covarance matrx S and the correlation
matrx R. The fist component in each analysis is a "size" component and
aInost all of

the varation in the data. The analyses differ a bit with respect

to the second and remaining components, but these latter components explain
very little of the total sample varance.

188

8.24 An ellipse format chart based on the first two principal.cmponents of the Madison,
Wisconsin, Police Department data
XBAR

3557.8 1478.4 2676.9 13563.6 800 7141
S

-72093.8

367884 .7
-72093..8
85714.8
222491.4

1399053.1
43399 .9

139692.2 -1113809.8

1698324. 4 ~244 785.9

-44908 . 3

110517.1

101312.9

11'61018.3

-244785.9 224718 .~ 4277'67 .S
-462615.6 42771)7 .5 24138728.4

85714.8

222491.4 -44908 .3
43399 .9
139692.2 110517 .1
1458543.~ -1113809.8 330923.8
330923.8
1079573.3

1~1312. 9
111)1018.3

1079573.3
-4'6261S .6

Eigenvalues of S

4045921.9 2265078.9 761592.1 288919.3 181437.0 94302.6
Eigenvectors of S
-0.0008 -0.0567 -0.5157 0.6122 0.4311 -0.4126
-0.3092 -0.5541 0.5615 0.4932 -0.1796 -0.0810

-0.4821 0.3862 -0.3270 0 .3404 -0.5696 0 . 2667
0.3675 -0.6415 -0.4898 -0.0642 -0.4308 0.1543
-0'.1544 0.0359 -0.0316 -0.3071 -0.4062 -0.8453

-0.711)3 -0.3575 -0.2662 -0.4094 0.3269 0.1173
Principal components

yl y2 y3 y4 y5 y6

1 1745.4 -1479.3 618.7 222.6 7.2 178.1
2 -1096.6 2011.8 652.5 -69.5 636.9 560.2

3 210.6 490.6 365.8 -899.8 -293.5 -15.2
4 -1360.1 1448. 1 420.1 523.5 -972.2 88.5
5 -1255.9 502.1 -422.4 -893.8 359.9 -273.7

6 971.6 284.7 -316.9 -942.8 -83.5 -70.1
7 1118.5 123.7 572.9 319.9 -60.8 -598.5
8 -1151.6 1752.0 -1322.1 700.2 -242.2 -158.8
9 -497.3 -593.0 209.5 -149.2 101.6 -586.2

10 -2397.1 1819.6 -9.5 -147.6 -109.9 207.8
11 -3931.9 -3715.7 924.1 35.1 -274.2 152.9
12 -1392.4 -1688.0 -2285.1 372.1 444.0 85.2
13 326.8 650.8 1251.6 728.8 809.S -140.0

14 3371.4 -379.1 -499.9 -114.6 -324.3 286.9
15 3076.S -199.1 -105.7 419.8 -122.3 3.4
16 2261.9 -1029.3 -53.7 -104.5 123.8 279.6

189

2.5 X 10-7 yl + 4.4 x lL-7 yi = 5.99

The 95% 'Control ellipse base on the
first two principal.cmponents of overtime hours

ooo
~

ooo

"'

-400

o

2000 4000

y1

8.25 A control chart based on the sum of squares dij. Period 12 looks unusuaL.

Sum of squares of unexplained t:omponent of jth deviation

.

It

~

0
~
.. M
~
en
en

iq

0
d

.

.
.

.
2

.

.. ..
4

6

.
8

Period

..
1-0

.
12

14

1'6

190

8.26 (a)-(c) Principal component analysis ofthe correlation matrix R.
Correlations: Indep, Supp, Benev, Conform, Leader
Indep
-0.173
-0.561
-0.471
0.187

Supp
Benev
Conform

Leader

Supp

Benev Conform

0.018

0.298

-0.327
-0.401

-0.492 -0.333

Cell Contents: Pearson ~orrelation

Principal Component Analysis: Indep, Supp, Benev, Conform, Leader
Eigenanalysis of the Correlation Matrix

1. 3682

0.439

0.274
0.713

0.7559
0.151
0.864

PCL

PC2

PC3

Eigenvalue 2.1966
0.439

l'ortion
Cumulative
Variable
Indep

Supp
Benev
Conform

Leader

0.5888 0.0905
0.118 0.018
1.000
0.982
PC4

PC5

-0.521 0.087 -0.667 -0.253 -0.460
0.351 -0.454
0.187
0.788
0.121
0.115 -0.733 -0.386
0.548 -0.008
0.525 -0.451
0.439 -0.491 -0.295
-0.469 -0.361 0.648 0.007 -0.480

Using the scree plot and the proportion of variance explained, it appears as if 4
components should be retained. These components explain almost all (98%) of

the variabilty. It is difficult to provide an interpretation of the components
without knowing more about the subject matter. All four of the components
represent contrasts of some form. The first component contrasts independence
and leadership with benevolence and conformity. The second component
contrasts -support with conformty and leadership and so on.
SG-llot of Indap,

t.o
0;5

0.0
1

2

3
Component Number

4

5

191

. Scatterplot of y2hatvs 11l1lt .. .

. 1
U'

.... . .. 2

.
-:

..

.' .

.

. ..

..

.

.. .... . ...

. ..
-3

..

-4

.. ..
.., ... .~
. . .
. ..
. '.. .
.

.

"

.. .

.' .

fi ..I. .. .
..

"

..

.2 yII1
-I

~3

. ...

o

2

1

3

.':. -',::
.:--:'-.,
.. .... .. .."", ," ".." .: - -,-: ~\
'" - " .:---- ,::'--'.......
....._-,.

.__,..___,-::-":___.:::_,'::'/-.:--d"::,":-:

SCàlterplot òfy2h:al vsyiihat

.

.

.
.
.

.'

-:

.. .
.
.
,., .. .~
. . .

.' . .
. ..

..

/till

o
.

-4

.3

. ..
-2

..

.

.. ....

. ...

., .
.

. .-

..
.

yll

.. e.

.

.

"

...

.

..

". .. .

.. .. . .

i

3

The two dimensional plot of the scores on the first two components suggests that
the two socioeonomic levels cannot be distinguished from one another nor can
the two genders be distinguished. Observation #111 is a bit removed from the
rest and might be called an outlier.

192

covarance matrix S.

(a)-(d) Principal component analysis of the

Coyariances: Indep, Supp, Seney, Conform, Leader
Indep
34.7502
-4.271;7

Inde

Supp
Benev
Conform

-18.0718
-15.9729
5.7165

Leader

Benev

Conform

29.8447
9.3488
-13.9422

33.0426

Supp

17.5134
0.4198

-7.8682
-8.7233

-9.9419

Leader

26.9580

Principal Component Analysis: Indep, Supp, Seney, Conform, Leader
Eigenanalysis of the Covariance Matrix

68.752
0.484
0.484

Eigenvalue

Proportion

Cumulative

Variable

PC1

Indep

-0.579

Leader

-0.380

0.042
0.524
0.493

Supp
Benev
Conform

31. 509

0.222
0.706

23.101
0.163
0.868
pc3

PC2

-0.643
0.140
0.119
-0.422

0.079
0.612
0.219
-0.572

16.354
0.115
0.983
PC4

0.309

-0.515
0.734
-0 .304
0.090
0.612

-0.494

2.392
0.017
1.000
pc5

0.386
0.583
0.352
0.398
0.478

Using the scree plot and the proportion of variance explained, it appears as if 4
components should be retained. These components explain almost all (98%) 'Of

the variabilty. The components are very similar to those obtained from the
correlation matrix R. All four of the components represent contrasts of some
form. The first component contrasts independence and leadership with
benevolence and conformity. The second component contrasts support with
conformity and leadership and so on. In this case, it makes little difference
whether the components are obtained from the sample 'Correlation matrix or

the sample covariance matrix.
of ItidèP# -.1 LeaCler--Cv Mamx

Scre Plot

50
11

i "1

~
.=i 30
¡¡

20
10

o
1

2

3

CompolientNumbe

4

5

193

Scatterplbt of y2hatcov vs ylhatcov
15

. 1
. 2

..

.
...
. ..l.

u. So'.

....... .
~ .

.. .

... ..... -.

".

... ....

. ..

.. . .

. .. _... .. e.- .

..
. ..
.
.. . .
.. . ..
. . . ".
""

#111

ø

. ..

1#

~

l lö4

-15

-20

.10

io

o

y1hav

20

~

.Sctterplot of y2hatatv VB y1hàtcv

Lj

15

to

..

.-. . .

.. .

. . _'.

.. ..

... . e.
.. .
... .....

....,. ." . . .. .

. ..

fill

.

ø

""

.

.. -.. .
. . .. ... .
. . . i...

. . .

.. . ..

1#

. ..

i.

.

-iO

.15 ø
-J 10,!

.20

.Uil

o

iO

20

yilhatcv

The two dimensional plot of the scores on the first two components suggests that
the two socioeconomic levels cannot be distinguished from one another nor can
the two genders be distinguished. Observations #111 and #104 are a bit removed
from the rest and might be labeled outliers.
Large sample 95% confidence interval for Â.i:

(l-1.96-21130
((l+1.96.21130
68.752 , 68.752
)=(55.31,90.83)

194

8.27 (a)-(d) Principal component analysis of the correlation matrix R.

Correlations: BL, EM, SF, BS

BL EM SF

EM 0.914

SF 0.984 0.942

BS 0 . 988 0 . 875 0 . 975
Cell Contents: Pearson correlation

Principal Component Analysis: BL, EM, SF, BS
Eigenanalysis of the Correlation Matrix

Eigenvalue 3.8395
Proportion 0.960
0.960
Cumulative

o . 1403

Variable

PC2

BL
EM

SF
BS

PC1

0.506
0.485
0.508
0.500

0.035
0.995

0.0126
O. 003

0.998
PC3

0.0076
0.002
1.000
PC4

-0.261 -0.565 0.597
0.819 -0.194 -0.237
-0.020 0.800 0.318
-0.510 -0.053 -0.698

The proportion of variance explained and the scree plot below suggest that one
principal component effectively summarzes the paper properties data. All the
variables load about equally on this component so it might be labeled an index of
paper strength.

Component ftJlmbe

195

The plot below of the scores on the first two sample principal components
does not indicate any obvious outliers.
Sætterplot ofylhat vs y2hat

. .

. .~. . e. .. .
o.:..
..
~

.

. o.

. .:
..
.- ...

'O e.

..

~3

-4

-050 "0.25 0.00

0.25 0;50
y2hiit

U)O

0.75

1.5

(a)-(d) Principal component analysis of the covariance matrix S.
Covariances: BL, EM, SF, BS
EM

SF

BS

0.513359
0.987585
0.434307

2.140046
0.987966

0.480272

BL
BL
EM

SF
BS

8.302871
1. 88£636

4.147318

i.972056

Principal Component Analysis: BL, EM, SF, BS
Eigenanalysis of the Covariance Matrix

Eigenvalue

proportion

cumulative

Variable
BL
EM

SF
BS

11.295
0.988
0.988

0.104
0.009
0.997

PC1

PC2

0.856
0.198
0.431
0.204

0.032
0.003
0.999
PC3

0.006
0.001
1. 000
PC4

-0.332 0.155
0.786 -0.497 -0.3Hl
0.259
0.733
0.458
-0.201 0.325 -0.901
-0.364

The proportion of variance explained and the scree plot that follows suggest that
one principal component effectively summarzes the paper properties data. The

loadings of the variables on the first component are all positive, but there are
some differences in magnitudes. However, the cOl'elations of the variables with

196

the first component are .998, .928, .990 and .989 for BL, EM, SF and BS
respectively. Again, this component might be labeled an index of paper strength.

Component NurilJ

The plot below of the scores on the first two sample principal components
does not indicate any obvious outliers.
'Stàtb~r,plot of ylhatcov vs y2håtc .
27.5

..

.
..'
..- a.
.

..

. .... .

,
..

. ..

.. .
".

. a...
17;5

.

15.0 .

0.0

0.4 0.8
y2hatcov

/
1.2

1.6

197

8.28 (a) See scatter plots below. Observations 25, 34, 69 and 72 are outliers.

Scttei¡løt øf'family YS Distad

160

.~~

.
. .
.." ..
. .,.
...

"1

i'. ..

.

21)

:. '

tt 1
I)

..
. .

il ?i
114'1:

200

100

0

Dist

300

40

50

5cttl'1CJtCJfPistRlI"SiCatte
500

.

#" r.'1

"1

i

..

300

.... l=..

¡

¡OO

100

0

.
I

.

.

" ". r.. . .. .. . .

0

20

li 3~.

.
40

Catt

60

8Ø

100

(b) Principal component analysis of R follows. Removing the outliers has some but
relatively little effect on the analysis. Five components explain about 90% of
given the
the total variabilty in the data set and seems a reasonable number

scree plot.

198

.3 Coirpone
45 Numbe
6
Prlnclp81 Compon8nt An8lysls: AdjF8m, AdjDlstRd, AdjCotton, AdjMalz AdjSorU..Outle 25.34,68,72

remove)

Eigenalysis of the correlation Matrix
proportion

culative

0.121 0.088
0.745 0.833

0.160
0.625

0.465
0.465

0.3661 0.2400
0.041 0.027
0.941 0.968

0.6043
0.067
0.900

1.0845 0.7918

1. 4381

Eigenvalue 4.1851

O. 1718

0.019
0.987

Eigenvalue 0.1182
proprtion 0.013
C\lative 1.000

variable
AdjFam

AdjOistRd
AdjCotton
AdjMaize
Adj Sorg

AdjMillet
AdjBull

AdjCattle
AdjGoats

pe

0.434
0.008
0.446
0.352
0.204
0.240
0.445
0.355
0.255

PC2

PC3

0.098
-0.569
0.132
0.388
-0.111

PC4

0.171
0.496
-0.027
0.240
-0.059
0.616

0.065
-0.497
-0.009
-0.353
0.604
0.415 -0.116
-0.068 -0.030 -0.146
-0.284 0.014 -0.373
0.049 -0.687 -0.351

PC5

0.011
-0.378
-0.219
-0 . 079

-0.645
0.527
-0 . 028

0.218
0.249

PC6

-0.040
0.187
-'0.200
-0.273
0.246
0.181
-0.134
0.759
-0.402

PC7

PC8

PC9

-0.797 -0.263 -0.249
0.021 -0.048 -0.065
0.361 0.329 -0 . 675
-0.024 0.363 0.574
-0.021 0.126 0.293
0.241 0.077 0.048
0.396 -0.751 0.190
-0.011 0.169 0.038
0.274 0.149
-0 . 131

Princlp81 Component An8lysls: F8mlly, Dlsd, Cotton, Møz Sor9, MIII8 BulL. .. ·

Eigenanlysis of the Correlation Matrix

proportion

4.1443
0.460
0.460

Eigenvalue

0.1114
0.012

Eigenvalue

cuulative

proportion

C\lative
variable
Family
OistRd

cotton
Maze

sorg

Millet
Bull

Cattle
Gots

1. 2364

1. 0581

0.9205
0.102
0.818

0.6058
0.067
0.885

0_5044

0.056
0.941

0.137
0.598

0.118
0.715

PC2

PC3

PC4

PC5

-0. 002

-0 .123

-0. 089

-0 . 127

0.100

-0.216

0.129

0.110

0.770

o . 043

0.2720 0.1470
0.030 0.016
0.971 0.988

1. 000
PCL

0.444

-0 .100

-0.033 -0.072 -0.831 0.502
0.411 -0.342 -0. 068 0.030
0.337 -0.554 0.170 0.164
0.311 0.452 -0.069 -0 .229
0.043 -0.385 -0.606
0.269
0.440 -0.029 0.122 0.197
0.247 0.458 0.278 0.486
0.309 0.379 -0.173 0.100

PC6

-0.194 -0.051
-0.134 O. 053
-0.361 -0.632
-0.182 O. 594
-0.392

0_407

PC7

-0.579
-0.045
0.509
-0.352
0.055
0.089
0.458
-0.012
-0.242

PC8

PC9

0.454 -0.461
0.041
0.082
-0.372 -0 . 504
0.499
-0.360
0.300
-0 .139
0.077
-0.097
0.357
0.621
-0.215 -0..225
-0.242 0.095

199

(c) All the variables (all crops, all livestock, family) except for distance to road
(DistRd) load about equally on the first component. This component might be
called a far size component. Milet and sorghum load positively and distance

to road and maize load negatively on the second component. Without additional
subject matter knowledge, this component is difficult to interpret. The third
component is essentially a distance to the road and goats component. This
component might represent subsistence farms. The fourth component appears
to be a contrast between distance to road and milet versus cattle and goats.
component appears to
Again, this component is diffcult to interpret. The fifth
contrast sorghum with milet.
8.29 (a) The 95% ellpse format chart using the first two principal components from the

covariance matrix S (for the first 30 cases of the car body"2
assembly
data) is
"2
shown below. The ellpse consists of all YI':h such that Yl + ~2 S X; (.05) = 5.99

Â, Â.

lie outside the ellpse.

where -l = .354, t = .186. Observations 3 and 11

Scalterplot of y2hat-y2bar vs ylhat-yl_

.1111

-T

-1.5
-1

o

1

2

ylhat-ylbar

(b) To construct the alternative control char based upon unexplained components of
the observations we note that di = .4137, S~2 = .0782 so

e .0782 = .0946 v = 2 (.4137)2 = 4.4. Conservatively, we set the chi-

2(.4137) , ;U782

squared degrees of freedom to 1) = 5 and the VCL becomes
ex; (.05) = .0946(11.07) = 1.05 or approximately 1.0. The alternative control char
is plotted on the next page and it appears as if multivariate observation 18 is out
of control. For observation 18, y; makes the largest contribution to d~18 and

200

getting the most weight in Y 4 are the thickness measurements Xl
and X2. Car body #18 could be examined at locations 1 and 2 to determine the
cause of the unusual deviations in thickness from the nominal levels.

the variables

t.

l. =ä 5(.05)
." 1.0

201

Chapter 9
9.1

.8' .63 .45

L' = (.9 .7 .5);

LL' = .63 .49 .35
.45 .35 .25

so 2 = LL' + 'l
9.2

å) For m-'

h1 = 9.Ìi = .81

h1 - III = 49

. 2 - lY21' .

hi = 9.ii = .25

The communalities are those parts of the variances of the
variables explained by the single factor.

b) corr(Zi'F,) = Cov(Zi'Fi)' i = J,2,3.' By (9-5) cov(Zi,F,) = .lil.
Thus Corr(Zp'F1) = 111 = .9; Corr(~,F1) = 9.21 =. .7; .corr(Z3,Fi) =
9.31 = .5.. The first .variab1e, Zl' has the largest correlation

with the factor and therefore will probably' carry the most weight

in naming the factor(. .6251
9.3

a)

L = r'':1 = 1i . ~93. =

. ' .507,

.711 "

.831 . Slightly different
(.87'61

from result in Exercise 9.1.

b) Proportion of total variance explained = ~ = , .i6 = .65

.451
9.4

i (.81 .63

.e = f - '¥ = LL = .45
.63 .49
.35

.35
.25 .

' L = h1 ~1 = Ii .5ti23 =,.7
.40Hi . . (.91
.'5
(.7~29J

202
Result is consistent with results in Exercise 9.1. It shoul.c

- - _. -

be since m = 1 common factor completely determin.e~ e = 2 - 'l .

9.5

Since V is diagonal and S - LL' -, has zeros on the diagonal,
(sum of squared entri es of S - LL i - V) S (sum of squared .entries of

.,. ,. A " I .
S - LL). By the, hint, S - LL =,P(2)A(2)P(3) \'1hich has sum of

squared er1tri es

A ,. Ai ,. A ¡Ai. A,. Ai Ai
tr(P (2)A(2)P (2) (p (2)A(2)P (2)) J = tr(P (2)A(2f (2)P (2))

,. "'i Ai A A,.i

= tr(A(2)A(2)P(2)Pc2)) = tr(A(2)A (2))

,.,. A

m+ m+. '11

= ~2 1 + ~2 2 + ... + l!

Therefore,

..1 - ,. It A
(sum of squared entri es of : S - LL - ,) s ~~+l + À~+2 + . .. + Àp

9.6
a) Follows directly from hint.

b) Using the hint, \'ie post multiply by (Ll' +'1) to get
I = ('1-1 -'1-1L(I +L,'¥-1L)-1L''1-1)(LL' +'1)

= '1-1 (lL' +'1) -'1-1L(I +L''l-lL)-1L'v-1(LL' +'i)

'-(use part (a))
= ,-1 (Ll + '1) - '¥-l L( I ~'(I + L' '1-1 L) -1) l'
- '(-1 LÙ + L ''1-1 L) -1 L'

= ,-1Ll +1 -'1-1LL' +'1-1L(I +L'V-1L)-lL'

_ '1-1i(i +LI'1~1L)-lL' = I
Note all these multiplication steps are reversibl~.

c) Multinlyin~ the result in (t) by L we get

203
(Ll' +V)-1L = 1f-\_iy-1LlI +L''i-lLrll''¥-~
..

(use part (a))

= '¥-lL-V-\(1 _ (I i-L1'1-1L)-1) = iy-'L(I +L''1-1L)-1

Result follows by taking the transpose of both sides of the
final equal ity..

9.7 Fran the equation ~ = Ll ..' '¥, m = 1, we have

;1 9.z~

rii ai ~

, ~12 a2~

121 +"'d

9.11121

= (111 +~1

'so aii = 9.11 +,wl' a2i = ~21 +"'2 and a12 = 111121

let p = a12/lan la22 . Then, for any choice Ipl/a22 s 121
:S /aZ2 t set .lll = alZ/121 an~ check. a12 = 9.119.21. We
2
on
't1
=
aii
lYll
=
(111
.l~1
~
(111
pZau
obtai ,Ii 112 al2 . a12 - -a11 -~V11
--0

and tPZ = a22 - 12.1 ~ 0'22 -0'22 = o. Since i21 \.¡as arbitrary

of

within a suitable interval, there are an infinite number

solutions to the factorization.

9.8 . 1: = Ll + 'i for m = 1 imp 1 i es

= .l; 1 + il1

.. ¥ = 9.,,9.21
1 ,: 9.21 + *2

( 1

.., = 111131 )
.i = 121131

1 = 9.31 + "'3

No\'1 i~ ~ = :; and .l119.21 = .4 , so 9.; 1 = (:;)(.4) and

9." =:! .717. Thus .l21 =':! .55&. finally, from .9 =
111R.31 ~ -w have .t31 =:! .9/.717 = :t 1.255 ·

204
Note all the loadings must be of the same sign because all the

have

covari ances åre positiv.e~ ' We

~o 717 J

.4

LL'.= .558 (.717 .558 1.255 J.

.3111

.9

1.255

= (:¡14

.7

.9 J
.7

1 .575

so' ~3 = 1 - 1 ~~7S-= ~.57~, which is inadmissible as a varianc~.

9.9
(a) Stoetzel's interpretation seems reasonable. The first factor

seems to contrast sweet witb strong 1 iquors.
,(b)

---"-"'-_._...
__ 0' -_... .-...._..__.

[Figure: plot of the estimated factor loadings, Factor 1 versus Factor 2, with points labeled Rum, Marc, Calvados, Liquors, Kirsch and Mirabelle]

It does not appear that rotation of the factor axes is necessary.

(a) & (b) The specific variances and communalities based on the unrotated factors are given in the following table:

Variable          Specific variance   Communality
Skull length            .5976            .4024
Skull breadth           .7582            .2418
Femur length            .1221            .8779
Tibia length            .0000           1.0000
Humerus length          .0095            .9905
Ulna length             .0938            .9062

(c) The proportion of variance explained by each factor is:

Factor 1: $\frac{1}{6}\sum_{i=1}^{6}\hat\ell_{i1}^2 = \frac{4.0001}{6} = .667$, or 66.7%

Factor 2: $\frac{1}{6}\sum_{i=1}^{6}\hat\ell_{i2}^2 = \frac{.4177}{6} = .070$, or 7.0%
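Because the analysis is based on the correlation matrix, each variable has unit variance, so each specific variance is 1 minus its communality and each factor's share is its sum of squared loadings divided by p = 6. A minimal sketch of that arithmetic, using only the numbers in the table and in (c):

```python
# Communalities from the table in (a) and (b), in the order:
# skull length, skull breadth, femur length, tibia length, humerus length, ulna length
communalities = [0.4024, 0.2418, 0.8779, 1.0000, 0.9905, 0.9062]

# Correlation-matrix analysis: unit variances, so psi_i = 1 - h_i^2
specific_variances = [round(1.0 - h2, 4) for h2 in communalities]

p = len(communalities)
column_sums = {"Factor 1": 4.0001, "Factor 2": 0.4177}   # sums of squared loadings
proportions = {name: total / p for name, total in column_sums.items()}

print(specific_variances)
print({name: f"{share:.1%}" for name, share in proportions.items()})
```

The second line prints 66.7% for Factor 1 and 7.0% for Factor 2.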

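(d) The residual matrix $\mathbf R - \hat{\mathbf L}_z\hat{\mathbf L}_z' - \hat{\boldsymbol\Psi}_z$ has zeros on the main diagonal. Its off-diagonal entries are .193, -.017, -.032, .000, .000, -.000, .001, .000, .000, -.001, -.018, .003, .000, .000 and .000; apart from the entry .193, every residual is at most .032 in absolute value.

9.11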

Substituting the factor loadings given in the table (Exercise 9.10) into equation (9-45) gives

V (unrotated) = .01087
V (rotated) = .04692

Although the rotated loadings are to be preferred by the varimax ("simple structure") criterion, interpretation of the factors seems clearer with the unrotated loadings.
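The varimax criterion scales each loading by the square root of its communality and then adds up, over factors, the spread of the squared scaled loadings. The sketch below assumes (9-45) has the form $V = \frac{1}{p}\sum_{j=1}^{m}\left[\sum_{i=1}^{p}\tilde\ell_{ij}^4 - \left(\sum_{i=1}^{p}\tilde\ell_{ij}^2\right)^2/p\right]$ with $\tilde\ell_{ij} = \hat\ell_{ij}/\hat h_i$. The loading matrices in it are hypothetical and are not the Exercise 9.10 table.

```python
import numpy as np

def varimax_criterion(loadings):
    """Varimax criterion V for a p x m loading matrix (assumed form of (9-45))."""
    L = np.asarray(loadings, dtype=float)
    p = L.shape[0]
    h = np.sqrt((L ** 2).sum(axis=1, keepdims=True))   # square roots of communalities
    sq = (L / h) ** 2                                  # squared scaled loadings
    # Per factor: sum of fourth powers minus (sum of squares)^2 / p
    per_factor = (sq ** 2).sum(axis=0) - sq.sum(axis=0) ** 2 / p
    return per_factor.sum() / p

# Hypothetical "simple structure" loadings and the same loadings rotated by 45 degrees
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
mixed = simple @ rotation

print(varimax_criterion(simple), varimax_criterion(mixed))
```

The simple-structure pattern gives V = 0.5. After the 45-degree rotation every scaled loading has squared value 1/2, and V drops to 0. This illustrates why a larger V, as in the rotated solution above, indicates a "simpler" pattern.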

9.12

The covariance matrix for the logarithms of turtle measurements is

$$\mathbf S = 10^{-3}\times\begin{bmatrix} 11.0720040 & 8.0191419 & 8.1596480\\ 8.0191419 & 6.4167255 & 6.0052707\\ 8.1596480 & 6.0052707 & 6.7727585\end{bmatrix}$$

The maximum likelihood estimates of the factor loadings for an m = 1 model are

Variable          Estimated factor loadings
1. ln(length)          0.1021632
2. ln(width)           0.0752017
3. ln(height)          0.0765267

Therefore,

$$\hat{\mathbf L} = \begin{bmatrix}0.1021632\\ 0.0752017\\ 0.0765267\end{bmatrix},\qquad \hat{\mathbf L}\hat{\mathbf L}' = 10^{-3}\times\begin{bmatrix}10.4373 & 7.6828 & 7.8182\\ 7.6828 & 5.6553 & 5.7549\\ 7.8182 & 5.7549 & 5.8563\end{bmatrix}$$

(b) Since $\hat h_i^2 = \hat\ell_{i1}^2$ for an m = 1 model, the communalities are

$\hat h_1^2 = 0.0104373$, $\hat h_2^2 = 0.0056553$, $\hat h_3^2 = 0.0058563$

(a) To find the specific variances $\hat\psi_i$, we use the equation

$$\hat\psi_i = s_{ii} - \hat h_i^2$$

Note that in this case we should use $\mathbf S_n$, not $\mathbf S$, to get $s_{ii}$, because the maximum likelihood estimation method is used:

$$\mathbf S_n = \frac{n-1}{n}\mathbf S = \frac{23}{24}\mathbf S = 10^{-3}\times\begin{bmatrix}10.6107 & 7.6850 & 7.8197\\ 7.6850 & 6.1494 & 5.7551\\ 7.8197 & 5.7551 & 6.4906\end{bmatrix}$$

Thus we get

$\hat\psi_1 = 0.0001734$, $\hat\psi_2 = 0.0004941$, $\hat\psi_3 = 0.0006342$

(c) The proportion of the total variance explained by the factor is

$$\frac{\hat h_1^2 + \hat h_2^2 + \hat h_3^2}{s_{11} + s_{22} + s_{33}} = \frac{0.0219489}{0.0232507} = .9440$$

(d) From (a)-(c), the residual matrix is

$$\mathbf S_n - \hat{\mathbf L}\hat{\mathbf L}' - \hat{\boldsymbol\Psi} = 10^{-6}\times\begin{bmatrix}0 & 2.1673 & 1.4474\\ 2.1673 & 0 & 0.112497\\ 1.4474 & 0.112497 & 0\end{bmatrix}$$
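All of (a)-(d) follow from $\hat{\mathbf L}$ and $\mathbf S_n = \frac{n-1}{n}\mathbf S$ with n = 24. A minimal NumPy sketch that recomputes them from the matrices quoted above:

```python
import numpy as np

# Turtle data, Exercise 9.12: ln(length), ln(width), ln(height), n = 24
S = 1e-3 * np.array([
    [11.0720040, 8.0191419, 8.1596480],
    [8.0191419, 6.4167255, 6.0052707],
    [8.1596480, 6.0052707, 6.7727585],
])
n = 24
L = np.array([[0.1021632], [0.0752017], [0.0765267]])   # m = 1 ML loadings

S_n = (n - 1) / n * S                 # ML works with the divisor-n covariance matrix
LLt = L @ L.T
communalities = np.diag(LLt)          # h_i^2 = ell_i1^2 when m = 1
psi = np.diag(S_n) - communalities    # specific variances
proportion = communalities.sum() / np.trace(S_n)
residual = S_n - LLt - np.diag(psi)   # diagonal is zero by construction

print("h^2:", communalities)
print("psi:", psi)
print("proportion:", round(proportion, 4))
print("residual x 1e6:\n", np.round(residual * 1e6, 4))
```

Starting from the 8-digit loadings, the recomputed off-diagonal residuals are about 2.165, 1.450 and 0.113 (times $10^{-6}$). These agree with the matrix in (d) up to rounding in the fourth significant digit.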

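9.13

Equation (9-40) requires $m < \frac{1}{2}\left(2p + 1 - \sqrt{8p + 1}\right)$. Here we have m = 1 and p = 3, so the right-hand side is $\frac{1}{2}(7 - \sqrt{25}) = 1$, and the strict inequality does not hold.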
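The bound in (9-40) is easy to tabulate. A small sketch that returns the largest integer m strictly below $\frac{1}{2}(2p + 1 - \sqrt{8p + 1})$. The function names are illustrative.

```python
import math

def df_bound(p):
    """Right-hand side of the strict inequality m < (2p + 1 - sqrt(8p + 1)) / 2."""
    return (2 * p + 1 - math.sqrt(8 * p + 1)) / 2

def largest_allowed_m(p):
    """Largest integer m satisfying the strict inequality (0 if none)."""
    return max(math.ceil(df_bound(p)) - 1, 0)

for p in (3, 4, 5, 6, 7):
    print(p, round(df_bound(p), 4), largest_allowed_m(p))
```

For p = 3 the bound is exactly 1, so no m >= 1 qualifies. For p = 7 (as in Exercise 9.19), m = 3 is the largest admissible value.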
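9.14 Since

$$\hat{\boldsymbol\Psi}^{1/2}\hat{\boldsymbol\Psi}^{-1}\hat{\boldsymbol\Psi}^{1/2} = \mathbf I,\qquad \hat{\boldsymbol\Delta}^{1/2}\hat{\boldsymbol\Delta}^{1/2} = \hat{\boldsymbol\Delta}\qquad\text{and}\qquad \hat{\mathbf E}'\hat{\mathbf E} = \mathbf I,$$

we have, with $\hat{\mathbf L} = \hat{\boldsymbol\Psi}^{1/2}\hat{\mathbf E}\hat{\boldsymbol\Delta}^{1/2}$,

$$\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}\hat{\mathbf L} = \hat{\boldsymbol\Delta}^{1/2}\hat{\mathbf E}'\hat{\boldsymbol\Psi}^{1/2}\hat{\boldsymbol\Psi}^{-1}\hat{\boldsymbol\Psi}^{1/2}\hat{\mathbf E}\hat{\boldsymbol\Delta}^{1/2} = \hat{\boldsymbol\Delta}^{1/2}\hat{\mathbf E}'\hat{\mathbf E}\hat{\boldsymbol\Delta}^{1/2} = \hat{\boldsymbol\Delta}^{1/2}\hat{\boldsymbol\Delta}^{1/2} = \hat{\boldsymbol\Delta}.$$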
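A quick numerical check of the identity. It uses a hypothetical diagonal $\boldsymbol\Psi$, a p x m matrix $\mathbf E$ with orthonormal columns obtained from a reduced QR factorization, and a hypothetical diagonal $\boldsymbol\Delta$. None of these are estimates from the book.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 5, 2

psi = np.diag(rng.uniform(0.2, 1.0, size=p))    # hypothetical diagonal Psi
E, _ = np.linalg.qr(rng.normal(size=(p, m)))     # reduced QR: E.T @ E = I_m
delta = np.diag([2.5, 0.8])                      # hypothetical diagonal Delta

L = np.sqrt(psi) @ E @ np.sqrt(delta)            # L = Psi^{1/2} E Delta^{1/2}
lhs = L.T @ np.linalg.inv(psi) @ L               # L' Psi^{-1} L

print(np.round(lhs, 10))                         # equals Delta
```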

9.15
(a)

Variable   Specific variance   Communality
HRA            0.188966          0.811034
HRE            0.133955          0.866045
HRS            0.068971          0.931029
RRA            0.100611          0.899389
RRE            0.079682          0.920318
RRS            0.096522          0.903478
Q              0.02678           0.97322
REV            0.039634          0.960366

(b) Residual matrix

 0          0.021205   0.014563  -0.022111  -0.093691  -0.078402  -0.02145   -0.015523
 0.021205   0          0.063146  -0.107308  -0.068312  -0.052289  -0.005616   0.036712
 0.014563   0.063146   0         -0.065101  -0.009639  -0.070351   0.006454   0.013953
-0.022111  -0.107308  -0.065101   0          0.036263   0.058416   0.00696   -0.033857
-0.093691  -0.058312  -0.009639   0.036263   0          0.032646   0.008864   0.00066
-0.078402  -0.062289  -0.010351   0.068416   0.032645   0          0.002626  -0.004011
-0.02145   -0.005516   0.005464   0.00696    0.006854   0.002626   0         -0.02449
-0.015523   0.035712   0.013953  -0.033867   0.00066   -0.004011  -0.02449    0

The m = 3 factor model appears appropriate.

(c) The first factor is related to market-value measures (Q, REV). The second factor is related to accounting historical measures on equity (HRE, RRE). The third factor is related to accounting historical measures on sales (HRS, RRS). Accounting historical measures on assets (HRA, RRA) are weakly related to all factors. Therefore, market-value measures provide evidence of profitability distinct from that provided by the accounting measures. However, we cannot separate accounting historical measures of profitability from accounting replacement measures.

PROBLEM 9.15

[Figure: three panels of the rotated factor pattern, each plotting one pair of factors (horizontal axes FACTOR 1, FACTOR 1 and FACTOR 2), with points labeled HRA, HRE, HRS, RRA, RRE, RRS, Q and REV]

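9.16 From (9-50),

$$\hat{\mathbf f}_j = \hat{\boldsymbol\Delta}^{-1}\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}(\mathbf x_j - \bar{\mathbf x})\quad\text{and}\quad\sum_{j=1}^{n}\hat{\mathbf f}_j = \hat{\boldsymbol\Delta}^{-1}\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}\sum_{j=1}^{n}(\mathbf x_j - \bar{\mathbf x}) = \mathbf 0.$$

Since

$$\hat{\mathbf f}_j\hat{\mathbf f}_j' = \hat{\boldsymbol\Delta}^{-1}\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}(\mathbf x_j - \bar{\mathbf x})(\mathbf x_j - \bar{\mathbf x})'\hat{\boldsymbol\Psi}^{-1}\hat{\mathbf L}\hat{\boldsymbol\Delta}^{-1},$$

we have

$$\sum_{j=1}^{n}\hat{\mathbf f}_j\hat{\mathbf f}_j' = n\,\hat{\boldsymbol\Delta}^{-1}\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}\mathbf S_n\hat{\boldsymbol\Psi}^{-1}\hat{\mathbf L}\hat{\boldsymbol\Delta}^{-1}.$$

Using (9A-1), $\mathbf S_n\hat{\boldsymbol\Psi}^{-1}\hat{\mathbf L} = \hat{\mathbf L}(\mathbf I + \hat{\boldsymbol\Delta})$, together with $\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}\hat{\mathbf L} = \hat{\boldsymbol\Delta}$,

$$\sum_{j=1}^{n}\hat{\mathbf f}_j\hat{\mathbf f}_j' = n\,\hat{\boldsymbol\Delta}^{-1}\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}\hat{\mathbf L}(\mathbf I + \hat{\boldsymbol\Delta})\hat{\boldsymbol\Delta}^{-1} = n\,(\mathbf I + \hat{\boldsymbol\Delta})\hat{\boldsymbol\Delta}^{-1} = n\,(\mathbf I + \hat{\boldsymbol\Delta}^{-1}),$$

a diagonal matrix. Consequently, the factor scores have sample mean vector $\mathbf 0$ and zero sample covariances.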
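The result can be checked numerically. The sketch builds hypothetical estimates with $\hat{\mathbf L}'\hat{\boldsymbol\Psi}^{-1}\hat{\mathbf L} = \hat{\boldsymbol\Delta}$ diagonal. It then constructs data whose divisor-n covariance equals $\hat{\mathbf L}\hat{\mathbf L}' + \hat{\boldsymbol\Psi}$ exactly, so that the relation assumed from (9A-1) holds, and computes the weighted least squares scores. All parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 5, 2, 200

# Hypothetical estimates with L' Psi^{-1} L = Delta (diagonal)
psi = np.diag(rng.uniform(0.2, 0.8, size=p))
E, _ = np.linalg.qr(rng.normal(size=(p, m)))
delta = np.diag([3.0, 1.5])
L = np.sqrt(psi) @ E @ np.sqrt(delta)
S_target = L @ L.T + psi            # exact fit, so S_n Psi^{-1} L = L (I + Delta)

# Data whose divisor-n sample covariance equals S_target exactly
Z = rng.normal(size=(n, p))
Z -= Z.mean(axis=0)
C = np.linalg.cholesky(Z.T @ Z / n)
Z = Z @ np.linalg.inv(C).T                      # now Z'Z / n = I
X = Z @ np.linalg.cholesky(S_target).T + 5.0    # arbitrary mean shift

# Weighted least squares scores (9-50); row j of F is f_j'
xbar = X.mean(axis=0)
F = (X - xbar) @ np.linalg.inv(psi) @ L @ np.linalg.inv(delta)

print("mean of scores:", np.round(F.mean(axis=0), 10))
print("sum of f_j f_j':\n", np.round(F.T @ F, 6))
print("n (I + Delta^{-1}):\n", n * (np.eye(m) + np.linalg.inv(delta)))
```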

9.17 Using the information in Example 9.12, we have

$$(\hat{\mathbf L}_z'\hat{\boldsymbol\Psi}_z^{-1}\hat{\mathbf L}_z)^{-1} = \begin{bmatrix} .2220 & -.0283\\ -.0283 & .0137\end{bmatrix},$$

which, apart from rounding error, is a diagonal matrix. Since the number in the (1,1) position, .2220, is appreciably different from 0, and the observations have been standardized, equation (9-57) suggests the regression and generalized least squares methods for computing factor scores could give somewhat different results.
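To see the size of the effect, the sketch assumes that (9-57) links the two kinds of scores by $\hat{\mathbf f}^{LS}_j = \left(\mathbf I + (\hat{\mathbf L}_z'\hat{\boldsymbol\Psi}_z^{-1}\hat{\mathbf L}_z)^{-1}\right)\hat{\mathbf f}^{R}_j$. The regression score vector below is hypothetical.

```python
import numpy as np

# (L_z' Psi_z^{-1} L_z)^{-1} from Example 9.12, as quoted above
M = np.array([[0.2220, -0.0283],
              [-0.0283, 0.0137]])

# Hypothetical regression-method score for one standardized observation
f_regression = np.array([1.0, -0.5])

# Assumed relation: f_LS = (I + M) f_R
f_ls = (np.eye(2) + M) @ f_regression
print(np.round(f_ls, 5))
```

The first score is inflated by roughly 22%, which is the practical meaning of the (1,1) entry .2220. The second score changes by only about 1%.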

9.18. Factor analysis of Wisconsin fish data

(a) Principal component solution using X1 - X4

Initial Factor Method: Principal Components

              1        2        3        4
Eigenvalue  2.1539   0.7876   0.6157   0.4429
Difference  1.3663   0.1719   0.1728
Proportion  0.5385   0.1969   0.1539   0.1107
Cumulative  0.5385   0.7354   0.8893   1.0000

Factor Pattern (m = 1)

            FACTOR1
BLUEGILL    0.77273
BCRAPPIE    0.73867
SBASS       0.64983
LBASS       0.76738

Factor Pattern (m = 2)

            FACTOR1   FACTOR2
BLUEGILL    0.77273  -0.40581
BCRAPPIE    0.73867  -0.36549
SBASS       0.64983   0.67309
LBASS       0.76738   0.19047
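The Difference, Proportion and Cumulative rows follow directly from the eigenvalues, because the trace of a 4 x 4 correlation matrix is 4. A minimal sketch reproducing them:

```python
import numpy as np

eigenvalues = np.array([2.1539, 0.7876, 0.6157, 0.4429])
p = len(eigenvalues)                      # trace of a p x p correlation matrix

difference = -np.diff(eigenvalues)        # lambda_k - lambda_(k+1)
proportion = eigenvalues / p
cumulative = np.cumsum(proportion)

print("Difference:", np.round(difference, 4))
print("Proportion:", np.round(proportion, 4))
print("Cumulative:", np.round(cumulative, 4))
```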

(b) Maximum likelihood solution using X1 - X4

Initial Factor Method: Maximum Likelihood

Factor Pattern (m = 1)

            FACTOR1
BLUEGILL    0.70812
BCRAPPIE    0.63002
SBASS       0.48544
LBASS       0.65312

Factor Pattern (m = 2)

            FACTOR1   FACTOR2
BLUEGILL    0.98748  -0.02251
BCRAPPIE    0.50404   0.25907
SBASS       0.28186   0.65863
LBASS       0.48073   0.41799

(c) Varimax rotation. Note that rotation is not possible with 1 factor.

Principal Components
Varimax Rotated Factor Pattern

            FACTOR1   FACTOR2
BLUEGILL    0.85703   0.16518
BCRAPPIE    0.80526   0.17543
SBASS       0.08767   0.93147
LBASS       0.48072   0.62774

Maximum Likelihood
Varimax Rotated Factor Pattern

            FACTOR1   FACTOR2
BLUEGILL    0.96841   0.19445
BCRAPPIE    0.43501   0.36324
SBASS       0.13066   0.70439
LBASS       0.37743   0.51319

For both solutions, Bluegill and Crappie load heavily on the first factor, while largemouth and smallmouth bass load heavily on the second factor.

(d) Factor analysis using X1 - X6

Initial Factor Method: Principal Components

              1        2        3        4        5        6
Eigenvalue  2.3549   1.0719   0.9843   0.6644   0.5004   0.4242
Difference  1.2830   0.0876   0.3199   0.1640   0.0762
Proportion  0.3925   0.1786   0.1640   0.1107   0.0834   0.0707
Cumulative  0.3925   0.5711   0.7352   0.8459   0.9293   1.0000

Factor Pattern (m = 3)

            FACTOR1   FACTOR2   FACTOR3
BLUEGILL    0.72944  -0.02285  -0.47611
BCRAPPIE    0.72422   0.01989  -0.20739
SBASS       0.60333   0.58051   0.26232
LBASS       0.76170   0.07998  -0.03199
WALLEYE    -0.39334   0.83342  -0.01286
NPIKE       0.44657  -0.18156   0.80285

Varimax Rotated Factor Pattern

            FACTOR1   FACTOR2   FACTOR3
BLUEGILL    0.85090  -0.12720  -0.13806
BCRAPPIE    0.74189   0.11256  -0.06957
SBASS       0.51192   0.46222   0.54231
LBASS       0.71176   0.28458   0.00311
WALLEYE    -0.24459  -0.21480   0.86227
NPIKE       0.05282   0.92348  -0.14613

Initial Factor Method: Maximum Likelihood

Factor Pattern

            FACTOR1   FACTOR2   FACTOR3
BLUEGILL    1.00000   0.00000   0.00000
BCRAPPIE    0.49190   0.18979   0.23481
SBASS       0.26350   0.96466   0.00000
LBASS       0.46530   0.29875   0.29435
WALLEYE    -0.22770   0.12927  -0.49746
NPIKE       0.06520   0.24062   0.46665

Varimax Rotated Factor Pattern

            FACTOR1   FACTOR2   FACTOR3
BLUEGILL    0.99637   0.06257   0.05767
BCRAPPIE    0.46485   0.21097   0.26931
SBASS       0.20017   0.97853   0.04905
LBASS       0.42801   0.31567   0.33099
WALLEYE    -0.20771   0.13392  -0.50492
NPIKE       0.02359   0.22600   0.47779

The first principal component factor influences the Bluegill, Crappie and the bass. The Northern Pike alone loads heavily on the second factor, and the Walleye and smallmouth bass load heavily on the third factor. The MLE solution is different.

9.19 (a), (b) and (c) Maximum Likelihood (m = 3)

UNROTATED FACTOR LOADINGS (PATTERN)
FOR MAXIMUM LIKELIHOOD CANONICAL FACTORS

Factor
'1

Growth

1

Profits

2
3
4
5
6
7

Newaccts

Creati ve
r~echani c

Abstract
Math

VP

0.772
0.570
'0..774

0.389
0..509

0.968,
0.632
26'Z

3.

Factor

Factor

2

3

0.295
0.347
0.433

0.527
0.721

0.355
0.000
0.334

0.921
o .426

,..-0.250
0.181

O~OOO

1 .520

1 .566

0.729

ROTATED FACTOR LOADINGS (PATTERN)

Growth
Prof; ts

Newaccts

Creat; ve
Mechani c

. Abstract
Math

1

2
3
4
5
6
7
VP

Factor

Factor

Factor

1

2

3

0.374
0.316
0.544

0.437

0.794
0.912
0.653 .

O~ 919

0.054
0.179

0.437
0.019
0.208
0.953
0.295

3~ 180

1 . 720

1 . 4"54

0.255
0.541

0.300

Variable       Communality   Specific variance
1 Growth         0.9615          .0385
2 Profits        0.9648          .0352
3 Newaccts       0.9124          .0876
4 Creative       1.0000          .0000
5 Mechanic       0.5519          .4481
6 Abstract       1.0000          .0000
7 Math           0.9631          .0369

1.0

.926

1.0

.884
.843

1.0
R =

.542

.708
.746

.674
.465

.700

.637

.641

1.0

.591

.147
.386

.572

1.0

1.0

(Symetri c)

1.0

.923

1.0

Ll

,.

.912
.848

.572
.542
.700

1.0

+ 'l =

. '575

.56£

1.0

1.0
""A

.927
.944
.853
.413

.694
.679

.674
.455

. .696

.641

.591

.147
.386

1.0

1.0
(Symmetri c)

.925
.948
.826
.413
.646
.566

1.0

It is clear from an examination of the residual matrix $\mathbf R - (\hat{\mathbf L}_z\hat{\mathbf L}_z' + \hat{\boldsymbol\Psi}_z)$ that an m = 3 factor solution represents the observed correlations quite well. However, it is difficult to provide interpretations for the factors. If we consider the rotated loadings, we see that the last two factors are dominated by the single variables "creative" and "abstract", respectively. The first factor links the salespeople performance variables with math ability.
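(d) Using (9-39) with n = 50, p = 7 and m = 3, we have

$$\left[n - 1 - \frac{2p + 5}{6} - \frac{2m}{3}\right]\ln\frac{|\hat{\mathbf L}_z\hat{\mathbf L}_z' + \hat{\boldsymbol\Psi}_z|}{|\mathbf R|} = 43.833\,\ln\left(\frac{.00007593}{.000018427}\right) = 62.1 > \chi^2_3(.01) = 11.3,$$

where the degrees of freedom are $\frac{1}{2}[(p - m)^2 - p - m] = \frac{1}{2}(16 - 10) = 3$. So we reject $H_0\colon \boldsymbol\rho = \mathbf L\mathbf L' + \boldsymbol\Psi$ for m = 3. Neither of the m = 2, m = 3 factor models appears to fit by the $\chi^2$ criterion. We note that the matrices $\mathbf R$ and $\hat{\mathbf L}\hat{\mathbf L}' + \hat{\boldsymbol\Psi}$ have small determinants, and rounding error could affect the calculation of the test statistic.

Again, the residual matrix above indicates a good fit for m = 3.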
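The test statistic in (d) can be reproduced from the two determinants quoted there. The sketch computes Bartlett's multiplier, the statistic and the degrees of freedom. The critical value 11.3 is taken from the text rather than computed, which keeps the example free of SciPy.

```python
import math

def bartlett_lr_statistic(n, p, m, det_fitted, det_sample):
    """Bartlett-corrected likelihood ratio statistic for H0: m common factors."""
    correction = n - 1 - (2 * p + 5) / 6 - 2 * m / 3
    statistic = correction * math.log(det_fitted / det_sample)
    df = ((p - m) ** 2 - p - m) / 2
    return correction, statistic, df

corr, stat, df = bartlett_lr_statistic(
    n=50, p=7, m=3, det_fitted=0.00007593, det_sample=0.000018427
)
print(round(corr, 3), round(stat, 1), df)   # compare stat with 11.3 from the text
```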
(e) $\mathbf z' = (1.522, -.852, .465, .957, 1.129, .673, .497)$

Using the regression method for computing factor scores, we have, with $\hat{\mathbf f} = \hat{\mathbf L}_z'\mathbf R^{-1}\mathbf z$:

Principal components (m = 3): $\hat{\mathbf f}' = (.686, .271, 1.395)$
Maximum likelihood (m = 3): $\hat{\mathbf f}' = (-.702, .679, -.751)$

Factor scores using weighted least squares can only be computed for the principal component solutions, since $\hat{\boldsymbol\Psi}^{-1}$ cannot be computed for the maximum likelihood solutions ($\hat{\boldsymbol\Psi}$ has zeros on the main diagonal for the maximum likelihood solutions). Using (9-50),

Principal components (m = 3): $\hat{\mathbf f}' = (.344, .2~3, 1.805)$

9.20
Xs

~

-.59 -2.23
6.78 3u.78
11.36 3.13
31.98
L(symetric)
2.;; -;~7

S = 300.52

)


(a)

Princi pa 1

components (m = 2)
Factor 1

Factor 2
1 oadi ngs

1 oadi ngs
i

Xl el.lind)
X2 ~solar rad.)
Xs (N02)

X6 (03)

-.17
17.32

-.37
i

-.61
I

.42

.74

1.96

I

5.19
I

(b) Maximum likelihood estimates of the loadings are obtained from $\hat{\mathbf L} = \mathbf D^{1/2}\hat{\mathbf L}_z$, where $\hat{\mathbf L}_z$ are the loadings obtained from the sample correlation matrix R. (For $\hat{\mathbf L}_z$ see problem 9.23.) Note: maximum likelihood estimates of the loadings for m = 2 may be difficult to obtain for some computer packages without good estimates of the communalities. One choice for initial estimates of the communalities is the communalities from the m = 2 principal components solution.
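The rescaling $\hat{\mathbf L} = \mathbf D^{1/2}\hat{\mathbf L}_z$ works because $\mathbf R = \mathbf D^{-1/2}\mathbf S\mathbf D^{-1/2}$: a factor model fitted on the correlation scale is the same model on the covariance scale. A minimal sketch with a hypothetical one-factor covariance matrix shows the correspondence. The function names and numbers are illustrative.

```python
import numpy as np

def corr_from_cov(S):
    """R = D^{-1/2} S D^{-1/2}, with D the diagonal matrix of variances."""
    d_inv_half = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(d_inv_half, d_inv_half)

def loadings_for_covariance(L_z, S):
    """Rescale correlation-scale loadings to the covariance scale: L = D^{1/2} L_z."""
    return np.sqrt(np.diag(S))[:, None] * np.asarray(L_z, dtype=float)

# Hypothetical one-factor model with an exact fit S = L L' + Psi
L_true = np.array([[2.0], [1.0], [3.0]])
Psi = np.diag([1.0, 0.5, 2.0])
S = L_true @ L_true.T + Psi

# Correlation-scale version: L_z = D^{-1/2} L and Psi_z = D^{-1/2} Psi D^{-1/2}
R = corr_from_cov(S)
L_z = L_true / np.sqrt(np.diag(S))[:, None]
Psi_z = Psi / np.diag(S)          # Psi is diagonal, so this gives psi_i / s_ii

print(np.allclose(L_z @ L_z.T + Psi_z, R))          # same fit on the R scale
print(loadings_for_covariance(L_z, S).ravel())      # back to [2, 1, 3]
```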

(c) Maximum likelihood estimation (with m = 2) does a better job of accounting for the covariances in S than the m = 2 principal component solution. On the other hand, the principal component solution generally produces uniformly smaller estimates of the specific variances. For the unrotated m = 2 solution, the first factor is dominated by X2 = solar radiation and X6 = O3. The second factor seems to be a contrast between the pair X1 = wind, X2 = solar radiation and the pair X5 = NO2, X6 = O3.

Again the first factor is dominated by solar radiation and, to some extent, ozone. The second factor might be interpreted as a contrast between wind and the pair of pollutants NO2 and O3. Recall that solar radiation and ozone have the largest sample variances. This will affect the estimated loadings obtained by the principal component method.
9.22 (a) Since, for maximum likelihood estimates, $\hat{\mathbf L} = \mathbf D^{1/2}\hat{\mathbf L}_z$ and $\mathbf S = \mathbf D^{1/2}\mathbf R\mathbf D^{1/2}$, the factor scores generated by the equations for $\hat{\mathbf f}_j$ in (9-58) will be identical. Similarly, the factor scores generated by the weighted least squares formulas in (9-50) will be identical.

The factor scores generated by the regression method with maximum likelihood estimates (m = 2; see problem 9.23) are given below for the first 10 cases.

Case     f̂1       f̂2
  1     0.316   -0.544
  2     0.252   -0.546
  3     0.129   -0.509
  4     0.332   -0.790
  5     0.492   -0.012
  6     0.515   -0.370
  7     0.530   -0.456
  8     ?.070    0.724
  9     0.384   -0.?23
 10    -0.179      ?
(b) Factor scores using principal component estimates (m = 2) and (9-51) for the first 10 cases are given below:

Case
1

2
3

,.
f1

'"

1 .203

-0.368

'f . 646

~ 1 . 029

1.447
0.717
0.856

4

0.795

o . 811

0.518

0.950

~O. 083

1 .1 68

0.410
~0.492

10

-0.937

-0.049
0.394

5
6
7

8
9

f2

0.259
0.072

(c) The sets of factor scores are quite different. Factor scores depend heavily on the method used to estimate the loadings and specific variances, as well as on the method used to generate them.

9.23
, Principal components (m = 2)

Factor 2

Rotated load; ngs

1 oadi ngs

loadin~s

Facto r 1

-.56

Factor 1

.

X2 (solar rad.)

.65

-.24
-.52

Xs ( NOZ)

,.48

.74

.77

~'.20

Xl

Xs

(wind)

(°3)

-.31

C!
-.05

Q£

2

Factor

I -.53 I
- .04

(Æ
.30

Maximum likelihood (m = 2)

Factor 1

Factor 2

1 oadi ngs

loadings

Factor 1

Factor

-.38

.32

-.09

-X
' 2 (solar rad.)

.50

.27

C:

em

IX5 (N02)'

.25

-.04

.17

- .19

~6 (°3)

.65

-.03

C&

I -.43 J

~
1

(wi nd)

Rota teet loadi ngs
2

-.10

Examining the rotated loadings, we see that both solution methods yield similar estimated loadings for the first factor. It might be called an "ozone pollution" factor. There are some differences for the second factor. However, the second factor appears to compare one of the pollutants with wind. It might be called a "pollutant transport" factor. We note that the interpretations of the factors might differ depending upon the choice of R or S (see problems 9.20 and 9.21) for analysis. Also, the two solution methods give somewhat different results, indicating that the solution is not very stable. Some of the observed correlations between the variables are very small, implying that an m = 1 or m = 2 factor model for these four variables will not be a completely satisfactory description of the underlying structure. We may need about as many factors as variables. If this is the case, there is nothing to be gained by proposing a factor model.

9.24

$$\mathbf R = \begin{bmatrix} 1.0 & -.192 & .313 & -.119 & .026\\ -.192 & 1.0 & -.065 & .373 & .685\\ .313 & -.065 & 1.0 & -.411 & -.010\\ -.119 & .373 & -.411 & 1.0 & .180\\ .026 & .685 & -.010 & .180 & 1.0\end{bmatrix}$$

The correlations are relatively small, with the possible exception of .685, the correlation between Percent Professional Degree and Median Home Value. Consequently, a factor analysis with fewer than 4 or 5 factors may be problematic. The scree plot, shown below, reinforces this conjecture. The scree plot falls off almost linearly; there is no sharp elbow. However, we present a factor analysis with m = 3 factors for both the principal components and maximum likelihood solutions.
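The principal component solution comes from the eigen-decomposition of R. Its loadings are $\sqrt{\hat\lambda_k}\,\hat{\mathbf e}_k$, and the unrotated factor variances equal the eigenvalues. A minimal NumPy sketch of that computation, using the correlation matrix above:

```python
import numpy as np

# Order: Population, PerCentProDeg, PerCentEmp>16, PerCentGovEmp, MedianHome
R = np.array([
    [1.000, -0.192,  0.313, -0.119,  0.026],
    [-0.192, 1.000, -0.065,  0.373,  0.685],
    [0.313, -0.065,  1.000, -0.411, -0.010],
    [-0.119, 0.373, -0.411,  1.000,  0.180],
    [0.026,  0.685, -0.010,  0.180,  1.000],
])

eigenvalues, eigenvectors = np.linalg.eigh(R)      # ascending order
order = np.argsort(eigenvalues)[::-1]              # largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

m = 3
loadings = eigenvectors[:, :m] * np.sqrt(eigenvalues[:m])   # sqrt(lambda_k) e_k
communalities = (loadings ** 2).sum(axis=1)

print("eigenvalues:", np.round(eigenvalues, 4))   # plotted in the scree plot
print("proportion:", np.round(eigenvalues / R.shape[0], 3))
print("communalities:", np.round(communalities, 3))
```

Eigenvector signs returned by `eigh` are arbitrary, so columns of `loadings` may differ in sign from the printed table. Communalities and eigenvalues are unaffected.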
[Figure: Scree plot of Population, ..., MedianHome (eigenvalue versus factor number)]

Principal Component Factor Analysis (m = 3)
Unrotated Factor Loadings and Communalities

Variable         Factor1   Factor2   Factor3   Communality
Population        -0.371    -0.541    -0.729      0.962
PerCentProDeg      0.837    -0.381     0.153      0.870
PerCentEmp>16     -0.460    -0.708     0.209      0.756
PerCentGovEmp      0.676     0.295    -0.512      0.807
MedianHome         0.696    -0.584     0.064      0.830

Variance          1.9919    1.3675    0.8642      4.2236
% Var              0.398     0.274     0.173       0.845
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable
Population

Factor1 Factor2 Factor3

Coiiunal i ty

0.102 C-d.801ì -0.321

11.756

-0.059
-0.118 ~
~ 0.160 0.147

PerCen tProDeg
PerCentEmp~16

0.962
0.870

MedianHome

~ 0.009 -0.068

0.807
0.830

Variance

1.7382
0.348

1.4050
0.281

4.2236
0.845

PerCen tGovEmp

% Var

0.277 _.Q_85Q'/ -0.082

1.0803
0.216

Factor Score Coefficients

Factor1 Factor2 Factor3
0.138 -0.940
-0.019

Variable
Population

-0.028
-0.577
0.658
-0.099

0.522
0.169
0.052
0.544

PerCentProDeg
PerCentEmp~16

PerCen tGovEmp

MedianHome

0.109

-0.135
-0.278
-0.070

[Figure: Score plot of Population, ..., MedianHome (PC), second factor scores versus first factor scores]

Maximum Likelihood Factor Analysis (m = 3)
* NOTE * Heywood case

Unrotated Factor Loadings and Communalities

Variable         Factor1   Factor2   Factor3   Communality
Population        -0.047    -0.999    -0.001      1.000
PerCentProDeg      0.989     0.146    -0.000      1.000
PerCentEmp>16     -0.020    -0.313     0.941      0.984
PerCentGovEmp      0.362     0.103    -0.395      0.298
MedianHome         0.701    -0.059    -0.015      0.496

Variance          1.6043    1.1310    1.0419      3.7772
% Var              0.321     0.226     0.208       0.755

~ c¡ ~
0.145

1. 000
1. 000

-0.165
0.041
-0.061

0.984
0.298
0.496

1.1740
0.235

1. 0282

3 .7772

Factor1 Factor2
-0.177
0.137
-0.053
1. 017
PerCen tProDeg
1. 025
0.070
PerCentEmp::16
-0.001 -0.010
PerCen tGovEmp
-0.000 -0.001
MedianHome

Factor3
-1.046
-0.046
0.159
-0.002
-0.000

Rotated Factor Loadings and Communalities
Varimax Rotation

Factor1 Factor2
0.155
-0.036
-0.090
PerCentProDeg
0.047
PerCentEmp::16
-0.430
0.333
PerCen tGovEmp

Variable
population

~

MedianHome

-. 4

1.5750
0.315

Variance
% Var

Factor3 Conuunality

0.755

0.206

Factor Score Coefficients

Variable
population

[Figure: Score plot of Population, ..., MedianHome (MLE), second factor scores versus first factor scores]

An m = 3 factor solution explains from 75% to 85% of the variance, depending on the solution method. Using the rotated loadings, the first factor in both methods has large loadings on Percent Professional Degree and Median Home Value. It is difficult to label this factor, but since income is probably somewhere in this mix, it might be labeled an "affluence" or "white collar" factor. The second and third factors from the two solutions are similar as well. The second factor is a bipolar factor with large loadings (in absolute value) on Percent Employed over 16 and Percent Government Employment. We call this factor an "employment" factor. The third factor is clearly a "population" factor. Factor scores for the first two factors from the two solution methods are similar.

9.25

$$\mathbf S = \begin{bmatrix} 105{,}625 & 94{,}734 & 87{,}242 & 94{,}280\\ & 101{,}761 & 76{,}186 & 81{,}204\\ & & 91{,}809 & 90{,}343\\ & & & 104{,}329\end{bmatrix}\quad\text{(symmetric)}$$

An m = 1 factor model appears to represent these data quite well.

                      Principal Components   Maximum Likelihood
                      Factor 1 loadings      Factor 1 loadings
Shock wave                  317                    320
Vibration                   293                    291
Static test 1               287                    275
Static test 2               307                    297
Proportion of
variance explained          90.1%                  86.9%

Factor scores (m = 1) using the regression method for the first few cases are:

Principal Components   Maximum Likelihood
       -.009                 -.033
       1.530                 1.524
        .808                  .719
       -.804                 -.802

The factor scores produced by the two solution methods are very similar. The correlation between the two sets of scores is .992. The outliers, specimens 9 and 16, were identified in Example 4.15.

9.26

a) Principal Components

                  m = 1                      m = 2
            Factor 1                Factor 1   Factor 2
            loadings     ψ          loadings   loadings      ψ
Litter 1      27.9     309.0          27.9       -6.2      271.2
Litter 2      30.4     205.7          30.4       -4.9      182.2
Litter 3      31.5     344.3          31.5       18.5        1.7
Litter 4      32.9     310.0          32.9       -8.0      245.8

Percentage of
variance explained   76.4%            76.4%       9.4%

b) Maximum Likelihood (m = 1)

            Factor 1
            loadings     ψ
Litter 1      26.8     370.2
Litter 2      30.5     198.2
Litter 3      28.4     529.6
Litter 4      30.4     471.0

Percentage of variance explained: 68.8%

The maximum likelihood estimates of the factor loadings for m = 2 were not obtained due to convergence difficulties in the computer program.
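With the principal component method each sample variance splits as $s_{ii} = \hat\ell_{i1}^2 + \hat\psi_i$ for m = 1. The total variance can therefore be recovered from the m = 1 columns of table (a), and each factor's share computed from it. A minimal sketch:

```python
# Principal component solution, part (a)
factor1 = [27.9, 30.4, 31.5, 32.9]
factor2 = [-6.2, -4.9, 18.5, -8.0]
psi_m1 = [309.0, 205.7, 344.3, 310.0]   # specific variances for m = 1

# s_ii = ell_i1^2 + psi_i, so the trace of S is recoverable
total = sum(l * l + s for l, s in zip(factor1, psi_m1))
share1 = sum(l * l for l in factor1) / total
share2 = sum(l * l for l in factor2) / total

print(f"total variance ~ {total:.1f}; factor 1: {share1:.1%}; factor 2: {share2:.1%}")
```

The rounded loadings give 76.4% and 9.5%. The second value is a rounding difference from the 9.4% shown in the table.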

c) It is only necessary to rotate the m = 2 solution.

Principal Components (m = 2)
Rotated loadings

            Factor 1   Factor 2
Litter 1      26.2       11.4
Litter 2      27.5       13.8
Litter 3      14.7       33.4
Litter 4      31.4       12.8

Percentage of
variance explained   53.5%      32.4%

9.27

Principal Components (m = 2)

            Factor 1   Factor 2          Rotated loadings
            loadings   loadings    ψ     Factor 1   Factor 2
Litter 1      .86        .44      .06      .33        .91
Litter 2      .91        .12      .15      .59        .71
Litter 3      .85       -.36      .14      .87        .32
Litter 4      .87       -.21      .20      .78        .44

Percentage of
variance explained   76.5%   9.5%          45.4%      40.6%

Maximum Likelihood (m = 1)

            Factor 1
            loadings     ψ
Litter 1      .81       .34
Litter 2      .91       .17
Litter 3      .78       .39
Litter 4      .81       .34

Percentage of variance explained: 68.8%

$\hat f = \hat{\mathbf L}_z'\mathbf R^{-1}\mathbf z = .297$

9.28 The covariance matrix S (see below) is dominated by the marathon, since the marathon times are given in minutes. It is unlikely that a factor analysis will be useful; however, the principal component solution with m = 2 is given below.

Using the unrotated loadings, the first factor explains about 98% of the variance, and the largest factor loading is associated with the marathon. Using the rotated loadings, the first factor explains about 87% of the variance, and again the largest loading is associated with the marathon. The second factor, with either unrotated or rotated loadings, explains relatively little of the remaining variance and can be ignored. The first factor might be labeled a "running endurance" factor, but this factor provides us with little insight into the nature of the running events. It is better to factor analyze the correlation matrix R in this case.
Covariances: 100m(s), 200m(s), 400m(s), 800m, 1500m, 3000m, Marathon
100m(s)
200m(s)
400m(s)
800m
1500m
3000m

100m(s)

200m(s)

400m(s)

800m

o .02770

0.86309
2.19284
0.06617
0.20276
0.55435
10.38499

6.74546
0.18181
0.50918
1.42682
28.90373

0.00755
0.02141
0.06138

0.15532
0.34456
0.89130

Marathon

0.08389
0.23388
4.33418

Marathon

Marathon
270.27015

1. 21965

Principal Component Factor Analysis of S (m = 2)
Unrotated Factor Loadings and Communalities

Variable Factorl Factor2 Communality
0.124
-0.230
0.267
100m (s)
0.749
-0.582
0.640
200m( s)
6.725
-1.881
1.785
400m(s)
0.006
-0.027
0.075
800m
0.052
-0.073
0.217
1500m
0.453
-0.158
3000m
Mara thon

Variance
% Var

~

16.438-'

0.238

270.270

274.36
0.984

4.02
0.014

278.38
0.999

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable Factor1 Factor2 Communality
0.124
-0.308
0.172
100m(s)
200m(s)
400m(s)
800m
150 Om

3000m

0.401
1. 030

0.061
0.178

o ~~

Marathon (i5.517'
Variance 242.38
0.869
% Var

-0.767
-2.380
-0.051
-0.143
-0.373
-5.431

0.749
6.725
0.006
0.052
0.453
270.270

36.00
0.129

278.38
0.999

1500m

3000m

0.07418
0.21616
3.53984

0.66476
10.70609


The correlation matrix R for the women's track records follows.

Correlations: 100m(s), 200m(s), 400m(s), 800m, 1500m, 3000m, Marathon

           100m(s)  200m(s)  400m(s)   800m    1500m   3000m
200m(s)     0.941
400m(s)     0.871    0.909
800m        0.809    0.820    0.806
1500m       0.782    0.801    0.720    0.905
3000m       0.728    0.732    0.674    0.867   0.973
Marathon    0.669    0.680    0.677    0.854   0.791   0.799

The scree plot below suggests at most a m = 2 factor solution.
[Figure: Scree plot of 100m(s), ..., Marathon (correlation matrix), eigenvalue versus factor number]

Principal Component Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable    Factor1   Communality
100m(s)      0.910      0.933
200m(s)      0.923      0.960
400m(s)      0.887      0.919
800m         0.951      0.921
1500m        0.938      0.940
3000m        0.906      0.934
Marathon     0.856      0.828

             Factor1   Factor2   Total
Variance     5.8076    0.6287    6.4363
% Var         0.830     0.090     0.919
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable    Communality
100m(s)       0.933
200m(s)       0.960
400m(s)       0.919
800m          0.921
1500m         0.940
3000m         0.934
Marathon      0.828

             Factor1   Factor2   Total
Variance     3.3530    3.0833    6.4363
% Var         0.479     0.440     0.919

Factor Score Coefficients

Variable Factor1 Factor2
-0.240 -0.480
100m(s)
-0.244 -0.488
200m(s)
-0.288 -0.525
400m(s)
0.259
0.035
800m
0.172
0.386
1500m
0.280
0.481
3000m
0.255
Marathon
0.445
[Figure: Score plot of 100m(s), ..., Marathon (PC, m = 2), second factor scores versus first factor scores, with outlying observations labeled by number]

Maximum Likelihood Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable    Communality
100m(s)       0.906
200m(s)       0.976
400m(s)       0.848
800m          0.856
1500m         0.984
3000m         0.972
Marathon      0.662

             Factor1   Factor2   Total
Variance     5.6104    0.5928    6.2032
% Var         0.801     0.085     0.886
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable    Factor1   Communality
100m(s)      0.455      0.906
200m(s)      0.449      0.976
400m(s)      0.395      0.848
800m         0.728      0.856
1500m        0.879      0.984
3000m        0.915      0.972
Marathon     0.690      0.662

             Factor1   Factor2   Total
Variance     3.1806    3.0225    6.2032
% Var         0.454     0.432     0.886

Factor Score Coefficients

Variable Factor1 Fat:tor2
100m(s)
-0.107
0.237
200m(s)
-0.481
1.019
400m(s)
-0.077
0.lS7
0.036
0.772
0.595
0.024

BOOm

1500m
3000m

Marathon

0.025

-0.317
-0.369
-0.003

[Figure: Score plot of 100m(s), ..., Marathon (MLE, m = 2), second factor scores versus first factor scores, with outlying observations labeled by number]

The results from the two solution methods are very similar. Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events load highly on this factor. The second factor appears to contrast the shorter running events (100m, 200m, 400m) with the longer events (800m, 1500m, 3000m, marathon). This bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears, and the first factor appears to represent "running endurance", since the running events 800m through the marathon load highly on this factor. The second factor might be classified as a "running speed" factor. Note that, for both factors, the remaining running events in each case have moderately large loadings on the factor. The two factor solution accounts for 89%-92% (depending on solution method) of the total variance. The plots of the factor scores indicate that observations #46 (Samoa), #11 (Cook Islands) and #31 (North Korea) are outliers.

9.29 The covariance matrix S for the running events measured in meters/second is given below. Since all the running event variables are now on a commensurate measurement scale, it is likely that a factor analysis of S will produce nearly the same results as a factor analysis of the correlation matrix R. The results for an m = 2 factor analysis of S using the principal component method are shown below. A factor analysis of R follows.
Covariances: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 3000m/s, Marm/s

          100m/s     200m/s     400m/s     800m/s     1500m/s    3000m/s    Marm/s
100m/s   0.0905383
200m/s   0.0956063  0.1146714
400m/s   0.0966724  0.1138699  0.1377889
800m/s   0.0650640  0.0749249  0.0809409  0.0735228
1500m/s  0.0822198  0.0960189  0.0954430  0.0864542  0.1238405
3000m/s  0.0921422  0.1054364  0.1083164  0.0997547  0.1437148  0.1765843
Marm/s   0.0810999  0.0933103  0.1018807  0.0943056  0.1184578  0.1465604  0.1667141

Principal Component Factor Analysis of S (m = 2)
Unrotated Factor Loadings and Communalities

Variable    Communality
100m/s        0.083
200m/s        0.110
400m/s        0.128
800m/s        0.066
1500m/s       0.116
3000m/s       0.168
Marm/s        0.148

             Factor1    Factor2    Total
Variance     0.73215    0.08607    0.81822
% Var         0.829      0.097      0.926

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable    Communality
100m/s        0.083
200m/s        0.110
400m/s        0.128
800m/s        0.066
1500m/s       0.116
3000m/s       0.168
Marm/s        0.148

             Factor1    Factor2    Total
Variance     0.45423    0.36399    0.81822
% Var         0.514      0.412      0.926

Factor Score Coefficients

Variable    Factor1   Factor2
100m/s      -0.171    -0.363
200m/s      -0.222    -0.471
400m/s      -0.306    -0.603
800m/s       0.104    -0.025
1500m/s      0.287     0.085
3000m/s      0.542     0.280
Marm/s       0.5??    -0.33?
Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events have roughly the same size loadings on this factor. The second factor appears to contrast the shorter running events (100m, 200m, 400m) with the longer events (800m, 1500m, 3000m, marathon). This bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears, and the first factor appears to represent "running endurance", since the running events 800m through the marathon have higher loadings on this factor. The second factor might be classified as a "running speed" factor. Note that, for both factors, the remaining running events in each case have moderate and roughly equal loadings on the factor. The two factor solution accounts for 93% of the variance.

The correlation matrix R is shown below along with the scree plot. A two factor solution seems warranted.

Correlations: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 3000m/s, Marm/s

          100m/s  200m/s  400m/s  800m/s  1500m/s  3000m/s
200m/s    0.938
400m/s    0.866   0.906
800m/s    0.797   0.816   0.804
1500m/s   0.776   0.806   0.731   0.906
3000m/s   0.729   0.741   0.694   0.875   0.972
Marm/s    0.660   0.675   0.672   0.852   0.824    0.854
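The correlation matrix is the covariance matrix of the speeds rescaled by the standard deviations, $\mathbf R = \mathbf D^{-1/2}\mathbf S\mathbf D^{-1/2}$. A minimal NumPy sketch that rebuilds the full S from its lower triangle (values from the covariance table for 9.29) and recovers these correlations:

```python
import numpy as np

names = ["100m/s", "200m/s", "400m/s", "800m/s", "1500m/s", "3000m/s", "Marm/s"]

# Lower triangle of S, one list per row (meters/second scale)
lower_rows = [
    [0.0905383],
    [0.0956063, 0.1146714],
    [0.0966724, 0.1138699, 0.1377889],
    [0.0650640, 0.0749249, 0.0809409, 0.0735228],
    [0.0822198, 0.0960189, 0.0954430, 0.0864542, 0.1238405],
    [0.0921422, 0.1054364, 0.1083164, 0.0997547, 0.1437148, 0.1765843],
    [0.0810999, 0.0933103, 0.1018807, 0.0943056, 0.1184578, 0.1465604, 0.1667141],
]

p = len(names)
S = np.zeros((p, p))
for i, row in enumerate(lower_rows):
    S[i, : len(row)] = row
S = S + S.T - np.diag(np.diag(S))            # fill in the upper triangle

d_inv_half = 1.0 / np.sqrt(np.diag(S))
R = S * np.outer(d_inv_half, d_inv_half)     # R = D^{-1/2} S D^{-1/2}

print(np.round(R, 3))
```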

[Figure: Scree plot of 100m/s, ..., Marm/s (correlation matrix), eigenvalue versus component number]

Principal Component Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable    Communality
100m/s        0.932
200m/s        0.960
400m/s        0.911
800m/s        0.914
1500m/s       0.941
3000m/s       0.947
Marm/s        0.875

             Factor1   Factor2   Total
Variance     5.8323    0.6477    6.4799
% Var         0.833     0.093     0.926

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable    Factor1   Communality
100m/s       0.418      0.932
200m/s       0.436      0.960
400m/s       0.400      0.911
800m/s       0.771      0.914
1500m/s      0.839      0.941
3000m/s      0.886      0.947
Marm/s       0.871      0.875

             Factor1   Factor2   Total
Variance     3.3675    3.1125    6.4799
% Var         0.481     0.445     0.926

Factor Score Coefficients

Variable Factor1 Factor2
-0.252 -0.489
lOOmIs
-0.243 -0.484
20 Oml s
-0.265 -0.499
400m/s
800m/s

15 OOml s

3000m/s
Marr/ s

0.248
0.358
0.455
0.484

0.025
0.142
0.249
0.293

[Figure: Score plot of 100m/s, ..., Marm/s (PC, m = 2), second factor scores versus first factor scores, with outlying observations labeled by number]

Maximum Likelihood Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable   Communality
100m/s           0.896
200m/s           0.983
400m/s           0.836
800m/s           0.850
1500m/s          0.971
3000m/s          0.984
Marm/s           0.737

           Factor1  Factor2  Communality
Variance    5.6844   0.5716       6.2560
% Var        0.812    0.082        0.894

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Factor1  Communality
100m/s       0.441        0.896
200m/s       0.435        0.983
400m/s       0.412        0.836
800m/s          --        0.850
1500m/s      0.859        0.971
3000m/s      0.914        0.984
Marm/s       0.765        0.737

           Factor1  Factor2  Communality
Variance    3.2395   3.0165       6.2560
% Var        0.463    0.431        0.894

Factor Score Coefficients

Variable   Factor1  Factor2
100m/s      -0.167   -0.073
200m/s      -0.521   -1.122
400m/s      -0.106   -0.048
800m/s       0.039   -0.014
1500m/s      0.379    0.124
3000m/s      0.949    0.518
Marm/s       0.041    0.017

[Figure: Score plot of 100m/s, ..., Marm/s (MLE, m = 2); horizontal axis: first factor]

The results from the two solution methods are very similar to each other and to the
principal component factor analysis of the covariance matrix S. Using the
unrotated loadings, the first factor might be identified as a "running excellence"
factor. All the running events load highly on this factor. The second factor appears
to contrast the shorter running events (100m, 200m, 400m) with the longer events
(800m, 1500m, 3000m, marathon). This bipolar factor might be called a "running
speed-running endurance" factor. After rotation the overall excellence factor
disappears and the first factor appears to represent "running endurance" since the
running events 800m through the marathon load highly on this factor. The second
factor might be classified as a "running speed" factor. Note, for both factors, the
remaining running events in each case have moderately large loadings on the
factor. The two factor solution accounts for 89%-93% (depending on solution
method) of the total variance. The plots of the factor scores indicate that
observations #46 (Samoa), #11 (Cook Islands) and #31 (North Korea) are outliers.

The results of the m = 2 factor analysis of women's track records when time is
measured in meters per second are very much the same as the results for the m = 2
factor analysis of R presented in Exercise 9.28. If the correlation matrix R is factor
analyzed, it makes little difference whether running event time is measured in
seconds (or minutes) as in Exercise 9.28 or in meters per second. It does make a
difference if the covariance matrix S is factor analyzed, since the measurement
scales in Exercise 9.28 are quite different from the meters/second scale.
9.30 The covariance matrix S (see below) is dominated by the marathon since the
marathon times are given in minutes. It is unlikely that a factor analysis will
be useful; however, the principal component solution with m = 2 is given below.
Using the unrotated loadings, the first factor explains about 98% of the variance and
the largest factor loading is associated with the marathon. Using the rotated
loadings, the first factor explains about 83% of the variance and again the largest
loading is associated with the marathon. The second factor, with either unrotated or
rotated loadings, explains relatively little of the remaining variance and can be
ignored. The first factor might be labeled a "running endurance" factor, but this
factor provides little insight into the nature of the running events. It is
better to factor analyze the correlation matrix R in this case.
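The claim that S is dominated by the marathon can be checked from the variances on the diagonal of S, listed in the covariance table. The following is a small sketch, assuming only those diagonal entries and the total communality of 85.649 reported for the m = 2 principal component solution. It computes each event's share of the trace of S (the total sample variance).

```python
# Sketch: share of the total sample variance (trace of S) contributed by each
# event, using the diagonal of the men's covariance matrix S.
variances = {
    "100m": 0.048973, "200m": 0.300903, "400m": 2.069956, "800m": 0.002751,
    "1500m": 0.023034, "5000m": 0.578875, "10,000m": 2.819569,
    "Marathon": 80.135356,
}
trace_s = sum(variances.values())
shares = {event: v / trace_s for event, v in variances.items()}

print(f"trace(S) = {trace_s:.4f}")
for event, share in sorted(shares.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{event:9s} {share:.3f}")
# The m = 2 PC solution reports a total communality of 85.649
print(f"proportion explained = {85.649 / trace_s:.3f}")
```

Roughly 93% of the total variance comes from the marathon alone. This is why a factor that essentially reproduces the marathon already explains most of the variance of S.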
Covariances: 100m, 200m, 400m, 800m, 1500m, 5000m, 10,000m, Marathon

              100m       200m       400m       800m      1500m      5000m    10,000m   Marathon
100m      0.048973
200m      0.111044   0.300903
400m      0.256022   0.666818   2.069956
800m      0.008264   0.022929   0.057938   0.002751
1500m     0.025720   0.066193   0.168473   0.007131   0.023034
5000m     0.124575   0.317734   0.853486   0.034348   0.105833   0.578875
10,000m   0.265613   0.688936   1.849941   0.074257   0.229701   1.262533   2.819569
Marathon  1.340139   3.541038   9.178857   0.378905   1.192564   6.430489  14.342538  80.135356

Principal Component Factor Analysis of S (m = 2)
Unrotated Factor Loadings and Communalities

Variable   Factor1  Factor2  Communality
100m         0.152   -0.107        0.034
200m         0.401   -0.270        0.234
400m         1.044   -0.979        2.049
800m         0.043   -0.015        0.002
1500m        0.134   -0.033        0.019
5000m        0.722   -0.125        0.537
10,000m         --   -0.223        2.643
Marathon        --    0.179       80.130

Variance    84.507    1.141       85.649
% Var        0.983    0.013        0.996

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Factor1  Factor2  Communality
100m         0.097   -0.158        0.034
200m         0.262   -0.406        0.234
400m         0.573   -1.312        2.049
800m         0.033   -0.031        0.002
1500m        0.110   -0.083        0.019
5000m        0.615   -0.399        0.537
10,000m      1.392   -0.841        2.643
Marathon        --       --       80.130

Variance    71.529   14.119       85.649
% Var        0.832    0.164        0.996
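The following is a quick consistency check on the two loading tables above. Communalities are row sums of squared loadings, and an orthogonal (varimax) rotation should leave them unchanged up to rounding. This is a sketch using only the rows whose loadings are fully legible. The dictionary names are illustrative.

```python
# Sketch: communality = row sum of squared loadings, before and after the
# varimax rotation (PC solution for S, m = 2; fully legible rows only).
unrotated = {"100m": (0.152, -0.107), "200m": (0.401, -0.270),
             "400m": (1.044, -0.979), "5000m": (0.722, -0.125)}
rotated = {"100m": (0.097, -0.158), "200m": (0.262, -0.406),
           "400m": (0.573, -1.312), "5000m": (0.615, -0.399)}


def communality(row):
    """Sum of squared loadings for one variable."""
    return sum(l * l for l in row)


for name in unrotated:
    print(f"{name:6s} unrotated {communality(unrotated[name]):.3f}  "
          f"rotated {communality(rotated[name]):.3f}")
```

Both columns should agree with the printed communalities (0.034, 0.234, 2.049 and 0.537) to within rounding of the three-decimal loadings.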

The correlation matrix R for the men's track records follows.
Correlations: 100m, 200m, 400m, 800m, 1500m, 5000m, 10,000m, Marathon

            100m   200m   400m   800m  1500m  5000m  10,000m
200m       0.915
400m       0.804  0.845
800m       0.712  0.797  0.768
1500m      0.766  0.795  0.772  0.896
5000m      0.740  0.761  0.780  0.861  0.917
10,000m    0.715  0.748  0.766  0.843  0.901  0.988
Marathon   0.676  0.721  0.713  0.807  0.878  0.944    0.954

The scree plot below suggests at most an m = 2 factor solution.

[Figure: Scree plot; horizontal axis: factor number]

Principal Component Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable   Factor1  Factor2  Communality
100m         0.861    0.423        0.920
200m         0.896    0.376        0.944
400m         0.878    0.276        0.847
800m         0.914       --        0.840
1500m        0.948   -0.123        0.913
5000m        0.957   -0.236        0.972
10,000m      0.947   -0.267        0.969
Marathon     0.917   -0.309        0.937

Variance    6.7033   0.6384       7.3417
% Var        0.838    0.080        0.918
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Communality
100m             0.920
200m             0.944
400m             0.847
800m             0.840
1500m            0.913
5000m            0.972
10,000m          0.969
Marathon         0.937

           Factor1  Factor2  Communality
Variance    4.1168   3.2249       7.3417
% Var        0.515    0.403        0.918

Factor Score Coefficients

Variable   Factor1  Factor2
100m        -0.335    0.586
200m        -0.283    0.533
400m        -0.183    0.413
800m         0.176    0.004
1500m        0.233   -0.053
5000m        0.349   -0.186
10,000m      0.380   -0.224
Marathon     0.420   -0.277

[Figure: Score plot of 100m, ..., Marathon (PC, m = 2); horizontal axis: first factor]

Maximum Likelihood Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable   Factor1  Communality
100m         0.780        0.866
200m         0.814        0.963
400m         0.810        0.772
800m         0.875        0.788
1500m        0.927        0.866
5000m        0.991        0.988
10,000m      0.989        0.989
Marathon     0.949        0.912

           Factor1  Factor2  Communality
Variance    6.4134   0.7299       7.1432
% Var        0.802    0.091        0.893
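The "Variance" and "% Var" rows are column summaries of the loading matrix. Each factor's variance is the sum of its squared loadings, and "% Var" divides by p = 8, the trace of R. This is a sketch using the unrotated ML Factor1 loadings and the communalities above. The Factor2 column is not legible, so its variance is inferred from the total.

```python
# Sketch: "Variance" = column sum of squared loadings; "% Var" = Variance / p.
# Unrotated ML loadings (Factor1) and communalities for the men's records.
factor1 = [0.780, 0.814, 0.810, 0.875, 0.927, 0.991, 0.989, 0.949]
communality = [0.866, 0.963, 0.772, 0.788, 0.866, 0.988, 0.989, 0.912]
p = len(factor1)                          # trace of R

var_f1 = sum(l * l for l in factor1)      # compare with 6.4134
total = sum(communality)                  # compare with 7.1432
var_f2 = total - var_f1                   # implied Factor2 variance (0.7299)

for label, value in [("Factor1", var_f1), ("Factor2", var_f2), ("Total", total)]:
    print(f"{label:8s} {value:.4f}  {value / p:.3f}")
```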
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Communality
100m             0.866
200m             0.963
400m             0.772
800m             0.788
1500m            0.866
5000m            0.988
10,000m          0.989
Marathon         0.912

           Factor1  Factor2  Communality
Variance    3.9446   3.1986       7.1432
% Var        0.493    0.400        0.893

Factor Score Coefficients

Variable   Factor1  Factor2
100m        -0.125    0.256
200m        -0.490    0.994
400m        -0.044    0.104
800m        -0.011    0.054
1500m        0.003    0.056
5000m        0.558   -0.209
10,000m      0.761   -0.423
Marathon     0.089   -0.051

[Figure: Score plot of 100m, ..., Marathon (MLE, m = 2); horizontal axis: first factor]

The results from the two solution methods are very similar. Using the unrotated
loadings, the first factor might be identified as a "running excellence" factor. All
the running events load highly on this factor. The second factor appears to
contrast the shorter running events with the longer events, although the nature of the
contrast is a bit different for the two methods. For the principal component method,
the 100m, 200m and 400m events have positive loadings and the 800m, 1500m,
5000m, 10,000m and marathon events have negative loadings. For the maximum
likelihood method, the 100m, 200m, 400m, 800m and 1500m events are in one
group (positive loadings) and the 5000m, 10,000m and marathon are in the other
group (negative loadings). Nevertheless, this bipolar factor might be called a
"running speed-running endurance" factor. After rotation the overall excellence
factor disappears and the first factor appears to represent "running endurance" since
the running events 800m through the marathon load highly on this factor. The
second factor might be classified as a "running speed" factor. Note, for both
factors, the remaining running events in each case have moderately large loadings
on the factor. The two factor solution accounts for 89%-92% (depending on
solution method) of the total variance. The plots of the factor scores indicate that
observations #46 (Samoa) and #11 (Cook Islands) are outliers. The factor analysis
of the men's track records is very much the same as that for the women's track
records in Exercise 9.28.
9.31 The covariance matrix S for the running events measured in meters/second is
given below. Since all the running event variables are now on a commensurate
measurement scale, a factor analysis of S will likely produce nearly the same
results as a factor analysis of the correlation matrix R. The results for an m = 2 factor
analysis of S using the principal component method are shown below. A factor
analysis of R follows.
Covariances: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 5000m/s, 10,000m/s, Marathonm/s

                100m/s     200m/s     400m/s     800m/s    1500m/s    5000m/s  10,000m/s  Marathonm/s
100m/s       0.0434979
200m/s       0.0482772  0.0648452
400m/s       0.0434632  0.0558678  0.0688217
800m/s       0.0314951  0.0432334  0.0428221  0.0468840
1500m/s      0.0425034  0.0535265  0.0537207  0.0523058  0.0729140
5000m/s      0.0469252  0.0587731  0.0617664         --  0.0766388  0.0959398
10,000m/s    0.0448325  0.0572512  0.0599354  0.0553945  0.0745719  0.0937357  0.0942894
Marathonm/s  0.0431256  0.0562945  0.0567342  0.0541911  0.0736518  0.0905819  0.0909952    0.0979276

Principal Component Factor Analysis of S (m = 2)
Unrotated Factor Loadings and Communalities

Variable     Factor1  Communality
100m/s         0.171        0.038
200m/s         0.219        0.061
400m/s         0.223        0.060
800m/s         0.195        0.038
1500m/s        0.256        0.066
5000m/s        0.301        0.094
10,000m/s      0.296        0.092
Marathonm/s    0.293        0.093

             Factor1  Factor2  Communality
Variance     0.49405  0.04622      0.54027
% Var          0.844    0.079        0.923
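The "commensurate scale" argument can be made concrete by comparing how uneven the variances are on the two scales. The following is a sketch that copies the diagonal of S from Exercise 9.30 (seconds/minutes) and from the table above (meters per second). It also checks the 92% figure as the total communality divided by the trace of S.

```python
# Sketch: spread of the variances on the two measurement scales.
time_scale = [0.048973, 0.300903, 2.069956, 0.002751,
              0.023034, 0.578875, 2.819569, 80.135356]    # Exercise 9.30
speed_scale = [0.0434979, 0.0648452, 0.0688217, 0.0468840,
               0.0729140, 0.0959398, 0.0942894, 0.0979276]  # meters/second

time_ratio = max(time_scale) / min(time_scale)
speed_ratio = max(speed_scale) / min(speed_scale)
# Total communality of the m = 2 PC solution for S divided by trace(S)
explained = 0.54027 / sum(speed_scale)

print(f"largest/smallest variance, time scale: {time_ratio:,.0f}")
print(f"largest/smallest variance, m/s scale:  {speed_ratio:.2f}")
print(f"proportion explained by the m = 2 PC solution for S: {explained:.3f}")
```

On the time scale the variances differ by a factor of roughly 29,000, while in meters per second they differ by little more than a factor of two. This is why S and R lead to similar factor solutions here but not in Exercise 9.30.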

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable     Factor1  Communality
100m/s         0.080        0.038
200m/s         0.105        0.061
400m/s         0.116        0.060
800m/s         0.151        0.038
1500m/s           --        0.066
5000m/s        0.273        0.094
10,000m/s      0.275        0.092
Marathonm/s    0.283        0.093

             Factor1  Factor2  Communality
Variance     0.32860  0.21168      0.54027
% Var          0.562    0.362        0.923

Factor Score Coefficients

Variable
100m/s
2 OOm/ s

400m/s
800m/s
lS00m/s
5000m/ s
10, OOOm/ s

Mara thonr/ s

Factor1 Factor2
-0.197 -0.377
-0.287 -0.561
-0.254 -0.526
-0.078
0.048
-0.022
0.159
0.379
0.415
0.489

0.184
0.240
0.334

Using the unrotated loadings, the first factor might be identified as a "running
excellence" factor. All the running events have roughly the same size loadings on
this factor. The second factor appears to contrast the shorter running events (100m,
200m, 400m, 800m) with the longer events (1500m, 5000m, 10,000m, marathon).
This bipolar factor might be called a "running speed-running endurance" factor.
After rotation the overall excellence factor disappears and the first factor appears to
represent "running endurance" since the running events 1500m through the
marathon have higher loadings on this factor. The second factor might be classified
as a "running speed" factor. Note, the 800m run has about equal (in absolute value)
loadings on both factors and the remaining running events in each case have
moderate and roughly equal loadings on the factor. The two factor solution
accounts for 92% of the variance.

The correlation matrix R is shown next along with the scree plot. A two factor
solution seems warranted.

Correlations: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 5000m/s, 10,000m/s, Marathonm/s

             100m/s  200m/s  400m/s  800m/s  1500m/s  5000m/s  10,000m/s
200m/s        0.909
400m/s        0.794   0.836
800m/s        0.697   0.784   0.754
1500m/s       0.755   0.778   0.758   0.895
5000m/s       0.726   0.745   0.760   0.852    0.916
10,000m/s     0.700   0.732   0.744   0.833    0.899    0.986
Marathonm/s   0.661   0.706   0.691   0.800    0.872    0.935      0.947

[Figure: Scree plot of 100m/s, ..., Marathonm/s (correlation matrix); horizontal axis: factor number]

Principal Component Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable     Communality
100m/s             0.913
200m/s             0.939
400m/s             0.841
800m/s             0.834
1500m/s            0.914
5000m/s            0.968
10,000m/s          0.965
Marathonm/s        0.929

             Factor1  Factor2  Communality
Variance      6.6258   0.6765       7.3023
% Var          0.828    0.085        0.913
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable     Factor1  Communality
100m/s         0.369        0.913
200m/s         0.423        0.939
400m/s         0.466        0.841
800m/s            --        0.834
1500m/s        0.805        0.914
5000m/s        0.882        0.968
10,000m/s      0.895        0.965
Marathonm/s    0.896        0.929

             Factor1  Factor2  Communality
Variance      4.1116   3.1907       7.3023
% Var          0.514    0.399        0.913

Factor Score Coefficients

Variable     Factor1  Factor2
100m/s        -0.315   -0.566
200m/s        -0.270   -0.522
400m/s        -0.186   -0.418
800m/s         0.178   -0.004
1500m/s        0.236    0.056
5000m/s        0.341    0.178
10,000m/s      0.371    0.215
Marathonm/s    0.405    0.261

[Figure: Score plot (PC, m = 2); horizontal axis: first factor]

Maximum Likelihood Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable     Factor1  Communality
100m/s         0.773        0.859
200m/s         0.806        0.957
400m/s         0.797        0.758
800m/s         0.870        0.777
1500m/s        0.928        0.865
5000m/s        0.989        0.985
10,000m/s      0.986        0.986
Marathonm/s    0.942        0.899

             Factor1  Factor2  Communality
Variance      6.3380   0.7485       7.0865
% Var          0.792    0.094        0.886

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable     Communality
100m/s             0.859
200m/s             0.957
400m/s             0.758
800m/s             0.777
1500m/s            0.865
5000m/s            0.985
10,000m/s          0.986
Marathonm/s        0.899

             Factor1  Factor2  Communality
Variance      3.9325   3.1540       7.0865
% Var          0.492    0.394        0.886

Factor Score Coefficients

Variable     Factor1  Factor2
100m/s        -0.128    0.268
200m/s        -0.457    0.951
400m/s        -0.046    0.111
800m/s        -0.008    0.055
1500m/s        0.012    0.055
5000m/s        0.570   -0.219
10,000m/s      0.711   -0.388
Marathonm/s    0.089   -0.047

[Figure: Score plot of 100m/s, ..., Marathonm/s (MLE, m = 2); horizontal axis: first factor]

The results from the two solution methods are very similar to each other and to the
principal component factor analysis of the covariance matrix S. Using the unrotated
loadings, the first factor might be identified as a "running excellence" factor. All
the running events load highly on this factor. The second factor appears to contrast
the shorter running events with the longer events, although there is some difference
in the groupings depending on the solution method. The 800m and 1500m runs are
in the longer group for the principal component method and in the shorter group for
the maximum likelihood method. Nevertheless, this bipolar factor might be called a
"running speed-running endurance" factor. After rotation the overall excellence
factor disappears and the first factor appears to represent "running endurance" since
the running events 800m through the marathon load highly on this factor. The
second factor might be classified as a "running speed" factor. Note, for both
factors, the remaining running events in each case have moderately large loadings
on the factor. The two factor solution accounts for 89%-91% (depending on
solution method) of the total variance. The plots of the factor scores indicate that
observations #46 (Samoa) and #11 (Cook Islands) are outliers.

The results of the m = 2 factor analysis of men's track records when time is
measured in meters per second are very much the same as the results for the m = 2
factor analysis of R presented in Exercise 9.30. If the correlation matrix R is factor
analyzed, it makes little difference whether running event time is measured in
seconds (or minutes) as in Exercise 9.30 or in meters per second. It does make a

difference if the covariance matrix S is factor analyzed, since the measurement
scales in Exercise 9.30 are quite different from the meters/second scale.


9.32. Factor analysis of data on bulls
Factor analysis using sample covariance matrix S
Initial Factor Method: Principal Components

                     1           2        3        4        5        6        7
Eigenvalue  20579.6126   4874.6748   5.4292   3.3163   0.4688   0.0741   0.0045
Difference  15704.9378   4869.2456   2.1129   2.8475   0.3948   0.0695
Proportion      0.8082      0.1914   0.0002   0.0001   0.0000   0.0000   0.0000
Cumulative      0.8082      0.9996   0.9998   1.0000   1.0000   1.0000   1.0000

Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        0.48777   0.39033   0.38532
X4  FtFrBody     0.75367   0.65725  -0.00086
X5  PrctFFB      0.37408   0.62342   0.64446
X6  Frame        0.48170   0.36809   0.33505
X7  BkFat        0.11083  -0.38394  -0.49074
X8  SaleHt       0.66769   0.29875   0.33038
X9  SaleWt       0.96506  -0.26204   0.00009

Varimax Rotated Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        0.50195   0.42460   0.32637
X4  FtFrBody     0.25853   0.90600   0.33514
X5  PrctFFB      0.83816   0.45576   0.18354
X6  Frame        0.44716   0.42166   0.31943
X7  BkFat       -0.60974  -0.06913   0.15478
X8  SaleHt       0.40890   0.46689   0.50894
X9  SaleWt      -0.13508   0.30219   0.94363

SAS scales the loadings obtained from a covariance matrix and then rotates the
scaled loadings. The scaling is $\tilde{\ell}_{ij}/\sqrt{s_{ii}}$.

Initial Factor Method: Maximum Likelihood

Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        1.00000   0.00000   0.00000
X4  FtFrBody     0.62380   0.42819   0.39838
X5  PrctFFB      0.52282   0.85244   0.00000
X6  Frame        0.94025  -0.01180   0.03120
X7  BkFat       -0.34428  -0.36162   0.39308
X8  SaleHt       0.85951   0.08393   0.28992
X9  SaleWt       0.36843   0.00598   0.83599

Varimax Rotated Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        0.94438   0.28442   0.16509
X4  FtFrBody     0.41219   0.50159   0.55648
X5  PrctFFB      0.23003   0.94883   0.21635
X6  Frame        0.88812   0.25026   0.18382
X7  BkFat       -0.25711  -0.51405   0.27102
X8  SaleHt       0.75340   0.26667   0.43720
X9  SaleWt       0.25282  -0.05273   0.87634
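The unrotated and rotated ML patterns describe the same fit. The off-diagonal fitted correlations are the entries of L L', and an orthogonal rotation does not change L L'. Because YrHgt has communality 1 (a Heywood-type solution, since its unrotated row is (1, 0, 0)), its fitted correlation with each variable should equal that variable's unrotated FACTOR1 loading. The following is a sketch that checks this from the rotated pattern alone. The helper `dot` is illustrative.

```python
# Sketch: fitted correlations from the varimax-rotated ML loadings. YrHgt has
# communality 1, so r(YrHgt, X) should equal the unrotated FACTOR1 loading of X.
rotated = {
    "YrHgt":    (0.94438, 0.28442, 0.16509),
    "FtFrBody": (0.41219, 0.50159, 0.55648),
    "PrctFFB":  (0.23003, 0.94883, 0.21635),
    "Frame":    (0.88812, 0.25026, 0.18382),
    "BkFat":    (-0.25711, -0.51405, 0.27102),
    "SaleHt":   (0.75340, 0.26667, 0.43720),
    "SaleWt":   (0.25282, -0.05273, 0.87634),
}
unrotated_f1 = {"YrHgt": 1.00000, "FtFrBody": 0.62380, "PrctFFB": 0.52282,
                "Frame": 0.94025, "BkFat": -0.34428, "SaleHt": 0.85951,
                "SaleWt": 0.36843}


def dot(a, b):
    """Inner product of two loading rows."""
    return sum(x * y for x, y in zip(a, b))


yrhgt = rotated["YrHgt"]
for name, row in rotated.items():
    print(f"{name:9s} h={dot(row, row):.3f}  "
          f"r(YrHgt)={dot(yrhgt, row):.4f}  FACTOR1={unrotated_f1[name]:.5f}")
```

The communality column also shows that PrctFFB reaches 1, matching its unrotated row (0.52282, 0.85244, 0).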

Factor analysis using sample correlation matrix R
Initial Factor Method: Principal Components

                  1        2        3        4        5        6        7
Eigenvalue   4.1207   1.3371   0.7414   0.4214   0.1858   0.1465   0.0471
Difference   2.7836   0.5957   0.3200   0.2356   0.0393   0.0994
Proportion   0.5887   0.1910   0.1059   0.0602   0.0265   0.0209   0.0067
Cumulative   0.5887   0.7797   0.8856   0.9458   0.9723   0.9933   1.0000

Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        0.91334  -0.04948  -0.35794
X4  FtFrBody     0.83700   0.15014   0.38772
X5  PrctFFB      0.72177  -0.36484   0.48930
X6  Frame        0.88091   0.00894  -0.38949
X7  BkFat       -0.37900   0.82646  -0.03335
X8  SaleHt       0.91927   0.11715  -0.15210
X9  SaleWt       0.54798   0.69440   0.21811

Varimax Rotated Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        0.94188   0.27085  -0.06532
X4  FtFrBody     0.44792   0.78354   0.24262
X5  PrctFFB      0.26505   0.87071  -0.25513
X6  Frame        0.93812   0.21799  -0.01382
X7  BkFat       -0.23541  -0.37460   0.79502
X8  SaleHt       0.83365   0.41206   0.13094
X9  SaleWt       0.34932   0.39692   0.74194

Initial Factor Method: Maximum Likelihood

Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        1.00000   0.00000   0.00000
X4  FtFrBody     0.62380   0.42819   0.39838
X5  PrctFFB      0.52282   0.85244   0.00000
X6  Frame        0.94025  -0.01180   0.03120
X7  BkFat       -0.34428  -0.36162   0.39308
X8  SaleHt       0.85951   0.08393   0.28992
X9  SaleWt       0.36843   0.00598   0.83599

Varimax Rotated Factor Pattern
                 FACTOR1   FACTOR2   FACTOR3
X3  YrHgt        0.94438   0.28442   0.16509
X4  FtFrBody     0.41219   0.50159   0.55648
X5  PrctFFB      0.23003   0.94883   0.21635
X6  Frame        0.88812   0.25026   0.18382
X7  BkFat       -0.25711  -0.51405   0.27102
X8  SaleHt       0.75340   0.26667   0.43720
X9  SaleWt       0.25282  -0.05273   0.87634

The interpretation of the factors from R is different from the interpretation of the
factors from S.
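The "Proportion" and "Cumulative" rows of the eigenvalue table for R follow directly from the eigenvalues. For a correlation matrix the total variance is the trace, p = 7. The following is a minimal sketch, using `itertools.accumulate` for the running total.

```python
# Sketch: Proportion = eigenvalue / p and Cumulative = running sum, for R.
from itertools import accumulate

eigenvalues = [4.1207, 1.3371, 0.7414, 0.4214, 0.1858, 0.1465, 0.0471]
p = len(eigenvalues)                 # trace of R = number of variables

proportion = [lam / p for lam in eigenvalues]
cumulative = list(accumulate(proportion))

print("Proportion:", " ".join(f"{x:.4f}" for x in proportion))
print("Cumulative:", " ".join(f"{x:.4f}" for x in cumulative))
```

With m = 3 factors the principal component solution for R accounts for about 89% of the standardized variance, compared with essentially all of the (SaleWt-dominated) variance for S.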

[Figure: Factor scores for the first two factors using S and varimax rotated PC estimates of factor loadings]

[Figure: Factor scores for the first two factors using R and varimax rotated PC estimates of factor loadings]

9.33 The correlation matrix R and the scree plot follow. The correlations are relatively
modest. These correlations and the scree plot suggest that m = 2 factors are probably too
few. An initial factor analysis with m = 2 confirms this conjecture. Consequently,
we give an m = 3 factor solution.

Correlations: indep, supp, benev, conform, leader

          indep    supp   benev  conform
supp     -0.173
benev    -0.561   0.018
conform  -0.471  -0.327   0.298
leader    0.187  -0.401  -0.492   -0.333

[Figure: Scree plot of indep, ..., leader; horizontal axis: factor number]

Principal Component Factor Analysis of R (m = 3)
Unrotated Factor Loadings and Communalities

Variable   Communality
indep            0.943
supp             0.909
benev            0.670
conform          0.819
leader           0.979

           Factor1  Factor2  Factor3  Communality
Variance    2.1966   1.3682   0.7559       4.3207
% Var        0.439    0.274    0.151        0.864
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Factor1  Factor2  Factor3  Communality
indep       -0.971    0.018   -0.003        0.943
supp         0.136       --    0.890        0.909
benev           --       --       --        0.670
conform         --       --       --        0.819
leader      -0.155       --   -0.111        0.979

Variance    1.6506   1.3587   1.3114       4.3207
% Var        0.330    0.272    0.262        0.864

Factor Score Coefficients

Variable   Factor1  Factor2  Factor3
indep       -0.752   -0.362   -0.147
supp         0.119   -0.129    0.690
benev        0.372   -0.127   -0.010
conform      0.073   -0.277   -0.545
leader       0.240    0.832    0.008

[Figure: Score plot of indep, ..., leader (PC, m = 3); horizontal axis: first factor]

Maximum Likelihood Factor Analysis of R (m = 3)
* NOTE * Heywood case
Unrotated Factor Loadings and Communalities

Variable   Communality
indep            1.000
supp             1.000
benev            0.532
conform          0.589
leader           1.000

           Factor1  Factor2  Factor3  Communality
Variance    1.5591   1.5486   1.0133       4.1211
% Var        0.312    0.310    0.203        0.824
Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Communality
indep            1.000
supp             1.000
benev            0.532
conform          0.589
leader           1.000

           Factor1  Factor2  Factor3  Communality
Variance    1.5842   1.3199   1.2170       4.1211
% Var        0.317    0.264    0.243        0.824
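The "Heywood case" note refers to communalities that reach 1, which leaves a specific variance of psi_i = 1 - h_i = 0 for those variables. The following is a small sketch that computes the specific variances from the communalities above and lists the variables that hit the boundary. The names `psi` and `heywood` are illustrative.

```python
# Sketch: specific variances psi_i = 1 - h_i for the ML solution (m = 3).
communality = {"indep": 1.000, "supp": 1.000, "benev": 0.532,
               "conform": 0.589, "leader": 1.000}

psi = {name: round(1.0 - h, 3) for name, h in communality.items()}
heywood = [name for name, value in psi.items() if value <= 0.0]

print(psi)
print("Heywood variables:", heywood)
# Total "% Var" = sum of communalities / number of variables
print(f"total % Var = {sum(communality.values()) / len(communality):.3f}")
```

Three of the five variables have zero specific variance. The ML factor scores are then determined by those variables alone, so benev and conform receive zero score coefficients in the table that follows.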

Factor Score Coefficients

Variable Factor1 Factor2 Factor3
indep

supp
benev
conform

leader

-0 .123

-0 .130
0.219

-0. 024
-1. 069

-0. 000
-0. 000

O. 000
O. 000

-0. 000

-1. 016

0.011

1. 081

0.000

-0.211

Using the unrotated loadings and including moderate loadings of magnitudes
.4-.5, the factors are all bipolar and appear to be difficult to interpret. Moreover,
the arrangement of relatively large loadings on each factor is quite different for the
two solution methods. The rotated loadings are consistent with one another for the
two solution methods and, although all the factors are bipolar, may be easier to
interpret. The first factor is a contrast between Independence and the pair
Benevolence and Conformity. Perhaps this factor could be classified as a
"conforming-not conforming" factor. The second factor is essentially a
"leadership" factor, although if moderate loadings are included, this factor is a
contrast between Leadership and Benevolence. Teenagers with above average
scores on Leadership tend to be above average on this factor, while those with
above average scores on Benevolence tend to be below average on this factor.
Perhaps we could label this factor a "lead-follow" factor. The third factor is
essentially a "support" factor although, again, if moderate loadings are used, this
factor is a contrast between Support and Conformity. To our minds, however, the
latter is difficult to interpret. The factor scores for the first two factors are similar
for the two solution methods. No outliers are immediately evident.

9.34 A factor analysis of the paper property variables with either S or R suggests an m = 1
factor solution is reasonable. All variables load highly on a single factor. The
covariance matrix S and correlation matrix R follow along with a scree plot using
R. For completeness, the results for an m = 2 factor solution using both solution
methods are also given. Plots of factor scores from the two factor model suggest that
observations 58, 59, 60 and 61 may be outliers.
Covariances: BL, EM, SF, BS

          BL        EM        SF        BS
BL    8.302871
EM    1.886636  0.513359
SF    4.147318  0.987585  2.140046
BS    1.972056  0.434307  0.987966  0.480272

Correlations: BL, EM, SF, BS

       BL     EM     SF
EM  0.914
SF  0.984  0.942
BS  0.988  0.875  0.975
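The two matrices above are linked by R = D^{-1/2} S D^{-1/2}, where D holds the variances on the diagonal of S. The following is a minimal sketch that rebuilds the printed correlations from the covariance entries, using only the standard library.

```python
# Sketch: correlation matrix from the covariance matrix, R = D^-1/2 S D^-1/2.
import math

names = ["BL", "EM", "SF", "BS"]
S = [
    [8.302871, 1.886636, 4.147318, 1.972056],
    [1.886636, 0.513359, 0.987585, 0.434307],
    [4.147318, 0.987585, 2.140046, 0.987966],
    [1.972056, 0.434307, 0.987966, 0.480272],
]
sd = [math.sqrt(S[i][i]) for i in range(len(S))]
R = [[S[i][j] / (sd[i] * sd[j]) for j in range(len(S))] for i in range(len(S))]

for i in range(1, len(S)):
    print(names[i], " ".join(f"{R[i][j]:.3f}" for j in range(i)))
```

Factoring R instead of S removes the effect of BL's much larger variance (8.30 versus 0.48 for BS), which is why the loadings from R are "about equal" while those from S mostly reflect the measurement scales.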
Principal Component Factor Analysis of S (m = 1)
Unrotated Factor Loadings and Communalities

Variable   Factor1  Communality
BL           2.878        8.285
EM           0.664        0.441
SF           1.449        2.101
BS           0.684        0.468

Variance    11.295       11.295
% Var        0.988        0.988

Factor Score Coefficients

Variable   Factor1
BL           0.734
EM           0.042
SF           0.188
BS           0.042

The first factor explains 99% of the total variance. All variables, given their
measurement scales, load highly on this factor. Note: There is no factor
rotation with one factor.
Principal Component Factor Analysis of R (m = 1)
Unrotated Factor Loadings and Communalities

Variable   Communality
BL               0.984
EM               0.905
SF               0.991
BS               0.960

           Factor1  Communality
Variance    3.8395       3.8395
% Var        0.960        0.960

Factor Score Coefficients

Variable   Factor1
BL           0.258
EM           0.248
SF           0.259
BS           0.255

The first factor explains 96% of the variance. All variables load highly and about
equally on this factor. This factor might be called a "paper properties index."

Maximum Likelihood Factor Analysis of R (m = 1)
* NOTE * Heywood case
Unrotated Factor Loadings and Communalities

Variable   Factor1  Communality
BL           1.000        1.000
EM           0.914        0.835
SF           0.984        0.968
BS           0.988        0.975

Variance    3.7784       3.7784
% Var        0.945        0.945

Factor Score Coefficients

Variable   Factor1
BL           1.000
EM           0.000
SF           0.000
BS           0.000

The results are similar to the results for the principal component method. The
first factor explains about 95% of the variance and all variables load highly and
about equally on this factor. Again, the factor might be called a "paper properties
index."
Principal Component Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable   Factor1  Factor2  Communality
BL           0.992   -0.098        0.993
EM           0.951    0.307        0.999
SF           0.996   -0.008        0.991
BS           0.980   -0.191        0.996

Variance    3.8395   0.1403       3.9798
% Var        0.960    0.035        0.995

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Factor1  Factor2  Communality
BL           0.817    0.571        0.993
EM           0.522    0.852        0.999
SF           0.761    0.642        0.991
BS           0.868    0.493        0.996

Variance    2.2717   1.7082       3.9798
% Var        0.568    0.427        0.995

Factor Score Coefficients

Variable   Factor1  Factor2
BL           0.650   -0.361
EM          -1.235    1.821
SF           0.232    0.128
BS              --   -0.868

[Figure: Plot of factor scores; horizontal axis: first factor]
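For the principal component solution, the printed score coefficients appear consistent with the least squares formula C = (L'L)^{-1} L', applied to standardized variables, with L the varimax-rotated loadings. The following is a sketch under that assumption (the software's exact internal computation is not documented here). It works the 2 x 2 inverse out by hand so no external libraries are needed.

```python
# Sketch: least squares factor score coefficients C = (L'L)^{-1} L'
# from the varimax-rotated PC loadings (standardized variables).
L = [  # rows: BL, EM, SF, BS; columns: Factor1, Factor2
    [0.817, 0.571],
    [0.522, 0.852],
    [0.761, 0.642],
    [0.868, 0.493],
]

# Entries of the 2 x 2 matrix L'L and its inverse
a = sum(r[0] * r[0] for r in L)
b = sum(r[0] * r[1] for r in L)
c = sum(r[1] * r[1] for r in L)
det = a * c - b * b
inv = [[c / det, -b / det], [-b / det, a / det]]

# C[k][i]: coefficient of variable i in the score for factor k
C = [[inv[k][0] * r[0] + inv[k][1] * r[1] for r in L] for k in range(2)]
for name, f1, f2 in zip(["BL", "EM", "SF", "BS"], C[0], C[1]):
    print(f"{name}  {f1:7.3f}  {f2:7.3f}")
```

Because L'L is nearly singular after rotation (its determinant is about 0.54), small rounding in the three-decimal loadings shifts the coefficients in the second or third decimal.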

Using the unrotated loadings, the second factor explains very little of the variance
beyond that of the first factor and is not needed. Since the unrotated loadings
provide a clear interpretation of the first factor, there is no need to consider the
rotated loadings. The potential outliers are evident in the plot of factor scores.

Maximum Likelihood Factor Analysis of R (m = 2)
* NOTE * Heywood case
Unrotated Factor Loadings and Communalities

Variable   Factor1  Factor2  Communality
BL           0.988    0.103        0.986
EM           0.875    0.485        1.000
SF           0.975    0.185        0.984
BS           1.000    0.000        1.000

Variance    3.6900   0.2800       3.9700
% Var        0.922    0.070        0.992

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Factor1  Factor2  Communality
BL           0.809    0.576        0.986
EM           0.523    0.853        1.000
SF           0.757    0.641        0.984
BS           0.870    0.492        1.000

Variance    2.2572   1.7128       3.9700
% Var        0.564    0.428        0.992

Factor Score Coefficients

Variable   Factor1  Factor2
BL          -0.000   -0.000
EM          -1.016    1.795
SF          -0.000   -0.000
BS           1.759   -1.078
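In this Heywood case EM and BS have communality 1, so (in the limit of zero specific variance) the two factors are exact linear functions of those two variables. That explains the zero coefficients for BL and SF. The following is a sketch, assuming this limiting argument. Writing x = M f for the two Heywood variables, with M their rotated loadings, gives f = M^{-1} x, so the rows of M^{-1} should match the printed coefficients.

```python
# Sketch: factor scores determined by the two Heywood variables, f = M^{-1} x.
M = [[0.523, 0.853],   # EM: rotated Factor1, Factor2 loadings
     [0.870, 0.492]]   # BS: rotated Factor1, Factor2 loadings

det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[M[1][1] / det, -M[0][1] / det],
         [-M[1][0] / det, M[0][0] / det]]

# Row k of M_inv gives the coefficients of (EM, BS) for factor k
for k, (em, bs) in enumerate(M_inv, start=1):
    print(f"Factor{k}: EM {em:.3f}, BS {bs:.3f}")
```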

The results are similar to the results for the principal component method. Using the
unrotated loadings, the first factor explains 92% of the total variance and the second
factor explains very little of the remaining variance. Since the unrotated loadings
provide a clear interpretation of the first factor (paper properties index), there is no
need to consider the rotated loadings. The same potential outliers are evident in the
plot of factor scores.

9.35 A factor analysis of the pulp fiber characteristic variables with S and R for m = 1
and m = 2 factors is summarized below. The covariance matrix S and correlation
matrix R follow along with a scree plot using R. Plots of factor scores from the two
factor model suggest that observations 60 and 61 and possibly observations 57, 58
and 59 may be outliers. An m = 1 factor solution using R appears to be the best
choice.
Covariances: AFL, LFF, FFF, ZST

           AFL         LFF         FFF        ZST
AFL    0.06227
LFF    3.35980   221.05161
FFF   -3.21404  -185.63707   308.39989
ZST    0.00577     0.34760    -0.40633    0.00087

Correlations: AFL, LFF, FFF, ZST

AFL LFF FFF

LFF 0.906
FFF -0.733 -0.711
ZST 0.784 0.793 -0.785


Principal Component Factor Analysis of S (m = 1)
Unrotated Factor Loadings and Communalities

Variable   Communality
AFL              0.047
LFF            175.573
FFF            279.858
ZST              0.001

           Factor1  Communality
Variance    455.48       455.48
% Var        0.860        0.860

Factor Score Coefficients

Variable   Factor1
AFL          0.000
LFF          0.433
FFF         -0.645
ZST          0.000

The first factor explains 86% of the total variance and represents a contrast between
FFF (with a negative loading) and the AFL, LFF and ZST group, all with positive
loadings. AFL (average fiber length), LFF (long fiber fraction) and ZST (zero span
tensile strength) may all have to do with paper strength, while FFF (fine fiber
fraction) may have something to do with paper quality. Perhaps we could label this
factor a "strength-quality" factor.
Principal Component Factor Analysis of R (m = 1)
Unrotated Factor Loadings and Communalities

Variable   Communality
AFL              0.877
LFF              0.870
FFF              0.770
ZST              0.841

           Factor1  Communality
Variance    3.3577       3.3577
% Var        0.839        0.839

Factor Score Coefficients

Variable   Factor1
AFL          0.279
LFF          0.278
FFF         -0.261
ZST          0.273

The first factor explains 84% of the variance and the pattern of loadings is
consistent with that of the m = 1 factor analysis of the covariance matrix S. Again,
we might label this bipolar factor a "strength-quality" factor.
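The loadings for this one-factor solution are not legible above, but they can be recovered from the score coefficients. With one principal component factor, L'L equals the eigenvalue lambda_1, so the least squares score coefficient of variable i is l_i / lambda_1. The following is a sketch under that assumption. The recovered communalities should match the printed ones, and the sign of FFF shows the bipolar pattern described in the text.

```python
# Sketch: loadings l_i = lambda_1 * (score coefficient) for a one-factor PC
# solution of R; communality = l_i ** 2.
lambda_1 = 3.3577
score_coef = {"AFL": 0.279, "LFF": 0.278, "FFF": -0.261, "ZST": 0.273}

loadings = {name: lambda_1 * coef for name, coef in score_coef.items()}
communalities = {name: l ** 2 for name, l in loadings.items()}

for name in score_coef:
    print(f"{name}: loading {loadings[name]:+.3f}, "
          f"communality {communalities[name]:.3f}")
```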
Maximum Likelihood Factor Analysis of R (m = 1)
Unrotated Factor Loadings and Communalities

Variable   Communality
AFL              0.900
LFF              0.894
FFF              0.614
ZST              0.717

           Factor1  Communality
Variance    3.1245       3.1245
% Var        0.781        0.781

Factor Score Coefficients

Variable   Factor1
AFL          0.422
LFF          0.394
FFF         -0.090
ZST          0.132

The first factor explains 78% of the variance and the pattern of loadings is
consistent with that of the m = 1 factor analysis of the correlation matrix R using
the principal component method. Again, we might label this bipolar factor a
"strength-quality" factor.

Because the different measurement scales make the factor loadings obtained from
the covariance matrix difficult to interpret, we continue with a factor analysis of the
correlation matrix R with m = 2.

Principal Component Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable   Factor2  Communality
AFL          0.256        0.942
LFF          0.288        0.953
FFF             --        0.949
ZST             --        0.863

           Factor1  Factor2  Communality
Variance    3.3577   0.3493       3.7070
% Var        0.839    0.087        0.927

Rotated Factor Loadings and Communalities
Varimax Rotation

Communality
0.942
0.953

Variable
AFL

LFF
FFF

0.949
0.863

ZST

2.0176
0.504

Variance
% Var

3.7070
0.927

1. 6893

0.422

Factor Score Coefficients

Variable Factor1 Factor2
0.696
0.757
0.613

AFL

LFF
FFF

-0.082

ZST

0.359
0.429

1. 075

-0.501

(Plot of factor scores, principal component solution, m = 2; horizontal axis: First Factor.)

Maximum Likelihood Factor Analysis of R (m = 2)
Unrotated Factor Loadings and Communalities

Variable   Factor2   Communality
AFL        -0.205    0.876
LFF        -0.292    0.943
FFF        -0.389    0.944
ZST         0.033    0.752

           Factor1   Factor2   Total
Variance   3.2351    0.2796    3.5146
% Var      0.809     0.070     0.879

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Communality
AFL        0.876
LFF        0.943
FFF        0.944
ZST        0.752

           Factor1   Factor2   Total
Variance   2.0124    1.5023    3.5146
% Var      0.503     0.376     0.879

Factor Score Coefficients

Variable Factor1 Factor2
-0.101
0.336
AFL
LFF
FFF
ZST

0.922
0.534
0.049

-0.423

-1. 197

0.076
(Plot of factor scores, maximum likelihood solution, m = 2; horizontal axis: First Factor.)

Examining the unrotated loadings for both solution methods, we see that the second
factor explains little (about 7%-9%) of the total variance. Also, this factor has
moderate to very small loadings on all the variables with the possible exception of
variable FFF. If retained, this factor might be called a "fine fiber" or "quality"
factor. Using the rotated loadings, the second factor looks much like the first factor
for both solution methods. That is, this factor appears to be a contrast between
variable FFF and the group of variables AFL, LFF and ZST. To summarize, there
seems to be no gain in understanding from adding a second factor to the model. A
one-factor model appears to be sufficient in this case. However, plots of the factor
scores for m = 2 suggest observations 60, 61 and, perhaps, observations 57, 58 and
59 may be outliers.

9.36 The correlation matrix R and the scree plot are shown below. After m = 2 there is no
sharp elbow in the scree plot, and the plot falls off almost linearly. Potential choices
for m are 2, 3, 4 and 5. We give the results for m = 4 but, to our minds, this is a
case where a factor model is not particularly well defined.
Correlations: Family, DistRd, Cotton, Maze, Sorg, Millet, Bull, Cattle, Goats
Bull Cattle
Sorg Millet
Maze
Family DistRd Cotton
DistRd -0.084
0.028
0.724
Cot ton
0.730
0.679 -0.054
Maze
0.109
0.383
0.568 -0.071
Sorg
Millet 0.506 0.022 0.389 0.217 0.382
0.353
0.443
0.623
0.765
0.727 -0.088
Bull
Cattle 0.336 -0.063 0.175 0.197 0.404 0.081 0.520
0.560
0.357
0.305
0.424
0.136
0.031
0.399
0.484
Goats

(Scree Plot of Family, ..., Goats; horizontal axis: Factor Number, 1 to 9.)

Principal Component Factor Analysis of R (m = 4)

F~

unrotated Factor Loadings and Communalities

Variable

Factor3

0.903

Family
DistRd
Cotton

tt~~
-0. 0

-0.068

0.175

Maze

-0.070
-0.396

sorg

Millet
Bull
Cattle

0.125
0.286

-0.178

Goa ts

% Var

Rotated

1. 0581

4.1443
0.460

Variance

0.118

Family
DistRd
Cotton

Maze

Sorg

Millet
Bull
Cattle
Goa ts

Variance
% Var

F~

~

0.714'

-0.026

\I .951í
8J11

0.092
0.226

Factor2

-0. 7 .

0.320
-0.022
0.150

~ ~~~
0.006

-0.301

o 564

-0.863 i

-0.026

-0.210

0.148
0.180

0.535
0.879
0.629

EO.~6)

2.7840
0.309

1.8985
0.211

1.6476
0.183

(0'.

724J

~~

7.3593
0.818

0.9205
0.102

and Communalities

Factor Loadings

Varimax Rotation

Variable

~

Factor4 Communality
0.842
-0.118
0.974
0.851
. 28
0.907
0.158
0.706
0.798
-0.582
0.856
0.811
0:466
0.614

0.108

Variable   Factor4   Communality
Family      0.080    0.842
DistRd      0.986    0.974
Cotton     -0.076    0.851
Maze        0.047    0.907
Sorg        0.112    0.706
Millet     -0.029    0.798
Bull        0.043    0.856
Cattle      0.074    0.811
Goats      -0.145    0.614

           Factor4   Total
Variance   1.0291    7.3593
% Var      0.114     0.818

Factor Score Coefficients
Factor4
Variable Factor1 Factor2 Factor3
o .Oti3
-0.171
-0.013
0.197
Family
-0.963
0.030
0.042
0.014
DistRd
-0.090
-0.115 -0.024
0.344
Cotton
0.023
0.247
-0.165
0.494
Maze
O.HlO
-0.374
0.246
-0.199
Sorg
-0.001
-0.078 -0.260 -0.697
Millet
0.005
0.110
0.204
0.224
Bull
0.019
0.329
0.633
-0.063
Cattle
-0.164
-0.156
0.338
-0.114
Goats
(Score Plot of Family, ..., Goats, principal component solution, m = 4; horizontal axis: First Factor.)

Maximum Likelihood Factor Analysis of R (m = 4)
Unrotated Factor Loadings and Communalities

Factor3 Factor4 Communality
0.837
-0.162 -0.374
0.009
-0.044 -0.003
0.782
-0.044
-0.307
0.990
0.025
0.649
-0.071 r=i0~5
Q112
21)
0.369
-0.361
-0.301
0.962
0.131
-0.096
0.869
-0.074
0.465
-0.109
-rr.151
64~ì

Variable F~~~~
FamilY

- .064

DistRd
Cotton

a1

~
0.980
0.211

Maze

Sorg

Millet
Bull
Cattle

0.746
0.290
0.249

Goa ts

2.9824
0.331

Variance
% Var

Rotated

1. 7047

0.189

~

Factor Loadings and

Varimax Rotation
Variable

- .605

FamilY

0.017

DistRd
Cotton

-0.362
-0.034

Maze

Sorg

-0.558
fj~'
rff

Millet
Bull
Cattle

-0.324
-0.15tì

C=Ô.466

Goa ts

Variance
% Var

2.2098
0.246

1.7035
0.189

0.6610
0.073

5.9322
0.659

0.5841
0.065

communalities
i ty
Factor3 Factor4 Communal
0.837
-0.148
0.229
0.009
0.025
-0.081
0.782
-0.370
0.075
0.990
-0.016
0.166
0.649
-0.089
0.303
0.369
-0.028 -0.120

0.915
fO.4il
0.268

E§:m

~:$

0.962
0.869
0.4ti5

1. 2850

0.7340
0.082

5.9322
0.ti59

0.143

Factor Score Coefficients

Factor3 Factor4
Variable Factor1 Factor2
0.247
-0.078
-0.606
0.013
FamilY
-0.002
-0.009
-0.002
0.001
DistRd
-0.161 -0.162 -0.113
0.033
cotton
0.681
0.109
0.440
0.995
Maze
0.206
0.017
-0.404
-0.023
Sorg
0.052
-0.062
-0.185
0.003
Millet
-1. 426
0.103
0.215
-0.026
Bull
0.385
0.896
0.091
-0.141
Cattle
-0.023
-0.010
-0.093
-0.009
Goa ts

(Score plot, maximum likelihood solution, m = 4; horizontal axis: First Factor.)

The two solution methods for m = 4 factors produce somewhat different results.
The patterns of unrotated loadings on the first two factors are similar but not
identical. The patterns of loadings for the two solution methods on the third and
fourth factors are quite different. Notice that DistRd does not load on any factor in
the maximum likelihood solution. The factor loading patterns are more alike for the
two solution methods using the rotated loadings, although factors 2 and 3 in the
principal component solution appear to be reversed in the maximum likelihood
solution. The rotated loadings on factor 4 for the two methods are quite different.
Again, DistRd does not load on any factor in the maximum likelihood solution, whereas it
appears to define factor 4 in the principal component solution. (From R we see that
DistRd is not correlated with any of the other variables.) Variables Family, Cotton,
Maze, and Bullocks load highly on the first factor. The variables Family, Sorghum,
Millet and Goats load highly on the second factor (maximum likelihood solution) or
the third factor (principal component solution). Growing cotton and maize is labor
intensive and bullocks are helpful, so the first factor might be called a
"family farm-row crop" factor. Millet and sorghum are grasses and may provide
feed for goats. Consequently, the second (or third, in the case of the principal
component solution) factor might be called a "family farm-grass crop" factor.
The third factor in the maximum likelihood solution (second factor in the principal
component solution) may have different interpretations depending on the solution
method, but in both cases Bullocks and Cattle load highly on this factor. Perhaps
this factor could be called a "livestock" factor. The rotated loadings are
considerably different on the fourth factor. This factor is clearly a "distance to the
road" factor in the principal component solution. The interpretation is not clear in
the maximum likelihood solution. The fact that the two solution methods produce
somewhat different results and explain quite different proportions of the total
variation (82% for principal components, 66% for maximum likelihood) reinforces
the notion that a linear factor model is not particularly well defined for this
problem. Plots of factor scores for the first two factors indicate there are several
potential outliers. If these observations are removed, the results could change.

Chapter 10
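10.1

$$\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} = \begin{pmatrix} 0 & 0 \\ 0 & (.95)^2 \end{pmatrix}$$

which has eigenvalues $\rho_1^{*2} = (.95)^2$ and $\rho_2^{*2} = 0$. The normalized eigenvectors are $e_1 = (0,\ 1)'$ and $e_2 = (1,\ 0)'$. Thus

$$U_1 = e_1'\Sigma_{11}^{-1/2}X^{(1)} = (0\ \ 1)\begin{pmatrix} .1 & 0 \\ 0 & 1 \end{pmatrix}X^{(1)} = X_2^{(1)}$$

Since $f_1'\Sigma_{22}^{-1/2} = (1\ \ 0)$, $V_1 = X_1^{(2)}$.

Thus $U_1 = X_2^{(1)}$, $V_1 = X_1^{(2)}$ and $\rho_1^* = .95$.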
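The calculation can be checked numerically. This is a sketch under an assumption: the partitioned covariance matrix is taken to have $\Sigma_{11} = \text{diag}(100, 1)$, $\Sigma_{22} = \text{diag}(1, 100)$ and a single nonzero cross-covariance of .95 between $X_2^{(1)}$ and $X_1^{(2)}$, which is consistent with the $\Sigma_{11}^{-1/2} = \text{diag}(.1, 1)$ used above. Eigenvectors are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np

# Assumed partitioned covariance matrix for Exercise 10.1
S11 = np.diag([100.0, 1.0])
S22 = np.diag([1.0, 100.0])
S12 = np.array([[0.0, 0.0],
                [0.95, 0.0]])
S21 = S12.T

def inv_sqrt(S):
    """Symmetric inverse square root via the spectral decomposition."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

S11_mh, S22_mh = inv_sqrt(S11), inv_sqrt(S22)
M = S11_mh @ S12 @ np.linalg.inv(S22) @ S21 @ S11_mh
vals, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
rho1 = np.sqrt(vals[-1])                # largest canonical correlation
e1 = vecs[:, -1]
a1 = S11_mh @ e1                        # U1 = a1' X(1)
f1 = S22_mh @ S21 @ S11_mh @ e1
f1 = f1 / np.linalg.norm(f1)
b1 = S22_mh @ f1                        # V1 = b1' X(2)
print(rho1, a1, b1)
```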

10.2

a) $\rho_1^* = .55$, $\rho_2^* = .49$

b)

$$U_1 = .32X_1^{(1)} - .36X_2^{(1)}, \qquad V_1 = .36X_1^{(2)} - .10X_2^{(2)}$$

$$U_2 = .20X_1^{(1)} + .30X_2^{(1)}, \qquad V_2 = .23X_1^{(2)} + .30X_2^{(2)}$$

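10.5

a)

$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \rho_{11}^{-1}\rho_{12}\rho_{22}^{-1}\rho_{21} = \begin{pmatrix} .45189 & .28919 \\ .14633 & .09474 \end{pmatrix}$$

$$\begin{vmatrix} .45189 - \lambda & .28919 \\ .14633 & .09474 - \lambda \end{vmatrix} = \lambda^2 - .5466\lambda + .0005 = (\lambda - .5457)(\lambda - .0009)$$

The characteristic equation is the same as that of $\rho_{11}^{-1/2}\rho_{12}\rho_{22}^{-1}\rho_{21}\rho_{11}^{-1/2}$ (see Example 10.1) and consequently the eigenvalues are the same.

b)

$$U_2 = -.677Z_1^{(1)} + 1.055Z_2^{(1)}, \qquad V_2 = -.863Z_1^{(2)} + .706Z_2^{(2)}$$

$$\text{Var}(U_2) = (-.677)^2 + (1.055)^2 - 2(.677)(1.055)(.4) = 1.0$$

$$\text{Var}(V_2) = 1.0$$

$$\text{Corr}(U_2, V_2) = (-.677)(-.863)(.5) + (-.863)(1.055)(.3) + (.706)(-.677)(.6) + (.706)(1.055)(.4) = .03 = \rho_2^*$$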
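A short numerical check of part (a), assuming the correlation blocks of Example 10.1 ($\rho_{11}$ with off-diagonal .4, $\rho_{22}$ with off-diagonal .2, and $\rho_{12}$ with rows (.5, .6) and (.3, .4)). It forms both the non-symmetric product and the symmetric version, and confirms that they share the eigenvalues .5457 and .0009, whose square roots are the canonical correlations .74 and .03.

```python
import numpy as np

# Correlation blocks assumed from Example 10.1
rho11 = np.array([[1.0, 0.4], [0.4, 1.0]])
rho22 = np.array([[1.0, 0.2], [0.2, 1.0]])
rho12 = np.array([[0.5, 0.6], [0.3, 0.4]])
rho21 = rho12.T

# Non-symmetric product rho11^{-1} rho12 rho22^{-1} rho21
A = np.linalg.solve(rho11, rho12 @ np.linalg.solve(rho22, rho21))

# Symmetric version rho11^{-1/2} rho12 rho22^{-1} rho21 rho11^{-1/2}
w, V = np.linalg.eigh(rho11)
r11_mh = V @ np.diag(w ** -0.5) @ V.T
B = r11_mh @ rho12 @ np.linalg.solve(rho22, rho21) @ r11_mh

eig_A = np.sort(np.linalg.eigvals(A).real)[::-1]
eig_B = np.sort(np.linalg.eigvalsh(B))[::-1]
print(np.round(A, 5))
print(np.round(eig_A, 4), np.round(eig_B, 4))
# Canonical correlations are the square roots of the eigenvalues
print(np.round(np.sqrt(eig_A), 2))
```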

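10.7

a) For $0 < \rho < 1$,

$$\rho_1^* = \frac{2\rho}{1 + \rho}, \qquad U_1 = \frac{1}{\sqrt{2(1+\rho)}}\left(X_1^{(1)} + X_2^{(1)}\right), \qquad V_1 = \frac{1}{\sqrt{2(1+\rho)}}\left(X_1^{(2)} + X_2^{(2)}\right)$$

10.8

c) $\hat\rho_1^* = .72$, $\hat V_1 = .20X_1^{(2)} + .70X_2^{(2)}$, $\theta = 45^\circ = \pi/4$ radians

d) $\hat\rho_1^* = .57$

$$\hat U_1 = 1.03\cos\theta_1 + .46\sin\theta_1, \qquad \hat V_1 = .49\cos\theta_2 + .78\sin\theta_2$$

10.9

a) $\hat\rho_1^* = .39$, $\hat\rho_2^* = .07$

$$\hat U_1 = 1.26Z_1^{(1)} - 1.03Z_2^{(1)}; \qquad \hat U_2 = .30Z_1^{(1)} + .79Z_2^{(1)}$$

$$\hat V_1 = 1.10Z_1^{(2)} - .45Z_2^{(2)}; \qquad \hat V_2 = -.02Z_1^{(2)} + 1.01Z_2^{(2)}$$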

b) $n = 140$, $p = 2$, $q = 2$, $n - 1 - \tfrac12(p + q + 1) = 136.5$

Null hypothesis               Value of test statistic               Degrees of freedom   Upper 5% point of χ² distribution
H0: Σ12 = ρ12 = 0             -136.5 ln[(.8444)(.9953)] = 23.74      4                    9.49
H0(1): ρ1* ≠ 0, ρ2* = 0       -136.5 ln(.9953) = .65                 1                    3.84

Therefore, reject $H_0$ but do not reject $H_0^{(1)}$. Reading ability (summarized by $\hat U_1$) does correlate with arithmetic ability (summarized by $\hat V_1$), but the correlation (represented by $\hat\rho_1^* = .39$) is not particularly strong.
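The two large-sample (Bartlett) statistics in the table can be reproduced from the printed factors $(1 - \hat\rho_1^{*2}) = .8444$ and $(1 - \hat\rho_2^{*2}) = .9953$. This sketch uses those rounded factors, so it returns about 23.73 and 0.64 instead of the 23.74 and .65 in the table; the chi-square percentage points are standard table values typed in by hand.

```python
import math

n, p, q = 140, 2, 2
c = n - 1 - (p + q + 1) / 2      # 136.5
one_minus_r2 = [0.8444, 0.9953]  # (1 - rho1*^2), (1 - rho2*^2) as printed

stat_all = -c * sum(math.log(f) for f in one_minus_r2)  # H0: Sigma12 = 0, df = 4
stat_after_first = -c * math.log(one_minus_r2[1])       # H0(1), df = 1
CHI2_4_05, CHI2_1_05 = 9.49, 3.84                       # upper 5% points
print(round(stat_all, 2), stat_all > CHI2_4_05)
print(round(stat_after_first, 2), stat_after_first > CHI2_1_05)
```

The first statistic exceeds its critical value and the second does not, matching the conclusion above.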

10.10

a) $\hat\rho_1^* = .33$, $\hat\rho_2^* = .17$

b)

$$\hat U_1 = 1.002Z_1^{(1)} - .003Z_2^{(1)}$$

$$\hat V_1 = -.602Z_1^{(2)} - .977Z_2^{(2)}$$

$\hat U_1 \approx Z_1^{(1)}$ = 1973 nonprimary homicides (standardized)

$\hat V_1$ is, up to sign, roughly $Z_1^{(2)} + Z_2^{(2)}$ = a "punishment index"

Punishment appears to be correlated with nonprimary homicides but not with primary homicides.

10.11 Using the correlation matrix R and standardized variables, the canonical
correlations and canonical variables follow. The $z^{(1)}$'s are the banks, the $z^{(2)}$'s
are the oil companies.

$$\hat\rho_1^* = .348, \qquad \hat\rho_2^* = .130$$

$$\hat U_1 = -.539z_1^{(1)} + 1.209z_2^{(1)} + .079z_3^{(1)}$$

$$\hat U_2 = 1.142z_1^{(1)} - .410z_2^{(1)} + .142z_3^{(1)}$$

$$\hat V_1 = 1.160z_1^{(2)} - .261z_2^{(2)}$$

$$\hat V_2 = -.728z_1^{(2)} + 1.345z_2^{(2)}$$

Additional correlations:

$$R_{\hat U_1, z^{(1)}} = (.266\ \ .913\ \ .498), \qquad R_{\hat V_1, z^{(2)}} = (.982\ \ .532)$$

$$R_{\hat U_1, z^{(2)}} = (.342\ \ .185), \qquad R_{\hat V_1, z^{(1)}} = (.093\ \ .318\ \ .174)$$

Here $H_0: \Sigma_{12}\ (\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \ne 0, \rho_2^* = 0$ is not
rejected at the 5% level. The first canonical correlation, although relatively small,
is significant. The second canonical correlation is not significant.

Focusing attention on the first pair of canonical variables, $\hat U_1$ is dominated by
Citibank and $\hat V_1$ is dominated by Royal Dutch Shell. The canonical correlation (.348)
between $\hat U_1$ and $\hat V_1$ suggests there is not much co-movement between the rates of
return for the banks on one hand and the oil companies on the other. Moreover,
$\hat U_1$ is not highly correlated with any of the $z^{(2)}$'s (oil companies) and $\hat V_1$ is not
highly correlated with any of the $z^{(1)}$'s (banks). The first canonical variables
differentiate stocks in different industries with some, but not much, overlap.
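The cross correlations in the second row of "Additional correlations" follow from the first row. For a canonical pair, $\text{Corr}(\hat U_1, z^{(2)}) = \hat\rho_1^*\,\text{Corr}(\hat V_1, z^{(2)})$ and $\text{Corr}(\hat V_1, z^{(1)}) = \hat\rho_1^*\,\text{Corr}(\hat U_1, z^{(1)})$. The sketch below applies that identity to the printed values; agreement is to within rounding of the printed inputs.

```python
rho1 = 0.348
r_U1_z1 = [0.266, 0.913, 0.498]   # corr(U1, bank returns)
r_V1_z2 = [0.982, 0.532]          # corr(V1, oil company returns)

# For a canonical pair: corr(U1, z(2)) = rho1* corr(V1, z(2))
# and corr(V1, z(1)) = rho1* corr(U1, z(1)).
r_U1_z2 = [rho1 * r for r in r_V1_z2]
r_V1_z1 = [rho1 * r for r in r_U1_z1]
print([round(r, 3) for r in r_U1_z2])
print([round(r, 3) for r in r_V1_z1])
```

Because every cross correlation is shrunk by the factor .348, none of them can be large, which is the numerical reason behind the "not much overlap" remark.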

10.12

a) $\hat\rho_1^* = .69$, $\hat\rho_2^* = .19$. Reject $H_0: \rho_{12} = 0$ at the 5% level but do not reject $H_0^{(1)}: \rho_1^* \ne 0, \rho_2^* = 0$ at the 5% level.

b)

$$\hat U_1 = .77z_1^{(1)} + .27z_2^{(1)}$$

$$\hat V_1 = .05z_1^{(2)} + .90z_2^{(2)} + .19z_3^{(2)}$$

c) Sample Correlations Between Original Variables and Canonical Variables

X(1) variables                                   Û1     V̂1
1. annual frequency of restaurant dining        .99    .68
2. annual frequency of attending movies         .89    .61

X(2) variables                                   Û1     V̂1
1. age of head of household                     .29    .42
2. annual family income                         .68    .98
3. educational level of head of household       .35    .51

d) $\hat U_1$ is a measure of family entertainment outside the home. $\hat V_1$ may be considered a measure of family "status," which is dominated by family income. Essentially, family entertainment outside the home is positively associated with family income.

10.13

a) $\hat\rho_1^* = .909$, $\hat\rho_2^* = .636$, $\hat\rho_3^* = .256$, $\hat\rho_4^* = .094$

Null hypothesis                                   Value of test statistic   Degrees of freedom   Conclusion at level α
1. H0: Σ12 = ρ12 = 0                               309.98                    20                   Reject H0
2. H0: ρ1* ≠ 0, ρ2* = ρ3* = ρ4* = 0                78.63                     12                   Reject H0
3. H0: ρ1* ≠ 0, ρ2* ≠ 0, ρ3* = ρ4* = 0             16.81                     6                    Do not reject H0


Z(1)

i

Z(1 )
2

A

.30J

i~~J' r:~ -::: -::: -:;:

.55

z(l
)
3
Z(1)
4
Z(1)
5

Z(2)
1

.46 .03)

G~J' G::: -:::

Z(2)
2

.98 -.18

Z(2)
3
Z(2)
4

A

b) $\hat U_1$ appears to measure quality of wheat as a "contrast" between
"good" aspects ($z_1^{(1)}$, $z_2^{(1)}$ and $z_5^{(1)}$) and "bad" aspects
($z_3^{(1)}$, $z_4^{(1)}$).

$\hat V_1$ is harder to interpret. It appears to measure the quality of
the flour as represented by z12), z~2) and z~2).


10.14
a) $\hat\rho_1^* = 0.7520$, $\hat\rho_2^* = 0.5395$, and the sample canonical variates are

Raw Canonical Coefficients for the Accounting Measures of Profitability
              U1               U2
HRA    -0.494697741     1.9655018549
HRE     0.2133051339   -0.794353012
HRS     0.7228316516   -0.538822808
RRA     2.7749354333   -4.38346956
RRE    -1.383659039     1.6471230054
RRS    -1.032933813     2.6190103052

Raw Canonical Coefficients for the Market Measures of Profitability
              V1               V2
Q       1.3930601511   -2.500804367
REV    -0.431692979     2.8298904995

$\hat U_1$ is most highly correlated with RRA and HRA, and also with HRS and RRS. $\hat V_1$ is highly
correlated with both of its components. The second pair does not correlate well with
their respective components.

b) Standardized Variance of the Accounting Measures of Profitability

Canonical   Explained by Their Own           Explained by the Opposite Canonical Variables
Variable    Proportion   Cumulative          Canonical R-Squared   Proportion   Cumulative
1           0.5041       0.5041              0.5655                0.2851       0.2851
2           0.0906       0.5946              0.2910                0.0263       0.3114

Standardized Variance of the Market Measures of Profitability

Canonical   Explained by Their Own           Explained by the Opposite Canonical Variables
Variable    Proportion   Cumulative          Canonical R-Squared   Proportion   Cumulative
1           0.8702       0.8702              0.5655                0.4921       0.4921
2           0.1298       1.0000              0.2910                0.0378       0.5299

Market measures can be well explained by their canonical variate $\hat V_1$. However, accounting
measures cannot be well explained. In fact, judging from the correlations between the measures and the
canonical variates, the accounting measures on equity have weak correlation with $\hat U_1$.
Correlations Between the Accounting Measures of
Profitability and Their Canonical Variables

U1 U2

BRA 0.8110 0.2711
HR 0 . 4225 0.0968

BR 0.7184 0.5626
RRA 0 .S60S .. .OOag


RRE 0.6741 -0.09S9

RR 0.7761 0.3814
Correlations Between the Market Measures of
Profitability and Their Canonical Variables

          V1        V2
Q       0.9886    0.1508
REV     0.8736    0.4866
10.15
$\hat\rho_1^* = 0.9129$, $\hat\rho_2^* = 0.0681$, and the sample canonical variates are

Raw Canonical Coefficients for the Dynamic Measures
            U1               U2
X1    0.0036016621    -0.006663216
X2   -0.000696736      0.0077029513

Raw Canonical Coefficients for the Static Measures
            V1               V2
X3    0.0013448038     0.008471036
X4    0.0018933921    -0.007828962

Standardized Variance of the Dynamic Measures

Canonical   Explained by Their Own           Explained by the Opposite Canonical Variables
Variable    Proportion   Cumulative          Canonical R-Squared   Proportion   Cumulative
1           0.8840       0.8840              0.8334                0.7367       0.7367
2           0.1160       1.0000              0.0046                0.0006       0.7373

Standardized Variance of the Static Measures

Canonical   Explained by Their Own           Explained by the Opposite Canonical Variables
Variable    Proportion   Cumulative          Canonical R-Squared   Proportion   Cumulative
1           0.9601       0.9601              0.8334                0.8002       0.8002
2           0.0399       1.0000              0.0046                0.0002       0.8003

Static measures can be well explained by the canonical variate $\hat U_1$. Also, dynamic
measures can be well explained by the canonical variate $\hat V_1$.


10.16 From the computer output below, the first two canonical correlations are $\hat\rho_1^* = 0.517345$ and $\hat\rho_2^* = 0.125508$. The large sample tests

$$-\left(n - 1 - \tfrac12(p + q + 1)\right)\ln\left[(1 - \hat\rho_1^{*2})(1 - \hat\rho_2^{*2})\right] > \chi^2_{pq}(.05)$$

or

$$-\left(46 - 1 - \tfrac12(3 + 2 + 1)\right)\ln\left[(1 - (.517345)^2)(1 - (.125508)^2)\right] = 13.75 > \chi^2_6(.05) = 12.59$$

and

$$-\left(n - 1 - \tfrac12(p + q + 1)\right)\ln(1 - \hat\rho_2^{*2}) > \chi^2_{(p-1)(q-1)}(.05)$$

or

$$-\left(46 - 1 - \tfrac12(3 + 2 + 1)\right)\ln\left[1 - (.125508)^2\right] = 0.667 < \chi^2_2(.05) = 5.99$$

suggest that only the first pair of canonical variables is important. Even if the variable
means were given, we prefer to interpret the canonical variables obtained from S in terms
of the coefficients of the standardized variables.

$$\hat U_1 = .4357z_1^{(1)} - .7047z_2^{(1)} + 1.0815z_3^{(1)}$$

$$\hat V_1 = 1.0202z_1^{(2)} - .1609z_2^{(2)}$$

The two insulin responses dominate $\hat U_1$, while $\hat V_1$ consists primarily of the relative weight
variable.
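The two test statistics follow from the canonical correlations alone. A minimal sketch, using Bartlett's multiplier $n - 1 - \tfrac12(p + q + 1) = 42$ with n = 46, p = 3 and q = 2; the chi-square percentage points are standard table values typed in by hand.

```python
import math

n, p, q = 46, 3, 2
c = n - 1 - (p + q + 1) / 2          # Bartlett multiplier: 42.0
rho = [0.517345, 0.125508]

stat_all = -c * sum(math.log(1 - r ** 2) for r in rho)   # df = p*q = 6
stat_second = -c * math.log(1 - rho[1] ** 2)              # df = (p-1)(q-1) = 2
CHI2_6_05, CHI2_2_05 = 12.59, 5.99   # upper 5% points from tables
print(round(stat_all, 2), stat_all > CHI2_6_05)
print(round(stat_second, 3), stat_second > CHI2_2_05)
```

The first test rejects and the second does not, so only the first canonical pair is retained.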
Canonical Correlation Analysis

Adjusted Appr~x
Canonical

Correlation
1

0.517345

2

O. 125508

Canonical Standard
Correlation Error
0.517145
o .007324
0.125158
o . "009843

Squared
Canonical

Correlation
0.267646
0.015752

Canonical Correlation Analysis
Raw Canonical Coefficients for the ~lucose and Insulin

GLUCOSE 0.0131006541 0.0247524811

INSULIN -0.014438254 -0.009317525

I

NSULRES 0 . 023399723 -0.0"08667216

Raw Canonical Coefficients for the Weight and Fasting

WEIGHT 8.0655750801 -0.375167814
FASTING -0.019159052 0 .12~675138


Standardized Canonical Coefficients for the Glucose
GLUCOSE

0.4357 0.8232

INSULIN

-0.7047 -0.4547
1.0815 -0.4006

I

NSULRES

Standardized Canonical

Coefficients

WEIGHT

FASTING

Correlations

Between the Glucose

SECONDA2

1 .0202

-0.0475

-0.1£09

1.0086

and Insulin
PRIMARY

1

Variables

PRIMARY2

o . 3397

o . 6838

INSULIN

-0 . 0502

-0 . 4565
-0 . 5729

0.7551

and Fasting

and Their Canonical

GLUCOSE

I NSULRES

Correlations

for the Weight

SECONDAl

and Insulin

Between the Weight and Fasting and Their Canonical
SECONDAl
SECONDA2
WEI GHT
o . 9875
O. 1576
FASTING
o . 0465
O. 9989

Variables

10.17 The computer output below suggests maybe two canonical pairs of variables. The
canonical correlations are 0.521594, 0.375256, 0.242181 and 0.136568. $\hat U_1$ ignores the
first smoking question and $\hat U_2$ ignores the third. $\hat V_1$ depends heavily on the difference between
annoyance and tenseness.
Even the second pair does not explain its own set's variance very well: $R^2_{z^{(1)} \mid \hat U_2} = .1249$
and $R^2_{z^{(2)} \mid \hat V_2} = .0879$.
Canonical Correlation

Canonical

Adjusted
Canonical

Correlation

Correlation

2
3

0.521594
0.375256
0.242181

0.52t771
0.374364
0.241172

4

O. 136568

o . 135586

1

Analysis
Approx

Standard
Error
o . 007280

o .008592

0.009414
0.009814

Squared
Canonical

Correlation
o .272060
D .140817
D .058652

0.018651

Standardized Canonical Coefficients for the Smoking

SMOKING 1 SMOKING2 SMOKING3

SMOKING4

-0.0430 1.0898 1.1161

-1.0092

SMOK2

1.1622 0 .6988 -1.4170

-1.3753 0.2081 0.015£

o . 1732

SMOK3

SMOK4

o .8909 -1 .6506 Q. 8325

SMOKl

1.6899
-0.2630

Standar4ize Canonical CoeffÜ:ients f-or the Psych and Physical State


STATE

CONCEN

1

0.4733

ST A TE2

ST A TE3

-0.8141
-0.4510

o . 4946
o . 5909

ANNOY

-0 .7,806

SLEEP
TENSE

o .2567

-0 . ~052
o . 3800

ALERT

0.6919
-0.1451

-0 . 1840

0.6981
-0.4190
-1.5191

IRRIT AB

-0 . 0704

O. 6255

~O . 3343

TIRED

0.3127

o .5898

CONTENT

o .3364

o . 4869

o . 2276
o . 8334

STA TE4
-0 . 1 t5'04

-0.7193
0.624'6
o . 4376

-0.7253
0.87£0
U .1861

-0.6557

Canonical Structure
Correlations Between the Smoking and Their Canonical Variables

SMOKINGl SMOKING2 SMOKING3 ~MOKING4
SMOK3

0.4458 0.5278 0.6615 0.2917
0.7305 0.3822 0.1487 0.5461
0.2910 0.2664 0.4668 0.7915

SMOK4

o . 6403 -0.0620 0 . 5586 0 .5236

SMOKl
SMOK2

Correlations Between the Psychological and Physical
State and Their Canonical Variables
STATE

1 STATE2 STATE3

5TA TE4

CONCEN

0.7199 -0.3579 0.0125

ANNOY

o .3035 0 . 1365 0 . 3906

-0.3137
-0.4058

SLEEP
TENSE

0.5995 -0.3490 0 .3709

o .2586

0.7015 0.3305 0.0053

..0.18'61

ALERT

o .7290 -0. 1539 -0. 1459

-0.3681

IRRIT AB

0.4585 0.3342 0.1211

-0 .0805
0.0749

TIRED
CONTENT

o . 6905 -0.0267 0 . 2544
o . 5323 0 . 4350 0 . 3207

-0.5601


Canonical Redundancy Analysis
Raw Variance of the Smoking
Explained by
Their Own
The Opposite

Canonical Variables

Canonical Variabl€s

Cumulati ve

Proportion

Proportion
o . 3068

2
3

o . 3068
O. 1249
o . 2474

4

0.3210

1. 0000

1

0.4316
o . 6790

Canonical
R-Squared Proportion
0.2721
o . 0835
0.1408
0.0176
o . 0587
0.0145
0.0187
o . 0060

Cumulati ve

Proport ion
'0 .0835

0.1'011

0.1155
0.1215

Raw Variance of the Psychological and Physical State

Explained by
The ir Own

The Opposite

Canonical Variables
Cumulati ve
1

2
3

4

Proportion

Proportion

o . 3705
O. 0879

0.3705
o .4583

0.0617
0.1032

Canonical Variable s
Cumulative

Canonical
R-Squared Proportion
o . 2721
0.1008
0.1408
0.0124

Proportion
o . 1008
O. 1132

0.5201

o . 0587

o . 0036

0.11'68

o . 6233

0.0187

0.0019

O. 1187

10.18 The canonical correlation analysis expressed in terms of standardized variables
follows. The $z^{(1)}$'s are the paper characteristic variables, the $z^{(2)}$'s are the pulp
fiber characteristic variables.

Canonical correlations:

$$\hat\rho_1^* = .917, \quad \hat\rho_2^* = .817, \quad \hat\rho_3^* = .265, \quad \hat\rho_4^* = .092$$

First three canonical variate pairs:

$$\hat U_1 = -1.505z_1^{(1)} - .212z_2^{(1)} + 1.998z_3^{(1)} + .676z_4^{(1)}$$
$$\hat V_1 = -.159z_1^{(2)} + .633z_2^{(2)} + .325z_3^{(2)} + .818z_4^{(2)}$$

$$\hat U_2 = -3.496z_1^{(1)} - 1.543z_2^{(1)} + 1.076z_3^{(1)} + 3.768z_4^{(1)}$$
$$\hat V_2 = .689z_1^{(2)} + 1.003z_2^{(2)} + .005z_3^{(2)} - 1.562z_4^{(2)}$$

$$\hat U_3 = -5.702z_1^{(1)} + 3.525z_2^{(1)} - 4.714z_3^{(1)} + 7.153z_4^{(1)}$$
$$\hat V_3 = -.513z_1^{(2)} + .077z_2^{(2)} - 1.663z_3^{(2)} - .779z_4^{(2)}$$

Additional correlations:

$$R_{\hat U_1, z^{(1)}} = (.935\ \ .887\ \ .977\ \ .952), \qquad R_{\hat V_1, z^{(2)}} = (.817\ \ .906\ \ -.650\ \ .940)$$
$$R_{\hat U_1, z^{(2)}} = (.749\ \ .831\ \ -.596\ \ .862), \qquad R_{\hat V_1, z^{(1)}} = (.858\ \ .814\ \ .896\ \ .873)$$

Here $H_0: \Sigma_{12}\ (\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \ne 0, \rho_2^* = \rho_3^* = \rho_4^* = 0$ is
rejected at the 5% level. $H_0^{(2)}: \rho_1^* \ne 0, \rho_2^* \ne 0, \rho_3^* = \rho_4^* = 0$ is not rejected at the
5% level. The first two canonical correlations are significantly different from 0.
The last two canonical correlations are not significant.

The first canonical variable $\hat U_1$ explains 88% of the total standardized variance of
its set, the $z^{(1)}$'s. The first canonical variable $\hat V_1$ explains 70% of the total
standardized variance of its set, the $z^{(2)}$'s. The first canonical variates are good
summary measures of their respective sets of variables. Moreover, the first
canonical variates, which might be labeled a "paper characteristic index" and a
"pulp fiber strength-quality index", are highly correlated. There is a strong
association between an index of pulp fiber characteristics and an index of the
characteristics of paper made from them.
The second canonical variable $\hat U_2$ appears to be a contrast between the first two
variables, breaking length and elastic modulus, and the last two variables, stress
at failure and burst strength. However, the only moderately large (in absolute
value) correlation between the canonical variate and its component variables is
the correlation (-.428) between $\hat U_2$ and $z_2^{(1)}$, elastic modulus. The remaining
correlations are small. This canonical variable might be a "paper stretch" measure.
The canonical variable $\hat V_2$ appears to be determined by all variables except $z_3^{(2)}$,
fine fiber fraction. This canonical variable might be a "fiber length/strength"
measure. The second pair of canonical variates is also highly correlated.
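The 88% and 70% figures in 10.18 are the average squared correlations between each first canonical variate and the standardized variables of its own set. A minimal sketch of that arithmetic, using the "Additional correlations" printed above:

```python
r_U1_z1 = [0.935, 0.887, 0.977, 0.952]    # corr(U1, paper variables)
r_V1_z2 = [0.817, 0.906, -0.650, 0.940]   # corr(V1, pulp fiber variables)

def prop_explained(corrs):
    """Share of total standardized variance of a set explained by one variate."""
    return sum(r * r for r in corrs) / len(corrs)

print(round(prop_explained(r_U1_z1), 2), round(prop_explained(r_V1_z2), 2))
```

Squaring removes the sign, so the negative correlation with fine fiber fraction counts toward the explained variance just as the positive ones do.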

10.19 The correlation matrix R and the canonical analysis for the standardized variables
follow. The $z^{(1)}$'s are the running speed events (100m, 400m, long jump), the
$z^{(2)}$'s are the arm strength events (discus, javelin, shot put).

1.0

.7926J
.4682

.4682

1.0

.4179
Rl1 = .5520 1.0 .4706

R22 = .4179

.4706.6386J
1.0
(.6386
1.0 .5520
.4752J

RI2 = R;i = .2100 .2116 .2539
.3998 .1821
.3102
(.3509

.4953

( .7926
1.0


Canonical correlations:

$$\hat\rho_1^* = .540, \quad \hat\rho_2^* = .212, \quad \hat\rho_3^* = .014$$

Canonical variables:

$$\hat U_1 = .540z_1^{(1)} - .120z_2^{(1)} + .633z_3^{(1)}$$
$$\hat U_2 = 1.277z_1^{(1)} - .768z_2^{(1)} - .773z_3^{(1)}$$
$$\hat V_1 = -.057z_1^{(2)} + .043z_2^{(2)} + 1.024z_3^{(2)}$$
$$\hat V_2 = -.422z_1^{(2)} - 1.0685z_2^{(2)} + .859z_3^{(2)}$$
$$\hat U_3 = .399z_1^{(1)} + .940z_2^{(1)} - .866z_3^{(1)}$$
$$\hat V_3 = 1.590z_1^{(2)} - .384z_2^{(2)} - 1.038z_3^{(2)}$$

Additional correlations:

$$R_{\hat U_1, z^{(1)}} = (.662\ \ .160\ \ .732), \qquad R_{\hat V_1, z^{(2)}} = (.772\ \ .498\ \ .999)$$

Here $H_0: \Sigma_{12}\ (\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \ne 0, \rho_2^* = \rho_3^* = 0$ is
rejected at the 5% level. $H_0^{(2)}: \rho_1^* \ne 0, \rho_2^* \ne 0, \rho_3^* = 0$ is not rejected at the 5%
level. The first and second canonical correlations are significant. The third
canonical correlation is not significant.

We might identify $\hat U_1$ as a "running speed" measure since the 100m run and the
long jump receive the greatest weight in this canonical variate and also are each
highly correlated with $\hat U_1$. We might call $\hat V_1$ a "strength" or "arm strength"
measure since the shot put has a large coefficient in this canonical variate and the
discus, javelin and shot put are each highly correlated with $\hat V_1$.


Chapter 11
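11.1 (a) The linear discriminant function given in (11-19) is

$$\hat y = (\bar x_1 - \bar x_2)' S_{\text{pooled}}^{-1} x = \hat a' x$$

where

$$S_{\text{pooled}}^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$$

so the linear discriminant function is

$$\left[\begin{pmatrix} 3 \\ 6 \end{pmatrix} - \begin{pmatrix} 5 \\ 8 \end{pmatrix}\right]' \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} x = (-2\ \ 0)\,x = -2x_1$$

(b)

$$\hat m = \tfrac12(\bar y_1 + \bar y_2) = \tfrac12(\hat a'\bar x_1 + \hat a'\bar x_2) = -8$$

Assign $x_0$ to $\pi_1$ if

$$\hat y_0 = (-2\ \ 0)\,x_0 \ge \hat m = -8$$

and assign $x_0$ to $\pi_2$ otherwise.

Since $(-2\ \ 0)\,x_0 = -4$ is greater than $\hat m = -8$, assign $x_0$ to population $\pi_1$.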
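The same rule can be computed directly. This sketch assumes the exercise supplies $\bar x_1 = (3, 6)'$, $\bar x_2 = (5, 8)'$, $S_{\text{pooled}} = \begin{pmatrix} 1 & 1 \\ 1 & 2\end{pmatrix}$ (whose inverse is the matrix shown above) and $x_0 = (2, 7)'$. These inputs reproduce $\hat a = (-2, 0)'$, $\hat m = -8$ and $\hat y_0 = -4$. Solving the linear system avoids forming the inverse explicitly.

```python
import numpy as np

# Summary statistics assumed for Exercise 11.1
xbar1 = np.array([3.0, 6.0])
xbar2 = np.array([5.0, 8.0])
S_pooled = np.array([[1.0, 1.0], [1.0, 2.0]])
x0 = np.array([2.0, 7.0])

a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)  # S_pooled^{-1}(xbar1 - xbar2)
m_hat = 0.5 * (a_hat @ xbar1 + a_hat @ xbar2)     # midpoint of the projected means
y0 = a_hat @ x0
print(a_hat, m_hat, y0, "pi_1" if y0 >= m_hat else "pi_2")
```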


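11.2 (a) $\pi_1$ = Riding-mower owners; $\pi_2$ = Nonowners

Here are some summary statistics for the data in Example 11.1:

$$\bar x_1 = \begin{pmatrix} 109.475 \\ 20.267 \end{pmatrix}, \qquad \bar x_2 = \begin{pmatrix} 87.400 \\ 17.633 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 200.705 & -2.589 \\ -2.589 & 4.464 \end{pmatrix}$$

$$S_{\text{pooled}} = \begin{pmatrix} 276.675 & -7.204 \\ -7.204 & 4.273 \end{pmatrix}, \qquad S_{\text{pooled}}^{-1} = \begin{pmatrix} .00378 & .00637 \\ .00637 & .24475 \end{pmatrix}$$

The linear classification function for the data in Example 11.1 using (11-19) is

$$\hat y = \hat a'x = \left[\begin{pmatrix} 109.475 \\ 20.267 \end{pmatrix} - \begin{pmatrix} 87.400 \\ 17.633 \end{pmatrix}\right]' \begin{pmatrix} .00378 & .00637 \\ .00637 & .24475 \end{pmatrix} x = [\,.100\ \ .785\,]\,x$$

where

$$\hat m = \tfrac12(\bar y_1 + \bar y_2) = \tfrac12(\hat a'\bar x_1 + \hat a'\bar x_2) = 24.719$$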

(b) Assign an observation $x$ to $\pi_1$ if

$$0.100x_1 + 0.785x_2 \ge 24.72$$

Otherwise, assign $x$ to $\pi_2$.

Here are the observations and their classifications:

Owners                                 Nonowners
Obs.   â'x0     Classification         Obs.   â'x0     Classification
 1     23.440   nonowner                1     25.886   owner
 2     24.738   owner                   2     24.608   nonowner
 3     26.436   owner                   3     22.982   nonowner
 4     25.478   owner                   4     23.334   nonowner
 5     30.226   owner                   5     25.216   owner
 6     29.082   owner                   6     21.736   nonowner
 7     27.616   owner                   7     21.500   nonowner
 8     28.864   owner                   8     24.044   nonowner
 9     25.600   owner                   9     20.614   nonowner
10     28.628   owner                  10     21.058   nonowner
11     25.370   owner                  11     19.090   nonowner
12     26.800   owner                  12     20.918   nonowner

From this, we can construct the confusion matrix:

                          Predicted membership
                          π1      π2      Total
Actual         π1         11       1       12
membership     π2          2      10       12

(c) The apparent error rate is (1 + 2)/24 = 0.125.
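The confusion matrix and apparent error rate follow mechanically from the tabulated scores $\hat a'x_0$ and the cutoff $\hat m = 24.72$. A minimal sketch using only those printed values:

```python
owners = [23.440, 24.738, 26.436, 25.478, 30.226, 29.082,
          27.616, 28.864, 25.600, 28.628, 25.370, 26.800]
nonowners = [25.886, 24.608, 22.982, 23.334, 25.216, 21.736,
             21.500, 24.044, 20.614, 21.058, 19.090, 20.918]
m_hat = 24.72

def classify(y):
    """Allocate to pi_1 (owner) when the discriminant score is at least m_hat."""
    return "owner" if y >= m_hat else "nonowner"

n11 = sum(classify(y) == "owner" for y in owners)      # owners kept as owners
n12 = len(owners) - n11                                # owners misclassified
n22 = sum(classify(y) == "nonowner" for y in nonowners)
n21 = len(nonowners) - n22                             # nonowners misclassified
aper = (n12 + n21) / (len(owners) + len(nonowners))
print([[n11, n12], [n21, n22]], aper)
```

Owner 2 (24.738) sits just above the cutoff and nonowner 2 (24.608) just below it, so the error rate is sensitive to small changes in $\hat m$.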
(d) The assumptions are that the observations from $\pi_1$ and $\pi_2$ are from multivariate normal distributions with equal covariance matrices, $\Sigma_1 = \Sigma_2 = \Sigma$.
11.3 We need to show that the regions $R_1$ and $R_2$ that minimize the ECM are defined
by the values $x$ for which the following inequalities hold:

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right), \qquad R_2: \frac{f_1(x)}{f_2(x)} < \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

Substituting the expressions for $P(2|1)$ and $P(1|2)$ into (11-5) gives

$$\text{ECM} = c(2|1)p_1\int_{R_2} f_1(x)\,dx + c(1|2)p_2\int_{R_1} f_2(x)\,dx$$

And since $\Omega = R_1 \cup R_2$,

$$1 = \int_\Omega f_1(x)\,dx = \int_{R_1} f_1(x)\,dx + \int_{R_2} f_1(x)\,dx$$

and thus,

$$\text{ECM} = c(2|1)p_1\left(1 - \int_{R_1} f_1(x)\,dx\right) + c(1|2)p_2\int_{R_1} f_2(x)\,dx$$

Since both of the integrals above are over the same region, we have

$$\text{ECM} = \int_{R_1}\left[c(1|2)p_2 f_2(x) - c(2|1)p_1 f_1(x)\right]dx + c(2|1)p_1$$

The minimum is obtained when $R_1$ is chosen to be the region where the term in
brackets is less than or equal to 0. So choose $R_1$ so that

$$c(2|1)p_1 f_1(x) \ge c(1|2)p_2 f_2(x)$$

or

$$\frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

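11.4 (a) The minimum ECM rule is given by assigning an observation $x$ to $\pi_1$ if

$$\frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) = \left(\frac{100}{50}\right)\left(\frac{.2}{.8}\right) = .5$$

and assigning $x$ to $\pi_2$ if

$$\frac{f_1(x)}{f_2(x)} < \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) = \left(\frac{100}{50}\right)\left(\frac{.2}{.8}\right) = .5$$

(b) Since $f_1(x) = .3$ and $f_2(x) = .5$,

$$\frac{f_1(x)}{f_2(x)} = .6 \ge .5$$

and we assign $x$ to $\pi_1$.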

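11.5

$$-\tfrac12(x - \mu_1)'\Sigma^{-1}(x - \mu_1) + \tfrac12(x - \mu_2)'\Sigma^{-1}(x - \mu_2)$$

$$= -\tfrac12\left(x'\Sigma^{-1}x - 2\mu_1'\Sigma^{-1}x + \mu_1'\Sigma^{-1}\mu_1 - x'\Sigma^{-1}x + 2\mu_2'\Sigma^{-1}x - \mu_2'\Sigma^{-1}\mu_2\right)$$

$$= -\tfrac12\left(-2(\mu_1 - \mu_2)'\Sigma^{-1}x + \mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2\right)$$

$$= (\mu_1 - \mu_2)'\Sigma^{-1}x - \tfrac12(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2)$$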


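11.6 With $a = \Sigma^{-1}(\mu_1 - \mu_2)$ and $m = \tfrac12 a'(\mu_1 + \mu_2)$:

a) $E(a'X \mid \pi_1) - m = a'\mu_1 - m = a'\mu_1 - \tfrac12 a'(\mu_1 + \mu_2) = \tfrac12 a'(\mu_1 - \mu_2) = \tfrac12(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \ge 0$ since
$\Sigma^{-1}$ is positive definite.

b) $E(a'X \mid \pi_2) - m = a'\mu_2 - m = \tfrac12 a'(\mu_2 - \mu_1) = -\tfrac12(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \le 0$.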

11.7 (a) Here are the densities:

(Figure: $f_1(x)$ and $f_2(x)$ plotted for $-1.0 \le x \le 1.5$; one panel marks $R_1$ and $R_2$ on either side of $x = 1/4$, the other on either side of $x = -1/3$.)

(b) When $p_1 = p_2$ and $c(1|2) = c(2|1)$, the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge 1, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 1$$

These regions are given by $R_1: -1 \le x \le .25$ and $R_2: .25 < x \le 1.5$.

(c) When $p_1 = .2$, $p_2 = .8$, and $c(1|2) = c(2|1)$, the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \frac{p_2}{p_1} = 4, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 4$$

These regions are given by $R_1: -1 \le x \le -1/3$ and $R_2: -1/3 < x \le 1.5$.
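The two boundaries can be located numerically. This sketch assumes the exercise's densities are triangular: $f_1(x) = 1 - |x|$ on $[-1, 1]$ and $f_2(x) = 1 - |x - .5|$ on $[-.5, 1.5]$. That assumption is consistent with the regions above. On $(-.5, 1]$ the ratio $f_1/f_2$ decreases, so $f_1 - t f_2$ changes sign once and bisection finds the cutoff for a threshold $t = (c(1|2)/c(2|1))(p_2/p_1)$.

```python
def f1(x):
    """Assumed density for pi_1: triangular on [-1, 1], peak at 0."""
    return 1 - abs(x) if -1 <= x <= 1 else 0.0

def f2(x):
    """Assumed density for pi_2: triangular on [-0.5, 1.5], peak at 0.5."""
    return 1 - abs(x - 0.5) if -0.5 <= x <= 1.5 else 0.0

def boundary(threshold, lo=-0.4999, hi=1.0, iters=80):
    """Largest x with f1(x)/f2(x) >= threshold, found by bisection.

    f1 - threshold*f2 is positive at lo and negative at hi and changes
    sign only once in between, so the bracket always contains the cutoff.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f1(mid) - threshold * f2(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo

# Threshold = (c(1|2)/c(2|1)) * (p2/p1)
print(round(boundary(1.0), 4))   # part (b): equal priors and costs
print(round(boundary(4.0), 4))   # part (c): p1 = .2, p2 = .8
```

For $x < -.5$ only $\pi_1$ has positive density, which is why $R_1$ always extends down to $-1$.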

11.8 (a) Here are the densities:

(Figure: the two densities plotted against $x$; $R_1$ lies between $x = -1/2$ and $x = 1/6$, with $R_2$ on either side.)

(b) When $p_1 = p_2$ and $c(1|2) = c(2|1)$, the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge 1, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 1$$

These regions are given by

$$R_1: -1/2 \le x \le 1/6 \quad\text{and}\quad R_2: -1.5 \le x < -1/2,\ \ 1/6 < x \le 2.5$$

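11.9

$$\frac{a'B_\mu a}{a'\Sigma a} = \frac{a'\left[(\mu_1 - \bar\mu)(\mu_1 - \bar\mu)' + (\mu_2 - \bar\mu)(\mu_2 - \bar\mu)'\right]a}{a'\Sigma a}$$

where $\bar\mu = \tfrac12(\mu_1 + \mu_2)$. Thus $\mu_1 - \bar\mu = \tfrac12(\mu_1 - \mu_2)$ and $\mu_2 - \bar\mu = \tfrac12(\mu_2 - \mu_1)$, so

$$\frac{a'B_\mu a}{a'\Sigma a} = \frac{\tfrac12 a'(\mu_1 - \mu_2)(\mu_1 - \mu_2)'a}{a'\Sigma a}$$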

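11.10 (a) Hotelling's two-sample $T^2$-statistic is

$$T^2 = (\bar x_1 - \bar x_2)'\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{\text{pooled}}\right]^{-1}(\bar x_1 - \bar x_2) = (-3\ \ -2)\left[\left(\frac{1}{11} + \frac{1}{12}\right)\begin{pmatrix} 7.3 & -1.1 \\ -1.1 & 4.8 \end{pmatrix}\right]^{-1}\begin{pmatrix} -3 \\ -2 \end{pmatrix} = 14.52$$

Under $H_0: \mu_1 = \mu_2$,

$$T^2 \sim \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{p,\,n_1+n_2-p-1}$$

Since $T^2 = 14.52 \ge \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{2,20}(.1) = 5.44$, we reject the null hypothesis
$H_0: \mu_1 = \mu_2$ at the $\alpha = 0.1$ level of significance.

(b) Fisher's linear discriminant function is

$$\hat y_0 = \hat a'x_0 = -.49x_1 - .53x_2$$

(c) Here, $\hat m = -.25$. Assign $x_0$ to $\pi_1$ if $-.49x_1 - .53x_2 + .25 \ge 0$. Otherwise
assign $x_0$ to $\pi_2$.

For $x_0' = (0\ \ 1)$, $\hat y_0 = -.53(1) = -.53$ and $\hat y_0 - \hat m = -.28 < 0$. Thus, assign
$x_0$ to $\pi_2$.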
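The whole of 11.10 can be reproduced in a few lines. This sketch assumes $n_1 = 11$, $n_2 = 12$, $\bar x_1 = (-1, -1)'$, $\bar x_2 = (2, 1)'$ and the $S_{\text{pooled}}$ shown in part (a). The means are chosen to match $\bar x_1 - \bar x_2 = (-3, -2)'$ and $\hat m = -.25$. The value $F_{2,20}(.10) = 2.59$ is a standard table entry typed in by hand.

```python
import numpy as np

# Assumed inputs for Exercise 11.10 (consistent with the numbers above)
n1, n2, p = 11, 12, 2
xbar1 = np.array([-1.0, -1.0])
xbar2 = np.array([2.0, 1.0])
S_pooled = np.array([[7.3, -1.1], [-1.1, 4.8]])
F_2_20_10 = 2.59          # upper 10% point of F(2, 20), from tables

d = xbar1 - xbar2
T2 = d @ np.linalg.solve((1 / n1 + 1 / n2) * S_pooled, d)
crit = (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * F_2_20_10

a_hat = np.linalg.solve(S_pooled, d)     # Fisher's coefficients
m_hat = 0.5 * a_hat @ (xbar1 + xbar2)    # midpoint of the projected means
x0 = np.array([0.0, 1.0])
y0 = a_hat @ x0
print(round(T2, 2), round(crit, 2), T2 > crit)
print(np.round(a_hat, 2), round(m_hat, 2), "pi_1" if y0 >= m_hat else "pi_2")
```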


11.11 Assuming equal prior probabilities $p_1 = p_2 = \tfrac12$ and equal misclassification costs
$c(2|1) = c(1|2) = \$10$:

 c    P(B1|A2)   P(B2|A1)   P(A2 and B1)   P(A1 and B2)   P(error)   Expected cost
 9    .006       .691       .003           .346           .349       3.49
10    .023       .500       .011           .250           .261       2.61
11    .067       .309       .033           .154           .188       1.88
12    .159       .159       .079           .079           .159       1.59
13    .309       .067       .154           .033           .188       1.88
14    .500       .023       .250           .011           .261       2.61

Using (11-5), the expected cost is minimized for c = 12 and the minimum expected
cost is $1.59.

11.12 Assuming equal prior probabilities $p_1 = p_2 = \tfrac12$ and misclassification costs $c(2|1) =$
$5 and $c(1|2) =$ $15,

expected cost = $5P(A1 and B2) + $15P(A2 and B1).

 c    P(B1|A2)   P(B2|A1)   P(A2 and B1)   P(A1 and B2)   P(error)   Expected cost
 9    0.006      0.691      0.003          0.346          0.349      1.78
10    0.023      0.500      0.011          0.250          0.261      1.42
11    0.067      0.309      0.033          0.154          0.188      1.27
12    0.159      0.159      0.079          0.079          0.159      1.59
13    0.309      0.067      0.154          0.033          0.188      2.48
14    0.500      0.023      0.250          0.011          0.261      3.81

Using (11-5), the expected cost is minimized for c = 10.90 and the minimum
expected cost is $1.27.

11.13 Assuming prior probabilities $P(A_1) = 0.25$ and $P(A_2) = 0.75$, and misclassification
costs $c(2|1) =$ $5 and $c(1|2) =$ $15,
expected cost = $5P(B2|A1)(.25) + $15P(B1|A2)(.75).

 c    P(B1|A2)   P(B2|A1)   P(A2 and B1)   P(A1 and B2)   P(error)   Expected cost
 9    0.006      0.691      0.005          0.173          0.178      0.93
10    0.023      0.500      0.017          0.125          0.142      0.88
11    0.067      0.309      0.050          0.077          0.127      1.14
12    0.159      0.159      0.119          0.040          0.159      1.98
13    0.309      0.067      0.231          0.017          0.248      3.56
14    0.500      0.023      0.375          0.006          0.381      5.65

Using (11-5), the expected cost is minimized for c = 9.80 and the minimum
expected cost is $0.88.

11.14 Using (11-21),

$$\hat{a}_1^* = \frac{\hat{a}}{\sqrt{\hat{a}'\hat{a}}} = \begin{bmatrix} .79 \\ -.61 \end{bmatrix} \quad\text{and}\quad \hat{m}_1^* = -0.10$$

Since â1*'x0 = −0.14 < m̂1* = −0.10, classify x0 as π2.

Using (11-22),

$$\hat{a}_2^* = \frac{\hat{a}}{\hat{a}_1} = \begin{bmatrix} 1.00 \\ -.77 \end{bmatrix} \quad\text{and}\quad \hat{m}_2^* = -0.12$$

Since â2*'x0 = −0.18 < m̂2* = −0.12, classify x0 as π2.

These results are consistent with the classification obtained for the case of equal prior probabilities in Example 11.3. These two classification results should be identical to those of Example 11.3.
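The two normalizations differ only by the positive factor â1 (about .79), so dividing both the score and the midpoint by it cannot change the direction of the inequality. A small Python check using the numbers quoted above; the recomputed m̂2* (about −0.127) differs from the printed −0.12 only by rounding.

```python
import math

# a1* from (11-21), as quoted above (approximately unit length).
a1_star = (0.79, -0.61)


def scale_first_to_one(a):
    """(11-22)-style normalization: divide every coefficient by the first one."""
    return tuple(v / a[0] for v in a)


a2_star = scale_first_to_one(a1_star)

# Dividing both a1*'x0 and m1* by the same positive number 0.79 cannot flip
# the inequality, so both normalizations allocate x0 the same way.
score1, m1 = -0.14, -0.10
score2, m2 = score1 / a1_star[0], m1 / a1_star[0]

print("length of a1*:", round(math.hypot(*a1_star), 3))
print("a2* =", tuple(round(v, 2) for v in a2_star))
print("pi_1 under (11-21)?", score1 >= m1, " pi_1 under (11-22)?", score2 >= m2)
```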

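11.15 The region

$$\frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

defines the same region as

$$\ln f_1(x) - \ln f_2(x) \ge \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right].$$

For a multivariate normal distribution

$$\ln f_i(x) = -\tfrac{1}{2}\ln|\Sigma_i| - \tfrac{p}{2}\ln 2\pi - \tfrac{1}{2}(x-\mu_i)'\Sigma_i^{-1}(x-\mu_i), \qquad i = 1, 2$$

so

$$\begin{aligned}
\ln f_1(x) - \ln f_2(x) &= -\tfrac{1}{2}(x-\mu_1)'\Sigma_1^{-1}(x-\mu_1) + \tfrac{1}{2}(x-\mu_2)'\Sigma_2^{-1}(x-\mu_2) - \tfrac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right)\\
&= -\tfrac{1}{2}\left(x'\Sigma_1^{-1}x - 2\mu_1'\Sigma_1^{-1}x + \mu_1'\Sigma_1^{-1}\mu_1\right) + \tfrac{1}{2}\left(x'\Sigma_2^{-1}x - 2\mu_2'\Sigma_2^{-1}x + \mu_2'\Sigma_2^{-1}\mu_2\right) - \tfrac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right)\\
&= -\tfrac{1}{2}x'\left(\Sigma_1^{-1} - \Sigma_2^{-1}\right)x + \left(\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1}\right)x - k
\end{aligned}$$

where

$$k = \tfrac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \tfrac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right).$$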

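11.16

$$\begin{aligned}
Q = \ln\left[\frac{f_1(x)}{f_2(x)}\right] &= -\tfrac{1}{2}\ln|\Sigma_1| - \tfrac{1}{2}(x-\mu_1)'\Sigma_1^{-1}(x-\mu_1) + \tfrac{1}{2}\ln|\Sigma_2| + \tfrac{1}{2}(x-\mu_2)'\Sigma_2^{-1}(x-\mu_2)\\
&= -\tfrac{1}{2}x'\left(\Sigma_1^{-1} - \Sigma_2^{-1}\right)x + x'\Sigma_1^{-1}\mu_1 - x'\Sigma_2^{-1}\mu_2 - k
\end{aligned}$$

where

$$k = \tfrac{1}{2}\left(\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right).$$

When Σ1 = Σ2 = Σ,

$$\begin{aligned}
Q &= x'\Sigma^{-1}(\mu_1 - \mu_2) - \tfrac{1}{2}\left(\mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2\right)\\
&= (\mu_1 - \mu_2)'\Sigma^{-1}x - \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2)
\end{aligned}$$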
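The algebra in 11.15 and 11.16 can be checked numerically. The Python sketch below uses hypothetical bivariate parameters (chosen only to exercise the formulas, not taken from any exercise) and confirms that the expanded quadratic form matches ln f1(x) − ln f2(x), and that it collapses to the linear form when Σ1 = Σ2.

```python
import math

# Hypothetical 2-variable parameters, chosen only to exercise the algebra.
MU1, MU2 = (1.0, 2.0), (-0.5, 0.3)
SIGMA1 = ((2.0, 0.3), (0.3, 1.0))
SIGMA2 = ((1.0, -0.2), (-0.2, 0.5))
X = (0.7, -1.1)


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def quad(u, m, v):
    """Bilinear form u' M v for 2-vectors."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def log_normal_pdf(x, mu, sigma):
    """ln f(x) for a bivariate normal (p = 2, so (p/2) ln 2*pi = ln 2*pi)."""
    d = (x[0] - mu[0], x[1] - mu[1])
    return (-0.5 * math.log(det2(sigma)) - math.log(2 * math.pi)
            - 0.5 * quad(d, inv2(sigma), d))


def q_expanded(x, mu1, s1, mu2, s2):
    """-1/2 x'(S1^-1 - S2^-1)x + (mu1'S1^-1 - mu2'S2^-1)x - k, as in 11.15/11.16."""
    i1, i2 = inv2(s1), inv2(s2)
    diff = tuple(tuple(i1[r][c] - i2[r][c] for c in range(2)) for r in range(2))
    k = 0.5 * math.log(det2(s1) / det2(s2)) + 0.5 * (quad(mu1, i1, mu1) - quad(mu2, i2, mu2))
    return -0.5 * quad(x, diff, x) + quad(mu1, i1, x) - quad(mu2, i2, x) - k


def q_linear(x, mu1, mu2, s):
    """Equal-covariance form (mu1-mu2)'S^-1 x - 1/2 (mu1-mu2)'S^-1 (mu1+mu2)."""
    si = inv2(s)
    dmu = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    smu = (mu1[0] + mu2[0], mu1[1] + mu2[1])
    return quad(dmu, si, x) - 0.5 * quad(dmu, si, smu)


direct = log_normal_pdf(X, MU1, SIGMA1) - log_normal_pdf(X, MU2, SIGMA2)
print(direct, q_expanded(X, MU1, SIGMA1, MU2, SIGMA2))
equal_direct = log_normal_pdf(X, MU1, SIGMA1) - log_normal_pdf(X, MU2, SIGMA1)
print(equal_direct, q_expanded(X, MU1, SIGMA1, MU2, SIGMA1), q_linear(X, MU1, MU2, SIGMA1))
```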

11.17 Assume equal prior probabilities and misclassification costs c(2|1) = $10 and c(1|2) = $73.89. In the table below,

$$Q = -\tfrac{1}{2}x_0'\left(\Sigma_1^{-1} - \Sigma_2^{-1}\right)x_0 + \left(\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1}\right)x_0 - \tfrac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) - \tfrac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right)$$

    x        P(π1|x)   P(π2|x)     Q     Classification
(10, 15)'   1.00000   0.00000   18.54        π1
(12, 17)'   0.99991   0.00009    9.36        π1
(14, 19)'   0.95254   0.04745    3.00        π1
(16, 21)'   0.36731   0.63269   -0.54        π2
(18, 23)'   0.21947   0.78053   -1.27        π2
(20, 25)'   0.69517   0.30483    0.87        π2
(22, 27)'   0.99678   0.00322    5.74        π1
(24, 29)'   1.00000   0.00000   13.46        π1
(26, 31)'   1.00000   0.00000   24.01        π1
(28, 33)'   1.00000   0.00000   37.38        π1
(30, 35)'   1.00000   0.00000   53.56        π1

The quadratic discriminator was used to classify the observations in the above table. An observation x is classified as π1 if

$$Q \ge \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right] = \ln\left(\frac{73.89}{10}\right) = 2.0$$

Otherwise, classify x as π2.

For (a), (b), (c) and (d), see the following plot.

[Plot of the observations in the (x1, x2) plane]
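The posterior probabilities and the classifications in 11.17 follow from Q alone, but through different cutoffs: the posterior uses the equal priors, while the allocation rule uses the cost-weighted threshold ln(73.89/10) = 2.0. That is why the point with Q = 0.87 has P(π1|x) above 1/2 yet is assigned to π2. A Python sketch; recomputed posteriors can differ slightly from the table because the Q values are rounded.

```python
import math

# Q values from the table in 11.17, in the order x = (10,15)', ..., (30,35)'.
Q = [18.54, 9.36, 3.00, -0.54, -1.27, 0.87, 5.74, 13.46, 24.01, 37.38, 53.56]
P1 = P2 = 0.5
C21, C12 = 10.0, 73.89


def posterior_pi1(q, p1=P1, p2=P2):
    """P(pi_1 | x) = p1 f1 / (p1 f1 + p2 f2), written in terms of Q = ln(f1/f2)."""
    return 1.0 / (1.0 + (p2 / p1) * math.exp(-q))


threshold = math.log((C12 / C21) * (P2 / P1))   # ln(7.389), about 2.0
labels = ["pi1" if q >= threshold else "pi2" for q in Q]

for q, label in zip(Q, labels):
    print(f"Q = {q:6.2f}  P(pi1|x) = {posterior_pi1(q):.5f}  -> {label}")
```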

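11.18 The vector â = Σ^{-1}(μ1 − μ2) is an (unscaled) eigenvector of Σ^{-1}B since

$$\begin{aligned}
\Sigma^{-1}B\hat{a} &= \Sigma^{-1}c(\mu_1-\mu_2)(\mu_1-\mu_2)'c\,\Sigma^{-1}(\mu_1-\mu_2)\\
&= c^2\,\Sigma^{-1}(\mu_1-\mu_2)(\mu_1-\mu_2)'\Sigma^{-1}(\mu_1-\mu_2)\\
&= \lambda\,\Sigma^{-1}(\mu_1-\mu_2) = \lambda\hat{a}
\end{aligned}$$

where

$$\lambda = c^2(\mu_1-\mu_2)'\Sigma^{-1}(\mu_1-\mu_2).$$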
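A quick numerical check of the eigenvector identity. The mean difference, Σ, and the scalar c below are hypothetical values chosen only to illustrate the algebra; any positive-definite Σ would do.

```python
# Hypothetical inputs chosen only to illustrate the identity; any mean
# difference, positive-definite Sigma and scalar c would do.
delta = (1.5, -0.4)                    # mu1 - mu2
sigma = ((2.0, 0.5), (0.5, 1.0))
c = 0.7


def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))


def matvec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(2)) for i in range(2))


s_inv = inv2(sigma)
B = tuple(tuple(c * c * delta[i] * delta[j] for j in range(2)) for i in range(2))
a_hat = matvec(s_inv, delta)                              # Sigma^{-1}(mu1 - mu2)
lhs = matvec(s_inv, matvec(B, a_hat))                     # Sigma^{-1} B a_hat
lam = c * c * sum(delta[i] * a_hat[i] for i in range(2))  # c^2 (mu1-mu2)'Sigma^{-1}(mu1-mu2)
rhs = tuple(lam * v for v in a_hat)
print(lhs, rhs, lam)
```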

11.19 (a) The calculated values agree with those in Example 11.7.

(b) Fisher's linear discriminant function is

ŷ0 = â'x0 = −(1/3)x1 + (2/3)x2

where

ȳ1 = 17/3;  ȳ2 = 10/3;  m̂ = 27/6 = 4.5

Assign x0 to π1 if −(1/3)x1 + (2/3)x2 − 4.5 ≥ 0. Otherwise assign x0 to π2.

                 π1                                       π2
Observation   â'x0 − m̂   Classification    Observation   â'x0 − m̂   Classification
     1           2.83          π1                1          −1.50          π2
     2           0.83          π1                2           0.50          π1
     3          −0.17          π2                3          −2.50          π2

The results from this table verify the confusion matrix given in Example 11.7.

(c) This is the table of squared distances D̂i²(x) for the observations, where

$$\hat{D}_i^2(x) = (x - \bar{x}_i)'S_{\text{pooled}}^{-1}(x - \bar{x}_i)$$

              π1                                      π2
Obs.   D̂1²(x)   D̂2²(x)   Classification   Obs.   D̂1²(x)   D̂2²(x)   Classification
 1       4/3      21/3          π1           1      13/3      4/3           π2
 2       4/3       9/3          π1           2       1/3      4/3           π1
 3       4/3       3/3          π2           3      19/3      4/3           π2

The classification results are identical to those obtained in (b).
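The agreement between (b) and (c) is not a coincidence: with a common covariance matrix, D̂2²(x) − D̂1²(x) = 2(x̄1 − x̄2)'S_pooled^{-1}x − (x̄1 − x̄2)'S_pooled^{-1}(x̄1 + x̄2) = 2(â'x − m̂), so the distance rule and Fisher's rule are the same rule. A short Python check with exact fractions, using the six (D̂1², D̂2²) pairs written out explicitly:

```python
from fractions import Fraction as Fr

# (D1^2, D2^2) for the three pi_1 and three pi_2 observations in 11.19 (c).
PI1 = [(Fr(4, 3), Fr(21, 3)), (Fr(4, 3), Fr(9, 3)), (Fr(4, 3), Fr(3, 3))]
PI2 = [(Fr(13, 3), Fr(4, 3)), (Fr(1, 3), Fr(4, 3)), (Fr(19, 3), Fr(4, 3))]


def lda_score(d1_sq, d2_sq):
    """With a common covariance matrix, a'x - m = (D2^2 - D1^2) / 2."""
    return (d2_sq - d1_sq) / 2


scores = [lda_score(d1, d2) for d1, d2 in PI1 + PI2]
for s in scores:
    print(f"{float(s):6.2f}  ->", "pi1" if s >= 0 else "pi2")
```

The recovered scores are exactly the â'x0 − m̂ column of (b).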

11.20 The result obtained from this matrix identity is identical to the result of Example
11.7.

11.23 (a) Here are the normal probability plots for each of the variables x1, x2, x3, x4, x5.

[Normal probability plots of x1, ..., x5 against standard normal quantiles]

Variables x1, x3, and x5 appear to be nonnormal. The transformations ln(x1), ln(x3 + 1), and ln(x5 + 1) appear to slightly improve normality.

(b) Using the original data, the linear discriminant function is:

ŷ = â'x = 0.023x1 − 0.034x2 + 0.21x3 − 0.08x4 − 0.25x5

where m̂ = −23.23.

Thus, we allocate x0 to π1 (NMS group) if

â'x0 − m̂ = 0.023x1 − 0.034x2 + 0.21x3 − 0.08x4 − 0.25x5 + 23.23 ≥ 0

Otherwise, allocate x0 to π2 (MS group).

(c) Confusion matrix:

                        Predicted membership
                          π1      π2     Total
Actual membership  π1     66       3       69
                   π2      7      22       29

APER = (3 + 7)/98 = .102

This is the holdout confusion matrix:

                        Predicted membership
                          π1      π2     Total
Actual membership  π1     64       5       69
                   π2      8      21       29

Ê(AER) = (5 + 8)/98 = .133
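Both error rates are the off-diagonal share of a confusion matrix: the resubstitution matrix gives the APER, and the holdout matrix gives the estimate of E(AER). A minimal Python sketch (the function name is illustrative) that works for any number of groups:

```python
def error_rate(confusion):
    """Share of off-diagonal counts; rows = actual group, columns = predicted group."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


resubstitution = [[66, 3], [7, 22]]   # gives the APER
holdout = [[64, 5], [8, 21]]          # gives the holdout estimate of E(AER)
print(round(error_rate(resubstitution), 3), round(error_rate(holdout), 3))
```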

11.24 (a) Here are the scatterplots for the pairs of observations (x1, x2), (x1, x3), and (x1, x4):

[Scatterplots of x2, x3, and x4 against x1, with bankrupt and nonbankrupt firms plotted with different symbols]

The data in the above plots appear to form fairly elliptical shapes, so bivariate normality does not seem like an unreasonable assumption.

(b) π1 = bankrupt firms, π2 = nonbankrupt firms. For (x1, x2):

$$\bar{x}_1 = \begin{bmatrix} -0.0688 \\ -0.0819 \end{bmatrix}, \quad S_1 = \begin{bmatrix} 0.04424 & 0.02847 \\ 0.02847 & 0.02092 \end{bmatrix}, \qquad \bar{x}_2 = \begin{bmatrix} 0.2354 \\ 0.0551 \end{bmatrix}, \quad S_2 = \begin{bmatrix} 0.04735 & 0.00837 \\ 0.00837 & 0.00231 \end{bmatrix}$$

(c), (d), (e) See the tables of part (g).

(f)

$$S_{\text{pooled}} = \begin{bmatrix} 0.04594 & 0.01751 \\ 0.01751 & 0.01077 \end{bmatrix}$$

Fisher's linear discriminant function is

ŷ = â'x = −4.67x1 − 5.12x2

where m̂ = −.32. Thus, we allocate x0 to π1 (bankrupt group) if

â'x0 − m̂ = −4.67x1 − 5.12x2 + .32 ≥ 0

Otherwise, allocate x0 to π2 (nonbankrupt group).

APER = 9/46 = .196.
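A Python sketch of part (f): pool the two sample covariance matrices, then form â = S_pooled^{-1}(x̄1 − x̄2) and m̂ = ½â'(x̄1 + x̄2). It assumes group sizes n1 = 21 (bankrupt) and n2 = 25 (nonbankrupt), taken from the confusion-matrix totals in 11.26; small differences from −4.67 and −5.12 come from rounding in the quoted summary statistics.

```python
# Summary statistics for (x1, x2) from part (b); n1 = 21 and n2 = 25 are
# taken from the confusion-matrix totals in 11.26 (assumption).
XBAR1, XBAR2 = (-0.0688, -0.0819), (0.2354, 0.0551)
S1 = ((0.04424, 0.02847), (0.02847, 0.02092))
S2 = ((0.04735, 0.00837), (0.00837, 0.00231))
N1, N2 = 21, 25


def pooled(s1, s2, n1, n2):
    """S_pooled = [(n1-1) S1 + (n2-1) S2] / (n1 + n2 - 2)."""
    w1, w2 = n1 - 1, n2 - 1
    return tuple(tuple((w1 * s1[i][j] + w2 * s2[i][j]) / (w1 + w2) for j in range(2))
                 for i in range(2))


def solve2(m, v):
    """Solve the 2x2 system m z = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (-m[1][0] * v[0] + m[0][0] * v[1]) / det)


sp = pooled(S1, S2, N1, N2)
a_hat = solve2(sp, (XBAR1[0] - XBAR2[0], XBAR1[1] - XBAR2[1]))   # S_pooled^{-1}(x1bar - x2bar)
m_hat = 0.5 * sum(a_hat[i] * (XBAR1[i] + XBAR2[i]) for i in range(2))
print("S_pooled =", [[round(v, 5) for v in row] for row in sp])
print("a_hat =", [round(v, 2) for v in a_hat], " m_hat =", round(m_hat, 2))
```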

(g) Since S1 and S2 look quite different, Fisher's linear discriminant function may not be appropriate. For the various classification rules and error rates for these variable pairs, see the following tables.

This is the table of quadratic functions for the variable pairs (x1, x2), (x1, x3), and (x1, x4), both with p1 = 0.5 and p1 = 0.05. The classification rule for any of these functions is to classify a new observation into π1 (bankrupt firms) if the quadratic function is ≥ 0, and to classify the new observation into π2 (nonbankrupt firms) otherwise. Notice in the table below that only the constant term changes when the prior probabilities change.

Variables   Prior       Quadratic function
(x1, x2)    p1 = 0.5    −61.77x1² + 35.84x1x2 + 407.20x2² + 5.64x1 − 30.60x2 − 0.17
            p1 = 0.05   −61.77x1² + 35.84x1x2 + 407.20x2² + 5.64x1 − 30.60x2 − 3.11
(x1, x3)    p1 = 0.5    −1.55x1² + 3.59x1x3 − 3.08x3² − 10.69x1 + 7.90x3 − 3.14
            p1 = 0.05   −1.55x1² + 3.59x1x3 − 3.08x3² − 10.69x1 + 7.90x3 − 6.08
(x1, x4)    p1 = 0.5    −0.46x1² + 7.75x1x4 + 8.43x4² − 10.05x1 − 8.11x4 + 2.23
            p1 = 0.05   −0.46x1² + 7.75x1x4 + 8.43x4² − 10.05x1 − 8.11x4 − 0.71

Here is a table of the APER and Ê(AER) for the various variable pairs and prior probabilities.

                     APER                   Ê(AER)
Variables    p1 = 0.5   p1 = 0.05    p1 = 0.5   p1 = 0.05
(x1, x2)       0.20       0.26         0.22       0.26
(x1, x3)       0.11       0.37         0.13       0.39
(x1, x4)       0.17       0.39         0.22       0.40

For equal priors, it appears that the (x1, x3) variable pair is the best classifier, as it has the lowest APER. For unequal priors, p1 = 0.05 and p2 = 0.95, the variable pair (x1, x2) has the lowest APER.

(h) When using all four variables (x1, x2, x3, x4),

$$\bar{x}_1 = \begin{bmatrix} -0.0688 \\ -0.0819 \\ 1.3675 \\ 0.4368 \end{bmatrix}, \quad S_1 = \begin{bmatrix} 0.04424 & 0.02847 & 0.03428 & 0.00431 \\ 0.02847 & 0.02092 & 0.02580 & 0.00362 \\ 0.03428 & 0.02580 & 0.16455 & 0.03300 \\ 0.00431 & 0.00362 & 0.03300 & 0.04441 \end{bmatrix}$$

$$\bar{x}_2 = \begin{bmatrix} 0.2354 \\ 0.0551 \\ 2.5939 \\ 0.4264 \end{bmatrix}, \quad S_2 = \begin{bmatrix} 0.04735 & 0.00837 & 0.07543 & -0.00662 \\ 0.00837 & 0.00231 & 0.00873 & 0.00031 \\ 0.07543 & 0.00873 & 1.04596 & 0.03177 \\ -0.00662 & 0.00031 & 0.03177 & 0.02618 \end{bmatrix}$$

Assign a new observation x0 to π1 if its quadratic function given below is less than 0:

$$x_0'\begin{bmatrix} -49.232 & -20.657 & -2.623 & 14.050 \\ -20.657 & 526.336 & 11.412 & -52.493 \\ -2.623 & 11.412 & -3.748 & 1.434 \\ 14.050 & -52.493 & 1.434 & 11.974 \end{bmatrix}x_0 + \begin{bmatrix} 4.91 & -28.42 & 8.65 & -11.80 \end{bmatrix}x_0 + c$$

with c = −2.69 for p1 = 0.5 and c = −5.64 for p1 = 0.05.

For p1 = 0.5:  APER = 3/46 = .07, Ê(AER) = 5/46 = .11
For p1 = 0.05: APER = 9/46 = .20, Ê(AER) = 11/46 = .24

11.25 (a) Fisher's linear discriminant function is

ŷ0 = â'x0 − m̂ = −4.80x1 − 1.48x3 + 3.33

Classify x0 to π1 (bankrupt firms) if

â'x0 − m̂ ≥ 0

Otherwise classify x0 to π2 (nonbankrupt firms).

The APER is (2 + 4)/46 = .13.

This is the scatterplot of the data in the (x1, x3) coordinate system, along with the discriminant line.

[Scatterplot of x3 against x1 with the discriminant line]

(b) With data point 16 for the bankrupt firms deleted, Fisher's linear discriminant function is given by

ŷ0 = â'x0 − m̂ = −5.93x1 − 1.46x3 + 3.31

Classify x0 to π1 (bankrupt firms) if

â'x0 − m̂ ≥ 0

Otherwise classify x0 to π2 (nonbankrupt firms).

The APER is (1 + 4)/45 = .11.

With data point 13 for the nonbankrupt firms deleted, Fisher's linear discriminant function is given by

ŷ0 = â'x0 − m̂ = −4.35x1 − 1.97x3 + 4.36

Classify x0 to π1 (bankrupt firms) if

â'x0 − m̂ ≥ 0

Otherwise classify x0 to π2 (nonbankrupt firms).

The APER is (1 + 3)/45 = .089.

This is the scatterplot of the observations in the (x1, x3) coordinate system with the discriminant lines for the three linear discriminant functions given above. Also labelled are observation 16 for bankrupt firms and observation 13 for nonbankrupt firms.

It appears that deleting these observations has changed the line significantly.
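One way to see how much the line moves is to rewrite each rule's boundary, coef1·x1 + coef3·x3 + const = 0, as x3 = intercept + slope·x1. A Python sketch using the three fitted rules above: the intercepts stay near 2.2 to 2.3 while the slopes range from about −2.2 to −4.1.

```python
# Each rule reads  coef1*x1 + coef3*x3 + const >= 0  ->  bankrupt (pi_1).
RULES = {
    "all observations": (-4.80, -1.48, 3.33),
    "bankrupt obs. 16 deleted": (-5.93, -1.46, 3.31),
    "nonbankrupt obs. 13 deleted": (-4.35, -1.97, 4.36),
}


def boundary_line(coef1, coef3, const):
    """Solve coef1*x1 + coef3*x3 + const = 0 for x3: return (intercept, slope)."""
    return -const / coef3, -coef1 / coef3


for name, coefs in RULES.items():
    intercept, slope = boundary_line(*coefs)
    print(f"{name:28s} x3 = {intercept:.2f} + ({slope:.2f}) x1")
```

Because every coef3 is negative, points below a line (smaller x3) give larger scores and are allocated to the bankrupt group.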

11.26 (a) The least squares regression results for the X, Z data are:

Parameter Estimates

                 Parameter      Standard     T for H0:
Variable   DF    Estimate       Error        Parameter=0   Prob > |T|
INTERCEP    1    -0.081412      0.13488497      -0.604        0.5492
X3          1     0.307221      0.05956685       5.158        0.0001

Here are the dot diagrams of the fitted values for the bankrupt firms and for the nonbankrupt firms:

[Dot diagrams of the fitted values for the bankrupt and nonbankrupt firms, on an axis from 0.00 to 1.50]

This table summarizes the classification results using the fitted values:

OBS   GROUP         FITTED    CLASSIFICATION
13    bankrupt      0.57896   misclassify
16    bankrupt      0.53122   misclassify
31    nonbankrupt   0.47076   misclassify
34    nonbankrupt   0.06025   misclassify
38    nonbankrupt   0.48329   misclassify
41    nonbankrupt   0.30089   misclassify

The confusion matrix is:

                        Predicted membership
                          π1      π2     Total
Actual membership  π1     19       2       21
                   π2      4      21       25

Thus, the APER is (2 + 4)/46 = .13.
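A Python sketch of classification by fitted value. The coding is an assumption not stated in the source: Z = 0 for bankrupt and Z = 1 for nonbankrupt firms, with a cutoff of 0.5. This coding reproduces exactly the six misclassifications listed above. The T statistic is recomputed as estimate divided by standard error.

```python
# Assumption (not stated in the source): Z = 0 for bankrupt, Z = 1 for nonbankrupt,
# and a firm is classified by comparing its fitted value with 0.5.
def classify_fitted(z_hat, cutoff=0.5):
    return "nonbankrupt" if z_hat >= cutoff else "bankrupt"


LISTED = [(13, "bankrupt", 0.57896), (16, "bankrupt", 0.53122),
          (31, "nonbankrupt", 0.47076), (34, "nonbankrupt", 0.06025),
          (38, "nonbankrupt", 0.48329), (41, "nonbankrupt", 0.30089)]

misclassified = [obs for obs, group, z in LISTED if classify_fitted(z) != group]
t_x3 = 0.307221 / 0.05956685      # T statistic = estimate / standard error
print(misclassified, round(t_x3, 3), f"APER = {len(misclassified)}/46")
```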
(b) The least squares regression results using all four variables x1, x2, x3, x4 are:

Parameter Estimates

                 Parameter      Standard     T for H0:
Variable   DF    Estimate       Error        Parameter=0   Prob > |T|
INTERCEP    1     0.208915      0.18615284       1.122        0.2683
X1          1     0.156317      0.46653100       0.335        0.7393
X2          1     1.149093      0.90606395       1.268        0.2119
X3          1     0.225972      0.07030479       3.214        0.0026
X4          1    -0.305175      0.32336357      -0.944        0.3508

Here are the dot diagrams of the fitted values for the bankrupt firms and for the nonbankrupt firms:

[Dot diagrams of the fitted values for the bankrupt and nonbankrupt firms, on an axis from −0.35 to 1.40]

This table summarizes the classification results using the fitted values:

OBS   GROUP         FITTED    CLASSIFICATION
15    bankrupt      0.62997   misclassify
16    bankrupt      0.72676   misclassify
20    bankrupt      0.55719   misclassify
34    nonbankrupt   0.21845   misclassify

The confusion matrix is:

                        Predicted membership
                          π1      π2     Total
Actual membership  π1     18       3       21
                   π2      1      24       25

Thus, the APER is (3 + 1)/46 = .087. Here is a scatterplot of the residuals against the fitted values, with points 16 of the bankrupt firms and 13 of the nonbankrupt firms labelled. It appears that point 16 of the bankrupt firms is an outlier.

[Scatterplot of residuals against fitted values for bankrupt and nonbankrupt firms, with points 16 and 13 labelled]

11.27 (a) Plot of the data in the (x2, x4) variable space:

[Scatterplot of x4 against x2 (sepal width) for Setosa, Versicolor, and Virginica]

The points from all three groups appear to form an elliptical shape. However, it appears that the points of π1 (Iris setosa) form an ellipse with a different orientation than those of π2 (Iris versicolor) and π3 (Iris virginica). This indicates that the observations from π1 may have a different covariance matrix from the observations from π2 and π3.

(b) Here are the results of a test of the null hypothesis H0: μ1 = μ2 = μ3 versus H1: at least one of the μi's is different from the others at the α = 0.05 level of significance:

Statistic        Value        F         Num DF   Den DF   Pr > F
Wilks' Lambda    0.02343863   199.145      8       288     0.0001

Thus, the null hypothesis H0: μ1 = μ2 = μ3 is rejected at the α = 0.05 level of significance. As discussed earlier, the plots give us reason to doubt the assumption of equal covariance matrices for the three groups.

(c) π1 = Iris setosa; π2 = Iris versicolor; π3 = Iris virginica.

The quadratic discriminant scores d̂i^Q(x) given by (11-47) with p1 = p2 = p3 = 1/3 are:

$$\hat{d}_i^Q(x) = -\tfrac{1}{2}\ln|S_i| - \tfrac{1}{2}(x - \bar{x}_i)'S_i^{-1}(x - \bar{x}_i)$$

population   d̂i^Q(x)
π1           −3.68x2² + 6.16x2x4 − 47.60x4² + 23.71x2 + 2.30x4 − 37.67
π2           −9.09x2² + 19.57x2x4 − 22.87x4² + 24.94x2 + 6.18x4 − 36.53
π3           −6.76x2² + 8.54x2x4 − 9.32x4² + 22.92x2 + 12.38x4 − 44.04

To classify the observation x0' = (3.5  1.75), compute d̂i^Q(x0) for i = 1, 2, 3, and classify x0 to the population for which d̂i^Q(x0) is the largest.

d̂1^Q(x0) = −103.77
d̂2^Q(x0) = 0.043
d̂3^Q(x0) = −1.23

So classify x0 to π2 (Iris versicolor).

(d) The linear discriminant scores d̂i(x) are:

population   d̂i(x) = x̄i'S_pooled^{-1}x − ½x̄i'S_pooled^{-1}x̄i     d̂i(x0)
π1           36.02x2 − 22.26x4 − 59.00                             28.12
π2           19.31x2 + 16.58x4 − 37.73                             58.86
π3           15.49x2 + 36.28x4 − 59.78                             57.92

Since d̂i(x0) is the largest for i = 2, we classify the new observation x0' = (3.5  1.75) to π2 according to (11-52). The results are the same for (c) and (d).
(e) To use rule (11-56), construct d̂ki(x) = d̂k(x) − d̂i(x) for all i ≠ k. Then classify x to πk if d̂ki(x) ≥ 0 for all i = 1, 2, 3. Here is a table of d̂ki(x0) for i, k = 1, 2, 3:

            i = 1     i = 2     i = 3
k = 1         0      −30.74    −29.80
k = 2       30.74       0        0.94
k = 3       29.80     −0.94       0

Since d̂2i(x0) ≥ 0 for all i ≠ 2, we allocate x0 to π2, using (11-52).

Here is the scatterplot of the data in the (x2, x4) variable space, with the classification regions R1, R2, and R3 delineated.

[Scatterplot of x4 against x2 (sepal width) with classification regions R1, R2, R3]

CHAPTER 11. DISCRIMINATION AND CLASSIFICATION

(f) The APER = 5/150 = .033, Ê(AER) = 6/150 = .04.
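A Python sketch of parts (d) and (e) for x0' = (3.5, 1.75): evaluate the three linear scores, form all differences d̂ki(x0), and apply rule (11-56). It reads the π2 coefficient on x4 as 16.58 (garbled in the scanned source), which reproduces d̂2(x0) = 58.86 up to rounding.

```python
# Linear discriminant scores from (d): d_i(x) = b2*x2 + b4*x4 + const.
# The pi_2 coefficient on x4 is read as 16.58 (assumption: garbled in the scan).
COEF = {1: (36.02, -22.26, -59.00), 2: (19.31, 16.58, -37.73), 3: (15.49, 36.28, -59.78)}
X0 = (3.5, 1.75)   # (x2, x4)


def score(c, x):
    return c[0] * x[0] + c[1] * x[1] + c[2]


d = {k: score(c, X0) for k, c in COEF.items()}
d_ki = {(k, i): d[k] - d[i] for k in d for i in d if k != i}
# Rule (11-56): allocate to the k whose d_ki(x0) >= 0 for every i != k.
winner = [k for k in d if all(d_ki[(k, i)] >= 0 for i in d if i != k)]
print({k: round(v, 2) for k, v in d.items()}, "-> pi", winner)
```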
11.28 (a) This is the plot of the data in the (log Y1, log Y2) variable space:

[Scatterplot of log(Y2) against log(Y1) for Setosa, Versicolor, and Virginica; one Setosa point is marked "*"]

The points of all three groups appear to follow roughly an ellipse-like pattern. However, the orientation of the ellipse appears to be different for the observations from π1 (Iris setosa) than for the observations from π2 and π3. In π1, there also appears to be an outlier, labelled with a "*".

(b), (c) Assuming equal covariance matrices and bivariate normal populations, these are the linear discriminant scores d̂i(x) for i = 1, 2, 3.

For both variables log Y1 and log Y2:

population   d̂i(x) = x̄i'S_pooled^{-1}x − ½x̄i'S_pooled^{-1}x̄i
π1           26.81 log Y1 + 28.90 log Y2 − 31.97
π2           75.10 log Y1 + 13.82 log Y2 − 36.83
π3           79.94 log Y1 + 10.80 log Y2 − 37.30

For variable log Y1 only:

population   d̂i(x) = x̄i'S_pooled^{-1}x − ½x̄i'S_pooled^{-1}x̄i
π1           40.90 log Y1 − 7.82
π2           81.84 log Y1 − 31.30
π3           85.20 log Y1 − 33.93

For variable log Y2 only:

population   d̂i(x) = x̄i'S_pooled^{-1}x − ½x̄i'S_pooled^{-1}x̄i
π1           30.93 log Y2 − 28.73
π2           19.52 log Y2 − 11.44
π3           16.87 log Y2 − 8.54

Variables            APER             Ê(AER)
log Y1, log Y2    26/150 = .17     27/150 = .18
log Y1            49/150 = .33     49/150 = .33
log Y2            34/150 = .23     34/150 = .23

The preceding misclassification rates are not nearly as good as those in Example 11.12. Using "shape" is effective in discriminating π1 (Iris setosa) from π2 and π3. It is not as good at discriminating π2 from π3, because of the overlap of π2 and π3 in both shape variables. Therefore, shape is not an effective discriminator of all three species of iris.

(d) Given the bivariate normal-like scatter and the relatively large samples, we do not expect the error rates in parts (b) and (c) to differ much.

11.29 (a) The calculated values of x̄1, x̄2, x̄3, x̄, and S_pooled agree with the results for these quantities given in Example 11.11.

(b)

$$B = \begin{bmatrix} 12.50 & 1518.74 \\ 1518.74 & 258471.12 \end{bmatrix}, \qquad W^{-1} = \begin{bmatrix} 0.348899 & 0.000193 \\ 0.000193 & 0.000003 \end{bmatrix}$$

The eigenvalues and scaled eigenvectors of W^{-1}B are

$$\hat{\lambda}_1 = 5.646, \quad \hat{a}_1 = \begin{bmatrix} 5.009 \\ 0.009 \end{bmatrix}; \qquad \hat{\lambda}_2 = 0.191, \quad \hat{a}_2 = \begin{bmatrix} 0.207 \\ -0.014 \end{bmatrix}$$

To classify x0' = (3.21  497), use (11-67) and compute

$$\sum_{j=1}^{2}\left[\hat{a}_j'(x_0 - \bar{x}_i)\right]^2, \qquad i = 1, 2, 3$$

Allocate x0 to πk if

$$\sum_{j=1}^{2}\left[\hat{a}_j'(x_0 - \bar{x}_k)\right]^2 \le \sum_{j=1}^{2}\left[\hat{a}_j'(x_0 - \bar{x}_i)\right]^2 \quad \text{for all } i \ne k$$

For x0,

k    Σj [âj'(x0 − x̄k)]²
1          2.63
2         16.99
3          2.43

Thus, classify x0 to π3. This result agrees with the classification given in Example 11.11. Any time there are three populations with only two discriminants, classification results using Fisher's discriminants will be identical to those using the sample distance method of Example 11.11.
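A minimal Python sketch of rule (11-67): for each group, sum the squared projections of x − x̄i onto the discriminant vectors, then pick the smallest. The Example 11.11 sample means are not reproduced in this section, so the demonstration uses hypothetical toy means and discriminants; with the identity discriminants it reduces to ordinary squared Euclidean distance.

```python
def fisher_sq_distance(x, mean, discriminants):
    """sum_j [a_j'(x - mean)]^2, the quantity compared across groups in (11-67)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(sum(aj * dj for aj, dj in zip(a, diff)) ** 2 for a in discriminants)


def allocate(x, means, discriminants):
    """Return (1-based group index with the smallest distance, all distances)."""
    dists = [fisher_sq_distance(x, m, discriminants) for m in means]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best + 1, dists


# Toy, hypothetical inputs (the Example 11.11 means are not reproduced here).
toy_a = [(1.0, 0.0), (0.0, 1.0)]
toy_means = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
group, dists = allocate((2.5, 0.4), toy_means, toy_a)
print(group, [round(v, 2) for v in dists])
```

Applied to the tabled values 2.63, 16.99 and 2.43, the same "smallest wins" step selects k = 3.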

11.30 (a) Assuming normality and equal covariance matrices for the three populations π1, π2, and π3, the minimum TPM rule is given by:

Allocate x to πk if the linear discriminant score d̂k(x) is the largest of d̂1(x), d̂2(x), d̂3(x),

where d̂i(x) is given in the following table for i = 1, 2, 3.

population   d̂i(x) = x̄i'S_pooled^{-1}x − ½x̄i'S_pooled^{-1}x̄i
π1           0.70x1 + 0.58x2 − 13.52x3 + 6.93x4 + 1.44x5 − 44.78
π2           1.85x1 + 0.32x2 − 12.78x3 + 8.33x4 − 0.14x5 − 35.20
π3           2.64x1 + 0.20x2 − 2.16x3 + 5.39x4 − 0.08x5 − 23.61

(b) The confusion matrix is:

                        Predicted membership
                          π1     π2     π3    Total
Actual membership  π1      7      0      0       7
                   π2      1     10      0      11
                   π3      0      3     35      38

And the APER = (0 + 1 + 3)/56 = .071.

The holdout confusion matrix is:

[Holdout confusion matrix (actual membership π1, π2, π3 by predicted membership); the entries are illegible in the source.]

Ê(AER) = 7/56 = .125

(c) One choice of transformations, x1, log x2, √x3, log x4, ..., appears to improve the normality of the data, but the classification rule from these data has slightly higher error rates than the rule derived from the original data. The error rates (APER, Ê(AER)) for the linear discriminants in Example 11.14 are also slightly higher than those for the original data.

11.31 (a) The data look fairly normal.

[Scatterplot of x2 (marine) against x1 (freshwater) growth-ring diameters for Alaskan and Canadian salmon]

Although the covariances have different signs for the two groups, the correlations are small. Thus the assumption of bivariate normal distributions with equal covariance matrices does not seem unreasonable.

(b) The linear discriminant function is

â'x − m̂ = −0.13x1 + 0.052x2 − 5.54

Classify an observation x0 to π1 (Alaskan salmon) if â'x0 − m̂ ≥ 0 and classify x0 to π2 (Canadian salmon) otherwise.

Dot diagrams of the discriminant scores:

[Dot diagrams of the discriminant scores for Alaskan and Canadian salmon, on an axis from −8.0 to 12.0]

It does appear that growth ring diameters separate the two groups reasonably well, as APER = 7/100 = .07 and Ê(AER) = 7/100 = .07.

(c) Here are the bivariate plots of the data for male and female salmon separately.

[Scatterplots of x2 against x1 (freshwater) for male and for female salmon]

For the male salmon, these are some summary statistics:

$$\bar{x}_1 = \begin{bmatrix} 100.3333 \\ 436.1667 \end{bmatrix}, \quad S_1 = \begin{bmatrix} 181.97101 & -197.71015 \\ -197.71015 & 1702.31884 \end{bmatrix}$$

$$\bar{x}_2 = [\text{illegible}], \quad S_2 = \begin{bmatrix} 370.17210 & 141.64312 \\ 141.64312 & 760.65036 \end{bmatrix}$$

The linear discriminant function for the male salmon only is

â'x − m̂ = −0.12x1 + 0.056x2 − 8.12

Classify an observation x0 to π1 (Alaskan salmon) if â'x0 − m̂ ≥ 0 and classify x0 to π2 (Canadian salmon) otherwise.

Using this classification rule, APER = .08 and Ê(AER) = [illegible].

For the female salmon, these are some summary statistics:

$$\bar{x}_1 = [\text{illegible}], \quad S_1 = \begin{bmatrix} 336.33385 & -210.23231 \\ -210.23231 & 1097.91539 \end{bmatrix}$$

$$\bar{x}_2 = [\text{illegible}], \quad S_2 = \begin{bmatrix} 289.21846 & 120.64000 \\ 120.64000 & 1038.72000 \end{bmatrix}$$

The linear discriminant function for the female salmon only is

â'x − m̂ = −0.13x1 + 0.05x2 − 2.66

Classify an observation x0 to π1 (Alaskan salmon) if â'x0 − m̂ ≥ 0 and classify x0 to π2 (Canadian salmon) otherwise.

Using this classification rule, APER = .06 and Ê(AER) = .06.

It is unlikely that gender is a useful discriminatory variable, as splitting the data into female and male salmon did not improve the classification results greatly.

11.32 (a) Here is the bivariate plot of the data for the two groups:

[Scatterplot of x2 against x1 for noncarriers and obligatory carriers]

Because the points for both groups form fairly elliptical shapes, the bivariate normal assumption appears to be a reasonable one. Normal-score plots for each group confirm this.

(b) Assuming equal prior probabilities, the sample linear discriminant function is

â'x − m̂ = 19.32x1 − 17.12x2 + 3.56

Classify an observation x0 to π1 (Noncarriers) if â'x0 − m̂ ≥ 0 and classify x0 to π2 (Obligatory carriers) otherwise.

The holdout confusion matrix is

                        Predicted membership
                          π1      π2     Total
Actual membership  π1     26       4       30
                   π2      8      37       45

Ê(AER) = (4 + 8)/75 = .16
(c) The classification results for the 10 new cases using the discriminant function in part (b):

Case     x1       x2      â'x − m̂   Classification
 1     -0.112   -0.279     6.17          π1
 2     -0.059   -0.068     3.58          π1
 3      0.064    0.012     4.59          π1
 4     -0.043   -0.052     3.62          π1
 5     -0.050   -0.098     4.27          π1
 6     -0.094   -0.113     3.68          π1
 7     -0.123   -0.143     3.63          π1
 8     -0.011   -0.037     3.98          π1
 9     -0.210   -0.090     1.04          π1
10     -0.126   -0.019     1.45          π1

(d) Assuming that the prior probability of obligatory carriers is 1/4 and that of noncarriers is 3/4, the sample linear discriminant function is

â'x − m̂ = 19.32x1 − 17.12x2 + 4.66

Classify an observation x0 to π1 (Noncarriers) if â'x0 − m̂ ≥ 0 and classify x0 to π2 (Obligatory carriers) otherwise.

The holdout confusion matrix is

[Holdout confusion matrix; the entries are illegible in the source.]

Ê(AER) = 18/75 = 0.24

The classification results for the 10 new cases using the discriminant function above:

Case     x1       x2      â'x − m̂   Classification
 1     -0.112   -0.279     7.27          π1
 2     -0.059   -0.068     4.68          π1
 3      0.064    0.012     5.69          π1
 4     -0.043   -0.052     4.72          π1
 5     -0.050   -0.098     5.37          π1
 6     -0.094   -0.113     4.78          π1
 7     -0.123   -0.143     4.73          π1
 8     -0.011   -0.037     5.08          π1
 9     -0.210   -0.090     2.14          π1
10     -0.126   -0.019     2.55          π1

11.33 Let x3 = YrHgt, x4 = FtFrBody, x6 = Frame, x7 = BkFat, x8 = SaleHt, and x9 = SaleWt.

(a) For π1 = Angus, π2 = Hereford, and π3 = Simental, here are Fisher's linear discriminants:

d̂1 = −3737 + 126.88x3 − 0.48x4 + 19.08x5 − 205.22x6 + 275.84x7 + 28.15x8 − 0.03x9
d̂2 = −3686 + 127.70x3 − 0.47x4 + 18.65x5 − 206.18x6 + 265.33x7 + 26.80x8 − 0.03x9
d̂3 = −3881 + 128.08x3 − 0.48x4 + 19.59x5 − 206.36x6 + 245.50x7 + 29.47x8 − 0.03x9

When x0' = (50, 1000, 73, 7, .17, 54, 1525), we obtain d̂1 = 3596.31, d̂2 = 3593.32, and d̂3 = 3594.13, so assign the new observation to π2, Hereford.

This is the plot of the discriminant scores in the two-dimensional discriminant space:

[Plot of the discriminant scores (ŷ1, ŷ2) for Angus, Hereford, and Simental]

(b) Here are the APER and Ê(AER) for different subsets of the variables:

Subset                          APER   Ê(AER)
x3, x4, x5, x6, x7, x8, x9      .13     .25
x4, x5, x7, x8                  .14     .20
x5, x7, x8                      .21     .24
x4, x5                          .43     .46
x4, x7                          .36     .39
x4, x8                          .32     .36
x7, x8                          .22     .22
x5, x7                          .25     .29
x5, x8                          .28     .32

11.34 For π1 = General Mills, π2 = Kellogg, and π3 = Quaker, and assuming multivariate normal data with a common covariance matrix, equal costs, and equal priors, these

are Fisher's linear discriminant functions:

d̂1 = .23x3 + 3.79x4 − 1.69x5 − .01x6 + 5.53x7 + 1.90x8 + 1.36x9 − 0.12x10 − 33.14
d̂2 = .32x3 + 4.15x4 − 3.62x5 − .02x6 + 9.20x7 + 2.07x8 + 1.50x9 − 0.20x10 − 43.07
d̂3 = .29x3 + 2.64x4 − 1.20x5 − .02x6 + 5.43x7 + 1.22x8 + .65x9 − 0.13x10 [constant illegible]

The Kellogg cereals appear to have high protein, fiber, and carbohydrates, and low fat. However, they also have high sugar. The Quaker cereals appear to have low sugar, but also have low protein and carbohydrates.
Here is a plot of the cereal data in two-dimensional discriminant space:

[Plot of the discriminant scores (ŷ1, ŷ2) for General Mills, Kellogg, and Quaker cereals]
11.35 (a) A scatter plot of tail length and snout to vent length follows. It appears as if these variables will effectively discriminate gender but will be less successful in discriminating the age of the snakes.

[Scatterplot of SntoVnLength vs TailLength]

(b) Linear Discriminant Function for Groups

                 Female      Male
Constant        -36.429   -41.501
SntoVnLength      0.039     0.163
TailLength        0.310    -0.046

Summary of Classification with Cross-validation

                    True Group
Put into Group    Female    Male
Female               34        2
Male                  3       27
Total N              37       29
N correct            34       27
Proportion        0.919    0.931

N = 66    N Correct = 61    Proportion Correct = 0.924

Ê(AER) = 1 − .924 = .076 ≈ 7.6%
(c) Linear Discriminant Function for Groups

                    2         3         4
Constant       -112.44   -145.76   -193.14
SntoVnLength      0.33      0.38      0.45
TailLength        0.53      0.60      0.65

Summary of Classification with Cross-validation

                   True Group
Put into Group     2      3      4
2                 13      2      0
3                  4     21      2
4                  0      3     21
Total N           17     26     23
N correct         13     21     21
Proportion     0.765  0.808  0.913

N = 66    N Correct = 55    Proportion Correct = 0.833

Ê(AER) = 1 − .833 = .167 ≈ 16.7%
(d) Linear Discriminant Function for Groups

                    2         3         4
Constant        -79.11   -102.76   -141.94
SntoVnLength      0.36      0.41      0.48

Summary of Classification with Cross-validation

                   True Group
Put into Group     2      3      4
2                 14      1      0
3                  3     21      4
4                  0      4     19
Total N           17     26     23
N correct         14     21     19
Proportion     0.824  0.808  0.826

N = 66    N Correct = 54    Proportion Correct = 0.818

Ê(AER) = 1 − .818 = .182 ≈ 18.2%

Using only snout to vent length to discriminate the ages of the snakes is about as effective as using both tail length and snout to vent length, although in both cases there is a reasonably high proportion of misclassifications.
11.36 Logistic Regression Table

                                                           Odds        95% CI
Predictor        Coef        SE Coef       Z       P      Ratio    Lower   Upper
Constant        3.92484      6.31500      0.62   0.534
Freshwater      0.126051     0.0358536    3.52   0.000     1.13     1.06    1.22
Marine         -0.0485441    0.0145240   -3.34   0.001     0.95     0.93    0.98

Log-Likelihood = -19.394
Test that all slopes are zero: G = 99.841, DF = 2, P-Value = 0.000

The regression is significant (p-value = 0.000) and, retaining the constant term, the fitted function is

$$\ln\left(\frac{\hat{p}(z)}{1 - \hat{p}(z)}\right) = 3.925 + .126(\text{freshwater growth}) - .049(\text{marine growth})$$

Consequently: assign z to population 2 (Canadian) if ln[p̂(z)/(1 − p̂(z))] ≥ 0; otherwise assign z to population 1 (Alaskan).

The confusion matrix follows.

                 Predicted
                1      2     Total
Actual   1     46      4      50
         2      3     47      50

APER = 7/100 = .07 ≈ 7%. This is the same APER produced by the linear classification function in Example 11.8.
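A Python sketch of the fitted logistic rule using the coefficients from the table: compute the log-odds, assign to population 2 (Canadian) when it is nonnegative, and recover the odds ratios as exp(coefficient). The growth-ring inputs in the last line are hypothetical values for illustration only.

```python
import math

# Coefficients from the logistic regression table in 11.36.
B0, B_FRESH, B_MARINE = 3.92484, 0.126051, -0.0485441


def log_odds(freshwater, marine):
    """Fitted ln[p(z) / (1 - p(z))]."""
    return B0 + B_FRESH * freshwater + B_MARINE * marine


def assign(freshwater, marine):
    """Population 2 (Canadian) if the fitted log-odds is >= 0, else population 1 (Alaskan)."""
    return 2 if log_odds(freshwater, marine) >= 0 else 1


odds_ratios = (round(math.exp(B_FRESH), 2), round(math.exp(B_MARINE), 2))
aper = (4 + 3) / 100   # off-diagonal counts of the confusion matrix over n = 100
# Hypothetical growth-ring measurements, for illustration only.
print(odds_ratios, aper, assign(100, 400), assign(150, 350))
```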

Chapter 12

12.1 a) Codes: 1 → South, Yes, Democrat; 0 → non-South, No, Republican.

e.g. Reagan − Carter:

                Carter
                1     0
Reagan    1     1     0
          0     2     2

(a + d)/p = 3/5 = .60

Pair   Coefficient (a + d)/p
R-C          .6
R-F          .4
R-N          .6
R-J          0
R-K          .6
C-F          0
C-N          .2
C-J          .4
C-K          .6
F-N          .8
F-J          .6
F-K          .4
N-J          .4
N-K          .6
J-K          .4

12.1 b)

                 Coefficient                 Rank order
Pair      1       2       3            1       2       3
R-C      .6      .75     .429         4.5     4.5     4.5
R-F      .4      .571    .25         10      10      10
R-N      .6      .75     .429         4.5     4.5     4.5
R-J      0       0       0           14.5    14.5    14.5
R-K      .6      .75     .429         4.5     4.5     4.5
C-F      0       0       0           14.5    14.5    14.5
C-N      .2      .333    .111        13      13      13
C-J      .4      .571    .25         10      10      10
C-K      .6      .75     .429         4.5     4.5     4.5
F-N      .8      .889    .667         1       1       1
F-J      .6      .75     .429         4.5     4.5     4.5
F-K      .4      .571    .25         10      10      10
N-J      .4      .571    .25         10      10      10
N-K      .6      .75     .429         4.5     4.5     4.5
J-K      .4      .571    .25         10      10      10

12.2

                 Coefficient                 Rank order
Pair      5       6       7            5       6       7
R-C      .333    .5      .2            9       9       9
R-F      0       0       0            14      14      14
R-N      .333    .5      .2            9       9       9
R-J      0       0       0            14      14      14
R-K      .333    .5      .2            9       9       9
C-F      0       0       0            14      14      14
C-N      .2      .333    .111         12      12      12
C-J      .4      .571    .25           6       6       6
C-K      .5      .667    .333          3       3       3
F-N      .667    .8      .5            1       1       1
F-J      .5      .667    .333          3       3       3
F-K      .25     .4      .143         11      11      11
N-J      .4      .571    .25           6       6       6
N-K      .5      .667    .333          3       3       3
J-K      .4      .571    .25           6       6       6

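12.3 With x̄ = (a + b)/p and ȳ = (a + c)/p, where p = a + b + c + d,

$$\sum_{j=1}^{p}(x_j - \bar{x})^2 = (a+b)\left(1 - \frac{a+b}{p}\right)^2 + (c+d)\left(0 - \frac{a+b}{p}\right)^2 = \frac{(c+d)(a+b)}{p}$$

$$\sum_{j=1}^{p}(y_j - \bar{y})^2 = (a+c)\left(1 - \frac{a+c}{p}\right)^2 + (b+d)\left(0 - \frac{a+c}{p}\right)^2 = \frac{(a+c)(b+d)}{p}$$

$$\begin{aligned}
\sum_{j=1}^{p}(x_j - \bar{x})(y_j - \bar{y}) &= \sum_{j=1}^{p}\left(x_j y_j - x_j\bar{y} - \bar{x}y_j + \bar{x}\bar{y}\right)\\
&= a - \frac{(a+c)(a+b)}{p} - \frac{(a+b)(a+c)}{p} + p\,\frac{(a+b)(a+c)}{p^2}\\
&= \frac{a(a+b+c+d) - (a+c)(a+b)}{p} = \frac{ad - bc}{p}
\end{aligned}$$

Therefore

$$r = \frac{(ad - bc)/p}{\left((c+d)(a+b)(a+c)(b+d)\right)^{1/2}/p} = \frac{ad - bc}{\left((a+b)(c+d)(a+c)(b+d)\right)^{1/2}}$$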
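The identity can be checked numerically: for two binary profiles, the ordinary Pearson correlation equals (ad − bc)/[(a + b)(c + d)(a + c)(b + d)]^{1/2}, with a, b, c, d counted as in the derivation above. The two profiles below are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def phi_from_table(x, y):
    """(ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)) with a=(1,1), b=(1,0), c=(0,1), d=(0,0)."""
    pairs = list(zip(x, y))
    a, b = pairs.count((1, 1)), pairs.count((1, 0))
    c, d = pairs.count((0, 1)), pairs.count((0, 0))
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical binary profiles on p = 8 variables.
x = [1, 1, 0, 0, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 1]
print(pearson(x, y), phi_from_table(x, y))
```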

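12.4 Let

$$c_1 = \frac{a+d}{p}, \qquad c_2 = \frac{2(a+d)}{2(a+d) + (b+c)}, \qquad c_3 = \frac{a+d}{(a+d) + 2(b+c)}$$

Since b + c = p − (a + d), we have (b + c)/(a + d) = c1^{-1} − 1. Then

$$c_3^{-1} = 1 + 2\left(c_1^{-1} - 1\right)$$

so c3 increases as c1 increases. Also

$$c_2^{-1} = \frac{c_1^{-1} + 1}{2}$$

so c2 increases as c1 increases. Finally,

$$c_2^{-1} = \frac{c_3^{-1} + 3}{4}$$

so c2 increases as c3 increases.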

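12.5 a) Single linkage

         1    2    3    4
    1    0
    2    1    0
    3   11    2    0
    4    5    3    4    0

Merge 1 and 2 at distance 1:

        (12)   3    4
  (12)    0
   3      2    0
   4      3    4    0

Merge (12) and 3 at distance 2:

        (123)   4
  (123)    0
   4       3    0

Finally, merge (123) and 4 at distance 3.

[Dendrogram: leaves in the order 1, 2, 3, 4]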

12.5 b) Complete linkage    c) Average linkage

[Dendrograms for complete linkage and average linkage]
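A naive Python sketch of agglomerative clustering that can be run on the 12.5 distance matrix. It recomputes cluster-to-cluster distances from the original pairwise distances (minimum for single, maximum for complete, mean over all pairs for average linkage). With these distances, single linkage merges at heights 1, 2, 3 and complete linkage at 1, 4, 11. Ties are broken by taking the first pair found, which matters for average linkage here, since two candidate merges tie at distance 4.

```python
from itertools import combinations

# Distances d_ik for items 1-4 as laid out in the 12.5 solution above.
D = {(1, 2): 1, (1, 3): 11, (2, 3): 2, (1, 4): 5, (2, 4): 3, (3, 4): 4}


def dist(i, j):
    return D[(min(i, j), max(i, j))]


def agglomerate(items, linkage="single"):
    """Naive agglomerative clustering; returns [(merged items, merge height), ...]."""
    clusters = [frozenset([i]) for i in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            pair = [dist(i, j) for i in c1 for j in c2]
            if linkage == "single":
                h = min(pair)
            elif linkage == "complete":
                h = max(pair)
            elif linkage == "average":
                h = sum(pair) / len(pair)
            else:
                raise ValueError(f"unknown linkage: {linkage}")
            if best is None or h < best[0]:   # ties keep the first pair found
                best = (h, c1, c2)
        h, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1 | c2), h))
    return merges


for method in ("single", "complete"):
    print(method, agglomerate([1, 2, 3, 4], method))
```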
12.6 Dendrograms

[Dendrograms for complete linkage, average linkage, and single linkage]

All three methods produce the same hierarchical arrangements. Item 3 is somewhat different from the other items.

12.7 Treating correlations as similarity coefficients, we have:

Single linkage
S45 = .68
S(45)1 = max(S41, S51) = .16
S(45)2 = .32, S(45)3 = .18, and so forth.

[Dendrogram]

Complete linkage
S45 = .68
S(45)1 = min(S41, S51) = .12
S(45)2 = .21, S(45)3 = .15, and so forth.

[Dendrogram]

Both methods arrive at nearly the same clustering.
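12.8
        1     2     3     4     5
  1     0
  2     9     0
  3     3     7     0
  4     6     5     9     0
  5    11    10     2     8     0

The smallest distance is d35 = 2, so items 3 and 5 are merged first.

        (35)    1     2     4
 (35)    0
  1      7      0
  2      8.5    9     0
  4      8.5    6     5     0

The smallest distance is now d24 = 5.

[Dendrogram: items ordered 1, 3, 5, 2, 4]

Average linkage produces results similar to single linkage.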
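The sketch below reproduces these merges with plain Python. `hierarchical` is a hypothetical helper written for illustration, not a library function. It defines the distance between two clusters through a `link` function applied to all item-to-item distances between them. Using the mean gives average linkage, and `min` or `max` give single or complete linkage.

```python
from itertools import combinations

def hierarchical(dist, n, link):
    """Agglomerative clustering of items 1..n.

    dist maps frozenset({i, j}) to the distance between items i and j.
    link turns the list of item-to-item distances between two clusters into
    one cluster distance: min (single), max (complete) or a mean (average).
    Returns the merges as (cluster_a, cluster_b, merge_height).
    """
    clusters = [frozenset([i]) for i in range(1, n + 1)]
    merges = []

    def cluster_dist(A, B):
        return link([dist[frozenset((i, j))] for i in A for j in B])

    while len(clusters) > 1:
        A, B = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((sorted(A), sorted(B), cluster_dist(A, B)))
        clusters = [C for C in clusters if C not in (A, B)] + [A | B]
    return merges

def mean(values):
    return sum(values) / len(values)

# Distance matrix of 12.8 (items 1-5).
D = {(1, 2): 9, (1, 3): 3, (2, 3): 7, (1, 4): 6, (2, 4): 5,
     (3, 4): 9, (1, 5): 11, (2, 5): 10, (3, 5): 2, (4, 5): 8}
dist = {frozenset(pair): d for pair, d in D.items()}

for A, B, h in hierarchical(dist, 5, mean):
    print(A, B, round(h, 3))
```

With average linkage, the merges occur at heights 2, 5, 7 and about 8.17. Passing `min` instead of `mean` gives single linkage on the same matrix, with merge heights 2, 3, 5 and 6.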

12.9 [Dendrograms: single linkage, complete linkage and average linkage]

Although the vertical scales are different, all three linkage methods produce the same groupings.

12.10 (a) $ESS_1 = (2-2)^2 = 0$, $ESS_2 = (1-1)^2 = 0$, $ESS_3 = (5-5)^2 = 0$, and $ESS_4 = (8-8)^2 = 0$.

(b) At step 2

Clusters          Increase in ESS
(12)(3)(4)         0.5
(13)(2)(4)         4.5
(14)(2)(3)        18.0
(1)(23)(4)         8.0
(1)(24)(3)        24.5
(1)(2)(34)         4.5

(c) At step 3

Clusters          ESS
(12)(34)           5.0
(123)(4)           8.7

Finally, all four together have
$ESS = (2-4)^2 + (1-4)^2 + (5-4)^2 + (8-4)^2 = 30$.
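The Ward's-method bookkeeping above can be reproduced directly. The sketch below computes each cluster's error sum of squares about its own mean and sums over the clusters of a partition. `ess` and `total_ess` are illustrative helper names. Step 3 also evaluates the remaining candidate (124)(3), which the table does not list.

```python
from itertools import combinations

x = {1: 2, 2: 1, 3: 5, 4: 8}   # the four observations in 12.10

def ess(items):
    """Error sum of squares of one cluster about its mean."""
    vals = [x[i] for i in items]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def total_ess(partition):
    return sum(ess(c) for c in partition)

# Step 2: merge one pair of items. The other items stay as singletons.
for i, j in combinations(x, 2):
    rest = [(k,) for k in x if k not in (i, j)]
    print((i, j), rest, total_ess([(i, j)] + rest))

# Step 3: candidate partitions starting from (12)(3)(4).
for part in ([(1, 2), (3, 4)], [(1, 2, 3), (4,)], [(1, 2, 4), (3,)]):
    print(part, round(total_ess(part), 1))

print(total_ess([(1, 2, 3, 4)]))  # all four together
```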

12.11 K = 2 initial clusters (AB) and (CD)

Cluster   x̄1   x̄2
(AB)      3     1
(CD)      1     1

Final clusters (AD) and (BC)

Cluster   x̄1   x̄2
(AD)      4     2.5
(BC)      0     -.5

Squared distance to group centroids

                 Item
Cluster    A       B       C       D
(AD)      3.25   29.25   27.25    3.25
(BC)     45.25    3.25    3.25   11.25
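The sketch below reproduces the final partition and the squared-distance table. It rests on an assumption: the exercise data are not reproduced here, so it uses the item coordinates A = (5, 4), B = (1, -2), C = (-1, 1), D = (3, 1). These agree with every centroid and squared distance reported above. The loop is a batch (Lloyd-style) variant of K-means, in which all items are reassigned before the centroids are recomputed. It is one possible variant and is not necessarily the exact procedure used in the text.

```python
# Assumed item coordinates (x1, x2), consistent with the centroids above.
items = {"A": (5, 4), "B": (1, -2), "C": (-1, 1), "D": (3, 1)}

def centroid(names):
    pts = [items[n] for n in names]
    return tuple(sum(p[k] for p in pts) / len(pts) for k in range(2))

def sqdist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kmeans(clusters):
    """Batch K-means: reassign every item to its nearest centroid, then
    recompute the centroids, until no item moves. (This sketch does not
    guard against a cluster becoming empty.)"""
    while True:
        cents = [centroid(c) for c in clusters]
        new = [[] for _ in clusters]
        for name, p in items.items():
            k = min(range(len(cents)), key=lambda j: sqdist(p, cents[j]))
            new[k].append(name)
        if new == clusters:
            return clusters, cents
        clusters = new

final, cents = kmeans([["A", "B"], ["C", "D"]])
print(final, cents)
for name, p in items.items():
    print(name, [sqdist(p, c) for c in cents])
```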

12.12 K = 2 initial clusters (AC) and (BD)

Cluster   x̄1   x̄2
(AC)      3     .5
(BD)     -2    -.5

Final clusters (A) and (BCD)

Cluster   x̄1   x̄2
(A)       5     3
(BCD)    -1    -1

Squared distance to group centroids

                Item
Cluster    A     B     C     D
(A)        0    40    41    89
(BCD)     52     4     5     5

As expected, this result is the same as the result in Example 12.11. A graph of the
items supports the (A) and (BCD) groupings.
12.13 K = 2 initial clusters (AB) and (CD)

Cluster   x̄1   x̄2
(AB)      2     2
(CD)     -1    -2

Final clusters (A) and (BCD)

Cluster   x̄1   x̄2
(A)       5     3
(BCD)    -1    -1

Squared distance to group centroids

                Item
Cluster    A     B     C     D
(A)        0    40    41    89
(BCD)     52     4     5     5

The final clusters (A) and (BCD) are the same as they are in Example 12.11. In
this case we start with the same initial groups and the first, and only, reassignment
is the same. It makes no difference if you start at the top or bottom of the list of
items.

12.14. (a) The Euclidean distances between pairs of cereal brands

     C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
C1 0.0
C2 116.0 0.0
C3 15.5 121.7 0.0
C4 6.4 117.9 10.0 0.0
C5 103.2 61.6 100.6 102.1 0.0
C6 72.8 44.1 78.4 74.4 54.3 0.0
C7 86.4 71.9 82.5 84.9 22.3 52.4 0.0
C8 15.3 121.5 1.4 10.1 100.6 78.3 82.4 0.0
C9 46.2 72.6 54.7 48.9 75.8 32.1 65.2 54.5 0.0
C10 54.9 123.0 68.9 59.5 134.7 87.8 122.5 68.8 65.7 0.0
C11 81.3 154.7 94.7 85.8 169.6 121.3 157.0 94.6 94.5 47.1 0.0
C12 42.3 114.2 31.3 38.5 81.1 75.3 60.2 31.0 59.8 92.9 121.9 0.0
C13 163.2 163.4 177.9 168.1 208.0 155.4 205.1 177.9 148.9 112.4 110.7 198.0
C14 46.7 90.8 60.4 51.5 103.8 55.4 92.9 60.3 28.5 44.3 67.5 75.9
C15 60.3 170.5 50.0 56.6 141.5 127.8 121.5 50.0 103.8 101.7 115.6 62.0
C16 46.9 90.8 60.5 51.6 103.8 55.5 92.9 60.3 28.5 44.3 67.6 75.8
C17 23.1 101.0 21.6 21.6 81.4 58.5 63.6 21.4 37.5 70.1 100.7 26.0
C18 265.7 221.1 280.0 270.6 278.9 233.9 283.3 280.0 235.6 227.7 218.? 294.5
C19 68.2 181.9 60.5 65.2 155.9 138.7 136.2 60.5 113.2 102.7 111.7 76.6
C20 116.6 71.0 113.2 115.3 19.7 69.9 32.1 113.1 89.3 150.5 183.5 90.6
C21 103.0 217.7 96.6 100.6 191.7 174.7 171.6 96.6 148.1 129.7 130.5 111.7
C22 98.6 160.1 112.6 103.4 181.3 130.5 170.2 112.6 106.9 54.1 22.5 139.2
C23 58.0 102.8 49.1 54.9 62.4 68.1 41.3 48.9 61.2 105.4 136.9 20.7
C24 68.1 181.8 60.4 65.2 155.8 138.7 136.1 60.4 113.1 102.7 111.6 76.5
C25 49.4 121.0 36.2 44.8 82.5 82.1 62.8 36.2 68.9 101.7 130.2 14.7
C26 182.8 290.3 186.0 183.8 285.6 250.4 267.2 185.9 220.2 173.8 145.7 210.7
C27 134.7 99.9 148.2 139.1 150.9 101.1 152.2 148.2 104.2 99.6 113.7 160.9
C28 16.1 128.3 14.2 14.2 111.1 85.7 92.3 13.7 59.2 63.5 86.3 39.4
C29 107.5 159.0 120.3 111.6 180.7 132.1 170.7 120.3 116.0 54.1 64.6 144.1
C30 33.5 120.1 21.2 29.2 90.7 78.8 71.2 21.0 61.7 83.1 113.7 17.2
C31 78.9 80.5 90.9 82.8 108.5 59.2 103.1 90.8 56.9 52.6 90.6 101.7
C32 32.1 122.6 43.5 36.0 120.8 83.1 105.0 43.3 51.3 50.9 60.0 65.9
C33 143.1 68.0 141.3 142.4 42.0 84.5 61.1 141.2 109.8 170.6 203.8 120.8
C34 173.0 157.7 187.8 177.9 207.5 155.6 206.8 187.8 151.8 127.0 123.8 205.9
C35 116.2 70.4 112.7 114.9 16.9 69.2 30.4 112.6 89.9 148.8 183.8 90.0
C36 114.1 230.0 111.1 112.9 210.2 186.9 190.8 111.1 158.8 129.8 122.7 131.2
C37 53.1 78.2 51.4 52.4 51.6 41.3 34.2 51.1 38.1 91.1 124.5 36.6
C38 54.2 100.4 45.8 51.0 61.8 63.5 43.5 45.8 59.0 99.2 133.6 25.8
C39 48.3 93.5 42.5 45.9 61.0 ?5.1 43.3 42.5 49.6 90.7 125.9 27.3
C40 40.6 140.9 51.6 44.3 139.8 100.7 123.8 51.4 70.3 44.1 46.2 79.4
C41 197.8 309.6 194.3 196.6 288.1 268.0 268.1 194.3 237.8 215.5 194.4 209.9
C42 191.1 301.3 190.3 190.8 286.6 260.4 267.3 190.2 229.3 200.8 174.? 209.7
C43 185.2 290.7 189.2 186.6 288.1 251.4 270.2 189.2 221.4 173.6 143.7 214.8

     C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24
C13 0.0
C14 127.4 0.0
C15 213.2 105.0 0.0
C16 127.4 1.0 105.0 0.0
C17 173.1 51.3 69.7 51.3 0.0
C18 134.4 220.7 321.2 220.8 270.1 0.0
C19 212.5 110.8 16.2 110.9 81.2 322.6 0.0
C20 223.2 117.3 151.2 117.3 94.3 288.6 166.1 0.0
C21 234.6 142.8 50.3 142.8 117.2 347.4 36.5 201.2 0.0
C22 91.5 79.1 135.2 79.2 116.8 204.1 131.1 195.9 148.8 0.0
C23 204.9 83.3 81.1 83.2 36.8 295.9 96.2 70.9 130.9 153.2 0.0
C24 212.5 110.7 16.0 110.8 81.1 322.6 1.4 166.0 36.5 131.1 96.1 0.0
C25 207.5 86.0 60.0 86.1 35.2 303.9 75.3 91.8 110.1 147.9 23.2 75.3
C26 233.8 200.3 159.3 200.3 204.2 342.0 143.8 297.3 121.0 152.7 231.2 143.8
C27 67.1 92.1 193.3 92.2 136.5 141.1 197.4 164.6 227.0 105.1 162.0 197.4
C28 174.0 59.3 46.7 59.3 30.1 278.3 55.0 123.1 89.7 104.7 58.5 54.9
C29 83.1 93.3 144.4 93.3 122.6 214.5 141.7 197.4 160.4 51.8 156.3 141.7
C30 191.2 73.8 53.3 73.8 24.6 293.2 66.8 102.5 102.5 130.6 34.3 66.8
C31 104.8 49.4 135.7 49.3 78.9 207.0 141.7 124.7 173.2 91.2 104.5 141.7
C32 150.5 37.5 75.3 37.5 47.4 248.1 78.9 132.4 108.8 79.4 80.7 78.7
C33 230.0 136.6 181.8 136.5 121.5 283.5 196.3 31.7 231.9 214.1 101.6 196.3
C34 30.1 132.2 226.4 132.3 180.7 107.3 226.8 221.3 250.8 107.0 210.8 226.8
C35 221.6 117.8 150.9 117.7 93.7 289.9 165.8 10.1 201.0 195.7 70.2 165.7
C36 226.8 148.7 71.8 148.7 131.9 341.0 56.0 221.0 28.8 139.2 151.3 56.0
C37 182.4 63.6 95.5 63.6 31.1 270.0 108.7 64.4 144.7 138.6 27.7 108.6
C38 198.4 80.8 81.3 80.9 34.1 292.4 95.7 74.1 131.3 148.9 17.1 95.7
C39 188.6 71.5 83.1 71.6 27.4 282.6 96.8 74.6 132.8 140.6 21.8 96.7
C40 146.6 52.5 71.8 52.6 62.1 252.4 70.9 152.7 96.8 66.6 96.6 70.8
C41 301.1 227.1 153.1 227.1 213.8 401.5 140.2 295.1 108.9 210.5 228.7 140.1
C42 277.2 214.8 154.9 214.9 209.3 375.5 140.8 294.9 112.9 188.1 229.2 140.7
C43 229.1 200.6 165.0 200.7 207.1 335.7 149.7 300.2 128.8 149.4 235.2 149.6

     C25 C26 C27 C28 C29 C30 C31 C32 C33 C34 C35 C36
C25 0.0
C26 213.9 0.0
C27 170.1 257.2 0.0
C28 46.5 175.0 148.2 0.0
C29 152.5 172.5 103.0 113.8 0.0
C30 20.8 200.3 158.2 30.2 132.8 0.0
C31 111.4 225.7 66.9 91.2 79.1 97.2 0.0
C32 75.0 170.7 126.2 36.4 101.6 62.2 81.5 0.0
C33 122.5 324.8 167.2 151.1 214.1 131.9 137.3 157.0 0.0
C34 215.5 253.2 58.3 184.8 107.8 201.1 112.6 158.5 225.1 0.0
C35 91.3 297.5 163.7 122.7 194.6 101.0 121.9 133.6 33.3 220.7 0.0
C36 131.0 93.2 227.1 102.7 152.9 120.7 178.1 114.7 250.8 244.4 220.8 0.0
C37 43.5 234.6 136.1 60.4 141.6 44.5 81.7 72.4 91.2 186.6 63.7 161.4
C38 24.7 230.4 156.4 57.3 148.9 30.7 97.7 81.1 103.2 205.3 72.0 150.5
C39 30.1 227.7 146.5 53.6 140.6 30.7 87.9 74.5 102.6 195.3 72.6 150.5
C40 86.9 150.1 132.6 41.9 88.9 71.1 88.4 24.1 177.4 158.4 153.0 98.1
C41 209.3 98.9 305.4 186.0 236.3 204.2 264.3 190.2 325.4 315.9 297.0 96.8
C42 210.6 71.2 286.8 180.8 216.6 203.0 251.2 179.4 324.1 292.0 296.8 ?4.0
C43 218.2 17.7 254.4 178.3 170.3 204.2 225.5 172.3 327.1 248.4 300.5 100.9

     C37 C38 C39 C40 C41 C42 C43
C37 0.0
C38 27.0 0.0
C39 20.2 10.1 0.0
C40 90.2 94.6 88.5 0.0
C41 241.1 232.1 233.1 177.4 0.0
C42 237.9 231.7 231.2 164.5 35.2 0.0
C43 237.2 233.9 230.8 151.2 108.2 78.7 0.0

(b) Complete linkage produces results similar to single linkage.

[Dendrograms: single linkage and complete linkage]

12.15. In the K-means method, we use the means of the clusters identified by average linkage as the initial cluster centers.

Final cluster centers for K = 4

Variable     1       2     3     4       5     6      7      8
Cluster 1   110.0   2.1   0.9   215.0   0.7   15.3    7.9    50.0
Cluster 2   114.4   3.1   1.7   171.1   2.8   15.0    6.6   123.9
Cluster 3    86.7   2.3   0.5    26.7   1.4   10.0    5.8    55.8
Cluster 4   112.5   3.2   0.8   225.0   5.8   12.5   10.8   245.0

Distances between centers (K = 4)

       1       2       3       4
1      0.0
2     86.1     0.0
3    190.0   162.2     0.0
4    195.4   132.7   275.4     0.0

Cluster membership

K-means, K = 2
Cluster 1: C1-C10, C12, C14-C17, C19-C21, C23-C26, C28, C30, C32, C33, C35-C43
Cluster 2: C11, C13, C18, C22, C27, C29, C31, C34

K-means, K = 3
Cluster 1: C1-C10, C12, C14-C17, C19, C20, C23-C25, C28, C30-C33, C35, C37-C40
Cluster 2: C21, C26, C36, C41, C42, C43
Cluster 3: C11, C13, C18, C22, C27, C29, C34

K-means, K = 4
Cluster 1: C1-C9, C12, C15, C17, C19, C20, C23-C25, C28, C30, C33, C35, C37-C39
Cluster 2: C10, C11, C14, C16, C22, C29, C31, C32, C40
Cluster 3: C21, C26, C36, C41, C42, C43
Cluster 4: C13, C18, C27, C34

Single linkage, 4 clusters
{C18}, {C26, C43}, {C41, C42}, and all remaining cereals

Complete linkage, 4 clusters
{C11, C13, C22, C27, C29, C34}, {C15, C19, C21, C24, C26, C36, C41, C42, C43}, {C18}, and all remaining cereals
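The "distances between centers" table is the Euclidean distance between each pair of final K = 4 centers. The sketch below recomputes it from the centers table with `math.dist`. The centers are printed to one decimal, so a recomputed distance can differ from the table in the last digit.

```python
import math
from itertools import combinations

# Final K = 4 cluster centers from the table above (8 variables each).
centers = {
    1: [110.0, 2.1, 0.9, 215.0, 0.7, 15.3, 7.9, 50.0],
    2: [114.4, 3.1, 1.7, 171.1, 2.8, 15.0, 6.6, 123.9],
    3: [86.7, 2.3, 0.5, 26.7, 1.4, 10.0, 5.8, 55.8],
    4: [112.5, 3.2, 0.8, 225.0, 5.8, 12.5, 10.8, 245.0],
}

between = {(i, j): math.dist(centers[i], centers[j])
           for i, j in combinations(centers, 2)}
for (i, j), d in between.items():
    print(f"{i}-{j}: {d:.1f}")
```

Variables 4 and 8 have the largest spread across centers, so they dominate these unstandardized distances.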

12.16 (a), (b) Dendrograms for single linkage and complete linkage follow. The
dendrograms are similar; as examples, in both procedures, countries 11, 40 and 46
form a group at a relatively high level of distance, and countries 4, 27, 37, 43, 25
and 44 form a group at a relatively small distance. The clusters are more apparent
in the complete linkage dendrogram and, depending on the distance level, might
have as few as 3 or 4 clusters or as many as 6 or 7 clusters.

(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem
reasonable and are consistent with the results for the linkage procedures.
Depending on use, K = 4 may be an adequate number of clusters.

[Data display: cluster membership of countries 1-54 for K = 6 (ClustMemK=6) and K = 4 (ClustMemK=4)]

Number of clusters: 4

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1   11             298.660           4.494              9.049
Cluster2   20             318.294           3.613              6.800
Cluster3    3             490.251          11.895             16.915
Cluster4   20             182.870           2.681              7.024

Number of clusters: 6

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1   10              90.154           2.884              4.008
Cluster2    8              22.813           (illegible)        2.428
Cluster3    8             116.518           3.346              6.651
Cluster4   10              78.508           2.513              5.977
Cluster5    3             490.251          11.895             16.915
Cluster6   15             128.783           2.669              5.521
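The four summary columns above can be computed for any single cluster. The sketch below computes the number of observations, the within-cluster sum of squares, and the average and maximum Euclidean distance from the centroid. `cluster_summary` is a hypothetical helper, and the points are made-up illustration data, not the countries.

```python
import math

def cluster_summary(points):
    """(number of observations, within-cluster sum of squares,
    average distance from centroid, maximum distance from centroid)."""
    n, p = len(points), len(points[0])
    centroid = tuple(sum(pt[k] for pt in points) / n for k in range(p))
    dists = [math.dist(pt, centroid) for pt in points]
    wss = sum(d * d for d in dists)
    return n, wss, sum(dists) / n, max(dists)

# Hypothetical two-variable cluster, for illustration only.
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
print(cluster_summary(pts))
```

For a cluster of two observations, both points lie at the same distance d from the centroid. The average and maximum distances therefore coincide and the sum of squares equals 2d², as for the two-country Cluster3 in 12.17 (2 × 4.959² ≈ 49.17).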
12.17 (a), (b) Dendrograms for single linkage and complete linkage follow. The
dendrograms are similar; as examples, in both procedures, countries 11 and 46
form a group at a relatively high level of distance, and countries 2, 19, 35, 4, 48
and 27 form a group at a relatively small distance. The clusters are more apparent
in the complete linkage dendrogram and, depending on the distance level, might
have as few as 3 or 4 clusters or as many as 6 or 7 clusters.

[Dendrograms: single linkage and complete linkage; horizontal axis: Countries]
(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem
reasonable and are consistent with the results for the linkage procedures.
Depending on use, K = 4 may be an adequate number of clusters. The results
for the men are similar to the results for the women.

[Data display: cluster membership of countries 1-54 for K = 4 (ClustMem=4) and K = 6 (ClustMem=6)]

Number of clusters: 4

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1   10             169.042           3.910              5.950
Cluster2   21              73.281           1.684              3.041
Cluster3    2              49.174           4.959              4.959
Cluster4   21              56.295           1.481              3.249

Number of clusters: 6

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1   12              26.806           1.418              2.413
Cluster2   15              18.764           1.048              1.844
Cluster3   10             169.042           3.910              5.950
Cluster4   10              10.137           0.935              1.559
Cluster5    2              49.174           4.959              4.959
Cluster6    5               6.451           1.092              1.?06

12.18.

[Figure: multidimensional scaling configuration of the cities Superior, St. Paul (MN), Marshfield, Wausau, Appleton, Dubuque (IA), Madison, Monroe, Ft. Atkinson, Beloit, Milwaukee and Chicago (IL)]

The multidimensional scaling configuration is consistent with the locations of these cities on a map.

12.19. The stress of the final configuration for q = 5 is 0.000. The coordinates of the sites in 5 dimensions and the
plot of the sites in two dimensions are given below.

[Table: coordinates in 5 dimensions for the sites P1980918, P1550960, P1530987, P1361024, P1351005, P1340945, P1931131, P1311137 and P1301062]

[Plot: the sites, labeled A-I, on dimension 1 (horizontal) and dimension 2 (vertical)]

The results show a definite time pattern (the time of a site is frequently determined by C-14 dating and by tree-ring dating of the lumber in great houses).
12.20 A correspondence analysis of the mental health-socioeconomic data

[Figure: correspondence analysis plot of the mental health-socioeconomic data; λ1² = 0.026, λ2² = 0.0014; mental health categories Well, Mild, Moderate and Impaired shown with the socioeconomic classes]

u, v
-0.6922 0.1539 0.5588 0.4300
-0.1100 0.3665 -0.7007 0.6022
-0.6266 -0.2313 0.0843 -0.3341
-0.1521 -0.2516 -0.5109 -0.6407
0.0411 -0.8809 -0.0659 0.4670
0.0265 0.5490 0.5869 -0.?756
0.7121 0.2570 0.4388 0.4841
0.4097 0.4668 -0.5519 -0.2297
0.6448 -0.?032 0.2879 -0.3062

lambda
0.1613 0.0371 0.0082 0.0000

Cumulative inertia
0.0260 0.0274 0.0275

Cumulative proportion
0.9475 0.9976 1.0000

The lowest economic class is located between moderate and impaired. The next lowest
class is closest to impaired.
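The inertia lines follow from the singular values. The inertia of each dimension is λ², and the cumulative proportion divides the running total by the total inertia. The sketch below recomputes both from the printed λ values. Because those values are rounded to four decimals, the last digit may not match exactly.

```python
lam = [0.1613, 0.0371, 0.0082, 0.0000]   # singular values (lambda) from 12.20

inertia = [l * l for l in lam]           # inertia of each dimension = lambda^2
total = sum(inertia)                     # total inertia
cum, running = [], 0.0
for v in inertia:
    running += v
    cum.append(running)

print([round(c, 4) for c in cum[:3]])          # cumulative inertia
print([round(c / total, 4) for c in cum[:3]])  # cumulative proportion
```

The first dimension alone accounts for about 95% of the total inertia. This is why the two-dimensional plot is dominated by its horizontal axis.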

12.21. A correspondence analysis of the income and job satisfaction data

[Figure: correspondence analysis plot of the income and job satisfaction data; income groups < $25,000, $25,000-$50,000 and > $50,000 shown with the job satisfaction categories]

u, v
-0.6272 -0.2392 0.7412
0.2956 0.8073 0.5107
-0.6503 -0.6661 -0.3561
-0.1944 0.5933 -0.7758
0.7206 -0.5394 0.4356
-0.3400 0.3159 0.2253
0.6510 -0.3233 -0.4696

lambda
0.1069 0.0106 0.0000

Cumulative inertia
0.0114 0.0116

Cumulative proportion
0.9902 1.0000

Very satisfied is closest to the highest income group, and very dissatisfied is below the
lowest income group. Satisfaction appears to increase with income.

12.22. A correspondence analysis of the Wisconsin forest data

[Figure: correspondence analysis plot of the Wisconsin forest data; λ1² = 0.537; tree species Ironwood, Sugar Maple, Basswood, Red Oak, American Elm, White Oak, Bur Oak and Black Oak shown with sites S1-S10]

U
-0.3877 -0.2108 -0.0616 0.4029 -0.0582 0.3265 0.4247 -0.1590
-0.3856 -0.2428 -0.0106 0.4345 -0.1950 -0.1968 -0.2635 -0.3835
-0.3495 -0.1821 0.4079 -0.5718 0.2343 -0.1167 0.3294 -0.1272
-0.3006 0.1355 0.0540 -0.2646 0.0006 -0.0826 -0.6644 -0.3192
-0.1108 0.5817 -0.4856 -0.1598 -0.2333 0.1607 0.0772 -0.0518
0.2022 0.5400 0.4626 0.2687 -0.0978 -0.3943 0.2668 -0.3606
0.1852 -0.0756 -0.5090 -0.0291 0.6026 -0.1955 0.1520 -0.5154
0.3140 0.0644 0.3394 0.1567 0.3366 0.6573 -0.2507 -0.2267
0.4200 -0.3484 -0.0394 0.1165 -0.0625 -0.3772 -0.1456 0.1381
0.3549 -0.2897 -0.0345 -0.3393 -0.5994 0.2002 0.1262 -0.4907

V
-0.3904 -0.0831 -0.4781 0.4562 -0.0377 0.3369 0.4071 -0.3511
-0.5327 -0.4985 0.4080 0.0925 -0.0738 -0.3420 -0.2464 -0.3310
-0.1999 0.3889 0.4089 -0.3622 0.4391 0.3217 0.1808 -0.4260
0.0698 0.5382 -0.1726 0.3181 -0.0544 -0.1596 -0.6122 -0.4138
-0.0820 -0.0151 -0.4271 -0.7086 -0.4160 -0.1685 0.0307 -0.3258
0.4005 0.0831 0.1478 0.1866 -0.0042 -0.5895 0.5587 -0.3412
0.3634 -0.4850 -0.3232 -0.0937 0.6298 0.0164 -0.2172 -0.2745
0.4689 -0.2476 0.3150 0.0726 -0.4771 0.5142 -0.0763 -0.3412

lambda
0.7326 0.3101 0.2685 0.2134 0.1052 0.0674 0.0623 0.0000

Cumulative inertia
0.5367 0.6329 0.7050 0.7506 0.7616 0.7662 0.7700

Cumulative proportion
0.6970 0.8219 0.9155 0.9747 0.9891 0.9950 1.0000

12.23 We construct a biplot of the pottery type-site data, with row proportions as variables.

S
0.0511 -0.0059 -0.0390 -0.0061
-0.0059 0.0084 -0.0051 0.0025
-0.0390 -0.0051 0.0628 -0.0187
-0.0061 0.0025 -0.0187 0.0223

Eigenvectors of S
0.6233 0.5853 0.1374 -0.5
0.0064 -0.2385 -0.8325 -0.5
-0.7694 0.3464 0.1951 -0.5
0.1396 -0.6932 0.5000 -0.5

Eigenvalues of S
0.0978 0.0376 0.0091 0.0000

                   PC1      PC2      PC3      PC4
St. Dev.           0.3128   0.1940   0.0952   0
Prop. of Var.      0.6769   0.2604   0.0627   0
Cumulative Prop.   0.6769   0.9373   1.0000   1

The results are as in the correspondence analysis.
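The leading eigenpair of S can be checked by power iteration, followed by the principal-component summary rows. In that summary, the standard deviation of a component is the square root of its eigenvalue, and its proportion of variance is the eigenvalue divided by the sum of the eigenvalues. `leading_eigen` is a hypothetical helper written for this sketch. The start vector avoids (1, 1, 1, 1), which here is (up to rounding) the eigenvector for the zero eigenvalue because the row proportions sum to one.

```python
import math

S = [[0.0511, -0.0059, -0.0390, -0.0061],
     [-0.0059, 0.0084, -0.0051, 0.0025],
     [-0.0390, -0.0051, 0.0628, -0.0187],
     [-0.0061, 0.0025, -0.0187, 0.0223]]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def leading_eigen(A, iters=200):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    v = [1.0, 0.0, 0.0, 0.0]
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient
    if v[0] < 0:                     # eigenvectors are defined only up to sign
        v = [-x for x in v]
    return lam, v

lam1, v1 = leading_eigen(S)
print(round(lam1, 4), [round(x, 3) for x in v1])

eigenvalues = [0.0978, 0.0376, 0.0091, 0.0000]
print("St. Dev.", [round(math.sqrt(e), 4) for e in eigenvalues])
print("Prop. of Var.", [round(e / sum(eigenvalues), 4) for e in eigenvalues])
```

S is printed to four decimals, so the recovered eigenvalue and eigenvector agree with the table only to about three decimals.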

12.24. We construct a biplot of the mental health-socioeconomic data, with column proportions
as variables.

[Figure: biplot of the mental health-socioeconomic data; mental health categories Well, Mild, Moderate and Impaired shown with the socioeconomic classes; horizontal axis Comp. 1]

S
0.003089 0.000809 -0.000413 -0.003485
0.000809 0.000329 -0.000284 -0.000853
-0.000413 -0.000284 0.000379 0.000318
-0.003485 -0.000853 0.000318 0.004021

Eigenvectors of S
-0.6487 0.0837 -0.5676 0.5
-0.1685 0.4764 0.7033 0.5
0.0794 -0.8320 0.2270 0.5
0.7379 0.2719 -0.3628 0.5

Eigenvalues of S
0.007314 0.000480 0.000024 0.000000

                   PC1      PC2      PC3      PC4
St. Dev.           0.0855   0.0219   0.0049   0
Prop. of Var.      0.9355   0.0614   0.0031   0
Cumulative Prop.   0.9355   0.9969   1.0000   1

The biplot gives similar locations for health and socioeconomic status. A reflection
about the 45 degree line would make them appear more alike.

12.25. A Procrustes analysis of archaeological data

[Figure: a two-dimensional representation of archaeological sites produced by metric multidimensional scaling; axes First Dimension and Second Dimension]

[Figure: a two-dimensional representation of archaeological sites produced by nonmetric multidimensional scaling; axes First Dimension and Second Dimension]

Sites: P1980918, P1931131, P1550960, P1530987, P1361024, P1351005, P1340945, P1311137, P1301062

Metric MDS and nonmetric MDS coordinates
-0.512 -0.278
-0.276 -0.829
1.318 0.692
1.469 0.703
-0.470 -0.071
-0.545 -0.156
-0.338 -0.048
-0.387 0.088
-0.234 0.296
-0.469 0.137
-0.642 0.387
-0.581 -0.349
-0.889 -0.409
1.118 -1.122
1.262 -0.989
0.216 0.608
0.096 0.963
-0.137 0.379

U
-0.9893 -0.1459
-0.1459 0.9893

V
-0.9977 -0.0679
-0.0679 0.9977

Lambda
4.7819 0.0000
0.0000 2.715

Q
0.9969 0.0784
-0.0784 0.9969

To better align the metric and nonmetric solutions, we multiply the nonmetric scaling
solution by the orthogonal matrix Q. This corresponds to clockwise rotation of the
nonmetric solution by 4.5 degrees. After rotation, the sum of squared distances, 0.803,
is reduced to the Procrustes measure of fit PR² = 0.756.
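The displayed Q equals V U' built from the two 2 × 2 matrices above, and its entries have the form cos θ and sin θ. The sketch below forms that product and recovers the size of the rotation angle. `matmul_bt` is an illustrative helper. The sketch makes no claim about rotation direction, which depends on whether the configuration is post- or pre-multiplied.

```python
import math

U = [[-0.9893, -0.1459], [-0.1459, 0.9893]]
V = [[-0.9977, -0.0679], [-0.0679, 0.9977]]

def matmul_bt(A, B):
    """Return A @ B^T for small list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B] for row_a in A]

Q = matmul_bt(V, U)                                  # Q = V U'
angle = math.degrees(math.atan2(Q[0][1], Q[0][0]))   # size of the rotation
print([[round(q, 4) for q in row] for row in Q])
print(round(angle, 1))
```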
12.26 The dendrograms for clustering the Mali family farms are given below for
average linkage and Ward's method. The dendrograms are similar, but a moderate
number of distinct clusters is more apparent in the Ward's method dendrogram
than in the average linkage dendrogram. Both dendrograms suggest there may be as
few as 4 clusters (indicated by the checkmarks in the figures) or perhaps as many
as 7 or 8 clusters. Reading the "right" number of clusters from either dendrogram
would depend on the use and require some subject matter knowledge.

[Dendrogram: Average Linkage, Euclidean Distance; Mali Family Farms (horizontal axis: Farms)]

[Dendrogram: Ward Linkage, Euclidean Distance; Mali Family Farms (horizontal axis: Farms)]

12.27 If average linkage and Ward's method clustering are used with the standardized
Mali Family Farm observations, the results are somewhat different from those
using the original observations and different from one another. The dendrograms
follow. There could be as few as 4 clusters (indicated by the checkmarks in the
figures) or there could be as many as 8 or 9 clusters or more. The distinct clusters
are more clearly delineated in the Ward's method dendrogram and, if we focus
attention on the 4 marked clusters, we see the two procedures produce quite
different results.

[Dendrogram: Average Linkage, Euclidean Distance; Mali Family Farms (standardized observations)]

[Dendrogram: Ward Linkage, Euclidean Distance; Mali Family Farms (standardized observations)]

12.28 The results for K = 5 and K = 6 clusters follow. The results seem reasonable and
are similar to the results for Ward's method considered in Exercise 12.26. Note that as
the number of clusters increases from 5 to 6, cluster 1 in the K = 5 solution is
partitioned into two clusters, 1 and 6, in the K = 6 solution; there is no change in
the other clusters. Although not shown, K = 4 is a reasonable solution as well.

[Data display: cluster membership of each farm for K = 5 (ClustMem=5) and K = 6 (ClustMem=6)]

Number of clusters: 5

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1    6             2431.094         18.498             33.076
Cluster2   11             4440.330         19.511             24.647
Cluster3   35             3298.539          8.878             21.053
Cluster4   12             1129.083          9.072             16.024
Cluster5    8             1943.156         15.030             19.619

Number of clusters: 6

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1    4              696.609         13.005             15.474
Cluster2   11             4440.330         19.511             24.647
Cluster3   35             3298.539          8.878             21.053
Cluster4   12             1129.083          9.072             16.024
Cluster5    8             1943.156         15.030             19.619
Cluster6    2             1005.125         22.418             22.418

12.29 The results for K = 5 and K = 6 clusters follow. The results seem reasonable and
are similar to the results for Ward's method considered in Exercise 12.27. Note that as
the number of clusters increases from 5 to 6, clusters 3 and 4 in the K = 5 solution
lose 1 and 2 farms, respectively, to form cluster 6 in the K = 6 solution; there is no
change in the other clusters. These results using standardized observations are
somewhat different from the corresponding results using the original data. It
makes a difference whether standardized or unstandardized observations are used.

[Data display: cluster membership of each farm for K = 5 (SdClusMem=5) and K = 6 (SdClusMem=6)]

Number of clusters: 5

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1    5             14.050           1.568              2.703
Cluster2    5             56.727           3.288              4.259
Cluster3   35             55.318           1.211              1.954
Cluster4   20             84.099           1.993              3.172
Cluster5    7             63.071           2.970              3.482

Number of clusters: 6

           Number of      Within cluster   Average distance   Maximum distance
           observations   sum of squares   from centroid      from centroid
Cluster1    5             14.050           1.568              2.703
Cluster2    5             56.727           3.288              4.259
Cluster3   34             51.228           1.183              1.951
Cluster4   18             65.501           1.806              3.195
Cluster5    7             63.071           2.970              3.482
Cluster6    3              7.960           1.604              1.954

12.30 The cumulative lift (gains) chart is shown below. The y-axis shows the percentage
of positive responses, that is, the percentage of the total possible positive
responses (20,000). The x-axis shows the percentage of customers contacted,
which is a fraction of the 100,000 total customers. With no model, if we contact
10% of the customers we would expect 10%, or 2,000 = .1 × 20,000, of the positive
responses. Our response model predicts 6,000, or 30%, of the positive responses if
we contact the top 10,000 customers. Consequently, the y-values at x = 10%
shown in the chart are 10% for the baseline (no model) and 30% for the gain (lift)
provided by the model. Continuing this argument for other choices of x
(% customers contacted) and cumulating the results produces the lift (gains) chart
shown. We see, for example, that if we contact the top 40% of the customers
determined by the model, we expect to get 80% of the positive responses.

[Figure: Cumulative Gains Chart; x-axis: % Customers Contacted (0-100); y-axis: % positive responses (0-100); curves: Lift Curve and Baseline]
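The sketch below first redoes the arithmetic for x = 10%. It then shows how a cumulative gains curve is built: sort customers by model score, contact the top k, and record the share of all positive responses captured. `cumulative_gains` is a hypothetical helper. The ten ranked responses are made-up illustration data, not the 100,000 customers in the exercise.

```python
def cumulative_gains(responses, percents):
    """responses: 0/1 outcomes already sorted from highest to lowest model score.
    Returns {percent of customers contacted: percent of all positives captured}."""
    n, total = len(responses), sum(responses)
    gains = {}
    for pct in percents:
        k = n * pct // 100                      # contact the top k customers
        gains[pct] = 100 * sum(responses[:k]) / total
    return gains

# Arithmetic from the text: contact the top 10% of 100,000 customers.
baseline = 20_000 * 10 / 100       # expected positives with no model
model = 6_000                      # positives the response model predicts
print(baseline, 100 * model / 20_000, model / baseline)

# Hypothetical ranked responses for 10 customers (illustration only).
ranked = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
print(cumulative_gains(ranked, range(10, 101, 10)))
```

The ratio model / baseline (3.0 at x = 10%) is the lift at that depth. The gains curve always ends at 100%, because contacting every customer captures every positive response.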

12.31 (a) The Mclust function, which selects the best overall model according to
the BIC criterion, selects a mixture with four multivariate normal components. The four estimated centers are

$$\hat{\mu}_1 = (3.3188,\ 6.7044,\ 0.3526,\ 0.1418,\ 11.9742)'$$
$$\hat{\mu}_2 = (5.1806,\ 5.2871,\ 0.5910,\ 0.1794,\ 5.5369)'$$
$$\hat{\mu}_3 = (7.2454,\ 4.8099,\ 0.3290,\ 0.2431,\ 3.2834)'$$
$$\hat{\mu}_4 = (8.6893,\ 4.1730,\ 0.5158,\ 0.2445,\ 7.4846)'$$

and the estimated covariance matrices turn out to be restricted to be of the
form $\Sigma_k = \eta_k D$, where $D$ is a diagonal matrix. The estimated
$D = \mathrm{diag}(11.2598,\ 2.7647,\ 0.3355,\ 0.0053,\ 18.0295)$,
and the estimated scale factors are $\eta_1 = 0.0319$, $\eta_2 = 0.3732$, $\eta_3 = 0.0909$,
$\eta_4 = 0.1073$.

The estimated proportions are $p_1 = 0.1059$, $p_2 = 0.4986$, $p_3 = 0.1322$, $p_4 = 0.2633$.
This minimum BIC model has BIC = -547.1408.

(b) The model chosen above has 4 multivariate normal components.
These four components are shown in the matrix scatter plot in which the observations have been classified into one of the four populations.
The matrix scatter plot of the true classification is given in the next figure.
Comparing the matrix scatter plot of the four-group classification with
the matrix scatter plot of the true classification, we see how the oil samples
from the Upper sandstone are essentially split into two groups. This is clear
from comparing the two scatter plots for (X1, X2).

We also repeat the analysis using the me function to select a mixture distribution with K = 3 components. We further restrict the covariance matrices
to satisfy $\Sigma_k = \eta_k D$. The K = 3 groups selected by this function have
estimated centers

$$\hat{\mu}_1 = (5.3395,\ 5.2467,\ 0.5485,\ 0.1862,\ 5.2465)'$$
$$\hat{\mu}_2 = (8.5343,\ 4.2762,\ 0.4988,\ 0.2453,\ 6.6993)'$$
$$\hat{\mu}_3 = (3.3228,\ 6.7093,\ 0.3511,\ 0.1418,\ 11.9780)'$$
360

Figure 1: Classification into four groups using Mclust (matrix scatter plot of x1, x2, x3, x4, x5)


Figure 2: True classification into sandstone strata (Wilhelm, Sub-Mulinia, Upper); matrix scatter plot of x1, x2, x3, x4, x5


The estimated diagonal matrix is $D = \mathrm{diag}(10.1535, 2.6295, 0.2969, 0.0052, 24.0955)$, with estimated scale parameters $\eta_1 = 0.3702$, $\eta_2 = 0.1315$, $\eta_3 = 0.0314$ and resulting BIC = -534.0949.
The estimated proportions are $\hat{p}_1 = 0.5651$, $\hat{p}_2 = 0.3296$, $\hat{p}_3 = 0.1052$.
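To classify an observation with the fitted K = 3 mixture, a common rule (assumed here; the solution does not spell out the rule used) assigns it to the component with the largest posterior probability $\hat{p}_k f_k(x) / \sum_j \hat{p}_j f_j(x)$, where $f_k$ is the normal density with mean $\hat{\mu}_k$ and covariance $\eta_k D$. A minimal pure-Python sketch with the estimates above; the function names are hypothetical:

```python
import math

# Fitted K = 3 parameters as reported above (assumed maximum-posterior rule).
P = [0.5651, 0.3296, 0.1052]
MU = [
    [5.3395, 5.2467, 0.5485, 0.1862, 5.2465],
    [8.5343, 4.2762, 0.4988, 0.2453, 6.6993],
    [3.3228, 6.7093, 0.3511, 0.1418, 11.9780],
]
D = [10.1535, 2.6295, 0.2969, 0.0052, 24.0955]
ETA = [0.3702, 0.1315, 0.0314]


def log_normal_diag(x, mu, var):
    """Log density of a normal with mean mu and diagonal covariance diag(var)."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mu, var)
    )


def posteriors(x):
    """Posterior probability of each component for x, with Sigma_k = eta_k * D."""
    logs = [
        math.log(p) + log_normal_diag(x, mu, [eta * d for d in D])
        for p, mu, eta in zip(P, MU, ETA)
    ]
    top = max(logs)  # log-sum-exp trick: subtract the largest term first
    weights = [math.exp(lg - top) for lg in logs]
    total = sum(weights)
    return [w / total for w in weights]


def classify(x):
    """0-based index of the component with the largest posterior probability."""
    post = posteriors(x)
    return post.index(max(post))


# Each estimated center should be assigned to its own component.
for k, mu in enumerate(MU):
    print(k, classify(mu), [round(q, 4) for q in posteriors(mu)])
```

Working on the log scale avoids underflow, since the small scale factor $\eta_3$ makes some component densities extremely small far from their centers.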
If we use this method to classify the oil samples, the following samples

are misclassified:

7 19 22 25 26 27 28 29 30 31
32 33 34 35 39 44 45 46 49
and the misclassification error rate is 33.93%.
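The error rate is the number of misclassified samples divided by the total number of samples. A quick check, assuming the data set contains n = 56 oil samples (an assumption, consistent with 19 errors giving 33.93%):

```python
# Samples reported above as misclassified by the K = 3 mixture.
misclassified = [7, 19, 22, 25, 26, 27, 28, 29, 30, 31,
                 32, 33, 34, 35, 39, 44, 45, 46, 49]
n_samples = 56  # assumption: total number of oil samples in the data set

error_rate = len(misclassified) / n_samples
print(f"{len(misclassified)} of {n_samples} misclassified: {100 * error_rate:.2f}%")
# prints "19 of 56 misclassified: 33.93%"
```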


