Solu Manual For MSA Johnson
Page Count: 369
Preface

This solution manual was prepared as an aid for instructors who will benefit by having solutions available. In addition to providing detailed answers to most of the problems in the book, this manual can help the instructor determine which of the problems are most appropriate for the class.

The vast majority of the problems have been solved with the help of available computer software (SAS, S-Plus, Minitab). A few of the problems have been solved with hand calculators. The reader should keep in mind that round-off errors can occur, particularly in those problems involving long chains of arithmetic calculations.

We would like to take this opportunity to acknowledge the contribution of many students, whose homework formed the basis for many of the solutions. In particular, we would like to thank Jorge Achcar, Sebastiao Amorim, W. K. Cheang, S. S. Cho, S. G. Chow, Charles Fleming, Stu Janis, Richard Jones, Tim Kramer, Dennis Murphy, Rich Raubertas, David Steinberg, T. J. Tien, Steve Verril, Paul Whitney and Mike Wincek. Dianne Hall compiled most of the material needed to make this current solutions manual consistent with the sixth edition of the book.

The solutions are numbered in the same manner as the exercises in the book. Thus, for example, 9.6 refers to the 6th exercise of chapter 9.

We hope this manual is a useful aid for adopters of our Applied Multivariate Statistical Analysis, 6th edition, text. The authors have taken a little more active role in the preparation of the current solutions manual. However, it is inevitable that an error or two has slipped through, so please bring remaining errors to our attention. Also, comments and suggestions are always welcome.

Richard A. Johnson
Dean W. Wichern

Chapter 1

1.1 $\bar{x}_1 = 4.29$, $\bar{x}_2 = 15.29$, $s_{11} = 4.20$, $s_{22} = 3.56$, $s_{12} = 3.70$

1.2 a) [Figure: scatter plot and marginal dot plots of $x_2$ against $x_1$]
b) $s_{12}$ is negative.

c) $\bar{x}_1 = 5.20$, $\bar{x}_2 = 12.48$, $s_{11} = 3.09$, $s_{22} = 5.27$, $s_{12} = -15.94$, $r_{12} = -.98$. Large $x_1$ occurs with small $x_2$ and vice versa.

d) $\bar{\mathbf{x}} = \begin{bmatrix} 5.20 \\ 12.48 \end{bmatrix}$, $S_n = \begin{bmatrix} 3.09 & -15.94 \\ -15.94 & 5.27 \end{bmatrix}$, $R = \begin{bmatrix} 1 & -.98 \\ -.98 & 1 \end{bmatrix}$

1.3 [$\bar{\mathbf{x}}$, $S_n$, and $R$ (symmetric); the entries are illegible in this copy.]

1.4 a) There is a positive correlation between $X_1$ and $X_2$. Since the sample size is small, it is hard to be definitive about the nature of the marginal distributions. However, the marginal distribution of $X_1$ appears to be skewed to the right. The marginal distribution of $X_2$ seems reasonably symmetric.

[Figure: scatter plot and marginal dot plots of $x_2$ against $x_1$]

b) $\bar{x}_1 = 155.60$, $\bar{x}_2 = 14.70$, $s_{11} = 82.03$, $s_{22} = 4.85$, $s_{12} = 273.26$, $r_{12} = .69$. Large profits ($X_2$) tend to be associated with large sales ($X_1$); small profits with small sales.

1.5 a) There is negative correlation between $X_2$ and $X_3$ and negative correlation between $X_1$ and $X_3$. The marginal distribution of $X_1$ appears to be skewed to the right. The marginal distribution of $X_2$ seems reasonably symmetric. The marginal distribution of $X_3$ also appears to be skewed to the right.

[Figure: scatter plot and marginal dot plots of $x_3$ against $x_2$]

[Figure: scatter plot and marginal dot plots of $x_3$ against $x_1$]

b) $\bar{\mathbf{x}} = \begin{bmatrix} 155.60 \\ 14.70 \\ 710.91 \end{bmatrix}$, $S_n = \begin{bmatrix} 82.03 & 273.26 & -32018.36 \\ 273.26 & 4.85 & -948.45 \\ -32018.36 & -948.45 & 461.90 \end{bmatrix}$, $R = \begin{bmatrix} 1 & .69 & -.85 \\ .69 & 1 & -.42 \\ -.85 & -.42 & 1 \end{bmatrix}$

1.6 a) [Figure: histograms of the individual variables]
1.6 b) $\bar{\mathbf{x}} = [7.5,\ 73.857,\ 4.548,\ 2.191,\ 10.048,\ 9.405,\ 3.095]'$

$$S_n = \begin{bmatrix}
2.440 & -2.714 & -.369 & -.452 & -.571 & -2.179 & .167 \\
 & 293.360 & 3.816 & -1.354 & 6.602 & 30.058 & .609 \\
 & & 1.486 & .658 & 2.260 & 2.755 & .138 \\
 & & & 1.154 & 1.062 & -.791 & .172 \\
 & & & & 11.093 & 3.052 & 1.019 \\
 & & & & & 30.241 & .580 \\
\text{(symmetric)} & & & & & & .467
\end{bmatrix}$$

The pair $x_3$, $x_4$ exhibits a small to moderate positive correlation and so does the pair $x_3$, $x_5$. Most of the entries are small.

1.7 a) [Figure: scatter plot (variable space)] b) [Figure: plot in item space]

1.8 Using (1-12), $d(P,Q) = \sqrt{(-1-1)^2 + (-1-0)^2} = \sqrt{5} = 2.236$.

Using (1-20), $d(P,Q) = \sqrt{\tfrac{1}{3}(-1-1)^2 + 2\left(\tfrac{1}{9}\right)(-1-1)(-1-0) + \tfrac{4}{27}(-1-0)^2} = \sqrt{\tfrac{52}{27}} = 1.388$.

Using (1-20), the locus of points a constant squared distance 1 from $Q = (1,0)$ is given by the expression $\tfrac{1}{3}(x_1-1)^2 + \tfrac{2}{9}(x_1-1)x_2 + \tfrac{4}{27}x_2^2 = 1$. To sketch the locus of points defined by this equation, we first obtain the coordinates of some points satisfying the equation: $(-1, 1.5)$, $(0, -1.5)$, $(0, 3)$, $(1, -2.6)$, $(1, 2.6)$, $(2, -3)$, $(2, 1.5)$, $(3, -1.5)$. The resulting ellipse is:

[Figure: ellipse in the $(x_1, x_2)$ plane]

1.9 a) $s_{11} = 20.48$, $s_{12} = 9.09$, $s_{22} = 6.19$

[Figure: scatter plot of $x_2$ against $x_1$]

1.10 a) This equation is of the form (1-19) with $a_{11} = 1$, $a_{12} = \tfrac{1}{2}$, and $a_{22} = 4$.
Therefore this is a distance for correlated variables if it is non-negative for all values of $x_1$, $x_2$. But this follows easily if we write $x_1^2 + 4x_2^2 + x_1x_2 = \left(x_1 + \tfrac{1}{2}x_2\right)^2 + \tfrac{15}{4}x_2^2 \ge 0$.

b) In order for this expression to be a distance it has to be non-negative for all values $x_1$, $x_2$. Since, for $(x_1, x_2) = (0, 1)$ we have $x_1^2 - 2x_2^2 = -2$, we conclude that this is not a valid distance function.

1.11 $d(P,Q) = \sqrt{4(x_1-y_1)^2 + 2(-1)(x_1-y_1)(x_2-y_2) + (x_2-y_2)^2} = \sqrt{4(y_1-x_1)^2 + 2(-1)(y_1-x_1)(y_2-x_2) + (y_2-x_2)^2} = d(Q,P)$

Next, $4(x_1-y_1)^2 - 2(x_1-y_1)(x_2-y_2) + (x_2-y_2)^2 = (x_1 - y_1 - x_2 + y_2)^2 + 3(x_1-y_1)^2 \ge 0$, so $d(P,Q) \ge 0$. The second term is zero in this last expression only if $x_1 = y_1$ and then the first is zero only if $x_2 = y_2$.

1.12 a) If $P = (-3, 4)$ then $d(O,P) = \max(|-3|, |4|) = 4$.

b) The locus of points whose squared distance from $(0,0)$ is 1 is

[Figure: the square with vertices $(\pm 1, \pm 1)$ in the $(x_1, x_2)$ plane]

c) The generalization to $p$ dimensions is given by $d(O,P) = \max(|x_1|, |x_2|, \ldots, |x_p|)$.

1.13 Place the facility at C-3.

1.14 a) [Figure: scatter plot of $x_4$ against $x_2$]

Strong positive correlation. No obvious "unusual" observations.

b) Multiple-sclerosis group.

$\bar{\mathbf{x}} = [42.07,\ 179.64,\ 12.31,\ 236.62,\ 13.16]'$

$$S_n = \begin{bmatrix}
116.91 & 61.78 & -20.10 & 61.13 & -27.65 \\
 & 812.72 & -218.35 & 865.32 & 90.48 \\
 & & 305.94 & 221.93 & 286.60 \\
 & & & 1146.38 & 82.53 \\
\text{(symmetric)} & & & & 337.80
\end{bmatrix}
\qquad
R = \begin{bmatrix}
1 & .200 & -.106 & .167 & -.139 \\
 & 1 & .438 & .896 & .173 \\
 & & 1 & .375 & .892 \\
 & & & 1 & .133 \\
\text{(symmetric)} & & & & 1
\end{bmatrix}$$

Non multiple-sclerosis group.

$\bar{\mathbf{x}} = [37.99,\ 147.21,\ 1.56,\ 195.57,\ 1.62]'$

$$S_n = \begin{bmatrix}
273.61 & 95.08 & 5.28 & 101.67 & 3.20 \\
 & 110.13 & 1.84 & 103.28 & 2.15 \\
 & & 1.78 & 2.22 & .49 \\
 & & & 183.04 & 2.35 \\
\text{(symmetric)} & & & & 2.32
\end{bmatrix}
\qquad
R = \begin{bmatrix}
1 & .548 & .239 & .454 & .127 \\
 & 1 & .132 & .727 & .134 \\
 & & 1 & .123 & .244 \\
 & & & 1 & .114 \\
\text{(symmetric)} & & & & 1
\end{bmatrix}$$

1.15 a) Scatterplot of $x_2$ and $x_3$.
[Figure: scatter plot; the horizontal axis is labeled activity ($x_2$)]

b) $\bar{\mathbf{x}} = [3.54,\ 1.81,\ 2.14,\ 2.21,\ 2.58,\ 1.27]'$

[$S_n$ (symmetric); the layout of its entries is not recoverable from this copy.]

$$R = \begin{bmatrix}
1 & & & & & \\
.551 & 1 & & & & \\
.362 & .187 & 1 & & & \\
.386 & .455 & .346 & 1 & & \\
.537 & .535 & .496 & .704 & 1 & \\
.077 & -.035 & -.010 & .156 & .071 & 1
\end{bmatrix} \quad \text{(symmetric)}$$

The largest correlation is between appetite and amount of food eaten. Both activity and appetite have moderate positive correlations with symptoms. Also, appetite and activity have a moderate positive correlation.

1.16 There are significant positive correlations among all variables. The lowest correlation is 0.44020 between dominant humerus and ulna, and the highest correlation is 0.89365 between dominant humerus and humerus.

$\bar{\mathbf{x}} = [0.8438,\ 0.8183,\ 1.7927,\ 1.7348,\ 0.7044,\ 0.6938]'$

$$R = \begin{bmatrix}
1.00000 & 0.85181 & 0.69146 & 0.66826 & 0.74369 & 0.67789 \\
0.85181 & 1.00000 & 0.61192 & 0.74909 & 0.74218 & 0.80980 \\
0.69146 & 0.61192 & 1.00000 & 0.89365 & 0.55222 & 0.44020 \\
0.66826 & 0.74909 & 0.89365 & 1.00000 & 0.62555 & 0.61882 \\
0.74369 & 0.74218 & 0.55222 & 0.62555 & 1.00000 & 0.72889 \\
0.67789 & 0.80980 & 0.44020 & 0.61882 & 0.72889 & 1.00000
\end{bmatrix}$$

$$S_n = \begin{bmatrix}
0.0124815 & 0.0099633 & 0.0214560 & 0.0192822 & 0.0087559 & 0.0076395 \\
0.0099633 & 0.0109612 & 0.0177938 & 0.0202555 & 0.0081886 & 0.0085522 \\
0.0214560 & 0.0177938 & 0.0771429 & 0.0641052 & 0.0161635 & 0.0123332 \\
0.0192822 & 0.0202555 & 0.0641052 & 0.0667051 & 0.0170261 & 0.0161219 \\
0.0087559 & 0.0081886 & 0.0161635 & 0.0170261 & 0.0111057 & 0.0077483 \\
0.0076395 & 0.0085522 & 0.0123332 & 0.0161219 & 0.0077483 & 0.0101752
\end{bmatrix}$$

1.17 There are large positive correlations among all variables.
Particularly large correlations occur between running events that are "similar", for example, the 100m and 200m dashes, and the 1500m and 3000m runs.

$\bar{\mathbf{x}} = [11.36,\ 23.12,\ 51.99,\ 2.02,\ 4.19,\ 9.08,\ 153.62]'$

$$S_n = \begin{bmatrix}
.152 & .338 & .875 & .027 & .082 & .230 & 4.254 \\
.338 & .847 & 2.152 & .065 & .199 & .544 & 10.193 \\
.875 & 2.152 & 6.621 & .178 & .500 & 1.400 & 28.368 \\
.027 & .065 & .178 & .007 & .021 & .060 & 1.197 \\
.082 & .199 & .500 & .021 & .073 & .212 & 3.474 \\
.230 & .544 & 1.400 & .060 & .212 & .652 & 10.508 \\
4.254 & 10.193 & 28.368 & 1.197 & 3.474 & 10.508 & 265.265
\end{bmatrix}$$

$$R = \begin{bmatrix}
1.000 & .941 & .871 & .809 & .782 & .728 & .669 \\
.941 & 1.000 & .909 & .820 & .801 & .732 & .680 \\
.871 & .909 & 1.000 & .806 & .720 & .674 & .677 \\
.809 & .820 & .806 & 1.000 & .905 & .867 & .854 \\
.782 & .801 & .720 & .905 & 1.000 & .973 & .791 \\
.728 & .732 & .674 & .867 & .973 & 1.000 & .799 \\
.669 & .680 & .677 & .854 & .791 & .799 & 1.000
\end{bmatrix}$$

1.18 There are positive correlations among all variables. Notice the correlations decrease as the distances between pairs of running events increase (see the first column of the correlation matrix $R$). The correlation matrix for running events measured in meters per second is very similar to the correlation matrix for the running event times given in Exercise 1.17.

$\bar{\mathbf{x}} = [8.81,\ 8.66,\ 7.71,\ 6.60,\ 5.99,\ 5.54,\ 4.62]'$

$$S_n = \begin{bmatrix}
.091 & .096 & .097 & .065 & .082 & .092 & .081 \\
.096 & .115 & .114 & .075 & .096 & .105 & .093 \\
.097 & .114 & .138 & .081 & .095 & .108 & .102 \\
.065 & .075 & .081 & .074 & .086 & .100 & .094 \\
.082 & .096 & .095 & .086 & .124 & .144 & .118 \\
.092 & .105 & .108 & .100 & .144 & .177 & .147 \\
.081 & .093 & .102 & .094 & .118 & .147 & .167
\end{bmatrix}$$

$$R = \begin{bmatrix}
1.000 & .938 & .866 & .797 & .776 & .729 & .660 \\
.938 & 1.000 & .906 & .816 & .806 & .741 & .675 \\
.866 & .906 & 1.000 & .804 & .731 & .694 & .672 \\
.797 & .816 & .804 & 1.000 & .906 & .875 & .852 \\
.776 & .806 & .731 & .906 & 1.000 & .972 & .824 \\
.729 & .741 & .694 & .875 & .972 & 1.000 & .854 \\
.660 & .675 & .672 & .852 & .824 & .854 & 1.000
\end{bmatrix}$$

1.19 (a) [Figure: pairwise scatter plots of the radius, humerus, and ulna measurements]
1.19 (b) [Figure: pairwise scatter plots]

1.20 [Figure: panels (a) and (b), three-dimensional scatter plots with axes labeled $x_1$ and $x_3$]

(a) The plot looks like a cigar shape, but bent. Some observations in the lower left hand part could be outliers. From the highlighted plot in (b) (actually the non-bankrupt group is not highlighted), there is one outlier in the non-bankrupt group, which is apparently located in the bankrupt group, besides the strung out pattern to the right.

(b) The dotted line in the plot would be an orientation for the classification.

1.21 [Figure: panels (a) and (b), three-dimensional scatter plots with outliers marked]

(a) There are two outliers in the upper right and lower right corners of the plot.

(b) Only the points in the gasoline group are highlighted. The observation in the upper right is the outlier.
As indicated in the plot, there is an orientation to classify into two groups.

1.22 Possible outliers are indicated.

[Figure: three-dimensional scatter plots with possible outliers marked]

[Figure: rotated page, illegible in this copy]

1.24 [Figure: faces grouped into Clusters 1 through 7]

We have clustered these faces in the same manner as those in Example 1.12. Note, however, other groupings are equally plausible. For instance, utilities 9 and 18 might be switched from Cluster 2 to Cluster 3 and so forth.

1.25 We illustrate one cluster of "stars". The remaining stars (not shown) can be grouped in 3 or 4 additional clusters.

[Figure: stars for utilities 4, 10, 13, and 20]

1.26 Bull data

(a) $\bar{\mathbf{x}}$ and $R$ (variables in the order Breed, SalePr, YrHgt, FtFrBody, PrctFFB, Frame, BkFat, SaleHt, SaleWt):

$\bar{\mathbf{x}} = [4.3816,\ 1742.4342,\ 50.5224,\ 995.9474,\ 70.8816,\ 6.3158,\ 0.1967,\ 54.1263,\ 1555.2895]'$

$$R = \begin{bmatrix}
1.000 & -0.224 & 0.525 & 0.409 & 0.472 & 0.434 & -0.615 & 0.487 & 0.116 \\
-0.224 & 1.000 & 0.423 & 0.102 & -0.113 & 0.479 & 0.277 & 0.390 & 0.317 \\
0.525 & 0.423 & 1.000 & 0.624 & 0.523 & 0.940 & -0.344 & 0.860 & 0.368 \\
0.409 & 0.102 & 0.624 & 1.000 & 0.691 & 0.605 & -0.168 & 0.699 & 0.555 \\
0.472 & -0.113 & 0.523 & 0.691 & 1.000 & 0.482 & -0.488 & 0.521 & 0.198 \\
0.434 & 0.479 & 0.940 & 0.605 & 0.482 & 1.000 & -0.260 & 0.801 & 0.368 \\
-0.615 & 0.277 & -0.344 & -0.168 & -0.488 & -0.260 & 1.000 & -0.282 & 0.208 \\
0.487 & 0.390 & 0.860 & 0.699 & 0.521 & 0.801 & -0.282 & 1.000 & 0.566 \\
0.116 & 0.317 & 0.368 & 0.555 & 0.198 & 0.368 & 0.208 & 0.566 & 1.000
\end{bmatrix}$$

$$S_n = \begin{bmatrix}
9.55 & -429.02 & 2.79 & 116.28 & 4.73 & 1.23 & -0.17 & 3.00 & 46.32 \\
-429.02 & 383026.64 & 450.47 & 5813.09 & -226.46 & 272.78 & 15.24 & 480.56 & 25308.44 \\
2.79 & 450.47 & 2.96 & 98.81 & 2.92 & 1.49 & -0.05 & 2.94 & 81.72 \\
116.28 & 5813.09 & 98.81 & 8481.26 & 206.75 & 51.27 & -1.38 & 128.23 & 6592.41 \\
4.73 & -226.46 & 2.92 & 206.75 & 10.55 & 1.44 & -0.14 & 3.37 & 82.82 \\
1.23 & 272.78 & 1.49 & 51.27 & 1.44 & 0.85 & -0.02 & 1.47 & 43.74 \\
-0.17 & 15.24 & -0.05 & -1.38 & -0.14 & -0.02 & 0.01 & -0.05 & 2.38 \\
3.00 & 480.56 & 2.94 & 128.23 & 3.37 & 1.47 & -0.05 & 3.97 & 145.35 \\
46.32 & 25308.44 & 81.72 & 6592.41 & 82.82 & 43.74 & 2.38 & 145.35 & 16628.94
\end{bmatrix}$$

[Figure: scatter plot matrix of Breed, Frame, FtFrBody, BkFat, and SaleHt]

1.27 (a) Correlation $r = .173$

[Figure: scatterplot of Size vs Visitors, with Great Smoky labeled]

(b) Great Smoky is an unusual park. The correlation with this park removed is $r = .391$. This single point has a reasonably large effect on the correlation, reducing the positive correlation by more than half when added to the national park data set.

(c) The correlation coefficient is a dimensionless measure of association.
The correlation in (b) would not change if size were measured in square miles instead of acres.

Chapter 2

2.1 a) [Figure: graph of the vectors]

b) i) $L_x = \sqrt{\mathbf{x}'\mathbf{x}} = \sqrt{35} = 5.916$

ii) $\cos(\theta) = \dfrac{\mathbf{x}'\mathbf{y}}{L_x L_y} = \dfrac{1}{19.621} = .051$, so $\theta = \arccos(.051) \approx 87^\circ$

iii) the projection of $\mathbf{y}$ on $\mathbf{x}$ is $\dfrac{\mathbf{y}'\mathbf{x}}{\mathbf{x}'\mathbf{x}}\,\mathbf{x} = \dfrac{1}{35}\,\mathbf{x} = \left[\tfrac{1}{7},\ \tfrac{1}{35},\ \tfrac{3}{35}\right]'$

c) [Figure: graph of the vectors]

2.2 a) to d) [matrix entries illegible in this copy] e) No.

2.3 a) $A' = A$, so $(A')' = A' = A$.

b), c) [matrix entries illegible in this copy]

d) $AB$ has $(i,j)$th entry

$$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj} = \sum_{\ell=1}^{k} a_{i\ell}b_{\ell j}$$

Consequently, $(AB)'$ has $(i,j)$th entry $c_{ji} = \sum_{\ell=1}^{k} a_{j\ell}b_{\ell i}$. Next, $B'$ has $i$th row $[b_{1i}, b_{2i}, \ldots, b_{ki}]$ and $A'$ has $j$th column $[a_{j1}, a_{j2}, \ldots, a_{jk}]'$, so $B'A'$ has $(i,j)$th entry

$$b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{ki}a_{jk} = \sum_{\ell=1}^{k} a_{j\ell}b_{\ell i} = c_{ji}$$

Since $i$ and $j$ were arbitrary choices, $(AB)' = B'A'$.

2.4 a) $I = I'$ and $AA^{-1} = I = A^{-1}A$. Thus $I' = I = (AA^{-1})' = (A^{-1})'A'$ and $I = (A^{-1}A)' = A'(A^{-1})'$. Consequently, $(A^{-1})'$ is the inverse of $A'$, or $(A')^{-1} = (A^{-1})'$.

b) $(B^{-1}A^{-1})AB = B^{-1}(A^{-1}A)B = B^{-1}B = I$, so $AB$ has inverse $(AB)^{-1} = B^{-1}A^{-1}$.
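As a numerical illustration of the identity in 2.4 b), the following minimal sketch checks both the left and the right product in exact arithmetic. The matrices `A` and `B` and the helper names `matmul` and `inverse_2x2` are hypothetical choices made only for this example; they are not taken from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 integer matrix via the adjugate formula (exact)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

# Hypothetical invertible matrices, chosen only for illustration.
A = [[2, 1], [1, 3]]
B = [[1, 2], [3, 4]]

AB = matmul(A, B)
candidate = matmul(inverse_2x2(B), inverse_2x2(A))  # B^{-1} A^{-1}
identity = [[1, 0], [0, 1]]

left_ok = matmul(candidate, AB) == identity   # (B^{-1} A^{-1})(AB) = I
right_ok = matmul(AB, candidate) == identity  # (AB)(B^{-1} A^{-1}) = I
print(left_ok, right_ok)
```

Exact fractions are used so that the comparison with the identity matrix is not affected by floating-point round-off.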
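It was sufficient to check for a left inverse, but we may also verify $AB(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I$.

2.5 $QQ' = \begin{bmatrix} \tfrac{5}{13} & \tfrac{12}{13} \\ -\tfrac{12}{13} & \tfrac{5}{13} \end{bmatrix} \begin{bmatrix} \tfrac{5}{13} & -\tfrac{12}{13} \\ \tfrac{12}{13} & \tfrac{5}{13} \end{bmatrix} = \dfrac{1}{169}\begin{bmatrix} 169 & 0 \\ 0 & 169 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = Q'Q$

2.6 a) Since $A = A'$, $A$ is symmetric.

b) Since the quadratic form

$$\mathbf{x}'A\mathbf{x} = [x_1, x_2]\begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 9x_1^2 - 4x_1x_2 + 6x_2^2 = (2x_1 - x_2)^2 + 5(x_1^2 + x_2^2) > 0$$

for $[x_1, x_2] \ne [0, 0]$, we conclude that $A$ is positive definite.

2.7 a) Eigenvalues: $\lambda_1 = 10$, $\lambda_2 = 5$. Normalized eigenvectors: $\mathbf{e}_1' = [2/\sqrt{5},\ -1/\sqrt{5}] = [.894,\ -.447]$, $\mathbf{e}_2' = [1/\sqrt{5},\ 2/\sqrt{5}] = [.447,\ .894]$

b) $A = \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix} = 10\begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}[2/\sqrt{5},\ -1/\sqrt{5}] + 5\begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}[1/\sqrt{5},\ 2/\sqrt{5}]$

c) $A^{-1} = \dfrac{1}{9(6) - (-2)(-2)}\begin{bmatrix} 6 & 2 \\ 2 & 9 \end{bmatrix} = \begin{bmatrix} .12 & .04 \\ .04 & .18 \end{bmatrix}$

d) Eigenvalues: $\lambda_1 = .2$, $\lambda_2 = .1$. Normalized eigenvectors: $\mathbf{e}_1' = [1/\sqrt{5},\ 2/\sqrt{5}]$, $\mathbf{e}_2' = [2/\sqrt{5},\ -1/\sqrt{5}]$

2.8 Eigenvalues: $\lambda_1 = 2$, $\lambda_2 = -3$. Normalized eigenvectors: $\mathbf{e}_1' = [2/\sqrt{5},\ 1/\sqrt{5}]$, $\mathbf{e}_2' = [1/\sqrt{5},\ -2/\sqrt{5}]$

$$A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} = 2\begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}[2/\sqrt{5},\ 1/\sqrt{5}] - 3\begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}[1/\sqrt{5},\ -2/\sqrt{5}]$$

2.9 a) $A^{-1} = \dfrac{1}{1(-2) - 2(2)}\begin{bmatrix} -2 & -2 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & -\tfrac{1}{6} \end{bmatrix}$

b) Eigenvalues: $\lambda_1 = 1/2$, $\lambda_2 = -1/3$. Normalized eigenvectors: $\mathbf{e}_1' = [2/\sqrt{5},\ 1/\sqrt{5}]$, $\mathbf{e}_2' = [1/\sqrt{5},\ -2/\sqrt{5}]$

c) $A^{-1} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & -\tfrac{1}{6} \end{bmatrix} = \dfrac{1}{2}\begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}[2/\sqrt{5},\ 1/\sqrt{5}] - \dfrac{1}{3}\begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}[1/\sqrt{5},\ -2/\sqrt{5}]$

2.10 $B^{-1} = \dfrac{1}{4(4.002001) - (4.001)^2}\begin{bmatrix} 4.002001 & -4.001 \\ -4.001 & 4 \end{bmatrix} = 333{,}333\begin{bmatrix} 4.002001 & -4.001 \\ -4.001 & 4 \end{bmatrix}$

$A^{-1} = \dfrac{1}{4(4.002) - (4.001)^2}\begin{bmatrix} 4.002 & -4.001 \\ -4.001 & 4 \end{bmatrix} = -1{,}000{,}000\begin{bmatrix} 4.002 & -4.001 \\ -4.001 & 4 \end{bmatrix}$

Thus $A^{-1} \approx (-3)B^{-1}$.

2.11 With $p = 1$, $|a_{11}| = a_{11}$, and with $p = 2$,

$$\begin{vmatrix} a_{11} & 0 \\ 0 & a_{22} \end{vmatrix} = a_{11}a_{22} - 0(0) = a_{11}a_{22}$$

Proceeding by induction, we assume the result holds for any $(p-1) \times (p-1)$ diagonal matrix $A_{11}$. Then writing

$$\underset{(p \times p)}{A} = \begin{bmatrix} a_{11} & \mathbf{0}' \\ \mathbf{0} & A_{11} \end{bmatrix}$$

we expand $|A|$ according to Definition 2A.24 to find $|A| = a_{11}|A_{11}| + 0 + \cdots + 0$. Since $|A_{11}| = a_{22}a_{33}\cdots a_{pp}$ by the induction hypothesis, $|A| = a_{11}(a_{22}a_{33}\cdots a_{pp}) = a_{11}a_{22}a_{33}\cdots a_{pp}$.

2.12 By (2-20), $A = P\Lambda P'$ with $PP' = P'P = I$. From Result 2A.11(e), $|A| = |P|\,|\Lambda|\,|P'| = |\Lambda|$.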
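Since $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, we can apply Exercise 2.11 to get $|A| = |\Lambda| = \prod_{i=1}^{p}\lambda_i$.

2.14 Let $\lambda$ be an eigenvalue of $A$. Thus $0 = |A - \lambda I|$. If $Q$ is orthogonal, $QQ' = I$ and $|Q|\,|Q'| = 1$ by Exercise 2.13. Using Result 2A.11(e) we can then write

$$0 = |Q|\,|A - \lambda I|\,|Q'| = |QAQ' - \lambda I|$$

and it follows that $\lambda$ is also an eigenvalue of $QAQ'$ if $Q$ is orthogonal.

2.16 $(A'A)' = A'(A')' = A'A$, showing $A'A$ is symmetric. Let $\mathbf{y} = [y_1, y_2, \ldots, y_p]' = A\mathbf{x}$. Then $0 \le y_1^2 + y_2^2 + \cdots + y_p^2 = \mathbf{y}'\mathbf{y} = \mathbf{x}'A'A\mathbf{x}$ and $A'A$ is non-negative definite by definition.

2.18 Write $c^2 = \mathbf{x}'A\mathbf{x}$ with $A = \begin{bmatrix} 4 & -\sqrt{2} \\ -\sqrt{2} & 3 \end{bmatrix}$. The eigenvalue-normalized eigenvector pairs for $A$ are:

$\lambda_1 = 2$, $\mathbf{e}_1 = [.577,\ .816]'$
$\lambda_2 = 5$, $\mathbf{e}_2 = [.816,\ -.577]'$

For $c^2 = 1$, the half lengths of the major and minor axes of the ellipse of constant distance are

$$\frac{c}{\sqrt{\lambda_1}} = \frac{1}{\sqrt{2}} = .707 \quad \text{and} \quad \frac{c}{\sqrt{\lambda_2}} = \frac{1}{\sqrt{5}} = .447$$

respectively. These axes lie in the directions of the vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ respectively.

For $c^2 = 4$, the half lengths of the major and minor axes are

$$\frac{c}{\sqrt{\lambda_1}} = \frac{2}{\sqrt{2}} = 1.414 \quad \text{and} \quad \frac{c}{\sqrt{\lambda_2}} = \frac{2}{\sqrt{5}} = .894$$

As $c^2$ increases the lengths of the major and minor axes increase.

2.20 Using matrix $A$ in Exercise 2.3, we determine

$\lambda_1 = 1.382$, $\mathbf{e}_1 = [.8507,\ -.5257]'$
$\lambda_2 = 3.618$, $\mathbf{e}_2 = [.5257,\ .8507]'$

We know

$$A^{1/2} = \sqrt{\lambda_1}\,\mathbf{e}_1\mathbf{e}_1' + \sqrt{\lambda_2}\,\mathbf{e}_2\mathbf{e}_2' = \begin{bmatrix} 1.376 & .325 \\ .325 & 1.701 \end{bmatrix}$$

$$A^{-1/2} = \frac{1}{\sqrt{\lambda_1}}\,\mathbf{e}_1\mathbf{e}_1' + \frac{1}{\sqrt{\lambda_2}}\,\mathbf{e}_2\mathbf{e}_2' = \begin{bmatrix} .7608 & -.1453 \\ -.1453 & .6155 \end{bmatrix}$$

We check $A^{1/2}A^{-1/2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = A^{-1/2}A^{1/2}$.

2.21 (a) $A'A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & -2 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix}$

$0 = |A'A - \lambda I| = (9 - \lambda)^2 - 1 = (10 - \lambda)(8 - \lambda)$, so $\lambda_1 = 10$ and $\lambda_2 = 8$. Next,

$\begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = 10\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$ gives $\mathbf{e}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$

$\begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = 8\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$ gives $\mathbf{e}_2 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$

(b) $AA' = \begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1 & 2 & 2 \\ 1 & -2 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 4 \\ 0 & 8 & 0 \\ 4 & 0 & 8 \end{bmatrix}$

$$0 = |AA' - \lambda I| = \begin{vmatrix} 2 - \lambda & 0 & 4 \\ 0 & 8 - \lambda & 0 \\ 4 & 0 & 8 - \lambda \end{vmatrix} = (2 - \lambda)(8 - \lambda)^2 - 4^2(8 - \lambda) = (8 - \lambda)(\lambda - 10)\lambda$$

so $\lambda_1 = 10$, $\lambda_2 = 8$, and $\lambda_3 = 0$.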
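The eigenvalues found in 2.21 can be checked by substituting them into the characteristic determinant. The sketch below does this with plain integer arithmetic. It assumes the matrix `A` implied by the products $A'A$ and $AA'$ in this solution; the helper names `matmul`, `transpose`, `det`, and `char_poly_at` are illustrative only.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def char_poly_at(M, lam):
    """Evaluate |M - lam * I| for a square matrix M."""
    n = len(M)
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return det(shifted)

# Assumed matrix A, consistent with the products A'A and AA' shown in 2.21.
A = [[1, 1], [2, -2], [2, 2]]
AtA = matmul(transpose(A), A)
AAt = matmul(A, transpose(A))

print(AtA)  # [[9, 1], [1, 9]]
print(AAt)  # [[2, 0, 4], [0, 8, 0], [4, 0, 8]]
print([char_poly_at(AtA, lam) for lam in (10, 8)])     # [0, 0]
print([char_poly_at(AAt, lam) for lam in (10, 8, 0)])  # [0, 0, 0]
```

Each stated eigenvalue makes the determinant vanish, which is exactly the condition $|M - \lambda I| = 0$ used in the solution. The nonzero eigenvalues of $A'A$ and $AA'$ coincide, as expected.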
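$\begin{bmatrix} 2 & 0 & 4 \\ 0 & 8 & 0 \\ 4 & 0 & 8 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 10\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$ gives $4e_3 = 8e_1$ and $8e_2 = 10e_2$, so $\mathbf{e}_1 = \dfrac{1}{\sqrt{5}}\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$

$\begin{bmatrix} 2 & 0 & 4 \\ 0 & 8 & 0 \\ 4 & 0 & 8 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 8\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$ gives $4e_3 = 6e_1$ and $4e_1 = 0$, so $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$

Also, $\mathbf{e}_3 = [-2/\sqrt{5},\ 0,\ 1/\sqrt{5}]'$.

(c) $\begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{bmatrix} = \sqrt{10}\begin{bmatrix} 1/\sqrt{5} \\ 0 \\ 2/\sqrt{5} \end{bmatrix}\left[\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right] + \sqrt{8}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\left[\tfrac{1}{\sqrt{2}},\ -\tfrac{1}{\sqrt{2}}\right]$

2.22 (a) $AA' = \begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix}\begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix} = \begin{bmatrix} 144 & -12 \\ -12 & 126 \end{bmatrix}$

$0 = |AA' - \lambda I| = (144 - \lambda)(126 - \lambda) - (12)^2 = (150 - \lambda)(120 - \lambda)$, so $\lambda_1 = 150$ and $\lambda_2 = 120$. Next,

$\begin{bmatrix} 144 & -12 \\ -12 & 126 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = 150\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$ gives $\mathbf{e}_1 = \begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}$

and $\lambda_2 = 120$ gives $\mathbf{e}_2 = [1/\sqrt{5},\ 2/\sqrt{5}]'$.

(b) $A'A = \begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix}\begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix} = \begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}$

$$0 = |A'A - \lambda I| = \begin{vmatrix} 25 - \lambda & 50 & 5 \\ 50 & 100 - \lambda & 10 \\ 5 & 10 & 145 - \lambda \end{vmatrix} = (150 - \lambda)(\lambda - 120)\lambda$$

so $\lambda_1 = 150$, $\lambda_2 = 120$, and $\lambda_3 = 0$. Next,

$\begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 150\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$ gives $-120e_1 + 60e_2 = 0$ and $-25e_1 + 5e_3 = 0$, or $\mathbf{e}_1 = \dfrac{1}{\sqrt{30}}\begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix}$

$\begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 120\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$ gives $\mathbf{e}_2 = \dfrac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$

Also, $\mathbf{e}_3 = [2/\sqrt{5},\ -1/\sqrt{5},\ 0]'$.

(c) $\begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix} = \sqrt{150}\begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}\left[\tfrac{1}{\sqrt{30}},\ \tfrac{2}{\sqrt{30}},\ \tfrac{5}{\sqrt{30}}\right] + \sqrt{120}\begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}\left[\tfrac{1}{\sqrt{6}},\ \tfrac{2}{\sqrt{6}},\ -\tfrac{1}{\sqrt{6}}\right]$

2.24 a) $\Sigma^{-1} = \begin{bmatrix} \tfrac{1}{4} & 0 & 0 \\ 0 & \tfrac{1}{9} & 0 \\ 0 & 0 & 1 \end{bmatrix}$

b) $\lambda_1 = 4$, $\mathbf{e}_1 = [1, 0, 0]'$; $\lambda_2 = 9$, $\mathbf{e}_2 = [0, 1, 0]'$; $\lambda_3 = 1$, $\mathbf{e}_3 = [0, 0, 1]'$

c) For $\Sigma^{-1}$: $\lambda_1 = 1/4$, $\mathbf{e}_1 = [1, 0, 0]'$; $\lambda_2 = 1/9$, $\mathbf{e}_2 = [0, 1, 0]'$; $\lambda_3 = 1$, $\mathbf{e}_3 = [0, 0, 1]'$

2.25 a) $V^{1/2} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, $\boldsymbol{\rho} = \begin{bmatrix} 1 & -1/5 & 4/15 \\ -1/5 & 1 & 1/6 \\ 4/15 & 1/6 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -.2 & .267 \\ -.2 & 1 & .167 \\ .267 & .167 & 1 \end{bmatrix}$

b) $V^{1/2}\boldsymbol{\rho}V^{1/2} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 1 & -1/5 & 4/15 \\ -1/5 & 1 & 1/6 \\ 4/15 & 1/6 & 1 \end{bmatrix}\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 5 & -1 & 4/3 \\ -2/5 & 2 & 1/3 \\ 4/5 & 1/2 & 3 \end{bmatrix}\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 25 & -2 & 4 \\ -2 & 4 & 1 \\ 4 & 1 & 9 \end{bmatrix} = \Sigma$

2.26 a) $\rho_{13} = \sigma_{13}/(\sigma_{11}^{1/2}\sigma_{33}^{1/2}) = 4/(\sqrt{25}\sqrt{9}) = 4/15 = .267$

b) Write $X_1 = 1 \cdot X_1 + 0 \cdot X_2 + 0 \cdot X_3 = \mathbf{c}_1'\mathbf{X}$ with $\mathbf{c}_1' = [1, 0, 0]$, and $\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3 = \mathbf{c}_2'\mathbf{X}$ with $\mathbf{c}_2' = [0, \tfrac{1}{2}, \tfrac{1}{2}]$. Then $\mathrm{Var}(X_1) = \sigma_{11} = 25$.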
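By (2-43),

$$\mathrm{Var}\left(\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \mathbf{c}_2'\Sigma\mathbf{c}_2 = \tfrac{1}{4}\sigma_{22} + \tfrac{2}{4}\sigma_{23} + \tfrac{1}{4}\sigma_{33} = 1 + \tfrac{1}{2} + \tfrac{9}{4} = \tfrac{15}{4} = 3.75$$

By (2-45) (see also the hint to Exercise 2.28),

$$\mathrm{Cov}\left(X_1,\ \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \mathbf{c}_1'\Sigma\mathbf{c}_2 = \tfrac{1}{2}\sigma_{12} + \tfrac{1}{2}\sigma_{13} = -1 + 2 = 1$$

so

$$\mathrm{Corr}\left(X_1,\ \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \frac{\mathrm{Cov}\left(X_1,\ \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right)}{\sqrt{\mathrm{Var}(X_1)}\sqrt{\mathrm{Var}\left(\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right)}} = \frac{1}{5\sqrt{3.75}} = .103$$

2.27 a) $\mu_1 - 2\mu_2$, $\sigma_{11} + 4\sigma_{22} - 4\sigma_{12}$
b) $-\mu_1 + 3\mu_2$, $\sigma_{11} + 9\sigma_{22} - 6\sigma_{12}$
c) $\mu_1 + \mu_2 + \mu_3$, $\sigma_{11} + \sigma_{22} + \sigma_{33} + 2\sigma_{12} + 2\sigma_{13} + 2\sigma_{23}$
d) $\mu_1 + 2\mu_2 - \mu_3$, $\sigma_{11} + 4\sigma_{22} + \sigma_{33} + 4\sigma_{12} - 2\sigma_{13} - 4\sigma_{23}$
e) $3\mu_1 - 4\mu_2$, $9\sigma_{11} + 16\sigma_{22}$ since $\sigma_{12} = 0$.

2.31 (a) $E[\mathbf{X}^{(1)}] = \boldsymbol{\mu}^{(1)}$
(b) $A\boldsymbol{\mu}^{(1)} = 1$
(c) $\mathrm{Cov}(\mathbf{X}^{(1)}) = \Sigma_{11}$
(d) $\mathrm{Cov}(A\mathbf{X}^{(1)}) = A\Sigma_{11}A' = 4$
(e) $E[\mathbf{X}^{(2)}] = \boldsymbol{\mu}^{(2)}$
(f) $B\boldsymbol{\mu}^{(2)}$
(g) $\mathrm{Cov}(\mathbf{X}^{(2)}) = \Sigma_{22}$
(h) $\mathrm{Cov}(B\mathbf{X}^{(2)}) = B\Sigma_{22}B'$
(i) $\mathrm{Cov}(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = \Sigma_{12}$
(j) $\mathrm{Cov}(A\mathbf{X}^{(1)}, B\mathbf{X}^{(2)}) = A\Sigma_{12}B'$
[The numerical vectors and matrices in parts (a), (c), and (e) to (j) are illegible in this copy.]

2.32 Parts (a) to (j) have the same form as in 2.31: $E[\mathbf{X}^{(1)}] = \boldsymbol{\mu}^{(1)}$, $A\boldsymbol{\mu}^{(1)}$, $\mathrm{Cov}(\mathbf{X}^{(1)}) = \Sigma_{11}$, $\mathrm{Cov}(A\mathbf{X}^{(1)}) = A\Sigma_{11}A'$, $E[\mathbf{X}^{(2)}] = \boldsymbol{\mu}^{(2)}$, $B\boldsymbol{\mu}^{(2)}$, $\mathrm{Cov}(\mathbf{X}^{(2)}) = \Sigma_{22}$, $\mathrm{Cov}(B\mathbf{X}^{(2)}) = B\Sigma_{22}B'$, $\mathrm{Cov}(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = \Sigma_{12}$, $\mathrm{Cov}(A\mathbf{X}^{(1)}, B\mathbf{X}^{(2)}) = A\Sigma_{12}B'$. [Numerical entries illegible in this copy.]

2.33 Parts (a) to (j) have the same form as in 2.31, with $\mathrm{Cov}(A\mathbf{X}^{(1)}) = A\Sigma_{11}A'$, $\mathrm{Cov}(B\mathbf{X}^{(2)}) = B\Sigma_{22}B'$, and $\mathrm{Cov}(A\mathbf{X}^{(1)}, B\mathbf{X}^{(2)}) = A\Sigma_{12}B'$. [Numerical entries illegible in this copy.]

2.34 $\mathbf{b}'\mathbf{b} = 4 + 1 + 16 + 0 = 21$, $\mathbf{d}'\mathbf{d} = 15$ and $\mathbf{b}'\mathbf{d} = -2 - 3 - 8 + 0 = -13$, so $(\mathbf{b}'\mathbf{d})^2 = 169 \le 21(15) = 315$.

2.35 $\mathbf{b}'\mathbf{d} = -4 + 3 = -1$

$\mathbf{b}'B\mathbf{b} = [-4,\ 3]\begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix}\begin{bmatrix} -4 \\ 3 \end{bmatrix} = [-14,\ 23]\begin{bmatrix} -4 \\ 3 \end{bmatrix} = 125$

$\mathbf{d}'B^{-1}\mathbf{d} = [1,\ 1]\begin{bmatrix} 5/6 & 2/6 \\ 2/6 & 2/6 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 11/6$

so $1 = (\mathbf{b}'\mathbf{d})^2 \le 125(11/6) = 229.17$.

2.36 $4x_1^2 + 4x_2^2 + 6x_1x_2 = \mathbf{x}'A\mathbf{x}$ where $A = \begin{bmatrix} 4 & 3 \\ 3 & 4 \end{bmatrix}$. $(4 - \lambda)^2 - 3^2 = 0$ gives $\lambda_1 = 7$, $\lambda_2 = 1$. Hence the maximum is 7 and the minimum is 1.

2.37 From (2-51),

$$\max_{\mathbf{x}'\mathbf{x} = 1}\mathbf{x}'A\mathbf{x} = \max_{\mathbf{x} \ne \mathbf{0}}\frac{\mathbf{x}'A\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_1$$

where $\lambda_1$ is the largest eigenvalue of $A$. For $A$ given in Exercise 2.6, we have from Exercise 2.7, $\lambda_1 = 10$ and $\mathbf{e}_1' = [.894,\ -.447]$. Therefore $\max_{\mathbf{x}'\mathbf{x} = 1}\mathbf{x}'A\mathbf{x} = 10$ and this maximum is attained for $\mathbf{x} = \mathbf{e}_1$.

2.38 Using computer, $\lambda_1 = 18$, $\lambda_2 = 9$, $\lambda_3 = 9$. Hence the maximum is 18 and the minimum is 9.

2.41 (a) $E(A\mathbf{X}) = AE(\mathbf{X}) = A\boldsymbol{\mu}_X$ [vector illegible in this copy]

(b) $\mathrm{Cov}(A\mathbf{X}) = A\,\mathrm{Cov}(\mathbf{X})A' = A\Sigma_X A' = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 18 & 0 \\ 0 & 0 & 36 \end{bmatrix}$

(c) All pairs of linear combinations have zero covariances.

2.42 (a) $E(A\mathbf{X}) = AE(\mathbf{X}) = A\boldsymbol{\mu}_X$ [vector illegible in this copy]

(b) $\mathrm{Cov}(A\mathbf{X}) = A\,\mathrm{Cov}(\mathbf{X})A' = A\Sigma_X A' = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 24 \end{bmatrix}$

(c) All pairs of linear combinations have zero covariances.

Chapter 3

3.1 a) $\bar{\mathbf{x}} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$

b) $\mathbf{e}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = [4,\ 0,\ -4]'$, $\mathbf{e}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = [-1,\ 1,\ 0]'$

c) $L_{\mathbf{e}_1} = \sqrt{32}$; $L_{\mathbf{e}_2} = \sqrt{2}$. Let $\theta$ be the angle between $\mathbf{e}_1$ and $\mathbf{e}_2$; then $\cos(\theta) = -4/\sqrt{32 \times 2} = -.5$. Therefore $n s_{11} = L_{\mathbf{e}_1}^2$ or $s_{11} = 32/3$; $n s_{22} = L_{\mathbf{e}_2}^2$ or $s_{22} = 2/3$; $n s_{12} = \mathbf{e}_1'\mathbf{e}_2$ or $s_{12} = -4/3$. Also, $r_{12} = \cos(\theta) = -.5$. Consequently

$$S_n = \begin{bmatrix} 32/3 & -4/3 \\ -4/3 & 2/3 \end{bmatrix} \quad \text{and} \quad R = \begin{bmatrix} 1 & -.5 \\ -.5 & 1 \end{bmatrix}$$

3.2 a) $\bar{\mathbf{x}} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$

b) $\mathbf{e}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = [-1,\ 2,\ -1]'$, $\mathbf{e}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = [3,\ -3,\ 0]'$

c) $L_{\mathbf{e}_1} = \sqrt{6}$; $L_{\mathbf{e}_2} = \sqrt{18}$. Let $\theta$ be the angle between $\mathbf{e}_1$ and $\mathbf{e}_2$; then $\cos(\theta) = -9/\sqrt{6 \times 18} = -.866$. Therefore $n s_{11} = L_{\mathbf{e}_1}^2$ or $s_{11} = 6/3 = 2$; $n s_{22} = L_{\mathbf{e}_2}^2$ or $s_{22} = 18/3 = 6$; $n s_{12} = \mathbf{e}_1'\mathbf{e}_2$ or $s_{12} = -9/3 = -3$. Also, $r_{12} = \cos(\theta) = -.866$. Consequently

$$S_n = \begin{bmatrix} 2 & -3 \\ -3 & 6 \end{bmatrix} \quad \text{and} \quad R = \begin{bmatrix} 1 & -.866 \\ -.866 & 1 \end{bmatrix}$$

3.3 $\mathbf{y}_1 = [1,\ 4,\ 4]'$ and $\bar{x}_1\mathbf{1} = [3,\ 3,\ 3]'$, so $\mathbf{y}_1 - \bar{x}_1\mathbf{1} = [-2,\ 1,\ 1]'$. Thus

$$\mathbf{y}_1 = \begin{bmatrix} 1 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} + \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} = \bar{x}_1\mathbf{1} + (\mathbf{y}_1 - \bar{x}_1\mathbf{1})$$

3.5 a) $X - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} 4 & -1 \\ 0 & 1 \\ -4 & 0 \end{bmatrix}$

$2S = (X - \mathbf{1}\bar{\mathbf{x}}')'(X - \mathbf{1}\bar{\mathbf{x}}') = \begin{bmatrix} 32 & -4 \\ -4 & 2 \end{bmatrix}$, so $S = \begin{bmatrix} 16 & -2 \\ -2 & 1 \end{bmatrix}$ and $|S| = 12$.

b) $X - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} -1 & 3 \\ 2 & -3 \\ -1 & 0 \end{bmatrix}$

$2S = (X - \mathbf{1}\bar{\mathbf{x}}')'(X - \mathbf{1}\bar{\mathbf{x}}') = \begin{bmatrix} 6 & -9 \\ -9 & 18 \end{bmatrix}$, so $S = \begin{bmatrix} 3 & -9/2 \\ -9/2 & 9 \end{bmatrix}$ and $|S| = 27/4$.

3.6 a) $X - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} -3 & 0 & -3 \\ 0 & 1 & 1 \\ 3 & -1 & 2 \end{bmatrix}$. Thus $\mathbf{d}_1' = [-3,\ 0,\ 3]$, $\mathbf{d}_2' = [0,\ 1,\ -1]$ and $\mathbf{d}_3' = [-3,\ 1,\ 2]$. Since $\mathbf{d}_1 + \mathbf{d}_2 = \mathbf{d}_3$, the matrix of deviations is not of full rank.

b) $2S = (X - \mathbf{1}\bar{\mathbf{x}}')'(X - \mathbf{1}\bar{\mathbf{x}}') = \begin{bmatrix} 18 & -3 & 15 \\ -3 & 2 & -1 \\ 15 & -1 & 14 \end{bmatrix}$, so $S = \begin{bmatrix} 9 & -3/2 & 15/2 \\ -3/2 & 1 & -1/2 \\ 15/2 & -1/2 & 7 \end{bmatrix}$

$|S| = 0$ (verify). The 3 deviation vectors lie in a 2-dimensional subspace. The 3-dimensional volume enclosed by the deviation vectors is zero.

c) Total sample variance $= 9 + 1 + 7 = 17$.

3.7 All ellipses are centered at $\bar{\mathbf{x}}$.

i) For $S = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}$, $S^{-1} = \begin{bmatrix} 5/9 & -4/9 \\ -4/9 & 5/9 \end{bmatrix}$

Eigenvalue-normalized eigenvector pairs for $S^{-1}$ are:

$\lambda_1 = 1$, $\mathbf{e}_1' = [.707,\ -.707]$
$\lambda_2 = 1/9$, $\mathbf{e}_2' = [.707,\ .707]$

Half lengths of the axes of the ellipse $(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le 1$ are $1/\sqrt{\lambda_1} = 1$ and $1/\sqrt{\lambda_2} = 3$ respectively. The major axis of the ellipse lies in the direction of $\mathbf{e}_2$; the minor axis lies in the direction of $\mathbf{e}_1$.

ii) For $S = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}$, $S^{-1} = \begin{bmatrix} 5/9 & 4/9 \\ 4/9 & 5/9 \end{bmatrix}$

Eigenvalue-normalized eigenvector pairs for $S^{-1}$ are:

$\lambda_1 = 1$, $\mathbf{e}_1' = [.707,\ .707]$
$\lambda_2 = 1/9$, $\mathbf{e}_2' = [.707,\ -.707]$

Half lengths of the axes of the ellipse $(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le 1$ are, again, $1/\sqrt{\lambda_1} = 1$ and $1/\sqrt{\lambda_2} = 3$. The major axis of the ellipse lies in the direction of $\mathbf{e}_2$; the minor axis lies in the direction of $\mathbf{e}_1$. Note that $\mathbf{e}_2$ here is $\mathbf{e}_1$ in part (i) above and $\mathbf{e}_1$ here is $\mathbf{e}_2$ in part (i) above.

iii) For $S = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$, $S^{-1} = \begin{bmatrix} 1/3 & 0 \\ 0 & 1/3 \end{bmatrix}$

Eigenvalue-normalized eigenvector pairs for $S^{-1}$ are:

$\lambda_1 = 1/3$, $\mathbf{e}_1' = [1,\ 0]$
$\lambda_2 = 1/3$, $\mathbf{e}_2' = [0,\ 1]$

Half lengths of the axes of the ellipse $(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le 1$ are equal and given by $1/\sqrt{\lambda_1} = 1/\sqrt{\lambda_2} = \sqrt{3}$. The major and minor axes of the ellipse can be taken to lie in the directions of the coordinate axes. Here, the solid ellipse is, in fact, a solid sphere.

Notice for all three cases $|S| = 9$.

3.8 a) Total sample variance in both cases is 3.

b) For $S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $|S| = 1$.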
-1/2 For S =(-1~2 -1/2 1 - l/2 -1/2J -1/2 , 1 Isl = 0 48 3.9 (8) Vve calculate æ = (16,18,34 l and -4 -1 -5 2 2 4 Xc= -2 -2 -4 4 0 0 4 1 1 and we notice coh( Xc)+ coh( Xc) = cOli( Xc) so a = fl, 1, -1 J' gives Xca = O. (b) S = 1~ 2.~ 5~~ so S = -(13)2(2.5) _ 9(18.5) -55(5.5) = 0 13 5.5 18.5 " ( J I I 10(2.5)(18.5) + 39(15.5) + 39(15.5) As above in a) Sa = ( 3 ~ ;53 -= 5~~ J - ¡o~ J 13 + 5.5 - 18.5 ( c) Check. 3.10 (a) VVe calculate æ = (5,2,3 J' and -2 -1 -3 1 Xc= -1 2 0 2 0 -2 1 3 -1 and we notice coh( Xc)+ ~012( Xc) = cOli( Xc) 0 1 so a = iI, 1, -1 J' giv.es Xca = O. ~b) S =. 0 2.5 2.5 soI-S-(2.5)3 _ 0 -+(2.5)3 I - 5(2.5)2 0 + 0 = i) ( 2.52.5 2.5 .0 2.55 J Using the save coeffient vector a as in Part a) Sa = O. 49 (c:) Setting Xa = 0, 3ai + a2 = 0 7ai + 3ag = 0 so 5ai + 3a2 + 4ag = 0 ai 5ai g -"jag 3(3ai) + 4ag = 0 so we must have ai = as = 0 but then, by the first equation in the fil"t set, a2 = O. The columns of the data matrix are linearly independent. 1 4213 J 3.11 Con~equently S = 14213 i14808 15538 o) 09:70) ; 01/2 = (121 ~6881 R = 124 .6515 ( 09:70 and 0-1/2 = (" 0:82 00:0 J The relationships R = 0-1/2 S 0-1/2 and S = 0'12 R 01/2 can now be verifi ed by direct matrix multiplication. 50 3.14. a) From fi rst pri nciples we hav.e f ~l · (2 3) (~J' 21 - Similarly Ë' ~2 = 19 and Ë' ~3 = 8 so sample mean = 2l+19+8 = 16 3 sampl~ vari ance = (21_16)1+(19-16)2+(8-16)2 = 49 2 I Also :' ~1 · (-1 2) (~J = -7; C -2 X= _ 1 and :' ~3 = 3 so sampl e mean = -1 sampl~ variance = 28 Finally sample covariance = (21-16)(-7+1)+(19-16)(1+1)+(8-16)(3+1) = -2.8. b) ~-I= (5 . 2) Using (3-36) and S · ( ~: -12 J 51 sample mean of b' X =~' ~. (2 3) (:1 = 16 sample mean of :' ~ = (-1 2) (:1.-1 sample variance of b' X · ~' S~ · (2 3) e: -121(: 1 = 49 sample variance of C' X = :' S:.' (-1 21 C: -121 (";1 · .28 '" sample covariance of .. b' X.. and.. c' X :b'Sc=(23) " . =-28 - -, l6 -2-2J1 (-11 2 Resul ts same as those ; n part (a). 3.15 -2.5 E · (;1. 
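$S = \begin{pmatrix} 13 & -2.5 & 1.5 \\ -2.5 & 1 & -1.5 \\ 1.5 & -1.5 & 3 \end{pmatrix}$
sample mean of $\mathbf{b}'\mathbf{X}$ = 12
sample mean of $\mathbf{c}'\mathbf{X}$ = -1
sample variance of $\mathbf{b}'\mathbf{X}$ = 12
sample variance of $\mathbf{c}'\mathbf{X}$ = 43
sample covariance of $\mathbf{b}'\mathbf{X}$ and $\mathbf{c}'\mathbf{X}$ = -3

3.16 Since
$\Sigma_V = E(V - \mu_V)(V - \mu_V)' = E(VV' - V\mu_V' - \mu_V V' + \mu_V\mu_V') = E(VV') - E(V)\mu_V' - \mu_V E(V') + \mu_V\mu_V' = E(VV') - \mu_V\mu_V' - \mu_V\mu_V' + \mu_V\mu_V' = E(VV') - \mu_V\mu_V'$,
we have $E(VV') = \Sigma_V + \mu_V\mu_V'$.

3.18 (a) Let $y = x_1 + x_2 + x_3 + x_4$ be the total energy consumption. Then
$\bar{y} = (1\ 1\ 1\ 1)\bar{\mathbf{x}} = 1.873$, $s_y^2 = (1\ 1\ 1\ 1)\,S\,(1\ 1\ 1\ 1)' = 3.913$.

(b) Let $y = x_1 - x_2$ be the excess of petroleum consumption over natural gas consumption. Then
$\bar{y} = (1\ {-1}\ 0\ 0)\bar{\mathbf{x}} = .258$, $s_y^2 = (1\ {-1}\ 0\ 0)\,S\,(1\ {-1}\ 0\ 0)' = .154$.

Chapter 4

4.1 (a) We are given $p = 2$, $\mu = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 2 & -.8\sqrt{2} \\ -.8\sqrt{2} & 1 \end{pmatrix}$, so $|\Sigma| = .72$ and $\Sigma^{-1} = \dfrac{1}{.72}\begin{pmatrix} 1 & .8\sqrt{2} \\ .8\sqrt{2} & 2 \end{pmatrix}$. Then
$f(\mathbf{x}) = \dfrac{1}{(2\pi)\sqrt{.72}} \exp\left( -\dfrac{1}{2}\left( \dfrac{1}{.72}(x_1 - 1)^2 + \dfrac{2\sqrt{2}}{.9}(x_1 - 1)(x_2 - 3) + \dfrac{2}{.72}(x_2 - 3)^2 \right) \right)$

(b) $\dfrac{1}{.72}(x_1 - 1)^2 + \dfrac{2\sqrt{2}}{.9}(x_1 - 1)(x_2 - 3) + \dfrac{2}{.72}(x_2 - 3)^2$

4.2 (a) We are given $p = 2$, $\mu = \begin{pmatrix} 0 \\ 2 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 2 & \sqrt{2}/2 \\ \sqrt{2}/2 & 1 \end{pmatrix}$, so $|\Sigma| = 3/2$ and $\Sigma^{-1} = \begin{pmatrix} 2/3 & -\sqrt{2}/3 \\ -\sqrt{2}/3 & 4/3 \end{pmatrix}$. Then
$f(\mathbf{x}) = \dfrac{1}{(2\pi)\sqrt{3/2}} \exp\left( -\dfrac{1}{2}\left( \dfrac{2}{3}x_1^2 - \dfrac{2\sqrt{2}}{3}x_1(x_2 - 2) + \dfrac{4}{3}(x_2 - 2)^2 \right) \right)$

(b) $\dfrac{2}{3}x_1^2 - \dfrac{2\sqrt{2}}{3}x_1(x_2 - 2) + \dfrac{4}{3}(x_2 - 2)^2$

(c) $c^2 = \chi_2^2(.5) = 1.39$. Ellipse centered at $(0, 2)'$ with the major axis having half-length $\sqrt{\lambda_1}\,c = \sqrt{2.366}\sqrt{1.39} = 1.81$. The major axis lies in the direction $\mathbf{e} = [.888, .460]'$. The minor axis lies in the direction $\mathbf{e} = [-.460, .888]'$ and has half-length $\sqrt{\lambda_2}\,c = \sqrt{.634}\sqrt{1.39} = .94$.

[Figure: Constant density contour that contains 50% of the probability; axes x1 and x2.]

4.3 We apply Result 4.5 that relates zero covariance to statistical independence.
a) No, $\sigma_{12} \ne 0$
b) Yes, $\sigma_{23} = 0$
c) Yes, $\sigma_{13} = \sigma_{23} = 0$
d) Yes, by Result 4.3, $(X_1 + X_2)/2$ and $X_3$ are jointly normal and their covariance is $\tfrac{1}{2}\sigma_{13} + \tfrac{1}{2}\sigma_{23} = 0$.
e) No, by Result 4.3 with $A = \begin{pmatrix} 0 & 1 & 0 \\ -\tfrac{5}{2} & 1 & -1 \end{pmatrix}$, form $A\Sigma A'$ to see that the covariance is 10 and not 0.

4.4 a) $3X_1 - 2X_2 + X_3$ is $N(13, 9)$.
b) Require $\mathrm{Cov}(X_2, X_2 - a_1X_1 - a_3X_3) = 3 - a_1 - 2a_3 = 0$.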
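Thus any $\mathbf{a}' = [a_1, a_3]$ of the form $\mathbf{a}' = [3 - 2a_3, a_3]$ will meet the requirement. As an example, $\mathbf{a}' = [1, 1]$.

4.5 a) $X_1 | x_2$ is $N\left(\tfrac{1}{\sqrt{2}}(x_2 - 2), \tfrac{3}{2}\right)$
b) $X_2 | x_1, x_3$ is $N(-2x_1 - 5, 1)$
c) $X_3 | x_1, x_2$ is $N\left(\tfrac{1}{2}(x_1 + x_2 + 3), \tfrac{1}{2}\right)$

4.6 (a) $X_1$ and $X_2$ are independent since they have a bivariate normal distribution with covariance $\sigma_{12} = 0$.
(b) $X_1$ and $X_3$ are dependent since they have nonzero covariance $\sigma_{13} = -1$.
(c) $X_2$ and $X_3$ are independent since they have a bivariate normal distribution with covariance $\sigma_{23} = 0$.
(d) $(X_1, X_3)$ and $X_2$ are independent since they have a trivariate normal distribution where $\sigma_{12} = 0$ and $\sigma_{32} = 0$.
(e) $X_1$ and $X_1 + 2X_2 - 3X_3$ are dependent since they have nonzero covariance $\sigma_{11} + 2\sigma_{12} - 3\sigma_{13} = 4 + 2(0) - 3(-1) = 7$.

4.7 (a) $X_1 | x_3$ is $N(1 - .5(x_3 - 2), 3.5)$.
(b) $X_1 | x_2, x_3$ is $N(1 - .5(x_3 - 2), 3.5)$. Since $X_2$ is independent of $X_1$, conditioning further on $X_2$ does not change the answer from Part a).

4.16 (a) By Result 4.8, with $c_1 = c_3 = 1/4$, $c_2 = c_4 = -1/4$ and $\mu_j = \mu$ for $j = 1, \ldots, 4$, we have $\sum_{j=1}^{4} c_j\mu_j = \mathbf{0}$ and $\left(\sum_{j=1}^{4} c_j^2\right)\Sigma = \tfrac{1}{4}\Sigma$. Consequently, $V_1$ is $N_p(\mathbf{0}, \tfrac{1}{4}\Sigma)$. Similarly, setting $b_1 = b_2 = 1/4$ and $b_3 = b_4 = -1/4$, we find that $V_2$ is $N_p(\mathbf{0}, \tfrac{1}{4}\Sigma)$.

(b) Again by Result 4.8, we know that $V_1$ and $V_2$ are jointly multivariate normal with covariance
$\left(\sum_{j=1}^{4} b_jc_j\right)\Sigma = \left( \tfrac{1}{4}\left(\tfrac{1}{4}\right) + \tfrac{1}{4}\left(\tfrac{-1}{4}\right) + \tfrac{-1}{4}\left(\tfrac{1}{4}\right) + \tfrac{-1}{4}\left(\tfrac{-1}{4}\right) \right)\Sigma = \mathbf{0}$.
That is, $\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$ is distributed $N_{2p}\left(\mathbf{0}, \begin{pmatrix} \tfrac{1}{4}\Sigma & \mathbf{0} \\ \mathbf{0} & \tfrac{1}{4}\Sigma \end{pmatrix}\right)$, so the joint density of the $2p$ variables is
$f(\mathbf{v}_1, \mathbf{v}_2) = \dfrac{1}{(2\pi)^p\,|\tfrac{1}{4}\Sigma|} \exp\left( -\dfrac{1}{2}[\mathbf{v}_1', \mathbf{v}_2'] \begin{pmatrix} \tfrac{1}{4}\Sigma & \mathbf{0} \\ \mathbf{0} & \tfrac{1}{4}\Sigma \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{v}_1 \\ \mathbf{v}_2 \end{pmatrix} \right) = \dfrac{1}{(2\pi)^p\,|\tfrac{1}{4}\Sigma|} \exp\left( -\dfrac{1}{2}\left( \mathbf{v}_1'(\tfrac{1}{4}\Sigma)^{-1}\mathbf{v}_1 + \mathbf{v}_2'(\tfrac{1}{4}\Sigma)^{-1}\mathbf{v}_2 \right) \right)$

4.17 By Result 4.8, with $c_1 = c_2 = c_3 = c_4 = c_5 = 1/5$ and $\mu_j = \mu$ for $j = 1, \ldots, 5$, we find that $V_1$ has mean $\sum_{j=1}^{5} c_j\mu_j = \mu$ and covariance matrix $\left(\sum_{j=1}^{5} c_j^2\right)\Sigma = \tfrac{1}{5}\Sigma$. Similarly, setting $b_1 = b_3 = b_5 = 1/5$ and $b_2 = b_4 = -1/5$, we find that $V_2$ has mean $\sum_{j=1}^{5} b_j\mu_j = \tfrac{1}{5}\mu$ and covariance matrix $\left(\sum_{j=1}^{5} b_j^2\right)\Sigma = \tfrac{1}{5}\Sigma$.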
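Again by Result 4.8, we know that $V_1$ and $V_2$ have covariance
$\left(\sum_{j=1}^{5} b_jc_j\right)\Sigma = \left( \tfrac{1}{5}\left(\tfrac{1}{5}\right) + \tfrac{-1}{5}\left(\tfrac{1}{5}\right) + \tfrac{1}{5}\left(\tfrac{1}{5}\right) + \tfrac{-1}{5}\left(\tfrac{1}{5}\right) + \tfrac{1}{5}\left(\tfrac{1}{5}\right) \right)\Sigma = \tfrac{1}{25}\Sigma$

4.18 By Result 4.11 we know that the maximum likelihood estimates of $\mu$ and $\Sigma$ are $\bar{\mathbf{x}} = (4, 6)'$ and
$\dfrac{1}{n}\sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' = \begin{pmatrix} 1/2 & 1/4 \\ 1/4 & 3/2 \end{pmatrix}$

4.19 b) From (4-23), $\bar{\mathbf{X}} \sim N_6(\mu, \tfrac{1}{20}\Sigma)$. Then $\bar{\mathbf{X}} - \mu \sim N_6(\mathbf{0}, \tfrac{1}{20}\Sigma)$ and finally $\sqrt{20}(\bar{\mathbf{X}} - \mu) \sim N_6(\mathbf{0}, \Sigma)$.
c) From (4-23), $19S$ has a Wishart distribution with 19 d.f.

4.20 $B(19S)B'$ is a 2x2 matrix distributed as $W_{19}(\cdot\,|\,B\Sigma B')$ with 19 d.f., where
a) $B\Sigma B'$ has
(1,1) entry $= \sigma_{11} + \tfrac{1}{4}\sigma_{22} + \tfrac{1}{4}\sigma_{33} - \sigma_{12} - \sigma_{13} + \tfrac{1}{2}\sigma_{23}$
(1,2) entry $= -\tfrac{1}{2}\sigma_{14} + \tfrac{1}{4}\sigma_{24} + \tfrac{1}{4}\sigma_{34} - \tfrac{1}{2}\sigma_{15} + \tfrac{1}{4}\sigma_{25} + \tfrac{1}{4}\sigma_{35} + \sigma_{16} - \tfrac{1}{2}\sigma_{26} - \tfrac{1}{2}\sigma_{36}$
(2,2) entry $= \sigma_{66} + \tfrac{1}{4}\sigma_{55} + \tfrac{1}{4}\sigma_{44} - \sigma_{46} - \sigma_{56} + \tfrac{1}{2}\sigma_{45}$
b) $B\Sigma B' = \begin{pmatrix} \sigma_{11} & \sigma_{13} \\ \sigma_{31} & \sigma_{33} \end{pmatrix}$

4.21 (a) $\bar{\mathbf{X}}$ is distributed $N_4(\mu, n^{-1}\Sigma)$.
(b) $\mathbf{X}_1 - \mu$ is distributed $N_4(\mathbf{0}, \Sigma)$, so $(\mathbf{X}_1 - \mu)'\Sigma^{-1}(\mathbf{X}_1 - \mu)$ is distributed as chi-square with $p$ degrees of freedom.
(c) Using Part a), $(\bar{\mathbf{X}} - \mu)'(n^{-1}\Sigma)^{-1}(\bar{\mathbf{X}} - \mu) = n(\bar{\mathbf{X}} - \mu)'\Sigma^{-1}(\bar{\mathbf{X}} - \mu)$ is distributed as chi-square with $p$ degrees of freedom.
(d) Approximately distributed as chi-square with $p$ degrees of freedom. Since the sample size is large, $\Sigma$ can be replaced by $S$.

4.22 a) We see that $n = 75$ is a sufficiently large sample (compared with $p$) and apply Result 4.13 to get that $\sqrt{n}(\bar{\mathbf{X}} - \mu)$ is approximately $N_p(\mathbf{0}, \Sigma)$ and that $\bar{\mathbf{X}}$ is approximately $N_p(\mu, \tfrac{1}{n}\Sigma)$.
b) By (4-28) we conclude that $n(\bar{\mathbf{X}} - \mu)'S^{-1}(\bar{\mathbf{X}} - \mu)$ is approximately $\chi_p^2$.

4.23 (a) The Q-Q plot shown below is not particularly straight, but the sample size n = 10 is small. It is difficult to determine from the plot whether the data are normally distributed.

[Figure: Q-Q plot for Dow Jones data; horizontal axis q(i).]

(b) $r_Q = .95$ and $n = 10$. Since $r_Q = .95 > .9351$ (see Table 4.2), we cannot reject the hypothesis of normality at the 10% level.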
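The correlation-coefficient check in 4.23(b) can be sketched in a few lines. This is only an illustration, assuming standard normal quantiles at probability levels (j - 1/2)/n; the function name `qq_correlation` and the sample `demo` are hypothetical and are not the Dow Jones data. The cutoff .9351 is the Table 4.2 value quoted above for n = 10 at the 10% level.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_correlation(sample):
    """Correlation between the ordered data and standard normal quantiles.

    Quantiles are taken at probability levels (j - 1/2)/n, j = 1..n
    (an assumption of this sketch).
    """
    x = sorted(sample)
    n = len(x)
    q = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    xbar, qbar = mean(x), mean(q)
    sxq = sum((a - xbar) * (b - qbar) for a, b in zip(x, q))
    sxx = sum((a - xbar) ** 2 for a in x)
    sqq = sum((b - qbar) ** 2 for b in q)
    return sxq / sqrt(sxx * sqq)


# Hypothetical sample of size 10, for illustration only.
demo = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 4.9, 3.3]
r_q = qq_correlation(demo)

# Decision rule from the text: normality is not rejected at the 10% level
# when r_Q exceeds the Table 4.2 critical point for n = 10.
cannot_reject = r_q > 0.9351
print(round(r_q, 4), cannot_reject)
```

Because the statistic is a correlation, it is unchanged by shifting or rescaling the data, so the units of measurement do not affect the test.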
4.24 (a) Q-Q plots for sales and profits are given below. The plots are not particularly straight, although the Q-Q plot for profits appears to be "straighter" than the plot for sales. It is difficult to assess normality from plots with such a small sample size (n = 10).

[Figure: Q-Q plot for sales; horizontal axis q(i).]
[Figure: Q-Q plot for profits; horizontal axis q(i).]

(b) The critical point for n = 10 when α = .10 is .9351. For sales, $r_Q = .940$ and for profits, $r_Q = .968$. Since the values for both of these correlations are greater than .9351, we cannot reject normality in either case.

4.25 The chi-square plot for the world's largest companies data is shown below. The plot is reasonably straight and it would be difficult to reject multivariate normality given the small sample size of n = 10. Information leading to the construction of this plot is also displayed.

[Figure: Chi-square plot; ordered squared distances against chi-square quantiles.]

$\bar{\mathbf{x}} = \begin{pmatrix} 155.6 \\ 14.7 \\ 710.9 \end{pmatrix}$, $S = \begin{pmatrix} 7476.5 & 303.6 & -35576 \\ 303.6 & 26.2 & -1053.8 \\ -35576 & -1053.8 & 237054 \end{pmatrix}$

Ordered SqDist: .3142 1.2894 1.4073 1.6418 2.0195 3.0411 3.1891 4.3520 4.8365 4.9091
Chi-square Quantiles: .3518 .7978 1.2125 1.6416 2.1095 2.6430 3.2831 4.1083 5.3170 7.8147

4.26 (a) $\bar{\mathbf{x}} = \begin{pmatrix} 5.20 \\ 12.48 \end{pmatrix}$, $S = \begin{pmatrix} 10.6222 & -17.7102 \\ -17.7102 & 30.8544 \end{pmatrix}$, $S^{-1} = \begin{pmatrix} 2.1898 & 1.2569 \\ 1.2569 & .7539 \end{pmatrix}$
Thus $d_j^2$ = 1.8753, 2.0203, 2.9009, .7353, .3105, .0176, 3.7329, .8165, 1.3753, 4.2153

(b) Since $\chi_2^2(.5) = 1.39$, 5 observations (50%) are within the 50% contour.

(c) The chi-square plot is shown below.

[Figure: Chi-square plot.]

(d) Given the results in parts (b) and (c) and the small number of observations (n = 10), it is difficult to reject bivariate normality.

4.27 The Q-Q plot is shown below.

[Figure: Q-Q plot; horizontal axis q(i).]

Since $r_Q = .970 < .973$ (see Table 4.2 for n = 40 and α = .05), we would reject the hypothesis of normality at the 5% level.

4.29 (a).
x = (~~4~:~~:~)' s = (11.363531 3~:~~:~~~). Generalized distances are as follows; 2.3771 0.8162 1 . 6283 0.4135 o . 47£ 1 1. 1849 1.3566 o .6228 5.6494 o . 8988 4. 7647 3.0089 6.1489 1 .0360 2.2489 3.4438 0.1901 o .4607 1 .8985 2 .7783 8.4731 1.1472 0.6370 o . 6592 O. 1388 7 . 0857 o . 7032 0.3159 2.7741 0.8856 o .4607 o . 6592 10.6392 0.4135 1.0360 0.1388 0.1225 o . 7874 O. 1380 O. 1225 1 .4584 1. 80 14 (b). The number of observations whose generalized distances are less than X2\O.ti) = 1.39 is 26. So the proportion is 26/42=0.6190. (c). CHI-SQUARE PLOT FOR (X1 X2) 8 w a: 8 ~ ~ 4 c 2 0 0 2 4 6 8 10 ~saUARE 4.30 (a) ~ = 0.5 but ~ = 1 (i.e. no transformation) not ruled out by data. For ~ = 1, TQ = .981 ~.9351 the critical point for testing normality with n = 10 and a = .10. We cannot reject the hypothesis of normality at the 10% level (and, consequently, not at the 5% level). (b) ~ = 1 (i.e. no transformation). For ~ = 1, TQ = .971 ~.9351 the critical point for testing normality with n = 10 and a = .1 O. We cannot reject the hypothesis of normality at the 10% level (and, consequently, not at the 5% level). (c) The likelihood function 1~Â" --) is fairly flat in the region of Â, = 1, -- = 1 so these values are not ruled out by the data. These results are consistent with those in parts (a) and (b). n-n niot~ follow 65 4.31 The non-multiple-scle"rosis group: X2 X3 X4 Xs 0.96133Xi3.S 0.95585(X3 + 0.005)°.4 0.91574X¡3.4 0.94446- Xl X2 X3 0.91137 0.97209 0.79523- X4 0.978-69 Xs 0.84135- Xi 0.94482X-o.s 1 rQ (Xs + 0.'(05)°.32 Transformation *: significant at 5 % level (the critical point = 0.9826 for n=69). The multiple-sclerosis group: rQ - - - (X5 + 0.005)°.21 (X3 + 0.005)°.26 Transformation *: significant at 5 % level (the critical point = 0.9640 for n=29). Transformations of X3 and X4 do not improve the approximaii-on to normality V~l"y much because there are too many zeros. 
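The 50% contour count used in 4.26(b) and 4.29(b) can be checked numerically. The sketch below uses the ten generalized squared distances listed in 4.26(a); the helper name `chi2_quantile_2df` is hypothetical. It relies on the fact that for 2 degrees of freedom the chi-square distribution function is 1 - exp(-x/2), so the quantile has a closed form and no statistical library is needed.

```python
from math import log


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    For 2 d.f. the cdf is 1 - exp(-x/2), so the quantile is -2*ln(1 - p).
    """
    return -2.0 * log(1.0 - p)


# Generalized squared distances d_j^2 from 4.26(a).
d2 = [1.8753, 2.0203, 2.9009, 0.7353, 0.3105,
      0.0176, 3.7329, 0.8165, 1.3753, 4.2153]

cutoff = chi2_quantile_2df(0.5)          # about 1.39
inside = sum(d <= cutoff for d in d2)    # observations within the 50% contour
proportion = inside / len(d2)
print(round(cutoff, 2), inside, proportion)
```

Under bivariate normality roughly half of the observations should fall inside this contour, which is the comparison made in the solutions.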
4.32 Xl X2 X3 X4 rQ 0.98464 - 0.94526- 0.9970 0.98098- Transformation (Xl + 0.005)-0.59 x.¡0.49 *: significant at 5 % level - XO.2S 4 X6 Xs 0.99057 - 0.92779(Xs + 0.ûå5)0.Sl (the critical point = O.USïO for n=98). 4.33 Marginal Normality: rQ Xl X2 0.95986* 0.95039- X3 0.96341 X4 0.98079 *: significant at 5 % level (the ci"itical point = 'Ü.9652 for n=30). Bivariate Normality: the X2 plots are (X31 X4) appear reasonably straight. given in the next page. Those for (Xh X2), (Xh X3), 66 CHI-SQUARE PLOT FOR (X1,X3) CHI-SQUARE PLOT FOR (X1,X2) 8 8 6 w " ~ .¿ ~ ~ 8 i: c " is is 2 2 0 0 2 0 " 8 6 CHI-SQUARE PLOT FOR (X2,X3) CHI-SQUARE PLOT FOR tX1 ,X4) 8 8 6 w a: w ~ c ~.¿ " :f 6 " is CJ 2 2 0 0 0 2 " 8 8 " 2 0 12 10 8 10 12 CHI-SQUARE PLOT FOR (X3,X4) CHI-SQUARE PLOT FOR (X2,X4) 8 8 8 w 6 a: c ~ i: ~ 8 e-SOARE e-SOARE w 10 e-SORE e-SOUARE i: c~ 8 6 " 2 0 10 " " :f (. :f CJ 2 2 0 0 0 5 10 e-SOUARE 15 0 2 " e-SORE 6 a 67- 4.34 Mar,ginal Normality: Xl rQ. 0.95162- X2 X3 X4 0.97209 0.98421 0.99011 Xs 0.98124 X6 0.99404 *: significant at 5 % level (the critical point == 0.9591 for n==25). Bivariate Normalitv: Omitted. 4.35 Marginal normality: & (MachDir) X;i ,(CrossDir) Xl (Density) .991 .924* rQ I .897* * significant at the 5% level; critical point = .974 for n = 41 From the chi-square plot (see below), it is obvious that observation #25 is a multivariate outlier. If this observation is removed, the chi-square plot is considerably more "straight line like" and it is difficult to reject a hypothesis of multivariate normality. Moreover, rQ increases to .979 for density, it is virtually unchanged (.992) for machine direction and cross direction (.926). Chi-square Plot 3S :l 25 :! 
[Figure: Chi-square plot without observation 25.]

4.36 Marginal normality:

          100m   200m   400m   800m   1500m  3000m  Marathon
$r_Q$     .983   .976*  .969*  .952*  .909*  .866*  .859*

* significant at the 5% level; critical point = .978 for n = 54

Notice how the values of $r_Q$ decrease with increasing distance. As the distance increases, the distribution of times becomes increasingly skewed to the right.

The chi-square plot is not consistent with multivariate normality. There are several multivariate outliers.

4.37 Marginal normality:

          100m   200m   400m   800m   1500m  3000m  Marathon
$r_Q$     .989   .985   .984   .968*  .947*  .929*  .921*

* significant at the 5% level; critical point = .978 for n = 54

As measured by $r_Q$, times measured in meters/second for the various distances are more nearly marginally normal than times measured in seconds or minutes (see Exercise 4.36). Notice that the values of $r_Q$ decrease with increasing distance. In this case, as the distance increases the distribution of times becomes increasingly skewed to the left.

The chi-square plot is not consistent with multivariate normality. There are several multivariate outliers.

4.38 Marginal and multivariate normality of bull data

[Figure: Normality of bull data. A chi-square plot of the ordered distances against qchisq, and six normal Q-Q plots (quantiles of standard normal on the horizontal axis) labeled r = 0.9916 normal, r = 0.9631 not normal, r = 0.9847 normal, r = 0.9376 not normal, r = 0.9956 normal, r = 0.9934 normal.]

XBAR S FtFrBody 100.1305 8594.3439 2.9600 209.5044 -0.0534 -1.
3982 2.9831 129.9401 82.8108 6680. 3088 YrHgt 5-0.5224 995.9474 70.881-6 0.1967 54. 1263 1555.2895 1 2 3 4 5 6 7 8 2 . 9980 100 . 1305 Ordered dsq qchisq 1 . 3396 0.7470 1. 7751 1.1286 1 . 7762 1.3793 2.2021 1 .5808 2.3870 1.7551 2.5512 1 . 9118 2.5743 2.0560 2.5906 2.1911 2. 7604 2.3189 3.0189 2.4411 3 . 0495 2.5587 9 10 11 12 3 . 2679 13 3.2766 14 3.3115 15 3.3470 16 3 . 3669 17 3.3721 18 3.4141 19 3 . 5279 2 .6725 2.7832 2.8912 2.9971 3.1011 3 . 2036 3 . 3048 3.4049 20 3.5453 3 . 5041 21 3 . 6097 3 .6027 22 23 24 25 3.6485 3.6681 3 . 7007 3 . 7236 3. 7983 3. 8957 3.7395 3.9929 3.4142 -0.0506 SaleHt SaleWt 2.9831 82.8108 129.9401 6680 . 3088 3.4142 83 .9254 -0.0506 2.4130 4.0180 147.2896 83.9254 2.4130 147.2896 16850.6618 PrctFFB BkFat 2 . 9600 -0.0534 209.5044 -1.3982 10.6917 -0.1430 -0.1430 Ordered dsq qchisq 26 3.8618 4 .0902 27 3 . 8667 4.1875 28 3 .9078 4.2851 29 4.0413 4 .3830 30 4.1213 4.4812 31 4. 1445 4.5801 32 4 . 2244 4.6795 33 4.2522 4 . 7797 34 4.2828 4 . 8806 35 4.4599 4.9826 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 4. 7603 4. 8587 5. 1129 5 . 1876 5.2891 5 . 0855 5. 1896 5 . 2949 5 .4017 5 .5099 5 . 3004 5.6197 5.3518 5 . 7313 5 .4024 5 .8449 5 .9605 5.5938 6.0783 5.6333 6.1986 5 .7754 6.3215 6.2524 6.4472 5 .6060 6 . 3264 6.57'60 6.6491 6.7081 o .0080 Ordered dsq qchisq 51 52 53 54 55 56 57 58 59 6 . 6693 6 . 8439 6.6748 6 .9836 6 .6751 6.8168 7 . 1276 7 . 2763 6 . 9863 7 .430 1 7. 1405 7 .5896 7 . 1763 7 . 7554 7.4577 7 .9281 7.5816 8.1085 7 .6287 8.2975 60 61 8 . 0873 8 . 4963 62 8 .6430 8 .7062 63 8 . 7748 8 .9286 64 8.7940 9. 1657 65 9.3973 9.4197 66 9 . 3989 9.6937 67 9 .6524 9.9917 68 10.6254 10.3191 69 10.6958 10.6829 70 10.8037 11. 0936 71 10.9273 11.5665 72 11.3006 12.1263 73 11.321$ 12.8160 74 12.4744 13.7225 75 17.6149 15.0677 76 21.5751 17.8649 From Table 4.2, with a = 0.05 and n = 76, the critical point f.or the Q - Q plot correlation coeffcient test for normality is 0.9839. 
We reject the hypothesis of multivariate normality at α = 0.05, because some marginals are not normal.

4.39 (a) Marginal normality: the variables are independence, support, benevolence, conformity and leadership. $r_Q$ = .984* for leadership; the remaining four values of $r_Q$ are .991, .993, .997 and .997.
* significant at the 5% level; critical point = .990 for n = 130

(b) The chi-square plot is shown below. The plot is straight with the exception of observation #60. Certainly, if this observation is deleted it would be hard to argue against multivariate normality.

[Figure: Chi-square plot for indep, supp, benev, conform, leader; ordered squared distances against chi-square quantiles q((j - .5)/130).]

(c) Using the $r_Q$ statistic, normality is rejected at the 5% level for leadership. If leadership is transformed by taking the square root (i.e. $\hat{\lambda} = 0.5$), $r_Q = .998$ and we cannot reject normality at the 5% level.

4.40 (a) Scatterplot is shown below. Great Smoky park is an outlier.

[Figure: Scatterplot of the national park data (visitors on the horizontal axis).]

(b) The power transformation $\hat{\lambda} = 0.5$ (i.e. square root) makes the size observations more nearly normal. $r_Q = .904$ before transformation and $r_Q = .975$ after transformation. The 5% critical point with n = 15 for the hypothesis of normality is .9389. The Q-Q plot for the transformed observations is given below.

[Figure: Q-Q plot for the transformed size observations.]

(c) The power transformation $\hat{\lambda} = 0$ (i.e. logarithm) makes the visitor observations more nearly normal. $r_Q = .837$ before transformation and $r_Q = .960$ after transformation. The 5% critical point with n = 15 for the hypothesis of normality is .9389. The Q-Q plot for the transformed observations is given next.

[Figure: Q-Q plot for the transformed visitor observations.]

(d) A chi-square plot for the transformed observations is shown below. Given the small sample size (n = 15), the plot is reasonably straight and it would be hard to reject bivariate normality.

[Figure: Chi-square plot for the transformed national park data; horizontal axis chi-square quantiles.]

4.41 (a) Scatterplot is shown below.
There do not appear to be any outliers with the possible exception of observation #21.

(b) The power transformation $\hat{\lambda} = 0$ (i.e. logarithm) makes the duration observations more nearly normal. $r_Q = .958$ before transformation and $r_Q = .989$ after transformation. The 5% critical point with n = 25 for the hypothesis of normality is .9591. The Q-Q plot for the transformed observations is given below.

[Figure: Q-Q plot for natural log of duration; horizontal axis q(i).]

(c) The power transformation $\hat{\lambda} = -0.5$ (i.e. reciprocal of square root) makes the man/machine time observations more nearly normal. $r_Q = .939$ before transformation and $r_Q = .991$ after transformation. The 5% critical point with n = 25 for the hypothesis of normality is .9591. The Q-Q plot for the transformed observations is given next.

[Figure: Q-Q plot for reciprocal of square root of man/machine time; horizontal axis q(i).]

(d) A chi-square plot for the transformed observations is shown below. The plot is straight and it would be difficult to reject bivariate normality.

[Figure: Chi-square plot for transformed snow removal data; horizontal axis chi-square quantiles.]

Chapter 5

5.1 a) $\bar{\mathbf{x}} = \begin{pmatrix} 6 \\ 10 \end{pmatrix}$; $S = \begin{pmatrix} 8 & -10/3 \\ -10/3 & 2 \end{pmatrix}$; $T^2 = 150/11 = 13.64$

b) $T^2$ is $3F_{2,2}$ (see (5-5))

c) $H_0: \mu' = (7, 11)$; $\alpha = .05$ so $F_{2,2}(.05) = 19.00$. Since $T^2 = 13.64 < 3F_{2,2}(.05) = 3(19) = 57$, do not reject $H_0$ at the $\alpha = .05$ level.

5.3 a) $T^2 = \dfrac{(n-1)\left|\sum_{j=1}^{n} (\mathbf{x}_j - \mu_0)(\mathbf{x}_j - \mu_0)'\right|}{\left|\sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'\right|} - (n-1) = \dfrac{3(244)}{44} - 3 = 13.64$

b) $\Lambda = \left( \dfrac{\left|\sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'\right|}{\left|\sum_{j=1}^{n} (\mathbf{x}_j - \mu_0)(\mathbf{x}_j - \mu_0)'\right|} \right)^{n/2} = \left(\dfrac{44}{244}\right)^2 = .0325$

Wilks' lambda $= \Lambda^{2/n} = \Lambda^{1/2} = \sqrt{.0325} = .1803$

5.5 $H_0: \mu' = (.55, .60)$; $T^2 = 1.17$; $\alpha = .05$; $F_{2,40}(.05) = 3.23$. Since $T^2 = 1.17 < \dfrac{2(41)}{40}F_{2,40}(.05) = 2.05(3.23) = 6.62$, we do not reject $H_0$ at the $\alpha = .05$ level.
The result is consistent with the 95% confidence ellipse for $\mu$ pictured in Figure 5.1 since $\mu' = (.55, .60)$ is inside the ellipse.

5.8 $\hat{\mathbf{a}} = S^{-1}(\bar{\mathbf{x}} - \mu_0) = \begin{pmatrix} 227.273 & -181.818 \\ -181.818 & 212.121 \end{pmatrix}\left( \begin{pmatrix} .564 \\ .603 \end{pmatrix} - \begin{pmatrix} .55 \\ .60 \end{pmatrix} \right) = \begin{pmatrix} 2.636 \\ -1.909 \end{pmatrix}$

$t^2 = \dfrac{n(\hat{\mathbf{a}}'(\bar{\mathbf{x}} - \mu_0))^2}{\hat{\mathbf{a}}'S\hat{\mathbf{a}}} = \dfrac{42\left( [2.636, -1.909]\begin{pmatrix} .014 \\ .003 \end{pmatrix} \right)^2}{[2.636, -1.909]\begin{pmatrix} .0144 & .0117 \\ .0117 & .0146 \end{pmatrix}\begin{pmatrix} 2.636 \\ -1.909 \end{pmatrix}} = 1.31 = T^2$

5.9 a) Large sample 95% $T^2$ simultaneous confidence intervals:
Weight: (69.56, 121.48)
Body length: (152.17, 176.59)
Neck: (49.61, 61.77)
Girth: (83.49, 103.29)
Head length: (16.55, 19.41)
Head width: (29.04, 33.22)

b) 95% confidence region determined by all $\mu_1, \mu_4$ such that
$(95.52 - \mu_1,\ 93.39 - \mu_4)\begin{pmatrix} .002799 & -.006927 \\ -.006927 & .019248 \end{pmatrix}\begin{pmatrix} 95.52 - \mu_1 \\ 93.39 - \mu_4 \end{pmatrix} \le 12.59/61 = .2064$
Beginning at the center $\bar{\mathbf{x}}' = (95.52, 93.39)$, the axes of the 95% confidence ellipsoid are:
major axis $\pm\sqrt{3695.52}\sqrt{12.59}\begin{pmatrix} .939 \\ .343 \end{pmatrix}$
minor axis $\pm\sqrt{45.92}\sqrt{12.59}\begin{pmatrix} -.343 \\ .939 \end{pmatrix}$
(See confidence ellipsoid in part d.)

c) Bonferroni 95% simultaneous confidence intervals (m = 6): $t_{60}(.025/6) = 2.728$ (alternative multiplier is $z(.025/6) = 2.638$)
Weight: (75.56, 115.48)
Body length: (155.00, 173.76)
Neck: (51.01, 60.37)
Girth: (86.27, 100.51)
Head length: (16.88, 19.08)
Head width: (29.52, 32.74)

d) Because of the high positive correlation between weight ($X_1$) and girth ($X_4$), the 95% confidence ellipse is smaller, more informative, than the 95% Bonferroni rectangle.

5.9 (Continued)

[Figure: Large sample 95% confidence regions (large sample simultaneous ellipse and Bonferroni rectangle); axes x1 and x4.]

e) Bonferroni 95% simultaneous confidence interval for the difference between mean head width and mean head length ($\mu_6 - \mu_5$) follows.
(m = 7 to allow for the new statement and the statements about individual means): $t_{60}(.025/7) = 2.783$ (alternative multiplier is $z(.025/7) = 2.690$)

$\bar{x}_6 - \bar{x}_5 \pm t_{60}(.0036)\sqrt{\dfrac{s_{66} - 2s_{56} + s_{55}}{n}} = (31.13 - 17.98) \pm 2.783\sqrt{\dfrac{21.26 - 2(13.88) + 9.95}{61}}$ or $12.49 \le \mu_6 - \mu_5 \le 13.81$

5.10 a) 95% $T^2$ simultaneous confidence intervals:
Lngth2: (130.65, 155.93)
Lngth3: (127.00, 191.58)
Lngth4: (160.33, 185.95)
Lngth5: (155.37, 198.91)

b) 95% $T^2$ simultaneous intervals for change in length (ΔLngth):
ΔLngth2-3: (-21.24, 53.24)
ΔLngth3-4: (-22.70, 50.42)
ΔLngth4-5: (-20.69, 28.69)

c) 95% confidence region determined by all $\mu_{2-3}, \mu_{4-5}$ such that
$(16 - \mu_{2-3},\ 4 - \mu_{4-5})\begin{pmatrix} .011024 & .009386 \\ .009386 & .025135 \end{pmatrix}\begin{pmatrix} 16 - \mu_{2-3} \\ 4 - \mu_{4-5} \end{pmatrix} \le 72.96/7 = 10.42$
where $\mu_{2-3}$ is the mean increase in length from year 2 to 3, and $\mu_{4-5}$ is the mean increase in length from year 4 to 5.
Beginning at the center $\bar{\mathbf{x}}' = (16, 4)$, the axes of the 95% confidence ellipsoid are:
major axis $\pm\sqrt{157.8}\sqrt{72.96}\begin{pmatrix} .895 \\ -.447 \end{pmatrix}$
minor axis $\pm\sqrt{33.53}\sqrt{72.96}\begin{pmatrix} .447 \\ .895 \end{pmatrix}$
(See confidence ellipsoid in part e.)

d) Bonferroni 95% simultaneous confidence intervals (m = 7):
Lngth2: (137.37, 149.21)
Lngth3: (144.18, 174.40)
Lngth4: (167.14, 179.14)
Lngth5: (166.95, 187.33)
ΔLngth2-3: (-1.43, 33.43)
ΔLngth3-4: (-3.25, 30.97)
ΔLngth4-5: (-7.55, 15.55)

5.10 (Continued)

e) The Bonferroni 95% confidence rectangle is much smaller and more informative than the 95% confidence ellipse.

[Figure: 95% confidence regions (simultaneous T^2 ellipse and Bonferroni rectangle); axes μ2-3 and μ4-5.]

5.11 a) $\bar{\mathbf{x}}' = (5.1856, 16.0700)$
$S = \begin{pmatrix} 176.0042 & 287.2412 \\ 287.2412 & 527.8493 \end{pmatrix}$; $S^{-1} = \begin{pmatrix} .0508 & -.0276 \\ -.0276 & .0169 \end{pmatrix}$
Eigenvalues and eigenvectors of S: $\hat{\lambda}_1 = 688.759$, $\hat{\lambda}_2 = 15.094$, $\hat{\mathbf{e}}_1' = (.49, .87)$, $\hat{\mathbf{e}}_2'$
$= (.87, -.49)$

$\dfrac{(n-1)p}{(n-p)}F_{p,n-p}(.10) = \dfrac{8(2)}{7}F_{2,7}(.10) = \dfrac{16}{7}(3.26) = 7.45$

[Figure: Confidence region; axes x1 (Cr) and x2 (Sr).]

b) 90% $T^2$ intervals for the full data set:
Cr: (-6.88, 17.25)
Sr: (-4.83, 36.97)
$(.30, 10)'$ is a plausible value for $\mu$.

5.11 (Continued)

c) Q-Q plots for the marginal distributions of both variables

[Figure: Q-Q plot for Cr against normal scores.]
Since r = 0.627 we reject the hypothesis of normality for this variable at α = 0.01.

[Figure: Q-Q plot for Sr against normal scores.]
Since r = 0.818 we reject the hypothesis of normality for this variable at α = 0.01.

d) With data point (40.53, 73.68) removed,
$\bar{\mathbf{x}}' = (.7675, 8.8688)$; $S = \begin{pmatrix} .3786 & 1.0303 \\ 1.0303 & 69.8598 \end{pmatrix}$; $S^{-1} = \begin{pmatrix} 2.7518 & -.0406 \\ -.0406 & .0149 \end{pmatrix}$
$\dfrac{(n-1)p}{(n-p)}F_{p,n-p}(.10) = \dfrac{7(2)}{6}F_{2,6}(.10) = \dfrac{14}{6}(3.46) = 8.07$
90% $T^2$ intervals:
Cr: (.15, 1.39)
Sr: (.47, 17.27)

5.12 Initial estimates are
$\tilde{\mu} = \begin{pmatrix} 4 \\ 6 \\ 2 \end{pmatrix}$, $\tilde{\Sigma} = \begin{pmatrix} 0.5 & 0.0 & 0.5 \\ 0.0 & 2.0 & 0.0 \\ 0.5 & 0.0 & 1.5 \end{pmatrix}$.
The first revised estimates are
$\tilde{\mu} = \begin{pmatrix} 4.0833 \\ 6.0000 \\ 2.2500 \end{pmatrix}$, $\tilde{\Sigma} = \begin{pmatrix} 0.6042 & 0.1667 & 0.8125 \\ 0.1667 & 2.500 & 0.0 \\ 0.8125 & 0.0 & 1.9375 \end{pmatrix}$.

5.13 The $\chi^2$ distribution with 3 degrees of freedom.

5.14 Length of one-at-a-time t-interval / Length of Bonferroni interval $= t_{n-1}(\alpha/2)/t_{n-1}(\alpha/2m)$.

  n      m = 2    m = 4    m = 10
  15     0.8546   0.7489   0.6449
  25     0.8632   0.7644   0.6678
  50     0.8691   0.7749   0.6836
  100    0.8718   0.7799   0.6911
  ∞      0.8745   0.7847   0.6983

5.15 (a) $E(X_{ij}) = (1)p_i + (0)(1 - p_i) = p_i$.
$\mathrm{Var}(X_{ij}) = (1 - p_i)^2p_i + (0 - p_i)^2(1 - p_i) = p_i(1 - p_i)$

(b) $\mathrm{Cov}(X_{ij}, X_{kj}) = E(X_{ij}X_{kj}) - E(X_{ij})E(X_{kj}) = 0 - p_ip_k = -p_ip_k$.

5.16 (a) Using $\hat{p}_j \pm \sqrt{\chi_4^2(0.05)}\sqrt{\hat{p}_j(1 - \hat{p}_j)/n}$, the 95% confidence intervals for $p_1, p_2, p_3, p_4, p_5$ are (0.221, 0.370), (0.258, 0.412), (0.098, 0.217), (0.029, 0.112), (0.084, 0.198) respectively.

(b) Using $\hat{p}_1 - \hat{p}_2 \pm \sqrt{\chi_4^2(0.05)}\sqrt{(\hat{p}_1(1 - \hat{p}_1) + \hat{p}_2(1 - \hat{p}_2) - 2\hat{p}_1\hat{p}_2)/n}$, the 95% confidence interval for $p_1 - p_2$ is (-0.118, 0.0394). There is no significant difference in the two proportions.

5.17 $\hat{p}_1 = 0.585$, $\hat{p}_2 = 0.310$, $\hat{p}_3 = 0.105$.
Using $\hat{p}_j \pm \sqrt{\chi_3^2(0.05)}\sqrt{\hat{p}_j(1 - \hat{p}_j)/n}$, the 95% confidence intervals for $p_1, p_2, p_3$ are (0.488, 0.682), (0.219, 0.401), (0.044, 0.166), respectively.

5.18 (a) Hotelling's $T^2 = 223.31$. The critical point for the statistic (α = 0.05) is 8.33. We reject $H_0: \mu = (500, 50, 30)'$. That is, the group of students represented by the scores is significantly different from average college students.

(b) The lengths of the three axes are 23.730, 2.473, 1.183. The directions of the corresponding axes are
$\begin{pmatrix} 0.994 \\ 0.103 \\ 0.038 \end{pmatrix}$, $\begin{pmatrix} -0.104 \\ 0.995 \\ 0.006 \end{pmatrix}$, $\begin{pmatrix} -0.037 \\ -0.010 \\ 0.999 \end{pmatrix}$.

(c) Data look fairly normal.

[Figure: Normal score plots for X1, X2 and X3, and pairwise scatter plots of X1, X2 and X3.]

5.19 a) The summary statistics are:
$n = 30$, $\bar{\mathbf{x}} = \begin{pmatrix} 1860.50 \\ 8354.13 \end{pmatrix}$ and $S = \begin{pmatrix} 124055.17 & 361621.03 \\ 361621.03 & 3486330.90 \end{pmatrix}$
where S has eigenvalues and eigenvectors
$\lambda_1 = 3407292$, $\mathbf{e}_1' = (.105740, .994394)$
$\lambda_2 = 82748$, $\mathbf{e}_2' = (.994394, -.105740)$
Then, since $\dfrac{1}{n}\dfrac{p(n-1)}{(n-p)}F_{p,n-p}(\alpha) = \dfrac{1}{30}\dfrac{2(29)}{28}F_{2,28}(.05) = .2306$, a 95% confidence region for $\mu$ is given by the set of $\mu$ satisfying
$(1860.50 - \mu_1,\ 8354.13 - \mu_2)\begin{pmatrix} 124055.17 & 361621.03 \\ 361621.03 & 3486330.90 \end{pmatrix}^{-1}\begin{pmatrix} 1860.50 - \mu_1 \\ 8354.13 - \mu_2 \end{pmatrix} \le .2306$
The half lengths of the axes of this ellipse are $\sqrt{.2306}\sqrt{\lambda_1} = 886.4$ and $\sqrt{.2306}\sqrt{\lambda_2} = 138.1$. Therefore the ellipse has the form
[Figure: 95% confidence ellipse for μ centered at (1860.50, 8354.13); axes x1 and x2.]

b) Since $\mu_0 = (2000, 10000)'$ does not fall within the 95% confidence ellipse, we would reject the hypothesis $H_0: \mu = \mu_0$ at the 5% level. Thus, the data analyzed are not consistent with these values.

c) The Q-Q plots for both stiffness and bending strength (see below) show that the marginal normality is not seriously violated. Also, the correlation coefficients for the test of normality are .989 and .990 respectively, so that we fail to reject even at the 1% significance level. Finally, the scatter diagram (see below) does not indicate departure from bivariate normality. So, the bivariate normal distribution is a plausible probability model for these data.

[Figure: Q-Q plot, bending strength X2; correlation .989.]
[Figure: Q-Q plot, stiffness X1; correlation .990.]
[Figure: Scatter diagram of X1 against X2.]
5.20 (a) Yes, they are plausible since the hypothesized vector $\mu_0$ (marked in the plot) is inside the 95% confidence region.

[Figure: 95% simultaneous confidence region for the mean vector.]

(b)
                      LOWER                 UPPER
Bonferroni C.I.:      189.822  274.782      197.423  284.774
Simultaneous C.I.:    189.422  274.255      197.823  285.299

Simultaneous confidence intervals are larger than Bonferroni's confidence intervals. Simultaneous confidence intervals will touch the simultaneous confidence region from outside.

(c) Q-Q plots suggest non-normality of $(X_1, X_2)$. Could try transforming $X_1$.

[Figure: Q-Q plot for X1, Q-Q plot for X2, and scatter plot of X1 against X2.]

5.21 HOTELLING T SQUARE = 9.~218, P-VALUE 0.3616

       N    MEAN      STDEV     T2 INTERVAL        BONFERRONI
x1     25   0.84380   0.11402   .742 TO .946       .778 TO .909
x2     25   0.81832   0.10685   .723 TO .914       .757 TO .880
x3     25   1.79268   0.28347   1.540 TO 2.046     1.629 TO 1.956
x4     25   1.73484   0.26360   1.499 TO 1.970     1.583 TO 1.887
x5     25   0.70440   0.10756   .608 TO .800       .642 TO .766
x6     25   0.69384   0.10295   .602 TO .786       .635 TO .753

The Bonferroni intervals use t(.00417) = 2.88 and the $T^2$ intervals use the constant 4.465.

5.22 (a) After eliminating outliers, the approximation to normality is improved.
[Figure: Q-Q plots for X1, X2 and X3 and pairwise scatter plots of X1, X2 and X3.]

Outliers removed:
                      LOWER                 UPPER
Bonferroni C.I.:      9.63  5.24  8.82      12.87   9.67  12.34
Simultaneous C.I.:    9.25  4.72  8.41      13.24  10.19  12.76

Simultaneous confidence intervals are larger than Bonferroni's confidence intervals.

(b) Full data set:
                      Lower                 Upper
Bonferroni C.I.:      9.79  5.78  8.65      15.33  10.55  12.44
Simultaneous C.I.:    9.16  5.23  8.21      15.96  11.09  12.87

5.23 a) The data appear to be multivariate normal as shown by the "straightness" of the Q-Q plots and chi-square plot below.

[Figure: Q-Q plots against normal scores for MaxBrth ($r_Q$ = .994), BasHgth ($r_Q$ = .976), BasLngth ($r_Q$ = .995) and NasHgth ($r_Q$ = .992), and a chi-square plot of the ordered squared distances.]

5.23 (Continued)

b) Bonferroni 95% simultaneous confidence intervals (m = p = 4): $t_{29}(.05/8) = 2.663$
MaxBrth: (128.87, 133.87)
BasHgth: (131.42, 135.78)
BasLngth: (96.32, 102.02)
NasHgth: (49.17, 51.89)

95% $T^2$ simultaneous confidence intervals: $\sqrt{\dfrac{4(29)}{26}F_{4,26}(.05)} = 3.496$
MaxBrth: (128.08, 134.66)
BasHgth: (130.73, 136.47)
BasLngth: (95.43, 102.91)
NasHgth: (48.75, 52.31)

The Bonferroni intervals are slightly shorter than the $T^2$ intervals.
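The comparison in 5.23(b) can be made concrete. Both sets of intervals share the same center and standard error, so the ratio of their widths should be close to the ratio of the two multipliers quoted above, 2.663/3.496. The sketch below is only a consistency check on the printed intervals; the dictionary names are hypothetical, and small discrepancies are expected because the endpoints are rounded to two decimals.

```python
# Interval endpoints as printed in 5.23(b).
bonferroni = {
    "MaxBrth": (128.87, 133.87),
    "BasHgth": (131.42, 135.78),
    "BasLngth": (96.32, 102.02),
    "NasHgth": (49.17, 51.89),
}
t2_simultaneous = {
    "MaxBrth": (128.08, 134.66),
    "BasHgth": (130.73, 136.47),
    "BasLngth": (95.43, 102.91),
    "NasHgth": (48.75, 52.31),
}

# Multipliers quoted in the solution: t_29(.05/8) and the T^2 constant.
expected_ratio = 2.663 / 3.496

width_ratios = {}
for name, (lo_b, hi_b) in bonferroni.items():
    lo_t, hi_t = t2_simultaneous[name]
    width_ratios[name] = (hi_b - lo_b) / (hi_t - lo_t)

for name, ratio in width_ratios.items():
    print(f"{name}: width ratio {ratio:.3f} (multiplier ratio {expected_ratio:.3f})")
```

Each Bonferroni interval is roughly three quarters as wide as the corresponding $T^2$ interval, which is the quantitative content of the closing remark in 5.23(b).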
5.24 Individual X charts for the Madison, Wisconsin, Police Department data

              xbar        s         LCL         UCL
LegalOT      3557.8     606.5      1738.1      5377.4
ExtraOT      1478.4    1182.8     -2070.0      5026.9    use LCL = 0
Holdover     2676.9    1207.7      -946.2      6300.0    use LCL = 0
COA         13563.6    1303.2      9654.0     17473.2
MeetOT        800.0     474.0      -622.1      2222.1    use LCL = 0

[Figure: The XBAR chart for x3 = holdover hours (individual value against observation number)]

[Figure: The XBAR chart for x4 = COA hours (individual value against observation number)]

Both holdover and COA hours are stable and in control.

5.25 Quality ellipse and $T^2$ chart for the holdover and COA overtime hours. All points are in control. The quality control 95% ellipse is

$1.37\times 10^{-6}(x_3 - 2677)^2 + 1.18\times 10^{-6}(x_4 - 13564)^2 + 1.80\times 10^{-6}(x_3 - 2677)(x_4 - 13564) = 5.99.$

[Figure: The quality control 95% ellipse for holdover hours and COA hours]

[Figure: The 95% Tsq chart for holdover hours and COA hours, UCL = 5.991]

5.26 $T^2$ chart using the data on $x_1$ = legal appearances overtime hours, $x_2$ = extraordinary event overtime hours, and $x_3$ = holdover overtime hours. All points are in control.

[Figure: The 99% Tsq chart based on x1, x2 and x3]
5.27 The 95% prediction ellipse for $x_3$ = holdover hours and $x_4$ = COA hours is

$1.37\times 10^{-6}(x_3 - 2677)^2 + 1.18\times 10^{-6}(x_4 - 13564)^2 + 1.80\times 10^{-6}(x_3 - 2677)(x_4 - 13564) = 8.51.$

[Figure: The 95% control ellipse for future holdover hours and COA hours]

5.28 (a)
$\bar{x} = (-.506,\ -.207,\ -.062,\ .698,\ -.065,\ -.032)'$

      .0626   .0616   .0474   .0083   .0197   .0031
      .0616   .0924   .0268  -.0008   .0228   .0155
S =   .0474   .0268   .1446   .0078   .0211  -.0049
      .0083  -.0008   .0078   .1086   .0221   .0066
      .0197   .0228   .0211   .0221   .3428   .0146
      .0031   .0155  -.0049   .0066   .0146   .0366

The $T^2$ chart follows.

(b) Multivariate observations 20, 33, 36, 39 and 40 exceed the upper control limit. The individual variables that contribute significantly to the out of control data points are indicated in the table below.

Point Greater
Than UCL        Variable    P-Value
   20             X1        0.0000
                  X2        0.0001
                  X3        0.0000
                  X4        0.0105
                  X5        0.0210
                  X6        0.0032
   33             X4        0.0088
                  X6        0.0000
   36             X1        0.0000
                  X2        0.0000
                  X3        0.0088
                  X4        0.0000
   39             X2        0.0343
                  X4        0.0198
                  X5        0.0001
                  X6        0.0054
   40             X1        0.0000
                  X2        0.0000
                  X3        0.0114
                  X4        0.0013

5.29 $T^2 = 12.472$. Since $T^2 = 12.472 < \frac{29(6)}{24}F_{6,24}(.05) = 7.25(2.51) = 18.2$, we do not reject $H_0: \mu = 0$ at the 5% level.

5.30 (a) Large sample 95% Bonferroni intervals for the indicated means follow.
Multiplier is $t_{49}(.05/2(6)) \approx z(.0042) = 2.635$

Petroleum:               $.766 \pm 2.635(.925/\sqrt{50}) = .766 \pm .345 \rightarrow (.421, 1.111)$
Natural Gas:             $.508 \pm 2.635(.753/\sqrt{50}) = .508 \pm .282 \rightarrow (.226, .790)$
Coal:                    $.438 \pm 2.635(.414/\sqrt{50}) = .438 \pm .155 \rightarrow (.283, .593)$
Nuclear:                 $.161 \pm 2.635(.207/\sqrt{50}) = .161 \pm .076 \rightarrow (.085, .237)$
Total:                   $1.873 \pm 2.635(1.978/\sqrt{50}) = 1.873 \pm .738 \rightarrow (1.135, 2.611)$
Petroleum - Natural Gas: $.258 \pm 2.635(.392/\sqrt{50}) = .258 \pm .146 \rightarrow (.112, .404)$

(b) Large sample 95% simultaneous $T^2$ intervals for the indicated means follow.

Multiplier is $\sqrt{\chi^2_4(.05)} = \sqrt{9.49} = 3.081$

Petroleum:               $.766 \pm 3.081(.925/\sqrt{50}) = .766 \pm .404 \rightarrow (.362, 1.170)$
Natural Gas:             $.508 \pm 3.081(.753/\sqrt{50}) = .508 \pm .330 \rightarrow (.178, .838)$
Coal:                    $.438 \pm 3.081(.414/\sqrt{50}) = .438 \pm .182 \rightarrow (.256, .620)$
Nuclear:                 $.161 \pm 3.081(.207/\sqrt{50}) = .161 \pm .089 \rightarrow (.072, .250)$
Total:                   $1.873 \pm 3.081(1.978/\sqrt{50}) = 1.873 \pm .863 \rightarrow (1.010, 2.736)$
Petroleum - Natural Gas: $.258 \pm 3.081(.392/\sqrt{50}) = .258 \pm .171 \rightarrow (.087, .429)$

Since the multiplier, 3.081, for the 95% simultaneous $T^2$ intervals is larger than the multiplier, 2.635, for the Bonferroni intervals and everything else for a given interval is the same, the $T^2$ intervals will be wider than the Bonferroni intervals.

5.31 (a) The power transformation $\lambda_1 = 0$ (i.e. logarithm) makes the duration observations more nearly normal. The power transformation $\lambda_2 = -0.5$ (i.e. reciprocal of square root) makes the man/machine time observations more nearly normal. (See Exercise 4.41.) For the transformed observations, say $y_1 = \ln x_1$, $y_2 = 1/\sqrt{x_2}$ where $x_1$ is duration and $x_2$ is man/machine time,

$\bar{y} = \begin{bmatrix} 2.171 \\ .240 \end{bmatrix}, \quad S = \begin{bmatrix} .1513 & -.0058 \\ -.0058 & .0018 \end{bmatrix}, \quad S^{-1} = \begin{bmatrix} 7.524 & 23.905 \\ 23.905 & 624.527 \end{bmatrix}$

The eigenvalues for S are $\lambda_1 = .15153$, $\lambda_2 = .00160$ with corresponding eigenvectors $e_1 = (.99925,\ -.03866)'$, $e_2 = (.03866,\ .99925)'$. Beginning at center $\bar{y}$, the axes of the 95% confidence ellipsoid are

major axis: $\pm\sqrt{\lambda_1}\sqrt{\frac{2(24)}{25(23)}F_{2,23}(.05)}\; e_1 = \pm .208\, e_1$
minor axis: $\pm\sqrt{\lambda_2}\sqrt{\frac{2(24)}{25(23)}F_{2,23}(.05)}\; e_2 = \pm .021\, e_2$

The ratio of the lengths of
the major and minor axes, .416/.042 = 9.9, indicates the confidence ellipse is elongated in the $e_1$ direction.

(b) $t_{24}(.05/2(2)) = 2.391$, so the 95% confidence intervals for the two component means (of the transformed observations) are:

$\bar{y}_1 \pm t_{24}(.0125)\sqrt{s_{11}} = 2.171 \pm 2.391\sqrt{.1513} = 2.171 \pm .930 \rightarrow (1.241, 3.101)$
$\bar{y}_2 \pm t_{24}(.0125)\sqrt{s_{22}} = .240 \pm 2.391\sqrt{.0018} = .240 \pm .101 \rightarrow (.139, .341)$

Chapter 6

6.1 Eigenvalues and eigenvectors of $S_d$ are:

$\lambda_1 = 449.778$, $e_1' = (.333,\ .943)$
$\lambda_2 = 168.082$, $e_2' = (.943,\ -.333)$

Ellipse centered at $\bar{d}' = (-9.36,\ 13.27)$. Half length of major axis is 20.57 units. Half length of minor axis is 12.58 units. Major and minor axes lie in $e_1$ and $e_2$ directions, respectively.

Yes, the test answers the question: Is $\delta = 0$ inside the 95% confidence ellipse?

6.2 Using a critical value $t_{n-1}(\alpha/2p) = t_{10}(0.0125) = 2.6338$,

                         LOWER     UPPER
Bonferroni C.I.:        -20.57      1.85
                         -2.97     29.52
Simultaneous C.I.:      -22.45      3.73
                         -5.70     32.25

Simultaneous confidence intervals are larger than Bonferroni's confidence intervals.

6.3 The 95% Bonferroni intervals are

                         LOWER     UPPER
Bonferroni C.I.:        -21.92     -2.08
                         -3.36     20.56
Simultaneous C.I.:      -23.70     -0.30
                         -5.50     22.70

Since the hypothesized vector $\delta = 0$ (denoted as * in the plot) is outside the joint confidence region, we reject $H_0: \delta = 0$. Bonferroni C.I. are consistent with this result. After the elimination of the outlier, the difference between pairs became significant.

[Figure: 95% Simultaneous Confidence Region for Delta Vector (MU12-MU22 against MU11-MU21)]

6.4 (a). Hotelling's $T^2 = 10.215$. Since the critical point with $\alpha = 0.05$ is 9.459, we reject $H_0: \delta = 0$.

(b).
                         Lower     Upper
Bonferroni C.I.:         -1.09     -0.02
                         -0.04      0.64
Simultaneous C.I.:       -1.18      0.07
                         -0.10      0.69

[Figure: 95% Confidence Ellipse About the Mean Vector, with the point (0, 0) marked]
Figure 1: 95% Confidence Ellipse and Simultaneous $T^2$ Intervals for the Mean Difference

(c) The Q-Q plots for ln(DiffBOD) and ln(DiffSS) are shown below. Marginal normality cannot be rejected for either variable. The $\chi^2$ plot is not straight (with at least one apparent bivariate outlier) and, although the sample size (n = 11) is small, it is difficult to argue for bivariate normality.

[Figure: Q-Q plots for ln(DiffBOD) and ln(DiffSS)]

[Figure: Chi-square Plot of the Ordered Distances]

6.5 a) $H_0: C\mu = 0$ where $C = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}$, $\mu' = (\mu_1, \mu_2, \mu_3)$.

$C\bar{x} = \begin{bmatrix} -11.2 \\ 6.9 \end{bmatrix}, \quad CSC' = \begin{bmatrix} 55.5 & -32.6 \\ -32.6 & 66.4 \end{bmatrix}$

$T^2 = n(C\bar{x})'(CSC')^{-1}(C\bar{x}) = 90.4$; n = 40; q = 3

$\frac{(n-1)(q-1)}{(n-q+1)}F_{q-1,n-q+1}(.05) = \frac{(39)2}{38}(3.25) = 6.67$

Since $T^2 = 90.4 > 6.67$ reject $H_0: C\mu = 0$.

b) 95% simultaneous confidence intervals:

$\mu_1 - \mu_2$: $(46.1 - 57.3) \pm \sqrt{6.67}\sqrt{\frac{55.5}{40}} = -11.2 \pm 3.0$
$\mu_2 - \mu_3$: $6.9 \pm 3.3$
$\mu_1 - \mu_3$: $-4.3 \pm 3.3$

The means are all different from one another.

6.6 a) Treatment 2: sample mean vector $\begin{bmatrix} 2 \\ 4 \end{bmatrix}$; sample covariance matrix $\begin{bmatrix} 1 & -3/2 \\ -3/2 & 3 \end{bmatrix}$

Treatment 3: sample mean vector $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$; sample covariance matrix $\begin{bmatrix} 2 & -4/3 \\ -4/3 & 4/3 \end{bmatrix}$

$S_{pooled} = \begin{bmatrix} 1.6 & -1.4 \\ -1.4 & 2.0 \end{bmatrix}$

b) $T^2 = (2-3,\ 4-2)\left[\left(\frac{1}{3} + \frac{1}{4}\right)\begin{bmatrix} 1.6 & -1.4 \\ -1.4 & 2.0 \end{bmatrix}\right]^{-1}\begin{bmatrix} 2-3 \\ 4-2 \end{bmatrix} = 3.88$

$\frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1}(.01) = \frac{(5)2}{4}(18) = 45$

Since $T^2 = 3.88 < 45$ do not reject $H_0: \mu_2 - \mu_3 = 0$ at the $\alpha = .01$ level.

c) 99% simultaneous confidence intervals:

$\mu_{21} - \mu_{31}$: $(2-3) \pm \sqrt{45}\sqrt{\left(\frac{1}{3}+\frac{1}{4}\right)1.6} = -1 \pm 6.5$
$\mu_{22} - \mu_{32}$: $2 \pm 7.2$

6.7 $T^2 = (74.4,\ 201.6)\left[\left(\frac{1}{45} + \frac{1}{55}\right)\begin{bmatrix} 10963.7 & 21505.5 \\ 21505.5 & 63661.3 \end{bmatrix}\right]^{-1}\begin{bmatrix} 74.4 \\ 201.6 \end{bmatrix} = 16.1$

$\frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,n_1+n_2-p-1}(.05) = 6.26$

Since $T^2 = 16.1 > 6.26$ reject $H_0: \mu_1 - \mu_2 = 0$ at the $\alpha = .05$ level.
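The two-sample $T^2$ statistics in 6.6 and 6.7 both follow the pattern $d'[(1/n_1 + 1/n_2)S_{pooled}]^{-1}d$, which is easy to recheck numerically. The sketch below is a plain-Python illustration for the two-variable case; the helper names are hypothetical. Two inputs are read off the solutions as assumptions: the (2,2) entry 2.0 of $S_{pooled}$ in 6.6, and the (2,2) entry 63661.3 in 6.7, which is the value consistent with the reported $T^2 = 16.1$.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def two_sample_t2(diff, s_pooled, n1, n2):
    """T^2 = d' [(1/n1 + 1/n2) S_pooled]^{-1} d for two variables."""
    scale = 1.0 / n1 + 1.0 / n2
    inv = inverse_2x2(s_pooled)
    quad = sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))
    return quad / scale


# Exercise 6.6: treatments 2 and 3 (n = 3 and n = 4).
t2_66 = two_sample_t2([2 - 3, 4 - 2], [[1.6, -1.4], [-1.4, 2.0]], 3, 4)

# Exercise 6.7: n1 = 45, n2 = 55; (2,2) entry assumed to be 63661.3.
t2_67 = two_sample_t2(
    [74.4, 201.6],
    [[10963.7, 21505.5], [21505.5, 63661.3]],
    45,
    55,
)

if __name__ == "__main__":
    print(f"6.6: T^2 = {t2_66:.2f} (solution reports 3.88)")
    print(f"6.7: T^2 = {t2_67:.2f} (solution reports 16.1)")
```

Agreement with the reported values is only up to rounding of the summary statistics.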
$\hat{a} \propto S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2) = \begin{bmatrix} .0017 \\ .0026 \end{bmatrix}$

6.8 a) For first variable:

 observation         mean          treatment effect       residual
 6 5 8 4 7        4 4 4 4 4          2  2  2  2  2       0 -1  2 -2  1
 3 1 2        =   4 4 4         +   -2 -2 -2         +   1 -1  0
 2 5 3 2          4 4 4 4           -1 -1 -1 -1         -1  2  0 -1

 SS_obs = 246     SS_mean = 192     SS_tr = 36           SS_res = 18

For second variable:

 observation         mean          treatment effect       residual
 7 9 6 9 9        5 5 5 5 5          3  3  3  3  3      -1  1 -2  1  1
 3 6 3        =   5 5 5         +   -1 -1 -1         +  -1  2 -1
 3 1 1 3          5 5 5 5           -3 -3 -3 -3          1 -1 -1  1

 SS_obs = 402     SS_mean = 300     SS_tr = 84           SS_res = 18

Cross product contributions:
 275  =  240  +  48  -  13

b) MANOVA table:

Source of Variation      SSP                                                  d.f.
Treatment                $B = \begin{bmatrix} 36 & 48 \\ 48 & 84 \end{bmatrix}$        3 - 1 = 2
Residual                 $W = \begin{bmatrix} 18 & -13 \\ -13 & 18 \end{bmatrix}$      5 + 3 + 4 - 3 = 9
Total (corrected)        $\begin{bmatrix} 54 & 35 \\ 35 & 102 \end{bmatrix}$           11

c) $\Lambda^* = \frac{|W|}{|B+W|} = \frac{155}{4283} = .0362$

Using Table 6.3 with p = 2 and g = 3,

$\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right)\left(\frac{\sum n_\ell - g - 1}{g-1}\right) = 17.02.$

Since $F_{4,16}(.01) = 4.77$ we conclude that treatment differences exist at $\alpha = .01$ level.

Alternatively, using Bartlett's procedure,

$-\left(n - 1 - \frac{p+g}{2}\right)\ln\Lambda^* = -\left(12 - 1 - \frac{5}{2}\right)\ln(.0362) = 28.209$

Since $\chi^2_4(.01) = 13.28$ we again conclude treatment differences exist at $\alpha = .01$ level.

6.9 For any matrix C,

$\bar{d} = \frac{1}{n}\sum_j d_j = C\left(\frac{1}{n}\sum_j x_j\right) = C\bar{x}$ and $d_j - \bar{d} = C(x_j - \bar{x})$, so

$S_d = \frac{1}{n-1}\sum_j (d_j - \bar{d})(d_j - \bar{d})' = C\left(\frac{1}{n-1}\sum_j (x_j - \bar{x})(x_j - \bar{x})'\right)C' = CSC'$

6.10 $(\bar{x}\mathbf{1})'[(\bar{x}_1 - \bar{x})u_1 + \cdots + (\bar{x}_g - \bar{x})u_g] = \bar{x}[(\bar{x}_1 - \bar{x})n_1 + \cdots + (\bar{x}_g - \bar{x})n_g]$
$= \bar{x}[n_1\bar{x}_1 + \cdots + n_g\bar{x}_g - \bar{x}(n_1 + \cdots + n_g)] = \bar{x}[(n_1 + \cdots + n_g)\bar{x} - \bar{x}(n_1 + \cdots + n_g)] = 0.$

6.11 $L(\mu_1, \mu_2, \Sigma) = L(\mu_1, \Sigma)L(\mu_2, \Sigma)$

$= \frac{1}{(2\pi)^{(n_1+n_2)p/2}|\Sigma|^{(n_1+n_2)/2}}\exp\left\{-\frac{1}{2}\left[\mathrm{tr}\,\Sigma^{-1}\big((n_1-1)S_1 + (n_2-1)S_2\big) + n_1(\bar{x}_1 - \mu_1)'\Sigma^{-1}(\bar{x}_1 - \mu_1) + n_2(\bar{x}_2 - \mu_2)'\Sigma^{-1}(\bar{x}_2 - \mu_2)\right]\right\}$

using (4-16) and (4-17). The likelihood is maximized with respect to $\mu_1$ and $\mu_2$ at $\hat{\mu}_1 = \bar{x}_1$ and $\hat{\mu}_2 = \bar{x}_2$ respectively and with respect to $\Sigma$ at

$\hat{\Sigma} = \frac{1}{n_1+n_2}\big((n_1-1)S_1 + (n_2-1)S_2\big) = \frac{n_1+n_2-2}{n_1+n_2}S_{pooled}$

(for the maximization with respect to $\Sigma$ see Result 4.10 with $b = \frac{n_1+n_2}{2}$ and $B = (n_1-1)S_1 + (n_2-1)S_2$).

6.13 a) and b) For first variable:
 Observation         mean        factor 1 effect    factor 2 effect     residual
  6  4  8  2      1 1 1 1        4  4  4  4         1 -2  4 -3        0  1 -1  0
  3 -3  4 -4   =  1 1 1 1   +   -1 -1 -1 -1    +    1 -2  4 -3   +    2 -1  0 -1
 -3 -4  3 -4      1 1 1 1       -3 -3 -3 -3         1 -2  4 -3       -2  0  1  1

 SS_tot = 220    SS_mean = 12    SS_fac1 = 104    SS_fac2 = 90    SS_res = 14

For second variable:

 Observation         mean        factor 1 effect    factor 2 effect     residual
  8  6 12  6      3 3 3 3        5  5  5  5         3 -2  1 -2       -3  0  3  0
  8  2  3  3   =  3 3 3 3   +    1  1  1  1    +    3 -2  1 -2   +    1  0 -2  1
  2 -5 -3 -6      3 3 3 3       -6 -6 -6 -6         3 -2  1 -2        2  0 -1 -1

 SS_tot = 440    SS_mean = 108    SS_fac1 = 248    SS_fac2 = 54    SS_res = 30

Sum of cross products:
 SCP_tot = SCP_mean + SCP_fac1 + SCP_fac2 + SCP_res
   227   =    36    +   148    +    51    -    8

c) MANOVA table:

Source of Variation      SSP                                                     d.f.
Factor 1                 $\begin{bmatrix} 104 & 148 \\ 148 & 248 \end{bmatrix}$         g - 1 = 3 - 1 = 2
Factor 2                 $\begin{bmatrix} 90 & 51 \\ 51 & 54 \end{bmatrix}$             b - 1 = 4 - 1 = 3
Residual                 $\begin{bmatrix} 14 & -8 \\ -8 & 30 \end{bmatrix}$             (g-1)(b-1) = 6
Total (Corrected)        $\begin{bmatrix} 208 & 191 \\ 191 & 332 \end{bmatrix}$         gb - 1 = 11

d) We reject $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ at $\alpha = .05$ level since

$-\left[(g-1)(b-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -\left(6 - \frac{3-2}{2}\right)\ln\left(\frac{|SSP_{res}|}{|SSP_{fac1} + SSP_{res}|}\right) = -5.5\ln\left(\frac{356}{13204}\right) = 19.87 > \chi^2_4(.05) = 9.49$

and conclude there are factor 1 effects.

We also reject $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ at the $\alpha = .05$ level since

$-\left[(g-1)(b-1) - \frac{p+1-(b-1)}{2}\right]\ln\Lambda^* = -\left(6 - \frac{3-3}{2}\right)\ln\left(\frac{|SSP_{res}|}{|SSP_{fac2} + SSP_{res}|}\right) = -6\ln\left(\frac{356}{6887}\right) = 17.77 > \chi^2_6(.05) = 12.59$

and conclude there are factor 2 effects.

6.14 b) MANOVA Table:

Source of Variation      SSP                                                     d.f.
Factor 1                 $\begin{bmatrix} 496 & 184 \\ 184 & 208 \end{bmatrix}$         2
Factor 2                 $\begin{bmatrix} 36 & 24 \\ 24 & 36 \end{bmatrix}$             3
Interaction              $\begin{bmatrix} 32 & 0 \\ 0 & 44 \end{bmatrix}$               6
Residual                 $\begin{bmatrix} 312 & -84 \\ -84 & 400 \end{bmatrix}$         12
Total (Corrected)        $\begin{bmatrix} 876 & 124 \\ 124 & 688 \end{bmatrix}$         23

c) Since

$-\left[gb(n-1) - \frac{p+1-(g-1)(b-1)}{2}\right]\ln\Lambda^* = -13.5\ln\left(\frac{|SSP_{res}|}{|SSP_{int} + SSP_{res}|}\right) = -13.5\ln(.808) = 2.88 < \chi^2_{12}(.05) = 21.03$

we do not reject $H_0: \gamma_{11} = \gamma_{12} = \cdots = \gamma_{34} = 0$ (no interaction effects) at the $\alpha = .05$ level.

Since

$-\left[gb(n-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -11.5\ln\left(\frac{|SSP_{res}|}{|SSP_{fac1} + SSP_{res}|}\right) = -11.5\ln(.2447) = 16.19 > \chi^2_4(.05) = 9.49$

we reject $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ (no factor 1 effects) at the $\alpha = .05$ level.
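The interaction and factor 1 statistics in 6.14 (c) can be recomputed from the SSP matrices in the MANOVA table. This Python sketch is an illustration only, with hypothetical helper names. It assumes n = 2 replications (inferred from the 12 residual degrees of freedom with g = 3, b = 4) and reads the (2,2) entry of the interaction SSP as 44, the value that makes the table add up to the corrected total.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def add2(a, b):
    """Elementwise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def wilks_lambda(ssp_effect, ssp_res):
    """Lambda* = |SSP_res| / |SSP_effect + SSP_res|."""
    return det2(ssp_res) / det2(add2(ssp_effect, ssp_res))


def bartlett_statistic(lam, g, b, n, p, effect_df):
    """-[gb(n-1) - (p + 1 - effect_df)/2] * ln(Lambda*)."""
    multiplier = g * b * (n - 1) - (p + 1 - effect_df) / 2
    return -multiplier * math.log(lam)


G, B, N, P = 3, 4, 2, 2  # N = 2 is inferred from the residual d.f. of 12
SSP_FAC1 = [[496, 184], [184, 208]]
SSP_INT = [[32, 0], [0, 44]]  # (2,2) entry assumed to be 44
SSP_RES = [[312, -84], [-84, 400]]

lam_int = wilks_lambda(SSP_INT, SSP_RES)
stat_int = bartlett_statistic(lam_int, G, B, N, P, (G - 1) * (B - 1))

lam_fac1 = wilks_lambda(SSP_FAC1, SSP_RES)
stat_fac1 = bartlett_statistic(lam_fac1, G, B, N, P, G - 1)

if __name__ == "__main__":
    print(f"interaction: Lambda* = {lam_int:.4f}, statistic = {stat_int:.2f}")
    print(f"factor 1:    Lambda* = {lam_fac1:.4f}, statistic = {stat_fac1:.2f}")
```

The interaction statistic comes out slightly below 2.88 because the solution rounds $\Lambda^*$ to .808 before taking the logarithm.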
Since

$-\left[gb(n-1) - \frac{p+1-(b-1)}{2}\right]\ln\Lambda^* = -12\ln\left(\frac{|SSP_{res}|}{|SSP_{fac2} + SSP_{res}|}\right) = -12\ln(.7949) = 2.75 < \chi^2_6(.05) = 12.59$

we do not reject $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (no factor 2 effects) at the $\alpha = .05$ level.

6.15 Example 6.11, g = b = 2, n = 5;

a) For $H_0: \tau_1 = \tau_2 = 0$, $\Lambda^* = .3819$. Since

$-\left[gb(n-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -14.5\ln(.3819) = 13.96 > \chi^2_3(.05) = 7.81$

we reject $H_0$ at $\alpha = .05$ level.

For $H_0: \beta_1 = \beta_2 = 0$, $\Lambda^* = .5230$ and $-14.5\ln(.5230) = 9.40$. Again we reject $H_0$ at $\alpha = .05$ level. These results are consistent with the exact F tests.

6.16 $H_0: C\mu = 0$; $H_1: C\mu \ne 0$ where

$C = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$

Summary statistics:

$\bar{x} = \begin{bmatrix} 1906.1 \\ 1749.5 \\ 1509.1 \\ 1725.0 \end{bmatrix}, \quad S = \begin{bmatrix} 105625 & 94759 & 87249 & 94268 \\ 94759 & 101761 & 76166 & 81193 \\ 87249 & 76166 & 91809 & 90333 \\ 94268 & 81193 & 90333 & 104329 \end{bmatrix}$

$T^2 = n(C\bar{x})'(CSC')^{-1}(C\bar{x}) = 254.7$

$\frac{(n-1)(q-1)}{(n-q+1)}F_{q-1,n-q+1}(\alpha) = \frac{(30-1)(4-1)}{(30-4+1)}F_{3,27}(.05) = 9.54$

Since $T^2 = 254.7 > 9.54$ we reject $H_0$ at $\alpha = .05$ level.

95% simultaneous confidence interval for "dynamic" versus "static" means $(\mu_1 + \mu_2) - (\mu_3 + \mu_4)$ is, with $c' = (1\ 1\ -1\ -1)$,

$c'\bar{x} \pm \sqrt{\frac{(n-1)(q-1)}{(n-q+1)}F_{q-1,n-q+1}(\alpha)}\sqrt{\frac{c'Sc}{n}} = 421.5 \pm 174.5 \rightarrow (247, 596)$

6.17 (a)
                     Parity
                Different    Same
Format  Words       1          2
        Arabic      3          4

Effects              Contrast
Parity main:         $(\mu_2 + \mu_4) - (\mu_1 + \mu_3)$
Format main:         $(\mu_3 + \mu_4) - (\mu_1 + \mu_2)$
Interaction:         $(\mu_2 + \mu_3) - (\mu_1 + \mu_4)$

Contrast matrix: $C = \begin{bmatrix} -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ -1 & 1 & 1 & -1 \end{bmatrix}$

Since $T^2 = 135.9 > \frac{31(3)}{29}(2.93) = 9.40$, reject $H_0: C\mu = 0$ (no treatment effects) at the 5% level.

(b) 95% simultaneous $T^2$ intervals for the contrasts:

Parity main effect:  $-206.4 \pm \sqrt{9.40}\sqrt{\frac{20{,}598.6}{32}} \rightarrow (-280.3, -125.1)$
Format main effect:  $-307 \pm \sqrt{9.40}\sqrt{\frac{42{,}939.5}{32}} \rightarrow (-411.4, -186.9)$
Interaction effect:  $22.4 \pm \sqrt{9.40}\sqrt{\frac{9{,}818.5}{32}} \rightarrow (-32.3, 75.0)$

No interaction effect. Parity effect: "different" responses slower than "same" responses. Format effect: "words" slower than "Arabic".

(c) The M model of numerical cognition is a reasonable population model for the scores.
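The critical constants used in 6.16 and 6.17 come from the same expression, $\frac{(n-1)(q-1)}{n-q+1}F_{q-1,n-q+1}(\alpha)$, and the 6.16 interval follows from $c'\bar{x}$ and $c'Sc$. The Python sketch below recomputes them from the summary statistics in 6.16. The F percentage points are supplied by hand rather than computed: 2.93 is quoted in 6.17, while 2.96 for $F_{3,27}(.05)$ is an assumed table value not printed in the solution. Function names are illustrative.

```python
import math


def contrast_critical_value(n, q, f_crit):
    """(n-1)(q-1)/(n-q+1) * F_{q-1, n-q+1}(alpha), with F supplied by the caller."""
    return (n - 1) * (q - 1) / (n - q + 1) * f_crit


def quadratic_form(c, s):
    """c' S c for a vector c and a square matrix S (nested lists)."""
    k = len(c)
    return sum(c[i] * s[i][j] * c[j] for i in range(k) for j in range(k))


# Exercise 6.16 summary statistics (n = 30, q = 4).
X_BAR = [1906.1, 1749.5, 1509.1, 1725.0]
S = [
    [105625, 94759, 87249, 94268],
    [94759, 101761, 76166, 81193],
    [87249, 76166, 91809, 90333],
    [94268, 81193, 90333, 104329],
]
CONTRAST = [1, 1, -1, -1]  # "dynamic" minus "static"
N_616 = 30

c2_616 = contrast_critical_value(N_616, 4, 2.96)  # 2.96 is an assumed table value
center_616 = sum(c * x for c, x in zip(CONTRAST, X_BAR))
half_width_616 = math.sqrt(c2_616 * quadratic_form(CONTRAST, S) / N_616)

# Exercise 6.17 critical value (n = 32, q = 4, F_{3,29}(.05) = 2.93 as quoted).
c2_617 = contrast_critical_value(32, 4, 2.93)

if __name__ == "__main__":
    print(f"6.16: c^2 = {c2_616:.2f}, interval = {center_616:.1f} +/- {half_width_616:.1f}")
    print(f"6.17: c^2 = {c2_617:.2f}")
```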
(d) The multivariate normal model is a reasonable model for the scores corresponding to the parity contrast, the format contrast and the interaction contrast.

6.18
[Figure: for female turtles and for male turtles, a chi-square plot of the ordered distances (against qchisq) and normal Q-Q plots of each variable (against Quantiles of Standard Normal)]

mean vector for females:  X1BAR    4.9006593   4.6229089   3.9402858
mean vector for males:    X2BAR    4.7254436   4.4775738   3.7031858

SPOOLED
  0.0187388   0.0140655   0.0165386
  0.0140655   0.0113036   0.0127148
  0.0165386   0.0127148   0.0158563

  TSQ          CVTSQ       F            CVF          PVALUE
  85.052001    8.833461    27.118029    2.8164658    4.355E-10

linear combination most responsible for rejection of H0 has coefficient vector:
  COEFFVEC   -43.72677   -8.710687   67.546415

95% simultaneous CI for the difference in female and male means
  LOWER        UPPER
  0.0577676    0.2926638
  0.0541167    0.2365537
  0.1290622    0.3451377

Bonferroni CI
  LOWER        UPPER
  0.0768599    0.2735714
  0.0689451    0.2217252
  0.1466248    0.3275751

6.19 a)
$\bar{x}_1 = \begin{bmatrix} 12.219 \\ 8.113 \\ 9.590 \end{bmatrix}, \quad \bar{x}_2 = \begin{bmatrix} 10.106 \\ 10.762 \\ 18.168 \end{bmatrix}$

$S_1 = \begin{bmatrix} 23.0134 & 12.3664 & 2.9066 \\ 12.3664 & 17.5441 & 4.7731 \\ 2.9066 & 4.7731 & 13.9633 \end{bmatrix}, \quad S_2 = \begin{bmatrix} 4.3623 & .7599 & 2.3621 \\ .7599 & 25.8512 & 7.6857 \\ 2.3621 & 7.6857 & 46.6543 \end{bmatrix}$

$S_{pooled} = \begin{bmatrix} 15.8112 & 7.8550 & 2.6959 \\ 7.8550 & 20.7458 & 5.8960 \\ 2.6959 & 5.8960 & 26.5750 \end{bmatrix}$

$\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{pooled}\right]^{-1} = \begin{bmatrix} 1.0939 & -.4084 & -.0203 \\ -.4084 & .8745 & -.1525 \\ -.0203 & -.1525 & .5640 \end{bmatrix}$

$H_0: \mu_1 - \mu_2 = 0$

Since $T^2 = (\bar{x}_1 - \bar{x}_2)'\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{pooled}\right]^{-1}(\bar{x}_1 - \bar{x}_2) = 50.92 > \frac{(n_1+n_2-2)p}{(n_1+n_2-p-1)}F_{p,n_1+n_2-p-1}(.01) = \frac{(57)(3)}{55}F_{3,55}(.01) = 13$, we reject $H_0$ at the $\alpha = .01$ level. There is a difference in the (mean) cost vectors between gasoline trucks and diesel trucks.

b) $\hat{a} \propto S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2) = \begin{bmatrix} 3.58 \\ -1.88 \\ -4.48 \end{bmatrix}$

c) 99% simultaneous confidence intervals are:

$\mu_{11} - \mu_{21}$: $2.113 \pm 3.790$
$\mu_{12} - \mu_{22}$: $-2.650 \pm 4.341$
$\mu_{13} - \mu_{23}$: $-8.578 \pm 4.913$

d) Assumption $\Sigma_1 = \Sigma_2$. Since $S_1$ and $S_2$ are quite different, it may not be reasonable to pool. However, using "large sample" theory ($n_1 = 36$, $n_2 = 23$) we have, by Result 6.4,

$(\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2))'\left[\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right]^{-1}(\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)) \sim \chi^2_p$ (approximately).

Since

$(\bar{x}_1 - \bar{x}_2)'\left[\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right]^{-1}(\bar{x}_1 - \bar{x}_2) = 43.15 > \chi^2_3(.01) = 11.34$

we reject $H_0: \mu_1 - \mu_2 = 0$ at the $\alpha = .01$ level. This is consistent with the result in part (a).

6.20 (a)
[Figure: scatter plot of tail length against wing length, with observation 31 flagged]

(b) The output below shows that the analysis does not differ when we delete observation 31 or when we consider it equal to 184. Both tests reject the null hypothesis of equal mean difference. The most critical linear combination leading to the rejection of $H_0$ has coefficient vector $(-3.490238,\ 2.07955)'$ and the linear combination most responsible for the rejection of $H_0$ is the Tail difference.

(c) Results below.

Comparing Mean Vectors from Two Populations [Obs. 31 Deleted]

  T2           C
  25.005014    5.9914645

  Reject H0. There is mean difference

  95% simultaneous confidence intervals:
  LABELCI          LICIMD       LSCIMD
  Mean Diff. 1:   -11.76436    -1.161905
  Mean Diff. 2:   -5.985685     8.3392202

  Coefficient Vector:   COEF
  -3.490238   (Tail difference)
   2.07955    (Wing difference)

Comparing Mean Vectors from Two Populations [Obs. 31 = 184]

  T2           C
  25.662531    5.9914645

  Reject H0. There is mean difference

  95% simultaneous confidence intervals:
  LABELCI          LICIMD       LSCIMD
  Mean Diff. 1:   -11.78669    -1.27998
  Mean Diff. 2:   -6.003431     8.1812088

  Coefficient Vector:   COEF
  -3.574268
   2.1220203

[Figure: 95% Confidence Ellipse About the Mean Vector]

(d) Female birds are generally larger, since the confidence interval bounds for the difference in Tails (Male - Female) are negative and the confidence interval for the difference in Wings includes zero, indicating no significant difference.

6.21 (a) The (4,2) and (4,4) entries in $S_1$ and $S_2$ differ considerably. However, $n_1 = n_2$ so the large sample approximation amounts to pooling.

(b) $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \ne 0$

$T^2 = 15.830 > \frac{(38)(4)}{35}F_{4,35}(.05) = 11.47$ so we reject $H_0$ at the $\alpha = .05$ level.

(c) $\hat{a} \propto S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2) = \begin{bmatrix} -.24 \\ -3.74 \\ .16 \\ .01 \end{bmatrix}$

(d) Looking at the coefficients $\hat{a}_i\sqrt{s_{ii,pooled}}$, which apply to the standardized variables, we see that $X_2$: long term interest rate has the largest coefficient and therefore might be useful in classifying a bond as "high" or "medium" quality.

(e) From (b), $T^2 = 15.830$. Have p = 4 and $\nu = \frac{4+16}{.53556} = 37.344$ so, at the 5% level, the critical value is

$\frac{\nu p}{\nu - p + 1}F_{p,\nu-p+1}(.05) = \frac{37.344(4)}{37.344 - 4 + 1}F_{4,37.344-4+1}(.05) = \frac{149.376}{34.344}(2.647) = 11.513$

Since $T^2 = 15.830 > 11.513$, reject $H_0: \mu_1 - \mu_2 = 0$, the same conclusion reached in (b). Notice the critical value here is only slightly larger than the critical value in (b).

6.22 (a) The sample means for female and male are:

$\bar{x}_F = \begin{bmatrix} 0.3136 \\ 5.1788 \\ 2.3152 \\ 38.1548 \end{bmatrix}, \quad \bar{x}_M = \begin{bmatrix} 0.3972 \\ 5.3296 \\ 3.6876 \\ 49.3404 \end{bmatrix}$

The Hotelling's $T^2 = 96.487 > 11.00$ where 11.00 is a critical point corresponding to $\alpha = 0.05$. Therefore, we reject $H_0: \mu_1 - \mu_2 = 0$.
The coefficient vector of the linear combination most responsible for rejection is $(-95.600,\ 6.145,\ 5.737,\ -0.762)'$.

(b) The 95% simultaneous C.I. for female mean - male mean:

  (-0.1697,      0.0025)
  (-1.4650835,   1.16348346)
  (-1.8760572,  -0.8687428)
  (-17.032834,  -5.3383659)

(c) We cannot extend the obtained result to the population of persons in their mid-twenties. Firstly, this was a self-selected sample of volunteers (friends) and is not even a random sample of graduate students. Further, graduate students are probably more sedentary than the typical persons of their age.

6.23 $n_1 = n_2 = n_3 = 50$; p = 2, g = 3 (sepal width and petal width responses only!)

$\bar{x}_1 = \begin{bmatrix} 3.428 \\ .300 \end{bmatrix}; \quad S_1 = \begin{bmatrix} .14364 & -.00474 \\ -.00474 & .18576 \end{bmatrix}$

$\bar{x}_2 = \begin{bmatrix} 2.770 \\ 1.326 \end{bmatrix}; \quad S_2 = \begin{bmatrix} .09860 & .0418 \\ .0418 & .03920 \end{bmatrix}$

$\bar{x}_3 = \begin{bmatrix} 2.974 \\ 2.026 \end{bmatrix}; \quad S_3 = \begin{bmatrix} .10368 & .0471 \\ .0471 & .07563 \end{bmatrix}$

MANOVA Table:

Source        SSP                                                                   d.f.
Treatment     $B = \begin{bmatrix} 11.344 & -21.820 \\ -21.820 & 75.352 \end{bmatrix}$         2
Residual      $W = \begin{bmatrix} 16.950 & 4.125 \\ 4.125 & 14.729 \end{bmatrix}$             147
Total         $B + W = \begin{bmatrix} 28.294 & -17.695 \\ -17.695 & 90.081 \end{bmatrix}$     149

$\Lambda^* = \frac{|W|}{|B+W|} = \frac{232.64}{2235.64} = .104$

Since $\left(\frac{\sum n_\ell - p - 2}{p}\right)\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) = 153.3 > 2.37 = F_{4,292}(.05)$ we reject $H_0: \tau_1 = \tau_2 = \tau_3$ at the $\alpha = .05$ level.

6.24 Wilks' lambda: $\Lambda^* = .8301$. Since g = 3, $\left(\frac{90-4-2}{4}\right)\left(\frac{1-\sqrt{.8301}}{\sqrt{.8301}}\right) = 2.049$ is an F value with 8 and 168 degrees of freedom. Since p-value = P(F > 2.049) = .044, we would just reject the null hypothesis $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ at the 5% level, implying there is a time period effect.

F statistics and p-values for ANOVA's:

              F       p-value
MaxBrth:     3.66      .030
BasHght:     0.47      .629
BasLgth:     3.84      .025
NasHght:     0.10      .901

Any differences over time periods are probably due to changes in maximum breadth of skull (MaxBrth) and basialveolar length of skull (BasLgth).
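For g = 3 groups, the exact F transformation of Wilks' lambda used in 6.23 and 6.24 is $\left(\frac{\sum n_\ell - p - 2}{p}\right)\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right)$ with $2p$ and $2(\sum n_\ell - p - 2)$ degrees of freedom. A minimal Python sketch, using only the W and B + W matrices from the 6.23 MANOVA table and the $\Lambda^*$ quoted in 6.24, is given below; the function names are illustrative.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_f_three_groups(lam, n_total, p):
    """Exact F statistic and degrees of freedom for Wilks' lambda when g = 3.

    Returns (F, numerator d.f., denominator d.f.).
    """
    root = math.sqrt(lam)
    f_stat = (n_total - p - 2) / p * (1 - root) / root
    return f_stat, 2 * p, 2 * (n_total - p - 2)


# Exercise 6.23: three samples of 50, p = 2.
W = [[16.950, 4.125], [4.125, 14.729]]
B_PLUS_W = [[28.294, -17.695], [-17.695, 90.081]]
lam_623 = det2(W) / det2(B_PLUS_W)
f_623 = wilks_f_three_groups(lam_623, 150, 2)

# Exercise 6.24: three samples of 30, p = 4, Lambda* = .8301 as quoted.
f_624 = wilks_f_three_groups(0.8301, 90, 4)

if __name__ == "__main__":
    print(f"6.23: Lambda* = {lam_623:.3f}, F = {f_623[0]:.1f} on {f_623[1:]} d.f.")
    print(f"6.24: F = {f_624[0]:.3f} on {f_624[1:]} d.f.")
```

The p-value in 6.24 is not recomputed here, since that would need an F distribution function from a statistics library.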
95% Bonferroni simultaneous intervals: m = pg(g-1)/2 = 12, $t_{87}(.05/24) = 2.94$

MaxBrth
$\tau_{11} - \tau_{21}$: $-1 \pm 2.94\sqrt{\frac{1785.4}{87}\left(\frac{1}{30} + \frac{1}{30}\right)} \rightarrow -1 \pm 3.44$
$\tau_{11} - \tau_{31}$: $-3.1 \pm 3.44$
$\tau_{21} - \tau_{31}$: $-2.1 \pm 3.44$

BasHght
$\tau_{12} - \tau_{22}$: $0.9 \pm 2.94\sqrt{\frac{1924.3}{87}\left(\frac{1}{30} + \frac{1}{30}\right)} \rightarrow 0.9 \pm 3.57$
$\tau_{12} - \tau_{32}$: $-0.2 \pm 3.57$
$\tau_{22} - \tau_{32}$: $-1.1 \pm 3.57$

BasLgth
$\tau_{13} - \tau_{23}$: $0.10 \pm 2.94\sqrt{\frac{2153}{87}\left(\frac{1}{30} + \frac{1}{30}\right)} \rightarrow 0.10 \pm 3.78$
$\tau_{13} - \tau_{33}$: $3.14 \pm 3.78$
$\tau_{23} - \tau_{33}$: $3.03 \pm 3.78$

NasHght
$\tau_{14} - \tau_{24}$: $0.30 \pm 2.94\sqrt{\frac{840.2}{87}\left(\frac{1}{30} + \frac{1}{30}\right)} \rightarrow 0.30 \pm 2.36$
$\tau_{14} - \tau_{34}$: $-0.03 \pm 2.36$
$\tau_{24} - \tau_{34}$: $-0.33 \pm 2.36$

All the simultaneous intervals include 0. Evidence for changes in skull size over time is marginal. If changes exist, then these changes might be in maximum breadth and basialveolar length of skull from time periods 1 to 3.

The usual MANOVA assumptions appear to be satisfied for these data.

6.25 Without transforming the data, $\Lambda^* = \frac{|W|}{|B+W|} = .1159$ and F = 18.98.
After transformation, $\Lambda^* = .1198$ and F = 18.52.
$F_{10,98}(.05) = 1.93$
There is a clear need for transforming the data to make the hypothesis tenable.

6.26 To test for parallelism, consider $H_{01}: C\mu_1 = C\mu_2$ with C given by (6-61). The printed entries of $C(\bar{x}_1 - \bar{x}_2)$ are -.413, -.167 and .036, and those of $(CS_{pooled}C')^{-1}$ are .947, 2.014, .616, 1.144, .674 and 2.341.

Since $T^2 = 9.58 > c^2 = 8.0$, we reject $H_0$ at the $\alpha = .05$ level. The excess electrical usage of the test group was much lower than that of the control group for the 11 A.M., 1 P.M. and 3 P.M. hours. The similar 9 A.M. usage for the two groups contradicts the parallelism hypothesis.

6.27 a) Plots of the husband and wife profiles look similar but seem disparate for the level of "companionate love that you feel for your partner".

b) Parallelism hypothesis $H_0: C\mu_1 = C\mu_2$ with C given by (6-61). The printed entries of $C(\bar{x}_1 - \bar{x}_2)$ are -.17, -.33 and .13, and those of $CS_{pooled}C'$ are .733, .870, .685, .029, -.028 and .095.

For $\alpha = .05$, $c^2 = 8.7$ (see (6-62)). Since $T^2 = 19.58 > c^2 = 8.7$ we reject $H_0$ at the $\alpha = .05$ level.

6.28 $T^2 = 106.13 > 16.59$.
We reject $H_0: \mu_1 - \mu_2 = 0$ at 5% significance level. There is a significant difference in the two species.

Sample Mean for L. torrens and L. carteri:

  L. torrens    L. carteri    Difference
    96.457        99.343        -2.886
    42.914        43.743        -0.829
    35.371        39.314        -3.943
    14.514        14.657        -0.143
    25.629        30.000        -4.371
     9.571         9.657        -0.086
     9.714         9.371         0.343

Pooled Sample Covariance Matrix (triangular display; the diagonal entries are 36.008, 16.639, 6.437, 3.039, 13.767, 1.213 and 0.990). Entries in the order printed:

  36.008  2.426  2.649  1.053  0.934  6.437  0.692  1.615  0.211  0.671
  3.039  2.407  0.274  0.229  13.767  0.565  0.637  1.213  0.914  0.990
  14.595  6.078  16.639  2.764  3.675  9.573  2.992  6.101

Linear combination most responsible for rejection of $H_0$: L. torrens mean - L. carteri mean = 0 is
$(0.006,\ 0.151,\ -0.854,\ 0.268,\ -0.383,\ -2.187,\ 2.971)'$

95% Simultaneous C.I. for L. torrens mean - L. carteri mean:

   LOWER    UPPER
   -8.73     2.96
   -4.80     3.14
   -6.41    -1.47
   -1.84     1.55
   -7.98    -0.76
   -1.16     0.99
   -0.63     1.31

The third and fifth components are most responsible for rejecting $H_0$. The $\chi^2$ plots look fairly straight.

[Figure: CHI-SQUARE PLOT FOR L. carteri and CHI-SQUARE PLOT FOR L. torrens]

6.29 (a). Summary Statistics:

  XBAR        S
  0.02548     0.00366259   0.00482862   0.00154159
  0.05784     0.00482862   0.01628931   0.00304801
  0.01056     0.00154159   0.00304801   0.00602526

Hotelling's $T^2 = 5.946$. The critical point is 9.979 and we fail to reject $H_0: \mu_1 - \mu_2 = 0$ at 5% significance level.

(b). (c).
                          LOWER      UPPER
Bonferroni C.I.:         -0.0057     0.0566
                         -0.0079     0.1235
                         -0.0294     0.0505
Simultaneous C.I.:       -0.0128     0.0637
                         -0.0228     0.1385
                         -0.0385     0.0596

6.30 HOTELLING T SQUARE = 9.0218    P-VALUE 0.3616

                                   T2 INTERVAL          BONFERRONI
       N    MEAN       STDEV                TO                   TO
x1    24    0.00012    0.04817    -.0443    .0445     -.0283    .0285
x2    24   -0.00325    0.02751    -.0286    .0221     -.0195    .0130
x3    24   -0.0072     0.1030     -.1020    .0876     -.0679    .0535
x4    24   -0.0123     0.0625     -.0701    .0455     -.0493    .0247
x5    24    0.01513    0.03074    -.0130    .0436     -.0030    .0333
x6    24    0.00017    0.04689    -.0430    .0434     -.0275    .0278

The Bonferroni intervals use t(.00417) = 2.89 and the T2 intervals use the constant 4.516.

6.31 (a) Two-factor MANOVA of peanuts data

E = Error SS&CP Matrix
          X1          X2          X3
X1     104.205      49.365      76.48
X2      49.365     352.105     121.995
X3      76.48      121.995      94.835

H = Type III SS&CP Matrix for FACTOR1 (Location)
          X1               X2            X3
X1      0.7008333333    -10.6575        7.1291666667
X2    -10.6575          162.0675     -108.4125
X3      7.1291666667   -108.4125       72.520833333

Manova Test Criteria and Exact F Statistics for the Hypothesis of no Overall FACTOR1 Effect
H = Type III SS&CP Matrix for FACTOR1   E = Error SS&CP Matrix
S=1  M=0.5  N=1

Statistic                  Value         F          Num DF   Den DF   Pr > F
Wilks' Lambda              0.10651620    11.1843      3        4      0.0205
Pillai's Trace             0.89348380    11.1843      3        4      0.0205
Hotelling-Lawley Trace     8.38824348    11.1843      3        4      0.0205
Roy's Greatest Root        8.38824348    11.1843      3        4      0.0205

H = Type III SS&CP Matrix for FACTOR2 (Variety)
          X1           X2            X3
X1     196.115      365.1825      42.6275
X2     365.1825    1089.015      414.655
X3      42.6275     414.655      284.10166667

Manova Test Criteria and F Approximations for the Hypothesis of no Overall FACTOR2 Effect
H = Type III SS&CP Matrix for FACTOR2   E = Error SS&CP Matrix
S=2  M=0  N=1

Statistic                  Value          F          Num DF   Den DF   Pr > F
Wilks' Lambda              0.01244417     10.6191      6        8      0.0019
Pillai's Trace             1.70910921      9.7924      6       10      0.0011
Hotelling-Lawley Trace     21.37567504    10.6878      6        6      0.0055
Roy's Greatest Root        18.18761127    30.3127      3        5      0.0012

H = Type III SS&CP Matrix for FACTOR1*FACTOR2
          X1               X2           X3
X1     205.10166667     363.6675     107.78583333
X2     363.6675         780.695      254.22
X3     107.78583333     254.22        85.951666667

Manova Test Criteria and F Approximations for the Hypothesis of no Overall FACTOR1*FACTOR2 Effect
H = Type III SS&CP Matrix for FACTOR1*FACTOR2   E = Error SS&CP Matrix
S=2  M=0  N=1

Statistic                  Value         F          Num DF   Den DF   Pr > F
Wilks' Lambda              0.07429984     3.5582      6        8      0.0508
Pillai's Trace             1.29086073     3.0339      6       10      0.0587
Hotelling-Lawley Trace     7.54429038     3.7721      6        6      0.0655
Roy's Greatest Root        6.82409388    11.3735      3        5      0.0113

(b) The residuals for X2 at location 2 for variety 5 seem large in absolute value, but Q-Q plots of residuals indicate that univariate normality cannot be rejected for all three variables.

CODE  FACTOR1  FACTOR2   PRED1    RES1    PRED2    RES2    PRED3   RES3
 a       1        5      194.80    0.50   160.40   -7.30   52.55   -1.15
 a       1        5      194.80   -0.50   160.40    7.30   52.55    1.15
 b       2        5      185.05    4.65   130.30    9.20   49.95    5.55
 b       2        5      185.05   -4.65   130.30   -9.20   49.95   -5.55
 c       1        6      199.45    3.55   161.40   -4.60   47.80    2.00
 c       1        6      199.45   -3.55   161.40    4.60   47.80   -2.00
 d       2        6      200.15    2.55   163.95    2.15   57.25    3.15
 d       2        6      200.15   -2.55   163.95   -2.15   57.25   -3.15
 e       1        8      190.25    3.25   164.80   -0.30   58.20   -0.40
 e       1        8      190.25   -3.25   164.80    0.30   58.20    0.40
 f       2        8      200.75    0.75   170.30   -3.50   66.10   -1.10
 f       2        8      200.75   -0.75   170.30    3.50   66.10    1.10

[Figure 1: Q-Q Plot - Residual for Yield]
[Figure 2: Q-Q Plot - Residual for Sound Mature Kernels]
[Figure 3: Q-Q Plot - Residual for Seed Size]

(c) Univariate two factor ANOVAs follow. Evidence of variety effect and, for X1 = yield and X2 = sound mature kernels, a location*variety interaction.

Dependent Variable: yield
                             Sum of
Source             DF       Squares        Mean Square    F Value   Pr > F
Model               5      401.9175000     80.3835000      4.63     0.0446
Error               6      104.2050000     17.3675000
Corrected Total    11      506.1225000

R-Square    Coeff Var    Root MSE    yield Mean
0.794111    2.136324     4.167433    195.0750

Source              DF     Type III SS     Mean Square    F Value   Pr > F
location             1       0.7008333       0.7008333     0.04     0.8474
variety              2     196.1150000      98.0575000     5.65     0.0418
location*variety     2     205.1016667     102.5508333     5.90     0.0382

Dependent Variable: sdmatker
                             Sum of
Source             DF       Squares        Mean Square    F Value   Pr > F
Model               5     2031.777500     406.355500       6.92     0.0177
Error               6      352.105000      58.684167
Corrected Total    11     2383.882500

R-Square    Coeff Var    Root MSE    sdmatker Mean
0.852298    4.832398     7.660559    158.5250

Source              DF     Type III SS     Mean Square    F Value   Pr > F
location             1      162.067500     162.067500      2.76     0.1476
variety              2     1089.015000     544.507500      9.28     0.0146
location*variety     2      780.695000     390.347500      6.65     0.0300

Dependent Variable: seedsize
                             Sum of
Source             DF       Squares        Mean Square    F Value   Pr > F
Model               5      442.5741667     88.5148333      5.60     0.0292
Error               6       94.8350000     15.8058333
Corrected Total    11      537.4091667

R-Square    Coeff Var    Root MSE    seedsize Mean
0.823533    7.188166     3.975655    55.30833

Source              DF     Type III SS     Mean Square    F Value   Pr > F
location             1      72.5208333      72.5208333     4.59     0.0759
variety              2     284.1016667     142.0508333     8.99     0.0157
location*variety     2      85.9516667      42.9758333     2.72     0.1443

(d) Bonferroni simultaneous comparisons of variety. Only varieties 5 and 8 differ, and they differ only on X3.

Bonferroni (Dunn) T tests for variable: X1
Alpha= 0.05  Confidence= 0.95  df= 8  MSE= 38.66333
Critical Value of T= 3.01576
Minimum Significant Difference= 13.26
Comparisons significant at the 0.05 level are indicated by '***'.

                 Simultaneous                    Simultaneous
                 Lower           Difference      Upper
FACTOR2          Confidence      Between         Confidence
Comparison       Limit           Means           Limit
6 - 8             -8.960           4.300          17.560
6 - 5             -3.385           9.875          23.135
8 - 6            -17.560          -4.300           8.960
8 - 5             -7.685           5.575          18.835
5 - 6            -23.135          -9.875           3.385
5 - 8            -18.835          -5.575           7.685

Bonferroni (Dunn) T tests for variable: X2
Alpha= 0.05  Confidence= 0.95  df= 8  MSE= 141.6
Critical Value of T= 3.01576
Minimum Significant Difference= 25.375
Comparisons significant at the 0.05 level are indicated by '***'.

                 Simultaneous                    Simultaneous
                 Lower           Difference      Upper
FACTOR2          Confidence      Between         Confidence
Comparison       Limit           Means           Limit
8 - 6            -20.500           4.875          30.250
8 - 5             -3.175          22.200          47.575
6 - 8            -30.250          -4.875          20.500
6 - 5             -8.050          17.325          42.700
5 - 8            -47.575         -22.200           3.175
5 - 6            -42.700         -17.325           8.050

Bonferroni (Dunn) T tests for variable: X3
Alpha= 0.05  Confidence= 0.95  df= 8  MSE= 22.59833
Critical Value of T= 3.01576
Minimum Significant Difference= 10.137
Comparisons significant at the 0.05 level are indicated by '***'.

                 Simultaneous                    Simultaneous
                 Lower           Difference      Upper
FACTOR2          Confidence      Between         Confidence
Comparison       Limit           Means           Limit
8 - 6             -0.512           9.625          19.762
8 - 5              0.763          10.900          21.037    ***
6 - 8            -19.762          -9.625           0.512
6 - 5             -8.862           1.275          11.412
5 - 8            -21.037         -10.900          -0.763    ***
5 - 6            -11.412          -1.275           8.862

6.32 (a) MANOVA for Species: Wilks' lambda $\Lambda_1^* = .00823$
F = 5.011; p-value = P(F > 5.011) = .173
$F_{4,2}(.05) = 19.25$
Do not reject $H_0$: No species effects

MANOVA for Nutrient: Wilks' lambda $\Lambda_2^* = .31599$
F = 1.082; p-value = P(F > 1.082) = .562
$F_{2,1}(.05) = 199.5$
Do not reject $H_0$: No nutrient effects

(b) Minitab output for the two-way ANOVA's:

560CM
Analysis of Variance for 560CM
Source       DF        SS         MS         F        P
Spec          2      47.476     23.738     10.06    0.090
Nutrient      1       8.260      8.260      3.50    0.202
Error         2       4.722      2.361
Total         5      60.458

720CM
Analysis of Variance for 720CM
Source       DF        SS         MS         F        P
Spec          2     262.239    131.119     28.82    0.034
Nutrient      1       4.489      4.489      0.99    0.425
Error         2       9.099      4.550
Total         5     275.827

The ANOVA results are mostly consistent with the MANOVA results. The exception is for 720CM where there appears to be Species effects. A look at the data suggests the spectral reflectance of Japanese larch (JL) at 720 nanometers is somewhat larger than the reflectance of the other two species (SS and LP) regardless of nutrient level. This difference is not as apparent at 560 nanometers.

For MANOVA, the value of Wilks' lambda statistic does not indicate Species effects. However, Pillai's trace statistic, 1.6776 with F = 5.203 and p-value = .07, suggests there may be Species effects. (For Nutrient, Wilks' lambda and Pillai's trace statistic give the same F value.)
For larger sample sizes, Wilks' lambda and Pillai's trace statistic would give essentially the same result for all factors.

6.33 (a) MANOVA for Species: Wilks' lambda $\Lambda^* = .06877$
F = 36.571; p-value = P(F > 36.571) = .000; $F_{4,52}(.05) = 2.55$
Reject H0: No species effects

MANOVA for Time: Wilks' lambda $\Lambda^* = .04917$
F = 45.629; p-value = P(F > 45.629) = .000; $F_{4,52}(.05) = 2.55$
Reject H0: No time effects

MANOVA for Species*Time: Wilks' lambda $\Lambda^* = .08707$
F = 15.528; p-value = P(F > 15.528) = .000; $F_{8,52}(.05) = 2.12$
Reject H0: No interaction effects

(b) There are a few outliers but, in general, the residuals are approximately normally distributed (see histograms below). Observations are likely to be positively correlated over time. Observations are not independent.

[Figure: Histogram of the Residuals (response is 560nm); axes: Residual, Frequency]
[Figure: Histogram of the Residuals (response is 720nm); axes: Residual, Frequency]

(c) Interaction shows up for the 560nm wavelength but not for the 720nm wavelength. See the Minitab ANOVA output below.

Analysis of Variance for 560nm
Source         DF   SS        MS       F        P
Species        2    965.18    482.59   169.97   0.000
Time           2    1275.25   637.62   224.58   0.000
Species*Time   4    795.81    198.95   70.07    0.000
Error          27   76.66     2.84
Total          35   3112.90

Analysis of Variance for 720nm
Source         DF   SS        MS        F       P
Species        2    2026.86   1013.43   15.46   0.000
Time           2    5573.81   2786.90   42.52   0.000
Species*Time   4    193.55    48.39     0.74    0.574
Error          27   1769.64   65.54
Total          35   9563.85

(d) The data might be analyzed using the growth curve methodology discussed in Chapter 6 of the text. The data might also be analyzed assuming species are "nested" within date. In this case, an interesting question is: Is spectral reflectance the same for all species for each date?
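The F values quoted beside Wilks' lambda in 6.32(a) and 6.33(a) can be checked by hand. The sketch below assumes the standard exact transformation for p = 2 responses, $F = \frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\cdot\frac{\nu_e - 1}{\nu_h}$ with $2\nu_h$ and $2(\nu_e - 1)$ degrees of freedom, where $\nu_h$ and $\nu_e$ are the hypothesis and error degrees of freedom. The function name `wilks_exact_f` is illustrative, and the degrees of freedom passed in are read from the ANOVA tables above; neither is part of the original solution.

```python
import math

def wilks_exact_f(lam, df_hyp, df_err):
    """Exact F transformation of Wilks' lambda when there are p = 2 responses.

    Returns (F, numerator df, denominator df).
    """
    root = math.sqrt(lam)
    f_stat = (1.0 - root) / root * (df_err - 1) / df_hyp
    return f_stat, 2 * df_hyp, 2 * (df_err - 1)

# (label, Wilks' lambda, hypothesis df, error df) as read from the solutions
cases = [
    ("6.32 Species", 0.00823, 2, 2),
    ("6.33 Species", 0.06877, 2, 27),
    ("6.33 Time", 0.04917, 2, 27),
    ("6.33 Species*Time", 0.08707, 4, 27),
]

results = {label: wilks_exact_f(lam, h, e) for label, lam, h, e in cases}
for label, (f_stat, df1, df2) in results.items():
    print(f"{label}: F = {f_stat:.3f} on ({df1}, {df2}) df")
```

For a one degree of freedom factor such as Nutrient in 6.32, the solution reports the equivalent form $\frac{1-\Lambda^*}{\Lambda^*}\cdot\frac{\nu_e - p + 1}{p}$ on $(p, \nu_e - p + 1) = (2, 1)$ degrees of freedom, which gives F = 1.082.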
6.34 Fitting a linear growth curve to calcium measurements on the dominant ulna

XBAR
72.3800  69.2875
73.2933  70.6562
72.4733  71.1812
64.7867  64.5312

Grand mean
71.1939  71.8273  72.1848  65.2567

MLE of beta
73.4707  70.5049
-1.9035  -0.9818

(B'Sp^(-1)B)^(-1)
93.1313  -5.2393
-5.2393   1.2948

S1
92.1189  86.1106  73.3623  74.5890
86.1106  89.0764  72.9555  71.7728
73.3623  72.9555  71.8907  63.5918
74.5890  71.7728  63.5918  75.4441

S2
98.1745   97.0134  89.4824  86.1111
97.0134  100.5960  88.1425  88.2095
89.4824   88.1425  86.3496  80.5506
86.1111   88.2095  80.5506  81.4156

Spooled
95.2511  91.7500  81.7003  80.5487
91.7500  95.0348  80.8108  80.2745
81.7003  80.8108  79.3694  72.3636
80.5487  80.2745  72.3636  78.5328

W = (N-g)*Spooled
2762.282  2660.749  2369.308  2335.912
2660.749  2756.009  2343.514  2327.961
2369.308  2343.514  2301.714  2098.544
2335.912  2327.961  2098.544  2277.452

Estimated covariance matrix
 7.1816  -0.4040   0.0000   0.0000
-0.4040   0.0998   0.0000   0.0000
 0.0000   0.0000   6.7328  -0.3788
 0.0000   0.0000  -0.3788   0.0936

W1
2803.839  2610.438  2271.920  2443.549
2610.438  2821.243  2464.120  2196.065
2271.920  2464.120  2531.625  1845.313
2443.549  2196.065  1845.313  2556.818

Lambda = |W| / |W1| = 0.201

Since, with $\alpha = 0.01$,
\[
-\left[N - \tfrac{1}{2}(p - q + g)\right]\log(\Lambda) = 45.72 > \chi^2_{(4-1-1)2}(0.01) = 13.28,
\]
we reject the null hypothesis of a linear fit at $\alpha = 0.01$.

6.35 Fitting a quadratic growth curve to calcium measurements on the dominant ulna, treating all 31 subjects as a single group.

XBAR
70.7839  71.9323  71.8065  64.6548

MLE of beta
71.6039  3.8673  -1.9404

(B'S^(-1)B)^(-1)
92.2789  -5.9783   0.0799
-5.9783   9.3020  -2.9033
 0.0799  -2.9033   1.0760

S
94.5441  90.7962  80.0081  78.0676
90.7962  93.6616  78.9965  77.7725
80.0081  78.9965  77.1546  70.0366
78.0676  77.7725  70.0366  75.9319

W = (n-1)*S
2836.322  2723.886  2400.243  2342.027
2723.886  2809.848  2369.894  2333.175
2400.243  2369.894  2314.639  2101.099
2342.027  2333.175  2101.099  2277.957

Estimated covariance matrix
 3.1894  -0.2066   0.0028
-0.2066   0.3215  -0.1003
 0.0028  -0.1003   0.0372

W2
2857.167  2764.522  2394.410  2369.674
2764.522  2889.063  2358.522  2387.070
2394.410  2358.522  2316.271  2093.362
2369.674  2387.070  2093.362  2314.625

Lambda = |W| / |W2| = 0.7653

Since, with $\alpha = 0.01$,
\[
-\left[n - \tfrac{1}{2}(p - q + 1)\right]\log(\Lambda) = 7.893 > \chi^2_{4-2-1}(0.01) = 6.635,
\]
we reject the null hypothesis of a quadratic fit at $\alpha = 0.01$.

6.36 Here $p = 2$, $n_1 = 45$, $n_2 = 55$, $\ln|S_1| = 19.90948$, $\ln|S_2| = 18.40324$, $\ln|S_{pooled}| = 19.27712$, so
\[
u = \left(\frac{1}{44} + \frac{1}{54} - \frac{1}{44+54}\right)\left[\frac{2(4) + 3(2) - 1}{6(2+1)(2-1)}\right] = .02242
\]
and
\[
C = (1 - .02242)\left[98(19.27712) - 44(19.90948) - 54(18.40324)\right] = 18.93
\]
The chi-square degrees of freedom are $\nu = \tfrac{1}{2}\,2(3)(1) = 3$ and $\chi^2_3(.05) = 7.81$. Since $C = 18.93 > \chi^2_3(.05) = 7.81$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma$ at the 5% level.

6.37 Here $p = 3$, $n_1 = 24$, $n_2 = 24$, $\ln|S_1| = 9.48091$, $\ln|S_2| = 6.67870$, $\ln|S_{pooled}| = 8.62718$, so
\[
u = \left(\frac{1}{23} + \frac{1}{23} - \frac{1}{23+23}\right)\left[\frac{2(9) + 3(3) - 1}{6(3+1)(2-1)}\right] = .07065
\]
and
\[
C = (1 - .07065)\left[46(8.62718) - 23(9.48091) - 23(6.67870)\right] = 23.40
\]
The chi-square degrees of freedom are $\nu = \tfrac{1}{2}\,3(4)(1) = 6$ and $\chi^2_6(.05) = 12.59$. Since $C = 23.40 > \chi^2_6(.05) = 12.59$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma$ at the 5% level.

6.38 Working with the transformed data, $X_1$ = vanadium, $X_2 = \sqrt{\text{iron}}$, $X_3 = \sqrt{\text{beryllium}}$, $X_4 = 1/[\text{saturated hydrocarbons}]$, $X_5$ = aromatic hydrocarbons, we have $p = 5$, $n_1 = 7$, $n_2 = 11$, $n_3 = 38$, $\ln|S_1| = -17.81620$, $\ln|S_2| = -7.24900$, $\ln|S_3| = -7.09274$, $\ln|S_{pooled}| = -7.11438$, so
\[
u = \left(\frac{1}{6} + \frac{1}{10} + \frac{1}{37} - \frac{1}{6+10+37}\right)\left[\frac{2(25) + 3(5) - 1}{6(5+1)(3-1)}\right] = .24429
\]
and
\[
C = (1 - .24429)\left[53(-7.11438) - 6(-17.81620) - 10(-7.24900) - 37(-7.09274)\right] = 48.94
\]
The chi-square degrees of freedom are $\nu = \tfrac{1}{2}\,5(6)(2) = 30$ and $\chi^2_{30}(.05) = 43.77$. Since $C = 48.94 > \chi^2_{30}(.05) = 43.77$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma_3 = \Sigma$ at the 5% level.

6.39 (a) Following Example 6.5, we have $(\bar{x}_F - \bar{x}_M)' = (119.55,\ 29.97)$,
\[
\left(\frac{1}{28}S_F + \frac{1}{28}S_M\right)^{-1} = \begin{bmatrix} .033186 & -.108533 \\ -.108533 & .423508 \end{bmatrix}
\]
and $T^2 = 76.97$. Since $T^2 = 76.97 > \chi^2_2(.05) = 5.99$, we reject $H_0: \mu_F - \mu_M = 0$ at the 5% level.

(b) With equal sample sizes, the large sample procedure is essentially the same as the procedure based on the pooled covariance matrix.

(c) Here $p = 2$, $t_{54}(.05/2(2)) \approx z(.0125) = 2.24$, and
\[
\frac{1}{28}S_F + \frac{1}{28}S_M = \begin{bmatrix} 186.148 & 47.705 \\ 47.705 & 14.587 \end{bmatrix},
\]
so
$\mu_{F1} - \mu_{M1}$: $119.55 \pm 2.24\sqrt{186.148} \rightarrow (88.99,\ 150.11)$
$\mu_{F2} - \mu_{M2}$: $29.97 \pm 2.24\sqrt{14.587} \rightarrow (21.41,\ 38.52)$
Female anacondas are considerably longer and heavier than males.

6.41 Three factors: (Problem) Severity, (Problem) Complexity and (Engineer) Experience, each at two levels. Two responses: Assessment time, Implementation time. MANOVA results for significant (at the 5% level) effects:

Effect                Wilks' lambda   F       P-value
Severity              .06398          73.1    .000
Complexity            .01852          265.0   .000
Experience            .03694          130.4   .000
Severity*Complexity   .33521          9.9     .004

Individual ANOVAs for each of the two responses, Assessment time and Implementation time, show only the same three main effects and two factor interaction as significant, with p-values for the appropriate F statistics less than .01 in all cases.

We see that both assessment time and implementation time are affected by problem severity, problem complexity and engineer experience as well as the interaction between severity and complexity. Because of the interaction effect, the main effects severity and complexity are not additive and do not have a clear interpretation. For this reason, we do not calculate simultaneous confidence intervals for the magnitudes of the mean differences in times across the two levels of each of these main effects. There is no interaction term associated with experience, however. Since there are only two levels of experience, we can calculate ordinary t intervals for the mean difference in assessment time and the mean difference in implementation time for gurus (G) and novices (N).
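Relevant summary statistics and calculations are given below.

Error sum of squares and crossproducts matrix = $\begin{bmatrix} 2.222 & 1.217 \\ 1.217 & 2.667 \end{bmatrix}$
Error deg. of freedom: 11

Assessment time: $\bar{x}_G = 3.68$, $\bar{x}_N = 5.39$
95% confidence interval for mean difference in experience:
\[
3.68 - 5.39 \pm 2.201\sqrt{\frac{2.222}{11}\cdot\frac{2}{8}} = -1.71 \pm .49 \rightarrow (-2.20,\ -1.22)
\]

Implementation time: $\bar{x}_G = 6.80$, $\bar{x}_N = 10.96$
95% confidence interval for mean difference in experience:
\[
6.80 - 10.96 \pm 2.201\sqrt{\frac{2.667}{11}\cdot\frac{2}{8}} = -4.16 \pm .54 \rightarrow (-4.70,\ -3.62)
\]

The decrease in mean assessment time for gurus relative to novices is estimated to be between 1.22 and 2.20 hours. Similarly, the decrease in mean implementation time for gurus relative to novices is estimated to be between 3.62 and 4.70 hours.

Chapter 7

7.1
\[
(Z'Z)^{-1} = \begin{bmatrix} 6 & 60 \\ 60 & 720 \end{bmatrix}^{-1} = \frac{1}{120}\begin{bmatrix} 120 & -10 \\ -10 & 1 \end{bmatrix}, \qquad
\hat{\beta} = (Z'Z)^{-1}Z'y = \frac{1}{120}\begin{bmatrix} 120 & -10 \\ -10 & 1 \end{bmatrix}\begin{bmatrix} 72 \\ 872 \end{bmatrix} = \begin{bmatrix} -.667 \\ 1.267 \end{bmatrix}
\]
\[
\hat{y} = Z\hat{\beta} = \frac{1}{15}\begin{bmatrix} 180 \\ 85 \\ 123 \\ 351 \\ 199 \\ 142 \end{bmatrix} = \begin{bmatrix} 12.000 \\ 5.667 \\ 8.200 \\ 23.400 \\ 13.267 \\ 9.467 \end{bmatrix}, \qquad
\hat{\varepsilon} = y - \hat{y} = \begin{bmatrix} 15 \\ 9 \\ 3 \\ 25 \\ 7 \\ 13 \end{bmatrix} - \begin{bmatrix} 12.000 \\ 5.667 \\ 8.200 \\ 23.400 \\ 13.267 \\ 9.467 \end{bmatrix} = \begin{bmatrix} 3.000 \\ 3.333 \\ -5.200 \\ 1.600 \\ -6.267 \\ 3.533 \end{bmatrix}
\]
Residual sum of squares: $\hat{\varepsilon}'\hat{\varepsilon} = 101.467$
Fitted equation: $\hat{y} = -.667 + 1.267 z_1$

7.2 Standardized variables

z1       z2       y
 -.292   -1.088     .391
-1.166    -.726    -.391
 -.817    -.726   -1.174
 1.283     .363    1.695
 -.117     .726    -.652
 1.108    1.451     .130

Fitted equation: $\hat{y}^* = 1.33 z_1^* - .79 z_2^*$

Also, prior to standardizing the variables, $\bar{z}_1 = 11.667$, $\bar{z}_2 = 5.000$ and $\bar{y} = 12.000$; $\sqrt{s_{z_1 z_1}} = 5.716$, $\sqrt{s_{z_2 z_2}} = 2.757$ and $\sqrt{s_{yy}} = 7.667$.

The fitted equation for the original variables is
\[
\frac{\hat{y} - 12}{7.667} = 1.33\left(\frac{z_1 - 11.667}{5.716}\right) - .79\left(\frac{z_2 - 5}{2.757}\right)
\]
\[
\hat{y} = 2.15 + 1.78 z_1 - 2.19 z_2
\]

7.3 Follow the hint and note that $\hat{\varepsilon}^* = y^* - \hat{y}^* = V^{-1/2}y - V^{-1/2}Z\hat{\beta}_W$ and $(n-r-1)\hat{\sigma}^2 = \hat{\varepsilon}^{*\prime}\hat{\varepsilon}^*$ is distributed as $\sigma^2\chi^2_{n-r-1}$.

7.4 a) $V = I$, so
\[
\hat{\beta}_W = (z'z)^{-1}z'y = \Big(\sum_{j=1}^{n} z_j y_j\Big)\Big/\Big(\sum_{j=1}^{n} z_j^2\Big).
\]
b) $V^{-1}$ is diagonal with jth diagonal element $1/z_j$, so
\[
\hat{\beta}_W = (z'V^{-1}z)^{-1}z'V^{-1}y = \Big(\sum_{j=1}^{n} y_j\Big)\Big/\Big(\sum_{j=1}^{n} z_j\Big).
\]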
c) $V^{-1}$ is diagonal with jth diagonal element $1/z_j^2$, so
\[
\hat{\beta}_W = (z'V^{-1}z)^{-1}z'V^{-1}y = \Big(\sum_{j=1}^{n} (y_j/z_j)\Big)\Big/ n.
\]

7.5 Solution follows from Hint.

7.6 a) First note that $\Lambda^- = \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_{r_1+1}^{-1}, 0, \ldots, 0)$ is a generalized inverse of $\Lambda$ since
\[
\Lambda\Lambda^- = \begin{bmatrix} I_{r_1+1} & 0 \\ 0 & 0 \end{bmatrix} \quad\text{so}\quad \Lambda\Lambda^-\Lambda = \Lambda.
\]
Since
\[
Z'Z = \sum_{i=1}^{p} \lambda_i e_i e_i' = P\Lambda P', \qquad (Z'Z)^- = \sum_{i=1}^{r_1+1} \lambda_i^{-1} e_i e_i' = P\Lambda^- P'
\]
with $PP' = P'P = I_p$, we check that the defining relation holds:
\[
(Z'Z)(Z'Z)^-(Z'Z) = P\Lambda P'(P\Lambda^- P')P\Lambda P' = P\Lambda\Lambda^-\Lambda P' = P\Lambda P' = Z'Z.
\]
b) By the hint, if $Z\hat{\beta}$ is the projection, $0 = Z'(y - Z\hat{\beta})$ or $Z'Z\hat{\beta} = Z'y$. In c), we show that $Z\hat{\beta}$ is the projection of $y$.

c) Consider $q_i = \lambda_i^{-1/2} Z e_i$ for $i = 1, 2, \ldots, r_1+1$. Then
\[
Z(Z'Z)^- Z' = Z\Big(\sum_{i=1}^{r_1+1} \lambda_i^{-1} e_i e_i'\Big)Z' = \sum_{i=1}^{r_1+1} q_i q_i'.
\]
The $\{q_i\}$ are $r_1+1$ mutually perpendicular unit length vectors that span the space of all linear combinations of the columns of $Z$. The projection of $y$ is then (see Result 2A.2 and Definition 2A.12)
\[
\sum_{i=1}^{r_1+1} (q_i'y)q_i = \sum_{i=1}^{r_1+1} q_i(q_i'y) = \Big(\sum_{i=1}^{r_1+1} q_i q_i'\Big)y = Z(Z'Z)^- Z'y.
\]
d) See Hint.

7.7 Write
\[
\beta = \begin{bmatrix} \beta_{(1)} \\ \beta_{(2)} \end{bmatrix} \quad\text{and}\quad Z = [\,Z_1 \mid Z_2\,].
\]
Recall from Result 7.4 that $\hat{\beta} = \begin{bmatrix} \hat{\beta}_{(1)} \\ \hat{\beta}_{(2)} \end{bmatrix} = (Z'Z)^{-1}Z'Y$ is distributed as $N_{r+1}(\beta, \sigma^2(Z'Z)^{-1})$ independently of $n\hat{\sigma}^2 = (n-r-1)s^2$, which is distributed as $\sigma^2\chi^2_{n-r-1}$. From the Hint, $(\hat{\beta}_{(2)} - \beta_{(2)})'(C^{22})^{-1}(\hat{\beta}_{(2)} - \beta_{(2)})$ is $\sigma^2\chi^2_{r-q}$ and this is distributed independently of $s^2$. (The latter follows because the full random vector $\hat{\beta}$ is distributed independently of $s^2$.) The result follows from the definition of an F random variable as the ratio of two independent $\chi^2$ random variables divided by their degrees of freedom.

7.8 (a) $H^2 = Z(Z'Z)^{-1}Z'Z(Z'Z)^{-1}Z' = Z(Z'Z)^{-1}Z' = H$.

(b) Since $I - H$ is an idempotent matrix, it is positive semidefinite. Let $a$ be an $n \times 1$ unit vector with jth element 1. Then $0 \le a'(I - H)a = (1 - h_{jj})$. That is, $h_{jj} \le 1$. On the other hand, $(Z'Z)^{-1}$ is positive definite.
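Hence $h_{jj} = b_j'(Z'Z)^{-1}b_j > 0$ where $b_j'$ is the jth row of $Z$. Also, $\sum_{j=1}^{n} h_{jj} = \mathrm{tr}(Z(Z'Z)^{-1}Z') = \mathrm{tr}((Z'Z)^{-1}Z'Z) = \mathrm{tr}(I_{r+1}) = r + 1$.

(c) Using
\[
(Z'Z)^{-1} = \frac{1}{n\sum_{i=1}^{n}(z_i - \bar{z})^2}\begin{bmatrix} \sum_{i=1}^{n} z_i^2 & -\sum_{i=1}^{n} z_i \\ -\sum_{i=1}^{n} z_i & n \end{bmatrix}
\]
we obtain
\[
h_{jj} = (1 \;\; z_j)(Z'Z)^{-1}\begin{pmatrix} 1 \\ z_j \end{pmatrix}
= \frac{1}{n\sum_{i=1}^{n}(z_i - \bar{z})^2}\Big(\sum_{i=1}^{n} z_i^2 - 2z_j\sum_{i=1}^{n} z_i + n z_j^2\Big)
= \frac{1}{n} + \frac{(z_j - \bar{z})^2}{\sum_{i=1}^{n}(z_i - \bar{z})^2}
\]

7.9
\[
Z' = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 1 & 2 \end{bmatrix}, \qquad (Z'Z)^{-1} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/10 \end{bmatrix}
\]
\[
\hat{\beta}_{(1)} = (Z'Z)^{-1}Z'y_{(1)} = \begin{bmatrix} 3 \\ -.9 \end{bmatrix}; \qquad
\hat{\beta}_{(2)} = (Z'Z)^{-1}Z'y_{(2)} = \begin{bmatrix} 0 \\ 1.5 \end{bmatrix}; \qquad
\hat{\beta} = [\,\hat{\beta}_{(1)} \mid \hat{\beta}_{(2)}\,] = \begin{bmatrix} 3 & 0 \\ -.9 & 1.5 \end{bmatrix}
\]
Hence
\[
\hat{Y} = Z\hat{\beta} = \begin{bmatrix} 4.8 & -3.0 \\ 3.9 & -1.5 \\ 3.0 & 0 \\ 2.1 & 1.5 \\ 1.2 & 3.0 \end{bmatrix}, \qquad
\hat{\varepsilon} = Y - \hat{Y} = \begin{bmatrix} 5 & -3 \\ 3 & -1 \\ 4 & -1 \\ 2 & 2 \\ 1 & 3 \end{bmatrix} - \begin{bmatrix} 4.8 & -3.0 \\ 3.9 & -1.5 \\ 3.0 & 0 \\ 2.1 & 1.5 \\ 1.2 & 3.0 \end{bmatrix} = \begin{bmatrix} .2 & 0 \\ -.9 & .5 \\ 1.0 & -1.0 \\ -.1 & .5 \\ -.2 & 0 \end{bmatrix}
\]
\[
Y'Y = \hat{Y}'\hat{Y} + \hat{\varepsilon}'\hat{\varepsilon}: \qquad
\begin{bmatrix} 55 & -15 \\ -15 & 24 \end{bmatrix} = \begin{bmatrix} 53.1 & -13.5 \\ -13.5 & 22.5 \end{bmatrix} + \begin{bmatrix} 1.9 & -1.5 \\ -1.5 & 1.5 \end{bmatrix}
\]

7.10 a) Using Result 7.7, the 95% confidence interval for the mean response is given by
\[
(1,\ .5)\begin{bmatrix} 3.0 \\ -.9 \end{bmatrix} \pm 3.18\sqrt{(1,\ .5)\begin{bmatrix} .2 & 0 \\ 0 & .1 \end{bmatrix}\begin{bmatrix} 1 \\ .5 \end{bmatrix}\left(\frac{1.9}{3}\right)} \quad\text{or}\quad (1.35,\ 3.75).
\]
b) Using Result 7.8, the 95% prediction interval for the actual Y is given by
\[
(1,\ .5)\begin{bmatrix} 3.0 \\ -.9 \end{bmatrix} \pm 3.18\sqrt{\left(1 + (1,\ .5)\begin{bmatrix} .2 & 0 \\ 0 & .1 \end{bmatrix}\begin{bmatrix} 1 \\ .5 \end{bmatrix}\right)\left(\frac{1.9}{3}\right)} \quad\text{or}\quad (-.25,\ 5.35).
\]
c) Using (7-42), a 95% prediction ellipse for the actual Y's is given by
\[
(Y_{01} - 2.55,\ Y_{02} - .75)\begin{bmatrix} 7.5 & 7.5 \\ 7.5 & 9.5 \end{bmatrix}\begin{bmatrix} Y_{01} - 2.55 \\ Y_{02} - .75 \end{bmatrix} \le (1 + .225)\left(\frac{(2)(3)}{2}\right)(19) = 69.825
\]

7.11 The proof follows the proof of Result 7.10 with $\Sigma^{-1}$ replaced by $A$.
\[
(Y - ZB)'(Y - ZB) = \sum_{j=1}^{n} (Y_j - B'z_j)(Y_j - B'z_j)'
\]
and
\[
\sum_{j=1}^{n} d_j^2(B) = \mathrm{tr}[A(Y - ZB)'(Y - ZB)]
\]
Next,
\[
(Y - ZB)'(Y - ZB) = (Y - Z\hat{\beta} + Z\hat{\beta} - ZB)'(Y - Z\hat{\beta} + Z\hat{\beta} - ZB) = \hat{\varepsilon}'\hat{\varepsilon} + (\hat{\beta} - B)'Z'Z(\hat{\beta} - B)
\]
so
\[
\sum_{j=1}^{n} d_j^2(B) = \mathrm{tr}[A\hat{\varepsilon}'\hat{\varepsilon}] + \mathrm{tr}[A(\hat{\beta} - B)'Z'Z(\hat{\beta} - B)]
\]
The first term does not depend on the choice of $B$. Using Result 2A.12(c),
\[
\mathrm{tr}[A(\hat{\beta} - B)'Z'Z(\hat{\beta} - B)] = \mathrm{tr}[(\hat{\beta} - B)'Z'Z(\hat{\beta} - B)A] = \mathrm{tr}[Z'Z(\hat{\beta} - B)A(\hat{\beta} - B)'] = \mathrm{tr}[Z(\hat{\beta} - B)A(\hat{\beta} - B)'Z'] \ge c'Ac > 0
\]
where $c'$ is any non-zero row of $Z(\hat{\beta} - B)$. Unless $B = \hat{\beta}$, $Z(\hat{\beta} - B)$ will have a non-zero row. Thus $\hat{\beta}$ is the best choice for any positive definite $A$.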
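The arithmetic in 7.9 and 7.10 above can be reproduced in a few lines. The following is a minimal sketch in plain Python, not part of the original solution: the helper names (`dot`, `fit_line`, `sscp`) are illustrative, the data are the values of z and Y shown in 7.9, and the critical value 3.18 is the one used in 7.10. The leverage term uses the closed form from 7.8(c).

```python
import math

# Data as they appear in 7.9: one predictor z and two responses.
z = [-2, -1, 0, 1, 2]
y1 = [5, 3, 4, 2, 1]
y2 = [-3, -1, -1, 2, 3]
n = len(z)

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def fit_line(z, y):
    """Least squares (intercept, slope) for y = b0 + b1 * z."""
    zbar, ybar = sum(z) / len(z), sum(y) / len(y)
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    slope = szy / szz
    return ybar - slope * zbar, slope

def sscp(cols):
    """Sums of squares and cross products matrix of a list of columns."""
    return [[dot(a, b) for b in cols] for a in cols]

responses = [y1, y2]
coefs = [fit_line(z, y) for y in responses]
fitted = [[b0 + b1 * zi for zi in z] for b0, b1 in coefs]
resid = [[yi - fi for yi, fi in zip(y, f)] for y, f in zip(responses, fitted)]

# Decomposition Y'Y = Yhat'Yhat + resid'resid from 7.9.
total, pred, err = sscp(responses), sscp(fitted), sscp(resid)

# 7.10: intervals for the first response at z0 = 0.5.
z0 = 0.5
zbar = sum(z) / n
szz = sum((zi - zbar) ** 2 for zi in z)
leverage = 1 / n + (z0 - zbar) ** 2 / szz   # z0'(Z'Z)^{-1} z0, see 7.8(c)
s2 = err[0][0] / (n - 2)                     # residual mean square, 1.9/3
center = coefs[0][0] + coefs[0][1] * z0
t_crit = 3.18                                # critical value used in 7.10
ci_half = t_crit * math.sqrt(leverage * s2)
pi_half = t_crit * math.sqrt((1 + leverage) * s2)

print("coefficients:", coefs)
print("residual SSCP:", err)
print("95% CI:", (center - ci_half, center + ci_half))
print("95% PI:", (center - pi_half, center + pi_half))
```

Working from centered sums keeps the sketch free of matrix inversion; for more than one predictor a linear algebra routine would be the more natural choice.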
7.12 (a) best linear predictor $= -4 + 2Z_1 - Z_2$

(b) mean square error $= \sigma_{YY} - \sigma_{ZY}'\Sigma_{ZZ}^{-1}\sigma_{ZY} = 4$

(c) $\rho_{Y(Z)} = \sqrt{\dfrac{\sigma_{ZY}'\Sigma_{ZZ}^{-1}\sigma_{ZY}}{\sigma_{YY}}} = \dfrac{\sqrt{5}}{3} = .745$

(d) Following equation (7-56), we partition $\Sigma$ as
\[
\Sigma = \left[\begin{array}{cc|c} 9 & 3 & 1 \\ 3 & 2 & 1 \\ \hline 1 & 1 & 1 \end{array}\right]
\]
and determine the covariance of $(Y, Z_1)'$ given $Z_2$ to be
\[
\begin{bmatrix} 9 & 3 \\ 3 & 2 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix}(1)^{-1}(1,\ 1) = \begin{bmatrix} 8 & 2 \\ 2 & 1 \end{bmatrix}.
\]
Therefore
\[
\rho_{YZ_1 \cdot Z_2} = \frac{2}{\sqrt{8}\sqrt{1}} = \frac{1}{\sqrt{2}} = .707
\]

7.13 (a) By Result 7.13, $\hat{\beta} = S_{ZZ}^{-1}s_{ZY} = \begin{bmatrix} 3.73 \\ 5.57 \end{bmatrix}$

(b) Let $Z_{(2)} = [Z_2, Z_3]$. Then
\[
R_{Z_1(Z_2 Z_3)} = \sqrt{\frac{s_{Z_{(2)}Z_1}'S_{Z_{(2)}Z_{(2)}}^{-1}s_{Z_{(2)}Z_1}}{s_{Z_1 Z_1}}} = \sqrt{\frac{3452.33}{5691.34}} = .78
\]
(c) Partition $Z = \begin{bmatrix} Z_{(1)} \\ Z_3 \end{bmatrix}$ so
\[
S = \left[\begin{array}{c|c} S_{Z_{(1)}Z_{(1)}} & s_{Z_3 Z_{(1)}} \\ \hline s_{Z_3 Z_{(1)}}' & s_{Z_3 Z_3} \end{array}\right] = \left[\begin{array}{cc|c} 5691.34 & 600.51 & 217.25 \\ 600.51 & 126.05 & 23.37 \\ \hline 217.25 & 23.37 & 23.11 \end{array}\right]
\]
and
\[
S_{Z_{(1)}Z_{(1)}} - s_{Z_3 Z_{(1)}}s_{Z_3 Z_3}^{-1}s_{Z_3 Z_{(1)}}' = \begin{bmatrix} 3649.04 & 380.82 \\ 380.82 & 102.42 \end{bmatrix}
\]
Thus
\[
r_{Z_1 Z_2 \cdot Z_3} = \frac{380.82}{\sqrt{3649.04}\sqrt{102.42}} = .62
\]

7.14 (a) The large positive correlation between a manager's experience and achieved rate of return on portfolio indicates an apparent advantage for managers with experience. The negative correlation between attitude toward risk and achieved rate of return indicates an apparent advantage for conservative managers.

(b) From (7-57),
\[
r_{yz_1 \cdot z_2} = \frac{s_{yz_1 \cdot z_2}}{\sqrt{s_{yy \cdot z_2}}\sqrt{s_{z_1 z_1 \cdot z_2}}}
= \frac{s_{yz_1} - \dfrac{s_{yz_2}s_{z_1 z_2}}{s_{z_2 z_2}}}{\sqrt{s_{yy} - \dfrac{s_{yz_2}^2}{s_{z_2 z_2}}}\sqrt{s_{z_1 z_1} - \dfrac{s_{z_1 z_2}^2}{s_{z_2 z_2}}}}
= \frac{r_{yz_1} - r_{yz_2}r_{z_1 z_2}}{\sqrt{1 - r_{yz_2}^2}\sqrt{1 - r_{z_1 z_2}^2}} = .31
\]
Removing "years of experience" from consideration, we now have a positive correlation between "attitude toward risk" and "achieved return". After adjusting for years of experience, there is an apparent advantage to managers who take risks.

7.15 (a) MINITAB computer output gives: $\hat{y} = 11{,}870 + 2634 z_1 + 45.2 z_2$; residual sum of squares = 204995012 with 17 degrees of freedom. Thus $s = 3473$. Now, for example, the estimated standard deviation of $\hat{\beta}_0$ is $\sqrt{1.9961 s^2} = 4906$. Similar calculations give the estimated standard deviations of $\hat{\beta}_1$ and $\hat{\beta}_2$.
(b) An analysis of the residuals indicates there are no apparent model inadequacies.

(c) The 95% prediction interval is ($51,228; $60,23~)

(d) Using the F statistic for testing a single regression coefficient,
\[
F = \frac{(45.2)(.0067)^{-1}(45.2)}{12058533} = .025.
\]
Since $F_{1,17}(.05) = 4.45$ we cannot reject $H_0: \beta_2 = 0$. It appears as if $Z_2$ is not needed in the model provided $Z_1$ is included in the model.

7.16
Predictors   p = r+1   Cp
Z1           2         1.025
Z2           2         12.24
Z1, Z2       3         3

7.17 (a) Minitab output for the regression of profits on sales and assets follows.

Profits = 0.01 + 0.0681 Sales + 0.00577 Assets

Predictor   Coef       SE Coef    T      P
Constant    0.013      7.641      0.00   0.999
Sales       0.06806    0.02785    2.44   0.045
Assets      0.005768   0.004946   1.17   0.282

S = 3.86282   R-Sq = 55.7%   R-Sq(adj) = 43.0%

Analysis of Variance
Source           DF   SS       MS      F      P
Regression       2    131.26   65.63   4.40   0.058
Residual Error   7    104.45   14.92
Total            9    235.71

(b) Given the small sample size, the residual plots below are consistent with the usual regression assumptions. The leverages do not indicate any unusual observations. All leverages are less than 3p/n = 3(3)/10 = .9.

[Figure: Residual Plots for Profits (Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data)]

(c) With sales = 100 and assets = 500, a 95% prediction interval for profits is: (-1.55, 20.95).

(d) The t-value for testing $H_0: \beta_2 = 0$ is t = 1.17 with a p-value of .282. We cannot reject $H_0$ at any reasonable significance level. The model should be refit after dropping assets as a predictor variable. That is, consider the simple linear regression model relating profits to sales.

7.18 (a) The calculations for the Cp plot are given below. Note that p is the number of model parameters including the intercept.

p (predictor)       Cp
2 (sales)           2.4
2 (assets)          7.0
3 (sales, assets)   3.0

(b) The AIC values are shown below.
p (predictor)       AIC
2 (sales)           29.24
2 (assets)          33.63
3 (sales, assets)   29.46

7.19 (a) The "best" regression equation involving $\ln(y)$ and $z_1, z_2, \ldots, z_5$ is
\[
\ln(\hat{y}) = 2.756 - .322 z_2 + .114 z_4
\]
with $s = 1.058$ and $R^2 = .60$. It may be possible to find a better model using first and second order predictor variable terms.

(b) A plot of the residuals versus the predicted values indicates no apparent problems. A Q-Q plot of the residuals is a bit wavy but the sample size is not large. Perhaps a transformation other than the logarithmic transformation would produce a better model.

7.20 Eigenvalues of the correlation matrix of the predictor variables $z_1, z_2, \ldots, z_5$ are 1.4465, 1.1435, .8940, .8545, .6615. The corresponding eigenvectors give the coefficients of $z_1, z_2, \ldots, z_5$ in the principal component. For example, the first principal component, written in terms of standardized predictor variables, is
\[
\hat{x}_1 = .6064 z_1^* + .3901 z_2^* + .6357 z_3^* - .2755 z_4^* - .0045 z_5^*
\]
A regression of $\ln(y)$ on the first principal component gives
\[
\ln(\hat{y}) = 1.7371 - .0701\hat{x}_1
\]
with $s = .701$ and $R^2 = .015$.

A regression of $\ln(y)$ on the fourth principal component produces the best of the one principal component predictor variable regressions. In this case $\ln(\hat{y}) = 1.7371 + .3604\hat{x}_4$ with $s = .618$ and $R^2 = .235$.

7.21 This data set doesn't appear to yield a regression relationship which explains a large proportion of the variation in the responses.

(a) (i) One reader, starting with a full quadratic model in the predictors $z_1$ and $z_2$, suggested the fitted regression equation:
\[
\hat{y}_1 = -7.3808 + .5281 z_2 - .0038 z_2^2
\]
with $s = 3.05$ and $R^2 = .22$. (Can you do better than this?)

(ii) A plot of the residuals versus the fitted values suggests the response may not have constant variance. Also, a Q-Q plot of the residuals has the following general appearance:

[Figure: Normal probability plot of the residuals]
Therefore the normality assumption may also be suspect. Perhaps a better regression can be obtained after the responses have been transformed or re-expressed in a different metric.

(iii) Using the results in (a)(i), a 95% prediction interval for $z_1 = 10$ (not needed) and $z_2 = 80$ is centered at the predicted value 10.84 and is (5.32, 16.37).

7.22 (a) The full regression model relating the dominant radius bone to the four predictor variables is shown below along with the "best" model after eliminating non-significant predictors. A residual analysis for the best model indicates there is no reason to doubt the standard regression assumptions, although observations 19 and 23 have large standardized residuals.

(i) The regression equation is
DomRadius = 0.103 + 0.276 DomHumerus - 0.165 Humerus + 0.357 DomUlna + 0.407 Ulna

Predictor    Coef      SE Coef   T       P
Constant     0.1027    0.1064    0.97    0.346
DomHumerus   0.2756    0.1147    2.40    0.026
Humerus      -0.1652   0.1381    -1.20   0.246
DomUlna      0.3566    0.1985    1.80    0.088
Ulna         0.4068    0.2174    1.87    0.076

S = 0.0663502   R-Sq = 71.8%   R-Sq(adj) = 66.1%

The regression equation is
DomRadius = 0.164 + 0.162 DomHumerus + 0.552 DomUlna

Predictor    Coef      SE Coef   T      P
Constant     0.1637    0.1035    1.58   0.128
DomHumerus   0.16249   0.05940   2.74   0.012
DomUlna      0.5519    0.1566    3.53   0.002

S = 0.0687763   R-Sq = 66.7%   R-Sq(adj) = 63.6%

Analysis of Variance
Source           DF   SS        MS        F       P
Regression       2    0.20797   0.10399   21.98   0.000
Residual Error   22   0.10406   0.00473
Total            24   0.31204

(ii)
[Figure: Residual Plots for DomRadius, DomHumerus and DomUlna Predictors (Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data)]
(b) The full regression model relating the radius bone to the four predictor variables is shown below. This fitted model, along with the fitted model for the dominant radius bone using four predictors shown in part (a)(i) and the error sum of squares and cross products matrix, constitute the multivariate multiple regression model. It appears as if a multivariate regression model with only one or two predictors will represent the data well. Using Result 7.11, a multivariate regression model with predictors dominant ulna and ulna may be reasonable. The results for these predictors follow.

The regression equation is
Radius = 0.114 - 0.0110 DomHumerus + 0.152 Humerus + 0.198 DomUlna + 0.462 Ulna

Predictor    Coef       SE Coef   T       P
Constant     0.11423    0.08971   1.27    0.217
DomHumerus   -0.01103   0.09676   -0.11   0.910
Humerus      0.1520     0.1165    1.31    0.207
DomUlna      0.1976     0.1674    1.18    0.252
Ulna         0.4625     0.1833    2.52    0.020

S = 0.0559501   R-Sq = 77.2%   R-Sq(adj) = 72.6%

Error sum of squares and cross products matrix:
\[
\begin{bmatrix} .088047 & .050120 \\ .050120 & .062608 \end{bmatrix}
\]

The regression equation is
DomRadius = 0.223 + 0.564 DomUlna + 0.321 Ulna

Predictor   Coef     SE Coef   T      P
Constant    0.2235   0.1120    2.00   0.059
DomUlna     0.5645   0.2108    2.68   0.014
Ulna        0.3209   0.2202    1.46   0.159

S = 0.0760309   R-Sq = 59.2%   R-Sq(adj) = 55.5%

Analysis of Variance
Source           DF   SS         MS         F       P
Regression       2    0.184863   0.092431   15.99   0.000
Residual Error   22   0.127175   0.005781
Total            24   0.312038

The regression equation is
Radius = 0.178 + 0.322 DomUlna + 0.595 Ulna

Predictor   Coef      SE Coef   T      P       VIF
Constant    0.17846   0.08931   2.00   0.058
DomUlna     0.3220    0.1680    1.92   0.068   2.1
Ulna        0.5953    0.1755    3.39   0.003   2.1

S = 0.0606160   R-Sq = 70.5%   R-Sq(adj) = 67.8%

Analysis of Variance
Source           DF   SS         MS         F       P
Regression       2    0.193195   0.096597   26.29   0.000
Residual Error   22   0.080835   0.003674
Total            24   0.274029

Error sum of squares and cross products matrix:
\[
\begin{bmatrix} .127175 & .064903 \\ .064903 & .080835 \end{bmatrix}
\]

7.23
(a) Regression analysis using the response $Y_1$ = SalePr.

Summary of Backward Elimination Procedure for Dependent Variable X2

Step   Variable Removed   Number In   Partial R**2   Model R**2   C(p)     F        Prob>F
1      X9                 7           0.0041         0.5826       7.6697   0.6697   0.4161
2      X3                 6           0.0043         0.5782       6.3735   0.7073   0.4033
3      X5                 5           0.0127         0.5655       6.4341   2.0795   0.1538

Dependent Variable: X2 SalePr
Analysis of Variance
Source    DF   Sum of Squares   Mean Square    F Value   Prob>F
Model     5    16462859.832     3292571.9663   18.224    0.0001
Error     70   12647164.839     180673.78342
C Total   75   29110024.671

Root MSE 425.05739   R-square 0.5655

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP   1    -5605.823664         1929.3986440     -2.905                  0.0049
X1         1    -77.633612           22.29880197      -3.482                  0.0009
X4         1    -2.332721            0.75490590       -3.090                  0.0029
X6         1    389.364490           89.17300145      4.366                   0.0001
X7         1    1749.420733          701.21819165     2.495                   0.0150
X8         1    133.177529           46.66673277      2.854                   0.0057

The 95% prediction interval for SalePr for $z_0$ is
\[
z_0'\hat{\beta} \pm t_{70}(0.025)\sqrt{(425.06)^2\,(1 + z_0'(Z'Z)^{-1}z_0)}.
\]

[Figure: SalePr = f(Breed, FtFrBody, Frame, BkFat, SaleHt): (a) Residual plot (residuals versus predicted); (b) Normal probability plot (residuals versus quantiles of standard normal)]

(b) Regression analysis using the response $Y_1$ = ln(SalePr).

Summary of Backward Elimination Procedure for Dependent Variable LOGX2

Step   Variable Removed   Number In   Partial R**2   Model R**2   C(p)     F        Prob>F
1      X3                 7           0.0033         0.6368       7.6121   0.6121   0.4368
2      X7                 6           0.0057         0.6311       6.6655   1.0594   0.3070
3      X9                 5           0.0122         0.6189       6.9445   2.2902   0.1348
4      X4                 4           0.0081         0.6108       6.4537   1.4890   0.2265

Dependent Variable: LOGX2
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model     4    4.02968          1.00742       27.854    0.0001
Error     71   2.56794          0.03617
C Total   75   6.59762

Root MSE 0.19018   R-square 0.6108

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP   1    5.235773             0.91286786       5.736                   0.0001
X1         1    -0.049418            0.00846029       -5.841                  0.0001
X5         1    -0.027613            0.00827438       -3.337                  0.0013
X6         1    0.183611             0.03992448       4.599                   0.0001
X8         1    0.058996             0.01927655       3.060                   0.0031

The 95% prediction interval for ln(SalePr) for $z_0$ is
\[
z_0'\hat{\beta} \pm t_{71}(0.025)\sqrt{(0.1902)^2\,(1 + z_0'(Z'Z)^{-1}z_0)}.
\]
The few outliers among these latter residuals are not so pronounced.

[Figure: ln(SalePr) = f(Breed, PrctFFB, Frame, SaleHt): (a) Residual plot (residuals versus predicted); (b) Normal probability plot (residuals versus quantiles of standard normal)]

7.24 (a) Regression analysis using the response $Y_1$ = SaleHt and the predictors $Z_1$ = YrHgt and $Z_2$ = FtFrBody.

Dependent Variable: X8 SaleHt
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model     2    235.74533        117.87267     131.165   0.0001
Error     73   65.60204         0.89866
C Total   75   301.34737

Root MSE 0.94798   R-square 0.7823

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP   1    7.846281             3.36221288       2.334                   0.0224
X3         1    0.802235             0.08088562       9.918                   0.0001
X4         1    0.005773             0.00151072       3.821                   0.0003

The 95% prediction interval for SaleHt for $z_0' = (1, 50.5, 970)$ is
\[
53.96 \pm t_{73}(0.025)\sqrt{0.8987(1.0148)} = (52.06,\ 55.86).
\]

[Figure: SaleHt = f(YrHgt, FtFrBody): (a) Residual plot (residuals versus predicted); (b) Normal probability plot (residuals versus quantiles of standard normal)]
(b) Regression analysis using the response $Y_2$ = SaleWt and the predictors $Z_1$ = YrHgt and $Z_2$ = FtFrBody.

Dependent Variable: X9 SaleWt
Analysis of Variance
Source    DF   Sum of Squares   Mean Square    F Value   Prob>F
Model     2    390456.63614     195228.31807   16.319    0.0001
Error     73   873342.99544     11963.60268
C Total   75   1263799.6316

Root MSE 109.37826   R-square 0.3090

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP   1    675.316794           387.93499836     1.741                   0.0859
X3         1    2.719286             9.33265244       0.291                   0.7716
X4         1    0.745610             0.17430765       4.278                   0.0001

The 95% prediction interval for SaleWt for $z_0' = (1, 50.5, 970)$ is
\[
1535.9 \pm t_{73}(0.025)\sqrt{11963.6(1.0148)} = (1316.3,\ 1755.5).
\]

[Figure: SaleWt = f(YrHgt, FtFrBody): (a) Residual plot (residuals versus predicted); (b) Normal probability plot (residuals versus quantiles of standard normal)]

Multivariate regression analysis using the responses $Y_1$ = SaleHt and $Y_2$ = SaleWt and the predictors $Z_1$ = YrHgt and $Z_2$ = FtFrBody.

Multivariate Test: H0: YrHgt = 0
Multivariate Statistics and Exact F Statistics
S=1  M=0  N=35

Statistic                Value        F         Num DF   Den DF   Pr > F
Wilks' Lambda            0.38524567   57.4469   2        72       0.0001
Pillai's Trace           0.61475433   57.4469   2        72       0.0001
Hotelling-Lawley Trace   1.59574625   57.4469   2        72       0.0001
Roy's Greatest Root      1.59574625   57.4469   2        72       0.0001

Multivariate Test: H0: FtFrBody = 0
Multivariate Statistics and Exact F Statistics
S=1  M=0  N=35

Statistic                Value        F         Num DF   Den DF   Pr > F
Wilks' Lambda            0.75813396   11.4850   2        72       0.0001
Pillai's Trace           0.24186604   11.4850   2        72       0.0001
Hotelling-Lawley Trace   0.31902811   11.4850   2        72       0.0001
Roy's Greatest Root      0.31902811   11.4850   2        72       0.0001

The theory requires using X3 (YrHgt) to predict both SaleHt and SaleWt, even though this term could be dropped in the prediction equation for SaleWt. The 95% prediction ellipse for both SaleHt and SaleWt for $z_0' = (1, 50.5, 970)$ is
\[
1.3498(Y_{01} - 53.96)^2 + 0.0001(Y_{02} - 1535.9)^2 - 0.0098(Y_{01} - 53.96)(Y_{02} - 1535.9) = 1.0148\,\frac{2(73)}{72}\,F_{2,72}(0.05) = 6.4282.
\]

[Figure: The 95% prediction ellipse for both SaleHt and SaleWt (axes: Y01, Y02); Chi-square plot of residuals]

7.25 (a) Regression analysis using the first response $Y_1$. The backward elimination procedure gives $Y_1 = \beta_{01} + \beta_{11}Z_1 + \beta_{21}Z_2$. All variables left in the model are significant at the 0.05 level. (It is possible to drop the intercept but we retain it.)

Dependent Variable: Y1 TOT
Analysis of Variance
Source    DF   Sum of Squares   Mean Square    F Value   Prob>F
Model     2    5905583.8728     2952791.9364   22.962    0.0001
Error     14   1800356.3625     128596.88303
C Total   16   7705940.2353

Root MSE 358.60408   R-square 0.7664

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP   1    56.720053            206.70336862     0.274                   0.7878
Z1         1    507.073084           193.79082471     2.617                   0.0203
Z2         1    0.328962             0.04977501       6.609                   0.0001

The 95% prediction interval for $Y_1$ = TOT for $z_0' = (1, 1, 1200)$ is
\[
958.5 \pm t_{14}(0.025)\sqrt{128596.9(1.0941)} = (154.0,\ 1763.1).
\]

[Figure: TOT = f(GEN, AMT): (a) Residual plot (residuals versus predicted); (b) Normal probability plot (residuals versus quantiles of standard normal)]
00 0N0 In ãi :: Q "C ,ñ ............................................................. ~ .. CD ir 0 0 y r' .' N In ãi :: "C üi 0 ,.' ,... . .....~.. 00 y .... " oo ~ 500 1000 2000 Predicted 3000 ..... ...... . ..... . CD ir . . .... .' ...... g . ~ -2 -1 o Quantiles of Standard Normal 2 160 (b) Regression analysis using the second response Y2' The backward elimination procedure gives 11 = ß02 + ßi2Zi + ß22Z2. All variables left in the model are significant at the 0.05 leveL. Dependent Variable: Y2 AMI Analysis of Variance of Mean Sum Source DF Squares Square F Value Prob.)F Model 2 5989720.5384 2994860.2692 14 1620657.344 115761.23886 16 7610377.8824 25.871 0.0001 Error C Total R-square 340.23703 Root MSE Parameter Estimates Variable DF INTERCEP 1 Zl 1 Z2 1 o . 7870 Parameter Standard Estimate Error -241.347910 196.11640164 T for HO: Parameter=O Prob .) ITI -1.231 3.298 6.866 o . 2387 183.86521452 606 . 309666 o . 324255 o . 04722563 0.0053 o . 0001 The 95% prediction interval for 1'2 = AMI for z~ = (1,1,1200) is 754.1 :l t14(0.025) Jii5761.2(1.0941) = (-9.234,1517.4). AMi=t(eN J AMT) (b) Normal probabilty plot (a) Residual plot oo CD oo CD (/ ¡¡ :: ooC\ . .' .' ....... .. en . ¡¡ i::i oo C\ ........ !t0P'" o oo i: o ......A............................................................. G) oo ïii c: i.Gl c: C) C) oo oo ...~... ....o¡ . ....;. .' .... ............ . ..' '9 '9 500 1000 2000 Predicted .2 .1 o Quantiles of Standard Normal 2 161 (c) Multivariate regresion analysis using Yi and 1"2. Multivariate Test: HO: PR=O. DIAP=O. QRS=O Multivariate Statistics and F Approximations 8=2 M=O N=4 Statistic Wilks' Lambda Value 0.44050214 Pillai 's Trace o . 60385990 Hotelling-Lavley Trace Roy's Greatest Root 1.16942861 1.07581808 F 1 .6890 1 .5859 Num DF 1.7541 6 3 Den DF 6 Û 3 .9447 Pr ~ F 20 .( . 1755 22 .( . 1983 18 0.1657 11 0.0391 Based on Wilks' Lambda, the three variables Zs, Z4 and Zs are not significant. 
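The F value reported beside Wilks' Lambda can be reproduced from Lambda alone. The sketch below uses Rao's F approximation, which is the usual way this conversion is done; the function name `rao_f_from_wilks` is hypothetical, and the error degrees of freedom (17 cases minus 6 fitted coefficients per response) are an assumption read off the output above rather than something the manual states.

```python
import math


def rao_f_from_wilks(wilks_lambda, p, q, error_df):
    """Rao's F approximation for Wilks' Lambda.

    p = number of responses, q = hypothesis degrees of freedom,
    error_df = error degrees of freedom of the full model.
    Returns (F, numerator df, denominator df).
    """
    if p * p + q * q - 5 > 0:
        t = math.sqrt((p * p * q * q - 4) / (p * p + q * q - 5))
    else:
        t = 1.0
    r = error_df - (p - q + 1) / 2
    u = (p * q - 2) / 4
    df1 = p * q
    df2 = r * t - 2 * u
    root = wilks_lambda ** (1 / t)
    f_stat = (1 - root) / root * df2 / df1
    return f_stat, df1, df2


# Two responses (TOT, AMI) and three predictors tested (PR, DIAP, QRS).
# Assumption: n = 17 cases and six coefficients per response in the full
# model, so error_df = 11, which agrees with S=2, M=0, N=4 in the output.
f_stat, df1, df2 = rao_f_from_wilks(0.44050214, p=2, q=3, error_df=11)
print(round(f_stat, 4), df1, df2)
```

Under these assumptions the result agrees with the Wilks' Lambda row of the table (F about 1.689 on 6 and 20 degrees of freedom), which is why the three added predictors are judged not significant.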
The 95% prediction ellipse for both TOT and AMI for $z_0' = (1, 1, 1200)$ is

$$4.305\times10^{-5}(Y_{01} - 958.5)^2 + 4.782\times10^{-5}(Y_{02} - 754.1)^2 - 8.214\times10^{-5}(Y_{01} - 958.5)(Y_{02} - 754.1) = 1.0941\left(\frac{2(14)}{13}\right)F_{2,13}(0.05) = 8.968.$$

[Figure: the 95% prediction ellipse for both TOT and AMI ($Y_{02}$ versus $Y_{01}$); chi-square plot of residuals (versus qchisq).]

7.26 (a) (i) The table below summarizes the results of the "best" individual regressions. Each predictor variable is significant at the 5% level.

Fitted model                                              R^2      s
Y1 = -70.1 + .0593 z2 + .0555 z3 + 82.53 z4              73.6%   1.5192
Y2 = -21.6 - .9640 z1 + 26.12 z4                         76.5%    .3530
Y2 = -20.92 + .0117 z3 + 27.04 z4                        75.4%    .3616
Y3 = -43.8 + .0288 z2 + .0282 z3 + 44.59 z4              80.7%    .6595
Y4 = -17.0 + .0224 z2 + .0120 z3 + 15.77 z4              75.7%    .3504

(ii) Observations with large standardized residuals (outliers) include #51, #52 and #56. Observations with high leverage include #57, #58, #60 and #61. Apart from the outliers, the residual plots look good.

(iii) The 95% prediction interval for $Y_3$ is (1.077, 4.239).

(b) (i) Using all four predictor variables, the estimated coefficient matrix and estimated error covariance matrix are

$$\hat{B} = \begin{bmatrix} -74.232 & -24.015 & -45.763 & -17.727 \\ -.550 & -1.486 & -1.185 & -3.120 \\ .098 & .009 & .047 & .029 \\ .049 & .008 & .025 & .011 \\ 85.076 & 28.755 & 45.798 & 16.220 \end{bmatrix}, \qquad \hat{\Sigma} = \begin{bmatrix} 2.244 & .398 & .914 & .511 \\ .398 & .118 & .193 & .089 \\ .914 & .193 & .419 & .210 \\ .511 & .089 & .210 & .122 \end{bmatrix}$$

A multivariate regression model using only the three predictors $Z_1$, $Z_3$ and $Z_4$ will adequately represent the data.

(ii) The same outliers and leverage points indicated in (a) (ii) are present. Otherwise the residual analysis suggests the usual regression assumptions are reasonable.

(iii) The simultaneous prediction interval for $Y_3$ will be wider than the individual interval in (a) (iii).

7.27 The table below summarizes the results of the "best" individual regressions. Each predictor variable is significant at the 5% level.
(The levels of Severity are coded: Low=1, High=2; the levels of Complexity are coded: Simple=1, Complex=2; the levels of Exper are coded: Novice=1, Guru=2, Experienced=3.) There are no significant interaction terms in either model.

Fitted model                                                        R^2      s
Assessment = -1.834 + 1.270 Severity + 3.003 Complexity             74.1%    .9853
Implementation = -4.919 + 3.477 Severity + 5.827 Complexity         71.9%   2.1364

For the multivariate regression with the two predictor variables Severity and Complexity, the estimated coefficient matrix and estimated error covariance matrix are

$$\hat{B} = \begin{bmatrix} -1.834 & -4.919 \\ 1.270 & 3.477 \\ 3.003 & 5.827 \end{bmatrix}, \qquad \hat{\Sigma} = \begin{bmatrix} .9707 & 1.9162 \\ 1.9162 & 4.5643 \end{bmatrix}$$

A residual analysis suggests there is no reason to doubt the standard regression assumptions.

Chapter 8

8.1 Eigenvalues of $\Sigma$ are $\lambda_1 = 6$, $\lambda_2 = 1$. The principal components are

$Y_1 = .894X_1 + .447X_2$
$Y_2 = .447X_1 - .894X_2$

$\mathrm{Var}(Y_1) = \lambda_1 = 6$. Therefore, the proportion of total population variance explained by $Y_1$ is $6/(6+1) = .86$.

8.2 $$\boldsymbol{\rho} = \begin{bmatrix} 1 & .6325 \\ .6325 & 1 \end{bmatrix}$$

(a) $Y_1 = .707Z_1 + .707Z_2$
$Y_2 = .707Z_1 - .707Z_2$

$\mathrm{Var}(Y_1) = \lambda_1 = 1.6325$. The proportion of total population variance explained by $Y_1$ is $1.6325/(1+1) = .816$.

(b) No. The two (standardized) variables contribute equally to the principal components in 8.2(a). The two variables contribute unequally to the principal components in 8.1 because of their unequal variances.

(c) $\rho_{Y_1,Z_1} = .903$; $\rho_{Y_1,Z_2} = .903$; $\rho_{Y_2,Z_1} = .429$

8.3 Eigenvalues of $\Sigma$ are 2, 4, 4. Eigenvectors associated with the eigenvalues 4, 4 are not unique. One choice is $e_2' = (0\ 1\ 0)$ and $e_3' = (0\ 0\ 1)$. With these assignments the principal components are $Y_1 = X_1$, $Y_2 = X_2$ and $Y_3 = X_3$.

8.4 Eigenvalues of $\Sigma$ are solutions of

$$|\Sigma - \lambda I| = (\sigma^2 - \lambda)^3 - 2(\sigma^2 - \lambda)(\sigma^2\rho)^2 = 0.$$

Thus $(\sigma^2 - \lambda)\left[(\sigma^2 - \lambda)^2 - 2\sigma^4\rho^2\right] = 0$, so $\lambda = \sigma^2$ or $\lambda = \sigma^2(1 \pm \rho\sqrt{2})$.

For $\lambda_1 = \sigma^2$, $e_1' = [1/\sqrt{2},\ 0,\ -1/\sqrt{2}]$.
For $\lambda_2 = \sigma^2(1 + \rho\sqrt{2})$, $e_2' = [1/2,\ 1/\sqrt{2},\ 1/2]$.
For $\lambda_3 = \sigma^2(1 - \rho\sqrt{2})$, $e_3' = [1/2,\ -1/\sqrt{2},\ 1/2]$.

Principal Component                                                          Variance                           Proportion of Total Variance Explained
$Y_1 = \frac{1}{\sqrt{2}}X_1 - \frac{1}{\sqrt{2}}X_3$                        $\sigma^2$                         $1/3$
$Y_2 = \frac{1}{2}X_1 + \frac{1}{\sqrt{2}}X_2 + \frac{1}{2}X_3$              $\sigma^2(1 + \rho\sqrt{2})$       $\frac{1}{3}(1 + \rho\sqrt{2})$
$Y_3 = \frac{1}{2}X_1 - \frac{1}{\sqrt{2}}X_2 + \frac{1}{2}X_3$              $\sigma^2(1 - \rho\sqrt{2})$       $\frac{1}{3}(1 - \rho\sqrt{2})$

8.5 (a) Eigenvalues of $\boldsymbol{\rho}$ satisfy

$$|\boldsymbol{\rho} - \lambda I| = (1 - \lambda)^3 + 2\rho^3 - 3(1 - \lambda)\rho^2 = 0$$

or $(1 + 2\rho - \lambda)(1 - \rho - \lambda)^2 = 0$. Hence $\lambda_1 = 1 + 2\rho$; $\lambda_2 = \lambda_3 = 1 - \rho$, and the results are consistent with (8-16) for $p = 3$.

(b) By direct multiplication,

$$\boldsymbol{\rho}\left(\frac{1}{\sqrt{p}}\mathbf{1}\right) = \bigl(1 + (p-1)\rho\bigr)\left(\frac{1}{\sqrt{p}}\mathbf{1}\right),$$

thus verifying the first eigenvalue-eigenvector pair. Further, $\boldsymbol{\rho}\,e_i = (1 - \rho)e_i$, $i = 2, 3, \ldots, p$.

8.6 (a) $\hat{y}_1 = .999x_1 + .041x_2$; sample variance of $\hat{y}_1 = \hat{\lambda}_1 = 7488.8$
$\hat{y}_2 = -.041x_1 + .999x_2$; sample variance of $\hat{y}_2 = \hat{\lambda}_2 = 13.8$

(b) The proportion of total sample variance explained by $\hat{y}_1$ is $\hat{\lambda}_1/(\hat{\lambda}_1 + \hat{\lambda}_2) = .9982$.

(c) The center of the constant density ellipse is (155.60, 14.70). The half length of the major axis is 102.4 in the direction of $\hat{y}_1$. The half length of the perpendicular minor axis is 4.4 in the direction of $\hat{y}_2$.

(d) $r_{\hat{y}_1,x_1} = 1.000$, $r_{\hat{y}_1,x_2} = .687$. The first component is almost completely determined by $x_1$ = sales since its variance is approximately 285 times that of $x_2$ = profits. This is confirmed by the correlation coefficient $r_{\hat{y}_1,x_1} = 1.000$.

8.7 (a) $\hat{y}_1 = .707z_1 + .707z_2$; sample variance of $\hat{y}_1 = \hat{\lambda}_1 = 1.6861$
$\hat{y}_2 = .707z_1 - .707z_2$; sample variance of $\hat{y}_2 = \hat{\lambda}_2 = .3139$

(b) The proportion of total sample variance explained by $\hat{y}_1$ is $\hat{\lambda}_1/(\hat{\lambda}_1 + \hat{\lambda}_2) = .8431$.

(c) $r_{\hat{y}_1,z_1} = .918$, $r_{\hat{y}_1,z_2} = .918$. The standardized "sales" and "profits" contribute equally to the first sample principal component.

(d) The sales numbers are much larger than the profits numbers and consequently, sales, with the larger variance, will dominate the first principal component obtained from the sample covariance matrix. Obtaining the principal components from the sample correlation matrix (the covariance matrix of the standardized variables) typically produces components where the importance of the variables, as measured by correlation coefficients, is more nearly equal.
It is usually best to use the correlation matrix or, equivalently, to put all the variables on similar numerical scales.

8.8 (a) $r_{\hat{y}_i,z_k} = \hat{e}_{ik}\sqrt{\hat{\lambda}_i}$, $i = 1, 2$; $k = 1, 2, \ldots, 5$

Correlations:

k     i = 1    i = 2
1     .732     -.437
2     .831     -.280
3     .726     -.374
4     .604      .694
5     .564      .719

The correlations seem to reinforce the interpretations given in Example 8.5.

(b) Using (8-34) and (8-35) we have

k             1      2      3      4      5
$\bar{r}_k$   .353   .435   .354   .326   .299

$\bar{r} = .353$, $\hat{\gamma} = 2.485$, and $T = 103.1 > \chi^2_9(.01) = 21.67$, so we would reject $H_0$ at the 1% level. This test assumes a large random sample and a multivariate normal parent population.

8.9 (a) By (5-10),

$$\max_{\mu,\Sigma} L(\mu, \Sigma) = \frac{e^{-np/2}}{(2\pi)^{np/2}\left(\frac{n-1}{n}\right)^{np/2}|S|^{n/2}}.$$

The same result applied to each variable independently gives

$$\max_{\mu_i,\sigma_{ii}} L(\mu_i, \sigma_{ii}) = \frac{e^{-n/2}}{(2\pi)^{n/2}\left(\frac{n-1}{n}\right)^{n/2}s_{ii}^{n/2}}.$$

Under $H_0$,

$$\max_{\mu,\Sigma_0} L(\mu, \Sigma_0) = \prod_{i=1}^{p}\max_{\mu_i,\sigma_{ii}} L(\mu_i, \sigma_{ii})$$

and the likelihood ratio statistic becomes

$$\Lambda = \frac{\max_{\mu,\Sigma_0} L(\mu, \Sigma_0)}{\max_{\mu,\Sigma} L(\mu, \Sigma)} = \frac{|S|^{n/2}}{\prod_{i=1}^{p} s_{ii}^{n/2}} = |R|^{n/2}.$$

(b) When $\Sigma = \sigma^2 I$, using (4-16) and (4-17) we get

$$\max_{\mu} L(\mu, \sigma^2 I) = \frac{1}{(2\pi)^{np/2}(\sigma^2)^{np/2}}\exp\left[-\frac{1}{2\sigma^2}\mathrm{tr}\bigl((n-1)S\bigr)\right]$$

so

$$\max_{\mu,\sigma^2} L(\mu, \sigma^2 I) = \frac{(np)^{np/2}\,e^{-np/2}}{(2\pi)^{np/2}(n-1)^{np/2}\bigl(\mathrm{tr}(S)\bigr)^{np/2}} = \frac{e^{-np/2}}{(2\pi)^{np/2}\left(\frac{n-1}{n}\right)^{np/2}\left(\frac{1}{p}\mathrm{tr}(S)\right)^{np/2}}$$

and the result follows. Under $H_0$ there are $p$ $\mu_i$'s and one variance, so the dimension of the parameter space is $\nu_0 = p + 1$. The unrestricted case has dimension $p + p(p+1)/2$, so the $\chi^2$ has $p(p+1)/2 - 1 = (p+2)(p-1)/2$ d.f.
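The statistic in 8.9(b) can be evaluated numerically once $S$ and $n$ are known. The following is a minimal sketch, assuming the likelihood ratio has the form $\Lambda = |S|^{n/2}/(\mathrm{tr}(S)/p)^{np/2}$ implied by the derivation above; the helper names `determinant` and `sphericity_lrt` and the example matrix are hypothetical and serve only as illustration.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def sphericity_lrt(S, n):
    """Return (-2 ln Lambda, d.f.) for H0: Sigma = sigma^2 I."""
    p = len(S)
    trace = sum(S[i][i] for i in range(p))
    # -2 ln Lambda with Lambda = |S|^(n/2) / (tr(S)/p)^(np/2)
    stat = -n * math.log(determinant(S)) + n * p * math.log(trace / p)
    df = (p + 2) * (p - 1) // 2
    return stat, df


# Hypothetical 3 x 3 sample covariance matrix, for illustration only.
S_example = [[4.0, 1.0, 0.5],
             [1.0, 3.0, 0.2],
             [0.5, 0.2, 2.0]]
stat, df = sphericity_lrt(S_example, n=30)
print(round(stat, 3), df)
```

When $S$ is already proportional to the identity the statistic is zero, and it grows as the eigenvalues of $S$ become more unequal; the degrees of freedom returned match the count $(p+2)(p-1)/2$ obtained above.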
8.10 (a) Covariances: JPMorgan, CitiBank, WellsFargo, RoyDutShell, ExxonMobil

              JPMorgan     CitiBank     WellsFargo   RoyDutShell  ExxonMobil
JPMorgan      0.00043327
CitiBank      0.00027566   0.00043872
WellsFargo    0.00015903   0.00017999   0.00022398
RoyDutShell   0.00006410   0.00018144   0.00007341   0.00072251
ExxonMobil    0.00008897   0.00012325   0.00006055   0.00050828   0.00076568

Principal Component Analysis: JPMorgan, CitiBank, WellsFargo, RoyDutShell, ExxonMobil

Eigenanalysis of the Covariance Matrix
103 cases used

Eigenvalue    0.0013677   0.0007012   0.0002538   0.0001426   0.0001189
Proportion    0.529       0.271       0.098       0.055       0.046
Cumulative    0.529       0.801       0.899       0.954       1.000

Variable        PC1      PC2      PC3      PC4      PC5
JPMorgan       0.223   -0.625   -0.326    0.663   -0.118
CitiBank       0.307   -0.570    0.250   -0.414    0.589
WellsFargo     0.155   -0.345    0.038   -0.497   -0.780
RoyDutShell    0.639    0.248    0.642    0.309   -0.149
ExxonMobil     0.651    0.322   -0.646   -0.216    0.094

(b) From part (a), $\hat{\lambda}_1 = .00137$, $\hat{\lambda}_2 = .00070$, $\hat{\lambda}_3 = .00025$, $\hat{\lambda}_4 = .00014$, $\hat{\lambda}_5 = .00012$.

(c) Using (8-33), Bonferroni 90% simultaneous confidence intervals for $\lambda_1$, $\lambda_2$, $\lambda_3$ are

$\lambda_1$: (.00106, .00195)
$\lambda_2$: (.00054, .00100)
$\lambda_3$: (.00019, .00036)

(d) Stock returns are probably best summarized in two dimensions with 80% of the total variation accounted for by a "market" component and an "industry" component.

8.11 (a)

$$S = \begin{bmatrix} 3.397 & -1.102 & 4.306 & -2.078 & .270 \\ & 9.673 & -1.513 & 10.953 & 12.030 \\ & & 55.626 & -28.937 & -.440 \\ & & & 89.067 & 9.570 \\ \text{(Symmetric)} & & & & 31.900 \end{bmatrix}$$

(b) $\hat{\lambda}_1 = 108.27$, $\hat{\lambda}_2 = 43.15$, $\hat{\lambda}_3 = 31.29$, $\hat{\lambda}_4 = 4.60$, $\hat{\lambda}_5 = 2.35$

        e1-hat       e2-hat       e3-hat       e4-hat       e5-hat
x1    -0.037630    -0.062264     0.040076     0.554515     0.828018
x2     0.118931    -0.249442    -0.259861    -0.769147     0.514314
x3    -0.479670    -0.759246     0.431404    -0.027909    -0.081081
x4     0.858905    -0.315978     0.393975     0.068822    -0.049884
x5     0.128991    -0.507549    -0.767815     0.308887    -0.202000

$\hat{y}_1 = -.038x_1 + .119x_2 - .480x_3 + .859x_4 + .129x_5$
$\hat{y}_2 = -.062x_1 - .249x_2 - .759x_3 - .316x_4 - .508x_5$

(c) Correlations between variables and components:

                        x1      x2      x3      x4      x5
$r_{\hat{y}_1,x_i}$    -.212    .398   -.669    .947    .238
$r_{\hat{y}_2,x_i}$    -.222   -.527   -.669   -.220   -.590

The proportion of total sample variance explained by the first two principal components is $(108.27 + 43.15)/(108.27 + 43.15 + 31.29 + 4.60 + 2.35) = .80$. The first component appears to be a weighted difference between percent total employment and percent employed by government. We might call this component an employment contrast. The second component appears to be influenced most by roughly equal contributions from percent with professional degree ($x_2$), percent employment ($x_3$) and median home value ($x_5$). We might call this an achievement component.

The change in scale for $x_5$ did not appear to have much effect on the first sample principal component (see Example 8.3) but did change the nature of the second component. Variable $x_5$ now has much more influence in the second principal component.

8.12

$$S = \begin{bmatrix} 2.500 & -2.768 & -.378 & -.464 & -.586 & -2.235 & .171 \\ & 300.516 & 3.914 & -1.395 & 6.779 & 30.779 & .624 \\ & & 1.522 & .673 & 2.316 & 2.822 & .142 \\ & & & 1.182 & 1.089 & -.811 & .177 \\ & & & & 11.364 & 3.133 & 1.045 \\ & & & & & 30.978 & .593 \\ \text{(Symmetric)} & & & & & & .479 \end{bmatrix}$$

$$R = \begin{bmatrix} 1.0 & -.101 & -.194 & -.270 & -.110 & -.254 & .156 \\ & 1.0 & .183 & -.074 & .116 & .319 & .052 \\ & & 1.0 & .502 & .557 & .411 & .166 \\ & & & 1.0 & .297 & -.134 & .235 \\ & & & & 1.0 & .167 & .448 \\ & & & & & 1.0 & .154 \\ \text{(Symmetric)} & & & & & & 1.0 \end{bmatrix}$$

Using S:

$\hat{\lambda}_1 = 304.26$; $\hat{\lambda}_2 = 28.28$; $\hat{\lambda}_3 = 11.46$; $\hat{\lambda}_4 = 2.52$; $\hat{\lambda}_5 = 1.28$; $\hat{\lambda}_6 = .53$; $\hat{\lambda}_7 = .21$

The first sample principal component

$\hat{y}_1 = -.010x_1 + .993x_2 + .014x_3 - .005x_4 + .024x_5 + .112x_6 + .002x_7$

accounts for 87% of the total sample variance. The first component is essentially "solar radiation". (Note the large sample variance for $x_2$ in S.)
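The 87% figure and the dominance of solar radiation in Exercise 8.12 can be checked directly from the reported eigenvalues. The short sketch below uses only numbers quoted above (the eigenvalues of $S$, the coefficient .993 on $x_2$, and $s_{22} = 300.516$); the variable names are illustrative.

```python
import math

# Eigenvalues of S reported for Exercise 8.12.
eigenvalues = [304.26, 28.28, 11.46, 2.52, 1.28, 0.53, 0.21]

total_variance = sum(eigenvalues)
proportions = [lam / total_variance for lam in eigenvalues]

# Correlation between the first component and x2 (solar radiation):
# r = e_12 * sqrt(lambda_1) / sqrt(s_22), with e_12 = .993 and s_22 = 300.516.
r_y1_x2 = 0.993 * math.sqrt(eigenvalues[0]) / math.sqrt(300.516)

print(round(proportions[0], 3))  # 0.873
print(round(r_y1_x2, 3))         # 0.999
```

A correlation this close to 1 between the first component and $x_2$ is what justifies calling the component "solar radiation" when the analysis is based on S.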
Using R:

$\hat{\lambda}_1 = 2.34$; $\hat{\lambda}_2 = 1.39$; $\hat{\lambda}_3 = 1.20$; $\hat{\lambda}_4 = .73$; $\hat{\lambda}_5 = .65$; $\hat{\lambda}_6 = .54$; $\hat{\lambda}_7 = .16$

The first three sample principal components are

$\hat{y}_1 = .237z_1 - .205z_2 - .551z_3 - .378z_4 - .498z_5 - .324z_6 - .319z_7$
$\hat{y}_2 = -.278z_1 + .527z_2 + .007z_3 - .435z_4 - .199z_5 + .567z_6 - .308z_7$
$\hat{y}_3 = .644z_1 + .225z_2 - .113z_3 - .407z_4 + .197z_5 + .159z_6 + .541z_7$

These components account for 70% of the total sample variance.

The first component contrasts "wind" with the remaining variables. It might be some general measure of the pollution level. The second component is largely composed of "solar radiation" and the pollutants "NO" and "O3". It might represent the effects of solar radiation since solar radiation is involved in the production of NO and O3 from the other pollutants. The third component is composed largely of "wind" and certain pollutants (e.g. "NO" and "HC"). It might represent a wind transport effect. A "better" interpretation of the components would depend on more extensive subject matter knowledge.

The data can be effectively summarized in three or fewer dimensions. The choice of S or R makes a difference.

8.13 (a)

Covariance Matrix

        X1             X2             X3             X4             X5             X6
X1    4.654750889    0.931345370    0.589699088    0.276915309    1.074885659    0.158150852
X2    0.931345370    0.612821160    0.110933412    0.118469052    0.388886434   -0.024851988
X3    0.589699088    0.110933412    0.571428861    0.087004959    0.347989910    0.110131391
X4    0.276915309    0.118469052    0.087004959    0.110409072    0.217405649    0.021814433
X5    1.074885659    0.388886434    0.347989910    0.217405649    0.862172372   -0.008817694
X6    0.158150852   -0.024851988    0.110131391    0.021814433   -0.008817694    0.861455923

Correlation Matrix

        X1        X2        X3        X4        X5        X6
X1    1.0000    0.5514    0.3616    0.3863    0.5366    0.0790
X2    0.5514    1.0000    0.1875    0.4554    0.5350   -0.0342
X3    0.3616    0.1875    1.0000    0.3464    0.4958    0.1570
X4    0.3863    0.4554    0.3464    1.0000    0.7046    0.0707
X5    0.5366    0.5350    0.4958    0.7046    1.0000   -0.0102
X6    0.0790   -0.0342    0.1570    0.0707   -0.0102    1.0000

(b) We will work with R since the sample variance of $x_1$ is approximately 40 times larger than that of $x_4$.

Eigenvalues of the Correlation Matrix

          Eigenvalue    Difference    Proportion    Cumulative
PRIN1      2.86431       1.78786       0.477385      0.47738
PRIN2      1.07645       0.29881       0.179408      0.65679
PRIN3      0.77764       0.12733       0.129607      0.78640
PRIN4      0.65031       0.26228       0.108386      0.89479
PRIN5      0.38803       0.14478       0.064672      0.95946
PRIN6      0.24326                     0.040543      1.00000

Eigenvectors

        PRIN1       PRIN2       PRIN3       PRIN4       PRIN5       PRIN6
X1    0.444858   -0.026600    0.339330   -0.551149   -0.600851    0.146492
X2    0.429300   -0.291738    0.498607   -0.061367    0.687297    0.076408
X3    0.358773    0.380135   -0.628157   -0.421060    0.331839    0.211635
X4    0.462854   -0.020959   -0.124585    0.665604   -0.207413    0.532689
X5    0.521276   -0.073690   -0.203339    0.200526   -0.103175   -0.794127
X6    0.055877    0.873960    0.429880    0.178715    0.053090   -0.116262

(c) It is not possible to summarize the radiotherapy data with a single component. We need the first four components to summarize the data.

(d) Correlations between principal components and $X_1$ - $X_6$ are

        PRIN1      PRIN2      PRIN3      PRIN4
X1    0.75289   -0.02766    0.29923   -0.44446
X2    0.72656   -0.30268    0.43969   -0.04949
X3    0.60720    0.39440   -0.55393   -0.33955
X4    0.78335   -0.02175   -0.10986    0.53676
X5    0.88222   -0.07646   -0.17931    0.16171
X6    0.09457    0.90675    0.37909    0.14412

8.14 S is given in Example 5.2.

$\hat{\lambda}_1 = 200.5$, $\hat{\lambda}_2 = 4.5$, $\hat{\lambda}_3 = 1.3$

The first sample principal component explains a proportion $200.5/(200.5 + 4.5 + 1.3) = .97$ of the total sample variance. Also, $\hat{e}_1' = [-.051,\ -.998,\ .029]$. Hence

$\hat{y}_1 = -.051x_1 - .998x_2 + .029x_3$

The first principal component is essentially $x_2$ = sodium content. (Note the (relatively) large sample variance for "sodium" in S.) A Q-Q plot of the $\hat{y}_1$ values is shown below. These data appear to be approximately normal with no suspect observations.
[Figure: Q-Q plot for $\hat{y}_1$ (ordered values $\hat{y}_{1(i)}$ versus $q_{(i)}$).]

8.15

$$S = \begin{bmatrix} 1088.40 & 831.28 & 763.23 & 784.09 \\ & 1128.41 & 850.32 & 926.73 \\ & & 1336.15 & 904.53 \\ \text{(Symmetric)} & & & 1395.15 \end{bmatrix}$$

$\hat{\lambda}_1 = 3779.01$; $\hat{\lambda}_2 = 468.25$; $\hat{\lambda}_3 = 452.13$; $\hat{\lambda}_4 = 248.72$

Consequently, the first sample principal component accounts for a proportion $3779.01/4948.11 = .76$ of the total sample variance. Also, $\hat{e}_1' = [.45,\ .49,\ .51,\ .53]$. Consequently,

$\hat{y}_1 = .45x_1 + .49x_2 + .51x_3 + .53x_4$

The interpretation of the first component is the same as the interpretation of the first component, obtained from R, in Example 8.6. (Note the sample variances in S are nearly equal.)

8.16 Principal component analysis of Wisconsin fish data

(a) All are positively correlated.

(b) Principal component analysis using $x_1$ - $x_4$

Eigenvalues of R
2.1539  0.7875  0.6157  0.4429

Eigenvectors of R
0.7032   0.4295   0.1886  -0.7071
0.6722   0.3871  -0.4652   0.4702
0.5914  -0.7126  -0.2787  -0.3216
0.6983  -0.2016   0.4938   0.5318

                    pc1      pc2      pc3      pc4
St. Dev.          1.4676   0.8874   0.7846   0.6655
Prop. of Var.     0.5385   0.1969   0.1539   0.1107
Cumulative Prop.  0.5385   0.7354   0.8893   1.0000

The first principal component is essentially a total of all four. The second contrasts the Bluegill and Crappie with the two bass.

(c) Principal component analysis using $x_1$ - $x_6$

Eigenvalues of R
2.3549  1.0719  0.9842  0.6644  0.5004  0.4242

Eigenvectors of R
-0.6716   0.0114   0.5284  -0.0471   0.3765  -0.7293
-0.6668  -0.0100   0.2302  -0.7249  -0.1863   0.5172
-0.5555  -0.2927  -0.2911   0.1810  -0.6284  -0.3081
-0.7013  -0.0403   0.0355   0.6231   0.3407   0.5972
 0.3621  -0.4203   0.0143  -0.2250   0.5074   0.0872
-0.4111   0.0917  -0.8911  -0.2530   0.4021  -0.1731

                    pc1      pc2      pc3      pc4      pc5      pc6
St. Dev.          1.5346   1.0353   0.9921   0.8151   0.7074   0.6513
Prop. of Var.     0.3925   0.1786   0.1640   0.1107   0.0834   0.0707
Cumulative Prop.  0.3925   0.5711   0.7352   0.8459   0.9293   1.0000

The Walleye is contrasted with all the others in the first principal component (look at the covariance pattern).
The second principal component is essentially the 'Walleye and somewhat th,e largemouth bas. The thkd principal component is nearly a contrast betV'æ..n Northern pike and BluegilL 179 8.17 COVARIANCE MATRIX ----------------- xl x2 x3 x4 xS x6 ..Q13001'6 .0103784 ..Q223S.Q0 .0200857 .0912071 ..0079578 .0114179 .0185352 .0210995 .0085298 .0089085 .0803572 .06677"62 .0168369 .0128470 .0694845 .0177355 .0167936 .0115684 .0080712 .01'05991 The eigenvalues are o .lS4 0.018 0..008 o .003 0.0.02 0.001 and the first two principal components are .218 , .204, .673, .633 , .181 , .159 .337 , .432 , -.500 , .024 , .430 , .514 -x x ... 180 8.18 (a) & (b) Principal component analysis of the correlation matrix follows. Correlations: 100m(s), 200m(s), 400m(s), 800m, 1500m, 3000m, Marathon 100m(s) ().941 200m(s) 400m(s) 800m 1500m 3000m Mara thon 200m(s) 400m(s) 800m 1500m 3000m 0.909 0.820 0.801 0.732 0.680 0.806 0.720 0.674 0.677 0.905 0.867 0.854 0.973 0.791 0.799 0.871 0.809 0.782 0.728 0.669 Eigenanalysis of the Correlation Matrix 0.0143 0.6287 0.2793 0.1246 0.0910 0.0545 0.002 Eigenvalue 5.8076 0.008 0.013 0.018 0.040 0.090 1.000 proportion 0.830 0.998 0.990 0.977 cumulative 0.830 0.919 0.959 Variable 100m(s) 200m(s) 400m(s) 800m 1500m 3000m Mara thon PC1 0.378 0.383 0.368 0.395 0.389 0.376 0.355 PC2 -0.407 -0.414 -0.459 0.161 0.309 0.423 0.389 PC3 0.141 0.101 -0.237 -0.148 0.422 0.406 -0.741 PC4 -0.587 -0.194 PC5 0.167 -0.094 -0.327 0.819 -0.026 -0.352 -0.321 -0.247 0.645 0.295 0.067 0.080 Pe6 PC7 0.089 0.745 -0.266 0.127 -0.240 0.017 -0.195 0.731 0.189 -0.240 -0.572 0.082 0.048 -0.540 )71 = .378z1 + .383z2 + .368z3 + .395z4 + .389z5 + .376z6 + .355z7 )72 =-A07z1 -A14z2 -AS9z3 +.161z4 +.309z5 +A23z6 +.389z7 Zi Zz Z3 Z4 r.Yi,l; .911 .923 .887 .952 r.Y2'Z¡ -.323 -.328 -.364 .128 Z6 'l7 .937 .906 .856 .245 .335 .308 Z5 Cumulative proportion of total sample varance explained by the first two components is .919. (c) All track events contribute about equally to the first component. 
This component might be called a track index or track excellence component. The second component contrasts the times for the shorter distanes (100m, 200m 400m) with the times for the longer distances (800m, 1500m, 3000m, marathon) and might be called a distance component. (d) The "track excellence" rankings for the first 10 and very last countries follow. These rankings appear to be consistent with intuitive notions of athletic excellence. 1. USA 2. Germany 3. Russia 4. China 5. France 6. Great Britain 7. Czech Republic 8. Poland 9. Romania 10. Australia .... 54. Somoa nn 8.19 Principal component analysis of the covariance matrix follows. Covariances: 100m/s, 200m/s, 400m/s, 800m/s, 1500mls, 3000m/s, Marmls 3000m/s 1500ml s 800m/s 40 Oml s 200m/s 100ml s 0.0905383 lOOmIs o .0956u63 200m/s 400m/s 800m/s Marml s 0.0966724 0.0650640 0.0822198 0.0921422 0.0810999 Marml s 0.1667141 l500m/s 3000m/s o . 114'6714 0.1377889 0.1138699 0.0749249 o . -0809409 0.0735228 0.10831'64 0.0997547 0.0943056 0.0954430 0.0%01139 0.1054364 0.0933103 0.1018807 0.08'64542 0.12384.Q5 0.1765843 0.1465604 0.1437148 0.1184578 Marml s Eigenanalysis of the Covariance Eigenval ue Proportion Cumulati ve Variable lOOmIs 20 Oml s 400m/s 800m/s 1500m/s 3000m/s Marml s 0.73215 0.829 0.829 PC1 0.310 0.357 0.379 0.299 0.391 0.460 0.423 0.08607 0.097 0.926 PC2 Matrix 0.01498 0.017 0.981 0.03338 0.038 0.964 PC3 -0.376 0.098 -0.434 0.089 -0.519 -0.274 0.053 -0.053 0.435 0.211 0.427 0.396 0.445 -0.730 PC4 0.00885 0.010 0.991 0.00617 0.007 0.998 PC5 PC6 PC7 0.127 0.236 -0.199 0.081 -0 . 
499 0.00207 0.002 1.000 -0.585 -0.046 -0.624 0.138 -0.323 -0.030 0.689 -0.311 0.132 0.667 -0.187 -0.124 0.894 -0.136 -0.2'65 0.128 0.055 0.184 -0.237 -0.357 -0.136 0.734 0.095 5'1 =.3 lOx¡ + .357 x2 + .379x3 + .299x4 + .391xs + .460x6 + .423X7 5'2 =-.376x¡ -.434x2 -.519x3 +.053x4 +.21 IXs +.396x6 +.445x7 Xl I X2 X3 X4 Xs X6 X7 r.YllXi .882 .902 .874 .944 .951 .937 .886 r.Yi,X¡ -.367 -.376 -.410 .057 .176 .276 .320 Cumulative proportion of total sample variance explained by the first two components is .926. The interpretation of the sample component is similar to the interpretation in Exercise 8.18. All track events contribute about equally to the first component. This component might be called a track index or track excellence component. The second component contrasts times in mls for the shorter distances (100m, 200m 400m) with the times for the longer distances (800m, l500m, 3000m, marathon) and might be called a distance component. The "track excellence" rankings for the countries are very similar to the rankings for the countries obtained in Exercise 8.18. 182 8.20 (a) & (b) Principal component analysis of the correlation matrix follows. Eigenanalysis of the Correlation Matrix Eigenvalue 6.7033 proportion 0.838 cumulative 0.838 0.'6384 Variable PC2 PC1 0.332 0.346 0.339 0.353 0.366 0.370 0.366 10,000m Marathon 0.354 100m 200m 400m 800m 1500m 5000m 0.080 0.918 0.529 0.470 0.345 -0.089 -0.154 -0.295 -0.334 -0.387 0.2275 0.028 0.946 0.2058 0.026 0.972 PC3 PC4 o .0976 0.012 0.984 PC5 0.300 0.0707 0.009 0.993 PC6 -0.362 o .04'69 0.OQ6 0.0097 0.001 1.000 PC7 PC8 0.999 0.348 -0.381 -0.217 -0.541 0.349 -0.440 o . 114 0.077 0.133 0.851 -0 .067 0.259 -0.783 -0.134 -0.227 -0.341 -0.147 0.530 0.652 -0 . 
233 -0.244 0.072 -0.359 -0.328 0.055 0.183 0.087 -0.061 -0.273 -0.351 0.244 0.594 0.375 0.335 -0.018 -0.338 0.344 -0.004 -0.066 0.061 -0.003 -0.039 -0.040 0.706 -0.697 0.069 Yi = .332z1 + .346('2 + .339 ('3 + .353z4 + .366z5 + .370('6 + .366z7 + .354z8 Y2 =.529z1 +.470'2 +.345z3 -.089z4 -.154z5 -.295z6 -.334z7 -.387('8 Zs ('6 ('7 ('8 .878 l4 .914 .948 .958 .948 .917 .276 -.071 -.123 -.236 -.267 -.309 ('1 Z2 ('3 r.YI,Z¡ .860 .896 r.Y2'Z¡ .423 .376 Cumulative proportion of total sample variance explained by the first two components is .918. (c) All track events contribute aboutequally to the first component. This component might be called a track index or track excellence component. The second component contrasts the times for the shorter distances (100m, 200m 400m) with the times for the longer distances (800m, 1500m, 500m, lu,OOOm, marathon) and might be called a distance component. (d) The male "track excellence" rankings for the first 10 and very lasti:ountris follow. These rankings appear to be consistent with intuitive notions of athletic excellence. 1. USA 2. Great Britain 3. Kenya 4. France 5. Australia 6. Italy 7. Brazil 8. Germany 9. Portugal 10. Canada ....54. Cook Islands The principal component analysis of the women. the men's track data is consistent with that for 183 component analysis of the covariance matrx follows. 
8.21 Principal Covariances: 1oom/s, 2oom/s, 400m/s, 8oom/s, 1500mls, 5000mls, 1o,oom/s,lfiJQ~~ 10Om/s 200m/s 40Om/s 800m/s 1500m/s 0.0434979 0.0482772 0.0434632 0.0314951 0.0425034 0.0469252 0.0448325 0.0431256 100m/s 200m/s 400m/s 800m/s lS00m/s 5000m/s 10,OOOm/s Marathonm/S 5000../5 10,OOOm/s Marathonø/S 0.0648452 0.0558678 0.0432334 0.0535265 0.0587731 0.0572512 0.0562945 0.0688217 0.0428221 0.0537207 0.0617664 0.0599354 0.0567342 0.0761i388 0.0745719 0.0736518 0.0942894 0.0909952 0.0979276 Covariance Matrix Eigenanlysis of the 0.01391 0.024 0.947 Eigenvalue 0.49405 0.04622 proportion 0.0729140 10,OOOm/s Marathonm/s 5000../s 0.0959398 0.0937357 0.0905819 cumlative 0.0468840 0.0523058 0.0571560 0.0553945 0.0541911 0.079 0.923 0,844 0.844 0.01332 0.023 0.970 0.00752 0.013 0.983 0.00575 0.010 0.993 0.00322 0.006 0.998 Eigenvalue 0.00112 proportion cuulative Variable 10Om/s 200m/s 400m/s BOOm/s 1500m/s 5000m/s 10,OOOm/s Marathonm/s 0.002 1. 000 PCL 0.244 0.311 0.317 0.278 0.364 0.428 0.421 0.416 pc3 PC2 -0.432 -0.523 -0.469 0.173 0.235 -0.684 0.436 0.439 -0; 033 0.063 0.261 0.310 0.387 -0.111 -0.187 -0.128 pc4 pc5 PC6 -0.450 -0 .390 -0.318 0.341 0.420 0.046 0.332 0.543 0.317 -0.303 -0.016 -0.374 -0.100 -0.215 -0.339 0:584 0.119 -0.247 0.177 pC7 0.584 -0.535 0.039 pc8 -0.119 0.096 -0.008 -0.070 -0.044 -0.368 0.432 0.608 -0.327 -0.334 -0.006 0.'696 -0.352 -0. ,180 -0.6,93 0.074 0.215 0.391 j\ = .244xl + .311x2 +.317 X3 + .278x4 + .364xs+ .428x6 + .421x7 + .416xs 5'2 =-.432xl -.'S23x2 -.469x3 -.033x4 +.063xS +.261x6 +.3lOx7 +.387xs Xl X2 X3 X4 Xs X6 X7 Xs r.YI,X¡ , .822 .858 .849 .902 .948 .971 .964 .934 r.Y2'X¡ -,445 -.442 -.384 -.033 .050 .181 .217 .266 Cumulative proportion of total sample varance explained by the first two components is .923. The interpretation of the sample component is similar tt) the interpretatìon in Exercise 8.20. All track events contribute about equally to the first component. 
This component might be called a track index or track excellence component. , The second component contrasts times in rns for the shorter distances (100, 200 400m, 800m) with the times for the longer distances (1500m, 'SooOm, 10,0Q, marathon) and might be called a distance component. The "track excellence" rankings for the countries are very similar to the rankings for the countres obtained in Exercise 8.20. 184 8.22 Using S Eigenvalues of the CovarianeeMatrix Cumulative Eigenvalue Oi ffe renee proportion 20579.6 4874.7 5.4 15704.9 2.8 0.4 0.808198 0.191437 0.000213 0.000130 0.000018 0.1 o . 000003 PRIN1 PRIN2 PRIN3 PRIN4 PRINS PRIN6 PRIN7 ~!!S!!.2 2.1 3.3 0.5 0.1 ./ 0.000000 0.0 Eigenveetors X3 X4 X5 X6 X7 X8 X9 PRIN3 PR IN4 PRINS PRIN7 PRIN2 PRIN6 PRIN1 0.005887 0.487047 o . 009680 0.286337 0.608787 _ .003227 _.425175 0.311194 o .535569 o . 000444 o . 008388 -.509727 -.000457 0.010389 0.024592 _.000253 ~ o . 008526 0.003112 0.000069 0,009330 (Õ. e72697 .; 0.029196 0.004886 _ .000493 0.008577 _ .487193 - .034277 0.904389 0.133267 _ ,018864 0.284215 0.004847 0.593037 0.390573 0.011906 _.748598 -.005597 o . 002665 -.005278 O. 855204 0.043786 0.082331 _.000341 Plot of Y1.Y2. Symbol is value of X1. (NOTE: 10 obs hidden. I 2500 8 Y1 8 1 8115 2000 5 8 1 5 1 8 81 8 8 18 5 885 5111 11 551 51 15 111 1 1 8 155 55 18 1 1 8 8 8 1 8 8 8 8 8 1 5 5 1500 -100 a 100 Y2 200 300 0.014293 _.037984 0.998778 0.013820 _ . 000256 yrhgt ftfrbody prctffb fraiie bkfBt saleht salewt 185 8.22 (C"Ontinued) Using R ~igenvalues of the ~orrelation Matrix Ugenvalue Difference Proportion 'Cuaiulative 2.78357 0.581171 0.191018 0.105912 0.060204 0.58867 0.77969 4.12070 1.33713 0.74138 0.42143 0.18581 0.14650 0.04706 PRINI PRIN2 PRIN3 PRIN4 PRINS PRIN6 PRIN7 o . 59575 0.31996 o .~3562 o . 88560 0.94580 0.97235 0.99328 o . 
02644 0.03930 0.09945 0.020929 0.006722 1 .00000 Eigenvectors X3 X4 X5 X6 X7 X8 X9 PRINS PRIN6 PRIN7 0.065871 _.072234 _.177061 0.127800 _.434144 0.208017 0.799288 -.276561 0.774926 0.017768 PRINl PRIN2 PRIN3 PRIN4 0.449931 0.412326 0.355562 0.433957 ...186705 0.452854 0.269947 ... 042790 0.129837 -.415709 - .038732 0.113356 0.247479 0.314787 0.242818 0.618117 _ . 176650 -.215769 -.109535 0.253312 _ . 582433 0.290547 0.450292 0.568273 _.452345 ..315508 0.007728 0.714719 0.101315 0.600515 Plot Syiibol of Vl.Y2. -.719343 0.579367 0.142995 0.160238 yrhgt o . 042442 ftfrbody pr(:tffb freiie bkfat - . 236723 0.047036 sa leht salewt - .002397 - . 582337 is value of Xl. (NOTE: 27 obs hidden.) 1200 8 8 8 8 88181 1000 8 118851 8151 Vl 1 800 8 88811111 1 8 15 1 1111155 55 1 1 5 5 600 i i 800 900 1000 1200 1100 1300 V2 Plot of VL.02. (NOTE: Syiibol used is Plot of Yl.02. (NOTE: FOA S 36 obs hidden.) 38 obs hidden.) 1200 2500 VL SymbOl used is hi( ~ ***** 1000 VL 2000 . . 800 .*. ... ....... ...... ........ ***...... .- .. .. 600 1500 1 -3 .2 -1 0 .02 2 3 .3 .2 i i -1 a Q2 1 2 3 1~6 8.23 a) Using S Eigenvalues of S 4478.87 152.47 32.32 8.12 1.52 0.54 Eigenvectorsof S (in colums) -0.849339 -0.368552 -0.194132 -0.314€78 -0.043918 -0.064458 0.470832 -0.22€606 0.074260 -0.008692 -0.000202 -0.846078 -0.368132 0.012754 -0.110784 -0.019105 -Q.058127 0.303143 -0.928388 -0.012289 -0.070597 -0.216748 0.848576 0.355060 -0.082353 0.032666 -'0.060354 0.001815 -0.060162 0.440119 0.892805 -0.092026 0.033880 0.052267 0.887138 -0.443264 The first component might be identified as a "size" component. It is domiated by Weight, Body lengt and Gir, those varables with the largest sample varances. The first component explains 4478.87/4673.84 = .958 or 95.8% of the total sample varance. 
The second component essentially contrasts Weight with the remaining body size varables, Body length, Neck, Gir, Head lengt, component and Head width, although the sample correlation between the second and Neck is small (-.05). The first two components explain 99.1 % of the total sample varance. These body measurement data can be effectively sumarze in one dienion. b) Using R R 1.0000 0.8752 0.9559 0.9437 0.9025 0.9045 0.8752 1.-0000 0.9013 0.9177 0.9461 0.9503 0.9559 0.9013 1.0000 0.9635 0.9270 0.9200 0.9437 0.9177 0.9635 1.0000 0.9271 0.9439 O. 9025 0.9461 0.9270 0.9271 1.0000 0.9544 0.9045 0.9503 O. 9200 0.9439 0.9544 1.0000 Eigenvalues of R 5.6447 0.1758 0.0565 0.0492 0.0473 0.0266 Eigenvectors of R (in colums) -0.558334 0.532348 -0.409938 -0.389366 -0.411999 -0.222694 -0.4091 £2 0.318718 -0.41'0333 0.319513 -0. 403'672 -'0.4'04313 0.286817 0.261937 -0.598371 0.128024 -0.186741 0.719785 .0.0-04276 0.012490 0.035396 0.073950 -0.561034 -0.599053 -0.581252 -0.228969 0.231095 0.580499 0.695916 -0.291938 0.251473 0.313431 -0.243840 -0.519785 -0.458838 -0.435168 l87 8.23 (Continue) Again, the first principal component is a "size" component. All varables contribute equally to the first component. This component explains 5.6447/6 = .941 or 94.1 % of the total sample variance. The second principal component contrasts Weight, Neck and Girth with Body length, Head lengt and Head width. The first two components explain 97% of the total sample variance. These data can be effectively sumarzed in one dimension. c) The results are similar for both the covarance matrx S and the correlation matrx R. The fist component in each analysis is a "size" component and aInost all of the varation in the data. The analyses differ a bit with respect to the second and remaining components, but these latter components explain very little of the total sample varance. 
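The claim in 8.23(b) that all variables contribute about equally to the first component of R can be checked with a few lines of power iteration on the correlation matrix printed above. This is a minimal sketch, not the method used to produce the manual's output; the function name `dominant_eigenpair` and the iteration count are assumptions made for the illustration.

```python
import math

# Sample correlation matrix R from Exercise 8.23(b).
R = [
    [1.0000, 0.8752, 0.9559, 0.9437, 0.9025, 0.9045],
    [0.8752, 1.0000, 0.9013, 0.9177, 0.9461, 0.9503],
    [0.9559, 0.9013, 1.0000, 0.9635, 0.9270, 0.9200],
    [0.9437, 0.9177, 0.9635, 1.0000, 0.9271, 0.9439],
    [0.9025, 0.9461, 0.9270, 0.9271, 1.0000, 0.9544],
    [0.9045, 0.9503, 0.9200, 0.9439, 0.9544, 1.0000],
]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration for the largest eigenvalue and a unit eigenvector."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v'Mv for the current unit vector.
        eigenvalue = sum(
            v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
        )
    return eigenvalue, v


lam1, e1 = dominant_eigenpair(R)
print(round(lam1, 4), round(lam1 / len(R), 3))
print([round(x, 3) for x in e1])
```

The leading eigenvalue divided by $p = 6$ gives the proportion of total (standardized) variance explained, and the entries of the leading eigenvector all come out close to $1/\sqrt{6} \approx .41$, which is the sense in which the first component is a "size" component with equal weights.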
8.24 An ellipse format chart based on the first two principal components of the Madison, Wisconsin, Police Department data

XBAR
3557.8  1478.4  2676.9  13563.6  800  7141

S
  367884.7    -72093.8     85714.8    222491.4    -44908.3    101312.9
  -72093.8   1399053.1     43399.9    139692.2    110517.1   1161018.3
   85714.8     43399.9   1458543.0  -1113809.8    330923.8   1079573.3
  222491.4    139692.2  -1113809.8   1698324.4   -244785.9   -462615.6
  -44908.3    110517.1    330923.8   -244785.9    224718.0    427767.5
  101312.9   1161018.3   1079573.3   -462615.6    427767.5   2488728.4

Eigenvalues of S
4045921.9  2265078.9  761592.1  288919.3  181437.0  94302.6

Eigenvectors of S
-0.0008  -0.0567  -0.5157   0.6122   0.4311  -0.4126
-0.3092  -0.5541   0.5615   0.4932  -0.1796  -0.0810
-0.4821   0.3862  -0.3270   0.3404  -0.5696   0.2667
 0.3675  -0.6415  -0.4898  -0.0642  -0.4308   0.1543
-0.1544   0.0359  -0.0316  -0.3071  -0.4062  -0.8453
-0.7163  -0.3575  -0.2662  -0.4094   0.3269   0.1173

Principal components
        y1        y2        y3       y4       y5       y6
 1    1745.4   -1479.3     618.7    222.6      7.2    178.1
 2   -1096.6    2011.8     652.5    -69.5    636.9    560.2
 3     210.6     490.6     365.8   -899.8   -293.5    -15.2
 4   -1360.1    1448.1     420.1    523.5   -972.2     88.5
 5   -1255.9     502.1    -422.4   -893.8    359.9   -273.7
 6     971.6     284.7    -316.9   -942.8    -83.5    -70.1
 7    1118.5     123.7     572.9    319.9    -60.8   -598.5
 8   -1151.6    1752.0   -1322.1    700.2   -242.2   -158.8
 9    -497.3    -593.0     209.5   -149.2    101.6   -586.2
10   -2397.1    1819.6      -9.5   -147.6   -109.9    207.8
11   -3931.9   -3715.7     924.1     35.1   -274.2    152.9
12   -1392.4   -1688.0   -2285.1    372.1    444.0     85.2
13     326.8     650.8    1251.6    728.8    809.5   -140.0
14    3371.4    -379.1    -499.9   -114.6   -324.3    286.9
15    3076.5    -199.1    -105.7    419.8   -122.3      3.4
16    2261.9   -1029.3     -53.7   -104.5    123.8    279.6

The 95% control ellipse based on the first two principal components of overtime hours is

$2.5 \times 10^{-7}\, y_1^2 + 4.4 \times 10^{-7}\, y_2^2 = 5.99$

[Figure: the 95% control ellipse, y2 plotted against y1.]

8.25 A control chart based on the sum of squares $d_{Uj}^2$. Period 12 looks unusual.

[Figure: sum of squares of unexplained component of jth deviation, plotted against period.]
8.26 (a)-(c) Principal component analysis of the correlation matrix R.

Correlations: Indep, Supp, Benev, Conform, Leader

           Indep    Supp   Benev  Conform
Supp      -0.173
Benev     -0.561   0.018
Conform   -0.471  -0.327   0.298
Leader     0.187  -0.401  -0.492   -0.333

Cell Contents: Pearson correlation

Principal Component Analysis: Indep, Supp, Benev, Conform, Leader

Eigenanalysis of the Correlation Matrix

Eigenvalue   2.1966  1.3682  0.7559  0.5888  0.0905
Proportion    0.439   0.274   0.151   0.118   0.018
Cumulative    0.439   0.713   0.864   0.982   1.000

Variable     PC1     PC2     PC3     PC4     PC5
Indep     -0.521   0.087  -0.667  -0.253  -0.460
Supp       0.121   0.788   0.187   0.351  -0.454
Benev      0.548  -0.008   0.115  -0.733  -0.386
Conform    0.439  -0.491  -0.295   0.525  -0.451
Leader    -0.469  -0.361   0.648   0.007  -0.480

Using the scree plot and the proportion of variance explained, it appears as if 4 components should be retained. These components explain almost all (98%) of the variability. It is difficult to provide an interpretation of the components without knowing more about the subject matter. All four of the components represent contrasts of some form. The first component contrasts independence and leadership with benevolence and conformity. The second component contrasts support with conformity and leadership and so on.

[Scree plot of Indep, ..., Leader: eigenvalue vs. component number.]

[Scatterplots of y2hat vs y1hat.]
The two dimensional plot of the scores on the first two components suggests that the two socioeconomic levels cannot be distinguished from one another nor can the two genders be distinguished. Observation #111 is a bit removed from the rest and might be called an outlier.

(a)-(d) Principal component analysis of the covariance matrix S.

Covariances: Indep, Supp, Benev, Conform, Leader

            Indep      Supp     Benev   Conform    Leader
Indep     34.7502
Supp      -4.2767   17.5134
Benev    -18.0718    0.4198   29.8447
Conform  -15.9729   -7.8682    9.3488   33.0426
Leader     5.7165   -8.7233  -13.9422   -9.9419   26.9580

Principal Component Analysis: Indep, Supp, Benev, Conform, Leader

Eigenanalysis of the Covariance Matrix

Eigenvalue   68.752  31.509  23.101  16.354   2.392
Proportion    0.484   0.222   0.163   0.115   0.017
Cumulative    0.484   0.706   0.868   0.983   1.000

Variable     PC1     PC2     PC3     PC4     PC5
Indep     -0.579   0.079  -0.643   0.309   0.386
Supp       0.042   0.612   0.140  -0.515   0.583
Benev      0.524   0.219   0.119   0.734   0.352
Conform    0.493  -0.572  -0.422  -0.304   0.398
Leader    -0.380  -0.494   0.612   0.090   0.478

Using the scree plot and the proportion of variance explained, it appears as if 4 components should be retained. These components explain almost all (98%) of the variability. The components are very similar to those obtained from the correlation matrix R. All four of the components represent contrasts of some form. The first component contrasts independence and leadership with benevolence and conformity. The second component contrasts support with conformity and leadership and so on. In this case, it makes little difference whether the components are obtained from the sample correlation matrix or the sample covariance matrix.

[Scree plot of Indep, ..., Leader (covariance matrix): eigenvalue vs. component number.]

[Scatterplots of y2hatcov vs y1hatcov; observations #104 and #111 are labeled.]
The two dimensional plot of the scores on the first two components suggests that the two socioeconomic levels cannot be distinguished from one another nor can the two genders be distinguished. Observations #111 and #104 are a bit removed from the rest and might be labeled outliers.

Large sample 95% confidence interval for $\lambda_1$:

$\left( \dfrac{68.752}{1 + 1.96\sqrt{2/130}},\ \dfrac{68.752}{1 - 1.96\sqrt{2/130}} \right) = (55.31,\ 90.83)$

8.27 (a)-(d) Principal component analysis of the correlation matrix R.

Correlations: BL, EM, SF, BS

        BL     EM     SF
EM   0.914
SF   0.984  0.942
BS   0.988  0.875  0.975

Cell Contents: Pearson correlation

Principal Component Analysis: BL, EM, SF, BS

Eigenanalysis of the Correlation Matrix

Eigenvalue   3.8395  0.1403  0.0126  0.0076
Proportion    0.960   0.035   0.003   0.002
Cumulative    0.960   0.995   0.998   1.000

Variable    PC1     PC2     PC3     PC4
BL        0.506  -0.261  -0.565   0.597
EM        0.485   0.819  -0.194  -0.237
SF        0.508  -0.020   0.800   0.318
BS        0.500  -0.510  -0.053  -0.698

The proportion of variance explained and the scree plot below suggest that one principal component effectively summarizes the paper properties data. All the variables load about equally on this component so it might be labeled an index of paper strength.

[Scree plot: eigenvalue vs. component number.]

The plot below of the scores on the first two sample principal components does not indicate any obvious outliers.

[Scatterplot of y1hat vs y2hat.]

(a)-(d) Principal component analysis of the covariance matrix S.

Covariances: BL, EM, SF, BS

          BL        EM        SF        BS
BL  8.302871
EM  1.886636  0.513359
SF  4.147318  0.987585  2.140046
BS  1.972056  0.434307  0.987966  0.480272

Principal Component Analysis: BL, EM, SF, BS

Eigenanalysis of the Covariance Matrix

Eigenvalue   11.295   0.104   0.032   0.006
Proportion    0.988   0.009   0.003   0.001
Cumulative    0.988   0.997   0.999   1.000

Variable    PC1     PC2     PC3     PC4
BL        0.856  -0.364  -0.332   0.155
EM        0.198   0.786  -0.497  -0.311
SF        0.431   0.458   0.733   0.259
BS        0.204  -0.201   0.325  -0.901

The proportion of variance explained and the scree plot that follows suggest that one principal component effectively summarizes the paper properties data. The loadings of the variables on the first component are all positive, but there are some differences in magnitudes. However, the correlations of the variables with the first component are .998, .928, .990 and .989 for BL, EM, SF and BS respectively. Again, this component might be labeled an index of paper strength.

[Scree plot: eigenvalue vs. component number.]

The plot below of the scores on the first two sample principal components does not indicate any obvious outliers.

[Scatterplot of y1hatcov vs y2hatcov.]

8.28 (a) See scatter plots below. Observations 25, 34, 69 and 72 are outliers.

[Scatterplot of Family vs DistRd.]

[Scatterplot of DistRd vs Cattle.]

(b) Principal component analysis of R follows. Removing the outliers has some but relatively little effect on the analysis. Five components explain about 90% of the total variability in the data set and seem a reasonable number given the scree plot.
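The "five components, about 90%" statement in 8.28 (b) can be checked against the eigenvalues reported in the two eigenanalyses that follow. The sketch below is illustrative only: the function name `components_needed` and the 0.90 cutoff are our assumptions, not part of the original solution.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose share reaches threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for count, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= threshold:
            return count
    return len(eigenvalues)


# Eigenvalues of R with the outliers removed, and with all observations
eig_trimmed = [4.1851, 1.4381, 1.0845, 0.7918, 0.6043,
               0.3661, 0.2400, 0.1718, 0.1182]
eig_full = [4.1443, 1.2364, 1.0581, 0.9205, 0.6058,
            0.5044, 0.2720, 0.1470, 0.1114]

k_trimmed = components_needed(eig_trimmed)
share5_full = sum(eig_full[:5]) / sum(eig_full)

print("components needed (outliers removed):", k_trimmed)
print("share of first five (all observations): %.3f" % share5_full)
```

With the outliers removed, five components just reach 90%; with all observations the first five explain slightly less, consistent with the remark that removing the outliers changes the analysis only a little.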
[Scree plot: eigenvalue vs. component number.]

Principal Component Analysis: AdjFam, AdjDistRd, AdjCotton, AdjMaize, AdjSorg, ... (Outliers 25, 34, 68, 72 removed)

Eigenanalysis of the Correlation Matrix

Eigenvalue   4.1851  1.4381  1.0845  0.7918  0.6043  0.3661  0.2400  0.1718  0.1182
Proportion    0.465   0.160   0.121   0.088   0.067   0.041   0.027   0.019   0.013
Cumulative    0.465   0.625   0.745   0.833   0.900   0.941   0.968   0.987   1.000

Loadings (variables AdjFam, AdjDistRd, AdjCotton, AdjMaize, AdjSorg, AdjMillet, AdjBull, AdjCattle, AdjGoats; values grouped under the column headers as they appear in the printout)
PC1: 0.434 0.008 0.446 0.352 0.204 0.240 0.445 0.355 0.255
PC2, PC3, PC4: 0.098 -0.569 0.132 0.388 -0.111 0.171 0.496 -0.027 0.240 -0.059 0.616 0.065 -0.497 -0.009 -0.353 0.604 0.415 -0.116 -0.068 -0.030 -0.146 -0.284 0.014 -0.373 0.049 -0.687 -0.351
PC5: 0.011 -0.378 -0.219 -0.079 -0.645 0.527 -0.028 0.218 0.249
PC6: -0.040 0.187 -0.200 -0.273 0.246 0.181 -0.134 0.759 -0.402
PC7, PC8, PC9: -0.797 -0.263 -0.249 0.021 -0.048 -0.065 0.361 0.329 -0.675 -0.024 0.363 0.574 -0.021 0.126 0.293 0.241 0.077 0.048 0.396 -0.751 0.190 -0.011 0.169 0.038 0.274 0.149 -0.131

Principal Component Analysis: Family, DistRd, Cotton, Maize, Sorg, Millet, Bull, ...

Eigenanalysis of the Correlation Matrix

Eigenvalue   4.1443  1.2364  1.0581  0.9205  0.6058  0.5044  0.2720  0.1470  0.1114
Proportion    0.460   0.137   0.118   0.102   0.067   0.056   0.030   0.016   0.012
Cumulative    0.460   0.598   0.715   0.818   0.885   0.941   0.971   0.988   1.000

Loadings (variables Family, DistRd, Cotton, Maize, Sorg, Millet, Bull, Cattle, Goats; values grouped under the column headers as they appear in the printout)
PC2, PC3, PC4, PC5: -0.002 -0.123 -0.089 -0.127 0.100 -0.216 0.129 0.110 0.770 0.043
PC1: 0.444 -0.100 -0.033 -0.072 -0.831 0.502 0.411 -0.342 -0.068 0.030 0.337 -0.554 0.170 0.164 0.311 0.452 -0.069 -0.229 0.043 -0.385 -0.606 0.269 0.440 -0.029 0.122 0.197 0.247 0.458 0.278 0.486 0.309 0.379 -0.173 0.100
PC6: -0.194 -0.051 -0.134 0.053 -0.361 -0.632 -0.182 0.594 -0.392 0.407
PC7: -0.579 -0.045 0.509 -0.352 0.055 0.089 0.458 -0.012 -0.242
PC8, PC9: 0.454 -0.461 0.041 0.082 -0.372 -0.504 0.499 -0.360 0.300 -0.139 0.077 -0.097 0.357 0.621 -0.215 -0.225 -0.242 0.095

(c) All the variables (all crops, all livestock, family) except for distance to road (DistRd) load about equally on the first component. This component might be called a farm size component. Millet and sorghum load positively and distance to road and maize load negatively on the second component. Without additional subject matter knowledge, this component is difficult to interpret. The third component is essentially a distance to the road and goats component. This component might represent subsistence farms. The fourth component appears to be a contrast between distance to road and millet versus cattle and goats. Again, this component is difficult to interpret. The fifth component appears to contrast sorghum with millet.

8.29 (a) The 95% ellipse format chart using the first two principal components from the covariance matrix S (for the first 30 cases of the car body assembly data) is shown below. The ellipse consists of all $\hat{y}_1, \hat{y}_2$ such that

$\dfrac{\hat{y}_1^2}{\hat{\lambda}_1} + \dfrac{\hat{y}_2^2}{\hat{\lambda}_2} \le \chi_2^2(.05) = 5.99$

where $\hat{\lambda}_1 = .354$, $\hat{\lambda}_2 = .186$. Observations 3 and 11 lie outside the ellipse.

[Scatterplot of y2hat-y2bar vs y1hat-y1bar with the 95% ellipse.]

(b) To construct the alternative control chart based upon unexplained components of the observations we note that $\bar{d}_U^2 = .4137$, $s_{d^2}^2 = .0782$ so

$c = \dfrac{s_{d^2}^2}{2\bar{d}_U^2} = \dfrac{.0782}{2(.4137)} = .0946, \qquad \nu = \dfrac{2(\bar{d}_U^2)^2}{s_{d^2}^2} = \dfrac{2(.4137)^2}{.0782} = 4.4.$

Conservatively, we set the chi-squared degrees of freedom to $\nu = 5$ and the UCL becomes $c\,\chi_5^2(.05) = .0946(11.07) = 1.05$ or approximately 1.0. The alternative control chart is plotted on the next page and it appears as if multivariate observation 18 is out of control. For observation 18, $\hat{y}_4^2$ makes the largest contribution to $d_{U18}^2$ and the variables getting the most weight in $\hat{y}_4$ are the thickness measurements $X_1$ and $X_2$. Car body #18 could be examined at locations 1 and 2 to determine the cause of the unusual deviations in thickness from the nominal levels.

[Control chart of $d_{Uj}^2$ with UCL $= c\,\chi_5^2(.05) = 1.0$.]
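The control limit in 8.29 (b) comes from matching the mean and variance of the unexplained sums of squares to a scaled chi-square distribution. A minimal sketch of that arithmetic follows; the function name `chi_square_control_limit` is hypothetical, and the chi-square percentage point 11.07 is entered as a table value rather than computed.

```python
def chi_square_control_limit(mean_d2, var_d2, chi2_quantile):
    """Moment-matching constants for d^2 ~ c * chi-square(nu), and the UCL.

    mean_d2       : sample mean of the unexplained sums of squares
    var_d2        : sample variance of the unexplained sums of squares
    chi2_quantile : upper percentage point of the chi-square distribution
                    for the (rounded-up) degrees of freedom, from a table
    """
    c = var_d2 / (2.0 * mean_d2)
    nu = 2.0 * mean_d2 ** 2 / var_d2
    ucl = c * chi2_quantile
    return c, nu, ucl


# Values quoted in the solution; 11.07 is the upper 5% point for 5 d.f.
c, nu, ucl = chi_square_control_limit(0.4137, 0.0782, 11.07)
print("c = %.4f, nu = %.1f, UCL = %.2f" % (c, nu, ucl))
```

Rounding the estimated degrees of freedom up to 5 makes the limit slightly larger, which is the conservative choice made in the solution.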
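Chapter 9

9.1 $L' = (.9 \;\; .7 \;\; .5)$;

$LL' = \begin{bmatrix} .81 & .63 & .45 \\ .63 & .49 & .35 \\ .45 & .35 & .25 \end{bmatrix}$

so $\rho = LL' + \Psi$.

9.2 a) For m = 1: $h_1^2 = \ell_{11}^2 = .81$, $h_2^2 = \ell_{21}^2 = .49$, $h_3^2 = \ell_{31}^2 = .25$. The communalities are those parts of the variances of the variables explained by the single factor.

b) $\mathrm{Corr}(Z_i, F_1) = \mathrm{Cov}(Z_i, F_1)$, $i = 1, 2, 3$. By (9-5), $\mathrm{Cov}(Z_i, F_1) = \ell_{i1}$. Thus $\mathrm{Corr}(Z_1, F_1) = \ell_{11} = .9$; $\mathrm{Corr}(Z_2, F_1) = \ell_{21} = .7$; $\mathrm{Corr}(Z_3, F_1) = \ell_{31} = .5$. The first variable, $Z_1$, has the largest correlation with the factor and therefore will probably carry the most weight in naming the factor.

9.3 a) $L = \sqrt{\lambda_1}\, e_1 = \sqrt{\lambda_1} \begin{bmatrix} .625 \\ .593 \\ .507 \end{bmatrix} = \begin{bmatrix} .876 \\ .831 \\ .711 \end{bmatrix}$. Slightly different from result in Exercise 9.1.

b) Proportion of total variance explained $= \lambda_1 / p = 1.96/3 = .65$

9.4 $\tilde{\rho} = \rho - \Psi = LL' = \begin{bmatrix} .81 & .63 & .45 \\ .63 & .49 & .35 \\ .45 & .35 & .25 \end{bmatrix}$

$L = \sqrt{\lambda_1}\, e_1 = \sqrt{1.55} \begin{bmatrix} .7229 \\ .5623 \\ .4016 \end{bmatrix} = \begin{bmatrix} .9 \\ .7 \\ .5 \end{bmatrix}$

Result is consistent with results in Exercise 9.1. It should be since m = 1 common factor completely determines $\tilde{\rho} = \rho - \Psi$.

9.5 Since $\tilde{\Psi}$ is diagonal and $S - \tilde{L}\tilde{L}' - \tilde{\Psi}$ has zeros on the diagonal,

(sum of squared entries of $S - \tilde{L}\tilde{L}' - \tilde{\Psi}$) $\le$ (sum of squared entries of $S - \tilde{L}\tilde{L}'$).

By the hint, $S - \tilde{L}\tilde{L}' = \hat{P}_{(2)} \hat{\Lambda}_{(2)} \hat{P}_{(2)}'$ which has sum of squared entries

$$\begin{aligned}
\mathrm{tr}\big[\hat{P}_{(2)} \hat{\Lambda}_{(2)} \hat{P}_{(2)}' (\hat{P}_{(2)} \hat{\Lambda}_{(2)} \hat{P}_{(2)}')'\big]
&= \mathrm{tr}\big[\hat{P}_{(2)} \hat{\Lambda}_{(2)} \hat{\Lambda}_{(2)}' \hat{P}_{(2)}'\big] \\
&= \mathrm{tr}\big[\hat{\Lambda}_{(2)} \hat{\Lambda}_{(2)}' \hat{P}_{(2)}' \hat{P}_{(2)}\big]
= \mathrm{tr}\big[\hat{\Lambda}_{(2)} \hat{\Lambda}_{(2)}'\big] \\
&= \hat{\lambda}_{m+1}^2 + \hat{\lambda}_{m+2}^2 + \cdots + \hat{\lambda}_p^2 .
\end{aligned}$$

Therefore,

(sum of squared entries of $S - \tilde{L}\tilde{L}' - \tilde{\Psi}$) $\le \hat{\lambda}_{m+1}^2 + \hat{\lambda}_{m+2}^2 + \cdots + \hat{\lambda}_p^2$.

9.6 a) Follows directly from hint.

b) Using the hint, we post multiply by $(LL' + \Psi)$ to get

$$\begin{aligned}
I &= \big(\Psi^{-1} - \Psi^{-1} L (I + L'\Psi^{-1}L)^{-1} L'\Psi^{-1}\big)(LL' + \Psi) \\
&= \Psi^{-1}(LL' + \Psi) - \Psi^{-1} L (I + L'\Psi^{-1}L)^{-1} L'\Psi^{-1}(LL' + \Psi) \qquad \text{(use part (a))} \\
&= \Psi^{-1}(LL' + \Psi) - \Psi^{-1} L \big(I - (I + L'\Psi^{-1}L)^{-1}\big) L' - \Psi^{-1} L (I + L'\Psi^{-1}L)^{-1} L' \\
&= \Psi^{-1}LL' + I - \Psi^{-1}LL' + \Psi^{-1} L (I + L'\Psi^{-1}L)^{-1} L' - \Psi^{-1} L (I + L'\Psi^{-1}L)^{-1} L' = I .
\end{aligned}$$

Note all these multiplication steps are reversible.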
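The identity verified in 9.6 (b) can also be checked numerically. The sketch below does so for the one-factor loadings of Exercise 9.1; the specific variances $\psi_i = 1 - \ell_{i1}^2$ are an assumption that follows from the unit variances of the standardized variables, and the function name is ours.

```python
def max_identity_error(loadings, specific_variances):
    """Check (LL' + Psi)^{-1} = Psi^{-1} - Psi^{-1} L (I + L'Psi^{-1}L)^{-1} L'Psi^{-1}
    for a single factor (m = 1), where I + L'Psi^{-1}L is a scalar.
    Returns the largest absolute deviation of candidate * (LL' + Psi) from I."""
    L, psi = loadings, specific_variances
    p = len(L)
    # Sigma = LL' + Psi
    sigma = [[L[i] * L[j] + (psi[i] if i == j else 0.0) for j in range(p)]
             for i in range(p)]
    # Scalar 1 + L' Psi^{-1} L
    k = 1.0 + sum(L[i] ** 2 / psi[i] for i in range(p))
    # Right-hand side of the identity
    candidate = [[(1.0 / psi[i] if i == j else 0.0)
                  - (L[i] / psi[i]) * (L[j] / psi[j]) / k
                  for j in range(p)] for i in range(p)]
    product = [[sum(candidate[i][r] * sigma[r][j] for r in range(p))
                for j in range(p)] for i in range(p)]
    return max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(p) for j in range(p))


L = [0.9, 0.7, 0.5]
psi = [1.0 - value * value for value in L]   # assumed: unit variances
max_error = max_identity_error(L, psi)
print("largest deviation from the identity matrix:", max_error)
```

The deviation is at the level of floating-point round-off, as the algebra requires.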
c) Multiplying the result in (b) by L we get

$$\begin{aligned}
(LL' + \Psi)^{-1} L &= \Psi^{-1}L - \Psi^{-1} L (I + L'\Psi^{-1}L)^{-1} L'\Psi^{-1}L \qquad \text{(use part (a))} \\
&= \Psi^{-1}L - \Psi^{-1} L \big(I - (I + L'\Psi^{-1}L)^{-1}\big) = \Psi^{-1} L (I + L'\Psi^{-1}L)^{-1} .
\end{aligned}$$

Result follows by taking the transpose of both sides of the final equality.

9.7 From the equation $\Sigma = LL' + \Psi$, m = 1, we have

$$\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} \ell_{11}^2 + \psi_1 & \ell_{11}\ell_{21} \\ \ell_{11}\ell_{21} & \ell_{21}^2 + \psi_2 \end{bmatrix}$$

so $\sigma_{11} = \ell_{11}^2 + \psi_1$, $\sigma_{22} = \ell_{21}^2 + \psi_2$ and $\sigma_{12} = \ell_{11}\ell_{21}$. Let $\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$. Then, for any choice $|\rho|\sqrt{\sigma_{22}} \le \ell_{21} \le \sqrt{\sigma_{22}}$, set $\ell_{11} = \sigma_{12}/\ell_{21}$ and check $\sigma_{12} = \ell_{11}\ell_{21}$. We obtain

$$\psi_1 = \sigma_{11} - \ell_{11}^2 = \sigma_{11} - \frac{\sigma_{12}^2}{\ell_{21}^2} \ge \sigma_{11} - \frac{\sigma_{12}^2}{\rho^2 \sigma_{22}} = \sigma_{11} - \sigma_{11} = 0$$

and $\psi_2 = \sigma_{22} - \ell_{21}^2 \ge \sigma_{22} - \sigma_{22} = 0$. Since $\ell_{21}$ was arbitrary within a suitable interval, there are an infinite number of solutions to the factorization.

9.8 $\Sigma = LL' + \Psi$ for m = 1 implies

$1 = \ell_{11}^2 + \psi_1, \quad .4 = \ell_{11}\ell_{21}, \quad .9 = \ell_{11}\ell_{31}$
$1 = \ell_{21}^2 + \psi_2, \quad .7 = \ell_{21}\ell_{31}$
$1 = \ell_{31}^2 + \psi_3$

Now $\ell_{11}/\ell_{21} = .9/.7$ and $\ell_{11}\ell_{21} = .4$, so $\ell_{11}^2 = (.9/.7)(.4)$ and $\ell_{11} = \pm .717$. Thus $\ell_{21} = \pm .558$. Finally, from $.9 = \ell_{11}\ell_{31}$, we have $\ell_{31} = \pm .9/.717 = \pm 1.255$.

Note all the loadings must be of the same sign because all the covariances are positive. We have

$$LL' = \begin{bmatrix} .717 \\ .558 \\ 1.255 \end{bmatrix} \begin{bmatrix} .717 & .558 & 1.255 \end{bmatrix} = \begin{bmatrix} .514 & .4 & .9 \\ .4 & .311 & .7 \\ .9 & .7 & 1.575 \end{bmatrix}$$

so $\psi_3 = 1 - 1.575 = -.575$, which is inadmissible as a variance.

9.9 (a) Stoetzel's interpretation seems reasonable. The first factor seems to contrast sweet with strong liquors.

(b)
[Plot of the factor loadings, Factor 2 vs. Factor 1, with points labeled by liquor (Rum, Marc, Calvados, Liquors, Kirsch, Mirabelle, Whiskey and others).]

It doesn't appear as if rotation of the factor axes is necessary.
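Returning to the one-factor calculation of Exercise 9.8 above, the loadings are determined by the three off-diagonal covariances alone, so the inadmissible specific variance can be reproduced in a few lines. This is a sketch under the assumption of unit variances and positive loadings; the function name `one_factor_loadings` is hypothetical.

```python
import math


def one_factor_loadings(s12, s13, s23):
    """Solve s12 = l1*l2, s13 = l1*l3, s23 = l2*l3 for positive loadings."""
    l1 = math.sqrt(s12 * s13 / s23)
    l2 = s12 / l1
    l3 = s13 / l1
    return l1, l2, l3


# Off-diagonal covariances from Exercise 9.8; the variances are all 1
l1, l2, l3 = one_factor_loadings(0.4, 0.9, 0.7)
psi = [1.0 - value ** 2 for value in (l1, l2, l3)]

print("loadings: %.3f %.3f %.3f" % (l1, l2, l3))
print("specific variances: %.3f %.3f %.3f" % tuple(psi))
```

The third specific variance is negative, which is the improper solution noted in the exercise.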
9.10 (a) & (b) The specific variances and communalities based on the unrotated factors are given in the following table:

Variable          Specific Variance   Communality
Skull length           .5976             .4024
Skull breadth          .7582             .2418
Femur length           .1221             .8779
Tibia length           .0000            1.0000
Humerus length         .0095             .9905
Ulna length            .0938             .9062

(c) The proportion of variance explained by each factor is:

Factor 1: $\frac{1}{6}\sum_{i=1}^{6} \hat{\ell}_{i1}^2 = \frac{4.0001}{6}$ or 66.7%

Factor 2: $\frac{1}{6}\sum_{i=1}^{6} \hat{\ell}_{i2}^2 = \frac{.4177}{6}$ or 7.0%

(d) $R - \hat{L}_z\hat{L}_z' - \hat{\Psi}_z =$

 0
 .193    0
-.017  -.032    0
 .000   .000   .000    0
-.000   .001   .000   .000    0
-.001  -.018   .003   .000   .000    0

9.11 Substituting the factor loadings given in the table (Exercise 9.10) into equation (9-45) gives

V (unrotated) = .01087
V (rotated) = .04692

Although the rotated loadings are to be preferred by the varimax ("simple structure") criterion, interpretation of the factors seems clearer with the unrotated loadings.

9.12 The covariance matrix for the logarithms of turtle measurements is:

$$S = 10^{-3} \times \begin{bmatrix} 11.0720040 & 8.0191419 & 8.1596480 \\ 8.0191419 & 6.4167255 & 6.0052707 \\ 8.1596480 & 6.0052707 & 6.7727585 \end{bmatrix}$$

The maximum likelihood estimates of the factor loadings for an m = 1 model are

Estimated factor loadings
Variable          F1
1. ln(length)     0.1021632
2. ln(width)      0.0752017
3. ln(height)     0.0765267

Therefore,

$$\hat{L} = \begin{bmatrix} 0.1021632 \\ 0.0752017 \\ 0.0765267 \end{bmatrix}, \qquad \hat{L}\hat{L}' = 10^{-3} \times \begin{bmatrix} 10.4373 & 7.6828 & 7.8182 \\ 7.6828 & 5.6553 & 5.7549 \\ 7.8182 & 5.7549 & 5.8563 \end{bmatrix}$$

(b) Since $\hat{h}_i^2 = \hat{\ell}_{i1}^2$ for an m = 1 model, the communalities are

$\hat{h}_1^2 = 0.0104373, \quad \hat{h}_2^2 = 0.0056553, \quad \hat{h}_3^2 = 0.0058563$

(a) To find the specific variances $\psi_i$, we use the equation $\hat{\psi}_i = s_{ii} - \hat{h}_i^2$. Note that in this case, we should use $S_n$ to get $s_{ii}$, not $S$, because the maximum likelihood estimation method is used.

$$S_n = \frac{n-1}{n} S = \frac{23}{24} S = 10^{-3} \times \begin{bmatrix} 10.6107 & 7.685 & 7.8197 \\ 7.685 & 6.1494 & 5.7551 \\ 7.8197 & 5.7551 & 6.4906 \end{bmatrix}$$

Thus we get

$\hat{\psi}_1 = 0.0001734, \quad \hat{\psi}_2 = 0.0004941, \quad \hat{\psi}_3 = 0.0006342$

(c) The proportion explained by the factor is

$$\frac{\hat{h}_1^2 + \hat{h}_2^2 + \hat{h}_3^2}{s_{11} + s_{22} + s_{33}} = \frac{0.0219489}{0.0232507} = .9440$$

(d) From (a)-(c), the residual matrix is:

$$S_n - \hat{L}\hat{L}' - \hat{\Psi} = 10^{-6} \times \begin{bmatrix} 0 & 2.1673 & 1.4474 \\ 2.1673 & 0 & 0.1124971 \\ 1.4474 & 0.1124971 & 0 \end{bmatrix}$$

9.13 Equation (9-40) requires $m < \frac{1}{2}\left(2p + 1 - \sqrt{8p + 1}\right)$. Here we have m = 1, p = 3 and the strict inequality does not hold.

9.14 Since $\hat{\Psi}^{1/2}\hat{\Psi}^{-1}\hat{\Psi}^{1/2} = I$, $\hat{\Delta}^{1/2}\hat{\Delta}^{1/2} = \hat{\Delta}$ and $\hat{E}'\hat{E} = I$,

$$\hat{L}'\hat{\Psi}^{-1}\hat{L} = \hat{\Delta}^{1/2}\hat{E}'\hat{\Psi}^{1/2}\hat{\Psi}^{-1}\hat{\Psi}^{1/2}\hat{E}\hat{\Delta}^{1/2} = \hat{\Delta}^{1/2}\hat{E}'\hat{E}\hat{\Delta}^{1/2} = \hat{\Delta}^{1/2}\hat{\Delta}^{1/2} = \hat{\Delta}.$$

9.15 (a)

Variable   Specific variance   Communality
HRA           0.188966          0.811034
HRE           0.133955          0.866045
HRS           0.068971          0.931029
RRA           0.100611          0.899389
RRE           0.079682          0.920318
RRS           0.096522          0.903478
Q             0.02678           0.97322
REV           0.039634          0.960366

(b) Residual Matrix

 0          0.021205   0.014563  -0.022111  -0.093691  -0.078402  -0.02145   -0.015523
 0.021205   0          0.063146  -0.107308  -0.068312  -0.052289  -0.005616   0.036712
 0.014563   0.063146   0         -0.065101  -0.009639  -0.070351   0.006454   0.013953
-0.022111  -0.107308  -0.065101   0          0.036263   0.058416   0.00696   -0.033857
-0.093691  -0.058312  -0.009639   0.036263   0          0.032646   0.008864   0.00066
-0.078402  -0.062289  -0.010351   0.068416   0.032645   0          0.002626  -0.004011
-0.02145   -0.005516   0.005464   0.00696    0.006854   0.002626   0         -0.02449
-0.015523   0.035712   0.013953  -0.033867   0.00066   -0.004011  -0.02449    0

The m = 3 factor model appears appropriate.

(c) The first factor is related to market-value measures (Q, REV). The second factor is related to accounting historical measures on equity (HRE, RRE). The third factor is related to accounting historical measures on sales (HRS, RRS). Accounting measures on assets (HRA, RRA) are weakly related to all factors. Therefore, market-value measures provide evidence of profitability distinct from that provided by the accounting measures. However, we cannot separate accounting historical measures of profitability from accounting replacement measures.

[PROBLEM 9.15: three plots of the rotated factor pattern (Factor 2 vs. Factor 1, Factor 3 vs. Factor 1, Factor 3 vs. Factor 2), points labeled HRA, HRE, HRS, RRA, RRE, RRS, Q, REV.]
9.16 From (9-50),

$$\hat{f}_j = \hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}(x_j - \bar{x}) \quad \text{and} \quad \sum_{j=1}^{n} \hat{f}_j = \hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1} \sum_{j=1}^{n} (x_j - \bar{x}) = 0 .$$

Since

$$\hat{f}_j\hat{f}_j' = \hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}(x_j - \bar{x})(x_j - \bar{x})'\hat{\Psi}^{-1}\hat{L}\hat{\Delta}^{-1},$$

$$\sum_{j=1}^{n} \hat{f}_j\hat{f}_j' = \hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}\Big[\sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})'\Big]\hat{\Psi}^{-1}\hat{L}\hat{\Delta}^{-1} = n\,\hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1} S_n \hat{\Psi}^{-1}\hat{L}\hat{\Delta}^{-1} .$$

Using (9A-1),

$$\sum_{j=1}^{n} \hat{f}_j\hat{f}_j' = n\,\hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}\hat{L}(I + \hat{\Delta})\hat{\Delta}^{-1} = n\,\hat{\Delta}^{-1}\hat{\Delta}(I + \hat{\Delta})\hat{\Delta}^{-1} = n(I + \hat{\Delta}^{-1}),$$

a diagonal matrix. Consequently, the factor scores have sample mean vector 0 and zero sample covariances.

9.17 Using the information in Example 9.12, we have

$$(\hat{L}_z'\hat{\Psi}_z^{-1}\hat{L}_z)^{-1} = \begin{bmatrix} .2220 & -.0283 \\ -.0283 & .0137 \end{bmatrix}$$

which, apart from rounding error, is a diagonal matrix. Since the number in the (1,1) position, .2220, is appreciably different from 0, and the observations have been standardized, equation (9-57) suggests the regression and generalized least squares methods for computing factor scores could give somewhat different results.

9.18 Factor analysis of Wisconsin fish data

(a) Principal component solution using X1 - X4

Initial Factor Method: Principal Components

                  1        2        3        4
Eigenvalue     2.1539   0.7876   0.6157   0.4429
Difference     1.3663   0.1719   0.1728
Proportion     0.5385   0.1969   0.1539   0.1107
Cumulative     0.5385   0.7354   0.8893   1.0000

Factor Pattern (m = 1)
             FACTOR1
BLUEGILL     0.77273
BCRAPPIE     0.73867
SBASS        0.64983
LBASS        0.76738

Factor Pattern (m = 2)
             FACTOR1    FACTOR2
BLUEGILL     0.77273   -0.40581
BCRAPPIE     0.73867   -0.36549
SBASS        0.64983    0.67309
LBASS        0.76738    0.19047

(b) Maximum likelihood solution using X1 - X4

Initial Factor Method: Maximum Likelihood

Factor Pattern (m = 1)
             FACTOR1
BLUEGILL     0.70812
BCRAPPIE     0.63002
SBASS        0.48544
LBASS        0.65312

Factor Pattern (m = 2)
             FACTOR1    FACTOR2
BLUEGILL     0.98748   -0.02251
BCRAPPIE     0.50404    0.25907
SBASS        0.28186    0.65863
LBASS        0.48073    0.41799

(c) Varimax rotation. Note that rotation is not possible with 1 factor.

Principal Components
Varimax Rotated Factor Pattern
             FACTOR1    FACTOR2
BLUEGILL     0.85703    0.16518
BCRAPPIE     0.80526    0.17543
SBASS        0.08767    0.93147
LBASS        0.48072    0.62774

Maximum Likelihood
Varimax Rotated Factor Pattern
             FACTOR1    FACTOR2
BLUEGILL     0.96841    0.19445
BCRAPPIE     0.43501    0.36324
SBASS        0.13066    0.70439
LBASS        0.37743    0.51319

For both solutions, Bluegill and Crappie load heavily on the first factor, while largemouth and smallmouth bass load heavily on the second factor.

(d) Factor analysis using X1 - X6

Initial Factor Method: Principal Components

                  1        2        3        4        5        6
Eigenvalue     2.3549   1.0719   0.9843   0.6644   0.5004   0.4242
Difference     1.2830   0.0876   0.3199   0.1640   0.0762
Proportion     0.3925   0.1786   0.1640   0.1107   0.0834   0.0707
Cumulative     0.3925   0.5711   0.7352   0.8459   0.9293   1.0000

Factor Pattern (m = 3)
             FACTOR1    FACTOR2    FACTOR3
BLUEGILL     0.72944   -0.02285   -0.47611
BCRAPPIE     0.72422    0.01989   -0.20739
SBASS        0.60333    0.58051    0.26232
LBASS        0.76170    0.07998   -0.03199
WALLEYE     -0.39334    0.83342   -0.01286
NPIKE        0.44657   -0.18156    0.80285

Varimax Rotated Factor Pattern
             FACTOR1    FACTOR2    FACTOR3
BLUEGILL     0.85090   -0.12720   -0.13806
BCRAPPIE     0.74189    0.11256   -0.06957
SBASS        0.51192    0.46222    0.54231
LBASS        0.71176    0.28458    0.00311
WALLEYE     -0.24459   -0.21480    0.86227
NPIKE        0.05282    0.92348   -0.14613

Initial Factor Method: Maximum Likelihood

Factor Pattern
             FACTOR1    FACTOR2    FACTOR3
BLUEGILL     1.00000    0.00000    0.00000
BCRAPPIE     0.49190    0.18979    0.23481
SBASS        0.26350    0.96466    0.00000
LBASS        0.46530    0.29875    0.29435
WALLEYE     -0.22770    0.12927   -0.49746
NPIKE        0.06520    0.24062    0.46665

Varimax Rotated Factor Pattern
             FACTOR1    FACTOR2    FACTOR3
BLUEGILL     0.99637    0.06257    0.05767
BCRAPPIE     0.46485    0.21097    0.26931
SBASS        0.20017    0.97853    0.04905
LBASS        0.42801    0.31567    0.33099
WALLEYE     -0.20771    0.13392   -0.50492
NPIKE        0.02359    0.22600    0.47779

The first principal component factor influences the Bluegill, Crappie and the Bass. The Northern Pike alone loads heavily on the second factor, and the Walleye and smallmouth bass on the third factor. The MLE solution is different.

9.19 (a), (b) and (c) Maximum Likelihood (m = 3)

UNROTATED FACTOR LOADINGS (PATTERN) FOR MAXIMUM LIKELIHOOD CANONICAL FACTORS

              Factor 1   Factor 2   Factor 3
1 Growth        0.772      0.295      0.527
2 Profits       0.570      0.347      0.721
3 Newaccts      0.774      0.433      0.355
4 Creative      0.389      0.921      0.000
5 Mechanic      0.509      0.426      0.334
6 Abstract      0.968     -0.250      0.000
7 Math          0.632      0.181      0.729
VP              3.267      1.520      1.566

ROTATED FACTOR LOADINGS (PATTERN)

              Factor 1   Factor 2   Factor 3
1 Growth        0.794      0.374      0.437
2 Profits       0.912      0.316      0.184
3 Newaccts      0.653      0.544      0.437
4 Creative      0.255      0.967      0.019
5 Mechanic      0.541      0.464      0.208
6 Abstract      0.300      0.054      0.953
7 Math          0.919      0.179      0.295
VP              3.180      1.720      1.454

              Communalities   Specific Variances
1 Growth         0.9615            .0385
2 Profits        0.9648            .0352
3 Newaccts       0.9124            .0876
4 Creative       1.0000            .0000
5 Mechanic       0.5519            .4481
6 Abstract       1.0000            .0000
7 Math           0.9631            .0369

R =
1.0   .926   .884   .572   .708   .674   .927
      1.0    .843   .542   .746   .465   .944
             1.0    .700   .637   .641   .853
                    1.0    .591   .147   .413
                           1.0    .386   .575
(Symmetric)                       1.0    .566
                                         1.0

$\hat{L}\hat{L}' + \hat{\Psi}$ =
1.0   .923   .912   .572   .694   .674   .925
      1.0    .848   .542   .679   .465   .948
             1.0    .700   .696   .641   .826
                    1.0    .591   .147   .413
                           1.0    .386   .646
(Symmetric)                       1.0    .566
                                         1.0

It is clear from an examination of the residual matrix $R - (\hat{L}\hat{L}' + \hat{\Psi})$ that an m = 3 factor solution represents the observed correlations quite well. However, it is difficult to
provide interpretations for the factors. If we consider the rotated loadings, we see that the last two factors are dominated by the single variables "creative" and "abstract" respectively. The first factor links the salespeople performance variables with math ability.

(d) Using (9-39) with n = 50, p = 7, m = 3 we have

$$43.833 \,\ln\!\left(\frac{.000075931}{.000018427}\right) = 62.1 > \chi_3^2(.01) = 11.3$$

so we reject $H_0: \Sigma = LL' + \Psi$ for m = 3. Neither of the m = 2, m = 3 factor models appears to fit by the $\chi^2$ criterion. We note that the matrices $R$, $\hat{L}\hat{L}' + \hat{\Psi}$ have small determinants and rounding error could affect the calculation of the test statistic. Again, the residual matrix above indicates a good fit for m = 3.

(e) $z' = (1.522, -.852, .465, .957, 1.129, .673, .497)$

Using the regression method for computing factor scores, we have, with $\hat{f} = \hat{L}_z' R^{-1} z$:

Principal components (m = 3): $\hat{f}' = (.686, .271, 1.395)$
Maximum likelihood (m = 3): $\hat{f}' = (-.702, .679, -.751)$

Factor scores using weighted least squares can only be computed for the principal component solutions since $\hat{\Psi}^{-1}$ cannot be computed for the maximum likelihood solutions. ($\hat{\Psi}$ has zeros on the main diagonal for the maximum likelihood solutions.) Using (9-50),

Principal components (m = 3): $\hat{f}' = (.344, .2~3, 1.805)$

9.20
$$S = \begin{bmatrix} 2.50 & -2.77 & -.59 & -2.23 \\ & 300.52 & 6.78 & 30.78 \\ & & 11.36 & 3.13 \\ \text{(symmetric)} & & & 31.98 \end{bmatrix}$$

(a) Principal components (m = 2)

                   Factor 1 loadings   Factor 2 loadings
X1 (wind)               -.17                -.37
X2 (solar rad.)        17.32                -.61
X5 (NO2)                 .42                 .74
X6 (O3)                 1.96                5.19

(b) Maximum likelihood estimates of the loadings are obtained from $\hat{L} = \hat{D}^{1/2}\hat{L}_z$ where $\hat{L}_z$ are the loadings obtained from the sample correlation matrix R. (For $\hat{L}_z$ see problem 9.23.) Note: Maximum likelihood estimates of the loadings for m = 2 may be difficult to obtain for some computer packages without good estimates of the communalities.
One choice for initial estimates of the communalities is the set of communalities from the m = 2 principal components solution.

(c) Maximum likelihood estimation (with m = 2) does a better job of accounting for the covariances in S than the m = 2 principal component solution. On the other hand, the principal component solution generally produces uniformly smaller estimates of the specific variances. For the unrotated m = 2 solution, the first factor is dominated by X2 = solar radiation and X6 = O3. The second factor seems to be a contrast between the pair X1 = wind, X2 = solar radiation and the pair X5 = NO2 and X6 = O3.

9.21 Again the first factor is dominated by solar radiation and, to some extent, ozone. The second factor might be interpreted as a contrast between wind and the pair of pollutants NO2 and O3. Recall solar radiation and ozone have the largest sample variances. This will affect the estimated loadings obtained by the principal component method.

9.22 (a) Since, for maximum likelihood estimates, $\hat{L} = \hat{D}^{1/2}\hat{L}_z$ and $S = \hat{D}^{1/2} R \hat{D}^{1/2}$, the factor scores generated by the equations for $\hat{f}_j$ in (9-58) will be identical. Similarly, the factor scores generated by the weighted least squares formulas in (9-50) will be identical. The factor scores generated by the regression method with maximum likelihood estimates (m = 2; see problem 9.23) are given below for the first 10 cases.

Case      f1           f2
 1     - 0..316     -0.544
 2      0.252       -0.546
 3      0.129       -0.509
 4      0.332       -0.790
 5      0.492       -0.012
 6      0.515       -0.370
 7      0.530       -0.456
 8      1.070        0.724
 9      0.384       -0. tl23
10     -0.179        p.io:

(b) Factor scores using principal component estimates (m = 2) and (9-51) for the first 10 cases are given below (cases 1-10, f1 and f2; entries in printout order):

1.203 -0.368 'f . 646 -1.029 1.447 0.717 0.856 0.795 0.811 0.518 0.950 -0.083 1.168 0.410 -0.492 -0.937 -0.049 0.394 0.259 0.072

(c) The sets of factor scores are quite different.
Factor scores depend heavily on the method used to estimate loadings and specific variances as well as the method used to generate them.

9.23

Principal components (m = 2)

Factor 2 Rotated load; ngs 1 oadi ngs loadin~s Facto r 1 -.56 Factor 1 . X2 (solar rad.) .65 -.24 -.52 Xs ( NOZ) ,.48 .74 .77 ~'.20 Xl Xs (wind) (°3) -.31 C! -.05 Q£ 2 Factor I -.53 I - .04 (Æ .30

Maximum likelihood (m = 2)

Factor 1 Factor 2 1 oadi ngs loadings Factor 1 Factor -.38 .32 -.09 -X ' 2 (solar rad.) .50 .27 C: em IX5 (N02)' .25 -.04 .17 - .19 ~6 (°3) .65 -.03 C& I -.43 J ~ 1 (wi nd) Rota teet loadi ngs 2 -.10

Examining the rotated loadings, we see that both solution methods yield similar estimated loadings for the first factor. It might be called an "ozone pollution" factor. There are some differences for the second factor. However, the second factor appears to compare one of the pollutants with wind. It might be called a "pollutant transport" factor. We note that the interpretations of the factors might differ depending upon the choice of R or S (see problems 9.20 and 9.21) for analysis. Also, the two solution methods give somewhat different results, indicating the solution is not very stable. Some of the observed correlations between the variables are very small, implying that an m = 1 or m = 2 factor model for these four variables will not be a completely satisfactory description of the underlying structure. We may need about as many factors as variables. If this is the case, there is nothing to be gained by proposing a factor model.

9.24

      1.0     -.192    .313    -.119    .026
     -.192    1.0     -.065     .373    .685
R =   .313    -.065   1.0      -.411   -.010
     -.119     .373   -.411    1.0      .180
      .026     .685   -.010     .180   1.0

The correlations are relatively small with the possible exception of .685, the correlation between Percent Professional Degree and Median Home Value. Consequently, a factor analysis with fewer than 4 or 5 factors may be problematic.
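The worry about needing 4 or 5 factors can be checked numerically from R alone. The sketch below uses Python with NumPy (our choice of tool; the manual's own output comes from SAS, S-Plus and Minitab). It re-enters the 9.24 correlation matrix by hand, assuming the variable order Population, PerCentProDeg, PerCentEmp>16, PerCentGovEmp, MedianHome used in the output tables, and prints each eigenvalue with the cumulative proportion of total variance. The helper name `cumulative_proportion` is hypothetical.

```python
import numpy as np

# Correlation matrix R from Exercise 9.24, typed in from the solution.
# Assumed variable order: Population, PerCentProDeg, PerCentEmp>16,
# PerCentGovEmp, MedianHome.
R = np.array([
    [ 1.000, -0.192,  0.313, -0.119,  0.026],
    [-0.192,  1.000, -0.065,  0.373,  0.685],
    [ 0.313, -0.065,  1.000, -0.411, -0.010],
    [-0.119,  0.373, -0.411,  1.000,  0.180],
    [ 0.026,  0.685, -0.010,  0.180,  1.000],
])


def cumulative_proportion(corr):
    """Eigenvalues (largest first) and cumulative share of total variance."""
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    return eigvals, np.cumsum(eigvals) / corr.shape[0]


eigvals, cum_prop = cumulative_proportion(R)
for k, (lam, c) in enumerate(zip(eigvals, cum_prop), start=1):
    print(f"factor {k}: eigenvalue {lam:.4f}, cumulative proportion {c:.3f}")
```

If the eigenvalues decline gradually, with no sharp drop after the first one or two, no small m accounts for most of the variance, which is the situation described in this solution.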
The scree plot, shown below, reinforces this conjecture. The scree plot falls off almost linearly; there is no sharp elbow. However, we present a factor analysis with m = 3 factors for both the principal components and maximum likelihood solutions.

[Scree Plot of Population, ..., MedianHome: eigenvalue versus factor number]

Principal Component Factor Analysis (m = 3)

Unrotated Factor Loadings and Communalities

Variable          Factor1   Factor2   Factor3   Communality
Population         -0.371    -0.541    -0.729      0.962
PerCentProDeg       0.837    -0.381     0.153      0.870
PerCentEmp>16      -0.460    -0.708     0.209      0.756
PerCentGovEmp       0.676     0.295    -0.512      0.807
MedianHome          0.696    -0.584     0.064      0.830

Variance           1.9919    1.3675    0.8642      4.2236
% Var               0.398     0.274     0.173       0.845

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable          Factor1       Factor2   Factor3       Communality
Population         -0.059        -0.118   (illegible)      0.962
PerCentProDeg     (illegible)     0.160     0.147          0.870
PerCentEmp>16       0.102        -0.801    -0.321          0.756
PerCentGovEmp       0.277         0.850    -0.082          0.807
MedianHome        (illegible)     0.009    -0.068          0.830

Variance           1.7382        1.4050    1.0803          4.2236
% Var               0.348         0.281     0.216           0.845

Factor Score Coefficients (Factor1, Factor2, Factor3; variables Population, PerCentProDeg, PerCentEmp>16, PerCentGovEmp, MedianHome; entries in extracted order)

0.138 -0.940 -0.019 -0.028 -0.577 0.658 -0.099 0.522 0.169 0.052 0.544 0.109 -0.135 -0.278 -0.070

[Score Plot of Population, ..., MedianHome (PC): second factor versus first factor]

Maximum Likelihood Factor Analysis (m = 3)

* NOTE * Heywood case

Unrotated Factor Loadings and Communalities

Variable          Factor1   Factor2   Factor3   Communality
Population         -0.047    -0.999    -0.001      1.000
PerCentProDeg       0.989     0.146    -0.000      1.000
PerCentEmp>16      -0.020    -0.313     0.941      0.984
PerCentGovEmp       0.362     0.103    -0.395      0.298
MedianHome          0.701    -0.059    -0.015      0.496

Variance           1.6043    1.1310    1.0419      3.7772
% Var               0.321     0.226     0.208       0.755

Rotated Factor Loadings and Communalities
Varimax Rotation

Legible loadings, in extracted order: 0.155 -0.036 -0.090 0.047 -0.430 0.333 0.145 -0.165 0.041 -0.061

Communality: Population 1.000, PerCentProDeg 1.000, PerCentEmp>16 0.984, PerCentGovEmp 0.298, MedianHome 0.496

                  Factor1   Factor2   Factor3     Total
Variance           1.5750    1.1740    1.0282    3.7772
% Var               0.315     0.235     0.206     0.755

Factor Score Coefficients (entries in extracted order)

Factor1 Factor2 -0.177 0.137 -0.053 1.017 PerCentProDeg 1.025 0.070 PerCentEmp>16 -0.001 -0.010 PerCentGovEmp -0.000 -0.001 MedianHome Factor3 -1.046 -0.046 0.159 -0.002 -0.000

[Score Plot of Population, ..., MedianHome (MLE): second factor versus first factor]

An m = 3 factor solution explains from 75% to 85% of the variance depending on the solution method. Using the rotated loadings, the first factor in both methods has large loadings on Percent Professional Degree and Median Home Value. It is difficult to label this factor but since income is probably somewhere in this mix, it might be labeled an "affluence" or "white collar" factor. The second and third factors from the two solutions are similar as well. The second factor is a bipolar factor with large loadings (in absolute value) on Percent Employed over 16 and Percent Government Employment. We call this factor an "employment" factor. The third factor is clearly a "population" factor. Factor scores for the first two factors from the two solution methods are similar.

9.25

      105,625    94,734    87,242    94,280
S =             101,761    76,186    81,204
                           91,809    90,343
      (symmetric)                   104,329

An m = 1 factor model appears to represent these data quite well.

                      Principal Components    Maximum Likelihood
                      Factor 1 loadings       Factor 1 loadings
Shock wave                  317.                    320.
Vibration                   293.                    291.
Static test 1               287.                    275.
Static test 2               307.                    297.

Proportion Variance
Explained                   90.1%                   86.9%

Factor scores (m = 1) using the regression method for the first few cases are:

Principal Components    Maximum Likelihood
       -.009                 -.033
       1.530                 1.524
        .808                  .719
       -.804                 -.802

The factor scores produced from the two solution methods are very similar. The correlation between the two sets of scores is .992. The outliers, specimens 9 and 16, were identified in Example 4.15.

9.26 (a) Principal Components

                     m = 1                         m = 2
             Factor 1                Factor 1    Factor 2
             loadings      ψ_i       loadings    loadings      ψ_i
Litter 1       27.9       309.0        27.9        -6.2       271.2
Litter 2       30.4       205.7        30.4        -4.9       182.2
Litter 3       31.5       344.3        31.5        18.5         1.7
Litter 4       32.9       310.0        32.9        -8.0       245.8

Percentage
Variance
Explained      76.4%                   76.4%        9.4%

(b) Maximum Likelihood (m = 1)

             Factor 1
             loadings      ψ_i
Litter 1       26.8       370.2
Litter 2       30.5       198.2
Litter 3       28.4       529.6
Litter 4       30.4       471.0

Percentage
Variance
Explained      68.8%

The maximum likelihood estimates of the factor loadings for m = 2 were not obtained due to convergence difficulties in the computer program.

(c) It is only necessary to rotate the m = 2 solution.

Principal Components (m = 2), rotated loadings

             Factor 1    Factor 2
Litter 1       26.2        11.4
Litter 2       27.5        13.8
Litter 3       14.7        33.4
Litter 4       31.4        12.8

Percentage
Variance
Explained      53.5%       32.4%

9.27 Principal Components (m = 2)

                                               Rotated loadings
             Factor 1    Factor 2
             loadings    loadings     ψ_i     Factor 1    Factor 2
Litter 1       .86         .44        .06       .33         .91
Litter 2       .91         .12        .15       .59         .71
Litter 3       .85        -.36        .14       .87         .32
Litter 4       .87        -.21        .20       .78         .44

Percentage
Variance
Explained      76.5%       9.5%                 45.4%       40.6%

Maximum Likelihood (m = 1)

             Factor 1
             loadings     ψ_i
Litter 1       .81        .34
Litter 2       .91        .17
Litter 3       .78        .39
Litter 4       .81        .34

Percentage
Variance
Explained      68.8%

$\hat{f} = \hat{L}_z' R^{-1} z = .297$

9.28 The covariance matrix S (see below) is dominated by the marathon since the marathon times are given in minutes.
It is unlikely that a factor analysis will be useful; however, the principal component solution with m = 2 is given below. Using the unrotated loadings, the first factor explains about 98% of the variance and the largest factor loading is associated with the marathon. Using the rotated loadings, the first factor explains about 87% of the variance and again the largest loading is associated with the marathon. The second factor, with either unrotated or rotated loadings, explains relatively little of the remaining variance and can be ignored. The first factor might be labeled a "running endurance" factor but this factor provides us with little insight into the nature of the running events. It is better to factor analyze the correlation matrix R in this case.

Covariances: 100m(s), 200m(s), 400m(s), 800m, 1500m, 3000m, Marathon

             100m(s)    200m(s)    400m(s)       800m      1500m      3000m    Marathon
100m(s)      0.15532
200m(s)      0.34456    0.86309
400m(s)      0.89130    2.19284    6.74546
800m         0.02770    0.06617    0.18181    0.00755
1500m        0.08389    0.20276    0.50918    0.02141    0.07418
3000m        0.23388    0.55435    1.42682    0.06138    0.21616    0.66476
Marathon     4.33418   10.38499   28.90373    1.21965    3.53984   10.70609  270.27015

Principal Component Factor Analysis of S (m = 2)

Unrotated Factor Loadings and Communalities

Variable     Factor1       Factor2   Communality
100m(s)       0.267        -0.230       0.124
200m(s)       0.640        -0.582       0.749
400m(s)       1.785        -1.881       6.725
800m          0.075        -0.027       0.006
1500m         0.217        -0.073       0.052
3000m        (illegible)   -0.158       0.453
Marathon     16.438         0.238     270.270

Variance     274.36         4.02      278.38
% Var         0.984         0.014       0.999

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable     Factor1       Factor2   Communality
100m(s)       0.172        -0.308       0.124
200m(s)       0.401        -0.767       0.749
400m(s)       1.030        -2.380       6.725
800m          0.061        -0.051       0.006
1500m         0.178        -0.143       0.052
3000m        (illegible)   -0.373       0.453
Marathon     15.517        -5.431     270.270

Variance     242.38        36.00      278.38
% Var         0.869         0.129       0.999

The correlation matrix R for the women's track records follows.
Correlations: 100m(s), 200m(s), 400m(s), 800m, 1500m, 3000m, Marathon

             100m(s)   200m(s)   400m(s)    800m    1500m    3000m
200m(s)       0.941
400m(s)       0.871     0.909
800m          0.809     0.820     0.806
1500m         0.782     0.801     0.720    0.905
3000m         0.728     0.732     0.674    0.867    0.973
Marathon      0.669     0.680     0.677    0.854    0.791    0.799

The scree plot below suggests at most an m = 2 factor solution.

[Scree Plot of 100m(s), ..., Marathon (correlation matrix): eigenvalue versus factor number]

Principal Component Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities (the Factor2 loadings are not legible)

Variable     Factor1   Communality
100m(s)       0.910       0.933
200m(s)       0.923       0.960
400m(s)       0.887       0.919
800m          0.951       0.921
1500m         0.938       0.940
3000m         0.906       0.934
Marathon      0.856       0.828

             Factor1   Factor2     Total
Variance      5.8076    0.6287    6.4363
% Var          0.830     0.090     0.919

Rotated Factor Loadings and Communalities
Varimax Rotation (the rotated loadings are not legible)

Communality: 100m(s) 0.933, 200m(s) 0.960, 400m(s) 0.919, 800m 0.921, 1500m 0.940, 3000m 0.934, Marathon 0.828

             Factor1   Factor2     Total
Variance      3.3530    3.0833    6.4363
% Var          0.479     0.440     0.919

Factor Score Coefficients

Variable     Factor1   Factor2
100m(s)       -0.240    -0.480
200m(s)       -0.244    -0.488
400m(s)       -0.288    -0.525
800m           0.259     0.035
1500m          0.386     0.172
3000m          0.481     0.280
Marathon       0.445     0.255

[Score Plot of 100m(s), ..., Marathon (PC, m = 2): second factor versus first factor]

Maximum Likelihood Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities (the loadings are not legible)

Communality: 100m(s) 0.906, 200m(s) 0.976, 400m(s) 0.848, 800m 0.856, 1500m 0.984, 3000m 0.972, Marathon 0.662

             Factor1   Factor2     Total
Variance      5.6104    0.5928    6.2032
% Var          0.801     0.085     0.886

Rotated Factor Loadings and Communalities
Varimax Rotation (the Factor2 loadings are not legible)

Variable     Factor1   Communality
100m(s)       0.455       0.906
200m(s)       0.449       0.976
400m(s)       0.395       0.848
800m          0.728       0.856
1500m         0.879       0.984
3000m         0.915       0.972
Marathon      0.690       0.662

             Factor1   Factor2     Total
Variance      3.1806    3.0225    6.2032
% Var          0.454     0.432     0.886

Factor Score Coefficients

Variable     Factor1   Factor2
100m(s)       -0.107     0.237
200m(s)       -0.481     1.019
400m(s)       -0.077     0.157
800m           0.036     0.025
1500m          0.772    -0.317
3000m          0.595    -0.369
Marathon       0.024    -0.003

[Score Plot of 100m(s), ..., Marathon (MLE, m = 2): second factor versus first factor]

The results from the two solution methods are very similar. Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events load highly on this factor. The second factor appears to contrast the shorter running events (100m, 200m, 400m) with the longer events (800m, 1500m, 3000m, marathon). This bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. The second factor might be classified as a "running speed" factor. Note, for both factors, the remaining running events in each case have moderately large loadings on the factor. The two factor solution accounts for 89%-92% (depending on solution method) of the total variance. The plots of the factor scores indicate that observations #46 (Samoa), #11 (Cook Islands) and #31 (North Korea) are outliers.

9.29 The covariance matrix S for the running events measured in meters/second is given below. Since all the running event variables are now on a commensurate measurement scale, it is likely a factor analysis of S will produce nearly the same results as a factor analysis of the correlation matrix R.
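The difference between analyzing S and R comes down to the standardization $R = D^{-1/2} S D^{-1/2}$, where D holds the diagonal of S. The sketch below, in Python with NumPy (our choice of tool, not the manual's), applies it to the 100m, 200m and marathon entries of the Exercise 9.28 covariance matrix, where the marathon is recorded in minutes. The helper name `cov_to_corr` and the choice of this 3 x 3 sub-block are illustrative assumptions.

```python
import numpy as np


def cov_to_corr(S):
    """Standardize a covariance matrix: R = D^(-1/2) S D^(-1/2)."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)


# Sub-block of S from Exercise 9.28: 100m (s), 200m (s), Marathon (min).
S_sub = np.array([
    [0.15532,  0.34456,   4.33418],
    [0.34456,  0.86309,  10.38499],
    [4.33418, 10.38499, 270.27015],
])

R_sub = cov_to_corr(S_sub)
marathon_share = S_sub[2, 2] / np.trace(S_sub)

# Re-expressing the marathon in seconds rescales S but leaves R unchanged.
to_seconds = np.diag([1.0, 1.0, 60.0])
S_seconds = to_seconds @ S_sub @ to_seconds
R_seconds = cov_to_corr(S_seconds)

print(np.round(R_sub, 3))
print("marathon share of total variance:", round(marathon_share, 4))
print("R unchanged by unit change:", np.allclose(R_sub, R_seconds))
```

A linear change of units alters S but not R, so one variable with a large variance can dominate a factor analysis of S. The conversion to meters per second is not linear in time, so there R changes slightly rather than staying exactly the same.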
The results for an m = 2 factor analysis of S using the principal component method are shown below. A factor analysis of R follows.

Covariances: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 3000m/s, Marm/s

              100m/s      200m/s      400m/s      800m/s     1500m/s     3000m/s      Marm/s
100m/s     0.0905383
200m/s     0.0956063   0.1146714
400m/s     0.0966724   0.1138699   0.1377889
800m/s     0.0650640   0.0749249   0.0809409   0.0735228
1500m/s    0.0822198   0.0960189   0.0954430   0.0864542   0.1238405
3000m/s    0.0921422   0.1054364   0.1083164   0.0997547   0.1437148   0.1765843
Marm/s     0.0810999   0.0933103   0.1018807   0.0943056   0.1184578   0.1465604   0.1667141

Principal Component Factor Analysis of S (m = 2)

Unrotated Factor Loadings and Communalities (the loadings are not legible)

Communality: 100m/s 0.083, 200m/s 0.110, 400m/s 0.128, 800m/s 0.066, 1500m/s 0.116, 3000m/s 0.168, Marm/s 0.148

             Factor1   Factor2     Total
Variance     0.73215   0.08607   0.81822
% Var          0.829     0.097     0.926

Rotated Factor Loadings and Communalities
Varimax Rotation (the loadings are not legible)

Communality: 100m/s 0.083, 200m/s 0.110, 400m/s 0.128, 800m/s 0.066, 1500m/s 0.116, 3000m/s 0.168, Marm/s 0.148

             Factor1   Factor2     Total
Variance     0.45423   0.36399   0.81822
% Var          0.514     0.412     0.926

Factor Score Coefficients

Variable     Factor1       Factor2
100m/s        -0.171        -0.363
200m/s        -0.222        -0.471
400m/s        -0.306        -0.603
800m/s         0.104        -0.025
1500m/s        0.287         0.085
3000m/s        0.542         0.280
Marm/s       (illegible)    -0.335

Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events have roughly the same size loadings on the factor. The second factor appears to contrast the shorter running events (100m, 200m, 400m) with the longer events (800m, 1500m, 3000m, marathon). This bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon have higher loadings on this factor. The second factor might be classified as a "running speed" factor.
Note, for both factors, the remaining running events in each case have moderate and roughly equal loadings on the factor. The two factor solution accounts for 93% of the variance.

The correlation matrix R is shown below along with the scree plot. A two factor solution seems warranted.

Correlations: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 3000m/s, Marm/s

             100m/s   200m/s   400m/s   800m/s   1500m/s   3000m/s
200m/s        0.938
400m/s        0.866    0.906
800m/s        0.797    0.816    0.804
1500m/s       0.776    0.806    0.731    0.906
3000m/s       0.729    0.741    0.694    0.875     0.972
Marm/s        0.660    0.675    0.672    0.852     0.824     0.854

[Scree Plot of 100m/s, ..., Marm/s (correlation matrix): eigenvalue versus component number]

Principal Component Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities (the loadings are not legible)

Communality: 100m/s 0.932, 200m/s 0.960, 400m/s 0.911, 800m/s 0.914, 1500m/s 0.941, 3000m/s 0.947, Marm/s 0.875

             Factor1   Factor2     Total
Variance      5.8323    0.6477    6.4799
% Var          0.833     0.093     0.926

Rotated Factor Loadings and Communalities
Varimax Rotation (the Factor2 loadings are not legible)

Variable     Factor1   Communality
100m/s        0.418       0.932
200m/s        0.436       0.960
400m/s        0.400       0.911
800m/s        0.771       0.914
1500m/s       0.839       0.941
3000m/s       0.886       0.947
Marm/s        0.871       0.875

             Factor1   Factor2     Total
Variance      3.3675    3.1125    6.4799
% Var          0.481     0.445     0.926

Factor Score Coefficients

Variable     Factor1   Factor2
100m/s        -0.252    -0.489
200m/s        -0.243    -0.484
400m/s        -0.265    -0.499
800m/s         0.248     0.025
1500m/s        0.358     0.142
3000m/s        0.455     0.249
Marm/s         0.484     0.293

[Score Plot of 100m/s, ..., Marm/s (PC, m = 2): second factor versus first factor]
Maximum Likelihood Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities (the loadings are not legible)

Communality: 100m/s 0.896, 200m/s 0.983, 400m/s 0.836, 800m/s 0.850, 1500m/s 0.971, 3000m/s 0.984, Marm/s 0.737

             Factor1   Factor2     Total
Variance      5.6844    0.5716    6.2560
% Var          0.812     0.082     0.894

Rotated Factor Loadings and Communalities
Varimax Rotation (the Factor2 loadings are not legible)

Variable     Factor1   Communality
100m/s        0.441       0.896
200m/s        0.435       0.983
400m/s        0.412       0.836
800m/s        0.726       0.850
1500m/s       0.859       0.971
3000m/s       0.914       0.984
Marm/s        0.765       0.737

             Factor1   Factor2     Total
Variance      3.2395    3.0165    6.2560
% Var          0.463     0.431     0.894

Factor Score Coefficients

Variable     Factor1   Factor2
100m/s        -0.073    -0.167
200m/s        -0.521    -1.122
400m/s        -0.048    -0.106
800m/s         0.039    -0.014
1500m/s        0.379     0.124
3000m/s        0.949     0.518
Marm/s         0.041     0.017

[Score Plot of 100m/s, ..., Marm/s (MLE, m = 2): second factor versus first factor]

The results from the two solution methods are very similar and very similar to the principal component factor analysis of the covariance matrix S. Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events load highly on this factor. The second factor appears to contrast the shorter running events (100m, 200m, 400m) with the longer events (800m, 1500m, 3000m, marathon). This bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. The second factor might be classified as a "running speed" factor. Note, for both factors, the remaining running events in each case have moderately large loadings on the factor. The two factor solution accounts for 89%-93% (depending on solution method) of the total variance.
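The rotation used throughout these solutions is varimax. As a rough illustration of what such a rotation does, here is a commonly used SVD-based varimax iteration in Python with NumPy. It is a sketch under stated assumptions, not the routine used by Minitab or SAS: it applies no Kaiser normalization, so its output may differ from package output in sign, column order and the last digits. The input is the small unrotated principal component loading matrix from Exercise 9.27, chosen only because it is short. The function name `varimax` and its parameters are our own.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns (rotated loadings, orthogonal matrix T)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        LT = L @ T
        # Matrix whose polar factor gives the next orthogonal rotation.
        target = L.T @ (LT ** 3 - LT @ np.diag((LT ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(target)
        T = u @ vt
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break  # no meaningful improvement, stop iterating
        objective = new_objective
    return L @ T, T


# Unrotated PC loadings (m = 2) from Exercise 9.27, Litters 1 to 4.
L_unrotated = np.array([
    [0.86,  0.44],
    [0.91,  0.12],
    [0.85, -0.36],
    [0.87, -0.21],
])

rotated, T = varimax(L_unrotated)
print(np.round(rotated, 2))
# An orthogonal rotation leaves the communalities (row sums of squares) unchanged.
print(np.round((rotated ** 2).sum(axis=1), 3))
```

Because T is orthogonal, the communalities and the total variance explained are the same before and after rotation. Only the split of that variance between the factors changes, which is why the "overall excellence" factor can disappear after rotation while the fit is unchanged.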
The plots of the factor scores indicate that observations #46 (Samoa), #11 (Cook Islands) and #31 (North Korea) are outliers.

The results of the m = 2 factor analysis of women's track records when time is measured in meters per second are very much the same as the results for the m = 2 factor analysis of R presented in Exercise 9.28. If the correlation matrix R is factor analyzed, it makes little difference whether running event time is measured in seconds (or minutes) as in Exercise 9.28 or in meters per second. It does make a difference if the covariance matrix S is factor analyzed, since the measurement scales in Exercise 9.28 are quite different from the meters/second scale.

9.30 The covariance matrix S (see below) is dominated by the marathon since the marathon times are given in minutes. It is unlikely that a factor analysis will be useful; however, the principal component solution with m = 2 is given below. Using the unrotated loadings, the first factor explains about 98% of the variance and the largest factor loading is associated with the marathon. Using the rotated loadings, the first factor explains about 83% of the variance and again the largest loading is associated with the marathon. The second factor, with either unrotated or rotated loadings, explains relatively little of the remaining variance and can be ignored. The first factor might be labeled a "running endurance" factor but this factor provides us with little insight into the nature of the running events. It is better to factor analyze the correlation matrix R in this case.

Covariances: 100m, 200m, 400m, 800m, 1500m, 5000m, 10,000m, Marathon

               100m       200m       400m       800m      1500m      5000m    10,000m    Marathon
100m       0.048973
200m       0.111044   0.300903
400m       0.256022   0.666818   2.069956
800m       0.008264   0.022929   0.057938   0.002751
1500m      0.025720   0.066193   0.168473   0.007131   0.023034
5000m      0.124575   0.317734   0.853486   0.034348   0.105833   0.578875
10,000m    0.265613   0.688936   1.849941   0.074257   0.229701   1.262533   2.819569
Marathon   1.340139   3.541038   9.178857   0.378905   1.192564   6.430489  14.342538  80.135356

Principal Component Factor Analysis of S (m = 2)

Unrotated Factor Loadings and Communalities

Variable     Factor1       Factor2   Communality
100m          0.152        -0.107       0.034
200m          0.401        -0.270       0.234
400m          1.044        -0.979       2.049
800m          0.043        -0.015       0.002
1500m         0.134        -0.033       0.019
5000m         0.722        -0.125       0.537
10,000m      (illegible)   -0.223       2.643
Marathon     (illegible)    0.179      80.130

Variance     84.507         1.141      85.649
% Var         0.983         0.013       0.996

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable     Factor1       Factor2       Communality
100m          0.097        -0.158           0.034
200m          0.262        -0.406           0.234
400m          0.573        -1.312           2.049
800m          0.033        -0.031           0.002
1500m         0.110        -0.083           0.019
5000m         0.615        -0.399           0.537
10,000m       1.392        -0.841           2.643
Marathon     (illegible)   (illegible)     80.130

Variance     71.529        14.119          85.649
% Var         0.832         0.164           0.996

The correlation matrix R for the men's track records follows.

Correlations: 100m, 200m, 400m, 800m, 1500m, 5000m, 10,000m, Marathon

              100m     200m     400m     800m    1500m    5000m   10,000m
200m         0.915
400m         0.804    0.845
800m         0.712    0.797    0.768
1500m        0.766    0.795    0.772    0.896
5000m        0.740    0.761    0.780    0.861    0.917
10,000m      0.715    0.748    0.766    0.843    0.901    0.988
Marathon     0.676    0.721    0.713    0.807    0.878    0.944    0.954

The scree plot below suggests at most an m = 2 factor solution.
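The principal component solution for R that follows should be reproducible, up to rounding and sign, from the correlation matrix alone: the loadings are $\sqrt{\hat\lambda_j}\,\hat e_j$ for the m largest eigenpairs of R, and the communalities are the row sums of squared loadings. The sketch below in Python with NumPy (our choice of tool) re-enters the men's correlation matrix by hand from its lower triangle. The helper name `pc_factor_loadings` and the sign convention (each column flipped so that its entries sum to a positive number) are our assumptions.

```python
import numpy as np

events = ["100m", "200m", "400m", "800m", "1500m", "5000m", "10,000m", "Marathon"]

# Lower triangle of R for the men's track records (Exercise 9.30).
lower = [
    [1.0],
    [0.915, 1.0],
    [0.804, 0.845, 1.0],
    [0.712, 0.797, 0.768, 1.0],
    [0.766, 0.795, 0.772, 0.896, 1.0],
    [0.740, 0.761, 0.780, 0.861, 0.917, 1.0],
    [0.715, 0.748, 0.766, 0.843, 0.901, 0.988, 1.0],
    [0.676, 0.721, 0.713, 0.807, 0.878, 0.944, 0.954, 1.0],
]
R_men = np.zeros((8, 8))
for i, row in enumerate(lower):
    R_men[i, : len(row)] = row
R_men = R_men + R_men.T - np.eye(8)   # mirror the triangle, keep unit diagonal


def pc_factor_loadings(corr, m):
    """Principal component factor loadings for the m largest eigenpairs."""
    vals, vecs = np.linalg.eigh(corr)          # ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    loadings = vecs[:, :m] * np.sqrt(vals[:m])
    signs = np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return vals, loadings * signs


eigvals, loadings = pc_factor_loadings(R_men, m=2)
communalities = (loadings ** 2).sum(axis=1)

for name, row, h2 in zip(events, loadings, communalities):
    print(f"{name:>9}  {row[0]: .3f}  {row[1]: .3f}  {h2:.3f}")
print("variance explained:", np.round(eigvals[:2], 4),
      "proportion:", round(eigvals[:2].sum() / 8, 3))
```

The printed variances are the two largest eigenvalues of R, so they can be compared directly with the "Variance" row of the unrotated table. Small differences are expected because R is entered to three decimals.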
[Scree Plot of 100m, ..., Marathon (correlation matrix): eigenvalue versus factor number]

Principal Component Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities

Variable     Factor1   Factor2       Communality
100m          0.861     0.423           0.920
200m          0.896     0.376           0.944
400m          0.878     0.276           0.847
800m          0.914    (illegible)      0.840
1500m         0.948    -0.123           0.913
5000m         0.957    -0.236           0.972
10,000m       0.947    -0.267           0.969
Marathon      0.917    -0.309           0.937

Variance      6.7033    0.6384          7.3417
% Var          0.838     0.080           0.918

Rotated Factor Loadings and Communalities
Varimax Rotation (the rotated loadings are not legible)

Communality: 100m 0.920, 200m 0.944, 400m 0.847, 800m 0.840, 1500m 0.913, 5000m 0.972, 10,000m 0.969, Marathon 0.937

             Factor1   Factor2     Total
Variance      4.1168    3.2249    7.3417
% Var          0.515     0.403     0.918

Factor Score Coefficients

Variable     Factor1   Factor2
100m          -0.335     0.586
200m          -0.283     0.533
400m          -0.183     0.413
800m           0.176     0.004
1500m          0.233    -0.053
5000m          0.349    -0.186
10,000m        0.380    -0.224
Marathon       0.420    -0.277

[Score Plot of 100m, ..., Marathon (PC, m = 2): second factor versus first factor]

Maximum Likelihood Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities (the Factor2 loadings are not legible)

Variable     Factor1   Communality
100m          0.780       0.866
200m          0.814       0.963
400m          0.810       0.772
800m          0.875       0.788
1500m         0.927       0.866
5000m         0.991       0.988
10,000m       0.989       0.989
Marathon      0.949       0.912

             Factor1   Factor2     Total
Variance      6.4134    0.7299    7.1432
% Var          0.802     0.091     0.893

Rotated Factor Loadings and Communalities
Varimax Rotation (the rotated loadings are not legible)

Communality: 100m 0.866, 200m 0.963, 400m 0.772, 800m 0.788, 1500m 0.866, 5000m 0.988, 10,000m 0.989, Marathon 0.912

             Factor1   Factor2     Total
Variance      3.9446    3.1986    7.1432
% Var          0.493     0.400     0.893

Factor Score Coefficients

Variable     Factor1   Factor2
100m          -0.125     0.256
200m          -0.490     0.994
400m          -0.044     0.104
800m          -0.011     0.054
1500m          0.003     0.056
5000m          0.558    -0.209
10,000m        0.761    -0.423
Marathon       0.089    -0.051

[Score Plot of 100m, ..., Marathon (MLE, m = 2): second factor versus first factor]

The results from the two solution methods are very similar.
Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events load highly on this factor. The second factor appears to contrast the shorter running events with the longer events although the nature of the contrast is a bit different for the two methods. For the principal component method, the 100m, 200m and 400m events have positive loadings and the 800m, 1500m, 5000m, 10,000m and marathon events have negative loadings. For the maximum likelihood method, the 100m, 200m, 400m, 800m and 1500m events are in one group (positive loadings) and the 5000m, 10,000m and marathon are in the other group (negative loadings). Nevertheless, this bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. The second factor might be classified as a "running speed" factor. Note, for both factors, the remaining running events in each case have moderately large loadings on the factor. The two factor solution accounts for 89%-92% (depending on solution method) of the total variance. The plots of the factor scores indicate that observations #46 (Samoa) and #11 (Cook Islands) are outliers. The factor analysis of the men's track records is very much the same as that for the women's track records in Exercise 9.28.

9.31 The covariance matrix S for the running events measured in meters/second is given below. Since all the running event variables are now on a commensurate measurement scale, it is likely a factor analysis of S will produce nearly the same results as a factor analysis of the correlation matrix R. The results for an m = 2 factor analysis of S using the principal component method are shown below. A factor analysis of R follows.

Covariances: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 5000m/s, 10,000m/s, ...
                 100m/s      200m/s      400m/s      800m/s     1500m/s     5000m/s   10,000m/s   Marathonm/s
100m/s        0.0434979
200m/s        0.0482772   0.0648452
400m/s        0.0434632   0.0558678   0.0688217
800m/s        0.0314951   0.0432334   0.0428221   0.0468840
1500m/s       0.0425034   0.0535265   0.0537207   0.0523058   0.0729140
5000m/s       0.0469252   0.0587731   0.0617664   0.05715?0   0.0766388   0.0959398
10,000m/s     0.0448325   0.0572512   0.0599354   0.0553945   0.0745719   0.0937357   0.0942894
Marathonm/s   0.0431256   0.0562945   0.0567342   0.0541911   0.0736518   0.0905819   0.0909952   0.0979276

("?" marks a digit that is not legible.)

Principal Component Factor Analysis of S (m = 2)

Unrotated Factor Loadings and Communalities (the Factor2 loadings are not legible)

Variable       Factor1   Communality
100m/s          0.171       0.038
200m/s          0.219       0.061
400m/s          0.223       0.060
800m/s          0.195       0.038
1500m/s         0.256       0.066
5000m/s         0.301       0.094
10,000m/s       0.296       0.092
Marathonm/s     0.293       0.093

               Factor1   Factor2     Total
Variance       0.49405   0.04622   0.54027
% Var            0.844     0.079     0.923

Rotated Factor Loadings and Communalities
Varimax Rotation (the Factor2 loadings are not legible)

Variable       Factor1   Communality
100m/s          0.080       0.038
200m/s          0.105       0.061
400m/s          0.116       0.060
800m/s          0.151       0.038
1500m/s         0.212       0.066
5000m/s         0.273       0.094
10,000m/s       0.275       0.092
Marathonm/s     0.283       0.093

               Factor1   Factor2     Total
Variance       0.32860   0.21168   0.54027
% Var            0.562     0.362     0.923

Factor Score Coefficients

Variable       Factor1   Factor2
100m/s          -0.197    -0.377
200m/s          -0.287    -0.561
400m/s          -0.254    -0.526
800m/s           0.048    -0.078
1500m/s          0.159    -0.022
5000m/s          0.379     0.184
10,000m/s        0.415     0.240
Marathonm/s      0.489     0.334

Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events have roughly the same size loadings on this factor. The second factor appears to contrast the shorter running events (100m, 200m, 400m, 800m) with the longer events (1500m, 5000m, 10,000m, marathon). This bipolar factor might be called a "running speed-running endurance" factor.
After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 1500m through the marathon have higher loadings on this factor. The second factor might be classified as a "running speed" factor. Note, the 800m run has about equal (in absolute value) loadings on both factors and the remaining running events in each case have moderate and roughly equal loadings on the factor. The two factor solution accounts for 92% of the variance.

The correlation matrix R is shown next along with the scree plot. A two factor solution seems warranted.

Correlations: 100m/s, 200m/s, 400m/s, 800m/s, 1500m/s, 5000m/s, 10,000m/s, Marathonm/s

               100m/s   200m/s   400m/s   800m/s   1500m/s   5000m/s   10,000m/s
200m/s          0.909
400m/s          0.794    0.836
800m/s          0.697    0.784    0.754
1500m/s         0.755    0.778    0.758    0.895
5000m/s         0.726    0.745    0.760    0.852     0.916
10,000m/s       0.700    0.732    0.744    0.833     0.899     0.986
Marathonm/s     0.661    0.706    0.691    0.800     0.872     0.935      0.947

[Scree Plot of 100m/s, ..., Marathonm/s (correlation matrix): eigenvalue versus factor number]

Principal Component Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities (the loadings are not legible)

Communality: 100m/s 0.913, 200m/s 0.939, 400m/s 0.841, 800m/s 0.834, 1500m/s 0.914, 5000m/s 0.968, 10,000m/s 0.965, Marathonm/s 0.929

               Factor1   Factor2     Total
Variance        6.6258    0.6765    7.3023
% Var            0.828     0.085     0.913

Rotated Factor Loadings and Communalities
Varimax Rotation (the Factor2 loadings are not legible)

Variable       Factor1   Communality
100m/s          0.369       0.913
200m/s          0.423       0.939
400m/s          0.466       0.841
800m/s          0.741       0.834
1500m/s         0.805       0.914
5000m/s         0.882       0.968
10,000m/s       0.895       0.965
Marathonm/s     0.896       0.929

               Factor1   Factor2     Total
Variance        4.1116    3.1907    7.3023
% Var            0.514     0.399     0.913

Factor Score Coefficients

Variable       Factor1   Factor2
100m/s          -0.315    -0.566
200m/s          -0.270    -0.522
400m/s          -0.186    -0.418
800m/s           0.178    -0.004
1500m/s          0.236     0.056
5000m/s          0.341     0.178
10,000m/s        0.371     0.215
Marathonm/s      0.405     0.261

[Score Plot of 100m/s, ..., Marathonm/s (PC, m = 2): second factor versus first factor]
Maximum Likelihood Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities (the Factor2 loadings are not legible)

Variable       Factor1   Communality
100m/s          0.773       0.859
200m/s          0.806       0.957
400m/s          0.797       0.758
800m/s          0.870       0.777
1500m/s         0.928       0.865
5000m/s         0.989       0.985
10,000m/s       0.986       0.986
Marathonm/s     0.942       0.899

               Factor1   Factor2     Total
Variance        6.3380    0.7485    7.0865
% Var            0.792     0.094     0.886

Rotated Factor Loadings and Communalities
Varimax Rotation (the rotated loadings are not legible)

Communality: 100m/s 0.859, 200m/s 0.957, 400m/s 0.758, 800m/s 0.777, 1500m/s 0.865, 5000m/s 0.985, 10,000m/s 0.986, Marathonm/s 0.899

               Factor1   Factor2     Total
Variance        3.9325    3.1540    7.0865
% Var            0.492     0.394     0.886

Factor Score Coefficients

Variable       Factor1   Factor2
100m/s          -0.128     0.268
200m/s          -0.457     0.951
400m/s          -0.046     0.111
800m/s          -0.008     0.055
1500m/s          0.012     0.055
5000m/s          0.570    -0.219
10,000m/s        0.711    -0.388
Marathonm/s      0.089    -0.047

[Score Plot of 100m/s, ..., Marathonm/s (MLE, m = 2): second factor versus first factor]

The results from the two solution methods are very similar and very similar to the principal component factor analysis of the covariance matrix S. Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events load highly on this factor. The second factor appears to contrast the shorter running events with the longer events although there is some difference in the groupings depending on the solution method. The 800m and 1500m runs are in the longer group for the principal component method and in the shorter group for the maximum likelihood method. Nevertheless, this bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor.
The second factor might be classified as a "running speed" factor. Note, for both factors, the remaining running events in each case have moderately large loadings on the factor. The two factor solution accounts for 89%-91% (depending on solution method) of the total variance. The plots of the factor scores indicate that observations #46 (Samoa) and #11 (Cook Islands) are outliers.

The results of the m = 2 factor analysis of men's track records when time is measured in meters per second are very much the same as the results for the m = 2 factor analysis of R presented in Exercise 9.30. If the correlation matrix R is factor analyzed, it makes little difference whether running event time is measured in seconds (or minutes) as in Exercise 9.30 or in meters per second. It does make a difference if the covariance matrix S is factor analyzed, since the measurement scales in Exercise 9.30 are quite different from the meters/second scale.

9.32 Factor analysis of data on bulls

Variables: X3 YrHgt, X4 FtFrBody, X5 PrctFFB, X6 Frame, X7 BkFat, X8 SaleHt, X9 SaleWt

Factor analysis using sample covariance matrix S

Initial Factor Method: Principal Components

                    1            2          3        4        5        6        7
Eigenvalue     20579.6126   4874.6748   5.4292   3.3163   0.4688   0.0741   0.0045
Difference     15704.9378   4869.2456   2.1129   2.8475   0.3948   0.0695
Proportion         0.8082      0.1914   0.0002   0.0001   0.0000   0.0000   0.0000
Cumulative         0.8082      0.9996   0.9998   1.0000   1.0000   1.0000   1.0000

Factor Pattern

FACTOR1: X3 0.48777, X4 0.75367, X5 0.37408, X6 0.48170, X7 0.11083, X8 0.66769, X9 0.96506
FACTOR2 and FACTOR3 (entries in extracted order): 0.39033 0.38532 -0.00086 0.64446 0.33505 0.65725 0.62342 0.36809 -0.38394 -0.49074 0.29875 -0.26204 0.33038 0.00009

Varimax Rotated Factor Pattern

Two of the three loadings for each variable (the column headings are not recoverable): X3 0.50195 0.42460, X4 0.25853 0.90600, X5 0.83816 0.45576, X6 0.44716 0.42166, X7 -0.60974 -0.06913, X8 0.40890 0.46689, X9 -0.13508 0.30219
Remaining column (entries in extracted order): 0.32637 0.18354 0.31943 0.15478 0.33514 0.50894 0.94363

SAS scales the loadings obtained from a covariance matrix and then rotates the scaled loadings. The scaling is $\hat{\ell}_{ij}/\sqrt{s_{ii}}$.

Initial Factor Method: Maximum Likelihood

Factor Pattern

                 FACTOR1    FACTOR2    FACTOR3
X3  YrHgt        1.00000    0.00000    0.00000
X4  FtFrBody     0.62380    0.42819    0.39838
X5  PrctFFB      0.52282    0.85244    0.00000
X6  Frame        0.94025   -0.01180    0.03120
X7  BkFat       -0.34428   -0.36162    0.39308
X8  SaleHt       0.85951    0.08393    0.28992
X9  SaleWt       0.36843    0.00598    0.83599

Varimax Rotated Factor Pattern

                 FACTOR1    FACTOR2    FACTOR3
X3  YrHgt        0.94438    0.28442    0.16509
X4  FtFrBody     0.41219    0.50159    0.55648
X5  PrctFFB      0.23003    0.94883    0.21635
X6  Frame        0.88812    0.25026    0.18382
X7  BkFat       -0.25711   -0.51405    0.27102
X8  SaleHt       0.75340    0.26667    0.43720
X9  SaleWt       0.25282   -0.05273    0.87634

Factor analysis using sample correlation matrix R

Initial Factor Method: Principal Components

                  1        2        3        4        5        6        7
Eigenvalue     4.1207   1.3371   0.7414   0.4214   0.1858   0.1465   0.0471
Difference     2.7836   0.5957   0.3200   0.2356   0.0393   0.0994
Proportion     0.5887   0.1910   0.1059   0.0602   0.0265   0.0209   0.0067
Cumulative     0.5887   0.7797   0.8856   0.9458   0.9723   0.9933   1.0000

Factor Pattern

FACTOR1: X3 0.91334, X4 0.83700, X5 0.72177, X6 0.88091, X7 -0.37900, X8 0.91927, X9 0.54798
FACTOR2 and FACTOR3 (entries in extracted order): -0.04948 0.15014 -0.35794 -0.36484 0.00894 0.38772 0.48930 -0.38949 -0.03335 0.11715 -0.15210 0.21811 0.69440 0.82646

Varimax Rotated Factor Pattern (FACTOR1, FACTOR2, FACTOR3; entries in extracted order)

0.94188 0.27085 -0.06532 0.44792 0.78354 0.24262 0.26505 0.87071 0.93812 0.21799 -0.25513 -0.01382 -0.23541 -0.37460 0.79502 0.83365 0.41206 0.13094 0.74194 0.34932 0.39692

Initial Factor Method: Maximum Likelihood

Factor Pattern

                 FACTOR1    FACTOR2    FACTOR3
X3  YrHgt        1.00000    0.00000    0.00000
X4  FtFrBody     0.62380    0.42819    0.39838
X5  PrctFFB      0.52282    0.85244    0.00000
X6  Frame        0.94025   -0.01180    0.03120
X7  BkFat       -0.34428   -0.36162    0.39308
X8  SaleHt       0.85951    0.08393    0.28992
X9  SaleWt       0.36843    0.00598    0.83599

Varimax Rotated Factor Pattern

                 FACTOR1    FACTOR2    FACTOR3
X3  YrHgt        0.94438    0.28442    0.16509
X4  FtFrBody     0.41219    0.50159    0.55648
X5  PrctFFB      0.23003    0.94883    0.21635
X6  Frame        0.88812    0.25026    0.18382
X7  BkFat       -0.25711   -0.51405    0.27102
X8  SaleHt       0.75340    0.26667    0.43720
X9  SaleWt       0.25282   -0.05273    0.87634

The interpretation of the factors from R is different from the interpretation of the factors from S.

[Plot: Factor scores for the first two factors using S and varimax rotated PC estimates of factor loadings]

[Plot: Factor scores for the first two factors using R and varimax rotated PC estimates of factor loadings]

9.33 The correlation matrix R and the scree plot follow. The correlations are relatively modest. These correlations and the scree plot suggest m = 2 factors is probably too few. An initial factor analysis with m = 2 confirms this conjecture. Consequently, we give an m = 3 factor solution.

Correlations: indep, supp, benev, conform, leader

             indep     supp    benev   conform
supp        -0.173
benev       -0.561    0.018
conform     -0.471   -0.327    0.298
leader       0.187   -0.401   -0.492   -0.333

[Scree Plot of indep, ..., leader: eigenvalue versus factor number]

Principal Component Factor Analysis of R (m = 3)

Unrotated Factor Loadings and Communalities Variable indep supp benev conform leader Variance 2.1966 % Var 0.439 Fàctor2 Fact~r3 l:9 . 5 8 Q) -0.009 -0.422 cr5~m, 0.163 0.100 -0.256 1.
3682 0.7559 0.274 0.151 Communality 0.943 0.909 0.670 0.819 0.979 4.3207 0.864 249 Rotated Factor Loadings and Communalities Varimax Rotation Variable Fact~ Factor2 Factor3 indep supp benev (=0.971 0.018 -0.003 leader -0.155 ~ -0.111 Variance 1.6506 0.330 conform % Var Communali ty 0.943 0.909 0.670 0.819 0.979 0.136 -0.i12 CO:890J ~1~ (~0.41-a -0.081 O....U9" -0.379 (-OJ077 1. 3587 1. 3114 0.272 4.3207 0.864 0.262 Factor Score Coefficients Variable Factorl Factor2 Factor3 indep -0.752 -0.362 -0.147 supp 0.119 -0.129 0.690 benev 0.372 -0.127 -0.010 conform 0.073 -0.277 -0.545 leader 0.240 0.832 0.008 .. Score Plot of,indep, ...,Ieader (PC, m=3l 4 . .. . . . .. .. .. . .. '\. . -.... " -..3 . . ... . .: .. I" . .. . .., .. . .. . . . . .. . . . . ... . .. . .. ,1 0 Fii'l'actr -2 Maximum Likelihood Factor Analysis of R (m = 3) * NOTE * Heywood case Unrotated Factor Loadings Variable Factor3 Communality 1.000 -0.790 1.000 -0.086 0.532 CD indep supp benev conform leader Variance % Var and Communalities 1. 5591 0.312 1. 5486 0.310 0.194 0.000 0.589 1.000 1. 0133 4.1211 0.824 0.203 250 Rotated Factor Loadings and Communalities Varimax Rotation Variable indep supp benev conform ~ Factor2 Factor3 Communality IT 0.515 ~ -0.992 0.034 0.0 8 -O.~ ~ Gb.454) -0.980) 0.098 leader -0.129 0.9681 cg.432) .213 Variance 1. 5842 % Var 0.317 1.3199 0.264 1. 2170 0.243 1. 000 1.000 0.532 0.589 1. 000 4.1211 0.824 Factor Score Coefficients Variable Factor1 Factor2 Factor3 indep supp benev conform leader -0 .123 -0 .130 0.219 -0. 024 -1. 069 -0. 000 -0. 000 O. 000 O. 000 -0. 000 -1. 016 0.011 1. 081 0.000 -0.211 Using the unrotated loadings and including moderate loadings of magnitudes .4-.5, the factors are all bipolar and appear to be difficult to interpret. Moreover, the arangement of relatively large loadings on each factor is quite different for the two solution methods. 
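As a quick arithmetic check on the rotated principal component solution for Exercise 9.33, the "% Var" entries can be recovered by dividing each factor's variance by the number of standardized variables (p = 5). The sketch below uses only the variances printed in the output above; the variable names in the code are illustrative, not part of the original solution.

```python
# Variances of the three varimax-rotated factors (PC method, m = 3),
# copied from the printed output for Exercise 9.33.
rotated_variances = [1.6506, 1.3587, 1.3114]
p = 5  # number of variables: indep, supp, benev, conform, leader

# When R is factor analyzed the total variance equals p, so the proportion
# explained by a factor is its variance divided by p.
proportions = [round(v / p, 3) for v in rotated_variances]
total_variance = sum(rotated_variances)
total_proportion = round(total_variance / p, 3)

print(proportions)        # proportion of variance for each rotated factor
print(total_proportion)   # cumulative proportion for the m = 3 solution
```

The same division by p applies to every "% Var" row reported for a factor analysis of R in these solutions.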
The rotated loadings are consistent with one another for the two solution methods and, although all the factors are bipolar, may be easier to interpret. The first factor is a contrast between Independence and the pair Benevolence and Conformity. Perhaps this factor could be classified as a "conforming-not conforming" factor. The second factor is essentially a "leadership" factor although, if moderate loadings are included, this factor is a contrast between Leadership and Benevolence. Teenagers with above average scores on Leadership tend to be above average on this factor, while those with above average scores on Benevolence tend to be below average on this factor. Perhaps we could label this factor a "lead-follow" factor. The third factor is essentially a "support" factor although, again, if moderate loadings are used, this factor is a contrast between Support and Conformity. To our minds, however, the latter is difficult to interpret. The factor scores for the first two factors are similar for the two solution methods. No outliers are immediately evident.

9.34 A factor analysis of the paper property variables with either S or R suggests that an m = 1 factor solution is reasonable. All variables load highly on a single factor. The covariance matrix S and correlation matrix R follow along with a scree plot using R. For completeness, the results for an m = 2 factor solution using both solution methods are also given. Plots of factor scores from the two factor model suggest that observations 58, 59, 60 and 61 may be outliers.

Covariances: BL, EM, SF, BS

        BL         EM         SF         BS
BL   8.302871
EM   1.886636   0.513359
SF   4.147318   0.987585   2.140046
BS   1.972056   0.434307   0.987966   0.480272

Correlations: BL, EM, SF, BS

        BL      EM      SF
EM   0.914
SF   0.984   0.942
BS   0.988   0.875   0.975

Principal Component Factor Analysis of S (m = 1)

Unrotated Factor Loadings and Communalities

Variable   Factor1   Communality
BL          2.878       8.285
EM          0.664       0.441
SF          1.449       2.101
BS          0.684       0.468

Variance   11.295      11.295
% Var       0.988       0.988

Factor Score Coefficients

Variable   Factor1
BL          0.734
EM          0.042
SF          0.188
BS          0.042

The first factor explains 99% of the total variance. All variables, given their measurement scales, load highly on this factor. Note: There is no factor rotation with one factor.

Principal Component Factor Analysis of R (m = 1)

Unrotated Factor Loadings and Communalities

Variable   Communality
BL            0.984
EM            0.905
SF            0.991
BS            0.960

           Factor1    Total
Variance    3.8395   3.8395
% Var       0.960    0.960

Factor Score Coefficients

Variable   Factor1
BL          0.258
EM          0.248
SF          0.259
BS          0.255

The first factor explains 96% of the variance. All variables load highly and about equally on this factor. This factor might be called a "paper properties index."

Maximum Likelihood Factor Analysis of R (m = 1)
* NOTE * Heywood case

Unrotated Factor Loadings and Communalities

Variable   Factor1   Communality
BL          1.000       1.000
EM          0.914       0.835
SF          0.984       0.968
BS          0.988       0.975

Variance    3.7784      3.7784
% Var       0.945       0.945

Factor Score Coefficients

Variable   Factor1
BL          1.000
EM          0.000
SF          0.000
BS          0.000

The results are similar to the results for the principal component method. The first factor explains about 95% of the variance and all variables load highly and about equally on this factor. Again, the factor might be called a "paper properties index."

Principal Component Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities

Variable   Factor1   Factor2   Communality
BL          0.992    -0.098      0.993
EM          0.951     0.307      0.999
SF          0.996    -0.008      0.991
BS          0.980    -0.191      0.996

Variance    3.8395    0.1403     3.9798
% Var       0.960     0.035      0.995

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Factor1   Factor2   Communality
BL          0.817     0.571      0.993
EM          0.522     0.852      0.999
SF          0.761     0.642      0.991
BS          0.868     0.493      0.996

Variance    2.2717    1.7082     3.9798
% Var       0.568     0.427      0.995

Factor Score Coefficients

Variable   Factor1   Factor2
BL          0.650    -0.361
EM         -1.235     1.821
SF          0.232     0.128
BS          1.013    -0.868

[Figure: plot of factor scores for the m = 2 principal component solution; observations 58, 59, 60 and 61 are labeled]
Using the unrotated loadings, the second factor explains very little of the variance beyond that of the first factor and is not needed. Since the unrotated loadings provide a clear interpretation of the first factor, there is no need to consider the rotated loadings. The potential outliers are evident in the plot of factor scores.

Maximum Likelihood Factor Analysis of R (m = 2)
* NOTE * Heywood case

Unrotated Factor Loadings and Communalities

Variable   Factor1   Factor2   Communality
BL          0.988     0.103      0.986
EM          0.875     0.485      1.000
SF          0.975     0.185      0.984
BS          1.000     0.000      1.000

Variance    3.6900    0.2800     3.9700
% Var       0.922     0.070      0.992

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Factor1   Factor2   Communality
BL          0.809     0.576      0.986
EM          0.523     0.853      1.000
SF          0.757     0.641      0.984
BS          0.870     0.492      1.000

Variance    2.2572    1.7128     3.9700
% Var       0.564     0.428      0.992

Factor Score Coefficients

Variable   Factor1   Factor2
BL         -0.000    -0.000
EM         -1.016     1.795
SF         -0.000    -0.000
BS          1.759    -1.078

The results are similar to the results for the principal component method. Using the unrotated loadings, the first factor explains 92% of the total variance and the second factor explains very little of the remaining variance. Since the unrotated loadings provide a clear interpretation of the first factor (paper properties index), there is no need to consider the rotated loadings. The same potential outliers are evident in the plot of factor scores.

9.35 A factor analysis of the pulp fiber characteristic variables with S and R for m = 1 and m = 2 factors is summarized below. The covariance matrix S and correlation matrix R follow along with a scree plot using R. Plots of factor scores from the two factor model suggest that observations 60 and 61 and possibly observations 57, 58 and 59 may be outliers. An m = 1 factor solution using R appears to be the best choice.
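The covariance and correlation matrices reported for Exercise 9.35 can be cross-checked against each other, since each correlation is r_ik = s_ik / sqrt(s_ii s_kk). The following is a minimal sketch in plain Python. It assumes the printed covariance entries fill the lower triangle of S in the variable order AFL, LFF, FFF, ZST (that arrangement is a reconstruction, and the function name is illustrative). Because the ZST variance is printed to only two significant digits, agreement with the printed correlations is expected only to about two or three decimals.

```python
import math

names = ["AFL", "LFF", "FFF", "ZST"]

# Sample covariance matrix S for the pulp fiber variables.
# Assumption: the printed entries are arranged as the symmetric matrix below.
S = [
    [ 0.06227,    3.35980,   -3.21404,  0.00577],
    [ 3.35980,  221.05161, -185.63707,  0.34760],
    [-3.21404, -185.63707,  308.39989, -0.40633],
    [ 0.00577,    0.34760,   -0.40633,  0.00087],
]

def cov_to_corr(cov):
    """Return the correlation matrix with r_ik = s_ik / sqrt(s_ii * s_kk)."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][k] / (sd[i] * sd[k]) for k in range(len(cov))]
            for i in range(len(cov))]

R = cov_to_corr(S)
for name, row in zip(names, R):
    # Print each row of R rounded to three decimals for comparison
    print(name, [round(r, 3) for r in row])
```

The very different diagonal entries of S (about 0.001 for ZST versus about 308 for FFF) are the reason the solution prefers to work with R for this exercise.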
Covariances: AFL, LFF, FFF, ZST

         AFL         LFF          FFF         ZST
AFL    0.06227
LFF    3.35980   221.05161
FFF   -3.21404  -185.63707   308.39989
ZST    0.00577     0.34760    -0.40633    0.00087

Correlations: AFL, LFF, FFF, ZST

         AFL      LFF      FFF
LFF    0.906
FFF   -0.733   -0.711
ZST    0.784    0.793   -0.785

Principal Component Factor Analysis of S (m = 1)

Unrotated Factor Loadings and Communalities

Variable   Communality
AFL            0.047
LFF          175.573
FFF          279.858
ZST            0.001

           Factor1    Total
Variance    455.48   455.48
% Var        0.860    0.860

Factor Score Coefficients

Variable   Factor1
AFL         0.000
LFF         0.433
FFF        -0.645
ZST         0.000

The first factor explains 86% of the total variance and represents a contrast between FFF (with a negative loading) and the AFL, LFF and ZST group, all with positive loadings. AFL (average fiber length), LFF (long fiber fraction) and ZST (zero span tensile strength) may all have to do with paper strength, while FFF (fine fiber fraction) may have something to do with paper quality. Perhaps we could label this factor a "strength-quality" factor.

Principal Component Factor Analysis of R (m = 1)

Unrotated Factor Loadings and Communalities

Variable   Communality
AFL           0.877
LFF           0.870
FFF           0.770
ZST           0.841

           Factor1    Total
Variance    3.3577   3.3577
% Var       0.839    0.839

Factor Score Coefficients

Variable   Factor1
AFL         0.279
LFF         0.278
FFF        -0.261
ZST         0.273

The first factor explains 84% of the variance and the pattern of loadings is consistent with that of the m = 1 factor analysis of the covariance matrix S. Again, we might label this bipolar factor a "strength-quality" factor.

Maximum Likelihood Factor Analysis of R (m = 1)

Unrotated Factor Loadings and Communalities

Variable   Communality
AFL           0.900
LFF           0.894
FFF           0.614
ZST           0.717

           Factor1    Total
Variance    3.1245   3.1245
% Var       0.781    0.781

Factor Score Coefficients

Variable   Factor1
AFL         0.422
LFF         0.394
FFF        -0.090
ZST         0.132

The first factor explains 78% of the variance and the pattern of loadings is consistent with that of the m = 1 factor analysis of the correlation matrix R using the principal component method.
Again, we might label this bipolar factor a "strength-quality" factor.

Because the different measurement scales make the factor loadings obtained from the covariance matrix difficult to interpret, we continue with a factor analysis of the correlation matrix R with m = 2.

Principal Component Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities

Variable   Factor2       Communality
AFL         0.256           0.942
LFF         0.288           0.953
FFF        (illegible)      0.949
ZST        (illegible)      0.863

            Factor1   Factor2    Total
Variance     3.3577    0.3493   3.7070
% Var        0.839     0.087    0.927

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Communality
AFL           0.942
LFF           0.953
FFF           0.949
ZST           0.863

            Factor1   Factor2    Total
Variance     2.0176    1.6893   3.7070
% Var        0.504     0.422    0.927

Factor Score Coefficients (Factor1, Factor2), values as printed for AFL, LFF, FFF, ZST; the row and column assignment is not recoverable:
0.696 0.757 0.613 -0.082 0.359 0.429 1.075 -0.501

[Figure: plot of factor scores (PC, m = 2); horizontal axis: first factor]

Maximum Likelihood Factor Analysis of R (m = 2)

Unrotated Factor Loadings and Communalities

Variable   Factor2                            Communality
AFL        -0.205                                0.876
LFF        -0.292                                0.943
FFF        -0.38? (last digit illegible)         0.944
ZST         0.033                                0.752

            Factor1   Factor2    Total
Variance     3.2351    0.2796   3.5146
% Var        0.809     0.070    0.879

Rotated Factor Loadings and Communalities
Varimax Rotation

Variable   Communality
AFL           0.876
LFF           0.943
FFF           0.944
ZST           0.752

            Factor1   Factor2    Total
Variance     2.0124    1.5023   3.5146
% Var        0.503     0.376    0.879

Factor Score Coefficients (Factor1, Factor2), values as printed for AFL, LFF, FFF, ZST; the row and column assignment is not recoverable:
-0.101 0.336 0.922 0.534 0.049 -0.423 -1.197 0.076

[Figure: plot of factor scores (MLE, m = 2); horizontal axis: first factor]

Examining the unrotated loadings for both solution methods, we see that the second factor explains little (about 7%-8%) of the remaining variance. Also, this factor has moderate to very small loadings on all the variables with the possible exception of variable FFF. If retained, this factor might be called a "fine fiber" or "quality" factor.
Using the rotated loadings, the second factor looks much like the first factor for both solution methods. That is, this factor appears to be a contrast between variable FFF and the group of variables AFL, LFF and ZST. To summarize, there seems to be no gain in understanding from adding a second factor to the model. A one factor model appears to be sufficient in this case. However, plots of the factor scores for m = 2 suggest observations 60, 61 and, perhaps, observations 57, 58 and 59 may be outliers.

9.36 The correlation matrix R and the scree plot are shown below. After m = 2 there is no sharp elbow in the scree plot and the plot falls off almost linearly. Potential choices for m are 2, 3, 4 and 5. We give the results for m = 4 but, to our minds, here is a case where a factor model is not particularly well defined.

Correlations: Family, DistRd, Cotton, Maze, Sorg, Millet, Bull, Cattle, Goats
Bull Cattle Sorg Millet Maze Family DistRd Cotton DistRd -0.084 0.028 0.724 Cotton 0.730 0.679 -0.054 Maze 0.109 0.383 0.568 -0.071 Sorg Millet 0.506 0.022 0.389 0.217 0.382 0.353 0.443 0.623 0.765 0.727 -0.088 Bull Cattle 0.336 -0.063 0.175 0.197 0.404 0.081 0.520 0.560 0.357 0.305 0.424 0.136 0.031 0.399 0.484 Goats

[Figure: Scree plot of Family, ..., Goats; horizontal axis: factor number (1 to 9)]

Principal Component Factor Analysis of R (m = 4) F~ unrotated Factor Loadings and Communalities Variable Factor3 0.903 Family DistRd Cotton tt~~ -0. 0 -0.068 0.175 Maze -0.070 -0.396 sorg Millet Bull Cattle 0.125 0.286 -0.178 Goa ts % Var Rotated 1. 0581 4.1443 0.460 Variance 0.118 Family DistRd Cotton Maze Sorg Millet Bull Cattle Goa ts Variance % Var F~ ~ 0.714' -0.026 \I .951í 8J11 0.092 0.226 Factor2 -0. 7 . 0.320 -0.022 0.150 ~ ~~~ 0.006 -0.301 o 564 -0.863 i -0.026 -0.210 0.148 0.180 0.535 0.879 0.629 EO.~6) 2.7840 0.309 1.8985 0.211 1.6476 0.183 (0'. 724J ~~ 7.3593 0.818 0.9205 0.102 and Communalities Factor Loadings Varimax Rotation Variable ~ Factor4 Communality 0.842 -0.118 0.974 0.851 . 
28 0.907 0.158 0.706 0.798 -0.582 0.856 0.811 0:466 0.614 0.108 Factor4 communality 0.842 0.080 0.974 61.986ì 0.851 -0.076 0.907 0.047 0.706 0.112 0.798 -0.029 0.856 0.043 0.811 0.074 0.614 -0.145 7.3593 0.818 1. 0291 0.114 Factor Score Coefficients Factor4 Variable Factor1 Factor2 Factor3 o .Oti3 -0.171 -0.013 0.197 Family -0.963 0.030 0.042 0.014 DistRd -0.090 -0.115 -0.024 0.344 Cotton 0.023 0.247 -0.165 0.494 Maze O.HlO -0.374 0.246 -0.199 Sorg -0.001 -0.078 -0.260 -0.697 Millet 0.005 0.110 0.204 0.224 Bull 0.019 0.329 0.633 -0.063 Cattle -0.164 -0.156 0.338 -0.114 Goats SCOff: PlolõfFamily" mjGoats (PC, m=4, . 'l3'! . . t:i.r . ... . '..~":;.. , . e. . , -1 o .. · .ft :;7 -#.¡S' 1 Firs 'i .,t. -lll.~' Factor 2 3 4 262 Maximum Likelihood Factor Analysis ofR (m = 4) unrotated Factor Loadings and Communalities Factor3 Factor4 Communality 0.837 -0.162 -0.374 0.009 -0.044 -0.003 0.782 -0.044 -0.307 0.990 0.025 0.649 -0.071 r=i0~5 Q112 21) 0.369 -0.361 -0.301 0.962 0.131 -0.096 0.869 -0.074 0.465 -0.109 -rr.151 64~ì Variable F~~~~ FamilY - .064 DistRd Cotton a1 ~ 0.980 0.211 Maze Sorg Millet Bull Cattle 0.746 0.290 0.249 Goa ts 2.9824 0.331 Variance % Var Rotated 1. 7047 0.189 ~ Factor Loadings and Varirnax Rotation Variable - .605 FamilY 0.017 DistRd Cotton -0.362 -0.034 Maze Sorg -0.558 fj~' rff Millet Bull Cattle -0.324 -0.15tì C=Ô.466 Goa ts Variance % Var 2.2098 0.246 1.7035 0.189 0.6610 0.073 5.9322 0.659 0.5841 0.065 communalities i ty Factor3 Factor4 Communal 0.837 -0.148 0.229 0.009 0.025 -0.081 0.782 -0.370 0.075 0.990 -0.016 0.166 0.649 -0.089 0.303 0.369 -0.028 -0.120 0.915 fO.4il 0.268 E§:m ~:$ 0.962 0.869 0.4ti5 1. 2850 0.7340 0.082 5.9322 0.ti59 0.143 Factor Score Coefficients Factor3 Factor4 Variable Factor1 Factor2 0.247 -0.078 -0.606 0.013 FamilY -0.002 -0.009 -0.002 0.001 DistRd -0.161 -0.162 -0.113 0.033 cotton 0.681 0.109 0.440 0.995 Maze 0.206 0.017 -0.404 -0.023 Sorg 0.052 -0.062 -0.185 0.003 Millet -1. 
426 0.103 0.215 -0.026 Bull 0.385 0.896 0.091 -0.141 Cattle -0.023 -0.010 -0.093 -0.009 Goats

[Figure: plot of factor scores (maximum likelihood, m = 4); horizontal axis: first factor]

The two solution methods for m = 4 factors produce somewhat different results. The patterns for unrotated loadings on the first two factors are similar but not identical. The patterns of loadings for the two solution methods on the third and fourth factors are quite different. Notice that DistRd does not load on any factor in the maximum likelihood solution.

The factor loading patterns are more alike for the two solution methods using the rotated loadings, although factors 2 and 3 in the principal component solution appear to be reversed in the maximum likelihood solution. The rotated loadings on factor 4 for the two methods are quite different. Again, DistRd does not load on any factor in the maximum likelihood solution; it appears to define factor 4 in the principal component solution. (From R we see that DistRd is not correlated with any of the other variables.)

Variables Family, Cotton, Maze, and Bullocks load highly on the first factor. The variables Family, Sorghum, Millet and Goats load highly on the second factor (maximum likelihood solution) or the third factor (principal component solution). Growing cotton and maze is labor intensive and bullocks are helpful. The first factor might be called a "family farm-row crop" factor. Millet and sorghum are grasses and may provide feed for goats. Consequently, the second (or third in the case of the principal component solution) factor might be called a "family farm-grass crop" factor. The third factor in the maximum likelihood solution (second factor in the principal component solution) may have different interpretations depending on the solution method but, in both cases, Bullocks and Cattle load highly on this factor. Perhaps this factor could be called a "livestock" factor.
The rotated loadings are considerably different on the fourth factor. This factor is clearly a "distance to the road" factor in the principal component solution. The interpretation is not clear in the maximum likelihood solution.

The fact that the two solution methods produce somewhat different results and explain quite different proportions of the total variation (82% for principal components, 66% for maximum likelihood) reinforces the notion that a linear factor model is not particularly well defined for this problem. Plots of factor scores for the first two factors indicate there are several potential outliers. If these observations are removed, the results could change.

Chapter 10

10.1
$$\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} = \begin{bmatrix} 0 & 0 \\ 0 & (.95)^2 \end{bmatrix}$$
which has eigenvalues $\rho_1^{*2} = (.95)^2$ and $\rho_2^{*2} = 0$. The normalized eigenvectors are $e_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $e_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Thus
$$U_1 = e_1' \Sigma_{11}^{-1/2} X^{(1)} = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} .1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_1^{(1)} \\ X_2^{(1)} \end{bmatrix} = X_2^{(1)}$$
Since $f_1' \Sigma_{22}^{-1/2} = \begin{bmatrix} 1 & 0 \end{bmatrix}$, $V_1 = X_1^{(2)}$. Thus $U_1 = X_2^{(1)}$, $V_1 = X_1^{(2)}$ and $\rho_1^* = .95$.

10.2
a) $\rho_1^* = .55$, $\rho_2^* = .49$
b) $U_1 = .32X_1^{(1)} - .36X_2^{(1)}$, $V_1 = .36X_1^{(2)} - .10X_2^{(2)}$
   $U_2 = .20X_1^{(1)} + .30X_2^{(1)}$, $V_2 = .23X_1^{(2)} + .30X_2^{(2)}$

10.5
a) $$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \rho_{11}^{-1}\rho_{12}\rho_{22}^{-1}\rho_{21} = \begin{bmatrix} .45189 & .28919 \\ .14633 & .17361 \end{bmatrix}$$
$$\begin{vmatrix} .45189 - \lambda & .28919 \\ .14633 & .17361 - \lambda \end{vmatrix} = \lambda^2 - .5461\lambda + .0005 = (\lambda - .5457)(\lambda - .0009)$$
The characteristic equation is the same as that of $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}$ (see Example 10.1) and consequently the eigenvalues are the same.
b) $U_2 = -.677Z_1^{(1)} + 1.055Z_2^{(1)}$, $V_2 = -.863Z_1^{(2)} + .706Z_2^{(2)}$
   $\mathrm{Var}(U_2) = (-.677)^2 + (1.055)^2 - 2(.677)(1.055)(.4) = 1.0$
   $\mathrm{Var}(V_2) = 1.0$
   $\mathrm{Corr}(U_2, V_2) = (-.677)(-.863)(.5) + (-.863)(1.055)(.3) + (.706)(-.677)(.6) + (.706)(1.055)(.4) = .03 = \rho_2^*$

10.7
a) For $0 < \rho < 1$, $\rho_1^* = \dfrac{2\rho}{1 + \rho}$,
   $U_1 = \dfrac{1}{\sqrt{2(1+\rho)}}\left(X_1^{(1)} + X_2^{(1)}\right)$, $V_1 = \dfrac{1}{\sqrt{2(1+\rho)}}\left(X_1^{(2)} + X_2^{(2)}\right)$

10.8
c) $\hat{\rho}_1^* = .72$, $\hat{V}_1 = .20X_1^{(2)} + .70X_2^{(2)}$, $\theta = 45^\circ = \pi/4$ radians
d) $\hat{\rho}_1^* = .57$.
$\hat{U}_1 = 1.03\cos\theta_1 + .46\sin\theta_1$, $\hat{V}_1 = .49\cos\theta_2 + .78\sin\theta_2$

10.9
a) $\hat{\rho}_1^* = .39$, $\hat{\rho}_2^* = .07$
   $\hat{U}_1 = 1.26z_1^{(1)} - 1.03z_2^{(1)}$, $\hat{U}_2 = .30z_1^{(1)} + .79z_2^{(1)}$
   $\hat{V}_1 = 1.10z_1^{(2)} - .45z_2^{(2)}$, $\hat{V}_2 = -.02z_1^{(2)} + 1.01z_2^{(2)}$
b) $n = 140$, $p = 2$, $q = 2$, $n - 1 - \tfrac{1}{2}(p + q + 1) = 136.5$

Null hypothesis                                      Value of test statistic                 Degrees of freedom   Upper 5% point of $\chi^2$ distribution
$H_0: \Sigma_{12} = \rho_{12} = 0$                   $-136.5\ln[(.8444)(.9953)] = 23.74$     4                    9.48
$H_0^{(1)}: \rho_1^* \neq 0,\ \rho_2^* = 0$          $-136.5\ln(.9953) = .65$                1                    3.84

Therefore, reject $H_0$ but do not reject $H_0^{(1)}$. Reading ability (summarized by $\hat{U}_1$) does correlate with arithmetic ability (summarized by $\hat{V}_1$) but the correlation (represented by $\hat{\rho}_1^* = .39$) is not particularly strong.

10.10
a) $\hat{\rho}_1^* = .33$, $\hat{\rho}_2^* = .17$
b) $\hat{U}_1 = 1.002z_1^{(1)} - .003z_2^{(1)}$, $\hat{V}_1 = -.602z_1^{(2)} - .977z_2^{(2)}$
   $\hat{U}_1 \approx z_1^{(1)}$ = 1973 nonprimary homicides (standardized)
   $\hat{V}_1$: $z_1^{(2)} + z_2^{(2)}$ = a "punishment index"
   Punishment appears to be correlated with nonprimary homicides but not primary homicides.

10.11 Using the correlation matrix R and standardized variables, the canonical correlations and canonical variables follow. The $Z^{(1)}$'s are the banks; the $Z^{(2)}$'s are the oil companies.

$\hat{\rho}_1^* = .348$, $\hat{\rho}_2^* = .130$

$\hat{U}_1 = -.539z_1^{(1)} + 1.209z_2^{(1)} + .079z_3^{(1)}$
$\hat{U}_2 = 1.142z_1^{(1)} - .410z_2^{(1)} + .142z_3^{(1)}$
$\hat{V}_1 = 1.160z_1^{(2)} - .261z_2^{(2)}$
$\hat{V}_2 = -.728z_1^{(2)} + 1.345z_2^{(2)}$

Additional correlations:
$R_{\hat{U}_1, Z^{(1)}} = (.266\ \ .913\ \ .498)$, $R_{\hat{V}_1, Z^{(2)}} = (.982\ \ .532)$
$R_{\hat{U}_1, Z^{(2)}} = (.342\ \ .185)$, $R_{\hat{V}_1, Z^{(1)}} = (.093\ \ .318\ \ .174)$

Here $H_0: \Sigma_{12}\ (\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \neq 0,\ \rho_2^* = 0$ is not rejected at the 5% level. The first canonical correlation, although relatively small, is significant. The second canonical correlation is not significant. Focusing attention on the first pair of canonical variables, $\hat{U}_1$ is dominated by Citibank and $\hat{V}_1$ is dominated by Royal Dutch Shell. The canonical correlation (.348) between $\hat{U}_1$ and $\hat{V}_1$ suggests there is not much co-movement between the rates of return for the banks on one hand and the oil companies on the other.
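The large-sample likelihood ratio tests used in Exercises 10.9 and 10.11 can be reproduced numerically. The sketch below recomputes the two statistics of Exercise 10.9 from the printed factors (.8444) and (.9953), taken here to be the values of 1 - (canonical correlation)^2, and compares them with the 5% critical values quoted in the solution. The function name and argument layout are illustrative assumptions, not part of the original solution.

```python
import math

def lr_statistic(n, p, q, one_minus_rho_sq):
    """Statistic -(n - 1 - (p + q + 1)/2) * ln(product of the given 1 - rho_i^2)."""
    scale = n - 1 - 0.5 * (p + q + 1)
    return -scale * sum(math.log(f) for f in one_minus_rho_sq)

n, p, q = 140, 2, 2
factors = [0.8444, 0.9953]  # 1 - rho_i^2 for i = 1, 2, as printed for 10.9

# H0: Sigma_12 = 0 uses both factors; d.f. = p * q = 4
stat_all = lr_statistic(n, p, q, factors)
# H0(1): rho_1* != 0, rho_2* = 0 drops the first factor; d.f. = (p - 1)(q - 1) = 1
stat_second = lr_statistic(n, p, q, factors[1:])

# Upper 5% chi-square points quoted in the solution (4 and 1 d.f.)
crit_all, crit_second = 9.48, 3.84

print(round(stat_all, 2), stat_all > crit_all)           # statistic, reject H0?
print(round(stat_second, 2), stat_second > crit_second)  # statistic, reject H0(1)?
```

Small differences in the last digit relative to the printed values are consistent with the round-off noted in the preface.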
Moreover, Û i is not highly correlated with any of the Z2).S (oil companies) and ~ is not the Zl)'s (banks). The first canonical varables highly correlated with any of differentiate stocks in different industries with some, but not much, overlap. A* a) ID.12- Pl = .69, Reject Ho: A* P12 = 0 a t the 5: level but do Ui = not reject * H~I) = pi 4: 0, b) 268 P2 :I .19 P2 = a a t the 5: level. . 77zI i ) +. 27Z~ 1 ) A VI = .oszI 2) +. 90Z~ 2) + .19Z~ 2) c) Sample Correla tions Vari ables xU) Variables - Be tween Original Variables and Canonical A .. Ui Vi A X(2) . Variables Ui A Vi .1 i. annua 1 frequency .99 .68 1. of restaurant dining 2. annua 1 frequency of a ttendi n9 movi es age of head of .29 .42 househol d 2. .89 .61 3. annual fami ly income .68 .98 educa ti ona 1 1 eve 1 .35 .S1 of head of household d) - U1 is a measure of family entertainment outside the home. VI may be considered a measure of family MstatuS" which is domin- ated by family income. Essentially, family entertainment outside the home is positively associated with family income. a) 10.13 ,.P1 = .909, "' P2 = . 636, ?3 = .256, Va 1 ue of test statistic Null hypothesi s 1. Ho: L12=P12=0 2 Ho: Pi *0, P2=." = P4=0 3. Ho: PI *0, P2 *0, P3=O, ~4 -- .094 Degrees Conclusion of freedom at a level 309.98 20 Reject H 78.63 12 Reject H 16.81 6 0 00 not r.eject Ho. P4=O 0 269 Z(1) i Z(1 ) 2 A .30J i~~J' r:~ -::: -::: -:;: .55 z(l ) 3 Z(1) 4 Z(1) 5 Z(2) 1 .46 .03) G~J' G::: -::: Z(2) 2 .98 -.18 Z(2) 3 Z(2) 4 A b) U1 appears to measure qual i ty of wheat as a "contrast" between "good" aspects (Zl1), zll) and z¡i)) and "bad" aspects (Z3 (l! Z4 (1) ). Vi ; s harder to interpret. It appears to measure the quality of the flour as represented by z12), z~2) and z~2). 270 10.14 a) pi = 0.7520, pi = 0.5395. 
And the sample canonical variates are U1 U2 Raw Canonical Coetticients tor the Accounting measures ot protitability BRA -0.494697741 1.9655018549 RRE 0.2133051339 -0.794353012 HR 0.7228316516 -0.538822808 RRA 2.7749354333 -4.38346956 RR -1 .383659039 1.6471230054 RR -1.032933813 2.6190103052 V1 V2 Raw Canonical Coetticients tor the Market measures ot protitability Q 1.3930601511 -2.500804367 REV -0.431692979 2.8298904995 U1 is most highly correlated with RRA and HRA and also HRS and RRS. Ví is highly correlated with both of its components. The second pair does not correlate well with their respective components.. b) Standardized Variance ot the Accounting measures ot protitability Explained by Their elm The epposi te Canonical Variables Cumulative Proportion Proportion 1 0.6041 0.5041 2 O. 0906 o . 6946 Canonical Variables Canonical R-Squared 0.5655 0.2910 CWlulati ve Proportion Proportion 0.2851 0.0263 0.2851 0.3114 Standardized Variance ot the Market measures ot protitability Explained by Their eim The epposi te Canonical Variables Canonical Variables Cumulative Canonical Cumulative Proportion Proportion R-Squared Proportion Proportion 1 0.8702 0.8702 0.5665 0.4921 0.4921 2 0.1298 1.0000 0.2910 0.0378 0.5299 Market measures can be well explained by its canonical variate 'C. However, accounting meaures cannot be well explained. In fact, from the correlation between measures and canonical variates, accounting measures on equity have weak correlation with Ûi. Correlations Between the Accounting measures ot protitabili ty and Their Canonical Variables U1 U2 BRA 0.8110 0.2711 HR 0 . 4225 0.0968 BR 0.7184 0.5626 RRA 0 .S60S .. .OOag 271 RRE 0.6741 -0.09S9 RR 0.7761 0.3814 Correlations Betveen the Market measures ot pro~itability and Their Canonical Variables V1 V2 Q 0.9886 0.1508 REV 0.8736 0.4866 10.15 pi = 0.9129, p; = 0.0681. 
And the sample canonical variates are

Raw Canonical Coefficients for the dynamic measures

            U1               U2
X1      0.0036016621    -0.006663216
X2     -0.000696736      0.0077029513

Raw Canonical Coefficients for the static measures

            V1               V2
X3      0.0013448038     0.008471036
X4      0.0018933921    -0.007828962

Standardized Variance of the dynamic measures Explained by

        Their Own Canonical Variables    Canonical    The Opposite Canonical Variables
        Proportion    Cumulative         R-Squared    Proportion    Cumulative
1       0.8840        0.8840             0.8334       0.7367        0.7367
2       0.1160        1.0000             0.0046       0.0006        0.7373

Standardized Variance of the static measures Explained by

        Their Own Canonical Variables    Canonical    The Opposite Canonical Variables
        Proportion    Cumulative         R-Squared    Proportion    Cumulative
1       0.9601        0.9601             0.8334       0.8002        0.8002
2       0.0399        1.0000             0.0046       0.0002        0.8003

Static measures are well explained by the canonical variate $\hat U_1$. Dynamic measures are likewise well explained by the canonical variate $\hat V_1$.

10.16 From the computer output below, the first two canonical correlations are $\hat\rho_1^* = 0.517345$ and $\hat\rho_2^* = 0.125508$. The large sample tests

$$-\left(n - 1 - \tfrac{1}{2}(p + q + 1)\right)\ln\left((1 - \hat\rho_1^{*2})(1 - \hat\rho_2^{*2})\right) \ge \chi^2_{pq}(.05)$$

or

$$-\left(46 - 1 - \tfrac{1}{2}(3 + 2 + 1)\right)\ln\left((1 - (.517345)^2)(1 - (.125508)^2)\right) = 13.75 \ge \chi^2_6(.05) = 12.59$$

and

$$-\left(n - 1 - \tfrac{1}{2}(p + q + 1)\right)\ln\left(1 - \hat\rho_2^{*2}\right) \le \chi^2_{(p-1)(q-1)}(.05)$$

or

$$-\left(46 - 1 - \tfrac{1}{2}(3 + 2 + 1)\right)\ln\left(1 - (.125508)^2\right) = 0.667 \le \chi^2_2(.05) = 5.99$$

suggest that only the first pair of canonical variables is important. Even if the variable means were given, we prefer to interpret the canonical variables obtained from S in terms of coefficients of standardized variables.

$$\hat U_1 = .4357 z_1^{(1)} - .7047 z_2^{(1)} + 1.0815 z_3^{(1)}$$
$$\hat V_1 = 1.020 z_1^{(2)} - .1609 z_2^{(2)}$$

The two insulin responses dominate $\hat U_1$ while $\hat V_1$ consists primarily of the relative weight variable.

Canonical Correlation Analysis

     Canonical      Adjusted Canonical    Approx            Squared Canonical
     Correlation    Correlation           Standard Error    Correlation
1    0.517345       0.517145              0.007324          0.267646
2    0.125508       0.125158              0.009843          0.015752

Raw Canonical Coefficients for the Glucose and Insulin

             PRIMARY1         PRIMARY2
GLUCOSE      0.0131006541     0.0247524811
INSULIN     -0.014438254     -0.009317525
INSULRES     0.023399723     -0.008667216

Raw Canonical Coefficients for the Weight and Fasting

             SECONDA1         SECONDA2
WEIGHT       8.0655750801    -0.375167814
FASTING     -0.019159052      0.12~675138

Standardized Canonical Coefficients for the Glucose and Insulin

             PRIMARY1    PRIMARY2
GLUCOSE      0.4357      0.8232
INSULIN     -0.7047     -0.4547
INSULRES     1.0815     -0.4006

Standardized Canonical Coefficients for the Weight and Fasting

             SECONDA1    SECONDA2
WEIGHT       1.0202     -0.0475
FASTING     -0.1609      1.0086

Correlations Between the Glucose and Insulin and Their Canonical Variables

             PRIMARY1    PRIMARY2
GLUCOSE      0.3397      0.6838
INSULIN     -0.0502     -0.4565
INSULRES     0.7551     -0.5729

Correlations Between the Weight and Fasting and Their Canonical Variables

             SECONDA1    SECONDA2
WEIGHT       0.9875      0.1576
FASTING      0.0465      0.9989

10.17 The computer output below suggests maybe two canonical pairs of variables. The canonical correlations are 0.521594, 0.375256, 0.242181 and 0.136568. $\hat U_1$ ignores the first smoking question and $\hat U_2$ ignores the third. $\hat V_1$ depends heavily on the difference of annoyance and tenseness. Even the second pairs do not explain their own variances very well: $R^2_{z^{(1)}|\hat U_2} = .1249$ and $R^2_{z^{(2)}|\hat V_2} = 0.0879$.

Canonical Correlation Analysis

     Canonical      Adjusted Canonical    Approx            Squared Canonical
     Correlation    Correlation           Standard Error    Correlation
1    0.521594       0.520771              0.007280          0.272060
2    0.375256       0.374364              0.008592          0.140817
3    0.242181       0.241172              0.009414          0.058652
4    0.136568       0.135586              0.009814          0.018651

Standardized Canonical Coefficients for the Smoking

SMOKING1 SMOKING2 SMOKING3 SMOKING4 -0.0430 1.0898 1.1161 -1.0092 SMOK2 1.1622 0.6988 -1.4170 -1.3753 0.2081 0.015£ 0.1732 SMOK3 SMOK4 0.8909 -1.6506 0.8325 SMOK1 1.6899 -0.2630

Standardized Canonical Coefficients for the Psych and Physical State

STATE1 CONCEN 0.4733 STATE2 STATE3 -0.8141 -0.4510 0.4946 0.5909 ANNOY -0.7806 SLEEP TENSE 0.2567 -0.~052 0.3800 ALERT 0.6919 -0.1451 -0.1840 0.6981 -0.4190 -1.5191 IRRITAB -0.0704 0.6255 -0.3343 TIRED 0.3127 0.5898 CONTENT 0.3364 0.4869 0.2276 0.8334 STATE4 -0.1t5'04 -0.7193 0.6246 0.4376 -0.7253 0.87£0 0.1861 -0.6557

Canonical Structure

Correlations Between the Smoking and Their Canonical Variables

SMOKING1 SMOKING2 SMOKING3 SMOKING4 SMOK3 0.4458 0.5278 0.6615 0.2917 0.7305 0.3822 0.1487 0.5461 0.2910 0.2664 0.4668 0.7915 SMOK4 0.6403 -0.0620 0.5586 0.5236 SMOK1 SMOK2

Correlations Between the Psychological and Physical State and Their Canonical Variables

            STATE1     STATE2     STATE3     STATE4
CONCEN      0.7199    -0.3579     0.0125    -0.4058
ANNOY       0.3035     0.1365     0.3906    -0.3137
SLEEP       0.5995    -0.3490     0.3709     0.2586
TENSE       0.7015     0.3305     0.0053    -0.1861
ALERT       0.7290    -0.1539    -0.1459    -0.3681
IRRITAB     0.4585     0.3342     0.1211    -0.0805
TIRED       0.6905    -0.0267     0.2544     0.0749
CONTENT     0.5323     0.4350     0.3207    -0.5601

Canonical Redundancy Analysis

Raw Variance of the Smoking Explained by

        Their Own Canonical Variables    Canonical    The Opposite Canonical Variables
        Proportion    Cumulative         R-Squared    Proportion    Cumulative
1       0.3068        0.3068             0.2721       0.0835        0.0835
2       0.1249        0.4316             0.1408       0.0176        0.1011
3       0.2474        0.6790             0.0587       0.0145        0.1155
4       0.3210        1.0000             0.0187       0.0060        0.1215

Raw Variance of the Psychological and Physical State Explained by

        Their Own Canonical Variables    Canonical    The Opposite Canonical Variables
        Proportion    Cumulative         R-Squared    Proportion    Cumulative
1       0.3705        0.3705             0.2721       0.1008        0.1008
2       0.0879        0.4583             0.1408       0.0124        0.1132
3       0.0617        0.5201             0.0587       0.0036        0.1168
4       0.1032        0.6233             0.0187       0.0019        0.1187

10.18 The canonical correlation analysis expressed in terms of standardized variables follows. The $z^{(1)}$'s are the paper characteristic variables, the $z^{(2)}$'s are the pulp fiber characteristic variables.

Canonical correlations: $\hat\rho_1^* = .917$, $\hat\rho_2^* = .817$, $\hat\rho_3^* = .265$, $\hat\rho_4^* = .092$

First three canonical variate pairs:

$$\hat U_1 = -1.505 z_1^{(1)} - .212 z_2^{(1)} + 1.998 z_3^{(1)} + .676 z_4^{(1)}$$
$$\hat V_1 = -.159 z_1^{(2)} + .633 z_2^{(2)} + .325 z_3^{(2)} + .818 z_4^{(2)}$$
$$\hat U_2 = -3.496 z_1^{(1)} - 1.543 z_2^{(1)} + 1.076 z_3^{(1)} + 3.768 z_4^{(1)}$$
$$\hat V_2 = .689 z_1^{(2)} + 1.003 z_2^{(2)} + .005 z_3^{(2)} - 1.562 z_4^{(2)}$$
$$\hat U_3 = -5.702 z_1^{(1)} + 3.525 z_2^{(1)} - 4.714 z_3^{(1)} + 7.153 z_4^{(1)}$$
$$\hat V_3 = -.513 z_1^{(2)} + .077 z_2^{(2)} - 1.663 z_3^{(2)} - .779 z_4^{(2)}$$

Additional correlations:

$R_{\hat U_1, z^{(1)}} = (.935 \;\; .887 \;\; .977 \;\; .952)$, $R_{\hat V_1, z^{(2)}} = (.817 \;\; .906 \;\; -.650 \;\; .940)$
$R_{\hat U_1, z^{(2)}} = (.749 \;\; .831 \;\; -.596 \;\; .862)$, $R_{\hat V_1, z^{(1)}} = (.858 \;\; .814 \;\; .896 \;\; .873)$

Here $H_0: \Sigma_{12}\,(\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \ne 0, \rho_2^* = \rho_3^* = \rho_4^* = 0$ is rejected at the 5% level. $H_0^{(2)}: \rho_1^* \ne 0, \rho_2^* \ne 0, \rho_3^* = \rho_4^* = 0$ is not rejected at the 5% level. The first two canonical correlations are significantly different from 0. The last two canonical correlations are not significant.

The first canonical variable $\hat U_1$ explains 88% of the total standardized variance of its set, the $z^{(1)}$'s. The first canonical variable $\hat V_1$ explains 70% of the total standardized variance of its set, the $z^{(2)}$'s. The first canonical variates are good summary measures of their respective sets of variables. Moreover, the first canonical variates, which might be labeled a "paper characteristic index" and a "pulp fiber strength-quality index", are highly correlated. There is a strong association between an index of pulp fiber characteristics and an index of the characteristics of paper made from them.

The second canonical variable $\hat U_2$ appears to be a contrast between the first two variables, breaking length and elastic modulus, and the last two variables, stress at failure and burst strength.
However, the only moderately large (in absolute value) correlation between the canonical variate and its component variables is the correlation (-.428) between $\hat U_2$ and $z_2^{(1)}$, elastic modulus. The remaining correlations are small. This canonical variable might be a "paper stretch" measure. The canonical variable $\hat V_2$ appears to be determined by all variables except $z_3^{(2)}$, fine fiber fraction. This canonical variable might be a "fiber length/strength" measure. The second pair of canonical variates is also highly correlated.

10.19 The correlation matrix R and the canonical analysis for the standardized variables follow. The $z^{(1)}$'s are the running speed events (100m, 400m, long jump), the $z^{(2)}$'s are the arm strength events (discus, javelin, shot put).

$$R_{11} = \begin{pmatrix} 1.0 & .5520 & .6386 \\ .5520 & 1.0 & .4706 \\ .6386 & .4706 & 1.0 \end{pmatrix}, \qquad R_{22} = \begin{pmatrix} 1.0 & .4179 & .7926 \\ .4179 & 1.0 & .4682 \\ .7926 & .4682 & 1.0 \end{pmatrix}$$

$$R_{12} = R_{21}' = \begin{pmatrix} .3509 & .1821 & .4752 \\ .2116 & .2539 & .2100 \\ .3998 & .3102 & .4953 \end{pmatrix}$$

Canonical correlations: $\hat\rho_1^* = .540$, $\hat\rho_2^* = .212$, $\hat\rho_3^* = .014$

Canonical variables:

$$\hat U_1 = .540 z_1^{(1)} - .120 z_2^{(1)} + .633 z_3^{(1)}, \qquad \hat V_1 = -.057 z_1^{(2)} + .043 z_2^{(2)} + 1.024 z_3^{(2)}$$
$$\hat U_2 = 1.277 z_1^{(1)} - .768 z_2^{(1)} - .773 z_3^{(1)}, \qquad \hat V_2 = -.422 z_1^{(2)} - 1.0685 z_2^{(2)} + .859 z_3^{(2)}$$
$$\hat U_3 = .399 z_1^{(1)} + .940 z_2^{(1)} - .866 z_3^{(1)}, \qquad \hat V_3 = 1.590 z_1^{(2)} - .384 z_2^{(2)} - 1.038 z_3^{(2)}$$

Additional correlations:

$R_{\hat U_1, z^{(1)}} = (.662 \;\; .160 \;\; .732)$, $R_{\hat V_1, z^{(2)}} = (.772 \;\; .498 \;\; .999)$

Here $H_0: \Sigma_{12}\,(\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \ne 0, \rho_2^* = \rho_3^* = 0$ is rejected at the 5% level. $H_0^{(2)}: \rho_1^* \ne 0, \rho_2^* \ne 0, \rho_3^* = 0$ is not rejected at the 5% level. The first and second canonical correlations are significant. The third canonical correlation is not significant.
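As a rough consistency check on the first canonical pair in 10.19, the sketch below evaluates $a'R_{11}a$, $b'R_{22}b$ and $a'R_{12}b$ for the reported coefficients. It assumes the correlation blocks are ordered (100m, 400m, long jump) and (discus, javelin, shot put); the helper name `bilinear` is hypothetical. If the coefficients are right, the two variances should be near 1 and the cross product near the first canonical correlation, up to rounding in the three-decimal coefficients.

```python
# Correlation blocks as read from the solution (ordering is an assumption).
R11 = [[1.0, 0.5520, 0.6386],
       [0.5520, 1.0, 0.4706],
       [0.6386, 0.4706, 1.0]]
R22 = [[1.0, 0.4179, 0.7926],
       [0.4179, 1.0, 0.4682],
       [0.7926, 0.4682, 1.0]]
R12 = [[0.3509, 0.1821, 0.4752],
       [0.2116, 0.2539, 0.2100],
       [0.3998, 0.3102, 0.4953]]

a1 = [0.540, -0.120, 0.633]   # coefficients of U1 on the running events
b1 = [-0.057, 0.043, 1.024]   # coefficients of V1 on the arm strength events


def bilinear(u, M, v):
    """Return u' M v for plain Python lists."""
    return sum(u[i] * M[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


var_u1 = bilinear(a1, R11, a1)   # variance of U1, expected near 1
var_v1 = bilinear(b1, R22, b1)   # variance of V1, expected near 1
rho1 = bilinear(a1, R12, b1)     # correlation of U1 and V1

print(round(var_u1, 3), round(var_v1, 3), round(rho1, 3))
```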
We might identify $\hat U_1$ as a "running speed" measure since the 100m run and the long jump receive the greatest weight in this canonical variate and also are each highly correlated with $\hat U_1$. We might call $\hat V_1$ a "strength" or "arm strength" measure since the shot put has a large coefficient in this canonical variate and the discus, javelin and shot put are each highly correlated with $\hat V_1$.

Chapter 11

11.1 (a) The linear discriminant function given in (11-19) is

$$\hat y = (\bar x_1 - \bar x_2)' S_{pooled}^{-1} x = \hat a' x$$

where

$$S_{pooled}^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$$

so the linear discriminant function is

$$\left( \begin{pmatrix} 3 \\ 6 \end{pmatrix} - \begin{pmatrix} 5 \\ 8 \end{pmatrix} \right)' \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} x = [-2 \;\; 0]\, x = -2x_1$$

(b)

$$\hat m = \tfrac{1}{2}(\bar y_1 + \bar y_2) = \tfrac{1}{2}(\hat a' \bar x_1 + \hat a' \bar x_2) = -8$$

Assign $x_0' = [2 \;\; 7]$ to $\pi_1$ if $\hat y_0 = [-2 \;\; 0]\, x_0 \ge \hat m = -8$ and assign $x_0$ to $\pi_2$ otherwise. Since $[-2 \;\; 0]\, x_0 = -4$ is greater than $\hat m = -8$, assign $x_0'$ to population $\pi_1$.

11.2 (a) $\pi_1$ = Riding-mower owners; $\pi_2$ = Nonowners

Here are some summary statistics for the data in Example 11.1:

$$\bar x_1 = \begin{pmatrix} 109.475 \\ 20.267 \end{pmatrix}, \qquad \bar x_2 = \begin{pmatrix} 87.400 \\ 17.633 \end{pmatrix}$$

$$S_1 = \begin{pmatrix} 352.645 & -11.819 \\ -11.819 & 4.082 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 200.705 & -2.589 \\ -2.589 & 4.464 \end{pmatrix}$$

$$S_{pooled} = \begin{pmatrix} 276.675 & -7.204 \\ -7.204 & 4.273 \end{pmatrix}, \qquad S_{pooled}^{-1} = \begin{pmatrix} .00378 & .00637 \\ .00637 & .24475 \end{pmatrix}$$

The linear classification function for the data in Example 11.1 using (11-19) is

$$\left( \begin{pmatrix} 109.475 \\ 20.267 \end{pmatrix} - \begin{pmatrix} 87.400 \\ 17.633 \end{pmatrix} \right)' \begin{pmatrix} .00378 & .00637 \\ .00637 & .24475 \end{pmatrix} x = [.100 \;\; .785]\, x$$

where

$$\hat m = \tfrac{1}{2}(\bar y_1 + \bar y_2) = \tfrac{1}{2}(\hat a' \bar x_1 + \hat a' \bar x_2) = 24.719$$

(b) Assign an observation $x$ to $\pi_1$ if

$$0.100x_1 + 0.785x_2 \ge 24.72$$

Otherwise, assign $x$ to $\pi_2$.

Here are the observations and their classifications:

Owners                                        Nonowners
Observation   â'x0      Classification        Observation   â'x0      Classification
1             23.44     nonowner              1             25.886    owner
2             24.738    owner                 2             24.608    nonowner
3             26.436    owner                 3             22.982    nonowner
4             25.478    owner                 4             23.334    nonowner
5             30.226    owner                 5             25.216    owner
6             29.082    owner                 6             21.736    nonowner
7             27.616    owner                 7             21.500    nonowner
8             28.864    owner                 8             24.044    nonowner
9             25.600    owner                 9             20.614    nonowner
10            28.628    owner                 10            21.058    nonowner
11            25.370    owner                 11            19.090    nonowner
12            26.800    owner                 12            20.918    nonowner

From this, we can construct the confusion matrix:

                          Predicted Membership
                          $\pi_1$    $\pi_2$    Total
Actual        $\pi_1$     11         1          12
membership    $\pi_2$     2          10         12

(c) The apparent error rate is $\frac{1 + 2}{12 + 12} = 0.125$.

(d) The assumptions are that the observations from $\pi_1$ and $\pi_2$ are from multivariate normal distributions with equal covariance matrices, $\Sigma_1 = \Sigma_2 = \Sigma$.

11.3 We need to show that the regions $R_1$ and $R_2$ that minimize the ECM are defined by the values $x$ for which the following inequalities hold:

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right), \qquad R_2: \frac{f_1(x)}{f_2(x)} < \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right)$$

Substituting the expressions for $P(2|1)$ and $P(1|2)$ into (11-5) gives

$$ECM = c(2|1)\,p_1 \int_{R_2} f_1(x)\,dx + c(1|2)\,p_2 \int_{R_1} f_2(x)\,dx$$

And since $\Omega = R_1 \cup R_2$,

$$1 = \int_{R_1} f_1(x)\,dx + \int_{R_2} f_1(x)\,dx$$

and thus,

$$ECM = c(2|1)\,p_1 \left( 1 - \int_{R_1} f_1(x)\,dx \right) + c(1|2)\,p_2 \int_{R_1} f_2(x)\,dx$$

Since both of the integrals above are over the same region, we have

$$ECM = \int_{R_1} \left[ c(1|2)\,p_2 f_2(x) - c(2|1)\,p_1 f_1(x) \right] dx + c(2|1)\,p_1$$

The minimum is obtained when $R_1$ is chosen to be the region where the term in brackets is less than or equal to 0. So choose $R_1$ so that

$$c(2|1)\,p_1 f_1(x) \ge c(1|2)\,p_2 f_2(x)$$

or

$$\frac{f_1(x)}{f_2(x)} \ge \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right)$$

11.4 (a) The minimum ECM rule is given by assigning an observation $x$ to $\pi_1$ if

$$\frac{f_1(x)}{f_2(x)} \ge \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right) = \left( \frac{100}{50} \right) \left( \frac{.2}{.8} \right) = .5$$

and assigning $x$ to $\pi_2$ if

$$\frac{f_1(x)}{f_2(x)} < \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right) = \left( \frac{100}{50} \right) \left( \frac{.2}{.8} \right) = .5$$

(b) Since $f_1(x) = .3$ and $f_2(x) = .5$,

$$\frac{f_1(x)}{f_2(x)} = .6 \ge .5$$

and assign $x$ to $\pi_1$.

11.5

$$-\tfrac{1}{2}(x - \mu_1)'\Sigma^{-1}(x - \mu_1) + \tfrac{1}{2}(x - \mu_2)'\Sigma^{-1}(x - \mu_2)$$
$$= -\tfrac{1}{2}\left[ x'\Sigma^{-1}x - 2\mu_1'\Sigma^{-1}x + \mu_1'\Sigma^{-1}\mu_1 - x'\Sigma^{-1}x + 2\mu_2'\Sigma^{-1}x - \mu_2'\Sigma^{-1}\mu_2 \right]$$
$$= -\tfrac{1}{2}\left[ -2(\mu_1 - \mu_2)'\Sigma^{-1}x + \mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2 \right]$$
$$= (\mu_1 - \mu_2)'\Sigma^{-1}x - \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2)$$

11.6 a)

$$E(a'X \mid \pi_1) - m = a'\mu_1 - m = a'\mu_1 - \tfrac{1}{2}a'(\mu_1 + \mu_2) = \tfrac{1}{2}a'(\mu_1 - \mu_2) = \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \ge 0$$

since $\Sigma^{-1}$ is positive definite.
b)

$$E(a'X \mid \pi_2) - m = a'\mu_2 - m = \tfrac{1}{2}a'(\mu_2 - \mu_1) = -\tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \le 0$$

11.7 (a) Here are the densities:

[Figure: the densities $f_1(x)$ and $f_2(x)$ plotted for $-1.0 \le x \le 1.5$, with the regions R_1 and R_2 marked; the boundary is at 1/4 in one panel and at -1/3 in the other.]

(b) When $p_1 = p_2$ and $c(1|2) = c(2|1)$, the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge 1, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 1$$

These regions are given by $R_1: -1 \le x \le .25$ and $R_2: .25 < x \le 1.5$.

(c) When $p_1 = .2$, $p_2 = .8$, and $c(1|2) = c(2|1)$, the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \frac{p_2}{p_1} = 4, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 4$$

These regions are given by $R_1: -1 \le x \le -1/3$ and $R_2: -1/3 < x \le 1.5$.

11.8 (a) Here are the densities:

[Figure: the densities $f_1(x)$ and $f_2(x)$, with the regions R_2, R_1, R_2 marked and boundaries at -1/2 and 1/6.]

(b) When $p_1 = p_2$ and $c(1|2) = c(2|1)$, the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge 1, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 1$$

These regions are given by $R_1: -1/2 \le x \le 1/6$ and $R_2: -1.5 \le x < -1/2,\; 1/6 < x \le 2.5$.

11.9

$$\frac{a'B_\mu a}{a'\Sigma a} = \frac{a'\left[ (\mu_1 - \bar\mu)(\mu_1 - \bar\mu)' + (\mu_2 - \bar\mu)(\mu_2 - \bar\mu)' \right] a}{a'\Sigma a}$$

where $\bar\mu = \tfrac{1}{2}(\mu_1 + \mu_2)$. Thus $\mu_1 - \bar\mu = \tfrac{1}{2}(\mu_1 - \mu_2)$ and $\mu_2 - \bar\mu = \tfrac{1}{2}(\mu_2 - \mu_1)$, so

$$\frac{a'B_\mu a}{a'\Sigma a} = \frac{1}{2}\,\frac{a'(\mu_1 - \mu_2)(\mu_1 - \mu_2)'a}{a'\Sigma a}$$

11.10 (a) Hotelling's two-sample $T^2$-statistic is

$$T^2 = (\bar x_1 - \bar x_2)' \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{pooled} \right]^{-1} (\bar x_1 - \bar x_2) = [-3 \;\; -2] \left[ \left( \frac{1}{11} + \frac{1}{12} \right) \begin{pmatrix} 7.3 & -1.1 \\ -1.1 & 4.8 \end{pmatrix} \right]^{-1} \begin{pmatrix} -3 \\ -2 \end{pmatrix} = 14.52$$

Under $H_0: \mu_1 = \mu_2$,

$$T^2 \sim \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}$$

Since $T^2 = 14.52 \ge \frac{(21)(2)}{20} F_{2,20}(.1) = 5.44$, we reject the null hypothesis $H_0: \mu_1 = \mu_2$ at the $\alpha = 0.1$ level of significance.

(b) Fisher's linear discriminant function is

$$\hat y_0 = \hat a' x_0 = -.49x_1 - .53x_2$$

(c) Here, $\hat m = -.25$. Assign $x_0'$ to $\pi_1$ if $-.49x_1 - .53x_2 + .25 \ge 0$. Otherwise assign $x_0$ to $\pi_2$.

For $x_0' = (0 \;\; 1)$, $\hat y_0 = -.53(1) = -.53$ and $\hat y_0 - \hat m = -.28 < 0$. Thus, assign $x_0$ to $\pi_2$.
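The calculations in 11.10 can be sketched in a few lines of plain Python. The group means and pooled covariance matrix below are assumptions: they were chosen to be consistent with the mean difference $(-3, -2)'$ and with $T^2 = 14.52$ quoted in the solution. The helper names `solve2` and `classify` are hypothetical.

```python
# Summary statistics assumed for Exercise 11.10 (consistent with the
# mean difference (-3, -2)' and T^2 = 14.52 quoted in the solution).
n1, n2 = 11, 12
xbar1 = [-1.0, -1.0]
xbar2 = [2.0, 1.0]
S = [[7.3, -1.1],
     [-1.1, 4.8]]          # pooled covariance matrix


def solve2(M, v):
    """Solve the 2x2 linear system M z = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * v[0] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]


d = [xbar1[0] - xbar2[0], xbar1[1] - xbar2[1]]
a = solve2(S, d)                                    # a = S^{-1}(xbar1 - xbar2)
T2 = (d[0] * a[0] + d[1] * a[1]) / (1 / n1 + 1 / n2)
m = 0.5 * sum(a[i] * (xbar1[i] + xbar2[i]) for i in range(2))


def classify(x0):
    """Return 1 if a'x0 - m >= 0 (assign to pi_1), otherwise 2."""
    return 1 if a[0] * x0[0] + a[1] * x0[1] - m >= 0 else 2


print(round(T2, 2), [round(c, 2) for c in a], round(m, 2), classify([0, 1]))
```

Working from $S^{-1}(\bar x_1 - \bar x_2)$ once and reusing it for $T^2$, the discriminant coefficients, and the midpoint keeps the three parts of the exercise visibly tied to the same quantity.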
11.11 Assuming equal prior probabilities $p_1 = p_2 = \tfrac{1}{2}$, and equal misclassification costs $c(2|1) = c(1|2) = \$10$:

c     P(B1|A2)   P(B2|A1)   P(A1 and B2)   P(A2 and B1)   P(error)   Expected cost
9     .006       .691       .346           .003           .349       3.49
10    .023       .500       .250           .011           .261       2.61
11    .067       .309       .154           .033           .188       1.88
12    .159       .159       .079           .079           .159       1.59
13    .309       .067       .033           .154           .188       1.88
14    .500       .023       .011           .250           .261       2.61

Using (11-5), the expected cost is minimized for c = 12 and the minimum expected cost is $1.59.

11.12 Assuming equal prior probabilities $p_1 = p_2 = \tfrac{1}{2}$, and misclassification costs $c(2|1) = \$5$ and $c(1|2) = \$15$,

expected cost = $5 P(A1 and B2) + $15 P(A2 and B1).

c     P(B1|A2)   P(B2|A1)   P(A1 and B2)   P(A2 and B1)   P(error)   Expected cost
9     0.006      0.691      0.346          0.003          0.349      1.78
10    0.023      0.500      0.250          0.011          0.261      1.42
11    0.067      0.309      0.154          0.033          0.188      1.27
12    0.159      0.159      0.079          0.079          0.159      1.59
13    0.309      0.067      0.033          0.154          0.188      2.48
14    0.500      0.023      0.011          0.250          0.261      3.81

Using (11-5), the expected cost is minimized for c = 10.90 and the minimum expected cost is $1.27.

11.13 Assuming prior probabilities P(A1) = 0.25 and P(A2) = 0.75, and misclassification costs $c(2|1) = \$5$ and $c(1|2) = \$15$,

expected cost = $5 P(B2|A1)(.25) + $15 P(B1|A2)(.75).

c     P(B1|A2)   P(B2|A1)   P(A1 and B2)   P(A2 and B1)   P(error)   Expected cost
9     0.006      0.691      0.173          0.005          0.178      0.93
10    0.023      0.500      0.125          0.017          0.142      0.88
11    0.067      0.309      0.077          0.050          0.127      1.14
12    0.159      0.159      0.040          0.119          0.159      1.98
13    0.309      0.067      0.017          0.231          0.248      3.56
14    0.500      0.023      0.006          0.375          0.381      5.65

Using (11-5), the expected cost is minimized for c = 9.80 and the minimum expected cost is $0.88.

11.14 Using (11-21),

$$\hat a_1^* = \frac{\hat a}{\sqrt{\hat a'\hat a}} = \begin{pmatrix} .79 \\ -.61 \end{pmatrix} \quad \text{and} \quad \hat m_1^* = -0.10$$

Since $\hat a_1^{*\prime} x_0 = -0.14 < \hat m_1^* = -0.10$, classify $x_0$ as $\pi_2$.

Using (11-22),

$$\hat a_2^* = \frac{\hat a}{\hat a_1} = \begin{pmatrix} 1.00 \\ -.77 \end{pmatrix} \quad \text{and} \quad \hat m_2^* = -0.12$$

Since $\hat a_2^{*\prime} x_0 = -0.18 < \hat m_2^* = -0.12$, classify $x_0$ as $\pi_2$.

These results are consistent with the classification obtained for the case of equal prior probabilities in Example 11.3.
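These two classification results should be identical to those of Example 11.3.

11.15

$$\frac{f_1(x)}{f_2(x)} \ge \left[ \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right) \right]$$

defines the same region as

$$\ln f_1(x) - \ln f_2(x) \ge \ln\left[ \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right) \right]$$

For a multivariate normal distribution,

$$\ln f_i(x) = -\tfrac{1}{2}\ln|\Sigma_i| - \tfrac{p}{2}\ln 2\pi - \tfrac{1}{2}(x - \mu_i)'\Sigma_i^{-1}(x - \mu_i), \quad i = 1, 2$$

so

$$\ln f_1(x) - \ln f_2(x) = -\tfrac{1}{2}(x - \mu_1)'\Sigma_1^{-1}(x - \mu_1) + \tfrac{1}{2}(x - \mu_2)'\Sigma_2^{-1}(x - \mu_2) - \tfrac{1}{2}\ln\left( \frac{|\Sigma_1|}{|\Sigma_2|} \right)$$
$$= -\tfrac{1}{2}\left[ x'\Sigma_1^{-1}x - 2\mu_1'\Sigma_1^{-1}x + \mu_1'\Sigma_1^{-1}\mu_1 - x'\Sigma_2^{-1}x + 2\mu_2'\Sigma_2^{-1}x - \mu_2'\Sigma_2^{-1}\mu_2 \right] - \tfrac{1}{2}\ln\left( \frac{|\Sigma_1|}{|\Sigma_2|} \right)$$
$$= -\tfrac{1}{2}x'(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x - k$$

where

$$k = \tfrac{1}{2}\ln\left( \frac{|\Sigma_1|}{|\Sigma_2|} \right) + \tfrac{1}{2}\left( \mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2 \right)$$

11.16

$$Q = \ln\left[ \frac{f_1(x)}{f_2(x)} \right] = -\tfrac{1}{2}\ln|\Sigma_1| - \tfrac{1}{2}(x - \mu_1)'\Sigma_1^{-1}(x - \mu_1) + \tfrac{1}{2}\ln|\Sigma_2| + \tfrac{1}{2}(x - \mu_2)'\Sigma_2^{-1}(x - \mu_2)$$
$$= -\tfrac{1}{2}x'(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x - k$$

where

$$k = \tfrac{1}{2}\left[ \ln\left( \frac{|\Sigma_1|}{|\Sigma_2|} \right) + \mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2 \right]$$

When $\Sigma_1 = \Sigma_2 = \Sigma$,

$$Q = (\mu_1'\Sigma^{-1} - \mu_2'\Sigma^{-1})x - \tfrac{1}{2}\left( \mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2 \right) = (\mu_1 - \mu_2)'\Sigma^{-1}x - \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2)$$

11.17 Assuming equal prior probabilities and misclassification costs $c(2|1) = \$10$ and $c(1|2) = \$73.89$. In the table below,

$$Q = -\tfrac{1}{2}x_0'(\Sigma_1^{-1} - \Sigma_2^{-1})x_0 + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x_0 - \tfrac{1}{2}\ln\left( \frac{|\Sigma_1|}{|\Sigma_2|} \right) - \tfrac{1}{2}\left( \mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2 \right)$$

x            P(π1|x)    P(π2|x)    Q        Classification
(10, 15)'    1.00000    0          18.54    π1
(12, 17)'    0.99991    0.00009    9.36     π1
(14, 19)'    0.95254    0.04745    3.00     π1
(16, 21)'    0.36731    0.63269    -0.54    π2
(18, 23)'    0.21947    0.78053    -1.27    π2
(20, 25)'    0.69517    0.30483    0.87     π2
(22, 27)'    0.99678    0.00322    5.74     π1
(24, 29)'    1.00000    0.00000    13.46    π1
(26, 31)'    1.00000    0.00000    24.01    π1
(28, 33)'    1.00000    0.00000    37.38    π1
(30, 35)'    1.00000    0.00000    53.56    π1

The quadratic discriminator was used to classify the observations in the above table. An observation $x$ is classified as $\pi_1$ if

$$Q \ge \ln\left[ \left( \frac{c(1|2)}{c(2|1)} \right) \left( \frac{p_2}{p_1} \right) \right] = \ln\left( \frac{73.89}{10} \right) = 2.0$$

Otherwise, classify $x$ as $\pi_2$.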
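The classification column in 11.17 follows mechanically from the Q values and the cost-adjusted threshold. The sketch below applies that rule to the tabulated Q values; it also evaluates $1/(1 + e^{-Q})$, which is the posterior probability of $\pi_1$ under equal priors if Q is the log density ratio. The function name `posterior_pi1` is hypothetical, and the posteriors it returns may differ slightly from the tabulated ones because Q is rounded to two decimals.

```python
import math

# Q values for the 11 observations, taken from the table in 11.17.
Q = [18.54, 9.36, 3.00, -0.54, -1.27, 0.87, 5.74, 13.46, 24.01, 37.38, 53.56]

c12, c21 = 73.89, 10.0      # misclassification costs c(1|2) and c(2|1)
p1 = p2 = 0.5               # equal prior probabilities
threshold = math.log((c12 / c21) * (p2 / p1))


def posterior_pi1(q):
    """P(pi_1 | x) under equal priors, where q = ln(f1(x) / f2(x))."""
    return 1.0 / (1.0 + math.exp(-q))


# Assign to pi_1 when Q is at least the threshold, otherwise to pi_2.
labels = [1 if q >= threshold else 2 for q in Q]
posteriors = [posterior_pi1(q) for q in Q]

print(round(threshold, 2), labels)
```

Note that the observation with Q = 0.87 has posterior probability above one half for $\pi_1$ yet is assigned to $\pi_2$, because the unequal costs raise the threshold from 0 to about 2.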
For (a), (b), (c) and (d), see the following plot.

[Figure: plot in the $(x_1, x_2)$ plane, both axes running from 0 to about 50.]

11.18 The vector $e = c\Sigma^{-1}(\mu_1 - \mu_2)$ is an (unscaled) eigenvector of $\Sigma^{-1}B$ since

$$\Sigma^{-1}Be = \Sigma^{-1}c(\mu_1 - \mu_2)(\mu_1 - \mu_2)'\,c\Sigma^{-1}(\mu_1 - \mu_2) = c^2\,\Sigma^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) = \lambda\, c\Sigma^{-1}(\mu_1 - \mu_2) = \lambda e$$

where

$$\lambda = c\,(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2)$$

11.19 (a) The calculated values agree with those in Example 11.7.

(b) Fisher's linear discriminant function is

$$\hat y_0 = \hat a' x_0 = -\tfrac{1}{3}x_1 + \tfrac{2}{3}x_2$$

where

$$\bar y_1 = \tfrac{17}{3}; \quad \bar y_2 = \tfrac{10}{3}; \quad \hat m = \tfrac{27}{6} = 4.5$$

Assign $x_0'$ to $\pi_1$ if

$$-\tfrac{1}{3}x_1 + \tfrac{2}{3}x_2 - 4.5 \ge 0$$

Otherwise assign $x_0'$ to $\pi_2$.

π1                                          π2
Observation   â'x0 - m̂   Classification    Observation   â'x0 - m̂   Classification
1             2.83        π1                1             -1.50       π2
2             0.83        π1                2             0.50        π1
3             -0.17       π2                3             -2.50       π2

The results from this table verify the confusion matrix given in Example 11.7.

(c) This is the table of squared distances $\hat D_i^2(x)$ for the observations, where

$$\hat D_i^2(x) = (x - \bar x_i)' S_{pooled}^{-1} (x - \bar x_i)$$

π1                                                π2
Obs.   D̂1²(x)   D̂2²(x)   Classification        Obs.   D̂1²(x)   D̂2²(x)   Classification
1      4/3       21/3      π1                    1      13/3      4/3       π2
2      4/3       9/3       π1                    2      1/3       4/3       π1
3      4/3       3/3       π2                    3      19/3      4/3       π2

The classification results are identical to those obtained in (b).

11.20 The result obtained from this matrix identity is identical to the result of Example 11.7.

11.23 (a) Here are the normal probability plots for each of the variables $x_1, x_2, x_3, x_4, x_5$:

[Figure: normal probability plots of $x_1$ through $x_5$ against standard normal quantiles.]

Variables $x_1$, $x_3$, and $x_5$ appear to be nonnormal. The transformations $\ln(x_1)$, $\ln(x_3 + 1)$, and $\ln(x_5 + 1)$ appear to slightly improve normality.

(b) Using the original data, the linear discriminant function is:

$$\hat y = \hat a' x = 0.023x_1 - 0.034x_2 + 0.21x_3 - 0.08x_4 - 0.25x_5$$

where

$$\hat m = -23.23$$

Thus, we allocate $x_0$ to $\pi_1$ (NMS group) if

$$\hat a' x_0 - \hat m = 0.023x_1 - 0.034x_2 + 0.21x_3 - 0.08x_4 - 0.25x_5 + 23.23 \ge 0$$

Otherwise, allocate $x_0$ to $\pi_2$ (MS group).
(c) Confusion matrix:

                          Predicted Membership
                          $\pi_1$    $\pi_2$    Total
Actual        $\pi_1$     66         3          69
membership    $\pi_2$     7          22         29

APER $= \frac{3 + 7}{69 + 29} = .102$

This is the holdout confusion matrix:

                          Predicted Membership
                          $\pi_1$    $\pi_2$    Total
Actual        $\pi_1$     64         5          69
membership    $\pi_2$     8          21         29

$\hat E$(AER) $= \frac{5 + 8}{69 + 29} = .133$

11.24 (a) Here are the scatterplots for the pairs of observations $(x_1, x_2)$, $(x_1, x_3)$, and $(x_1, x_4)$:

[Figure: scatterplots of $x_2$, $x_3$, and $x_4$ against $x_1$ for the bankrupt and nonbankrupt firms.]

The data in the above plots appear to form fairly elliptical shapes, so bivariate normality does not seem like an unreasonable assumption.

(b) $\pi_1$ = bankrupt firms, $\pi_2$ = nonbankrupt firms. For $(x_1, x_2)$:

$$\bar x_1 = \begin{pmatrix} -0.0688 \\ -0.0819 \end{pmatrix}, \qquad S_1 = \begin{pmatrix} 0.04424 & 0.02847 \\ 0.02847 & 0.02092 \end{pmatrix}$$

$$\bar x_2 = \begin{pmatrix} 0.2354 \\ 0.0551 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 0.04735 & 0.00837 \\ 0.00837 & 0.00231 \end{pmatrix}$$

(c), (d), (e) See the tables of part (g).

(f)

$$S_{pooled} = \begin{pmatrix} 0.04594 & 0.01751 \\ 0.01751 & 0.01077 \end{pmatrix}$$

Fisher's linear discriminant function is

$$\hat y = \hat a' x = -4.67x_1 - 5.12x_2$$

where

$$\hat m = -.32$$

Thus, we allocate $x_0$ to $\pi_1$ (Bankrupt group) if

$$\hat a' x_0 - \hat m = -4.67x_1 - 5.12x_2 + .32 \ge 0$$

Otherwise, allocate $x_0$ to $\pi_2$ (Nonbankrupt group).

APER $= \frac{9}{46} = .196$.

Since $S_1$ and $S_2$ look quite different, Fisher's linear discriminant function may not be appropriate.

For the various classification rules and error rates for these variable pairs, see the following tables.

This is the table of quadratic functions for the variable pairs $(x_1, x_2)$, $(x_1, x_3)$, and $(x_1, x_4)$, both with $p_1 = 0.5$ and $p_1 = 0.05$. The classification rule for any of these functions is to classify a new observation into $\pi_1$ (bankrupt firms) if the quadratic function is $\ge 0$, and to classify the new observation into $\pi_2$ (nonbankrupt firms) otherwise.
Notice in the table below that only the constant term changes when the prior probabilities change.

Variables         Quadratic function (without constant)                                      Constant, $p_1 = 0.5$    Constant, $p_1 = 0.05$
$(x_1, x_2)$      $-61.77x_1^2 + 35.84x_1x_2 + 407.20x_2^2 + 5.64x_1 - 30.60x_2$             $-0.17$                  $-3.11$
$(x_1, x_3)$      $-1.55x_1^2 + 3.89x_1x_3 - 3.08x_3^2 - 10.69x_1 + 7.90x_3$                 $-3.14$                  $-6.08$
$(x_1, x_4)$      $-0.46x_1^2 + 7.75x_1x_4 + 8.43x_4^2 - 10.05x_1 - 8.11x_4$                 $+2.23$                  $-0.71$

Here is a table of the APER and $\hat E$(AER) for the various variable pairs and prior probabilities.

                  APER                           $\hat E$(AER)
Variables         $p_1 = 0.5$    $p_1 = 0.05$    $p_1 = 0.5$    $p_1 = 0.05$
$(x_1, x_2)$      0.20           0.26            0.22           0.26
$(x_1, x_3)$      0.11           0.37            0.13           0.39
$(x_1, x_4)$      0.17           0.39            0.22           0.40

For equal priors, it appears that the $(x_1, x_3)$ variable pair is the best classifier, as it has the lowest APER. For unequal priors, $p_1 = 0.05$ and $p_2 = 0.95$, the variable pair $(x_1, x_2)$ has the lowest APER.

(h) When using all four variables $(x_1, x_2, x_3, x_4)$,

$$\bar x_1 = \begin{pmatrix} -0.0688 \\ -0.0819 \\ 1.3675 \\ 0.4368 \end{pmatrix}, \qquad S_1 = \begin{pmatrix} 0.04424 & 0.02847 & 0.03428 & 0.00431 \\ 0.02847 & 0.02092 & 0.02580 & 0.00362 \\ 0.03428 & 0.02580 & 0.16455 & 0.03300 \\ 0.00431 & 0.00362 & 0.03300 & 0.04441 \end{pmatrix}$$

$$\bar x_2 = \begin{pmatrix} 0.2354 \\ 0.0551 \\ 2.5939 \\ 0.4264 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 0.04735 & 0.00837 & 0.07543 & -0.00662 \\ 0.00837 & 0.00231 & 0.00873 & 0.00031 \\ 0.07543 & 0.00873 & 1.04596 & 0.03177 \\ -0.00662 & 0.00031 & 0.03177 & 0.02618 \end{pmatrix}$$

Assign a new observation $x_0$ to $\pi_1$ if its quadratic function given below is greater than or equal to 0:

$$x_0' \begin{pmatrix} -49.232 & -20.657 & -2.623 & 14.050 \\ -20.657 & 526.336 & 11.412 & -52.493 \\ -2.623 & 11.412 & -3.748 & 1.434 \\ 14.050 & -52.493 & 1.434 & 11.974 \end{pmatrix} x_0 + \begin{pmatrix} 4.91 \\ -28.42 \\ 8.65 \\ -11.80 \end{pmatrix}' x_0 + \text{constant}$$

where the constant is $-2.69$ for $p_1 = 0.5$ and $-5.64$ for $p_1 = 0.05$.

For $p_1 = 0.5$: APER $= \frac{3}{46} = .07$, $\hat E$(AER) $= \frac{5}{46} = .11$
For $p_1 = 0.05$: APER $= \frac{9}{46} = .20$, $\hat E$(AER) $= \frac{11}{46} = .24$

11.25 (a) Fisher's linear discriminant function is

$$\hat y_0 = \hat a' x_0 - \hat m = -4.80x_1 - 1.48x_3 + 3.33$$

Classify $x_0$ to $\pi_1$ (bankrupt firms) if

$$\hat a' x_0 - \hat m \ge 0$$

Otherwise classify $x_0$ to $\pi_2$ (nonbankrupt firms).

The APER is $\frac{6}{46} = .13$.

This is the scatterplot of the data in the $(x_1, x_3)$ coordinate system, along with the discriminant line.
[Figure: scatterplot of $x_3$ against $x_1$ with the discriminant line.]

(b) With data point 16 for the bankrupt firms deleted, Fisher's linear discriminant function is given by

$$\hat y_0 = \hat a' x_0 - \hat m = -5.93x_1 - 1.46x_3 + 3.31$$

Classify $x_0$ to $\pi_1$ (bankrupt firms) if

$$\hat a' x_0 - \hat m \ge 0$$

Otherwise classify $x_0$ to $\pi_2$ (nonbankrupt firms).

The APER is $\frac{5}{45} = .11$.

With data point 13 for the nonbankrupt firms deleted, Fisher's linear discriminant function is given by

$$\hat y_0 = \hat a' x_0 - \hat m = -4.35x_1 - 1.97x_3 + 4.36$$

Classify $x_0$ to $\pi_1$ (bankrupt firms) if

$$\hat a' x_0 - \hat m \ge 0$$

Otherwise classify $x_0$ to $\pi_2$ (nonbankrupt firms).

The APER is $\frac{4}{45} = .089$.

This is the scatterplot of the observations in the $(x_1, x_3)$ coordinate system with the discriminant lines for the three linear discriminant functions given above. Also labelled are observation 16 for bankrupt firms and observation 13 for nonbankrupt firms.

It appears that deleting these observations has changed the line significantly.

11.26 (a) The least squares regression results for the X, Z data are:

Parameter Estimates

                    Parameter     Standard      T for H0:
Variable    DF      Estimate      Error         Parameter=0    Prob > |T|
INTERCEP    1       -0.081412     0.13488497    -0.604         0.5492
X3          1       0.307221      0.05956685    5.158          0.0001

Here are the dot diagrams of the fitted values for the bankrupt firms and for the nonbankrupt firms:

[Figure: dot diagrams of the fitted values for the Bankrupt and Nonbankrupt groups, on a common axis from 0.00 to 1.50.]

This table summarizes the classification results using the fitted values:

OBS    GROUP       FITTED     CLASSIFICATION
13     bankrupt    0.57896    misclassify
16     bankrupt    0.53122    misclassify
31     nonbankr    0.47076    misclassify
34     nonbankr    0.06025    misclassify
38     nonbankr    0.48329    misclassify
41     nonbankr    0.30089    misclassify

The confusion matrix is:

                          Predicted Membership
                          $\pi_1$    $\pi_2$    Total
Actual        $\pi_1$     19         2          21
membership    $\pi_2$     4          21         25

Thus, the APER is $\frac{2 + 4}{46} = .13$.
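Several solutions in this chapter reduce a confusion matrix to an apparent error rate. A minimal sketch of that step is given below; the function name `aper` is hypothetical, and the only data used is the 2 x 2 confusion matrix reported for 11.26(a).

```python
def aper(confusion):
    """Apparent error rate: off-diagonal counts divided by the total count.

    confusion[i][j] is the number of items from population i+1
    that the rule assigns to population j+1.
    """
    total = sum(sum(row) for row in confusion)
    wrong = sum(confusion[i][j]
                for i in range(len(confusion))
                for j in range(len(confusion[i])) if i != j)
    return wrong / total


# Confusion matrix for the regression-based rule in 11.26(a):
# rows are actual bankrupt and nonbankrupt firms.
print(round(aper([[19, 2], [4, 21]]), 2))   # 0.13
```

The same function applies unchanged to the three-population confusion matrices later in the chapter, since it only sums the off-diagonal cells.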
(b) The least squares regression results using all four variables $X_1, X_2, X_3, X_4$ are:

Parameter Estimates

                    Parameter     Standard      T for H0:
Variable    DF      Estimate      Error         Parameter=0    Prob > |T|
INTERCEP    1       0.208915      0.18615284    1.122          0.2683
X1          1       0.156317      0.46653100    0.335          0.7393
X2          1       1.149093      0.90606395    1.268          0.2119
X3          1       0.225972      0.07030479    3.214          0.0026
X4          1       -0.305175     0.32336357    -0.944         0.3508

Here are the dot diagrams of the fitted values for the bankrupt firms and for the nonbankrupt firms:

[Figure: dot diagrams of the fitted values for the Bankrupt and Nonbankrupt groups, on a common axis from -0.35 to 1.40.]

This table summarizes the classification results using the fitted values:

OBS    GROUP       FITTED     CLASSIFICATION
15     bankrupt    0.62997    misclassify
16     bankrupt    0.72676    misclassify
20     bankrupt    0.55719    misclassify
34     nonbankr    0.21845    misclassify

The confusion matrix is:

                          Predicted Membership
                          $\pi_1$    $\pi_2$    Total
Actual        $\pi_1$     18         3          21
membership    $\pi_2$     1          24         25

Thus, the APER is $\frac{3 + 1}{46} = .087$. Here is a scatterplot of the residuals against the fitted values, with points 16 of the bankrupt firms and 13 of the nonbankrupt firms labelled. It appears that point 16 of the bankrupt firms is an outlier.

[Figure: residuals plotted against fitted values for the bankrupt and nonbankrupt firms, with points 16 and 13 labelled.]

11.27 (a) Plot of the data in the $(x_2, x_4)$ variable space:

[Figure: scatterplot of $x_4$ against $x_2$ (sepal width) for Setosa, Versicolor, and Virginica.]
The points from all three groups appear to form an elliptical shape. However, it appears that the points of $\pi_1$ (Iris setosa) form an ellipse with a different orientation than those of $\pi_2$ (Iris versicolor) and $\pi_3$ (Iris virginica). This indicates that the observations from $\pi_1$ may have a different covariance matrix from the observations from $\pi_2$ and $\pi_3$.

(b) Here are the results of a test of the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$: at least one of the $\mu_i$'s is different from the others at the $\alpha = 0.05$ level of significance:

Statistic        Value         F          Num DF    Den DF    Pr > F
Wilks' Lambda    0.02343863    199.145    8         288       0.0001

Thus, the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ is rejected at the $\alpha = 0.05$ level of significance. As discussed earlier, the plots give us reason to doubt the assumption of equal covariance matrices for the three groups.

(c) $\pi_1$ = Iris setosa; $\pi_2$ = Iris versicolor; $\pi_3$ = Iris virginica

The quadratic discriminant scores $\hat d_i^Q(x)$ given by (11-47) with $p_1 = p_2 = p_3 = \tfrac{1}{3}$ are:

population    $\hat d_i^Q(x) = -\tfrac{1}{2}\ln|S_i| - \tfrac{1}{2}(x - \bar x_i)'S_i^{-1}(x - \bar x_i)$
$\pi_1$       $-3.68x_2^2 + 6.16x_2x_4 - 47.60x_4^2 + 23.71x_2 + 2.30x_4 - 37.67$
$\pi_2$       $-9.09x_2^2 + 19.57x_2x_4 - 22.87x_4^2 + 24.94x_2 + 7.63x_4 - 36.53$
$\pi_3$       $-6.76x_2^2 + 8.54x_2x_4 - 9.32x_4^2 + 22.92x_2 + 12.38x_4 - 44.04$

To classify the observation $x_0' = [3.5 \;\; 1.75]$, compute $\hat d_i^Q(x_0)$ for $i = 1, 2, 3$, and classify $x_0$ to the population for which $\hat d_i^Q(x_0)$ is the largest.

$$\hat d_1^Q(x_0) = -103.77, \qquad \hat d_2^Q(x_0) = 0.043, \qquad \hat d_3^Q(x_0) = -1.23$$

So classify $x_0$ to $\pi_2$ (Iris versicolor).

(d) The linear discriminant scores $\hat d_i(x)$ are:

population    $\hat d_i(x) = \bar x_i'S_{pooled}^{-1}x - \tfrac{1}{2}\bar x_i'S_{pooled}^{-1}\bar x_i$    $\hat d_i(x_0)$
$\pi_1$       $36.02x_2 - 22.26x_4 - 59.00$                                                            28.12
$\pi_2$       $19.31x_2 + 16.58x_4 - 37.73$                                                            58.86
$\pi_3$       $15.49x_2 + 36.28x_4 - 59.78$                                                            57.92

Since $\hat d_i(x_0)$ is the largest for $i = 2$, we classify the new observation $x_0' = [3.5 \;\; 1.75]$ to $\pi_2$ according to (11-52). The results are the same for (c) and (d).

(e) To use rule (11-56), construct $\hat d_{ki}(x) = \hat d_k(x) - \hat d_i(x)$ for all $i \ne k$. Then classify $x$ to $\pi_k$ if $\hat d_{ki}(x) \ge 0$ for all $i = 1, 2, 3$.
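The pairwise rule in (e) can be sketched directly from the linear discriminant scores listed in part (d). The code below evaluates the three scores at $x_0' = [3.5 \;\; 1.75]$, forms the differences $\hat d_{ki}(x_0)$, and picks the population whose differences are all nonnegative. The variable names are hypothetical, and small differences from the printed values are expected because the coefficients are rounded to two decimals.

```python
# Linear discriminant scores from part (d), as functions of (x2, x4).
scores = {
    1: lambda x2, x4: 36.02 * x2 - 22.26 * x4 - 59.00,
    2: lambda x2, x4: 19.31 * x2 + 16.58 * x4 - 37.73,
    3: lambda x2, x4: 15.49 * x2 + 36.28 * x4 - 59.78,
}

x0 = (3.5, 1.75)
d = {k: f(*x0) for k, f in scores.items()}

# Pairwise differences d_ki(x0) = d_k(x0) - d_i(x0) for i != k.
d_ki = {(k, i): d[k] - d[i] for k in d for i in d if i != k}

# Allocate to the population k whose differences are all nonnegative.
winner = next(k for k in d if all(d_ki[(k, i)] >= 0 for i in d if i != k))

print({k: round(v, 2) for k, v in d.items()}, winner)
```

Requiring every pairwise difference for population k to be nonnegative is the same as choosing the largest score, which is why this rule and the one used in (d) agree.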
Here is a table of $\hat d_{ki}(x_0)$ for $i, k = 1, 2, 3$:

           i = 1      i = 2      i = 3
k = 1      0          -30.74     -29.80
k = 2      30.74      0          0.94
k = 3      29.80      -0.94      0

Since $\hat d_{2i}(x_0) \ge 0$ for all $i \ne 2$, we allocate $x_0$ to $\pi_2$, using (11-52).

Here is the scatterplot of the data in the $(x_2, x_4)$ variable space, with the classification regions $R_1$, $R_2$, and $R_3$ delineated.

[Figure: scatterplot of $x_4$ against $x_2$ (sepal width) with the classification regions.]

(f) The APER $= \frac{5}{150} = .033$. $\hat E$(AER) $= \frac{6}{150} = .04$

11.28 (a) This is the plot of the data in the $(\log Y_1, \log Y_2)$ variable space:

[Figure: scatterplot of $\log(Y_2)$ against $\log(Y_1)$ for Setosa, Versicolor, and Virginica.]

The points of all three groups appear to follow roughly an ellipse-like pattern. However, the orientation of the ellipse appears to be different for the observations from $\pi_1$ (Iris setosa) than for the observations from $\pi_2$ and $\pi_3$. In $\pi_1$, there also appears to be an outlier, labelled with a "*".

(b), (c) Assuming equal covariance matrices and bivariate normal populations, these are the linear discriminant scores $\hat d_i(x)$ for $i = 1, 2, 3$.

For both variables $\log Y_1$ and $\log Y_2$:

population    $\hat d_i(x) = \bar x_i'S_{pooled}^{-1}x - \tfrac{1}{2}\bar x_i'S_{pooled}^{-1}\bar x_i$
$\pi_1$       $26.81 \log Y_1 + 28.90 \log Y_2 - 31.97$
$\pi_2$       $75.10 \log Y_1 + 13.82 \log Y_2 - 36.83$
$\pi_3$       $79.94 \log Y_1 + 10.80 \log Y_2 - 37.30$

For variable $\log Y_1$ only:

population    $\hat d_i(x) = \bar x_i'S_{pooled}^{-1}x - \tfrac{1}{2}\bar x_i'S_{pooled}^{-1}\bar x_i$
$\pi_1$       $40.90 \log Y_1 - 7.82$
$\pi_2$       $81.84 \log Y_1 - 31.30$
$\pi_3$       $85.20 \log Y_1 - 33.93$

For variable $\log Y_2$ only:

population    $\hat d_i(x) = \bar x_i'S_{pooled}^{-1}x - \tfrac{1}{2}\bar x_i'S_{pooled}^{-1}\bar x_i$
$\pi_1$       $30.93 \log Y_2 - 28.73$
$\pi_2$       $19.52 \log Y_2 - 11.44$
$\pi_3$       $16.87 \log Y_2 - 8.54$

Variables                   APER                       E(AER)
$\log Y_1$, $\log Y_2$      $\frac{26}{150} = .17$     $\frac{27}{150} = .18$
$\log Y_1$                  $\frac{49}{150} = .33$     $\frac{49}{150} = .33$
$\log Y_2$                  $\frac{34}{150} = .23$     $\frac{34}{150} = .23$

The preceding misclassification rates are not nearly as good as those in Example 11.12. Using "shape" is effective in discriminating $\pi_1$ (Iris setosa) from $\pi_2$ and $\pi_3$. It is not as good at discriminating $\pi_2$ from $\pi_3$, because of the overlap of $\pi_2$ and $\pi_3$ in both shape variables. Therefore, shape is not an effective discriminator of all three species of iris.

(d) Given the bivariate normal-like scatter and the relatively large samples, we do not expect the error rates in parts (b) and (c) to differ much.

11.29 (a) The calculated values of $\bar x_1$, $\bar x_2$, $\bar x_3$, $\bar x$, and $S_{pooled}$ agree with the results for these quantities given in Example 11.11.

(b)

$$W^{-1} = \begin{pmatrix} 0.348899 & 0.000193 \\ 0.000193 & 0.000003 \end{pmatrix}, \qquad B = \begin{pmatrix} 12.50 & 1518.74 \\ 1518.74 & 258471.12 \end{pmatrix}$$

The eigenvalues and scaled eigenvectors of $W^{-1}B$ are

$$\hat\lambda_1 = 5.646, \quad \hat a_1 = \begin{pmatrix} 5.009 \\ 0.009 \end{pmatrix}; \qquad \hat\lambda_2 = 0.191, \quad \hat a_2 = \begin{pmatrix} 0.207 \\ -0.014 \end{pmatrix}$$

To classify $x_0' = [3.21 \;\; 497]$, use (11-67) and compute

$$\sum_{j=1}^{2} \left[ \hat a_j'(x - \bar x_i) \right]^2, \quad i = 1, 2, 3$$

Allocate $x_0'$ to $\pi_k$ if

$$\sum_{j=1}^{2} \left[ \hat a_j'(x - \bar x_k) \right]^2 \le \sum_{j=1}^{2} \left[ \hat a_j'(x - \bar x_i) \right]^2 \quad \text{for all } i \ne k$$

For $x_0$,

k     $\sum_{j=1}^{2} [\hat a_j'(x - \bar x_k)]^2$
1     2.63
2     16.99
3     2.43

Thus, classify $x_0$ to $\pi_3$. This result agrees with the classification given in Example 11.11. Any time there are three populations with only two discriminants, classification results using Fisher's discriminants will be identical to those using the sample distance method of Example 11.11.
11.30 (a) Assuming normality and equal covariance matrices for the three populations $\pi_1$, $\pi_2$, and $\pi_3$, the minimum TPM rule is given by:

Allocate $x$ to $\pi_k$ if the linear discriminant score $\hat d_k(x)$ = the largest of $\hat d_1(x), \hat d_2(x), \hat d_3(x)$, where $\hat d_i(x)$ is given in the following table for $i = 1, 2, 3$.

population    $\hat d_i(x) = \bar x_i'S_{pooled}^{-1}x - \tfrac{1}{2}\bar x_i'S_{pooled}^{-1}\bar x_i$
$\pi_1$       $0.70x_1 + 0.58x_2 - 13.52x_3 + 6.93x_4 + 1.44x_5 - 44.78$
$\pi_2$       $1.85x_1 + 0.32x_2 - 12.78x_3 + 8.33x_4 - 0.14x_5 - 35.20$
$\pi_3$       $2.64x_1 + 0.20x_2 - 2.16x_3 + 5.39x_4 - 0.08x_5 - 23.61$

(b) Confusion matrix is:

                          Predicted Membership
                          $\pi_1$    $\pi_2$    $\pi_3$    Total
Actual        $\pi_1$     7          0          0          7
membership    $\pi_2$     1          10         0          11
              $\pi_3$     0          3          35         38

And the APER $= \frac{0 + 1 + 3}{56} = .071$

The holdout confusion matrix is: [entries illegible]

$\hat E$(AER) $= \frac{2 + 2 + 3}{56} = .125$

(c) One choice of transformations, $x_1$, $\log x_2$, $\sqrt{x_3}$, $\log x_4$, ..., appears to improve the normality of the data, but the classification rule from these data has slightly higher error rates than the rule derived from the original data. The error rates (APER, $\hat E$(AER)) for the linear discriminants in Example 11.14 are also slightly higher than those for the original data.

11.31 (a) The data look fairly normal.

[Figure: scatterplot of $X_2$ (Marine) against $X_1$ (Freshwater); o = Alaskan, + = Canadian.]

Although the covariances have different signs for the two groups, the correlations are small. Thus the assumption of bivariate normal distributions with equal covariance matrices does not seem unreasonable.

(b) The linear discriminant function is

$$\hat a' x - \hat m = -0.13x_1 + 0.052x_2 - 5.54$$

Classify an observation $x_0$ to $\pi_1$ (Alaskan salmon) if $\hat a' x_0 - \hat m \ge 0$ and classify $x_0$ to $\pi_2$ (Canadian salmon) otherwise.

Dot diagrams of the discriminant scores:

[Figure: dot diagrams of the discriminant scores for the Alaskan and Canadian salmon, on a common axis from -8.0 to 12.0.]
It does appear that growth ring diameters separate the two groups reasonably well, as APER = 7/100 = .07 and Ê(AER) = 7/100 = .07.

(c) Here are the bivariate plots of the data for male and female salmon separately.

[Figure: scatter plots of the salmon data for males and for females, with X1 (Freshwater) on the horizontal axis.]

For the male salmon, these are some summary statistics:

x̄1 = (100.3333, 436.1667)'    S1 = [  181.97101   -197.71015 ]
                                    [ -197.71015   1702.31884 ]

x̄2 = [illegible]               S2 = [  370.17210    141.64312 ]
                                    [  141.64312    760.65036 ]

The linear discriminant function for the male salmon only is

â'x - m̂ = -0.12x1 + 0.056x2 - 8.12

Classify an observation x0 to π1 (Alaskan salmon) if â'x0 - m̂ ≥ 0 and classify x0 to π2 (Canadian salmon) otherwise.

Using this classification rule, APER = .08 and Ê(AER) = .10.

For the female salmon, these are some summary statistics:

x̄1 = [illegible]               S1 = [  336.33385   -210.23231 ]
                                    [ -210.23231   1097.91539 ]

x̄2 = [illegible]               S2 = [  289.21846    120.64000 ]
                                    [  120.64000   1038.72000 ]

The linear discriminant function for the female salmon only is

â'x - m̂ = -0.13x1 + 0.05x2 - 2.66

Classify an observation x0 to π1 (Alaskan salmon) if â'x0 - m̂ ≥ 0 and classify x0 to π2 (Canadian salmon) otherwise.

Using this classification rule, APER = .06 and Ê(AER) = .06.

It is unlikely that gender is a useful discriminatory variable, as splitting the data into female and male salmon did not improve the classification results greatly.

11.32 (a) Here is the bivariate plot of the data for the two groups:

[Figure: scatter plot of X2 against X1; o = Noncarrier, + = Obligatory carrier.]
Because the points for both groups form fairly elliptical shapes, the bivariate normal assumption appears to be a reasonable one. Normal-score plots for each group confirm this.

(b) Assuming equal prior probabilities, the sample linear discriminant function is

â'x - m̂ = 19.32x1 - 17.12x2 + 3.56

Classify an observation x0 to π1 (Noncarriers) if â'x0 - m̂ ≥ 0 and classify x0 to π2 (Obligatory carriers) otherwise.

The holdout confusion matrix is

                         Predicted membership
                           π1      π2     Total
Actual membership   π1     26       4       30
                    π2      8      37       45

Ê(AER) = (4 + 8)/75 = .16

(c) The classification results for the 10 new cases using the discriminant function in part (b):

Case     x1        x2      â'x - m̂   Classification
  1    -0.112    -0.279      6.17         π1
  2    -0.059    -0.068      3.58         π1
  3     0.064     0.012      4.59         π1
  4    -0.043    -0.052      3.62         π1
  5    -0.050    -0.098      4.27         π1
  6    -0.094    -0.113      3.68         π1
  7    -0.123    -0.143      3.63         π1
  8    -0.011    -0.037      3.98         π1
  9    -0.210    -0.090      1.04         π1
 10    -0.126    -0.019      1.45         π1

(d) Assuming that the prior probability of obligatory carriers is 1/4 and that of noncarriers is 3/4, the sample linear discriminant function is

â'x - m̂ = 19.32x1 - 17.12x2 + 4.66

Classify an observation x0 to π1 (Noncarriers) if â'x0 - m̂ ≥ 0 and classify x0 to π2 (Obligatory carriers) otherwise.

The holdout confusion matrix gives Ê(AER) = 18/75 = 0.24.

The classification results for the 10 new cases using the discriminant function in part (d):

Case     x1        x2      â'x - m̂   Classification
  1    -0.112    -0.279      7.27         π1
  2    -0.059    -0.068      4.68         π1
  3     0.064     0.012      5.69         π1
  4    -0.043    -0.052      4.72         π1
  5    -0.050    -0.098      5.37         π1
  6    -0.094    -0.113      4.78         π1
  7    -0.123    -0.143      4.73         π1
  8    -0.011    -0.037      5.08         π1
  9    -0.210    -0.090      2.14         π1
 10    -0.126    -0.019      2.55         π1

11.33 Let x3 = YrHgt, x4 = FtFrBody, x5 = PrctFFB, x6 = Frame, x7 = BkFat, x8 = SaleHt, and x9 = SaleWt.
(a) For π1 = Angus, π2 = Hereford, and π3 = Simental, here are Fisher's linear discriminants:

d1 = -3737 + 126.88x3 - 0.48x4 + 19.08x5 - 205.22x6 + 275.84x7 + 28.15x8 - 0.03x9
d2 = -3686 + 127.70x3 - 0.47x4 + 18.65x5 - 206.18x6 + 265.33x7 + 26.80x8 - 0.03x9
d3 = -3881 + 128.08x3 - 0.48x4 + 19.59x5 - 206.36x6 + 245.50x7 + 29.47x8 - 0.03x9

When x0' = (50, 1000, 73, 7, .17, 54, 1525) we obtain d1 = 3596.31, d2 = 3593.32, and d3 = 3594.13, so assign the new observation to π2, Hereford.

This is the plot of the discriminant scores in the two-dimensional discriminant space:

[Figure: discriminant scores in the two-dimensional discriminant space, with y1-hat on the horizontal axis; groups: Angus, Hereford, Simental.]

(b) Here is the APER and Ê(AER) for different subsets of the variables:

Subset                          APER   Ê(AER)
x3, x4, x5, x6, x7, x8, x9      .13     .25
x4, x5, x7, x8                  .14     .20
x5, x7, x8                      .21     .24
x4, x5                          .43     .46
x4, x7                          .36     .39
x4, x8                          .32     .36
x7, x8                          .22     .22
x5, x7                          .25     .29
x5, x8                          .28     .32

11.34 For π1 = General Mills, π2 = Kellogg, and π3 = Quaker and assuming multivariate normal data with a common covariance matrix, equal costs, and equal priors, these are Fisher's linear discriminant functions:

d1 = .23x3 + 3.79x4 - 1.69x5 - .01x6 [sign illegible] 5.53x7 - 1.90x8 + 1.36x9 - 0.12x10 - 33.14
d2 = .32x3 + 4.15x4 - 3.62x5 - .02x6 [sign illegible] 9.20x7 [sign illegible] 2.07x8 + 1.50x9 - 0.20x10 - 43.07
d3 = .29x3 + 2.64x4 - 1.20x5 - .02x6 [sign illegible] 5.43x7 [sign illegible] 1.22x8 + .65x9 - 0.13x10 [constant illegible]

The Kellogg cereals appear to have high protein, fiber, and carbohydrates, and low fat. However, they also have high sugar. The Quaker cereals appear to have low sugar, but also have low protein and carbohydrates.

Here is a plot of the cereal data in two-dimensional discriminant space:

[Figure: cereal data in the two-dimensional discriminant space, with y1-hat on the horizontal axis; groups: General Mills, Kellogg, Quaker.]

11.35 (a) Scatter plot of tail length and snout to vent length follows.
It appears as if these variables will effectively discriminate gender but will be less successful in discriminating the age of the snakes.

[Figure: scatterplot of SntoVnLength vs TailLength.]

(b) Linear Discriminant Function for Groups

                Female      Male
Constant       -36.429   -41.501
SntoVnLength     0.039     0.163
TailLength       0.310    -0.046

Summary of Classification with Cross-validation

                   True Group
Put into Group   Female    Male
Female              34        2
Male                 3       27
Total N             37       29
N correct           34       27
Proportion       0.919    0.931

N = 66    N Correct = 61    Proportion Correct = 0.924

Ê(AER) = 1 - .924 = .076, or 7.6%

(c) Linear Discriminant Function for Groups

                    2         3         4
Constant       -112.44   -145.76   -193.14
SntoVnLength      0.33      0.38      0.45
TailLength        0.53      0.60      0.65

Summary of Classification with Cross-validation

                   True Group
Put into Group     2       3       4
2                 13       2       0
3                  4      21       2
4                  0       3      21
Total N           17      26      23
N correct         13      21      21
Proportion     0.765   0.808   0.913

N = 66    N Correct = 55    Proportion Correct = 0.833

Ê(AER) = 1 - .833 = .167, or 16.7%

(d) Linear Discriminant Function for Groups

                    2         3         4
Constant        -79.11   -102.76   -141.94
SntoVnLength      0.36      0.41      0.48

Summary of Classification with Cross-validation

                   True Group
Put into Group     2       3       4
2                 14       1       0
3                  3      21       4
4                  0       4      19
Total N           17      26      23
N correct         14      21      19
Proportion     0.824   0.808   0.826

N = 66    N Correct = 54    Proportion Correct = 0.818

Ê(AER) = 1 - .818 = .182, or 18.2%

Using only snout to vent length to discriminate the ages of the snakes is about as effective as using both tail length and snout to vent length, although in both cases there is a reasonably high proportion of misclassifications.

11.36 Logistic Regression Table

                                                        Odds      95% CI
Predictor         Coef      SE Coef       Z       P    Ratio   Lower   Upper
Constant        3.92484     6.31500     0.62   0.534
Freshwater      0.126051    0.0358536   3.52   0.000    1.13    1.06    1.22
Marine         -0.0485441   0.0145240  -3.34   0.001    0.95    0.93    0.98

Log-Likelihood = -19.394
Test that all slopes are zero: G = 99.841, DF = 2, P-Value = 0.000

The regression is significant (p-value = 0.000) and, retaining the constant term, the fitted function is

ln( p̂(z) / (1 - p̂(z)) ) = 3.925 + .126(freshwater growth) - .049(marine growth)

Consequently: assign z to population 2 (Canadian) if ln( p̂(z) / (1 - p̂(z)) ) ≥ 0; otherwise assign z to population 1 (Alaskan).

The confusion matrix follows.

              Predicted
              1      2     Total
Actual   1   46      4       50
         2    3     47       50

APER = 7/100 = .07, or 7%. This is the same APER produced by the linear classification function in Example 11.8.

Chapter 12

12.1 a) Codes: South, Yes, Democrat → 1; non-South, No, Republican → 0.

e.g. Reagan - Carter:

                 Carter
                 1    0
Reagan     1     1    0
           0     2    2

(a+d)/p = 3/5 = .60

Pair    Coefficient (a+d)/p
R-C         .6
R-F         .4
R-N         .6
R-J          0
R-K         .6
C-F          0
C-N         .2
C-J         .4
C-K         .6
F-N         .8
F-J         .6
F-K         .4
N-J         .4
N-K         .6
J-K         .4

12.1 b)

            Coefficient          Rank order
Pair      1      2      3      (coefficients 1, 2, and 3)
R-C      .6    .75    .429       4.5
R-F      .4    .571   .25       10
R-N      .6    .75    .429       4.5
R-J       0     0      0        14.5
R-K      .6    .75    .429       4.5
C-F       0     0      0        14.5
C-N      .2    .333   .111      13
C-J      .4    .571   .25       10
C-K      .6    .75    .429       4.5
F-N      .8    .889   .667       1
F-J      .6    .75    .429       4.5
F-K      .4    .571   .25       10
N-J      .4    .571   .25       10
N-K      .6    .75    .429       4.5
J-K      .4    .571   .25       10

12.2

            Coefficient          Rank order
Pair      5      6      7      (coefficients 5, 6, and 7)
R-C     .333   .5     .2         9
R-F      0      0      0        14
R-N     .333   .5     .2         9
R-J      0      0      0        14
R-K     .333   .5     .2         9
C-F      0      0      0        14
C-N     .2     .333   .111      12
C-J     .4     .571   .25        6
C-K     .5     .667   .333       3
F-N     .667   .8     .5         1
F-J     .5     .667   .333       3
F-K     .25    .4     .143      11
N-J     .4     .571   .25        6
N-K     .5     .667   .333       3
J-K     .4     .571   .25        6

12.3 x̄ = (a+b)/p and ȳ = (a+c)/p.

Σ(xi - x̄)² = (a+b)(1 - (a+b)/p)² + (c+d)(0 - (a+b)/p)² = (c+d)(a+b)/p

Σ(yi - ȳ)² = (a+c)(1 - (a+c)/p)² + (b+d)(0 - (a+c)/p)² = (a+c)(b+d)/p

Σ(xi - x̄)(yi - ȳ) = Σ(xi yi - yi x̄ - xi ȳ + x̄ ȳ)
                  = a - (a+c)(a+b)/p - (a+b)(a+c)/p + (a+b)(a+c)/p
                  = [a(a+b+c+d) - (a+c)(a+b)]/p
                  = (ad - bc)/p

Therefore

r = [(ad - bc)/p] / [(c+d)(a+b)(a+c)(b+d)/p²]^(1/2) = (ad - bc) / [(a+b)(c+d)(a+c)(b+d)]^(1/2)

12.4 Let c1 = (a+d)/p, c2 = 2(a+d)/(2(a+d) + b + c), and c3 = (a+d)/((a+d) + 2(b+c)). Then

c2 = 2/(c1^(-1) + 1), so c2 increases as c1 increases;
c3 = 1/(1 + 2(c1^(-1) - 1)), so c3 increases as c1 increases.

Finally, c2 = 4/(c3^(-1) + 3), so c2 increases as c3 increases.

12.5 a) Single linkage

       1    2    3    4
1      0
2      1    0
3     11    2    0
4      5    3    4    0

       (12)   3    4
(12)    0
3       2     0
4       3     4    0

        (123)   4
(123)     0
4         3     0

[Dendrogram: items 1 and 2 join at distance 1, item 3 joins at distance 2, and item 4 joins at distance 3.]

12.5 b) Complete linkage and c) average linkage

[Dendrograms for complete linkage and average linkage.]

12.6 [Dendrograms for complete linkage, average linkage, and single linkage.]

All three methods produce the same hierarchical arrangements. Item 3 is somewhat different from the other items.

12.7 Treating correlations as similarity coefficients, we have:

Single linkage

s45 = .68
s(45)1 = max(s41, s51) = .16
s(45)2 = .32, s(45)3 = .18, and so forth.

[Dendrogram for single linkage.]

Complete linkage

s45 = .68
s(45)1 = min(s41, s51) = .12
s(45)2 = .21, s(45)3 = .15, and so forth.

[Dendrogram for complete linkage.]

Both methods arrive at nearly the same clustering.

12.8

       1    2    3    4    5
1      0
2      9    0
3      3    7    0
4      6    5    9    0
5     11   10    2    8    0

         1     2    (35)    4
1        0
2        9     0
(35)     7    8.5    0
4        6     5    8.5     0

[Dendrogram for average linkage.]

Average linkage produces results similar to single linkage.

12.9 [Dendrograms for single linkage, complete linkage, and average linkage.]

Although the vertical scales are different, all three linkage methods produce the same groupings. (Note different vertical scales.)

12.10 (a) ESS1 = (2 - 2)² = 0, ESS2 = (1 - 1)² = 0, ESS3 = (5 - 5)² = 0, and ESS4 = (8 - 8)² = 0.

(b) At step 2

Clusters                 Increase in ESS
{12}  {3}   {4}               .5
{13}  {2}   {4}              4.5
{14}  {2}   {3}             18.0
{1}   {23}  {4}              8.0
{1}   {24}  {3}             24.5
{1}   {2}   {34}             4.5

(c) At step 3

Clusters                 Increase in ESS
{12}   {34}                  5.0
{123}  {4}                   8.7

Finally, all four together have ESS = (2 - 4)² + (1 - 4)² + (5 - 4)² + (8 - 4)² = 30.

12.11 K = 2 initial clusters (AB) and (CD)

         x̄1    x̄2
(AB)      3     1
(CD)      1     1

Final clusters (AD) and (BC)

         x̄1    x̄2
(AD)      4    2.5
(BC)      0    -.5

Squared distance to group centroids

                       Item
Cluster      A        B        C        D
(AD)        3.25    29.25    27.25     3.25
(BC)       45.25     3.25     3.25    11.25

12.12 K = 2 initial clusters (AC) and (BD)

         x̄1    x̄2
(AC)      3    .5
(BD)     -2   -.5

Final clusters (A) and (BCD)

          x̄1    x̄2
(A)        5     3
(BCD)     -1    -1

Squared distance to group centroids

                    Item
Cluster      A     B     C     D
(A)          0    40    41    89
(BCD)       52     4     5     5

As expected, this result is the same as the result in Example 12.11. A graph of the items supports the (A) and (BCD) groupings.

12.13 K = 2 initial clusters (AB) and (CD)

         x̄1    x̄2
(AB)      2     2
(CD)     -1    -2

Final clusters (A) and (BCD)

          x̄1    x̄2
(A)        5     3
(BCD)     -1    -1

Squared distance to group centroids

                    Item
Cluster      A     B     C     D
(A)          0    40    41    89
(BCD)       52     4     5     5

The final clusters (A) and (BCD) are the same as they are in Example 12.11.
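The reassignment in 12.13 can be checked numerically. The following is a minimal sketch, not the manual's own computation: the item coordinates A = (5, 3), B = (-1, 1), C = (1, -2), D = (-3, -2) are an assumption, chosen because they reproduce the centroids and squared distances reported in 12.12 and 12.13, and the function names are hypothetical. It uses the simple rule of moving an item to the cluster with the nearest current centroid and recomputing the centroids after each move.

```python
# Assumed item coordinates (consistent with the centroids reported in 12.12 and 12.13).
items = {"A": (5.0, 3.0), "B": (-1.0, 1.0), "C": (1.0, -2.0), "D": (-3.0, -2.0)}


def centroid(names):
    """Mean vector of the named items."""
    pts = [items[n] for n in names]
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(initial):
    """Move each item to its nearest centroid, updating centroids after every move."""
    clusters = [list(c) for c in initial]
    moved = True
    while moved:
        moved = False
        for name in items:
            current = next(i for i, c in enumerate(clusters) if name in c)
            if len(clusters[current]) == 1:
                continue  # never empty a cluster
            dists = [sqdist(items[name], centroid(c)) for c in clusters]
            best = min(range(len(clusters)), key=dists.__getitem__)
            if best != current:
                clusters[current].remove(name)
                clusters[best].append(name)
                moved = True
    return [sorted(c) for c in clusters]


# Start from the initial clusters (AB) and (CD) of 12.13.
final = kmeans([("A", "B"), ("C", "D")])
centers = [centroid(c) for c in final]
# Squared distance of every item to each final centroid.
table = {name: [sqdist(pt, ctr) for ctr in centers] for name, pt in items.items()}

print(final)
for name, row in table.items():
    print(name, row)
```

Under these assumed coordinates only B changes cluster, which agrees with the single reassignment described for this exercise. The sketch is checked only for the (AB), (CD) start; other starting partitions may need the manual's own reassignment rule.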
In this case we start with the same initial groups and the first, and only, reassignment is the same. It makes no difference if you start at the top or bottom of the list of items.

12.14. (a) The Euclidean distances between pairs of cereal brands

      C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
C1    0.0
C2    116.0 0.0
C3    15.5 121.7 0.0
C4    6.4 117.9 10.0 0.0
C5    103.2 61.6 100.6 102.1 0.0
C6    72.8 44.1 78.4 74.4 54.3 0.0
C7    86.4 71.9 82.5 84.9 22.3 52.4 0.0
C8    15.3 121.5 1.4 10.1 100.6 78.3 82.4 0.0
C9    46.2 72.6 54.7 48.9 75.8 32.1 65.2 54.5 0.0
C10   54.9 123.0 68.9 59.5 134.7 87.8 122.5 68.8 65.7 0.0
C11   81.3 154.7 94.7 85.8 169.6 121.3 157.0 94.6 94.5 47.1 0.0
C12   42.3 114.2 31.3 38.5 81.1 75.3 60.2 31.0 59.8 92.9 121.9 0.0
C13   163.2 163.4 177.9 168.1 208.0 155.4 205.1 177.9 148.9 112.4 110.7 198.0
C14   46.7 90.8 60.4 51.5 103.8 55.4 92.9 60.3 28.5 44.3 67.5 75.9
C15   60.3 170.5 50.0 56.6 141.5 127.8 121.5 50.0 103.8 101.7 115.6 62.0
C16   46.9 90.8 60.5 51.6 103.8 55.5 92.9 60.3 28.5 44.3 67.6 75.8
C17   23.1 101.0 21.6 21.6 81.4 58.5 63.6 21.4 37.5 70.1 100.7 26.0
C18   265.7 221.1 280.0 270.6 278.9 233.9 283.3 280.0 235.6 227.7 218.? 294.5
C19   68.2 181.9 60.5 65.2 155.9 138.7 136.2 60.5 113.2 102.7 111.7 76.6
C20   116.6 71.0 113.2 115.3 19.7 69.9 32.1 113.1 89.3 150.5 183.5 90.6
C21   103.0 217.7 96.6 100.6 191.7 174.7 171.6 96.6 148.1 129.7 130.5 111.7
C22   98.6 160.1 112.6 103.4 181.3 130.5 170.2 112.6 106.9 54.1 22.5 139.2
C23   58.0 102.8 49.1 54.9 62.4 68.1 41.3 48.9 61.2 105.4 136.9 20.7
C24   68.1 181.8 60.4 65.2 155.8 138.7 136.1 60.4 113.1 102.7 111.6 76.5
C25   49.4 121.0 36.2 44.8 82.5 82.1 62.8 36.2 68.9 101.7 130.2 14.7
C26   182.8 290.3 186.0 183.8 285.6 250.4 267.2 185.9 220.2 173.8 145.7 210.7
C27   134.7 99.9 148.2 139.1 150.9 101.1 152.2 148.2 104.2 99.6 113.7 160.9
C28   16.1 128.3 14.2 14.2 111.1 85.7 92.3 13.7 59.2 63.5 86.3 39.4
C29   107.5 159.0 120.3 111.6 180.7 132.1 170.7 120.3 116.0 54.1 64.6 144.1
C30   33.5 120.1 21.2 29.2 90.7 78.8 71.2 21.0 61.7 83.1 113.7 17.2
C31   78.9 80.5 90.9 82.8 108.5 59.2 103.1 90.8 56.9 52.6 90.6 101.7
C32   32.1 122.6 43.5 36.0 120.8 83.1 105.0 43.3 51.3 50.9 60.0 65.9
C33   143.1 68.0 141.3 142.4 42.0 84.5 61.1 141.2 109.8 170.6 203.8 120.8
C34   173.0 157.7 187.8 177.9 207.5 155.6 206.8 187.8 151.8 127.0 123.8 205.9
C35   116.2 70.4 112.7 114.9 16.9 69.2 30.4 112.6 89.9 148.8 183.8 90.0
C36   114.1 230.0 111.1 112.9 210.2 186.9 190.8 111.1 158.8 129.8 122.7 131.2
C37   53.1 78.2 51.4 52.4 51.6 41.3 34.2 51.1 38.1 91.1 124.5 36.6
C38   54.2 100.4 45.8 51.0 61.8 63.5 43.5 45.8 59.0 99.2 133.6 25.8
C39   48.3 93.5 42.5 45.9 61.0 ?5.1 43.3 42.5 49.6 90.7 125.9 27.3
C40   40.6 140.9 51.6 44.3 139.8 100.7 123.8 51.4 70.3 44.1 46.2 79.4
C41   197.8 309.6 194.3 196.6 288.1 268.0 268.1 194.3 237.8 215.5 194.4 209.9
C42   191.1 301.3 190.3 190.8 286.6 260.4 267.3 190.2 229.3 200.8 174.? 209.7
C43   185.2 290.7 189.2 186.6 288.1 251.4 270.2 189.2 221.4 173.6 143.7 214.8

      C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24
C13   0.0
C14   127.4 0.0
C15   213.2 105.0 0.0
C16   127.4 1.0 105.0 0.0
C17   173.1 51.3 69.7 51.3 0.0
C18   134.4 220.7 321.2 220.8 270.1 0.0
C19   212.5 110.8 16.2 110.9 81.2 322.6 0.0
C20   223.2 117.3 151.2 117.3 94.3 288.6 166.1 0.0
C21   234.6 142.8 50.3 142.8 117.2 347.4 36.5 201.2 0.0
C22   91.5 79.1 135.2 79.2 116.8 204.1 131.1 195.9 148.8 0.0
C23   204.9 83.3 81.1 83.2 36.8 295.9 96.2 70.9 130.9 153.2 0.0
C24   212.5 110.7 16.0 110.8 81.1 322.6 1.4 166.0 36.5 131.1 96.1 0.0
C25   207.5 86.0 60.0 86.1 35.2 303.9 75.3 91.8 110.1 147.9 23.2 75.3
C26   233.8 200.3 159.3 200.3 204.2 342.0 143.8 297.3 121.0 152.7 231.2 143.8
C27   67.1 92.1 193.3 92.2 136.5 141.1 197.4 164.6 227.0 105.1 162.0 197.4
C28   174.0 59.3 46.7 59.3 30.1 278.3 55.0 123.1 89.7 104.7 58.5 54.9
C29   83.1 93.3 144.4 93.3 122.6 214.5 141.7 197.4 160.4 51.8 156.3 141.7
C30   191.2 73.8 53.3 73.8 24.6 293.2 66.8 102.5 102.5 130.6 34.3 66.8
C31   104.8 49.4 135.7 49.3 78.9 207.0 141.7 124.7 173.2 91.2 104.5 141.7
C32   150.5 37.5 75.3 37.5 47.4 248.1 78.9 132.4 108.8 79.4 80.7 78.7
C33   230.0 136.6 181.8 136.5 121.5 283.5 196.3 31.7 231.9 214.1 101.6 196.3
C34   30.1 132.2 226.4 132.3 180.7 107.3 226.8 221.3 250.8 107.0 210.8 226.8
C35   221.6 117.8 150.9 117.7 93.7 289.9 165.8 10.1 201.0 195.7 70.2 165.7
C36   226.8 148.7 71.8 148.7 131.9 341.0 56.0 221.0 28.8 139.2 151.3 56.0
C37   182.4 63.6 95.5 63.6 31.1 270.0 108.7 64.4 144.7 138.6 27.7 108.6
C38   198.4 80.8 81.3 80.9 34.1 292.4 95.7 74.1 131.3 148.9 17.1 95.7
C39   188.6 71.5 83.1 71.6 27.4 282.6 96.8 74.6 132.8 140.6 21.8 96.7
C40   146.6 52.5 71.8 52.6 62.1 252.4 70.9 152.7 96.8 66.6 96.6 70.8
C41   301.1 227.1 153.1 227.1 213.8 401.5 140.2 295.1 108.9 210.5 228.7 140.1
C42   277.2 214.8 154.9 214.9 209.3 375.5 140.8 294.9 112.9 188.1 229.2 140.7
C43   229.1 200.6 165.0 200.7 207.1 335.7 149.7 300.2 128.8 149.4 235.2 149.6

      C25 C26 C27 C28 C29 C30 C31 C32 C33 C34 C35 C36
C25   0.0
C26   213.9 0.0
C27   170.1 257.2 0.0
C28   46.5 175.0 148.2 0.0
C29   152.5 172.5 103.0 113.8 0.0
C30   20.8 200.3 158.2 30.2 132.8 0.0
C31   111.4 225.7 66.9 91.2 79.1 97.2 0.0
C32   75.0 170.7 126.2 36.4 101.6 62.2 81.5 0.0
C33   122.5 324.8 167.2 151.1 214.1 131.9 137.3 157.0 0.0
C34   215.5 253.2 58.3 184.8 107.8 201.1 112.6 158.5 225.1 0.0
C35   91.3 297.5 163.7 122.7 194.6 101.0 121.9 133.6 33.3 220.7 0.0
C36   131.0 93.2 227.1 102.7 152.9 120.7 178.1 114.7 250.8 244.4 220.8 0.0
C37   43.5 234.6 136.1 60.4 141.6 44.5 81.7 72.4 91.2 186.6 63.7 161.4
C38   24.7 230.4 156.4 57.3 148.9 30.7 97.7 81.1 103.2 205.3 72.0 150.5
C39   30.1 227.7 146.5 53.6 140.6 30.7 87.9 74.5 102.6 195.3 72.6 150.5
C40   86.9 150.1 132.6 41.9 88.9 71.1 88.4 24.1 177.4 158.4 153.0 98.1
C41   209.3 98.9 305.4 186.0 236.3 204.2 264.3 190.2 325.4 315.9 297.0 96.8
C42   210.6 71.2 286.8 180.8 216.6 203.0 251.2 179.4 324.1 292.0 296.8 ?4.0
C43   218.2 17.7 254.4 178.3 170.3 204.2 225.5 172.3 327.1 248.4 300.5 100.9

      C37 C38 C39 C40 C41 C42 C43
C37   0.0
C38   27.0 0.0
C39   20.2 10.1 0.0
C40   90.2 94.6 88.5 0.0
C41   241.1 232.1 233.1 177.4 0.0
C42   237.9 231.7 231.2 164.5 35.2 0.0
C43   237.2 233.9 230.8 151.2 108.2 78.7 0.0

(b) Complete linkage produces results similar to single linkage.

[Dendrograms for single linkage and complete linkage.]

12.15. In the K-means method, we use the means of the clusters identified by average linkage as the initial cluster centers.

Final cluster centers for K = 4

                               Variable
Cluster      1      2      3       4      5      6      7       8
1          110.0   2.1    0.9   215.0    0.7   15.3    7.9    50.0
2          114.4   3.1    1.7   171.1    2.8   15.0    6.6   123.9
3           86.7   2.3    0.5    26.7    1.4   10.0    5.8    55.8
4          112.5   3.2    0.8   225.0    5.8   12.5   10.8   245.0

Distances between centers

         1       2       3       4
1       0.0
2      86.1     0.0
3     190.0   162.2     0.0
4     195.4   132.7   275.4     0.0

K-means cluster memberships

K = 2
Cluster 1: C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C12 C14 C15 C16 C17 C19 C20 C21 C23 C24 C25 C26 C28 C30 C32 C33 C35 C36 C37 C38 C39 C40 C41 C42 C43
Cluster 2: C11 C13 C18 C22 C27 C29 C31 C34

K = 3
Cluster 1: C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C12 C14 C15 C16 C17 C19 C20 C23 C24 C25 C28 C30 C31 C32 C33 C35 C37 C38 C39 C40
Cluster 2: C21 C26 C36 C41 C42 C43
Cluster 3: C11 C13 C18 C22 C27 C29 C34

K = 4
Cluster 1: C1 C2 C3 C4 C5 C6 C7 C8 C9 C12 C15 C17 C19 C20 C23 C24 C25 C28 C30 C33 C35 C37 C38 C39
Cluster 2: C10 C11 C14 C16 C22 C29 C31 C32 C40
Cluster 3: C21 C26 C36 C41 C42 C43
Cluster 4: C13 C18 C27 C34

Linkage cluster memberships, 4 clusters

Single linkage
Cluster 1:  C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C19 C20 C21 C22 C23 C24 C25 C27 C28 C29 C30 C31 C32 C33 C34 C35 C36 C37 C38 C39 C40
Cluster 18: C18
Cluster 26: C26 C43
Cluster 41: C41 C42

Complete linkage
Cluster 1:  C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C12 C14 C16 C17 C20 C23 C25 C28 C30 C31 C32 C33 C35 C37 C38 C39 C40
Cluster 11: C11 C13 C22 C27 C29 C34
Cluster 15: C15 C19 C21 C24 C26 C36 C41 C42 C43
Cluster 18: C18

12.16 (a), (b) Dendrograms for single linkage and complete linkage follow. The dendrograms are similar; as examples, in both procedures, countries 11, 40 and 46 form a group at a relatively high level of distance, and countries 4, 27, 37, 43, 25 and 44 form a group at a relatively small distance.
The clusters are more apparent in the complete linkage dendrogram and, depending on the distance level, might have as few as 3 or 4 clusters or as many as 6 or 7 clusters.

(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem reasonable and are consistent with the results for the linkage procedures. Depending on use, K = 4 may be an adequate number of clusters.

[Data display: cluster memberships of the 54 countries for K = 6 and K = 4.]

Number of clusters: 4

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1        11            298.660            4.494              9.049
Cluster2        20            318.294            3.613              6.800
Cluster3         3            490.251           11.895             16.915
Cluster4        20            182.870            2.681              7.024

Number of clusters: 6

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1        10             90.154            2.884              4.008
Cluster2         8             22.813            1.??3              2.428
Cluster3         8            116.518            3.346              6.651
Cluster4        10             78.508            2.513              5.977
Cluster5         3            490.251           11.895             16.915
Cluster6        15            128.783            2.669              5.521

12.17 (a), (b) Dendrograms for single linkage and complete linkage follow. The dendrograms are similar; as examples, in both procedures, countries 11 and 46 form a group at a relatively high level of distance, and countries 2, 19, 35, 4, 48 and 27 form a group at a relatively small distance. The clusters are more apparent in the complete linkage dendrogram and, depending on the distance level, might have as few as 3 or 4 clusters or as many as 6 or 7 clusters.

[Dendrograms for single linkage and complete linkage; horizontal axis: Countries.]
(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem reasonable and are consistent with the results for the linkage procedures. Depending on use, K = 4 may be an adequate number of clusters. The results for the men are similar to the results for the women.

[Data display: cluster memberships of the 54 countries for K = 4 and K = 6.]

Number of clusters: 4

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1        10            169.042            3.910              5.950
Cluster2        21             73.281            1.684              3.041
Cluster3         2             49.174            4.959              4.959
Cluster4        21             56.295            1.481              3.249

Number of clusters: 6

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1        12             26.806            1.418              2.413
Cluster2        15             18.764            1.048              1.844
Cluster3        10            169.042            3.910              5.950
Cluster4        10             10.137            0.935              1.559
Cluster5         2             49.174            4.959              4.959
Cluster6         5              6.451            1.092              1.?06

12.18.

[Figure: multidimensional scaling configuration of the cities (Superior; St. Paul, MN; Marshfield; Wausau; Appleton; Dubuque, IA; Madison; Monroe; Ft. Atkinson; Beloit; Milwaukee; Chicago, IL), with north indicated; the handwritten stress values are illegible.]

The multidimensional scaling configuration is consistent with the locations of these cities on a map.

12.19. The stress of final configuration for q = 5 is 0.000. The sites in 5 dimensions and the plot of the sites in two dimensions are

COORDINATES IN 5 DIMENSIONS

VARIABLE    PLOT    DIMENSION 1
P1980918     A          .51
P1931131     B        -1.32
P1550960     C          .47
P1530987     D          .39
P1361024     E          .23
P1351005     F          .47
P1340945     G          .58
P1311137     H        -1.12
P1301062     I         -.22

DIMENSION 2 through 5 entries, in extraction order: -.28 -.68 .12 -.05 -.02 .69 .30 .06 -.07 .34 .09 .10 .05 .30 -.32 .12 .14 -.22 -.14 -.28 -.35 .46 .18 -.10 -.31 .~5 -.01 -1.12 .61 -.70 -.06 .01 .24 .62 .19 .05

[Figure: plot of the sites A to I in two dimensions, DIMENSION 2 against DIMENSION 1.]

The results show a definite time pattern (where time of site is frequently determined by C-14 and tree ring (lumber in great houses) dating).

12.20. A correspondence analysis of the mental health-socioeconomic data

[Figure: correspondence analysis plot of the mental health-socioeconomic data; λ1² = 0.026, λ2² = 0.0014; points for the mental health categories and for the socioeconomic classes A to E.]

U
-0.6922   0.1539   0.5588   0.4300
-0.1100   0.3665  -0.7007   0.6022
 0.0411  -0.8809  -0.0659   0.4670
 0.7121   0.2570   0.4388   0.4841

V
-0.6266  -0.2313   0.0843  -0.3341
-0.1521  -0.2516  -0.5109  -0.6407
 0.0265   0.5490   0.5869  -0.5756
 0.4097   0.4668  -0.5519  -0.2297
 0.6448  -0.6032   0.2879  -0.3062

lambda                   0.1613   0.0371   0.0082   0.0000
Cumulative inertia       0.0260   0.0274   0.0275
Cumulative proportion    0.9475   0.9976   1.0000

The lowest economic class is located between moderate and impaired. The next lowest class is closest to impaired.

12.21. A correspondence analysis of the income and job satisfaction data

[Figure: correspondence analysis plot of the income and job satisfaction data; income groups < $25,000, $25,000 - $50,000, and > $50,000; satisfaction levels VD, SD, MS, VS.]
U
-0.6272  -0.2392   0.7412
 0.2956   0.8073   0.5107
 0.7206  -0.5394   0.4356

V
-0.6503  -0.6661  -0.3561
-0.1944   0.5933  -0.7758
-0.3400   0.3159   0.2253
 0.6510  -0.3233  -0.4696

lambda                   0.1069   0.0106   0.0000
Cumulative inertia       0.0114   0.0116
Cumulative proportion    0.9902   1.0000

Very satisfied is closest to the highest income group, and very dissatisfied is below the lowest income group. Satisfaction appears to increase with income.

12.22. A correspondence analysis of the Wisconsin forest data

[Figure: correspondence analysis plot of the Wisconsin forest data; λ1² = 0.537, λ2² = 0.096; species Ironwood, SugarMaple, Basswood, RedOak, AmericanElm, WhiteOak, BurOak, BlackOak; sites S1 to S10.]

U
-0.3877  -0.2108  -0.0616   0.4029  -0.0582   0.3265   0.4247  -0.1590
-0.3856  -0.2428  -0.0106   0.4345  -0.1950  -0.1968  -0.2635  -0.3835
-0.3495  -0.1821   0.4079  -0.5718   0.2343  -0.1167   0.3294  -0.1272
-0.3006   0.1355   0.0540  -0.2646   0.0006  -0.0826  -0.6644  -0.3192
-0.1108   0.5817  -0.4856  -0.1598  -0.2333   0.1607   0.0772  -0.0518
 0.2022   0.5400   0.4626   0.2687  -0.0978  -0.3943   0.2668  -0.3606
 0.1852  -0.0756  -0.5090  -0.0291   0.6026  -0.1955   0.1520  -0.5154
 0.3140   0.0644   0.3394   0.1567   0.3366   0.6573  -0.2507  -0.2267
 0.4200  -0.3484  -0.0394   0.1165  -0.0625  -0.3772  -0.1456   0.1381
 0.3549  -0.2897  -0.0345  -0.3393  -0.5994   0.2002   0.1262  -0.4907

V
-0.3904  -0.0831  -0.4781   0.4562  -0.0377   0.3369   0.4071  -0.3511
-0.5327  -0.4985   0.4080   0.0925  -0.0738  -0.3420  -0.2464  -0.3310
-0.1999   0.3889   0.4089  -0.3622   0.4391   0.3217   0.1808  -0.4260
 0.0698   0.5382  -0.1726   0.3181  -0.0544  -0.1596  -0.6122  -0.4138
-0.0820  -0.0151  -0.4271  -0.7086  -0.4160  -0.1685   0.0307  -0.3258
 0.4005   0.0831   0.1478   0.1866  -0.0042  -0.5895   0.5587  -0.3412
 0.3634  -0.4850  -0.3232  -0.0937   0.6298   0.0164  -0.2172  -0.2745
 0.4689  -0.2476   0.3150   0.0726  -0.4771   0.5142  -0.0763  -0.3412

lambda                   0.7326   0.3101   0.2685   0.2134   0.1052   0.0674   0.0623   0.0000
Cumulative inertia       0.5367   0.6329   0.7050   0.7506   0.7616   0.7662   0.7700
Cumulative proportion    0.6970   0.8219   0.9155   0.9747   0.9891   0.9950   1.0000

12.23. We construct a biplot of the pottery type-site data, with row proportions as variables.

S
 0.0511  -0.0059  -0.0390  -0.0061
-0.0059   0.0084  -0.0051   0.0025
-0.0390  -0.0051   0.0628  -0.0187
-0.0061   0.0025  -0.0187   0.0223

Eigenvectors of S
 0.6233   0.5853   0.1374  -0.5
 0.0064  -0.2385  -0.8325  -0.5
-0.7694   0.3464   0.1951  -0.5
 0.1396  -0.6932   0.5000  -0.5

Eigenvalues of S
0.0978   0.0376   0.0091   0.0000

                     pc1      pc2      pc3     pc4
St. Dev.           0.3128   0.1940   0.0952     0
Prop. of Var.      0.6769   0.2604   0.0627     0
Cumulative Prop.   0.6769   0.9373   1.0000     1

As in the correspondence analysis.

12.24. We construct a biplot of the mental health-socioeconomic data, with column proportions as variables.

[Figure: biplot of the mental health-socioeconomic data; axes Comp. 1 and Comp. 2; labels Well, Mild, Moderate, Impaired and A to E.]

S
 0.003089   0.000809  -0.000413  -0.003485
 0.000809   0.000329  -0.000284  -0.000853
-0.000413  -0.000284   0.000379   0.000318
-0.003485  -0.000853   0.000318   0.004021

Eigenvectors of S
-0.6487   0.0837  -0.5676   0.5
-0.1685   0.4764   0.7033   0.5
 0.0794  -0.8320   0.2270   0.5
 0.7379   0.2719  -0.3628   0.5

Eigenvalues of S
0.007314   0.000480   0.000024   0.000000

                     pc1      pc2      pc3     pc4
St. Dev.           0.0855   0.0219   0.0049     0
Prop. of Var.      0.9355   0.0614   0.0031     0
Cumulative Prop.   0.9355   0.9969   1.0000     1

The biplot gives similar locations for health and socioeconomic status. A reflection about the 45 degree line would make them appear more alike.

12.25. A Procrustes analysis of archaeological data

[Figure: a two-dimensional representation of archaeological sites produced by metric multidimensional scaling; axes First Dimension and Second Dimension.]

[Figure: a two-dimensional representation of archaeological sites produced by nonmetric multidimensional scaling; axes First Dimension and Second Dimension.]

Site          Metric MDS          Nonmetric MDS
P1980918    -0.512   -0.278      -0.276   -0.829
P1931131     1.318    0.692       1.469    0.703
P1550960    -0.470   -0.071      -0.545   -0.156
P1530987    -0.338   -0.048      -0.387    0.088
P1361024    -0.234    0.296      -0.137    0.379
P1351005    -0.469    0.137      -0.642    0.387
P1340945    -0.581   -0.349      -0.889   -0.409
P1311137     1.118   -1.122       1.262   -0.989
P1301062     0.216    0.608       0.096    0.963

U
-0.9893  -0.1459
-0.1459   0.9893

V
-0.9977  -0.0679
-0.0679   0.9977

Q
 0.9969   0.0784
-0.0784   0.9969

Lambda
4.7819   0.000
0.0000   2.715

To better align the metric and nonmetric solutions, we multiply the nonmetric scaling solution by the orthogonal matrix Q. This corresponds to clockwise rotation of the nonmetric solution by 4.5 degrees. After rotation, the sum of squared distances, 0.803, is reduced to the Procrustes measure of fit PR² = 0.756.

12.26 The dendrograms for clustering Mali Family Farms are given below for average linkage and Ward's method. The dendrograms are similar but a moderate number of distinct clusters is more apparent in the Ward's method dendrogram than the average linkage dendrogram.
Both dendrograms suggest there may be as few as 4 clusters (indicated by the checkmarks in the figures) or perhaps as many as 7 or 8 clusters. Reading the "right" number of clusters from either dendrogram would depend on the use and require some subject matter knowledge.

[Figure: Dendrogram. Average Linkage, Euclidean Distance; Mali Family Farms. Horizontal axis: Farms.]

[Figure: Dendrogram. Ward Linkage, Euclidean Distance; Mali Family Farms. Horizontal axis: Farms.]

12.27 If average linkage and Ward's method clustering is used with the standardized Mali Family Farm observations, the results are somewhat different from those using the original observations and different from one another. The dendrograms follow. There could be as few as 4 clusters (indicated by the checkmarks in the figures) or there could be as many as 8 or 9 clusters or more. The distinct clusters are more clearly delineated in the Ward's method dendrogram and, if we focus attention on the 4 marked clusters, we see the two procedures produce quite different results.

[Figure: Dendrogram. Average Linkage, Euclidean Dist; Mali Family Farms (Standardized). Horizontal axis: Farms.]

[Figure: Dendrogram. Ward Linkage, Euclidean Dist; Mali Family Farms.]

12.28 The results for K = 5 and K = 6 clusters follow. The results seem reasonable and are similar to the results for Ward's method considered in Exercise 12.26.
Note that as the number of clusters increases from 5 to 6, cluster 1 in the K = 5 solution is partitioned into two clusters, 1 and 6, in the K = 6 solution; there is no change in the other clusters. Although not shown, K = 4 is a reasonable solution as well.

Data Display: cluster membership of each of the 72 farms for K = 5 (ClustMem=5) and K = 6 (ClustMem=6).

Number of clusters: 5

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1     6             2431.094          18.498             33.076
Cluster2    11             4440.330          19.511             24.647
Cluster3    35             3298.539           8.878             21.053
Cluster4    12             1129.083           9.072             16.024
Cluster5     8             1943.156          15.030             19.619

Number of clusters: 6

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1     4              696.609          13.005             15.474
Cluster2    11             4440.330          19.511             24.647
Cluster3    35             3298.539           8.878             21.053
Cluster4    12             1129.083           9.072             16.024
Cluster5     8             1943.156          15.030             19.619
Cluster6     2             1005.125          22.418             22.418

Clusters 2, 3, 4 and 5 (checkmarked) are identical for the two choices of K.

12.29 The results for K = 5 and K = 6 clusters follow. The results seem reasonable and are similar to the results for Ward's method considered in Exercise 12.27. Note that as the number of clusters increases from 5 to 6, clusters 3 and 4 in the K = 5 solution lose 1 and 2 farms respectively to form cluster 6 in the K = 6 solution; there is no change in the other clusters. These results using standardized observations are somewhat different from the corresponding results using the original data.
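Why standardization can change a clustering is easy to see on a small scale. The sketch below uses four made-up observations (hypothetical values, not the Mali farm data) measured on two variables with very different spreads, and checks which observation is closest to the first one before and after each variable is divided by its sample standard deviation. The helper names `standardize` and `nearest_neighbor` are illustrative only.

```python
import math
import statistics

# Hypothetical observations: (small-scale variable, large-scale variable).
raw = [
    (1.0, 1000.0),
    (3.0, 1010.0),
    (1.2, 1060.0),
    (2.0, 1400.0),
]

def standardize(rows):
    """Center each column at its mean and scale by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, sds))
        for row in rows
    ]

def nearest_neighbor(rows, index):
    """Return the index of the row closest (Euclidean distance) to rows[index]."""
    distances = {
        j: math.dist(rows[index], rows[j])
        for j in range(len(rows))
        if j != index
    }
    return min(distances, key=distances.get)

standardized = standardize(raw)
print("nearest to observation 0, raw data:         ", nearest_neighbor(raw, 0))
print("nearest to observation 0, standardized data:", nearest_neighbor(standardized, 0))
```

With the raw values the large-scale variable dominates the distance, so the nearest neighbor is decided almost entirely by that variable; after standardization both variables contribute on a comparable footing and the nearest neighbor changes.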
It makes a difference whether standardized or unstandardized observations are used.

Data Display: cluster membership of each of the 72 farms for K = 5 (SdClusMem=5) and K = 6 (SdClusMem=6).

Number of clusters: 5

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1     4             14.050            1.568              3.288
Cluster2     6             56.727            2.703              4.259
Cluster3    35             55.318            1.211              1.993
Cluster4    20             84.099            1.954              3.172
Cluster5     7             63.071            2.970              3.482

Number of clusters: 6

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1     4             14.050            1.568              3.288
Cluster2     6             56.727            2.703              4.259
Cluster3    34             51.228            1.183              1.951
Cluster4    18             65.501            1.806              3.195
Cluster5     7             63.071            2.970              3.482
Cluster6     3              7.960            1.604              1.954

Clusters 1, 2 and 5 (checkmarked) are identical for the two choices of K.

12.30 The cumulative lift (gains) chart is shown below. The y-axis shows the percentage of positive responses. This is the percentage of the total possible positive responses (20,000). The x-axis shows the percentage of customers contacted, which is a fraction of the 100,000 total customers. With no model, if we contact 10% of the customers we would expect 10%, or 2,000 = .1 x 20,000, of the positive responses. Our response model predicts 6,000 or 30% of the positive responses if we contact the top 10,000 customers. Consequently, the y-values at x = 10% shown in the chart are 10% for baseline (no model) and 30% for the gain (lift) provided by the model.
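The arithmetic behind the two y-values at x = 10% can be sketched as follows. The totals (100,000 customers, 20,000 positive responses) and the 6,000 responses captured by the model come from the exercise; the function names are illustrative only, and the ratio labeled `lift` is the usual gain-over-baseline ratio, which the solution itself does not compute.

```python
# Totals quoted in Exercise 12.30.
TOTAL_CUSTOMERS = 100_000
TOTAL_POSITIVES = 20_000

def baseline_gain_pct(customers_contacted):
    """Percent of all positive responses expected with no model (random contact)."""
    expected_positives = customers_contacted * TOTAL_POSITIVES / TOTAL_CUSTOMERS
    return 100 * expected_positives / TOTAL_POSITIVES

def model_gain_pct(positives_captured):
    """Percent of all positive responses captured by the model's top customers."""
    return 100 * positives_captured / TOTAL_POSITIVES

def lift(positives_captured, customers_contacted):
    """Ratio of the model's gain to the baseline gain at the same contact depth."""
    return model_gain_pct(positives_captured) / baseline_gain_pct(customers_contacted)

# Top 10,000 customers (x = 10%): the model predicts 6,000 positive responses.
print(baseline_gain_pct(10_000))   # baseline y-value at x = 10%
print(model_gain_pct(6_000))       # gain y-value at x = 10%
print(lift(6_000, 10_000))         # gain relative to baseline
```

Repeating the same two calls at each contact depth, using the cumulative number of positive responses the model captures at that depth, gives the points on the two curves.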
Continuing this argument for other choices of x (% customers contacted) and cumulating the results produces the lift (gains) chart shown. We see, for example, if we contact the top 40% of the customers determined by the model, we expect to get 80% of the positive responses.

[Figure: Cumulative Gains Chart. Curves: Lift Curve and Baseline. Horizontal axis: % Customers Contacted (0 to 100). Vertical axis: percentage of positive responses (0 to 100).]

12.31 (a) The Mclust function, which selects the best overall model according to the BIC criterion, selects a mixture with four multivariate normal components. The four estimated centers are:

$\hat{\mu}_1 = (3.3188, 6.7044, 0.3526, 0.1418, 11.9742)'$
$\hat{\mu}_2 = (5.1806, 5.2871, 0.5910, 0.1794, 5.5369)'$
$\hat{\mu}_3 = (7.2454, 4.8099, 0.3290, 0.2431, 3.2834)'$
$\hat{\mu}_4 = (8.6893, 4.1730, 0.5158, 0.2445, 7.4846)'$

and the estimated covariance matrices turn out to be restricted to be of the form $\eta_k D$ where $D$ is a diagonal matrix. The estimated $D = \mathrm{diag}(11.2598, 2.7647, 0.3355, 0.0053, 18.0295)$ and the estimated scale factors are $\hat{\eta}_1 = 0.0319$, $\hat{\eta}_2 = 0.3732$, $\hat{\eta}_3 = 0.0909$, $\hat{\eta}_4 = 0.1073$. The estimated proportions are $p_1 = 0.1059$, $p_2 = 0.4986$, $p_3 = 0.1322$, $p_4 = 0.2633$. This minimum BIC model has BIC = -547.1408.

(b) The model chosen above has 4 multivariate normal components. These four components are shown in the matrix scatter plot where the observations have been classified into one of the four populations. The matrix scatter plot of the true classification is given in the next figure. Comparing the matrix scatter plot of the four group classification with the matrix scatter plot of the true classification, we see how the oil samples from the Upper sandstone are essentially split into two groups. This is clear from comparing the two scatter plots for $(x_1, x_2)$.

We also repeat the analysis using the me function to select a mixture distribution with K = 3 components. We further restrict the covariance matrices to satisfy $\Sigma_k = \eta_k D$.
The K = 3 groups selected by this function have estimated centers

$\hat{\mu}_1 = (5.3395, 5.2467, 0.5485, 0.1862, 5.2465)'$
$\hat{\mu}_2 = (8.5343, 4.2762, 0.4988, 0.2453, 6.6993)'$
$\hat{\mu}_3 = (3.3228, 6.7093, 0.3511, 0.1418, 11.9780)'$

and the estimated diagonal matrix $D = \mathrm{diag}(10.1535, 2.6295, 0.2969, 0.0052, 24.0955)$, with estimated scale parameters $\hat{\eta}_1 = 0.3702$, $\hat{\eta}_2 = 0.1315$, $\hat{\eta}_3 = 0.0314$, and resulting BIC = -534.0949. The estimated proportions are $p_1 = 0.5651$, $p_2 = 0.3296$, $p_3 = 0.1052$. If we use this method to classify the oil samples, the following samples are misclassified:

7 19 22 25 26 27 28 29 30 31 32 33 34 35 39 44 45 46 49

and the misclassification error rate is 33.93%.

[Figure 1: Classification into four groups using Mclust. Matrix scatter plot of x1, x2, x3, x4, x5.]

[Figure 2: True classification into sandstone strata. Matrix scatter plot of x1, x2, x3, x4, x5. Legend: SubMuli, Upper, Wilhelm.]
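As a quick check of the misclassification error rate for the K = 3 solution, the sketch below counts the listed samples and divides by the sample size. The list of misclassified samples is taken from the solution; the total of 56 oil samples is an assumption, chosen because it is consistent with the reported 33.93%.

```python
# Sample numbers reported as misclassified by the K = 3 mixture solution.
misclassified = [7, 19, 22, 25, 26, 27, 28, 29, 30, 31,
                 32, 33, 34, 35, 39, 44, 45, 46, 49]

# Assumed total number of oil samples (not stated in the solution itself).
N_SAMPLES = 56

error_rate = len(misclassified) / N_SAMPLES
print(f"{len(misclassified)} of {N_SAMPLES} misclassified: {100 * error_rate:.2f}%")
```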