George Casella And Roger Berger Solutions Manual For Statistical Inference


Solutions Manual for

Statistical Inference, Second Edition

George Casella

University of Florida

Roger L. Berger

North Carolina State University

Damaris Santana

University of Florida

0-2 Solutions Manual for Statistical Inference

“When I hear you give your reasons,” I remarked, “the thing always appears to me to be so ridiculously simple that I could easily do it myself, though at each successive instance of your reasoning I am baffled until you explain your process.”

Dr. Watson to Sherlock Holmes

A Scandal in Bohemia

0.1 Description

This solutions manual contains solutions for all odd-numbered problems plus a large number of solutions for even-numbered problems. Of the 624 exercises in Statistical Inference, Second Edition, this manual gives solutions for 484 (78%) of them. There is an obtuse pattern as to which solutions were included in this manual. We assembled all of the solutions that we had from the first edition, and filled in so that all odd-numbered problems were done. In the passage from the first to the second edition, problems were shuffled with no attention paid to numbering (hence no attention paid to minimizing the new effort), but rather we tried to put the problems in logical order.

A major change from the first edition is the use of the computer, both symbolically through Mathematica™ and numerically using R. Some solutions are given as code in either of these languages. Mathematica™ can be purchased from Wolfram Research, and R is a free download from http://www.r-project.org/.

Here is a detailed listing of the solutions included.

| Chapter | Number of Exercises | Number of Solutions | Missing |
|---|---|---|---|
| 1 | 55 | 51 | 26, 30, 36, 42 |
| 2 | 40 | 37 | 34, 38, 40 |
| 3 | 50 | 42 | 4, 6, 10, 20, 30, 32, 34, 36 |
| 4 | 65 | 52 | 8, 14, 22, 28, 36, 40, 48, 50, 52, 56, 58, 60, 62 |
| 5 | 69 | 46 | 2, 4, 12, 14, 26, 28, and all even problems from 36 to 68 |
| 6 | 43 | 35 | 8, 16, 26, 28, 34, 36, 38, 42 |
| 7 | 66 | 52 | 4, 14, 16, 28, 30, 32, 34, 36, 42, 54, 58, 60, 62, 64 |
| 8 | 58 | 51 | 36, 40, 46, 48, 52, 56, 58 |
| 9 | 58 | 41 | 2, 8, 10, 20, 22, 24, 26, 28, 30, 32, 38, 40, 42, 44, 50, 54, 56 |
| 10 | 48 | 26 | all even problems except 4 and 32 |
| 11 | 41 | 35 | 4, 20, 22, 24, 26, 40 |
| 12 | 31 | 16 | all even problems |

0.2 Acknowledgement

Many people contributed to the assembly of this solutions manual. We again thank all of those who contributed solutions to the first edition – many problems have carried over into the second edition. Moreover, throughout the years a number of people have been in constant touch with us, contributing to both the presentations and solutions. We apologize in advance for those we forget to mention, and we especially thank Jay Beder, Yong Sung Joo, Michael Perlman, Rob Strawderman, and Tom Wehrly. Thank you all for your help.

And, as we said the first time around, although we have benefited greatly from the assistance and comments of others in the assembly of this manual, we are responsible for its ultimate correctness. To this end, we have tried our best but, as a wise man once said, “You pays your money and you takes your chances.”

George Casella

Roger L. Berger

Damaris Santana

December, 2001

Chapter 1

Probability Theory

“If any little problem comes your way, I shall be happy, if I can, to give you a hint or two as to its solution.”

Sherlock Holmes

The Adventure of the Three Students

1.1 a. Each sample point describes the result of the toss (H or T) for each of the four tosses. So, for example, THTT denotes T on 1st, H on 2nd, T on 3rd and T on 4th. There are $2^4 = 16$ such sample points.

b. The number of damaged leaves is a nonnegative integer. So we might use $S = \{0, 1, 2, \ldots\}$.

c. We might observe fractions of an hour. So we might use $S = \{t : t \ge 0\}$, that is, the half-infinite interval $[0, \infty)$.

d. Suppose we weigh the rats in ounces. The weight must be greater than zero, so we might use $S = (0, \infty)$. If we know no 10-day-old rat weighs more than 100 oz., we could use $S = (0, 100]$.

e. If $n$ is the number of items in the shipment, then $S = \{0/n, 1/n, \ldots, 1\}$.

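1.2 For each of these equalities, you must show containment in both directions.

a. $x \in A \setminus B \Leftrightarrow x \in A$ and $x \notin B \Leftrightarrow x \in A$ and $x \notin A \cap B \Leftrightarrow x \in A \setminus (A \cap B)$. Also, $x \in A$ and $x \notin B \Leftrightarrow x \in A$ and $x \in B^c \Leftrightarrow x \in A \cap B^c$.

b. Suppose $x \in B$. Then either $x \in A$ or $x \in A^c$. If $x \in A$, then $x \in B \cap A$; if $x \in A^c$, then $x \in B \cap A^c$. In either case $x \in (B \cap A) \cup (B \cap A^c)$. Thus $B \subset (B \cap A) \cup (B \cap A^c)$. Now suppose $x \in (B \cap A) \cup (B \cap A^c)$. Then either $x \in B \cap A$ or $x \in B \cap A^c$. If $x \in B \cap A$, then $x \in B$. If $x \in B \cap A^c$, then $x \in B$. Thus $(B \cap A) \cup (B \cap A^c) \subset B$. Since the containment goes both ways, we have $B = (B \cap A) \cup (B \cap A^c)$. (Note, a more straightforward argument for this part simply uses the Distributive Law to state that $(B \cap A) \cup (B \cap A^c) = B \cap (A \cup A^c) = B \cap S = B$.)

c. Similar to part a).

d. From part b),
$$A \cup B = A \cup [(B \cap A) \cup (B \cap A^c)] = A \cup (B \cap A) \cup A \cup (B \cap A^c) = A \cup [A \cup (B \cap A^c)] = A \cup (B \cap A^c).$$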

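1.3 a. $x \in A \cup B \Leftrightarrow x \in A$ or $x \in B \Leftrightarrow x \in B \cup A$;
$x \in A \cap B \Leftrightarrow x \in A$ and $x \in B \Leftrightarrow x \in B \cap A$.

b. $x \in A \cup (B \cup C) \Leftrightarrow x \in A$ or $x \in B \cup C \Leftrightarrow x \in A$ or $x \in B$ or $x \in C \Leftrightarrow x \in A \cup B$ or $x \in C \Leftrightarrow x \in (A \cup B) \cup C$.
(It can similarly be shown that $A \cup (B \cup C) = (A \cup C) \cup B$.)
$x \in A \cap (B \cap C) \Leftrightarrow x \in A$ and $x \in B$ and $x \in C \Leftrightarrow x \in (A \cap B) \cap C$.

c. $x \in (A \cup B)^c \Leftrightarrow x \notin A \cup B \Leftrightarrow x \notin A$ and $x \notin B \Leftrightarrow x \in A^c$ and $x \in B^c \Leftrightarrow x \in A^c \cap B^c$;
$x \in (A \cap B)^c \Leftrightarrow x \notin A \cap B \Leftrightarrow x \notin A$ or $x \notin B \Leftrightarrow x \in A^c$ or $x \in B^c \Leftrightarrow x \in A^c \cup B^c$.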

1.4 a. “$A$ or $B$ or both” is $A \cup B$. From Theorem 1.2.9b we have $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.


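b. “$A$ or $B$ but not both” is $(A \cap B^c) \cup (B \cap A^c)$. Thus we have
$$\begin{aligned} P((A \cap B^c) \cup (B \cap A^c)) &= P(A \cap B^c) + P(B \cap A^c) && \text{(disjoint union)}\\ &= [P(A) - P(A \cap B)] + [P(B) - P(A \cap B)] && \text{(Theorem 1.2.9a)}\\ &= P(A) + P(B) - 2P(A \cap B). \end{aligned}$$

c. “At least one of $A$ or $B$” is $A \cup B$. So we get the same answer as in a).

d. “At most one of $A$ or $B$” is $(A \cap B)^c$, and $P((A \cap B)^c) = 1 - P(A \cap B)$.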

1.5 a. $A \cap B \cap C = \{$a U.S. birth results in identical twins that are female$\}$.

b. $P(A \cap B \cap C) = \frac{1}{90} \times \frac{1}{3} \times \frac{1}{2}$.

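1.6
$$p_0 = (1-u)(1-w), \quad p_1 = u(1-w) + w(1-u), \quad p_2 = uw,$$
$$p_0 = p_2 \Rightarrow u + w = 1, \qquad p_1 = p_2 \Rightarrow uw = 1/3.$$
These two equations imply $u(1-u) = 1/3$, that is, $u^2 - u + 1/3 = 0$, whose discriminant $1 - 4/3$ is negative, so it has no solution in the real numbers. Thus, the probability assignment is not legitimate.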

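1.7 a.
$$P(\text{scoring } i \text{ points}) = \begin{cases} 1 - \dfrac{\pi r^2}{A} & \text{if } i = 0,\\[6pt] \dfrac{\pi r^2}{A}\left[\dfrac{(6-i)^2 - (5-i)^2}{5^2}\right] & \text{if } i = 1, \ldots, 5. \end{cases}$$

$$P(\text{scoring } i \text{ points} \mid \text{board is hit}) = \frac{P(\text{scoring } i \text{ points} \cap \text{board is hit})}{P(\text{board is hit})},$$
$$P(\text{board is hit}) = \frac{\pi r^2}{A}, \qquad P(\text{scoring } i \text{ points} \cap \text{board is hit}) = \frac{\pi r^2}{A}\left[\frac{(6-i)^2 - (5-i)^2}{5^2}\right], \quad i = 1, \ldots, 5.$$
Therefore,
$$P(\text{scoring } i \text{ points} \mid \text{board is hit}) = \frac{(6-i)^2 - (5-i)^2}{5^2}, \quad i = 1, \ldots, 5,$$
which is exactly the probability distribution of Example 1.2.7.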

1.8 a. $P(\text{scoring exactly } i \text{ points}) = P(\text{inside circle } i) - P(\text{inside circle } i+1)$. Circle $i$ has radius $(6-i)r/5$, so
$$P(\text{scoring exactly } i \text{ points}) = \frac{\pi(6-i)^2 r^2}{5^2 \pi r^2} - \frac{\pi(6-(i+1))^2 r^2}{5^2 \pi r^2} = \frac{(6-i)^2 - (5-i)^2}{5^2}.$$

b. Expanding the squares in part a) we find $P(\text{scoring exactly } i \text{ points}) = \frac{11 - 2i}{25}$, which is decreasing in $i$.

c. Let $P(i) = \frac{11 - 2i}{25}$. Since $i \le 5$, $P(i) \ge 0$ for all $i$. $P(S) = P(\text{hitting the dartboard}) = 1$ by definition. Lastly, $P(i \cup j) = \text{area of ring } i + \text{area of ring } j = P(i) + P(j)$.
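As a quick sanity check of parts a) through c), the sketch below recomputes each ring probability directly from the radii $(6-i)r/5$ with exact fractions, then confirms it equals $(11-2i)/25$, decreases in $i$, and sums to one. The helper name `ring_prob` is ours, not the book's.

```python
from fractions import Fraction


def ring_prob(i: int) -> Fraction:
    """P(scoring exactly i points | board is hit), for i = 1, ..., 5."""
    r_inner = Fraction(6 - i, 5)        # radius of circle i, in units of the board radius r
    r_next = Fraction(6 - (i + 1), 5)   # radius of circle i + 1
    # Ring area divided by board area; the common factor pi * r^2 cancels.
    return r_inner**2 - r_next**2


probs = {i: ring_prob(i) for i in range(1, 6)}
print(probs)
```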

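1.9 a. Suppose $x \in (\cup_\alpha A_\alpha)^c$. By the definition of complement, $x \notin \cup_\alpha A_\alpha$, that is, $x \notin A_\alpha$ for all $\alpha \in \Gamma$. Therefore $x \in A_\alpha^c$ for all $\alpha \in \Gamma$. Thus $x \in \cap_\alpha A_\alpha^c$. Conversely, suppose $x \in \cap_\alpha A_\alpha^c$. By the definition of intersection, $x \in A_\alpha^c$ for all $\alpha \in \Gamma$. By the definition of complement, $x \notin A_\alpha$ for all $\alpha \in \Gamma$. Therefore $x \notin \cup_\alpha A_\alpha$. Thus $x \in (\cup_\alpha A_\alpha)^c$.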

Second Edition 1-3

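b. Suppose $x \in (\cap_\alpha A_\alpha)^c$. By the definition of complement, $x \notin \cap_\alpha A_\alpha$. Therefore $x \notin A_\alpha$ for some $\alpha \in \Gamma$. Therefore $x \in A_\alpha^c$ for some $\alpha \in \Gamma$. Thus $x \in \cup_\alpha A_\alpha^c$. Conversely, suppose $x \in \cup_\alpha A_\alpha^c$. By the definition of union, $x \in A_\alpha^c$ for some $\alpha \in \Gamma$. Therefore $x \notin A_\alpha$ for some $\alpha \in \Gamma$. Therefore $x \notin \cap_\alpha A_\alpha$. Thus $x \in (\cap_\alpha A_\alpha)^c$.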

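1.10 For $A_1, \ldots, A_n$,
$$\text{(i)}\ \left(\bigcup_{i=1}^n A_i\right)^c = \bigcap_{i=1}^n A_i^c, \qquad \text{(ii)}\ \left(\bigcap_{i=1}^n A_i\right)^c = \bigcup_{i=1}^n A_i^c.$$

Proof of (i): If $x \in (\cup A_i)^c$, then $x \notin \cup A_i$. That implies $x \notin A_i$ for any $i$, so $x \in A_i^c$ for every $i$ and $x \in \cap A_i^c$.

Proof of (ii): If $x \in (\cap A_i)^c$, then $x \notin \cap A_i$. That implies $x \in A_i^c$ for some $i$, so $x \in \cup A_i^c$.

In both proofs every implication can be reversed, which gives the opposite containment.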

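1.11 We must verify each of the three properties in Definition 1.2.1.

a. (1) The empty set $\emptyset \in \{\emptyset, S\}$. Thus $\emptyset \in \mathcal{B}$. (2) $\emptyset^c = S \in \mathcal{B}$ and $S^c = \emptyset \in \mathcal{B}$. (3) $\emptyset \cup S = S \in \mathcal{B}$.

b. (1) The empty set $\emptyset$ is a subset of any set; in particular, $\emptyset \subset S$. Thus $\emptyset \in \mathcal{B}$. (2) If $A \in \mathcal{B}$, then $A \subset S$. By the definition of complementation, $A^c$ is also a subset of $S$, and, hence, $A^c \in \mathcal{B}$. (3) If $A_1, A_2, \ldots \in \mathcal{B}$, then, for each $i$, $A_i \subset S$. By the definition of union, $\cup A_i \subset S$. Hence, $\cup A_i \in \mathcal{B}$.

c. Let $\mathcal{B}_1$ and $\mathcal{B}_2$ be the two sigma algebras. (1) $\emptyset \in \mathcal{B}_1$ and $\emptyset \in \mathcal{B}_2$ since $\mathcal{B}_1$ and $\mathcal{B}_2$ are sigma algebras. Thus $\emptyset \in \mathcal{B}_1 \cap \mathcal{B}_2$. (2) If $A \in \mathcal{B}_1 \cap \mathcal{B}_2$, then $A \in \mathcal{B}_1$ and $A \in \mathcal{B}_2$. Since $\mathcal{B}_1$ and $\mathcal{B}_2$ are both sigma algebras, $A^c \in \mathcal{B}_1$ and $A^c \in \mathcal{B}_2$. Therefore $A^c \in \mathcal{B}_1 \cap \mathcal{B}_2$. (3) If $A_1, A_2, \ldots \in \mathcal{B}_1 \cap \mathcal{B}_2$, then $A_1, A_2, \ldots \in \mathcal{B}_1$ and $A_1, A_2, \ldots \in \mathcal{B}_2$. Therefore, since $\mathcal{B}_1$ and $\mathcal{B}_2$ are both sigma algebras, $\cup_{i=1}^\infty A_i \in \mathcal{B}_1$ and $\cup_{i=1}^\infty A_i \in \mathcal{B}_2$. Thus $\cup_{i=1}^\infty A_i \in \mathcal{B}_1 \cap \mathcal{B}_2$.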

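1.12 First write
$$\begin{aligned} P\left(\bigcup_{i=1}^\infty A_i\right) &= P\left(\bigcup_{i=1}^n A_i \cup \bigcup_{i=n+1}^\infty A_i\right)\\ &= P\left(\bigcup_{i=1}^n A_i\right) + P\left(\bigcup_{i=n+1}^\infty A_i\right) && (A_i\text{'s are disjoint})\\ &= \sum_{i=1}^n P(A_i) + P\left(\bigcup_{i=n+1}^\infty A_i\right) && \text{(finite additivity)}. \end{aligned}$$
Now define $B_k = \bigcup_{i=k}^\infty A_i$. Note that $B_{k+1} \subset B_k$ and $B_k \to \emptyset$ as $k \to \infty$: since the $A_i$ are disjoint, each point belongs to at most one $A_i$, and hence to no $B_k$ with $k$ beyond that index. By the Axiom of Continuity, $P(B_k) \to 0$. Thus
$$P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{n\to\infty} P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{n\to\infty}\left[\sum_{i=1}^n P(A_i) + P(B_{n+1})\right] = \sum_{i=1}^\infty P(A_i).$$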

1.13 If $A$ and $B$ are disjoint, $P(A \cup B) = P(A) + P(B) = \frac{1}{3} + \frac{3}{4} = \frac{13}{12}$, which is impossible. More generally, if $A$ and $B$ are disjoint, then $A \subset B^c$ and $P(A) \le P(B^c)$. But here $P(A) > P(B^c)$, so $A$ and $B$ cannot be disjoint.

1.14 If $S = \{s_1, \ldots, s_n\}$, then any subset of $S$ can be constructed by either including or excluding $s_i$, for each $i$. Thus there are $2^n$ possible choices.

1.15 Proof by induction. The proof for $k = 2$ is given after Theorem 1.2.14. Assume the result is true for $k$, that is, the entire job can be done in $n_1 \times n_2 \times \cdots \times n_k$ ways. For $k+1$, the $(k+1)$th task can be done in $n_{k+1}$ ways, and for each one of these ways we can complete the job by performing the remaining $k$ tasks. Thus for each of the $n_{k+1}$ ways we have $n_1 \times n_2 \times \cdots \times n_k$ ways of completing the job by the induction hypothesis. Thus, the number of ways we can do the job is
$$\underbrace{(1 \times (n_1 \times n_2 \times \cdots \times n_k)) + \cdots + (1 \times (n_1 \times n_2 \times \cdots \times n_k))}_{n_{k+1} \text{ terms}} = n_1 \times n_2 \times \cdots \times n_k \times n_{k+1}.$$

1.16 a) $26^3$. b) $26^3 + 26^2$. c) $26^4 + 26^3 + 26^2$.

1.17 There are n

2=n(n−1)/2 pieces on which the two numbers do not match. (Choose 2 out of

nnumbers without replacement.) There are npieces on which the two numbers match. So the

total number of diﬀerent pieces is n+n(n−1)/2 = n(n+ 1)/2.
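A brute-force check of the count, offered as a sketch: a piece is an unordered pair of numbers with doubles allowed, which is exactly what `itertools.combinations_with_replacement` enumerates. With $n = 7$ numbers (0 through 6) it gives 28.

```python
from itertools import combinations_with_replacement


def count_pieces(n: int) -> int:
    """Count unordered pairs {a, b} drawn from n numbers, allowing a == b (doubles)."""
    return sum(1 for _ in combinations_with_replacement(range(n), 2))


for n in range(1, 10):
    assert count_pieces(n) == n * (n + 1) // 2

print(count_pieces(7))  # 28
```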

1.18 The probability is $\frac{\binom{n}{2} n!}{n^n} = \frac{(n-1)(n-1)!}{2n^{n-2}}$. There are many ways to obtain this. Here is one. The denominator is $n^n$ because this is the number of ways to place $n$ balls in $n$ cells. The numerator is the number of ways of placing the balls such that exactly one cell is empty. There are $n$ ways to specify the empty cell. There are $n-1$ ways of choosing the cell with two balls. There are $\binom{n}{2}$ ways of picking the 2 balls to go into this cell. And there are $(n-2)!$ ways of placing the remaining $n-2$ balls into the $n-2$ cells, one ball in each cell. The product of these is the numerator $n(n-1)\binom{n}{2}(n-2)! = \binom{n}{2} n!$.
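The counting argument can be confirmed by exhaustive enumeration for small $n$. This sketch (helper name ours) places $n$ labeled balls in $n$ cells in all $n^n$ ways, counts placements with exactly one empty cell, and compares both closed forms using exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def one_empty_prob(n: int) -> Fraction:
    """Exact P(exactly one empty cell) when n labeled balls are dropped into n cells."""
    # Exactly one empty cell means exactly n - 1 distinct cells are occupied.
    hits = sum(1 for cells in product(range(n), repeat=n) if len(set(cells)) == n - 1)
    return Fraction(hits, n**n)


for n in range(2, 7):
    closed_form = Fraction(comb(n, 2) * factorial(n), n**n)
    assert one_empty_prob(n) == closed_form
    assert closed_form == Fraction((n - 1) * factorial(n - 1), 2 * n ** (n - 2))
```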

1.19 a. 6

4= 15.

b. Think of the nvariables as nbins. Diﬀerentiating with respect to one of the variables is

equivalent to putting a ball in the bin. Thus there are runlabeled balls to be placed in n

unlabeled bins, and there are n+r−1

rways to do this.

1.20 A sample point specifies on which day (1 through 7) each of the 12 calls happens. Thus there are $7^{12}$ equally likely sample points. There are several different ways that the calls might be assigned so that there is at least one call each day. There might be 6 calls one day and 1 call each of the other days. Denote this by 6111111. The number of sample points with this pattern is $7\binom{12}{6}6!$. There are 7 ways to specify the day with 6 calls. There are $\binom{12}{6}$ ways to specify which of the 12 calls are on this day. And there are $6!$ ways of assigning the remaining 6 calls to the remaining 6 days. We will now count another pattern. There might be 4 calls on one day, 2 calls on each of two days, and 1 call on each of the remaining four days. Denote this by 4221111. The number of sample points with this pattern is $7\binom{12}{4}\binom{6}{2}\binom{8}{2}\binom{6}{2}4!$. (7 ways to pick the day with 4 calls, $\binom{12}{4}$ ways to pick the calls for that day, $\binom{6}{2}$ ways to pick the two days with two calls, $\binom{8}{2}$ ways to pick the two calls for the lower-numbered day, $\binom{6}{2}$ ways to pick the two calls for the higher-numbered day, and $4!$ ways to order the remaining 4 calls.) Here is a list of all the possibilities and the counts of the sample points for each one.

| pattern | number of sample points |
|---|---|
| 6111111 | $7\binom{12}{6}6!$ = 4,656,960 |
| 5211111 | $7\binom{12}{5}6\binom{7}{2}5!$ = 83,825,280 |
| 4221111 | $7\binom{12}{4}\binom{6}{2}\binom{8}{2}\binom{6}{2}4!$ = 523,908,000 |
| 4311111 | $7\binom{12}{4}6\binom{8}{3}5!$ = 139,708,800 |
| 3321111 | $\binom{7}{2}\binom{12}{3}\binom{9}{3}5\binom{6}{2}4!$ = 698,544,000 |
| 3222111 | $7\binom{12}{3}\binom{6}{3}\binom{9}{2}\binom{7}{2}\binom{5}{2}3!$ = 1,397,088,000 |
| 2222211 | $\binom{7}{5}\binom{12}{2}\binom{10}{2}\binom{8}{2}\binom{6}{2}\binom{4}{2}2!$ = 314,344,800 |
| total | 3,162,075,840 |

The probability is the total number of sample points divided by $7^{12}$, which is $\frac{3{,}162{,}075{,}840}{7^{12}} \approx .2285$.
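The pattern counts listed above can be cross-checked against an independent count. Assigning 12 calls to 7 days with no empty day is an onto map, and inclusion-exclusion counts those as $\sum_{k=0}^{7}(-1)^k\binom{7}{k}(7-k)^{12}$. The sketch below (variable names ours) computes both and compares them.

```python
from math import comb
from math import factorial as fact

# Pattern counts, written exactly as in the table for Exercise 1.20.
patterns = {
    "6111111": 7 * comb(12, 6) * fact(6),
    "5211111": 7 * comb(12, 5) * 6 * comb(7, 2) * fact(5),
    "4221111": 7 * comb(12, 4) * comb(6, 2) * comb(8, 2) * comb(6, 2) * fact(4),
    "4311111": 7 * comb(12, 4) * 6 * comb(8, 3) * fact(5),
    "3321111": comb(7, 2) * comb(12, 3) * comb(9, 3) * 5 * comb(6, 2) * fact(4),
    "3222111": 7 * comb(12, 3) * comb(6, 3) * comb(9, 2) * comb(7, 2) * comb(5, 2) * fact(3),
    "2222211": comb(7, 5) * comb(12, 2) * comb(10, 2) * comb(8, 2) * comb(6, 2) * comb(4, 2) * fact(2),
}
total = sum(patterns.values())

# Independent check: onto assignments of 12 calls to 7 days, by inclusion-exclusion.
onto = sum((-1) ** k * comb(7, k) * (7 - k) ** 12 for k in range(8))

assert total == onto
print(total, total / 7**12)
```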

1.21 The probability is $\frac{\binom{n}{2r}2^{2r}}{\binom{2n}{2r}}$. There are $\binom{2n}{2r}$ ways of choosing $2r$ shoes from a total of $2n$ shoes. Thus there are $\binom{2n}{2r}$ equally likely sample points. The numerator is the number of sample points for which there will be no matching pair. There are $\binom{n}{2r}$ ways of choosing $2r$ different shoe styles. There are two ways of choosing within a given shoe style (left shoe or right shoe), which gives $2^{2r}$ ways of arranging each one of the $\binom{n}{2r}$ arrays. The product of this is the numerator $\binom{n}{2r}2^{2r}$.

1.22 a) (31

15)(29

15)(31

15)(30

15)···(31

15)

(366

180)b) 336

366

335

365 ··· 316

336

(366

30 ).

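1.23
$$P(\text{same number of heads}) = \sum_{x=0}^n P(\text{1st tosses } x, \text{2nd tosses } x) = \sum_{x=0}^n \left[\binom{n}{x}\left(\frac{1}{2}\right)^x\left(\frac{1}{2}\right)^{n-x}\right]^2 = \frac{1}{4^n}\sum_{x=0}^n \binom{n}{x}^2.$$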

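1.24 a.
$$P(A \text{ wins}) = \sum_{i=1}^\infty P(A \text{ wins on } i\text{th toss}) = \frac{1}{2} + \left(\frac{1}{2}\right)^2\frac{1}{2} + \left(\frac{1}{2}\right)^4\frac{1}{2} + \cdots = \sum_{i=0}^\infty \left(\frac{1}{2}\right)^{2i+1} = 2/3.$$

b. $P(A \text{ wins}) = p + (1-p)^2 p + (1-p)^4 p + \cdots = \sum_{i=0}^\infty p(1-p)^{2i} = \frac{p}{1 - (1-p)^2}$.

c. $\frac{d}{dp}\frac{p}{1-(1-p)^2} = \frac{p^2}{[1-(1-p)^2]^2} > 0$. Thus the probability is increasing in $p$, and its smallest value is approached as $p \to 0$. Using L'Hôpital's rule we find $\lim_{p\to 0}\frac{p}{1-(1-p)^2} = 1/2$.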

1.25 Enumerating the sample space gives $S_0 = \{(B,B), (B,G), (G,B), (G,G)\}$, with each outcome equally likely. Thus $P(\text{at least one boy}) = 3/4$ and $P(\text{both are boys}) = 1/4$, therefore
$$P(\text{both are boys} \mid \text{at least one boy}) = 1/3.$$
An ambiguity may arise if order is not acknowledged: the space is then $S_0 = \{(B,B), (B,G), (G,G)\}$, and if each of these outcomes is taken as equally likely, the answer becomes $(1/3)/(2/3) = 1/2$.

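1.27 a. For $n$ odd the proof is straightforward. There are an even number of terms in the sum $(0, 1, \ldots, n)$, and $\binom{n}{k}$ and $\binom{n}{n-k}$, which are equal, have opposite signs. Thus, all pairs cancel and the sum is zero. If $n$ is even, use the following identity, which is the basis of Pascal's triangle: for $k > 0$, $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$. Then, for $n$ even,
$$\begin{aligned} \sum_{k=0}^n (-1)^k\binom{n}{k} &= \binom{n}{0} + \sum_{k=1}^{n-1}(-1)^k\binom{n}{k} + \binom{n}{n}\\ &= \binom{n}{0} + \binom{n}{n} + \sum_{k=1}^{n-1}(-1)^k\left[\binom{n-1}{k} + \binom{n-1}{k-1}\right]\\ &= \binom{n}{0} + \binom{n}{n} - \binom{n-1}{0} - \binom{n-1}{n-1} = 0, \end{aligned}$$
since the interior terms of the last sum cancel in pairs.

b. Use the fact that for $k > 0$, $k\binom{n}{k} = n\binom{n-1}{k-1}$ to write
$$\sum_{k=1}^n k\binom{n}{k} = n\sum_{k=1}^n\binom{n-1}{k-1} = n\sum_{j=0}^{n-1}\binom{n-1}{j} = n2^{n-1}.$$

c. For $n \ge 2$, $\sum_{k=1}^n (-1)^{k+1}k\binom{n}{k} = \sum_{k=1}^n (-1)^{k+1}n\binom{n-1}{k-1} = n\sum_{j=0}^{n-1}(-1)^j\binom{n-1}{j} = 0$ from part a).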

1.28 The average of the two integrals is
$$[(n\log n - n) + ((n+1)\log(n+1) - n)]/2 = [n\log n + (n+1)\log(n+1)]/2 - n \approx (n + 1/2)\log n - n.$$
Let $d_n = \log n! - [(n + 1/2)\log n - n]$, and we want to show that $\lim_{n\to\infty} d_n$ exists and is a finite constant. This would complete the problem, since the desired limit is the exponential of this one. This is accomplished in an indirect way, by working with differences, which avoids dealing with the factorial. Note that
$$d_n - d_{n+1} = \left(n + \frac{1}{2}\right)\log\left(1 + \frac{1}{n}\right) - 1.$$
Differentiation will show that $\left(n + \frac{1}{2}\right)\log\left(1 + \frac{1}{n}\right)$ is decreasing in $n$, with maximum value $(3/2)\log 2 = 1.04$ at $n = 1$ and limit 1 as $n \to \infty$; hence it exceeds 1 for every $n$. Thus $d_n - d_{n+1} > 0$. Next recall the Taylor expansion $\log(1+x) = x - x^2/2 + x^3/3 - x^4/4 + \cdots$. The first three terms provide an upper bound on $\log(1+x)$, as the remaining adjacent pairs are negative. Hence
$$0 < d_n - d_{n+1} < \left(n + \frac{1}{2}\right)\left(\frac{1}{n} - \frac{1}{2n^2} + \frac{1}{3n^3}\right) - 1 = \frac{1}{12n^2} + \frac{1}{6n^3}.$$
It therefore follows, by the comparison test, that the series $\sum_{n=1}^\infty (d_n - d_{n+1})$ converges. Moreover, the partial sums must approach a limit. Hence, since the sum telescopes,
$$\lim_{N\to\infty}\sum_{n=1}^N (d_n - d_{n+1}) = \lim_{N\to\infty}(d_1 - d_{N+1}) = c.$$
Thus $\lim_{n\to\infty} d_n = d_1 - c$, a constant.
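A numerical illustration of the argument, as a sketch: `math.lgamma(n + 1)` returns $\log n!$, so $d_n$ can be computed directly. The values decrease, the successive differences respect the bound $\frac{1}{12n^2} + \frac{1}{6n^3}$, and they settle near $\log\sqrt{2\pi} \approx 0.9189$, the constant in Stirling's formula (used again in Exercise 1.31).

```python
import math


def d(n: int) -> float:
    """d_n = log n! - [(n + 1/2) log n - n]."""
    return math.lgamma(n + 1) - ((n + 0.5) * math.log(n) - n)


for n in (1, 2, 5, 10, 100, 10_000):
    print(n, d(n))
print("log sqrt(2 pi) =", 0.5 * math.log(2 * math.pi))
```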

1.29 a.

| Unordered | Ordered |
|---|---|
| {4,4,12,12} | (4,4,12,12), (4,12,12,4), (4,12,4,12), (12,4,12,4), (12,4,4,12), (12,12,4,4) |
| {2,9,9,12} | (2,9,9,12), (2,9,12,9), (2,12,9,9), (9,2,9,12), (9,2,12,9), (9,9,2,12), (9,9,12,2), (9,12,2,9), (9,12,9,2), (12,2,9,9), (12,9,2,9), (12,9,9,2) |

b. Same as (a).

c. There are $6^6$ ordered samples with replacement from $\{1, 2, 7, 8, 14, 20\}$. The number of ordered samples that would result in $\{2, 7, 7, 8, 14, 14\}$ is $\frac{6!}{2!\,2!\,1!\,1!} = 180$ (see Example 1.2.20). Thus the probability is $\frac{180}{6^6}$.

d. If the $k$ objects were distinguishable then there would be $k!$ possible ordered arrangements. Since we have $k_1, \ldots, k_m$ different groups of indistinguishable objects, once the positions of the objects are fixed in the ordered arrangement, permutations within objects of the same group won't change the ordered arrangement. There are $k_1!k_2!\cdots k_m!$ such permutations for each ordered arrangement. Thus there would be $\frac{k!}{k_1!k_2!\cdots k_m!}$ different ordered arrangements.

e. Think of the $m$ distinct numbers as $m$ bins. Selecting a sample of size $k$, with replacement, is the same as putting $k$ balls in the $m$ bins. This is $\binom{k+m-1}{k}$, which is the number of distinct bootstrap samples. Note that, to create all of the bootstrap samples, we do not need to know what the original sample was. We only need to know the sample size and the distinct values.
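Parts c) and e) can both be confirmed by enumerating all $6^6 = 46{,}656$ ordered bootstrap samples from $\{1, 2, 7, 8, 14, 20\}$. This sketch counts the ordered samples whose multiset is $\{2, 7, 7, 8, 14, 14\}$ and the number of distinct unordered samples, which should be $\binom{11}{6} = 462$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

values = (1, 2, 7, 8, 14, 20)              # the original sample in part c
target = Counter((2, 7, 7, 8, 14, 14))     # the bootstrap sample of interest

samples = list(product(values, repeat=6))  # all 6**6 ordered bootstrap samples
hits = sum(1 for s in samples if Counter(s) == target)
prob = Fraction(hits, len(samples))        # part c

distinct = {tuple(sorted(s)) for s in samples}   # part e: unordered bootstrap samples
print(hits, prob, len(distinct), comb(6 + 6 - 1, 6))
```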

1.31 a. The number of ordered samples drawn with replacement from the set $\{x_1, \ldots, x_n\}$ is $n^n$. The number of ordered samples that make up the unordered sample $\{x_1, \ldots, x_n\}$ is $n!$. Therefore the outcome with average $\frac{x_1 + x_2 + \cdots + x_n}{n}$ that is obtained by the unordered sample $\{x_1, \ldots, x_n\}$ has probability $\frac{n!}{n^n}$. Any other unordered outcome from $\{x_1, \ldots, x_n\}$, distinct from the unordered sample $\{x_1, \ldots, x_n\}$, will contain $m$ different numbers repeated $k_1, \ldots, k_m$ times, where $k_1 + k_2 + \cdots + k_m = n$ with at least one of the $k_i$'s satisfying $2 \le k_i \le n$. The probability of obtaining the corresponding average of such an outcome is
$$\frac{n!}{k_1!k_2!\cdots k_m!\,n^n} < \frac{n!}{n^n}, \quad \text{since } k_1!k_2!\cdots k_m! > 1.$$
Therefore the outcome with average $\frac{x_1 + x_2 + \cdots + x_n}{n}$ is the most likely.

b. Stirling's approximation is that, as $n \to \infty$, $n! \approx \sqrt{2\pi}\,n^{n+(1/2)}e^{-n}$, and thus
$$\frac{n!}{n^n}\bigg/\frac{\sqrt{2n\pi}}{e^n} = \frac{n!\,e^n}{n^n\sqrt{2n\pi}} \approx \frac{\sqrt{2\pi}\,n^{n+(1/2)}e^{-n}e^n}{n^n\sqrt{2n\pi}} = 1.$$

c. Since we are drawing with replacement from the set $\{x_1, \ldots, x_n\}$, the probability of choosing any $x_i$ is $\frac{1}{n}$. Therefore the probability of obtaining an ordered sample of size $n$ without $x_i$ is $(1 - \frac{1}{n})^n$. To prove that $\lim_{n\to\infty}(1 - \frac{1}{n})^n = e^{-1}$, calculate the limit of the log. That is
$$\lim_{n\to\infty} n\log\left(1 - \frac{1}{n}\right) = \lim_{n\to\infty}\frac{\log\left(1 - \frac{1}{n}\right)}{1/n}.$$
L'Hôpital's rule shows that the limit is $-1$, establishing the result. See also Lemma 2.3.14.
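Parts b) and c) are both limits that can be watched numerically. This sketch evaluates the ratio in part b) on the log scale (via `math.lgamma`, to avoid overflowing $n^n$) and the miss probability $(1 - 1/n)^n$ from part c); the helper names are ours.

```python
import math


def stirling_ratio(n: int) -> float:
    """(n!/n^n) / (sqrt(2*pi*n)/e^n), evaluated on the log scale to avoid overflow."""
    log_ratio = math.lgamma(n + 1) - n * math.log(n) + n - 0.5 * math.log(2 * math.pi * n)
    return math.exp(log_ratio)


def miss_prob(n: int) -> float:
    """P(a given x_i is absent from an ordered sample of size n): (1 - 1/n)^n."""
    return (1 - 1 / n) ** n


for n in (5, 50, 500, 5000):
    print(n, stirling_ratio(n), miss_prob(n), math.exp(-1))
```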

1.32 This is most easily seen by doing each possibility. Let $P(i)$ = probability that the candidate hired on the $i$th trial is best. Then
$$P(1) = \frac{1}{N}, \quad P(2) = \frac{1}{N-1}, \quad \ldots, \quad P(i) = \frac{1}{N-i+1}, \quad \ldots, \quad P(N) = 1.$$

1.33 Using Bayes rule
$$P(M \mid CB) = \frac{P(CB \mid M)P(M)}{P(CB \mid M)P(M) + P(CB \mid F)P(F)} = \frac{.05 \times \frac{1}{2}}{.05 \times \frac{1}{2} + .0025 \times \frac{1}{2}} = .9524.$$

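1.34 a.
$$\begin{aligned} P(\text{Brown Hair}) &= P(\text{Brown Hair} \mid \text{Litter 1})P(\text{Litter 1}) + P(\text{Brown Hair} \mid \text{Litter 2})P(\text{Litter 2})\\ &= \left(\frac{2}{3}\right)\left(\frac{1}{2}\right) + \left(\frac{3}{5}\right)\left(\frac{1}{2}\right) = \frac{19}{30}. \end{aligned}$$

b. Use Bayes Theorem
$$P(\text{Litter 1} \mid \text{Brown Hair}) = \frac{P(BH \mid L1)P(L1)}{P(BH \mid L1)P(L1) + P(BH \mid L2)P(L2)} = \frac{\left(\frac{2}{3}\right)\left(\frac{1}{2}\right)}{\frac{19}{30}} = \frac{10}{19}.$$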
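Exercises 1.33 and 1.34 are the same Bayes computation over a two-set partition. A minimal sketch with exact fractions (the `posterior` helper is ours; the rates are the ones used in the two solutions) reproduces $.9524$, which is exactly $20/21$, and $10/19$.

```python
from fractions import Fraction as F


def posterior(prior: dict, likelihood: dict, key) -> F:
    """Bayes rule over a finite partition: P(key | data) from P(h) and P(data | h)."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return prior[key] * likelihood[key] / evidence


# Exercise 1.33: P(male | color blind), with P(CB | M) = .05 and P(CB | F) = .0025.
p_male = posterior({"M": F(1, 2), "F": F(1, 2)}, {"M": F(5, 100), "F": F(25, 10_000)}, "M")

# Exercise 1.34: P(litter 1 | brown hair).
p_litter1 = posterior({1: F(1, 2), 2: F(1, 2)}, {1: F(2, 3), 2: F(3, 5)}, 1)

print(float(p_male), p_litter1)
```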

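1.35 Clearly $P(\cdot \mid B) \ge 0$, and $P(S \mid B) = 1$. If $A_1, A_2, \ldots$ are disjoint, then
$$P\left(\bigcup_{i=1}^\infty A_i \,\Bigg|\, B\right) = \frac{P\left(\left(\bigcup_{i=1}^\infty A_i\right) \cap B\right)}{P(B)} = \frac{P\left(\bigcup_{i=1}^\infty (A_i \cap B)\right)}{P(B)} = \frac{\sum_{i=1}^\infty P(A_i \cap B)}{P(B)} = \sum_{i=1}^\infty P(A_i \mid B).$$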


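1.37 a. Using the same events $A$, $B$, $C$ and $W$ as in Example 1.3.4, we have
$$P(W) = P(W \mid A)P(A) + P(W \mid B)P(B) + P(W \mid C)P(C) = \gamma\left(\frac{1}{3}\right) + 0\left(\frac{1}{3}\right) + 1\left(\frac{1}{3}\right) = \frac{\gamma + 1}{3}.$$
Thus, $P(A \mid W) = \frac{P(A \cap W)}{P(W)} = \frac{\gamma/3}{(\gamma + 1)/3} = \frac{\gamma}{\gamma + 1}$, where
$$\frac{\gamma}{\gamma + 1}\ \begin{cases} = \frac{1}{3} & \text{if } \gamma = \frac{1}{2},\\[2pt] < \frac{1}{3} & \text{if } \gamma < \frac{1}{2},\\[2pt] > \frac{1}{3} & \text{if } \gamma > \frac{1}{2}. \end{cases}$$

b. By Exercise 1.35, $P(\cdot \mid W)$ is a probability function. $A$, $B$ and $C$ are a partition. So
$$P(A \mid W) + P(B \mid W) + P(C \mid W) = 1.$$
But $P(B \mid W) = 0$. Thus, $P(A \mid W) + P(C \mid W) = 1$. Since $P(A \mid W) = 1/3$ (the case $\gamma = 1/2$ of Example 1.3.4), $P(C \mid W) = 2/3$. (This could be calculated directly, as in Example 1.3.4.) So if $A$ can swap fates with $C$, his chance of survival becomes $2/3$.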

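1.38 a. $P(A) = P(A \cap B) + P(A \cap B^c)$ from Theorem 1.2.11a. But $(A \cap B^c) \subset B^c$ and $P(B^c) = 1 - P(B) = 0$. So $P(A \cap B^c) = 0$, and $P(A) = P(A \cap B)$. Thus,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{1} = P(A).$$

b. $A \subset B$ implies $A \cap B = A$. Thus,
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A)}{P(A)} = 1.$$
And also,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)}.$$

c. If $A$ and $B$ are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$ and $A \cap (A \cup B) = A$. Thus,
$$P(A \mid A \cup B) = \frac{P(A \cap (A \cup B))}{P(A \cup B)} = \frac{P(A)}{P(A) + P(B)}.$$

d. $P(A \cap B \cap C) = P(A \cap (B \cap C)) = P(A \mid B \cap C)P(B \cap C) = P(A \mid B \cap C)P(B \mid C)P(C)$.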

1.39 a. Suppose $A$ and $B$ are mutually exclusive. Then $A \cap B = \emptyset$ and $P(A \cap B) = 0$. If $A$ and $B$ are independent, then $0 = P(A \cap B) = P(A)P(B)$. But this cannot be since $P(A) > 0$ and $P(B) > 0$. Thus $A$ and $B$ cannot be independent.

b. If $A$ and $B$ are independent and both have positive probability, then
$$0 < P(A)P(B) = P(A \cap B).$$
This implies $A \cap B \ne \emptyset$, that is, $A$ and $B$ are not mutually exclusive.

1.40 a. $P(A^c \cap B) = P(A^c \mid B)P(B) = [1 - P(A \mid B)]P(B) = [1 - P(A)]P(B) = P(A^c)P(B)$, where the third equality follows from the independence of $A$ and $B$.

b. $P(A^c \cap B^c) = P(A^c) - P(A^c \cap B) = P(A^c) - P(A^c)P(B) = P(A^c)P(B^c)$.


1.41 a.
$$\begin{aligned} P(\text{dash sent} \mid \text{dash rec}) &= \frac{P(\text{dash rec} \mid \text{dash sent})P(\text{dash sent})}{P(\text{dash rec} \mid \text{dash sent})P(\text{dash sent}) + P(\text{dash rec} \mid \text{dot sent})P(\text{dot sent})}\\ &= \frac{(2/3)(4/7)}{(2/3)(4/7) + (1/4)(3/7)} = 32/41. \end{aligned}$$

b. By a similar calculation as the one in (a), $P(\text{dot sent} \mid \text{dot rec}) = 27/43$. Then we have $P(\text{dash sent} \mid \text{dot rec}) = 16/43$. Given that dot-dot was received, the distribution of the four possibilities of what was sent is

| Event | Probability |
|---|---|
| dash-dash | $(16/43)^2$ |
| dash-dot | $(16/43)(27/43)$ |
| dot-dash | $(27/43)(16/43)$ |
| dot-dot | $(27/43)^2$ |
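The posterior computations can be reproduced with exact fractions. This sketch takes the prior (4/7 dashes, 3/7 dots) and the transmission probabilities used in part a), derives their complements, and, as the table above implicitly does, assumes the two symbols are sent and garbled independently. The dictionary layout is ours.

```python
from fractions import Fraction as F

prior = {"dash": F(4, 7), "dot": F(3, 7)}
# P(received | sent), keyed by (received, sent); the complements follow from part a).
channel = {("dash", "dash"): F(2, 3), ("dot", "dash"): F(1, 3),
           ("dash", "dot"): F(1, 4), ("dot", "dot"): F(3, 4)}


def p_sent_given_rec(sent: str, rec: str) -> F:
    """Bayes rule for a single transmitted symbol."""
    evidence = sum(channel[(rec, s)] * prior[s] for s in prior)
    return channel[(rec, sent)] * prior[sent] / evidence


# Part b: dot-dot received, symbols assumed to be handled independently.
post = {(s1, s2): p_sent_given_rec(s1, "dot") * p_sent_given_rec(s2, "dot")
        for s1 in prior for s2 in prior}
print(p_sent_given_rec("dash", "dash"), post)
```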

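1.43 a. For Boole's Inequality,
$$P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i) - P_2 + P_3 - \cdots \pm P_n \le \sum_{i=1}^n P(A_i)$$
since $P_i \ge P_j$ if $i \le j$, and therefore the terms $-P_{2k} + P_{2k+1} \le 0$ for $k = 1, \ldots, \frac{n-1}{2}$ when $n$ is odd. When $n$ is even the last term to consider is $-P_n \le 0$. For Bonferroni's Inequality apply the inclusion-exclusion identity to the $A_i^c$, and use the argument leading to (1.2.10).

b. We illustrate the proof that the $P_i$ are decreasing ($P_i \ge P_j$ if $i \le j$) by showing that $P_2 \ge P_3$. The other arguments are similar. Write
$$\begin{aligned} P_2 = \sum_{1 \le i < j \le n} P(A_i \cap A_j) &= \sum_{i=1}^{n-1}\sum_{j=i+1}^n P(A_i \cap A_j)\\ &= \sum_{i=1}^{n-1}\sum_{j=i+1}^n\left[\sum_{k=1}^n P(A_i \cap A_j \cap A_k) + P\left(A_i \cap A_j \cap (\cup_k A_k)^c\right)\right]. \end{aligned}$$
Now to get to $P_3$ we drop terms from this last expression. That is
$$\begin{aligned} &\sum_{i=1}^{n-1}\sum_{j=i+1}^n\left[\sum_{k=1}^n P(A_i \cap A_j \cap A_k) + P\left(A_i \cap A_j \cap (\cup_k A_k)^c\right)\right]\\ &\quad\ge \sum_{i=1}^{n-1}\sum_{j=i+1}^n\left[\sum_{k=1}^n P(A_i \cap A_j \cap A_k)\right]\\ &\quad\ge \sum_{i=1}^{n-2}\sum_{j=i+1}^{n-1}\sum_{k=j+1}^n P(A_i \cap A_j \cap A_k) = \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) = P_3. \end{aligned}$$
The sequence of bounds is improving because the upper bounds $P_1$, $P_1 - P_2 + P_3$, $P_1 - P_2 + P_3 - P_4 + P_5, \ldots$ are getting smaller, since $P_i \ge P_j$ if $i \le j$ and therefore the terms $-P_{2k} + P_{2k+1} \le 0$. The lower bounds $P_1 - P_2$, $P_1 - P_2 + P_3 - P_4$, $P_1 - P_2 + P_3 - P_4 + P_5 - P_6, \ldots$ are getting bigger, since $P_i \ge P_j$ if $i \le j$ and therefore the terms $P_{2k+1} - P_{2k+2} \ge 0$.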


c. If all of the $A_i$ are equal, all of the probabilities in the inclusion-exclusion identity are the same. Thus
$$P_1 = nP(A), \quad P_2 = \binom{n}{2}P(A), \quad \ldots, \quad P_j = \binom{n}{j}P(A),$$
and the sequence of upper bounds on $P(\cup_i A_i) = P(A)$ becomes
$$P_1 = nP(A), \quad P_1 - P_2 + P_3 = \left[n - \binom{n}{2} + \binom{n}{3}\right]P(A), \quad \ldots$$
whose coefficients eventually sum to one, so the last bound is exact. For the lower bounds we get
$$P_1 - P_2 = \left[n - \binom{n}{2}\right]P(A), \quad P_1 - P_2 + P_3 - P_4 = \left[n - \binom{n}{2} + \binom{n}{3} - \binom{n}{4}\right]P(A), \quad \ldots$$
which start out negative, then become positive, with the last one equaling $P(A)$ (see Schwager 1984 for details).
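To see the coefficients in part c) concretely, this sketch computes the partial sums $\sum_{j=1}^{m}(-1)^{j+1}\binom{n}{j}$, which multiply $P(A)$ in the $m$th bound when all the $A_i$ are equal. For $n = 6$ the upper bounds are $6, 11, 2$ and the lower bounds are $-9, -4, 1$, and the final coefficient is always $1 - (1-1)^n = 1$.

```python
from math import comb


def bonferroni_coefficients(n: int) -> list[int]:
    """Coefficients of P(A) in the m-th bound, m = 1..n, when A_1 = ... = A_n = A."""
    coefficients, running = [], 0
    for j in range(1, n + 1):
        running += (-1) ** (j + 1) * comb(n, j)
        coefficients.append(running)
    return coefficients


coef = bonferroni_coefficients(6)
upper = coef[0::2]   # m odd: upper bounds
lower = coef[1::2]   # m even: lower bounds
print(upper, lower)  # [6, 11, 2] [-9, -4, 1]
```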

1.44 $P(\text{at least 10 correct} \mid \text{guessing}) = \sum_{k=10}^{20}\binom{20}{k}\left(\frac{1}{4}\right)^k\left(\frac{3}{4}\right)^{20-k} = .01386$.
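The binomial tail is quick to evaluate exactly. This sketch sums the terms with `fractions.Fraction`, writing each term as $\binom{20}{k}3^{20-k}/4^{20}$, and converts to a float at the end.

```python
from fractions import Fraction
from math import comb

# X ~ binomial(20, 1/4): number correct when guessing on 20 four-choice questions.
tail = sum(Fraction(comb(20, k) * 3 ** (20 - k), 4**20) for k in range(10, 21))
print(round(float(tail), 5))  # 0.01386
```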

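1.45 $\mathcal{X}$ is finite. Therefore $\mathcal{B}$ is the set of all subsets of $\mathcal{X}$. We must verify each of the three properties in Definition 1.2.4. (1) If $A \in \mathcal{B}$ then $P_X(A) = P(\cup_{x_i \in A}\{s_j \in S : X(s_j) = x_i\}) \ge 0$ since $P$ is a probability function. (2) $P_X(\mathcal{X}) = P(\cup_{i=1}^m \{s_j \in S : X(s_j) = x_i\}) = P(S) = 1$. (3) If $A_1, A_2, \ldots \in \mathcal{B}$ and pairwise disjoint then
$$P_X\left(\cup_{k=1}^\infty A_k\right) = P\left(\bigcup_{k=1}^\infty \left\{\cup_{x_i \in A_k}\{s_j \in S : X(s_j) = x_i\}\right\}\right) = \sum_{k=1}^\infty P\left(\cup_{x_i \in A_k}\{s_j \in S : X(s_j) = x_i\}\right) = \sum_{k=1}^\infty P_X(A_k),$$
where the second equality follows from the fact that $P$ is a probability function (the inner sets are pairwise disjoint because the $A_k$ are).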

1.46 This is similar to Exercise 1.20. There are $7^7$ equally likely sample points. The possible values of $X_3$ are 0, 1 and 2. Only the pattern 331 (3 balls in one cell, 3 balls in another cell and 1 ball in a third cell) yields $X_3 = 2$. The number of sample points with this pattern is $\binom{7}{2}\binom{7}{3}\binom{4}{3}5 = 14{,}700$. So $P(X_3 = 2) = 14{,}700/7^7 \approx .0178$. There are 4 patterns that yield $X_3 = 1$. The number of sample points that give each of these patterns is given below.

| pattern | number of sample points |
|---|---|
| 34 | $7\binom{7}{3}6$ = 1,470 |
| 322 | $7\binom{7}{3}\binom{6}{2}\binom{4}{2}\binom{2}{2}$ = 22,050 |
| 3211 | $7\binom{7}{3}6\binom{4}{2}\binom{5}{2}2!$ = 176,400 |
| 31111 | $7\binom{7}{3}\binom{6}{4}4!$ = 88,200 |
| total | 288,120 |

So $P(X_3 = 1) = 288{,}120/7^7 \approx .3499$. The number of sample points that yield $X_3 = 0$ is $7^7 - 288{,}120 - 14{,}700 = 520{,}723$, and $P(X_3 = 0) = 520{,}723/7^7 \approx .6323$.
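An independent count of the whole distribution of $X_3$, as a sketch: instead of listing all $7^7$ placements, loop over the occupancy vectors $(k_1, \ldots, k_7)$ with $\sum k_i = 7$ (there are 1,716 of them). Each one arises from $7!/(k_1!\cdots k_7!)$ labeled placements. The helper names are ours.

```python
from math import factorial, prod


def compositions(total: int, parts: int):
    """Yield every tuple of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def x3_distribution(balls: int = 7, cells: int = 7) -> dict:
    """Count labeled placements by X3 = number of cells holding exactly 3 balls."""
    counts = {}
    for occupancy in compositions(balls, cells):
        ways = factorial(balls) // prod(factorial(k) for k in occupancy)  # multinomial
        x3 = sum(1 for k in occupancy if k == 3)
        counts[x3] = counts.get(x3, 0) + ways
    return counts


counts = x3_distribution()
print(counts, {k: round(v / 7**7, 4) for k, v in counts.items()})
```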

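1.47 All of the functions except the one in part e) are continuous, hence right-continuous. Thus we only need to check the limits, and that they are nondecreasing.

a. $\lim_{x\to-\infty}\left(\frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x)\right) = \frac{1}{2} + \frac{1}{\pi}\left(-\frac{\pi}{2}\right) = 0$, $\lim_{x\to\infty}\left(\frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x)\right) = \frac{1}{2} + \frac{1}{\pi}\left(\frac{\pi}{2}\right) = 1$, and $\frac{d}{dx}\left(\frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x)\right) = \frac{1}{\pi(1 + x^2)} > 0$, so $F(x)$ is increasing.

b. See Example 1.5.5.

c. $\lim_{x\to-\infty} e^{-e^{-x}} = 0$, $\lim_{x\to\infty} e^{-e^{-x}} = 1$, $\frac{d}{dx}e^{-e^{-x}} = e^{-x}e^{-e^{-x}} > 0$.

d. Here $F(x) = 0$ for $x \le 0$, and $\lim_{x\downarrow 0}(1 - e^{-x}) = 0$, $\lim_{x\to\infty}(1 - e^{-x}) = 1$, $\frac{d}{dx}(1 - e^{-x}) = e^{-x} > 0$.

e. $\lim_{y\to-\infty}\frac{1-\epsilon}{1 + e^{-y}} = 0$, $\lim_{y\to\infty}\left(\epsilon + \frac{1-\epsilon}{1 + e^{-y}}\right) = 1$, $\frac{d}{dy}\left(\frac{1-\epsilon}{1 + e^{-y}}\right) = \frac{(1-\epsilon)e^{-y}}{(1 + e^{-y})^2} > 0$ and $\frac{d}{dy}\left(\epsilon + \frac{1-\epsilon}{1 + e^{-y}}\right) > 0$. $F_Y(y)$ is continuous except at $y = 0$, where $\lim_{y\downarrow 0}\left(\epsilon + \frac{1-\epsilon}{1 + e^{-y}}\right) = F_Y(0)$. Thus $F_Y(y)$ is right-continuous.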

1.48 If $F(\cdot)$ is a cdf, $F(x) = P(X \le x)$. Hence $\lim_{x\to-\infty} P(X \le x) = 0$ and $\lim_{x\to\infty} P(X \le x) = 1$. $F(x)$ is nondecreasing since the set $\{s : X(s) \le x\}$ is nondecreasing in $x$. Lastly, as $x \downarrow x_0$, $P(X \le x) \to P(X \le x_0)$, so $F(\cdot)$ is right-continuous. (This is merely a consequence of defining $F(x)$ with “$\le$”.)

1.49 For every $t$, $F_X(t) \le F_Y(t)$. Thus we have
$$P(X > t) = 1 - P(X \le t) = 1 - F_X(t) \ge 1 - F_Y(t) = 1 - P(Y \le t) = P(Y > t).$$
And for some $t^*$, $F_X(t^*) < F_Y(t^*)$. Then we have that
$$P(X > t^*) = 1 - P(X \le t^*) = 1 - F_X(t^*) > 1 - F_Y(t^*) = 1 - P(Y \le t^*) = P(Y > t^*).$$

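1.50 Proof by induction. For $n = 2$
$$\sum_{k=1}^2 t^{k-1} = 1 + t = \frac{1 - t^2}{1 - t}.$$
Assume the result is true for $n$, that is, $\sum_{k=1}^n t^{k-1} = \frac{1 - t^n}{1 - t}$. Then for $n + 1$
$$\sum_{k=1}^{n+1} t^{k-1} = \sum_{k=1}^n t^{k-1} + t^n = \frac{1 - t^n}{1 - t} + t^n = \frac{1 - t^n + t^n(1 - t)}{1 - t} = \frac{1 - t^{n+1}}{1 - t},$$
where the second equality follows from the induction hypothesis.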

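1.51 This kind of random variable is called hypergeometric in Chapter 3. The probabilities are obtained by counting arguments, as follows.

| $x$ | $f_X(x) = P(X = x)$ |
|---|---|
| 0 | $\binom{5}{0}\binom{25}{4}\big/\binom{30}{4} \approx .4616$ |
| 1 | $\binom{5}{1}\binom{25}{3}\big/\binom{30}{4} \approx .4196$ |
| 2 | $\binom{5}{2}\binom{25}{2}\big/\binom{30}{4} \approx .1095$ |
| 3 | $\binom{5}{3}\binom{25}{1}\big/\binom{30}{4} \approx .0091$ |
| 4 | $\binom{5}{4}\binom{25}{0}\big/\binom{30}{4} \approx .0002$ |

The cdf is a step function with jumps at $x = 0, 1, 2, 3$ and 4.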
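The table entries and the step cdf can be recomputed in a few lines. This sketch evaluates each entry exactly as written above, $\binom{5}{x}\binom{25}{4-x}/\binom{30}{4}$, and accumulates the jumps of the cdf.

```python
from math import comb

pmf = {x: comb(5, x) * comb(25, 4 - x) / comb(30, 4) for x in range(5)}

cdf, running = {}, 0.0
for x in range(5):
    running += pmf[x]
    cdf[x] = running       # the step cdf jumps by pmf[x] at x = 0, 1, 2, 3, 4

print({x: round(p, 4) for x, p in pmf.items()})
```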

1.52 The function $g(\cdot)$ is clearly nonnegative. Also,
$$\int_{x_0}^\infty g(x)\,dx = \int_{x_0}^\infty \frac{f(x)}{1 - F(x_0)}\,dx = \frac{1 - F(x_0)}{1 - F(x_0)} = 1.$$

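1.53 a. $\lim_{y\to-\infty} F_Y(y) = \lim_{y\to-\infty} 0 = 0$ and $\lim_{y\to\infty} F_Y(y) = \lim_{y\to\infty}\left(1 - \frac{1}{y^2}\right) = 1$. For $y \le 1$, $F_Y(y) = 0$ is constant. For $y > 1$, $\frac{d}{dy}F_Y(y) = 2/y^3 > 0$, so $F_Y$ is increasing. Thus for all $y$, $F_Y$ is nondecreasing. $F_Y$ is also continuous, since $1 - 1/y^2 = 0$ at $y = 1$. Therefore $F_Y$ is a cdf.

b. The pdf is
$$f_Y(y) = \frac{d}{dy}F_Y(y) = \begin{cases} 2/y^3 & \text{if } y > 1\\ 0 & \text{if } y \le 1. \end{cases}$$

c. $F_Z(z) = P(Z \le z) = P(10(Y - 1) \le z) = P(Y \le (z/10) + 1) = F_Y((z/10) + 1)$. Thus,
$$F_Z(z) = \begin{cases} 0 & \text{if } z \le 0\\ 1 - \dfrac{1}{[(z/10) + 1]^2} & \text{if } z > 0. \end{cases}$$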


1.54 a. $\int_0^{\pi/2}\sin x\,dx = 1$. Thus, $c = 1/1 = 1$.

b. $\int_{-\infty}^\infty e^{-|x|}\,dx = \int_{-\infty}^0 e^x\,dx + \int_0^\infty e^{-x}\,dx = 1 + 1 = 2$. Thus, $c = 1/2$.

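1.55
$$P(V \le 5) = P(T < 3) = \int_0^3 \frac{1}{1.5}e^{-t/1.5}\,dt = 1 - e^{-2}.$$
For $v \ge 6$,
$$P(V \le v) = P(2T \le v) = P\left(T \le \frac{v}{2}\right) = \int_0^{v/2}\frac{1}{1.5}e^{-t/1.5}\,dt = 1 - e^{-v/3}.$$
Since $V$ never takes a value below 5, $P(V \le v) = 0$ for $v < 5$. Therefore,
$$P(V \le v) = \begin{cases} 0 & -\infty < v < 5,\\ 1 - e^{-2} & 5 \le v < 6,\\ 1 - e^{-v/3} & 6 \le v. \end{cases}$$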
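A Monte Carlo sketch of this cdf. It assumes, as the first display implies, that $V = 5$ when $T < 3$ and $V = 2T$ otherwise, with $T$ exponential with mean 1.5. Under that reading $V$ never falls below 5, so the cdf is 0 there, jumps to $1 - e^{-2}$ at 5, and follows $1 - e^{-v/3}$ from 6 on. `random.expovariate` takes the rate, here $1/1.5$.

```python
import math
import random


def cdf_v(v: float) -> float:
    """cdf of V, assuming V = 5 if T < 3 and V = 2T otherwise."""
    if v < 5:
        return 0.0
    if v < 6:
        return 1 - math.exp(-2)
    return 1 - math.exp(-v / 3)


rng = random.Random(0)
ts = [rng.expovariate(1 / 1.5) for _ in range(100_000)]   # T with mean 1.5
vs = [5.0 if t < 3 else 2 * t for t in ts]


def empirical(v: float) -> float:
    return sum(x <= v for x in vs) / len(vs)


for v in (4.9, 5.0, 5.5, 6.0, 8.0):
    print(v, round(empirical(v), 4), round(cdf_v(v), 4))
```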

Chapter 2

Transformations and Expectations

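2.1 a. $f_X(x) = 42x^5(1 - x)$, $0 < x < 1$; $y = x^3 = g(x)$, monotone, and $\mathcal{Y} = (0, 1)$. Use Theorem 2.1.5.
$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right| = f_X(y^{1/3})\frac{d}{dy}(y^{1/3}) = 42y^{5/3}(1 - y^{1/3})\left(\frac{1}{3}y^{-2/3}\right) = 14y(1 - y^{1/3}) = 14y - 14y^{4/3}, \quad 0 < y < 1.$$
To check the integral,
$$\int_0^1 (14y - 14y^{4/3})\,dy = 7y^2 - 14\frac{y^{7/3}}{7/3}\Big|_0^1 = 7y^2 - 6y^{7/3}\Big|_0^1 = 1 - 0 = 1.$$

b. $f_X(x) = 7e^{-7x}$, $0 < x < \infty$, $y = 4x + 3$, monotone, and $\mathcal{Y} = (3, \infty)$. Use Theorem 2.1.5.
$$f_Y(y) = f_X\left(\frac{y - 3}{4}\right)\left|\frac{d}{dy}\left(\frac{y - 3}{4}\right)\right| = 7e^{-(7/4)(y-3)}\left|\frac{1}{4}\right| = \frac{7}{4}e^{-(7/4)(y-3)}, \quad 3 < y < \infty.$$
To check the integral,
$$\int_3^\infty \frac{7}{4}e^{-(7/4)(y-3)}\,dy = -e^{-(7/4)(y-3)}\Big|_3^\infty = 0 - (-1) = 1.$$

c. $F_Y(y) = P(0 \le X \le \sqrt{y}) = F_X(\sqrt{y})$. Then $f_Y(y) = \frac{1}{2\sqrt{y}}f_X(\sqrt{y})$. Therefore
$$f_Y(y) = \frac{1}{2\sqrt{y}}\,30(\sqrt{y})^2(1 - \sqrt{y})^2 = 15y^{1/2}(1 - \sqrt{y})^2, \quad 0 < y < 1.$$
To check the integral,
$$\int_0^1 15y^{1/2}(1 - \sqrt{y})^2\,dy = \int_0^1 (15y^{1/2} - 30y + 15y^{3/2})\,dy = 15\left(\frac{2}{3}\right) - 30\left(\frac{1}{2}\right) + 15\left(\frac{2}{5}\right) = 1.$$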
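The three "check the integral" steps can also be confirmed numerically. This sketch uses a plain composite midpoint rule (our own helper, not a library routine) on each derived density; the infinite range in part b) is truncated at $y = 43$, where the neglected tail is $e^{-70}$.

```python
import math


def midpoint(f, a: float, b: float, n: int = 200_000) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def f_a(y: float) -> float:   # part a, 0 < y < 1
    return 14 * y - 14 * y ** (4 / 3)


def f_b(y: float) -> float:   # part b, 3 < y < infinity
    return 1.75 * math.exp(-1.75 * (y - 3))


def f_c(y: float) -> float:   # part c, 0 < y < 1
    return 15 * math.sqrt(y) * (1 - math.sqrt(y)) ** 2


integrals = [midpoint(f_a, 0, 1), midpoint(f_b, 3, 43), midpoint(f_c, 0, 1)]
print(integrals)
```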

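2.2 In all three cases, Theorem 2.1.5 is applicable and yields the following answers.

a. $f_Y(y) = \frac{1}{2}y^{-1/2}$, $0 < y < 1$.

b. $f_Y(y) = \frac{(n + m + 1)!}{n!\,m!}e^{-y(n+1)}(1 - e^{-y})^m$, $0 < y < \infty$.

c. $f_Y(y) = \frac{1}{\sigma^2}\frac{\log y}{y}e^{-(1/2)((\log y)/\sigma)^2}$, $1 < y < \infty$ (the density must be nonnegative, which requires $\log y > 0$).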

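2.3 $P(Y = y) = P\left(\frac{X}{X + 1} = y\right) = P\left(X = \frac{y}{1 - y}\right) = \frac{1}{3}\left(\frac{2}{3}\right)^{y/(1-y)}$, where $y = 0, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots, \frac{x}{x + 1}, \ldots$.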

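2.4 a. $f(x)$ is a pdf since it is positive and
$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^0 \frac{1}{2}\lambda e^{\lambda x}\,dx + \int_0^\infty \frac{1}{2}\lambda e^{-\lambda x}\,dx = \frac{1}{2} + \frac{1}{2} = 1.$$

b. Let $X$ be a random variable with density $f(x)$.
$$P(X < t) = \begin{cases} \int_{-\infty}^t \frac{1}{2}\lambda e^{\lambda x}\,dx & \text{if } t < 0\\ \int_{-\infty}^0 \frac{1}{2}\lambda e^{\lambda x}\,dx + \int_0^t \frac{1}{2}\lambda e^{-\lambda x}\,dx & \text{if } t \ge 0 \end{cases}$$
where $\int_{-\infty}^t \frac{1}{2}\lambda e^{\lambda x}\,dx = \frac{1}{2}e^{\lambda x}\Big|_{-\infty}^t = \frac{1}{2}e^{\lambda t}$ and $\int_0^t \frac{1}{2}\lambda e^{-\lambda x}\,dx = -\frac{1}{2}e^{-\lambda x}\Big|_0^t = -\frac{1}{2}e^{-\lambda t} + \frac{1}{2}$. Therefore,
$$P(X < t) = \begin{cases} \frac{1}{2}e^{\lambda t} & \text{if } t < 0\\ 1 - \frac{1}{2}e^{-\lambda t} & \text{if } t \ge 0. \end{cases}$$

c. $P(|X| < t) = 0$ for $t < 0$, and for $t \ge 0$,
$$P(|X| < t) = P(-t < X < t) = \int_{-t}^0 \frac{1}{2}\lambda e^{\lambda x}\,dx + \int_0^t \frac{1}{2}\lambda e^{-\lambda x}\,dx = \frac{1}{2}\left[1 - e^{-\lambda t}\right] + \frac{1}{2}\left[-e^{-\lambda t} + 1\right] = 1 - e^{-\lambda t}.$$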

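2.5 To apply Theorem 2.1.8, let $A_0 = \{\frac{\pi}{2}, \pi, \frac{3\pi}{2}\}$, $A_1 = (0, \frac{\pi}{2})$, $A_2 = (\frac{\pi}{2}, \pi)$, $A_3 = (\pi, \frac{3\pi}{2})$ and $A_4 = (\frac{3\pi}{2}, 2\pi)$. Then $g_i(x) = \sin^2(x)$ on $A_i$ for $i = 1, 2, 3, 4$. Therefore $g_1^{-1}(y) = \sin^{-1}(\sqrt{y})$, $g_2^{-1}(y) = \pi - \sin^{-1}(\sqrt{y})$, $g_3^{-1}(y) = \sin^{-1}(\sqrt{y}) + \pi$ and $g_4^{-1}(y) = 2\pi - \sin^{-1}(\sqrt{y})$. Thus
$$f_Y(y) = \frac{1}{2\pi}\left|\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt{y}}\right| + \frac{1}{2\pi}\left|\frac{-1}{\sqrt{1-y}}\frac{1}{2\sqrt{y}}\right| + \frac{1}{2\pi}\left|\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt{y}}\right| + \frac{1}{2\pi}\left|\frac{-1}{\sqrt{1-y}}\frac{1}{2\sqrt{y}}\right| = \frac{1}{\pi\sqrt{y(1-y)}}, \quad 0 < y < 1.$$
To use the cdf given in (2.1.6) we have that $x_1 = \sin^{-1}(\sqrt{y})$ and $x_2 = \pi - \sin^{-1}(\sqrt{y})$. Then by differentiating (2.1.6) we obtain that
$$\begin{aligned} f_Y(y) &= 2f_X(\sin^{-1}(\sqrt{y}))\frac{d}{dy}\left(\sin^{-1}(\sqrt{y})\right) - 2f_X(\pi - \sin^{-1}(\sqrt{y}))\frac{d}{dy}\left(\pi - \sin^{-1}(\sqrt{y})\right)\\ &= 2\left(\frac{1}{2\pi}\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt{y}}\right) - 2\left(\frac{1}{2\pi}\frac{-1}{\sqrt{1-y}}\frac{1}{2\sqrt{y}}\right) = \frac{1}{\pi\sqrt{y(1-y)}}. \end{aligned}$$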
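A simulation check of the arcsine density, as a sketch. It assumes $X \sim$ uniform$(0, 2\pi)$, matching $f_X = \frac{1}{2\pi}$ in the solution. Integrating $\frac{1}{\pi\sqrt{t(1-t)}}$ from 0 to $y$ gives the cdf $\frac{2}{\pi}\sin^{-1}(\sqrt{y})$, which is compared with the empirical cdf of simulated $\sin^2 X$.

```python
import math
import random


def arcsine_cdf(y: float) -> float:
    """Integral of 1 / (pi * sqrt(t * (1 - t))) over (0, y), for 0 <= y <= 1."""
    return (2 / math.pi) * math.asin(math.sqrt(y))


rng = random.Random(2024)
ys = [math.sin(rng.uniform(0, 2 * math.pi)) ** 2 for _ in range(200_000)]


def empirical_cdf(y: float) -> float:
    return sum(v <= y for v in ys) / len(ys)


for y in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(y, round(empirical_cdf(y), 4), round(arcsine_cdf(y), 4))
```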

2.6 Theorem 2.1.8 can be used for all three parts.

a. Let $A_0 = \{0\}$, $A_1 = (-\infty, 0)$ and $A_2 = (0, \infty)$. Then $g_1(x) = |x|^3 = -x^3$ on $A_1$ and $g_2(x) = |x|^3 = x^3$ on $A_2$. Use Theorem 2.1.8 to obtain
$$f_Y(y) = \frac{1}{3}e^{-y^{1/3}}y^{-2/3}, \quad 0 < y < \infty.$$

b. Let $A_0 = \{0\}$, $A_1 = (-1, 0)$ and $A_2 = (0, 1)$. Then $g_1(x) = 1 - x^2$ on $A_1$ and $g_2(x) = 1 - x^2$ on $A_2$. Use Theorem 2.1.8 to obtain
$$f_Y(y) = \frac{3}{8}(1 - y)^{-1/2} + \frac{3}{8}(1 - y)^{1/2}, \quad 0 < y < 1.$$

c. Let $A_0 = \{0\}$, $A_1 = (-1, 0)$ and $A_2 = (0, 1)$. Then $g_1(x) = 1 - x^2$ on $A_1$ and $g_2(x) = 1 - x$ on $A_2$. Use Theorem 2.1.8 to obtain
$$f_Y(y) = \frac{3}{16}\left(1 - \sqrt{1-y}\right)^2\frac{1}{\sqrt{1-y}} + \frac{3}{8}(2 - y)^2, \quad 0 < y < 1.$$

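2.7 a. Theorem 2.1.8 does not directly apply. Instead write
$$P(Y \le y) = P(X^2 \le y) = \begin{cases} P(-\sqrt{y} \le X \le \sqrt{y}) & \text{if } y \le 1\\ P(-1 \le X \le \sqrt{y}) & \text{if } y \ge 1 \end{cases} = \begin{cases} \int_{-\sqrt{y}}^{\sqrt{y}} f_X(x)\,dx & \text{if } y \le 1\\ \int_{-1}^{\sqrt{y}} f_X(x)\,dx & \text{if } y \ge 1. \end{cases}$$
Differentiation gives
$$f_Y(y) = \begin{cases} \dfrac{2}{9\sqrt{y}} & \text{if } 0 < y \le 1\\[6pt] \dfrac{1}{9} + \dfrac{1}{9\sqrt{y}} & \text{if } 1 \le y \le 4. \end{cases}$$

b. If the sets $B_1, B_2, \ldots, B_K$ are a partition of the range of $Y$, we can write
$$f_Y(y) = \sum_k f_Y(y)I(y \in B_k)$$
and do the transformation on each of the $B_k$. So this says that we can apply Theorem 2.1.8 on each of the $B_k$ and add up the pieces. For $A_1 = (-1, 1)$ and $A_2 = (1, 2)$ the calculations are identical to those in part (a). (Note that on $A_1$ we are essentially using Example 2.1.7.)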
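Integrating the density from part a) gives $F_Y(y) = 4\sqrt{y}/9$ on $[0, 1]$ and $(\sqrt{y} + 1)^2/9$ on $[1, 4]$. The Monte Carlo sketch below compares that cdf with simulated values of $Y = X^2$. It assumes $f_X(x) = \frac{2}{9}(x + 1)$ on $(-1, 2)$: that density is not restated in this solution, but it is the one consistent with the answer above and with the sets $A_1 = (-1, 1)$, $A_2 = (1, 2)$ in part b). Then $F_X(x) = (x + 1)^2/9$, so $X = 3\sqrt{U} - 1$ for $U \sim$ uniform$(0, 1)$.

```python
import math
import random


def cdf_y(y: float) -> float:
    """F_Y obtained by integrating the piecewise density f_Y from part a."""
    if y <= 0:
        return 0.0
    if y <= 1:
        return 4 * math.sqrt(y) / 9
    if y <= 4:
        return (math.sqrt(y) + 1) ** 2 / 9
    return 1.0


rng = random.Random(7)
# Assumed f_X(x) = (2/9)(x + 1) on (-1, 2); inverse-cdf sampling gives X = 3*sqrt(U) - 1.
ys = [(3 * math.sqrt(rng.random()) - 1) ** 2 for _ in range(200_000)]


def empirical_cdf(y: float) -> float:
    return sum(v <= y for v in ys) / len(ys)


for y in (0.25, 1.0, 2.0, 3.5):
    print(y, round(empirical_cdf(y), 4), round(cdf_y(y), 4))
```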

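2.8 For each function we check the conditions of Theorem 1.5.3.

a. (i) $\lim_{x\to 0} F(x) = 1 - e^{-0} = 0$, $\lim_{x\to\infty} F(x) = 1 - e^{-\infty} = 1$.
(ii) $1 - e^{-x}$ is increasing in $x$.
(iii) $1 - e^{-x}$ is continuous.
(iv) $F_X^{-1}(y) = -\log(1 - y)$.

b. (i) $\lim_{x\to-\infty} F(x) = e^{-\infty}/2 = 0$, $\lim_{x\to\infty} F(x) = 1 - (e^{1-\infty}/2) = 1$.
(ii) $e^{x}/2$ is increasing, $1/2$ is nondecreasing, $1 - (e^{1-x}/2)$ is increasing.
(iii) For continuity we only need check $x = 0$ and $x = 1$, and $\lim_{x\to 0} F(x) = 1/2$, $\lim_{x\to 1} F(x) = 1/2$, so $F$ is continuous.
(iv)
$$F_X^{-1}(y) = \begin{cases} \log(2y) & 0 < y \le \frac{1}{2},\\ 1 - \log(2(1 - y)) & \frac{1}{2} < y < 1. \end{cases}$$

c. (i) $\lim_{x\to-\infty} F(x) = e^{-\infty}/4 = 0$, $\lim_{x\to\infty} F(x) = 1 - e^{-\infty}/4 = 1$.
(ii) $e^{x}/4$ and $1 - e^{-x}/4$ are both increasing in $x$.
(iii) $\lim_{x\downarrow 0} F(x) = 1 - e^{-0}/4 = \frac{3}{4} = F(0)$, so $F$ is right-continuous.
(iv)
$$F_X^{-1}(y) = \begin{cases} \log(4y) & 0 < y < \frac{1}{4},\\ 0 & \frac{1}{4} \le y \le \frac{3}{4},\\ -\log(4(1 - y)) & \frac{3}{4} < y < 1. \end{cases}$$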


2.9 From the probability integral transformation, Theorem 2.1.10, we know that if $u(x)=F_X(x)$, then $F_X(X)\sim$ uniform$(0,1)$. Therefore, for the given pdf, calculate
$$u(x)=F_X(x)=\begin{cases}0 & \text{if } x\le 1\\ (x-1)^2/4 & \text{if } 1<x<3\\ 1 & \text{if } 3\le x.\end{cases}$$

2.10 a. We prove part (b), which is equivalent to part (a).

b. Let $A_y=\{x:F_X(x)\le y\}$. Since $F_X$ is nondecreasing, $A_y$ is a half-infinite interval, either open, say $(-\infty,x_y)$, or closed, say $(-\infty,x_y]$. If $A_y$ is closed, then
$$F_Y(y)=P(Y\le y)=P(F_X(X)\le y)=P(X\in A_y)=F_X(x_y)\le y.$$
The last inequality is true because $x_y\in A_y$, and $F_X(x)\le y$ for every $x\in A_y$. If $A_y$ is open, then
$$F_Y(y)=P(Y\le y)=P(F_X(X)\le y)=P(X\in A_y),$$
as before. But now we have
$$P(X\in A_y)=P\big(X\in(-\infty,x_y)\big)=\lim_{x\uparrow x_y}P\big(X\in(-\infty,x]\big).$$
By the Axiom of Continuity (Exercise 1.12), this equals $\lim_{x\uparrow x_y}F_X(x)\le y$. The last inequality is true since $F_X(x)\le y$ for every $x\in A_y$, that is, for every $x<x_y$. Thus, $F_Y(y)\le y$ for every $y$. To get strict inequality for some $y$, let $y$ be a value that is “jumped over” by $F_X$. That is, let $y$ be such that, for some $x_y$,
$$\lim_{x\uparrow x_y}F_X(x)<y<F_X(x_y).$$
For such a $y$, $A_y=(-\infty,x_y)$, and $F_Y(y)=\lim_{x\uparrow x_y}F_X(x)<y$.

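2.11 a. Using integration by parts with $u=x$ and $dv=xe^{-x^2/2}\,dx$ we have
$$EX^2=\int_{-\infty}^{\infty}x^2\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx=\frac{1}{\sqrt{2\pi}}\left[-xe^{-x^2/2}\Big|_{-\infty}^{\infty}+\int_{-\infty}^{\infty}e^{-x^2/2}\,dx\right]=\frac{1}{\sqrt{2\pi}}\big(\sqrt{2\pi}\big)=1.$$
Using Example 2.1.7, let $Y=X^2$. Then
$$f_Y(y)=\frac{1}{2\sqrt{y}}\left[\frac{1}{\sqrt{2\pi}}e^{-y/2}+\frac{1}{\sqrt{2\pi}}e^{-y/2}\right]=\frac{1}{\sqrt{2\pi y}}e^{-y/2}.$$
Therefore,
$$EY=\int_0^\infty\frac{y}{\sqrt{2\pi y}}e^{-y/2}\,dy=\frac{1}{\sqrt{2\pi}}\left[-2y^{1/2}e^{-y/2}\Big|_0^\infty+\int_0^\infty y^{-1/2}e^{-y/2}\,dy\right]=\frac{1}{\sqrt{2\pi}}\big(\sqrt{2\pi}\big)=1.$$
This was obtained using integration by parts with $u=2y^{1/2}$ and $dv=\frac12e^{-y/2}\,dy$, and the fact that $f_Y(y)$ integrates to 1.

b. $Y=|X|$ where $-\infty<x<\infty$. Therefore $0<y<\infty$. Then
$$F_Y(y)=P(Y\le y)=P(|X|\le y)=P(-y\le X\le y)=P(X\le y)-P(X\le -y)=F_X(y)-F_X(-y).$$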


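Therefore,
$$f_Y(y)=\frac{d}{dy}F_Y(y)=f_X(y)+f_X(-y)=\frac{1}{\sqrt{2\pi}}e^{-y^2/2}+\frac{1}{\sqrt{2\pi}}e^{-y^2/2}=\sqrt{\frac{2}{\pi}}\,e^{-y^2/2}.$$
Thus,
$$EY=\int_0^\infty y\sqrt{\frac{2}{\pi}}e^{-y^2/2}\,dy=\sqrt{\frac{2}{\pi}}\int_0^\infty e^{-u}\,du=\sqrt{\frac{2}{\pi}}\Big[-e^{-u}\Big]_0^\infty=\sqrt{\frac{2}{\pi}},$$
where $u=y^2/2$, and
$$EY^2=\int_0^\infty y^2\sqrt{\frac{2}{\pi}}e^{-y^2/2}\,dy=\sqrt{\frac{2}{\pi}}\left[-ye^{-y^2/2}\Big|_0^\infty+\int_0^\infty e^{-y^2/2}\,dy\right]=\sqrt{\frac{2}{\pi}}\sqrt{\frac{\pi}{2}}=1.$$
This was done using integration by parts with $u=y$ and $dv=ye^{-y^2/2}\,dy$. Then $\mathrm{Var}(Y)=1-\frac{2}{\pi}$.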

2.12 We have $\tan x=y/d$, therefore $\tan^{-1}(y/d)=x$ and
$$\frac{d}{dy}\tan^{-1}(y/d)\,dy=\frac{1}{1+(y/d)^2}\,\frac{1}{d}\,dy=dx.$$
Thus,
$$f_Y(y)=\frac{2}{\pi d}\,\frac{1}{1+(y/d)^2},\quad 0<y<\infty.$$
This is the Cauchy distribution restricted to $(0,\infty)$, and the mean is infinite.

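2.13 $P(X=k)=(1-p)^kp+p^k(1-p)$, $k=1,2,\ldots$. Therefore,
$$\begin{aligned}EX&=\sum_{k=1}^{\infty}k\big[(1-p)^kp+p^k(1-p)\big]=(1-p)p\left[\sum_{k=1}^{\infty}k(1-p)^{k-1}+\sum_{k=1}^{\infty}kp^{k-1}\right]\\&=(1-p)p\left[\frac{1}{p^2}+\frac{1}{(1-p)^2}\right]=\frac{1-2p+2p^2}{p(1-p)}.\end{aligned}$$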
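The closed form can be sanity-checked numerically. The following Python sketch is not part of the original solution. It sums the series for $EX$ directly, truncated at `kmax` terms, and compares the result with $(1-2p+2p^2)/(p(1-p))$. The names `mean_series` and `mean_closed_form` and the truncation point `kmax` are illustrative assumptions; `kmax` only needs to be large enough that the neglected tail is negligible for the chosen $p$.

```python
def mean_series(p, kmax=10_000):
    """Truncated sum of k * P(X = k), where P(X = k) = (1-p)^k p + p^k (1-p)."""
    return sum(k * ((1 - p) ** k * p + p ** k * (1 - p)) for k in range(1, kmax + 1))


def mean_closed_form(p):
    """EX = (1 - 2p + 2p^2) / (p (1 - p)), from the derivation above."""
    return (1 - 2 * p + 2 * p ** 2) / (p * (1 - p))


for p in (0.1, 0.3, 0.5, 0.8):
    print(p, mean_series(p), mean_closed_form(p))
```

For $p=1/2$, both columns should equal 2.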

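2.14
$$\begin{aligned}\int_0^\infty\big(1-F_X(x)\big)\,dx&=\int_0^\infty P(X>x)\,dx=\int_0^\infty\int_x^\infty f_X(y)\,dy\,dx\\&=\int_0^\infty\int_0^y dx\,f_X(y)\,dy=\int_0^\infty yf_X(y)\,dy=EX,\end{aligned}$$
where the third equality follows from changing the order of integration.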

2.15 Assume without loss of generality that $X\le Y$. Then $X\vee Y=Y$ and $X\wedge Y=X$. Thus $X+Y=(X\wedge Y)+(X\vee Y)$. Taking expectations,
$$E[X+Y]=E[(X\wedge Y)+(X\vee Y)]=E(X\wedge Y)+E(X\vee Y).$$
Therefore $E(X\vee Y)=EX+EY-E(X\wedge Y)$.

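2.16 From Exercise 2.14,
$$ET=\int_0^\infty\big[ae^{-\lambda t}+(1-a)e^{-\mu t}\big]\,dt=\left[-\frac{ae^{-\lambda t}}{\lambda}-\frac{(1-a)e^{-\mu t}}{\mu}\right]_0^\infty=\frac{a}{\lambda}+\frac{1-a}{\mu}.$$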


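2.17 a. $\int_0^m 3x^2\,dx=m^3\overset{\text{set}}{=}\frac12\ \Rightarrow\ m=\frac{1}{2^{1/3}}=.794$.

b. The function is symmetric about zero, therefore $m=0$ as long as the integral is finite:
$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{1+x^2}\,dx=\frac{1}{\pi}\tan^{-1}(x)\Big|_{-\infty}^{\infty}=\frac{1}{\pi}\left(\frac{\pi}{2}+\frac{\pi}{2}\right)=1.$$
This is the Cauchy pdf.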

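2.18 $E|X-a|=\int_{-\infty}^{\infty}|x-a|f(x)\,dx=\int_{-\infty}^{a}-(x-a)f(x)\,dx+\int_a^\infty(x-a)f(x)\,dx$. Then,
$$\frac{d}{da}E|X-a|=\int_{-\infty}^{a}f(x)\,dx-\int_a^\infty f(x)\,dx\overset{\text{set}}{=}0.$$
The solution to this equation is $a=$ median. This is a minimum since $\frac{d^2}{da^2}E|X-a|=2f(a)>0$.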

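2.19
$$\begin{aligned}\frac{d}{da}E(X-a)^2&=\frac{d}{da}\int_{-\infty}^{\infty}(x-a)^2f_X(x)\,dx=\int_{-\infty}^{\infty}\frac{d}{da}(x-a)^2f_X(x)\,dx\\&=\int_{-\infty}^{\infty}-2(x-a)f_X(x)\,dx=-2\left[\int_{-\infty}^{\infty}xf_X(x)\,dx-a\int_{-\infty}^{\infty}f_X(x)\,dx\right]\\&=-2[EX-a].\end{aligned}$$
Therefore if $\frac{d}{da}E(X-a)^2=0$ then $-2[EX-a]=0$, which implies that $EX=a$. If $EX=a$ then $\frac{d}{da}E(X-a)^2=-2[EX-a]=-2[a-a]=0$. $EX=a$ is a minimum since $\frac{d^2}{da^2}E(X-a)^2=2>0$. The assumptions that are needed are the ones listed in Theorem 2.4.3.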

2.20 From Example 1.5.4, if $X=$ number of children until the first daughter, then
$$P(X=k)=(1-p)^{k-1}p,$$
where $p=$ probability of a daughter. Thus $X$ is a geometric random variable, and
$$EX=\sum_{k=1}^{\infty}k(1-p)^{k-1}p=-p\sum_{k=1}^{\infty}\frac{d}{dp}(1-p)^k=-p\frac{d}{dp}\left[\sum_{k=0}^{\infty}(1-p)^k-1\right]=-p\frac{d}{dp}\left[\frac1p-1\right]=\frac1p.$$
Therefore, if $p=\frac12$, the expected number of children is two.

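2.21 Since $g(x)$ is monotone,
$$Eg(X)=\int_{-\infty}^{\infty}g(x)f_X(x)\,dx=\int_{-\infty}^{\infty}yf_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|dy=\int_{-\infty}^{\infty}yf_Y(y)\,dy=EY,$$
where the second equality follows from the change of variable $y=g(x)$, $x=g^{-1}(y)$ and $dx=\frac{d}{dy}g^{-1}(y)\,dy$. (If $g$ is decreasing, the limits of integration reverse, and this produces the absolute value.)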

2.22 a. Using integration by parts with $u=x$ and $dv=xe^{-x^2/\beta^2}\,dx$ we obtain that
$$\int_0^\infty x^2e^{-x^2/\beta^2}\,dx=\frac{\beta^2}{2}\int_0^\infty e^{-x^2/\beta^2}\,dx.$$
The integral can be evaluated using the argument on pages 104-105 (see (3.3.14)) or by transforming to a gamma kernel (use $y=x^2/\beta^2$). Therefore, $\int_0^\infty e^{-x^2/\beta^2}\,dx=\sqrt{\pi}\beta/2$, and hence the function integrates to 1.


b. EX= 2β/√πEX2= 3β2/2 VarX=β23

2−4

π.

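2.23 a. Use Theorem 2.1.8 with $A_0=\{0\}$, $A_1=(-1,0)$ and $A_2=(0,1)$. Then $g_1(x)=x^2$ on $A_1$ and $g_2(x)=x^2$ on $A_2$. Then
$$f_Y(y)=\frac12y^{-1/2},\quad 0<y<1.$$

b. $EY=\int_0^1yf_Y(y)\,dy=\frac13$, $EY^2=\int_0^1y^2f_Y(y)\,dy=\frac15$, $\mathrm{Var}Y=\frac15-\left(\frac13\right)^2=\frac{4}{45}$.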

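2.24 a. $EX=\int_0^1x\,ax^{a-1}\,dx=\int_0^1ax^a\,dx=\frac{ax^{a+1}}{a+1}\Big|_0^1=\frac{a}{a+1}$.
$EX^2=\int_0^1x^2ax^{a-1}\,dx=\int_0^1ax^{a+1}\,dx=\frac{ax^{a+2}}{a+2}\Big|_0^1=\frac{a}{a+2}$.
$\mathrm{Var}X=\frac{a}{a+2}-\left(\frac{a}{a+1}\right)^2=\frac{a}{(a+2)(a+1)^2}$.

b. $EX=\sum_{x=1}^nx\frac1n=\frac1n\sum_{x=1}^nx=\frac1n\,\frac{n(n+1)}{2}=\frac{n+1}{2}$.
$EX^2=\sum_{x=1}^nx^2\frac1n=\frac1n\sum_{x=1}^nx^2=\frac1n\,\frac{n(n+1)(2n+1)}{6}=\frac{(n+1)(2n+1)}{6}$.
$\mathrm{Var}X=\frac{(n+1)(2n+1)}{6}-\left(\frac{n+1}{2}\right)^2=\frac{2n^2+3n+1}{6}-\frac{n^2+2n+1}{4}=\frac{n^2-1}{12}$.

c. $EX=\int_0^2x\frac32(x-1)^2\,dx=\frac32\int_0^2(x^3-2x^2+x)\,dx=1$.
$EX^2=\int_0^2x^2\frac32(x-1)^2\,dx=\frac32\int_0^2(x^4-2x^3+x^2)\,dx=\frac85$.
$\mathrm{Var}X=\frac85-1^2=\frac35$.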
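The variance $(n^2-1)/12$ in part (b) can be confirmed exactly with rational arithmetic. Here is a minimal sketch using Python's `fractions.Fraction`; the helper name `discrete_uniform_moments` is hypothetical.

```python
from fractions import Fraction


def discrete_uniform_moments(n):
    """Exact mean and variance of X uniform on {1, ..., n}."""
    xs = range(1, n + 1)
    mean = Fraction(sum(xs), n)
    second_moment = Fraction(sum(x * x for x in xs), n)
    return mean, second_moment - mean ** 2


print(discrete_uniform_moments(6))  # (Fraction(7, 2), Fraction(35, 12))
```

Exact fractions avoid any floating-point doubt when comparing against $(n+1)/2$ and $(n^2-1)/12$.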

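2.25 a. $Y=-X$ and $g^{-1}(y)=-y$. Thus $f_Y(y)=f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|=f_X(-y)|-1|=f_X(y)$ for every $y$.

b. To show that $M_X(t)$ is symmetric about 0 we must show that $M_X(0+\epsilon)=M_X(0-\epsilon)$ for all $\epsilon>0$.
$$\begin{aligned}M_X(0+\epsilon)&=\int_{-\infty}^{\infty}e^{(0+\epsilon)x}f_X(x)\,dx=\int_{-\infty}^{0}e^{\epsilon x}f_X(x)\,dx+\int_0^\infty e^{\epsilon x}f_X(x)\,dx\\&=\int_0^\infty e^{\epsilon(-x)}f_X(-x)\,dx+\int_{-\infty}^{0}e^{\epsilon(-x)}f_X(-x)\,dx=\int_{-\infty}^{\infty}e^{-\epsilon x}f_X(x)\,dx\\&=\int_{-\infty}^{\infty}e^{(0-\epsilon)x}f_X(x)\,dx=M_X(0-\epsilon).\end{aligned}$$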

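2.26 a. There are many examples; here are three. The standard normal pdf (Example 2.1.9) is symmetric about $a=0$ because $(0-\epsilon)^2=(0+\epsilon)^2$. The Cauchy pdf (Example 2.2.4) is symmetric about $a=0$ because $(0-\epsilon)^2=(0+\epsilon)^2$. The uniform(0,1) pdf (Example 2.1.4) is symmetric about $a=1/2$ because
$$f\big((1/2)+\epsilon\big)=f\big((1/2)-\epsilon\big)=\begin{cases}1 & \text{if } 0<\epsilon<\frac12\\ 0 & \text{if } \frac12\le\epsilon<\infty.\end{cases}$$

b.
$$\begin{aligned}\int_a^\infty f(x)\,dx&=\int_0^\infty f(a+\epsilon)\,d\epsilon &&(\text{change variable, }\epsilon=x-a)\\&=\int_0^\infty f(a-\epsilon)\,d\epsilon &&(f(a+\epsilon)=f(a-\epsilon)\text{ for all }\epsilon>0)\\&=\int_{-\infty}^{a}f(x)\,dx. &&(\text{change variable, }x=a-\epsilon)\end{aligned}$$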


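Since
$$\int_{-\infty}^{a}f(x)\,dx+\int_a^\infty f(x)\,dx=\int_{-\infty}^{\infty}f(x)\,dx=1,$$
it must be that
$$\int_{-\infty}^{a}f(x)\,dx=\int_a^\infty f(x)\,dx=1/2.$$
Therefore, $a$ is a median.

c.
$$\begin{aligned}EX-a=E(X-a)&=\int_{-\infty}^{\infty}(x-a)f(x)\,dx=\int_{-\infty}^{a}(x-a)f(x)\,dx+\int_a^\infty(x-a)f(x)\,dx\\&=\int_0^\infty(-\epsilon)f(a-\epsilon)\,d\epsilon+\int_0^\infty\epsilon f(a+\epsilon)\,d\epsilon,\end{aligned}$$
using the change of variable $\epsilon=a-x$ in the first integral and $\epsilon=x-a$ in the second. Since $f(a+\epsilon)=f(a-\epsilon)$ for all $\epsilon>0$, we obtain
$$EX-a=-\int_0^\infty\epsilon f(a-\epsilon)\,d\epsilon+\int_0^\infty\epsilon f(a-\epsilon)\,d\epsilon=0,$$
because the two integrals are the same. Therefore, $EX=a$.

d. If $a>\epsilon>0$,
$$f(a-\epsilon)=e^{-(a-\epsilon)}>e^{-(a+\epsilon)}=f(a+\epsilon).$$
Therefore, $f(x)$ is not symmetric about $a>0$. If $-\epsilon<a\le 0$,
$$f(a-\epsilon)=0<e^{-(a+\epsilon)}=f(a+\epsilon).$$
Therefore, $f(x)$ is not symmetric about $a\le 0$, either.

e. The median of $X$ is $\log 2<1=EX$.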

2.27 a. The standard normal pdf.

b. The uniform on the interval (0,1).

c. First consider the case when the mode is unique. Let $a$ be the point of symmetry and $b$ be the mode. Assume that $a$ is not the mode and, without loss of generality, that $a=b+\epsilon>b$ for $\epsilon>0$. Since $b$ is the mode, $f(b)>f(b+\epsilon)\ge f(b+2\epsilon)$, which implies that $f(a-\epsilon)>f(a)\ge f(a+\epsilon)$. This contradicts the fact that $f(x)$ is symmetric. Thus $a$ is the mode.

For the case when the mode is not unique, there must exist an interval $(x_1,x_2)$ such that $f(x)$ has the same value in the whole interval; that is, $f(x)$ is flat in this interval and every $b\in(x_1,x_2)$ is a mode. Assume that $a\notin(x_1,x_2)$, so that $a$ is not a mode, and also assume without loss of generality that $a=(b+\epsilon)>b$. Since $b$ is a mode and $a=(b+\epsilon)\notin(x_1,x_2)$, we have $f(b)>f(b+\epsilon)\ge f(b+2\epsilon)$, which contradicts the fact that $f(x)$ is symmetric. Thus $a\in(x_1,x_2)$ and is a mode.

d. $f(x)$ is decreasing for $x\ge 0$, with $f(0)>f(x)>f(y)$ for all $0<x<y$. Thus $f(x)$ is unimodal and 0 is the mode.


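2.28 a.
$$\begin{aligned}\mu_3&=\int_{-\infty}^{\infty}(x-a)^3f(x)\,dx=\int_{-\infty}^{a}(x-a)^3f(x)\,dx+\int_a^\infty(x-a)^3f(x)\,dx\\&=\int_{-\infty}^{0}y^3f(y+a)\,dy+\int_0^\infty y^3f(y+a)\,dy &&(\text{change variable }y=x-a)\\&=\int_0^\infty-y^3f(-y+a)\,dy+\int_0^\infty y^3f(y+a)\,dy\\&=0. &&(f(-y+a)=f(y+a))\end{aligned}$$

b. For $f(x)=e^{-x}$, $\mu_1=\mu_2=1$, therefore $\alpha_3=\mu_3$.
$$\mu_3=\int_0^\infty(x-1)^3e^{-x}\,dx=\int_0^\infty(x^3-3x^2+3x-1)e^{-x}\,dx=\Gamma(4)-3\Gamma(3)+3\Gamma(2)-\Gamma(1)=3!-3\times2!+3\times1-1=2.$$

c. Each distribution has $\mu_1=0$, therefore we must calculate $\mu_2=EX^2$ and $\mu_4=EX^4$.
(i) $f(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$: $\mu_2=1$, $\mu_4=3$, $\alpha_4=3$.
(ii) $f(x)=\frac12$, $-1<x<1$: $\mu_2=\frac13$, $\mu_4=\frac15$, $\alpha_4=\frac95$.
(iii) $f(x)=\frac12e^{-|x|}$, $-\infty<x<\infty$: $\mu_2=2$, $\mu_4=24$, $\alpha_4=6$.
As a graph will show, (iii) is most peaked, (i) is next, and (ii) is least peaked.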

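2.29 a. For the binomial,
$$\begin{aligned}EX(X-1)&=\sum_{x=2}^{n}x(x-1)\binom{n}{x}p^x(1-p)^{n-x}=n(n-1)p^2\sum_{x=2}^{n}\binom{n-2}{x-2}p^{x-2}(1-p)^{n-x}\\&=n(n-1)p^2\sum_{y=0}^{n-2}\binom{n-2}{y}p^y(1-p)^{n-2-y}=n(n-1)p^2,\end{aligned}$$
where we use the identity $x(x-1)\binom{n}{x}=n(n-1)\binom{n-2}{x-2}$, substitute $y=x-2$, and recognize that the new sum is equal to 1. Similarly, for the Poisson,
$$EX(X-1)=\sum_{x=2}^{\infty}x(x-1)\frac{e^{-\lambda}\lambda^x}{x!}=\lambda^2\sum_{y=0}^{\infty}\frac{e^{-\lambda}\lambda^y}{y!}=\lambda^2,$$
where we substitute $y=x-2$.

b. $\mathrm{Var}(X)=E[X(X-1)]+EX-(EX)^2$. For the binomial,
$$\mathrm{Var}(X)=n(n-1)p^2+np-(np)^2=np(1-p).$$
For the Poisson,
$$\mathrm{Var}(X)=\lambda^2+\lambda-\lambda^2=\lambda.$$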

c.
$$\begin{aligned}EY&=\sum_{y=0}^{n}y\binom{n}{y}\frac{a}{y+a}\,\frac{\binom{a+b-1}{a}}{\binom{n+a+b-1}{y+a}}=\sum_{y=1}^{n}n\binom{n-1}{y-1}\frac{a}{(y-1)+(a+1)}\,\frac{\binom{a+b-1}{a}}{\binom{(n-1)+(a+1)+b-1}{(y-1)+(a+1)}}\\&=\frac{na\binom{a+b-1}{a}}{(a+1)\binom{a+1+b-1}{a+1}}\sum_{y=1}^{n}\binom{n-1}{y-1}\frac{a+1}{(y-1)+(a+1)}\,\frac{\binom{a+1+b-1}{a+1}}{\binom{(n-1)+(a+1)+b-1}{(y-1)+(a+1)}}\\&=\frac{na}{a+b}\sum_{j=0}^{n-1}\binom{n-1}{j}\frac{a+1}{j+(a+1)}\,\frac{\binom{a+1+b-1}{a+1}}{\binom{(n-1)+(a+1)+b-1}{j+(a+1)}}=\frac{na}{a+b},\end{aligned}$$
where we used $y\binom{n}{y}=n\binom{n-1}{y-1}$ and $\binom{a+b-1}{a}\big/\binom{a+b}{a+1}=\frac{a+1}{a+b}$. The last summation is 1, being the sum over all possible values of a beta-binomial$(n-1,a+1,b)$. $E[Y(Y-1)]=\frac{n(n-1)a(a+1)}{(a+b)(a+b+1)}$ is calculated similarly to $EY$, but using the identity $y(y-1)\binom{n}{y}=n(n-1)\binom{n-2}{y-2}$ and adding 2 instead of 1 to the parameter $a$. The sum over all possible values of a beta-binomial$(n-2,a+2,b)$ will appear in the calculation. Therefore
$$\mathrm{Var}(Y)=E[Y(Y-1)]+EY-(EY)^2=\frac{nab(n+a+b)}{(a+b)^2(a+b+1)}.$$
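As a numerical check on part (c), the sketch below evaluates the pmf in exactly the binomial-coefficient form used above, $P(Y=y)=\binom{n}{y}\frac{a}{y+a}\binom{a+b-1}{a}\big/\binom{n+a+b-1}{y+a}$. It then computes the total mass, mean and variance in exact rational arithmetic. The sketch assumes integer $a,b\ge1$ so that the binomial coefficients make sense, and the function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def beta_binomial_pmf(y, n, a, b):
    """P(Y = y) in the binomial-coefficient form above (integer a, b >= 1 assumed)."""
    return (Fraction(comb(n, y) * comb(a + b - 1, a), comb(n + a + b - 1, y + a))
            * Fraction(a, y + a))


def moments(n, a, b):
    """Total probability, mean and variance, all as exact fractions."""
    pmf = [beta_binomial_pmf(y, n, a, b) for y in range(n + 1)]
    total = sum(pmf)
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum(y * y * p for y, p in enumerate(pmf)) - mean ** 2
    return total, mean, var


total, mean, var = moments(7, 3, 5)
print(total, mean, var)  # 1 21/8 175/64
```

For $n=7$, $a=3$, $b=5$ the formulas give $na/(a+b)=21/8$ and $nab(n+a+b)/((a+b)^2(a+b+1))=175/64$.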

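2.30 a. $E(e^{tX})=\int_0^c e^{tx}\frac1c\,dx=\frac{1}{ct}e^{tx}\Big|_0^c=\frac{1}{ct}e^{tc}-\frac{1}{ct}1=\frac{1}{ct}(e^{tc}-1)$.

b. $E(e^{tX})=\int_0^c\frac{2x}{c^2}e^{tx}\,dx=\frac{2}{c^2t^2}(cte^{tc}-e^{tc}+1)$ (integration by parts).

c.
$$\begin{aligned}E(e^{tX})&=\int_{-\infty}^{\alpha}\frac{1}{2\beta}e^{(x-\alpha)/\beta}e^{tx}\,dx+\int_\alpha^\infty\frac{1}{2\beta}e^{-(x-\alpha)/\beta}e^{tx}\,dx\\&=\frac{e^{-\alpha/\beta}}{2\beta}\,\frac{1}{\frac1\beta+t}e^{x(\frac1\beta+t)}\Big|_{-\infty}^{\alpha}+\frac{-e^{\alpha/\beta}}{2\beta}\,\frac{1}{\frac1\beta-t}e^{-x(\frac1\beta-t)}\Big|_{\alpha}^{\infty}\\&=\frac{e^{\alpha t}}{2(1+\beta t)}+\frac{e^{\alpha t}}{2(1-\beta t)}=\frac{e^{\alpha t}}{1-\beta^2t^2},\quad -1/\beta<t<1/\beta.\end{aligned}$$

d. $Ee^{tX}=\sum_{x=0}^{\infty}e^{tx}\binom{r+x-1}{x}p^r(1-p)^x=p^r\sum_{x=0}^{\infty}\binom{r+x-1}{x}\big((1-p)e^t\big)^x$. Now use the fact that
$$\sum_{x=0}^{\infty}\binom{r+x-1}{x}\big((1-p)e^t\big)^x\big(1-(1-p)e^t\big)^r=1\quad\text{for }(1-p)e^t<1,$$
since this is just the sum of this pmf, to get
$$E(e^{tX})=\left(\frac{p}{1-(1-p)e^t}\right)^r,\quad t<-\log(1-p).$$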

2.31 Since the mgf is defined as $M_X(t)=Ee^{tX}$, we necessarily have $M_X(0)=Ee^0=1$. But $t/(1-t)$ is 0 at $t=0$, therefore it cannot be an mgf.

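2.32
$$\frac{d}{dt}S(t)\Big|_{t=0}=\frac{d}{dt}\big(\log M_X(t)\big)\Big|_{t=0}=\frac{M_X'(t)}{M_X(t)}\Big|_{t=0}=\frac{EX}{1}=EX\quad\text{since }M_X(0)=Ee^0=1,$$
$$\frac{d^2}{dt^2}S(t)\Big|_{t=0}=\frac{d}{dt}\left(\frac{M_X'(t)}{M_X(t)}\right)\Big|_{t=0}=\frac{M_X(t)M_X''(t)-[M_X'(t)]^2}{[M_X(t)]^2}\Big|_{t=0}=\frac{1\cdot EX^2-(EX)^2}{1}=\mathrm{Var}X.$$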

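2.33 a. $M_X(t)=\sum_{x=0}^{\infty}e^{tx}\frac{e^{-\lambda}\lambda^x}{x!}=e^{-\lambda}\sum_{x=0}^{\infty}\frac{(e^t\lambda)^x}{x!}=e^{-\lambda}e^{\lambda e^t}=e^{\lambda(e^t-1)}$.
$$EX=\frac{d}{dt}M_X(t)\Big|_{t=0}=e^{\lambda(e^t-1)}\lambda e^t\Big|_{t=0}=\lambda.$$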


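$$EX^2=\frac{d^2}{dt^2}M_X(t)\Big|_{t=0}=\Big[\lambda e^te^{\lambda(e^t-1)}\lambda e^t+\lambda e^te^{\lambda(e^t-1)}\Big]_{t=0}=\lambda^2+\lambda.$$
$$\mathrm{Var}X=EX^2-(EX)^2=\lambda^2+\lambda-\lambda^2=\lambda.$$

b.
$$M_X(t)=\sum_{x=0}^{\infty}e^{tx}p(1-p)^x=p\sum_{x=0}^{\infty}\big((1-p)e^t\big)^x=p\,\frac{1}{1-(1-p)e^t}=\frac{p}{1-(1-p)e^t},\quad t<-\log(1-p).$$
$$EX=\frac{d}{dt}M_X(t)\Big|_{t=0}=\frac{-p}{\big(1-(1-p)e^t\big)^2}\big(-(1-p)e^t\big)\Big|_{t=0}=\frac{p(1-p)}{p^2}=\frac{1-p}{p}.$$
$$\begin{aligned}EX^2=\frac{d^2}{dt^2}M_X(t)\Big|_{t=0}&=\frac{\big(1-(1-p)e^t\big)^2p(1-p)e^t+p(1-p)e^t\,2\big(1-(1-p)e^t\big)(1-p)e^t}{\big(1-(1-p)e^t\big)^4}\Bigg|_{t=0}\\&=\frac{p^3(1-p)+2p^2(1-p)^2}{p^4}=\frac{p(1-p)+2(1-p)^2}{p^2}.\end{aligned}$$
$$\mathrm{Var}X=\frac{p(1-p)+2(1-p)^2}{p^2}-\frac{(1-p)^2}{p^2}=\frac{1-p}{p^2}.$$

c. $M_X(t)=\int_{-\infty}^{\infty}e^{tx}\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/2\sigma^2}\,dx=\frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^{\infty}e^{-(x^2-2\mu x-2\sigma^2tx+\mu^2)/2\sigma^2}\,dx$. Now complete the square in the numerator by writing
$$\begin{aligned}x^2-2\mu x-2\sigma^2tx+\mu^2&=x^2-2(\mu+\sigma^2t)x\pm(\mu+\sigma^2t)^2+\mu^2\\&=\big(x-(\mu+\sigma^2t)\big)^2-(\mu+\sigma^2t)^2+\mu^2\\&=\big(x-(\mu+\sigma^2t)\big)^2-\big[2\mu\sigma^2t+(\sigma^2t)^2\big].\end{aligned}$$
Then we have
$$M_X(t)=e^{[2\mu\sigma^2t+(\sigma^2t)^2]/2\sigma^2}\,\frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^{\infty}e^{-\frac{1}{2\sigma^2}(x-(\mu+\sigma^2t))^2}\,dx=e^{\mu t+\sigma^2t^2/2}.$$
$$EX=\frac{d}{dt}M_X(t)\Big|_{t=0}=(\mu+\sigma^2t)e^{\mu t+\sigma^2t^2/2}\Big|_{t=0}=\mu.$$
$$EX^2=\frac{d^2}{dt^2}M_X(t)\Big|_{t=0}=\Big[(\mu+\sigma^2t)^2e^{\mu t+\sigma^2t^2/2}+\sigma^2e^{\mu t+\sigma^2t^2/2}\Big]_{t=0}=\mu^2+\sigma^2.$$
$$\mathrm{Var}X=\mu^2+\sigma^2-\mu^2=\sigma^2.$$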
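Exercise 2.32 shows that the first two derivatives of $S(t)=\log M_X(t)$ at 0 are the mean and the variance. Here is a small numerical illustration using central finite differences, applied to the geometric mgf of part (b) and the Poisson mgf of part (a). The step size `h` is an assumption chosen to balance truncation error against rounding error.

```python
import math


def geometric_mgf(t, p):
    """M_X(t) = p / (1 - (1-p) e^t), valid for t < -log(1-p)."""
    return p / (1 - (1 - p) * math.exp(t))


def cumulants_at_zero(mgf, h=1e-4):
    """Central differences of S(t) = log M(t) at t = 0 (Exercise 2.32)."""
    S = lambda t: math.log(mgf(t))
    first = (S(h) - S(-h)) / (2 * h)
    second = (S(h) - 2 * S(0.0) + S(-h)) / h ** 2
    return first, second


p = 0.3
mean, var = cumulants_at_zero(lambda t: geometric_mgf(t, p))
print(mean, (1 - p) / p, var, (1 - p) / p ** 2)
```

Working with $\log M_X$ rather than $M_X$ gives the variance directly as the second derivative, with no $(EX)^2$ subtraction.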

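2.35 a.
$$\begin{aligned}EX_1^r&=\int_0^\infty x^r\frac{1}{\sqrt{2\pi}x}e^{-(\log x)^2/2}\,dx &&(f_1\text{ is lognormal with }\mu=0,\ \sigma^2=1)\\&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{y(r-1)}e^{-y^2/2}e^y\,dy &&(\text{substitute }y=\log x,\ dy=(1/x)dx)\\&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-y^2/2+ry}\,dy=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-(y^2-2ry+r^2)/2}e^{r^2/2}\,dy\\&=e^{r^2/2}.\end{aligned}$$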


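b.
$$\begin{aligned}\int_0^\infty x^rf_1(x)\sin(2\pi\log x)\,dx&=\int_0^\infty x^r\frac{1}{\sqrt{2\pi}x}e^{-(\log x)^2/2}\sin(2\pi\log x)\,dx\\&=\int_{-\infty}^{\infty}e^{(y+r)r}\frac{1}{\sqrt{2\pi}}e^{-(y+r)^2/2}\sin(2\pi y+2\pi r)\,dy &&(\text{substitute }y=\log x-r,\ dy=(1/x)dx)\\&=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{(r^2-y^2)/2}\sin(2\pi y)\,dy &&(\sin(a+2\pi r)=\sin(a)\text{ if }r=0,1,2,\ldots)\\&=0,\end{aligned}$$
because $e^{(r^2-y^2)/2}\sin(2\pi y)=-e^{(r^2-(-y)^2)/2}\sin(2\pi(-y))$. The integrand is an odd function, so the negative part of the integral cancels the positive part.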

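2.36 First, for $t>0$, it can be shown that
$$\lim_{x\to\infty}e^{tx-(\log x)^2}=\infty$$
by using l'Hôpital's rule to show
$$\lim_{x\to\infty}\frac{tx-(\log x)^2}{tx}=1,$$
and, hence,
$$\lim_{x\to\infty}\big(tx-(\log x)^2\big)=\lim_{x\to\infty}tx=\infty.$$
Then for any $k>0$, there is a constant $c$ such that
$$\int_k^\infty\frac1xe^{tx}e^{-(\log x)^2/2}\,dx\ge c\int_k^\infty\frac1x\,dx=c\log x\Big|_k^\infty=\infty.$$
Hence $M_X(t)$ does not exist.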

2.37 a. The graph looks very similar to Figure 2.3.2 except that $f_1$ is symmetric around 0 (since it is standard normal).

b. The functions look like $t^2/2$; it is impossible to see any difference.

c. The mgf of $f_1$ is $e^{K_1(t)}$. The mgf of $f_2$ is $e^{K_2(t)}$.

d. Make the transformation $y=e^x$ to get the densities in Example 2.3.10.

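2.39 a. $\frac{d}{dx}\int_0^xe^{-\lambda t}\,dt=e^{-\lambda x}$. Verify:
$$\frac{d}{dx}\int_0^xe^{-\lambda t}\,dt=\frac{d}{dx}\left[-\frac1\lambda e^{-\lambda t}\Big|_0^x\right]=\frac{d}{dx}\left(-\frac1\lambda e^{-\lambda x}+\frac1\lambda\right)=e^{-\lambda x}.$$

b. $\frac{d}{d\lambda}\int_0^\infty e^{-\lambda t}\,dt=\int_0^\infty\frac{d}{d\lambda}e^{-\lambda t}\,dt=\int_0^\infty-te^{-\lambda t}\,dt=-\frac{\Gamma(2)}{\lambda^2}=-\frac{1}{\lambda^2}$. Verify:
$$\frac{d}{d\lambda}\int_0^\infty e^{-\lambda t}\,dt=\frac{d}{d\lambda}\frac1\lambda=-\frac{1}{\lambda^2}.$$

c. $\frac{d}{dt}\int_t^1\frac{1}{x^2}\,dx=-\frac{1}{t^2}$. Verify:
$$\frac{d}{dt}\int_t^1\frac{1}{x^2}\,dx=\frac{d}{dt}\left(-\frac1x\Big|_t^1\right)=\frac{d}{dt}\left(-1+\frac1t\right)=-\frac{1}{t^2}.$$

d. $\frac{d}{dt}\int_1^\infty\frac{1}{(x-t)^2}\,dx=\int_1^\infty\frac{d}{dt}\frac{1}{(x-t)^2}\,dx=\int_1^\infty2(x-t)^{-3}\,dx=-(x-t)^{-2}\Big|_1^\infty=\frac{1}{(1-t)^2}$. Verify:
$$\frac{d}{dt}\int_1^\infty(x-t)^{-2}\,dx=\frac{d}{dt}\left[-(x-t)^{-1}\Big|_1^\infty\right]=\frac{d}{dt}\frac{1}{1-t}=\frac{1}{(1-t)^2}.$$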

Chapter 3

Common Families of Distributions

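3.1 The pmf of $X$ is $f(x)=\frac{1}{N_1-N_0+1}$, $x=N_0,N_0+1,\ldots,N_1$. Then
$$EX=\sum_{x=N_0}^{N_1}x\frac{1}{N_1-N_0+1}=\frac{1}{N_1-N_0+1}\left(\sum_{x=1}^{N_1}x-\sum_{x=1}^{N_0-1}x\right)=\frac{1}{N_1-N_0+1}\left(\frac{N_1(N_1+1)}{2}-\frac{(N_0-1)(N_0-1+1)}{2}\right)=\frac{N_1+N_0}{2}.$$
Similarly, using the formula for $\sum_1^Nx^2$, we obtain
$$EX^2=\frac{1}{N_1-N_0+1}\left(\frac{N_1(N_1+1)(2N_1+1)-N_0(N_0-1)(2N_0-1)}{6}\right),$$
$$\mathrm{Var}X=EX^2-(EX)^2=\frac{(N_1-N_0)(N_1-N_0+2)}{12}.$$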

3.2 Let $X=$ number of defective parts in the sample. Then $X\sim$ hypergeometric$(N=100,M,K)$, where $M=$ number of defectives in the lot and $K=$ sample size.

a. If there are 6 or more defectives in the lot, then the probability that the lot is accepted ($X=0$) is at most
$$P(X=0\mid N=100,M=6,K)=\frac{\binom60\binom{94}{K}}{\binom{100}{K}}=\frac{(100-K)\cdots(100-K-5)}{100\cdots95}.$$
By trial and error we find $P(X=0)=.10056$ for $K=31$ and $P(X=0)=.09182$ for $K=32$. So the sample size must be at least 32.

b. Now $P(\text{accept lot})=P(X=0\text{ or }1)$, and, for 6 or more defectives, the probability is at most
$$P(X=0\text{ or }1\mid N=100,M=6,K)=\frac{\binom60\binom{94}{K}}{\binom{100}{K}}+\frac{\binom61\binom{94}{K-1}}{\binom{100}{K}}.$$
By trial and error we find $P(X=0\text{ or }1)=.10220$ for $K=50$ and $P(X=0\text{ or }1)=.09331$ for $K=51$. So the sample size must be at least 51.
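The trial-and-error search in 3.2 is easy to automate with `math.comb`. This sketch assumes the target is an acceptance probability of at most .10, which is the threshold implied by the values .10056/.09182 and .10220/.09331 quoted above. The helper names are hypothetical.

```python
from math import comb


def accept_prob(K, c, N=100, M=6):
    """P(X <= c) for X hypergeometric: lot of N items, M defective, sample size K."""
    return sum(comb(M, x) * comb(N - M, K - x) for x in range(c + 1)) / comb(N, K)


def min_sample_size(c, alpha=0.10, N=100, M=6):
    """Smallest K for which the lot is accepted with probability at most alpha."""
    for K in range(1, N + 1):
        if accept_prob(K, c, N, M) <= alpha:
            return K
    return None


print(min_sample_size(0), round(accept_prob(31, 0), 5), round(accept_prob(32, 0), 5))
print(min_sample_size(1), round(accept_prob(50, 1), 5), round(accept_prob(51, 1), 5))
```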

3.3 In the seven seconds for the event, no car must pass in the last three seconds, an event with probability $(1-p)^3$. The only occurrence in the first four seconds for which the pedestrian does not wait the entire four seconds is to have a car pass in the first second and no other car pass. This has probability $p(1-p)^3$. Thus the probability of waiting exactly four seconds before starting to cross is $[1-p(1-p)^3](1-p)^3$.


3.5 Let $X=$ number of effective cases. If the new and old drugs are equally effective, then the probability that the new drug is effective on a case is .8. If the cases are independent, then $X\sim$ binomial$(100,.8)$, and
$$P(X\ge85)=\sum_{x=85}^{100}\binom{100}{x}.8^x.2^{100-x}=.1285.$$
So, even if the new drug is no better than the old, the chance of 85 or more effective cases is not too small. Hence, we cannot conclude the new drug is better. Note that using a normal approximation with continuity correction to calculate this binomial probability yields $P(X\ge85)\approx P(Z\ge1.125)=.1303$.
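Both numbers in 3.5 can be reproduced with the Python standard library. In this sketch the continuity correction evaluates the normal cdf at 84.5, which is what produces $z=(84.5-80)/4=1.125$.

```python
from math import comb
from statistics import NormalDist


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))


exact = binom_upper_tail(85, 100, 0.8)
mu, sd = 100 * 0.8, (100 * 0.8 * 0.2) ** 0.5
approx = 1 - NormalDist(mu, sd).cdf(84.5)  # continuity-corrected normal approximation
print(round(exact, 4), round(approx, 4))
```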

3.7 Let $X\sim$ Poisson$(\lambda)$. We want $P(X\ge2)\ge.99$, that is,
$$P(X\le1)=e^{-\lambda}+\lambda e^{-\lambda}\le.01.$$
Solving $e^{-\lambda}+\lambda e^{-\lambda}=.01$ by trial and error (numerical bisection method) yields $\lambda=6.6384$.
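The bisection mentioned above can be sketched as follows. The bracket $[1,20]$ and the tolerance are assumptions. Any bracket on which $e^{-\lambda}(1+\lambda)-.01$ changes sign will work, because that function is decreasing for $\lambda>0$.

```python
import math


def p_at_most_one(lam):
    """P(X <= 1) for X ~ Poisson(lam)."""
    return math.exp(-lam) * (1 + lam)


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2


lam = bisect(lambda l: p_at_most_one(l) - 0.01, 1.0, 20.0)
print(lam)
```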

3.8 a. We want $P(X>N)<.01$ where $X\sim$ binomial$(1000,1/2)$. Since the 1000 customers choose randomly, we take $p=1/2$. We thus require
$$P(X>N)=\sum_{x=N+1}^{1000}\binom{1000}{x}\left(\frac12\right)^x\left(1-\frac12\right)^{1000-x}<.01,$$
which implies that
$$\frac{1}{2^{1000}}\sum_{x=N+1}^{1000}\binom{1000}{x}<.01.$$
$N$ is the smallest integer that satisfies this last inequality. The solution is $N=537$.

b. To use the normal approximation we take $X\sim$ n$(500,250)$, where we used $\mu=1000(\frac12)=500$ and $\sigma^2=1000(\frac12)(\frac12)=250$. Then
$$P(X>N)=P\left(\frac{X-500}{\sqrt{250}}>\frac{N-500}{\sqrt{250}}\right)<.01,$$
thus,
$$P\left(Z>\frac{N-500}{\sqrt{250}}\right)<.01,$$
where $Z\sim$ n$(0,1)$. From the normal table we get
$$P(Z>2.33)\approx.0099<.01\ \Rightarrow\ \frac{N-500}{\sqrt{250}}=2.33\ \Rightarrow\ N\approx537.$$
Therefore, each theater should have at least 537 seats, and the answer based on the approximation equals the exact answer.
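Both parts of 3.8 can be reproduced in a few lines. The sketch below finds the exact $N$ using integer arithmetic, so the factor $2^{-1000}$ never underflows. It then compares that value with the normal approximation, using `statistics.NormalDist().inv_cdf(0.99)` in place of the table value 2.33. The function name `min_seats` is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, sqrt
from statistics import NormalDist


def min_seats(n=1000, alpha=Fraction(1, 100)):
    """Smallest N with P(X > N) < alpha for X ~ binomial(n, 1/2), in exact arithmetic."""
    total = 2 ** n
    tail = 0  # sum of comb(n, x) over x > N, i.e. 2**n * P(X > N)
    N = n     # P(X > n) = 0 < alpha, so N = n always qualifies
    while N > 0:
        next_tail = tail + comb(n, N)   # 2**n * P(X > N - 1)
        if next_tail >= alpha * total:  # N - 1 fails, so N is the smallest
            return N
        tail, N = next_tail, N - 1
    return N


exact_N = min_seats()
approx_N = ceil(500 + NormalDist().inv_cdf(0.99) * sqrt(250))
print(exact_N, approx_N)
```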


3.9 a. We can think of the 60 children entering kindergarten as 60 independent Bernoulli trials with probability of success (a twin birth) of approximately $\frac{1}{90}$. The probability of having 5 or more successes approximates the probability of having 5 or more sets of twins entering kindergarten. Then $X\sim$ binomial$(60,\frac{1}{90})$ and
$$P(X\ge5)=1-\sum_{x=0}^{4}\binom{60}{x}\left(\frac{1}{90}\right)^x\left(1-\frac{1}{90}\right)^{60-x}=.0006,$$
which is small and may be rare enough to be newsworthy.

b. Let $X$ be the number of elementary schools in New York state that have 5 or more sets of twins entering kindergarten. Then the probability of interest is $P(X\ge1)$ where $X\sim$ binomial$(310,.0006)$. Therefore $P(X\ge1)=1-P(X=0)=.1698$.

c. Let $X$ be the number of States that have 5 or more sets of twins entering kindergarten during any of the last ten years. Then the probability of interest is $P(X\ge1)$ where $X\sim$ binomial$(500,.1698)$. Therefore $P(X\ge1)=1-P(X=0)=1-3.90\times10^{-41}\approx1$.
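The chain of calculations in 3.9 can be sketched as below. Parts (b) and (c) plug in the rounded values .0006 and .1698, exactly as the text does. Carrying the unrounded value from part (a) forward would change the later answers slightly.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


p_school = 1 - binom_cdf(4, 60, 1 / 90)  # (a) P(X >= 5), about .0006
p_state = 1 - (1 - 0.0006) ** 310        # (b) uses the rounded .0006, as in the text
p_none = (1 - 0.1698) ** 500             # (c) P(X = 0), using the rounded .1698
print(p_school, p_state, p_none)
```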

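3.11 a.
$$\lim_{\substack{M/N\to p\\M\to\infty,N\to\infty}}\frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}=\frac{K!}{x!(K-x)!}\lim_{\substack{M/N\to p\\M\to\infty,N\to\infty}}\frac{M!(N-M)!(N-K)!}{N!(M-x)!(N-M-(K-x))!}.$$
In the limit, each of the factorial terms can be replaced by the approximation from Stirling's formula because, for example,
$$M!=\left(M!/(\sqrt{2\pi}M^{M+1/2}e^{-M})\right)\sqrt{2\pi}M^{M+1/2}e^{-M}$$
and $M!/(\sqrt{2\pi}M^{M+1/2}e^{-M})\to1$. When this replacement is made, all the $\sqrt{2\pi}$ and exponential terms cancel. Thus,
$$\lim_{\substack{M/N\to p\\M\to\infty,N\to\infty}}\frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}=\binom{K}{x}\lim_{\substack{M/N\to p\\M\to\infty,N\to\infty}}\frac{M^{M+1/2}(N-M)^{N-M+1/2}(N-K)^{N-K+1/2}}{N^{N+1/2}(M-x)^{M-x+1/2}(N-M-K+x)^{N-M-(K-x)+1/2}}.$$
We can evaluate the limit by breaking the ratio into seven terms, each of which has a finite limit we can evaluate. In some limits we use the fact that $M\to\infty$, $N\to\infty$ and $M/N\to p$ imply $N-M\to\infty$. The first term (of the seven terms) is
$$\lim_{M\to\infty}\left(\frac{M}{M-x}\right)^M=\lim_{M\to\infty}\frac{1}{\left(\frac{M-x}{M}\right)^M}=\lim_{M\to\infty}\frac{1}{\left(1+\frac{-x}{M}\right)^M}=\frac{1}{e^{-x}}=e^x.$$
Lemma 2.3.14 is used to get the penultimate equality. Similarly we get two more terms,
$$\lim_{N-M\to\infty}\left(\frac{N-M}{N-M-(K-x)}\right)^{N-M}=e^{K-x}$$
and
$$\lim_{N\to\infty}\left(\frac{N-K}{N}\right)^N=e^{-K}.$$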


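Note that the product of these three limits is one. Three other terms are
$$\lim_{M\to\infty}\left(\frac{M}{M-x}\right)^{1/2}=1,\qquad\lim_{N-M\to\infty}\left(\frac{N-M}{N-M-(K-x)}\right)^{1/2}=1,$$
and
$$\lim_{N\to\infty}\left(\frac{N-K}{N}\right)^{1/2}=1.$$
The only term left is
$$\lim_{\substack{M/N\to p\\M\to\infty,N\to\infty}}\frac{(M-x)^x(N-M-(K-x))^{K-x}}{(N-K)^K}=\lim_{\substack{M/N\to p\\M\to\infty,N\to\infty}}\left(\frac{M-x}{N-K}\right)^x\left(\frac{N-M-(K-x)}{N-K}\right)^{K-x}=p^x(1-p)^{K-x}.$$

b. If in (a) we in addition have $K\to\infty$, $p\to0$, $MK/N\to pK\to\lambda$, by the Poisson approximation to the binomial, we heuristically get
$$\frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}\to\binom{K}{x}p^x(1-p)^{K-x}\to\frac{e^{-\lambda}\lambda^x}{x!}.$$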

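c. Using Stirling’s formula as in (a), we get
$$\begin{aligned}\lim_{\substack{N,M,K\to\infty\\M/N\to0,\ KM/N\to\lambda}}\frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}&=\lim\frac{e^{-x}}{x!}\,\frac{K^xe^x\,M^xe^x\,(N-M)^{K-x}e^{K-x}}{N^Ke^K}\\&=\frac{1}{x!}\lim\left(\frac{KM}{N}\right)^x\left(\frac{N-M}{N}\right)^{K-x}\\&=\frac{1}{x!}\lambda^x\lim\left(1-\frac{MK/N}{K}\right)^K=\frac{e^{-\lambda}\lambda^x}{x!}.\end{aligned}$$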

3.12 Consider a sequence of Bernoulli trials with success probability $p$. Define $X=$ number of successes in the first $n$ trials and $Y=$ number of failures before the $r$th success. Then $X$ and $Y$ have the specified binomial and negative binomial distributions, respectively. And we have
$$\begin{aligned}F_X(r-1)&=P(X\le r-1)\\&=P(r\text{th success on }(n+1)\text{st or later trial})\\&=P(\text{at least }n+1-r\text{ failures before the }r\text{th success})\\&=P(Y\ge n-r+1)\\&=1-P(Y\le n-r)\\&=1-F_Y(n-r).\end{aligned}$$
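The identity $F_X(r-1)=1-F_Y(n-r)$ can be spot-checked numerically. This sketch assumes that $Y$ counts failures before the $r$th success, which is the parameterization used in the argument above. The two cdf helpers are written out directly rather than taken from a library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


def negbin_cdf(k, r, p):
    """P(Y <= k), where Y = number of failures before the r-th success."""
    return sum(comb(r + y - 1, y) * p ** r * (1 - p) ** y for y in range(k + 1))


n, r, p = 12, 4, 0.35
print(binom_cdf(r - 1, n, p), 1 - negbin_cdf(n - r, r, p))
```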


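3.13 For any $X$ with support $0,1,\ldots$, the mean and variance of the 0-truncated $X_T$ are given by
$$EX_T=\sum_{x=1}^{\infty}xP(X_T=x)=\sum_{x=1}^{\infty}x\frac{P(X=x)}{P(X>0)}=\frac{1}{P(X>0)}\sum_{x=1}^{\infty}xP(X=x)=\frac{1}{P(X>0)}\sum_{x=0}^{\infty}xP(X=x)=\frac{EX}{P(X>0)}.$$
In a similar way we get $EX_T^2=\frac{EX^2}{P(X>0)}$. Thus,
$$\mathrm{Var}X_T=\frac{EX^2}{P(X>0)}-\left(\frac{EX}{P(X>0)}\right)^2.$$

a. For Poisson$(\lambda)$, $P(X>0)=1-P(X=0)=1-\frac{e^{-\lambda}\lambda^0}{0!}=1-e^{-\lambda}$, therefore
$$P(X_T=x)=\frac{e^{-\lambda}\lambda^x}{x!(1-e^{-\lambda})},\quad x=1,2,\ldots,$$
$$EX_T=\lambda/(1-e^{-\lambda}),\qquad\mathrm{Var}X_T=(\lambda^2+\lambda)/(1-e^{-\lambda})-\big(\lambda/(1-e^{-\lambda})\big)^2.$$

b. For negative binomial$(r,p)$, $P(X>0)=1-P(X=0)=1-\binom{r-1}{0}p^r(1-p)^0=1-p^r$. Then
$$P(X_T=x)=\frac{\binom{r+x-1}{x}p^r(1-p)^x}{1-p^r},\quad x=1,2,\ldots,$$
$$EX_T=\frac{r(1-p)}{p(1-p^r)},\qquad\mathrm{Var}X_T=\frac{r(1-p)+r^2(1-p)^2}{p^2(1-p^r)}-\left(\frac{r(1-p)}{p(1-p^r)}\right)^2.$$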

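3.14 a. $\sum_{x=1}^{\infty}\frac{-(1-p)^x}{x\log p}=\frac{1}{\log p}\sum_{x=1}^{\infty}\frac{-(1-p)^x}{x}=1$, since the sum is the Taylor series for $\log p=\log(1-(1-p))$.
$$EX=\frac{-1}{\log p}\left[\sum_{x=1}^{\infty}(1-p)^x\right]=\frac{-1}{\log p}\left[\sum_{x=0}^{\infty}(1-p)^x-1\right]=\frac{-1}{\log p}\left[\frac1p-1\right]=\frac{-1}{\log p}\left(\frac{1-p}{p}\right).$$
Since the geometric series converges uniformly,
$$EX^2=\frac{-1}{\log p}\sum_{x=1}^{\infty}x(1-p)^x=\frac{1-p}{\log p}\sum_{x=1}^{\infty}\frac{d}{dp}(1-p)^x=\frac{1-p}{\log p}\frac{d}{dp}\sum_{x=1}^{\infty}(1-p)^x=\frac{1-p}{\log p}\frac{d}{dp}\frac{1-p}{p}=\frac{-(1-p)}{p^2\log p}.$$
Thus
$$\mathrm{Var}X=\frac{-(1-p)}{p^2\log p}\left(1+\frac{1-p}{\log p}\right).$$
Alternatively, the mgf can be calculated,
$$M_X(t)=\frac{-1}{\log p}\sum_{x=1}^{\infty}\frac{\big[(1-p)e^t\big]^x}{x}=\frac{\log(1+pe^t-e^t)}{\log p},$$
and can be differentiated to obtain the moments.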
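Here is a numerical check of the logarithmic-series results. The sketch sums the pmf directly and compares the mean and variance with the closed forms above. The truncation point `xmax` is an assumption; it is adequate for the values of $p$ used here because $(1-p)^x$ decays geometrically.

```python
import math


def log_series_moments(p, xmax=5000):
    """Total mass, mean and variance of P(X=x) = -(1-p)^x / (x log p), truncated at xmax."""
    xs = range(1, xmax + 1)
    pmf = [-((1 - p) ** x) / (x * math.log(p)) for x in xs]
    total = sum(pmf)
    mean = sum(x * f for x, f in zip(xs, pmf))
    second = sum(x * x * f for x, f in zip(xs, pmf))
    return total, mean, second - mean ** 2


p = 0.2
total, mean, var = log_series_moments(p)
print(total, mean, -(1 - p) / (p * math.log(p)))
```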


3.15 The moment generating function for the negative binomial is
$$M(t)=\left(\frac{p}{1-(1-p)e^t}\right)^r=\left(1+\frac1r\,\frac{r(1-p)(e^t-1)}{1-(1-p)e^t}\right)^r.$$
The term
$$\frac{r(1-p)(e^t-1)}{1-(1-p)e^t}\to\frac{\lambda(e^t-1)}{1}=\lambda(e^t-1)\quad\text{as }r\to\infty,\ p\to1\text{ and }r(1-p)\to\lambda.$$
Thus by Lemma 2.3.14, the negative binomial moment generating function converges to $e^{\lambda(e^t-1)}$, the Poisson moment generating function.

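3.16 a. Using integration by parts with $u=t^\alpha$ and $dv=e^{-t}\,dt$, we obtain
$$\Gamma(\alpha+1)=\int_0^\infty t^{(\alpha+1)-1}e^{-t}\,dt=t^\alpha(-e^{-t})\Big|_0^\infty-\int_0^\infty\alpha t^{\alpha-1}(-e^{-t})\,dt=0+\alpha\Gamma(\alpha)=\alpha\Gamma(\alpha).$$

b. Making the change of variable $z=\sqrt{2t}$, i.e., $t=z^2/2$, we obtain
$$\Gamma(1/2)=\int_0^\infty t^{-1/2}e^{-t}\,dt=\int_0^\infty\frac{\sqrt2}{z}e^{-z^2/2}z\,dz=\sqrt2\int_0^\infty e^{-z^2/2}\,dz=\sqrt2\,\frac{\sqrt\pi}{\sqrt2}=\sqrt\pi,$$
where the penultimate equality uses (3.3.14).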

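3.17
$$EX^\nu=\int_0^\infty x^\nu\frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}\,dx=\frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty x^{(\nu+\alpha)-1}e^{-x/\beta}\,dx=\frac{\Gamma(\nu+\alpha)\beta^{\nu+\alpha}}{\Gamma(\alpha)\beta^\alpha}=\frac{\beta^\nu\Gamma(\nu+\alpha)}{\Gamma(\alpha)}.$$
Note that this formula is valid for all $\nu>-\alpha$. The expectation does not exist for $\nu\le-\alpha$.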

3.18 If $Y\sim$ negative binomial$(r,p)$, its moment generating function is $M_Y(t)=\left(\frac{p}{1-(1-p)e^t}\right)^r$, and, from Theorem 2.3.15, $M_{pY}(t)=\left(\frac{p}{1-(1-p)e^{pt}}\right)^r$. Now use L'Hôpital's rule to calculate
$$\lim_{p\to0}\frac{p}{1-(1-p)e^{pt}}=\lim_{p\to0}\frac{1}{(p-1)te^{pt}+e^{pt}}=\frac{1}{1-t},$$
so the moment generating function converges to $(1-t)^{-r}$, the moment generating function of a gamma$(r,1)$.

3.19 Repeatedly apply the integration-by-parts formula
$$\frac{1}{\Gamma(n)}\int_x^\infty z^{n-1}e^{-z}\,dz=\frac{x^{n-1}e^{-x}}{(n-1)!}+\frac{1}{\Gamma(n-1)}\int_x^\infty z^{n-2}e^{-z}\,dz,$$
until the exponent on the second integral is zero. This will establish the formula. If $X\sim$ gamma$(\alpha,1)$ and $Y\sim$ Poisson$(x)$, the probabilistic relationship is $P(X\ge x)=P(Y\le\alpha-1)$.
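The gamma-Poisson relationship $P(X\ge x)=P(Y\le\alpha-1)$ for integer $\alpha$ can be checked numerically. This sketch computes the gamma tail by Simpson's rule, which stands in for the repeated integration by parts above; the grid size `n` is an assumption. The Poisson cdf is computed by direct summation.

```python
import math


def gamma_tail_numeric(alpha, x, n=20_000):
    """P(X >= x) for X ~ gamma(alpha, 1): one minus a Simpson integral of the pdf on [0, x]."""
    pdf = lambda z: z ** (alpha - 1) * math.exp(-z) / math.gamma(alpha)
    h = x / n
    s = pdf(0.0) + pdf(x)
    s += 4 * sum(pdf(i * h) for i in range(1, n, 2))
    s += 2 * sum(pdf(i * h) for i in range(2, n, 2))
    return 1 - s * h / 3


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu ** y / math.factorial(y) for y in range(k + 1))


print(gamma_tail_numeric(3, 2.0), poisson_cdf(2, 2.0))
```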

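3.21 The moment generating function would be defined by $\frac1\pi\int_{-\infty}^{\infty}\frac{e^{tx}}{1+x^2}\,dx$. For $t>0$, $e^{tx}>tx$ on $(0,\infty)$, hence
$$\int_0^\infty\frac{e^{tx}}{1+x^2}\,dx>t\int_0^\infty\frac{x}{1+x^2}\,dx=\infty.$$
For $t<0$ the same argument applies on $(-\infty,0)$. Thus the moment generating function does not exist for any $t\ne0$.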


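3.22 a.
$$\begin{aligned}E(X(X-1))&=\sum_{x=0}^{\infty}x(x-1)\frac{e^{-\lambda}\lambda^x}{x!}=e^{-\lambda}\lambda^2\sum_{x=2}^{\infty}\frac{\lambda^{x-2}}{(x-2)!} &&(\text{let }y=x-2)\\&=e^{-\lambda}\lambda^2\sum_{y=0}^{\infty}\frac{\lambda^y}{y!}=e^{-\lambda}\lambda^2e^\lambda=\lambda^2,\end{aligned}$$
$$EX^2=\lambda^2+EX=\lambda^2+\lambda,\qquad\mathrm{Var}X=EX^2-(EX)^2=\lambda^2+\lambda-\lambda^2=\lambda.$$

b.
$$\begin{aligned}E(X(X-1))&=\sum_{x=0}^{\infty}x(x-1)\binom{r+x-1}{x}p^r(1-p)^x=\sum_{x=2}^{\infty}r(r+1)\binom{r+x-1}{x-2}p^r(1-p)^x\\&=r(r+1)\frac{(1-p)^2}{p^2}\sum_{y=0}^{\infty}\binom{r+2+y-1}{y}p^{r+2}(1-p)^y=r(r+1)\frac{(1-p)^2}{p^2}.\end{aligned}$$
In the second equality we used $x(x-1)\binom{r+x-1}{x}=r(r+1)\binom{r+x-1}{x-2}$, in the third equality we substituted $y=x-2$, and in the last equality we used the fact that we are summing over a negative binomial$(r+2,p)$ pmf. Thus,
$$\mathrm{Var}X=EX(X-1)+EX-(EX)^2=r(r+1)\frac{(1-p)^2}{p^2}+\frac{r(1-p)}{p}-\frac{r^2(1-p)^2}{p^2}=\frac{r(1-p)}{p^2}.$$

c.
$$EX^2=\int_0^\infty x^2\frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}\,dx=\frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty x^{\alpha+1}e^{-x/\beta}\,dx=\frac{1}{\Gamma(\alpha)\beta^\alpha}\Gamma(\alpha+2)\beta^{\alpha+2}=\alpha(\alpha+1)\beta^2.$$
$$\mathrm{Var}X=EX^2-(EX)^2=\alpha(\alpha+1)\beta^2-\alpha^2\beta^2=\alpha\beta^2.$$

d. (Use (3.3.18).)
$$EX=\frac{\Gamma(\alpha+1)\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+1)\Gamma(\alpha)}=\frac{\alpha\Gamma(\alpha)\Gamma(\alpha+\beta)}{(\alpha+\beta)\Gamma(\alpha+\beta)\Gamma(\alpha)}=\frac{\alpha}{\alpha+\beta}.$$
$$EX^2=\frac{\Gamma(\alpha+2)\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+2)\Gamma(\alpha)}=\frac{(\alpha+1)\alpha\Gamma(\alpha)\Gamma(\alpha+\beta)}{(\alpha+\beta+1)(\alpha+\beta)\Gamma(\alpha+\beta)\Gamma(\alpha)}=\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}.$$
$$\mathrm{Var}X=EX^2-(EX)^2=\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}-\frac{\alpha^2}{(\alpha+\beta)^2}=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$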


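e. The double exponential$(\mu,\sigma)$ pdf is symmetric about $\mu$. Thus, by Exercise 2.26, $EX=\mu$. With the substitution $z=(x-\mu)/\sigma$,
$$\mathrm{Var}X=\int_{-\infty}^{\infty}(x-\mu)^2\frac{1}{2\sigma}e^{-|x-\mu|/\sigma}\,dx=\int_{-\infty}^{\infty}\sigma^2z^2\frac12e^{-|z|}\,dz=\sigma^2\int_0^\infty z^2e^{-z}\,dz=\sigma^2\Gamma(3)=2\sigma^2.$$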

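3.23 a.
$$\int_\alpha^\infty x^{-\beta-1}\,dx=\frac{-1}{\beta}x^{-\beta}\Big|_\alpha^\infty=\frac{1}{\beta\alpha^\beta},$$
thus $f(x)$ integrates to 1.

b. $EX^n=\frac{\beta\alpha^n}{\beta-n}$ for $n<\beta$, therefore
$$EX=\frac{\alpha\beta}{\beta-1},\qquad EX^2=\frac{\alpha^2\beta}{\beta-2},\qquad\mathrm{Var}X=\frac{\alpha^2\beta}{\beta-2}-\frac{(\alpha\beta)^2}{(\beta-1)^2}.$$

c. If $\beta\le2$, the integral of the second moment is infinite.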

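3.24 a. $f_X(x)=\frac1\beta e^{-x/\beta}$, $x>0$. For $Y=X^{1/\gamma}$, $f_Y(y)=\frac\gamma\beta e^{-y^\gamma/\beta}y^{\gamma-1}$, $y>0$. Using the transformation $z=y^\gamma/\beta$, we calculate
$$EY^n=\frac\gamma\beta\int_0^\infty y^{\gamma+n-1}e^{-y^\gamma/\beta}\,dy=\beta^{n/\gamma}\int_0^\infty z^{n/\gamma}e^{-z}\,dz=\beta^{n/\gamma}\Gamma\left(\frac n\gamma+1\right).$$
Thus $EY=\beta^{1/\gamma}\Gamma\left(\frac1\gamma+1\right)$ and $\mathrm{Var}Y=\beta^{2/\gamma}\left[\Gamma\left(\frac2\gamma+1\right)-\Gamma^2\left(\frac1\gamma+1\right)\right]$.

b. $f_X(x)=\frac1\beta e^{-x/\beta}$, $x>0$. For $Y=(2X/\beta)^{1/2}$, $f_Y(y)=ye^{-y^2/2}$, $y>0$. We now notice that
$$EY=\int_0^\infty y^2e^{-y^2/2}\,dy=\frac{\sqrt{2\pi}}{2}$$
since $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}y^2e^{-y^2/2}\,dy=1$, the variance of a standard normal, and the integrand is symmetric. Use integration by parts to calculate the second moment
$$EY^2=\int_0^\infty y^3e^{-y^2/2}\,dy=2\int_0^\infty ye^{-y^2/2}\,dy=2,$$
where we take $u=y^2$, $dv=ye^{-y^2/2}\,dy$. Thus $\mathrm{Var}Y=2(1-\pi/4)$.

c. The gamma$(a,b)$ density is
$$f_X(x)=\frac{1}{\Gamma(a)b^a}x^{a-1}e^{-x/b}.$$
Make the transformation $y=1/x$ with $dx=-dy/y^2$ to get
$$f_Y(y)=f_X(1/y)|1/y^2|=\frac{1}{\Gamma(a)b^a}\left(\frac1y\right)^{a+1}e^{-1/(by)}.$$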


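The first two moments are
$$EY=\frac{1}{\Gamma(a)b^a}\int_0^\infty\left(\frac1y\right)^ae^{-1/(by)}\,dy=\frac{\Gamma(a-1)b^{a-1}}{\Gamma(a)b^a}=\frac{1}{(a-1)b},$$
$$EY^2=\frac{\Gamma(a-2)b^{a-2}}{\Gamma(a)b^a}=\frac{1}{(a-1)(a-2)b^2},$$
and so $\mathrm{Var}Y=\frac{1}{(a-1)^2(a-2)b^2}$.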

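d. $f_X(x)=\frac{1}{\Gamma(3/2)\beta^{3/2}}x^{3/2-1}e^{-x/\beta}$, $x>0$. For $Y=(X/\beta)^{1/2}$, $f_Y(y)=\frac{2}{\Gamma(3/2)}y^2e^{-y^2}$, $y>0$. To calculate the moments we use integration by parts with $u=y^2$, $dv=ye^{-y^2}\,dy$ to obtain
$$EY=\frac{2}{\Gamma(3/2)}\int_0^\infty y^3e^{-y^2}\,dy=\frac{2}{\Gamma(3/2)}\int_0^\infty ye^{-y^2}\,dy=\frac{1}{\Gamma(3/2)},$$
and with $u=y^3$, $dv=ye^{-y^2}\,dy$ to obtain
$$EY^2=\frac{2}{\Gamma(3/2)}\int_0^\infty y^4e^{-y^2}\,dy=\frac{3}{\Gamma(3/2)}\int_0^\infty y^2e^{-y^2}\,dy=\frac{3}{\Gamma(3/2)}\,\frac{\sqrt\pi}{4}=\frac32.$$
Here we used the fact that $\frac{1}{\sqrt\pi}\int_{-\infty}^{\infty}y^2e^{-y^2}\,dy=\frac12$, since it is the variance of a n$(0,1/2)$, so symmetry yields $\int_0^\infty y^2e^{-y^2}\,dy=\sqrt\pi/4$. Thus, using $\Gamma(3/2)=\frac12\sqrt\pi$, $EY=2/\sqrt\pi$ and $\mathrm{Var}Y=\frac32-\frac4\pi$. (This agrees with Exercise 3.17, since $EY^2=E(X/\beta)=\frac32$.)

e. $f_X(x)=e^{-x}$, $x>0$. For $Y=\alpha-\gamma\log X$, $f_Y(y)=e^{-e^{(\alpha-y)/\gamma}}e^{(\alpha-y)/\gamma}\frac1\gamma$, $-\infty<y<\infty$. $EY$ and $EY^2$ cannot be calculated in closed form. If we define
$$I_1=\int_0^\infty\log x\,e^{-x}\,dx,\qquad I_2=\int_0^\infty(\log x)^2e^{-x}\,dx,$$
then $EY=E(\alpha-\gamma\log X)=\alpha-\gamma I_1$, and $EY^2=E(\alpha-\gamma\log X)^2=\alpha^2-2\alpha\gamma I_1+\gamma^2I_2$. The constant $I_1=-.5772157$ is the negative of Euler’s constant.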
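The moments in part (d) follow from Exercise 3.17, since $Y^k=(X/\beta)^{k/2}$. The sketch below evaluates them with `math.gamma` and adds a Monte Carlo cross-check with `random.gammavariate`, whose second argument is the scale $\beta$ (matching the parameterization used here). The seed, the sample size and $\beta=2$ are arbitrary choices.

```python
import math
import random


def gamma_moment(nu, alpha, beta):
    """E X^nu = beta^nu Gamma(nu + alpha) / Gamma(alpha) for X ~ gamma(alpha, beta), nu > -alpha."""
    return beta ** nu * math.gamma(nu + alpha) / math.gamma(alpha)


# Y = (X / beta)^(1/2) with X ~ gamma(3/2, beta), so E Y^k = E X^(k/2) / beta^(k/2).
alpha, beta = 1.5, 2.0
EY = gamma_moment(0.5, alpha, beta) / beta ** 0.5
EY2 = gamma_moment(1.0, alpha, beta) / beta
print(EY, 2 / math.sqrt(math.pi), EY2, EY2 - EY ** 2, 1.5 - 4 / math.pi)

# Monte Carlo cross-check with a fixed seed.
rng = random.Random(1)
ys = [math.sqrt(rng.gammavariate(alpha, beta) / beta) for _ in range(200_000)]
m = sum(ys) / len(ys)
v = sum((y - m) ** 2 for y in ys) / len(ys)
print(m, v)
```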

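3.25 Note that if $T$ is continuous then,
$$P(t\le T\le t+\delta\mid t\le T)=\frac{P(t\le T\le t+\delta,\ t\le T)}{P(t\le T)}=\frac{P(t\le T\le t+\delta)}{P(t\le T)}=\frac{F_T(t+\delta)-F_T(t)}{1-F_T(t)}.$$
Therefore, from the definition of derivative,
$$h_T(t)=\frac{1}{1-F_T(t)}\lim_{\delta\to0}\frac{F_T(t+\delta)-F_T(t)}{\delta}=\frac{F_T'(t)}{1-F_T(t)}=\frac{f_T(t)}{1-F_T(t)}.$$
Also,
$$-\frac{d}{dt}\big(\log[1-F_T(t)]\big)=-\frac{1}{1-F_T(t)}\big(-f_T(t)\big)=h_T(t).$$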

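3.26 a. $f_T(t)=\frac1\beta e^{-t/\beta}$ and $F_T(t)=\int_0^t\frac1\beta e^{-x/\beta}\,dx=-e^{-x/\beta}\Big|_0^t=1-e^{-t/\beta}$. Thus,
$$h_T(t)=\frac{f_T(t)}{1-F_T(t)}=\frac{(1/\beta)e^{-t/\beta}}{1-(1-e^{-t/\beta})}=\frac1\beta.$$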


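b. $f_T(t)=\frac\gamma\beta t^{\gamma-1}e^{-t^\gamma/\beta}$, $t\ge0$, and $F_T(t)=\int_0^t\frac\gamma\beta x^{\gamma-1}e^{-x^\gamma/\beta}\,dx=\int_0^{t^\gamma/\beta}e^{-u}\,du=-e^{-u}\Big|_0^{t^\gamma/\beta}=1-e^{-t^\gamma/\beta}$, where $u=x^\gamma/\beta$. Thus,
$$h_T(t)=\frac{(\gamma/\beta)t^{\gamma-1}e^{-t^\gamma/\beta}}{e^{-t^\gamma/\beta}}=\frac\gamma\beta t^{\gamma-1}.$$

c. $F_T(t)=\frac{1}{1+e^{-(t-\mu)/\beta}}$ and $f_T(t)=\frac{e^{-(t-\mu)/\beta}}{\beta\big(1+e^{-(t-\mu)/\beta}\big)^2}$. Thus,
$$h_T(t)=\frac1\beta\,\frac{e^{-(t-\mu)/\beta}}{\big(1+e^{-(t-\mu)/\beta}\big)^2}\,\frac{1+e^{-(t-\mu)/\beta}}{e^{-(t-\mu)/\beta}}=\frac1\beta F_T(t).$$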

3.27 a. The uniform pdf satisfies the inequalities of Exercise 2.27, hence is unimodal.

b. For the gamma$(\alpha,\beta)$ pdf $f(x)$, ignoring constants, $\frac{d}{dx}f(x)=\frac{x^{\alpha-2}e^{-x/\beta}}{\beta}[\beta(\alpha-1)-x]$, which only has one sign change. Hence the pdf is unimodal with mode $\beta(\alpha-1)$.

c. For the n$(\mu,\sigma^2)$ pdf $f(x)$, ignoring constants, $\frac{d}{dx}f(x)=-\frac{x-\mu}{\sigma^2}e^{-(x-\mu)^2/2\sigma^2}$, which only has one sign change. Hence the pdf is unimodal with mode $\mu$.

d. For the beta$(\alpha,\beta)$ pdf $f(x)$, ignoring constants,
$$\frac{d}{dx}f(x)=x^{\alpha-2}(1-x)^{\beta-2}\big[(\alpha-1)-x(\alpha+\beta-2)\big],$$
which only has one sign change. Hence the pdf is unimodal with mode $\frac{\alpha-1}{\alpha+\beta-2}$.

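3.28 a. (i) $\mu$ known,
$$f(x\mid\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right),$$
$h(x)=1$, $c(\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}I_{(0,\infty)}(\sigma^2)$, $w_1(\sigma^2)=-\frac{1}{2\sigma^2}$, $t_1(x)=(x-\mu)^2$.
(ii) $\sigma^2$ known,
$$f(x\mid\mu)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)\exp\left(-\frac{\mu^2}{2\sigma^2}\right)\exp\left(\mu\frac{x}{\sigma^2}\right),$$
$h(x)=\exp\left(-\frac{x^2}{2\sigma^2}\right)$, $c(\mu)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\mu^2}{2\sigma^2}\right)$, $w_1(\mu)=\mu$, $t_1(x)=\frac{x}{\sigma^2}$.

b. (i) $\alpha$ known,
$$f(x\mid\beta)=\frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta},$$
$h(x)=\frac{x^{\alpha-1}}{\Gamma(\alpha)}$, $x>0$, $c(\beta)=\frac{1}{\beta^\alpha}$, $w_1(\beta)=\frac1\beta$, $t_1(x)=-x$.
(ii) $\beta$ known,
$$f(x\mid\alpha)=e^{-x/\beta}\frac{1}{\Gamma(\alpha)\beta^\alpha}\exp\big((\alpha-1)\log x\big),$$
$h(x)=e^{-x/\beta}$, $x>0$, $c(\alpha)=\frac{1}{\Gamma(\alpha)\beta^\alpha}$, $w_1(\alpha)=\alpha-1$, $t_1(x)=\log x$.
(iii) $\alpha,\beta$ unknown,
$$f(x\mid\alpha,\beta)=\frac{1}{\Gamma(\alpha)\beta^\alpha}\exp\left((\alpha-1)\log x-\frac x\beta\right),$$
$h(x)=I_{\{x>0\}}(x)$, $c(\alpha,\beta)=\frac{1}{\Gamma(\alpha)\beta^\alpha}$, $w_1(\alpha)=\alpha-1$, $t_1(x)=\log x$, $w_2(\alpha,\beta)=-1/\beta$, $t_2(x)=x$.

c. (i) $\alpha$ known, $h(x)=x^{\alpha-1}I_{[0,1]}(x)$, $c(\beta)=\frac{1}{B(\alpha,\beta)}$, $w_1(\beta)=\beta-1$, $t_1(x)=\log(1-x)$.
(ii) $\beta$ known, $h(x)=(1-x)^{\beta-1}I_{[0,1]}(x)$, $c(\alpha)=\frac{1}{B(\alpha,\beta)}$, $w_1(\alpha)=\alpha-1$, $t_1(x)=\log x$.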


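(iii) $\alpha,\beta$ unknown,
$h(x)=I_{[0,1]}(x)$, $c(\alpha,\beta)=\frac{1}{B(\alpha,\beta)}$, $w_1(\alpha)=\alpha-1$, $t_1(x)=\log x$, $w_2(\beta)=\beta-1$, $t_2(x)=\log(1-x)$.

d. $h(x)=\frac{1}{x!}I_{\{0,1,2,\ldots\}}(x)$, $c(\theta)=e^{-\theta}$, $w_1(\theta)=\log\theta$, $t_1(x)=x$.

e. $h(x)=\binom{x-1}{r-1}I_{\{r,r+1,\ldots\}}(x)$, $c(p)=\left(\frac{p}{1-p}\right)^r$, $w_1(p)=\log(1-p)$, $t_1(x)=x$.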

3.29 a. For the n$(\mu,\sigma^2)$,
$$f(x)=\left(\frac{1}{\sqrt{2\pi}}\,\frac{e^{-\mu^2/2\sigma^2}}{\sigma}\right)e^{-x^2/2\sigma^2+x\mu/\sigma^2},$$
so the natural parameter is $(\eta_1,\eta_2)=(-1/2\sigma^2,\ \mu/\sigma^2)$ with natural parameter space $\{(\eta_1,\eta_2):\eta_1<0,\ -\infty<\eta_2<\infty\}$.

b. For the gamma$(\alpha,\beta)$,
$$f(x)=\frac{1}{\Gamma(\alpha)\beta^\alpha}e^{(\alpha-1)\log x-x/\beta},$$
so the natural parameter is $(\eta_1,\eta_2)=(\alpha-1,\ -1/\beta)$ with natural parameter space $\{(\eta_1,\eta_2):\eta_1>-1,\ \eta_2<0\}$.

c. For the beta$(\alpha,\beta)$,
$$f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}e^{(\alpha-1)\log x+(\beta-1)\log(1-x)},$$
so the natural parameter is $(\eta_1,\eta_2)=(\alpha-1,\ \beta-1)$ and the natural parameter space is $\{(\eta_1,\eta_2):\eta_1>-1,\ \eta_2>-1\}$.

d. For the Poisson,
$$f(x)=\frac{1}{x!}e^{-\theta}e^{x\log\theta},$$
so the natural parameter is $\eta=\log\theta$ and the natural parameter space is $\{\eta:-\infty<\eta<\infty\}$.

e. For the negative binomial$(r,p)$, $r$ known,
$$P(X=x)=\binom{r+x-1}{x}(p^r)e^{x\log(1-p)},$$
so the natural parameter is $\eta=\log(1-p)$ with natural parameter space $\{\eta:\eta<0\}$.

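3.31 a.
$$\begin{aligned}0&=\frac{\partial}{\partial\theta_j}\int h(x)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)dx\\&=\int h(x)\frac{\partial c(\theta)}{\partial\theta_j}\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)dx+\int h(x)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)\left(\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(x)\right)dx\\&=\int h(x)\left[\frac{\partial}{\partial\theta_j}\log c(\theta)\right]c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)dx+E\left[\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right]\\&=\frac{\partial}{\partial\theta_j}\log c(\theta)+E\left[\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right].\end{aligned}$$
Therefore $E\left[\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right]=-\frac{\partial}{\partial\theta_j}\log c(\theta)$.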


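b.
$$\begin{aligned}0&=\frac{\partial^2}{\partial\theta_j^2}\int h(x)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)dx\\&=\int h(x)\frac{\partial^2c(\theta)}{\partial\theta_j^2}\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)dx+2\int h(x)\frac{\partial c(\theta)}{\partial\theta_j}\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)\left(\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(x)\right)dx\\&\quad+\int h(x)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)\left(\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(x)\right)^2dx+\int h(x)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)\left(\sum_{i=1}^k\frac{\partial^2w_i(\theta)}{\partial\theta_j^2}t_i(x)\right)dx\\&=\int h(x)\left[\frac{\partial^2}{\partial\theta_j^2}\log c(\theta)\right]c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)dx+\int h(x)\left[\frac{\partial c(\theta)/\partial\theta_j}{c(\theta)}\right]^2c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)dx\\&\quad+2\frac{\partial}{\partial\theta_j}\log c(\theta)\,E\left[\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right]+E\left[\left(\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right)^2\right]+E\left[\sum_{i=1}^k\frac{\partial^2w_i(\theta)}{\partial\theta_j^2}t_i(X)\right]\\&=\frac{\partial^2}{\partial\theta_j^2}\log c(\theta)+\left(\frac{\partial}{\partial\theta_j}\log c(\theta)\right)^2-2E\left[\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right]E\left[\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right]\\&\quad+E\left[\left(\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right)^2\right]+E\left[\sum_{i=1}^k\frac{\partial^2w_i(\theta)}{\partial\theta_j^2}t_i(X)\right]\\&=\frac{\partial^2}{\partial\theta_j^2}\log c(\theta)+\mathrm{Var}\left(\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right)+E\left[\sum_{i=1}^k\frac{\partial^2w_i(\theta)}{\partial\theta_j^2}t_i(X)\right].\end{aligned}$$
Here we used $\frac{\partial^2c/\partial\theta_j^2}{c}=\frac{\partial^2}{\partial\theta_j^2}\log c+\left(\frac{\partial c/\partial\theta_j}{c}\right)^2$ and, from part (a), $\frac{\partial}{\partial\theta_j}\log c(\theta)=-E\left[\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right]$. Therefore
$$\mathrm{Var}\left(\sum_{i=1}^k\frac{\partial w_i(\theta)}{\partial\theta_j}t_i(X)\right)=-\frac{\partial^2}{\partial\theta_j^2}\log c(\theta)-E\left[\sum_{i=1}^k\frac{\partial^2w_i(\theta)}{\partial\theta_j^2}t_i(X)\right].$$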

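3.33 a. (i) $h(x)=e^xI_{\{-\infty<x<\infty\}}(x)$, $c(\theta)=\frac{1}{\sqrt{2\pi\theta}}\exp\left(-\frac\theta2\right)$, $\theta>0$, $w_1(\theta)=\frac{1}{2\theta}$, $t_1(x)=-x^2$.
(ii) The nonnegative real line.

b. (i) $h(x)=I_{\{-\infty<x<\infty\}}(x)$, $c(\theta)=\frac{1}{\sqrt{2\pi a\theta^2}}\exp\left(-\frac{1}{2a}\right)$, $-\infty<\theta<\infty$, $a>0$, $w_1(\theta)=\frac{1}{2a\theta^2}$, $w_2(\theta)=\frac{1}{a\theta}$, $t_1(x)=-x^2$, $t_2(x)=x$.
(ii) A parabola.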

Second Edition 3-13

c. (i) h(x) = 1

xI{0<x<∞}(x), c(α) = αα

Γ(α)α > 0, w1(α) = α, w2(α) = α,

t1(x) = log(x), t2(x) = −x.

(ii) A line.

d. (i) h(x) = Cexp(x4)I{−∞<x<∞}(x), c(θ) = exp(θ4)− ∞ < θ < ∞, w1(θ) = θ,

w2(θ) = θ2, w3(θ) = θ3, t1(x) = −4x3, t2(x) = 6x2, t3(x) = −4x.

(ii) The curve is a spiral in 3-space.

(iii) A good picture can be generated with the Mathematica statement

ParametricPlot3D[{t, t^2, t^3}, {t, 0, 1}, ViewPoint -> {1, -2, 2.5}].

3.35 a. In Exercise 3.34(a) w1(λ) = 1

2λand for a n(eθ, eθ), w1(θ) = 1

2eθ.

b. EX=µ=αβ, then β=µ

α. Therefore h(x) = 1

xI{0<x<∞}(x),

c(α) = αα

Γ(α)( µ

α)α, α > 0, w1(α) = α, w2(α) = α

µ, t1(x) = log(x), t2(x) = −x.

c. From (b) then (α1, . . . , αn, β1, . . . , βn) = (α1, . . . , αn,α1

µ, . . . , αn

µ)

3.37 The pdf ( 1

σ)f((x−µ)

σ) is symmetric about µbecause, for any  > 0,

σf(µ+)−µ

σ=1

σf

σ=1

σf−

σ=1

σf(µ−)−µ

σ.

Thus, by Exercise 2.26b, µis the median.

3.38 P(X > xα) = P(σZ +µ > σzα+µ) = P(Z > zα) by Theorem 3.5.6.

3.39 First take µ= 0 and σ= 1.

a. The pdf is symmetric about 0, so 0 must be the median. Verifying this, write

P(Z≥0) = Z∞

1+z2dz =1

πtan−1(z)

∞

ππ

2−0=1

b. P(Z≥1) = 1

πtan−1(z)∞

1=1

ππ

2−π

4=1

4.By symmetry this is also equal to P(Z≤ −1).

Writing z= (x−µ)/σ establishes P(X≥µ) = 1

2and P(X≥µ+σ) = 1

3.40 Let X∼f(x) have mean µand variance σ2. Let Z=X−µ

σ. Then

EZ=1

σE(X−µ) = 0

and

VarZ= Var X−µ

σ=1

σ2Var(X−µ) = 1

σ2VarX=σ2

σ2= 1.

Then compute the pdf of Z,fZ(z) = fx(σz +µ)·σ=σfx(σz +µ) and use fZ(z) as the standard

pdf.

3.41 a. This is a special case of Exercise 3.42a.

b. This is a special case of Exercise 3.42b.

3.42 a. Let θ1> θ2. Let X1∼f(x−θ1) and X2∼f(x−θ2). Let F(z) be the cdf corresponding to

f(z) and let Z∼f(z).Then

F(x|θ1) = P(X1≤x) = P(Z+θ1≤x) = P(Z≤x−θ1) = F(x−θ1)

≤F(x−θ2) = P(Z≤x−θ2) = P(Z+θ2≤x) = P(X2≤x)

=F(x|θ2).

3-14 Solutions Manual for Statistical Inference

The inequality is because x−θ2> x −θ1, and Fis nondecreasing. To get strict inequality

for some x, let (a, b] be an interval of length θ1−θ2with P(a < Z ≤b) = F(b)−F(a)>0.

Let x=a+θ1. Then

F(x|θ1) = F(x−θ1) = F(a+θ1−θ1) = F(a)

< F (b) = F(a+θ1−θ2) = F(x−θ2) = F(x|θ2).

b. Let σ1> σ2. Let X1∼f(x/σ1) and X2∼f(x/σ2). Let F(z) be the cdf corresponding to

f(z) and let Z∼f(z). Then, for x > 0,

F(x|σ1) = P(X1≤x) = P(σ1Z≤x) = P(Z≤x/σ1) = F(x/σ1)

≤F(x/σ2) = P(Z≤x/σ2) = P(σ2Z≤x) = P(X2≤x)

=F(x|σ2).

The inequality is because x/σ2> x/σ1(because x > 0 and σ1> σ2>0), and Fis

nondecreasing. For x≤0, F(x|σ1) = P(X1≤x)=0=P(X2≤x) = F(x|σ2). To

get strict inequality for some x, let (a, b] be an interval such that a > 0, b/a =σ1/σ2and

P(a < Z ≤b) = F(b)−F(a)>0. Let x=aσ1. Then

F(x|σ1) = F(x/σ1) = F(aσ1/σ1) = F(a)

< F (b) = F(aσ1/σ2) = F(x/σ2)

=F(x|σ2).

3.43 a. FY(y|θ) = 1 −FX(1

y|θ)y > 0, by Theorem 2.1.3. For θ1> θ2,

FY(y|θ1) = 1 −FX1

y

θ1≤1−FX1

y

θ2=FY(y|θ2)

for all y, since FX(x|θ) is stochastically increasing and if θ1> θ2,FX(x|θ2)≤FX(x|θ1) for

all x. Similarly, FY(y|θ1) = 1 −FX(1

y|θ1)<1−FX(1

y|θ2) = FY(y|θ2) for some y, since if

θ1> θ2,FX(x|θ2)< FX(x|θ1) for some x. Thus FY(y|θ) is stochastically decreasing in θ.

b. FX(x|θ) is stochastically increasing in θ. If θ1> θ2and θ1, θ2>0 then 1

θ2>1

θ1. Therefore

FX(x|1

θ1)≤FX(x|1

θ2) for all xand FX(x|1

θ1)< FX(x|1

θ2) for some x. Thus FX(x|1

θ) is

stochastically decreasing in θ.

3.44 The function g(x) = |x|is a nonnegative function. So by Chebychev’s Inequality,

P(|X| ≥ b)≤E|X|/b.

Also, P(|X| ≥ b) = P(X2≥b2). Since g(x) = x2is also nonnegative, again by Chebychev’s

Inequality we have

P(|X| ≥ b) = P(X2≥b2)≤EX2/b2.

For X∼exponential(1), E|X|= EX= 1 and EX2= VarX+ (EX)2= 2 . For b= 3,

E|X|/b = 1/3>2/9 = EX2/b2.

Thus EX2/b2is a better bound. But for b=√2,

E|X|/b = 1/√2<1 = EX2/b2.

Thus E|X|/b is a better bound.

Second Edition 3-15

3.45 a.

MX(t) = Z∞

−∞

etxfX(x)dx ≥Z∞

etxfX(x)dx

≥eta Z∞

fX(x)dx =etaP(X≥a),

where we use the fact that etx is increasing in xfor t > 0.

MX(t) = Z∞

−∞

etxfX(x)dx ≥Za

−∞

etxfX(x)dx

≥eta Za

−∞

fX(x)dx =etaP(X≤a),

where we use the fact that etx is decreasing in xfor t < 0.

c. h(t, x) must be nonnegative.

3.46 For X∼uniform(0,1), µ=1

2and σ2=1

12 , thus

P(|X−µ|> kσ) = 1 −P1

2−k

√12 ≤X≤1

2+k

√12=1−2k

√12 k < √3,

0k≥√3,

For X∼exponential(λ), µ=λand σ2=λ2, thus

P(|X−µ|> kσ) = 1 −P(λ−kλ ≤X≤λ+kλ) = 1 + e−(k+1) −ek−1k≤1

e−(k+1) k > 1.

From Example 3.6.2, Chebychev’s Inequality gives the bound P(|X−µ|> kσ)≤1/k2.

Comparison of probabilities

ku(0,1) exp(λ) Chebychev

exact exact

.1 .942 .926 100

.5 .711 .617 4

1 .423 .135 1

1.5 .134 .0821 .44

√3 0 0.0651 .33

2 0 0.0498 .25

4 0 0.00674 .0625

10 0 0.0000167 .01

So we see that Chebychev’s Inequality is quite conservative.

3.47

P(|Z|> t)=2P(Z > t)=21

√2πZ∞

e−x2/2dx

=r2

πZ∞

1+x2

1+x2e−x2/2dx

=r2

πZ∞

1+x2e−x2/2dx+Z∞

1+x2e−x2/2dx.

3-16 Solutions Manual for Statistical Inference

To evaluate the second term, let u=x

1+x2,dv =xe−x2/2dx,v=−e−x2/2,du =1−x2

(1+x2)2, to

obtain

Z∞

1 + x2e−x2/2dx =x

1 + x2(−e−x2/2)

∞

t−Z∞

1−x2

(1 + x2)2(−e−x2/2)dx

1 + t2e−t2/2+Z∞

1−x2

(1 + x2)2e−x2/2dx.

Therefore,

P(Z≥t) = r2

1 + t2e−t2/2+r2

πZ∞

t1

1 + x2+1−x2

(1 + x2)2e−x2/2dx

=r2

1 + t2e−t2/2+r2

πZ∞

t2

(1 + x2)2e−x2/2dx

≥r2

1 + t2e−t2/2.

3.48 For the negative binomial

P(X=x+ 1) = r+x+ 1 −1

x+ 1 pr(1 −p)x+1 =r+x

x+ 1(1 −p)P(X=x).

For the hypergeometric

P(X=x+ 1) = 









(M−x)(k−x+x+1)(x+1)

P(X=x)if x < k,x < M,x≥M−(N−k)

x+1)( N−M

k−x−1)

k)if x=M−(N−k)−1

0 otherwise.

3.49 a.

E(g(X)(X−αβ)) = Z∞

g(x)(x−αβ)1

Γ(α)βαxα−1e−x/βdx.

Let u=g(x), du =g0(x), dv = (x−αβ)xα−1e−x/β ,v=−βxαe−x/β . Then

Eg(X)(X−αβ) = 1

Γ(α)βα−g(x)βxαe−x/β 

∞

0+βZ∞

g0(x)xαe−x/βdx.

Assuming g(x) to be diﬀerentiable, E|Xg0(X)|<∞and limx→∞ g(x)xαe−x/β = 0, the ﬁrst

term is zero, and the second term is βE(Xg0(X)).

Eg(X)β−(α−1)1−X

x=Γ(α+β)

Γ(α)Γ(β)Z1

g(x)β−(α−1)1−x

xxα−1(1 −x)β−1dx.

Let u=g(x) and dv = (β−(α−1)1−x

x)xα−1(1 −x)β. The expectation is

Γ(α+β)

Γ(α)Γ(β)g(x)xα−1(1 −x)β

0+Z1

(1 −x)g0(x)xα−1(1 −x)β−1dx= E((1 −X)g0(X)),

assuming the ﬁrst term is zero and the integral exists.

Second Edition 3-17

3.50 The proof is similar to that of part a) of Theorem 3.6.8. For X∼negative binomial(r, p),

Eg(X)

=∞

x=0

g(x)r+x−1

xpr(1 −p)x

=∞

y=1

g(y−1)r+y−2

y−1pr(1 −p)y−1(set y=x+ 1)

=∞

y=1

g(y−1) y

r+y−1r+y−1

ypr(1 −p)y−1

=∞

y=0 y

r+y−1

g(y−1)

1−pr+y−1

ypr(1 −p)y(the summand is zero at y = 0)

= E X

r+X−1

g(X−1)

1−p,

where in the third equality we use the fact that r+y−2

y−1=y

r+y−1r+y−1

y.

Chapter 4

Multiple Random Variables

4.1 Since the distribution is uniform, the easiest way to calculate these probabilities is as the ratio

of areas, the total area being 4.

a. The circle x2+y2≤1 has area π, so P(X2+Y2≤1) = π

b. The area below the line y= 2xis half of the area of the square, so P(2X−Y > 0) = 2

c. Clearly P(|X+Y|<2) = 1.

4.2 These are all fundamental properties of integrals. The proof is the same as for Theorem 2.2.5

with bivariate integrals replacing univariate integrals.

4.3 For the experiment of tossing two fair dice, each of the points in the 36-point sample space are

equally likely. So the probability of an event is (number of points in the event)/36. The given

probabilities are obtained by noting the following equivalences of events.

P({X= 0, Y = 0}) = P({(1,1),(2,1),(1,3),(2,3),(1,5),(2,5)}) = 6

36 =1

P({X= 0, Y = 1}) = P({(1,2),(2,2),(1,4),(2,4),(1,6),(2,6)}) = 6

36 =1

P({X= 1, Y = 0})

=P({(3,1),(4,1),(5,1),(6,1),(3,3),(4,3),(5,3),(6,3),(3,5),(4,5),(5,5),(6,5)})

=12

36 =1

P({X= 1, Y = 1})

=P({(3,2),(4,2),(5,2),(6,2),(3,4),(4,4),(5,4),(6,4),(3,6),(4,6),(5,6),(6,6)})

=12

36 =1

4.4 a. R1

0R2

0C(x+ 2y)dxdy = 4C= 1, thus C=1

b. fX(x) = R1

4(x+ 2y)dy =1

4(x+ 1) 0 < x < 2

0 otherwise

c. FXY (x, y) = P(X≤x, Y ≤y) = Rx

−∞ Ry

−∞ f(v, u)dvdu. The way this integral is calculated

depends on the values of xand y. For example, for 0 < x < 2 and 0 < y < 1,

FXY (x, y) = Zx

−∞ Zy

−∞

f(u, v)dvdu =Zx

0Zy

4(u+ 2v)dvdu =x2y

8+y2x

But for 0 < x < 2 and 1 ≤y,

FXY (x, y) = Zx

−∞ Zy

−∞

f(u, v)dvdu =Zx

0Z1

4(u+ 2v)dvdu =x2

8+x

4-2 Solutions Manual for Statistical Inference

The complete deﬁnition of FXY is

FXY (x, y) = 









0x≤0 or y≤0

x2y/8 + y2x/4 0 < x < 2 and 0 < y < 1

y/2 + y2/2 2 ≤xand 0 < y < 1

x2/8 + x/4 0 < x < 2 and 1 ≤y

1 2 ≤xand 1 ≤y

d. The function z=g(x) = 9/(x+ 1)2is monotone on 0 <x<2, so use Theorem 2.1.5 to

obtain fZ(z) = 9/(8z2), 1 < z < 9.

4.5 a. P(X > √Y) = R1

0R1

√y(x+y)dxdy =7

20 .

b. P(X2< Y < X) = R1

0R√y

y2xdxdy =1

4.6 Let A= time that Aarrives and B= time that Barrives. The random variables Aand Bare

independent uniform(1,2) variables. So their joint pdf is uniform on the square (1,2) ×(1,2).

Let X= amount of time Awaits for B. Then, FX(x) = P(X≤x) = 0 for x < 0, and

FX(x) = P(X≤x) = 1 for 1 ≤x. For x= 0, we have

FX(0) = P(X≤0) = P(X= 0) = P(B≤A) = Z2

1Za

1dbda =1

And for 0 < x < 1,

FX(x) = P(X≤x) = 1−P(X > x) = 1−P(B−A > x) = 1−Z2−x

1Z2

a+x

1dbda =1

2+x−x2

4.7 We will measure time in minutes past 8 A.M. So X∼uniform(0,30), Y∼uniform(40,50) and

the joint pdf is 1/300 on the rectangle (0,30) ×(40,50).

P(arrive before 9 A.M.) = P(X+Y < 60) = Z50

40 Z60−y

300dxdy =1

4.9

P(a≤X≤b, c ≤Y≤d)

=P(X≤b, c ≤Y≤d)−P(X≤a, c ≤Y≤d)

=P(X≤b, Y ≤d)−P(X≤b, Y ≤c)−P(X≤a, Y ≤d) + P(X≤a, Y ≤c)

=F(b, d)−F(b, c)−F(a, d)−F(a, c)

=FX(b)FY(d)−FX(b)FY(c)−FX(a)FY(d)−FX(a)FY(c)

=P(X≤b) [P(Y≤d)−P(Y≤c)] −P(X≤a) [P(Y≤d)−P(Y≤c)]

=P(X≤b)P(c≤Y≤d)−P(X≤a)P(c≤Y≤d)

=P(a≤X≤b)P(c≤Y≤d).

4.10 a. The marginal distribution of Xis P(X= 1) = P(X= 3) = 1

4and P(X= 2) = 1

2. The

marginal distribution of Yis P(Y= 2) = P(Y= 3) = P(Y= 4) = 1

3. But

P(X= 2, Y = 3) = 0 6= (1

2)(1

3) = P(X= 2)P(Y= 3).

Therefore the random variables are not independent.

b. The distribution that satisﬁes P(U=x, V =y) = P(U=x)P(V=y) where U∼Xand

V∼Yis

Second Edition 4-3

1 2 3

V31

4.11 The support of the distribution of (U, V ) is {(u, v) : u= 1,2, . . . ;v=u+ 1, u + 2, . . .}. This

is not a cross-product set. Therefore, Uand Vare not independent. More simply, if we know

U=u, then we know V > u.

4.12 One interpretation of “a stick is broken at random into three pieces” is this. Suppose the length

of the stick is 1. Let Xand Ydenote the two points where the stick is broken. Let Xand Y

both have uniform(0,1) distributions, and assume Xand Y are independent. Then the joint

distribution of Xand Yis uniform on the unit square. In order for the three pieces to form

a triangle, the sum of the lengths of any two pieces must be greater than the length of the

third. This will be true if and only if the length of each piece is less than 1/2. To calculate the

probability of this, we need to identify the sample points (x, y) such that the length of each

piece is less than 1/2. If y > x, this will be true if x < 1/2, y−x < 1/2 and 1 −y < 1/2.

These three inequalities deﬁne the triangle with vertices (0,1/2), (1/2,1/2) and (1/2,1). (Draw

a graph of this set.) Because of the uniform distribution, the probability that (X, Y ) falls in

the triangle is the area of the triangle, which is 1/8. Similarly, if x > y, each piece will have

length less than 1/2 if y < 1/2, x−y < 1/2 and 1 −x < 1/2. These three inequalities deﬁne

the triangle with vertices (1/2,0), (1/2,1/2) and (1,1/2). The probability that (X, Y ) is in this

triangle is also 1/8. So the probability that the pieces form a triangle is 1/8+1/8 = 1/4.

4.13 a.

E(Y−g(X))2

= E ((Y−E(Y|X)) + (E(Y|X)−g(X)))2

= E(Y−E(Y|X))2+ E(E(Y|X)−g(X))2+ 2E [(Y−E(Y|X))(E(Y|X)−g(X))] .

The cross term can be shown to be zero by iterating the expectation. Thus

E(Y−g(X))2= E(Y−E(Y|X))2+E(E(Y|X)−g(X))2≥E(Y−E(Y|X))2,for all g(·).

The choice g(X) = E(Y|X) will give equality.

b. Equation (2.2.3) is the special case of a) where we take the random variable Xto be a

constant. Then, g(X) is a constant, say b, and E(Y|X) = EY.

4.15 We will ﬁnd the conditional distribution of Y|X+Y. The derivation of the conditional distri-

bution of X|X+Yis similar. Let U=X+Yand V=Y. In Example 4.3.1, we found the

joint pmf of (U, V ). Note that for ﬁxed u,f(u, v) is positive for v= 0, . . . , u. Therefore the

conditional pmf is

f(v|u) = f(u, v)

f(u)=

θu−ve−θ

(u−v)!

λve−λ

(θ+λ)ue−(θ+λ)

=u

v λ

θ+λvθ

θ+λu−v

, v = 0, . . . , u.

That is V|U∼binomial(U, λ/(θ+λ)).

4.16 a. The support of the distribution of (U, V ) is {(u, v) : u= 1,2, . . . ;v= 0,±1,±2, . . .}.

If V > 0, then X > Y . So for v= 1,2, . . ., the joint pmf is

fU,V (u, v) = P(U=u, V =v) = P(Y=u, X =u+v)

=p(1 −p)u+v−1p(1 −p)u−1=p2(1 −p)2u+v−2.

4-4 Solutions Manual for Statistical Inference

If V < 0, then X < Y . So for v=−1,−2, . . ., the joint pmf is

fU,V (u, v) = P(U=u, V =v) = P(X=u, Y =u−v)

=p(1 −p)u−1p(1 −p)u−v−1=p2(1 −p)2u−v−2.

If V= 0, then X=Y. So for v= 0, the joint pmf is

fU,V (u, 0) = P(U=u, V = 0) = P(X=Y=u) = p(1 −p)u−1p(1 −p)u−1=p2(1 −p)2u−2.

In all three cases, we can write the joint pmf as

fU,V (u, v) = p2(1 −p)2u+|v|−2=p2(1 −p)2u(1 −p)|v|−2, u = 1,2, . . . ;v= 0,±1,±2, . . . .

Since the joint pmf factors into a function of uand a function of v,Uand Vare independent.

b. The possible values of Zare all the fractions of the form r/s, where rand sare positive

integers and r < s. Consider one such value, r/s, where the fraction is in reduced form. That

is, rand shave no common factors. We need to identify all the pairs (x, y) such that xand

yare positive integers and x/(x+y) = r/s. All such pairs are (ir, i(s−r)), i= 1,2, . . ..

Therefore,

PZ=r

s=∞

i=1

P(X=ir, Y =i(s−r)) = ∞

i=1

p(1 −p)ir−1p(1 −p)i(s−r)−1

=p2

(1 −p)2

∞

i=1

((1 −p)s)i=p2

(1 −p)2

(1 −p)s

1−(1 −p)s=p2(1 −p)s−2

1−(1 −p)s.

P(X=x, X +Y=t) = P(X=x, Y =t−x) = P(X=x)P(Y=t−x) = p2(1 −p)t−2.

4.17 a. P(Y=i+ 1) = Ri+1

ie−xdx =e−i(1 −e−1), which is geometric with p= 1 −e−1.

b. Since Y≥5 if and only if X≥4,

P(X−4≤x|Y≥5) = P(X−4≤x|X≥4) = P(X≤x) = e−x,

since the exponential distribution is memoryless.

4.18 We need to show f(x, y) is nonnegative and integrates to 1. f(x, y)≥0, because the numerator

is nonnegative since g(x)≥0, and the denominator is positive for all x > 0, y > 0. Changing

to polar coordinates, x=rcos θand y=rsin θ, we obtain

Z∞

0Z∞

f(x, y)dxdy =Zπ/2

0Z∞

2g(r)

πr rdrdθ =2

πZπ/2

0Z∞

g(r)drdθ =2

πZπ/2

1dθ = 1.

4.19 a. Since (X1−X2).√2∼n(0,1), (X1−X2)2.2∼χ2

1(see Example 2.1.9).

b. Make the transformation y1=x1

x1+x2,y2=x1+x2then x1=y1y2,x2=y2(1 −y1) and

|J|=y2. Then

f(y1, y2) = Γ(α1+α2)

Γ(α1)Γ(α2)yα1−1

1(1 −y1)α2−1 1

Γ(α1+α2)yα1+α2−1

2e−y2,

thus Y1∼beta(α1, α2), Y2∼gamma(α1+α1,1) and are independent.

Second Edition 4-5

4.20 a. This transformation is not one-to-one because you cannot determine the sign of X2from

Y1and Y2. So partition the support of (X1, X2) into A0={−∞ < x1<∞, x2= 0},

A1={−∞ < x1<∞, x2>0}and A2={−∞ < x1<∞, x2<0}. The support of (Y1, Y2)

is B={0< y1<∞,−1< y2<1}. The inverse transformation from Bto A1is x1=y2√y1

and x2=py1−y1y2

2with Jacobian

J1=

√y1√y1

√1−y2

√y1

y2√y1

√1−y2



2p1−y2

The inverse transformation from Bto A2is x1=y2√y1and x2=−py1−y1y2

2with J2=

−J1. From (4.3.6), fY1,Y 2(y1, y2) is the sum of two terms, both of which are the same in this

case. Then

fY1,Y 2(y1, y2)=2"1

2πσ2e−y1/(2σ2)1

2p1−y2

2πσ2e−y1/(2σ2)1

p1−y2

,0< y1<∞,−1< y2<1.

b. We see in the above expression that the joint pdf factors into a function of y1and a function

of y2. So Y1and Y2are independent. Y1is the square of the distance from (X1, X2) to

the origin. Y2is the cosine of the angle between the positive x1-axis and the line from

(X1, X2) to the origin. So independence says the distance from the origin is independent of

the orientation (as measured by the angle).

4.21 Since Rand θare independent, the joint pdf of T=R2and θis

fT,θ (t, θ) = 1

4πe−t/2,0< t < ∞,0< θ < 2π.

Make the transformation x=√tcos θ,y=√tsin θ. Then t=x2+y2,θ= tan−1(y/x), and

J=

2x2y

−y

x2+y2−x

x2+y2= 2.

Therefore

fX,Y (x, y) = 2

4πe−1

2(x2+y2),0< x2+y2<∞,0<tan−1y/x < 2π.

Thus,

fX,Y (x, y) = 1

2πe−1

2(x2+y2),−∞ < x, y < ∞.

So Xand Yare independent standard normals.

4.23 a. Let y=v,x=u/y =u/v then

J=

∂x

∂u

∂x

∂v

∂y

∂u

∂y

∂v =

v−u

0 1 =1

fU,V (u, v) = Γ(α+β)

Γ(α)Γ(β)

Γ(α+β+γ)

Γ(α+β)Γ(γ)u

vα−11−u

vβ−1vα+β−1(1−v)γ−11

v,0< u < v < 1.

4-6 Solutions Manual for Statistical Inference

Then,

fU(u) = Γ(α+β+γ)

Γ(α)Γ(β)Γ(γ)uα−1Z1

vβ−1(1 −v)γ−1(v−u

v)β−1dv

=Γ(α+β+γ)

Γ(α)Γ(β)Γ(γ)uα−1(1 −u)β+γ−1Z1

yβ−1(1 −y)γ−1dy y=v−u

1−u, dy =dv

1−u

=Γ(α+β+γ)

Γ(α)Γ(β)Γ(γ)uα−1(1 −u)β+γ−1Γ(β)Γ(γ)

Γ(β+γ)

=Γ(α+β+γ)

Γ(α)Γ(β+γ)uα−1(1 −u)β+γ−1,0< u < 1.

Thus, U∼gamma(α, β +γ).

b. Let x=√uv,y=pu

vthen

J=

∂x

∂u

∂x

∂v

∂y

∂u

∂x

∂v =

2v1/2u−1/21

2u1/2v−1/2

2v−1/2u−1/2−1

2u1/2v−3/2=1

2v.

fU,V (u, v) = Γ(α+β+γ)

Γ(α)Γ(β)Γ(γ)(√uvα−1(1 −√uv)β−1ru

vα+β−11−ru

vγ−11

2v.

The set {0< x < 1,0< y < 1}is mapped onto the set {0< u < v < 1

u,0<u<1}. Then,

fU(u)

=Z1/u

fU,V (u, v)dv

=Γ(α+β+γ)

Γ(α)Γ(β)Γ(γ)uα−1(1−u)β+γ−1

| {z }Z1/u

u1−√uv

1−uβ−1 1−pu/v

1−u!γ−1(pu/v)β

2v(1 −u)dv.

Call it A

To simplify, let z=√u/v−u

1−u. Then v=u⇒z= 1, v= 1/u ⇒z= 0 and dz =−√u/v

2(1−u)vdv.

Thus,

fU(u) = AZzβ−1(1 −z)γ−1dz ( kernel of beta(β, γ))

=Γ(α+β+γ)

Γ(α)Γ(β)Γ(γ)uα−1(1 −u)β+γ−1Γ(β)Γ(γ)

Γ(β+γ)

=Γ(α+β+γ)

Γ(α)Γ(β+γ)uα−1(1 −u)β+γ−1,0<u<1.

That is, U∼beta(α, β +γ), as in a).

4.24 Let z1=x+y,z2=x

x+y, then x=z1z2,y=z1(1 −z2) and

|J|=

∂x

∂z1

∂x

∂z2

∂y

∂z1

∂y

∂z2

=

z2z1

1−z2−z1=z1.

The set {x > 0, y > 0}is mapped onto the set {z1>0,0< z2<1}.

fZ1,Z2(z1, z2) = 1

Γ(r)(z1z2)r−1e−z1z2·1

Γ(s)(z1−z1z2)s−1e−z1+z1z2z1

Γ(r+s)zr+s−1

1e−z1·Γ(r+s)

Γ(r)Γ(s)zr−1

2(1 −z2)s−1,0< z1,0< z2<1.

Second Edition 4-7

fZ1,Z2(z1, z2) can be factored into two densities. Therefore Z1and Z2are independent and

Z1∼gamma(r+s, 1), Z2∼beta(r, s).

4.25 For Xand Zindependent, and Y=X+Z,fXY (x, y) = fX(x)fZ(y−x). In Example 4.5.8,

fXY (x, y) = I(0,1)(x)1

10I(0,1/10)(y−x).

In Example 4.5.9, Y=X2+Zand

fXY (x, y) = fX(x)fZ(y−x2) = 1

2I(−1,1)(x)1

10I(0,1/10)(y−x2).

4.26 a.

P(Z≤z, W = 0) = P(min(X, Y )≤z, Y ≤X) = P(Y≤z, Y ≤X)

=Zz

0Z∞

λe−x/λ 1

µe−y/µdxdy

=λ

µ+λ1−exp −1

µ+1

λz.

Similarly,

P(Z≤z,W =1) = P(min(X, Y )≤z, X ≤Y) = P(X≤z, X ≤Y)

=Zz

0Z∞

λe−x/λ 1

µe−y/µdydx =µ

µ+λ1−exp −1

µ+1

λz.

P(W= 0) = P(Y≤X) = Z∞

0Z∞

λe−x/λ 1

µe−y/µdxdy =λ

µ+λ.

P(W=1)=1−P(W= 0) = µ

µ+λ.

P(Z≤z) = P(Z≤z, W = 0) + P(Z≤z, W = 1) = 1 −exp −1

µ+1

λz.

Therefore, P(Z≤z, W =i) = P(Z≤z)P(W=i), for i= 0,1, z > 0. So Zand Ware

independent.

4.27 From Theorem 4.2.14 we know U∼n(µ+γ, 2σ2) and V∼n(µ−γ, 2σ2). It remains to show

that they are independent. Proceed as in Exercise 4.24.

fXY (x, y) = 1

2πσ2e−1

2σ2[(x−µ)2+(y−γ)2](by independence, sofXY =fXfY)

Let u=x+y,v=x−y, then x=1

2(u+v), y=1

2(u−v) and

|J|=

1/2 1/2

1/2−1/2=1

The set {−∞ < x < ∞,−∞ < y < ∞} is mapped onto the set {−∞ < u < ∞,−∞ < v < ∞}.

Therefore

fUV (u, v) = 1

2πσ2e−1

2σ2((u+v

2)−µ)2+((u−v

2)−γ)2·1

4πσ2e−1

2σ2h2(u

2)2−u(µ+γ)+ (µ+γ)2

2+2(v

2)2−v(µ−γ)+ (µ+γ)2

=g(u)1

4πσ2e−1

2(2σ2)(u−(µ+γ))2·h(v)e−1

2(2σ2)(v−(µ−γ))2.

By the factorization theorem, Uand Vare independent.

4-8 Solutions Manual for Statistical Inference

4.29 a. X

Y=Rcos θ

Rsin θ= cot θ. Let Z= cot θ. Let A1= (0, π), g1(θ) = cot θ,g−1

1(z) = cot−1z,

A2= (π, 2π), g2(θ) = cot θ,g−1

2(z) = π+ cot−1z. By Theorem 2.1.8

fZ(z) = 1

2π|−1

1 + z2|+1

2π|−1

1 + z2|=1

1 + z2,−∞ < z < ∞.

b. XY =R2cos θsin θthen 2XY =R22 cos θsin θ=R2sin 2θ. Therefore 2XY

R=Rsin 2θ.

Since R=√X2+Y2then 2XY

√X2+Y2=Rsin 2θ. Thus 2XY

√X2+Y2is distributed as sin 2θwhich

is distributed as sin θ. To see this let sin θ∼fsin θ. For the function sin 2θthe values of

the function sin θare repeated over each of the 2 intervals (0, π) and (π, 2π) . Therefore

the distribution in each of these intervals is the distribution of sin θ. The probability of

choosing between each one of these intervals is 1

2. Thus f2 sin θ=1

2fsin θ+1

2fsin θ=fsin θ.

Therefore 2XY

√X2+Y2has the same distribution as Y= sin θ. In addition, 2XY

√X2+Y2has the

same distribution as X= cos θsince sin θhas the same distribution as cos θ. To see this let

consider the distribution of W= cos θand V= sin θwhere θ∼uniform(0,2π). To derive

the distribution of W= cos θlet A1= (0, π), g1(θ) = cos θ,g−1

1(w) = cos−1w,A2= (π, 2π),

g2(θ) = cos θ,g−1

2(w) = 2π−cos−1w. By Theorem 2.1.8

fW(w) = 1

2π|−1

√1−w2|+1

2π|1

√1−w2|=1

√1−w2,−1≤w≤1.

To derive the distribution of V= sin θ, ﬁrst consider the interval ( π

2,3π

2). Let g1(θ) = sin θ,

4g−1

1(v) = π−sin−1v, then

fV(v) = 1

√1−v2,−1≤v≤1.

Second, consider the set {(0,π

2)∪(3π

2,2π)}, for which the function sin θhas the same values

as it does in the interval ( −π

2,π

2). Therefore the distribution of Vin {(0,π

2)∪(3π

2,2π)}is

the same as the distribution of Vin (−π

2,π

2) which is 1

√1−v2,−1≤v≤1. On (0,2π) each

of the sets (π

2,3π

2), {(0,π

2)∪(3π

2,2π)}has probability 1

2of being chosen. Therefore

fV(v) = 1

√1−v2+1

√1−v2=1

√1−v2,−1≤v≤1.

Thus Wand Vhas the same distribution.

Let Xand Ybe iid n(0,1). Then X2+Y2∼χ2

2is a positive random variable. Therefore

with X=Rcos θand Y=Rsin θ,R=√X2+Y2is a positive random variable and

θ= tan−1(Y

X)∼uniform(0,1). Thus 2XY

√X2+Y2∼X∼n(0,1).

4.30 a.

EY= E {E(Y|X)}= EX=1

VarY= Var (E(Y|X)) + E (Var(Y|X)) = VarX+ EX2=1

12 +1

3=5

12.

EXY = E[E(XY |X)] = E[XE(Y|X)] = EX2=1

Cov(X, Y )=EXY −EXEY=1

3−1

22

12.

b. The quick proof is to note that the distribution of Y|X=xis n(1,1), hence is independent

of X. The bivariate transformation t=y/x,u=xwill also show that the joint density

factors.

Second Edition 4-9

4.31 a.

EY= E{E(Y|X)}= EnX =n

VarY= Var (E(Y|X)) + E (Var(Y|X)) = Var(nX)+EnX(1 −X) = n2

12 +n

P(Y=y, X ≤x) = n

yxy(1 −x)n−y, y = 0,1, . . . , n, 0< x < 1.

P(y=y) = n

yΓ(y+ 1)Γ(n−y+ 1)

Γ(n+ 2) .

4.32 a. The pmf of Y, for y= 0,1, . . ., is

fY(y) = Z∞

fY(y|λ)fΛ(λ)dλ =Z∞

λye−λ

Γ(α)βαλα−1e−λ/βdλ

y!Γ(α)βαZ∞

λ(y+α)−1exp 





−λ

β

1+β





dλ

y!Γ(α)βαΓ(y+α)β

1+βy+α

If αis a positive integer,

fY(y) = y+α−1

y β

1+βy1

1+βα

the negative binomial(α, 1/(1 + β)) pmf. Then

EY= E(E(Y|Λ)) = EΛ = αβ

VarY= Var(E(Y|Λ)) + E(Var(Y|Λ)) = VarΛ + EΛ = αβ2+αβ =αβ(β+ 1).

b. For y= 0,1, . . ., we have

P(Y=y|λ) = ∞

n=y

P(Y=y|N=n, λ)P(N=n|λ)

=∞

n=yn

ypy(1 −p)n−ye−λλn

=∞

n=y

y!(n−y)! p

1−py

[(1 −p)λ]ne−λ

=e−λ∞

m=0

y!m!p

1−py

[(1 −p)λ]m+y(let m=n−y)

=e−λ

y!p

1−py

[(1 −p)λ]y"∞

m=0

[(1−p)λ]m

m!#

=e−λ(pλ)ye(1−p)λ

=(pλ)ye−pλ

y!,

4-10 Solutions Manual for Statistical Inference

the Poisson(pλ) pmf. Thus Y|Λ∼Poisson(pλ). Now calculations like those in a) yield the

pmf of Y, for y= 0,1, . . ., is

fY(y) = 1

Γ(α)y!(pβ)αΓ(y+α)pβ

1+pβ y+α

Again, if αis a positive integer, Y∼negative binomial(α, 1/(1 + pβ)).

4.33 We can show that Hhas a negative binomial distribution by computing the mgf of H.

EeHt = EE eHtN= EE e(X1+···+XN)tN= E nEeX1tNNo,

because, by Theorem 4.6.7, the mgf of a sum of independent random variables is equal to the

product of the individual mgfs. Now,

EeX1t=∞

x1=1

ex1t−1

logp

(1 −p)x1

=−1

logp

∞

x1=1

(et(1 −p))x1

=−1

logp−log 1−et(1 −p).

Then

Elog {1−et(1 −p)}

logpN

=∞

n=0 log {1−et(1 −p)}

logpne−λλn

n!(since N∼Poisson)

=e−λeλlog(1−et(1−p))

logp

∞

n=0

−λlog(1−et(1−p))

logpλlog(1−et(1−p))

logpn

n!.

The sum equals 1. It is the sum of a Poisson[λlog(1 −et(1 −p))]/[logp]pmf. Therefore,

E(eHt) = e−λhelog(1−et(1−p))iλ/ log p=elogp−λ/ logp1

1−et(1 −p)−λ/ log p

=p

1−et(1 −p)−λ/ logp

This is the mgf of a negative binomial(r, p), with r=−λ/ log p, if ris an integer.

4.34 a.

P(Y=y) = Z1

P(Y=y|p)fp(p)dp

=Z1

0n

ypy(1 −p)n−y1

B(α, β)pα−1(1 −p)β−1dp

=n

yΓ(α+β)

Γ(α)Γ(β)Z1

py+α−1(1 −p)n+β−y−1dp

=n

yΓ(α+β)

Γ(α)Γ(β)

Γ(y+α)Γ(n+β−y)

Γ(α+n+β), y = 0,1, . . . , n.

P(X=x) = Z1

P(X=x|p)fP(p)dp

=Z1

0r+x−1

xpr(1 −p)xΓ(α+β)

Γ(α)Γ(β)pα−1(1 −p)β−1dp

Second Edition 4-11

=r+x−1

xΓ(α+β)

Γ(α)Γ(β)Z1

p(r+α)−1(1 −p)(x+β)−1dp

=r+x−1

xΓ(α+β)

Γ(α)Γ(β)

Γ(r+α)Γ(x+β)

Γ(r+x+α+β)x= 0,1, . . .

Therefore,

EX= E[E(X|P)] = E r(1 −P)

P=rβ

α−1,

since

E1−P

P=Z1

01−P

PΓ(α+β)

Γ(α)Γ(β)pα−1(1 −p)β−1dp

=Γ(α+β)

Γ(α)Γ(β)Z1

p(α−1)−1(1 −p)(β+1)−1dp =Γ(α+β)

Γ(α)Γ(β)

Γ(α−1)Γ(β+ 1)

Γ(α+β)

=β

α−1.

Var(X) = E(Var(X|P)) + Var(E(X|P)) = E r(1 −P)

P2+ Var r(1 −P)

P

=r(β+ 1)(α+β)

α(α−1) +r2β(α+β−1)

(α−1)2(α−2),

since

E1−P

P2=Z1

Γ(α+β)

Γ(α)Γ(β)p(α−2)−1(1 −p)(β+1)−1dp =Γ(α+β)

Γ(α)Γ(β)

Γ(α−2)Γ(β+ 1)

Γ(α+β−1)

=(β+ 1)(α+β)

α(α−1)

and

Var 1−P

P= E "1−P

P2#−E1−P

P2

=β(β+ 1)

(α−2)(α−1) −(β

α−1)2

=β(α+β−1)

(α−1)2(α−2),

where

E"1−P

P2#=Z1

Γ(α+β)

Γ(α)Γ(β)p(α−2)−1(1 −p)(β+2)−1dp

=Γ(α+β)

Γ(α)Γ(β)

Γ(α−2)Γ(β+ 2)

Γ(α−2 + β+ 2) =β(β+ 1)

(α−2)(α−1).

4.35 a. Var(X) = E(Var(X|P)) + Var(E(X|P)). Therefore,

Var(X) = E[nP (1 −P)] + Var(nP )

=nαβ

(α+β)(α+β+ 1) +n2VarP

=nαβ(α+β+ 1 −1)

(α+β2)(α+β+ 1) +n2VarP

4-12 Solutions Manual for Statistical Inference

=nαβ(α+β+ 1)

(α+β2)(α+β+ 1) −nαβ

(α+β2)(α+β+ 1) +n2VarP

=nα

α+β

α+β−nVarP+n2VarP

=nEP(1 −EP) + n(n−1)VarP.

b. Var(Y) = E(Var(Y|Λ)) + Var(E(Y|Λ)) = EΛ + Var(Λ) = µ+1

αµ2since EΛ = µ=αβ and

Var(Λ) = αβ2=(αβ)2

α=µ2

α. The “extra-Poisson” variation is 1

αµ2.

4.37 a. Let Y=PXi.

P(Y=k) = P(Y=k, 1

2< c =1

2(1 + p)<1)

=Z1

(Y=k|c=1

2(1 + p))P(P=p)dp

=Z1

0n

k[1

2(1 + p)]k[1 −1

2(1 + p)]n−kΓ(a+b)

Γ(a)Γ(b)pa−1(1 −p)b−1dp

=Z1

0n

k(1 + p)k

(1 −p)n−k

2n−k

Γ(a+b)

Γ(a)Γ(b)pa−1(1 −p)b−1dp

=n

kΓ(a+b)

2nΓ(a)Γ(b)

j=0 Z1

pk+a−1(1 −p)n−k+b−1dp

=n

kΓ(a+b)

2nΓ(a)Γ(b)

j=0 k

jΓ(k+a)Γ(n−k+b)

Γ(n+a+b)

j=0 " k

j

2n!n

kΓ(a+b)

Γ(a)Γ(b)

Γ(k+a)Γ(n−k+b)

Γ(n+a+b)#.

A mixture of beta-binomial.

EY= E(E(Y|c)) = E[nc] = E n1

2(1 + p)=n

21 + a

a+b.

Using the results in Exercise 4.35(a),

Var(Y) = nEC(1 −EC) + n(n−1)VarC.

Therefore,

Var(Y) = nE1

2(1 + P)1−E1

2(1 + P)+n(n−1)Var 1

2(1 + P)

4(1 + EP)(1 −EP) + n(n−1)

4VarP

4 1−a

a+b2!+n(n−1)

(a+b)2(a+b+ 1).

4.38 a. Make the transformation u=x

ν−x

λ,du =−x

ν2dν,ν

λ−ν=x

λu . Then

Zλ

νe−x/ν 1

Γ(r)Γ(1 −r)

νr−1

(λ−ν)rdν

Second Edition 4-13

Γ(r)Γ(1 −r)Z∞

xx

λure−(u+x/λ)du

=xr−1e−x/λ

λrΓ(r)Γ(1 −r)Z∞

01

ur

e−udu =xr−1e−x/λ

Γ(r)λr,

since the integral is equal to Γ(1 −r) if r < 1.

b. Use the transformation t=ν/λ to get

Zλ

pλ(ν)dν =1

Γ(r)Γ(1 −r)Zλ

νr−1(λ−ν)−rdν =1

Γ(r)Γ(1 −r)Z1

tr−1(1 −t)−rdt = 1,

since this is a beta(r, 1−r).

dx log f(x) = d

dx log 1

Γ(r)λr+(r−1) log x−x/λ=r−1

x−1

λ>0

for some x, if r > 1. But,

dx log Z∞

e−x/ν

νqλ(ν)dν=−R∞

ν2e−x/ν qλ(ν)dν

R∞

νe−x/ν qλ(ν)dν <0∀x.

4.39 a. Without loss of generality lets assume that i<j. From the discussion in the text we have

that

f(x1, . . . , xj−1, xj+1, . . . , xn|xj)

=(m−xj)!

x1!·····xj−1!·xj+1!·····xn!

×p1

1−pjx1

·····pj−1

1−pjxj−1pj+1

1−pjxj+1

·····pn

1−pjxn

Then,

f(xi|xj)

(x1,...,xi−1,xi+1,...,xj−1,xj+1,...,xn)

f(x1, . . . , xj−1, xj+1, . . . , xn|xj)

(xk6=xi,xj)

(m−xj)!

x1!·····xj−1!·xj+1!·····xn!

×(p1

1−pj

)x1·····(pj−1

1−pj

)xj−1(pj+1

1−pj

)xj+1 ·····(pn

1−pj

)xn

(m−xi−xj)! 1−pi

1−pjm−xi−xj

(m−xi−xj)! 1−pi

1−pjm−xi−xj

=(m−xj)!

xi!(m−xi−xj)!(pi

1−pj

)xi1−pi

1−pjm−xi−xj

×X

(xk6=xi,xj)

(m−xi−xj)!

x1!·····xi−1!, xi+1!·····xj−1!, xj+1!·····xn!

×(p1

1−pj−pi

)x1·····(pi−1

1−pj−pi

)xi−1(pi+1

1−pj−pi

)xi+1

4-14 Solutions Manual for Statistical Inference

×(pj−1

1−pj−pi

)xj−1(pj+1

1−pj−pi

)xj+1 ·····(pn

1−pj−pi

)xn

=(m−xj)!

xi!(m−xi−xj)!(pi

1−pj

)xi1−pi

1−pjm−xi−xj

Thus Xi|Xj=xj∼binomial(m−xj,pi

1−pj).

f(xi, xj) = f(xi|xj)f(xj) = m!

xi!xj!(m−xj−xi)!pxi

ipxj

j(1 −pj−pi)m−xj−xi.

Using this result it can be shown that Xi+Xj∼binomial(m, pi+pj). Therefore,

Var(Xi+Xj) = m(pi+pj)(1 −pi−pj).

By Theorem 4.5.6 Var(Xi+Xj) = Var(Xi) + Var(Xj) + 2Cov(Xi, Xj). Therefore,

Cov(Xi, Xj) = 1

2[m(pi+pj)(1−pi−pj)−mpi(1−pi)−mpi(1−pi)] = 1

2(−2mpipj) = −mpipj.

4.41 Let abe a constant. Cov(a, X) = E(aX)−EaEX=aEX−aEX= 0.

4.42

ρXY,Y =Cov(XY, Y )

σXY σY

=E(XY 2)−µXY µY

σXY σY

=EXEY2−µXµYµY

σXY σY

where the last step follows from the independence of X and Y. Now compute

σ2

XY = E(XY )2−[E(XY )]2= EX2EY2−(EX)2(EY)2

= (σ2

X+µ2

X)(σ2

Y+µ2

Y)−µ2

Xµ2

Y=σ2

Xσ2

Y+σ2

Xµ2

Y+σ2

Yµ2

Therefore,

ρXY,Y =µX(σ2

Y+µ2

Y)−µXµ2

(σ2

Xσ2

Y+σ2

Xµ2

Y+σ2

Yµ2

X)1/2σY

=µXσY

(µ2

Xσ2

Y+µ2

Yσ2

X+σ2

Xσ2

Y)1/2.

4.43

Cov(X1+X2, X2+X3) = E(X1+X2)(X2+X3)−E(X1+X2)E(X2+X3)

= (4µ2+σ2)−4µ2=σ2

Cov(X1+X2)(X1−X2) = E(X1+X2)(X1−X2)=EX2

1−X2

2= 0.

4.44 Let µi= E(Xi). Then

Var n

i=1

Xi!= Var (X1+X2+··· +Xn)

= E [(X1+X2+··· +Xn)−(µ1+µ2+··· +µn)]2

= E [(X1−µ1)+(X2−µ2) + ··· + (Xn−µn)]2

i=1

E(Xi−µi)2+ 2 X

1≤i<j≤n

E(Xi−µi)(Xj−µj)

i=1

VarXi+ 2 X

1≤i<j≤n

Cov(Xi, Xj).

Second Edition 4-15

4.45 a. We will compute the marginal of X. The calculation for Yis similar. Start with

fXY (x, y) = 1

2πσXσYp1−ρ2

×exp "−1

2(1−ρ2)(x−µX

σX2

−2ρx−µX

σXy−µY

σY+y−µY

σY2)#

and compute

fX(x) = Z∞

−∞

fXY (x, y)dy =Z∞

−∞

2πσXσYp1−ρ2e−1

2(1−ρ2)(ω2−2ρωz+z2)σYdz,

where we make the substitution z=y−µY

σY,dy =σYdz,ω=x−µX

σX. Now the part of the

exponent involving ω2can be removed from the integral, and we complete the square in z

to get

fX(x) = e−ω2

2(1−ρ2)

2πσXp1−ρ2Z∞

−∞

e−1

2(1−ρ2)[(z2−2ρωz+ρ2ω2)−ρ2ω2]dz

=e−ω2/2(1−ρ2)eρ2ω2/2(1−ρ2)

2πσXp1−ρ2Z∞

−∞

e−1

2(1−ρ2)(z−ρω)2

dz.

The integrand is the kernel of normal pdf with σ2= (1 −ρ2), and µ=ρω, so it integrates

to √2πp1−ρ2. Also note that e−ω2/2(1−ρ2)eρ2ω2/2(1−ρ2)=e−ω2/2. Thus,

fX(x) = e−ω2/2

2πσXp1−ρ2√2πp1−ρ2=1

√2πσX

e−1

2x−µX

σX2

the pdf of n(µX, σ2

X).

fY|X(y|x)

2πσXσY√1−ρ2e−1

2(1−ρ2)hx−µX

σX2−2ρx−µX

σXy−µY

σY+y−µY

σY2i

√2πσXe−1

2σ2

(x−µX)2

√2πσYp1−ρ2e−1

2(1−ρ2)hx−µX

σX2−(1−ρ2)x−µX

σX2−2ρx−µX

σXy−µY

σY+y−µY

σY2i

√2πσYp1−ρ2e−1

2(1−ρ2)hρ2x−µX

σX2−2ρx−µX

σXy−µY

σY+y−µY

σY2i

√2πσYp1−ρ2e−1

2σ2

Y√(1−ρ2(y−µY)−ρσY

σX(x−µX)2

which is the pdf of n(µY−ρ(σY/σX)(x−µX), σYp1−ρ2.

c. The mean is easy to check,

E(aX +bY ) = aEX+bEY=aµX+bµY,

4-16 Solutions Manual for Statistical Inference

as is the variance,

Var(aX +bY ) = a2VarX+b2VarY+ 2abCov(X, Y ) = a2σ2

X+b2σ2

Y+ 2abρσXσY.

To show that aX +bY is normal we have to do a bivariate transform. One possibility is

U=aX +bY ,V=Y, then get fU,V (u, v) and show that fU(u) is normal. We will do this in

the standard case. Make the indicated transformation and write x=1

a(u−bv), y=vand

obtain

|J|=

1/a −b/a

0 1 =1

Then

fUV (u, v) = 1

2πap1−ρ2e−1

2(1−ρ2)[1

a(u−bv)]2−2ρ

a(u−bv)+v2.

Now factor the exponent to get a square in u. The result is

−1

2(1−ρ2)b2+ 2ρab +a2

a2 u2

b2+ 2ρab +a2−2b+aρ

b2+ 2ρab +a2uv +v2.

Note that this is joint bivariate normal form since µU=µV= 0, σ2

v= 1, σ2

u=a2+b2+ 2abρ

and

ρ∗=Cov(U, V )

σUσV

=E(aXY +bY 2)

σUσV

=aρ +b

pa2+b2+abρ,

thus

(1 −ρ∗2) = 1 −a2ρ2+abρ +b2

a2+b2+ 2abρ =(1−ρ2)a2

a2+b2+ 2abρ =(1 −ρ2)a2

σ2

where ap1−ρ2=σUp1−ρ∗2. We can then write

fUV (u, v) = 1

2πσUσVp1−ρ∗2exp "−1

2p1−ρ∗2u2

σ2

U−2ρuv

σUσV

+v2

σ2

V#,

which is in the exact form of a bivariate normal distribution. Thus, by part a), Uis normal.

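4.46 a.

$$\begin{aligned}
EX &= a_XEZ_1 + b_XEZ_2 + Ec_X = a_X\cdot0 + b_X\cdot0 + c_X = c_X\\
\mathrm{Var}X &= a_X^2\mathrm{Var}Z_1 + b_X^2\mathrm{Var}Z_2 + \mathrm{Var}\,c_X = a_X^2 + b_X^2\\
EY &= a_Y\cdot0 + b_Y\cdot0 + c_Y = c_Y\\
\mathrm{Var}Y &= a_Y^2\mathrm{Var}Z_1 + b_Y^2\mathrm{Var}Z_2 + \mathrm{Var}\,c_Y = a_Y^2 + b_Y^2\\
\mathrm{Cov}(X,Y) &= EXY - EX\cdot EY\\
&= E\left[(a_Xa_YZ_1^2 + b_Xb_YZ_2^2 + c_Xc_Y + a_Xb_YZ_1Z_2 + a_Xc_YZ_1 + b_Xa_YZ_2Z_1 + b_Xc_YZ_2 + c_Xa_YZ_1 + c_Xb_YZ_2) - c_Xc_Y\right]\\
&= a_Xa_Y + b_Xb_Y,
\end{aligned}$$

since $EZ_1^2 = EZ_2^2 = 1$, and expectations of other terms are all zero.

b. Simply plug the expressions for $a_X$, $b_X$, etc. into the equalities in a) and simplify.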

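c. Let $D = a_Xb_Y - a_Yb_X = -\sqrt{1-\rho^2}\,\sigma_X\sigma_Y$ and solve for $Z_1$ and $Z_2$:

$$Z_1 = \frac{b_Y(X-c_X) - b_X(Y-c_Y)}{D} = \frac{\sigma_Y(X-\mu_X) + \sigma_X(Y-\mu_Y)}{\sqrt{2(1+\rho)}\,\sigma_X\sigma_Y},$$

$$Z_2 = \frac{a_X(Y-c_Y) - a_Y(X-c_X)}{D} = \frac{\sigma_Y(X-\mu_X) - \sigma_X(Y-\mu_Y)}{\sqrt{2(1-\rho)}\,\sigma_X\sigma_Y}.$$

Then the Jacobian is

$$J = \det\begin{pmatrix}\frac{\partial z_1}{\partial x} & \frac{\partial z_1}{\partial y}\\[2pt] \frac{\partial z_2}{\partial x} & \frac{\partial z_2}{\partial y}\end{pmatrix} = \det\begin{pmatrix}\frac{b_Y}{D} & -\frac{b_X}{D}\\[2pt] -\frac{a_Y}{D} & \frac{a_X}{D}\end{pmatrix} = \frac{a_Xb_Y}{D^2} - \frac{a_Yb_X}{D^2} = \frac1D = \frac{1}{-\sqrt{1-\rho^2}\,\sigma_X\sigma_Y},$$

and we have that

$$\begin{aligned}
f_{X,Y}(x,y) &= \frac{1}{\sqrt{2\pi}}e^{-\frac12\frac{(\sigma_Y(x-\mu_X)+\sigma_X(y-\mu_Y))^2}{2(1+\rho)\sigma_X^2\sigma_Y^2}}\;\frac{1}{\sqrt{2\pi}}e^{-\frac12\frac{(\sigma_Y(x-\mu_X)-\sigma_X(y-\mu_Y))^2}{2(1-\rho)\sigma_X^2\sigma_Y^2}}\;\frac{1}{\sqrt{1-\rho^2}\,\sigma_X\sigma_Y}\\
&= \left(2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}\right)^{-1}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\frac{x-\mu_X}{\sigma_X}\frac{y-\mu_Y}{\sigma_Y} + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\},
\end{aligned}$$

for $-\infty < x < \infty$, $-\infty < y < \infty$, a bivariate normal pdf.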

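d. Another solution is

$$a_X = \rho\sigma_X,\quad b_X = \sqrt{1-\rho^2}\,\sigma_X,\quad a_Y = \sigma_Y,\quad b_Y = 0,\quad c_X = \mu_X,\quad c_Y = \mu_Y.$$

There are an infinite number of solutions. Write $b_X = \pm\sqrt{\sigma_X^2-a_X^2}$ and $b_Y = \pm\sqrt{\sigma_Y^2-a_Y^2}$, and substitute $b_X$, $b_Y$ into $a_Xa_Y + b_Xb_Y = \rho\sigma_X\sigma_Y$. We get

$$a_Xa_Y + \left(\pm\sqrt{\sigma_X^2-a_X^2}\right)\left(\pm\sqrt{\sigma_Y^2-a_Y^2}\right) = \rho\sigma_X\sigma_Y.$$

Move $a_Xa_Y$ to the right-hand side, square both sides, and simplify to get

$$(1-\rho^2)\sigma_X^2\sigma_Y^2 = \sigma_X^2a_Y^2 - 2\rho\sigma_X\sigma_Ya_Xa_Y + \sigma_Y^2a_X^2.$$

This is an ellipse for $\rho \ne \pm1$ and a line for $\rho = \pm1$. In either case there are an infinite number of points satisfying the equations.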
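The moment conditions in part a) are easy to check numerically for any proposed coefficients. The Python sketch below evaluates $EX$, $EY$, $\mathrm{Var}X$, $\mathrm{Var}Y$ and $\mathrm{Cov}(X,Y)$ from the formulas in a) and tests the solution of part d). For part b) it assumes the coefficients $a_X = \sigma_X\sqrt{(1+\rho)/2}$, $b_X = \sigma_X\sqrt{(1-\rho)/2}$, $a_Y = \sigma_Y\sqrt{(1+\rho)/2}$, $b_Y = -\sigma_Y\sqrt{(1-\rho)/2}$. That choice is consistent with $D = -\sqrt{1-\rho^2}\,\sigma_X\sigma_Y$ and the expression for $Z_1$ in part c). All helper names are hypothetical.

```python
import math


def implied_moments(a_x, b_x, c_x, a_y, b_y, c_y):
    """Moments of X = a_x Z1 + b_x Z2 + c_x and Y = a_y Z1 + b_y Z2 + c_y
    for independent n(0,1) variables Z1, Z2 (formulas of part a)."""
    return {
        "EX": c_x,
        "EY": c_y,
        "VarX": a_x**2 + b_x**2,
        "VarY": a_y**2 + b_y**2,
        "Cov": a_x * a_y + b_x * b_y,
    }


def coefficients_b(mu_x, mu_y, sd_x, sd_y, rho):
    """Assumed part b) coefficients (see the lead-in)."""
    s_plus = math.sqrt((1 + rho) / 2)
    s_minus = math.sqrt((1 - rho) / 2)
    return sd_x * s_plus, sd_x * s_minus, mu_x, sd_y * s_plus, -sd_y * s_minus, mu_y


def coefficients_d(mu_x, mu_y, sd_x, sd_y, rho):
    """The alternative solution given in part d)."""
    return rho * sd_x, math.sqrt(1 - rho**2) * sd_x, mu_x, sd_y, 0.0, mu_y


def matches_target(moments, mu_x, mu_y, sd_x, sd_y, rho, tol=1e-12):
    """True if the moments equal those of the target bivariate normal."""
    target = {"EX": mu_x, "EY": mu_y, "VarX": sd_x**2, "VarY": sd_y**2,
              "Cov": rho * sd_x * sd_y}
    return all(abs(moments[k] - target[k]) <= tol for k in target)
```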

4.47 a. By definition of $Z$, for $z < 0$,

$$\begin{aligned}
P(Z \le z) &= P(X \le z\text{ and }XY > 0) + P(-X \le z\text{ and }XY < 0)\\
&= P(X \le z\text{ and }Y < 0) + P(X \ge -z\text{ and }Y < 0) &&(\text{since } z < 0)\\
&= P(X \le z)P(Y < 0) + P(X \ge -z)P(Y < 0) &&(\text{independence})\\
&= P(X \le z)P(Y < 0) + P(X \le z)P(Y > 0) &&(\text{symmetry of } X \text{ and } Y)\\
&= P(X \le z)\left(P(Y < 0) + P(Y > 0)\right)\\
&= P(X \le z).
\end{aligned}$$

By a similar argument, for $z > 0$, we get $P(Z > z) = P(X > z)$, and hence, $P(Z \le z) = P(X \le z)$. Thus, $Z \sim X \sim n(0,1)$.

b. By definition of $Z$, $Z > 0 \Leftrightarrow$ either (i) $X < 0$ and $Y > 0$ or (ii) $X > 0$ and $Y > 0$. So $Z$ and $Y$ always have the same sign, hence they cannot be bivariate normal.


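4.49 a.

$$\begin{aligned}
f_X(x) &= \int\left(af_1(x)g_1(y) + (1-a)f_2(x)g_2(y)\right)dy\\
&= af_1(x)\int g_1(y)\,dy + (1-a)f_2(x)\int g_2(y)\,dy\\
&= af_1(x) + (1-a)f_2(x).
\end{aligned}$$

$$\begin{aligned}
f_Y(y) &= \int\left(af_1(x)g_1(y) + (1-a)f_2(x)g_2(y)\right)dx\\
&= ag_1(y)\int f_1(x)\,dx + (1-a)g_2(y)\int f_2(x)\,dx\\
&= ag_1(y) + (1-a)g_2(y).
\end{aligned}$$

b. ($\Rightarrow$) If $X$ and $Y$ are independent then $f(x,y) = f_X(x)f_Y(y)$. Then,

$$\begin{aligned}
f(x,y) - f_X(x)f_Y(y) &= af_1(x)g_1(y) + (1-a)f_2(x)g_2(y) - [af_1(x)+(1-a)f_2(x)][ag_1(y)+(1-a)g_2(y)]\\
&= a(1-a)[f_1(x)g_1(y) - f_1(x)g_2(y) - f_2(x)g_1(y) + f_2(x)g_2(y)]\\
&= a(1-a)[f_1(x)-f_2(x)][g_1(y)-g_2(y)]\\
&= 0.
\end{aligned}$$

Thus $[f_1(x)-f_2(x)][g_1(y)-g_2(y)] = 0$ since $0 < a < 1$.

($\Leftarrow$) If $[f_1(x)-f_2(x)][g_1(y)-g_2(y)] = 0$ then

$$f_1(x)g_1(y) + f_2(x)g_2(y) = f_1(x)g_2(y) + f_2(x)g_1(y).$$

Therefore

$$\begin{aligned}
f_X(x)f_Y(y) &= a^2f_1(x)g_1(y) + a(1-a)f_1(x)g_2(y) + a(1-a)f_2(x)g_1(y) + (1-a)^2f_2(x)g_2(y)\\
&= a^2f_1(x)g_1(y) + a(1-a)[f_1(x)g_2(y) + f_2(x)g_1(y)] + (1-a)^2f_2(x)g_2(y)\\
&= a^2f_1(x)g_1(y) + a(1-a)[f_1(x)g_1(y) + f_2(x)g_2(y)] + (1-a)^2f_2(x)g_2(y)\\
&= af_1(x)g_1(y) + (1-a)f_2(x)g_2(y) = f(x,y).
\end{aligned}$$

Thus $X$ and $Y$ are independent.

c. Let $\mu_i$ and $\xi_i$ denote the means of $f_i$ and $g_i$, $i = 1,2$. Then

$$\begin{aligned}
\mathrm{Cov}(X,Y) &= a\mu_1\xi_1 + (1-a)\mu_2\xi_2 - [a\mu_1+(1-a)\mu_2][a\xi_1+(1-a)\xi_2]\\
&= a(1-a)[\mu_1\xi_1 - \mu_1\xi_2 - \mu_2\xi_1 + \mu_2\xi_2]\\
&= a(1-a)[\mu_1-\mu_2][\xi_1-\xi_2].
\end{aligned}$$

To construct dependent uncorrelated random variables, let $(X,Y) \sim af_1(x)g_1(y) + (1-a)f_2(x)g_2(y)$, where $f_1, f_2, g_1, g_2$ are such that $f_1-f_2 \ne 0$ and $g_1-g_2 \ne 0$, with $\mu_1 = \mu_2$ or $\xi_1 = \xi_2$.

d. (i) $f_1 \sim$ binomial$(n,p)$, $f_2 \sim$ binomial$(n,p)$, $g_1 \sim$ binomial$(n,p)$, $g_2 \sim$ binomial$(n,1-p)$.

(ii) $f_1 \sim$ binomial$(n,p_1)$, $f_2 \sim$ binomial$(n,p_2)$, $g_1 \sim$ binomial$(n,p_1)$, $g_2 \sim$ binomial$(n,p_2)$.

(iii) $f_1 \sim$ binomial$(n_1,\frac{p}{n_1})$, $f_2 \sim$ binomial$(n_2,\frac{p}{n_2})$, $g_1 \sim$ binomial$(n_1,p)$, $g_2 \sim$ binomial$(n_2,p)$.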


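4.51 a. For $0 < t \le 1$, $P(X/Y \le t) = P(X \le tY) = \int_0^1 ty\,dy = t/2$. For $t > 1$, $P(X/Y \le t) = 1 - P(Y < X/t) = 1 - \frac{1}{2t}$. So

$$P(X/Y \le t) = \begin{cases}\frac t2 & 0 < t \le 1\\[2pt] 1 - \frac{1}{2t} & t > 1,\end{cases}\qquad\qquad P(XY \le t) = t - t\log t,\quad 0 < t < 1.$$

$$P(XY/Z \le t) = \int_0^1 P(XY \le zt)\,dz = \begin{cases}\int_0^1\left(zt - zt\log(zt)\right)dz & t \le 1\\[2pt] \int_0^{1/t}\left(zt - zt\log(zt)\right)dz + \int_{1/t}^1 dz & t > 1\end{cases} = \begin{cases}\frac{3t}{4} - \frac t2\log t & 0 < t \le 1\\[2pt] 1 - \frac{1}{4t} & t > 1.\end{cases}$$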

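4.53

$$\begin{aligned}
P(\text{Real Roots}) &= P(B^2 > 4AC)\\
&= P(2\log B > \log4 + \log A + \log C)\\
&= P(-2\log B \le -\log4 - \log A - \log C)\\
&= P(-2\log B \le -\log4 + (-\log A - \log C)).
\end{aligned}$$

Let $X = -2\log B$ and $Y = -\log A - \log C$. Then $X \sim$ exponential(2) and $Y \sim$ gamma(2,1), independent, and

$$\begin{aligned}
P(\text{Real Roots}) &= P(X < -\log4 + Y)\\
&= \int_{\log4}^\infty P(X < -\log4 + y)f_Y(y)\,dy\\
&= \int_{\log4}^\infty\int_0^{-\log4+y}\frac12e^{-x/2}\,dx\;ye^{-y}\,dy\\
&= \int_{\log4}^\infty\left(1 - e^{\frac12\log4}e^{-y/2}\right)ye^{-y}\,dy = \int_{\log4}^\infty\left(1 - 2e^{-y/2}\right)ye^{-y}\,dy.
\end{aligned}$$

Integration by parts shows that $\int_a^\infty ye^{-y/b}\,dy = b(a+b)e^{-a/b}$. Using $b = 1$ and $b = 2/3$ with $a = \log4$, we get

$$P(\text{Real Roots}) = \frac14(1+\log4) - \frac16\left(\frac23 + \log4\right) = \frac{5}{36} + \frac{\log2}{6} \approx .254.$$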
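The closed form can be checked by simulation. The minimal Python sketch below draws independent uniform(0,1) values $A$, $B$, $C$ and counts how often $B^2 > 4AC$. It compares that fraction with $5/36 + (\log 2)/6$. The function names are illustrative.

```python
import math
import random


def real_roots_exact():
    """Closed form derived above: 5/36 + log(2)/6."""
    return 5 / 36 + math.log(2) / 6


def real_roots_monte_carlo(n=200_000, seed=2):
    """Fraction of draws with B^2 > 4AC for independent uniform(0,1) A, B, C."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        a, b, c = rng.random(), rng.random(), rng.random()
        if b * b > 4 * a * c:
            hits += 1
    return hits / n
```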

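4.54 Let $Y = \prod_{i=1}^nX_i$. Then $P(Y \le y) = P\left(\prod_{i=1}^nX_i \le y\right) = P\left(\sum_{i=1}^n-\log X_i \ge -\log y\right)$. Now, $-\log X_i \sim$ exponential(1) = gamma(1,1). By Example 4.6.8, $\sum_{i=1}^n-\log X_i \sim$ gamma$(n,1)$. Therefore,

$$P(Y \le y) = \int_{-\log y}^\infty\frac{1}{\Gamma(n)}z^{n-1}e^{-z}\,dz,$$

and

$$f_Y(y) = \frac{d}{dy}\int_{-\log y}^\infty\frac{1}{\Gamma(n)}z^{n-1}e^{-z}\,dz = -\frac{1}{\Gamma(n)}(-\log y)^{n-1}e^{-(-\log y)}\frac{d}{dy}(-\log y) = \frac{1}{\Gamma(n)}(-\log y)^{n-1},\quad 0 < y < 1.$$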


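4.55 Let $X_1$, $X_2$, $X_3$ be independent exponential$(\lambda)$ random variables, and let $Y = \max(X_1,X_2,X_3)$, the lifetime of the system. Then

$$\begin{aligned}
P(Y \le y) &= P(\max(X_1,X_2,X_3) \le y)\\
&= P(X_1 \le y\text{ and }X_2 \le y\text{ and }X_3 \le y)\\
&= P(X_1 \le y)P(X_2 \le y)P(X_3 \le y),
\end{aligned}$$

by the independence of $X_1$, $X_2$ and $X_3$. Now each probability is $P(X_1 \le y) = \int_0^y\frac1\lambda e^{-x/\lambda}\,dx = 1 - e^{-y/\lambda}$, so

$$P(Y \le y) = \left(1 - e^{-y/\lambda}\right)^3,\quad 0 < y < \infty,$$

and the pdf is

$$f_Y(y) = \begin{cases}\frac3\lambda\left(1 - e^{-y/\lambda}\right)^2e^{-y/\lambda} & y > 0\\ 0 & y \le 0.\end{cases}$$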

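4.57 a.

$$A_1 = \left[\frac1n\sum_{i=1}^n x_i^1\right]^{1/1} = \frac1n\sum_{i=1}^n x_i,\quad\text{the arithmetic mean.}$$

$$A_{-1} = \left[\frac1n\sum_{i=1}^n x_i^{-1}\right]^{-1} = \frac{1}{\frac1n\left(\frac{1}{x_1}+\cdots+\frac{1}{x_n}\right)},\quad\text{the harmonic mean.}$$

As $r \to 0$, both $\log\left[\frac1n\sum x_i^r\right]$ and $r$ tend to 0, so by L'Hôpital's rule

$$\lim_{r\to0}\log A_r = \lim_{r\to0}\frac{\log\left[\frac1n\sum_{i=1}^n x_i^r\right]}{r} = \lim_{r\to0}\frac{\frac1n\sum_{i=1}^n x_i^r\log x_i}{\frac1n\sum_{i=1}^n x_i^r} = \frac1n\sum_{i=1}^n\log x_i = \frac1n\log\left(\prod_{i=1}^n x_i\right).$$

Thus $A_0 = \lim_{r\to0}A_r = \exp\left(\frac1n\log\prod_{i=1}^nx_i\right) = \left(\prod_{i=1}^nx_i\right)^{1/n}$, the geometric mean. Here we used

$$\frac{d}{dr}x_i^r = \frac{d}{dr}\exp(r\log x_i) = \exp(r\log x_i)\log x_i = x_i^r\log x_i.$$

b. (i) If $\log A_r$ is nondecreasing, then for $r \le r'$, $\log A_r \le \log A_{r'}$, so $e^{\log A_r} \le e^{\log A_{r'}}$. Therefore $A_r \le A_{r'}$. Thus $A_r$ is nondecreasing in $r$.

(ii)

$$\frac{d}{dr}\log A_r = -\frac{1}{r^2}\log\left(\frac1n\sum_{i=1}^n x_i^r\right) + \frac1r\,\frac{\frac1n\sum_{i=1}^n x_i^r\log x_i}{\frac1n\sum_{i=1}^n x_i^r} = \frac{1}{r^2}\left[\frac{r\sum_{i=1}^n x_i^r\log x_i}{\sum_{i=1}^n x_i^r} - \log\left(\frac1n\sum_{i=1}^n x_i^r\right)\right],$$

where we use the identity for $\frac{d}{dr}x_i^r$ shown in a).

(iii) Let $a_i = x_i^r/\sum_{j=1}^n x_j^r$, so that $a_i > 0$ and $\sum_i a_i = 1$. Then

$$\begin{aligned}
\frac{r\sum_i x_i^r\log x_i}{\sum_i x_i^r} - \log\left(\frac1n\sum_i x_i^r\right)
&= \log n + \frac{r\sum_i x_i^r\log x_i}{\sum_i x_i^r} - \log\left(\sum_j x_j^r\right)\\
&= \log n + \sum_{i=1}^n\left[\frac{x_i^r}{\sum_j x_j^r}\,r\log x_i - \frac{x_i^r}{\sum_j x_j^r}\log\left(\sum_j x_j^r\right)\right]\\
&= \log n + \sum_{i=1}^n\frac{x_i^r}{\sum_j x_j^r}\left(r\log x_i - \log\sum_j x_j^r\right)\\
&= \log n - \sum_{i=1}^n\frac{x_i^r}{\sum_j x_j^r}\log\left(\frac{\sum_j x_j^r}{x_i^r}\right) = \log n - \sum_{i=1}^n a_i\log\left(\frac{1}{a_i}\right).
\end{aligned}$$

We need to prove that $\log n \ge \sum_{i=1}^n a_i\log(1/a_i)$. Regard $1/a$ as a random variable that takes the value $1/a_i$ with probability $a_i$. Since log is concave, Jensen's inequality gives

$$E\log\left(\frac1a\right) = \sum_{i=1}^n a_i\log\left(\frac{1}{a_i}\right) \le \log\left(E\frac1a\right) = \log\left(\sum_{i=1}^n a_i\frac{1}{a_i}\right) = \log n.$$

So the bracket in (ii) is nonnegative and $\log A_r$ is nondecreasing, which establishes the result.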
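The monotonicity of $A_r$ and the three special cases in part a) can be illustrated numerically. The minimal Python sketch below computes the power mean of positive numbers, with $r = 0$ handled as the geometric mean. It then checks on a small example that $A_r$ increases along a grid of $r$ values. The function name and the data are illustrative.

```python
import math


def power_mean(xs, r):
    """A_r = [(1/n) sum x_i^r]^(1/r) for positive x_i; r = 0 gives the geometric mean."""
    n = len(xs)
    if r == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x**r for x in xs) / n) ** (1 / r)
```

With `xs = [1.0, 2.0, 4.0, 8.0]` this gives the arithmetic mean 3.75 at $r = 1$, the harmonic mean $4/1.875$ at $r = -1$ and the geometric mean $2^{1.5}$ at $r = 0$. Values of $r$ near 0 give nearly the geometric mean.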

4.59 Assume that $EX = 0$, $EY = 0$, and $EZ = 0$. This can be done without loss of generality because we could work with the quantities $X - EX$, etc. By iterating the expectation we have

$$\mathrm{Cov}(X,Y) = EXY = E[E(XY|Z)].$$

Adding and subtracting $E(X|Z)E(Y|Z)$ gives

$$\mathrm{Cov}(X,Y) = E[E(XY|Z) - E(X|Z)E(Y|Z)] + E[E(X|Z)E(Y|Z)].$$

Since $E[E(X|Z)] = EX = 0$, the second term above is $\mathrm{Cov}[E(X|Z),E(Y|Z)]$. For the first term write

$$E[E(XY|Z) - E(X|Z)E(Y|Z)] = E\left[E\left\{XY - E(X|Z)E(Y|Z)\,\middle|\,Z\right\}\right],$$

where we have brought $E(X|Z)$ and $E(Y|Z)$ inside the conditional expectation. This can now be recognized as $E\,\mathrm{Cov}(X,Y|Z)$, establishing the identity.

4.61 a. To find the distribution of $f(X_1|Z)$, let $U = \frac{X_2-1}{X_1}$ and $V = X_1$. Then $x_2 = h_1(u,v) = uv+1$, $x_1 = h_2(u,v) = v$, and $|J| = v$. Therefore

$$f_{U,V}(u,v) = f_{X_1,X_2}(h_2(u,v),h_1(u,v))\,|J| = e^{-(uv+1)}e^{-v}v,$$

and, for $u \ge 0$,

$$f_U(u) = \int_0^\infty ve^{-(uv+1)}e^{-v}\,dv = \frac{e^{-1}}{(u+1)^2}.$$

Thus $V|U = 0$ has pdf $ve^{-v}$, $v > 0$. The distribution of $X_1|X_2$ is $e^{-x_1}$ since $X_1$ and $X_2$ are independent.

b. The following Mathematica code will draw the picture. The solid lines are the boundaries of $B_1$ and the dashed lines are the boundaries of $B_2$. Note that the solid lines spread apart as $x_1$ increases, while the dashed lines are constant. Thus $B_1$ is informative, as the range of $X_2$ changes with $x_1$.

e = 1/10;

Plot[{-e*x1 + 1, e*x1 + 1, 1 - e, 1 + e}, {x1, 0, 5},

PlotStyle -> {Dashing[{}], Dashing[{}],Dashing[{0.15, 0.05}],

Dashing[{0.15, 0.05}]}]

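$$P(X_1 \le x\,|\,B_1) = P(V \le v^*\,|\,-\epsilon < U < \epsilon) = \frac{\int_0^{v^*}\int_{-\epsilon}^{\epsilon}ve^{-(uv+1)}e^{-v}\,du\,dv}{\int_0^\infty\int_{-\epsilon}^{\epsilon}ve^{-(uv+1)}e^{-v}\,du\,dv} = \frac{e^{-1}\left[\frac{e^{-v^*(1+\epsilon)}}{1+\epsilon} - \frac{1}{1+\epsilon} - \frac{e^{-v^*(1-\epsilon)}}{1-\epsilon} + \frac{1}{1-\epsilon}\right]}{e^{-1}\left[-\frac{1}{1+\epsilon} + \frac{1}{1-\epsilon}\right]},$$

where $v^* = x$. Thus

$$\lim_{\epsilon\to0}P(X_1 \le x|B_1) = 1 - e^{-v^*} - v^*e^{-v^*} = \int_0^{v^*}ve^{-v}\,dv = P(V \le v^*|U = 0).$$

For $B_2 = \{1-\epsilon < x_2 < 1+\epsilon\}$,

$$P(X_1 \le x\,|\,B_2) = \frac{\int_0^x\int_{1-\epsilon}^{1+\epsilon}e^{-(x_1+x_2)}\,dx_2\,dx_1}{\int_{1-\epsilon}^{1+\epsilon}e^{-x_2}\,dx_2} = \frac{(1-e^{-x})\left(e^{-(1-\epsilon)} - e^{-(1+\epsilon)}\right)}{e^{-(1-\epsilon)} - e^{-(1+\epsilon)}} = 1 - e^{-x}.$$

Thus $\lim_{\epsilon\to0}P(X_1 \le x|B_2) = 1 - e^{-x} = \int_0^xe^{-x_1}\,dx_1 = P(X_1 \le x|X_2 = 1)$.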


4.63 Since $X = e^Z$ and $g(z) = e^z$ is convex, by Jensen's Inequality $EX = Eg(Z) \ge g(EZ) = e^0 = 1$. In fact, there is equality in Jensen's Inequality if and only if there is an interval $I$ with $P(Z \in I) = 1$ and $g(z)$ is linear on $I$. But $e^z$ is linear on an interval only if the interval is a single point. So $EX > 1$, unless $P(Z = EZ = 0) = 1$.

4.64 a. Let $a$ and $b$ be real numbers. Then,

$$|a+b|^2 = (a+b)(a+b) = a^2+2ab+b^2 \le |a|^2+2|ab|+|b|^2 = (|a|+|b|)^2.$$

Take the square root of both sides to get $|a+b| \le |a|+|b|$.

b. $|X+Y| \le |X|+|Y| \Rightarrow E|X+Y| \le E(|X|+|Y|) = E|X|+E|Y|$.

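4.65 Without loss of generality let us assume that $Eg(X) = Eh(X) = 0$. For part (a), where $g(x)$ is nondecreasing and $h(x)$ is nonincreasing,

$$\begin{aligned}
E(g(X)h(X)) &= \int_{-\infty}^\infty g(x)h(x)f_X(x)\,dx\\
&= \int_{\{x:h(x)\le0\}}g(x)h(x)f_X(x)\,dx + \int_{\{x:h(x)\ge0\}}g(x)h(x)f_X(x)\,dx\\
&\le g(x_0)\int_{\{x:h(x)\le0\}}h(x)f_X(x)\,dx + g(x_0)\int_{\{x:h(x)\ge0\}}h(x)f_X(x)\,dx\\
&= g(x_0)\int_{-\infty}^\infty h(x)f_X(x)\,dx\\
&= g(x_0)Eh(X) = 0,
\end{aligned}$$

where $x_0$ is the number such that $h(x_0) = 0$. Note that $g(x_0)$ is a minimum in $\{x: h(x) \le 0\}$ and a maximum in $\{x: h(x) \ge 0\}$, since $g(x)$ is nondecreasing and $h(x)$ is nonincreasing. For part (b), where $g(x)$ and $h(x)$ are both nondecreasing, $g(x_0)$ is a maximum in $\{x: h(x) \le 0\}$ and a minimum in $\{x: h(x) \ge 0\}$, so

$$\begin{aligned}
E(g(X)h(X)) &= \int_{-\infty}^\infty g(x)h(x)f_X(x)\,dx\\
&= \int_{\{x:h(x)\le0\}}g(x)h(x)f_X(x)\,dx + \int_{\{x:h(x)\ge0\}}g(x)h(x)f_X(x)\,dx\\
&\ge g(x_0)\int_{\{x:h(x)\le0\}}h(x)f_X(x)\,dx + g(x_0)\int_{\{x:h(x)\ge0\}}h(x)f_X(x)\,dx\\
&= g(x_0)\int_{-\infty}^\infty h(x)f_X(x)\,dx\\
&= g(x_0)Eh(X) = 0.
\end{aligned}$$

The case when $g(x)$ and $h(x)$ are both nonincreasing can be proved similarly.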

Chapter 5

Properties of a Random Sample

5.1 Let $X$ = # color blind people in a sample of size $n$. Then $X \sim$ binomial$(n,p)$, where $p = .01$. The probability that a sample contains a color blind person is $P(X > 0) = 1 - P(X = 0)$, where $P(X = 0) = \binom n0(.01)^0(.99)^n = .99^n$. Thus,

$$P(X > 0) = 1 - .99^n > .95 \Leftrightarrow n > \log(.05)/\log(.99) \approx 298.07,$$

so a sample of size $n = 299$ is needed.
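The sample-size calculation can be confirmed by a direct search. The minimal Python sketch below finds the smallest $n$ with $1 - (1-p)^n > .95$. The function name and its default arguments are illustrative.

```python
import math


def smallest_sample_size(p=0.01, target=0.95):
    """Smallest n with P(at least one color blind person) = 1 - (1 - p)^n > target."""
    n = 1
    while 1 - (1 - p) ** n <= target:
        n += 1
    return n
```

The search agrees with rounding $\log(.05)/\log(.99) \approx 298.07$ up to the next integer.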

5.3 Note that $Y_i \sim$ Bernoulli with $p_i = P(X_i \ge \mu) = 1 - F(\mu)$ for each $i$. Since the $Y_i$'s are iid Bernoulli, $\sum_{i=1}^nY_i \sim$ binomial$(n, p = 1 - F(\mu))$.

5.5 Let $Y = X_1 + \cdots + X_n$. Then $\bar X = (1/n)Y$, a scale transformation. Therefore the pdf of $\bar X$ is

$$f_{\bar X}(x) = \frac{1}{1/n}f_Y\left(\frac{x}{1/n}\right) = nf_Y(nx).$$

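5.6 a. For $Z = X - Y$, set $W = X$. Then $Y = W - Z$, $X = W$, and $|J| = \left|\det\begin{pmatrix}0 & 1\\ -1 & 1\end{pmatrix}\right| = 1$. Then $f_{Z,W}(z,w) = f_X(w)f_Y(w-z)\cdot1$, thus $f_Z(z) = \int_{-\infty}^\infty f_X(w)f_Y(w-z)\,dw$.

b. For $Z = XY$, set $W = X$. Then $Y = Z/W$ and $J = \det\begin{pmatrix}0 & 1\\ 1/w & -z/w^2\end{pmatrix} = -1/w$. Then $f_{Z,W}(z,w) = f_X(w)f_Y(z/w)\cdot|-1/w|$, thus $f_Z(z) = \int_{-\infty}^\infty|-1/w|\,f_X(w)f_Y(z/w)\,dw$.

c. For $Z = X/Y$, set $W = X$. Then $Y = W/Z$ and $J = \det\begin{pmatrix}0 & 1\\ -w/z^2 & 1/z\end{pmatrix} = w/z^2$. Then $f_{Z,W}(z,w) = f_X(w)f_Y(w/z)\cdot|w/z^2|$, thus $f_Z(z) = \int_{-\infty}^\infty|w/z^2|\,f_X(w)f_Y(w/z)\,dw$.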

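5.7 It is, perhaps, easiest to recover the constants by doing the integrations. We have

$$\int_{-\infty}^\infty\frac{B}{1+\left(\frac\omega\sigma\right)^2}\,d\omega = \sigma\pi B,\qquad\int_{-\infty}^\infty\frac{D}{1+\left(\frac{\omega-z}{\tau}\right)^2}\,d\omega = \tau\pi D$$

and

$$\begin{aligned}
\int_{-\infty}^\infty\left[\frac{A\omega}{1+\left(\frac\omega\sigma\right)^2} - \frac{C\omega}{1+\left(\frac{\omega-z}{\tau}\right)^2}\right]d\omega
&= \int_{-\infty}^\infty\left[\frac{A\omega}{1+\left(\frac\omega\sigma\right)^2} - \frac{C(\omega-z)}{1+\left(\frac{\omega-z}{\tau}\right)^2}\right]d\omega - Cz\int_{-\infty}^\infty\frac{d\omega}{1+\left(\frac{\omega-z}{\tau}\right)^2}\\
&= \left[\frac{A\sigma^2}{2}\log\left(1+\left(\frac\omega\sigma\right)^2\right) - \frac{C\tau^2}{2}\log\left(1+\left(\frac{\omega-z}{\tau}\right)^2\right)\right]_{-\infty}^{\infty} - \tau\pi Cz.
\end{aligned}$$

The bracketed term is finite only if $A\sigma^2 = C\tau^2$, and it is then equal to zero. That is, $A = M/\sigma^2$ and $C = M/\tau^2$ for some constant $M$, which may depend on $z$. Hence

$$f_Z(z) = \frac{1}{\pi^2\sigma\tau}\left[\sigma\pi B - \tau\pi D - \frac{\pi Mz}{\tau}\right].$$

Matching the coefficients of $\omega^3$, $\omega^2$, $\omega$ and $1$ in the partial fraction identity gives, for $z \ne 0$,

$$M = \frac{2z\sigma^2\tau^2}{\left[z^2+(\sigma+\tau)^2\right]\left[z^2+(\sigma-\tau)^2\right]},\qquad B = \frac{M(\tau^2-\sigma^2+z^2)}{2z\sigma^2},\qquad D = \frac{M(\tau^2-\sigma^2-3z^2)}{2z\tau^2}.$$

Substituting these constants gives

$$f_Z(z) = \frac{1}{\pi(\sigma+\tau)}\,\frac{1}{1+\left(z/(\sigma+\tau)\right)^2},$$

the Cauchy$(0,\sigma+\tau)$ pdf.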


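5.8 a.

$$\begin{aligned}
\frac{1}{2n(n-1)}\sum_{i=1}^n\sum_{j=1}^n(X_i-X_j)^2
&= \frac{1}{2n(n-1)}\sum_{i=1}^n\sum_{j=1}^n(X_i-\bar X+\bar X-X_j)^2\\
&= \frac{1}{2n(n-1)}\sum_{i=1}^n\sum_{j=1}^n\left[(X_i-\bar X)^2 - 2(X_i-\bar X)(X_j-\bar X) + (X_j-\bar X)^2\right]\\
&= \frac{1}{2n(n-1)}\left[\sum_{i=1}^n n(X_i-\bar X)^2 - 2\sum_{i=1}^n(X_i-\bar X)\underbrace{\sum_{j=1}^n(X_j-\bar X)}_{=0} + n\sum_{j=1}^n(X_j-\bar X)^2\right]\\
&= \frac{n}{2n(n-1)}\sum_{i=1}^n(X_i-\bar X)^2 + \frac{n}{2n(n-1)}\sum_{j=1}^n(X_j-\bar X)^2\\
&= \frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2 = S^2.
\end{aligned}$$

b. Although all of the calculations here are straightforward, there is a tedious amount of bookkeeping needed. It seems that induction is the easiest route. (Note: without loss of generality we can assume $\theta_1 = 0$, so $EX_i = 0$.)

(i) Prove the equation for $n = 4$. We have $S^2 = \frac{1}{24}\sum_{i=1}^4\sum_{j=1}^4(X_i-X_j)^2$, and to calculate $\mathrm{Var}(S^2)$ we need $E(S^2)^2$ and $E(S^2)$. The latter expectation is straightforward: $E\sum_i\sum_j(X_i-X_j)^2 = 24\theta_2$, so $E(S^2) = \theta_2$. The expectation $E\left[\sum_i\sum_j(X_i-X_j)^2\right]^2$ contains $256\ (=4^4)$ terms. Of these, $112\ (=4\times16+4\times16-4^2)$ are zero, namely those with $i = j$ or $k = \ell$. Of the remaining 144 terms,

• 24 are of the form $E(X_i-X_j)^4 = 2(\theta_4+3\theta_2^2)$,

• 96 are of the form $E(X_i-X_j)^2(X_i-X_k)^2 = \theta_4+3\theta_2^2$,

• 24 are of the form $E(X_i-X_j)^2(X_k-X_\ell)^2 = 4\theta_2^2$.

Thus,

$$\mathrm{Var}(S^2) = \frac{1}{24^2}\left[24\times2(\theta_4+3\theta_2^2) + 96(\theta_4+3\theta_2^2) + 24\times4\theta_2^2 - (24\theta_2)^2\right] = \frac14\left(\theta_4 - \frac13\theta_2^2\right).$$

(ii) Assume that the formula holds for $n$, and establish it for $n+1$. (Let $S_n^2$ denote the variance based on $n$ observations.) Straightforward algebra will establish

$$S_{n+1}^2 = \frac{1}{2n(n+1)}\left[\sum_{i=1}^n\sum_{j=1}^n(X_i-X_j)^2 + 2\sum_{k=1}^n(X_k-X_{n+1})^2\right] \stackrel{\text{def'n}}{=} \frac{1}{2n(n+1)}[A+2B],$$

where

$$\begin{aligned}
\mathrm{Var}(A) &= 4n(n-1)^2\left[\theta_4 - \frac{n-3}{n-1}\theta_2^2\right] &&(\text{induction hypothesis})\\
\mathrm{Var}(B) &= n(n+1)\theta_4 - n(n-3)\theta_2^2 &&(X_k\text{ and }X_{n+1}\text{ are independent})\\
\mathrm{Cov}(A,B) &= 2n(n-1)\left[\theta_4 - \theta_2^2\right] &&(\text{some minor bookkeeping needed})
\end{aligned}$$

Hence,

$$\mathrm{Var}(S_{n+1}^2) = \frac{1}{4n^2(n+1)^2}\left[\mathrm{Var}(A) + 4\mathrm{Var}(B) + 4\mathrm{Cov}(A,B)\right] = \frac{1}{n+1}\left[\theta_4 - \frac{n-2}{n}\theta_2^2\right],$$

establishing the induction and verifying the result.

c. Again assume that $\theta_1 = 0$. Then

$$\mathrm{Cov}(\bar X,S^2) = \frac{1}{2n^2(n-1)}E\left[\sum_{k=1}^nX_k\sum_{i=1}^n\sum_{j=1}^n(X_i-X_j)^2\right].$$

The double sum over $i$ and $j$ has $n(n-1)$ nonzero terms. For each of these, the entire expectation is nonzero for only two values of $k$ (when $k$ matches either $i$ or $j$). Thus

$$\mathrm{Cov}(\bar X,S^2) = \frac{2n(n-1)}{2n^2(n-1)}EX_i(X_i-X_j)^2 = \frac1n\theta_3,$$

and $\bar X$ and $S^2$ are uncorrelated if $\theta_3 = 0$.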
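The identity in part a) gives a pairwise-difference formula for the sample variance. The minimal Python sketch below computes $S^2$ that way and compares it with the usual formula from the standard `statistics` module. The function name and the data are illustrative.

```python
from itertools import product
from statistics import variance


def pairwise_variance(xs):
    """S^2 computed as sum_{i,j} (x_i - x_j)^2 / (2 n (n - 1)), the identity of part a)."""
    n = len(xs)
    total = sum((xi - xj) ** 2 for xi, xj in product(xs, repeat=2))
    return total / (2 * n * (n - 1))
```

For example, `pairwise_variance([2.0, 3.5, 7.0, 1.0, 4.5])` agrees with `variance([2.0, 3.5, 7.0, 1.0, 4.5])` up to rounding error.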

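5.9 To establish the Lagrange Identity consider the case when $n = 2$,

$$\begin{aligned}
(a_1b_2-a_2b_1)^2 &= a_1^2b_2^2 + a_2^2b_1^2 - 2a_1b_2a_2b_1\\
&= a_1^2b_2^2 + a_2^2b_1^2 - 2a_1b_2a_2b_1 + a_1^2b_1^2 + a_2^2b_2^2 - a_1^2b_1^2 - a_2^2b_2^2\\
&= (a_1^2+a_2^2)(b_1^2+b_2^2) - (a_1b_1+a_2b_2)^2.
\end{aligned}$$

Assume that it is true for $n$. Then

$$\begin{aligned}
&\left(\sum_{i=1}^{n+1}a_i^2\right)\left(\sum_{i=1}^{n+1}b_i^2\right) - \left(\sum_{i=1}^{n+1}a_ib_i\right)^2\\
&= \left(\sum_{i=1}^na_i^2 + a_{n+1}^2\right)\left(\sum_{i=1}^nb_i^2 + b_{n+1}^2\right) - \left(\sum_{i=1}^na_ib_i + a_{n+1}b_{n+1}\right)^2\\
&= \left(\sum_{i=1}^na_i^2\right)\left(\sum_{i=1}^nb_i^2\right) - \left(\sum_{i=1}^na_ib_i\right)^2 + \left(\sum_{i=1}^na_i^2\right)b_{n+1}^2 + a_{n+1}^2\left(\sum_{i=1}^nb_i^2\right) - 2\left(\sum_{i=1}^na_ib_i\right)a_{n+1}b_{n+1}\\
&= \sum_{i=1}^{n-1}\sum_{j=i+1}^n(a_ib_j-a_jb_i)^2 + \sum_{i=1}^n(a_ib_{n+1}-a_{n+1}b_i)^2\\
&= \sum_{i=1}^n\sum_{j=i+1}^{n+1}(a_ib_j-a_jb_i)^2.
\end{aligned}$$

If all the points lie on a straight line, then $Y_i-\mu_y = c(X_i-\mu_x)$ for some constant $c \ne 0$. Let $b_i = Y_i-\mu_y$ and $a_i = X_i-\mu_x$. Then $b_i = ca_i$, so every term $a_ib_j-a_jb_i = c(a_ia_j-a_ja_i) = 0$, and the right-hand side of the identity is zero. Thus $\left(\sum a_ib_i\right)^2 = \left(\sum a_i^2\right)\left(\sum b_i^2\right)$, so the squared correlation coefficient is equal to 1. The correlation is $+1$ or $-1$ according to the sign of $c$.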
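Lagrange's identity can be checked exactly on integer data. The minimal Python sketch below computes both sides. It also shows that the right-hand side vanishes for collinear data such as $b = 3a$. The function name and the example vectors are illustrative.

```python
def lagrange_sides(a, b):
    """Return (lhs, rhs) of Lagrange's identity for equal-length sequences a, b:
    (sum a_i^2)(sum b_i^2) - (sum a_i b_i)^2 == sum_{i<j} (a_i b_j - a_j b_i)^2."""
    lhs = sum(x * x for x in a) * sum(y * y for y in b) - sum(x * y for x, y in zip(a, b)) ** 2
    rhs = sum((a[i] * b[j] - a[j] * b[i]) ** 2
              for i in range(len(a)) for j in range(i + 1, len(a)))
    return lhs, rhs
```

With integer inputs the arithmetic is exact, so both sides can be compared with `==`.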

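5.10 a.

$$\begin{aligned}
\theta_1 &= EX_i = \mu\\
\theta_2 &= E(X_i-\mu)^2 = \sigma^2\\
\theta_3 &= E(X_i-\mu)^3 = E(X_i-\mu)^2(X_i-\mu) &&(\text{Stein's lemma: } Eg(X)(X-\theta) = \sigma^2Eg'(X))\\
&= 2\sigma^2E(X_i-\mu) = 0\\
\theta_4 &= E(X_i-\mu)^4 = E(X_i-\mu)^3(X_i-\mu) = 3\sigma^2E(X_i-\mu)^2 = 3\sigma^4.
\end{aligned}$$

b. $\mathrm{Var}\,S^2 = \frac1n\left(\theta_4 - \frac{n-3}{n-1}\theta_2^2\right) = \frac1n\left(3\sigma^4 - \frac{n-3}{n-1}\sigma^4\right) = \frac{2\sigma^4}{n-1}$.

c. Use the fact that $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ and $\mathrm{Var}\,\chi^2_{n-1} = 2(n-1)$ to get

$$\mathrm{Var}\left(\frac{(n-1)S^2}{\sigma^2}\right) = 2(n-1),$$

which implies $\frac{(n-1)^2}{\sigma^4}\mathrm{Var}\,S^2 = 2(n-1)$ and hence

$$\mathrm{Var}\,S^2 = \frac{2(n-1)}{(n-1)^2/\sigma^4} = \frac{2\sigma^4}{n-1}.$$

Remark: Another approach, not using the $\chi^2$ distribution, is to use linear model theory. For $X \sim n(\theta,\sigma^2I)$ and any symmetric matrix $A$, $\mathrm{Var}(X'AX) = 2\mu_2^2\,\mathrm{tr}A^2 + 4\mu_2\theta'A^2\theta$, where $\mu_2 = \sigma^2$ and $\theta = EX = \mu\mathbf 1$. Write $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2 = \frac{1}{n-1}X'(I-\bar J_n)X$, where

$$A = I-\bar J_n = \begin{pmatrix}1-\frac1n & -\frac1n & \cdots & -\frac1n\\ -\frac1n & 1-\frac1n & \cdots & -\frac1n\\ \vdots & & \ddots & \vdots\\ -\frac1n & \cdots & \cdots & 1-\frac1n\end{pmatrix}.$$

Notice that $\mathrm{tr}A^2 = \mathrm{tr}A = n-1$ and $A\theta = 0$. So

$$\mathrm{Var}\,S^2 = \frac{1}{(n-1)^2}\mathrm{Var}(X'AX) = \frac{1}{(n-1)^2}\left(2\sigma^4(n-1) + 0\right) = \frac{2\sigma^4}{n-1}.$$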

5.11 Let $g(s) = s^2$. Since $g(\cdot)$ is a convex function, we know from Jensen's inequality that $Eg(S) \ge g(ES)$, which implies $\sigma^2 = ES^2 \ge (ES)^2$. Taking square roots, $\sigma \ge ES$. From the proof of Jensen's Inequality, it is clear that, in fact, the inequality will be strict unless there is an interval $I$ such that $g$ is linear on $I$ and $P(S \in I) = 1$. Since $s^2$ is "linear" only on single points, we have $ET^2 > (ET)^2$ for any random variable $T$, unless $P(T = ET) = 1$.

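5.13

$$E\,c\sqrt{S^2} = c\sqrt{\frac{\sigma^2}{n-1}}\,E\sqrt{\frac{S^2(n-1)}{\sigma^2}} = c\sqrt{\frac{\sigma^2}{n-1}}\int_0^\infty\sqrt q\,\frac{1}{\Gamma\left(\frac{n-1}{2}\right)2^{(n-1)/2}}q^{\frac{n-1}{2}-1}e^{-q/2}\,dq,$$

since $\sqrt{S^2(n-1)/\sigma^2}$ is the square root of a $\chi^2_{n-1}$ random variable. Now adjust the integrand to be another $\chi^2$ pdf and get

$$E\,c\sqrt{S^2} = c\sqrt{\frac{\sigma^2}{n-1}}\cdot\frac{\Gamma(n/2)2^{n/2}}{\Gamma((n-1)/2)2^{(n-1)/2}}\underbrace{\int_0^\infty\frac{1}{\Gamma(n/2)2^{n/2}}q^{\frac n2-1}e^{-q/2}\,dq}_{=1\text{ since }\chi^2_n\text{ pdf}}.$$

So $c = \frac{\Gamma\left(\frac{n-1}{2}\right)\sqrt{n-1}}{\sqrt2\,\Gamma\left(\frac n2\right)}$ gives $E(cS) = \sigma$.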
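The constant $c$ is easy to evaluate and check. The minimal Python sketch below computes $c$ on the log-gamma scale to avoid overflow for large $n$. It then estimates $E(cS)$ by simulating normal samples, which should be close to $\sigma$. The function names, the sample size $n = 5$ and the number of replications are illustrative assumptions.

```python
import math
import random


def unbiasing_constant(n):
    """c = Gamma((n-1)/2) sqrt(n-1) / (sqrt(2) Gamma(n/2)), computed via lgamma."""
    return math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2)) * math.sqrt((n - 1) / 2)


def mean_of_cS(n, sigma, reps=40_000, seed=4):
    """Monte Carlo estimate of E(cS) for n(0, sigma^2) samples of size n."""
    rng = random.Random(seed)
    c = unbiasing_constant(n)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample sd S
        total += c * s
    return total / reps
```

For $n = 2$ the formula reduces to $c = \sqrt{\pi/2}$, and $c$ decreases toward 1 as $n$ grows.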


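5.15 a.

$$\bar X_{n+1} = \frac{\sum_{i=1}^{n+1}X_i}{n+1} = \frac{X_{n+1}+\sum_{i=1}^nX_i}{n+1} = \frac{X_{n+1}+n\bar X_n}{n+1}.$$

b.

$$\begin{aligned}
nS_{n+1}^2 &= \frac{n}{(n+1)-1}\sum_{i=1}^{n+1}\left(X_i-\bar X_{n+1}\right)^2\\
&= \sum_{i=1}^{n+1}\left(X_i - \frac{X_{n+1}+n\bar X_n}{n+1}\right)^2 &&(\text{use (a)})\\
&= \sum_{i=1}^{n+1}\left(X_i - \frac{X_{n+1}}{n+1} - \frac{n\bar X_n}{n+1}\right)^2\\
&= \sum_{i=1}^{n+1}\left((X_i-\bar X_n) - \frac{X_{n+1}-\bar X_n}{n+1}\right)^2 &&(\pm\bar X_n)\\
&= \sum_{i=1}^{n+1}\left[(X_i-\bar X_n)^2 - 2(X_i-\bar X_n)\frac{X_{n+1}-\bar X_n}{n+1} + \frac{1}{(n+1)^2}(X_{n+1}-\bar X_n)^2\right]\\
&= \sum_{i=1}^n(X_i-\bar X_n)^2 + (X_{n+1}-\bar X_n)^2 - \frac{2(X_{n+1}-\bar X_n)^2}{n+1} + \frac{n+1}{(n+1)^2}(X_{n+1}-\bar X_n)^2 &&\left(\text{since }\sum_{i=1}^n(X_i-\bar X_n) = 0\right)\\
&= (n-1)S_n^2 + \frac{n}{n+1}(X_{n+1}-\bar X_n)^2.
\end{aligned}$$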

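5.16 a. $\sum_{i=1}^3\left(\frac{X_i-i}{i}\right)^2 \sim \chi^2_3$

b. $\left(\frac{X_1-1}{1}\right)\Big/\sqrt{\sum_{i=2}^3\left(\frac{X_i-i}{i}\right)^2\Big/2} \sim t_2$

c. Square the random variable in part b). The result has an $F_{1,2}$ distribution.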

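5.17 a. Let $U \sim \chi^2_p$ and $V \sim \chi^2_q$, independent. Their joint pdf is

$$\frac{1}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)2^{(p+q)/2}}\,u^{\frac p2-1}v^{\frac q2-1}e^{-(u+v)/2}.$$

From Definition 5.3.6, the random variable $X = (U/p)/(V/q)$ has an $F$ distribution, so we make the transformation $x = (u/p)/(v/q)$ and $y = u+v$. (Of course, many choices of $y$ will do, but this one makes calculations easy. The choice is prompted by the exponential term in the pdf.) Solving for $u$ and $v$ yields

$$u = \frac{\frac pq xy}{1+\frac pq x},\qquad v = \frac{y}{1+\frac pq x},\qquad\text{and}\qquad |J| = \frac{\frac pq y}{\left(1+\frac pq x\right)^2}.$$

We then substitute into $f_{U,V}(u,v)$ to obtain

$$f_{X,Y}(x,y) = \frac{1}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)2^{(p+q)/2}}\left(\frac{\frac pq xy}{1+\frac pq x}\right)^{\frac p2-1}\left(\frac{y}{1+\frac pq x}\right)^{\frac q2-1}e^{-y/2}\,\frac{\frac pq y}{\left(1+\frac pq x\right)^2}.$$

Note that the pdf factors, showing that $X$ and $Y$ are independent, and we can read off the pdfs of each: $X$ has the $F$ distribution and $Y$ is $\chi^2_{p+q}$. If we integrate out $y$ to recover the proper constant, we get the $F$ pdf

$$f_X(x) = \frac{\Gamma\left(\frac{p+q}{2}\right)}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)}\left(\frac pq\right)^{p/2}\frac{x^{p/2-1}}{\left(1+\frac pq x\right)^{(p+q)/2}}.$$

b. Since $F_{p,q} = \frac{\chi^2_p/p}{\chi^2_q/q}$, let $U \sim \chi^2_p$ and $V \sim \chi^2_q$, with $U$ and $V$ independent. Then we have

$$EF_{p,q} = E\left(\frac{U/p}{V/q}\right) = E\left(\frac Up\right)E\left(\frac qV\right)\ \ (\text{by independence}) = \frac pp\,qE\frac1V\ \ (EU = p).$$

Then

$$E\frac1V = \int_0^\infty\frac1v\,\frac{1}{\Gamma\left(\frac q2\right)2^{q/2}}v^{\frac q2-1}e^{-\frac v2}\,dv = \frac{1}{\Gamma\left(\frac q2\right)2^{q/2}}\int_0^\infty v^{\frac{q-2}{2}-1}e^{-\frac v2}\,dv = \frac{\Gamma\left(\frac{q-2}{2}\right)2^{(q-2)/2}}{\Gamma\left(\frac q2\right)2^{q/2}} = \frac{\Gamma\left(\frac{q-2}{2}\right)2^{(q-2)/2}}{\frac{q-2}{2}\Gamma\left(\frac{q-2}{2}\right)2^{q/2}} = \frac{1}{q-2}.$$

Hence, $EF_{p,q} = \frac pp\,\frac{q}{q-2} = \frac{q}{q-2}$, if $q > 2$. To calculate the variance, first calculate

$$E(F_{p,q}^2) = E\left(\frac{U^2/p^2}{V^2/q^2}\right) = \frac{q^2}{p^2}E(U^2)E\left(\frac{1}{V^2}\right).$$

Now

$$E(U^2) = \mathrm{Var}(U) + (EU)^2 = 2p+p^2$$

and

$$E\frac{1}{V^2} = \int_0^\infty\frac{1}{v^2}\,\frac{1}{\Gamma(q/2)2^{q/2}}v^{(q/2)-1}e^{-v/2}\,dv = \frac{1}{(q-2)(q-4)}.$$

Therefore,

$$EF_{p,q}^2 = \frac{q^2}{p^2}p(2+p)\frac{1}{(q-2)(q-4)} = \frac{q^2(p+2)}{p(q-2)(q-4)},$$

and, hence,

$$\mathrm{Var}(F_{p,q}) = \frac{q^2(p+2)}{p(q-2)(q-4)} - \frac{q^2}{(q-2)^2} = 2\left(\frac{q}{q-2}\right)^2\frac{q+p-2}{p(q-4)},\quad q > 4.$$

c. Write $X = \frac{U/p}{V/q}$. Then $\frac1X = \frac{V/q}{U/p} \sim F_{q,p}$, since $U \sim \chi^2_p$, $V \sim \chi^2_q$, and $U$ and $V$ are independent.

d. Let $Y = \frac{(p/q)X}{1+(p/q)X} = \frac{pX}{q+pX}$, so $X = \frac{qY}{p(1-Y)}$ and $\frac{dx}{dy} = \frac qp(1-y)^{-2}$. Thus, $Y$ has pdf

$$f_Y(y) = \frac{\Gamma\left(\frac{q+p}{2}\right)}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)}\left(\frac pq\right)^{p/2}\frac{\left(\frac{qy}{p(1-y)}\right)^{\frac p2-1}}{\left(1+\frac pq\,\frac{qy}{p(1-y)}\right)^{\frac{p+q}{2}}}\,\frac{q}{p(1-y)^2} = \left[B\left(\frac p2,\frac q2\right)\right]^{-1}y^{\frac p2-1}(1-y)^{\frac q2-1} \sim \text{beta}\left(\frac p2,\frac q2\right).$$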


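5.18 If $X \sim t_p$, then $X = Z/\sqrt{V/p}$, where $Z \sim n(0,1)$, $V \sim \chi^2_p$, and $Z$ and $V$ are independent.

a. $EX = E\,Z/\sqrt{V/p} = (EZ)\left(E\,1/\sqrt{V/p}\right) = 0$, since $EZ = 0$, as long as the other expectation is finite. This is so if $p > 1$. From part b), $X^2 \sim F_{1,p}$. Thus $\mathrm{Var}X = EX^2 = p/(p-2)$, if $p > 2$ (from Exercise 5.17b).

b. $X^2 = Z^2/(V/p)$. $Z^2 \sim \chi^2_1$, so the ratio is distributed $F_{1,p}$.

c. The pdf of $X$ is

$$f_X(x) = \left[\frac{\Gamma\left(\frac{p+1}{2}\right)}{\Gamma(p/2)\sqrt{p\pi}}\right]\frac{1}{(1+x^2/p)^{(p+1)/2}}.$$

Denote the quantity in square brackets by $C_p$. From an extension of Stirling's formula (Exercise 1.28) we have

$$\lim_{p\to\infty}C_p = \lim_{p\to\infty}\frac{\sqrt{2\pi}\left(\frac{p-1}{2}\right)^{\frac{p-1}{2}+\frac12}e^{-\frac{p-1}{2}}}{\sqrt{2\pi}\left(\frac{p-2}{2}\right)^{\frac{p-2}{2}+\frac12}e^{-\frac{p-2}{2}}}\,\frac{1}{\sqrt{p\pi}} = \frac{e^{-1/2}}{\sqrt\pi}\lim_{p\to\infty}\frac{\left(\frac{p-1}{2}\right)^{\frac{p-1}{2}+\frac12}}{\left(\frac{p-2}{2}\right)^{\frac{p-2}{2}+\frac12}\sqrt p} = \frac{e^{-1/2}}{\sqrt\pi}\,\frac{e^{1/2}}{\sqrt2} = \frac{1}{\sqrt{2\pi}},$$

by an application of Lemma 2.3.14. Applying the lemma again shows that for each $x$

$$\lim_{p\to\infty}\left(1+x^2/p\right)^{(p+1)/2} = e^{x^2/2},$$

establishing the result.

d. As the random variable $F_{1,p}$ is the square of a $t_p$, we conjecture that it would converge to the square of a $n(0,1)$ random variable, a $\chi^2_1$.

e. The random variable $qF_{q,p}$ can be thought of as the sum of $q$ random variables, each a $t_p$ squared. Thus, by all of the above, we expect it to converge to a $\chi^2_q$ random variable as $p \to \infty$.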

5.19 a. $\chi^2_p \sim \chi^2_q + \chi^2_d$, where $\chi^2_q$ and $\chi^2_d$ are independent $\chi^2$ random variables with $q$ and $d = p-q$ degrees of freedom. Since $\chi^2_d$ is a positive random variable, for any $a > 0$,

$$P(\chi^2_p > a) = P(\chi^2_q + \chi^2_d > a) > P(\chi^2_q > a).$$

b. For $k_1 > k_2$, $k_1F_{k_1,\nu} \sim (U+V)/(W/\nu)$, where $U$, $V$ and $W$ are independent and $U \sim \chi^2_{k_2}$, $V \sim \chi^2_{k_1-k_2}$ and $W \sim \chi^2_\nu$. For any $a > 0$, because $V/(W/\nu)$ is a positive random variable, we have

$$P(k_1F_{k_1,\nu} > a) = P((U+V)/(W/\nu) > a) > P(U/(W/\nu) > a) = P(k_2F_{k_2,\nu} > a).$$

c. $\alpha = P(F_{k,\nu} > F_{\alpha,k,\nu}) = P(kF_{k,\nu} > kF_{\alpha,k,\nu})$. So, $kF_{\alpha,k,\nu}$ is the $\alpha$ cutoff point for the random variable $kF_{k,\nu}$. Because $kF_{k,\nu}$ is stochastically larger than $(k-1)F_{k-1,\nu}$, the $\alpha$ cutoff for $kF_{k,\nu}$ is larger than the $\alpha$ cutoff for $(k-1)F_{k-1,\nu}$, that is, $kF_{\alpha,k,\nu} > (k-1)F_{\alpha,k-1,\nu}$.

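5.20 a. The given integral is

$$\begin{aligned}
&\int_0^\infty\frac{1}{\sqrt{2\pi}}e^{-t^2x/2}\sqrt x\,\frac{\nu}{\Gamma(\nu/2)2^{\nu/2}}(\nu x)^{(\nu/2)-1}e^{-\nu x/2}\,dx\\
&= \frac{1}{\sqrt{2\pi}}\frac{\nu^{\nu/2}}{\Gamma(\nu/2)2^{\nu/2}}\int_0^\infty e^{-t^2x/2}x^{((\nu+1)/2)-1}e^{-\nu x/2}\,dx\\
&= \frac{1}{\sqrt{2\pi}}\frac{\nu^{\nu/2}}{\Gamma(\nu/2)2^{\nu/2}}\int_0^\infty x^{((\nu+1)/2)-1}e^{-(\nu+t^2)x/2}\,dx\qquad\left(\text{integrand is kernel of gamma}\left(\tfrac{\nu+1}{2},\tfrac{2}{\nu+t^2}\right)\right)\\
&= \frac{1}{\sqrt{2\pi}}\frac{\nu^{\nu/2}}{\Gamma(\nu/2)2^{\nu/2}}\Gamma\left(\frac{\nu+1}{2}\right)\left(\frac{2}{\nu+t^2}\right)^{(\nu+1)/2}\\
&= \frac{1}{\sqrt{\nu\pi}}\frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)}\frac{1}{(1+t^2/\nu)^{(\nu+1)/2}},
\end{aligned}$$

the pdf of a $t_\nu$ distribution.

b. Differentiate both sides with respect to $t$ to obtain

$$\nu f_F(\nu t) = \int_0^\infty yf_1(ty)f_\nu(y)\,dy,$$

where $f_F$ is the $F$ pdf. Now write out the two chi-squared pdfs and collect terms to get

$$\nu f_F(\nu t) = \frac{t^{-1/2}}{\Gamma(1/2)\Gamma(\nu/2)2^{(\nu+1)/2}}\int_0^\infty y^{(\nu-1)/2}e^{-(1+t)y/2}\,dy = \frac{t^{-1/2}}{\Gamma(1/2)\Gamma(\nu/2)2^{(\nu+1)/2}}\,\frac{\Gamma\left(\frac{\nu+1}{2}\right)2^{(\nu+1)/2}}{(1+t)^{(\nu+1)/2}}.$$

Now define $y = \nu t$ to get

$$f_F(y) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\nu\,\Gamma(1/2)\Gamma(\nu/2)}\,\frac{(y/\nu)^{-1/2}}{(1+y/\nu)^{(\nu+1)/2}},$$

the pdf of an $F_{1,\nu}$.

c. Again differentiate both sides with respect to $t$, write out the chi-squared pdfs, and collect terms to obtain

$$(\nu/m)f_F((\nu/m)t) = \frac{t^{m/2-1}}{\Gamma(m/2)\Gamma(\nu/2)2^{(\nu+m)/2}}\int_0^\infty y^{(m+\nu-2)/2}e^{-(1+t)y/2}\,dy.$$

Now, as before, integrate the gamma kernel, collect terms, and define $y = (\nu/m)t$ to get

$$f_F(y) = \frac{\Gamma\left(\frac{\nu+m}{2}\right)}{\Gamma(m/2)\Gamma(\nu/2)}\left(\frac m\nu\right)^{m/2}\frac{y^{m/2-1}}{(1+(m/\nu)y)^{(\nu+m)/2}},$$

the pdf of an $F_{m,\nu}$.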

5.21 Let $m$ denote the median. Then, for general $n$ we have

$$P(\max(X_1,\ldots,X_n) > m) = 1 - P(X_i \le m\text{ for }i = 1,2,\ldots,n) = 1 - [P(X_1 \le m)]^n = 1 - \left(\frac12\right)^n.$$

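5.22 Calculating the cdf of $Z^2$, we obtain

$$\begin{aligned}
F_{Z^2}(z) &= P((\min(X,Y))^2 \le z) = P(-\sqrt z \le \min(X,Y) \le \sqrt z)\\
&= P(\min(X,Y) \le \sqrt z) - P(\min(X,Y) \le -\sqrt z)\\
&= [1 - P(\min(X,Y) > \sqrt z)] - [1 - P(\min(X,Y) > -\sqrt z)]\\
&= P(\min(X,Y) > -\sqrt z) - P(\min(X,Y) > \sqrt z)\\
&= P(X > -\sqrt z)P(Y > -\sqrt z) - P(X > \sqrt z)P(Y > \sqrt z),
\end{aligned}$$

where we use the independence of $X$ and $Y$. Since $X$ and $Y$ are identically distributed, $P(X > a) = P(Y > a) = 1 - F_X(a)$, so

$$F_{Z^2}(z) = (1 - F_X(-\sqrt z))^2 - (1 - F_X(\sqrt z))^2 = 1 - 2F_X(-\sqrt z),$$

since $1 - F_X(\sqrt z) = F_X(-\sqrt z)$. Differentiating and substituting gives

$$f_{Z^2}(z) = \frac{d}{dz}F_{Z^2}(z) = f_X(-\sqrt z)\frac{1}{\sqrt z} = \frac{1}{\sqrt{2\pi}}e^{-z/2}z^{-1/2},$$

the pdf of a $\chi^2_1$ random variable. Alternatively, since $X$ and $Y$ are independent and identically distributed,

$$P(Z^2 \le z) = P(-\sqrt z \le \min(X,Y) \le \sqrt z) = P(-\sqrt z \le X \le \sqrt z, X \le Y) + P(-\sqrt z \le Y \le \sqrt z, Y \le X) = 2\int_{-\sqrt z}^{\sqrt z}f_X(x)[1 - F_X(x)]\,dx.$$

Because $f_X$ is symmetric about 0, $f_X(-x)[1 - F_X(-x)] = f_X(x)F_X(x)$, so the integrand evaluated at $x$ and at $-x$ adds up to $f_X(x)$. Hence

$$P(Z^2 \le z) = 2\int_0^{\sqrt z}f_X(x)\,dx = P(-\sqrt z \le X \le \sqrt z)$$

and

$$f_{Z^2}(z) = \frac{d}{dz}P(-\sqrt z \le X \le \sqrt z) = \frac{1}{\sqrt{2\pi}}\left(e^{-z/2}\frac12z^{-1/2} + e^{-z/2}\frac12z^{-1/2}\right) = \frac{1}{\sqrt{2\pi}}z^{-1/2}e^{-z/2},$$

the pdf of a $\chi^2_1$ random variable.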
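A quick simulation supports the conclusion that $Z^2 = (\min(X,Y))^2 \sim \chi^2_1$. The minimal Python sketch below compares the empirical cdf of $Z^2$ with the $\chi^2_1$ cdf $P(|N(0,1)| \le \sqrt z) = \mathrm{erf}(\sqrt{z/2})$ at a few points. The function names, the evaluation points and the number of draws are illustrative.

```python
import math
import random


def chi2_1_cdf(z):
    """cdf of chi-squared with 1 df: P(|N(0,1)| <= sqrt(z)) = erf(sqrt(z/2))."""
    return math.erf(math.sqrt(z / 2))


def min_squared_cdf_mc(z, n=100_000, seed=3):
    """Monte Carlo estimate of P(min(X, Y)^2 <= z) for independent n(0,1) X and Y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        m = min(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        if m * m <= z:
            hits += 1
    return hits / n
```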

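5.23

$$\begin{aligned}
P(Z > z) &= \sum_{x=1}^\infty P(Z > z|x)P(X = x) = \sum_{x=1}^\infty P(U_1 > z,\ldots,U_x > z|x)P(X = x)\\
&= \sum_{x=1}^\infty\prod_{i=1}^xP(U_i > z)P(X = x)\qquad(\text{by independence of the }U_i\text{'s})\\
&= \sum_{x=1}^\infty P(U_i > z)^xP(X = x) = \sum_{x=1}^\infty(1-z)^x\frac{1}{(e-1)x!}\\
&= \frac{1}{e-1}\sum_{x=1}^\infty\frac{(1-z)^x}{x!} = \frac{e^{1-z}-1}{e-1},\qquad 0 < z < 1.
\end{aligned}$$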

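5.24 Use $f_X(x) = 1/\theta$, $F_X(x) = x/\theta$, $0 < x < \theta$. Let $Y = X_{(n)}$, $Z = X_{(1)}$. Then, from Theorem 5.4.6,

$$f_{Z,Y}(z,y) = \frac{n!}{0!(n-2)!0!}\,\frac1\theta\,\frac1\theta\left(\frac z\theta\right)^0\left(\frac{y-z}{\theta}\right)^{n-2}\left(1-\frac y\theta\right)^0 = \frac{n(n-1)}{\theta^n}(y-z)^{n-2},\quad 0 < z < y < \theta.$$

Now let $W = Z/Y$, $Q = Y$. Then $Y = Q$, $Z = WQ$, and $|J| = q$. Therefore

$$f_{W,Q}(w,q) = \frac{n(n-1)}{\theta^n}(q-wq)^{n-2}q = \frac{n(n-1)}{\theta^n}(1-w)^{n-2}q^{n-1},\quad 0 < w < 1,\ 0 < q < \theta.$$

The joint pdf factors into functions of $w$ and $q$, and, hence, $W$ and $Q$ are independent.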

5.25 The joint pdf of $X_{(1)},\ldots,X_{(n)}$ is

$$f(u_1,\ldots,u_n) = \frac{n!a^n}{\theta^{an}}u_1^{a-1}\cdots u_n^{a-1},\quad 0 < u_1 < \cdots < u_n < \theta.$$

Make the one-to-one transformation to $Y_1 = X_{(1)}/X_{(2)},\ldots,Y_{n-1} = X_{(n-1)}/X_{(n)}$, $Y_n = X_{(n)}$. The Jacobian is $J = y_2y_3^2\cdots y_n^{n-1}$. So the joint pdf of $Y_1,\ldots,Y_n$ is

$$\begin{aligned}
f(y_1,\ldots,y_n) &= \frac{n!a^n}{\theta^{an}}(y_1\cdots y_n)^{a-1}(y_2\cdots y_n)^{a-1}\cdots(y_n)^{a-1}\left(y_2y_3^2\cdots y_n^{n-1}\right)\\
&= \frac{n!a^n}{\theta^{an}}y_1^{a-1}y_2^{2a-1}\cdots y_n^{na-1},\quad 0 < y_i < 1,\ i = 1,\ldots,n-1,\ 0 < y_n < \theta.
\end{aligned}$$

We see that $f(y_1,\ldots,y_n)$ factors, so $Y_1,\ldots,Y_n$ are mutually independent. To get the pdf of $Y_1$, integrate out the other variables and obtain that $f_{Y_1}(y_1) = c_1y_1^{a-1}$, $0 < y_1 < 1$, for some constant $c_1$. To have this pdf integrate to 1, it must be that $c_1 = a$. Thus $f_{Y_1}(y_1) = ay_1^{a-1}$, $0 < y_1 < 1$. Similarly, for $i = 2,\ldots,n-1$, we obtain $f_{Y_i}(y_i) = iay_i^{ia-1}$, $0 < y_i < 1$. From Theorem 5.4.4, the pdf of $Y_n$ is $f_{Y_n}(y_n) = \frac{na}{\theta^{na}}y_n^{na-1}$, $0 < y_n < \theta$. It can be checked that the product of these marginal pdfs is the joint pdf given above.

5.27 a. $f_{X_{(i)}|X_{(j)}}(u|v) = f_{X_{(i)},X_{(j)}}(u,v)/f_{X_{(j)}}(v)$. Consider two cases, depending on which of $i$ or $j$ is greater. Using the formulas from Theorems 5.4.4 and 5.4.6, and after cancellation, we obtain the following.

(i) If $i < j$,

$$\begin{aligned}
f_{X_{(i)}|X_{(j)}}(u|v) &= \frac{(j-1)!}{(i-1)!(j-1-i)!}f_X(u)F_X^{i-1}(u)[F_X(v)-F_X(u)]^{j-i-1}F_X^{1-j}(v)\\
&= \frac{(j-1)!}{(i-1)!(j-1-i)!}\frac{f_X(u)}{F_X(v)}\left(\frac{F_X(u)}{F_X(v)}\right)^{i-1}\left(1-\frac{F_X(u)}{F_X(v)}\right)^{j-i-1},\quad u < v.
\end{aligned}$$

Note the interpretation: this is the pdf of the $i$th order statistic from a sample of size $j-1$, from a population with pdf given by the truncated distribution $f(u) = f_X(u)/F_X(v)$, $u < v$.

(ii) If $j < i$ and $u > v$,

$$\begin{aligned}
f_{X_{(i)}|X_{(j)}}(u|v) &= \frac{(n-j)!}{(n-i)!(i-1-j)!}f_X(u)[1-F_X(u)]^{n-i}[F_X(u)-F_X(v)]^{i-1-j}[1-F_X(v)]^{j-n}\\
&= \frac{(n-j)!}{(i-j-1)!(n-i)!}\frac{f_X(u)}{1-F_X(v)}\left(\frac{F_X(u)-F_X(v)}{1-F_X(v)}\right)^{i-j-1}\left(\frac{1-F_X(u)}{1-F_X(v)}\right)^{n-i}.
\end{aligned}$$

This is the pdf of the $(i-j)$th order statistic from a sample of size $n-j$, from a population with pdf given by the truncated distribution $f(u) = f_X(u)/(1-F_X(v))$, $u > v$.

b. From Example 5.4.7,

$$f_{V|R}(v|r) = \frac{n(n-1)r^{n-2}/a^n}{n(n-1)r^{n-2}(a-r)/a^n} = \frac{1}{a-r},\quad r/2 < v < a-r/2.$$


5.29 Let $X_i$ = weight of $i$th booklet in package. The $X_i$s are iid with $EX_i = 1$ and $\mathrm{Var}X_i = .05^2$. We want to approximate

$$P\left(\sum_{i=1}^{100}X_i > 100.4\right) = P\left(\sum_{i=1}^{100}X_i/100 > 1.004\right) = P(\bar X > 1.004).$$

By the CLT, $P(\bar X > 1.004) \approx P(Z > (1.004-1)/(.05/10)) = P(Z > .8) = .2119$.

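5.30 From the CLT we have, approximately, $\bar X_1 \sim n(\mu,\sigma^2/n)$ and $\bar X_2 \sim n(\mu,\sigma^2/n)$. Since $\bar X_1$ and $\bar X_2$ are independent, $\bar X_1-\bar X_2 \sim n(0,2\sigma^2/n)$. Thus, we want

$$.99 \approx P\left(\left|\bar X_1-\bar X_2\right| < \sigma/5\right) = P\left(\frac{-\sigma/5}{\sigma/\sqrt{n/2}} < \frac{\bar X_1-\bar X_2}{\sigma/\sqrt{n/2}} < \frac{\sigma/5}{\sigma/\sqrt{n/2}}\right) \approx P\left(-\frac15\sqrt{\frac n2} < Z < \frac15\sqrt{\frac n2}\right),$$

where $Z \sim n(0,1)$. Thus we need $P(Z \ge \sqrt n/(5\sqrt2)) \approx .005$. From Table 1, $\sqrt n/(5\sqrt2) = 2.576$, which implies $n = 50(2.576)^2 \approx 332$.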
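The last step is simple arithmetic. The short Python sketch below solves $\sqrt n/(5\sqrt2) = 2.576$ for $n$ and rounds up to a whole sample size. It takes the table value 2.576 as given.

```python
import math

z_995 = 2.576                    # upper .005 point of n(0,1), as read from Table 1
n_exact = 50 * z_995**2          # from sqrt(n) / (5 sqrt(2)) = 2.576
n_required = math.ceil(n_exact)  # round up to a whole sample size
```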

5.31 We know that $\sigma^2_{\bar X} = 9/100$. Use Chebychev's Inequality to get

$$P\left(-3k/10 < \bar X-\mu < 3k/10\right) \ge 1-1/k^2.$$

We need $1-1/k^2 \ge .9$, which implies $k \ge \sqrt{10} = 3.16$ and $3k/10 = .9487$. Thus

$$P(-.9487 < \bar X-\mu < .9487) \ge .9$$

by Chebychev's Inequality. Using the CLT, $\bar X$ is approximately $n(\mu,\sigma^2_{\bar X})$ with $\sigma_{\bar X} = \sqrt{.09} = .3$, and $(\bar X-\mu)/.3 \sim n(0,1)$. Thus

$$.9 = P\left(-1.645 < \frac{\bar X-\mu}{.3} < 1.645\right) = P(-.4935 < \bar X-\mu < .4935).$$

Thus, we again see the conservativeness of Chebychev's Inequality, yielding bounds on $\bar X-\mu$ that are almost twice as big as the normal approximation. Moreover, with a sample of size 100, $\bar X$ is probably very close to normally distributed, even if the underlying $X$ distribution is not close to normal.

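5.32 a. For any $\epsilon > 0$,

$$P\left(\left|\sqrt{X_n}-\sqrt a\right| > \epsilon\right) = P\left(\left|\sqrt{X_n}-\sqrt a\right|\left|\sqrt{X_n}+\sqrt a\right| > \epsilon\left|\sqrt{X_n}+\sqrt a\right|\right) = P\left(|X_n-a| > \epsilon\left|\sqrt{X_n}+\sqrt a\right|\right) \le P\left(|X_n-a| > \epsilon\sqrt a\right) \to 0,$$

as $n \to \infty$, since $X_n \to a$ in probability. Thus $\sqrt{X_n} \to \sqrt a$ in probability.

b. For any $0 < \epsilon < 1$,

$$\begin{aligned}
P\left(\left|\frac{a}{X_n}-1\right| \le \epsilon\right) &= P\left(\frac{a}{1+\epsilon} \le X_n \le \frac{a}{1-\epsilon}\right) = P\left(a-\frac{a\epsilon}{1+\epsilon} \le X_n \le a+\frac{a\epsilon}{1-\epsilon}\right)\\
&\ge P\left(a-\frac{a\epsilon}{1+\epsilon} \le X_n \le a+\frac{a\epsilon}{1+\epsilon}\right)\qquad\left(a+\frac{a\epsilon}{1+\epsilon} < a+\frac{a\epsilon}{1-\epsilon}\right)\\
&= P\left(|X_n-a| \le \frac{a\epsilon}{1+\epsilon}\right) \to 1,
\end{aligned}$$

as $n \to \infty$, since $X_n \to a$ in probability. Thus $a/X_n \to 1$ in probability.

c. $S_n^2 \to \sigma^2$ in probability. By a), $S_n = \sqrt{S_n^2} \to \sqrt{\sigma^2} = \sigma$ in probability. By b), $\sigma/S_n \to 1$ in probability.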

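5.33 We must show that for every $\epsilon > 0$ there exists $N$ such that if $n > N$, then $P(X_n+Y_n > c) > 1-\epsilon$. Choose $N_1$ such that $P(X_n > -m) > 1-\epsilon/2$ for $n > N_1$, and $N_2$ such that $P(Y_n > c+m) > 1-\epsilon/2$ for $n > N_2$. Then for $n > N = \max(N_1,N_2)$,

$$P(X_n+Y_n > c) \ge P(X_n > -m,\ Y_n > c+m) \ge P(X_n > -m) + P(Y_n > c+m) - 1 > 1-\epsilon.$$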

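5.34 Using $E\bar X_n = \mu$ and $\mathrm{Var}\,\bar X_n = \sigma^2/n$, we obtain

$$E\frac{\sqrt n(\bar X_n-\mu)}{\sigma} = \frac{\sqrt n}{\sigma}E(\bar X_n-\mu) = \frac{\sqrt n}{\sigma}(\mu-\mu) = 0,$$

$$\mathrm{Var}\frac{\sqrt n(\bar X_n-\mu)}{\sigma} = \frac{n}{\sigma^2}\mathrm{Var}(\bar X_n-\mu) = \frac{n}{\sigma^2}\mathrm{Var}\,\bar X_n = \frac{n}{\sigma^2}\,\frac{\sigma^2}{n} = 1.$$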

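5.35 a. $X_i \sim$ exponential(1), so $\mu_X = 1$ and $\mathrm{Var}X = 1$. From the CLT, $\bar X_n$ is approximately $n(1,1/n)$. So

$$\frac{\bar X_n-1}{1/\sqrt n} \to Z \sim n(0,1)\qquad\text{and}\qquad P\left(\frac{\bar X_n-1}{1/\sqrt n} \le x\right) \to P(Z \le x).$$

Now

$$\frac{d}{dx}P(Z \le x) = \frac{d}{dx}F_Z(x) = f_Z(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2},$$

and, with $W = \sum_{i=1}^nX_i \sim$ gamma$(n,1)$,

$$\frac{d}{dx}P\left(\frac{\bar X_n-1}{1/\sqrt n} \le x\right) = \frac{d}{dx}P\left(\sum_{i=1}^nX_i \le x\sqrt n+n\right) = \frac{d}{dx}F_W(x\sqrt n+n) = f_W(x\sqrt n+n)\sqrt n = \frac{1}{\Gamma(n)}(x\sqrt n+n)^{n-1}e^{-(x\sqrt n+n)}\sqrt n.$$

Therefore, assuming that the convergence of the cdfs carries over to their derivatives (this step is heuristic, since the CLT alone does not guarantee it),

$$\frac{1}{\Gamma(n)}(x\sqrt n+n)^{n-1}e^{-(x\sqrt n+n)}\sqrt n \approx \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\quad\text{as } n \to \infty.$$

Substituting $x = 0$ and using $n! = n\Gamma(n)$ yields $n! \approx n^{n+1/2}e^{-n}\sqrt{2\pi}$.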

5.37 a. For the exact calculations, use the fact that $V_n$ is itself distributed negative binomial$(10r, p)$. The results are summarized in the following table. Note that the recursion relation of problem 3.48 can be used to simplify calculations.

$P(V_n = v)$:

| $v$ | (a) Exact | (b) Normal App. | (c) Normal w/cont. |
|---|---|---|---|
| 0 | .0008 | .0071 | .0056 |
| 1 | .0048 | .0083 | .0113 |
| 2 | .0151 | .0147 | .0201 |
| 3 | .0332 | .0258 | .0263 |
| 4 | .0572 | .0392 | .0549 |
| 5 | .0824 | .0588 | .0664 |
| 6 | .1030 | .0788 | .0882 |
| 7 | .1148 | .0937 | .1007 |
| 8 | .1162 | .1100 | .1137 |
| 9 | .1085 | .1114 | .1144 |
| 10 | .0944 | .1113 | .1024 |

b. Using the normal approximation, we have $\mu_v = r(1-p)/p = 20(.3)/.7 = 8.57$ and

$$\sigma_v = \sqrt{r(1-p)/p^2} = \sqrt{(20)(.3)/.49} = 3.5.$$

Then,

$$P(V_n = 0) = 1-P(V_n \ge 1) = 1-P\left(\frac{V_n-8.57}{3.5} \ge \frac{1-8.57}{3.5}\right) = 1-P(Z \ge -2.16) = .0154.$$

Another way to approximate this probability is

$$P(V_n = 0) = P(V_n \le 0) = P\left(\frac{V-8.57}{3.5} \le \frac{0-8.57}{3.5}\right) = P(Z \le -2.45) = .0071.$$

Continuing in this way we have $P(V = 1) = P(V \le 1) - P(V \le 0) = .0154 - .0071 = .0083$, etc.

c. With the continuity correction, compute $P(V = k)$ by $P\left(\frac{(k-.5)-8.57}{3.5} \le Z \le \frac{(k+.5)-8.57}{3.5}\right)$, so $P(V = 0) = P(-9.07/3.5 \le Z \le -8.07/3.5) = .0104 - .0048 = .0056$, etc. Notice that the continuity correction gives some improvement over the uncorrected normal approximation.
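The exact column (a) of the table can be recomputed from the negative binomial pmf $P(V = v) = \binom{r+v-1}{v}p^r(1-p)^v$. The minimal Python sketch below does this. It assumes $r = 20$ and $p = .7$, the values used for the normal approximation in part b). The function name is illustrative.

```python
from math import comb


def negbin_pmf(v, r, p):
    """P(V = v) for V = number of failures before the r-th success (success prob p)."""
    return comb(r + v - 1, v) * p**r * (1 - p) ** v


exact_column = [negbin_pmf(v, 20, 0.7) for v in range(11)]  # v = 0, ..., 10
```

Rounded to four decimals, these values reproduce the "Exact" column, for example $P(V = 0) = .7^{20} \approx .0008$.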

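5.39 a. Let $\epsilon > 0$ and $\eta > 0$, and choose $K$ so that $P(|X| > K) < \eta$. Since $h$ is continuous, it is uniformly continuous on $[-K-1,K+1]$. So there exists $\delta \in (0,1)$ such that $|h(x')-h(x)| < \epsilon$ whenever $x, x' \in [-K-1,K+1]$ and $|x'-x| < \delta$. If $|X| \le K$ and $|X_n-X| < \delta$, then both $X$ and $X_n$ lie in $[-K-1,K+1]$, so $|h(X_n)-h(X)| < \epsilon$. Hence

$$P(|h(X_n)-h(X)| \ge \epsilon) \le P(|X| > K) + P(|X_n-X| \ge \delta) < \eta + P(|X_n-X| \ge \delta).$$

Since $X_1, X_2, \ldots$ converges in probability to the random variable $X$, $\lim_{n\to\infty}P(|X_n-X| \ge \delta) = 0$, so $\limsup_{n\to\infty}P(|h(X_n)-h(X)| \ge \epsilon) \le \eta$. As $\eta$ is arbitrary, $\lim_{n\to\infty}P(|h(X_n)-h(X)| < \epsilon) = 1$.

b. Define the subsequence $X_j(s) = s + I_{[a,b]}(s)$ such that in $I_{[a,b]}$, $a$ is always 0, that is, the subsequence $X_1, X_2, X_4, X_7, \ldots$. For this subsequence

$$X_j(s) \to \begin{cases}s & \text{if } s > 0\\ s+1 & \text{if } s = 0.\end{cases}$$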

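5.41 a. Let $\epsilon = |x-\mu|$.

(i) For $x-\mu > 0$,

$$\begin{aligned}
P(|X_n-\mu| \ge \epsilon) &= P(|X_n-\mu| \ge x-\mu)\\
&= P(X_n-\mu \le -(x-\mu)) + P(X_n-\mu \ge x-\mu)\\
&\ge P(X_n-\mu \ge x-\mu) \ge P(X_n > x) = 1-P(X_n \le x).
\end{aligned}$$

Therefore, $0 = \lim_{n\to\infty}P(|X_n-\mu| \ge \epsilon) \ge \lim_{n\to\infty}\left[1-P(X_n \le x)\right]$. Thus $\lim_{n\to\infty}P(X_n \le x) = 1$.

(ii) For $x-\mu < 0$,

$$\begin{aligned}
P(|X_n-\mu| \ge \epsilon) &= P(|X_n-\mu| \ge -(x-\mu))\\
&= P(X_n-\mu \le x-\mu) + P(X_n-\mu \ge -(x-\mu))\\
&\ge P(X_n-\mu \le x-\mu) = P(X_n \le x).
\end{aligned}$$

Therefore, $0 = \lim_{n\to\infty}P(|X_n-\mu| \ge \epsilon) \ge \lim_{n\to\infty}P(X_n \le x)$, so $\lim_{n\to\infty}P(X_n \le x) = 0$.

By (i) and (ii), $P(X_n \le x)$ converges to the cdf of the point mass at $\mu$ at every $x \ne \mu$, that is, at every continuity point, and the result follows.

b. For every $\epsilon > 0$,

$$P(|X_n-\mu| > \epsilon) \le P(X_n-\mu < -\epsilon) + P(X_n-\mu > \epsilon) = P(X_n < \mu-\epsilon) + 1 - P(X_n \le \mu+\epsilon) \to 0\quad\text{as } n \to \infty.$$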


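5.43 a. $P(|Y_n-\theta| < \epsilon) = P\left(\sqrt n|Y_n-\theta| < \sqrt n\,\epsilon\right)$. For any $M > 0$, $\sqrt n\,\epsilon > M$ once $n$ is large enough, so

$$\liminf_{n\to\infty}P(|Y_n-\theta| < \epsilon) \ge \lim_{n\to\infty}P\left(\sqrt n|Y_n-\theta| < M\right) = P(|Z| < M),$$

where $Z \sim n(0,\sigma^2)$. Letting $M \to \infty$ gives $\lim_{n\to\infty}P(|Y_n-\theta| < \epsilon) = P(|Z| < \infty) = 1$. Thus $Y_n \to \theta$ in probability.

b. A Taylor expansion of $g$ about $\theta$ gives $\sqrt n[g(Y_n)-g(\theta)] = g'(\theta)\sqrt n(Y_n-\theta) + R_n$, where the remainder $R_n \to 0$ in probability because $Y_n \to \theta$ in probability by part a). By Slutsky's Theorem (a), $g'(\theta)\sqrt n(Y_n-\theta) \to g'(\theta)X$, where $X \sim n(0,\sigma^2)$. Therefore

$$\sqrt n[g(Y_n)-g(\theta)] \to n\left(0,\sigma^2[g'(\theta)]^2\right).$$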

5.45 We do part (a); the other parts are similar. Using Mathematica, the exact calculation is

In[120]:=

f1[x_]=PDF[GammaDistribution[4,25],x]

p1=Integrate[f1[x],{x,100,\[Infinity]}]//N

1-CDF[BinomialDistribution[300,p1],149]

Out[120]=

e^(-x/25) x^3/2343750

Out[121]=

0.43347

Out[122]=

0.0119389.

The answer can also be simulated in Mathematica or in R. Here is the R code for simulating

the same probability

p1<-mean(rgamma(10000,4,scale=25)>100)

mean(rbinom(10000, 300, p1)>149)

In each case 10,000 random variables were simulated. We obtained p1=0.438 and a binomial

probability of 0.0108.
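Readers without Mathematica can reproduce the exact calculation for part (a) in Python with the standard library. The sketch below uses the identity, for an integer shape $k$, $P(\text{gamma}(k,\beta) > x) = P(\text{Poisson}(x/\beta) \le k-1)$. It then sums the binomial tail term by term. The function names are illustrative.

```python
import math


def gamma_upper_tail_integer_shape(k, scale, x):
    """P(Gamma(shape=k, scale) > x) for integer k, via P(Poisson(x/scale) <= k-1)."""
    lam = x / scale
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))


def binomial_upper_tail(n, p, k):
    """P(Binomial(n, p) > k), summed exactly term by term."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))


p1 = gamma_upper_tail_integer_shape(4, 25, 100)  # P(X > 100) for X ~ gamma(4, 25)
prob = binomial_upper_tail(300, p1, 149)         # P(more than 149 of 300 exceed 100)
```

The values should agree with the Mathematica output above, about 0.43347 and 0.0119.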

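5.47 a. $-2\log(U_j) \sim$ exponential$(2) \sim \chi^2_2$. Thus $Y$ is the sum of $\nu$ independent $\chi^2_2$ random variables. By Lemma 5.3.2(b), $Y \sim \chi^2_{2\nu}$.

b. $-\beta\log(U_j) \sim$ exponential$(\beta) \sim$ gamma$(1,\beta)$. Thus $Y$ is the sum of $a$ independent gamma$(1,\beta)$ random variables. By Example 4.6.8, $Y \sim$ gamma$(a,\beta)$.

c. Let $V = -\sum_{j=1}^a\log(U_j) \sim$ gamma$(a,1)$. Similarly, $W = -\sum_{j=a+1}^{a+b}\log(U_j) \sim$ gamma$(b,1)$, independent of $V$. By Exercise 4.24, $\frac{V}{V+W} \sim$ beta$(a,b)$.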

5.49 a. See Example 2.1.4.

b. $X = g(U) = -\log\frac{1-U}{U}$. Then $g^{-1}(x) = \frac{1}{1+e^{-x}}$. Thus

$$f_X(x) = 1\times\frac{e^{-x}}{(1+e^{-x})^2} = \frac{e^{-x}}{(1+e^{-x})^2},\quad -\infty < x < \infty,$$

which is the density of a logistic(0,1) random variable.

c. Let $Y \sim$ logistic$(\mu,\beta)$. Then $f_Y(y) = \frac1\beta f_Z\left(\frac{y-\mu}{\beta}\right)$, where $f_Z$ is the density of a logistic(0,1), and $Y = \beta Z+\mu$. To generate a logistic$(\mu,\beta)$ random variable, (i) generate $U \sim$ uniform(0,1), (ii) set $Y = \beta\log\frac{U}{1-U}+\mu$.

5.51 a. For $U_i \sim$ uniform(0,1), $EU_i = 1/2$ and $\mathrm{Var}\,U_i = 1/12$. Then

$$X = \sum_{i=1}^{12} U_i - 6 = 12\bar{U} - 6 = \sqrt{12}\,\frac{\bar{U} - 1/2}{1/\sqrt{12}}$$

is in the form $\sqrt{n}(\bar{U} - EU)/\sigma$ with $n = 12$, so $X$ is approximately n(0,1) by the Central Limit Theorem.

b. The approximation does not have the same range as $Z \sim$ n(0,1), where $-\infty < Z < +\infty$, since $-6 < X < 6$.

$$EX = E\left(\sum_{i=1}^{12} U_i - 6\right) = \sum_{i=1}^{12} EU_i - 6 = \sum_{i=1}^{12}\frac{1}{2} - 6 = 6 - 6 = 0.$$

$$\mathrm{Var}\,X = \mathrm{Var}\left(\sum_{i=1}^{12} U_i - 6\right) = \mathrm{Var}\left(\sum_{i=1}^{12} U_i\right) = 12\,\mathrm{Var}\,U_1 = 1.$$

$EX^3 = 0$ since $X$ is symmetric about 0. (In fact, all odd moments of $X$ are 0.) Thus, the first three moments of $X$ all agree with the first three moments of a n(0,1). The fourth moment is not easy to get; one way to do it is to get the mgf of $X$. Since $Ee^{tU} = (e^t - 1)/t$,

$$E\,e^{t\left(\sum_{i=1}^{12} U_i - 6\right)} = e^{-6t}\left(\frac{e^t - 1}{t}\right)^{12} = \left(\frac{e^{t/2} - e^{-t/2}}{t}\right)^{12}.$$

Computing the fourth derivative and evaluating it at $t = 0$ gives us $EX^4$. This is a lengthy calculation. The answer is $EX^4 = 29/10$, slightly smaller than $EZ^4 = 3$, where $Z \sim$ n(0,1).
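The fourth moment can also be obtained without differentiating the mgf. Writing $X = \sum_{i=1}^{12} Y_i$ with $Y_i = U_i - 1/2$ symmetric about 0, only the index patterns "all four equal" and "two distinct pairs" contribute to $E\left(\sum Y_i\right)^4$, giving $12\,EY^4 + 3\cdot 12\cdot 11\,(EY^2)^2$ with $EY^2 = 1/12$ and $EY^4 = 1/80$. A small Python check in exact arithmetic (our own sketch) confirms the value 29/10 and also implements the generator itself.

```python
from fractions import Fraction
import random
import statistics

EY2 = Fraction(1, 12)   # E(U - 1/2)^2 for U ~ uniform(0,1)
EY4 = Fraction(1, 80)   # E(U - 1/2)^4
n = 12
EX4 = n * EY4 + 3 * n * (n - 1) * EY2**2
print(EX4)              # 29/10

def approx_normal(rng):
    """Sum of 12 uniforms minus 6: approximately n(0, 1), but always inside (-6, 6)."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(4)
xs = [approx_normal(rng) for _ in range(20000)]
print(statistics.fmean(xs), statistics.variance(xs))
```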

5.53 The R code is the following:

a.
obs <- rbinom(1000, 8, 2/3)
meanobs <- mean(obs)
variance <- var(obs)
hist(obs)

Output:
> meanobs
[1] 5.231
> variance
[1] 1.707346

b.
obs <- rhyper(1000, 8, 2, 4)
meanobs <- mean(obs)
variance <- var(obs)
hist(obs)

Output:
> meanobs
[1] 3.169
> variance
[1] 0.4488879

c.
obs <- rnbinom(1000, 5, 1/3)
meanobs <- mean(obs)
variance <- var(obs)
hist(obs)

Output:
> meanobs
[1] 10.308
> variance
[1] 29.51665
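The simulated values can be compared with the exact moments. The sketch below (Python with exact fractions; our addition) follows R's parameterizations: `rhyper(nn, m, n, k)` counts white balls among $k$ drawn without replacement from an urn with $m$ white and $n$ black balls, and `rnbinom(nn, size, prob)` counts failures before the `size`-th success.

```python
from fractions import Fraction as F

def binomial_moments(n, p):
    return n * p, n * p * (1 - p)

def hypergeometric_moments(m, n, k):
    """White balls among k drawn from m white and n black (R's rhyper parameterization)."""
    N = m + n
    mean = k * F(m, N)
    var = k * F(m, N) * F(n, N) * F(N - k, N - 1)
    return mean, var

def negbinomial_moments(size, p):
    """Failures before the size-th success (R's rnbinom parameterization)."""
    return size * (1 - p) / p, size * (1 - p) / p**2

print(binomial_moments(8, F(2, 3)))     # mean 16/3, variance 16/9  (simulated 5.231, 1.707)
print(hypergeometric_moments(8, 2, 4))  # mean 16/5, variance 32/75 (simulated 3.169, 0.449)
print(negbinomial_moments(5, F(1, 3)))  # mean 10, variance 30      (simulated 10.308, 29.517)
```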


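5.55 Let $X$ denote the number of comparisons. Then

$$EX = \sum_{k=0}^{\infty} P(X > k) = 1 + \sum_{k=1}^{\infty} P\left(U > F_Y(y_{k-1})\right) = 1 + \sum_{k=1}^{\infty}\left(1 - F_Y(y_{k-1})\right) = 1 + \sum_{i=0}^{\infty}\left(1 - F_Y(y_i)\right) = 1 + EY,$$

where the last step uses the fact that $Y$ takes the values $y_i = i$, $i = 0, 1, 2, \ldots$, so that $EY = \sum_{i=0}^\infty P(Y > y_i)$.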
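For a Poisson target the count is easy to see directly: a sequential search that stops at the first $k$ with $U \le F_Y(k)$ uses exactly $Y + 1$ comparisons, so the average is $1 + EY$. The sketch below is our own illustration and assumes $Y$ takes the values $0, 1, 2, \ldots$; the Poisson choice and $\lambda = 3$ are arbitrary.

```python
import math
import random

def poisson_by_search(lam, rng):
    """Inverse-cdf search for Poisson(lam) on 0, 1, 2, ...

    Returns (value, comparisons): U is compared with F(0), F(1), ...
    until U <= F(k).
    """
    u = rng.random()
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    comparisons = 1
    while u > cdf and pmf > 0.0:   # pmf > 0 guards against an endless loop after underflow
        k += 1
        pmf *= lam / k
        cdf += pmf
        comparisons += 1
    return k, comparisons

rng = random.Random(5)
lam = 3.0
draws = [poisson_by_search(lam, rng) for _ in range(50000)]
mean_comparisons = sum(c for _, c in draws) / len(draws)
print(mean_comparisons, 1 + lam)
```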

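5.57 a. $\mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}(X_1 + X_3, X_2 + X_3) = \mathrm{Cov}(X_3, X_3) = \lambda_3$, since $X_1$, $X_2$, and $X_3$ are independent.

b. Let

$$Z_i = \begin{cases} 1 & \text{if } X_i = X_3 = 0,\\ 0 & \text{otherwise.}\end{cases}$$

Then $p_i = P(Z_i = 1) = P(Y_i = 0) = P(X_i = 0, X_3 = 0) = e^{-(\lambda_i + \lambda_3)}$. Therefore the $Z_i$ are Bernoulli$(p_i)$ with $E[Z_i] = p_i$, $\mathrm{Var}(Z_i) = p_i(1 - p_i)$, and

$$E[Z_1 Z_2] = P(Z_1 = 1, Z_2 = 1) = P(Y_1 = 0, Y_2 = 0) = P(X_1 + X_3 = 0, X_2 + X_3 = 0) = P(X_1 = 0)P(X_2 = 0)P(X_3 = 0) = e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3}.$$

Therefore,

$$\mathrm{Cov}(Z_1, Z_2) = E[Z_1Z_2] - E[Z_1]E[Z_2] = e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3} - e^{-(\lambda_1+\lambda_3)}e^{-(\lambda_2+\lambda_3)} = e^{-(\lambda_1+\lambda_3)}e^{-(\lambda_2+\lambda_3)}\left(e^{\lambda_3} - 1\right) = p_1p_2\left(e^{\lambda_3} - 1\right).$$

Thus

$$\mathrm{Corr}(Z_1, Z_2) = \frac{p_1p_2\left(e^{\lambda_3} - 1\right)}{\sqrt{p_1(1-p_1)}\sqrt{p_2(1-p_2)}}.$$

c. $E[Z_1Z_2] \le p_i$; therefore

$$\mathrm{Cov}(Z_1, Z_2) = E[Z_1Z_2] - E[Z_1]E[Z_2] \le p_1 - p_1p_2 = p_1(1 - p_2), \quad\text{and}\quad \mathrm{Cov}(Z_1, Z_2) \le p_2(1 - p_1).$$

Therefore,

$$\mathrm{Corr}(Z_1, Z_2) \le \frac{p_1(1-p_2)}{\sqrt{p_1(1-p_1)}\sqrt{p_2(1-p_2)}} = \frac{\sqrt{p_1(1-p_2)}}{\sqrt{p_2(1-p_1)}}$$

and

$$\mathrm{Corr}(Z_1, Z_2) \le \frac{p_2(1-p_1)}{\sqrt{p_1(1-p_1)}\sqrt{p_2(1-p_2)}} = \frac{\sqrt{p_2(1-p_1)}}{\sqrt{p_1(1-p_2)}},$$

which implies the result.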
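A short simulation can confirm the correlation formula in part (b) and the bound in part (c). This Python sketch is ours; it uses Knuth's multiplication method for Poisson draws (the standard library has no Poisson sampler), and the values of $\lambda_1, \lambda_2, \lambda_3$ are arbitrary.

```python
import math
import random

def poisson_knuth(lam, rng):
    """Poisson(lam) by Knuth's method: count uniforms until their product drops below e^{-lam}."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def corr_formula(l1, l2, l3):
    p1, p2 = math.exp(-(l1 + l3)), math.exp(-(l2 + l3))
    return p1 * p2 * (math.exp(l3) - 1) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def corr_simulated(l1, l2, l3, reps, rng):
    """Sample correlation of Z_i = I(X_i + X_3 = 0), i = 1, 2."""
    z1s, z2s = [], []
    for _ in range(reps):
        x1, x2, x3 = (poisson_knuth(l, rng) for l in (l1, l2, l3))
        z1s.append(1.0 if x1 + x3 == 0 else 0.0)
        z2s.append(1.0 if x2 + x3 == 0 else 0.0)
    m1, m2 = sum(z1s) / reps, sum(z2s) / reps
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1s, z2s)) / reps
    v1 = sum((a - m1) ** 2 for a in z1s) / reps
    v2 = sum((b - m2) ** 2 for b in z2s) / reps
    return cov / math.sqrt(v1 * v2)

lams = (0.5, 1.0, 0.7)
rng = random.Random(6)
exact = corr_formula(*lams)
simulated = corr_simulated(*lams, reps=100000, rng=rng)
print(exact, simulated)
```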

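5.59

$$P(Y \le y) = P\left(V \le y \,\Big|\, U < \tfrac{1}{c}f_Y(V)\right) = \frac{P\left(V \le y,\ U < \frac{1}{c}f_Y(V)\right)}{P\left(U < \frac{1}{c}f_Y(V)\right)} = \frac{\int_0^y \int_0^{\frac{1}{c}f_Y(v)} du\,dv}{\frac{1}{c}\int_0^1 f_Y(v)\,dv} = \int_0^y f_Y(v)\,dv,$$

since $\int_0^1 f_Y(v)\,dv = 1$.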
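The argument applies to any density $f_Y$ on (0,1) bounded by $c$, with $U$ and $V$ independent uniform(0,1). A minimal Python sketch for a beta$(a, b)$ target with $a, b > 1$ (our example; for such $a, b$ the bound $c$ is the density at the mode $(a-1)/(a+b-2)$) is:

```python
import math
import random

def beta_pdf(y, a, b):
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * y ** (a - 1) * (1 - y) ** (b - 1)

def beta_accept_reject(a, b, size, rng):
    """Accept V ~ uniform(0,1) when U < f_Y(V) / c, with c = max f_Y (requires a, b > 1)."""
    c = beta_pdf((a - 1) / (a + b - 2), a, b)
    out = []
    while len(out) < size:
        u, v = rng.random(), rng.random()
        if u < beta_pdf(v, a, b) / c:
            out.append(v)
    return out

rng = random.Random(7)
sample = beta_accept_reject(2.7, 6.3, 20000, rng)
sample_mean = sum(sample) / len(sample)
print(sample_mean)   # should be near a / (a + b) = 0.3
```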

5.61 a.

$$M = \sup_y \frac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}y^{a-1}(1-y)^{b-1}}{\frac{\Gamma([a]+[b])}{\Gamma([a])\Gamma([b])}y^{[a]-1}(1-y)^{[b]-1}} < \infty,$$

since $a - [a] > 0$, $b - [b] > 0$, and $y \in (0,1)$.


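b.

$$M = \sup_y \frac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}y^{a-1}(1-y)^{b-1}}{\frac{\Gamma([a]+b)}{\Gamma([a])\Gamma(b)}y^{[a]-1}(1-y)^{b-1}} < \infty,$$

since $a - [a] > 0$ and $y \in (0,1)$.

c.

$$M = \sup_y \frac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}y^{a-1}(1-y)^{b-1}}{\frac{\Gamma([a]+1+b')}{\Gamma([a]+1)\Gamma(b')}y^{[a]+1-1}(1-y)^{b'-1}} = \infty,$$

since the ratio contains the factor $y^{a-[a]-1}$ with $a - [a] - 1 < 0$, which is unbounded as $y \to 0$. The exponent $b - b'$ of $(1-y)$ is $> 0$ when $b' = [b]$ and equal to zero when $b' = b$; thus it does not affect the result.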

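d. Let $f(y) = y^\alpha(1-y)^\beta$. Then

$$\frac{df(y)}{dy} = \alpha y^{\alpha-1}(1-y)^\beta - y^\alpha\beta(1-y)^{\beta-1} = y^{\alpha-1}(1-y)^{\beta-1}\left[\alpha(1-y) - \beta y\right],$$

which is zero, and $f$ is maximized, at $y = \frac{\alpha}{\alpha+\beta}$. Therefore, for $\alpha = a - a'$ and $\beta = b - b'$,

$$M = \frac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}}{\frac{\Gamma(a'+b')}{\Gamma(a')\Gamma(b')}}\left(\frac{a-a'}{a-a'+b-b'}\right)^{a-a'}\left(\frac{b-b'}{a-a'+b-b'}\right)^{b-b'}.$$

We need to minimize $M$ in $a'$ and $b'$. First consider $\left(\frac{a-a'}{a-a'+b-b'}\right)^{a-a'}\left(\frac{b-b'}{a-a'+b-b'}\right)^{b-b'}$. Let $c = \alpha + \beta$; then this term becomes $\left(\frac{\alpha}{c}\right)^\alpha\left(\frac{c-\alpha}{c}\right)^{c-\alpha}$. This term is minimized at $\frac{\alpha}{c} = \frac{1}{2}$, that is, at $\alpha = \frac{1}{2}c$. Then

$$M = \left(\tfrac{1}{2}\right)^{a-a'+b-b'}\frac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}}{\frac{\Gamma(a'+b')}{\Gamma(a')\Gamma(b')}}.$$

Note that the minimum that $M$ could be is one, which it attains when $a = a'$ and $b = b'$. Otherwise the minimum will occur when $a - a'$ and $b - b'$ are as small as possible but greater than or equal to zero, that is, when $a' = [a]$ and $b' = [b]$, or $a' = a$ and $b' = [b]$, or $a' = [a]$ and $b' = b$.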
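Part (d) gives a closed form for $M$, so the candidate choices can be compared numerically. This Python sketch is our own; it evaluates $M$ through log-gamma functions with the convention $0^0 = 1$, and the values $a = 2.7$, $b = 6.3$ are only an example.

```python
import math

def log_beta_const(a, b):
    """log of Gamma(a+b) / (Gamma(a) Gamma(b))."""
    return math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

def xlog_ratio(x, total):
    """x * log(x / total), with the convention 0 * log 0 = 0."""
    return 0.0 if x == 0 else x * math.log(x / total)

def bound_M(a, b, a_c, b_c):
    """M = sup f_target / f_candidate for a beta(a, b) target and beta(a_c, b_c)
    candidate, using part (d); requires a_c <= a and b_c <= b."""
    alpha, beta = a - a_c, b - b_c
    total = alpha + beta
    log_sup = 0.0 if total == 0 else xlog_ratio(alpha, total) + xlog_ratio(beta, total)
    return math.exp(log_beta_const(a, b) - log_beta_const(a_c, b_c) + log_sup)

a, b = 2.7, 6.3
candidates = [(math.floor(a), math.floor(b)), (math.floor(a), b), (a, math.floor(b))]
for cand in candidates:
    print(cand, bound_M(a, b, *cand))
```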

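5.63

$$M = \sup_y \frac{\frac{1}{\sqrt{2\pi}}e^{-y^2/2}}{\frac{1}{2\lambda}e^{-|y|/\lambda}}.$$

Let $f(y) = -\frac{y^2}{2} + \frac{|y|}{\lambda}$. Then $f(y)$ is maximized at $y = \frac{1}{\lambda}$ when $y \ge 0$ and at $y = -\frac{1}{\lambda}$ when $y < 0$. Therefore in both cases

$$M = \frac{\frac{1}{\sqrt{2\pi}}e^{-1/(2\lambda^2)}}{\frac{1}{2\lambda}e^{-1/\lambda^2}} = \frac{2\lambda}{\sqrt{2\pi}}e^{1/(2\lambda^2)}.$$

To minimize $M$, let $M' = \lambda e^{1/(2\lambda^2)}$. Then $\frac{d\log M'}{d\lambda} = \frac{1}{\lambda} - \frac{1}{\lambda^3}$; therefore $M$ is minimized at $\lambda = 1$ or $\lambda = -1$. Since $\lambda > 0$ is a scale parameter, the value of $\lambda$ that will optimize the algorithm is $\lambda = 1$.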
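With $\lambda = 1$ the bound is $M = \sqrt{2e/\pi} \approx 1.3155$, so about 76% of candidates are accepted. The Python sketch below (ours, not the manual's) implements this accept-reject scheme, generating the double exponential candidate as a random sign times an exponential with mean $\lambda$.

```python
import math
import random
import statistics

def bound_M(lam):
    """M(lambda) = (2 lambda / sqrt(2 pi)) exp(1 / (2 lambda^2))."""
    return 2 * lam / math.sqrt(2 * math.pi) * math.exp(1 / (2 * lam**2))

def normal_from_double_exponential(rng, lam=1.0):
    M = bound_M(lam)
    while True:
        e = rng.expovariate(1.0 / lam)            # exponential with mean lam
        y = e if rng.random() < 0.5 else -e       # double exponential candidate
        f = math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
        g = math.exp(-abs(y) / lam) / (2 * lam)
        if rng.random() < f / (M * g):
            return y

rng = random.Random(8)
zs = [normal_from_double_exponential(rng) for _ in range(40000)]
print(bound_M(1.0), statistics.fmean(zs), statistics.variance(zs))
```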

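5.65

$$P(X^* \le x) = \sum_{i=1}^m P(X^* \le x \mid q_i)\,q_i = \sum_{i=1}^m I(Y_i \le x)\,q_i = \frac{\frac{1}{m}\sum_{i=1}^m \frac{f(Y_i)}{g(Y_i)}I(Y_i \le x)}{\frac{1}{m}\sum_{i=1}^m \frac{f(Y_i)}{g(Y_i)}} \xrightarrow[m\to\infty]{} \frac{E_g\left[\frac{f(Y)}{g(Y)}I(Y \le x)\right]}{E_g\left[\frac{f(Y)}{g(Y)}\right]} = \frac{\int_{-\infty}^x \frac{f(y)}{g(y)}g(y)\,dy}{\int_{-\infty}^\infty \frac{f(y)}{g(y)}g(y)\,dy} = \int_{-\infty}^x f(y)\,dy.$$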
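The weights $q_i \propto f(Y_i)/g(Y_i)$ make this a sampling/importance resampling scheme. A minimal Python sketch (our illustration; the normal target and Cauchy candidate are arbitrary choices) uses `random.choices`, which resamples with probabilities proportional to the given weights:

```python
import math
import random
import statistics

def sir(f, g_pdf, g_sampler, m, k, rng):
    """Draw Y_1..Y_m from g, then resample k values with probabilities
    q_i proportional to f(Y_i) / g(Y_i)."""
    ys = [g_sampler(rng) for _ in range(m)]
    weights = [f(y) / g_pdf(y) for y in ys]
    return rng.choices(ys, weights=weights, k=k)

normal_pdf = lambda y: math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
cauchy_pdf = lambda y: 1 / (math.pi * (1 + y * y))
cauchy_draw = lambda rng: math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(9)
xs = sir(normal_pdf, cauchy_pdf, cauchy_draw, m=50000, k=5000, rng=rng)
print(statistics.fmean(xs), statistics.variance(xs))   # roughly 0 and 1
```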

5.67 R code to generate the sample of size 100 from the specified distribution is shown for part (c). The Metropolis Algorithm is used to generate 2000 variables. Among other options, one can choose the 100 variables in positions 1001 to 1100, or the ones in positions 1010, 1020, ..., 2000.

a. We want to generate $X = \sigma Z + \mu$, where $Z \sim$ Student's $t$ with $\nu$ degrees of freedom. Therefore we can first generate a sample of size 100 from a Student's $t$ distribution with $\nu$ degrees of freedom and then make the transformation to obtain the $X$'s. Thus

$$f_Z(z) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\frac{1}{\sqrt{\nu\pi}}\frac{1}{\left(1 + z^2/\nu\right)^{(\nu+1)/2}}.$$

Let $V \sim$ n$\left(0, \frac{\nu}{\nu-2}\right)$, since given $\nu$ we can set

$$EV = EZ = 0 \quad\text{and}\quad \mathrm{Var}(V) = \mathrm{Var}(Z) = \frac{\nu}{\nu-2}.$$

Now, follow the algorithm on page 254 to generate the sample $Z_1, Z_2, \ldots, Z_{100}$ and then calculate $X_i = \sigma Z_i + \mu$.


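b. $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\frac{e^{-(\log x - \mu)^2/(2\sigma^2)}}{x}$. Let $V \sim$ gamma$(\alpha, \beta)$, where

$$\alpha = \frac{\left(e^{\mu + \sigma^2/2}\right)^2}{e^{2(\mu+\sigma^2)} - e^{2\mu+\sigma^2}} \quad\text{and}\quad \beta = \frac{e^{2(\mu+\sigma^2)} - e^{2\mu+\sigma^2}}{e^{\mu+\sigma^2/2}},$$

since given $\mu$ and $\sigma^2$ we can set

$$EV = \alpha\beta = e^{\mu+\sigma^2/2} = EX$$

and

$$\mathrm{Var}(V) = \alpha\beta^2 = e^{2(\mu+\sigma^2)} - e^{2\mu+\sigma^2} = \mathrm{Var}(X).$$

Now, follow the algorithm on page 254.

c. $f_X(x) = \frac{\alpha}{\beta}e^{-x^\alpha/\beta}x^{\alpha-1}$. Let $V \sim$ exponential$(\beta)$. Now, follow the algorithm on page 254, where

$$\rho_i = \min\left(\frac{V_i^{\alpha-1}}{Z_{i-1}^{\alpha-1}}\,e^{\left(-V_i^\alpha + V_i - Z_{i-1} + Z_{i-1}^\alpha\right)/\beta},\ 1\right).$$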

R code to generate a sample of size 100 from a Weibull(3,2) is:

# initialize a and b
b <- 2
a <- 3
Z <- rexp(1, 1/b)
ranvars <- matrix(c(Z), byrow = TRUE, ncol = 1)
for (i in seq(2000))
{
  U <- runif(1, min = 0, max = 1)
  V <- rexp(1, 1/b)
  p <- pmin((V/Z)^(a-1) * exp((-V^a + V - Z + Z^a)/b), 1)
  if (U <= p)
    Z <- V
  ranvars <- cbind(ranvars, Z)
}
# One option: choose elements in positions 1001, 1002, ..., 1100 to be the sample
vector.1 <- ranvars[1001:1100]
mean(vector.1)
var(vector.1)
# Another option: choose elements in positions 1010, 1020, ..., 2000 to be the sample
vector.2 <- ranvars[seq(1010, 2000, 10)]
mean(vector.2)
var(vector.2)

Output:
[1] 1.048035
[1] 0.1758335
[1] 1.130649
[1] 0.1778724
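For readers without R, the same independent-candidate Metropolis chain can be sketched in Python (our translation; it computes the acceptance probability on the log scale to avoid overflow and guards against a zero exponential draw). The target's exact mean, $\beta^{1/\alpha}\Gamma(1 + 1/\alpha) \approx 1.125$ for $\alpha = 3$, $\beta = 2$, follows because $X^\alpha \sim$ exponential$(\beta)$ under this parameterization, and gives a check on the chain.

```python
import math
import random

def positive_exp(mean, rng):
    """Exponential draw with the given mean, redrawn in the (rare) event it is exactly 0."""
    x = 0.0
    while x == 0.0:
        x = rng.expovariate(1.0 / mean)
    return x

def weibull_metropolis(a, b, iterations, rng):
    """Metropolis chain for f(x) = (a/b) x^(a-1) exp(-x^a / b) with exponential(mean b) candidates."""
    z = positive_exp(b, rng)
    chain = [z]
    for _ in range(iterations):
        u = rng.random()
        v = positive_exp(b, rng)
        # log of (v/z)^(a-1) * exp((-v^a + v - z + z^a) / b), to avoid overflow for large z
        log_rho = (a - 1) * (math.log(v) - math.log(z)) + (-v**a + v - z + z**a) / b
        rho = 1.0 if log_rho >= 0 else math.exp(log_rho)
        if u <= rho:
            z = v
        chain.append(z)
    return chain

rng = random.Random(10)
a, b = 3, 2
chain = weibull_metropolis(a, b, 60000, rng)
kept = chain[1000:]
chain_mean = sum(kept) / len(kept)
exact_mean = b ** (1 / a) * math.gamma(1 + 1 / a)
print(chain_mean, exact_mean)
```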

5.69 Let $w(v, z) = \frac{f_Y(v)f_V(z)}{f_V(v)f_Y(z)}$, and then $\rho(v, z) = \min\{w(v, z), 1\}$. We will show that

$$Z_i \sim f_Y \Rightarrow P(Z_{i+1} \le a) = P(Y \le a).$$


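Write

$$P(Z_{i+1} \le a) = P(V_{i+1} \le a \text{ and } U_{i+1} \le \rho_{i+1}) + P(Z_i \le a \text{ and } U_{i+1} > \rho_{i+1}).$$

Since $Z_i \sim f_Y$, suppressing the unnecessary subscripts we can write

$$P(Z_{i+1} \le a) = P(V \le a \text{ and } U \le \rho(V, Y)) + P(Y \le a \text{ and } U > \rho(V, Y)).$$

Add and subtract $P(Y \le a \text{ and } U \le \rho(V, Y))$ to get

$$P(Z_{i+1} \le a) = P(Y \le a) + P(V \le a \text{ and } U \le \rho(V, Y)) - P(Y \le a \text{ and } U \le \rho(V, Y)).$$

Thus we need to show that

$$P(V \le a \text{ and } U \le \rho(V, Y)) = P(Y \le a \text{ and } U \le \rho(V, Y)).$$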

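Write out the probability as

$$\begin{aligned}
P(V \le a \text{ and } U \le \rho(V, Y)) &= \int_{-\infty}^a\int_{-\infty}^\infty \rho(v, y)\,f_Y(y)f_V(v)\,dy\,dv\\
&= \int_{-\infty}^a\int_{-\infty}^\infty I(w(v, y) \le 1)\,\frac{f_Y(v)f_V(y)}{f_V(v)f_Y(y)}\,f_Y(y)f_V(v)\,dy\,dv + \int_{-\infty}^a\int_{-\infty}^\infty I(w(v, y) > 1)\,f_Y(y)f_V(v)\,dy\,dv\\
&= \int_{-\infty}^a\int_{-\infty}^\infty I(w(v, y) \le 1)\,f_Y(v)f_V(y)\,dy\,dv + \int_{-\infty}^a\int_{-\infty}^\infty I(w(v, y) > 1)\,f_Y(y)f_V(v)\,dy\,dv.
\end{aligned}$$

Now, notice that $w(v, y) = 1/w(y, v)$, and thus the first term above can be written

$$\int_{-\infty}^a\int_{-\infty}^\infty I(w(v, y) \le 1)\,f_Y(v)f_V(y)\,dy\,dv = \int_{-\infty}^a\int_{-\infty}^\infty I(w(y, v) \ge 1)\,f_Y(v)f_V(y)\,dy\,dv = P(Y \le a,\ \rho(V, Y) = 1,\ U \le \rho(V, Y)).$$

The second term is

$$\begin{aligned}
\int_{-\infty}^a\int_{-\infty}^\infty I(w(v, y) > 1)\,f_Y(y)f_V(v)\,dy\,dv &= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y, v) < 1)\,f_Y(y)f_V(v)\,dy\,dv\\
&= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y, v) < 1)\,\frac{f_V(y)f_Y(v)}{f_V(y)f_Y(v)}\,f_Y(y)f_V(v)\,dy\,dv\\
&= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y, v) < 1)\,\frac{f_Y(y)f_V(v)}{f_V(y)f_Y(v)}\,f_V(y)f_Y(v)\,dy\,dv\\
&= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y, v) < 1)\,w(y, v)\,f_V(y)f_Y(v)\,dy\,dv\\
&= P(Y \le a,\ U \le \rho(V, Y),\ \rho(V, Y) < 1).
\end{aligned}$$

Putting it all together, we have

$$\begin{aligned}
P(V \le a \text{ and } U \le \rho(V, Y)) &= P(Y \le a,\ \rho(V, Y) = 1,\ U \le \rho(V, Y)) + P(Y \le a,\ U \le \rho(V, Y),\ \rho(V, Y) < 1)\\
&= P(Y \le a \text{ and } U \le \rho(V, Y)),
\end{aligned}$$

and hence

$$P(Z_{i+1} \le a) = P(Y \le a),$$

so $f_Y$ is the stationary density.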

Chapter 6

Principles of Data Reduction

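6.1 By the Factorization Theorem, $|X|$ is sufficient because the pdf of $X$ is

$$f(x|\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2} = \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma}e^{-|x|^2/2\sigma^2}}_{g(|x|\,|\,\sigma^2)}\cdot\underbrace{1}_{h(x)}.$$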

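6.2 By the Factorization Theorem, $T(X) = \min_i(X_i/i)$ is sufficient because the joint pdf is

$$f(x_1, \ldots, x_n|\theta) = \prod_{i=1}^n e^{i\theta - x_i}I_{(i\theta, +\infty)}(x_i) = \underbrace{e^{n(n+1)\theta/2}\,I_{(\theta, +\infty)}(T(x))}_{g(T(x)|\theta)}\cdot\underbrace{e^{-\sum_i x_i}}_{h(x)},$$

where $n(n+1)/2 = \sum_{i=1}^n i$. Notice, we use the fact that $i > 0$, and the fact that all $x_i > i\theta$ if and only if $\min_i(x_i/i) > \theta$.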

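6.3 Let $x_{(1)} = \min_i x_i$. Then the joint pdf is

$$f(x_1, \ldots, x_n|\mu, \sigma) = \prod_{i=1}^n \frac{1}{\sigma}e^{-(x_i-\mu)/\sigma}I_{(\mu,\infty)}(x_i) = \underbrace{\frac{e^{n\mu/\sigma}}{\sigma^n}e^{-\sum_i x_i/\sigma}I_{(\mu,\infty)}(x_{(1)})}_{g(x_{(1)},\,\sum_i x_i\,|\,\mu,\sigma)}\cdot\underbrace{1}_{h(x)}.$$

Thus, by the Factorization Theorem, $\left(X_{(1)}, \sum_i X_i\right)$ is a sufficient statistic for $(\mu, \sigma)$.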

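6.4 The joint pdf is

$$\prod_{j=1}^n\left(h(x_j)c(\theta)\exp\left(\sum_{i=1}^k w_i(\theta)t_i(x_j)\right)\right) = \underbrace{c(\theta)^n\exp\left(\sum_{i=1}^k w_i(\theta)\sum_{j=1}^n t_i(x_j)\right)}_{g(T(x)|\theta)}\cdot\underbrace{\prod_{j=1}^n h(x_j)}_{h(x)}.$$

By the Factorization Theorem, $\left(\sum_{j=1}^n t_1(X_j), \ldots, \sum_{j=1}^n t_k(X_j)\right)$ is a sufficient statistic for $\theta$.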

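6.5 The sample density is given by

$$\prod_{i=1}^n f(x_i|\theta) = \prod_{i=1}^n \frac{1}{2i\theta}I\left(-i(\theta-1) \le x_i \le i(\theta+1)\right) = \left(\frac{1}{2\theta}\right)^n\left(\prod_{i=1}^n \frac{1}{i}\right)I\left(\min_i\frac{x_i}{i} \ge -(\theta-1)\right)I\left(\max_i\frac{x_i}{i} \le \theta+1\right).$$

Thus $(\min_i X_i/i, \max_i X_i/i)$ is sufficient for $\theta$.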


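6.6 The joint pdf is given by

$$f(x_1, \ldots, x_n|\alpha, \beta) = \prod_{i=1}^n \frac{1}{\Gamma(\alpha)\beta^\alpha}x_i^{\alpha-1}e^{-x_i/\beta} = \left(\frac{1}{\Gamma(\alpha)\beta^\alpha}\right)^n\left(\prod_{i=1}^n x_i\right)^{\alpha-1}e^{-\sum_i x_i/\beta}.$$

By the Factorization Theorem, $\left(\prod_{i=1}^n X_i, \sum_{i=1}^n X_i\right)$ is sufficient for $(\alpha, \beta)$.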

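6.7 Let $x_{(1)} = \min_i\{x_1, \ldots, x_n\}$, $x_{(n)} = \max_i\{x_1, \ldots, x_n\}$, $y_{(1)} = \min_i\{y_1, \ldots, y_n\}$, and $y_{(n)} = \max_i\{y_1, \ldots, y_n\}$. Then the joint pdf is

$$\begin{aligned} f(x, y|\theta) &= \prod_{i=1}^n \frac{1}{(\theta_3-\theta_1)(\theta_4-\theta_2)}I_{(\theta_1,\theta_3)}(x_i)I_{(\theta_2,\theta_4)}(y_i)\\ &= \underbrace{\left(\frac{1}{(\theta_3-\theta_1)(\theta_4-\theta_2)}\right)^n I_{(\theta_1,\infty)}(x_{(1)})I_{(-\infty,\theta_3)}(x_{(n)})I_{(\theta_2,\infty)}(y_{(1)})I_{(-\infty,\theta_4)}(y_{(n)})}_{g(T(x)|\theta)}\cdot\underbrace{1}_{h(x)}. \end{aligned}$$

By the Factorization Theorem, $\left(X_{(1)}, X_{(n)}, Y_{(1)}, Y_{(n)}\right)$ is sufficient for $(\theta_1, \theta_2, \theta_3, \theta_4)$.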

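6.9 a. Use Theorem 6.2.13.

$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{(2\pi)^{-n/2}e^{-\sum_i(x_i-\theta)^2/2}}{(2\pi)^{-n/2}e^{-\sum_i(y_i-\theta)^2/2}} = \exp\left\{-\frac{1}{2}\left[\left(\sum_{i=1}^n x_i^2 - \sum_{i=1}^n y_i^2\right) + 2\theta n(\bar y - \bar x)\right]\right\}.$$

This is constant as a function of $\theta$ if and only if $\bar y = \bar x$; therefore $\bar X$ is a minimal sufficient statistic for $\theta$.

b. Note, for $X \sim$ location exponential$(\theta)$, the range depends on the parameter. Now

$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{\prod_{i=1}^n e^{-(x_i-\theta)}I_{(\theta,\infty)}(x_i)}{\prod_{i=1}^n e^{-(y_i-\theta)}I_{(\theta,\infty)}(y_i)} = \frac{e^{n\theta}e^{-\sum_i x_i}\prod_{i=1}^n I_{(\theta,\infty)}(x_i)}{e^{n\theta}e^{-\sum_i y_i}\prod_{i=1}^n I_{(\theta,\infty)}(y_i)} = \frac{e^{-\sum_i x_i}I_{(\theta,\infty)}(\min x_i)}{e^{-\sum_i y_i}I_{(\theta,\infty)}(\min y_i)}.$$

To make the ratio independent of $\theta$ we need the ratio of indicator functions to be independent of $\theta$. This will be the case if and only if $\min\{x_1, \ldots, x_n\} = \min\{y_1, \ldots, y_n\}$. So $T(X) = \min\{X_1, \ldots, X_n\}$ is a minimal sufficient statistic.

c.

$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{e^{-\sum_i(x_i-\theta)}}{\prod_{i=1}^n\left(1 + e^{-(x_i-\theta)}\right)^2}\cdot\frac{\prod_{i=1}^n\left(1 + e^{-(y_i-\theta)}\right)^2}{e^{-\sum_i(y_i-\theta)}} = e^{-\sum_i(x_i-y_i)}\left(\frac{\prod_{i=1}^n\left(1 + e^{-(y_i-\theta)}\right)}{\prod_{i=1}^n\left(1 + e^{-(x_i-\theta)}\right)}\right)^2.$$

This is constant as a function of $\theta$ if and only if $x$ and $y$ have the same order statistics. Therefore, the order statistics are minimal sufficient for $\theta$.

d. This is a difficult problem. The order statistics are a minimal sufficient statistic.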


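e. Fix sample points $x$ and $y$. Define $A(\theta) = \{i : x_i \le \theta\}$, $B(\theta) = \{i : y_i \le \theta\}$, $a(\theta) =$ the number of elements in $A(\theta)$, and $b(\theta) =$ the number of elements in $B(\theta)$. Then the function $f(x|\theta)/f(y|\theta)$ depends on $\theta$ only through the function

$$\begin{aligned} \sum_{i=1}^n|x_i-\theta| - \sum_{i=1}^n|y_i-\theta| &= \sum_{i\in A(\theta)}(\theta - x_i) + \sum_{i\in A(\theta)^c}(x_i - \theta) - \sum_{i\in B(\theta)}(\theta - y_i) - \sum_{i\in B(\theta)^c}(y_i - \theta)\\ &= \left(a(\theta) - [n - a(\theta)] - b(\theta) + [n - b(\theta)]\right)\theta + \left(-\sum_{i\in A(\theta)}x_i + \sum_{i\in A(\theta)^c}x_i + \sum_{i\in B(\theta)}y_i - \sum_{i\in B(\theta)^c}y_i\right)\\ &= 2\left(a(\theta) - b(\theta)\right)\theta + \left(-\sum_{i\in A(\theta)}x_i + \sum_{i\in A(\theta)^c}x_i + \sum_{i\in B(\theta)}y_i - \sum_{i\in B(\theta)^c}y_i\right). \end{aligned}$$

Consider an interval of $\theta$s that does not contain any $x_i$s or $y_i$s. The second term is constant on such an interval. The first term will be constant on the interval if and only if $a(\theta) = b(\theta)$. This will be true for all such intervals if and only if the order statistics for $x$ are the same as the order statistics for $y$. Therefore, the order statistics are a minimal sufficient statistic.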

6.10 To prove $T(X) = (X_{(1)}, X_{(n)})$ is not complete, we want to find $g[T(X)]$ such that $E\,g[T(X)] = 0$ for all $\theta$, but $g[T(X)] \not\equiv 0$. A natural candidate is $R = X_{(n)} - X_{(1)}$, the range of $X$, because by Example 6.2.17 its distribution does not depend on $\theta$. From Example 6.2.17, $R \sim$ beta$(n-1, 2)$. Thus $ER = (n-1)/(n+1)$ does not depend on $\theta$, and $E(R - ER) = 0$ for all $\theta$. Thus $g[X_{(n)}, X_{(1)}] = X_{(n)} - X_{(1)} - (n-1)/(n+1) = R - ER$ is a nonzero function whose expected value is always 0. So, $(X_{(1)}, X_{(n)})$ is not complete. This problem can be generalized to show that if a function of a sufficient statistic is ancillary, then the sufficient statistic is not complete, because the expectation of that function does not depend on $\theta$. That provides the opportunity to construct an unbiased, nonzero estimator of zero.
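The key fact, that the distribution of $R$ (and hence $ER = (n-1)/(n+1)$) does not depend on $\theta$, is easy to see by simulation. This sketch assumes a uniform$(\theta, \theta+1)$ sample, which is our reading of the setting of Example 6.2.17; the Python code and the particular values of $\theta$ and $n$ are our own.

```python
import random

def mean_range(theta, n, reps, rng):
    """Monte Carlo estimate of E[X_(n) - X_(1)] for n iid uniform(theta, theta + 1) draws."""
    total = 0.0
    for _ in range(reps):
        xs = [theta + rng.random() for _ in range(n)]
        total += max(xs) - min(xs)
    return total / reps

rng = random.Random(11)
n = 5
estimates = {theta: mean_range(theta, n, 20000, rng) for theta in (0.0, 10.0, 1000.0)}
print(estimates, (n - 1) / (n + 1))
```

The three estimates agree with $(n-1)/(n+1) = 2/3$ up to simulation error, whatever the value of $\theta$.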

6.11 a. These are all location families. Let $Z_{(1)}, \ldots, Z_{(n)}$ be the order statistics from a random sample of size $n$ from the standard pdf $f(z|0)$. Then $(Z_{(1)} + \theta, \ldots, Z_{(n)} + \theta)$ has the same joint distribution as $(X_{(1)}, \ldots, X_{(n)})$, and $(Y_1, \ldots, Y_{n-1})$ has the same joint distribution as

$$\left(Z_{(n)} + \theta - (Z_{(1)} + \theta), \ldots, Z_{(n)} + \theta - (Z_{(n-1)} + \theta)\right) = \left(Z_{(n)} - Z_{(1)}, \ldots, Z_{(n)} - Z_{(n-1)}\right).$$

The last vector depends only on $(Z_1, \ldots, Z_n)$, whose distribution does not depend on $\theta$. So, $(Y_1, \ldots, Y_{n-1})$ is ancillary.

b. For a), Basu's lemma shows that $(Y_1, \ldots, Y_{n-1})$ is independent of the complete sufficient statistic. For c), d), and e) the order statistics are sufficient, so $(Y_1, \ldots, Y_{n-1})$ is not independent of the sufficient statistic. For b), $X_{(1)}$ is sufficient. Define $Y_n = X_{(1)}$. Then the joint pdf of $(Y_1, \ldots, Y_n)$ is

$$f(y_1, \ldots, y_n) = n!\,e^{-n(y_n-\theta)}e^{-(n-1)y_1}\prod_{i=2}^{n-1}e^{y_i}, \quad 0 < y_{n-1} < y_{n-2} < \cdots < y_1,\quad \theta < y_n < \infty.$$

Thus, $Y_n = X_{(1)}$ is independent of $(Y_1, \ldots, Y_{n-1})$.

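6.12 a. Use Theorem 6.2.13 and write

$$\frac{f(x, n|\theta)}{f(y, n'|\theta)} = \frac{f(x|\theta, N = n)P(N = n)}{f(y|\theta, N = n')P(N = n')} = \frac{\binom{n}{x}\theta^x(1-\theta)^{n-x}p_n}{\binom{n'}{y}\theta^y(1-\theta)^{n'-y}p_{n'}} = \theta^{x-y}(1-\theta)^{n-n'-x+y}\,\frac{\binom{n}{x}p_n}{\binom{n'}{y}p_{n'}}.$$

The last ratio does not depend on $\theta$. The other terms are constant as a function of $\theta$ if and only if $n = n'$ and $x = y$. So $(X, N)$ is minimal sufficient for $\theta$. Because $P(N = n) = p_n$ does not depend on $\theta$, $N$ is ancillary for $\theta$. The point is that although the distribution of $N$ does not depend on $\theta$, the minimal sufficient statistic contains $N$ in this case. A minimal sufficient statistic may contain an ancillary statistic.

b.

$$E\left(\frac{X}{N}\right) = E\left(E\left(\frac{X}{N}\,\Big|\,N\right)\right) = E\left(\frac{1}{N}E(X|N)\right) = E\left(\frac{1}{N}N\theta\right) = E(\theta) = \theta.$$

$$\mathrm{Var}\left(\frac{X}{N}\right) = \mathrm{Var}\left(E\left(\frac{X}{N}\,\Big|\,N\right)\right) + E\left(\mathrm{Var}\left(\frac{X}{N}\,\Big|\,N\right)\right) = \mathrm{Var}(\theta) + E\left(\frac{1}{N^2}\mathrm{Var}(X|N)\right) = 0 + E\left(\frac{N\theta(1-\theta)}{N^2}\right) = \theta(1-\theta)E\left(\frac{1}{N}\right).$$

We used the fact that $X|N \sim$ binomial$(N, \theta)$.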

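6.13 Let $Y_1 = \log X_1$ and $Y_2 = \log X_2$. Then $Y_1$ and $Y_2$ are iid and, by Theorem 2.1.5, the pdf of each is

$$f(y|\alpha) = \alpha\exp\left\{\alpha y - e^{\alpha y}\right\} = \frac{1}{1/\alpha}\exp\left\{\frac{y}{1/\alpha} - e^{y/(1/\alpha)}\right\}, \quad -\infty < y < \infty.$$

We see that the family of distributions of $Y_i$ is a scale family with scale parameter $1/\alpha$. Thus, by Theorem 3.5.6, we can write $Y_i = \frac{1}{\alpha}Z_i$, where $Z_1$ and $Z_2$ are a random sample from $f(z|1)$. Then

$$\frac{\log X_1}{\log X_2} = \frac{Y_1}{Y_2} = \frac{(1/\alpha)Z_1}{(1/\alpha)Z_2} = \frac{Z_1}{Z_2}.$$

Because the distribution of $Z_1/Z_2$ does not depend on $\alpha$, $(\log X_1)/(\log X_2)$ is an ancillary statistic.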

6.14 Because $X_1, \ldots, X_n$ is from a location family, by Theorem 3.5.6, we can write $X_i = Z_i + \mu$, where $Z_1, \ldots, Z_n$ is a random sample from the standard pdf $f(z)$, and $\mu$ is the location parameter. Let $M(X)$ denote the median calculated from $X_1, \ldots, X_n$. Then $M(X) = M(Z) + \mu$ and $\bar X = \bar Z + \mu$. Thus, $M(X) - \bar X = (M(Z) + \mu) - (\bar Z + \mu) = M(Z) - \bar Z$. Because $M(X) - \bar X$ is a function of only $Z_1, \ldots, Z_n$, the distribution of $M(X) - \bar X$ does not depend on $\mu$; that is, $M(X) - \bar X$ is an ancillary statistic.

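6.15 a. The parameter space consists only of the points $(\theta, \nu)$ on the graph of the function $\nu = a\theta^2$. This quadratic graph is a curve (a one-dimensional set) and does not contain a two-dimensional open set.

b. Use the same factorization as in Example 6.2.9 to show $(\bar X, S^2)$ is sufficient. $E(S^2) = a\theta^2$ and $E(\bar X^2) = \mathrm{Var}\,\bar X + (E\bar X)^2 = a\theta^2/n + \theta^2 = (a+n)\theta^2/n$. Therefore,

$$E\left(\frac{n}{a+n}\bar X^2 - \frac{S^2}{a}\right) = \frac{n}{a+n}\cdot\frac{a+n}{n}\theta^2 - \frac{1}{a}a\theta^2 = 0, \quad \text{for all } \theta.$$

Thus $g(\bar X, S^2) = \frac{n}{a+n}\bar X^2 - \frac{S^2}{a}$ has zero expectation, so $(\bar X, S^2)$ is not complete.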

6.17 The population pmf is $f(x|\theta) = \theta(1-\theta)^{x-1} = \frac{\theta}{1-\theta}e^{\log(1-\theta)x}$, an exponential family with $t(x) = x$. Thus, $\sum_i X_i$ is a complete, sufficient statistic by Theorems 6.2.10 and 6.2.25, and $\sum_i X_i - n \sim$ negative binomial$(n, \theta)$.

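6.18 The distribution of $Y = \sum_i X_i$ is Poisson$(n\lambda)$. Now

$$E g(Y) = \sum_{y=0}^\infty g(y)\frac{(n\lambda)^y e^{-n\lambda}}{y!}.$$

If the expectation exists, $e^{n\lambda}E g(Y) = \sum_{y=0}^\infty \frac{g(y)n^y}{y!}\lambda^y$ is a power series in $\lambda$, an analytic function. It is identically zero for all $\lambda > 0$ only if every coefficient $g(y)n^y/y!$ is zero, that is, only if $g(y) = 0$ for all $y$. So $Y$ is complete.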


6.19 To check if the family of distributions of $X$ is complete, we check if $E_p g(X) = 0$ for all $p$ implies that $g(X) \equiv 0$. For Distribution 1,

$$E_p g(X) = \sum_{x=0}^2 g(x)P(X = x) = pg(0) + 3pg(1) + (1-4p)g(2).$$

Note that if $g(0) = -3g(1)$ and $g(2) = 0$, then the expectation is zero for all $p$, but $g(x)$ need not be identically zero. Hence the family is not complete. For Distribution 2, calculate

$$E_p g(X) = g(0)p + g(1)p^2 + g(2)(1 - p - p^2) = [g(1) - g(2)]p^2 + [g(0) - g(2)]p + g(2).$$

This is a polynomial of degree 2 in $p$. To make it zero for all $p$, each coefficient must be zero. Thus, $g(0) = g(1) = g(2) = 0$, so the family of distributions is complete.

6.20 The pdfs in b), c), and e) are exponential families, so they have complete sufficient statistics from Theorem 6.2.25. For a), $Y = \max\{X_i\}$ is sufficient and

$$f(y) = \frac{2n}{\theta^{2n}}y^{2n-1}, \quad 0 < y < \theta.$$

For a function $g(y)$,

$$E g(Y) = \int_0^\theta g(y)\frac{2n}{\theta^{2n}}y^{2n-1}\,dy = 0 \text{ for all } \theta \quad\text{implies}\quad g(\theta)\frac{2n\theta^{2n-1}}{\theta^{2n}} = 0 \text{ for all } \theta,$$

by multiplying through by $\theta^{2n}$ and taking derivatives with respect to $\theta$. This can only be zero if $g(\theta) = 0$ for all $\theta$, so $Y = \max\{X_i\}$ is complete.

For d), the order statistics are minimal sufficient. This is a location family. Thus, by Example 6.2.18, the range $R = X_{(n)} - X_{(1)}$ is ancillary, and its expectation does not depend on $\theta$. So this sufficient statistic is not complete.

6.21 a. $X$ is sufficient because it is the data. To check completeness, calculate

$$E g(X) = \frac{\theta}{2}g(-1) + (1-\theta)g(0) + \frac{\theta}{2}g(1).$$

If $g(-1) = -g(1)$ and $g(0) = 0$, then $E g(X) = 0$ for all $\theta$, but $g(x)$ need not be identically 0. So the family is not complete.

b. $|X|$ is sufficient by Theorem 6.2.6, because $f(x|\theta)$ depends on $x$ only through the value of $|x|$. The distribution of $|X|$ is Bernoulli, because $P(|X| = 0) = 1 - \theta$ and $P(|X| = 1) = \theta$. By Example 6.2.22, a binomial family (Bernoulli is a special case) is complete.

c. Yes, $f(x|\theta) = (1-\theta)\left(\theta/(2(1-\theta))\right)^{|x|} = (1-\theta)e^{|x|\log[\theta/(2(1-\theta))]}$, the form of an exponential family.

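6.22 a. The sample density is $\prod_i \theta x_i^{\theta-1} = \theta^n\left(\prod_i x_i\right)^{\theta-1}$, so $\prod_i X_i$ is sufficient for $\theta$, not $\sum_i X_i$.

b. Because $\prod_i f(x_i|\theta) = \theta^n e^{(\theta-1)\log(\prod_i x_i)}$, $\log\left(\prod_i X_i\right)$ is complete and sufficient by Theorem 6.2.25. Because $\prod_i X_i$ is a one-to-one function of $\log\left(\prod_i X_i\right)$, $\prod_i X_i$ is also a complete sufficient statistic.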

6.23 Use Theorem 6.2.13. The ratio

$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{\theta^{-n}I_{(x_{(n)}/2,\,x_{(1)})}(\theta)}{\theta^{-n}I_{(y_{(n)}/2,\,y_{(1)})}(\theta)}$$

is constant (in fact, one) if and only if $x_{(1)} = y_{(1)}$ and $x_{(n)} = y_{(n)}$. So $(X_{(1)}, X_{(n)})$ is a minimal sufficient statistic for $\theta$. From Exercise 6.10, we know that if a function of the sufficient statistics is ancillary, then the sufficient statistic is not complete. The uniform$(\theta, 2\theta)$ family is a scale family, with standard pdf $f(z) \sim$ uniform(1,2). So if $Z_1, \ldots, Z_n$ is a random sample from a uniform(1,2) population, then $X_1 = \theta Z_1, \ldots, X_n = \theta Z_n$ is a random sample from a uniform$(\theta, 2\theta)$ population, and $X_{(1)} = \theta Z_{(1)}$ and $X_{(n)} = \theta Z_{(n)}$. So $X_{(1)}/X_{(n)} = Z_{(1)}/Z_{(n)}$, a statistic whose distribution does not depend on $\theta$. Thus, as in Exercise 6.10, $(X_{(1)}, X_{(n)})$ is not complete.

6.24 If $\lambda = 0$, $E h(X) = h(0)$. If $\lambda = 1$,

$$E h(X) = e^{-1}h(0) + e^{-1}\sum_{x=1}^\infty \frac{h(x)}{x!}.$$

Let $h(0) = 0$ and $\sum_{x=1}^\infty \frac{h(x)}{x!} = 0$, so $E h(X) = 0$ but $h(x) \not\equiv 0$. (For example, take $h(0) = 0$, $h(1) = 1$, $h(2) = -2$, $h(x) = 0$ for $x \ge 3$.)

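6.25 Using the fact that $(n-1)s_x^2 = \sum_i x_i^2 - n\bar x^2$, for any $(\mu, \sigma^2)$ the ratio in Example 6.2.14 can be written as

$$\frac{f(x|\mu, \sigma^2)}{f(y|\mu, \sigma^2)} = \exp\left[\frac{\mu}{\sigma^2}\left(\sum x_i - \sum y_i\right) - \frac{1}{2\sigma^2}\left(\sum x_i^2 - \sum y_i^2\right)\right].$$

a. Do part b) first, showing that $\sum_i X_i^2$ is a minimal sufficient statistic. Because $\left(\sum_i X_i, \sum_i X_i^2\right)$ is not a function of $\sum_i X_i^2$, by Definition 6.2.11 $\left(\sum_i X_i, \sum_i X_i^2\right)$ is not minimal.

b. Substituting $\sigma^2 = \mu$ in the above expression yields

$$\frac{f(x|\mu, \mu)}{f(y|\mu, \mu)} = \exp\left[\sum x_i - \sum y_i\right]\exp\left[-\frac{1}{2\mu}\left(\sum x_i^2 - \sum y_i^2\right)\right].$$

This is constant as a function of $\mu$ if and only if $\sum_i x_i^2 = \sum_i y_i^2$. Thus, $\sum_i X_i^2$ is a minimal sufficient statistic.

c. Substituting $\sigma^2 = \mu^2$ in the first expression yields

$$\frac{f(x|\mu, \mu^2)}{f(y|\mu, \mu^2)} = \exp\left[\frac{1}{\mu}\left(\sum x_i - \sum y_i\right) - \frac{1}{2\mu^2}\left(\sum x_i^2 - \sum y_i^2\right)\right].$$

This is constant as a function of $\mu$ if and only if $\sum_i x_i = \sum_i y_i$ and $\sum_i x_i^2 = \sum_i y_i^2$. Thus, $\left(\sum_i X_i, \sum_i X_i^2\right)$ is a minimal sufficient statistic.

d. The first expression for the ratio is constant as a function of $\mu$ and $\sigma^2$ if and only if $\sum_i x_i = \sum_i y_i$ and $\sum_i x_i^2 = \sum_i y_i^2$. Thus, $\left(\sum_i X_i, \sum_i X_i^2\right)$ is a minimal sufficient statistic.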

6.27 a. This pdf can be written as

$$f(x|\mu, \lambda) = \left(\frac{\lambda}{2\pi}\right)^{1/2}\left(\frac{1}{x^3}\right)^{1/2}\exp\left(\frac{\lambda}{\mu}\right)\exp\left(-\frac{\lambda}{2\mu^2}x - \frac{\lambda}{2}\,\frac{1}{x}\right).$$

This is an exponential family with $t_1(x) = x$ and $t_2(x) = 1/x$. By Theorem 6.2.25, the statistic $\left(\sum_i X_i, \sum_i(1/X_i)\right)$ is a complete sufficient statistic. $(\bar X, T)$ given in the problem is a one-to-one function of $\left(\sum_i X_i, \sum_i(1/X_i)\right)$. Thus, $(\bar X, T)$ is also a complete sufficient statistic.

b. This can be accomplished using the methods from Section 4.3 by a straightforward but messy two-variable transformation $U = (X_1 + X_2)/2$ and $V = 2\lambda/T = \lambda[(1/X_1) + (1/X_2) - (2/[X_1 + X_2])]$. This is a two-to-one transformation.


6.29 Let $f_j =$ logistic$(\alpha_j, \beta_j)$, $j = 0, 1, \ldots, k$. From Theorem 6.6.5, the statistic

$$T(x) = \left(\frac{\prod_{i=1}^n f_1(x_i)}{\prod_{i=1}^n f_0(x_i)}, \ldots, \frac{\prod_{i=1}^n f_k(x_i)}{\prod_{i=1}^n f_0(x_i)}\right) = \left(\frac{\prod_{i=1}^n f_1(x_{(i)})}{\prod_{i=1}^n f_0(x_{(i)})}, \ldots, \frac{\prod_{i=1}^n f_k(x_{(i)})}{\prod_{i=1}^n f_0(x_{(i)})}\right)$$

is minimal sufficient for the family $\{f_0, f_1, \ldots, f_k\}$. As $T$ is a one-to-one function of the order statistics, the order statistics are also minimal sufficient for the family $\{f_0, f_1, \ldots, f_k\}$. If $\mathcal{F}$ is a nonparametric family, $f_j \in \mathcal{F}$, so part (b) of Theorem 6.6.5 can now be directly applied to show that the order statistics are minimal sufficient for $\mathcal{F}$.

6.30 a. From Exercise 6.9b, we have that $X_{(1)}$ is a minimal sufficient statistic. To check completeness, compute $f_{Y_1}(y)$, where $Y_1 = X_{(1)}$. From Theorem 5.4.4 we have

$$f_{Y_1}(y) = f_X(y)\left(1 - F_X(y)\right)^{n-1}n = e^{-(y-\mu)}\left[e^{-(y-\mu)}\right]^{n-1}n = ne^{-n(y-\mu)}, \quad y > \mu.$$

Now, write $E_\mu g(Y_1) = \int_\mu^\infty g(y)ne^{-n(y-\mu)}\,dy$. If this is zero for all $\mu$, then $\int_\mu^\infty g(y)e^{-ny}\,dy = 0$ for all $\mu$ (because $ne^{n\mu} > 0$ for all $\mu$ and does not depend on $y$). Moreover,

$$0 = \frac{d}{d\mu}\int_\mu^\infty g(y)e^{-ny}\,dy = -g(\mu)e^{-n\mu}$$

for all $\mu$. This implies $g(\mu) = 0$ for all $\mu$, so $X_{(1)}$ is complete.

b. Basu's Theorem says that if $X_{(1)}$ is a complete sufficient statistic for $\mu$, then $X_{(1)}$ is independent of any ancillary statistic. Therefore, we need to show only that $S^2$ has distribution independent of $\mu$; that is, $S^2$ is ancillary. Recognize that $f(x|\mu)$ is a location family. So we can write $X_i = Z_i + \mu$, where $Z_1, \ldots, Z_n$ is a random sample from $f(x|0)$. Then

$$S^2 = \frac{1}{n-1}\sum(X_i - \bar X)^2 = \frac{1}{n-1}\sum\left((Z_i + \mu) - (\bar Z + \mu)\right)^2 = \frac{1}{n-1}\sum(Z_i - \bar Z)^2.$$

Because $S^2$ is a function of only $Z_1, \ldots, Z_n$, the distribution of $S^2$ does not depend on $\mu$; that is, $S^2$ is ancillary. Therefore, by Basu's Theorem, $S^2$ is independent of $X_{(1)}$.

6.31 a. (i) By Exercise 3.28 this is a one-dimensional exponential family with $t(x) = x$. By Theorem 6.2.25, $\sum_i X_i$ is a complete sufficient statistic. $\bar X$ is a one-to-one function of $\sum_i X_i$, so $\bar X$ is also a complete sufficient statistic. From Theorem 5.3.1 we know that $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1} =$ gamma$((n-1)/2, 2)$. $S^2 = [\sigma^2/(n-1)][(n-1)S^2/\sigma^2]$, a simple scale transformation, has a gamma$((n-1)/2, 2\sigma^2/(n-1))$ distribution, which does not depend on $\mu$; that is, $S^2$ is ancillary. By Basu's Theorem, $\bar X$ and $S^2$ are independent.

(ii) The independence of $\bar X$ and $S^2$ is determined by the joint distribution of $(\bar X, S^2)$ for each value of $(\mu, \sigma^2)$. By part (i), for each value of $(\mu, \sigma^2)$, $\bar X$ and $S^2$ are independent.

b. (i) $\mu$ is a location parameter. By Exercise 6.14, $M - \bar X$ is ancillary. As in part (a), $\bar X$ is a complete sufficient statistic. By Basu's Theorem, $\bar X$ and $M - \bar X$ are independent. Because they are independent, by Theorem 4.5.6,

$$\mathrm{Var}\,M = \mathrm{Var}(M - \bar X + \bar X) = \mathrm{Var}(M - \bar X) + \mathrm{Var}\,\bar X.$$

(ii) If $S^2$ is a sample variance calculated from a normal sample of size $N$, $(N-1)S^2/\sigma^2 \sim \chi^2_{N-1}$. Hence, $(N-1)^2\,\mathrm{Var}\,S^2/(\sigma^2)^2 = 2(N-1)$ and $\mathrm{Var}\,S^2 = 2(\sigma^2)^2/(N-1)$. Both $M$ and $M - \bar X$ are asymptotically normal, so $M_1, \ldots, M_N$ and $M_1 - \bar X_1, \ldots, M_N - \bar X_N$ are each approximately normal samples if $n$ is reasonably large. Thus, using the above expression, we get the two given expressions, where in the straightforward case $\sigma^2$ refers to $\mathrm{Var}\,M$, and in the swindle case $\sigma^2$ refers to $\mathrm{Var}(M - \bar X)$.
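The point of the swindle in part (b) is that $\mathrm{Var}\,\bar X = \sigma^2/n$ is known exactly, so only the smaller quantity $\mathrm{Var}(M - \bar X)$ has to be estimated by simulation. The Python sketch below (ours; standard normal samples of size $n = 15$ are an arbitrary choice) computes both estimates of $\mathrm{Var}\,M$ from the same simulated samples.

```python
import random
import statistics

def median_variance_two_ways(n, reps, rng):
    """Estimate Var(M) directly and via the swindle Var(M - Xbar) + sigma^2/n, for n(0, 1) samples."""
    medians, diffs = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m, xbar = statistics.median(xs), statistics.fmean(xs)
        medians.append(m)
        diffs.append(m - xbar)
    straightforward = statistics.variance(medians)
    swindle = statistics.variance(diffs) + 1.0 / n   # Var(Xbar) = sigma^2 / n with sigma = 1
    return straightforward, swindle, statistics.variance(diffs)

rng = random.Random(12)
direct, swindle, var_diff = median_variance_two_ways(15, 4000, rng)
print(direct, swindle, var_diff)
```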

c. (i)

$$E(X^k) = E\left(\frac{X}{Y}Y\right)^k = E\left[\left(\frac{X}{Y}\right)^kY^k\right] \overset{\text{indep.}}{=} E\left(\frac{X}{Y}\right)^k E\,Y^k.$$

Divide both sides by $E\,Y^k$ to obtain the desired equality.


(ii) If $\alpha$ is fixed, $T = \sum_i X_i$ is a complete sufficient statistic for $\beta$ by Theorem 6.2.25. Because $\beta$ is a scale parameter, if $Z_1, \ldots, Z_n$ is a random sample from a gamma$(\alpha, 1)$ distribution, then $X_{(i)}/T$ has the same distribution as $(\beta Z_{(i)})/(\beta\sum_i Z_i) = Z_{(i)}/(\sum_i Z_i)$, and this distribution does not depend on $\beta$. Thus, $X_{(i)}/T$ is ancillary, and by Basu's Theorem, it is independent of $T$. We have

$$E(X_{(i)}|T) = E\left(\frac{X_{(i)}}{T}T\,\Big|\,T\right) = T\,E\left(\frac{X_{(i)}}{T}\,\Big|\,T\right) \overset{\text{indep.}}{=} T\,E\left(\frac{X_{(i)}}{T}\right) \overset{\text{part (i)}}{=} T\,\frac{E(X_{(i)})}{E\,T}.$$

Note, this expression is correct for each fixed value of $(\alpha, \beta)$, regardless of whether $\alpha$ is "known" or not.

6.32 In the Formal Likelihood Principle, take $E_1 = E_2 = E$. Then the conclusion is $\mathrm{Ev}(E, x_1) = \mathrm{Ev}(E, x_2)$ if $L(\theta|x_1)/L(\theta|x_2) = c$. Thus evidence is equal whenever the likelihood functions are equal, and this follows from Formal Sufficiency and Conditionality.

6.33 a. For all sample points except $(2, x_2^*)$ (but including $(1, x_1^*)$), $T(j, x_j) = (j, x_j)$. Hence,

$$g(T(j, x_j)|\theta)h(j, x_j) = g((j, x_j)|\theta)\cdot 1 = f^*((j, x_j)|\theta).$$

For $(2, x_2^*)$ we also have

$$g(T(2, x_2^*)|\theta)h(2, x_2^*) = g((1, x_1^*)|\theta)C = f^*((1, x_1^*)|\theta)C = C\tfrac{1}{2}f_1(x_1^*|\theta) = C\tfrac{1}{2}L(\theta|x_1^*) = \tfrac{1}{2}L(\theta|x_2^*) = \tfrac{1}{2}f_2(x_2^*|\theta) = f^*((2, x_2^*)|\theta).$$

By the Factorization Theorem, $T(J, X_J)$ is sufficient.

b. Equations 6.3.4 and 6.3.5 follow immediately from the two Principles. Combining them we have $\mathrm{Ev}(E_1, x_1^*) = \mathrm{Ev}(E_2, x_2^*)$, the conclusion of the Formal Likelihood Principle.

c. To prove the Conditionality Principle, let one experiment be the $E^*$ experiment and the other $E_j$. Then

$$L(\theta|(j, x_j)) = f^*((j, x_j)|\theta) = \tfrac{1}{2}f_j(x_j|\theta) = \tfrac{1}{2}L(\theta|x_j).$$

Letting $(j, x_j)$ and $x_j$ play the roles of $x_1^*$ and $x_2^*$ in the Formal Likelihood Principle, we can conclude $\mathrm{Ev}(E^*, (j, x_j)) = \mathrm{Ev}(E_j, x_j)$, the Conditionality Principle. Now consider the Formal Sufficiency Principle. If $T(X)$ is sufficient and $T(x) = T(y)$, then $L(\theta|x) = CL(\theta|y)$, where $C = h(x)/h(y)$ and $h$ is the function from the Factorization Theorem. Hence, by the Formal Likelihood Principle, $\mathrm{Ev}(E, x) = \mathrm{Ev}(E, y)$, the Formal Sufficiency Principle.

6.35 Let 1 = success and 0 = failure. The four sample points are $\{0, 10, 110, 111\}$. From the likelihood principle, inference about $p$ is only through $L(p|x)$. The values of the likelihood are 1, $p$, $p^2$, and $p^3$, and the sample size does not directly influence the inference.

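6.37 a. For one observation $(X, Y)$ we have

$$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2}\log f(X, Y|\theta)\right) = -E\left(-\frac{2Y}{\theta^3}\right) = \frac{2\,E\,Y}{\theta^3}.$$

But $Y \sim$ exponential$(\theta)$, and $E\,Y = \theta$. Hence, $I(\theta) = 2/\theta^2$ for a sample of size one, and $I(\theta) = 2n/\theta^2$ for a sample of size $n$.

b. (i) The cdf of $T$ is

$$P(T \le t) = P\left(\frac{\sum_i Y_i}{\sum_i X_i} \le t^2\right) = P\left(\frac{2\sum_i Y_i/\theta}{2\sum_i X_i\theta} \le \frac{t^2}{\theta^2}\right) = P\left(F_{2n,2n} \le t^2/\theta^2\right),$$

where $F_{2n,2n}$ is an $F$ random variable with $2n$ degrees of freedom in the numerator and denominator. This follows since $2Y_i/\theta$ and $2X_i\theta$ are all independent exponential(2), or $\chi^2_2$. Differentiating (in $t$) and simplifying gives the density of $T$ as

$$f_T(t) = \frac{\Gamma(2n)}{\Gamma(n)^2}\,\frac{2}{t}\left(\frac{t^2}{t^2+\theta^2}\right)^n\left(\frac{\theta^2}{t^2+\theta^2}\right)^n,$$

and the second derivative (in $\theta$) of the log density is

$$\frac{\partial^2}{\partial\theta^2}\log f_T(t) = -\frac{2n\left(t^4 + 4t^2\theta^2 - \theta^4\right)}{\theta^2(t^2+\theta^2)^2} = -\frac{2n}{\theta^2}\left(1 + \frac{2}{t^2/\theta^2 + 1} - \frac{4}{(t^2/\theta^2 + 1)^2}\right),$$

so the information in $T$ is

$$\frac{2n}{\theta^2}\left(1 + 2E\frac{1}{T^2/\theta^2 + 1} - 4E\left(\frac{1}{T^2/\theta^2 + 1}\right)^2\right) = \frac{2n}{\theta^2}\left(1 + 2E\frac{1}{F_{2n,2n} + 1} - 4E\left(\frac{1}{F_{2n,2n} + 1}\right)^2\right).$$

The expected values follow from the $F_{2n,2n}$ density and the beta integral:

$$E\frac{1}{F_{2n,2n} + 1} = \frac{\Gamma(2n)}{\Gamma(n)^2}\,\frac{\Gamma(n)\Gamma(n+1)}{\Gamma(2n+1)} = \frac{1}{2},$$

$$E\left(\frac{1}{F_{2n,2n} + 1}\right)^2 = \frac{\Gamma(2n)}{\Gamma(n)^2}\int_0^\infty \frac{1}{(1+w)^2}\,\frac{w^{n-1}}{(1+w)^{2n}}\,dw = \frac{\Gamma(2n)}{\Gamma(n)^2}\,\frac{\Gamma(n)\Gamma(n+2)}{\Gamma(2n+2)} = \frac{n+1}{2(2n+1)}.$$

Substituting these above gives the information in $T$ as

$$\frac{2n}{\theta^2}\left(2 - \frac{2(n+1)}{2n+1}\right) = \frac{2n}{\theta^2}\,\frac{2n}{2n+1} = I(\theta)\,\frac{2n}{2n+1}.$$

(ii) Let $W = \sum_i X_i$ and $V = \sum_i Y_i$. In each pair, $X_i$ and $Y_i$ are independent, so $W$ and $V$ are independent. $X_i \sim$ exponential$(1/\theta)$; hence, $W \sim$ gamma$(n, 1/\theta)$. $Y_i \sim$ exponential$(\theta)$; hence, $V \sim$ gamma$(n, \theta)$. Use this joint distribution of $(W, V)$ to derive the joint pdf of $(T, U)$ as

$$f(t, u|\theta) = \frac{2}{[\Gamma(n)]^2\,t}u^{2n-1}\exp\left(-\frac{u\theta}{t} - \frac{ut}{\theta}\right), \quad u > 0,\ t > 0.$$

Now, the information in $(T, U)$ is

$$-E\left(\frac{\partial^2}{\partial\theta^2}\log f(T, U|\theta)\right) = -E\left(-\frac{2UT}{\theta^3}\right) = E\left(\frac{2V}{\theta^3}\right) = \frac{2n\theta}{\theta^3} = \frac{2n}{\theta^2}.$$

(iii) The pdf of the sample is $f(x, y) = \exp\left[-\theta\left(\sum_i x_i\right) - \left(\sum_i y_i\right)/\theta\right]$. Hence, $(W, V)$ defined as in part (ii) is sufficient. $(T, U)$ is a one-to-one function of $(W, V)$; hence $(T, U)$ is also sufficient. But $E\,U^2 = E\,WV = (n/\theta)(n\theta) = n^2$ does not depend on $\theta$. So $E(U^2 - n^2) = 0$ for all $\theta$, and $(T, U)$ is not complete.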
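The key expectation $E\left(1/(F_{2n,2n}+1)\right)^2 = \frac{n+1}{2(2n+1)}$ can be checked two ways: $1/(F_{2n,2n}+1) = G_2/(G_1+G_2)$ for independent gamma$(n, 1)$ variables $G_1, G_2$ has a beta$(n, n)$ distribution, whose integer moments are products of ratios, and the same quantity can be simulated. This Python sketch is our own illustration.

```python
import random
from fractions import Fraction

def beta_moment(a, b, k):
    """E[B^k] for B ~ beta(a, b) with integer a, b, k: prod_{j<k} (a + j) / (a + b + j)."""
    out = Fraction(1)
    for j in range(k):
        out *= Fraction(a + j, a + b + j)
    return out

def simulated_second_moment(n, reps, rng):
    """Monte Carlo E[(1 / (F + 1))^2] with F = G1 / G2, G1 and G2 iid gamma(n, 1)."""
    total = 0.0
    for _ in range(reps):
        g1, g2 = rng.gammavariate(n, 1.0), rng.gammavariate(n, 1.0)
        total += (g2 / (g1 + g2)) ** 2
    return total / reps

rng = random.Random(13)
n = 3
sim = simulated_second_moment(n, 50000, rng)
print(beta_moment(n, n, 2), sim)   # exact value (n+1)/(2(2n+1)) = 2/7 for n = 3
```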

6.39 a. The transformation from Celsius to Fahrenheit is $y = 9x/5 + 32$. Hence,

$$\frac{5}{9}\left(T^*(y) - 32\right) = \frac{5}{9}\left((.5)(y) + (.5)(212) - 32\right) = \frac{5}{9}\left((.5)(9x/5 + 32) + (.5)(212) - 32\right) = (.5)x + 50 = T(x).$$

b. $T(x) = (.5)x + 50 \ne (.5)x + 106 = T^*(x)$. Thus, we do not have equivariance.


6.40 a. Because $X_1, \ldots, X_n$ is from a location-scale family, by Theorem 3.5.6, we can write $X_i = \sigma Z_i + \mu$, where $Z_1, \ldots, Z_n$ is a random sample from the standard pdf $f(z)$. Then

$$\frac{T_1(X_1, \ldots, X_n)}{T_2(X_1, \ldots, X_n)} = \frac{T_1(\sigma Z_1 + \mu, \ldots, \sigma Z_n + \mu)}{T_2(\sigma Z_1 + \mu, \ldots, \sigma Z_n + \mu)} = \frac{\sigma T_1(Z_1, \ldots, Z_n)}{\sigma T_2(Z_1, \ldots, Z_n)} = \frac{T_1(Z_1, \ldots, Z_n)}{T_2(Z_1, \ldots, Z_n)}.$$

Because $T_1/T_2$ is a function of only $Z_1, \ldots, Z_n$, the distribution of $T_1/T_2$ does not depend on $\mu$ or $\sigma$; that is, $T_1/T_2$ is an ancillary statistic.

b. $R(x_1, \ldots, x_n) = x_{(n)} - x_{(1)}$. Because $a > 0$, $\max\{ax_1 + b, \ldots, ax_n + b\} = ax_{(n)} + b$ and $\min\{ax_1 + b, \ldots, ax_n + b\} = ax_{(1)} + b$. Thus, $R(ax_1 + b, \ldots, ax_n + b) = (ax_{(n)} + b) - (ax_{(1)} + b) = a(x_{(n)} - x_{(1)}) = aR(x_1, \ldots, x_n)$. For the sample variance we have

$$S^2(ax_1 + b, \ldots, ax_n + b) = \frac{1}{n-1}\sum\left((ax_i + b) - (a\bar x + b)\right)^2 = a^2\frac{1}{n-1}\sum(x_i - \bar x)^2 = a^2S^2(x_1, \ldots, x_n).$$

Thus, $S(ax_1 + b, \ldots, ax_n + b) = aS(x_1, \ldots, x_n)$. Therefore, $R$ and $S$ both satisfy the above condition, and $R/S$ is ancillary by a).

6.41 a. Measurement equivariance requires that the estimate of $\mu$ based on $y$ be the same as the estimate of $\mu$ based on $x$; that is, $T^*(x_1 + a, \ldots, x_n + a) - a = T^*(y) - a = T(x)$.

b. The formal structures for the problem involving $X$ and the problem involving $Y$ are the same. They both concern a random sample of size $n$ from a normal population and estimation of the mean of the population. Thus, formal invariance requires that $T(x) = T^*(x)$ for all $x$. Combining this with part (a), the Equivariance Principle requires that $T(x_1 + a, \ldots, x_n + a) - a = T^*(x_1 + a, \ldots, x_n + a) - a = T(x_1, \ldots, x_n)$, i.e., $T(x_1 + a, \ldots, x_n + a) = T(x_1, \ldots, x_n) + a$.

c. $W(x_1 + a, \ldots, x_n + a) = \sum_i(x_i + a)/n = (\sum_i x_i)/n + a = W(x_1, \ldots, x_n) + a$, so $W(x)$ is equivariant. The distribution of $(X_1, \ldots, X_n)$ is the same as the distribution of $(Z_1 + \theta, \ldots, Z_n + \theta)$, where $Z_1, \ldots, Z_n$ are a random sample from $f(x - 0)$ and $E\,Z_i = 0$. Thus, $E_\theta W = E\sum_i(Z_i + \theta)/n = \theta$, for all $\theta$.

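6.43 a. For a location-scale family, if $X \sim f(x|\theta, \sigma^2)$, then $Y = g_{a,c}(X) \sim f(y|c\theta + a, c^2\sigma^2)$. So for estimating $\sigma^2$, $\bar g_{a,c}(\sigma^2) = c^2\sigma^2$. An estimator of $\sigma^2$ is invariant with respect to $\mathcal{G}_1$ if $W(cx_1 + a, \ldots, cx_n + a) = c^2W(x_1, \ldots, x_n)$. An estimator of the form $kS^2$ is invariant because

$$\begin{aligned} kS^2(cx_1 + a, \ldots, cx_n + a) &= \frac{k}{n-1}\sum_{i=1}^n\left((cx_i + a) - \sum_{j=1}^n(cx_j + a)/n\right)^2 = \frac{k}{n-1}\sum_{i=1}^n\left((cx_i + a) - (c\bar x + a)\right)^2\\ &= c^2\frac{k}{n-1}\sum_{i=1}^n(x_i - \bar x)^2 = c^2kS^2(x_1, \ldots, x_n). \end{aligned}$$

To show invariance with respect to $\mathcal{G}_2$, use the above argument with $c = 1$. To show invariance with respect to $\mathcal{G}_3$, use the above argument with $a = 0$. ($\mathcal{G}_2$ and $\mathcal{G}_3$ are both subgroups of $\mathcal{G}_1$. So invariance with respect to $\mathcal{G}_1$ implies invariance with respect to $\mathcal{G}_2$ and $\mathcal{G}_3$.)

b. The transformations in $\mathcal{G}_2$ leave the scale parameter unchanged. Thus, $\bar g_a(\sigma^2) = \sigma^2$. An estimator of $\sigma^2$ is invariant with respect to this group if

$$W(x_1 + a, \ldots, x_n + a) = W(g_a(x)) = \bar g_a(W(x)) = W(x_1, \ldots, x_n).$$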


An estimator of the given form is invariant if, for all $a$ and $(x_1, \ldots, x_n)$,

$$W(x_1 + a, \ldots, x_n + a) = \phi\left(\frac{\bar x + a}{s}\right)s^2 = \phi\left(\frac{\bar x}{s}\right)s^2 = W(x_1, \ldots, x_n).$$

In particular, for a sample point with $s = 1$ and $\bar x = 0$, this implies we must have $\phi(a) = \phi(0)$ for all $a$; that is, $\phi$ must be constant. On the other hand, if $\phi$ is constant, then the estimators are invariant by part a). So we have invariance if and only if $\phi$ is constant. Invariance with respect to $\mathcal{G}_1$ also requires $\phi$ to be constant because $\mathcal{G}_2$ is a subgroup of $\mathcal{G}_1$. Finally, an estimator of $\sigma^2$ is invariant with respect to $\mathcal{G}_3$ if $W(cx_1, \ldots, cx_n) = c^2W(x_1, \ldots, x_n)$. Estimators of the given form are invariant because

$$W(cx_1, \ldots, cx_n) = \phi\left(\frac{c\bar x}{cs}\right)c^2s^2 = c^2\phi\left(\frac{\bar x}{s}\right)s^2 = c^2W(x_1, \ldots, x_n).$$

Chapter 7

Point Estimation

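7.1 For each value of $x$, the MLE $\hat\theta$ is the value of $\theta$ that maximizes $f(x|\theta)$. These values are in the following table.
$$\begin{array}{c|ccccc} x & 0 & 1 & 2 & 3 & 4\\ \hline \hat\theta & 1 & 1 & 2\text{ or }3 & 3 & 3\end{array}$$
At $x=2$, $f(x|2)=f(x|3)=1/4$ are both maxima, so both $\hat\theta=2$ and $\hat\theta=3$ are MLEs.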

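7.2 a.
$$L(\beta|\mathbf{x})=\prod_{i=1}^n\frac{1}{\Gamma(\alpha)\beta^\alpha}x_i^{\alpha-1}e^{-x_i/\beta}=\frac{1}{\Gamma(\alpha)^n\beta^{n\alpha}}\Big[\prod_{i=1}^nx_i\Big]^{\alpha-1}e^{-\sum_ix_i/\beta}$$
$$\log L(\beta|\mathbf{x})=-\log\Gamma(\alpha)^n-n\alpha\log\beta+(\alpha-1)\log\Big[\prod_{i=1}^nx_i\Big]-\frac{\sum_ix_i}{\beta}$$
$$\frac{\partial\log L}{\partial\beta}=-\frac{n\alpha}{\beta}+\frac{\sum_ix_i}{\beta^2}$$
Set the partial derivative equal to 0 and solve for $\beta$ to obtain $\hat\beta=\sum_ix_i/(n\alpha)$. To check that this is a maximum, calculate
$$\frac{\partial^2\log L}{\partial\beta^2}\Big|_{\beta=\hat\beta}=\Big(\frac{n\alpha}{\beta^2}-\frac{2\sum_ix_i}{\beta^3}\Big)\Big|_{\beta=\hat\beta}=\frac{(n\alpha)^3}{(\sum_ix_i)^2}-\frac{2(n\alpha)^3}{(\sum_ix_i)^2}=-\frac{(n\alpha)^3}{(\sum_ix_i)^2}<0.$$
Because $\hat\beta$ is the unique point where the derivative is 0 and it is a local maximum, it is a global maximum. That is, $\hat\beta$ is the MLE.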

b. Now the likelihood function is
$$L(\alpha,\beta|\mathbf{x})=\frac{1}{\Gamma(\alpha)^n\beta^{n\alpha}}\Big[\prod_{i=1}^nx_i\Big]^{\alpha-1}e^{-\sum_ix_i/\beta},$$
the same as in part (a) except $\alpha$ and $\beta$ are both variables. There is no analytic form for the MLEs, the values $\hat\alpha$ and $\hat\beta$ that maximize $L$. One approach to finding $\hat\alpha$ and $\hat\beta$ would be to numerically maximize the function of two arguments. But it is usually best to do as much as possible analytically first, and perhaps reduce the complexity of the numerical problem. From part (a), for each fixed value of $\alpha$, the value of $\beta$ that maximizes $L$ is $\sum_ix_i/(n\alpha)$. Substitute this into $L$. Then we just need to maximize the function of the one variable $\alpha$ given by
$$\frac{1}{\Gamma(\alpha)^n(\sum_ix_i/(n\alpha))^{n\alpha}}\Big[\prod_{i=1}^nx_i\Big]^{\alpha-1}e^{-\sum_ix_i/(\sum_ix_i/(n\alpha))}=\frac{1}{\Gamma(\alpha)^n(\sum_ix_i/(n\alpha))^{n\alpha}}\Big[\prod_{i=1}^nx_i\Big]^{\alpha-1}e^{-n\alpha}.$$


For the given data, $n=14$ and $\sum_ix_i=323.6$. Many computer programs can be used to maximize this function. From PROC NLIN in SAS we obtain $\hat\alpha=514.219$ and, hence,
$$\hat\beta=\frac{323.6}{14(514.219)}=.0450.$$
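The one-variable maximization above can be sketched in Python. This is a minimal sketch, not the SAS PROC NLIN fit behind the reported numbers. It assumes only the standard library and maximizes the profile log-likelihood over $\log\alpha$ by golden-section search. A bracketing search is enough here because the profile is concave in $\alpha$: its second derivative is $n[1/\alpha-\psi'(\alpha)]<0$. The exercise data are not reproduced in this section, so the usage lines run on simulated gamma data.

```python
import math
import random


def profile_loglik(alpha, x):
    """Gamma(alpha, beta) log-likelihood with beta replaced by its
    part (a) maximizer beta_hat(alpha) = sum(x) / (n * alpha)."""
    n, s = len(x), sum(x)
    slog = sum(math.log(v) for v in x)
    beta_hat = s / (n * alpha)
    return (-n * math.lgamma(alpha) - n * alpha * math.log(beta_hat)
            + (alpha - 1) * slog - n * alpha)


def gamma_mle(x, lo=1e-3, hi=1e4, iters=200):
    """Golden-section search for the maximizer of the profile
    log-likelihood over log(alpha); returns (alpha_hat, beta_hat)."""
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_loglik(math.exp(c), x), profile_loglik(math.exp(d), x)
    for _ in range(iters):
        if fc > fd:              # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_loglik(math.exp(c), x)
        else:                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_loglik(math.exp(d), x)
    alpha_hat = math.exp((a + b) / 2)
    return alpha_hat, sum(x) / (len(x) * alpha_hat)


# Usage on simulated data (alpha = 3, beta = 2), not the exercise data.
random.seed(1)
sample = [random.gammavariate(3.0, 2.0) for _ in range(2000)]
alpha_hat, beta_hat = gamma_mle(sample)
```

Searching on the log scale keeps the bracket wide (here $10^{-3}$ to $10^4$) without many wasted evaluations. The final step $\hat\beta=\sum_ix_i/(n\hat\alpha)$ is exactly the substitution used in the text.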

7.3 The log function is a strictly monotone increasing function. Therefore, $L(\theta|\mathbf{x})>L(\theta_0|\mathbf{x})$ if and only if $\log L(\theta|\mathbf{x})>\log L(\theta_0|\mathbf{x})$. So the value $\hat\theta$ that maximizes $\log L(\theta|\mathbf{x})$ is the same as the value that maximizes $L(\theta|\mathbf{x})$.

7.5 a. The value $\hat z$ solves the equation
$$(1-p)^n=\prod_i(1-x_iz),$$
where $0\le z\le(\max_ix_i)^{-1}$. Let $\hat k$ = greatest integer less than or equal to $1/\hat z$. Then from Example 7.2.9, $\hat k$ must satisfy
$$[k(1-p)]^n\ge\prod_i(k-x_i)\quad\text{and}\quad[(k+1)(1-p)]^n<\prod_i(k+1-x_i).$$
Because the right-hand side of the first equation is decreasing in $\hat z$, and because $\hat k\le1/\hat z$ (so $\hat z\le1/\hat k$) and $\hat k+1>1/\hat z$, $\hat k$ must satisfy the two inequalities. Thus $\hat k$ is the MLE.

b. For $p=1/2$, we must solve $\frac{1}{2^4}=(1-20z)(1-z)(1-19z)$, which can be reduced to the cubic equation $-380z^3+419z^2-40z+15/16=0$. The roots are .9998, .0646, and .0381, leading to candidates of 1, 15, and 26 for $\hat k$. The first two are less than $\max_ix_i$. Thus $\hat k=26$.
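The cubic and the resulting candidates for $\hat k$ can be checked numerically. This minimal sketch assumes NumPy is available. It rebuilds the cubic from the three factors instead of typing the coefficients. It keeps only candidates $[1/z]$ that are at least $\max_ix_i=20$, the largest observation implied by the factor $1-20z$.

```python
import numpy as np

X_MAX = 20  # largest observation, read off the factor (1 - 20z)

# (1 - 20z)(1 - z)(1 - 19z) - 1/2^4, coefficients highest power first
cubic = np.polymul(np.polymul([-20.0, 1.0], [-1.0, 1.0]), [-19.0, 1.0])
cubic[-1] -= 1.0 / 2**4

roots = np.sort(np.roots(cubic).real)           # all three roots are real
candidates = np.floor(1.0 / roots).astype(int)  # k = [1/z] for each root
k_hat = int(candidates[candidates >= X_MAX].max())
print(cubic, roots, candidates, k_hat)
```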

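7.6 a. $f(\mathbf{x}|\theta)=\prod_i\theta x_i^{-2}I_{[\theta,\infty)}(x_i)=\big(\prod_ix_i^{-2}\big)\theta^nI_{[\theta,\infty)}(x_{(1)})$. Thus, $X_{(1)}$ is a sufficient statistic for $\theta$ by the Factorization Theorem.
b. $L(\theta|\mathbf{x})=\theta^n\big(\prod_ix_i^{-2}\big)I_{[\theta,\infty)}(x_{(1)})$. $\theta^n$ is increasing in $\theta$. The second term does not involve $\theta$. So to maximize $L(\theta|\mathbf{x})$, we want to make $\theta$ as large as possible. But because of the indicator function, $L(\theta|\mathbf{x})=0$ if $\theta>x_{(1)}$. Thus, $\hat\theta=x_{(1)}$.
c. $\mathrm{E}\,X=\int_\theta^\infty\theta x^{-1}\,dx=\theta\log x\big|_\theta^\infty=\infty$. Thus the method of moments estimator of $\theta$ does not exist. (This is the Pareto distribution with $\alpha=\theta$, $\beta=1$.)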

7.7 $L(0|\mathbf{x})=1$, $0<x_i<1$, and $L(1|\mathbf{x})=\prod_i1/(2\sqrt{x_i})$, $0<x_i<1$. Thus, the MLE is 0 if $1\ge\prod_i1/(2\sqrt{x_i})$, and the MLE is 1 if $1<\prod_i1/(2\sqrt{x_i})$.

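7.8 a. $\mathrm{E}\,X^2=\mathrm{Var}\,X+\mu^2=\sigma^2$. Therefore $X^2$ is an unbiased estimator of $\sigma^2$.
b. $L(\sigma|x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/(2\sigma^2)}$, and $\log L(\sigma|x)=\log(2\pi)^{-1/2}-\log\sigma-x^2/(2\sigma^2)$.
$$\frac{\partial\log L}{\partial\sigma}=-\frac1\sigma+\frac{x^2}{\sigma^3}\stackrel{\text{set}}{=}0\;\Rightarrow\;\hat\sigma X^2=\hat\sigma^3\;\Rightarrow\;\hat\sigma=\sqrt{X^2}=|X|.$$
$$\frac{\partial^2\log L}{\partial\sigma^2}=-\frac{3x^2\sigma^2}{\sigma^6}+\frac{1}{\sigma^2},\quad\text{which is negative at }\hat\sigma=|x|.$$
Thus, $\hat\sigma=|x|$ is a local maximum. Because it is the only place where the first derivative is zero, it is also a global maximum.
c. Because $\mathrm{E}\,X=0$ is known, just equate $\mathrm{E}\,X^2=\sigma^2=\frac1n\sum_{i=1}^1X_i^2=X^2\Rightarrow\hat\sigma=|X|$.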

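7.9 This is a uniform$(0,\theta)$ model. So $\mathrm{E}\,X=(0+\theta)/2=\theta/2$. The method of moments estimator is the solution to the equation $\tilde\theta/2=\bar X$, that is, $\tilde\theta=2\bar X$. Because $\tilde\theta$ is a simple function of the sample mean, its mean and variance are easy to calculate. We have
$$\mathrm{E}\,\tilde\theta=2\mathrm{E}\,\bar X=2\mathrm{E}\,X=2\frac\theta2=\theta,\quad\text{and}\quad\mathrm{Var}\,\tilde\theta=4\mathrm{Var}\,\bar X=4\frac{\theta^2/12}{n}=\frac{\theta^2}{3n}.$$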


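The likelihood function is
$$L(\theta|\mathbf{x})=\prod_{i=1}^n\frac1\theta I_{[0,\theta]}(x_i)=\frac{1}{\theta^n}I_{[0,\theta]}(x_{(n)})I_{[0,\infty)}(x_{(1)}),$$
where $x_{(1)}$ and $x_{(n)}$ are the smallest and largest order statistics. For $\theta\ge x_{(n)}$, $L=1/\theta^n$, a decreasing function. So for $\theta\ge x_{(n)}$, $L$ is maximized at $\hat\theta=x_{(n)}$. $L=0$ for $\theta<x_{(n)}$. So the overall maximum, the MLE, is $\hat\theta=X_{(n)}$. The pdf of $\hat\theta=X_{(n)}$ is $nx^{n-1}/\theta^n$, $0\le x\le\theta$. This can be used to calculate
$$\mathrm{E}\,\hat\theta=\frac{n}{n+1}\theta,\quad\mathrm{E}\,\hat\theta^2=\frac{n}{n+2}\theta^2\quad\text{and}\quad\mathrm{Var}\,\hat\theta=\frac{n\theta^2}{(n+2)(n+1)^2}.$$
$\tilde\theta$ is an unbiased estimator of $\theta$; $\hat\theta$ is a biased estimator. If $n$ is large, the bias is not large because $n/(n+1)$ is close to one. But if $n$ is small, the bias is quite large. On the other hand, $\mathrm{Var}\,\hat\theta<\mathrm{Var}\,\tilde\theta$ for all $\theta$. So, if $n$ is large, $\hat\theta$ is probably preferable to $\tilde\theta$.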
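The bias and variance above combine into mean squared errors: $\mathrm{MSE}(\hat\theta)=2\theta^2/((n+1)(n+2))$ and $\mathrm{MSE}(\tilde\theta)=\theta^2/(3n)$. These are equal at $n=1,2$, and for every $n\ge3$ the MLE has the smaller MSE, because $6n<(n+1)(n+2)$ exactly when $(n-1)(n-2)>0$. The sketch below computes both MSEs in exact rational arithmetic and adds an optional Monte Carlo check. The seed and replication count are arbitrary choices.

```python
import random
from fractions import Fraction


def mse_mom(n, theta=1):
    """MSE of the unbiased estimator 2*Xbar: theta^2 / (3n)."""
    return Fraction(theta) ** 2 / (3 * n)


def mse_mle(n, theta=1):
    """MSE of X_(n): Var + bias^2, using E and Var from the text."""
    theta = Fraction(theta)
    var = n * theta**2 / ((n + 2) * (n + 1) ** 2)
    bias = Fraction(n, n + 1) * theta - theta
    return var + bias**2


def simulate(n, theta=1.0, reps=20000, seed=0):
    """Monte Carlo MSEs of (2*Xbar, X_(n)) for uniform(0, theta) samples."""
    rng = random.Random(seed)
    se_mom = se_mle = 0.0
    for _ in range(reps):
        x = [rng.uniform(0, theta) for _ in range(n)]
        se_mom += (2 * sum(x) / n - theta) ** 2
        se_mle += (max(x) - theta) ** 2
    return se_mom / reps, se_mle / reps


for n in (1, 2, 3, 5, 10):
    print(n, mse_mom(n), mse_mle(n), simulate(n))
```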

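7.10 a. $f(\mathbf{x}|\theta)=\prod_i\frac{\alpha}{\beta^\alpha}x_i^{\alpha-1}I_{[0,\beta]}(x_i)=\Big(\frac{\alpha}{\beta^\alpha}\Big)^n\Big(\prod_ix_i\Big)^{\alpha-1}I_{(-\infty,\beta]}(x_{(n)})I_{[0,\infty)}(x_{(1)})=L(\alpha,\beta|\mathbf{x})$. By the Factorization Theorem, $(\prod_iX_i,X_{(n)})$ are sufficient.
b. For any fixed $\alpha$, $L(\alpha,\beta|\mathbf{x})=0$ if $\beta<x_{(n)}$, and $L(\alpha,\beta|\mathbf{x})$ is a decreasing function of $\beta$ if $\beta\ge x_{(n)}$. Thus, $X_{(n)}$ is the MLE of $\beta$. For the MLE of $\alpha$ calculate
$$\frac{\partial}{\partial\alpha}\log L=\frac{\partial}{\partial\alpha}\Big[n\log\alpha-n\alpha\log\beta+(\alpha-1)\log\prod_ix_i\Big]=\frac n\alpha-n\log\beta+\log\prod_ix_i.$$
Set the derivative equal to zero and use $\hat\beta=X_{(n)}$ to obtain
$$\hat\alpha=\frac{n}{n\log X_{(n)}-\log\prod_iX_i}=\Big[\frac1n\sum_i(\log X_{(n)}-\log X_i)\Big]^{-1}.$$
The second derivative is $-n/\alpha^2<0$, so this is the MLE.
c. $X_{(n)}=25.0$, $\log\prod_iX_i=\sum_i\log X_i=43.95\Rightarrow\hat\beta=25.0$, $\hat\alpha=12.59$.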

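7.11 a.
$$f(\mathbf{x}|\theta)=\prod_i\theta x_i^{\theta-1}=\theta^n\Big(\prod_ix_i\Big)^{\theta-1}=L(\theta|\mathbf{x})$$
$$\frac{d}{d\theta}\log L=\frac{d}{d\theta}\Big[n\log\theta+(\theta-1)\log\prod_ix_i\Big]=\frac n\theta+\sum_i\log x_i.$$
Set the derivative equal to zero and solve for $\theta$ to obtain $\hat\theta=(-\frac1n\sum_i\log x_i)^{-1}$. The second derivative is $-n/\theta^2<0$, so this is the MLE. To calculate the variance of $\hat\theta$, note that $Y_i=-\log X_i\sim$ exponential$(1/\theta)$, so $-\sum_i\log X_i\sim$ gamma$(n,1/\theta)$. Thus $\hat\theta=n/T$, where $T\sim$ gamma$(n,1/\theta)$. We can either calculate the first and second moments directly, or use the fact that $\hat\theta$ is inverted gamma (page 51). We have
$$\mathrm{E}\,\frac1T=\frac{\theta^n}{\Gamma(n)}\int_0^\infty\frac1t\,t^{n-1}e^{-\theta t}\,dt=\frac{\theta^n}{\Gamma(n)}\frac{\Gamma(n-1)}{\theta^{n-1}}=\frac{\theta}{n-1},$$
$$\mathrm{E}\,\frac{1}{T^2}=\frac{\theta^n}{\Gamma(n)}\int_0^\infty\frac{1}{t^2}\,t^{n-1}e^{-\theta t}\,dt=\frac{\theta^n}{\Gamma(n)}\frac{\Gamma(n-2)}{\theta^{n-2}}=\frac{\theta^2}{(n-1)(n-2)},$$
and thus
$$\mathrm{E}\,\hat\theta=\frac{n}{n-1}\theta\quad\text{and}\quad\mathrm{Var}\,\hat\theta=\frac{n^2}{(n-1)^2(n-2)}\theta^2\to0\text{ as }n\to\infty.$$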

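b. Because $X\sim$ beta$(\theta,1)$, $\mathrm{E}\,X=\theta/(\theta+1)$ and the method of moments estimator is the solution to
$$\frac1n\sum_iX_i=\frac{\theta}{\theta+1}\Rightarrow\tilde\theta=\frac{\sum_iX_i}{n-\sum_iX_i}.$$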

7.12 $X_i\sim$ iid Bernoulli$(\theta)$, $0\le\theta\le1/2$.
a. Method of moments:
$$\mathrm{E}\,X=\theta=\frac1n\sum_iX_i=\bar X\Rightarrow\tilde\theta=\bar X.$$
MLE: In Example 7.2.7, we showed that $L(\theta|\mathbf{x})$ is increasing for $\theta\le\bar x$ and is decreasing for $\theta\ge\bar x$. Remember that $0\le\theta\le1/2$ in this exercise. Therefore, when $\bar X\le1/2$, $\bar X$ is the MLE of $\theta$, because $\bar X$ is the overall maximum of $L(\theta|\mathbf{x})$. When $\bar X>1/2$, $L(\theta|\mathbf{x})$ is an increasing function of $\theta$ on $[0,1/2]$ and obtains its maximum at the upper bound of $\theta$, which is $1/2$. So the MLE is $\hat\theta=\min\{\bar X,1/2\}$.

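b. The MSE of $\tilde\theta$ is $\mathrm{MSE}(\tilde\theta)=\mathrm{Var}\,\tilde\theta+\mathrm{bias}(\tilde\theta)^2=(\theta(1-\theta)/n)+0^2=\theta(1-\theta)/n$. There is no simple formula for $\mathrm{MSE}(\hat\theta)$, but an expression is
$$\mathrm{MSE}(\hat\theta)=\mathrm{E}(\hat\theta-\theta)^2=\sum_{y=0}^n(\hat\theta-\theta)^2\binom ny\theta^y(1-\theta)^{n-y}=\sum_{y=0}^{[n/2]}\Big(\frac yn-\theta\Big)^2\binom ny\theta^y(1-\theta)^{n-y}+\sum_{y=[n/2]+1}^n\Big(\frac12-\theta\Big)^2\binom ny\theta^y(1-\theta)^{n-y},$$
where $Y=\sum_iX_i\sim$ binomial$(n,\theta)$ and $[n/2]=n/2$ if $n$ is even, and $[n/2]=(n-1)/2$ if $n$ is odd.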

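c. Using the notation used in (b), we have
$$\mathrm{MSE}(\tilde\theta)=\mathrm{E}(\bar X-\theta)^2=\sum_{y=0}^n\Big(\frac yn-\theta\Big)^2\binom ny\theta^y(1-\theta)^{n-y}.$$
Therefore,
$$\mathrm{MSE}(\tilde\theta)-\mathrm{MSE}(\hat\theta)=\sum_{y=[n/2]+1}^n\Big[\Big(\frac yn-\theta\Big)^2-\Big(\frac12-\theta\Big)^2\Big]\binom ny\theta^y(1-\theta)^{n-y}=\sum_{y=[n/2]+1}^n\Big(\frac yn+\frac12-2\theta\Big)\Big(\frac yn-\frac12\Big)\binom ny\theta^y(1-\theta)^{n-y}.$$
The facts that $y/n>1/2$ in the sum and $\theta\le1/2$ imply that every term in the sum is positive. Therefore $\mathrm{MSE}(\hat\theta)<\mathrm{MSE}(\tilde\theta)$ for every $\theta$ in $0<\theta\le1/2$. (Note: $\mathrm{MSE}(\hat\theta)=\mathrm{MSE}(\tilde\theta)=0$ at $\theta=0$.)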
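The two binomial sums in (b) and (c) are easy to evaluate exactly. This minimal sketch uses rational arithmetic, with $\hat\theta=\min\{y/n,1/2\}$ written directly, so the split at $[n/2]$ happens automatically.

```python
from fractions import Fraction
from math import comb


def mse(n, theta, estimator):
    """Exact E(estimator(Y) - theta)^2 for Y ~ binomial(n, theta)."""
    return sum((estimator(y) - theta) ** 2
               * comb(n, y) * theta**y * (1 - theta) ** (n - y)
               for y in range(n + 1))


def mom(n):
    return lambda y: Fraction(y, n)                       # Xbar


def mle(n):
    return lambda y: min(Fraction(y, n), Fraction(1, 2))  # min(Xbar, 1/2)


n = 7
for theta in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
    print(theta, mse(n, theta, mom(n)), mse(n, theta, mle(n)))
```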

7.13 $L(\theta|\mathbf{x})=\prod_i\frac12e^{-|x_i-\theta|}=\frac{1}{2^n}e^{-\sum_i|x_i-\theta|}$, so the MLE minimizes $\sum_i|x_i-\theta|=\sum_i|x_{(i)}-\theta|$, where $x_{(1)},\ldots,x_{(n)}$ are the order statistics. For $x_{(j)}\le\theta\le x_{(j+1)}$,
$$\sum_{i=1}^n|x_{(i)}-\theta|=\sum_{i=1}^j(\theta-x_{(i)})+\sum_{i=j+1}^n(x_{(i)}-\theta)=(2j-n)\theta-\sum_{i=1}^jx_{(i)}+\sum_{i=j+1}^nx_{(i)}.$$
This is a linear function of $\theta$ that decreases for $j<n/2$ and increases for $j>n/2$. If $n$ is even, $2j-n=0$ if $j=n/2$. So the likelihood is constant between $x_{(n/2)}$ and $x_{((n/2)+1)}$, and any value in this interval is the MLE. Usually the midpoint of this interval is taken as the MLE. If $n$ is odd, the sum is minimized, and hence the likelihood is maximized, at $\hat\theta=x_{((n+1)/2)}$.
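A brute-force check of this argument: on a fine grid, the minimizer of $\sum_i|x_i-\theta|$ is the sample median when $n$ is odd. When $n$ is even, the sum is flat between the two middle order statistics. The data values below are arbitrary and only illustrate the two cases.

```python
def abs_loss(theta, x):
    """Sum of absolute deviations; maximizing the likelihood minimizes this."""
    return sum(abs(v - theta) for v in x)


def grid_minimizers(x, step=0.01):
    """All grid points in [min(x), max(x)] where abs_loss is (numerically) minimal."""
    lo, hi = min(x), max(x)
    grid = [lo + k * step for k in range(int(round((hi - lo) / step)) + 1)]
    losses = [abs_loss(t, x) for t in grid]
    best = min(losses)
    return [t for t, loss in zip(grid, losses) if loss - best < 1e-9]


odd = [3.1, -0.4, 2.2, 7.5, 1.0]         # n = 5, median x_(3) = 2.2
even = [3.1, -0.4, 2.2, 7.5, 1.0, 5.0]   # n = 6, flat on [x_(3), x_(4)] = [2.2, 3.1]
m_odd = grid_minimizers(odd)
m_even = grid_minimizers(even)
print(m_odd, min(m_even), max(m_even))
```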

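7.15 a. The likelihood is
$$L(\mu,\lambda|\mathbf{x})=\frac{\lambda^{n/2}}{\sqrt{(2\pi)^n\prod_ix_i^3}}\exp\Big\{-\frac\lambda2\sum_i\frac{(x_i-\mu)^2}{\mu^2x_i}\Big\}.$$
For fixed $\lambda$, maximizing with respect to $\mu$ is equivalent to minimizing the sum in the exponential.
$$\frac{d}{d\mu}\sum_i\frac{(x_i-\mu)^2}{\mu^2x_i}=\frac{d}{d\mu}\sum_i\frac{((x_i/\mu)-1)^2}{x_i}=-\sum_i\frac{2((x_i/\mu)-1)}{\mu^2}.$$
Setting this equal to zero is equivalent to setting
$$\sum_i\Big(\frac{x_i}{\mu}-1\Big)=0,$$
and solving for $\mu$ yields $\hat\mu_n=\bar x$. Plugging in this $\hat\mu_n$ and maximizing with respect to $\lambda$ amounts to maximizing an expression of the form $\lambda^{n/2}e^{-\lambda b}$. Simple calculus yields
$$\hat\lambda_n=\frac{n}{2b}\quad\text{where}\quad b=\sum_i\frac{(x_i-\bar x)^2}{2\bar x^2x_i}.$$
Finally,
$$2b=\sum_i\frac{x_i}{\bar x^2}-2\sum_i\frac1{\bar x}+\sum_i\frac1{x_i}=-\frac n{\bar x}+\sum_i\frac1{x_i}=\sum_i\Big(\frac1{x_i}-\frac1{\bar x}\Big).$$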

b. This is the same as Exercise 6.27b.

c. This involved algebra can be found in Schwarz and Samanta (1991).
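The closed forms from part (a), $\hat\mu=\bar x$ and $\hat\lambda=n/\sum_i(1/x_i-1/\bar x)$, can be checked against the log-likelihood directly. This minimal sketch assumes the inverse Gaussian likelihood written in part (a). The data are arbitrary positive values chosen for illustration.

```python
import math


def ig_loglik(mu, lam, x):
    """Inverse Gaussian log-likelihood from part (a)."""
    n = len(x)
    return (0.5 * n * math.log(lam)
            - 0.5 * sum(math.log(2 * math.pi * v**3) for v in x)
            - 0.5 * lam * sum((v - mu) ** 2 / (mu**2 * v) for v in x))


def ig_mle(x):
    """MLEs: mu_hat = xbar, lambda_hat = n / sum(1/x_i - 1/xbar)."""
    n = len(x)
    xbar = sum(x) / n
    lam = n / sum(1 / v - 1 / xbar for v in x)
    return xbar, lam


x = [0.8, 1.7, 2.4, 0.5, 3.9, 1.1]
mu_hat, lam_hat = ig_mle(x)
best = ig_loglik(mu_hat, lam_hat, x)
```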

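7.17 a. This is a special case of the computation in Exercise 7.2a.
b. Make the transformation
$$z=(x_2-1)/x_1,\quad w=x_1\;\Rightarrow\;x_1=w,\quad x_2=wz+1.$$
The Jacobian is $|w|$, and
$$f_Z(z)=\int f_{X_1}(w)f_{X_2}(wz+1)w\,dw=\frac{1}{\theta^2}e^{-1/\theta}\int we^{-w(1+z)/\theta}\,dw,$$
where the range of integration is $0<w<-1/z$ if $z<0$, $0<w<\infty$ if $z>0$. Thus,
$$f_Z(z)=\frac{1}{\theta^2}e^{-1/\theta}\begin{cases}\int_0^{-1/z}we^{-w(1+z)/\theta}\,dw&\text{if }z<0\\ \int_0^\infty we^{-w(1+z)/\theta}\,dw&\text{if }z\ge0.\end{cases}$$
Using the fact that $\int we^{-w/a}\,dw=-e^{-w/a}(aw+a^2)$, we have
$$f_Z(z)=e^{-1/\theta}\begin{cases}\dfrac{z\theta+e^{(1+z)/(z\theta)}(1+z-z\theta)}{\theta z(1+z)^2}&\text{if }z<0\\[2ex] \dfrac{1}{(1+z)^2}&\text{if }z\ge0.\end{cases}$$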


c. From part (a) we get $\hat\theta=1$. From part (b), $X_2=1$ implies $Z=0$ which, if we use the second density, gives us $\hat\theta=\infty$.
d. The posterior distributions are just the normalized likelihood times prior, so of course they are different.

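7.18 a. The usual first two moment equations for $X$ and $Y$ are
$$\bar x=\mathrm{E}\,X=\mu_X,\qquad\frac1n\sum_ix_i^2=\mathrm{E}\,X^2=\sigma_X^2+\mu_X^2,$$
$$\bar y=\mathrm{E}\,Y=\mu_Y,\qquad\frac1n\sum_iy_i^2=\mathrm{E}\,Y^2=\sigma_Y^2+\mu_Y^2.$$
We also need an equation involving $\rho$.
$$\frac1n\sum_ix_iy_i=\mathrm{E}\,XY=\mathrm{Cov}(X,Y)+(\mathrm{E}\,X)(\mathrm{E}\,Y)=\rho\sigma_X\sigma_Y+\mu_X\mu_Y.$$
Solving these five equations yields the estimators given. Facts such as
$$\frac1n\sum_ix_i^2-\bar x^2=\frac{\sum_ix_i^2-(\sum_ix_i)^2/n}{n}=\frac{\sum_i(x_i-\bar x)^2}{n}$$
are used.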

b. Two answers are provided. First, use the Miscellanea: For
$$L(\theta|\mathbf{x})=h(\mathbf{x})c(\theta)\exp\Big(\sum_{i=1}^kw_i(\theta)t_i(\mathbf{x})\Big),$$
the solutions to the $k$ equations $\sum_{j=1}^nt_i(x_j)=\mathrm{E}_\theta\big(\sum_{j=1}^nt_i(X_j)\big)=n\mathrm{E}_\theta t_i(X_1)$, $i=1,\ldots,k$, provide the unique MLE for $\theta$. Multiplying out the exponent in the bivariate normal pdf shows it has this exponential family form with $k=5$ and $t_1(x,y)=x$, $t_2(x,y)=y$, $t_3(x,y)=x^2$, $t_4(x,y)=y^2$ and $t_5(x,y)=xy$. Setting up the method of moment equations, we have
$$\sum_ix_i=n\mu_X,\qquad\sum_ix_i^2=n(\mu_X^2+\sigma_X^2),$$
$$\sum_iy_i=n\mu_Y,\qquad\sum_iy_i^2=n(\mu_Y^2+\sigma_Y^2),$$
$$\sum_ix_iy_i=\sum_i[\mathrm{Cov}(X,Y)+\mu_X\mu_Y]=n(\rho\sigma_X\sigma_Y+\mu_X\mu_Y).$$
These are the same equations as in part (a) if you divide each one by $n$. So the MLEs are the same as the method of moment estimators in part (a).

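For the second answer, use the hint in the book to write
$$L(\theta|\mathbf{x},\mathbf{y})=L(\theta|\mathbf{x})L(\theta,\mathbf{x}|\mathbf{y})=\underbrace{(2\pi\sigma_X^2)^{-n/2}\exp\Big\{-\frac{1}{2\sigma_X^2}\sum_i(x_i-\mu_X)^2\Big\}}_{A}\times\underbrace{\big(2\pi\sigma_Y^2(1-\rho^2)\big)^{-n/2}\exp\Big[-\frac{1}{2\sigma_Y^2(1-\rho^2)}\sum_i\Big(y_i-\Big\{\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x_i-\mu_X)\Big\}\Big)^2\Big]}_{B}.$$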


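We know that $\bar x$ and $\hat\sigma_X^2=\sum_i(x_i-\bar x)^2/n$ maximize $A$; the question is whether, given $\sigma_Y$, $\mu_Y$, and $\rho$, $\bar x$ and $\hat\sigma_X^2$ maximize $B$. Let us first fix $\sigma_X^2$ and look for the $\hat\mu_X$ that maximizes $B$. We have
$$\frac{\partial\log B}{\partial\mu_X}\propto-2\sum_i\Big((y_i-\mu_Y)-\frac{\rho\sigma_Y}{\sigma_X}(x_i-\mu_X)\Big)\frac{\rho\sigma_Y}{\sigma_X}\stackrel{\text{set}}{=}0\;\Rightarrow\;\sum_i(y_i-\mu_Y)=\frac{\rho\sigma_Y}{\sigma_X}\sum_i(x_i-\hat\mu_X).$$
Similarly do the same procedure for $L(\theta|\mathbf{y})L(\theta,\mathbf{y}|\mathbf{x})$. This implies $\sum_i(x_i-\mu_X)=\frac{\rho\sigma_X}{\sigma_Y}\sum_i(y_i-\hat\mu_Y)$. The solutions $\hat\mu_X$ and $\hat\mu_Y$ therefore must satisfy both equations. If $\sum_i(y_i-\hat\mu_Y)\ne0$ or $\sum_i(x_i-\hat\mu_X)\ne0$, we will get $\rho=1/\rho$, so we need $\sum_i(y_i-\hat\mu_Y)=0$ and $\sum_i(x_i-\hat\mu_X)=0$. This implies $\hat\mu_X=\bar x$ and $\hat\mu_Y=\bar y$. ($\partial^2\log B/\partial\mu_X^2<0$, therefore it is a maximum.) To get $\hat\sigma_X^2$ take
$$\frac{\partial\log B}{\partial\sigma_X^2}\propto\sum_i\frac{\rho\sigma_Y}{\sigma_X^2}(x_i-\hat\mu_X)\Big[(y_i-\mu_Y)-\frac{\rho\sigma_Y}{\sigma_X}(x_i-\mu_X)\Big]\stackrel{\text{set}}{=}0\;\Rightarrow\;\sum_i(x_i-\hat\mu_X)(y_i-\hat\mu_Y)=\frac{\rho\sigma_Y}{\hat\sigma_X}\sum_i(x_i-\hat\mu_X)^2.$$
Similarly, $\sum_i(x_i-\hat\mu_X)(y_i-\hat\mu_Y)=\frac{\rho\sigma_X}{\hat\sigma_Y}\sum_i(y_i-\hat\mu_Y)^2$. Thus $\hat\sigma_X^2$ and $\hat\sigma_Y^2$ must satisfy the above two equations with $\hat\mu_X=\bar X$, $\hat\mu_Y=\bar Y$. This implies
$$\frac{\hat\sigma_Y}{\hat\sigma_X}\sum_i(x_i-\bar x)^2=\frac{\hat\sigma_X}{\hat\sigma_Y}\sum_i(y_i-\bar y)^2\;\Rightarrow\;\frac{\sum_i(x_i-\bar x)^2}{\hat\sigma_X^2}=\frac{\sum_i(y_i-\bar y)^2}{\hat\sigma_Y^2}.$$
Therefore, $\hat\sigma_X^2=a\sum_i(x_i-\bar x)^2$ and $\hat\sigma_Y^2=a\sum_i(y_i-\bar y)^2$, where $a$ is a constant. Combining this with the knowledge that $(\bar x,\frac1n\sum_i(x_i-\bar x)^2)=(\hat\mu_X,\hat\sigma_X^2)$ maximizes $A$, we conclude that $a=1/n$.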

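Lastly, we find $\hat\rho$, the MLE of $\rho$. Write, up to additive terms not involving $\rho$,
$$\log L(\bar x,\bar y,\hat\sigma_X^2,\hat\sigma_Y^2,\rho|\mathbf{x},\mathbf{y})=-\frac n2\log(1-\rho^2)-\frac{1}{2(1-\rho^2)}\sum_i\Big[\frac{(x_i-\bar x)^2}{\hat\sigma_X^2}-\frac{2\rho(x_i-\bar x)(y_i-\bar y)}{\hat\sigma_X\hat\sigma_Y}+\frac{(y_i-\bar y)^2}{\hat\sigma_Y^2}\Big]$$
$$=-\frac n2\log(1-\rho^2)-\frac{1}{2(1-\rho^2)}\Big[2n-2\rho\underbrace{\sum_i\frac{(x_i-\bar x)(y_i-\bar y)}{\hat\sigma_X\hat\sigma_Y}}_{A}\Big],$$
because $\hat\sigma_X^2=\frac1n\sum_i(x_i-\bar x)^2$ and $\hat\sigma_Y^2=\frac1n\sum_i(y_i-\bar y)^2$. Now
$$\log L=-\frac n2\log(1-\rho^2)-\frac{n}{1-\rho^2}+\frac{\rho}{1-\rho^2}A$$
and
$$\frac{\partial\log L}{\partial\rho}=\frac{n\rho}{1-\rho^2}-\frac{2n\rho}{(1-\rho^2)^2}+\frac{A(1-\rho^2)+2A\rho^2}{(1-\rho^2)^2}\stackrel{\text{set}}{=}0.$$
This implies
$$\frac{A+A\hat\rho^2-n\hat\rho-n\hat\rho^3}{(1-\hat\rho^2)^2}=0\Rightarrow A(1+\hat\rho^2)=n\hat\rho(1+\hat\rho^2)\Rightarrow\hat\rho=\frac An=\frac1n\sum_i\frac{(x_i-\bar x)(y_i-\bar y)}{\hat\sigma_X\hat\sigma_Y}.$$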
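Numerically, both answers say the MLEs are $\bar x$, $\bar y$, the divide-by-$n$ variances, and the sample correlation. This minimal sketch evaluates the bivariate normal log-likelihood and checks that a small perturbation of any one parameter lowers it. The data are simulated with arbitrary parameter values ($\sigma_X=2$, $\sigma_Y=1$, $\rho=0.6$).

```python
import math
import random


def bvn_loglik(mx, my, vx, vy, rho, x, y):
    """Bivariate normal log-likelihood (vx, vy are variances)."""
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    q = sum(((a - mx) / sx) ** 2 - 2 * rho * (a - mx) * (b - my) / (sx * sy)
            + ((b - my) / sy) ** 2 for a, b in zip(x, y))
    n = len(x)
    return (-n * math.log(2 * math.pi * sx * sy)
            - 0.5 * n * math.log(1 - rho**2) - q / (2 * (1 - rho**2)))


def bvn_mle(x, y):
    """MLEs from Exercise 7.18: means, divide-by-n variances, correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    rho = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * math.sqrt(vx * vy))
    return mx, my, vx, vy, rho


rng = random.Random(3)
x, y = [], []
for _ in range(200):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x.append(1.0 + 2.0 * z1)               # sigma_X = 2
    y.append(-1.0 + 0.6 * z1 + 0.8 * z2)   # sigma_Y = 1, rho = 1.2 / 2 = 0.6
theta_hat = bvn_mle(x, y)
```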


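7.19 a.
$$L(\theta|\mathbf{y})=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big(-\frac{1}{2\sigma^2}(y_i-\beta x_i)^2\Big)=(2\pi\sigma^2)^{-n/2}\exp\Big(-\frac{1}{2\sigma^2}\sum_i(y_i^2-2\beta x_iy_i+\beta^2x_i^2)\Big)$$
$$=(2\pi\sigma^2)^{-n/2}\exp\Big(-\frac{\beta^2\sum_ix_i^2}{2\sigma^2}\Big)\exp\Big(-\frac{1}{2\sigma^2}\sum_iy_i^2+\frac{\beta}{\sigma^2}\sum_ix_iy_i\Big).$$
By Theorem 6.1.2, $(\sum_iY_i^2,\sum_ix_iY_i)$ is a sufficient statistic for $(\beta,\sigma^2)$.
b.
$$\log L(\beta,\sigma^2|\mathbf{y})=-\frac n2\log(2\pi)-\frac n2\log\sigma^2-\frac{1}{2\sigma^2}\sum y_i^2+\frac{\beta}{\sigma^2}\sum x_iy_i-\frac{\beta^2}{2\sigma^2}\sum x_i^2.$$
For a fixed value of $\sigma^2$,
$$\frac{\partial\log L}{\partial\beta}=\frac{1}{\sigma^2}\sum x_iy_i-\frac{\beta}{\sigma^2}\sum x_i^2\stackrel{\text{set}}{=}0\Rightarrow\hat\beta=\frac{\sum_ix_iy_i}{\sum_ix_i^2}.$$
Also,
$$\frac{\partial^2\log L}{\partial\beta^2}=-\frac{1}{\sigma^2}\sum x_i^2<0,$$
so it is a maximum. Because $\hat\beta$ does not depend on $\sigma^2$, it is the MLE. And $\hat\beta$ is unbiased because
$$\mathrm{E}\,\hat\beta=\frac{\sum_ix_i\mathrm{E}\,Y_i}{\sum_ix_i^2}=\frac{\sum_ix_i\cdot\beta x_i}{\sum_ix_i^2}=\beta.$$
c. $\hat\beta=\sum_ia_iY_i$, where $a_i=x_i/\sum_jx_j^2$ are constants. By Corollary 4.6.10, $\hat\beta$ is normally distributed with mean $\beta$, and
$$\mathrm{Var}\,\hat\beta=\sum_ia_i^2\mathrm{Var}\,Y_i=\sum_i\Big(\frac{x_i}{\sum_jx_j^2}\Big)^2\sigma^2=\frac{\sum_ix_i^2}{(\sum_jx_j^2)^2}\sigma^2=\frac{\sigma^2}{\sum_ix_i^2}.$$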

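7.20 a.
$$\mathrm{E}\,\frac{\sum_iY_i}{\sum_ix_i}=\frac{1}{\sum_ix_i}\sum_i\mathrm{E}\,Y_i=\frac{1}{\sum_ix_i}\sum_i\beta x_i=\beta.$$
$$\mathrm{Var}\Big(\frac{\sum_iY_i}{\sum_ix_i}\Big)=\frac{1}{(\sum_ix_i)^2}\sum_i\mathrm{Var}\,Y_i=\frac{\sum_i\sigma^2}{(\sum_ix_i)^2}=\frac{n\sigma^2}{n^2\bar x^2}=\frac{\sigma^2}{n\bar x^2}.$$
Because $\sum_ix_i^2-n\bar x^2=\sum_i(x_i-\bar x)^2\ge0$, $\sum_ix_i^2\ge n\bar x^2$. Hence,
$$\mathrm{Var}\,\hat\beta=\frac{\sigma^2}{\sum_ix_i^2}\le\frac{\sigma^2}{n\bar x^2}=\mathrm{Var}\Big(\frac{\sum_iY_i}{\sum_ix_i}\Big).$$
(In fact, $\hat\beta$ is BLUE (Best Linear Unbiased Estimator of $\beta$), as discussed in Section 11.3.2.)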


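7.21 a.
$$\mathrm{E}\Big(\frac1n\sum_i\frac{Y_i}{x_i}\Big)=\frac1n\sum_i\frac{\mathrm{E}\,Y_i}{x_i}=\frac1n\sum_i\frac{\beta x_i}{x_i}=\beta.$$
$$\mathrm{Var}\Big(\frac1n\sum_i\frac{Y_i}{x_i}\Big)=\frac{1}{n^2}\sum_i\frac{\mathrm{Var}\,Y_i}{x_i^2}=\frac{\sigma^2}{n^2}\sum_i\frac{1}{x_i^2}.$$
Using Example 4.7.8 with $a_i=1/x_i^2$ we obtain
$$\frac1n\sum_i\frac{1}{x_i^2}\ge\frac{n}{\sum_ix_i^2}.$$
Thus,
$$\mathrm{Var}\,\hat\beta=\frac{\sigma^2}{\sum_ix_i^2}\le\frac{\sigma^2}{n^2}\sum_i\frac{1}{x_i^2}=\mathrm{Var}\Big(\frac1n\sum_i\frac{Y_i}{x_i}\Big).$$
Because $g(u)=1/u^2$ is convex, using Jensen's Inequality we have
$$\frac{1}{\bar x^2}\le\frac1n\sum_i\frac{1}{x_i^2}.$$
Thus,
$$\mathrm{Var}\Big(\frac{\sum_iY_i}{\sum_ix_i}\Big)=\frac{\sigma^2}{n\bar x^2}\le\frac{\sigma^2}{n^2}\sum_i\frac{1}{x_i^2}=\mathrm{Var}\Big(\frac1n\sum_i\frac{Y_i}{x_i}\Big).$$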
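The three variance formulas from Exercises 7.19 to 7.21 can be compared for any fixed design. This minimal sketch assumes positive $x_i$, which the Jensen step above needs, and takes $\sigma^2=1$. The design points are arbitrary.

```python
def var_ls(x, s2=1.0):
    """Var of beta_hat = sum x_i Y_i / sum x_i^2 (Exercise 7.19)."""
    return s2 / sum(v * v for v in x)


def var_ratio_of_sums(x, s2=1.0):
    """Var of sum Y_i / sum x_i = sigma^2 / (n xbar^2) (Exercise 7.20)."""
    n = len(x)
    xbar = sum(x) / n
    return s2 / (n * xbar**2)


def var_mean_of_ratios(x, s2=1.0):
    """Var of (1/n) sum Y_i / x_i = sigma^2 / n^2 * sum 1/x_i^2 (Exercise 7.21)."""
    n = len(x)
    return s2 / n**2 * sum(1 / (v * v) for v in x)


design = [0.5, 1.0, 2.0, 4.0, 8.0]
print(var_ls(design), var_ratio_of_sums(design), var_mean_of_ratios(design))
```

All three coincide when the $x_i$ are equal. Otherwise the ordering $\mathrm{Var}\,\hat\beta\le\sigma^2/(n\bar x^2)\le(\sigma^2/n^2)\sum_i1/x_i^2$ holds, as shown above.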

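7.22 a.
$$f(\bar x,\theta)=f(\bar x|\theta)\pi(\theta)=\frac{\sqrt n}{\sqrt{2\pi}\sigma}e^{-n(\bar x-\theta)^2/(2\sigma^2)}\frac{1}{\sqrt{2\pi}\tau}e^{-(\theta-\mu)^2/2\tau^2}.$$
b. Factor the exponent in part (a) as
$$-\frac{n}{2\sigma^2}(\bar x-\theta)^2-\frac{1}{2\tau^2}(\theta-\mu)^2=-\frac{1}{2v^2}(\theta-\delta(\mathbf{x}))^2-\frac{1}{2(\tau^2+\sigma^2/n)}(\bar x-\mu)^2,$$
where $\delta(\mathbf{x})=(\tau^2\bar x+(\sigma^2/n)\mu)/(\tau^2+\sigma^2/n)$ and $v^2=(\sigma^2\tau^2/n)/(\tau^2+\sigma^2/n)$. Let $n(a,b)$ denote the pdf of a normal distribution with mean $a$ and variance $b$. The above factorization shows that
$$f(\mathbf{x},\theta)=n(\theta,\sigma^2/n)\times n(\mu,\tau^2)=n(\delta(\mathbf{x}),v^2)\times n(\mu,\tau^2+\sigma^2/n),$$
where the marginal distribution of $\bar X$ is $n(\mu,\tau^2+\sigma^2/n)$ and the posterior distribution of $\theta|\mathbf{x}$ is $n(\delta(\mathbf{x}),v^2)$. This also completes part (c).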
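The factorization can be spot-checked numerically: at any $(\bar x,\theta)$ the two products of normal densities must agree. This minimal sketch uses $\delta(\mathbf{x})$ and $v^2$ as defined above. The parameter values in the checks are arbitrary.

```python
import math


def npdf(z, mean, var):
    """Normal pdf with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def both_sides(xbar, theta, n, sigma2, mu, tau2):
    """Return (f(xbar|theta) * pi(theta), posterior pdf * marginal pdf)."""
    lhs = npdf(xbar, theta, sigma2 / n) * npdf(theta, mu, tau2)
    delta = (tau2 * xbar + (sigma2 / n) * mu) / (tau2 + sigma2 / n)
    v2 = (sigma2 * tau2 / n) / (tau2 + sigma2 / n)
    rhs = npdf(theta, delta, v2) * npdf(xbar, mu, tau2 + sigma2 / n)
    return lhs, rhs
```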

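7.23 Let $t=s^2$ and $\theta=\sigma^2$. Because $(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$, we have
$$f(t|\theta)=\frac{1}{\Gamma((n-1)/2)2^{(n-1)/2}}\Big[\frac{n-1}{\theta}t\Big]^{[(n-1)/2]-1}e^{-(n-1)t/2\theta}\,\frac{n-1}{\theta}.$$
With $\pi(\theta)$ as given, we have (ignoring terms that do not depend on $\theta$)
$$\pi(\theta|t)\propto\Big[\Big(\frac1\theta\Big)^{((n-1)/2)-1}e^{-(n-1)t/2\theta}\frac1\theta\Big]\frac{1}{\theta^{\alpha+1}}e^{-1/\beta\theta}\propto\Big(\frac1\theta\Big)^{((n-1)/2)+\alpha+1}\exp\Big\{-\frac1\theta\Big[\frac{(n-1)t}{2}+\frac1\beta\Big]\Big\},$$
which we recognize as the kernel of an inverted gamma pdf, IG$(a,b)$, with
$$a=\frac{n-1}{2}+\alpha\quad\text{and}\quad b=\Big[\frac{(n-1)t}{2}+\frac1\beta\Big]^{-1}.$$
Direct calculation shows that the mean of an IG$(a,b)$ is $1/((a-1)b)$, so
$$\mathrm{E}(\theta|t)=\frac{\frac{n-1}{2}t+\frac1\beta}{\frac{n-1}{2}+\alpha-1}=\frac{\frac{n-1}{2}s^2+\frac1\beta}{\frac{n-1}{2}+\alpha-1}.$$
This is a Bayes estimator of $\sigma^2$.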
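The posterior mean can be checked by integrating the unnormalized posterior numerically. This minimal sketch assumes the inverted gamma prior $\pi(\theta)\propto\theta^{-(\alpha+1)}e^{-1/(\beta\theta)}$ used above. The values of $n$, $s^2$, $\alpha$, and $\beta$ are arbitrary. The integral is a midpoint rule on the $\log\theta$ scale, which handles both the sharp left tail and the slowly decaying right tail.

```python
import math


def posterior_mean_closed(n, s2, alpha, beta):
    """E(sigma^2 | s^2) = ((n-1)s^2/2 + 1/beta) / ((n-1)/2 + alpha - 1)."""
    return ((n - 1) * s2 / 2 + 1 / beta) / ((n - 1) / 2 + alpha - 1)


def posterior_mean_numeric(n, s2, alpha, beta, lo=-10.0, hi=10.0, m=40000):
    """Midpoint rule for int theta*k(theta) / int k(theta), with u = log(theta)."""
    a = (n - 1) / 2 + alpha
    c = (n - 1) * s2 / 2 + 1 / beta

    def logw(u):
        # log of k(theta) * dtheta/du, where k(theta) = theta^-(a+1) exp(-c/theta)
        return -(a + 1) * u - c * math.exp(-u) + u

    h = (hi - lo) / m
    us = [lo + (k + 0.5) * h for k in range(m)]
    top = max(logw(u) for u in us)          # rescale to avoid underflow
    w = [math.exp(logw(u) - top) for u in us]
    return sum(math.exp(u) * wk for u, wk in zip(us, w)) / sum(w)
```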

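7.24 For $n$ observations, $Y=\sum_iX_i\sim$ Poisson$(n\lambda)$.
a. The marginal pmf of $Y$ is
$$m(y)=\int_0^\infty\frac{(n\lambda)^ye^{-n\lambda}}{y!}\frac{1}{\Gamma(\alpha)\beta^\alpha}\lambda^{\alpha-1}e^{-\lambda/\beta}\,d\lambda=\frac{n^y}{y!\Gamma(\alpha)\beta^\alpha}\int_0^\infty\lambda^{(y+\alpha)-1}e^{-\lambda/(\beta/(n\beta+1))}\,d\lambda=\frac{n^y}{y!\Gamma(\alpha)\beta^\alpha}\Gamma(y+\alpha)\Big(\frac{\beta}{n\beta+1}\Big)^{y+\alpha}.$$
Thus,
$$\pi(\lambda|y)=\frac{f(y|\lambda)\pi(\lambda)}{m(y)}=\frac{\lambda^{(y+\alpha)-1}e^{-\lambda/(\beta/(n\beta+1))}}{\Gamma(y+\alpha)\big(\frac{\beta}{n\beta+1}\big)^{y+\alpha}}\sim\text{gamma}\Big(y+\alpha,\frac{\beta}{n\beta+1}\Big).$$
$$\mathrm{E}(\lambda|y)=(y+\alpha)\frac{\beta}{n\beta+1}=\frac{\beta}{n\beta+1}y+\frac{1}{n\beta+1}(\alpha\beta).$$
$$\mathrm{Var}(\lambda|y)=(y+\alpha)\frac{\beta^2}{(n\beta+1)^2}.$$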

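7.25 a. We will use the results and notation from part (b) to do this special case. From part (b), the $X_i$s are independent and each $X_i$ has marginal pdf
$$m(x|\mu,\sigma^2,\tau^2)=\int_{-\infty}^\infty f(x|\theta,\sigma^2)\pi(\theta|\mu,\tau^2)\,d\theta=\int_{-\infty}^\infty\frac{1}{2\pi\sigma\tau}e^{-(x-\theta)^2/2\sigma^2}e^{-(\theta-\mu)^2/2\tau^2}\,d\theta.$$
Complete the square in $\theta$ to write the sum of the two exponents as
$$-\frac{\Big[\theta-\Big(\frac{x\tau^2}{\sigma^2+\tau^2}+\frac{\mu\sigma^2}{\sigma^2+\tau^2}\Big)\Big]^2}{2\sigma^2\tau^2/(\sigma^2+\tau^2)}-\frac{(x-\mu)^2}{2(\sigma^2+\tau^2)}.$$
Only the first term involves $\theta$; call it $-A(\theta)$. Also, $e^{-A(\theta)}$ is the kernel of a normal pdf. Thus,
$$\int_{-\infty}^\infty e^{-A(\theta)}\,d\theta=\frac{\sqrt{2\pi}\sigma\tau}{\sqrt{\sigma^2+\tau^2}},$$
and the marginal pdf is
$$m(x|\mu,\sigma^2,\tau^2)=\frac{1}{2\pi\sigma\tau}\frac{\sqrt{2\pi}\sigma\tau}{\sqrt{\sigma^2+\tau^2}}\exp\Big\{-\frac{(x-\mu)^2}{2(\sigma^2+\tau^2)}\Big\}=\frac{1}{\sqrt{2\pi}\sqrt{\sigma^2+\tau^2}}\exp\Big\{-\frac{(x-\mu)^2}{2(\sigma^2+\tau^2)}\Big\},$$
a $n(\mu,\sigma^2+\tau^2)$ pdf.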


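b. For one observation of $X$ and $\theta$ the joint pdf is
$$h(x,\theta|\tau)=f(x|\theta)\pi(\theta|\tau),$$
and the marginal pdf of $X$ is
$$m(x|\tau)=\int_{-\infty}^\infty h(x,\theta|\tau)\,d\theta.$$
Thus, the joint pdf of $\mathbf{X}=(X_1,\ldots,X_n)$ and $\boldsymbol\theta=(\theta_1,\ldots,\theta_n)$ is
$$h(\mathbf{x},\boldsymbol\theta|\tau)=\prod_ih(x_i,\theta_i|\tau),$$
and the marginal pdf of $\mathbf{X}$ is
$$m(\mathbf{x}|\tau)=\int_{-\infty}^\infty\cdots\int_{-\infty}^\infty\prod_ih(x_i,\theta_i|\tau)\,d\theta_1\ldots d\theta_n=\int_{-\infty}^\infty\cdots\int_{-\infty}^\infty\Big[\int_{-\infty}^\infty h(x_1,\theta_1|\tau)\,d\theta_1\Big]\prod_{i=2}^nh(x_i,\theta_i|\tau)\,d\theta_2\ldots d\theta_n.$$
The $d\theta_1$ integral is just $m(x_1|\tau)$, and this is not a function of $\theta_2,\ldots,\theta_n$. So, $m(x_1|\tau)$ can be pulled out of the integrals. Doing each integral in turn yields the marginal pdf
$$m(\mathbf{x}|\tau)=\prod_im(x_i|\tau).$$
Because this marginal pdf factors, this shows that marginally $X_1,\ldots,X_n$ are independent, and they each have the same marginal distribution, $m(x|\tau)$.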

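7.26 First write
$$f(x_1,\ldots,x_n|\theta)\pi(\theta)\propto e^{-\frac{n}{2\sigma^2}(\bar x-\theta)^2-|\theta|/a},$$
where the exponent can be written
$$-\frac{n}{2\sigma^2}(\bar x-\theta)^2-\frac{|\theta|}{a}=-\frac{n}{2\sigma^2}\big(\theta-\delta_\pm(\mathbf{x})\big)^2+c_\pm(\mathbf{x}),\qquad c_\pm(\mathbf{x})=\frac{n}{2\sigma^2}\big(\delta_\pm^2(\mathbf{x})-\bar x^2\big),$$
with $\delta_\pm(\mathbf{x})=\bar x\pm\frac{\sigma^2}{na}$, where we use the "$-$" if $\theta>0$ and the "$+$" if $\theta<0$. The constant $c_\pm(\mathbf{x})$ differs on the two half-lines, so it must be kept, and the posterior mean is
$$\mathrm{E}(\theta|\mathbf{x})=\frac{e^{c_-}\int_0^\infty\theta e^{-\frac{n}{2\sigma^2}(\theta-\delta_-(\mathbf{x}))^2}d\theta+e^{c_+}\int_{-\infty}^0\theta e^{-\frac{n}{2\sigma^2}(\theta-\delta_+(\mathbf{x}))^2}d\theta}{e^{c_-}\int_0^\infty e^{-\frac{n}{2\sigma^2}(\theta-\delta_-(\mathbf{x}))^2}d\theta+e^{c_+}\int_{-\infty}^0e^{-\frac{n}{2\sigma^2}(\theta-\delta_+(\mathbf{x}))^2}d\theta}.$$
Now use the facts that for constants $k>0$ and $b$, with $\Phi$ the standard normal cdf,
$$\int_0^\infty e^{-\frac k2(t-b)^2}dt=\sqrt{\frac{2\pi}{k}}\,\Phi(b\sqrt k),\qquad\int_{-\infty}^0e^{-\frac k2(t-b)^2}dt=\sqrt{\frac{2\pi}{k}}\,\Phi(-b\sqrt k),$$
$$\int_0^\infty te^{-\frac k2(t-b)^2}dt=\int_0^\infty(t-b)e^{-\frac k2(t-b)^2}dt+\int_0^\infty be^{-\frac k2(t-b)^2}dt=\frac1ke^{-\frac k2b^2}+b\sqrt{\frac{2\pi}{k}}\,\Phi(b\sqrt k),$$
$$\int_{-\infty}^0te^{-\frac k2(t-b)^2}dt=-\frac1ke^{-\frac k2b^2}+b\sqrt{\frac{2\pi}{k}}\,\Phi(-b\sqrt k).$$
Apply these with $k=n/\sigma^2$. Because $c_\pm(\mathbf{x})-\frac{n}{2\sigma^2}\delta_\pm^2(\mathbf{x})=-\frac{n}{2\sigma^2}\bar x^2$ for both signs, the two $\frac1ke^{-\frac k2b^2}$ terms enter the numerator with equal weight and opposite signs and cancel. The common factor $\sqrt{2\pi/k}$ cancels between numerator and denominator, and we get
$$\mathrm{E}(\theta|\mathbf{x})=\frac{\delta_-(\mathbf{x})e^{c_-}\Phi\big(\sqrt n\,\delta_-(\mathbf{x})/\sigma\big)+\delta_+(\mathbf{x})e^{c_+}\Phi\big(-\sqrt n\,\delta_+(\mathbf{x})/\sigma\big)}{e^{c_-}\Phi\big(\sqrt n\,\delta_-(\mathbf{x})/\sigma\big)+e^{c_+}\Phi\big(-\sqrt n\,\delta_+(\mathbf{x})/\sigma\big)}.$$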
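A closed form of this kind is easy to get wrong, so it is worth checking against direct numerical integration of $\theta$ times the unnormalized posterior $e^{-\frac{n}{2\sigma^2}(\bar x-\theta)^2-|\theta|/a}$. This minimal sketch uses only the standard library, with $\Phi$ computed from `math.erf`. The values of $\bar x$, $n$, $\sigma^2$, and $a$ in the checks are arbitrary.

```python
import math


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def post_mean_closed(xbar, n, sigma2, a):
    """E(theta|x) for a normal mean with a double exponential prior of scale a."""
    k = n / sigma2                                   # precision of xbar
    d_minus, d_plus = xbar - 1 / (k * a), xbar + 1 / (k * a)
    c_minus = 0.5 * k * (d_minus**2 - xbar**2)
    c_plus = 0.5 * k * (d_plus**2 - xbar**2)
    top = max(c_minus, c_plus)                       # rescale weights to avoid overflow
    w_minus = math.exp(c_minus - top) * Phi(math.sqrt(k) * d_minus)
    w_plus = math.exp(c_plus - top) * Phi(-math.sqrt(k) * d_plus)
    return (d_minus * w_minus + d_plus * w_plus) / (w_minus + w_plus)


def post_mean_numeric(xbar, n, sigma2, a, half_width=40.0, m=100000):
    """Midpoint rule for int t*g / int g, g(t) = exp(-n(xbar-t)^2/(2 sigma2) - |t|/a)."""
    h = 2 * half_width / m
    num = den = 0.0
    for j in range(m):
        t = xbar - half_width + (j + 0.5) * h
        g = math.exp(-n * (xbar - t) ** 2 / (2 * sigma2) - abs(t) / a)
        num += t * g
        den += g
    return num / den
```

As $a\to\infty$ (a flat prior) both $\delta_\pm\to\bar x$ and $c_\pm\to0$, so the closed form reduces to $\bar x$, as it should.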


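7.27 a. The log likelihood is
$$\log L=\sum_{i=1}^n\big[-\beta\tau_i+y_i\log(\beta\tau_i)-\tau_i+x_i\log(\tau_i)-\log y_i!-\log x_i!\big]$$
and differentiation gives
$$\frac{\partial}{\partial\beta}\log L=\sum_{i=1}^n\Big[-\tau_i+\frac{y_i\tau_i}{\beta\tau_i}\Big]\Rightarrow\beta=\frac{\sum_{i=1}^ny_i}{\sum_{i=1}^n\tau_i}$$
$$\frac{\partial}{\partial\tau_j}\log L=-\beta+\frac{y_j\beta}{\beta\tau_j}-1+\frac{x_j}{\tau_j}\Rightarrow\tau_j=\frac{x_j+y_j}{1+\beta}\Rightarrow\sum_{j=1}^n\tau_j=\frac{\sum_{j=1}^nx_j+\sum_{j=1}^ny_j}{1+\beta}.$$
Combining these expressions yields $\hat\beta=\sum_{j=1}^ny_j/\sum_{j=1}^nx_j$ and $\hat\tau_j=\frac{x_j+y_j}{1+\hat\beta}$.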

b. The stationary point of the EM algorithm will satisfy
$$\hat\beta=\frac{\sum_{i=1}^ny_i}{\hat\tau_1+\sum_{i=2}^nx_i},\qquad\hat\tau_1=\frac{\hat\tau_1+y_1}{\hat\beta+1},\qquad\hat\tau_j=\frac{x_j+y_j}{\hat\beta+1}.$$
The second equation yields $\tau_1=y_1/\beta$, and substituting this into the first equation yields $\beta=\sum_{j=2}^ny_j/\sum_{j=2}^nx_j$. Summing over $j$ in the third equation, and substituting $\beta=\sum_{j=2}^ny_j/\sum_{j=2}^nx_j$, shows us that $\sum_{j=2}^n\hat\tau_j=\sum_{j=2}^nx_j$, and plugging this into the first equation gives the desired expression for $\hat\beta$. The other two equations in (7.2.16) are obviously satisfied.
c. The expression for $\hat\beta$ was derived in part (b), as were the expressions for $\hat\tau_i$.
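The stationary-point argument in (b) can be watched in action by iterating the three equations as an EM-style update with $x_1$ missing. This minimal sketch assumes the update has the form $\hat\beta^{(r+1)}=\sum_iy_i/(\hat\tau_1^{(r)}+\sum_{i\ge2}x_i)$, followed by $\hat\tau_1^{(r+1)}=(\hat\tau_1^{(r)}+y_1)/(\hat\beta^{(r+1)}+1)$ and $\hat\tau_j^{(r+1)}=(x_j+y_j)/(\hat\beta^{(r+1)}+1)$. The function name and the counts are hypothetical.

```python
def em_missing_x1(x_rest, y, tau1=1.0, iters=2000):
    """Iterate the EM equations of Exercise 7.27(b) with x_1 unobserved.

    x_rest holds x_2, ..., x_n; y holds y_1, ..., y_n."""
    sy, sx_rest = sum(y), sum(x_rest)
    for _ in range(iters):
        beta = sy / (tau1 + sx_rest)
        tau1 = (tau1 + y[0]) / (beta + 1)
    tau_rest = [(xj + yj) / (beta + 1) for xj, yj in zip(x_rest, y[1:])]
    return beta, [tau1] + tau_rest


x_rest = [10, 20, 30]   # hypothetical x_2, x_3, x_4
y = [3, 4, 6, 5]        # hypothetical y_1, ..., y_4
beta_hat, tau_hat = em_missing_x1(x_rest, y)
```

The iteration settles at the point identified in (b): $\hat\beta=\sum_{j\ge2}y_j/\sum_{j\ge2}x_j$, $\hat\tau_1=y_1/\hat\beta$, and $\sum_{j\ge2}\hat\tau_j=\sum_{j\ge2}x_j$.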

7.29 a. The joint density is the product of the individual densities.
b. The log likelihood is
$$\log L=\sum_{i=1}^n\big[-m\beta\tau_i+y_i\log(m\beta\tau_i)+x_i\log(\tau_i)+\log m!-\log y_i!-\log x_i!\big]$$
and
$$\frac{\partial}{\partial\beta}\log L=0\Rightarrow\beta=\frac{\sum_{i=1}^ny_i}{\sum_{i=1}^nm\tau_i},\qquad\frac{\partial}{\partial\tau_j}\log L=0\Rightarrow\tau_j=\frac{x_j+y_j}{m\beta}.$$
Since $\sum_j\tau_j=1$, $\hat\beta=\sum_{i=1}^ny_i/m=\sum_{i=1}^ny_i/\sum_{i=1}^nx_i$. Also, $\sum_j\tau_j=\sum_j(y_j+x_j)/(m\beta)=1$, which implies that $m\beta=\sum_j(y_j+x_j)$ and $\hat\tau_j=(x_j+y_j)/\sum_i(y_i+x_i)$.
c. In the likelihood function we can ignore the factorial terms, and the expected complete-data likelihood is obtained on the $r$th iteration by replacing $x_1$ with $\mathrm{E}(X_1|\hat\tau_1^{(r)})=m\hat\tau_1^{(r)}$. Substituting this into the MLEs of part (b) gives the EM sequence.


The MLEs from the full data set are $\hat\beta=0.0008413892$ and
$\hat\tau$ = (0.06337310, 0.06374873, 0.06689681, 0.04981487, 0.04604075, 0.04883109,
0.07072460, 0.01776164, 0.03416388, 0.01695673, 0.02098127, 0.01878119,
0.05621836, 0.09818091, 0.09945087, 0.05267677, 0.08896918, 0.08642925).

The MLEs for the incomplete data were computed using R, where we take $m=\sum_ix_i$. The R code is

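# MLEs on the incomplete data (x_1 missing), by the EM algorithm of part (c)
xdatam<-c(3560,3739,2784,2571,2729,3952,993,1908,948,1172,
          1047,3138,5485,5554,2943,4969,4828)
ydata<-c(3,4,1,1,3,1,2,0,2,0,1,3,5,4,6,2,5,4)
xdata<-c(mean(xdatam),xdatam)              # start with x_1 = mean of observed x's
tau<-(xdata+ydata)/(sum(xdata)+sum(ydata)) # starting value for tau
for (j in 1:500) {
  xdata<-c(sum(xdata)*tau[1],xdatam)       # E-step: x_1 <- m*tau_1, m = sum(x)
  beta<-sum(ydata)/sum(xdata)              # M-step: MLEs from part (b)
  tau<-(xdata+ydata)/(sum(xdata)+sum(ydata))
}
beta
tau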

The MLEs from the incomplete data set are $\hat\beta=0.0008415534$ and
$\hat\tau$ = (0.06319044, 0.06376116, 0.06690986, 0.04982459, 0.04604973, 0.04884062,
0.07073839, 0.01776510, 0.03417054, 0.01696004, 0.02098536, 0.01878485,
0.05622933, 0.09820005, 0.09947027, 0.05268704, 0.08898653, 0.08644610).

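7.31 a. By direct substitution we can write
$$\log L(\theta|\mathbf{y})=\mathrm{E}\big[\log L(\theta|\mathbf{y},\mathbf{X})\,\big|\,\hat\theta^{(r)},\mathbf{y}\big]-\mathrm{E}\big[\log k(\mathbf{X}|\theta,\mathbf{y})\,\big|\,\hat\theta^{(r)},\mathbf{y}\big].$$
The next iterate, $\hat\theta^{(r+1)}$, is obtained by maximizing the expected complete-data log likelihood, so for any $\theta$,
$$\mathrm{E}\big[\log L(\hat\theta^{(r+1)}|\mathbf{y},\mathbf{X})\,\big|\,\hat\theta^{(r)},\mathbf{y}\big]\ge\mathrm{E}\big[\log L(\theta|\mathbf{y},\mathbf{X})\,\big|\,\hat\theta^{(r)},\mathbf{y}\big].$$
b. Write
$$\mathrm{E}\big[\log k(\mathbf{X}|\theta,\mathbf{y})\,\big|\,\hat\theta^{(r)},\mathbf{y}\big]=\int\log k(\mathbf{x}|\theta,\mathbf{y})\,k(\mathbf{x}|\hat\theta^{(r)},\mathbf{y})\,d\mathbf{x}\le\int\log k(\mathbf{x}|\hat\theta^{(r)},\mathbf{y})\,k(\mathbf{x}|\hat\theta^{(r)},\mathbf{y})\,d\mathbf{x},$$
for every $\theta$, from the hint. Hence $\mathrm{E}[\log k(\mathbf{X}|\hat\theta^{(r+1)},\mathbf{y})|\hat\theta^{(r)},\mathbf{y}]\le\mathrm{E}[\log k(\mathbf{X}|\hat\theta^{(r)},\mathbf{y})|\hat\theta^{(r)},\mathbf{y}]$. So in part (a) the first term cannot decrease and the subtracted term cannot increase when $\hat\theta^{(r)}$ is replaced by $\hat\theta^{(r+1)}$, and therefore $\log L(\hat\theta^{(r+1)}|\mathbf{y})\ge\log L(\hat\theta^{(r)}|\mathbf{y})$.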

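7.33 Substitute $\alpha=\beta=\sqrt{n/4}$ into
$$\mathrm{MSE}(\hat p_B)=\frac{np(1-p)}{(\alpha+\beta+n)^2}+\Big(\frac{np+\alpha}{\alpha+\beta+n}-p\Big)^2$$
and simplify to obtain
$$\mathrm{MSE}(\hat p_B)=\frac{n}{4(\sqrt n+n)^2},$$
independent of $p$, as desired.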
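A quick numerical confirmation that, with $\alpha=\beta=\sqrt{n/4}$, the MSE expression above does not depend on $p$, while other choices such as $\alpha=\beta=1$ do. This minimal sketch uses an arbitrary $n$ and grid of $p$ values.

```python
import math


def mse_bayes(p, n, alpha, beta):
    """MSE of (Y + alpha)/(alpha + beta + n), Y ~ binomial(n, p)."""
    denom = alpha + beta + n
    return n * p * (1 - p) / denom**2 + ((n * p + alpha) / denom - p) ** 2


n = 16
a = math.sqrt(n / 4)
values = [mse_bayes(k / 10, n, a, a) for k in range(11)]
target = n / (4 * (math.sqrt(n) + n) ** 2)
```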

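7.35 a.
$$\delta_p(g(\mathbf{x}))=\delta_p(x_1+a,\ldots,x_n+a)=\frac{\int_{-\infty}^\infty t\prod_if(x_i+a-t)\,dt}{\int_{-\infty}^\infty\prod_if(x_i+a-t)\,dt}=\frac{\int_{-\infty}^\infty(y+a)\prod_if(x_i-y)\,dy}{\int_{-\infty}^\infty\prod_if(x_i-y)\,dy}\quad(y=t-a)$$
$$=a+\delta_p(\mathbf{x})=\bar g(\delta_p(\mathbf{x})).$$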


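b.
$$\prod_if(x_i-t)=\frac{1}{(2\pi)^{n/2}}e^{-\frac12\sum_i(x_i-t)^2}=\frac{1}{(2\pi)^{n/2}}e^{-\frac12n(\bar x-t)^2}e^{-\frac12(n-1)s^2},$$
so
$$\delta_p(\mathbf{x})=\frac{(\sqrt n/\sqrt{2\pi})\int_{-\infty}^\infty te^{-\frac12n(\bar x-t)^2}dt}{(\sqrt n/\sqrt{2\pi})\int_{-\infty}^\infty e^{-\frac12n(\bar x-t)^2}dt}=\frac{\bar x}{1}=\bar x.$$
c.
$$\prod_if(x_i-t)=\prod_iI\Big(t-\frac12\le x_i\le t+\frac12\Big)=I\Big(x_{(n)}-\frac12\le t\le x_{(1)}+\frac12\Big),$$
so
$$\delta_p(\mathbf{x})=\frac{\int_{x_{(n)}-1/2}^{x_{(1)}+1/2}t\,dt}{\int_{x_{(n)}-1/2}^{x_{(1)}+1/2}1\,dt}=\frac{x_{(1)}+x_{(n)}}{2}.$$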

7.37 To find a best unbiased estimator of $\theta$, first find a complete sufficient statistic. The joint pdf is
$$f(\mathbf{x}|\theta)=\Big(\frac{1}{2\theta}\Big)^n\prod_iI_{(-\theta,\theta)}(x_i)=\Big(\frac{1}{2\theta}\Big)^nI_{[0,\theta)}(\max_i|x_i|).$$
By the Factorization Theorem, $\max_i|X_i|$ is a sufficient statistic. To check that it is a complete sufficient statistic, let $Y=\max_i|X_i|$. Note that the pdf of $Y$ is $f_Y(y)=ny^{n-1}/\theta^n$, $0<y<\theta$. Suppose $g(y)$ is a function such that
$$\mathrm{E}\,g(Y)=\int_0^\theta\frac{ny^{n-1}}{\theta^n}g(y)\,dy=0,\quad\text{for all }\theta.$$
Taking derivatives shows that $\theta^{n-1}g(\theta)=0$, for all $\theta$. So $g(\theta)=0$, for all $\theta$, and $Y=\max_i|X_i|$ is a complete sufficient statistic. Now
$$\mathrm{E}\,Y=\int_0^\theta y\frac{ny^{n-1}}{\theta^n}\,dy=\frac{n}{n+1}\theta\Rightarrow\mathrm{E}\Big(\frac{n+1}{n}Y\Big)=\theta.$$
Therefore $\frac{n+1}{n}\max_i|X_i|$ is a best unbiased estimator for $\theta$ because it is a function of a complete sufficient statistic. (Note that $(X_{(1)},X_{(n)})$ is not a minimal sufficient statistic (recall Exercise 5.36). It is for $\theta<X_i<2\theta$, $-2\theta<X_i<\theta$, $4\theta<X_i<6\theta$, etc., but not when the range is symmetric about zero. Then $\max_i|X_i|$ is minimal sufficient.)

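7.38 Use Corollary 7.3.15.
a.
$$\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x})=\frac{\partial}{\partial\theta}\log\prod_i\theta x_i^{\theta-1}=\frac{\partial}{\partial\theta}\sum_i[\log\theta+(\theta-1)\log x_i]=\sum_i\Big[\frac1\theta+\log x_i\Big]=-n\Big[-\frac{\sum_i\log x_i}{n}-\frac1\theta\Big].$$
Thus, $-\sum_i\log X_i/n$ is the UMVUE of $1/\theta$ and attains the Cramér-Rao bound.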


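b.
$$\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x})=\frac{\partial}{\partial\theta}\log\prod_i\frac{\log\theta}{\theta-1}\theta^{x_i}=\frac{\partial}{\partial\theta}\sum_i[\log\log\theta-\log(\theta-1)+x_i\log\theta]$$
$$=\sum_i\Big[\frac{1}{\theta\log\theta}-\frac{1}{\theta-1}\Big]+\frac1\theta\sum_ix_i=\frac{n}{\theta\log\theta}-\frac{n}{\theta-1}+\frac{n\bar x}{\theta}=\frac n\theta\Big[\bar x-\Big(\frac{\theta}{\theta-1}-\frac{1}{\log\theta}\Big)\Big].$$
Thus, $\bar X$ is the UMVUE of $\frac{\theta}{\theta-1}-\frac{1}{\log\theta}$ and attains the Cramér-Rao lower bound.
Note: We claim that if $\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{X})=a(\theta)[W(\mathbf{X})-\tau(\theta)]$, then $\mathrm{E}\,W(\mathbf{X})=\tau(\theta)$, because under the conditions of the Cramér-Rao Theorem, $\mathrm{E}\,\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{X})=0$. To be rigorous, we need to check the "interchange differentiation and integration" condition. Both (a) and (b) are exponential families, and this condition is satisfied for all exponential families.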

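7.39
$$\mathrm{E}_\theta\Big(\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\Big)=\mathrm{E}_\theta\Big(\frac{\partial}{\partial\theta}\Big[\frac{\partial}{\partial\theta}\log f(X|\theta)\Big]\Big)=\mathrm{E}_\theta\Big(\frac{\partial}{\partial\theta}\Big[\frac{\frac{\partial}{\partial\theta}f(X|\theta)}{f(X|\theta)}\Big]\Big)=\mathrm{E}_\theta\Big(\frac{\frac{\partial^2}{\partial\theta^2}f(X|\theta)}{f(X|\theta)}-\Big(\frac{\frac{\partial}{\partial\theta}f(X|\theta)}{f(X|\theta)}\Big)^2\Big).$$
Now consider the first term:
$$\mathrm{E}_\theta\Big[\frac{\frac{\partial^2}{\partial\theta^2}f(X|\theta)}{f(X|\theta)}\Big]=\int\frac{\partial^2}{\partial\theta^2}f(x|\theta)\,dx=\frac{d}{d\theta}\int\frac{\partial}{\partial\theta}f(x|\theta)\,dx\quad\text{(assumption)}$$
$$=\frac{d}{d\theta}\mathrm{E}_\theta\Big(\frac{\partial}{\partial\theta}\log f(X|\theta)\Big)=0,\quad\text{(7.3.8)}$$
and the identity is proved.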

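7.40
$$\frac{\partial}{\partial p}\log L(p|\mathbf{x})=\frac{\partial}{\partial p}\log\prod_ip^{x_i}(1-p)^{1-x_i}=\frac{\partial}{\partial p}\sum_i[x_i\log p+(1-x_i)\log(1-p)]=\sum_i\Big[\frac{x_i}{p}-\frac{1-x_i}{1-p}\Big]=\frac{n\bar x}{p}-\frac{n-n\bar x}{1-p}=\frac{n}{p(1-p)}[\bar x-p].$$
By Corollary 7.3.15, $\bar X$ is the UMVUE of $p$ and attains the Cramér-Rao lower bound. Alternatively, we could calculate
$$-n\mathrm{E}_\theta\Big(\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\Big)=-n\mathrm{E}\Big(\frac{\partial^2}{\partial p^2}\log[p^X(1-p)^{1-X}]\Big)=-n\mathrm{E}\Big(\frac{\partial^2}{\partial p^2}[X\log p+(1-X)\log(1-p)]\Big)$$
$$=-n\mathrm{E}\Big(\frac{\partial}{\partial p}\Big[\frac Xp-\frac{1-X}{1-p}\Big]\Big)=-n\mathrm{E}\Big(-\frac{X}{p^2}-\frac{1-X}{(1-p)^2}\Big)=-n\Big(-\frac1p-\frac{1}{1-p}\Big)=\frac{n}{p(1-p)}.$$
Then using $\tau(\theta)=p$ and $\tau'(\theta)=1$,
$$\frac{(\tau'(\theta))^2}{-n\mathrm{E}_\theta\big(\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\big)}=\frac{1}{n/p(1-p)}=\frac{p(1-p)}{n}=\mathrm{Var}\,\bar X.$$
We know that $\mathrm{E}\,\bar X=p$. Thus, $\bar X$ attains the Cramér-Rao bound.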

7.41 a. $\mathrm{E}(\sum_ia_iX_i)=\sum_ia_i\mathrm{E}\,X_i=\sum_ia_i\mu=\mu\sum_ia_i=\mu$. Hence the estimator is unbiased.
b. $\mathrm{Var}(\sum_ia_iX_i)=\sum_ia_i^2\mathrm{Var}\,X_i=\sum_ia_i^2\sigma^2=\sigma^2\sum_ia_i^2$. Therefore, we need to minimize $\sum_ia_i^2$, subject to the constraint $\sum_ia_i=1$. Add and subtract the mean of the $a_i$, $1/n$, to get
$$\sum_ia_i^2=\sum_i\Big[\Big(a_i-\frac1n\Big)+\frac1n\Big]^2=\sum_i\Big(a_i-\frac1n\Big)^2+\frac1n,$$
because the cross-term is zero. Hence, $\sum_ia_i^2$ is minimized by choosing $a_i=1/n$ for all $i$. Thus, $\sum_i(1/n)X_i=\bar X$ has the minimum variance among all linear unbiased estimators.

7.43 a. This one is real hard - it was taken from an American Statistician article, but the proof is

not there. A cryptic version of the proof is in Tukey (Approximate Weights, Ann. Math.

Statist. 1948, 91-92); here is a more detailed version.

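Let $q_i=q_i^*(1+\lambda t_i)$ with $0\le\lambda\le1$ and $|t_i|\le1$. Recall that $q_i^*=(1/\sigma_i^2)/\sum_j(1/\sigma_j^2)$ and $\mathrm{Var}\,W^*=1/\sum_j(1/\sigma_j^2)$. Then
$$\mathrm{Var}\Big(\frac{\sum_iq_iW_i}{\sum_jq_j}\Big)=\frac{1}{(\sum_jq_j)^2}\sum_iq_i^2\sigma_i^2=\frac{1}{[\sum_jq_j^*(1+\lambda t_j)]^2}\sum_iq_i^{*2}(1+\lambda t_i)^2\sigma_i^2=\frac{1}{[\sum_jq_j^*(1+\lambda t_j)]^2\sum_j(1/\sigma_j^2)}\sum_iq_i^*(1+\lambda t_i)^2,$$
using the definition of $q_i^*$. Now write
$$\sum_iq_i^*(1+\lambda t_i)^2=1+2\lambda\sum_jq_j^*t_j+\lambda^2\sum_jq_j^*t_j^2=\Big[1+\lambda\sum_jq_j^*t_j\Big]^2+\lambda^2\Big[\sum_jq_j^*t_j^2-\Big(\sum_jq_j^*t_j\Big)^2\Big],$$
where we used the fact that $\sum_jq_j^*=1$. Now since
$$\Big[\sum_jq_j^*(1+\lambda t_j)\Big]^2=\Big[1+\lambda\sum_jq_j^*t_j\Big]^2,$$
$$\mathrm{Var}\Big(\frac{\sum_iq_iW_i}{\sum_jq_j}\Big)=\frac{1}{\sum_j(1/\sigma_j^2)}\Big[1+\frac{\lambda^2[\sum_jq_j^*t_j^2-(\sum_jq_j^*t_j)^2]}{[1+\lambda\sum_jq_j^*t_j]^2}\Big]\le\frac{1}{\sum_j(1/\sigma_j^2)}\Big[1+\frac{\lambda^2[1-(\sum_jq_j^*t_j)^2]}{[1+\lambda\sum_jq_j^*t_j]^2}\Big],$$
since $\sum_jq_j^*t_j^2\le1$. Now let $T=\sum_jq_j^*t_j$, and
$$\mathrm{Var}\Big(\frac{\sum_iq_iW_i}{\sum_jq_j}\Big)\le\frac{1}{\sum_j(1/\sigma_j^2)}\Big[1+\frac{\lambda^2[1-T^2]}{[1+\lambda T]^2}\Big],$$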


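and the right-hand side is maximized at $T=-\lambda$, with maximizing value
$$\mathrm{Var}\Big(\frac{\sum_iq_iW_i}{\sum_jq_j}\Big)\le\frac{1}{\sum_j(1/\sigma_j^2)}\Big[1+\frac{\lambda^2[1-\lambda^2]}{[1-\lambda^2]^2}\Big]=\mathrm{Var}\,W^*\,\frac{1}{1-\lambda^2}.$$
Bloch and Moses (1988) define $\lambda$ as the solution to
$$b_{\max}/b_{\min}=\frac{1+\lambda}{1-\lambda},$$
where $b_i/b_j$ are the ratios of the normalized weights which, in the present notation, are
$$b_i/b_j=(1+\lambda t_i)/(1+\lambda t_j).$$
The right-hand side is maximized by taking $t_i$ as large as possible and $t_j$ as small as possible, and setting $t_i=1$ and $t_j=-1$ (the extremes) yields the Bloch and Moses (1988) solution.
b.
$$b_i=\frac{1/k}{(1/\sigma_i^2)\big/\sum_j1/\sigma_j^2}=\frac{\sigma_i^2}{k}\sum_j1/\sigma_j^2.$$
Thus,
$$b_{\max}=\frac{\sigma_{\max}^2}{k}\sum_j1/\sigma_j^2\quad\text{and}\quad b_{\min}=\frac{\sigma_{\min}^2}{k}\sum_j1/\sigma_j^2,$$
and $B=b_{\max}/b_{\min}=\sigma_{\max}^2/\sigma_{\min}^2$. Solving $B=(1+\lambda)/(1-\lambda)$ yields $\lambda=(B-1)/(B+1)$. Substituting this into Tukey's inequality yields
$$\frac{\mathrm{Var}\,W}{\mathrm{Var}\,W^*}\le\frac{(B+1)^2}{4B}=\frac{((\sigma_{\max}^2/\sigma_{\min}^2)+1)^2}{4(\sigma_{\max}^2/\sigma_{\min}^2)}.$$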

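7.44 $\sum_iX_i$ is a complete sufficient statistic for $\theta$ when $X_i\sim n(\theta,1)$. $\bar X^2-1/n$ is a function of $\sum_iX_i$. Therefore, by Theorem 7.3.23, $\bar X^2-1/n$ is the unique best unbiased estimator of its expectation.
$$\mathrm{E}\Big(\bar X^2-\frac1n\Big)=\mathrm{Var}\,\bar X+(\mathrm{E}\,\bar X)^2-\frac1n=\frac1n+\theta^2-\frac1n=\theta^2.$$
Therefore, $\bar X^2-1/n$ is the UMVUE of $\theta^2$. We will calculate
$$\mathrm{Var}\Big(\bar X^2-\frac1n\Big)=\mathrm{Var}(\bar X^2)=\mathrm{E}(\bar X^4)-[\mathrm{E}(\bar X^2)]^2,\quad\text{where }\bar X\sim n(\theta,1/n),$$
but first we derive some general formulas that will also be useful in later exercises. Let $Y\sim n(\theta,\sigma^2)$. Then here are formulas for $\mathrm{E}\,Y^4$ and $\mathrm{Var}\,Y^2$.
$$\mathrm{E}\,Y^4=\mathrm{E}[Y^3(Y-\theta+\theta)]=\mathrm{E}\,Y^3(Y-\theta)+\mathrm{E}\,Y^3\theta=\mathrm{E}\,Y^3(Y-\theta)+\theta\mathrm{E}\,Y^3.$$
$$\mathrm{E}\,Y^3(Y-\theta)=\sigma^2\mathrm{E}(3Y^2)=\sigma^2\,3(\sigma^2+\theta^2)=3\sigma^4+3\theta^2\sigma^2.\quad\text{(Stein's Lemma)}$$
$$\theta\mathrm{E}\,Y^3=\theta(3\theta\sigma^2+\theta^3)=3\theta^2\sigma^2+\theta^4.\quad\text{(Example 3.6.6)}$$
$$\mathrm{Var}\,Y^2=3\sigma^4+6\theta^2\sigma^2+\theta^4-(\sigma^2+\theta^2)^2=2\sigma^4+4\theta^2\sigma^2.$$
Thus,
$$\mathrm{Var}\Big(\bar X^2-\frac1n\Big)=\mathrm{Var}\,\bar X^2=2\frac{1}{n^2}+4\theta^2\frac1n>\frac{4\theta^2}{n}.$$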


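To calculate the Cramér-Rao lower bound, we have
$$\mathrm{E}_\theta\Big(\frac{\partial^2\log f(X|\theta)}{\partial\theta^2}\Big)=\mathrm{E}_\theta\Big(\frac{\partial^2}{\partial\theta^2}\log\frac{1}{\sqrt{2\pi}}e^{-(X-\theta)^2/2}\Big)=\mathrm{E}_\theta\Big(\frac{\partial^2}{\partial\theta^2}\Big[\log(2\pi)^{-1/2}-\frac12(X-\theta)^2\Big]\Big)=\mathrm{E}_\theta\Big(\frac{\partial}{\partial\theta}(X-\theta)\Big)=-1,$$
and $\tau(\theta)=\theta^2$, $[\tau'(\theta)]^2=(2\theta)^2=4\theta^2$, so the Cramér-Rao Lower Bound for estimating $\theta^2$ is
$$\frac{[\tau'(\theta)]^2}{-n\mathrm{E}_\theta\big(\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\big)}=\frac{4\theta^2}{n}.$$
Thus, the UMVUE of $\theta^2$ does not attain the Cramér-Rao bound. (However, the ratio of the variance and the lower bound $\to1$ as $n\to\infty$.)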
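A Monte Carlo sketch of the formula $\mathrm{Var}(\bar X^2-1/n)=2/n^2+4\theta^2/n$ and of the gap to the bound $4\theta^2/n$. It draws $\bar X\sim n(\theta,1/n)$ directly rather than simulating whole samples. The values of $\theta$ and $n$, the seed, and the replication count are arbitrary choices.

```python
import random


def simulate_umvue(theta, n, reps=100000, seed=0):
    """Return (mean, variance) of Xbar^2 - 1/n over simulated Xbar ~ n(theta, 1/n)."""
    rng = random.Random(seed)
    draws = [rng.gauss(theta, (1 / n) ** 0.5) ** 2 - 1 / n for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / reps
    return mean, var


theta, n = 1.0, 5
mean, var = simulate_umvue(theta, n)
exact_var = 2 / n**2 + 4 * theta**2 / n   # 2/25 + 4/5 = 0.88
crlb = 4 * theta**2 / n                    # 4/5 = 0.80
```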

7.45 a. Because $\mathrm{E}\,S^2=\sigma^2$, $\mathrm{bias}(aS^2)=\mathrm{E}(aS^2)-\sigma^2=(a-1)\sigma^2$. Hence,
$$\mathrm{MSE}(aS^2)=\mathrm{Var}(aS^2)+\mathrm{bias}(aS^2)^2=a^2\mathrm{Var}(S^2)+(a-1)^2\sigma^4.$$
b. There were two typos in early printings; $\kappa=\mathrm{E}[X-\mu]^4/\sigma^4$ and
$$\mathrm{Var}(S^2)=\frac1n\Big(\kappa-\frac{n-3}{n-1}\Big)\sigma^4.$$
See Exercise 5.8b for the proof.
c. There was a typo in early printings; under normality $\kappa=3$. Under normality we have
$$\kappa=\frac{\mathrm{E}[X-\mu]^4}{\sigma^4}=\mathrm{E}\Big(\frac{X-\mu}{\sigma}\Big)^4=\mathrm{E}\,Z^4,$$
where $Z\sim n(0,1)$. Now, using Lemma 3.6.5 with $g(z)=z^3$ we have
$$\kappa=\mathrm{E}\,Z^4=\mathrm{E}\,g(Z)Z=1\cdot\mathrm{E}(3Z^2)=3\mathrm{E}\,Z^2=3.$$
To minimize $\mathrm{MSE}(aS^2)$ in general, write $\mathrm{Var}(S^2)=B\sigma^4$. Then minimizing $\mathrm{MSE}(aS^2)$ is equivalent to minimizing $a^2B+(a-1)^2$. Set the derivative of this equal to 0 ($B$ is not a function of $a$) to obtain the minimizing value of $a$, which is $1/(B+1)$. Using the expression in part (b), under normality the minimizing value of $a$ is
$$\frac{1}{B+1}=\frac{1}{\frac1n\big(3-\frac{n-3}{n-1}\big)+1}=\frac{n-1}{n+1}.$$
d. There was a typo in early printings; the minimizing $a$ is
$$a=\frac{n-1}{(n+1)+(\kappa-3)(n-1)/n}.$$
To obtain this simply calculate $1/(B+1)$ with (from part (b))
$$B=\frac1n\Big(\kappa-\frac{n-3}{n-1}\Big).$$


e. Using the expression for $a$ in part (d), if $\kappa=3$ the second term in the denominator is zero and $a=(n-1)/(n+1)$, the normal result from part (c). If $\kappa<3$, the second term in the denominator is negative. Because we are dividing by a smaller value, we have $a>(n-1)/(n+1)$. Because $\mathrm{Var}(S^2)=B\sigma^4$, $B>0$, and, hence, $a=1/(B+1)<1$. Similarly, if $\kappa>3$, the second term in the denominator is positive. Because we are dividing by a larger value, we have $a<(n-1)/(n+1)$.
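The minimizing constant can be computed both as $1/(B+1)$ and from the closed form in (d). This minimal sketch checks in exact rational arithmetic that the two agree and that $a$ moves in the direction described in (e). The values of $\kappa$ and $n$ are arbitrary.

```python
from fractions import Fraction


def a_from_B(kappa, n):
    """1/(B+1) with B = (kappa - (n-3)/(n-1)) / n, so Var(S^2) = B sigma^4."""
    B = (Fraction(kappa) - Fraction(n - 3, n - 1)) / n
    return 1 / (B + 1)


def a_closed(kappa, n):
    """(n-1) / ((n+1) + (kappa-3)(n-1)/n), the form given in part (d)."""
    return Fraction(n - 1) / ((n + 1) + (Fraction(kappa) - 3) * (n - 1) / n)


for n in (2, 5, 20):
    print(n, [a_closed(k, n) for k in (Fraction(9, 5), 3, 6)])
```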

7.46 a. For the uniform$(\theta,2\theta)$ distribution we have $\mathrm{E}\,X=(2\theta+\theta)/2=3\theta/2$. So we solve $3\theta/2=\bar X$ for $\theta$ to obtain the method of moments estimator $\tilde\theta=2\bar X/3$.
b. Let $x_{(1)},\ldots,x_{(n)}$ denote the observed order statistics. Then, the likelihood function is
$$L(\theta|\mathbf{x})=\frac{1}{\theta^n}I_{[x_{(n)}/2,\,x_{(1)}]}(\theta).$$
Because $1/\theta^n$ is decreasing, this is maximized at $\hat\theta=x_{(n)}/2$. So $\hat\theta=X_{(n)}/2$ is the MLE. Use the pdf of $X_{(n)}$ to calculate $\mathrm{E}\,X_{(n)}=\frac{2n+1}{n+1}\theta$. So $\mathrm{E}\,\hat\theta=\frac{2n+1}{2n+2}\theta$, and if $k=(2n+2)/(2n+1)$, $\mathrm{E}\,k\hat\theta=\theta$.
c. From Exercise 6.23, a minimal sufficient statistic for $\theta$ is $(X_{(1)},X_{(n)})$. $\tilde\theta$ is not a function of this minimal sufficient statistic. So by the Rao-Blackwell Theorem, $\mathrm{E}(\tilde\theta|X_{(1)},X_{(n)})$ is an unbiased estimator of $\theta$ ($\tilde\theta$ is unbiased) with smaller variance than $\tilde\theta$. The MLE is a function of $(X_{(1)},X_{(n)})$, so it cannot be improved with the Rao-Blackwell Theorem.
d. $\tilde\theta=2(1.16)/3=.7733$ and $\hat\theta=1.33/2=.6650$.

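7.47 $X_i\sim n(r,\sigma^2)$, so $\bar X\sim n(r,\sigma^2/n)$ and $\mathrm{E}\,\bar X^2=r^2+\sigma^2/n$. Thus $\mathrm{E}(\pi\bar X^2-\pi\sigma^2/n)=\pi r^2$, and $\pi\bar X^2-\pi\sigma^2/n$ is best unbiased because $\bar X$ is a complete sufficient statistic. If $\sigma^2$ is unknown replace it with $s^2$ and the conclusion still holds.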

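7.48 a. The Cramér-Rao Lower Bound for unbiased estimates of $p$ is
$$\frac{\big[\frac{d}{dp}p\big]^2}{-n\mathrm{E}\big(\frac{d^2}{dp^2}\log L(p|X)\big)}=\frac{1}{-n\mathrm{E}\big\{\frac{d^2}{dp^2}\log[p^X(1-p)^{1-X}]\big\}}=\frac{1}{-n\mathrm{E}\big\{-\frac{X}{p^2}-\frac{(1-X)}{(1-p)^2}\big\}}=\frac{p(1-p)}{n}$$
because $\mathrm{E}\,X=p$. The MLE of $p$ is $\hat p=\sum_iX_i/n$, with $\mathrm{E}\,\hat p=p$ and $\mathrm{Var}\,\hat p=p(1-p)/n$. Thus $\hat p$ attains the CRLB and is the best unbiased estimator of $p$.
b. By independence, $\mathrm{E}(X_1X_2X_3X_4)=\prod_i\mathrm{E}\,X_i=p^4$, so the estimator is unbiased. Because $\sum_iX_i$ is a complete sufficient statistic, Theorems 7.3.17 and 7.3.23 imply that $\mathrm{E}(X_1X_2X_3X_4|\sum_iX_i)$ is the best unbiased estimator of $p^4$. Evaluating this yields
$$\mathrm{E}\Big(X_1X_2X_3X_4\Big|\sum_iX_i=t\Big)=\frac{P(X_1=X_2=X_3=X_4=1,\sum_{i=5}^nX_i=t-4)}{P(\sum_iX_i=t)}=\frac{p^4\binom{n-4}{t-4}p^{t-4}(1-p)^{n-t}}{\binom ntp^t(1-p)^{n-t}}=\binom{n-4}{t-4}\Big/\binom nt,$$
for $t\ge4$. For $t<4$ one of the $X_i$s must be zero, so the estimator is $\mathrm{E}(X_1X_2X_3X_4|\sum_iX_i=t)=0$.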
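Unbiasedness of $\binom{n-4}{t-4}/\binom nt$ for $p^4$ (with the estimator set to 0 for $t<4$) can be confirmed exactly by summing against the binomial pmf of $T=\sum_iX_i$. This minimal sketch uses rational arithmetic. The $t<4$ case is handled explicitly because `math.comb` rejects a negative second argument.

```python
from fractions import Fraction
from math import comb


def umvue_p4(t, n):
    """E(X1 X2 X3 X4 | sum X_i = t) for Bernoulli data."""
    if t < 4:
        return Fraction(0)
    return Fraction(comb(n - 4, t - 4), comb(n, t))


def expectation(n, p):
    """E umvue_p4(T, n) with T ~ binomial(n, p), computed exactly."""
    return sum(umvue_p4(t, n) * comb(n, t) * p**t * (1 - p) ** (n - t)
               for t in range(n + 1))


n, p = 9, Fraction(2, 7)
print(expectation(n, p), p**4)
```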

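7.49 a. From Theorem 5.5.9, $Y=X_{(1)}$ has pdf
$$f_Y(y)=\frac{n!}{(n-1)!}\frac1\lambda e^{-y/\lambda}\big[1-(1-e^{-y/\lambda})\big]^{n-1}=\frac n\lambda e^{-ny/\lambda}.$$
Thus $Y\sim$ exponential$(\lambda/n)$ so $\mathrm{E}\,Y=\lambda/n$ and $nY$ is an unbiased estimator of $\lambda$.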


b. Because $f_X(x)$ is in the exponential family, $\sum_iX_i$ is a complete sufficient statistic and $\mathrm{E}\left(nX_{(1)}\mid\sum_iX_i\right)$ is the best unbiased estimator of $\lambda$. Because $\mathrm{E}\left(\sum_iX_i\right)=n\lambda$, we must have $\mathrm{E}\left(nX_{(1)}\mid\sum_iX_i\right)=\sum_iX_i/n$ by completeness. Of course, any function of $\sum_iX_i$ that is an unbiased estimator of $\lambda$ is the best unbiased estimator of $\lambda$. Thus, we know directly that because $\mathrm{E}\left(\sum_iX_i\right)=n\lambda$, $\sum_iX_i/n$ is the best unbiased estimator of $\lambda$.

c. From part (a), $\hat\lambda=601.2$, and from part (b), $\hat\lambda=128.8$. Maybe the exponential model is not a good assumption.

7.50 a. $\mathrm{E}\left(a\bar X+(1-a)cS\right)=a\,\mathrm{E}\,\bar X+(1-a)\,\mathrm{E}(cS)=a\theta+(1-a)\theta=\theta$. So $a\bar X+(1-a)cS$ is an unbiased estimator of $\theta$.

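b. Because $\bar X$ and $S^2$ are independent for this normal model, $\mathrm{Var}\left(a\bar X+(1-a)cS\right)=a^2V_1+(1-a)^2V_2$, where $V_1=\mathrm{Var}\,\bar X=\theta^2/n$ and $V_2=\mathrm{Var}(cS)=c^2\,\mathrm{E}\,S^2-\theta^2=c^2\theta^2-\theta^2=(c^2-1)\theta^2$. Use calculus to show that this quadratic function of $a$ is minimized at
$$a=\frac{V_2}{V_1+V_2}=\frac{(c^2-1)\theta^2}{\left((1/n)+c^2-1\right)\theta^2}=\frac{c^2-1}{(1/n)+c^2-1}.$$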

c. Use the factorization in Example 6.2.9, with the special values $\mu=\theta$ and $\sigma^2=\theta^2$, to show that $(\bar X,S^2)$ is sufficient. $\mathrm{E}(\bar X-cS)=\theta-\theta=0$ for all $\theta$. So $\bar X-cS$ is a nonzero function of $(\bar X,S^2)$ whose expected value is always zero. Thus $(\bar X,S^2)$ is not complete.

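7.51 a. Straightforward calculation gives
$$\mathrm{E}_\theta\left(a_1\bar X+a_2cS-\theta\right)^2=a_1^2\,\mathrm{Var}\,\bar X+a_2^2c^2\,\mathrm{Var}\,S+\theta^2(a_1+a_2-1)^2.$$
Because $\mathrm{Var}\,\bar X=\theta^2/n$ and $\mathrm{Var}\,S=\mathrm{E}\,S^2-(\mathrm{E}\,S)^2=\theta^2\frac{c^2-1}{c^2}$, we have
$$\mathrm{E}_\theta\left(a_1\bar X+a_2cS-\theta\right)^2=\theta^2\left[a_1^2/n+a_2^2(c^2-1)+(a_1+a_2-1)^2\right],$$
and we need only minimize the expression in square brackets, which is independent of $\theta$. Differentiating with respect to $a_1$ and $a_2$ and solving the two resulting linear equations yields
$$a_2=\frac{1}{(n+1)c^2-n}\quad\text{and}\quad a_1=\frac{n(c^2-1)}{(n+1)c^2-n}.$$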

b. The estimator $T^*$ has minimum MSE over a class of estimators that contains those in Exercise 7.50.

c. Because $\theta>0$, restricting $T^*\ge0$ will improve the MSE.

d. No. It does not fit the definition of either one.
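Here is a quick numerical check of the minimizer in 7.51(a). The sketch assumes the normal model of Exercise 7.50, where $c$ is defined by $\mathrm{E}(cS)=\theta$, and computes $c$ from the standard formula $\mathrm{E}\,S=\sigma\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. The function names are illustrative. The bracketed MSE factor is a positive definite quadratic in $(a_1,a_2)$, so small perturbations of the closed-form weights should only increase it.

```python
import math

def c_const(n: int) -> float:
    """Constant c with E(cS) = sigma for a normal sample of size n."""
    return math.sqrt((n - 1) / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)

def bracket(a1: float, a2: float, n: int, c: float) -> float:
    """MSE of a1*Xbar + a2*c*S divided by theta^2 (Exercise 7.51a)."""
    return a1**2 / n + a2**2 * (c**2 - 1) + (a1 + a2 - 1) ** 2

def optimal_weights(n: int, c: float) -> tuple[float, float]:
    """Closed-form minimizer (a1, a2) of bracket()."""
    d = (n + 1) * c**2 - n
    return n * (c**2 - 1) / d, 1 / d

n = 10
c = c_const(n)
a1, a2 = optimal_weights(n, c)
print(f"c = {c:.5f}, a1 = {a1:.5f}, a2 = {a2:.5f}, MSE/theta^2 = {bracket(a1, a2, n, c):.6f}")
```

Note that $a_1+a_2<1$ at the optimum. In other words, the minimum-MSE estimator shrinks slightly relative to the unbiased combinations of Exercise 7.50.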

7.52 a. Because the Poisson family is an exponential family with $t(x)=x$, $\sum_iX_i$ is a complete sufficient statistic. Any function of $\sum_iX_i$ that is an unbiased estimator of $\lambda$ is the unique best unbiased estimator of $\lambda$. Because $\bar X$ is a function of $\sum_iX_i$ and $\mathrm{E}\,\bar X=\lambda$, $\bar X$ is the best unbiased estimator of $\lambda$.

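b. $S^2$ is an unbiased estimator of the population variance, that is, $\mathrm{E}\,S^2=\lambda$. $\bar X$ is a one-to-one function of $\sum_iX_i$, so $\bar X$ is also a complete sufficient statistic. Thus, $\mathrm{E}(S^2|\bar X)$ is an unbiased estimator of $\lambda$ and, by Theorem 7.3.23, it is also the unique best unbiased estimator of $\lambda$. Therefore $\mathrm{E}(S^2|\bar X)=\bar X$. Then we have
$$\mathrm{Var}\,S^2=\mathrm{Var}\left(\mathrm{E}(S^2|\bar X)\right)+\mathrm{E}\left(\mathrm{Var}(S^2|\bar X)\right)=\mathrm{Var}\,\bar X+\mathrm{E}\left(\mathrm{Var}(S^2|\bar X)\right),$$
so $\mathrm{Var}\,S^2>\mathrm{Var}\,\bar X$. The last term is positive because $S^2$ is not a function of $\bar X$.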

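c. We formulate a general theorem. Let $T(\mathbf X)$ be a complete sufficient statistic, and let $T'(\mathbf X)$ be any statistic other than $T(\mathbf X)$ such that $\mathrm{E}\,T(\mathbf X)=\mathrm{E}\,T'(\mathbf X)$. Then $\mathrm{E}[T'(\mathbf X)|T(\mathbf X)]=T(\mathbf X)$ and $\mathrm{Var}\,T'(\mathbf X)>\mathrm{Var}\,T(\mathbf X)$.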


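7.53 Let $a$ be a constant and suppose $\mathrm{Cov}_{\theta_0}(W,U)>0$. Then
$$\mathrm{Var}_{\theta_0}(W+aU)=\mathrm{Var}_{\theta_0}W+a^2\,\mathrm{Var}_{\theta_0}U+2a\,\mathrm{Cov}_{\theta_0}(W,U).$$
Choose $a\in\left(-2\,\mathrm{Cov}_{\theta_0}(W,U)\big/\mathrm{Var}_{\theta_0}U,\ 0\right)$. Then $\mathrm{Var}_{\theta_0}(W+aU)<\mathrm{Var}_{\theta_0}W$, so $W$ cannot be best unbiased.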

7.55 All three parts can be solved by this general method. Suppose $X\sim f(x|\theta)=c(\theta)m(x)$, $a<x<\theta$. Then $1/c(\theta)=\int_a^\theta m(x)\,dx$, and the cdf of $X$ is $F(x)=c(\theta)/c(x)$, $a<x<\theta$. Let $Y=X_{(n)}$ be the largest order statistic. Arguing as in Example 6.2.23, we see that $Y$ is a complete sufficient statistic. Thus, any function $T(Y)$ that is an unbiased estimator of $h(\theta)$ is the best unbiased estimator of $h(\theta)$. By Theorem 5.4.4, the pdf of $Y$ is $g(y|\theta)=nm(y)c(\theta)^n/c(y)^{n-1}$, $a<y<\theta$. Consider the equations
$$\int_a^\theta f(x|\theta)\,dx=1\quad\text{and}\quad\int_a^\theta T(y)g(y|\theta)\,dy=h(\theta),$$
which are equivalent to
$$\int_a^\theta m(x)\,dx=\frac{1}{c(\theta)}\quad\text{and}\quad\int_a^\theta\frac{T(y)nm(y)}{c(y)^{n-1}}\,dy=\frac{h(\theta)}{c(\theta)^n}.$$
Differentiating both sides of these two equations with respect to $\theta$ and using the Fundamental Theorem of Calculus yields
$$m(\theta)=-\frac{c'(\theta)}{c(\theta)^2}\quad\text{and}\quad\frac{T(\theta)nm(\theta)}{c(\theta)^{n-1}}=\frac{c(\theta)^nh'(\theta)-h(\theta)nc(\theta)^{n-1}c'(\theta)}{c(\theta)^{2n}}.$$
Change $\theta$s to $y$s and solve these two equations for $T(y)$ to get the best unbiased estimator of $h(\theta)$:
$$T(y)=h(y)+\frac{h'(y)}{nm(y)c(y)}.$$
For $h(\theta)=\theta^r$, $h'(\theta)=r\theta^{r-1}$.

a. For this pdf, $m(x)=1$ and $c(\theta)=1/\theta$. Hence
$$T(y)=y^r+\frac{ry^{r-1}}{n(1/y)}=\frac{n+r}{n}y^r.$$

b. If $\theta$ is the lower endpoint of the support, the smallest order statistic $Y=X_{(1)}$ is a complete sufficient statistic. Arguing as above yields the best unbiased estimator of $h(\theta)$:
$$T(y)=h(y)-\frac{h'(y)}{nm(y)c(y)}.$$
For this pdf, $m(x)=e^{-x}$ and $c(\theta)=e^\theta$. Hence
$$T(y)=y^r-\frac{ry^{r-1}}{ne^{-y}e^{y}}=y^r-\frac{ry^{r-1}}{n}.$$

c. For this pdf, $m(x)=e^{-x}$ and $c(\theta)=1/(e^{-\theta}-e^{-b})$. Hence
$$T(y)=y^r-\frac{ry^{r-1}}{ne^{-y}/(e^{-y}-e^{-b})}=y^r-\frac{ry^{r-1}\left(1-e^{-(b-y)}\right)}{n}.$$
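The general formula $T(y)=h(y)\pm h'(y)/(nm(y)c(y))$ can be sanity-checked by numerical integration. The Python sketch below is illustrative only: the Simpson's-rule helper and the parameter values are our assumptions, not part of the exercise. It evaluates $\mathrm{E}\,T(Y)$ for parts (a) and (c) with $h(\theta)=\theta^r$ and compares the result with $\theta^r$.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def expected_T_part_a(theta, n, r):
    """E T(Y) for X ~ uniform(0, theta), Y = X_(n), T(y) = (n + r)/n * y^r."""
    def integrand(y):
        pdf_y = n * y ** (n - 1) / theta**n
        return (n + r) / n * y**r * pdf_y
    return simpson(integrand, 0.0, theta)

def expected_T_part_c(theta, b, n, r):
    """E T(Y) for f(x|theta) = e^{-x}/(e^{-theta} - e^{-b}) on (theta, b), Y = X_(1)."""
    d = math.exp(-theta) - math.exp(-b)
    def integrand(y):
        surv = (math.exp(-y) - math.exp(-b)) / d          # P(X > y)
        pdf_y = n * (math.exp(-y) / d) * surv ** (n - 1)  # pdf of the minimum
        t = y**r - r * y ** (r - 1) * (1 - math.exp(-(b - y))) / n
        return t * pdf_y
    return simpson(integrand, theta, b)

print(expected_T_part_a(2.0, 5, 3), 2.0**3)
print(expected_T_part_c(0.5, 3.0, 4, 2), 0.5**2)
```

Each printed pair should agree to many decimal places. Dropping the factor $1/n$ in part (c) would visibly break the agreement.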


7.56 Because $T$ is sufficient, $\phi(T)=\mathrm{E}[h(X_1,\ldots,X_n)|T]$ is a function only of $T$. That is, $\phi(T)$ is an estimator. If $\mathrm{E}\,h(X_1,\ldots,X_n)=\tau(\theta)$, then
$$\mathrm{E}\,\phi(T)=\mathrm{E}\left[\mathrm{E}\left(h(X_1,\ldots,X_n)|T\right)\right]=\mathrm{E}\,h(X_1,\ldots,X_n)=\tau(\theta),$$
so $\phi(T)$ is an unbiased estimator of $\tau(\theta)$. By Theorem 7.3.23, $\phi(T)$ is the best unbiased estimator of $\tau(\theta)$.

7.57 a. $T$ is a Bernoulli random variable. Hence,
$$\mathrm{E}_pT=P_p(T=1)=P_p\left(\sum_{i=1}^nX_i>X_{n+1}\right)=h(p).$$

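b. $\sum_{i=1}^{n+1}X_i$ is a complete sufficient statistic for $p$, so $\mathrm{E}\left(T\,\middle|\,\sum_{i=1}^{n+1}X_i\right)$ is the best unbiased estimator of $h(p)$. We have
$$\mathrm{E}\left(T\,\middle|\,\sum_{i=1}^{n+1}X_i=y\right)=P\left(\sum_{i=1}^nX_i>X_{n+1}\,\middle|\,\sum_{i=1}^{n+1}X_i=y\right)=P\left(\sum_{i=1}^nX_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=y\right)\Big/P\left(\sum_{i=1}^{n+1}X_i=y\right).$$
The denominator equals $\binom{n+1}{y}p^y(1-p)^{n+1-y}$. If $y=0$, the numerator is
$$P\left(\sum_{i=1}^nX_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=0\right)=0.$$
If $y>0$, the numerator is
$$P\left(\sum_{i=1}^nX_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=y,\ X_{n+1}=0\right)+P\left(\sum_{i=1}^nX_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=y,\ X_{n+1}=1\right),$$
which equals
$$P\left(\sum_{i=1}^nX_i>0,\ \sum_{i=1}^nX_i=y\right)P(X_{n+1}=0)+P\left(\sum_{i=1}^nX_i>1,\ \sum_{i=1}^nX_i=y-1\right)P(X_{n+1}=1).$$
For all $y>0$,
$$P\left(\sum_{i=1}^nX_i>0,\ \sum_{i=1}^nX_i=y\right)=P\left(\sum_{i=1}^nX_i=y\right)=\binom{n}{y}p^y(1-p)^{n-y}.$$
If $y=1$ or $2$, then
$$P\left(\sum_{i=1}^nX_i>1,\ \sum_{i=1}^nX_i=y-1\right)=0.$$
And if $y>2$, then
$$P\left(\sum_{i=1}^nX_i>1,\ \sum_{i=1}^nX_i=y-1\right)=P\left(\sum_{i=1}^nX_i=y-1\right)=\binom{n}{y-1}p^{y-1}(1-p)^{n-y+1}.$$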


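Therefore, the UMVUE is
$$\mathrm{E}\left(T\,\middle|\,\sum_{i=1}^{n+1}X_i=y\right)=\begin{cases}0&\text{if }y=0\\[6pt]\dfrac{\binom{n}{y}p^y(1-p)^{n-y}(1-p)}{\binom{n+1}{y}p^y(1-p)^{n-y+1}}=\dfrac{\binom{n}{y}}{\binom{n+1}{y}}=\dfrac{n+1-y}{n+1}&\text{if }y=1\text{ or }2\\[12pt]\dfrac{\left(\binom{n}{y}+\binom{n}{y-1}\right)p^y(1-p)^{n-y+1}}{\binom{n+1}{y}p^y(1-p)^{n-y+1}}=\dfrac{\binom{n}{y}+\binom{n}{y-1}}{\binom{n+1}{y}}=1&\text{if }y>2.\end{cases}$$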
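Given $\sum_{i=1}^{n+1}X_i=y$, every 0/1 configuration with total $y$ is equally likely. The UMVUE can therefore be verified by brute-force enumeration for small $n$. The sketch below uses hypothetical helper names and checks, exactly, both the conditional probabilities and the unbiasedness $\mathrm{E}_p\,\phi(Y)=h(p)$.

```python
from fractions import Fraction
from itertools import product

def umvue(y: int, n: int) -> Fraction:
    """E(T | X_1 + ... + X_{n+1} = y) from Exercise 7.57(b)."""
    if y == 0:
        return Fraction(0)
    if y in (1, 2):
        return Fraction(n + 1 - y, n + 1)
    return Fraction(1)

def t_indicator(xs) -> bool:
    """T = 1 exactly when X_1 + ... + X_n > X_{n+1}."""
    return sum(xs[:-1]) > xs[-1]

def conditional_matches(n: int) -> bool:
    """Compare the fraction of configurations with T = 1 to umvue(y, n) for each y."""
    for y in range(n + 2):
        configs = [xs for xs in product((0, 1), repeat=n + 1) if sum(xs) == y]
        hits = sum(t_indicator(xs) for xs in configs)
        if Fraction(hits, len(configs)) != umvue(y, n):
            return False
    return True

def unbiased(n: int, p: Fraction) -> bool:
    """Check E_p[umvue(Y)] == h(p) = P_p(T = 1) exactly."""
    h = e_phi = Fraction(0)
    for xs in product((0, 1), repeat=n + 1):
        prob = Fraction(1)
        for x in xs:
            prob *= p if x else 1 - p
        h += prob * t_indicator(xs)
        e_phi += prob * umvue(sum(xs), n)
    return h == e_phi

print(all(conditional_matches(n) for n in range(1, 7)))
```

Enumeration grows as $2^{n+1}$, so this is only a check for small $n$, not a practical way to compute the estimator.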

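7.59 We know $T=(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$. Then
$$\mathrm{E}\,T^{p/2}=\frac{1}{\Gamma\left(\frac{n-1}{2}\right)2^{\frac{n-1}{2}}}\int_0^\infty t^{\frac{p+n-1}{2}-1}e^{-t/2}\,dt=\frac{2^{p/2}\,\Gamma\left(\frac{p+n-1}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}=C_{p,n}.$$
Thus
$$\mathrm{E}\left(\frac{(n-1)S^2}{\sigma^2}\right)^{p/2}=C_{p,n},$$
so $(n-1)^{p/2}S^p/C_{p,n}$ is an unbiased estimator of $\sigma^p$. From Theorem 6.2.25, $(\bar X,S^2)$ is a complete sufficient statistic. The unbiased estimator $(n-1)^{p/2}S^p/C_{p,n}$ is a function of $(\bar X,S^2)$. Hence, it is the best unbiased estimator.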
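Here is a small sketch of the constant $C_{p,n}$ and the resulting estimator of $\sigma^p$, using only the Python standard library. The function names are illustrative. As a check, $p=2$ must give $C_{2,n}=\mathrm{E}\,\chi^2_{n-1}=n-1$, and $p=4$ must give $\mathrm{E}(\chi^2_{n-1})^2=(n-1)(n+1)$.

```python
import math

def c_pn(p: float, n: int) -> float:
    """C_{p,n} = E[T^{p/2}] for T ~ chi-squared with n - 1 degrees of freedom."""
    return 2 ** (p / 2) * math.gamma((p + n - 1) / 2) / math.gamma((n - 1) / 2)

def best_unbiased_sigma_p(s: float, p: float, n: int) -> float:
    """(n - 1)^{p/2} S^p / C_{p,n}, unbiased for sigma^p under normal sampling."""
    return (n - 1) ** (p / 2) * s**p / c_pn(p, n)

n = 10
print(c_pn(2, n), n - 1)                 # should agree up to floating-point rounding
print(best_unbiased_sigma_p(1.5, 1, n))  # estimate of sigma when S = 1.5
```

For $p=2$ the estimator reduces to $S^2$ itself, as expected.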

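7.61 The pdf for $Y\sim\chi^2_\nu$ is
$$f(y)=\frac{1}{\Gamma(\nu/2)2^{\nu/2}}y^{\nu/2-1}e^{-y/2}.$$
Thus the pdf for $S^2=\sigma^2Y/\nu$ is
$$g(s^2)=\frac{\nu}{\sigma^2}\,\frac{1}{\Gamma(\nu/2)2^{\nu/2}}\left(\frac{s^2\nu}{\sigma^2}\right)^{\nu/2-1}e^{-s^2\nu/(2\sigma^2)}.$$
Thus, the log-likelihood has the form (gathering together constants that do not depend on $s^2$ or $\sigma^2$)
$$\log L(\sigma^2|s^2)=\log\left(\frac{1}{\sigma^2}\right)+K\log\left(\frac{s^2}{\sigma^2}\right)-K'\frac{s^2}{\sigma^2}+K'',$$
where $K>0$ and $K'>0$.

The loss function in Example 7.3.27 is
$$L(\sigma^2,a)=\frac{a}{\sigma^2}-\log\left(\frac{a}{\sigma^2}\right)-1,$$
so the loss of an estimator is the negative of its likelihood.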

7.63 Let $a=\tau^2/(\tau^2+1)$, so the Bayes estimator is $\delta^\pi(x)=ax$. Then $R(\mu,\delta^\pi)=(a-1)^2\mu^2+a^2$. As $\tau^2$ increases, $R(\mu,\delta^\pi)$ becomes flatter.

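7.65 a. Figure omitted.

b. The posterior expected loss is $\mathrm{E}(L(\theta,a)|x)=e^{ca}\,\mathrm{E}\,e^{-c\theta}-c\,\mathrm{E}(a-\theta)-1$, where the expectation is with respect to $\pi(\theta|x)$. Then
$$\frac{d}{da}\mathrm{E}(L(\theta,a)|x)=ce^{ca}\,\mathrm{E}\,e^{-c\theta}-c\stackrel{\text{set}}{=}0,$$
and $a=-\frac{1}{c}\log\mathrm{E}\,e^{-c\theta}$ is the solution. The second derivative is positive, so this is the minimum.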


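c. $\pi(\theta|x)=\mathrm{n}(\bar x,\sigma^2/n)$. So, substituting into the formula for a normal mgf, we find $\mathrm{E}\,e^{-c\theta}=e^{-c\bar x+\sigma^2c^2/2n}$, and the LINEX posterior loss is
$$\mathrm{E}(L(\theta,a)|x)=e^{c(a-\bar x)+\sigma^2c^2/2n}-c(a-\bar x)-1.$$
Substitute $\mathrm{E}\,e^{-c\theta}=e^{-c\bar x+\sigma^2c^2/2n}$ into the formula in part (b) to find that the Bayes rule is $\bar x-c\sigma^2/2n$.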

d. For an estimator $\bar X+b$, the LINEX posterior loss (from part (c)) is
$$\mathrm{E}(L(\theta,\bar x+b)|x)=e^{cb}e^{c^2\sigma^2/2n}-cb-1.$$
For $\bar X$, the expected loss is $e^{c^2\sigma^2/2n}-1$, and for the Bayes estimator ($b=-c\sigma^2/2n$) the expected loss is $c^2\sigma^2/2n$. The marginal distribution of $\bar X$ is $m(\bar x)=1$, so the Bayes risk is infinite for any estimator of the form $\bar X+b$.

e. For $\bar X+b$, the squared error risk is $\mathrm{E}\left[(\bar X+b)-\theta\right]^2=\sigma^2/n+b^2$, so $\bar X$ is better than the Bayes estimator. The Bayes risk is infinite for both estimators.

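7.66 Let $S=\sum_iX_i\sim\mathrm{binomial}(n,\theta)$.

a. $\mathrm{E}\,\hat\theta^2=\mathrm{E}\frac{S^2}{n^2}=\frac{1}{n^2}\mathrm{E}\,S^2=\frac{1}{n^2}\left(n\theta(1-\theta)+(n\theta)^2\right)=\frac{\theta}{n}+\frac{n-1}{n}\theta^2$.

b. $T_n^{(i)}=\left(\sum_{j\ne i}X_j\right)^2\big/(n-1)^2$. For $S$ values of $i$, $T_n^{(i)}=(S-1)^2/(n-1)^2$ because the $X_i$ that is dropped out equals 1. For the other $n-S$ values of $i$, $T_n^{(i)}=S^2/(n-1)^2$ because the $X_i$ that is dropped out equals 0. Thus we can write the estimator as
$$\mathrm{JK}(T_n)=n\frac{S^2}{n^2}-\frac{n-1}{n}\left(S\frac{(S-1)^2}{(n-1)^2}+(n-S)\frac{S^2}{(n-1)^2}\right)=\frac{S^2-S}{n(n-1)}.$$

c. $\mathrm{E}\,\mathrm{JK}(T_n)=\frac{1}{n(n-1)}\left(n\theta(1-\theta)+(n\theta)^2-n\theta\right)=\frac{n^2\theta^2-n\theta^2}{n(n-1)}=\theta^2$.

d. For this binomial model, $S$ is a complete sufficient statistic. Because $\mathrm{JK}(T_n)$ is a function of $S$ that is an unbiased estimator of $\theta^2$, it is the best unbiased estimator of $\theta^2$.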
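The jackknife algebra in part (b) and the unbiasedness in part (c) can be confirmed directly. This sketch uses hypothetical function names. It computes $\mathrm{JK}(T_n)=nT_n-\frac{n-1}{n}\sum_iT_n^{(i)}$ from a 0/1 sample and compares it with $(S^2-S)/(n(n-1))$. It then averages the closed form over the binomial distribution in exact arithmetic.

```python
from fractions import Fraction
from math import comb

def jackknife(xs):
    """JK(T_n) = n T_n - (n-1)/n * sum_i T_n^(i), with T_n = (sum(xs)/n)^2."""
    n, s = len(xs), sum(xs)
    t_n = Fraction(s, n) ** 2
    leave_one_out = [Fraction(s - x, n - 1) ** 2 for x in xs]
    return n * t_n - Fraction(n - 1, n) * sum(leave_one_out)

def closed_form(s, n):
    """(S^2 - S) / (n(n - 1)) from part (b)."""
    return Fraction(s * s - s, n * (n - 1))

def expected_closed_form(n, theta):
    """Exact E JK(T_n) when S ~ binomial(n, theta)."""
    return sum(closed_form(s, n) * comb(n, s) * theta**s * (1 - theta) ** (n - s)
               for s in range(n + 1))

sample = [1, 0, 1, 1, 0]
print(jackknife(sample), closed_form(sum(sample), len(sample)))  # 3/10 3/10
theta = Fraction(2, 7)
print(expected_closed_form(8, theta) == theta**2)                 # True
```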

Chapter 8

Hypothesis Testing

8.1 Let $X$ be the number of heads out of 1000. If the coin is fair, then $X\sim\mathrm{binomial}(1000,1/2)$. So
$$P(X\ge560)=\sum_{x=560}^{1000}\binom{1000}{x}\left(\frac12\right)^x\left(\frac12\right)^{1000-x}\approx.0000825,$$
where a computer was used to do the calculation. For this binomial, $\mathrm{E}\,X=1000p=500$ and $\mathrm{Var}\,X=1000p(1-p)=250$. A normal approximation is also very good for this calculation:
$$P\{X\ge560\}=P\left\{\frac{X-500}{\sqrt{250}}\ge\frac{559.5-500}{\sqrt{250}}\right\}\approx P\{Z\ge3.763\}\approx.0000839.$$
Thus, if the coin is fair, the probability of observing 560 or more heads out of 1000 is very small. We might tend to believe that the coin is not fair, and $p>1/2$.
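The two tail probabilities quoted above can be reproduced with the Python standard library alone. This is a sketch; the original calculation was done with other software. Integer arithmetic keeps the exact binomial sum free of underflow, and `statistics.NormalDist` supplies $\Phi$.

```python
import math
from statistics import NormalDist

n, k = 1000, 560
# Exact P(X >= 560) for X ~ binomial(1000, 1/2): integer sum, then one division.
exact = sum(math.comb(n, x) for x in range(k, n + 1)) / 2**n
# Continuity-corrected normal approximation: mean 500, variance 250.
z = (k - 0.5 - 500) / math.sqrt(250)
approx = NormalDist().cdf(-z)  # P(Z >= z)

print(f"exact = {exact:.7f}")
print(f"z = {z:.3f}, normal approximation = {approx:.7f}")
```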

8.2 Let $X\sim\mathrm{Poisson}(\lambda)$, and suppose we observed $X=10$. To assess whether the accident rate has dropped, we could calculate
$$P(X\le10|\lambda=15)=\sum_{i=0}^{10}\frac{e^{-15}15^i}{i!}=e^{-15}\left(1+15+\frac{15^2}{2!}+\cdots+\frac{15^{10}}{10!}\right)\approx.11846.$$
This is a fairly large value, not overwhelming evidence that the accident rate has dropped. (A normal approximation with continuity correction gives a value of .12264.)
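Similarly, for the Poisson tail, a short standard-library sketch computes the exact sum and the continuity-corrected normal approximation.

```python
import math
from statistics import NormalDist

lam, x_obs = 15, 10
# Exact P(X <= 10) for X ~ Poisson(15).
p_exact = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(x_obs + 1))
# Normal approximation with continuity correction: mean = variance = 15.
p_normal = NormalDist().cdf((x_obs + 0.5 - lam) / math.sqrt(lam))
print(round(p_exact, 5), round(p_normal, 5))
```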

8.3 The LRT statistic is
$$\lambda(y)=\frac{\sup_{\theta\le\theta_0}L(\theta|y_1,\ldots,y_m)}{\sup_\Theta L(\theta|y_1,\ldots,y_m)}.$$
Let $y=\sum_{i=1}^my_i$, and note that the MLE in the numerator is $\min\{y/m,\theta_0\}$ (see Exercise 7.12), while the denominator has $y/m$ as the MLE (see Example 7.2.7). Thus
$$\lambda(y)=\begin{cases}1&\text{if }y/m\le\theta_0\\[4pt]\dfrac{\theta_0^y(1-\theta_0)^{m-y}}{(y/m)^y(1-y/m)^{m-y}}&\text{if }y/m>\theta_0,\end{cases}$$
and we reject $H_0$ if
$$\frac{\theta_0^y(1-\theta_0)^{m-y}}{(y/m)^y(1-y/m)^{m-y}}<c.$$
To show that this is equivalent to rejecting if $y>b$, we could show that $\lambda(y)$ is decreasing in $y$, so that $\lambda(y)<c$ occurs for $y>b>m\theta_0$. It is easier to work with $\log\lambda(y)$, and we have
$$\log\lambda(y)=y\log\theta_0+(m-y)\log(1-\theta_0)-y\log\frac{y}{m}-(m-y)\log\frac{m-y}{m},$$


and
$$\frac{d}{dy}\log\lambda(y)=\log\theta_0-\log(1-\theta_0)-\log\frac{y}{m}-y\frac{1}{y}+\log\frac{m-y}{m}+(m-y)\frac{1}{m-y}=\log\left(\frac{\theta_0}{y/m}\cdot\frac{(m-y)/m}{1-\theta_0}\right).$$
For $y/m>\theta_0$, $1-y/m=(m-y)/m<1-\theta_0$, so each fraction above is less than 1, and the log is less than 0. Thus $\frac{d}{dy}\log\lambda<0$, which shows that $\lambda$ is decreasing in $y$ and $\lambda(y)<c$ if and only if $y>b$.

8.4 For discrete random variables, $L(\theta|x)=f(x|\theta)=P(X=x|\theta)$. So the numerator and denominator of $\lambda(x)$ are the supremum of this probability over the indicated sets.

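8.5 a. The log-likelihood is
$$\log L(\theta,\nu|\mathbf x)=n\log\theta+n\theta\log\nu-(\theta+1)\log\left(\prod_ix_i\right),\qquad\nu\le x_{(1)},$$
where $x_{(1)}=\min_ix_i$. For any value of $\theta$, this is an increasing function of $\nu$ for $\nu\le x_{(1)}$. So both the restricted and unrestricted MLEs of $\nu$ are $\hat\nu=x_{(1)}$. To find the MLE of $\theta$, set
$$\frac{\partial}{\partial\theta}\log L(\theta,x_{(1)}|\mathbf x)=\frac{n}{\theta}+n\log x_{(1)}-\log\left(\prod_ix_i\right)=0,$$
and solve for $\theta$, yielding
$$\hat\theta=\frac{n}{\log\left(\prod_ix_i/x_{(1)}^n\right)}=\frac{n}{T},$$
where $T=\log\left(\prod_ix_i/x_{(1)}^n\right)$. Also, $(\partial^2/\partial\theta^2)\log L(\theta,x_{(1)}|\mathbf x)=-n/\theta^2<0$ for all $\theta$. So $\hat\theta$ is a maximum.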

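b. Under $H_0$, the MLE of $\theta$ is $\hat\theta_0=1$, and the MLE of $\nu$ is still $\hat\nu=x_{(1)}$. So the likelihood ratio statistic is
$$\lambda(\mathbf x)=\frac{x_{(1)}^n\big/\left(\prod_ix_i\right)^2}{(n/T)^n\,x_{(1)}^{n^2/T}\big/\left(\prod_ix_i\right)^{n/T+1}}=\left(\frac{T}{n}\right)^n\frac{e^{-T}}{\left(e^{-T}\right)^{n/T}}=\left(\frac{T}{n}\right)^ne^{-T+n}.$$
$(\partial/\partial T)\log\lambda(\mathbf x)=(n/T)-1$. Hence, $\lambda(\mathbf x)$ is increasing if $T\le n$ and decreasing if $T\ge n$. Thus, $\lambda(\mathbf x)\le c$ is equivalent to $T\le c_1$ or $T\ge c_2$, for appropriately chosen constants $c_1$ and $c_2$.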

c. We will not use the hint, although the problem can be solved that way. Instead, make the following three transformations. First, let $Y_i=\log X_i$, $i=1,\ldots,n$. Next, make the $n$-to-1 transformation that sets $Z_1=\min_iY_i$ and sets $Z_2,\ldots,Z_n$ equal to the remaining $Y_i$s, with their order unchanged. Finally, let $W_1=Z_1$ and $W_i=Z_i-Z_1$, $i=2,\ldots,n$. Then, under $H_0$, you find that the $W_i$s are independent with $W_1\sim f_{W_1}(w)=n\nu^ne^{-nw}$, $w>\log\nu$, and $W_i\sim\mathrm{exponential}(1)$, $i=2,\ldots,n$. Now $T=\sum_{i=2}^nW_i\sim\mathrm{gamma}(n-1,1)$, and, hence,
$$2T\sim\mathrm{gamma}(n-1,2)=\chi^2_{2(n-1)}.$$

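8.6 a.
$$\lambda(\mathbf x,\mathbf y)=\frac{\sup_{\Theta_0}L(\theta|\mathbf x,\mathbf y)}{\sup_\Theta L(\theta|\mathbf x,\mathbf y)}=\frac{\sup_\theta\prod_{i=1}^n\frac{1}{\theta}e^{-x_i/\theta}\prod_{j=1}^m\frac{1}{\theta}e^{-y_j/\theta}}{\sup_{\theta,\mu}\prod_{i=1}^n\frac{1}{\theta}e^{-x_i/\theta}\prod_{j=1}^m\frac{1}{\mu}e^{-y_j/\mu}}=\frac{\sup_\theta\frac{1}{\theta^{m+n}}\exp\left\{-\left(\sum_{i=1}^nx_i+\sum_{j=1}^my_j\right)\big/\theta\right\}}{\sup_{\theta,\mu}\frac{1}{\theta^n}\exp\left\{-\sum_{i=1}^nx_i/\theta\right\}\frac{1}{\mu^m}\exp\left\{-\sum_{j=1}^my_j/\mu\right\}}.$$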


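Differentiation will show that in the numerator $\hat\theta_0=\left(\sum_ix_i+\sum_jy_j\right)/(n+m)$, while in the denominator $\hat\theta=\bar x$ and $\hat\mu=\bar y$. Therefore,
$$\lambda(\mathbf x,\mathbf y)=\frac{\left(\frac{n+m}{\sum_ix_i+\sum_jy_j}\right)^{n+m}\exp\left\{-\frac{n+m}{\sum_ix_i+\sum_jy_j}\left(\sum_ix_i+\sum_jy_j\right)\right\}}{\left(\frac{n}{\sum_ix_i}\right)^n\exp\left\{-\frac{n}{\sum_ix_i}\sum_ix_i\right\}\left(\frac{m}{\sum_jy_j}\right)^m\exp\left\{-\frac{m}{\sum_jy_j}\sum_jy_j\right\}}=\frac{(n+m)^{n+m}}{n^nm^m}\,\frac{\left(\sum_ix_i\right)^n\left(\sum_jy_j\right)^m}{\left(\sum_ix_i+\sum_jy_j\right)^{n+m}}.$$
And the LRT is to reject $H_0$ if $\lambda(\mathbf x,\mathbf y)\le c$.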

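b.
$$\lambda=\frac{(n+m)^{n+m}}{n^nm^m}\left(\frac{\sum_ix_i}{\sum_ix_i+\sum_jy_j}\right)^n\left(\frac{\sum_jy_j}{\sum_ix_i+\sum_jy_j}\right)^m=\frac{(n+m)^{n+m}}{n^nm^m}T^n(1-T)^m.$$
Therefore $\lambda$ is a function of $T$. $\lambda$ is a unimodal function of $T$, which is maximized when $T=\frac{n}{m+n}$. Rejection for $\lambda\le c$ is equivalent to rejection for $T\le a$ or $T\ge b$, where $a$ and $b$ are constants that satisfy $a^n(1-a)^m=b^n(1-b)^m$.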

c. When $H_0$ is true, $\sum_iX_i\sim\mathrm{gamma}(n,\theta)$ and $\sum_jY_j\sim\mathrm{gamma}(m,\theta)$, and they are independent. So by an extension of Exercise 4.19b, $T\sim\mathrm{beta}(n,m)$.

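8.7 a.
$$L(\theta,\lambda|\mathbf x)=\prod_{i=1}^n\frac{1}{\lambda}e^{-(x_i-\theta)/\lambda}I_{[\theta,\infty)}(x_i)=\left(\frac{1}{\lambda}\right)^ne^{-(\sum_ix_i-n\theta)/\lambda}I_{[\theta,\infty)}(x_{(1)}),$$
which is increasing in $\theta$ if $x_{(1)}\ge\theta$ (regardless of $\lambda$). So the MLE of $\theta$ is $\hat\theta=x_{(1)}$. Then
$$\frac{\partial\log L}{\partial\lambda}=-\frac{n}{\lambda}+\frac{\sum_ix_i-n\hat\theta}{\lambda^2}\stackrel{\text{set}}{=}0\ \Rightarrow\ n\hat\lambda=\sum_ix_i-n\hat\theta\ \Rightarrow\ \hat\lambda=\bar x-x_{(1)}.$$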

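Because
$$\frac{\partial^2\log L}{\partial\lambda^2}\bigg|_{\lambda=\bar x-x_{(1)}}=\left[\frac{n}{\lambda^2}-2\frac{\sum_ix_i-n\hat\theta}{\lambda^3}\right]_{\lambda=\bar x-x_{(1)}}=\frac{n}{(\bar x-x_{(1)})^2}-\frac{2n(\bar x-x_{(1)})}{(\bar x-x_{(1)})^3}=-\frac{n}{(\bar x-x_{(1)})^2}<0,$$
we have $\hat\theta=x_{(1)}$ and $\hat\lambda=\bar x-x_{(1)}$ as the unrestricted MLEs of $\theta$ and $\lambda$. Under the restriction $\theta\le0$, the MLE of $\theta$ (regardless of $\lambda$) is
$$\hat\theta_0=\begin{cases}0&\text{if }x_{(1)}>0\\x_{(1)}&\text{if }x_{(1)}\le0.\end{cases}$$
For $x_{(1)}>0$, substituting $\hat\theta_0=0$ and maximizing with respect to $\lambda$, as above, yields $\hat\lambda_0=\bar x$. Therefore,
$$\lambda(\mathbf x)=\frac{\sup_{\Theta_0}L(\theta,\lambda|\mathbf x)}{\sup_\Theta L(\theta,\lambda|\mathbf x)}=\frac{\sup_{\{(\theta,\lambda):\theta\le0\}}L(\theta,\lambda|\mathbf x)}{L(\hat\theta,\hat\lambda|\mathbf x)}=\begin{cases}1&\text{if }x_{(1)}\le0\\[4pt]\dfrac{L(0,\bar x|\mathbf x)}{L(\hat\theta,\hat\lambda|\mathbf x)}&\text{if }x_{(1)}>0,\end{cases}$$
where
$$\frac{L(0,\bar x|\mathbf x)}{L(\hat\theta,\hat\lambda|\mathbf x)}=\frac{(1/\bar x)^ne^{-n\bar x/\bar x}}{(1/\hat\lambda)^ne^{-n(\bar x-x_{(1)})/(\bar x-x_{(1)})}}=\left(\frac{\hat\lambda}{\bar x}\right)^n=\left(\frac{\bar x-x_{(1)}}{\bar x}\right)^n=\left(1-\frac{x_{(1)}}{\bar x}\right)^n.$$
So rejecting if $\lambda(\mathbf x)\le c$ is equivalent to rejecting if $x_{(1)}/\bar x\ge c^*$, where $c^*$ is some constant.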


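b. The LRT statistic is
$$\lambda(\mathbf x)=\frac{\sup_\beta(1/\beta^n)e^{-\sum_ix_i/\beta}}{\sup_{\beta,\gamma}(\gamma^n/\beta^n)\left(\prod_ix_i\right)^{\gamma-1}e^{-\sum_ix_i^\gamma/\beta}}.$$
The numerator is maximized at $\hat\beta_0=\bar x$. For fixed $\gamma$, the denominator is maximized at $\hat\beta_\gamma=\sum_ix_i^\gamma/n$. Thus
$$\lambda(\mathbf x)=\frac{\bar x^{-n}e^{-n}}{\sup_\gamma(\gamma^n/\hat\beta_\gamma^n)\left(\prod_ix_i\right)^{\gamma-1}e^{-\sum_ix_i^\gamma/\hat\beta_\gamma}}=\frac{\bar x^{-n}}{\sup_\gamma(\gamma^n/\hat\beta_\gamma^n)\left(\prod_ix_i\right)^{\gamma-1}}.$$
The denominator cannot be maximized in closed form. Numeric maximization could be used to compute the statistic for observed data $\mathbf x$.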

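8.8 a. We will first find the MLEs of $a$ and $\theta$. We have
$$L(a,\theta|\mathbf x)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi a\theta}}e^{-(x_i-\theta)^2/(2a\theta)},$$
$$\log L(a,\theta|\mathbf x)=\sum_{i=1}^n\left[-\frac12\log(2\pi a\theta)-\frac{1}{2a\theta}(x_i-\theta)^2\right].$$
Thus
$$\frac{\partial\log L}{\partial a}=\sum_{i=1}^n\left[-\frac{1}{2a}+\frac{1}{2\theta a^2}(x_i-\theta)^2\right]=-\frac{n}{2a}+\frac{1}{2\theta a^2}\sum_{i=1}^n(x_i-\theta)^2\stackrel{\text{set}}{=}0,$$
$$\frac{\partial\log L}{\partial\theta}=\sum_{i=1}^n\left[-\frac{1}{2\theta}+\frac{1}{2a\theta^2}(x_i-\theta)^2+\frac{1}{a\theta}(x_i-\theta)\right]=-\frac{n}{2\theta}+\frac{1}{2a\theta^2}\sum_{i=1}^n(x_i-\theta)^2+\frac{n\bar x-n\theta}{a\theta}\stackrel{\text{set}}{=}0.$$
We have to solve these two equations simultaneously to get MLEs of $a$ and $\theta$, say $\hat a$ and $\hat\theta$. Solve the first equation for $a$ in terms of $\theta$ to get
$$a=\frac{1}{n\theta}\sum_{i=1}^n(x_i-\theta)^2.$$
Substitute this into the second equation to get
$$-\frac{n}{2\theta}+\frac{n}{2\theta}+\frac{n(\bar x-\theta)}{a\theta}=0.$$
So we get $\hat\theta=\bar x$, and
$$\hat a=\frac{1}{n\bar x}\sum_{i=1}^n(x_i-\bar x)^2=\frac{\hat\sigma^2}{\bar x},$$
the ratio of the usual MLEs of the variance and the mean. (Verification that this is a maximum is lengthy. We omit it.) For $a=1$, we just solve the second equation, which gives a quadratic in $\theta$ that leads to the restricted MLE
$$\hat\theta_R=\frac{-1+\sqrt{1+4(\hat\sigma^2+\bar x^2)}}{2}.$$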


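Noting that $\hat a\hat\theta=\hat\sigma^2$, we obtain
$$\lambda(\mathbf x)=\frac{L(\hat\theta_R|\mathbf x)}{L(\hat a,\hat\theta|\mathbf x)}=\frac{\prod_{i=1}^n\frac{1}{\sqrt{2\pi\hat\theta_R}}e^{-(x_i-\hat\theta_R)^2/(2\hat\theta_R)}}{\prod_{i=1}^n\frac{1}{\sqrt{2\pi\hat a\hat\theta}}e^{-(x_i-\hat\theta)^2/(2\hat a\hat\theta)}}=\frac{\left(1/(2\pi\hat\theta_R)\right)^{n/2}e^{-\sum_i(x_i-\hat\theta_R)^2/(2\hat\theta_R)}}{\left(1/(2\pi\hat\sigma^2)\right)^{n/2}e^{-\sum_i(x_i-\bar x)^2/(2\hat\sigma^2)}}=\left(\hat\sigma^2/\hat\theta_R\right)^{n/2}e^{(n/2)-\sum_i(x_i-\hat\theta_R)^2/(2\hat\theta_R)}.$$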

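b. In this case we have
$$\log L(a,\theta|\mathbf x)=\sum_{i=1}^n\left[-\frac12\log(2\pi a\theta^2)-\frac{1}{2a\theta^2}(x_i-\theta)^2\right].$$
Thus
$$\frac{\partial\log L}{\partial a}=\sum_{i=1}^n\left[-\frac{1}{2a}+\frac{1}{2a^2\theta^2}(x_i-\theta)^2\right]=-\frac{n}{2a}+\frac{1}{2a^2\theta^2}\sum_{i=1}^n(x_i-\theta)^2\stackrel{\text{set}}{=}0,$$
$$\frac{\partial\log L}{\partial\theta}=\sum_{i=1}^n\left[-\frac{1}{\theta}+\frac{1}{a\theta^3}(x_i-\theta)^2+\frac{1}{a\theta^2}(x_i-\theta)\right]=-\frac{n}{\theta}+\frac{1}{a\theta^3}\sum_{i=1}^n(x_i-\theta)^2+\frac{1}{a\theta^2}\sum_{i=1}^n(x_i-\theta)\stackrel{\text{set}}{=}0.$$
Solving the first equation for $a$ in terms of $\theta$ yields
$$a=\frac{1}{n\theta^2}\sum_{i=1}^n(x_i-\theta)^2.$$
Substituting this into the second equation, we get
$$-\frac{n}{\theta}+\frac{n}{\theta}+\frac{n\sum_i(x_i-\theta)}{\sum_i(x_i-\theta)^2}=0.$$
So again, $\hat\theta=\bar x$ and
$$\hat a=\frac{1}{n\bar x^2}\sum_{i=1}^n(x_i-\bar x)^2=\frac{\hat\sigma^2}{\bar x^2}$$
in the unrestricted case. In the restricted case, set $a=1$ in the second equation to obtain
$$\frac{\partial\log L}{\partial\theta}=-\frac{n}{\theta}+\frac{1}{\theta^3}\sum_{i=1}^n(x_i-\theta)^2+\frac{1}{\theta^2}\sum_{i=1}^n(x_i-\theta)\stackrel{\text{set}}{=}0.$$
Multiply through by $\theta^3/n$ to get
$$-\theta^2+\frac1n\sum_{i=1}^n(x_i-\theta)^2+\frac{\theta}{n}\sum_{i=1}^n(x_i-\theta)=0.$$
Add $\pm\bar x$ inside the square and complete all sums to get the equation
$$-\theta^2+\hat\sigma^2+(\bar x-\theta)^2+\theta(\bar x-\theta)=0.$$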


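This is a quadratic in $\theta$ with solution for the MLE
$$\hat\theta_R=\frac{-\bar x+\sqrt{\bar x^2+4(\hat\sigma^2+\bar x^2)}}{2},$$
which yields the LRT statistic
$$\lambda(\mathbf x)=\frac{L(\hat\theta_R|\mathbf x)}{L(\hat a,\hat\theta|\mathbf x)}=\frac{\prod_{i=1}^n\frac{1}{\sqrt{2\pi\hat\theta_R^2}}e^{-(x_i-\hat\theta_R)^2/(2\hat\theta_R^2)}}{\prod_{i=1}^n\frac{1}{\sqrt{2\pi\hat a\hat\theta^2}}e^{-(x_i-\hat\theta)^2/(2\hat a\hat\theta^2)}}=\left(\frac{\hat\sigma}{\hat\theta_R}\right)^ne^{(n/2)-\sum_i(x_i-\hat\theta_R)^2/(2\hat\theta_R^2)}.$$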

8.9 a. The MLE of $\lambda$ under $H_0$ is $\hat\lambda_0=\bar Y^{-1}$, and the MLE of $\lambda_i$ under $H_1$ is $\hat\lambda_i=Y_i^{-1}$. The LRT statistic is bounded above by 1 and is given by
$$1\ge\frac{\bar Y^{-n}e^{-n}}{\left(\prod_iY_i\right)^{-1}e^{-n}}.$$
Rearrangement of this inequality yields $\bar Y\ge\left(\prod_iY_i\right)^{1/n}$, the arithmetic-geometric mean inequality.

b. The pdf of $X_i$ is $f(x_i|\lambda_i)=(\lambda_i/x_i^2)e^{-\lambda_i/x_i}$, $x_i>0$. The MLE of $\lambda$ under $H_0$ is $\hat\lambda_0=n/\left[\sum_i(1/X_i)\right]$, and the MLE of $\lambda_i$ under $H_1$ is $\hat\lambda_i=X_i$. Now, the argument proceeds as in part (a).

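8.10 Let $Y=\sum_iX_i$. The posterior distribution of $\lambda|y$ is $\mathrm{gamma}\left(y+\alpha,\beta/(\beta+1)\right)$.

a.
$$P(\lambda\le\lambda_0|y)=\frac{(\beta+1)^{y+\alpha}}{\Gamma(y+\alpha)\beta^{y+\alpha}}\int_0^{\lambda_0}t^{y+\alpha-1}e^{-t(\beta+1)/\beta}\,dt,$$
$$P(\lambda>\lambda_0|y)=1-P(\lambda\le\lambda_0|y).$$

b. Because $\beta/(\beta+1)$ is a scale parameter in the posterior distribution, $\left(2(\beta+1)\lambda/\beta\right)|y$ has a $\mathrm{gamma}(y+\alpha,2)$ distribution. If $2\alpha$ is an integer, this is a $\chi^2_{2y+2\alpha}$ distribution. So, for $\alpha=5/2$ and $\beta=2$,
$$P(\lambda\le\lambda_0|y)=P\left(\frac{2(\beta+1)\lambda}{\beta}\le\frac{2(\beta+1)\lambda_0}{\beta}\,\middle|\,y\right)=P\left(\chi^2_{2y+5}\le3\lambda_0\right).$$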

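8.11 a. From Exercise 7.23, the posterior distribution of $\sigma^2$ given $S^2$ is $\mathrm{IG}(\gamma,\delta)$, where $\gamma=\alpha+(n-1)/2$ and $\delta=\left[(n-1)S^2/2+1/\beta\right]^{-1}$. Let $Y=2/(\sigma^2\delta)$. Then $Y|S^2\sim\mathrm{gamma}(\gamma,2)$. (Note: If $2\alpha$ is an integer, this is a $\chi^2_{2\gamma}$ distribution.) Let $M$ denote the median of a $\mathrm{gamma}(\gamma,2)$ distribution. Note that $M$ depends only on $\alpha$ and $n$, not on $S^2$ or $\beta$. Then we have $P(Y\ge2/\delta|S^2)=P(\sigma^2\le1|S^2)>1/2$ if and only if
$$M>\frac{2}{\delta}=(n-1)S^2+\frac{2}{\beta},\quad\text{that is,}\quad S^2<\frac{M-2/\beta}{n-1}.$$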

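b. From Example 7.2.11, the unrestricted MLEs are $\hat\mu=\bar X$ and $\hat\sigma^2=(n-1)S^2/n$. Under $H_0$, $\hat\mu$ is still $\bar X$, because this was the maximizing value of $\mu$, regardless of $\sigma^2$. Then because $L(\bar x,\sigma^2|\mathbf x)$ is a unimodal function of $\sigma^2$, the restricted MLE of $\sigma^2$ is $\hat\sigma^2$ if $\hat\sigma^2\le1$, and is 1 if $\hat\sigma^2>1$. So the LRT statistic is
$$\lambda(\mathbf x)=\begin{cases}1&\text{if }\hat\sigma^2\le1\\(\hat\sigma^2)^{n/2}e^{-n(\hat\sigma^2-1)/2}&\text{if }\hat\sigma^2>1.\end{cases}$$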


We have that, for $\hat\sigma^2>1$,
$$\frac{\partial}{\partial(\hat\sigma^2)}\log\lambda(\mathbf x)=\frac n2\left(\frac{1}{\hat\sigma^2}-1\right)<0.$$
So $\lambda(\mathbf x)$ is decreasing in $\hat\sigma^2$, and rejecting $H_0$ for small values of $\lambda(\mathbf x)$ is equivalent to rejecting for large values of $\hat\sigma^2$, that is, large values of $S^2$. The LRT accepts $H_0$ if and only if $S^2<k$, where $k$ is a constant. We can pick the prior parameters so that the acceptance regions match in this way. First, pick $\alpha$ large enough that $M/(n-1)>k$. Then, as $\beta$ varies between 0 and $\infty$, $(M-2/\beta)/(n-1)$ varies between $-\infty$ and $M/(n-1)$. So, for some choice of $\beta$, $(M-2/\beta)/(n-1)=k$ and the acceptance regions match.

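8.12 a. For $H_0:\mu\le0$ vs. $H_1:\mu>0$, the LRT is to reject $H_0$ if $\bar x>c\sigma/\sqrt n$ (Example 8.3.3). For $\alpha=.05$ take $c=1.645$. The power function is
$$\beta(\mu)=P\left(\frac{\bar X-\mu}{\sigma/\sqrt n}>1.645-\frac{\mu}{\sigma/\sqrt n}\right)=P\left(Z>1.645-\frac{\sqrt n\mu}{\sigma}\right).$$
Note that the power will equal .5 when $\mu=1.645\sigma/\sqrt n$.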

b. For $H_0:\mu=0$ vs. $H_A:\mu\ne0$, the LRT is to reject $H_0$ if $|\bar x|>c\sigma/\sqrt n$ (Example 8.2.2). For $\alpha=.05$ take $c=1.96$. The power function is
$$\beta(\mu)=1-P\left(-1.96-\sqrt n\mu/\sigma\le Z\le1.96-\sqrt n\mu/\sigma\right).$$
In this case, $\mu=\pm1.96\sigma/\sqrt n$ gives power of approximately .5.

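8.13 a. The size of $\phi_1$ is $\alpha_1=P(X_1>.95|\theta=0)=.05$. The size of $\phi_2$ is $\alpha_2=P(X_1+X_2>C|\theta=0)$. If $1\le C\le2$, this is
$$\alpha_2=P(X_1+X_2>C|\theta=0)=\int_{C-1}^1\int_{C-x_1}^1 1\,dx_2\,dx_1=\frac{(2-C)^2}{2}.$$
Setting this equal to $\alpha$ and solving for $C$ gives $C=2-\sqrt{2\alpha}$, and for $\alpha=.05$, we get $C=2-\sqrt{.1}\approx1.68$.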

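b. For the first test we have the power function
$$\beta_1(\theta)=P_\theta(X_1>.95)=\begin{cases}0&\text{if }\theta\le-.05\\\theta+.05&\text{if }-.05<\theta\le.95\\1&\text{if }.95<\theta.\end{cases}$$
Using the distribution of $Y=X_1+X_2$, given by
$$f_Y(y|\theta)=\begin{cases}y-2\theta&\text{if }2\theta\le y<2\theta+1\\2\theta+2-y&\text{if }2\theta+1\le y<2\theta+2\\0&\text{otherwise,}\end{cases}$$
we obtain the power function for the second test as
$$\beta_2(\theta)=P_\theta(Y>C)=\begin{cases}0&\text{if }\theta\le(C/2)-1\\(2\theta+2-C)^2/2&\text{if }(C/2)-1<\theta\le(C-1)/2\\1-(C-2\theta)^2/2&\text{if }(C-1)/2<\theta\le C/2\\1&\text{if }C/2<\theta.\end{cases}$$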

c. From the graph it is clear that $\phi_1$ is more powerful for $\theta$ near 0, but $\phi_2$ is more powerful for larger $\theta$s. $\phi_2$ is not uniformly more powerful than $\phi_1$.


d. If either $X_1\ge1$ or $X_2\ge1$, we should reject $H_0$, because if $\theta=0$, $P(X_i<1)=1$. Thus, consider the rejection region given by
$$\{(x_1,x_2):x_1+x_2>C\}\cup\{(x_1,x_2):x_1>1\}\cup\{(x_1,x_2):x_2>1\}.$$
The first set is the rejection region for $\phi_2$. The test with this rejection region has the same size as $\phi_2$ because the last two sets both have probability 0 if $\theta=0$. But for $0<\theta<C-1$, the power function of this test is strictly larger than $\beta_2(\theta)$. If $C-1\le\theta$, this test and $\phi_2$ have the same power.

8.14 The CLT tells us that $Z=\left(\sum_iX_i-np\right)/\sqrt{np(1-p)}$ is approximately $\mathrm{n}(0,1)$. For a test that rejects $H_0$ when $\sum_iX_i>c$, we need to find $c$ and $n$ to satisfy
$$P\left(Z>\frac{c-n(.49)}{\sqrt{n(.49)(.51)}}\right)=.01\quad\text{and}\quad P\left(Z>\frac{c-n(.51)}{\sqrt{n(.51)(.49)}}\right)=.99.$$
We thus want
$$\frac{c-n(.49)}{\sqrt{n(.49)(.51)}}=2.33\quad\text{and}\quad\frac{c-n(.51)}{\sqrt{n(.51)(.49)}}=-2.33.$$
Solving these equations gives $n=13{,}567$ and $c=6{,}783.5$.
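Adding the two equations shows $c=n/2$, because the two standard deviations are equal. Subtracting them gives $(.51-.49)n=2(2.33)\sqrt{.49(.51)}\sqrt n$. The sketch below solves for $\sqrt n$ directly and rounds $n$ up to an integer. Rounding up is our reading of how $13{,}567$ was obtained.

```python
import math

z, p0, p1 = 2.33, 0.49, 0.51
sd = math.sqrt(p0 * p1)  # sqrt(.49 * .51), the same under both hypotheses
# Subtracting the equations: (p1 - p0) n = 2 z sd sqrt(n).
root_n = 2 * z * sd / (p1 - p0)
n_exact = root_n**2
n_int = math.ceil(n_exact)  # smallest integer sample size
c = n_int / 2               # adding the equations gives c = n/2
print(n_exact, n_int, c)
```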

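8.15 From the Neyman-Pearson Lemma, the UMP test rejects $H_0$ if
$$\frac{f(\mathbf x|\sigma_1)}{f(\mathbf x|\sigma_0)}=\frac{(2\pi\sigma_1^2)^{-n/2}e^{-\sum_ix_i^2/(2\sigma_1^2)}}{(2\pi\sigma_0^2)^{-n/2}e^{-\sum_ix_i^2/(2\sigma_0^2)}}=\left(\frac{\sigma_0}{\sigma_1}\right)^n\exp\left\{\frac12\sum_ix_i^2\left(\frac{1}{\sigma_0^2}-\frac{1}{\sigma_1^2}\right)\right\}>k$$
for some $k\ge0$. After some algebra, this is equivalent to rejecting if
$$\sum_ix_i^2>\frac{2\log\left(k(\sigma_1/\sigma_0)^n\right)}{\frac{1}{\sigma_0^2}-\frac{1}{\sigma_1^2}}=c\qquad\left(\text{because }\frac{1}{\sigma_0^2}-\frac{1}{\sigma_1^2}>0\right).$$
This is the UMP test of size $\alpha$, where $\alpha=P_{\sigma_0}\left(\sum_iX_i^2>c\right)$. To determine $c$ to obtain a specified $\alpha$, use the fact that $\sum_iX_i^2/\sigma_0^2\sim\chi^2_n$. Thus
$$\alpha=P_{\sigma_0}\left(\sum_iX_i^2/\sigma_0^2>c/\sigma_0^2\right)=P\left(\chi^2_n>c/\sigma_0^2\right),$$
so we must have $c/\sigma_0^2=\chi^2_{n,\alpha}$, which means $c=\sigma_0^2\chi^2_{n,\alpha}$.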

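8.16 a. Size $=P(\text{reject }H_0\mid H_0\text{ is true})=1\Rightarrow$ Type I error $=1$.

Power $=P(\text{reject }H_0\mid H_A\text{ is true})=1\Rightarrow$ Type II error $=0$.

b. Size $=P(\text{reject }H_0\mid H_0\text{ is true})=0\Rightarrow$ Type I error $=0$.

Power $=P(\text{reject }H_0\mid H_A\text{ is true})=0\Rightarrow$ Type II error $=1$.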

8.17 a. The likelihood function is
$$L(\mu,\theta|\mathbf x,\mathbf y)=\mu^n\left(\prod_ix_i\right)^{\mu-1}\theta^m\left(\prod_jy_j\right)^{\theta-1}.$$


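Maximizing, by differentiating the log-likelihood, yields the MLEs
$$\hat\mu=-\frac{n}{\sum_i\log x_i}\quad\text{and}\quad\hat\theta=-\frac{m}{\sum_j\log y_j}.$$
Under $H_0$, the likelihood is
$$L(\theta|\mathbf x,\mathbf y)=\theta^{n+m}\left(\prod_ix_i\prod_jy_j\right)^{\theta-1},$$
and maximizing as above yields the restricted MLE,
$$\hat\theta_0=-\frac{n+m}{\sum_i\log x_i+\sum_j\log y_j}.$$
The LRT statistic is
$$\lambda(\mathbf x,\mathbf y)=\frac{\hat\theta_0^{m+n}}{\hat\mu^n\hat\theta^m}\left(\prod_ix_i\right)^{\hat\theta_0-\hat\mu}\left(\prod_jy_j\right)^{\hat\theta_0-\hat\theta}.$$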

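b. Substituting in the formulas for $\hat\theta$, $\hat\mu$, and $\hat\theta_0$ yields $\left(\prod_ix_i\right)^{\hat\theta_0-\hat\mu}\left(\prod_jy_j\right)^{\hat\theta_0-\hat\theta}=1$ and
$$\lambda(\mathbf x,\mathbf y)=\frac{\hat\theta_0^{m+n}}{\hat\mu^n\hat\theta^m}=\frac{\hat\theta_0^n}{\hat\mu^n}\,\frac{\hat\theta_0^m}{\hat\theta^m}=\left(\frac{m+n}{m}\right)^m\left(\frac{m+n}{n}\right)^n(1-T)^mT^n,$$
where $T=\sum_i\log x_i\big/\left(\sum_i\log x_i+\sum_j\log y_j\right)$. This is a unimodal function of $T$. So rejecting if $\lambda(\mathbf x,\mathbf y)\le c$ is equivalent to rejecting if $T\le c_1$ or $T\ge c_2$, where $c_1$ and $c_2$ are appropriately chosen constants.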

c. Simple transformations yield $-\log X_i\sim\mathrm{exponential}(1/\mu)$ and $-\log Y_i\sim\mathrm{exponential}(1/\theta)$. Therefore, $T=W/(W+V)$, where $W$ and $V$ are independent, $W\sim\mathrm{gamma}(n,1/\mu)$, and $V\sim\mathrm{gamma}(m,1/\theta)$. Under $H_0$, the scale parameters of $W$ and $V$ are equal. Then, a simple generalization of Exercise 4.19b yields $T\sim\mathrm{beta}(n,m)$. The constants $c_1$ and $c_2$ are determined by the two equations
$$P(T\le c_1)+P(T\ge c_2)=\alpha\quad\text{and}\quad(1-c_1)^mc_1^n=(1-c_2)^mc_2^n.$$

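8.18 a.
$$\begin{aligned}\beta(\theta)&=P_\theta\left(\frac{|\bar X-\theta_0|}{\sigma/\sqrt n}>c\right)=1-P_\theta\left(\frac{|\bar X-\theta_0|}{\sigma/\sqrt n}\le c\right)\\&=1-P_\theta\left(-\frac{c\sigma}{\sqrt n}\le\bar X-\theta_0\le\frac{c\sigma}{\sqrt n}\right)\\&=1-P_\theta\left(\frac{-c\sigma/\sqrt n+\theta_0-\theta}{\sigma/\sqrt n}\le\frac{\bar X-\theta}{\sigma/\sqrt n}\le\frac{c\sigma/\sqrt n+\theta_0-\theta}{\sigma/\sqrt n}\right)\\&=1-P\left(-c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\le Z\le c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right)\\&=1+\Phi\left(-c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right)-\Phi\left(c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right),\end{aligned}$$
where $Z\sim\mathrm{n}(0,1)$ and $\Phi$ is the standard normal cdf.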


b. The size is $.05=\beta(\theta_0)=1+\Phi(-c)-\Phi(c)$, which implies $c=1.96$. The power ($1-$ type II error) is
$$.75\le\beta(\theta_0+\sigma)=1+\Phi(-c-\sqrt n)-\Phi(c-\sqrt n)=1+\underbrace{\Phi(-1.96-\sqrt n)}_{\approx0}-\Phi(1.96-\sqrt n).$$
$\Phi(-.675)\approx.25$ implies $1.96-\sqrt n=-.675$, which implies $n=6.943\approx7$.

8.19 The pdf of $Y$ is
$$f(y|\theta)=\frac1\theta y^{(1/\theta)-1}e^{-y^{1/\theta}},\qquad y>0.$$
By the Neyman-Pearson Lemma, the UMP test will reject if
$$\frac12y^{-1/2}e^{y-y^{1/2}}=\frac{f(y|2)}{f(y|1)}>k.$$
To see the form of this rejection region, we compute
$$\frac{d}{dy}\left(\frac12y^{-1/2}e^{y-y^{1/2}}\right)=\frac12y^{-3/2}e^{y-y^{1/2}}\left(y-\frac{y^{1/2}}{2}-\frac12\right),$$
which is negative for $y<1$ and positive for $y>1$. Thus $f(y|2)/f(y|1)$ is decreasing for $y\le1$ and increasing for $y\ge1$. Hence, rejecting for $f(y|2)/f(y|1)>k$ is equivalent to rejecting for $y\le c_0$ or $y\ge c_1$. To obtain a size $\alpha$ test, the constants $c_0$ and $c_1$ must satisfy
$$\alpha=P(Y\le c_0|\theta=1)+P(Y\ge c_1|\theta=1)=1-e^{-c_0}+e^{-c_1}\quad\text{and}\quad\frac{f(c_0|2)}{f(c_0|1)}=\frac{f(c_1|2)}{f(c_1|1)}.$$
Solving these two equations numerically, for $\alpha=.10$, yields $c_0=.076546$ and $c_1=3.637798$. The Type II error probability is
$$P(c_0<Y<c_1|\theta=2)=\int_{c_0}^{c_1}\frac12y^{-1/2}e^{-y^{1/2}}\,dy=-e^{-y^{1/2}}\Big|_{c_0}^{c_1}=.609824.$$
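The numerical solution quoted above can be reproduced with the following Python sketch. It uses only the two size conditions stated in the solution. Plain bisection is our choice of root finder, not necessarily the method used for the manual, and `log_ratio`, `bisect_root`, and `c1_given_c0` are illustrative names.

```python
import math

def log_ratio(y: float) -> float:
    """log[f(y|2)/f(y|1)] = log(1/2) - (1/2) log y + y - sqrt(y)."""
    return math.log(0.5) - 0.5 * math.log(y) + y - math.sqrt(y)

def bisect_root(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Bisection for a root of f in [lo, hi]; assumes a sign change."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c1_given_c0(c0: float) -> float:
    """The point c1 > 1 whose likelihood ratio equals that at c0 < 1."""
    target = log_ratio(c0)
    hi = 2.0
    while log_ratio(hi) < target:  # the ratio increases without bound for y > 1
        hi *= 2
    return bisect_root(lambda y: log_ratio(y) - target, 1.0, hi)

def size(c0: float) -> float:
    """P(Y <= c0 | theta = 1) + P(Y >= c1 | theta = 1); Y ~ exponential(1)."""
    return 1 - math.exp(-c0) + math.exp(-c1_given_c0(c0))

alpha = 0.10
c0 = bisect_root(lambda c: size(c) - alpha, 1e-9, 1 - 1e-9)
c1 = c1_given_c0(c0)
# Type II error: P(c0 < Y < c1 | theta = 2) = e^{-sqrt(c0)} - e^{-sqrt(c1)}.
type2 = math.exp(-math.sqrt(c0)) - math.exp(-math.sqrt(c1))
print(f"c0 = {c0:.6f}, c1 = {c1:.6f}, Type II error = {type2:.6f}")
```

The outer bisection works because the size increases monotonically in $c_0$ on $(0,1)$. It goes from near 0 as $c_0\to0$ to near 1 as $c_0\to1$.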

8.20 By the Neyman-Pearson Lemma, the UMP test rejects for large values of $f(x|H_1)/f(x|H_0)$. Computing this ratio, we obtain
$$\begin{array}{c|ccccccc}x&1&2&3&4&5&6&7\\\hline f(x|H_1)/f(x|H_0)&6&5&4&3&2&1&.84\end{array}$$
The ratio is decreasing in $x$. So rejecting for large values of $f(x|H_1)/f(x|H_0)$ corresponds to rejecting for small values of $x$. To get a size $\alpha$ test, we need to choose $c$ so that $P(X\le c|H_0)=\alpha$. The value $c=4$ gives the UMP size $\alpha=.04$ test. The Type II error probability is $P(X=5,6,7|H_1)=.82$.

8.21 The proof is the same with integrals replaced by sums.

8.22 a. From Corollary 8.3.13 we can base the test on $\sum_iX_i$, the sufficient statistic. Let $Y=\sum_iX_i\sim\mathrm{binomial}(10,p)$ and let $f(y|p)$ denote the pmf of $Y$. By Corollary 8.3.13, a test that rejects if $f(y|1/4)/f(y|1/2)>k$ is UMP of its size. By Exercise 8.25c, the ratio $f(y|1/2)/f(y|1/4)$ is increasing in $y$. So the ratio $f(y|1/4)/f(y|1/2)$ is decreasing in $y$, and rejecting for large values of the ratio is equivalent to rejecting for small values of $y$. To get $\alpha=.0547$, we must find $c$ such that $P(Y\le c|p=1/2)=.0547$. Trying values $c=0,1,\ldots$, we find that for $c=2$, $P(Y\le2|p=1/2)=.0547$. So the test that rejects if $Y\le2$ is the UMP size $\alpha=.0547$ test. The power of the test is $P(Y\le2|p=1/4)\approx.526$.


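b. The size of the test is $P(Y\ge6|p=1/2)=\sum_{k=6}^{10}\binom{10}{k}\left(\frac12\right)^k\left(\frac12\right)^{10-k}\approx.377$. The power function is $\beta(\theta)=\sum_{k=6}^{10}\binom{10}{k}\theta^k(1-\theta)^{10-k}$.

c. There is a nonrandomized UMP test for all $\alpha$ levels corresponding to the probabilities $P(Y\le i|p=1/2)$, where $i$ is an integer. For $n=10$, $\alpha$ can have any of the values $0$, $\frac{1}{1024}$, $\frac{11}{1024}$, $\frac{56}{1024}$, $\frac{176}{1024}$, $\frac{386}{1024}$, $\frac{638}{1024}$, $\frac{848}{1024}$, $\frac{968}{1024}$, $\frac{1013}{1024}$, $\frac{1023}{1024}$, and 1.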
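The attainable sizes in part (c), and the size and power in parts (a) and (b), follow from the binomial$(10,p)$ cdf. The sketch below recomputes them exactly with `fractions.Fraction`; helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def binom_cdf(c: int, n: int, p: Fraction) -> Fraction:
    """P(Y <= c) for Y ~ binomial(n, p), exactly (0 when c < 0)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

n, half = 10, Fraction(1, 2)
# Part (c): nonrandomized sizes P(Y <= i | p = 1/2), i = -1, 0, ..., 10.
levels = [binom_cdf(i, n, half) for i in range(-1, n + 1)]
print([int(a * 1024) for a in levels])  # [0, 1, 11, 56, 176, 386, 638, 848, 968, 1013, 1023, 1024]

size_a = binom_cdf(2, n, half)                    # part (a): 56/1024
power_a = float(binom_cdf(2, n, Fraction(1, 4)))  # part (a) power
size_b = 1 - binom_cdf(5, n, half)                # part (b): P(Y >= 6) = 386/1024
```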

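8.23 a. The test is: Reject $H_0$ if $X>1/2$. So the power function is
$$\beta(\theta)=P_\theta(X>1/2)=\int_{1/2}^1\frac{\Gamma(\theta+1)}{\Gamma(\theta)\Gamma(1)}x^{\theta-1}(1-x)^{1-1}\,dx=\theta\,\frac1\theta x^\theta\Big|_{1/2}^1=1-\frac{1}{2^\theta}.$$
The size is $\sup_{\theta\in H_0}\beta(\theta)=\sup_{\theta\le1}\left(1-1/2^\theta\right)=1-1/2=1/2$.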

b. By the Neyman-Pearson Lemma, the most powerful test of $H_0:\theta=1$ vs. $H_1:\theta=2$ is given by: Reject $H_0$ if $f(x|2)/f(x|1)>k$ for some $k\ge0$. Substituting the beta pdf gives
$$\frac{f(x|2)}{f(x|1)}=\frac{\frac{1}{B(2,1)}x^{2-1}(1-x)^{1-1}}{\frac{1}{B(1,1)}x^{1-1}(1-x)^{1-1}}=\frac{\Gamma(3)}{\Gamma(2)\Gamma(1)}x=2x.$$
Thus, the MP test is: Reject $H_0$ if $X>k/2$. We now use the $\alpha$ level to determine $k$. We have
$$\alpha=\sup_{\theta\in\Theta_0}\beta(\theta)=\beta(1)=\int_{k/2}^1f_X(x|1)\,dx=\int_{k/2}^1\frac{1}{B(1,1)}x^{1-1}(1-x)^{1-1}\,dx=1-\frac k2.$$
Thus $1-k/2=\alpha$, so the most powerful $\alpha$ level test is: Reject $H_0$ if $X>1-\alpha$.

c. For $\theta_2>\theta_1$, $f(x|\theta_2)/f(x|\theta_1)=(\theta_2/\theta_1)x^{\theta_2-\theta_1}$, an increasing function of $x$ because $\theta_2>\theta_1$. So this family has MLR. By the Karlin-Rubin Theorem, the test that rejects $H_0$ if $X>t$ is the UMP test of its size. By the argument in part (b), use $t=1-\alpha$ to get size $\alpha$.

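8.24 For $H_0:\theta=\theta_0$ vs. $H_1:\theta=\theta_1$, the LRT statistic is
$$\lambda(\mathbf x)=\frac{L(\theta_0|\mathbf x)}{\max\{L(\theta_0|\mathbf x),L(\theta_1|\mathbf x)\}}=\begin{cases}1&\text{if }L(\theta_0|\mathbf x)\ge L(\theta_1|\mathbf x)\\L(\theta_0|\mathbf x)/L(\theta_1|\mathbf x)&\text{if }L(\theta_0|\mathbf x)<L(\theta_1|\mathbf x).\end{cases}$$
The LRT rejects $H_0$ if $\lambda(\mathbf x)<c$. The Neyman-Pearson test rejects $H_0$ if $f(\mathbf x|\theta_1)/f(\mathbf x|\theta_0)=L(\theta_1|\mathbf x)/L(\theta_0|\mathbf x)>k$. If $k=1/c>1$, this is equivalent to $L(\theta_0|\mathbf x)/L(\theta_1|\mathbf x)<c$, the LRT. But if $c\ge1$ or $k\le1$, the tests will not be the same. Because $c$ is usually chosen to be small ($k$ large) to get a small size $\alpha$, in practice the two tests are often the same.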

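8.25 a. For $\theta_2>\theta_1$,
$$\frac{g(x|\theta_2)}{g(x|\theta_1)}=\frac{e^{-(x-\theta_2)^2/2\sigma^2}}{e^{-(x-\theta_1)^2/2\sigma^2}}=e^{x(\theta_2-\theta_1)/\sigma^2}e^{(\theta_1^2-\theta_2^2)/2\sigma^2}.$$
Because $\theta_2-\theta_1>0$, the ratio is increasing in $x$. So the families of $\mathrm{n}(\theta,\sigma^2)$ have MLR.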

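b. For $\theta_2>\theta_1$,
$$\frac{g(x|\theta_2)}{g(x|\theta_1)}=\frac{e^{-\theta_2}\theta_2^x/x!}{e^{-\theta_1}\theta_1^x/x!}=\left(\frac{\theta_2}{\theta_1}\right)^xe^{\theta_1-\theta_2},$$
which is increasing in $x$ because $\theta_2/\theta_1>1$. Thus the $\mathrm{Poisson}(\theta)$ family has an MLR.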

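c. For $\theta_2>\theta_1$,
$$\frac{g(x|\theta_2)}{g(x|\theta_1)}=\frac{\binom nx\theta_2^x(1-\theta_2)^{n-x}}{\binom nx\theta_1^x(1-\theta_1)^{n-x}}=\left(\frac{\theta_2(1-\theta_1)}{\theta_1(1-\theta_2)}\right)^x\left(\frac{1-\theta_2}{1-\theta_1}\right)^n.$$
Both $\theta_2/\theta_1>1$ and $(1-\theta_1)/(1-\theta_2)>1$. Thus the ratio is increasing in $x$, and the family has MLR.

(Note: You can also use the fact that an exponential family $h(x)c(\theta)\exp(w(\theta)x)$ has MLR if $w(\theta)$ is increasing in $\theta$ (Exercise 8.27). For example, the $\mathrm{Poisson}(\theta)$ pmf is $e^{-\theta}\exp(x\log\theta)/x!$, and the family has MLR because $\log\theta$ is increasing in $\theta$.)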


8.26 a. We will prove the result for continuous distributions, but it is also true for discrete MLR families. For $\theta_1>\theta_2$, we must show $F(x|\theta_1)\le F(x|\theta_2)$. Now
$$\frac{d}{dx}\left[F(x|\theta_1)-F(x|\theta_2)\right]=f(x|\theta_1)-f(x|\theta_2)=f(x|\theta_2)\left(\frac{f(x|\theta_1)}{f(x|\theta_2)}-1\right).$$
Because $f$ has MLR, the ratio on the right-hand side is increasing, so the derivative can only change sign from negative to positive, showing that any interior extremum is a minimum. Thus the function $F(x|\theta_1)-F(x|\theta_2)$ is maximized by its value at $\infty$ or $-\infty$, which is zero.

b. From Exercise 3.42, location families are stochastically increasing in their location parameter, so the location Cauchy family with pdf $f(x|\theta)=\left(\pi[1+(x-\theta)^2]\right)^{-1}$ is stochastically increasing. The family does not have MLR.

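8.27 For $\theta_2>\theta_1$,
$$\frac{g(t|\theta_2)}{g(t|\theta_1)}=\frac{c(\theta_2)}{c(\theta_1)}e^{[w(\theta_2)-w(\theta_1)]t},$$
which is increasing in $t$ because $w(\theta_2)-w(\theta_1)>0$. Examples include $\mathrm{n}(\theta,1)$, $\mathrm{beta}(\theta,1)$, and $\mathrm{Bernoulli}(\theta)$.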

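8.28 a. For $\theta_2>\theta_1$, the likelihood ratio is
$$\frac{f(x|\theta_2)}{f(x|\theta_1)}=e^{\theta_1-\theta_2}\left[\frac{1+e^{x-\theta_1}}{1+e^{x-\theta_2}}\right]^2.$$
The derivative of the quantity in brackets is
$$\frac{d}{dx}\,\frac{1+e^{x-\theta_1}}{1+e^{x-\theta_2}}=\frac{e^{x-\theta_1}-e^{x-\theta_2}}{\left(1+e^{x-\theta_2}\right)^2}.$$
Because $\theta_2>\theta_1$, $e^{x-\theta_1}>e^{x-\theta_2}$, and, hence, the ratio is increasing. This family has MLR.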

b. The best test is to reject $H_0$ if $f(x|1)/f(x|0)>k$. From part (a), this ratio is increasing in $x$. Thus this inequality is equivalent to rejecting if $x>k'$. The cdf of this logistic is $F(x|\theta)=e^{x-\theta}\big/\left(1+e^{x-\theta}\right)$. Thus
$$\alpha=1-F(k'|0)=\frac{1}{1+e^{k'}}\quad\text{and}\quad\beta=F(k'|1)=\frac{e^{k'-1}}{1+e^{k'-1}}.$$
For a specified $\alpha$, $k'=\log\left((1-\alpha)/\alpha\right)$. So for $\alpha=.2$, $k'\approx1.386$ and $\beta\approx.595$.

c. The Karlin-Rubin Theorem is satisfied, so the test is UMP of its size.
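Here is a short numeric check of 8.28(b), using the logistic cdf $F(x|\theta)=e^{x-\theta}/(1+e^{x-\theta})$ from the solution.

```python
import math

alpha = 0.2
k0 = math.log((1 - alpha) / alpha)                # cutoff k'
size = 1 / (1 + math.exp(k0))                     # 1 - F(k'|0), recovers alpha
beta = math.exp(k0 - 1) / (1 + math.exp(k0 - 1))  # F(k'|1), the Type II error
print(round(k0, 3), round(size, 3), round(beta, 3))
```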

8.29 a. Let $\theta_2>\theta_1$. Then
$$\frac{f(x|\theta_2)}{f(x|\theta_1)}=\frac{1+(x-\theta_1)^2}{1+(x-\theta_2)^2}=\frac{1+(1+\theta_1^2)/x^2-2\theta_1/x}{1+(1+\theta_2^2)/x^2-2\theta_2/x}.$$
The limit of this ratio as $x\to\infty$ or as $x\to-\infty$ is 1. So the ratio cannot be monotone increasing (or decreasing) between $-\infty$ and $\infty$. Thus, the family does not have MLR.

b. By the Neyman-Pearson Lemma, a test will be UMP if it rejects when $f(x|1)/f(x|0)>k$, for some constant $k$. Examination of the derivative shows that $f(x|1)/f(x|0)$ is decreasing for $x\le(1-\sqrt5)/2=-.618$, is increasing for $(1-\sqrt5)/2\le x\le(1+\sqrt5)/2=1.618$, and is decreasing for $(1+\sqrt5)/2\le x$. Furthermore, $f(1|1)/f(1|0)=f(3|1)/f(3|0)=2$. So rejecting if $f(x|1)/f(x|0)>2$ is equivalent to rejecting if $1<x<3$. Thus, the given test is UMP of its size. The size of the test is
$$P(1<X<3|\theta=0)=\int_1^3\frac1\pi\,\frac{1}{1+x^2}\,dx=\frac1\pi\arctan x\Big|_1^3\approx.1476.$$


The Type II error probability is
$$1-P(1<X<3|\theta=1)=1-\int_1^3\frac1\pi\,\frac{1}{1+(x-1)^2}\,dx=1-\frac1\pi\arctan(x-1)\Big|_1^3\approx.6476.$$

c. We will not have $f(1|\theta)/f(1|0)=f(3|\theta)/f(3|0)$ for any other value of $\theta\ne1$. Try $\theta=2$, for example. So the rejection region $1<x<3$ will not be most powerful at any other value of $\theta$. The test is not UMP for testing $H_0:\theta\le0$ versus $H_1:\theta>0$.
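The size and Type II error in 8.29(b) reduce to arctangent evaluations. This sketch also checks that the ratio $f(x|1)/f(x|0)=(1+x^2)/(1+(x-1)^2)$ equals 2 at $x=1$ and $x=3$. At a few illustrative points, it exceeds 2 only between them.

```python
import math

# Part (b): Cauchy location family; the test rejects H0 when 1 < X < 3.
size = (math.atan(3) - math.atan(1)) / math.pi       # P(1 < X < 3 | theta = 0)
type2 = 1 - (math.atan(2) - math.atan(0)) / math.pi  # 1 - P(1 < X < 3 | theta = 1)

def ratio(x: float) -> float:
    """f(x|1)/f(x|0) = (1 + x^2)/(1 + (x - 1)^2)."""
    return (1 + x**2) / (1 + (x - 1) ** 2)

print(round(size, 4), round(type2, 4))
print([ratio(x) for x in (0.5, 1, 2, 3, 4)])
```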

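8.30 a. For $\theta_2>\theta_1>0$, the likelihood ratio and its derivative are
$$\frac{f(x|\theta_2)}{f(x|\theta_1)}=\frac{\theta_2}{\theta_1}\,\frac{\theta_1^2+x^2}{\theta_2^2+x^2}\quad\text{and}\quad\frac{d}{dx}\,\frac{f(x|\theta_2)}{f(x|\theta_1)}=\frac{\theta_2}{\theta_1}\,\frac{\theta_2^2-\theta_1^2}{(\theta_2^2+x^2)^2}\,2x.$$
The sign of the derivative is the same as the sign of $x$ (recall, $\theta_2^2-\theta_1^2>0$), which changes sign. Hence the ratio is not monotone.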

b. Because $f(x|\theta)=(\theta/\pi)(\theta^2+|x|^2)^{-1}$, $Y=|X|$ is sufficient. Its pdf is
$$f(y|\theta)=\frac{2\theta}{\pi(\theta^2+y^2)},\qquad y>0.$$
Differentiating as above, the sign of the derivative is the same as the sign of $y$, which is positive. Hence the family has MLR.

8.31 a. By the Karlin-Rubin Theorem, the UMP test is to reject $H_0$ if $\sum_i X_i > k$, because $\sum_i X_i$ is sufficient and $\sum_i X_i \sim \text{Poisson}(n\lambda)$, which has MLR. Choose the constant $k$ to satisfy $P(\sum_i X_i > k|\lambda = \lambda_0) = \alpha$.

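b.

$$P\Big(\sum X_i > k\,\Big|\,\lambda = 1\Big) \approx P\big(Z > (k-n)/\sqrt n\big) \overset{\text{set}}{=} .05,$$

$$P\Big(\sum X_i > k\,\Big|\,\lambda = 2\Big) \approx P\big(Z > (k-2n)/\sqrt{2n}\big) \overset{\text{set}}{=} .90.$$

Thus, solve for $k$ and $n$ in

$$\frac{k-n}{\sqrt n} = 1.645 \quad\text{and}\quad \frac{k-2n}{\sqrt{2n}} = -1.28,$$

yielding $n = 12$ and $k = 17.70$.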
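Eliminating $k$ from the two equations gives $\sqrt n = 1.645 + 1.28\sqrt2$. The minimal sketch below carries out that arithmetic, rounds $n$ up to a whole sample size, and then recomputes $k$. Rounding up is our reading of how $n = 12$ was obtained.

```python
import math

# Normal-approximation design of 8.31(b):
#   (k - n)/sqrt(n) = z_a   and   (k - 2n)/sqrt(2n) = -z_b
z_a, z_b = 1.645, 1.28  # cutoffs used in the text

# Subtracting the two equations for k gives n = (z_a + z_b*sqrt(2)) * sqrt(n).
root_n = z_a + z_b * math.sqrt(2)
n_exact = root_n ** 2         # about 11.94
n = math.ceil(n_exact)        # whole sample size
k = n + z_a * math.sqrt(n)    # cutoff for the rounded n
print(round(n_exact, 2), n, round(k, 2))  # 11.94 12 17.7
```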

8.32 a. This is Example 8.3.15.

b. This is Example 8.3.19.

8.33 a. From Theorems 5.4.4 and 5.4.6, the marginal pdf of $Y_1$ and the joint pdf of $(Y_1, Y_n)$ are

$$f(y_1|\theta) = n(1-(y_1-\theta))^{n-1}, \quad \theta < y_1 < \theta+1,$$

$$f(y_1, y_n|\theta) = n(n-1)(y_n-y_1)^{n-2}, \quad \theta < y_1 < y_n < \theta+1.$$

Under $H_0$, $P(Y_n \ge 1) = 0$. So

$$\alpha = P(Y_1 \ge k|0) = \int_k^1 n(1-y_1)^{n-1}\,dy_1 = (1-k)^n.$$

Thus, use $k = 1 - \alpha^{1/n}$ to have a size $\alpha$ test.


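b. For $\theta \le k-1$, $\beta(\theta) = 0$. For $k-1 < \theta \le 0$,

$$\beta(\theta) = \int_k^{\theta+1} n(1-(y_1-\theta))^{n-1}\,dy_1 = (1-k+\theta)^n.$$

For $0 < \theta \le k$,

$$\beta(\theta) = \int_k^{\theta+1} n(1-(y_1-\theta))^{n-1}\,dy_1 + \int_\theta^k\int_1^{\theta+1} n(n-1)(y_n-y_1)^{n-2}\,dy_n\,dy_1 = \alpha + 1 - (1-\theta)^n.$$

And for $k < \theta$, $\beta(\theta) = 1$.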
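The piecewise power function can be coded directly. The sketch below does that and checks that it equals $\alpha$ at $\theta = 0$ and reaches 1 at $\theta = k$. The function name is hypothetical.

```python
def power_833(theta, alpha, n):
    """Power of the 8.33 test: reject if Y1 >= k or Yn >= 1, with k = 1 - alpha**(1/n)."""
    k = 1 - alpha ** (1 / n)
    if theta <= k - 1:
        return 0.0
    if theta <= 0:
        return (1 - k + theta) ** n
    if theta <= k:
        return alpha + 1 - (1 - theta) ** n
    return 1.0

# The pieces join continuously: power alpha at theta = 0, power 1 at theta = k.
print(power_833(0.0, 0.05, 10))
```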

c. $(Y_1, Y_n)$ are sufficient statistics. So we can attempt to find a UMP test using Corollary 8.3.13 and the joint pdf $f(y_1, y_n|\theta)$ in part (a). For $0 < \theta < 1$, the ratio of pdfs is

$$\frac{f(y_1,y_n|\theta)}{f(y_1,y_n|0)} = \begin{cases} 0 & \text{if } 0 < y_1 \le \theta,\ y_1 < y_n < 1,\\ 1 & \text{if } \theta < y_1 < y_n < 1,\\ \infty & \text{if } 1 \le y_n < \theta+1,\ \theta < y_1 < y_n.\end{cases}$$

For $1 \le \theta$, the ratio of pdfs is

$$\frac{f(y_1,y_n|\theta)}{f(y_1,y_n|0)} = \begin{cases} 0 & \text{if } y_1 < y_n < 1,\\ \infty & \text{if } \theta < y_1 < y_n < \theta+1.\end{cases}$$

For $0 < \theta < k$, use $k' = 1$. The given test always rejects if $f(y_1,y_n|\theta)/f(y_1,y_n|0) > 1$ and always accepts if $f(y_1,y_n|\theta)/f(y_1,y_n|0) < 1$. For $\theta \ge k$, use $k' = 0$. The given test always rejects if $f(y_1,y_n|\theta)/f(y_1,y_n|0) > 0$ and always accepts if $f(y_1,y_n|\theta)/f(y_1,y_n|0) < 0$. Thus the given test is UMP by Corollary 8.3.13.

d. According to the power function in part (b), $\beta(\theta) = 1$ for all $\theta \ge k = 1 - \alpha^{1/n}$. So these conditions are satisfied for any $n$.

8.34 a. This is Exercise 3.42a.

b. This is Exercise 8.26a.

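8.35 a. We will use the equality in Exercise 3.17, which remains true so long as $\nu > -\alpha$. Recall that $Y \sim \chi^2_\nu = \text{gamma}(\nu/2, 2)$. Thus, using the independence of $X$ and $Y$, we have

$$\mathrm{E}\,T' = \mathrm{E}\,\frac{X}{\sqrt{Y/\nu}} = (\mathrm{E}X)\sqrt\nu\,\mathrm{E}Y^{-1/2} = \mu\sqrt\nu\,\frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)\sqrt2}$$

if $\nu > 1$. To calculate the variance, compute

$$\mathrm{E}(T')^2 = \mathrm{E}\,\frac{X^2}{Y/\nu} = (\mathrm{E}X^2)\,\nu\,\mathrm{E}Y^{-1} = (\mu^2+1)\,\nu\,\frac{\Gamma((\nu-2)/2)}{\Gamma(\nu/2)\,2} = \frac{(\mu^2+1)\nu}{\nu-2}$$

if $\nu > 2$. Thus, if $\nu > 2$,

$$\mathrm{Var}\,T' = \frac{(\mu^2+1)\nu}{\nu-2} - \left(\mu\sqrt\nu\,\frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)\sqrt2}\right)^2.$$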

b. If $\delta = 0$, all the terms in the sum for $k = 1, 2, \ldots$ are zero because of the $\delta^k$ term. The expression with just the $k = 0$ term and $\delta = 0$ simplifies to the central $t$ pdf.

c. The argument that the noncentral $t$ has an MLR is fairly involved. It may be found in Lehmann (1986, p. 295).


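8.37 a. $P(\bar X > \theta_0 + z_\alpha\sigma/\sqrt n\,|\,\theta_0) = P\big((\bar X-\theta_0)/(\sigma/\sqrt n) > z_\alpha\,|\,\theta_0\big) = P(Z > z_\alpha) = \alpha$, where $Z \sim n(0,1)$. Because $\bar x$ is the unrestricted MLE, and the restricted MLE is $\theta_0$ if $\bar x > \theta_0$, the LRT statistic is, for $\bar x \ge \theta_0$,

$$\lambda(\mathbf x) = \frac{(2\pi\sigma^2)^{-n/2}e^{-\sum_i(x_i-\theta_0)^2/2\sigma^2}}{(2\pi\sigma^2)^{-n/2}e^{-\sum_i(x_i-\bar x)^2/2\sigma^2}} = \frac{e^{-[n(\bar x-\theta_0)^2+(n-1)s^2]/2\sigma^2}}{e^{-(n-1)s^2/2\sigma^2}} = e^{-n(\bar x-\theta_0)^2/2\sigma^2},$$

and the LRT statistic is 1 for $\bar x < \theta_0$. Thus, rejecting if $\lambda < c$ is equivalent to rejecting if $(\bar x-\theta_0)/(\sigma/\sqrt n) > c'$ (as long as $c < 1$; see Exercise 8.24).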

b. The test is UMP by the Karlin-Rubin Theorem.

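c. $P(\bar X > \theta_0 + t_{n-1,\alpha}S/\sqrt n\,|\,\theta = \theta_0) = P(T_{n-1} > t_{n-1,\alpha}) = \alpha$, where $T_{n-1}$ is a Student's $t$ random variable with $n-1$ degrees of freedom. If we define $\hat\sigma^2 = \frac1n\sum(x_i-\bar x)^2$ and $\hat\sigma_0^2 = \frac1n\sum(x_i-\theta_0)^2$, then for $\bar x \ge \theta_0$ the LRT statistic is $\lambda = (\hat\sigma^2/\hat\sigma_0^2)^{n/2}$, and for $\bar x < \theta_0$ the LRT statistic is $\lambda = 1$. Writing $\hat\sigma^2 = \frac{n-1}{n}s^2$ and $\hat\sigma_0^2 = (\bar x-\theta_0)^2 + \frac{n-1}{n}s^2$, it is clear that the LRT is equivalent to the $t$-test because $\lambda < c$ when

$$\frac{\frac{n-1}{n}s^2}{(\bar x-\theta_0)^2 + \frac{n-1}{n}s^2} = \frac{(n-1)/n}{(\bar x-\theta_0)^2/s^2 + (n-1)/n} < c' \quad\text{and}\quad \bar x \ge \theta_0,$$

which is the same as rejecting when $(\bar x-\theta_0)/(s/\sqrt n)$ is large.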

d. The proof that the one-sided $t$ test is UMP unbiased is rather involved, using the bounded completeness of the normal distribution and other facts. See Chapter 5 of Lehmann (1986) for a complete treatment.

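8.38 a.

$$\begin{aligned}\text{Size} &= P_{\theta_0}\Big\{|\bar X-\theta_0| > t_{n-1,\alpha/2}\sqrt{S^2/n}\Big\}\\ &= 1 - P_{\theta_0}\Big\{-t_{n-1,\alpha/2}\sqrt{S^2/n} \le \bar X-\theta_0 \le t_{n-1,\alpha/2}\sqrt{S^2/n}\Big\}\\ &= 1 - P_{\theta_0}\left(-t_{n-1,\alpha/2} \le \frac{\bar X-\theta_0}{\sqrt{S^2/n}} \le t_{n-1,\alpha/2}\right) \qquad \left(\frac{\bar X-\theta_0}{\sqrt{S^2/n}} \sim t_{n-1} \text{ under } H_0\right)\\ &= 1 - (1-\alpha) = \alpha.\end{aligned}$$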

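b. The unrestricted MLEs are $\hat\theta = \bar X$ and $\hat\sigma^2 = \sum_i(X_i-\bar X)^2/n$. The restricted MLEs are $\hat\theta_0 = \theta_0$ and $\hat\sigma_0^2 = \sum_i(X_i-\theta_0)^2/n$. So the LRT statistic is

$$\lambda(\mathbf x) = \frac{(2\pi\hat\sigma_0^2)^{-n/2}\exp\{-n\hat\sigma_0^2/(2\hat\sigma_0^2)\}}{(2\pi\hat\sigma^2)^{-n/2}\exp\{-n\hat\sigma^2/(2\hat\sigma^2)\}} = \left[\frac{\sum_i(x_i-\bar x)^2}{\sum_i(x_i-\theta_0)^2}\right]^{n/2} = \left[\frac{\sum_i(x_i-\bar x)^2}{\sum_i(x_i-\bar x)^2 + n(\bar x-\theta_0)^2}\right]^{n/2}.$$

For a constant $c$, the LRT is

$$\text{reject } H_0 \text{ if } \left[\frac{\sum_i(x_i-\bar x)^2}{\sum_i(x_i-\bar x)^2 + n(\bar x-\theta_0)^2}\right] = \frac{1}{1 + n(\bar x-\theta_0)^2/\sum_i(x_i-\bar x)^2} < c^{2/n}.$$

After some algebra we can write the test as

$$\text{reject } H_0 \text{ if } |\bar x-\theta_0| > \left[\left(c^{-2/n}-1\right)(n-1)\frac{s^2}{n}\right]^{1/2}.$$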


We now choose the constant $c$ to achieve size $\alpha$, and we reject if $|\bar x-\theta_0| > t_{n-1,\alpha/2}\sqrt{s^2/n}$.

c. Again, see Chapter 5 of Lehmann (1986).

8.39 a. From Exercise 4.45c, $W_i = X_i - Y_i \sim n(\mu_W, \sigma_W^2)$, where $\mu_X - \mu_Y = \mu_W$ and $\sigma_X^2 + \sigma_Y^2 - 2\rho\sigma_X\sigma_Y = \sigma_W^2$. The $W_i$s are independent because the pairs $(X_i, Y_i)$ are.

b. The hypotheses are equivalent to $H_0: \mu_W = 0$ vs $H_1: \mu_W \ne 0$, and, from Exercise 8.38, if we reject $H_0$ when $|\bar W| > t_{n-1,\alpha/2}\sqrt{S_W^2/n}$, this is the LRT (based on $W_1, \ldots, W_n$) of size $\alpha$. (Note that if $\rho > 0$, $\mathrm{Var}\,W_i$ can be small and the test will have good power.)

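8.41 a.

$$\lambda(\mathbf x,\mathbf y) = \frac{\sup_{H_0}L(\mu_X,\mu_Y,\sigma^2|\mathbf x,\mathbf y)}{\sup L(\mu_X,\mu_Y,\sigma^2|\mathbf x,\mathbf y)} = \frac{L(\hat\mu,\hat\sigma_0^2|\mathbf x,\mathbf y)}{L(\hat\mu_X,\hat\mu_Y,\hat\sigma^2|\mathbf x,\mathbf y)}.$$

Under $H_0$, the $X_i$s and $Y_i$s are one sample of size $m+n$ from a $n(\mu,\sigma^2)$ population, where $\mu = \mu_X = \mu_Y$. So the restricted MLEs are

$$\hat\mu = \frac{\sum_i X_i + \sum_i Y_i}{n+m} = \frac{n\bar x + m\bar y}{n+m} \quad\text{and}\quad \hat\sigma_0^2 = \frac{\sum_i(X_i-\hat\mu)^2 + \sum_i(Y_i-\hat\mu)^2}{n+m}.$$

To obtain the unrestricted MLEs, $\hat\mu_X$, $\hat\mu_Y$, $\hat\sigma^2$, use

$$L(\mu_X,\mu_Y,\sigma^2|\mathbf x,\mathbf y) = (2\pi\sigma^2)^{-(n+m)/2}e^{-[\sum_i(x_i-\mu_X)^2 + \sum_i(y_i-\mu_Y)^2]/2\sigma^2}.$$

First, note that $\hat\mu_X = \bar x$ and $\hat\mu_Y = \bar y$, because maximizing over $\mu_X$ does not involve $\mu_Y$ and vice versa. Then

$$\frac{\partial\log L}{\partial\sigma^2} = -\frac{n+m}{2}\,\frac{1}{\sigma^2} + \frac12\left[\sum(x_i-\hat\mu_X)^2 + \sum(y_i-\hat\mu_Y)^2\right]\frac{1}{(\sigma^2)^2} \overset{\text{set}}{=} 0$$

implies

$$\hat\sigma^2 = \left[\sum_{i=1}^n(x_i-\bar x)^2 + \sum_{i=1}^m(y_i-\bar y)^2\right]\frac{1}{n+m}.$$

To check that this is a maximum,

$$\frac{\partial^2\log L}{\partial(\sigma^2)^2}\bigg|_{\hat\sigma^2} = \left[\frac{n+m}{2}\,\frac{1}{(\sigma^2)^2} - \left[\sum(x_i-\hat\mu_X)^2 + \sum(y_i-\hat\mu_Y)^2\right]\frac{1}{(\sigma^2)^3}\right]_{\hat\sigma^2} = \frac{n+m}{2}\,\frac{1}{(\hat\sigma^2)^2} - (n+m)\frac{1}{(\hat\sigma^2)^2} = -\frac{n+m}{2}\,\frac{1}{(\hat\sigma^2)^2} < 0.$$

Thus, it is a maximum. We then have

$$\lambda(\mathbf x,\mathbf y) = \frac{(2\pi\hat\sigma_0^2)^{-\frac{n+m}{2}}\exp\left\{-\frac{1}{2\hat\sigma_0^2}\left[\sum_{i=1}^n(x_i-\hat\mu)^2 + \sum_{i=1}^m(y_i-\hat\mu)^2\right]\right\}}{(2\pi\hat\sigma^2)^{-\frac{n+m}{2}}\exp\left\{-\frac{1}{2\hat\sigma^2}\left[\sum_{i=1}^n(x_i-\bar x)^2 + \sum_{i=1}^m(y_i-\bar y)^2\right]\right\}} = \left(\frac{\hat\sigma_0^2}{\hat\sigma^2}\right)^{-\frac{n+m}{2}},$$

and the LRT rejects $H_0$ if $\hat\sigma_0^2/\hat\sigma^2 > k$. In the numerator, first substitute $\hat\mu = (n\bar x + m\bar y)/(n+m)$ and write

$$\sum_{i=1}^n\left(x_i - \frac{n\bar x+m\bar y}{n+m}\right)^2 = \sum_{i=1}^n\left((x_i-\bar x) + \left(\bar x - \frac{n\bar x+m\bar y}{n+m}\right)\right)^2 = \sum_{i=1}^n(x_i-\bar x)^2 + \frac{nm^2}{(n+m)^2}(\bar x-\bar y)^2,$$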


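because the cross term is zero. Performing a similar operation on the $Y$ sum yields

$$\frac{\hat\sigma_0^2}{\hat\sigma^2} = \frac{\sum(x_i-\bar x)^2 + \sum(y_i-\bar y)^2 + \frac{nm}{n+m}(\bar x-\bar y)^2}{(n+m)\hat\sigma^2} = 1 + \frac{nm}{(n+m)^2}\,\frac{(\bar x-\bar y)^2}{\hat\sigma^2}.$$

Because $\hat\sigma^2 = \frac{n+m-2}{n+m}S_p^2$, large values of $\hat\sigma_0^2/\hat\sigma^2$ are equivalent to large values of $(\bar x-\bar y)^2/S_p^2$ and large values of $|T|$. Hence, the LRT is the two-sample $t$-test.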

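b.

$$T = \frac{\bar X-\bar Y}{\sqrt{S_p^2(1/n+1/m)}} = \frac{(\bar X-\bar Y)\big/\sqrt{\sigma^2(1/n+1/m)}}{\sqrt{[(n+m-2)S_p^2/\sigma^2]/(n+m-2)}}.$$

Under $H_0$, $(\bar X-\bar Y) \sim n(0, \sigma^2(1/n+1/m))$. Under the model, $(n-1)S_X^2/\sigma^2$ and $(m-1)S_Y^2/\sigma^2$ are independent $\chi^2$ random variables with $(n-1)$ and $(m-1)$ degrees of freedom. Thus, $(n+m-2)S_p^2/\sigma^2 = (n-1)S_X^2/\sigma^2 + (m-1)S_Y^2/\sigma^2 \sim \chi^2_{n+m-2}$. Furthermore, $\bar X-\bar Y$ is independent of $S_X^2$ and $S_Y^2$, and, hence, of $S_p^2$. So $T \sim t_{n+m-2}$.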

c. The two-sample $t$ test is UMP unbiased, but the proof is rather involved. See Chapter 5 of Lehmann (1986).

d. For these data we have $n = 14$, $\bar X = 1249.86$, $S_X^2 = 591.36$, $m = 9$, $\bar Y = 1261.33$, $S_Y^2 = 176.00$ and $S_p^2 = 433.13$. Therefore, $T = -1.29$, and comparing this to a $t_{21}$ distribution gives a p-value of .21. So there is no evidence that the mean age differs between the core and periphery.
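The pooled variance and $T$ can be reproduced from the summary statistics. The sketch below uses only the standard library. Because the standard library has no $t$ cdf, the p-value is left out. If SciPy is available, `scipy.stats.t.sf` could supply it. The helper name is hypothetical.

```python
import math

def pooled_t(n, xbar, s2x, m, ybar, s2y):
    """Pooled two-sample t statistic from summary statistics (equal variances assumed)."""
    s2p = ((n - 1) * s2x + (m - 1) * s2y) / (n + m - 2)
    t = (xbar - ybar) / math.sqrt(s2p * (1 / n + 1 / m))
    return s2p, t, n + m - 2

s2p, t, df = pooled_t(14, 1249.86, 591.36, 9, 1261.33, 176.00)
print(round(s2p, 2), round(t, 2), df)  # 433.13 -1.29 21
```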

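8.42 a. The Satterthwaite approximation states that if $Y_i \sim \chi^2_{r_i}$, where the $Y_i$'s are independent, then

$$\sum_i a_iY_i \overset{\text{approx}}{\sim} \frac{\chi^2_{\hat\nu}}{\hat\nu} \quad\text{where}\quad \hat\nu = \frac{\left(\sum_i a_iY_i\right)^2}{\sum_i a_i^2Y_i^2/r_i}.$$

We have $Y_1 = (n-1)S_X^2/\sigma_X^2 \sim \chi^2_{n-1}$ and $Y_2 = (m-1)S_Y^2/\sigma_Y^2 \sim \chi^2_{m-1}$. Now define

$$a_1 = \frac{\sigma_X^2}{n(n-1)\left[(\sigma_X^2/n)+(\sigma_Y^2/m)\right]} \quad\text{and}\quad a_2 = \frac{\sigma_Y^2}{m(m-1)\left[(\sigma_X^2/n)+(\sigma_Y^2/m)\right]}.$$

Then,

$$\begin{aligned}\sum a_iY_i &= \frac{\sigma_X^2}{n(n-1)[(\sigma_X^2/n)+(\sigma_Y^2/m)]}\,\frac{(n-1)S_X^2}{\sigma_X^2} + \frac{\sigma_Y^2}{m(m-1)[(\sigma_X^2/n)+(\sigma_Y^2/m)]}\,\frac{(m-1)S_Y^2}{\sigma_Y^2}\\ &= \frac{S_X^2/n + S_Y^2/m}{\sigma_X^2/n + \sigma_Y^2/m} \overset{\text{approx}}{\sim} \frac{\chi^2_{\hat\nu}}{\hat\nu},\end{aligned}$$

where

$$\hat\nu = \frac{\left(\dfrac{S_X^2/n + S_Y^2/m}{\sigma_X^2/n + \sigma_Y^2/m}\right)^2}{\dfrac{1}{n-1}\,\dfrac{S_X^4}{n^2(\sigma_X^2/n+\sigma_Y^2/m)^2} + \dfrac{1}{m-1}\,\dfrac{S_Y^4}{m^2(\sigma_X^2/n+\sigma_Y^2/m)^2}} = \frac{\left(S_X^2/n + S_Y^2/m\right)^2}{\dfrac{S_X^4}{n^2(n-1)} + \dfrac{S_Y^4}{m^2(m-1)}}.$$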

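b. Because $\bar X-\bar Y \sim n\left(\mu_X-\mu_Y,\ \sigma_X^2/n + \sigma_Y^2/m\right)$ and $\dfrac{S_X^2/n + S_Y^2/m}{\sigma_X^2/n + \sigma_Y^2/m} \overset{\text{approx}}{\sim} \chi^2_{\hat\nu}/\hat\nu$, under $H_0: \mu_X - \mu_Y = 0$ we have

$$T' = \frac{\bar X-\bar Y}{\sqrt{S_X^2/n + S_Y^2/m}} = \frac{(\bar X-\bar Y)\big/\sqrt{\sigma_X^2/n + \sigma_Y^2/m}}{\sqrt{\dfrac{S_X^2/n + S_Y^2/m}{\sigma_X^2/n + \sigma_Y^2/m}}} \overset{\text{approx}}{\sim} t_{\hat\nu}.$$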


c. Using the values in Exercise 8.41d, we obtain $T' = -1.46$ and $\hat\nu = 20.64$. So the p-value is .16. There is no evidence that the mean age differs between the core and periphery.

d. $F = S_X^2/S_Y^2 = 3.36$. Comparing this with an $F_{13,8}$ distribution yields a p-value of $2P(F \ge 3.36) = .09$. So there is some slight evidence that the variance differs between the core and periphery.
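The Welch statistic, the Satterthwaite degrees of freedom and the variance ratio can be reproduced from the same summary statistics. The sketch below uses only the standard library, so the $t$ and $F$ p-values themselves are not computed. The helper name is hypothetical.

```python
import math

def welch(n, xbar, s2x, m, ybar, s2y):
    """Welch statistic T' and Satterthwaite degrees of freedom from summary statistics."""
    vx, vy = s2x / n, s2y / m
    t = (xbar - ybar) / math.sqrt(vx + vy)
    # (S_X^2/n + S_Y^2/m)^2 / (S_X^4/(n^2 (n-1)) + S_Y^4/(m^2 (m-1)))
    nu = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, nu

t, nu = welch(14, 1249.86, 591.36, 9, 1261.33, 176.00)
f_ratio = 591.36 / 176.00
print(round(t, 2), round(nu, 2), round(f_ratio, 2))  # -1.46 20.64 3.36
```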

8.43 There were typos in early printings. The $t$ statistic should be

$$\frac{(\bar X-\bar Y)-(\mu_1-\mu_2)}{\sqrt{\frac{1}{n_1}+\frac{\rho^2}{n_2}}\sqrt{\frac{(n_1-1)s_X^2 + (n_2-1)s_Y^2/\rho^2}{n_1+n_2-2}}}$$

and the $F$ statistic should be $s_Y^2/(\rho^2s_X^2)$. Multiply and divide the denominator of the $t$ statistic by $\sigma$ to express it as

$$\frac{(\bar X-\bar Y)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma^2}{n_1}+\frac{\rho^2\sigma^2}{n_2}}} \quad\text{divided by}\quad \sqrt{\frac{(n_1-1)s_X^2/\sigma^2 + (n_2-1)s_Y^2/(\rho^2\sigma^2)}{n_1+n_2-2}}.$$

The numerator has a $n(0,1)$ distribution. In the denominator, $(n_1-1)s_X^2/\sigma^2 \sim \chi^2_{n_1-1}$ and $(n_2-1)s_Y^2/(\rho^2\sigma^2) \sim \chi^2_{n_2-1}$, and they are independent, so their sum has a $\chi^2_{n_1+n_2-2}$ distribution. Thus, the statistic has the form of $n(0,1)/\sqrt{\chi^2_\nu/\nu}$ where $\nu = n_1+n_2-2$, and the numerator and denominator are independent because of the independence of sample means and variances in normal sampling. Thus the statistic has a $t_{n_1+n_2-2}$ distribution. The $F$ statistic can be written as

$$\frac{s_Y^2}{\rho^2s_X^2} = \frac{s_Y^2/(\rho^2\sigma^2)}{s_X^2/\sigma^2} = \frac{[(n_2-1)s_Y^2/(\rho^2\sigma^2)]/(n_2-1)}{[(n_1-1)s_X^2/\sigma^2]/(n_1-1)},$$

which has the form $[\chi^2_{n_2-1}/(n_2-1)]/[\chi^2_{n_1-1}/(n_1-1)]$, which has an $F_{n_2-1,n_1-1}$ distribution. (Note, early printings had a typo with the numerator and denominator degrees of freedom switched.)

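8.44 Test 3 rejects $H_0: \theta = \theta_0$ in favor of $H_1: \theta \ne \theta_0$ if $\bar X > \theta_0 + z_{\alpha/2}\sigma/\sqrt n$ or $\bar X < \theta_0 - z_{\alpha/2}\sigma/\sqrt n$. Let $\Phi$ and $\phi$ denote the standard normal cdf and pdf, respectively. Because $\bar X \sim n(\theta, \sigma^2/n)$, the power function of Test 3 is

$$\begin{aligned}\beta(\theta) &= P_\theta(\bar X < \theta_0 - z_{\alpha/2}\sigma/\sqrt n) + P_\theta(\bar X > \theta_0 + z_{\alpha/2}\sigma/\sqrt n)\\ &= \Phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} - z_{\alpha/2}\right) + 1 - \Phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} + z_{\alpha/2}\right),\end{aligned}$$

and its derivative is

$$\frac{d\beta(\theta)}{d\theta} = -\frac{\sqrt n}{\sigma}\,\phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} - z_{\alpha/2}\right) + \frac{\sqrt n}{\sigma}\,\phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} + z_{\alpha/2}\right).$$

Because $\phi$ is symmetric and unimodal about zero, this derivative will be zero only if

$$-\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} - z_{\alpha/2}\right) = \frac{\theta_0-\theta}{\sigma/\sqrt n} + z_{\alpha/2},$$

that is, only if $\theta = \theta_0$. So, $\theta = \theta_0$ is the only possible local maximum or minimum of the power function. $\beta(\theta_0) = \alpha$ and $\lim_{\theta\to\pm\infty}\beta(\theta) = 1$. Thus, $\theta = \theta_0$ is the global minimum of $\beta(\theta)$, and, for any $\theta' \ne \theta_0$, $\beta(\theta') > \beta(\theta_0)$. That is, Test 3 is unbiased.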
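The power function above can be evaluated with `statistics.NormalDist` (Python 3.8+). The sketch below then checks numerically, on a grid, that the minimum is at $\theta_0$ with value $\alpha$. The grid and the parameter values are illustrative assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist

def power_two_sided(theta, theta0, sigma, n, alpha):
    """Power of Test 3: reject if |xbar - theta0| > z_{alpha/2} * sigma / sqrt(n)."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    d = (theta0 - theta) / (sigma / n ** 0.5)
    return std.cdf(d - z) + 1 - std.cdf(d + z)

# Power on a grid of theta values around theta0 = 0 (sigma = 1, n = 25, alpha = .05).
grid = [power_two_sided(th / 10, 0.0, 1.0, 25, 0.05) for th in range(-20, 21)]
```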


8.45 The verification of size $\alpha$ is the same computation as in Exercise 8.37a. Example 8.3.3 shows that the power function $\beta_m(\theta)$ for each of these tests is an increasing function. So for $\theta > \theta_0$, $\beta_m(\theta) > \beta_m(\theta_0) = \alpha$. Hence, the tests are all unbiased.

8.47 a. This is very similar to the argument for Exercise 8.41.

b. By an argument similar to part (a), this LRT rejects $H_0^+$ if

$$T^+ = \frac{\bar X-\bar Y-\delta}{\sqrt{S_p^2\left(\frac1n+\frac1m\right)}} \le -t_{n+m-2,\alpha}.$$

c. Because $H_0$ is the union of $H_0^+$ and $H_0^-$, by the IUT method of Theorem 8.3.23 the test that rejects $H_0$ if the tests in parts (a) and (b) both reject is a level $\alpha$ test of $H_0$. That is, the test rejects $H_0$ if $T^+ \le -t_{n+m-2,\alpha}$ and $T^- \ge t_{n+m-2,\alpha}$.

d. Use Theorem 8.3.24. Consider parameter points with $\mu_X - \mu_Y = \delta$ and $\sigma \to 0$. For any $\sigma$, $P(T^+ \le -t_{n+m-2,\alpha}) = \alpha$. The power of the $T^-$ test is computed from the noncentral $t$ distribution with noncentrality parameter $|\mu_X-\mu_Y-(-\delta)|/[\sigma\sqrt{1/n+1/m}] = 2\delta/[\sigma\sqrt{1/n+1/m}]$, which converges to $\infty$ as $\sigma \to 0$. Thus, $P(T^- \ge t_{n+m-2,\alpha}) \to 1$ as $\sigma \to 0$. By Theorem 8.3.24, this IUT is a size $\alpha$ test of $H_0$.

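8.49 a. The p-value is

$$P\left(7 \text{ or more successes out of 10 Bernoulli trials}\,\middle|\,\theta = \tfrac12\right) = \binom{10}{7}\left(\tfrac12\right)^7\left(\tfrac12\right)^3 + \binom{10}{8}\left(\tfrac12\right)^8\left(\tfrac12\right)^2 + \binom{10}{9}\left(\tfrac12\right)^9\left(\tfrac12\right)^1 + \binom{10}{10}\left(\tfrac12\right)^{10}\left(\tfrac12\right)^0 = .171875.$$

b.

$$\text{p-value} = P\{X \ge 3|\lambda = 1\} = 1 - P(X < 3|\lambda = 1) = 1 - \left[\frac{e^{-1}1^2}{2!} + \frac{e^{-1}1^1}{1!} + \frac{e^{-1}1^0}{0!}\right] \approx .0803.$$

c.

$$\text{p-value} = P\Big\{\sum X_i \ge 9\,\Big|\,3\lambda = 3\Big\} = 1 - P(Y < 9|3\lambda = 3) = 1 - e^{-3}\left[\frac{3^8}{8!} + \frac{3^7}{7!} + \frac{3^6}{6!} + \frac{3^5}{5!} + \cdots + \frac{3^1}{1!} + \frac{3^0}{0!}\right] \approx .0038,$$

where $Y = \sum_{i=1}^3 X_i \sim \text{Poisson}(3\lambda)$.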

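8.50 From Exercise 7.26,

$$\pi(\theta|\mathbf x) = \sqrt{\frac{n}{2\pi\sigma^2}}\,e^{-n(\theta-\delta_\pm(\mathbf x))^2/(2\sigma^2)},$$

where $\delta_\pm(\mathbf x) = \bar x \pm \frac{\sigma^2}{na}$ and we use the "+" if $\theta > 0$ and the "−" if $\theta < 0$.

a. For $K > 0$,

$$P(\theta > K|\mathbf x, a) = \sqrt{\frac{n}{2\pi\sigma^2}}\int_K^\infty e^{-n(\theta-\delta_+(\mathbf x))^2/(2\sigma^2)}\,d\theta = P\left(Z > \frac{\sqrt n}{\sigma}[K-\delta_+(\mathbf x)]\right),$$

where $Z \sim n(0,1)$.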


b. As $a \to \infty$, $\delta_+(\mathbf x) \to \bar x$, so $P(\theta > K) \to P\left(Z > \frac{\sqrt n}{\sigma}(K-\bar x)\right)$.

c. For $K = 0$, the answer in part (b) is 1 − (p-value) for $H_0: \theta \le 0$.

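8.51 If $\alpha < p(\mathbf x)$,

$$\sup_{\theta\in\Theta_0}P(W(\mathbf X) \ge c_\alpha) = \alpha < p(\mathbf x) = \sup_{\theta\in\Theta_0}P(W(\mathbf X) \ge W(\mathbf x)).$$

Thus $W(\mathbf x) < c_\alpha$ and we could not reject $H_0$ at level $\alpha$ having observed $\mathbf x$. On the other hand, if $\alpha \ge p(\mathbf x)$,

$$\sup_{\theta\in\Theta_0}P(W(\mathbf X) \ge c_\alpha) = \alpha \ge p(\mathbf x) = \sup_{\theta\in\Theta_0}P(W(\mathbf X) \ge W(\mathbf x)).$$

Either $W(\mathbf x) \ge c_\alpha$, in which case we could reject $H_0$ at level $\alpha$ having observed $\mathbf x$, or $W(\mathbf x) < c_\alpha$. But, in the latter case we could use $c'_\alpha = W(\mathbf x)$ and have $\{\mathbf x': W(\mathbf x') \ge c'_\alpha\}$ define a rejection region of size $p(\mathbf x) \le \alpha$. Then we could reject $H_0$ at level $\alpha$ having observed $\mathbf x$.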

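8.53 a.

$$P(-\infty < \theta < \infty) = \frac12 + \frac12\,\frac{1}{\sqrt{2\pi\tau^2}}\int_{-\infty}^\infty e^{-\theta^2/(2\tau^2)}\,d\theta = \frac12 + \frac12 = 1.$$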

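b. First calculate the posterior density. Because

$$f(\bar x|\theta) = \frac{\sqrt n}{\sqrt{2\pi}\,\sigma}\,e^{-n(\bar x-\theta)^2/(2\sigma^2)},$$

we can calculate the marginal density as

$$\begin{aligned}m_\pi(\bar x) &= \frac12 f(\bar x|0) + \frac12\int_{-\infty}^\infty f(\bar x|\theta)\,\frac{1}{\sqrt{2\pi}\,\tau}\,e^{-\theta^2/(2\tau^2)}\,d\theta\\ &= \frac12\left[\frac{\sqrt n}{\sqrt{2\pi}\,\sigma}\,e^{-n\bar x^2/(2\sigma^2)} + \frac{1}{\sqrt{2\pi}\sqrt{(\sigma^2/n)+\tau^2}}\,e^{-\bar x^2/[2((\sigma^2/n)+\tau^2)]}\right]\end{aligned}$$

(see Exercise 7.22). Then $P(\theta = 0|\bar x) = \frac12 f(\bar x|0)/m_\pi(\bar x)$.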

P|¯

X|>¯xθ= 0= 1 −P|¯

X| ≤ ¯xθ= 0

= 1 −P−¯x≤¯

X≤¯xθ= 0= 2 1−Φ¯x/(σ/√n),

where Φ is the standard normal cdf.

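d. For $\sigma^2 = \tau^2 = 1$ and $n = 9$ we have a p-value of $2(1-\Phi(3\bar x))$ and

$$P(\theta = 0|\bar x) = \left(1 + \sqrt{\frac{1}{10}}\,e^{81\bar x^2/20}\right)^{-1}.$$

The p-value of $\bar x$ is usually smaller than the Bayes posterior probability except when $\bar x$ is very close to the $\theta$ value specified by $H_0$. The following table illustrates this.

Some p-values and posterior probabilities ($n = 9$)

$$\begin{array}{l|ccccccccc} \bar x & 0 & \pm.1 & \pm.15 & \pm.2 & \pm.5 & \pm.6533 & \pm.7 & \pm1 & \pm2\\ \hline \text{p-value of } \bar x & 1 & .7642 & .6528 & .5486 & .1336 & .05 & .0358 & .0026 & \approx 0\\ \text{posterior } P(\theta = 0|\bar x) & .7597 & .7523 & .7427 & .7290 & .5347 & .3595 & .3030 & .0522 & \approx 0\end{array}$$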
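The table can be regenerated from the two formulas in part (d). The sketch below computes the posterior probability directly as $f_0/(f_0+f_1)$ from the two normal marginals of part (b), which is algebraically the same expression. It uses `abs` so that negative $\bar x$ give the same row. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def p_value(xbar, n=9, sigma=1.0):
    """Two-sided p-value 2(1 - Phi(|xbar| / (sigma/sqrt(n)))) for H0: theta = 0."""
    return 2 * (1 - NormalDist().cdf(abs(xbar) * math.sqrt(n) / sigma))

def posterior_null(xbar, n=9, sigma2=1.0, tau2=1.0):
    """P(theta = 0 | xbar) for the prior with mass 1/2 at 0 and 1/2 spread as n(0, tau2)."""
    v0 = sigma2 / n         # variance of xbar when theta = 0
    v1 = sigma2 / n + tau2  # marginal variance of xbar when theta ~ n(0, tau2)
    f0 = math.exp(-xbar**2 / (2 * v0)) / math.sqrt(2 * math.pi * v0)
    f1 = math.exp(-xbar**2 / (2 * v1)) / math.sqrt(2 * math.pi * v1)
    return f0 / (f0 + f1)

for x in (0, 0.1, 0.15, 0.2, 0.5, 0.6533, 0.7, 1, 2):
    print(x, round(p_value(x), 4), round(posterior_null(x), 4))
```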


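8.54 a. From Exercise 7.22, the posterior distribution of $\theta|\mathbf x$ is normal with mean $[\tau^2/(\tau^2+\sigma^2/n)]\bar x$ and variance $\tau^2/(1+n\tau^2/\sigma^2)$. So

$$P(\theta \le 0|\mathbf x) = P\left(Z \le \frac{0 - [\tau^2/(\tau^2+\sigma^2/n)]\bar x}{\sqrt{\tau^2/(1+n\tau^2/\sigma^2)}}\right) = P\left(Z \le -\frac{\tau}{\sqrt{(\sigma^2/n)(\tau^2+\sigma^2/n)}}\,\bar x\right) = P\left(Z \ge \frac{\tau}{\sqrt{(\sigma^2/n)(\tau^2+\sigma^2/n)}}\,\bar x\right).$$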

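b. Using the fact that if $\theta = 0$, $\bar X \sim n(0, \sigma^2/n)$, the p-value is

$$P(\bar X \ge \bar x) = P\left(Z \ge \frac{\bar x - 0}{\sigma/\sqrt n}\right) = P\left(Z \ge \frac{1}{\sigma/\sqrt n}\,\bar x\right).$$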

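c. For $\sigma^2 = \tau^2 = 1$,

$$P(\theta \le 0|\mathbf x) = P\left(Z \ge \frac{1}{\sqrt{(1/n)(1+1/n)}}\,\bar x\right) \quad\text{and}\quad P(\bar X \ge \bar x) = P\left(Z \ge \frac{1}{\sqrt{1/n}}\,\bar x\right).$$

Because

$$\frac{1}{\sqrt{(1/n)(1+1/n)}} < \frac{1}{\sqrt{1/n}},$$

the Bayes probability is larger than the p-value if $\bar x \ge 0$. (Note: The inequality is in the opposite direction for $\bar x < 0$, but the primary interest would be in large values of $\bar x$.)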

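d. As $\tau^2 \to \infty$, the constant in the Bayes probability,

$$\frac{\tau}{\sqrt{(\sigma^2/n)(\tau^2+\sigma^2/n)}} = \frac{1}{\sqrt{(\sigma^2/n)(1+\sigma^2/(\tau^2n))}} \to \frac{1}{\sigma/\sqrt n},$$

is the constant in the p-value. So the indicated equality is true.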

8.55 The formulas for the risk functions are obtained from (8.3.14) using the power function $\beta(\theta) = \Phi(-z_\alpha + \theta_0 - \theta)$, where $\Phi$ is the standard normal cdf.

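8.57 For 0–1 loss by (8.3.12) the risk function for any test is the power function $\beta(\mu)$ for $\mu \le 0$ and $1-\beta(\mu)$ for $\mu > 0$. Let $\alpha = P(1 < Z < 2)$, the size of test $\delta$. By the Karlin-Rubin Theorem, the test $\delta_{z_\alpha}$ that rejects if $X > z_\alpha$ is also size $\alpha$ and is uniformly more powerful than $\delta$, that is, $\beta_{\delta_{z_\alpha}}(\mu) > \beta_\delta(\mu)$ for all $\mu > 0$. Hence,

$$R(\mu, \delta_{z_\alpha}) = 1 - \beta_{\delta_{z_\alpha}}(\mu) < 1 - \beta_\delta(\mu) = R(\mu, \delta), \quad\text{for all } \mu > 0.$$

Now reverse the roles of $H_0$ and $H_1$ and consider testing $H_0^*: \mu > 0$ versus $H_1^*: \mu \le 0$. Consider the test $\delta^*$ that rejects $H_0^*$ if $X \le 1$ or $X \ge 2$, and the test $\delta^*_{z_\alpha}$ that rejects $H_0^*$ if $X \le z_\alpha$. It is easily verified that for 0–1 loss $\delta$ and $\delta^*$ have the same risk functions, and $\delta^*_{z_\alpha}$ and $\delta_{z_\alpha}$ have the same risk functions. Furthermore, using the Karlin-Rubin Theorem as before, we can conclude that $\delta^*_{z_\alpha}$ is uniformly more powerful than $\delta^*$. Thus we have

$$R(\mu, \delta) = R(\mu, \delta^*) \ge R(\mu, \delta^*_{z_\alpha}) = R(\mu, \delta_{z_\alpha}), \quad\text{for all } \mu \le 0,$$

with strict inequality if $\mu < 0$. Thus, $\delta_{z_\alpha}$ is better than $\delta$.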

Chapter 9

Interval Estimation

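9.1 Denote $A = \{\mathbf x: L(\mathbf x) \le \theta\}$ and $B = \{\mathbf x: U(\mathbf x) \ge \theta\}$. Then $A \cap B = \{\mathbf x: L(\mathbf x) \le \theta \le U(\mathbf x)\}$ and $1 \ge P\{A \cup B\} = P\{L(\mathbf X) \le \theta \text{ or } \theta \le U(\mathbf X)\} \ge P\{L(\mathbf X) \le \theta \text{ or } \theta \le L(\mathbf X)\} = 1$, since $L(\mathbf x) \le U(\mathbf x)$. Therefore, $P(A \cap B) = P(A) + P(B) - P(A \cup B) = 1 - \alpha_1 + 1 - \alpha_2 - 1 = 1 - \alpha_1 - \alpha_2$.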

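9.3 a. The MLE of $\beta$ is $X_{(n)} = \max X_i$. Since $\beta$ is a scale parameter, $X_{(n)}/\beta$ is a pivot, and

$$.05 = P_\beta(X_{(n)}/\beta \le c) = P_\beta(\text{all } X_i \le c\beta) = \left(\frac{c\beta}{\beta}\right)^{\alpha_0 n} = c^{\alpha_0 n}$$

implies $c = .05^{1/(\alpha_0 n)}$. Thus, $.95 = P_\beta(X_{(n)}/\beta > c) = P_\beta(X_{(n)}/c > \beta)$, and $\{\beta: \beta < X_{(n)}/(.05^{1/(\alpha_0 n)})\}$ is a 95% upper confidence limit for $\beta$.

b. From 7.10, $\hat\alpha = 12.59$ and $X_{(n)} = 25$. So the confidence interval is $\left(0,\ 25/[.05^{1/(12.59\cdot14)}]\right) = (0, 25.43)$.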
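The upper limit in part (b) is a single expression. The minimal sketch below evaluates it with $\hat\alpha = 12.59$ plugged in for $\alpha_0$, as the text does. The function name is hypothetical.

```python
def upper_limit(x_max, alpha0, n, conf=0.95):
    """Upper confidence limit X_(n) / (1 - conf)**(1/(alpha0*n)) for beta (Exercise 9.3)."""
    return x_max / (1 - conf) ** (1 / (alpha0 * n))

print(round(upper_limit(25, 12.59, 14), 2))  # 25.43
```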

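9.4 a.

$$\lambda(\mathbf x,\mathbf y) = \frac{\sup_{\lambda=\lambda_0}L(\sigma_X^2,\sigma_Y^2|\mathbf x,\mathbf y)}{\sup_{\lambda\in(0,+\infty)}L(\sigma_X^2,\sigma_Y^2|\mathbf x,\mathbf y)}.$$

The unrestricted MLEs of $\sigma_X^2$ and $\sigma_Y^2$ are $\hat\sigma_X^2 = \sum X_i^2/n$ and $\hat\sigma_Y^2 = \sum Y_i^2/m$, as usual. Under the restriction $\lambda = \lambda_0$, $\sigma_Y^2 = \lambda_0\sigma_X^2$, and

$$\begin{aligned}L(\sigma_X^2, \lambda_0\sigma_X^2|\mathbf x,\mathbf y) &= (2\pi\sigma_X^2)^{-n/2}(2\pi\lambda_0\sigma_X^2)^{-m/2}e^{-\sum x_i^2/(2\sigma_X^2)}\cdot e^{-\sum y_i^2/(2\lambda_0\sigma_X^2)}\\ &= (2\pi\sigma_X^2)^{-(m+n)/2}\lambda_0^{-m/2}e^{-(\lambda_0\sum x_i^2+\sum y_i^2)/(2\lambda_0\sigma_X^2)}.\end{aligned}$$

Differentiating the log likelihood gives

$$\begin{aligned}\frac{d\log L}{d(\sigma_X^2)} &= \frac{d}{d\sigma_X^2}\left[-\frac{m+n}{2}\log\sigma_X^2 - \frac{m+n}{2}\log(2\pi) - \frac m2\log\lambda_0 - \frac{\lambda_0\sum x_i^2+\sum y_i^2}{2\lambda_0\sigma_X^2}\right]\\ &= -\frac{m+n}{2}(\sigma_X^2)^{-1} + \frac{\lambda_0\sum x_i^2+\sum y_i^2}{2\lambda_0}(\sigma_X^2)^{-2} \overset{\text{set}}{=} 0,\end{aligned}$$

which implies

$$\hat\sigma_0^2 = \frac{\lambda_0\sum x_i^2+\sum y_i^2}{\lambda_0(m+n)}.$$

To see this is a maximum, check the second derivative:

$$\begin{aligned}\frac{d^2\log L}{d(\sigma_X^2)^2}\bigg|_{\sigma_X^2=\hat\sigma_0^2} &= \left[\frac{m+n}{2}(\sigma_X^2)^{-2} - \frac{1}{\lambda_0}\left(\lambda_0\sum x_i^2+\sum y_i^2\right)(\sigma_X^2)^{-3}\right]_{\sigma_X^2=\hat\sigma_0^2}\\ &= -\frac{m+n}{2}(\hat\sigma_0^2)^{-2} < 0,\end{aligned}$$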


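therefore $\hat\sigma_0^2$ is the MLE. The LRT statistic is

$$\frac{(\hat\sigma_X^2)^{n/2}(\hat\sigma_Y^2)^{m/2}}{\lambda_0^{m/2}(\hat\sigma_0^2)^{(m+n)/2}},$$

and the test is: Reject $H_0$ if $\lambda(\mathbf x,\mathbf y) < k$, where $k$ is chosen to give the test size $\alpha$.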

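b. Under $H_0$, $\sum Y_i^2/(\lambda_0\sigma_X^2) \sim \chi^2_m$ and $\sum X_i^2/\sigma_X^2 \sim \chi^2_n$, independent. Also, we can write

$$\begin{aligned}\lambda(\mathbf X,\mathbf Y) &= \left(\frac{1}{\frac{n}{m+n} + \frac{(\sum Y_i^2/\lambda_0\sigma_X^2)/m}{(\sum X_i^2/\sigma_X^2)/n}\cdot\frac{m}{m+n}}\right)^{n/2}\left(\frac{1}{\frac{m}{m+n} + \frac{(\sum X_i^2/\sigma_X^2)/n}{(\sum Y_i^2/\lambda_0\sigma_X^2)/m}\cdot\frac{n}{m+n}}\right)^{m/2}\\ &= \left[\frac{1}{\frac{n}{n+m} + \frac{m}{m+n}F}\right]^{n/2}\left[\frac{1}{\frac{m}{m+n} + \frac{n}{m+n}F^{-1}}\right]^{m/2},\end{aligned}$$

where $F = \dfrac{\sum Y_i^2/(\lambda_0 m)}{\sum X_i^2/n} \sim F_{m,n}$ under $H_0$. The rejection region is

$$\left\{(\mathbf x,\mathbf y): \left[\frac{1}{\frac{n}{n+m} + \frac{m}{m+n}F}\right]^{n/2}\cdot\left[\frac{1}{\frac{m}{m+n} + \frac{n}{m+n}F^{-1}}\right]^{m/2} < c_\alpha\right\},$$

where $c_\alpha$ is chosen to satisfy

$$P\left\{\left[\frac{n}{n+m} + \frac{m}{m+n}F\right]^{-n/2}\left[\frac{m}{n+m} + \frac{n}{m+n}F^{-1}\right]^{-m/2} < c_\alpha\right\} = \alpha.$$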

c. To ease notation, let $a = m/(n+m)$ and $b = a\sum y_i^2/\sum x_i^2$. From the duality of hypothesis tests and confidence sets, the set

$$c(\lambda) = \left\{\lambda: \left(\frac{1}{a+b/\lambda}\right)^{n/2}\left(\frac{1}{(1-a) + \frac{a(1-a)}{b}\lambda}\right)^{m/2} \ge c_\alpha\right\}$$

is a $1-\alpha$ confidence set for $\lambda$. We now must establish that this set is indeed an interval. To do this, we establish that the function on the left-hand side of the inequality has only an interior maximum. That is, it looks like an upside-down bowl. Furthermore, it is straightforward to establish that the function is zero at both $\lambda = 0$ and $\lambda = \infty$. These facts imply that the set of $\lambda$ values for which the function is greater than or equal to $c_\alpha$ must be an interval. We make some further simplifications. If we multiply both sides of the inequality by $[(1-a)/b]^{m/2}$, we need be concerned with only the behavior of the function

$$h(\lambda) = \left(\frac{1}{a+b/\lambda}\right)^{n/2}\left(\frac{1}{b+a\lambda}\right)^{m/2}.$$

Moreover, since we are most interested in the sign of the derivative of $h$, this is the same as the sign of the derivative of $\log h$, which is much easier to work with. We have

$$\begin{aligned}\frac{d}{d\lambda}\log h(\lambda) &= \frac{d}{d\lambda}\left[-\frac n2\log(a+b/\lambda) - \frac m2\log(b+a\lambda)\right]\\ &= \frac n2\,\frac{b/\lambda^2}{a+b/\lambda} - \frac m2\,\frac{a}{b+a\lambda}\\ &= \frac{1}{2\lambda^2(a+b/\lambda)(b+a\lambda)}\left[-a^2m\lambda^2 + ab(n-m)\lambda + nb^2\right].\end{aligned}$$


The sign of the derivative is given by the expression in square brackets, a parabola. It is easy

to see that for λ≥0, the parabola changes sign from positive to negative. Since this is the

sign change of the derivative, the function must increase then decrease. Hence, the function

is an upside-down bowl, and the set is an interval.

9.5 a. Analogous to Example 9.2.5, the test here will reject $H_0$ if $T < k(p_0)$. Thus the confidence set is $C = \{p: T \ge k(p)\}$. Since $k(p)$ is nondecreasing, this gives an upper bound on $p$.

b. $k(p)$ is the integer that simultaneously satisfies

$$\sum_{y=k(p)}^n\binom ny p^y(1-p)^{n-y} \ge 1-\alpha \quad\text{and}\quad \sum_{y=k(p)+1}^n\binom ny p^y(1-p)^{n-y} < 1-\alpha.$$

9.6 a. For $Y = \sum X_i \sim \text{binomial}(n, p)$, the LRT statistic is

$$\lambda(y) = \frac{\binom ny p_0^y(1-p_0)^{n-y}}{\binom ny \hat p^y(1-\hat p)^{n-y}} = \left(\frac{p_0(1-\hat p)}{\hat p(1-p_0)}\right)^y\left(\frac{1-p_0}{1-\hat p}\right)^n,$$

where $\hat p = y/n$ is the MLE of $p$. The acceptance region is

$$A(p_0) = \left\{y: \left(\frac{p_0}{\hat p}\right)^y\left(\frac{1-p_0}{1-\hat p}\right)^{n-y} \ge k^*\right\},$$

where $k^*$ is chosen to satisfy $P_{p_0}(Y \in A(p_0)) = 1-\alpha$. Inverting the acceptance region to a confidence set, we have

$$C(y) = \left\{p: \left(\frac{p}{\hat p}\right)^y\left(\frac{1-p}{1-\hat p}\right)^{n-y} \ge k^*\right\}.$$

b. For given $n$ and observed $y$, write

$$C(y) = \left\{p: (n/y)^y(n/(n-y))^{n-y}p^y(1-p)^{n-y} \ge k^*\right\}.$$

This is clearly a highest density region. The endpoints of $C(y)$ are roots of the $n$th degree polynomial (in $p$), $(n/y)^y(n/(n-y))^{n-y}p^y(1-p)^{n-y} - k^*$. The interval of (10.4.4) is

$$\left\{p: \left|\frac{\hat p - p}{\sqrt{p(1-p)/n}}\right| \le z_{\alpha/2}\right\}.$$

The endpoints of this interval are the roots of the second degree polynomial (in $p$), $(\hat p - p)^2 - z_{\alpha/2}^2\,p(1-p)/n$. Typically, the second degree and $n$th degree polynomials will not have the same roots. Therefore, the two intervals are different. (Note that when $n \to \infty$ and $y \to \infty$, the density becomes symmetric (CLT). Then the two intervals are the same.)

9.7 These densities have already appeared in Exercise 8.8, where LRT statistics were calculated for testing $H_0: a = 1$.

a. Using the result of Exercise 8.8(a), the restricted MLE of $\theta$ (when $a = a_0$) is

$$\hat\theta_0 = \frac{-a_0 + \sqrt{a_0^2 + 4\sum x_i^2/n}}{2},$$

and the unrestricted MLEs are

$$\hat\theta = \bar x \quad\text{and}\quad \hat a = \frac{\sum(x_i-\bar x)^2}{n\bar x}.$$


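The LRT statistic is

$$\lambda(\mathbf x) = \left(\frac{\hat a\hat\theta}{a_0\hat\theta_0}\right)^{n/2}\frac{e^{-\frac{1}{2a_0\hat\theta_0}\sum(x_i-\hat\theta_0)^2}}{e^{-\frac{1}{2\hat a\hat\theta}\sum(x_i-\hat\theta)^2}} = \left(\frac{\hat a\hat\theta}{a_0\hat\theta_0}\right)^{n/2}e^{n/2}\,e^{-\frac{1}{2a_0\hat\theta_0}\sum(x_i-\hat\theta_0)^2}.$$

The rejection region of a size $\alpha$ test is $\{\mathbf x: \lambda(\mathbf x) \le c_\alpha\}$, and a $1-\alpha$ confidence set is $\{a_0: \lambda(\mathbf x) \ge c_\alpha\}$.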

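b. Using the results of Exercise 8.8b, the restricted MLE (for $a = a_0$) is found by solving

$$-a_0\theta^2 + [\hat\sigma^2 + (\bar x-\theta)^2] + \theta(\bar x-\theta) = 0,$$

yielding the MLE

$$\hat\theta_R = \frac{-\bar x + \sqrt{\bar x^2 + 4a_0(\hat\sigma^2+\bar x^2)}}{2a_0}.$$

The unrestricted MLEs are

$$\hat\theta = \bar x \quad\text{and}\quad \hat a = \frac{1}{n\bar x^2}\sum_{i=1}^n(x_i-\bar x)^2 = \frac{\hat\sigma^2}{\bar x^2},$$

yielding the LRT statistic

$$\lambda(\mathbf x) = \left(\frac{\hat\sigma}{\sqrt{a_0}\,\hat\theta_R}\right)^n e^{(n/2) - \sum(x_i-\hat\theta_R)^2/(2a_0\hat\theta_R^2)}.$$

The rejection region of a size $\alpha$ test is $\{\mathbf x: \lambda(\mathbf x) \le c_\alpha\}$, and a $1-\alpha$ confidence set is $\{a_0: \lambda(\mathbf x) \ge c_\alpha\}$.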

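9.9 Let $Z_1, \ldots, Z_n$ be iid with pdf $f(z)$.

a. For $X_i \sim f(x-\mu)$, $(X_1, \ldots, X_n) \sim (Z_1+\mu, \ldots, Z_n+\mu)$, and $\bar X - \mu \sim \bar Z + \mu - \mu = \bar Z$. The distribution of $\bar Z$ does not depend on $\mu$.

b. For $X_i \sim f(x/\sigma)/\sigma$, $(X_1, \ldots, X_n) \sim (\sigma Z_1, \ldots, \sigma Z_n)$, and $\bar X/\sigma \sim \sigma\bar Z/\sigma = \bar Z$. The distribution of $\bar Z$ does not depend on $\sigma$.

c. For $X_i \sim f((x-\mu)/\sigma)/\sigma$, $(X_1, \ldots, X_n) \sim (\sigma Z_1+\mu, \ldots, \sigma Z_n+\mu)$, and $(\bar X-\mu)/S_X \sim (\sigma\bar Z+\mu-\mu)/S_{\sigma Z+\mu} = \sigma\bar Z/(\sigma S_Z) = \bar Z/S_Z$. The distribution of $\bar Z/S_Z$ does not depend on $\mu$ or $\sigma$.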

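9.11 Recall that if $\theta$ is the true parameter, then $F_T(T|\theta) \sim \text{uniform}(0,1)$. Thus,

$$P_{\theta_0}(\{T: \alpha_1 \le F_T(T|\theta_0) \le 1-\alpha_2\}) = P(\alpha_1 \le U \le 1-\alpha_2) = 1 - \alpha_2 - \alpha_1,$$

where $U \sim \text{uniform}(0,1)$. Since

$$t \in \{t: \alpha_1 \le F_T(t|\theta) \le 1-\alpha_2\} \iff \theta \in \{\theta: \alpha_1 \le F_T(t|\theta) \le 1-\alpha_2\},$$

the same calculation shows that the interval has confidence $1 - \alpha_2 - \alpha_1$.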

9.12 If $X_1, \ldots, X_n \sim$ iid $n(\theta, \theta)$, then $\sqrt n(\bar X-\theta)/\sqrt\theta \sim n(0,1)$ and a $1-\alpha$ confidence interval is $\{\theta: |\sqrt n(\bar x-\theta)/\sqrt\theta| \le z_{\alpha/2}\}$. Solving for $\theta$, we get

$$\left\{\theta: n\theta^2 - \theta\left(2n\bar x + z_{\alpha/2}^2\right) + n\bar x^2 \le 0\right\} = \left\{\theta: \theta \in \left[2n\bar x + z_{\alpha/2}^2 \pm \sqrt{4n\bar x z_{\alpha/2}^2 + z_{\alpha/2}^4}\right]\Big/2n\right\}.$$

Simpler answers can be obtained using the $t$ pivot, $(\bar X-\theta)/(S/\sqrt n)$, or the $\chi^2$ pivot, $(n-1)S^2/\theta$.

(Tom Werhley of Texas A&M University notes the following: The largest probability of getting a negative discriminant (hence empty confidence interval) occurs when $\sqrt{n\theta} = \frac12 z_{\alpha/2}$, and the probability is equal to $\alpha/2$. The behavior of the intervals for negative values of $\bar x$ is also interesting. When $\bar x = 0$ the lefthand endpoint is also equal to 0, but when $\bar x < 0$, the lefthand endpoint is positive. Thus, the interval based on $\bar x = 0$ contains smaller values of $\theta$ than that based on $\bar x < 0$. The intervals get smaller as $\bar x$ decreases, finally becoming empty.)
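The quadratic-root interval can be computed directly. The sketch below returns `None` when the discriminant is negative (the empty-interval case in the note above). The test checks that each endpoint makes the pivot equal to $\pm z_{\alpha/2}$. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def ci_normal_theta_theta(xbar, n, alpha=0.05):
    """Set {theta: n*theta^2 - theta*(2n*xbar + z^2) + n*xbar^2 <= 0} for iid n(theta, theta)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    disc = 4 * n * xbar * z**2 + z**4
    if disc < 0:
        return None  # negative discriminant: the confidence set is empty
    center = 2 * n * xbar + z**2
    root = math.sqrt(disc)
    return (center - root) / (2 * n), (center + root) / (2 * n)
```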


9.13 a. For $Y = -(\log X)^{-1}$, the pdf of $Y$ is $f_Y(y) = \frac{\theta}{y^2}e^{-\theta/y}$, $0 < y < \infty$, and

$$P(Y/2 \le \theta \le Y) = \int_\theta^{2\theta}\frac{\theta}{y^2}e^{-\theta/y}\,dy = e^{-\theta/y}\Big|_\theta^{2\theta} = e^{-1/2} - e^{-1} = .239.$$

b. Since $f_X(x) = \theta x^{\theta-1}$, $0 < x < 1$, $T = X^\theta$ is a good guess at a pivot, and it is, since $f_T(t) = 1$, $0 < t < 1$. Thus a pivotal interval is formed from $P(a < X^\theta < b) = b - a$ and is

$$\left\{\theta: \frac{\log b}{\log x} \le \theta \le \frac{\log a}{\log x}\right\}.$$

Since $X^\theta \sim \text{uniform}(0,1)$, the interval will have confidence .239 as long as $b - a = .239$.

c. The interval in part a) is a special case of the one in part b). To find the best interval, we minimize $\log b - \log a$ subject to $b - a = 1-\alpha$, or $b = 1-\alpha+a$. Thus we want to minimize $\log(1-\alpha+a) - \log a = \log\left(1 + \frac{1-\alpha}{a}\right)$, which is minimized by taking $a$ as big as possible. Thus, take $b = 1$ and $a = \alpha$, and the best $1-\alpha$ pivotal interval is $\left\{\theta: 0 \le \theta \le \frac{\log\alpha}{\log x}\right\}$. Thus the interval in part a) is nonoptimal. A shorter interval with confidence coefficient .239 is $\{\theta: 0 \le \theta \le \log(1-.239)/\log(x)\}$.

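9.14 a. Recall the Bonferroni Inequality (1.2.9), $P(A_1 \cap A_2) \ge P(A_1) + P(A_2) - 1$. Let $A_1 = \{\text{interval covers } \mu\}$ and $A_2 = \{\text{interval covers } \sigma^2\}$. Use the interval (9.2.14), with $t_{n-1,\alpha/4}$, to get a $1-\alpha/2$ confidence interval for $\mu$. Use the interval after (9.2.14) with $b = \chi^2_{n-1,\alpha/4}$ and $a = \chi^2_{n-1,1-\alpha/4}$ to get a $1-\alpha/2$ confidence interval for $\sigma$. Then the natural simultaneous set is

$$C_a(\mathbf x) = \left\{(\mu,\sigma^2): \bar x - t_{n-1,\alpha/4}\frac{s}{\sqrt n} \le \mu \le \bar x + t_{n-1,\alpha/4}\frac{s}{\sqrt n} \text{ and } \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/4}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/4}}\right\}$$

and $P\big(C_a(\mathbf X) \text{ covers } (\mu,\sigma^2)\big) = P(A_1 \cap A_2) \ge P(A_1) + P(A_2) - 1 = 2(1-\alpha/2) - 1 = 1-\alpha$.

b. If we replace the $\mu$ interval in a) by $\left\{\mu: \bar x - \frac{k\sigma}{\sqrt n} \le \mu \le \bar x + \frac{k\sigma}{\sqrt n}\right\}$, then $\frac{\bar X-\mu}{\sigma/\sqrt n} \sim n(0,1)$, so we use $z_{\alpha/4}$ and

$$C_b(\mathbf x) = \left\{(\mu,\sigma^2): \bar x - z_{\alpha/4}\frac{\sigma}{\sqrt n} \le \mu \le \bar x + z_{\alpha/4}\frac{\sigma}{\sqrt n} \text{ and } \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/4}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/4}}\right\}$$

and $P\big(C_b(\mathbf X) \text{ covers } (\mu,\sigma^2)\big) \ge 2(1-\alpha/2) - 1 = 1-\alpha$.

c. The sets can be compared graphically in the $(\mu,\sigma)$ plane: $C_a$ is a rectangle, since $\mu$ and $\sigma^2$ are treated independently, while $C_b$ is a trapezoid, with larger $\sigma^2$ giving a longer interval. Their areas can also be calculated:

$$\text{Area of } C_a = \left[2t_{n-1,\alpha/4}\frac{s}{\sqrt n}\right]\left\{\sqrt{(n-1)s^2}\left(\frac{1}{\sqrt{\chi^2_{n-1,1-\alpha/4}}} - \frac{1}{\sqrt{\chi^2_{n-1,\alpha/4}}}\right)\right\},$$

$$\text{Area of } C_b = \left[z_{\alpha/4}\frac{s}{\sqrt n}\left(\sqrt{\frac{n-1}{\chi^2_{n-1,1-\alpha/4}}} + \sqrt{\frac{n-1}{\chi^2_{n-1,\alpha/4}}}\right)\right]\times\left\{\sqrt{(n-1)s^2}\left(\frac{1}{\sqrt{\chi^2_{n-1,1-\alpha/4}}} - \frac{1}{\sqrt{\chi^2_{n-1,\alpha/4}}}\right)\right\},$$

and compared numerically.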


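9.15 Fieller's Theorem says that a $1-\alpha$ confidence set for $\theta = \mu_Y/\mu_X$ is

$$\left\{\theta: \left(\bar x^2 - \frac{t^2_{n-1,\alpha/2}}{n-1}s_X^2\right)\theta^2 - 2\left(\bar x\bar y - \frac{t^2_{n-1,\alpha/2}}{n-1}s_{YX}\right)\theta + \left(\bar y^2 - \frac{t^2_{n-1,\alpha/2}}{n-1}s_Y^2\right) \le 0\right\}.$$

a. Define $a = \bar x^2 - ts_X^2$, $b = \bar x\bar y - ts_{YX}$, $c = \bar y^2 - ts_Y^2$, where $t = \frac{t^2_{n-1,\alpha/2}}{n-1}$. Then the parabola opens upward if $a > 0$. Furthermore, if $a > 0$, then there always exists at least one real root. This follows from the fact that at $\theta = \bar y/\bar x$, the value of the function is negative. For $\theta = \bar y/\bar x$ we have

$$\begin{aligned}\left(\bar x^2 - ts_X^2\right)\left(\frac{\bar y}{\bar x}\right)^2 - 2\left(\bar x\bar y - ts_{XY}\right)\frac{\bar y}{\bar x} + \bar y^2 - ts_Y^2 &= -t\left[\frac{\bar y^2}{\bar x^2}s_X^2 - 2\frac{\bar y}{\bar x}s_{XY} + s_Y^2\right]\\ &= -\frac{t}{n-1}\left[\sum_{i=1}^n\left(\frac{\bar y^2}{\bar x^2}(x_i-\bar x)^2 - 2\frac{\bar y}{\bar x}(x_i-\bar x)(y_i-\bar y) + (y_i-\bar y)^2\right)\right]\\ &= -\frac{t}{n-1}\left[\sum_{i=1}^n\left(\frac{\bar y}{\bar x}(x_i-\bar x) - (y_i-\bar y)\right)^2\right],\end{aligned}$$

which is negative.

b. The parabola opens downward if $a < 0$, that is, if $\bar x^2 < ts_X^2$. This will happen if the test of $H_0: \mu_X = 0$ accepts $H_0$ at level $\alpha$.

c. The parabola has no real roots if $b^2 < ac$. This can only occur if $a < 0$.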

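9.16 a. The LRT (see Example 8.2.1) has rejection region $\{\mathbf x: |\bar x-\theta_0| > z_{\alpha/2}\sigma/\sqrt n\}$, acceptance region $A(\theta_0) = \{\mathbf x: -z_{\alpha/2}\sigma/\sqrt n \le \bar x-\theta_0 \le z_{\alpha/2}\sigma/\sqrt n\}$, and $1-\alpha$ confidence interval $C(\mathbf x) = \{\theta: \bar x - z_{\alpha/2}\sigma/\sqrt n \le \theta \le \bar x + z_{\alpha/2}\sigma/\sqrt n\}$.

b. We have a UMP test with rejection region $\{\mathbf x: \bar x-\theta_0 < -z_\alpha\sigma/\sqrt n\}$, acceptance region $A(\theta_0) = \{\mathbf x: \bar x-\theta_0 \ge -z_\alpha\sigma/\sqrt n\}$, and $1-\alpha$ confidence interval $C(\mathbf x) = \{\theta: \bar x + z_\alpha\sigma/\sqrt n \ge \theta\}$.

c. Similar to b), the UMP test has rejection region $\{\mathbf x: \bar x-\theta_0 > z_\alpha\sigma/\sqrt n\}$, acceptance region $A(\theta_0) = \{\mathbf x: \bar x-\theta_0 \le z_\alpha\sigma/\sqrt n\}$, and $1-\alpha$ confidence interval $C(\mathbf x) = \{\theta: \bar x - z_\alpha\sigma/\sqrt n \le \theta\}$.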

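9.17 a. Since $X - \theta \sim \text{uniform}(-1/2, 1/2)$, $P(a \le X-\theta \le b) = b - a$. Any $a$ and $b$ satisfying $b = a + 1 - \alpha$ will do. One choice is $a = -\frac12 + \frac\alpha2$, $b = \frac12 - \frac\alpha2$.

b. Since $T = X/\theta$ has pdf $f(t) = 2t$, $0 \le t \le 1$,

$$P(a \le X/\theta \le b) = \int_a^b 2t\,dt = b^2 - a^2.$$

Any $a$ and $b$ satisfying $b^2 = a^2 + 1 - \alpha$ will do. One choice is $a = \sqrt{\alpha/2}$, $b = \sqrt{1-\alpha/2}$.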

9.18 a. Pp(X= 1) = 3

1p1(1 −p)3−1= 3p(1 −p)2, maximum at p= 1/3.

Pp(X= 2) = 3

2p2(1 −p)3−2= 3p2(1 −p), maximum at p= 2/3.

b. P(X= 0) = 3

0p0(1 −p)3−0= (1 −p)3, and this is greater than P(X= 2) if (1 −p)2>3p2,

or 2p2+ 2p−1<0. At p= 1/3, 2p2+ 2p−1 = −1/9.

c. To show that this is a 1 −α=.442 interval, compare with the interval in Example 9.2.11.

There are only two discrepancies. For example,

P(p∈interval |.362 < p < .634) = P(X= 1 or X= 2) > .442

by comparison with Sterne’s procedure, which is given by


$$\begin{array}{c|c} x & \text{interval}\\ \hline 0 & [.000, .305)\\ 1 & [.305, .634)\\ 2 & [.362, .762)\\ 3 & [.695, 1] \end{array}$$

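9.19 For $F_T(t|\theta)$ increasing in $\theta$, there are unique values $\theta_U(t)$ and $\theta_L(t)$ such that $F_T(t|\theta) < 1 - \frac\alpha2$ if and only if $\theta < \theta_U(t)$ and $F_T(t|\theta) > \frac\alpha2$ if and only if $\theta > \theta_L(t)$. Hence,

$$\begin{aligned}P(\theta_L(T) \le \theta \le \theta_U(T)) &= P(\theta \le \theta_U(T)) - P(\theta \le \theta_L(T))\\ &= P\left(F_T(T) \le 1 - \frac\alpha2\right) - P\left(F_T(T) \le \frac\alpha2\right)\\ &= 1 - \alpha.\end{aligned}$$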

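9.21 To construct a $1-\alpha$ confidence interval for $p$ of the form $\{p: \ell \le p \le u\}$ with $P(\ell \le p \le u) = 1-\alpha$, we use the method of Theorem 9.2.12. We must solve for $\ell$ and $u$ in the equations

$$(1)\quad \frac\alpha2 = \sum_{k=0}^x\binom nk u^k(1-u)^{n-k} \qquad\text{and}\qquad (2)\quad \frac\alpha2 = \sum_{k=x}^n\binom nk \ell^k(1-\ell)^{n-k}.$$

In equation (1), $\alpha/2 = P(K \le x) = P(Y \le 1-u)$, where $Y \sim \text{beta}(n-x, x+1)$ and $K \sim \text{binomial}(n, u)$. This is Exercise 2.40. Let $Z \sim F_{2(n-x),2(x+1)}$ and $c = (n-x)/(x+1)$. By Theorem 5.3.8c, $cZ/(1+cZ) \sim \text{beta}(n-x, x+1) \sim Y$. So we want

$$\frac\alpha2 = P\left(\frac{cZ}{1+cZ} \le 1-u\right) = P\left(\frac1Z \ge \frac{cu}{1-u}\right).$$

From Theorem 5.3.8a, $1/Z \sim F_{2(x+1),2(n-x)}$. So we need $cu/(1-u) = F_{2(x+1),2(n-x),\alpha/2}$. Solving for $u$ yields

$$u = \frac{\frac{x+1}{n-x}F_{2(x+1),2(n-x),\alpha/2}}{1 + \frac{x+1}{n-x}F_{2(x+1),2(n-x),\alpha/2}}.$$

A similar manipulation on equation (2) yields the value for $\ell$.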
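The standard library has no $F$ quantiles. As an alternative, equations (1) and (2) can be solved directly by bisection on the binomial tail sums. This is a swapped-in root-finding technique, not the $F$-table route used above. It targets the same equations, which define the interval often called the Clopper-Pearson interval. The helper names are hypothetical.

```python
import math

def binom_cdf(x, n, p):
    """P(K <= x) for K ~ binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def bisect(f, lo, hi, tol=1e-12):
    """Root of a monotone f on [lo, hi] when f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_interval(x, n, alpha=0.05):
    """Solve equations (1) and (2) of Exercise 9.21 numerically for (lower, upper)."""
    # (1): alpha/2 = P(K <= x | u); the left side decreases in u.
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    # (2): alpha/2 = P(K >= x | l) = 1 - P(K <= x - 1 | l); increases in l.
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper
```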

9.23 a. The LRT statistic for $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \ne \lambda_0$ is

$$g(y) = e^{-n\lambda_0}(n\lambda_0)^y\big/e^{-n\hat\lambda}(n\hat\lambda)^y,$$

where $Y = \sum X_i \sim \text{Poisson}(n\lambda)$ and $\hat\lambda = y/n$. The acceptance region for this test is $A(\lambda_0) = \{y: g(y) > c(\lambda_0)\}$, where $c(\lambda_0)$ is chosen so that $P(Y \in A(\lambda_0)) \ge 1-\alpha$. $g(y)$ is a unimodal function of $y$, so $A(\lambda_0)$ is an interval of $y$ values. Consider constructing $A(\lambda_0)$ for each $\lambda_0 > 0$. Then, for a fixed $y$, there will be a smallest $\lambda_0$, call it $a(y)$, and a largest $\lambda_0$, call it $b(y)$, such that $y \in A(\lambda_0)$. The confidence interval for $\lambda$ is then $C(y) = (a(y), b(y))$. The values $a(y)$ and $b(y)$ are not expressible in closed form. They can be determined by a numerical search, constructing $A(\lambda_0)$ for different values of $\lambda_0$ and determining those values for which $y \in A(\lambda_0)$. (Jay Beder of the University of Wisconsin, Milwaukee, reminds us that since $c$ is a function of $\lambda$, the resulting confidence set need not be a highest density region of a likelihood function. This is an example of the effect of the imposition of one type of inference (frequentist) on another theory (likelihood).)

b. The procedure in part a) was carried out for $y = 558$ and the confidence interval was found to be $(57.78, 66.45)$. For the confidence interval in Example 9.2.15, we need the values $\chi^2_{1116,.95} = 1039.444$ and $\chi^2_{1118,.05} = 1196.899$. This confidence interval is $(1039.444/18, 1196.899/18) = (57.75, 66.49)$. The two confidence intervals are virtually the same.
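Without chi-squared quantiles, the Example 9.2.15 interval can be recovered from the Poisson and chi-squared tail identity. $P(Y \ge y) = P(\chi^2_{2y} \le 2n\lambda)$ gives the lower limit, and $P(Y \le y) = P(\chi^2_{2(y+1)} > 2n\lambda)$ gives the upper limit. Each is found by bisection. The sketch below assumes $n = 9$, which we inferred from the divisor $18 = 2n$, and $\alpha = .10$, which we inferred from the .95 and .05 subscripts. For these inputs the text reports $(57.75, 66.49)$. The bracket and helper names are hypothetical.

```python
import math

def poisson_cdf(y, mu):
    """P(Y <= y) for Y ~ Poisson(mu); each term is formed on the log scale to avoid overflow."""
    log_mu = math.log(mu)
    return sum(math.exp(k * log_mu - mu - math.lgamma(k + 1)) for k in range(y + 1))

def bisect(f, lo, hi, tol=1e-10):
    """Root of a monotone f on [lo, hi] when f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_exact_interval(y, n, alpha):
    """Equal-tailed interval for lambda when Y = sum(X_i) ~ Poisson(n * lambda)."""
    top = 2 * (y + 10) / n  # hypothetical bracket, well above the upper limit
    # Lower limit: P(Y >= y | n*lambda) = alpha/2.
    lower = 0.0 if y == 0 else bisect(
        lambda lam: 1 - poisson_cdf(y - 1, n * lam) - alpha / 2, 1e-12, top)
    # Upper limit: P(Y <= y | n*lambda) = alpha/2.
    upper = bisect(lambda lam: poisson_cdf(y, n * lam) - alpha / 2, 1e-12, top)
    return lower, upper

lo, hi = poisson_exact_interval(558, 9, 0.10)
print(round(lo, 2), round(hi, 2))
```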


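9.25 The confidence interval derived by the method of Section 9.2.3 is

$$C(y) = \left\{\mu : y + \frac{1}{n}\log\frac{\alpha}{2} \le \mu \le y + \frac{1}{n}\log\left(1-\frac{\alpha}{2}\right)\right\},$$

where $y = \min_i x_i$. The LRT method derives its interval from the test of $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$. Since $Y$ is sufficient for $\mu$, we can use $f_Y(y|\mu)$. We have

$$\lambda(y) = \frac{\sup_{\mu=\mu_0} L(\mu|y)}{\sup_{\mu\in(-\infty,\infty)} L(\mu|y)} = \frac{n e^{-n(y-\mu_0)} I_{[\mu_0,\infty)}(y)}{n e^{-n(y-y)} I_{[y,\infty)}(y)} = e^{-n(y-\mu_0)} I_{[\mu_0,\infty)}(y) = \begin{cases} 0 & \text{if } y < \mu_0 \\ e^{-n(y-\mu_0)} & \text{if } y \ge \mu_0. \end{cases}$$

We reject $H_0$ if $\lambda(y) = e^{-n(y-\mu_0)} < c_\alpha$, where $0 \le c_\alpha \le 1$ is chosen to give the test level $\alpha$. To determine $c_\alpha$, set

$$\alpha = P\{\text{reject } H_0 \,|\, \mu=\mu_0\} = P\left\{Y > \mu_0 - \frac{\log c_\alpha}{n} \text{ or } Y < \mu_0 \,\middle|\, \mu=\mu_0\right\} = P\left\{Y > \mu_0 - \frac{\log c_\alpha}{n} \,\middle|\, \mu=\mu_0\right\}$$
$$= \int_{\mu_0 - (\log c_\alpha)/n}^{\infty} n e^{-n(y-\mu_0)}\,dy = -e^{-n(y-\mu_0)}\Big|_{\mu_0-(\log c_\alpha)/n}^{\infty} = e^{\log c_\alpha} = c_\alpha.$$

Therefore, $c_\alpha = \alpha$ and the $1-\alpha$ confidence interval is

$$C(y) = \left\{\mu : \mu \le y \le \mu - \frac{\log\alpha}{n}\right\} = \left\{\mu : y + \frac{1}{n}\log\alpha \le \mu \le y\right\}.$$

To use the pivotal method, note that since $\mu$ is a location parameter, a natural pivotal quantity is $Z = Y - \mu$. Then $f_Z(z) = n e^{-nz} I_{(0,\infty)}(z)$. Let $P\{a \le Z \le b\} = 1-\alpha$, where $a$ and $b$ satisfy

$$\frac{\alpha}{2} = \int_0^a n e^{-nz}\,dz = -e^{-nz}\Big|_0^a = 1 - e^{-na} \;\Rightarrow\; e^{-na} = 1 - \frac{\alpha}{2} \;\Rightarrow\; a = -\frac{1}{n}\log\left(1-\frac{\alpha}{2}\right),$$
$$\frac{\alpha}{2} = \int_b^\infty n e^{-nz}\,dz = -e^{-nz}\Big|_b^\infty = e^{-nb} \;\Rightarrow\; -nb = \log\frac{\alpha}{2} \;\Rightarrow\; b = -\frac{1}{n}\log\frac{\alpha}{2}.$$

Thus, the pivotal interval is $Y + \log(\alpha/2)/n \le \mu \le Y + \log(1-\alpha/2)/n$, the same interval as from Example 9.2.13. To compare the intervals we compare their lengths. We have

$$\text{Length of LRT interval} = y - \left(y + \frac{1}{n}\log\alpha\right) = -\frac{1}{n}\log\alpha,$$
$$\text{Length of pivotal interval} = y + \frac{1}{n}\log(1-\alpha/2) - \left(y + \frac{1}{n}\log(\alpha/2)\right) = \frac{1}{n}\log\frac{1-\alpha/2}{\alpha/2}.$$

Thus, the LRT interval is shorter if $-\log\alpha < \log[(1-\alpha/2)/(\alpha/2)]$. This is always satisfied, since it is equivalent to $1/\alpha < (2-\alpha)/\alpha$, that is, to $\alpha < 1$.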
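As a numerical sanity check on 9.25, here is a minimal Python sketch. It assumes, consistent with $f_Y(y|\mu) = ne^{-n(y-\mu)}$ above, that $Y - \mu$ is exponential with rate $n$. It compares the length and Monte Carlo coverage of the LRT interval and the equal-tail pivotal interval. The function names are illustrative only.

```python
import math
import random

def lrt_interval(y, n, alpha):
    """LRT interval: y + log(alpha)/n <= mu <= y."""
    return (y + math.log(alpha) / n, y)

def pivotal_interval(y, n, alpha):
    """Equal-tail pivotal interval: y + log(alpha/2)/n <= mu <= y + log(1 - alpha/2)/n."""
    return (y + math.log(alpha / 2) / n, y + math.log(1 - alpha / 2) / n)

def coverage(interval, mu, n, alpha, reps=20000, seed=1):
    """Monte Carlo coverage; Y - mu is exponential with rate n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = mu + rng.expovariate(n)
        lo, hi = interval(y, n, alpha)
        if lo <= mu <= hi:
            hits += 1
    return hits / reps

n, alpha, mu = 10, 0.05, 3.0
for name, f in [("LRT", lrt_interval), ("pivotal", pivotal_interval)]:
    lo, hi = f(0.0, n, alpha)          # the length does not depend on y
    print(name, round(hi - lo, 4), coverage(f, mu, n, alpha))
```

For $n = 10$ and $\alpha = .05$ the lengths are $-\log(.05)/10 \approx 0.2996$ and $\log(39)/10 \approx 0.3664$. Both intervals have coverage near .95, so the LRT interval is the better one.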

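9.27 a. $Y = \sum X_i \sim \text{gamma}(n, \lambda)$, and the posterior distribution of $\lambda$ is

$$\pi(\lambda|y) = \frac{\left(y + \frac{1}{b}\right)^{n+a}}{\Gamma(n+a)} \frac{1}{\lambda^{n+a+1}} e^{-\frac{1}{\lambda}\left(y+\frac{1}{b}\right)},$$

an $\text{IG}\left(n+a, \left(y+\frac{1}{b}\right)^{-1}\right)$. The Bayes HPD region is of the form $\{\lambda : \pi(\lambda|y) \ge k\}$, which is an interval since $\pi(\lambda|y)$ is unimodal. It thus has the form $\{\lambda : a_1(y) \le \lambda \le a_2(y)\}$, where $a_1$ and $a_2$ satisfy

$$\frac{1}{a_1^{n+a+1}} e^{-\frac{1}{a_1}\left(y+\frac{1}{b}\right)} = \frac{1}{a_2^{n+a+1}} e^{-\frac{1}{a_2}\left(y+\frac{1}{b}\right)},$$

together with the requirement that the interval have posterior probability $1-\alpha$.

b. The posterior distribution is $\text{IG}\left(((n-1)/2) + a, \left(((n-1)s^2/2) + 1/b\right)^{-1}\right)$. So the Bayes HPD region is as in part a) with these parameters replacing $n+a$ and $y + 1/b$.

c. As $a \to 0$ and $b \to \infty$, the condition on $a_1$ and $a_2$ becomes

$$\frac{1}{a_1^{((n-1)/2)+1}} e^{-\frac{1}{a_1}\frac{(n-1)s^2}{2}} = \frac{1}{a_2^{((n-1)/2)+1}} e^{-\frac{1}{a_2}\frac{(n-1)s^2}{2}}.$$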

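9.29 a. We know from Example 7.2.14 that if $\pi(p) \sim \text{beta}(a,b)$, the posterior is $\pi(p|y) \sim \text{beta}(y+a, n-y+b)$ for $y = \sum x_i$. So a $1-\alpha$ credible set for $p$ is

$$\{p : \beta_{y+a,n-y+b,1-\alpha/2} \le p \le \beta_{y+a,n-y+b,\alpha/2}\}.$$

b. Converting to an $F$ distribution, $\beta_{c,d} = \frac{(c/d)F_{2c,2d}}{1 + (c/d)F_{2c,2d}}$, the interval is

$$\frac{\frac{y+a}{n-y+b}F_{2(y+a),2(n-y+b),1-\alpha/2}}{1 + \frac{y+a}{n-y+b}F_{2(y+a),2(n-y+b),1-\alpha/2}} \le p \le \frac{\frac{y+a}{n-y+b}F_{2(y+a),2(n-y+b),\alpha/2}}{1 + \frac{y+a}{n-y+b}F_{2(y+a),2(n-y+b),\alpha/2}}$$

or, using the fact that $F_{m,n} = F^{-1}_{n,m}$,

$$\frac{1}{1 + \frac{n-y+b}{y+a}F_{2(n-y+b),2(y+a),\alpha/2}} \le p \le \frac{\frac{y+a}{n-y+b}F_{2(y+a),2(n-y+b),\alpha/2}}{1 + \frac{y+a}{n-y+b}F_{2(y+a),2(n-y+b),\alpha/2}}.$$

For this to match the interval of Exercise 9.21, we need $x = y$ and

Lower limit: $n - y + b = n - x + 1 \Rightarrow b = 1$, $\quad y + a = x \Rightarrow a = 0$
Upper limit: $y + a = x + 1 \Rightarrow a = 1$, $\quad n - y + b = n - x \Rightarrow b = 0$.

So no values of $a$ and $b$ will make the intervals match.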

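9.31 a. We continually use the fact that given $Y = y$, $\chi^2_{2y}$ is a central $\chi^2$ random variable with $2y$ degrees of freedom. Hence

$$E\chi^2_{2Y} = E[E(\chi^2_{2Y}|Y)] = E\,2Y = 2\lambda,$$
$$\text{Var}\,\chi^2_{2Y} = E[\text{Var}(\chi^2_{2Y}|Y)] + \text{Var}[E(\chi^2_{2Y}|Y)] = E[4Y] + \text{Var}[2Y] = 4\lambda + 4\lambda = 8\lambda,$$
$$\text{mgf} = E e^{t\chi^2_{2Y}} = E[E(e^{t\chi^2_{2Y}}|Y)] = E\left(\frac{1}{1-2t}\right)^Y = \sum_{y=0}^\infty \frac{e^{-\lambda}\left(\frac{\lambda}{1-2t}\right)^y}{y!} = e^{-\lambda + \frac{\lambda}{1-2t}}.$$

From Theorem 2.3.15, the mgf of $(\chi^2_{2Y} - 2\lambda)/\sqrt{8\lambda}$ is

$$e^{-t\sqrt{\lambda/2}}\left[e^{-\lambda + \frac{\lambda}{1 - t/\sqrt{2\lambda}}}\right].$$

The log of this is

$$-\sqrt{\lambda/2}\,t - \lambda + \frac{\lambda}{1 - t/\sqrt{2\lambda}} = \frac{t^2\sqrt{\lambda}}{-t\sqrt{2} + 2\sqrt{\lambda}} = \frac{t^2}{-(t\sqrt{2}/\sqrt{\lambda}) + 2} \to t^2/2 \text{ as } \lambda \to \infty,$$

so the mgf converges to $e^{t^2/2}$, the mgf of a standard normal.

b. Since $P(\chi^2_{2Y} \le \chi^2_{2Y,\alpha}) = \alpha$ for all $\lambda$,

$$\frac{\chi^2_{2Y,\alpha} - 2\lambda}{\sqrt{8\lambda}} \to z_\alpha \text{ as } \lambda \to \infty.$$

In standardizing (9.2.22), the upper bound is

$$\frac{\frac{nb}{nb+1}\chi^2_{2(Y+a),\alpha/2} - 2\lambda}{\sqrt{8\lambda}} = \sqrt{\frac{8(\lambda+a)}{8\lambda}}\left[\frac{\frac{nb}{nb+1}\left[\chi^2_{2(Y+a),\alpha/2} - 2(\lambda+a)\right]}{\sqrt{8(\lambda+a)}} + \frac{\frac{nb}{nb+1}2(\lambda+a) - 2\lambda}{\sqrt{8(\lambda+a)}}\right].$$

The first quantity in square brackets converges to the finite limit $\frac{nb}{nb+1}z_{\alpha/2}$, while the second one has limit

$$\lim_{\lambda\to\infty} \frac{-2\left[\frac{1}{nb+1}\lambda - a\frac{nb}{nb+1}\right]}{\sqrt{8(\lambda+a)}} = -\infty,$$

so the coverage probability goes to zero.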
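The algebra in part a can be checked numerically. The sketch below evaluates the log mgf $-\sqrt{\lambda/2}\,t - \lambda + \lambda/(1 - t/\sqrt{2\lambda})$ directly, for $t < \sqrt{2\lambda}$. It compares the result with the simplified form $t^2/(2 - t\sqrt{2}/\sqrt{\lambda})$ and with the limit $t^2/2$. This is an illustration only, and the function names are ours.

```python
import math

def log_mgf(t, lam):
    """Log mgf of (chi2_{2Y} - 2*lam)/sqrt(8*lam) with Y ~ Poisson(lam); needs t < sqrt(2*lam)."""
    s = t / math.sqrt(2 * lam)      # 2 * t / sqrt(8*lam) simplifies to t / sqrt(2*lam)
    return -t * math.sqrt(lam / 2) - lam + lam / (1 - s)

def simplified(t, lam):
    """The simplified form t^2 / (2 - t*sqrt(2)/sqrt(lam))."""
    return t ** 2 / (2 - t * math.sqrt(2) / math.sqrt(lam))

t = 0.5
for lam in (10, 100, 10_000, 1_000_000):
    print(lam, log_mgf(t, lam), simplified(t, lam), t ** 2 / 2)
```

The direct and simplified forms agree to rounding error, and both approach $t^2/2 = 0.125$ as $\lambda$ grows.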

9.33 a. Since $0 \in C_a(x)$ for every $x$, $P(0 \in C_a(X)|\mu = 0) = 1$. If $\mu > 0$,

$$P(\mu \in C_a(X)) = P(\mu \le \max\{0, X+a\}) = P(\mu \le X + a) \quad (\text{since } \mu > 0)$$
$$= P(Z \ge -a) \quad (Z \sim n(0,1))$$
$$= .95 \quad (a = 1.645).$$

A similar calculation holds for $\mu < 0$.

b. The credible probability is

$$\int_{\min(0,x-a)}^{\max(0,x+a)} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(\mu-x)^2}\,d\mu = \int_{\min(-x,-a)}^{\max(-x,a)} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}\,dt = P(\min(-x,-a) \le Z \le \max(-x,a)).$$

To evaluate this probability we have two cases:

(i) $|x| \le a \Rightarrow$ credible probability $= P(|Z| \le a)$
(ii) $|x| > a \Rightarrow$ credible probability $= P(-a \le Z \le |x|)$

Thus we see that for $a = 1.645$, the credible probability is equal to .90 if $|x| \le 1.645$ and increases to .95 as $|x| \to \infty$.
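The two cases in part b can be evaluated directly with Python's standard-library `statistics.NormalDist`. The sketch assumes, as the integrand above indicates, that the posterior of $\mu$ is $n(x, 1)$, and that $C_a(x) = [\min(0, x-a), \max(0, x+a)]$. The helper names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

def credible_probability(x, a=1.645):
    """P(min(-x, -a) <= Z <= max(-x, a)): posterior mass of C_a(x) when mu | x ~ n(x, 1)."""
    lo, hi = min(-x, -a), max(-x, a)
    return Z.cdf(hi) - Z.cdf(lo)

def coverage_positive_mu(a=1.645):
    """Frequentist coverage for mu > 0: P(Z >= -a)."""
    return 1 - Z.cdf(-a)

for x in (0.0, 1.0, 1.645, 2.5, 5.0, 10.0):
    print(x, round(credible_probability(x), 4))
print(round(coverage_positive_mu(), 4))
```

The output shows the contrast drawn in the exercise. The frequentist coverage is .95 for $\mu > 0$, while the credible probability is only .90 near $x = 0$ and rises toward .95 as $|x|$ grows.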

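9.34 a. A $1-\alpha$ confidence interval for $\mu$ is $\{\mu : \bar x - 1.96\sigma/\sqrt{n} \le \mu \le \bar x + 1.96\sigma/\sqrt{n}\}$. We need $2(1.96)\sigma/\sqrt{n} \le \sigma/4$, or $\sqrt{n} \ge 4(2)(1.96)$. Thus we need $n \ge 64(1.96)^2 \approx 245.9$. So $n = 246$ suffices.

b. The length of a 95% confidence interval is $2t_{n-1,.025}S/\sqrt{n}$. Thus we need

$$P\left(2t_{n-1,.025}\frac{S}{\sqrt{n}} \le \frac{\sigma}{4}\right) \ge .9 \;\Rightarrow\; P\left(4t^2_{n-1,.025}\frac{S^2}{n} \le \frac{\sigma^2}{16}\right) \ge .9$$
$$\Rightarrow\; P\left(\underbrace{\frac{(n-1)S^2}{\sigma^2}}_{\sim\chi^2_{n-1}} \le \frac{(n-1)n}{t^2_{n-1,.025}\cdot 64}\right) \ge .9.$$

We need to solve this numerically for the smallest $n$ that satisfies the inequality

$$\frac{(n-1)n}{t^2_{n-1,.025}\cdot 64} \ge \chi^2_{n-1,.1}.$$

Trying different values of $n$, we find that the smallest such $n$ is $n = 276$, for which

$$\frac{(n-1)n}{t^2_{n-1,.025}\cdot 64} = 306.0 \ge 305.5 = \chi^2_{n-1,.1}.$$

As is to be expected, this is somewhat larger than the value found in a).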

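9.35 If $\sigma$ is known, the length is $2z_{\alpha/2}\sigma/\sqrt{n}$. If $\sigma$ is unknown, $E(\text{length}) = 2t_{\alpha/2,n-1}\sigma/(c\sqrt{n})$, where

$$c = \frac{\sqrt{n-1}\,\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{2}\,\Gamma\left(\frac{n}{2}\right)}$$

and $E\,cS = \sigma$ (Exercise 7.50). Thus the difference in lengths is $(2\sigma/\sqrt{n})(z_{\alpha/2} - t_{\alpha/2,n-1}/c)$. A little work will show that, as $n \to \infty$, $c \to$ constant. (This can be done using Stirling's formula along with Lemma 2.3.14. In fact, some careful algebra will show that $c \to 1$ as $n \to \infty$.) Also, we know that, as $n \to \infty$, $t_{\alpha/2,n-1} \to z_{\alpha/2}$. Thus, the difference in lengths $(2\sigma/\sqrt{n})(z_{\alpha/2} - t_{\alpha/2,n-1}/c) \to 0$ as $n \to \infty$.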
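The claim that $c \to 1$ is easy to check numerically. The sketch below computes $c$ on the log scale with `math.lgamma` to avoid overflow. Since $E\,cS = \sigma$ means $c = \sigma/ES$, and $ES < \sigma$ by Jensen's inequality, every value should exceed 1. The function name is illustrative.

```python
import math

def c_factor(n):
    """c = sqrt(n-1) * Gamma((n-1)/2) / (sqrt(2) * Gamma(n/2)), computed via log-gamma."""
    log_c = (0.5 * math.log(n - 1) + math.lgamma((n - 1) / 2)
             - 0.5 * math.log(2) - math.lgamma(n / 2))
    return math.exp(log_c)

for n in (2, 5, 10, 100, 10_000):
    print(n, c_factor(n))
```

For $n = 2$ this gives $c = \sqrt{\pi/2} \approx 1.2533$. The values then decrease toward 1 as $n$ grows.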

9.36 The sample pdf is

$$f(x_1,\ldots,x_n|\theta) = \prod_{i=1}^n e^{i\theta - x_i} I_{(i\theta,\infty)}(x_i) = e^{\Sigma(i\theta - x_i)} I_{(\theta,\infty)}[\min(x_i/i)].$$

Thus $T = \min(X_i/i)$ is sufficient by the Factorization Theorem, and

$$P(T > t) = \prod_{i=1}^n P(X_i > it) = \prod_{i=1}^n \int_{it}^\infty e^{i\theta - x}\,dx = \prod_{i=1}^n e^{i(\theta - t)} = e^{-\frac{n(n+1)}{2}(t-\theta)},$$

and

$$f_T(t) = \frac{n(n+1)}{2} e^{-\frac{n(n+1)}{2}(t-\theta)}, \quad t \ge \theta.$$

Clearly, $\theta$ is a location parameter and $Y = T - \theta$ is a pivot. To find the shortest confidence interval of the form $[T+a, T+b]$, we must minimize $b - a$ subject to the constraint $P(-b \le Y \le -a) = 1-\alpha$. Now the pdf of $Y$ is strictly decreasing, so the interval length is shortest if $-b = 0$ and $a$ satisfies

$$P(0 \le Y \le -a) = 1 - e^{\frac{n(n+1)}{2}a} = 1 - \alpha.$$

So $a = 2\log(\alpha)/(n(n+1))$.
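A quick Monte Carlo check of the interval $[T + a, T]$ is given below. It assumes $X_i = i\theta + E_i$ with $E_i$ iid exponential(1), which matches the pdf $e^{i\theta - x_i}$ on $(i\theta, \infty)$. Because $P(0 \le Y \le -a) = 1 - e^{n(n+1)a/2}$, the constant $a = 2\log\alpha/(n(n+1))$ should give coverage $1-\alpha$, while $2\log(1-\alpha)/(n(n+1))$ would give only $\alpha$. The sketch compares the two. Names and parameter values are illustrative.

```python
import math
import random

def t_stat(xs):
    """Sufficient statistic T = min_i X_i / i, with i = 1, ..., n."""
    return min(x / i for i, x in enumerate(xs, start=1))

def coverage(a, theta, n, reps=20000, seed=2):
    """Monte Carlo coverage of [T + a, T] when X_i = i*theta + Exponential(1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [i * theta + rng.expovariate(1.0) for i in range(1, n + 1)]
        t = t_stat(xs)
        if t + a <= theta <= t:
            hits += 1
    return hits / reps

n, alpha, theta = 5, 0.10, 2.0
a_alpha = 2 * math.log(alpha) / (n * (n + 1))
a_one_minus = 2 * math.log(1 - alpha) / (n * (n + 1))
print(coverage(a_alpha, theta, n), coverage(a_one_minus, theta, n))
```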

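9.37 a. The density of $Y = X_{(n)}$ is $f_Y(y) = ny^{n-1}/\theta^n$, $0 < y < \theta$. So $\theta$ is a scale parameter, and $T = Y/\theta$ is a pivotal quantity. The pdf of $T$ is $f_T(t) = nt^{n-1}$, $0 \le t \le 1$.

b. A pivotal interval is formed from the set

$$\{\theta : a \le t \le b\} = \left\{\theta : a \le \frac{y}{\theta} \le b\right\} = \left\{\theta : \frac{y}{b} \le \theta \le \frac{y}{a}\right\},$$

and has length $Y(1/a - 1/b) = Y(b-a)/ab$. Since $f_T(t)$ is increasing, $b - a$ is minimized and $ab$ is maximized if $b = 1$. Thus the shortest interval will have $b = 1$ and $a$ satisfying $\alpha = \int_0^a nt^{n-1}\,dt = a^n \Rightarrow a = \alpha^{1/n}$. So the shortest $1-\alpha$ confidence interval is $\{\theta : y \le \theta \le y/\alpha^{1/n}\}$.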
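To see the result of part b concretely, the sketch below assumes $X_1, \ldots, X_n$ iid uniform$(0, \theta)$, as implied by $f_Y$. It compares the shortest interval $[y, y/\alpha^{1/n}]$ with an equal-tail interval, which is added here only for contrast. Both are checked by simulation, and the helper names are ours.

```python
import random

def shortest_interval(y, n, alpha):
    """Shortest 1 - alpha interval from the pivot T = Y/theta: y <= theta <= y / alpha**(1/n)."""
    return (y, y / alpha ** (1 / n))

def equal_tail_interval(y, n, alpha):
    """Equal-tail alternative, shown only for comparison."""
    return (y / (1 - alpha / 2) ** (1 / n), y / (alpha / 2) ** (1 / n))

def coverage(interval, theta, n, alpha, reps=20000, seed=3):
    """Monte Carlo coverage when X_1, ..., X_n are iid uniform(0, theta) and Y = X_(n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = max(rng.uniform(0, theta) for _ in range(n))
        lo, hi = interval(y, n, alpha)
        if lo <= theta <= hi:
            hits += 1
    return hits / reps

n, alpha, theta = 5, 0.05, 4.0
for f in (shortest_interval, equal_tail_interval):
    lo, hi = f(1.0, n, alpha)        # length per unit of y
    print(f.__name__, round(hi - lo, 4), coverage(f, theta, n, alpha))
```

With $n = 5$ and $\alpha = .05$, both intervals cover about 95% of the time. The length per unit of $y$ is roughly 0.82 for the shortest interval versus 1.09 for the equal-tail interval.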


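9.39 Let $a$ be such that $\int_{-\infty}^a f(x)\,dx = \alpha/2$. This value is unique for a unimodal pdf if $\alpha > 0$. Let $\mu$ be the point of symmetry and let $b = 2\mu - a$. Then $f(b) = f(a)$ and $\int_b^\infty f(x)\,dx = \alpha/2$. Also, $a \le \mu$, since $\int_{-\infty}^a f(x)\,dx = \alpha/2 \le 1/2 = \int_{-\infty}^\mu f(x)\,dx$. Similarly, $b \ge \mu$. And $f(b) = f(a) > 0$, since $f(a) \ge f(x)$ for all $x \le a$ and $\int_{-\infty}^a f(x)\,dx = \alpha/2 > 0 \Rightarrow f(x) > 0$ for some $x < a \Rightarrow f(a) > 0$. So the conditions of Theorem 9.3.2 are satisfied.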

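9.41 a. We show that for any interval $[a,b]$ and $\epsilon > 0$, the probability content of $[a-\epsilon, b-\epsilon]$ is greater (as long as $b - \epsilon > a$). Write

$$\int_a^b f(x)\,dx - \int_{a-\epsilon}^{b-\epsilon} f(x)\,dx = \int_{b-\epsilon}^b f(x)\,dx - \int_{a-\epsilon}^a f(x)\,dx$$
$$\le f(b-\epsilon)[b - (b-\epsilon)] - f(a)[a - (a-\epsilon)]$$
$$\le \epsilon[f(b-\epsilon) - f(a)] \le 0,$$

where all of the inequalities follow because $f(x)$ is decreasing. So moving the interval toward zero increases the probability, and it is therefore maximized by moving $a$ all the way to zero.

b. $T = Y - \mu$ is a pivot with decreasing pdf $f_T(t) = ne^{-nt}I_{[0,\infty)}(t)$. The shortest $1-\alpha$ interval on $T$ is $[0, -\frac{1}{n}\log\alpha]$, since

$$\int_0^b ne^{-nt}\,dt = 1 - \alpha \Rightarrow b = -\frac{1}{n}\log\alpha.$$

Since $a \le T \le b$ implies $Y - b \le \mu \le Y - a$, the best $1-\alpha$ interval on $\mu$ is $Y + \frac{1}{n}\log\alpha \le \mu \le Y$.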

9.43 a. Using Theorem 8.3.12, identify $g(t)$ with $f(x|\theta_1)$ and $f(t)$ with $f(x|\theta_0)$. Define $\phi(t) = 1$ if $t \in C$ and 0 otherwise, and let $\phi'$ be the indicator of any other set $C'$ satisfying $\int_{C'} f(t)\,dt \ge 1-\alpha$. Then $(\phi(t) - \phi'(t))(g(t) - \lambda f(t)) \le 0$ and

$$0 \ge \int(\phi - \phi')(g - \lambda f) = \int_C g - \int_{C'} g - \lambda\left[\int_C f - \int_{C'} f\right] \ge \int_C g - \int_{C'} g,$$

showing that $C$ is the best set.

b. For Exercise 9.37, the pivot $T = Y/\theta$ has density $nt^{n-1}$, and the pivotal interval $a \le T \le b$ results in the $\theta$ interval $Y/b \le \theta \le Y/a$. The length is proportional to $1/a - 1/b$, and thus $g(t) = 1/t^2$. The best set is $\{t : 1/t^2 \le \lambda nt^{n-1}\}$, which is a set of the form $\{t : a \le t \le 1\}$. This has probability content $1-\alpha$ if $a = \alpha^{1/n}$. For Exercise 9.24 (or Example 9.3.4), the $g$ function is the same and the density of the pivot is $f_k$, the density of a gamma$(k,1)$. The set $\{t : 1/t^2 \le \lambda f_k(t)\} = \{t : f_{k+2}(t) \ge \lambda'\}$, so the best $a$ and $b$ satisfy $\int_a^b f_k(t)\,dt = 1-\alpha$ and $f_{k+2}(a) = f_{k+2}(b)$.

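9.45 a. Since $Y = \sum X_i \sim \text{gamma}(n,\lambda)$ has MLR, the Karlin-Rubin Theorem (Theorem 8.3.2) shows that the UMP test is to reject $H_0$ if $Y < k(\lambda_0)$, where $P(Y < k(\lambda_0)|\lambda = \lambda_0) = \alpha$.

b. $T = 2Y/\lambda \sim \chi^2_{2n}$, so choose $k(\lambda_0) = \frac{1}{2}\lambda_0\chi^2_{2n,\alpha}$ and

$$\{\lambda : Y \ge k(\lambda)\} = \left\{\lambda : Y \ge \frac{1}{2}\lambda\chi^2_{2n,\alpha}\right\} = \left\{\lambda : 0 < \lambda \le 2Y/\chi^2_{2n,\alpha}\right\}$$

is the UMA confidence set.

c. The expected length is $E\left(\frac{2Y}{\chi^2_{2n,\alpha}}\right) = \frac{2n\lambda}{\chi^2_{2n,\alpha}}$.

d. $X_{(1)} \sim \text{exponential}(\lambda/n)$, so $EX_{(1)} = \lambda/n$. Thus

$$E(\text{length}(C^*)) = \frac{2\times 120}{251.046}\lambda = .956\lambda,$$
$$E(\text{length}(C_m)) = -\frac{\lambda}{120\times\log(.99)} = .829\lambda.$$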
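The two constants in part d are plain arithmetic and can be reproduced as below. As an extra consistency check, suppose $C_m = \{\lambda : 0 < \lambda \le -X_{(1)}/\log(.99)\}$, which is the form its expected length suggests but which is not stated here. Then its coverage would be $P(X_{(1)} \ge -\lambda\log(.99)) = .99^{120} \approx .30$.

```python
import math

n = 120
chi2_value = 251.046                          # chi-square value with 240 df quoted in the text
uma_ratio = 2 * n / chi2_value                # E(length(C*)) / lambda
madansky_ratio = -1 / (n * math.log(0.99))    # E(length(C_m)) / lambda
print(round(uma_ratio, 3), round(madansky_ratio, 3))

# P(X_(1) >= -lambda*log(.99)) when X_(1) ~ exponential(lambda/n)
print(round(0.99 ** n, 4))
```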


9.46 The proof is similar to that of Theorem 9.3.5:

$$P_\theta(\theta_0 \in C^*(X)) = P_\theta(X \in A^*(\theta_0)) \le P_\theta(X \in A(\theta_0)) = P_\theta(\theta_0 \in C(X)),$$

where $A$ and $C$ are any competitors. The inequality follows directly from Definition 8.3.11.

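9.47 Referring to (9.3.2), we want to show that for the upper confidence bound, $P_\theta(\theta_0 \in C) \le 1-\alpha$ if $\theta_0 \ge \theta$. We have

$$P_\theta(\theta_0 \in C) = P_\theta(\theta_0 \le \bar X + z_\alpha\sigma/\sqrt{n}).$$

Subtract $\theta$ from both sides and rearrange to get

$$P_\theta(\theta_0 \in C) = P_\theta\left(\frac{\theta_0 - \theta}{\sigma/\sqrt{n}} \le \frac{\bar X - \theta}{\sigma/\sqrt{n}} + z_\alpha\right) = P\left(Z \ge \frac{\theta_0 - \theta}{\sigma/\sqrt{n}} - z_\alpha\right),$$

which is at most $1-\alpha$ as long as $\theta_0 \ge \theta$. The solution for the lower confidence interval is similar.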

9.48 a. Start with the hypothesis test $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$. Arguing as in Example 8.2.4 and Exercise 8.47, we find that the LRT rejects $H_0$ if $(\bar X - \theta_0)/(S/\sqrt{n}) < -t_{n-1,\alpha}$. So the acceptance region is $\{x : (\bar x - \theta_0)/(s/\sqrt{n}) \ge -t_{n-1,\alpha}\}$ and the corresponding confidence set is $\{\theta : \bar x + t_{n-1,\alpha}s/\sqrt{n} \ge \theta\}$.

b. The test in part a) is the UMP unbiased test, so the interval is the UMA unbiased interval.

9.49 a. Clearly, for each $\sigma$, the conditional probability $P_{\theta_0}(\bar X > \theta_0 + z_\alpha\sigma/\sqrt{n}\,|\,\sigma) = \alpha$. Hence the test has unconditional size $\alpha$. The confidence set is $\{(\theta,\sigma) : \theta \ge \bar x - z_\alpha\sigma/\sqrt{n}\}$, which has confidence coefficient $1-\alpha$ conditionally and, hence, unconditionally.

b. From the Karlin-Rubin Theorem, the UMP test is to reject $H_0$ if $X > c$. To make this size $\alpha$,

$$P_{\theta_0}(X > c) = P_{\theta_0}(X > c|\sigma = 10)P(\sigma = 10) + P(X > c|\sigma = 1)P(\sigma = 1)$$
$$= pP\left(\frac{X-\theta_0}{10} > \frac{c-\theta_0}{10}\right) + (1-p)P(X - \theta_0 > c - \theta_0)$$
$$= pP\left(Z > \frac{c-\theta_0}{10}\right) + (1-p)P(Z > c - \theta_0),$$

where $Z \sim n(0,1)$. Without loss of generality take $\theta_0 = 0$. For $c = z_{(\alpha-p)/(1-p)}$ we have for the proposed test

$$P_{\theta_0}(\text{reject}) = p + (1-p)P\left(Z > z_{(\alpha-p)/(1-p)}\right) = p + (1-p)\frac{\alpha-p}{1-p} = p + \alpha - p = \alpha.$$

This is not UMP, but it is more powerful than the test in part a. To get UMP, solve for $c$ in $pP(Z > c/10) + (1-p)P(Z > c) = \alpha$, and the UMP test is to reject if $X > c$. For $p = 1/2$ and $\alpha = .05$, we get $c = 12.81$. If $\alpha = .1$ and $p = .05$, $c = 1.392$ and $z_{\frac{.1-.05}{.95}} = z_{.0526} = 1.62$.
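The UMP cutoff in part b is defined only implicitly, by $pP(Z > c/10) + (1-p)P(Z > c) = \alpha$. The minimal sketch below solves this equation by bisection using Python's standard-library `NormalDist`. The helper names and the bracketing interval are our own choices.

```python
from statistics import NormalDist

Z = NormalDist()

def size(c, p):
    """P(X > c) at theta0 = 0 when sigma = 10 with probability p and sigma = 1 otherwise."""
    return p * (1 - Z.cdf(c / 10)) + (1 - p) * (1 - Z.cdf(c))

def ump_cutoff(alpha, p, lo=-100.0, hi=100.0, tol=1e-10):
    """Bisection for size(c) = alpha; size(c) is decreasing in c."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if size(mid, p) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(ump_cutoff(0.05, 0.5), 3))                  # compare with c = 12.81 in the text
print(round(ump_cutoff(0.10, 0.05), 3))                 # compare with c = 1.392
print(round(Z.inv_cdf(1 - (0.10 - 0.05) / 0.95), 2))    # z_{.0526}
```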

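9.51
$$P_\theta(\theta \in C(X_1,\ldots,X_n)) = P_\theta(\bar X - k_1 \le \theta \le \bar X + k_2) = P_\theta(-k_2 \le \bar X - \theta \le k_1) = P_\theta\left(-k_2 \le \sum Z_i/n \le k_1\right),$$

where $Z_i = X_i - \theta$, $i = 1,\ldots,n$. Since this is a location family, for any $\theta$, $Z_1,\ldots,Z_n$ are iid with pdf $f(z)$, i.e., the $Z_i$s are pivots. So the last probability does not depend on $\theta$.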


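9.52 a. The LRT of $H_0: \sigma = \sigma_0$ versus $H_1: \sigma \ne \sigma_0$ is based on the statistic

$$\lambda(x) = \frac{\sup_{\mu,\sigma=\sigma_0} L(\mu,\sigma_0|x)}{\sup_{\mu,\sigma\in(0,\infty)} L(\mu,\sigma^2|x)}.$$

In the denominator, $\hat\sigma^2 = \sum(x_i - \bar x)^2/n$ and $\hat\mu = \bar x$ are the MLEs, while in the numerator, $\sigma_0^2$ and $\hat\mu$ are the MLEs. Thus

$$\lambda(x) = \frac{(2\pi\sigma_0^2)^{-n/2} e^{-\frac{\Sigma(x_i-\bar x)^2}{2\sigma_0^2}}}{(2\pi\hat\sigma^2)^{-n/2} e^{-\frac{\Sigma(x_i-\bar x)^2}{2\hat\sigma^2}}} = \left(\frac{\sigma_0^2}{\hat\sigma^2}\right)^{-n/2} \frac{e^{-\frac{\Sigma(x_i-\bar x)^2}{2\sigma_0^2}}}{e^{-n/2}},$$

and, writing $\hat\sigma^2 = [(n-1)/n]s^2$, the LRT rejects $H_0$ if

$$\left(\frac{\sigma_0^2}{\frac{n-1}{n}s^2}\right)^{-n/2} e^{-\frac{(n-1)s^2}{2\sigma_0^2}} < k_\alpha,$$

where $k_\alpha$ is chosen to give a size $\alpha$ test. If we denote $t = \frac{(n-1)s^2}{\sigma_0^2}$, then $T \sim \chi^2_{n-1}$ under $H_0$, and the test can be written: reject $H_0$ if $t^{n/2}e^{-t/2} < k'_\alpha$. Thus, a $1-\alpha$ confidence set is

$$\left\{\sigma^2 : t^{n/2}e^{-t/2} \ge k'_\alpha\right\} = \left\{\sigma^2 : \left(\frac{(n-1)s^2}{\sigma^2}\right)^{n/2} e^{-\frac{(n-1)s^2}{\sigma^2}/2} \ge k'_\alpha\right\}.$$

Note that the function $t^{n/2}e^{-t/2}$ is unimodal (it is the kernel of a gamma density), so it follows that the confidence set is of the form

$$\left\{\sigma^2 : t^{n/2}e^{-t/2} \ge k'_\alpha\right\} = \{\sigma^2 : a \le t \le b\} = \left\{\sigma^2 : a \le \frac{(n-1)s^2}{\sigma^2} \le b\right\} = \left\{\sigma^2 : \frac{(n-1)s^2}{b} \le \sigma^2 \le \frac{(n-1)s^2}{a}\right\},$$

where $a$ and $b$ satisfy $a^{n/2}e^{-a/2} = b^{n/2}e^{-b/2}$ (since they are points on the curve $t^{n/2}e^{-t/2}$). Since $\frac{n}{2} = \frac{n+2}{2} - 1$, $a$ and $b$ also satisfy

$$\frac{1}{\Gamma\left(\frac{n+2}{2}\right)2^{(n+2)/2}} a^{((n+2)/2)-1}e^{-a/2} = \frac{1}{\Gamma\left(\frac{n+2}{2}\right)2^{(n+2)/2}} b^{((n+2)/2)-1}e^{-b/2},$$

or, $f_{n+2}(a) = f_{n+2}(b)$.

b. The constants $a$ and $b$ must satisfy $f_{n-1}(b)b^2 = f_{n-1}(a)a^2$. But since $b^{((n-1)/2)-1}b^2 = b^{((n+3)/2)-1}$, after adjusting constants, this is equivalent to $f_{n+3}(b) = f_{n+3}(a)$. Thus, the values of $a$ and $b$ that give the minimum length interval must satisfy this along with the probability constraint. The confidence interval, say $I(s^2)$, will be unbiased if (Definition 9.3.7)

$$P_{\sigma^2}\left(\sigma'^2 \in I(S^2)\right) \le P_{\sigma^2}\left(\sigma^2 \in I(S^2)\right) = 1-\alpha.$$

Some algebra will establish

$$P_{\sigma^2}\left(\sigma'^2 \in I(S^2)\right) = P_{\sigma^2}\left(\frac{(n-1)S^2}{b\sigma^2} \le \frac{\sigma'^2}{\sigma^2} \le \frac{(n-1)S^2}{a\sigma^2}\right) = P_{\sigma^2}\left(\frac{\chi^2_{n-1}}{b} \le \frac{\sigma'^2}{\sigma^2} \le \frac{\chi^2_{n-1}}{a}\right) = \int_{ac}^{bc} f_{n-1}(t)\,dt,$$

where $c = \sigma'^2/\sigma^2$. The derivative (with respect to $c$) of this last expression is $bf_{n-1}(bc) - af_{n-1}(ac)$, and hence is equal to zero if both $c = 1$ (so the interval is unbiased) and $bf_{n-1}(b) = af_{n-1}(a)$. From the form of the chi squared pdf, this latter condition is equivalent to $f_{n+1}(b) = f_{n+1}(a)$.

d. By construction, the interval will be $1-\alpha$ equal-tailed.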

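9.53 a. $E[b\,\text{length}(C) - I_C(\mu)] = 2c\sigma b - P(|Z| \le c)$, where $Z \sim n(0,1)$.

b. $\frac{d}{dc}[2c\sigma b - P(|Z| \le c)] = 2\sigma b - 2\left(\frac{1}{\sqrt{2\pi}}e^{-c^2/2}\right)$.

c. If $b\sigma > 1/\sqrt{2\pi}$, the derivative is always positive since $e^{-c^2/2} < 1$.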

9.55
$$E[L((\mu,\sigma),C)] = E[L((\mu,\sigma),C)|S<K]P(S<K) + E[L((\mu,\sigma),C)|S>K]P(S>K)$$
$$= E[L((\mu,\sigma),C')|S<K]P(S<K) + E[L((\mu,\sigma),C)|S>K]P(S>K)$$
$$= R(L((\mu,\sigma),C')) + E[L((\mu,\sigma),C)|S>K]P(S>K),$$

where the last equality follows because $C' = \emptyset$ if $S > K$. The conditional expectation in the second term is bounded by

$$E[L((\mu,\sigma),C)|S>K] = E[b\,\text{length}(C) - I_C(\mu)|S>K]$$
$$= E[2bcS - I_C(\mu)|S>K]$$
$$> E[2bcK - 1|S>K] \quad (\text{since } S > K \text{ and } I_C \le 1)$$
$$= 2bcK - 1,$$

which is positive if $K > 1/(2bc)$. For those values of $K$, $C'$ dominates $C$.

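9.57 a. The distribution of $X_{n+1} - \bar X$ is $n[0, \sigma^2(1 + 1/n)]$, so

$$P\left(X_{n+1} \in \bar X \pm z_{\alpha/2}\sigma\sqrt{1 + 1/n}\right) = P(|Z| \le z_{\alpha/2}) = 1-\alpha.$$

b. $p$ percent of the normal population is in the interval $\mu \pm z_{p/2}\sigma$, so $\bar x \pm k\sigma$ is a $1-\alpha$ tolerance interval if

$$P(\mu \pm z_{p/2}\sigma \subseteq \bar X \pm k\sigma) = P(\bar X - k\sigma \le \mu - z_{p/2}\sigma \text{ and } \bar X + k\sigma \ge \mu + z_{p/2}\sigma) \ge 1-\alpha.$$

This can be attained by requiring

$$P(\bar X - k\sigma \ge \mu - z_{p/2}\sigma) = \alpha/2 \quad\text{and}\quad P(\bar X + k\sigma \le \mu + z_{p/2}\sigma) = \alpha/2,$$

which is attained for $k = z_{p/2} + z_{\alpha/2}/\sqrt{n}$.

c. From part (a), $(X_{n+1} - \bar X)/(S\sqrt{1+1/n}) \sim t_{n-1}$, so a $1-\alpha$ prediction interval is $\bar X \pm t_{n-1,\alpha/2}S\sqrt{1+1/n}$.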
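Part a can be checked by simulation for the known-$\sigma$ case. The sketch below estimates the coverage of $\bar X \pm z_{\alpha/2}\sigma\sqrt{1+1/n}$ for a new observation. For contrast, it also shows what happens when the $\sqrt{1+1/n}$ factor is dropped. The function name and defaults are illustrative.

```python
import math
import random
from statistics import NormalDist

def prediction_coverage(n, alpha, inflate=True, mu=0.0, sigma=1.0, reps=20000, seed=4):
    """Monte Carlo coverage of X_{n+1} by Xbar +/- z_{alpha/2} * sigma * sqrt(1 + 1/n).

    With inflate=False the sqrt(1 + 1/n) factor is dropped, for comparison.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma * (math.sqrt(1 + 1 / n) if inflate else 1.0)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        x_new = rng.gauss(mu, sigma)
        if abs(x_new - xbar) <= half:
            hits += 1
    return hits / reps

print(prediction_coverage(10, 0.05), prediction_coverage(3, 0.05, inflate=False))
```

Dropping the factor ignores the variability of $\bar X$. With $n = 3$ this lowers the coverage to about .91 instead of .95.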

Chapter 10

Asymptotic Evaluations

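10.1 First calculate some moments for this distribution:

$$EX = \theta/3, \quad EX^2 = 1/3, \quad \text{Var}X = \frac{1}{3} - \frac{\theta^2}{9}.$$

So $3\bar X_n$ is an unbiased estimator of $\theta$ with variance

$$\text{Var}(3\bar X_n) = 9(\text{Var}X)/n = (3 - \theta^2)/n \to 0 \text{ as } n \to \infty.$$

So by Theorem 10.1.3, $3\bar X_n$ is a consistent estimator of $\theta$.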
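A small simulation illustrates the consistency argument. The density is not restated in this solution, so the sketch assumes $f(x|\theta) = (1+\theta x)/2$ on $(-1, 1)$, which has the moments $EX = \theta/3$ and $EX^2 = 1/3$ used above. Draws are made by accept-reject. The sketch checks that $3\bar X_n$ is centered at $\theta$ with variance close to $(3-\theta^2)/n$.

```python
import random

def sample(theta, rng):
    """One draw from the assumed density (1 + theta*x)/2 on (-1, 1), by accept-reject."""
    while True:
        x = rng.uniform(-1, 1)
        if rng.random() * (1 + abs(theta)) <= 1 + theta * x:
            return x

def simulate(theta, n, reps=4000, seed=5):
    """Monte Carlo mean and variance of the estimator 3 * Xbar_n."""
    rng = random.Random(seed)
    ests = [3 * sum(sample(theta, rng) for _ in range(n)) / n for _ in range(reps)]
    m = sum(ests) / reps
    v = sum((e - m) ** 2 for e in ests) / (reps - 1)
    return m, v

theta = 0.5
for n in (10, 100):
    m, v = simulate(theta, n)
    print(n, round(m, 3), round(v, 4), (3 - theta ** 2) / n)
```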

10.3 a. The log likelihood is

$$-\frac{n}{2}\log(2\pi\theta) - \frac{1}{2}\sum(x_i - \theta)^2/\theta.$$

Differentiate and set equal to zero, and a little algebra will show that the MLE is the root of $\theta^2 + \theta - W = 0$, where $W = \sum x_i^2/n$. The roots of this equation are $(-1 \pm \sqrt{1+4W})/2$, and the MLE is the root with the plus sign, as it has to be nonnegative.

b. The second derivative of the log likelihood is $(-2\sum x_i^2 + n\theta)/(2\theta^3)$, yielding an expected Fisher information of

$$I(\theta) = -E_\theta\left(\frac{-2\sum X_i^2 + n\theta}{2\theta^3}\right) = \frac{2n\theta + n}{2\theta^2},$$

and by Theorem 10.1.12 the variance of the MLE is $1/I(\theta)$.
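The quadratic-root form of the MLE can be checked numerically. The sketch below simulates iid $n(\theta, \theta)$ data and takes the positive root of $\theta^2 + \theta - W = 0$. It then confirms that the score, the derivative of the log likelihood above, vanishes at that root. The helper names are illustrative.

```python
import math
import random

def mle_theta(xs):
    """Positive root of theta^2 + theta - W = 0 with W = sum(x^2)/n."""
    w = sum(x * x for x in xs) / len(xs)
    return (-1 + math.sqrt(1 + 4 * w)) / 2

def score(theta, xs):
    """Derivative of -n/2 log(2 pi theta) - (1/2) sum (x - theta)^2 / theta."""
    n = len(xs)
    return -n / (2 * theta) + sum(x * x for x in xs) / (2 * theta ** 2) - n / 2

def fisher_info(theta, n):
    """Expected information (2 n theta + n) / (2 theta^2)."""
    return (2 * n * theta + n) / (2 * theta ** 2)

rng = random.Random(6)
theta_true = 2.0
xs = [rng.gauss(theta_true, math.sqrt(theta_true)) for _ in range(500)]
theta_hat = mle_theta(xs)
print(theta_hat, score(theta_hat, xs), 1 / fisher_info(theta_true, len(xs)))
```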

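10.4 a. Write

$$\frac{\sum X_iY_i}{\sum X_i^2} = \frac{\sum X_i(\beta X_i + \epsilon_i)}{\sum X_i^2} = \beta + \frac{\sum X_i\epsilon_i}{\sum X_i^2}.$$

From normality and independence,

$$EX_i\epsilon_i = 0, \quad \text{Var}X_i\epsilon_i = \sigma^2(\mu^2 + \tau^2), \quad EX_i^2 = \mu^2 + \tau^2, \quad \text{Var}X_i^2 = 2\tau^2(2\mu^2 + \tau^2),$$

and $\text{Cov}(X_i^2, X_i\epsilon_i) = 0$. Applying the formulas of Example 5.5.27, the asymptotic mean and variance are

$$E\left(\frac{\sum X_iY_i}{\sum X_i^2}\right) \approx \beta \quad\text{and}\quad \text{Var}\left(\frac{\sum X_iY_i}{\sum X_i^2}\right) \approx \frac{n\sigma^2(\mu^2+\tau^2)}{[n(\mu^2+\tau^2)]^2} = \frac{\sigma^2}{n(\mu^2+\tau^2)}.$$

b.
$$\frac{\sum Y_i}{\sum X_i} = \beta + \frac{\sum\epsilon_i}{\sum X_i},$$
with approximate mean $\beta$ and variance $\sigma^2/(n\mu^2)$.

c.
$$\frac{1}{n}\sum\frac{Y_i}{X_i} = \beta + \frac{1}{n}\sum\frac{\epsilon_i}{X_i},$$
with approximate mean $\beta$ and variance $\sigma^2/(n\mu^2)$.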

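10.5 a. The integral defining $ET_n^2$ is unbounded near zero. We have

$$ET_n^2 > \sqrt{\frac{n}{2\pi\sigma^2}}\int_0^1 \frac{1}{x^2}e^{-n(x-\mu)^2/(2\sigma^2)}\,dx > \sqrt{\frac{n}{2\pi\sigma^2}}\,K\int_0^1\frac{1}{x^2}\,dx = \infty,$$

where $K = \min_{0\le x\le 1} e^{-n(x-\mu)^2/(2\sigma^2)}$.

b. If we delete the interval $(-\delta, \delta)$, then the integrand is bounded, that is, over the range of integration $1/x^2 < 1/\delta^2$.

c. Assume $\mu > 0$. A similar argument works for $\mu < 0$. Then

$$P(-\delta < \bar X < \delta) = P\left[\frac{\sqrt{n}(-\delta-\mu)}{\sigma} < \frac{\sqrt{n}(\bar X - \mu)}{\sigma} < \frac{\sqrt{n}(\delta - \mu)}{\sigma}\right] < P\left[Z < \frac{\sqrt{n}(\delta-\mu)}{\sigma}\right],$$

where $Z \sim n(0,1)$. For $\delta < \mu$, the probability goes to 0 as $n \to \infty$.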

10.7 We need to assume that $\tau(\theta)$ is differentiable at $\theta = \theta_0$, the true value of the parameter. Then we apply Theorem 5.5.24 to Theorem 10.1.12.

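10.9 We will do a more general problem that includes a) and b) as special cases. Suppose we want to estimate $\lambda^te^{-\lambda}/t! = P(X = t)$. Let

$$T = T(X_1,\ldots,X_n) = \begin{cases}1 & \text{if } X_1 = t\\ 0 & \text{if } X_1 \ne t.\end{cases}$$

Then $ET = P(T = 1) = P(X_1 = t)$, so $T$ is an unbiased estimator. Since $\sum X_i$ is a complete sufficient statistic for $\lambda$, $E(T|\sum X_i)$ is UMVUE. The UMVUE is 0 for $y = \sum X_i < t$, and for $y \ge t$,

$$E(T|y) = P\left(X_1 = t\,\middle|\,\sum X_i = y\right) = \frac{P(X_1 = t, \sum X_i = y)}{P(\sum X_i = y)} = \frac{P(X_1 = t)P(\sum_{i=2}^n X_i = y - t)}{P(\sum X_i = y)}$$
$$= \frac{\{\lambda^te^{-\lambda}/t!\}\{[(n-1)\lambda]^{y-t}e^{-(n-1)\lambda}/(y-t)!\}}{(n\lambda)^ye^{-n\lambda}/y!} = \binom{y}{t}\frac{(n-1)^{y-t}}{n^y}.$$

a. The best unbiased estimator of $e^{-\lambda}$ is $((n-1)/n)^y$.

b. The best unbiased estimator of $\lambda e^{-\lambda}$ is $(y/n)[(n-1)/n]^{y-1}$.

c. Use the fact that for constants $a$ and $b$,

$$\frac{d}{d\lambda}\lambda^ab^\lambda = b^\lambda\lambda^{a-1}(a + \lambda\log b),$$

to calculate the asymptotic variances of the UMVUEs. We have for $t = 0$,

$$\text{ARE}\left(\left(\frac{n-1}{n}\right)^{n\hat\lambda}, e^{-\hat\lambda}\right) = \left[\frac{e^{-\lambda}}{\left(\frac{n-1}{n}\right)^{n\lambda}\log\left(\frac{n-1}{n}\right)^n}\right]^2$$

and for $t = 1$,

$$\text{ARE}\left(\frac{n}{n-1}\hat\lambda\left(\frac{n-1}{n}\right)^{n\hat\lambda}, \hat\lambda e^{-\hat\lambda}\right) = \left[\frac{(\lambda-1)e^{-\lambda}}{\frac{n}{n-1}\left(\frac{n-1}{n}\right)^{n\lambda}\left[1 + \lambda\log\left(\frac{n-1}{n}\right)^n\right]}\right]^2.$$

Since $[(n-1)/n]^n \to e^{-1}$ as $n \to \infty$, both of these AREs are equal to 1 in the limit.

d. For these data, $n = 15$, $\sum X_i = y = 104$, and the MLE of $\lambda$ is $\hat\lambda = \bar X = 6.9333$. The estimates are

              MLE        UMVUE
  P(X = 0)    .000975    .000765
  P(X = 1)    .006758    .005684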
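The table in part d can be reproduced from the closed form $\binom{y}{t}(n-1)^{y-t}/n^y$ and the plug-in MLE. The sketch also checks unbiasedness numerically. It sums the UMVUE against the Poisson$(n\lambda)$ pmf of $Y = \sum X_i$, truncated at a large $y$. The helper names are illustrative.

```python
import math

def umvue_p_t(t, y, n):
    """UMVUE of P(X = t): C(y, t) (n-1)^(y-t) / n^y for y >= t, and 0 otherwise."""
    if y < t:
        return 0.0
    return math.comb(y, t) * ((n - 1) / n) ** (y - t) / n ** t

def mle_p_t(t, y, n):
    """MLE of P(X = t): plug lambda_hat = y/n into lambda^t e^{-lambda} / t!."""
    lam = y / n
    return lam ** t * math.exp(-lam) / math.factorial(t)

def expected_umvue(t, n, lam, ymax=400):
    """E[UMVUE] when Y ~ Poisson(n*lam); should equal P(X = t)."""
    total = 0.0
    for y in range(ymax + 1):
        log_pmf = y * math.log(n * lam) - n * lam - math.lgamma(y + 1)
        total += umvue_p_t(t, y, n) * math.exp(log_pmf)
    return total

n, y = 15, 104
for t in (0, 1):
    print(t, round(mle_p_t(t, y, n), 6), round(umvue_p_t(t, y, n), 6))
```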

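10.11 a. It is easiest to use the Mathematica code in Example A.0.7. The second derivative of the log likelihood is

$$\frac{\partial^2}{\partial\mu^2}\log\left[\frac{1}{\Gamma[\mu/\beta]\beta^{\mu/\beta}}x^{-1+\mu/\beta}e^{-x/\beta}\right] = -\frac{1}{\beta^2}\psi'(\mu/\beta),$$

where $\psi(z) = \Gamma'(z)/\Gamma(z)$ is the digamma function (so $\psi'$ is the trigamma function).

b. Estimation of $\beta$ does not affect the calculation.

c. For $\mu = \alpha\beta$ known, the MOM estimate of $\beta$ is $\bar x/\alpha$. The MLE comes from differentiating the log likelihood:

$$\frac{d}{d\beta}\left(-\alpha n\log\beta - \sum x_i/\beta\right) \overset{\text{set}}{=} 0 \Rightarrow \beta = \bar x/\alpha.$$

d. The MOM estimate of $\beta$ comes from solving

$$\frac{1}{n}\sum x_i = \mu \quad\text{and}\quad \frac{1}{n}\sum x_i^2 = \mu^2 + \mu\beta,$$

which yields $\tilde\beta = \hat\sigma^2/\bar x$. The approximate variance is quite a pain to calculate. Start from

$$E\bar X = \mu, \quad \text{Var}\bar X = \frac{1}{n}\mu\beta, \quad E\hat\sigma^2 \approx \mu\beta, \quad \text{Var}\hat\sigma^2 \approx \frac{2}{n}\mu\beta^3,$$

where we used Exercise 5.8(b) for the variance of $\hat\sigma^2$. Now using Example 5.5.27 (and assuming the covariance is zero), we have $\text{Var}\tilde\beta \approx \frac{3\beta^3}{n\mu}$. The ARE is then

$$\text{ARE}(\hat\beta, \tilde\beta) = \frac{3\beta^3}{\mu}\,E\left(-\frac{d^2}{d\beta^2}l(\mu,\beta|X)\right).$$

Here is a small table of AREs, with rows indexed by $\mu$ and columns by $\beta$. Some entries are less than one; this is due to using an approximation for the MOM variance.

  µ \ β     1        3        6        10
   1      1.878    0.547    0.262    0.154
   2      4.238    1.179    0.547    0.317
   3      6.816    1.878    0.853    0.488
   4      9.509    2.629    1.179    0.667
   5     12.27     3.419    1.521    0.853
   6     15.075    4.238    1.878    1.046
   7     17.913    5.08     2.248    1.246
   8     20.774    5.941    2.629    1.451
   9     23.653    6.816    3.02     1.662
  10     26.546    7.704    3.419    1.878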


10.13 Here are the 35 distinct samples from {2,4,9,12} and their weights.

{12,12,12,12},1/256 {9,12,12,12},1/64 {9,9,12,12},3/128

{9,9,9,12},1/64 {9,9,9,9},1/256 {4,12,12,12},1/64

{4,9,12,12},3/64 {4,9,9,12},3/64 {4,9,9,9},1/64

{4,4,12,12},3/128 {4,4,9,12},3/64 {4,4,9,9},3/128

{4,4,4,12},1/64 {4,4,4,9},1/64 {4,4,4,4},1/256

{2,12,12,12},1/64 {2,9,12,12},3/64 {2,9,9,12},3/64

{2,9,9,9},1/64 {2,4,12,12},3/64 {2,4,9,12},3/32

{2,4,9,9},3/64 {2,4,4,12},3/64 {2,4,4,9},3/64

{2,4,4,4},1/64 {2,2,12,12},3/128 {2,2,9,12},3/64

{2,2,9,9},3/128 {2,2,4,12},3/64 {2,2,4,9},3/64

{2,2,4,4},3/128 {2,2,2,12},1/64 {2,2,2,9},1/64

{2,2,2,4},1/64 {2,2,2,2},1/256

The verifications of parts (a)-(d) can be done with this table, or the table of means in Example A.0.1 can be used. For part (e), verifying the bootstrap identities can involve much painful algebra. It can be made easier if we understand what the bootstrap sample space (the space of all $n^n$ bootstrap samples) looks like. Given a sample $x_1, x_2, \ldots, x_n$, the bootstrap sample space can be thought of as a data array with $n^n$ rows (one for each bootstrap sample) and $n$ columns, so each row of the data array is one bootstrap sample.

For example, if the sample size is $n = 3$, the bootstrap sample space is

x1 x1 x1
x1 x1 x2
x1 x1 x3
x1 x2 x1
x1 x2 x2
x1 x2 x3
x1 x3 x1
x1 x3 x2
x1 x3 x3
x2 x1 x1
x2 x1 x2
x2 x1 x3
x2 x2 x1
x2 x2 x2
x2 x2 x3
x2 x3 x1
x2 x3 x2
x2 x3 x3
x3 x1 x1
x3 x1 x2
x3 x1 x3
x3 x2 x1
x3 x2 x2
x3 x2 x3
x3 x3 x1
x3 x3 x2
x3 x3 x3

Note the pattern. The first column is 9 $x_1$s followed by 9 $x_2$s followed by 9 $x_3$s. The second column is 3 $x_1$s followed by 3 $x_2$s followed by 3 $x_3$s, then repeated, etc. In general, for the entire bootstrap sample,

◦ The first column is $n^{n-1}$ $x_1$s followed by $n^{n-1}$ $x_2$s followed by, ..., followed by $n^{n-1}$ $x_n$s.
◦ The second column is $n^{n-2}$ $x_1$s followed by $n^{n-2}$ $x_2$s followed by, ..., followed by $n^{n-2}$ $x_n$s, repeated $n$ times.
◦ The third column is $n^{n-3}$ $x_1$s followed by $n^{n-3}$ $x_2$s followed by, ..., followed by $n^{n-3}$ $x_n$s, repeated $n^2$ times.
◦ The $n$th column is 1 $x_1$ followed by 1 $x_2$ followed by, ..., followed by 1 $x_n$, repeated $n^{n-1}$ times.

So now it is easy to see that each column in the data array has mean $\bar x$, hence the entire bootstrap data set has mean $\bar x$. Appealing to the $3^3\times 3$ data array, we can write the numerator of the variance of the bootstrap means as

$$\sum_{i=1}^3\sum_{j=1}^3\sum_{k=1}^3\left[\frac{1}{3}(x_i+x_j+x_k) - \bar x\right]^2 = \frac{1}{3^2}\sum_{i=1}^3\sum_{j=1}^3\sum_{k=1}^3[(x_i-\bar x)+(x_j-\bar x)+(x_k-\bar x)]^2$$
$$= \frac{1}{3^2}\sum_{i=1}^3\sum_{j=1}^3\sum_{k=1}^3\left[(x_i-\bar x)^2+(x_j-\bar x)^2+(x_k-\bar x)^2\right],$$

because all of the cross terms are zero (since they are the sum of deviations from the mean). Summing up and collecting terms shows that

$$\sum_{i=1}^3\sum_{j=1}^3\sum_{k=1}^3\left[(x_i-\bar x)^2+(x_j-\bar x)^2+(x_k-\bar x)^2\right] = 3\cdot 3^2\sum_{i=1}^3(x_i-\bar x)^2.$$

Dividing the numerator by the $3^3$ bootstrap samples, the variance of the bootstrap means is

$$\frac{1}{3^3}\cdot\frac{1}{3^2}\cdot 3^3\sum_{i=1}^3(x_i-\bar x)^2 = \frac{\sum_{i=1}^3(x_i-\bar x)^2}{3^2},$$

which is the usual estimate of the variance of $\bar X$ if we divide by $n$ instead of $n-1$. The general result should now be clear. The numerator of the variance of the bootstrap means is

$$\sum_{i_1=1}^n\sum_{i_2=1}^n\cdots\sum_{i_n=1}^n\left[\frac{1}{n}(x_{i_1}+x_{i_2}+\cdots+x_{i_n}) - \bar x\right]^2 = \frac{1}{n^2}\sum_{i_1=1}^n\sum_{i_2=1}^n\cdots\sum_{i_n=1}^n\left[(x_{i_1}-\bar x)^2+(x_{i_2}-\bar x)^2+\cdots+(x_{i_n}-\bar x)^2\right],$$

since all of the cross terms are zero. Summing and collecting terms shows that the sum is $n^{n-2}\sum_{i=1}^n(x_i-\bar x)^2$, and the variance of the bootstrap means is $n^{n-2}\sum_{i=1}^n(x_i-\bar x)^2/n^n = \sum_{i=1}^n(x_i-\bar x)^2/n^2$.
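For the sample {2, 4, 9, 12} the identity in part (e) can be verified exactly by brute force. The sketch below enumerates all $4^4 = 256$ bootstrap samples and uses exact rational arithmetic. It compares the variance of the bootstrap means (divisor $n^n$) with $\sum(x_i-\bar x)^2/n^2$, and it also recounts the 35 distinct samples and one of their weights from the table. The function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def bootstrap_means_variance(xs):
    """Exact mean and variance (divisor n^n) of the means of all n^n bootstrap samples."""
    n = len(xs)
    means = [Fraction(sum(s), n) for s in product(xs, repeat=n)]
    m = sum(means) / len(means)
    return m, sum((b - m) ** 2 for b in means) / len(means)

xs = [2, 4, 9, 12]
n = len(xs)
xbar = Fraction(sum(xs), n)
grand_mean, var_boot = bootstrap_means_variance(xs)
formula = sum((x - xbar) ** 2 for x in xs) / n ** 2
print(grand_mean, var_boot, formula)

# Distinct bootstrap samples (as multisets) and how often each occurs among the n^n samples
weights = Counter(tuple(sorted(s)) for s in product(xs, repeat=n))
print(len(weights), Fraction(weights[(2, 4, 9, 12)], n ** n))
```

Both variance computations give $251/64$. There are 35 distinct samples, and {2,4,9,12} has weight $3/32$, as in the table.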

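10.15 a. As $B \to \infty$, $\text{Var}^*_B(\hat\theta) \to \text{Var}^*(\hat\theta)$.

b. Each $\text{Var}^*_{B_i}(\hat\theta)$ is a sample variance, and they are independent, so the LLN applies and

$$\frac{1}{m}\sum_{i=1}^m\text{Var}^*_{B_i}(\hat\theta) \xrightarrow{m\to\infty} E\,\text{Var}^*_B(\hat\theta) = \text{Var}^*(\hat\theta),$$

where the last equality follows from Theorem 5.2.6(c).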


10.17 a. The correlation is .7781.

b. Here is R code (R is available free at http://cran.r-project.org/) to bootstrap the data, calculate the standard deviation, and produce the histogram:

library(bootstrap)   # bootstrap() is in the CRAN package 'bootstrap'; 'law' is the 15 x 2 data set used here
cor(law)
n <- 15
# statistic: correlation of the resampled rows x of the data
theta <- function(x, law){ cor(law[x,1], law[x,2]) }
# 1000 bootstrap resamples of the row indices; func = sd summarizes the replicates
results <- bootstrap(1:n, 1000, theta, law, func=sd)
results[2]            # bootstrap standard deviation
hist(results[[1]])    # histogram of the bootstrap replicates

The data "law" are in two columns of length 15, and "results[2]" contains the standard deviation. The vector "results[[1]]" is the bootstrap sample. The output is

   V1        V2
V1 1.0000000 0.7781716
V2 0.7781716 1.0000000
$func.thetastar
[1] 0.1322881

showing a correlation of .7781 and a bootstrap standard deviation of .1323.

c. The R code for the parametric bootstrap is

mx<-600.6; my<-3.09                  # sample means
sdx<-sqrt(1791.83); sdy<-sqrt(.059)  # sample standard deviations
rho<-.7782; b<-rho*sdx/sdy; sdxy<-sqrt(1-rho^2)*sdx   # slope and sd of x given y
rhodata<-rho
for (j in 1:1000) {
  y<-rnorm(15,mean=my,sd=sdy)              # marginal of y
  x<-rnorm(15,mean=mx+b*(y-my),sd=sdxy)    # conditional of x given y
  rhodata<-c(rhodata,cor(x,y))
}
sd(rhodata)
hist(rhodata)

where we generate the bivariate normal by first generating the marginal and then the conditional, as base R does not have a bivariate normal generator. The bootstrap standard deviation is 0.1159, smaller than the nonparametric estimate. The histogram looks similar to the nonparametric bootstrap histogram, displaying left skewness.

d. The Delta Method approximation is

$$r \sim n(\rho, (1-\rho^2)^2/n),$$

and the "plug-in" estimate of standard error is $\sqrt{(1-.7782^2)^2/15} = .1018$, the smallest so far. Also, the approximate pdf of $r$ will be normal, hence symmetric.

e. By the change of variables

$$t = \frac{1}{2}\log\frac{1+r}{1-r}, \quad dt = \frac{1}{1-r^2}\,dr,$$

the density of $r$ is

$$\frac{\sqrt{n}}{\sqrt{2\pi}(1-r^2)}\exp\left(-\frac{n}{2}\left[\frac{1}{2}\log\frac{1+r}{1-r} - \frac{1}{2}\log\frac{1+\rho}{1-\rho}\right]^2\right), \quad -1 \le r \le 1.$$

More formally, we could start with the random variable $T$, normal with mean $\frac{1}{2}\log\frac{1+\rho}{1-\rho}$ and variance $1/n$, make the transformation to $R = \frac{e^{2T}-1}{e^{2T}+1}$, and get the same answer.


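10.19 a. The variance of $\bar X$ is

$$\text{Var}\bar X = E(\bar X - \mu)^2 = E\left(\frac{1}{n}\sum X_i - \mu\right)^2 = \frac{1}{n^2}E\left(\sum(X_i-\mu)^2 + 2\sum_{i>j}(X_i-\mu)(X_j-\mu)\right)$$
$$= \frac{1}{n^2}\left(n\sigma^2 + 2\frac{n(n-1)}{2}\rho\sigma^2\right) = \frac{\sigma^2}{n} + \frac{n-1}{n}\rho\sigma^2.$$

b. In this case we have

$$E\left(\sum_{i>j}(X_i-\mu)(X_j-\mu)\right) = \sigma^2\sum_{i=2}^n\sum_{j=1}^{i-1}\rho^{i-j}.$$

In the double sum, $\rho$ appears $n-1$ times, $\rho^2$ appears $n-2$ times, etc., so

$$\sum_{i=2}^n\sum_{j=1}^{i-1}\rho^{i-j} = \sum_{i=1}^{n-1}(n-i)\rho^i = \frac{\rho}{1-\rho}\left(n - \frac{1-\rho^n}{1-\rho}\right),$$

where the series can be summed using (1.5.4), the partial sum of the geometric series, or using Mathematica.

c. The mean and variance of $X_i$ are

$$EX_i = E[E(X_i|X_{i-1})] = E\rho X_{i-1} = \cdots = \rho^{i-1}EX_1$$

and

$$\text{Var}X_i = \text{Var}\,E(X_i|X_{i-1}) + E\,\text{Var}(X_i|X_{i-1}) = \rho^2\sigma^2 + 1 = \sigma^2$$

for $\sigma^2 = 1/(1-\rho^2)$. Also, by iterating the expectation,

$$EX_1X_i = E[E(X_1X_i|X_{i-1})] = E[E(X_1|X_{i-1})E(X_i|X_{i-1})] = \rho E[X_1X_{i-1}],$$

where we used the fact that $X_1$ and $X_i$ are independent conditional on $X_{i-1}$. Continuing with the argument, we get that $EX_1X_i = \rho^{i-1}EX_1^2$. Thus,

$$\text{Corr}(X_1,X_i) = \frac{\rho^{i-1}EX_1^2 - \rho^{i-1}(EX_1)^2}{\sqrt{\text{Var}X_1\,\text{Var}X_i}} = \frac{\rho^{i-1}\sigma^2}{\sqrt{\sigma^2\sigma^2}} = \rho^{i-1}.$$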
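The closed form for the double sum in part b is easy to get wrong, so here is a brute-force comparison. The sketch also assembles $\text{Var}\bar X = \sigma^2/n + (2\sigma^2/n^2)\sum_{i>j}\rho^{i-j}$ for the correlation structure $\text{Corr}(X_i, X_j) = \rho^{|i-j|}$ of part c. The function names are illustrative.

```python
def double_sum(rho, n):
    """Brute force: sum_{i=2}^n sum_{j=1}^{i-1} rho^(i-j)."""
    return sum(rho ** (i - j) for i in range(2, n + 1) for j in range(1, i))

def closed_form(rho, n):
    """rho/(1 - rho) * (n - (1 - rho^n)/(1 - rho)), valid for rho != 1."""
    return rho / (1 - rho) * (n - (1 - rho ** n) / (1 - rho))

def var_xbar(rho, n, sigma2=1.0):
    """Var(Xbar) when Corr(X_i, X_j) = rho^|i-j| and Var(X_i) = sigma2."""
    return sigma2 / n + 2 * sigma2 / n ** 2 * double_sum(rho, n)

for rho in (0.3, 0.7, -0.5):
    print(rho, double_sum(rho, 20), closed_form(rho, 20), var_xbar(rho, 20))
```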

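10.21 a. If any $x_i \to \infty$, then $s^2 \to \infty$, so $s^2$ has breakdown value 0. To see this, suppose that $x_1 \to \infty$. Write

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2 = \frac{1}{n-1}\left(\left[\left(1-\frac{1}{n}\right)x_1 - \bar x_{-1}\right]^2 + \sum_{i=2}^n(x_i-\bar x)^2\right),$$

where $\bar x_{-1} = (x_2 + \cdots + x_n)/n$. It is easy to see that as $x_1 \to \infty$, each term in the sum $\to \infty$.

b. If less than 50% of the sample $\to \infty$, the median remains the same, and the median of $|x_i - M|$ remains the same. If more than 50% of the sample $\to \infty$, $M \to \infty$ and so does the MAD.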


10.23 a. The ARE is $[2\sigma f(\mu)]^2$. We have

  Distribution   Parameters       variance   f(µ)    ARE
  normal         µ = 0, σ = 1     1          .3989   .64
  logistic       µ = 0, β = 1     π²/3       .25     .82
  double exp.    µ = 0, σ = 1     2          .5      2

b. If $X_1, X_2, \ldots, X_n$ are iid $f_X$ with $EX_1 = \mu$ and $\text{Var}X_1 = \sigma^2$, the ARE is $\sigma^2[2f_X(\mu)]^2$. If we transform to $Y_i = (X_i - \mu)/\sigma$, the pdf of $Y_i$ is $f_Y(y) = \sigma f_X(\sigma y + \mu)$, with ARE $[2f_Y(0)]^2 = \sigma^2[2f_X(\mu)]^2$.

c. The median is more efficient for smaller $\nu$, the distributions with heavier tails.

  ν     Var X    f(0)   ARE
  3     3        .367   1.62
  5     5/3      .379   .960
  10    5/4      .389   .757
  25    25/23    .395   .678
  50    25/24    .397   .657
  ∞     1        .399   .637

d. Again the heavier tails favor the median.

  δ     σ    ARE
  .01   2    .649
  .1    2    .747
  .5    2    .895
  .01   5    .777
  .1    5    1.83
  .5    5    2.98
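The entries in these tables all come from the single formula $\text{ARE} = 4\,\text{Var}(X)\,f(\text{median})^2$. The sketch below recomputes them. For part c it uses Student's $t$ density at 0. For part d it assumes the contaminated normal $(1-\delta)\,n(0,1) + \delta\,n(0,\sigma^2)$; that model is not restated here, but it reproduces the tabled values. The helper names are ours.

```python
import math

def are_median_mean(variance, f_at_median):
    """ARE of the sample median relative to the sample mean: [2 sigma f(median)]^2."""
    return 4 * variance * f_at_median ** 2

def t_density_at_zero(nu):
    """Student's t density at 0 with nu degrees of freedom."""
    return math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

def contaminated_are(delta, sigma):
    """ARE for the assumed mixture (1 - delta) n(0, 1) + delta n(0, sigma^2)."""
    variance = (1 - delta) + delta * sigma ** 2
    f0 = ((1 - delta) + delta / sigma) / math.sqrt(2 * math.pi)
    return are_median_mean(variance, f0)

# part (a): normal, logistic, double exponential
print(round(are_median_mean(1, 1 / math.sqrt(2 * math.pi)), 3),
      round(are_median_mean(math.pi ** 2 / 3, 0.25), 3),
      round(are_median_mean(2, 0.5), 3))
# part (c): t with nu degrees of freedom has variance nu/(nu - 2)
for nu in (3, 5, 10, 25, 50):
    print(nu, round(are_median_mean(nu / (nu - 2), t_density_at_zero(nu)), 3))
# part (d)
for delta, sigma in [(.01, 2), (.1, 2), (.5, 2), (.01, 5), (.1, 5), (.5, 5)]:
    print(delta, sigma, round(contaminated_are(delta, sigma), 3))
```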

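10.25 By transforming $y = x - \theta$,

$$\int_{-\infty}^\infty\psi(x-\theta)f(x-\theta)\,dx = \int_{-\infty}^\infty\psi(y)f(y)\,dy.$$

Since $\psi$ is an odd function, $\psi(y) = -\psi(-y)$, and

$$\int_{-\infty}^\infty\psi(y)f(y)\,dy = \int_{-\infty}^0\psi(y)f(y)\,dy + \int_0^\infty\psi(y)f(y)\,dy$$
$$= \int_{-\infty}^0 -\psi(-y)f(y)\,dy + \int_0^\infty\psi(y)f(y)\,dy$$
$$= -\int_0^\infty\psi(y)f(y)\,dy + \int_0^\infty\psi(y)f(y)\,dy = 0,$$

where in the last line we made the transformation $y \to -y$ and used the fact that $f$ is symmetric, so $f(y) = f(-y)$. From the discussion preceding Example 10.2.6, $\hat\theta_M$ is asymptotically normal with mean equal to the true $\theta$.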

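10.27 a.

$$\lim_{\delta\to 0}\frac{1}{\delta}[(1-\delta)\mu + \delta x - \mu] = \lim_{\delta\to 0}\frac{\delta(x-\mu)}{\delta} = x - \mu.$$

b.
$$P(X \le a) = P(X \le a|X \sim F)(1-\delta) + P(x \le a|X = x)\delta = (1-\delta)F(a) + \delta I(x \le a)$$

and

$$(1-\delta)F(a) = \frac{1}{2} \Rightarrow a = F^{-1}\left(\frac{1}{2(1-\delta)}\right)$$
$$(1-\delta)F(a) + \delta = \frac{1}{2} \Rightarrow a = F^{-1}\left(\frac{1/2 - \delta}{1-\delta}\right),$$

where the first equation applies when $x > a$ and the second when $x \le a$.

c. The limit is

$$\lim_{\delta\to 0}\frac{a_\delta - a_0}{\delta} = a'_\delta\big|_{\delta=0}$$

by the definition of derivative. Since $F(a_\delta) = \frac{1}{2(1-\delta)}$,

$$\frac{d}{d\delta}F(a_\delta) = \frac{d}{d\delta}\frac{1}{2(1-\delta)} \Rightarrow f(a_\delta)a'_\delta = \frac{1}{2(1-\delta)^2} \Rightarrow a'_\delta = \frac{1}{2(1-\delta)^2f(a_\delta)}.$$

Since $a_0 = m$, the result follows. The other limit can be calculated in a similar manner.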
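The limit in part c can be seen numerically for a concrete $F$. The sketch below uses the standard normal as an illustrative choice, so $m = 0$ and $1/(2f(m)) = \sqrt{2\pi}/2 \approx 1.2533$. It evaluates the difference quotients $(a_\delta - m)/\delta$ from both expressions in part b as $\delta$ shrinks, using the standard-library `NormalDist`.

```python
from statistics import NormalDist

F = NormalDist()          # illustrative choice of F: standard normal, median m = 0
m = F.inv_cdf(0.5)

def a_upper(delta):
    """Median of the contaminated cdf when the point mass x lies above it."""
    return F.inv_cdf(1 / (2 * (1 - delta)))

def a_lower(delta):
    """Median of the contaminated cdf when the point mass x lies below it."""
    return F.inv_cdf((0.5 - delta) / (1 - delta))

limit = 1 / (2 * F.pdf(m))
for delta in (0.1, 0.01, 0.001, 1e-5):
    print(delta, (a_upper(delta) - m) / delta, (a_lower(delta) - m) / delta)
print("1/(2 f(m)) =", limit)
```

The two quotients approach $+1/(2f(m))$ and $-1/(2f(m))$, the two values of the median's influence function.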

10.29 a. Substituting $cl'$ for $\psi$ makes the ARE equal to 1.

b. For each distribution, the given $\psi$ function is equal to $cl'$; hence the resulting M-estimator is asymptotically efficient by (10.2.9).

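10.31 a. By the CLT,

$$\sqrt{n_1}\frac{\hat p_1 - p_1}{\sqrt{p_1(1-p_1)}} \to n(0,1) \quad\text{and}\quad \sqrt{n_2}\frac{\hat p_2 - p_2}{\sqrt{p_2(1-p_2)}} \to n(0,1),$$

so if $\hat p_1$ and $\hat p_2$ are independent, under $H_0: p_1 = p_2 = p$,

$$\frac{\hat p_1 - \hat p_2}{\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\hat p(1-\hat p)}} \to n(0,1),$$

where we use Slutsky's Theorem and the fact that $\hat p = (S_1+S_2)/(n_1+n_2)$ is the MLE of $p$ under $H_0$ and converges to $p$ in probability. Therefore, $T \to \chi^2_1$.

b. Substitute $\hat p_i$s for $S_i$ and $F_i$s to get

$$T^* = \frac{n_1^2(\hat p_1-\hat p)^2}{n_1\hat p} + \frac{n_2^2(\hat p_2-\hat p)^2}{n_2\hat p} + \frac{n_1^2[(1-\hat p_1)-(1-\hat p)]^2}{n_1(1-\hat p)} + \frac{n_2^2[(1-\hat p_2)-(1-\hat p)]^2}{n_2(1-\hat p)}$$
$$= \frac{n_1(\hat p_1-\hat p)^2}{\hat p(1-\hat p)} + \frac{n_2(\hat p_2-\hat p)^2}{\hat p(1-\hat p)}.$$

Write $\hat p = (n_1\hat p_1 + n_2\hat p_2)/(n_1+n_2)$. Substitute this into the numerator, and some algebra will get

$$n_1(\hat p_1-\hat p)^2 + n_2(\hat p_2-\hat p)^2 = \frac{(\hat p_1-\hat p_2)^2}{\frac{1}{n_1}+\frac{1}{n_2}},$$

so $T^* = T$.

c. Under $H_0$,

$$\frac{\hat p_1-\hat p_2}{\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)p(1-p)}} \to n(0,1),$$

and both $\hat p_1$ and $\hat p_2$ are consistent, so $\hat p_1(1-\hat p_1) \to p(1-p)$ and $\hat p_2(1-\hat p_2) \to p(1-p)$ in probability. Therefore, by Slutsky's Theorem,

$$\frac{\hat p_1-\hat p_2}{\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}} \to n(0,1),$$

and $(T^{**})^2 \to \chi^2_1$. It is easy to see that $T^{**} \ne T$ in general.

d. The estimator $(1/n_1 + 1/n_2)\hat p(1-\hat p)$ is the MLE of $\text{Var}(\hat p_1-\hat p_2)$ under $H_0$, while the estimator $\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2$ is the MLE of $\text{Var}(\hat p_1-\hat p_2)$ under $H_1$. One might argue that in hypothesis testing the first one should be used, since under $H_0$ it provides a better estimator of variance. If interest is in finding the confidence interval, however, we are making inference under both $H_0$ and $H_1$, and the second one is preferred.

e. We have $\hat p_1 = 34/40$, $\hat p_2 = 19/35$, $\hat p = (34+19)/(40+35) = 53/75$, and $T = 8.495$. Since $\chi^2_{1,.05} = 3.84$, we can reject $H_0$ at $\alpha = .05$.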
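The statistics in parts b, c and e can be computed side by side for the data of part e. The sketch below returns the pooled-variance statistic $T$, the Pearson form $T^*$, and the unpooled $(T^{**})^2$. The function name is illustrative.

```python
def two_sample_stats(s1, n1, s2, n2):
    """Return T (pooled variance), T* (Pearson chi-square form), and (T**)^2 (unpooled variance)."""
    p1, p2 = s1 / n1, s2 / n2
    p = (s1 + s2) / (n1 + n2)
    t = (p1 - p2) ** 2 / ((1 / n1 + 1 / n2) * p * (1 - p))
    t_star = (n1 * (p1 - p) ** 2 + n2 * (p2 - p) ** 2) / (p * (1 - p))
    t_2star_sq = (p1 - p2) ** 2 / (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return t, t_star, t_2star_sq

t, t_star, t2 = two_sample_stats(34, 40, 19, 35)
print(round(t, 3), round(t_star, 3), round(t2, 3))
```

$T$ and $T^*$ agree (8.495), while the unpooled version is about 9.18. All three exceed 3.84, so the conclusion at $\alpha = .05$ does not depend on the choice.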

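10.32 a. First calculate the MLEs under $p_1 = p_2 = p$. We have

$$L(p|x) = p^{x_1}p^{x_2}p_3^{x_3}\cdots p_{n-1}^{x_{n-1}}\left(1 - 2p - \sum_{i=3}^{n-1}p_i\right)^{m-x_1-x_2-\cdots-x_{n-1}}.$$

Taking logs and differentiating yield the following equations for the MLEs:

$$\frac{\partial\log L}{\partial p} = \frac{x_1+x_2}{p} - \frac{2\left(m - \sum_{i=1}^{n-1}x_i\right)}{1 - 2p - \sum_{i=3}^{n-1}p_i} = 0,$$
$$\frac{\partial\log L}{\partial p_i} = \frac{x_i}{p_i} - \frac{x_n}{1 - 2p - \sum_{i=3}^{n-1}p_i} = 0, \quad i = 3,\ldots,n-1,$$

with solutions $\hat p = \frac{x_1+x_2}{2m}$, $\hat p_i = \frac{x_i}{m}$, $i = 3,\ldots,n-1$, and $\hat p_n = \left(m - \sum_{i=1}^{n-1}x_i\right)/m$. Except for the first and second cells, we have expected = observed, since both are equal to $x_i$. For the first two terms, expected $= m\hat p = (x_1+x_2)/2$ and we get

$$\sum\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{\left(x_1 - \frac{x_1+x_2}{2}\right)^2}{\frac{x_1+x_2}{2}} + \frac{\left(x_2 - \frac{x_1+x_2}{2}\right)^2}{\frac{x_1+x_2}{2}} = \frac{(x_1-x_2)^2}{x_1+x_2}.$$

b. Now the hypothesis about conditional probabilities is given by $H_0$: P(change | initial agree) = P(change | initial disagree) or, in terms of the parameters, $H_0: \frac{p_1}{p_1+p_3} = \frac{p_2}{p_2+p_4}$. This is the same as $p_1p_4 = p_2p_3$, which is not the same as $p_1 = p_2$.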

10.33 Theorem 10.1.12 and Slutsky's Theorem imply that

$$\frac{\hat\theta - \theta}{\sqrt{1/I_n(\hat\theta)}} \to n(0,1)$$

and the result follows.

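10.35 a. Since $\sigma/\sqrt{n}$ is the estimated standard deviation of $\bar X$ in this case, the statistic is a Wald statistic.

b. The MLE of $\sigma^2$ is $\hat\sigma^2_\mu = \sum_i(x_i-\mu)^2/n$. The information number is

$$-\frac{d^2}{d(\sigma^2)^2}\left(-\frac{n}{2}\log\sigma^2 - \frac{1}{2}\frac{n\hat\sigma^2_\mu}{\sigma^2}\right)\Bigg|_{\sigma^2 = \hat\sigma^2_\mu} = \frac{n}{2\hat\sigma^4_\mu}.$$

Using the Delta method, the variance of $\hat\sigma_\mu = \sqrt{\hat\sigma^2_\mu}$ is $\left(\frac{1}{2\hat\sigma_\mu}\right)^2\frac{2\hat\sigma^4_\mu}{n} = \hat\sigma^2_\mu/(2n)$, and a Wald statistic is

$$\frac{\hat\sigma_\mu - \sigma_0}{\sqrt{\hat\sigma^2_\mu/(2n)}}.$$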

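10.37 a. The log likelihood is

$$\log L = -\frac{n}{2}\log\sigma^2 - \frac{1}{2}\sum(x_i-\mu)^2/\sigma^2$$

with

$$\frac{d}{d\mu}\log L = \frac{1}{\sigma^2}\sum(x_i-\mu) = \frac{n}{\sigma^2}(\bar x - \mu), \qquad \frac{d^2}{d\mu^2}\log L = -\frac{n}{\sigma^2},$$

so the test statistic for the score test is

$$\frac{\frac{n}{\sigma^2}(\bar x - \mu)}{\sqrt{n/\sigma^2}} = \sqrt{n}\,\frac{\bar x - \mu}{\sigma}.$$

b. We test the equivalent hypothesis $H_0: \sigma^2 = \sigma^2_0$. The likelihood is the same as in Exercise 10.35(b), with first derivative

$$\frac{d}{d\sigma^2}\log L = \frac{n(\hat\sigma^2_\mu - \sigma^2)}{2\sigma^4}$$

and expected information number

$$E\left(\frac{n(2\hat\sigma^2_\mu - \sigma^2)}{2\sigma^6}\right) = \frac{n(2\sigma^2 - \sigma^2)}{2\sigma^6} = \frac{n}{2\sigma^4}.$$

The score test statistic is

$$\sqrt{\frac{n}{2}}\,\frac{\hat\sigma^2_\mu - \sigma^2_0}{\sigma^2_0}.$$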

10.39 We summarize the results for (a)-(c) in the following table. We assume that the underlying distribution is normal, and use that for all score calculations. The actual data are generated from normal, logistic, and double exponential distributions. The sample size is 15; we use 1000 simulations and draw 20 bootstrap samples. Here $\theta_0 = 0$, and the power is tabulated for a nominal $\alpha = .1$ test.


  Underlying pdf  Test     θ0      θ0+.25σ  θ0+.5σ  θ0+.75σ  θ0+1σ  θ0+2σ
  Laplace         Naive    0.101   0.366    0.774   0.957    0.993  1.
                  Boot     0.097   0.364    0.749   0.932    0.986  1.
                  Median   0.065   0.245    0.706   0.962    0.995  1.
  Logistic        Naive    0.137   0.341    0.683   0.896    0.97   1.
                  Boot     0.133   0.312    0.641   0.871    0.967  1.
                  Median   0.297   0.448    0.772   0.944    0.993  1.
  Normal          Naive    0.168   0.316    0.628   0.878    0.967  1.
                  Boot     0.148   0.306    0.58    0.836    0.957  1.
                  Median   0.096   0.191    0.479   0.761    0.935  1.

Here is Mathematica code:

(* This program calculates size and power for Exercise 10.39, Second Edition.
   We do our calculations assuming normality, but simulate power and size under
   other distributions. We test H0: theta = 0. *)
theta0 = 0;
(* Needs["Statistics`Master`"] was needed only in old (pre-6.0) versions of Mathematica *)
Clear[x]
f1[x_] = PDF[NormalDistribution[0, 1], x];
F1[x_] = CDF[NormalDistribution[0, 1], x];
f2[x_] = PDF[LogisticDistribution[0, 1], x];
f3[x_] = PDF[LaplaceDistribution[0, 1], x];
v1 = Variance[NormalDistribution[0, 1]];
v2 = Variance[LogisticDistribution[0, 1]];
v3 = Variance[LaplaceDistribution[0, 1]];

(* Calculate the M-estimate (Huber rho function with tuning constant k) *)
Clear[k, k1, k2, t, x, y, d, n, nsim, a, w1]
k1 = 1.5; (* assumed value: the tuning constant used for the table is not stated *)
ind[x_, k_] := If[Abs[x] < k, 1, 0]
rho[y_, k_] := ind[y, k]*y^2/2 + (1 - ind[y, k])*(k*Abs[y] - k^2/2)
alow[d_] := Min[Mean[d], Median[d]]
aup[d_] := Max[Mean[d], Median[d]]
sol[k_, d_] := FindMinimum[Sum[rho[d[[i]] - a, k], {i, 1, n}], {a, alow[d], aup[d]}]
mest[k_, d_] := sol[k, d][[2]]

(* Generate data; to change the underlying distribution, change sd and the
   distribution in the RandomVariate statement. *)
n = 15; nsim = 1000; sd = Sqrt[v1];
theta = {theta0, theta0 + .25*sd, theta0 + .5*sd,
   theta0 + .75*sd, theta0 + 1*sd, theta0 + 2*sd};
ntheta = Length[theta];
data = RandomVariate[NormalDistribution[0, 1], {nsim, n}];
m1 = Table[Table[a /. mest[k1, data[[j]] - theta[[i]]],
   {j, 1, nsim}], {i, 1, ntheta}];

(* Calculation of naive variance and test statistic *)
Psi[x_, k_] := x*If[Abs[x] <= k, 1, 0] - k*If[x < -k, 1, 0] + k*If[x > k, 1, 0];
Psi1[x_, k_] := If[Abs[x] <= k, 1, 0];
(* residuals from the M-estimate at theta0; w1 is not defined in the original
   listing, so this definition is an assumption *)
w1 = Table[data[[j]] - m1[[1]][[j]], {j, 1, nsim}];
num = Table[Psi[w1[[j]][[i]], k1], {j, 1, nsim}, {i, 1, n}];
den = Table[Psi1[w1[[j]][[i]], k1], {j, 1, nsim}, {i, 1, n}];
varnaive = Map[Mean, num^2]/Map[Mean, den]^2;
naivestat = Table[Table[(m1[[i]][[j]] - theta0)/Sqrt[varnaive[[j]]/n],
   {j, 1, nsim}], {i, 1, ntheta}];
absnaive = Map[Abs, naivestat];
N[Table[Mean[Table[If[absnaive[[i]][[j]] > 1.645, 1, 0],
   {j, 1, nsim}]], {i, 1, ntheta}]]

(* Calculation of bootstrap variance and test statistic *)
nboot = 20;
u := RandomInteger[{1, n}]
databoot = Table[data[[jj]][[u]], {jj, 1, nsim}, {j, 1, nboot}, {i, 1, n}];
m1boot = Table[Table[a /. mest[k1, databoot[[j]][[jj]]],
   {jj, 1, nboot}], {j, 1, nsim}];
varboot = Map[Variance, m1boot];
bootstat = Table[Table[(m1[[i]][[j]] - theta0)/Sqrt[varboot[[j]]],
   {j, 1, nsim}], {i, 1, ntheta}];
absboot = Map[Abs, bootstat];
N[Table[Mean[Table[If[absboot[[i]][[j]] > 1.645, 1, 0],
   {j, 1, nsim}]], {i, 1, ntheta}]]

(* Calculation of median test: use the score variance at the root density (normal);
   the asymptotic sd of the median is 1/(2 Sqrt[n] f(theta0)) *)
med = Map[Median, data];
medsd = 1/(2*Sqrt[n]*f1[theta0]);
medstat = Table[Table[(med[[j]] + theta[[i]] - theta0)/medsd,
   {j, 1, nsim}], {i, 1, ntheta}];
absmed = Map[Abs, medstat];
N[Table[Mean[Table[If[absmed[[i]][[j]] > 1.645, 1, 0],
   {j, 1, nsim}]], {i, 1, ntheta}]]

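10.41 a. The log likelihood is

$$\log L = nr\log p + n\bar x\log(1-p)$$

with

$$\frac{d}{dp}\log L = \frac{nr}{p} - \frac{n\bar x}{1-p} \quad\text{and}\quad \frac{d^2}{dp^2}\log L = -\frac{nr}{p^2} - \frac{n\bar x}{(1-p)^2},$$

expected information $\frac{nr}{p^2(1-p)}$ and (Wilks) score test statistic

$$\frac{\frac{nr}{p} - \frac{n\bar x}{1-p}}{\sqrt{\frac{nr}{p^2(1-p)}}} = \sqrt{\frac{n}{r}}\;\frac{r(1-p) - p\bar x}{\sqrt{1-p}}.$$

Since this is approximately n(0,1), a $1-\alpha$ confidence set is

$$\left\{p : \left|\sqrt{\frac{n}{r}}\;\frac{r(1-p) - p\bar x}{\sqrt{1-p}}\right| \le z_{\alpha/2}\right\}.$$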


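b. The mean is $\mu = r(1-p)/p$, and a little algebra will verify that the variance, $r(1-p)/p^2$, can be written $r(1-p)/p^2 = \mu + \mu^2/r$. Thus

$$\sqrt{\frac{n}{r}}\;\frac{r(1-p) - p\bar x}{\sqrt{1-p}} = \sqrt{n}\;\frac{\mu - \bar x}{\sqrt{\mu + \mu^2/r}}.$$

The confidence interval is found by setting this equal to $\pm z_{\alpha/2}$, squaring both sides, and solving the quadratic $(nr - z^2_{\alpha/2})\mu^2 - r(2n\bar x + z^2_{\alpha/2})\mu + nr\bar x^2 = 0$ for $\mu$. The endpoints of the interval are

$$\frac{r(2n\bar x + z^2_{\alpha/2}) \pm \sqrt{r z^2_{\alpha/2}}\,\sqrt{4nr\bar x + 4n\bar x^2 + r z^2_{\alpha/2}}}{2(nr - z^2_{\alpha/2})}.$$

For the continuity correction, replace $\bar x$ with $\bar x + 1/(2n)$ when solving for the upper endpoint, and with $\bar x - 1/(2n)$ when solving for the lower endpoint.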
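The endpoint formula can be checked numerically. The following R sketch is only an illustration. The values of $\bar x$, $n$, and $r$ are hypothetical and are not the data behind the table in part (c). The names `nb_score_ci()` and `score_stat()` are introduced here. The sketch computes the two roots and confirms that the score statistic equals $-z_{\alpha/2}$ and $+z_{\alpha/2}$ at them. It also checks that the roots approach the Poisson score interval as $r$ grows.

```r
# Score statistic of Exercise 10.41(b) as a function of mu
score_stat <- function(mu, xbar, n, r) sqrt(n) * (mu - xbar) / sqrt(mu + mu^2 / r)

# Closed-form roots of (n r - z^2) mu^2 - r (2 n xbar + z^2) mu + n r xbar^2 = 0
nb_score_ci <- function(xbar, n, r, alpha = 0.1) {
  z <- qnorm(1 - alpha / 2)
  center <- r * (2 * n * xbar + z^2)
  half <- z * sqrt(r) * sqrt(4 * n * r * xbar + 4 * n * xbar^2 + r * z^2)
  denom <- 2 * (n * r - z^2)                 # requires n r > z^2
  c(lower = (center - half) / denom, upper = (center + half) / denom)
}

z <- qnorm(0.95)
ci <- nb_score_ci(xbar = 5, n = 4, r = 3)    # hypothetical inputs
print(ci)
print(score_stat(ci, xbar = 5, n = 4, r = 3)) # approximately -z and +z

# Poisson (r = infinity) score interval for comparison
pois <- (2 * 4 * 5 + z^2 + c(-1, 1) * z * sqrt(4 * 4 * 5 + z^2)) / (2 * 4)
print(rbind(large_r = nb_score_ci(5, 4, 1e8), poisson = pois))
```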

c. We table the endpoints for $\alpha = .1$ and a range of values of $r$. Note that $r = \infty$ is the Poisson, and smaller values of $r$ give a wider tail to the negative binomial distribution.

r      lower bound   upper bound
1      22.1796       364.42
5      36.2315       107.99
10     38.4565       95.28
50     40.6807       85.71
100    41.0015       84.53
1000   41.3008       83.46
∞      41.3348       83.34

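10.43 a. Since

$$P\left(\sum X_i = 0\right) = (1-p)^n = \alpha/2 \;\Rightarrow\; p = 1 - (\alpha/2)^{1/n}$$

and

$$P\left(\sum X_i = n\right) = p^n = \alpha/2 \;\Rightarrow\; p = (\alpha/2)^{1/n},$$

these endpoints are exact, and are the shortest possible.

b. Since $p \in [0,1]$, any value outside has zero probability, so truncating the interval shortens it at no cost.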
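Here is a quick R check of these endpoints, using hypothetical values $n = 10$ and $\alpha = .1$. Each endpoint solves its tail equation. The same numbers also come out of the beta-quantile (Clopper-Pearson) formulas at $x = 0$ and $x = n$.

```r
n <- 10; alpha <- 0.1                        # hypothetical choices
upper_x0 <- 1 - (alpha / 2)^(1 / n)          # upper endpoint when sum(X_i) = 0
lower_xn <- (alpha / 2)^(1 / n)              # lower endpoint when sum(X_i) = n
print(c(upper_x0 = upper_x0, lower_xn = lower_xn))

# Each endpoint solves its tail equation: P(sum X = 0) = P(sum X = n) = alpha/2
print(c(dbinom(0, n, upper_x0), dbinom(n, n, lower_xn)))

# The same values from beta quantiles (Clopper-Pearson form at x = 0 and x = n)
print(c(qbeta(1 - alpha / 2, 1, n), qbeta(alpha / 2, n, 1)))
```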

10.45 The continuity-corrected roots are

$$\frac{2\hat p + z^2_{\alpha/2}/n \pm \frac{1}{n} \pm \sqrt{\frac{z^2_{\alpha/2}}{n^3}\left[\pm 2n(1-2\hat p) - 1\right] + \left(2\hat p + z^2_{\alpha/2}/n\right)^2 - 4\hat p^2\left(1 + z^2_{\alpha/2}/n\right)}}{2\left(1 + z^2_{\alpha/2}/n\right)}$$

where we use the upper sign for the upper root and the lower sign for the lower root. Note that the only differences between the continuity-corrected intervals and the ordinary score intervals are the terms with $\pm$ in front. But it is still difficult to compare lengths analytically with the non-corrected interval, so we do a numerical comparison. For $n = 10$ and $\alpha = .1$ we have the following table of length ratios, by observed count $x$, with the continuity-corrected length in the denominator:

x      0    1    2    3    4    5    6    7    8    9    10
Ratio  0.79 0.82 0.84 0.85 0.86 0.86 0.86 0.85 0.84 0.82 0.79

The coverage probabilities are

p      0    .1   .2   .3   .4   .5   .6   .7   .8   .9   1
score  .99  .93  .97  .92  .90  .89  .90  .92  .97  .93  .99
cc     .99  .99  .97  .92  .98  .98  .98  .92  .97  .99  .99

Mathematica code to do the calculations is:

Needs["Statistics`Master`"]
Clear[p, x]
n = 10;
pbino[p_, x_] := PDF[BinomialDistribution[n, p], x];
cut = 1.645^2;

(* The quadratic score interval with and without continuity correction *)
slowcc[x_] := p /. FindRoot[(x/n - 1/(2*n) - p)^2 == cut*p*(1 - p)/n, {p, .001}]
supcc[x_] := p /. FindRoot[(x/n + 1/(2*n) - p)^2 == cut*p*(1 - p)/n, {p, .999}]
slow[x_] := p /. FindRoot[(x/n - p)^2 == cut*p*(1 - p)/n, {p, .001}]
sup[x_] := p /. FindRoot[(x/n - p)^2 == cut*p*(1 - p)/n, {p, .999}]
scoreintcc = Partition[Flatten[{{0, sup[0]}, Table[{slowcc[i], supcc[i]}, {i, 1, n - 1}], {slowcc[n], 1}}, 2], 2];
scoreint = Partition[Flatten[{{0, sup[0]}, Table[{slow[i], sup[i]}, {i, 1, n - 1}], {slow[n], 1}}, 2], 2];

(* Length comparison *)
Table[(sup[i] - slow[i])/(supcc[i] - slowcc[i]), {i, 0, n}]

(* Now we calculate coverage probabilities *)
scoreindcc[p_, x_] := If[scoreintcc[[x + 1]][[1]] <= p <= scoreintcc[[x + 1]][[2]], 1, 0]
scorecovcc[p_] := scorecovcc[p] = Sum[pbino[p, x]*scoreindcc[p, x], {x, 0, n}]
scoreind[p_, x_] := If[scoreint[[x + 1]][[1]] <= p <= scoreint[[x + 1]][[2]], 1, 0]
scorecov[p_] := scorecov[p] = Sum[pbino[p, x]*scoreind[p, x], {x, 0, n}]
{scorecovcc[.0001], Table[scorecovcc[i/10], {i, 1, 9}], scorecovcc[.9999]} // N
{scorecov[.0001], Table[scorecov[i/10], {i, 1, 9}], scorecov[.9999]} // N
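The same length ratios and coverage probabilities can be sketched in R by solving the quadratic $(c - p)^2 = z^2 p(1-p)/n$ in closed form instead of with `FindRoot`. This is an illustrative reimplementation, not the code used for the tables, and `score_int()` is a hypothetical helper. Unlike the Mathematica code, it uses the quadratic roots at $x = 0$ and $x = n$ as well, so coverage very near $p = 0$ or $p = 1$ may differ from the table.

```r
# Score interval for binomial p, with or without continuity correction,
# from the roots of (1 + z^2/n) p^2 - (2c + z^2/n) p + c^2 = 0
score_int <- function(x, n, z, cc = FALSE) {
  a <- 1 + z^2 / n
  root <- function(cpt, sgn) {
    b <- 2 * cpt + z^2 / n
    (b + sgn * sqrt(b^2 - 4 * a * cpt^2)) / (2 * a)
  }
  d <- if (cc) 1 / (2 * n) else 0
  cbind(lower = root(x / n - d, -1), upper = root(x / n + d, 1))
}

n <- 10; z <- 1.645; x <- 0:n
plain <- score_int(x, n, z)
corr <- score_int(x, n, z, cc = TRUE)
ratio <- (plain[, "upper"] - plain[, "lower"]) / (corr[, "upper"] - corr[, "lower"])
print(round(ratio, 2))

# Coverage probability at p: sum of binomial probabilities of the x whose interval covers p
coverage <- function(p, ints) sum(dbinom(x, n, p) * (ints[, "lower"] <= p & p <= ints[, "upper"]))
print(c(score = coverage(0.5, plain), cc = coverage(0.5, corr)))
```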

10.47 a. Since $2pY \sim \chi^2_{nr}$ (approximately),

$$P\left(\chi^2_{nr,1-\alpha/2} \le 2pY \le \chi^2_{nr,\alpha/2}\right) = 1-\alpha,$$

and rearrangement gives the interval.

b. The interval is of the form $\{p : a/(2Y) \le p \le b/(2Y)\}$, so the length is proportional to $b - a$. This must be minimized subject to the constraint $\int_a^b f(y)\,dy = 1-\alpha$, where $f(y)$ is the pdf of a $\chi^2_{nr}$. Treat $b$ as a function of $a$. Setting the derivative of the length to zero gives $b' - 1 = 0$, and differentiating the constraint gives $f(b)b' - f(a) = 0$. Together these imply that we need $f(b) = f(a)$.
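The condition $f(a) = f(b)$, together with the probability constraint, can be solved numerically. Here is a minimal R sketch, assuming a hypothetical $nr = 10$ and $\alpha = .1$. The helper name `shortest_chisq()` is introduced here. The sketch also compares the length $b - a$ with the equal-tailed choice of part (a).

```r
# Shortest [a, b] with P(a <= X <= b) = 1 - alpha for X ~ chisq(df):
# for each a, b is fixed by the constraint, and we solve f(a) = f(b) for a
shortest_chisq <- function(df, alpha) {
  b_of_a <- function(a) qchisq(pchisq(a, df) + 1 - alpha, df)
  g <- function(a) dchisq(b_of_a(a), df) - dchisq(a, df)
  a <- uniroot(g, c(1e-8, qchisq(alpha, df) - 1e-8), tol = 1e-12)$root
  c(a = a, b = b_of_a(a))
}

df <- 10; alpha <- 0.1                        # df = nr is a hypothetical value
ab <- shortest_chisq(df, alpha)
equal_tail <- qchisq(c(alpha / 2, 1 - alpha / 2), df)
print(rbind(shortest = ab, equal_tail = equal_tail))
print(c(shortest_length = diff(ab), equal_tail_length = diff(equal_tail)))
```

The interval for $p$ is then $[a/(2Y), b/(2Y)]$ with these cutoffs.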

Chapter 11

Analysis of Variance and Regression

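11.1 a. The first order Taylor's series approximation is

$$\mathrm{Var}[g(Y)] \approx [g'(\theta)]^2 \cdot \mathrm{Var}\,Y = [g'(\theta)]^2 \cdot v(\theta).$$

b. If we choose $g(y) = g^*(y) = \int^y \frac{1}{\sqrt{v(x)}}\,dx$, then

$$\frac{dg^*(\theta)}{d\theta} = \frac{d}{d\theta}\int^\theta \frac{1}{\sqrt{v(x)}}\,dx = \frac{1}{\sqrt{v(\theta)}},$$

by the Fundamental Theorem of Calculus. Then, for any $\theta$,

$$\mathrm{Var}[g^*(Y)] \approx \left(\frac{1}{\sqrt{v(\theta)}}\right)^2 v(\theta) = 1.$$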

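11.2 a. $v(\lambda) = \lambda$, $g^*(y) = \sqrt{y}$, $\frac{dg^*(\lambda)}{d\lambda} = \frac{1}{2\sqrt{\lambda}}$, and $\mathrm{Var}\,g^*(Y) \approx \left(\frac{dg^*(\lambda)}{d\lambda}\right)^2 \cdot v(\lambda) = 1/4$, independent of $\lambda$.

b. To use the Taylor's series approximation, we need to express everything in terms of $\theta = \mathrm{E}Y = np$. Then $v(\theta) = \theta(1-\theta/n)$ and, with $g^*(y) = \sin^{-1}\sqrt{y/n}$,

$$\left(\frac{dg^*(\theta)}{d\theta}\right)^2 = \left(\frac{1}{\sqrt{1-\theta/n}} \cdot \frac{1}{2\sqrt{\theta/n}} \cdot \frac{1}{n}\right)^2 = \frac{1}{4n\theta(1-\theta/n)}.$$

Therefore

$$\mathrm{Var}[g^*(Y)] \approx \left(\frac{dg^*(\theta)}{d\theta}\right)^2 v(\theta) = \frac{1}{4n},$$

independent of $\theta$, that is, independent of $p$.

c. $v(\theta) = K\theta^2$ and, with $g^*(y) = \log y$, $\frac{dg^*(\theta)}{d\theta} = \frac{1}{\theta}$ and $\mathrm{Var}[g^*(Y)] \approx \frac{1}{\theta^2}\cdot K\theta^2 = K$, independent of $\theta$.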

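11.3 a. $g^*_\lambda(y) = (y^\lambda - 1)/\lambda$ is clearly continuous with the possible exception of $\lambda = 0$. For that value use l'Hôpital's rule to get

$$\lim_{\lambda\to 0}\frac{y^\lambda - 1}{\lambda} = \lim_{\lambda\to 0}\frac{(\log y)\,y^\lambda}{1} = \log y.$$

b. From Exercise 11.1, we want to find $v(y)$ that satisfies

$$\frac{y^\lambda - 1}{\lambda} = \int^y \frac{1}{\sqrt{v(x)}}\,dx.$$

Taking derivatives

$$\frac{d}{dy}\,\frac{y^\lambda - 1}{\lambda} = y^{\lambda-1} = \frac{d}{dy}\int^y \frac{1}{\sqrt{v(x)}}\,dx = \frac{1}{\sqrt{v(y)}}.$$

Thus $v(y) = y^{-2(\lambda-1)}$. From Exercise 11.1,

$$\mathrm{Var}\left(\frac{Y^\lambda - 1}{\lambda}\right) \approx \left(\frac{d}{d\theta}\,\frac{\theta^\lambda - 1}{\lambda}\right)^2 v(\theta) = \theta^{2(\lambda-1)}\theta^{-2(\lambda-1)} = 1.$$

Note: If $\lambda = 1/2$, $v(\theta) = \theta$, which agrees with Exercise 11.2(a). If $\lambda = 0$ (the log transformation, by part (a)), then $v(\theta) = \theta^2$, which agrees with Exercise 11.2(c).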

11.5 For the model

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1,\ldots,k,\; j = 1,\ldots,n_i,$$

take $k = 2$. The two parameter configurations

$$(\mu,\tau_1,\tau_2) = (10, 5, 2), \qquad (\mu,\tau_1,\tau_2) = (7, 8, 5)$$

have the same values for $\mu + \tau_1$ and $\mu + \tau_2$, so they give the same distributions for $Y_1$ and $Y_2$.

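11.6 a. Under the ANOVA assumptions $Y_{ij} = \theta_i + \epsilon_{ij}$, where $\epsilon_{ij} \sim$ independent $\mathrm{n}(0,\sigma^2)$, so $Y_{ij} \sim$ independent $\mathrm{n}(\theta_i,\sigma^2)$. Therefore the sample pdf is

$$\begin{aligned}
\prod_{i=1}^k\prod_{j=1}^{n_i}(2\pi\sigma^2)^{-1/2}e^{-\frac{(y_{ij}-\theta_i)^2}{2\sigma^2}}
&= (2\pi\sigma^2)^{-\Sigma n_i/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij}-\theta_i)^2\right\}\\
&= (2\pi\sigma^2)^{-\Sigma n_i/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^k n_i\theta_i^2\right\}
\times\exp\left\{-\frac{1}{2\sigma^2}\sum_i\sum_j y_{ij}^2 + \frac{2}{2\sigma^2}\sum_{i=1}^k\theta_i n_i\bar y_{i\cdot}\right\}.
\end{aligned}$$

Therefore, by the Factorization Theorem,

$$\left(\bar Y_{1\cdot}, \bar Y_{2\cdot}, \ldots, \bar Y_{k\cdot}, \sum_i\sum_j Y_{ij}^2\right)$$

is jointly sufficient for $\theta_1,\ldots,\theta_k,\sigma^2$. Since $(\bar Y_{1\cdot},\ldots,\bar Y_{k\cdot}, S_p^2)$ is a 1-to-1 function of this vector, $(\bar Y_{1\cdot},\ldots,\bar Y_{k\cdot}, S_p^2)$ is also jointly sufficient.

b. We can write

$$\begin{aligned}
(2\pi\sigma^2)^{-\Sigma n_i/2}&\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij}-\theta_i)^2\right\}\\
&= (2\pi\sigma^2)^{-\Sigma n_i/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^k\sum_{j=1}^{n_i}\left([y_{ij}-\bar y_{i\cdot}] + [\bar y_{i\cdot}-\theta_i]\right)^2\right\}\\
&= (2\pi\sigma^2)^{-\Sigma n_i/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^k\sum_{j=1}^{n_i}[y_{ij}-\bar y_{i\cdot}]^2\right\}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^k n_i[\bar y_{i\cdot}-\theta_i]^2\right\},
\end{aligned}$$

so, by the Factorization Theorem, $\bar Y_{i\cdot}$, $i = 1,\ldots,k$, is independent of $Y_{ij} - \bar Y_{i\cdot}$, $j = 1,\ldots,n_i$, so $S_p^2$ is independent of each $\bar Y_{i\cdot}$.

c. Just identify $n_i\bar Y_{i\cdot}$ with $X_i$ and redefine $\theta_i$ as $n_i\theta_i$.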


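11.7 Let $U_i = \bar Y_{i\cdot} - \theta_i$. Then

$$\sum_{i=1}^k n_i\left[(\bar Y_{i\cdot} - \bar{\bar Y}) - (\theta_i - \bar\theta)\right]^2 = \sum_{i=1}^k n_i(U_i - \bar U)^2.$$

The $U_i$ are clearly $\mathrm{n}(0, \sigma^2/n_i)$. For $k = 2$ we have

$$\begin{aligned}
S_2^2 &= n_1(U_1-\bar U)^2 + n_2(U_2-\bar U)^2\\
&= n_1\left(U_1 - \frac{n_1U_1+n_2U_2}{n_1+n_2}\right)^2 + n_2\left(U_2 - \frac{n_1U_1+n_2U_2}{n_1+n_2}\right)^2\\
&= (U_1-U_2)^2\left[\frac{n_1n_2^2}{(n_1+n_2)^2} + \frac{n_2n_1^2}{(n_1+n_2)^2}\right]\\
&= \frac{(U_1-U_2)^2}{\frac{1}{n_1}+\frac{1}{n_2}}.
\end{aligned}$$

Since $U_1 - U_2 \sim \mathrm{n}(0, \sigma^2(1/n_1+1/n_2))$, $S_2^2/\sigma^2 \sim \chi^2_1$. Let $\bar U_k$ be the weighted mean of $k$ $U_i$s, and note that

$$\bar U_{k+1} = \bar U_k + \frac{n_{k+1}}{N_{k+1}}(U_{k+1} - \bar U_k),$$

where $N_k = \sum_{j=1}^k n_j$. Then

$$\begin{aligned}
S_{k+1}^2 = \sum_{i=1}^{k+1} n_i(U_i - \bar U_{k+1})^2
&= \sum_{i=1}^{k+1} n_i\left[(U_i - \bar U_k) - \frac{n_{k+1}}{N_{k+1}}(U_{k+1}-\bar U_k)\right]^2\\
&= S_k^2 + \frac{n_{k+1}N_k}{N_{k+1}}(U_{k+1}-\bar U_k)^2,
\end{aligned}$$

where we have expanded the square, noted that the cross-term (summed up to $k$) is zero, and did a boat-load of algebra. Now since

$$U_{k+1} - \bar U_k \sim \mathrm{n}\left(0, \sigma^2\left(\frac{1}{n_{k+1}} + \frac{1}{N_k}\right)\right) = \mathrm{n}\left(0, \sigma^2\frac{N_{k+1}}{n_{k+1}N_k}\right),$$

independent of $S_k^2$, the rest of the argument is the same as in the proof of Theorem 5.3.1(c).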
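The recursion for $S^2_{k+1}$ is easy to verify numerically. This R sketch uses hypothetical group sizes and simulated $U_i$ (taking $\sigma^2 = 1$). It compares the recursive and direct computations of the weighted sum of squares.

```r
set.seed(7)
nvec <- c(3, 5, 2, 6)                              # hypothetical group sizes n_1..n_k
U <- rnorm(length(nvec), sd = 1 / sqrt(nvec))      # U_i ~ n(0, 1/n_i)

# Direct computation of sum n_i (U_i - Ubar)^2 with the weighted mean
ubar_all <- sum(nvec * U) / sum(nvec)
S2_direct <- sum(nvec * (U - ubar_all)^2)

# Recursion: S2_{k+1} = S2_k + n_{k+1} N_k / N_{k+1} * (U_{k+1} - Ubar_k)^2
S2 <- 0; Ubar <- U[1]; N <- nvec[1]
for (k in seq_len(length(nvec) - 1)) {
  n1 <- nvec[k + 1]; N1 <- N + n1
  S2 <- S2 + n1 * N / N1 * (U[k + 1] - Ubar)^2     # uses the old weighted mean Ubar_k
  Ubar <- Ubar + n1 / N1 * (U[k + 1] - Ubar)       # update to Ubar_{k+1}
  N <- N1
}
print(c(recursive = S2, direct = S2_direct))
```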

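11.8 Under the oneway ANOVA assumptions, $Y_{ij} \sim$ independent $\mathrm{n}(\theta_i,\sigma^2)$. Therefore

$$\begin{aligned}
\bar Y_{i\cdot} &\sim \mathrm{n}(\theta_i, \sigma^2/n_i) \qquad (Y_{ij}\text{'s are independent with common } \sigma^2),\\
a_i\bar Y_{i\cdot} &\sim \mathrm{n}(a_i\theta_i, a_i^2\sigma^2/n_i),\\
\sum_{i=1}^k a_i\bar Y_{i\cdot} &\sim \mathrm{n}\left(\sum a_i\theta_i, \sigma^2\sum_{i=1}^k a_i^2/n_i\right).
\end{aligned}$$

All these distributions follow from Corollary 4.6.10.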

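11.9 a. From Exercise 11.8,

$$T = \sum a_i\bar Y_i \sim \mathrm{n}\left(\sum a_i\theta_i, \sigma^2\sum a_i^2/n_i\right),$$

and under $H_0$, $\mathrm{E}T = \delta$. Thus, under $H_0$,

$$\frac{\sum a_i\bar Y_i - \delta}{\sqrt{S_p^2\sum a_i^2/n_i}} \sim t_{N-k},$$

where $N = \sum n_i$. Therefore, the test is to reject $H_0$ if

$$\frac{\left|\sum a_i\bar Y_i - \delta\right|}{\sqrt{S_p^2\sum a_i^2/n_i}} > t_{N-k,\alpha/2}.$$

b. Similarly for $H_0: \sum a_i\theta_i \le \delta$ vs. $H_1: \sum a_i\theta_i > \delta$, we reject $H_0$ if

$$\frac{\sum a_i\bar Y_i - \delta}{\sqrt{S_p^2\sum a_i^2/n_i}} > t_{N-k,\alpha}.$$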
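As an illustration with simulated, hypothetical data, the contrast statistic can be computed directly in R. For the contrast $a = (-1, 1, 0)$ it coincides with the t value that `lm()` reports for the second treatment effect under R's default treatment coding. `contrast_t()` is a helper defined here, not a library function.

```r
set.seed(11)
g <- factor(rep(1:3, times = c(4, 6, 5)))                 # hypothetical unequal n_i
y <- rnorm(length(g), mean = c(10, 12, 11)[as.integer(g)])

contrast_t <- function(y, g, a, delta = 0) {
  ybar <- tapply(y, g, mean); ni <- tapply(y, g, length)
  N <- length(y); k <- nlevels(g)
  sp2 <- sum((y - ybar[as.integer(g)])^2) / (N - k)       # pooled variance S_p^2
  (sum(a * ybar) - delta) / sqrt(sp2 * sum(a^2 / ni))
}

a <- c(-1, 1, 0)
tstat <- contrast_t(y, g, a)
N <- length(y); k <- 3; alpha <- 0.05
print(c(t = tstat,
        cut_two_sided = qt(1 - alpha / 2, N - k),         # part (a)
        cut_one_sided = qt(1 - alpha, N - k)))            # part (b)
```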

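11.10 a. Let $H_0^i$, $i = 1,\ldots,4$, denote the null hypothesis using contrast $a_i$, of the form

$$H_0^i: \sum_j a_{ij}\theta_j \ge 0.$$

If $H_0^1$ is rejected, it indicates that the average of $\theta_2,\theta_3,\theta_4$, and $\theta_5$ is bigger than $\theta_1$, which is the control mean. If all $H_0^i$'s are rejected, it indicates that $\theta_5 > \theta_i$ for $i = 1,2,3,4$. To see this, suppose $H_0^3$ and $H_0^4$ are rejected. This means $\theta_5 > \frac{\theta_5+\theta_4}{2} > \theta_3$. The first inequality is implied by the rejection of $H_0^4$, and the second inequality is the rejection of $H_0^3$. A similar argument implies $\theta_5 > \theta_2$ and $\theta_5 > \theta_1$. But, for example, it does not mean that $\theta_4 > \theta_3$ or $\theta_3 > \theta_2$. It also indicates that

$$\frac{1}{2}(\theta_5+\theta_4) > \theta_3, \qquad \frac{1}{3}(\theta_5+\theta_4+\theta_3) > \theta_2, \qquad \frac{1}{4}(\theta_5+\theta_4+\theta_3+\theta_2) > \theta_1.$$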

b. In part a) all of the contrasts are orthogonal. For example,

$$\sum_{i=1}^5 a_{2i}a_{3i} = \left(0, 1, -\tfrac13, -\tfrac13, -\tfrac13\right)\begin{pmatrix}0\\0\\1\\-\tfrac12\\-\tfrac12\end{pmatrix} = -\frac13 + \frac16 + \frac16 = 0,$$

and this holds for all pairs of contrasts. Now, from Lemma 5.4.2,

$$\mathrm{Cov}\left(\sum_i a_{ji}\bar Y_{i\cdot}, \sum_i a_{j'i}\bar Y_{i\cdot}\right) = \frac{\sigma^2}{n}\sum_i a_{ji}a_{j'i},$$

which is zero because the contrasts are orthogonal. Note that the equal number of observations per treatment is important, since if $n_i \ne n_{i'}$ for some $i, i'$, then

$$\mathrm{Cov}\left(\sum_{i=1}^k a_{ji}\bar Y_i, \sum_{i=1}^k a_{j'i}\bar Y_i\right) = \sum_{i=1}^k a_{ji}a_{j'i}\frac{\sigma^2}{n_i} = \sigma^2\sum_{i=1}^k\frac{a_{ji}a_{j'i}}{n_i} \ne 0.$$

c. This is not a set of orthogonal contrasts because, for example, $a_1 \times a_2 = -1$. However, each contrast can be interpreted meaningfully in the context of the experiment. For example, $a_1$ tests the effect of potassium alone, while $a_5$ looks at the effect of adding zinc to potassium.

11.11 This is a direct consequence of Lemma 5.3.3.

11.12 a. This is a special case of (11.2.6) and (11.2.7).


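b. From Exercise 5.8(a) we know that

$$s^2 = \frac{1}{k-1}\sum_{i=1}^k(\bar y_{i\cdot} - \bar{\bar y})^2 = \frac{1}{2k(k-1)}\sum_{i,i'}(\bar y_{i\cdot} - \bar y_{i'\cdot})^2.$$

Then

$$\frac{1}{k(k-1)}\sum_{i,i'}t_{ii'}^2 = \frac{1}{2k(k-1)}\sum_{i,i'}\frac{(\bar y_{i\cdot} - \bar y_{i'\cdot})^2}{s_p^2/n} = \frac{\sum_{i=1}^k(\bar y_{i\cdot} - \bar{\bar y})^2}{(k-1)s_p^2/n} = \frac{\sum_i n(\bar y_{i\cdot} - \bar{\bar y})^2/(k-1)}{s_p^2},$$

which is distributed as $F_{k-1,N-k}$ under $H_0: \theta_1 = \cdots = \theta_k$. Note that

$$\sum_{i,i'}t_{ii'}^2 = \sum_{i=1}^k\sum_{i'=1}^k t_{ii'}^2,$$

therefore $t_{ii'}^2$ and $t_{i'i}^2$ are both included, which is why the divisor is $k(k-1)$, not $\frac{k(k-1)}{2} = \binom{k}{2}$. Also, to use the result of Example 5.9(a), we treated each mean $\bar Y_{i\cdot}$ as an observation, with overall mean $\bar{\bar Y}$. This is true for equal sample sizes.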
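Here is a short R check that the average of the $t_{ii'}^2$ over ordered pairs equals the ANOVA $F$ statistic reported by `anova()`. It uses simulated data with hypothetical values $k = 4$ and $n = 5$ per group.

```r
set.seed(12)
k <- 4; n <- 5                                         # hypothetical equal sample sizes
g <- factor(rep(1:k, each = n))
y <- rnorm(k * n, mean = rep(c(0, 0.5, 1, 1.5), each = n))

ybar <- tapply(y, g, mean)
sp2 <- sum((y - ybar[as.integer(g)])^2) / (k * n - k)  # pooled variance s_p^2
t2 <- outer(ybar, ybar, "-")^2 / (2 * sp2 / n)         # t^2_{ii'} for all ordered pairs
avg_t2 <- sum(t2) / (k * (k - 1))                      # diagonal entries are zero

Fstat <- anova(lm(y ~ g))[["F value"]][1]
print(c(avg_t2 = avg_t2, F = Fstat))
```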

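11.13 a.

$$L(\theta|y) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} e^{-\frac12\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij}-\theta_i)^2/\sigma^2},$$

where $N = \sum n_i$. Note that

$$\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij}-\theta_i)^2 = \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij}-\bar y_{i\cdot})^2 + \sum_{i=1}^k n_i(\bar y_{i\cdot}-\theta_i)^2 = \mathrm{SSW} + \sum_{i=1}^k n_i(\bar y_{i\cdot}-\theta_i)^2,$$

and the LRT statistic is

$$\lambda = (\hat\tau^2/\hat\tau_0^2)^{N/2},$$

where

$$\hat\tau^2 = \mathrm{SSW} \quad\text{and}\quad \hat\tau_0^2 = \mathrm{SSW} + \sum n_i(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = \mathrm{SSW} + \mathrm{SSB}.$$

Thus $\lambda < k$ if and only if SSB/SSW is large, which is equivalent to the $F$ test.

b. The error probabilities of the test are a function of the $\theta_i$s only through $\eta = \sum\theta_i^2$. The distribution of $F$ is that of a ratio of chi squared random variables, with the numerator being noncentral (dependent on $\eta$). Thus the probability of rejecting $H_0$ (the power) is given by

$$P(F > k|\eta) = P\left(\frac{\chi^2_{k-1}(\eta)/(k-1)}{\chi^2_{N-k}/(N-k)} > k\right) \ge P\left(\frac{\chi^2_{k-1}(0)/(k-1)}{\chi^2_{N-k}/(N-k)} > k\right) = \alpha,$$

where the inequality follows from the fact that the noncentral chi squared is stochastically increasing in the noncentrality parameter.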
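The monotonicity in $\eta$ can be seen numerically with R's noncentral $F$ distribution, which is `pf()` with its `ncp` argument. The values of $k$, $N$, and $\alpha$ below are hypothetical. R's `ncp` is used here in the role of $\eta$; noncentrality conventions differ by a factor of 2 between some texts, but the monotonicity does not depend on that.

```r
k <- 3; N <- 15; alpha <- 0.05                     # hypothetical design and level
cutoff <- qf(1 - alpha, k - 1, N - k)              # level-alpha F cutoff
eta <- c(0, 1, 2, 5, 10, 20)                       # noncentrality values
pow <- pf(cutoff, k - 1, N - k, ncp = eta, lower.tail = FALSE)
print(round(rbind(eta = eta, power = pow), 4))     # starts at alpha, then increases
```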


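11.14 Let $X_i \sim \mathrm{n}(\theta_i,\sigma^2)$. Then from Exercise 11.11

$$\mathrm{Cov}\left(\sum_i\frac{a_i}{\sqrt{c_i}}X_i, \sum_i\sqrt{c_i}v_iX_i\right) = \sigma^2\sum a_iv_i,$$

$$\mathrm{Var}\left(\sum_i\frac{a_i}{\sqrt{c_i}}X_i\right) = \sigma^2\sum\frac{a_i^2}{c_i}, \qquad \mathrm{Var}\left(\sum_i\sqrt{c_i}v_iX_i\right) = \sigma^2\sum c_iv_i^2,$$

and the Cauchy-Schwarz inequality gives

$$\left(\sum a_iv_i\right)^2\Big/\sum\frac{a_i^2}{c_i} \le \sum c_iv_i^2.$$

If $a_i = c_iv_i$ this is an equality, hence the LHS is maximized. The simultaneous statement is equivalent to

$$\frac{\left(\sum_{i=1}^k a_i(\bar y_{i\cdot}-\theta_i)\right)^2}{s_p^2\sum_{i=1}^k a_i^2/n_i} \le M \quad\text{for all } a_1,\ldots,a_k,$$

and the LHS is maximized by $a_i = n_i(\bar y_{i\cdot}-\theta_i)$. This produces the $F$ statistic.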

11.15 a. Since $t_\nu^2 = F_{1,\nu}$, it follows from Exercise 5.19(b) that for $k \ge 2$

$$P[(k-1)F_{k-1,\nu} \ge a] \ge P(t_\nu^2 \ge a).$$

So if $a = t^2_{\nu,\alpha/2}$, the $F$ probability is greater than $\alpha$, and thus the $\alpha$-level cutoff for the $F$ must be greater than $t^2_{\nu,\alpha/2}$.

b. The only difference in the intervals is the cutoff point, so the Scheffé intervals are wider.

c. Both sets of intervals have nominal level $1-\alpha$, but since the Scheffé intervals are wider, tests based on them have a smaller rejection region. In fact, the rejection region is contained in the $t$ rejection region. So the $t$ is more powerful.
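The cutoff comparison of part (a) can be tabulated in R for hypothetical values $\nu = 12$ and $\alpha = .05$. For $k = 2$ the two squared cutoffs coincide ($t^2_\nu = F_{1,\nu}$). For $k \ge 3$ the Scheffé cutoff $(k-1)F_{k-1,\nu,\alpha}$ is larger.

```r
nu <- 12; alpha <- 0.05                             # hypothetical error df and level
k <- 2:6
scheffe2 <- (k - 1) * qf(1 - alpha, k - 1, nu)      # squared Scheffe cutoff
t2 <- qt(1 - alpha / 2, nu)^2                       # squared two-sided t cutoff
print(rbind(k = k, scheffe_sq = round(scheffe2, 3), t_sq = round(t2, 3)))
```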

11.16 a. If $\theta_i = \theta_j$ for all $i, j$, then $\theta_i - \theta_j = 0$ for all $i, j$, and the converse is also true.

b. $H_0: \theta \in \cap_{ij}\Theta_{ij}$ and $H_1: \theta \in \cup_{ij}(\Theta_{ij})^c$.

11.17 a. If all of the means are equal, the Scheffé test will only reject $\alpha$ of the time, so the $t$ tests will be done only $\alpha$ of the time. The experimentwise error rate is preserved.

b. This follows from the fact that the $t$ tests use a smaller cutoff point, so there can be rejection using the $t$ test but no rejection using Scheffé. Since Scheffé has experimentwise level $\alpha$, the $t$ test has experimentwise error greater than $\alpha$.

c. The pooled standard deviation is 2.358, and the means and $t$ statistics are

Mean                        t statistic
Low     Medium   High       Med-Low   High-Med   High-Low
3.51    9.27     24.93      3.86      10.49      14.36

The $t$ statistics all have 12 degrees of freedom and, for example, $t_{12,.01} = 2.68$, so all of the tests reject and we conclude that the means are all significantly different.

11.18 a.

$$\begin{aligned}
P(Y > a|Y > b) &= P(Y > a, Y > b)/P(Y > b)\\
&= P(Y > a)/P(Y > b) \qquad (a > b)\\
&> P(Y > a). \qquad (P(Y > b) < 1)
\end{aligned}$$

b. If $a$ is a cutoff point then we would declare significance if $Y > a$. But if we only check whether $Y$ is significant because we see a big $Y$ ($Y > b$), the proper significance level is $P(Y > a|Y > b)$, which will show less significance than $P(Y > a)$.


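11.19 a. The marginal distributions of the $Y_i$ are somewhat straightforward to derive. As $X_{i+1} \sim \mathrm{gamma}(\lambda_{i+1}, 1)$ and, independently, $\sum_{j=1}^i X_j \sim \mathrm{gamma}(\sum_{j=1}^i\lambda_j, 1)$ (Example 4.6.8), we only need to derive the distribution of the ratio of two independent gammas. Let $X \sim \mathrm{gamma}(\lambda_1, 1)$ and $Y \sim \mathrm{gamma}(\lambda_2, 1)$. Make the transformation

$$u = x/y, \quad v = y \quad\Rightarrow\quad x = uv, \quad y = v,$$

with Jacobian $v$. The density of $(U, V)$ is

$$f(u,v) = \frac{1}{\Gamma(\lambda_1)\Gamma(\lambda_2)}(uv)^{\lambda_1-1}v^{\lambda_2-1}\,v\,e^{-uv}e^{-v} = \frac{u^{\lambda_1-1}}{\Gamma(\lambda_1)\Gamma(\lambda_2)}v^{\lambda_1+\lambda_2-1}e^{-v(1+u)}.$$

To get the density of $U$, integrate with respect to $v$. Note that we have the kernel of a $\mathrm{gamma}(\lambda_1+\lambda_2, 1/(1+u))$, which yields

$$f(u) = \frac{\Gamma(\lambda_1+\lambda_2)}{\Gamma(\lambda_1)\Gamma(\lambda_2)}\frac{u^{\lambda_1-1}}{(1+u)^{\lambda_1+\lambda_2}}.$$

The joint distribution is a nightmare. We have to make a multivariate change of variable. This is made a bit more palatable if we do it in two steps. First transform

$$W_1 = X_1,\; W_2 = X_1 + X_2,\; W_3 = X_1+X_2+X_3,\; \ldots,\; W_n = X_1 + X_2 + \cdots + X_n,$$

with

$$X_1 = W_1,\; X_2 = W_2 - W_1,\; X_3 = W_3 - W_2,\; \ldots,\; X_n = W_n - W_{n-1},$$

and Jacobian 1. The joint density of the $W_i$ is

$$f(w_1, w_2, \ldots, w_n) = \prod_{i=1}^n\frac{1}{\Gamma(\lambda_i)}(w_i - w_{i-1})^{\lambda_i-1}e^{-w_n}, \qquad w_1 \le w_2 \le \cdots \le w_n,$$

where we set $w_0 = 0$ and note that the exponent telescopes. Next note that

$$y_1 = \frac{w_2-w_1}{w_1},\; y_2 = \frac{w_3-w_2}{w_2},\; \ldots,\; y_{n-1} = \frac{w_n-w_{n-1}}{w_{n-1}},\; y_n = w_n,$$

with

$$w_i = \frac{y_n}{\prod_{j=i}^{n-1}(1+y_j)},\quad i = 1,\ldots,n-1, \qquad w_n = y_n.$$

Since each $w_i$ only involves $y_j$ with $j \ge i$, the Jacobian matrix is triangular and the determinant is the product of the diagonal elements. We have

$$\frac{dw_i}{dy_i} = \frac{-y_n}{(1+y_i)\prod_{j=i}^{n-1}(1+y_j)},\quad i = 1,\ldots,n-1, \qquad \frac{dw_n}{dy_n} = 1,$$

and, with an empty product (over $j$ from $n$ to $n-1$) equal to 1,

$$f(y_1,\ldots,y_n) = \frac{1}{\Gamma(\lambda_1)}\left(\frac{y_n}{\prod_{j=1}^{n-1}(1+y_j)}\right)^{\lambda_1-1}\prod_{i=2}^n\frac{1}{\Gamma(\lambda_i)}\left(\frac{y_n}{\prod_{j=i}^{n-1}(1+y_j)} - \frac{y_n}{\prod_{j=i-1}^{n-1}(1+y_j)}\right)^{\lambda_i-1}e^{-y_n}\prod_{i=1}^{n-1}\frac{y_n}{(1+y_i)\prod_{j=i}^{n-1}(1+y_j)}.$$

Factor out the terms with $y_n$ and do some algebra on the middle term to get

$$f(y_1,\ldots,y_n) = y_n^{\Sigma_i\lambda_i-1}e^{-y_n}\,\frac{1}{\Gamma(\lambda_1)}\left(\frac{1}{\prod_{j=1}^{n-1}(1+y_j)}\right)^{\lambda_1-1}\prod_{i=2}^n\frac{1}{\Gamma(\lambda_i)}\left(\frac{y_{i-1}}{(1+y_{i-1})\prod_{j=i}^{n-1}(1+y_j)}\right)^{\lambda_i-1}\prod_{i=1}^{n-1}\frac{1}{(1+y_i)\prod_{j=i}^{n-1}(1+y_j)}.$$

We see that $Y_n$ is independent of the other $Y_i$ (and has a gamma distribution), but there does not seem to be any other obvious conclusion to draw from this density.

b. The $Y_i$ are related to the $F$ distribution in the ANOVA. Since $2X_j \sim \chi^2_{2\lambda_j}$, as long as the $2\lambda_j$ are integers,

$$Y_i = \frac{X_{i+1}}{\sum_{j=1}^i X_j} = \frac{2X_{i+1}}{2\sum_{j=1}^i X_j} = \frac{\chi^2_{2\lambda_{i+1}}}{\chi^2_{2\sum_{j=1}^i\lambda_j}}, \qquad\text{so}\qquad \frac{\sum_{j=1}^i\lambda_j}{\lambda_{i+1}}\,Y_i \sim F_{2\lambda_{i+1},\,2\sum_{j=1}^i\lambda_j}.$$

Note that the $F$ density makes sense even if the $\lambda_i$ are not integers.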
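Here is a numerical check in R of the density of $U = X/Y$, with hypothetical values $\lambda_1 = 2$ and $\lambda_2 = 3$. The density integrates to 1. $P(U \le 1)$ agrees with the $F$ probability obtained from $(\lambda_2/\lambda_1)U \sim F_{2\lambda_1, 2\lambda_2}$ and with a simulation.

```r
l1 <- 2; l2 <- 3                                    # hypothetical shape parameters
f_u <- function(u) gamma(l1 + l2) / (gamma(l1) * gamma(l2)) * u^(l1 - 1) / (1 + u)^(l1 + l2)

total <- integrate(f_u, 0, Inf)$value               # should be 1
p1 <- integrate(f_u, 0, 1)$value                    # P(U <= 1) from the density
pF <- pf(l2 / l1 * 1, 2 * l1, 2 * l2)               # same probability via the F distribution

set.seed(19)
u <- rgamma(1e5, l1) / rgamma(1e5, l2)              # ratio of independent gamma(., 1) draws
print(c(total = total, p1 = p1, pF = pF, mc = mean(u <= 1)))
```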

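11.21 a.

$$\text{Grand mean } \bar y_{\cdot\cdot} = \frac{188.54}{15} = 12.57$$

$$\text{Total sum of squares} = \sum_{i=1}^3\sum_{j=1}^5(y_{ij} - \bar y_{\cdot\cdot})^2 = 1295.01.$$

$$\begin{aligned}
\text{Within SS} &= \sum_{i=1}^3\sum_{j=1}^5(y_{ij} - \bar y_{i\cdot})^2\\
&= \sum_{j=1}^5(y_{1j} - 3.508)^2 + \sum_{j=1}^5(y_{2j} - 9.274)^2 + \sum_{j=1}^5(y_{3j} - 24.926)^2\\
&= 1.089 + 2.189 + 63.459 = 66.74
\end{aligned}$$

$$\text{Between SS} = 5\left(\sum_{i=1}^3(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2\right) = 5(82.120 + 10.864 + 152.671) = 245.65\cdot 5 = 1228.25.$$

ANOVA table:

Source      df   SS        MS        F
Treatment    2   1228.25   614.125   110.42
Within      12     66.74     5.562
Total       14   1294.99

Note that the total SS here is different from above; round-off error is to blame. Also, $F_{2,12} = 110.42$ is highly significant.

b. Completing the proof of (11.2.4), we have

$$\begin{aligned}
\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar{\bar y})^2 &= \sum_{i=1}^k\sum_{j=1}^{n_i}\left((y_{ij} - \bar y_{i\cdot}) + (\bar y_{i\cdot} - \bar{\bar y})\right)^2\\
&= \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2 + \sum_{i=1}^k\sum_{j=1}^{n_i}(\bar y_{i\cdot} - \bar{\bar y})^2 + 2\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})(\bar y_{i\cdot} - \bar{\bar y}),
\end{aligned}$$

where the cross term (the sum over $j$) is zero, so the sum of squares is partitioned as

$$\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2 + \sum_{i=1}^k n_i(\bar y_{i\cdot} - \bar{\bar y})^2.$$

c. From a), the $F$ statistic for the ANOVA is 110.42. The individual two-sample $t$'s, using $s_p^2 = \frac{1}{15-3}(66.74) = 5.5617$, are

$$\begin{aligned}
t_{12}^2 &= \frac{(3.508 - 9.274)^2}{(5.5617)(2/5)} = \frac{33.247}{2.2247} = 14.945,\\
t_{13}^2 &= \frac{(3.508 - 24.926)^2}{2.2247} = 206.201,\\
t_{23}^2 &= \frac{(9.274 - 24.926)^2}{2.2247} = 110.122,
\end{aligned}$$

and

$$\frac{2(14.945) + 2(206.201) + 2(110.122)}{6} = 110.42 = F.$$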
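The ANOVA $F$ and the $t^2$ identity of part (c) can be reproduced in R from the summary numbers above. These are the treatment means, $n = 5$ per group, and the within SS. Small differences from the hand calculation are due to rounding.

```r
ybar <- c(3.508, 9.274, 24.926)    # treatment means from part (a)
n <- 5; k <- 3; N <- n * k
ssw <- 66.74                         # within SS from part (a)
ssb <- n * sum((ybar - mean(ybar))^2)
msw <- ssw / (N - k)                 # s_p^2
Fstat <- (ssb / (k - 1)) / msw

pairs <- combn(k, 2)                 # columns (1,2), (1,3), (2,3)
t2 <- (ybar[pairs[1, ]] - ybar[pairs[2, ]])^2 / (msw * 2 / n)
avg_t2 <- 2 * sum(t2) / (k * (k - 1))   # average over ordered pairs
print(round(c(SSB = ssb, F = Fstat, t2 = t2, avg_t2 = avg_t2), 3))
```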

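11.23 a.

$$\begin{aligned}
\mathrm{E}Y_{ij} &= \mathrm{E}(\mu + \tau_i + b_j + \epsilon_{ij}) = \mu + \tau_i + \mathrm{E}b_j + \mathrm{E}\epsilon_{ij} = \mu + \tau_i\\
\mathrm{Var}\,Y_{ij} &= \mathrm{Var}\,b_j + \mathrm{Var}\,\epsilon_{ij} = \sigma_B^2 + \sigma^2,
\end{aligned}$$

by independence of $b_j$ and $\epsilon_{ij}$.

b.

$$\mathrm{Var}\left(\sum_{i=1}^n a_i\bar Y_{i\cdot}\right) = \sum_{i=1}^n a_i^2\,\mathrm{Var}\,\bar Y_{i\cdot} + 2\sum_{i>i'}\mathrm{Cov}(a_i\bar Y_{i\cdot}, a_{i'}\bar Y_{i'\cdot}).$$

The first term is

$$\sum_{i=1}^n a_i^2\,\mathrm{Var}\,\bar Y_{i\cdot} = \sum_{i=1}^n a_i^2\,\mathrm{Var}\left(\frac1r\sum_{j=1}^r(\mu + \tau_i + b_j + \epsilon_{ij})\right) = \frac{1}{r^2}\sum_{i=1}^n a_i^2(r\sigma_B^2 + r\sigma^2)$$

from part (a). For the covariance

$$\mathrm{E}\bar Y_{i\cdot} = \mu + \tau_i,$$

and

$$\begin{aligned}
\mathrm{E}(\bar Y_{i\cdot}\bar Y_{i'\cdot}) &= \mathrm{E}\left[\left(\mu + \tau_i + \frac1r\sum_j(b_j + \epsilon_{ij})\right)\left(\mu + \tau_{i'} + \frac1r\sum_j(b_j + \epsilon_{i'j})\right)\right]\\
&= (\mu+\tau_i)(\mu+\tau_{i'}) + \frac{1}{r^2}\mathrm{E}\left[\left(\sum_j(b_j+\epsilon_{ij})\right)\left(\sum_j(b_j+\epsilon_{i'j})\right)\right]
\end{aligned}$$

since the cross terms have expectation zero. Next, expanding the product in the second term again gives all zero cross terms, and we have

$$\mathrm{E}(\bar Y_{i\cdot}\bar Y_{i'\cdot}) = (\mu+\tau_i)(\mu+\tau_{i'}) + \frac{1}{r^2}(r\sigma_B^2),$$

and

$$\mathrm{Cov}(\bar Y_{i\cdot}, \bar Y_{i'\cdot}) = \sigma_B^2/r.$$

Finally, this gives

$$\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^n a_i\bar Y_{i\cdot}\right) &= \frac{1}{r^2}\sum_{i=1}^n a_i^2(r\sigma_B^2 + r\sigma^2) + 2\sum_{i>i'}a_ia_{i'}\sigma_B^2/r\\
&= \frac1r\left[\sum_{i=1}^n a_i^2\sigma^2 + \sigma_B^2\left(\sum_{i=1}^n a_i\right)^2\right]\\
&= \frac{\sigma^2}{r}\sum_{i=1}^n a_i^2\\
&= \frac{(\sigma^2 + \sigma_B^2)(1-\rho)}{r}\sum_{i=1}^n a_i^2,
\end{aligned}$$

where, in the third equality, we used the fact that $\sum_i a_i = 0$, and in the last, $\rho = \sigma_B^2/(\sigma^2 + \sigma_B^2)$.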
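The final variance formula can be checked by direct matrix algebra in R, using the covariance structure derived above. The number of treatments, $r$, the variance components, and the contrast are all hypothetical.

```r
k <- 4; r <- 6; sigma2 <- 2; sigmaB2 <- 3          # hypothetical values
a <- c(1, -1, 0.5, -0.5)                            # a contrast: sum(a) == 0

# Covariance matrix of (Ybar_1., ..., Ybar_k.): (sigma2 + sigmaB2)/r on the
# diagonal and sigmaB2/r off the diagonal
V <- diag(sigma2 / r, k) + matrix(sigmaB2 / r, k, k)
var_direct <- drop(t(a) %*% V %*% a)

rho <- sigmaB2 / (sigma2 + sigmaB2)
var_formula <- (sigma2 + sigmaB2) * (1 - rho) / r * sum(a^2)
print(c(direct = var_direct, formula = var_formula))
```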

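11.25 Differentiation yields

a.

$$\frac{\partial}{\partial c}\mathrm{RSS} = 2\sum[y_i - (c + dx_i)](-1) \overset{\text{set}}{=} 0 \;\Rightarrow\; nc + d\sum x_i = \sum y_i$$

$$\frac{\partial}{\partial d}\mathrm{RSS} = 2\sum[y_i - (c + dx_i)](-x_i) \overset{\text{set}}{=} 0 \;\Rightarrow\; c\sum x_i + d\sum x_i^2 = \sum x_iy_i.$$

b. Note that $nc + d\sum x_i = \sum y_i \Rightarrow c = \bar y - d\bar x$. Then

$$(\bar y - d\bar x)\sum x_i + d\sum x_i^2 = \sum x_iy_i \quad\text{and}\quad d\left(\sum x_i^2 - n\bar x^2\right) = \sum x_iy_i - \sum x_i\bar y,$$

which simplifies to $d = \sum x_i(y_i - \bar y)/\sum(x_i - \bar x)^2$. Thus $c$ and $d$ are the least squares estimates.

c. The second derivatives are

$$\frac{\partial^2}{\partial c^2}\mathrm{RSS} = 2n, \qquad \frac{\partial^2}{\partial c\,\partial d}\mathrm{RSS} = 2\sum x_i, \qquad \frac{\partial^2}{\partial d^2}\mathrm{RSS} = 2\sum x_i^2.$$

Thus the determinant of the matrix of second-order partials is

$$\begin{vmatrix} 2n & 2\sum x_i\\ 2\sum x_i & 2\sum x_i^2\end{vmatrix} = 4\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right] = 4n\sum(x_i - \bar x)^2 > 0,$$

and since $\partial^2\mathrm{RSS}/\partial c^2 = 2n > 0$, the solution is a minimum.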

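11.27 For the linear estimator $\sum_i a_iY_i$ to be unbiased for $\alpha$ we have

$$\mathrm{E}\left(\sum a_iY_i\right) = \sum a_i(\alpha + \beta x_i) = \alpha \;\Rightarrow\; \sum a_i = 1 \text{ and } \sum a_ix_i = 0.$$

Since $\mathrm{Var}\sum_i a_iY_i = \sigma^2\sum_i a_i^2$, we need to solve:

$$\text{minimize } \sum a_i^2 \text{ subject to } \sum a_i = 1 \text{ and } \sum a_ix_i = 0.$$

A solution can be found with Lagrange multipliers, but verifying that it is a minimum is excruciating. So instead we note that

$$\sum a_i = 1 \;\Rightarrow\; a_i = \frac1n + k(b_i - \bar b),$$

for some constants $k, b_1, b_2, \ldots, b_n$, and

$$\sum a_ix_i = 0 \;\Rightarrow\; k = \frac{-\bar x}{\sum_i(b_i - \bar b)(x_i - \bar x)} \quad\text{and}\quad a_i = \frac1n - \frac{\bar x(b_i - \bar b)}{\sum_i(b_i - \bar b)(x_i - \bar x)}.$$

Now

$$\sum a_i^2 = \sum_i\left(\frac1n - \frac{\bar x(b_i - \bar b)}{\sum_i(b_i - \bar b)(x_i - \bar x)}\right)^2 = \frac1n + \frac{\bar x^2\sum_i(b_i - \bar b)^2}{\left[\sum_i(b_i - \bar b)(x_i - \bar x)\right]^2},$$

since the cross term is zero. So we need to minimize the last term. From Cauchy-Schwarz we know that

$$\frac{\sum_i(b_i - \bar b)^2}{\left[\sum_i(b_i - \bar b)(x_i - \bar x)\right]^2} \ge \frac{1}{\sum_i(x_i - \bar x)^2},$$

and the minimum is attained at $b_i = x_i$. Substituting back we get that the minimizing $a_i$ is

$$a_i = \frac1n - \frac{\bar x(x_i - \bar x)}{\sum_i(x_i - \bar x)^2},$$

which results in $\sum_i a_iY_i = \bar Y - \hat\beta\bar x$, the least squares estimator.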

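11.28 To calculate

$$\max_{\sigma^2} L(\sigma^2|y, \hat\alpha, \hat\beta) = \max_{\sigma^2}\left(\frac{1}{2\pi\sigma^2}\right)^{n/2}e^{-\frac12\sum_i[y_i - (\hat\alpha + \hat\beta x_i)]^2/\sigma^2}$$

take logs and differentiate with respect to $\sigma^2$ to get

$$\frac{d}{d\sigma^2}\log L(\sigma^2|y, \hat\alpha, \hat\beta) = -\frac{n}{2\sigma^2} + \frac12\frac{\sum_i[y_i - (\hat\alpha + \hat\beta x_i)]^2}{(\sigma^2)^2}.$$

Set this equal to zero and solve for $\sigma^2$. The solution is $\hat\sigma^2 = \frac1n\sum_i[y_i - (\hat\alpha + \hat\beta x_i)]^2$.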

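11.29 a.

$$\mathrm{E}\hat\epsilon_i = \mathrm{E}(Y_i - \hat\alpha - \hat\beta x_i) = (\alpha + \beta x_i) - \alpha - \beta x_i = 0.$$

$$\begin{aligned}
\mathrm{Var}\,\hat\epsilon_i &= \mathrm{E}[Y_i - \hat\alpha - \hat\beta x_i]^2\\
&= \mathrm{E}[(Y_i - \alpha - \beta x_i) - (\hat\alpha - \alpha) - x_i(\hat\beta - \beta)]^2\\
&= \mathrm{Var}\,Y_i + \mathrm{Var}\,\hat\alpha + x_i^2\,\mathrm{Var}\,\hat\beta - 2\mathrm{Cov}(Y_i, \hat\alpha) - 2x_i\mathrm{Cov}(Y_i, \hat\beta) + 2x_i\mathrm{Cov}(\hat\alpha, \hat\beta).
\end{aligned}$$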

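11.30 a. Straightforward algebra shows

$$\hat\alpha = \bar y - \hat\beta\bar x = \sum\frac1n y_i - \bar x\frac{\sum(x_i - \bar x)y_i}{\sum(x_i - \bar x)^2} = \sum\left[\frac1n - \frac{\bar x(x_i - \bar x)}{\sum(x_i - \bar x)^2}\right]y_i.$$

b. Note that for $c_i = \frac1n - \frac{\bar x(x_i - \bar x)}{\sum(x_i - \bar x)^2}$, $\sum c_i = 1$ and $\sum c_ix_i = 0$. Then

$$\mathrm{E}\hat\alpha = \mathrm{E}\sum c_iY_i = \sum c_i(\alpha + \beta x_i) = \alpha, \qquad \mathrm{Var}\,\hat\alpha = \sum c_i^2\,\mathrm{Var}\,Y_i = \sigma^2\sum c_i^2,$$

and

$$\begin{aligned}
\sum c_i^2 &= \sum\left[\frac1n - \frac{\bar x(x_i - \bar x)}{\sum(x_i - \bar x)^2}\right]^2\\
&= \sum\frac{1}{n^2} + \frac{\sum\bar x^2(x_i - \bar x)^2}{\left(\sum(x_i - \bar x)^2\right)^2} \qquad (\text{cross term} = 0)\\
&= \frac1n + \frac{\bar x^2}{\sum(x_i - \bar x)^2} = \frac{\sum x_i^2}{nS_{xx}}.
\end{aligned}$$

c. Write $\hat\beta = \sum d_iy_i$, where

$$d_i = \frac{x_i - \bar x}{\sum(x_i - \bar x)^2}.$$

From Exercise 11.11,

$$\mathrm{Cov}(\hat\alpha, \hat\beta) = \mathrm{Cov}\left(\sum c_iY_i, \sum d_iY_i\right) = \sigma^2\sum c_id_i = \sigma^2\sum\left[\frac1n - \frac{\bar x(x_i - \bar x)}{\sum(x_i - \bar x)^2}\right]\frac{x_i - \bar x}{\sum(x_i - \bar x)^2} = \frac{-\sigma^2\bar x}{\sum(x_i - \bar x)^2}.$$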
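In R, the weights $c_i$ and $d_i$ can be formed explicitly for a hypothetical design and compared with `lm()`. The checks are that $\sum c_i = 1$, $\sum c_ix_i = 0$, and $\sum c_i^2 = \sum x_i^2/(nS_{xx})$. In addition, `vcov()`, which plugs in the residual variance $s^2$ for $\sigma^2$, should equal $s^2\sum c_i^2$, $s^2\sum c_id_i$, and $s^2\sum d_i^2$.

```r
set.seed(30)
x <- c(1, 3, 4, 6, 9, 10)                         # hypothetical design points
y <- 2 + 0.5 * x + rnorm(length(x), sd = 0.3)
n <- length(x); Sxx <- sum((x - mean(x))^2)

cw <- 1 / n - mean(x) * (x - mean(x)) / Sxx       # c_i: alpha-hat = sum(c_i y_i)
dw <- (x - mean(x)) / Sxx                         # d_i: beta-hat  = sum(d_i y_i)

fit <- lm(y ~ x)
s2 <- summary(fit)$sigma^2
M <- s2 * rbind(c(sum(cw^2), sum(cw * dw)), c(sum(cw * dw), sum(dw^2)))
print(c(sum_c = sum(cw), sum_cx = sum(cw * x)))
print(rbind(weights = c(sum(cw * y), sum(dw * y)), lm = coef(fit)))
print(vcov(fit) - M)                              # zero up to rounding
```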

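11.31 The fact that

$$\hat\epsilon_i = \sum_j[\delta_{ij} - (c_j + d_jx_i)]Y_j$$

follows directly from (11.3.27) and the definition of $c_j$ and $d_j$. Since $\hat\alpha = \sum_i c_iY_i$, from Lemma 11.3.2

$$\mathrm{Cov}(\hat\epsilon_i, \hat\alpha) = \sigma^2\sum_j c_j[\delta_{ij} - (c_j + d_jx_i)] = \sigma^2\left(c_i - \sum_j c_j(c_j + d_jx_i)\right) = \sigma^2\left(c_i - \sum_j c_j^2 - x_i\sum_j c_jd_j\right).$$

Substituting for $c_j$ and $d_j$ gives

$$c_i = \frac1n - \frac{(x_i - \bar x)\bar x}{S_{xx}}, \qquad \sum_j c_j^2 = \frac1n + \frac{\bar x^2}{S_{xx}}, \qquad x_i\sum_j c_jd_j = -\frac{x_i\bar x}{S_{xx}},$$

and substituting these values shows $\mathrm{Cov}(\hat\epsilon_i, \hat\alpha) = 0$. Similarly, for $\hat\beta$,

$$\mathrm{Cov}(\hat\epsilon_i, \hat\beta) = \sigma^2\left(d_i - \sum_j c_jd_j - x_i\sum_j d_j^2\right)$$

with

$$d_i = \frac{x_i - \bar x}{S_{xx}}, \qquad \sum_j c_jd_j = -\frac{\bar x}{S_{xx}}, \qquad x_i\sum_j d_j^2 = \frac{x_i}{S_{xx}},$$

and substituting these values shows $\mathrm{Cov}(\hat\epsilon_i, \hat\beta) = 0$.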

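11.32 Write the models as

$$\begin{aligned}
y_i &= \alpha + \beta x_i + \epsilon_i\\
y_i &= \alpha' + \beta'(x_i - \bar x) + \epsilon_i = \alpha' + \beta' z_i + \epsilon_i.
\end{aligned}$$

a. Since $\bar z = 0$,

$$\hat\beta = \frac{\sum(x_i - \bar x)(y_i - \bar y)}{\sum(x_i - \bar x)^2} = \frac{\sum z_i(y_i - \bar y)}{\sum z_i^2} = \hat\beta'.$$

b.

$$\hat\alpha = \bar y - \hat\beta\bar x, \qquad \hat\alpha' = \bar y - \hat\beta'\bar z = \bar y$$

since $\bar z = 0$, and

$$\hat\alpha' \sim \mathrm{n}(\alpha' + \beta'\bar z, \sigma^2/n) = \mathrm{n}(\alpha', \sigma^2/n).$$

c. Write

$$\hat\alpha' = \sum\frac1n y_i, \qquad \hat\beta' = \sum\frac{z_i}{\sum z_i^2}y_i.$$

Then

$$\mathrm{Cov}(\hat\alpha', \hat\beta') = \sigma^2\sum\frac1n\frac{z_i}{\sum z_i^2} = 0,$$

since $\sum z_i = 0$.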

11.33 a. From (11.23.25), $\beta = \rho(\sigma_Y/\sigma_X)$, so $\beta = 0$ if and only if $\rho = 0$ (since we assume that the variances are positive).

b. Start from the display following (11.3.35). We have

$$\frac{\hat\beta^2}{S^2/S_{xx}} = \frac{S_{xy}^2/S_{xx}}{\mathrm{RSS}/(n-2)} = (n-2)\frac{S_{xy}^2}{\left(S_{yy} - S_{xy}^2/S_{xx}\right)S_{xx}} = (n-2)\frac{S_{xy}^2}{S_{yy}S_{xx} - S_{xy}^2},$$

and dividing top and bottom by $S_{yy}S_{xx}$ finishes the proof.

c. From (11.3.33), if $\rho = 0$ (equivalently $\beta = 0$), then $\hat\beta/(S/\sqrt{S_{xx}}) = \sqrt{n-2}\,r/\sqrt{1-r^2}$ has a $t_{n-2}$ distribution.
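Here is a numerical illustration in R with simulated, hypothetical data. The $t$ statistic computed from $r$ matches the slope $t$ value from `lm()` and the statistic reported by `cor.test()`.

```r
set.seed(33)
x <- rnorm(12); y <- 0.3 * x + rnorm(12)          # hypothetical sample, n = 12
n <- length(x); r <- cor(x, y)

t_from_r <- sqrt(n - 2) * r / sqrt(1 - r^2)
t_from_fit <- coef(summary(lm(y ~ x)))["x", "t value"]
t_cortest <- unname(cor.test(x, y)$statistic)
print(c(from_r = t_from_r, from_lm = t_from_fit, cor_test = t_cortest))
```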


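11.34 a. ANOVA table for height data

Source       df   SS      MS      F
Regression    1   60.36   60.36   50.7
Residual      6    7.14    1.19
Total         7   67.50

The least squares line is $\hat y = 35.18 + .93x$.

b. Since $y_i - \bar y = (y_i - \hat y_i) + (\hat y_i - \bar y)$, we just need to show that the cross term is zero.

$$\begin{aligned}
\sum_{i=1}^n(y_i - \hat y_i)(\hat y_i - \bar y) &= \sum_{i=1}^n\left[y_i - (\hat\alpha + \hat\beta x_i)\right]\left[(\hat\alpha + \hat\beta x_i) - \bar y\right]\\
&= \sum_{i=1}^n\left[(y_i - \bar y) - \hat\beta(x_i - \bar x)\right]\left[\hat\beta(x_i - \bar x)\right] \qquad (\hat\alpha = \bar y - \hat\beta\bar x)\\
&= \hat\beta\sum_{i=1}^n(x_i - \bar x)(y_i - \bar y) - \hat\beta^2\sum_{i=1}^n(x_i - \bar x)^2 = 0,
\end{aligned}$$

from the definition of $\hat\beta$. Also,

$$\sum(\hat y_i - \bar y)^2 = \hat\beta^2\sum(x_i - \bar x)^2 = \frac{S_{xy}^2}{S_{xx}}.$$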

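11.35 a. For the least squares estimate:

$$\frac{d}{d\theta}\sum(y_i - \theta x_i^2)^2 = -2\sum(y_i - \theta x_i^2)x_i^2 = 0$$

which implies

$$\hat\theta = \frac{\sum_i y_ix_i^2}{\sum_i x_i^4}.$$

b. The log likelihood is

$$\log L = -\frac n2\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(y_i - \theta x_i^2)^2,$$

and maximizing this is the same as the minimization in part (a).

c. The derivatives of the log likelihood are

$$\frac{d}{d\theta}\log L = \frac{1}{\sigma^2}\sum(y_i - \theta x_i^2)x_i^2, \qquad \frac{d^2}{d\theta^2}\log L = -\frac{1}{\sigma^2}\sum x_i^4,$$

so the CRLB is $\sigma^2/\sum_i x_i^4$. The variance of $\hat\theta$ is

$$\mathrm{Var}\,\hat\theta = \mathrm{Var}\left(\frac{\sum_i y_ix_i^2}{\sum_i x_i^4}\right) = \sum_i\left(\frac{x_i^2}{\sum_j x_j^4}\right)^2\sigma^2 = \sigma^2\Big/\sum_i x_i^4,$$

so $\hat\theta$ is the best unbiased estimator.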
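In R this estimator is ordinary least squares through the origin on the regressor $x^2$, so `lm(y ~ 0 + I(x^2))` should reproduce it. The data below are hypothetical. The sketch also checks that `vcov()`, which uses $s^2$ in place of $\sigma^2$, equals $s^2/\sum x_i^4$.

```r
x <- c(0.5, 1, 1.5, 2, 2.5, 3)                      # hypothetical design
y <- c(0.4, 1.9, 4.8, 7.7, 12.9, 17.6)

theta_hat <- sum(y * x^2) / sum(x^4)                # closed form from part (a)
fit <- lm(y ~ 0 + I(x^2))                           # least squares through the origin in x^2
s2 <- summary(fit)$sigma^2
print(c(closed_form = theta_hat, lm = coef(fit)[[1]]))
print(c(vcov = vcov(fit)[1, 1], s2_over_sum_x4 = s2 / sum(x^4)))
```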


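11.36 a. Let $\mathbf X = (X_1,\ldots,X_n)$. Then

$$\mathrm{E}\hat\alpha = \mathrm{E}(\bar Y - \hat\beta\bar X) = \mathrm{E}\left[\mathrm{E}(\bar Y - \hat\beta\bar X|\mathbf X)\right] = \mathrm{E}\left[\alpha + \beta\bar X - \beta\bar X\right] = \mathrm{E}\alpha = \alpha,$$

$$\mathrm{E}\hat\beta = \mathrm{E}\left[\mathrm{E}(\hat\beta|\mathbf X)\right] = \mathrm{E}\beta = \beta.$$

b. Recall

$$\begin{aligned}
\mathrm{Var}\,Y &= \mathrm{Var}[\mathrm{E}(Y|X)] + \mathrm{E}[\mathrm{Var}(Y|X)]\\
\mathrm{Cov}(Y, Z) &= \mathrm{Cov}[\mathrm{E}(Y|X), \mathrm{E}(Z|X)] + \mathrm{E}[\mathrm{Cov}(Y, Z|X)].
\end{aligned}$$

Thus, since the conditional means are constant,

$$\begin{aligned}
\mathrm{Var}\,\hat\alpha &= \mathrm{E}[\mathrm{Var}(\hat\alpha|\mathbf X)] = \sigma^2\mathrm{E}\left[\sum X_i^2\big/(nS_{XX})\right]\\
\mathrm{Var}\,\hat\beta &= \sigma^2\mathrm{E}[1/S_{XX}]\\
\mathrm{Cov}(\hat\alpha, \hat\beta) &= \mathrm{E}[\mathrm{Cov}(\hat\alpha, \hat\beta|\mathbf X)] = -\sigma^2\mathrm{E}[\bar X/S_{XX}].
\end{aligned}$$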

11.37 This is almost the same problem as Exercise 11.35. The log likelihood is

$$\log L = -\frac n2\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(y_i - \beta x_i)^2.$$

The MLE is $\sum_i x_iy_i/\sum_i x_i^2$, with mean $\beta$ and variance $\sigma^2/\sum_i x_i^2$, the CRLB.

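11.38 a. The model is $y_i = \theta x_i + \epsilon_i$, so the least squares estimate of $\theta$ is $\sum x_iy_i/\sum x_i^2$ (regression through the origin).

$$\mathrm{E}\frac{\sum x_iY_i}{\sum x_i^2} = \frac{\sum x_i(x_i\theta)}{\sum x_i^2} = \theta, \qquad \mathrm{Var}\frac{\sum x_iY_i}{\sum x_i^2} = \frac{\sum x_i^2(x_i\theta)}{\left(\sum x_i^2\right)^2} = \frac{\theta\sum x_i^3}{\left(\sum x_i^2\right)^2}.$$

The estimator is unbiased.

b. The likelihood function is

$$L(\theta|\mathbf x) = \prod_{i=1}^n\frac{e^{-\theta x_i}(\theta x_i)^{y_i}}{y_i!} = \frac{e^{-\theta\Sigma x_i}\prod(\theta x_i)^{y_i}}{\prod y_i!}$$

$$\frac{\partial}{\partial\theta}\log L = \frac{\partial}{\partial\theta}\left[-\theta\sum x_i + \sum y_i\log(\theta x_i) - \log\prod y_i!\right] = -\sum x_i + \sum\frac{x_iy_i}{\theta x_i} \overset{\text{set}}{=} 0$$

which implies

$$\hat\theta = \frac{\sum y_i}{\sum x_i}, \qquad \mathrm{E}\hat\theta = \frac{\sum\theta x_i}{\sum x_i} = \theta \quad\text{and}\quad \mathrm{Var}\,\hat\theta = \mathrm{Var}\frac{\sum y_i}{\sum x_i} = \frac{\sum\theta x_i}{\left(\sum x_i\right)^2} = \frac{\theta}{\sum x_i}.$$

$$\frac{\partial^2}{\partial\theta^2}\log L = \frac{\partial}{\partial\theta}\left(-\sum x_i + \frac{\sum y_i}{\theta}\right) = -\frac{\sum y_i}{\theta^2} \quad\text{and}\quad \mathrm{E}\left(-\frac{\partial^2}{\partial\theta^2}\log L\right) = \frac{\sum x_i}{\theta}.$$

Thus, the CRLB is $\theta/\sum x_i$, and the MLE is the best unbiased estimator.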
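Here is a quick R check on hypothetical counts. Numerically maximizing the Poisson log likelihood recovers $\sum y_i/\sum x_i$. The least squares variance from part (a) is never below the CRLB, because by Cauchy-Schwarz $(\sum x_i^2)^2 \le \sum x_i\sum x_i^3$.

```r
x <- c(1, 2, 2, 3, 5, 8)                            # hypothetical covariates
y <- c(2, 3, 5, 6, 9, 17)                           # hypothetical Poisson counts

theta_mle <- sum(y) / sum(x)                        # closed-form MLE from part (b)
negll <- function(th) -sum(dpois(y, th * x, log = TRUE))
theta_num <- optimize(negll, c(0.01, 10), tol = 1e-10)$minimum
theta_ls <- sum(x * y) / sum(x^2)                   # least squares estimate from part (a)

th <- theta_mle                                     # evaluate both variances at this theta
var_ls <- th * sum(x^3) / sum(x^2)^2
crlb <- th / sum(x)
print(c(mle = theta_mle, numeric = theta_num, ls = theta_ls))
print(c(var_ls = var_ls, crlb = crlb))
```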


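11.39 Let $A_i$ be the set

$$A_i = \left\{\hat\alpha, \hat\beta : \frac{\left|(\hat\alpha + \hat\beta x_{0i}) - (\alpha + \beta x_{0i})\right|}{S\sqrt{\frac1n + \frac{(x_{0i} - \bar x)^2}{S_{xx}}}} \le t_{n-2,\alpha/(2m)}\right\}.$$

Then $P(\cap_{i=1}^m A_i)$ is the probability of simultaneous coverage, and using the Bonferroni Inequality (1.2.10) we have

$$P(\cap_{i=1}^m A_i) \ge \sum_{i=1}^m P(A_i) - (m-1) = \sum_{i=1}^m\left(1 - \frac{\alpha}{m}\right) - (m-1) = 1 - \alpha.$$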

11.41 Assume that we have observed data $(y_1, x_1), (y_2, x_2), \ldots, (y_{n-1}, x_{n-1})$ and we have $x_n$ but not $y_n$. Let $\phi(y_i|x_i)$ denote the density of $Y_i$, a $\mathrm{n}(a + bx_i, \sigma^2)$.

a. The expected complete-data log likelihood is

$$\mathrm{E}\left(\sum_{i=1}^n\log\phi(Y_i|x_i)\right) = \sum_{i=1}^{n-1}\log\phi(y_i|x_i) + \mathrm{E}\log\phi(Y|x_n),$$

where the expectation is with respect to the distribution $\phi(y|x_n)$ with the current values of the parameter estimates. Thus we need to evaluate

$$\mathrm{E}\log\phi(Y|x_n) = \mathrm{E}\left(-\frac12\log(2\pi\sigma_1^2) - \frac{1}{2\sigma_1^2}(Y - \mu_1)^2\right),$$

where $Y \sim \mathrm{n}(\mu_0, \sigma_0^2)$. We have

$$\mathrm{E}(Y - \mu_1)^2 = \mathrm{E}\left([Y - \mu_0] + [\mu_0 - \mu_1]\right)^2 = \sigma_0^2 + [\mu_0 - \mu_1]^2,$$

since the cross term is zero. Putting this all together, the expected complete-data log likelihood is

$$\begin{aligned}
&-\frac n2\log(2\pi\sigma_1^2) - \frac{1}{2\sigma_1^2}\sum_{i=1}^{n-1}[y_i - (a_1 + b_1x_i)]^2 - \frac{\sigma_0^2 + [(a_0 + b_0x_n) - (a_1 + b_1x_n)]^2}{2\sigma_1^2}\\
&= -\frac n2\log(2\pi\sigma_1^2) - \frac{1}{2\sigma_1^2}\sum_{i=1}^n[y_i - (a_1 + b_1x_i)]^2 - \frac{\sigma_0^2}{2\sigma_1^2}
\end{aligned}$$

if we define $y_n = a_0 + b_0x_n$.

b. For fixed $a_0$ and $b_0$, maximizing this likelihood gives the least squares estimates, while the maximum with respect to $\sigma_1^2$ is

$$\hat\sigma_1^2 = \frac{\sum_{i=1}^n[y_i - (a_1 + b_1x_i)]^2 + \sigma_0^2}{n}.$$

So the EM algorithm is the following. At iteration $t$, we have estimates $\hat a^{(t)}$, $\hat b^{(t)}$, and $\hat\sigma^{2(t)}$. We then set $y_n^{(t)} = \hat a^{(t)} + \hat b^{(t)}x_n$ (which is essentially the E-step). The M-step is then to calculate $\hat a^{(t+1)}$ and $\hat b^{(t+1)}$ as the least squares estimators using $(y_1, x_1), (y_2, x_2), \ldots, (y_{n-1}, x_{n-1}), (y_n^{(t)}, x_n)$, and

$$\hat\sigma_1^{2(t+1)} = \frac{\sum_{i=1}^n[y_i - (a^{(t+1)} + b^{(t+1)}x_i)]^2 + \sigma^{2(t)}}{n}.$$

c. The EM calculations are simple here. Since $y_n^{(t)} = \hat a^{(t)} + \hat b^{(t)}x_n$, the estimates of $a$ and $b$ must converge to the least squares estimates, since they minimize the sum of squares of the observed data and the last term adds nothing. For $\hat\sigma^2$ we have (substituting the least squares estimates) the stationary point

$$\hat\sigma^2 = \frac{\sum_{i=1}^n[y_i - (\hat a + \hat bx_i)]^2 + \hat\sigma^2}{n} \;\Rightarrow\; \hat\sigma^2 = \sigma^2_{\mathrm{obs}},$$

where $\sigma^2_{\mathrm{obs}}$ is the MLE from the $n-1$ observed data points. So the MLEs are the same as those without the extra $x_n$.
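The EM iteration of part (b) can be sketched in R for simulated data with the last response missing. The data, starting values, and iteration count are arbitrary illustrative choices. The point is that the iterates settle at the least squares fit to the $n - 1$ observed pairs and at $\sigma^2_{\mathrm{obs}}$, as argued in part (c).

```r
set.seed(41)
x <- 1:8                                          # hypothetical design; y_8 is missing
y_obs <- 1 + 0.8 * x[1:7] + rnorm(7, sd = 0.5)   # the n - 1 observed responses
n <- length(x)

a <- 0; b <- 0; s2 <- 1                           # arbitrary starting values
for (iter in 1:200) {
  yy <- c(y_obs, a + b * x[n])                    # E-step: impute y_n by its current mean
  fit <- lm(yy ~ x)                               # M-step: least squares on completed data
  s2 <- (sum(residuals(fit)^2) + s2) / n          # M-step for sigma^2; sigma_0^2 is the previous s2
  a <- coef(fit)[[1]]; b <- coef(fit)[[2]]
}

obs <- lm(y_obs ~ x[1:7])                         # least squares on the observed pairs only
s2_obs <- sum(residuals(obs)^2) / (n - 1)         # MLE of sigma^2 from the n - 1 observed pairs
print(rbind(EM = c(a, b, s2), observed = c(coef(obs), s2_obs)))
```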

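d. Now we use the bivariate normal density (see Definition 4.5.10 and Exercise 4.45). Denote the density by $\phi(x, y)$. Then the expected complete-data log likelihood is

$$\sum_{i=1}^{n-1}\log\phi(x_i, y_i) + \mathrm{E}\log\phi(X, y_n),$$

where after iteration $t$ the missing data density is the conditional density of $X$ given $Y = y_n$,

$$X|Y = y_n \sim \mathrm{n}\left(\mu_X^{(t)} + \rho^{(t)}(\sigma_X^{(t)}/\sigma_Y^{(t)})(y_n - \mu_Y^{(t)}),\; (1 - \rho^{2(t)})\sigma_X^{2(t)}\right).$$

Denoting the mean by $\mu_0$ and the variance by $\sigma_0^2$, the expected value of the last piece in the likelihood is

$$\begin{aligned}
\mathrm{E}\log\phi(X, y_n) &= -\frac12\log\left((2\pi)^2\sigma_X^2\sigma_Y^2(1-\rho^2)\right)\\
&\quad - \frac{1}{2(1-\rho^2)}\left[\mathrm{E}\left(\frac{X - \mu_X}{\sigma_X}\right)^2 - 2\rho\frac{\mathrm{E}(X - \mu_X)(y_n - \mu_Y)}{\sigma_X\sigma_Y} + \left(\frac{y_n - \mu_Y}{\sigma_Y}\right)^2\right]\\
&= -\frac12\log\left((2\pi)^2\sigma_X^2\sigma_Y^2(1-\rho^2)\right)\\
&\quad - \frac{1}{2(1-\rho^2)}\left[\frac{\sigma_0^2}{\sigma_X^2} + \left(\frac{\mu_0 - \mu_X}{\sigma_X}\right)^2 - 2\rho\frac{(\mu_0 - \mu_X)(y_n - \mu_Y)}{\sigma_X\sigma_Y} + \left(\frac{y_n - \mu_Y}{\sigma_Y}\right)^2\right].
\end{aligned}$$

So the expected complete-data log likelihood is

$$\sum_{i=1}^{n-1}\log\phi(x_i, y_i) + \log\phi(\mu_0, y_n) - \frac{\sigma_0^2}{2(1-\rho^2)\sigma_X^2}.$$

The EM algorithm is similar to the previous one. First note that the MLEs of $\mu_Y$ and $\sigma_Y^2$ are the usual ones, $\bar y$ and $\hat\sigma_Y^2$, and don't change with the iterations. We update the other estimates as follows. At iteration $t$, the E-step consists of replacing $x_n^{(t)}$ by

$$x_n^{(t+1)} = \hat\mu_X^{(t)} + \rho^{(t)}\frac{\sigma_X^{(t)}}{\sigma_Y^{(t)}}(y_n - \bar y).$$

Then $\mu_X^{(t+1)} = \bar x$ and we can write the likelihood as

$$-\frac n2\log\left((2\pi)^2\sigma_X^2\hat\sigma_Y^2(1-\rho^2)\right) - \frac{1}{2(1-\rho^2)}\left[\frac{S_{xx} + \sigma_0^2}{\sigma_X^2} - 2\rho\frac{S_{xy}}{\sigma_X\hat\sigma_Y} + \frac{S_{yy}}{\hat\sigma_Y^2}\right],$$

which is the usual bivariate normal likelihood except that we replace $S_{xx}$ with $S_{xx} + \sigma_0^2$. So the MLEs are the usual ones, and the EM iterations are

$$\begin{aligned}
x_n^{(t+1)} &= \hat\mu_X^{(t)} + \hat\rho^{(t)}\frac{\hat\sigma_X^{(t)}}{\hat\sigma_Y^{(t)}}(y_n - \bar y)\\
\hat\mu_X^{(t+1)} &= \bar x^{(t)}\\
\hat\sigma_X^{2(t+1)} &= \frac{S_{xx}^{(t)} + (1 - \hat\rho^{2(t)})\hat\sigma_X^{2(t)}}{n}\\
\hat\rho^{(t+1)} &= \frac{S_{xy}^{(t)}}{\sqrt{\left(S_{xx}^{(t)} + (1 - \hat\rho^{2(t)})\hat\sigma_X^{2(t)}\right)S_{yy}}}.
\end{aligned}$$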

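Here is R code for the EM algorithm:

nsim <- 20
xdata0 <- c(20, 19.6, 19.6, 19.4, 18.4, 19, 19, 18.3, 18.2, 18.6, 19.2, 18.2,
            18.7, 18.5, 18, 17.4, 16.5, 17.2, 17.3, 17.8, 17.3, 18.4, 16.9)
ydata0 <- c(1, 1.2, 1.1, 1.4, 2.3, 1.7, 1.7, 2.4, 2.1, 2.1, 1.2, 2.3, 1.9, 2.4, 2.6,
            2.9, 4, 3.3, 3, 3.4, 2.9, 1.9, 3.9, 4.2)
nx <- length(xdata0)   # 23 observed x values
ny <- length(ydata0)   # 24 y values; x is missing for the last one (y = 4.2)
yn <- ydata0[ny]
# initial values from mles on the observed data
xmean <- 18.24167; xvar <- 0.9597797; ymean <- 2.370833; yvar <- 0.8312327
rho <- -0.9700159
for (j in 1:nsim) {
  # E-step: augment the x (O2) data with E(X | Y = yn) = muX + rho (sigmaX/sigmaY)(yn - muY)
  xdata <- c(xdata0, xmean + rho * sqrt(xvar / yvar) * (yn - ymean))
  # M-step: updates for muX, sigmaX^2, and rho
  xmean <- mean(xdata)
  Sxx <- (ny - 1) * var(xdata) + (1 - rho^2) * xvar
  xvar <- Sxx / ny
  rho <- cor(xdata, ydata0) * sqrt((ny - 1) * var(xdata) / Sxx)
}
print(c(xmean = xmean, ymean = ymean, xvar = xvar, yvar = yvar, rho = rho))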

The algorithm converges very quickly. The MLEs are

$$\hat\mu_X = 18.24, \quad \hat\mu_Y = 2.37, \quad \hat\sigma_X^2 = .969, \quad \hat\sigma_Y^2 = .831, \quad \hat\rho = -0.969.$$

Chapter 12

Regression Models

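12.1 The point $(\hat x_0, \hat y_0)$ is the closest if it lies on the vertex of the right triangle with vertices $(x_0, y_0)$ and $(x_0, a + bx_0)$. By the Pythagorean theorem, we must have

$$\left[(\hat x_0 - x_0)^2 + \left(\hat y_0 - (a + bx_0)\right)^2\right] + \left[(\hat x_0 - x_0)^2 + (\hat y_0 - y_0)^2\right] = (x_0 - x_0)^2 + \left(y_0 - (a + bx_0)\right)^2.$$

Substituting the values of $\hat x_0$ and $\hat y_0$ from (12.2.7) we obtain for the LHS above

$$\begin{aligned}
&\left[\left(\frac{b(y_0 - bx_0 - a)}{1 + b^2}\right)^2 + \left(\frac{b^2(y_0 - bx_0 - a)}{1 + b^2}\right)^2\right] + \left[\left(\frac{b(y_0 - bx_0 - a)}{1 + b^2}\right)^2 + \left(\frac{y_0 - bx_0 - a}{1 + b^2}\right)^2\right]\\
&= \left(y_0 - (a + bx_0)\right)^2\left[\frac{b^2 + b^4 + b^2 + 1}{(1 + b^2)^2}\right] = \left(y_0 - (a + bx_0)\right)^2.
\end{aligned}$$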

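12.3 a. Differentiation yields $\partial f/\partial\xi_i = -2(x_i - \xi_i) - 2\lambda\beta[y_i - (\alpha + \beta\xi_i)] \overset{\text{set}}{=} 0 \Rightarrow \xi_i(1 + \lambda\beta^2) = x_i + \lambda\beta(y_i - \alpha)$, which is the required solution. Also, $\partial^2 f/\partial\xi_i^2 = 2(1 + \lambda\beta^2) > 0$, so this is a minimum.

b. Parts i), ii), and iii) are immediate. For iv) just note that $D$ is Euclidean distance between $(x_1, \sqrt\lambda y_1)$ and $(x_2, \sqrt\lambda y_2)$, hence satisfies the triangle inequality.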

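12.5 Differentiate $\log L$, for $L$ in (12.2.17), to get

$$\frac{\partial}{\partial\sigma_\delta^2}\log L = -\frac{n}{\sigma_\delta^2} + \frac{1}{2(\sigma_\delta^2)^2}\,\frac{1}{1 + \hat\beta^2}\sum_{i=1}^n\left[y_i - (\hat\alpha + \hat\beta x_i)\right]^2.$$

Set this equal to zero and solve for $\sigma_\delta^2$. The answer is (12.2.18).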

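12.7 a. Suppressing the subscript $i$ and the minus sign, the exponent is
$$\frac{(x-\xi)^2}{\sigma^2_\delta} + \frac{[y-(\alpha+\beta\xi)]^2}{\sigma^2_\epsilon} = \frac{\sigma^2_\epsilon+\beta^2\sigma^2_\delta}{\sigma^2_\epsilon\sigma^2_\delta}(\xi-k)^2 + \frac{[y-(\alpha+\beta x)]^2}{\sigma^2_\epsilon+\beta^2\sigma^2_\delta},$$
where $k = \dfrac{\sigma^2_\epsilon x + \sigma^2_\delta\beta(y-\alpha)}{\sigma^2_\epsilon+\beta^2\sigma^2_\delta}$. Thus, integrating with respect to $\xi$ eliminates the first term, which contributes only a factor that does not depend on $x$ or $y$.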

b. The resulting function must be the joint pdf of $X$ and $Y$. The double integral is infinite, however.
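The completing-the-square step in part a, and the claim that integrating over $\xi$ leaves only the second term times a constant, can be checked numerically. The R sketch below uses arbitrary values for $\sigma^2_\delta$, $\sigma^2_\epsilon$, $\alpha$, $\beta$, $x$, $y$ and $\xi$.

```r
s2d <- 0.7; s2e <- 1.3; alpha <- 0.4; beta <- -1.1
x <- 2.0; y <- 0.5; xi <- 1.7

# Left side: the exponent as a function of xi (vectorized in z)
expo <- function(z) (x - z)^2 / s2d + (y - (alpha + beta * z))^2 / s2e

# Right side: completed square in xi plus a term free of xi
k <- (s2e * x + s2d * beta * (y - alpha)) / (s2e + beta^2 * s2d)
second <- (y - (alpha + beta * x))^2 / (s2e + beta^2 * s2d)
rhs <- (s2e + beta^2 * s2d) / (s2e * s2d) * (xi - k)^2 + second
all.equal(expo(xi), rhs)

# Integrating exp(-expo/2) over xi gives a Gaussian-integral constant times exp(-second/2)
num <- integrate(function(z) exp(-expo(z) / 2), -Inf, Inf)$value
closed <- sqrt(2 * pi * s2e * s2d / (s2e + beta^2 * s2d)) * exp(-second / 2)
c(numerical = num, closed_form = closed)
```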

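12.9 a. From the last two equations in (12.2.19),
$$\hat\sigma^2_\delta = \frac1n S_{xx} - \hat\sigma^2_\xi = \frac1n S_{xx} - \frac1n\frac{S_{xy}}{\hat\beta},$$
which is positive only if $S_{xx} > S_{xy}/\hat\beta$. Similarly,
$$\hat\sigma^2_\epsilon = \frac1n S_{yy} - \hat\beta^2\hat\sigma^2_\xi = \frac1n S_{yy} - \hat\beta^2\,\frac1n\frac{S_{xy}}{\hat\beta},$$
which is positive only if $S_{yy} > \hat\beta S_{xy}$.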


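b. We have from part a) that $\hat\sigma^2_\delta > 0 \Rightarrow S_{xx} > S_{xy}/\hat\beta$ and $\hat\sigma^2_\epsilon > 0 \Rightarrow S_{yy} > \hat\beta S_{xy}$. Furthermore, $\hat\sigma^2_\xi > 0$ implies that $S_{xy}$ and $\hat\beta$ have the same sign. Thus $S_{xx} > |S_{xy}|/|\hat\beta|$ and $S_{yy} > |\hat\beta|\,|S_{xy}|$. Combining yields
$$\frac{|S_{xy}|}{S_{xx}} < |\hat\beta| < \frac{S_{yy}}{|S_{xy}|}.$$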
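A small R illustration of part b on simulated data (the data-generating values are arbitrary). Following part a, the variance estimates are written as functions of a candidate slope $b$: $\hat\sigma^2_\xi = S_{xy}/(nb)$, $\hat\sigma^2_\delta = S_{xx}/n - \hat\sigma^2_\xi$ and $\hat\sigma^2_\epsilon = S_{yy}/n - b^2\hat\sigma^2_\xi$. A slope with the sign of $S_{xy}$ inside the interval makes all three positive, while one beyond the upper end makes $\hat\sigma^2_\epsilon$ negative.

```r
set.seed(2)
n <- 50
xi <- rnorm(n, 5, 2)                   # latent true values
x <- xi + rnorm(n, sd = 0.5)           # observed with error
y <- 1 + 0.8 * xi + rnorm(n, sd = 0.5)
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

# Variance estimates from part a as functions of a candidate slope b
var_hats <- function(b) {
  s2xi <- Sxy / (n * b)
  c(xi = s2xi, delta = Sxx / n - s2xi, eps = Syy / n - b^2 * s2xi)
}

lower <- abs(Sxy) / Sxx
upper <- Syy / abs(Sxy)
b_in  <- sign(Sxy) * (lower + upper) / 2   # inside the interval
b_out <- sign(Sxy) * 1.1 * upper           # beyond the upper bound
var_hats(b_in)
var_hats(b_out)
```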

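12.11 a.
$$\begin{aligned}
\operatorname{Cov}(aY+bX, cY+dX) &= E(aY+bX)(cY+dX) - E(aY+bX)E(cY+dX)\\
&= E\left[acY^2 + (bc+ad)XY + bdX^2\right] - E(aY+bX)E(cY+dX)\\
&= ac\operatorname{Var}Y + ac(EY)^2 + (bc+ad)\operatorname{Cov}(X,Y)\\
&\quad + (bc+ad)EX\,EY + bd\operatorname{Var}X + bd(EX)^2 - E(aY+bX)E(cY+dX)\\
&= ac\operatorname{Var}Y + (bc+ad)\operatorname{Cov}(X,Y) + bd\operatorname{Var}X,
\end{aligned}$$
since $E(aY+bX)E(cY+dX) = ac(EY)^2 + (bc+ad)EX\,EY + bd(EX)^2$.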

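b. Identify $a = \beta\lambda$, $b = 1$, $c = 1$, $d = -\beta$, and using (12.3.19),
$$\begin{aligned}
\operatorname{Cov}(\beta\lambda Y_i + X_i, Y_i - \beta X_i) &= \beta\lambda\operatorname{Var}Y + (1-\lambda\beta^2)\operatorname{Cov}(X,Y) - \beta\operatorname{Var}X\\
&= \beta\lambda\left(\sigma^2_\epsilon + \beta^2\sigma^2_\xi\right) + (1-\lambda\beta^2)\beta\sigma^2_\xi - \beta\left(\sigma^2_\delta + \sigma^2_\xi\right)\\
&= \beta\lambda\sigma^2_\epsilon - \beta\sigma^2_\delta = 0
\end{aligned}$$
if $\lambda\sigma^2_\epsilon = \sigma^2_\delta$. (Note that we did not need the normality assumption, just the moments.)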

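c. Let $W_i = \beta\lambda Y_i + X_i$, $V_i = Y_i - \beta X_i$. Exercise 11.33 shows that if $\operatorname{Cov}(W_i, V_i) = 0$, then $\sqrt{n-2}\,r/\sqrt{1-r^2}$ has a $t_{n-2}$ distribution. Thus $\sqrt{n-2}\,r_\lambda(\beta)/\sqrt{1-r^2_\lambda(\beta)}$ has a $t_{n-2}$ distribution for all values of $\beta$, by part (b). Since the square of a $t_{n-2}$ random variable has an $F_{1,n-2}$ distribution,
$$P\left(\left\{\beta: \frac{(n-2)r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)} \le F_{1,n-2,\alpha}\right\}\right) = P\left(\left\{(X,Y): \frac{(n-2)r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)} \le F_{1,n-2,\alpha}\right\}\right) = 1-\alpha.$$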
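Both parts can be checked quickly in R. Part a is just bilinearity of covariance, so it also holds exactly for sample covariances; part b is checked with the population moments used in the expansion above, $\operatorname{Var}X = \sigma^2_\xi+\sigma^2_\delta$, $\operatorname{Var}Y = \beta^2\sigma^2_\xi+\sigma^2_\epsilon$ and $\operatorname{Cov}(X,Y) = \beta\sigma^2_\xi$. All numerical values are arbitrary.

```r
# Part a with sample moments (arbitrary simulated data and constants)
set.seed(3)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
a <- 2; b <- -1; cc <- 0.3; d <- 4   # 'cc' plays the role of c
lhs <- cov(a * y + b * x, cc * y + d * x)
rhs <- a * cc * var(y) + (b * cc + a * d) * cov(x, y) + b * d * var(x)
all.equal(lhs, rhs)

# Part b with population moments and lambda chosen so that lambda * s2e = s2d
s2xi <- 1.5; s2d <- 0.6; lambda <- 2; beta <- 0.9
s2e <- s2d / lambda
varX <- s2xi + s2d
varY <- beta^2 * s2xi + s2e
covXY <- beta * s2xi
cov_WV <- beta * lambda * varY + (1 - lambda * beta^2) * covXY - beta * varX
cov_WV   # zero up to rounding
```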

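12.13 a. Rewrite (12.2.22) to get
$$\left\{\beta: \hat\beta - \frac{t\hat\sigma_\beta}{\sqrt{n-2}} \le \beta \le \hat\beta + \frac{t\hat\sigma_\beta}{\sqrt{n-2}}\right\} = \left\{\beta: \frac{(\hat\beta-\beta)^2}{\hat\sigma^2_\beta/(n-2)} \le F\right\},$$
where $F = t^2$.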

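b. For $\hat\beta$ of (12.2.16), the numerator of $r_\lambda(\beta)$ in (12.2.22) can be written
$$\beta\lambda S_{yy} + (1-\beta^2\lambda)S_{xy} - \beta S_{xx} = -\left[\beta^2(\lambda S_{xy}) + \beta(S_{xx}-\lambda S_{yy}) - S_{xy}\right] = -\lambda S_{xy}(\beta-\hat\beta)\left(\beta+\frac{1}{\lambda\hat\beta}\right).$$
(The overall sign does not matter, since only the square of the numerator is used below.)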

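Again from (12.2.22), we have
$$\frac{r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)} = \frac{\left[\beta\lambda S_{yy} + (1-\beta^2\lambda)S_{xy} - \beta S_{xx}\right]^2}{\left(\beta^2\lambda^2S_{yy} + 2\beta\lambda S_{xy} + S_{xx}\right)\left(S_{yy} - 2\beta S_{xy} + \beta^2S_{xx}\right) - \left[\beta\lambda S_{yy} + (1-\beta^2\lambda)S_{xy} - \beta S_{xx}\right]^2},$$
and a great deal of straightforward (but tedious) algebra will show that the denominator of this expression is equal to
$$(1+\lambda\beta^2)^2\left(S_{yy}S_{xx} - S_{xy}^2\right).$$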


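Thus
$$\frac{r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)} = \frac{\lambda^2S_{xy}^2(\beta-\hat\beta)^2\left(\beta+\frac{1}{\lambda\hat\beta}\right)^2}{(1+\lambda\beta^2)^2\left(S_{yy}S_{xx}-S_{xy}^2\right)} = \frac{(\beta-\hat\beta)^2}{\hat\sigma^2_\beta}\left(\frac{1+\lambda\beta\hat\beta}{1+\lambda\beta^2}\right)^2\left[\frac{(1+\lambda\hat\beta^2)^2S_{xy}^2}{\hat\beta^2\left((S_{xx}-\lambda S_{yy})^2+4\lambda S_{xy}^2\right)}\right],$$
after substituting $\hat\sigma^2_\beta$ from page 588. Now, using the fact that $\hat\beta$ and $-1/(\lambda\hat\beta)$ are both roots of the same quadratic equation, we have
$$\frac{(1+\lambda\hat\beta^2)^2}{\hat\beta^2} = \left(\frac{1}{\hat\beta}+\lambda\hat\beta\right)^2 = \frac{(S_{xx}-\lambda S_{yy})^2+4\lambda S_{xy}^2}{S_{xy}^2}.$$
Thus the expression in square brackets is equal to 1.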
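The tedious algebra can at least be checked. The denominator is the determinant of the matrix of sums of squares and cross-products of $(W_i, V_i)$, and $(W,V)$ is obtained from $(X,Y)$ by a linear map with determinant $1+\lambda\beta^2$, which accounts for the factor $(1+\lambda\beta^2)^2$. The R sketch below checks that identity, the factorization of the numerator, and the claim that the bracketed factor equals 1, on simulated data with arbitrary $\lambda$ and $\beta$. It assumes that $\hat\beta$ of (12.2.16) is the root $[-(S_{xx}-\lambda S_{yy}) + \sqrt{(S_{xx}-\lambda S_{yy})^2+4\lambda S_{xy}^2}]/(2\lambda S_{xy})$ of the quadratic in the numerator.

```r
set.seed(4)
x <- rnorm(40)
y <- -0.7 * x + rnorm(40, sd = 0.6)
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
lambda <- 1.5; beta <- 0.3   # arbitrary lambda and candidate beta

# Sums of squares and cross-products of W = beta*lambda*Y + X and V = Y - beta*X
SWW <- beta^2 * lambda^2 * Syy + 2 * beta * lambda * Sxy + Sxx
SVV <- Syy - 2 * beta * Sxy + beta^2 * Sxx
SWV <- beta * lambda * Syy + (1 - beta^2 * lambda) * Sxy - beta * Sxx
W <- beta * lambda * y + x; V <- y - beta * x
all.equal(SWV, sum((W - mean(W)) * (V - mean(V))))   # agrees with the data
all.equal(SWW * SVV - SWV^2, (1 + lambda * beta^2)^2 * (Syy * Sxx - Sxy^2))

# Roots of lambda*Sxy*b^2 + (Sxx - lambda*Syy)*b - Sxy = 0 (assumed form of 12.2.16)
D <- (Sxx - lambda * Syy)^2 + 4 * lambda * Sxy^2
bhat <- (-(Sxx - lambda * Syy) + sqrt(D)) / (2 * lambda * Sxy)
quad <- function(b) lambda * Sxy * b^2 + (Sxx - lambda * Syy) * b - Sxy
c(quad(bhat), quad(-1 / (lambda * bhat)))   # both zero up to rounding

# Factored numerator and the bracketed factor
all.equal(SWV, -lambda * Sxy * (beta - bhat) * (beta + 1 / (lambda * bhat)))
all.equal((1 + lambda * bhat^2)^2 * Sxy^2 / (bhat^2 * D), 1)
```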

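12.15 a.
$$\pi(-\alpha/\beta) = \frac{e^{\alpha+\beta(-\alpha/\beta)}}{1+e^{\alpha+\beta(-\alpha/\beta)}} = \frac{e^0}{1+e^0} = \frac12.$$

b.
$$\pi((-\alpha/\beta)+c) = \frac{e^{\alpha+\beta((-\alpha/\beta)+c)}}{1+e^{\alpha+\beta((-\alpha/\beta)+c)}} = \frac{e^{\beta c}}{1+e^{\beta c}},$$
and
$$1-\pi((-\alpha/\beta)-c) = 1-\frac{e^{-\beta c}}{1+e^{-\beta c}} = \frac{1}{1+e^{-\beta c}} = \frac{e^{\beta c}}{1+e^{\beta c}}.$$

c.
$$\frac{d}{dx}\pi(x) = \frac{\beta e^{\alpha+\beta x}}{\left[1+e^{\alpha+\beta x}\right]^2} = \beta\pi(x)(1-\pi(x)).$$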

d. Because
$$\frac{\pi(x)}{1-\pi(x)} = e^{\alpha+\beta x},$$
the result follows from direct substitution.

e. Follows directly from (d).

f. Follows directly from
$$\frac{\partial}{\partial\alpha}F(\alpha+\beta x) = f(\alpha+\beta x) \quad\text{and}\quad \frac{\partial}{\partial\beta}F(\alpha+\beta x) = xf(\alpha+\beta x).$$

g. For $F(x) = e^x/(1+e^x)$, $f(x) = F(x)(1-F(x))$ and the result follows. For $F(x) = \pi(x)$ of (12.3.2), it follows from part (c) that $f/[F(1-F)] = \beta$.
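A quick numerical check of parts a to c in R, with arbitrary $\alpha$, $\beta$ and $c$; the derivative in part c is compared with a central finite difference.

```r
alpha <- -2; beta <- 0.8; cval <- 1.3   # arbitrary parameters and shift c
p <- function(x) exp(alpha + beta * x) / (1 + exp(alpha + beta * x))

x50 <- -alpha / beta
p(x50)                                          # part a: equals 1/2
all.equal(p(x50 + cval), 1 - p(x50 - cval))     # part b

# Part c: finite-difference derivative versus beta * p * (1 - p)
h <- 1e-6; x <- 0.7
deriv_num <- (p(x + h) - p(x - h)) / (2 * h)
all.equal(deriv_num, beta * p(x) * (1 - p(x)), tolerance = 1e-6)
```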

12.17 a. The likelihood equations and solution are the same as in Example 12.3.1, with the exception that here $\pi(x_j) = \Phi(\alpha+\beta x_j)$, where $\Phi$ is the cdf of a standard normal.

b. If the 0-1 failure response is denoted "oring" and the temperature data is "temp", the following R code will generate the logit and probit regressions:

summary(glm(oring~temp, family=binomial(link=logit)))

summary(glm(oring~temp, family=binomial(link=probit)))


For the logit model we have

             Estimate  Std. Error  z value  Pr(>|z|)
Intercept     15.0429      7.3719    2.041    0.0413
temp          -0.2322      0.1081   -2.147    0.0318

and for the probit model we have

             Estimate  Std. Error  z value  Pr(>|z|)
Intercept     8.77084     3.86222    2.271    0.0232
temp         -0.13504     0.05632   -2.398    0.0165

Although the coefficients are different, the fit is qualitatively the same, and the probability of failure at 31°, using the probit model, is .9999.
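The failure probability at 31° can be reproduced from the reported coefficients alone. This R sketch plugs the estimates from the two tables into the inverse links, plogis for the logit model and pnorm for the probit model. With the fitted glm objects in hand, predict(fit, newdata = data.frame(temp = 31), type = "response") should give the same values.

```r
# Estimates copied from the logit and probit tables above
logit_coef  <- c(15.0429, -0.2322)
probit_coef <- c(8.77084, -0.13504)
temp <- 31

eta_logit  <- logit_coef[1] + logit_coef[2] * temp     # linear predictor, logit fit
eta_probit <- probit_coef[1] + probit_coef[2] * temp   # linear predictor, probit fit

p_logit  <- plogis(eta_logit)    # inverse logit link
p_probit <- pnorm(eta_probit)    # inverse probit link (standard normal cdf)
c(logit = p_logit, probit = p_probit)
```

The logit fit gives a similar value, about .9996.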

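12.19 a. Using the notation of Example 12.3.1, the likelihood (joint density) is
$$\prod_{j}\left(\frac{e^{\alpha+\beta x_j}}{1+e^{\alpha+\beta x_j}}\right)^{y^*_j}\left(\frac{1}{1+e^{\alpha+\beta x_j}}\right)^{n_j-y^*_j} = \left[\prod_{j}\left(\frac{1}{1+e^{\alpha+\beta x_j}}\right)^{n_j}\right]e^{\alpha\sum_j y^*_j+\beta\sum_j x_jy^*_j}.$$
By the Factorization Theorem, $\sum_j y^*_j$ and $\sum_j x_jy^*_j$ are sufficient.

b. Straightforward substitution.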
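The factorization can be illustrated in R with two hypothetical response vectors that share the same design and the same values of $\sum_j y^*_j$ and $\sum_j x_jy^*_j$: their log-likelihoods (binomial coefficients omitted, as in the display above) then agree at every $(\alpha,\beta)$. The design values are made up for illustration.

```r
# Hypothetical grouped design and two response vectors
x <- c(1, 2, 3)
n <- c(5, 5, 5)
y1 <- c(1, 2, 1)
y2 <- c(2, 0, 2)
rbind(sum_y = c(sum(y1), sum(y2)), sum_xy = c(sum(x * y1), sum(x * y2)))

# Log of the likelihood kernel: sum_j y_j (a + b x_j) - sum_j n_j log(1 + e^(a + b x_j))
loglik <- function(a, b, y) sum(y * (a + b * x)) - sum(n * log1p(exp(a + b * x)))

grid <- expand.grid(a = seq(-3, 3, by = 0.5), b = seq(-2, 2, by = 0.5))
diffs <- mapply(function(a, b) loglik(a, b, y1) - loglik(a, b, y2), grid$a, grid$b)
max(abs(diffs))   # zero up to rounding
```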

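12.21 Since $\frac{d}{d\pi}\log(\pi/(1-\pi)) = 1/(\pi(1-\pi))$, the delta method gives
$$\operatorname{Var}\left(\log\frac{\hat\pi}{1-\hat\pi}\right) \approx \left(\frac{1}{\pi(1-\pi)}\right)^2\frac{\pi(1-\pi)}{n} = \frac{1}{n\pi(1-\pi)}.$$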
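A Monte Carlo check of the delta-method approximation in R; $n$, $\pi$, the seed and the number of replications are arbitrary choices, and qlogis computes $\log(p/(1-p))$.

```r
set.seed(5)
n <- 500; p <- 0.3; reps <- 1e5
phat <- rbinom(reps, n, p) / n
emp_var <- var(qlogis(phat))          # simulated variance of the log odds
delta_var <- 1 / (n * p * (1 - p))    # delta-method approximation
c(simulated = emp_var, delta_method = delta_var)
```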

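12.23 a. If $\sum a_i = 0$,
$$E\sum a_iY_i = \sum a_i\left[\alpha+\beta x_i+\mu(1-\delta)\right] = \beta\sum a_ix_i = \beta$$
for $a_i = (x_i-\bar x)/S_{xx}$.

b.
$$E(\bar Y-\beta\bar x) = \frac1n\sum\left[\alpha+\beta x_i+\mu(1-\delta)\right]-\beta\bar x = \alpha+\mu(1-\delta),$$
so the least squares estimate $a$ is unbiased in the model $Y_i = \alpha'+\beta x_i+\epsilon_i$, where $\alpha' = \alpha+\mu(1-\delta)$.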
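Because the least squares estimates are linear in the $Y_i$, their expectations are obtained by applying them to $EY_i = \alpha+\beta x_i+\mu(1-\delta)$. The R sketch below does exactly that with arbitrary $x_i$ and parameters, writing m for $\mu(1-\delta)$; the slope comes back as $\beta$ and the intercept as $\alpha+\mu(1-\delta)$.

```r
x <- c(1, 3, 4, 7, 10)                  # arbitrary design points
alpha <- 2; beta <- 0.5; m <- 1.2       # m stands for mu * (1 - delta)
EY <- alpha + beta * x + m              # E Y_i

Sxx <- sum((x - mean(x))^2)
b <- sum((x - mean(x)) * EY) / Sxx      # slope estimate applied to E Y, i.e. E b
a <- mean(EY) - b * mean(x)             # intercept estimate applied to E Y, i.e. E a
c(Eb = b, Ea = a)
```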

12.25 a. The least absolute deviation line minimizes
$$|y_1-(c+dx_1)| + |y_2-(c+dx_1)| + |y_3-(c+dx_3)|.$$
Any line that passes between $(x_1, y_1)$ and $(x_1, y_2)$ has the same value for the sum of the first two terms, namely $|y_1-y_2|$, and this value is smaller than that of any line passing outside of $(x_1, y_1)$ and $(x_1, y_2)$. Of all the lines that pass between them, the ones that go through $(x_3, y_3)$ minimize the entire sum.

b. For the least squares line, $a = -53.88$ and $b = .53$. Any line with $b$ between $(17.9-14.4)/9 = .39$ and $(17.9-11.9)/9 = .67$ and $a = 17.9-136b$ is a least absolute deviation line.
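Part a can be seen on a toy example in R. The three points below are hypothetical (two of them share $x_1 = 0$). Every line whose value at $x_1$ lies between $y_1$ and $y_2$ and that passes through $(x_3, y_3)$ attains the same minimal sum, so the least absolute deviation line is not unique.

```r
# Hypothetical points: (0, 1), (0, 3), (2, 5)
x <- c(0, 0, 2)
y <- c(1, 3, 5)
sad <- function(c0, d) sum(abs(y - (c0 + d * x)))   # c0 = intercept, d = slope

# Intercepts between y1 = 1 and y2 = 3, slopes chosen to pass through (2, 5)
inside <- sapply(c(1, 1.5, 2, 3), function(c0) sad(c0, (5 - c0) / 2))
inside        # 2 2 2 2, i.e. |y1 - y2|
sad(0, 2.5)   # intercept outside [1, 3]: 4
```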

12.27 In the terminology of M-estimators (see the argument on pages 485-486), $\hat\beta_L$ is consistent for the $\beta_0$ that satisfies $E_{\beta_0}\sum_i\psi(Y_i-\beta_0x_i) = 0$, so we must take the "true" $\beta$ to be this value. We then see that
$$\sum_i\psi(Y_i-\hat\beta_Lx_i) \to 0$$
as long as the derivative term is bounded, which we assume is so.


12.29 The argument for the median is a special case of Example 12.4.3, where we take $x_i = 1$ so that $\sigma^2_x = 1$. The asymptotic distribution is given in (12.4.5) which, for $\sigma^2_x = 1$, agrees with Example 10.2.3.

12.31 The LAD estimates, from Example 12.4.2, are $\tilde\alpha = 18.59$ and $\tilde\beta = -.89$. Here is Mathematica code to bootstrap the standard deviations. (Mathematica is probably not the best choice here, as it is somewhat slow. Also, the minimization seemed a bit delicate, and worked better when done iteratively.) Sad is the sum of the absolute deviations (computed in the code as a mean, which has the same minimizer), and it is minimized iteratively in bmin and amin. The residuals are bootstrapped by generating random indices u from the discrete uniform distribution on the integers 1 to 23.

1. First enter the data and initialize:

Needs["Statistics`Master`"]

Clear[a,b,r,u]

a0=18.59;b0=-.89;aboot=a0;bboot=b0;

y0={1,1.2,1.1,1.4,2.3,1.7,1.7,2.4,2.1,2.1,1.2,2.3,1.9,2.4,

2.6,2.9,4,3.3,3,3.4,2.9,1.9,3.9};

x0={20,19.6,19.6,19.4,18.4,19,19,18.3,18.2,18.6,19.2,18.2,

18.7,18.5,18,17.4,16.5,17.2,17.3,17.8,17.3,18.4,16.9};

model=a0+b0*x0;

r=y0-model;

u:=Random[DiscreteUniformDistribution[23]]

Sad[a_,b_]:=Mean[Abs[model+rstar-(a+b*x0)]]

bmin[a_]:=FindMinimum[Sad[a,b],{b,{.5,1.5}}]

amin:=FindMinimum[Sad[a,b/.bmin[a][[2]]],{a,{16,19}}]

2. Here is the actual bootstrap. The vectors aboot and bboot contain the bootstrapped values.

B=500;

Do[

rstar=Table[r[[u]],{i,1,23}];

astar=a/.amin[[2]];

bstar=b/.bmin[astar][[2]];

aboot=Flatten[{aboot,astar}];

bboot=Flatten[{bboot,bstar}],

{i,1,B}]

3. Summary statistics:

Mean[aboot]

StandardDeviation[aboot]

Mean[bboot]

StandardDeviation[bboot]

4. The results are: intercept, mean 18.66 and SD .923; slope, mean −.893 and SD .050.
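Since R is used elsewhere in this manual, here is a rough R counterpart of the residual bootstrap above. It is a sketch, not a translation checked against the Mathematica run: the LAD fit uses optim's default Nelder-Mead search on the sum of absolute deviations (a swapped-in optimizer; a dedicated routine such as rq in the quantreg package would be more reliable), started at the estimates from Example 12.4.2, and the seed and B are arbitrary. Its output need not match the figures in step 4 exactly.

```r
# Data from the Mathematica code above (23 observations)
y0 <- c(1, 1.2, 1.1, 1.4, 2.3, 1.7, 1.7, 2.4, 2.1, 2.1, 1.2, 2.3, 1.9, 2.4,
        2.6, 2.9, 4, 3.3, 3, 3.4, 2.9, 1.9, 3.9)
x0 <- c(20, 19.6, 19.6, 19.4, 18.4, 19, 19, 18.3, 18.2, 18.6, 19.2, 18.2,
        18.7, 18.5, 18, 17.4, 16.5, 17.2, 17.3, 17.8, 17.3, 18.4, 16.9)
a0 <- 18.59; b0 <- -0.89                 # LAD estimates from Example 12.4.2
fitted0 <- a0 + b0 * x0
r <- y0 - fitted0                        # residuals to resample

# LAD fit: minimize the sum of absolute deviations with Nelder-Mead
lad_fit <- function(y, x, start) {
  optim(start, function(p) sum(abs(y - p[1] - p[2] * x)))$par
}

set.seed(6)
B <- 500
boot <- t(replicate(B, {
  ystar <- fitted0 + sample(r, length(r), replace = TRUE)   # resampled residuals
  lad_fit(ystar, x0, c(a0, b0))
}))
colnames(boot) <- c("intercept", "slope")
apply(boot, 2, mean)
apply(boot, 2, sd)
```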
