Solutions Manual for
Statistical Inference, Second Edition
George Casella
University of Florida
Roger L. Berger
North Carolina State University
Damaris Santana
University of Florida
“When I hear you give your reasons,” I remarked, “the thing always appears to me to be so
ridiculously simple that I could easily do it myself, though at each successive instance of your
reasoning I am baffled until you explain your process.”
Dr. Watson to Sherlock Holmes
A Scandal in Bohemia
0.1 Description
This solutions manual contains solutions for all odd-numbered problems plus a large number of
solutions for even-numbered problems. Of the 624 exercises in Statistical Inference, Second Edition,
this manual gives solutions for 484 (78%) of them. There is an obtuse pattern as to which solutions
were included in this manual. We assembled all of the solutions that we had from the first edition,
and filled in so that all odd-numbered problems were done. In the passage from the first to the
second edition, problems were shuffled with no attention paid to numbering (hence no attention
paid to minimizing the new effort); rather, we tried to put the problems in logical order.
A major change from the first edition is the use of the computer, both symbolically through
Mathematica(tm) and numerically using R. Some solutions are given as code in either of these
languages. Mathematica(tm) can be purchased from Wolfram Research, and R is a free download from
http://www.r-project.org/.
Here is a detailed listing of the solutions included.
Chapter   Number of Exercises   Number of Solutions   Missing
1         55                    51                    26, 30, 36, 42
2         40                    37                    34, 38, 40
3         50                    42                    4, 6, 10, 20, 30, 32, 34, 36
4         65                    52                    8, 14, 22, 28, 36, 40, 48, 50, 52, 56, 58, 60, 62
5         69                    46                    2, 4, 12, 14, 26, 28, all even problems from 36 to 68
6         43                    35                    8, 16, 26, 28, 34, 36, 38, 42
7         66                    52                    4, 14, 16, 28, 30, 32, 34, 36, 42, 54, 58, 60, 62, 64
8         58                    51                    36, 40, 46, 48, 52, 56, 58
9         58                    41                    2, 8, 10, 20, 22, 24, 26, 28, 30, 32, 38, 40, 42, 44, 50, 54, 56
10        48                    26                    all even problems except 4 and 32
11        41                    35                    4, 20, 22, 24, 26, 40
12        31                    16                    all even problems
0.2 Acknowledgement
Many people contributed to the assembly of this solutions manual. We again thank all of those
who contributed solutions to the first edition – many problems have carried over into the second
edition. Moreover, throughout the years a number of people have been in constant touch with us,
contributing to both the presentations and solutions. We apologize in advance for those we forget to
mention, and we especially thank Jay Beder, Yong Sung Joo, Michael Perlman, Rob Strawderman,
and Tom Wehrly. Thank you all for your help.
And, as we said the first time around, although we have benefited greatly from the assistance and
comments of others in the assembly of this manual, we are responsible for its ultimate correctness.
To this end, we have tried our best but, as a wise man once said, “You pays your money and you
takes your chances.”
George Casella
Roger L. Berger
Damaris Santana
December, 2001
Chapter 1
Probability Theory
“If any little problem comes your way, I shall be happy, if I can, to give you a hint or two as
to its solution.”
Sherlock Holmes
The Adventure of the Three Students
1.1 a. Each sample point describes the result of the toss (H or T) for each of the four tosses. So,
for example, THTT denotes T on 1st, H on 2nd, T on 3rd and T on 4th. There are 2^4 = 16
such sample points.
b. The number of damaged leaves is a nonnegative integer. So we might use S = {0, 1, 2, . . .}.
c. We might observe fractions of an hour. So we might use S = {t : t ≥ 0}, that is, the half-infinite
interval [0, ∞).
d. Suppose we weigh the rats in ounces. The weight must be greater than zero so we might use
S = (0, ∞). If we know no 10-day-old rat weighs more than 100 oz., we could use S = (0, 100].
e. If n is the number of items in the shipment, then S = {0/n, 1/n, . . . , 1}.
1.2 For each of these equalities, you must show containment in both directions.
a. x ∈ A\B ⇔ x ∈ A and x ∉ B ⇔ x ∈ A and x ∉ A∩B ⇔ x ∈ A\(A∩B). Also, x ∈ A and
x ∉ B ⇔ x ∈ A and x ∈ B^c ⇔ x ∈ A∩B^c.
b. Suppose x ∈ B. Then either x ∈ A or x ∈ A^c. If x ∈ A, then x ∈ B∩A, and, hence
x ∈ (B∩A)∪(B∩A^c). Thus B ⊂ (B∩A)∪(B∩A^c). Now suppose x ∈ (B∩A)∪(B∩A^c).
Then either x ∈ (B∩A) or x ∈ (B∩A^c). If x ∈ (B∩A), then x ∈ B. If x ∈ (B∩A^c),
then x ∈ B. Thus (B∩A)∪(B∩A^c) ⊂ B. Since the containment goes both ways, we have
B = (B∩A)∪(B∩A^c). (Note, a more straightforward argument for this part simply uses
the Distributive Law to state that (B∩A)∪(B∩A^c) = B∩(A∪A^c) = B∩S = B.)
c. Similar to part a).
d. From part b),
A∪B = A∪[(B∩A)∪(B∩A^c)] = [A∪(B∩A)]∪[A∪(B∩A^c)] = A∪[A∪(B∩A^c)] = A∪(B∩A^c).
1.3 a. x ∈ A∪B ⇔ x ∈ A or x ∈ B ⇔ x ∈ B∪A;
x ∈ A∩B ⇔ x ∈ A and x ∈ B ⇔ x ∈ B∩A.
b. x ∈ A∪(B∪C) ⇔ x ∈ A or x ∈ B∪C ⇔ x ∈ A∪B or x ∈ C ⇔ x ∈ (A∪B)∪C.
(It can similarly be shown that A∪(B∪C) = (A∪C)∪B.)
x ∈ A∩(B∩C) ⇔ x ∈ A and x ∈ B and x ∈ C ⇔ x ∈ (A∩B)∩C.
c. x ∈ (A∪B)^c ⇔ x ∉ A and x ∉ B ⇔ x ∈ A^c and x ∈ B^c ⇔ x ∈ A^c∩B^c;
x ∈ (A∩B)^c ⇔ x ∉ A∩B ⇔ x ∉ A or x ∉ B ⇔ x ∈ A^c or x ∈ B^c ⇔ x ∈ A^c∪B^c.
1.4 a. "A or B or both" is A∪B. From Theorem 1.2.9b we have P(A∪B) = P(A) + P(B) − P(A∩B).
b. "A or B but not both" is (A∩B^c)∪(B∩A^c). Thus we have
P((A∩B^c)∪(B∩A^c)) = P(A∩B^c) + P(B∩A^c)        (disjoint union)
= [P(A) − P(A∩B)] + [P(B) − P(A∩B)]              (Theorem 1.2.9a)
= P(A) + P(B) − 2P(A∩B).
c. "At least one of A or B" is A∪B. So we get the same answer as in a).
d. "At most one of A or B" is (A∩B)^c, and P((A∩B)^c) = 1 − P(A∩B).
1.5 a. A∩B∩C = {a U.S. birth results in identical twins that are female}.
b. P(A∩B∩C) = (1/90) × (1/3) × (1/2).
1.6
p_0 = (1−u)(1−w),   p_1 = u(1−w) + w(1−u),   p_2 = uw,
p_0 = p_2 ⇒ u + w = 1,
p_1 = p_2 ⇒ uw = 1/3.
These two equations imply u(1−u) = 1/3, which has no solution in the real numbers. Thus,
the probability assignment is not legitimate.
1.7 a.
P(scoring i points) = 1 − πr^2/A                              if i = 0,
P(scoring i points) = (πr^2/A) [(6−i)^2 − (5−i)^2]/5^2        if i = 1, . . . , 5.
b.
P(scoring i points | board is hit) = P(scoring i points ∩ board is hit) / P(board is hit),
P(board is hit) = πr^2/A,
P(scoring i points ∩ board is hit) = (πr^2/A) [(6−i)^2 − (5−i)^2]/5^2,   i = 1, . . . , 5.
Therefore,
P(scoring i points | board is hit) = [(6−i)^2 − (5−i)^2]/5^2,   i = 1, . . . , 5,
which is exactly the probability distribution of Example 1.2.7.
1.8 a. P(scoring exactly i points) = P(inside circle i) − P(inside circle i+1). Circle i has radius
(6−i)r/5, so
P(scoring exactly i points) = π(6−i)^2 r^2/(5^2 πr^2) − π(6−(i+1))^2 r^2/(5^2 πr^2) = [(6−i)^2 − (5−i)^2]/5^2.
b. Expanding the squares in part a) we find P(scoring exactly i points) = (11−2i)/25, which is
decreasing in i.
c. Let P(i) = (11−2i)/25. Since i ≤ 5, P(i) ≥ 0 for all i. P(S) = P(hitting the dartboard) = 1 by
definition. Lastly, P(i ∪ j) = area of i ring + area of j ring = P(i) + P(j).
1.9 a. Suppose x ∈ (∪_α A_α)^c. By the definition of complement, x ∉ ∪_α A_α, that is, x ∉ A_α for all
α ∈ Γ. Therefore x ∈ A_α^c for all α ∈ Γ. Thus x ∈ ∩_α A_α^c. Conversely, suppose x ∈ ∩_α A_α^c.
By the definition of intersection, x ∈ A_α^c for all α ∈ Γ. By the definition of complement, x ∉ A_α
for all α ∈ Γ. Therefore x ∉ ∪_α A_α. Thus x ∈ (∪_α A_α)^c.
b. Suppose x ∈ (∩_α A_α)^c. By the definition of complement, x ∉ ∩_α A_α. Therefore x ∉ A_α for
some α ∈ Γ. Therefore x ∈ A_α^c for some α ∈ Γ. Thus x ∈ ∪_α A_α^c. Conversely, suppose
x ∈ ∪_α A_α^c. By the definition of union, x ∈ A_α^c for some α ∈ Γ. Therefore x ∉ A_α for that α.
Therefore x ∉ ∩_α A_α. Thus x ∈ (∩_α A_α)^c.
1.10 For A_1, . . . , A_n,
(i) (∪_{i=1}^n A_i)^c = ∩_{i=1}^n A_i^c        (ii) (∩_{i=1}^n A_i)^c = ∪_{i=1}^n A_i^c.
Proof of (i): If x ∈ (∪A_i)^c, then x ∉ ∪A_i. That implies x ∉ A_i for any i, so x ∈ A_i^c for every i
and x ∈ ∩A_i^c.
Proof of (ii): If x ∈ (∩A_i)^c, then x ∉ ∩A_i. That implies x ∈ A_i^c for some i, so x ∈ ∪A_i^c.
1.11 We must verify each of the three properties in Definition 1.2.1.
a. (1) The empty set ∅ ∈ {∅, S}. Thus ∅ ∈ B. (2) ∅^c = S ∈ B and S^c = ∅ ∈ B. (3) ∅∪S = S ∈ B.
b. (1) The empty set is a subset of any set, in particular, ∅ ⊂ S. Thus ∅ ∈ B. (2) If A ∈ B,
then A ⊂ S. By the definition of complementation, A^c is also a subset of S, and, hence,
A^c ∈ B. (3) If A_1, A_2, . . . ∈ B, then, for each i, A_i ⊂ S. By the definition of union, ∪A_i ⊂ S.
Hence, ∪A_i ∈ B.
c. Let B_1 and B_2 be the two sigma algebras. (1) ∅ ∈ B_1 and ∅ ∈ B_2 since B_1 and B_2 are
sigma algebras. Thus ∅ ∈ B_1∩B_2. (2) If A ∈ B_1∩B_2, then A ∈ B_1 and A ∈ B_2. Since
B_1 and B_2 are both sigma algebras, A^c ∈ B_1 and A^c ∈ B_2. Therefore A^c ∈ B_1∩B_2. (3) If
A_1, A_2, . . . ∈ B_1∩B_2, then A_1, A_2, . . . ∈ B_1 and A_1, A_2, . . . ∈ B_2. Therefore, since B_1 and B_2
are both sigma algebras, ∪_{i=1}^∞ A_i ∈ B_1 and ∪_{i=1}^∞ A_i ∈ B_2. Thus ∪_{i=1}^∞ A_i ∈ B_1∩B_2.
1.12 First write
P(∪_{i=1}^∞ A_i) = P(∪_{i=1}^n A_i ∪ ∪_{i=n+1}^∞ A_i)
= P(∪_{i=1}^n A_i) + P(∪_{i=n+1}^∞ A_i)        (A_i's are disjoint)
= Σ_{i=1}^n P(A_i) + P(∪_{i=n+1}^∞ A_i).       (finite additivity)
Now define B_k = ∪_{i=k}^∞ A_i. Note that B_{k+1} ⊂ B_k and B_k → ∅ as k → ∞. (Otherwise the sum
of the probabilities would be infinite.) Thus
P(∪_{i=1}^∞ A_i) = lim_{n→∞} P(∪_{i=1}^∞ A_i) = lim_{n→∞} [Σ_{i=1}^n P(A_i) + P(B_{n+1})] = Σ_{i=1}^∞ P(A_i).
1.13 If A and B are disjoint, P(A∪B) = P(A) + P(B) = 1/3 + 3/4 = 13/12, which is impossible. More
generally, if A and B are disjoint, then A ⊂ B^c and P(A) ≤ P(B^c). But here P(A) > P(B^c),
so A and B cannot be disjoint.
1.14 If S = {s_1, . . . , s_n}, then any subset of S can be constructed by either including or excluding
s_i, for each i. Thus there are 2^n possible choices.
1.15 Proof by induction. The proof for k = 2 is given after Theorem 1.2.14. Assume true for k, that
is, the entire job can be done in n_1 × n_2 × ··· × n_k ways. For k + 1, the (k+1)st task can be
done in n_{k+1} ways, and for each one of these ways we can complete the job by performing
the remaining k tasks. Thus for each of the n_{k+1} ways we have n_1 × n_2 × ··· × n_k ways of
completing the job by the induction hypothesis. Thus, the number of ways we can do the job is
(1 × (n_1 × n_2 × ··· × n_k)) + ··· + (1 × (n_1 × n_2 × ··· × n_k))   [n_{k+1} terms]
= n_1 × n_2 × ··· × n_k × n_{k+1}.
1.16 a) 26^3. b) 26^3 + 26^2. c) 26^4 + 26^3 + 26^2.
1.17 There are C(n,2) = n(n−1)/2 pieces on which the two numbers do not match. (Choose 2 out of
n numbers without replacement.) There are n pieces on which the two numbers match. So the
total number of different pieces is n + n(n−1)/2 = n(n+1)/2.
1.18 The probability is C(n,2) n! / n^n = (n−1)(n−1)! / (2 n^{n−2}). There are many ways to obtain this. Here is one. The
denominator is n^n because this is the number of ways to place n balls in n cells. The numerator
is the number of ways of placing the balls such that exactly one cell is empty. There are n ways
to specify the empty cell. There are n−1 ways of choosing the cell with two balls. There are
C(n,2) ways of picking the 2 balls to go into this cell. And there are (n−2)! ways of placing the
remaining n−2 balls into the n−2 cells, one ball in each cell. The product of these is the
numerator n(n−1) C(n,2) (n−2)! = C(n,2) n!.
1.19 a. C(6,4) = 15.
b. Think of the n variables as n bins. Differentiating with respect to one of the variables is
equivalent to putting a ball in that variable's bin. Thus there are r unlabeled balls to be placed
in n labeled bins, and there are C(n+r−1, r) ways to do this.
1.20 A sample point specifies on which day (1 through 7) each of the 12 calls happens. Thus there
are 7^12 equally likely sample points. There are several different ways that the calls might be
assigned so that there is at least one call each day. There might be 6 calls one day and 1 call on
each of the other days. Denote this by 6111111. The number of sample points with this pattern
is 7 C(12,6) 6!. There are 7 ways to specify the day with 6 calls. There are C(12,6) ways to specify
which of the 12 calls are on this day. And there are 6! ways of assigning the remaining 6 calls to
the remaining 6 days. We will now count another pattern. There might be 4 calls on one day, 2 calls
on each of two days, and 1 call on each of the remaining four days. Denote this by 4221111.
The number of sample points with this pattern is 7 C(12,4) C(6,2) C(8,2) C(6,2) 4!. (7 ways to pick
the day with 4 calls, C(12,4) to pick the calls for that day, C(6,2) to pick the two days with two calls,
C(8,2) ways to pick the two calls for the lower numbered day, C(6,2) ways to pick the two calls
for the higher numbered day, 4! ways to order the remaining 4 calls.) Here is a list of all the
possibilities and the counts of the sample points for each one.
pattern    number of sample points
6111111    7 C(12,6) 6! = 4,656,960
5211111    7 C(12,5) 6 C(7,2) 5! = 83,825,280
4221111    7 C(12,4) C(6,2) C(8,2) C(6,2) 4! = 523,908,000
4311111    7 C(12,4) 6 C(8,3) 5! = 139,708,800
3321111    C(7,2) C(12,3) C(9,3) 5 C(6,2) 4! = 698,544,000
3222111    7 C(12,3) C(6,3) C(9,2) C(7,2) C(5,2) 3! = 1,397,088,000
2222211    C(7,5) C(12,2) C(10,2) C(8,2) C(6,2) C(4,2) 2! = 314,344,800
total      3,162,075,840
The probability is the total number of sample points divided by 7^12, which is
3,162,075,840 / 7^12 ≈ .2285.
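(The pattern counts above are easy to check numerically. The following R sketch is an addition
to the solution, not part of it; it recomputes each count with choose() and factorial(), using
exactly the decompositions listed in the table, and divides the total by 7^12.)
counts <- c(7*choose(12,6)*factorial(6),                                          # 6111111
            7*choose(12,5)*6*choose(7,2)*factorial(5),                            # 5211111
            7*choose(12,4)*choose(6,2)*choose(8,2)*choose(6,2)*factorial(4),      # 4221111
            7*choose(12,4)*6*choose(8,3)*factorial(5),                            # 4311111
            choose(7,2)*choose(12,3)*choose(9,3)*5*choose(6,2)*factorial(4),      # 3321111
            7*choose(12,3)*choose(6,3)*choose(9,2)*choose(7,2)*choose(5,2)*factorial(3), # 3222111
            choose(7,5)*choose(12,2)*choose(10,2)*choose(8,2)*choose(6,2)*choose(4,2)*factorial(2)) # 2222211
sum(counts)          # 3,162,075,840
sum(counts)/7^12     # approximately .2285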
1.21 The probability is C(n,2r) 2^{2r} / C(2n,2r). There are C(2n,2r) ways of choosing 2r shoes from a
total of 2n shoes. Thus there are C(2n,2r) equally likely sample points. The numerator is the number
of sample points for which there will be no matching pair. There are C(n,2r) ways of choosing 2r
different shoe styles. There are two ways of choosing within a given shoe style (left shoe or right
shoe), which gives 2^{2r} ways of arranging each one of the C(n,2r) arrays. The product of these is
the numerator C(n,2r) 2^{2r}.
1.22 a) (31
15)(29
15)(31
15)(30
15)···(31
15)
(366
180)b) 336
366
335
365 ··· 316
336
(366
30 ).
1.23
P(same number of heads) = Σ_{x=0}^n P(1st tosses x, 2nd tosses x)
= Σ_{x=0}^n [C(n,x) (1/2)^x (1/2)^{n−x}]^2
= (1/4^n) Σ_{x=0}^n C(n,x)^2.
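(As a numerical illustration, added here and not part of the printed solution: the sum is easy to
evaluate in R for a particular n, and by Vandermonde's identity it also equals C(2n,n)/4^n.)
same.heads <- function(n) sum(choose(n, 0:n)^2) / 4^n   # P(same number of heads)
same.heads(5)          # 0.2460938
choose(10, 5) / 4^5    # same value, via Vandermonde's identity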
1.24 a.
P(A wins) = Σ_{i=1}^∞ P(A wins on ith toss)
= 1/2 + (1/2)^2 (1/2) + (1/2)^4 (1/2) + ··· = Σ_{i=0}^∞ (1/2)^{2i+1} = 2/3.
b. P(A wins) = p + (1−p)^2 p + (1−p)^4 p + ··· = Σ_{i=0}^∞ p(1−p)^{2i} = p / [1 − (1−p)^2].
c. (d/dp) { p / [1 − (1−p)^2] } = p^2 / [1 − (1−p)^2]^2 > 0. Thus the probability is increasing in p,
and the minimum is at zero. Using L'Hôpital's rule we find lim_{p→0} p / [1 − (1−p)^2] = 1/2.
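(A quick numerical check of part (b), added here and not part of the printed solution: the closed
form p/(1 − (1−p)^2) can be compared with a truncated version of the series.)
p.A.wins <- function(p) p / (1 - (1 - p)^2)
p.A.wins(1/2)                          # 2/3, agreeing with part (a)
sum(0.3 * (1 - 0.3)^(2 * (0:100)))     # truncated series for p = 0.3
p.A.wins(0.3)                          # matches: 0.5882353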
1.25 Enumerating the sample space gives S' = {(B,B), (B,G), (G,B), (G,G)}, with each outcome
equally likely. Thus P(at least one boy) = 3/4 and P(both are boys) = 1/4, therefore
P(both are boys | at least one boy) = 1/3.
An ambiguity may arise if order is not acknowledged; then the space is S' = {(B,B), (B,G), (G,G)},
with each outcome equally likely.
1.27 a. For n odd the proof is straightforward. There are an even number of terms in the sum
(0, 1, ···, n), and C(n,k) and C(n,n−k), which are equal, have opposite signs. Thus, all pairs cancel
and the sum is zero. If n is even, use the following identity, which is the basis of Pascal's
triangle: For k > 0, C(n,k) = C(n−1,k) + C(n−1,k−1). Then, for n even,
Σ_{k=0}^n (−1)^k C(n,k) = C(n,0) + Σ_{k=1}^{n−1} (−1)^k C(n,k) + C(n,n)
= C(n,0) + C(n,n) + Σ_{k=1}^{n−1} (−1)^k [C(n−1,k) + C(n−1,k−1)]
= C(n,0) + C(n,n) − C(n−1,0) − C(n−1,n−1) = 0.
b. Use the fact that for k > 0, k C(n,k) = n C(n−1,k−1) to write
Σ_{k=1}^n k C(n,k) = n Σ_{k=1}^n C(n−1,k−1) = n Σ_{j=0}^{n−1} C(n−1,j) = n 2^{n−1}.
c. Σ_{k=1}^n (−1)^{k+1} k C(n,k) = n Σ_{k=1}^n (−1)^{k+1} C(n−1,k−1) = n Σ_{j=0}^{n−1} (−1)^j C(n−1,j) = 0
from part a).
1.28 The average of the two integrals is
[(nlog nn) + ((n+ 1) log (n+ 1) n)] /2=[nlog n+ (n+ 1) log (n+ 1)] /2n
(n+ 1/2) log nn.
Let dn= log n![(n+ 1/2) log nn], and we want to show that limn→∞ mdn=c, a constant.
This would complete the problem, since the desired limit is the exponential of this one. This
is accomplished in an indirect way, by working with differences, which avoids dealing with the
factorial. Note that
dndn+1 =n+1
2log 1 + 1
n1.
Differentiation will show that ((n+1
2)) log((1 + 1
n)) is increasing in n, and has minimum
value (3/2) log 2 = 1.04 at n= 1. Thus dndn+1 >0. Next recall the Taylor expansion of
log(1 + x) = xx2/2 + x3/3x4/4 + ···. The first three terms provide an upper bound on
log(1 + x), as the remaining adjacent pairs are negative. Hence
0< dndn+1 <n+1
21
n
1
2n2+1
3n31 = 1
12n2+1
6n3.
It therefore follows, by the comparison test, that the series P
1dndn+1 converges. Moreover,
the partial sums must approach a limit. Hence, since the sum telescopes,
lim
N→∞
N
X
1
dndn+1 = lim
N→∞ d1dN+1 =c.
Thus limn→∞ dn=d1c, a constant.
1.29 a.
Unordered Ordered
{4,4,12,12}(4,4,12,12), (4,12,12,4), (4,12,4,12)
(12,4,12,4), (12,4,4,12), (12,12,4,4)
Unordered Ordered
(2,9,9,12), (2,9,12,9), (2,12,9,9), (9,2,9,12)
{2,9,9,12}(9,2,12,9), (9,9,2,12), (9,9,12,2), (9,12,2,9)
(9,12,9,2), (12,2,9,9), (12,9,2,9), (12,9,9,2)
b. Same as (a).
c. There are 66ordered samples with replacement from {1,2,7,8,14,20}. The number of or-
dered samples that would result in {2,7,7,8,14,14}is 6!
2!2!1!1! = 180 (See Example 1.2.20).
Thus the probability is 180
66.
d. If the kobjects were distinguishable then there would be k! possible ordered arrangements.
Since we have k1, . . . , kmdifferent groups of indistinguishable objects, once the positions of
the objects are fixed in the ordered arrangement permutations within objects of the same
group won’t change the ordered arrangement. There are k1!k2!···km! of such permutations
for each ordered component. Thus there would be k!
k1!k2!···km!different ordered components.
e. Think of the mdistinct numbers as mbins. Selecting a sample of size k, with replacement,
is the same as putting kballs in the mbins. This is k+m1
k, which is the number of distinct
bootstrap samples. Note that, to create all of the bootstrap samples, we do not need to know
what the original sample was. We only need to know the sample size and the distinct values.
1.31 a. The number of ordered samples drawn with replacement from the set {x1, . . . , xn}is nn. The
number of ordered samples that make up the unordered sample {x1, . . . , xn}is n!. Therefore
the outcome with average x1+x2+···+xn
nthat is obtained by the unordered sample {x1, . . . , xn}
has probability n!
nn. Any other unordered outcome from {x1, . . . , xn}, distinct from the un-
ordered sample {x1, . . . , xn}, will contain m different numbers repeated k1, . . . , kmtimes
where k1+k2+··· +km=nwith at least one of the ki’s satisfying 2 kin. The
probability of obtaining the corresponding average of such outcome is
n!
k1!k2!···km!nn<n!
nn,since k1!k2!···km!>1.
Therefore the outcome with average x1+x2+···+xn
nis the most likely.
b. Stirling’s approximation is that, as n→ ∞,n!2πnn+(1/2)en, and thus
n!
nn 2
en!=n!en
nn2=2πnn+(1/2)enen
nn2= 1.
c. Since we are drawing with replacement from the set {x1, . . . , xn}, the probability of choosing
any xiis 1
n. Therefore the probability of obtaining an ordered sample of size nwithout xi
is (1 1
n)n. To prove that limn→∞(1 1
n)n=e1, calculate the limit of the log. That is
lim
n→∞ nlog 11
n= lim
n→∞
log 11
n
1/n .
L’Hˆopital’s rule shows that the limit is 1, establishing the result. See also Lemma 2.3.14.
1.32 This is most easily seen by doing each possibility. Let P(i) = probability that the candidate
hired on the ith trial is best. Then
P(1) = 1/N,  P(2) = 1/(N−1),  . . . ,  P(i) = 1/(N−i+1),  . . . ,  P(N) = 1.
1.33 Using Bayes' rule,
P(M | CB) = P(CB | M)P(M) / [P(CB | M)P(M) + P(CB | F)P(F)]
= (.05 × 1/2) / (.05 × 1/2 + .0025 × 1/2) = .9524.
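(The Bayes' rule computation is reproduced below in R; this is an added sketch, not part of the
printed solution, using the rates .05 and .0025 and the equal prior on male/female given above.)
p.cb.m <- 0.05; p.cb.f <- 0.0025; p.m <- 0.5; p.f <- 0.5
p.cb.m * p.m / (p.cb.m * p.m + p.cb.f * p.f)    # 0.9524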
1.34 a.
P(Brown Hair) = P(Brown Hair | Litter 1)P(Litter 1) + P(Brown Hair | Litter 2)P(Litter 2)
= (2/3)(1/2) + (3/5)(1/2) = 19/30.
b. Use Bayes' Theorem:
P(Litter 1 | Brown Hair) = P(BH | L1)P(L1) / [P(BH | L1)P(L1) + P(BH | L2)P(L2)]
= (2/3)(1/2) / (19/30) = 10/19.
1.35 Clearly P(·|B)0, and P(S|B) = 1. If A1, A2, . . . are disjoint, then
P
[
i=1
Ai
B!=P(S
i=1 AiB)
P(B)=P(S
i=1 (AiB))
P(B)
=P
i=1 P(AiB)
P(B)=
X
i=1
P(Ai|B).
1.37 a. Using the same events A, B, C and Was in Example 1.3.4, we have
P(W) = P(W|A)P(A) + P(W|B)P(B) + P(W|C)P(C)
=γ1
3+ 0 1
3+ 1 1
3=γ+1
3.
Thus, P(A|W) = P(A∩W)
P(W)=γ/3
(γ+1)/3=γ
γ+1 where,
γ
γ+1 =1
3if γ=1
2
γ
γ+1 <1
3if γ < 1
2
γ
γ+1 >1
3if γ > 1
2.
b. By Exercise 1.35, P(·|W) is a probability function. A,Band Care a partition. So
P(A|W) + P(B|W) + P(C|W) = 1.
But, P(B|W) = 0. Thus, P(A|W) + P(C|W) = 1. Since P(A|W) = 1/3, P(C|W) = 2/3.
(This could be calculated directly, as in Example 1.3.4.) So if Acan swap fates with C, his
chance of survival becomes 2/3.
1.38 a. P(A) = P(AB) + P(ABc) from Theorem 1.2.11a. But (ABc)Bcand P(Bc) =
1P(B) = 0. So P(ABc) = 0, and P(A) = P(AB). Thus,
P(A|B) = P(AB)
P(B)=P(A)
1=P(A)
.
b. ABimplies AB=A. Thus,
P(B|A) = P(AB)
P(A)=P(A)
P(A)= 1.
And also,
P(A|B) = P(AB)
P(B)=P(A)
P(B).
c. If Aand Bare mutually exclusive, then P(AB) = P(A) + P(B) and A(AB) = A.
Thus,
P(A|AB) = P(A(AB))
P(AB)=P(A)
P(A) + P(B).
d. P(ABC) = P(A(BC)) = P(A|BC)P(BC) = P(A|BC)P(B|C)P(C).
1.39 a. Suppose Aand Bare mutually exclusive. Then AB=and P(AB) = 0. If Aand B
are independent, then 0 = P(AB) = P(A)P(B). But this cannot be since P(A)>0 and
P(B)>0. Thus Aand Bcannot be independent.
b. If Aand Bare independent and both have positive probability, then
0< P (A)P(B) = P(AB).
This implies AB6=, that is, Aand Bare not mutually exclusive.
1.40 a. P(AcB) = P(Ac|B)P(B) = [1 P(A|B)]P(B) = [1 P(A)]P(B) = P(Ac)P(B) , where
the third equality follows from the independence of Aand B.
b. P(AcBc) = P(Ac)P(AcB) = P(Ac)P(Ac)P(B) = P(Ac)P(Bc).
1.41 a.
P( dash sent |dash rec)
=P( dash rec |dash sent)P( dash sent)
P( dash rec |dash sent)P( dash sent) + P( dash rec |dot sent)P( dot sent)
=(2/3)(4/7)
(2/3)(4/7) + (1/4)(3/7) = 32/41.
b. By a similar calculation as the one in (a) P(dot sent|dot rec) = 27/434. Then we have
P( dash sent|dot rec) = 16
43 . Given that dot-dot was received, the distribution of the four
possibilities of what was sent are
Event Probability
dash-dash (16/43)2
dash-dot (16/43)(27/43)
dot-dash (27/43)(16/43)
dot-dot (27/43)2
1.43 a. For Boole’s Inequality,
P(n
i=1)
n
X
i=1
P(Ai)P2+P3+··· ± Pn
n
X
i=1
P(Ai)
since PiPjif ijand therefore the terms P2k+P2k+1 0 for k= 1, . . . , n1
2when
nis odd. When nis even the last term to consider is Pn0. For Bonferroni’s Inequality
apply the inclusion-exclusion identity to the Ac
i, and use the argument leading to (1.2.10).
b. We illustrate the proof that the Piare increasing by showing that P2P3. The other
arguments are similar. Write
P2=X
1i<jn
P(AiAj) =
n1
X
i=1
n
X
j=i+1
P(AiAj)
=
n1
X
i=1
n
X
j=i+1 "n
X
k=1
P(AiAjAk) + P(AiAj(kAk)c)#
Now to get to P3we drop terms from this last expression. That is
n1
X
i=1
n
X
j=i+1 "n
X
k=1
P(AiAjAk) + P(AiAj(kAk)c)#
n1
X
i=1
n
X
j=i+1 "n
X
k=1
P(AiAjAk)#
n2
X
i=1
n1
X
j=i+1
n
X
k=j+1
P(AiAjAk) = X
1i<j<kn
P(AiAjAk) = P3.
The sequence of bounds is improving because the bounds P1, P1P2+P3, P1P2+P3P4+
P5, . . ., are getting smaller since PiPjif ijand therefore the terms P2k+P2k+1 0.
The lower bounds P1P2, P1P2+P3P4, P1P2+P3P4+P5P6, . . ., are getting
bigger since PiPjif ijand therefore the terms P2k+1 P2k0.
c. If all of the Aiare equal, all of the probabilities in the inclusion-exclusion identity are the
same. Thus
P1=nP (A), P2=n
2P(A), . . . , Pj=n
jP(A),
and the sequence of upper bounds on P(iAi) = P(A) becomes
P1=nP (A), P1P2+P3=nn
2+n
3P(A), . . .
which eventually sum to one, so the last bound is exact. For the lower bounds we get
P1P2=nn
2P(A), P1P2+P3P4=nn
2+n
3n
4P(A), . . .
which start out negative, then become positive, with the last one equaling P(A) (see Schwa-
ger 1984 for details).
1.44 P(at least 10 correct | guessing) = Σ_{k=10}^{20} C(20,k) (1/4)^k (3/4)^{20−k} = .01386.
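(In R this is a one-line binomial tail computation; an added sketch, not part of the printed solution.)
sum(dbinom(10:20, size = 20, prob = 1/4))   # 0.01386
1 - pbinom(9, size = 20, prob = 1/4)        # same tail probability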
1.45 Xis finite. Therefore Bis the set of all subsets of X. We must verify each of the three properties
in Definition 1.2.4. (1) If A∈ B then PX(A) = P(xiA{sjS:X(sj) = xi})0 since P
is a probability function. (2) PX(X) = P(m
i=1{sjS:X(sj) = xi}) = P(S) = 1. (3) If
A1, A2, . . . ∈ B and pairwise disjoint then
PX(
k=1Ak) = P(
[
k=1{∪xiAk{sjS:X(sj) = xi}})
=
X
k=1
P(xiAk{sjS:X(sj) = xi}) =
X
k=1
PX(Ak),
where the second inequality follows from the fact the Pis a probability function.
1.46 This is similar to Exercise 1.20. There are 7^7 equally likely sample points. The possible values of
X3 are 0, 1 and 2. Only the pattern 331 (3 balls in one cell, 3 balls in another cell and 1 ball in a
third cell) yields X3 = 2. The number of sample points with this pattern is C(7,2) C(7,3) C(4,3) 5 = 14,700.
So P(X3 = 2) = 14,700/7^7 ≈ .0178. There are 4 patterns that yield X3 = 1. The number of
sample points that give each of these patterns is given below.
pattern    number of sample points
34         7 C(7,3) 6 = 1,470
322        7 C(7,3) C(6,2) C(4,2) C(2,2) = 22,050
3211       7 C(7,3) 6 C(4,2) C(5,2) 2! = 176,400
31111      7 C(7,3) C(6,4) 4! = 88,200
total      288,120
So P(X3 = 1) = 288,120/7^7 ≈ .3498. The number of sample points that yield X3 = 0 is
7^7 − 288,120 − 14,700 = 520,723, and P(X3 = 0) = 520,723/7^7 ≈ .6322.
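(The counts for X3 can be verified numerically. This R sketch is an addition to the solution; it
recomputes the pattern counts above and adds a small Monte Carlo cross-check. The variable names
are ours.)
count.X3.2 <- choose(7,2)*choose(7,3)*choose(4,3)*5                          # pattern 331
count.X3.1 <- 7*choose(7,3)*6 +                                              # pattern 34
              7*choose(7,3)*choose(6,2)*choose(4,2) +                        # pattern 322
              7*choose(7,3)*6*choose(4,2)*choose(5,2)*factorial(2) +         # pattern 3211
              7*choose(7,3)*choose(6,4)*factorial(4)                         # pattern 31111
c(count.X3.2, count.X3.1) / 7^7      # approximately .0178 and .3498
set.seed(1)
x3 <- replicate(1e5, sum(tabulate(sample(7, 7, replace = TRUE), nbins = 7) == 3))
mean(x3 == 2); mean(x3 == 1)         # simulation agrees up to Monte Carlo error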
1.47 All of the functions are continuous, hence right-continuous. Thus we only need to check the
limit, and that they are nondecreasing
a. limx→−∞ 1
2+1
πtan1(x) = 1
2+1
ππ
2= 0, limx→∞ 1
2+1
πtan1(x) = 1
2+1
ππ
2= 1, and
d
dx 1
2+1
πtan1(x)=1
1+x2>0, so F(x) is increasing.
b. See Example 1.5.5.
c. limx→−∞ eex= 0, limx→∞ eex= 1, d
dx eex=exeex>0.
d. limx→−∞(1 ex) = 0, limx→∞(1 ex) = 1, d
dx (1 ex) = ex>0.
e. limy→−∞ 1
1+ey= 0, limy→∞ +1
1+ey= 1, d
dx (1
1+ey) = (1)ey
(1+ey)2>0 and d
dx (+1
1+ey)>
0, FY(y) is continuous except on y= 0 where limy0(+1
1+ey) = F(0). Thus is FY(y) right
continuous.
1.48 If F(·) is a cdf, F(x) = P(Xx). Hence limx→∞ P(Xx) = 0 and limx→−∞ P(Xx) = 1.
F(x) is nondecreasing since the set {x:Xx}is nondecreasing in x. Lastly, as xx0,
P(Xx)P(Xx0), so F(·) is right-continuous. (This is merely a consequence of defining
F(x) with “ ”.)
1.49 For every t,FX(t)FY(t). Thus we have
P(X > t) = 1 P(Xt) = 1 FX(t)1FY(t) = 1 P(Yt) = P(Y > t).
And for some t,FX(t)< FY(t). Then we have that
P(X > t) = 1 P(Xt) = 1 FX(t)>1FY(t) = 1 P(Yt) = P(Y > t).
1.50 Proof by induction. For n= 2
2
X
k=1
tk1= 1 + t=1t2
1t.
Assume true for n, this is Pn
k=1 tk1=1tn
1t. Then for n+ 1
n+1
X
k=1
tk1=
n
X
k=1
tk1+tn=1tn
1t+tn=1tn+tn(1t)
1t=1tn+1
1t,
where the second inequality follows from the induction hypothesis.
1.51 This kind of random variable is called hypergeometric in Chapter 3. The probabilities are
obtained by counting arguments, as follows.
x fX(x) = P(X=x)
05
025
4.30
4.4616
15
125
3.30
4.4196
25
225
2.30
4.1095
35
325
1.30
4.0091
45
425
0.30
4.0002
The cdf is a step function with jumps at x= 0,1,2,3 and 4.
1.52 The function g(·) is clearly positive. Also,
Z
x0
g(x)dx =Z
x0
f(x)
1F(x0)dx =1F(x0)
1F(x0)= 1.
1.53 a. limy→−∞ FY(y) = limy→−∞ 0 = 0 and limy→∞ FY(y) = limy→∞ 11
y2= 1. For y1,
FY(y) = 0 is constant. For y > 1, d
dy FY(y) = 2/y3>0, so FYis increasing. Thus for all y,
FYis nondecreasing. Therefore FYis a cdf.
b. The pdf is fY(y) = d
dy FY(y) = 2/y3if y > 1
0 if y1.
c. FZ(z) = P(Zz) = P(10(Y1) z) = P(Y(z/10) + 1) = FY((z/10) + 1). Thus,
FZ(z) = (0 if z0
11
[(z/10)+1]2if z > 0.
1.54 a. Rπ/2
0sin xdx = 1. Thus, c= 1/1 = 1.
b. R
−∞ e−|x|dx =R0
−∞ exdx +R
0exdx = 1 + 1 = 2. Thus, c= 1/2.
1.55
P(V5) = P(T < 3) = Z3
0
1
1.5et/1.5dt = 1 e2.
For v6,
P(Vv) = P(2Tv) = PTv
2=Zv
2
0
1
1.5et/1.5dt = 1 ev/3.
Therefore,
P(Vv) = (0−∞ < v < 0,
1e20v < 6 ,
1ev/36v
.
Chapter 2
Transformations and Expectations
2.1 a. f_X(x) = 42x^5(1−x), 0 < x < 1; y = x^3 = g(x), monotone, and Y = (0,1). Use Theorem
2.1.5.
f_Y(y) = f_X(g^{−1}(y)) |d/dy g^{−1}(y)| = f_X(y^{1/3}) |d/dy (y^{1/3})| = 42 y^{5/3} (1 − y^{1/3}) (1/3) y^{−2/3}
= 14y(1 − y^{1/3}) = 14y − 14y^{4/3},   0 < y < 1.
To check the integral,
∫_0^1 (14y − 14y^{4/3}) dy = [7y^2 − 14 y^{7/3}/(7/3)]_0^1 = [7y^2 − 6y^{7/3}]_0^1 = 1 − 0 = 1.
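(As a numerical check of part (a), added here and not part of the printed solution, R's integrate()
confirms that both the original and the transformed densities integrate to 1.)
integrate(function(x) 42 * x^5 * (1 - x), lower = 0, upper = 1)       # 1
integrate(function(y) 14 * y - 14 * y^(4/3), lower = 0, upper = 1)    # 1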
b. fx(x) = 7e7x, 0 < x < ,y= 4x+ 3, monotone, and Y= (3,). Use Theorem 2.1.5.
fY(y) = fx(y3
4)
d
dy (y3
4)
= 7e(7/4)(y3)
1
4
=7
4e(7/4)(y3),3< y < .
To check the integral,
Z
3
7
4e(7/4)(y3)dy =e(7/4)(y3)
3= 0 (1) = 1.
c. FY(y) = P(0 Xy) = FX(y). Then fY(y) = 1
2yfX(y). Therefore
fY(y) = 1
2y30(y)2(1 y)2= 15y1
2(1 y)2,0< y < 1.
To check the integral,
Z1
0
15y1
2(1 y)2dy =Z1
0
(15y1
230y+ 15y3
2)dy = 15(2
3)30(1
2) + 15(2
5) = 1.
2.2 In all three cases, Theorem 2.1.5 is applicable and yields the following answers.
a. fY(y) = 1
2y1/2, 0 < y < 1.
b. fY(y) = (n+m+1)!
n!m!ey(n+1)(1 ey)m, 0 < y < .
c. fY(y) = 1
σ2
log y
ye(1/2)((log y))2, 0 < y < .
2.3 P(Y=y) = P(X
X+1 =y) = P(X=y
1y) = 1
3(2
3)y/(1y), where y= 0,1
2,2
3,3
4, . . . , x
x+1 , . . . .
2.4 a. f(x) is a pdf since it is positive and
Z
−∞
f(x)dx =Z0
−∞
1
2λeλxdx +Z
0
1
2λeλxdx =1
2+1
2= 1.
b. Let Xbe a random variable with density f(x).
P(X < t) = (Rt
−∞
1
2λeλxdx if t < 0
R0
−∞
1
2λeλxdx+Rt
0
1
2λeλxdx if t0
where, Rt
−∞
1
2λeλxdx =1
2eλx
t
−∞ =1
2eλt and Rt
0
1
2λeλxdx =1
2eλx
t
0=1
2eλt +1
2.
Therefore,
P(X < t) = 1
2eλt if t < 0
11
2eλtdx if t0
c. P(|X|< t) = 0 for t < 0, and for t0,
P(|X|< t) = P(t < X < t) = Z0
t
1
2λeλxdx +Zt
0
1
2λeλxdx
=1
21eλt+1
2eλt+1= 1 eλt.
2.5 To apply Theorem 2.1.8. Let A0={0},A1= (0,π
2), A3= (π, 3π
2) and A4= (3π
2,2π). Then
gi(x) = sin2(x) on Aifor i= 1,2,3,4. Therefore g1
1(y) = sin1(y), g1
2(y) = πsin1(y),
g1
3(y) = sin1(y) + πand g1
4(y) = 2πsin1(y). Thus
fY(y) = 1
2π
1
1y
1
2y
+1
2π1
1y
1
2y
+1
2π
1
1y
1
2y
+1
2π1
1y
1
2y
=1
πpy(1 y),0y1
To use the cdf given in (2.1.6) we have that x1= sin1(y) and x2=πsin1(y). Then by
differentiating (2.1.6) we obtain that
fY(y)=2fX(sin1(y)d
dy (sin1(y)2fX(πsin1(y)d
dy (πsin1(y)
= 2( 1
2π
1
1y
1
2y)2( 1
2π1
1y
1
2y)
=1
πpy(1 y)
2.6 Theorem 2.1.8 can be used for all three parts.
a. Let A0={0},A1= (−∞,0) and A2= (0,). Then g1(x) = |x|3=x3on A1and
g2(x) = |x|3=x3on A2. Use Theorem 2.1.8 to obtain
fY(y) = 1
3ey1/3y2/3,0< y <
.
b. Let A0={0},A1= (1,0) and A2= (0,1). Then g1(x) = 1 x2on A1and g2(x) = 1 x2
on A2. Use Theorem 2.1.8 to obtain
fY(y) = 3
8(1 y)1/2+3
8(1 y)1/2,0< y < 1
.
c. Let A0={0},A1= (1,0) and A2= (0,1). Then g1(x) = 1 x2on A1and g2(x) = 1 x
on A2. Use Theorem 2.1.8 to obtain
fY(y) = 3
16(1 p1y)21
1y+3
8(2 y)2,0< y < 1
.
2.7 Theorem 2.1.8 does not directly apply.
a. Theorem 2.1.8 does not directly apply. Instead write
P(Yy) = P(X2y)
=P(yXy) if |x| ≤ 1
P(1 Xy) if x1
=(Ry
yfX(x)dx if |x| ≤ 1
Ry
1fX(x)dx if x1.
Differentiation gives
fy(y) = (2
9
1
yif y1
1
9+1
9
1
yif y1.
b. If the sets B1, B2, . . . , BKare a partition of the range of Y, we can write
fY(y) = X
k
fY(y)I(yBk)
and do the transformation on each of the Bk. So this says that we can apply Theorem 2.1.8
on each of the Bkand add up the pieces. For A1= (1,1) and A2= (1,2) the calculations
are identical to those in part (a). (Note that on A1we are essentially using Example 2.1.7).
2.8 For each function we check the conditions of Theorem 1.5.3.
a. (i) limx0F(x) = 1 e0= 0, limx→−∞ F(x) = 1 e−∞ = 1.
(ii) 1 exis increasing in x.
(iii) 1 exis continuous.
(iv) F1
x(y) = log(1 y).
b. (i) limx→−∞ F(x) = e−∞/2 = 0, limx→∞ F(x) = 1 (e1−∞/2) = 1.
(ii) ex/2is increasing, 1/2 is nondecreasing, 1 (e1x/2) is increasing.
(iii) For continuity we only need check x= 0 and x= 1, and limx0F(x) = 1/2,
limx1F(x) = 1/2, so Fis continuous.
(iv)
F1
X(y) = log(2y) 0 y < 1
2y < 1,
1log(2(1 y)) 1
2y < 1
c. (i) limx→−∞ F(x) = e−∞/4 = 0, limx→∞ F(x) = 1 e−∞/4 = 1.
(ii) ex/4 and 1 ex/4 are both increasing in x.
(iii) limx0F(x) = 1 e0/4 = 3
4=F(0), so Fis right-continuous.
(iv) F1
X(y) = log(4y) 0 y < 1
4
log(4(1 y)) 1
4y < 1
2.9 From the probability integral transformation, Theorem 2.1.10, we know that if u(x) = Fx(x),
then Fx(X)uniform(0,1). Therefore, for the given pdf, calculate
u(x) = Fx(x) = (0 if x1
(x1)2/4 if 1 < x < 3
1 if 3 x
.
2.10 a. We prove part b), which is equivalent to part a).
b. Let Ay={x:Fx(x)y}. Since Fxis nondecreasing, Ayis a half infinite interval, either
open, say (−∞, xy), or closed, say (−∞, xy]. If Ayis closed, then
FY(y) = P(Yy) = P(Fx(X)y) = P(XAy) = Fx(xy)y.
The last inequality is true because xyAy, and Fx(x)yfor every xAy. If Ayis open,
then
FY(y) = P(Yy) = P(Fx(X)y) = P(XAy),
as before. But now we have
P(XAy) = P(X(− ∞,xy)) = lim
xyP(X(−∞, x]),
Use the Axiom of Continuity, Exercise 1.12, and this equals limxyFX(x)y. The last
inequality is true since Fx(x)yfor every xAy, that is, for every x<xy. Thus,
FY(y)yfor every y. To get strict inequality for some y, let ybe a value that is “jumped
over” by Fx. That is, let ybe such that, for some xy,
lim
xyFX(x)< y < FX(xy).
For such a y,Ay= (−∞, xy), and FY(y) = limxyFX(x)< y.
2.11 a. Using integration by parts with u=xand dv =xe
x2
2dx then
EX2=Z
−∞
x21
2πe
x2
2dx =1
2π"xe
x2
2
−∞
+Z
−∞
e
x2
2dx#=1
2π(2π) = 1.
Using example 2.1.7 let Y=X2. Then
fY(y) = 1
2y1
2πe
y
2+1
2πe
y
2=1
2πy e
y
2.
Therefore,
EY=Z
0
y
2πy e
y
2dy =1
2π2y1
2e
y
2
0+Z
0
y
1
2e
y
2dy=1
2π(2π) = 1.
This was obtained using integration by parts with u= 2y1
2and dv =1
2e
y
2and the fact the
fY(y) integrates to 1.
b. Y=|X|where −∞ < x < . Therefore 0 < y < . Then
FY(y) = P(Yy) = P(|X| ≤ y) = P(yXy)
=P(xy)P(X≤ −y) = FX(y)FX(y).
Therefore,
FY(y) = d
dy FY(y) = fX(y) + fX(y) = 1
2πe
y
2+1
2πe
y
2=r2
πe
y
2.
Thus,
EY=Z
0
yr2
πe
y
2dy =r2
πZ
0
eudu =r2
πeu
0=r2
π,
where u=y2
2.
EY2=Z
0
y2r2
πe
y
2dy =r2
πye
y
2
0+Z
0
e
y
2dy=r2
πrπ
2= 1.
This was done using integration by part with u=yand dv =ye
y
2dy. Then Var(Y) = 12
π.
2.12 We have tan x=y/d, therefore tan1(y/d) = xand d
dy tan1(y/d) = 1
1+(y/d)2
1
ddy =dx. Thus,
fY(y) = 2
πd
1
1+(y/d)2,0< y < .
This is the Cauchy distribution restricted to (0,), and the mean is infinite.
2.13 P(X=k) = (1 p)kp+pk(1 p), k= 1,2, . . .. Therefore,
EX=
X
k=1
k[(1 p)kp+pk(1 p)] = (1 p)p"
X
k=1
k(1 p)k1+
X
k=1
kpk1#
= (1 p)p1
p2+1
(1 p)2=12p+ 2p2
p(1 p).
2.14
Z
0
(1 FX(x))dx =Z
0
P(X > x)dx
=Z
0Z
x
fX(y)dydx
=Z
0Zy
0
dxfX(y)dy
=Z
0
yfX(y)dy = EX,
where the last equality follows from changing the order of integration.
2.15 Assume without loss of generality that XY. Then XY=Yand XY=X. Thus
X+Y= (XY)+(XY). Taking expectations
E[X+Y] = E[(XY)+(XY)] = E(XY) + E(XY).
Therefore E(XY) = EX+ EYE(XY).
2.16 From Exercise 2.14,
ET=Z
0aeλt+(1 a)eµtdt =aeλt
λ(1 a)eµt
µ
0
=a
λ+1a
µ.
2.17 a. ∫_0^m 3x^2 dx = m^3, set equal to 1/2 ⇒ m = (1/2)^{1/3} = .794.
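(A numerical confirmation of part (a), added here and not part of the printed solution: the median
solves m^3 = 1/2, which uniroot() recovers.)
uniroot(function(m) m^3 - 1/2, interval = c(0, 1))$root   # 0.7937
0.5^(1/3)                                                 # same value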
b. The function is symmetric about zero, therefore m = 0 as long as the integral is finite.
(1/π) ∫_{−∞}^{∞} 1/(1+x^2) dx = (1/π) tan^{−1}(x) |_{−∞}^{∞} = (1/π)[π/2 + π/2] = 1.
This is the Cauchy pdf.
2.18 E|Xa|=R
−∞ |xa|f(x)dx =Ra
−∞ (xa)f(x)dx +R
a(xa)f(x)dx. Then,
d
daE|Xa|=Za
−∞
f(x)dx Z
a
f(x)dx set
= 0.
The solution to this equation is a= median. This is a minimum since d2/da2E|Xa|= 2f(a)>
0.
2.19
d
daE(Xa)2=d
da Z
−∞
(xa)2fX(x)dx =Z
−∞
d
da(xa)2fX(x)dx
=Z
−∞ 2(xa)fX(x)dx =2Z
−∞
xfX(x)dx aZ
−∞
fX(x)dx
=2[EXa].
Therefore if d
da E(Xa)2= 0 then 2[EXa] = 0 which implies that EX=a. If EX=athen
d
da E(Xa)2=2[EXa] = 2[aa] = 0. EX=ais a minimum since d2/da2E(Xa)2=
2>0. The assumptions that are needed are the ones listed in Theorem 2.4.3.
2.20 From Example 1.5.4, if X= number of children until the first daughter, then
P(X=k) = (1 p)k1p,
where p = probability of a daughter. Thus Xis a geometric random variable, and
EX=
X
k=1
k(1 p)k1p=p
X
k=1
d
dp(1 p)k=pd
dp "
X
k=0
(1 p)k1#
=pd
dp 1
p1=1
p.
Therefore, if p = 1
2,the expected number of children is two.
2.21 Since g(x) is monotone
Eg(X) = Z
−∞
g(x)fX(x)dx =Z
−∞
yfX(g1(y)) d
dy g1(y)dy =Z
−∞
yfY(y)dy = EY,
where the second equality follows from the change of variable y=g(x), x=g1(y) and
dx =d
dy g1(y)dy.
2.22 a. Using integration by parts with u=xand dv =xex22we obtain that
Z
0
x2ex22dx2=β2
2Z
0
ex22dx.
The integral can be evaluated using the argument on pages 104-105 (see 3.3.14) or by trans-
forming to a gamma kernel (use y=λ22). Therefore, R
0ex22dx =πβ/2 and hence
the function integrates to 1.
b. EX= 2β/πEX2= 3β2/2 VarX=β23
24
π.
2.23 a. Use Theorem 2.1.8 with A0={0},A1= (1,0) and A2= (0,1). Then g1(x) = x2on A1
and g2(x) = x2on A2. Then
fY(y) = 1
2y1/2,0< y < 1.
b. EY=R1
0yfY(y)dy =1
3EY2=R1
0y2fY(y)dy =1
5VarY=1
51
32=4
45 .
2.24 a. EX=R1
0xaxa1dx =R1
0axadx =axa+1
a+1
1
0=a
a+1 .
EX2=R1
0x2axa1dx =R1
0axa+1dx =axa+2
a+2
1
0=a
a+2 .
VarX=a
a+2 a
a+1 2=a
(a+2)(a+1)2.
b. EX=Pn
x=1 x
n=1
nPn
x=1 x=1
n
n(n+1)
2=n+1
2.
EX2=Pn
i=1 x2
n=1
nPn
i=1 x2=1
n
n(n+1)(2n+1)
6=(n+1)(2n+1)
6.
VarX=(n+1)(2n+1)
6n+1
22=2n2+3n+1
6n2+2n+1
4=n2+1
12 .
c. EX=R2
0x3
2(x1)2dx =3
2R2
0(x32x2+x)dx = 1.
EX2=R2
0x23
2(x1)2dx =3
2R2
0(x42x3+x2)dx =8
5.
VarX=8
512=3
5.
2.25 a. Y=Xand g1(y) = y. Thus fY(y) = fX(g1(y))|d
dy g1(y)|=fX(y)| − 1|=fX(y)
for every y.
b. To show that MX(t) is symmetric about 0 we must show that MX(0 + ) = MX(0 ) for
all  > 0.
MX(0 + ) = Z
−∞
e(0+)xfX(x)dx =Z0
−∞
exfX(x)dx +Z
0
exfX(x)dx
=Z
0
e(x)fX(x)dx +Z0
−∞
e(x)fX(x)dx =Z
−∞
exfX(x)dx
=Z
−∞
e(0)xfX(x)dx =MX(0 ).
2.26 a. There are many examples; here are three. The standard normal pdf (Example 2.1.9) is
symmetric about a= 0 because (0 )2= (0 + )2. The Cauchy pdf (Example 2.2.4) is
symmetric about a= 0 because (0 )2= (0 + )2. The uniform(0,1) pdf (Example 2.1.4)
is symmetric about a= 1/2 because
f((1/2) + ) = f((1/2) ) = 1 if 0 <  < 1
2
0 if 1
2 < .
b.
Z
a
f(x)dx =Z
0
f(a+)d (change variable, =xa)
=Z
0
f(a)d (f(a+) = f(a) for all  > 0)
=Za
−∞
f(x)dx. (change variable, x=a)
Since
Za
−∞
f(x)dx +Z
a
f(x)dx =Z
−∞
f(x)dx = 1,
it must be that
Za
−∞
f(x)dx =Z
a
f(x)dx = 1/2.
Therefore, ais a median.
c.
EXa= E(Xa) = Z
−∞
(xa)f(x)dx
=Za
−∞
(xa)f(x)dx +Z
a
(xa)f(x)dx
=Z
0
()f(a)d +Z
0
f(a+)d
With a change of variable, =axin the first integral, and =xain the second integral
we obtain that
EXa= E(Xa)
=Z
0
f(a)d +Z
0
f(a)d (f(a+) = f(a) for all  > 0)
= 0.(two integrals are same)
Therefore, EX=a.
d. If a >  > 0,
f(a) = e(a)> e(a+)=f(a+).
Therefore, f(x) is not symmetric about a > 0. If  < a 0,
f(a) = 0 < e(a+)=f(a+).
Therefore, f(x) is not symmetric about a0, either.
e. The median of X= log 2 <1 = EX.
2.27 a. The standard normal pdf.
b. The uniform on the interval (0,1).
c. For the case when the mode is unique. Let abe the point of symmetry and bbe the mode. Let
assume that ais not the mode and without loss of generality that a=b+ > b for  > 0. Since
bis the mode then f(b)> f(b+)f(b+ 2) which implies that f(a)> f(a)f(a+)
which contradict the fact the f(x) is symmetric. Thus ais the mode.
For the case when the mode is not unique, there must exist an interval (x1, x2) such that
f(x) has the same value in the whole interval, i.e, f(x) is flat in this interval and for all
b(x1, x2), bis a mode. Let assume that a6∈ (x1, x2), thus ais not a mode. Let also assume
without loss of generality that a= (b+)> b. Since bis a mode and a= (b+)6∈ (x1, x2)
then f(b)> f(b+)f(b+ 2) which contradict the fact the f(x) is symmetric. Thus
a(x1, x2) and is a mode.
d. f(x) is decreasing for x0, with f(0) > f(x)> f(y) for all 0 < x < y. Thus f(x) is
unimodal and 0 is the mode.
2.28 a.
µ3=Z
−∞
(xa)3f(x)dx =Za
−∞
(xa)3f(x)dx +Z
a
(xa)3f(x)dx
=Z0
−∞
y3f(y+a)dy +Z
0
y3f(y+a)dy (change variable y=xa)
=Z
0y3f(y+a)dy +Z
0
y3f(y+a)dy
= 0.(f(y+a) = f(y+a))
b. For f(x) = ex,µ1=µ2= 1, therefore α3=µ3.
µ3=Z
0
(x1)3exdx =Z
0
(x33x2+ 3x1)exdx
= Γ(4) 3Γ(3) + 3Γ(2) Γ(1) = 3! 3×2! + 3 ×11 = 3.
c. Each distribution has µ1= 0, therefore we must calculate µ2= EX2and µ4= EX4.
(i) f(x) = 1
2πex2/2, µ2= 1, µ4= 3, α4= 3.
(ii) f(x) = 1
2,1< x < 1, µ2=1
3,µ4=1
5,α4=9
5.
(iii) f(x) = 1
2e−|x|,−∞ < x < , µ2= 2, µ4= 24, α4= 6.
As a graph will show, (iii) is most peaked, (i) is next, and (ii) is least peaked.
2.29 a. For the binomial
EX(X1) =
n
X
x=2
x(x1)n
xpx(1 p)nx
=n(n1)p2
n
X
x=2 n2
xpx2(1 p)nx
=n(n1)p2
n2
X
y=0 n2
ypy(1 p)n2y=n(n1)p2,
where we use the identity x(x1)n
x=n(n1)n2
x, substitute y=x2 and recognize
that the new sum is equal to 1. Similarly, for the Poisson
EX(X1) =
X
x=2
x(x1)eλλx
x!=λ2
X
y=0
eλλy
y!=λ2,
where we substitute y=x2.
b. Var(X) = E[X(X1)] + EX(EX)2. For the binomial
Var(X) = n(n1)p2+np (np)2=np(1 p).
For the Poisson
Var(X) = λ2+λλ2=λ.
c.
EY=
n
X
y=0
ya
y+an
ya+b1
a
n+a+b1
y+a=
n
X
y=1
na
(y1) + (a+ 1)n1
y1a+b1
a
(n1)+(a+1)+b1
(y1)+(a+1)
=
n
X
y=1
na
(y1) + (a+ 1)n1
y1a+b1
a
(n1)+(a+1)+b1
(y1)+(a+1)
=
na
a+1 a+b1
a
a+1+b1
a+1
n
X
y=1
a+ 1
(y1) + (a+ 1)n1
y1a+1+b1
a+1
(n1)+(a+1)+b1
(y1)+(a+1)
=na
a+b
n1
X
j=0
a+ 1
j+ (a+ 1)n1
ja+1+b1
a+1
(n1)+(a+1)+b1
(j+(a+1) =na
a+b,
since the last summation is 1, being the sum over all possible values of a beta-binomial(n
1, a + 1, b). E[Y(Y1)] = n(n1)a(a+1)
(a+b)(a+b+1) is calculated similar to EY, but using the identity
y(y1)n
y=n(n1)n2
y2and adding 2 instead of 1 to the parameter a. The sum over all
possible values of abeta-binomial(n2, a + 2, b) will appear in the calculation. Therefore
Var(Y) = E[Y(Y1)] + EY(EY)2=nab(n+a+b)
(a+b)2(a+b+ 1).
2.30 a. E(etX ) = Rc
0etx 1
cdx =1
ct etx
c
0=1
ct etc 1
ct 1 = 1
ct (etc 1).
b. E(etX ) = Rc
0
2x
c2etxdx =2
c2t2(ctetc etc + 1).(integration-by-parts)
c.
E(etx) = Zα
−∞
1
2βe(xα)etxdx +Z
α
1
2βe(xα)etxdx
=eα/β
2β
1
(1
β+t)ex(1
β+t)
α
−∞
+eα/β
2β
1
(1
βt)ex(1
βt)
α
=4eαt
4β2t2,2 < t < 2.
d. E etX =P
x=0 etxr+x1
xpr(1 p)x=prP
x=0 r+x1
x(1 p)etx.Now use the fact
that P
x=0 r+x1
x(1 p)etx1(1 p)etr= 1 for (1 p)et<1, since this is just the
sum of this pmf, to get E(etX ) = p
1(1p)etr, t < log(1 p).
2.31 Since the mgf is defined as MX(t) = EetX , we necessarily have MX(0) = Ee0= 1.But t/(1 t)
is 0 at t= 0, therefore it cannot be an mgf.
2.32
d
dtS(t)t=0
=d
dt (log(Mx(t))t=0
=
d
dt Mx(t)
Mx(t)t=0
=EX
1= EXsince MX(0) = Ee0= 1
d2
dt2S(t)t=0
=d
dt M0
x(t)
Mx(t)t=0
=Mx(t)M00
x(t)[M0
x(t)]2
[Mx(t)]2t=0
=1·EX2(EX)2
1= VarX.
2.33 a. MX(t) = P
x=0 etx eλλx
x!=eλP
x=1
(etλ)x
x!=eλeλet=eλ(et1).
EX=d
dt Mx(t)t=0 =eλ(et1)λett=0 =λ.
EX2=d2
dt2Mx(t)t=0 =λeteλ(et1)λet+λeteλ(et1)t=0 =λ2+λ.
VarX= EX2(EX)2=λ2+λλ2=λ.
b.
Mx(t) =
X
x=0
etxp(1 p)x=p
X
x=0
((1 p)et)x
=p1
1(1 p)et=p
1(1 p)et, t < log(1 p).
EX=d
dtMx(t)t=0
=p
(1 (1 p)et)2(1 p)ett=0
=p(1 p)
p2=1p
p.
EX2=d2
dt2Mx(t)t=0
=1(1 p)et2p(1 p)et+p(1 p)et21(1 p)et(1 p)et
(1 (1 p)et)4t=0
=p3(1 p)+2p2(1 p)2
p4=p(1 p) + 2(1 p)2
p2.
VarX=p(1 p) + 2(1 p)2
p2(1 p)2
p2=1p
p2.
c. Mx(t) = R
−∞ etx 1
2πσ e(xµ)2/2σ2dx =1
2πσ R
−∞ e(x22µx2σ2tx+µ2)/2σ2dx. Now com-
plete the square in the numerator by writing
x22µx 2σ2tx+µ2=x22(µ+σ2t)x±(µ+σ2t)2+µ2
= (x(µ+σ2t))2(µ+σ2t)2+µ2
= (x(µ+σ2t))2[2µσ2t+ (σ2t)2].
Then we have Mx(t) = e[2µσ2t+(σ2t)2]/2σ21
2πσ R
−∞ e1
2σ2(x(µ+σ2t))2dx =eµt+σ2t2
2.
EX=d
dt Mx(t)t=0 = (µ+σ2t)eµt+σ2t2/2t=0 =µ.
EX2=d2
dt2Mx(t)t=0 = (µ+σ2t)2eµt+σ2t2/2+σ2eµt+σ2t/2t=0 =µ2+σ2.
VarX=µ2+σ2µ2=σ2.
2.35 a.
EXr
1=Z
0
xr1
2πxe(log x)2/2dx (f1is lognormal with µ= 0, σ2= 1)
=1
2πZ
−∞
ey(r1)ey2/2eydy (substitute y= log x, dy = (1/x)dx)
=1
2πZ
−∞
ey2/2+ry dy =1
2πZ
−∞
e(y22ry+r2)/2er2/2dy
=er2/2.
b.
Z
0
xrf1(x) sin(2πlog x)dx =Z
0
xr1
2πxe(log x)2/2sin(2πlog x)dx
=Z
−∞
e(y+r)r1
2πe(y+r)2/2sin(2πy + 2πr)dy
(substitute y= log x, dy = (1/x)dx)
=Z
−∞
1
2πe(r2y2)/2sin(2πy)dy
(sin(a+ 2πr) = sin(a) if r= 0,1,2, . . .)
= 0,
because e(r2y2)/2sin(2πy) = e(r2(y)2)/2sin(2π(y)); the integrand is an odd function
so the negative integral cancels the positive one.
2.36 First, it can be shown that
lim
x→∞ etx(log x)2=
by using l’Hˆopital’s rule to show
lim
x→∞
tx (log x)2
tx = 1,
and, hence,
lim
x→∞ tx (log x)2= lim
x→∞ tx =.
Then for any k > 0, there is a constant csuch that
Z
k
1
xetxe( log x)2/2dx cZ
k
1
xdx =clog x|
k=.
Hence Mx(t) does not exist.
2.37 a. The graph looks very similar to Figure 2.3.2 except that f1is symmetric around 0 (since it
is standard normal).
b. The functions look like t2/2 – it is impossible to see any difference.
c. The mgf of f1is eK1(t). The mgf of f2is eK2(t).
d. Make the transformation y=exto get the densities in Example 2.3.10.
2.39 a. d
dx Rx
0eλtdt =eλx. Verify
d
dx Zx
0
eλtdt=d
dx 1
λeλt
x
0=d
dx 1
λeλx +1
λ=eλx.
b. d
R
0eλtdt =R
0
d
eλtdt =R
0teλtdt =Γ(2)
λ2=1
λ2. Verify
d
Z
0
eλtdt =d
1
λ=1
λ2.
c. d
dt R1
t
1
x2dx =1
t2. Verify
d
dt Z1
t
1
x2dx=d
dt 1
x
1
t!=d
dt 1 + 1
t=1
t2.
d. d
dt R
1
1
(xt)2dx =R
1
d
dt 1
(xt)2dx =R
12(xt)3dx =(xt)2
1=1
(1t)2. Verify
d
dt Z
1
(xt)2dx =d
dt h(xt)1
1i=d
dt
1
1t=1
(1 t)2.
Chapter 3
Common Families of Distributions
3.1 The pmf of Xis f(x) = 1
N1N0+1 ,x=N0, N0+ 1, . . . , N1. Then
EX=
N1
X
x=N0
x1
N1N0+1 =1
N1N0+1 N1
X
x=1
x
N01
X
x=1
x!
=1
N1N0+1 N1(N1+1)
2(N01)(N01 + 1)
2
=N1+N0
2.
Similarly, using the formula for PN
1x2, we obtain
Ex2=1
N1N0+1 N1(N1+1)(2N1+1) N0(N01)(2N01)
6
VarX= EX2EX=(N1N0)(N1N0+2)
12 .
3.2 Let X = number of defective parts in the sample. Then X ~ hypergeometric(N = 100, M, K),
where M = number of defectives in the lot and K = sample size.
a. If there are 6 or more defectives in the lot, then the probability that the lot is accepted
(X = 0) is at most
P(X = 0 | N = 100, M = 6, K) = C(6,0) C(94,K) / C(100,K) = [(100−K) ··· (100−K−5)] / [100 ··· 95].
By trial and error we find P(X = 0) = .10056 for K = 31 and P(X = 0) = .09182 for
K = 32. So the sample size must be at least 32.
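(The trial-and-error search in part (a) is easy to automate. This R sketch is an addition to the
solution; it assumes the acceptance criterion implied by the numbers above, namely P(X = 0) ≤ .10,
and uses dhyper() with 6 defectives among 100 parts.)
p.accept <- sapply(1:94, function(K) dhyper(0, m = 6, n = 94, k = K))
p.accept[31:32]               # .10056 and .09182
min(which(p.accept < 0.10))   # 32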
b. Now P(accept lot) = P(X = 0 or 1), and, for 6 or more defectives, the probability is at
most
P(X = 0 or 1 | N = 100, M = 6, K) = C(6,0) C(94,K) / C(100,K) + C(6,1) C(94,K−1) / C(100,K).
By trial and error we find P(X = 0 or 1) = .10220 for K = 50 and P(X = 0 or 1) = .09331
for K = 51. So the sample size must be at least 51.
3.3 In the seven seconds for the event, no car must pass in the last three seconds, an event with
probability (1 p)3. The only occurrence in the first four seconds, for which the pedestrian
does not wait the entire four seconds, is to have a car pass in the first second and no other
car pass. This has probability p(1 p)3. Thus the probability of waiting exactly four seconds
before starting to cross is [1 p(1 p)3](1 p)3.
3.5 Let X = number of effective cases. If the new and old drugs are equally effective, then the
probability that the new drug is effective on a case is .8. If the cases are independent then
X ~ binomial(100, .8), and
P(X ≥ 85) = Σ_{x=85}^{100} C(100,x) .8^x .2^{100−x} = .1285.
So, even if the new drug is no better than the old, the chance of 85 or more effective cases is
not too small. Hence, we cannot conclude the new drug is better. Note that using a normal
approximation to calculate this binomial probability yields P(X ≥ 85) ≈ P(Z ≥ 1.125) = .1303.
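(Both the exact probability and the normal approximation quoted above can be reproduced in R;
this is an added sketch, not part of the printed solution. The approximation uses the
continuity-corrected value z = (84.5 − 80)/4 = 1.125.)
1 - pbinom(84, size = 100, prob = 0.8)              # 0.1285
pnorm(84.5, mean = 80, sd = 4, lower.tail = FALSE)  # 0.1303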
3.7 Let X ~ Poisson(λ). We want P(X ≥ 2) ≥ .99, that is,
P(X ≤ 1) = e^{−λ} + λe^{−λ} ≤ .01.
Solving e^{−λ} + λe^{−λ} = .01 by trial and error (numerical bisection method) yields λ = 6.6384.
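(The equation e^{−λ}(1 + λ) = .01 can also be solved directly with uniroot() in R; an added
sketch, not part of the printed solution.)
uniroot(function(l) exp(-l) * (1 + l) - 0.01, interval = c(0.1, 20))$root   # 6.6384
ppois(1, lambda = 6.6384)                                                   # approximately .01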
3.8 a. We want P(X > N) < .01 where X ~ binomial(1000, 1/2). Since the 1000 customers choose
randomly, we take p = 1/2. We thus require
P(X > N) = Σ_{x=N+1}^{1000} C(1000,x) (1/2)^x (1/2)^{1000−x} < .01
which implies that
(1/2^{1000}) Σ_{x=N+1}^{1000} C(1000,x) < .01.
This last inequality can be used to solve for N, that is, N is the smallest integer that satisfies
(1/2^{1000}) Σ_{x=N+1}^{1000} C(1000,x) < .01.
The solution is N = 537.
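(The smallest such N can be found in R with the binomial quantile function or a direct search;
an added sketch, not part of the printed solution.)
qbinom(0.99, size = 1000, prob = 0.5)       # 537, the smallest N with P(X <= N) >= .99
1 - pbinom(537, size = 1000, prob = 0.5)    # < .01
1 - pbinom(536, size = 1000, prob = 0.5)    # >= .01, so N = 537 is the smallest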
b. To use the normal approximation we take X ~ n(500, 250), where we used µ = 1000(1/2) = 500
and σ^2 = 1000(1/2)(1/2) = 250. Then
P(X > N) = P( (X−500)/√250 > (N−500)/√250 ) < .01,
thus,
P( Z > (N−500)/√250 ) < .01,
where Z ~ n(0,1). From the normal table we get
P(Z > 2.33) ≈ .0099 < .01  ⇒  (N−500)/√250 = 2.33  ⇒  N ≈ 537.
Therefore, each theater should have at least 537 seats, and the answer based on the approximation
equals the exact answer.
3.9 a. We can think of each one of the 60 children entering kindergarten as 60 independent Bernoulli
trials with probability of success (a twin birth) of approximately 1/90. The probability of having
5 or more successes approximates the probability of having 5 or more sets of twins entering
kindergarten. Then X ~ binomial(60, 1/90) and
P(X ≥ 5) = 1 − Σ_{x=0}^{4} C(60,x) (1/90)^x (1 − 1/90)^{60−x} = .0006,
which is small and may be rare enough to be newsworthy.
b. Let X be the number of elementary schools in New York state that have 5 or more sets
of twins entering kindergarten. Then the probability of interest is P(X ≥ 1) where X ~
binomial(310, .0006). Therefore P(X ≥ 1) = 1 − P(X = 0) = .1698.
c. Let X be the number of states that have 5 or more sets of twins entering kindergarten
during any of the last ten years. Then the probability of interest is P(X ≥ 1) where X ~
binomial(500, .1698). Therefore P(X ≥ 1) = 1 − P(X = 0) = 1 − 3.90 × 10^{−41} ≈ 1.
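(The three probabilities chain together as successive binomial computations; this R sketch is an
addition to the solution and uses the rounded values .0006 and .1698 as above.)
p.a <- 1 - pbinom(4, size = 60, prob = 1/90)     # about .0006
p.b <- 1 - dbinom(0, size = 310, prob = 0.0006)  # .1698
p.c <- 1 - dbinom(0, size = 500, prob = 0.1698)  # essentially 1
c(p.a, p.b, p.c)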
3.11 a.
lim
M/Np,M→∞,N→∞ M
xNM
Kx
N
K
=K!
x!(Kx)! lim
M/Np,M→∞,N→∞
M!(NM)!(NK)!
N!(Mx)!(NM(Kx))!
In the limit, each of the factorial terms can be replaced by the approximation from Stirling’s
formula because, for example,
M! = (M!/(2πMM+1/2eM))2πMM+1/2eM
and M!/(2πMM+1/2eM)1. When this replacement is made, all the 2πand expo-
nential terms cancel. Thus,
lim
M/Np,M→∞,N→∞ M
xNM
Kx
N
K
=K
xlim
M/Np,M→∞,N→∞
MM+1/2(NM)NM+1/2(NK)NK+1/2
NN+1/2(Mx)Mx+1/2(NMK+x)NM(Kx)+1/2.
We can evaluate the limit by breaking the ratio into seven terms, each of which has a finite
limit we can evaluate. In some limits we use the fact that M→ ∞,N→ ∞ and M/N p
imply NM→ ∞. The first term (of the seven terms) is
lim
M→∞ M
MxM
= lim
M→∞
1
Mx
MM= lim
M→∞
1
1+x
MM=1
ex=ex.
Lemma 2.3.14 is used to get the penultimate equality. Similarly we get two more terms,
lim
NM→∞ NM
NM(Kx)NM
=eKx
and
lim
N→∞ NK
NN
=eK.
Note, the product of these three limits is one. Three other terms are
lim M→ ∞M
Mx1/2
= 1
lim
NM→∞ NM
NM(Kx)1/2
= 1
and
lim
N→∞ NK
N1/2
= 1.
The only term left is
lim
M/Np,M→∞,N→∞
(Mx)x(NM(Kx))Kx
(NK)K
= lim
M/Np,M→∞,N→∞ Mx
NKxNM(Kx)
NKKx
=px(1 p)Kx.
b. If in (a) we in addition have K→ ∞,p0, MK/N pK λ, by the Poisson approxi-
mation to the binomial, we heuristically get
M
xNM
Kx
N
KK
xpx(1 p)Kxeλλx
x!.
c. Using Stirling’s formula as in (a), we get
lim
N,M,K→∞,M
N0,KM
NλM
xNM
Kx
N
K
= lim
N,M,K→∞,M
N0,KM
Nλ
ex
x!
KxexMxex(NM)KxeKx
NKeK
=1
x!lim
N,M,K→∞,M
N0,KM
NλKM
NxNM
NKx
=1
x!λxlim
N,M,K→∞,M
N0,KM
Nλ 1
MK
N
K!K
=eλλx
x!.
3.12 Consider a sequence of Bernoulli trials with success probability p. Define X= number of
successes in first ntrials and Y= number of failures before the rth success. Then Xand Y
have the specified binomial and hypergeometric distributions, respectively. And we have
Fx(r1) = P(Xr1)
=P(rth success on (n+ 1)st or later trial)
=P(at least n+ 1 rfailures before the rth success)
=P(Ynr+ 1)
= 1 P(Ynr)
= 1 FY(nr).
3.13 For any Xwith support 0,1, . . ., we have the mean and variance of the 0truncated XTare
given by
EXT=
X
x=1
xP (XT=x) =
X
x=1
xP(X=x)
P(X > 0)
=1
P(X > 0)
X
x=1
xP (X=x) = 1
P(X > 0)
X
x=0
xP (X=x) = EX
P(X > 0).
In a similar way we get EX2
T=EX2
P(X>0) .Thus,
VarXT=EX2
P(X > 0) EX
P(X > 0)2
.
a. For Poisson(λ), P(X > 0) = 1 P(X=0)=1eλλ0
0! = 1 eλ, therefore
P(XT=x) = eλλx
x!(1eλ)x= 1,2, . . .
EXT=λ/(1 eλ)
VarXT= (λ2+λ)/(1 eλ)(λ/(1 eλ))2.
b. For negative binomial(r, p), P(X > 0) = 1 P(X= 0) = 1 r1
0pr(1 p)0= 1 pr. Then
P(XT=x) = r+x1
xpr(1 p)x
1pr, x = 1,2, . . .
EXT=r(1 p)
p(1 pr)
VarXT=r(1 p) + r2(1 p)2
p2(1 pr)r(1 p)
p(1 pr)2.
3.14 a. P
x=1 (1p)x
xlog p=1
log pP
x=1 (1p)x
x= 1,since the sum is the Taylor series for log p.
b.
EX=1
log p"
X
x=1
(1p)x#=1
log p"
X
x=0
(1p)x1#== 1
log p1
p1=1
log p1p
p.
Since the geometric series converges uniformly,
EX2=1
log p
X
x=1
x(1 p)x=(1p)
log p
X
x=1
d
dp(1 p)x
=(1p)
log p
d
dp
X
x=1
(1 p)x=(1p)
log p
d
dp 1p
p=(1p)
p2log p.
Thus
VarX=(1p)
p2log p1 + (1p)
log p.
Alternatively, the mgf can be calculated,
Mx(t) = 1
log p
X
x=1 h(1p)etix=log(1+petet)
log p
and can be differentiated to obtain the moments.
3.15 The moment generating function for the negative binomial is
M(t) = p
1(1 p)etr
= 1 + 1
r
r(1 p)(et1)
1(1 p)et!r
,
the term
r(1 p)(et1)
1(1 p)etλ(et1)
1=λ(et1) as r→ ∞, p 1 and r(p1) λ.
Thus by Lemma 2.3.14, the negative binomial moment generating function converges to
eλ(et1), the Poisson moment generating function.
3.16 a. Using integration by parts with, u=tαand dv =etdt, we obtain
Γ(α+ 1) = Z
0
t(α+1)1etdt =tα(et)
0Z
0
αtα1(et)dt = 0 + αΓ(α) = αΓ(α).
b. Making the change of variable z=2t, i.e., t=z2/2, we obtain
Γ(1/2) = Z
0
t1/2etdt =Z
0
2
zez2/2zdz =2Z
0
ez2/2dz =2π
2=π.
where the penultimate equality uses (3.3.14).
3.17
EXν=Z
0
xν1
Γ(α)βαxα1ex/βdx =1
Γ(α)βαZ
0
x(ν+α)1ex/βdx
=Γ(ν+α)βν+α
Γ(α)βα=βνΓ(ν+α)
Γ(α).
Note, this formula is valid for all ν > α. The expectation does not exist for ν≤ −α.
3.18 If Ynegative binomial(r, p), its moment generating function is MY(t) = p
1(1p)etr,and,
from Theorem 2.3.15, MpY (t) = p
1(1p)ept r.Now use L’Hˆopital’s rule to calculate
lim
p0p
1(1 p)ept = lim
p0
1
(p1)tept+ept =1
1t,
so the moment generating function converges to (1 t)r, the moment generating function of
a gamma(r, 1).
3.19 Repeatedly apply the integration-by-parts formula
1
Γ(n)Z
x
zn1zzdz =xn1ex
(n1)! +1
Γ(n1) Z
x
zn2zzdz,
until the exponent on the second integral is zero. This will establish the formula. If X
gamma(α, 1) and YPoisson(x). The probabilistic relationship is P(Xx) = P(Yα1).
3.21 The moment generating function would be defined by 1
πR
−∞
etx
1+x2dx. On (0,), etx > x, hence
Z
0
etx
1+x2dx > Z
0
x
1+x2dx =,
thus the moment generating function does not exist.
3.22 a.
E(X(X1)) =
X
x=0
x(x1)eλλx
x!
=eλλ2
X
x=2
λx2
(x2)! (let y=x2)
=eλλ2
X
y=0
λy
y!=eλλ2eλ=λ2
EX2=λ2+ EX=λ2+λ
VarX= EX2(EX)2=λ2+λλ2=λ.
b.
E(X(X1)) =
X
x=0
x(x1)r+x1
xpr(1 p)x
=
X
x=2
r(r+ 1)r+x1
x2pr(1 p)x
=r(r+ 1)(1 p)2
p2
X
y=0 r+2+y1
ypr + 2(1 p)y
=r(r1)(1 p)2
p2,
where in the second equality we substituted y=x2, and in the third equality we use the
fact that we are summing over a negative binomial(r+ 2, p) pmf. Thus,
VarX= EX(X1) + EX(EX)2
=r(r+ 1)(1 p)2
p2+r(1 p)
pr2(1 p)2
p2
=r(1 p)
p2.
c.
EX2=Z
0
x21
Γ(α)βαxα1ex/βdx =1
Γ(α)βαZ
0
xα+1ex/β dx
=1
Γ(α)βαΓ(α+ 2)βα+2 =α(α+ 1)β2.
VarX= EX2(EX)2=α(α+ 1)β2α2β2=αβ2.
d. (Use 3.3.18)
EX=Γ(α+1)Γ(α+β)
Γ(α+β+1)Γ(α)=αΓ(α)Γ(α+β)
(α+β)Γ(α+β)Γ(α)=α
α+β.
EX2=Γ(α+2)Γ(α+β)
Γ(α+β+2)Γ(α)=(α+1)αΓ(α)Γ(α+β)
(α+β+1)(α+β)Γ(α+β)Γ(α)=α(α+1)
(α+β)(α+β+1).
VarX= EX2(EX)2=α(α+1)
(α+β)(α+β+1) α2
(α+β)2=αβ
(α+β)2(α+β+1).
e. The double exponential(µ, σ) pdf is symmetric about µ. Thus, by Exercise 2.26, EX=µ.
VarX=Z
−∞
(xµ)21
2σe−|xµ|dx =Z
−∞
σz21
2e−|z|σdz
=σ2Z
0
z2ezdz =σ2Γ(3) = 2σ2.
3.23 a.
Z
α
xβ1dx =1
βxβ
α
=1
βαβ,
thus f(x) integrates to 1 .
b. EXn=βαn
(nβ), therefore
EX=αβ
(1 β)
EX2=αβ2
(2 β)
VarX=αβ2
2β(αβ)2
(1β)2
c. If β < 2 the integral of the second moment is infinite.
3.24 a. fx(x) = 1
βex/β,x > 0. For Y=X1,fY(y) = γ
βeyγyγ1,y > 0. Using the transforma-
tion z=yγ, we calculate
EYn=γ
βZ
0
yγ+n1eyγdy =βn/γ Z
0
zn/γ ezdz =βn/γ Γn
γ+1.
Thus EY=β1Γ( 1
γ+ 1) and VarY=β2hΓ2
γ+1Γ21
γ+1i.
b. fx(x) = 1
βex/β,x > 0. For Y= (2X/β)1/2,fY(y) = yey2/2,y > 0 . We now notice that
EY=Z
0
y2ey2/2dy =2π
2
since 1
2πR
−∞ y2ey2/2= 1, the variance of a standard normal, and the integrand is sym-
metric. Use integration-by-parts to calculate the second moment
EY2=Z
0
y3ey2/2dy = 2 Z
0
yey2/2dy = 2,
where we take u=y2,dv =yey2/2. Thus VarY= 2(1 π/4).
c. The gamma(a, b) density is
fX(x) = 1
Γ(a)baxa1ex/b.
Make the transformation y= 1/x with dx =dy/y2to get
fY(y) = fX(1/y)|1/y2|=1
Γ(a)ba1
ya+1
e1/by.
The first two moments are
EY=1
Γ(a)baZ
01
ya
e1/by =Γ(a1)ba1
Γ(a)ba=1
(a1)b
EY2=Γ(a2)ba2
Γ(a)ba=1
(a1)(a2)b2,
and so VarY=1
(a1)2(a2)b2.
d. fx(x) = 1
Γ(3/2)β3/2x3/21ex/β,x > 0. For Y= (X/β)1/2,fY(y) = 2
Γ(3/2) y2ey2,y > 0. To
calculate the moments we use integration-by-parts with u=y2,dv =yey2to obtain
EY=2
Γ(3/2) Z
0
y3ey2dy =2
Γ(3/2) Z
0
yey2dy =1
Γ(3/2)
and with u=y3, dv =yey2to obtain
EY2=2
Γ(3/2) Z
0
y4ey2dy =3
Γ(3/2) Z
0
y2ey2dy =3
Γ(3/2)π.
Using the fact that 1
2πR
−∞ y2ey2= 1, since it is the variance of a n(0,2), symmetry yields
R
0y2ey2dy =π. Thus, VarY= 6 4, using Γ(3/2) = 1
2π.
e. fx(x) = ex,x > 0. For Y=αγlog X,fY(y) = eeαy
γeαy
γ1
γ,−∞ < y < . Calculation
of EYand EY2cannot be done in closed form. If we define
I1=Z
0
log xexdx, I2=Z
0
(log x)2exdx,
then EY= E(αγlog x) = αγI1, and EY2= E(αγlog x)2=α22αγI1+γ2I2.The
constant I1=.5772157 is called Euler’s constant.
3.25 Note that if T is continuous then,
P(tTt+δ|tT) = P(tTt+δ, t T)
P(tT)
=P(tTt+δ)
P(tT)
=FT(t+δ)FT(t)
1FT(t).
Therefore from the definition of derivative,
hT(t) = 1
1FT(t)= lim
δ0
FT(t+δ)FT(t)
δ=F0
T(t)
1FT(t)=fT(t)
1FT(t).
Also,
d
dt (log[1 FT(t)]) = 1
1FT(t)(fT(t)) = hT(t).
3.26 a. fT(t) = 1
βet/β and FT(t) = Rt
0
1
βex/βdx =ex/β
t
0= 1 et/β. Thus,
hT(t) = fT(t)
1FT(t)=(1)et/β
1(1 et/β)=1
β.
b. fT(t) = γ
βtγ1etγ, t 0 and FT(t) = Rt
0
γ
βxγ1exγdx =Rtγ
0eudu =eu|tγ
0=
1etγ, where u=xγ . Thus,
hT(t) = (γ)tγ1etγ
etγ=γ
βtγ1.
c. FT(t) = 1
1+e(tµ)and fT(t) = e(tµ)
(1+e(tµ))2. Thus,
hT(t) = 1
βe(tµ)(1+e(tµ))21
e(tµ)
1+e(tµ)
=1
βFT(t).
3.27 a. The uniform pdf satisfies the inequalities of Exercise 2.27, hence is unimodal.
b. For the gamma(α, β) pdf f(x), ignoring constants, d
dx f(x) = xα2ex/β
β[β(α1) x], which
only has one sign change. Hence the pdf is unimodal with mode β(α1).
c. For the n(µ, σ2) pdf f(x), ignoring constants, d
dx f(x) = xµ
σ2e(x/β)2/2σ2, which only has
one sign change. Hence the pdf is unimodal with mode µ.
d. For the beta(α, β) pdf f(x), ignoring constants,
d
dxf(x) = xα2(1 x)β2[(α1) x(α+β2)] ,
which only has one sign change. Hence the pdf is unimodal with mode α1
α+β2.
3.28 a. (i) µknown,
f(x|σ2) = 1
2πσ exp 1
2σ2(xµ)2,
h(x) = 1, c(σ2) = 1
2πσ2I(0,)(σ2), w1(σ2) = 1
2σ2, t1(x) = (xµ)2.
(ii) σ2known,
f(x|µ) = 1
2πσ exp x2
2σ2exp µ2
2σ2exp µx
σ2,
h(x) = exp x2
2σ2, c(µ) = 1
2πσ exp µ2
2σ2, w1(µ) = µ, t1(x) = x
σ2.
b. (i) αknown,
f(x|β) = 1
Γ(α)βαxα1e
x
β,
h(x) = xα1
Γ(α),x > 0, c(β) = 1
βα, w1(β) = 1
β, t1(x) = x.
(ii) βknown,
f(x|α) = ex/β 1
Γ(α)βαexp((α1) log x),
h(x) = ex/β,x > 0, c(α) = 1
Γ(α)βαw1(α) = α1, t1(x) = log x.
(iii) α, β unknown,
f(x|α, β) = 1
Γ(α)βαexp((α1) log xx
β),
h(x) = I{x>0}(x), c(α, β) = 1
Γ(α)βα, w1(α) = α1, t1(x) = log x,
w2(α, β) = 1, t2(x) = x.
c. (i) αknown, h(x) = xα1I[0,1](x), c(β) = 1
B(α,β), w1(β) = β1, t1(x) = log(1 x).
(ii) βknown, h(x) = (1 x)β1I[0,1](x), c(α) = 1
B(α,β), w1(x) = α1, t1(x) = log x.
(iii) α, β unknown,
h(x) = I[0,1](x), c(α, β) = 1
B(α,β), w1(α) = α1, t1(x) = log x,
w2(β) = β1, t2(x) = log(1 x).
d. h(x) = 1
x!I{0,1,2,...}(x), c(θ) = eθ, w1(θ) = log θ, t1(x) = x.
e. h(x) = x1
r1I{r,r+1,...}(x), c(p) = p
1pr, w1(p) = log(1 p), t1(x) = x.
3.29 a. For the n(µ, σ^2),
   f(x) = (1/√(2π)) (e^{-µ^2/(2σ^2)}/σ) e^{-x^2/(2σ^2) + xµ/σ^2},
so the natural parameter is (η_1, η_2) = (-1/(2σ^2), µ/σ^2) with natural parameter space {(η_1, η_2) : η_1 < 0, -∞ < η_2 < ∞}.
b. For the gamma(α, β),
   f(x) = (1/(Γ(α)β^α)) e^{(α-1) log x - x/β},
so the natural parameter is (η_1, η_2) = (α-1, -1/β) with natural parameter space {(η_1, η_2) : η_1 > -1, η_2 < 0}.
c. For the beta(α, β),
   f(x) = (Γ(α+β)/(Γ(α)Γ(β))) e^{(α-1) log x + (β-1) log(1-x)},
so the natural parameter is (η_1, η_2) = (α-1, β-1) and the natural parameter space is {(η_1, η_2) : η_1 > -1, η_2 > -1}.
d. For the Poisson,
   f(x) = (1/x!) e^{-θ} e^{x log θ},
so the natural parameter is η = log θ and the natural parameter space is {η : -∞ < η < ∞}.
e. For the negative binomial(r, p), r known,
   P(X = x) = \binom{r+x-1}{x} (p^r) e^{x log(1-p)},
so the natural parameter is η = log(1-p) with natural parameter space {η : η < 0}.
3.31 a.
   0 = (∂/∂θ_j) ∫ h(x) c(θ) exp(Σ_{i=1}^k w_i(θ) t_i(x)) dx
     = ∫ h(x) c'(θ) exp(Σ_{i=1}^k w_i(θ) t_i(x)) dx + ∫ h(x) c(θ) exp(Σ_{i=1}^k w_i(θ) t_i(x)) (Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(x)) dx
     = ∫ h(x) [(∂/∂θ_j) log c(θ)] c(θ) exp(Σ_{i=1}^k w_i(θ) t_i(x)) dx + E[Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)]
     = (∂/∂θ_j) log c(θ) + E[Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)].
Therefore E[Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)] = -(∂/∂θ_j) log c(θ).
b.
   0 = (∂^2/∂θ_j^2) ∫ h(x) c(θ) exp(Σ_i w_i(θ) t_i(x)) dx
     = ∫ h(x) c''(θ) exp(Σ_i w_i(θ) t_i(x)) dx
       + 2 ∫ h(x) c'(θ) exp(Σ_i w_i(θ) t_i(x)) (Σ_i (∂w_i(θ)/∂θ_j) t_i(x)) dx
       + ∫ h(x) c(θ) exp(Σ_i w_i(θ) t_i(x)) (Σ_i (∂w_i(θ)/∂θ_j) t_i(x))^2 dx
       + ∫ h(x) c(θ) exp(Σ_i w_i(θ) t_i(x)) (Σ_i (∂^2 w_i(θ)/∂θ_j^2) t_i(x)) dx.
Writing c''(θ)/c(θ) = (∂^2/∂θ_j^2) log c(θ) + (c'(θ)/c(θ))^2 and c'(θ)/c(θ) = (∂/∂θ_j) log c(θ), this becomes
   0 = (∂^2/∂θ_j^2) log c(θ) + ((∂/∂θ_j) log c(θ))^2 + 2 ((∂/∂θ_j) log c(θ)) E[Σ_i (∂w_i(θ)/∂θ_j) t_i(X)]
       + E[(Σ_i (∂w_i(θ)/∂θ_j) t_i(X))^2] + E[Σ_i (∂^2 w_i(θ)/∂θ_j^2) t_i(X)].
By part (a), (∂/∂θ_j) log c(θ) = -E[Σ_i (∂w_i(θ)/∂θ_j) t_i(X)], so the second and third terms combine to -(E[Σ_i (∂w_i(θ)/∂θ_j) t_i(X)])^2, and hence
   0 = (∂^2/∂θ_j^2) log c(θ) + Var(Σ_i (∂w_i(θ)/∂θ_j) t_i(X)) + E[Σ_i (∂^2 w_i(θ)/∂θ_j^2) t_i(X)].
Therefore Var(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)) = -(∂^2/∂θ_j^2) log c(θ) - E[Σ_{i=1}^k (∂^2 w_i(θ)/∂θ_j^2) t_i(X)].
3.33 a. (i) h(x) = exI{−∞<x<∞}(x), c(θ) = 1
2πθ exp(θ
2)θ > 0, w1(θ) = 1
2θ, t1(x) = x2.
(ii) The nonnegative real line.
b. (i) h(x) = I{−∞<x<∞}(x), c(θ) = 1
2π2exp(1
2a)− ∞ < θ < , a > 0,
w1(θ) = 1
22, w2(θ) = 1
, t1(x) = x2, t2(x) = x.
(ii) A parabola.
c. (i) h(x) = 1
xI{0<x<∞}(x), c(α) = αα
Γ(α)α > 0, w1(α) = α, w2(α) = α,
t1(x) = log(x), t2(x) = x.
(ii) A line.
d. (i) h(x) = Cexp(x4)I{−∞<x<∞}(x), c(θ) = exp(θ4)− ∞ < θ < , w1(θ) = θ,
w2(θ) = θ2, w3(θ) = θ3, t1(x) = 4x3, t2(x) = 6x2, t3(x) = 4x.
(ii) The curve is a spiral in 3-space.
(iii) A good picture can be generated with the Mathematica statement
ParametricPlot3D[{t, t^2, t^3}, {t, 0, 1}, ViewPoint -> {1, -2, 2.5}].
3.35 a. In Exercise 3.34(a) w1(λ) = 1
2λand for a n(eθ, eθ), w1(θ) = 1
2eθ.
b. EX=µ=αβ, then β=µ
α. Therefore h(x) = 1
xI{0<x<∞}(x),
c(α) = αα
Γ(α)( µ
α)α, α > 0, w1(α) = α, w2(α) = α
µ, t1(x) = log(x), t2(x) = x.
c. From (b) then (α1, . . . , αn, β1, . . . , βn) = (α1, . . . , αn,α1
µ, . . . , αn
µ)
3.37 The pdf (1/σ) f((x-µ)/σ) is symmetric about µ because, for any ε > 0,
   (1/σ) f(((µ+ε)-µ)/σ) = (1/σ) f(ε/σ) = (1/σ) f(-ε/σ) = (1/σ) f(((µ-ε)-µ)/σ).
Thus, by Exercise 2.26b, µ is the median.
3.38 P(X > xα) = P(σZ +µ > σzα+µ) = P(Z > zα) by Theorem 3.5.6.
3.39 First take µ = 0 and σ = 1.
a. The pdf is symmetric about 0, so 0 must be the median. Verifying this, write
   P(Z ≥ 0) = ∫_0^∞ (1/π) (1/(1+z^2)) dz = (1/π) tan^{-1}(z) |_0^∞ = (1/π)(π/2 - 0) = 1/2.
b. P(Z ≥ 1) = (1/π) tan^{-1}(z) |_1^∞ = (1/π)(π/2 - π/4) = 1/4. By symmetry this is also equal to P(Z ≤ -1).
Writing z = (x-µ)/σ establishes P(X ≥ µ) = 1/2 and P(X ≥ µ+σ) = 1/4.
3.40 Let Xf(x) have mean µand variance σ2. Let Z=Xµ
σ. Then
EZ=1
σE(Xµ) = 0
and
VarZ= Var Xµ
σ=1
σ2Var(Xµ) = 1
σ2VarX=σ2
σ2= 1.
Then compute the pdf of Z,fZ(z) = fx(σz +µ)·σ=σfx(σz +µ) and use fZ(z) as the standard
pdf.
3.41 a. This is a special case of Exercise 3.42a.
b. This is a special case of Exercise 3.42b.
3.42 a. Let θ1> θ2. Let X1f(xθ1) and X2f(xθ2). Let F(z) be the cdf corresponding to
f(z) and let Zf(z).Then
F(x|θ1) = P(X1x) = P(Z+θ1x) = P(Zxθ1) = F(xθ1)
F(xθ2) = P(Zxθ2) = P(Z+θ2x) = P(X2x)
=F(x|θ2).
The inequality is because xθ2> x θ1, and Fis nondecreasing. To get strict inequality
for some x, let (a, b] be an interval of length θ1θ2with P(a < Z b) = F(b)F(a)>0.
Let x=a+θ1. Then
F(x|θ1) = F(xθ1) = F(a+θ1θ1) = F(a)
< F (b) = F(a+θ1θ2) = F(xθ2) = F(x|θ2).
b. Let σ1> σ2. Let X1f(x/σ1) and X2f(x/σ2). Let F(z) be the cdf corresponding to
f(z) and let Zf(z). Then, for x > 0,
F(x|σ1) = P(X1x) = P(σ1Zx) = P(Zx/σ1) = F(x/σ1)
F(x/σ2) = P(Zx/σ2) = P(σ2Zx) = P(X2x)
=F(x|σ2).
The inequality is because x/σ2> x/σ1(because x > 0 and σ1> σ2>0), and Fis
nondecreasing. For x0, F(x|σ1) = P(X1x)=0=P(X2x) = F(x|σ2). To
get strict inequality for some x, let (a, b] be an interval such that a > 0, b/a =σ12and
P(a < Z b) = F(b)F(a)>0. Let x=1. Then
F(x|σ1) = F(x/σ1) = F(11) = F(a)
< F (b) = F(12) = F(x/σ2)
=F(x|σ2).
3.43 a. FY(y|θ) = 1 FX(1
y|θ)y > 0, by Theorem 2.1.3. For θ1> θ2,
FY(y|θ1) = 1 FX1
y
θ11FX1
y
θ2=FY(y|θ2)
for all y, since FX(x|θ) is stochastically increasing and if θ1> θ2,FX(x|θ2)FX(x|θ1) for
all x. Similarly, FY(y|θ1) = 1 FX(1
y|θ1)<1FX(1
y|θ2) = FY(y|θ2) for some y, since if
θ1> θ2,FX(x|θ2)< FX(x|θ1) for some x. Thus FY(y|θ) is stochastically decreasing in θ.
b. FX(x|θ) is stochastically increasing in θ. If θ1> θ2and θ1, θ2>0 then 1
θ2>1
θ1. Therefore
FX(x|1
θ1)FX(x|1
θ2) for all xand FX(x|1
θ1)< FX(x|1
θ2) for some x. Thus FX(x|1
θ) is
stochastically decreasing in θ.
3.44 The function g(x) = |x| is a nonnegative function. So by Chebychev's Inequality,
   P(|X| ≥ b) ≤ E|X|/b.
Also, P(|X| ≥ b) = P(X^2 ≥ b^2). Since g(x) = x^2 is also nonnegative, again by Chebychev's Inequality we have
   P(|X| ≥ b) = P(X^2 ≥ b^2) ≤ EX^2/b^2.
For X ~ exponential(1), E|X| = EX = 1 and EX^2 = VarX + (EX)^2 = 2. For b = 3,
   E|X|/b = 1/3 > 2/9 = EX^2/b^2.
Thus EX^2/b^2 is a better bound. But for b = √2,
   E|X|/b = 1/√2 < 1 = EX^2/b^2.
Thus E|X|/b is a better bound.
3.45 a.
MX(t) = Z
−∞
etxfX(x)dx Z
a
etxfX(x)dx
eta Z
a
fX(x)dx =etaP(Xa),
where we use the fact that etx is increasing in xfor t > 0.
b.
MX(t) = Z
−∞
etxfX(x)dx Za
−∞
etxfX(x)dx
eta Za
−∞
fX(x)dx =etaP(Xa),
where we use the fact that etx is decreasing in xfor t < 0.
c. h(t, x) must be nonnegative.
3.46 For X ~ uniform(0, 1), µ = 1/2 and σ^2 = 1/12, thus
   P(|X - µ| > kσ) = 1 - P(1/2 - k/√12 ≤ X ≤ 1/2 + k/√12) = 1 - 2k/√12 for k < √3, and 0 for k ≥ √3.
For X ~ exponential(λ), µ = λ and σ^2 = λ^2, thus
   P(|X - µ| > kσ) = 1 - P(λ - kλ ≤ X ≤ λ + kλ) = 1 + e^{-(k+1)} - e^{k-1} for k ≤ 1, and e^{-(k+1)} for k > 1.
From Example 3.6.2, Chebychev's Inequality gives the bound P(|X - µ| > kσ) ≤ 1/k^2.
Comparison of probabilities
   k      uniform(0,1) exact   exponential(λ) exact   Chebychev
   .1          .942                 .926                 100
   .5          .711                 .617                   4
   1           .423                 .135                   1
   1.5         .134                 .0821                 .44
   √3          0                    .0651                 .33
   2           0                    .0498                 .25
   4           0                    .00674                .0625
   10          0                    .0000167              .01
So we see that Chebychev's Inequality is quite conservative.
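The exact probabilities and the Chebychev bound in the table can be recomputed with a short R sketch (added here as an illustration; the k values are those in the table):

k <- c(.1, .5, 1, 1.5, sqrt(3), 2, 4, 10)
unif  <- pmax(0, 1 - 2*k/sqrt(12))                                 # uniform(0,1): exact
expo  <- ifelse(k <= 1, 1 + exp(-(k+1)) - exp(k-1), exp(-(k+1)))   # exponential: exact
cheby <- 1/k^2                                                     # Chebychev bound
round(cbind(k, unif, expo, cheby), 4)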
3.47
P(|Z|> t)=2P(Z > t)=21
2πZ
t
ex2/2dx
=r2
πZ
t
1+x2
1+x2ex2/2dx
=r2
πZ
t
1
1+x2ex2/2dx+Z
t
x2
1+x2ex2/2dx.
To evaluate the second term, let u=x
1+x2,dv =xex2/2dx,v=ex2/2,du =1x2
(1+x2)2, to
obtain
Z
t
x2
1 + x2ex2/2dx =x
1 + x2(ex2/2)
tZ
t
1x2
(1 + x2)2(ex2/2)dx
=t
1 + t2et2/2+Z
t
1x2
(1 + x2)2ex2/2dx.
Therefore,
P(Zt) = r2
π
t
1 + t2et2/2+r2
πZ
t1
1 + x2+1x2
(1 + x2)2ex2/2dx
=r2
π
t
1 + t2et2/2+r2
πZ
t2
(1 + x2)2ex2/2dx
r2
π
t
1 + t2et2/2.
3.48 For the negative binomial
P(X=x+ 1) = r+x+ 1 1
x+ 1 pr(1 p)x+1 =r+x
x+ 1(1 p)P(X=x).
For the hypergeometric
P(X=x+ 1) =
(Mx)(kx+x+1)(x+1)
P(X=x)if x < k,x < M,xM(Nk)
(M
x+1)( NM
kx1)
(N
k)if x=M(Nk)1
0 otherwise.
3.49 a.
E(g(X)(Xαβ)) = Z
0
g(x)(xαβ)1
Γ(α)βαxα1ex/βdx.
Let u=g(x), du =g0(x), dv = (xαβ)xα1ex/β ,v=βxαex/β . Then
Eg(X)(Xαβ) = 1
Γ(α)βαg(x)βxαex/β
0+βZ
0
g0(x)xαex/βdx.
Assuming g(x) to be differentiable, E|Xg0(X)|<and limx→∞ g(x)xαex/β = 0, the first
term is zero, and the second term is βE(Xg0(X)).
b.
Eg(X)β(α1)1X
x=Γ(α+β)
Γ(α)Γ(β)Z1
0
g(x)β(α1)1x
xxα1(1 x)β1dx.
Let u=g(x) and dv = (β(α1)1x
x)xα1(1 x)β. The expectation is
Γ(α+β)
Γ(α)Γ(β)g(x)xα1(1 x)β
1
0+Z1
0
(1 x)g0(x)xα1(1 x)β1dx= E((1 X)g0(X)),
assuming the first term is zero and the integral exists.
3.50 The proof is similar to that of part a) of Theorem 3.6.8. For Xnegative binomial(r, p),
Eg(X)
=
X
x=0
g(x)r+x1
xpr(1 p)x
=
X
y=1
g(y1)r+y2
y1pr(1 p)y1(set y=x+ 1)
=
X
y=1
g(y1) y
r+y1r+y1
ypr(1 p)y1
=
X
y=0 y
r+y1
g(y1)
1pr+y1
ypr(1 p)y(the summand is zero at y = 0)
= E X
r+X1
g(X1)
1p,
where in the third equality we use the fact that r+y2
y1=y
r+y1r+y1
y.
Chapter 4
Multiple Random Variables
4.1 Since the distribution is uniform, the easiest way to calculate these probabilities is as the ratio
of areas, the total area being 4.
a. The circle x2+y21 has area π, so P(X2+Y21) = π
4.
b. The area below the line y= 2xis half of the area of the square, so P(2XY > 0) = 2
4.
c. Clearly P(|X+Y|<2) = 1.
4.2 These are all fundamental properties of integrals. The proof is the same as for Theorem 2.2.5
with bivariate integrals replacing univariate integrals.
4.3 For the experiment of tossing two fair dice, each of the points in the 36-point sample space are
equally likely. So the probability of an event is (number of points in the event)/36. The given
probabilities are obtained by noting the following equivalences of events.
P({X= 0, Y = 0}) = P({(1,1),(2,1),(1,3),(2,3),(1,5),(2,5)}) = 6
36 =1
6
P({X= 0, Y = 1}) = P({(1,2),(2,2),(1,4),(2,4),(1,6),(2,6)}) = 6
36 =1
6
P({X= 1, Y = 0})
=P({(3,1),(4,1),(5,1),(6,1),(3,3),(4,3),(5,3),(6,3),(3,5),(4,5),(5,5),(6,5)})
=12
36 =1
3
P({X= 1, Y = 1})
=P({(3,2),(4,2),(5,2),(6,2),(3,4),(4,4),(5,4),(6,4),(3,6),(4,6),(5,6),(6,6)})
=12
36 =1
3
4.4 a. R1
0R2
0C(x+ 2y)dxdy = 4C= 1, thus C=1
4.
b. fX(x) = R1
0
1
4(x+ 2y)dy =1
4(x+ 1) 0 < x < 2
0 otherwise
c. FXY (x, y) = P(Xx, Y y) = Rx
−∞ Ry
−∞ f(v, u)dvdu. The way this integral is calculated
depends on the values of xand y. For example, for 0 < x < 2 and 0 < y < 1,
FXY (x, y) = Zx
−∞ Zy
−∞
f(u, v)dvdu =Zx
0Zy
0
1
4(u+ 2v)dvdu =x2y
8+y2x
4.
But for 0 < x < 2 and 1 y,
FXY (x, y) = Zx
−∞ Zy
−∞
f(u, v)dvdu =Zx
0Z1
0
1
4(u+ 2v)dvdu =x2
8+x
4.
The complete definition of F_XY is
   F_XY(x, y) = 0                    if x ≤ 0 or y ≤ 0,
              = x^2 y/8 + y^2 x/4    if 0 < x < 2 and 0 < y < 1,
              = y/2 + y^2/2          if 2 ≤ x and 0 < y < 1,
              = x^2/8 + x/4          if 0 < x < 2 and 1 ≤ y,
              = 1                    if 2 ≤ x and 1 ≤ y.
d. The function z=g(x) = 9/(x+ 1)2is monotone on 0 <x<2, so use Theorem 2.1.5 to
obtain fZ(z) = 9/(8z2), 1 < z < 9.
4.5 a. P(X > Y) = R1
0R1
y(x+y)dxdy =7
20 .
b. P(X2< Y < X) = R1
0Ry
y2xdxdy =1
6.
4.6 Let A= time that Aarrives and B= time that Barrives. The random variables Aand Bare
independent uniform(1,2) variables. So their joint pdf is uniform on the square (1,2) ×(1,2).
Let X= amount of time Awaits for B. Then, FX(x) = P(Xx) = 0 for x < 0, and
FX(x) = P(Xx) = 1 for 1 x. For x= 0, we have
FX(0) = P(X0) = P(X= 0) = P(BA) = Z2
1Za
1
1dbda =1
2.
And for 0 < x < 1,
FX(x) = P(Xx) = 1P(X > x) = 1P(BA > x) = 1Z2x
1Z2
a+x
1dbda =1
2+xx2
2.
4.7 We will measure time in minutes past 8 A.M. So Xuniform(0,30), Yuniform(40,50) and
the joint pdf is 1/300 on the rectangle (0,30) ×(40,50).
P(arrive before 9 A.M.) = P(X+Y < 60) = Z50
40 Z60y
0
1
300dxdy =1
2.
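A simulation confirms this value; the R sketch below (an added illustration, with time measured in minutes past 8 A.M. as in the solution) estimates the probability:

set.seed(1)
x <- runif(100000, 0, 30)    # first arrival time
y <- runif(100000, 40, 50)   # second arrival time
mean(x + y < 60)             # approximately 1/2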
4.9
P(aXb, c Yd)
=P(Xb, c Yd)P(Xa, c Yd)
=P(Xb, Y d)P(Xb, Y c)P(Xa, Y d) + P(Xa, Y c)
=F(b, d)F(b, c)F(a, d)F(a, c)
=FX(b)FY(d)FX(b)FY(c)FX(a)FY(d)FX(a)FY(c)
=P(Xb) [P(Yd)P(Yc)] P(Xa) [P(Yd)P(Yc)]
=P(Xb)P(cYd)P(Xa)P(cYd)
=P(aXb)P(cYd).
4.10 a. The marginal distribution of Xis P(X= 1) = P(X= 3) = 1
4and P(X= 2) = 1
2. The
marginal distribution of Yis P(Y= 2) = P(Y= 3) = P(Y= 4) = 1
3. But
P(X= 2, Y = 3) = 0 6= (1
2)(1
3) = P(X= 2)P(Y= 3).
Therefore the random variables are not independent.
b. The distribution that satisfies P(U=x, V =y) = P(U=x)P(V=y) where UXand
VYis
U
1 2 3
21
12
1
6
1
12
V31
12
1
6
1
12
41
12
1
6
1
12
4.11 The support of the distribution of (U, V ) is {(u, v) : u= 1,2, . . . ;v=u+ 1, u + 2, . . .}. This
is not a cross-product set. Therefore, Uand Vare not independent. More simply, if we know
U=u, then we know V > u.
4.12 One interpretation of “a stick is broken at random into three pieces” is this. Suppose the length
of the stick is 1. Let Xand Ydenote the two points where the stick is broken. Let Xand Y
both have uniform(0,1) distributions, and assume Xand Y are independent. Then the joint
distribution of Xand Yis uniform on the unit square. In order for the three pieces to form
a triangle, the sum of the lengths of any two pieces must be greater than the length of the
third. This will be true if and only if the length of each piece is less than 1/2. To calculate the
probability of this, we need to identify the sample points (x, y) such that the length of each
piece is less than 1/2. If y > x, this will be true if x < 1/2, yx < 1/2 and 1 y < 1/2.
These three inequalities define the triangle with vertices (0,1/2), (1/2,1/2) and (1/2,1). (Draw
a graph of this set.) Because of the uniform distribution, the probability that (X, Y ) falls in
the triangle is the area of the triangle, which is 1/8. Similarly, if x > y, each piece will have
length less than 1/2 if y < 1/2, xy < 1/2 and 1 x < 1/2. These three inequalities define
the triangle with vertices (1/2,0), (1/2,1/2) and (1,1/2). The probability that (X, Y ) is in this
triangle is also 1/8. So the probability that the pieces form a triangle is 1/8+1/8 = 1/4.
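The value 1/4 is easy to check by simulation; the following R sketch (an added illustration) breaks the stick at two independent uniform points and tests the triangle condition:

set.seed(1)
x <- runif(100000); y <- runif(100000)
lo <- pmin(x, y); hi <- pmax(x, y)
longest <- pmax(lo, hi - lo, 1 - hi)    # length of the longest piece
mean(longest < 1/2)                     # approximately 1/4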
4.13 a.
E(Yg(X))2
= E ((YE(Y|X)) + (E(Y|X)g(X)))2
= E(YE(Y|X))2+ E(E(Y|X)g(X))2+ 2E [(YE(Y|X))(E(Y|X)g(X))] .
The cross term can be shown to be zero by iterating the expectation. Thus
E(Yg(X))2= E(YE(Y|X))2+E(E(Y|X)g(X))2E(YE(Y|X))2,for all g(·).
The choice g(X) = E(Y|X) will give equality.
b. Equation (2.2.3) is the special case of a) where we take the random variable Xto be a
constant. Then, g(X) is a constant, say b, and E(Y|X) = EY.
4.15 We will find the conditional distribution of Y|X+Y. The derivation of the conditional distri-
bution of X|X+Yis similar. Let U=X+Yand V=Y. In Example 4.3.1, we found the
joint pmf of (U, V ). Note that for fixed u,f(u, v) is positive for v= 0, . . . , u. Therefore the
conditional pmf is
f(v|u) = f(u, v)
f(u)=
θuveθ
(uv)!
λveλ
v!
(θ+λ)ue(θ+λ)
u!
=u
vλ
θ+λvθ
θ+λuv
, v = 0, . . . , u.
That is V|Ubinomial(U, λ/(θ+λ)).
4.16 a. The support of the distribution of (U, V ) is {(u, v) : u= 1,2, . . . ;v= 0,±1,±2, . . .}.
If V > 0, then X > Y . So for v= 1,2, . . ., the joint pmf is
fU,V (u, v) = P(U=u, V =v) = P(Y=u, X =u+v)
=p(1 p)u+v1p(1 p)u1=p2(1 p)2u+v2.
If V < 0, then X < Y . So for v=1,2, . . ., the joint pmf is
fU,V (u, v) = P(U=u, V =v) = P(X=u, Y =uv)
=p(1 p)u1p(1 p)uv1=p2(1 p)2uv2.
If V= 0, then X=Y. So for v= 0, the joint pmf is
fU,V (u, 0) = P(U=u, V = 0) = P(X=Y=u) = p(1 p)u1p(1 p)u1=p2(1 p)2u2.
In all three cases, we can write the joint pmf as
fU,V (u, v) = p2(1 p)2u+|v|−2=p2(1 p)2u(1 p)|v|−2, u = 1,2, . . . ;v= 0,±1,±2, . . . .
Since the joint pmf factors into a function of uand a function of v,Uand Vare independent.
b. The possible values of Zare all the fractions of the form r/s, where rand sare positive
integers and r < s. Consider one such value, r/s, where the fraction is in reduced form. That
is, rand shave no common factors. We need to identify all the pairs (x, y) such that xand
yare positive integers and x/(x+y) = r/s. All such pairs are (ir, i(sr)), i= 1,2, . . ..
Therefore,
PZ=r
s=
X
i=1
P(X=ir, Y =i(sr)) =
X
i=1
p(1 p)ir1p(1 p)i(sr)1
=p2
(1 p)2
X
i=1
((1 p)s)i=p2
(1 p)2
(1 p)s
1(1 p)s=p2(1 p)s2
1(1 p)s.
c.
P(X=x, X +Y=t) = P(X=x, Y =tx) = P(X=x)P(Y=tx) = p2(1 p)t2.
4.17 a. P(Y=i+ 1) = Ri+1
iexdx =ei(1 e1), which is geometric with p= 1 e1.
b. Since Y5 if and only if X4,
P(X4x|Y5) = P(X4x|X4) = P(Xx) = ex,
since the exponential distribution is memoryless.
4.18 We need to show f(x, y) is nonnegative and integrates to 1. f(x, y)0, because the numerator
is nonnegative since g(x)0, and the denominator is positive for all x > 0, y > 0. Changing
to polar coordinates, x=rcos θand y=rsin θ, we obtain
Z
0Z
0
f(x, y)dxdy =Zπ/2
0Z
0
2g(r)
πr rdr=2
πZπ/2
0Z
0
g(r)dr=2
πZπ/2
0
1= 1.
4.19 a. Since (X_1 - X_2)/√2 ~ n(0, 1), (X_1 - X_2)^2/2 ~ χ^2_1 (see Example 2.1.9).
b. Make the transformation y_1 = x_1/(x_1 + x_2), y_2 = x_1 + x_2; then x_1 = y_1 y_2, x_2 = y_2(1 - y_1) and |J| = y_2. Then
   f(y_1, y_2) = [Γ(α_1 + α_2)/(Γ(α_1)Γ(α_2))] y_1^{α_1-1} (1 - y_1)^{α_2-1} · [1/Γ(α_1 + α_2)] y_2^{α_1+α_2-1} e^{-y_2},
thus Y_1 ~ beta(α_1, α_2), Y_2 ~ gamma(α_1 + α_2, 1), and they are independent.
4.20 a. This transformation is not one-to-one because you cannot determine the sign of X2from
Y1and Y2. So partition the support of (X1, X2) into A0={−∞ < x1<, x2= 0},
A1={−∞ < x1<, x2>0}and A2={−∞ < x1<, x2<0}. The support of (Y1, Y2)
is B={0< y1<,1< y2<1}. The inverse transformation from Bto A1is x1=y2y1
and x2=py1y1y2
2with Jacobian
J1=
1
2
y2
y1y1
1
2
1y2
2
y1
y2y1
1y2
2
=1
2p1y2
2
.
The inverse transformation from Bto A2is x1=y2y1and x2=py1y1y2
2with J2=
J1. From (4.3.6), fY1,Y 2(y1, y2) is the sum of two terms, both of which are the same in this
case. Then
fY1,Y 2(y1, y2)=2"1
2πσ2ey1/(2σ2)1
2p1y2
2#
=1
2πσ2ey1/(2σ2)1
p1y2
2
,0< y1<,1< y2<1.
b. We see in the above expression that the joint pdf factors into a function of y1and a function
of y2. So Y1and Y2are independent. Y1is the square of the distance from (X1, X2) to
the origin. Y2is the cosine of the angle between the positive x1-axis and the line from
(X1, X2) to the origin. So independence says the distance from the origin is independent of
the orientation (as measured by the angle).
4.21 Since Rand θare independent, the joint pdf of T=R2and θis
fT (t, θ) = 1
4πet/2,0< t < ,0< θ < 2π.
Make the transformation x=tcos θ,y=tsin θ. Then t=x2+y2,θ= tan1(y/x), and
J=
2x2y
y
x2+y2x
x2+y2= 2.
Therefore
fX,Y (x, y) = 2
4πe1
2(x2+y2),0< x2+y2<,0<tan1y/x < 2π.
Thus,
fX,Y (x, y) = 1
2πe1
2(x2+y2),−∞ < x, y < .
So Xand Yare independent standard normals.
4.23 a. Let y=v,x=u/y =u/v then
J=
x
u
x
v
y
u
y
v =
1
vu
v2
0 1 =1
v.
fU,V (u, v) = Γ(α+β)
Γ(α)Γ(β)
Γ(α+β+γ)
Γ(α+β)Γ(γ)u
vα11u
vβ1vα+β1(1v)γ11
v,0< u < v < 1.
Then,
fU(u) = Γ(α+β+γ)
Γ(α)Γ(β)Γ(γ)uα1Z1
u
vβ1(1 v)γ1(vu
v)β1dv
=Γ(α+β+γ)
Γ(α)Γ(β)Γ(γ)uα1(1 u)β+γ1Z1
0
yβ1(1 y)γ1dy y=vu
1u, dy =dv
1u
=Γ(α+β+γ)
Γ(α)Γ(β)Γ(γ)uα1(1 u)β+γ1Γ(β)Γ(γ)
Γ(β+γ)
=Γ(α+β+γ)
Γ(α)Γ(β+γ)uα1(1 u)β+γ1,0< u < 1.
Thus, Ugamma(α, β +γ).
b. Let x=uv,y=pu
vthen
J=
x
u
x
v
y
u
x
v =
1
2v1/2u1/21
2u1/2v1/2
1
2v1/2u1/21
2u1/2v3/2=1
2v.
fU,V (u, v) = Γ(α+β+γ)
Γ(α)Γ(β)Γ(γ)(uvα1(1 uv)β1ru
vα+β11ru
vγ11
2v.
The set {0< x < 1,0< y < 1}is mapped onto the set {0< u < v < 1
u,0<u<1}. Then,
fU(u)
=Z1/u
u
fU,V (u, v)dv
=Γ(α+β+γ)
Γ(α)Γ(β)Γ(γ)uα1(1u)β+γ1
| {z }Z1/u
u1uv
1uβ1 1pu/v
1u!γ1(pu/v)β
2v(1 u)dv.
Call it A
To simplify, let z=u/vu
1u. Then v=uz= 1, v= 1/u z= 0 and dz =u/v
2(1u)vdv.
Thus,
fU(u) = AZzβ1(1 z)γ1dz ( kernel of beta(β, γ))
=Γ(α+β+γ)
Γ(α)Γ(β)Γ(γ)uα1(1 u)β+γ1Γ(β)Γ(γ)
Γ(β+γ)
=Γ(α+β+γ)
Γ(α)Γ(β+γ)uα1(1 u)β+γ1,0<u<1.
That is, Ubeta(α, β +γ), as in a).
4.24 Let z1=x+y,z2=x
x+y, then x=z1z2,y=z1(1 z2) and
|J|=
x
z1
x
z2
y
z1
y
z2
=
z2z1
1z2z1=z1.
The set {x > 0, y > 0}is mapped onto the set {z1>0,0< z2<1}.
fZ1,Z2(z1, z2) = 1
Γ(r)(z1z2)r1ez1z2·1
Γ(s)(z1z1z2)s1ez1+z1z2z1
=1
Γ(r+s)zr+s1
1ez1·Γ(r+s)
Γ(r)Γ(s)zr1
2(1 z2)s1,0< z1,0< z2<1.
fZ1,Z2(z1, z2) can be factored into two densities. Therefore Z1and Z2are independent and
Z1gamma(r+s, 1), Z2beta(r, s).
4.25 For Xand Zindependent, and Y=X+Z,fXY (x, y) = fX(x)fZ(yx). In Example 4.5.8,
fXY (x, y) = I(0,1)(x)1
10I(0,1/10)(yx).
In Example 4.5.9, Y=X2+Zand
fXY (x, y) = fX(x)fZ(yx2) = 1
2I(1,1)(x)1
10I(0,1/10)(yx2).
4.26 a.
P(Zz, W = 0) = P(min(X, Y )z, Y X) = P(Yz, Y X)
=Zz
0Z
y
1
λex/λ 1
µey/µdxdy
=λ
µ+λ1exp 1
µ+1
λz.
Similarly,
P(Zz,W =1) = P(min(X, Y )z, X Y) = P(Xz, X Y)
=Zz
0Z
x
1
λex/λ 1
µey/µdydx =µ
µ+λ1exp 1
µ+1
λz.
b.
P(W= 0) = P(YX) = Z
0Z
y
1
λex/λ 1
µey/µdxdy =λ
µ+λ.
P(W=1)=1P(W= 0) = µ
µ+λ.
P(Zz) = P(Zz, W = 0) + P(Zz, W = 1) = 1 exp 1
µ+1
λz.
Therefore, P(Zz, W =i) = P(Zz)P(W=i), for i= 0,1, z > 0. So Zand Ware
independent.
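A simulation is consistent with these calculations; the R sketch below (an added illustration; λ = 2 and µ = 3 are arbitrary choices of the scale parameters) checks P(W = 0) = λ/(µ + λ) and the lack of association between Z and W:

set.seed(1)
lam <- 2; mu <- 3
x <- rexp(100000, rate = 1/lam); y <- rexp(100000, rate = 1/mu)
z <- pmin(x, y); w <- ifelse(y <= x, 0, 1)
mean(w == 0)    # approximately lambda/(mu + lambda) = 0.4
cor(z, w)       # near 0, consistent with independence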
4.27 From Theorem 4.2.14 we know Un(µ+γ, 2σ2) and Vn(µγ, 2σ2). It remains to show
that they are independent. Proceed as in Exercise 4.24.
fXY (x, y) = 1
2πσ2e1
2σ2[(xµ)2+(yγ)2](by independence, sofXY =fXfY)
Let u=x+y,v=xy, then x=1
2(u+v), y=1
2(uv) and
|J|=
1/2 1/2
1/21/2=1
2.
The set {−∞ < x < ,−∞ < y < ∞} is mapped onto the set {−∞ < u < ,−∞ < v < ∞}.
Therefore
fUV (u, v) = 1
2πσ2e1
2σ2((u+v
2)µ)2+((uv
2)γ)2·1
2
=1
4πσ2e1
2σ2h2(u
2)2u(µ+γ)+ (µ+γ)2
2+2(v
2)2v(µγ)+ (µ+γ)2
2i
=g(u)1
4πσ2e1
2(2σ2)(u(µ+γ))2·h(v)e1
2(2σ2)(v(µγ))2.
By the factorization theorem, Uand Vare independent.
4.29 a. X
Y=Rcos θ
Rsin θ= cot θ. Let Z= cot θ. Let A1= (0, π), g1(θ) = cot θ,g1
1(z) = cot1z,
A2= (π, 2π), g2(θ) = cot θ,g1
2(z) = π+ cot1z. By Theorem 2.1.8
fZ(z) = 1
2π|1
1 + z2|+1
2π|1
1 + z2|=1
π
1
1 + z2,−∞ < z < .
b. XY =R2cos θsin θthen 2XY =R22 cos θsin θ=R2sin 2θ. Therefore 2XY
R=Rsin 2θ.
Since R=X2+Y2then 2XY
X2+Y2=Rsin 2θ. Thus 2XY
X2+Y2is distributed as sin 2θwhich
is distributed as sin θ. To see this let sin θfsin θ. For the function sin 2θthe values of
the function sin θare repeated over each of the 2 intervals (0, π) and (π, 2π) . Therefore
the distribution in each of these intervals is the distribution of sin θ. The probability of
choosing between each one of these intervals is 1
2. Thus f2 sin θ=1
2fsin θ+1
2fsin θ=fsin θ.
Therefore 2XY
X2+Y2has the same distribution as Y= sin θ. In addition, 2XY
X2+Y2has the
same distribution as X= cos θsince sin θhas the same distribution as cos θ. To see this let
consider the distribution of W= cos θand V= sin θwhere θuniform(0,2π). To derive
the distribution of W= cos θlet A1= (0, π), g1(θ) = cos θ,g1
1(w) = cos1w,A2= (π, 2π),
g2(θ) = cos θ,g1
2(w) = 2πcos1w. By Theorem 2.1.8
fW(w) = 1
2π|1
1w2|+1
2π|1
1w2|=1
π
1
1w2,1w1.
To derive the distribution of V= sin θ, first consider the interval ( π
2,3π
2). Let g1(θ) = sin θ,
4g1
1(v) = πsin1v, then
fV(v) = 1
π
1
1v2,1v1.
Second, consider the set {(0,π
2)(3π
2,2π)}, for which the function sin θhas the same values
as it does in the interval ( π
2,π
2). Therefore the distribution of Vin {(0,π
2)(3π
2,2π)}is
the same as the distribution of Vin (π
2,π
2) which is 1
π
1
1v2,1v1. On (0,2π) each
of the sets (π
2,3π
2), {(0,π
2)(3π
2,2π)}has probability 1
2of being chosen. Therefore
fV(v) = 1
2
1
π
1
1v2+1
2
1
π
1
1v2=1
π
1
1v2,1v1.
Thus W and V have the same distribution.
Let X and Y be iid n(0, 1). Then X^2 + Y^2 ~ χ^2_2 is a positive random variable. Therefore, with X = R cos θ and Y = R sin θ, R = √(X^2 + Y^2) is a positive random variable and θ = tan^{-1}(Y/X) ~ uniform(0, 2π). Thus 2XY/√(X^2 + Y^2) ~ X ~ n(0, 1).
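This distributional fact is easy to check numerically; the R sketch below is an added illustration:

set.seed(1)
x <- rnorm(100000); y <- rnorm(100000)
w <- 2*x*y/sqrt(x^2 + y^2)
c(mean(w), var(w))    # approximately 0 and 1, consistent with n(0,1)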
4.30 a.
EY= E {E(Y|X)}= EX=1
2.
VarY= Var (E(Y|X)) + E (Var(Y|X)) = VarX+ EX2=1
12 +1
3=5
12.
EXY = E[E(XY |X)] = E[XE(Y|X)] = EX2=1
3
Cov(X, Y )=EXY EXEY=1
31
22
=1
12.
b. The quick proof is to note that the distribution of Y|X=xis n(1,1), hence is independent
of X. The bivariate transformation t=y/x,u=xwill also show that the joint density
factors.
4.31 a.
EY= E{E(Y|X)}= EnX =n
2.
VarY= Var (E(Y|X)) + E (Var(Y|X)) = Var(nX)+EnX(1 X) = n2
12 +n
6.
b.
P(Y=y, X x) = n
yxy(1 x)ny, y = 0,1, . . . , n, 0< x < 1.
c.
P(y=y) = n
yΓ(y+ 1)Γ(ny+ 1)
Γ(n+ 2) .
4.32 a. The pmf of Y, for y= 0,1, . . ., is
fY(y) = Z
0
fY(y|λ)fΛ(λ)=Z
0
λyeλ
y!
1
Γ(α)βαλα1eλ/β
=1
y!Γ(α)βαZ
0
λ(y+α)1exp
λ
β
1+β
=1
y!Γ(α)βαΓ(y+α)β
1+βy+α
.
If αis a positive integer,
fY(y) = y+α1
yβ
1+βy1
1+βα
,
the negative binomial(α, 1/(1 + β)) pmf. Then
EY= E(E(Y|Λ)) = EΛ = αβ
VarY= Var(E(Y|Λ)) + E(Var(Y|Λ)) = VarΛ + EΛ = αβ2+αβ =αβ(β+ 1).
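The mixture representation can be verified numerically; the R sketch below (an added illustration, with α = 3 and β = 2 chosen arbitrarily) draws Λ ~ gamma(α, β), then Y|Λ ~ Poisson(Λ), and compares the sample mean and variance with αβ and αβ(β + 1):

set.seed(1)
alpha <- 3; beta <- 2
lambda <- rgamma(100000, shape = alpha, scale = beta)
y <- rpois(100000, lambda)
c(mean(y), alpha*beta)               # both near 6
c(var(y), alpha*beta*(beta + 1))     # both near 18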
b. For y= 0,1, . . ., we have
P(Y=y|λ) =
X
n=y
P(Y=y|N=n, λ)P(N=n|λ)
=
X
n=yn
ypy(1 p)nyeλλn
n!
=
X
n=y
1
y!(ny)! p
1py
[(1 p)λ]neλ
=eλ
X
m=0
1
y!m!p
1py
[(1 p)λ]m+y(let m=ny)
=eλ
y!p
1py
[(1 p)λ]y"
X
m=0
[(1p)λ]m
m!#
=eλ()ye(1p)λ
=()ye
y!,
the Poisson() pmf. Thus Y|ΛPoisson(). Now calculations like those in a) yield the
pmf of Y, for y= 0,1, . . ., is
fY(y) = 1
Γ(α)y!()αΓ(y+α)
1+y+α
.
Again, if αis a positive integer, Ynegative binomial(α, 1/(1 + )).
4.33 We can show that Hhas a negative binomial distribution by computing the mgf of H.
EeHt = EE eHtN= EE e(X1+···+XN)tN= E nEeX1tNNo,
because, by Theorem 4.6.7, the mgf of a sum of independent random variables is equal to the
product of the individual mgfs. Now,
EeX1t=
X
x1=1
ex1t1
logp
(1 p)x1
x1
=1
logp
X
x1=1
(et(1 p))x1
x1
=1
logplog 1et(1 p).
Then
Elog {1et(1 p)}
logpN
=
X
n=0 log {1et(1 p)}
logpneλλn
n!(since NPoisson)
=eλeλlog(1et(1p))
logp
X
n=0
e
λlog(1et(1p))
logpλlog(1et(1p))
logpn
n!.
The sum equals 1. It is the sum of a Poisson[λlog(1 et(1 p))]/[logp]pmf. Therefore,
E(eHt) = eλhelog(1et(1p))iλ/ log p=elogpλ/ logp1
1et(1 p)λ/ log p
=p
1et(1 p)λ/ logp
.
This is the mgf of a negative binomial(r, p), with r=λ/ log p, if ris an integer.
4.34 a.
P(Y=y) = Z1
0
P(Y=y|p)fp(p)dp
=Z1
0n
ypy(1 p)ny1
B(α, β)pα1(1 p)β1dp
=n
yΓ(α+β)
Γ(α)Γ(β)Z1
0
py+α1(1 p)n+βy1dp
=n
yΓ(α+β)
Γ(α)Γ(β)
Γ(y+α)Γ(n+βy)
Γ(α+n+β), y = 0,1, . . . , n.
b.
P(X=x) = Z1
0
P(X=x|p)fP(p)dp
=Z1
0r+x1
xpr(1 p)xΓ(α+β)
Γ(α)Γ(β)pα1(1 p)β1dp
=r+x1
xΓ(α+β)
Γ(α)Γ(β)Z1
0
p(r+α)1(1 p)(x+β)1dp
=r+x1
xΓ(α+β)
Γ(α)Γ(β)
Γ(r+α)Γ(x+β)
Γ(r+x+α+β)x= 0,1, . . .
Therefore,
EX= E[E(X|P)] = E r(1 P)
P=rβ
α1,
since
E1P
P=Z1
01P
PΓ(α+β)
Γ(α)Γ(β)pα1(1 p)β1dp
=Γ(α+β)
Γ(α)Γ(β)Z1
0
p(α1)1(1 p)(β+1)1dp =Γ(α+β)
Γ(α)Γ(β)
Γ(α1)Γ(β+ 1)
Γ(α+β)
=β
α1.
Var(X) = E(Var(X|P)) + Var(E(X|P)) = E r(1 P)
P2+ Var r(1 P)
P
=r(β+ 1)(α+β)
α(α1) +r2β(α+β1)
(α1)2(α2),
since
E1P
P2=Z1
0
Γ(α+β)
Γ(α)Γ(β)p(α2)1(1 p)(β+1)1dp =Γ(α+β)
Γ(α)Γ(β)
Γ(α2)Γ(β+ 1)
Γ(α+β1)
=(β+ 1)(α+β)
α(α1)
and
Var 1P
P= E "1P
P2#E1P
P2
=β(β+ 1)
(α2)(α1) (β
α1)2
=β(α+β1)
(α1)2(α2),
where
E"1P
P2#=Z1
0
Γ(α+β)
Γ(α)Γ(β)p(α2)1(1 p)(β+2)1dp
=Γ(α+β)
Γ(α)Γ(β)
Γ(α2)Γ(β+ 2)
Γ(α2 + β+ 2) =β(β+ 1)
(α2)(α1).
4.35 a. Var(X) = E(Var(X|P)) + Var(E(X|P)). Therefore,
Var(X) = E[nP (1 P)] + Var(nP )
=nαβ
(α+β)(α+β+ 1) +n2VarP
=nαβ(α+β+ 1 1)
(α+β2)(α+β+ 1) +n2VarP
=nαβ(α+β+ 1)
(α+β2)(α+β+ 1) nαβ
(α+β2)(α+β+ 1) +n2VarP
=nα
α+β
β
α+βnVarP+n2VarP
=nEP(1 EP) + n(n1)VarP.
b. Var(Y) = E(Var(Y|Λ)) + Var(E(Y|Λ)) = EΛ + Var(Λ) = µ+1
αµ2since EΛ = µ=αβ and
Var(Λ) = αβ2=(αβ)2
α=µ2
α. The “extra-Poisson” variation is 1
αµ2.
4.37 a. Let Y=PXi.
P(Y=k) = P(Y=k, 1
2< c =1
2(1 + p)<1)
=Z1
0
(Y=k|c=1
2(1 + p))P(P=p)dp
=Z1
0n
k[1
2(1 + p)]k[1 1
2(1 + p)]nkΓ(a+b)
Γ(a)Γ(b)pa1(1 p)b1dp
=Z1
0n
k(1 + p)k
2k
(1 p)nk
2nk
Γ(a+b)
Γ(a)Γ(b)pa1(1 p)b1dp
=n
kΓ(a+b)
2nΓ(a)Γ(b)
k
X
j=0 Z1
0
pk+a1(1 p)nk+b1dp
=n
kΓ(a+b)
2nΓ(a)Γ(b)
k
X
j=0 k
jΓ(k+a)Γ(nk+b)
Γ(n+a+b)
=
k
X
j=0 " k
j
2n!n
kΓ(a+b)
Γ(a)Γ(b)
Γ(k+a)Γ(nk+b)
Γ(n+a+b)#.
A mixture of beta-binomial.
b.
EY= E(E(Y|c)) = E[nc] = E n1
2(1 + p)=n
21 + a
a+b.
Using the results in Exercise 4.35(a),
Var(Y) = nEC(1 EC) + n(n1)VarC.
Therefore,
Var(Y) = nE1
2(1 + P)1E1
2(1 + P)+n(n1)Var 1
2(1 + P)
=n
4(1 + EP)(1 EP) + n(n1)
4VarP
=n
4 1a
a+b2!+n(n1)
4
ab
(a+b)2(a+b+ 1).
4.38 a. Make the transformation u=x
νx
λ,du =x
ν2,ν
λν=x
λu . Then
Zλ
0
1
νex/ν 1
Γ(r)Γ(1 r)
νr1
(λν)r
=1
Γ(r)Γ(1 r)Z
0
1
xx
λure(u+x/λ)du
=xr1ex/λ
λrΓ(r)Γ(1 r)Z
01
ur
eudu =xr1ex/λ
Γ(r)λr,
since the integral is equal to Γ(1 r) if r < 1.
b. Use the transformation t=νto get
Zλ
0
pλ(ν)=1
Γ(r)Γ(1 r)Zλ
0
νr1(λν)r=1
Γ(r)Γ(1 r)Z1
0
tr1(1 t)rdt = 1,
since this is a beta(r, 1r).
c.
d
dx log f(x) = d
dx log 1
Γ(r)λr+(r1) log xx/λ=r1
x1
λ>0
for some x, if r > 1. But,
d
dx log Z
0
ex/ν
νqλ(ν)=R
0
1
ν2ex/ν qλ(ν)
R
0
1
νex/ν qλ(ν)<0x.
4.39 a. Without loss of generality lets assume that i<j. From the discussion in the text we have
that
f(x1, . . . , xj1, xj+1, . . . , xn|xj)
=(mxj)!
x1!·····xj1!·xj+1!·····xn!
×p1
1pjx1
·····pj1
1pjxj1pj+1
1pjxj+1
·····pn
1pjxn
.
Then,
f(xi|xj)
=X
(x1,...,xi1,xi+1,...,xj1,xj+1,...,xn)
f(x1, . . . , xj1, xj+1, . . . , xn|xj)
=X
(xk6=xi,xj)
(mxj)!
x1!·····xj1!·xj+1!·····xn!
×(p1
1pj
)x1·····(pj1
1pj
)xj1(pj+1
1pj
)xj+1 ·····(pn
1pj
)xn
×
(mxixj)! 1pi
1pjmxixj
(mxixj)! 1pi
1pjmxixj
=(mxj)!
xi!(mxixj)!(pi
1pj
)xi1pi
1pjmxixj
×X
(xk6=xi,xj)
(mxixj)!
x1!·····xi1!, xi+1!·····xj1!, xj+1!·····xn!
×(p1
1pjpi
)x1·····(pi1
1pjpi
)xi1(pi+1
1pjpi
)xi+1
×(pj1
1pjpi
)xj1(pj+1
1pjpi
)xj+1 ·····(pn
1pjpi
)xn
=(mxj)!
xi!(mxixj)!(pi
1pj
)xi1pi
1pjmxixj
.
Thus Xi|Xj=xjbinomial(mxj,pi
1pj).
b.
f(xi, xj) = f(xi|xj)f(xj) = m!
xi!xj!(mxjxi)!pxi
ipxj
j(1 pjpi)mxjxi.
Using this result it can be shown that Xi+Xjbinomial(m, pi+pj). Therefore,
Var(Xi+Xj) = m(pi+pj)(1 pipj).
By Theorem 4.5.6, Var(X_i + X_j) = Var(X_i) + Var(X_j) + 2Cov(X_i, X_j). Therefore,
   Cov(X_i, X_j) = (1/2)[m(p_i + p_j)(1 - p_i - p_j) - m p_i(1 - p_i) - m p_j(1 - p_j)] = (1/2)(-2m p_i p_j) = -m p_i p_j.
4.41 Let abe a constant. Cov(a, X) = E(aX)EaEX=aEXaEX= 0.
4.42
ρXY,Y =Cov(XY, Y )
σXY σY
=E(XY 2)µXY µY
σXY σY
=EXEY2µXµYµY
σXY σY
,
where the last step follows from the independence of X and Y. Now compute
σ2
XY = E(XY )2[E(XY )]2= EX2EY2(EX)2(EY)2
= (σ2
X+µ2
X)(σ2
Y+µ2
Y)µ2
Xµ2
Y=σ2
Xσ2
Y+σ2
Xµ2
Y+σ2
Yµ2
X.
Therefore,
ρXY,Y =µX(σ2
Y+µ2
Y)µXµ2
Y
(σ2
Xσ2
Y+σ2
Xµ2
Y+σ2
Yµ2
X)1/2σY
=µXσY
(µ2
Xσ2
Y+µ2
Yσ2
X+σ2
Xσ2
Y)1/2.
4.43
Cov(X1+X2, X2+X3) = E(X1+X2)(X2+X3)E(X1+X2)E(X2+X3)
= (4µ2+σ2)4µ2=σ2
Cov(X1+X2)(X1X2) = E(X1+X2)(X1X2)=EX2
1X2
2= 0.
4.44 Let µi= E(Xi). Then
Var n
X
i=1
Xi!= Var (X1+X2+··· +Xn)
= E [(X1+X2+··· +Xn)(µ1+µ2+··· +µn)]2
= E [(X1µ1)+(X2µ2) + ··· + (Xnµn)]2
=
n
X
i=1
E(Xiµi)2+ 2 X
1i<jn
E(Xiµi)(Xjµj)
=
n
X
i=1
VarXi+ 2 X
1i<jn
Cov(Xi, Xj).
4.45 a. We will compute the marginal of X. The calculation for Yis similar. Start with
fXY (x, y) = 1
2πσXσYp1ρ2
×exp "1
2(1ρ2)(xµX
σX2
2ρxµX
σXyµY
σY+yµY
σY2)#
and compute
fX(x) = Z
−∞
fXY (x, y)dy =Z
−∞
1
2πσXσYp1ρ2e1
2(1ρ2)(ω22ρωz+z2)σYdz,
where we make the substitution z=yµY
σY,dy =σYdz,ω=xµX
σX. Now the part of the
exponent involving ω2can be removed from the integral, and we complete the square in z
to get
fX(x) = eω2
2(1ρ2)
2πσXp1ρ2Z
−∞
e1
2(1ρ2)[(z22ρωz+ρ2ω2)ρ2ω2]dz
=eω2/2(1ρ2)eρ2ω2/2(1ρ2)
2πσXp1ρ2Z
−∞
e1
2(1ρ2)(zρω)2
dz.
The integrand is the kernel of normal pdf with σ2= (1 ρ2), and µ=ρω, so it integrates
to 2πp1ρ2. Also note that eω2/2(1ρ2)eρ2ω2/2(1ρ2)=eω2/2. Thus,
fX(x) = eω2/2
2πσXp1ρ22πp1ρ2=1
2πσX
e1
2xµX
σX2
,
the pdf of n(µX, σ2
X).
b.
fY|X(y|x)
=
1
2πσXσY1ρ2e1
2(1ρ2)hxµX
σX22ρxµX
σXyµY
σY+yµY
σY2i
1
2πσXe1
2σ2
X
(xµX)2
=1
2πσYp1ρ2e1
2(1ρ2)hxµX
σX2(1ρ2)xµX
σX22ρxµX
σXyµY
σY+yµY
σY2i
=1
2πσYp1ρ2e1
2(1ρ2)hρ2xµX
σX22ρxµX
σXyµY
σY+yµY
σY2i
=1
2πσYp1ρ2e1
2σ2
Y(1ρ2(yµY)ρσY
σX(xµX)2
,
which is the pdf of n(µYρ(σYX)(xµX), σYp1ρ2.
c. The mean is easy to check,
E(aX +bY ) = aEX+bEY=X+Y,
as is the variance,
Var(aX +bY ) = a2VarX+b2VarY+ 2abCov(X, Y ) = a2σ2
X+b2σ2
Y+ 2abρσXσY.
To show that aX +bY is normal we have to do a bivariate transform. One possibility is
U=aX +bY ,V=Y, then get fU,V (u, v) and show that fU(u) is normal. We will do this in
the standard case. Make the indicated transformation and write x=1
a(ubv), y=vand
obtain
|J|=
1/a b/a
0 1 =1
a.
Then
fUV (u, v) = 1
2πap1ρ2e1
2(1ρ2)[1
a(ubv)]22ρ
a(ubv)+v2.
Now factor the exponent to get a square in u. The result is
1
2(1ρ2)b2+ 2ρab +a2
a2u2
b2+ 2ρab +a22b+
b2+ 2ρab +a2uv +v2.
Note that this is joint bivariate normal form since µU=µV= 0, σ2
v= 1, σ2
u=a2+b2+ 2abρ
and
ρ=Cov(U, V )
σUσV
=E(aXY +bY 2)
σUσV
=+b
pa2+b2+abρ,
thus
(1 ρ2) = 1 a2ρ2+abρ +b2
a2+b2+ 2abρ =(1ρ2)a2
a2+b2+ 2abρ =(1 ρ2)a2
σ2
u
where ap1ρ2=σUp1ρ2. We can then write
fUV (u, v) = 1
2πσUσVp1ρ2exp "1
2p1ρ2u2
σ2
U2ρuv
σUσV
+v2
σ2
V#,
which is in the exact form of a bivariate normal distribution. Thus, by part a), Uis normal.
4.46 a.
EX=aXEZ1+bXEZ2+ EcX=aX0 + bX0 + cX=cX
VarX=a2
XVarZ1+b2
XVarZ2+ VarcX=a2
X+b2
X
EY=aY0 + bY0 + cY=cY
VarY=a2
YVarZ1+b2
YVarZ2+ VarcY=a2
Y+b2
Y
Cov(X, Y )=EXY EX·EY
= E[(aXaYZ2
1+bXbYZ2
2+cXcY+aXbYZ1Z2+aXcYZ1+bXaYZ2Z1
+bXcYZ2+cXaYZ1+cXbYZ2)cXcY]
=aXaY+bXbY,
since EZ2
1= EZ2
2= 1, and expectations of other terms are all zero.
b. Simply plug the expressions for aX,bX, etc. into the equalities in a) and simplify.
c. Let D=aXbYaYbX=p1ρ2σXσYand solve for Z1and Z2,
Z1=bY(XcX)bX(YcY)
D=σY(XµX)+σX(YµY)
p2(1+ρ)σXσY
Z2=σY(XµX)+σX(YµY)
p2(1ρ)σXσY
.
Then the Jacobian is
J= z1
x1
z1
y
z2
x
z2
y !=bY
DbX
D
aY
D
aX
D=aXbY
D2aYbX
D2=1
D=1
p1ρ2σXσY
,
and we have that
fX,Y (x, y) = 1
2πe1
2
(σY(xµX)+σX(yµY))2
2(1+ρ)σ2
Xσ2
Y1
2πe1
2
(σY(xµX)+σX(yµY))2
2(1ρ)σ2
Xσ2
Y1
p1ρ2σXσY
= (2πσXσYp1ρ2)1exp 1
2(1 ρ2)xµX
σX2!
2ρxµX
σXyµY
σY+yµY
σY2
,−∞ < x < ,−∞ < y < ,
a bivariate normal pdf.
d. Another solution is
aX=ρσXbX=p(1 ρ2)σX
aY=σYbY= 0
cX=µX
cY=µY.
There are an infinite number of solutions. Write bX=±pσ2
Xa2
X,bY=±pσ2
Ya2
Y, and
substitute bX,bYinto aXaY=ρσXσY. We get
aXaY+±qσ2
Xa2
X±qσ2
Ya2
Y=ρσXσY.
Square both sides and simplify to get
(1 ρ2)σ2
Xσ2
Y=σ2
Xa2
Y2ρσXσYaXaY+σ2
Ya2
X.
This is an ellipse for ρ6=±1, a line for ρ=±1. In either case there are an infinite number
of points satisfying the equations.
4.47 a. By definition of Z, for z < 0,
P(Zz) = P(Xzand XY > 0) + P(Xzand XY < 0)
=P(Xzand Y < 0) + P(X≥ −zand Y < 0) (since z < 0)
=P(Xz)P(Y < 0) + P(X≥ −z)P(Y < 0) (independence)
=P(Xz)P(Y < 0) + P(Xz)P(Y > 0) (symmetry of Xand Y)
=P(Xz)(P(Y < 0) + P(Y > 0))
=P(Xz).
By a similar argument, for z > 0, we get P(Z > z) = P(X > z), and hence, P(Zz) =
P(Xz). Thus, ZXn(0,1).
b. By definition of Z,Z > 0either (i)X < 0 and Y > 0 or (ii)X > 0 and Y > 0. So Zand
Yalways have the same sign, hence they cannot be bivariate normal.
4.49 a.
fX(x) = Z(af1(x)g1(y) + (1 a)f2(x)g2(y))dy
=af1(x)Zg1(y)dy + (1 a)f2(x)Zg2(y)dy
=af1(x) + (1 a)f2(x).
fY(y) = Z(af1(x)g1(y) + (1 a)f2(x)g2(y))dx
=ag1(y)Zf1(x)dx + (1 a)g2(y)Zf2(x)dx
=ag1(y) + (1 a)g2(y).
b. () If Xand Yare independent then f(x, y) = fX(x)fY(y). Then,
f(x, y)fX(x)fY(y)
=af1(x)g1(y) + (1 a)f2(x)g2(y)
[af1(x) + (1 a)f2(x)][ag1(y) + (1 a)g2(y)]
=a(1 a)[f1(x)g1(y)f1(x)g2(y)f2(x)g1(y) + f2(x)g2(y)]
=a(1 a)[f1(x)f2(x)][g1(y)g2(y)]
= 0.
Thus [f1(x)f2(x)][g1(y)g2(y)] = 0 since 0 < a < 1.
() if [f1(x)f2(x)][g1(y)g2(y)] = 0 then
f1(x)g1(y) + f2(x)g2(y) = f1(x)g2(y) + f2(x)g1(y).
Therefore
fX(x)fY(y)
=a2f1(x)g1(y) + a(1 a)f1(x)g2(y) + a(1 a)f2(x)g1(y) + (1 a)2f2(x)g2(y)
=a2f1(x)g1(y) + a(1 a)[f1(x)g2(y) + f2(x)g1(y)] + (1 a)2f2(x)g2(y)
=a2f1(x)g1(y) + a(1 a)[f1(x)g1(y) + f2(x)g2(y)] + (1 a)2f2(x)g2(y)
=af1(x)g1(y) + (1 a)f2(x)g2(y) = f(x, y).
Thus Xand Yare independent.
c.
Cov(X, Y ) = 1ξ1+ (1 a)µ2ξ2[1+ (1 a)µ2][1+ (1 a)ξ2]
=a(1 a)[µ1ξ1µ1ξ2µ2ξ1+µ2ξ2]
=a(1 a)[µ1µ2][ξ1ξ2].
To construct dependent uncorrelated random variables let (X, Y )af1(x)g1(y) + (1
a)f2(x)g2(y) where f1,f2,g1,g2are such that f1f26= 0 and g1g26= 0 with µ1=µ2or
ξ1=ξ2.
d. (i) f1binomial(n, p), f2binomial(n, p), g1binomial(n, p), g2binomial(n, 1p).
(ii) f1binomial(n, p1), f2binomial(n, p2), g1binomial(n, p1), g2binomial(n, p2).
(iii) f1binomial(n1,p
n1), f2binomial(n2,p
n2), g1binomial(n1, p), g2binomial(n2, p).
4.51 a.
P(X/Y t) = 1
2t t > 1
1
2+ (1 t)t1
P(XY t) = ttlog t0< t < 1.
b.
P(XY/Z t) = Z1
0
P(XY zt)dz
=(R1
0zt
2+ (1 zt)dz if t1
R1
t
0zt
2+ (1 zt)dz +R1
1
t
1
2zt dz if t1
=1t/4 if t1
t1
4t+1
2tlog tif t > 1.
4.53
P(Real Roots) = P(B2>4AC)
=P(2 log B > log 4 + log A+ log C)
=P(2 log B≤ −log 4 log Alog C)
=P(2 log B≤ −log 4 + (log Alog C)) .
Let X=2 log B,Y=log Alog C. Then Xexponential(2), Ygamma(2,1), indepen-
dent, and
P(Real Roots) = P(X < log 4 + Y)
=Z
log 4
P(X < log 4 + y)fY(y)dy
=Z
log 4 Zlog 4+y
0
1
2ex/2dxyeydy
=Z
log 4 1e1
2log 4ey/2yeydy.
Integration-by-parts will show that ∫_a^∞ y e^{-y/b} dy = b(a + b) e^{-a/b}, and hence
   P(Real Roots) = (1/4)(1 + log 4) - (1/6)(2/3 + log 4) = .2544.
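A Monte Carlo check of this value (an added illustration; A, B, C are taken as iid uniform(0,1), as in the exercise):

set.seed(1)
a <- runif(1000000); b <- runif(1000000); c <- runif(1000000)
mean(b^2 > 4*a*c)    # approximately .254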
4.54 Let Y=Qn
i=1 Xi. Then P(Yy) = P(Qn
i=1 Xiy) = P(Pn
i=1 log Xi≥ −log y). Now,
log Xiexponential(1) = gamma(1,1). By Example 4.6.8, Pn
i=1 log Xigamma(n, 1).
Therefore,
P(Yy) = Z
log y
1
Γ(n)zn1ezdz,
and
fY(y) = d
dy Z
log y
1
Γ(n)zn1ezdz
=1
Γ(n)(log y)n1e(log y)d
dy (log y)
=1
Γ(n)(log y)n1,0< y < 1.
4.55 Let X1,X2,X3be independent exponential(λ) random variables, and let Y= max(X1, X2, X3),
the lifetime of the system. Then
P(Yy) = P(max(X1, X2, X3)y)
=P(X1yand X2yand X3y)
=P(X1y)P(X2y)P(X3y).
by the independence of X_1, X_2 and X_3. Now each probability is P(X_i ≤ y) = ∫_0^y (1/λ) e^{-x/λ} dx = 1 - e^{-y/λ}, so
   P(Y ≤ y) = (1 - e^{-y/λ})^3, 0 < y < ∞,
and the pdf is
   f_Y(y) = (3/λ)(1 - e^{-y/λ})^2 e^{-y/λ} for y > 0, and f_Y(y) = 0 for y ≤ 0.
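The cdf can be checked by simulation; the R sketch below (an added illustration; λ = 2 is an arbitrary choice) compares the empirical P(Y ≤ λ) with (1 - e^{-1})^3:

set.seed(1)
lam <- 2
y <- apply(matrix(rexp(300000, rate = 1/lam), ncol = 3), 1, max)
c(mean(y <= lam), (1 - exp(-1))^3)    # both approximately 0.2525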
4.57 a.
A1= [ 1
n
n
X
x=1
x1
i]1
1=1
n
n
X
x=1
xi,the arithmetic mean.
A1= [ 1
n
n
X
x=1
x1
i]1=1
1
n(1
x1+··· +1
xn),the harmonic mean.
lim
r0log Ar= lim
r0log[ 1
n
n
X
x=1
xr
i]1
r= lim
r0
1
rlog[ 1
n
n
X
x=1
xr
i] = lim
r0
1
nPn
i=1 rxr1
i
1
nPn
i=1 xr
i
= lim
r0
1
nPn
i=1 xr
ilog xi
1
nPn
i=1 xr
i
=1
n
n
X
i=1
log xi=1
nlog(
n
Y
i=1
xi).
Thus A0= limr0Ar= exp( 1
nlog(Qn
i=1 xi)) = (Qn
i=1 xi)1
n, the geometric mean. The term
rxr1
i=xr
ilog xisince rxr1
i=d
dr xr
i=d
dr exp(rlog xi) = exp(rlog xi) log xi=xr
ilog xi.
b. (i) if log Aris nondecreasing then for rr0log Arlog Ar0, then elog Arelog Ar0. Therefore
ArAr0. Thus Aris nondecreasing in r.
(ii) d
dr log Ar=1
r2log( 1
nPn
x=1 xr
i) + 1
r
1
nPn
i=1 rxr1
i
1
nPn
i=1 xr
i
=1
r2rPn
i=1 xr
ilog xi
Pn
x=1 xr
ilog( 1
nPn
x=1 xr
i),
where we use the identity for rxr1
ishowed in a).
(iii)
rPn
i=1 xr
ilog xi
Pn
x=1 xr
ilog( 1
n
n
X
x=1
xr
i)
= log(n) + rPn
i=1 xr
ilog xi
Pn
x=1 xr
ilog(
n
X
x=1
xr
i)
= log(n) +
n
X
i=1 "xr
i
Pn
i=1 xr
i
rlog xixr
i
Pn
i=1 xr
i
log(
n
X
x=1
xr
i)#
= log(n) +
n
X
i=1 "xr
i
Pn
i=1 xr
i
(rlog xilog(
n
X
x=1
xr
i))#
= log(n)
n
X
i=1
xr
i
Pn
i=1 xr
i
log( Pn
x=1 xr
i
xr
i
) = log(n)
n
X
i=1
ailog( 1
ai
).
We need to prove that log(n)Pn
i=1 ailog( 1
ai). Using Jensen inequality we have that
E log( 1
a) = Pn
i=1 ailog( 1
ai)log(E 1
a) = log(Pn
i=1 ai1
ai) = log(n) which establish the
result.
4.59 Assume that EX= 0, EY= 0, and EZ= 0. This can be done without loss of generality
because we could work with the quantities XEX, etc. By iterating the expectation we have
Cov(X, Y ) = EXY = E[E(XY |Z)].
Adding and subtracting E(X|Z)E(Y|Z) gives
Cov(X, Y ) = E[E(XY |Z)E(X|Z)E(Y|Z)] + E[E(X|Z)E(Y|Z)].
Since E[E(X|Z)] = EX= 0, the second term above is Cov[E(X|Z)E(Y|Z)]. For the first term
write
E[E(XY |Z)E(X|Z)E(Y|Z)] = E [E {XY E(X|Z)E(Y|Z)|Z}]
where we have brought E(X|Z) and E(Y|Z) inside the conditional expectation. This can now
be recognized as ECov(X, Y |Z), establishing the identity.
4.61 a. To find the distribution of f(X1|Z), let U=X21
X1and V=X1. Then x2=h1(u, v) = uv+1,
x1=h2(u, v) = v. Therefore
fU,V (u, v) = fX,Y (h1(u, v), h2(u, v))|J|=e(uv+1)evv,
and
fU(u) = Z
0
ve(uv+1)evdv =e1
(u+ 1)2.
Thus V|U= 0 has distribution vev. The distribution of X1|X2is ex1since X1and X2
are independent.
b. The following Mathematica code will draw the picture; the solid lines are B1and the dashed
lines are B2. Note that the solid lines increase with x1, while the dashed lines are constant.
Thus B1is informative, as the range of X2changes.
e = 1/10;
Plot[{-e*x1 + 1, e*x1 + 1, 1 - e, 1 + e}, {x1, 0, 5},
PlotStyle -> {Dashing[{}], Dashing[{}],Dashing[{0.15, 0.05}],
Dashing[{0.15, 0.05}]}]
c.
P(X1x|B1) = P(Vv| −  < U < ) = Rv
0R
ve(uv+1)evdudv
R
0R
ve(uv+1)evdudv
=
e1hev(1+)
1+1
1+ev(1)
1+1
1i
e1h1
1++1
1i.
Thus lim0P(X1x|B1) = 1 evvev=Rv
0vevdv =P(Vv|U= 0).
P(X1x|B2) = Rx
0R1+
0e(x1+x2)dx2dx1
R1+
0ex2dx2
=e(x+1+)e(1+)ex+ 1
1e(1+).
Thus lim0P(X1x|B2) = 1 ex=Rx
0ex1dx1=P(X1x|X2= 1).
4.63 Since X=eZand g(z) = ezis convex, by Jensen’s Inequality EX= Eg(Z)g(EZ) = e0= 1.
In fact, there is equality in Jensen’s Inequality if and only if there is an interval Iwith P(Z
I) = 1 and g(z) is linear on I. But ezis linear on an interval only if the interval is a single
point. So EX > 1, unless P(Z= EZ= 0) = 1.
4.64 a. Let aand bbe real numbers. Then,
|a+b|2= (a+b)(a+b) = a2+ 2ab +b2≤ |a|2+ 2|ab|+|b|2= (|a|+|b|)2.
Take the square root of both sides to get |a+b| ≤ |a|+|b|.
b. |X+Y| ≤ |X|+|Y| ⇒ E|X+Y| ≤ E(|X|+|Y|) = E|X|+ E|Y|.
4.65 Without loss of generality let us assume that Eg(X) = Eh(X) = 0. For part (a)
E(g(X)h(X)) = Z
−∞
g(x)h(x)fX(x)dx
=Z{x:h(x)0}
g(x)h(x)fX(x)dx +Z{x:h(x)0}
g(x)h(x)fX(x)dx
g(x0)Z{x:h(x)0}
h(x)fX(x)dx +g(x0)Z{x:h(x)0}
h(x)fX(x)dx
=Z
−∞
h(x)fX(x)dx
=g(x0)Eh(X) = 0.
where x0is the number such that h(x0) = 0. Note that g(x0) is a maximum in {x:h(x)0}
and a minimum in {x:h(x)0}since g(x) is nondecreasing. For part (b) where g(x) and
h(x) are both nondecreasing
E(g(X)h(X)) = Z
−∞
g(x)h(x)fX(x)dx
=Z{x:h(x)0}
g(x)h(x)fX(x)dx +Z{x:h(x)0}
g(x)h(x)fX(x)dx
g(x0)Z{x:h(x)0}
h(x)fX(x)dx +g(x0)Z{x:h(x)0}
h(x)fX(x)dx
=Z
−∞
h(x)fX(x)dx
=g(x0)Eh(X) = 0.
The case when g(x) and h(x) are both nonincreasing can be proved similarly.
Chapter 5
Properties of a Random Sample
5.1 Let X= # color blind people in a sample of size n. Then Xbinomial(n, p), where p=.01.
The probability that a sample contains a color blind person is P(X > 0) = 1 P(X= 0),
where P(X= 0) = n
0(.01)0(.99)n=.99n. Thus,
P(X > 0) = 1 .99n> .95 n > log(.05)/log(.99) 299.
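In R (an added check of the bound):

n <- ceiling(log(.05)/log(.99))    # 299
1 - .99^n                          # just over .95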
5.3 Note that YiBernoulli with pi=P(Xiµ) = 1 F(µ) for each i. Since the Yi’s are iid
Bernoulli, Pn
i=1 Yibinomial(n, p = 1 F(µ)).
5.5 Let Y=X1+··· +Xn. Then ¯
X= (1/n)Y, a scale transformation. Therefore the pdf of ¯
Xis
f¯
X(x) = 1
1/n fYx
1/n =nfY(nx).
5.6 a. For Z=XY, set W=X. Then Y=WZ,X=W, and |J|=
0 1
1 1 = 1.Then
fZ,W (z, w) = fX(w)fY(wz)·1, thus fZ(z) = R
−∞ fX(w)fY(wz)dw.
b. For Z=XY , set W=X. Then Y=Z/W and |J|=
0 1
1/w z/w2=1/w. Then
fZ,W (z, w) = fX(w)fY(z/w)· |−1/w|, thus fZ(z) = R
−∞ |−1/w|fX(w)fY(z/w)dw.
c. For Z=X/Y , set W=X. Then Y=W/Z and |J|=
0 1
w/z21/z =w/z2. Then
fZ,W (z, w) = fX(w)fY(w/z)· |w/z2|, thus fZ(z) = R
−∞ |w/z2|fX(w)fY(w/z)dw.
5.7 It is, perhaps, easiest to recover the constants by doing the integrations. We have
Z
−∞
B
1+ ω
σ2=σπB, Z
−∞
D
1+ ωz
τ2=τπD
and
Z
−∞ "
1+ ω
σ2Cω
1+ ωz
τ2#
=Z
−∞ "
1+ ω
σ2C(ωz)
1+ ωz
τ2#Cz Z
−∞
1
1+ ωz
τ2
=Aσ2
2log 1+ ω
σ2Cτ 2
2log "1+ ωz
τ2#
−∞ τπCz.
The integral is finite and equal to zero if A=M2
σ2,C=M2
τ2for some constant M. Hence
fZ(z) = 1
π2στ σπBτπD2πMz
τ=1
π(σ+τ)
1
1+ (z/(σ+τ))2,
if B=τ
σ+τ,D=σ
σ+τ),M=στ2
2z(σ+τ)
1
1+(z
σ+τ)2.
5.8 a.
1
2n(n1)
n
X
i=1
n
X
j=1
(XiXj)2
=1
2n(n1)
n
X
i=1
n
X
j=1
(Xi¯
X+¯
XXj)2
=1
2n(n1)
n
X
i=1
n
X
j=1 h(Xi¯
X)22(Xi¯
X)(Xj¯
X)+(Xj¯
X)2i
=1
2n(n1)
n
X
i=1
n(Xi¯
X)22
n
X
i=1
(Xi¯
X)
n
X
j=1
(Xj¯
X)
| {z }
=0
+n
n
X
j=1
(Xj¯
X)2
=n
2n(n1)
n
X
i=1
(Xi¯
X)2+n
2n(n1)
n
X
j=1
(Xj¯
X)2
=1
n1
n
X
i=1
(Xi¯
X)2=S2.
b. Although all of the calculations here are straightforward, there is a tedious amount of book-
keeping needed. It seems that induction is the easiest route. (Note: Without loss of generality
we can assume θ1= 0, so EXi= 0.)
(i) Prove the equation for n= 4. We have S2=1
24 P4
i=1 P4
j=1(XiXj)2, and to calculate
Var(S2) we need to calculate E(S2)2and E(S2). The latter expectation is straightforward
and we get E(S2) = 24θ2. The expected value E(S2)2= E(S4) contains 256(= 44) terms
of which 112(= 4 ×16 + 4 ×16 42) are zero, whenever i=j. Of the remaining terms,
24 are of the form E(XiXj)4= 2(θ4+ 3θ2
2)
96 are of the form E(XiXj)2(XiXk)2=θ4+ 3θ2
2
24 are of the form E(XiXj)2(XkX`)2= 4θ2
2
Thus,
Var(S2) = 1
242h24 ×2(θ4+ 3θ2
2) + 96(θ4+ 3θ2
2) + 24 ×4θ4(24θ2)2i=1
4θ41
3θ2
2.
(ii) Assume that the formula holds for n, and establish it for n+1. (Let Sndenote the variance
based on nobservations.) Straightforward algebra will establish
S2
n+1 =1
2n(n+ 1)
n
X
i=1
n
X
j=1
(XiXj)2+ 2
n
X
k=1
(XkXn+1)2
def’n
=1
2n(n+ 1) [A+ 2B]
where
Var(A)=4n(n1)2θ4n3
n1θ2
2(induction hypothesis)
Var(B) = n(n+ 1)θ4n(n3)θ2
2(Xkand Xn+1 are independent)
Cov(A, B)=2n(n1) θ4θ2
2(some minor bookkeeping needed)
Hence,
Var(S2
n+1) = 1
4n2(n+ 1)2[Var(A) + 4Var(B) + 4Cov(A, B)] = 1
n+ 1 θ4n2
nθ2
2,
establishing the induction and verifying the result.
c. Again assume that θ1= 0. Then
Cov( ¯
X, S2) = 1
2n2(n1)E
n
X
k=1
Xk
n
X
i=1
n
X
j=1
(XiXj)2
.
The double sum over iand jhas n(n1) nonzero terms. For each of these, the entire
expectation is nonzero for only two values of k(when kmatches either ior j). Thus
Cov( ¯
X, S2) = 2n(n1)
2n2(n1)EXi(XiXj)2=1
nθ3,
and ¯
Xand S2are uncorrelated if θ3= 0.
5.9 To establish the Lagrange Identity consider the case when n= 2,
(a1b2a2b1)2=a2
1b2
2+a2
2b2
12a1b2a2b1
=a2
1b2
2+a2
2b2
12a1b2a2b1+a2
1b2
1+a2
2b2
2a2
1b2
1a2
2b2
2
= (a2
1+a2
2)(b2
1+b2
2)(a1b1+a2b2)2.
Assume that is true for n, then
n+1
X
i=1
a2
i! n+1
X
i=1
b2
i! n+1
X
i=1
aibi!2
= n
X
i=1
a2
i+a2
n+1! n
X
i=1
b2
i+b2
n+1! n
X
i=1
aibi+an+1bn+1!2
= n
X
i=1
a2
i! n
X
i=1
b2
i! n
X
i=1
aibi!2
+ n
X
i=1
a2
i!b2
n+1 +a2
n+1 n
X
i=1
b2
i!2 n
X
i=1
aibi!an+1bn+1
=
n1
X
i=1
n
X
j=i+1
(aibjajbi)2+
n
X
i=1
(aibn+1 an+1bi)2
=
n
X
i=1
n+1
X
j=i+1
(aibjajbi)2.
If all the points lie on a straight line then Yµy=c(Xµx), for some constant c6= 0. Let
bi=Yµyand ai= (Xµx), then bi=cai. Therefore Pn
i=1 Pn+1
j=i+1(aibjajbi)2= 0. Thus
the correlation coefficient is equal to 1.
5.10 a.
θ1= EXi=µ
θ2= E(Xiµ)2=σ2
θ3= E(Xiµ)3
= E(Xiµ)2(Xiµ) (Stein’s lemma: Eg(X)(Xθ) = σ2Eg0(X))
= 2σ2E(Xiµ)=0
θ4= E(Xiµ)4= E(Xiµ)3(Xiµ)=3σ2E(Xiµ)2= 3σ4.
b. VarS2=1
n(θ4n3
n1θ2
2) = 1
n(3σ4n3
n1σ4) = 2σ4
n1.
c. Use the fact that (n1)S22χ2
n1and Varχ2
n1= 2(n1) to get
Var (n1)S2
σ2!= 2(n1)
which implies ((n1)2
σ4)VarS2= 2(n1) and hence
VarS2=2(n1)
(n1)24=2σ4
n1.
Remark: Another approach to b), not using the χ2distribution, is to use linear model theory.
For any matrix AVar(X0AX) = 2µ2
2trA2+ 4µ2θ0, where µ2is σ2,θ= EX=µ1. Write
S2=1
n1Pn
i=1(Xi¯
X) = 1
n1X0(I¯
Jn)X.Where
I¯
Jn=
11
n1
n··· −1
n
1
n11
n
.
.
.
.
.
.....
.
.
1
n··· ··· 11
n
.
Notice that trA2= trA=n1, = 0. So
VarS2=1
(n1)2Var(X0AX) = 1
(n1)22σ4(n1) + 0=2σ4
n1.
5.11 Let g(s) = s2. Since g(·) is a convex function, we know from Jensen’s inequality that Eg(S)
g(ES), which implies σ2= ES2(ES)2. Taking square roots, σES. From the proof of
Jensen’s Inequality, it is clear that, in fact, the inequality will be strict unless there is an
interval I such that g is linear on I and P(XI) = 1. Since s2is “linear” only on single points,
we have ET2>(ET)2for any random variable T, unless P(T= ET) = 1.
5.13
EcS2=crσ2
n1E rS2(n1)
σ2!
=crσ2
n1Z
0
q1
Γn1
22(n1)/2q(n1
2)1eq/2dq,
Since pS2(n1)2is the square root of a χ2random variable. Now adjust the integrand to
be another χ2pdf and get
EcS2=crσ2
n1·Γ(n/2)2n/2
Γ((n1)/2)2((n1)/2Z
0
1
Γ(n/2)2n/2q(n1)/21
2eq/2dq
(the last integral equals 1, since the integrand is a χ^2_n pdf). So c = √(n-1) Γ((n-1)/2) / (√2 Γ(n/2)) gives E(cS) = σ.
5.15 a.
¯
Xn+1 =Pn+1
i=1 Xi
n+ 1 =Xn+1 +Pn
i=1 Xi
n+ 1 =Xn+1 +n¯
Xn
n+ 1 .
b.
nS2
n+1 =n
(n+ 1) 1
n+1
X
i=1 Xi¯
Xn+12
=
n+1
X
i=1 XiXn+1 +n¯
Xn
n+ 1 2
(use (a))
=
n+1
X
i=1 XiXn+1
n+ 1 n¯
Xn
n+ 12
=
n+1
X
i=1 Xi¯
XnXn+1
n+ 1 ¯
Xn
n+ 12±¯
Xn
=
n+1
X
i=1 "Xi¯
Xn22Xi¯
XnXn+1¯
Xn
n+ 1 +1
(n+ 1)2Xn+1¯
Xn2#
=
n
X
i=1 Xi¯
Xn2+Xn+1 ¯
Xn22(Xn+1¯
Xn)2
n+ 1 +n+ 1
(n+ 1)2Xn+1 ¯
Xn2
since
n
X
1
(Xi¯
Xn) = 0!
= (n1)S2
n+n
n+ 1 Xn+1 ¯
Xn2.
5.16 a. P3
i=1 Xii
i2χ2
3
b. Xi1
i,v
u
u
tP3
i=2 Xii
i2,2t2
c. Square the random variable in part b).
5.17 a. Let Uχ2
pand Vχ2
q, independent. Their joint pdf is
1
Γp
2Γq
22(p+q)/2up
21vq
21e(u+v)
2.
From Definition 5.3.6, the random variable X= (U/p)/(V/q) has an Fdistribution, so we
make the transformation x= (u/p)/(v/q) and y=u+v. (Of course, many choices of ywill
do, but this one makes calculations easy. The choice is prompted by the exponential term
in the pdf.) Solving for uand vyields
u=
p
qxy
1 + q
px, v =y
1 + q
px,and |J|=
q
py
1 + q
px2.
We then substitute into fU,V (u, v) to obtain
fX,Y (x, y) = 1
Γp
2Γq
22(p+q)/2 p
qxy
1 + q
px!p
21 y
1 + q
px!q
21
ey
2
q
py
1 + q
px2.
Note that the pdf factors, showing that Xand Yare independent, and we can read off the
pdfs of each: Xhas the Fdistribution and Yis χ2
p+q. If we integrate out yto recover the
proper constant, we get the Fpdf
fX(x) = Γp+q
2
Γp
2Γq
2q
pp/2xp/21
1 + q
pxp+q
2
.
b. Since Fp,q =χ2
p/p
χ2
q/q , let Uχ2
p,Vχ2
qand Uand Vare independent. Then we have
EFp,q = E U/p
V/q = E U
pEq
V(by independence)
=p
pqE1
V(EU=p).
Then
E1
V=Z
0
1
v
1
Γq
22q/2vq
21ev
2dv =1
Γq
22q/2Z
0
vq2
21ev
2dv
=1
Γq
22q/2Γq2
22(q2)/2=Γq2
22(q2)/2
Γq2
2q2
22q/2=1
q2.
Hence, EFp,q =p
p
q
q2=q
q2, if q > 2. To calculate the variance, first calculate
E(F2
p,q) = E U2
p2
q2
V2=q2
p2E(U2)E 1
V2.
Now
E(U2) = Var(U) + (EU)2= 2p+p2
and
E1
V2=Z
0
1
v2
1
Γ (q/2) 2q/2v(q/2)1ev/2dv =1
(q2)(q4).
Therefore,
EF2
p,q =q2
p2p(2 + p)1
(q2)(q4) =q2
p
(p+ 2)
(q2)(q4),
and, hence
Var(Fp,q) = q2(p+ 2)
p(q2)(q4) q2
(q2)2= 2 q
q22q+p2
p(q4) , q > 4.
c. Write X=U/p
V/p then 1
X=V/q
U/p Fq,p, since Uχ2
p,Vχ2
qand Uand Vare independent.
d. Let Y=(p/q)X
1+(p/q)X=pX
q+pX , so X=qY
p(1Y)and dx
dy =q
p(1 y)2. Thus, Yhas pdf
fY(y) = Γq+p
2
Γp
2Γq
2p
qp
2qy
p(1y)p2
2
1 + p
q
qy
p(1y)p+q
2
q
p(1 y)2
=hBp
2,q
2i1yp
21(1 y)q
21beta p
2,q
2.
5.18 If Xtp, then X=Z/pV /p where Zn(0,1), Vχ2
pand Zand Vare independent.
a. EX= EZ/pV/p = (EZ)(E1/pV/p) = 0, since EZ= 0, as long as the other expectation is
finite. This is so if p > 1. From part b), X2F1,p. Thus VarX= EX2=p/(p2), if p > 2
(from Exercise 5.17b).
b. X2=Z2/(V/p). Z2χ2
1, so the ratio is distributed F1,p.
c. The pdf of Xis
fX(x) = "Γ(p+1
2)
Γ(p/2)#1
(1 + x2/p)(p+1)/2.
Denote the quantity in square brackets by Cp. From an extension of Stirling’s formula
(Exercise 1.28) we have
lim
p→∞ Cp= lim
p→∞
2πp1
2p1
2+1
2ep1
2
2πp2
2p2
2+1
2ep2
2
1
=e1/2
πlim
p→∞ p1
2p1
2+1
2
p2
2p2
2+1
2p
=e1/2
π
e1/2
2,
by an application of Lemma 2.3.14. Applying the lemma again shows that for each x
lim
p→∞ 1+x2/p(p+1)/2=ex2/2,
establishing the result.
d. As the random variable F1,p is the square of a tp, we conjecture that it would converge to
the square of a n(0,1) random variable, a χ2
1.
e. The random variable qFq,p can be thought of as the sum of qrandom variables, each a tp
squared. Thus, by all of the above, we expect it to converge to a χ2
qrandom variable as
p→ ∞.
5.19 a. χ2
pχ2
q+χ2
dwhere χ2
qand χ2
dare independent χ2random variables with qand d=pq
degrees of freedom. Since χ2
dis a positive random variable, for any a > 0,
P(χp> a) = P(χ2
q+χ2
d> a)> P (χ2
q> a).
b. For k1> k2,k1Fk1(U+V)/(W), where U,Vand Ware independent and Uχ2
k2,
Vχ2
k1k2and Wχ2
ν. For any a > 0, because V/(W) is a positive random variable,
we have
P(k1Fk1> a) = P((U+V)/(W)> a)> P (U/(W)> a) = P(k2Fk2> a).
c. α=P(Fk,ν > Fα,k) = P(kFk> kFα,k). So, kFα,kis the αcutoff point for the random
variable kFk,ν . Because kFk,ν is stochastically larger that (k1)Fk1, the αcutoff for kFk,ν
is larger than the αcutoff for (k1)Fk1, that is kFα,k,ν >(k1)Fα,k1.
5.20 a. The given integral is
Z
0
1
2πet2x/2νx1
Γ(ν/2)2ν/2(νx)(ν/2)1eνx/2dx
=1
2π
νν/2
Γ(ν/2)2ν/2Z
0
et2x/2x((ν+1)/2)1eνx/2dx
=1
2π
νν/2
Γ(ν/2)2ν/2Z
0
x((ν+1)/2)1e(ν+t2)x/2dx integrand is kernel of
gamma((ν+1)/2,2/(ν+t2)
=1
2π
νν/2
Γ(ν/2)2ν/2Γ((ν+ 1)/2) 2
ν+t2(ν+1)/2
=1
νπ
Γ((ν+1)/2)
Γ(ν/2)
1
(1 + t2)(ν+1)/2,
the pdf of a tνdistribution.
b. Differentiate both sides with respect to tto obtain
νfF(νt) = Z
0
yf1(ty)fν(y)dy,
where fFis the Fpdf. Now write out the two chi-squared pdfs and collect terms to get
νfF(νt) = t1/2
Γ(1/2)Γ(ν/2)2(ν+1)/2Z
0
y(ν1)/2e(1+t)y/2dy
=t1/2
Γ(1/2)Γ(ν/2)2(ν+1)/2
Γ(ν+1
2)2(ν+1)/2
(1 + t)(ν+1)/2.
Now define y=νt to get
fF(y) = Γ( ν+1
2)
νΓ(1/2)Γ(ν/2)
(y)1/2
(1 + y)(ν+1)/2,
the pdf of an F1.
c. Again differentiate both sides with respect to t, write out the chi-squared pdfs, and collect
terms to obtain
(ν/m)fF((ν/m)t) = tm/2
Γ(m/2)Γ(ν/2)2(ν+m)/2Z
0
y(m+ν2)/2e(1+t)y/2dy.
Now, as before, integrate the gamma kernel, collect terms, and define y= (ν/m)tto get
fF(y) = Γ( ν+m
2)
Γ(m/2)Γ(ν/2) m
νm/2ym/21
(1 + (m/ν)y)(ν+m)/2,
the pdf of an Fm,ν .
5.21 Let mdenote the median. Then, for general nwe have
P(max(X1, . . . , Xn)> m)=1P(Ximfor i= 1,2, . . . , n)
= 1 [P(X1m)]n= 1 1
2n
.
5.22 Calculating the cdf of Z2, we obtain
FZ2(z) = P((min(X, Y ))2z) = P(zmin(X, Y )z)
=P(min(X, Y )z)P(min(X, Y )≤ −z)
= [1 P(min(X, Y )>z)] [1 P(min(X, Y )>z)]
=P(min(X, Y )>z)P(min(X, Y )>z)
=P(X > z)P(Y > z)P(X > z)P(Y > z),
where we use the independence of Xand Y. Since Xand Yare identically distributed, P(X >
a) = P(Y > a) = 1 FX(a), so
FZ2(z) = (1 FX(z))2(1 FX(z))2= 1 2FX(z),
since 1 FX(z) = FX(z). Differentiating and substituting gives
fZ2(z) = d
dz FZ2(z) = fX(z)1
z=1
2πez/2z1/2,
the pdf of a χ2
1random variable. Alternatively,
P(Z2z) = P[min(X, Y )]2z
=P(zmin(X, Y )z)
=P(zXz, X Y) + P(zYz, Y X)
=P(zXz|XY)P(XY)
+P(zYz|YX)P(YX)
=1
2P(zXz) + 1
2P(zYz),
using the facts that Xand Yare independent, and P(YX) = P(XY) = 1
2. Moreover,
since Xand Yare identically distributed
P(Z2z) = P(zXz)
and
fZ2(z) = d
dz P(zXz) = 1
2π(ez/21
2z1/2+ez/21
2z1/2)
=1
2πz1/2ez/2,
the pdf of a χ2
1.
5.23
P(Z > z) =
X
x=1
P(Z > z|x)P(X=x) =
X
x=1
P(U1> z, . . . , Ux> z|x)P(X=x)
=
X
x=1
x
Y
i=1
P(Ui> z)P(X=x) (by independence of the Ui’s)
=
X
x=1
P(Ui> z)xP(X=x) =
X
x=1
(1 z)x1
(e1)x!
=1
(e1)
X
x=1
(1 z)x
x!=e1z1
e10< z < 1.
5.24 Use fX(x)=1,FX(x) = x/θ, 0 < x < θ. Let Y=X(n),Z=X(1). Then, from Theorem
5.4.6,
fZ,Y (z, y) = n!
0!(n2)!0!
1
θ
1
θz
θ0yz
θn21y
θ0=n(n1)
θn(yz)n2,0< z < y < θ.
Now let W=Z/Y ,Q=Y. Then Y=Q,Z=W Q, and |J|=q. Therefore
fW,Q(w, q) = n(n1)
θn(qwq)n2q=n(n1)
θn(1 w)n2qn1,0< w < 1,0< q < θ.
The joint pdf factors into functions of wand q, and, hence, Wand Qare independent.
5.25 The joint pdf of X(1), . . . , X(n)is
f(u1, . . . , un) = n!an
θan ua1
1···ua1
n,0< u1<··· < un< θ.
Make the one-to-one transformation to Y1=X(1)/X(2), . . . , Yn1=X(n1)/X(n), Yn=X(n).
The Jacobian is J=y2y2
3···yn1
n. So the joint pdf of Y1, . . . , Ynis
f(y1, . . . , yn) = n!an
θan (y1···yn)a1(y2···yn)a1···(yn)a1(y2y2
3···yn1
n)
=n!an
θan ya1
1y2a1
2···yna1
n,0< yi<1; i= 1, . . . , n 1,0< yn< θ.
We see that f(y1, . . . , yn) factors so Y1, . . . , Ynare mutually independent. To get the pdf of
Y1, integrate out the other variables and obtain that fY1(y1) = c1ya1
1, 0 < y1<1, for some
constant c1. To have this pdf integrate to 1, it must be that c1=a. Thus fY1(y1) = aya1
1,
0< y1<1. Similarly, for i= 2, . . . , n 1, we obtain fYi(yi) = iayia1
i,0< yi<1.From
Theorem 5.4.4, the pdf of Ynis fYn(yn) = na
θna yna1
n, 0 < yn< θ. It can be checked that the
product of these marginal pdfs is the joint pdf given above.
5.27 a. fX(i)|X(j)(u|v) = fX(i),X(j)(u, v)/fX(j)(v). Consider two cases, depending on which of ior
jis greater. Using the formulas from Theorems 5.4.4 and 5.4.6, and after cancellation, we
obtain the following.
(i) If i < j,
fX(i)|X(j)(u|v) = (j1)!
(i1)!(j1i)!fX(u)Fi1
X(u)[FX(v)FX(u)]ji1F1j
X(v)
=(j1)!
(i1)!(j1i)!
fX(u)
FX(v)FX(u)
FX(v)i11FX(u)
FX(v)ji1
, u < v.
Note this interpretation. This is the pdf of the ith order statistic from a sample of size j1,
from a population with pdf given by the truncated distribution, f(u) = fX(u)/FX(v),
u<v.
(ii) If j < i and u>v,
fX(i)|X(j)(u|v)
=(nj)!
(n1)!(i1j)!fX(u) [1FX(u)]ni[FX(u)FX(v)]i1j[1FX(v)]jn
=(nj)!
(ij1)!(ni)!
fX(u)
1FX(v)FX(u)FX(v)
1FX(v)ij11FX(u)FX(v)
1FX(v)ni
.
This is the pdf of the (ij)th order statistic from a sample of size nj, from a population
with pdf given by the truncated distribution, f(u) = fX(u)/(1 FX(v)), u>v.
b. From Example 5.4.7,
fV|R(v|r) = n(n1)rn2/an
n(n1)rn2(ar)/an=1
ar, r/2< v < a r/2.
5.29 Let Xi= weight of ith booklet in package. The Xis are iid with EXi= 1 and VarXi=.052.
We want to approximate PP100
i=1 Xi>100.4=PP100
i=1 Xi/100 >1.004=P(¯
X > 1.004).
By the CLT, P(¯
X > 1.004) P(Z > (1.004 1)/(.05/10)) = P(Z > .8) = .2119.
5.30 From the CLT we have, approximately, X̄_1 ~ n(µ, σ^2/n) and X̄_2 ~ n(µ, σ^2/n). Since X̄_1 and X̄_2 are independent, X̄_1 - X̄_2 ~ n(0, 2σ^2/n). Thus, we want
   .99 ≈ P(|X̄_1 - X̄_2| < σ/5) = P( -(σ/5)/(σ√(2/n)) < (X̄_1 - X̄_2)/(σ√(2/n)) < (σ/5)/(σ√(2/n)) ) ≈ P( -(1/5)√(n/2) < Z < (1/5)√(n/2) ),
where Z ~ n(0, 1). Thus we need P(Z ≥ (1/5)√(n/2)) ≤ .005. From Table 1, (1/5)√(n/2) = 2.576, which implies n = 50(2.576)^2 ≈ 332.
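In R (an added check of the sample-size calculation):

ceiling(50 * qnorm(.995)^2)    # qnorm(.995) = 2.576, so n = 332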
5.31 We know that σ^2_X̄ = 9/100. Use Chebychev's Inequality to get
   P(-3k/10 < X̄ - µ < 3k/10) ≥ 1 - 1/k^2.
We need 1 - 1/k^2 = .9, which implies k = √10 = 3.16 and 3k/10 = .9487. Thus
   P(-.9487 < X̄ - µ < .9487) ≥ .9
by Chebychev's Inequality. Using the CLT, X̄ is approximately n(µ, σ^2_X̄) with σ_X̄ = √.09 = .3 and (X̄ - µ)/.3 ~ n(0, 1). Thus
   .9 = P(-1.645 < (X̄ - µ)/.3 < 1.645) = P(-.4935 < X̄ - µ < .4935).
Thus, we again see the conservativeness of Chebychev’s Inequality, yielding bounds on ¯
Xµ
that are almost twice as big as the normal approximation. Moreover, with a sample of size 100,
¯
Xis probably very close to normally distributed, even if the underlying Xdistribution is not
close to normal.
5.32 a. For any  > 0,
PpXna> =PpXnapXn+a> pXn+a
=P|Xna|> pXn+a
P|Xna|> a0,
as n→ ∞, since Xna in probability. Thus Xnain probability.
b. For any  > 0,
P
a
Xn1=Pa
1+Xna
1
=Paa
1+Xna+a
1
Paa
1+Xna+a
1+ a+a
1+< a +a
1
=P|Xna| ≤ a
1+1,
as n→ ∞, since Xna in probability. Thus a/Xn1 in probability.
c. S2
nσ2in probability. By a), Sn=pS2
nσ2=σin probability. By b), σ/Sn1 in
probability.
5.33 We must show that for all $\epsilon > 0$ there exists $N$ such that if $n > N$, then $P(X_n + Y_n > c) > 1 - \epsilon$. Choose $N_1$ such that for $n > N_1$, $P(X_n > -m) > 1 - \epsilon/2$, and $N_2$ such that for $n > N_2$, $P(Y_n > c + m) > 1 - \epsilon/2$. Then, for $n > \max(N_1, N_2)$,
$$P(X_n + Y_n > c) \ge P(X_n > -m,\ Y_n > c + m) \ge P(X_n > -m) + P(Y_n > c + m) - 1 > (1 - \epsilon/2) + (1 - \epsilon/2) - 1 = 1 - \epsilon.$$
5.34 Using $E\bar{X}_n = \mu$ and $\mathrm{Var}\,\bar{X}_n = \sigma^2/n$, we obtain
$$E\left[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\right] = \frac{\sqrt{n}}{\sigma}E(\bar{X}_n - \mu) = \frac{\sqrt{n}}{\sigma}(\mu - \mu) = 0,$$
$$\mathrm{Var}\left[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\right] = \frac{n}{\sigma^2}\mathrm{Var}(\bar{X}_n - \mu) = \frac{n}{\sigma^2}\mathrm{Var}\,\bar{X}_n = \frac{n}{\sigma^2}\frac{\sigma^2}{n} = 1.$$
5.35 a. $X_i \sim$ exponential(1), so $\mu_X = 1$ and $\mathrm{Var}\,X = 1$. From the CLT, $\bar{X}_n$ is approximately $\mathrm{n}(1, 1/n)$. So
$$\frac{\bar{X}_n - 1}{\sqrt{1/n}} \to Z \sim \mathrm{n}(0,1) \quad\text{and}\quad P\left(\frac{\bar{X}_n - 1}{\sqrt{1/n}} \le x\right) \to P(Z \le x).$$
b.
$$\frac{d}{dx}P(Z \le x) = \frac{d}{dx}F_Z(x) = f_Z(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}.$$
$$\frac{d}{dx}P\left(\frac{\bar{X}_n - 1}{\sqrt{1/n}} \le x\right) = \frac{d}{dx}P\left(\sum_{i=1}^n X_i \le x\sqrt{n} + n\right) \qquad\left(W = \sum_{i=1}^n X_i \sim \mathrm{gamma}(n, 1)\right)$$
$$= \frac{d}{dx}F_W(x\sqrt{n} + n) = f_W(x\sqrt{n} + n)\cdot\sqrt{n} = \frac{1}{\Gamma(n)}(x\sqrt{n} + n)^{n-1}e^{-(x\sqrt{n} + n)}\sqrt{n}.$$
Therefore, $\frac{1}{\Gamma(n)}(x\sqrt{n} + n)^{n-1}e^{-(x\sqrt{n} + n)}\sqrt{n} \to \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ as $n \to \infty$. Substituting $x = 0$ yields $n! \approx n^{n+1/2}e^{-n}\sqrt{2\pi}$.
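A small R check (not in the manual) of the resulting Stirling approximation $n! \approx n^{n+1/2}e^{-n}\sqrt{2\pi}$:
n <- 10
factorial(n)                            # 3628800
n^(n + 1/2) * exp(-n) * sqrt(2*pi)      # about 3598696; the ratio tends to 1 as n grows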
5.37 a. For the exact calculations, use the fact that Vnis itself distributed negative binomial(10r, p).
The results are summarized in the following table. Note that the recursion relation of problem
3.48 can be used to simplify calculations.
P(V_n = v)
 v      (a) Exact      (b) Normal Approx.      (c) Normal w/continuity correction
0 .0008 .0071 .0056
1 .0048 .0083 .0113
2 .0151 .0147 .0201
3 .0332 .0258 .0263
4 .0572 .0392 .0549
5 .0824 .0588 .0664
6 .1030 .0788 .0882
7 .1148 .0937 .1007
8 .1162 .1100 .1137
9 .1085 .1114 .1144
10 .0944 .1113 .1024
b. Using the normal approximation, we have $\mu_V = r(1-p)/p = 20(.3)/.7 = 8.57$ and $\sigma_V = \sqrt{r(1-p)/p^2} = \sqrt{(20)(.3)/.49} = 3.5$. Then,
$$P(V_n = 0) = 1 - P(V_n \ge 1) = 1 - P\left(\frac{V_n - 8.57}{3.5} \ge \frac{1 - 8.57}{3.5}\right) = 1 - P(Z \ge -2.16) = .0154.$$
Another way to approximate this probability is
$$P(V_n = 0) = P(V_n \le 0) = P\left(\frac{V - 8.57}{3.5} \le \frac{0 - 8.57}{3.5}\right) = P(Z \le -2.45) = .0071.$$
Continuing in this way we have $P(V = 1) = P(V \le 1) - P(V \le 0) = .0154 - .0071 = .0083$, etc.
c. With the continuity correction, compute $P(V = k)$ by $P\left(\frac{(k - .5) - 8.57}{3.5} \le Z \le \frac{(k + .5) - 8.57}{3.5}\right)$, so $P(V = 0) = P(-9.07/3.5 \le Z \le -8.07/3.5) = .0104 - .0048 = .0056$, etc. Notice that the continuity correction gives some improvement over the uncorrected normal approximation.
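The table in part (a) can be reproduced in R; the sketch below is not from the manual and assumes $V_n \sim$ negative binomial(20, .7), which matches the exact values quoted above.
r <- 20; p <- .7
mu <- r*(1 - p)/p; sig <- sqrt(r*(1 - p)/p^2)
v <- 0:10
exact  <- dnbinom(v, size = r, prob = p)
approx <- pnorm((v - mu)/sig) - pnorm((v - 1 - mu)/sig)
approx[1] <- pnorm((0 - mu)/sig)                 # P(V <= 0), as in part (b)
cc <- pnorm((v + .5 - mu)/sig) - pnorm((v - .5 - mu)/sig)
round(cbind(v, exact, approx, cc), 4)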
5.39 a. If $h$ is continuous, then given $\epsilon > 0$ there exists $\delta$ such that $|h(x_n) - h(x)| < \epsilon$ whenever $|x_n - x| < \delta$. Since $X_1, \dots, X_n$ converges in probability to the random variable $X$, $\lim_{n\to\infty} P(|X_n - X| < \delta) = 1$. Thus $\lim_{n\to\infty} P(|h(X_n) - h(X)| < \epsilon) = 1$.
b. Define the subsequence $X_j(s) = s + I_{[a,b]}(s)$ such that in $I_{[a,b]}$, $a$ is always 0, i.e., the subsequence $X_1, X_2, X_4, X_7, \dots$. For this subsequence
$$X_j(s) \to \begin{cases} s & \text{if } s > 0\\ s + 1 & \text{if } s = 0.\end{cases}$$
5.41 a. Let $\epsilon = |x - \mu|$.
(i) For $x - \mu \ge 0$,
$$P(|X_n - \mu| > \epsilon) = P(|X_n - \mu| > x - \mu) = P(X_n - \mu < -(x - \mu)) + P(X_n - \mu > x - \mu) \ge P(X_n - \mu > x - \mu) = P(X_n > x) = 1 - P(X_n \le x).$$
Therefore, $0 = \lim_{n\to\infty} P(|X_n - \mu| > \epsilon) \ge \lim_{n\to\infty}\left[1 - P(X_n \le x)\right]$. Thus $\lim_{n\to\infty} P(X_n \le x) \ge 1$.
(ii) For $x - \mu < 0$,
$$P(|X_n - \mu| > \epsilon) = P(|X_n - \mu| > -(x - \mu)) = P(X_n - \mu < x - \mu) + P(X_n - \mu > -(x - \mu)) \ge P(X_n - \mu < x - \mu) = P(X_n < x).$$
Therefore, $0 = \lim_{n\to\infty} P(|X_n - \mu| > \epsilon) \ge \lim_{n\to\infty} P(X_n < x)$.
By (i) and (ii) the result follows.
b. For every $\epsilon > 0$,
$$P(|X_n - \mu| > \epsilon) = P(X_n - \mu < -\epsilon) + P(X_n - \mu > \epsilon) = P(X_n < \mu - \epsilon) + 1 - P(X_n \le \mu + \epsilon) \to 0 \quad\text{as } n \to \infty.$$
5.43 a. $P(|Y_n - \theta| < \epsilon) = P\left(\sqrt{n}|Y_n - \theta| < \sqrt{n}\,\epsilon\right)$. Therefore,
$$\lim_{n\to\infty} P(|Y_n - \theta| < \epsilon) = \lim_{n\to\infty} P\left(\sqrt{n}|Y_n - \theta| < \sqrt{n}\,\epsilon\right) = P(|Z| < \infty) = 1,$$
where $Z \sim \mathrm{n}(0, \sigma^2)$. Thus $Y_n \to \theta$ in probability.
b. By Slutsky's Theorem (a), $g'(\theta)\sqrt{n}(Y_n - \theta) \to g'(\theta)X$ where $X \sim \mathrm{n}(0, \sigma^2)$. Therefore
$$\sqrt{n}\left[g(Y_n) - g(\theta)\right] = g'(\theta)\sqrt{n}(Y_n - \theta) \to \mathrm{n}\left(0, \sigma^2[g'(\theta)]^2\right).$$
5.45 We do part (a), the other parts are similar. Using Mathematica, the exact calculation is
In[120]:=
f1[x_]=PDF[GammaDistribution[4,25],x]
p1=Integrate[f1[x],{x,100,\[Infinity]}]//N
1-CDF[BinomialDistribution[300,p1],149]
Out[120]=
e^(-x/25) x^3/2343750
Out[121]=
0.43347
Out[122]=
0.0119389.
The answer can also be simulated in Mathematica or in R. Here is the R code for simulating
the same probability
p1<-mean(rgamma(10000,4,scale=25)>100)
mean(rbinom(10000, 300, p1)>149)
In each case 10,000 random variables were simulated. We obtained p1=0.438 and a binomial
probability of 0.0108.
5.47 a. $-2\log(U_j) \sim$ exponential(2) $\sim \chi^2_2$. Thus $Y$ is the sum of $\nu$ independent $\chi^2_2$ random variables. By Lemma 5.3.2(b), $Y \sim \chi^2_{2\nu}$.
b. $-\beta\log(U_j) \sim$ exponential$(\beta) \sim$ gamma$(1, \beta)$. Thus $Y$ is the sum of independent gamma$(1, \beta)$ random variables. By Example 4.6.8, $Y \sim$ gamma$(a, \beta)$.
c. Let $V = -\sum_{j=1}^a \log(U_j) \sim$ gamma$(a, 1)$. Similarly, $W = -\sum_{j=1}^b \log(U_j) \sim$ gamma$(b, 1)$. By Exercise 4.24, $\frac{V}{V+W} \sim$ beta$(a, b)$.
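A brief R illustration (not in the manual) of these constructions, drawing one variate of each type from uniforms; the parameter values are arbitrary examples.
set.seed(2)
nu <- 5; a <- 3; b <- 2; beta <- 1.5
y.chisq <- -2*sum(log(runif(nu)))                 # one chi squared(2*nu) draw, part (a)
y.gamma <- -beta*sum(log(runif(a)))               # one gamma(a, beta) draw, part (b)
V <- -sum(log(runif(a))); W <- -sum(log(runif(b)))
y.beta <- V/(V + W)                               # one beta(a, b) draw, part (c)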
5.49 a. See Example 2.1.4.
b. $X = g(U) = -\log\frac{1-U}{U}$. Then $g^{-1}(y) = \frac{1}{1+e^{-y}}$. Thus
$$f_X(y) = 1\cdot\frac{e^{-y}}{(1+e^{-y})^2} = \frac{e^{-y}}{(1+e^{-y})^2}, \quad -\infty < y < \infty,$$
which is the density of a logistic(0,1) random variable.
c. Let $Y \sim$ logistic$(\mu, \beta)$; then $f_Y(y) = \frac{1}{\beta}f_Z\left(\frac{y-\mu}{\beta}\right)$ where $f_Z$ is the density of a logistic(0,1). Then $Y = \beta Z + \mu$. To generate a logistic$(\mu, \beta)$ random variable: (i) generate $U \sim$ uniform(0,1); (ii) set $Y = \beta\log\frac{U}{1-U} + \mu$.
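A short R version of this generator (not in the manual); the function name is illustrative only.
rlogistic <- function(n, mu = 0, beta = 1) {
  u <- runif(n)
  beta*log(u/(1 - u)) + mu
}
y <- rlogistic(10000, mu = 2, beta = .5)   # compare with rlogis(10000, 2, .5)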
5.51 a. For $U_i \sim$ uniform(0,1), $EU_i = 1/2$ and $\mathrm{Var}\,U_i = 1/12$. Then
$$X = \sum_{i=1}^{12} U_i - 6 = 12\bar{U} - 6 = \frac{\bar{U} - 1/2}{\sqrt{1/12}/\sqrt{12}},$$
which is in the form $\sqrt{n}(\bar{U} - EU)/\sigma$ with $n = 12$, so $X$ is approximately $\mathrm{n}(0,1)$ by the Central Limit Theorem.
b. The approximation does not have the same range as $Z \sim \mathrm{n}(0,1)$, where $-\infty < Z < +\infty$, since $-6 < X < 6$.
c.
$$EX = E\left(\sum_{i=1}^{12} U_i - 6\right) = \sum_{i=1}^{12} EU_i - 6 = 12\left(\frac{1}{2}\right) - 6 = 6 - 6 = 0.$$
$$\mathrm{Var}\,X = \mathrm{Var}\left(\sum_{i=1}^{12} U_i - 6\right) = \mathrm{Var}\left(\sum_{i=1}^{12} U_i\right) = 12\,\mathrm{Var}\,U_1 = 1.$$
$EX^3 = 0$ since $X$ is symmetric about 0. (In fact, all odd moments of $X$ are 0.) Thus, the first three moments of $X$ all agree with the first three moments of a $\mathrm{n}(0,1)$. The fourth moment is not easy to get; one way to do it is to get the mgf of $X$. Since $Ee^{tU} = (e^t - 1)/t$,
$$E\,e^{t\left(\sum_{i=1}^{12} U_i - 6\right)} = e^{-6t}\left(\frac{e^t - 1}{t}\right)^{12} = \left(\frac{e^{t/2} - e^{-t/2}}{t}\right)^{12}.$$
Computing the fourth derivative and evaluating it at $t = 0$ gives $EX^4$. This is a lengthy calculation. The answer is $EX^4 = 29/10$, slightly smaller than $EZ^4 = 3$, where $Z \sim \mathrm{n}(0,1)$.
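A quick R simulation (not in the manual) of the first four moments of X:
set.seed(3)
x <- replicate(100000, sum(runif(12)) - 6)
c(mean(x), var(x), mean(x^3), mean(x^4))   # approximately 0, 1, 0, and 2.9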
5.53 The R code is the following:
a. obs <- rbinom(1000,8,2/3)
meanobs <- mean(obs)
variance <- var(obs)
hist(obs)
Output:
> meanobs
[1] 5.231
> variance
[1] 1.707346
b. obs<- rhyper(1000,8,2,4)
meanobs <- mean(obs)
variance <- var(obs)
hist(obs)
Output:
> meanobs
[1] 3.169
> variance
[1] 0.4488879
c. obs <- rnbinom(1000,5,1/3)
meanobs <- mean(obs)
variance <- var(obs)
hist(obs)
Output:
> meanobs
[1] 10.308
> variance
[1] 29.51665
5.55 Let $X$ denote the number of comparisons. Then
$$EX = \sum_{k=0}^\infty P(X > k) = 1 + \sum_{k=1}^\infty P\left(U > F_Y(y_{k-1})\right) = 1 + \sum_{k=1}^\infty\left(1 - F_Y(y_{k-1})\right) = 1 + \sum_{k=0}^\infty\left(1 - F_Y(y_k)\right) = 1 + EY.$$
5.57 a. $\mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}(X_1 + X_3, X_2 + X_3) = \mathrm{Cov}(X_3, X_3) = \lambda_3$, since $X_1$, $X_2$ and $X_3$ are independent.
b. Define
$$Z_i = \begin{cases} 1 & \text{if } X_i = X_3 = 0\\ 0 & \text{otherwise,}\end{cases}$$
so that $p_i = P(Z_i = 1) = P(Y_i = 0) = P(X_i = 0, X_3 = 0) = e^{-(\lambda_i + \lambda_3)}$. Therefore the $Z_i$ are Bernoulli$(p_i)$ with $E[Z_i] = p_i$, $\mathrm{Var}(Z_i) = p_i(1 - p_i)$, and
$$E[Z_1Z_2] = P(Z_1 = 1, Z_2 = 1) = P(Y_1 = 0, Y_2 = 0) = P(X_1 + X_3 = 0, X_2 + X_3 = 0) = P(X_1 = 0)P(X_2 = 0)P(X_3 = 0) = e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3}.$$
Therefore,
$$\mathrm{Cov}(Z_1, Z_2) = E[Z_1Z_2] - E[Z_1]E[Z_2] = e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3} - e^{-(\lambda_1 + \lambda_3)}e^{-(\lambda_2 + \lambda_3)} = e^{-(\lambda_1 + \lambda_3)}e^{-(\lambda_2 + \lambda_3)}\left(e^{\lambda_3} - 1\right) = p_1 p_2\left(e^{\lambda_3} - 1\right).$$
Thus $\mathrm{Corr}(Z_1, Z_2) = \dfrac{p_1 p_2\left(e^{\lambda_3} - 1\right)}{\sqrt{p_1(1 - p_1)p_2(1 - p_2)}}$.
c. $E[Z_1Z_2] \le p_i$ for $i = 1, 2$; therefore
$$\mathrm{Cov}(Z_1, Z_2) = E[Z_1Z_2] - E[Z_1]E[Z_2] \le p_1 - p_1 p_2 = p_1(1 - p_2), \quad\text{and likewise}\quad \mathrm{Cov}(Z_1, Z_2) \le p_2(1 - p_1).$$
Therefore,
$$\mathrm{Corr}(Z_1, Z_2) \le \frac{p_1(1 - p_2)}{\sqrt{p_1(1 - p_1)p_2(1 - p_2)}} = \frac{\sqrt{p_1(1 - p_2)}}{\sqrt{p_2(1 - p_1)}} \quad\text{and}\quad \mathrm{Corr}(Z_1, Z_2) \le \frac{p_2(1 - p_1)}{\sqrt{p_1(1 - p_1)p_2(1 - p_2)}} = \frac{\sqrt{p_2(1 - p_1)}}{\sqrt{p_1(1 - p_2)}},$$
which implies the result.
5.59
$$P(Y \le y) = P\left(V \le y \,\Big|\, U < \tfrac{1}{c}f_Y(V)\right) = \frac{P\left(V \le y,\ U < \tfrac{1}{c}f_Y(V)\right)}{P\left(U < \tfrac{1}{c}f_Y(V)\right)} = \frac{\int_0^y\int_0^{f_Y(v)/c} du\,dv}{1/c} = \frac{\tfrac{1}{c}\int_0^y f_Y(v)\,dv}{1/c} = \int_0^y f_Y(v)\,dv.$$
5.61 a. M= supy
Γ(a+b)
Γ(a)Γ(b)ya1(1y)b1
Γ([a]+[b])
Γ([a])Γ([b]) y[a]1(1y)[b]1<, since a[a]>0 and b[b]>0 and y(0,1).
b. M= supy
Γ(a+b)
Γ(a)Γ(b)ya1(1y)b1
Γ([a]+b)
Γ([a])Γ(b)y[a]1(1y)b1<, since a[a]>0 and y(0,1).
c. M= supy
Γ(a+b)
Γ(a)Γ(b)ya1(1y)b1
Γ([a]+1+β)
Γ([a]+1)Γ(b0)y[a]+11(1y)b01<, since a[a]1<0 and y(0,1). bb0>0
when b0= [b] and will be equal to zero when b0=b, thus it does not affect the result.
d. Let f(y) = yα(1 y)β. Then
df(y)
dy =αyα1(1 y)βyαβ(1 y)β1=yα1(1 y)β1[α(1 y) + βy]
which is maximize at y=α
α+β. Therefore for, α=aa0and β=bb0
M=
Γ(a+b)
Γ(a)Γ(b)
Γ(a0+b0)
Γ(a0)Γ(b0)aa0
aa0+bb0aa0bb0
aa0+bb0bb0
.
We need to minimize Min a0and b0. First consider aa0
aa0+bb0aa0bb0
aa0+bb0bb0. Let
c=α+β, then this term becomes α
cαcα
ccα. This term is maximize at α
c=1
2, this
is at α=1
2c. Then M= (1
2)(aa0+bb0)
Γ(a+b)
Γ(a)Γ(b)
Γ(a0+b0)
Γ(a0)Γ(b0)
. Note that the minimum that Mcould be
is one, which it is attain when a=a0and b=b0. Otherwise the minimum will occur when
aa0and bb0are minimum but greater or equal than zero, this is when a0= [a] and
b0= [b] or a0=aand b0= [b] or a0= [a] and b0=b.
5.63 $M = \sup_y \dfrac{\frac{1}{\sqrt{2\pi}}e^{-y^2/2}}{\frac{1}{2\lambda}e^{-|y|/\lambda}}$. Let $f(y) = -\frac{y^2}{2} + \frac{|y|}{\lambda}$. Then $f(y)$ is maximized at $y = \frac{1}{\lambda}$ when $y \ge 0$ and at $y = -\frac{1}{\lambda}$ when $y < 0$. Therefore, in both cases,
$$M = \frac{\frac{1}{\sqrt{2\pi}}e^{-1/2\lambda^2}}{\frac{1}{2\lambda}e^{-1/\lambda^2}}.$$
To minimize $M$, let $M' = \lambda e^{1/2\lambda^2}$. Then $\frac{d\log M'}{d\lambda} = \frac{1}{\lambda} - \frac{1}{\lambda^3}$, so $M$ is minimized at $\lambda = 1$ or $\lambda = -1$. Thus the value of $\lambda$ that will optimize the algorithm is $\lambda = 1$.
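A compact R sketch (not in the manual) of the resulting accept/reject generator with lambda = 1; the function name and structure are illustrative only.
set.seed(4)
M <- sqrt(2/pi)*exp(1/2)      # the optimized bound, about 1.3155
rnorm.ar <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- sample(c(-1, 1), n, replace = TRUE)*rexp(n)   # double exponential candidates
    u <- runif(n)
    ratio <- dnorm(y)/(0.5*exp(-abs(y)))               # target density / candidate density
    out <- c(out, y[u < ratio/M])
  }
  out[1:n]
}
z <- rnorm.ar(10000)          # approximately n(0,1); the acceptance rate is 1/M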
5.65
$$P(X \le x) = \sum_{i=1}^m P(X \le x\,|\,q_i)\,q_i = \sum_{i=1}^m I(Y_i \le x)\,q_i = \frac{\frac{1}{m}\sum_{i=1}^m \frac{f(Y_i)}{g(Y_i)}I(Y_i \le x)}{\frac{1}{m}\sum_{i=1}^m \frac{f(Y_i)}{g(Y_i)}} \;\xrightarrow{\ m\to\infty\ }\; \frac{E_g\!\left[\frac{f(Y)}{g(Y)}I(Y \le x)\right]}{E_g\!\left[\frac{f(Y)}{g(Y)}\right]} = \frac{\int_{-\infty}^x \frac{f(y)}{g(y)}g(y)\,dy}{\int_{-\infty}^\infty \frac{f(y)}{g(y)}g(y)\,dy} = \int_{-\infty}^x f(y)\,dy.$$
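A minimal R illustration (not from the manual) of this weighted resampling scheme, taking f to be the standard normal density and g a Cauchy candidate purely as an example:
set.seed(5)
m <- 100000
y <- rcauchy(m)                                   # draws from g
w <- dnorm(y)/dcauchy(y)                          # importance weights f/g
x <- sample(y, 10000, replace = TRUE, prob = w/sum(w))
# the resampled x are approximately distributed according to f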
5.67 An R code to generate the sample of size 100 from the specified distribution is shown for part c). The Metropolis Algorithm is used to generate 2000 variables. Among other options, one can choose the 100 variables in positions 1001 to 1100 or the ones in positions 1010, 1020, ..., 2000.
a. We want to generate $X = \sigma Z + \mu$ where $Z \sim$ Student's $t$ with $\nu$ degrees of freedom. Therefore we first can generate a sample of size 100 from a Student's $t$ distribution with $\nu$ degrees of freedom and then make the transformation to obtain the $X$'s. Thus
$$f_Z(z) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\frac{1}{\sqrt{\nu\pi}}\frac{1}{\left(1 + z^2/\nu\right)^{(\nu+1)/2}}.$$
Let $V \sim \mathrm{n}\left(0, \frac{\nu}{\nu-2}\right)$, since given $\nu$ we can set
$$EV = EZ = 0, \quad\text{and}\quad \mathrm{Var}(V) = \mathrm{Var}(Z) = \frac{\nu}{\nu-2}.$$
Now, follow the algorithm on page 254 and generate the sample $Z_1, Z_2, \dots, Z_{100}$ and then calculate $X_i = \sigma Z_i + \mu$.
b. $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\frac{e^{-(\log x - \mu)^2/2\sigma^2}}{x}$. Let $V \sim$ gamma$(\alpha, \beta)$ where
$$\alpha = \frac{\left(e^{\mu + \sigma^2/2}\right)^2}{e^{2(\mu + \sigma^2)} - e^{2\mu + \sigma^2}}, \quad\text{and}\quad \beta = \frac{e^{2(\mu + \sigma^2)} - e^{2\mu + \sigma^2}}{e^{\mu + \sigma^2/2}},$$
since given $\mu$ and $\sigma^2$ we can set
$$EV = \alpha\beta = e^{\mu + \sigma^2/2} = EX \quad\text{and}\quad \mathrm{Var}(V) = \alpha\beta^2 = e^{2(\mu + \sigma^2)} - e^{2\mu + \sigma^2} = \mathrm{Var}(X).$$
Now, follow the algorithm on page 254.
c. $f_X(x) = \frac{\alpha}{\beta}e^{-x^\alpha/\beta}x^{\alpha-1}$. Let $V \sim$ exponential$(\beta)$. Now, follow the algorithm on page 254, where
$$\rho_i = \min\left\{\left(\frac{V_i}{Z_{i-1}}\right)^{\alpha-1}e^{\left(-V_i^\alpha + V_i - Z_{i-1} + Z_{i-1}^\alpha\right)/\beta},\ 1\right\}.$$
An R code to generate a sample of size 100 from a Weibull(3,2) is:
#initialize a and b
b <- 2
a <- 3
Z <- rexp(1,1/b)
ranvars <- matrix(c(Z),byrow=T,ncol=1)
for( i in seq(2000))
{
  U <- runif(1,min=0,max=1)
  V <- rexp(1,1/b)
  p <- pmin((V/Z)^(a-1)*exp((-V^a+V-Z+Z^a)/b),1)
  if (U <= p)
    Z <- V
  ranvars <- cbind(ranvars,Z)
}
#One option: choose the elements in positions 1001,1002,...,1100 to be the sample
vector.1 <- ranvars[1001:1100]
mean(vector.1)
var(vector.1)
#Another option: choose the elements in positions 1010,1020,...,2000 to be the sample
vector.2 <- ranvars[seq(1010,2000,10)]
mean(vector.2)
var(vector.2)
Output:
[1] 1.048035
[1] 0.1758335
[1] 1.130649
[1] 0.1778724
5.69 Let w(v, z) = fY(v)fV(z)
fV(v)fY(z), and then ρ(v, z) = min{w(v, z),1}. We will show that
ZifYP(Zi+1 a) = P(Ya).
Write
P(Zi+1 a) = P(Vi+1 aand Ui+1 ρi+1) + P(Ziaand Ui+1 > ρi+1).
Since ZifY, suppressing the unnecessary subscripts we can write
P(Zi+1 a) = P(Vaand Uρ(V, Y )) + P(Yaand U > ρ(V, Y )).
Add and subtract P(Yaand Uρ(V, Y )) to get
P(Zi+1 a) = P(Ya) + P(Vaand Uρ(V, Y ))
P(Yaand Uρ(V, Y )).
Thus we need to show that
P(Vaand Uρ(V, Y )) = P(Yaand Uρ(V, Y )).
Write out the probability as
P(Vaand Uρ(V, Y ))
=Za
−∞ Z
−∞
ρ(v, y)fY(y)fV(v)dydv
=Za
−∞ Z
−∞
I(w(v, y)1) fY(v)fV(y)
fV(v)fY(y)fY(y)fV(v)dydv
+Za
−∞ Z
−∞
I(w(v, y)1)fY(y)fV(v)dydv
=Za
−∞ Z
−∞
I(w(v, y)1)fY(v)fV(y)dydv
+Za
−∞ Z
−∞
I(w(v, y)1)fY(y)fV(v)dydv.
Now, notice that w(v, y) = 1/w(y, v), and thus first term above can be written
Za
−∞ Z
−∞
I(w(v, y)1)fY(v)fV(y)dydv
=Za
−∞ Z
−∞
I(w(y, v)>1)fY(v)fV(y)dydv
=P(Ya, ρ(V, Y ) = 1, U ρ(V, Y )).
The second term is
Za
−∞ Z
−∞
I(w(v, y)1)fY(y)fV(v)dydv
=Za
−∞ Z
−∞
I(w(y, v)1)fY(y)fV(v)dydv
=Za
−∞ Z
−∞
I(w(y, v)1) fV(y)fY(v)
fV(y)fY(v)fY(y)fV(v)dydv
=Za
−∞ Z
−∞
I(w(y, v)1) fY(y)fV(v)
fV(y)fY(v)fV(y)fY(v)dydv
=Za
−∞ Z
−∞
I(w(y, v)1)w(y, v)fV(y)fY(v)dydv
=P(Ya, U ρ(V, Y ), ρ(V, Y )1).
Putting it all together we have
P(Vaand Uρ(V, Y )) = P(Ya, ρ(V, Y ) = 1, U ρ(V, Y ))
+P(Ya, U ρ(V, Y ), ρ(V, Y )1)
=P(Yaand Uρ(V, Y )),
and hence
P(Zi+1 a) = P(Ya),
so fYis the stationary density.
Chapter 6
Principles of Data Reduction
6.1 By the Factorization Theorem, $|X|$ is sufficient because the pdf of $X$ is
$$f(x|\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2} = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-|x|^2/2\sigma^2} = g(|x|\,|\,\sigma^2)\cdot h(x), \quad\text{with } h(x) = 1.$$
6.2 By the Factorization Theorem, T(X) = mini(Xi/i) is sufficient because the joint pdf is
f(x1, . . . , xn|θ) =
n
Y
i=1
exiI(,+)(xi) = einθI(θ,+)(T(x))
| {z }
g(T(x)|θ)
·eΣixi
| {z }
h(x)
.
Notice, we use the fact that i > 0, and the fact that all xis> iθ if and only if mini(xi/i)> θ.
6.3 Let x(1) = minixi. Then the joint pdf is
f(x1, . . . , xn|µ, σ) =
n
Y
i=1
1
σe(xiµ)I(µ,)(xi) = eµ/σ
σn
eΣixiI(µ,)(x(1))
| {z }
g(x(1),Σixi|µ,σ)
·1
|{z}
h(x)
.
Thus, by the Factorization Theorem, X(1),PiXiis a sufficient statistic for (µ, σ).
6.4 The joint pdf is
n
Y
j=1 (h(xj)c(θ) exp k
X
i=1
wi(θ)ti(xj)!)=c(θ)nexp
k
X
i=1
wi(θ)
n
X
j=1
ti(xj)
| {z }
g(T(x)|θ)
·
n
Y
j=1
h(xj)
| {z }
h(x)
.
By the Factorization Theorem, Pn
j=1 t1(Xj), . . . , Pn
j=1 tk(Xj)is a sufficient statistic for θ.
6.5 The sample density is given by
n
Y
i=1
f(xi|θ) =
n
Y
i=1
1
2I(i(θ1) xii(θ+ 1))
=1
2θn n
Y
i=1
1
i!Imin xi
i≥ −(θ1)Imax xi
iθ+ 1.
Thus (min Xi/i, max Xi/i) is sufficient for θ.
6.6 The joint pdf is given by
f(x1, . . . , xn|α, β) =
n
Y
i=1
1
Γ(α)βαxiα1exi=1
Γ(α)βαn n
Y
i=1
xi!α1
eΣixi.
By the Factorization Theorem, (Qn
i=1 Xi,Pn
i=1 Xi) is sufficient for (α, β).
6.7 Let x(1) = mini{x1, . . . , xn},x(n)= maxi{x1, . . . , xn},y(1) = mini{y1, . . . , yn}and y(n)=
maxi{y1, . . . , yn}. Then the joint pdf is
f(x,y|θ)
=
n
Y
i=1
1
(θ3θ1)(θ4θ2)I(θ13)(xi)I(θ24)(yi)
=1
(θ3θ1)(θ4θ2)n
I(θ1,)(x(1))I(−∞3)(x(n))I(θ2,)(y(1))I(−∞4)(y(n))
| {z }
g(T(x)|θ)
·1
|{z}
h(x)
.
By the Factorization Theorem, X(1), X(n), Y(1), Y(n)is sufficient for (θ1, θ2, θ3, θ4).
6.9 Use Theorem 6.2.13.
a.
f(x|θ)
f(y|θ)=(2π)n/2eΣi(xiθ)2/2
(2π)n/2eΣi(yiθ)2/2= exp (1
2" n
X
i=1
x2
i
n
X
i=1
y2
i!+2θn(¯y¯x)#).
This is constant as a function of θif and only if ¯y= ¯x; therefore ¯
Xis a minimal sufficient
statistic for θ.
b. Note, for Xlocation exponential(θ), the range depends on the parameter. Now
f(x|θ)
f(y|θ)=Qn
i=1 e(xiθ)I(θ,)(xi)
Qn
i=1 e(yiθ)I(θ,)(yi)
=eeΣixiQn
i=1 I(θ,)(xi)
eeΣiyiQn
i=1 I(θ,)(yi)=eΣixiI(θ,)(min xi)
eΣiyiI(θ,)(min yi).
To make the ratio independent of θwe need the ratio of indicator functions independent
of θ. This will be the case if and only if min{x1, . . . , xn}= min{y1, . . . , yn}. So T(X) =
min{X1, . . . , Xn}is a minimal sufficient statistic.
c.
f(x|θ)
f(y|θ)=eΣi(xiθ)
Qn
i=1 1 + e(xiθ)2Qn
i=1 1 + e(yiθ)2
eΣi(yiθ)
=eΣi(yixi) Qn
i=1 1 + e(yiθ)
Qn
i=1 1 + e(xiθ)!2
.
This is constant as a function of θif and only if xand yhave the same order statistics.
Therefore, the order statistics are minimal sufficient for θ.
d. This is a difficult problem. The order statistics are a minimal sufficient statistic.
e. Fix sample points xand y. Define A(θ) = {i:xiθ},B(θ) = {i:yiθ},a(θ) = the
number of elements in A(θ) and b(θ) = the number of elements in B(θ). Then the function
f(x|θ)/f(y|θ) depends on θonly through the function
n
X
i=1 |xiθ| −
n
X
i=1 |yiθ|
=X
iA(θ)
(θxi) + X
iA(θ)c
(xiθ)X
iB(θ)
(θyi)X
iB(θ)c
(yiθ)
= (a(θ)[na(θ)] b(θ)+[nb(θ)])θ
+
X
iA(θ)
xi+X
iA(θ)c
xi+X
iB(θ)
yiX
iB(θ)c
yi
= 2(a(θ)b(θ))θ+
X
iA(θ)
xi+X
iA(θ)c
xi+X
iB(θ)
yiX
iB(θ)c
yi
.
Consider an interval of θs that does not contain any xis or yis. The second term is constant
on such an interval. The first term will be constant, on the interval if and only if a(θ) = b(θ).
This will be true for all such intervals if and only if the order statistics for xare the same
as the order statistics for y. Therefore, the order statistics are a minimal sufficient statistic.
6.10 To prove T(X) = (X(1), X(n)) is not complete, we want to find g[T(X)] such that E g[T(X)] = 0
for all θ, but g[T(X)] 6≡ 0 . A natural candidate is R=X(n)X(1), the range of X, because by
Example 6.2.17 its distribution does not depend on θ. From Example 6.2.17, Rbeta(n1,2).
Thus E R= (n1)/(n+ 1) does not depend on θ, and E(RER) = 0 for all θ. Thus
g[X(n), X(1)] = X(n)X(1) (n1)/(n+ 1) = RERis a nonzero function whose expected
value is always 0. So, (X(1), X(n)) is not complete. This problem can be generalized to show
that if a function of a sufficient statistic is ancillary, then the sufficient statistic is not complete,
because the expectation of that function does not depend on θ. That provides the opportunity
to construct an unbiased, nonzero estimator of zero.
6.11 a. These are all location families. Let Z(1), . . . , Z(n)be the order statistics from a random
sample of size nfrom the standard pdf f(z|0). Then (Z(1) +θ, . . . , Z(n)+θ) has the same
joint distribution as (X(1), . . . , X(n)), and (Y(1), . . . , Y(n1)) has the same joint distribution
as (Z(n)+θ(Z(1) +θ), . . . , Z(n)+θ(Z(n1) +θ)) = (Z(n)Z(1), . . . , Z(n)Z(n1)).
The last vector depends only on (Z1, . . . , Zn) whose distribution does not depend on θ. So,
(Y(1), . . . , Y(n1)) is ancillary.
b. For a), Basu’s lemma shows that (Y1, . . . ,Yn1) is independent of the complete sufficient
statistic. For c), d), and e) the order statistics are sufficient, so (Y1, . . . ,Yn1) is not inde-
pendent of the sufficient statistic. For b), X(1) is sufficient. Define Yn=X(1). Then the joint
pdf of (Y1, . . . ,Yn) is
f(y1, . . . , yn) = n!en(y1θ)e(n1)yn
n1
Y
i=2
eyi,0< yn1< yn2<··· < y1
0< yn<.
Thus, Yn=X(1) is independent of (Y1, . . . , Yn1).
6.12 a. Use Theorem 6.2.13 and write
f(x, n|θ)
f(y, n0|θ)=f(x|θ, N =n)P(N=n)
f(y|θ, N =n0)P(N=n0)
=n
xθx(1θ)nxpn
n0
yθy(1θ)n0ypn0
=θxy(1 θ)nn0x+yn
xpn
n0
ypn0
.
The last ratio does not depend on θ. The other terms are constant as a function of θif and
only if n=n0and x=y. So (X, N ) is minimal sufficient for θ. Because P(N=n) = pn
does not depend on θ,Nis ancillary for θ. The point is that although Nis independent of
θ, the minimal sufficient statistic contains Nin this case. A minimal sufficient statistic may
contain an ancillary statistic.
b.
EX
N= E EX
NN = E 1
NE (X|N)= E 1
NNθ= E(θ) = θ.
VarX
N= VarEX
NN+ EVar X
NN = Var(θ)+E1
N2Var (X|N)
= 0 + E Nθ(1θ)
N2=θ(1 θ)E 1
N.
We used the fact that X|Nbinomial(N, θ).
6.13 Let Y1= log X1and Y2= log X2. Then Y1and Y2are iid and, by Theorem 2.1.5, the pdf of
each is
f(y|α) = αexp {αy eαy}=1
1exp y
1ey/(1),−∞ < y < .
We see that the family of distributions of Yiis a scale family with scale parameter 1. Thus,
by Theorem 3.5.6, we can write Yi=1
αZi, where Z1and Z2are a random sample from f(z|1).
Then logX1
logX2
=Y1
Y2
=(1)Z1
(1)Z2
=Z1
Z2
.
Because the distribution of Z1/Z2does not depend on α, (log X1)/(log X2) is an ancillary
statistic.
6.14 Because X1, . . . , Xnis from a location family, by Theorem 3.5.6, we can write Xi=Zi+µ, where
Z1, . . . , Znis a random sample from the standard pdf, f(z), and µis the location parameter. Let
M(X) denote the median calculated from X1, . . . , Xn. Then M(X) = M(Z)+µand ¯
X=¯
Z+µ.
Thus, M(X)¯
X= (M(Z) + µ)(¯
Z+µ) = M(Z)¯
Z. Because M(X)¯
Xis a function of
only Z1, . . . , Zn, the distribution of M(X)¯
Xdoes not depend on µ; that is, M(X)¯
Xis an
ancillary statistic.
6.15 a. The parameter space consists only of the points (θ, ν) on the graph of the function ν=2.
This quadratic graph is a line and does not contain a two-dimensional open set.
b. Use the same factorization as in Example 6.2.9 to show ( ¯
X, S2) is sufficient. E(S2) = 2
and E( ¯
X2) = Var ¯
X+ (E ¯
X)2=2/n +θ2= (a+n)θ2/n. Therefore,
En
a+n¯
X2S2
a=n
a+na+n
nθ21
a2= 0,for all θ.
Thus g(¯
X, S2) = n
a+n¯
X2S2
ahas zero expectation so ( ¯
X, S2) not complete.
6.17 The population pmf is f(x|θ) = θ(1 θ)x1=θ
1θelog(1θ)x, an exponential family with t(x) =
x. Thus, PiXiis a complete, sufficient statistic by Theorems 6.2.10 and 6.2.25. PiXin
negative binomial(n, θ).
6.18 The distribution of Y=PiXiis Poisson(). Now
Eg(Y) =
X
y=0
g(y)()ye
y!.
If the expectation exists, this is an analytic function which cannot be identically zero.
6.19 To check if the family of distributions of Xis complete, we check if Epg(X) = 0 for all p,
implies that g(X)0. For Distribution 1,
Epg(X) =
2
X
x=0
g(x)P(X=x) = pg(0) + 3pg(1) + (1 4p)g(2).
Note that if g(0) = 3g(1) and g(2) = 0, then the expectation is zero for all p, but g(x) need
not be identically zero. Hence the family is not complete. For Distribution 2 calculate
Epg(X) = g(0)p+g(1)p2+g(2)(1 pp2) = [g(1) g(2)]p2+ [g(0) g(2)]p+g(2).
This is a polynomial of degree 2 in p. To make it zero for all peach coefficient must be zero.
Thus, g(0) = g(1) = g(2) = 0, so the family of distributions is complete.
6.20 The pdfs in b), c), and e) are exponential families, so they have complete sufficient statistics
from Theorem 6.2.25. For a), Y= max{Xi}is sufficient and
f(y) = 2n
θ2ny2n1,0< y < θ.
For a function g(y),
Eg(Y) = Zθ
0
g(y)2n
θ2ny2n1dy = 0 for all θimplies g(θ)22n1
θ2n= 0 for all θ
by taking derivatives. This can only be zero if g(θ) = 0 for all θ, so Y= max{Xi}is complete.
For d), the order statistics are minimal sufficient. This is a location family. Thus, by Example
6.2.18 the range R=X(n)X(1) is ancillary, and its expectation does not depend on θ. So
this sufficient statistic is not complete.
6.21 a. Xis sufficient because it is the data. To check completeness, calculate
Eg(X) = θ
2g(1) + (1 θ)g(0) + θ
2g(1).
If g(1) = g(1) and g(0) = 0, then Eg(X) = 0 for all θ, but g(x) need not be identically 0.
So the family is not complete.
b. |X|is sufficient by Theorem 6.2.6, because f(x|θ) depends on xonly through the value of
|x|. The distribution of |X|is Bernoulli, because P(|X|= 0) = 1 θand P(|X|= 1) = θ.
By Example 6.2.22, a binomial family (Bernoulli is a special case) is complete.
c. Yes, f(x|θ) = (1 θ)(θ/(2(1 θ))|x|= (1 θ)e|x|log[θ/(2(1θ)], the form of an exponential
family.
6.22 a. The sample density is Qiθxθ1
i=θn(Qixi)θ1, so QiXiis sufficient for θ, not PiXi.
b. Because Qif(xi|θ) = θne(θ1) log(Πixi), log (QiXi) is complete and sufficient by Theorem
6.2.25. Because QiXiis a one-to-one function of log (QiXi), QiXiis also a complete
sufficient statistic.
6.23 Use Theorem 6.2.13. The ratio
f(x|θ)
f(y|θ)=θnI(x(n)/2,x(1))(θ)
θnI(y(n)/2,y(1))(θ)
is constant (in fact, one) if and only if x(1) =y(1) and x(n)=y(n). So (X(1), X(n)) is a
minimal sufficient statistic for θ. From Exercise 6.10, we know that if a function of the sufficient
statistics is ancillary, then the sufficient statistic is not complete. The uniform(θ, 2θ) family is
a scale family, with standard pdf f(z)uniform(1,2). So if Z1, . . . , Znis a random sample
from a uniform(1,2) population, then X1=θZ1, . . . , Xn=θZnis a random sample from a
uniform(θ, 2θ) population, and X(1) =θZ(1) and X(n)=θZ(n). So X(1)/X(n)=Z(1)/Z(n), a
statistic whose distribution does not depend on θ. Thus, as in Exercise 6.10, (X(1), X(n)) is not
complete.
6.24 If λ= 0, Eh(X) = h(0). If λ= 1,
Eh(X) = e1h(0) + e1
X
x=1
h(x)
x!.
Let h(0) = 0 and P
x=1
h(x)
x!= 0, so Eh(X) = 0 but h(x)6≡ 0. (For example, take h(0) = 0,
h(1) = 1, h(2) = 2, h(x) = 0 for x3 .)
6.25 Using the fact that (n1)s2
x=Pix2
in¯x2, for any (µ, σ2) the ratio in Example 6.2.14 can
be written as
f(x|µ, σ2)
f(y|µ, σ2)= exp "µ
σ2 X
i
xiX
i
yi!1
2σ2 X
i
x2
iX
i
y2
i!#.
a. Do part b) first showing that PiX2
iis a minimal sufficient statistic. Because PiXi,PiX2
i
is not a function of PiX2
i, by Definition 6.2.11 PiXi,PiX2
iis not minimal.
b. Substituting σ2=µin the above expression yields
f(x|µ, µ)
f(y|µ, µ)= exp "X
i
xiX
i
yi#exp "1
2µ X
i
x2
iX
i
y2
i!#.
This is constant as a function of µif and only if Pix2
i=Piy2
i. Thus, PiX2
iis a minimal
sufficient statistic.
c. Substituting σ2=µ2in the first expression yields
f(x|µ, µ2)
f(y|µ, µ2)= exp "1
µ X
i
xiX
i
yi!1
2µ2 X
i
x2
iX
i
y2
i!#.
This is constant as a function of µif and only if Pixi=Piyiand Pix2
i=Piy2
i. Thus,
PiXi,PiX2
iis a minimal sufficient statistic.
d. The first expression for the ratio is constant a function of µand σ2if and only if Pixi=
Piyiand Pix2
i=Piy2
i. Thus, PiXi,PiX2
iis a minimal sufficient statistic.
6.27 a. This pdf can be written as
f(x|µ, λ) = λ
2π1/21
x31/2
exp λ
µexp λ
2µ2xλ
2
1
x.
This is an exponential family with t1(x) = xand t2(x)=1/x. By Theorem 6.2.25, the
statistic (PiXi,Pi(1/Xi)) is a complete sufficient statistic. ( ¯
X, T ) given in the problem
is a one-to-one function of (PiXi,Pi(1/Xi)). Thus, ( ¯
X, T ) is also a complete sufficient
statistic.
b. This can be accomplished using the methods from Section 4.3 by a straightforward but
messy two-variable transformation U= (X1+X2)/2 and V= 2λ/T =λ[(1/X1) + (1/X2)
(2/[X1+X2])]. This is a two-to-one transformation.
6.29 Let fj= logistic(αj, βj), j= 0,1, . . . , k. From Theorem 6.6.5, the statistic
T(x) = Qn
i=1 f1(xi)
Qn
i=1 f0(xi), . . . , Qn
i=1 fk(xi)
Qn
i=1 f0(xi)=Qn
i=1 f1(x(i))
Qn
i=1 f0(x(i)), . . . , Qn
i=1 fk(x(i))
Qn
i=1 f0(x(i))
is minimal sufficient for the family {f0, f1, . . . , fk}. As Tis a 1 1 function of the order
statistics, the order statistics are also minimal sufficient for the family {f0, f1, . . . , fk}. If Fis
a nonparametric family, fj∈ F, so part (b) of Theorem 6.6.5 can now be directly applied to
show that the order statistics are minimal sufficient for F.
6.30 a. From Exercise 6.9b, we have that X(1) is a minimal sufficient statistic. To check completeness
compute fY1(y), where Y1=X(1). From Theorem 5.4.4 we have
fY1(y) = fX(y) (1FX(y))n1n=e(yµ)he(yµ)in1n=nen(yµ), y > µ.
Now, write Eµg(Y1) = R
µg(y)nen(yµ)dy. If this is zero for all µ, then R
µg(y)eny dy = 0
for all µ(because ne>0 for all µand does not depend on y). Moreover,
0 = d
Z
µ
g(y)eny dy=g(µ)e
for all µ. This implies g(µ) = 0 for all µ, so X(1) is complete.
b. Basu’s Theorem says that if X(1) is a complete sufficient statistic for µ, then X(1) is inde-
pendent of any ancillary statistic. Therefore, we need to show only that S2has distribution
independent of µ; that is, S2is ancillary. Recognize that f(x|µ) is a location family. So we
can write Xi=Zi+µ, where Z1, . . . , Znis a random sample from f(x|0). Then
S2=1
n1X(Xi¯
X)2=1
n1X((Zi+µ)(¯
Z+µ))2=1
n1X(Zi¯
Z)2.
Because S2is a function of only Z1, . . . , Zn, the distribution of S2does not depend on µ;
that is, S2is ancillary. Therefore, by Basu’s theorem, S2is independent of X(1).
6.31 a. (i) By Exercise 3.28 this is a one-dimensional exponential family with t(x) = x. By Theorem
6.2.25, PiXiis a complete sufficient statistic. ¯
Xis a one-to-one function of PiXi,
so ¯
Xis also a complete sufficient statistic. From Theorem 5.3.1 we know that (n
1)S22χ2
n1= gamma((n1)/2,2). S2= [σ2/(n1)][(n1)S22], a simple scale
transformation, has a gamma((n1)/2,2σ2/(n1)) distribution, which does not depend
on µ; that is, S2is ancillary. By Basu’s Theorem, ¯
Xand S2are independent.
(ii) The independence of ¯
Xand S2is determined by the joint distribution of ( ¯
X, S2) for each
value of (µ, σ2). By part (i), for each value of (µ, σ2), ¯
Xand S2are independent.
b. (i) µis a location parameter. By Exercise 6.14, M¯
Xis ancillary. As in part (a) ¯
Xis a
complete sufficient statistic. By Basu’s Theorem, ¯
Xand M¯
Xare independent. Because
they are independent, by Theorem 4.5.6 Var M= Var(M¯
X+¯
X) = Var(M¯
X)+Var ¯
X.
(ii) If S2is a sample variance calculated from a normal sample of size N, (N1)S22
χ2
N1. Hence, (N1)2Var S2/(σ2)2= 2(N1) and Var S2= 2(σ2)2/(N1). Both M
and M¯
Xare asymptotically normal, so, M1, . . . , MNand M1¯
X1, . . . , MN¯
XN
are each approximately normal samples if nis reasonable large. Thus, using the above
expression we get the two given expressions where in the straightforward case σ2refers
to Var M, and in the swindle case σ2refers to Var(M¯
X).
c. (i)
E(Xk) = E X
YYk
= E "X
YkYk#indep.
= E X
Yk
EYk.
Divide both sides by E Ykto obtain the desired equality.
(ii) If αis fixed, T=PiXiis a complete sufficient statistic for βby Theorem 6.2.25. Because
βis a scale parameter, if Z1, . . . , Znis a random sample from a gamma(α, 1) distribution,
then X(i)/T has the same distribution as (βZ(i))/(βPiZi) = Z(i)/(PiZi), and this
distribution does not depend on β. Thus, X(i)/T is ancillary, and by Basu’s Theorem, it
is independent of T. We have
E(X(i)|T) = E X(i)
TTT=TEX(i)
TTindep.
=TEX(i)
Tpart (i)
=TE(X(i))
ET.
Note, this expression is correct for each fixed value of (α, β), regardless whether αis
“known” or not.
6.32 In the Formal Likelihood Principle, take E1=E2=E. Then the conclusion is Ev(E, x1) =
Ev(E, x2) if L(θ|x1)/L(θ|x2) = c. Thus evidence is equal whenever the likelihood functions are
equal, and this follows from Formal Sufficiency and Conditionality.
6.33 a. For all sample points except (2,x
2) (but including (1,x
1)), T(j, xj) = (j, xj). Hence,
g(T(j, xj)|θ)h(j, xj) = g((j, xj)|θ)1 = f((j, xj)|θ).
For (2,x
2) we also have
g(T(2,x
2)|θ)h(2,x
2) = g((1,x
1)|θ)C=f((1,x
1)|θ)C=C1
2f1(x
1|θ)
=C1
2L(θ|x
1) = 1
2L(θ|x
2) = 1
2f2(x
2|θ) = f((2,x
2)|θ).
By the Factorization Theorem, T(J, XJ) is sufficient.
b. Equations 6.3.4 and 6.3.5 follow immediately from the two Principles. Combining them we
have Ev(E1,x
1) = Ev(E2,x
2), the conclusion of the Formal Likelihood Principle.
c. To prove the Conditionality Principle. Let one experiment be the Eexperiment and the
other Ej. Then
L(θ|(j, xj)) = f((j, xj)|θ) = 1
2fj(xj|θ) = 1
2L(θ|xj).
Letting (j, xj) and xjplay the roles of x
1and x
2in the Formal Likelihood Principle we
can conclude Ev(E,(j, xj)) = Ev(Ej,xj),the Conditionality Principle. Now consider the
Formal Sufficiency Principle. If T(X) is sufficient and T(x) = T(y), then L(θ|x) = CL(θ|y),
where C=h(x)/h(y) and his the function from the Factorization Theorem. Hence, by the
Formal Likelihood Principle, Ev(E, x) = Ev(E, y),the Formal Sufficiency Principle.
6.35 Let 1 = success and 0 = failure. The four sample points are {0,10,110,111}. From the likelihood
principle, inference about pis only through L(p|x). The values of the likelihood are 1, p,p2,
and p3, and the sample size does not directly influence the inference.
6.37 a. For one observation (X, Y ) we have
I(θ) = E2
θ2log f(X, Y |θ)=E2Y
θ3=2E Y
θ3.
But, Yexponential(θ), and E Y=θ. Hence, I(θ)=22for a sample of size one, and
I(θ) = 2n/θ2for a sample of size n.
b. (i) The cdf of Tis
P(Tt) = PPiYi
PiXit2=P2PiYi
2PiXiθt22=P(F2n,2nt22)
where F2n,2nis an Frandom variable with 2ndegrees of freedom in the numerator and
denominator. This follows since 2Yiand 2Xiθare all independent exponential(1), or
χ2
2. Differentiating (in t) and simplifying gives the density of Tas
fT(t) = Γ(2n)
Γ(n)2
2
tt2
t2+θ2nθ2
t2+θ2n
,
and the second derivative (in θ) of the log density is
2nt4+ 2t2θ2θ4
θ2(t2+θ2)2=2n
θ212
(t22+ 1)2,
and the information in Tis
2n
θ2"12E 1
T22+ 12#=2n
θ2
12E 1
F2
2n,2n+ 1!2
.
The expected value is
E 1
F2
2n,2n+ 1!2
=Γ(2n)
Γ(n)2Z
0
1
(1 + w)2
wn1
(1 + w)2n=Γ(2n)
Γ(n)2
Γ(n)Γ(n+ 2)
Γ(2n+ 2) =n+ 1
2(2n+ 1).
Substituting this above gives the information in Tas
2n
θ212n+ 1
2(2n+ 1)=I(θ)n
2n+ 1,
which is not the answer reported by Joshi and Nabar.
(ii) Let W=PiXiand V=PiYi. In each pair, Xiand Yiare independent, so Wand Vare
independent. Xiexponential(1); hence, Wgamma(n, 1). Yiexponential(θ);
hence, Vgamma(n, θ). Use this joint distribution of (W, V ) to derive the joint pdf of
(T, U) as
f(t, u|θ) = 2
[Γ(n)]2tu2n1exp
tut
θ, u > 0, t > 0.
Now, the information in (T, U) is
E2
θ2log f(T, U|θ)=E2UT
θ3= E 2V
θ3=2
θ3=2n
θ2.
(iii) The pdf of the sample is f(x,y) = exp [θ(Pixi)(Piyi)].Hence, (W, V ) defined
as in part (ii) is sufficient. (T, U) is a one-to-one function of (W, V ), hence (T, U) is also
sufficient. But, E U2= E W V = (n/θ)() = n2does not depend on θ. So E(U2n2) = 0
for all θ, and (T, U) is not complete.
6.39 a. The transformation from Celsius to Fahrenheit is y= 9x/5 + 32. Hence,
5
9(T(y)32) = 5
9((.5)(y)+(.5)(212) 32)
=5
9((.5)(9x/5+32)+(.5)(212) 32) = (.5)x+ 50 = T(x).
b. T(x) = (.5)x+ 50 6= (.5)x+ 106 = T(x). Thus, we do not have equivariance.
6.40 a. Because X1, . . . , Xnis from a location scale family, by Theorem 3.5.6, we can write Xi=
σZi+µ, where Z1, . . . , Znis a random sample from the standard pdf f(z). Then
T1(X1, . . . , Xn)
T2(X1, . . . , Xn)=T1(σZ1+µ, . . . , σZn+µ)
T2(σZ1+µ, . . . , σZn+µ)=σT1(Z1, . . . , Zn)
σT2(Z1, . . . , Zn)=T1(Z1, . . . , Zn)
T2(Z1, . . . , Zn).
Because T1/T2is a function of only Z1, . . . , Zn, the distribution of T1/T2does not depend
on µor σ; that is, T1/T2is an ancillary statistic.
b. R(x1, . . . , xn) = x(n)x(1). Because a > 0, max{ax1+b, . . . , axn+b}=ax(n)+band
min{ax1+b, . . . , axn+b}=ax(1)+b. Thus, R(ax1+b, . . . , axn+b) = (ax(n)+b)(ax(1)+b) =
a(x(n)x(1)) = aR(x1, . . . , xn). For the sample variance we have
S2(ax1+b, . . . , axn+b) = 1
n1X((axi+b)(a¯x+b))2
=a21
n1X(xi¯x)2=a2S2(x1, . . . , xn).
Thus, S(ax1+b, . . . , axn+b) = aS(x1, . . . , xn). Therefore, Rand Sboth satisfy the above
condition, and R/S is ancillary by a).
6.41 a. Measurement equivariance requires that the estimate of µbased on ybe the same as the
estimate of µbased on x; that is, T(x1+a, . . . , xn+a)a=T(y)a=T(x).
b. The formal structures for the problem involving Xand the problem involving Yare the same.
They both concern a random sample of size nfrom a normal population and estimation of
the mean of the population. Thus, formal invariance requires that T(x) = T(x) for all x.
Combining this with part (a), the Equivariance Principle requires that T(x1+a, . . . , xn+a)
a=T(x1+a, . . . , xn+a)a=T(x1, . . . , xn), i.e., T(x1+a, . . . , xn+a) = T(x1, . . . , xn)+a.
c. W(x1+a, . . . , xn+a) = Pi(xi+a)/n = (Pixi)/n +a=W(x1, . . . , xn) + a, so W(x)
is equivariant. The distribution of (X1, . . . , Xn) is the same as the distribution of (Z1+
θ, . . . , Zn+θ), where Z1, . . . , Znare a random sample from f(x0) and E Zi= 0. Thus,
EθW= E Pi(Zi+θ)/n =θ, for all θ.
6.43 a. For a location-scale family, if Xf(x|θ, σ2), then Y=ga,c(X)f(y|+a, c2σ2). So
for estimating σ2, ¯ga,c(σ2) = c2σ2. An estimator of σ2is invariant with respect to G1if
W(cx1+a, . . . , cxn+a) = c2W(x1, . . . , xn). An estimator of the form kS2is invariant
because
kS2(cx1+a, . . . , cxn+a) = k
n1
n
X
i=1 (cxi+a)
n
X
i=1
(cxi+a)/n!2
=k
n1
n
X
i=1
((cxi+a)(c¯x+a))2
=c2k
n1
n
X
i=1
(xi¯x)2=c2kS2(x1, . . . , xn).
To show invariance with respect to G2, use the above argument with c= 1. To show
invariance with respect to G3, use the above argument with a= 0. ( G2and G3are both
subgroups of G1. So invariance with respect to G1implies invariance with respect to G2and
G3.)
b. The transformations in G2leave the scale parameter unchanged. Thus, ¯ga(σ2) = σ2. An
estimator of σ2is invariant with respect to this group if
W(x1+a, . . . , xn+a) = W(ga(x)) = ¯ga(W(x)) = W(x1, . . . , xn).
An estimator of the given form is invariant if, for all aand (x1, . . . , xn),
W(x1+a, . . . , xn+a) = φ¯x+a
ss2=φ¯x
ss2=W(x1, . . . , xn).
In particular, for a sample point with s= 1 and ¯x= 0, this implies we must have φ(a) = φ(0),
for all a; that is, φmust be constant. On the other hand, if φis constant, then the estimators
are invariant by part a). So we have invariance if and only if φis constant. Invariance
with respect to G1also requires φto be constant because G2is a subgroup of G1. Finally,
an estimator of σ2is invariant with respect to G3if W(cx1, . . . , cxn) = c2W(x1, . . . , xn).
Estimators of the given form are invariant because
W(cx1, . . . , cxn) = φc¯x
cs c2s2=c2φ¯x
ss2=c2W(x1, . . . , xn).
Chapter 7
Point Estimation
7.1 For each value of $x$, the MLE $\hat\theta$ is the value of $\theta$ that maximizes $f(x|\theta)$. These values are in the following table.
x:            0    1    2         3    4
$\hat\theta$: 1    1    2 or 3    3    3
At $x = 2$, $f(2|2) = f(2|3) = 1/4$ are both maxima, so both $\hat\theta = 2$ and $\hat\theta = 3$ are MLEs.
7.2 a.
$$L(\beta|\mathbf{x}) = \prod_{i=1}^n \frac{1}{\Gamma(\alpha)\beta^\alpha}x_i^{\alpha-1}e^{-x_i/\beta} = \frac{1}{\Gamma(\alpha)^n\beta^{n\alpha}}\left[\prod_{i=1}^n x_i\right]^{\alpha-1}e^{-\sum_i x_i/\beta},$$
$$\log L(\beta|\mathbf{x}) = -\log\Gamma(\alpha)^n - n\alpha\log\beta + (\alpha-1)\log\left[\prod_{i=1}^n x_i\right] - \frac{\sum_i x_i}{\beta},$$
$$\frac{\partial\log L}{\partial\beta} = \frac{-n\alpha}{\beta} + \frac{\sum_i x_i}{\beta^2}.$$
Set the partial derivative equal to 0 and solve for $\beta$ to obtain $\hat\beta = \sum_i x_i/(n\alpha)$. To check that this is a maximum, calculate
$$\left.\frac{\partial^2\log L}{\partial\beta^2}\right|_{\beta=\hat\beta} = \left.\left(\frac{n\alpha}{\beta^2} - \frac{2\sum_i x_i}{\beta^3}\right)\right|_{\beta=\hat\beta} = \frac{(n\alpha)^3}{\left(\sum_i x_i\right)^2} - \frac{2(n\alpha)^3}{\left(\sum_i x_i\right)^2} = -\frac{(n\alpha)^3}{\left(\sum_i x_i\right)^2} < 0.$$
Because $\hat\beta$ is the unique point where the derivative is 0 and it is a local maximum, it is a global maximum. That is, $\hat\beta$ is the MLE.
b. Now the likelihood function is
$$L(\alpha, \beta|\mathbf{x}) = \frac{1}{\Gamma(\alpha)^n\beta^{n\alpha}}\left[\prod_{i=1}^n x_i\right]^{\alpha-1}e^{-\sum_i x_i/\beta},$$
the same as in part (a) except $\alpha$ and $\beta$ are both variables. There is no analytic form for the MLEs, the values $\hat\alpha$ and $\hat\beta$ that maximize $L$. One approach to finding $\hat\alpha$ and $\hat\beta$ would be to numerically maximize the function of two arguments. But it is usually best to do as much as possible analytically first, and perhaps reduce the complexity of the numerical problem. From part (a), for each fixed value of $\alpha$, the value of $\beta$ that maximizes $L$ is $\sum_i x_i/(n\alpha)$. Substitute this into $L$. Then we just need to maximize the function of the one variable $\alpha$ given by
$$\frac{1}{\Gamma(\alpha)^n\left(\sum_i x_i/(n\alpha)\right)^{n\alpha}}\left[\prod_{i=1}^n x_i\right]^{\alpha-1}e^{-\sum_i x_i/\left(\sum_i x_i/(n\alpha)\right)} = \frac{1}{\Gamma(\alpha)^n\left(\sum_i x_i/(n\alpha)\right)^{n\alpha}}\left[\prod_{i=1}^n x_i\right]^{\alpha-1}e^{-n\alpha}.$$
For the given data, $n = 14$ and $\sum_i x_i = 323.6$. Many computer programs can be used to maximize this function. From PROC NLIN in SAS we obtain $\hat\alpha = 514.219$ and, hence, $\hat\beta = \frac{323.6}{14(514.219)} = .0450$.
7.3 The log function is a strictly monotone increasing function. Therefore, L(θ|x)> L(θ0|x) if and
only if log L(θ|x)>log L(θ0|x). So the value ˆ
θthat maximizes log L(θ|x) is the same as the
value that maximizes L(θ|x).
7.5 a. The value $\hat z$ solves the equation
$$(1 - p)^n = \prod_i (1 - x_i z),$$
where $0 \le z \le (\max_i x_i)^{-1}$. Let $\hat k$ = the greatest integer less than or equal to $1/\hat z$. Then from Example 7.2.9, $\hat k$ must satisfy
$$[\hat k(1-p)]^n \ge \prod_i(\hat k - x_i) \quad\text{and}\quad [(\hat k + 1)(1-p)]^n < \prod_i(\hat k + 1 - x_i).$$
Because the right-hand side of the first equation is decreasing in $\hat z$, and because $\hat k \le 1/\hat z$ (so $\hat z \le 1/\hat k$) and $\hat k + 1 > 1/\hat z$, $\hat k$ must satisfy the two inequalities. Thus $\hat k$ is the MLE.
b. For $p = 1/2$, we must solve $\frac{1}{2^4} = (1 - 20z)(1 - z)(1 - 19z)$, which can be reduced to the cubic equation $380z^3 - 419z^2 + 40z - 15/16 = 0$. The roots are .9998, .0646, and .0381, leading to candidates of 1, 15, and 26 for $\hat k$. The first two are less than $\max_i x_i$. Thus $\hat k = 26$.
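The cubic can be checked in R (not part of the manual); polyroot takes the coefficients in increasing order of degree.
polyroot(c(-15/16, 40, -419, 380))   # roots near .0381, .0646, and .9998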
7.6 a. f(x|θ) = Qiθx2
iI[θ,)(xi) = Qix2
iθnI[θ,)(x(1)). Thus, X(1) is a sufficient statistic for
θby the Factorization Theorem.
b. L(θ|x) = θnQix2
iI[θ,)(x(1)). θnis increasing in θ. The second term does not involve θ.
So to maximize L(θ|x), we want to make θas large as possible. But because of the indicator
function, L(θ|x) = 0 if θ > x(1). Thus, ˆ
θ=x(1).
c. E X=R
θθx1dx =θlogx|
θ=. Thus the method of moments estimator of θdoes not
exist. (This is the Pareto distribution with α=θ,β= 1.)
7.7 L(0|x) = 1, 0 < xi<1, and L(1|x) = Qi1/(2xi), 0 < xi<1. Thus, the MLE is 0 if
1Qi1/(2xi), and the MLE is 1 if 1 <Qi1/(2xi).
7.8 a. E X2= Var X+µ2=σ2. Therefore X2is an unbiased estimator of σ2.
b.
L(σ|x) = 1
2πσ ex2/(2σ2).log L(σ|x) = log(2π)1/2log σx2/(2σ2).
logL
σ =1
σ+x2
σ3
set
= 0 ˆσX2= ˆσ3ˆσ=X2=|X|.
2logL
σ2=3x2σ2
σ6+1
σ2,which is negative at ˆσ=|x|.
Thus, ˆσ=|x|is a local maximum. Because it is the only place where the first derivative is
zero, it is also a global maximum.
c. Because E X= 0 is known, just equate E X2=σ2=1
nP1
i=1 X2
i=X2ˆσ=|X|.
7.9 This is a uniform$(0, \theta)$ model. So $EX = (0 + \theta)/2 = \theta/2$. The method of moments estimator is the solution to the equation $\tilde\theta/2 = \bar X$, that is, $\tilde\theta = 2\bar X$. Because $\tilde\theta$ is a simple function of the sample mean, its mean and variance are easy to calculate. We have
$$E\tilde\theta = 2E\bar X = 2EX = 2\frac{\theta}{2} = \theta, \quad\text{and}\quad \mathrm{Var}\,\tilde\theta = 4\,\mathrm{Var}\,\bar X = 4\frac{\theta^2/12}{n} = \frac{\theta^2}{3n}.$$
The likelihood function is
$$L(\theta|\mathbf{x}) = \prod_{i=1}^n \frac{1}{\theta}I_{[0,\theta]}(x_i) = \frac{1}{\theta^n}I_{[0,\theta]}(x_{(n)})I_{[0,\infty)}(x_{(1)}),$$
where $x_{(1)}$ and $x_{(n)}$ are the smallest and largest order statistics. For $\theta \ge x_{(n)}$, $L = 1/\theta^n$, a decreasing function. So for $\theta \ge x_{(n)}$, $L$ is maximized at $\hat\theta = x_{(n)}$. $L = 0$ for $\theta < x_{(n)}$. So the overall maximum, the MLE, is $\hat\theta = X_{(n)}$. The pdf of $\hat\theta = X_{(n)}$ is $nx^{n-1}/\theta^n$, $0 \le x \le \theta$. This can be used to calculate
$$E\hat\theta = \frac{n}{n+1}\theta, \quad E\hat\theta^2 = \frac{n}{n+2}\theta^2, \quad\text{and}\quad \mathrm{Var}\,\hat\theta = \frac{n\theta^2}{(n+2)(n+1)^2}.$$
$\tilde\theta$ is an unbiased estimator of $\theta$; $\hat\theta$ is a biased estimator. If $n$ is large, the bias is not large because $n/(n+1)$ is close to one. But if $n$ is small, the bias is quite large. On the other hand, $\mathrm{Var}\,\hat\theta < \mathrm{Var}\,\tilde\theta$ for all $\theta$. So, if $n$ is large, $\hat\theta$ is probably preferable to $\tilde\theta$.
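A short R simulation (not in the manual) comparing the two estimators, here with theta = 1 and n = 10 chosen purely for illustration:
set.seed(6)
theta <- 1; n <- 10
sims <- replicate(10000, {x <- runif(n, 0, theta); c(mom = 2*mean(x), mle = max(x))})
rowMeans(sims)                                     # the MOM estimator is unbiased; the MLE is biased low
apply(sims, 1, function(t) mean((t - theta)^2))    # the MLE has the smaller MSE here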
7.10 a. f(x|θ) = Qiα
βαxα1
iI[0](xi) = α
βαn(Qixi)α1I(−∞](x(n))I[0,)(x(1)) = L(α, β|x). By
the Factorization Theorem, (QiXi, X(n)) are sufficient.
b. For any fixed α,L(α, β|x) = 0 if β < x(n), and L(α, β|x) a decreasing function of βif
βx(n). Thus, X(n)is the MLE of β. For the MLE of αcalculate
α logL=
α "nlogαlogβ+(α1)log Y
i
xi#=n
αnlog β+ log Y
i
xi.
Set the derivative equal to zero and use ˆ
β=X(n)to obtain
ˆα=n
nlogX(n)log QiXi
="1
nX
i
(logX(n)logXi)#1
.
The second derivative is n/α2<0, so this is the MLE.
c. X(n)= 25.0, log QiXi=Pilog Xi= 43.95 ˆ
β= 25.0, ˆα= 12.59.
7.11 a.
f(x|θ) = Y
i
θxθ1
i=θn Y
i
xi!θ1
=L(θ|x)
d
log L=d
"nlogθ+(θ1)log Y
i
xi#=n
θ+X
i
log xi.
Set the derivative equal to zero and solve for θto obtain ˆ
θ= (1
nPilog xi)1. The second
derivative is n/θ2<0, so this is the MLE. To calculate the variance of ˆ
θ, note that
Yi=log Xiexponential(1), so Pilog Xigamma(n, 1). Thus ˆ
θ=n/T , where
Tgamma(n, 1). We can either calculate the first and second moments directly, or use
the fact that ˆ
θis inverted gamma (page 51). We have
E1
T=θn
Γ(n)Z
0
1
ttn1eθt dt =θn
Γ(n)
Γ(n1)
θn1=θ
n1.
E1
T2=θn
Γ(n)Z
0
1
t2tn1eθt dt =θn
Γ(n)
Γ(n2)
θn2=θ2
(n1)(n2),
and thus
Eˆ
θ=n
n1θand Var ˆ
θ=n2
(n1)2(n2)θ20 as n→ ∞.
b. Because Xbeta(θ, 1), E X=θ/(θ+ 1) and the method of moments estimator is the
solution to 1
nX
i
Xi=θ
θ+1 ˜
θ=PiXi
nPiXi
.
7.12 Xiiid Bernoulli(θ), 0 θ1/2.
a. method of moments:
EX=θ=1
nX
i
Xi=¯
X˜
θ=¯
X.
MLE: In Example 7.2.7, we showed that L(θ|x) is increasing for θ¯xand is decreasing
for θ¯x. Remember that 0 θ1/2 in this exercise. Therefore, when ¯
X1/2, ¯
Xis
the MLE of θ, because ¯
Xis the overall maximum of L(θ|x). When ¯
X > 1/2, L(θ|x) is an
increasing function of θon [0,1/2] and obtains its maximum at the upper bound of θwhich
is 1/2. So the MLE is ˆ
θ= min ¯
X, 1/2.
b. The MSE of ˜
θis MSE(˜
θ) = Var ˜
θ+ bias(˜
θ)2= (θ(1 θ)/n)+02=θ(1 θ)/n. There is no
simple formula for MSE(ˆ
θ), but an expression is
MSE(ˆ
θ) = E(ˆ
θθ)2=
n
X
y=0
(ˆ
θθ)2n
yθy(1 θ)ny
=
[n/2]
X
y=0 y
nθ2n
yθy(1 θ)ny+
n
X
y=[n/2]+1 1
2θ2n
yθy(1 θ)ny,
where Y=PiXibinomial(n, θ) and [n/2] = n/2, if nis even, and [n/2] = (n1)/2, if
nis odd.
c. Using the notation used in (b), we have
MSE(˜
θ) = E( ¯
Xθ)2=
n
X
y=0 y
nθ2n
yθy(1 θ)ny.
Therefore,
MSE(˜
θ)MSE(ˆ
θ) =
n
X
y=[n/2]+1 "y
nθ21
2θ2#n
yθy(1 θ)ny
=
n
X
y=[n/2]+1 y
n+1
22θy
n1
2n
yθy(1 θ)ny.
The facts that y/n > 1/2 in the sum and θ1/2 imply that every term in the sum is positive.
Therefore MSE(ˆ
θ)<MSE(˜
θ) for every θin 0 < θ 1/2. (Note: MSE(ˆ
θ) = MSE(˜
θ) = 0 at
θ= 0.)
7.13 L(θ|x) = Qi1
2e1
2|xiθ|=1
2ne1
2Σi|xiθ|, so the MLE minimizes Pi|xiθ|=Pi|x(i)θ|,
where x(1), . . . , x(n)are the order statistics. For x(j)θx(j+1),
n
X
i=1 |x(i)θ|=
j
X
i=1
(θx(i)) +
n
X
i=j+1
(x(i)θ) = (2jn)θ
j
X
i=1
x(i)+
n
X
i=j+1
x(i).
This is a linear function of θthat decreases for j < n/2 and increases for j > n/2. If nis even,
2jn= 0 if j=n/2. So the likelihood is constant between x(n/2) and x((n/2)+1), and any
value in this interval is the MLE. Usually the midpoint of this interval is taken as the MLE. If
nis odd, the likelihood is minimized at ˆ
θ=x((n+1)/2).
7.15 a. The likelihood is
L(µ, λ|x) = λn/2
(2π)nQixi
exp (λ
2X
i
(xiµ)2
µ2xi).
For fixed λ, maximizing with respect to µis equivalent to minimizing the sum in the expo-
nential.
d
X
i
(xiµ)2
µ2xi
=d
X
i
((xi)1)2
xi
=X
i
2 ((xi)1)
xi
xi
µ2.
Setting this equal to zero is equivalent to setting
X
ixi
µ1= 0,
and solving for µyields ˆµn= ¯x. Plugging in this ˆµnand maximizing with respect to λ
amounts to maximizing an expression of the form λn/2eλb. Simple calculus yields
ˆ
λn=n
2bwhere b=X
i
(xi¯x)2
2¯x2xi
.
Finally,
2b=X
i
xi
¯x22X
i
1
¯x+X
i
1
xi
=n
¯x+X
i
1
xi
=X
i1
xi1
¯x.
b. This is the same as Exercise 6.27b.
c. This involved algebra can be found in Schwarz and Samanta (1991).
7.17 a. This is a special case of the computation in Exercise 7.2a.
b. Make the transformation
z= (x21)/x1, w =x1x1=w, x2=wz + 1.
The Jacobean is |w|, and
fZ(z) = ZfX1(w)fX2(wz + 1)wdw =1
θ2e1Zwew(1+z)dw,
where the range of integration is 0 < w < 1/z if z < 0, 0 < w < if z > 0. Thus,
fZ(z) = 1
θ2e1(R1/z
0wew(1+z)dw if z < 0
R
0wew(1+z)dw if z0
Using the fact that Rwew/adw =ew/a(aw +a2), we have
fZ(z) = e1(zθ+e(1+z)/zθ (1+zzθ)
θz(1+z)2if z < 0
1
(1+z)2if z0
c. From part (a) we get ˆ
θ= 1. From part (b), X2= 1 implies Z= 0 which, if we use the second
density, gives us ˆ
θ=.
d. The posterior distributions are just the normalized likelihood times prior, so of course they
are different.
7.18 a. The usual first two moment equations for Xand Yare
¯x= E X=µX,1
nX
i
x2
i= E X2=σ2
X+µ2
X,
¯y= E Y=µY,1
nX
i
y2
i= E Y2=σ2
Y+µ2
Y.
We also need an equation involving ρ.
1
nX
i
xiyi= E XY = Cov(X, Y ) + (E X)(E Y) = ρσXσY+µXµY.
Solving these five equations yields the estimators given. Facts such as
1
nX
i
x2
i¯x2=Pix2
i(Pixi)2/n
n=Pi(xi¯x)2
n
are used.
b. Two answers are provided. First, use the Miscellanea: For
L(θ|x) = h(x)c(θ) exp k
X
i=1
wi(θ)ti(x)!,
the solutions to the kequations Pn
j=1 ti(xj) = EθPn
j=1 ti(Xj)=nEθti(X1), i= 1, . . . , k,
provide the unique MLE for θ. Multiplying out the exponent in the bivariate normal pdf
shows it has this exponential family form with k= 5 and t1(x, y) = x,t2(x, y) = y,t3(x, y) =
x2,t4(x, y) = y2and t5(x, y) = xy. Setting up the method of moment equations, we have
X
i
xi=X,X
i
x2
i=n(µ2
X+σ2
X),
X
i
yi=Y,X
i
y2
i=n(µ2
Y+σ2
Y),
X
i
xiyi=X
i
[Cov(X, Y ) + µXµY] = n(ρσXσY+µXµY).
These are the same equations as in part (a) if you divide each one by n. So the MLEs are
the same as the method of moment estimators in part (a).
For the second answer, use the hint in the book to write
L(θ|x,y) = L(θ|x)L(θ, x|y)
= (2πσ2
X)n
2exp (1
2σ2
XX
i
(xiµX)2)
| {z }
A
×2πσ2
Y(1ρ2)n
2exp "1
2σ2
Y(1 ρ2)X
iyiµY+ρσY
σX
(xiµX)2#
| {z }
B
We know that ¯xand ˆσ2
X=Pi(xi¯x)2/n maximizes A; the question is whether given σY,
µY, and ρ, does ¯x, ˆσ2
Xmaximize B? Let us first fix σ2
Xand look for ˆµX, that maximizes B.
We have
logB
µX∝ −2 X
i(yiµY)ρσY
σX
(xiµX)!ρσY
σX
set
= 0
X
i
(yiµY) = ρσY
σX
Σ(xiˆµX).
Similarly do the same procedure for L(θ|y)L(θ, y|x) This implies Pi(xiµX) = ρσX
σYPi(yi
ˆµY). The solutions ˆµXand ˆµYtherefore must satisfy both equations. If Pi(yiˆµY)6= 0 or
Pi(xiˆµX)6= 0, we will get ρ= 1, so we need Pi(yiˆµY) = 0 and Pi(xiˆµX) = 0.
This implies ˆµX= ¯xand ˆµY= ¯y. ( 2log B
µ2
X
<0. Therefore it is maximum). To get ˆσ2
Xtake
log B
σ2
XX
i
ρσY
σ2
X
(xiˆµX)(yiµY)ρσY
σX
(xiµX)set
= 0.
X
i
(xiˆµX)(yiˆµY) = ρσY
ˆσXX(xiˆµX)2.
Similarly, Pi(xiˆµX)(yiˆµY) = ρσX
ˆσYPi(yiˆµY)2. Thus ˆσ2
Xand ˆσ2
Ymust satisfy the
above two equations with ˆµX=¯
X, ˆµY=¯
Y. This implies
ˆσY
ˆσXX
i
(xi¯x)2=ˆσX
ˆσYX
i
(yi¯y)2Pi(xi¯x)2
ˆσ2
X
=Pi(yi¯y)2
ˆσ2
Y
.
Therefore, ˆσ2
X=aPi(xi¯x)2, ˆσ2
Y=aPi(yi¯y)2where ais a constant. Combining the
knowledge that ¯x, 1
nPi(xi¯x)2= (ˆµX,ˆσ2
X) maximizes A, we conclude that a= 1/n.
Lastly, we find ˆρ, the MLE of ρ. Write
log L(¯x, ¯y, ˆσ2
X,ˆσ2
Y, ρ|x,y)
=n
2log(1 ρ2)1
2(1ρ2)X
i(xi¯x)2
ˆσ2
X2ρ(xi¯x)(yi¯y)
ˆσX,ˆσY
+(yi¯y)2
ˆσ2
Y
=n
2log(1 ρ2)1
2(1ρ2)
2n2ρX
i
(xi¯x)(yi¯y)
ˆσXˆσY
| {z }
A
because ˆσ2
X=1
nPi(xi¯x)2and ˆσ2
Y=1
nPi(yi¯y)2. Now
log L=n
2log(1 ρ2)n
1ρ2+ρ
1ρ2A
and log L
ρ =n
1ρ2
(1ρ2)2+A(1ρ2)+22
(1ρ2)2
set
= 0.
This implies
A+2nˆρnˆρ3
(1ρ2)2= 0 A(1 + ˆρ2) = nˆρ(1 + ˆρ2)
ˆρ=A
n=1
nX
i
(xi¯x)(yi¯y)
ˆσXˆσY
.
7.19 a.
L(θ|y) = Y
i
1
2πσ2exp 1
2σ2(yiβxi)2
= (2πσ2)n/2exp 1
2σ2X
i
(y2
i2βxiyi+β2x2
i)!
= (2πσ2)n/2exp β2Pix2
i
2σ2exp 1
2σ2X
i
y2
i+β
σ2X
i
xiyi!.
By Theorem 6.1.2, (PiY2
i,PixiYi) is a sufficient statistic for (β, σ2).
b.
logL(β2|y) = n
2log(2π)n
2log σ21
2σ2Xy2
i+β
σ2X
i
xiyiβ2
2σ2X
i
x2
i.
For a fixed value of σ2,
logL
β =1
σ2X
i
xiyiβ
σ2X
i
x2
i
set
= 0 ˆ
β=Pixiyi
Pix2
i
.
Also,
2logL
β2=1
σ2X
i
x2
i<0,
so it is a maximum. Because ˆ
βdoes not depend on σ2, it is the MLE. And ˆ
βis unbiased
because
Eˆ
β=PixiEYi
Pix2
i
=Pixi·βxi
Pix2
i
=β.
c. ˆ
β=PiaiYi, where ai=xi/Pjx2
jare constants. By Corollary 4.6.10, ˆ
βis normally dis-
tributed with mean β, and
Var ˆ
β=X
i
a2
iVar Yi=X
i xi
Pjx2
j!2
σ2=Pix2
i
(Pjx2
j)2σ2=σ2
Pix2
i
.
7.20 a.
EPiYi
Pixi
=1
PixiX
i
EYi=1
PixiX
i
βxi=β.
b.
Var PiYi
Pixi=1
(Pixi)2X
i
Var Yi=Piσ2
(Pixi)2=2
n2¯x2=σ2
n¯x2.
Because Pix2
in¯x2=Pi(xi¯x)20, Pix2
in¯x2. Hence,
Var ˆ
β=σ2
Pix2
iσ2
n¯x2= Var PiYi
Pixi.
(In fact, ˆ
βis BLUE (Best Linear Unbiased Estimator of β), as discussed in Section 11.3.2.)
7.21 a.
E1
nX
i
Yi
xi
=1
nX
i
EYi
xi
=1
nX
i
βxi
xi
=β.
b.
Var 1
nX
i
Yi
xi
=1
n2X
i
Var Yi
x2
i
=σ2
n2X
i
1
x2
i
.
Using Example 4.7.8 with ai= 1/x2
iwe obtain
1
nX
i
1
x2
in
Pix2
i
.
Thus,
Var ˆ
β=σ2
Pix2
iσ2
n2X
i
1
x2
i
= Var 1
nX
i
Yi
xi
.
Because g(u) = 1/u2is convex, using Jensen’s Inequality we have
1
¯x21
nX
i
1
x2
i
.
Thus,
Var PiYi
Pixi=σ2
n¯x2σ2
n2X
i
1
x2
i
= Var 1
nX
i
Yi
xi
.
7.22 a.
f(¯x, θ) = f(¯x|θ)π(θ) = n
2πσ en(¯xθ)2/(2σ2)1
2πτ e(θµ)2/2τ2.
b. Factor the exponent in part (a) as
n
2σ2(¯xθ)21
2τ2(θµ)2=1
2v2(θδ(x))21
τ2+σ2/n(¯xµ)2,
where δ(x) = (τ2¯x+ (σ2/n)µ)/(τ2+σ2/n) and v= (σ2τ2/n).(τ+σ2/n). Let n(a, b) denote
the pdf of a normal distribution with mean aand variance b. The above factorization shows
that
f(x, θ) = n(θ, σ2/n)×n(µ, τ2) = n(δ(x), v2)×n(µ, τ2+σ2/n),
where the marginal distribution of ¯
Xis n(µ, τ2+σ2/n) and the posterior distribution of θ|x
is n(δ(x), v2). This also completes part (c).
7.23 Let t=s2and θ=σ2. Because (n1)S22χ2
n1, we have
f(t|θ) = 1
Γ ((n1)/2) 2(n1)/2n1
θt[(n1)/2]1
e(n1)t/2θn1
θ.
With π(θ) as given, we have (ignoring terms that do not depend on θ)
π(θ|t)"1
θ((n1)/2)1
e(n1)t/2θ1
θ#1
θα+1 e1θ
1
θ((n1)/2)+α+1
exp 1
θ(n1)t
2+1
β,
which we recognize as the kernel of an inverted gamma pdf, IG(a, b), with
a=n1
2+αand b=(n1)t
2+1
β1
.
Direct calculation shows that the mean of an IG(a, b) is 1/((a1)b), so
E(θ|t) =
n1
2t+1
β
n1
2+α1=
n1
2s2+1
β
n1
2+α1.
This is a Bayes estimator of σ2.
7.24 For nobservations, Y=PiXiPoisson().
a. The marginal pmf of Yis
m(y) = Z
0
()ye
y!
1
Γ(α)βαλα1eλ/β
=ny
y!Γ(α)βαZ
0
λ(y+α)1eλ
β/(+1) =ny
y!Γ(α)βαΓ(y+α)β
+1 y+α
.
Thus,
π(λ|y) = f(y|λ)π(λ)
m(y)=λ(y+α)1eλ
β/(+1)
Γ(y+α)β
+1 y+αgamma y+α, β
+1 .
b.
E(λ|y)=(y+α)β
+1 =β
+1 y+1
+1 (αβ).
Var(λ|y)=(y+α)β2
(+1)2.
7.25 a. We will use the results and notation from part (b) to do this special case. From part (b),
the Xis are independent and each Xihas marginal pdf
m(x|µ, σ2, τ2) = Z
−∞
f(x|θ, σ2)π(θ|µ, τ 2)=Z
−∞
1
2πστ e(xθ)2/2σ2e(θµ)2/2τ2.
Complete the square in θto write the sum of the two exponents as
θh2
σ2+τ2+µσ2
σ2+τ2i2
2σ2τ2
σ2+τ2(xµ)2
2(σ2+τ2).
Only the first term involves θ; call it A(θ). Also, eA(θ)is the kernel of a normal pdf. Thus,
Z
−∞
eA(θ)=2πστ
σ2+τ2,
and the marginal pdf is
m(x|µ, σ2, τ2) = 1
2πστ 2πστ
σ2+τ2exp (xµ)2
2(σ2+τ2)
=1
2πσ2+τ2exp (xµ)2
2(σ2+τ2),
a n(µ, σ2+τ2) pdf.
b. For one observation of Xand θthe joint pdf is
h(x, θ|τ) = f(x|θ)π(θ|τ),
and the marginal pdf of Xis
m(x|τ) = Z
−∞
h(x, θ|τ)dθ.
Thus, the joint pdf of X= (X1, . . . , Xn) and θ= (θ1, . . . , θn) is
h(x,θ|τ) = Y
i
h(xi, θi|τ),
and the marginal pdf of Xis
m(x|τ) = Z
−∞ ···Z
−∞ Y
i
h(xi, θi|τ)1. . . dθn
=Z
−∞ ···Z
−∞
h(x1, θ1|τ)1n
Y
i=2
h(xi, θi|τ)2. . . dθn.
The 1integral is just m(x1|τ), and this is not a function of θ2, . . . , θn. So, m(x1|τ) can be
pulled out of the integrals. Doing each integral in turn yields the marginal pdf
m(x|τ) = Y
i
m(xi|τ).
Because this marginal pdf factors, this shows that marginally X1, . . . , Xnare independent,
and they each have the same marginal distribution, m(x|τ).
7.26 First write
f(x1, . . . , xn|θ)π(θ)en
2σ2(¯xθ)2−|θ|/a
where the exponent can be written
n
2σ2(¯xθ)2|θ|
a=n
2σ2(θδ±(x)) + n
2σ2¯x2δ2
±(x)
with δ±(x) = ¯x±σ2
na , where we use the “+” if θ > 0 and the “” if θ < 0. Thus, the posterior
mean is
E(θ|x) = R
−∞ θen
2σ2(θδ±(x))2
R
−∞ en
2σ2(θδ±(x))2.
Now use the facts that for constants aand b,
Z
0
ea
2(tb)2dt =Z0
−∞
ea
2(tb)2dt =rπ
2a,
Z
0
tea
2(tb)2dt =Z
0
(tb)ea
2(tb)2dt +Z
0
bea
2(tb)2dt =1
aea
2b2+brπ
2a,
Z0
−∞
tea
2(tb)2dt =1
aea
2b2+brπ
2a,
to get
E(θ|x) = qπσ2
2n(δ(x) + δ+(x)) +σ2
nen
2σ2δ2
+(x)en
2σ2δ2
(x)
2qπσ2
2n
.
7.27 a. The log likelihood is
$$\log L = \sum_{i=1}^n\left(-\beta\tau_i + y_i\log(\beta\tau_i) - \tau_i + x_i\log(\tau_i) - \log y_i! - \log x_i!\right)$$
and differentiation gives
$$\frac{\partial}{\partial\beta}\log L = \sum_{i=1}^n\left(-\tau_i + \frac{y_i\tau_i}{\beta\tau_i}\right) \;\Rightarrow\; \beta = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n \tau_i},$$
$$\frac{\partial}{\partial\tau_j}\log L = -\beta + \frac{y_j\beta}{\beta\tau_j} - 1 + \frac{x_j}{\tau_j} \;\Rightarrow\; \tau_j = \frac{x_j + y_j}{1 + \beta} \;\Rightarrow\; \sum_{j=1}^n\tau_j = \frac{\sum_{j=1}^n x_j + \sum_{j=1}^n y_j}{1 + \beta}.$$
Combining these expressions yields $\hat\beta = \sum_{j=1}^n y_j/\sum_{j=1}^n x_j$ and $\hat\tau_j = \frac{x_j + y_j}{1 + \hat\beta}$.
b. The stationary point of the EM algorithm will satisfy
ˆ
β=Pn
i=1 yi
ˆτ1+Pn
i=2 xi
ˆτ1=ˆτ1+y1
ˆ
β+ 1
ˆτj=xj+yj
ˆ
β+ 1 .
The second equation yields τ1=y1, and substituting this into the first equation yields
β=Pn
j=2 yj/Pn
j=2 xj. Summing over jin the third equation, and substituting β=
Pn
j=2 yj/Pn
j=2 xjshows us that Pn
j=2 ˆτj=Pn
j=2 xj, and plugging this into the first equa-
tion gives the desired expression for ˆ
β. The other two equations in (7.2.16) are obviously
satisfied.
c. The expression for ˆ
βwas derived in part (b), as were the expressions for ˆτi.
7.29 a. The joint density is the product of the individual densities.
b. The log likelihood is
log L=
n
X
i=1 τi+yilog(τi) + xilog(τi) + log m!log yi!log xi!
and
β log L= 0 β=Pn
i=1 yi
Pn
i=1 i
τj
log L= 0 τj=xj+yj
.
Since Pτj= 1, ˆ
β=Pn
i=1 yi/m =Pn
i=1 yi/Pn
i=1 xi. Also, Pjτj=Pj(yj+xj) = 1, which
implies that =Pj(yj+xj) and ˆτj= (xj+yj)/Pi(yi+xi).
c. In the likelihood function we can ignore the factorial terms, and the expected complete-data
likelihood is obtained by on the rth iteration by replacing x1with E(X1|ˆτ(r)
1) = mˆτ(r)
1.
Substituting this into the MLEs of part (b) gives the EM sequence.
The MLEs from the full data set are ˆ
β= 0.0008413892 and
ˆτ= (0.06337310,0.06374873,0.06689681,0.04981487,0.04604075,0.04883109,
0.07072460,0.01776164,0.03416388,0.01695673,0.02098127,0.01878119,
0.05621836,0.09818091,0.09945087,0.05267677,0.08896918,0.08642925).
The MLEs for the incomplete data were computed using R, where we take $m = \sum x_i$. The R code is
#mles on the incomplete data#
xdatam <- c(3560,3739,2784,2571,2729,3952,993,1908,948,1172,
            1047,3138,5485,5554,2943,4969,4828)
ydata <- c(3,4,1,1,3,1,2,0,2,0,1,3,5,4,6,2,5,4)
xdata <- c(mean(xdatam), xdatam)                        # start the EM with x1 imputed by the mean
tau <- (xdata + ydata)/(sum(xdata) + sum(ydata))        # initial tau so the first E-step is defined
for (j in 1:500) {
  xdata <- c(sum(xdata)*tau[1], xdatam)                 # E-step: impute x1 by m*tau1
  beta <- sum(ydata)/sum(xdata)                         # M-step
  tau <- (xdata + ydata)/(sum(xdata) + sum(ydata))
}
beta
tau
The MLEs from the incomplete data set are ˆ
β= 0.0008415534 and
ˆτ= (0.06319044,0.06376116,0.06690986,0.04982459,0.04604973,0.04884062,
0.07073839,0.01776510,0.03417054,0.01696004,0.02098536,0.01878485,
0.05622933,0.09820005,0.09947027,0.05268704,0.08898653,0.08644610).
7.31 a. By direct substitution we can write
log L(θ|y) = E hlog L(θ|y,X)|ˆ
θ(r),yiEhlog k(X|θ, y)|ˆ
θ(r),yi.
The next iterate, ˆ
θ(r+1) is obtained by maximizing the expected complete-data log likelihood,
so for any θ, E hlog L(ˆ
θ(r+1)y,X)ˆ
θ(r),yiEhlog L(θ|y,X)|ˆ
θ(r),yi
b. Write
E [log k(X|θ, y)|θ0,y] = Zlog k(x|θ, y) log k(x|θ0,y)dxZlog k(x|θ0,y) log k(x|θ0,y)dx,
from the hint. Hence E hlog k(X|ˆ
θ(r+1),y)ˆ
θ(r),yiEhlog k(X|ˆ
θ(r),y)ˆ
θ(r),yi, and so the
entire right hand side in part (a) is decreasing.
7.33 Substitute $\alpha = \beta = \sqrt{n/4}$ into
$$\mathrm{MSE}(\hat p_B) = \frac{np(1-p)}{(\alpha + \beta + n)^2} + \left(\frac{np + \alpha}{\alpha + \beta + n} - p\right)^2$$
and simplify to obtain
$$\mathrm{MSE}(\hat p_B) = \frac{n}{4\left(n + \sqrt{n}\right)^2},$$
independent of $p$, as desired.
7.35 a.
δp(g(x)) = δp(x1+a, . . . , xn+a)
=R
−∞ tQif(xi+at)dt
R
−∞ Qif(xi+at)dt =R
−∞ (y+a)Qif(xiy)dy
R
−∞ Qif(xiy)dy (y=ta)
=a+δp(x) = ¯g(δp(x)) .
b.
Y
i
f(xit) = 1
(2π)n/2e1
2Σi(xit)2=1
(2π)n/2e1
2n(¯xt)2e1
2(n1)s2,
so
δp(x) = (n/2π)R
−∞ te1
2n(¯xt)2dt
(n/2π)R
−∞ e1
2n(¯xt)2dt =¯x
1= ¯x.
c.
Y
i
f(xit) = Y
i
It1
2xit+1
2=Ix(n)1
2tx(1) +1
2,
so
δp(x) = Rx(1)+1/2
x(n)+1/2t dt
Rx(1)+1/2
x(n)+1/21dt
=x(1) +x(n)
2.
7.37 To find a best unbiased estimator of θ, first find a complete sufficient statistic. The joint pdf is
f(x|θ) = 1
2θnY
i
I(θ,θ)(xi) = 1
2θn
I[0)(max
i|xi|).
By the Factorization Theorem, maxi|Xi|is a sufficient statistic. To check that it is a complete
sufficient statistic, let Y= maxi|Xi|. Note that the pdf of Yis fY(y) = nyn1n, 0 < y < θ.
Suppose g(y) is a function such that
Eg(Y) = Zθ
0
nyn1
θng(y)dy = 0,for all θ.
Taking derivatives shows that θn1g(θ) = 0, for all θ. So g(θ) = 0, for all θ, and Y= maxi|Xi|
is a complete sufficient statistic. Now
EY=Zθ
0
ynyn1
θndy =n
n+ 1θEn+ 1
nY=θ.
Therefore n+1
nmaxi|Xi|is a best unbiased estimator for θbecause it is a function of a complete
sufficient statistic. (Note that X(1), X(n)is not a minimal sufficient statistic (recall Exercise
5.36). It is for θ < Xi<2θ,2θ < Xi< θ, 4θ < Xi<6θ, etc., but not when the range is
symmetric about zero. Then maxi|Xi|is minimal sufficient.)
7.38 Use Corollary 7.3.15.
a.
(∂/∂θ) log L(θ|x) = (∂/∂θ) log ∏i θ xi^{θ−1} = (∂/∂θ) Σi [ log θ + (θ−1) log xi ]
= Σi [ 1/θ + log xi ] = −n[ (−Σi log xi/n) − 1/θ ].
Thus, −Σi log Xi/n is the UMVUE of 1/θ and attains the Cramér-Rao bound.
b.
(∂/∂θ) log L(θ|x) = (∂/∂θ) log ∏i [ (log θ)/(θ−1) ] θ^{xi} = (∂/∂θ) Σi [ log log θ − log(θ−1) + xi log θ ]
= Σi [ 1/(θ log θ) − 1/(θ−1) ] + (1/θ) Σi xi = n/(θ log θ) − n/(θ−1) + n x̄/θ
= (n/θ)[ x̄ − ( θ/(θ−1) − 1/log θ ) ].
Thus, X̄ is the UMVUE of θ/(θ−1) − 1/log θ and attains the Cramér-Rao lower bound.
Note: We claim that if (∂/∂θ) log L(θ|X) = a(θ)[W(X) − τ(θ)], then E W(X) = τ(θ), because under the conditions of the Cramér-Rao Theorem, E[ (∂/∂θ) log L(θ|X) ] = 0. To be rigorous, we need to check the "interchange differentiation and integration" condition. Both (a) and (b) are exponential families, and this condition is satisfied for all exponential families.
7.39
Eθ[ (∂²/∂θ²) log f(X|θ) ] = Eθ[ (∂/∂θ)( (∂/∂θ) f(X|θ) / f(X|θ) ) ]
= Eθ[ (∂²/∂θ²) f(X|θ) / f(X|θ) − ( (∂/∂θ) f(X|θ) / f(X|θ) )² ].
Now consider the first term:
Eθ[ (∂²/∂θ²) f(X|θ) / f(X|θ) ] = ∫ (∂²/∂θ²) f(x|θ) dx = (d/dθ) ∫ (∂/∂θ) f(x|θ) dx   (assumption)
= (d/dθ) Eθ[ (∂/∂θ) log f(X|θ) ] = 0,   (7.3.8)
and the identity is proved.
7.40
(∂/∂p) log L(p|x) = (∂/∂p) log ∏i p^{xi}(1−p)^{1−xi} = (∂/∂p) Σi [ xi log p + (1−xi) log(1−p) ]
= Σi [ xi/p − (1−xi)/(1−p) ] = n x̄/p − (n − n x̄)/(1−p) = ( n/(p(1−p)) )[ x̄ − p ].
By Corollary 7.3.15, X̄ is the UMVUE of p and attains the Cramér-Rao lower bound. Alternatively, we could calculate
−n Eθ[ (∂²/∂θ²) log f(X|θ) ] = −n E[ (∂²/∂p²) log( p^X(1−p)^{1−X} ) ] = −n E[ (∂²/∂p²)( X log p + (1−X) log(1−p) ) ]
= −n E[ (∂/∂p)( X/p − (1−X)/(1−p) ) ] = −n E[ −X/p² − (1−X)/(1−p)² ]
= n[ 1/p + 1/(1−p) ] = n/( p(1−p) ).
Then using τ(θ) = p and τ′(θ) = 1,
[τ′(θ)]² / ( −n Eθ[ (∂²/∂θ²) log f(X|θ) ] ) = 1/( n/(p(1−p)) ) = p(1−p)/n = Var X̄.
We know that E X̄ = p. Thus, X̄ attains the Cramér-Rao bound.
7.41 a. E(Σi ai Xi) = Σi ai E Xi = Σi ai μ = μ Σi ai = μ. Hence the estimator is unbiased.
b. Var(Σi ai Xi) = Σi ai² Var Xi = Σi ai² σ² = σ² Σi ai². Therefore, we need to minimize Σi ai², subject to the constraint Σi ai = 1. Add and subtract the mean of the ai, 1/n, to get
Σi ai² = Σi ( (ai − 1/n) + 1/n )² = Σi (ai − 1/n)² + 1/n,
because the cross-term is zero. Hence, Σi ai² is minimized by choosing ai = 1/n for all i. Thus, Σi (1/n)Xi = X̄ has the minimum variance among all linear unbiased estimators.
7.43 a. This one is real hard - it was taken from an American Statistician article, but the proof is
not there. A cryptic version of the proof is in Tukey (Approximate Weights, Ann. Math.
Statist. 1948, 91-92); here is a more detailed version.
Let qi=q
i(1 + λti) with 0 λ1 and |ti| ≤ 1. Recall that q
i= (12
i)/Pj(12
j) and
VarW= 1/Pj(12
j). Then
Var qiWi
Pjqj!=1
(Pjqj)2X
i
qiσ2
i
=1
[Pjq
j(1 + λtj)]2X
i
q2
i(1 + λti)2σ2
i
=1
[Pjq
j(1 + λtj)]2Pj(12
j)X
i
q
i(1 + λti)2,
using the definition of q
i. Now write
X
i
q
i(1 + λti)2= 1 + 2λX
j
qjtj+λ2X
j
qjt2
j= [1 + λX
j
qjtj]2+λ2[X
j
qjt2
j(X
j
qjtj)2],
where we used the fact that Pjq
j= 1. Now since
[X
j
q
j(1 + λtj)]2= [1 + λX
j
qjtj]2,
Var qiWi
Pjqj!=1
Pj(12
j)"1 + λ2[Pjqjt2
j(Pjqjtj)2]
[1 + λPjqjtj]2#
1
Pj(12
j)"1 + λ2[1 (Pjqjtj)2]
[1 + λPjqjtj]2#,
since Pjqjt2
j1. Now let T=Pjqjtj, and
Var qiWi
Pjqj!1
Pj(12
j)1 + λ2[1 T2]
[1+λT ]2,
and the right hand side is maximized at T=λ, with maximizing value
Var qiWi
Pjqj!1
Pj(12
j)1 + λ2[1 λ2]
[1 λ2]2= VarW1
1λ2.
Bloch and Moses (1988) define λas the solution to
bmax/bmin =1 + λ
1λ,
where bi/bjare the ratio of the normalized weights which, in the present notation, is
bi/bj= (1 + λti)/(1 + λtj).
The right hand side is maximized by taking tias large as possible and tjas small as possible,
and setting ti= 1 and tj=1 (the extremes) yields the Bloch and Moses (1988) solution.
b.
bi=1/k
(12
i).Pj12
j=σ2
i
kX
j
12
j.
Thus,
bmax =σ2
max
kX
j
12
jand bmin =σ2
min
kX
j
12
j
and B=bmax/bmin =σ2
max2
min. Solving B= (1 + λ)/(1 λ) yields λ= (B1)/(B+ 1).
Substituting this into Tukey’s inequality yields
Var W
Var W(B+ 1)2
4B=((σ2
max2
min) + 1)2
4(σ2
max2
min).
7.44 Σi Xi is a complete sufficient statistic for θ when Xi ∼ n(θ, 1). X̄² − 1/n is a function of Σi Xi. Therefore, by Theorem 7.3.23, X̄² − 1/n is the unique best unbiased estimator of its expectation.
E[ X̄² − 1/n ] = Var X̄ + (E X̄)² − 1/n = 1/n + θ² − 1/n = θ².
Therefore, X̄² − 1/n is the UMVUE of θ². We will calculate
Var( X̄² − 1/n ) = Var( X̄² ) = E( X̄⁴ ) − [ E( X̄² ) ]²,   where X̄ ∼ n(θ, 1/n),
but first we derive some general formulas that will also be useful in later exercises. Let Y ∼ n(θ, σ²). Then here are formulas for E Y⁴ and Var Y².
E Y⁴ = E[ Y³(Y−θ+θ) ] = E[ Y³(Y−θ) ] + E[Y³]θ = E[ Y³(Y−θ) ] + θ E Y³.
E[ Y³(Y−θ) ] = σ² E(3Y²) = σ²( 3σ² + 3θ² ) = 3σ⁴ + 3θ²σ².   (Stein's Lemma)
θ E Y³ = θ( 3θσ² + θ³ ) = 3θ²σ² + θ⁴.   (Example 3.6.6)
Var Y² = 3σ⁴ + 6θ²σ² + θ⁴ − (σ² + θ²)² = 2σ⁴ + 4θ²σ².
Thus,
Var( X̄² − 1/n ) = Var( X̄² ) = 2(1/n)² + 4θ²(1/n) > 4θ²/n.
To calculate the Cramér-Rao lower bound, we have
Eθ[ (∂²/∂θ²) log f(X|θ) ] = Eθ[ (∂²/∂θ²) log( (2π)^{−1/2} e^{−(X−θ)²/2} ) ]
= Eθ[ (∂²/∂θ²)( log (2π)^{−1/2} − (X−θ)²/2 ) ] = Eθ[ (∂/∂θ)(X−θ) ] = −1,
and τ(θ) = θ², [τ′(θ)]² = (2θ)² = 4θ², so the Cramér-Rao Lower Bound for estimating θ² is
[τ′(θ)]² / ( −n Eθ[ (∂²/∂θ²) log f(X|θ) ] ) = 4θ²/n.
Thus, the UMVUE of θ² does not attain the Cramér-Rao bound. (However, the ratio of the variance and the lower bound → 1 as n → ∞.)
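A small Monte Carlo sketch (with arbitrary illustrative values n = 10 and θ = 2, not part of the original solution) shows the gap: the simulated variance of X̄² − 1/n is close to 2/n² + 4θ²/n = 1.62, which exceeds the Cramér-Rao bound 4θ²/n = 1.6.

  set.seed(1)
  n <- 10; theta <- 2; nsim <- 100000
  xbar <- rnorm(nsim, mean = theta, sd = 1/sqrt(n))   # xbar ~ n(theta, 1/n)
  est <- xbar^2 - 1/n                                 # UMVUE of theta^2
  c(mean(est), var(est), 2/n^2 + 4*theta^2/n, 4*theta^2/n)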
7.45 a. Because E S² = σ², bias(aS²) = E(aS²) − σ² = (a−1)σ². Hence,
MSE(aS²) = Var(aS²) + bias(aS²)² = a² Var(S²) + (a−1)²σ⁴.
b. There were two typos in early printings; κ = E[X−μ]⁴/σ⁴ and
Var(S²) = (1/n)( κ − (n−3)/(n−1) ) σ⁴.
See Exercise 5.8b for the proof.
c. There was a typo in early printings; under normality κ = 3. Under normality we have
κ = E[X−μ]⁴/σ⁴ = E[ ((X−μ)/σ)⁴ ] = E Z⁴,
where Z ∼ n(0,1). Now, using Lemma 3.6.5 with g(z) = z³ we have
κ = E Z⁴ = E[ g(Z)Z ] = 1·E(3Z²) = 3 E Z² = 3.
To minimize MSE(aS²) in general, write Var(S²) = Bσ⁴. Then minimizing MSE(aS²) is equivalent to minimizing a²B + (a−1)². Set the derivative of this equal to 0 (B is not a function of a) to obtain that the minimizing value of a is 1/(B+1). Using the expression in part (b), under normality the minimizing value of a is
1/(B+1) = 1/( (1/n)( 3 − (n−3)/(n−1) ) + 1 ) = (n−1)/(n+1).
d. There was a typo in early printings; the minimizing a is
a = (n−1) / ( (n+1) + (κ−3)(n−1)/n ).
To obtain this simply calculate 1/(B+1) with (from part (b))
B = (1/n)( κ − (n−3)/(n−1) ).
e. Using the expression for ain part (d), if κ= 3 the second term in the denominator is
zero and a= (n1)/(n+ 1), the normal result from part (c). If κ < 3, the second term
in the denominator is negative. Because we are dividing by a smaller value, we have a >
(n1)/(n+ 1). Because Var(S2) = Bσ4,B > 0, and, hence, a= 1/(B+ 1) <1. Similarly, if
κ > 3, the second term in the denominator is positive. Because we are dividing by a larger
value, we have a < (n1)/(n+ 1).
7.46 a. For the uniform(θ, 2θ) distribution we have E X= (2θ+θ)/2 = 3θ/2. So we solve 3θ/2 = ¯
X
for θto obtain the method of moments estimator ˜
θ= 2 ¯
X/3.
b. Let x(1), . . . , x(n)denote the observed order statistics. Then, the likelihood function is
L(θ|x) = 1
θnI[x(n)/2,x(1)](θ).
Because 1nis decreasing, this is maximized at ˆ
θ=x(n)/2. So ˆ
θ=X(n)/2 is the MLE. Use
the pdf of X(n)to calculate E X(n)=2n+1
n+1 θ. So E ˆ
θ=2n+1
2n+2 θ, and if k= (2n+ 2)/(2n+ 1),
Ekˆ
θ=θ.
c. From Exercise 6.23, a minimal sufficient statistic for θis (X(1), X(n)). ˜
θis not a function
of this minimal sufficient statistic. So by the Rao-Blackwell Theorem, E(˜
θ|X(1), X(n)) is an
unbiased estimator of θ(˜
θis unbiased) with smaller variance than ˜
θ. The MLE is a function
of (X(1), X(n)), so it can not be improved with the Rao-Blackwell Theorem.
d. ˜
θ= 2(1.16)/3 = .7733 and ˆ
θ= 1.33/2 = .6650.
7.47 Xin(r, σ2), so ¯
Xn(r, σ2/n) and E ¯
X2=r2+σ2/n. Thus E [(π¯
X2πσ2/n)] = πr2is
best unbiased because ¯
Xis a complete sufficient statistic. If σ2is unknown replace it with s2
and the conclusion still holds.
7.48 a. The Cram´er-Rao Lower Bound for unbiased estimates of pis
hd
dp pi2
nEd2
dp2logL(p|X)=1
nEnd2
dp2log[pX(1 p)1X]o=1
nEnX
p2(1X)
(1p)2o=p(1 p)
n,
because E X=p. The MLE of pis ˆp=PiXi/n, with E ˆp=pand Var ˆp=p(1 p)/n. Thus
ˆpattains the CRLB and is the best unbiased estimator of p.
b. By independence, E(X1X2X3X4) = QiEXi=p4, so the estimator is unbiased. Because
PiXiis a complete sufficient statistic, Theorems 7.3.17 and 7.3.23 imply that E(X1X2X3X4|
PiXi) is the best unbiased estimator of p4. Evaluating this yields
E X1X2X3X4X
i
Xi=t!=P(X1=X2=X3=X4= 1,Pn
i=5 Xi=t4)
P(PiXi=t)
=p4n4
t4pt4(1 p)nt
n
tpt(1 p)nt=n4
t4.n
t,
for t4. For t < 4 one of the Xis must be zero, so the estimator is E(X1X2X3X4|PiXi=
t) = 0.
7.49 a. From Theorem 5.5.9, Y=X(1) has pdf
fY(y) = n!
(n1)!
1
λey/λ h1(1 ey/λ)in1=n
λeny/λ.
Thus Yexponential(λ/n) so E Y=λ/n and nY is an unbiased estimator of λ.
b. Because fX(x) is in the exponential family, PiXiis a complete sufficient statistic and
E (nX(1)|PiXi) is the best unbiased estimator of λ. Because E (PiXi) = , we must
have E (nX(1)|PiXi) = PiXi/n by completeness. Of course, any function of PiXithat
is an unbiased estimator of λis the best unbiased estimator of λ. Thus, we know directly
that because E(PiXi) = ,PiXi/n is the best unbiased estimator of λ.
c. From part (a), ˆ
λ= 601.2 and from part (b) ˆ
λ= 128.8. Maybe the exponential model is not
a good assumption.
7.50 a. E(a¯
X+ (1 a)cS) = aE¯
X+ (1 a)E(cS) = + (1 a)θ=θ. So a¯
X+ (1 a)cS is an
unbiased estimator of θ.
b. Because ¯
Xand S2are independent for this normal model, Var(a¯
X+(1a)cS) = a2V1+(1
a)2V2, where V1= Var ¯
X=θ2/n and V2= Var(cS) = c2ES2θ2=c2θ2θ2= (c21)θ2.
Use calculus to show that this quadratic function of ais minimized at
a=V2
V1+V2
=(c21)θ2
((1/n) + c21)θ2=(c21)
((1/n) + c21).
c. Use the factorization in Example 6.2.9, with the special values µ=θand σ2=θ2, to show
that ( ¯
X, S2) is sufficient. E( ¯
XcS) = θθ= 0, for all θ. So ¯
XcS is a nonzero function
of ( ¯
X, S2) whose expected value is always zero. Thus ( ¯
X, S2) is not complete.
7.51 a. Straightforward calculation gives:
Eθ(a1¯
X+a2cS)2=a2
1Var ¯
X+a2
2c2Var S+θ2(a1+a21)2.
Because Var ¯
X=θ2/n and Var S= E S2(E S)2=θ2c21
c2, we have
Eθ(a1¯
X+a2cS)2=θ2ha2
1.n+a2
2(c21) + (a1+a21)2i,
and we only need minimize the expression in square brackets, which is independent of θ.
Differentiating yields a2=(n+ 1)c2n1and a1= 1 (n+ 1)c2n1.
b. The estimator Thas minimum MSE over a class of estimators that contain those in Exercise
7.50.
c. Because θ > 0, restricting T0 will improve the MSE.
d. No. It does not fit the definition of either one.
7.52 a. Because the Poisson family is an exponential family with t(x) = x,PiXiis a complete
sufficient statistic. Any function of PiXithat is an unbiased estimator of λis the unique
best unbiased estimator of λ. Because ¯
Xis a function of PiXiand E ¯
X=λ,¯
Xis the best
unbiased estimator of λ.
b. S2is an unbiased estimator of the population variance, that is, E S2=λ.¯
Xis a one-to-one
function of PiXi. So ¯
Xis also a complete sufficient statistic. Thus, E(S2|¯
X) is an unbiased
estimator of λand, by Theorem 7.3.23, it is also the unique best unbiased estimator of λ.
Therefore E(S2|¯
X) = ¯
X. Then we have
Var S2= Var E(S2|¯
X)+ E Var(S2|¯
X) = Var ¯
X+ E Var(S2|¯
X),
so Var S2>Var ¯
X.
c. We formulate a general theorem. Let T(X) be a complete sufficient statistic, and let T0(X) be
any statistic other than T(X) such that E T(X)=ET0(X). Then E[T0(X)|T(X)] = T(X)
and Var T0(X)>Var T(X).
7.53 Let abe a constant and suppose Covθ0(W, U)>0. Then
Varθ0(W+aU) = Varθ0W+a2Varθ0U+ 2aCovθ0(W, U).
Choose a2Covθ0(W, U).Varθ0U, 0. Then Varθ0(W+aU)<Varθ0W, so Wcannot be
best unbiased.
7.55 All three parts can be solved by this general method. Suppose Xf(x|θ) = c(θ)m(x), a < x <
θ. Then 1/c(θ) = Rθ
am(x)dx, and the cdf of Xis F(x) = c(θ)/c(x), a < x < θ. Let Y=X(n)be
the largest order statistic. Arguing as in Example 6.2.23 we see that Yis a complete sufficient
statistic. Thus, any function T(Y) that is an unbiased estimator of h(θ) is the best unbiased
estimator of h(θ). By Theorem 5.4.4 the pdf of Yis g(y|θ) = nm(y)c(θ)n/c(y)n1,a<y<θ.
Consider the equations
Zθ
a
f(x|θ)dx = 1 and Zθ
a
T(y)g(y|θ)dy =h(θ),
which are equivalent to
Zθ
a
m(x)dx =1
c(θ)and Zθ
a
T(y)nm(y)
c(y)n1dy =h(θ)
c(θ)n.
Differentiating both sides of these two equations with respect to θand using the Fundamental
Theorem of Calculus yields
m(θ) = c0(θ)
c(θ)2and T(θ)nm(θ)
c(θ)n1=c(θ)nh0(θ)h(θ)nc(θ)n1c0(θ)
c(θ)2n.
Change θs to ys and solve these two equations for T(y) to get the best unbiased estimator of
h(θ) is
T(y) = h(y) + h0(y)
nm(y)c(y).
For h(θ) = θr,h0(θ) = rθr1.
a. For this pdf, m(x) = 1 and c(θ) = 1. Hence
T(y) = yr+ryr1
n(1/y)=n+r
nyr.
b. If θis the lower endpoint of the support, the smallest order statistic Y=X(1) is a complete
sufficient statistic. Arguing as above yields the best unbiased estimator of h(θ) is
T(y) = h(y)h0(y)
nm(y)c(y).
For this pdf, m(x) = exand c(θ) = eθ. Hence
T(y) = yrryr1
neyey=yrryr1
n.
c. For this pdf, m(x) = exand c(θ) = 1/(eθeb). Hence
T(y) = yrryr1
ney(eyeb) = yrryr1(1 e(by))
n.
7.56 Because Tis sufficient, φ(T) = E[h(X1, . . . , Xn)|T] is a function only of T. That is, φ(T) is an
estimator. If E h(X1, . . . , Xn) = τ(θ), then
Eh(X1,···, Xn) = E [E ( h(X1, . . . , Xn)|T)] = τ(θ),
so φ(T) is an unbiased estimator of τ(θ). By Theorem 7.3.23, φ(T) is the best unbiased estimator
of τ(θ).
7.57 a. Tis a Bernoulli random variable. Hence,
EpT=Pp(T= 1) = Pp n
X
i=1
Xi> Xn+1!=h(p).
b. Pn+1
i=1 Xiis a complete sufficient statistic for θ, so E TPn+1
i=1 Xiis the best unbiased
estimator of h(p). We have
E T
n+1
X
i=1
Xi=y!=P n
X
i=1
Xi> Xn+1
n+1
X
i=1
Xi=y!
=P n
X
i=1
Xi> Xn+1,
n+1
X
i=1
Xi=y!.P n+1
X
i=1
Xi=y!.
The denominator equals n+1
ypy(1 p)n+1y. If y= 0 the numerator is
P n
X
i=1
Xi> Xn+1,
n+1
X
i=1
Xi= 0!= 0.
If y > 0 the numerator is
P n
X
i=1
Xi> Xn+1,
n+1
X
i=1
Xi=y, Xn+1 = 0!+P n
X
i=1
Xi> Xn+1,
n+1
X
i=1
Xi=y, Xn+1 = 1!
which equals
P n
X
i=1
Xi>0,
n
X
i=1
Xi=y!P(Xn+1 = 0) + P n
X
i=1
Xi>1,
n
X
i=1
Xi=y1!P(Xn+1 = 1).
For all y > 0,
P n
X
i=1
Xi>0,
n
X
i=1
Xi=y!=P n
X
i=1
Xi=y!=n
ypy(1 p)ny.
If y= 1 or 2, then
P n
X
i=1
Xi>1,
n
X
i=1
Xi=y1!= 0.
And if y > 2, then
P n
X
i=1
Xi>1,
n
X
i=1
Xi=y1!=P n
X
i=1
Xi=y1!=n
y1py1(1 p)ny+1.
Therefore, the UMVUE is
E T
n+1
X
i=1
Xi=y!=
0 if y= 0
(n
y)py(1p)ny(1p)
(n+1
y)py(1p)ny+1 =(n
y)
(n+1
y)=1
(n+1)(n+1y)if y= 1 or 2
((n
y)+(n
y1))py(1p)ny+1
(n+1
y)py(1p)ny+1 =(n
y)+(n
y1)
(n+1
y)= 1 if y > 2.
7.59 We know T= (n1)S22χ2
n1. Then
ETp/2=1
Γn1
22n1
2Z
0
tp+n1
21et
2dt =2p
2Γp+n1
2
Γn1
2=Cp,n.
Thus
E (n1)S2
σ2!p/2
=Cp,n,
so (n1)p/2Sp.Cp,n is an unbiased estimator of σp. From Theorem 6.2.25, ( ¯
X, S2) is a
complete, sufficient statistic. The unbiased estimator (n1)p/2Sp.Cp,n is a function of ( ¯
X, S2).
Hence, it is the best unbiased estimator.
7.61 The pdf for Yχ2
νis
f(y) = 1
Γ(ν/2)2ν/2yν/21ey/2.
Thus the pdf for S2=σ2Yis
g(s2) = ν
σ2
1
Γ(ν/2)2ν/2s2ν
σ2ν/21
es2ν/(2σ2).
Thus, the log-likelihood has the form (gathering together constants that do not depend on s2
or σ2)
log L(σ2|s2) = log 1
σ2+Klog s2
σ2K0s2
σ2+K00,
where K > 0 and K0>0.
The loss function in Example 7.3.27 is
L(σ2, a) = a
σ2log a
σ21,
so the loss of an estimator is the negative of its likelihood.
7.63 Let a=τ2/(τ2+ 1), so the Bayes estimator is δπ(x) = ax. Then R(µ, δπ) = (a1)2µ2+a2.
As τ2increases, R(µ, δπ) becomes flatter.
7.65 a. Figure omitted.
b. The posterior expected loss is E (L(θ, a)|x) = ecaEecE(aθ)1, where the expectation
is with respect to π(θ|x). Then
d
daE (L(θ, a)|x) = cecaEecset
= 0,
and a=1
clog E eis the solution. The second derivative is positive, so this is the mini-
mum.
c. π(θ|x) = n(¯x, σ2/n). So, substituting into the formula for a normal mgf, we find E e=
ec¯x+σ2c2/2n, and the LINEX posterior loss is
E (L(θ, a)|x) = ec(a¯x)+σ2c2/2nc(a¯x)1.
Substitute E e=ec¯x+σ2c2/2ninto the formula in part (b) to find the Bayes rule is
¯x2/2n.
d. For an estimator ¯
X+b, the LINEX posterior loss (from part (c)) is
E (L(θ, ¯x+b)|x) = ecbec2σ2/2ncb 1.
For ¯
Xthe expected loss is ec2σ2/2n1, and for the Bayes estimator (b=2/2n) the
expected loss is c2σ2/2n. The marginal distribution of ¯
Xis m(¯x) = 1, so the Bayes risk is
infinite for any estimator of the form ¯
X+b.
e. For ¯
X+b, the squared error risk is E (¯
X+b)θ2=σ2/n +b2, so ¯
Xis better than the
Bayes estimator. The Bayes risk is infinite for both estimators.
7.66 Let S=PiXibinomial(n, θ).
a. E ˆ
θ2= ES2
n2=1
n2ES2=1
n2((1 θ)+()2) = θ
n+n1
nθ2.
b. T(i)
n=Pj6=iXj2.(n1)2. For Svalues of i,T(i)
n= (S1)2/(n1)2because the Xi
that is dropped out equals 1. For the other nSvalues of i,T(i)
n=S2/(n1)2because
the Xithat is dropped out equals 0. Thus we can write the estimator as
JK(Tn) = nS2
n2n1
n S(S1)2
(n1)2+ (nS)S2
(n1)2!=S2S
n(n1).
c. E JK(Tn) = 1
n(n1) ((1 θ)+()2) = n2θ22
n(n1) =θ2.
d. For this binomial model, Sis a complete sufficient statistic. Because JK(Tn) is a function of
Sthat is an unbiased estimator of θ2, it is the best unbiased estimator of θ2.
Chapter 8
Hypothesis Testing
8.1 Let X = # of heads out of 1000. If the coin is fair, then X ∼ binomial(1000, 1/2). So
P(X ≥ 560) = Σ_{x=560}^{1000} C(1000, x) (1/2)^x (1/2)^{1000−x} ≈ .0000825,
where a computer was used to do the calculation. For this binomial, E X = 1000p = 500 and Var X = 1000p(1−p) = 250. A normal approximation is also very good for this calculation.
P{X ≥ 560} = P{ (X−500)/√250 ≥ (559.5−500)/√250 } ≈ P{Z ≥ 3.763} ≈ .0000839.
Thus, if the coin is fair, the probability of observing 560 or more heads out of 1000 is very small. We might tend to believe that the coin is not fair, and p > 1/2.
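The two probabilities quoted above can be reproduced with the following short R sketch (base R only).

  sum(dbinom(560:1000, 1000, 1/2))       # exact tail probability, about .0000825
  1 - pnorm((559.5 - 500)/sqrt(250))     # normal approximation with continuity correction, about .0000839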
8.2 Let X ∼ Poisson(λ), and we observed X = 10. To assess if the accident rate has dropped, we could calculate
P(X ≤ 10 | λ = 15) = Σ_{i=0}^{10} e^{−15} 15^i/i! = e^{−15}( 1 + 15 + 15²/2! + ··· + 15¹⁰/10! ) ≈ .11846.
This is a fairly large value, not overwhelming evidence that the accident rate has dropped. (A normal approximation with continuity correction gives a value of .12264.)
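In R (a sketch using only base functions):

  ppois(10, 15)                          # P(X <= 10 | lambda = 15), about .11846
  pnorm((10.5 - 15)/sqrt(15))            # normal approximation with continuity correction, about .1226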
8.3 The LRT statistic is
λ(y) = supθθ0L(θ|y1, . . . , ym)
supΘL(θ|y1, . . . , ym).
Let y=Pm
i=1 yi, and note that the MLE in the numerator is min {y/m,θ0}(see Exercise 7.12)
while the denominator has y/m as the MLE (see Example 7.2.7). Thus
λ(y) = (1 if y/m θ0
(θ0)y(1θ0)my
(y/m)y(1y/m)myif y/m > θ0,
and we reject H0if
(θ0)y(1θ0)my
(y/m)y(1 y/m)my< c.
To show that this is equivalent to rejecting if y > b, we could show λ(y) is decreasing in yso
that λ(y)< c occurs for y > b > mθ0. It is easier to work with log λ(y), and we have
log λ(y) = ylog θ0+ (my) log (1 θ0)ylog y
m(my) log my
m,
and
d
dy logλ(y) = log θ0log(1 θ0)log y
my1
y+ log my
m+ (my)1
my
= log θ0
y/m my
m
1θ0!.
For y/m > θ0, 1 y/m = (my)/m < 1θ0, so each fraction above is less than 1, and the
log is less than 0. Thus d
dy log λ < 0 which shows that λis decreasing in yand λ(y)< c if and
only if y > b.
8.4 For discrete random variables, L(θ|x) = f(x|θ) = P(X=x|θ). So the numerator and denomi-
nator of λ(x) are the supremum of this probability over the indicated sets.
8.5 a. The log-likelihood is
log L(θ, ν|x) = nlog θ+log ν(θ+ 1) log Y
i
xi!, ν x(1),
where x(1) = minixi. For any value of θ, this is an increasing function of νfor νx(1). So
both the restricted and unrestricted MLEs of νare ˆν=x(1). To find the MLE of θ, set
θ log L(θ, x(1)|x) = n
θ+nlog x(1) log Y
i
xi!= 0,
and solve for θyielding
ˆ
θ=n
log( Qixi/xn
(1))=n
T.
(2/∂θ2) log L(θ, x(1)|x) = n/θ2<0, for all θ. So ˆ
θis a maximum.
b. Under H0, the MLE of θis ˆ
θ0= 1, and the MLE of νis still ˆν=x(1). So the likelihood ratio
statistic is
λ(x) = xn
(1)/(Qixi)2
(n/T )nxn2/T
(1) .(Qixi)n/T +1 =T
nneT
(eT)n/T =T
nn
eT+n.
(/∂T ) log λ(x) = (n/T )1. Hence, λ(x) is increasing if Tnand decreasing if Tn.
Thus, Tcis equivalent to Tc1or Tc2, for appropriately chosen constants c1and c2.
c. We will not use the hint, although the problem can be solved that way. Instead, make
the following three transformations. First, let Yi= log Xi,i= 1, . . . , n. Next, make the
n-to-1 transformation that sets Z1= miniYiand sets Z2, . . . , Znequal to the remaining
Yis, with their order unchanged. Finally, let W1=Z1and Wi=ZiZ1,i= 2, . . . , n.
Then you find that the Wis are independent with W1fW1(w) = nenw,w > log ν,
and Wiexponential(1), i= 2, . . . , n. Now T=Pn
i=2 Wigamma(n1,1), and, hence,
2Tgamma(n1,2) = χ2
2(n1).
8.6 a.
λ(x,y) = supΘ0L(θ|x,y)
supΘL(θ|x,y)=supθQn
i=1 1
θexiQm
j=1 1
θeyj
supθ,µ Qn
i=1 1
θexiQm
j=1 1
µeyj
=
supθ1
θm+nexp nPn
i=1 xi+Pm
j=1 yj.θo
supθ,µ 1
θnexp {−Pn
i=1 xi}1
µmexp nPm
j=1 yjo.
Differentiation will show that in the numerator ˆ
θ0= (Pixi+Pjyj)/(n+m), while in the
denominator ˆ
θ= ¯xand ˆµ= ¯y. Therefore,
λ(x,y) = n+m
Pixi+Pjyjn+m
exp n+m
Pixi+PjyjPixi+Pjyj
n
Pixin
exp n
PixiPixim
Pjyjm
exp m
PjyjPjyj
=(n+m)n+m
nnmm
(Pixi)nPjyjm
Pixi+Pjyjn+m.
And the LRT is to reject H0if λ(x,y)c.
b.
λ=(n+m)n+m
nnmm Pixi
Pixi+Pjyj!n Pjyj
Pixi+Pjyj!m
=(n+m)n+m
nnmmTn(1 T)m.
Therefore λis a function of T.λis a unimodal function of Twhich is maximized when
T=n
m+n. Rejection for λcis equivalent to rejection for Taor Tb, where aand b
are constants that satisfy an(1 a)m=bn(1 b)m.
c. When H0is true, PiXigamma(n, θ) and PjYjgamma(m, θ) and they are indepen-
dent. So by an extension of Exercise 4.19b, Tbeta(n, m).
8.7 a.
L(θ, λ|x) =
n
Y
i=1
1
λe(xiθ)I[θ,)(xi) = 1
λn
eixi)I[θ,)(x(1)),
which is increasing in θif x(1) θ(regardless of λ). So the MLE of θis ˆ
θ=x(1). Then
log L
λ =n
λ+Pixinˆ
θ
λ2
set
= 0 nˆ
λ=X
i
xinˆ
θˆ
λ= ¯xx(1).
Because
2log L
λ2=n
λ22Pixinˆ
θ
λ3¯xx(1)
=n
(¯xx(1))22n(¯xx(1))
(¯xx(1))3=n
(¯xx(1))2<0,
we have ˆ
θ=x(1) and ˆ
λ= ¯xx(1) as the unrestricted MLEs of θand λ. Under the restriction
θ0, the MLE of θ(regardless of λ) is
ˆ
θ0=0 if x(1) >0
x(1) if x(1) 0.
For x(1) >0, substituting ˆ
θ0= 0 and maximizing with respect to λ, as above, yields ˆ
λ0= ¯x.
Therefore,
λ(x) = supΘ0L(θ|x)
supΘL(θ|x)=sup{(λ,θ):θ0}L(λ,θ |x)
L(ˆ
θ, ˆ
λ|x)=(1 if x(1) 0
L(¯x,0|x)
L(ˆ
λ,ˆ
θ|x)if x(1) >0,
where
L(¯x, 0|x)
L(ˆ
λ, ˆ
θ|x)=(1/¯x)nen¯x/¯x
1/ˆ
λnen(¯xx(1))/(¯xx(1))= ˆ
λ
¯x!n
=¯xx(1)
¯xn
=1x(1)
¯xn
.
So rejecting if λ(x)cis equivalent to rejecting if x(1)/¯xc, where cis some constant.
b. The LRT statistic is
λ(x) = supβ(1n)eΣixi
supβ(γnn)( Qixi)γ1eΣixγ
i.
The numerator is maximized at ˆ
β0= ¯x. For fixed γ, the denominator is maximized at
ˆ
βγ=Pixγ
i/n. Thus
λ(x) = ¯xnen
supγ(γn/ˆ
βn
γ)( Qixi)γ1eΣixγ
i/ˆ
βγ
=¯xn
supγ(γn/ˆ
βn
γ)( Qixi)γ1.
The denominator cannot be maximized in closed form. Numeric maximization could be used
to compute the statistic for observed data x.
8.8 a. We will first find the MLEs of aand θ. We have
L(a, θ |x) =
n
Y
i=1
1
2πe(xiθ)2/(2),
log L(a, θ |x) =
n
X
i=1 1
2log(2π)1
2(xiθ)2.
Thus
log L
a =
n
X
i=1 1
2a+1
2θa2(xiθ)2=n
2a+1
2θa2
n
X
i=1
(xiθ)2set
= 0
log L
θ =
n
X
i=1 1
2θ+1
22(xiθ)2+1
(xiθ)
=n
2θ+1
22
n
X
i=1
(xiθ)2+n¯x
set
= 0.
We have to solve these two equations simultaneously to get MLEs of aand θ, say ˆaand ˆ
θ.
Solve the first equation for ain terms of θto get
a=1
n
X
i=1
(xiθ)2.
Substitute this into the second equation to get
n
2θ+n
2θ+n(¯xθ)
= 0.
So we get ˆ
θ= ¯x, and
ˆa=1
n¯x
n
X
i=1
(xi¯x)2=ˆσ2
¯x,
the ratio of the usual MLEs of the mean and variance. (Verification that this is a maximum
is lengthy. We omit it.) For a= 1, we just solve the second equation, which gives a quadratic
in θthat leads to the restricted MLE
ˆ
θR=1+q1+4(ˆσ2+¯x2)
2.
Noting that ˆaˆ
θ= ˆσ2, we obtain
λ(x) = L(ˆ
θR|x)
La, ˆ
θ|x)=Qn
i=1 1
2πˆ
θR
e(xiˆ
θR)2/(2ˆ
θR)
Qn
i=1 1
2πˆaˆ
θe(xiˆ
θ)2/(2ˆaˆ
θ)
=1/(2πˆ
θR)n/2eΣi(xiˆ
θR)2/(2ˆ
θR)
(1/(2πˆσ2))n/2eΣi(xi¯x)2/(2ˆσ2)
=ˆσ2/ˆ
θRn/2e(n/2)Σi(xiˆ
θR)2/(2ˆ
θR).
b. In this case we have
log L(a, θ |x) =
n
X
i=1 1
2log(2π2)1
22(xiθ)2.
Thus
logL
a =
n
X
i=1 1
2a+1
2a2θ2(xiθ)2=n
2a+1
2a2θ2
n
X
i=1
(xiθ)2set
= 0.
logL
θ =
n
X
i=1 1
θ+1
3(xiθ)2+1
2(xiθ)
=n
θ+1
3
n
X
i=1
(xiθ)2+1
2
n
X
i=1
(xiθ)set
= 0.
Solving the first equation for ain terms of θyields
a=1
2
n
X
i=1
(xiθ)2.
Substituting this into the second equation, we get
n
θ+n
θ+nPi(xiθ)
Pi(xiθ)2= 0.
So again, ˆ
θ= ¯xand
ˆa=1
n¯x2
n
X
i=1
(xi¯x)2=ˆσ2
¯x2
in the unrestricted case. In the restricted case, set a= 1 in the second equation to obtain
log L
θ =n
θ+1
θ3
n
X
i=1
(xiθ)2+1
θ2
n
X
i=1
(xiθ)set
= 0.
Multiply through by θ3/n to get
θ2+1
n
n
X
i=1
(xiθ)2θ
n
n
X
i=1
(xiθ) = 0.
Add ±¯xinside the square and complete all sums to get the equation
θ2+ ˆσ2+ (¯xθ)2+θ(¯xθ) = 0.
This is a quadratic in θwith solution for the MLE
ˆ
θR= ¯x+q¯x+4(ˆσ2+¯x2)2.
which yields the LRT statistic
λ(x) = L(ˆ
θR|x)
La, ˆ
θ|x)=Qn
i=1 1
p2πˆ
θ2
R
e(xiˆ
θR)2/(2ˆ
θ2
R)
Qn
i=1 1
2πˆaˆ
θ2e(xiˆ
θ)2/(2ˆaˆ
θ2)=ˆσ
ˆ
θRn
e(n/2)Σi(xiˆ
θR)2/(2ˆ
θR).
8.9 a. The MLE of λunder H0is ˆ
λ0=¯
Y1, and the MLE of λiunder H1is ˆ
λi=Y1
i. The
LRT statistic is bounded above by 1 and is given by
1¯
Ynen
(QiYi)1en.
Rearrangement of this inequality yields ¯
Y(QiYi)1/n, the arithmetic-geometric mean
inequality.
b. The pdf of Xiis f(xi|λi) = (λi/x2
i)eλi/xi,xi>0. The MLE of λunder H0is ˆ
λ0=
n/ [Pi(1/Xi)], and the MLE of λiunder H1is ˆ
λi=Xi. Now, the argument proceeds as in
part (a).
8.10 Let Y=PiXi. The posterior distribution of λ|yis gamma (y+α, β/(β+ 1)).
a.
P(λλ0|y) = (β+1)y+α
Γ(y+α)βy+αZλ0
0
ty+α1et(β+1)dt.
P(λ>λ0|y) = 1 P(λλ0|y).
b. Because β/(β+ 1) is a scale parameter in the posterior distribution, (2(β+ 1)λ/β)|yhas
a gamma(y+α, 2) distribution. If 2αis an integer, this is a χ2
2y+2αdistribution. So, for
α= 5/2 and β= 2,
P(λλ0|y) = P2(β+1)λ
β2(β+1)λ0
βy=P(χ2
2y+5 3λ0).
8.11 a. From Exercise 7.23, the posterior distribution of σ2given S2is IG(γ, δ), where γ=α+ (n
1)/2 and δ= [(n1)S2/2 + 1]1. Let Y= 2/(σ2δ). Then Y|S2gamma(γ, 2). (Note:
If 2αis an integer, this is a χ2
2γdistribution.) Let Mdenote the median of a gamma(γ, 2)
distribution. Note that Mdepends on only αand n, not on S2or β. Then we have P(Y
2|S2) = P(σ21|S2)>1/2 if and only if
M > 2
δ= (n1)S2+2
β,that is, S2<M2
n1.
b. From Example 7.2.11, the unrestricted MLEs are ˆµ=¯
Xand ˆσ2= (n1)S2/n. Under H0,
ˆµis still ¯
X, because this was the maximizing value of µ, regardless of σ2. Then because
L(¯x, σ2|x) is a unimodal function of σ2, the restricted MLE of σ2is ˆσ2, if ˆσ21, and is 1,
if ˆσ2>1. So the LRT statistic is
λ(x) = 1 if ˆσ21
(ˆσ2)n/2en(ˆσ21)/2if ˆσ2>1.
We have that, for ˆσ2>1,
(ˆσ2)log λ(x) = n
21
ˆσ21<0.
So λ(x) is decreasing in ˆσ2, and rejecting H0for small values of λ(x) is equivalent to rejecting
for large values of ˆσ2, that is, large values of S2. The LRT accepts H0if and only if S2< k,
where kis a constant. We can pick the prior parameters so that the acceptance regions
match in this way. First, pick αlarge enough that M/(n1) > k. Then, as βvaries between
0 and , (M2)/(n1) varies between −∞ and M/(n1). So, for some choice of β,
(M2)/(n1) = kand the acceptance regions match.
8.12 a. For H0:µ0 vs. H1:µ > 0 the LRT is to reject H0if ¯x > cσ/n(Example 8.3.3). For
α=.05 take c= 1.645. The power function is
β(µ) = P¯
Xµ
σ/n>1.645µ
σ/n=PZ > 1.645
σ.
Note that the power will equal .5 when µ= 1.645σ/n.
b. For H0:µ= 0 vs. HA:µ6= 0 the LRT is to reject H0if |¯x|> cσ/n(Example 8.2.2). For
α=.05 take c= 1.96. The power function is
β(µ) = P1.96 nµ/σ Z1.96 + nµ/σ.
In this case, µ=±1.96σ/ngives power of approximately .5.
8.13 a. The size of φ1is α1=P(X1> .95|θ= 0) = .05. The size of φ2is α2=P(X1+X2> C|θ= 0).
If 1 C2, this is
α2=P(X1+X2> C|θ= 0) = Z1
1CZ1
Cx1
1dx2dx1=(2 C)2
2.
Setting this equal to αand solving for Cgives C= 2 2α, and for α=.05, we get
C= 2 .11.68.
b. For the first test we have the power function
β1(θ) = Pθ(X1> .95) = (0 if θ≤ −.05
θ+.05 if .05 < θ .95
1 if .95 < θ.
Using the distribution of Y=X1+X2, given by
fY(y|θ) = (y2θif 2θy < 2θ+ 1
2θ+ 2 yif 2θ+1 y < 2θ+ 2
0 otherwise,
we obtain the power function for the second test as
β2(θ) = Pθ(Y > C) =
0 if θ(C/2) 1
(2θ+ 2 C)2/2 if (C/2) 1< θ (C1)/2
1(C2θ)2/2 if (C1)/2< θ C/2
1 if C/2< θ.
c. From the graph it is clear that φ1is more powerful for θnear 0, but φ2is more powerful for
larger θs. φ2is not uniformly more powerful than φ1.
d. If either X11 or X21, we should reject H0, because if θ= 0, P(Xi<1) = 1. Thus,
consider the rejection region given by
{(x1, x2): x1+x2> C}[{(x1, x2) : x1>1}[{(x1, x2): x2>1}.
The first set is the rejection region for φ2. The test with this rejection region has the same
size as φ2because the last two sets both have probability 0 if θ= 0. But for 0 < θ < C 1,
The power function of this test is strictly larger than β2(θ). If C1θ, this test and φ2
have the same power.
8.14 The CLT tells us that Z = (Σi Xi − np)/√(np(1−p)) is approximately n(0,1). For a test that rejects H0 when Σi Xi > c, we need to find c and n to satisfy
P( Z > (c − n(.49))/√(n(.49)(.51)) ) = .01   and   P( Z > (c − n(.51))/√(n(.51)(.49)) ) = .99.
We thus want
(c − n(.49))/√(n(.49)(.51)) = 2.33   and   (c − n(.51))/√(n(.51)(.49)) = −2.33.
Solving these equations gives n = 13,567 and c = 6,783.5.
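Subtracting the two equations eliminates c, which gives an easy way to reproduce these numbers; a short R sketch of that calculation:

  z <- 2.33; s <- sqrt(.49*.51)
  n <- (2*z*s/.02)^2               # about 13567
  crit <- n*.49 + z*sqrt(n)*s      # about 6783.5
  c(n, crit)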
8.15 From the Neyman-Pearson lemma the UMP test rejects H0if
f(x|σ1)
f(x|σ0)=(2πσ2
1)n/2eΣix2
i/(2σ2
1)
(2πσ2
0)n/2eΣix2
i/(2σ2
0)=σ0
σ1n
exp (1
2X
i
x2
i1
σ2
01
σ2
1)> k
for some k0. After some algebra, this is equivalent to rejecting if
X
i
x2
i>2log (k(σ10)n)
1
σ2
01
σ2
1=cbecause 1
σ2
01
σ2
1
>0.
This is the UMP test of size α, where α=Pσ0(PiX2
i> c). To determine cto obtain a specified
α, use the fact that PiX2
i2
0χ2
n. Thus
α=Pσ0 X
i
X2
i2
0> c/σ2
0!=Pχ2
n> c/σ2
0,
so we must have c/σ2
0=χ2
n,α, which means c=σ2
0χ2
n,α.
8.16 a.
Size = P(reject H0|H0is true) = 1 Type I error = 1.
Power = P(reject H0|HAis true) = 1 Type II error = 0.
b.
Size = P(reject H0|H0is true) = 0 Type I error = 0.
Power = P(reject H0|HAis true) = 0 Type II error = 1.
8.17 a. The likelihood function is
L(µ, θ|x,y) = µn Y
i
xi!µ1
θn
Y
j
yj
θ1
.
Maximizing, by differentiating the log-likelihood, yields the MLEs
ˆµ=n
Pilog xi
and ˆ
θ=m
Pjlog yj
.
Under H0, the likelihood is
L(θ|x,y) = θn+m
Y
i
xiY
j
yj
θ1
,
and maximizing as above yields the restricted MLE,
ˆ
θ0=n+m
Pilog xi+Pjlog yj
.
The LRT statistic is
λ(x,y) = ˆ
θm+n
0
ˆµnˆ
θm Y
i
xi!ˆ
θ0ˆµ
Y
j
yj
ˆ
θ0ˆ
θ
.
b. Substituting in the formulas for ˆ
θ, ˆµand ˆ
θ0yields (Qixi)ˆ
θ0ˆµQjyjˆ
θ0ˆ
θ= 1 and
λ(x,y) = ˆ
θm+n
0
ˆµnˆ
θm=ˆ
θn
0
ˆµn
ˆ
θm
0
ˆ
θm=m+n
mmm+n
nn
(1 T)mTn.
This is a unimodal function of T. So rejecting if λ(x,y)cis equivalent to rejecting if
Tc1or Tc2, where c1and c2are appropriately chosen constants.
c. Simple transformations yield log Xiexponential(1) and log Yiexponential(1).
Therefore, T=W/(W+V) where Wand Vare independent, Wgamma(n, 1) and
Vgamma(m, 1). Under H0, the scale parameters of Wand Vare equal. Then, a
simple generalization of Exercise 4.19b yields Tbeta(n, m). The constants c1and c2are
determined by the two equations
P(Tc1) + P(Tc2) = αand (1 c1)mcn
1= (1 c2)mcn
2.
8.18 a.
β(θ) = Pθ|¯
Xθ0|
σ/n> c= 1 Pθ|¯
Xθ0|
σ/nc
= 1 Pθ
n¯
Xθ0
n
= 1 Pθ/n+θ0θ
σ/n¯
Xθ
σ/n/n+θ0θ
σ/n
= 1 Pc+θ0θ
σ/nZc+θ0θ
σ/n
= 1 + Φ c+θ0θ
σ/nΦc+θ0θ
σ/n,
where Zn(0,1) and Φ is the standard normal cdf.
b. The size is .05 = β(θ0) = 1 + Φ(c)Φ(c) which implies c= 1.96. The power (1
type II error) is
.75 β(θ0+σ) = 1 + Φ(cn)Φ(cn) = 1 + Φ(1.96n)
| {z }
0
Φ(1.96 n).
Φ(.675) .25 implies 1.96 n=.675 implies n= 6.943 7.
8.19 The pdf of Y is
f(y|θ) = (1/θ) y^{(1/θ)−1} e^{−y^{1/θ}},  y > 0.
By the Neyman-Pearson Lemma, the UMP test will reject if
(1/2) y^{−1/2} e^{y − y^{1/2}} = f(y|2)/f(y|1) > k.
To see the form of this rejection region, we compute
(d/dy)[ (1/2) y^{−1/2} e^{y − y^{1/2}} ] = (1/2) y^{−3/2} e^{y − y^{1/2}} ( y − y^{1/2}/2 − 1/2 ),
which is negative for y < 1 and positive for y > 1. Thus f(y|2)/f(y|1) is decreasing for y ≤ 1 and increasing for y ≥ 1. Hence, rejecting for f(y|2)/f(y|1) > k is equivalent to rejecting for y ≤ c0 or y ≥ c1. To obtain a size α test, the constants c0 and c1 must satisfy
α = P(Y ≤ c0 | θ = 1) + P(Y ≥ c1 | θ = 1) = 1 − e^{−c0} + e^{−c1}   and   f(c0|2)/f(c0|1) = f(c1|2)/f(c1|1).
Solving these two equations numerically, for α = .10, yields c0 = .076546 and c1 = 3.637798. The Type II error probability is
P(c0 < Y < c1 | θ = 2) = ∫_{c0}^{c1} (1/2) y^{−1/2} e^{−y^{1/2}} dy = −e^{−y^{1/2}} |_{c0}^{c1} = .609824.
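The numerical solution for c0 and c1 can be reproduced with the R sketch below; the bracketing intervals passed to uniroot are ad hoc choices that happen to contain the roots.

  ratio <- function(y) 0.5*y^(-1/2)*exp(y - sqrt(y))     # f(y|2)/f(y|1)
  size <- function(c0) {
    c1 <- uniroot(function(y) ratio(y) - ratio(c0), c(1, 50))$root
    (1 - exp(-c0)) + exp(-c1) - 0.10
  }
  c0 <- uniroot(size, c(1e-6, 0.99))$root
  c1 <- uniroot(function(y) ratio(y) - ratio(c0), c(1, 50))$root
  c(c0, c1)                                  # about .0765 and 3.6378
  exp(-sqrt(c0)) - exp(-sqrt(c1))            # Type II error, about .6098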
8.20 By the Neyman-Pearson Lemma, the UMP test rejects for large values of f(x|H1)/f(x|H0). Computing this ratio we obtain

  x                   1    2    3    4    5    6    7
  f(x|H1)/f(x|H0)     6    5    4    3    2    1   .84

The ratio is decreasing in x. So rejecting for large values of f(x|H1)/f(x|H0) corresponds to rejecting for small values of x. To get a size α test, we need to choose c so that P(X ≤ c | H0) = α. The value c = 4 gives the UMP size α = .04 test. The Type II error probability is P(X = 5, 6, 7 | H1) = .82.
8.21 The proof is the same with integrals replaced by sums.
8.22 a. From Corollary 8.3.13 we can base the test on Σi Xi, the sufficient statistic. Let Y = Σi Xi ∼ binomial(10, p) and let f(y|p) denote the pmf of Y. By Corollary 8.3.13, a test that rejects if f(y|1/4)/f(y|1/2) > k is UMP of its size. By Exercise 8.25c, the ratio f(y|1/2)/f(y|1/4) is increasing in y. So the ratio f(y|1/4)/f(y|1/2) is decreasing in y, and rejecting for large values of the ratio is equivalent to rejecting for small values of y. To get α = .0547, we must find c such that P(Y ≤ c | p = 1/2) = .0547. Trying values c = 0, 1, . . ., we find that for c = 2, P(Y ≤ 2 | p = 1/2) = .0547. So the test that rejects if Y ≤ 2 is the UMP size α = .0547 test. The power of the test is P(Y ≤ 2 | p = 1/4) ≈ .526.
b. The size of the test is P(Y ≥ 6 | p = 1/2) = Σ_{k=6}^{10} C(10,k) (1/2)^k (1/2)^{10−k} ≈ .377. The power function is β(θ) = Σ_{k=6}^{10} C(10,k) θ^k (1−θ)^{10−k}.
c. There is a nonrandomized UMP test for all α levels corresponding to the probabilities P(Y ≤ i | p = 1/2), where i is an integer. For n = 10, α can have any of the values 0, 1/1024, 11/1024, 56/1024, 176/1024, 386/1024, 638/1024, 848/1024, 968/1024, 1013/1024, 1023/1024, and 1.
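The numbers in parts (a)–(c) can be reproduced with the following R sketch.

  pbinom(2, 10, 1/2)                     # size of the test that rejects when Y <= 2: .0547
  pbinom(2, 10, 1/4)                     # its power at p = 1/4: about .526
  1 - pbinom(5, 10, 1/2)                 # size in part (b): about .377
  1024*cumsum(dbinom(0:10, 10, 1/2))     # numerators of the attainable alpha levels in part (c)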
8.23 a. The test is Reject H0if X > 1/2. So the power function is
β(θ) = Pθ(X > 1/2) = Z1
1/2
Γ(θ+1)
Γ(θ)Γ(1)xθ1(1 x)11dx =θ1
θxθ
1
1/2
= 1 1
2θ.
The size is supθH0β(θ) = supθ1(1 1/2θ) = 1 1/2 = 1/2.
b. By the Neyman-Pearson Lemma, the most powerful test of H0:θ= 1 vs. H1:θ= 2 is given
by Reject H0if f(x|2)/f(x|1) > k for some k0. Substituting the beta pdf gives
f(x|2)
f(x|1) =
1
β(2,1) x21(1 x)11
1
β(1,1) x11(1 x)11=Γ(3)
Γ(2)Γ(1)x= 2x.
Thus, the MP test is Reject H0if X > k/2. We now use the αlevel to determine k. We have
α= sup
θΘ0
β(θ) = β(1) = Z1
k/2
fX(x|1) dx =Z1
k/2
1
β(1,1)x11(1 x)11dx = 1 k
2.
Thus 1 k/2 = α, so the most powerful αlevel test is reject H0if X > 1α.
c. For θ2> θ1,f(x|θ2)/f(x|θ1)=(θ21)xθ2θ1, an increasing function of xbecause θ2> θ1.
So this family has MLR. By the Karlin-Rubin Theorem, the test that rejects H0if X > t is
the UMP test of its size. By the argument in part (b), use t= 1 αto get size α.
8.24 For H0:θ=θ0vs. H1:θ=θ1, the LRT statistic is
λ(x) = L(θ0|x)
max{L(θ0|x), L(θ1|x)}=1 if L(θ0|x)L(θ1|x)
L(θ0|x)/L(θ1|x) if L(θ0|x)< L(θ1|x).
The LRT rejects H0if λ(x)< c. The Neyman-Pearson test rejects H0if f(x|θ1)/f(x|θ0) =
L(θ1|x)/L(θ0|x)> k. If k= 1/c > 1, this is equivalent to L(θ0|x)/L(θ1|x)< c, the LRT. But
if c1 or k1, the tests will not be the same. Because cis usually chosen to be small (k
large) to get a small size α, in practice the two tests are often the same.
8.25 a. For θ2> θ1,
g(x|θ2)
g(x|θ1)=e(xθ2)2/2σ2
e(xθ1)2/2σ2=ex(θ2θ1)2e(θ2
1θ2
2)/2σ2.
Because θ2θ1>0, the ratio is increasing in x. So the families of n(θ, σ2) have MLR.
b. For θ2> θ1,
g(x|θ2)
g(x|θ1)=eθ2θx
2/x!
eθ1θx
1/x!=θ2
θ1x
eθ1θ2,
which is increasing in xbecause θ21>1. Thus the Poisson(θ) family has an MLR.
c. For θ2> θ1,
g(x|θ2)
g(x|θ1)=n
xθx
2(1θ2)nx
n
xθx
1(1θ1)nx=θ2(1θ1)
θ1(1θ2)x1θ2
1θ1n
.
Both θ21>1 and (1 θ1)/(1 θ2)>1. Thus the ratio is increasing in x, and the family
has MLR.
(Note: You can also use the fact that an exponential family h(x)c(θ) exp(w(θ)x) has MLR if
w(θ) is increasing in θ(Exercise 8.27). For example, the Poisson(θ) pmf is eθexp(xlog θ)/x!,
and the family has MLR because log θis increasing in θ.)
8.26 a. We will prove the result for continuous distributions. But it is also true for discrete MLR
families. For θ1> θ2, we must show F(x|θ1)F(x|θ2). Now
d
dx [F(x|θ1)F(x|θ2)] = f(x|θ1)f(x|θ2) = f(x|θ2)f(x|θ1)
f(x|θ2)1.
Because fhas MLR, the ratio on the right-hand side is increasing, so the derivative can only
change sign from negative to positive showing that any interior extremum is a minimum.
Thus the function in square brackets is maximized by its value at or −∞, which is zero.
b. From Exercise 3.42, location families are stochastically increasing in their location param-
eter, so the location Cauchy family with pdf f(x|θ)=(π[1+(xθ)2])1is stochastically
increasing. The family does not have MLR.
8.27 For θ2> θ1,
g(t|θ2)
g(t|θ1)=c(θ2)
c(θ1)e[w(θ2)w(θ1)]t
which is increasing in tbecause w(θ2)w(θ1)>0. Examples include n(θ, 1), beta(θ, 1), and
Bernoulli(θ).
8.28 a. For θ2> θ1, the likelihood ratio is
f(x|θ2)
f(x|θ1)=eθ1θ21+exθ1
1+exθ22
.
The derivative of the quantity in brackets is
d
dx
1+exθ1
1+exθ2=exθ1exθ2
(1+exθ2)2.
Because θ2> θ1,exθ1> exθ2, and, hence, the ratio is increasing. This family has MLR.
b. The best test is to reject H0if f(x|1)/f(x|0) > k. From part (a), this ratio is increasing
in x. Thus this inequality is equivalent to rejecting if x>k0. The cdf of this logistic is
F(x|θ) = exθ.(1 + exθ). Thus
α= 1 F(k0|0) = 1
1+ek0and β=F(k0|1) = ek01
1+ek01.
For a specified α,k0= log(1 α). So for α=.2, k01.386 and β.595.
c. The Karlin-Rubin Theorem is satisfied, so the test is UMP of its size.
8.29 a. Let θ2> θ1. Then
f(x|θ2)
f(x|θ1)=1+(xθ1)2
1+(xθ2)2=1 + (1+θ1)2/x22θ1/x
1 + (1+θ2)2/x22θ2/x.
The limit of this ratio as x→ ∞ or as x→ −∞ is 1. So the ratio cannot be monotone
increasing (or decreasing) between −∞ and . Thus, the family does not have MLR.
b. By the Neyman-Pearson Lemma, a test will be UMP if it rejects when f(x|1)/f(x|0) > k,
for some constant k. Examination of the derivative shows that f(x|1)/f(x|0) is decreasing
for x(1 5)/2 = .618, is increasing for (1 5)/2x(1 + 5)/2 = 1.618, and is
decreasing for (1 + 5)/2x. Furthermore, f(1|1)/f(1|0) = f(3|1)/f(3|0) = 2. So rejecting
if f(x|1)/f(x|0) >2 is equivalent to rejecting if 1 < x < 3. Thus, the given test is UMP of
its size. The size of the test is
P(1 < X < 3|θ= 0) = Z3
1
1
π
1
1+x2dx =1
πarctanx
3
1.1476.
The Type II error probability is
1P(1 < X < 3|θ= 1) = 1 Z3
1
1
π
1
1+(x1)2dx = 1 1
πarctan(x1)
3
1.6476.
c. We will not have f(1|θ)/f(1|0) = f(3|θ)/f(3|0) for any other value of θ6= 1. Try θ= 2, for
example. So the rejection region 1 <x<3 will not be most powerful at any other value of
θ. The test is not UMP for testing H0:θ0 versus H1:θ > 0.
8.30 a. For θ2> θ1>0, the likelihood ratio and its derivative are
f(x|θ2)
f(x|θ1)=θ2
θ1
θ2
1+x2
θ2
2+x2and d
dx
f(x|θ2)
f(x|θ1)=θ2
θ1
θ2
2θ2
1
(θ2
2+x2)2x.
The sign of the derivative is the same as the sign of x(recall, θ2
2θ2
1>0), which changes
sign. Hence the ratio is not monotone.
b. Because f(x|θ) = (θ/π)(θ2+|x|2)1,Y=|X|is sufficient. Its pdf is
f(y|θ) = 2θ
π
1
θ2+y2, y > 0.
Differentiating as above, the sign of the derivative is the same as the sign of y, which is
positive. Hence the family has MLR.
8.31 a. By the Karlin-Rubin Theorem, the UMP test is to reject H0 if Σi Xi > k, because Σi Xi is sufficient and Σi Xi ∼ Poisson(nλ), which has MLR. Choose the constant k to satisfy P( Σi Xi > k | λ = λ0 ) = α.
b.
P( Σi Xi > k | λ = 1 ) ≈ P( Z > (k − n)/√n ) = .05,
P( Σi Xi > k | λ = 2 ) ≈ P( Z > (k − 2n)/√(2n) ) = .90.
Thus, solve for k and n in
(k − n)/√n = 1.645   and   (k − 2n)/√(2n) = −1.28,
yielding n = 12 and k = 17.70.
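Subtracting the second equation from the first eliminates k and gives √n = 1.645 + 1.28√2; a short R sketch of the calculation:

  sqrtn <- 1.645 + 1.28*sqrt(2)
  n <- sqrtn^2                # about 11.9, so take n = 12
  k <- 12 + 1.645*sqrt(12)    # about 17.70
  c(n, k)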
8.32 a. This is Example 8.3.15.
b. This is Example 8.3.19.
8.33 a. From Theorems 5.4.4 and 5.4.6, the marginal pdf of Y1and the joint pdf of (Y1, Yn) are
f(y1|θ) = n(1 (y1θ))n1, θ < y1< θ + 1,
f(y1, yn|θ) = n(n1)(yny1)n2, θ < y1< yn< θ + 1.
Under H0,P(Yn1) = 0. So
α=P(Y1k|0) = Z1
k
n(1 y1)n1dy1= (1 k)n.
Thus, use k= 1 α1/n to have a size αtest.
b. For θk1, β(θ) = 0. For k1< θ 0,
β(θ) = Zθ+1
k
n(1 (y1θ))n1dy1= (1 k+θ)n.
For 0 < θ k,
β(θ) = Zθ+1
k
n(1 (y1θ))n1dy1+Zk
θZθ+1
1
n(n1)(yny1)n2dyndy1
=α+ 1 (1 θ)n.
And for k < θ,β(θ) = 1.
c. (Y1, Yn) are sufficient statistics. So we can attempt to find a UMP test using Corollary 8.3.13
and the joint pdf f(y1, yn|θ) in part (a). For 0 < θ < 1, the ratio of pdfs is
f(y1, yn|θ)
f(y1, yn|0) =(0 if 0 < y1θ,y1< yn<1
1 if θ < y1< yn<1
if 1 yn< θ + 1, θ < y1< yn.
For 1 θ, the ratio of pdfs is
f(y1, yn|θ)
f(y1, yn|0) =0 if y1< yn<1
if θ < y1< yn< θ + 1.
For 0 < θ < k, use k0= 1. The given test always rejects if f(y1, yn|θ)/f(y1, yn|0) >1 and
always accepts if f(y1, yn|θ)/f(y1, yn|0) <1. For θk, use k0= 0. The given test always
rejects if f(y1, yn|θ)/f(y1, yn|0) >0 and always accepts if f(y1, yn|θ)/f(y1, yn|0) <0. Thus
the given test is UMP by Corollary 8.3.13.
d. According to the power function in part (b), β(θ) = 1 for all θk= 1 α1/n. So these
conditions are satisfied for any n.
8.34 a. This is Exercise 3.42a.
b. This is Exercise 8.26a.
8.35 a. We will use the equality in Exercise 3.17 which remains true so long as ν > α. Recall that
Yχ2
ν= gamma(ν/2,2). Thus, using the independence of Xand Ywe have
ET0= E X
pY= (E X)νEY1/2=µνΓ((ν1)/2)
Γ(ν/2)2
if ν > 1. To calculate the variance, compute
E(T0)2= E X2
Y= (E X2)νEY1= (µ2+ 1)νΓ((ν2)/2)
Γ(ν/2)2 =(µ2+ 1)ν
ν2
if ν > 2. Thus, if ν > 2,
Var T0=(µ2+ 1)ν
ν2µνΓ((ν1)/2)
Γ(ν/2)22
b. If δ= 0, all the terms in the sum for k= 1,2, . . . are zero because of the δkterm. The
expression with just the k= 0 term and δ= 0 simplifies to the central tpdf.
c. The argument that the noncentral thas an MLR is fairly involved. It may be found in
Lehmann (1986, p. 295).
8.37 a. P(¯
X > θ0+zασ/n|θ0) = P(¯
Xθ0)/(σ/n)> zα|θ0=P(Z > zα) = α, where Z
n(0,1). Because ¯xis the unrestricted MLE, and the restricted MLE is θ0if ¯x > θ0, the LRT
statistic is, for ¯xθ0
λ(x) = (2πσ2)n/2eΣi(xiθ0)2/2σ2
(2πσ2)n/2eΣi(xi¯x)2/2σ2=e[n(¯xθ0)2+(n1)s2]].2σ2
e(n1)s2/2σ2=en(¯xθ0)2/2σ2.
and the LRT statistic is 1 for ¯x<θ0. Thus, rejecting if λ < c is equivalent to rejecting if
(¯xθ0)/(σ/n)> c0(as long as c < 1 – see Exercise 8.24).
b. The test is UMP by the Karlin-Rubin Theorem.
c. P(¯
X > θ0+tn1S/n|θ=θ0) = P(Tn1> tn1) = α, when Tn1is a Student’s
trandom variable with n1 degrees of freedom. If we define ˆσ2=1
nP(xi¯x)2and
ˆσ2
0=1
nP(xiθ0)2, then for ¯xθ0the LRT statistic is λ= (ˆσ2/ˆσ2
0)n/2, and for ¯x < θ0the
LRT statistic is λ= 1. Writing ˆσ2=n1
ns2and ˆσ2
0= (¯xθ0)2+n1
ns2, it is clear that the
LRT is equivalent to the t-test because λ<cwhen
n1
ns2
(¯xθ0)2+n1
ns2=(n1)/n
(¯xθ0)2/s2+(n1)/n < c0and ¯xθ0,
which is the same as rejecting when (¯xθ0)/(s/n) is large.
d. The proof that the one-sided ttest is UMP unbiased is rather involved, using the bounded
completeness of the normal distribution and other facts. See Chapter 5 of Lehmann (1986)
for a complete treatment.
8.38 a.
Size = Pθ0n|¯
Xθ0|> tn1,α/2pS2/no
= 1 Pθ0ntn1,α/2pS2/n ¯
Xθ0tn1,α/2pS2/no
= 1 Pθ0(tn1,α/2¯
Xθ0
pS2/n tn1,α/2) ¯
Xθ0
pS2/n tn1under H0!
= 1 (1 α) = α.
b. The unrestricted MLEs are ˆ
θ=¯
Xand ˆσ2=Pi(Xi¯
X)2/n. The restricted MLEs are
ˆ
θ0=θ0and ˆσ2
0=Pi(Xiθ0)2/n. So the LRT statistic is
λ(x) = (2πˆσ0)n/2exp{−nˆσ2
0/(2ˆσ2
0)}
(2πˆσ)n/2exp{−nˆσ2/(2ˆσ2)}
="Pi(xi¯x)2
Pi(xiθ0)2#n/2
="Pi(xi¯x)2
Pi(xi¯x)2+n(¯xθ0)2#n/2
.
For a constant c, the LRT is
reject H0if "Pi(xi¯x)2
Pi(xi¯x)2+n(¯xθ0)2#=1
1 + n(¯xθ0)2/Pi(xi¯x)2< c2/n.
After some algebra we can write the test as
reject H0if |¯xθ0|>c2/n 1(n1) s2
n1/2
.
We now choose the constant cto achieve size α, and we
reject if |¯xθ0|> tn1,α/2ps2/n.
c. Again, see Chapter 5 of Lehmann (1986).
8.39 a. From Exercise 4.45c, Wi=XiYin(µW, σ2
W), where µXµY=µWand σ2
X+σ2
Y
ρσXσY=σ2
W. The Wis are independent because the pairs (Xi, Yi) are.
b. The hypotheses are equivalent to H0:µW= 0 vs H1:µW6= 0, and, from Exercise 8.38, if
we reject H0when |¯
W|> tn1,α/2pS2
W/n, this is the LRT (based on W1, . . . , Wn) of size
α. (Note that if ρ > 0, Var Wican be small and the test will have good power.)
8.41 a.
λ(x,y) = supH0L(µX, µY, σ2|x,y)
supL(µX, µY, σ2|x,y)=L(ˆµ, ˆσ2
0|x,y)
L(ˆµX,ˆµY,ˆσ2
1|x,y).
Under H0, the Xis and Yis are one sample of size m+nfrom a n(µ, σ2) population, where
µ=µX=µY. So the restricted MLEs are
ˆµ=PiXi+PiYi
n+m=n¯x+n¯y
n+mand ˆσ2
0=Pi(Xiˆµ)2+Pi(Yiˆµ)2
n+m.
To obtain the unrestricted MLEs, ˆµx, ˆµy, ˆσ2, use
L(µX, µY, σ2|x, y) = (2πσ2)(n+m)/2ei(xiµX)2i(yiµY)2]/2σ2.
Firstly, note that ˆµX= ¯xand ˆµY= ¯y, because maximizing over µXdoes not involve µY
and vice versa. Then
log L
σ2=n+m
2
1
σ2+1
2"X
i
(xiˆµX)2+X
i
(yiˆµY)2#1
(σ2)2
set
= 0
implies
ˆσ2="n
X
i=1
(xi¯x)2+
m
X
i=1
(yi¯y)2#1
n+m.
To check that this is a maximum,
2log L
(σ2)2ˆσ2
=n+m
2
1
(σ2)2"X
i
(xiˆµX)2+X
i
(yiˆµY)2#1
(σ2)3ˆσ2
=n+m
2
1
(ˆσ2)2(n+m)1
(ˆσ2)2=n+m
2
1
(ˆσ2)2<0.
Thus, it is a maximum. We then have
λ(x,y) =
(2πˆσ2
0)n+m
2exp n1
2ˆσ2
0hPn
i=1 (xiˆµ)2+Pm
i=1 (yiˆµ)2io
(2πˆσ2)n+m
2exp n1
2ˆσ2hPn
i=1 (xi¯x)2+Pm
i=1 (yi¯y)2io =ˆσ2
0
ˆσ2
1n+m
2
and the LRT is rejects H0if ˆσ2
0/ˆσ2> k. In the numerator, first substitute ˆµ= (n¯x+
m¯y)/(n+m) and write
n
X
i=1 xin¯x+m¯y
n+m2
=
n
X
i=1 (xi¯x)+ ¯xn¯x+m¯y
n+m2
=
n
X
i=1
(xi¯x)2+nm2
(n+m)2(¯x¯y)2,
because the cross term is zero. Performing a similar operation on the Ysum yields
ˆσ2
0
ˆσ2=P(xi¯x)2+P(yi¯y)2+nm
n+m(¯x¯y)2
ˆσ2=n+m+nm
n+m
(¯x¯y)2
ˆσ2.
Because ˆσ2=n+m2
n+mS2
p, large values of ˆσ2
0.ˆσ2are equivalent to large values of (¯x¯y)2.S2
p
and large values of |T|. Hence, the LRT is the two-sample t-test.
b.
T=¯
X¯
Y
qS2
p(1/n + 1/m)
=
(¯
X¯
Y).pσ2(1/n + 1/m)
q[(n+m2)S2
p2]/(n+m2)
.
Under H0, ( ¯
X¯
Y)n(0, σ2(1/n+1/m)). Under the model, (n1)S2
X2and (m1)S2
Y2
are independent χ2random variables with (n1) and (m1) degrees of freedom. Thus,
(n+m2)S2
p2= (n1)S2
X2+ (m1)S2
Y2χ2
n+m2. Furthermore, ¯
X¯
Yis
independent of S2
Xand S2
Y, and, hence, S2
p.SoTtn+m2.
c. The two-sample ttest is UMP unbiased, but the proof is rather involved. See Chapter 5 of
Lehmann (1986).
d. For these data we have n = 14, X̄ = 1249.86, S²_X = 591.36, m = 9, Ȳ = 1261.33, S²_Y = 176.00 and S²_p = 433.13. Therefore, T = −1.29 and comparing this to a t21 distribution gives a p-value of .21. So there is no evidence that the mean age differs between the core and periphery.
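A short R sketch that reproduces the pooled-variance t calculation from the summary statistics quoted above:

  n <- 14; xbar <- 1249.86; s2x <- 591.36
  m <- 9;  ybar <- 1261.33; s2y <- 176.00
  s2p <- ((n-1)*s2x + (m-1)*s2y)/(n+m-2)      # pooled variance, about 433.13
  T <- (xbar - ybar)/sqrt(s2p*(1/n + 1/m))    # about -1.29
  2*pt(-abs(T), df = n+m-2)                   # two-sided p-value, about .21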
8.42 a. The Satterthwaite approximation states that if Yiχ2
ri, where the Yi’s are independent,
then
X
i
aiYi
approx
χ2
ˆν
ˆνwhere ˆν=(PiaiYi)2
Pia2
iY2
i/ri
.
We have Y1= (n1)S2
X2
Xχ2
n1and Y2= (m1)S2
Y2
Yχ2
m1. Now define
a1=σ2
X
n(n1) [(σ2
X/n)+(σ2
Y/m)] and a2=σ2
Y
m(m1) [(σ2
X/n)+(σ2
Y/m)].
Then,
XaiYi=σ2
X
n(n1) [(σ2
X/n)+(σ2
Y/m)]
(n1)S2
X
σ2
X
+σ2
Y
m(m1) [(σ2
X/n)+(σ2
Y/m)]
(m1)S2
Y
σ2
Y
=S2
X/n +S2
Y/m
σ2
X/n+σ2
Y/m χ2
ˆν
ˆν
where
ˆν=S2
X/n+S2
Y/m
σ2
X/n+σ2
Y/m 2
1
(n1)
S4
X
n2(σ2
X/n+σ2
Y/m)2+1
(m1)
S4
Y
m2(σ2
X/n+σ2
Y/m)2
=S2
X/n +S2
Y/m2
S4
X
n2(n1) +S4
Y
m2(m1)
.
b. Because ¯
X¯
YnµXµY, σ2
X/n+σ2
Y/mand S2
X/n+S2
Y/m
σ2
X/n+σ2
Y/m
approx
χ2
ˆν/ˆν, under H0:
µXµY= 0 we have
T0=¯
X¯
Y
qS2
X/n +S2
Y/m
=
(¯
X¯
Y).pσ2
X/n+σ2
Y/m
r(S2
X/n+S2
Y/m)
(σ2
X/n+σ2
Y/m)
approx
tˆν.
c. Using the values in Exercise 8.41d, we obtain T′ = −1.46 and ν̂ = 20.64. So the p-value is .16. There is no evidence that the mean age differs between the core and periphery.
d. F = S²_X/S²_Y = 3.36. Comparing this with an F13,8 distribution yields a p-value of 2P(F ≥ 3.36) = .09. So there is some slight evidence that the variance differs between the core and periphery.
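The Welch statistic, the Satterthwaite degrees of freedom, and the F comparison can be reproduced with this R sketch (using the same summary statistics as in 8.41d).

  n <- 14; xbar <- 1249.86; s2x <- 591.36
  m <- 9;  ybar <- 1261.33; s2y <- 176.00
  v <- s2x/n + s2y/m
  T0 <- (xbar - ybar)/sqrt(v)                             # about -1.46
  nuhat <- v^2/((s2x/n)^2/(n-1) + (s2y/m)^2/(m-1))        # about 20.64
  2*pt(-abs(T0), df = nuhat)                              # p-value, about .16
  2*(1 - pf(s2x/s2y, n-1, m-1))                           # F test p-value, about .09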
8.43 There were typos in early printings. The tstatistic should be
(¯
X¯
Y)(µ1µ2)
q1
n1+ρ2
n2q(n11)s2
X+(n21)s2
Y2
n1+n22
,
and the Fstatistic should be s2
Y/(ρ2s2
X). Multiply and divide the denominator of the tstatistic
by σto express it as
(¯
X¯
Y)(µ1µ2)
qσ2
n1+ρ2σ2
n2
divided by s(n11)s2
X2+ (n21)s2
Y/(ρ2σ2)
n1+n22.
The numerator has a n(0,1) distribution. In the denominator, (n11)s2
X2χ2
n11and
(n21)s2
Y/(ρ2σ2)χ2
n21and they are independent, so their sum has a χ2
n1+n22distribution.
Thus, the statistic has the form of n(0,1)/pχ2
νwhere ν=n1+n22, and the numerator
and denominator are independent because of the independence of sample means and variances
in normal sampling. Thus the statistic has a tn1+n22distribution. The Fstatistic can be
written as s2
Y
ρ2s2
X
=s2
Y/(ρ2σ2)
s2
X2=[(n21)s2
Y/(ρ2σ2)]/(n21)
[(n11)s2
X/(σ2)]/(n11)
which has the form [χ2
n21/(n21)]/[χ2
n11/(n11)] which has an Fn21,n11distribution.
(Note, early printings had a typo with the numerator and denominator degrees of freedom
switched.)
8.44 Test 3 rejects H0:θ=θ0in favor of H1:θ6=θ0if ¯
X > θ0+zα/2σ/nor ¯
X < θ0zα/2σ/n.
Let Φ and φdenote the standard normal cdf and pdf, respectively. Because ¯
Xn(θ, σ2/n),
the power function of Test 3 is
β(θ) = Pθ(¯
X < θ0zα/2σ/n) + Pθ(¯
X > θ0+zα/2σ/n)
= Φ θ0θ
σ/nzα/2+ 1 Φθ0θ
σ/n+zα/2,
and its derivative is
(θ)
=n
σφθ0θ
σ/nzα/2+n
σφθ0θ
σ/n+zα/2.
Because φis symmetric and unimodal about zero, this derivative will be zero only if
θ0θ
σ/nzα/2=θ0θ
σ/n+zα/2,
that is, only if θ=θ0. So, θ=θ0is the only possible local maximum or minimum of the power
function. β(θ0) = αand limθ→±∞ β(θ) = 1. Thus, θ=θ0is the global minimum of β(θ), and,
for any θ06=θ0,β(θ0)> β(θ0). That is, Test 3 is unbiased.
8.45 The verification of size αis the same computation as in Exercise 8.37a. Example 8.3.3 shows
that the power function βm(θ) for each of these tests is an increasing function. So for θ > θ0,
βm(θ)> βm(θ0) = α. Hence, the tests are all unbiased.
8.47 a. This is very similar to the argument for Exercise 8.41.
b. By an argument similar to part (a), this LRT rejects H+
0if
T+=¯
X¯
Yδ
qS2
p1
n+1
m≤ −tn+m2.
c. Because H0is the union of H+
0and H
0, by the IUT method of Theorem 8.3.23 the test
that rejects H0if the tests in parts (a) and (b) both reject is a level αtest of H0. That is,
the test rejects H0if T+≤ −tn+m2and Ttn+m2.
d. Use Theorem 8.3.24. Consider parameter points with µXµY=δand σ0. For any
σ,P(T+≤ −tn+m2) = α. The power of the Ttest is computed from the noncentral t
distribution with noncentrality parameter |µxµY(δ)|/[σ(1/n + 1/m)] = 2δ/[σ(1/n +
1/m)] which converges to as σ0. Thus, P(Ttn+m2)1 as σ0. By Theorem
8.3.24, this IUT is a size αtest of H0.
8.49 a. The p-value is
P( 7 or more successes out of 10 Bernoulli trials | θ = 1/2 )
= C(10,7)(1/2)⁷(1/2)³ + C(10,8)(1/2)⁸(1/2)² + C(10,9)(1/2)⁹(1/2)¹ + C(10,10)(1/2)¹⁰(1/2)⁰
= .171875.
b.
P-value = P( X ≥ 3 | λ = 1 ) = 1 − P( X < 3 | λ = 1 )
= 1 − [ e^{−1}(1²/2!) + e^{−1}(1¹/1!) + e^{−1}(1⁰/0!) ] ≈ .0803.
c.
P-value = P( Σi Xi ≥ 9 | 3λ = 3 ) = 1 − P( Y < 9 | 3λ = 3 )
= 1 − e^{−3}( 3⁸/8! + 3⁷/7! + 3⁶/6! + 3⁵/5! + ··· + 3¹/1! + 3⁰/0! ) ≈ .0038,
where Y = Σ_{i=1}^{3} Xi ∼ Poisson(3λ).
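The three p-values can each be checked with one line of R (a sketch):

  1 - pbinom(6, 10, 1/2)    # part (a): .171875
  1 - ppois(2, 1)           # part (b): about .0803
  1 - ppois(8, 3)           # part (c): about .0038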
8.50 From Exercise 7.26,
π(θ|x) = rn
2πσ2en(θδ±(x))2/(2σ2),
where δ±(x) = ¯x±σ2
na and we use the “+” if θ > 0 and the “” if θ < 0.
a. For K > 0,
P(θ > K|x, a) = rn
2πσ2Z
K
en(θδ+(x))2/(2σ2)=PZ > n
σ[Kδ+(x)],
where Zn(0,1).
b. As a→ ∞,δ+(x)¯xso P(θ > K)PZ > n
σ(K¯x).
c. For K= 0, the answer in part (b) is 1 (p-value) for H0:θ0.
8.51 If α < p(x),
sup
θΘ0
P(W(X)cα) = α < p(x) = sup
θΘ0
P(W(X)W(x)).
Thus W(x)< cαand we could not reject H0at level αhaving observed x. On the other hand,
if αp(x),
sup
θΘ0
P(W(X)cα) = αp(x) = sup
θΘ0
P(W(X)W(x)).
Either W(x)cαin which case we could reject H0at level αhaving observed xor W(x)< cα.
But, in the latter case we could use c0
α=W(x) and have {x0:W(x0)c0
α}define a size α
rejection region. Then we could reject H0at level αhaving observed x.
8.53 a.
P(−∞ < θ < ) = 1
2+1
2
1
2πτ2Z
−∞
eθ2/(2τ2)=1
2+1
2= 1.
b. First calculate the posterior density. Because
f(¯x|θ) = n
2πσ en(¯xθ)2/(2σ2),
we can calculate the marginal density as
mπ(¯x) = 1
2f(¯x|0) + 1
2Z
−∞
f(¯x|θ)1
2πτ eθ2/(2τ2)
=1
2
n
2πσ en¯x2/(2σ2)+1
2
1
2πp(σ2/n)+τ2e¯x2/[2((σ2/n)+τ2)]
(see Exercise 7.22). Then P(θ= 0|¯x) = 1
2f(¯x|0)/mπ(¯x).
c.
P|¯
X|>¯xθ= 0= 1 P|¯
X| ≤ ¯xθ= 0
= 1 P¯x¯
X¯xθ= 0= 2 1Φ¯x/(σ/n),
where Φ is the standard normal cdf.
d. For σ² = τ² = 1 and n = 9 we have a p-value of 2(1 − Φ(3x̄)) and
P(θ = 0 | x̄) = ( 1 + √(1/10) e^{81x̄²/20} )^{−1}.
The p-value of x̄ is usually smaller than the Bayes posterior probability except when x̄ is very close to the θ value specified by H0. The following table illustrates this.

Some p-values and posterior probabilities (n = 9)
  x̄                     0      ±.1    ±.15   ±.2    ±.5    ±.6533  ±.7    ±1     ±2
  p-value of x̄          1      .7642  .6528  .5486  .1336  .05     .0358  .0026  0
  posterior P(θ=0|x̄)    .7597  .7523  .7427  .7290  .5347  .3595   .3030  .0522  0
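The table can be regenerated with the following R sketch of the two formulas above.

  xbar <- c(0, .1, .15, .2, .5, .6533, .7, 1, 2)
  pval <- 2*(1 - pnorm(3*xbar))                           # p-value of xbar
  post <- 1/(1 + sqrt(1/10)*exp(81*xbar^2/20))            # posterior P(theta = 0 | xbar)
  round(cbind(xbar, pval, post), 4)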
8.54 a. From Exercise 7.22, the posterior distribution of θ|xis normal with mean [τ2/(τ2+σ2/n)]¯x
and variance τ2/(1 + 22). So
P(θ0|x) = P Z0[τ2/(τ2+σ2/n)]¯x
pτ2/(1 + 22)!
=P Z≤ − τ
p(σ2/n)(τ2+σ2/n)¯x!=P Zτ
p(σ2/n)(τ2+σ2/n)¯x!.
b. Using the fact that if θ= 0, ¯
Xn(0, σ2/n), the p-value is
P(¯
X¯x) = PZ¯x0
σ/n=PZ1
σ/n¯x
c. For σ2=τ2= 1,
P(θ0|x) = P Z1
p(1/n)(1 + 1/n)¯x!and P(¯
X¯x) = P Z1
p1/n ¯x!.
Because 1
p(1/n)(1 + 1/n)<1
p1/n,
the Bayes probability is larger than the p-value if ¯x0. (Note: The inequality is in the
opposite direction for ¯x < 0, but the primary interest would be in large values of ¯x.)
d. As τ2→ ∞, the constant in the Bayes probability,
τ
p(σ2/n)(τ2+σ2/n)=1
p(σ2/n)(1+σ2/(τ2n)) 1
σ/n,
the constant in the p-value. So the indicated equality is true.
8.55 The formulas for the risk functions are obtained from (8.3.14) using the power function β(θ) =
Φ(zα+θ0θ), where Φ is the standard normal cdf.
8.57 For 0–1 loss by (8.3.12) the risk function for any test is the power function β(µ) for µ0 and
1β(µ) for µ > 0. Let α=P(1 < Z < 2), the size of test δ. By the Karlin-Rubin Theorem,
the test δzαthat rejects if X > zαis also size αand is uniformly more powerful than δ, that
is, βδzα(µ)> βδ(µ) for all µ > 0. Hence,
R(µ, δzα) = 1 βδzα(µ)<1βδ(µ) = R(µ, δ),for all µ > 0.
Now reverse the roles of H0and H1and consider testing H
0:µ > 0 versus H
1:µ0. Consider
the test δthat rejects H
0if X1 or X2, and the test δ
zαthat rejects H
0if Xzα. It is
easily verified that for 0–1 loss δand δhave the same risk functions, and δ
zαand δzαhave the
same risk functions. Furthermore, using the Karlin-Rubin Theorem as before, we can conclude
that δ
zαis uniformly more powerful than δ. Thus we have
R(µ, δ) = R(µ, δ)R(µ, δ
zα) = R(µ, δzα),for all µ0,
with strict inequality if µ < 0. Thus, δzαis better than δ.
Chapter 9
Interval Estimation
9.1 Denote A={x:L(x)θ}and B={x:U(x)θ}. Then AB={x:L(x)θU(x)}
and 1 P{AB}=P{L(X)θor θU(X)} ≥ P{L(X)θor θL(X)}= 1, since
L(x)U(x). Therefore, P(AB) = P(A)+P(B)P(AB) = 1α1+1α21 = 1α1α2.
9.3 a. The MLE of β is X(n) = max Xi. Since β is a scale parameter, X(n)/β is a pivot, and
.05 = Pβ( X(n) ≤ cβ ) = Pβ( all Xi ≤ cβ ) = ( (cβ)^{α0}/β^{α0} )^n = c^{α0 n}
implies c = .05^{1/(α0 n)}. Thus, .95 = Pβ( X(n) > cβ ) = Pβ( X(n)/c > β ), and {β : β < X(n)/.05^{1/(α0 n)}} is a 95% upper confidence limit for β.
b. From 7.10, α̂ = 12.59 and X(n) = 25. So the confidence interval is (0, 25/(.05)^{1/(12.59·14)}) = (0, 25.43).
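A one-line R check of the numerical interval in part (b):

  25/(0.05)^(1/(12.59*14))    # upper confidence limit, about 25.43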
9.4 a.
λ(x, y) = supλ=λ0Lσ2
X, σ2
Yx, y
supλ(0,+)L(σ2
X, σ2
Y|x, y)
The unrestricted MLEs of σ2
Xand σ2
Yare ˆσ2
X=ΣX2
i
nand ˆσ2
Y=ΣY2
i
m, as usual. Under the
restriction, λ=λ0,σ2
Y=λ0σ2
X, and
Lσ2
X, λ0σ2
Xx, y=2πσ2
Xn/22πλ0σ2
Xm/2eΣx2
i/(2σ2
X)·eΣy2
i/(2λ0σ2
X)
=2πσ2
X(m+n)/2λm/2
0e(λ0Σx2
iy2
i)/(2λ0σ2
X)
Differentiating the log likelihood gives
dlog L
d(σ2
X)2=d
2
Xm+n
2log σ2
Xm+n
2log (2π)m
2log λ0λ0Σx2
i+ Σy2
i
2λ0σ2
X
=m+n
2σ2
X1+λ0Σx2
i+ Σy2
i
2λ0σ2
X2set
= 0
which implies
ˆσ2
0=λ0Σx2
i+ Σy2
i
λ0(m+n).
To see this is a maximum, check the second derivative:
d2log L
d(σ2
X)2=m+n
2σ2
X21
λ0λ0Σx2
i+ Σy2
iσ2
X3σ2
X=ˆσ2
0
=m+n
2(ˆσ2
0)2<0,
therefore ˆσ2
0is the MLE. The LRT statistic is
ˆσ2
Xn/2ˆσ2
Ym/2
λm/2
0(ˆσ2
0)(m+n)/2,
and the test is: Reject H0if λ(x, y)< k, where kis chosen to give the test size α.
b. Under H0,PY2
i/(λ0σ2
X)χ2
mand PX2
i2
Xχ2
n, independent. Also, we can write
λ(X, Y ) =
1
n
m+n+Y2
i0σ2
X)/m
X2
i2
X)/n ·m
m+n
n/2
1
m
m+n+X2
i2
X)/n
Y2
i0σ2
X)/m ·n
m+n
m/2
="1
n
n+m+m
m+nF#n/2"1
m
m+n+n
m+nF1#m/2
where F=ΣY2
i0m
ΣX2
i/n Fm,n under H0. The rejection region is
(x, y): 1
hn
n+m+m
m+nFin/2·1
hm
m+n+n
m+nF1im/2< cα
where cαis chosen to satisfy
P(n
n+m+m
m+nFn/2m
n+m+n
m+nF1m/2
< cα)=α.
c. To ease notation, let a=m/(n+m) and b=aPy2
i/Px2
i. From the duality of hypothesis
tests and confidence sets, the set
c(λ) =
λ:1
a+b/λn/2 1
(1 a)+a(1a)
bλ!m/2
cα
is a 1αconfidence set for λ. We now must establish that this set is indeed an interval. To do
this, we establish that the function on the left hand side of the inequality has only an interior
maximum. That is, it looks like an upside-down bowl. Furthermore, it is straightforward to
establish that the function is zero at both λ= 0 and λ=. These facts imply that the set of
λvalues for which the function is greater than or equal to cαmust be an interval. We make
some further simplifications. If we multiply both sides of the inequality by [(1 a)/b]m/2,
we need be concerned with only the behavior of the function
h(λ) = 1
a+b/λn/21
b+m/2
.
Moreover, since we are most interested in the sign of the derivative of h, this is the same as
the sign of the derivative of log h, which is much easier to work with. We have
d
log h(λ) = d
hn
2log(a+b/λ)m
2log(b+)i
=n
2
b/λ2
a+b/λ m
2
a
b+
=1
2λ2(a+b/λ)(b+)a22+ab(nm)λ+nb2.
The sign of the derivative is given by the expression in square brackets, a parabola. It is easy
to see that for λ0, the parabola changes sign from positive to negative. Since this is the
sign change of the derivative, the function must increase then decrease. Hence, the function
is an upside-down bowl, and the set is an interval.
9.5 a. Analogous to Example 9.2.5, the test here will reject H0if T < k(p0). Thus the confidence
set is C={p:Tk(p)}. Since k(p) is nondecreasing, this gives an upper bound on p.
b. k(p) is the integer that simultaneously satisfies
n
X
y=k(p)n
ypy(1 p)ny1αand
n
X
y=k(p)+1 n
ypy(1 p)ny<1α.
9.6 a. For Y=PXibinomial(n, p), the LRT statistic is
λ(y) = n
ypy
0(1 p0)ny
n
yˆpy(1 ˆp)ny=p0(1 ˆp)
ˆp(1 p0)y1p0
1ˆpn
where ˆp=y/n is the MLE of p. The acceptance region is
A(p0) = (y:p0
ˆpy1p0
1ˆpny
k)
where kis chosen to satisfy Pp0(YA(p0)) = 1 α. Inverting the acceptance region to a
confidence set, we have
C(y) = (p:p
ˆpy(1 p)
1ˆpny
k).
b. For given nand observed y, write
C(y) = np: (n/y)y(n/(ny))nypy(1 p)nyko.
This is clearly a highest density region. The endpoints of C(y) are roots of the nth degree
polynomial (in p), (n/y)y(n/(ny))nypy(1 p)nyk. The interval of (10.4.4) is
(p:
ˆpp
pp(1 p)/nzα/2).
The endpoints of this interval are the roots of the second degree polynomial (in p), (ˆpp)2
z2
α/2p(1 p)/n. Typically, the second degree and nth degree polynomials will not have the
same roots. Therefore, the two intervals are different. (Note that when n→ ∞ and y→ ∞,
the density becomes symmetric (CLT). Then the two intervals are the same.)
9.7 These densities have already appeared in Exercise 8.8, where LRT statistics were calculated
for testing H0:a= 1.
a. Using the result of Exercise 8.8(a), the restricted MLE of θ(when a=a0) is
ˆ
θ0=a0+pa2
0+ 4 Px2
i/n
2,
and the unrestricted MLEs are
ˆ
θ= ¯xand ˆa=P(xi¯x)2
n¯x.
9-4 Solutions Manual for Statistical Inference
The LRT statistic is
λ(x) = ˆaˆ
θ
a0ˆ
θ0n/2e1
2a0ˆ
θ0Σ(xiˆ
θ0)2
e1
aˆ
θΣ(xiˆ
θ)2=1
2πa0ˆ
θ0n/2
en/2e1
2a0ˆ
θ0Σ(xiˆ
θ0)2
The rejection region of a size αtest is {x:λ(x)cα}, and a 1 αconfidence set is
{a0:λ(x)cα}.
b. Using the results of Exercise 8.8b, the restricted MLE (for a=a0) is found by solving
a0θ2+ [ˆσ2+ (¯xθ)2] + θ(¯xθ) = 0,
yielding the MLE ˆ
θR= ¯x+p¯x+ 4a0(ˆσ2+ ¯x2)/2a0.
The unrestricted MLEs are
ˆ
θ= ¯xand ˆa=1
n¯x2
n
X
i=1
(xi¯x)2=ˆσ2
¯x2,
yielding the LRT statistic
λ(x) = ˆσ/ˆ
θRne(n/2)Σ(xiˆ
θR)2/(2ˆ
θR).
The rejection region of a size αtest is {x:λ(x)cα}, and a 1 αconfidence set is
{a0:λ(x)cα}.
9.9 Let Z1, . . . , Znbe iid with pdf f(z).
a. For Xif(xµ), (X1, . . . , Xn)(Z1+µ, . . . , Zn+µ), and ¯
XµZ+µµ=¯
Z. The
distribution of ¯
Zdoes not depend on µ.
b. For Xif(x/σ), (X1, . . . , Xn)(σZ1, . . . , σZn), and ¯
XσZ=¯
Z. The distribu-
tion of ¯
Zdoes not depend on σ.
c. For Xif((xµ)), (X1, . . . , Xn)(σZ1+µ, . . . , σZn+µ), and ( ¯
Xµ)/SX
(σZ +µµ)/SσZ+µ=σ¯
Z/(σSZ) = ¯
Z/SZ. The distribution of ¯
Z/SZdoes not depend on
µor σ.
9.11 Recall that if θ is the true parameter, then F_T(T|θ) ∼ uniform(0,1). Thus,
P_{θ₀}({T : α₁ ≤ F_T(T|θ₀) ≤ 1 − α₂}) = P(α₁ ≤ U ≤ 1 − α₂) = 1 − α₂ − α₁,
where U ∼ uniform(0,1). Since
t ∈ {t : α₁ ≤ F_T(t|θ) ≤ 1 − α₂}  ⇔  θ ∈ {θ : α₁ ≤ F_T(t|θ) ≤ 1 − α₂},
the same calculation shows that the interval has confidence 1 − α₂ − α₁.
9.12 If X₁, . . . , Xₙ are iid n(θ, θ), then √n(X̄ − θ)/√θ ∼ n(0,1) and a 1 − α confidence interval is {θ : |√n(x̄ − θ)/√θ| ≤ z_{α/2}}. Solving for θ, we get
{θ : nθ² − θ(2nx̄ + z²_{α/2}) + nx̄² ≤ 0} = {θ : θ ∈ [2nx̄ + z²_{α/2} ± √(4nx̄z²_{α/2} + z⁴_{α/2})] / (2n)}.
Simpler answers can be obtained using the t pivot, (X̄ − θ)/(S/√n), or the χ² pivot, (n−1)S²/θ.
(Tom Wehrly of Texas A&M University notes the following: The largest probability of getting a negative discriminant (hence an empty confidence interval) occurs when √(nθ) = ½ z_{α/2}, and the probability is equal to α/2. The behavior of the intervals for negative values of x̄ is also interesting. When x̄ = 0 the lefthand endpoint is also equal to 0, but when x̄ < 0, the lefthand endpoint is positive. Thus, the interval based on x̄ = 0 contains smaller values of θ than that based on x̄ < 0. The intervals get smaller as x̄ decreases, finally becoming empty.)
Second Edition 9-5
9.13 a. For Y = −1/log X, the pdf of Y is f_Y(y) = (θ/y²) e^(−θ/y), 0 < y < ∞, and
P(Y/2 ≤ θ ≤ Y) = ∫ from θ to 2θ of (θ/y²) e^(−θ/y) dy = e^(−θ/y) evaluated from θ to 2θ = e^(−1/2) − e^(−1) = .239.
b. Since f_X(x) = θx^(θ−1), 0 < x < 1, T = X^θ is a good guess at a pivot, and it is, since f_T(t) = 1, 0 < t < 1. Thus a pivotal interval is formed from P(a < X^θ < b) = b − a and is
{θ : log b / log x ≤ θ ≤ log a / log x}.
Since X^θ ∼ uniform(0,1), the interval will have confidence .239 as long as b − a = .239.
c. The interval in part a) is a special case of the one in part b). To find the best interval, we minimize log b − log a subject to b − a = 1 − α, or b = 1 − α + a. Thus we want to minimize log(1 − α + a) − log a = log(1 + (1−α)/a), which is minimized by taking a as big as possible. Thus, take b = 1 and a = α, and the best 1 − α pivotal interval is {θ : 0 ≤ θ ≤ log α / log x}. Thus the interval in part a) is nonoptimal. A shorter interval with confidence coefficient .239 is {θ : 0 ≤ θ ≤ log(1 − .239)/log x}.
9.14 a. Recall the Bonferroni Inequality (1.2.9), P(A1A2)P(A1) + P(A2)1. Let A1=
P(interval covers µ) and A2=P(interval covers σ2). Use the interval (9.2.14), with tn1,α/4
to get a 1 α/2 confidence interval for µ. Use the interval after (9.2.14) with b=χ2
n1,α/4
and a=χ2
n1,1α/4to get a 1α/2 confidence interval for σ. Then the natural simultaneous
set is
Ca(x) = ((µ, σ2): ¯xtn1,α/4
s
nµ¯x+tn1,α/4
s
n
and (n1)s2
χ2
n1,α/4σ2(n1)s2
χ2
n1,1α/4)
and PCa(X) covers (µ, σ2)=P(A1A2)P(A1) + P(A2)1 = 2(1 α/2) 1 = 1 α.
b. If we replace the µinterval in a) by nµ: ¯xkσ
nµ¯x+kσ
nothen ¯
Xµ
σ/nn(0,1), so we
use zα/4and
Cb(x) = ((µ, σ2): ¯xzα/4
σ
nµ¯x+zα/4
σ
nand (n1)s2
χ2
n1,α/4σ2(n1)s2
χ2
n1,1α/4)
and PCb(X) covers (µ, σ2)2(1 α/2) 1 = 1 α.
c. The sets can be compared graphically in the (µ, σ) plane: Cais a rectangle, since µand σ2
are treated independently, while Cbis a trapezoid, with larger σ2giving a longer interval.
Their areas can also be calculated
Area of Ca=2tn1,α/4
s
n(q(n1)s2 1
χ2
n1,1α/41
χ2
n1,α/4!)
Area of Cb="zα/4
s
n sn1
χ2
n1,1α/4
+sn1
χ2
n1,α/4!#
×(q(n1)s2 1
χ2
n1,1α/41
χ2
n1,α/4!)
and compared numerically.
9-6 Solutions Manual for Statistical Inference
9.15 Fieller’s Theorem says that a 1 αconfidence set for θ=µYXis
(θ: ¯x2t2
n1,α/2
n1s2
X!θ22 ¯x¯yt2
n1,α/2
n1sY X !θ+ ¯y2t2
n1,α/2
n1s2
Y!0).
a. Define a= ¯x2ts2
X,b= ¯x¯ytsY X ,c= ¯y2ts2
Y, where t=t2
n1,α/2
n1. Then the parabola
opens upward if a > 0. Furthermore, if a > 0, then there always exists at least one real root.
This follows from the fact that at θ= ¯y/¯x, the value of the function is negative. For ¯
θ= ¯y/¯x
we have
¯x2ts2
X¯y
¯x2
2 (¯x¯ytsXY )¯y
¯x+¯y2as2
Y
=t¯y2
¯x2s2
X2¯y
¯xsXY +s2
Y
=t"n
X
i=1 ¯y2
¯x2(xi¯x)22¯y
¯x(xi¯x)(yi¯y)+(yi¯y)2#
=t"n
X
i=1 ¯y
¯x(xi¯x)(yi¯y)2#
which is negative.
b. The parabola opens downward if a < 0, that is, if ¯x2< ts2
X. This will happen if the test of
H0:µX= 0 accepts H0at level α.
c. The parabola has no real roots if b2< ac. This can only occur if a < 0.
9.16 a. The LRT (see Example 8.2.1) has rejection region {x:|¯xθ0|> zα/2σ/n}, acceptance
region A(θ0) = {x:zα/2σ/n¯xθ0zα/2σ/n}, and 1αconfidence interval C(θ) =
{θ: ¯xzα/2σ/nθ¯x+zα/2σ/n}.
b. We have a UMP test with rejection region {x: ¯xθ0<zασ/n}, acceptance region
A(θ0) = {x: ¯xθ0≥ −zασ/n}, and 1αconfidence interval C(θ) = {θ: ¯x+zασ/nθ}.
c. Similar to b), the UMP test has rejection region {x: ¯xθ0> zασ/n}, acceptance region
A(θ0) = {x: ¯xθ0zασ/n}, and 1 αconfidence interval C(θ) = {θ: ¯xzασ/nθ}.
9.17 a. Since X − θ ∼ uniform(−1/2, 1/2), P(a ≤ X − θ ≤ b) = b − a. Any a and b satisfying b = a + 1 − α will do. One choice is a = −1/2 + α/2, b = 1/2 − α/2.
b. Since T = X/θ has pdf f(t) = 2t, 0 ≤ t ≤ 1,
P(a ≤ X/θ ≤ b) = ∫ from a to b of 2t dt = b² − a².
Any a and b satisfying b² = a² + 1 − α will do. One choice is a = √(α/2), b = √(1 − α/2).
9.18 a. P_p(X = 1) = C(3,1) p¹(1 − p)² = 3p(1 − p)², maximum at p = 1/3.
P_p(X = 2) = C(3,2) p²(1 − p) = 3p²(1 − p), maximum at p = 2/3.
b. P(X = 0) = C(3,0) p⁰(1 − p)³ = (1 − p)³, and this is greater than P(X = 2) if (1 − p)² > 3p², or 2p² + 2p − 1 < 0. At p = 1/3, 2p² + 2p − 1 = −1/9.
c. To show that this is a 1 α=.442 interval, compare with the interval in Example 9.2.11.
There are only two discrepancies. For example,
P(pinterval |.362 < p < .634) = P(X= 1 or X= 2) > .442
by comparison with Sterne’s procedure, which is given by
Second Edition 9-7
x interval
0 [.000,.305)
1 [.305,.634)
2 [.362,.762)
3 [.695,1].
9.19 For FT(t|θ) increasing in θ, there are unique values θU(t) and θL(t) such that FT(t|θ)<1α
2
if and only if θ < θU(t) and FT(t|θ)>α
2if and only if θ > θL(t). Hence,
P(θL(T)θθU(T)) = P(θθU(T)) P(θθL(T))
=PFT(T)1α
2PFT(T)α
2
= 1 α.
9.21 To construct a 1 αconfidence interval for pof the form {p:`pu}with P(`pu) =
1α, we use the method of Theorem 9.2.12. We must solve for `and uin the equations
(1) α
2=
x
X
k=0 n
kuk(1 u)nkand (2) α
2=
n
X
k=xn
k`k(1 `)nk.
In equation (1) α/2 = P(Kx) = P(Y1u), where Ybeta(nx, x + 1) and
Kbinomial(n, u). This is Exercise 2.40. Let ZF2(nx),2(x+1) and c= (nx)/(x+ 1). By
Theorem 5.3.8c, cZ/(1 + cZ)beta(nx, x + 1) Y. So we want
α/2 = PcZ
(1 + cZ)1u=P1
Zcu
1u.
From Theorem 5.3.8a, 1/Z F2(x+1),2(nx). So we need cu/(1u) = F2(x+1),2(nx),α/2. Solving
for uyields
u=
x+1
nxF2(x+1),2(nx),α/2
1 + x+1
nxF2(x+1),2(nx),α/2
.
A similar manipulation on equation (2) yields the value for `.
9.23 a. The LRT statistic for H0:λ=λ0versus H1:λ6=λ0is
g(y) = e0(0)y/enˆ
λ(nˆ
λ)y,
where Y=PXiPoisson() and ˆ
λ=y/n. The acceptance region for this test is
A(λ0) = {y:g(y)> c(λ0)) where c(λ0) is chosen so that P(YA(λ0)) 1α.g(y) is a
unimodal function of yso A(λ0) is an interval of yvalues. Consider constructing A(λ0) for
each λ0>0. Then, for a fixed y, there will be a smallest λ0, call it a(y), and a largest λ0,
call it b(y), such that yA(λ0). The confidence interval for λis then C(y)=(a(y), b(y)).
The values a(y) and b(y) are not expressible in closed form. They can be determined by a
numerical search, constructing A(λ0) for different values of λ0and determining those values
for which yA(λ0). (Jay Beder of the University of Wisconsin, Milwaukee, reminds us that
since cis a function of λ, the resulting confidence set need not be a highest density region
of a likelihood function. This is an example of the effect of the imposition of one type of
inference (frequentist) on another theory (likelihood).)
b. The procedure in part a) was carried out for y= 558 and the confidence interval was found to
be (57.78,66.45). For the confidence interval in Example 9.2.15, we need the values χ2
1116,.95 =
1039.444 and χ2
1118,.05 = 1196.899. This confidence interval is (1039.444/18,1196.899/18) =
(57.75,66.49). The two confidence intervals are virtually the same.
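For a numerical check, the interval of Example 9.2.15 quoted above can be reproduced in R with y = 558 and n = 9 (so 2n = 18); note that the book's χ²_{ν,.95} denotes the cutoff with .95 area to its right, i.e., qchisq(.05, ν):
y <- 558; n <- 9
c(qchisq(.05, 2*y), qchisq(.95, 2*(y + 1))) / (2*n)   # about (57.75, 66.49)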
9-8 Solutions Manual for Statistical Inference
9.25 The confidence interval derived by the method of Section 9.2.3 is
C(y) = {µ : y + (1/n) log(α/2) ≤ µ ≤ y + (1/n) log(1 − α/2)},
where y = minᵢ xᵢ. The LRT method derives its interval from the test of H₀: µ = µ₀ versus H₁: µ ≠ µ₀. Since Y is sufficient for µ, we can use f_Y(y|µ). We have
λ(y) = sup_{µ=µ₀} L(µ|y) / sup_{µ∈(−∞,∞)} L(µ|y) = [n e^(−n(y−µ₀)) I_[µ₀,∞)(y)] / [n e^(−n(y−y)) I_[y,∞)(y)]
 = e^(−n(y−µ₀)) I_[µ₀,∞)(y) = 0 if y < µ₀, and e^(−n(y−µ₀)) if y ≥ µ₀.
We reject H₀ if λ(y) = e^(−n(y−µ₀)) < c_α, where 0 ≤ c_α ≤ 1 is chosen to give the test level α. To determine c_α, set
α = P{reject H₀ | µ = µ₀} = P(Y > µ₀ − (log c_α)/n or Y < µ₀ | µ = µ₀)
 = P(Y > µ₀ − (log c_α)/n | µ = µ₀) = ∫ from µ₀ − (log c_α)/n to ∞ of n e^(−n(y−µ₀)) dy
 = −e^(−n(y−µ₀)) evaluated from µ₀ − (log c_α)/n to ∞ = e^(log c_α) = c_α.
Therefore, c_α = α and the 1 − α confidence interval is
C(y) = {µ : µ ≤ y ≤ µ − (log α)/n} = {µ : y + (1/n) log α ≤ µ ≤ y}.
To use the pivotal method, note that since µ is a location parameter, a natural pivotal quantity is Z = Y − µ. Then f_Z(z) = n e^(−nz) I_(0,∞)(z). Let P{a ≤ Z ≤ b} = 1 − α, where a and b satisfy
α/2 = ∫ from 0 to a of n e^(−nz) dz = 1 − e^(−na)  ⟹  e^(−na) = 1 − α/2  ⟹  a = −(1/n) log(1 − α/2),
α/2 = ∫ from b to ∞ of n e^(−nz) dz = e^(−nb)  ⟹  −nb = log(α/2)  ⟹  b = −(1/n) log(α/2).
Thus, the pivotal interval is Y + (1/n) log(α/2) ≤ µ ≤ Y + (1/n) log(1 − α/2), the same interval as from Example 9.2.13. To compare the intervals we compare their lengths. We have
Length of LRT interval = y − (y + (1/n) log α) = −(1/n) log α,
Length of pivotal interval = y + (1/n) log(1 − α/2) − (y + (1/n) log(α/2)) = (1/n) log[(1 − α/2)/(α/2)].
Thus, the LRT interval is shorter if −log α < log[(1 − α/2)/(α/2)], and this is always satisfied for 0 < α < 1.
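As a quick numerical check of this comparison, take α = .05 (an arbitrary illustrative choice):
alpha <- .05
-log(alpha)                       # n times the LRT length, about 3.00
log((1 - alpha/2) / (alpha/2))    # n times the pivotal length, about 3.66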
9.27 a. Y=PXigamma(n, λ), and the posterior distribution of λis
π(λ|y) = (y+1
b)n+a
Γ(n+a)
1
λn+a+1 e1
λ(y+1
b),
Second Edition 9-9
an IG n+a, (y+1
b)1. The Bayes HPD region is of the form {λ:π(λ|y)k}, which is
an interval since π(λ|y) is unimodal. It thus has the form {λ:a1(y)λa2(y)}, where a1
and a2satisfy 1
a1n+a+1 e1
a1(y+1
b)=1
a2n+a+1 e1
a2(y+1
b).
b. The posterior distribution is IG(((n1)/2) + a, (((n1)s2/2) + 1/b)1). So the Bayes HPD
region is as in part a) with these parameters replacing n+aand y+ 1/b.
c. As a0 and b→ ∞, the condition on a1and a2becomes
1
a1((n1)/2)+1 e1
a1
(n1)s2
2=1
a2((n1)/2)+1 e1
a2
(n1)s2
2.
9.29 a. We know from Example 7.2.14 that if π(p)beta(a, b), the posterior is π(p|y)beta(y+
a, n y+b) for y=Pxi.Soa1αcredible set for pis:
{p:βy+a,ny+b,1α/2pβy+a,ny+b,α/2}.
b. Converting to an Fdistribution, βc,d =(c/d)F2c,2d
1+(c/d)F2c,2d, the interval is
y+a
ny+bF2(y+a),2(ny+b),1α/2
1 + y+a
ny+bF2(y+a),2(ny+b),1α/2p
y+a
ny+bF2(y+a),2(ny+b),α/2
1 + y+a
ny+bF2(y+a),2(ny+b),α/2
or, using the fact that Fm,n =F1
n,m,
1
1 + ny+b
y+aF2(ny+b),2(y+a),α/2p
y+a
ny+bF2(y+a),2(n+b),α/2
1 + y+a
ny+bF2(y+a),2(ny+b),α/2
.
For this to match the interval of Exercise 9.21, we need x=yand
Lower limit: ny+b=nx+ 1 b= 1
y+a=xa= 0
Upper limit: y+a=x+ 1 a= 1
ny+b=nxb= 0.
So no values of aand bwill make the intervals match.
9.31 a. We continually use the fact that given Y=y,χ2
2yis a central χ2random variable with 2y
degrees of freedom. Hence
Eχ2
2Y= E[E(χ2
2Y|Y)] = E2Y= 2λ
Varχ2
2Y= E[Var(χ2
2Y|Y)] + Var[E(χ2
2Y|Y)]
= E[4Y] + Var[2Y]=4λ+ 4λ= 8λ
mgf = Ee2
2Y= E[E(e2
2Y|Y)] = E 1
12tY
=
X
y=0
eλλ
12ty
y!=eλ+λ
12t.
From Theorem 2.3.15, the mgf of (χ2
2Y2λ)/8λis
etλ/2heλ+λ
1t/2λi.
9-10 Solutions Manual for Statistical Inference
The log of this is
pλ/2tλ+λ
1t/2λ=t2λ
t2+2λ=t2
(t2/λ)+2 t2/2 as λ→ ∞,
so the mgf converges to et2/2, the mgf of a standard normal.
b. Since P(χ2
2Yχ2
2Y) = αfor all λ,
χ2
2Y2λ
8λzαas λ→ ∞.
In standardizing (9.2.22), the upper bound is
nb
nb+1 χ2
2(Y+a),α/22λ
8λ=r8(λ+a)
8λ"nb
nb+1 [χ2
2(Y+a),α/22(λ+a)]
p8(λ+a)+
nb
nb+1 2(λ+a)2λ
p8(λ+a)#.
While the first quantity in square brackets zα/2, the second one has limit
lim
λ→∞
21
nb+1 λ+anb
nb+1
p8(λ+a)→ −∞,
so the coverage probability goes to zero.
9.33 a. Since 0 Ca(x) for every x,P(0 Ca(X)|µ= 0) = 1. If µ > 0,
P(µCa(X)) = P(µmax{0, X +a}) = P(µX+a) (since µ > 0)
=P(Z≥ −a) (Zn(0,1))
=.95 (a= 1.645.)
A similar calculation holds for µ < 0.
b. The credible probability is
Zmax(0,x+a)
min(0,xa)
1
2πe1
2(µx)2=Zmax(x,a)
min(x,a)
1
2πe1
2t2dt
=P(min(x, a)Zmax(x, a)) .
To evaluate this probability we have two cases:
(i) |x| ≤ acredible probability = P(|Z| ≤ a)
(ii) |x|> a credible probability = P(aZ≤ |x|)
Thus we see that for a= 1.645, the credible probability is equal to .90 if |x| ≤ 1.645 and
increases to .95 as |x| → ∞.
9.34 a. A 1 − α confidence interval for µ is {µ : x̄ − 1.96σ/√n ≤ µ ≤ x̄ + 1.96σ/√n}. We need 2(1.96)σ/√n ≤ σ/4, or √n ≥ 4(2)(1.96). Thus we need n ≥ 64(1.96)² ≈ 245.9. So n = 246 suffices.
b. The length of a 95% confidence interval is 2 t_{n−1,.025} S/√n. Thus we need
P(2 t_{n−1,.025} S/√n ≤ σ/4) ≥ .9  ⟺  P(4 t²_{n−1,.025} S²/n ≤ σ²/16) ≥ .9
 ⟺  P( (n−1)S²/σ² ≤ (n−1)n / (t²_{n−1,.025} · 64) ) ≥ .9,
where (n−1)S²/σ² ∼ χ²_{n−1}. We need to solve this numerically for the smallest n that satisfies the inequality
(n−1)n / (t²_{n−1,.025} · 64) ≥ χ²_{n−1,.1}.
Trying different values of n we find that the smallest such n is n = 276, for which
(n−1)n / (t²_{n−1,.025} · 64) = 306.0 ≥ 305.5 = χ²_{n−1,.1}.
As is to be expected, this is somewhat larger than the value found in a).
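The numerical search can be sketched in R; in lower-tail parameterization the cutoffs t_{n−1,.025} and χ²_{n−1,.1} are qt(.975, n−1) and qchisq(.90, n−1):
for (n in 200:300) {
  lhs <- (n - 1) * n / (qt(.975, n - 1)^2 * 64)
  if (lhs >= qchisq(.90, n - 1)) { print(n); break }   # smallest such n: 276
}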
9.35 length = 2zα/2σ/n, and if it is unknown, E(length) = 2tα/2,n1/n, where
c=n1Γ(n1
2)
2Γ(n/2)
and EcS =σ(Exercise 7.50). Thus the difference in lengths is (2σ/n)(zα/2ctα/2). A little
work will show that, as n→ ∞,cconstant. (This can be done using Stirling’s formula along
with Lemma 2.3.14. In fact, some careful algebra will show that c1 as n→ ∞.) Also, we know
that, as n→ ∞,tα/2,n1zα/2. Thus, the difference in lengths (2σ/n)(zα/2ctα/2)0
as n→ ∞.
9.36 The sample pdf is
f(x1, . . . , xn|θ) =
n
Y
i=1
exiI(,)(xi) = eΣ(xi)I(θ,)[min(xi/i)].
Thus T= min(Xi/i) is sufficient by the Factorization Theorem, and
P(T > t) =
n
Y
i=1
P(Xi> it) =
n
Y
i=1 Z
it
exdx =
n
Y
i=1
ei(θt)=en(n+1)
2(tθ),
and
fT(t) = n(n+ 1)
2en(n+1)
2(tθ), t θ.
Clearly, θis a location parameter and Y=Tθis a pivot. To find the shortest confidence
interval of the form [T+a, T +b], we must minimize basubject to the constraint P(b
Y≤ −a)=1α. Now the pdf of Yis strictly decreasing, so the interval length is shortest if
b= 0 and asatisfies
P(0 Y≤ −a) = en(n+1)
2a= 1 α.
So a= 2 log(1 α)/(n(n+ 1)).
9.37 a. The density of Y=X(n)is fY(y) = nyn1n, 0 < y < θ. So θis a scale parameter, and
T=Yis a pivotal quantity. The pdf of Tis fT(t) = ntn1, 0 t1.
b. A pivotal interval is formed from the set
{θ:atb}=nθ:ay
θbo=nθ:y
bθy
ao,
and has length Y(1/a 1/b) = Y(ba)/ab. Since fT(t) is increasing, bais minimized
and ab is maximized if b= 1. Thus shortest interval will have b= 1 and asatisfying
α=Ra
0ntn1dt =ana=α1/n. So the shortest 1 αconfidence interval is {θ:yθ
y1/n}.
9-12 Solutions Manual for Statistical Inference
9.39 Let abe such that Ra
−∞ f(x)dx =α/2. This value is unique for a unimodal pdf if α > 0. Let µ
be the point of symmetry and let b= 2µa. Then f(b) = f(a) and R
bf(x)dx =α/2. aµ
since Ra
−∞ f(x)dx =α/21/2 = Rµ
−∞ f(x)dx. Similarly, bµ. And, f(b) = f(a)>0 since
f(a)f(x) for all xaand Ra
−∞ f(x)dx =α/2>0f(x)>0 for some x < a f(a)>0.
So the conditions of Theorem 9.3.2 are satisfied.
9.41 a. We show that for any interval [a, b] and  > 0, the probability content of [a, b ] is
greater (as long as b > a). Write
Za
b
f(x)dx Zb
a
f(x)dx =Zb
b
f(x)dx Za
a
f(x)dx
f(b)[b(b)] f(a)[a(a)]
[f(b)f(a)] 0,
where all of the inequalities follow because f(x) is decreasing. So moving the interval toward
zero increases the probability, and it is therefore maximized by moving a all the way to zero.
b. T=Yµis a pivot with decreasing pdf fT(t) = nentI[0,](t). The shortest 1 αinterval
on Tis [0,1
nlog α], since
Zb
0
nent dt = 1 αb=1
nlog α.
Since aTbimplies YbµYa, the best 1αinterval on µis Y+1
nlog αµY.
9.43 a. Using Theorem 8.3.12, identify g(t) with f(x|θ1) and f(t) with f(x|θ0). Define φ(t) = 1 if
tCand 0 otherwise, and let φ0be the indicator of any other set C0satisfying RC0f(t)dt
1α. Then (φ(t)φ0(t))(g(t)λf(t)) 0 and
0Z(φφ0)(gλf) = ZC
gZC0
gλZC
fZC0
fZC
gZC0
g,
showing that Cis the best set.
b. For Exercise 9.37, the pivot T=Yhas density ntn1, and the pivotal interval aTb
results in the θinterval Y/b θY/a. The length is proportional to 1/a 1/b, and thus
g(t) = 1/t2. The best set is {t: 1/t2λntn1}, which is a set of the form {t:at1}.
This has probability content 1 αif a=α1/n. For Exercise 9.24 (or Example 9.3.4), the g
function is the same and the density of the pivot is fk, the density of a gamma(k, 1). The
set {t: 1/t2λfk(t)}={t:fk+2(t)λ0}, so the best aand bsatisfy Rb
afk(t)dt = 1 α
and fk+2(a) = fk+2(b).
9.45 a. Since Y=PXigamma(n, λ) has MLR, the Karlin-Rubin Theorem (Theorem 8.3.2)
shows that the UMP test is to reject H0if Y < k(λ0), where P(Y < k(λ0)|λ=λ0) = α.
b. T= 2Yχ2
2nso choose k(λ0) = 1
2λ0χ2
2n,α and
{λ:Yk(λ)}=λ:Y1
2λχ2
2n,α=λ: 0 < λ 2Y2
2n,α
is the UMA confidence set.
c. The expected length is E 2Y
χ2
2n,α =2
χ2
2n,α .
d. X(1) exponential(λ/n), so EX(1) =λ/n. Thus
E(length(C)) = 2×120
251.046λ=.956λ
E(length(Cm)) = λ
120 ×log(.99) =.829λ.
Second Edition 9-13
9.46 The proof is similar to that of Theorem 9.3.5:
Pθ(θ0C(X)) = Pθ(XA(θ0)) Pθ(XA(θ0)) = Pθ(θ0C(X)) ,
where Aand Care any competitors. The inequality follows directly from Definition 8.3.11.
9.47 Referring to (9.3.2), we want to show that for the upper confidence bound, Pθ(θ0C)1α
if θ0θ. We have
Pθ(θ0C) = Pθ(θ0¯
X+zασ/n).
Subtract θfrom both sides and rearrange to get
Pθ(θ0C) = Pθθ0θ
σ/n¯
Xθ
σ/n+zα=PZθ0θ
σ/nzα,
which is less than 1 αas long as θ0θ. The solution for the lower confidence interval is
similar.
9.48 a. Start with the hypothesis test H0:θθ0versus H1:θ < θ0. Arguing as in Example 8.2.4
and Exercise 8.47, we find that the LRT rejects H0if ( ¯
Xθ0)/(S/n)<tn1. So the
acceptance region is {x: (¯xθ0)/(s/n)≥ −tn1}and the corresponding confidence set
is {θ: ¯x+tn1s/nθ}.
b. The test in part a) is the UMP unbiased test so the interval is the UMA unbiased interval.
9.49 a. Clearly, for each σ, the conditional probability Pθ0(¯
X > θ0+zασ/n|σ) = α, hence the
test has unconditional size α. The confidence set is {(θ) : θ¯xzασ/n}, which has
confidence coefficient 1 αconditionally and, hence, unconditionally.
b. From the Karlin-Rubin Theorem, the UMP test is to reject H0if X > c. To make this size
α,
Pθ0(X > c) = Pθ0(X > c|σ= 10) P(σ= 10) + P(X > c|σ= 1) P(σ= 1)
=pP Xθ0
10 >cθ0
10 + (1 p)P(Xθ0> c θ0)
=pP Z > cθ0
10 + (1 p)P(Z > c θ0),
where Zn(0,1). Without loss of generality take θ0= 0. For c=z(αp)/(1p)we have for
the proposed test
Pθ0(reject) = p+ (1 p)PZ > z(αp)/(1p)
=p+ (1 p)(αp)
(1 p)=p+αp=α.
This is not UMP, but more powerful than part a. To get UMP, solve for cin pP (Z >
c/10) + (1 p)P(Z > c) = α, and the UMP test is to reject if X > c. For p= 1/2, α=.05,
we get c= 12.81. If α=.1 and p=.05, c= 1.392 and z.1.05
.95 =.0526= 1.62.
9.51
Pθ(θC(X1, . . . , Xn)) = Pθ¯
Xk1θ¯
X+k2
=Pθk2¯
Xθk1
=Pθk2XZi/n k1,
where Zi=Xiθ,i= 1, . . . , n. Since this is a location family, for any θ,Z1, . . . , Znare iid
with pdf f(z), i. e., the Zis are pivots. So the last probability does not depend on θ.
9-14 Solutions Manual for Statistical Inference
9.52 a. The LRT of H0:σ=σ0versus H1:σ6=σ0is based on the statistic
λ(x) = supµ,σ=σ0L(µ, σ0|x)
supµ,σ(0,)L(µ, σ2|x).
In the denominator, ˆσ2=P(xi¯x)2/n and ˆµ= ¯xare the MLEs, while in the numerator,
σ2
0and ˆµare the MLEs. Thus
λ(x) = 2πσ2
0n/2eΣ(xi¯x)2
2σ2
0
(2πˆσ2)n/2eΣ(xi¯x)2
2σ2
=σ2
0
ˆσ2n/2eΣ(xi¯x)2
2σ2
0
en/2,
and, writing ˆσ2= [(n1)/n]s2, the LRT rejects H0if
σ2
0
n1
ns2n/2
e(n1)s2
2σ2
0< kα,
where kαis chosen to give a size αtest. If we denote t=(n1)s2
σ2
0, then Tχ2
n1under H0,
and the test can be written: reject H0if tn/2et/2< k0
α. Thus, a 1 αconfidence set is
nσ2:tn/2et/2k0
αo=(σ2:(n1)s2
σ2n/2
e(n1)s2
σ2/2k0
α).
Note that the function tn/2et/2is unimodal (it is the kernel of a gamma density) so it
follows that the confidence set is of the form
nσ2:tn/2et/2k0
αo=σ2:atb=σ2:a(n1)s2
σ2b
=σ2:(n1)s2
bσ2(n1)s2
b,
where aand bsatisfy an/2ea/2=bn/2eb/2(since they are points on the curve tn/2et/2).
Since n
2=n+2
21, aand balso satisfy
1
Γn+2
22(n+2)/2a((n+2)/2)1ea/2=1
Γn+2
22(n+2)/2b((n+2)/2)1eb/2,
or, fn+2(a) = fn+2(b).
b. The constants aand bmust satisfy fn1(b)b2=fn1(a)a2. But since b((n1)/2)1b2=
b((n+3)/2)1, after adjusting constants, this is equivalent to fn+3(b) = fn+3(a). Thus, the
values of aand bthat give the minimum length interval must satisfy this along with the
probability constraint. The confidence interval, say I(s2) will be unbiased if (Definition 9.3.7)
c.
Pσ2σ02I(S2)Pσ2σ2I(S2)= 1 α.
Some algebra will establish
Pσ2σ02I(S2)=Pσ2 (n1)S2
2σ02
σ2(n1)S2
2!
=Pσ2χ2
n1
bσ02
σ2χ2
n1
a=Zbc
ac
fn1(t)dt,
Second Edition 9-15
where c=σ022. The derivative (with respect to c) of this last expression is bfn1(bc)
afn1(ac), and hence is equal to zero if both c= 1 (so the interval is unbiased) and
bfn1(b) = afn1(a). From the form of the chi squared pdf, this latter condition is equivalent
to fn+1(b) = fn+1(a).
d. By construction, the interval will be 1 αequal-tailed.
9.53 a. E [blength(C)IC(µ)] = 2b P(|Z| ≤ c), where Zn(0,1).
b. d
dc [2b P(|Z| ≤ c)] = 2σb 21
2πec2/2.
c. If > 1/2πthe derivative is always positive since ec2/2<1.
9.55
E[L((µ,σ), C)] = E [L((µ,σ), C)|S < K]P(S < K) + E [L((µ,σ), C)|S > K]P(S > K)
= E L((µ,σ), C0)|S < KP(S < K) + E [L((µ,σ), C)|S > K]P(S > K)
=RL((µ,σ), C0)+ E [L((µ,σ), C)|S > K]P(S > K),
where the last equality follows because C0=if S > K. The conditional expectation in the
second term is bounded by
E [L((µ,σ), C)|S > K] = E [blength(C)IC(µ)|S > K]
= E [2bcS IC(µ)|S > K]
>E [2bcK 1|S > K] (since S > K and IC1)
= 2bcK 1,
which is positive if K > 1/2bc. For those values of K,C0dominates C.
9.57 a. The distribution of Xn+1 ¯
Xis n[0, σ2(1 + 1/n)], so
PXn+1 ¯
X±zα/2σp1+1/n=P(|Z| ≤ zα/2) = 1 α.
b. ppercent of the normal population is in the interval µ±zp/2σ, so ¯x±kσ is a 1 αtolerance
interval if
P(µ±zp/2σ¯
X±kσ) = P(¯
Xkσ µzp/2σand ¯
X+kσ µ+zp/2σ)1α.
This can be attained by requiring
P(¯
Xkσ µzp/2σ) = α/2 and P(¯
X+kσ µ+zp/2σ) = α/2,
which is attained for k=zp/2+zα/2/n.
c. From part (a), (Xn+1 ¯
X)/(Sp1+1/n)tn1,soa1αprediction interval is ¯
X±
tn1,α/2Sp1+1/n.
Chapter 10
Asymptotic Evaluations
10.1 First calculate some moments for this distribution.
EX=θ/3,EX2= 1/3,VarX=1
3θ2
9.
So 3 ¯
Xnis an unbiased estimator of θwith variance
Var(3 ¯
Xn) = 9(VarX)/n = (3 θ2)/n 0 as n→ ∞.
So by Theorem 10.1.3, 3 ¯
Xnis a consistent estimator of θ.
10.3 a. The log likelihood is
n
2log (2πθ)1
2X(xiθ).
Differentiate and set equal to zero, and a little algebra will show that the MLE is the root
of θ2+θW= 0. The roots of this equation are (1±1+4W)/2, and the MLE is the
root with the plus sign, as it has to be nonnegative.
b. The second derivative of the log likelihood is (2Px2
i+)/(2θ3), yielding an expected
Fisher information of
I(θ) = Eθ2PX2
i+
2θ3=2+n
2θ2,
and by Theorem 10.1.12 the variance of the MLE is 1/I(θ).
10.4 a. Write PXiYi
PX2
i
=PXi(Xi+i)
PX2
i
= 1 + PXii
PX2
i
.
From normality and independence
EXii= 0,VarXii=σ2(µ2+τ2),EX2
i=µ2+τ2,VarX2
i= 2τ2(2µ2+τ2),
and Cov(Xi, Xii) = 0. Applying the formulas of Example 5.5.27, the asymptotic mean
and variance are
EPXiYi
PX2
i1 and Var PXiYi
PX2
i2(µ2+τ2)
[n(µ2+τ2)]2=σ2
n(µ2+τ2)
b. PYi
PXi
=β+Pi
PXi
with approximate mean βand variance σ2/(2).
10-2 Solutions Manual for Statistical Inference
c. 1
nXYi
Xi
=β+1
nXi
Xi
with approximate mean βand variance σ2/(2).
10.5 a. The integral of ET2
nis unbounded near zero. We have
ET2
n>rn
2πσ2Z1
0
1
x2e(xµ)2/2σ2dx > rn
2πσ2KZ1
0
1
x2dx =,
where K= max0x1e(xµ)2/2σ2
b. If we delete the interval (δ, δ), then the integrand is bounded, that is, over the range of
integration 1/x2<12.
c. Assume µ > 0. A similar argument works for µ < 0. Then
P(δ < X < δ) = P[n(δµ)<n(Xµ)<n(δµ)] < P [Z < n(δµ)],
where Zn(0,1). For δ < µ, the probability goes to 0 as n→ ∞.
10.7 We need to assume that τ(θ) is differentiable at θ=θ0, the true value of the parameter. Then
we apply Theorem 5.5.24 to Theorem 10.1.12.
10.9 We will do a more general problem that includes a) and b) as special cases. Suppose we want
to estimate λteλ/t! = P(X=t). Let
T=T(X1, . . . , Xn) = 1 if X1=t
0 if X16=t.
Then ET=P(T= 1) = P(X1=t), so Tis an unbiased estimator. Since PXiis a complete
sufficient statistic for λ, E(T|PXi) is UMVUE. The UMVUE is 0 for y=PXi< t, and for
yt,
E(T|y) = P(X1=t|XXi=y)
=P(X1=t, PXi=y)
P(PXi=y)
=P(X1=t)P(Pn
i=2 Xi=yt)
P(PXi=y)
={λteλ/t!}{[(n1)λ]yte(n1)λ/(yt)!}
()ye/y!
=y
t(n1)yt
ny.
a. The best unbiased estimator of eλis ((n1)/n)y.
b. The best unbiased estimator of λeλis (y/n)[(n1)/n]y1
c. Use the fact that for constants aand b,
d
λabλ=bλλa1(a+λlog b),
to calculate the asymptotic variances of the UMVUEs. We have for t= 0,
ARE n1
nnˆ
λ
, eλ!="eλ
n1
nlog n1
nn#2
,
Second Edition 10-3
and for t= 1
ARE n
n1ˆ
λn1
nnˆ
λ
,ˆ
λeλ!="(λ1)eλ
n
n1n1
n1 + log n1
nn#2
.
Since [(n1)/n]ne1as n→ ∞, both of these AREs are equal to 1 in the limit.
d. For these data, n= 15, PXi=y= 104 and the MLE of λis ˆ
λ=¯
X= 6.9333. The
estimates are
MLE UMVUE
P(X= 0) .000975 .000765
P(X= 1) .006758 .005684
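The entries in this table are easy to reproduce; here is a small R check using n = 15, y = 104, and λ̂ = 6.9333 from above:
n <- 15; y <- 104; lambda.hat <- y / n
c(mle = exp(-lambda.hat), umvue = ((n - 1)/n)^y)                               # P(X = 0)
c(mle = lambda.hat * exp(-lambda.hat), umvue = (y/n) * ((n - 1)/n)^(y - 1))    # P(X = 1)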
10.11 a. It is easiest to use the Mathematica code in Example A.0.7. The second derivative of the
log likelihood is
2
µ2log 1
Γ[µ/β]βµ/β x1+µ/β ex/β =1
β2ψ0(µ/β),
where ψ(z) = Γ0(z)/Γ(z) is the digamma function.
b. Estimation of βdoes not affect the calculation.
c. For µ=αβ known, the MOM estimate of βis ¯x/α. The MLE comes from differentiating
the log likelihood
d
αn log βX
i
xi!set
= 0 β= ¯x/α.
d. The MOM estimate of βcomes from solving
1
nX
i
xi=µand 1
nX
i
x2
i=µ2+µβ,
which yields ˜
β= ˆσ2/¯x. The approximate variance is quite a pain to calculate. Start from
E¯
X=µ, Var ¯
X=1
nµβ, Eˆσ2µβ, Varˆσ22
nµβ3,
where we used Exercise 5.8(b) for the variance of ˆσ2. Now using Example 5.5.27 and (and
assuming the covariance is zero), we have Var ˜
β3β3
. The ARE is then
ARE( ˆ
β, ˜
β) = 3β3Ed2
2l(µ, β|X.
Here is a small table of AREs. There are some entries that are less than one - this is due
to using an approximation for the MOM variance.
µ
β1 3 6 10
1 1.878 0.547 0.262 0.154
2 4.238 1.179 0.547 0.317
3 6.816 1.878 0.853 0.488
4 9.509 2.629 1.179 0.667
5 12.27 3.419 1.521 0.853
6 15.075 4.238 1.878 1.046
7 17.913 5.08 2.248 1.246
8 20.774 5.941 2.629 1.451
9 23.653 6.816 3.02 1.662
10 26.546 7.704 3.419 1.878
10-4 Solutions Manual for Statistical Inference
10.13 Here are the 35 distinct samples from {2,4,9,12}and their weights.
{12,12,12,12},1/256 {9,12,12,12},1/64 {9,9,12,12},3/128
{9,9,9,12},1/64 {9,9,9,9},1/256 {4,12,12,12},1/64
{4,9,12,12},3/64 {4,9,9,12},3/64 {4,9,9,9},1/64
{4,4,12,12},3/128 {4,4,9,12},3/64 {4,4,9,9},3/128
{4,4,4,12},1/64 {4,4,4,9},1/64 {4,4,4,4},1/256
{2,12,12,12},1/64 {2,9,12,12},3/64 {2,9,9,12},3/64
{2,9,9,9},1/64 {2,4,12,12},3/64 {2,4,9,12},3/32
{2,4,9,9},3/64 {2,4,4,12},3/64 {2,4,4,9},3/64
{2,4,4,4},1/64 {2,2,12,12},3/128 {2,2,9,12},3/64
{2,2,9,9},3/128 {2,2,4,12},3/64 {2,2,4,9},3/64
{2,2,4,4},3/128 {2,2,2,12},1/64 {2,2,2,9},1/64
{2,2,2,4},1/64 {2,2,2,2},1/256
The verifications of parts (a)(d) can be done with this table, or the table of means
in Example A.0.1 can be used. For part (e),verifying the bootstrap identities can involve
much painful algebra, but it can be made easier if we understand what the bootstrap sample
space (the space of all nnbootstrap samples) looks like. Given a sample x1, x2, . . . , xn, the
bootstrap sample space can be thought of as a data array with nnrows (one for each
bootstrap sample) and ncolumns, so each row of the data array is one bootstrap sample.
For example, if the sample size is n= 3, the bootstrap sample space is
x1x1x1
x1x1x2
x1x1x3
x1x2x1
x1x2x2
x1x2x3
x1x3x1
x1x3x2
x1x3x3
x2x1x1
x2x1x2
x2x1x3
x2x2x1
x2x2x2
x2x2x3
x2x3x1
x2x3x2
x2x3x3
x3x1x1
x3x1x2
x3x1x3
x3x2x1
x3x2x2
x3x2x3
x3x3x1
x3x3x2
x3x3x3
Note the pattern. The first column is 9 x1s followed by 9 x2s followed by 9 x3s, the second
column is 3 x1s followed by 3 x2s followed by 3 x3s, then repeated, etc. In general, for the
entire bootstrap sample,
Second Edition 10-5
The first column is nn1x1s followed by nn1x2s followed by, . . ., followed by nn1xns
The second column is nn2x1s followed by nn2x2s followed by, . . ., followed by nn2
xns, repeated ntimes
The third column is nn3x1s followed by nn3x2s followed by, . . ., followed by nn3
xns, repeated n2times
.
.
.
The nth column is 1 x1followed by 1 x2followed by, . . ., followed by 1 xn, repeated nn1
times
So now it is easy to see that each column in the data array has mean ¯x, hence the entire
bootstrap data set has mean ¯x. Appealing to the 33×3 data array, we can write the
numerator of the variance of the bootstrap means as
3
X
i=1
3
X
j=1
3
X
k=1 1
3(xi+xj+xk)¯x2
=1
32
3
X
i=1
3
X
j=1
3
X
k=1
[(xi¯x)+(xj¯x)+(xk¯x)]2
=1
32
3
X
i=1
3
X
j=1
3
X
k=1 (xi¯x)2+ (xj¯x)2+ (xk¯x)2,
because all of the cross terms are zero (since they are the sum of deviations from the mean).
Summing up and collecting terms shows that
1
32
3
X
i=1
3
X
j=1
3
X
k=1 (xi¯x)2+ (xj¯x)2+ (xk¯x)2= 3
3
X
i=1
(xi¯x)2,
and thus the average of the variance of the bootstrap means is
3P3
i=1(xi¯x)2
33
which is the usual estimate of the variance of ¯
Xif we divide by ninstead of n1. The
general result should now be clear. The variance of the bootstrap means is
n
X
i1=1
n
X
i2=1 ···
n
X
in=1 1
n(xi1+xi2+··· +xin)¯x2
=1
n2
n
X
i1=1
n
X
i2=1 ···
n
X
in=1 (xi1¯x)2+ (xi2¯x)2+··· + (xin¯x)2,
since all of the cross terms are zero. Summing and collecting terms shows that the sum is
nn2Pn
i=1(xi¯x)2, and the variance of the bootstrap means is nn2Pn
i=1(xi¯x)2/nn=
Pn
i=1(xi¯x)2/n2.
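The bootstrap identities can also be verified by brute force for a small sample; here is an R sketch that enumerates all nⁿ bootstrap samples of the data {2, 4, 9, 12} used above:
x <- c(2, 4, 9, 12); n <- length(x)
boot.means <- rowMeans(as.matrix(expand.grid(rep(list(x), n))))   # all n^n bootstrap sample means
mean(boot.means)                       # equals mean(x)
mean((boot.means - mean(x))^2)         # equals sum((x - mean(x))^2)/n^2
sum((x - mean(x))^2) / n^2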
10.15 a. As B→ ∞ Var
B(ˆ
θ) = Var(ˆ
θ).
b. Each Var
Bi(ˆ
θ) is a sample variance, and they are independent so the LLN applies and
1
m
m
X
i=1
Var
Bi(ˆ
θ)m→∞
EVar
B(ˆ
θ) = Var(ˆ
θ),
where the last equality follows from Theorem 5.2.6(c).
10-6 Solutions Manual for Statistical Inference
10.17 a. The correlation is .7781
b. Here is R code (R is available free at http://cran.r-project.org/) to bootstrap the data,
calculate the standard deviation, and produce the histogram:
library(bootstrap)    # provides the bootstrap() function and the law data used below
cor(law)
n <- 15
theta <- function(x,law){ cor(law[x,1],law[x,2]) }
results <- bootstrap(1:n,1000,theta,law,func=sd)
results[2]
hist(results[[1]])
The data “law” is in two columns of length 15, “results[2]” contains the standard deviation.
The vector “results[[1]]” is the bootstrap sample. The output is
V1 V2
V1 1.0000000 0.7781716
V2 0.7781716 1.0000000
$func.thetastar
[1] 0.1322881
showing a correlation of .7781 and a bootstrap standard deviation of .1323.
c. The R code for the parametric bootstrap is
mx<-600.6;my<-3.09
sdx<-sqrt(1791.83);sdy<-sqrt(.059)
rho<-.7782;b<-rho*sdx/sdy;sdxy<-sqrt(1-rho^2)*sdx
rhodata<-rho
for (j in 1:1000) {
y<-rnorm(15,mean=my,sd=sdy)
x<-rnorm(15,mean=mx+b*(y-my),sd=sdxy)
rhodata<-c(rhodata,cor(x,y))
}
sd(rhodata)
hist(rhodata)
where we generate the bivariate normal by first generating the marginal and then the conditional, as R does not have a built-in bivariate normal generator. The bootstrap standard deviation is 0.1159, smaller than the nonparametric estimate. The histogram looks similar to the nonparametric bootstrap histogram, displaying a skew to the left.
d. The Delta Method approximation is
rn(ρ, (1 ρ2)2/n),
and the “plug-in” estimate of standard error is p(1 .77822)2/15 = .1018, the smallest so
far. Also, the approximate pdf of rwill be normal, hence symmetric.
e. By the change of variables
t = ½ log[(1 + r)/(1 − r)],  dt = dr/(1 − r²),
the density of r is
[√n / (√(2π)(1 − r²))] exp( −(n/2) [ ½ log((1 + r)/(1 − r)) − ½ log((1 + ρ)/(1 − ρ)) ]² ),  −1 ≤ r ≤ 1.
More formally, we could start with the random variable T, normal with mean ½ log[(1 + ρ)/(1 − ρ)] and variance 1/n, make the transformation to R = (e^(2T) − 1)/(e^(2T) + 1), and get the same answer.
Second Edition 10-7
10.19 a. The variance of ¯
Xis
Var ¯
X= E( ¯
Xµ)2= E 1
nX
i
Xiµ!2
=1
n2E
X
i
(Xiµ)2+ 2 X
i>j
(Xiµ)(Xjµ)
=1
n22+ 2n(n1)
2ρσ2
=σ2
n+n1
nρσ2
b. In this case we have
E
X
i>j
(Xiµ)(Xjµ)
=σ2
n
X
i=2
i1
X
j=1
ρij.
In the double sum ρappears n1 times, ρ2appears n2 times, etc.. so
n
X
i=2
i1
X
j=1
ρij=
n1
X
i=1
(ni)ρi=ρ
1ρn1ρn
1ρ,
where the series can be summed using (1.5.4), the partial sum of the geometric series, or
using Mathematica.
c. The mean and variance of Xiare
EXi= E[E(Xi|Xi1)] = EρXi1=··· =ρi1EX1
and
VarXi= VarE(Xi|Xi1) + EVar(Xi|Xi1) = ρ2σ2+ 1 = σ2
for σ2= 1/(1 ρ2). Also, by iterating the expectation
EX1Xi= E[E(X1Xi|Xi1)] = E[E(X1|Xi1)E(Xi|Xi1)] = ρE[X1Xi1],
where we used the facts that X1and Xiare independent conditional on Xi1. Continuing
with the argument we get that EX1Xi=ρi1EX2
1. Thus,
Corr(X1, Xi) = ρi1EX2
1ρi1(EX1)2
VarX1VarXi
=ρi1σ2
σ2σ2=ρi1.
10.21 a. If any xi→ ∞,s2→ ∞, so it has breakdown value 0. To see this, suppose that x1→ ∞.
Write
s2=1
n1
n
X
i=1
(xi¯x)2=1
n1 [(1 1
n)x1¯x1]2+
n
X
i=2
(xi¯x)2!,
where ¯x1= (x2+. . . +xn)/n. It is easy to see that as x1→ ∞, each term in the sum
→ ∞.
b. If less than 50% of the sample → ∞, the median remains the same, and the median of
|xiM|remains the same. If more than 50% of the sample → ∞,M→ ∞ and so does
the MAD.
10-8 Solutions Manual for Statistical Inference
10.23 a. The ARE is [2σf(µ)]2. We have
Distribution Parameters variance f(µ) ARE
normal µ= 0, σ = 1 1 .3989 .64
logistic µ= 0, β = 1 π2/3.25 .82
double exp. µ= 0, σ = 1 2 .5 2
b. If X1, X2, . . . , Xnare iid fXwith EX1=µand VarX1=σ2, the ARE is σ2[2 fX(µ)]2.
If we transform to Yi= (Xiµ), the pdf of Yiis fY(y) = σfX(σy +µ) with ARE
[2 fY(0)]2=σ2[2 fX(µ)]2
c. The median is more efficient for smaller ν, the distributions with heavier tails.
νVarX f(0) ARE
3 3 .367 1.62
5 5/3.379 .960
10 5/4.389 .757
25 25/23 .395 .678
50 25/24 .397 .657
1.399 .637
d. Again the heavier tails favor the median.
δ σ ARE
.01 2 .649
.1 2 .747
.5 2 .895
.01 5 .777
.1 5 1.83
.5 5 2.98
10.25 By transforming y=xθ,
Z
−∞
ψ(xθ)f(xθ)dx =Z
−∞
ψ(y)f(y)dy.
Since ψis an odd function, ψ(y) = ψ(y), and
Z
−∞
ψ(y)f(y)dy =Z0
−∞
ψ(y)f(y)dy +Z
0
ψ(y)f(y)dy
=Z0
−∞ ψ(y)f(y)dy +Z
0
ψ(y)f(y)dy
=Z
0
ψ(y)f(y)dy +Z
0
ψ(y)f(y)dy = 0,
where in the last line we made the transformation y→ −yand used the fact the fis symmetric,
so f(y) = f(y). From the discussion preceding Example 10.2.6, ˆ
θMis asymptotically normal
with mean equal to the true θ.
10.27 a.
lim
δ0
1
δ[(1 δ)µ+δx µ] = lim
δ0
δ(xµ)
δ=xµ.
b.
P(Xa) = P(Xa|XF)(1 δ) + P(xa|X=x)δ= (1 δ)F(a) + δI(xa)
Second Edition 10-9
and
(1 δ)F(a) = 1
2a=F11
2(1 δ)
(1 δ)F(a) + δ=1
2a=F11
2δ
2(1 δ)
c. The limit is
lim
δ0
aδa0
δ=a0
δ|δ=0
by the definition of derivative. Since F(aδ) = 1
2(1δ),
d
F(aδ) = d
1
2(1 δ)
or
f(aδ)a0
δ=1
2(1 δ)2a0
δ=1
2(1 δ)2f(aδ).
Since a0=m, the result follows. The other limit can be calculated in a similar manner.
10.29 a. Substituting cl0for ψmakes the ARE equal to 1.
b. For each distribution is the case that the given ψfunction is equal to cl0, hence the resulting
M-estimator is asymptotically efficient by (10.2.9).
10.31 a. By the CLT,
n1
ˆp1p1
pp1(1 p1)n(0,1) and n2
ˆp2p2
pp2(1 p2)n(0,1),
so if ˆp1and ˆp2are independent, under H0:p1=p2=p,
ˆp1ˆp2
r1
n1+1
n2ˆp(1 ˆp)n(0,1)
where we use Slutsky’s Theorem and the fact that ˆp= (S1+S2)/(n1+n2) is the MLE of
p under H0and converges to pin probability. Therefore, Tχ2
1.
b. Substitute ˆpis for Siand Fis to get
T=n2
1(ˆp1ˆp)2
n1ˆp+n2
2(ˆp2ˆp)2
n2ˆp
+n2
1[(1 ˆp1)(1 ˆp)]2
n1(1 ˆp)+n2
2[(1 ˆp2)(1 ˆp)]2
n2ˆp
=n1(ˆp1ˆp)2
ˆp(1 ˆp)+n2(ˆp2ˆp)2
ˆp(1 ˆp)
Write ˆp= (n1ˆp1+n2ˆp2)/(n1+n2). Substitute this into the numerator, and some algebra
will get
n1(ˆp1ˆp)2+n2(ˆp2ˆp)2=(ˆp1ˆp2)2
1
n1+1
n2
,
so T=T.
10-10 Solutions Manual for Statistical Inference
c. Under H0,ˆp1ˆp2
r1
n1+1
n2p(1 p)n(0,1)
and both ˆp1and ˆp2are consistent, so ˆp1(1 ˆp1)p(1 p) and ˆp2(1 ˆp2)p(1 p) in
probability. Therefore, by Slutsky’s Theorem,
ˆp1ˆp2
qˆp1(1ˆp1)
n1+ˆp2(1ˆp2)
n2
n(0,1),
and (T∗∗)2χ2
1. It is easy to see that T∗∗ 6=Tin general.
d. The estimator (1/n1+ 1/n2)ˆp(1 ˆp) is the MLE of Var(ˆp1ˆp2) under H0, while the
estimator ˆp1(1 ˆp1)/n1+ ˆp2(1 ˆp2)/n1is the MLE of Var(ˆp1ˆp2) under H1. One might
argue that in hypothesis testing, the first one should be used, since under H0, it provides
a better estimator of variance. If interest is in finding the confidence interval, however, we
are making inference under both H0and H1, and the second one is preferred.
e. We have ˆp1= 34/40, ˆp2= 19/35, ˆp= (34 + 19)/(40 + 35) = 53/75, and T= 8.495. Since
χ2
1,.05 = 3.84, we can reject H0at α=.05.
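These numbers are easy to verify in R; as a side check (not part of the original solution), prop.test without continuity correction returns the same Pearson statistic:
p1 <- 34/40; p2 <- 19/35; p <- 53/75
T <- (p1 - p2)^2 / ((1/40 + 1/35) * p * (1 - p)); T           # about 8.495
prop.test(c(34, 19), c(40, 35), correct = FALSE)$statistic    # same value
qchisq(.95, 1)                                                # 3.84, so reject H0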
10.32 a. First calculate the MLEs under p1=p2=p. We have
L(p|x) = px1px2px3···pxn1
n1 12p
n1
X
i=3
pi!mx1x2−···−xn1
.
Taking logs and differentiating yield the following equations for the MLEs:
logL
p =x1+x2
p
2mPn1
i=1 xi
12pPn1
i=3 pi
= 0
logL
pi
=xi
pixn
12pPn1
i=3 pi
= 0, i = 3, . . . , n 1,
with solutions ˆp=x1+x2
2m, ˆpi=xi
m, i = 3, . . . , n 1, and ˆpn=mPn1
i=1 xi/m. Except
for the first and second cells, we have expected = observed, since both are equal to xi. For
the first two terms, expected = mˆp= (x1+x2)/2 and we get
X(observed expected)2
expected =x1x1+x2
22
x1+x2
2
+x2x1+x2
22
x1+x2
2
=(x1x2)2
x1+x2
.
b. Now the hypothesis is about conditional probabilities is given by H0: P(change—initial
agree)=P(change—initial disagree) or, in terms of the parameters H0:p1
p1+p3=p2
p2+p4.
This is the same as p1p4=p2p3, which is not the same as p1=p2.
10.33 Theorem 10.1.12 and Slutsky’s Theorem imply that
ˆ
θθ
q1
nIn(ˆ
θ)n(0,1)
and the result follows.
10.35 a. Since σ/nis the estimated standard deviation of ¯
Xin this case, the statistic is a Wald
statistic
Second Edition 10-11
b. The MLE of σ2is ˆσ2
µ=Pi(xiµ)2/n. The information number is
d2
d(σ2)2 n
2log σ21
2
ˆσ2
µ
σ2!σ2=ˆσ2
µ
=n
2ˆσ2
µ
.
Using the Delta method, the variance of ˆσµ=qˆσ2
µis ˆσ2
µ/8n, and a Wald statistic is
ˆσµσ0
qσ2
µ/8n
.
10.37 a. The log likelihood is
log L=n
2log σ21
2X
i
(xiµ)22
with
d
=1
σ2X
i
(xiµ) = n
σ2(¯xµ)
d2
2=n
σ2,
so the test statistic for the score test is
n
σ2(¯xµ)
pσ2/n =n¯xµ
σ
b. We test the equivalent hypothesis H0:σ2=σ2
0. The likelihood is the same as Exercise
10.35(b), with first derivative
d
2=n(ˆσ2
µσ2)
2σ4
and expected information number
E n(2ˆσ2
µσ2)
2σ6!=n(2σ2σ2)
2σ6=n
2σ4.
The score test statistic is
rn
2
ˆσ2
µσ2
0
σ2
0
10.39 We summarize the results for (a)(c) in the following table. We assume that the underlying
distribution is normal, and use that for all score calculations. The actual data is generated
from normal, logistic, and double exponential. The sample size is 15, we use 1000 simulations
and draw 20 bootstrap samples. Here θ0= 0, and the power is tabulated for a nominal α=.1
test.
10-12 Solutions Manual for Statistical Inference
Underlying
pdf Test θ0θ0+.25σ θ0+.5σ θ0+.75σ θ0+ 1σ θ0+ 2σ
Laplace Naive 0.101 0.366 0.774 0.957 0.993 1.
Boot 0.097 0.364 0.749 0.932 0.986 1.
Median 0.065 0.245 0.706 0.962 0.995 1.
Logistic Naive 0.137 0.341 0.683 0.896 0.97 1.
Boot 0.133 0.312 0.641 0.871 0.967 1.
Median 0.297 0.448 0.772 0.944 0.993 1.
Normal Naive 0.168 0.316 0.628 0.878 0.967 1.
Boot 0.148 0.306 0.58 0.836 0.957 1.
Median 0.096 0.191 0.479 0.761 0.935 1.
Here is Mathematica code:
This program calculates size and power for Exercise 10.39, Second Edition
We do our calculations assuming normality, but simulate power and size under other distri-
butions. We test H0:θ= 0.
theta_0=0;
Needs["Statistics‘Master‘"]
Clear[x]
f1[x_]=PDF[NormalDistribution[0,1],x];
F1[x_]=CDF[NormalDistribution[0,1],x];
f2[x_]=PDF[LogisticDistribution[0,1],x];
f3[x_]=PDF[LaplaceDistribution[0,1],x];
v1=Variance[NormalDistribution[0,1]];
v2=Variance[LogisticDistribution[0,1]];
v3=Variance[LaplaceDistribution[0,1]];
Calculate m-estimate
Clear[k,k1,k2,t,x,y,d,n,nsim,a,w1]
ind[x_,k_]:=If[Abs[x]<k,1,0]
rho[y_,k_]:=ind[y,k]*y^2 + (1-ind[y,k])*(k*Abs[y]-k^2)
alow[d_]:=Min[Mean[d],Median[d]]
aup[d_]:=Max[Mean[d],Median[d]]
sol[k_,d_]:=FindMinimum[Sum[rho[d[[i]]-a,k],{i,1,n}],{a,{alow[d],aup[d]}}]
mest[k_,d_]:=sol[k,d][[2]]
generate data - to change underlying distributions change the sd and the distribution in the
Random statement.
n = 15; nsim = 1000; sd = Sqrt[v1];
theta = {theta_0, theta_0 +.25*sd, theta_0 +.5*sd,
theta_0 +.75*sd, theta_0 + 1*sd, theta_0 +2*sd}
ntheta = Length[theta]
data = Table[Table[Random[NormalDistribution[0, 1]],
{i, 1, n}],{j, 1,nsim}];
m1 = Table[Table[a /. mest[k1, data[[j]] - theta[[i]]],
{j, 1, nsim}], {i, 1, n\theta}];
Calculation of naive variance and test statistic
Psi[x_, k_] = x*If[Abs[x]<= k, 1, 0]- k*If[x < -k, 1, 0] +
Second Edition 10-13
k*If[x > k, 1, 0];
Psi1[x_, k_] = If[Abs[x] <= k, 1, 0];
num =Table[Psi[w1[[j]][[i]], k1], {j, 1, nsim}, {i, 1,n}];
den =Table[Psi1[w1[[j]][[i]], k1], {j, 1, nsim}, {i, 1,n}];
varnaive = Map[Mean, num^2]/Map[Mean, den]^2;
naivestat = Table[Table[m1[[i]][[j]] -theta_0/Sqrt[varnaive[[j]]/n],
{j, 1, nsim}],{i, 1, ntheta}];
absnaive = Map[Abs, naivestat];
N[Table[Mean[Table[If[absnaive[[i]][[j]] > 1.645, 1, 0],
{j, 1, nsim}]], {i, 1, n\theta}]]
Calculation of bootstrap variance and test statistic
nboot=20;
u:=Random[DiscreteUniformDistribution[n]]
databoot=Table[data[[jj]][[u]],{jj,1,nsim},{j,1,nboot},{i,1,n}];
m1boot=Table[Table[a/.mest[k1,databoot[[j]][[jj]]],
{jj,1,nboot}],{j,1,nsim}];
varboot = Map[Variance, m1boot];
bootstat = Table[Table[m1[[i]][[j]] -theta_0/Sqrt[varboot[[j]]],
{j, 1, nsim}], {i, 1, ntheta}];
absboot = Map[Abs, bootstat];
N[Table[Mean[Table[If[absboot[[i]][[j]] > 1.645, 1,0],
{j, 1, nsim}]], {i, 1, ntheta}]]\)
Calculation of median test - use the score variance at the root density (normal)
med = Map[Median, data];
medsd = 1/(n*2*f1[theta_0]);
medstat = Table[Table[med[[j]] + \theta[[i]] - theta_0/medsd,
{j, 1, nsim}], {i, 1, ntheta}];
absmed = Map[Abs, medstat];
N[Table[Mean[Table[If[\(absmed[[i]][[j]] > 1.645, 1, 0],
{j, 1, nsim}]], {i, 1, ntheta}]]
10.41 a. The log likelihood is
log L=nr log p+n¯xlog(1 p)
with
d
dp log L=nr
pn¯x
1pand d2
dp2log L=nr
p2n¯x
(1 p)2,
expected information nr
p2(1p)and (Wilks) score test statistic
nr
pn¯x
1p
qr
p2(1p)
=rn
r(1 p)r+p¯x
1p.
Since this is approximately n(0,1), a 1 αconfidence set is
p:rn
r(1 p)rp¯x
1pzα/2.
10-14 Solutions Manual for Statistical Inference
b. The mean is µ=r(1 p)/p, and a little algebra will verify that the variance, r(1 p)/p2
can be written r(1 p)/p2=µ+µ2/r. Thus
rn
r(1 p)rp¯x
1p=nµ¯x
pµ+µ2/r .
The confidence interval is found by setting this equal to zα/2, squaring both sides, and
solving the quadratic for µ. The endpoints of the interval are
r(8¯x+z2
α/2)±qrz2
α/2q16r¯x+ 16¯x2+rz2
α/2
8r2z2
α/2
.
For the continuity correction, replace ¯xwith ¯x+1/(2n) when solving for the upper endpoint,
and with ¯x1/(2n) when solving for the lower endpoint.
c. We table the endpoints for α=.1 and a range of values of r. Note that r=is the
Poisson, and smaller values of rgive a wider tail to the negative binomial distribution.
rlower bound upper bound
1 22.1796 364.42
5 36.2315 107.99
10 38.4565 95.28
50 40.6807 85.71
100 41.0015 84.53
1000 41.3008 83.46
41.3348 83.34
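For completeness, here is a minimal R sketch of the interval computation. It solves the quadratic obtained by squaring √n(µ − x̄)/√(µ + µ²/r) = z_{α/2}; the arguments xbar, n, r, and alpha must be supplied (the data behind the table above are not reproduced here), and the Poisson limit r = ∞ can be approximated by a very large r:
nb.score.int <- function(xbar, n, r, alpha = .1) {
  z2 <- qnorm(1 - alpha/2)^2
  aa <- n - z2/r                      # quadratic coefficients in mu
  bb <- -(2 * n * xbar + z2)
  cc <- n * xbar^2
  sort((-bb + c(-1, 1) * sqrt(bb^2 - 4 * aa * cc)) / (2 * aa))
}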
10.43 a. Since
P X
i
Xi= 0!= (1 p)n=α/2p= 1 α1/n
and
P X
i
Xi=n!=pn=α/2p=α1/n,
these endpoints are exact, and are the shortest possible.
b. Since p[0,1], any value outside has zero probability, so truncating the interval shortens
it at no cost.
10.45 The continuity corrected roots are
2ˆp+z2
α/2/n ±1
n±rz2
α/2
n3[±2n(1 2ˆp)1] + (2ˆp+z2
α/2/n)24ˆp2(1 + z2
α/2/n)
2(1 + z2
α/2/n)
where we use the upper sign for the upper root and the lower sign for the lower root. Note that
the only differences between the continuity-corrected intervals and the ordinary score intervals
are the terms with ±in front. But it is still difficult to analytically compare lengths with the
non-corrected interval - we will do a numerical comparison. For n= 10 and α=.1 we have
the following table of length ratios, with the continuity-corrected length in the denominator
n0 1 2 3 4 5 6 7 8 9 10
Ratio 0.79 0.82 0.84 0.85 0.86 0.86 0.86 0.85 0.84 0.82 0.79
The coverage probabilities are
Second Edition 10-15
p0.1.2.3.4.5.6.7.8.9 1
score .99 .93 .97 .92 .90 .89 .90 .92 .97 .93 .99
cc .99 .99 .97 .92 .98 .98 .98 .92 .97 .99 .99
Mathematica code to do the calculations is:
Needs["Statistics‘Master‘"]
Clear[p, x]
pbino[p_, x_] = PDF[BinomialDistribution[n, p], x];
cut = 1.645^2;
n = 10;
The quadratic score interval with and without continuity correction
slowcc[x_] := p /. FindRoot[(x/n - 1/(2*n) - p)^2 ==
cut*(p*((1 - p))/n, {p, .001}]
supcc[x_] := p /. FindRoot[(x/n + 1/(2*n) - p)^2 ==
cut*(p*((1 - p)/n, {p, .999}]
slow[x_] := p /. FindRoot[(x/n - p))^2 ==
cut*(p*(1 - p))/n, {p, .001}]
sup[x_] := p /. FindRoot[(x/n - p)^2 ==
cut*(p*(1 - p)/n, {p, .999}]
scoreintcc=Partition[Flatten[{{0,sup[0]},Table[{slowcc[i],supcc[i]},
{i,1,n-1}],{slowcc[n],1}},2],2];
scoreint=Partition[Flatten[{{0,sup[0]},Table[{slow[i],sup[i]},
{i,1,n-1}],{slowcc[n],1}},2],2];
Length Comparison
Table[(sup[i] - slow[i])/(supcc[i] - slowcc[i]), {i, 0, n}]
Now we’ll calculate coverage probabilities
scoreindcc[p_,x_]:=If[scoreintcc[[x+1]][[1]]<=p<=scoreintcc[[x+1]][[2]],1,0]
scorecovcc[p_]:=scorecovcc[p]=Sum[pbino[p,x]*scoreindcc[p,x],{x,0,n}]
scoreind[p_,x_]:=If[scoreint[[x+1]][[1]]<=p<=scoreint[[x+1]][[2]],1,0]
scorecov[p_]:=scorecov[p]=Sum[pbino[p,x]*scoreind[p,x],{x,0,n}]
{scorecovcc[.0001],Table[scorecovcc[i/10],{i,1,9}],scorecovcc[.9999]}//N
{scorecov[.0001],Table[scorecov[i/10],{i,1,9}],scorecov[.9999]}//N
10.47 a. Since 2pY ∼ χ²_{2nr} (approximately),
P(χ²_{2nr,1−α/2} ≤ 2pY ≤ χ²_{2nr,α/2}) = 1 − α,
and rearrangement gives the interval.
b. The interval is of the form P(a/2Y ≤ p ≤ b/2Y), so the length is proportional to b − a. This must be minimized subject to the constraint ∫ from a to b of f(y) dy = 1 − α, where f(y) is the pdf of a χ²_{2nr}. Treating b as a function of a, differentiating gives
b′ − 1 = 0 and f(b)b′ − f(a) = 0,
which implies that we need f(b) = f(a).
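The equal-density endpoints can be found numerically; here is a small R sketch for a generic chi-squared pivot (df and alpha are user-supplied; df ≥ 3 is assumed so the pdf has an interior mode):
shortest.chisq <- function(df, alpha = .05) {
  b.of.a <- function(a) qchisq(pchisq(a, df) + 1 - alpha, df)   # coverage constraint fixes b given a
  g <- function(a) dchisq(a, df) - dchisq(b.of.a(a), df)        # equal-density condition f(a) = f(b)
  a <- uniroot(g, c(1e-8, qchisq(alpha, df) - 1e-8))$root
  c(a = a, b = b.of.a(a))
}
shortest.chisq(df = 10)   # hypothetical example with 2nr = 10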
Chapter 11
Analysis of Variance and Regression
11.1 a. The first order Taylor’s series approximation is
Var[g(Y)] [g0(θ)]2·VarY= [g0(θ)]2·v(θ).
b. If we choose g(y) = g(y) = Ry
a
1
v(x)dx, then
dg(θ)
=d
Zθ
a
1
pv(x)dx =1
pv(θ),
by the Fundamental Theorem of Calculus. Then, for any θ,
Var[g(Y)] 1
pv(θ)!2
v(θ) = 1.
11.2 a. v(λ) = λ,g(y) = y,dg(λ)
=1
2λ, Varg(Y)dg(λ)
2·v(λ) = 1/4, independent of λ.
b. To use the Taylor’s series approximation, we need to express everything in terms of θ=
EY=np. Then v(θ) = θ(1 θ/n) and
dg(θ)
2
=
1
q1θ
n
·1
2qθ
n
·1
n
2
=1
4(1 θ/n).
Therefore
Var[g(Y)] dg(θ)
2
v(θ) = 1
4n,
independent of θ, that is, independent of p.
c. v(θ) = Kθ2,dg(θ)
=1
θand Var[g(Y)] 1
θ2·Kθ2=K, independent of θ.
11.3 a. g
λ(y) is clearly continuous with the possible exception of λ= 0. For that value use
l’Hˆopital’s rule to get
lim
λ0
yλ1
λ= lim
λ0
(log y)yλ
1= log y.
b. From Exercise 11.1, we want to find v(λ) that satisfies
yλ1
λ=Zy
a
1
pv(x)dx.
Taking derivatives
d
dy yλ1
λ=yλ1=d
dy Zy
a
1
pv(x)dx =1
pv(y).
11-2 Solutions Manual for Statistical Inference
Thus v(y) = y2(λ1).From Exercise 11.1,
Var yλ1
λd
dy
θλ1
λ2
v(θ) = θ2(λ1)θ2(λ1) = 1.
Note: If λ= 1/2, v(θ) = θ, which agrees with Exercise 11.2(a). If λ= 1 then v(θ) = θ2,
which agrees with Exercise 11.2(c).
11.5 For the model
Yij =µ+τi+εij , i = 1, . . . , k, j = 1, . . . , ni,
take k= 2. The two parameter configurations
(µ, τ1, τ2) = (10,5,2)
(µ, τ1, τ2) = (7,8,5),
have the same values for µ+τ1and µ+τ2, so they give the same distributions for Y1and Y2.
11.6 a. Under the ANOVA assumptions Yij =θi+ij , where ij independent n(0, σ2), so Yij
independent n(θi, σ2). Therefore the sample pdf is
k
Y
i=1
ni
Y
j=1
(2πσ2)1/2e(yij θi)2
2σ2= (2πσ2)Σni/2exp
1
2σ2
k
X
i=1
ni
X
j=1
(yij θi)2
= (2πσ2)Σni/2exp (1
2σ2
k
X
i=1
niθ2
i)
×exp
1
2σ2X
iX
j
y2
ij +2
2σ2
k
X
i=1
θini¯
Yi·
.
Therefore, by the Factorization Theorem,
¯
Y1·,¯
Y2·, . . . , ¯
Yk·,X
iX
j
Y2
ij
is jointly sufficient for θ1, . . . , θk, σ2. Since ( ¯
Y1·, . . . , ¯
Yk·, S2
p) is a 1-to-1 function of this
vector, ( ¯
Y1·, . . . , ¯
Yk·, S2
p) is also jointly sufficient.
b. We can write
(2πσ2)Σni/2exp
1
2σ2
k
X
i=1
ni
X
j=1
(yij θi)2
= (2πσ2)Σni/2exp
1
2σ2
k
X
i=1
ni
X
j=1
([yij ¯yi·] + [¯yi·θi])2
= (2πσ2)Σni/2exp
1
2σ2
k
X
i=1
ni
X
j=1
[yij ¯yi·]2
exp (1
2σ2
k
X
i=1
ni[¯yi·θi]2),
so, by the Factorization Theorem, ¯
Yi·,i= 1, . . . , n, is independent of Yij ¯
Yi·,j= 1, . . . , ni,
so S2
pis independent of each ¯
Yi·.
c. Just identify ni¯
Yi·with Xiand redefine θias niθi.
Second Edition 11-3
11.7 Let Ui=¯
Yi·θi. Then
k
X
i=1
ni[( ¯
Yi·¯
¯
Y)(θi¯
θ)]2=
k
X
i=1
ni(Ui¯
U)2.
The Uiare clearly n(0, σ2/ni). For K= 2 we have
S2
2=n1(U1¯
U)2+n2(U2¯
U)2
=n1U1n1¯
U1+n2¯
U2
n1+n22
+n2U2n1¯
U1+n2¯
U2
n1+n22
= (U1U2)2"n1n2
n1+n22
+n2n1
n1+n22#
=(U1U2)2
1
n1+1
n2
.
Since U1U2n(0, σ2(1/n1+ 1/n2)), S2
22χ2
1. Let ¯
Ukbe the weighted mean of k Uis,
and note that
¯
Uk+1 =¯
Uk+nk+1
Nk+1
(Uk+1 ¯
Uk),
where Nk=Pk
j=1 nj. Then
S2
k+1 =
k+1
X
i=1
ni(Ui¯
Uk+1)2=
k+1
X
i=1
ni(Ui¯
Uk)nk+1
Nk+1
(Uk+1 ¯
Uk)2
=S2
k+nk+1Nk
Nk+1
(Uk+1 ¯
Uk)2,
where we have expanded the square, noted that the cross-term (summed up to k) is zero, and
did a boat-load of algebra. Now since
Uk+1 ¯
Ukn(0, σ2(1/nk+1 + 1/Nk)) = n(0, σ2(Nk+1/nk+1Nk)),
independent of S2
k, the rest of the argument is the same as in the proof of Theorem 5.3.1(c).
11.8 Under the oneway ANOVA assumptions, Yij independent n(θi, σ2). Therefore
¯
Yi·nθi, σ2/ni(Yij ’s are independent with common σ2.)
ai¯
Yi·naiθi, a2
iσ2/ni
k
X
i=1
ai¯
Yi·n Xaiθi, σ2
k
X
i=1
a2
i/ni!.
All these distributions follow from Corollary 4.6.10.
11.9 a. From Exercise 11.8,
T=Xai¯
YinXaiθi, σ2Xa2
i,
and under H0, ET=δ. Thus, under H0,
Pai¯
Yiδ
qS2
pPa2
itNk,
11-4 Solutions Manual for Statistical Inference
where N=Pni. Therefore, the test is to reject H0if
Pai¯
Yiδ
qS2
pPa2
i/ni
> tNk, α
2.
b. Similarly for H0:Paiθiδvs. H1:Paiθi> δ, we reject H0if
Pai¯
Yiδ
qS2
pPa2
i/ni
> tNk,α.
11.10 a. Let Hi
0,i= 1, . . . , 4 denote the null hypothesis using contrast ai, of the form
Hi
0:X
j
aij θj0.
If H1
0is rejected, it indicates that the average of θ2,θ3,θ4, and θ5is bigger than θ1which
is the control mean. If all Hi
0’s are rejected, it indicates that θ5> θifor i= 1,2,3,4. To see
this, suppose H4
0and H5
0are rejected. This means θ5>θ5+θ4
2> θ3; the first inequality is
implied by the rejection of H5
0and the second inequality is the rejection of H4
0. A similar
argument implies θ5> θ2and θ5> θ1. But, for example, it does not mean that θ4> θ3or
θ3> θ2. It also indicates that
1
2(θ5+θ4)> θ3,1
3(θ5+θ4+θ3)> θ2,1
4(θ5+θ4+θ3+θ2)> θ1.
b. In part a) all of the contrasts are orthogonal. For example,
5
X
i=1
a2ia3i=0,1,1
3,1
3,1
3
0
0
1
1
2
1
2
=1
3+1
6+1
6= 0,
and this holds for all pairs of contrasts. Now, from Lemma 5.4.2,
Cov X
i
aji ¯
Yi·,X
i
aj0i¯
Yi·!=σ2
nX
i
ajiaj0i,
which is zero because the contrasts are orthogonal. Note that the equal number of obser-
vations per treatment is important, since if ni6=ni0for some i,i0, then
Cov k
X
i=1
aji ¯
Yi,
k
X
i=1
aj0i¯
Yi!=
k
X
i=1
ajiaj0i
σ2
ni
=σ2
k
X
i=1
ajiaj0i
ni6= 0.
c. This is not a set of orthogonal contrasts because, for example, a1×a2=1. However, each
contrast can be interpreted meaningfully in the context of the experiment. For example, a1
tests the effect of potassium alone, while a5looks at the effect of adding zinc to potassium.
11.11 This is a direct consequence of Lemma 5.3.3.
11.12 a. This is a special case of (11.2.6) and (11.2.7).
Second Edition 11-5
b. From Exercise 5.8(a) We know that
s2=1
k1
k
X
i=1
(¯yi·¯
¯y)2=1
2k(k1) X
i,i0
(¯yi·¯yi0·)2.
Then
1
k(k1) X
i,i0
t2
ii0=1
2k(k1) X
i,i0
(¯yi·¯yi0·)2
s2
p/n =
k
X
i=1
(¯yi·¯
¯y)2
(k1)s2
p/n
=Pin(¯yi·¯
¯y)2/(k1)
s2
p
,
which is distributed as Fk1,Nkunder H0:θ1=··· =θk. Note that
X
i,i0
t2
ii0=
k
X
i=1
k
X
i0=1
t2
ii0,
therefore t2
ii0and t2
i0iare both included, which is why the divisor is k(k1), not k(k1)
2=k
2.
Also, to use the result of Example 5.9(a), we treated each mean ¯
Yi·as an observation, with
overall mean ¯
¯
Y. This is true for equal sample sizes.
11.13 a.
L(θ|y) = 1
2πσ2Nk/2
e1
2Pk
i=1 Pni
j=1(yij θi)22
.
Note that
k
X
i=1
ni
X
j=1
(yij θi)2=
k
X
i=1
ni
X
j=1
(yij ¯yi·)2+
k
X
i=1
ni(¯yi·θi)2
=SSW +
k
X
i=1
ni(¯yi·θi)2,
and the LRT statistic is
λ= (ˆτ2/ˆτ2
0)Nk/2
where
ˆτ2=SSW and ˆτ2
0=SSW +X
i
ni(¯yi·¯y··)2=SSW +SSB.
Thus λ<kif and only if SSB/SSW is large, which is equivalent to the Ftest.
b. The error probabilities of the test are a function of the θis only through η=Pθ2
i. The
distribution of Fis that of a ratio of chi squared random variables, with the numerator
being noncentral (dependent on η). Thus the Type II error is given by
P(F > k|η) = Pχ2
k1(η)/(k1)
χ2
Nk/(Nk)> kPχ2
k1(0)/(k1)
χ2
Nk/(Nk)> k=α,
where the inequality follows from the fact that the noncentral chi squared is stochastically
increasing in the noncentrality parameter.
11-6 Solutions Manual for Statistical Inference
11.14 Let Xin(θi, σ2). Then from Exercise 11.11
Cov Pi
ai
ciXi,PiciviXi=σ2Paivi
Var Pi
ai
ciXi=σ2Pa2
i
ci,Var PiciviXi=σ2Pciv2
i,
and the Cauchy-Schwarz inequality gives
Xaivi.Xa2
i/ciXciv2
i.
If ai=civithis is an equality, hence the LHS is maximized. The simultaneous statement is
equivalent to
Pk
i=1 ai(¯yi·θi)2
s2
pPk
i=1 a2
i/nMfor all a1, . . . , ak,
and the LHS is maximized by ai=ni(¯yi·θi). This produces the Fstatistic.
11.15 a. Since t2
ν=F1, it follows from Exercise 5.19(b) that for k2
P[(k1)Fk1a]P(t2
νa).
So if a=t2
ν,α/2, the Fprobability is greater than α, and thus the α-level cutoff for the F
must be greater than t2
ν,α/2.
b. The only difference in the intervals is the cutoff point, so the Scheff´e intervals are wider.
c. Both sets of intervals have nominal level 1 α, but since the Scheff´e intervals are wider,
tests based on them have a smaller rejection region. In fact, the rejection region is contained
in the trejection region. So the tis more powerful.
11.16 a. If θi=θjfor all i,j, then θiθj= 0 for all i,j, and the converse is also true.
b. H0:θ∈ ∩ij Θij and H1:θ∈ ∪ij ij )c.
11.17 a. If all of the means are equal, the Scheff´e test will only reject αof the time, so the ttests
will be done only αof the time. The experimentwise error rate is preserved.
b. This follows from the fact that the ttests use a smaller cutoff point, so there can be rejection
using the ttest but no rejection using Scheff´e. Since Scheff´e has experimentwise level α,
the ttest has experimentwise error greater than α.
c. The pooled standard deviation is 2.358, and the means and tstatistics are
Mean tstatistic
Low Medium High Med-Low High-Med High-Low
3.51. 9.27 24.93 3.86 10.49 14.36
The tstatistics all have 12 degrees of freedom and, for example, t12,.01 = 2.68, so all of the
tests reject and we conclude that the means are all significantly different.
11.18 a.
P(Y > a|Y > b) = P(Y > a, Y > b)/P (Y > b)
=P(Y > a)/P (Y > b) (a > b)
> P (Y > a).(P(Y > b)<1)
b. If ais a cutoff point then we would declare significance if Y > a. But if we only check if Yis
significant because we see a big Y(Y > b), the proper significance level is P(Y > a|Y > b),
which will show less significance than P(Y > a).
Second Edition 11-7
11.19 a. The marginal distributions of the $Y_i$ are somewhat straightforward to derive. As $X_{i+1}\sim\mathrm{gamma}(\lambda_{i+1},1)$ and, independently, $\sum_{j=1}^i X_j\sim\mathrm{gamma}(\sum_{j=1}^i\lambda_j,1)$ (Example 4.6.8), we only need to derive the distribution of the ratio of two independent gammas. Let $X\sim\mathrm{gamma}(\lambda_1,1)$ and $Y\sim\mathrm{gamma}(\lambda_2,1)$. Make the transformation
$$
u=x/y,\quad v=y \qquad\Longleftrightarrow\qquad x=uv,\quad y=v,
$$
with Jacobian $v$. The density of $(U,V)$ is
$$
f(u,v)=\frac{1}{\Gamma(\lambda_1)\Gamma(\lambda_2)}(uv)^{\lambda_1-1}v^{\lambda_2-1}\,v\,e^{-uv}e^{-v}
=\frac{u^{\lambda_1-1}}{\Gamma(\lambda_1)\Gamma(\lambda_2)}v^{\lambda_1+\lambda_2-1}e^{-v(1+u)}.
$$
To get the density of $U$, integrate with respect to $v$. Note that we have the kernel of a $\mathrm{gamma}(\lambda_1+\lambda_2,\,1/(1+u))$ density, which yields
$$
f(u)=\frac{\Gamma(\lambda_1+\lambda_2)}{\Gamma(\lambda_1)\Gamma(\lambda_2)}\,
\frac{u^{\lambda_1-1}}{(1+u)^{\lambda_1+\lambda_2}}.
$$
The joint distribution is a nightmare. We have to make a multivariate change of variable. This is made a bit more palatable if we do it in two steps. First transform
$$
W_1=X_1,\quad W_2=X_1+X_2,\quad W_3=X_1+X_2+X_3,\quad\ldots,\quad W_n=X_1+X_2+\cdots+X_n,
$$
with
$$
X_1=W_1,\quad X_2=W_2-W_1,\quad X_3=W_3-W_2,\quad\ldots,\quad X_n=W_n-W_{n-1},
$$
and Jacobian 1. The joint density of the $W_i$ is
$$
f(w_1,w_2,\ldots,w_n)=\prod_{i=1}^n\frac{1}{\Gamma(\lambda_i)}(w_i-w_{i-1})^{\lambda_i-1}\,e^{-w_n},
\qquad w_1\le w_2\le\cdots\le w_n,
$$
where we set $w_0=0$ and note that the exponent telescopes. Next note that
$$
y_1=\frac{w_2-w_1}{w_1},\quad y_2=\frac{w_3-w_2}{w_2},\quad\ldots,\quad
y_{n-1}=\frac{w_n-w_{n-1}}{w_{n-1}},\quad y_n=w_n,
$$
with
$$
w_i=\frac{y_n}{\prod_{j=i}^{n-1}(1+y_j)},\quad i=1,\ldots,n-1,\qquad w_n=y_n.
$$
Since each $w_i$ only involves $y_j$ with $j\ge i$, the Jacobian matrix is triangular and the determinant is the product of the diagonal elements. We have
$$
\frac{\partial w_i}{\partial y_i}=\frac{y_n}{(1+y_i)\prod_{j=i}^{n-1}(1+y_j)},\quad i=1,\ldots,n-1,
\qquad
\frac{\partial w_n}{\partial y_n}=1,
$$
and
$$
f(y_1,y_2,\ldots,y_n)=\frac{1}{\Gamma(\lambda_1)}\left(\frac{y_n}{\prod_{j=1}^{n-1}(1+y_j)}\right)^{\lambda_1-1}
\times\prod_{i=2}^{n-1}\frac{1}{\Gamma(\lambda_i)}\left(\frac{y_n}{\prod_{j=i}^{n-1}(1+y_j)}-\frac{y_n}{\prod_{j=i-1}^{n-1}(1+y_j)}\right)^{\lambda_i-1}
e^{-y_n}
\times\prod_{i=1}^{n-1}\frac{y_n}{(1+y_i)\prod_{j=i}^{n-1}(1+y_j)}.
$$
Factor out the terms with $y_n$ and do some algebra on the middle term to get
$$
f(y_1,y_2,\ldots,y_n)=y_n^{\sum_i\lambda_i-1}e^{-y_n}\,
\frac{1}{\Gamma(\lambda_1)}\left(\frac{1}{\prod_{j=1}^{n-1}(1+y_j)}\right)^{\lambda_1-1}
\times\prod_{i=2}^{n-1}\frac{1}{\Gamma(\lambda_i)}\left(\frac{y_{i-1}}{1+y_{i-1}}\,\frac{1}{\prod_{j=i}^{n-1}(1+y_j)}\right)^{\lambda_i-1}
\times\prod_{i=1}^{n-1}\frac{1}{(1+y_i)\prod_{j=i}^{n-1}(1+y_j)}.
$$
We see that $Y_n$ is independent of the other $Y_i$ (and has a gamma distribution), but there does not seem to be any other obvious conclusion to draw from this density.
b. The $Y_i$ are related to the $F$ distribution in the ANOVA. For example, as long as the sums of the $\lambda_i$ are integers,
$$
Y_i=\frac{X_{i+1}}{\sum_{j=1}^i X_j}
=\frac{2X_{i+1}}{2\sum_{j=1}^i X_j}
=\frac{\chi^2_{2\lambda_{i+1}}}{\chi^2_{2\sum_{j=1}^i\lambda_j}},
$$
which is proportional to an $F_{2\lambda_{i+1},\,2\sum_{j=1}^i\lambda_j}$ random variable. Note that the $F$ density makes sense even if the $\lambda_i$ are not integers.
11.21 a.
$$
\text{Grand mean }\ \bar y_{\cdot\cdot}=\frac{188.54}{15}=12.57,
\qquad
\text{Total sum of squares}=\sum_{i=1}^{3}\sum_{j=1}^{5}(y_{ij}-\bar y_{\cdot\cdot})^2=1295.01.
$$
$$
\text{Within SS}=\sum_{i=1}^{3}\sum_{j=1}^{5}(y_{ij}-\bar y_{i\cdot})^2
=\sum_{j=1}^{5}(y_{1j}-3.508)^2+\sum_{j=1}^{5}(y_{2j}-9.274)^2+\sum_{j=1}^{5}(y_{3j}-24.926)^2
=1.089+2.189+63.459=66.74,
$$
$$
\text{Between SS}=5\sum_{i=1}^{3}(\bar y_{i\cdot}-\bar y_{\cdot\cdot})^2
=5(82.120+10.864+152.671)=5\cdot 245.65=1228.25.
$$
ANOVA table:

    Source       df       SS        MS        F
    Treatment     2    1228.25   614.125   110.42
    Within       12      66.74     5.562
    Total        14    1294.99

Note that the total SS here is different from above; round-off error is to blame. Also, $F_{2,12}=110.42$ is highly significant.
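For reference, the same table can be produced in R with aov(). This is only a sketch: the vectors y (the 15 responses) and group (a factor with three levels, five observations each) are assumed to hold the data from the exercise, which are not reproduced here.

   # One-way ANOVA table, assuming y and group contain the data for this exercise
   fit <- aov(y ~ group)
   summary(fit)      # Treatment and Within (Residuals) rows: df, SS, MS, F
   # The F statistic is MS(Treatment)/MS(Within) = (SSB/2)/(SSW/12), about 110.4 here.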
b. Completing the proof of (11.2.4), we have
$$
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{\bar y})^2
=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\bigl((y_{ij}-\bar y_{i\cdot})+(\bar y_{i\cdot}-\bar{\bar y})\bigr)^2
=\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar y_{i\cdot})^2
+\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar y_{i\cdot}-\bar{\bar y})^2
+2\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar y_{i\cdot})(\bar y_{i\cdot}-\bar{\bar y}),
$$
where the cross term (the sum over $j$) is zero, so the sum of squares is partitioned as
$$
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar y_{i\cdot})^2+\sum_{i=1}^{k}n_i(\bar y_{i\cdot}-\bar{\bar y})^2.
$$
c. From a), the $F$ statistic for the ANOVA is 110.42. The individual two-sample $t$'s, using $s_p^2=\frac{1}{15-3}(66.74)=5.5617$, are
$$
t^2_{12}=\frac{(3.508-9.274)^2}{(5.5617)(2/5)}=\frac{33.247}{2.2247}=14.945,
\qquad
t^2_{13}=\frac{(3.508-24.926)^2}{2.2247}=206.201,
\qquad
t^2_{23}=\frac{(9.274-24.926)^2}{2.2247}=110.122,
$$
and
$$
\frac{2(14.945)+2(206.201)+2(110.122)}{6}=110.42=F.
$$
11.23 a.
$$
\mathrm EY_{ij}=\mathrm E(\mu+\tau_i+b_j+\epsilon_{ij})=\mu+\tau_i+\mathrm Eb_j+\mathrm E\epsilon_{ij}=\mu+\tau_i,
\qquad
\mathrm{Var}\,Y_{ij}=\mathrm{Var}\,b_j+\mathrm{Var}\,\epsilon_{ij}=\sigma_B^2+\sigma^2,
$$
by independence of $b_j$ and $\epsilon_{ij}$.
b.
$$
\mathrm{Var}\left(\sum_{i=1}^n a_i\bar Y_{i\cdot}\right)
=\sum_{i=1}^n a_i^2\,\mathrm{Var}\,\bar Y_{i\cdot}
+2\sum_{i>i'}\mathrm{Cov}(a_i\bar Y_{i\cdot},\,a_{i'}\bar Y_{i'\cdot}).
$$
The first term is
$$
\sum_{i=1}^n a_i^2\,\mathrm{Var}\,\bar Y_{i\cdot}
=\sum_{i=1}^n a_i^2\,\mathrm{Var}\left(\frac1r\sum_{j=1}^r(\mu+\tau_i+b_j+\epsilon_{ij})\right)
=\frac{1}{r^2}\sum_{i=1}^n a_i^2\,(r\sigma_B^2+r\sigma^2)
$$
from part (a). For the covariance,
$$
\mathrm E\bar Y_{i\cdot}=\mu+\tau_i,
$$
and
$$
\mathrm E(\bar Y_{i\cdot}\bar Y_{i'\cdot})
=\mathrm E\left[\left(\mu+\tau_i+\frac1r\sum_j(b_j+\epsilon_{ij})\right)
\left(\mu+\tau_{i'}+\frac1r\sum_j(b_j+\epsilon_{i'j})\right)\right]
=(\mu+\tau_i)(\mu+\tau_{i'})+\frac{1}{r^2}\,
\mathrm E\left[\left(\sum_j(b_j+\epsilon_{ij})\right)\left(\sum_j(b_j+\epsilon_{i'j})\right)\right],
$$
since the cross terms have expectation zero. Next, expanding the product in the second term again gives all zero cross terms, and we have
$$
\mathrm E(\bar Y_{i\cdot}\bar Y_{i'\cdot})=(\mu+\tau_i)(\mu+\tau_{i'})+\frac{1}{r^2}(r\sigma_B^2),
\qquad
\mathrm{Cov}(\bar Y_{i\cdot},\bar Y_{i'\cdot})=\sigma_B^2/r.
$$
Finally, this gives
$$
\mathrm{Var}\left(\sum_{i=1}^n a_i\bar Y_{i\cdot}\right)
=\frac{1}{r^2}\sum_{i=1}^n a_i^2(r\sigma_B^2+r\sigma^2)+2\sum_{i>i'}a_ia_{i'}\sigma_B^2/r
=\frac1r\left[\sum_{i=1}^n a_i^2\,\sigma^2+\sigma_B^2\left(\sum_{i=1}^n a_i\right)^2\right]
=\frac1r\,\sigma^2\sum_{i=1}^n a_i^2
=\frac1r(\sigma^2+\sigma_B^2)(1-\rho)\sum_{i=1}^n a_i^2,
$$
where in the third equality we used the fact that $\sum_i a_i=0$.
11.25 Differentiation yields
a.
$$
\frac{\partial}{\partial c}\,\mathrm{RSS}=2\sum[y_i-(c+dx_i)](-1)\ \stackrel{\mathrm{set}}{=}\ 0
\ \Longrightarrow\ nc+d\sum x_i=\sum y_i,
$$
$$
\frac{\partial}{\partial d}\,\mathrm{RSS}=2\sum[y_i-(c+dx_i)](-x_i)\ \stackrel{\mathrm{set}}{=}\ 0
\ \Longrightarrow\ c\sum x_i+d\sum x_i^2=\sum x_iy_i.
$$
b. Note that $nc+d\sum x_i=\sum y_i\Rightarrow c=\bar y-d\bar x$. Then
$$
(\bar y-d\bar x)\sum x_i+d\sum x_i^2=\sum x_iy_i
\quad\text{and}\quad
d\left(\sum x_i^2-n\bar x^2\right)=\sum x_iy_i-\sum x_i\bar y,
$$
which simplifies to $d=\sum x_i(y_i-\bar y)\big/\sum(x_i-\bar x)^2$. Thus $c$ and $d$ are the least squares estimates.
c. The second derivatives are
$$
\frac{\partial^2}{\partial c^2}\mathrm{RSS}=n,\qquad
\frac{\partial^2}{\partial c\,\partial d}\mathrm{RSS}=\sum x_i,\qquad
\frac{\partial^2}{\partial d^2}\mathrm{RSS}=\sum x_i^2.
$$
Thus the determinant of the matrix of second-order partials is
$$
\begin{vmatrix} n & \sum x_i\\[2pt] \sum x_i & \sum x_i^2\end{vmatrix}
=n\sum x_i^2-\left(\sum x_i\right)^2=n\sum(x_i-\bar x)^2>0.
$$
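The closed-form solutions in part (b) are easy to verify numerically. A small R sketch, using simulated data that are not part of the text:

   # Check the normal-equation solutions against lm() on simulated data
   set.seed(1)
   x <- runif(20, 0, 10)
   y <- 2 + 0.5 * x + rnorm(20)
   d  <- sum(x * (y - mean(y))) / sum((x - mean(x))^2)   # slope from part (b)
   c0 <- mean(y) - d * mean(x)                           # intercept: c = ybar - d*xbar
   c(c0, d)
   coef(lm(y ~ x))                                       # agrees with c(c0, d)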
11.27 For the linear estimator $\sum_i a_iY_i$ to be unbiased for $\alpha$ we have
$$
\mathrm E\left(\sum_i a_iY_i\right)=\sum_i a_i(\alpha+\beta x_i)=\alpha
\quad\Longrightarrow\quad
\sum_i a_i=1\ \text{ and }\ \sum_i a_ix_i=0.
$$
Since $\mathrm{Var}\sum_i a_iY_i=\sigma^2\sum_i a_i^2$, we need to solve:
$$
\text{minimize }\sum_i a_i^2\quad\text{subject to}\quad\sum_i a_i=1\ \text{ and }\ \sum_i a_ix_i=0.
$$
A solution can be found with Lagrange multipliers, but verifying that it is a minimum is excruciating. So instead we note that
$$
\sum_i a_i=1\quad\Longrightarrow\quad a_i=\frac1n+k(b_i-\bar b),
$$
for some constants $k,b_1,b_2,\ldots,b_n$, and
$$
\sum_i a_ix_i=0\quad\Longrightarrow\quad
k=\frac{-\bar x}{\sum_i(b_i-\bar b)(x_i-\bar x)}
\quad\text{and}\quad
a_i=\frac1n-\frac{\bar x(b_i-\bar b)}{\sum_i(b_i-\bar b)(x_i-\bar x)}.
$$
Now
$$
\sum_i a_i^2=\sum_i\left(\frac1n-\frac{\bar x(b_i-\bar b)}{\sum_i(b_i-\bar b)(x_i-\bar x)}\right)^2
=\frac1n+\frac{\bar x^2\sum_i(b_i-\bar b)^2}{\left[\sum_i(b_i-\bar b)(x_i-\bar x)\right]^2},
$$
since the cross term is zero. So we need to minimize the last term. From Cauchy-Schwarz we know that
$$
\frac{\sum_i(b_i-\bar b)^2}{\left[\sum_i(b_i-\bar b)(x_i-\bar x)\right]^2}\ \ge\ \frac{1}{\sum_i(x_i-\bar x)^2},
$$
and the minimum is attained at $b_i=x_i$. Substituting back we get that the minimizing $a_i$ is
$$
a_i=\frac1n-\frac{\bar x(x_i-\bar x)}{\sum_i(x_i-\bar x)^2},
$$
which results in $\sum_i a_iY_i=\bar Y-\hat\beta\bar x$, the least squares estimator.
11.28 To calculate
$$
\max_{\sigma^2}L(\sigma^2\,|\,\mathbf y,\hat\alpha,\hat\beta)
=\max_{\sigma^2}\left(\frac{1}{2\pi\sigma^2}\right)^{n/2}
e^{-\frac{1}{2\sigma^2}\sum_i[y_i-(\hat\alpha+\hat\beta x_i)]^2},
$$
take logs and differentiate with respect to $\sigma^2$ to get
$$
\frac{d}{d\sigma^2}\log L(\sigma^2\,|\,\mathbf y,\hat\alpha,\hat\beta)
=-\frac{n}{2\sigma^2}+\frac12\,\frac{\sum_i[y_i-(\hat\alpha+\hat\beta x_i)]^2}{(\sigma^2)^2}.
$$
Set this equal to zero and solve for $\sigma^2$. The solution is $\hat\sigma^2$.
11.29 a.
$$
\mathrm E\hat\epsilon_i=\mathrm E(Y_i-\hat\alpha-\hat\beta x_i)=(\alpha+\beta x_i)-\alpha-\beta x_i=0.
$$
b.
$$
\mathrm{Var}\,\hat\epsilon_i=\mathrm E[Y_i-\hat\alpha-\hat\beta x_i]^2
=\mathrm E\bigl[(Y_i-\alpha-\beta x_i)-(\hat\alpha-\alpha)-x_i(\hat\beta-\beta)\bigr]^2
=\mathrm{Var}\,Y_i+\mathrm{Var}\,\hat\alpha+x_i^2\,\mathrm{Var}\,\hat\beta
-2\,\mathrm{Cov}(Y_i,\hat\alpha)-2x_i\,\mathrm{Cov}(Y_i,\hat\beta)+2x_i\,\mathrm{Cov}(\hat\alpha,\hat\beta).
$$
11.30 a. Straightforward algebra shows
$$
\hat\alpha=\bar y-\hat\beta\bar x
=\sum\frac1n y_i-\bar x\,\frac{\sum(x_i-\bar x)y_i}{\sum(x_i-\bar x)^2}
=\sum\left(\frac1n-\frac{\bar x(x_i-\bar x)}{\sum(x_i-\bar x)^2}\right)y_i.
$$
b. Note that for $c_i=\frac1n-\frac{\bar x(x_i-\bar x)}{\sum(x_i-\bar x)^2}$, $\sum c_i=1$ and $\sum c_ix_i=0$. Then
$$
\mathrm E\hat\alpha=\mathrm E\sum c_iY_i=\sum c_i(\alpha+\beta x_i)=\alpha,
\qquad
\mathrm{Var}\,\hat\alpha=\sum c_i^2\,\mathrm{Var}\,Y_i=\sigma^2\sum c_i^2,
$$
and
$$
\sum c_i^2=\sum\left(\frac1n-\frac{\bar x(x_i-\bar x)}{\sum(x_i-\bar x)^2}\right)^2
=\sum\frac{1}{n^2}+\frac{\bar x^2\sum(x_i-\bar x)^2}{\left(\sum(x_i-\bar x)^2\right)^2}
\quad(\text{cross term}=0)
=\frac1n+\frac{\bar x^2}{\sum(x_i-\bar x)^2}=\frac{\sum x_i^2}{nS_{xx}}.
$$
c. Write $\hat\beta=\sum d_iy_i$, where
$$
d_i=\frac{x_i-\bar x}{\sum(x_i-\bar x)^2}.
$$
From Exercise 11.11,
$$
\mathrm{Cov}(\hat\alpha,\hat\beta)=\mathrm{Cov}\left(\sum c_iY_i,\sum d_iY_i\right)=\sigma^2\sum c_id_i
=\sigma^2\sum\left(\frac1n-\frac{\bar x(x_i-\bar x)}{\sum(x_i-\bar x)^2}\right)\frac{(x_i-\bar x)}{\sum(x_i-\bar x)^2}
=\frac{-\sigma^2\bar x}{\sum(x_i-\bar x)^2}.
$$
11.31 The fact that
$$
\hat\epsilon_i=\sum_j\bigl[\delta_{ij}-(c_j+d_jx_i)\bigr]Y_j
$$
follows directly from (11.3.27) and the definition of $c_j$ and $d_j$. Since $\hat\alpha=\sum_i c_iY_i$, from Lemma 11.3.2
$$
\mathrm{Cov}(\hat\epsilon_i,\hat\alpha)=\sigma^2\sum_j c_j\bigl[\delta_{ij}-(c_j+d_jx_i)\bigr]
=\sigma^2\Bigl(c_i-\sum_j c_j(c_j+d_jx_i)\Bigr)
=\sigma^2\Bigl(c_i-\sum_j c_j^2-x_i\sum_j c_jd_j\Bigr).
$$
Substituting for $c_j$ and $d_j$ gives
$$
c_i=\frac1n-\frac{(x_i-\bar x)\bar x}{S_{xx}},\qquad
\sum_j c_j^2=\frac1n+\frac{\bar x^2}{S_{xx}},\qquad
x_i\sum_j c_jd_j=-\frac{x_i\bar x}{S_{xx}},
$$
and substituting these values shows $\mathrm{Cov}(\hat\epsilon_i,\hat\alpha)=0$. Similarly, for $\hat\beta$,
$$
\mathrm{Cov}(\hat\epsilon_i,\hat\beta)=\sigma^2\Bigl(d_i-\sum_j c_jd_j-x_i\sum_j d_j^2\Bigr)
$$
with
$$
d_i=\frac{(x_i-\bar x)}{S_{xx}},\qquad
\sum_j c_jd_j=-\frac{\bar x}{S_{xx}},\qquad
x_i\sum_j d_j^2=\frac{x_i}{S_{xx}},
$$
and substituting these values shows $\mathrm{Cov}(\hat\epsilon_i,\hat\beta)=0$.
11.32 Write the models as
$$
y_i=\alpha+\beta x_i+\epsilon_i
\qquad\text{and}\qquad
y_i=\alpha'+\beta'(x_i-\bar x)+\epsilon_i=\alpha'+\beta'z_i+\epsilon_i.
$$
a. Since $\bar z=0$,
$$
\hat\beta=\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}
=\frac{\sum z_i(y_i-\bar y)}{\sum z_i^2}=\hat\beta'.
$$
b.
$$
\hat\alpha=\bar y-\hat\beta\bar x,\qquad
\hat\alpha'=\bar y-\hat\beta'\bar z=\bar y\quad\text{since }\bar z=0,
$$
$$
\hat\alpha'\sim\mathrm n(\alpha'+\beta'\bar z,\ \sigma^2/n)=\mathrm n(\alpha',\ \sigma^2/n).
$$
c. Write
$$
\hat\alpha'=\sum\frac1n y_i,\qquad
\hat\beta'=\sum\frac{z_i}{\sum z_i^2}\,y_i.
$$
Then
$$
\mathrm{Cov}(\hat\alpha',\hat\beta')=\sigma^2\sum\frac1n\,\frac{z_i}{\sum z_i^2}=0,
$$
since $\sum z_i=0$.
11.33 a. From (11.3.25), $\beta=\rho(\sigma_Y/\sigma_X)$, so $\beta=0$ if and only if $\rho=0$ (since we assume that the variances are positive).
b. Start from the display following (11.3.35). We have
$$
\frac{\hat\beta^2}{S^2/S_{xx}}
=\frac{S^2_{xy}/S_{xx}}{\mathrm{RSS}/(n-2)}
=(n-2)\,\frac{S^2_{xy}}{\left(S_{yy}-S^2_{xy}/S_{xx}\right)S_{xx}}
=(n-2)\,\frac{S^2_{xy}}{S_{yy}S_{xx}-S^2_{xy}},
$$
and dividing top and bottom by $S_{yy}S_{xx}$ finishes the proof.
c. From (11.3.33), if $\rho=0$ (equivalently $\beta=0$), then $\hat\beta\big/(S/\sqrt{S_{xx}})=\sqrt{n-2}\,r\big/\sqrt{1-r^2}$ has a $t_{n-2}$ distribution.
11.34 a. ANOVA table for the height data:

    Source        df      SS       MS       F
    Regression     1    60.36    60.36    50.7
    Residual       6     7.14     1.19
    Total          7    67.50

The least squares line is $\hat y=35.18+.93x$.
b. Since $y_i-\bar y=(y_i-\hat y_i)+(\hat y_i-\bar y)$, we just need to show that the cross term is zero:
$$
\sum_{i=1}^n(y_i-\hat y_i)(\hat y_i-\bar y)
=\sum_{i=1}^n\left[y_i-(\hat\alpha+\hat\beta x_i)\right]\left[(\hat\alpha+\hat\beta x_i)-\bar y\right]
=\sum_{i=1}^n\left[(y_i-\bar y)-\hat\beta(x_i-\bar x)\right]\left[\hat\beta(x_i-\bar x)\right]
\qquad(\hat\alpha=\bar y-\hat\beta\bar x)
$$
$$
=\hat\beta\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)-\hat\beta^2\sum_{i=1}^n(x_i-\bar x)^2=0,
$$
from the definition of $\hat\beta$.
c.
$$
\sum(\hat y_i-\bar y)^2=\hat\beta^2\sum(x_i-\bar x)^2=\frac{S^2_{xy}}{S_{xx}}.
$$
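The table in part (a) is exactly what R's anova() produces for a simple linear regression. A sketch, assuming the eight height measurements are stored in vectors x and y (the data themselves are not reproduced here):

   # Regression ANOVA table and least squares line for the height data,
   # assuming x (predictor) and y (response) are vectors of length 8.
   fit <- lm(y ~ x)
   anova(fit)     # Regression (x) and Residual rows: df, SS, MS, F
   coef(fit)      # intercept and slope, about 35.18 and 0.93 for these data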
11.35 a. For the least squares estimate:
$$
\frac{d}{d\theta}\sum_i(y_i-\theta x_i^2)^2=-2\sum_i(y_i-\theta x_i^2)x_i^2\ \stackrel{\mathrm{set}}{=}\ 0,
$$
which implies
$$
\hat\theta=\frac{\sum_i y_ix_i^2}{\sum_i x_i^4}.
$$
b. The log likelihood is
$$
\log L=-\frac n2\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_i(y_i-\theta x_i^2)^2,
$$
and maximizing this is the same as the minimization in part (a).
c. The derivatives of the log likelihood are
$$
\frac{d}{d\theta}\log L=\frac{1}{\sigma^2}\sum_i(y_i-\theta x_i^2)x_i^2,
\qquad
\frac{d^2}{d\theta^2}\log L=-\frac{1}{\sigma^2}\sum_i x_i^4,
$$
so the CRLB is $\sigma^2\big/\sum_i x_i^4$. The variance of $\hat\theta$ is
$$
\mathrm{Var}\,\hat\theta=\mathrm{Var}\left(\frac{\sum_i y_ix_i^2}{\sum_i x_i^4}\right)
=\sum_i\left(\frac{x_i^2}{\sum_j x_j^4}\right)^2\sigma^2=\sigma^2\Big/\sum_i x_i^4,
$$
so $\hat\theta$ is the best unbiased estimator.
11.36 a.
$$
\mathrm E\hat\alpha=\mathrm E(\bar Y-\hat\beta\bar X)
=\mathrm E\bigl[\mathrm E(\bar Y-\hat\beta\bar X\,|\,\mathbf X)\bigr]
=\mathrm E\bigl(\alpha+\beta\bar X-\beta\bar X\bigr)=\mathrm E\alpha=\alpha,
\qquad
\mathrm E\hat\beta=\mathrm E\bigl[\mathrm E(\hat\beta\,|\,\mathbf X)\bigr]=\mathrm E\beta=\beta.
$$
b. Recall
$$
\mathrm{Var}\,Y=\mathrm{Var}[\mathrm E(Y|X)]+\mathrm E[\mathrm{Var}(Y|X)],
\qquad
\mathrm{Cov}(Y,Z)=\mathrm{Cov}[\mathrm E(Y|X),\mathrm E(Z|X)]+\mathrm E[\mathrm{Cov}(Y,Z|X)].
$$
Thus
$$
\mathrm{Var}\,\hat\alpha=\mathrm E[\mathrm{Var}(\hat\alpha\,|\,\mathbf X)]
=\sigma^2\,\mathrm E\!\left[\textstyle\sum_i X_i^2\big/(nS_{XX})\right],
\qquad
\mathrm{Var}\,\hat\beta=\sigma^2\,\mathrm E[1/S_{XX}],
\qquad
\mathrm{Cov}(\hat\alpha,\hat\beta)=\mathrm E[\mathrm{Cov}(\hat\alpha,\hat\beta\,|\,\mathbf X)]
=-\sigma^2\,\mathrm E[\bar X/S_{XX}].
$$
11.37 This is almost the same problem as Exercise 11.35. The log likelihood is
$$
\log L=-\frac n2\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_i(y_i-\beta x_i)^2.
$$
The MLE is $\sum_i x_iy_i\big/\sum_i x_i^2$, with mean $\beta$ and variance $\sigma^2\big/\sum_i x_i^2$, the CRLB.
11.38 a. The model is $y_i=\theta x_i+\epsilon_i$, so the least squares estimate of $\theta$ is $\sum x_iy_i\big/\sum x_i^2$ (regression through the origin). Then
$$
\mathrm E\,\frac{\sum x_iY_i}{\sum x_i^2}=\frac{\sum x_i(x_i\theta)}{\sum x_i^2}=\theta,
\qquad
\mathrm{Var}\,\frac{\sum x_iY_i}{\sum x_i^2}=\frac{\sum x_i^2(x_i\theta)}{\left(\sum x_i^2\right)^2}
=\frac{\theta\sum x_i^3}{\left(\sum x_i^2\right)^2}.
$$
The estimator is unbiased.
b. The likelihood function is
$$
L(\theta\,|\,\mathbf y)=\prod_{i=1}^n\frac{e^{-\theta x_i}(\theta x_i)^{y_i}}{y_i!}
=\frac{e^{-\theta\Sigma x_i}\prod(\theta x_i)^{y_i}}{\prod y_i!},
$$
$$
\frac{\partial}{\partial\theta}\log L
=\frac{\partial}{\partial\theta}\left[-\theta\sum x_i+\sum y_i\log(\theta x_i)-\log\prod y_i!\right]
=-\sum x_i+\sum\frac{x_iy_i}{\theta x_i}\ \stackrel{\mathrm{set}}{=}\ 0,
$$
which implies
$$
\hat\theta=\frac{\sum y_i}{\sum x_i},\qquad
\mathrm E\hat\theta=\frac{\sum\theta x_i}{\sum x_i}=\theta
\quad\text{and}\quad
\mathrm{Var}\,\hat\theta=\mathrm{Var}\,\frac{\sum y_i}{\sum x_i}
=\frac{\sum\theta x_i}{\left(\sum x_i\right)^2}=\frac{\theta}{\sum x_i}.
$$
c.
$$
\frac{\partial^2}{\partial\theta^2}\log L
=\frac{\partial}{\partial\theta}\left(-\sum x_i+\frac{\sum y_i}{\theta}\right)
=-\frac{\sum y_i}{\theta^2}
\quad\text{and}\quad
-\mathrm E\,\frac{\partial^2}{\partial\theta^2}\log L=\frac{\sum x_i}{\theta}.
$$
Thus, the CRLB is $\theta\big/\sum x_i$, and the MLE is the best unbiased estimator.
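A quick simulation confirms parts (b) and (c); the values of theta and x below are made up purely for illustration.

   # Check that the MLE sum(y)/sum(x) is unbiased with variance near the CRLB theta/sum(x)
   set.seed(2)
   theta <- 1.5
   x <- c(2, 4, 6, 8, 10)
   est <- replicate(5000, {
     y <- rpois(length(x), lambda = theta * x)   # y_i ~ Poisson(theta * x_i)
     sum(y) / sum(x)                             # the MLE of theta
   })
   mean(est)          # close to theta = 1.5
   var(est)           # close to the CRLB
   theta / sum(x)     # CRLB = 0.05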
11.39 Let $A_i$ be the set
$$
A_i=\left\{\hat\alpha,\hat\beta:\ \left|(\hat\alpha+\hat\beta x_{0i})-(\alpha+\beta x_{0i})\right|
\ \le\ S\sqrt{\frac1n+\frac{(x_{0i}-\bar x)^2}{S_{xx}}}\;t_{n-2,\alpha/2m}\right\}.
$$
Then $P(\cap_{i=1}^m A_i)$ is the probability of simultaneous coverage, and using the Bonferroni Inequality (1.2.10) we have
$$
P\left(\bigcap_{i=1}^m A_i\right)\ \ge\ \sum_{i=1}^m P(A_i)-(m-1)
=\sum_{i=1}^m\left(1-\frac\alpha m\right)-(m-1)=1-\alpha.
$$
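The only change from a single interval is the cutoff $t_{n-2,\alpha/(2m)}$. For instance, in R (with hypothetical n and m, chosen only for illustration):

   # Bonferroni-adjusted cutoff for m simultaneous intervals at overall level 1 - alpha
   n <- 20; m <- 5; alpha <- 0.05
   qt(1 - alpha / (2 * m), df = n - 2)   # simultaneous cutoff, about 2.88
   qt(1 - alpha / 2,       df = n - 2)   # single-interval cutoff, about 2.10, for comparison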
11.41 Assume that we have observed data $(y_1,x_1),(y_2,x_2),\ldots,(y_{n-1},x_{n-1})$ and we have $x_n$ but not $y_n$. Let $\phi(y_i|x_i)$ denote the density of $Y_i$, a $\mathrm n(a+bx_i,\sigma^2)$.
a. The expected complete-data log likelihood is
$$
\mathrm E\left(\sum_{i=1}^n\log\phi(Y_i|x_i)\right)
=\sum_{i=1}^{n-1}\log\phi(y_i|x_i)+\mathrm E\log\phi(Y|x_n),
$$
where the expectation is with respect to the distribution $\phi(y|x_n)$ with the current values of the parameter estimates. Thus we need to evaluate
$$
\mathrm E\log\phi(Y|x_n)=\mathrm E\left[-\frac12\log(2\pi\sigma_1^2)-\frac{1}{2\sigma_1^2}(Y-\mu_1)^2\right],
$$
where $Y\sim\mathrm n(\mu_0,\sigma_0^2)$. We have
$$
\mathrm E(Y-\mu_1)^2=\mathrm E\bigl([Y-\mu_0]+[\mu_0-\mu_1]\bigr)^2=\sigma_0^2+[\mu_0-\mu_1]^2,
$$
since the cross term is zero. Putting this all together, the expected complete-data log likelihood is
$$
-\frac n2\log(2\pi\sigma_1^2)-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n-1}\bigl[y_i-(a_1+b_1x_i)\bigr]^2
-\frac{\sigma_0^2+\bigl[(a_0+b_0x_n)-(a_1+b_1x_n)\bigr]^2}{2\sigma_1^2}
=-\frac n2\log(2\pi\sigma_1^2)-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n}\bigl[y_i-(a_1+b_1x_i)\bigr]^2-\frac{\sigma_0^2}{2\sigma_1^2}
$$
if we define $y_n=a_0+b_0x_n$.
b. For fixed $a_0$ and $b_0$, maximizing this likelihood gives the least squares estimates, while the maximum with respect to $\sigma_1^2$ is
$$
\hat\sigma_1^2=\frac{\sum_{i=1}^n\bigl[y_i-(a_1+b_1x_i)\bigr]^2+\sigma_0^2}{n}.
$$
So the EM algorithm is the following: At iteration $t$, we have estimates $\hat a^{(t)}$, $\hat b^{(t)}$, and $\hat\sigma^{2(t)}$. We then set $y_n^{(t)}=\hat a^{(t)}+\hat b^{(t)}x_n$ (which is essentially the E-step) and then the M-step is to calculate $\hat a^{(t+1)}$ and $\hat b^{(t+1)}$ as the least squares estimators using $(y_1,x_1),(y_2,x_2),\ldots,(y_{n-1},x_{n-1}),(y_n^{(t)},x_n)$, and
$$
\hat\sigma_1^{2(t+1)}=\frac{\sum_{i=1}^n\bigl[y_i-(\hat a^{(t+1)}+\hat b^{(t+1)}x_i)\bigr]^2+\sigma_0^{2(t)}}{n}.
$$
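A minimal R sketch of the iteration in part (b). The function name, its arguments, and the starting values are assumptions made for illustration; x has length n and y has length n-1, with the last response missing.

   # EM for simple linear regression with the last response y_n missing
   em.missing.y <- function(x, y, a, b, sigma2, iters = 50) {
     n <- length(x)
     for (t in 1:iters) {
       yn  <- a + b * x[n]                    # E-step: impute y_n at current (a, b)
       yy  <- c(y, yn)
       fit <- lm(yy ~ x)                      # M-step: least squares on completed data
       a <- coef(fit)[1]; b <- coef(fit)[2]
       sigma2 <- (sum(residuals(fit)^2) + sigma2) / n   # variance update from part (b)
     }
     list(a = a, b = b, sigma2 = sigma2)
   }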
c. The EM calculations are simple here. Since $y_n^{(t)}=\hat a^{(t)}+\hat b^{(t)}x_n$, the estimates of $a$ and $b$ must converge to the least squares estimates (since they minimize the sum of squares of the observed data, and the last term adds nothing). For $\hat\sigma^2$ we have (substituting the least squares estimates) the stationary point
$$
\hat\sigma^2=\frac{\sum_{i=1}^n\bigl[y_i-(\hat a+\hat bx_i)\bigr]^2+\hat\sigma^2}{n}
\quad\Longrightarrow\quad
\hat\sigma^2=\hat\sigma^2_{\mathrm{obs}},
$$
where $\hat\sigma^2_{\mathrm{obs}}$ is the MLE from the $n-1$ observed data points. So the MLEs are the same as those without the extra $x_n$.
d. Now we use the bivariate normal density (see Definition 4.5.10 and Exercise 4.45). Denote the density by $\phi(x,y)$. Then the expected complete-data log likelihood is
$$
\sum_{i=1}^{n-1}\log\phi(x_i,y_i)+\mathrm E\log\phi(X,y_n),
$$
where after iteration $t$ the missing-data density is the conditional density of $X$ given $Y=y_n$,
$$
X\,|\,Y=y_n\ \sim\ \mathrm n\!\left(\mu_X^{(t)}+\rho^{(t)}\bigl(\sigma_X^{(t)}/\sigma_Y^{(t)}\bigr)(y_n-\mu_Y^{(t)}),\ \ (1-\rho^{2(t)})\sigma_X^{2(t)}\right).
$$
Denoting this mean by $\mu_0$ and this variance by $\sigma_0^2$, the expected value of the last piece in the likelihood is
$$
\mathrm E\log\phi(X,y_n)
=-\frac12\log\bigl(2\pi\sigma_X^2\sigma_Y^2(1-\rho^2)\bigr)
-\frac{1}{2(1-\rho^2)}\left[\mathrm E\left(\frac{X-\mu_X}{\sigma_X}\right)^2
-2\rho\,\frac{\mathrm E(X-\mu_X)(y_n-\mu_Y)}{\sigma_X\sigma_Y}
+\left(\frac{y_n-\mu_Y}{\sigma_Y}\right)^2\right]
$$
$$
=-\frac12\log\bigl(2\pi\sigma_X^2\sigma_Y^2(1-\rho^2)\bigr)
-\frac{1}{2(1-\rho^2)}\left[\frac{\sigma_0^2}{\sigma_X^2}
+\left(\frac{\mu_0-\mu_X}{\sigma_X}\right)^2
-2\rho\,\frac{(\mu_0-\mu_X)(y_n-\mu_Y)}{\sigma_X\sigma_Y}
+\left(\frac{y_n-\mu_Y}{\sigma_Y}\right)^2\right].
$$
So the expected complete-data log likelihood is
$$
\sum_{i=1}^{n-1}\log\phi(x_i,y_i)+\log\phi(\mu_0,y_n)-\frac{\sigma_0^2}{2(1-\rho^2)\sigma_X^2}.
$$
The EM algorithm is similar to the previous one. First note that the MLEs of $\mu_Y$ and $\sigma_Y^2$ are the usual ones, $\bar y$ and $\hat\sigma_Y^2$, and don't change with the iterations. We update the other estimates as follows. At iteration $t$, the E-step consists of replacing $x_n^{(t)}$ by
$$
x_n^{(t+1)}=\hat\mu_X^{(t)}+\rho^{(t)}\frac{\sigma_X^{(t)}}{\sigma_Y^{(t)}}(y_n-\bar y).
$$
Then $\mu_X^{(t+1)}=\bar x$ and we can write the likelihood as
$$
-\frac12\log\bigl(2\pi\sigma_X^2\hat\sigma_Y^2(1-\rho^2)\bigr)
-\frac{1}{2(1-\rho^2)}\left[\frac{S_{xx}+\sigma_0^2}{\sigma_X^2}
-2\rho\,\frac{S_{xy}}{\sigma_X\hat\sigma_Y}+\frac{S_{yy}}{\hat\sigma_Y^2}\right],
$$
which is the usual bivariate normal likelihood except that we replace $S_{xx}$ with $S_{xx}+\sigma_0^2$. So the MLEs are the usual ones, and the EM iterations are
$$
x_n^{(t+1)}=\hat\mu_X^{(t)}+\rho^{(t)}\frac{\sigma_X^{(t)}}{\sigma_Y^{(t)}}(y_n-\bar y),
\qquad
\hat\mu_X^{(t+1)}=\bar x^{(t)},
\qquad
\hat\sigma_X^{2(t+1)}=\frac{S_{xx}^{(t)}+(1-\hat\rho^{2(t)})\hat\sigma_X^{2(t)}}{n},
\qquad
\hat\rho^{(t+1)}=\frac{S_{xy}^{(t)}}{\sqrt{\bigl(S_{xx}^{(t)}+(1-\hat\rho^{2(t)})\hat\sigma_X^{2(t)}\bigr)S_{yy}}}.
$$
Here is R code for the EM algorithm:

   nsim<-20
   xdata0<-c(20,19.6,19.6,19.4,18.4,19,19,18.3,18.2,18.6,19.2,18.2,
             18.7,18.5,18,17.4,16.5,17.2,17.3,17.8,17.3,18.4,16.9)
   ydata0<-c(1,1.2,1.1,1.4,2.3,1.7,1.7,2.4,2.1,2.1,1.2,2.3,1.9,2.4,2.6,
             2.9,4,3.3,3,3.4,2.9,1.9,3.9,4.2)
   nx<-length(xdata0)
   ny<-length(ydata0)
   # initial values from the MLEs on the observed data
   xmean<-18.24167; xvar<-0.9597797; ymean<-2.370833; yvar<-0.8312327
   rho<- -0.9700159
   for (j in 1:nsim) {
     # E-step: augment the x (O2) data with the conditional mean of the missing x,
     # mu_X + rho*(sigma_X/sigma_Y)*(y_n - mu_Y)
     xdata<-c(xdata0, xmean + rho*sqrt(xvar/yvar)*(4.2 - ymean))
     # M-step: update the estimates
     xmean<-mean(xdata)
     Sxx<-(ny-1)*var(xdata) + (1 - rho^2)*xvar
     xvar<-Sxx/ny
     rho<-cor(xdata, ydata0)*sqrt((ny-1)*var(xdata)/Sxx)
   }
The algorithm converges very quickly. The MLEs are
$$
\hat\mu_X=18.24,\quad \hat\mu_Y=2.37,\quad \hat\sigma_X^2=.969,\quad \hat\sigma_Y^2=.831,\quad \hat\rho=-0.969.
$$
Chapter 12
Regression Models
12.1 The point $(\hat x_0,\hat y_0)$ is the closest if it lies on the vertex of the right triangle with vertices $(x_0,y_0)$ and $(x_0,a+bx_0)$. By the Pythagorean theorem, we must have
$$
\left[(\hat x_0-x_0)^2+\bigl(\hat y_0-(a+bx_0)\bigr)^2\right]
+\left[(\hat x_0-x_0)^2+(\hat y_0-y_0)^2\right]
=(x_0-x_0)^2+\bigl(y_0-(a+bx_0)\bigr)^2.
$$
Substituting the values of $\hat x_0$ and $\hat y_0$ from (12.2.7) we obtain for the LHS above
$$
\left[\left(\frac{b(y_0-bx_0-a)}{1+b^2}\right)^2+\left(\frac{b^2(y_0-bx_0-a)}{1+b^2}\right)^2\right]
+\left[\left(\frac{b(y_0-bx_0-a)}{1+b^2}\right)^2+\left(\frac{y_0-bx_0-a}{1+b^2}\right)^2\right]
=\bigl(y_0-(a+bx_0)\bigr)^2\left[\frac{b^2+b^4+b^2+1}{(1+b^2)^2}\right]
=\bigl(y_0-(a+bx_0)\bigr)^2.
$$
12.3 a. Differentiation yields
$$
\partial f/\partial\xi_i=-2(x_i-\xi_i)-2\lambda\beta\bigl[y_i-(\alpha+\beta\xi_i)\bigr]\ \stackrel{\mathrm{set}}{=}\ 0
\quad\Longrightarrow\quad
\xi_i(1+\lambda\beta^2)=x_i+\lambda\beta(y_i-\alpha),
$$
which is the required solution. Also, $\partial^2f/\partial\xi_i^2=2(1+\lambda\beta^2)>0$, so this is a minimum.
b. Parts i), ii), and iii) are immediate. For iv) just note that $D$ is the Euclidean distance between $(x_1,\sqrt\lambda\,y_1)$ and $(x_2,\sqrt\lambda\,y_2)$, hence satisfies the triangle inequality.
12.5 Differentiate $\log L$, for $L$ in (12.2.17), to get
$$
\frac{\partial}{\partial\sigma^2_\delta}\log L
=-\frac{n}{\sigma^2_\delta}+\frac{1}{2(\sigma^2_\delta)^2}\,\frac{\lambda}{1+\lambda\hat\beta^2}
\sum_{i=1}^n\bigl[y_i-(\hat\alpha+\hat\beta x_i)\bigr]^2.
$$
Set this equal to zero and solve for $\sigma^2_\delta$. The answer is (12.2.18).
12.7 a. Suppressing the subscript $i$ and the minus sign, the exponent is
$$
\frac{(x-\xi)^2}{\sigma^2_\delta}+\frac{[y-(\alpha+\beta\xi)]^2}{\sigma^2_\epsilon}
=\frac{\sigma^2_\epsilon+\beta^2\sigma^2_\delta}{\sigma^2_\epsilon\sigma^2_\delta}(\xi-k)^2
+\frac{[y-(\alpha+\beta x)]^2}{\sigma^2_\epsilon+\beta^2\sigma^2_\delta},
\qquad\text{where}\quad
k=\frac{\sigma^2_\epsilon x+\sigma^2_\delta\beta(y-\alpha)}{\sigma^2_\epsilon+\beta^2\sigma^2_\delta}.
$$
Thus, integrating with respect to $\xi$ eliminates the first term.
b. The resulting function must be the joint pdf of $X$ and $Y$. The double integral is infinite, however.
12.9 a. From the last two equations in (12.2.19),
$$
\hat\sigma^2_\delta=\frac1nS_{xx}-\hat\sigma^2_\xi=\frac1nS_{xx}-\frac1n\frac{S_{xy}}{\hat\beta},
$$
which is positive only if $S_{xx}>S_{xy}/\hat\beta$. Similarly,
$$
\hat\sigma^2_\epsilon=\frac1nS_{yy}-\hat\beta^2\hat\sigma^2_\xi=\frac1nS_{yy}-\hat\beta^2\,\frac1n\frac{S_{xy}}{\hat\beta},
$$
which is positive only if $S_{yy}>\hat\beta S_{xy}$.
b. We have from part a), $\hat\sigma^2_\delta>0\Rightarrow S_{xx}>S_{xy}/\hat\beta$ and $\hat\sigma^2_\epsilon>0\Rightarrow S_{yy}>\hat\beta S_{xy}$. Furthermore, $\hat\sigma^2_\xi>0$ implies that $S_{xy}$ and $\hat\beta$ have the same sign. Thus $S_{xx}>|S_{xy}|/|\hat\beta|$ and $S_{yy}>|\hat\beta||S_{xy}|$. Combining yields
$$
\frac{|S_{xy}|}{S_{xx}}\ <\ |\hat\beta|\ <\ \frac{S_{yy}}{|S_{xy}|}.
$$
12.11 a.
$$
\mathrm{Cov}(aY+bX,\ cY+dX)
=\mathrm E(aY+bX)(cY+dX)-\mathrm E(aY+bX)\,\mathrm E(cY+dX)
=\mathrm E\bigl[acY^2+(bc+ad)XY+bdX^2\bigr]-\mathrm E(aY+bX)\,\mathrm E(cY+dX)
$$
$$
=ac\,\mathrm{Var}\,Y+ac(\mathrm EY)^2+(bc+ad)\,\mathrm{Cov}(X,Y)+(bc+ad)\,\mathrm EX\,\mathrm EY
+bd\,\mathrm{Var}\,X+bd(\mathrm EX)^2-\mathrm E(aY+bX)\,\mathrm E(cY+dX)
$$
$$
=ac\,\mathrm{Var}\,Y+(bc+ad)\,\mathrm{Cov}(X,Y)+bd\,\mathrm{Var}\,X.
$$
b. Identify $a=\beta\lambda$, $b=1$, $c=1$, $d=-\beta$, and using (12.3.19),
$$
\mathrm{Cov}(\beta\lambda Y_i+X_i,\ Y_i-\beta X_i)
=\beta\lambda\,\mathrm{Var}\,Y+(1-\lambda\beta^2)\,\mathrm{Cov}(X,Y)-\beta\,\mathrm{Var}\,X
=\beta\lambda\bigl(\sigma^2_\epsilon+\beta^2\sigma^2_\xi\bigr)+(1-\lambda\beta^2)\beta\sigma^2_\xi
-\beta\bigl(\sigma^2_\delta+\sigma^2_\xi\bigr)
=\beta\lambda\sigma^2_\epsilon-\beta\sigma^2_\delta=0
$$
if $\lambda\sigma^2_\epsilon=\sigma^2_\delta$. (Note that we did not need the normality assumption, just the moments.)
c. Let $W_i=\beta\lambda Y_i+X_i$, $V_i=Y_i-\beta X_i$. Exercise 11.33 shows that if $\mathrm{Cov}(W_i,V_i)=0$, then $\sqrt{n-2}\,r\big/\sqrt{1-r^2}$ has a $t_{n-2}$ distribution. Thus $\sqrt{n-2}\,r_\lambda(\beta)\big/\sqrt{1-r^2_\lambda(\beta)}$ has a $t_{n-2}$ distribution for all values of $\beta$, by part (b). Also
$$
P\left(\left\{\beta:\ \frac{(n-2)\,r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)}\le F_{1,n-2,\alpha}\right\}\right)
=P\left(\left\{(X,Y):\ \frac{(n-2)\,r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)}\le F_{1,n-2,\alpha}\right\}\right)=1-\alpha.
$$
12.13 a. Rewrite (12.2.22) to get
$$
\left\{\beta:\ \hat\beta-t_{n-2}\,\hat\sigma_{\hat\beta}\ \le\ \beta\ \le\ \hat\beta+t_{n-2}\,\hat\sigma_{\hat\beta}\right\}
=\left\{\beta:\ \frac{(\hat\beta-\beta)^2}{\hat\sigma^2_{\hat\beta}}\ \le\ F_{1,n-2}\right\}.
$$
b. For $\hat\beta$ of (12.2.16), the numerator of $r_\lambda(\beta)$ in (12.2.22) can be written
$$
\beta\lambda S_{yy}+(1-\beta^2\lambda)S_{xy}-\beta S_{xx}
=-\beta^2(\lambda S_{xy})-\beta(S_{xx}-\lambda S_{yy})+S_{xy}
=-\lambda S_{xy}(\beta-\hat\beta)\left(\beta+\frac{1}{\lambda\hat\beta}\right).
$$
Again from (12.2.22), we have
$$
\frac{r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)}
=\frac{\bigl[\beta\lambda S_{yy}+(1-\beta^2\lambda)S_{xy}-\beta S_{xx}\bigr]^2}
{\bigl(\beta^2\lambda^2S_{yy}+2\beta\lambda S_{xy}+S_{xx}\bigr)\bigl(S_{yy}-2\beta S_{xy}+\beta^2S_{xx}\bigr)
-\bigl(\beta\lambda S_{yy}+(1-\beta^2\lambda)S_{xy}-\beta S_{xx}\bigr)^2},
$$
and a great deal of straightforward (but tedious) algebra will show that the denominator of this expression is equal to
$$
(1+\lambda\beta^2)^2\bigl(S_{yy}S_{xx}-S^2_{xy}\bigr).
$$
Thus
$$
\frac{r^2_\lambda(\beta)}{1-r^2_\lambda(\beta)}
=\frac{\lambda^2S^2_{xy}\,(\beta-\hat\beta)^2\left(\beta+\frac{1}{\lambda\hat\beta}\right)^2}
{(1+\lambda\beta^2)^2\bigl(S_{yy}S_{xx}-S^2_{xy}\bigr)}
=\frac{(\beta-\hat\beta)^2}{\hat\sigma^2_{\hat\beta}}
\left(\frac{1+\lambda\beta\hat\beta}{1+\lambda\beta^2}\right)^2
\left[\frac{(1+\lambda\hat\beta^2)^2S^2_{xy}}
{\hat\beta^2\bigl[(S_{xx}-\lambda S_{yy})^2+4\lambda S^2_{xy}\bigr]}\right],
$$
after substituting $\hat\sigma^2_{\hat\beta}$ from page 588. Now using the fact that $\hat\beta$ and $-1/(\lambda\hat\beta)$ are both roots of the same quadratic equation, we have
$$
\frac{(1+\lambda\hat\beta^2)^2}{\hat\beta^2}
=\left(\frac{1}{\hat\beta}+\lambda\hat\beta\right)^2
=\frac{(S_{xx}-\lambda S_{yy})^2+4\lambda S^2_{xy}}{S^2_{xy}}.
$$
Thus the expression in square brackets is equal to 1.
12.15 a.
$$
\pi(-\alpha/\beta)=\frac{e^{\alpha+\beta(-\alpha/\beta)}}{1+e^{\alpha+\beta(-\alpha/\beta)}}=\frac{e^0}{1+e^0}=\frac12.
$$
b.
$$
\pi\left(-\frac\alpha\beta+c\right)=\frac{e^{\alpha+\beta(-\alpha/\beta+c)}}{1+e^{\alpha+\beta(-\alpha/\beta+c)}}=\frac{e^{\beta c}}{1+e^{\beta c}},
\qquad
1-\pi\left(-\frac\alpha\beta-c\right)=1-\frac{e^{-\beta c}}{1+e^{-\beta c}}=\frac{e^{\beta c}}{1+e^{\beta c}}.
$$
c.
$$
\frac{d}{dx}\pi(x)=\frac{\beta e^{\alpha+\beta x}}{\bigl[1+e^{\alpha+\beta x}\bigr]^2}=\beta\,\pi(x)\bigl(1-\pi(x)\bigr).
$$
d. Because
$$
\frac{\pi(x)}{1-\pi(x)}=e^{\alpha+\beta x},
$$
the result follows from direct substitution.
e. Follows directly from (d).
f. Follows directly from
$$
\frac{\partial}{\partial\alpha}F(\alpha+\beta x)=f(\alpha+\beta x)
\quad\text{and}\quad
\frac{\partial}{\partial\beta}F(\alpha+\beta x)=xf(\alpha+\beta x).
$$
g. For $F(x)=e^x/(1+e^x)$, $f(x)=F(x)(1-F(x))$ and the result follows. For $F(x)=\pi(x)$ of (12.3.2), from part (c) it follows that
$$
\frac{f}{F(1-F)}=\beta.
$$
12.17 a. The likelihood equations and solution are the same as in Example 12.3.1 with the exception that here $\pi(x_j)=\Phi(\alpha+\beta x_j)$, where $\Phi$ is the cdf of a standard normal.
b. If the 0-1 failure response is denoted "oring" and the temperature data is "temp", the following R code will generate the logit and probit regressions:

   summary(glm(oring~temp, family=binomial(link=logit)))
   summary(glm(oring~temp, family=binomial(link=probit)))

For the logit model we have

                 Estimate   Std. Error   z value   Pr(>|z|)
    Intercept     15.0429       7.3719     2.041     0.0413
    temp          -0.2322       0.1081    -2.147     0.0318

and for the probit model we have

                 Estimate   Std. Error   z value   Pr(>|z|)
    Intercept     8.77084      3.86222     2.271     0.0232
    temp         -0.13504      0.05632    -2.398     0.0165

Although the coefficients are different, the fit is qualitatively the same, and the probability of failure at 31, using the probit model, is .9999.
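The quoted failure probability at 31 degrees can be recovered directly from the fitted coefficients above (taking the temperature coefficients as negative, as the fit indicates):

   # Probability of O-ring failure at temp = 31 under each fitted model
   pnorm(8.77084 - 0.13504 * 31)    # probit: essentially 1 (about .9999, as quoted)
   plogis(15.0429 - 0.2322 * 31)    # logit: about .9996, for comparison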
12.19 a. Using the notation of Example 12.3.1, the likelihood (joint density) is
$$
\prod_{j=1}^J\left(\frac{e^{\alpha+\beta x_j}}{1+e^{\alpha+\beta x_j}}\right)^{y_j^*}
\left(\frac{1}{1+e^{\alpha+\beta x_j}}\right)^{n_j-y_j^*}
=\left[\prod_{j=1}^J\left(\frac{1}{1+e^{\alpha+\beta x_j}}\right)^{n_j}\right]
e^{\alpha\sum_j y_j^*+\beta\sum_j x_jy_j^*}.
$$
By the Factorization Theorem, $\sum_j y_j^*$ and $\sum_j x_jy_j^*$ are sufficient.
b. Straightforward substitution.
12.21 Since $\frac{d}{d\pi}\log\bigl(\pi/(1-\pi)\bigr)=1/\bigl(\pi(1-\pi)\bigr)$,
$$
\mathrm{Var}\left(\log\frac{\hat\pi}{1-\hat\pi}\right)
\approx\left(\frac{1}{\pi(1-\pi)}\right)^2\frac{\pi(1-\pi)}{n}
=\frac{1}{n\pi(1-\pi)}.
$$
12.23 a. If $\sum a_i=0$,
$$
\mathrm E\sum_i a_iY_i=\sum_i a_i\bigl[\alpha+\beta x_i+\mu(1-\delta)\bigr]=\beta\sum_i a_ix_i=\beta
$$
for $a_i=(x_i-\bar x)\big/\sum_j(x_j-\bar x)^2$.
b.
$$
\mathrm E(\bar Y-\beta\bar x)=\frac1n\sum_i\bigl[\alpha+\beta x_i+\mu(1-\delta)\bigr]-\beta\bar x=\alpha+\mu(1-\delta),
$$
so the least squares estimate of the intercept is unbiased in the model $Y_i=\alpha'+\beta x_i+\epsilon_i$, where $\alpha'=\alpha+\mu(1-\delta)$.
12.25 a. The least absolute deviation line minimizes
$$
|y_1-(c+dx_1)|+|y_2-(c+dx_1)|+|y_3-(c+dx_3)|.
$$
Any line that lies between $(x_1,y_1)$ and $(x_1,y_2)$ has the same value for the sum of the first two terms, and this value is smaller than that of any line outside of $(x_1,y_1)$ and $(x_1,y_2)$. Of all the lines that lie inside, the ones that go through $(x_3,y_3)$ minimize the entire sum.
b. For the least squares line, $a=-53.88$ and $b=.53$. Any line with $b$ between $(17.9-14.4)/9=.39$ and $(17.9-11.9)/9=.67$ and $a=17.9-136b$ is a least absolute deviation line.
12.27 In the terminology of M-estimators (see the argument on pages 485-486), $\hat\beta_L$ is consistent for the $\beta_0$ that satisfies $\mathrm E_{\beta_0}\sum_i\psi(Y_i-\beta_0x_i)=0$, so we must take the "true" $\beta$ to be this value. We then see that
$$
\sum_i\psi(Y_i-\hat\beta_Lx_i)\to 0
$$
as long as the derivative term is bounded, which we assume is so.
12.29 The argument for the median is a special case of Example 12.4.3, where we take $x_i=1$ so $\sigma^2_x=1$. The asymptotic distribution is given in (12.4.5) which, for $\sigma^2_x=1$, agrees with Example 10.2.3.
12.31 The LAD estimates, from Example 12.4.2, are $\tilde\alpha=18.59$ and $\tilde\beta=-.89$. Here is Mathematica code to bootstrap the standard deviations. (Mathematica is probably not the best choice here, as it is somewhat slow. Also, the minimization seemed a bit delicate, and worked better when done iteratively.) Sad is the sum of the absolute deviations, which is minimized iteratively in bmin and amin. The residuals are bootstrapped by generating random indices u from the discrete uniform distribution on the integers 1 to 23.
1. First enter data and initialize:

   Needs["Statistics`Master`"]
   Clear[a,b,r,u]
   a0=18.59; b0=-.89; aboot=a0; bboot=b0;
   y0={1,1.2,1.1,1.4,2.3,1.7,1.7,2.4,2.1,2.1,1.2,2.3,1.9,2.4,
       2.6,2.9,4,3.3,3,3.4,2.9,1.9,3.9};
   x0={20,19.6,19.6,19.4,18.4,19,19,18.3,18.2,18.6,19.2,18.2,
       18.7,18.5,18,17.4,16.5,17.2,17.3,17.8,17.3,18.4,16.9};
   model=a0+b0*x0;
   r=y0-model;
   u:=Random[DiscreteUniformDistribution[23]]
   Sad[a_,b_]:=Mean[Abs[model+rstar-(a+b*x0)]]
   bmin[a_]:=FindMinimum[Sad[a,b],{b,{-.5,-1.5}}]
   amin:=FindMinimum[Sad[a,b/.bmin[a][[2]]],{a,{16,19}}]

2. Here is the actual bootstrap. The vectors aboot and bboot contain the bootstrapped values.

   B=500;
   Do[
     rstar=Table[r[[u]],{i,1,23}];
     astar=a/.amin[[2]];
     bstar=b/.bmin[astar][[2]];
     aboot=Flatten[{aboot,astar}];
     bboot=Flatten[{bboot,bstar}],
     {i,1,B}]

3. Summary statistics:

   Mean[aboot]
   StandardDeviation[aboot]
   Mean[bboot]
   StandardDeviation[bboot]

4. The results are: Intercept: Mean 18.66, SD .923; Slope: Mean -.893, SD .050.
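For readers working in R rather than Mathematica, here is a rough sketch of the same residual bootstrap, using optim() to minimize the sum of absolute deviations. It is a translation for illustration only, not the authors' code; the data vectors are the ones entered above.

   # Residual bootstrap for the LAD line in R
   x0 <- c(20,19.6,19.6,19.4,18.4,19,19,18.3,18.2,18.6,19.2,18.2,
           18.7,18.5,18,17.4,16.5,17.2,17.3,17.8,17.3,18.4,16.9)
   y0 <- c(1,1.2,1.1,1.4,2.3,1.7,1.7,2.4,2.1,2.1,1.2,2.3,1.9,2.4,
           2.6,2.9,4,3.3,3,3.4,2.9,1.9,3.9)
   sad <- function(par, x, y) sum(abs(y - (par[1] + par[2] * x)))  # sum of abs deviations
   fit <- optim(c(18.59, -0.89), sad, x = x0, y = y0)              # LAD fit
   r   <- y0 - (fit$par[1] + fit$par[2] * x0)                      # LAD residuals
   B <- 500
   boot <- t(replicate(B, {
     ystar <- fit$par[1] + fit$par[2] * x0 + sample(r, replace = TRUE)
     optim(fit$par, sad, x = x0, y = ystar)$par
   }))
   colMeans(boot)          # bootstrap means of intercept and slope
   apply(boot, 2, sd)      # bootstrap standard deviations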